Machine-learning based approach to examine ecological processes influencing the diversity of riverine dissolved organic matter composition

dc.contributor.author Müller, Moritz
dc.contributor.author D’Andrilli, Juliana
dc.contributor.author Silverman, Victoria
dc.contributor.author Bier, Raven L.
dc.contributor.author Barnard, Malcolm A.
dc.contributor.author Lee, Miko Chang May
dc.contributor.author Richard, Florina
dc.contributor.author Tanentzap, Andrew J.
dc.contributor.author Wang, Jianjun
dc.contributor.author de Melo, Michaela
dc.contributor.author Lu, YueHan
dc.date.accessioned 2025-01-24T18:57:31Z
dc.date.available 2025-01-24T18:57:31Z
dc.date.issued 2024-05-01
dc.description © The Author(s), 2024. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Müller, M., D’Andrilli, J., Silverman, V., Bier, R. L., Barnard, M. A., Lee, M. C. M., Richard, F., Tanentzap, A. J., Wang, J., de Melo, M., & Lu, Y. (2024). Machine-learning based approach to examine ecological processes influencing the diversity of riverine dissolved organic matter composition. Frontiers in Water, 6, https://doi.org/10.3389/frwa.2024.1379284.
dc.description.abstract Dissolved organic matter (DOM) assemblages in freshwater rivers are formed from mixtures of simple to complex compounds that are highly variable across time and space. These mixtures largely form due to the environmental heterogeneity of river networks and the contribution of diverse allochthonous and autochthonous DOM sources. Most studies are, however, confined to local and regional scales, which precludes an understanding of how these mixtures arise at large, e.g., continental, spatial scales. The processes contributing to these mixtures are also difficult to study because of the complex interactions between various environmental factors and DOM. Here we propose the use of machine learning (ML) approaches to identify ecological processes contributing toward mixtures of DOM at a continental scale. We related a dataset that characterized the molecular composition of DOM from river water and sediment with Fourier-transform ion cyclotron resonance mass spectrometry to explanatory physicochemical variables such as nutrient concentrations and stable water isotopes (²H and ¹⁸O). Using unsupervised ML, distinctive clusters for sediment and water samples were identified, with unique molecular compositions influenced by environmental factors like terrestrial input and microbial activity. Sediment clusters showed a higher proportion of protein-like and unclassified compounds than water clusters, while water clusters exhibited a more diversified chemical composition. We then applied a supervised ML approach, involving a two-stage use of SHapley Additive exPlanations (SHAP) values. In the first stage, SHAP values were obtained and used to identify key physicochemical variables. These parameters were employed to train models using both the default and subsequently tuned hyperparameters of the Histogram-based Gradient Boosting (HGB) algorithm.
The supervised ML approach, using HGB and SHAP values, highlighted complex relationships between environmental factors and DOM diversity; in particular, the existence of dams upstream, precipitation events, and other watershed characteristics were important in predicting higher chemical diversity in DOM. Our data-driven approach can now be used more generally to reveal the interplay between physical, chemical, and biological factors in determining the diversity of DOM in other ecosystems.
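The first SHAP stage described in the abstract (attribute predictions to variables, then keep the influential ones) can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: it computes exact Shapley values by enumerating feature coalitions instead of calling the SHAP library. The predictor `toy_model`, the feature names, the sample values, and the zero baseline are all hypothetical stand-ins for a trained HGB model and real physicochemical data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x relative to a baseline sample.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def rank_features(predict, samples, baseline, names):
    """Rank features by mean absolute Shapley value across samples."""
    totals = [0.0] * len(names)
    for x in samples:
        for i, v in enumerate(shapley_values(predict, x, baseline)):
            totals[i] += abs(v)
    means = [t / len(samples) for t in totals]
    return sorted(zip(names, means), key=lambda p: p[1], reverse=True)


def toy_model(features):
    """Hypothetical predictor of DOM diversity (assumption, not from the paper)."""
    dam, precip, nitrate = features
    return 3.0 * dam + 1.0 * precip + 0.5 * dam * precip + 0.0 * nitrate


# Hypothetical feature names and sample values, for illustration only
names = ["dam_upstream", "precipitation", "nitrate"]
samples = [[1.0, 2.0, 5.0], [0.0, 1.0, 3.0], [1.0, 0.0, 4.0]]
baseline = [0.0, 0.0, 0.0]

ranking = rank_features(toy_model, samples, baseline, names)
# Stage 1 output: keep only variables with a non-negligible attribution
selected = [name for name, score in ranking if score > 1e-9]
```

In a workflow like the one the abstract outlines, the variables in `selected` would then be passed to the second stage, where models are retrained with default and tuned hyperparameters. Exact enumeration scales exponentially with the number of features, so it is only practical for small illustrations like this one.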
dc.description.sponsorship The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was supported by the United States Department of Energy, Office of Science, Office of Biological and Environmental Research, and Environmental System Science (ESS) Program. This contribution originates from the River Corridor Scientific Focus Area (SFA) project at Pacific Northwest National Laboratory (PNNL). PNNL is operated by Battelle Memorial Institute for the United States Department of Energy under Contract No. DE-AC05-76RL01830. This material is based upon work supported by the Department of Energy Office of Environmental Management under Award Number DE-EM0005228 to the University of Georgia Research Foundation for co-author RB.
dc.identifier.citation Müller, M., D’Andrilli, J., Silverman, V., Bier, R. L., Barnard, M. A., Lee, M. C. M., Richard, F., Tanentzap, A. J., Wang, J., de Melo, M., & Lu, Y. (2024). Machine-learning based approach to examine ecological processes influencing the diversity of riverine dissolved organic matter composition. Frontiers in Water, 6.
dc.identifier.doi 10.3389/frwa.2024.1379284
dc.identifier.uri https://hdl.handle.net/1912/71276
dc.publisher Frontiers Media
dc.relation.uri https://doi.org/10.3389/frwa.2024.1379284
dc.rights Attribution 4.0 International
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject DOM
dc.subject River networks
dc.subject FTICR-MS
dc.subject Molecular composition
dc.subject Random forest
dc.subject Cluster analysis
dc.subject Ecosystem properties
dc.subject Unsupervised machine learning
dc.title Machine-learning based approach to examine ecological processes influencing the diversity of riverine dissolved organic matter composition
dc.type Article
dspace.entity.type Publication
relation.isAuthorOfPublication a20f6b32-a4b6-4e96-891c-b691165d4e9f
relation.isAuthorOfPublication b7e2baf7-30a0-46ea-b292-6f9781a7e10c
relation.isAuthorOfPublication e4fb0478-89e8-4150-aabf-7aa5b2099dd6
relation.isAuthorOfPublication.latestForDiscovery a20f6b32-a4b6-4e96-891c-b691165d4e9f
Files
Original bundle
Name: MullerM_2024.pdf
Size: 1.57 MB
Format: Adobe Portable Document Format
Name: MullerM_2024supplementary.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format