Correction
19 Dec 2023: The PLOS Water Staff (2023) Correction: Convergence of mechanistic modeling and artificial intelligence in hydrologic science and engineering. PLOS Water 2(12): e0000212. https://doi.org/10.1371/journal.pwat.0000212
Abstract
Hydrology is a mature physical science based on the application of first principles. However, the water system is complex, and its study requires analysis of increasingly large datasets available from conventional and novel remote sensing and IoT sensor technologies. New data-driven approaches like Artificial Intelligence (AI) and Machine Learning (ML) are attracting much “hype” despite their apparent limitations (transparency, interpretability, ethics). Some AI/ML applications fail to address important hydrological questions explicitly, focusing mainly on “black-box” prediction without providing mechanistic insights. We present a typology of four main types of hydrological problems based on their dominant space and time scales, review their current tools and challenges, and identify important opportunities for AI/ML in hydrology around three main topics: data management, insights and knowledge extraction, and modelling structure. Beyond prediction alone, we propose that AI/ML can be a powerful inductive and exploratory dimension-reduction tool within the rich hydrological toolchest to support the development of new theories that address standing gaps in changing hydrological systems. AI/ML can incorporate other forms of structured and unstructured data and traditional knowledge typically not considered in process-based models. This can help us further advance process-based understanding, forecasting and management of hydrological systems, particularly at larger integrated system scales with big models. We call for reimagining the original definition of AI in hydrology to incorporate not only today’s main focus on learning, but also decision analytics and action rules, and the development of autonomous machines in a continuous cycle of learning and refinement within strong ethical, legal, social, and economic constraints.
For this, transdisciplinary communities of knowledge and practice will need to be forged with strong investment from the public sector and private engagement to protect water as a common good under accelerated demand and environmental change.
Citation: Muñoz-Carpena R, Carmona-Cabrero A, Yu Z, Fox G, Batelaan O (2023) Convergence of mechanistic modeling and artificial intelligence in hydrologic science and engineering. PLOS Water 2(8): e0000059. https://doi.org/10.1371/journal.pwat.0000059
Editor: Chandra A. Madramootoo, McGill University, CANADA
Published: August 7, 2023
Copyright: © 2023 Muñoz-Carpena et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: RMC acknowledges support from USDA-NIFA Hatch projects 1024705 and 1024706, and the University of Florida 2021 Artificial Intelligence Seed Funding program. RMC and ACC acknowledge support from the Army Research Office/Army Research Laboratory under award #W911NF1810267 (Multidisciplinary University Research Initiative). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies either expressed or implied of the Army Research Office or the U.S. Government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction: Hydrology and artificial intelligence–a marriage made in heaven?
The water system is complex: water flows through dissimilar, highly heterogeneous and anisotropic materials at varying spatial and temporal scales, interacting with many biogeochemical and human components, often in unknown ways. Modern “Hydrology”, as the science of “all things water”, matured in the XIX and early XX centuries as a mechanistic discipline in which the application of physical “first principles” (conservation of mass, energy, and momentum) sought to explain the occurrence, movement, and fate of water across all environmental (hydrological) compartments, such as the atmosphere, surface, soil, and aquifers. Over time, it expanded into a distinct multidisciplinary science at the interface of physics, chemistry, biology, and socioeconomics [1–3]. From early times, quantification in the form of measurements (e.g., rainfall records in India in 400 BC [4]) and systematic experimentation (e.g., Leonardo da Vinci’s study of flow velocity distribution in rivers during the XV century Renaissance [5]) became an integral part of the discipline. This was followed by the development of hydraulic principles expressed as mathematical equations, like Torricelli’s [6] and Pascal’s [7] in the XVII century, surface flow equations in the XVIII and XIX centuries (e.g., Chézy’s [8], Manning’s [9], and Saint-Venant’s [10]), and subsurface hydrology concepts from the mid-XIX century, starting with Darcy’s equation [11]. During the 1970s, hydrology emerged as a global discipline during the UNESCO Hydrological Decade [12], when the global water balance was calculated for the first time after the then East and West geopolitical blocks came together to share data. The establishment of the discipline benefited from the advent of computers and simulation models, starting in the mid-XX century with the Stanford Watershed Model IV [13], and from automatic in-situ and remote sensing technologies in the late XX century.
In recent years, new earth observation platforms, the wide availability of the internet, the emergence of the internet cloud, the Internet-of-Things (IoT), and high-performance computing with large storage and data mining capabilities have transformed the discipline into what can now be called the “digital water” era [14].
With increasing theoretical advances and available data, the need to account for variability in measurements and process uncertainty has made statistics a core discipline within hydrology since early times. Elegant theories built on physical and mathematical principles face uncertainty in their practical application even to controlled media like pipes and channels [15], and this uncertainty increases in open environments like the atmosphere, natural surfaces, soil and groundwater, and their interfaces. Further, hydrological outcomes generally exhibit strong spatial and temporal dependence (i.e., structure and autocorrelation). Often, hydrology seeks to understand and predict extreme or low-probability events resulting in excess or scarcity of water and of its chemical and biological constituents, which can be noxious for humans and the environment. Beyond variability, non-stationarity introduced by land use and climate changes also complicates hydrological observation, interpretation, and prediction. In turn, these challenges affect the reliability of the hydrological design needed to support development while minimizing the human footprint on the environment.
Parallel and synergistic advances in statistics and hydrology are nicely presented in Maidment’s hydrologic statistics timeline [16]. Key milestones include Pearson’s method of moments in the 1900s; rigorous statistical treatment of flood flows in the 1930s; Kendall’s, Weibull’s and Gumbel’s works on hydrological frequency distributions in the 1940s; geostatistics, which started in the 1960s (e.g., [17–19]); systems analysis through global sensitivity analysis, started by Cukier [20], Tukey’s exploratory data analysis, and Freeze’s groundwater stochastic analysis in the 1970s; and more recent advances in process and Bayesian ensemble modelling (e.g., [21–24]) and hypothesis-based model testing (e.g., [25–27]), among many others.
Today, with the surge of remote and in-situ observation platforms and the capacity to integrate disparate data sources (structured and unstructured) into big data, new data-driven approaches like the sub-symbolic branch of Artificial Intelligence (AI), i.e., Machine Learning (ML) and Deep Learning (DL) [28], are attracting much “hype” owing to their high prediction accuracy [31], despite their apparent limitations (transparency and interpretability) [29, 30]. Readily applicable off-the-shelf ML software is also popularizing the use of these tools in hydrology among specialists and non-specialists alike [32]. A quick review of research publications (Mendeley database) over the last decade (Fig 1) shows a rapid, 10-fold increase from 2012–2016 to 2017–2021 in “novel applications” of AI and ML methods in water.
Data from Mendeley: search “water” + “artificial intelligence, machine learning”, 2013–2022.
ML approaches offer strong predictive performance in direct applications to uncertain processes that we do not intend to fully understand but need to estimate precisely, such as controlling devices or acting as predictors within more complex systems under study. Among ML methods, DL, especially when used to learn simultaneously from spatial and temporal patterns, could make a strong impact on hydrological modeling, although its application to hydrology still needs substantial development [32, 33].
While some applications of AI are important contributions, others fail to address important hydrological questions explicitly and often focus only on “black-box” prediction without providing mechanistic insights into the systems studied. In addition to offering improved prediction accuracy compared with mechanistic models [34, 35], these methods can help uncover the underlying principles that govern hydrological systems [32, 36, 37], identify the most important factors contained in a dataset, prepare and impute data [38], emulate computationally costly models [39], or downscale remote sensing products [37].
What are the advantages of this new AI/ML technology compared with existing mechanistic or statistical tools? What are its limitations? When should it be chosen over mechanistic or established methods for the same problem? What uncertainties are associated with these methods, and how should they be dealt with? To address these questions, scientific papers applying AI/ML methods to hydrological research problems should explain why the AI/ML approach is more suitable than a conventional mechanistic model.
The challenges of adopting AI technology in hydrology can be depicted through Gartner’s Hype Cycle [40] (Fig 1). After the initial excitement of the “technology trigger” (AI software availability), followed by the current “hype”, leaving these questions unanswered complicates the choice of the correct tool for specific hydrological problems. This will likely lead to failures, provoke “Disillusionment” (Fig 1), and delay the proper adoption (“Production” in Fig 1) of AI/ML [29] as a useful tool within the extensive statistical and mechanistic “hydrologic tool chest”.
Leveraging the new Earth systems data will help optimize our management of water bodies for sustainability and resilience [31], but this will require overcoming the deficiencies of ML for systems analysis to increase the reliability and interpretability of these applications. Reference [41] noted that multiscale, physics-based modeling and ML approaches interact at both the parameter and system levels. At the parameter level, the interaction assists with “…constraining parameter spaces, identifying parameter values, and analyzing sensitivity.” At the system level, the interaction is beneficial for “…exploiting the underlying physics, constraining design spaces, and identifying system dynamics.” Combining mechanistic and AI/ML tools offers opportunities to improve the efficiency and reduce the uncertainty of mechanistic hydrological models and to support the credibility and robustness of ML predictions [42]. For example, such hybrid models can be constrained to produce physically consistent results [30, 43], allow the exploration of out-of-sample and extreme events, filter out infeasible model calibrations, or benefit from validation and learning schemes [44]. Unfortunately, instead of an integrated approach with a systems focus, maximizing prediction accuracy is often the main objective of AI/ML applications.
The overreliance on data as the underlying paradigm to predict system behavior with non-transparent ML applications is perilous [30, 45]. Reference [46] presented the value of information as a synthetic way of representing the role and limitations of models in science. Drawing on this concept (Fig 2), we illustrate the loss of information, expressed as the percentage of variance (dynamics) captured, from the real system to the final application of a model. It is important to remember that data is at best a small and opportunistic window into the real behavior of many complex systems, particularly those that exhibit non-linear dynamics with infrequent (long-term) but systematic outcomes that data normally does not fully capture (i.e., data is a biased sample of a system’s behavior). Further, a data-driven tool that relies only on the known data incurs an additional loss of information (unexplained variance) imposed by the tool structure (classifier, predictor) and during calibration and testing (Fig 2). Finally, applying the tool to an independent event compounds all these uncertainties, so that at best only a limited amount of the real behavior of the system can be predicted (Fig 2). As discussed later, this requires a careful and formal quantification of these uncertainties and of the sensitivity of the final prediction to them. Indeed, the total uncertainty can render the application irrelevant when it confounds the ability to predict the system [47].
Fraction of variance represents the amount of the real-world system dynamics that each representation (data, model abstraction, and model application) provides. Loss of information in each representation is displayed as sources of error.
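The compounding loss of information illustrated in Fig 2 can be sketched numerically. The minimal Python example below assumes, purely for illustration, that each representation retains a fixed fraction of the variance handed to it by the previous one, so that losses multiply. The stage names follow Fig 2, but the fractions and the helper `variance_cascade` are hypothetical placeholders, not values or methods from the figure.

```python
def variance_cascade(stages):
    """Cumulative fraction of real-system variance retained after each stage.

    Assumes (for illustration only) that each stage keeps a fixed share of
    the variance passed on by the previous stage, so losses compound.
    """
    retained = 1.0
    history = []
    for name, fraction in stages:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for {name!r} must lie in [0, 1]")
        retained *= fraction
        history.append((name, retained))
    return history


# Hypothetical per-stage fractions (not taken from Fig 2)
stages = [
    ("data", 0.6),                 # data sample only part of the real dynamics
    ("model abstraction", 0.8),    # tool structure (classifier, predictor)
    ("calibration/testing", 0.9),  # losses during calibration and testing
    ("application", 0.85),         # independent event outside the record
]
for name, retained in variance_cascade(stages):
    print(f"{name:>20}: {retained:.1%} of real-system variance")
```

With these placeholder numbers the application stage reproduces only about 37% of the real-system variance, even though no single stage loses more than 40%.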
In this work, we first provide a typology of hydrological problems based on the resolution of their spatiotemporal scales and use it to review the data and mechanistic tools commonly used today, the standing challenges in their application, and ways to address these challenges to improve hydrological practice in the context of the emerging and pressing problems we face today. We specifically search for novel cases where AI/ML is used to complement established mechanistic and statistical approaches in hydrology, and identify areas where both can complement each other and augment our current ability to solve hydrological problems. Our review and discussion take a “bird’s-eye view” approach that will necessarily miss many details, but we hope that it will help advance our current view of the challenges faced in hydrology and of the role of emerging AI/ML data-driven approaches, in the context of existing tools, in supporting new solutions.
2. A time-space typology of hydrological problems and applications: Current tools and challenges
We propose a typology of common problems that hydrologists face in engineering research and practice. This allows us to identify the existing tools (mechanistic and statistical) commonly available to address each problem type, along with standing challenges in their application. As the classification criterion we use the typical domain of each hydrological problem type within the time–space continuum (Fig 3 axes), from fine (time t, space s), through intermediate, to coarse (time T) scales. This conceptual layout yields four types of hydrological problems, I to IV, based on the dominant scale at which processes and solutions are considered (Fig 3). With this general classification (Fig 3), we present an overview of key AI/ML, statistical, and other tools currently used for each problem type, together with standing gaps and challenges in their use (Table 1). While many tools are used across problem types, some are of special importance to certain types, and examples are discussed in some detail below.
Time scales range from frequent (t) through intermediate to long (T); spatial scales range from small (s) through intermediate to large. Four quadrants (types I–IV) are identified for discussion.
Examples with references are provided in the text.
Type I encompasses hydrological problems at small temporal scales (t; daily to sub-hourly) and intermediate spatial scales that require early warning and quick response (examples are given in Fig 3). The analysis and solution of these problems require high-frequency data because the system dynamics of interest are fast, while the spatial dynamics can be simplified to a certain extent. Besides analyses aimed at understanding the system, the purpose is often to manage the system so as to optimize a reward function, which can range from protecting lives and property to improving water use efficiency. For this, a main goal is accuracy in determining the state of the system. For Type I, data often need to be collected in (near-)real time using dense sensor networks with frequent readings and observations, because decisions are time-sensitive; here AI/ML approaches are proving useful (e.g., to start irrigation to avoid plant stress and loss of productivity [48], supplement traditional watershed discharge routing techniques [49], manage flood control structures under immediate risk [50, 51], mitigate surface runoff sediment and pesticide transport [52], and support landslide warning systems [53]). Sensor data often require spatial and temporal preprocessing, for which ML methods can be of great use: interpolation [54] (e.g., for rainfall estimates in urban areas [55] and nowcasting [56]), imputation of data missing due to sensor failure or budget constraints [38, 49], and assessment and curation of temporal and spatial outliers [57], which can reveal sensor errors, data corruption, or rare but legitimate phenomena that underline the importance of the system’s non-linear dynamics. In some cases, the system dynamics relevant to the study can be captured adequately by linear models such as autoregressive and dynamic factor models [58–60]; in other cases, the data are too noisy and/or include non-linear dynamics that are relevant to the purpose of the study, and more sophisticated methods are required.
For example, a combination of Bayesian methods and AI can provide a more assumption-free approach that considers multiple uncertainties [61], and coupling deep learning with analog forecasting greatly increases the prediction accuracy for extreme weather patterns [62]. Alternatively, decomposing time series with singular spectrum analysis and similar techniques allows signals to be isolated from “noise”, preparing them for further non-linear time series analysis and causal mapping to untangle hidden system dynamics [63–65]. Recently, for instance, echo-state neural network machine learning was used to simulate sediment rill-bed dynamics reconstructed through non-linear time series analysis, so that morphological development could be forecast out-of-sample [66]. Tackling multi-domain and multi-physics hydrological problems with mechanistic models at these scales remains a challenge, but progress is emerging with AI/ML support [67].
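Singular spectrum analysis, mentioned above as a way to separate signal from noise, can be sketched in a few lines of NumPy. The recipe is to embed the series in a trajectory (Hankel) matrix, keep the leading singular components, and map them back to a series by diagonal averaging. This is a minimal sketch rather than a reference implementation. The window length and the number of retained components are user choices, and the synthetic sine-plus-noise series is an illustrative assumption.

```python
import numpy as np


def ssa_reconstruct(x, window, n_components):
    """Reconstruct a series from its leading singular spectrum components."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 < window < n:
        raise ValueError("window must satisfy 1 < window < len(x)")
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds the lagged window x[j : j + window]
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Rank-r approximation built from the leading singular triples
    r = n_components
    approx = (u[:, :r] * s[:r]) @ vt[:r, :]
    # Diagonal averaging: each series element is the mean of its anti-diagonal
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return recon / counts


# Illustrative synthetic series: a 25-step oscillation plus white noise
rng = np.random.default_rng(0)
t = np.arange(200)
clean = np.sin(2 * np.pi * t / 25)
noisy = clean + 0.3 * rng.standard_normal(t.size)
smoothed = ssa_reconstruct(noisy, window=50, n_components=2)
```

A pure sinusoid occupies two singular components, which is why two are kept here. For a real hydrological record, the number of components would typically be chosen by inspecting the singular values and grouping paired components.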
For Type II problems, the hydrological system is typically analyzed at large spatial and intermediate temporal scales. Type II usually targets processes and outcomes in regional, national, or global scenarios, such as mid-term impact assessments of large-scale land use and climate changes. The data used to address these problems often consist of stacks of spatial layers from diverse sources acquired at different times. This requires guaranteeing the provenance of all data, converting features into vectors, and curating inconsistencies in geo-temporal scales, feature boundaries, object positions across sources, and data quality (e.g., missing data, low resolution). In these cases, correctly classifying land use and cover, or predicting its change, is a major challenge, because land use and cover dynamics at this scale are complex and depend on many disparate natural and human covariates. These challenges can be addressed with ML methods [33, 68, 69]. Further development of DL assimilation of spatial and temporal images within geographical information systems (GIS), and its application to hydrological problems such as estimating the potential impact of sea level rise [70, 71] or assessing regional soil erosion or landslides [72, 73], is promising. Obtaining, combining, and accessing all the necessary information and models in a useful framework is computationally challenging, and powerful solutions like High-Performance Computing (HPC) and internet cloud storage are necessary to process multiple quantitative models with vast amounts of data. Depending on the magnitude of the problem, interpreting results and intra- and extrapolating them can be equally challenging [33]. This is both because pure AI methods are strictly bound to the observed distributions of the data and because of the added difficulty of tracking the importance and uncertainty of different processes in these large data frameworks.
Advances in integrating AI and mechanistic models at this scale would be promising for extrapolating AI results beyond the observed conditions [36, 43]. As discussed in the next Vision/Outlook section, beyond the computational and data integration difficulties, model interpretability, the explainability of the simulated dynamics and emergent behaviors, and the reliability and applicability of such global model-based AI solutions are still in question and must be carefully investigated [74].
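Part of the data curation described for Type II can be sketched as follows: bringing layers of different resolution onto a common grid and converting them into per-cell feature vectors for an ML classifier. The example assumes the layers are already co-registered (same projection and extent) and differ only by an integer resolution factor. In practice, reprojection and resampling would normally be handled by GIS tooling such as GDAL. The helper names and the small arrays are hypothetical.

```python
import numpy as np


def block_mean(raster, factor):
    """Aggregate a fine raster to a coarser grid by averaging factor x factor blocks."""
    rows, cols = raster.shape
    if rows % factor or cols % factor:
        raise ValueError("raster shape must be divisible by factor")
    blocks = raster.reshape(rows // factor, factor, cols // factor, factor)
    # nanmean ignores missing cells inside a block (an all-NaN block stays NaN)
    return np.nanmean(blocks, axis=(1, 3))


def stack_features(layers):
    """Flatten co-registered layers into an (n_cells, n_layers) feature matrix."""
    if len({layer.shape for layer in layers}) != 1:
        raise ValueError("all layers must share the same grid")
    features = np.stack([layer.ravel() for layer in layers], axis=1)
    valid = ~np.isnan(features).any(axis=1)  # cells usable by a classifier
    return features, valid


# Hypothetical layers: 4 x 4 fine-resolution elevation, 2 x 2 coarse rainfall
elevation = np.arange(16, dtype=float).reshape(4, 4)
rainfall = np.array([[10.0, 12.0], [np.nan, 8.0]])
features, valid = stack_features([block_mean(elevation, 2), rainfall])
```

Only cells flagged as valid would be passed to a classifier. How gaps are treated (dropped, imputed, or modelled) is itself a data-quality decision.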
Type III problems involve the study of slow-evolving processes (long temporal scale T) that are difficult to capture fully in commonly available data records. The purpose here is usually to inform policies that maintain system resilience, based on forecasts and projections of extremes, or to improve knowledge by reconstructing the past. In contrast to Type I, where accuracy may be an achievable goal, high uncertainty from unknown and variable future system dynamics under stress typically limits the results to general trends and patterns. In Type III, observations of long-term processes are limited and are often taken during a period of assumed system stationarity, or non-stationarity is misrepresented due to the lack of long-term data. This leads to high uncertainty about the system properties and dynamics. For example, studies for hydromorphological and fluvial habitat conservation [75], studies of the long-term effects of climate change on watershed hydrology [76, 77], or long-term evaluations of aquifer degradation [78, 79] depend strongly on uncertain factors. These include projections of anthropogenic land use and cover change and their interactions with natural processes. To address uncertainties in the system factors, projections are typically studied with scenario analysis. This involves systematic combinations of factors, while simplifying some of the system dynamics, in order to identify extreme events that would lead to system failure and justify adaptation measures that can be very costly [80]. However, scenario assumptions are at times static and oversimplified, which can lead to simplistic assessments. For example, land use and cover are often assumed constant (“business as usual”) while the climate changes, even when the environment providing ecosystem services cannot possibly sustain current land use. More sophisticated approaches that combine hydrological models and ecological principles must then be adopted [81].
On the other hand, uncertainty in the system dynamics can be addressed by combining results from different existing theories (or models) in an ensemble, usually summarized by measures of central tendency and dispersion [82]. The motivation for building ensembles is that, although they reduce the detail of the projections, the main trends are expected to be more robust, improving the credibility of and trust in the results. While ensembles are widely used to address uncertainties in climate forecasting, ensemble methods in hydrology still require further development to produce accurate and reliable forecasts [83].
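An equally weighted ensemble summary of the kind described above might look like the sketch below. The ensemble mean serves as the measure of central tendency, and the across-model standard deviation and percentile range serve as measures of dispersion. Equal weighting, the optional sign-agreement score, and the three hypothetical model projections are assumptions for illustration; weighted schemes are also possible.

```python
import numpy as np


def ensemble_summary(projections, lower=5, upper=95):
    """Summarize model projections shaped (n_models, n_times) across models."""
    p = np.asarray(projections, dtype=float)
    if p.ndim != 2 or p.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two models")
    mean = p.mean(axis=0)  # central tendency (equal weights)
    return {
        "mean": mean,
        "std": p.std(axis=0, ddof=1),  # spread between models
        "low": np.percentile(p, lower, axis=0),
        "high": np.percentile(p, upper, axis=0),
        # share of models agreeing with the sign of the ensemble-mean change
        "sign_agreement": (np.sign(p) == np.sign(mean)).mean(axis=0),
    }


# Hypothetical projected changes in annual streamflow (%) from three models
projections = [
    [-5.0, -8.0, -12.0],
    [-2.0, -6.0, -9.0],
    [1.0, -4.0, -10.0],
]
summary = ensemble_summary(projections)
```

In this toy case the first period has a negative mean change, but only two of the three models agree on its sign. That is the kind of detail an ensemble mean alone hides.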
Finally, in contrast to the large spatial scales of Type II, Type IV narrows the focus to a local (s) scale over an intermediate period of interest. Detailed experiments at this scale remain a keystone of hydrological theory development. They enable novel experimental approaches to study complex processes and produce rich datasets to test new theories and models [84]. Combining indoor (laboratory) and outdoor (field) scales can also open new opportunities to study complex processes in data-rich environments. Examples include the coupling of hydrological, microbiological and geochemical processes for enhanced denitrification [85], coupled surface/subsurface transport in monoliths [86], and preferential flow in the unsaturated zone [87]. Experimental results are also needed to ground the analysis of anthropogenic drivers and their effects on water quantity and quality for risk management purposes [84]. For example, a typical Type IV problem could be “What is the future influence of green infrastructure (GI) facilities at the household, neighborhood, municipality or county scales?” Answering this question involves local and complex relationships among hydrology, ecology, and socio-economics. Access to and management of environmental data at these scales is improving fast with new high-resolution satellite products and ground monitoring networks, but projects at this scale require high data quality and accuracy. Of particular importance are ML applications for downscaling existing coarse-scale remote sensing data of quantities that are difficult to measure in situ, such as those from the Gravity Recovery and Climate Experiment (GRACE) [88, 89]. Also important are efforts to downscale from very coarse scales to support local hydrological applications in data-scarce regions [90]. Another important application of ML at this scale is the estimation of soil properties from available, possibly correlated covariates [91]. This requires new campaigns of careful ground-truthing, particularly in the many data-scarce regions of the world where reliable ground observations are not available.
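Below is a minimal sketch of regression-based downscaling, one simple technique among many. It works in three steps:

1. Fit a linear relation between a coarse target (for example, a GRACE-like anomaly) and a fine-resolution covariate aggregated to the coarse grid.
2. Apply that relation at fine resolution.
3. Add back each coarse cell's residual, so that the fine values average exactly to the coarse observation.

The ML approaches cited above would replace the linear fit with a learned, typically non-linear one. The covariate, the arrays, and the helper names are hypothetical.

```python
import numpy as np


def block_mean(grid, factor):
    """Average factor x factor blocks of a fine grid onto the coarse grid."""
    rows, cols = grid.shape
    return grid.reshape(rows // factor, factor, cols // factor, factor).mean(axis=(1, 3))


def downscale_linear(coarse_target, fine_covariate, factor):
    """Regression downscaling with coarse-residual correction (mass-preserving)."""
    coarse_cov = block_mean(fine_covariate, factor)
    # Linear fit at the coarse scale, where both target and covariate exist
    slope, intercept = np.polyfit(coarse_cov.ravel(), coarse_target.ravel(), deg=1)
    fine_pred = intercept + slope * fine_covariate
    # Spread each coarse residual uniformly over its fine cells
    residual = coarse_target - (intercept + slope * coarse_cov)
    return fine_pred + np.kron(residual, np.ones((factor, factor)))


# Hypothetical inputs: 2 x 2 coarse target and a 4 x 4 fine covariate
rng = np.random.default_rng(1)
fine_covariate = rng.uniform(0.2, 0.8, size=(4, 4))
coarse_target = np.array([[50.0, 60.0], [40.0, 70.0]])
fine_estimate = downscale_linear(coarse_target, fine_covariate, factor=2)
```

The residual correction guarantees that re-aggregating the fine estimate returns the original coarse values. This is a common requirement when downscaled products must stay consistent with the observation they came from.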
ML applications can help in data-scarce scenarios, for example in estimating ecological flows in small basins with discontinued gaging [92]. They can also help deal with local spatial complexity, such as erosion susceptibility [93] or the complexity of household water use [94]. When ecological and socioeconomic information is needed, obtaining detailed data can be costly and subject to high uncertainty. For finer spatial detail, geospatial information can be collected with near-surface remote sensing technologies, such as drones and bots, often complementing coarser satellite remote sensing products [95, 96]. Objects (e.g., buildings) [97] or instances (e.g., flooding [98]) in these images can now be successfully recognized by DL. Geostatistics, including kriging [99] and Bayesian [100] methods, as well as network analysis and ML [101], can help model problems where socioeconomic and ecological information are difficult to integrate into mechanistic models. In statistical and mechanistic modelling, the main challenges lie in identifying related multidisciplinary factors and simulating their interactions. In the GI example above, the hydrological impact of GI on stormwater runoff can be estimated physically. However, the overall benefits depend on interactions among local physical and chemical characteristics, ecology, and socioeconomics [102, 103]. The implications go beyond the quantity and quality of stormwater, affecting land use, property values, and ecological services [103]. The extrapolation or portability of knowledge at this scale is a major challenge, as data and models are often specific to certain conditions and lose reliability when applied to another case study [94, 104, 105]. In the GI example, different localities may value GI effects differently depending on economic and cultural preferences that are not considered in the analysis [106–108].
In some cases, this could be addressed through transdisciplinary studies that identify socioeconomic or ecological similarities supporting frameworks with wider application [51, 109, 110]. However, matching the scales of analysis in transdisciplinary problems can be challenging, as the tools can differ greatly across disciplines. In the GI case, for example, the units of the hydrological study may be slope positions or pixels, while those of the socioeconomic study would be households [108, 111]. Overall, data scarcity and scale matching remain the main challenges at the Type IV scale.
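Ordinary kriging is one of the geostatistical methods mentioned above for data-scarce local problems. It estimates a value at an unsampled location as a weighted sum of observations, with weights obtained from a variogram model and constrained to sum to one. The sketch below assumes an exponential variogram with user-chosen, hypothetical sill and range values and no nugget by default; in practice the variogram would first be fitted to the data. The observations at the corners of a unit square are illustrative.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, corr_range=1.0, nugget=0.0):
    """Exponential variogram model; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-h / corr_range))
    return np.where(h > 0, gamma, 0.0)


def ordinary_kriging(xy, z, xy0, **variogram):
    """Ordinary kriging estimate and kriging variance at a single point xy0."""
    xy, z, xy0 = (np.asarray(a, dtype=float) for a in (xy, z, xy0))
    n = z.size
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    # Kriging system [[Gamma, 1], [1^T, 0]] [w, mu] = [gamma0, 1]
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exponential_variogram(dists, **variogram)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = exponential_variogram(np.linalg.norm(xy - xy0, axis=1), **variogram)
    solution = np.linalg.solve(lhs, rhs)
    weights, mu = solution[:n], solution[n]
    return weights @ z, weights @ rhs[:n] + mu


# Hypothetical observations at the corners of a unit square
xy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
z = [1.0, 2.0, 1.5, 3.0]
estimate, variance = ordinary_kriging(xy, z, [0.5, 0.5], sill=1.0, corr_range=0.5)
```

Because the four observations are symmetric about the centre, they receive equal weights there, and the estimate is their mean (1.875). At a sampled location, the method returns the observation itself with zero kriging variance.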
3. Vision/Outlook
We present a vision for the continued co-development of monitoring tools and mechanistic, statistical, and AI/ML analysis methods to address the challenges identified above for the different hydrological problem types. Solutions to these challenges can be organized around three main topics: 1) data management, 2) insights and knowledge extraction, and 3) modelling structure (Fig 4).
3.1. Data management
Data at unprecedentedly high spatial and temporal resolutions will become increasingly available from dense sensor networks supported by IoT (“Internet-of-Things”) and cloud technologies [112]. With increasing data availability, AI data analysis will become indispensable for formulating new mechanistic knowledge to address growing water challenges from population expansion and land and climate changes [53, 112]. Since the performance of these tools relies on data quality (missing values, outliers, inaccuracy, and low resolution), data Quality Assurance and Quality Control (QAQC) skills are becoming crucial for tackling hydrological problems [113, 114]. Complex data integration and management require multi-faceted approaches using statistical, physical, geographic, or network associations. “Big Data” [115] technologies that automate the integration, curation and mining of disparate data will become particularly important for supplying data to AI/ML and process simulation and analysis tools. As analyses become more exhaustive and move to cloud services, Open Modeling standards [116, 117] and Application Programming Interfaces (API) [118, 119] for accessing data from integrated repositories will become standard hydrological practice. This follows from their ability to transmit data, simplify the execution of functions, and embed calculation models. Significant efforts are already underway to integrate hydrological data and access and discovery protocols, such as those by Google Earth Engine [120–122] and CUAHSI [123] in the USA or INSPIRE [124] in the EU.
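Two of the QAQC steps mentioned above are flagging outliers and filling gaps. Both can be sketched with simple baselines:

- A Hampel filter: a rolling median combined with a scaled median absolute deviation.
- Linear interpolation in time.

These classical statistical techniques are used here as stand-ins for the ML-based QAQC approaches discussed in this article. The window size, threshold, helper names, and water-level series are hypothetical choices.

```python
import numpy as np


def hampel_flags(x, half_window=3, n_sigmas=3.0):
    """Flag points deviating from the rolling median by more than n_sigmas robust SDs."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        if np.isnan(x[i]):
            continue  # missing values are handled separately
        window = x[max(0, i - half_window): i + half_window + 1]
        window = window[~np.isnan(window)]
        median = np.median(window)
        # 1.4826 scales the MAD to a standard deviation for Gaussian data
        mad = 1.4826 * np.median(np.abs(window - median))
        flags[i] = abs(x[i] - median) > n_sigmas * mad
    return flags


def fill_gaps_linear(t, x):
    """Replace NaNs by linear interpolation between the nearest valid samples."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float).copy()
    missing = np.isnan(x)
    x[missing] = np.interp(t[missing], t[~missing], x[~missing])
    return x


# Hypothetical hourly water levels (m) with a spike and a gap
t = np.arange(10)
level = np.array([1.00, 1.02, 1.01, 1.03, 5.00, 1.04, np.nan, 1.06, 1.05, 1.07])
flags = hampel_flags(level)
cleaned = fill_gaps_linear(t, np.where(flags, np.nan, level))
```

Here the 5.00 m spike is flagged and replaced, as is the missing value. In a real workflow, flagged points would be reviewed rather than discarded automatically, because they may be rare but legitimate events. Note also that `np.interp` holds the end values constant, so gaps at the ends of a record are not truly interpolated.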
3.2. Insights and knowledge extraction
Because Big Data applications in water resources entail high management and computation costs [125, 126], fast spatial data mining will be increasingly essential in determining the efficiency and feasibility of data-centric solutions [127, 128]. Knowledge Graphs (KG) derived from semantic analysis to depict the relationship between two entities offer great promise [129]. Instead of shrinking the size of data, KGs transform the insights derived from data through analysis and aggregation into an information-rich network graph structure [130–132]. For example, analysis of data from a flooding event could lead to the conclusion that “a 38 mm two-hour precipitation will trigger flooding in a neighborhood”. This can be converted into the KG “precipitation → floods → neighborhood”, in which precipitation and neighborhood at the ends are the entities of interest and floods connects them. Any KG element, like precipitation, may also have attributes (e.g., 38 mm total depth and 2-hour duration). Such a graph structure provides more concise and understandable information than raw data. In addition, multiple KGs with common entities can be linked into a network or a set of interrelated networks [133], namely a knowledge base (KB) or canonical database (CD). Network analysis of such a database can then mine deeper knowledge that was not previously visible in raw data. KGs can also integrate “traditional knowledge” (customary or cultural) from unstructured historical data, such as formal and informal reports of historical extreme events from similar locations. Combining this knowledge with rules and forecasting theories will help the design for system adaptation and resilience [134–136]. This traditional knowledge of hydrological problems and applications is mainly stored as text in books, newspapers, magazines, and journal articles, and increasingly in image and sound recordings, all of which are unstructured for storage and analysis [133].
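The “precipitation → floods → neighborhood” example can be sketched as a tiny in-memory knowledge graph. Entities carry attributes, relations are labelled edges, and graphs that share an entity are linked by merging them. This is an illustrative data structure, not a specific KG library or standard such as RDF. The second graph, with its “served_by” relation and “pumping station” entity, is hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal directed knowledge graph: attributed entities linked by labelled relations."""

    def __init__(self):
        self.attributes = defaultdict(dict)  # entity -> {attribute: value}
        self.edges = defaultdict(set)        # head entity -> {(relation, tail entity)}

    def add_entity(self, name, **attrs):
        self.attributes[name].update(attrs)

    def add_triple(self, head, relation, tail):
        self.add_entity(head)
        self.add_entity(tail)
        self.edges[head].add((relation, tail))

    def merge(self, other):
        """Link another graph through shared entities (same name = same entity)."""
        for name, attrs in other.attributes.items():
            self.add_entity(name, **attrs)
        for head, links in other.edges.items():
            self.edges[head] |= links

    def paths(self, start, goal, max_hops=4):
        """Depth-first search for relation chains from start to goal without revisits."""
        found, stack = [], [[start]]
        while stack:
            path = stack.pop()
            node = path[-1]
            if node == goal and len(path) > 1:
                found.append(path)
                continue
            if (len(path) - 1) // 2 >= max_hops:
                continue
            for relation, nxt in sorted(self.edges.get(node, ())):
                if nxt not in path[::2]:  # entities sit at even positions
                    stack.append(path + [relation, nxt])
        return found


# Event graph from the text: precipitation --floods--> neighborhood
event = KnowledgeGraph()
event.add_entity("precipitation", depth_mm=38, duration_h=2)
event.add_triple("precipitation", "floods", "neighborhood")

# Hypothetical infrastructure graph sharing the entity "neighborhood"
infrastructure = KnowledgeGraph()
infrastructure.add_triple("neighborhood", "served_by", "pumping station")

event.merge(infrastructure)
chains = event.paths("precipitation", "pumping station")
```

After merging, the search finds a chain from precipitation to the pumping station that neither graph contained on its own. This is a small instance of the “deeper knowledge” obtained by linking graphs through common entities.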
While manual reconstruction of historical records is possible (e.g., weather records in ancient China [137, 138]), it is often impractical. Off-the-shelf Natural Language Processing (NLP) software and packages, such as BERT [139], provide a crucial first step for hydrologists to extract this traditional knowledge and essential information from other domains. Knowledge mining from CDs containing structured and unstructured sources will become of foremost importance for improving the reliability of hydrological hazard projections [140] and extreme event predictions [141]. Informing hydrological theories with "traditional" or "Indigenous" local knowledge obtained from other fields (e.g., geoarcheology, historical texts, and pollen records) will help guide model building under non-stationary conditions and inform effective mitigation and adaptation practices relevant to local communities [142, 143].
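BERT-style models need trained weights and are beyond a short example, so the sketch below swaps in a deliberately simple rule-based extractor (regular expressions) to show the kind of output NLP is expected to deliver: turning a sentence from a historical report into structured attributes such as the total depth and duration in the precipitation example above. The patterns, the small vocabulary of spelled-out numbers and the function name `extract_event` are assumptions for illustration; a learned model would handle far more phrasings and languages.

```python
import re

# Hypothetical, deliberately small vocabulary of spelled-out durations.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "twelve": 12, "twenty-four": 24}
MM_PER_INCH = 25.4

DEPTH = re.compile(r"\b(\d+(?:\.\d+)?)\s*(mm|millimet(?:er|re)s?|inch(?:es)?)\b", re.I)
DURATION = re.compile(
    r"\b(\d+(?:\.\d+)?|" + "|".join(NUMBER_WORDS) + r")[\s-]*(hour|hr|day)s?\b", re.I)
FLOOD = re.compile(r"\bflood(?:s|ed|ing)?\b", re.I)


def extract_event(sentence):
    """Return {'depth_mm', 'duration_h'} if the sentence reports rain linked to flooding, else None."""
    depth, duration = DEPTH.search(sentence), DURATION.search(sentence)
    if not (depth and duration and FLOOD.search(sentence)):
        return None
    value, unit = float(depth.group(1)), depth.group(2).lower()
    depth_mm = value * MM_PER_INCH if unit.startswith("inch") else value
    amount, span = duration.group(1).lower(), duration.group(2).lower()
    hours = float(NUMBER_WORDS[amount]) if amount in NUMBER_WORDS else float(amount)
    if span == "day":
        hours *= 24  # convert days to hours
    return {"depth_mm": depth_mm, "duration_h": hours}
```

Applied to "On 12 June, 38 mm of rain fell in two hours and the lower town flooded.", it returns `{'depth_mm': 38.0, 'duration_h': 2.0}`; sentences without a flood mention return `None`.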
3.3. Structure of modelling
The emerging global challenges posed by increasing water demand and accelerating climate and land use change require transdisciplinary solutions. Multi-domain and multi-physics theories and knowledge must be merged in a "circular systems" framework where waste is eliminated and resources are circulated at different scales [144]. Stacking models [145, 146], together with their supporting Big Data from different fields, into "Big Models" capable of simulating the responses of Earth's socio-environmental systems and their controllers [147] will attract increasing attention. Such models require large investments to compile the diverse information needed to define a wide range of relevant system conditions, as well as powerful computational and storage capacity to support them. Big Model approaches can also suffer from limited open availability of data and modeling components due to security protocols, commercial licensing, or incompatibility. Because of the high financial and computational costs involved, examples of Big Models are rare, with only a few industrial examples from some IT giants, such as IBM with its Environmental Intelligence Suite [148] and Global High-Resolution Atmospheric Forecasting System [149]. Moreover, model stacking across multiple disciplines requires a design based on a solid understanding of the overarching system dynamics (physical, social, and ecological dependencies) and of the component interactions and feedbacks [150]. Correct diagnosis of the linear and non-linear dynamics embedded in time series and spatial data is a critical and often overlooked step in building models that mimic the systems studied, particularly their extreme and emerging hydrological responses. Causal mapping diagnostic tools from other disciplines [64, 151] should be used regularly to diagnose the strength of low-dimensional endogenous non-linear dynamics embedded in hydrological system data, and their results applied to inform management decisions [63, 65, 152, 153].
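One family of causal mapping diagnostics is convergent cross mapping (CCM), the approach of Sugihara et al. [64]: if a variable X drives Y, the history of X is encoded in Y, so X can be estimated from a delay-embedded "shadow manifold" of Y, and that cross-map skill improves as the library of Y states grows. The sketch below is a simplified, pure-Python version written for illustration, assuming an embedding dimension `E`, a lag `tau`, exponential neighbour weights and Pearson's correlation as the skill score; it omits the significance and surrogate testing a real diagnosis would need, and all function names are our own.

```python
import math


def logistic_map(n, r=3.8, x0=0.4):
    """Chaotic test series x[t+1] = r * x[t] * (1 - x[t])."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def shadow_manifold(series, E, tau=1):
    """Delay vectors (y[t], y[t-tau], ..., y[t-(E-1)tau]) keyed by time index t."""
    start = (E - 1) * tau
    return {t: tuple(series[t - j * tau] for j in range(E))
            for t in range(start, len(series))}


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cross_map_skill(x, y, E=2, tau=1, library_size=None):
    """Skill (Pearson rho) of estimating x from the shadow manifold of y.

    In CCM, high skill that increases with library_size suggests that x drives y.
    """
    manifold = shadow_manifold(y, E, tau)
    times = sorted(manifold)
    library = times if library_size is None else times[:library_size]
    estimates, observed = [], []
    for t in times:
        # E + 1 nearest library neighbours of the state at time t, excluding t itself
        neighbours = sorted(
            (math.dist(manifold[t], manifold[s]), s) for s in library if s != t
        )[: E + 1]
        d_min = neighbours[0][0] or 1e-12  # guard against a zero nearest distance
        weights = [math.exp(-d / d_min) for d, _ in neighbours]
        estimate = sum(w * x[s] for w, (_, s) in zip(weights, neighbours)) / sum(weights)
        estimates.append(estimate)
        observed.append(x[t])
    return pearson(estimates, observed)
```

A quick check is to cross map a chaotic logistic-map series from a one-step-delayed copy of itself, which yields high skill, while an unrelated random series yields skill near zero. Repeating the calculation for increasing `library_size` values and looking for convergence is what distinguishes CCM from a plain correlation.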
Close partnerships between academia and industry can ensure that the best and most dynamic knowledge is incorporated into these efforts, but these collaborations first need a business model that also aligns with the public mission of academia.
While these ambitious projects could give policy makers important insights, the resulting Big Models have many sources of uncertainty and bias that will require formal and exhaustive reliability testing before they are used to inform public policy and management. Because Big Models can suffer from low explainability and interpretability, they will require novel testing tools and frameworks. Traditional statistical model selection criteria (the Akaike and Bayesian Information Criteria), or accuracy, precision, and recall measures derived from error and confusion matrices, depend on the representativeness of the data and are not reliable for judging projections or social impacts, especially in the Type III and IV hydrological problems presented earlier. For time-consuming Big Modelling approaches, emulating outputs (or components) with metamodels (response surfaces fitted to many simulations) will be a necessary trade-off. While this adds a new source of uncertainty (the metamodel error), it can greatly reduce the computational cost of applying the model and of testing its reliability with tools such as global sensitivity and uncertainty analysis (e.g., [154–156]). Parsimony must remain an important principle for theory and mechanistic model development. Computationally efficient dimension-reduction (factor importance) techniques, particularly those from global sensitivity analysis based on high-dimensional variance decomposition [20, 157, 158], will be critical in this context [52, 159, 160].
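Variance-based global sensitivity analysis can be sketched in a few lines. The example estimates first-order and total Sobol indices with the Saltelli (2010) and Jansen (1999) Monte Carlo estimators on the Ishigami function, a standard test function with known analytical indices (first-order values of about 0.31, 0.44 and 0 for its three inputs). It is an illustrative sketch using plain random sampling and an assumed sample size, with i.i.d. inputs drawn by a user-supplied `sampler`; a metamodel fitted to Big Model runs could be passed as `model` in place of the expensive simulator, which is where the cost savings discussed above would arise.

```python
import math
import random


def ishigami(x, a=7.0, b=0.1):
    """Standard sensitivity-analysis benchmark; inputs uniform on [-pi, pi]."""
    x1, x2, x3 = x
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)


def sobol_indices(model, k, n, sampler, seed=0):
    """First-order (Saltelli 2010) and total (Jansen 1999) Sobol indices.

    Requires n * (k + 2) model runs; sampler(rng) draws one input value.
    """
    rng = random.Random(seed)
    A = [[sampler(rng) for _ in range(k)] for _ in range(n)]
    B = [[sampler(rng) for _ in range(k)] for _ in range(n)]
    fA = [model(row) for row in A]
    fB = [model(row) for row in B]
    both = fA + fB
    mean = sum(both) / len(both)
    var = sum((f - mean) ** 2 for f in both) / len(both)  # total output variance
    first, total = [], []
    for i in range(k):
        # Matrix A with its i-th column taken from B
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        first.append(sum(fb * (fab - fa) for fa, fb, fab in zip(fA, fB, fABi)) / n / var)
        total.append(sum((fa - fab) ** 2 for fa, fab in zip(fA, fABi)) / (2 * n) / var)
    return first, total
```

A factor whose total index is close to zero can be fixed, which is how such analyses reduce model dimension. In the Ishigami case, x3 illustrates why total indices matter: its first-order index is near zero, yet its total index (about 0.24) is sizable because it acts only through its interaction with x1.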
Finding agreement between "black box" and transparent models should not be disregarded in favor of explaining the former [30], as both approaches should at least point to similar important factors. For such comparisons, model-agnostic factor importance measures are particularly useful. Further development of the field of explainable AI (XAI) [161–163] is a prerequisite for AI approaches to be competitive with, and complementary to, conventional mechanistic models in hydrology. Recent XAI methods such as SHAP [164], LIME [165], or ALE [166] can improve global and local model transparency. These methods have been applied to hydrological problems to uncover local and global dependencies of evapotranspiration on hydro-climatic variables [162] and to identify the catchment properties controlling hydrological partitioning [167]. As new complexity arises from climate change, fast-paced urban development, and interconnected economies, we can expect past data collected under predominantly stationary conditions to become less useful. AI/ML models generally become more accurate as more data become available for training and calibration, which is necessary for them to be trusted. Climate change demands a revision of how models are evaluated. Conventional hydrological modeling practice is to train and test models on past data; under accelerating non-stationary conditions, this practice can produce underperforming models. [168] recommended using all data for training after estimating model performance, in which case the estimated performance will likely be an underestimate. This is even more relevant for AI/ML models, as their predictions generally cannot be supported by physical understanding of the system. Recently, attention has been drawn to the use of aggregate metrics (as system inputs and outputs) in AI training and testing, particularly in high-stakes problems [169].
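Permutation importance is one simple model-agnostic factor importance measure (it is not SHAP, LIME or ALE): shuffle one input column, and the resulting increase in prediction error indicates how much the model relies on that input. Because it only needs a prediction function, the same routine can be applied to a black-box and to a transparent model so that their factor rankings can be compared. The sketch assumes mean squared error as the score and a fixed number of shuffles; the function names are hypothetical.

```python
import random


def mse(predict, X, y):
    """Mean squared error of a prediction function over the rows of X."""
    return sum((predict(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, repeats=10, seed=0):
    """Mean increase in MSE when each input column is shuffled.

    predict: any callable mapping one row (a list of inputs) to a prediction,
    so black-box and transparent models can be scored in exactly the same way.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between input j and the target
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            increases.append(mse(predict, shuffled, y) - baseline)
        scores.append(sum(increases) / repeats)
    return scores
```

For instance, with synthetic data generated as y = 3·x0 + 0.5·x1² + noise, a flexible model and a linear model both rank x0 first, while an input that neither model uses scores exactly zero.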
Aggregation, although convenient, can hinder evaluation of the true performance of an AI model and further complicates the interpretability of the hydrological system it represents. To address this issue, these authors [169] recommend reporting "instance-by-instance" evaluation results (i.e., the outputs and scores of these systems for each instance) and making them publicly available. Thus, the extrapolation and portability of raw and aggregated data, AI knowledge, and models need to be considered carefully before high-stakes mitigation and adaptation efforts are developed under accelerating non-stationary hydrological conditions [32], particularly for large-scale Type II problems.
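The instance-by-instance reporting recommended in [169] can be illustrated with a small sketch that computes an aggregate score (here the Nash-Sutcliffe efficiency, NSE, which is our choice of metric) alongside a per-instance table exported as CSV for public release. The field names and event identifiers are hypothetical.

```python
import csv
import io


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def instance_report(ids, observed, simulated):
    """Per-instance results plus a CSV string suitable for publication."""
    rows = [{"id": i, "observed": o, "simulated": s, "abs_error": abs(o - s)}
            for i, o, s in zip(ids, observed, simulated)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "observed", "simulated", "abs_error"])
    writer.writeheader()
    writer.writerows(rows)
    return rows, buffer.getvalue()
```

With eight illustrative event flows in which the model underestimates the single large peak (80 observed versus 55 simulated), the aggregate NSE is still about 0.85, yet the instance table makes the 25-unit miss on the most consequential event immediately visible.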
The complementary role of ML and mechanistic modeling approaches was succinctly outlined by [41] for the medical field, and these insights are directly transferable to the hydrological sciences. When possible, ML and mechanistic modeling should be used in tandem [30, 33]. For example, when a hydrological problem is data-rich, ML can help identify correlations, quantify uncertainty, explore design spaces, and identify system dynamics [32, 37, 52]. Such information can then be integrated into mechanistic, process-based models to further analyze sensitivity, constrain design spaces, and predict system dynamics. Conversely, when only a theoretical system framework with limited data exists, ML can be used to generate supplemental synthetic training data, identify parameter values, and analyze sensitivity [33, 37]. Such information can feed explicitly into multi-scale modeling to constrain parameter spaces, identify and explore process interactions, and analyze sensitivity more closely (e.g., a recent study on learning how water freezes [170]). Applications of current ML techniques in hydrologic science and engineering research often stop at the implementation of ML without generating new insights into hydrological processes. Instead, further tandem integration with mechanistic modeling is required to advance our understanding of the underlying processes, design spaces, and influential parameters. Importantly, ongoing developments and applications like those reviewed in this paper support AI/ML as an inductive and exploratory tool for developing new theories that address standing gaps in hydrological data management, modeling structure, and knowledge extraction in complex changing systems [33, 171].
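A minimal sketch of using ML and mechanistic modeling in tandem: a linear-reservoir model (storage S, outflow Q = kS) provides the process-based simulation, a data-driven model is fitted to its residuals, and a clear relation between the residuals and an external driver hints at a process missing from the mechanistic structure. For brevity, the "learner" here is ordinary least squares; a random forest or neural network could take its place. The reservoir constant k = 0.2, the temperature driver and all function names are assumptions for illustration.

```python
import math


def linear_reservoir(precip, k=0.2, storage=0.0):
    """Mechanistic part: storage gains P and drains Q = k * S at each (daily) step."""
    flows = []
    for p in precip:
        storage += p
        q = k * storage
        storage -= q
        flows.append(q)
    return flows


def fit_residual_model(feature, residual):
    """Data-driven part (plain least squares here): residual ~ a + b * feature."""
    n = len(feature)
    mx, mr = sum(feature) / n, sum(residual) / n
    sxx = sum((x - mx) ** 2 for x in feature)
    sxr = sum((x - mx) * (r - mr) for x, r in zip(feature, residual))
    b = sxr / sxx
    return mr - b * mx, b


def hybrid_flows(precip, feature, coeffs, k=0.2):
    """Mechanistic simulation plus the learned residual correction."""
    a, b = coeffs
    return [q + a + b * x for q, x in zip(linear_reservoir(precip, k), feature)]


def rmse(observed, simulated):
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed))
```

In a synthetic test where the observed flow contains a hypothetical temperature-dependent contribution absent from the reservoir model, the fitted slope recovers that dependence and the hybrid simulation has a much lower RMSE than the reservoir alone. The residual relation is the insight to carry back into the mechanistic model, rather than keeping the correction as a black-box add-on.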
4. Concluding remarks: Integration and ethical considerations
Currently, many AI/ML hydrological applications fail to offer strong hypotheses for understanding the problem and advancing hydrological science. An additional challenge is that, despite pressing calls for data sharing and open access in research and technical publications, this practice is not widely adopted or required, and the original data from many studies are not readily available even in data-rich areas, hindering the development and testing of integrated solutions. To address this critical issue, submission of AI research and technical work to reputable journals should require supplementary materials with detailed analyses (e.g., full datasets and curation steps, visualization and description of the data used and the distribution of residuals, actual processing scripts or software code, and access to the programs used with input and output examples). This will help increase the reliability of hydrological solutions and address the widespread mistrust of science in society today [172].
Most AI research and applications in hydrology are devoted to the learning component of AI, where the development of algorithms for prediction has advanced greatly. However, [173] first defined AI as "the science and engineering of making intelligent machines", in which learning is only one component. The combination of symbolic AI (logical rules) and sub-symbolic AI (ML) in XAI may help advance this topic [174]. To this end, we offer an integrated vision for AI in hydrological sciences and engineering, which we call the "AI hut". AI-based solutions require the circular integration (the "AI hut", Fig 5) of L: Learning (sensonics, big data, pattern recognition/analysis, and prediction); D: Decision and risk analytics; and M: autonomous Machine response (the technological implementation in hardware). As in a hut, the floor and walls are as important as the roof, and together they complete the structure. The hydrological sciences and engineering community has isolated expertise in all these components and can integrate them effectively.
The hut requires the transdisciplinary integration of learning (L), decision analytics (D), and autonomous machine response (M) disciplines and experts, where the AI learning component must be balanced by the other components (inspired by the original definition of AI by McCarthy et al. (1955)).
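To make the circular L → D → M integration concrete, the toy loop below couples a learning step (an exponentially smoothed inflow forecast), a decision step (a release target that keeps expected storage below a risk threshold) and a machine step (a gate moved toward that target within a rate limit), with each new observation closing the loop. Everything in it is hypothetical: the reservoir, the smoothing constant, the 80% safety threshold and the gate rate limit are illustrative assumptions, not an implementation of the AI hut framework.

```python
from dataclasses import dataclass


@dataclass
class HutState:
    storage: float     # water currently stored (hypothetical units)
    forecast: float    # learned estimate of the next inflow
    gate: float = 0.0  # current gate setting, expressed as a release rate


def learn(state, inflow, alpha=0.3):
    """L: update the inflow forecast by exponential smoothing of the new observation."""
    state.forecast = alpha * inflow + (1 - alpha) * state.forecast


def decide(state, capacity, safety=0.8):
    """D: release target that keeps expected storage at or below safety * capacity."""
    expected = state.storage + state.forecast
    return max(0.0, expected - safety * capacity)


def actuate(state, target, max_step=5.0):
    """M: move the gate toward the target, limited to max_step per time step."""
    state.gate += max(-max_step, min(max_step, target - state.gate))
    return state.gate


def run_hut(inflows, capacity=100.0, storage=50.0):
    """Run the L -> D -> M loop; returns (storage, release) after each step."""
    state = HutState(storage=storage, forecast=inflows[0])
    history = []
    for inflow in inflows:
        learn(state, inflow)
        gate = actuate(state, decide(state, capacity))
        release = min(gate, state.storage + inflow)  # cannot release more than is available
        state.storage += inflow - release
        history.append((round(state.storage, 2), round(release, 2)))
    return history
```

With a constant inflow of 10 units per step, storage peaks at 85 units while the rate-limited gate catches up, then settles at the 80-unit threshold with releases matching the inflow.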
Based on our review of hydrological applications of AI/ML and complementary tools, we envision three main development topics: data management, insights and knowledge extraction, and modelling structure. Advances in these topics will help bridge the standing gap between the current state of knowledge and the state of practice in hydrology. Bridging this gap has traditionally been slow: innovations sometimes take decades to become state of practice, and some are never adopted. Possible explanations include the scarcity of standards and of international organization (e.g., there are UN agencies such as the WMO and WHO, but no organization dedicated to water). Water management is often devolved to local/regional levels with little coordination, where most problems are perceived as engineering application problems that do not require much science or development. While the typically slow adoption of the state of knowledge ("good things take time") might be acceptable during long stationary phases, adapting to the expected accelerated rate of change in the remainder of the 21st century requires quick implementation of the best available knowledge at any time. Quick adoption will require forging proactive transdisciplinary alliances across academic communities of knowledge (engineering, earth systems, computational, statistical, and socioeconomic disciplines) and of practice (industry, government, consultants, NGOs). As water is generally regarded as a common good, the public sector and institutions must play a proactive role through funding, incentives, regulation, and the creation of mission-driven and flexible advisory organizations that ensure the systems integration of expertise to evaluate and adopt solutions at all scales based on the best science available at every moment.
A final critical topic is that of the important ethical, legal, social, and economic (ELSE) implications of AI/ML adoption, not only in hydrology but in all disciplinary fields [175, 176]. For example, in agriculture [177] there is a pressing call for the responsible development and use of AI so that it does not widen economic gaps between wealthy and underserved stakeholders. Recently, [178] reviewed highly cited AI publications on ELSE implications published between 1991 and 2020 and found a widening gap between Europe and North America (80% of publications) and Africa and South America (<4% of publications). The authors called for a multidisciplinary approach to gain trust in AI/ML and to improve adoption practices by end users. In the end, we face the reality that AI-based products and machines do not think like humans, are not subject to human ethical restraints, and may make raw predictions that we cannot fully explain [179].
As AI solutions take a bigger decision-making role in high-stakes problems, we call for the creation of transdisciplinary teams from the learning, decision analytics and machine implementation communities that can develop this vision while carefully addressing transparency, privacy, access, and equity challenges.
References
- 1. Eagleson PS. Hydrologic science: A distinct geoscience. Reviews of Geophysics. 1991;29(2):237–48.
- 2. Peters-Lidard CD, Hossain F, Leung LR, McDowell N, Rodell M, Tapiador FJ, et al. 100 Years of Progress in Hydrology. Meteorological Monographs. 2018;59:25.1–25.51.
- 3. National Research Council. Opportunities in the Hydrologic Sciences. Washington, DC: The National Academies Press; 1991. 371 p.
- 4. Strangeways I. A history of rain gauges. Weather. 2010;65(5):133–8. WOS:000277555300006.
- 5. Pfister L, Savenije HHG, Fenicia F. Leonardo Da Vinci’s water theory: on the origin and fate of water. Wallingford, UK: IAHS Press; 2009.
- 6. Torricelli E. De motu Proiectorum (Liber Secundus). Florence; 1644.
- 7. Pascal B. Traités de l’équilibre des liqueurs et de la pesanteur de la masse de l’air. Contenant l’explication des causes de divers effets de la nature qui n’avaient point été bien connus jusques ici, et particulièrement de ceux que l’on avait attribués à l’horreur du vide. 1st ed. Paris; 1663.
- 8. Chézy A. Formule pour trouver la vitesse de l’eau conduite dans une rigole donnée. Paris: École des Ponts et Chaussées; 1776.
- 9. Manning R. On the flow of water in open channels and pipes. Transactions of the Institution of Civil Engineers of Ireland. 1891;20:161–207.
- 10. de Saint-Venant B. Théorie du mouvement non permanent des eaux, avec application aux crues des rivières et à l’introduction des marées dans leurs lits. Comptes Rendus de l’Académie des Sciences. 1871;73:147–54, 237–40.
- 11. Chow VT, Maidment DR, Mays LW. Applied Hydrology: Tata McGraw-Hill Education; 2010.
- 12. Korzoun V. World water balance and water resources of the earth. Paris: UNESCO Press; 1978.
- 13. Crawford NH, Burges SJ. History of the Stanford watershed model. Water Resources Impact. 2004;6(2):3–6.
- 14. Garrido-Baserba M, Corominas L, Cortés U, Rosso D, Poch M. The Fourth-Revolution in the Water Sector Encounters the Digital Revolution. Environmental Science & Technology. 2020;54(8):4698–705. pmid:32154710
- 15. Fox GA. Process-based design strengthens the analysis of stream and floodplain systems under a changing climate. 2019.
- 16. Maidment DR. Handbook of Hydrology. NY: McGraw-Hill Education; 1993.
- 17. Matheron G. La théorie des variables régionalisées et ses applications. Fascicule 5. Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau Paris: École Nationale Supérieure des Mines. 1970.
- 18. Cressie NAC. Statistics for spatial data. New York; 1993.
- 19. Goovaerts P. Geostatistics for natural resources evaluation: Oxford University Press on Demand; 1997.
- 20. Cukier RI, Levine HB, Shuler KE. Nonlinear sensitivity analysis of multiparameter model systems. Journal of Computational Physics. 1978;26(1):1–42.
- 21. Ajami N, Duan Q, Sorooshian S. An integrated hydrologic Bayesian multimodel combination framework: Confronting input, parameter, and model structural uncertainty in hydrologic prediction. Water Resources Research. 2007;43(1). WOS:000243532500001.
- 22. Schaake JC, Hamill TM, Buizza R, Clark M. HEPEX: The Hydrological Ensemble Prediction Experiment. Bulletin of the American Meteorological Society. 2007;88(10):1541–8.
- 23. Dion P, Martel J-L, Arsenault R. Hydrological ensemble forecasting using a multi-model framework. Journal of Hydrology. 2021;600:126537.
- 24. Shin S, Her Y, Muñoz-Carpena R, Khare YP. Multi-parameter approaches for improved ensemble prediction accuracy in hydrology and water quality modeling. Journal of Hydrology. 2023:129458.
- 25. Ritter A, Muñoz-Carpena R. Performance evaluation of hydrological models: Statistical significance for reducing subjectivity in goodness-of-fit assessments. Journal of Hydrology. 2013;480:33–45. WOS:000315008300004.
- 26. Pfister L, Kirchner JW. Debates—Hypothesis testing in hydrology: Theory and practice. Water Resources Research. 2017;53(3):1792–8.
- 27. Beven KJ. On hypothesis testing in hydrology: Why falsification of models is still a really good idea. WIREs Water. 2018;5(3):e1278.
- 28. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. pmid:26017442
- 29. Mitchell M. Why AI is harder than we think. arXiv preprint arXiv:2104.12871. 2021.
- 30. Rudin C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature machine intelligence. 2019;1(5):206–15. Epub 2019/05/13. pmid:35603010.
- 31. Sit M, Demiray BZ, Xiang Z, Ewing GJ, Sermet Y, Demir I. A comprehensive review of deep learning applications in hydrology and water resources. Water Science and Technology. 2020;82(12):2635–70. pmid:33341760
- 32. Nearing GS, Kratzert F, Sampson AK, Pelissier CS, Klotz D, Frame JM, et al. What Role Does Hydrological Science Play in the Age of Machine Learning? Water Resources Research. 2021;57(3):e2020WR028091.
- 33. Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, et al. Deep learning and process understanding for data-driven Earth system science. Nature. 2019;566(7743):195–204. pmid:30760912
- 34. Kim T, Yang T, Gao S, Zhang L, Ding Z, Wen X, et al. Can artificial intelligence and data-driven machine learning models match or even replace process-driven hydrologic models for streamflow simulation?: A case study of four watersheds with different hydro-climatic regions across the CONUS. Journal of Hydrology. 2021;598:126423.
- 35. Adnan RM, Petroselli A, Heddam S, Santos CAG, Kisi O. Comparison of different methodologies for rainfall–runoff modeling: machine learning vs conceptual approach. Natural Hazards. 2021;105(3):2987–3011.
- 36. Chang H, Zhang D. Machine learning subsurface flow equations from data. Computational Geosciences. 2019;23(5):895–910.
- 37. Bortnik J, Camporeale E. Ten Ways to Apply Machine Learning in Earth and Space Sciences. Eos. 2021;102.
- 38. Stekhoven DJ, Bühlmann P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 2011;28(1):112–8. pmid:22039212
- 39. Maxwell RM, Condon LE, Melchior P. A Physics-Informed, Machine Learning Emulator of a 2D Surface Water Model: What Temporal Networks and Simulation-Based Inference Can Help Us Learn about Hydrologic Processes. Water. 2021;13(24):3633.
- 40. Fenn J, Raskino M. Mastering the hype cycle: how to choose the right innovation at the right time. Boston, Mass.: Harvard Business Press; 2008. xviii, 237 p.
- 41. Alber M, Buganza Tepole A, Cannon WR, De S, Dura-Bernal S, Garikipati K, et al. Integrating machine learning and multiscale modeling—perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. NPJ digital medicine. 2019;2(1):1–11. pmid:31799423
- 42. Zhang J, Petersen SD, Radivojevic T, Ramirez A, Pérez-Manríquez A, Abeliuk E, et al. Combining mechanistic and machine learning models for predictive engineering and optimization of tryptophan metabolism. Nature Communications. 2020;11(1):4880. pmid:32978375
- 43. Höge M, Scheidegger A, Baity-Jesi M, Albert C, Fenicia F. Improving hydrologic models for predictions and process understanding using neural ODEs. Hydrology and Earth System Sciences. 2022;26(19):5085–102.
- 44. Irrgang C, Boers N, Sonnewald M, Barnes EA, Kadow C, Staneva J, et al. Towards neural Earth system modelling by integrating artificial intelligence in Earth system science. Nature Machine Intelligence. 2021;3(8):667–74.
- 45. O’Neil C. Weapons of math destruction: how big data increases inequality and threatens democracy. 1st ed. New York: Crown; 2016. x, 259 p.
- 46. Nearing GS, Gupta HV. The quantity and quality of information in hydrologic models. Water Resources Research. 2015;51(1):524–38.
- 47. Muller S, Muñoz-Carpena R, Kiker G, editors. Model Relevance. NATO Science for Peace and Security Series C: Environmental Security; 2011; Dordrecht: Springer Netherlands.
- 48. Talaviya T, Shah D, Patel N, Yagnik H, Shah M. Implementation of artificial intelligence in agriculture for optimisation of irrigation and application of pesticides and herbicides. Artificial Intelligence in Agriculture. 2020;4:58–73.
- 49. Khatibi R, Ghorbani MA, Kashani MH, Kisi O. Comparison of three artificial intelligence techniques for discharge routing. Journal of Hydrology. 2011;403(3–4):201–12.
- 50. Demir I, Yildirim E, Sermet Y, Sit MA. FLOODSS: Iowa flood information system as a generalized flood cyberinfrastructure. International Journal of River Basin Management. 2017;16(3):393–400.
- 51. Liu Y, Qin H, Zhang Z, Yao L, Wang Y, Li J, et al. Deriving reservoir operation rule based on Bayesian deep learning method considering multiple uncertainties. Journal of Hydrology. 2019;579:124207.
- 52. Reichenberger S, Sur R, Sittig S, Multsch S, Carmona-Cabrero Á, López JJ, et al. Dynamic prediction of effective runoff sediment particle size for improved assessment of erosion mitigation efficiency with vegetative filter strips. Science of The Total Environment. 2023;857:159572. pmid:36272479
- 53. Piciullo L, Capobianco V, Heyerdahl H. A first step towards a IoT-based local early warning system for an unsaturated slope in Norway. Natural Hazards. 2022;114(3):3377–407.
- 54. Keller JM, Gray MR, Givens JA. A fuzzy K-nearest neighbor algorithm. IEEE Transactions on Systems, Man, and Cybernetics. 1985;SMC-15(4):580–5.
- 55. Nguyen DH, Bae D-H. Correcting mean areal precipitation forecasts to improve urban flooding predictions by using long short-term memory network. Journal of Hydrology. 2020;584:124710.
- 56. Yan Q, Ji F, Miao K, Wu Q, Xia Y, Li T. Convolutional Residual-Attention: A Deep Learning Approach for Precipitation Nowcasting. Advances in Meteorology. 2020;2020:1–12.
- 57. Park CH. Outlier and anomaly pattern detection on data streams. The Journal of Supercomputing. 2019;75(9):6118–28.
- 58. Kaplan D, Muñoz-Carpena R, Ritter A. Untangling complex shallow groundwater dynamics in the floodplain wetlands of a southeastern US coastal river. Water Resources Research. 2010;46:W08528. ISI:000280960900003.
- 59. Ritter A, Regalado C, Muñoz-Carpena R. Temporal Common Trends of Topsoil Water Dynamics in a Humid Subtropical Forest Watershed. Vadose Zone Journal. 2009:437–49. ISI:000266297100017.
- 60. Kuo Y-M, Liu W-w, Zhao E, Li R, Muñoz-Carpena R. Water quality variability in the middle and down streams of Han River under the influence of the Middle Route of South-North Water diversion project, China. Journal of Hydrology. 2019;569:218–29.
- 61. Han S, Coulibaly P. Bayesian flood forecasting methods: A review. Journal of Hydrology. 2017;551:340–51.
- 62. Chattopadhyay A, Nabizadeh E, Hassanzadeh P. Analog Forecasting of Extreme-Causing Weather Patterns Using Deep Learning. Journal of Advances in Modeling Earth Systems. 2020;12(2):e2019MS001958. pmid:32714491
- 63. Huffaker R, Muñoz-Carpena R, Campo-Bescós MA, Southworth J. Demonstrating correspondence between decision-support models and dynamics of real-world environmental systems. Environmental Modelling & Software. 2016;83:74–87.
- 64. Sugihara G, May R, Ye H, Hsieh C-h, Deyle E, Fogarty M, et al. Detecting causality in complex ecosystems. Science. 2012;338(6106):496–500. pmid:22997134
- 65. Medina M, Huffaker R, Jawitz JW, Muñoz-Carpena R. Nonlinear Dynamics in Treatment Wetlands: Identifying Systematic Drivers of Nonequilibrium Outlet Concentrations in Everglades STAs. Water Resources Research. 2019;55(12):11101–20.
- 66. Morgan S, Huffaker R, Giménez R, Campo-Bescos MA, Muñoz-Carpena R, Govers G. Experimental evidence that rill-bed morphology is governed by emergent nonlinear spatial dynamics. Scientific reports. 2022;12(1):21500. pmid:36513727.
- 67. Wang Y, Shi L, Hu X, Song W, Wang L. Multiphysics-Informed Neural Networks for Coupled Soil Hydrothermal Modeling. Water Resources Research. 2023;59(1):e2022WR031960.
- 68. Islam K, Rahman MF, Jashimuddin M. Modeling land use change using Cellular Automata and Artificial Neural Network: The case of Chunati Wildlife Sanctuary, Bangladesh. Ecological Indicators. 2018;88:439–53.
- 69. Talukdar S, Singha P, Mahato S, Shahfahad, Pal S, Liou Y-A, et al. Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations—A Review. Remote Sensing. 2020;12(7):1135.
- 70. Nieves V, Radin C, Camps-Valls G. Predicting regional coastal sea level changes with machine learning. Scientific reports. 2021;11(1):7650. pmid:33828225.
- 71. Adebisi N, Balogun A-L. A deep-learning model for national scale modelling and mapping of sea level rise in Malaysia: the past, present, and future. Geocarto International. 2021;37(23):6892–914.
- 72. Chakrabortty R, Pal SC. Modeling soil erosion susceptibility using GIS-based different machine learning algorithms in monsoon dominated diversified landscape in India. Modeling Earth Systems and Environment. 2023.
- 73. Huang F, Chen J, Du Z, Yao C, Huang J, Jiang Q, et al. Landslide Susceptibility Prediction Considering Regional Soil Erosion Based on Machine-Learning Models. ISPRS International Journal of Geo-Information. 2020;9(6):377.
- 74. Wong KCL, Wang H, Vos EE, Zadrozny B, Watson CD, Syeda-Mahmood T, editors. Addressing Deep Learning Model Uncertainty in Long-Range Climate Forecasting with Late Fusion. NeurIPS 2021 Workshop on Tackling Climate Change with Machine Learning; 2021: Climate Change AI.
- 75. Rabanaque MP, Martínez‐Fernández V, Calle M, Benito G. Basin‐wide hydromorphological analysis of ephemeral streams using machine learning algorithms. Earth Surface Processes and Landforms. 2021;47(1):328–44.
- 76. Yifru BA, Chung I-M, Kim M-G, Chang SW. Assessing the Effect of Land/Use Land Cover and Climate Change on Water Yield and Groundwater Recharge in East African Rift Valley using Integrated Model. Journal of Hydrology: Regional Studies. 2021;37:100926.
- 77. Praveen B, Talukdar S, Shahfahad, Mahato S, Mondal J, Sharma P, et al. Analyzing trend and forecasting of rainfall changes in India using non-parametrical and machine learning approaches. Scientific reports. 2020;10(1):10342. pmid:32587299.
- 78. Cardenas-Martinez A, Rodriguez-Galiano V, Luque-Espinar JA, Mendes MP. Predictive modelling benchmark of nitrate Vulnerable Zones at a regional scale based on Machine learning and remote sensing. Journal of Hydrology. 2021;603:127092.
- 79. Almuhaylan MR, Ghumman AR, Al-Salamah IS, Ahmad A, Ghazaw YM, Haider H, et al. Evaluating the Impacts of Pumping on Aquifer Depletion in Arid Regions Using MODFLOW, ANFIS and ANN. Water. 2020;12(8):2297.
- 80. Naderi MM, Mirchi A, Bavani ARM, Goharian E, Madani K. System dynamics simulation of regional water supply and demand using a food-energy-water nexus approach: Application to Qazvin Plain, Iran. Journal of Environmental Management. 2021;280:111843. pmid:33360255
- 81. Kepner WG, Ramsey MM, Brown ES, Jarchow ME, Dickinson KJM, Mark AF. Hydrologic futures: using scenario analysis to evaluate impacts of forecasted land use change on hydrologic services. Ecosphere. 2012;3(7):art69.
- 82. Duan Q, Pappenberger F, Wood A, Cloke HL, Schaake J. Handbook of hydrometeorological ensemble forecasting. Berlin/Heidelberg: Springer; 2019.
- 83. Werner K, Verkade JS, Pagano TC. Application of Hydrological Forecast Verification Information. In: Duan Q, Pappenberger F, Thielen J, Wood A, Cloke HL, Schaake JC, editors. Handbook of Hydrometeorological Ensemble Forecasting. Berlin, Heidelberg: Springer Berlin Heidelberg; 2016. p. 1–21.
- 84. Hopmans JW, Pasternack G. Experimental hydrology: A bright future. Advances in Water Resources. 2006;29(2):117–20.
- 85. Gorski G, Fisher AT, Beganskas S, Weir WB, Redford K, Schmidt C, et al. Field and Laboratory Studies Linking Hydrologic, Geochemical, and Microbiological Processes and Enhanced Denitrification during Infiltration for Managed Recharge. Environmental Science & Technology. 2019;53(16):9491–501. pmid:31352778
- 86. Ramler D, Strauss P. Technical Note: Combining undisturbed soil monoliths for hydrological indoor experiments. Hydrol Earth Syst Sci. 2023;27(9):1745–54.
- 87. Jarvis N, Koestel J, Larsbo M. Understanding Preferential Flow in the Vadose Zone: Recent Advances and Future Prospects. Vadose Zone Journal. 2016;15(12):vzj2016.09.0075.
- 88. Tapley BD, Watkins MM, Flechtner F, Reigber C, Bettadpur S, Rodell M, et al. Contributions of GRACE to understanding climate change. Nature climate change. 2019;9(5):358–69. pmid:31534490.
- 89. Vishwakarma BD, Zhang J, Sneeuw N. Downscaling GRACE total water storage change using partial least squares regression. Scientific data. 2021;8(1):95. pmid:33772016.
- 90. Im J, Park S, Rhee J, Baik J, Choi M. Downscaling of AMSR-E soil moisture with MODIS products using machine learning approaches. Environmental Earth Sciences. 2016;75(15).
- 91. Hengl T, Mendes de Jesus J, Heuvelink GBM, Ruiperez Gonzalez M, Kilibarda M, Blagotić A, et al. SoilGrids250m: Global gridded soil information based on machine learning. PloS one. 2017;12(2):e0169748. pmid:28207752.
- 92. Sahoo DP, Sahoo B, Tiwari MK, Behera GK. Integrated remote sensing and machine learning tools for estimating ecological flow regimes in tropical river reaches. Journal of Environmental Management. 2022;322:116121. pmid:36070653
- 93. Rahmati O, Tahmasebipour N, Haghizadeh A, Pourghasemi HR, Feizizadeh B. Evaluation of different machine learning models for predicting and mapping the susceptibility of gully erosion. Geomorphology. 2017;298:118–37.
- 94. Duerr I, Merrill HR, Wang C, Bai R, Boyer M, Dukes MD, et al. Forecasting urban household water demand with statistical and machine learning methods using large space-time data: A Comparative study. Environmental Modelling & Software. 2018;102:29–38.
- 95. Alvarez-Vanhard E, Corpetti T, Houet T. UAV & satellite synergies for optical remote sensing applications: A literature review. Science of Remote Sensing. 2021;3:100019.
- 96. Acharya BS, Bhandari M, Bandini F, Pizarro A, Perks M, Joshi DR, et al. Unmanned Aerial Vehicles in Hydrology and Water Management: Applications, Challenges, and Perspectives. Water Resources Research. 2021;57(11):e2021WR029925.
- 97. Zeng Y, Guo Y, Li J. Recognition and extraction of high-resolution satellite remote sensing image buildings based on deep learning. Neural Computing and Applications. 2021;34(4):2691–706.
- 98. Tien Bui D, Hoang N-D, Martínez-Álvarez F, Ngo P-TT, Hoa PV, Pham TD, et al. A novel deep learning neural network approach for predicting flash flood susceptibility: A case study at a high frequency tropical storm area. Science of The Total Environment. 2020;701:134413. pmid:31706212
- 99. Chilès JP, Delfiner P. Geostatistics: Modeling Spatial Uncertainty. 2nd ed. Wiley; 2012.
- 100. Alizadeh B, Ghaderi Bafti A, Kamangir H, Zhang Y, Wright DB, Franz KJ. A novel attention-based LSTM cell post-processor coupled with bayesian optimization for streamflow prediction. Journal of Hydrology. 2021;601:126526.
- 101. Meshram SG, Meshram C, Santos CAG, Benzougagh B, Khedher KM. Streamflow Prediction Based on Artificial Intelligence Techniques. Iranian Journal of Science and Technology, Transactions of Civil Engineering. 2021;46(3):2393–403.
- 102. Assaad RH, Assaf G, Boufadel M. Optimizing the maintenance strategies for a network of green infrastructure: An agent-based model for stormwater detention basins. Journal of Environmental Management. 2023;330:117179. pmid:36608609
- 103. Li L, Uyttenhove P, Van Eetvelde V. Planning green infrastructure to mitigate urban surface water flooding risk–A methodology to identify priority areas applied in the city of Ghent. Landscape and Urban Planning. 2020;194:103703.
- 104. Adamowski J, Karapataki C. Comparison of Multivariate Regression and Artificial Neural Networks for Peak Urban Water-Demand Forecasting: Evaluation of Different ANN Learning Algorithms. Journal of Hydrologic Engineering. 2010;15(10):729–43.
- 105. Jain DA, Joshi UC, Varshney AK. Short-term water demand forecasting using artificial neural networks: IIT Kanpur experience. In: Proceedings 15th International Conference on Pattern Recognition (ICPR-2000). IEEE Comput. Soc.; 2000.
- 106. Chatzimentor A, Apostolopoulou E, Mazaris AD. A review of green infrastructure research in Europe: Challenges and opportunities. Landscape and Urban Planning. 2020;198:103775.
- 107. Derkzen ML, van Teeffelen AJA, Verburg PH. Green infrastructure for urban climate adaptation: How do residents’ views on climate impacts and green infrastructure shape adaptation preferences? Landscape and Urban Planning. 2017;157:106–30.
- 108. Miller SM, Montalto FA. Stakeholder perceptions of the ecosystem services provided by Green Infrastructure in New York City. Ecosystem Services. 2019;37:100928.
- 109. Mahjabin T, Mejia A, Blumsack S, Grady C. Integrating embedded resources and network analysis to understand food-energy-water nexus in the US. Science of The Total Environment. 2020;709:136153. pmid:31905549
- 110. Zhang G, Huang G, Liu L, Niu G, Li J, McBean E. Ecological network analysis of an urban water metabolic system based on input-output model: A case study of Guangdong, China. Science of The Total Environment. 2019;670:369–78. pmid:30904651
- 111. Heckert M, Rosan CD. Developing a green infrastructure equity index to promote equity planning. Urban Forestry & Urban Greening. 2016;19:263–70.
- 112. Esposito M, Palma L, Belli A, Sabbatini L, Pierleoni P. Recent Advances in Internet of Things Solutions for Early Warning Systems: A Review. Sensors. 2022;22(6):2124. pmid:35336296
- 113. Shafer MA, Fiebrich CA, Arndt DS, Fredrickson SE, Hughes TW. Quality Assurance Procedures in the Oklahoma Mesonetwork. Journal of Atmospheric and Oceanic Technology. 2000;17(4):474–94.
- 114. Durre I, Menne MJ, Gleason BE, Houston TG, Vose RS. Comprehensive Automated Quality Assurance of Daily Surface Observations. Journal of Applied Meteorology and Climatology. 2010;49(8):1615–33.
- 115. Barrera J, Pachitariu G. Big Data: What is it? And is my data big enough? Resource Magazine. 2018;25(3):18–21.
- 116. Gregersen JB, Gijsbers PJA, Westen SJP. OpenMI: Open modelling interface. Journal of Hydroinformatics. 2007;9(3):175–91.
- 117. Peckham SD, Hutton EWH, Norris B. A component-based approach to integrated modeling in the geosciences: The design of CSDMS. Computers & Geosciences. 2013;53:3–12.
- 118. Slater LJ, Thirel G, Harrigan S, Delaigue O, Hurley A, Khouakhi A, et al. Using R in hydrology: a review of recent developments and future directions. Hydrol Earth Syst Sci. 2019;23(7):2939–63.
- 119. Hughes JD, Russcher MJ, Langevin CD, Morway ED, McDonald RR. The MODFLOW Application Programming Interface for simulation control and software interoperability. Environmental Modelling & Software. 2022;148:105257.
- 120. Google. Google Earth Engine: A planetary-scale geospatial analysis platform. Mountain View, CA: Google Inc.; 2015.
- 121. Hansen M, Potapov P, Moore R, Hancher M, Turubanova S, Tyukavina A, et al. High-Resolution Global Maps of 21st-Century Forest Cover Change. Science. 2013;342(6160):850–3. WOS:000326923000036. pmid:24233722
- 122. Alonso A, Muñoz-Carpena R, Kennedy RE, Murcia C. Wetland landscape spatio-temporal degradation dynamics using the new Google Earth Engine cloud-based platform: Opportunities for non-specialists in remote sensing. Trans ASABE. 2016;59(5):1333–44.
- 123. CUAHSI. Annual membership report. Arlington, MA 02476: Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI); 2021.
- 124. European Commission. Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). European Commission; 2007 [cited September 2022]. EC INSPIRE Web Knowledge Base. Available from: https://inspire.ec.europa.eu.
- 125. Chen Y, Han D. Big data and hydroinformatics. Journal of Hydroinformatics. 2016;18(4):599–614.
- 126. Adamala S. An Overview of Big Data Applications in Water Resources Engineering. Mach Learn Res. 2017;2(1):10–8.
- 127. Goyal H, Sharma C, Joshi N. An Integrated Approach of GIS and Spatial Data Mining in Big Data. International Journal of Computer Applications. 2017;169(11):1–6.
- 128. Pekmez Z. Mining big data for sustainable water management. In: DIEM: Dubrovnik International Economic Meeting. Dubrovnik, Croatia: University of Dubrovnik; 2020.
- 129. Sheth A, Padhee S, Gyrard A. Knowledge graphs and knowledge networks: the story in brief. IEEE Internet Computing. 2019;23(4):67–75.
- 130. Ouarda TBMJ, Girard C, Cavadias GS, Bobée B. Regional flood frequency estimation with canonical correlation analysis. Journal of Hydrology. 2001;254(1–4):157–73.
- 131. Yang Y, Zhu Y, Jian P. Application of Knowledge Graph in Water Conservancy Education Resource Organization under the Background of Big Data. Electronics. 2022;11(23):3913.
- 132. He L, Ye W, Wang YX, Feng HS, Chen BX, Liang DZ. Using knowledge graph and RippleNet algorithms to fulfill smart recommendation of water use policies during shale resources development. Journal of Hydrology. 2023;617:128970.
- 133. Rondón Díaz JD, Vilches-Blázquez LM. Characterizing water quality datasets through multi-dimensional knowledge graphs: a case study of the Bogota river basin. Journal of Hydroinformatics. 2022;24(2):295–314.
- 134. Shu C, Ouarda TBMJ. Flood frequency analysis at ungauged sites using artificial neural networks in canonical correlation analysis physiographic space. Water Resources Research. 2007;43(7).
- 135. Yan J, Lv T, Yu Y. Construction and Recommendation of a Water Affair Knowledge Graph. Sustainability. 2018;10(10):3429.
- 136. Tounsi A, Temimi M, Gourley J. On the use of machine learning to account for reservoir management rules and predict streamflow. Neural Computing & Applications. 2022;34(21):18917–31. WOS:000815555200003.
- 137. Wang P. Meteorological records from ancient chronicles of China. Bulletin of the American Meteorological Society. 1979;60(4):313–8. WOS:A1979GZ61500002.
- 138. Chen S, Su Y, Fang X, He J. Climate records in ancient Chinese diaries and their application in historical climate reconstruction–a case study of Yunshan Diary. Clim Past. 2020;16(5):1873–87.
- 139. Devlin B. The Big Data zoo–taming the beasts: the need for an integrated platform for enterprise information. Cape Town: 9sight Consulting; 2012.
- 140. Tounsi A, Temimi M. A systematic review of natural language processing applications for hydrometeorological hazards assessment. Natural Hazards. 2023. WOS:000929262400002. pmid:36776702
- 141. Brázdil R, Kundzewicz ZW, Benito G. Historical hydrology for studying flood risk in Europe. Hydrological Sciences Journal. 2006;51(5):739–64.
- 142. Zvobgo L, Johnston P, Williams P, Trisos C, Simpson N, Global Adaptation Mapping Initiative Team. The role of indigenous knowledge and local knowledge in water sector adaptation to climate change in Africa: a structured assessment. Sustainability Science. 2022;17(5):2077–92. WOS:000777363900001.
- 143. Sioui M. Chapter 1—Introduction: The need for Indigenous knowledge-based water and drought policy in a changing world. In: Sioui M, editor. Current Directions in Water Scarcity Research. Vol. 4. Elsevier; 2022. p. 1–11.
- 144. Stuchtey M. Rethinking the water cycle. McKinsey Global Institute; 2015 [cited April 2022]. Available from: https://www.mckinsey.com.br/~/media/McKinsey/Business%20Functions/Sustainability/Our%20Insights/Rethinking%20the%20water%20cycle/Rethinking%20the%20water%20cycle.pdf
- 145. Fast A, Jensen D. Why Stacked Models Perform Effective Collective Classification. In: 2008 Eighth IEEE International Conference on Data Mining; 2008 Dec 15–19.
- 146. Günes F. The SAS Data Science Blog [Internet]. 2017 [cited April 2022]. Available from: https://blogs.sas.com/content/subconsciousmusings/2017/05/18/stacked-ensemble-models-win-data-science-competitions.
- 147. Siltanen S. The Big Models of Earth and Space. In: Step into the World of Mathematics: Math Is Beautiful and Belongs to All of Us. Cham: Springer International Publishing; 2021. p. 37–58.
- 148. IBM-EIS. IBM Environmental Intelligence Suite: Industries [Internet]. 2022 [cited April 2022]. Available from: https://www.ibm.com/products/environmental-intelligence-suite/industries
- 149. IBM-GRAF. IBM Global High-Resolution Atmospheric Forecasting System (IBM GRAF) [Internet]. 2022 [cited April 2022]. Available from: https://www.ibm.com/weather/industries/cross-industry/graf
- 150. Ponnambalam K, Mousavi SJ. CHNS Modeling for Study and Management of Human–Water Interactions at Multiple Scales. Water. 2020;12(6):1699.
- 151. Huffaker R, Canavari M, Muñoz-Carpena R. Distinguishing between endogenous and exogenous price volatility in food security assessment: An empirical nonlinear dynamics approach. Agricultural Systems. 2018;160:98–109.
- 152. Medina M, Huffaker R, Jawitz JW, Muñoz-Carpena R. Seasonal dynamics of terrestrially sourced nitrogen influenced Karenia brevis blooms off Florida’s southern Gulf Coast. Harmful Algae. 2020;98:101900. pmid:33129457
- 153. Delforge D, Muñoz-Carpena R, Van Camp M, Vanclooster M. A Parsimonious Empirical Approach to Streamflow Recession Analysis and Forecasting. Water Resources Research. 2020;56(2):e2019WR025771.
- 154. Blatman G, Sudret B. A comparison of three metamodel-based methods for global sensitivity analysis: GP modelling, HDMR and LAR-gPC. Procedia-Social and Behavioral Sciences. 2010;2(6):7613–4.
- 155. Storlie CB, Swiler LP, Helton JC, Sallaberry CJ. Implementation and evaluation of nonparametric regression procedures for sensitivity analysis of computationally demanding models. Reliability Engineering & System Safety. 2009;94(11):1735–63.
- 156. Lauvernet C, Helbert C. Metamodeling methods that incorporate qualitative variables for improved design of vegetative filter strips. Reliability Engineering & System Safety. 2020;204:107083.
- 157. Sobol’ I, Levitan Y. On the use of variance reducing multipliers in Monte Carlo computations of a global sensitivity index. Computer Physics Communications. 1999;117(1–2):52–61. WOS:000079344200007.
- 158. Saltelli A. Global sensitivity analysis: the primer. Chichester, England; Hoboken, NJ: John Wiley; 2008. x, 292 p.
- 159. Khare YP, Muñoz-Carpena R, Rooney RW, Martinez CJ. A multi-criteria trajectory-based parameter sampling strategy for the screening method of elementary effects. Environmental Modelling & Software. 2015;64:230–9.
- 160. Srivastava V, Graham W, Muñoz-Carpena R, Maxwell RM. Insights on geologic and vegetative controls over hydrologic behavior of a large complex basin–Global Sensitivity Analysis of an integrated parallel hydrologic model. Journal of Hydrology. 2014;519:2238–57.
- 161. Adadi A, Berrada M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access. 2018;6:52138–60.
- 162. Chakraborty D, Başağaoğlu H, Winterle J. Interpretable vs. noninterpretable machine learning models for data-driven hydro-climatological process modeling. Expert Systems with Applications. 2021;170:114498.
- 163. Başağaoğlu H, Chakraborty D, Lago CD, Gutierrez L, Şahinli MA, Giacomoni M, et al. A Review on Interpretable and Explainable Artificial Intelligence in Hydroclimatic Applications. Water. 2022;14(8).
- 164. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From Local Explanations to Global Understanding with Explainable AI for Trees. Nature Machine Intelligence. 2020;2(1):56–67. Epub 2020/01/17. pmid:32607472.
- 165. Ribeiro MT, Singh S, Guestrin C. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016.
- 166. Apley DW, Zhu J. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2020;82(4):1059–86.
- 167. Cheng S, Cheng L, Qin S, Zhang L, Liu P, Liu L, et al. Improved Understanding of How Catchment Properties Control Hydrological Partitioning Through Machine Learning. Water Resources Research. 2022;58(4).
- 168. Shen H, Tolson BA, Mai J. Time to Update the Split‐Sample Approach in Hydrological Model Calibration. Water Resources Research. 2022;58(3).
- 169. Burnell R, Schellaert W, Burden J, Ullman TD, Martinez-Plumed F, Tenenbaum JB, et al. Rethink reporting of evaluation results in AI. Science. 2023;380(6641):136–8. pmid:37053341
- 170. Piaggi PM, Weis J, Panagiotopoulos AZ, Debenedetti PG, Car R. Homogeneous ice nucleation in an ab initio machine-learning model of water. Proceedings of the National Academy of Sciences. 2022;119(33):e2207294119. pmid:35939708
- 171. Kraft B, Jung M, Körner M, Koirala S, Reichstein M. Towards hybrid modeling of the global hydrological cycle. Hydrol Earth Syst Sci. 2022;26(6):1579–614.
- 172. Muñoz-Carpena R, Batelaan O, Willems P, Hughes DA. Editorial–Why it is a blessing to be rejected: improving science with quality publications. Journal of Hydrology: Regional Studies. 2020;31:100717.
- 173. McCarthy J, Minsky ML, Rochester N, Shannon CE. A proposal for the Dartmouth summer research project on artificial intelligence (1955). Reprinted online at http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html. 2018.
- 174. Calegari R, Ciatto G, Omicini A. On the integration of symbolic and sub-symbolic techniques for XAI: A survey. Intelligenza Artificiale. 2020;14(1):7–32.
- 175. Moser C, Den Hond F, Lindebaum D. What humans lose when we let AI decide. MIT Sloan Management Review. 2022:12–4.
- 176. Pazzanese C. Great promise but potential for peril. The Harvard Gazette. 2020.
- 177. Tzachor A, Devare M, King B, Avin S, Ó hÉigeartaigh S. Responsible artificial intelligence in agriculture requires systemic understanding of risks and externalities. Nature Machine Intelligence. 2022;4(2):104–9.
- 178. Benefo EO, Tingler A, White M, Cover J, Torres L, Broussard C, et al. Ethical, legal, social, and economic (ELSE) implications of artificial intelligence at a global level: a scientometrics approach. AI and Ethics. 2022:1–16.
- 179. Knight W. The Dark Secret at the Heart of AI. MIT Technology Review. 2017:1–12.