Physically-constrained evapotranspiration models with machine learning parameterization outperform pure machine learning: Critical role of domain knowledge

  • Yeonuk Kim,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Writing – original draft

    Affiliations Institute for Resources, Environment and Sustainability, University of British Columbia, Vancouver, Canada, Division of Hydrologic Sciences, Desert Research Institute, Las Vegas, Nevada, United States of America

  • Monica Garcia,

    Roles Funding acquisition, Investigation, Supervision, Writing – review & editing

    Affiliation Estación Experimental de Zonas Áridas. Consejo Superior de Investigaciones Científicas (EEZA-CSIC), Almería, Spain

  • T. Andrew Black,

    Roles Funding acquisition, Investigation, Supervision, Writing – review & editing

    Affiliation Faculty of Land and Food Systems, University of British Columbia, Vancouver, Canada

  • Mark S. Johnson

    Roles Conceptualization, Funding acquisition, Investigation, Supervision, Writing – review & editing

    mark.johnson@ubc.ca

    Affiliations Institute for Resources, Environment and Sustainability, University of British Columbia, Vancouver, Canada, Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia, Vancouver, Canada

Abstract

Physics-informed machine learning techniques have emerged to tackle challenges inherent in pure machine learning (ML) approaches. One such technique, the hybrid approach, has been introduced to estimate terrestrial evapotranspiration (ET), a crucial variable linking water, energy, and carbon cycles. A key advantage of these hybrid ET models is their improved performance, particularly under extreme conditions, compared to ET estimates relying solely on ML. However, the mechanisms driving their improved performance are not well understood. To address this gap, we developed six hybrid approaches based on different physical formulations of ET and compared them with a pure ML model. All models employed the random forest algorithm and were trained on daily-scale ET observations, in-situ meteorological data and satellite remote sensing. We found a strong correlation (r = 0.93) between the sensitivity of ET estimates to machine-learned parameters and model error (root-mean-square error; RMSE), indicating that reduced sensitivity minimizes error propagation and improves performance. Notably, the most accurate hybrid model (RMSE = 17.8 W m-2 in energy units) utilized a novel empirical parameter, which is relatively stable due to land-atmosphere equilibrium, outperforming both the pure ML model and hybrid models requiring conventional parameters (e.g., surface conductance). These results imply that conventional parameterizations may need to be reevaluated to effectively integrate physical models with machine learning, as conventional choices may not be optimal for this new hybrid paradigm. This study underscores the critical role of domain knowledge in setting up hybrid models, potentially guiding future hybrid model developments beyond ET estimation.

1 Introduction

Terrestrial evapotranspiration (ET), which represents the sum of plant transpiration and evaporation from soil and intercepted water, plays a crucial role in Earth’s climate as it is a nexus of the water, energy, and carbon cycles [1–3]. However, direct measurements of ET are limited in space and time, making satellite remote sensing-based ET estimation an essential tool. Remote sensing-based ET models generally fall into two broad categories: physically-based approaches [4–9] and those utilizing machine learning (ML) algorithms [10–14].

In the last decade, ML algorithms have emerged as a popular approach to estimate ET. For example, random forests, support vector machines, and artificial neural networks are most widely applied to directly estimate ET using satellite and meteorological data as inputs [15,16]. More recently, advanced models such as deep learning (e.g., long short-term memory networks) [17] and ensemble frameworks combining multiple ML algorithms have been introduced in ET estimation [18]. These ML models have been shown to estimate ET with high accuracy, especially with the availability of a large number of satellite and field observations, making them a key tool in ET studies.

Despite their advantages, ML-based approaches face several challenges. These include their ‘black-box’ nature, which obscures the internal logic of the models, a reliance on large datasets, and a lack of physical constraints (e.g., surface energy balance and diffusion laws) that can hinder their generalizability and extrapolation capability, which is especially relevant in the context of climate change [19,20]. To overcome these limitations, various forms of physics-informed ML, including hybrid modeling, have emerged as promising alternatives [19,21]. Here, the term hybrid approaches specifically refers to models that employ an ML approach as a sub-model to predict intermediate quantities within a physically-based framework [21]. For ET estimation, hybrid models typically employ ML to estimate difficult-to-measure parameters, such as the ratio of actual ET to potential ET (PET) or surface conductance representing soil and vegetation water stress, and then integrate these ML-determined parameters into physically-based ET models [22–30]. These hybrid ET models show promise as they can combine the advantages of both physically-based models and machine learning approaches. Reflecting this trend, for example, the widely used Global Land Evaporation Amsterdam Model (GLEAM) has transitioned from a fully physically based model to a hybrid approach that constrains PET in its latest version [30].

Recent studies have demonstrated that hybrid ET models outperform pure ML models under extreme conditions [22,23,27], but evidence suggests that this advantage is not as pronounced under normal conditions [22,26,27,29]. Importantly, the mechanisms by which hybrid ET models enhance performance over pure ML models remain poorly understood, although it is generally believed that incorporating physical knowledge can lead to model improvements. The underlying assumption is that error propagation from ML-determined intermediate parameters to ET in hybrid approaches may be smaller than the direct ET estimation error using a pure ML model, thereby enhancing the accuracy of ET estimates. However, this assumption holds only if physically-based ET models are not overly sensitive to small variations in these intermediate empirical parameters. To our knowledge, no previous study has explicitly tested this underlying assumption.

To fill this knowledge gap, we examine the extent to which hybrid models enhance ET estimation performance over pure ML models, with a focus on model sensitivity to empirical parameters. We hypothesize that a lower sensitivity of ET estimates to ML-determined empirical parameters leads to improvement in model performance. To test this hypothesis, we have developed six hybrid ET models that incorporate various physical equations and empirical parameters to assess how the physical structure of the model and the choice of empirical parameter influence performance relative to pure ML models.

These hybrid models employ equations based on the Penman equation [31] (one model), the Penman-Monteith equation [32] (two models, one of which is the FAO reference crop ET equation [33]), the Priestley-Taylor equation [34] (one model), and a modified version of the Penman-Monteith equation that includes relative humidity gradients (PMRH) [35] (two models). The hybrid ET models employ the random forest (RF) algorithm to estimate intermediate parameters (e.g., the ratio between actual ET and PET, or direct physical quantities such as surface resistance and vertical relative humidity flux). Remote sensing data from the Soil Moisture Active Passive (SMAP) and Moderate Resolution Imaging Spectroradiometer (MODIS) satellite platforms, fused with meteorological observations, were used as inputs to the RF algorithm. The machine-learned intermediate parameters were then substituted into each physical equation listed above. We evaluate these hybrid ET models in comparison to daily-scale ET observations derived from eddy-covariance (EC) measurements at 40 AmeriFlux sites across various land cover types and climates.

2 Data

In this study, we utilized a combination of satellite and field observations. The satellite data were retrieved using Google Earth Engine [36] while field observations were accessed through the AmeriFlux website (https://ameriflux.lbl.gov/data/download-data).

We utilized surface soil moisture (0−5 cm) data from the SMAP enhanced L3 radiometer version 5 (SPL3SMP_E), which has a spatial resolution of 9 km with 2−3 days revisit time [37]. To ensure high-quality measurements, we specifically used descending orbit measurements. This soil moisture (SM) dataset combines the SMAP L-band radiometer and Sentinel-1 C-band radar to improve the resolution of the data. In addition, we utilized vegetation water content (VegWC) data provided by the SPL3SMP_E product as ancillary data. We also included the fraction of photosynthetically active radiation (FPAR) from the MODIS product MCD15A3H version 6.1 [38] as another satellite input. This product provides FPAR measurements at a spatial resolution of 500 m, and is a 4-day composite. Days without a satellite value were filled using the next available value.
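The gap-filling rule described above (each missing day takes the next available satellite value) can be sketched as follows. This is a minimal illustration in Python rather than the authors' processing code; the function name `fill_with_next` and the sample values are hypothetical.

```python
from typing import List, Optional


def fill_with_next(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing entry (None) with the next available value.

    Trailing gaps with no later observation are left as None.
    """
    filled = list(values)
    next_valid: Optional[float] = None
    # Walk backwards so that each gap inherits the closest later observation.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_valid
        else:
            next_valid = filled[i]
    return filled


# Hypothetical daily soil moisture series with a 2-3 day revisit pattern.
daily_sm = [0.21, None, None, 0.18, None, 0.25, None]
filled_sm = fill_with_next(daily_sm)
```

The last day stays missing because no later retrieval exists to fill it, which is one reason a record might be dropped by the later data-availability filter.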

For meteorological and ET observations, we used EC flux tower data. To ensure the use of high-quality EC observations that have undergone standardized processing, including quality control and gap-filling, we employed the AmeriFlux FLUXNET dataset. This dataset was developed using the ONEFlux processing method, which is consistent with the FLUXNET2015 dataset [39]. We obtained meteorological data and daily energy balance-corrected latent and sensible heat fluxes using the Bowen-ratio conservation method [40]. This approach was employed because the physically-based models used in this study require energy balance closure. Here, the latent heat flux (LE) derived using the EC systems represents ET observations.

We selected field data for periods when all required variables were available. Moreover, we only included data for periods for which the quality control flag indicated that more than 80% of the high quality half-hourly data were used to generate the daily ET data. As the SMAP observations were available only after March 2015, we only included field data that overlapped with the satellite observation period. After applying these filtering criteria, we were left with 40 AmeriFlux FLUXNET sites (6 Canadian sites and 34 US sites) representing 39,000 daily observations (Fig 1 and Table 1). The 40 sites span a wide range of land cover types.

Table 1. Information for the 40 eddy-covariance sites. The fourth column shows the land cover types based on the International Geosphere-Biosphere Programme (IGBP) classification, which includes evergreen needleleaf forests (ENF; 7 sites), deciduous broadleaf forests (DBF; 4 sites), closed shrublands (CSH; 2 sites), open shrublands (OSH; 5 sites), grasslands (GRA; 8 sites), permanent wetlands (WET; 7 sites), and croplands (CRO; 7 sites). The fifth column shows the Köppen climate classification, which includes mid-latitude steppe and desert (Bsh; 2 sites), tropical steppe (Bsk; 3 sites), subtropical steppe (Bwk; 2 sites), humid subtropical (Cfa; 7 sites), hot-summer mediterranean (Csa; 5 sites), warm-summer mediterranean (Csb; 3 sites), hot-summer humid continental (Dfa; 5 sites), warm-summer humid continental (Dfb; 4 sites), subarctic (Dfc; 2 sites), extremely-cold subarctic (Dfd; 4 sites), and tundra (ET; 2 sites).

https://doi.org/10.1371/journal.pone.0328798.t001

Fig 1. Spatial distribution of 40 AmeriFlux FLUXNET sites used in this study.

Each point indicates a site location, and the different shapes represent the International Geosphere-Biosphere Programme (IGBP) land cover classification, which includes evergreen needleleaf forests (ENF; 7 sites), deciduous broadleaf forests (DBF; 4 sites), closed shrublands (CSH; 2 sites), open shrublands (OSH; 5 sites), grasslands (GRA; 8 sites), permanent wetlands (WET; 7 sites), and croplands (CRO; 7 sites). Map background from Natural Earth (http://www.naturalearthdata.com).

https://doi.org/10.1371/journal.pone.0328798.g001

3 Pure ML and hybrid ET models

3.1 Random forest model and inputs

We used the RF algorithm to develop one pure ML ET model (Pure-ML) and six hybrid ET models (Fig 2). The RF algorithm is an ensemble method of regression trees [81], and involves generating bootstrapped datasets, creating independent regression trees using randomly sampled variables, and aggregating the estimation results of the individual regression trees. RF has been shown to outperform or be comparable to other widely-used machine learning algorithms for estimating ET [15,16].

Fig 2. Flow chart of the six hybrid ET models and a pure machine learning model using random forest (RF) algorithms.

The model input variables include soil moisture (SM), vegetation water content (VegWC), fraction of photosynthetically active radiation (FPAR), air temperature (T), relative humidity (RH), CO2 concentration (CO2), aerodynamic conductance (gaH), global radiation (Rg), and available energy (AE).

https://doi.org/10.1371/journal.pone.0328798.g002

To create the RF models, we utilized the “caret” and “randomForest” R packages [82,83], tuning the mtry parameter independently for each model, including both pure ML and hybrid models. Here, the mtry parameter, which is the key hyperparameter in the RF model that requires tuning, indicates the number of randomly sampled variables at each split of a decision tree. We used threefold cross-validation for mtry parameter tuning, initially setting the number of regression trees (ntree) to 100. After identifying the optimal mtry value for each model, we increased ntree to 500 to assess any performance improvement (Table 2). For training and validation, we allocated 75% of total daily ET observations, with the remaining 25% used to test model performance. It should be noted that while the RF algorithm includes several other hyperparameters (e.g., nodesize), mtry and ntree are the most influential in determining model performance and are most commonly tuned during training. Therefore, all other parameters were kept at their default values as defined in the “randomForest” R package.

Table 2. Summary of random forest hyperparameters after tuning. Descriptions of the ntree and mtry parameters are provided in the main text. The hybrid models are denoted as follows. H1: H-Penman, H2: H-PT, H3: H-PM-gs, H4: H-PM-Kc, H5: H-PMRH-dRH, and H6: H-PMRH-FRH. Detailed descriptions of each hybrid model are available in Section 3.3.

https://doi.org/10.1371/journal.pone.0328798.t002

We selected the inputs for the ML algorithm based on a similar prior study that evaluated both pure ML and hybrid ET models [22]. Inputs to the RF algorithm included satellite-derived SM, VegWC, FPAR, and in-situ meteorological measurements of air temperature (T, K), relative humidity (RH), CO2 concentration (CO2, μmol mol-1), global radiation (Rg, W m-2), and aerodynamic conductance for sensible heat (gaH, m s-1). Here, gaH was estimated from a semi-empirical model using friction velocity and wind speed [84,85] as described in the following section.

3.2 Aerodynamic conductance for heat and water vapour transfer

We estimate the daily-scale aerodynamic conductance for heat (gaH, m s-1) by inverting aerodynamic resistance for heat (raH, s m-1). We consider both the aerodynamic resistance to momentum transfer and the additional boundary layer resistance for heat transfer, also known as the excess resistance [84].

\[ r_{aH} = \frac{u(z_r)}{u_*^{2}} + 6.2\,u_*^{-2/3} \tag{1} \]

\[ g_{aH} = \frac{1}{r_{aH}} \tag{2} \]

where u(zr) is reference height wind speed (m s-1) and u* is friction velocity (m s-1). We estimated raH (s m-1) using the bigleaf R package [85].

We then assume that the boundary layer resistance for water vapour transfer is equivalent to that for sensible heat transfer (i.e., the similarity assumption). Therefore, Equation (2) serves as the aerodynamic conductance for both heat and water vapour. It should be noted that although we utilized the measured friction velocity, gaH is still an estimated value and not a true observation, as we utilized an empirical estimate of the boundary layer resistance [86]. As a result, the estimated gaH based on Equations (1) and (2) contains a degree of uncertainty. Addressing the semi-empirical nature of gaH in developing hybrid ET models is beyond our scope, as it has been explicitly covered in other studies [25,26].
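As a rough illustration of this step, the sketch below computes gaH as the inverse of the sum of the momentum resistance and an excess (boundary layer) resistance for heat. It assumes the widely used empirical form of Thom (1972) for the excess resistance, 6.2 u*^(-2/3); that choice, the function name, and the sample inputs are assumptions of this example, and the sketch does not reproduce the bigleaf implementation.

```python
def aerodynamic_conductance(wind_speed: float, ustar: float) -> float:
    """Return aerodynamic conductance for heat, gaH (m s-1).

    wind_speed: wind speed at reference height, u(zr) (m s-1)
    ustar:      friction velocity, u* (m s-1)
    """
    if wind_speed <= 0.0 or ustar <= 0.0:
        raise ValueError("wind_speed and ustar must be positive")
    # Aerodynamic resistance to momentum transfer (s m-1).
    r_am = wind_speed / ustar ** 2
    # Excess resistance for heat (s m-1); assumed Thom (1972) empirical form.
    r_bh = 6.2 * ustar ** (-2.0 / 3.0)
    # Conductance is the inverse of the total resistance.
    return 1.0 / (r_am + r_bh)


# Hypothetical daily means: u(zr) = 3.0 m s-1, u* = 0.4 m s-1.
g_ah = aerodynamic_conductance(3.0, 0.4)
```

Because the excess resistance is empirical, any bias in it carries into gaH and from there into every physical equation that uses gaH.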

3.3 Hybrid ET models

In this study, we developed six distinct hybrid approaches for estimating ET. Each approach incorporates a single intermediate parameter that cannot be directly derived from meteorological data alone. This parameter is estimated using an RF algorithm, which takes both satellite-based and meteorological variables as inputs, as illustrated in Fig 2. The ML-derived intermediate parameter is then integrated into the corresponding physically based equation to compute ET.

To train the RF models, we initially derived empirical parameters by inversely solving each physical model. This derived data then served as the training dataset for the RF algorithm. Occasionally, the derived empirical parameter values were unrealistic, which required constraining derived values within certain limits to enhance model performance (details provided in each model description below). We determined the optimal constraints that minimized error during the training process. For these models, we set the ntree value to 100 and used the default mtry value (number of input variables divided by three), adjusting the constraints accordingly for the training and validation set. The determined optimal limit was then used to tune the mtry parameter, after which the ntree value was increased to 500.

All hybrid ET models used the same input variables for their respective RF algorithms. In contrast, the pure ML model used the same inputs as the hybrid models plus available energy (AE, W m-2, which is the difference between net radiation and soil heat flux neglecting air-column and above-ground-biomass energy storage), which is not required as an input for the hybrid models because AE is incorporated within the physical equations of these models (see Fig 2). This, following the methodology outlined by Zhao et al. [22], ensures a fair and objective comparison between the pure ML and hybrid models. The following subsection provides detailed descriptions of the six hybrid ET models developed in this study.

3.3.1 Hybrid model 1: H-Penman.

The first hybrid ET model integrates a potential ET (PET) equation with an empirical stress factor predicted by the RF algorithm. The Penman equation, also known as open water ET, serves as the upper limit of ET [31]. Some hydrological approaches, such as the Budyko model, estimate ET by reducing the Penman PET [87]. A recent hybrid ET model also employs the Penman equation, predicting ET by multiplying it by a machine learning-determined parameter [27]. LE (and thereby ET) can be estimated as follows:

\[ LE = f_P \, \frac{\Delta\, AE + \rho\, c_p\, g_{aH}\, VPD}{\Delta + \gamma} \tag{3} \]

where Δ is the slope of the saturation vapour pressure (e*(T)) curve with respect to temperature (T) (kPa K-1), γ is the psychrometric constant (kPa K-1), AE is available energy (W m-2, which is the difference between net radiation and soil heat flux and heat storage change within above-ground biomass and air-column), ρ is the air density (kg m-3), cp is the specific heat of air (J kg-1 K-1), VPD is vapour pressure deficit (kPa), and fP denotes the empirical parameter that reduces the Penman equation for open water to estimate LE for the unsaturated land surface.

Theoretically, if there is no bias in gaH, the fP value should not exceed 1. In our dataset, this holds true for the most part, as 99% of the inferred fP values from observations are below 1.02. However, a small number of fP values do exceed this limit. These extreme values posed challenges during the training of the RF model to estimate fP, as they introduced instability and degraded model performance.

To ensure theoretical consistency and improve model performance, we applied a clamping strategy to limit the influence of these outliers. Specifically, we tested upper limits for fP by systematically varying the clamping threshold from the 99th to the 99.95th percentile in 0.05% increments. Based on this tuning process, we found that setting the upper limit to the 99.85th percentile (1.28) resulted in the lowest RMSE. Accordingly, we constrained fP values to the range of 0 to 1.28 by replacing any higher values with 1.28. This approach improved the stability and accuracy of the RF model for this hybrid configuration.
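The clamping search described above can be sketched as a loop over candidate percentiles from 99.00 to 99.95 in steps of 0.05. In the study, each candidate is scored by the RMSE of an RF model trained on the clamped values; here that step is represented by a user-supplied `score` callback, and the toy callback used in the demonstration is only a placeholder, not a model error. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def clamp(values: Sequence[float], lower: float, upper: float) -> List[float]:
    """Replace values outside [lower, upper] with the nearest limit."""
    return [min(max(v, lower), upper) for v in values]


def tune_upper_limit(
    values: Sequence[float], score: Callable[[List[float]], float]
) -> Tuple[float, float, float]:
    """Return (percentile, limit, score) for the best-scoring clamp."""
    candidates = [round(99.0 + 0.05 * k, 2) for k in range(20)]  # 99.00 .. 99.95
    best = None
    for q in candidates:
        limit = percentile(values, q)
        err = score(clamp(values, 0.0, limit))
        if best is None or err < best[2]:
            best = (q, limit, err)
    return best


# Synthetic inferred stress factors with one extreme outlier.
inferred = [i / 1000.0 for i in range(1000)] + [5.0]


def placeholder_score(clamped: List[float]) -> float:
    # Stand-in for "train RF on clamped targets, return validation RMSE".
    return max(clamped)


best = tune_upper_limit(inferred, placeholder_score)
```

In practice the callback would retrain the model for each candidate limit, so the search costs 20 training runs per hybrid model.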

3.3.2 Hybrid model 2: H-PT.

The second hybrid ET model combines the Priestley-Taylor (PT) equation [34] with a stress factor that is predicted by the RF algorithm. The PT equation also provides PET values under conditions where heat advection is not strong. Satellite-based ET estimation models widely employ the PT PET, and ET is estimated by reducing the PET values through multiplication by stress factors [4,8,88,89]. In this study, we simplify these types of models by using a single stress factor that is estimated using the RF algorithm.

\[ LE = f_{PT} \times 1.26\, \frac{\Delta}{\Delta + \gamma}\, AE \tag{4} \]

where fPT represents the water stress factor that is predicted by the RF algorithm. Here, 1.26 is the Priestley-Taylor coefficient. Multiplying by fPT reduces PT PET to actual LE. It should be noted that PT PET implicitly assumes that 1.26 Δ AE/(Δ + γ) represents the upper limit of actual LE. However, actual LE can sometimes exceed PT PET due to the significant effect of advection and evaporation of intercepted precipitation, especially in irrigated agriculture in dry regions [e.g., 90]. Similar to the first hybrid model, we adjusted the upper limit of fPT to minimize the RMSE in the validation dataset. We examined the 99% to 99.95% quantile range of inferred fPT values and determined that setting fPT between 0 and 2.18 covers 99.45% of the inferred values. This H-PT model is similar to the approach of Koppa et al. [24], but the two are not exactly the same, in that Koppa et al. [24] partitioned ET into soil evaporation and plant transpiration.

3.3.3 Hybrid model 3: H-PM-gs.

The third hybrid model is based on the Penman-Monteith (PM) equation [32]. The PM equation is also referred to as the big-leaf model because it parameterizes the land surface as a single large leaf, where surface conductance (or inversely, surface resistance) represents water regulation by this big leaf.

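\[ LE = \frac{\Delta\, AE + \rho\, c_p\, g_{aH}\, VPD}{\Delta + \gamma \left(1 + \dfrac{g_{aH}}{g_s}\right)} \tag{5} \]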

where gs is the surface conductance (m s-1).

In this hybrid approach, gs is estimated using the RF algorithm with satellite and meteorological inputs. Following Zhao et al. [22], the RF predicts the logarithmic value of gs instead of the original scale, as the logarithmic value is more normally distributed. Since the RF model should be trained based on measurements, we infer gs by inverting Equation (5), and this information is used to train the RF model. In some cases, the inferred gs values may be physically unrealistic (e.g., negative) due to the uncertainty in measurements and gaH estimations. In such cases, gs is set to the minimum value (i.e., 1% quantile of inferred gs excluding negative gs) if LE <= 0, and to the maximum value (99% quantile of inferred gs excluding negative gs) if LE > 0. Similar to previous hybrid models, the upper and lower limits were selected to minimize RMSE in the validation dataset. This approach enables effective training of the RF model while ensuring accurate reproduction of LE when gs is precisely predicted.

The hybrid ET model based on the PM equation with gs estimation has become a widely used approach [22,23,26,28,29]. It is worth noting that the PM equation linearizes the saturation vapor pressure curve versus temperature (known as the Clausius‐Clapeyron relationship), which can introduce bias when the temperature difference between the land and the atmosphere is large. To resolve this issue, Zhao et al. [22] employed a quadratic form of the PM equation [91], and Chen et al. [23] employed an alternative equation to PM based on the exponential Clausius‐Clapeyron relationship [92]. We also tested the quadratic form of the PM equation and found similar performance to the original PM equation. For the sake of conciseness, we present results only for the original PM equation.
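To make the inversion step concrete, the sketch below evaluates the standard Penman-Monteith form, LE = (Δ AE + ρ cp gaH VPD) / (Δ + γ (1 + gaH/gs)), and solves it algebraically for gs. The default air density and specific heat, the function names, and the sample inputs are assumptions of this example rather than values taken from the study.

```python
import math


def penman_monteith(ae, vpd, delta, gamma, ga, gs, rho=1.2, cp=1005.0):
    """Latent heat flux LE (W m-2) from the big-leaf PM equation.

    ae (W m-2), vpd (kPa), delta and gamma (kPa K-1), ga and gs (m s-1).
    rho (kg m-3) and cp (J kg-1 K-1) are assumed typical values.
    """
    numerator = delta * ae + rho * cp * ga * vpd
    return numerator / (delta + gamma * (1.0 + ga / gs))


def infer_gs(le, ae, vpd, delta, gamma, ga, rho=1.2, cp=1005.0):
    """Invert the PM equation for surface conductance gs (m s-1).

    The result can be negative (or undefined when the denominator is zero)
    if observed LE exceeds the Penman limit, which is one way physically
    unrealistic training targets can arise.
    """
    numerator = delta * ae + rho * cp * ga * vpd
    return gamma * ga * le / (numerator - le * (delta + gamma))


# Hypothetical daily inputs.
le_true = penman_monteith(ae=400.0, vpd=1.5, delta=0.145, gamma=0.066,
                          ga=0.03, gs=0.01)
gs_back = infer_gs(le_true, ae=400.0, vpd=1.5, delta=0.145, gamma=0.066, ga=0.03)
log_gs_target = math.log(gs_back)  # the RF target is the logarithm of gs

# An LE above the Penman limit yields a negative (unrealistic) gs.
gs_bad = infer_gs(600.0, ae=400.0, vpd=1.5, delta=0.145, gamma=0.066, ga=0.03)
```

The round trip recovers the prescribed gs, while the second case shows why the inferred values need quantile-based limits before the logarithm can be taken.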

3.3.4 Hybrid model 4: H-PM-Kc.

The fourth hybrid model is also based on the PM equation. Specifically, we employ the PM-based reference ET (ET0) approach proposed by the FAO [33]. In this approach, LE (and thereby ET) can be estimated by multiplying ET0 by a crop coefficient (Kc):

\[ LE = K_c \, ET_0 \tag{6} \]

Recognizing the challenges in estimating gs, the FAO ET0 method sets gs to 1/70 (m s-1), representing well-watered grass. ET0 sometimes serves as an upper limit of ET, similar to PET, as ET rarely exceeds ET0 values. Consequently, Kc is typically not significantly larger than 1, similar to fP and fPT. However, some inferred Kc values based on observations significantly exceed theoretical expectations, similar to the cases of fP and fPT, which can degrade model performance. Therefore, we set the range of Kc from 0 to 1.77, based on the 99.2% quantile of the inferred Kc. This upper limit was also determined by tuning to minimize the RMSE of the validation dataset.

3.3.5 Hybrid model 5: H-PMRH-dRH.

The fifth hybrid ET model employs the PM equation expressed using relative humidity (PMRH) proposed by Kim et al. [35]. The PMRH ET expression is similar to the PM equation but does not incorporate surface conductance. Instead, ET is expressed as a sum of two terms: “Surface Flux Equilibrium (SFE)” ET and the ET flux driven by the vertical relative humidity difference. The equation is:

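\[ LE = \frac{RH\,\Delta}{RH\,\Delta + \gamma}\, AE + \frac{\rho\, c_p\, g_{aH}\, e^*(T)}{RH\,\Delta + \gamma}\, dRH \tag{7} \]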

where RH is atmospheric relative humidity, and dRH is the vertical relative humidity difference between the land surface and the atmosphere (i.e., dRH = RHsurf − RH, where RHsurf is land surface relative humidity). The first term represents SFE ET, an estimate of ET under equilibrium conditions, which can be easily calculated from AE and meteorological observations [93]. The multiplier in the second term, i.e., ρ cp gaH e*(T)/(RH Δ + γ), can also be estimated from meteorological observations, but dRH cannot be directly measured. Therefore, we aim to estimate dRH using the RF algorithm, noting that dRH is influenced by surface water limitations [35]. It should be noted that the original expression for ET by Kim et al. [35] used the saturation vapour pressure at the surface temperature, e*(Ts), in the second term, but we approximate it in Equation (7) by using e*(T), as the difference is marginal [35].

Similar to other hybrid approaches, the RF model in the H-PMRH-dRH model is trained using inferred dRH from in-situ observations by inverting Equation (7). It should be noted that, unlike other water stress parameters, dRH can be both positive and negative, depending on whether the land surface is wet or dry relative to atmospheric conditions. The predicted dRH from the RF model is then used in Equation (7) to estimate ET.

3.3.6 Hybrid model 6: H-PMRH-FRH.

Recognizing the semi-empirical nature of gaH, the sixth hybrid model employs an ML algorithm to estimate the product gaH dRH instead of dRH, while using the PMRH equation.

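\[ LE = \frac{RH\,\Delta}{RH\,\Delta + \gamma}\, AE + \frac{\rho\, c_p\, e^*(T)}{RH\,\Delta + \gamma}\, F_{RH} \tag{8} \]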

where FRH = gaH dRH is the relative humidity flux, constrained by the vertical difference of relative humidity and aerodynamic conductance.

The target empirical parameter FRH is estimated using the RF algorithm, trained with inferred FRH values from measurements. Since dRH can be both positive and negative, FRH can also be both positive and negative. The H-PMRH-FRH model is quite similar to the H-PMRH-dRH model, but its physical component of the equation is not affected by the uncertainty caused by the semi-empirical gaH equation.
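The two PMRH-based models can be compared side by side in a short sketch. It assumes the PMRH form LE = RH Δ/(RH Δ + γ) AE + ρ cp gaH e*(T)/(RH Δ + γ) dRH, with FRH = gaH dRH for the sixth model; the default air density and specific heat, the function names, and the sample inputs are assumptions of this example.

```python
def le_from_drh(ae, rh, d_rh, es, delta, gamma, ga, rho=1.2, cp=1005.0):
    """H-PMRH-dRH form: LE (W m-2) from the vertical RH difference dRH.

    ae (W m-2), rh and d_rh (fractions), es = e*(T) (kPa),
    delta and gamma (kPa K-1), ga (m s-1).
    """
    denom = rh * delta + gamma
    sfe_term = rh * delta / denom * ae            # surface flux equilibrium ET
    gradient_term = rho * cp * ga * es / denom * d_rh
    return sfe_term + gradient_term


def le_from_frh(ae, rh, f_rh, es, delta, gamma, rho=1.2, cp=1005.0):
    """H-PMRH-FRH form: ga no longer appears in the physical equation."""
    denom = rh * delta + gamma
    return rh * delta / denom * ae + rho * cp * es / denom * f_rh


def infer_drh(le, ae, rh, es, delta, gamma, ga, rho=1.2, cp=1005.0):
    """Invert the dRH form to obtain the RF training target from observed LE."""
    denom = rh * delta + gamma
    sfe_term = rh * delta / denom * ae
    return (le - sfe_term) * denom / (rho * cp * ga * es)


# Hypothetical daily inputs; a surface drier than the air gives dRH < 0.
args = dict(ae=400.0, rh=0.5, es=2.34, delta=0.145, gamma=0.066)
le_sfe = le_from_drh(d_rh=0.0, ga=0.03, **args)    # equilibrium case
le_dry = le_from_drh(d_rh=-0.1, ga=0.03, **args)   # water-limited surface
le_flux = le_from_frh(f_rh=0.03 * -0.1, **args)    # same state via FRH
drh_back = infer_drh(le_dry, ga=0.03, **args)
```

When dRH is zero the estimate collapses to the SFE term, so the learned parameter only has to describe departures from equilibrium.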

3.4 Model evaluation

The root-mean-square error (RMSE) was used as the primary statistical evaluation metric. Additionally, we present the ratio of the standard deviation of the modeled values to that of the observations to evaluate how well the model captures the variability of the observations. Accurately capturing the standard deviation is known to be crucial for predicting extreme hydrologic events (e.g., the Kling–Gupta efficiency introduced by Gupta et al. [94]). We also present the coefficient of determination (R2) to compare the ML performance in estimating each intermediate parameter for the hybrid models, as RMSE is not directly comparable across parameters due to the differing units of each parameter.
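The three evaluation metrics can be written out in a few lines. This sketch assumes population standard deviations and takes R2 as the squared Pearson correlation; both choices, along with the function names and sample values, are assumptions of the example rather than details stated in the text.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], mod: Sequence[float]) -> float:
    """Root-mean-square error between observations and model estimates."""
    return math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / len(obs))


def _mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def _std(x: Sequence[float]) -> float:
    mu = _mean(x)
    return math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))


def std_ratio(obs: Sequence[float], mod: Sequence[float]) -> float:
    """Ratio of modeled to observed standard deviation (1 is ideal)."""
    return _std(mod) / _std(obs)


def r_squared(obs: Sequence[float], mod: Sequence[float]) -> float:
    """Squared Pearson correlation between observations and estimates."""
    mo, mm = _mean(obs), _mean(mod)
    cov = sum((o - mo) * (m - mm) for o, m in zip(obs, mod)) / len(obs)
    return (cov / (_std(obs) * _std(mod))) ** 2


# Hypothetical LE values (W m-2): the model is biased by +0.5 everywhere.
obs = [1.0, 2.0, 3.0, 4.0]
mod = [1.5, 2.5, 3.5, 4.5]
scores = (rmse(obs, mod), std_ratio(obs, mod), r_squared(obs, mod))
```

A constant bias shows up only in RMSE here, which illustrates why the standard deviation ratio and R2 are reported alongside it.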

To assess the ability of the models to perform under extreme conditions, such as droughts and heatwaves, we sampled ET values corresponding to the 0th–3rd percentiles (i.e., < 3%) and 97th–100th percentiles (i.e., > 97%) of environmental variables for each site using test set results [22,27].
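The percentile-based sampling of extreme conditions can be sketched for a single site and a single environmental variable as follows. The interpolation rule used for the percentile, the function names, and the synthetic driver series are assumptions of this example.

```python
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def extreme_indices(
    driver: Sequence[float], low_q: float = 3.0, high_q: float = 97.0
) -> Tuple[List[int], List[int]]:
    """Indices of days below the low percentile and above the high percentile.

    Intended to be applied separately to each site's test-set records.
    """
    low_limit = percentile(driver, low_q)
    high_limit = percentile(driver, high_q)
    low_idx = [i for i, v in enumerate(driver) if v < low_limit]
    high_idx = [i for i, v in enumerate(driver) if v > high_limit]
    return low_idx, high_idx


# Synthetic driver (e.g., soil moisture rank) for one site: 0, 1, ..., 100.
driver = [float(i) for i in range(101)]
dry_days, wet_days = extreme_indices(driver)
```

The returned indices would then be used to subset observed and modeled ET before computing the error metrics for the extreme subsets.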

4 Derivative-based global sensitivity assessment

We evaluate the sensitivity of each hybrid model to empirical parameters, which are key components in understanding model behavior. The physically-based components of the hybrid ET models are analytically differentiable, enabling the straightforward calculation of first-order sensitivities through partial derivatives:

\[ S_x = \frac{\partial LE}{\partial x} \tag{9} \]

where x represents the intermediate empirical parameter in each hybrid model: fP, fPT, Kc, gs, dRH, and FRH.

To compare the impacts of uncertainty in the ML-derived empirical parameter (x) on the resulting ET estimates, and to standardize these impacts across the different units of ∂LE/∂x, a sensitivity index can be used by scaling the local sensitivity coefficients by the characteristic range of variation of each parameter (e.g., standard deviation, σx) [95] to provide a local sensitivity index:

\[ SI_x = \sigma_x \, \frac{\partial LE}{\partial x} \tag{10} \]

However, our primary interest lies in assessing global sensitivities to compare the impact of each empirical parameter across the different hybrid models. While variance-based methods like Sobol’s are commonly employed, the analytically differentiable nature of our hybrid models’ physical components allows us to use Derivative-based Global Sensitivity Measures (DGSM) [96]. This approach is supported by research demonstrating theoretical links with the well-established Sobol’s method [97]. The DGSM approach uses the following equation:

\[ v = E\!\left[\left(\frac{\partial LE}{\partial x}\right)^{2}\right] \tag{11} \]

where E denotes expectation, so that v is the expectation value of (∂LE/∂x)². While DGSM methods typically involve estimating partial derivatives numerically, in this study we utilize analytical solutions obtained by differentiating the physical equations. This approach not only enhances the precision and reliability of our sensitivity measures but also simplifies implementation.

Analogous to the local sensitivity index, we define the derivative-based global sensitivity index as follows.

\[ GSI_x = \sigma_x \sqrt{v} \tag{12} \]

This index is particularly useful as it is expressed in the same units as the output ET, revealing the variability of ET due to the variability of the empirical parameters. This facilitates a direct comparison across the empirical parameters, enhancing our understanding of their importance in different hybrid modeling frameworks.
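A minimal numerical sketch of the derivative-based measure follows, using the H-PT model because its derivative is especially simple: ∂LE/∂fPT = 1.26 Δ/(Δ + γ) AE. The sketch assumes the global index is the parameter standard deviation multiplied by the square root of the mean squared derivative, and it uses a population standard deviation; these choices, the function names, and the sample values are assumptions of the example.

```python
import math
from typing import Sequence


def d_le_d_fpt(ae: float, delta: float, gamma: float) -> float:
    """Analytical derivative of the H-PT equation with respect to fPT."""
    return 1.26 * delta / (delta + gamma) * ae


def dgsm(derivatives: Sequence[float]) -> float:
    """Sample estimate of the expectation of the squared derivative."""
    return sum(d * d for d in derivatives) / len(derivatives)


def global_sensitivity_index(
    derivatives: Sequence[float], parameter_values: Sequence[float]
) -> float:
    """Scale sqrt(DGSM) by the parameter standard deviation.

    The result has the units of LE (W m-2), so indices for parameters with
    different units can be compared directly.
    """
    mu = sum(parameter_values) / len(parameter_values)
    sigma = math.sqrt(
        sum((p - mu) ** 2 for p in parameter_values) / len(parameter_values)
    )
    return sigma * math.sqrt(dgsm(derivatives))


# Hypothetical daily records: (AE in W m-2, delta and gamma in kPa K-1, fPT).
days = [(100.0, 0.10, 0.066, 0.4), (300.0, 0.145, 0.066, 0.7),
        (500.0, 0.19, 0.066, 0.9)]
derivs = [d_le_d_fpt(ae, delta, gamma) for ae, delta, gamma, _ in days]
index_fpt = global_sensitivity_index(derivs, [f for *_, f in days])

# Hand-checkable case: derivatives 3 and 4, parameter values 0 and 2.
check = global_sensitivity_index([3.0, 4.0], [0.0, 2.0])
```

Repeating the calculation with the analytical derivative of each of the other five equations would give indices that can be ranked against model RMSE.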

5 Results

5.1 Tuning hyperparameters vs. employing hybrid approaches

We first examine the impact of employing a hybrid modeling approach on model performance compared to hyperparameter tuning during model training (Fig 3). Starting with the baseline Pure-ML model where ntree is set to 100, we observed that increasing ntree to 500 led to a slight improvement in model performance, as indicated by a small reduction in RMSE. Tuning the mtry parameter from its default value (i.e., number of input variables divided by three) to the optimal value resulted in an additional, though modest, decrease in RMSE.

Fig 3. Comparison of model performance changes during the RF hyperparameter optimization of the Pure-ML model and the implementation of the best hybrid model.

The optimization process includes: (1) increasing ntree from 100 to 500, and (2) tuning mtry from its default value (number of input variables divided by three) to the optimal value. This hyperparameter tuning is then compared with the performance improvement when selecting the best hybrid model, identified as the H-PMRH-FRH model. These results are based on the training set, with ntree = 100 serving as the baseline setting.

https://doi.org/10.1371/journal.pone.0328798.g003

However, when comparing these Pure-ML results to the performance of the hybrid models, the difference becomes more pronounced. The H-PMRH-FRH model, identified as the best-performing hybrid model, achieved the lowest RMSE in the training/validation dataset. Replacing the Pure-ML model with the H-PMRH-FRH model reduced RMSE by approximately 0.35 W m−2. Although the absolute value of this reduction may seem small, it is substantial when compared to the more modest improvements gained from hyperparameter tuning. This comparison underscores that the hybrid approach can provide a greater improvement in model accuracy than hyperparameter tuning.

5.2 Test set performance

Next, we evaluated the test set performance of the Pure-ML model and six hybrid models for estimating LE (Fig 4 and Table 3). These test set results were based on ntree set to 500 and the optimal mtry for each model. While the daily LE estimation performance of the Pure-ML model and the six hybrid models was similar, the hybrid models generally achieved lower RMSE, with the exception of the H-Penman model.

Table 3. Summary of test set model performance. The hybrid models are denoted as follows. H1: H-Penman, H2: H-PT, H3: H-PM-gs, H4: H-PM-Kc, H5: H-PMRH-dRH, and H6: H-PMRH-FRH.

https://doi.org/10.1371/journal.pone.0328798.t003

Fig 4. Comparison of test set model performance of the Pure-ML model (a) and the six hybrid models (b-g). The dashed lines represent the one-to-one lines. The y-axis represents energy balance corrected LE observations from eddy covariance, while the x-axis shows the model-estimated values.

RMSE is given in W m-2. For reference, 28.46 W m-2 of LE for 24 hours is equivalent to 1 mm day-1 of ET at 15 °C.

https://doi.org/10.1371/journal.pone.0328798.g004
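The unit conversion quoted in the Fig 4 caption can be reproduced with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the default latent heat of vaporization (2.459 × 10^6 J kg-1) is an assumption back-calculated from the 28.46 W m-2 figure rather than a value stated in the text.

```python
SECONDS_PER_DAY = 86_400.0


def le_to_et_mm_per_day(le_w_m2, latent_heat_j_per_kg=2.459e6):
    """Convert a daily mean latent heat flux (W m-2) to ET (mm day-1).

    One kilogram of water spread over one square metre is a 1 mm layer, so
    ET = LE * seconds per day / latent heat of vaporization.
    Assumption: the default latent heat is inferred from the caption's
    28.46 W m-2 per mm day-1 equivalence.
    """
    return le_w_m2 * SECONDS_PER_DAY / latent_heat_j_per_kg


print(round(le_to_et_mm_per_day(28.46), 3))  # the caption's reference value
print(round(le_to_et_mm_per_day(17.8), 3))   # an RMSE of 17.8 W m-2 in water units
```

Expressed this way, RMSE differences of a few tenths of a W m-2 correspond to roughly a hundredth of a millimetre per day.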

Regarding the ratio of standard deviations, the H-PT model showed the value closest to 1, followed by the H-PMRH-FRH model. Notably, the H-PMRH-FRH model demonstrated the most significant improvement compared to the Pure-ML model, showing the lowest RMSE among all models and the second-highest ratio of standard deviations. The consistency of the H-PMRH-FRH model’s lowest RMSE in both the test set and training/validation set evaluations further confirms its superior performance.

Fig 5 presents a comparison of the test set RMSE differences between the hybrid models and the Pure-ML model across different land cover types. The boxplots illustrate the distribution of RMSE differences for each hybrid model across the test sites within each land cover type. The white triangles indicate the mean RMSE difference for each model within each land cover type, offering a summary measure of performance that is not grouped by site. A negative RMSE difference indicates that a hybrid model outperforms the Pure-ML model, reflecting a reduction in error.
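The per-site RMSE differences that the boxplots summarize can be computed along the following lines. The site names and LE values are invented for illustration, and the helper functions are hypothetical rather than taken from the study's code.

```python
import math


def rmse(obs, est):
    """Root-mean-square error between observations and estimates."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def rmse_difference(obs, hybrid_est, pure_ml_est):
    """Hybrid minus Pure-ML; a negative value means the hybrid has lower error."""
    return rmse(obs, hybrid_est) - rmse(obs, pure_ml_est)


# Hypothetical daily LE (W m-2) at two sites: (observed, hybrid, Pure-ML).
sites = {
    "site_A": ([100.0, 120.0, 80.0], [102.0, 118.0, 81.0], [105.0, 114.0, 84.0]),
    "site_B": ([40.0, 60.0, 50.0], [43.0, 58.0, 52.0], [42.0, 59.0, 51.0]),
}

per_site = {name: rmse_difference(*data) for name, data in sites.items()}
mean_difference = sum(per_site.values()) / len(per_site)

for name, diff in per_site.items():
    print(f"{name}: {diff:+.2f} W m-2")
print(f"mean of site differences: {mean_difference:+.2f} W m-2")
```

In this made-up case the hybrid model helps at one site and hurts at the other, which is the kind of spread a boxplot of site-level differences is meant to reveal.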

Fig 5. Boxplot comparison of test set RMSE differences (Hybrid minus Pure-ML) across the IGBP land cover types: all combined (All), cropland (CRO), deciduous broadleaf and evergreen needleleaf forests (DBF/ENF), grassland (GRA), open shrubland/closed shrubland (OSH/CSH), and wetland (WET).

The RMSE difference is shown for each hybrid model relative to the Pure-ML model. The number of sites in each land cover type is indicated by n. The box plots show the distribution of RMSE differences across individual sites, while the white triangles represent the mean RMSE difference across all sites in each IGBP category, not grouped by each site. Negative values indicate that the hybrid model outperforms the Pure-ML model.

https://doi.org/10.1371/journal.pone.0328798.g005

The results show that, on average, the hybrid models tend to outperform the Pure-ML model across most land cover types. The H-PMRH-FRH model consistently shows the largest reductions in RMSE across all land cover types, except for wetlands, and it was the only model to show performance improvement across all IGBP categories. In wetlands, Penman or Penman-Monteith-based models perform better, likely due to the saturated land surface conditions. However, the H-Penman model shows higher RMSE differences in land cover types such as CRO and GRA, indicating that it may not perform as effectively in water-limited environments.

5.3 Intermediate parameters’ predictability and sensitivity

The test set performance of the six hybrid ET models reveals varying levels of accuracy across the models. In this section, we explore the underlying reasons for these differences in accuracy. Our first focus is on determining whether better performance of the ML algorithm in estimating each model’s intermediate parameter (i.e., fP, fPT, Kc, gs, dRH, and FRH) correlates with improved ET estimation (Fig 6). Since the units of the intermediate parameters differ, RMSE is not an appropriate metric. Instead, we use R2 as a more suitable statistical measure for comparing the predictability of these intermediate parameters. Here, R2 is based on the test set results.
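The reason R2 is preferred over RMSE here is that it does not depend on the units of the parameter. The sketch below illustrates this with invented values. It assumes R2 is the coefficient of determination (1 − SS_res/SS_tot); the squared Pearson correlation is another common definition, and the text does not specify which was used.

```python
def r_squared(obs, est):
    """Coefficient of determination, 1 - SS_res / SS_tot (unitless).

    Assumption: this definition is used; the squared Pearson correlation
    is a common alternative.
    """
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical parameter values; changing the units rescales both series.
obs = [0.2, 0.5, 0.9, 1.1]
est = [0.3, 0.4, 1.0, 1.0]
scaled_obs = [1000.0 * v for v in obs]
scaled_est = [1000.0 * v for v in est]

r2 = r_squared(obs, est)
r2_scaled = r_squared(scaled_obs, scaled_est)
print(round(r2, 4), round(r2_scaled, 4))
```

RMSE for the rescaled series would be a thousand times larger, whereas R2 is unchanged, so parameters such as gs and Kc can be compared on the same footing.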

Fig 6. Relationship between the test set RMSE of the hybrid ET models and the test set R2 of each model’s intermediate parameter estimated by ML.

Pearson’s correlation coefficient (R) between the x-axis and y-axis values is presented in the upper-left corner.

https://doi.org/10.1371/journal.pone.0328798.g006

Interestingly, the results show that a higher R2 for intermediate parameter estimation does not necessarily correspond to better ET estimation performance. In fact, with the exception of one outlier, the H-PT model, lower R2 values for the intermediate parameters are generally associated with higher ET estimation performance as assessed using RMSE values. For instance, the H-Penman model, which has the highest R2 for its intermediate parameter, has the lowest performance in ET estimation (i.e., the highest RMSE). This counterintuitive finding suggests that when ML techniques are used to estimate a parameter that is inherently more difficult to predict, it may enhance the overall accuracy of the final ET prediction. This could be because the ML model is better leveraged when tasked with predicting more complex or less predictable parameters, thus maximizing its capability and impact on the final model performance.

Next, we conducted a sensitivity analysis. As we indicated earlier, we hypothesized that the lower sensitivity of ET estimates to ML-determined empirical parameters would lead to improved model performance. To test this hypothesis, we evaluated the relationship between the derivative-based Global Sensitivity Index and the RMSE as shown in Fig 7a. The correlation coefficient (R = 0.93, p < 0.05) between the Global Sensitivity Index (GSI) and RMSE suggests a strong positive relationship, indicating that models with lower sensitivity to empirical parameters tend to have lower RMSE values. This implies that decreased sensitivity to empirical parameters in hybrid models can contribute to reduced estimation errors, supporting our hypothesis.
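The correlation test described here amounts to computing Pearson's R across the six hybrid models. A minimal sketch follows. The GSI and RMSE values are made up for illustration and are not the study's results, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for six models (W m-2); NOT the results reported in the study.
gsi_values = [10.0, 14.0, 18.0, 22.0, 30.0, 41.0]
rmse_values = [17.9, 18.3, 18.2, 18.8, 19.0, 19.9]

r = pearson_r(gsi_values, rmse_values)
print(f"R = {r:.2f}")
```

With only six points, a significance test on R has little power, which is worth keeping in mind when interpreting correlations of this kind.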

Fig 7. Relationship between the derivative-based Global Sensitivity Index (GSI) and model performance metrics.

The x-axis represents GSI and the y-axis represents the models’ performance, either as test set RMSE (a) or as scaled standard deviation (b). The dotted lines indicate the results of the Pure-ML model. The correlation coefficients (R) between GSI and the performance metrics are also presented.

https://doi.org/10.1371/journal.pone.0328798.g007

The dotted lines in Fig 7a indicate the results of the Pure-ML model, serving as a benchmark for comparison. The H-Penman model is the only hybrid model whose performance is degraded compared to the Pure-ML model, exhibiting a higher RMSE and a higher GSI than the Pure-ML model. Among the models, H-PMRH-FRH exhibited the lowest RMSE and one of the lowest GSI values, indicating its superior performance likely driven by a lower sensitivity to empirical parameters.

We can also speculate that a higher sensitivity to the empirical parameter indicates that the variability from the physical part of the model is small, implying a lesser impact from the physical model. This interpretation would be reasonable if the total standard deviation of ET is not small when GSI is small. We found that a small GSI does not necessarily correspond to a small variability of estimated ET. In fact, the standard deviation of estimated ET is negatively correlated with GSI, although this relationship is not statistically significant (R = −0.7, p = 0.083) (Fig 7b). A negative correlation implies that a model’s sensitivity to empirical parameters is not the sole determinant of its overall variability. Even if a model has low sensitivity to these parameters, it can still exhibit high variability due to significant contributions from the physical equations governing the process.

5.4 Model performance for extreme conditions

In this section, we evaluate how well each hybrid model improved ET estimation compared to the Pure-ML model under extreme conditions (Fig 8). The definition of extreme conditions can be found in section 3.3. The performance improvement varied significantly depending on the specific extreme conditions. However, when considering the median model improvement across all extremes tested, the H-PMRH-FRH model generally shows consistently large performance improvements. Notably, this result remains consistent even when the threshold defining extreme conditions is changed from the upper or lower 3% to the upper or lower 1% or 2%.
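Selecting the extreme subsets can be sketched as below. The sketch assumes that the thresholds are percentiles of one variable's values (for example, one site's test set data) and uses invented values. The function name and the percentile interpolation rule of Python's `statistics.quantiles` are illustrative choices, not necessarily those used in the study.

```python
import statistics


def extreme_indices(values, tail_percent=3):
    """Return (low, high) index lists for the lower and upper tails.

    Assumption: thresholds are the tail_percent-th and
    (100 - tail_percent)-th percentiles of the supplied values.
    """
    cuts = statistics.quantiles(values, n=100)  # 99 percentile cut points
    low_cut = cuts[tail_percent - 1]
    high_cut = cuts[99 - tail_percent]
    low = [i for i, v in enumerate(values) if v < low_cut]
    high = [i for i, v in enumerate(values) if v > high_cut]
    return low, high


# Hypothetical values of one input variable (e.g., Rg) on 100 days.
rg = [float(v) for v in range(1, 101)]
low, high = extreme_indices(rg)
print(low, high)
```

The RMSE of each model would then be computed on the days indexed by `low` or `high`, and calling the function with `tail_percent=1` or `tail_percent=2` gives the stricter definitions mentioned in the text.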

Fig 8. Model performance improvement using the hybrid approach for LE estimation under extreme environmental conditions.

The performance improvement is quantified as differences in RMSE between the hybrid approaches and the Pure-ML model. Each boxplot represents the distribution of RMSE differences for a specific hybrid model compared to the Pure-ML model, across different environmental extremes. All input variables of the ML algorithm were used here and are shown as jittered points in different colors. Extremes are defined as lower (< 3%: downward pointing triangles) and higher (> 97%: upward pointing triangles) quantiles of test set data.

https://doi.org/10.1371/journal.pone.0328798.g008

The H-PT model exhibits large performance degradation under low radiation conditions (Rg < 3%, that is, within the lowest 3 percentiles of observed Rg for each site) and low temperature conditions (T < 3%), resulting in a wide distribution of model performance in extreme conditions. This degradation is likely due to the PT equation’s structure, which does not account for the aerodynamic term; this omission matters most when temperature or radiation is low and sensible heat advection is strong. Interestingly, the H-PT model displays the highest scaled standard deviation overall (Fig 7b), yet this did not translate to model improvements under extreme conditions (Fig 8).

6 Discussion

6.1 Underlying reasons for the outperformance of PMRH-based models

Physically constrained hybrid approaches for estimating ET have evolved rapidly in recent years. While the specific implementations vary, most previous studies adopt one of the physical formulations used in our work in developing hybrid models [22–30]. These prior studies generally report comparable performance between pure ML and hybrid models, while maintaining consistency with physical constraints. This aligns with our findings, where all tested models showed relatively similar performance, while some hybrid models showed moderate performance improvement.

A unique contribution of the present study lies in systematically evaluating multiple hybrid structures, which enabled us to explore how the integration of physical constraints can enhance model performance. Specifically, we found that hybrid models are more accurate when the intermediate parameter estimated by machine learning is both difficult to predict and less influential in propagating error into the final ET estimate. Notably, the two PMRH-based hybrid models demonstrated the lowest GSI values, leading to the lowest RMSE under both normal and extreme conditions. Here, we explore the theoretical reasons for this finding.

Salvucci and Gentine [98] showed that vertical gradients of relative humidity tend to be stable due to land-atmosphere equilibration, resulting in minimized variance of the relative humidity gradients. Later studies showed that the atmospheric boundary layer tends to stabilize atmospheric relative humidity over multi-day timescales, resulting in stable and minimal vertical relative humidity flux [93,99]. Indeed, previous studies found that setting FRH to zero can approximate ET without considering the variability of FRH, which is known as Surface Flux Equilibrium (SFE) [100,101]. This implies that the first term on the right-hand side of Eqs. 7 and 8 (representing the SFE-based ET) already approximates actual ET reasonably well, unlike other physical formulations such as the PM potential ET, which deviate more substantially from actual ET.

As a result, reducing the residual error using machine learning within the PMRH-based hybrid model may be more effective than with other approaches. Furthermore, FRH and dRH exhibit lower variability than other empirical parameters due to the stabilizing influence of atmospheric boundary layer processes, leading to less error propagation in ET estimates (Fig 7). In contrast, the widely used gs is known to be a major source of error in ET estimates due to its higher variability [102].

The improved performance of H-PMRH-FRH over H-PMRH-dRH may be attributed to the uncertainty in gaH. In section 3.2, we detail how gaH was estimated using a semi-empirical model, and explain why that introduces uncertainty, as discussed in a previous study [26]. The H-PMRH-FRH model may reduce this uncertainty compared to H-PMRH-dRH by embedding gaH within the empirical parameter, FRH. Additionally, negative feedback between gaH and dRH (e.g., higher gaH corresponding to lower dRH due to faster mixing) results in more stable FRH values compared to dRH. Variable importance analysis supports this feedback (Fig 9), showing that gaH is the most important variable for predicting dRH but not for FRH, even though gaH is embedded in FRH.

Fig 9. Variable importance in estimating LE using the Pure-ML model (a), and variable importance in estimating each empirical parameter in the six hybrid models (b-g).

Here, %IncMSE indicates the increase of the mean squared error when variables are randomly permuted. In-situ meteorological measurements are shown in blue (Meteo), and satellite-derived land surface parameters are shown in orange (Satellite).

https://doi.org/10.1371/journal.pone.0328798.g009

Moreover, H-PMRH-FRH is unique among the hybrid models in identifying satellite inputs as the most important variables (Fig 9). This suggests that meteorological data is effectively integrated into the physical equation, while the empirical parameter FRH relies more on land surface information that is not included in the physical model.
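The permutation idea behind the variable importance in Fig 9 can be sketched as follows. This is a simplified illustration rather than the random forest implementation used in the study: it evaluates on the supplied data instead of out-of-bag samples, it expresses the increase as a percentage of the baseline MSE (an assumption about scaling), and the model, data, and function names are hypothetical.

```python
import random


def mse(y, pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def percent_inc_mse(predict, rows, y, column, seed=0):
    """Percent increase in MSE after randomly permuting one input column.

    Simplification: uses the supplied data rather than out-of-bag samples.
    """
    baseline = mse(y, [predict(r) for r in rows])
    shuffled = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted_rows = [r[:column] + [v] + r[column + 1:]
                     for r, v in zip(rows, shuffled)]
    permuted = mse(y, [predict(r) for r in permuted_rows])
    return 100.0 * (permuted - baseline) / baseline


# Hypothetical model that depends only on column 0; column 1 is irrelevant.
rows = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * r[0] + (0.1 if i % 2 else -0.1) for i, r in enumerate(rows)]
predict = lambda r: 2.0 * r[0]

importance = [percent_inc_mse(predict, rows, y, c) for c in (0, 1)]
print(importance)
```

Permuting an input that the model does not use leaves the error unchanged, which is why a low importance for gaH in predicting FRH suggests that the forest gains little from that input.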

6.2 Potential caveats

Despite the valuable insights it provides, our analysis is subject to several limitations. First, we trained and evaluated our models using EC measurements, which incorporate systematic uncertainties due to the lack of energy balance closure. This means that the measured sum of latent and sensible heat fluxes is typically smaller than the measured AE, posing a challenge when employing physical models, as all the tested models assume energy balance closure. To address this issue, we utilized energy balance-corrected EC data based on the Bowen ratio preservation method [39,40]. However, it should be noted that this correction may result in an overestimation of ET [103].
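The principle of the Bowen ratio preservation correction can be sketched in a few lines. The example is a simplified illustration with invented flux values and a hypothetical function name. Operational processing pipelines typically derive the correction factor over moving time windows rather than from a single pair of values.

```python
def bowen_ratio_closure(le, h, available_energy):
    """Scale LE and H by one factor so that LE + H equals available energy.

    Because both fluxes share the same factor, the Bowen ratio (H / LE)
    is unchanged by the correction.
    """
    factor = available_energy / (le + h)
    return le * factor, h * factor


# Hypothetical daily means (W m-2) with an energy balance gap of 30 W m-2.
le_corr, h_corr = bowen_ratio_closure(le=100.0, h=50.0, available_energy=180.0)
print(le_corr, h_corr)
```

Since the whole gap is distributed in proportion to the measured fluxes, LE is always adjusted upward when the measured fluxes fall short of the available energy, which is consistent with the possible overestimation of ET noted above.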

Second, we utilized not only meteorological variables measured in the field but also satellite-derived data. However, due to the discrepancy between the footprint size representing EC observations and the pixel size representing satellite remote sensing, remotely sensed variables may not accurately represent the land surface conditions observed by EC measurements, especially over heterogeneous land cover. For example, the SMAP soil moisture product has a spatial resolution of approximately 9 km, which is substantially larger than the typical EC footprint. As a result, the satellite-derived variables may not fully capture the local conditions observed by EC towers, potentially affecting model performance. Nonetheless, since all tested models, both hybrid and pure ML, used the same set of satellite inputs, the influence of this uncertainty on the machine-learned parameters is expected to be comparable across models. Therefore, the model comparisons remain valid despite this limitation. Moreover, satellite-derived information proved to be influential in determining empirical parameters in our variable importance analysis (Fig 9). This finding implies the suitability of utilizing satellite remote sensing as a predictor of site-level ET.

Third, in this study we employed only the RF algorithm to evaluate the performance of hybrid models relative to a pure ML model. Including additional machine learning algorithms would significantly increase the complexity of the model comparison. Therefore, we focused on RF, which has been shown to perform well in estimating ET compared to other ML algorithms [15]. Nonetheless, our findings remain subject to validation using alternative ML approaches such as artificial neural networks. Future studies could address this limitation by testing multiple ML algorithms within a simplified set of hybrid model configurations.

Finally, although we explore performance differences among the hybrid models and the Pure-ML model, the disparities in overall model performance were relatively small, particularly in water units. This could be attributed to the ML model already performing at close to the measurement uncertainty [104]. Nevertheless, even these small differences in daily timescale errors can have a significant impact when a model predicts ET during extreme events, or when models are applied outside the current climatic ranges. Also, this accuracy enhancement is substantial compared to that gained from hyperparameter tuning of the RF model.

7 Conclusions

In this study, we evaluated physically-constrained hybrid ET models to enhance our understanding of how hybrid models improve ET estimation performance over pure ML approaches. Our key findings include: (1) employing hybrid models significantly enhances performance over the pure ML model, especially when compared to the gains from hyperparameter tuning of the RF algorithm; (2) the best-performing hybrid model, H-PMRH-FRH, consistently demonstrates high performance across the training/validation set, test set, various land cover types, and under extreme conditions; (3) generally, the more challenging it is to estimate the intermediate parameter requiring ML, the more accurate is the final ET prediction; and (4) minimizing the sensitivity of ET estimates to ML-determined parameters is crucial, as it reduces error propagation and leads to more robust performance improvements.

Our evaluation of six hybrid ET models in both normal and extreme conditions demonstrated that using vertical relative humidity flux as the ML-determined parameter gave the best performance compared to conventional approaches. This superior performance can be attributed to the relatively small variability of relative humidity flux, likely explained by Surface Flux Equilibrium theory, and to the smaller impact of uncertainty introduced by the semi-empirical aerodynamic conductance equation. This highlights the critical importance of domain knowledge in selecting appropriate physical models and parameter combinations for hybrid model development. Also, our results challenge the utility of widely used conventional parameters in hybrid approaches, as these parameters, such as surface conductance, may not be the best option for hybrid methods due to their high variability and sensitivity.

While our study focused on ET, these insights are likely applicable to hybrid models for other quantities that require empirical parameters, even when physical processes are well defined. By leveraging domain knowledge, researchers can select better physical structures and parameters, ultimately enhancing the performance and reliability of hybrid models across various applications. For instance, similar approaches could be beneficial in modeling other hydrological or atmospheric processes.

Acknowledgments

We express our thanks to the AmeriFlux data providers, site investigators, technicians, and AmeriFlux Management Project team, without whom this effort would not have been possible.

References

  1. Katul GG, Oren R, Manzoni S, Higgins C, Parlange MB. Evapotranspiration: A process driving mass transport and energy exchange in the soil‐plant‐atmosphere‐climate system. Rev Geophys. 2012;50(3).
  2. Wang K, Dickinson RE. A review of global terrestrial evapotranspiration: Observation, modeling, climatology, and climatic variability. Rev Geophys. 2012;50(2).
  3. Fisher JB, Melton F, Middleton E, Hain C, Anderson M, Allen R, et al. The future of evapotranspiration: Global requirements for ecosystem functioning, carbon and climate feedbacks, agricultural management, and water resources. Water Resour Res. 2017;53(4):2618–26.
  4. Fisher JB, Tu KP, Baldocchi DD. Global estimates of the land–atmosphere water flux based on monthly AVHRR and ISLSCP-II data, validated at 16 FLUXNET sites. Remote Sens Environ. 2008;112(3):901–19.
  5. Leuning R, Zhang YQ, Rajaud A, Cleugh H, Tu K. A simple surface conductance model to estimate regional evaporation using MODIS leaf area index and the Penman‐Monteith equation. Water Resour Res. 2008;44(10).
  6. Mu Q, Zhao M, Running SW. Improvements to a MODIS global terrestrial evapotranspiration algorithm. Remote Sens Environ. 2011;115(8):1781–800.
  7. García M, Sandholt I, Ceccato P, Ridler M, Mougin E, Kergoat L, et al. Actual evapotranspiration in drylands derived from in-situ and satellite data: Assessing biophysical constraints. Remote Sens Environ. 2013;131:103–18.
  8. Martens B, Miralles DG, Lievens H, van der Schalie R, de Jeu RAM, Fernandez-Prieto D, et al. GLEAM v3: satellite-based land evaporation and root-zone soil moisture. Geosci Model Dev. 2017;10(5):1903–25. pmid:WOS:000401485700001
  9. Melton FS, Huntington J, Grimm R, Herring J, Hall M, Rollison D, et al. OpenET: filling a critical data gap in water management for the Western United States. J American Water Resour Assoc. 2021;58(6):971–94.
  10. Jung M, Reichstein M, Ciais P, Seneviratne SI, Sheffield J, Goulden ML, et al. Recent decline in the global land evapotranspiration trend due to limited moisture supply. Nature. 2010;467(7318):951–4. pmid:20935626
  11. Tramontana G, Jung M, Schwalm CR, Ichii K, Camps-Valls G, Ráduly B, et al. Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms. Biogeosciences. 2016;13(14):4291–313.
  12. Xu T, Guo Z, Liu S, He X, Meng Y, Xu Z, et al. Evaluating different machine learning methods for upscaling evapotranspiration from flux towers to the regional scale. JGR Atmospheres. 2018;123(16):8674–90.
  13. Shang K, Yao Y, Liang S, Zhang Y, Fisher JB, Chen J, et al. DNN-MET: A deep neural networks method to integrate satellite-derived evapotranspiration products, eddy covariance observations and ancillary information. Agric Forest Meteorol. 2021;308–309:108582.
  14. Bodesheim P, Jung M, Gans F, Mahecha MD, Reichstein M. Upscaled diurnal cycles of land–atmosphere fluxes: a new global half-hourly data product. Earth Syst Sci Data. 2018;10(3):1327–65.
  15. Shi H, Luo G, Hellwich O, Xie M, Zhang C, Zhang Y, et al. Evaluation of water flux predictive models developed using eddy-covariance observations and machine learning: a meta-analysis. Hydrol Earth Syst Sci. 2022;26(18):4603–18.
  16. Amani S, Shafizadeh-Moghadam H. A review of machine learning models and influential factors for estimating evapotranspiration using remote sensing and ground-based data. Agric Water Manag. 2023;284:108324.
  17. Babaeian E, Paheding S, Siddique N, Devabhaktuni VK, Tuller M. Short- and mid-term forecasts of actual evapotranspiration with deep learning. J Hydrol. 2022;612:128078.
  18. Jung M, Koirala S, Weber U, Ichii K, Gans F, Camps-Valls G, et al. The FLUXCOM ensemble of global land-atmosphere energy fluxes. Sci Data. 2019;6(1):74. pmid:31133670
  19. Karpatne A, Atluri G, Faghmous JH, Steinbach M, Banerjee A, Ganguly A, et al. Theory-Guided data science: a new paradigm for scientific discovery from data. IEEE Trans Knowl Data Eng. 2017;29(10):2318–31.
  20. Shen C, Appling AP, Gentine P, Bandai T, Gupta H, Tartakovsky A, et al. Differentiable modelling to unify machine learning and physical models for geosciences. Nat Rev Earth Environ. 2023;4(8):552–67.
  21. Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, et al. Deep learning and process understanding for data-driven Earth system science. Nature. 2019;566(7743):195–204. pmid:30760912
  22. Zhao WL, Gentine P, Reichstein M, Zhang Y, Zhou S, Wen Y, et al. Physics‐Constrained Machine Learning of Evapotranspiration. Geophys Res Lett. 2019;46(24):14496–507.
  23. Chen H, Huang JJ, Dash SS, Wei Y, Li H. A hybrid deep learning framework with physical process description for simulation of evapotranspiration. J Hydrol. 2022;606:127422.
  24. Koppa A, Rains D, Hulsman P, Poyatos R, Miralles DG. A deep learning-based hybrid model of global terrestrial evaporation. Nat Commun. 2022;13(1):1912. pmid:35395845
  25. Hu X, Shi L, Lin G, Lin L. Comparison of physical-based, data-driven and hybrid modeling approaches for evapotranspiration estimation. J Hydrol. 2021;601:126592.
  26. ElGhawi R, Kraft B, Reimers C, Reichstein M, Körner M, Gentine P, et al. Hybrid modeling of evapotranspiration: inferring stomatal and aerodynamic resistances using combined physics-based and machine learning. Environ Res Lett. 2023;18(3):034039.
  27. Shang K, Yao Y, Di Z, Jia K, Zhang X, Fisher JB, et al. Coupling physical constraints with machine learning for satellite-derived evapotranspiration of the Tibetan Plateau. Remote Sens Environ. 2023;289:113519.
  28. Liu Y, Zhang S, Zhang J, Tang L, Bai Y. Assessment and Comparison of Six Machine Learning Models in Estimating Evapotranspiration over Croplands Using Remote Sensing and Meteorological Factors. Remote Sensing. 2021;13(19):3838.
  29. Guo N, Chen H, Han Q, Wang T. Evaluating data-driven and hybrid modeling of terrestrial actual evapotranspiration based on an automatic machine learning approach. J Hydrol. 2024;628:130594.
  30. Miralles DG, Bonte O, Koppa A, Baez-Villanueva OM, Tronquo E, Zhong F, et al. GLEAM4: global land evaporation and soil moisture dataset at 0.1° resolution from 1980 to near present. Sci Data. 2025;12(1):416. pmid:40064907
  31. Penman HL. Natural evaporation from open water, bare soil and grass. Proc R Soc Lond A Math Phys Sci. 1948;193(1032):120–45. pmid:18865817
  32. Monteith JL. Evaporation and environment. Cambridge: Cambridge University Press (CUP). 1965.
  33. Allen RG, Pereira LS, Raes D, Smith M. Crop evapotranspiration-Guidelines for computing crop water requirements-FAO Irrigation and drainage paper 56. 300. Rome: FAO; 1998.
  34. Priestley CHB, Taylor RJ. On the assessment of surface heat flux and evaporation using large-scale parameters. Mon Wea Rev. 1972;100(2):81–92.
  35. Kim Y, Garcia M, Morillas L, Weber U, Black TA, Johnson MS. Relative humidity gradients as a key constraint on terrestrial water and energy fluxes. Hydrol Earth Syst Sci. 2021;25(9):5175–91.
  36. Gorelick N, Hancher M, Dixon M, Ilyushchenko S, Thau D, Moore R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens Environ. 2017;202:18–27.
  37. O’Neill P, Chan S, Njoku E, Jackson T, Bindlish R, Chaubell J, et al. SMAP Enhanced L3 Radiometer Global and Polar Grid Daily 9 km EASE-Grid Soil Moisture, Version 5 [dataset]. 2021. [cited 2023-02-06]. NASA National Snow and Ice Data Center Distributed Active Archive Center. Referenced in
  38. Myneni R, Knyazikhin Y, Park T. MODIS/Terra+Aqua Leaf Area Index/FPAR 4-Day L4 Global 500m SIN Grid V061 [dataset]. 2021. [cited 2023-02-06]. NASA EOSDIS Land Processes DAAC. Referenced in
  39. Pastorello G, Trotta C, Canfora E, Chu H, Christianson D, Cheah Y-W, et al. The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Sci Data. 2020;7(1):225. pmid:32647314
  40. Twine TE, Kustas WP, Norman JM, Cook DR, Houser PR, Meyers TP, et al. Correcting eddy-covariance flux underestimates over a grassland. Agric Forest Meteorol. 2000;103(3):279–300.
  41. Staebler R. AmeriFlux FLUXNET-1F CA-Cbo Ontario - Mixed Deciduous, Borden Forest Site [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1854365
  42. Knox S. AmeriFlux FLUXNET-1F CA-DBB2 Delta Burns Bog 2 [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881564
  43. Christen A, Knox S. AmeriFlux FLUXNET-1F CA-DBB Delta Burns Bog [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881565
  44. Black TA. AmeriFlux FLUXNET-1F CA-LP1 British Columbia - Mountain pine beetle-attacked lodgepole pine stand [dataset]. 2021. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1832155
  45. Arain MA. AmeriFlux FLUXNET-1F CA-TP3 Ontario - Turkey Point 1974 Plantation White Pine [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881566
  46. Arain MA. AmeriFlux FLUXNET-1F CA-TPD Ontario - Turkey Point Mature Deciduous [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881567
  47. Billesbach D, Kueppers L, Torn M, Biraud S. AmeriFlux FLUXNET-1F US-A32 ARM-SGP Medford hay pasture [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881568
  48. Biraud S, Fischer M, Chan S, Torn M. AmeriFlux FLUXNET-1F US-ARM ARM Southern Great Plains site- Lamont [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1854366
  49. Rey-Sanchez C, Wang CT, Szutu D, Shortt R, Chamberlain SD, Verfaillie J, et al. AmeriFlux FLUXNET-1F US-Bi1 Bouldin Island Alfalfa [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1871134
  50. Rey-Sanchez C, Wang CT, Szutu D, Hemes KS, Verfaillie J, Baldocchi D. AmeriFlux FLUXNET-1F US-Bi2 Bouldin Island corn [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1871135
  51. Euskirchen E. AmeriFlux FLUXNET-1F US-BZB Bonanza Creek Thermokarst Bog [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881569
  52. Euskirchen E. AmeriFlux FLUXNET-1F US-BZF Bonanza Creek Rich Fen [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881570
  53. Euskirchen E. AmeriFlux FLUXNET-1F US-BZo Bonanza Creek Old Thermokarst Bog [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881571
  54. Euskirchen E. AmeriFlux FLUXNET-1F US-BZS Bonanza Creek Black Spruce [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881572
  55. Desai AR. AmeriFlux FLUXNET-1F US-CS2 Tri county school Pine Forest [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881577
  56. Frank J, Massman B. AmeriFlux FLUXNET-1F US-GLE GLEES [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1871136
  57. Liu H, Huang M, Chen X. AmeriFlux FLUXNET-1F US-Hn2 Hanford 100H grassland [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1902832
  58. Liu H, Huang M, Chen X. AmeriFlux FLUXNET-1F US-Hn3 Hanford 100H sagebrush [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881580
  59. Euskirchen E, Shaver G, Bret-Harte S. AmeriFlux FLUXNET-1F US-ICs Imnavait Creek Watershed Wet Sedge Tundra [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1871138
  60. 60. Euskirchen E, Shaver G, Bret-Harte S. AmeriFlux FLUXNET-1F US-ICt Imnavait Creek Watershed Tussock Tundra [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881583
  61. 61. Tweedie C. AmeriFlux FLUXNET-1F US-Jo1 Jornada Experimental Range Bajada Site [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1902833
  62. 62. Vivoni ER, Perez-Ruiz ER. AmeriFlux FLUXNET-1F US-Jo2 Jornada Experimental Range Mixed Shrubland [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881584
  63. Brunsell N. AmeriFlux FLUXNET-1F US-KFS Kansas Field Station [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881585
  64. Brunsell N. AmeriFlux FLUXNET-1F US-KLS Kansas Land Institute [dataset]. 2021. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1854367
  65. Law B. AmeriFlux FLUXNET-1F US-Me2 Metolius mature ponderosa pine [dataset]. 2021. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1854368
  66. Wood J, Gu L. AmeriFlux FLUXNET-1F US-MOz Missouri Ozark Site [dataset]. 2021. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1854370
  67. Noormets A. AmeriFlux FLUXNET-1F US-NC4 NC_AlligatorRiver [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1902837
  68. Torn M, Dengel S. AmeriFlux FLUXNET-1F US-NGC NGEE Arctic Council [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1902838
  69. Blanken PD, Monson RK, Burns SP, Bowling DR, Turnipseed AA. AmeriFlux FLUXNET-1F US-NR1 Niwot Ridge Forest (LTER NWT1) [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1871141
  70. Silveira M. AmeriFlux FLUXNET-1F US-ONA Florida pine flatwoods [dataset]. 2021. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1832163
  71. Flerchinger G. AmeriFlux FLUXNET-1F US-Rms RCEW Mountain Big Sagebrush [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881587
  72. Baker J, Griffis T, Griffis T. AmeriFlux FLUXNET-1F US-Ro1 Rosemount- G21 [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881588
  73. Baker J, Griffis T. AmeriFlux FLUXNET-1F US-Ro4 Rosemount Prairie [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881589
  74. Baker J, Griffis T. AmeriFlux FLUXNET-1F US-Ro5 Rosemount I18_South [dataset]. 2021. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1818371
  75. Baker J, Griffis T. AmeriFlux FLUXNET-1F US-Ro6 Rosemount I18_North [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881590
  76. Flerchinger G. AmeriFlux FLUXNET-1F US-Rwf RCEW Upper Sheep Prescibed Fire [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881591
  77. Flerchinger G. AmeriFlux FLUXNET-1F US-Rws Reynolds Creek Wyoming big sagebrush [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881592
  78. Shortt R, Hemes K, Szutu D, Verfaillie J, Baldocchi D. AmeriFlux FLUXNET-1F US-Sne Sherman Island Restored Wetland [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1871144
  79. Chamberlain SD, Oikawa P, Sturtevant C, Szutu D, Verfaillie J, Baldocchi D. AmeriFlux FLUXNET-1F US-Tw3 Twitchell Alfalfa [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881594
  80. NEON. AmeriFlux FLUXNET-1F US-xBR NEON Bartlett Experimental Forest (BART) [dataset]. 2022. AmeriFlux AMP. Available from: https://ameriflux.lbl.gov/data/download-data/. Referenced in https://doi.org/10.17190/AMF/1881598
  81. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
  82. Kuhn M. Caret: classification and regression training. Astrophysics Source Code Library. 2015.
  83. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2(3):18–22.
  84. Thom AS. Momentum, mass and heat exchange of vegetation. Quart J Royal Meteoro Soc. 1972;98(415):124–34.
  85. Knauer J, El-Madany TS, Zaehle S, Migliavacca M. Bigleaf-An R package for the calculation of physical and physiological ecosystem properties from eddy covariance data. PLoS One. 2018;13(8):e0201114. pmid:30106974
  86. Knauer J, Zaehle S, Medlyn BE, Reichstein M, Williams CA, Migliavacca M, et al. Towards physiologically meaningful water-use efficiency estimates from eddy covariance data. Glob Chang Biol. 2018;24(2):694–710. pmid:28875526
  87. Milly PCD, Dunne KA. Potential evapotranspiration and continental drying. Nature Climate Change. 2016;6(10):946–9.
  88. Lathuillière MJ, Dalmagro HJ, Black TA, de Arruda PHZ, Hawthorne I, Couto EG, et al. Rain-fed and irrigated cropland-atmosphere water fluxes and their implications for agricultural production in Southern Amazonia. Agric Forest Meteorol. 2018;256–257:407–19.
  89. Brown MG, Black TA, Nesic Z, Foord VN, Spittlehouse DL, Fredeen AL, et al. Evapotranspiration and canopy characteristics of two lodgepole pine stands following mountain pine beetle attack. Hydrol Process. 2013;28(8):3326–40.
  90. Baldocchi D, Knox S, Dronova I, Verfaillie J, Oikawa P, Sturtevant C, et al. The impact of expanding flooded land area on the annual evaporation of rice. Agric Forest Meteorol. 2016;223:181–93.
  91. Paw U KT, Gao W. Applications of solutions to non-linear energy budget equations. Agric Forest Meteorol. 1988;43(2):121–45.
  92. McColl KA. Practical and theoretical benefits of an alternative to the Penman‐Monteith evapotranspiration equation. Water Resour Res. 2020;56(6).
  93. McColl KA, Salvucci GD, Gentine P. Surface flux equilibrium theory explains an empirical estimate of water‐limited daily evapotranspiration. J Adv Model Earth Syst. 2019;11(7):2036–49.
  94. Gupta HV, Kling H, Yilmaz KK, Martinez GF. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J Hydrol. 2009;377(1–2):80–91.
  95. McClarren RG. Local sensitivity analysis based on derivative approximations. Uncertainty quantification and predictive computational science: a foundation for physical scientists and engineers. Cham: Springer International Publishing; 2018. p. 95–109.
  96. Kucherenko S, Rodriguez-Fernandez M, Pantelides C, Shah N. Monte Carlo evaluation of derivative-based global sensitivity measures. Reliab Eng Syst Saf. 2009;94(7):1135–48.
  97. Sobol’ IM, Kucherenko S. Derivative based global sensitivity measures and their link with global sensitivity indices. Math Comput Simul. 2009;79(10):3009–17.
  98. Salvucci GD, Gentine P. Emergent relation between surface vapor conductance and relative humidity profiles yields evaporation rates from weather data. Proc Natl Acad Sci U S A. 2013;110(16):6287–91. pmid:23576717
  99. Kim Y, Garcia M, Black TA, Johnson MS. Assessing the Complementary Role of Surface Flux Equilibrium (SFE) Theory and Maximum Entropy Production (MEP) Principle in the Estimation of Actual Evapotranspiration. J Adv Model Earth Syst. 2023;15(7).
  100. McColl KA, Rigden AJ. Emergent simplicity of continental evapotranspiration. Geophys Res Lett. 2020;47(6).
  101. Chen S, McColl KA, Berg A, Huang Y. Surface flux equilibrium estimates of evapotranspiration at large spatial scales. J Hydrometeorol. 2021;22(4):765–79.
  102. Polhamus A, Fisher JB, Tu KP. What controls the error structure in evapotranspiration models? Agric Forest Meteorol. 2013;169:12–24.
  103. Mauder M, Foken T, Cuxart J. Surface-energy-balance closure over land: a review. Boundary-Layer Meteorol. 2020;177(2–3):395–426.
  104. Richardson AD, Mahecha MD, Falge E, Kattge J, Moffat AM, Papale D, et al. Statistical properties of random CO2 flux measurement uncertainty inferred from model residuals. Agric Forest Meteorol. 2008;148(1):38–50.