
Comparison of five Boosting-based models for estimating daily reference evapotranspiration with limited meteorological variables

  • Tianao Wu,

    Roles Methodology, Software, Validation, Writing – original draft, Writing – review & editing

    Affiliations College of Agricultural Engineering, Hohai University, Nanjing, China, State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing, China, Cooperative Innovation Center for Water Safety & Hydro Science, Hohai University, Nanjing, China

  • Wei Zhang,

    Roles Data curation, Formal analysis, Visualization

    Affiliation College of Agricultural Engineering, Hohai University, Nanjing, China

  • Xiyun Jiao ,

    Roles Project administration, Supervision

    xyjiao@hhu.edu.cn

    Affiliations College of Agricultural Engineering, Hohai University, Nanjing, China, State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing, China, Cooperative Innovation Center for Water Safety & Hydro Science, Hohai University, Nanjing, China

  • Weihua Guo,

    Roles Funding acquisition, Validation, Writing – review & editing

    Affiliations College of Agricultural Engineering, Hohai University, Nanjing, China, Cooperative Innovation Center for Water Safety & Hydro Science, Hohai University, Nanjing, China

  • Yousef Alhaj Hamoud

    Roles Formal analysis, Writing – review & editing

    Affiliation College of Agricultural Engineering, Hohai University, Nanjing, China

Abstract

Accurate ET0 estimation is of great significance for effective agricultural water management and for realizing future intelligent irrigation. This study compares the performance of five Boosting-based models, namely Adaptive Boosting (ADA), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGB), Light Gradient Boosting Decision Machine (LGB) and Gradient boosting with categorical features support (CAT), for estimating daily ET0 across 10 stations in the eastern monsoon zone of China. Six different input combinations and the 10-fold cross-validation method were used to fully evaluate model accuracy and stability under the condition of limited meteorological input variables. Meanwhile, path analysis was used to analyze the effect of meteorological variables on daily ET0 and their contribution to the estimation results. The results indicated that CAT models achieved the highest accuracy (with global average RMSE of 0.5667 mm d-1, MAE of 0.4199 mm d-1 and Adj_R2 of 0.8514) and the best stability regardless of input combination and station. Among the input meteorological variables, solar radiation (Rs) offers the largest contribution (with an average value of 0.7703) to the R2 value of the estimation results, and its direct effect on ET0 increases (ranging from 0.8654 to 0.9090) as the station's latitude decreases, while maximum temperature (Tmax) shows the contrary trend (ranging from 0.8598 to 0.5268). These results could help to optimize and simplify the variables contained in the input combinations. The comparison between models based on the number of the day in a year (J) and extraterrestrial radiation (Ra) showed that both J and Ra could improve the modeling accuracy and that the improvement increased with the station's latitude; however, models with J achieved better accuracy than those with Ra.
In conclusion, CAT models are the most recommended for estimating ET0, and the input variable J can be adopted to improve model performance with limited meteorological variables in the eastern monsoon zone of China.

Introduction

Reference evapotranspiration (ET0) is an essential factor in both hydrological and ecological processes [1–5]. Since ET0 plays a crucial role in calculating crop water requirements, water budgeting and agricultural water management, accurate estimation of ET0 is very meaningful and also serves as the foundation for realizing water-saving and intelligent irrigation. Methods of obtaining ET0 can generally be divided into three types: experimental methods, empirical models and numerical simulations. Although experimental determination can measure ET0 directly, it can hardly be popularized due to its tedious operation steps and strong regional limitations [6–8]. Nowadays, the FAO-56 Penman-Monteith (FAO-56 PM) model is generally regarded as the most authoritative method for estimating ET0 in semiarid and humid regions, and its estimates are also widely used as the target for validating other models in areas where measured ET0 data are not available [9–12]. However, the meteorological variables required by the FAO-56 PM model are difficult to obtain, or not fully available, in most regions, which makes the model difficult to implement. According to the principle for selecting an ideal ET0 estimation model proposed by Shih [13], ideal models should be based on minimal input variables with acceptable accuracy. Therefore, empirical models based on fewer meteorological variables have evolved over the years to enhance practicality [12,14–16]; they can generally be classified as temperature-based, radiation-based, pan evaporation-based, mass transfer-based and combination types [4]. Among all these empirical models, the Hargreaves-Samani model [17] requires the fewest meteorological variables and has already proved its accuracy around the world, which makes it the most popular empirical model.
Other empirical models based on the simplified Penman-Monteith model and solar radiation, such as the Priestley-Taylor model [18], Irmak model [19] and Makkink model [20], have also been implemented in areas where the full set of meteorological factors can hardly be obtained. However, these methods usually have such strong regional limitations and poor portability that they are not suitable for accurate estimation without a localization procedure.

By introducing intelligent algorithms for analyzing the non-linear relationship between meteorological variables and ET0, numerical simulation methods using machine learning and deep learning have advanced greatly. Since Kumar first investigated artificial neural network (ANN) models for estimating ET0 [21], this kind of method has attracted more and more researchers because of its short training time, high precision and strong generalization ability. These algorithms can generally be classified as artificial neural network-based [9,22–25], tree-based [7,26,27], kernel-based [28,29], heuristic-based [27,30,31] and hybrid algorithm-based [32,33].

To further improve the accuracy of machine learning algorithms in ET0 estimation, ensemble learning has drawn attention from more and more researchers. The core idea of ensemble learning is to combine several 'weak learners' to build a new 'strong learner', so as to reduce bias and variance and improve prediction results. Common ensemble learning models such as Random Forest [34], Gradient Boosting Decision Tree [35] and Extreme Gradient Boosting [36] have already been widely used in various classification and regression problems [3,37–39] owing to their simple structure and high accuracy.

This study compares five Boosting-based models to find the best one for estimating daily ET0 under the condition of limited input variables in the eastern monsoon zone of China. The main purposes of this study are as follows: (1) to compare the accuracy and stability of Boosting-based models with various input combinations across different climate zones; (2) to find an effective approach for improving modeling accuracy under the condition of limited input variables.

Material and methods

Study area and data description

Geographically, the eastern monsoon zone of China is located east of the Great Khingan Mountains, south of the Inner Mongolia Plateau and east of the eastern edge of the Tibetan Plateau, including the second-step Loess Plateau, Sichuan Basin, Yunnan-Guizhou Plateau and the Hengduan Mountain area, as well as the third-step coastal plains and hilly areas. The climate types of the eastern monsoon zone include temperate monsoon, subtropical monsoon and tropical monsoon climates. The study area is significantly affected by the ocean monsoon in summer and by cold air flow from the north in winter. The annual average temperature changes significantly with latitude, showing a decreasing trend from south to north. This zone accounts for about 45% of the country's land area and 95% of the total Chinese population. As the eastern monsoon zone serves as one of the main farming areas of China, research on ET0 estimation models can provide a scientific basis for the accurate prediction of crop water demand in this region and improve the utilization efficiency of agricultural water resources, which is of great significance for the sustainable utilization of water resources.

According to the climate type and latitude distribution range of the eastern monsoon zone, 10 meteorological stations (Harbin, Shenyang, Yan'an, Jinan, Nanjing, Changsha, Chengdu, Kunming, Nanning and Guangzhou) were selected as research stations. To be more specific, Harbin, Shenyang, Yan'an and Jinan belong to the temperate monsoon zone (TMZ); Nanjing, Changsha, Chengdu and Kunming belong to the subtropical monsoon zone (SMZ); and Nanning and Guangzhou belong to the tropical monsoon zone (TPMZ).

In order to test and verify the accuracy and stability of Boosting-based models for ET0 estimation, continuous daily meteorological variables from 1997 to 2016, including maximum (Tmax) and minimum (Tmin) air temperature, relative humidity (RH), wind speed at 2 m height (U2) and solar radiation (Rs), were selected as the training and testing data sets. The meteorological data were obtained from the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA) with good quality and high precision, and missing data were interpolated with a KNN interpolation method implemented in Python during data pre-processing. The annual average values of the main meteorological variables at the above stations during the study period are listed in Table 1.

Table 1. The annual average of the main meteorological variables of 10 stations during the study period.

https://doi.org/10.1371/journal.pone.0235324.t001

All daily meteorological data were normalized to fall between 0 and 1 to improve the convergence rate of the models and minimize the influence of absolute scale. The normalization equation is as follows [3,24,26]:

(1) Xnorm = (X0 − Xmin) / (Xmax − Xmin)

Where Xnorm is the normalized value and X0, Xmin, and Xmax are the real value, the minimum value, and the maximum value of the same variable, respectively.
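Eq. (1) can be sketched in a few lines of Python; the function name below is ours, not the study's:

```python
import numpy as np

def min_max_normalize(x):
    """Min-max normalization of Eq. (1): scales a variable to [0, 1]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)
```

For example, `min_max_normalize([10.0, 20.0, 30.0])` returns `[0.0, 0.5, 1.0]`.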

FAO-56 Penman-Monteith model

Since it is difficult to obtain measured ET0 data in the study area, ET0 values calculated by the FAO-56 Penman-Monteith model are regarded as the target for training and testing the Boosting-based models, which is a widely used and accepted practice in this case [2,8,22,30,40].

The FAO-56 PM model is expressed as:

(2) ET0 = [0.408Δ(Rn − G) + γ(900 / (Tmean + 273))U2(es − ea)] / [Δ + γ(1 + 0.34U2)]

Where ET0 is the reference evapotranspiration (mm d-1), Rn is net radiation (MJ m-2 d-1), G is soil heat flux density (MJ m-2 d-1), Tmean is mean air temperature at 2 m height (°C), es is saturation vapor pressure (kPa), ea is actual vapor pressure (kPa), Δ is the slope of the saturation vapor pressure curve (kPa °C-1), γ is the psychrometric constant (kPa °C-1) and U2 is wind speed at 2 m height (m s-1).
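Eq. (2) can be sketched directly in Python. In this illustrative function (the name and argument layout are ours), Rn, G, es and ea are taken as precomputed inputs, while Δ and γ are derived from Tmean and station elevation using the standard FAO-56 relations:

```python
import math

def fao56_pm_et0(tmean, u2, rn, g, es, ea, elev=0.0):
    """Daily ET0 by the FAO-56 PM model, Eq. (2).
    tmean: mean air temperature (deg C); u2: wind speed at 2 m (m/s);
    rn: net radiation (MJ m-2 d-1); g: soil heat flux (MJ m-2 d-1);
    es, ea: saturation/actual vapor pressure (kPa); elev: elevation (m)."""
    # Slope of the saturation vapor pressure curve (kPa/degC)
    delta = 4098.0 * (0.6108 * math.exp(17.27 * tmean / (tmean + 237.3))) \
            / (tmean + 237.3) ** 2
    # Atmospheric pressure (kPa) and psychrometric constant (kPa/degC)
    p = 101.3 * ((293.0 - 0.0065 * elev) / 293.0) ** 5.26
    gamma = 0.000665 * p
    num = 0.408 * delta * (rn - g) + gamma * (900.0 / (tmean + 273.0)) * u2 * (es - ea)
    den = delta + gamma * (1.0 + 0.34 * u2)
    return num / den
```

For a mild day (Tmean = 20 °C, U2 = 2 m/s, Rn = 13 MJ m-2 d-1, G ≈ 0), this yields an ET0 of a few mm per day, in the expected range.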

Boosting-based models

The Boosting algorithm is a category of ensemble learning algorithm. Its principle is to first train a weak learner from the training set with initial sample weights and then update the weights according to the training error: samples with a high error rate receive higher weights and are thus given more attention by the next weak learner. This process of re-weighting the training set and training a new weak learner is repeated until the predetermined number of weak learners is reached. Finally, the weak learners are integrated through a set strategy (usually weighted averaging) to obtain the final strong learner for regression or classification purposes [41].

In 1997, Freund and Schapire proposed the first practical Boosting algorithm, Adaptive Boosting [42], which laid the foundation for turning Boosting from an idea into a practical approach. Subsequently, Friedman introduced the idea of gradient descent into the Boosting framework and proposed the Gradient Boosting algorithm [35], which is more practical and can handle different loss functions. Building on this research, Boosting-based models have been continuously developed and are already widely used in classification and regression problems. In this study, five Boosting-based models, Adaptive Boosting (ADA), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGB), Light Gradient Boosting Decision Machine (LGB) and Gradient boosting with categorical features support (CAT), are employed to compare their performance in estimating ET0. All code for the Boosting-based models in this study was written in Python and run on a laptop with an Intel Core i7-9750H CPU @ 2.60 GHz, an NVIDIA GeForce GTX 1660 Ti GPU and 16 GB of RAM. To evaluate the performance of each model at the same level of model structure and complexity, only 'n_estimators' and 'learning_rate' were set (to 500 and 0.05, respectively); the other hyperparameters were left at their defaults.
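The uniform hyperparameter setting above can be sketched as follows. For brevity only the two scikit-learn models (ADA and GBDT) are instantiated here; the toy inputs and target are hypothetical stand-ins for the meteorological data:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

# Toy stand-ins for the meteorological inputs and FAO-56 PM targets.
rng = np.random.default_rng(0)
X = rng.random((200, 3))            # e.g. normalized Tmax, Tmin, Rs
y = X @ np.array([0.5, 0.2, 0.9])   # synthetic ET0-like target

# Uniform settings used in the study: only n_estimators and learning_rate fixed,
# all other hyperparameters left at their defaults.
ada = AdaBoostRegressor(n_estimators=500, learning_rate=0.05).fit(X, y)
gbdt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05).fit(X, y)
# XGB, LGB and CAT accept the same two settings, e.g.
# xgboost.XGBRegressor(n_estimators=500, learning_rate=0.05), etc.
print(round(gbdt.score(X, y), 3))
```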

Adaptive Boosting (ADA).

The first Boosting algorithm, Adaptive Boosting (ADA), was proposed by Freund [42]. AdaBoost assigns equal initial weights to all training samples for weak learner training, then updates the weight distribution according to the prediction results. To be more specific, higher weights are assigned to mispredicted samples while lower weights are given to correctly predicted samples, which makes the next training step focus more on the mispredicted samples in order to reduce bias. The above process is repeated until the specified number of iterations or the expected error rate is reached; then the predictions of all weak learners are combined with their weights to give the final result. The detailed calculation procedure of ADA is described as follows:

For a given dataset D = {(x1, y1), (x2, y2), …, (xM, yM)}, the steps of the ADA model for a regression problem can be expressed as follows:

(1) Initialize the weight distribution of the training samples:

(3) w1i = 1/M, for i = 1,2,3…M

(2) For k (k = 1,2,3…K), take the weighted training set Dk as the training set of weak learner fk(x) and calculate the following indicators:

(a) Maximum error:
(4) Ek = max_i |yi − fk(xi)|

(b) Relative error of each sample:
(5) eki = |yi − fk(xi)| / Ek

(c) Regression error rate:
(6) ek = Σ_{i=1…M} wki · eki

(d) Weight of weak learner fk(x):
(7) αk = ek / (1 − ek)

(e) The weight distribution of the samples is updated as:
(8) w(k+1)i = (wki / Zk) · αk^(1 − eki)

Where Zk is the normalizing factor:
(9) Zk = Σ_{i=1…M} wki · αk^(1 − eki)

(3) The final strong learner is obtained as:
(10) f(x) = g(x)

Where g(x) is the weighted median of the weak learners' outputs fk(x), k = 1,2,3…K, taken with weights ln(1/αk).

Although ADA is no longer suited to the current scenario of large-sample, high-dimensional data, its appearance turned the Boosting idea from an initial conjecture into a practical algorithm, which greatly promoted the development of subsequent Boosting-based algorithms.

Gradient Boosting Decision Tree (GBDT).

The Gradient Boosting Decision Tree (GBDT) is an iterative decision tree algorithm proposed by Friedman [35]. The weak learners in a GBDT model have strong dependencies on each other and are trained by progressive iterations on the residuals. The results of all weak learners are added together as the final result, which gives GBDT great advantages in resisting over-fitting and in computational cost; it is also insensitive to missing data and reduces bias at the same time. The detailed calculation procedure of GBDT is described as follows:

For a given dataset D = {(x1, y1), (x2, y2), …, (xN, yN)}, the steps of the GBDT model for a regression problem can be expressed as follows:

(1) Initialize the weak learner:
(11) f0(x) = argmin_γ Σ_{i=1…N} L(yi, γ)
Where L(yi, γ) is the loss function.

(2) For each iteration m (m = 1,2,3…M, where M is the number of estimators, 'n_estimators'), the residual along the negative gradient direction is written as:
(12) rim = −[∂L(yi, f(xi)) / ∂f(xi)], evaluated at f(x) = f(m−1)(x), i = 1,2,3…N

(3) Taking (xi, rim), i = 1,2,3…N, as the training data of the m-th weak learner with leaf node regions Rmj, j = 1,2,3…J, the optimal negative-gradient fitting value of each leaf node is calculated as:
(13) γmj = argmin_γ Σ_{xi∈Rmj} L(yi, f(m−1)(xi) + γ)

(4) The model is updated as:
(14) fm(x) = f(m−1)(x) + Σ_{j=1…J} γmj · I(x ∈ Rmj)

(5) The final strong learner is obtained as:
(15) f(x) = fM(x) = f0(x) + Σ_{m=1…M} Σ_{j=1…J} γmj · I(x ∈ Rmj)

Extreme Gradient Boosting (XGB).

Extreme Gradient Boosting (XGB) is an improved algorithm based on the GBDT algorithm [36]. Different from the original GBDT model, the XGB model obtains the residual by performing a second-order Taylor expansion of the cost function and adds a regularization term to control the complexity of the model. The regularization term reduces the variance of the model and simplifies it, making XGB superior to the original GBDT in weighing the bias-variance tradeoff and preventing overfitting. XGB also supports multiple cost functions and parallel operation at the feature granularity.

The specific calculation procedures of XGB are described as follows:

(1) Define the objective function as follows:
(16) Obj = Σ_{i=1…N} L(yi, ŷi) + Σ_{k=1…K} R(fk) + C
Where C is a constant term, which can commonly be omitted, and R(fk) is the regularization term at the k-th iteration, defined as:
(17) R(fk) = αT + (1/2)η Σ_{j=1…T} ωj²
Where α is the complexity cost per leaf, T is the number of leaves, η is the penalty parameter and ωj is the output of leaf node j.

(2) Introducing the second-order Taylor expansion of the objective function and adopting the mean square error as the loss function, the objective function at the k-th iteration can be described as:
(18) Obj^(k) ≈ Σ_{i=1…N} [gi · fk(xi) + (1/2) hi · fk²(xi)] + R(fk) + C
Where gi and hi are the first- and second-order derivatives of the loss function with respect to the prediction, respectively.

(3) Determine the final loss value by summing the loss values over the leaf nodes. The objective function can then be expressed as:
(19) Obj^(k) = Σ_{j=1…T} [Gj · ωj + (1/2)(Hj + η) ωj²] + αT
Where Gj = Σ_{i∈Ij} gi, Hj = Σ_{i∈Ij} hi, and Ij indicates all samples in leaf node j.
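The optimal leaf outputs follow directly from Eq. (19). This is the standard XGBoost derivation, written here with the symbols defined in Eq. (17) (α for the per-leaf complexity cost and η for the penalty parameter, in place of the more usual γ and λ):

```latex
% Minimizing Eq. (19) with respect to each leaf output \omega_j:
\frac{\partial\,\mathrm{Obj}^{(k)}}{\partial \omega_j}
  = G_j + (H_j + \eta)\,\omega_j = 0
\quad\Rightarrow\quad
\omega_j^{*} = -\frac{G_j}{H_j + \eta}
% Substituting back gives the best attainable objective,
% which XGB uses to score candidate tree structures and splits:
\mathrm{Obj}^{*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j + \eta} + \alpha T
```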

Light Gradient Boosting Decision Machine (LGB).

Light Gradient Boosting Decision Machine (LGB) is a novel algorithm from Microsoft [43] with the advantages of lower memory consumption, higher precision and faster training. Traditional Boosting-based algorithms need to scan all sample points for each feature to select the best split point, which makes the model too time-consuming with large-sample, high-dimensional data. To solve this problem and further improve the efficiency and scalability of the model, LGB introduces the Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EF-B) algorithms. Fig 1 illustrates the special strategies adopted by the LGB algorithm; a detailed introduction follows:

Fig 1. Special process of LGB algorithm.

(a) Histogram-based algorithm; (b) Obtain difference value by histogram value; (c) Level-wise and leaf-wise strategies.

https://doi.org/10.1371/journal.pone.0235324.g001

The GOSS algorithm does not use all sample points to calculate the gradient; instead, it retains the sample points with large gradients and randomly samples from the points with small gradients, completing the data sampling while maintaining the accuracy of the information gain. Information gain indicates the expected reduction in entropy caused by splitting a node on an attribute, which can be described as follows:
(20) En(B) = −Σ_{d=1…D} pd · log2(pd)
(21) Gain(B, V) = En(B) − Σ_{v∈V} (|Bv| / |B|) · En(Bv)
Where En(B) is the information entropy of the collection B, pd is the ratio of B pertaining to category d, D is the number of categories, v is a value of attribute V and Bv is the subset of B for which the attribute has value v.
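Eqs. (20) and (21) can be sketched directly in Python (an illustrative sketch on a toy categorical split; the function names are ours):

```python
import math
from collections import Counter

def entropy(labels):
    """Information entropy En(B) of Eq. (20)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, attr_values):
    """Information gain of Eq. (21) for splitting B by attribute V."""
    n = len(labels)
    split = {}
    for lab, v in zip(labels, attr_values):
        split.setdefault(v, []).append(lab)
    # En(B) minus the size-weighted entropy of the subsets Bv
    return entropy(labels) - sum(len(b) / n * entropy(b) for b in split.values())
```

For instance, a split that perfectly separates two classes (`info_gain(['y','y','n','n'], ['l','l','r','r'])`) yields a gain of 1.0 bit.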

As shown in Fig 1A, LGB uses a histogram-based approach, which discretizes continuous floating-point feature values into k integers and constructs a histogram of width k. In this way, the optimal split point can be found from the discrete histogram values with lower memory consumption. In addition, Fig 1B shows that in the LGB algorithm the histogram of a single leaf can be obtained by subtracting the histogram of its sibling node from that of its parent node, which further increases the speed of the model.

The general process of the level-wise and leaf-wise strategies is shown in Fig 1C. Compared with the level-wise strategy, the depth-limited leaf-wise strategy used by LGB can be more effective because it only splits the leaf with the largest information gain, and the depth limit prevents overfitting effectively.

Gradient boosting with categorical features support (CAT).

Gradient boosting with categorical features support (CAT) [44] introduces a modified target-based statistics (TBS) approach to use the whole data set for training while avoiding potential overfitting by performing random permutations. To be more specific, CAT first randomly sorts all samples and then takes a value from a category-based feature. Each sample's feature value is converted to a numerical value by averaging the category labels of the samples preceding it, with priority and priority-weight coefficients added. In the process of building new weak learners, CAT first uses the gradients of the sample points preceding a sample Xn to estimate the model, and then uses these models to calculate the gradient of Xn and update the model. Moreover, CAT uses oblivious trees as weak learners, in which the index of each leaf node can be encoded as a binary vector with length equal to the depth of the tree, which further enhances the model's resistance to overfitting.

Compared with XGB and LGB, CAT has following main contributions:

  1. Categorical features can be handled automatically by using TBS before training process.
  2. Feature dimensions can be enriched by combining the category features according to the relationship between different ones.
  3. Overfitting problem can be better resisted by adopting complete oblivious tree.

Calibration and validation of the models

This study considered limited meteorological input combinations formed by pairing the air temperature data (Tmax and Tmin) with Rs, RH and U2, respectively. In addition, since extraterrestrial radiation (Ra) is commonly applied to improve modeling accuracy for estimating ET0 with limited input meteorological variables, and Ra is determined by the geographic data of the station and the number of the day in the year (J), this study also employed J as an input variable in order to compare the accuracy improvements brought by Ra and J.

As summarized above, the six input meteorological variable combinations are shown in Table 2: (1) Tmax, Tmin, Rs; (2) Tmax, Tmin, RH; (3) Tmax, Tmin, U2; (4) Tmax, Tmin; (5) Tmax, Tmin, Ra and (6) Tmax, Tmin, J.
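Ra itself requires no measurement: it depends only on J and the station latitude. A sketch of the standard FAO-56 formulation (inverse relative Earth-Sun distance, solar declination and sunset hour angle) follows; the function name is ours:

```python
import math

def extraterrestrial_radiation(j, lat_deg):
    """Daily extraterrestrial radiation Ra (MJ m-2 d-1) from the day of
    year J and station latitude, per the standard FAO-56 formulas."""
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * j / 365)        # inverse relative distance
    delta = 0.409 * math.sin(2 * math.pi * j / 365 - 1.39)  # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))        # sunset hour angle (rad)
    gsc = 0.0820                                            # solar constant (MJ m-2 min-1)
    return (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws))
```

Because Ra is a deterministic (roughly sinusoidal) function of J at a fixed station, J carries essentially the same seasonal signal, which motivates comparing M5 and M6.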

Table 2. The input meteorological variables combinations for different models.

https://doi.org/10.1371/journal.pone.0235324.t002

The 10-fold cross-validation method was used to better evaluate model accuracy and reduce the randomness brought by test-sample selection, and the average of the 10-fold cross-validation results was used as the final performance of the model. In addition, meteorological data from 1997 to 2011 and from 2012 to 2016 were used as the training and testing sets, respectively (a different train/test proportion from that of the 10-fold cross-validation stage), to analyze model accuracy on the daily scale.
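The 10-fold evaluation loop can be sketched as follows (an illustrative sketch with hypothetical toy data; the study's actual data pipeline is not reproduced here):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.random((300, 3))            # stand-in for (Tmax, Tmin, Rs)
y = X @ np.array([0.4, 0.2, 0.9])   # stand-in for FAO-56 PM ET0 targets

rmses = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.05)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)

# The average over the 10 folds serves as the model's final score.
print(f"mean 10-fold RMSE: {np.mean(rmses):.4f}")
```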

Performance criteria

The present study used root mean square error (RMSE), mean absolute error (MAE) and adjusted R2 (Adj_R2) to evaluate the performance of the models [4,24,26,28,45]:

(22) RMSE = sqrt[(1/N) Σ_{i=1…N} (ET0,PM,i − ET0,M,i)²]
(23) MAE = (1/N) Σ_{i=1…N} |ET0,PM,i − ET0,M,i|
(24) R² = [Σ_{i=1…N} (ET0,PM,i − mean(ET0,PM))(ET0,M,i − mean(ET0,M))]² / [Σ_{i=1…N} (ET0,PM,i − mean(ET0,PM))² · Σ_{i=1…N} (ET0,M,i − mean(ET0,M))²]
(25) Adj_R² = 1 − (1 − R²)(N − 1) / (N − P − 1)

Where ET0,PM,i and ET0,M,i are the ET0 values estimated by the FAO-56 PM model and by the other models, respectively, and mean(ET0,PM) and mean(ET0,M) are their mean values. N and P are the number of test samples and input variables, respectively, and i indexes the samples. RMSE and MAE are in mm d-1, with values ranging from 0 (optimum) to +∞ (worst). R² and Adj_R² are dimensionless, with values ranging from 1 (optimum) to −∞ (worst).
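Eqs. (22)-(25) can be sketched as plain Python functions (names are ours):

```python
import numpy as np

def rmse(y_pm, y_est):
    """Root mean square error, Eq. (22), in mm d-1."""
    return float(np.sqrt(np.mean((np.asarray(y_pm) - np.asarray(y_est)) ** 2)))

def mae(y_pm, y_est):
    """Mean absolute error, Eq. (23), in mm d-1."""
    return float(np.mean(np.abs(np.asarray(y_pm) - np.asarray(y_est))))

def adj_r2(y_pm, y_est, p):
    """Adjusted R2, Eq. (25); p is the number of input variables."""
    y_pm, y_est = np.asarray(y_pm, float), np.asarray(y_est, float)
    n = len(y_pm)
    r2 = np.corrcoef(y_pm, y_est)[0, 1] ** 2   # Eq. (24)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Adj_R2 penalizes R2 for the number of input variables P, which makes comparisons between the six input combinations (with 2 to 3 variables) fairer.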

Results

Comparison of different Boosting-based models with various input combinations on daily scale

The performances of the Boosting-based models with different daily meteorological variable inputs at the Harbin, Shenyang, Yan'an, Jinan, Nanjing, Changsha, Chengdu, Kunming, Nanning and Guangzhou stations are presented in Tables 3–12, respectively. The tables show that the tested models generally had similar performance rankings across the 10 stations. For brevity, Harbin, Changsha and Guangzhou were chosen as representatives of the TMZ, SMZ and TPMZ, respectively, and are described in detail.

Table 3. Performance of Boosting-based models during 10-fold cross validation and testing stages at Harbin station.

https://doi.org/10.1371/journal.pone.0235324.t003

Table 4. Performance of Boosting-based models during 10-fold cross validation and testing stages at Shenyang station.

https://doi.org/10.1371/journal.pone.0235324.t004

Table 5. Performance of Boosting-based models during 10-fold cross validation and testing stages at Yan’an station.

https://doi.org/10.1371/journal.pone.0235324.t005

Table 6. Performance of Boosting-based models during 10-fold cross validation and testing stages at Ji’nan station.

https://doi.org/10.1371/journal.pone.0235324.t006

Table 7. Performance of Boosting-based models during 10-fold cross validation and testing stages at Nanjing station.

https://doi.org/10.1371/journal.pone.0235324.t007

Table 8. Performance of Boosting-based models during 10-fold cross validation and testing stages at Changsha station.

https://doi.org/10.1371/journal.pone.0235324.t008

Table 9. Performance of Boosting-based models during 10-fold cross validation and testing stages at Chengdu station.

https://doi.org/10.1371/journal.pone.0235324.t009

Table 10. Performance of Boosting-based models during 10-fold cross validation and testing stages at Kunming station.

https://doi.org/10.1371/journal.pone.0235324.t010

Table 11. Performance of Boosting-based models during 10-fold cross validation and testing stages at Nanning station.

https://doi.org/10.1371/journal.pone.0235324.t011

Table 12. Performance of Boosting-based models during 10-fold cross validation and testing stages at Guangzhou station.

https://doi.org/10.1371/journal.pone.0235324.t012

As shown in Table 3, the CAT models generally achieved the best performance (on average, RMSE of 0.5259 mm d-1, MAE of 0.3614 mm d-1 and Adj_R2 of 0.9168) among all the tested models with all input combinations at Harbin station (TMZ), followed by LGB (on average, RMSE of 0.5430 mm d-1, MAE of 0.3671 mm d-1 and Adj_R2 of 0.9142) and XGB (on average, RMSE of 0.5376 mm d-1, MAE of 0.3727 mm d-1 and Adj_R2 of 0.9128). The GBDT models also achieved acceptable precision (on average, RMSE of 0.5618 mm d-1, MAE of 0.3883 mm d-1 and Adj_R2 of 0.9041), while the original ADA models had the worst performance (on average, RMSE of 0.6597 mm d-1, MAE of 0.5077 mm d-1 and Adj_R2 of 0.8704).

During the 10-fold cross-validation stage, models with the M1 (Rs) input (RMSE ranged from 0.4288–0.5748 mm d-1, MAE from 0.2871–0.4715 mm d-1 and Adj_R2 from 0.9019–0.9461) and the M2 (RH) input (RMSE ranged from 0.4334–0.6108 mm d-1, MAE from 0.2919–0.4857 mm d-1 and Adj_R2 from 0.8883–0.9446) performed the best. When only Tmax and Tmin data were input, models based on the M4 input achieved the worst precision (with RMSE ranged from 0.4288–0.5748 mm d-1, MAE ranged from 0.2871–0.4715 mm d-1, Adj_R2 ranged from 0.8373–0.8726), followed by models based on the M3 (U2) input (RMSE ranged from 0.5997–0.7052 mm d-1, MAE from 0.4259–0.5417 mm d-1 and Adj_R2 from 0.8537–0.8940). It is worth noting that models based on M5 (Ra) and M6 (J), which combine the temperature data with Ra and J respectively, achieved better performance than models based on M4 and even than models based on the M3 input. In addition, models based on M6 (RMSE ranged from 0.5122–0.6429 mm d-1, MAE from 0.3473–0.4783 mm d-1 and Adj_R2 from 0.8782–0.9231) obtained slightly better accuracy than models based on M5 (RMSE ranged from 0.5224–0.6812 mm d-1, MAE from 0.3507–0.5107 mm d-1 and Adj_R2 from 0.8632–0.9201).

The performance of the tested models at Changsha station (SMZ) is shown in Table 8. The accuracy ranking of the Boosting-based models was the same as at Harbin station, in the order of CAT, LGB, XGB, GBDT and ADA. However, the overall simulation accuracy decreased slightly compared with the performance at Harbin station: the average Adj_R2 values of ADA, GBDT, XGB, LGB and CAT decreased by 13.02%, 11.22%, 10.00%, 10.00% and 9.72%, respectively. In particular, models based on the M1 combination achieved the best precision (RMSE ranged from 0.2746–0.4275 mm d-1, MAE from 0.1959–0.3527 mm d-1 and Adj_R2 from 0.8976–0.9573), which was far ahead of models with the other input combinations. The same results could also be found in the testing stage.

Table 12 shows the performance of the tested models at Guangzhou station (TPMZ). The same performance ranking could also be found at Guangzhou station, but the overall simulation accuracy decreased more significantly: the average Adj_R2 values of ADA, GBDT, XGB, LGB and CAT decreased by 38.39%, 30.08%, 27.83%, 27.43% and 26.73%, respectively, compared with those at Harbin station. In terms of the effect of input combinations on modeling accuracy, models with M5 (RMSE ranged from 0.6494–0.7622 mm d-1, MAE from 0.5036–0.6171 mm d-1 and Adj_R2 from 0.4512–0.5992) performed slightly worse than models with M3 (RMSE ranged from 0.6322–0.7658 mm d-1, MAE from 0.4906–0.6215 mm d-1 and Adj_R2 from 0.4457–0.6207), whereas models with M6 (RMSE ranged from 0.6171–0.7580 mm d-1, MAE from 0.4745–0.6138 mm d-1 and Adj_R2 from 0.4574–0.6382) still performed better than models with M3; the effects of the other input combinations were generally the same as at Harbin and Changsha stations.

In conclusion, the CAT models offered the highest accuracy among all tested models regardless of input combination or station, followed by the LGB and XGB models, which also achieved relatively satisfactory precision. The CAT1 model based on Rs clearly obtained the best performance and is highly recommended for estimating daily ET0 in the study area. However, the CAT5 and CAT6 models, based only on temperature data and partial geographic data, achieved acceptable accuracy with the fewest meteorological variables, which makes them more cost-effective and more conducive to promotion and application.

Comparison of model accuracy stability with different input combinations across 10 stations

Fig 2 presents the average RMSE values of the Boosting-based models with various input combinations across the 10 stations as box plots. Because the modeling accuracy of the ADA models was much worse than that of the other Boosting-based models, the stability comparison in the present study mainly focuses on the GBDT, XGB, LGB and CAT models. Among the tested models, the CAT model not only achieved the smallest average RMSE value but also the most concentrated distribution of RMSE values regardless of input combination, indicating that the CAT model had the best precision stability. The stability of the other three models was basically the same; thus, modeling accuracy should be the primary consideration when selecting among these three models for estimating ET0. In terms of the effect of input combinations, taking the CAT models as an example, the RMSE values of the CAT model based on the M2 input showed the smallest fluctuation (RMSE ranged from 0.4334–0.6044) across the 10 stations, followed by models based on M3, M6, M5, M4 and M1. It is also worth noting that although the accuracy of models with the M1 input was the highest at each station, the accuracy gap between stations across different climate zones was the largest (RMSE ranged from 0.2746–0.5861), which may result from differences in the Rs distribution among stations and the different contributions of Rs to daily ET0 across climate zones.

Fig 2. Average RMSE values of Boosting-based models at 10 stations under different input combinations.

https://doi.org/10.1371/journal.pone.0235324.g002

The results of path analysis between meteorological variables and ET0 at 10 stations

Path analysis, proposed by Sewall Wright, is a method for studying the direct and indirect effects of independent variables on a dependent variable and for quantitatively analyzing the degree of mutual influence among factors. This study therefore introduced path analysis to analyze the effects of Tmax, Tmin, RH, U2 and Rs on daily ET0. The results of the path analysis between meteorological variables and ET0 across all stations are shown in Table 13.
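In path analysis, the direct effects (path coefficients) are the standardized partial regression coefficients obtained by solving the correlation system R_xx p = r_xy, the indirect effect of a predictor is the remainder of its correlation with the response, and the contribution of predictor i to R2 can be taken as p_i * r_iy. A minimal self-contained sketch of this decomposition (the two-predictor toy numbers are illustrative, not from the study):

```python
def solve(a, b):
    """Solve the linear system a @ p = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    p = [0.0] * n
    for i in reversed(range(n)):
        p[i] = (m[i][n] - sum(m[i][j] * p[j] for j in range(i + 1, n))) / m[i][i]
    return p

def path_analysis(r_xx, r_xy):
    """Direct effects, indirect effects and total R2 from correlation inputs.

    r_xx: correlation matrix among predictors; r_xy: their correlations with ET0.
    """
    direct = solve(r_xx, r_xy)                                  # R_xx @ p = r_xy
    indirect = [r_xy[i] - direct[i] for i in range(len(r_xy))]  # via other predictors
    contrib = [direct[i] * r_xy[i] for i in range(len(r_xy))]   # p_i * r_iy
    return direct, indirect, sum(contrib)

# Toy example: two correlated predictors.
direct, indirect, r2 = path_analysis([[1.0, 0.5], [0.5, 1.0]], [0.9, 0.6])
# direct ≈ [0.8, 0.2], indirect ≈ [0.1, 0.4], r2 ≈ 0.84
```

Note that direct plus indirect effect recovers each predictor's correlation coefficient, which is the consistency check behind Table 13 and Fig 3.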

Table 13. Path analysis between meteorological variables and ET0 at 10 stations.

https://doi.org/10.1371/journal.pone.0235324.t013

It can be seen from Table 13 that, except for RH and U2, the other three meteorological variables had positive correlation coefficients with ET0 at all 10 stations. As illustrated in Fig 3A, each dashed line is the trend line of its corresponding meteorological variable. At stations in the TMZ, the correlation coefficients of Tmax, Tmin, RH, U2 and Rs were 0.799–0.860, 0.643–0.787, −0.442 to −0.614, 0.034–0.240 and 0.843–0.865, respectively. At stations in the SMZ, they were 0.741–0.793, 0.374–0.639, −0.343 to −0.806, −0.091 to 0.345 and 0.866–0.927, respectively; at stations in the TPMZ, they were 0.527–0.643, 0.270–0.402, −0.349 to −0.367, −0.045 to 0.209 and 0.865–0.909, respectively. It is evident that from Harbin station to Guangzhou station, the correlation coefficients of Tmax, Tmin, RH and U2 with ET0 generally decreased, while only that of Rs showed the opposite trend (increasing from 0.749 at Harbin station to 0.826 at Guangzhou station). This indicates that the overall contribution of Rs to ET0 increased significantly, becoming increasingly crucial for accurately estimating ET0 as station latitude decreases in this study area.

Fig 3. Path analysis results of meteorological variables to daily ET0 across different stations.

(a) Correlation coefficient between meteorological variables and ET0 at 10 stations; (b) Direct effect of meteorological variables on ET0 at 10 stations; (c) The contribution of meteorological variables to R2 value at 10 stations.

https://doi.org/10.1371/journal.pone.0235324.g003

The trend of the direct effects of Tmax, Tmin, RH, U2 and Rs on ET0 across the 10 stations is shown in Fig 3B. At Harbin station, Tmax contributed the largest direct effect (0.544), 0.154 more than Rs (0.390), while the direct effect of Tmin was almost negligible (only 0.003). As station latitude decreases, the direct effect of Tmax fell significantly, the direct effects of Tmin, RH (in absolute value) and Rs rose markedly, and that of U2 rose slightly. At Guangzhou station, the direct effect of Tmax was only 0.104, whereas that of Rs rose to 0.711 and that of Tmin exceeded Tmax, reaching 0.348. This trend is similarly reflected in the overall contribution of the variables to R2. As Fig 3C shows, Tmax, the second most contributing variable at Harbin station (0.142), became the least contributing variable at Guangzhou station (0.001). Conversely, Tmin, the least contributing variable at Harbin station (0.000), contributed 0.069 to R2 at Guangzhou station. The other meteorological variables also gradually increased their contribution to the R2 of the ET0 estimates from north to south.

In conclusion, Rs made the greatest contribution to ET0, followed by Tmax, Tmin and RH, while U2 generally had the least effect on daily ET0. These results explain the differences in modeling accuracy between stations in different climate zones for models with the same input condition, and also offer a reliable reference for selecting an appropriate input combination for ET0 estimation in different climate zones.

Discussion

Effect of Ra and J on improving model accuracy

Ra has been shown to improve the estimation accuracy of daily ET0 when only limited meteorological variables are available [26,46–48]. As shown in Fig 4, taking the CAT models as examples, CAT4 based only on temperature data obtained average RMSE values ranging from 0.5542 to 0.8204 mm day-1, while the average RMSE values of CAT5 and CAT6 ranged from 0.4914 to 0.7042 mm day-1 and 0.4507 to 0.6877 mm day-1, respectively. Compared with CAT4, employing Ra decreased the average RMSE value by 20.67%, 19.14%, 17.63%, 14.17%, 10.22%, 6.18%, 5.29%, 11.34%, 5.26% and 4.42% from Harbin to Guangzhou station, while using J decreased it by 22.22%, 19.79%, 18.93%, 16.18%, 11.72%, 7.33%, 5.10%, 18.67%, 9.95% and 9.17%, respectively. Clearly, CAT6 performed even better than CAT5, and this improvement in modeling accuracy diminished as station latitude decreases. This phenomenon may arise because the meteorological conditions of the stations in this study area are generally quite stable and ET0 results from the coupled effects of various meteorological variables, so J carries more of the overall information and variation pattern of ET0 than the single calculated Ra.

Fig 4. Comparison of the RMSE value of CAT4, CAT5 and CAT6 across 10 stations.

https://doi.org/10.1371/journal.pone.0235324.g004

To sum up, the effect of employing Ra together with only temperature data for estimating ET0 in the present study was consistent with previous findings. As a parameter used to calculate Ra, J alone could also improve the modeling accuracy with limited inputs, even outperforming Ra. Therefore, models based on the J input can be recommended for estimating ET0 when some meteorological variables and geographical data are absent.
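Ra itself is a deterministic function of J and station latitude, which is why the two inputs carry closely related information. A minimal sketch of the daily extraterrestrial-radiation computation following FAO-56 (Allen et al. [40]), in Python:

```python
import math

GSC = 0.0820  # solar constant, MJ m-2 min-1 (FAO-56)

def extraterrestrial_radiation(j, latitude_deg):
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1) from the day of the
    year J and the station latitude, following FAO-56 (Allen et al. [40])."""
    phi = math.radians(latitude_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * j / 365)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2 * math.pi * j / 365 - 1.39)  # solar declination (rad)
    omega_s = math.acos(-math.tan(phi) * math.tan(delta))   # sunset hour angle (rad)
    return (24 * 60 / math.pi) * GSC * dr * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )

# FAO-56 worked example: 20°S on 3 September (J = 246) gives Ra ≈ 32.2 MJ m-2 day-1.
print(round(extraterrestrial_radiation(246, -20.0), 1))
```

Since Ra adds no information beyond J and latitude, feeding the trees J directly (as in M6) lets them learn the seasonal pattern themselves, consistent with CAT6 outperforming CAT5.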

Strategy for selecting proper input combination at different stations

According to Tables 3–12 and the path analysis results above, the meteorological variables involved in the input combination can be optimized at each station. For example, Tmin has the smallest contribution to R2 and the smallest direct effect on daily ET0 at Harbin station, so Tmin could be removed from the input combinations there without reducing modeling accuracy. Similarly, Tmax could be removed from the input combinations at Guangzhou station, since it contributes almost nothing to the R2 value of the estimation results. The differing model performance between stations can also be explained by the path analysis results. Taking the CAT models in the 10-fold validation stage as examples, the correlation coefficients of RH and U2 at Kunming station are much higher than those at other stations, so the CAT2 model (RMSE of 0.3977 mm d-1, MAE of 0.3106 mm d-1 and Adj_R2 of 0.9168) and the CAT3 model (RMSE of 0.4785 mm d-1, MAE of 0.3721 mm d-1 and Adj_R2 of 0.8618) there achieved the highest precision among the CAT2 and CAT3 models at all stations. In conclusion, the input combination for estimating daily ET0 should be selected according to the importance and contribution of the meteorological variables at each individual station, so that the available meteorological variables are used more effectively to obtain better accuracy.
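The per-station pruning rule described above can be sketched as a simple threshold on the contribution-to-R2 values from the path analysis. In the dictionaries below, only the Tmax and Tmin contributions (0.142 and 0.000 at Harbin; 0.001 and 0.069 at Guangzhou) are quoted in the text; the remaining numbers are illustrative placeholders:

```python
def select_inputs(r2_contributions, threshold=0.01):
    """Keep only the variables whose contribution to R2 meets the threshold."""
    return [v for v, c in r2_contributions.items() if c >= threshold]

# Contributions of each variable to the R2 of the ET0 estimate (cf. Fig 3C).
harbin = {"Tmax": 0.142, "Tmin": 0.000, "RH": 0.02, "U2": 0.01, "Rs": 0.60}
guangzhou = {"Tmax": 0.001, "Tmin": 0.069, "RH": 0.03, "U2": 0.01, "Rs": 0.77}

print(select_inputs(harbin))     # Tmin is pruned at Harbin
print(select_inputs(guangzhou))  # Tmax is pruned at Guangzhou
```

The threshold value is a tunable assumption; in practice it should be chosen so that pruning a variable leaves the cross-validated RMSE essentially unchanged.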

Conclusion

This study investigated five Boosting-based models, namely original Adaptive Boosting (ADA), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGB) and Gradient boosting with categorical features support (CAT), for accurately estimating daily ET0 with six different meteorological input combinations at 10 stations across the eastern monsoon zone of China. The results indicated that the CAT models had the highest accuracy and stability of all tested models under the same input combinations across all stations; the LGB and XGB models achieved very similar accuracy, while the original ADA models produced the worst performance. Under the condition of limited meteorological inputs, Rs clearly plays the most important role in accurately estimating daily ET0, which made the models based on M1 the most accurate at every station. Models with the M2 input combination offered the second highest precision, while models based on M4 (temperature data only) had the worst estimation accuracy. However, when Ra and J were combined with temperature data, the modeling accuracy increased significantly: models based on M6 generally ranked third (better than models with the M3 input) and models based on M5 ranked fifth (much better than models with the M4 input). Thus, in terms of improving model accuracy with limited meteorological variables, J is more effective than Ra and easier to obtain in this study, and the improvement from employing J over Ra became increasingly significant as station latitude increases.

In summary, the CAT model is the most highly recommended for estimating daily ET0 in the eastern monsoon zone of China, and J is highly recommended for improving model accuracy when only limited meteorological variables are available or geographical information is absent.

Acknowledgments

We sincerely thank the National Climatic Centre of the China Meteorological Administration for providing the daily meteorological database used in this study.

References

  1. Allen R, Pereira L, Raes D, Smith M. Guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. Rome: FAO; 1998.
  2. Antonopoulos VZ, Antonopoulos AV. Daily reference evapotranspiration estimates by artificial neural networks technique and empirical equations using limited input climate variables. Comput Electron Agric. 2017.
  3. Wu L, Fan J. Comparison of neuron-based, kernel-based, tree-based and curve-based machine learning models for predicting daily reference evapotranspiration. PLoS One. 2019. pmid:31150448
  4. Tabari H, Grismer ME, Trajkovic S. Comparative analysis of 31 reference evapotranspiration methods under humid conditions. Irrig Sci. 2013.
  5. Gu Z, Qi Z, Burghate R, Yuan S, Jiao X, Xu J. Irrigation scheduling approaches and applications: A review. J Irrig Drain Eng. 2020.
  6. Allen RG, Jensen ME, Wright JL, Burman RD. Operational estimates of reference evapotranspiration. Agron J. 1989;81: 650–662.
  7. Feng Y, Cui N, Gong D, Zhang Q, Zhao L. Evaluation of random forests and generalized regression neural networks for daily reference evapotranspiration modelling. Agric Water Manag. 2017. pmid:28154450
  8. Falamarzi Y, Palizdan N, Huang YF, Lee TS. Estimating evapotranspiration from temperature and wind speed data using artificial and wavelet neural networks (WNNs). Agric Water Manag. 2014.
  9. Xu JZ, Peng SZ, Zhang RM, Li DX. Neural network model for reference crop evapotranspiration prediction based on weather forecast. Shuili Xuebao/Journal Hydraul Eng. 2006.
  10. Khoshhal J, Mokarram M. Model for prediction of evapotranspiration using MLP neural network. Int J Environ Sci. 2012;3: 1000.
  11. Tabari H. Evaluation of reference crop evapotranspiration equations in various climates. Water Resour Manag. 2010.
  12. Kisi O. Comparison of different empirical methods for estimating daily reference evapotranspiration in Mediterranean climate. J Irrig Drain Eng. 2013;140: 4013002.
  13. Shih SF. Data requirement for evapotranspiration estimation. J Irrig Drain Eng. 1984;110: 263–274.
  14. Allen RG. Self-calibrating method for estimating solar radiation from air temperature. J Hydrol Eng. 1997.
  15. Alexandris S, Stricevic R, Petkovic S. Comparative analysis of reference evapotranspiration from the surface of rainfed grass in central Serbia, calculated by six empirical methods against the Penman-Monteith formula. Eur Water. 2008;21: 17–28.
  16. Jensen M, Haise H. Estimating evapotranspiration from solar radiation. Proc Am Soc Civ Eng J Irrig Drain Div. 1963.
  17. Hargreaves GH, Samani ZA. Reference crop evapotranspiration from temperature. Appl Eng Agric. 1985;1: 96–99.
  18. Priestley CHB, Taylor RJ. On the assessment of surface heat flux and evaporation using large-scale parameters. Mon Weather Rev. 1972.
  19. Irmak S, Irmak A, Allen RG, Jones JW. Solar and net radiation-based equations to estimate reference evapotranspiration in humid climates. J Irrig Drain Eng. 2003.
  20. Makkink GF. Testing the Penman formula by means of lysimeters. Int Water Eng. 1957.
  21. Kumar M, Raghuwanshi NS, Singh R, Wallender WW, Pruitt WO. Estimating evapotranspiration using artificial neural network. J Irrig Drain Eng. 2002.
  22. Abdullah SS, Malek MA, Abdullah NS, Kisi O, Yap KS. Extreme Learning Machines: A new approach for prediction of reference evapotranspiration. J Hydrol. 2015;527: 184–195.
  23. Kişi Ö. Evapotranspiration modeling using a wavelet regression model. Irrig Sci. 2011.
  24. Feng Y, Cui N, Zhao L, Hu X, Gong D. Comparison of ELM, GANN, WNN and empirical models for estimating reference evapotranspiration in humid region of Southwest China. J Hydrol. 2016.
  25. Zheng H, Yuan J, Chen L. Short-term load forecasting using EMD-LSTM neural networks with a XGBoost algorithm for feature importance evaluation. Energies. 2017.
  26. Fan J, Yue W, Wu L, Zhang F, Cai H, Wang X, et al. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric For Meteorol. 2018.
  27. Kisi O. Modeling reference evapotranspiration using three different heuristic regression approaches. Agric Water Manag. 2016.
  28. Kişi O, Çimen M. Evapotranspiration modelling using support vector machines. Hydrol Sci J. 2009.
  29. Wu L, Huang G, Fan J, Zhang F, Wang X, Zeng W. Potential of kernel-based nonlinear extension of Arps decline model and gradient boosting with categorical features support for predicting daily global solar radiation in humid regions. Energy Convers Manag. 2019.
  30. Malik A, Kumar A, Kisi O. Monthly pan-evaporation estimation in Indian central Himalayas using different heuristic approaches and climate based models. Comput Electron Agric. 2017;143: 302–313.
  31. Shiri J, Nazemi AH, Sadraddini AA, Landeras G, Kisi O, Fakheri Fard A, et al. Comparison of heuristic and empirical approaches for estimating reference evapotranspiration from limited inputs in Iran. Comput Electron Agric. 2014.
  32. Mosavi A, Edalatifar M. A hybrid neuro-fuzzy algorithm for prediction of reference evapotranspiration. Lecture Notes in Networks and Systems. 2019.
  33. Mohammadi B, Mehdizadeh S. Modeling daily reference evapotranspiration via a novel approach based on support vector regression coupled with whale optimization algorithm. Agric Water Manag. 2020.
  34. Breiman L. Random forests. Mach Learn. 2001.
  35. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat. 2001.
  36. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016. pp. 785–794.
  37. Wang S, Fu Z, Chen H, Ding Y, Wu L, Wang K. Simulation of reference evapotranspiration based on random forest method. Nongye Jixie Xuebao/Transactions Chinese Soc Agric Mach. 2017.
  38. Huang G, Wu L, Ma X, Zhang W, Fan J, Yu X, et al. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J Hydrol. 2019.
  39. Fan J, Ma X, Wu L, Zhang F, Yu X, Zeng W. Light Gradient Boosting Machine: An efficient soft computing model for estimating daily reference evapotranspiration with local and external meteorological data. Agric Water Manag. 2019.
  40. Allen RG, Pereira LS, Raes D, Smith M. Crop evapotranspiration: Guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. Rome: FAO; 1998;300: D05109.
  41. Freund Y, Schapire R, Abe N. A short introduction to boosting. Journal-Japanese Soc Artif Intell. 1999;14: 1612.
  42. Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci. 1997.
  43. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017: 3147–3155.
  44. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: Unbiased boosting with categorical features. Advances in Neural Information Processing Systems. 2018.
  45. Landeras G, Ortiz-Barredo A, López JJ. Comparison of artificial neural network models and empirical and semi-empirical equations for daily reference evapotranspiration estimation in the Basque Country (Northern Spain). Agric Water Manag. 2008;95: 553–565.
  46. Valiantzas JD. Temperature- and humidity-based simplified Penman's ET0 formulae. Comparisons with temperature-based Hargreaves-Samani and other methodologies. Agric Water Manag. 2018.
  47. Yu T, Cui N, Zhang Q, Hu X. Applicability evaluation of daily reference crop evapotranspiration models in Northwest China. Paiguan Jixie Gongcheng Xuebao/Journal Drain Irrig Mach Eng. 2019.
  48. Feng Y, Peng Y, Cui N, Gong D, Zhang K. Modeling reference evapotranspiration using extreme learning machine and generalized regression neural network only with temperature data. Comput Electron Agric. 2017.