Comparison of neuron-based, kernel-based, tree-based and curve-based machine learning models for predicting daily reference evapotranspiration

Accurately predicting reference evapotranspiration (ET0) with limited climatic data is crucial for irrigation scheduling design and agricultural water management. This study evaluated eight machine learning models in four categories, i.e., neuron-based (MLP, GRNN and ANFIS), kernel-based (SVM, KNEA), tree-based (M5Tree, XGBoost) and curve-based (MARS) models, for predicting daily ET0 with maximum/minimum temperature and precipitation data during 2001–2015 from 14 stations in various climatic regions of China, i.e., the arid desert of northwest China (NWC), the semi-arid steppe of Inner Mongolia (IM), the Qinghai-Tibetan Plateau (QTP), (semi-)humid cold-temperate northeast China (NEC), semi-humid warm-temperate north China (NC), humid subtropical central China (CC) and humid tropical south China (SC). The results showed that machine learning models using only temperature data obtained satisfactory daily ET0 estimates (on average R2 = 0.829, RMSE = 0.718 mm day−1, NRMSE = 0.250 and MAE = 0.508 mm day−1). Prediction accuracy improved by 7.6% across China when precipitation information was also considered, particularly in the (sub)tropical humid regions (by 9.7% in CC and 12.4% in SC). The kernel-based SVM and KNEA models and the curve-based MARS model generally outperformed the others in prediction accuracy, with the best performance by KNEA in NWC and IM, by SVM in QTP, CC and SC, and very similar performance among the three in NEC and NC. SVM (1.9%), MLP (2.0%), MARS (2.6%) and KNEA (6.4%) showed relatively small average increases in testing RMSE compared with training RMSE. SVM is highly recommended for predicting daily ET0 across China in light of its best combination of accuracy and stability, while KNEA and MARS are also promising, powerful models.


Introduction
Accurate prediction of reference evapotranspiration (ET0) is important for irrigation scheduling design, crop growth modeling and agricultural water management [1][2][3][4][5]. Various … predict daily ET0 in Iraq, and found that it exhibited good efficiency and generalization performance. [29] estimated daily ET0 with the ELM and GRNN models based on air temperature alone in southwestern China and found that the ELM model was superior to the GRNN model. The kernel-based SVM, LS-SVM and ELM models have also been coupled with pre-processing or optimization algorithms, such as the wavelet transform (WT) [30] and the genetic algorithm (GA) [31], to improve prediction accuracy. Recently, an improved kernel-based machine learning model, the kernel-based nonlinear extension of Arps decline model (KNEA), has been developed and successfully applied in various fields [32][33]. However, this powerful new model has not yet been tested in evapotranspiration studies.
Tree-based machine learning models have recently started to draw researchers' attention because of their relative simplicity and strong capability in time-series prediction [34]. [35] evaluated the M5Tree model for estimating daily ET0 in California, USA, and found that it gave satisfactory ET0 estimates. [36] evaluated the M5Tree and feedforward ANN models for predicting ET0 in arid regions and concluded that the ET0 values predicted by both models agreed well with the FAO-56 PM values. [37] compared the M5Tree and ANN models for predicting ET0 at two sites in the USA and found that the M5Tree model outperformed the ANN models when input and output data at the target station were not available. [38] compared the random forest (RF) and GRNN models for predicting daily ET0 in southwestern China and concluded that the RF model gave better daily ET0 estimates. [39] further coupled the RF model with a wavelet algorithm for daily ET0 estimation in southern Iran and showed that the coupled model outperformed the classic RF model. [40] explored the performance of two kernel-based and four tree-based models for daily ET0 estimation with limited meteorological data across China, and found that extreme gradient boosting (XGBoost) and the gradient boosting decision tree (GBDT) exhibited accuracy and stability similar to those of the kernel-based SVM and ELM models.
Other machine learning models, such as the curve-based MARS model, have also been applied to predict ET0. [27] modeled monthly ET0 in Mediterranean Turkey using the MARS, least-squares support vector regression (LSSVR) and M5Tree models, and found that the MARS model outperformed the LSSVR and M5Tree models. [41] predicted monthly ET0 in Iran using the MARS, SVM, GEP and empirical models, and found that the MARS and SVM-RBF models were generally superior to the GEP and SVM-Poly models. [42] also evaluated the GEP model and (semi-)empirical models for predicting daily ET0 in the hyper-arid regions of Iran, and revealed the superiority of the GEP model over the (semi-)empirical models.
Although neuron-based, kernel-based, tree-based and curve-based machine learning models have been widely used to predict ET0 around the world, their reported performances are inconsistent across studies. In particular, there is still a lack of direct and comprehensive comparison of the various categories of machine learning models for predicting ET0 in a specific region or country, such as China with its vast territory and diverse climates. Because complete sets of climatic variables are often unavailable, the applicability of these models for estimating ET0 from more cheaply and reliably measured variables (e.g., temperature and precipitation) should be explored. In addition, although most attention is usually paid to prediction accuracy when applying machine learning models, model stability is another major factor to consider, because unstable models may produce inaccurate ET0 estimates when new data are included [34]. Thus, the objectives of this study are to: (1) determine the effects of temperature and precipitation (a variable that partly represents relative humidity) on the prediction accuracy of daily ET0 in different climatic zones of China, and (2) compare both the prediction accuracy and the model stability of eight machine learning models in four categories (MLP, GRNN, ANFIS, SVM, KNEA, M5Tree, XGBoost and MARS) for predicting daily ET0, using China as a case study.

Case study and site description
Based on multi-year mean temperature, precipitation and altitude (Table 1), China is divided into seven climatic regions (Fig 1): the arid desert of northwest China (NWC), the semi-arid steppe of Inner Mongolia (IM), the Qinghai-Tibetan Plateau (QTP), the (semi-)humid cold-temperate northeast China (NEC), the semi-humid warm-temperate north China (NC), the humid subtropical central China (CC), and the humid tropical south China (SC) [43][44].

Data collection and analysis
Long-term daily maximum (Tmax) and minimum (Tmin) temperatures, relative humidity (Hr), wind speed (U) and horizontal global solar radiation (Rs) for 2001–2015 were obtained from 14 representative stations across the climatic regions of China (Fig 1). The geographical locations and meteorological values of the 14 stations are presented in Table 1. These data were provided by the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA), which has rigorously examined their quality. Days on which any of the above variables was missing were excluded. Overall, missing data accounted for only 0.08% of the database, ranging from 0 to 0.53% across stations.

FAO-56 Penman-Monteith equation
The FAO-56 Penman-Monteith equation was used to calculate daily ET0 (mm day−1), providing the reference data for training and testing the machine learning models in this paper:
$$ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T_{mean} + 273}\,U_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,U_2)} \tag{1}$$
where Rn is net radiation (MJ m−2 day−1), G is soil heat flux (MJ m−2 day−1), Tmean is mean daily air temperature (°C), U2 is wind speed at 2 m height (m s−1), es is saturation vapor pressure (kPa), ea is actual vapor pressure (kPa), Δ is the slope of the vapor pressure curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1). Detailed calculation procedures can be found in [6].
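As a minimal sketch, assuming a daily time step and that the auxiliary quantities (Rn, G, es, ea, Δ, γ) have already been computed following [6], the FAO-56 Penman-Monteith equation can be evaluated as below. The function name `fao56_pm_et0` and the input values are illustrative assumptions, not data from the 14 stations.

```python
def fao56_pm_et0(rn, g, t_mean, u2, es, ea, delta, gamma):
    """Daily reference evapotranspiration ET0 (mm day-1), FAO-56 Penman-Monteith.

    rn, g  : net radiation and soil heat flux (MJ m-2 day-1)
    t_mean : mean daily air temperature (deg C)
    u2     : wind speed at 2 m height (m s-1)
    es, ea : saturation and actual vapour pressure (kPa)
    delta  : slope of the vapour pressure curve (kPa deg C-1)
    gamma  : psychrometric constant (kPa deg C-1)
    """
    # Radiation term; 0.408 converts MJ m-2 day-1 into mm day-1 of evaporated water
    radiation_term = 0.408 * delta * (rn - g)
    # Aerodynamic term; 900 is the daily-step coefficient, +273 converts deg C to K
    aerodynamic_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return (radiation_term + aerodynamic_term) / denominator


# Hypothetical daily inputs (G is commonly taken as ~0 at the daily scale in FAO-56)
et0 = fao56_pm_et0(rn=13.28, g=0.0, t_mean=16.9, u2=2.078,
                   es=1.997, ea=1.408, delta=0.122, gamma=0.0666)
print(f"ET0 = {et0:.2f} mm/day")  # ET0 = 3.88 mm/day
```

The radiation and aerodynamic terms are kept separate so that their relative contributions can be inspected when debugging reference values.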

Machine learning models for predicting daily reference evapotranspiration
Multilayer perceptron neural networks (MLP). The MLP model, one of the most widely used ANN models, is a feed-forward neural network for nonlinear function approximation. It consists of three types of layers: input, hidden and output. A single hidden layer is usually used, and the number of hidden neurons has to be determined by trial and error. In the present study, a three-layer network was developed, in which the number of input neurons equaled the number of input variables and the output layer had a single neuron. The MLP model was trained with the Levenberg-Marquardt algorithm, which interpolates between the Gauss-Newton algorithm (GNA) and gradient descent. It is more robust than the GNA but can still become trapped in local rather than global minima. Further details on the MLP model can be found in [4].
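The sketch below illustrates Levenberg-Marquardt training of a one-hidden-layer MLP, assuming NumPy and SciPy are available. It uses `scipy.optimize.least_squares(method="lm")`, which wraps the MINPACK Levenberg-Marquardt solver and requires at least as many training samples as weights. The helper names, the tanh activation, the five hidden neurons and the synthetic data are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np
from scipy.optimize import least_squares


def unpack(params, n_in, n_hidden):
    """Split a flat parameter vector into MLP weights and biases."""
    sizes = [n_in * n_hidden, n_hidden, n_hidden]
    w1, b1, w2 = np.split(params[:-1], np.cumsum(sizes)[:-1])
    return w1.reshape(n_in, n_hidden), b1, w2, params[-1]


def mlp_forward(params, x, n_hidden):
    """Input layer -> tanh hidden layer -> single linear output neuron."""
    w1, b1, w2, b2 = unpack(params, x.shape[1], n_hidden)
    return np.tanh(x @ w1 + b1) @ w2 + b2


def train_mlp_lm(x, y, n_hidden=5, seed=0):
    """Fit all weights by Levenberg-Marquardt on the training residuals."""
    n_params = x.shape[1] * n_hidden + 2 * n_hidden + 1
    p0 = np.random.default_rng(seed).normal(scale=0.5, size=n_params)

    def residuals(p):
        return mlp_forward(p, x, n_hidden) - y  # one residual per sample

    return least_squares(residuals, p0, method="lm").x


# Synthetic example with three inputs (standing in for normalized Tmax, Tmin, Ra)
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(200, 3))
y = x[:, 0] ** 2 + np.sin(3.0 * x[:, 1]) + 0.5 * x[:, 2]
params = train_mlp_lm(x, y, n_hidden=5)
rmse = np.sqrt(np.mean((mlp_forward(params, x, 5) - y) ** 2))
print(f"training RMSE = {rmse:.4f}")
```

Because Levenberg-Marquardt only finds a local minimum, a practical workflow would retrain from several random initializations and choose the number of hidden neurons by trial and error, as described above.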
Generalized regression neural network (GRNN). The GRNN model was proposed by [45] and belongs to the family of radial basis function (RBF) neural networks. It approximates a nonlinear function between input and output vectors using a function estimate obtained directly from the training dataset. Because of its parallel structure, the GRNN model requires no iterative training procedure, unlike networks trained by back-propagation. Further details on the GRNN model are given in [46].
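A GRNN prediction is essentially a Gaussian-kernel weighted average of the stored training targets, computed in the pattern, summation and output layers. The NumPy sketch below assumes a single smoothing parameter `sigma` shared by all inputs. In practice this parameter would be tuned, for example by grid search. The function name is hypothetical.

```python
import numpy as np


def grnn_predict(x_train, y_train, x_new, sigma=0.1):
    """GRNN prediction as a Gaussian-kernel weighted average of training targets.

    'Training' amounts to storing (x_train, y_train); no iterations are needed.
    """
    # Pattern layer: squared Euclidean distance from each query to each sample
    d2 = ((x_new[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    # Summation layer (numerator and denominator) and output layer (their ratio)
    return weights @ y_train / weights.sum(axis=1)


x_train = np.array([[0.0], [0.5], [1.0]])
y_train = np.array([1.0, 2.0, 3.0])
print(grnn_predict(x_train, y_train, np.array([[0.25]]), sigma=0.1))  # ~1.5
```

A small `sigma` makes the GRNN reproduce the nearest training targets, while a very large `sigma` drives every prediction toward the mean target. This is why the smoothing parameter is the main quantity to tune.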
Adaptive neuro-fuzzy inference system (ANFIS). The ANFIS model was proposed by [47]. It is a multi-layer adaptive network that couples a neural network with a fuzzy inference system. In this study, a first-order Sugeno fuzzy model with two fuzzy if-then rules was used in ANFIS to approximate the nonlinear function. The ANFIS model consists of five layers: fuzzification, product, normalization, de-fuzzification and output. It uses different node functions to learn and adjust the parameters of the fuzzy inference system, with forward and backward passes applied to reduce the computed errors. More details about the ANFIS model are given in [47].
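To make the five layers concrete, the following NumPy sketch computes the forward pass of a two-input, two-rule first-order Sugeno system. The Gaussian membership functions, the parameter layout and the function names are assumptions for illustration. The hybrid learning that adjusts premise and consequent parameters in the forward and backward passes is not shown.

```python
import numpy as np


def gauss_mf(x, c, s):
    """Gaussian membership function with centre c and width s."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)


def anfis_forward(x1, x2, premise, consequent):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise[k]    = (c1, s1, c2, s2): Gaussian MF parameters of rule k
    consequent[k] = (p, q, r0): rule k output p*x1 + q*x2 + r0
    """
    # Layers 1-2 (fuzzification and product): firing strength of each rule
    w = np.array([gauss_mf(x1, c1, s1) * gauss_mf(x2, c2, s2)
                  for c1, s1, c2, s2 in premise])
    # Layer 3 (normalization): relative firing strengths sum to 1
    w_bar = w / w.sum()
    # Layer 4 (de-fuzzification): first-order (linear) rule consequents
    f = np.array([p * x1 + q * x2 + r0 for p, q, r0 in consequent])
    # Layer 5 (output): weighted sum over rules
    return float(np.sum(w_bar * f))


premise = [(0.0, 0.5, 0.0, 0.5), (1.0, 0.5, 1.0, 0.5)]  # (c1, s1, c2, s2) per rule
consequent = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]         # (p, q, r0) per rule
print(anfis_forward(0.3, 0.6, premise, consequent))
```

The output is a firing-strength-weighted blend of two local linear models. This is what allows ANFIS to approximate a nonlinear relationship with only a few rules.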
M5 model tree (M5Tree). The M5Tree model was first proposed by [48] and is a powerful learning method for estimating target values in large datasets. It places linear regression functions at the terminal nodes, relating the independent variables to the dependent variable. The model first constructs a regression tree by recursively splitting the instance space. Among all potential splits, it selects the one that maximizes the expected error reduction. The overgrown tree is then pruned, and the pruned sub-trees are replaced by linear regression functions. Further details of the M5Tree model can be found in [49].
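M5 selects splits by the standard deviation reduction (SDR) of the target values, which is the usual measure of the "expected error reduction" mentioned above. The sketch below is a simplification that assumes a single numeric attribute and scans the candidate thresholds. Building the full tree, fitting linear models at the leaves, pruning and smoothing are omitted. The function name is hypothetical.

```python
import numpy as np


def best_sdr_split(x, y):
    """Return (threshold, sdr) of the split of one attribute that maximizes
    the M5 standard deviation reduction:
        SDR = sd(T) - sum_i |T_i| / |T| * sd(T_i)
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)
    best_threshold, best_sdr = None, -np.inf
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate tied attribute values
        left, right = y_sorted[:i], y_sorted[i:]
        weighted_sd = (len(left) * np.std(left) + len(right) * np.std(right)) / n
        sdr = np.std(y_sorted) - weighted_sd
        if sdr > best_sdr:
            best_threshold = 0.5 * (x_sorted[i - 1] + x_sorted[i])
            best_sdr = sdr
    return best_threshold, best_sdr


x = np.arange(10.0)
y = np.where(x < 5, 1.0, 3.0)  # step change between x = 4 and x = 5
thr, sdr = best_sdr_split(x, y)
print(f"threshold = {thr:.1f}, SDR = {sdr:.2f}")  # threshold = 4.5, SDR = 1.00
```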
Extreme gradient boosting (XGBoost). The XGBoost model was proposed by [50] and is based on the idea of "boosting". It combines the predictions of a series of "weak" learners into a "strong" learner through an additive training process. It is designed to limit over-fitting and reduce computational time. It achieves this by simplifying the objective function and combining its predictive and regularization terms while maintaining optimal computational efficiency. Parallel computation is also implemented automatically during training. More details about the XGBoost model can be found in [50].
Support vector machine (SVM). The SVM model was developed by [51] and is widely used for classification, pattern recognition and regression analysis. SVM regression is built on a set of kernel functions that implicitly map the original, lower-dimensional input data into a higher-dimensional feature space. The SVM model has been successfully applied to predict ET0 [31,41]. The radial basis function (RBF) kernel was used in the present study because of its outstanding performance for predicting ET0 compared with other kernel functions, such as the linear, polynomial and sigmoid functions [19] (Kisi, 2015). Further information about the SVM model is given in [51].
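The RBF kernel used with SVM is commonly written as K(x, x′) = exp(−γ‖x − x′‖²). Here γ is a kernel width parameter and is unrelated to the psychrometric constant. The NumPy sketch below computes the kernel (Gram) matrix that an SVM works with in place of explicit high-dimensional features, and checks that this matrix is symmetric and positive semi-definite. The names and random stand-in inputs are hypothetical. Solving the ε-insensitive SVR optimization itself would normally be left to an established solver such as LIBSVM.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    # Clip small negative distances produced by floating-point cancellation
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(50, 3))   # e.g. normalized Tmax, Tmin, Ra
K = rbf_kernel(x, x, gamma=2.0)
eigenvalues = np.linalg.eigvalsh(K)        # real eigenvalues of a symmetric matrix
print(K.shape, eigenvalues.min() >= -1e-10)
```

Positive semi-definiteness (Mercer's condition) is what guarantees that such a kernel corresponds to an inner product in some feature space. This is why the mapping never has to be computed explicitly.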
Kernel-based nonlinear extension of Arps decline model (KNEA). The KNEA model is a novel nonlinear model proposed by [32] based on the Arps decline model and the kernel method. Unlike non-parametric "black-box" kernel-based models such as least-squares SVM, the KNEA model follows a "grey-box" approach and uses a semi-parametric formulation to build nonlinear models [52]. Kernel-based grey system models are more efficient with small samples [53][54], whereas the KNEA model performs better with larger samples because samples are not accumulated in the model.
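The KNEA model can be described as:
$$f(x) = a\,f(x-1) + g\big(u(x)\big) + \mu \tag{2}$$
where f(x) is the output at the current time step, f(x−1) is the output at the previous time step, a is the coefficient of the previous output, u(x) represents the factors affecting the output, g(u(x)) describes the relationship between u(x) and f(x), and μ is the bias. Eq (2) shows that the current output results from the joint action of the output at the previous time step and the influencing factors at the current time step.
The nonlinear function g(u(x)) is difficult to determine explicitly and can be expressed as
$$g\big(u(x)\big) = \omega^{T}\varphi\big(u(x)\big) \tag{3}$$
which maps the original influencing factors into a new feature space. Eq (2) can therefore be written as
$$f(x) = a\,f(x-1) + \omega^{T}\varphi\big(u(x)\big) + \mu \tag{4}$$
Although Eq (4) cannot be solved exactly, the parameters can be chosen so that the difference between its left- and right-hand sides is as small as possible:
$$\min_{\omega,a,\mu,\varepsilon} J = \frac{1}{2}\lVert\omega\rVert^{2} + \frac{\gamma}{2}\sum_{x=2}^{n}\varepsilon_x^{2} \quad \text{s.t.} \quad f(x) = a\,f(x-1) + \omega^{T}\varphi\big(u(x)\big) + \mu + \varepsilon_x,\quad x = 2,\dots,n \tag{5}$$
where γ is the regularization parameter, which controls model smoothness. Like SVM, this optimization problem can be solved by the Lagrange multiplier method:
$$L = J - \sum_{x=2}^{n}\lambda_x\Big[a\,f(x-1) + \omega^{T}\varphi\big(u(x)\big) + \mu + \varepsilon_x - f(x)\Big] \tag{6}$$
where λx is the Lagrange multiplier. The Karush-Kuhn-Tucker (KKT) conditions for optimality are:
$$\begin{aligned}
\frac{\partial L}{\partial \omega} = 0 &\;\Rightarrow\; \omega = \sum_{x=2}^{n}\lambda_x\,\varphi\big(u(x)\big)\\
\frac{\partial L}{\partial a} = 0 &\;\Rightarrow\; \sum_{x=2}^{n}\lambda_x\,f(x-1) = 0\\
\frac{\partial L}{\partial \mu} = 0 &\;\Rightarrow\; \sum_{x=2}^{n}\lambda_x = 0\\
\frac{\partial L}{\partial \varepsilon_x} = 0 &\;\Rightarrow\; \lambda_x = \gamma\,\varepsilon_x\\
\frac{\partial L}{\partial \lambda_x} = 0 &\;\Rightarrow\; a\,f(x-1) + \omega^{T}\varphi\big(u(x)\big) + \mu + \varepsilon_x - f(x) = 0
\end{aligned} \tag{7}$$
Eliminating ω and εx gives
$$\sum_{k=2}^{n}\lambda_k\,\varphi\big(u(k)\big)^{T}\varphi\big(u(x)\big) + \frac{\lambda_x}{\gamma} + \mu + a\,f(x-1) = f(x),\quad x = 2,\dots,n \tag{8}$$
or, in matrix form,
$$\begin{bmatrix}\Omega + \gamma^{-1}I_{n-1} & \mathbf{1} & F\\ \mathbf{1}^{T} & 0 & 0\\ F^{T} & 0 & 0\end{bmatrix}\begin{bmatrix}\lambda\\ \mu\\ a\end{bmatrix} = \begin{bmatrix}Y\\ 0\\ 0\end{bmatrix} \tag{9}$$
where Ωij = φ(u(i+1))Tφ(u(j+1)), 1 = [1, …, 1]T, F = [f(1), …, f(n−1)]T, Y = [f(2), …, f(n)]T, λ = [λ2, …, λn]T, and In−1 is the (n−1)-dimensional identity matrix, whose diagonal elements are 1 and all other elements 0. λ, μ and a can be obtained by solving Eq (9). The inner product Ωij can be replaced by a kernel function K(·,·) that satisfies Mercer's theorem. A Gauss-type kernel function was selected in the present study. Further information about the KNEA model is given in [32].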
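The following NumPy sketch shows how the KNEA parameters could be obtained. It assumes an LS-SVM-style formulation f(x) = a·f(x−1) + Σk λk K(u(k), u(x)) + μ, a ridge-type regularization parameter γ and a Gaussian kernel of width σ. The exact formulation in [32] may differ in detail, and all function names and values are hypothetical. Under these assumptions, fitting reduces to one bordered linear system in λ, μ and a. One-step-ahead predictions then combine the previous output with kernel evaluations of the current influencing factors.

```python
import numpy as np


def gauss_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_knea(f, u, gamma=100.0, sigma=1.0):
    """Solve the assumed bordered system for (lam, mu, a) using x = 2..n."""
    y, f_prev, u_t = f[1:], f[:-1], u[1:]   # targets, lagged outputs, factors
    m = len(y)
    A = np.zeros((m + 2, m + 2))
    A[:m, :m] = gauss_kernel(u_t, u_t, sigma) + np.eye(m) / gamma
    A[:m, m] = 1.0          # bias column
    A[:m, m + 1] = f_prev   # coefficient of the previous output
    A[m, :m] = 1.0          # KKT: sum of multipliers is zero
    A[m + 1, :m] = f_prev   # KKT: multipliers orthogonal to lagged outputs
    rhs = np.concatenate([y, [0.0, 0.0]])
    sol = np.linalg.solve(A, rhs)
    return sol[:m], sol[m], sol[m + 1], u_t  # lam, mu, a, support inputs


def predict_knea(f_prev, u_new, lam, mu, a, u_support, sigma=1.0):
    """One-step-ahead prediction from previous outputs and current factors."""
    return a * f_prev + gauss_kernel(u_new, u_support, sigma) @ lam + mu


rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=(30, 2))          # influencing factors u(x)
f = np.cumsum(rng.uniform(0.0, 1.0, size=30))    # output series f(x)
lam, mu, a, u_sup = fit_knea(f, u, gamma=100.0, sigma=0.5)
f_hat = predict_knea(f[:-1], u_sup, lam, mu, a, u_sup, sigma=0.5)
```

On the training data, the residuals f(x) − f̂(x) equal λx/γ. A larger γ therefore fits the training series more closely at the cost of smoothness.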
Multivariate adaptive regression spline (MARS). The MARS model is a non-parametric regression approach proposed by [55] that requires no assumption about the relationship between the independent and dependent variables. It models the response as a weighted sum of basis functions. Each basis function is a truncated spline function or a product of several such functions. Both the number and the form of the basis functions are determined automatically from the data. The MARS model combines three sets of merits: those of recursive partitioning regression in dividing the input space into regions, those of projection pursuit in handling high-dimensional data, and those of additive regression with adaptive knot selection. Further details of the MARS model can be found in [10].
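The building block of MARS is the mirrored pair of hinge (truncated linear spline) functions max(0, x − t) and max(0, t − x) around a knot t. The NumPy sketch below, with hypothetical names, fits a one-knot model by least squares with the knot fixed in advance. A full MARS implementation would instead search knots and interactions in a forward pass and prune basis functions in a backward pass, typically using generalized cross-validation.

```python
import numpy as np


def hinge_pair(x, knot):
    """Mirrored MARS basis functions max(0, x - t) and max(0, t - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)


def fit_fixed_knot_mars(x, y, knot):
    """Least-squares fit of y ~ b0 + b1*max(0, x-t) + b2*max(0, t-x)
    for ONE fixed knot t (no forward selection or backward pruning)."""
    h_plus, h_minus = hinge_pair(x, knot)
    design = np.column_stack([np.ones_like(x), h_plus, h_minus])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


x = np.linspace(0.0, 1.0, 21)
y = 1.0 + 2.0 * np.maximum(0.0, x - 0.5)   # flat, then rising after x = 0.5
coef = fit_fixed_knot_mars(x, y, knot=0.5)
print(np.round(coef, 6))
```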

Input combinations and K-fold cross-validation
Precipitation is not directly correlated with ET0. However, it partly reflects relative humidity and may therefore correct temperature-based ET0 models. The actual precipitation amount may underestimate or exaggerate its effect on reducing daily ET0, because it varies widely, from 0 mm to hundreds of mm per day in humid regions. Therefore, a simple transformed precipitation (Pt, equal to 1 for precipitation > 0 and 0 for precipitation = 0) was used to represent the general effect of precipitation on ET0 prediction. Two input combinations were thus used to assess the effects of temperature and precipitation on daily ET0 prediction: C1 (Tmax, Tmin and Ra) and C2 (Tmax, Tmin, Pt and Ra), where Ra is extraterrestrial radiation. K-fold cross-validation was applied. The temperature and precipitation data for 2001–2015 were divided equally into five periods. Four periods were used for model training and the remaining one for testing, and this was repeated over the five stages (Table 2). The main parameters of the eight machine learning models were optimized by grid search.
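A minimal sketch of the input preparation and the period-based five-fold split follows. The three-year blocks for S1–S3 are inferred from the equal partitioning of 2001–2015, and S4 and S5 are the periods named in the results. The function names, the synthetic arrays and the assumption that every record carries its calendar year are illustrative.

```python
import numpy as np

# Five equal three-year periods covering 2001-2015
PERIODS = {"S1": (2001, 2003), "S2": (2004, 2006), "S3": (2007, 2009),
           "S4": (2010, 2012), "S5": (2013, 2015)}


def transform_precipitation(p_mm):
    """Pt = 1 on days with precipitation > 0 mm, otherwise 0."""
    return (np.asarray(p_mm) > 0).astype(int)


def build_inputs(tmax, tmin, ra, p_mm=None, combination="C1"):
    """C1: [Tmax, Tmin, Ra]; C2: [Tmax, Tmin, Pt, Ra]."""
    columns = [tmax, tmin]
    if combination == "C2":
        columns.append(transform_precipitation(p_mm))
    columns.append(ra)
    return np.column_stack(columns)


def period_folds(years):
    """Yield (held-out period, train mask, test mask); each period is tested once."""
    years = np.asarray(years)
    for name, (start, end) in PERIODS.items():
        test = (years >= start) & (years <= end)
        yield name, ~test, test


# Usage with two hypothetical records per year
years = np.repeat(np.arange(2001, 2016), 2)
rng = np.random.default_rng(0)
tmax, tmin = rng.uniform(20, 30, 30), rng.uniform(5, 15, 30)
ra, p = rng.uniform(10, 40, 30), np.where(rng.uniform(size=30) < 0.3, 5.0, 0.0)
x_c2 = build_inputs(tmax, tmin, ra, p, combination="C2")
for name, train, test in period_folds(years):
    print(name, int(train.sum()), int(test.sum()))  # 24 training, 6 testing rows
```

Splitting by contiguous periods rather than by random days keeps temporally correlated records from leaking between training and testing data.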

Statistical evaluation
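Four common statistical indicators, RMSE, R2, MAE and NRMSE, were used to evaluate the models [56][57]:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{i,e} - Y_{i,m}\right)^{2}}$$
$$R^{2} = \frac{\left[\sum_{i=1}^{n}\left(Y_{i,m} - \bar{Y}_{m}\right)\left(Y_{i,e} - \bar{Y}_{e}\right)\right]^{2}}{\sum_{i=1}^{n}\left(Y_{i,m} - \bar{Y}_{m}\right)^{2}\,\sum_{i=1}^{n}\left(Y_{i,e} - \bar{Y}_{e}\right)^{2}}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|Y_{i,e} - Y_{i,m}\right|$$
$$NRMSE = \frac{RMSE}{\bar{Y}_{m}}$$
where Yi,m, Yi,e, Ȳm and Ȳe are the measured, estimated, mean measured and mean estimated reference evapotranspiration, respectively, and n is the number of observations. Higher R2 values indicate higher prediction accuracy, whereas lower RMSE, MAE and NRMSE values indicate better model performance. Because of the requirements of the MLP and KNEA models, the raw climatic data were normalized to the range 0–1:
$$z_n = \frac{z_i - z_{min}}{z_{max} - z_{min}}$$
where zn and zi are the normalized and raw data, and zmax and zmin are the maximum and minimum raw data, respectively.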
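These indicators and the min-max scaling can be sketched in NumPy as follows. The sketch assumes that R2 is the squared Pearson correlation between measured and estimated values, which is consistent with both means appearing in the definitions, and that NRMSE is RMSE divided by the mean measured ET0. The function names and the toy values are hypothetical.

```python
import numpy as np


def evaluate(measured, estimated):
    """R2, RMSE, NRMSE and MAE of estimated vs measured (FAO-56 PM) ET0."""
    m = np.asarray(measured, dtype=float)
    e = np.asarray(estimated, dtype=float)
    rmse = np.sqrt(np.mean((e - m) ** 2))
    mae = np.mean(np.abs(e - m))
    nrmse = rmse / m.mean()                        # RMSE relative to mean measured ET0
    dm, de = m - m.mean(), e - e.mean()
    r2 = (dm @ de) ** 2 / ((dm @ dm) * (de @ de))  # squared Pearson correlation
    return {"R2": r2, "RMSE": rmse, "NRMSE": nrmse, "MAE": mae}


def min_max_normalize(z):
    """Scale raw data to [0, 1] with z_n = (z - z_min) / (z_max - z_min)."""
    z = np.asarray(z, dtype=float)
    return (z - z.min()) / (z.max() - z.min())


scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print({k: round(float(v), 4) for k, v in scores.items()})
# {'R2': 0.9657, 'RMSE': 0.5, 'NRMSE': 0.2, 'MAE': 0.25}
```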

Comparison of prediction accuracy of eight machine learning models across China
The … [58] found that the LS-SVM model yielded accurate ET0 estimates in Changwu County, China. [13] indicated that the SVM model outperformed the ANN model for estimating daily ET0 in an extremely arid region of China. [27] found that the LS-SVM model outperformed the MARS and M5Tree models, while the MARS model was superior to the LS-SVM and M5Tree models in cross-station applications. [59] suggested that the SVM model gave better daily ET0 estimates than the tree-based ensemble models (RF, M5Tree, GBDT and XGBoost) under various input combinations across China. Specifically, the GRNN model outperformed all the other machine learning models in daily ET0 modeling in the seven climatic zones of China during the training period. However, … higher accuracy in the Qinghai-Tibet Plateau. The RMSE values obtained by the best-performing models in each climatic zone were generally smaller than or close to those reported for the corresponding regions by previous studies using only Tmax and Tmin data. Examples include the SVM (0.539 mm day−1) and ANN (0.561 mm day−1) models in Ejina City, China [13]; the ELM (0.444–0.498 mm day−1), GANN (0.445–0.499 mm day−1) and WNN (0.443–0.641 mm day−1) models in the humid region of southwest China [2]; and the SVM (0.530–0.868 mm day−1), M5Tree (0.637–0.953 mm day−1) and XGBoost (0.532–0.817 mm day−1) models in different climatic zones of China [40]. Figs 3 and 4, respectively, present scatter plots of daily FAO-56 PM ET0 values against those predicted by the eight machine learning models for the capital city of China (Beijing). Each figure covers the five cross-validation periods during testing for one of the two input combinations. The selected machine learning models had different prediction accuracies over the five periods.
Overall, higher statistical errors were obtained during the S4 period (2010–2012), while the S5 period (2013–2015) produced higher prediction accuracy. These differences largely resulted from changes in the climatic variables among the five periods, which confirms the need to apply K-fold cross-validation for accurately estimating daily ET0 in various climates [27,59]. Nevertheless, the machine learning models showed the same tendency at all cross-validation stages, with better daily ET0 estimates from the SVM, KNEA and MARS models. The data points of the SVM, KNEA and MARS models were less dispersed than those of the M5Tree model across China. These statistical results were generally similar to those obtained by various machine learning models using only Tmax and Tmin data in previous studies [2,13,40]. However, incorporating Pt as an input decreased RMSE by 7.7% and 7.6% on average during the training and testing periods, respectively. Specifically, during testing, considering Pt decreased RMSE on average by 4.8%, 6.0%, 7.4%, 6.5% and 6.5% in NWC, IM, QTP, NEC and NC, respectively. Much larger decreases were obtained in CC (9.7%) and SC (12.4%) when the input combination Tmax, Tmin, Pt and Ra was used instead of Tmax, Tmin and Ra. These results indicate that incorporating precipitation information into machine learning models can improve the prediction accuracy of daily ET0, particularly in subtropical and tropical humid regions (Tables 3-9).
[59] found that the prediction accuracy of empirical and machine learning models for estimating daily global solar radiation could be much improved by including precipitation as an input. Precipitation is a manifestation of cloud cover and can therefore correct temperature-based models for its radiation-reducing effect. This can also explain why the performance of machine learning models in predicting daily ET0 from daily maximum/minimum temperature improved when precipitation information was also included.

Comparison of model stability of eight machine learning models
As noted earlier, the GRNN, XGBoost and M5Tree models outperformed the MLP, SVM, KNEA, ANFIS and MARS models for predicting daily ET0 across China during the training period in terms of R2, RMSE, NRMSE and MAE (Tables 3-9). However, the SVM, KNEA and MARS models produced better daily ET0 estimates than the other models during the testing period. Tables 3-9 also show the percentage increase in testing RMSE relative to training RMSE for the eight machine learning models under the two input combinations in each region: NWC (Urumqi and Dunhuang), IM (Yinchuan and Erenhot), NEC (Harbin and Shenyang), NC (Beijing and Zhengzhou), QTP (Geermu and Lasa), CC (Wuhan and Guilin) and SC (Guangzhou and Haikou). Model stability is an essential factor to consider for obtaining accurate and reliable daily ET0 predictions. These tables suggest that the SVM, MLP and MARS models were the most stable, with consistently small percentage increases in testing RMSE relative to training RMSE in all climatic zones of China (on average 1.9%, 2.0% and 2.6%, respectively). The KNEA and ANFIS models also exhibited relatively small increases in testing RMSE (on average 6.4% and 7.8%, respectively). However, the GRNN, M5Tree and XGBoost models exhibited much larger increases in testing RMSE (on average 20.1%, 14.5% and 12.0%, respectively). This indicates their instability, because their performance deteriorated considerably on new data. [40] showed that the kernel-based SVM and ELM models were more stable than the tree-based RF, M5Tree and XGBoost models for estimating daily ET0. [34] also found that the RF and bagging models showed greater increases in testing RMSE than the SVR and gradient models when predicting global solar radiation.
These findings suggest that kernel-based machine learning models (e.g., SVM, ELM and KNEA) are generally more stable than tree-based models (e.g., RF, M5Tree and XGBoost).
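The stability measure used above is the percentage increase of testing RMSE over training RMSE, and it is simple to compute. The sketch below assumes that the reported averages are plain means over stations and cross-validation stages. The function names and the RMSE values in the usage lines are hypothetical, not results of this study.

```python
def rmse_increase_pct(rmse_train, rmse_test):
    """Percentage increase of testing RMSE relative to training RMSE."""
    return 100.0 * (rmse_test - rmse_train) / rmse_train


def mean_rmse_increase_pct(pairs):
    """Average percentage increase over (training RMSE, testing RMSE) pairs."""
    return sum(rmse_increase_pct(tr, te) for tr, te in pairs) / len(pairs)


# Hypothetical (training, testing) RMSE pairs in mm day-1 for two stations
pairs = [(0.50, 0.51), (0.40, 0.44)]
print(f"{mean_rmse_increase_pct(pairs):.1f}%")  # 6.0%
```

A small value means that accuracy on unseen data stays close to training accuracy. Large values, like those found for GRNN, M5Tree and XGBoost, point to instability.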

Comprehensive evaluation of eight machine learning models
The SVM, KNEA and MARS models outperformed the other machine learning models in daily ET0 modeling in terms of prediction accuracy during the testing period. In terms of model stability, the SVM, MLP and MARS models exhibited very small percentage increases in testing RMSE (< 3.0%), while the KNEA and ANFIS models showed relatively small increases (< 8.0%). The SVM model exhibited the best combination of prediction accuracy and model stability among the eight models, while the KNEA and MARS models also provided a satisfactory combination of the two. Considering both prediction accuracy and model stability, the SVM, KNEA and MARS models are recommended for estimating daily ET0 from temperature and precipitation data alone across the various climatic regions of China, and possibly elsewhere in similar climates.

Conclusions
This study compared the performance of eight machine learning models in four categories for estimating daily ET0: neuron-based (MLP, GRNN, ANFIS), kernel-based (SVM, KNEA), tree-based (M5Tree, XGBoost) and curve-based (MARS). The comparison used only temperature and precipitation data for 2001–2015 from 14 representative stations across the climatic zones of China. The results showed that machine learning models using only temperature data provided satisfactory daily ET0 estimates. Prediction accuracy was further improved across China when precipitation information was considered, especially in the (sub)tropical humid regions. This indicates that precipitation partly reflects relative humidity and can correct temperature-based ET0 models. The kernel-based SVM and KNEA models and the curve-based MARS model generally gave more accurate daily ET0 estimates than the other models. KNEA performed best in NWC and IM and SVM in QTP, CC and SC, while all three performed similarly well in NEC and NC. The SVM, MLP, MARS and KNEA models showed relatively small percentage increases in testing RMSE over training RMSE. Considering both prediction accuracy and model stability, SVM is highly recommended, while KNEA and MARS are alternative models for predicting daily ET0 in the various climatic regions of China. The satisfactory performance of these models with air temperature and transformed precipitation indicates that near-future prediction of daily ET0 is possible using public weather forecasts of daily maximum and minimum temperatures and of whether precipitation occurs. Nevertheless, further study is needed to explore the performance of these models at other temporal scales and in other climatic regions.