Comparison of five Boosting-based models for estimating daily reference evapotranspiration with limited meteorological variables

Accurate ET0 estimation is of great significance for effective agricultural water management and for realizing future intelligent irrigation. This study compares the performance of five Boosting-based models, namely Adaptive Boosting (ADA), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGB) and Gradient boosting with categorical features support (CAT), for estimating daily ET0 across 10 stations in the eastern monsoon zone of China. Six input combinations and the 10-fold cross-validation method were used to fully evaluate model accuracy and stability under the condition of limited meteorological variable inputs. Meanwhile, path analysis was used to quantify the effect of meteorological variables on daily ET0 and their contribution to the estimation results. The results indicated that CAT models achieved the highest accuracy (with a global average RMSE of 0.5667 mm d-1, MAE of 0.4199 mm d-1 and Adj_R2 of 0.8514) and the best stability regardless of input combination and station. Among the input meteorological variables, solar radiation (Rs) offered the largest contribution (average value of 0.7703) to the R2 value of the estimation results, and its direct effect on ET0 increased (ranging from 0.8654 to 0.9090) as the station's latitude decreased, while maximum temperature (Tmax) showed the opposite trend (ranging from 0.8598 to 0.5268). These results help to optimize and simplify the variables contained in the input combinations. The comparison between models based on the day of the year (J) and extraterrestrial radiation (Ra) showed that both J and Ra could improve modeling accuracy and that the improvement increased with station latitude; moreover, models with J achieved better accuracy than those with Ra.
In conclusion, CAT models are the most recommendable for estimating ET0, and the input variable J can be promoted to improve model performance with limited meteorological variables in the eastern monsoon zone of China.


Introduction
Reference evapotranspiration (ET0) is an essential factor in both hydrological and ecological processes [1][2][3][4][5]. Since ET0 plays a crucial role in calculating crop water requirements, water budgeting and agricultural water management, accurate estimation of ET0 is highly meaningful and serves as the foundation for realizing water-saving and intelligent irrigation. Methods of obtaining ET0 can generally be divided into three types: experimental methods, empirical models and numerical simulations. Although experimental determination can measure ET0 directly, it can hardly be popularized due to its tedious operational steps and strong regional limitations [6][7][8]. Nowadays, the FAO-56 Penman-Monteith (FAO-56 PM) model is generally regarded as the most authoritative method for estimating ET0 in semiarid and humid regions, and its estimates are also widely used as the target for validating other models in areas where measured ET0 data are not available [9][10][11][12]. However, the meteorological variables required by the FAO-56 PM model are difficult to obtain, or not fully available, in many regions, which limits its implementation. According to the principle for selecting an ideal ET0 estimation model proposed by Shih [13], ideal models should be based on minimal input variables with acceptable accuracy. Therefore, empirical models based on fewer meteorological variables have evolved over the years to enhance practicality [12,[14][15][16]; they can be generally classified as temperature-based, radiation-based, pan evaporation-based, mass transfer-based and combination types [4]. Among these empirical models, the Hargreaves-Samani model [17] requires the fewest meteorological variables and has proven its accuracy around the world, which makes it the most popular empirical model.
Other empirical models based on the simplified Penman-Monteith model or on solar radiation, such as the Priestley-Taylor [18], Irmak [19] and Makkink [20] models, have also been implemented in areas where the full set of meteorological factors can hardly be obtained. However, these methods usually suffer from regional limitations and poor portability, so they are not suitable for accurate estimation without local calibration.
By introducing intelligent algorithms to capture the non-linear relationship between meteorological variables and ET0, numerical simulation methods using machine learning and deep learning have advanced greatly. Since Kumar first investigated artificial neural network (ANN) models for estimating ET0 [21], this kind of method has attracted more and more researchers because of its short computation time, high precision and strong generalization ability. These algorithms can be generally classified as artificial neural network-based [9,[22][23][24][25], tree-based [7,26,27], kernel-based [28,29], heuristic-based [27,30,31] and hybrid algorithm-based [32,33].
To further improve the accuracy of machine learning algorithms in ET0 estimation, ensemble learning has drawn attention from more and more researchers. The core idea of ensemble learning is to combine several 'weak learners' into a new 'strong learner', so as to reduce bias and variance and improve prediction results. Common ensemble learning models such as Random Forest [34], Gradient Boosting Decision Tree [35] and Extreme Gradient Boosting [36] have already been widely used in various classification and regression problems [3,[37][38][39] owing to their simple structure and high accuracy.
This study compares five Boosting-based models to find the best one for estimating daily ET0 under the condition of limited input variables in the eastern monsoon zone of China. The main objectives of this study are as follows: (1) to compare the accuracy and stability of Boosting-based models with various input combinations across different climate zones; (2) to find an effective approach for improving modeling accuracy under the condition of limited input variables.

Study area and data description
Geographically, the eastern monsoon zone of China lies east of the Great Khingan Mountains, south of the Inner Mongolia Plateau and east of the eastern edge of the Tibetan Plateau, including the Loess Plateau, Sichuan Basin, Yunnan-Guizhou Plateau and the Hengduan Mountain area on the second topographic step, as well as the coastal plains and hilly areas on the third step. The climate types of the eastern monsoon zone include temperate monsoon, subtropical monsoon and tropical monsoon climates. The study area is significantly affected by the ocean monsoon in summer and by cold air flows from the north in winter. The annual average temperature changes significantly with latitude, showing a decreasing trend from south to north. This zone accounts for about 45% of the country's land area and 95% of the total Chinese population. As the eastern monsoon zone serves as one of the main farming areas of China, research on ET0 estimation models can provide a scientific basis for the accurate prediction of crop water demand in this region and improve agricultural water use efficiency, which is of great significance for the sustainable utilization of water resources.
According to the climate types and latitude distribution of the eastern monsoon zone, 10 meteorological stations (Harbin, Shenyang, Yan'an, Jinan, Nanjing, Changsha, Chengdu, Kunming, Nanning and Guangzhou) were selected as research stations. To be more specific, Harbin, Shenyang, Yan'an and Jinan belong to the temperate monsoon zone (TMZ); Nanjing, Changsha, Chengdu and Kunming belong to the subtropical monsoon zone (SMZ); and Nanning and Guangzhou belong to the tropical monsoon zone (TPMZ).
In order to test and verify the accuracy and stability of the Boosting-based models for ET0 estimation, daily meteorological variables, including maximum (Tmax) and minimum (Tmin) air temperature, relative humidity (RH), wind speed at 2 m height (U2) and solar radiation (Rs), recorded continuously from 1997 to 2016, were selected as the training and testing data set. The meteorological data were obtained from the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA) with good quality and high precision, and missing data were interpolated with a KNN method in Python during pre-processing. The annual average values of the main meteorological variables at the above stations during the study period are given in Table 1. All daily meteorological data were normalized to fall between 0 and 1 to improve the convergence rate of the models and minimize the influence of absolute scale. The normalization equation is as follows [3,24,26]:

X_norm = (X_0 − X_min) / (X_max − X_min)

where X_norm is the normalized value and X_0, X_min and X_max are the real value, the minimum value and the maximum value of the same variable, respectively.
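The min-max normalization above can be sketched directly; the temperature values below are illustrative only, not station data:

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1-D array of daily values into [0, 1] using
    X_norm = (X_0 - X_min) / (X_max - X_min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Illustrative daily maximum temperatures (deg C).
tmax = np.array([24.1, 30.5, 18.2, 27.9, 21.4])
tmax_norm = min_max_normalize(tmax)
```

In the study each variable is normalized independently, so the scaling of, say, Tmax does not affect Rs or U2.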

FAO-56 Penman-Monteith model
Since it is difficult to obtain measured ET0 data in this study area, ET0 values calculated by the FAO-56 Penman-Monteith model are regarded as the target for training and testing the Boosting-based models, which is a widely used and accepted practice in this case [2,8,22,30,40]. The FAO-56 PM model is expressed as:

ET0 = [0.408 Δ (Rn − G) + γ (900 / (Tmean + 273)) U2 (es − ea)] / [Δ + γ (1 + 0.34 U2)]

where ET0 is reference evapotranspiration (mm d-1), Rn is net radiation (MJ m-2 d-1), G is soil heat flux density (MJ m-2 d-1), Tmean is mean air temperature at 2 m height (°C), es is saturation vapor pressure (kPa), ea is actual vapor pressure (kPa), Δ is the slope of the saturation vapor pressure curve (kPa °C-1), γ is the psychrometric constant (kPa °C-1) and U2 is wind speed at 2 m height (m s-1).
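Given its intermediate terms (Δ, γ, Rn, es, ea and so on), the FAO-56 PM equation translates directly into code. The input values below are plausible illustrative mid-latitude numbers, not data from this study:

```python
def fao56_pm_et0(delta, gamma, rn, g, t_mean, u2, es, ea):
    """Daily reference evapotranspiration (mm d-1) from the FAO-56
    Penman-Monteith equation, taking its intermediate terms directly."""
    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator

# Illustrative inputs: delta and gamma in kPa/degC, rn and g in MJ m-2 d-1,
# t_mean in degC, u2 in m/s, es and ea in kPa.
et0 = fao56_pm_et0(delta=0.122, gamma=0.0666, rn=13.28, g=0.0,
                   t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
```

On a daily time step, G is commonly taken as approximately zero, as in the call above.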

Boosting-based models
Boosting is a category of ensemble learning algorithms. The principle of Boosting is first to train a weak learner from the training set with initial weights and then to update the weights according to the errors: samples with high error rates receive higher weights and are thus valued more by the subsequent weak learner. After adjusting the weights based on the training set, training of a single weak learner is repeated until the predetermined number of weak learners is reached. Finally, the weak learners are integrated through a set strategy (usually weighted averaging) to obtain the final strong learner for regression or classification [41]. In 1997, Freund proposed the first practical Boosting algorithm, Adaptive Boosting [42], which turned Boosting from an idea into a practical approach. Subsequently, Friedman introduced the idea of gradient descent into Boosting and proposed the Gradient Boosting algorithm [35], which is more practical and can handle different loss functions. Building on this research, Boosting-based models have been continuously developed and are now widely used in classification and regression problems. In this study, five Boosting-based models, Adaptive Boosting (ADA), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGB) and Gradient boosting with categorical features support (CAT), are employed and their performance in estimating ET0 is compared. All code for the Boosting-based models in this study was written in Python and run on a laptop with an Intel Core i7-9750H CPU @ 2.60 GHz, an NVIDIA GeForce GTX 1660 Ti GPU and 16 GB of RAM. To evaluate each model at the same level of model structure and complexity, only 'n_estimators' and 'learning_rate' were set (to 500 and 0.05, respectively); all other hyperparameters were left at their defaults.
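Under those settings, the five regressors can be instantiated roughly as follows. This is a minimal sketch: the scikit-learn classes shown are standard, while the XGB, LGB and CAT lines (commented out) assume the third-party xgboost, lightgbm and catboost packages and their usual regressor constructors:

```python
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor

# Shared settings from the text; everything else stays at library defaults.
PARAMS = {"n_estimators": 500, "learning_rate": 0.05}

models = {
    "ADA": AdaBoostRegressor(**PARAMS),
    "GBDT": GradientBoostingRegressor(**PARAMS),
    # The remaining three come from third-party packages whose regressors
    # expose analogous constructors (assumed installed):
    # "XGB": xgboost.XGBRegressor(**PARAMS),
    # "LGB": lightgbm.LGBMRegressor(**PARAMS),
    # "CAT": catboost.CatBoostRegressor(iterations=500, learning_rate=0.05),
}
```

Fixing only these two hyperparameters keeps the comparison focused on algorithmic differences rather than tuning effort.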
Adaptive Boosting (ADA). The first practical boosting algorithm, Adaptive Boosting (ADA), was proposed by Freund [42]. AdaBoost assigns equal initial weights to all training data for weak-learner training, then updates the weight distribution according to the prediction results. To be more specific, higher weights are assigned to mispredicted samples while lower weights are given to correctly predicted samples, which makes the next training step focus more on mispredicted samples to reduce bias. The above process is repeated until the specified number of iterations or the expected error rate is reached, and the predictions of the weak learners are then combined with their weights as the final result. For a given dataset D = {(x_i, y_i)}, i = 1, …, M, the steps of the ADA model for a regression problem can be expressed as follows: (1) Initialize the weight distribution of the training samples: w_{1i} = 1/M, i = 1, …, M. (2) For k = 1, 2, …, K, take D_k as the training set of weak learner f_k(x) and calculate: (a) Maximum error: E_k = max_i |y_i − f_k(x_i)|. (b) Relative error of each sample: e_{ki} = |y_i − f_k(x_i)| / E_k. (c) Regression error rate: e_k = Σ_i w_{ki} e_{ki}. (d) Weight of weak learner f_k(x): α_k = e_k / (1 − e_k). (e) Update the weight distribution of the samples: w_{k+1,i} = (w_{ki} / Z_k) α_k^(1 − e_{ki}), where Z_k = Σ_i w_{ki} α_k^(1 − e_{ki}) is the normalizing factor. (3) Obtain the final strong learner: f(x) = g(x), where g(x) is the weighted median of the weak-learner outputs f_k(x), k = 1, …, K, with weights ln(1/α_k).
Although ADA is no longer suitable for current scenarios involving large samples and high-dimensional data, its appearance turned the Boosting idea from a conjecture into a practical algorithm, which greatly promoted the development of subsequent Boosting-based algorithms.
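Steps (a)-(e) above can be illustrated with a small numpy sketch of one AdaBoost.R2-style weight update (linear loss assumed; the targets and predictions are illustrative):

```python
import numpy as np

def ada_r2_weight_update(w, y, pred):
    """One AdaBoost.R2-style sample-weight update with linear loss,
    following steps (a)-(e): well-predicted samples are down-weighted
    so the next weak learner focuses on the badly predicted ones."""
    abs_err = np.abs(y - pred)
    E = abs_err.max()                  # (a) maximum error E_k
    e = abs_err / E                    # (b) relative error e_ki
    err = np.sum(w * e)                # (c) regression error rate e_k
    alpha = err / (1.0 - err)          # (d) weak-learner weight alpha_k
    w_new = w * alpha ** (1.0 - e)     # (e) update before normalization
    return w_new / w_new.sum(), alpha  # divide by normalizing factor Z_k

w0 = np.full(4, 0.25)
y = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([1.1, 2.0, 3.0, 2.0])  # the last sample is predicted worst
w1, alpha = ada_r2_weight_update(w0, y, pred)
```

After the update, the worst-predicted sample carries the largest weight, which is exactly what drives the next weak learner's focus.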
Gradient Boosting Decision Tree (GBDT). The Gradient Boosting Decision Tree (GBDT) is an iterative decision tree algorithm proposed by Friedman [35]. The weak learners in a GBDT model have strong dependencies on each other and are trained in successive iterations on the residuals. The results of all weak learners are added together as the final result, which gives GBDT advantages in terms of overfitting control and computational cost; it is also insensitive to missing data and reduces bias at the same time. For a given dataset D = {(x_i, y_i)}, i = 1, …, M, the steps of the GBDT model for a regression problem can be expressed as follows: (1) Initialize the weak learner: f_0(x) = argmin_γ Σ_i L(y_i, γ), where L(y_i, γ) is the loss function. (2) For each iteration m, compute the negative gradient (residual) of each sample: r_{im} = −[∂L(y_i, f(x_i)) / ∂f(x_i)] evaluated at f = f_{m−1}. (3) Take (x_i, r_{im}), i = 1, …, M, as the training data of the m-th weak learner, with leaf-node regions R_{mj}, j = 1, …, J. For this new weak learner, the optimal fitting value of each leaf node is: γ_{mj} = argmin_γ Σ_{x_i ∈ R_{mj}} L(y_i, f_{m−1}(x_i) + γ). (4) Update the model: f_m(x) = f_{m−1}(x) + Σ_j γ_{mj} I(x ∈ R_{mj}). (5) Obtain the final strong learner as the model after the last iteration.

Extreme Gradient Boosting (XGB). Extreme Gradient Boosting (XGB) is an improved algorithm based on GBDT [36]. Different from the original GBDT model, XGB obtains the residual by performing a second-order Taylor expansion of the cost function and adds a regularization term to control model complexity. The regularization term reduces the variance of the model and keeps it simpler, making XGB superior to the original GBDT in terms of weighing the bias-variance tradeoff and preventing overfitting. XGB also supports multiple cost functions and parallel operation at the feature granularity.
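The practical consequence of the second-order expansion is a closed-form optimal output per leaf. A hedged numpy sketch follows, using xgboost's conventional symbols (lam for the text's α, gamma_leaf for its η); it is an illustration of the math, not the library's internal code:

```python
import numpy as np

def leaf_weight_and_obj(g, h, lam=1.0, gamma_leaf=0.0):
    """Optimal leaf output and objective contribution implied by the
    second-order expansion: w* = -G / (H + lam) and
    obj* = -0.5 * G**2 / (H + lam) + gamma_leaf, where G and H sum the
    first and second loss derivatives over the samples in the leaf."""
    G, H = g.sum(), h.sum()
    w_star = -G / (H + lam)
    obj_star = -0.5 * G ** 2 / (H + lam) + gamma_leaf
    return w_star, obj_star

# Squared-error loss gives g_i = pred_i - y_i and h_i = 1.
y = np.array([3.0, 3.5, 4.0])
pred = np.zeros(3)
w_star, obj_star = leaf_weight_and_obj(pred - y, np.ones(3), lam=1.0)
```

With squared error and lam = 0 the leaf output reduces to the mean residual, which recovers plain GBDT behavior; the regularizer shrinks it toward zero.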
The specific calculation procedures of XGB are described as follows: (1) Define the objective function: Obj = Σ_i L(y_i, ŷ_i) + Σ_k R(f_k) + C, where C is a constant term that can be omitted and R(f_k) is the regularization term at the k-th iteration, defined as: R(f_k) = ηT + (1/2) α Σ_j ω_j², where α is the penalty coefficient on leaf outputs, T is the number of leaves, η is the per-leaf penalty parameter and ω_j is the output of leaf node j.
(2) Introduce the second-order Taylor expansion of the objective function and adopt the mean square error as the loss function; the objective at the k-th iteration can then be written as: Obj^(k) ≈ Σ_i [g_i f_k(x_i) + (1/2) h_i f_k²(x_i)] + R(f_k) + C, where ω_{q(x_i)} = f_k(x_i), q(·) maps a sample to its leaf node, and g_i and h_i are the first and second derivatives of the loss function with respect to the previous prediction, respectively.
(3) Determine the final loss by summing the losses of the leaf nodes. The objective function can therefore be expressed as: Obj^(k) = Σ_j [G_j ω_j + (1/2)(H_j + α) ω_j²] + ηT, where G_j = Σ_{i∈I_j} g_i, H_j = Σ_{i∈I_j} h_i and I_j denotes all samples in leaf node j. Minimizing with respect to ω_j gives the optimal leaf output ω_j* = −G_j / (H_j + α) and the optimal objective Obj* = −(1/2) Σ_j G_j² / (H_j + α) + ηT.

Light Gradient Boosting Machine (LGB). Light Gradient Boosting Machine (LGB) is a novel algorithm from Microsoft [43], which has the advantages of lower memory consumption, higher precision and faster training efficiency. Traditional Boosting-based algorithms need to scan all the sample points for each feature to select the best split point, which makes the model too time-consuming with large samples and high-dimensional data. To solve these problems and further improve the efficiency and scalability of the model, LGB introduces the Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) algorithms. Fig 1 illustrates the special strategies adopted by the LGB algorithm; a detailed introduction follows:
The GOSS algorithm does not use all sample points to calculate the gradient; instead it retains the sample points with large gradients and randomly samples the points with small gradients, so that data sampling preserves the accuracy of the information gain. Information gain indicates the expected reduction in entropy from splitting a node on an attribute, which can be described as: Gain(B, V) = En(B) − Σ_v (|B_v| / |B|) En(B_v), with En(B) = −Σ_d p_d log2 p_d, where En(B) is the information entropy of collection B, p_d is the proportion of B pertaining to category d, D is the number of categories, v is a value of attribute V and B_v is the subset of B for which the attribute has value v. As shown in Fig 1A, LGB also uses a histogram-based approach, which discretizes floating-point feature values into k integers and constructs a histogram of width k. In this way, the optimal split point can be found from the discrete histogram values with lower memory consumption. In addition, Fig 1B shows that the histogram of a single leaf can be obtained by subtracting the histogram of its sibling node from that of its parent node, which further increases the speed of the model.
The general processes of the level-wise and leaf-wise strategies are shown in Fig 1C. Compared with the level-wise strategy, the depth-limited leaf-wise strategy used by LGB is more effective because it only splits the leaf with the largest information gain, while the depth limit effectively prevents overfitting.
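Of these strategies, GOSS is the easiest to sketch in isolation. The following is a simplified illustration (not LightGBM's internal code): keep the top-a fraction of samples by gradient magnitude, sample a b fraction of the rest, and re-weight so the estimated gain stays roughly unbiased. The values a = 0.2 and b = 0.1 are illustrative defaults:

```python
import numpy as np

def goss_sample(grad, a=0.2, b=0.1, seed=0):
    """Simplified GOSS: retain large-gradient samples, subsample the
    small-gradient ones, and amplify the latter by (1 - a) / b so the
    estimated information gain stays approximately unbiased."""
    rng = np.random.default_rng(seed)
    n = len(grad)
    order = np.argsort(-np.abs(grad))          # sort by |gradient|, descending
    top_k = int(a * n)
    kept_large = order[:top_k]                 # always keep the top-a fraction
    kept_small = rng.choice(order[top_k:], size=int(b * n), replace=False)
    weights = np.ones(n)
    weights[kept_small] = (1.0 - a) / b        # compensate for the subsampling
    kept = np.concatenate([kept_large, kept_small])
    return kept, weights[kept]

grads = np.linspace(-1.0, 1.0, 100)
kept, w = goss_sample(grads)
```

Only a + b of the data (here 30%) is scanned per split, which is where the speed-up comes from.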
Gradient boosting with categorical features support (CAT). Gradient boosting [44] with categorical features support (CAT) introduces a modified target-based statistics (TBS) scheme that uses the whole data set for training while avoiding potential overfitting through random permutations. To be more specific, CAT first randomly sorts all samples, then converts each value of a categorical feature to a numerical one by averaging the target labels of the samples that precede it in the permutation, smoothed with a prior and a weight for the prior. When building new weak learners, CAT first uses the gradients of the sample points preceding sample X_n to estimate the model, and then uses these models to calculate the gradient of X_n and update the model. Moreover, CAT uses oblivious trees as weak learners; the index of each leaf node in an oblivious tree can be encoded as a binary vector with length equal to the depth of the tree, which further enhances the model's resistance to overfitting.
Compared with XGB and LGB, CAT has the following main contributions: 1. Categorical features can be handled automatically by using TBS before the training process.
2. Feature dimensions can be enriched by combining categorical features according to the relationships between them.
3. Overfitting can be better resisted by adopting completely oblivious trees.
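The ordered target-based statistics idea can be illustrated with a simplified sketch (not CatBoost's implementation): each categorical value is encoded from the labels of samples that precede it in one fixed permutation, smoothed by a prior, so the encoding never sees a sample's own label. CatBoost additionally averages over several random permutations, which is omitted here:

```python
import numpy as np

def ordered_target_stats(categories, target, prior=0.5, prior_weight=1.0):
    """Encode each sample's category from the running mean of *preceding*
    same-category targets plus a smoothing prior (ordered TBS sketch)."""
    encoded = np.empty(len(categories))
    sums, counts = {}, {}
    for i, (c, t) in enumerate(zip(categories, target)):
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + prior_weight * prior) / (n + prior_weight)
        sums[c] = s + t       # only now does sample i's label enter the stats
        counts[c] = n + 1
    return encoded

cats = ["a", "a", "b", "a"]
labels = np.array([1.0, 0.0, 1.0, 1.0])
enc = ordered_target_stats(cats, labels)
```

The first occurrence of each category receives only the prior, which is what prevents target leakage for rare categories.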

Calibration and validation of the models
This study considered limited meteorological variable input combinations formed by combining air temperature data (Tmax and Tmin) with Rs, RH and U2, respectively. In addition, since extraterrestrial radiation (Ra) is commonly applied to improve modeling accuracy for estimating ET0 with limited meteorological inputs, and Ra is determined by the geographic data of the station and the day of the year (J), this study also employed J as an input variable to compare the accuracy improvements brought by Ra and J. As summarized above, the six input combinations shown in Table 2 are: (1) Tmax, Tmin, Rs; (2) Tmax, Tmin, RH; (3) Tmax, Tmin, U2; (4) Tmax, Tmin; (5) Tmax, Tmin, Ra; and (6) Tmax, Tmin, J. The 10-fold cross-validation method was used to better evaluate model accuracy and reduce the randomness introduced by test samples, and the average of the 10-fold cross-validation results was taken as the final performance of the model. In addition, meteorological data from 1997 to 2011 and from 2012 to 2016 were used as the training and testing sets, respectively (a different train/test proportion from the 10-fold cross-validation stage), to analyze model accuracy on the daily scale.
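The evaluation protocol can be sketched with scikit-learn's KFold utilities. The data below are synthetic stand-ins for one input combination, and n_estimators is reduced from the study's 500 purely to keep the toy example fast:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in: rows are days, columns one input combination
# (e.g. M4: Tmax, Tmin); y is the ET0 target. Not real station data.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, 200)

model = GradientBoostingRegressor(n_estimators=50, learning_rate=0.05)
cv = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv,
                         scoring="neg_root_mean_squared_error")
mean_rmse = -scores.mean()  # average of the 10 folds, as in the study
```

Averaging over the 10 folds, rather than relying on a single split, is what damps the randomness introduced by the choice of test samples.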

The following statistical indicators were used to evaluate model performance:

RMSE = sqrt((1/N) Σ_{i=1}^{N} (ET_{PM,i} − ET_{M,i})²)

MAE = (1/N) Σ_{i=1}^{N} |ET_{PM,i} − ET_{M,i}|

R² = [Σ_{i=1}^{N} (ET_{PM,i} − mean(ET_PM))(ET_{M,i} − mean(ET_M))]² / [Σ_{i=1}^{N} (ET_{PM,i} − mean(ET_PM))² · Σ_{i=1}^{N} (ET_{M,i} − mean(ET_M))²]

Adj_R² = 1 − (1 − R²)(N − 1) / (N − P − 1)

where ET_{PM,i} and ET_{M,i} are the ET0 values estimated by the FAO-56 PM model and the other models, respectively; mean(ET_PM) and mean(ET_M) are their mean values; and N and P are the numbers of test samples and input variables, respectively. RMSE and MAE are in mm d-1 and range from 0 (optimum value) to +∞ (worst value). R² and Adj_R² are dimensionless, with 1 as the optimum value.
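The four indicators map directly to short numpy functions (R² computed as the squared correlation, matching the formula above; the arrays below are illustrative):

```python
import numpy as np

def rmse(et_pm, et_m):
    return np.sqrt(np.mean((et_pm - et_m) ** 2))

def mae(et_pm, et_m):
    return np.mean(np.abs(et_pm - et_m))

def adj_r2(et_pm, et_m, p):
    """R2 as the squared Pearson correlation, then adjusted for the
    number of samples n and input variables p."""
    n = len(et_pm)
    r2 = np.corrcoef(et_pm, et_m)[0, 1] ** 2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

et_pm = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # FAO-56 PM "targets"
et_m = et_pm + 0.1                           # a biased toy estimate
```

Note that a constant bias inflates RMSE and MAE but leaves the correlation-based Adj_R² at 1, which is why the study reports all three together.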

Comparison of different Boosting-based models with various input combinations on daily scale
The performances of the Boosting-based models with different daily meteorological variable inputs at the Harbin, Shenyang, Yan'an, Jinan, Nanjing, Changsha, Chengdu, Kunming, Nanning and Guangzhou stations are presented in Tables 3-12, respectively. The tables show that the tested models generally had a similar performance ranking across the 10 stations. For brevity, Harbin, Changsha and Guangzhou were chosen as representatives of the TMZ, SMZ and TPMZ, respectively, for detailed description. As shown in Table 3, CAT models generally achieved the best performance. During the 10-fold cross-validation stage, models with the M1 (Rs) input (with RMSE ranging from 0.4288-0.5748 mm d-1, MAE from 0.2871-0.4715 mm d-1 and Adj_R² from 0.9019-0.9461) and the M2 (RH) input (with RMSE ranging from 0.4334-0.6108 mm d-1, MAE from 0.2919-0.4857 mm d-1 and Adj_R² from 0.8883-0.9446) performed best. When only Tmax and Tmin data were input, models based on the M4 input achieved the worst precision (with Adj_R² ranging from 0.8373-0.8726), followed by models based on the M3 (U2) input (with RMSE ranging from 0.5997-0.7052 mm d-1, MAE from 0.4259-0.5417 mm d-1,

PLOS ONE
Adj_R² from 0.8537-0.8940). It is worth noting that models based on M5 (Ra) and M6 (J), which combine temperature data with Ra and J respectively, achieved better performance than models based on M4 and even than models based on the M3 input. In addition, models based on M6 (with RMSE ranging from 0.5122-0.6429 mm d-1, MAE from 0.3473-0.4783 mm d-1 and Adj_R² from 0.8782-0.9231) obtained slightly better accuracy than models based on M5 (with RMSE ranging from 0.5224-0.6812 mm d-1, MAE from 0.3507-0.5107 mm d-1 and Adj_R² from 0.8632-0.9201). The performance of the tested models at Changsha station (SMZ) is presented in Table 6. The accuracy ranking of the Boosting-based models was the same as at Harbin station, in the order CAT, LGB, XGB, GBDT and ADA. However, the overall simulation accuracy decreased slightly compared with Harbin station: the average Adj_R² values of ADA, GBDT, XGB, LGB and CAT decreased by 13.02%, 11.22%, 10.00%, 10.00% and 9.72%, respectively. In particular, models based on the M1 combination achieved the best precision (with RMSE ranging from 0.2746-0.4275 mm d-1, MAE ranging
from 0.1959-0.3527 mm d-1 and Adj_R² from 0.8976-0.9573), which was far ahead of the models with other input combinations. The same results could also be found in the testing stage. Table 12 shows the performance of the tested models at Guangzhou station (TPMZ). The same performance ranking was found at Guangzhou station, but the overall simulation accuracy decreased more significantly: the average Adj_R² values of ADA, GBDT, XGB, LGB and CAT decreased by 38.39%, 30.08%, 27.83%, 27.43% and 26.73%, respectively, compared with Harbin station. In conclusion, CAT models offered the highest accuracy among all tested models under every input combination and at every station, followed by the LGB and XGB models, which also achieved relatively satisfactory precision. CAT1, based on Rs, undoubtedly obtained the best performance and is highly recommended for estimating daily ET0 in this study area. However, the CAT5 and CAT6 models, based on only temperature data and partial geographic data, achieved acceptable accuracy with the fewest meteorological variables, and can be regarded as more cost-effective and more conducive to promotion and application. Because the modeling accuracy of the ADA models was much worse than that of the other Boosting-based models, the stability comparison in the present study mainly focuses on the GBDT, XGB, LGB and CAT models. Among the tested models, the CAT model not only achieved the smallest average RMSE value but also the most concentrated distribution of RMSE values regardless of the input combination, which indicates that the CAT model had the best stability.
The stability of the other three models was basically the same; thus modeling accuracy should be the primary consideration when selecting among these three models for estimating ET0. In terms of the effect of the input combinations, taking the CAT models as an example, the RMSE values of the CAT model based on the M2 input showed the smallest fluctuation (with RMSE ranging from 0.4334-0.6044) across the 10 stations, followed by models based on M3, M6, M5, M4 and M1. It is also worth noting that although the accuracy of models with the M1 input was the highest at each station, the accuracy gap between stations across different climate zones was the largest (with RMSE ranging from 0.2746-0.5861), which may result from differences in the Rs distribution among stations and the different contributions of Rs to daily ET0 across climate zones.

The results of path analysis between meteorological variables and ET 0 at 10 stations
Path analysis is a method proposed by Sewall Wright for studying the direct and indirect effects of independent variables on dependent variables and for quantitatively analyzing the degree of mutual influence among factors. This study therefore introduced path analysis to analyze the effects of Tmax, Tmin, RH, U2 and Rs on daily ET0. The results of path analysis between the meteorological variables and ET0 across all stations are shown in Table 13. It can be found from Table 13 that, except for RH and U2, the other three meteorological variables had positive correlation coefficients with ET0 at all 10 stations. As illustrated in Fig 3A, from Harbin station to Guangzhou station the correlation coefficients of Tmax, Tmin, RH and U2 with ET0 generally showed a decreasing trend, while only that of Rs behaved in the opposite way (increasing from 0.749 at Harbin station to 0.826 at Guangzhou station), indicating that the overall contribution of Rs to ET0 increases significantly and becomes more and more crucial for accurately estimating ET0 as station latitude decreases in this study area.
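Path coefficients (direct effects) are the standardized partial regression coefficients obtained by solving the predictor correlation system; a minimal numpy sketch with synthetic data (not the station data) follows:

```python
import numpy as np

def path_coefficients(X, y):
    """Direct effects (path coefficients) as standardized partial
    regression coefficients: solve R_xx @ beta = r_xy, where R_xx is
    the predictor correlation matrix and r_xy holds predictor-target
    correlations. The indirect effect of variable i via j is
    r_ij * beta_j, and beta @ r_xy recovers the model R2."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    R_xx = np.corrcoef(Xs, rowvar=False)
    r_xy = Xs.T @ ys / len(y)
    beta = np.linalg.solve(R_xx, r_xy)
    return beta, r_xy

# Synthetic check: the target is driven mainly by the first variable.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 0.9 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0.0, 0.1, 500)
beta, r_xy = path_coefficients(X, y)
```

The decomposition beta_i * r_xy_i is one common way to express each variable's contribution to R², which is the quantity reported per station in the text.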
The trends in the direct effects of Tmax, Tmin, RH, U2 and Rs on ET0 across the 10 stations are shown in Fig 3B. At Harbin station, Tmax contributed the largest direct effect (0.544), 0.154 more than Rs (0.390), while the direct effect of Tmin was almost none (only 0.003). As station latitude decreases, the direct effect of Tmax decreases significantly, the direct effects of Tmin, RH (in absolute value) and Rs rise apparently, and that of U2 rises slightly. At Guangzhou station, the direct effect of Tmax falls to only 0.104, while the direct effect of Rs rises to 0.711 and Tmin exceeds Tmax, reaching 0.348. This trend is similarly reflected in the overall contributions of the variables to R². As can be seen from Fig 3C, Tmax, the second most contributing variable (0.142) at Harbin station, becomes the least contributing variable (0.001) at Guangzhou station. On the contrary, Tmin, the least contributing variable at Harbin station (0.000), contributes 0.069 to R² at Guangzhou station. The other meteorological variables also gradually increase their contributions to the R² of the ET0 estimation results as latitude decreases from north to south. In conclusion, Rs had the greatest contribution to ET0, followed by Tmax, Tmin and RH, while U2 generally had the least effect on daily ET0. These results explain the differences in modeling accuracy between stations across different climate zones under the same input conditions and offer a reliable reference for selecting an appropriate input combination for ET0 estimation in different climate zones.

Effect of Ra and J on improving model accuracy
Ra has been shown to improve the estimation accuracy of daily ET0 when only limited meteorological variables are available [26,[46][47][48]. As shown in Fig 4, the improvements were 16.18%, 11.72%, 7.33%, 5.10%, 18.67%, 9.95% and 9.17%, respectively. CAT6 clearly performed even better than CAT5, and this improvement in modeling accuracy decreased as the station's latitude decreased. This phenomenon may result from the fact that the meteorological conditions of the stations in this study area are generally quite stable and ET0 results from the coupled effects of various meteorological variables, so J contains more of the overall information and variation pattern of ET0 than the single calculated variable Ra.
To sum up, the results of employing Ra with only temperature data for estimating ET0 in the present study are consistent with previous studies. As a parameter used to calculate Ra, J can also improve modeling accuracy with limited inputs, and even outperforms Ra. Therefore, models based on the J input can be recommended for estimating ET0 when some meteorological variables and geographical data are absent.

Strategy for selecting proper input combination at different stations
According to Tables 3-12 and the results of the path analysis above, the meteorological variables involved in the input combination can be optimized for each station. For example, Tmin has the smallest contribution to R² and the smallest direct effect on daily ET0 at Harbin station, so Tmin could be removed from the input combinations there without reducing the modeling accuracy. Similarly, Tmax could be removed from the input combinations at Guangzhou station, since it hardly contributes to the R² value of the estimation results. In conclusion, the selection of the input combination for estimating daily ET0 should be based on the importance and contribution of the meteorological variables at each individual station, so as to use the available meteorological variables more effectively and obtain better accuracy.

Conclusion
This study investigated five Boosting-based models, including the original Adaptive Boosting (ADA), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGB) and Gradient boosting with categorical features support (CAT), for accurately estimating daily ET0 with six different meteorological variable input combinations at 10 stations across the eastern monsoon zone of China. The results indicate that the CAT models had the highest accuracy and stability of all tested models under the same input combinations across all stations. The LGB and XGB models achieved very similar accuracy, while the original ADA models produced the worst performance. Under the condition of limited meteorological variable inputs, Rs plays the most important role in accurately estimating daily ET0, which makes the models based on M1 the most accurate at every station. Models with the M2 input combination offered the second-highest precision, while models based on M4 (temperature data only) had the worst estimation accuracy. However, when Ra and J were employed together with temperature data, modeling accuracy increased significantly: the accuracy of models based on M6 generally ranked third (better than models with the M3 input) and models based on M5 ranked fifth (much better than models with the M4 input). Thus, in terms of improving the accuracy of models with limited meteorological variables, J has a better effect than Ra and is easier to obtain in this study, and the improvement from employing J, compared with Ra, became more significant as station latitude increased. In summary, CAT can be most highly recommended for estimating daily ET0, and J can be highly recommended for improving model accuracy when only limited meteorological variables are available or geographical information is absent, in the eastern monsoon zone of China.