Advanced machine learning model for better prediction accuracy of soil temperature at different depths

Soil temperature is vitally important in the biological, physical and chemical processes of terrestrial ecosystems, and modeling it at different depths is very important for land-atmosphere interactions. The study compares four machine learning techniques, extreme learning machine (ELM), artificial neural networks (ANN), classification and regression trees (CART) and group method of data handling (GMDH), in estimating monthly soil temperatures at four different depths. Various combinations of climatic variables are utilized as inputs to the developed models. The models' outcomes are also compared with multi-linear regression based on Nash-Sutcliffe efficiency, root mean square error and coefficient of determination statistics. ELM is found to generally perform better than the other four alternatives in estimating soil temperatures. The performance of the models decreases as soil depth increases. It is found that soil temperatures at three depths (5, 10 and 50 cm) can be mapped utilizing only air temperature data as input, while solar radiation and wind speed information are also required for estimating soil temperature at the depth of 100 cm.


Introduction
For different climatic zones, whether tropical, arid or semi-arid, soil temperature is considered one of the most essential variables affecting agricultural water management and processes. As a result, forecasting soil temperature can be important for water resources planners, especially with respect to agricultural water demand. The fluctuation of soil temperature at different time increments (hourly, daily or monthly) plays a substantial role in the moisture status of the soil at different depths and governs the exchange of energy and moisture at the soil-atmosphere boundary [1]. In general, soil temperature is a key factor in the success of agricultural processes, as it governs evaporation and evapotranspiration, plant growth, ventilation and root conditions [2,3]. Furthermore, soil temperature influences the status and activity of microorganisms within the soil (reference). Because soil temperature changes with depth (the temperature fluctuates more at the soil surface than at deeper levels), it is necessary to monitor and evaluate the soil temperature at different depths [4,5].
In fact, the basic factors affecting soil temperature and its distribution with depth are the climatic variables, including air temperature, relative humidity, wind speed, solar radiation, rainfall, atmospheric pressure and sunshine duration. Generally, most existing studies on soil temperature have relied on a few or all of these variables to predict it [6,7,8].
It should be noted that in some cases most of these variables might not be available, and the relationship between soil temperature and these variables is highly non-linear. For these reasons, researchers have turned to machine learning models as an effective technique for accurately predicting soil temperature [9,10,11].
During the last two decades, machine learning methods have been applied effectively and accurately to several engineering applications, especially forecasting, prediction and pattern recognition problems. In 2014, the Coactive Neuro-Fuzzy Inference System (CANFIS) was employed to forecast daily soil temperature in arid and semi-arid areas [12]. Relatively good forecasting performance was achieved; however, the range of the maximum error was slightly high. For the Bandar Abbas and Kerman stations in Iran, Nahvi et al. [13] developed a modified version of the Extreme Learning Machine (ELM) by integrating it with a Self-adaptive Evolutionary (SaE) algorithm, introducing the SaE-ELM model. The model was structured with atmospheric pressure, air temperature and global solar radiation as inputs. The SaE-ELM model was found to slightly improve soil temperature forecasting accuracy.
Furthermore, in 2017, Adaptive Neuro-Fuzzy Inference System (ANFIS), Gene Expression Programming (GEP) and Artificial Neural Network (ANN) methods were used to estimate soil temperature (ST) at various depths for two stations in Turkey [11]. The GEP method was reported to outperform the other methods, attaining better accuracy in forecasting soil temperature at all depths. In the same year, Mehdizadeh et al. [14] examined GEP as a forecasting model for monthly soil temperatures at 31 stations in Iran. However, that model was developed using a different set of input variables, including geographical information and a periodicity component, rather than relying on the traditional meteorological variables reported earlier. The results of the study showed that the ANFIS model enhanced the prediction accuracy of soil temperature for all 31 stations.
It has been reported that a major challenge in achieving accurate soil temperature prediction is the unavailability of most of the meteorological variables needed as model inputs [12]. In addition, soil temperature prediction carries inherent uncertainties arising from the precision of the measurement sensors, sensor noise and the nonlinear relationships among features. Conventional forecasting methods are found to be inappropriate for meeting these requirements, specifically when forecasting variables that are nonlinear and dynamic in nature. Additionally, it is difficult to implement classical modelling techniques when the system behavior is unknown or only partly known. In such cases, the use of new techniques becomes essential, especially for such a complex nonlinear dynamical system. For forecasting applications that include several inputs, selecting the most appropriate inputs is considered the most significant step in developing the forecasting model. Therefore, selecting the minimum set of parameters that contains the most essential information needed to accurately forecast the desired variable is a vital step in structuring the model. In our study, several inputs should be considered for forecasting soil temperature, and it might be necessary to use different combinations of parameters because some parameters are unavailable. Input selection is thus a necessary step to ensure accurate prediction of the model's output. However, existing studies on ST prediction did not pay attention to this step as long as the required data were available to the model developers, even though the model input variables are not necessarily accessible in all case studies.
Therefore, this study investigates the potential for developing an accurate ST prediction model relying on the most suitable input pattern. It is of interest to introduce a method that can automatically pre-select the most appropriate inputs. In this context, the Group Method of Data Handling (GMDH) has been employed to optimally select the appropriate input parameters for soil temperature at different depths [15]. GMDH is considered an effective self-organizing algorithm that can be combined with machine learning methods and permits proper selection from a database.
It should be noted that acquiring accurate ST values in the field requires installing several thermometers at several soil depths. In addition, the installation should be carried out at different locations within the study area at the same time to assure the consistency and accuracy of the collected data [16,17]. Implementing these procedures is costly and time-consuming, especially in developing countries [18]. As a result, accurate and consistent ST data are very limited, and hence there is a need for a robust model that can capture the mapping between the input(s) and ST as the model's output (Feng et al. [19]). Recently, Mehdizadeh et al. [20] developed a Fractionally Autoregressive Integrated Moving Average (FARIMA) model to predict ST and compared the results with classical Artificial Intelligence (AI) models, namely Gene Expression Programming (GEP) and Feed Forward Back Propagation Neural Network (FFBPNN) methods. Although the results showed that FARIMA outperformed the FFBPNN and GEP methods, the prediction accuracy of FARIMA was relatively inadequate for extreme ST values [20].
Because direct measurement of soil temperature, which is essential for several meteorological, hydrological and agricultural applications, is expensive and difficult, it becomes crucial to examine the potential of machine learning methods to estimate it. In this context, the current study proposes and assesses several machine learning methods for predicting soil temperature. As reported earlier, considerable research effort has been devoted to predicting ST at different depths. However, the major inadequacy of traditional statistical models such as ARIMA is the need to predefine a proper stochastic procedure to identify the uncertainty associated with all variables used in the model. In addition, for classical machine learning models, prior information on the relationships between the variables used (for example, covariance, variance and correlation values) has to be accurately identified to select the proper input-output architecture. Furthermore, classical machine learning models have experienced over-fitting problems that could lead to unexpectedly high prediction errors when different input patterns are examined. In this study, the ELM method has been developed in order to bridge these research gaps and drawbacks. The ELM has been shown to be a reliable and promising algorithm for overcoming over-fitting problems; its procedure is designed to overcome the disadvantages of both the traditional and the machine learning modeling concepts.
The ELM procedure considerably reduces the possibility of over-fitting during training, and hence consistent prediction accuracy for unseen input patterns can be achieved. In addition, the random projection procedure, combined with parallel computing techniques, increases the likelihood of successful convergence and lessens the time needed to achieve the performance goal.
The purpose of the model is to predict the soil temperature at different depths (5, 10, 50 and 100 cm) on a monthly basis. To substantiate the accuracy of the developed methods, a comprehensive comparative analysis has been carried out between the proposed machine learning models, namely CART, GMDH, ELM and ANN. In addition, different groupings and patterns of climate variables, including air temperature, solar radiation, relative humidity and wind speed, have been examined as model inputs. Different statistical indices have been evaluated to examine how closely the model output matches the observed soil temperature at different depths. It should be noted that this is the first attempt to utilize both the CART and GMDH models as predictors of soil temperature.

Used data
In the study, monthly climatic data, namely air temperature (T), relative humidity (RH), solar radiation (Rs), wind speed (W) and soil temperature for the depths of 5, 10, 50 and 100 cm, were obtained from Mersin station (longitude 34°38′ E, latitude 36°48′ N, altitude 3 m), which is operated by the Turkish Meteorological Service. The study area (Fig 1) has a Mediterranean climate with wet winters and dry summers [11,21]. The winter can bring very heavy rains, and flooding is a big problem in some regions. The air temperature ranges from 24°C (winter) to 40°C (summer). The data cover 25 years of monthly records from 1986 to 2010. The first 80% of the data was utilized for training and the remaining 20% for testing. Table 1 summarizes the statistical properties of the climatic and soil temperature data. The ST data are approximately normally distributed (skewness values are close to 0). Maximum values of the soil temperatures at different depths are higher in the test phase than in the training phase, which may limit the extrapolation capabilities of the implemented models [22,23]. The variation of ST decreases with increasing depth (see St. Dev. in Table 1). The soil temperature variation at various depths is illustrated in Fig 2. As observed, ST values at the depths of 5, 10 and 50 cm are highly correlated, while ST at 100 cm varies quite differently from the values at the other three depths. Figs 3-6 show the relationships between each climatic input and the ST values at different depths. Air temperature is highly correlated with ST, especially for the first three depths, followed by Rs, W and RH, respectively.
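The 80/20 partition described above can be sketched as follows. This is a minimal illustration that assumes the split is purely chronological ("the first 80%"); the function name `chronological_split` and the placeholder index array are hypothetical and stand in for the actual station records.

```python
import numpy as np

def chronological_split(records, train_fraction=0.8):
    """Split time-ordered records into a leading training part and a trailing test part."""
    n_train = int(round(len(records) * train_fraction))
    return records[:n_train], records[n_train:]

# 25 years of monthly records (1986-2010) give 300 samples.
# Placeholder month indices are used here instead of real measurements.
months = np.arange(25 * 12)
train, test = chronological_split(months)
print(len(train), len(test))  # 240 60
```

With this ordering the test period is the final five years, which is consistent with the remark that test-phase maxima can exceed those seen in training.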

ANN.
Artificial neural network (ANN) is a data processing system based on the neural structure of the human brain. It constructs relations between inputs and outputs and has a parallel data processing architecture like the human neural system [24]. The basic element of a human neural system is a neuron, which has four basic components. Neurons receive weighted inputs, combine them, apply a nonlinear operation and give the output. Therefore, the artificial neuron, which is the elementary processing element of an ANN, has four functions like natural neurons. Clustering these artificial neurons forms the artificial neural network. This clustering happens by making layers, which are then connected to each other.

PLOS ONE
Most applications need an ANN to have three interconnected layers, which are the input, hidden and output layers. Fig 7 illustrates an ANN architecture with three inputs, four hidden nodes and two outputs. These ANNs are known as multi-layer perceptrons (MLP) [25,26,27]. The connection among layers is one of the most important features of an ANN. It can be feedforward or feedback. Feedforward networks are unidirectional, while feedback networks have loops.
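To make the layer structure concrete, the sketch below runs one forward pass through a feedforward network with three inputs, four hidden nodes and two outputs. It is only an illustration: the weights are random rather than trained, the logistic form of the sigmoid is an assumption, and the names `sigmoid` and `mlp_forward` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid (assumed form of the hidden-layer nonlinearity)."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Weighted sum -> nonlinearity in the hidden layer, then a linear output layer."""
    hidden = sigmoid(w_hidden @ x + b_hidden)
    return w_out @ hidden + b_out

rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(4, 3)), rng.normal(size=4)  # 3 inputs -> 4 hidden nodes
w_out, b_out = rng.normal(size=(2, 4)), rng.normal(size=2)        # 4 hidden nodes -> 2 outputs

x = np.array([0.2, -0.1, 0.5])  # illustrative, already-scaled inputs
y = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(y.shape)  # (2,)
```

Information flows in one direction only, from inputs to outputs, which is what makes this a feedforward rather than a feedback network.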
One of the most recognized advantages of an ANN is that it can learn. Learning occurs by adjusting weights to minimize the error between predicted and observed values. There are different training algorithms that minimize this error. One of the most popular is Bayesian regularization, which is also used in this study [28]. The sigmoid function is used as the activation function in this study. Despite its theoretical complexity and its need for fine tuning, one of the main advantages of the ANN is its effectiveness for variables with complex relationships and for high-dimensional problems.
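For reference, the sigmoid activation is commonly written in its logistic form. The expression below assumes that form, since the exact variant used in the study is not preserved in this text:

$$f(x) = \frac{1}{1 + e^{-x}}$$

It maps any real-valued weighted sum $x$ into the interval $(0, 1)$ and is differentiable everywhere, which is what training algorithms that adjust the weights rely on.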
CART. Breiman et al. [29] introduced the classification and regression tree (CART). It is a prediction model that uses a binary decision tree (see Fig 8) [30,31,32]. It is especially suited to tasks for which little a priori knowledge exists. The aim of tree-building algorithms is to determine split conditions that give correct classification or prediction of cases. CART divides the dataset into child nodes until it reaches a stable state in which further splitting of leaf nodes does not improve the tree as a whole. There are three main steps in the CART methodology: tree growing, tree pruning and selection of the optimal tree. Since CART handles both continuous and categorical dependent variables, it is widely used by practitioners.
Friedl et al. [33] noted that boosting and bagging algorithms can improve the performance of the CART algorithm. Deciding on the input feature and the threshold value for each split is the important part of forming the structure of the decision tree. Because it is a decision tree, it naturally discards ineffective input features. CART models are highly interpretable, since the impact of each input variable on the output can be seen from the tree structure.
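The choice of a split threshold can be illustrated with a small sketch. It assumes the usual regression-tree criterion of minimising the summed squared error of the two child nodes, which the text does not state explicitly; the function name `best_split` and the six data values are hypothetical.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) for the binary split of x that minimises
    the summed squared error of y around the two child-node means."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate identical values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_threshold = 0.5 * (x_sorted[i - 1] + x_sorted[i])
            best_sse = sse
    return best_threshold, best_sse

# Illustrative values only, not data from the study.
air_temp = np.array([5.0, 8.0, 12.0, 20.0, 25.0, 30.0])
soil_temp = np.array([7.0, 9.0, 11.0, 22.0, 26.0, 31.0])
threshold, sse = best_split(air_temp, soil_temp)
print(threshold)  # 16.0
```

Growing a full tree repeats this search over every input feature at every node; a feature that never yields the best split is simply never used, which is how ineffective inputs are discarded.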
ELM. For decades, the learning speed of feed-forward neural networks was generally slower than desired, since all network parameters are tuned iteratively, mostly using slow gradient-based learning algorithms. Classical learning methods also need more human involvement to obtain suitable model parameters. Extreme learning machine (ELM) was proposed as a new machine learning method to overcome these limitations of classical models for single-hidden-layer feed-forward neural networks (SLFNs) [34,35]. It simply uses the idea of a random projection followed by linear regression. Tuning of the hidden neurons is not required during its learning phase. A sub-network of several nodes can be used as a hidden node.
In recent decades, it has been used in a variety of problems such as feature learning, classification, regression and clustering [36,37,38,39,40,41,42,43]. ELM has become faster and more efficient for big data processing because of improvements in parallel computing techniques.
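The idea of a random projection followed by linear regression can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the hidden weights are drawn from a standard normal distribution, the output weights are obtained with a pseudo-inverse, the data are synthetic, and the names `elm_fit` and `elm_predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, y, n_hidden, rng):
    """Random hidden layer (never tuned) followed by a least-squares output layer."""
    a = rng.normal(size=(X.shape[1], n_hidden))  # random input weights
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = sigmoid(X @ a + b)                       # hidden layer output matrix
    beta = np.linalg.pinv(H) @ y                 # output weights via pseudo-inverse
    return a, b, beta

def elm_predict(X, a, b, beta):
    return sigmoid(X @ a + b) @ beta

rng = np.random.default_rng(42)
X = np.linspace(-1.0, 1.0, 60).reshape(-1, 1)  # synthetic, scaled input
y = np.sin(3.0 * X[:, 0])                      # synthetic target

a, b, beta = elm_fit(X, y, n_hidden=20, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, a, b, beta) - y) ** 2)))
print(train_rmse)
```

Because only the output weights are solved for, training amounts to a single linear least-squares problem, with no iterative gradient-based tuning.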
The output function of a generalized single-hidden-layer feed-forward neural network with $L$ hidden nodes can be defined as:

$$f_L(x) = \sum_{i=1}^{L} \beta_i \, G(a_i, b_i, x)$$

where $\beta_i$ is the output weight of the $i$-th hidden node and $G$ is the hidden node output function. Given the number of hidden nodes and the hidden node output function, the ELM learning process consists of the following steps: randomly assigning the hidden node parameters $(a_i, b_i)$, $i = 1, \ldots, L$; calculating the hidden layer output matrix $H$; and calculating the output weights $\beta = H^{\dagger} T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $T$ is the target [34].
GMDH. Group method of data handling (GMDH) was originated by Ivakhnenko [44]. It is a set of inductive algorithms, comprising clusterization, rebinarization, parametric, probabilistic and analogues complexing algorithms. It has a wide range of applications such as optimization, data mining, complex system modeling, pattern recognition and deep learning. It is cited as one of the oldest deep learning methods [45]. Owing to its inductive nature, it finds the optimal structure of the model without intervention by the user.
The performance of GMDH has been reported to be better than that of classical alternatives such as ARIMA, back-propagation neural networks and single exponential smoothing [46]. GMDH has three parts: input variables, external criteria and internal criteria. Polynomial functions are used in the majority of GMDH algorithms, and the network formed by GMDH is adaptive. A generic relation between input and output variables can be expressed by the following Kolmogorov-Gabor polynomial [47]:

$$y = a_0 + \sum_{i=1}^{m} a_i x_i + \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j + \sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m} a_{ijk} x_i x_j x_k + \cdots$$

where $y$ is the node output, $x_i, x_j, x_k, \ldots$ are the inputs, $m$ is the number of inputs, and $a_0, a_i, a_{ij}, \ldots$ are the coefficients of the polynomial.
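As an illustration of how such polynomial coefficients can be estimated, the sketch below fits the second-order, two-input truncation of the Kolmogorov-Gabor polynomial by ordinary least squares. Treating this quadratic form as a single GMDH node is an assumption for illustration; the study's actual GMDH configuration is not given in the text, the data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def quadratic_features(x_i, x_j):
    """Columns: 1, x_i, x_j, x_i*x_j, x_i^2, x_j^2."""
    return np.column_stack([np.ones_like(x_i), x_i, x_j, x_i * x_j, x_i**2, x_j**2])

def fit_partial_polynomial(x_i, x_j, y):
    """Least-squares estimate of the six coefficients of one two-input node."""
    coeffs, *_ = np.linalg.lstsq(quadratic_features(x_i, x_j), y, rcond=None)
    return coeffs

rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 50)  # synthetic inputs
x2 = rng.uniform(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.3 * x1 * x2  # known generating polynomial

coeffs = fit_partial_polynomial(x1, x2, y)
print(np.round(coeffs, 3))
```

In a self-organizing scheme, many such nodes would be fitted on different input pairs and ranked by an external criterion evaluated on data not used for fitting, so that only the most useful inputs survive.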

Application and results
In the presented work, the accuracy of four data-driven methods (ELM, ANN, CART and GMDH) and multi-linear regression (MLR) is examined in mapping monthly soil temperature at various depths. A sigmoid function and 250 hidden nodes were used for the ELM after trying various numbers. For the ANN models, Bayesian regularization was employed for training and the optimal number of hidden nodes was determined as 25. The number of hidden neurons was decided by trial and error, considering minimization of the training error. In the hidden and output layers, sigmoid and purelin activation functions were utilized, respectively. The loss function was mean squared error, and root mean square error (RMSE) was utilized to assess the employed models. To check for over-training, which is a well-known challenge in AI-based techniques, the dataset was divided into training and testing sets; the models were calibrated using the training samples, and their performance was checked with testing samples that had no role in model training (calibration stage). An acceptable result in the testing stage therefore indicates that the proposed models were not over-trained.
The following criteria were used for evaluation of the model results:

Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(ST_{im} - ST_{ip}\right)^2}$$

Nash-Sutcliffe Efficiency (NSE):
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(ST_{im} - ST_{ip}\right)^2}{\sum_{i=1}^{N}\left(ST_{im} - \overline{ST}_m\right)^2}$$

Determination Coefficient ($R^2$):
$$R^2 = \left[\frac{\sum_{i=1}^{N}\left(ST_{im} - \overline{ST}_m\right)\left(ST_{ip} - \overline{ST}_p\right)}{\sqrt{\sum_{i=1}^{N}\left(ST_{im} - \overline{ST}_m\right)^2 \sum_{i=1}^{N}\left(ST_{ip} - \overline{ST}_p\right)^2}}\right]^2$$

where $N$ = data quantity, $ST_{im}$ = measured soil temperature, $\overline{ST}_m$ = mean of measured soil temperature, $ST_{ip}$ = predicted soil temperature, and $\overline{ST}_p$ = mean of predicted soil temperature.
Training and test results of the ELM models with various input scenarios are summarized in Table 2. At the first three depths (5, 10 and 50 cm), the models with only the temperature input perform better than the other models, while for the depth of 100 cm, the ELM model with T, Rs, W or full inputs has the lowest RMSE and the highest NSE and R². The RMSE of the best ELM models ranges from 1.190 °C (10 cm depth) to 3.211 °C (100 cm depth) in modeling soil temperature at various depths. Table 3 reports the training and test statistics of the ANN with different climatic inputs in the estimation of soil temperature at the 5, 10, 50 and 100 cm depths. The optimal ANN model was found with only the temperature input for each depth. The RMSE of the best ANN models ranges from 1.429 °C (10 cm depth) to 5.407 °C (100 cm depth).
The RMSE, NSE and R² statistics of the CART are provided in Table 4. A different trend is observed for this method compared to ELM and ANN; the linear structure of CART may be the reason for this. The best CART models for the depths of 5, 10, 50 and 100 cm were obtained from the fourth, second, first and third input combinations, respectively. The RMSE of the best model increases from 1.269 °C (10 cm depth) to 3.330 °C (100 cm depth). Table 5 presents the training and test results of the GMDH with respect to RMSE, NSE and R² in mapping soil temperatures at various depths. For this method, the temperature input provides the best performance for the depths of 50 cm and 100 cm, while for the other two depths the model with T and RH (second input combination) has the best accuracy. The RMSE of the best GMDH models varies from 1.165 °C (50 cm depth) to 4.486 °C (100 cm depth). The training and test results of the MLR in mapping soil temperature at four different depths are reported in Table 6. The best MLR models for the depths of 5, 10, 50 and 100 cm were obtained from the first, first, second and third input combinations, respectively. The RMSE of the best models ranges from 0.997 °C (50 cm depth) to 4.045 °C (100 cm depth). As a linear model, MLR appears to be worse than the linearly structured CART model, except for ST at 50 cm, for which the MLR with T and RH inputs performs better than the other four methods.
The RMSE values of the best models are compared visually in the bar graphs of Fig 10. The differences among the models with respect to the various climatic inputs can be seen more clearly from these graphs. In the case of the 5 cm depth, including additional climatic variables (RH, Rs and W) in the inputs does not affect the accuracy of the CART and GMDH, while the accuracy of the ELM and ANN decreases, and the one-input model with temperature data has the lowest RMSE for all methods. The same trend is observed in the estimation of soil temperature at the 10 cm and 50 cm depths, except for the last one and two-input combinations of the GMDH. In the case of the 100 cm depth, adding the Rs and W variables to the input combination generally increases the performance of the ELM and CART, and they perform better than the ANN and GMDH. As an example, the regression tree of the CART model for soil temperature at the 5 cm depth is illustrated in Fig 11. The scatterplots of the optimal models in estimating ST at various depths are shown in Figs 12-15. For soil temperatures at the 5 cm depth (ST5) and the 10 cm depth (ST10), the ELM and GMDH generally have less scattered estimates than the CART and ANN models (Figs 12 and 13). The ELM better predicts low ST5 and ST10 values (lower than 15 °C), while the GMDH better estimates middle values (between 15 and 30 °C). Here, the main advantage of the ELM model over the GMDH model is that it uses only air temperature data, while the latter also requires relative humidity. For soil temperatures at the 50 cm depth (ST50), the fit line of the GMDH model is closer to the exact line (Y = T, where Y is the estimate and T is the target or observed value), which indicates that the estimates of the GMDH are closer to the observed values than those of the ELM, ANN, CART and MLR models (Fig 14). All the optimal models use only the temperature input. For soil temperatures at the 100 cm depth (ST100), the ELM model is relatively better than the other models, while the ANN model provides the worst results (Fig 15).
The time variation of the models' estimates and the observed ST values is shown in Figs 16-19. All five methods capture the general trend of the ST values at the first three depths, while at the depth of 100 cm considerable under- and over-estimations are observed for all methods. The ANN seems inadequate in capturing ST at the 100 cm depth, while the GMDH and MLR have considerable over- and under-estimates at this depth. The main reason for this might be the low correlation between the climatic inputs and ST at the 100 cm depth. Another reason may be the different behavior of ST at the 100 cm depth, as also observed in Fig 2. Overall, the ELM provides better accuracy than the ANN, CART, GMDH and MLR in estimating soil temperature at multiple depths. This result is in accordance with the study of Feng et al. [19], in which ELM was applied to estimate soil temperature at the depths of 2, 5, 10 and 20 cm and compared with GRNN, BPNN and RF models; better estimates were obtained from ELM than from the other models.
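The three evaluation statistics used throughout this section (RMSE, NSE and R²) can be computed as in the following sketch, which assumes their conventional definitions, with R² taken as the squared correlation between measured and predicted values. The function names and the four sample values are hypothetical and are not results from the study.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error, in the units of the observations."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

def r2(obs, pred):
    """Coefficient of determination as the squared Pearson correlation."""
    d_obs, d_pred = obs - obs.mean(), pred - pred.mean()
    return float(np.sum(d_obs * d_pred) ** 2 / (np.sum(d_obs**2) * np.sum(d_pred**2)))

# Illustrative soil temperatures in degrees Celsius.
observed = np.array([10.0, 15.0, 20.0, 25.0])
predicted = np.array([11.0, 14.0, 21.0, 24.0])

print(rmse(observed, predicted), nse(observed, predicted), r2(observed, predicted))
```

RMSE carries the unit of the predicted quantity, whereas NSE and R² are dimensionless, which is why the three are usually reported together.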

Conclusion
The abilities of four machine learning methods, ELM, ANN, CART and GMDH, in estimating soil temperature at different depths were compared utilizing various combinations of climatic variables as inputs, and the results were compared with the MLR model. The following conclusions can be reached from the application results:
• It was found that the models' accuracies generally decrease with increasing soil depth.
• Soil temperatures at 5, 10 and 50 cm depths could be successfully predicted using only air temperature data as input. In prediction of ST at 100 cm depth, however, solar radiation and wind speed information are also needed.
• ELM method generally provided superior accuracy to the other methods in predicting monthly soil temperatures at various depths.
• ELM can be used in real soil temperature forecasting which carries importance for agricultural decision systems.
In the current research, the models were tested with data from only one site. In future studies, the models may be tested using more data from other sites, and ELM may be compared with other machine learning methods such as neuro-fuzzy, fuzzy-genetic and neuro-genetic models.