Figures
Abstract
Drought is a climate risk that affects access to safe water, crop development, ecological stability, and food production. Therefore, developing drought prediction methods can lead to better management of surface and groundwater resources. Similarly, machine learning can be used to find improved relationships between nonlinear variables in complex systems. Initially, the standardized precipitation evapotranspiration index (SPEI) was calculated, and then using large–scale signals such as large–scale climate signals (the North Atlantic Oscillation, the Arctic Oscillation, the Pacific Decadal Oscillation, and the Southern Oscillation Index), along with climatic variables including temperature, precipitation, and potential evapotranspiration, predictions were made for the period of 1966–2014. Several new machine learning models including Least Square Support Vector Regression (LSSVR), Group Method of Data Handling (GMDH), and Multivariate Adaptive Regression Splines (MARS) were used for prediction. The results showed that in estimating SPEI in moderately arid climates, the GMDH model with criteria (RMSE = 0.26, MAE = 0.17, NSE = 0.95 in validation) under scenario S1 (included all variables plus the SPEI of the previous month) performed better, while in arid and cold climates, the LSSVR model (RMSE = 0.22, MAE = 0.18, NSE = 0.95 in validation) under S1, and in arid and hot climate, the LSSVR model (RMSE = 0.29, MAE = 0.19, NSE = 0.93 in validation) under scenario S2 (included meteorological variables plus the SPEI of the previous month) had higher prediction accuracy. Although the MARS model was less accurate in validation, it showed higher accuracy during calibration compared to the other two models in all climates. The results showed that using large–scale signals for predicting SPEI was beneficial. It can be concluded that machine learning models are useful tools for predicting the SPEI drought index in different climates within similar ranges.
Citation: Bour S, Kayhomayoon Z, Hassani F, Ghordoyee Milan S, Bazrafshan O, Berndtsson R (2025) Enhancement of standardized precipitation evapotranspiration index predictions by machine learning based on regression and soft computing for Iran’s arid and hyper–arid region. PLoS ONE 20(3): e0319678. https://doi.org/10.1371/journal.pone.0319678
Editor: Majid Niazkar, Euro-Mediterranean Center for Climate Change: Fondazione Centro Euro-Mediterraneo sui Cambiamenti Climatici, ITALY
Received: September 26, 2024; Accepted: February 5, 2025; Published: March 18, 2025
Copyright: © 2025 Bour et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data for this study are publicly available from The NOAA Physical Sciences Laboratory (PSL) website (https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html).
Funding: This research was funded independently and supported by the authors. Ronny Berndtsson was funded by the MECW (The Middle East in the Contemporary World) project at the Centre for Advanced Middle Eastern Studies, Lund University.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Drought is caused by a temporary imbalance between the supply and demand of water, usually due to less rainfall than average [1]. Drought often develops slowly over time and has long–term effects [2]. Therefore, a challenge for managers and stakeholders is determining the onset of a drought period. Additionally, there are different severities of drought in terms of timing and spatial extent [3]. Drought is a natural phenomenon occurring in all climates [4]. Based on the operational definition, there are four main types of droughts. Meteorological drought indicates a lack of rainfall compared to the long–term average. Agricultural drought signifies moisture deficiency and reduced agricultural productivity. Hydrological drought denotes insufficient surface and groundwater supply. Another type is socio–economic drought where inadequate water supply leads to negative impacts on the environment, economy, and society [5]. Several indices have been developed by researchers to study drought, each addressing certain variables for the assessment. The efficiency of drought monitoring systems depends on the indices chosen according to the drought situation in the region. Over the years, numerous indices have been developed for monitoring drought in meteorological sectors [6].
The most important factor in drought situations is rainfall, but changes in temperature and consequent evaporation and transpiration can exacerbate or mitigate drought severity [7,8]. Various indices are based on rainfall or a combination of rainfall and evapotranspiration. Monitoring drought using different indices will undoubtedly yield different results. Therefore, since drought is a phenomenon dependent on multiple variables, it seems that, alongside precipitation, evapotranspiration should be considered, especially for arid and semi–arid regions [9]. Indices that consider both precipitation and evapotranspiration can be used to monitor current and future climatic changes. An appropriate index introduced for monitoring agricultural drought is the Standardized Precipitation Evapotranspiration Index (SPEI), which, due to considering the water balance, has a strong correlation with soil moisture and fast response to drought.
Previous studies have shown that SPEI is a robust indicator for drought monitoring and analysis [10–14]. Various models have been proposed for predicting drought indices. Models such as Autoregressive Integrated Moving Average (ARIMA), Markov chains, linear, and nonlinear regression models have been widely used by researchers [15–19]. Due to limitations such as the need for extensive data, heavy reliance on expert knowledge for identification, and sometimes complex approaches, machine learning methods have started to be developed in this area [20].
Machine learning models are capable of providing robust predictions in a time–efficient manner and with fewer variables [21]. These models are efficient in simulating environmental and hydrological phenomena [22–25]. In recent years, researchers have started to develop such models to predict various drought indices [26–29]. Machine learning models encompass a wide spectrum of methods, each with different performance based on the algorithms defined in their structure. However, one of the primary challenges we face in predicting drought is the use of local and regional data, such as rainfall observations and climatology. On the other hand, the limited length of historical data including uncertainties and errors necessitates researchers to continuously restructure prediction methods [7]. Over the past two decades, the use of climatic signals and remote sensing data have overcome some of these problems [7]. Climatic signals, with long recurrence periods, are sufficiently reliable and are used worldwide for predicting droughts, floods, minimum or maximum river flows, the onset of warm or cold seasons, etc. [30].
Several studies in recent years have shown that the trend of drought and salinity is related to large–scale climatic phenomena. These phenomena are defined as standardized numerical indices such as SOI, NAO, PDO, and NOI, which have been examined by researchers for their correlation and relationships with local hydroclimatic variables in different regions of the world. These indices are obtained through measurements of temperature and air pressure at various points in the atmosphere [31]. Several studies have utilized meteorological parameters and large–scale variables to study drought. For example, Deo et al. (2018) [32] predicted agricultural drought using climatic variables and signals and the Support Vector Regression (SVR) model in Australia. The results showed that the model had good capability in estimating drought, and there was no significant difference between observed and estimated drought characteristics. Tian et al. (2018) [33] address the prediction of agricultural drought using SPEI at different time scales and the indices niño southern oscillation, Western Pacific Subtropical High (WPHS(, and SVR. The results showed that SPEI–6 reflected soil moisture better than other variables, and WHPS mainly controlled the region’s temperature, with the addition of this variable improving the model’s accuracy. Morid et al. (2007) [34] predicted drought using an artificial neural network, precipitation, and the SOI and NAO indices in Tehran province. The results showed that precipitation was the best predictor, and the two selected signals had little effect on prediction. Additionally, the results of the correlation between the standardized precipitation index (SPI) and ENSO and PDO indices showed a strong relationship between SPI and the two mentioned climate signals at the time of drought occurrence [35].
Among various machine learning models, Multivariate Adaptive Regression Splines (MARS) and least square support vector regression (LSSVR) models are some of the most recent ones that have been utilized in various research studies. MARS can be used to provide regression relationships. The model has been successfully used in predicting groundwater response by Fernandes et al. (2024) [36], runoff prediction by Singh et al. (2022) [37], surface water quality index prediction by Trach et al. (2022) [38], and evaluation of the scour propagation rates around pipelines by Najafzadeh et al. (2022) [39]. LSSVR has also been evaluated in various time series studies such as the prediction of longitudinal dispersion coefficient of rivers by Arya Azar et al. (2023) [23], prediction of groundwater level by Kayhomayoon et al. (2023) [40], and evaluation by Kumar et al. (2023) in predicting river flow [41]. Along with these two models, Group Method of Data Handling (GMDH) is also based on deep learning. One of the capabilities of this model is to convert a complex data structures into simpler ones, which can be effective in estimating parameters such as SPEI. However, this aspect has so far received less attention in research. However, Table 1 presents a summary on studies for SPEI prediction using machine learning. Due to global warming, an increasing number of studies is focusing on drought prediction. Many of these studies have used meteorological variables in the estimation of the SPEI, e.g., Poornima and Pushpalatha (2019) [42], Dikshit et al. (2021) [43], and Mokhtar et al. (2021) [44]. Fewer studies have used data from satellite imagery to predict SPEI (e.g., Deo et al., 2018 [32]). To summarize, less studies have been made on SPEI prediction using climate variables and global climate indices. Thus, one of the novelties of this study is the utilization of the three machine learning methods; MARS, LSSVR, and GMDH for this purpose. Each of these methods has a unique structure and performance. Use of the MARS model is novel and the extraction of its regression relationship could prove beneficial in prediction of SPEI. However, questions persist regarding its accuracy compared to other machine learning methods such as LSSVR and GMDH, as well as the complexity of the extracted relationship. Therefore, the present study aimed to: (1) extract the most effective variables in predicting agricultural drought using MARS, LSSVR, and GMDH, and (2) compare the characteristics of observed and predicted droughts and introduce the best model for predicting agricultural drought in three climates; hyper–arid–moderate, arid–cold, and arid–hot climate in Iran. Hence, the novelty of this research compared to similar studies lies in considering the use of new machine learning models for predicting the SPEI index in these three typical climate areas, taking into account climate variables and remote sensing data.
2. Materials and methods
2.1. Study area
Iran is generally located in the arid and semi–arid region, however, with a wide variety of this general climate. Commonly, the north of Iran has highlands and mildly arid or humid areas, while the south, with the lowest elevation, is mainly characterized by moderate to severe aridity. This makes Iran constantly at risk of droughts and vulnerable to the effects of climate change. Among the different regions with different climatic characteristics, one study area was randomly selected for evaluation in each of these climatic region. Accordingly, climate analyses were performed using the modified extended de–Martone method [54] for 13 synoptic climatic stations with 48–year records (577 months). Table 2 shows the characteristics of the climatic stations in each of these selected regions. The three selected arid climates were: arid-cold, arid–hot, and hyper–arid–moderate. The areas can be said to be representative for each type of climate. As can be seen, 13 stations represent these climates. Six stations were located in arid-cold climate, three in arid-hot, and four in hyper–arid–moderate climate. In the arid–cold, arid–hot and hyper–arid–moderate climates, the mean annual precipitation varied between 161 and 319 mm, 22.5 and 248 mm, and 56 and 80 mm), respectively.
To predict the SPEI index, two data groups including meteorological data such as maximum temperature (Tmax), minimum temperature (Tmin), mean temperature (Tmean), precipitation (Pr), and potential evapotranspiration (ETPc) and large–scale climatic signals (AO, NAO, PDO and SOI) were used at a monthly time step during the time period 1966–2014. The meteorological time series used in this study are presented in Fig 1. According to the figure, the highest Tmax for arid–hot, hyper–arid–moderate and arid–cold climates was 41.4, 44.5 and 37.1°C, respectively. On average, Tmax in these climates fluctuated in the range of 29.7, 29.8, and 20.7°C. Tmin fluctuated between 7.3 and 30.9°C, –3.4 and 30.8°C, and –7.1 and 23.1°C for the mentioned climates, respectively. The highest mean annual precipitation for these climates was 242.6, 72.2, and 102.7 mm, respectively, which was less than the average annual precipitation for the entire Iran, equivalent to 250 mm. The largest annual potential evapotranspiration occurred in hyper–arid–moderate climate (256.6 mm), which averaged 73.8 mm per year in this climate. Among the three climates, the arid–hot climate had the highest average annual potential evapotranspiration (97.0 mm) (Fig 1).
The meteorological variables were used together with large–scale climate indices in an attempt to improve SPEI prediction [55]. Fig 2 shows the corresponding large-scale climate indices that were used as input to the modeling after standardization and normalization. These were obtained from the NOAA website. The ENSO index is related to two indices, the Southern oscillation index (SOI), sea level pressure (SLP), and sea surface temperature (SST) in the equatorial Pacific ocean, where its negative and positive values indicate the cold and warm phases of ENSO or La Niña and El Niño conditions [56]. The Pacific decadal oscillation (PDO) is a climate oscillation pattern with its center of variability over the Pacific ocean and North america, and its measurements of SST and SLP are taken in the North pacific ocean (20° N) and in the northern United States [57]. The Arctic oscillation (AO) is the northern Hemisphere annular mode and is characterized by winds circulating the Arctic at a latitude of around 55o N. The North Atlantic oscillation (NAO) consists of a north–south dipole of anomalies, with one center located over Greenland and the other center of opposite sign spanning the central latitudes of the North Atlantic between 35o N and 40oN.
2.2. SPEI index
Developed by Vicente–Serrano et al. (2010) [58], the standardized precipitation evapotranspiration index (SPEI) is a widely used metric for evaluating drought conditions. This index takes into account precipitation, temperature, and potential evapotranspiration (PET). The SPEI combines the sensitivity of the Palmer drought severity Index (PDSI) to changes in evapotranspiration with the straightforward calculations and multiscale characteristics of the standardized precipitation index (SPI). As a result, it incorporates the attributes of both the SPI and the PDSI. To obtain the value of SPEI, the first step includes estimating the evapotranspiration for each month. Then, utilizing a straightforward water balance model, the difference between precipitation (P) and potential evapotranspiration (PET) for month i is calculated using:
Calculating the SPEI is similar to the method used for computing the SPI, as it involves estimating cumulative probability values for the Di values through “i” by fitting a probability density function. Due to the tendency of Di values to converge to negative values, two–parameter probability functions are unsuitable for this calculation. Vicente–Serrano et al. (2010) [58] found that the three–parameter log–logistic probability density function provided the best fit for the Di values among the various functions tested. The general form of this probability density function is:
where α, β, and γ are the scale, shape, and main parameters, respectively, for the Di values within the domain of ∞ < D> γ. The cumulative probability function of this distribution is:
To estimate SPEI, we utilized the Classic Abramovich–Vastigan function according to:
where W is calculated using:
where p represents the probability of the specified values of D being exceeded. The values of C0, C1, and C2, as well as d1, d2, and d3, are constants. The SPEI is a standardized variable and can therefore be compared with other SPEI values in space and time. A SPEI value equal to zero corresponds to cumulative probabilities (D) of 0.05.
2.3. Machine learning models
Three machine learning models were utilized to estimate the SPEI–12 index, each of which is explained below. The models employed in this study included MARS, LSSVR, and GMDH, all of which possess unique structures and performance levels. Each of these methods offers specific features: MARS provides a robust regression relationship that is conducive to accurate predictions, LSSVR is an enhanced version of SVR that, despite its simplicity, yields highly accurate results in forecasting, and GMDH is a deep learning model that simplifies a complex structure through a series of steps.
2.3.1. Least square support vector regression.
Introduced in 1999 [30], LSSVR aims to overcome some of the shortcomings of SVR since it provides us with lower computational complexity, higher performance, and lower runtime. Assume that {xk, yk}k = 1N is training data where input and output data include xk ∈ RN and yk ∈ R, respectively. The nonlinear regression function in the initial weighting space is:
where W and b represent the weight and bias of the regression function, respectively. T denotes the transpose symbol, and φ(x) is the nonlinear mapping of inputs in the high–dimensional feature space. In LSSVR, we are looking for a solution for the optimizing problem according to:
where e indicates an error function and γ is a regulation parameter and controls the approximation function. The solution to Eqn. (7) is:
where α is the Lagrangian coefficient. The LSSVR model can be written in the form:
where K(χ, χk) is called the kernel function.
2.3.2. Multivariate adaptive regression splines (MARS).
Presented by Friedman (1991) [59], MARS is a non–parametric regression method for flexible regression modeling of high–dimensional data. The general form of this model can be expressed as [60]:
where x is a variable and there is a constant value in this model called a node. For each variable, the identified node will differ, and the coefficient βm varies with each state of hm. The number and location of the nodes are determined through a forward–backward step–by–step process (Fig 3). During the forward phase, many nodes are generated, while in the backward phase, the nodes that contribute less to the overall fit are removed. At each step, from all the lines of each basis function, a linear process is selected that minimizes the defects of the fitted model. Basic functions that yield the minimum possible values to the model undergo elimination steps, and the optimal model is then selected. The loss of fit index used is based on a standardized criterion:
where is the dependent value predicted by the model, n is the number of observations in the data series, M is the number of non–constant terms in the model, and C(M) is the error function. The purpose of C(M) is to compensate for the complexity of the model; in this situation, excessive fitting is avoided to increase the efficiency of the model.
2.3.3. Group method of data handling (GMDH).
GMDH is a model designed to address complex and nonlinear problems [61]. It constructs a self–organizing model capable of handling prediction, classification, and other system issues. The GMDH method automatically determines the number of neurons, hidden layers, effective input variables, and the network structure [62] (Fig 4). This method assumes that the relationship between the system’s input and output can be represented by a high–order equation:
where X (x1, x2, …, xm) is the input vector and a (a1, a2, …, am) is the vector of coefficients or weights and y is the output. GMDH includes four steps. The first step includes generating new variables z1, z2,..., zn. For all input variables (x1, x2, …, xm), second–order regression equations are defined using the training data according to:
Then, the error of the neurons is calculated to distinguish active and inactive neurons. This is followed by removing inactive neurons. In the third step, new variables are generated as the input of the regression equations. The second step is then repeated and the outputs that have the best approximation to the desired output are selected. The GMDH is finally assessed using the validation data (Fig 4).
2.4. Research methodology
The research methodology is presented in Fig 5. To predict the SPEI–12 index, first, the parameters affecting it, which vary over time, need to be selected. Therefore, meteorological variables including maximum temperature (Tmax), minimum temperature (Tmin), mean temperature (Tmean), precipitation (Pr), and potential evapotranspiration (ETPc), as well as large–scale climate indices including AO, NAO, PDO, and SOI for the three aforementioned climates during 1966–2014, were chosen. Since different combinations of these variables as inputs to the model yield different results, various patterns of variable combinations were formulated. Three different scenarios were considered. The scenarios were arranged based on a combination of meteorological variables and large–scale climate indices to evaluate the accuracy of predictions using various parameters. The scenarios were created using two categories of meteorological data and large–scale signal data (Table 3). According to the table, the first scenario (S1) included all variables plus the SPEI of the previous month, the second scenario (S2) included meteorological variables plus the SPEI of the previous month, and the third scenario (S3) included large–scale indices plus the SPEI of the previous month. It should be noted that the data were considered on a monthly basis, with a total of 577 data points, of which 70% were used for model calibration and 30% for validating the models. Furthermore, the SPEI–12 index in three climates was estimated using the aforementioned scenarios and machine learning models (LSSVR, GMDH, and MARS) to select the most suitable model and the variables that have the greatest impact on SPEI (Fig 5).
2.3.4. Error evaluation criteria for machine learning models.
The accuracy of the models and scenarios developed was assessed using error evaluation criteria. Root mean square error (RMSE) (Eqn. 14), mean absolute error (MAE) (Eqn. 15), and Nash–Sutcliffe efficiency (NSE) (Eqn. 16) we re employed to assess the performance of machine learning methods across various scenarios [50] (Tian, 2021):
where x obs is the observed value, xsim is the simulated value, is average of the observed value, and n is the number of samples. The closer the RMSE and MAE values are to zero, the better the performance of the models. On the other hand, values close to one for NSE indicate good performance.
We used Eqn. (17) and (18) to analyze the uncertainty in the output results from the models and calculating uncertainty intervals. The intervals show upper and lower bounds for the estimated SPEI values related to acceptable uncertainty ranges. In this case, σ represents the standard deviation and µ represents the mean output over 100 model runs.
3. Results
Descriptive statistics of all input variables are provided in Table 4. In Fig 6, the SPEI–12 index is presented for each climate. As shown in Fig 6a, from 1966 to 2000, the index indicates that the hyper–arid–moderate climate was in a wet period and had almost entered into a drought since 2000. In the arid–cold climate, as depicted in Fig 6b, a drought period started around 1998, with the most severe drought experienced in 2006 and 2009. According to Fig 6c, the arid–hot climate experienced less drought compared to the hyper–arid–moderate and arid–cold climates, and the SPEI–12 index followed more of a sinusoidal pattern. Based on the graph, the drought period for the climate in question started around 2008 and experienced its most severe drought in 2010. These changes have been due to a decrease in precipitation and a relative increase in temperature since around 1995.
Before discussing the model results, it is important to present the optimal variable values for each model. In the LSSVR, the kernel function was considered Gaussian, with the optimal parameters σ2 and γ of 5.36 and 136.03, respectively. In the GMDH, the maximum number of neurons per layer was set to 20, the maximum number of layers to 5, and the selection pressure α to 0.6. In the MARS, the number of basic functions (NBF) varied for each scenario and climate condition, with the highest degree of interaction in the final model being 2. Scenarios S1 and S2 in the arid–cold climate and scenario S1 in the arid–hot climate required a higher number of functions to achieve the optimal solution (Table 5).
The number of functions used in the MARS model for each climate is shown in Table 6. As can be seen, about 17 functions have been extracted for hyper–arid–moderate climates. For the other two climates, 23 functions were extracted. In the hyper–arid–moderate climate, the extracted functions were related to the S2 scenario, in which meteorological parameters and SPEIn–1 were involved. Also, in arid – cold climate, the extracted functions were obtained from the S1 scenario, in which the results were suitable than other scenarios. Finally, in the arid–hot climate, the S1 scenario, which included all parameters, had better estimation accuracy. According to extractive functions, it is observed that some functions require the implementation of the previous function. For example, the third function requires the implementation of the second function in a arid – hot climate. This relationship has also been observed for cold climates (λ3), demonstrating that some functions rely on each other. This relationship can be linked to fluctuations in precipitation and the SPEI value. For example, functions λ9, λ10, and λ14 are influenced by the outcomes of functions two and four. Therefore, cosidering into account SPEI (n–1) variables and precipitation can improve estimate accuracy.
Finally, by summing the functions of the above table in each climate, the final extracted relationship for each climate was obtained. As can be seen, all three climates have a separate relationship, where Eqn. (19) represents the hyper–arid–moderate climate. In this equation, the constant value wais 0.288. Similarly, Eqn. (20) and (21) represent the extraction equation for arid–cold and arid–hot climates, respectively. The high number of functions in both dry–hot and dry–cold climates is attributed to the presence of SPEI fluctuations relative to zero. In contrast, in the dry–cold climate with low fluctuations, fewer functions were utilized. In both mentioned equations, the constant value of the equation is positive and for arid–cold and arid–hot climates were 0.375 and 1.147, respectively.
To evaluate the results of the models, evaluation metrics including RMSE, MAE, and NSE were utilized. These results for both calibration and validation are presented in Table 7. As mentioned, three different input scenarios were considered, which included ground station data, satellite data, and SPEI with a one–month lag (S1), ground station data, and SPEI with a one–month lag (S2), and satellite data and SPEI with a one–month lag (S3). Based on the results of the evaluation criteria, the proposed models have estimated the SPEI–12 index with high accuracy under different scenarios. For the hyper–arid–moderate climate, different scenarios have led to high accuracy of each model. In this climate, the results of the criteria for all three scenarios of the models are very close to each other.
The LSSVR model, by using satellite data with (RMSE = 0.24, MAE = 0.17, NSE = 0.96 for validation data(, has the most accuracy among the scenarios. This model, using scenario 1, has estimated the calibration data with criteria (RMSE = 0.20, MAE = 0.13, NSE = 0.97), which has the most accuracy compared to the other two scenarios. Also, in this scenario, the prediction accuracy of the validation data (RMSE = 0.26, MAE = 0.17, NSE = 0.94) is also very close to the selected scenario. In this climate, GMDH model using all parameters of (AO, NAO, PDO, SOI) and ground station (Tmax, Tmin, Tmean, Pr, ETPc) with criteria (RMSE = 0.22, MAE = 0.15, NSE = 0.96 for calibration data and RMSE = 0.26, MAE = 0.17, NSE = 0.95) for validation data) has been more accurate in predicting SPEI than the other two models. The MARS model has estimated calibration data more accurately than the other two models, so that the most accuracy of this model is obtained among calibration data in the second scenario with the criteria (RMSE = 0.26, MAE = 0.17, NSE = 0.95). The mentioned model has almost the same accuracy in predicting the validation data in scenario 1 and 2, but the highest accuracy of the model has been achieved in scenario 2 (RMSE = 0.27, MAE = 0.19, NSE = 0.97).
In the arid and cold climate, the highest accuracy of the models was obtained by utilizing S1 scenario data. In this climate, the LSSVR model showed higher accuracy compared to the two GMDH and MARS models with criteria (RMSE = 0.23, MAE = 0.17, NSE = 0.95 for calibration data) and (RMSE = 0.22, MAE = 0.18, NSE = 0.95 for validation data). Additionally, the GMDH model had a very close performance to the LSSVR model in predicting validation data (RMSE = 0.22, MAE = 0.16, NSE = 0.96), but its performance decreased in predicting calibration data. Although the MARS model estimated the calibration data with an accuracy of (RMSE = 0.20, MAE = 0.15, NSE = 0.95) in the selected scenario, it predicted the validation data with a relatively higher error in all three scenarios (Table 7). In the arid and hot climate, using variables from scenario S2 led to the highest accuracy of the models in estimating SPEI. In predicting the calibration data for all three scenarios, the models showed very close performance. In these data, the models had better performance using scenarios S1 and S2 compared to scenario S3, with the RMSE range in scenarios S1 and S2 being between 0.22 and 0.31, and in scenario S3, between 0.35 and 0.4. It should be noted that in estimating the calibration data, the MARS model had the highest accuracy using scenario S1 (RMSE = 0.22, MAE = 0.14, NSE = 0.95). Overall, in this climate, the LSSVR model provided a better estimation of SPEI–12 compared to the other two models. A general comparison of the models in the three studied climates shows that the MARS model had the highest accuracy in estimating SPEI–12 in the arid and cold climate under scenario S2, and for validation data, the LSSVR model in the arid and cold climate under scenario S1 exhibited higher accuracy. Consequently, it can be stated that large–scale climatic signals had a positive impact on increasing the accuracy of predictive models (Table 7).
Figs 7–9 present the time series plot of observational and simulated data for the selected scenarios, error plots, and the distribution of data for each model in three different climates. As evident from the time series plot, the simulated and observational data exhibited very good agreement, especially in points such as maximum and minimum values, where the models performed very well in identifying these values (Fig 7). It is worth mentioning that the error values for all models followed a normal distribution. In the Hyper–arid–moderate, the MARS model had the lowest error (RMSE = 0.22) and the lowest standard deviation (St.D = 0.22). The error range was lower compared to the other two models in the respective figure. In the arid–cold and arid–hot climates, the LSSVR and MARS models had lower values (RMSE = 0.22, MSE = 0.052) and (RMSE = 0.27, MSE = 0.055) respectively (Fig 8). According to the figures, the MARS model had equal or sometimes better results than the other two models, which is attributed to its better performance in estimating the calibration data, which had more data points (Fig 9).
In Fig 10, a visual summary of the data is presented in box plots, showing the data obtained from the models and observations. The figure illustrates that the space between the first and third quartiles of the data obtained from machine learning models is very close to the observational data. Additionally, the alignment of the median lines of these models with the observational data confirms the high accuracy of the machine learning models in estimating SPEI. An important point to note is that in the arid–cold climate, the MARS model had an outlier. However, in the arid–hot climate, both observational and simulated data had outliers, which the machine learning models are capable of identifying among the observational data. The max and min values in the hyper–arid–moderate and cold–arid climates were largely close to the values in the observational data, but in the arid–hot climate, the min values in both the MARS and GMDH models were higher than the observational data, while both max and min values in the LSSVR model are closer to the observational data.
For comparing the performance of the models, a Taylor diagram in Fig 11 was utilized. In this diagram, the performance of the models is evaluated using RMSE, Standard Deviation, Correlation Coefficient, and the proximity of model data to observational data. In the first stage, it was evident that the correlation coefficient between the machine–learning model data and observational data in all three climates ranged between 0.95 and 0.99, indicating a very good correlation (Fig 11a). Additionally, the RMSD for the data was also very low. In Figs 11b and 11c, the models have quite similar performance, but in Fig 11a, the MARS model is positioned slightly closer to the other two models. Based on this diagram, for the hyper–arid–moderate, arid–cold, and arid–hot regions, considering the discussed content and the position of the data relative to the observational data, the GMDH, LSSVR, and LSSVR models, respectively, had higher prediction accuracy. In general, the results of the Taylor diagram indicate that the machine learning models have a high prediction accuracy for the studied climates. Based on the results obtained, it is evident that all three models perform better in scenarios one and two. However, the results indicate that scenario one was the most effective for the arid–cold region among the three models used. This suggests that the performance of the signal parameters has enhanced forecast accuracy. Therefore, the utilization of large–scale signals can greatly aid in achieving more precise forecasting in certain regions.
In the Regline diagram (Fig 12a), the LSSVR model is much closer to the observed data in terms of appearance. Two other models have estimated SPEI values in the range (–1 and 0) lower than the observed values. In the hyper–arid–moderate climate (Fig 12b), the shape of the model data largely matches the observational data. In the arid–hot climate (Fig 12c), the models have estimated a large amount of data similar to the observation data, but the two LSSVR and MARS models have not been able to correctly estimate the minimum values, while the GMDH model has estimated these values with great accuracy.
Fig 13 displays the predicted values along with the uncertainty band in all three climates using the selected models. Based on the graphs, the models have successfully estimated the SPEI values, aligning with the observed trend (Fig 13a–c). The predicted values in all climates fall within the acceptable uncertainty range. However, in the hot climate, values at step 530 slightly exceed the permissible uncertainty range, indicating increased uncertainty during severe droughts. While average SPEI values and predicted trends are within acceptable uncertainty levels, significant uncertainty remains in maximum and minimum values, warranting further investigation (Fig 13a–c). Overall, the models show minimal uncertainty in all climates, providing almost accurate estimates.
4. Discussion
The prediction results of the SPEI index indicated that the hyper–arid–moderate climate experienced a period of drought. In the arid and cold climate, the drought period began in 1998, and the most severe drought periods were experienced in 2006 and 2009. The arid and hot climate experienced less drought compared to the hyper–arid–moderate, and arid and cold climates. Therefore, the severity of drought varied in different climates and occurred at different times. This depends greatly on the changes in precipitation and temperature in the region. On the other hand, precipitation patterns and temperature changes also depend to a large extent on the topography of the region, sea surface temperature, and elevation of the region. This creates a complex system of various factors that affect drought. Hence, it is necessary for managers to devise specific strategies based on the type of climate. The results of this study can greatly assist managers in water and environmental resource management. By utilizing the findings of this research, water resources can be managed more effectively, and optimal allocation of surface and groundwater resources can be achieved. Appropriate management solutions and scenarios can be adopted, and with the help of water consumers, the needs of various regions can be managed under critical conditions.
In this study, three different models with different machine–learning structures were employed. The accuracy of their predictions showed that LSSVR, which has a relatively simpler structure, can be used. However, if a regression relationship model is desired, MARS is recommended. One of the special features of this model is its ability to extract regression relationships with accurate predictions. The results of this study demonstrated the successful performance of machine learning models for this problem, and MARS was preferred due to its better accuracy compared to other models, regardless of model structure. Therefore, this model is most suitable for a specific period and range, but results may vary slightly in different groups and ranges. Hence, investigating them in other ranges is deemed necessary. The consideration of uncertainty is among the important and necessary aspects of result analysis. Based on the results obtained, the models performed differently in each climate, serving as an example of the performance of GMDH. It can be concluded that the models’ performance varies according to different conditions. Their effectiveness is influenced by changes in input variables and the amount of SPEI, as well as the physical conditions of the area. While physics and regional conditions are typically not factored into machine learning models, regional conditions are indirectly considered through data changes. Consequently, no single model can be deemed superior without further investigation in this field. Understanding factors beyond parameter settings in machine learning models, such as changes in input parameters and regional conditions, can enhance the models’ performance in predicting SPEI. Considering the vegetation cover of areas could be an effective parameter for future research.
Uncertainty in the output of the results was assessed, and was found to be within an acceptable range. However, there is uncertainty in data collection, drought index calculation, and modeling knowledge, which can impact the results and SPEI values. Therefore, it is recommended that this issue be addressed in future research. The impact of climate change can be analyzed in predicting the duration and frequency of the aforementioned drought period, which is always recommended for further investigation in the continuation of this research. Climate change causes changes in precipitation patterns and temperature, which in turn have a significant impact on drought. This can lead to alterations in the frequency and duration of drought periods. Among the consequences of drought and climate change on water resources, one can mention the reduction of surface and groundwater resources, which has a significant impact on the water supply sources of the area. Additionally, the reduction of rainfall is one of the potential effects of droughts, which can challenge the recharge of the aquifer. Therefore, it is recommended to consider appropriate and drought–adapted management approaches. The limitations of this research include the unavailability of information in recent years (since 2015– 2024), as well as not considering the vegetation cover of the region due to limited access and uncertainty in input data. These limitations can be overcome by adopting appropriate approaches for future research and comparing the results with those of this study.
5. Conclusions
This study aimed to predict the SPEI–12 drought index in three climates: Hyper–arid–moderate, containing four meteorological stations; arid–cold, containing 6 stations; and arid–hot, containing 3 stations across Iran over a period from 1966 to 2014. For each climate, the best predictive model along with the most influential variables were identified. The predictive models included LSSVR, GMDH, and MARS. To predict the SPEI–12 index alongside meteorological data (Tmin, Tmax, Tmean, Pr, and ETPc), large–scale climate signal data (AO, NAO, PDO, and SOI) were used to assess their impact on predicting SPEI–12. The findings of this study showed that in Hyper–arid–moderate climate, the drought period began in 2000, as before that, the index was mostly above zero. In the arid and cold climate, the behavior was similar to the moderate and hyper–arid climate, and the most severe droughts occurred in 2006 and 2009. However, in the arid and hot climate, the SPEI–12 index had more oscillatory behavior, and a severe drought occurred in 2010. The use of large–scale climate signals alongside meteorological variables increased the accuracy of the models in predicting SPEI–12. Based on the extracted results in the arid–cold and arid–hot climates, the LSSVR had better forecasting performance with lower error evaluation criteria. For the Hyper–arid–moderate climates, GMDH shoved better forecasting accuracy than other methods. However, MARS also yielded acceptable forecasting results by providing regression relationships in all three climates. The findings of this study can contribute to better water management processes in regions like Iran and, consequently, facilitate sustainable water resource use by farmers and water stakeholders. By analyzing the results and forecasting the SPEI for future periods, it is possible to predict drought conditions. This prediction can then inform the implementation of appropriate strategies to meet water needs. Ultimately, this proactive approach can aid in effective water resource management. This research recommends exploring various perspectives, including the use of remote sensing combined with machine learning, gamma testing to select the best input combination, and examining the performance of MARS, LSSVR, and GMDH models in similar climates in different regions. These methods will help assess the comprehensiveness of the research findings.
Acknowledgments
The authors acknowledge the support from the Strategic Research Area: The Middle East in the Contemporary World (MECW) at the Centre for Advanced Middle Eastern Studies, Lund University, Sweden.
References
- 1.
Pereira LS, Cordery I, Iacovides I. Coping with water scarcity: Addressing the challenges. Springer Science & Business Media. 2009.
- 2. Wilhite D, Sivakumar M, Pulwarty R. Managing drought risk in a changing climate: the role of national drought policy. Weather Clim Extrem. 2014;3:4–13.
- 3. Burke EJ, Perry RH, Brown SJ. An extreme value analysis of UK drought and projections of change in the future. Journal of Hydrology. 2010;388(1–2):131–43.
- 4. Djerbouai S, Souag–Gamane D. Drought forecasting using neural networks, wavelet neural networks, and stochastic models: case of the Algerois Basin in North Algeria. Water Resources Management. 2016;30:2445–64.
- 5. Fung KF, Huang YF, Koo CH, Mirzaei M. Improved SVR machine learning models for agricultural drought prediction at downstream of Langat River Basin, Malaysia. Journal of Water and Climate Change. 2020;11(4):1383–98.
- 6. Mendicino G, Senatore A, Versace P. Groundwater resource index (GRI) for drought monitoring and forecasting in a Mediterranean climate. Journal of Hydrology. 2008;357:282–302.
- 7.
Deo RC, Şahin M. Application of the artificial neural network model for prediction of monthly standardized precipitation and evapotranspiration index using hydrometeorological parameters and climate indices in eastern Australia. Atmospheric Research. 2015, 161, 65–81.
- 8. Zhang Y, Yang H, Cui H, Chen Q. Comparison of the ability of ARIMA, WNN and SVM models for drought forecasting in the Sanjiang Plain, China. Natural Resources Research. 2020;29(2):1447–64.
- 9. Esfahanian E, Nejadhashemi AP, Abouali M, Adhikari U, Zhang Z, Daneshvar F, et al. Development and evaluation of a comprehensive drought index. J Environ Manage. 2017;185:31–43. pmid:28029478
- 10. Begueria S, Vicente–Serrano S, Reig F, Latorre B. The standardized precipitation evapotranspiration index (SPEI) revisited: parameter fitting, evapotranspiration models, tools, datasets and drought monitoring. International Journal of Climatology. 2014;34(10):3001–23.
- 11. Li X, He B, Quan X, Liao Z, Bai X. Use of the Standardized Precipitation Evapotranspiration Index (SPEI) to characterize the drying trend in Southwest China from 1982–2012. Remote Sensing. 2015;7(8):10917–37.
- 12. Manatsa D, Mushore T, Lenouo A. Improved predictability of droughts over the Southern Africa using the Standardized precipitation evapotranspiration index and ENSO. Theoretical and Applied Climatology. 2015;127(1–2):259–74.
- 13. Liu Z, Wang Y, Shao M, Jia X, Li X. Spatiotemporal analysis of multiscalar drought characteristics across the Loess Plateau of China. Journal of Hydrology. 2016;534:281–99.
- 14. Maca P, Pech P. Forecasting SPEI and SPI Drought Indices Using the Integrated Artificial Neural Networks. Comput Intell Neurosci. 2016;2016:3868519. pmid:26880875
- 15. Paulo A, Pereira L. Prediction of SPI drought class transitions using Markov chains. Water Resources Management. 2009;21:1813–27.
- 16. Abebe A, Foerch G. Stochastic simulation of the severity of hydrological drought. Water & Environment J. 2008;22(1):2–10.
- 17.
Alam ATMJ, Rahman MS, Sadaat AHM. Computational intelligence techniques in earth and environmental sciences. Springer, Dordrecht. 2014.
- 18. Bazrafshan O, Salajegheh A, Bazrafshan J, Mahdavi M, Marj A. Hydrological drought forecasting using ARIMA models (case study: Karkheh Basin). Ecopersia. 2015;3:1099–117.
- 19. Mossad A, Alazba A. Drought Forecasting Using Stochastic Models in a Hyper-Arid Climate. Atmosphere. 2015;6(4):410–30.
- 20. Ghordoyee Milan S, Kayhomayoon Z, Arya Azar N, Berndtsson R, Ramezani MR, Kardan Moghaddam H. Using machine learning to determine acceptable levels of groundwater consumption in Iran. Sustainable Production and Consumption. 2023;35:388–400.
- 21. Kayhomayoon Z, Arya Azar N, Ghordoyee Milan S, Berndtsson R, Najafi Marghmaleki S. Application of soft computing and evolutionary algorithms to estimate hydropower potential in multi–purpose reservoirs. Applied Water Science. 2023;13(9):183.
- 22. Arya Azar N, Ghordoyee Milan S, Kayhomayoon Z. The prediction of longitudinal dispersion coefficient in natural streams using LS-SVM and ANFIS optimized by Harris hawk optimization algorithm. J Contam Hydrol. 2021;240:103781. pmid:33799017
- 23.
Arya Azar N, Kardan N, Ghordoyee Milan S. Developing the artificial neural network–evolutionary algorithms hybrid models (ANN–EA) to predict the daily evaporation from dam reservoirs. Engineering with Computers. 2023, 39(2), 1375–1393.
- 24.
Ali Z, Hussain I, Faisal M, Nazir HM, Hussain T, Shad MY, et al. Forecasting drought using multilayer perceptron artificial neural network model. Advances in Meteorology. 2017, 2017, 1–9
- 25. Kousari MR, Hosseini ME, Ahani H, Hakimelahi H. Introducing an operational method to forecast long–term regional drought based on the application of artificial intelligence capabilities. Theoretical and Applied Climatology. 2017;127:361–80.
- 26. Seibert M, Merz B, Apel H. Seasonal forecasting of hydrological drought in the Limpopo Basin: a comparison of statistical methods. Hydrology and Earth System Sciences. 2017;21:1611–29.
- 27. Fung KF, Huang YF, Koo CH. Coupling fuzzy–SVR and boosting–SVR models with wavelet decomposition for meteorological drought prediction. Environmental Earth Sciences. 2019;78(1):1–18.
- 28. Khan N, Sachindra DA, Shahid S, Ahmed K, Shiru MS, Nawaz N. Prediction of droughts over Pakistan using machine learning algorithms. Advances in Water Resources. 2020;139103562.
- 29. Shamshirband S, Hashemi S, Salimi H, Samadianfard S, Asadi E, Shadkani S, et al. Predicting standardized streamflow index for hydrological drought using machine learning models. Engineering Applications of Computational Fluid Mechanics. 2020;14(1):339–50.
- 30. Deo RC, Kisi O, Singh VP. Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model. Atmospheric Research. 2017;184:149–75.
- 31. Cordery I, McCall M. A model for forecasting drought from teleconnections. Water Resources Research. 2000;36(3):763–76.
- 32. Deo RC, Salcedo–Sanz S, Carro–Calvo L, Saavedra–Moreno B. Drought prediction with standardized precipitation and evapotranspiration index and support vector regression models. Integrating disaster science and management. 2018:151–74.
- 33. Tian Y, Xu Y-P, Wang G. Agricultural drought prediction using climate indices based on Support Vector Regression in Xiangjiang River basin. Sci Total Environ. 2018;622–623:710–20. pmid:29223897
- 34. Morid S, Smakhtin V, Bagherzadeh K. Drought forecasting using artificial neural networks and time series of drought indices. International Journal of Climatology: A Journal of the Royal Meteorological Society. 2007;27(15):2103–11.
- 35. Canon J, Gonzalez J, Valde’s J. Precipitation in the Colorado River Basin an Its Low Frequency Associations With PDO and ENSO Signals. Journal of Hydrology. 2007;344:252–64.
- 36. Fernandes VJ, de Louw PG, Bartholomeus RP, Ritsema CJ. Machine learning for faster estimates of groundwater response to artificial aquifer recharge. Journal of Hydrology. 2024;637131418.
- 37. Singh AK, Kumar P, Ali R, Al–Ansari N, Vishwakarma DK, Kushwaha KS, et al. An integrated statistical–machine learning approach for runoff prediction. Sustainability. 2022;14(13):8209.
- 38. Trach R, Trach Y, Kiersnowska A, Markiewicz A, Lendo–Siwicka M, Rusakov K. A study of assessment and prediction of water quality index using fuzzy logic and ANN models. Sustainability. 2022;14(9):5656.
- 39. Najafzadeh M, Oliveto G, Saberi-Movahed F. Estimation of scour propagation rates around pipelines while considering simultaneous effects of waves and currents conditions. Water. 2022;14(10):1589.
- 40. Kayhomayoon Z, Jamnani MR, Rashidi S, Milan SG, Azar NA, Berndtsson R. Soft computing assessment of current and future groundwater resources under CMIP6 scenarios in northwestern Iran. Agricultural Water Management. 2023;285:108369.
- 41. Kumar V, Kedam N, Sharma KV, Mehta DJ, Caloiero T. Advanced machine learning techniques to improve hydrological prediction: A comparative analysis of streamflow prediction models. Water. 2023;15(14):2572.
- 42. Poornima S, Pushpalatha M. Drought prediction based on SPI and SPEI with varying timescales using LSTM recurrent neural network. Soft Computing. 2019;23:8399–412.
- 43. Dikshit A, Pradhan B, Huete A. An improved SPEI drought forecasting approach using the long short-term memory neural network. J Environ Manage. 2021;283:111979. pmid:33482453
- 44. Mokhtar A, Jalali M, He H, Al-Ansari N, Elbeltagi A, Alsafadi K, et al. Estimation of SPEI Meteorological Drought Using Machine Learning Algorithms. IEEE Access. 2021;9:65503–23.
- 45.
Shang J, Zhao B, Hua H, Wei J, Qin G, Chen G. Application of informer model based on SPEI for drought forecasting. Atmosphere. 2023, 14(6), 951.
- 46. Soh Y, Koo C, Huang Y, Fung K. Application of artificial intelligence models for the prediction of standardized precipitation evapotranspiration index (SPEI) at Langat River Basin, Malaysia. Computers and Electronics in Agriculture. 2018;144:164–73.
- 47. Abbasi A, Khalili K, Behmanesh J, Shirzad A. Drought monitoring and prediction using SPEI index and gene expression programming model in the west of Urmia Lake. Theoretical and Applied Climatology. 2019;138:553–67.
- 48. Xu D, Zhang Q, Ding Y, Zhang D. Application of a hybrid ARIMA-LSTM model based on the SPEI for drought forecasting. Environ Sci Pollut Res Int. 2022;29(3):4128–44. pmid:34403057
- 49. Ghasemi P, Karbasi M, Nouri AZ, Tabrizi MS, Azamathulla HM. Application of Gaussian process regression to forecast multi–step ahead SPEI drought index. Alexandria Engineering Journal. 2021;60(6):5375–92.
- 50. Tian W, Wu J, Cui H, Hu T. Drought prediction based on feature–based transfer learning and time series imaging. IEEE Access. 2021;9:101454–68.
- 51. Karbasi M, Karbasi M, Jamei M, Malik A, Azamathulla HM. Development of a new wavelet–based hybrid model to forecast multi–scalar SPEI drought index (case study: Zanjan city, Iran). Theoretical and Applied Climatology. 2022;147:499–522.
- 52. Lotfirad M, Esmaeili–Gisavandani H, Adib A. Drought monitoring and prediction using SPI, SPEI, and random forest model in various climates of Iran. Journal of Water and Climate Change. 2022;13(2):383–406.
- 53. Danandeh Mehr A, Rikhtehgar Ghiasi A, Yaseen ZM, Sorman AU, Abualigah L. A novel intelligent deep learning predictive model for meteorological drought forecasting. Journal of Ambient Intelligence and Humanized Computing. 2023;14(8):10441–55.
- 54. Khalili A, Rahimi J. High–resolution spatiotemporal distribution of precipitation in Iran: a comparative study with three global–precipitation datasets. Theoretical and Applied Climatology. 2014;118:211–21.
- 55. Liu W, Zhu S, Huang Y, Wan Y, Wu B, Liu L. Spatiotemporal variations of drought and their teleconnections with large–scale climate indices over the Poyang Lake Basin, China. Sustainability. 2020;12(9):3526.
- 56. Kao H, Yu J. Contrasting Eastern–Pacific and Central–Pacific types of El Niño. Journal of Climate. 2009;22:615–32.
- 57. Kushnir Y, Robinson WA, Bladé I, Hall NMJ, Peng S, Sutton R. Atmospheric GCM response to extratropical SST anomalies: Synthesis and evaluation. Journal of Climate. 2002;15(16):2233–56.
- 58. Vicente–Serrano SM, Beguería S, López–Moreno JI. A multiscalar drought index sensitive to global warming: the standardized precipitation evapotranspiration index. Journal of Climate. 2010;23(7):1696–718.
- 59. Friedman JH. Multivariate adaptive regression splines. The Annals of Statistics. 1991;19(1):1–67.
- 60. Najafzadeh M, Anvari S. Long-lead streamflow forecasting using computational intelligence methods while considering uncertainty issue. Environ Sci Pollut Res Int. 2023;30(35):84474–90. pmid:37369900
- 61. Ivakhnenko AG. The group method of data handling; a rival of the method of stochastic approximation. Soviet Automatic Control c/c of Avtomatika. 1968;1:43–55.
- 62. Moshizi ZGN, Bazrafshan O, Etedali HR, Esmaeilpour Y, Collins B. Application of inclusive multiple model for the prediction of saffron water footprint. Agricultural Water Management. 2023;277(1):108125.