Machine learning model development for accurate multi-month-ahead drought forecasting: Case study of the Great Lakes, North America

The Great Lakes are critical freshwater sources, supporting millions of people, agriculture, and ecosystems. However, climate change has worsened droughts, leading to significant economic and social consequences. Accurate multi-month drought forecasting is, therefore, essential for effective water management and mitigating these impacts. This study introduces the Multivariate Standardized Lake Water Level Index (MSWI), a modified drought index that utilizes water level data collected from 1920 to 2020. Four hybrid models are developed: Support Vector Regression with Beluga whale optimization (SVR-BWO), Random Forest with Beluga whale optimization (RF-BWO), Extreme Learning Machine with Beluga whale optimization (ELM-BWO), and Regularized ELM with Beluga whale optimization (RELM-BWO). The models forecast droughts up to six months ahead for Lake Superior and Lake Michigan-Huron. The best-performing model is then selected to forecast droughts for the remaining three lakes, which have not experienced severe droughts in the past 50 years. The results show that incorporating the BWO improves the accuracy of all classical models, particularly in forecasting drought turning and critical points. Among the hybrid models, the RELM-BWO model achieves the highest level of accuracy, surpassing both classical and hybrid models by a significant margin (7.21 to 76.74%). Furthermore, Monte-Carlo simulation is employed to analyze uncertainties and ensure the reliability of the forecasts. Accordingly, the RELM-BWO model reliably forecasts droughts for all lakes, with a lead time ranging from 2 to 6 months. The study’s findings offer valuable insights for policymakers, water managers, and other stakeholders to better prepare drought mitigation strategies.


Introduction
Drought is a harmful, creeping climatic crisis that can occur in any climate and negatively influences ecosystems and human societies. Unlike rapidly developing natural disasters (e.g., hurricanes, floods, and storms), drought is a slow-developing phenomenon that poses a significant and enduring threat to daily human life, agriculture, ecology, and the economy [1,2]. A prolonged lack of precipitation is the direct and leading cause of drought [3]. Over the past two decades, droughts have become increasingly common and severe due to climate change driven by global warming, causing food and water crises in some regions worldwide [4][5][6]. Moreover, drought leads to various issues, including desertification, water scarcity, and forest fires, with negative repercussions at both local and global scales [7][8][9][10]. Furthermore, droughts in North America and other regions are strongly associated with forest fires [11,12]. Natural disasters have struck almost every country and severely affected enormous numbers of people around the globe. Statistical analyses estimate that about 3.8 billion people worldwide have suffered from these catastrophes, and that half of them have suffered due to droughts [13]. Drought has also directly or indirectly killed 1.3 million of the 3.5 million people who died due to natural disasters.
Drought is typically categorized into four types: meteorological, agricultural, hydrological, and socio-economic drought [14][15][16]. The three physical categories (i.e., meteorological, agricultural, and hydrological drought) describe the lack of water at different points in the hydrological cycle. Drought is strongly associated with precipitation, and the onset of meteorological drought begins with a rainfall deficit. Hydrological drought concerns the decrease in river flow, groundwater level, dam reservoir storage, and lake water level caused by long-term meteorological drought [17]. In addition, persistent hydrological drought causes agricultural drought, in which soil moisture is deficient, resulting in reduced crop yields.
Several drought indices have been developed since the 1960s for monitoring and forecasting droughts based on individual or multiple meteorological or hydrological variables (e.g., rainfall, evapotranspiration, streamflow, groundwater level, and runoff) [18]. Among all drought indices, the standardized precipitation index (SPI), proposed by [19], is broadly used to quantify drought across the world and receives considerable attention from researchers [5,20,21]. Based on SPI principles, several hydrological indices have been derived and widely used to assess and monitor droughts, such as the standardized streamflow index (SSI) [22,23], standardized hydrological drought index (SHDI) [24], standardized groundwater level index (SGI) [25], and standardized runoff index (SRI) [26,27]. The popularity and widespread use of these indices can be attributed to their ability to evaluate and track drought conditions across diverse timeframes and geographic locations while requiring minimal data inputs and maintaining strong comparability in time and space [28,29].
Since drought is a manifestation of climate variability, scholars have developed drought forecasting models to minimize its adverse effects on agriculture, water resource systems, socio-economic activities, and hydropower generation. Notably, climate variability refers to natural fluctuations in climate patterns that affect temperature, precipitation, and other climate factors over time. Climate change can intensify the impacts of drought, leading to more frequent water scarcity [30,31], which poses significant challenges for ecosystems and communities that depend on reliable access to water resources. Drought forecasting is therefore necessary for making effective decisions in managing water resources, executing drought mitigation strategies, and establishing robust early warning systems [32]. After identifying the drought type, selecting an appropriate model is the most crucial factor in drought forecasting. Drought forecasts are frequently produced with physical or data-driven models. Conceptual and physical models necessitate substantial data about catchment or basin processes, resulting in complicated forecasting models [24]. Data-driven models are more popular because they are flexible, require less data, and do not explicitly represent the processes of the studied catchment. In recent decades, various machine learning (ML) models and statistical approaches have been intensively applied to drought forecasting worldwide, such as the autoregressive integrated moving average [33,34], support vector regression [35][36][37], adaptive neuro-fuzzy inference systems [38][39][40], artificial neural networks (ANN) [41][42][43], deep learning models [44,45], and ensemble-based approaches like random forest [46,47]. Some researchers have developed hybrid models based on nature-inspired algorithms that are effective and adaptable for drought forecasting. Meta-heuristic algorithms can help produce models with high prediction accuracy by training or optimizing the parameters of ML models, which substantially affect model performance [48,49]. Notably, fourteen meta-heuristic algorithms have been intensively used to improve ML model performance in drought forecasting [24,[50][51][52][53][54].
Enhancing the structure of ML-based models for drought forecasting has attracted researchers' interest in recent years. Some scholars have tackled fundamental issues of particular ML models, such as the ANN. Despite being a popular ML approach in the drought sector, the ANN encounters difficulties related to generalization, complexity, and learning speed, along with the risk of becoming stuck in a local minimum. The extreme learning machine (ELM) was introduced to address these drawbacks of the ANN (i.e., complexity, slow convergence, and overfitting) and has since been used widely to solve engineering problems, including drought forecasting. ELM has several advantages over the ANN, such as better generalization, fast learning, a simpler structure, and superior performance [55,56]. Accordingly, ELM has been applied to drought forecasting in different climate regions and has obtained promising results [1,30,57,58]. Nevertheless, its structure can be further adjusted to enhance performance and stability by using regularization parameters and nature-inspired algorithms to train the model effectively.
Researchers have not paid enough attention to monitoring and forecasting hydrological drought, despite its importance in managing water resources [59]. The main objective of this study is to evaluate and forecast hydrological drought in the Great Lakes of North America, which contain over one-fifth of the world's fresh water. The Great Lakes are an essential source of fresh water, and droughts in these water bodies can significantly impact water availability and the ecosystem. Notably, the recent droughts in the Great Lakes have been attributed mainly to climate change-induced elevated water temperatures, resulting in increased evaporation due to a substantial decline in winter ice cover [60]. This study used the Multivariate Standardized Lake Water Level Index (MSWI) to assess hydrological drought over the Great Lakes. The MSWI can simultaneously display the drought condition across many time scales. To forecast hydrological drought at different lead-time horizons (one to six months ahead), four hybrid models were developed by combining the regularized extreme learning machine (RELM), RF, ELM, and SVR with the Beluga whale optimization (BWO) algorithm. BWO is a novel nature-inspired algorithm with robust and efficient performance in solving engineering problems, and it has displayed excellent, superior results when tested against fifteen meta-heuristic algorithms on different engineering problems [61]. The hybrid models were compared and validated against the established standard models (RF, ELM, and SVR).
Previous drought studies have suggested standard performance metrics such as the coefficient of determination (R²), root mean square error (RMSE), and others to choose the best prediction models [62,63]. However, drought time series, particularly at longer time scales, exhibit significantly high auto-correlation [64][65][66]. This can produce very high R² values and low forecasting errors, rendering such metrics less effective for selecting the best models. Accordingly, this paper proposes an advanced evaluation method that focuses on accurately detecting the turning points in drought data and evaluating the model's ability to capture them effectively. Finally, uncertainty analysis was used to evaluate forecasting accuracy and determine which lead times can be reliably forecasted.

Random forest (RF)
The principles of ensemble and bagging learning served as the foundation for the RF. The decision tree approach combined with the bagging methodology is a crucial factor in the effectiveness of this model for engineering problems, especially regression [67]. In RF, nodes are split on randomly nominated subsets of the most significant input features, which enhances learning, forecasting, and reliability and improves the generalization capacity. The following procedure is used to establish the RF model [57]:
i. From the training observations, select a random bootstrap sample of i data points.
ii. Build a decision tree on the data selected (i) in the previous step.
iii. Select the number of trees (n_trees) to be built.
iv. Repeat steps i to iii.
v. Forecast MSWI by aggregating the forecasts of the n_trees trees.
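The bagging procedure above can be sketched with scikit-learn's RandomForestRegressor, which performs the bootstrap sampling, tree construction, and aggregation internally. This is an illustrative Python sketch on synthetic data (the study itself was implemented in MATLAB); the lag count and tree number here are arbitrary.

```python
# Illustrative sketch (not the authors' code): bagged-tree forecasting of a
# drought-index-like series from its lagged values.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=300)) / 10.0   # synthetic stand-in for MSWI

# Build a lagged design matrix: predict the index at t from the 5 previous months.
lags = 5
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

# Step iii: choose the number of trees; steps i-ii and iv (bootstrap sampling
# and tree construction) happen inside fit(); step v is the averaged forecast.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[:220], y[:220])
forecast = rf.predict(X[220:])
print(forecast.shape)
```

Each test-set prediction is the average of the 200 individual tree forecasts, which is the aggregation described in step v.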
The RF model has gained a good reputation for solving hydrological and environmental problems and has also been applied to drought forecasting [46,[68][69][70][71].

Support vector regression (SVR)
SVR is a supervised learning method used widely in water resource applications and drought forecasting. The main idea behind SVR is to map the input vectors to a higher-dimensional feature space through a nonlinear mapping and then build a regression model in that feature space. Even with limited training observations, the approach can effectively find an excellent solution to nonlinear problems [48,72]. The structural risk minimization (SRM) concept employed by SVM seeks to minimize an upper bound on the generalization error, which is the sum of the training error and a confidence term [73,74]. This concept is distinct from the more common empirical risk minimization (ERM) method, which focuses only on reducing the total error generated during the training stage.
Let MSWI_i denote the actual normalized value of the i-th sample and X_i its input vector, so that the sample set is {(X_i, Y_i)}, i = 1, ..., N, where N is the total number of samples and the target Y_i in this case is MSWI_i. SVR approximates the target function as:

f(X) = W · φ(X) + b    (1)

where φ(X) maps the inputs into the high-dimensional feature space. Estimating the coefficients W and b requires minimizing a regularized risk function, represented by Eq (2):

minimize (1/2)||W||² + C (1/N) Σ_{i=1..N} L_ε(Y_i, f(X_i))    (2)

The regularization term is ||W||², and C controls the trade-off between the training error and model flatness. The second term in Eq (2) represents the empirical error, measured by the ε-insensitive loss function (Eq 3):

L_ε(Y, f(X)) = 0 if |Y − f(X)| ≤ ε; otherwise |Y − f(X)| − ε    (3)

The loss is zero if the predicted value lies within the tube (a region of width ε around the regression function); otherwise, the loss is the distance between the prediction and the tube boundary [75]. Introducing the slack variables ξ_i and ξ_i*, Eq (2) is converted into the primal objective function (Eq 4):

minimize (1/2)||W||² + C Σ_{i=1..N} (ξ_i + ξ_i*)    (4)
subject to: Y_i − W · φ(X_i) − b ≤ ε + ξ_i;  W · φ(X_i) + b − Y_i ≤ ε + ξ_i*;  ξ_i, ξ_i* ≥ 0

Once the kernel function K(X_i, X_j) is introduced, the dual problem (Eq 5) is solved for the Lagrange multipliers α_i and α_i*, where i and j index different samples in the training dataset. Accordingly, the regression function can be expressed as:

f(X) = Σ_{i=1..N} (α_i − α_i*) K(X_i, X) + b

Notably, the Gaussian (RBF) kernel function is used in this study [76].
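The ε-SVR formulation above, with a Gaussian (RBF) kernel, can be sketched in Python using scikit-learn (illustrative only; the study's implementation was in MATLAB, and the C and ε values here are arbitrary placeholders, not the optimized hyperparameters).

```python
# Hedged sketch of epsilon-SVR with an RBF kernel: C trades off flatness
# against training error, and epsilon sets the width of the insensitive tube.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 5))             # synthetic 5-feature inputs
y = np.sin(X.sum(axis=1)) + 0.05 * rng.normal(size=200)

# Inputs are normalized to [0, 1], mirroring the preprocessing in this study.
Xs = MinMaxScaler().fit_transform(X)

model = SVR(kernel="rbf", C=10.0, epsilon=0.01)   # hyperparameters illustrative
model.fit(Xs[:150], y[:150])
pred = model.predict(Xs[150:])
print(pred.shape)
```

In the hybrid SVR-BWO model described later, it is exactly these kernel and regularization hyperparameters that the optimizer searches over.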

Extreme learning machine (ELM)
ELM is an algorithm introduced in 2006 to train single-hidden-layer feedforward neural networks. Its inventors affirmed that the algorithm could replace conventional training algorithms (i.e., backpropagation) for ANNs owing to its quick learning and superior generalization [77]. The ELM model can be expressed mathematically as:

f(x) = Σ_{i=1..L} θ_i G(a_i, b_i, x),  x ∈ R^d

where a and b are the hidden-node parameters (weights and bias values), x is the input vector, R is the set of real numbers, R^d is the d-dimensional real space, and G is the nonlinear transfer function. The transfer function is essential in ELM because it maps the input data into a complex, higher-dimensional space; hence, the model can find satisfactory solutions for complex engineering problems [78]. ELM training proceeds in two essential phases: random feature mapping and linear parameter solving. ELM converts the inputs into a feature space using nonlinear activation functions with randomly assigned weights and bias values in the ELM's hidden layer; accordingly, the hidden-layer parameters (a, b) are set randomly. The other significant stage is to calculate the weight values (θ) connecting the hidden layer of the ELM model with the output layer. These weights are calculated linearly in the least-squares sense by minimizing the approximation error:

minimize over θ: ||Hθ − Y||²

where Y in this case is the MSWI training data and H is the hidden-layer output matrix, which depends on the input variables and the hidden-layer parameters [79]:

H = [ G(a_1, b_1, x_1) ... G(a_L, b_L, x_1) ; ... ; G(a_1, b_1, x_N) ... G(a_L, b_L, x_N) ]

with the training response Y = [y_1, ..., y_N]^T. Here ||·|| is the Frobenius norm, and the calculated output weights θ* are determined as:

θ* = H⁺ Y

where H⁺ is the Moore-Penrose generalized inverse of the matrix H.
ELM differs from conventional neural networks in that there is no requirement to fine-tune every parameter of the model, including the weights and biases of the hidden layer.After random selection of the hidden layer weight and bias values, the ELM can be considered a linear system.Then, the weight values that connect the hidden layer with the output layer of ELM are systematically determined by the generalized inverse procedure of the hidden layer output matrices.As a result, the ELM model is several times quicker than classical algorithms used for training feedforward networks.
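The two-phase training just described (random hidden-layer parameters, then a one-shot linear solve) is compact enough to sketch directly in NumPy. This is an illustrative toy, not the study's implementation; the sigmoid transfer function and the 7 hidden nodes follow the settings reported in Table 1, while the synthetic data are placeholders.

```python
# Minimal ELM sketch: random hidden-layer parameters (a, b), sigmoid transfer
# G, and output weights solved in one step via the Moore-Penrose pseudoinverse.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(size=(120, 5))                    # inputs (e.g., lagged index values)
y = X @ np.array([0.4, -0.2, 0.1, 0.3, -0.5]) + 0.01 * rng.normal(size=120)

L = 7                                             # hidden nodes (as in Table 1)
a = rng.uniform(-1, 1, size=(5, L))               # random input weights
b = rng.uniform(-1, 1, size=L)                    # random biases

H = 1.0 / (1.0 + np.exp(-(X @ a + b)))            # hidden-layer output matrix
theta = np.linalg.pinv(H) @ y                     # theta* = H+ Y
y_hat = H @ theta

rmse = np.sqrt(np.mean((y - y_hat) ** 2))
print(round(float(rmse), 4))
```

Because `theta` comes from a single pseudoinverse rather than iterative gradient descent, training is essentially instantaneous, which is the speed advantage the section describes.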

Regularized extreme learning machine
As discussed in the previous section, only the output weight matrix of the ELM model is calculated in the training procedure, by driving the training error to a minimum. However, minimizing only the training error can reduce the generalization capacity of the model on samples not included in the training data [80]. Thus, the classical ELM solution has some serious weaknesses. The first problem is that ELM, being based on the empirical risk minimization (ERM) concept, frequently leads to over-fitting [81]. Because it directly computes the minimum-norm least-squares solution, ELM also has poor control capability, which is regarded as the second disadvantage. Another drawback is that it can produce less reliable estimates, for example, when the data include outliers or heteroskedasticity.
The highest generalizability is achieved in some advanced models when the norm of the weights and the training error are minimized simultaneously [82]. Accordingly, this paper adopts RELM, based on the structural risk minimization (SRM) concept of statistical learning theory, to address the above disadvantages [83]. This characteristic of the RELM model is provided by its regularization parameter (γ). The loss function of the ELM model is modified in this work using the γ parameter to reduce both the training error and the norm of the weights simultaneously. The cost function of the RELM model is:

minimize (1/2)||θ||² + (γ/2)||ε||²
subject to: Hθ − Y = ε

where ε is the vector of training errors over the N training samples. The Lagrangian used to calculate the optimum solution is:

L(θ, ε, λ) = (1/2)||θ||² + (γ/2)||ε||² − λ(Hθ − Y − ε)

where λ is the Lagrangian multiplier matrix. Setting the derivatives to zero, the optimal weights of the RELM output layer are computed as:

θ* = H^T (I/γ + H H^T)^(−1) Y

where I is the identity (unit) matrix. This formula is suitable when the number of hidden neurons exceeds the number of training samples. Otherwise, the following Equation is used to find the weights:

θ* = (I/γ + H^T H)^(−1) H^T Y
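The regularized solve can be sketched by swapping the plain pseudoinverse for the γ-regularized normal equations. This illustrative sketch uses the second form above (more training samples than hidden neurons); the γ value is an arbitrary placeholder, whereas in the RELM-BWO model γ is one of the quantities BWO optimizes.

```python
# Sketch of the RELM output-weight solution: instead of theta = pinv(H) @ y,
# solve (I/gamma + H^T H) theta = H^T y, which shrinks the weight norm and
# reduces over-fitting relative to the plain least-squares ELM solution.
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(size=(120, 5))
y = np.sin(X.sum(axis=1))

L, gamma = 7, 100.0                               # gamma is illustrative only
a = rng.uniform(-1, 1, size=(5, L))               # random hidden-layer weights
b = rng.uniform(-1, 1, size=L)                    # random hidden-layer biases
H = 1.0 / (1.0 + np.exp(-(X @ a + b)))            # sigmoid hidden-layer outputs

theta = np.linalg.solve(np.eye(L) / gamma + H.T @ H, H.T @ y)
y_hat = H @ theta
print(y_hat.shape)
```

As γ grows large the solution approaches the unregularized ELM fit; as γ shrinks, the weight norm is penalized more heavily, which is the ERM-versus-SRM trade-off discussed above.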

Beluga whale optimization (BWO)
BWO is a metaheuristic algorithm introduced in 2022 to solve engineering and optimization problems [61]. Here, the algorithm is employed to improve the efficiency of drought forecasting by training the RELM model. BWO simulates the behaviour of beluga whales, which live in groups and search for and catch prey in the sea [84]. The mathematical model of BWO comprises three essential stages: exploration, exploitation, and whale fall. The exploration phase simulates the swimming behaviour of beluga whales, while the exploitation stage is inspired by their preying behaviour. The third stage is inspired by a whale falling into the depths of the sea after dying or being hunted by people or predators such as polar bears; hence it is called "whale fall". It is worth noting that during the optimization process, the whale fall stage is executed in each iteration after the exploration and exploitation stages are completed. The main steps of the algorithm can be summarized as follows:
i. Initialization: set the algorithm parameters, such as the maximum number of iterations (T_max) and the population size (n). The initial position of each member of the population (beluga whales) is randomly assigned in the search space, and the fitness values are computed according to the given objective function.
ii. Exploration and exploitation update: according to the balance factor (B_f), each beluga whale is assigned to the exploration or exploitation stage. If B_f exceeds 0.5, the beluga whale enters the exploration phase, and its position is updated according to:

x_{i,j}^{T+1} = x_{i,p_j}^{T} + (x_{r,p_1}^{T} − x_{i,p_j}^{T}) (1 + r_1) sin(2π r_2),  j even
x_{i,j}^{T+1} = x_{i,p_j}^{T} + (x_{r,p_1}^{T} − x_{i,p_j}^{T}) (1 + r_1) cos(2π r_2),  j odd

where T is the current iteration, x_{i,j}^{T+1} is the new position of the i-th whale in the j-th dimension, r_1 and r_2 are random numbers in (0, 1), x_{i,p_j}^{T} is the position of the i-th beluga whale in a randomly chosen dimension p_j, and x_{r,p_1}^{T} is the position of a random whale r.
Otherwise, when the B_f value is less than 0.5, the updating mechanism is controlled by the exploitation stage, and the position of each whale is updated using Eq (21):

X_i^{T+1} = r_3 X_best^{T} − r_4 X_i^{T} + C_1 · L_F · (X_r^{T} − X_i^{T})    (21)

After that, the fitness values of the new positions are determined and sorted to obtain the best outcome in the current iteration. Here X_i^{T} and X_r^{T} are the current positions of the i-th beluga whale and a random beluga whale, respectively, X_best^{T} is the best position found so far, r_3 and r_4 are random numbers in (0, 1), and C_1 = 2 r_4 (1 − T/T_max) is a constant that depends on r_4 and the current and maximum iterations (T and T_max). Finally, L_F is the Levy flight function [85]:

L_F = 0.05 × (μ σ) / |v|^(1/β),  σ = [ Γ(1 + β) sin(πβ/2) / ( Γ((1 + β)/2) β 2^((β−1)/2) ) ]^(1/β)

where β is the default constant (1.5), and μ and v are normally distributed random numbers.
iii. Whale fall update: the probability of a whale fall (W_f) is determined in each iteration, considering that some beluga whales may fall into the deep sea or die. In such cases, it is essential to update the position of the beluga whale:

X_i^{T+1} = r_5 X_i^{T} − r_6 X_r^{T} + r_7 X_step

where r_5, r_6, and r_7 are assigned randomly between zero and one, and X_step is the step size of the whale fall:

X_step = (u_b − l_b) exp(−C_2 T / T_max)

where u_b and l_b are the upper- and lower-bound variables, and C_2 is a step factor that depends mainly on W_f and the population size (C_2 = 2 W_f n). Finally, the probability of a whale fall is calculated as:

W_f = 0.1 − 0.05 T / T_max

so that it decreases linearly from 0.1 to 0.05 over the iterations.
iv. Termination check: if the current iteration exceeds the maximum number of iterations, the BWO algorithm terminates; otherwise, steps ii and iii are repeated.
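The four steps above can be sketched as a compact Python loop on a toy objective (the sphere function). This is a simplified, hedged reconstruction, not the authors' implementation: the balance factor, C_1, C_2, W_f, and Levy-flight forms follow the published BWO description as summarized here, and the exploration move is applied to the whole position vector rather than alternating sin/cos per dimension. A greedy replacement keeps each whale's best-so-far position.

```python
# Simplified BWO sketch on the sphere function f(x) = sum(x^2); illustrative only.
import numpy as np
from math import gamma as gamma_fn

def levy(dim, rng, beta=1.5):
    # Levy-flight step L_F = 0.05 * (mu * sigma) / |v|^(1/beta)
    sigma = (gamma_fn(1 + beta) * np.sin(np.pi * beta / 2)
             / (gamma_fn((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, dim)
    v = rng.normal(0, 1, dim)
    return 0.05 * u / np.abs(v) ** (1 / beta)

def bwo(obj, dim=5, n=30, t_max=100, lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))             # step i: random initial positions
    fit = np.array([obj(x) for x in X])
    for t in range(t_max):
        best = X[np.argmin(fit)].copy()
        wf = 0.1 - 0.05 * t / t_max               # whale-fall probability
        for i in range(n):
            bf = rng.uniform() * (1 - t / (2 * t_max))   # balance factor
            r = X[rng.integers(n)]                # a random whale's position
            r1, r2, r3, r4 = rng.uniform(size=4)
            if bf > 0.5:                          # step ii: exploration (swimming)
                new = X[i] + (r - X[i]) * (1 + r1) * np.sin(2 * np.pi * r2)
            else:                                 # step ii: exploitation (preying)
                c1 = 2 * r4 * (1 - t / t_max)
                new = r3 * best - r4 * X[i] + c1 * levy(dim, rng) * (r - X[i])
            if bf <= wf:                          # step iii: whale fall
                r5, r6, r7 = rng.uniform(size=3)
                c2 = 2 * wf * n
                xstep = (ub - lb) * np.exp(-c2 * t / t_max)
                new = r5 * X[i] - r6 * r + r7 * xstep
            new = np.clip(new, lb, ub)
            f_new = obj(new)
            if f_new < fit[i]:                    # greedy replacement
                X[i], fit[i] = new, f_new
    return X[np.argmin(fit)], float(fit.min())    # step iv: return best solution

x_best, f_best = bwo(lambda x: float(np.sum(x ** 2)))
print(f_best)
```

In the hybrid models of this study, the objective passed to the optimizer is the forecasting RMSE as a function of the model hyperparameters rather than this toy sphere function.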

Model development
Multiple ML-based models were created to forecast droughts in the Great Lakes region several months in advance. This study integrated a modified version of ELM, the RELM, with BWO to create a hybrid model known as RELM-BWO. The performance of this model was compared to classical models such as ELM, SVR, and RF. The RELM-BWO model was also validated against other hybrid models that combine the classical models with BWO, namely SVR-BWO, RF-BWO, and ELM-BWO. All models were trained using past values (a 5-month lag) to forecast MSWI at lead times ranging from 1 to 6 months ahead. In this study, the drought time series was divided into a training set (75% of the data, from December 1921 to January 1997) and a testing set (February 1997 to December 2021). This division ratio balances the amount of data for training and testing in machine learning models and has yielded good results [86,87]. The method aims to train the model on historical data and assess its predictive efficiency on the remaining portion. This approach provides insight into the accuracy of future drought forecasts based on historical information and is considered more challenging than random data division in time series analysis.
Once the inputs were selected, all input vectors and their corresponding targets were normalized to a range between 0 and 1. For the hybrid models, the primary objective of BWO was to optimize the hyperparameters of the classical models, as these greatly influence prediction accuracy. The objective function was the root mean square error, which BWO minimized by selecting the best hyperparameters for each model. In the case of RELM-BWO, BWO was employed to determine the optimal value of the regularization parameter (γ) and the weight and bias values of the input layer. The remaining parameters calculated by BWO for the other models are listed in Table 1. All models were implemented in MATLAB 2020a, and the primary process can be observed in Fig 3.

Statistical parameters
The forecasting results were evaluated to identify the best forecasting model. Several statistical indicators were used to assess the results: mean absolute error (MAE), root mean square error (RMSE), relative error (RE%), and the coefficient of determination (R²).

Table 1. Model parameters and settings.

Model | Settings
ELM | Weights range from −1 to 1; hidden-layer nodes: 7
BWO | Maximum iterations: 100; population size: 50
RELM-BWO | Regularization parameter (γ) and input-layer weights and biases determined by BWO

MAE is widely used to assess regression models in hydrological studies and measures the mean absolute forecasting error. RMSE is one of the most common indices used to evaluate comparable models; it measures the square root of the mean squared forecasting error and is never smaller than MAE. RE% is the absolute error divided by the observed value, while R² measures how much of the variability in the forecasted values can be explained by their relationship to the measured data. The mathematical expressions of the statistical metrics are as follows [76]:

MAE = (1/n) Σ_{i=1..n} |MSWI_i^obs − MSWI_i^pred|

RMSE = sqrt( (1/n) Σ_{i=1..n} (MSWI_i^obs − MSWI_i^pred)² )

RE% = |(MSWI_i^obs − MSWI_i^pred) / MSWI_i^obs| × 100

R² = [ Σ_{i=1..n} (MSWI_i^obs − mean(MSWI^obs)) (MSWI_i^pred − mean(MSWI^pred)) ]² / [ Σ_{i=1..n} (MSWI_i^obs − mean(MSWI^obs))² · Σ_{i=1..n} (MSWI_i^pred − mean(MSWI^pred))² ]

The best forecasting models provide the lowest possible RMSE and RE% values and the highest R². In the above formulas, n is the total number of samples, MSWI_i^obs is the observed drought value, MSWI_i^pred is the forecasted value, and mean(MSWI^pred) is the mean forecasted value.
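The evaluation metrics described in this section can be sketched in a few lines of Python (illustrative; R² is computed here as the squared Pearson correlation, which matches the "coefficient of determination" usage in this section, and the sample values are synthetic).

```python
# Sketch of the four forecast-evaluation metrics: MAE, RMSE, RE%, and R^2.
import numpy as np

def mae(obs, pred):
    return float(np.mean(np.abs(obs - pred)))

def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def re_pct(obs, pred):
    # Per-sample relative error, in percent of the observed value.
    return np.abs((obs - pred) / obs) * 100

def r2(obs, pred):
    # Squared Pearson correlation between observed and forecasted values.
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)

obs = np.array([1.2, -0.5, 0.8, -1.1, 0.3])       # synthetic observed index
pred = np.array([1.0, -0.4, 0.9, -1.0, 0.2])      # synthetic forecasts
print(round(mae(obs, pred), 3), round(rmse(obs, pred), 3), round(r2(obs, pred), 3))
```

Note that RE% is undefined when an observed value is exactly zero, a caveat worth remembering for standardized indices that cross zero.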

Uncertainty analysis
Forecasting accuracy decreases as the lead-time horizon increases [59,88]; hence, adopting an efficient standard to assess forecasting performance is crucial. Since it is difficult to evaluate the reliability of ML-based models via classical performance measures (i.e., RMSE, R², etc.) when forecasting drought at longer lead times, uncertainty analysis is used as an efficient tool to examine the dependability of the forecasting results. The uncertainty of the adopted model is quantified based on the concept of Monte Carlo simulation. The input-parameter uncertainty considered in this study relates to the accuracy and representativeness of the input data used to make forecasts [89]. The approach comprises producing random input data that match the empirical probability distribution representing the input parameter, executing the forecasting model, and collecting the resulting outputs [90]. The current study randomly resamples the input data set without replacement 100,000 times while maintaining the same ratio between the training and testing sets [91,92]. In other words, for each observation, 100,000 data points are generated from the corresponding empirical probability distribution and then fed to the model to generate responses. To calculate the 95% confidence interval for each response, the 2.5th (X_L) and 97.5th (X_U) percentiles of the cumulative distribution of the 100,000 data points are determined. The validity of the final drought model is assessed by its robustness measure: the proportion of observed drought values that fall within the 95% confidence interval. A higher ratio indicates greater validity, while a lower ratio indicates the opposite. The Equation for the 95% prediction uncertainty (95PPU) is provided below.

95PPU = (1/n) count( X_L^i ≤ X_Obs^i ≤ X_U^i ) × 100    (32)

In Eq (32), 95PPU denotes the 95% prediction uncertainty; n is the total number of drought data records; i indexes the current month; X_L^i and X_U^i are the lower and upper uncertainty bands; and X_Obs^i is the observed value for the i-th month. Additionally, the average width of the confidence interval is assessed using the d-factor:

d-factor = d̄_X / σ_X

where σ_X is the standard deviation of the observed data and d̄_X is the average distance between the upper band (97.5th percentile) and the lower band (2.5th percentile). A higher d-factor value represents a higher degree of uncertainty. Theoretically, the d-factor is zero, but this normally cannot be attained due to model uncertainty; the ideal value of the d-factor is thus considered to be smaller than one [93].
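The 95PPU coverage ratio and the d-factor can be sketched as follows. This illustrative Python example uses a small synthetic ensemble (2,000 runs) in place of the study's 100,000 Monte-Carlo resamples, and the noisy forecasts are fabricated placeholders.

```python
# Sketch of 95PPU (fraction of observations bracketed by the 95% band) and
# the d-factor (average band width scaled by the observed standard deviation).
import numpy as np

rng = np.random.default_rng(5)
n_months, n_runs = 120, 2000
obs = np.sin(np.linspace(0, 12, n_months))        # synthetic observed series
ens = obs + rng.normal(0, 0.3, size=(n_runs, n_months))  # ensemble of forecasts

x_l = np.percentile(ens, 2.5, axis=0)             # lower band (2.5th percentile)
x_u = np.percentile(ens, 97.5, axis=0)            # upper band (97.5th percentile)

ppu95 = float(np.mean((obs >= x_l) & (obs <= x_u)) * 100)   # % bracketed
d_factor = float(np.mean(x_u - x_l) / np.std(obs))          # d-bar / sigma_obs
print(round(ppu95, 1), round(d_factor, 2))
```

A reliable model combines a high 95PPU with a d-factor below one: wide bands inflate coverage but also inflate the d-factor, so the two measures must be read together.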

Case study and data description
The Great Lakes are a collection of interconnected freshwater lakes with some sea-like features. They lie in the central-eastern part of North America and are linked to the Atlantic Ocean by the Saint Lawrence River. All five lakes (Superior, Michigan, Huron, Erie, and Ontario) are located on or near the international boundary between Canada and the United States (see Fig 4 [94]). Lakes Michigan and Huron (Lake Michigan-Huron) are hydrologically considered one body of water, since they are connected at the Straits of Mackinac. Furthermore, the Great Lakes Waterway enables modern ships to navigate between the lakes. Containing approximately one-fifth (21%) of the world's surface freshwater, the Great Lakes are the largest group of freshwater lakes on Earth in terms of total surface area (244,106 km²) [95]. They are among the most significant water resources in the world, providing more than fifty million people in eastern North America with water for various purposes [96]. The five lakes' cumulative capacity is approximately 6 × 10^15 gallons, enough water to cover North America to an average depth of one meter, which indicates how vast the lakes are. It is essential to mention that Lake Superior is the largest and deepest of the lakes and lies furthest upstream [97]. The water level data were acquired from an open-access website [98] and cover the long period from 1918 to 2021. Notably, the raw data for all lakes are available in B1 Table in S1 File, which can be found in Appendix B.

Multivariate standardized water level index (MSWI)
The standardized water level index (SWI) is calculated using the same method as the SPI, but with water level rather than monthly rainfall as input [99]. The MSWI is calculated from several SWI time series for a specific station, with each time scale representing a particular time horizon. The MSWI considers every subset of SWI time scales (i.e., SWI-1, SWI-2, SWI-3, ..., up to SWI-48 months) for the studied lakes. Examining multiple SWI time series in detail can offer valuable insights into drought events. However, comparing and analyzing these series at different scales can be challenging for scholars and may lead to confusion and inefficiency in addressing the issues. In addition, it is difficult to precisely define the duration of a timescale as long or short term and to connect it accurately to a specific category of drought impact. For example, short-term drought is often represented on a three-month time scale, which raises the logical question of why a standard drought index should not instead define two- or four-month scales. Accordingly, standard drought indices lack a specific criterion for classifying some drought time scales (e.g., SWI-4, SWI-5, etc.) as short, medium, or long. Given these concerns, some researchers have suggested decreasing the number of examined drought aggregation time scales and accompanying series by employing a multivariate approach [100]. Notably, a reduction in SWI series does not mean that the SWI at some time scales is not calculated, but rather that a substantial portion of the variability of the SWI across different time scales is summarized in a few series. This reduction of drought time scales is computed using an effective method called principal component analysis (PCA).
PCA is a widely used approach for analyzing datasets with high-dimensional feature spaces [101], providing several benefits such as retaining maximum information while reducing data features, increasing data interpretability, and enabling visualization of multidimensional data. PCA constructs the following linear combinations of the I original variables:

PC_k = \sum_{i=1}^{I} e_{ik} X_i

where PC_k is the kth principal component, E_k = (e_{1k}, ..., e_{Ik}) is the kth eigenvector, X_i is the ith original variable, and e_{ik} is the ith element of the kth eigenvector. These linear combinations of the original variables, known as principal components, are formed so that they are mutually uncorrelated, meaning that they share no information. The number of principal components that can be generated equals the number of original variables in the dataset. The first principal component (PC1) accounts for the largest proportion of the variance in the original variables, which makes PCA particularly useful when the variables in the dataset are highly correlated. To perform PCA, a cross-correlation matrix of the predictors is constructed, from which the eigenvalues and eigenvectors are calculated. The eigenvector corresponding to the highest eigenvalue holds the weighting coefficients of the first principal component, and this process is repeated for subsequent components.
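The recipe just described (cross-correlation matrix, eigen-decomposition, leading eigenvector as PC1 loadings) can be sketched with NumPy. The function name and return values are illustrative, not the authors' implementation.

```python
import numpy as np

def first_principal_component(X):
    """PC1 of an SWI matrix X (rows = months, columns = SWI time scales).

    Builds the cross-correlation matrix of the predictors, takes its
    eigenvalues/eigenvectors, and uses the eigenvector of the largest
    eigenvalue as the PC1 loading weights, as described in the text.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    R = np.corrcoef(Z, rowvar=False)                  # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # ascending order
    e1 = eigvecs[:, -1]                               # loadings of PC1
    pc1 = Z @ e1                                      # PC1 scores
    explained = eigvals[-1] / eigvals.sum()           # variance explained
    return pc1, e1, explained
```

When the SWI columns are strongly correlated, as the text anticipates, the `explained` fraction approaches one, which is why a single component can summarize many aggregation scales.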
As PC1 holds the most essential information in the data, the MSWI is derived from this component. However, unlike a standard index (e.g., the SWI), which has a mean of zero and a standard deviation of one, PC1 does not meet these requirements: its algebraic construction makes its values incomparable across different months or locations. Therefore, the PC1 time series must be standardized using the following equation, based on the mean and standard deviation of each calendar month.
Z_{1,YM} = (PC_{1,YM} - \overline{PC}_{1M}) / SD_{1M}    (36)

In Eq 36, Z_{1,YM} (the MSWI) is the standardized PC1 value for the Mth month of the Yth year, while \overline{PC}_{1M} and SD_{1M} are the mean and standard deviation of PC1 for the Mth month. Notably, the \overline{PC}_{1M} value is very small and close to zero; therefore, it can be neglected. The MSWI data are organized in ascending order, and a plot of the corresponding empirical probability distribution is used to identify the categories of drought and wet severity (see Fig 5) [102].
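Eq 36 takes only a few lines of code. This sketch keeps the monthly mean term that the text notes could be neglected; the function name is hypothetical.

```python
import numpy as np

def mswi_from_pc1(pc1, months):
    """Eq (36) as code: month-by-month standardization of PC1 to get MSWI.

    pc1    : 1-D array of PC1 scores (one per month of record)
    months : calendar month (1-12) of each observation
    """
    pc1 = np.asarray(pc1, dtype=float)
    months = np.asarray(months)
    z = np.empty_like(pc1)
    for m in range(1, 13):
        idx = months == m
        # subtract the monthly mean and divide by the monthly std. dev.
        z[idx] = (pc1[idx] - pc1[idx].mean()) / pc1[idx].std(ddof=1)
    return z
```

After this step every calendar month of the MSWI has zero mean and unit standard deviation, so values are comparable across months and locations, as required of a standard drought index.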

Describing droughts in the Great Lakes
The main focus of this study's forecasting models was on two prominent lakes, Lake Superior and Lake Michigan-Huron. These lakes were given particular attention because of their larger size and greater depth, which distinguish them from the other lakes. Besides, they are located upstream, as the water flows from them to the rest of the lakes and then to the sea (see A1 Fig in S1 Appendix). The figures attached in the appendix (A2-A4 Figs in S1 Appendix) show that in the last fifty years the three remaining lakes (Lake Erie, Lake Ontario, and Lake St. Clair) did not witness any severe drought events, and the lowest value of the drought indicator for them is around -0.5. The reason may be related to the dams constructed and the regulation processes applied to the lakes. Notably, Lake Superior and Lake Ontario are currently regulated under plans implemented by the International Joint Commission: Lake Superior's levels have been regulated since 1921, while regulation of Lake Ontario's water level began in 1958. Because of the significant difference in elevation, the regulation of Lake Ontario does not affect the upper lakes' water levels, whereas the regulation of Lake Superior has an important influence on the entire Great Lakes system. Notably, the detailed drought analysis via MSWI for all lakes can be found in B2 Table in S2 File, which is located in Appendix B.
According to Fig 6, there is a large variation in the monthly levels of Lake Superior compared to Lake Michigan-Huron. Both lakes have a large area, but the first is located upstream, and the water flowing from it to the other lakes significantly varies the monthly levels of stored water. Based on Fig 6C and 6F, both lakes suffered from severe drought from mid-1921 to 1929 and from late 1998 to 2015. Lake Michigan-Huron generally experienced more frequent drought events than Lake Superior; however, the latter faced the severest drought. The analysis also found that Lake Michigan-Huron experienced four severe drought events between 1920 and 2020, while Lake Superior faced only two drought events during the same period. The 1921-1929 drought was primarily caused by reduced precipitation, whereas the drought of the 2000s was mainly attributed to elevated water temperatures induced by climate change, which enhanced evaporation through a substantial reduction in winter ice cover [60]. It is essential to mention that the MSWI can explain more than 95% of the variability of the nominated SWI time scales in the observed lakes.
The temporal analysis of drought intensity in the Great Lakes region, as depicted in Fig 7, unveils a dynamic pattern of fluctuating drought conditions from 1921 to 2021. Notably, Lake Superior and Lake Michigan-Huron emerge as critical focal points, with notable variations in drought intensity. Conversely, Lake Ontario transitions from drier to wetter conditions, indicating an overall increase in MSWI values and a positive trend. Similarly, Lake Erie experiences a substantial transition towards positive drought conditions, and Lake St. Clair exhibits patterns consistent with the other lakes. This analysis demonstrates the diverse drought intensities observed across the Great Lakes over the analyzed period, particularly the significant decrease in drought observed in Lake Superior and Lake Michigan-Huron from 1998 to 2021. This pronounced finding highlights a significant shift in drought conditions for these lakes, underscoring the evolving nature of the region's drought dynamics.

Results and discussion
The results section is divided into two main scenarios. In the first scenario, the RELM-BWO model is validated against standalone models, while the second compares the performance of the RELM-BWO model with the other hybrid models. Since severe drought periods were observed only in Lake Superior and Lake Michigan-Huron over the past fifty years, the models were compared for these two lakes, and the best-performing one was selected to forecast drought over the remaining three lakes.

First Scenario: A comparison of RELM-BWO with classical models
Modelling results. This section discusses the evaluated performance of four applied models (RF, SVR, ELM, and RELM-BWO) in forecasting the MSWI time series at different lead times for Lake Superior and Lake Michigan-Huron, using various statistical measures and comparative graphs. The performance measures over the testing phase for both lakes are provided in Tables 2 and 3. The assessment shows that the best forecasting result for Lake Superior was obtained by RELM-BWO (R² = 0.9687, RMSE = 0.0202, and MAE = 0.0156) compared to the other drought forecasting models. Similarly, the adopted model performs better in forecasting one month ahead for Lake Michigan-Huron, attaining higher accuracy (R² = 0.9825, RMSE = 0.0117, and MAE = 0.0093). Generally, the quantitative assessment showed that the performance of the forecasting models for Lake Superior is slightly lower than for Lake Michigan-Huron. Furthermore, the second-best model for forecasting drought is ELM, followed by SVR, while RF shows the poorest performance.
The results show that the models' forecasting accuracy decreases as the lead time increases; forecasts generally deteriorate with each additional step ahead due to accumulated errors. For example, for Lake Superior, the RF model is among the models most affected by the change in time interval, with MARE% varying from 32.14 to 56.20%, followed by SVR (5.53 to 35.74%), ELM (2.46 to 36.72%), and RELM-BWO (2 to 30.62%). Regarding Lake Michigan-Huron, all models except RF exhibited a gradual decrease in performance with increasing lead time. These models behave similarly when forecasting drought for both lakes, producing their peak errors at the six-month lead time. The RF model behaves anomalously when forecasting drought at Lake Michigan-Huron, generating its maximum error at the four-month lead time. Moreover, RF has the lowest forecasting accuracy of all models at every time interval for both lakes.
The quantitative analysis shows that, in general, the most difficult case for the adopted models is forecasting drought at a six-month lead time. In other words, forecasting drought for the next six months is a major challenge for the proposed models because of the loss of important information and the sharp changes in drought patterns over such a period, as revealed by the analyses reported previously; it is therefore necessary to test the efficacy of the models in this case. Accordingly, scatter plot diagrams for both lakes are created, as shown in Figs 8 and 9. Such diagrams allow each data record to be visualized and provide much information on how the forecast values deviate from their corresponding actual values. The figures show that the suggested model exhibits less scatter than the comparable models. It is also apparent that the forecasting accuracy for Lake Michigan-Huron is higher than for Lake Superior. The reasons may relate to the training conditions of the forecasting models. In this study, the models were calibrated using 75% of the drought time series, while the remaining 25% of the hydrological drought data was reserved for testing. The training data for Lake Michigan-Huron include many positive and negative drought values, and their characteristics are more similar to those of the test data than is the case for Lake Superior (see Fig 6C and 6F). In more detail, the training data for Lake Superior contained a small percentage (5.88%) of severe drought data (MSWI ≤ -1), while that value approximately tripled (16.32%) for Lake Michigan-Huron. In sum, the presence of multiple periods of severe drought in the training data improves the chance of obtaining accurate forecasts; conversely, when such observations are few, it is normal for a forecasting model's accuracy to decrease, especially when it is asked to simulate severe drought patterns.
Assessing the efficacy of the proposed models is crucial, particularly during severe drought periods (MSWI ≤ -1). The RE% values for each model are calculated (as illustrated in Fig 10) across various lead-time intervals to determine the practical benefits of the RELM-BWO model in forecasting precision and performance relative to the other models. The proposed RELM-BWO model generally outperforms the comparable methods, with a lower median relative error (depicted as a horizontal line within each boxplot).

Analyzing forecasts of turning points.
The performance metrics listed in Tables 2 and 3 evaluate a model's overall performance by considering all predicted points. However, when forecasting drought, metrics such as RMSE and R² may be ineffective indicators because of high auto-correlation: an overly smoothed time series is unsuitable for testing a model's effectiveness or robustness. High auto-correlation can push R² above 0.9 for all models, which is not meaningful for drought forecasting unless the model can also accurately forecast turning points. Hence, assessing the error at the turning points of the drought time series is critical to gauge the effectiveness of drought forecasting models.
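The point about auto-correlation can be demonstrated numerically: on a strongly auto-correlated series, even a naive persistence forecast (predict each value with the previous one) scores a very high R², so R² alone says little about skill at turning points. The AR(1) coefficient and noise level below are arbitrary illustrative choices, not fitted to the MSWI data.

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Simulate a strongly auto-correlated AR(1) series
rng = np.random.default_rng(1)
n = 1200
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.98 * y[t - 1] + rng.normal(0.0, 0.1)

# Persistence 'forecast': each value predicted by its predecessor.
score = r2(y[1:], y[:-1])  # high R^2 despite zero modelling effort
```

The persistence forecast is, by construction, always one step late and therefore misses every turning point, yet its R² exceeds 0.9 on such a series, which motivates the turning-point metric introduced next.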
The primary objective of this method was to evaluate each model's ability to accurately simulate turning points in the drought data. An illustration of the process for identifying critical and turning points in time series data is given in Fig 11A. Once these points were detected, the corresponding forecasts from the various models were collected. The final step involved calculating the average absolute error at the turning and critical points (AAETP) for each model separately. Notably, this procedure was conducted for both lakes to assess each model's ability to forecast hydrological drought up to six months in advance.

AAETP = average(|E_i|)    (37)
where E_i is a vector computed for each model by subtracting the actual values of the critical and turning points from the values forecast by the model.
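A minimal sketch of the AAETP computation follows, assuming turning points are detected as sign changes of the first difference of the observed series; the paper's exact detection of "critical" points may differ.

```python
import numpy as np

def aaetp(actual, forecast):
    """Average absolute error at turning points (Eq 37), sketched.

    A turning point is taken here as a local maximum or minimum of
    the observed series, i.e. where the slope changes sign.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    d = np.diff(actual)
    # interior indices where the first difference changes sign
    turns = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1
    if turns.size == 0:
        return np.nan  # no turning points in the series
    return np.mean(np.abs(actual[turns] - forecast[turns]))
```

Because only the extrema enter the average, a model that lags the series (like the persistence forecast above) is penalized here even when its overall RMSE and R² look excellent.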
Fig 11B and 11C demonstrate that the RELM-BWO model outperformed the other models in forecasting turning points, yielding lower AAETP values. The RELM-BWO model's effectiveness is measured by its ability to decrease the AAETP of the drought forecast for both lakes at different lead times. The findings show that, for Lake Superior, the RELM-BWO model outperforms the traditional SVR, RF, and ELM models, improving forecasting accuracy by 37.33%, 76.74%, and 32.21%, respectively. The accuracy of drought forecasting for Lake Michigan-Huron decreases slightly, but the RELM-BWO model still outperforms SVR, RF, and ELM, reducing forecasting errors by 16.18%, 56.27%, and 22.93%, respectively.

Second Scenario: Validation of the hybrid models
This section presents a comparison of the four hybrid models (RELM-BWO, SVR-BWO, RF-BWO, and ELM-BWO) in forecasting drought conditions one to six months in advance for the two largest lakes. The performance of each model at the various lead times is given in Table 4. The statistical analysis revealed that the RELM-BWO model outperforms the other models, exhibiting the lowest MAE (ranging from 0.0156 to 0.1524), RMSE (0.020 to 0.191), and MAPE (2 to 30.62%) and the highest R² (0.9179 to 0.9687) for Lake Superior. Similarly, for Lake Michigan-Huron, RELM-BWO attains the lowest MAE (0.0093 to 0.0812), RMSE (0.0117 to 0.1005), and MAPE (2.82 to 15.45%) and the highest R² (0.9639 to 0.9825). Overall, the RELM-BWO model performs best, followed by ELM-BWO, SVR-BWO, and RF-BWO, respectively.
The RELM-BWO model's accuracy has also been tested against the other hybrid models by measuring its ability to reduce the AAETP indicator at various lead times for the selected lakes. The overall results, presented in Table 4, demonstrate that for Lake Superior the RELM-BWO model outperforms the ELM-BWO, SVR-BWO, and RF-BWO models, improving forecasting accuracy by 17.27%, 29.65%, and 54.69%, respectively. Similarly, for Lake Michigan-Huron, the RELM-BWO model significantly decreased the AAETP values, yielding improvements of 11.19%, 7.21%, and 39.35% compared to the ELM-BWO, SVR-BWO, and RF-BWO models, respectively.
The accuracy of drought forecasting six months in advance with the hybrid models has been visually analyzed using Taylor plots (Fig 12) and violin plots (Fig 13). The Taylor plots reveal that the RELM-BWO model has a standard deviation closer to that of the calculated drought than the other models, with the highest R² and lowest RMSE for both lakes. Evaluating the effectiveness of these models, especially during critical drought periods (MSWI ≤ -1), is also essential.

Validation of RELM-BWO: A comparative analysis against previous models in literature
A more thorough analysis is needed to confirm the superiority of the proposed RELM-BWO model in forecasting drought one month ahead. This requires verifying the accuracy of the RELM-BWO model against effective models developed in previous studies. One such study employed sophisticated models combining the wavelet transform (WA) with ANFIS, yielding the WANFIS model [103]; its conclusion indicates that WANFIS performs satisfactorily in forecasting mid-term drought, achieving its highest accuracy at an R² of 0.9133. Another study examined a model created by merging WA and ANN, which achieved an R² of 0.938 [104]. Additionally, an approach combining LSTM and ARIMA models was developed to enhance the accuracy of hydrological drought forecasts [105]; the resulting hybrid model achieved an R² exceeding 0.91. Furthermore, a separate study used an advanced forecasting model that combined Online Sequential ELM with two decomposing filters, namely "Kalman filter regression coupled with the boundary-corrected maximal overlap discrete wavelet transform" [1], to forecast drought in Iran; the model demonstrated high accuracy with an R² of 0.93. Moreover, a hybrid model incorporating ANN, SARIMA, and a genetic algorithm was employed to forecast hydrological drought, yielding a good R² of 0.945 [106].
Other researchers have introduced advanced models using Fuzzy-SVR, Boosted-SVR, and classical SVR for drought forecasting, with the Fuzzy-SVR model performing best at an R² of 0.903 [107]. Individual models such as ANN have also been examined for drought forecasting, achieving good results (R² > 0.9) [62]. Lastly, a combined model was developed by integrating WA, ARIMA, and ANN, reaching an R² of 0.872 [108]. In summary, the models assessed demonstrated satisfactory accuracy, with R² values ranging from 0.872 to 0.945. However, their forecasting accuracy remains lower than that of the RELM-BWO model, which attains a superior forecasting accuracy (R² > 0.968).

Results of uncertainty analysis (UA)
According to the previous discussion and the results presented, it can be confirmed that RELM-BWO performs best among the employed models, producing the most accurate forecasts. Even so, its forecasting accuracy gradually decreases as the lead time increases. Hence, it is critical to determine the maximum lead time at which the proposed model still provides reliable drought forecasts, and uncertainty analysis is therefore conducted to ensure the reliability and validity of the results obtained.
UA results provide much information on how the forecasting results vary with variations in the inputs: a robust model remains stable in its outcomes when there are small changes in the input variables. In this study, the uncertainty analysis was performed for the best model (RELM-BWO) throughout the testing phase. The values bracketed by the lower and upper 95PPU bounds for both lakes exceeded 88.5% (see Fig 14); notably, if the value bracketed by the 95PPU exceeds 80%, the model's forecasts are considered good [109]. However, Fig 15 shows that the d-factor values generally increase gradually with lead time. For Lake Superior, the model provides excellent forecasts for one- and two-month-ahead drought, with low uncertainty and d-factor values ranging from 0.493 to 0.595. Conversely, the model shows higher uncertainty (d-factor above one) at three-, four-, five-, and six-month lead times, indicating that drought forecasts at these intervals are poor. The same figure shows that the forecasting accuracy for Lake Michigan-Huron is slightly better than for Lake Superior, with lower d-factor values, supporting the outcomes of the statistical indicators described previously. This slight discrepancy in drought prediction accuracy may be due to the different hydrological characteristics of the two studied lakes; other factors, such as volume, surface area, and regulation operations, may affect the fluctuation of lake levels and consequently influence the drought patterns. For Lake Michigan-Huron, the suggested model attains a solid level of accuracy when forecasting drought one, two, and three months ahead, with d-factor values (below one) ranging from 0.403 to 0.885.
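The 95PPU coverage and d-factor used above can be computed from a Monte-Carlo forecast ensemble as follows. This is an illustrative sketch of the standard definitions, not the paper's code.

```python
import numpy as np

def uncertainty_metrics(observed, ensemble):
    """95PPU coverage and d-factor for a forecast ensemble, sketched.

    observed : 1-D array of observed values
    ensemble : 2-D array, rows = Monte-Carlo realizations, cols = time

    Coverage = % of observations inside the 2.5th-97.5th percentile
    band (the 95PPU). The d-factor is the mean band width divided by
    the standard deviation of the observations; values below one are
    read as reliable in the text.
    """
    observed = np.asarray(observed, dtype=float)
    lower = np.percentile(ensemble, 2.5, axis=0)
    upper = np.percentile(ensemble, 97.5, axis=0)
    inside = (observed >= lower) & (observed <= upper)
    coverage = 100.0 * inside.mean()
    d_factor = (upper - lower).mean() / observed.std(ddof=1)
    return coverage, d_factor
```

With these definitions, a forecast is judged reliable when coverage exceeds roughly 80% while the d-factor stays below one, matching the criteria applied to Figs 14 and 15.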
Forecasting hydrological drought for extended time horizons is challenging due to various factors that can increase uncertainty and errors in the forecast.As the forecasting horizon increases, the relationship between inputs and outputs becomes more complicated, hindering prediction accuracy.This increasing complexity often results in higher levels of uncertainty, making it more challenging to predict future values accurately.Other factors impacting forecast accuracy include changes in underlying data patterns, seasonal effects, and unexpected events that affect data patterns.

Investigating the use of RELM-BWO for forecasting drought in Lake Ontario, Lake Erie, and Lake St. Clair

It was demonstrated above that the proposed model (RELM-BWO) effectively forecasts drought conditions in the two largest lakes. Therefore, it is worthwhile to utilize this model for forecasting droughts in the remaining three lakes (Lake Ontario, Lake Erie, and Lake St. Clair), which have not experienced severe droughts in the past fifty years. The performance metrics of the model's drought forecasts, from one to six months in advance, are provided in Table 5 for all three lakes. The quantitative results indicate that the model is most accurate for Lake St. Clair (MAE ranging from 0.0184 to 0.1139, RMSE from 0.0240 to 0.1383, and MAPE from 5.96 to 33.75%), with the lowest AAETP values (0.0178 to 0.1091) and the highest R² values (0.9635 to 0.9905). The model's accuracy is satisfactory for the other two lakes, with Lake Ontario exhibiting the least accurate forecasts due to higher uncertainty values. Uncertainty analysis shows that the model reliably forecasts droughts up to six months in advance, except for Lake Ontario, where dependable forecasts are generated only up to three months ahead.

Discussion
The study's results indicate that incorporating the applied metaheuristic algorithm (BWO) significantly improves the accuracy of classical approaches such as RF, SVR, and ELM in drought forecasting across different lead times. These results are consistent with previous research [51,110], indicating that hybridized machine learning methods improve forecast accuracy and overall predictive performance. By using the BWO algorithm to optimize the hyperparameters of the standalone models, the forecasting of the MSWI in the studied lakes becomes more accurate. Both statistical indices and visual inspection confirm that the BWO algorithm significantly enhances the accuracy of all models when forecasting hydrological drought based on the MSWI: the algorithm provides precise parameters to the models, increasing their generalization capacity. The RELM-BWO model emerges as the most effective of the hybrid models tested. The study illustrates that the optimal computation of the regularization parameter using the BWO algorithm makes the RELM-BWO model superior to the ELM-BWO model, solidifying RELM-BWO as the top-performing hybrid model for MSWI-based drought forecasting.
As the forecasting lead time increases, the accuracy of all hybrid models gradually decreases, making it challenging to identify reliable forecasts. To address this, uncertainty analysis proves to be an effective tool. The study reveals that drought can be reliably forecast up to 2 months ahead for some lakes, while others allow 3- or even 6-month lead times. The varying accuracy may be attributed to the training data, suggesting that models trained on a wider range of drought conditions yield more accurate forecasts.

Conclusion
This study utilized four hybrid ML-based models (SVR-BWO, ELM-BWO, RF-BWO, and RELM-BWO) and three classical models (ELM, RF, and SVR) to forecast drought in the Great Lakes region several months in advance (1 to 6 months). Drought calculations were performed via the MSWI using water level data from 1918 to 2020. The dataset was split into a training set (December 1921 to January 1997) and a testing set (February 1997 to December 2021) for model development. Over the past 50 years, severe drought events have been limited to Lake Superior and Lake Michigan-Huron; thus, the selected models were employed to forecast droughts several months in advance specifically for these two lakes, and the best model was then used to forecast drought for the other three lakes. To assess the models' ability to capture turning points, a new indicator called the AAETP was introduced, as traditional criteria (RMSE and R²) were sometimes insufficient due to high auto-correlation in the drought dataset. The hybrid models, optimized with the BWO, outperformed the classical models; among them, RELM-BWO was the most accurate and reliable. Evaluating the recommended model's effectiveness in reducing the AAETP at various lead times, for Lake Superior it achieved substantial improvements of 37.33%, 76.74%, and 32.21% compared to SVR, RF, and ELM, respectively. Similarly, for Lake Michigan-Huron, the RELM-BWO model demonstrated AAETP reductions of 16.18%, 56.27%, and 22.93% compared to SVR, RF, and ELM, respectively. The RELM-BWO also consistently outperformed the other hybrid models, with enhancements of 17.27% to 54.69% for Lake Superior and 7.21% to 39.35% for Lake Michigan-Huron. Generally, as the forecast lead time increased, all models' accuracy declined; however, the proposed model showed a slightly smaller deterioration than the others. To ensure the reliability of the forecasting results, uncertainty analysis was conducted using the d-factor and 95PPU criteria. The analysis showed that RELM-BWO reliably forecasts droughts up to two months in advance for Lake Superior and up to three months ahead for Lake Michigan-Huron. Furthermore, the uncertainty analysis confirmed that RELM-BWO can estimate droughts up to six months in advance for Lake Erie and Lake St. Clair, and up to three months ahead for Lake Ontario.
Fig 1. Flowchart of the random forest model.

Fig 6 .
Fig 6. Water level elevation vs. drought. a) and d): boxplot diagrams showing the monthly variation of water level for Lake Superior and Lake Michigan-Huron; b) and e): line graphs showing the yearly water level elevation; c) and f): the MSWI from 1920 to 2020. https://doi.org/10.1371/journal.pone.0290891.g006

Fig 11 .
Fig 11. Analysis of the errors at the turning and critical points for different forecasting lead times. a) an example of detecting the turning points; b) AAETP results for Lake Superior; c) AAETP results for Lake Michigan-Huron. https://doi.org/10.1371/journal.pone.0290891.g011

The RE% values of each model have been calculated (as depicted in Fig 13) to ascertain the practical advantages of the RELM-BWO model in forecasting accuracy and performance relative to the other hybrid models during critical drought periods (MSWI ≤ -1). The proposed RELM-BWO model generally outperforms the other hybrid models, with a lower median relative error (indicated by the white hollow point near each violin's centre) and a smaller interquartile range (the grey bar at each violin's core).

Fig 15 .
Fig 15. d-factor values indicating the reliability of forecast drought events at different lead-time intervals for Lake Superior and Lake Michigan-Huron. https://doi.org/10.1371/journal.pone.0290891.g015

Table 5. Performance indicators of the hybrid model for Lake Ontario, Lake Erie, and Lake St. Clair. Bold numbers indicate forecasts deemed unreliable due to significant uncertainty in the projected values.
https://doi.org/10.1371/journal.pone.0290891.t005