Drought index prediction using advanced fuzzy logic model: Regional case study over Kumaon in India

A new version of the fuzzy logic model, called the co-active neuro-fuzzy inference system (CANFIS), is introduced for predicting the standardized precipitation index (SPI). Drought information at multiple time scales from six meteorological stations located in Uttarakhand State, India, is used. The SPI was computed at time scales of 1, 3, 6, 9, 12, and 24 months, and the model inputs were selected by autocorrelation function (ACF) and partial autocorrelation function (PACF) analysis at the 5% significance level. The proposed CANFIS model was validated against two models: a classical artificial intelligence model (the multilayer perceptron neural network, MLPNN) and a regression model (multiple linear regression, MLR). Model performance was evaluated with several metrics (root mean square error, Nash-Sutcliffe efficiency, coefficient of correlation, and Willmott index) and graphical visualizations (scatter plot and Taylor diagram). The results indicated that the CANFIS model predicted the SPI better than the other models and that prediction results differed among the meteorological stations. The proposed model can serve as a reliable expert intelligent system for predicting meteorological drought at multiple time scales, can support decision making on remedial schemes to cope with meteorological drought at the study stations, and can help to maintain sustainable water resources management.


Introduction
Drought is a natural hazard and a recurrent climatic feature observed in most climatic regions of the world. Factors determining the impact of drought include its severity, areal extent, frequency and duration [1]. Drought, as one of the environmental disasters, has expose the better performance of the SPA-LOPA model in the evaluation for independent protection layers of the BG system. Recently, a number of studies have used ML models for predicting meteorological droughts using various drought indices. Mokhtarzad et al. [17] evaluated the possibility of using ANN, ANFIS, and SVM models for prediction of meteorological drought at Bojnourd, Tehran based on SPI. They confirmed the superiority of the SVM model over the other models. Nguyen et al. [18] assessed the ANFIS model for meteorological drought prediction using SPI and SPEI in Khanhhoa Province, Vietnam. The results showed that SPI and SPEI were suitable for the prediction task in the study region using the ANFIS model. Zhang et al. [19] forecasted drought using the ARIMA, ANN, WA-ANN, and SVR models with 3- and 6-month SPI values in the Haihe River basin, China. The forecasts of SPI-3 and SPI-6 revealed that the WA-ANN model predicted better than the ANN model. Ali et al. [20] focused on multi-scalar SPI-based meteorological drought prediction in Pakistan using three different models (M5Tree, ensemble-ANFIS, and minimax probability machine regression (MPMR)). The ensemble-ANFIS model was found to outperform the other models, particularly for SPI-6 and SPI-12 compared with SPI-3. Liu et al. [21] applied ELM, online sequential ELM (OS-ELM), and self-adaptive evolutionary ELM (SAE-ELM) for drought forecasting based on SPI and SPEI in Khanhhoa Province, Vietnam. The study reported that the SAE-ELM model performed best among the tested models. Mouatadid et al. [22] applied MLR, ELM, LSSVR, and ANN models for drought prediction over eastern Australia using multi-scalar SPI and SPEI. The study reported that the ELM and ANN models performed better than the MLR and LSSVR models in terms of drought prediction. Soh et al. [23] applied the WT-ARIMA-ANN and WT-ANFIS models for meteorological drought forecasting using 1-, 3-, and 6-month SPEI in the Langat River basin, Malaysia. The comparison revealed that WT-ARIMA-ANN outperformed the other model for SPEI-3 and SPEI-6 prediction in the study region.
According to the literature, the search for new, reliable, and robust versions of AI models is still ongoing. Also, AI models behave differently from one region to another. Hence, it is essential to understand the influence of synoptic climatological information at each station. In this study, the efficiency of the CANFIS model is investigated for drought index (SPI) forecasting. Two models (i.e., MLPNN and MLR) are developed for validation. Six meteorological stations (Almora, Bageshwar, Champawat, Nainital, Pithoragarh, and Pantnagar) were selected for meteorological drought prediction based on multiple SPI time scales (e.g., SPI-1, SPI-3, SPI-6, SPI-12, and SPI-24). Statistical techniques (i.e., ACF and PACF) were employed to select the inputs based on correlated lag months.

Case study region and data description
The present study was conducted at six meteorological stations (Almora, Bageshwar, Champawat, Nainital, Pithoragarh, and Pantnagar) located in the Kumaon region of Uttarakhand State, India (Fig 1, https://www.diva-gis.org/gdata). The altitude of the Kumaon region varies from 223 m to 3669 m above MSL, and its geographical area is 21,313 km². Table 1 presents the altitude, latitude, longitude, and data availability for the stations in the region. Uttarakhand State (28˚43' N to 31˚28' N latitude and 77˚34' E to 81˚03' E longitude) shares its northwest boundary with Himachal Pradesh, its south boundary with Uttar Pradesh, its southeast boundary with Nepal, and its northeast boundary with China. The altitude of Uttarakhand State ranges from 145 m to 7796 m above MSL. The state comprises 13 districts, clustered into 2 administrative regions: (i) the Garhwal region with 7 districts (Haridwar, Tehri Garhwal, Pauri Garhwal, Chamoli, Dehradun, Rudraprayag, and Uttarkashi), and (ii) the Kumaon region with 6 districts (Almora, Bageshwar, Champawat, Nainital, Pithoragarh and Udham Singh Nagar (Pantnagar)). The state is characterized by a temperate climate, although the plains have a tropical climate, with a temperature range of -0 to 43˚C and annual rainfall ranging from 260 to 3955 mm. Most of the rainfall (60 to 85% of the annual total) occurs from June to September (monsoon season).
The monthly weather data (i.e., rainfall) for 5 stations (Almora, Bageshwar, Champawat, Nainital, and Pithoragarh) were acquired from the Indian Meteorological Department

Calculation of the SPI
The SPI, developed by McKee et al. [24], is the standard index for defining, monitoring and analysing meteorological drought (MD) conditions at multiple time scales. At least 30 years (≥ 30) of monthly precipitation data are required to compute the SPI for a given time scale at any place; the computation transforms the original precipitation series into a standardized normal distribution. Three probability distributions (normal, lognormal, and gamma) were fitted to the running sums of the 1-, 3-, 6-, 9-, 12-, and 24-month rainfall series, and the best of the three was selected through the Kolmogorov-Smirnov (KS) test. The KS test revealed that the gamma probability distribution fitted the running-sum rainfall series well. In the current study, the SPI was therefore computed with the gamma distribution at 1-, 3-, 6-, 9-, 12-, and 24-month time scales for the Almora, Bageshwar, Champawat, Nainital, Pithoragarh, and Pantnagar stations. For more information on the mathematical calculation of the SPI, one can refer to [25][26][27][28].
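The SPI procedure described above (running sum, gamma fit, transformation to a standard normal variate) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gamma parameters are estimated with Thom's approximation to maximum likelihood, zero totals are handled with a mixed distribution, the fit uses the whole series rather than one fit per calendar month, and the rainfall series is synthetic. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def running_sum(values, scale):
    """Sum of `scale` consecutive monthly totals (first scale-1 months are dropped)."""
    return [sum(values[i - scale + 1:i + 1]) for i in range(scale - 1, len(values))]

def fit_gamma(sample):
    """Thom's approximation to the maximum-likelihood gamma fit: (shape, scale)."""
    n = len(sample)
    mean = sum(sample) / n
    a = math.log(mean) - sum(math.log(x) for x in sample) / n
    shape = (1 + math.sqrt(1 + 4 * a / 3)) / (4 * a)
    return shape, mean / shape

def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma function, evaluated by its power series."""
    z = x / scale
    term = 1.0 / shape
    total = term
    for n in range(1, 500):
        term *= z / (shape + n)
        total += term
        if term < 1e-12 * total:
            break
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def spi(rain, scale):
    """SPI series for one time scale (assumption: a single fit over all months)."""
    sums = running_sum(rain, scale)
    wet = [s for s in sums if s > 0]
    q = 1 - len(wet) / len(sums)              # probability of a zero total
    shape, beta = fit_gamma(wet)
    norm = NormalDist()
    out = []
    for s in sums:
        h = q + (1 - q) * gamma_cdf(s, shape, beta) if s > 0 else q
        h = min(max(h, 1e-6), 1 - 1e-6)       # keep the inverse normal finite
        out.append(norm.inv_cdf(h))
    return out

# Synthetic 30-year monthly rainfall (mm); purely illustrative, not station data.
random.seed(0)
rain = [random.gammavariate(2.0, 50.0) for _ in range(360)]
spi3 = spi(rain, 3)
```

In operational practice the gamma distribution is usually fitted separately for each calendar month, which this sketch omits for brevity.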

Co-active neuro-fuzzy inference system (CANFIS)
Jang et al. [29] introduced the basic concept of the CANFIS model by extending the adaptive neuro-fuzzy inference system (ANFIS) to produce multiple outputs. It may be used as a universal approximator of any nonlinear function. The CANFIS model combines the features of a fuzzy inference system (FIS) and an artificial neural network (ANN) in a single framework to process complex systems rapidly and accurately. The main strength of the CANFIS model stems from the pattern-dependent weights between the consequent layer and the fuzzy association layer. Fig 2a and 2b show the assembly of membership functions (MFs) and the CANFIS architecture with two input variables (x and y) and one output (c). Under the first-order Takagi-Sugeno-Kang (TSK) model, the IF-THEN rules of the CANFIS model are as follows [30,31]:
$$\text{Rule 1: IF } x \text{ is } A_1 \text{ and } y \text{ is } B_1, \text{ THEN } c_1 = p_1 x + q_1 y + r_1 \tag{1}$$
$$\text{Rule 2: IF } x \text{ is } A_2 \text{ and } y \text{ is } B_2, \text{ THEN } c_2 = p_2 x + q_2 y + r_2 \tag{2}$$
where $A_1, A_2$ and $B_1, B_2$ are the MFs for the inputs x and y, respectively, and $p_1, q_1, r_1$ and $p_2, q_2, r_2$ are the parameters of the consequent part (Fig 2a). The characteristics of each layer are described as follows.
Layer 1 (fuzzification layer): the nodes of this layer are adaptive (square). They generate the membership grades of the crisp inputs, and each node output is computed as:
$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2 \tag{3}$$
$$O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4 \tag{4}$$
where $O_{1,i}$ is the output of the ith node of layer 1, $A_i$ and $B_{i-2}$ are the linguistic labels (small, medium, large, etc.), x and y are the inputs to the ith node, and $\mu_{A_i}$ and $\mu_{B_{i-2}}$ are the membership functions for the corresponding linguistic labels. The mathematical expression of the Gaussian MF is written as:
$$\mu(x) = \exp\left[-\frac{(x - d)^2}{2\sigma^2}\right] \tag{5}$$
where d and σ are the parameters of the function (center and width). The parameters of this layer are called premise parameters.
Layer 2 (rule layer): the nodes of this layer are fixed (circular) and labeled Π. The output of this layer, called the firing strength, is the product of the corresponding signals obtained from layer 1. For example:
$$O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \tag{6}$$
Layer 3 (normalization layer): the nodes of this layer are fixed (circular) and labeled N. The main purpose of this layer is to normalize the signals of the previous layer, giving the normalized firing strength:
$$O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{7}$$
Layer 4 (defuzzification layer): every node in this layer is adaptive (square), and the parameters of this layer are called consequent parameters. The contribution of the ith rule towards the total output is computed by Eq (8):
$$O_{4,i} = \bar{w}_i c_i = \bar{w}_i (p_i x + q_i y + r_i), \quad i = 1, 2 \tag{8}$$
Layer 5 (summation layer): this layer is also known as the output node and is labeled Σ. In this node the overall output is computed by summing all the incoming signals:
$$O_{5} = \sum_{i} \bar{w}_i c_i = \frac{\sum_{i} w_i c_i}{\sum_{i} w_i} \tag{9}$$
In this research, the CANFIS model was formulated by a trial-and-error procedure using the Gaussian (Gauss) MF, the TSK fuzzy model, the hyperbolic tangent (Tanh) activation function, and the delta-bar-delta (D-B-D) learning algorithm for multi-scalar SPI prediction at the six study stations. NeuroSolutions 5.0 software [32] was used to calibrate (train) the CANFIS model with a threshold of 0.001 for 1000 iterations.
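The five layers described above can be traced in a short Python sketch of a forward pass through a two-rule, first-order TSK system. This is only an illustration under stated assumptions: the Gaussian MF is taken in the common form exp(-(x - d)^2 / (2σ^2)), the parameter values are invented, and no training (such as the D-B-D algorithm used with NeuroSolutions) is performed. The function names are hypothetical.

```python
import math

def gauss_mf(x, d, sigma):
    """Gaussian membership grade with center d and width sigma (assumed form)."""
    return math.exp(-((x - d) ** 2) / (2 * sigma ** 2))

def tsk_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass for rules of the form 'IF x is A_i and y is B_i THEN c_i = p_i x + q_i y + r_i'."""
    # Layer 1: fuzzification of the crisp inputs
    mu_a = [gauss_mf(x, d, s) for d, s in premise_a]
    mu_b = [gauss_mf(y, d, s) for d, s in premise_b]
    # Layer 2: firing strength of each rule (product operator)
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order consequents
    contrib = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output is the sum of the rule contributions
    return sum(contrib)

# Illustrative (invented) premise and consequent parameters
premise_a = [(0.0, 1.0), (2.0, 1.0)]            # (d, sigma) for A_1, A_2
premise_b = [(0.0, 1.0), (2.0, 1.0)]            # (d, sigma) for B_1, B_2
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]  # (p_i, q_i, r_i)
output = tsk_forward(1.0, 1.0, premise_a, premise_b, consequent)
```

With the inputs placed midway between the two MF centers, both rules fire equally, so the output is the plain average of the two rule consequents.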

Multi-layer perceptron neural network (MLPNN) model
Haykin [33] described the concept of the MLPNN model. The MLPNN model is a network of several layers of parallel processing units called neurons. In the MLPNN model, each layer is linked to the subsequent layer via interconnections called weights (W). A typical feed-forward MLPNN model, which consists of input (i), hidden (j) and output (k) layers connected through weights ($W_{ij}$ and $W_{jk}$), is shown in Fig 3. An appropriate number of neurons and hidden layers is required for accurate mapping of the entire training dataset, and this number is problem-specific (it depends on the number of predictors and predictands). The initially estimated weights are corrected progressively during training by comparing the predicted output with the target output through backpropagation [34]. The explicit expression for an output value in the MLPNN model is written as:
$$Y_k = f_k\left(\sum_{j=1}^{N_j} W_{jk}\, f_j\left(\sum_{i=1}^{N_i} W_{ij} X_i\right)\right) \tag{10}$$
where $Y_k$ is the kth element of the output vector Y, $W_{ij}$ is the weight in the hidden layer connecting the ith neuron in the input layer and the jth neuron in the hidden layer, $W_{jk}$ is the weight in the output layer connecting the jth neuron in the hidden layer and the kth neuron in the output layer, $X_i$ is the ith input variable for the input layer, $N_i$ and $N_j$ are the numbers of neurons in the input and hidden layers, and $f_j$ and $f_k$ are the activation functions of the hidden and output layer neurons, expressed as
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{11}$$
A supervised learning approach with three layers (input, hidden, and output) was used to design the architecture of the MLPNN model. Data normalization was realized using the Tanh activation function (which varies from -1 to 1) with the D-B-D learning algorithm. This technique was chosen for its speed and robustness compared with traditional gradient descent. The optimal number of neurons in the hidden layer was decided through the 2n + 1 concept provided by [35,36], where n represents the number of inputs. The training of the MLPNN model was terminated after reaching 1000 epochs with a 0.001 threshold value. The designed MLPNN model was applied at the different locations for MD prediction.
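As a minimal sketch of the three-layer architecture described above, the following Python code sizes the hidden layer with the 2n + 1 rule and evaluates one forward pass with Tanh activations. The weights are random placeholders and the three input values are invented; training by the D-B-D algorithm is not implemented, and bias terms are omitted as an assumption.

```python
import math
import random

def hidden_size(n_inputs):
    """Hidden-layer size from the 2n + 1 rule, n being the number of inputs."""
    return 2 * n_inputs + 1

def init_weights(n_inputs, seed=0):
    """Random placeholder weights; w_ij[j][i] links input i to hidden neuron j."""
    rng = random.Random(seed)
    n_hidden = hidden_size(n_inputs)
    w_ij = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w_jk = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_ij, w_jk

def mlp_forward(x, w_ij, w_jk):
    """Single-output forward pass with Tanh in the hidden and output layers."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_ij]
    return math.tanh(sum(w * h for w, h in zip(w_jk, hidden)))

# Three lagged SPI values as inputs (invented numbers)
lags = [0.4, -0.1, 0.3]
w_ij, w_jk = init_weights(len(lags))
prediction = mlp_forward(lags, w_ij, w_jk)
```

Because the output neuron also uses Tanh, predictions are confined to (-1, 1), so the target series has to be scaled to that range before training and rescaled afterwards.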

Multiple linear regression (MLR) model
Among the several well-established regression models in the field of hydrology and climate, the MLR model is widely implemented [22]. The MLR model was selected as a second model to validate the capacity of the CANFIS model to predict the multi-scalar SPI. The MLR model describes the linear relationship between one target (dependent) variable and several (two or more) independent variables [37,38]. The regression equation of the MLR model can be written as:
$$SPI_t = w_0 + w_1\, SPI_{t-1} + w_2\, SPI_{t-2} + \dots + w_n\, SPI_{t-n} \tag{12}$$
where $SPI_t$ is the target variable at multiple time scales, $SPI_{t-1}$, $SPI_{t-2}$ to $SPI_{t-n}$ are the input variables, $w_0$ is the intercept of the MLR equation, and $w_1$ to $w_n$ are the weights of the MLR equation.
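A regression of the current SPI on its own lagged values can be fitted by ordinary least squares. The sketch below is one possible way to do this in plain Python, solving the normal equations by Gaussian elimination; the estimation method is an assumption (the source does not state how the MLR weights were obtained), and the short test series is generated from known weights purely to show that they are recovered.

```python
def lagged_design(series, lags):
    """Rows [1, SPI_{t-l1}, SPI_{t-l2}, ...] and targets SPI_t."""
    max_lag = max(lags)
    X = [[1.0] + [series[t - l] for l in lags] for t in range(max_lag, len(series))]
    y = series[max_lag:]
    return X, y

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_mlr(series, lags):
    """Least-squares weights [w0, w1, ...] from the normal equations X'X w = X'y."""
    X, y = lagged_design(series, lags)
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)

# Noise-free series generated with w0 = 0.1, w1 = 0.5, w2 = -0.3 (invented values)
series = [1.0, -1.0]
for _ in range(10):
    series.append(0.1 + 0.5 * series[-1] - 0.3 * series[-2])
weights = fit_mlr(series, [1, 2])
```

The lag list passed to `fit_mlr` plays the role of the significant lags chosen for each SPI scale, for example `[1, 2, 3, 12]` for a model that uses the previous three months and the same month of the previous year.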

Optimal input nomination and model development
Selecting the appropriate input-output variables for modeling nonlinear hydrological processes is a tedious task. In this research, long-term monthly rainfall data were used to compute the multi-time scale SPI (i.e., 1-, 3-, 6-, 9-, 12- and 24-month). The ACF and PACF analyses were performed to pick the optimal inputs (significant lags) for the target output [39][40][41]. The ACF and PACF are calculated using Eqs 13 and 14:
$$r_k = \frac{\sum_{t=1}^{N-k} (Y_t - \bar{Y})(Y_{t+k} - \bar{Y})}{\sum_{t=1}^{N} (Y_t - \bar{Y})^2} \tag{13}$$
$$\phi_{k,k} = \frac{r_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\, r_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\, r_j} \tag{14}$$
where N is the number of multi-scalar SPI observations in the entire series, $Y_t$ is the observation at time t, $\bar{Y}$ is the mean of the whole series, k is the lag, $r_k$ is the ACF at lag k, and $\phi_{k,k}$ is the PACF at lag k, with $\phi_{1,1} = r_1$ and $\phi_{k,j} = \phi_{k-1,j} - \phi_{k,k}\,\phi_{k-1,k-j}$. Afterward, the PACF values were tested at the 5% significance level (SL) by constructing the upper and lower critical limits (UCL and LCL) by Eq (15):
$$\mathrm{UCL},\ \mathrm{LCL} = \pm \frac{1.96}{\sqrt{N}} \tag{15}$$
Figs 4a-4f to 9a-9f show the PACF results of the multi-scalar SPI at the Almora, Bageshwar, Champawat, Nainital, Pithoragarh, and Pantnagar stations, respectively. The dotted red lines in these figures indicate the UCL and LCL at the 5% SL. A lag whose PACF value crosses these limits was counted as statistically significant and was used for the development of the CANFIS, MLPNN, and MLR models. Table 2 provides the details of the developed models with inputs and outputs, while Table 3 summarizes the training (70%) and testing (30%) percentages of the multi-scalar SPI datasets used by the CANFIS, MLPNN and MLR models for MD prediction at the six study stations.
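The lag-selection step can be sketched in Python as follows: compute the sample ACF, obtain the PACF from it by the Durbin-Levinson recursion, and keep the lags whose PACF exceeds the 5% limits of ±1.96/√N. The recursion and the limit formula are standard choices assumed here, and the AR(1) test series is synthetic, so the sketch shows the mechanics rather than the station results.

```python
import math
import random

def acf(series, max_lag):
    """Sample autocorrelation r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

def pacf(series, max_lag):
    """Partial autocorrelation phi_kk for k = 1 .. max_lag (Durbin-Levinson)."""
    r = acf(series, max_lag)
    phi = {}
    out = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
        else:
            num = r[k] - sum(phi[(k - 1, j)] * r[k - j] for j in range(1, k))
            den = 1 - sum(phi[(k - 1, j)] * r[j] for j in range(1, k))
            phi_kk = num / den
        phi[(k, k)] = phi_kk
        for j in range(1, k):
            phi[(k, j)] = phi[(k - 1, j)] - phi_kk * phi[(k - 1, k - j)]
        out.append(phi_kk)
    return out

def significant_lags(series, max_lag):
    """Lags whose PACF lies outside the 5% limits +/- 1.96 / sqrt(N)."""
    limit = 1.96 / math.sqrt(len(series))
    return [k for k, v in enumerate(pacf(series, max_lag), start=1) if abs(v) > limit]

# Synthetic AR(1) series with coefficient 0.7 (illustrative only)
rng = random.Random(1)
ar1 = [0.0]
for _ in range(1999):
    ar1.append(0.7 * ar1[-1] + rng.gauss(0.0, 1.0))
selected = significant_lags(ar1, 12)
```

For an AR(1) process the PACF cuts off after lag 1, so lag 1 is reliably selected; any other lag in `selected` reflects the 5% false-positive rate of the test.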

Performance evaluation metrics
The predictive performance of the proposed and other models (i.e., CANFIS, MLPNN, and MLR) was examined by using several performance evaluation metrics, namely the RMSE (root mean square error), NSE (Nash-Sutcliffe efficiency), COC (coefficient of correlation), and WI (Willmott index) [42], and by visual inspection through the scatter plot and Taylor diagram [43]. Their mathematical expressions can be written as:

PLOS ONE
Drought index prediction using advanced fuzzy logic model

Table 2. Output-input relationship of SPI for prediction using CANFIS, MLPNN and MLR models at study stations.

Output: SPI-24; inputs: SPI-24 t-1, SPI-24 t-2, SPI-24 t-3, SPI-24 t-12
https://doi.org/10.1371/journal.pone.0233280.t002

1. Root mean square error:
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(SPI_{cal,i} - SPI_{pre,i}\right)^2} \tag{16}$$
2. Nash-Sutcliffe efficiency [46]:
$$NSE = 1 - \frac{\sum_{i=1}^{N} \left(SPI_{cal,i} - SPI_{pre,i}\right)^2}{\sum_{i=1}^{N} \left(SPI_{cal,i} - \overline{SPI}_{cal}\right)^2} \tag{17}$$
3. Coefficient of correlation [47,48]:
$$COC = \frac{\sum_{i=1}^{N} \left(SPI_{cal,i} - \overline{SPI}_{cal}\right)\left(SPI_{pre,i} - \overline{SPI}_{pre}\right)}{\sqrt{\sum_{i=1}^{N} \left(SPI_{cal,i} - \overline{SPI}_{cal}\right)^2 \sum_{i=1}^{N} \left(SPI_{pre,i} - \overline{SPI}_{pre}\right)^2}} \tag{18}$$
4. Willmott index [49,50]:
$$WI = 1 - \frac{\sum_{i=1}^{N} \left(SPI_{cal,i} - SPI_{pre,i}\right)^2}{\sum_{i=1}^{N} \left(\left|SPI_{pre,i} - \overline{SPI}_{cal}\right| + \left|SPI_{cal,i} - \overline{SPI}_{cal}\right|\right)^2} \tag{19}$$
where $SPI_{pre,i}$ and $SPI_{cal,i}$ are the predicted and calculated multi-time scale SPI values for the ith observation, $\overline{SPI}_{pre}$ and $\overline{SPI}_{cal}$ are the averages of the predicted and calculated multi-time scale SPI values, $\left|SPI_{pre,i} - \overline{SPI}_{cal}\right|$ represents the absolute difference between the predicted values and the mean of the calculated values, $\left|SPI_{cal,i} - \overline{SPI}_{cal}\right|$ represents the absolute difference between the calculated values and their mean, and N is the total number of observations in a dataset.
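The four metrics can be computed directly from paired calculated and predicted SPI values. The following Python sketch uses the standard definitions of RMSE, NSE, the Pearson correlation coefficient, and the Willmott index of agreement; the function names and the three-point example are hypothetical.

```python
import math

def rmse(cal, pre):
    """Root mean square error between calculated and predicted values."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(cal, pre)) / len(cal))

def nse(cal, pre):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean of cal."""
    mean_cal = sum(cal) / len(cal)
    num = sum((c - p) ** 2 for c, p in zip(cal, pre))
    den = sum((c - mean_cal) ** 2 for c in cal)
    return 1 - num / den

def coc(cal, pre):
    """Pearson coefficient of correlation."""
    mean_cal = sum(cal) / len(cal)
    mean_pre = sum(pre) / len(pre)
    cov = sum((c - mean_cal) * (p - mean_pre) for c, p in zip(cal, pre))
    var_cal = sum((c - mean_cal) ** 2 for c in cal)
    var_pre = sum((p - mean_pre) ** 2 for p in pre)
    return cov / math.sqrt(var_cal * var_pre)

def willmott(cal, pre):
    """Willmott index of agreement, bounded between 0 and 1."""
    mean_cal = sum(cal) / len(cal)
    num = sum((c - p) ** 2 for c, p in zip(cal, pre))
    den = sum((abs(p - mean_cal) + abs(c - mean_cal)) ** 2 for c, p in zip(cal, pre))
    return 1 - num / den

# Tiny invented example
cal = [1.0, 2.0, 3.0]
pre = [1.0, 2.0, 4.0]
scores = (rmse(cal, pre), nse(cal, pre), coc(cal, pre), willmott(cal, pre))
```

A model is preferred when its RMSE is lower and its NSE, COC, and WI are closer to 1 over the testing period.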

Results of application and discussion
The SPI was computed at multiple time scales (1, 3, 6, 9, 12, and 24 months) for meteorological drought (MD) prediction in the Kumaon region by applying a relatively new AI model called CANFIS. Two predictive models (i.e., MLPNN and MLR) were established for validation. Six meteorological stations (Almora, Bageshwar, Champawat, Nainital, Pithoragarh, and Pantnagar) were used for modeling. Optimal inputs (lags) were selected through the PACF at the 5% SL for all SPI scales. The models were then evaluated statistically and graphically. The model with the lowest error (RMSE) and the highest goodness-of-fit values (NSE, COC, and WI) over the testing phase was recognized as the better model for MD prediction over the study area. The MD prediction results of the applied models are discussed below. The MD condition was predicted by assessing the suitability of the CANFIS, MLPNN and MLR models for all SPI scales at the six study stations. All the formulated models were trained with 70% of the dataset, and the remaining 30% was used for testing. Tables 4, 5 -24), and Pithoragarh (SPI-12, and SPI-24). Figs 10a-10f to 15a-15f show scatter plots of the predicted versus calculated multi-time scale SPI values generated by the applied models (i.e., CANFIS, MLPNN and MLR) during the testing phase at the six study stations. As seen from these figures, the estimates of the CANFIS model lie close to the 1:1 (best fit) line for SPI-1, SPI-3, SPI-6, SPI-9, and SPI-12 at Almora and Champawat stations, for all SPI scales at Bageshwar and Pantnagar stations, for SPI-1, SPI-3, SPI-6, SPI-9, and SPI-24 at Nainital station, and for SPI-6 and SPI-9 at Pithoragarh station. These figures also show a pattern of results similar to that reported in Tables 4 to 6.
The Taylor diagram [43] was used to map the pattern of the calculated (reference field) versus predicted (test field) multi-time scale SPI values produced by the applied models (i.e., CANFIS, MLPNN, and MLR) during the testing phase over the study region. The Taylor diagram is a two-dimensional graphical presentation that combines the RMSE, correlation coefficient, and standard deviation in a single polar plot, as shown in Figs 16a-16f to 21a-21f. These figures show that the CANFIS, MLPNN and MLR models have a pattern of results similar to that observed in Tables 4 to 6 and Figs 10a-10f to 15a-15f. Therefore, it is suggested that the applied models with optimal lags can predict the multi-time scale SPI effectively at the six study stations.
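The three statistics can share one polar plot because they are linked by a law-of-cosines relation: in Taylor's formulation, the centered RMS difference E' satisfies E'^2 = σ_r^2 + σ_t^2 - 2 σ_r σ_t R, where σ_r and σ_t are the standard deviations of the reference and test fields and R is their correlation. The Python sketch below computes these quantities for an invented pair of series; note that E' is the centered RMS difference (means removed), not the raw RMSE, and the function name is hypothetical.

```python
import math

def taylor_stats(reference, test):
    """Return (sd_reference, sd_test, correlation, centered RMS difference)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_t = sum(test) / n
    sd_r = math.sqrt(sum((v - mean_r) ** 2 for v in reference) / n)
    sd_t = math.sqrt(sum((v - mean_t) ** 2 for v in test) / n)
    cov = sum((a - mean_r) * (b - mean_t) for a, b in zip(reference, test)) / n
    corr = cov / (sd_r * sd_t)
    crmsd = math.sqrt(
        sum(((b - mean_t) - (a - mean_r)) ** 2 for a, b in zip(reference, test)) / n
    )
    return sd_r, sd_t, corr, crmsd

# Invented calculated (reference) and predicted (test) SPI values
calculated = [0.1, -0.5, 1.2, 0.7, -1.1]
predicted = [0.2, -0.3, 0.9, 0.8, -0.7]
sd_r, sd_t, corr, crmsd = taylor_stats(calculated, predicted)
```

On the diagram, a model plots closer to the reference point when its correlation is high and its standard deviation is close to that of the calculated series.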
The viability of a relatively new artificial intelligence model called CANFIS was assessed for predicting the MD at six stations (Almora, Bageshwar, Champawat, Nainital, Pithoragarh, and Pantnagar) based on the multi-scalar standardized precipitation index (SPI). The input variables were selected through statistical analysis (i.e., ACF and PACF) of the most correlated lags to predict the SPI values at multiple scales. In terms of prediction accuracy, the proposed CANFIS model distinguished itself from the competing MLPNN and MLR models. The MD prediction by the CANFIS model over the study stations displays the potential of the model (Table 7). It mimicked the actual trend of the SPI in this particular region and demonstrated an intelligent system that can be valuable to water resources managers and policymakers for drought mitigation.
The results of the proposed model were compared with and validated against nature-inspired algorithms and stochastic (time-series) models built on numerous drought indices (DIs). For instance, there are studies conducted on SPI prediction using various versions of AI models [40,[51][52][53][54][55]. Memarian et al. [56] applied the CANFIS model to predict the meteorological drought in Birjand, Iran using global climatic indicators and lagged values of SPI. They found a better predictive capability of the CANFIS model in the study region. Fung et al. [57] forecasted meteorological drought in the Langat River basin, Malaysia using hybrid wavelet models integrated with boosting-SVR (W-B-SVR), multi-input-fuzzy-SVR (W-MI-F-SVR), and weighted-fuzzy-SVR (W-WF-SVR) based on 1-, 3-, and 6-month SPEI. The results revealed that the W-WF-SVR model gave the best multi-scale SPEI forecasts. Kisi et al. [58] examined the potential of hybrid ANFIS-PSO (particle swarm optimization), ANFIS-GA (genetic algorithm), ANFIS-ACO (ant colony optimizer), and ANFIS-BOA (butterfly optimization algorithm) models against the classical ANFIS to forecast meteorological drought at three synoptic stations located in Iran, based on the multi-scalar SPI. They found that the hybrid ANFIS models performed better in forecasting SPI-3, SPI-6, SPI-9, and SPI-12 at the study stations.
The reported literature demonstrates the capability of ML models in meteorological drought prediction. The overall finding of this research suggests that the AI models (i.e., CANFIS and MLPNN) achieved better meteorological drought forecasting at different time scales at the considered stations. As a future research direction, sensitivity analysis can be conducted for the data, input variables and models to investigate the potential sources influencing the modeling performance.

Conclusion
This research implements a relatively new AI model (i.e., CANFIS) to predict meteorological drought using multiple SPI scales at the Almora, Bageshwar, Champawat, Nainital, Pithoragarh and Pantnagar stations located in the Kumaon region of Uttarakhand State, India. The results yielded by the CANFIS model were compared against those of the MLPNN and MLR models for each study station through performance evaluation indicators (RMSE, NSE, COC, and WI) and visual inspection (i.e., scatter plot and Taylor diagram). According to the comparison, the CANFIS model (with Gaussian MFs, the TSK fuzzy model, the Tanh activation function, and the D-B-D learning algorithm) performed best at Almora and Champawat stations (SPI-1, SPI-3, SPI-6, SPI-9, and SPI-12), at Bageshwar and Pantnagar stations (all SPI scales), at Nainital station (SPI-1, SPI-3, SPI-6, SPI-9, and SPI-24), and at Pithoragarh station (SPI-6 and SPI-9). The MLPNN model achieved the best prediction for SPI-24 (6-13-1) at Almora station, for SPI-12 (6-13-1) at Nainital station, and for SPI-1 (6-8-1) and SPI-3 (9-18-1) at Pithoragarh station. The MLR model gave the poorest predictions at all stations and SPI scales, except SPI-24 at Champawat station and SPI-12 and SPI-24 at Pithoragarh station. Therefore, this study demonstrates the utility of the machine learning models CANFIS and MLPNN for accurate prediction of the current SPI based on antecedent values. Furthermore, MD prediction from multi-time scale SPI observations by machine learning models will help hydrologists, agriculturists, water managers, and policymakers to plan drought mitigation strategies for sustainable planning and management of water resources in the study region.