Abstract
Accurate water demand forecasts are crucial to the strategic planning and judicious use of a region’s finite water resources, and they underpin sustainable socio-economic development. This study compares the applicability of several artificial intelligence models for long-term water demand forecasting across different water use sectors. Using the Tuojiang River basin in Sichuan Province as a case study, we compared the performance of five artificial intelligence models: Genetic Algorithm optimized Back Propagation Neural Network (GA-BP), Extreme Learning Machine (ELM), Gaussian Process Regression (GPR), Support Vector Regression (SVR), and Random Forest (RF). These models were used to predict water demand in the agricultural, industrial, domestic, and ecological sectors from actual water demand data and relevant influencing factors for 2005 to 2020. Model performance was evaluated using the coefficient of determination (R2), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE), and the best-performing model for each sector was used to project 2025 water demand in the study area. The GPR model gave the best results for the agricultural, domestic, and ecological sectors, attaining test-set R2 values of 0.9811, 0.9338, and 0.9142, respectively. The GA-BP model performed best for industrial water demand, with an R2 of 0.8580. The identified optimal models provide a useful tool for future long-term water demand forecasting and support sustainable water resource management.
Citation: Shu J, Xia X, Han S, He Z, Pan K, Liu B (2024) Long-term water demand forecasting using artificial intelligence models in the Tuojiang River basin, China. PLoS ONE 19(5): e0302558. https://doi.org/10.1371/journal.pone.0302558
Editor: A. L. Mahfoodh, UNITEN: Universiti Tenaga Nasional, MALAYSIA
Received: September 10, 2023; Accepted: April 4, 2024; Published: May 22, 2024
Copyright: © 2024 Shu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by Sichuan Science and Technology Program (2023NSFSC0807), Opening Fund of Sichuan Mineral Resources Research Center (SCKCZY2022-YB017, SCKCZY2022-YB018) and the General Program of Sichuan Center for Disaster Economy Research (ZHJJ2022-YB002). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
As a result of the converging forces of global climate change, rapid economic progression, and accelerated urbanization, numerous countries worldwide are grappling with water scarcity issues [1]. In response, the accurate forecasting of future water demand becomes crucial, equipping regional managers to evaluate prospective supply-demand conditions and devise targeted water management approaches to maximize the long-term benefit of water resources [2]. Typically, water demand forecasts are divided into long-term (forecast intervals exceeding two years), medium-term (intervals between three months and two years), and short-term (periods less than three months) [3]. Pertinently, long-term forecasting facilitates the formulation of effective policies and strategies for the operation and management of water supply systems and the identification of beneficial water conservation measures [4]. Factors influencing water demand forecasting are multifaceted, encompassing economic conditions, policy directives, and residential habits. In the context of long-term forecasting, consideration of an expanded set of influences is necessary, potentially inclusive of macroscopic information such as regional economy, climate, and population [5–7]. Historically, while water demand predictions often focused on total regional or seasonal water use [8, 9], they overlooked the significant sectorial variation in water usage.
The agricultural, industrial, domestic, ecological and other sectors are subject to different influences on water demand, and the exogenous variables of each sector (urban population, industrial structure, cropping structure, etc.) are themselves shaped by human planning and activity, which makes water demand forecasting a complex system with high uncertainty and ambiguity [9, 10]. Methodologies for studying this complex system can be divided into three categories: traditional qualitative methods, univariate time series models based on statistical methods, and multivariate models based on artificial intelligence [11, 12]. While traditional methods such as exponential forecasting [13], quota methods [14] and trend forecasting [15] can meet some demand forecasting needs, they usually require detailed measurement data and expert knowledge to update model parameters and structure [16]. Traditional statistical models such as the ARIMA model [17] obtain better forecasting results from the relationship between historical and future values of a single factor, and they are simple in principle and easy to implement. However, such methods are suited to short-term forecasting on a weekly, daily or even hourly basis; they require a highly stationary time series and cannot handle time series with non-linear characteristics [16]. In contrast, data-driven artificial intelligence algorithms such as artificial neural networks [18], support vector machines [19], system dynamics [20] and Kalman filtering [21] are able to explore the logical relationships within the data, with the advantages of high prediction accuracy and fast computing speed. BP neural networks are one of the relatively mature network structures among ANNs. Wu [22] conducted an empirical study on water demand forecasting in Taiyuan City and found that the PCA-BP model was superior in forecasting accuracy to models such as ARIMA, Grey-Markov, and serial regression.
Despite its relatively recent introduction, ELM has also been applied in various forecasting fields. Deo and Şahin [23] showed that an ELM model developed for predicting the monthly effective drought index significantly outperformed the ANN model. Adnan et al. [24] successfully applied ELM combined with SAMOA, PSOGWO and other meta-heuristic optimization algorithms to monthly flow prediction from local hydrometeorological data. However, ELM models are less commonly used in water demand forecasting [25]. Because the data available for water demand forecasting are inherently non-linear and non-stationary, methods that demand larger sample sizes often struggle to capture the intricate non-linear mapping between water demand and its influencing factors when samples are limited [26]. SVR methods, grounded in the principle of structural risk minimization rather than empirical risk minimization [27], resist over-learning when trained on small samples and therefore generalize robustly. It has been shown that appropriate model parameter settings can significantly enhance SVR prediction accuracy [28]. Nevertheless, both the ANN and SVR models above function as black box models: they can predict target variable values from the data, but they cannot illuminate the underlying rules or patterns within the model [29]. In contrast, decision tree algorithms, including Random Forest (RF), offer an alternative to black box models by providing a systematic graphical representation of explanatory variables and their critical values. This approach can effectively distinguish sub-populations with differing behaviors in the target variable (in this case, water demand). RF, in particular, is a representative example of such algorithms and has proven effective in predicting water demand from high-dimensional data [30].
Furthermore, researchers have recently shown increasing interest in probabilistic forecasting methods [31, 32]. Gaussian Process Regression (GPR), a probabilistic forecasting algorithm emerging from statistical learning and Bayesian theory, offers robust generalization for modelling interval predictions, handling missing and anomalous data, and dealing with both high-dimensional and small-sample problems [33].
Despite the validation of each of the aforementioned methodological models as reliable for water demand forecasting, it is important to note that no single model is universally superior to all others in all cases. It remains necessary to examine each region independently to assess the merits of each model or combination of methods [26]. Therefore, this study aims to compare the applicability of five selected models (GA-BP, ELM, GPR, SVR and RF) for predicting water demand in different water use sectors. The aim is to identify the most appropriate forecasting models for each water use type, taking into account its inherent characteristics and the non-linear relationship between influencing factors and water demand. This will facilitate reliable water demand forecasting for 2025 and provide a basis for regional water resource management. The Tuojiang River Basin in Sichuan Province serves as the study area for this paper, with key influencing factors extracted for four water use sectors: agriculture, industry, domestic and ecological. We aim to build a water demand forecasting model based on the above five AI techniques. The optimal model for each water use sector will be selected using RMSE, MAPE and R2 as evaluation indicators. These will be used to complete the water demand forecasts for the study area in 2025. The paper is organised as follows: Section 1 provides an overview of the current state of research in water demand forecasting. Section 2 provides an overview of the study area. Section 3 establishes water demand impact indicators and briefly describes the underlying principles of the methodology used in this study. Section 4 is the results. Section 5 provides a series of discussions. Section 6 presents the conclusions.
Study area
The Tuojiang River (Fig 1), situated in southwest China, is a first-order tributary of the Yangtze River, spanning a length of 627.4 km with a basin area of 27 km2. The main regions through which the Tuojiang River flows include Deyang, Chengdu, Ziyang, Neijiang, Zigong and Luzhou, for which it is an important source of water. The terrain of the basin falls from high in the northwest to low in the southeast, with mountainous areas in the north, plains and low hills in the center, and hills in the south. The region’s favorable topographic and climatic conditions support a robust industry and high levels of socio-economic activity, resulting in significant water demand. At the same time, the relatively complex natural environment fosters variability in the socio-economic development and industrial structures of the different regions, making the Tuojiang River basin an ideal study area for water demand forecasting.
These maps, with approval numbers GS(2023)2765 (http://bzdt.ch.mnr.gov.cn/browse.html?picId=%224o28b0625501ad13015501ad2bfc2190%22) and Sichuan S(2021)00056 (https://scsm.mnr.gov.cn/StandMaps/mapDetails.html), can be accessed free of charge from the agency’s public website.
Data and methodology
This research compared the applicability of two types of methods for water demand forecasting in different water use sectors: artificial neural network models (GA-BP, ELM) and nonlinear regression models (GPR, SVR, RF). Fig 2 illustrates the methodology applied in this study. First, data preparation for the study area included establishing the indicator system and collecting data for the input and output variables, followed by data cleaning, normalization, and division into an 80% training set and a 20% testing set. Second, Matlab was used to build and simulate the five models for each water use sector, yielding predicted values for the training and testing sets and a visualization of the fit. Model performance was then evaluated using R2, RMSE, and MAPE, leading to the selection of an optimal model for each water use sector. Finally, an exponential smoothing model, a traditional time series forecasting method, was used to predict the 2025 values of the influencing factors, which served as input variables for the optimal models. This allowed the prediction of the 2025 agricultural, industrial, domestic, and ecological water demands for the study area.
Data collection and indicators extraction
The data used in this research comprise annual water demand for the agricultural, industrial, domestic, and ecological sectors across six regions in the Tuojiang River Basin for the period 2005 to 2020. The full set of data collected can be found in the first worksheet of the S1 Table. Fig 3 illustrates the variability in water demand over this period for the entire basin, together with the structure of water consumption. The basin’s total water demand fluctuated only slightly between 2005 and 2015, at roughly 96×10⁸ m3, surged from 2016 to 2018, and then declined in 2019 and 2020. In terms of composition, agriculture accounts for the largest and relatively steady share, approximately 60%, followed by industrial, domestic, and ecological uses. Industrial water use decreases year on year, while domestic water use shows a pronounced upward trend. The proportion of ecological water use remains within a narrow range over the years.
Secondary indicators for forecasting water demand in the four sectors—agricultural, industrial, domestic, and ecological—were derived from previous research findings as presented in Table 1. Descriptive statistics of all indicators are shown in Table 2. Upon data collection, further relevant analysis was carried out, as detailed in the RESULTS section.
Predictive models
Because of space limitations, this section briefly describes only the principle and theoretical background of GA-BP. The remaining methods (ELM, GPR, SVR, RF) are well known, and references are provided for interested readers: ELM [42], GPR [33], SVR [43], RF [44].
The BP neural network is a class of neural networks in which the input sample data are processed forward and the output error is propagated backward. A BP neural network acquires a priori knowledge through a search procedure that targets the ideal set of connection weights and threshold values for its neurons [45]. In GA-BP, the updating of connection weights is guided by a genetic algorithm (GA), a global optimization algorithm grounded in the principles of evolution and natural selection, which searches for a set of BP weights that gives optimal or near-optimal performance of the network structure [46]. The algorithmic flowchart of GA-BP is shown in Fig 4.
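The two-stage idea described above can be illustrated with a minimal sketch. It assumes a common GA-BP arrangement in which the GA first searches for a good set of weights and thresholds and back propagation then fine-tunes them; the network size, GA operators (truncation selection, single-point crossover, Gaussian mutation), learning rate, and toy data are all illustrative assumptions, not the settings used in this study.

```python
import math
import random

random.seed(0)  # reproducible toy run

# Hypothetical toy data (not the study's data): y = x1 * x2 on a 5x5 grid.
X = [(a / 4, b / 4) for a in range(5) for b in range(5)]
Y = [x1 * x2 for x1, x2 in X]

N_IN, N_HID = 2, 3
OUT = N_HID * (N_IN + 1)      # index where output-layer weights start
N_W = OUT + N_HID + 1         # all weights and thresholds in one chromosome


def forward(w, x):
    """One tanh hidden layer and a linear output; w is a flat weight list."""
    hidden = []
    for j in range(N_HID):
        base = j * (N_IN + 1)
        s = w[base + N_IN] + sum(w[base + i] * x[i] for i in range(N_IN))
        hidden.append(math.tanh(s))
    y = w[OUT + N_HID] + sum(w[OUT + j] * hidden[j] for j in range(N_HID))
    return y, hidden


def mse(w):
    return sum((forward(w, x)[0] - y) ** 2 for x, y in zip(X, Y)) / len(X)


def ga_search(pop_size=30, generations=40, mut_rate=0.1):
    """GA stage: evolve chromosomes (weight vectors) toward low MSE."""
    pop = [[random.uniform(-1, 1) for _ in range(N_W)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=mse)
        history.append(mse(pop[0]))            # best fitness this generation
        elite = pop[: pop_size // 2]           # truncation selection, elitist
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = random.sample(elite, 2)
            cut = random.randrange(1, N_W)     # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [g + random.gauss(0, 0.3) if random.random() < mut_rate else g
                     for g in child]           # Gaussian mutation
            children.append(child)
        pop = elite + children
    return min(pop, key=mse), history


def bp_finetune(w, epochs=200, lr=0.05):
    """BP stage: stochastic gradient descent starting from the GA's weights."""
    w = list(w)
    best_w, best_loss = list(w), mse(w)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred, hidden = forward(w, x)
            err = pred - y
            # Hidden-layer error terms use the output weights before their update.
            delta = [err * w[OUT + j] * (1 - hidden[j] ** 2) for j in range(N_HID)]
            for j in range(N_HID):
                w[OUT + j] -= lr * err * hidden[j]
                base = j * (N_IN + 1)
                for i in range(N_IN):
                    w[base + i] -= lr * delta[j] * x[i]
                w[base + N_IN] -= lr * delta[j]
            w[OUT + N_HID] -= lr * err
        loss = mse(w)
        if loss < best_loss:                   # keep the best weights seen
            best_w, best_loss = list(w), loss
    return best_w


w_ga, history = ga_search()
w_bp = bp_finetune(w_ga)
```

Because the GA keeps its elite individuals, the best fitness in `history` cannot get worse from one generation to the next; the BP stage then refines `w_ga` locally, which is the division of labor between global and local search that motivates the hybrid.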
Model performance and evaluation
To evaluate the accuracy of the prediction model, statistical indices such as R2, RMSE, and MAPE are employed [13], with their respective calculation formulae provided below:
\[ R^2 = 1 - \frac{\sum_{k=1}^{n} (y_k - \hat{y}_k)^2}{\sum_{k=1}^{n} (y_k - \bar{y})^2} \tag{1} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} (y_k - \hat{y}_k)^2} \tag{2} \]
\[ \mathrm{MAPE} = \frac{100\%}{n} \sum_{k=1}^{n} \left| \frac{y_k - \hat{y}_k}{y_k} \right| \tag{3} \]
where $y_k$ represents the true value of the dependent variable, $\bar{y}$ represents the sample mean, $\hat{y}_k$ represents the predicted value, and $n$ represents the number of samples. MAPE employs percentages to gauge the magnitude of deviations, offering ease of understanding and interpretation, and is less susceptible to extreme values. RMSE measures the discrepancy between predicted and true values. It is sensitive to outliers in the data and needs to be interpreted relative to the magnitude of the true values. R2 indicates the goodness of fit; the closer the value is to 1, the greater the explanatory power of the independent variables for the dependent variable.
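As a minimal sketch, the three indicators can be computed as follows, assuming their standard definitions; the function names are illustrative and not taken from the study’s Matlab code.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target variable."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; true values must be non-zero."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)
```

For true values `[1, 2, 3]` and a constant prediction of `2`, R2 is 0 (the model is no better than the sample mean), which shows why R2 should be read together with RMSE and MAPE.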
Results
This section presents the results of five distinct models—GA-BP, ELM, GPR, SVR, and RF—employed for forecasting water demand across four different water use sectors. Detailed numerical results can be found in the second through sixth worksheets of the S1 Table. Each model’s performance was assessed using statistical parameters. In each method, 80% of the historical annual water demand data was allocated to the training set, with the remaining 20% assigned to the test set. The gathered data was segmented into four categories: agricultural, industrial, domestic, and ecological. Each category consisted of 16 years of water demand data from six regions, yielding a total of 96 data points. Out of these, 75 data points were used for the training phase, and 21 were used for testing. Prior to model construction, Pearson correlations between the input and output sets were taken into account.
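The partition described above can be sketched as follows. The text does not state how samples were assigned to each set or which normalization was applied, so the seeded random shuffle and the min-max scaling here are assumptions made for illustration only.

```python
import random


def min_max_normalize(column):
    """Scale one indicator column to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def train_test_split(n_samples, n_train, seed=0):
    """Return disjoint, sorted index lists for the training and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)   # assumed: random assignment
    return sorted(idx[:n_train]), sorted(idx[n_train:])


# 6 regions x 16 years = 96 samples per sector; 75 for training, 21 for testing.
train_idx, test_idx = train_test_split(96, 75)
```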
Data preprocessing
Before the normalized data are passed to the models for training and testing, the data must be cleaned to remove outliers. Outliers may lead to overfitting; however, because the level of socio-economic development varies widely among the six cities within the Tuojiang River Basin, the dataset very likely contains extreme values that carry important information. To avoid removing such legitimate extreme values, this study combined three outlier detection methods (Isolation Forest, Elliptic Envelope, and One-Class SVM), used a voting scheme to determine the final outliers, and replaced the outliers with the mean value. The boxplots in Fig 5 give an overview of the data distribution before and after outlier processing.
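The voting step can be sketched as below. To keep the example dependency-free, three simple univariate detectors (z-score, interquartile range, and median absolute deviation) stand in for Isolation Forest, Elliptic Envelope, and One-Class SVM; the majority rule (at least two of three votes) and the use of the inlier mean as the replacement value are also assumptions, since the text does not specify them.

```python
import statistics


def zscore_flags(values, threshold=3.0):
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    return [sd > 0 and abs(v - mean) / sd > threshold for v in values]


def iqr_flags(values, k=1.5):
    q1, _, q3 = statistics.quantiles(values, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < lo or v > hi for v in values]


def mad_flags(values, threshold=3.5):
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [mad > 0 and 0.6745 * abs(v - med) / mad > threshold for v in values]


def vote_and_replace(values, detectors, min_votes=2):
    """Flag a point only if at least min_votes detectors agree, then
    replace flagged points with the mean of the remaining points."""
    votes = [sum(flags) for flags in zip(*(d(values) for d in detectors))]
    is_outlier = [v >= min_votes for v in votes]
    inliers = [v for v, out in zip(values, is_outlier) if not out]
    fill = statistics.fmean(inliers)
    cleaned = [fill if out else v for v, out in zip(values, is_outlier)]
    return cleaned, is_outlier


# Hypothetical indicator values with one suspicious reading at the end.
sample = [10, 11, 9, 10, 12, 11, 10, 9, 11, 100]
cleaned, flagged = vote_and_replace(sample, [zscore_flags, iqr_flags, mad_flags])
```

Requiring agreement between detectors is what protects legitimate extremes: a value flagged by only one method is kept, while a value flagged by the majority is treated as an outlier.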
Factors affecting water demand
Correlation analysis was performed on the data to understand the relationship between the different input variables and the dependent variable. Pearson correlation analysis was performed on agricultural, industrial, domestic, and ecological water demand and the 4–5 influencing factors selected for each. Scatter matrix plots of all the indicator data were drawn to further understand the distribution of each indicator and its relationship with water demand; the resulting correlation heat map is shown in Fig 6.
The results indicate that, for agriculture, the area of irrigated arable land (A2) has the most significant impact (0.9384) and is positively correlated with water demand. This result, along with agricultural EVA (A1) and the area of total crop sown (A3), each with Pearson coefficients above 0.7, confirms that these variables are the primary factors influencing agricultural water demand. This finding is consistent with the reality that irrigation accounts for a substantial proportion of water usage in agriculture. Industrial water use is positively correlated with the number of industrial enterprises above a designated size (I4) (0.8762), industrial EVA (I2) (0.6737), and the operating income of industrial enterprises above designated size (I3) (0.5986), in descending order of correlation. Given that large-scale industrial enterprises typically require substantial water resources for their production processes, it is reasonable to use these statistical indicators as input variables for predicting industrial water demand. In addition, the negative correlation observed between the percentages of primary and secondary industries (A4, I1) and local agricultural and industrial water demand can be attributed to the pronounced regional differences in agricultural and industrial water demands. The industrial share statistics are based on the local industrial structure, and sample points with higher water demand but smaller industrial shares lead to a negative Pearson correlation coefficient. However, the absolute values of these correlation coefficients suggest that both input parameters maintain a degree of correlation with their dependent variables and remain beneficial when used as input parameters for artificial intelligence models.
In terms of domestic water use, population density (D2) (0.7777), a statistical indicator of the spatial distribution of the population, shows a significant positive correlation with the dependent variable. Furthermore, indicators linked to quality of life, such as per capita GDP (D1) (0.6859), urban per capita disposable income (D4) (0.467) and rural per capita disposable income (D5) (0.5255), demonstrate a substantial positive correlation. Conversely, the urbanization rate (D3) (-0.146), a measure of urbanization level, is negatively correlated with domestic water use. For ecological water demand, the area of urban green space (E2) (0.9230) and the area of road sweeping and cleaning (E3) (0.9074), both of which require human intervention to supply water, are highly correlated with the dependent variable. In contrast, runoff depth (E1), rainfall (E4) and average temperature (E5), meteorological factors with greater uncertainty, influence ecological water demand indirectly by affecting the amount of ecological water available, evapotranspiration, and the living conditions of the ecological environment. This is reflected in the relatively weak correlation between these three input variables and the dependent variable. In summary, most of the input variables extracted for this study show a significant correlation with the output variables. Although some factors have a relatively slight effect on the target variables, all parameters were used as inputs because the extracted indicators are low-dimensional and do not increase the computational complexity of the model.
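For reference, the Pearson coefficient used throughout this analysis can be computed as in the following minimal sketch; the function name and the toy numbers in the usage note are illustrative and are not values from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

A call such as `pearson_r(irrigated_area, agricultural_demand)` returns a value in [-1, 1]; pooling samples from regions whose demand and industrial share differ markedly can yield a negative coefficient even when the relationship within a single region is not negative, which is the effect described for A4 and I1.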
Comparison of model performance
In this study, Matlab 2020a was used to construct the models. In the training stage, the mean squared error was adopted as the fitness measure for updating the algorithms, and the main parameters of each algorithm were set as shown in Table 3.
Agricultural factors
The fitting results for agricultural water demand prediction from each approach are shown in Fig 7. The optimal model is then selected by quantitative evaluation using the established evaluation indicators (Table 4).
Comparison of predicted and actual values based on (a)GA-BP; (b)GPR; (c)SVR; (d)ELM; (e)RF for agricultural water demand.
Fig 7 illustrates the deviation between the predicted and actual water demand in both the training and test sets. A closer proximity of data points to the diagonal line signifies a better fit from the model. Each of the five models yields a Pearson correlation coefficient greater than 0.9, signifying a robust correlation between the predicted and actual values. Of these, the GPR model exhibits a stronger concentration of data points on the diagonal for both the training and test sets, and its Pearson correlation coefficient of 0.9948 is the highest among the five models.
Table 4 compares the R2, RMSE, and MAPE values of the five models; bold values mark the best model under each evaluation indicator. All three indicators show that the GPR model performs best in both the training and testing stages. Combined with the insights from Fig 7, this indicates that the GPR model outperforms the other four models in predicting agricultural water demand.
Industrial factors
The fitting results of various models for the prediction of industrial water demand are illustrated in Fig 8. The optimal model is selected by further quantitative evaluation using the evaluation indicators (Table 5).
Comparison of predicted and actual values based on (a)GA-BP; (b)GPR; (c)SVR; (d)ELM; (e)RF for industrial water demand.
The principle of Fig 8 is the same as that of section 4.2. Despite the GPR model having the highest Pearson coefficient at 0.9772, several data points from the testing set substantially deviate from the diagonal. The data point distribution in the GA-BP and ELM models is similar, with Pearson coefficients around 0.96, but the GA-BP model shows a higher concentration of data points for smaller values.
From Table 5 it can be seen that the GPR model performs best in the training stage, with an R2 of 0.9809. However, in the testing stage, the R2 for the GPR model drops to 0.8074, which is lower than that of the GA-BP model. Considering the performance of each model in both training and testing, the GA-BP model exhibits greater stability. In conjunction with the model performance displayed in Fig 8, the GA-BP model is determined to be the best for predicting industrial water demand.
Domestic factors
The fitting results of the various models for domestic water demand prediction are shown in Fig 9. The optimal model is then selected by quantitative evaluation using the designated evaluation indicators (Table 6).
Comparison of predicted and actual values based on (a)GA-BP; (b)GPR; (c)SVR; (d)ELM; (e)RF for domestic water demand.
Fig 9 shows the same principle as discussed in section 4.2. The distribution of data points relative to the diagonal shows that, for data points with smaller actual values, all models except SVR fit the actual values well in both the training and testing phases. For the larger values in the upper right of the distribution, however, the data points of every model except GPR deviate more from the diagonal, especially in the test set. The GPR model also has the largest Pearson coefficient.
As seen in Table 6, the three evaluation indices indicate the optimal performance of the GPR model, both in the training and testing phases. Coupling this with insights from Fig 9, it’s discernible that the GPR model outshines the other models in predicting domestic water demand.
Ecological factors
Fig 10 illustrates the fitting results of the various models used to predict ecological water demand. The optimal model is then selected by quantitative evaluation using the evaluation indicators (Table 7).
Comparison of predicted and actual values based on (a)GA-BP; (b)GPR; (c)SVR; (d)ELM; (e)RF for ecological water demand.
Fig 10 shows the same principle as section 4.2. The GPR model has the largest Pearson correlation coefficient (0.9678), followed by RF, GA-BP, SVR and ELM. Judging from the distribution of data points, the SVR model clearly fits worst, with many predicted values far below the actual values.
From Table 7 it can be seen that the GPR model clearly outperforms the others during the training phase. Even though the GPR model’s MAPE value is not the smallest in the testing stage, both the R2 and RMSE values affirm the model’s reliability and accuracy for the test set. This, coupled with the model performance portrayed in Fig 10, confirms that the GPR model is the superior choice for predicting ecological water demand.
Predicting future water demand
Applying the most appropriate forecasting model identified for each water use sector to the 2025 projections for the study area provides decision-makers with reliable annual water demand estimates per location. This information is crucial for aligning water resource availability with economic development levels and enables optimal scheduling of water supply systems. The annual time series of each influencing factor was modeled in SPSS using three non-seasonal exponential smoothing models: Holt’s linear trend, Brown’s linear trend, and damped trend. The most suitable model for each influencing factor was selected according to the fit statistics R2 and stationary R2, and was then used to predict the 2025 values of the input variables in the indicator system. The previously identified water demand forecasting models for each water use sector were applied individually to determine the agricultural, industrial, domestic, and ecological water demands for the six regions in 2025. These findings are illustrated in Fig 11, with actual water demands for 2020 included for comparative trend analysis (Table 8).
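To illustrate how an influencing factor observed up to 2020 can be extrapolated to 2025, the following is a minimal sketch of Holt’s linear trend method, one of the three exponential smoothing models named above. SPSS estimates the smoothing parameters automatically; the fixed `alpha` and `beta` values, the initialization from the first two observations, and the toy series here are assumptions for illustration.

```python
def holt_linear_forecast(series, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend component,
    then extrapolate the trend `horizon` steps beyond the last observation."""
    level = series[0]
    trend = series[1] - series[0]          # assumed simple initialization
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical annual indicator for 2005-2020 (16 values), forecast 5 steps to 2025.
indicator = [100 + 2.5 * t for t in range(16)]
forecast_2021_2025 = holt_linear_forecast(indicator, alpha=0.5, beta=0.3, horizon=5)
value_2025 = forecast_2021_2025[-1]
```

For an exactly linear series the method reproduces the line, so the toy forecast simply continues the 2.5-per-year trend; on real indicators the choice among Holt, Brown, and damped trend matters most when the recent trend is unlikely to persist for five years.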
Fig 11 reveals that by 2025, Chengdu, as the provincial capital of Sichuan, will have the highest agricultural and domestic water demand in the Tuojiang River basin. The water consumption pattern in Deyang in 2025 will parallel Chengdu, with agricultural and domestic water demand being more significant. Ziyang will demonstrate a more balanced water demand across the three primary water use sectors, albeit at a lower level. Neijiang and Zigong exhibit similar socio-economic levels, with their respective GDP rankings as 10th and 11th within the province in 2022. This similarity is reflected in their comparable water demand and water consumption structures across all sectors. However, due to Neijiang’s industrial development prominence and large-scale industrial expansion, its industrial water demand will be more pronounced. Luzhou’s domestic water demand will be lower than other regions, but its agricultural water demand will be higher, and industrial water demand will align with Ziyang and Neijiang. In terms of ecology, all regions will exhibit relatively low ecological water demand, and specific value changes can be seen in Table 8.
Table 8 reveals that agriculturally, apart from Chengdu and Ziyang, all other regions will witness varying degrees of agricultural water demand growth in 2025. This highlights that the Tuojiang River basin, being a developed agricultural region in Sichuan Province, will continue prioritizing specialty agriculture. Industrial water demand will notably increase across all regions. Consequently, each region must adjust its current development model and improve industrial water-saving efficiency to maximize industrial water conservation and emission reduction. Domestically, Chengdu will experience a decline in domestic water demand as urbanization matures and residents’ water conservation awareness generally increases. In contrast, the other regions will see a significant rise in domestic water demand due to urbanization progression and increased tap water usage in rural areas. Ecologically, only Chengdu and Deyang will experience a minor decrease in ecological water demand, while all other regions will see an increase.
Discussion
As an indispensable natural resource for human survival and social development, water plays an extremely important role in sustainable socio-economic development [47]. However, increasing water scarcity under the influence of human activities and urbanization is seen as a global systemic risk [48, 49]. Water demand forecasting is an important tool for assessing the state of water security, identifying possible problems in future water resource systems, and developing targeted management measures. Previous forecasting models are mainly suited to large datasets, and accurate forecasts are difficult to obtain from small samples [8]. This paper compares the applicability of different models to various water use sectors based on short time series data and applies the best model to the 2025 water demand forecast. The results show that agricultural water demand in the Tuojiang River basin will increase by 2025, indicating an increase in the intensity of agricultural production. Over the past two decades, the Tuojiang River basin has been the top priority for comprehensive water pollution control, and agricultural and rural non-point source pollution is a major source of pollution in the basin. Therefore, as the scale of agricultural production expands, cultivating modern and efficient agriculture and improving the control of agricultural non-point source pollution should be the focus of future efforts to explore new paths for green development and transformation, and to promote ecological protection and sustainable economic and social development in the Tuojiang basin. Industrial water use efficiency is an important indicator of industrial water demand and is related not only to the technological heterogeneity of different regions, but also to the control and management of industrial water pollution [50].
In this regard, specific measures such as optimizing the industrial layout, promoting new water-saving technologies, and limiting the scale of high water-consuming and high-polluting enterprises are needed to ease the pressure that industry places on water resources. The continued rise in urbanization rates has increased urban domestic water consumption. Rapid urbanization to accommodate migration to urban centres has also increased impervious area [51], resulting in reduced groundwater recharge and increased evaporation of surface water runoff [52]. Consequently, while coping with increasing domestic water demand, decision makers should take into account the other ecological aspects of water security that arise from changes in impervious area, and should minimize the negative ecological impacts of reduced ecological water consumption.
There are two primary areas where this study could be improved: (1) The water demand prediction indicators applicable to each water use sector may vary with the geographical, climatic, economic, and social conditions of each region. If the models proposed in this paper are applied directly to other regions, the indicators may correlate poorly with water demand. To make the research results more widely applicable, more general indicators will be extracted in further work. (2) Given that water demand is closely linked to water vulnerability, future research in water forecasting will concentrate on investigating water vulnerability at smaller scales. This will increase the value of regional water demand forecasting in optimizing water operations and planning.
Conclusions
This paper employed artificial intelligence models to establish long-term, annual water demand forecasting models. The study area consists of the six regions along the Tuojiang River basin, and the output data are the actual annual water demand of each water use sector. The input data for agricultural water demand (AWD) are Agricultural EVA (A1), Area of irrigated arable land (A2), Area of total crop sown (A3), and Percentage of primary industries (A4). The input data for industrial water demand (IWD) are Percentage of secondary industries (I1), Industrial EVA (I2), Operating income of industrial enterprises above designated size (I3), and Number of industrial households above designated size (I4). The input data for domestic water demand (DWD) are Per capita GDP (D1), Population density (D2), Urbanization rate (D3), Urban per capita disposable income (D4), and Rural per capita disposable income (D5). The input data for ecological water demand (EWD) are Runoff depth (E1), Urban green space (E2), Area of road sweeping and cleaning (E3), Rainfall (E4), and Average temperature (E5). The sample for each water use sector comprises the data points from these six regions from 2005 to 2020, amounting to 96 in total. The efficacy of the five forecasting models, which include artificial neural network models (GA-BP, ELM) and regression models (GPR, SVR, RF), is evaluated by comparing the correlation between predicted and actual values in the training and test sets, as well as the values of the statistical evaluation indicators. The results reveal that the GPR model offers superior performance for agricultural, domestic, and ecological water demand forecasting, while the GA-BP model performs best for industrial water demand forecasting. The optimal model for each water use sector is applied to forecast water demand in the study area in 2025. The forecast indicates that by 2025, most regions will experience increased water demand in each sector.
These forecast results can help managers gain a clearer understanding of future water use structures and of supply and demand trends, allowing them to manage supply and storage so as to optimize distribution costs and better coordinate sustainable socio-economic development across regions.
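As an illustration of the statistical evaluation indicators used to compare the models (RMSE, MAE, and MAPE, alongside R2), the following minimal Python sketch computes them for one set of predictions. It is not the authors' code. The function names and the sample values are hypothetical, and R2 is computed here as the coefficient of determination, which is an assumption about the definition used in the study.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean Absolute Error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def r2(actual, predicted):
    """Coefficient of determination (assumed definition of R2)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Sample size stated in the text: six regions, annual data from 2005 to 2020.
n_samples = 6 * len(range(2005, 2021))

# Hypothetical test-set values, for illustration only (not data from the study).
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

print(n_samples)
print(rmse(actual, predicted), mae(actual, predicted))
print(round(mape(actual, predicted), 4), r2(actual, predicted))
```

Lower RMSE, MAE, and MAPE and a higher R2 on the test set indicate a better fit, which is the basis on which one model would be preferred over another for a given sector.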
Supporting information
S1 Table. Water demand and influencing factor data for the Tuojiang River basin from 2005 to 2020; the predicted values of all models in the training and testing stages; and the values of the influencing factors and the water demand predictions for 2025.
https://doi.org/10.1371/journal.pone.0302558.s001
(XLSX)
Acknowledgments
The authors would like to thank the two anonymous reviewers for their constructive and valuable comments, which greatly improved the quality of the manuscript.
References
- 1. Fan L., Liu G., Wang F., Ritsema C., & Gissen V. (2014) Domestic water consumption under intermittent and continuous modes of water supply. Water Resources Management, 28(3), 853–865.
- 2. Al-Zahrani M., & Abo-Monasar A. (2015). Urban residential water demand prediction based on artificial neural networks and time series models. Water Resources Management, 29(10), 3651–3662.
- 3. Billings R. B., & Jones C. V. (2011). Forecasting urban water demand. American Water Works Association.
- 4. Firat M., Turan M. E., & Yurdusev M. A. (2009). Comparative analysis of fuzzy inference systems for water consumption time series prediction. Journal of Hydrology, 374(3), 235–241.
- 5. Tiwari M., & Adamowski J. (2013). Urban water demand forecasting and uncertainty assessment using ensemble wavelet-bootstrap-neural network models. Water Resources Research, 49(10), 6486–6507.
- 6. Huang H., Zhang Z., & Song F. (2021). An ensemble-learning-based method for short-term water demand forecasting. Water Resources Management, 35(6), 1757–1773.
- 7. Ghiassi M., Fa’al F., & Abrishamchi A. (2017). Large metropolitan water demand forecasting using DAN2, FTDNN, and KNN models: A case study of the city of Tehran. Urban Water Journal, 14(6), 655–659.
- 8. Wu H., Zeng B., & Zhou M. (2017). Forecasting the Water Demand in Chongqing, China Using a Grey Prediction Model and Recommendations for the Sustainable Development of Urban Water Consumption. International Journal of Environmental Research and Public Health, 14(11), 1386. pmid:29140266
- 9. Li H., Wang X., & Guo H. (2022). Uncertain time series forecasting method for the water demand prediction in Beijing. Water Supply, 22(3), 3254–3270.
- 10. Babel M., & Shinde V. (2011). Identifying Prominent Explanatory Variables for Water Demand Prediction Using Artificial Neural Networks: A Case Study of Bangkok. Water Resources Management, 25(6), 1653–1676.
- 11. Huang H., Zhang Z., Lin Z., & Liu S. (2022). Hourly water demand forecasting using a hybrid model based on mind evolutionary algorithm. Water Supply, 22(1), 917–927.
- 12. Pandey P., Bokde N., Dongre S., & Gupta R. (2021). Hybrid Models for Water Demand Forecasting. Journal of Water Resources Planning and Management, 147(2), 1–13.
- 13. Kang H. S., Kim H., Lee J., Lee I., Kwak B. Y., & Im H. (2015). Optimization of pumping schedule based on water demand forecasting using a combined model of autoregressive integrated moving average and exponential smoothing. Water Supply, 15(1), 188–195.
- 14. Wang F. (2013). Medium and long term water demand prediction in Liaoning Province based on grey system prediction model and the quota method prediction model. Journal of Shenyang Agricultural University, 44(4), 491–494. (In Chinese)
- 15. Guo L., Huang B., Qiu J., & Huang F. (2017). Trend and regression analysis-based water demand prediction of Pearl River Delta urban agglomeration. Water Resources and Hydropower Engineering, 48(1), 23–28. (In Chinese)
- 16. Liu G., Yuan M., Chen X., Lin X., & Jiang Q. (2023). Water demand in watershed forecasting using a hybrid model based on autoregressive moving average and deep neural networks. Environmental Science and Pollution Research, 30, 11946–11958. pmid:36100789
- 17. Enbeyle W., Hamad A., Al-Obeidi A., Abebaw S., Belay A., Markos A., et al. (2022). Trend analysis and prediction on water consumption in southwestern Ethiopia. Journal of Nanomaterials, 2022, 3294954.
- 18. Zubaidi S., Dooley J., Alkhaddar R., Abdellatif M., Al-Bugharbee H., & Ortega-Martorell S. (2018). A novel approach for predicting monthly water demand by combining singular spectrum analysis with neural networks. Journal of Hydrology, 561, 136–145.
- 19. Candelieri A., Giordani I., Archetti F., Barkalov K., Meyerov I., Polovinkin A., et al. (2018). Tuning hyperparameters of a SVM-based water demand forecasting system through parallel global optimization. Computers & Operations Research, 106, 202–209.
- 20. Chen L., Gan X., Yi B., Qin Y., & Lu L. (2022). Domestic water demand prediction based on system dynamics combined with social-hydrology methods. Hydrology Research, 53(8), 1107–1128.
- 21. Chen G., Long T., Bai Y., & Zhang J. (2019). A Forecasting Framework Based on Kalman Filter Integrated Multivariate Local Polynomial Regression: Application to Urban Water Demand. Neural Processing Letters, 50(1), 497–513.
- 22. Wu J., Wang Z., & Dong L. (2021). Prediction and analysis of water resources demand in Taiyuan City based on principal component analysis and BP neural network. Water Infrastructure, Ecosystems and Society, 70(8), 1272.
- 23. Deo R., & Şahin M. (2015). Application of the extreme learning machine algorithm for the prediction of monthly effective drought index in eastern Australia. Atmospheric Research, 153, 512–525.
- 24. Adnan R., Dai H., Mostafa R., Islam A., Kisi O., Elbeltagi A., et al. (2023). Modelling groundwater level fluctuations by ELM merged advanced metaheuristic algorithms using hydroclimatic data. Geocarto International, 38(1), 2158951.
- 25. Mouatadid S., & Adamowski J. (2017). Using extreme learning machines for short-term urban water demand forecasting. Urban Water Journal, 14(6), 630–638.
- 26. Groppo G., Costa M., & Libânio M. (2019). Predicting water demand: a review of the methods employed and future possibilities. Water Supply, 19(8), 2179–2198.
- 27. Subasi A. (2013). Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. Computers in Biology and Medicine, 43(5), 576–586. pmid:23453053
- 28. Shen L., Chen H., Yu Z., Kang W., Zhang B., Li H., et al. (2016). Evolving support vector machines using fruit fly optimization for medical data classification. Knowledge-Based Systems, 96(15), 61–75.
- 29. Tiwari M., & Adamowski J. (2015). Medium-term urban water demand forecasting with limited data using an ensemble wavelet-bootstrap machine-learning approach. Journal of Water Resources Planning and Management, 141(2), 04014053.
- 30. Villarin M., & Rodriguez-Galiano V. (2019). Machine learning for modeling water demand. Journal of Water Resources Planning and Management, 145(5), 04019017.
- 31. Wang Y., Feng B., Hua Q., & Sun L. (2021). Short-Term Solar Power Forecasting: A Combined Long Short-Term Memory and Gaussian Process Regression Method. Sustainability, 13(7), 3665.
- 32. Liu Y., Huang D., Liu B., Feng Q., & Cai B. (2021). Adaptive ranking based ensemble learning of Gaussian process regression models for quality-related variable prediction in process industries. Applied Soft Computing, 101, 107060.
- 33. Wan X., Li X., Wang X., Yi X., Zhao Y., He X., et al. (2022). Water quality prediction model using Gaussian process regression based on deep learning for carbon neutrality in papermaking wastewater treatment system. Environmental Research, 211, 112942. pmid:35189104
- 34. Zhang L., Su X., Singh V., Ayantobo O., & Xie J. (2018). Logarithmic Mean Divisia Index (LMDI) decomposition analysis of changes in agricultural water use: a case study of the middle reaches of the Heihe River basin, China. Agricultural Water Management, 208, 422–430.
- 35. Chen X., Li F., Li X., Hu Y., & Hu P. (2020). Evaluating and mapping water supply and demand for sustainable urban ecosystem management in Shenzhen, China. Journal of Cleaner Production, 251, 119754.
- 36. Yang Z., Li B., Wu H., Li M., Fan J., Chen M., et al. (2023). Water consumption prediction and influencing factor analysis based on PCA-BP neural network in karst regions: a case study of Guizhou Province. Environmental Science and Pollution Research, 30, 33504–33515. pmid:36480138
- 37. Chen L., Xu L., Xu Q., & Yang Z. (2016). Optimization of urban industrial structure under the low-carbon goal and the water constraints: a case in Dalian, China. Journal of Cleaner Production, 114, 323–333.
- 38. Fan L., Gai L., Tong Y., & Li R. (2017). Urban water consumption and its influencing factors in China: Evidence from 286 cities. Journal of Cleaner Production, 166, 124–133.
- 39. Zhang F., Yang D., Tang H., & Liu Y. (2015). Analyses of the changing process and influencing factors of water resource utilization in Megalopolis of Arid Area. Water Resources, 42(5), 712–720.
- 40. Zuo M., Kang S., Niu J., & Lu H. (2018). A new technique to estimate regional irrigation water demand and driving factor effects using an improved SWAT model with LMDI factor decomposition in an arid basin. Journal of Cleaner Production, 185, 814–828.
- 41. Beal C., & Stewart R. (2014). Identifying Residential Water End Uses Underpinning Peak Day and Peak Hour Demand. Journal of Water Resources Planning and Management, 140(7), 04014008.
- 42. Li C., Zhou J., Tao M., Du K., Wang S., Armaghani D., et al. (2022). Developing hybrid ELM-ALO, ELM-LSO and ELM-SOA models predicting advance rate of TBM. Transportation Geotechnics, 36, 100819.
- 43. Vapnik V., Golowich S., & Smola A. (1997). "Support vector methods for function approximation regression estimation and signal processing" in Advances in Neural Information Processing Systems, MA, Cambridge: MIT Press 9, 187–281.
- 44. Yan B., Zhang X., Tang C., Wang X., Yang Y., & Xu W. (2023). A Random Forest-Based Method for Predicting Borehole Trajectories. Mathematics, 11, 1297.
- 45. Adebiyi A., Adewumi A., & Ayo C. (2014). Comparison of ARIMA and artificial neural networks models for stock price prediction. Journal of Applied Mathematics, 2014(1), 1–7.
- 46. Oyebode O., Babatunde D., Monyei C., & Babatunde O. (2019). Water demand modelling using evolutionary computation techniques: integrating water equity and justice for realization of the sustainable development goals. Heliyon, 5(11), e02796. pmid:31844725
- 47. Chen W., Chen Y., & Feng Y. (2021). Assessment and Prediction of Water Resources Vulnerability Based on a NRS-RF Model: A Case Study of the Song-Liao River Basin, China. Entropy, 23, 882. pmid:34356423
- 48. Vörösmarty C., McIntyre P., Gessner M., Dudgeon D., Prusevich A., Green P., et al. (2010). Global threats to human water security and river biodiversity. Nature, 467(7315), 555–561. pmid:20882010
- 49. Bakker K. (2012). Water security: research challenges and opportunities. Science, 337(6097), 914–915.
- 50. Li J., & Ma X. (2015). Econometric analysis of industrial water use efficiency in China. Environment Development and Sustainability, 17(5), 1209–1226.
- 51. Akanksha B., Bramha D., Suneel P., & Chander K. (2020). Predicting impact of urbanization on water resources in megacity Delhi. Remote Sensing Applications: Society and Environment, 20, 100361.
- 52. Avtar R., Kumar P., Oono A., Saraswat C., Dorji S., & Hlaing Z. (2017). Potential application of remote sensing in monitoring ecosystem services of forests, mangroves and urban areas. Geocarto International, 32(8), 874–885.