Determination of the Optimal Training Principle and Input Variables in Artificial Neural Network Model for the Biweekly Chlorophyll-a Prediction: A Case Study of the Yuqiao Reservoir, China

  • Yu Liu,

    Affiliations Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing, China

  • Du-Gang Xi,

    Affiliations The PLA Information Engineering University, Zhengzhou, China, Naval Institute of Hydrographic Surveying and Charting, Tianjin, China

  • Zhao-Liang Li

    Affiliations Key Laboratory of Agri-informatics, Ministry of Agriculture / Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, China, ICube, UdS, CNRS, 300 boulevard Sebastien Brant, CS 10413, 67412 Illkirch, France



Abstract

Predicting the levels of chlorophyll-a (Chl-a) is a vital component of water quality management, which ensures that urban drinking water is safe from harmful algal blooms. This study developed a model to predict Chl-a levels in the Yuqiao Reservoir (Tianjin, China) biweekly using water quality and meteorological data from 1999–2012. First, six artificial neural networks (ANNs) and two non-ANN methods (principal component analysis and the support vector regression model) were compared to determine the appropriate training principle. Subsequently, three predictors with different input variables were developed to examine the feasibility of incorporating meteorological factors into Chl-a prediction, which usually only uses water quality data. Finally, a sensitivity analysis was performed to examine how the Chl-a predictor reacts to changes in input variables. The results were as follows: first, ANN is a powerful predictive alternative to the traditional modeling techniques used for Chl-a prediction. The back propagation (BP) model yields slightly better results than all other ANNs, with the normalized mean square error (NMSE), the correlation coefficient (Corr), and the Nash-Sutcliffe coefficient of efficiency (NSE) at 0.003 mg/l, 0.880 and 0.754, respectively, in the testing period. Second, the incorporation of meteorological data greatly improved Chl-a prediction compared to models solely using water quality factors or meteorological data; the correlation coefficient increased from 0.574–0.686 to 0.880 when meteorological data were included. Finally, the Chl-a predictor is more sensitive to air pressure and pH compared to other water quality and meteorological variables.


Introduction

Chlorophyll-a (Chl-a) is commonly used as an indicator of phytoplankton abundance and primary productivity in the lakes and reservoirs that provide most of the drinking water for dozens of large and medium-sized cities in China. Predicting the levels of Chl-a is a vital part of water quality management to ensure that urban drinking water is safe from harmful algal blooms.

Chl-a levels in lakes and reservoirs have been modeled for over 40 years [1], [2], and several statistical and process-based physical models have been developed using analysis of phytoplankton. Two of the most commonly used statistical predictors are linear regression models [3], [4] and principal component analysis [5], [6], [7]. These methods are simple but often do not yield reliable results, and sometimes even produce significant errors due to poor statistical stability and the use of linear equations. With improved understanding of aquatic ecosystem processes and advanced computing capabilities, physical models are now used to address water quality problems [8], [9], [10]. Although these models can describe variations in Chl-a levels based on the underlying mechanisms, they are not well suited for most Chinese lakes and reservoirs because they require a significant amount of field data.

Artificial neural networks (ANNs), which imitate the basic characteristics of the human brain such as self-adaptability, self-organization and error tolerance, are able to map non-linear relationships among the variables that are typical of aquatic ecosystems [11]. Since their first application for the prediction of algal blooms from water quality databases of the Saidenbach Reservoir in Germany [12], ANNs have been widely applied to study Chl-a. Some examples of their application include prediction of algal blooms in Lake Kasumigaura in Japan [13], forecasting the incidence of cyanobacteria in the Murray River in Australia [14], estimation of the Chl-a levels in three water bodies in Turkey [15], analysis of algal bloom dynamics in the coastal waters of Hong Kong [16], elucidation of phytoplankton dynamics in the Nakdong River in Korea [17], prediction of the Chl-a levels in the Nanzui water area of Dongting Lake in China [18], and modeling of Chl-a levels during spring algal blooms in the Xiangxi Bay of the Three Gorges Reservoir in China [19]. These studies revealed that ANNs outperform traditional statistical models in modeling non-linear behavior and are more flexible than physical models because they require less detailed knowledge of the aquatic ecosystem. However, none of these studies encountered difficulties specific to modeling of the Yuqiao Reservoir, which has extensive submerged aquatic plants in addition to problems common to most reservoirs, such as abundant blue algae, limited data, highly variable water levels, and complex physical and chemical processes. Shallow water and appropriate nutrition conditions in the Yuqiao Reservoir have led to extensive growth of submerged aquatic plants.

Furthermore, although it is important to select the proper training method to improve prediction, few studies have systematically analyzed the performance of different ANNs in predicting Chl-a levels. Finally, almost all these studies used only water quality data as inputs, whereas meteorological factors that greatly affect the growth and accumulation of algae were rarely considered. Therefore, this study developed an accurate biweekly Chl-a predictor for the Yuqiao Reservoir by selecting appropriate training methods based on comparison of several ANN and non-ANN methods and by determining the appropriate model inputs including meteorological factors.

Study Area and Data

1. Study area

The Yuqiao Reservoir (Fig. 1) is located downstream of the Haihe River Basin in northern China. It is the largest reservoir and the only source of drinking water for Tianjin, the third largest city in China with a population of 2.92×10⁷ in 2010. The reservoir was built in 1959 and was used as a regulating reservoir in the Luanhe-to-Tianjin diversion project in 1983. The reservoir surface area is 86.8 km², and its volume and average depth at normal water level are 0.42×10⁹ m³ and 4.6 m, respectively. The mean annual precipitation and air temperature of the basin are 750 mm and 11.5°C, respectively.

The ecosystem of the Yuqiao Reservoir has undergone significant changes over the past few decades because of the natural evolution of biological species, changes in water diversion patterns, accelerated eutrophication, and substantial reduction of runoff resulting from climate change and human activities. The dominant species of submerged vegetation has changed from Potamogeton maackianus to Potamogeton crispus, and the biomass of Potamogeton crispus in late May increased from 4.8×10⁷ kg in 1988 to 1.19×10⁸ kg in 2009, whereas the distribution area of this species increased from 34.85% to 60.84%, according to remote-sensing estimates made using the Huanjing-1A/B satellite. The safety of the water supply of the Yuqiao Reservoir is now threatened by excessive growth of Potamogeton crispus in spring and algal outbreaks in summer. Potamogeton crispus is a submerged aquatic plant that purifies water by absorbing excess nutrients and competing for resources with cyanobacteria during its high growth period from April to mid-May. However, Potamogeton crispus also promotes the growth of cyanobacteria by accelerating nutrient release during its explosive death and decay in late May and early June.

2. Data

(1) Samples. In this study, water quality and meteorological data from 1999–2012 were collected (S1 Data). The water quality data comprise 20 parameters including water temperature (Tw), pH, conductivity (Cond), transparency (Tran), chloride (CL), hardness (Hard), ammonia nitrogen (NH4-N), nitrate-nitrogen (NO3-N), nitrite-nitrogen (NO2-N), total nitrogen (TN), dissolved oxygen (DO), permanganate index (PI), biochemical oxygen demand (BOD), total phosphorus (TP), phosphate (PS), total solids (TS), suspended solids (SPS), soluble solids (SLS), salinity (SAL) and Chl-a. These data were collected by the Yuqiao Reservoir Administrative Bureau from the center of the reservoir (117°30′30.24″E and 40°02′35.02″N) approximately every two weeks in the summer and monthly in other seasons.

Meteorological data were obtained at the Tianjin site (117° 04′ E and 39° 05′ N) daily by the National Weather Service Information Center. These data include 11 parameters: mean air pressure (P), maximum air pressure (Pmax), minimum air pressure (Pmin), mean air temperature (Ta), maximum air temperature (Tamax), minimum air temperature (Tamin), precipitation (PCP), average wind speed (WS), maximum wind speed (WSmax), sunshine duration (SD), and total radiation (R).

(2) Features and variables. Feature extraction and determination of variables are important for any pattern recognition task, especially for Chl-a prediction, which involves complicated processes and variations. Excessive inputs may result in inefficient Chl-a prediction, whereas limited inputs may fail to describe the relationship between the influential variables and Chl-a levels.

To prepare features and variables for model inputs, we first interpolated the field water quality data into biweekly sets using a linear method and then processed the meteorological data to match the water quality data. The predicted day was set as Day0, and the current day was set as Day15. The water quality and meteorological data for the preceding days 15–165 were therefore averaged over biweekly intervals. Second, because no field data are available for days 0–15 relative to the predicted day, we supplemented these data with the 10-year (2000–2009) average water quality and meteorological variables of the corresponding period. In total, 372 variables ((11 meteorological variables + 20 water quality variables) × 12) were prepared (Table 1).
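The preparation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names `to_biweekly` and `lagged_features` are hypothetical, one variable is handled at a time, and the Day0 values (filled from the 10-year average in the text) are left out.

```python
import numpy as np

def to_biweekly(sample_days, values, step=15):
    """Linearly interpolate irregular samples onto a regular 15-day grid."""
    sample_days = np.asarray(sample_days, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.arange(sample_days[0], sample_days[-1] + 1, step)
    return grid, np.interp(grid, sample_days, values)

def lagged_features(series, n_lags=11):
    """Stack the series at lags of 1..n_lags grid steps (15..165 days).

    Column k-1 holds the value observed k steps before the target step.
    Rows without a full history are dropped.
    """
    series = np.asarray(series, dtype=float)
    rows = len(series) - n_lags
    cols = [series[n_lags - k : n_lags - k + rows] for k in range(1, n_lags + 1)]
    return np.column_stack(cols)
```

Applied to each of the 31 variables, `lagged_features` would give the 11 observed lags per variable; the twelfth (Day0) column would come from the multi-year average of the corresponding period.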

Table 1. Correlation coefficients of Chlorophyll-a (Chl-a) with water quality and Chl-a with meteorological variables.

To reduce the dimensionality of the input data and to determine the appropriate model inputs, a threshold was applied to the correlation coefficient (Table 1). Variables whose correlation coefficient with Chl-a was over 0.5 were considered relatively important and selected as inputs. Therefore, a total of 27 variables of 6 water quality features (6 Tw variables: Tw30, Tw45, Tw60, Tw75, Tw90, and Tw105; 3 DO variables: DO30, DO45, and DO60; 3 PI variables: PI30, PI45, and PI60; 5 TP variables: TP30, TP45, TP60, TP75, and TP90; 5 NO3-N variables: Nia0, Nia15, Nia30, Nia45, and Nia60; 5 Chl-a variables: Chla0, Chla15, Chla30, Chla45, and Chla60) and 16 variables of 4 meteorological features (4 P variables: P60, P75, P90, and P105; 4 Pmax variables: Pmax60, Pmax75, Pmax90, and Pmax105; 4 Pmin variables: Pmin60, Pmin75, Pmin90, and Pmin105; 4 Ta variables: Ta90, Ta105, Ta120, and Ta135) were selected.
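The threshold rule can be illustrated with a short Python sketch. It assumes that the Pearson coefficient is used and that the threshold is applied to its absolute value; the text does not state either point explicitly. The function name and the variable names in the usage note are illustrative only.

```python
import numpy as np

def select_by_correlation(X, y, names, threshold=0.5):
    """Return (name, r) for columns whose |Pearson r| with y exceeds threshold.

    Using the absolute value is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) > threshold:
            selected.append((name, float(r)))
    return selected
```

For example, calling `select_by_correlation(X, chla, ["Tw30", "DO30", "PCP30"])` on a matrix of candidate variables would return only those columns that pass the 0.5 threshold, together with their coefficients.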

Based on field experience, meteorological variables such as WS, SD and R were added as inputs because of their close relationship to Chl-a despite their low correlation coefficients (<0.5), which result from the typical non-linear relationship between variables and Chl-a. Meteorological variables whose correlation coefficient with Chl-a was over 0.3 were also selected as inputs. Therefore, 14 meteorological variables (5 WS variables: WS0, WS15, WS30, WS45, and WS60; 5 SD variables: SD0, SD15, SD30, SD45, and SD60; 4 R variables: R30, R45, R60, and R75) were selected.

pH was also selected as an input despite a low correlation coefficient, similar to the WS meteorological variables.

Considering the similarity of air pressure variables, 3 air pressure features (P, Pmax and Pmin) were reduced to one (P), and 8 air pressure variables were excluded (4 Pmax variables: Pmax60, Pmax75, Pmax90, and Pmax105; 4 Pmin variables: Pmin60, Pmin75, Pmin90, and Pmin105).

Precipitation was not considered in this study despite relatively good correlation coefficients (0.38–0.48) because precipitation events were rare, and their values varied greatly, which might cause significant uncertainty in the prediction model.

Chla0 was excluded because by definition it was the predicted Chl-a, i.e., the 10-year-average Chl-a. The use of Chla0 might significantly influence the annual average Chl-a prediction model, making it less flexible to variations in water quality and meteorological conditions.

Therefore, a total of 49 variables of 12 features (27 variables of 7 water quality features and 22 variables of 5 meteorological features) were selected for this study (Table 2).

(3) Configuration. Based on the above parameters, the Chl-a predictor was designed as shown in Fig. 2. The predictor comprises three parts: an input layer, an output layer and several hidden layers. Each layer contains several neurons. Each neuron receives inputs from neurons in the previous layers or from external sources and then converts the inputs either to an output signal or to another input signal for neurons in the next layer. The connections between neurons in successive layers were assigned weighted values, which represent the importance of that connection in the network.

Fig 2. Configuration of the Chlorophyll-a (Chl-a) predictor using artificial neural networks (ANN).

Left: input water quality and meteorological variables extracted by the correlation coefficient threshold method; middle: configuration of the predictor; right: predicted Chl-a. White circles on the left and right represent input and output neurons, respectively. Black circles represent neurons in the hidden layer. Lines around the circles indicate the data flow. A total of 49 variables of 12 features (27 variables of 7 water quality features and 22 variables of 5 meteorological features) were used.


Methods

This section introduces the strategy used to develop a Chl-a predictor, which considers factors including choice of an appropriate training method, determination of adequate model inputs, and identification of suitable network architecture and parameters.

1. Training method

To identify which model is best suited for the Chl-a predictor, the following six widely used ANNs were compared: Back Propagation (BP), Probabilistic Neural Network (PNN), Modular Neural Networks (MNN), Jordan-Elman network, Self-Organizing Map (SOM), and Co-Active Neuro-Fuzzy Inference System (CANFIS). BP is probably the most widely used ANN; it comprises a feed-forward multi-layer neural network in which connections can jump over one or more layers, and errors are propagated back to connections stemming from the input units. PNNs are nonlinear hybrid networks that typically contain a single hidden layer of processing elements and use Gaussian transfer functions; all weights can be calculated analytically in these networks. MNNs combine the results from several parallel multilayer perceptrons. SOMs transform inputs of arbitrary dimension into a one- or two-dimensional discrete map subject to topological constraints. CANFIS integrates adaptable fuzzy inputs with a modular neural network to rapidly and accurately approximate complex functions. These ANNs are described in detail in Liu et al. [20].

To benchmark the ANNs, they were compared with two typical traditional non-ANN methods: principal component analysis (PCA) and support vector machine (SVM). PCA is a widely used statistical method that identifies relatively few “features” or components that, as a whole, represent the full object state. SVM geometrically separates the training set using a hyperplane, or more complex surfaces if necessary; it is a relatively new mathematical method that is widely used in modeling ecosystems.

The ANN predictors were implemented using the NeuroSolutions 6.31 software for the MATLAB neural network toolbox.

2. Model inputs

To examine the feasibility of including meteorological variables in the Chl-a predictor, which uses only water quality data, three models were constructed and analyzed using the following inputs: (a) only water quality factors (WQ) (27 variables of 7 features); (b) only meteorological factors (MF) (22 variables of 5 features); (c) both water quality and meteorological factors (WM) (49 variables of 12 features).

3. Evaluation indices

The performance of the Chl-a predictor was measured first by computational cost and then by precision. The first evaluation index was based on the training time required, and the second index was based on the normalized mean square error (NMSE), the correlation coefficient (Corr), and the Nash-Sutcliffe coefficient of efficiency (NSE). These evaluation indices are described below.


Here, ypᵢ is the predicted Chl-a value at time step i, yᵢ is the observed value, N is the number of time steps at 15-day intervals, and ȳ is the average of the Chl-a values observed over all time steps.
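As an illustration, Corr and NSE can be computed as below from their standard definitions: the Pearson correlation between observed and predicted values, and one minus the ratio of the summed squared error to the summed squared deviation of the observations from their mean. NMSE is left out of the sketch because several normalizations are in common use and the text does not specify which one applies. The function names are hypothetical.

```python
import numpy as np

def corr(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1])

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - sum((y - yp)^2) / sum((y - mean(y))^2)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = np.sum((observed - predicted) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / spread)
```

An NSE of 1 indicates a perfect match, whereas an NSE of 0 means the predictor is no better than always predicting the observed mean.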

4. Model parameters

Selection of parameters such as the number of hidden layers, the number of neurons, and the learning rules was mainly based on the resulting NMSE, Corr, and NSE, and relied on the experience of the researcher and several trial runs. The Chl-a predictor was trained for a maximum of 10000 supervised epochs, and an average MSE of less than 0.01 was used as the termination condition.

The learning momentum of the 6 ANN models was set as 0.7. A hyperbolic tangent function was used as the transfer function for axons (TanhAxons) as follows: f(x_i, w_i) = tanh(x_i^lin), where x_i^lin = βx_i is the scaled and offset activity inherited from the linear axon. The learning momentums and TanhAxons were the same for SVM and PCA in the output layers.
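A minimal Python sketch of the TanhAxon transfer function and of a weight update with the momentum coefficient of 0.7 given above. The learning rate, the value of β, and the toy quadratic objective are assumptions chosen only for illustration; the update rule implemented inside NeuroSolutions may differ in detail.

```python
import numpy as np

def tanh_axon(x, beta=1.0):
    """Transfer function f(x) = tanh(beta * x); beta = 1.0 is an assumed value."""
    return np.tanh(beta * np.asarray(x, dtype=float))

def momentum_step(w, velocity, grad, lr=0.1, momentum=0.7):
    """One gradient-descent update with momentum (lr is an assumed value)."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def minimize_quadratic(w0=1.0, steps=200):
    """Toy example: minimize f(w) = w**2, whose gradient is 2 * w."""
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = momentum_step(w, v, grad=2.0 * w)
    return w
```

The momentum term reuses a fraction of the previous update, which tends to smooth the trajectory of the weights across epochs.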

Other parameters of the 8 Chl-a models with the same inputs are shown in Table 3. For the three models using different inputs, parameters were identical to the chosen model shown in Table 3 except for the number of neurons in the input and hidden layers. The number of neurons in the input layer for the models WQ, MF and WM were 27, 22, and 49, respectively, whereas the number of neurons in the first hidden layer were 30, 20 and 40, respectively, and in the second hidden layer, there were 25, 20 and 30 neurons, respectively.
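To make the layer sizes concrete, the following Python sketch builds a feed-forward network with the WM dimensions (49 inputs, hidden layers of 40 and 30 neurons) and runs a forward pass. It assumes a single output neuron for Chl-a and a tanh transfer function at every layer, and it uses small random weights rather than trained ones; all function names are hypothetical.

```python
import numpy as np

def init_layers(sizes, seed=0):
    """Create (weights, bias) pairs with small random weights for each layer."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, layers):
    """Propagate inputs through the network, applying tanh at every layer."""
    a = np.asarray(x, dtype=float)
    for weights, bias in layers:
        a = np.tanh(a @ weights + bias)
    return a

def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(w.size + b.size for w, b in layers)

# WM model: 49 inputs, two hidden layers (40 and 30), one assumed output neuron.
wm_layers = init_layers([49, 40, 30, 1])
```

Under these assumptions the WM network has 49×40 + 40 + 40×30 + 30 + 30×1 + 1 = 3261 adjustable parameters, which gives a sense of the model size relative to the number of biweekly training samples.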

5. Training and validation

Since 1983, when the Yuqiao Reservoir became the only source of drinking water for Tianjin city, the greatest changes in water quality, weather conditions and ecosystem in the reservoir occurred during 1999–2012. These changes occurred because of increased nutrient input, significant reduction of runoff, change in water diversion patterns, and natural evolution of ecosystems, which were closely related to climate change, urban water consumption, and newly built water conservancy projects in the upper reaches. The Chl-a level varied from 0.00–0.35 mg/l in 1999–2009 and from 0.00–0.28 mg/l in 2010–2012. The factors influencing the aquatic ecosystems were similar in 2010–2012 and 1999–2009, and there were no extreme weather conditions or changes in water utility patterns. Therefore, the prediction model was developed using data from 1999–2009 and tested using data from 2010–2012 because generally approximately 80% of the samples are used for training and the rest are used for testing while developing ecologic models.

Among the development data, seventy percent were randomly selected to train the model, and the remaining data were used for cross-validation. To avoid over-fitting the network, training was stopped if there was no improvement from the cross-validation process after 100 iterations. Weighted connection values were adjusted to minimize the RMSE between the desired and predicted outputs.
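The split and the stopping rule can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, the random seed is arbitrary, and "no improvement" is interpreted as the cross-validation error not falling below its best value so far.

```python
import numpy as np

def split_train_cv(n_samples, train_frac=0.7, seed=0):
    """Randomly split sample indices into training and cross-validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    return idx[:n_train], idx[n_train:]

def early_stopping_epoch(cv_errors, patience=100):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch CV errors.

    Training stops at the first epoch where the best CV error has not
    improved for `patience` epochs; otherwise it runs to the last epoch.
    """
    best, best_epoch = float("inf"), -1
    for epoch, err in enumerate(cv_errors):
        if err < best:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(cv_errors) - 1, best_epoch
```

In practice the weights saved at `best_epoch` would be the ones kept for testing, since later epochs only fit the training data more closely.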

Because the training data spanned most cases of extreme conditions in the Yuqiao Reservoir since 1983, and the validation data were appropriate to test the performance of the proposed model, the Chl-a predictor should illustrate variations in Chl-a levels corresponding to changes in the ecosystem, weather conditions, and water diversion plans. Furthermore, the proposed model used a greater number of appropriate water quality factors and incorporated meteorological factors as inputs, whereas traditional predictors only use a limited number of water quality factors; therefore, compared to most traditional Chl-a predictors, the proposed model should adapt better to variations in weather conditions and water diversion patterns.

However, the performance of the model under new water diversion patterns and extreme weather conditions is unclear. The scope of this study did not include cases with limited water quality data and extreme conditions, which are low-probability events and occur randomly.

6. Sensitivity

To examine how the trained Chl-a predictor reacted to changes in each input, a sensitivity analysis was performed. Each input to the model was altered by 5%, 10% and 20%, and the corresponding change in output was calculated. For an input indicator to be considered sensitive, the corresponding output variation had to be greater than the input variation. A maximum input alteration of 20% was selected because some parameters such as pH and air pressure are relatively stable and vary by less than 20%.
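The perturbation procedure can be sketched in Python as below. The sketch assumes that each input is increased multiplicatively by the given fraction while the others are held fixed, and that the output change is expressed relative to the unperturbed prediction; `predict` is a hypothetical callable standing in for the trained network, and the baseline prediction is assumed to be nonzero.

```python
import numpy as np

FRACTIONS = (0.05, 0.10, 0.20)

def sensitivity(predict, x, fractions=FRACTIONS):
    """Relative output change when each input is increased by each fraction.

    Returns an array of shape (n_inputs, n_fractions).
    """
    x = np.asarray(x, dtype=float)
    base = float(predict(x))
    changes = np.zeros((x.size, len(fractions)))
    for j in range(x.size):
        for k, frac in enumerate(fractions):
            perturbed = x.copy()
            perturbed[j] *= 1.0 + frac
            changes[j, k] = (float(predict(perturbed)) - base) / abs(base)
    return changes

def is_sensitive(changes, fractions=FRACTIONS):
    """An input counts as sensitive when |output change| exceeds the input change."""
    return np.abs(changes) > np.asarray(fractions)
```

For instance, with a toy predictor such as `lambda x: x[0] ** 2 + 0.1 * x[1]`, the first input would be flagged as sensitive at all three perturbation levels and the second would not.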

Results and Discussion

The performance of the Chl-a predictor was examined in three ways. First, 6 ANNs and 2 non-ANN predictors were compared to identify the appropriate model. Second, three ANN models with different input variables were developed to determine the feasibility of incorporating meteorological variables. Third, a sensitivity analysis was performed to examine how the trained network reacted to changes in each input.

1. Comparison of ANN and non-ANN predictors

Table 4 shows the results of the training, validation and testing of the eight Chl-a predictors. All methods except PNN required a training period of less than 30 seconds, so training time was not a constraint for most ANNs.

Except for the SVM method, the Corr between the observed and predicted Chl-a values of the ANNs was 0.524–0.880 in the testing period. This level of precision was consistent with similar studies on other water bodies, such as a correlation coefficient of 0.5–0.7 in the Putrajaya Lake of Malaysia [21] and 0.77 in the Nakdong River Basin of South Korea [17]. The performance of the ANN predictors was largely satisfactory considering the difficulties encountered in modeling the Yuqiao Reservoir, which contains extensive submerged aquatic plants in addition to the complex physical, chemical, and biological processes observed in other water bodies. Furthermore, the long-term series training data extending over 11 years (1999–2009) also contributed to the complexity of Chl-a prediction because the important factors governing Chl-a values changed significantly over time. For example, Potamogeton crispus became the dominant submerged aquatic plant with biomass and distribution areas that doubled every May over the past 30 years.

The training precision for the predictors was ranked in the following order based on the NMSE, Corr, and NSE evaluation indices: BP > MNN, CANFIS, SOM, Jordan/Elman, PNN > PCA > SVM. The results obtained during validation and testing were consistent with results obtained during training.

In this study, all 6 ANNs outperformed non-ANN methods. For example, the NSE of ANNs during testing was 0.604–0.754, whereas the NSE of the PCA and SVM methods was 0.540 and 0.491, respectively. The failure of the PCA and SVM methods may result from the complex nonlinear nature of the Yuqiao Reservoir ecosystem.

Among the ANNs, the BP method best predicted Chl-a levels with NMSE, Corr and NSE values of 0.003 mg/l, 0.880, and 0.754, respectively, during testing. However, there was no clear advantage of one ANN over others because all 6 ANN models yielded acceptable results.

The SVM method is not suitable for Chl-a prediction because Corr was < 0.1 and NSE was < 0.50 during the training, validation and testing periods; this is potentially because the SVM method treats multi-category problems as a series of binary problems and may thus fail to capture the high variability of the aquatic system in the Yuqiao Reservoir.

In conclusion, the performance of the eight predictors indicated that ANNs, especially when trained by the BP method, are a powerful alternative to traditional modeling techniques for Chl-a prediction.

2. Incorporation of meteorological variables

Fig. 3 and Table 5 show the results of the three ANN models with different inputs of water quality and meteorological variables. The model with only meteorological factors (MF) as inputs always overestimated the concentration of Chl-a, whereas the model with only water quality variables (WQ) as inputs underestimated Chl-a; this was evident during the training period. Combining the water quality and meteorological variables (WM) greatly improved the performance of the Chl-a predictor, which accurately detected peak timing and magnitude. For example, the Corr of the WM model was 0.880, whereas the Corr of the WQ and MF models was only 0.574 and 0.686, respectively. The NSE of the WM model was 0.754, whereas the NSE of the WQ and MF models was 0.225 and 0.662, respectively.

Fig 3. Scatter plots of the observed data vs. the model predictions using different inputs.

(A) Chl-a prediction with different variables used as inputs for training; (B) is the same as (A) but for validation; (C) is the same as (A) but for testing.

3. Sensitivity

The sensitivity of the Chl-a predictor to water quality variables is shown in Fig. 4. The sensitivity decreased in the following order: pH > DO, Tw, PI > NO3-N, TP and the prior Chl-a.

Fig 4. Sensitivity of the predictor to water quality variables.

Bars indicate changes in Chl-a values caused by changes in the input variables, which were altered by 5%, 10% and 20%. Black, slash-filled, and cross line-filled bars indicate the change in Chl-a values caused by 5%, 10%, and 20% changes in input variables, respectively. Tw, water temperature; DO, dissolved oxygen; PI, permanganate index; TP, total phosphorus; NO3-N, nitrate-nitrogen.

Tw has a short-term positive effect on the Chl-a concentration but a negative impact over longer durations. For example, the concentration of Chl-a increases with increasing water temperature in the preceding 30–75 days but reduces with increasing water temperature in the preceding 90–105 days. This may be because warm water promotes the growth of algae in the summer and Potamogeton crispus in the spring; excessive growth of Potamogeton crispus can inhibit the growth of algae by competing for nutrients and light.

Chl-a is very sensitive to pH variations, and the Chl-a concentration increases at twice the rate of pH change. This is potentially because a slight decrease in pH may significantly promote algal photosynthesis by increasing the dissolution of CO2 in water.

A higher Chl-a value generally implies a higher level of DO. To some extent, the level of DO can indicate how much oxygen is produced by phytoplankton.

PI and TP have similar effects on Chl-a, and Chl-a is relatively more sensitive to the permanganate index than to TP. This is because PI can indicate the abundance of phytoplankton, whereas TP influences Chl-a indirectly by promoting the growth of phytoplankton.

Chl-a has similar sensitivity to NO3-N and water temperature: Chl-a first increases then decreases with increasing NO3-N.

The Chl-a concentration is closely related to the Chl-a level during the preceding 15–60 days. This indicates that algal seeds significantly influence growth and Chl-a levels in the subsequent two months.

The sensitivity of the Chl-a predictor to meteorological variables is shown in Fig. 5. The sensitivity decreased as follows: P > WS, SD, and R > Ta.

Fig 5. Sensitivity of the predictor to meteorological variables.

Bars indicate changes as described for Fig. 4. P, daily average air pressure; Ta, average air temperature; WS, wind speed; SD, sunshine duration; R, total radiation.

The Chl-a concentration increases rapidly as P decreases because low air pressure promotes floating and accumulation of algae on the water surface.

Chl-a is almost completely insensitive to changes in Ta; the Chl-a level varied by less than 5% when Ta was altered by 20%. This is because air temperature influences the aquatic system indirectly with water as a medium.

WS has a short-term negative effect and a long-term positive effect on Chl-a levels. This is because strong wind promotes the release of nutrients from the sediment, which promotes Chl-a increase in a relatively slow manner; however, strong wind rapidly inhibits the growth and accumulation of algal particles.

Longer SD and R periods result in increased Chl-a because they increase energy input to the aquatic ecosystem, which promotes photosynthesis.

Sensitivity to meteorological variables is meaningful for short-term forecasts and for long-term prevention of algal blooms. For example, consecutive days with low air pressure, slight wind speed and increasing SD and R in summer indicate a higher probability of algal bloom, which can help water quality management departments to implement advance countermeasures.


Conclusions

To develop an appropriate biweekly Chl-a predictor for the Yuqiao Reservoir, this study first compared several Chl-a predictors trained using different methods and then examined the feasibility of incorporating meteorological factors for prediction. In addition, a sensitivity analysis was performed to examine how the Chl-a predictor reacted to changes in each input. The following observations were made:

  1. ANN is a powerful predictive alternative to traditional modeling techniques for Chl-a prediction with Corr values of 0.524–0.880 in the testing period. The BP model yields better results compared to other ANN models.
  2. Combining the water quality and meteorological data greatly improves the performance of the Chl-a predictor compared to models using water quality or meteorological data alone as inputs; the Corr values increased from 0.574–0.686 to 0.880 when both inputs were combined.
  3. Among the meteorological variables, Chl-a is most sensitive to air pressure, followed by wind speed, sunshine duration, total radiation, and air temperature. Chl-a is more sensitive to changes in pH compared to other water quality variables such as DO, water temperature, NO3-N, TP and prior Chl-a values.


Acknowledgments

The authors would like to thank Kai Zhang and Wei Xu at the Tianjin Hydrology Research Institute for the field work, Yuqiao Reservoir Administrative Bureau for providing water quality data and the National Weather Service Information Center of China for providing meteorological data. The authors greatly appreciated the careful and insightful suggestions and comments of reviewers that helped to improve the manuscript and data analysis.

Author Contributions

Conceived and designed the experiments: YL DGX. Performed the experiments: YL. Analyzed the data: YL ZLL. Contributed reagents/materials/analysis tools: ZLL. Wrote the paper: YL DGX. Designed the software used in analysis: DGX.


  1. 1. Chen CW. Concepts and utilities of ecologic models. Journal of Sanitary Engineering Division, ASCE, 1970; 96: 1085–1097.
  2. 2. Di Toro DM, O’Connor DJ, Thomann RV. A dynamic model of phytoplankton population in the Sacramento-San Joaquin Delta. Adv. Chem. Ser. 1971; 106: 131–180.
  3. 3. Onderka M. Correlations between several environmental factors affecting the bloom events of cyanobacteria in Liptovska Mara reservoir (Slovakia)-A simple regression model. Ecological Modelling. 2007; 209: 412–416.
  4. 4. Cho KH, Kang JH, Ki SJ, Park Y, Cha SM, Kim JH. Determination of the optimal parameters in regression models for the prediction of chlorophyll-a: a case study of the Yeongsan Reservoir, Korea. The Science of the total environment. 2009; 407: 2536–2545. pmid:19211132
  5. 5. Çamdevýren H, Demýr N, Kanik A, Keskýn S. Use of principal component scores in multiple linear regression models for prediction of Chlorophyll-a in reservoirs. Ecological Modelling, 2005; 181, 581–589.
  6. 6. Liu Y, Guo HC, Yang PJ. Exploring the influence of lake water chemistry on chlorophyll a: a multivariate statistical model analysis. Ecological Modelling, 2010; 221: 681–688.
  7. 7. Primpas I, Tsirtsis G, Karydis M, Kokkoris GD. Principal component analysis: development of a multivariate index for assessing eutrophication according to the European water framework directive. Ecological Indicators, 2010; 10: 178–183.
  8. 8. Orlob GT (Ed.). Mathematical Modeling of Water Quality: Streams, Lakes, and Reservoirs. New York, NY: John Wiley & Sons;1983; pp.518.
  9. 9. Chapra SC.Surface Water-quality modeling. Waveland Press; 1997, pp.844.
  10. 10. Martin JL, McCutcheon SC. Hydrodynamics and transport for water quality modeling. CRC Press, 1998, pp.816.
  11. 11. Lek S, Delacoste M, Baran P, Dimopoulos I, Lauga J, Aulagnier S. Application of neural networks to modeling nonlinear relationships in ecology. Ecological Modelling, 1996; 90: 39–52.
  12. 12. French M, Recknagel F. Modeling algal blooms in freshwaters using artificial neural networks. In: Zanetti P. (Ed.), Computer Techniques in Environmental Studies V, vol. II: Environment Systems. Computational Mechanics Publications, Boston; 1994, pp. 87–94.
  13. 13. Yabunaka KI, Hosomi M, Murakami A. Novel application of a back-propagation artificial neural network model formulated to predict algal bloom. Water Science and Technology. 1997; 36: 89–97.
  14. 14. Maier HR, Sayed T, Lence BJ. Forecasting cyanobacterium Anabaena spp. in the River Murray, South Australia, using B-spline neurofuzzy models. Ecological Modelling. 2001; 146: 85–96.
  15. 15. Karul C, Soyupak S. A comparison between neural network based and multiple regression models for chlorophyll-a estimation. Ecological Informatics. In: Recknagel F. (Ed.), Springer: Berlin, Germany; 2006, pp. 309–323.
  16. 16. Lee JHW, Harrison PJ, Kuang CP, Yin KD. Eutrophication dynamics in Hong Kong coastal waters: physical and biological interactions. The Environment in Asia Pacific Harbours. In: Wolanski E. (ed), Springer: Berlin, Germany; 2006, pp.187–206.
  17. 17. Jeong KS, Kim DK, Joo GJ. River phytoplankton prediction model by artificial neural network: Model performance and selection of input variables to predict time-series phytoplankton proliferations in a regulated river system. Ecological Informatics. 2006; 1: 235–245. pmid:16930680
  18. 18. Xu M, Zeng GM, Xu XY, Huang GH, Sun W, Jiang XY. Application of Bayesian regularized BP neural network model for analysis of aquatic ecological data-a case study of chlorophyll-a prediction in Nanzui water area of Dongting Lake. Journal of Environmental Sciences (in Chinese with English abstract). 2005; 17: 946–952.
  19. 19. Luo HJ, Liu DF, Huang YP. Support vector regression model of chlorophyll-a during spring algal bloom in xiangxi bay of three gorges reservoir, China. Journal of Environmental Protection. 2012; 3: 420–425.
  20. 20. Liu Y, Xia J, Shi CX, Hong Y. An improved cloud classification algorithm for China’s FY-2C multi-channel images using artificial neural network. Sensors. 2009; 9: 5558–5579. pmid:22346714
  21. 21. Malek S, Ahmad SMS, Singh SKK, Milow P, Salleh A. Assessment of predictive models for chlorophyll-a concentration of a tropical lake. BMC Bioinformatics. 2012; 12 (S-13): S12 BMC Bioinformatics 12 (S-13): S12 P1.