Lake eutrophication prediction based on improved MIMO-DD-3Q Learning

Li Wang; Chaoran Ning; Xiaoyi Wang; Jiping Xu; Zhiyao Zhao; Jiabin Yu; Huiyan Zhang; Qian Sun; Yuting Bai; Xuebo Jin; Qianhui Tang

doi:10.1371/journal.pone.0294278

Abstract

Traditional single deep prediction models adapt poorly to abrupt changes in time series data when predicting lake eutrophication. To address this problem, this study takes the multi-factor water quality data affecting lake eutrophication as its main research object. A deep reinforcement learning model is proposed that can switch between water quality prediction models at different times and, through continuous learning, select the optimal prediction strategy for lake eutrophication at the current time; the reinforcement learning algorithm itself is also improved. First, the greedy factor, which is a fixed parameter of Agent training in conventional reinforcement learning, is made to vary according to an arctangent function, and a mean-value reward factor is defined. On this basis, three Q estimates are introduced, and the weight parameters are obtained from the average and the minimum of the three actual Q values to update the final Q table, yielding an Improved MIMO-DD-3Q Learning model. The preliminary prediction results of lake eutrophication are obtained, and the resulting errors are used as a secondary input to continue updating the Q table and build the final Improved MIMO-DD-3Q Learning model, which produces the final prediction of water eutrophication. Multi-factor water quality data of the Yongding River in Beijing from 0:00 on July 26, 2021 to 0:00 on September 5, 2021 were selected. First, data smoothing and principal component analysis were carried out to confirm that all factors are correlated to some degree in the occurrence of lake eutrophication. Then, the Improved MIMO-DD-3Q Learning prediction model was used for experimental verification. The results show that the Improved MIMO-DD-3Q Learning model performs well in lake eutrophication prediction.

Citation: Wang L, Ning C, Wang X, Xu J, Zhao Z, Yu J, et al. (2023) Lake eutrophication prediction based on improved MIMO-DD-3Q Learning. PLoS ONE 18(11): e0294278. https://doi.org/10.1371/journal.pone.0294278

Editor: Sani Isah Abba, King Fahd University of Petroleum and Minerals, SAUDI ARABIA

Received: June 7, 2023; Accepted: October 30, 2023; Published: November 14, 2023

Copyright: © 2023 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data are restricted by Beijing Technology and Business University in order to protect sensitive information and avoid public opinion risks. The contact person is Qi Bojian (email: 2424163121@qq.como).

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Rivers and lakes are very important freshwater resources in China and are among the precious resources on which people depend for survival. Recently, with China's rapid socio-economic development and the intensification of human activities, lake eutrophication [1] has become the primary problem in river and lake management in China. The occurrence of lake eutrophication [2] is jointly affected by several kinds of indexes, such as physical and chemical indexes, biochemical indexes and nutrient salt indexes [3]. These include KMnO₄, COD, BOD₅, TOC, NH₃-N, chroma, conductivity, TDS, turbidity, NO₃-N, Chl-a [4] and fluoride. An increase or decrease in these factors affects the eutrophication of a lake [5] and, in turn, the ecological balance of the whole river or lake. In recent years, lake eutrophication of varying degrees has occurred in many rivers and lakes in China and has caused harm. In the past decade, for example, there have been multiple bloom outbreaks in Lake Wu [6], which led to a sudden drop of dissolved oxygen in the water and the death of a large number of fish, resulting in serious lake eutrophication problems [7]. From 2016 to 2018, Chaohu Lake was evaluated with the TLI method, and some of its waters showed mild to moderate eutrophication; predicting lake eutrophication is therefore a necessary research direction [8].

At present, prediction modeling methods [9] for lake eutrophication fall into two main categories: mechanism-driven methods [10] and data-driven methods [11]. Mechanism-driven models of lake eutrophication can be divided into three types. The first is the single nutrient load model, which considers only the limiting factors; this kind of model describes lake eutrophication only vaguely [12] and has great limitations. The second is the multi-nutrient load model, which is affected by geographic location and region and is not suitable for rivers and lakes that cover a large geographic area. The third is the complex dynamic model [13], which combines hydrodynamics [14] with ecosystem changes and describes lake eutrophication through the growth laws and characteristics of physical and chemical indexes [15]. However, this kind of model is complicated to construct and difficult to fit accurately to the actual situation. Therefore, lake eutrophication cannot be predicted accurately on the basis of mechanism alone.

The data-driven modeling method for lake eutrophication prediction analyzes and mines a large amount of historical monitoring data. It does not take into account the physical, chemical and biological relationships among the indicators, nor does it require prior knowledge; it considers only the internal laws hidden in the data of the system. It is therefore widely used in the prediction of lake eutrophication. However, most current prediction methods for lake eutrophication rely on a single data-driven model, such as machine learning models, regression models or grey theory models [16–18], and these models suffer from low prediction accuracy or excessively long prediction time.

Water quality concentration data associated with lake eutrophication are characterized by multiple indexes, temporal correlation, and frequent abrupt changes, so deep learning algorithms that are good at data analysis are generally selected to predict such data [19]. The long short-term memory (LSTM) network can capture the relationship between earlier and later elements in time series data: it forms short-term memory by forgetting some earlier elements while using them to guide later ones, and forms long-term memory by retaining that guidance [20, 21]. In the deep random forest (Deep-RF), key variables are found and ranked in the multi-grained scanning stage, where features are captured with a sliding window, and the features are then fully processed and recorded in the cascade forest stage [22, 23]. Transformer is a neural network with a self-attention mechanism; it can take time series data as the encoder input and predict future values autoregressively in the decoder [24, 25]. However, water quality concentration data related to lake eutrophication are affected by climate, temperature and other factors, and therefore contain abrupt values. This study therefore adopts a time series modeling method suited to multi-factor prediction of lake eutrophication, which combines several traditional single prediction models and applies different prediction algorithms to different periods of the data.

In traditional Reinforcement Learning [26], the Agent interacts with an unknown environment according to the "Exploration-Utilization" (exploration-exploitation) code of conduct. It observes and analyzes the environment through continuous exploration, keeps learning from the rewards and punishments it receives, and finally obtains an optimal decision-making process [27]. When traditional reinforcement learning deals with a specific learning task, the key lies in establishing the Agent's own state space and action space, as well as the way it interacts with the environment, so that the Agent can find the optimal strategy for that task. In the field of lake eutrophication prediction, Deep Reinforcement Learning [28] uses its computing power and deep data mining ability to observe the internal relationships between the various factors. At the same time, it relies on the decision-making ability of Reinforcement Learning and its consideration of long-term returns to optimize a single model and achieve a better prediction effect [29]. Building a deep reinforcement learning model [30, 31] that can handle multiple factors and clearly capture the temporal correlation between data is therefore an urgent problem in the field of lake eutrophication prediction.

Because the existing techniques described above do not handle abrupt changes in the data accurately enough, this study proposes a lake eutrophication prediction method based on the Improved MIMO-DD-3Q Learning model. The method addresses the bias in the prediction results of a single deep prediction model when multi-factor time series data fluctuate greatly. The Reinforcement Learning algorithm is also improved to address the low training efficiency of the Reinforcement Learning Agent and its tendency to fall into local optima [32, 33]. In addition, error correction of the prediction results [34, 35] is carried out to improve the prediction accuracy of the model and to provide a new way of thinking for the field of lake eutrophication prediction.

2 Improved MIMO-DD-3Q Learning

2.1 Construction of the deep Q Learning model

In traditional Q Learning, the Agent learns and updates according to the “Exploration-Utilization” code of conduct. Excessive exploration reduces the Agent's learning efficiency and slows the updating of the Q Learning strategy, while excessive utilization makes the Agent prone to falling into a local optimum, which reduces the accuracy of the Q Learning strategy and greatly increases the training time. To address this problem, the original linear behavior criterion of the Agent is improved in this step. Firstly, the greedy factor ε, a parameter of Q Learning, is defined, and the arctangent function is introduced so that the greedy factor of the Agent changes along a prescribed trend, as follows: (1)

where u denotes the u-th training iteration. The curve of the greedy factor ε against the number of iterations is shown in Fig 1.

Fig 1. Curve of greedy factor with the number of iterations after the introduction of arctangent function.

https://doi.org/10.1371/journal.pone.0294278.g001
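As an illustration of the kind of schedule described above, the following minimal sketch uses an arctangent curve that rises quickly in early training and then flattens toward a fixed value. The exact form of Formula (1) is not reproduced here; the function name `greedy_factor` and the parameters `eps_max` and `scale` are hypothetical choices made only for this example.

```python
import math


def greedy_factor(u, eps_max=0.9, scale=50.0):
    """Hypothetical arctangent schedule for the greedy factor (not Formula (1)).

    u       : index of the current training iteration (u >= 0)
    eps_max : assumed value that the curve approaches as u grows
    scale   : assumed number of iterations at which half of eps_max is reached
    """
    # (2 / pi) * atan(x) maps [0, inf) onto [0, 1), so the result stays below eps_max.
    return eps_max * (2.0 / math.pi) * math.atan(u / scale)


if __name__ == "__main__":
    for u in (0, 10, 50, 200, 1000):
        print(u, round(greedy_factor(u), 4))
```

With this shape the greedy factor is small at the start, so a rule that explores whenever a random draw exceeds ε explores often in early iterations and mostly exploits the Q table later.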

Secondly, the set of deep learning prediction models is defined as the state space S_t of reinforcement learning training, which can be expressed as follows:

S_t = {S_1, S_2, ⋯, S_W}

where S_L is the L-th dynamic prediction model at time t, and W is the number of optional prediction models at time t. The set of predictions made with the prediction model obtained from the current state is defined as the action space A_t of reinforcement learning, which is expressed as follows:

A_t = {A_1, A_2, ⋯, A_K}

where A_L is the action predicted by the L-th prediction model at time t, and K is the number of actions that may occur after the current state is selected and predicted at time t.

After the state space and action space of reinforcement learning are defined as above, the Agent can obtain the current prediction results of multiple indicators after single-step training. To enable deep Q Learning to better solve the Markov decision process, this model defines the reward factor of reinforcement learning as a multi-index mean reward, which is expressed as follows: (2) (3)

where s is the number of prediction indicators, R_ave is the average of the reward values of the s prediction indicators, R_I is the reward factor of the I-th prediction indicator, y_t is the true value of that indicator, and y_p is its predicted value.
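The multi-index mean reward can be sketched as follows. The averaging over the s indicators follows the definition of R_ave above, but the per-indicator reward used here (the negative relative absolute error) is only an assumed stand-in, since the form of R_I is not reproduced in this text; the function name `mean_reward` is likewise hypothetical.

```python
import numpy as np


def mean_reward(y_true, y_pred):
    """Average the per-indicator rewards of one time step.

    y_true, y_pred : sequences of length s (one entry per prediction indicator).
    Assumption: R_I = -|y_t - y_p| / |y_t|, so a perfect prediction earns 0 and
    larger relative errors earn more negative rewards. True values must be nonzero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r_i = -np.abs(y_true - y_pred) / np.abs(y_true)  # assumed R_I, one per indicator
    return float(r_i.mean())                         # R_ave: mean over the s indicators


if __name__ == "__main__":
    print(mean_reward([2.0, 4.0], [1.0, 4.0]))
```

Using a relative error keeps indicators with very different units, such as conductivity and Chl-a, on a comparable scale before averaging, which is one plausible reason to average rewards rather than raw errors.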

2.2 Construction of the MIMO deep 3Q Learning model

At time t, the agent interacts with the environment. According to the actions made at the current time, the average reward value R_ave and the state S′ at time t+1 are obtained. Furthermore, three estimated Q values are obtained according to the state S′ at time t+1, which are expressed as follows: (4)

where Q_i(S′, A′) represents the i-th estimated value of the Q function, A′ is the action selected at time t+1, and S′ is the state at time t+1.

According to the three estimated Q values, the three real Q values at the previous moment are updated. Then the average and the minimum of the three real Q values at the current moment are calculated. A constant is introduced to obtain the weight parameter from this average and minimum, and finally the weighted Q value is calculated to obtain the final Q Learning strategy. The calculation method is shown in Fig 2.

Fig 2. Flow chart of updating Q table after the introduction of weight parameters.

https://doi.org/10.1371/journal.pone.0294278.g002

The estimated Q value is obtained and the Q value of the previous time is updated in the following way: (5) (6) (7) (8) (9)

In the formulas, α is the learning rate of the Agent in Q Learning, γ is the decay coefficient of the Agent in Q Learning, c is a constant, Q_ave(S,A) represents the average of the three Q values in the current state, Q_min(S,A) represents the minimum of the three Q values in the current state, λ_(S,A) represents the weight parameter in the current state, and Q*(S,A) represents the final Q Learning strategy in the current state. In Formula (5), max_A′ Q_i(S′,A′) denotes the i-th Q estimate selected by maximizing over A′, i = 1, 2, 3.
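The update described above can be sketched in tabular form as follows. Because Formulas (5) to (9) are not reproduced in this text, the target, the weight λ and the final combination below are assumptions chosen to match the verbal description (three estimates, their average and minimum, and a constant c); the function name `triple_q_update` is hypothetical.

```python
import numpy as np


def triple_q_update(q_tables, s, a, r_ave, s_next, alpha=0.1, gamma=0.9, c=1.0):
    """One assumed update step for three Q tables plus a weighted combination.

    q_tables : list of three arrays of shape (n_states, n_actions), updated in place
    s, a     : current state index and action index
    r_ave    : mean reward obtained for the action
    s_next   : state index at time t+1
    Returns (q_star, lam): the weighted Q value for (s, a) and the weight used.
    """
    # Each estimator bootstraps from its own maximum over the next actions.
    for q in q_tables:
        target = r_ave + gamma * q[s_next].max()
        q[s, a] += alpha * (target - q[s, a])

    values = np.array([q[s, a] for q in q_tables])
    q_ave, q_min = values.mean(), values.min()

    # Assumed weight: grows toward 1 as the average and the minimum drift apart.
    spread = abs(q_ave - q_min)
    lam = spread / (c + spread)

    # Assumed combination: lean on the minimum when the estimates disagree.
    q_star = (1.0 - lam) * q_ave + lam * q_min
    return q_star, lam


if __name__ == "__main__":
    tables = [np.zeros((2, 2)) for _ in range(3)]
    tables[2][1, 0] = 10.0  # make the third estimator optimistic about the next state
    print(triple_q_update(tables, s=0, a=1, r_ave=1.0, s_next=1, alpha=1.0))
```

Pulling the combined value toward the minimum when the three estimates diverge is one common way to damp the overestimation that a single max-based Q estimate tends to produce.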

2.3 Construction of the MIMO-DD-3Q Learning model

The preliminary prediction results are obtained with the Improved MIMO-DD-3Q Learning model. Error values are then computed from these results, and multiple groups of error data are used as the second input of the deep Q Learning model. The Improved MIMO-DD-3Q Learning model is then constructed through a second round of improved deep 3Q Learning training, which improves the accuracy of the model and yields the final prediction. The structure of the Improved MIMO-DD-3Q Learning model is shown in Fig 3.

Fig 3. Structure diagram of Improved MIMO-DD-3Q Learning model.

https://doi.org/10.1371/journal.pone.0294278.g003

The specific steps for constructing the Improved MIMO-DD-3Q Learning model are as follows:

1. Set the training count threshold of the model, the initial state model set M of Dual 3Q Learning, the learning rate α, and the decay factor γ that balances future rewards. In this study, the state model set is M = {LSTM network, Deep-RF network, Transformer network}. Take the filtered water quality time series data (see Section 3.2.1) as input, and set the sample time series length to Nt.
2. Initialize Q*(S,A), define a probability parameter σ∈(0,1), and start learning. When σ>ε, select action A_t randomly. Otherwise, select action A_t = argmax Q′ according to the Q table, where Q′ represents the Q value of each predicted action; that is, the action with the largest Q value is executed.
3. Carry out the first round of training. In each single time step, update the reward value R_ave by Formula (2), update Q*(S,A) by Formula (9), and obtain the prediction error from the preliminary prediction result of the selected prediction model.
4. After u rounds of training, the prediction of the sample time series is complete and the optimal strategy Q*(S,A) is obtained, together with the prediction model set, the preliminary prediction result set and the prediction error set.
5. Initialize the parameters of the Improved MIMO-DD-3Q Learning model and input the prediction error set into the model for the second round of training. The process executed in a single time step is the same as in Step 3.
6. After the second round of training reaches the threshold, the optimal strategy is obtained, and the prediction model set and the corrected error set are output. In each single time step, the corrected error is added to the preliminary prediction result to obtain the predicted index values for that time step.
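Two pieces of the procedure above, the action selection rule of Step 2 and the error compensation of Step 6, can be sketched as follows. This is a minimal illustration rather than the training loop itself; the function names `select_action` and `compensate` are hypothetical, and the sign convention error = true value minus predicted value is an assumption.

```python
import numpy as np


def select_action(q_row, eps, rng):
    """Step 2: draw sigma in [0, 1); explore if sigma > eps, otherwise exploit.

    q_row : Q values of the candidate prediction models in the current state
    eps   : current greedy factor
    rng   : numpy random Generator
    """
    sigma = rng.random()
    if sigma > eps:
        return int(rng.integers(len(q_row)))  # random model index (exploration)
    return int(np.argmax(q_row))              # model with the largest Q value


def compensate(preliminary, corrected_error):
    """Step 6: add the corrected error to the preliminary prediction.

    Assumes the error was defined as (true value - predicted value), so adding
    the predicted error moves the preliminary result toward the true value.
    """
    return np.asarray(preliminary, dtype=float) + np.asarray(corrected_error, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    models = ["LSTM", "Deep-RF", "Transformer"]
    q_row = np.array([0.1, 0.9, 0.3])
    print(models[select_action(q_row, eps=0.8, rng=rng)])
    print(compensate([1.0, 2.0], [0.5, -0.5]))
```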

The Improved MIMO-DD-3Q Learning model parameters stored after the two rounds of training are used to predict water eutrophication, and the model can improve the prediction accuracy of eutrophication data. To find the optimal Q learning strategy for single-step prediction, the LSTM network, Deep-RF network and Transformer network are used in this step as the state set of the Improved MIMO-DD-3Q Learning model, and each training round of the model is a complete time series prediction.

In MIMO-DD-3Q Learning, the deep model used for prediction at the current moment is selected according to the Q learning strategy in each single time step, and the reward R_ave obtained after the action and the state of the next moment are recorded. At the same time, the preliminary prediction result, the selected prediction model and the error at that moment are obtained. After the first round of MIMO-DD-3Q Learning is complete, the model parameters are re-initialized and the input data are replaced with the error set to obtain a new Q learning strategy. The remaining operations are repeated as before. Together, these steps constitute the prediction process of the Improved MIMO-DD-3Q Learning model for lake eutrophication and yield the final prediction result.

When the trained Improved MIMO-DD-3Q Learning model is used for real-time lake eutrophication prediction, the currently collected water quality time series data, including the eutrophication prediction indexes, are filtered and input into the model. The preliminary prediction result set and the prediction error set are obtained through the first prediction, and the corrected error set is obtained through the second prediction. The corrected error is added to the preliminary prediction result of the corresponding time step, and the final value of each prediction index representing lake eutrophication is output. The complete prediction process is shown in Fig 4.

Fig 4. Lake eutrophication prediction flow chart with Improved MIMO-DD-3Q Learning.

https://doi.org/10.1371/journal.pone.0294278.g004

3 Experimental verification

3.1 Data set

Twelve eutrophication prediction indexes of a river (KMnO₄, COD, BOD₅, TOC, NH₃-N, chroma, conductivity, TDS, turbidity, NO₃-N, Chl-a and fluoride) were taken as examples, and the method proposed in this study was used to predict eutrophication of the water body. The data obtained were screened and processed. The selected time span is from 0:00 on July 26, 2021 to 0:00 on September 5, 2021, with one sample per hour, giving 1008 groups of data with 12 features each. In the overall experiment, the first 900 groups of data were used as the training set of the model, and the first 90 groups of the remaining data were used as the test set. Specific data are shown in Table 1:

Table 1. Experimental data set.

https://doi.org/10.1371/journal.pone.0294278.t001
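The split described in Section 3.1 amounts to simple row slicing on a 1008 × 12 array. The sketch below uses a zero-filled placeholder in place of the restricted measurements; the variable names are hypothetical.

```python
import numpy as np

N_SAMPLES, N_FEATURES = 1008, 12             # hourly samples x prediction indexes

# Placeholder array standing in for the real (restricted) measurements.
data = np.zeros((N_SAMPLES, N_FEATURES))

train = data[:900]       # first 900 groups: training set
test = data[900:990]     # first 90 groups of the remaining 108: test set

print(train.shape, test.shape)
```

The split is chronological rather than shuffled, which keeps the test period strictly after the training period, as a time series forecast requires.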

3.2 RAF and PCA

3.2.1 Recursive average filtering.

The water quality data of the study area were collected and the concentration values of the various factors were measured. This study included 12 factors: KMnO₄, COD, BOD₅, TOC, NH₃-N, chroma, conductivity, TDS, turbidity, NO₃-N, Chl-a and fluoride. The prediction index values of the water body were measured at different time points to obtain the water quality time series data. Each measurement sample contains the concentration values of these factors measured at one point in time.

Lake eutrophication is produced by the joint action of many factors. The multi-factor data are affected by noise during measurement, and this noise is approximately Gaussian. Since the noise in the data affects the performance of the prediction model, the data are first smoothed by means of a recursive average. Two sequences a and b are constructed first, which are shown as follows: (10) (11)

Where, l represents the length of sequence a. The larger the value of l is, the smoother the data will be. n is the number of samples, and N_n is the n-th sample.

Convolving the two sets of sequences a and b gives the smoothed sequence b′, which is expressed as follows: (12)
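A minimal sketch of this smoothing step is given below. It assumes that sequence a is a uniform averaging window of length l and that sequence b holds the n raw samples, which is one natural reading of the description; the exact definitions in Formulas (10) and (11) are not reproduced here, and the function name `recursive_average` is hypothetical.

```python
import numpy as np


def recursive_average(samples, l=5):
    """Smooth one factor's time series by convolving it with an averaging window.

    samples : raw measurements N_1 ... N_n of a single factor (sequence b)
    l       : window length (sequence a is assumed to be l copies of 1/l);
              a larger l gives a smoother result
    """
    a = np.full(l, 1.0 / l)
    b = np.asarray(samples, dtype=float)
    # mode="same" keeps the output as long as the input; values near the two
    # ends are averaged against implicit zeros and are therefore biased low.
    return np.convolve(b, a, mode="same")


if __name__ == "__main__":
    noisy = [0.0, 3.0, 0.0, 3.0, 0.0]
    print(recursive_average(noisy, l=3))
```

In practice the first and last few smoothed points would either be discarded or handled with edge padding, since the zero padding used by the convolution distorts them.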

The smoothed data sample is obtained, and the data comparison before and after de-noising for some factors is shown in Figs 5 and 6.

Fig 5. Comparison before and after the removal of Chl-a data noise.

https://doi.org/10.1371/journal.pone.0294278.g005

Fig 6. Comparison of fluoride data before and after noise removal.

https://doi.org/10.1371/journal.pone.0294278.g006

3.2.2 Principal component analysis.

For multi-index, multidimensional data, principal component analysis can not only reduce and simplify the data but also assess the correlation between the indexes. After the smoothed data are obtained by the above method, the initial data matrix B is established from the data samples and the selected prediction indicators of water eutrophication, with the s prediction indicators along the horizontal direction and the v data samples along the vertical direction. The matrix is shown as follows: (13)

The data matrix B is then standardized; the matrix elements are calculated as follows: (14)

The KMO and Bartlett tests are then used to check whether the data matrix is suitable for principal component analysis; a KMO value greater than 0.5 meets the criterion. If the criterion is met, the principal components are extracted, and the final number of principal components is determined by requiring eigenvalues close to one and a total eigenvalue contribution rate greater than 85%. The s eigenvalues of the standardized matrix are calculated, and the eigenvalue contribution rates are computed as follows: (15) (16)

where e_k is the variance interpretation rate of the k-th eigenvalue λ_k, and E_p is the sum of the variance interpretation rates of the first p eigenvalues. The number of principal components can be determined according to the obtained results. The component matrix table is obtained with SPSS (Statistical Package for the Social Sciences). The closer the absolute value of the load coefficient of an index in a principal component is to 1, the more that index explains the principal component. Through all the above processes, the principal component calculation formula is finally determined as follows: (17)

In the formula, F_i represents the i-th principal component, and there are s prediction indexes of water eutrophication in total. X_k is the k-th prediction index, and a total of p principal components are determined. m_ik is the load coefficient of the k-th prediction index in the i-th principal component.

The principal component analysis indicates that all factors are related in the prediction of water eutrophication, although the strength of the correlation varies. The prediction indexes with a high interpretation rate among the p principal components can be selected according to the actual situation. In this study, all 12 predictors were selected.
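The chain from standardization to principal component scores can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the SPSS procedure: it uses z-score standardization, takes the eigenvectors of the correlation matrix directly as the coefficients m_ik, applies only the 85% cumulative contribution criterion, and omits the KMO and Bartlett tests. The function name `pca_summary` is hypothetical.

```python
import numpy as np


def pca_summary(B, threshold=0.85):
    """Standardize B (v samples x s indicators) and extract principal components.

    Returns (e, E, p, scores):
      e      : contribution rate of each eigenvalue, e_k = lambda_k / sum(lambda)
      E      : cumulative contribution rates, E_p = e_1 + ... + e_p
      p      : smallest number of components whose cumulative rate >= threshold
      scores : component values F_1 ... F_p for every sample
    """
    B = np.asarray(B, dtype=float)
    Z = (B - B.mean(axis=0)) / B.std(axis=0, ddof=1)   # assumed z-score standardization
    corr = np.corrcoef(Z, rowvar=False)                # s x s correlation matrix

    eigvals, eigvecs = np.linalg.eigh(corr)            # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]                  # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    e = eigvals / eigvals.sum()
    E = np.cumsum(e)
    p = int(np.searchsorted(E, threshold) + 1)         # first index where E >= threshold

    scores = Z @ eigvecs[:, :p]                        # F_i = sum_k m_ik * X_k (assumed m_ik)
    return e, E, p, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    demo = np.column_stack([x, 2 * x + 0.01 * rng.normal(size=200), rng.normal(size=200)])
    e, E, p, scores = pca_summary(demo)
    print(np.round(e, 3), p, scores.shape)
```

In the synthetic demo two of the three columns are almost perfectly correlated, so two components are enough to pass the 85% threshold.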

The initial data matrix was constructed according to the data sample and the eutrophication prediction index of water body. After standardization processing, KMO value and Bartlett value were obtained to prove that the data sample was suitable for principal component analysis. The results are shown in Table 2:

Table 2. Principal component analysis table.

https://doi.org/10.1371/journal.pone.0294278.t002

The eigenvalue and contribution rate of eigenvalue were calculated to determine the number of final principal components. The results are shown in Table 3:

Table 3. Interpretation of data population variance.

https://doi.org/10.1371/journal.pone.0294278.t003

It can be determined from Table 3 that the number of principal components is 5. According to the obtained factor loading coefficient, the ratio coefficients of different indicators in each principal component were analyzed, and the importance of the index was determined according to the common degree (common variance factor), as shown in Table 4 below.

Table 4. Factor load coefficient table.

https://doi.org/10.1371/journal.pone.0294278.t004

3.3 Model evaluation index

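MAE, RMSE and MAPE were used to evaluate the model results.

Define the predicted values as Q̂ = {q̂₁, q̂₂, ⋯, q̂_n} and the true values as Q = {q₁, q₂, ⋯, q_n}.

MAE (Mean Absolute Error) is calculated as follows:

MAE = (1/n) Σ_{i=1}^{n} |q̂_i − q_i|

RMSE (Root Mean Square Error) is calculated as follows:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (q̂_i − q_i)² )

MAPE (Mean Absolute Percentage Error) is calculated as follows:

MAPE = (100%/n) Σ_{i=1}^{n} |(q̂_i − q_i) / q_i|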
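The three evaluation indexes can be computed with the standard textbook definitions, as in the short sketch below. The function names are hypothetical, MAPE is expressed in percent, and the true values are assumed to be nonzero.

```python
import numpy as np


def mae(q_true, q_pred):
    """Mean Absolute Error."""
    q_true, q_pred = np.asarray(q_true, dtype=float), np.asarray(q_pred, dtype=float)
    return float(np.mean(np.abs(q_pred - q_true)))


def rmse(q_true, q_pred):
    """Root Mean Square Error."""
    q_true, q_pred = np.asarray(q_true, dtype=float), np.asarray(q_pred, dtype=float)
    return float(np.sqrt(np.mean((q_pred - q_true) ** 2)))


def mape(q_true, q_pred):
    """Mean Absolute Percentage Error, in percent; requires nonzero true values."""
    q_true, q_pred = np.asarray(q_true, dtype=float), np.asarray(q_pred, dtype=float)
    return float(np.mean(np.abs((q_pred - q_true) / q_true)) * 100.0)


if __name__ == "__main__":
    true_vals = [1.0, 2.0, 4.0]
    pred_vals = [2.0, 2.0, 2.0]
    print(mae(true_vals, pred_vals), rmse(true_vals, pred_vals), mape(true_vals, pred_vals))
```

MAE and RMSE share the units of the predicted index, with RMSE penalizing large misses more heavily, while MAPE is scale-free and therefore comparable across indexes such as Chl-a and fluoride.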

4 Result and discussion

4.1 Performance comparison of MIMO-DD-3Q Learning

Firstly, the nonlinear greedy factor is introduced to strengthen the "Exploration" process of the Agent in the early stage of Q Learning, so that the Q table is updated quickly. As the number of iterations increases, the curve of the greedy factor gradually flattens and approaches a fixed value. At this stage, the Agent can carry out the "Utilization" process of Q Learning to the greatest extent. For selected numbers of training iterations, the Q table update convergence time with the nonlinear greedy factor is compared with that with a fixed greedy factor, as shown in Table 5 below:

Table 5. Update convergence time of some training times.

https://doi.org/10.1371/journal.pone.0294278.t005

First, the Agent either selects the action according to the Q table or selects an action randomly, with the choice governed by the greedy factor ε. The reward value obtained after prediction and the three estimated Q values at the next moment are used to obtain the three actual Q values at the current moment. The weight parameter of the final Q value is then obtained from the average and the minimum of these values, and the final Q value is updated. Part of the Q table obtained for different iterations and several consecutive time points of the time series data used in this paper is shown in Table 6 below:

Table 6. Q table of 3Q Learning with the introduction of weight parameters (part).

https://doi.org/10.1371/journal.pone.0294278.t006

4.2 Comparison of prediction results of lake eutrophication

First of all, the LSTM network, Deep-RF network and Transformer network are selected as the prediction models for Agent action selection, namely the initial state set. Then the improved parameters and the Q table are initialized, so that the Agent can start to learn according to the given "Exploration-Utilization" rule. The Q value of the current moment is then updated according to the obtained reward factor and the estimated value of the next moment. As the number of iterations increases, the optimal learning strategy is obtained. Actions are selected according to the current optimal strategy, and the prediction result of the One-Dual 3Q Learning is obtained. The Agent then learns again and finds the optimal learning strategy, so as to obtain the final prediction result. Taking the period from 00:00 to 12:00 on August 1 as an example, the prediction models selected by the model are shown in Table 7 below:

Table 7. Results of 3Q Learning selection model in different time periods.

https://doi.org/10.1371/journal.pone.0294278.t007

The lake eutrophication indexes are taken as the input, the model selects actions and makes predictions, and the reward factor is calculated from the obtained results. The model parameters are updated accordingly until the optimal strategy is found, and the final prediction result is then obtained through the optimal strategy. The predicted values of the LSTM, Deep-RF, Transformer, 3Q Learning and DD-3Q Learning models are compared with the true value curves, and the results are shown in Fig 7. The predicted and true value curves of each model from time node 21 to time node 40 are shown in Fig 8. The evaluation indicators of each model are shown in Table 8 below:

Fig 7. Curves of true and predicted values of five models of Chl-a concentration and fluoride concentration.

https://doi.org/10.1371/journal.pone.0294278.g007

Fig 8. Curve of Chl-a concentration and fluoride concentration at time nodes 21–40.

https://doi.org/10.1371/journal.pone.0294278.g008

Table 8. Model evaluation indicators.

https://doi.org/10.1371/journal.pone.0294278.t008

In Fig 8, "target" denotes the target (true value) curve, and DD-3Q Learning denotes the method of this study. As can be seen from the figure, compared with the prediction curves of the LSTM model, Deep-RF model, Transformer model and 3Q Learning model, the prediction curves of Chl-a concentration and fluoride concentration obtained with this method are closer to the corresponding target curves. The evaluation indicators in Table 8 show that, for this method, the error between the predicted and true values is the smallest and the accuracy is the highest.

4.3 Discussion

The purpose of this experiment is to verify that the Improved MIMO-DD-3Q Learning model proposed in this study is significantly better than the LSTM, Deep-RF, Transformer and One Dual 3Q Learning models in predicting lake eutrophication, and that the improved Q Learning algorithm raises the efficiency of Q Learning training. Taking the prediction results of the Chl-a concentration index and the fluoride index as examples, the following observations can be made from the locally enlarged curves of predicted and true values in Fig 8 and the three error measures of each model in Table 8. Firstly, using only the LSTM model for water eutrophication prediction results in the largest error. Secondly, using only the Transformer model reduces the error slightly compared with the LSTM model, but the Transformer model has larger errors at some moments of abrupt data change. Among the single prediction models, the prediction error of the Deep-RF model is smaller than that of the other two. For the One Dual 3Q Learning model, the error is significantly lower than that of the three single models. Finally, the prediction error of the Improved MIMO-DD-3Q Learning model proposed in this study is the lowest, with a clear further reduction.

5 Conclusion

This paper takes multi-factor water quality data that may cause lake eutrophication as the research object, analyzes the influence of each index on the water eutrophication phenomenon, improves the existing reinforcement learning algorithm, and proposes a lake eutrophication prediction method based on Improving MIMO-DD-3Q Learning model. The following conclusions are obtained through the example verification of the water quality monitoring data of Yongding River in Beijing.

For the prediction of lake eutrophication, accurately predicting strongly fluctuating data is both necessary and difficult. Traditional single deep prediction models each have strengths and weaknesses in the steep and gentle regions of the data, and introducing Q Learning makes it possible to combine the strengths of multiple prediction models. At the same time, by exploiting the decision-making ability of reinforcement learning and its consideration of long-term returns, multi-factor correlation and multi-model combination for lake eutrophication are modeled in a unified way.
To address the low training efficiency of the Q Learning model and its tendency to fall into local optima, the Q Learning algorithm is improved. A greedy factor based on the arctangent function is proposed so that the Agent can fully explore the environment in the early stage, and three Q estimates are introduced to update the Q table, yielding the final Improved MIMO-DD-3Q Learning model. These changes improve the training efficiency of the model and reduce, as far as possible, the likelihood of it falling into a local optimum.
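The two improvements summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact arctangent schedule, the weight on the average and minimum of the three estimates, and the choice of which table to update are all assumptions here, and every name (`arctan_epsilon`, `TripleQAgent`, `k`, `w`) is hypothetical.

```python
import math
import random

def arctan_epsilon(step, eps_max=1.0, eps_min=0.05, k=0.01):
    """Exploration rate: close to eps_max early, decaying smoothly toward eps_min.

    Assumed form; the schedule used in the paper may differ.
    """
    decay = (2.0 / math.pi) * math.atan(k * step)  # rises from 0 toward 1
    return eps_min + (eps_max - eps_min) * (1.0 - decay)

class TripleQAgent:
    """Tabular agent keeping three Q estimates per (state, action) pair."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, w=0.5, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.w = alpha, gamma, w
        self.rng = random.Random(seed)
        # q[i][s][a] is the i-th estimate for state s and action a.
        self.q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(3)]

    def combined(self, state):
        """Blend the average and the minimum of the three estimates, per action."""
        values = []
        for a in range(self.n_actions):
            est = [table[state][a] for table in self.q]
            values.append(self.w * sum(est) / 3.0 + (1.0 - self.w) * min(est))
        return values

    def greedy(self, state):
        """Action with the highest combined value."""
        values = self.combined(state)
        return max(range(self.n_actions), key=values.__getitem__)

    def act(self, state, step):
        """Epsilon-greedy choice using the arctangent schedule."""
        if self.rng.random() < arctan_epsilon(step):
            return self.rng.randrange(self.n_actions)
        return self.greedy(state)

    def update(self, state, action, reward, next_state, table_index=None):
        """Move one estimate (random unless table_index is given) toward the target."""
        if table_index is None:
            table_index = self.rng.randrange(3)
        target = reward + self.gamma * max(self.combined(next_state))
        table = self.q[table_index]
        table[state][action] += self.alpha * (target - table[state][action])

# Hypothetical usage: actions index the base predictors, reward is a negative error.
models = ["LSTM", "Deep-RF", "Transformer"]
agent = TripleQAgent(n_states=1, n_actions=len(models))
action = agent.act(state=0, step=0)
agent.update(state=0, action=action, reward=-0.2, next_state=0)
```

Including the minimum of the three estimates in the target makes the value estimate more conservative than a single-table maximum, which is one common way to limit overestimation. The arctangent schedule keeps the exploration rate high at first and lowers it gradually rather than abruptly.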

References

1. Rose Gregersen,Howarth Jamie D,Atalah Javier,Pearman John K,Waters SeanLi Xun, et al. Paleo-diatom records reveal ecological change not detected using traditional measures of lake eutrophication.[J]. The Science of the total environment,2023.
- View Article
- Google Scholar
2. Wen ShuaiLong,Lu YueHan,Luo ChunYan,An ShiLin,Dai JiaRu,Liu ZhengWen, et al. Adsorption of humic acids to lake sediments: Compositional fractionation, inhibitory effect of phosphate, and implications for lake eutrophication[J]. Journal of Hazardous Materials,2022,433.
- View Article
- Google Scholar
3. Liu Yi,Chen Jining,Mol Arthur P. J. Evaluation of Phosphorus Flows in the Dianchi Watershed, Southwest of China[J]. Population and Environment,2004,25(6).
- View Article
- Google Scholar
4. Yamamoto Ren,Harada Masayoshi,Hiramatsu Kazuaki,Tabata Toshinori. Three-layered Feedforward artificial neural network with dropout for short-term prediction of class-differentiated Chl-a based on weekly water-quality observations in a eutrophic agricultural reservoir[J]. Paddy and Water Environment,2021(prepublish).
- View Article
- Google Scholar
5. Schallenberg Marc. The application of stressor–response relationships in the management of lake eutrophication[J]. Inland Waters,2020.
- View Article
- Google Scholar
6. Bruns Nicholas E., Heffernan James B., Ross Matthew R. V., Doyle Martin. A simple metric for predicting the timing of river phytoplankton blooms[J]. Ecosphere,2022,13(12).
- View Article
- Google Scholar
7. Hailu Sheferaw Ayele,Minaleshewa Atlabachew. Review of characterization, factors, impacts, and solutions of Lake eutrophication: lesson for lake Tana, Ethiopia[J]. Environmental Science and Pollution Research,2021,28(12).
- View Article
- Google Scholar
8. Biswajit Bhagowati, Bishal Talukdar, Narzary Binanda Khungur Ahamad Kamal Uddin. Prediction of lake eutrophication using ANN and ANFIS by artificial simulation of lake ecosystem[J]. Modeling Earth Systems and Environment,2022,8(4).
- View Article
- Google Scholar
9. Lian Ming Zhao,Bo Zeng. Prediction Modeling Method of Interval Grey Number Based on Different Type Whitenization Weight Functions[J]. Applied Mechanics and Materials,2013,2700(411–414).
- View Article
- Google Scholar
10. Mingyang Wang,Enzhi Wang,Xiaoli Liu,Congcong Wang. Topological graph representation of stratigraphic properties of spatial-geological characteristics and compression modulus prediction by mechanism-driven learning[J]. Computers and Geotechnics,2023,153.
- View Article
- Google Scholar
11. van der Schoot L S,van den Reek J M P A. Data-driven prediction of biologic treatment responses in psoriasis: steps towards precision medicine.[J]. The British journal of dermatology,2021,185(4).
- View Article
- Google Scholar
12. Derevenskaya O.Y.,Unkovskaya E.N.,Mingazova N.M. Zooplankton under the Conditions of Lake Eutrophication and Acidification[J]. Uchenye Zapiski Kazanskogo Universiteta Seriya Estestvennye Nauki,2019,161(4).
- View Article
- Google Scholar
13. Jingwen Yan, Donghao Jin, Xin Liu, Chaoqun Zhang, Heyang Wang. A coupled combustion and hydrodynamic model for the prediction of waterwall tube overheating of supercritical boiler[J]. Fuel,2023,334(P1).
- View Article
- Google Scholar
14. Mardani Neda,Suara Kabir,Fairweather Helen,Brown Richard,Adrian McCallum,Roy C. Sidle. Improving the Accuracy of Hydrodynamic Model Predictions Using Lagrangian Calibration[J]. Water,2020,12(2).
- View Article
- Google Scholar
15. Huang Jiacong,Gao Junfeng,Zhang Yinjun. Eutrophication Prediction Using a Markov Chain Model: Application to Lakes in the Yangtze River Basin, China[J]. Environmental Modeling & Assessment,2016,21(2).
- View Article
- Google Scholar
16. Zufei Li,Shuo Ding,Qi Zhong,Jugao Fang,Junwei Huang,Zhigang Huang, et al. A machine learning model for three years survival state prediction of HPSCC patients via multi parameters.[J]. The Journal of laryngology and otology,2023.
- View Article
- Google Scholar
17. Yanzhong Wang,Kai Zhang,Xiaopeng Ma,Piyang Liu,Haochen Wang,Xin Guo, et al. A physics-guided autoregressive model for saturation sequence prediction[J]. Geoenergy Science and Engineering,2023,221.
- View Article
- Google Scholar
18. Huiming Duan,Xinyu Pang. A novel grey prediction model with system structure based on energy background: A case study of Chinese electricity[J]. Journal of Cleaner Production,2023,390.
- View Article
- Google Scholar
19. Zhang Ming yan Du Xu, Hung Jui Long Li Hao, Liu Meng fan Tang Hengtao. Analyzing and Interpreting Students Self-regulated Learning Patterns Combining Time-series Feature Extraction, Segmentation, and Clustering[J]. Journal of Educational Computing Research,2022,60(5).
- View Article
- Google Scholar
20. Xue Zuo,Rui Zhu,Yuankai Zhou. Online tracking and prediction of slip ring degradation using chaos theory based on LSTM neural network[J]. Measurement Science and Technology,2023,34(5).
- View Article
- Google Scholar
21. Yang Cao,Yunsheng Qian,Jiawei Zhang,Yanan Wang,Yizheng Lang. An LSTM-based adaptive prediction control model for the wire diameter control of high-precision optical fiber drawing machines[J]. Optical Fiber Technology,2023,77.
- View Article
- Google Scholar
22. Diyuan Li, Zida Liu, Armaghani Danial Jahed Xiao Peng, Jian Zhou. Novel Ensemble Tree Solution for Rock burst Prediction Using Deep Forest[J]. Mathematics,2022,10(5).
- View Article
- Google Scholar
23. Chang Yang. Gas Concentration Prediction Method Based on Denoising Deep Forest[J]. Journal of Physics: Conference Series,2022,2303(1).
- View Article
- Google Scholar
24. Wu Xin, Li Jian, Huang Qi. Big Data-Based Transformer Substation Fault Prediction Method[J]. Journal of Electronic Science and Technology,2021,19(02):173–185.
- View Article
- Google Scholar
25. Yong Zhou,Yizhuo Li,Dengjia Wang,Yanfeng Liu. A multi-step ahead global solar radiation prediction method using an attention-based transformer model with an interpretable mechanism[J]. International Journal of Hydrogen Energy,2023,48(40).
- View Article
- Google Scholar
26. Matthew Overlin,Steven Iannucci,Bradly Wilkins,Alexander McBain,Jason Provancher. Reinforcement learned adversarial agent (ReLAA) for active fault detection and prediction in space habitats.[J]. NPJ microgravity,2023,9(1).
- View Article
- Google Scholar
27. Qingang Zhang,Mahbod Muhammad Haiqal Bin,Chng ChinBoon,Lee PohSeng,Chui CheeKong. Residual Physics and Post-Posed Shielding for Safe Deep Reinforcement Learning Method.[J]. IEEE transactions on cybernetics,2022,PP.
- View Article
- Google Scholar
28. Zhuang Wang,Yi Ai,Qinghai Zuo,Shaowu Zhou,Hui Li. A Policy-Reuse Algorithm Based on Destination Position Prediction for Aircraft Guidance Using Deep Reinforcement Learning[J]. Aerospace,2022,9(11).
- View Article
- Google Scholar
29. DU Yong Ping JIN Xing Nan, HAN Hong Gui, WANG Lu Lin. Reusable electronic products value prediction based on reinforcement learning[J]. Science China(Technological Sciences),2022,65(07):1578–1586.
- View Article
- Google Scholar
30. Balamurugan Nagaiah Mohanan Adimoolam Malaiyalathan, Alsharif Mohammed H., Uthansakul Peerapong. A Novel Method for Improved Network Traffic Prediction Using Enhanced Deep Reinforcement Learning Algorithm[J]. Sensors,2022,22(13).
- View Article
- Google Scholar
31. Wei Jin, Qiming Fu, Jianping Chen, Yunzhe Wang, Lanhui Liu, You Lu, et al. A novel building energy consumption prediction method using deep reinforcement learning with consideration of fluctuation points[J]. Journal of Building Engineering,2023,63(PA).
- View Article
- Google Scholar
32. Xie Ya. Study on Multi-Agent Q Learning Based on Prediction[J]. International Review on Computers and Software IRECOS,2013,8(4).
- View Article
- Google Scholar
33. Alexander Kensert,Gilles Collaerts,Kyriakos Efthymiadis,Gert Desmet,Deirdre Cabooter. Deep Q-learning for the selection of optimal isocratic scouting runs in liquid chromatography[J]. Journal of Chromatography A,2021,1638(prepublish).
- View Article
- Google Scholar
34. Sellamuthu Kandasamy,Vishnu Kumar Kaliappan. Q-Learning-Based Pesticide Contamination Prediction in Vegetables and Fruits[J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING,2023,45(1).
- View Article
- Google Scholar
35. Hong Yang,Yuanxun Cheng,Guohui Li. A new traffic flow prediction model based on cosine similarity variational mode decomposition, extreme learning machine and iterative error compensation strategy[J]. Engineering Applications of Artificial Intelligence,2022,115.
- View Article
- Google Scholar

