Online soft measurement method for chemical oxygen demand based on CNN-BiLSTM-Attention algorithm

  • Libo Liu,

    Roles Writing – review & editing

    Affiliation School of Environmental and Chemical Engineering, Shenyang University of Technology, Shenyang, China

  • Xueyong Tian ,

    Roles Writing – original draft

    tianxueyong@sut.edu.cn

    Affiliation School of Environmental and Chemical Engineering, Shenyang University of Technology, Shenyang, China

  • Yongguang Ma,

    Roles Methodology

    Affiliation School of Environmental and Chemical Engineering, Shenyang University of Technology, Shenyang, China

  • Wenxia Lu,

    Roles Validation

    Affiliation School of Environmental and Chemical Engineering, Shenyang University of Technology, Shenyang, China

  • Yuanqing Luo

    Roles Resources, Visualization

    Affiliation School of Environmental and Chemical Engineering, Shenyang University of Technology, Shenyang, China

Abstract

The measurement of chemical oxygen demand (COD) is very important in sewage treatment. The COD value reflects the effectiveness and trend of sewage treatment to a certain extent, but obtaining accurate data is costly and labor-intensive. To solve this problem, this paper proposes an online soft measurement method for COD based on a Convolutional Neural Network-Bidirectional Long Short-Term Memory Network-Attention Mechanism (CNN-BiLSTM-Attention) algorithm. First, by analyzing the mechanism of the aerobic tank stage in the Anaerobic-Anoxic-Oxic (A2O) wastewater treatment process, the selection range of input variables was preliminarily determined, and the collected sample dataset was subjected to correlation analysis. As a result, pH, dissolved oxygen (DO), electrical conductivity (EC), and water temperature (T) were selected as input variables for soft measurement prediction of COD. Then, based on the feature extraction ability of CNN and the ability of BiLSTM to capture both backward and forward dependencies in time series data, combined with an attention mechanism that assigns higher weights to key data, a CNN-BiLSTM-Attention model was established to soft measure COD in the effluent from the aerobic zone of the A2O wastewater treatment process. Four indicators, root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and coefficient of determination (R2), were used to evaluate the model, and the results showed that the model predicts the COD value accurately. In addition, a comparison with CNN-LSTM-Attention, CNN-BiLSTM, CNN-LSTM, LSTM, RNN, BP, SVM, XGBoost, and RF models showed that the CNN-BiLSTM-Attention model performed best, demonstrating the superiority of the proposed model. The Wilcoxon signed-rank test indicates significant differences between the CNN-BiLSTM-Attention model and the other models.

1. Introduction

With the continuous advancement of industrial processes, countries around the world are paying more and more attention to water resource protection issues and investing a large amount of resources to support research on water environment pollution monitoring and control [1, 2]. As is well known, Chemical Oxygen Demand (COD) is an important indicator of the degree of water pollution [3]. The most commonly used method for measuring COD at present is rapid digestion spectrophotometry. Although this method has high detection accuracy, it has problems such as long instrument preparation time and complex operation process, which cannot meet the needs of workers to grasp COD values in real time [4]. Evolving soft measurement technology offers an alternative approach to this problem [5, 6].

Soft measurement technology is a method of using computer technology in industrial processes to estimate, predict, and monitor key process variables in real time [7]. These variables are often not directly measurable or can only be obtained through limited, discrete measurement data. The main goal of soft measurement technology is to estimate unknown variables using known process variable data through models and algorithms, in order to achieve process optimization, quality control, and automation [8]. Using soft measurement technology, the mechanism and correlation analysis of COD are carried out, and indicator parameters that can be easily and accurately measured are selected as input variables. A soft measurement model is constructed to achieve soft measurement prediction of COD values [9, 10].

The application of soft measurement technology in sewage treatment is very extensive. As the core of soft measurement technology, soft measurement models have been extensively studied by domestic and foreign scholars in this field, and various modeling methods have been developed. Among them, the use of neural networks for soft measurement prediction of key indicator parameters in the sewage treatment process has become a research hotspot. Zhang et al. employed a modeling approach that combines a three-layer Back Propagation (BP) neural network with a support vector machine to predict the COD of lake water. The experimental results demonstrate the model's excellent performance and reliable prediction outcomes [11]. Liu et al. utilized a least squares support vector machine (LS-SVM) to establish a prediction model for effluent COD in an anaerobic wastewater treatment system [12]. Zhao et al. introduced a new soft sensing method in which Sparse Principal Component Analysis (SPCA) is used for dimensionality reduction of the dataset and the soft sensing model is constructed through an improved Extreme Learning Machine (ELM) algorithm. The results show good performance in predicting BOD5 and COD [13]. Wu et al. constructed a water quality prediction model for the Jinjiang River in China based on artificial neural network (ANN), discrete wavelet transform (DWT), and long short-term memory (LSTM) techniques. However, due to limited monitoring conditions, water quality prediction could only be performed for a single location [14]. Nair et al. described the development, implementation, and validation of a hybrid soft sensor, equipped with a GUI for visualization, for estimating total phosphorus (TP) and COD in the inflow and outflow of full-scale sewage treatment plants. The results showed that soft sensors have great potential for real-time display of parameters within an acceptable accuracy range [15]. Boshnakov et al. constructed a soft sensing method using a BP neural network to monitor COD and BOD in effluent during wastewater treatment. The simulation results show that the soft sensing method based on neural networks can accurately estimate state variables and can be used for real-time monitoring of biochemical wastewater treatment processes [16].

Currently, extensive research is being conducted on the prediction of water quality parameters in sewage treatment engineering using neural networks. While most studies have achieved their predictive goals, several issues still persist: (1) Many studies remain at the algorithmic stage, with some algorithms exhibiting relatively low prediction accuracy. (2) Although the input variables selected by most algorithms have high correlation, the actual measurement is difficult, so it is of little significance for real-time measurement and monitoring. (3) Some soft measurement research solely focuses on predicting historical data without realizing real-time predictions.

To address these problems, this paper constructs a neural network model based on the CNN-BiLSTM-Attention algorithm and uses it for the soft measurement of COD in the effluent of the aerobic zone of the A2O wastewater treatment process. Compared with the traditional laboratory method of COD determination, the model greatly improves the efficiency of COD determination and is of great significance for real-time monitoring of water quality and for cost saving.

2. Data collection and selection of input variables

2.1 Source of water quality sample data

The experimental data used in this article comes from the aerobic tank part of an A2O method urban sewage treatment simulation device in the laboratory. The schematic diagram of the A2O method urban sewage treatment simulation device is shown in Fig 1, which consists of an inlet tank, anaerobic tank, anoxic tank, aerobic tank, secondary sedimentation tank, effluent tank, and internal and external reflux pipelines.

Fig 1. A2O method urban sewage treatment simulation device.

https://doi.org/10.1371/journal.pone.0305216.g001

The main processes occurring in the aerobic pool are nitrification of ammonia nitrogen and oxidative decomposition of organic matter:

Ammonia nitrification: When the sewage enters the aerobic pool, nitrifying bacteria in the sludge undergo nitrification under aerobic conditions to convert ammonia nitrogen in the sewage into nitrate, thereby generating a nitrifying liquid. Subsequently, this nitrifying liquid is directed towards the anoxic pool through internal circulation for further reactions.

The oxidative degradation of organic matter: The oxidation and decomposition of organic matter in the water provide energy for phosphorus-absorbing microorganisms, which then absorb phosphorus from the water. Phosphorus is assimilated into the cellular tissue of these microorganisms and subsequently discharged from the system as phosphorus-rich sludge after undergoing precipitation separation.

The aerobic pool of the A2O method urban sewage treatment simulation device in this study has a volume of 87.56 L. To reproduce urban domestic sewage as closely as possible, glucose and milk powder were selected as carbon sources, while ammonium sulfate ((NH4)2SO4) was chosen as the nitrogen source for preparing the sewage. The sewage was prepared with a carbon-to-nitrogen ratio of 100:5, so that when the COD of the sewage is 500 mg/L, the nitrogen content is 25 mg/L. Table 1 presents the specific materials used for sewage preparation.

2.2 Mechanism analysis of COD in aerobic tank effluent

The process of selecting input variables for COD in aerobic tank effluent mainly includes: mechanism analysis of COD in aerobic tank effluent, water quality data collection, and correlation analysis. Firstly, a mechanism analysis is conducted on the COD of the effluent from the aerobic tank of the A2O sewage treatment process to preliminarily determine the types of input variables. At the same time, the selection range of input variables should follow the principle of being able to be collected in real-time by sensors, that is, easy to measure. Then, data collection is carried out based on the preliminarily determined input variables. Finally, correlation analysis is used to adjust and screen the input variables, and finally, the input variables are determined.

The key to analyzing the mechanism of COD in the effluent of the aerobic tank in the A2O sewage treatment process lies in how the process removes organic pollutants from water. In the anaerobic stage, complex organic matter in wastewater is first decomposed into smaller molecules by anaerobic microorganisms. Next, in the anoxic stage, denitrifying bacteria use nitrate as an electron acceptor for anaerobic respiration, producing nitrogen gas. At the same time, the oxidation-reduction potential (ORP) of the wastewater decreases, and this reaction is accompanied by a certain amount of organic matter degradation, further reducing the COD content. Finally, in the aerobic stage, which is the main stage for COD removal, aerobic microorganisms use dissolved oxygen (DO) to oxidize and decompose the remaining organic matter, converting it into carbon dioxide and water and thereby significantly reducing the COD concentration in water. Some inorganic salts such as nitrates are also produced, which can affect the electrical conductivity (EC) of the wastewater. Water temperature (T) that is too high or too low is not conducive to the growth of microorganisms, which in turn affects the reaction rate.

According to the above analysis, the indicators that have a significant impact on COD include pH, ORP, EC, DO, and T. The pH value affects microbial activity in the A2O process, with an ideal pH of 6–9. Within this range, microorganisms decompose organic matter more efficiently, thereby reducing COD. When the pH is too low or too high, it inhibits the activity of microorganisms and reduces the efficiency of COD removal. The ORP value has a certain impact on the growth of microorganisms. In the anaerobic and anoxic stages, lower ORP helps promote the growth of specific microorganisms that contribute to the decomposition of organic matter and the removal of nitrogen and phosphorus. In the aerobic stage, higher ORP is conducive to the oxidation and decomposition of organic matter, thereby removing COD. EC reflects the ion concentration in water and indirectly reflects the content of salt and organic matter in wastewater. Excessive EC may indicate a high concentration of organic matter, which may make COD removal more difficult. High-salinity environments may also inhibit certain microorganisms, affecting the removal of COD. Water temperature affects the activity of microorganisms and the rates of the various reactions, thereby affecting the COD value. In the aerobic stage, an appropriate level of DO sustains the rate of the nitrification reaction and promotes the oxidation and decomposition of organic matter, so DO has a significant impact on COD.

In summary, through the mechanism analysis of COD in aerobic tank effluent, the input variables of COD in aerobic tank effluent were preliminarily determined as pH, ORP, EC, T, and DO.

2.3 Collection of water quality sample data

Once the initial selection range of input variables is determined, data collection is carried out to provide sample data for subsequent correlation analysis and model training. This article uses the A2O method urban sewage treatment simulation device mentioned earlier for data collection. In order to measure the amount of activated sludge in the aeration tank, mixed liquor suspended solids (MLSS) is also recorded during data collection. After 180 days of experimentation, a total of 500 sets of valid data were obtained, and some sample data (collected in March 2023 in Northeast China) are shown in Table 2.

2.4 Selection of input variables

After the data collection is completed, the next step is to analyze these variables by correlation analysis, which is a statistical analysis method to study the correlation between two or more random variables in the same state. Currently, there are three types of correlation coefficients used to represent the correlation between variables: Pearson correlation coefficient, Spearman correlation coefficient and Kendall correlation coefficient. The correlation coefficient can reflect the direction and degree of the trend of change between two variables. The range of values is [–1,1], with 0 indicating no correlation [17]. Positive values indicate positive correlation and negative values indicate negative correlation. The larger the absolute value of the correlation coefficient, the stronger the correlation. By analyzing the process and historical monitoring dataset of the A2O wastewater treatment process, the conditions for the applicability of the Pearson correlation coefficient were met and the correlation analysis was performed using SPSS software.

The Pearson correlation coefficient is a linear correlation coefficient used to measure the linear relationship between two variables [18, 19]. It is calculated as follows:

$$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y} \tag{1}$$

where cov(X,Y) is the covariance of the variables X and Y, and σX and σY are their respective standard deviations.
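As a rough illustration of the Pearson calculation, the following dependency-free sketch computes the coefficient for two equal-length samples. The function name `pearson` and the toy DO and COD values are assumptions made for this example; they are not data from the study, which performed the analysis in SPSS.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, not data from the study.
do_values = [2.0, 2.5, 3.0, 3.5, 4.0]
cod_values = [60.0, 55.0, 50.0, 45.0, 40.0]
r = pearson(do_values, cod_values)  # perfectly linear and decreasing, so r is -1
```

A value near -1 or 1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship.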

To verify the correlation between COD and the selected input variables, historical monitoring data covering these parameters were selected for correlation analysis. The correlation analysis is shown in Fig 2, and the correlation coefficients are shown in Table 3.

In order to further verify the rationality of the selected input variables, correlation analysis is conducted between the selected input variables. If the correlation between the selected input variables is strong, it means that there is a lot of duplicate information between the two, and the relevant input variables can be deleted. The correlation coefficients between input variables are shown in Table 4.

Table 4. Correlation coefficient between input variables.

https://doi.org/10.1371/journal.pone.0305216.t004

According to the analysis results in Table 4, the correlation coefficient between ORP and EC is 0.778, a strong correlation indicating that the two variables carry a lot of redundant information. Taking economic and maintenance factors into account as well, ORP was removed from the selected input variables, and the final input variables were determined as pH, EC, T, and DO.

2.5 Data normalization

The essence of model training is to minimize a loss function using a large amount of training data. This study involves multiple variables with different physical meanings and dimensions. Without normalization, these differences would significantly affect model training. Therefore, before the sample data are input into the model, all variables in the 500 sets of sample data are normalized to the range between 0 and 1 [20]. Normalization reduces the training time, accelerates the convergence of the model, and improves its prediction accuracy. The formula for data normalization is as follows:

$$x^{*} = \frac{x - \min x}{\max x - \min x} \tag{2}$$

where x* is the normalized value, x is the sample data, min x is the minimum value in the data, and max x is the maximum value in the data. Since the input data are normalized, the prediction obtained is also between 0 and 1, so the final prediction should be de-normalized before output. The de-normalization formula is as follows:

$$y = y^{*}\,(\max x - \min x) + \min x \tag{3}$$

where y* is the normalized predicted value, y is the predicted value after de-normalization, and min x and max x are the minimum and maximum of the variable being predicted.
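The min-max scaling and its inverse can be sketched as follows. The helper names `min_max_normalize` and `denormalize` and the pH readings are hypothetical, chosen only to show the round trip.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the min and max needed to invert."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(scaled, lo, hi):
    """Map values in [0, 1] back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

# Hypothetical pH readings, not data from the study.
ph = [6.8, 7.0, 7.4]
scaled, ph_min, ph_max = min_max_normalize(ph)
restored = denormalize(scaled, ph_min, ph_max)
```

In practice the minimum and maximum would usually be taken from the training data only and reused for new samples, so that predictions can be mapped back to mg/L consistently.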

3. Modeling method of CNN-BiLSTM-Attention hybrid model

3.1 Convolutional neural networks

A convolutional neural network (CNN) is a feedforward neural network with convolutional operations and a deep structure. A CNN is usually composed of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. Convolutional layers filter the input data and extract useful information by convolving the data with convolution kernels of different sizes [21, 22]. A pooling layer is periodically inserted between consecutive convolutional layers to gradually reduce the spatial size of the data and help prevent overfitting. Finally, the results are obtained through the fully connected layers and the output layer. A one-dimensional CNN can extract features from time series. Because water quality data are time series, a one-dimensional CNN is applied to feature extraction of wastewater quality data. The time series data are processed as follows:

Convolution layer: The preprocessed sewage water quality data are used as input, and the feature matrix is generated by the convolution operation:

$$c_{j} = \mathrm{conv}\left(x_{i}, w\right) + b \tag{4}$$

where conv is the convolution operation in the CNN; $x_i$ and $c_j$ are the input and output of the convolution process, respectively; l is the length of the sewage water quality data sequence; i and j represent the processing positions in the convolution process; w is the weight of the convolutional layer; and b is the bias of the convolutional layer.

$$a_{j} = f\left(c_{j}\right) \tag{5}$$

where $c_j$ and $a_j$ are the input and output of the activation function, respectively, and f is the nonlinear activation function ReLU.

Pooling layer: The pooling operation is performed on the obtained feature matrix:

$$p_{j} = \mathrm{pool}\left(a_{j}\right) \tag{6}$$

where pool represents the pooling operation and $p_j$ represents the output of the pooling layer.
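The convolution, activation and pooling steps can be illustrated with a small pure-Python sketch. It assumes a single channel, no padding, a stride of 1 and non-overlapping max pooling. The paper does not state these settings here, and the signal and kernel values are hypothetical.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid (no padding) 1-D convolution as used in CNNs (cross-correlation)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

signal = [1.0, 3.0, 2.0, 5.0, 4.0]   # hypothetical normalized readings
kernel = [1.0, -1.0]                 # hypothetical weights (normally learned)
features = conv1d(signal, kernel)    # [-2.0, 1.0, -3.0, 1.0]
activated = relu(features)           # [0.0, 1.0, 0.0, 1.0]
pooled = max_pool1d(activated)       # [1.0, 1.0]
```

Each stage shortens the sequence: the convolution removes `len(kernel) - 1` positions, and the pooling step halves what remains.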

In this paper, the feature vectors obtained by the CNN network convolution operation are input into the subsequent BiLSTM network layers for time-related feature extraction.

3.2 Long Short-Term Memory Neural Networks

Long Short-Term Memory Neural Network (LSTM) is a special type of Recurrent Neural Network (RNN) that has good performance in processing time series data [23, 24]. However, RNN has a long-term dependency problem. When the input time series data is too long, RNN cannot effectively learn long-distance dependency relationships. To solve this problem, researchers combined gating functions and hidden states to propose LSTM [25]. LSTM introduces a "gate" mechanism to control the flow and loss of features, solving the long-term dependency problem of RNN and effectively avoiding problems such as gradient disappearance and explosion [26]. Fig 3 shows the basic structure of LSTM, and the relevant formulas of LSTM are as follows:

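  1. The forget gate determines the extent to which the cell state of the previous time step is retained at the current time step.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{7}$$
  2. The input gate determines how much of the network's input at the current time is saved to the cell state.
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{8}$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{9}$$
  3. The update of the cell state involves discarding unnecessary information and adding new information learned by the network at the current moment.
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{10}$$
  4. The output gate and the hidden layer state at the current time are calculated.
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{11}$$
$$h_t = o_t \odot \tanh\left(C_t\right) \tag{12}$$

where f, i, C, o and h are the outputs of the forget gate, input gate, cell state, output gate and hidden node, respectively; $x_t$ is the input at time t; $\tilde{C}_t$ is the candidate cell state; ⊙ denotes element-wise multiplication; σ and tanh are the sigmoid activation function and the hyperbolic tangent function, respectively; and W and b denote the corresponding weight coefficient matrices and biases, respectively.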
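The four steps above can be traced with a scalar sketch of a single LSTM time step. It assumes the standard LSTM formulation with one input feature and one hidden unit. The `params` layout (one `(w_h, w_x, b)` tuple per gate) is a convention invented for this example, not the paper's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step for scalar input and scalar hidden state.

    params maps each of "f", "i", "c", "o" to a tuple (w_h, w_x, b).
    """
    def pre(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre("f"))            # forget gate
    i_t = sigmoid(pre("i"))            # input gate
    c_hat = math.tanh(pre("c"))        # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat   # cell state update
    o_t = sigmoid(pre("o"))            # output gate
    h_t = o_t * math.tanh(c_t)         # hidden state
    return h_t, c_t

# With all weights and biases at zero, every gate outputs 0.5 and the
# candidate is 0, so the cell state is simply halved.
zero = {name: (0.0, 0.0, 0.0) for name in "fico"}
h1, c1 = lstm_step(x_t=1.0, h_prev=0.0, c_prev=2.0, params=zero)
```

In a real layer the scalars become vectors and the weights become matrices, but the order of operations is the same.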

3.3 Bi-directional Long Short-Term Memory Neural Networks

The Bidirectional Long Short-Term Memory Neural Network (BiLSTM) is an optimization and improvement of LSTM. It is formed by combining a forward LSTM and a backward LSTM, rather than by simply increasing the depth of the model through stacking. The hidden state of BiLSTM at the current time t combines a forward hidden state, which is propagated from time t−1, and a backward hidden state, which is propagated from time t+1. A traditional LSTM can only process data in one direction, so BiLSTM adds a backward LSTM to capture important features that a traditional LSTM may miss and to consider both past and future information [27, 28]. BiLSTM therefore expands the amount of information available to the network, improving the utilization of sample data and the prediction accuracy. The structure of BiLSTM is shown in Fig 4.

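Here $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ represent the outputs of the forward LSTM and backward LSTM hidden layers, respectively. The BiLSTM layer generates an output vector Y, where each element is computed as follows:

$$y_t = \sigma\left(\overrightarrow{h}_t, \overleftarrow{h}_t\right) \tag{13}$$

where the σ function is used to couple the sequences $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$, and the final output of BiLSTM is represented as Y = [y1, y2, ⋯, yt].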
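The data flow of a bidirectional layer can be sketched independently of the cell used. In the example below, `run_bidirectional` is a hypothetical helper, and a running sum stands in for the LSTM cell so that the forward and backward states are easy to check by hand.

```python
def run_bidirectional(step, inputs, h0=0.0):
    """Run `step(h_prev, x) -> h` over the sequence in both directions.

    Returns one (forward_state, backward_state) pair per time step.
    """
    forward, h = [], h0
    for x in inputs:
        h = step(h, x)
        forward.append(h)

    backward, h = [], h0
    for x in reversed(inputs):
        h = step(h, x)
        backward.append(h)
    backward.reverse()  # realign so index t refers to the same time step

    return list(zip(forward, backward))

# A running sum stands in for the LSTM cell to make the data flow visible.
pairs = run_bidirectional(lambda h, x: h + x, [1, 2, 3])
# pairs == [(1.0, 6.0), (3.0, 5.0), (6.0, 3.0)]
```

At each time step the forward state summarizes the inputs up to that step and the backward state summarizes the inputs from that step onward. A coupling function such as concatenation then combines the pair into a single output.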

3.4 Attention mechanism

The attention mechanism originates from the investigation of human vision [29]. For instance, our visual system usually tends to focus on the parts of an image that are helpful for judgment, while ignoring irrelevant information. Different variables have different impacts on the main variable, and important variables contain more critical information, which has a greater impact on the main variable [30]. However, traditional neural networks cannot distinguish the importance of variables, so important features are not highlighted during model training. The attention mechanism is therefore introduced for optimization. It assigns different weights to different features: key information receives higher weights, irrelevant information is reduced or ignored, and the important information is amplified, which improves processing efficiency and model prediction accuracy. The formulas for the attention mechanism are as follows:

$$s(x_i, q) = v^{\top} \tanh\left(W x_i + U q\right) \tag{14}$$
$$\alpha_i = \mathrm{softmax}\left(s(x_i, q)\right) = \frac{\exp\left(s(x_i, q)\right)}{\sum_{j=1}^{N} \exp\left(s(x_j, q)\right)} \tag{15}$$
$$\mathrm{att}(X, q) = \sum_{i=1}^{N} \alpha_i x_i \tag{16}$$

where v, U and W represent the learnable network parameters, q is the query vector, s(xi, q) is the attention scoring function, and αi is the attention distribution.

For the input [x1, x2, ⋯, xN] of the attention layer, the correlation between the query vector and each input xi is calculated through the attention scoring function s(xi, q) to obtain a score. These scores are then normalized using the softmax function to obtain the attention distribution [α1, α2, ⋯, αN]. Finally, the input information is weighted and summed according to the attention distribution to obtain the output att(X, q). The structure of the attention mechanism is shown in Fig 5.
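The score, softmax and weighted-sum steps can be sketched in plain Python. The sketch assumes an additive scoring function of the form v^T tanh(W x + U q), which is one common choice consistent with the learnable parameters v, U and W named in the text. The input vectors and parameter values are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def additive_score(x, q, W, U, v):
    """Assumed scoring form: s(x, q) = v^T tanh(W x + U q)."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, q))]
    return sum(vi * hi for vi, hi in zip(v, hidden))

def attend(xs, q, W, U, v):
    """Return the attention-weighted sum of xs and the attention weights."""
    alphas = softmax([additive_score(x, q, W, U, v) for x in xs])
    dim = len(xs[0])
    context = [sum(a * x[d] for a, x in zip(alphas, xs)) for d in range(dim)]
    return context, alphas

# Hypothetical 2-D inputs and parameters (normally learned during training).
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [0.5, 0.5]
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
context, alphas = attend(xs, q, W, U, v)
```

With these values the third input receives the largest weight because its score is highest, while the first two inputs, which score identically, share equal weights.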

3.5 CNN-BiLSTM-Attention algorithm model

In this paper, a CNN-BiLSTM-Attention model is proposed and applied to the soft measurement prediction of COD in the effluent from the aerobic zone of the A2O wastewater treatment process. First, the input data are preprocessed. Next, Conv1D is used to perform a convolution operation on the data to extract local features. Then, BiLSTM models the extracted local features as a sequence. During training, a Dropout layer randomly discards some features to improve the robustness of the model. On this basis, the attention mechanism is introduced to weight the sequence data, assigning different weights according to the importance of each element. Finally, the predicted values for the test set are output, the error analysis is performed, and the model is saved. The flow chart of the model is shown in Fig 6.

3.6 Model performance metrics

To test the performance of the CNN-BiLSTM-Attention hybrid model, four indexes were used: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination (R2). RMSE reflects the degree of deviation between the true value and the predicted value, MAE represents the average absolute error between the true value and the predicted value, and MAPE expresses the average absolute error as a percentage of the true value. The smaller these three indexes are, the closer the predicted value is to the true value and the better the prediction. R2 represents the degree of fit: the closer it is to 1, the better the model fits the data. The formulas for RMSE, MAE, MAPE, and R2 are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2} \tag{17}$$
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right| \tag{18}$$
$$\mathrm{MAPE} = \frac{100\%}{m}\sum_{i=1}^{m}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{19}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2} \tag{20}$$

where m is the number of samples in the water quality sample data, $y_i$ is the actual COD value, $\hat{y}_i$ is the predicted COD value, and $\bar{y}$ is the mean of the actual COD values.
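The four metrics can be computed with a few lines of standard-library Python. This is a minimal sketch using their usual definitions. The function name `regression_metrics` and the COD values are made up for illustration and are not results from the paper.

```python
import math

def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (in percent) and R2 for paired values."""
    m = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / m)
    mae = sum(abs(e) for e in errors) / m
    # MAPE divides by the actual value, so actual values must be non-zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / m
    mean_actual = sum(actual) / m
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Hypothetical COD values in mg/L, not results from the study.
actual = [40.0, 50.0, 60.0]
predicted = [42.0, 50.0, 57.0]
metrics = regression_metrics(actual, predicted)
```

For these toy values the errors are -2, 0 and 3 mg/L, giving an MAE of 5/3 mg/L and an R2 of 0.935.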

4. CNN-BiLSTM-Attention model architecture and performance analysis

4.1 Experimental environment

In this paper, the deep learning framework Keras is used to build the environment required for the simulation experiments. The specific environment parameters are: CPU, AMD Ryzen 7 4800H; GPU, NVIDIA GeForce GTX 1650; RAM, 16 GB; operating system, Windows 11, 64-bit; Keras version 2.8.0; TensorFlow version 2.8.0; Python version 3.9.

4.2 Model parameter setting

In this study, GridSearchCV from the scikit-learn library is used to find the best combination of hyperparameters. It provides a convenient interface to define hyperparameter grids, instantiate models, and perform cross-validation and performance evaluation. For the grid search, several key hyperparameters are defined, including the learning rate, batch size, dropout rate, number of CNN layers and convolution kernel parameters, and number of BiLSTM layers and units. A reasonable search space is set for each hyperparameter to cover the best possible configuration, as shown in Table 5.

During grid search, cross-validation is used to evaluate the performance of the model. Cross-validation divides the dataset into multiple subsets, and then uses each subset in turn as the test set and the remaining subsets as the training set for model training and validation. The process is repeated until every subset has served as the test set. Finally, the average of all validation results is taken as the performance indicator for the hyperparameter combination. Through multiple experimental analyses, the optimal values of these parameters were obtained, as shown in Table 6.
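The combination of grid search and k-fold cross-validation can be sketched without any library. This is a dependency-free illustration of the procedure, not scikit-learn's GridSearchCV itself. The helper names, the search space and the stand-in scoring function are all hypothetical, since the values in Table 5 are not reproduced here.

```python
import itertools

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search(param_grid, evaluate, n_samples, k=5):
    """Return (best_params, best_score); lower scores are better.

    `evaluate(params, train_idx, test_idx)` must return a validation error.
    """
    folds = k_fold_indices(n_samples, k)
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(params, train_idx, test_idx))
        mean_score = sum(scores) / len(scores)  # average over all folds
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Hypothetical search space and a stand-in scoring function; in practice
# `evaluate` would train the network on train_idx and score it on test_idx.
grid = {"learning_rate": [0.001, 0.01], "batch_size": [16, 32]}

def toy_error(params, train_idx, test_idx):
    return abs(params["learning_rate"] - 0.001) + abs(params["batch_size"] - 32)

best, score = grid_search(grid, toy_error, n_samples=500, k=5)
```

The cost grows with the product of the grid sizes times the number of folds, which is why the search space for each hyperparameter is usually kept small.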

4.3 Model performance and analysis

To verify the performance of the CNN-BiLSTM-Attention model, the 500 normalized samples are divided into a training set (80%) and a test set (20%), and the model is trained and tested. The results are shown in Fig 7. The performance of the model is evaluated using RMSE, MAE, MAPE, and R2.
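An 80/20 split of this kind can be sketched as follows. The paper does not say whether the split was random or chronological, so the seeded shuffle here is an assumption, and the integer placeholders stand in for the actual samples.

```python
import random

def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle indices reproducibly and hold out `test_fraction` for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test

samples = list(range(500))  # placeholders for the 500 normalized samples
train, test = split_train_test(samples)
```

For time series data a chronological split, with the last 20% of samples held out, is a common alternative that avoids training on observations later than those being predicted.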

Fig 7. (a) Comparison between true and predicted values in the test set; (b) density distribution of prediction errors.

https://doi.org/10.1371/journal.pone.0305216.g007

It can be seen from Fig 7 that the model can capture the inherent law of the data well, with high prediction accuracy, small error between the predicted value and the true value, and fast convergence speed.

4.4 Comparative validation of different models

To further verify the performance of the CNN-BiLSTM-Attention model, it was compared with nine other models (CNN-LSTM-Attention, CNN-BiLSTM, CNN-LSTM, LSTM, RNN, BP, SVM, XGBoost, and RF). The comparison of COD prediction results is shown in Fig 8, and the RMSE, MAE, MAPE, and R2 of each model are shown in Table 7.

Table 7. Comparison of evaluation indicators for different models.

https://doi.org/10.1371/journal.pone.0305216.t007

From Fig 8 and Table 7, it can be concluded that the prediction accuracy of the CNN-BiLSTM-Attention model is higher than that of the other models. In terms of RMSE, the CNN-BiLSTM-Attention model decreased by 17.18%, 23.19%, 40.36%, 53.44%, 63.24%, 64.64%, 57.17%, 56.06%, and 64.57% compared to the other nine models, respectively. In terms of MAE, the reported reductions of the CNN-BiLSTM-Attention model relative to the comparison models are 18.61%, 22.63%, 45.95%, 51.07%, 58.38%, and 62.16%. In terms of MAPE, the CNN-BiLSTM-Attention model decreased by 5.13%, 6.35%, 9.66%, 20.85%, 27.90%, 29.28%, 24.15%, 23.83%, and 28.78% compared to the other nine models, respectively. In terms of R2, the CNN-BiLSTM-Attention model improved by 1.17%, 1.92%, 3.62%, 7.26%, 13.85%, 16.14%, 9.54%, 10.03%, and 12.70% compared to the other nine models, respectively. This comparison further verifies the accuracy of the proposed CNN-BiLSTM-Attention model for COD prediction and shows that its performance is superior to that of the other models.
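Percentages of this kind are presumably relative changes with respect to each comparison model, although the paper does not spell out the formula. Under that assumption, the calculation can be sketched as below; the numbers are hypothetical and are not taken from Table 7.

```python
def relative_change_percent(baseline, proposed):
    """Percentage change of `proposed` relative to `baseline`.

    Negative values mean a reduction (desirable for RMSE, MAE and MAPE);
    positive values mean an increase (desirable for R2).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (proposed - baseline) / baseline

# Hypothetical values, not taken from Table 7.
rmse_change = relative_change_percent(baseline=5.0, proposed=4.0)   # a 20% reduction
r2_change = relative_change_percent(baseline=0.80, proposed=0.88)   # about a 10% gain
```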

4.5 Wilcoxon signed-rank test

To consolidate the conclusions of the comparative verification of different models in this chapter, we conducted a nonparametric statistical test, the Wilcoxon signed-rank test, on the proposed CNN-BiLSTM-Attention model and the CNN-LSTM-Attention, CNN-BiLSTM, and CNN-LSTM models. The null hypothesis is that these four models are similar, that is, that combining convolutional neural networks, bidirectional long short-term memory, and attention mechanisms does not improve the accuracy of soft measurement. Under this hypothesis, the soft measurement results of the four models would be very similar or identical. Table 8 presents the average, standard deviation, maximum, and minimum values of the performance evaluation indicators for the four models. The CNN-BiLSTM-Attention model was compared pairwise with the other models. Table 8 shows that, over 10 independent runs, the mean values of RMSE, MAE, MAPE, and R2 of the CNN-BiLSTM-Attention model are all superior to those of the other models. Table 9 presents the positive and negative ranks of each pair of error measures for the CNN-BiLSTM-Attention model and the other models. The test results in Table 10 are then evaluated and analyzed based on the descriptive statistics presented in Tables 8 and 9.

The Wilcoxon signed-rank test shows significant differences among the four models, with Z-values of the test statistic of -2.312, -2.452, -2.617, -2.121, -3.332, -3.134, -3.421, -3.109, -3.501, -3.683, -3.603, and -3.555, respectively. For the comparison between the CNN-BiLSTM-Attention model and the CNN-LSTM-Attention model, the Asymp. Sig. (2-tailed) values are 0.020 for RMSE, 0.014 for MAE, 0.008 for MAPE, and 0.033 for R2. For the comparisons of the CNN-BiLSTM-Attention model with the CNN-BiLSTM and CNN-LSTM models, the values for all indicators are 0. Based on the results of the Wilcoxon signed-rank test, we reject the null hypothesis: the CNN-BiLSTM-Attention model differs significantly from the other models.
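As a quick consistency check, a two-tailed asymptotic significance can be recovered from a Z-value under the standard normal approximation. The sketch below assumes that the first four Z-values listed above correspond, in order, to RMSE, MAE, MAPE, and R2 for the comparison between the CNN-BiLSTM-Attention and CNN-LSTM-Attention models; this pairing is an assumption and is not stated explicitly in the text. The function name `two_tailed_p` is hypothetical.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic z."""
    # 2 * (1 - Phi(|z|)) is equal to erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Z-values quoted in the text; their pairing with the metrics is assumed
z_values = {"RMSE": -2.312, "MAE": -2.452, "MAPE": -2.617, "R2": -2.121}

for metric, z in z_values.items():
    print(f"{metric}: Z = {z:.3f}, two-tailed p = {two_tailed_p(z):.4f}")
```

Under this assumption, the computed values agree with the reported significances (0.020, 0.014, 0.008, and 0.033) to within roughly 0.001, and all are below the 0.05 threshold.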

5. Conclusion

COD is a critical indicator in the wastewater treatment process, but the complexity of water quality conditions makes accurate prediction challenging. This paper therefore proposed a soft measurement method based on the CNN-BiLSTM-Attention algorithm to predict the COD value in the wastewater treatment process and verified the accuracy of the model through experiments. The method provides a low-cost and efficient means of real-time COD monitoring in wastewater treatment.

Used together with auxiliary variable sensors, this model offers several advantages over traditional laboratory measurement, including high monitoring efficiency, low labor intensity, and small sampling error. It also has advantages over COD sensors, including low cost, a short maintenance cycle, and strong environmental adaptability. These advantages are most evident when monitoring conditions are poor and monitoring points are scattered.

Although this soft measurement method has been studied here, it still has limitations that future research can address: (1) In a new application scenario, the model must be retrained with historical data from that scenario. (2) The accuracy of soft measurement can be guaranteed in the short term, but the error will increase over time if water quality conditions change significantly.

In future research, historical data should be saved in real time and the model retrained periodically with the most recent data, so as to maintain high accuracy over the long term and extend the effective prediction period of the model. In addition, new application scenarios and more sample data will be introduced to further verify the performance of the model.

References

  1. 1. Malviya A, Jaspal D. Artificial intelligence as an upcoming technology in wastewater treatment: a comprehensive review. Environmental Technology Reviews. 2021;10: 177–187.
  2. 2. Ye Z, Yang J, Zhong N, Tu X, Jia J, Wang J. Tackling environmental challenges in pollution controls using artificial intelligence: A review. Science of The Total Environment. 2020;699: 134279. pmid:33736193
  3. 3. Pattnaik BS, Pattanayak AS, Udgata SK, Panda AK. Machine learning based soft sensor model for BOD estimation using intelligence at edge. Complex Intell Syst. 2021;7: 961–976.
  4. 4. Yu P, Cao J, Jegatheesan V, Du X. A Real-time BOD Estimation Method in Wastewater Treatment Process Based on an Optimized Extreme Learning Machine. Applied Sciences. 2019;9: 523.
  5. 5. Alsulaili A, Refaie A. Artificial neural network modeling approach for the prediction of five-day biological oxygen demand and wastewater treatment plant performance. Water Supply. 2021;21: 1861–1877.
  6. 6. Cao H, Han L, Li L. A deep learning method for cyanobacterial harmful algae blooms prediction in Taihu Lake, China. Harmful Algae. 2022;113: 102189. pmid:35287935
  7. 7. Cheng T, Harrou F, Kadri F, Sun Y, Leiknes T. Forecasting of Wastewater Treatment Plant Key Features Using Deep Learning-Based Models: A Case Study. IEEE Access. 2020;8: 184475–184485.
  8. 8. Li P, Wang D, Li W, Liu L. Sustainable water resources development and management in large river basins: an introduction. Environ Earth Sci. 2022;81: 179, s12665-022-10298–9. pmid:35280111
  9. 9. Wang W, Yang C, Han J, Li W, Li Y. A soft sensor modeling method with dynamic time-delay estimation and its application in wastewater treatment plant. Biochemical Engineering Journal. 2021;172: 108048.
  10. 10. Wu J, Cheng H, Liu Y, Huang D, Yuan L, Yao L. Learning soft sensors using time difference–based multi-kernel relevance vector machine with applications for quality-relevant monitoring in wastewater treatment. Environ Sci Pollut Res. 2020;27: 28986–28999. pmid:32424758
  11. 11. Zhang Y, Duan Z, Yi A, Hu J, Chen Y. Research on COD Soft Measurement Technology Based on Multi-Parameter Coupling Analysis Method. JMSE. 2022;10: 683.
  12. 12. Liu Z, Wan J, Ma Y, Wang Y. Online prediction of effluent COD in the anaerobic wastewater treatment system based on PCA-LSSVM algorithm. Environ Sci Pollut Res. 2019;26: 12828–12841. pmid:30887455
  13. 13. Zhao F, Liu M, Wang K, Wang T, Jiang X. A soft measurement approach of wastewater treatment process by lion swarm optimizer-based extreme learning machine. Measurement. 2021;179: 109322.
  14. 14. Wu J, Wang Z. A Hybrid Model for Water Quality Prediction Based on an Artificial Neural Network, Wavelet Transform, and Long Short-Term Memory. Water. 2022;14: 610.
  15. 15. Nair A, Hykkerud A, Ratnaweera H. Estimating Phosphorus and COD Concentrations Using a Hybrid Soft Sensor: A Case Study in a Norwegian Municipal Wastewater Treatment Plant. Water. 2022;14: 332.
  16. 16. Fan L, Boshnakov K. Neural-network-based water quality monitoring for wastewater treatment processes. 2010 Sixth International Conference on Natural Computation. Yantai, China: IEEE; 2010. pp. 1746–1748. https://doi.org/10.1109/ICNC.2010.5584378
  17. 17. Janbain I, Jardani A, Deloffre J, Massei N. Deep Learning Approaches for Numerical Modeling and Historical Reconstruction of Water Quality Parameters in Lower Seine. Water. 2023;15: 1773.
  18. 18. Ye G, Wan J, Deng Z, Wang Y, Chen J, Zhu B, et al. Prediction of effluent total nitrogen and energy consumption in wastewater treatment plants: Bayesian optimization machine learning methods. Bioresource Technology. 2024;395: 130361. pmid:38286171
  19. 19. Ye G, Wan J, Bai Y, Wang Y, Zhu B, Zhang Z, et al. Prediction of the effluent chemical oxygen demand and volatile fatty acids for anaerobic treatment based on different feature selections machine-learning methods from lab-scale to pilot-scale. Journal of Cleaner Production. 2024;437: 140679.
  20. 20. Wang M, Yang Z, Tai C, Zhang F, Zhang Q, Shen K, et al. Prediction of road dust concentration in open-pit coal mines based on multivariate mixed model. Gomes R, editor. PLoS ONE. 2023;18: e0284815. pmid:37099504
  21. 21. Cai Y, Guo J, Tang Z. An EEMD-CNN-BiLSTM-attention neural network for mixed frequency stock return forecasting. IFS. 2022;43: 1399–1415.
  22. 22. Chung WH, Gu YH, Yoo SJ. District heater load forecasting based on machine learning and parallel CNN-LSTM attention. Energy. 2022;246: 123350.
  23. 23. Anul Haq M. CDLSTM: A Novel Model for Climate Change Forecasting. Computers, Materials & Continua. 2022;71: 2363–2381.
  24. 24. Shi M, Yang B, Chen R, Ye D. Logging curve prediction method based on CNN-LSTM-attention. Earth Sci Inform. 2022;15: 2119–2131.
  25. 25. Pu Z, Yan J, Chen L, Li Z, Tian W, Tao T, et al. A hybrid Wavelet-CNN-LSTM deep learning model for short-term urban water demand forecasting. Front Environ Sci Eng. 2023;17: 22.
  26. 26. Cai H, Zhang C, Xu J, Wang F, Xiao L, Huang S, et al. Water Quality Prediction Based on the KF-LSTM Encoder-Decoder Network: A Case Study with Missing Data Collection. Water. 2023;15: 2542.
  27. 27. Li J, Chen Z, Li X, Yi X, Zhao Y, He X, et al. Water quality soft-sensor prediction in anaerobic process using deep neural network optimized by Tree-structured Parzen Estimator. Front Environ Sci Eng. 2023;17: 67.
  28. 28. Qu D, Wang S, Liu H, Meng Y. A Car-Following Model Based on Trajectory Data for Connected and Automated Vehicles to Predict Trajectory of Human-Driven Vehicles. Sustainability. 2022;14: 7045.
  29. 29. Wang L, Deng X, Ge P, Dong C, J. Bethel B, Yang L, et al. CNN-BiLSTM-Attention Model in Forecasting Wave Height over South-East China Seas. Computers, Materials & Continua. 2022;73: 2151–2168.
  30. 30. Ma T, Xiang G, Shi Y, Liu Y. Horizontal in situ stresses prediction using a CNN-BiLSTM-attention hybrid neural network. Geomech Geophys Geo-energ Geo-resour. 2022;8: 152.