A Four-Stage Hybrid Model for Hydrological Time Series Forecasting

Hydrological time series forecasting remains a difficult task due to its complicated nonlinear, non-stationary and multi-scale characteristics. To solve this difficulty and improve the prediction accuracy, a novel four-stage hybrid model is proposed for hydrological time series forecasting based on the principle of ‘denoising, decomposition and ensemble’. The proposed model has four stages, i.e., denoising, decomposition, components prediction and ensemble. In the denoising stage, the empirical mode decomposition (EMD) method is utilized to reduce the noises in the hydrological time series. Then, an improved method of EMD, the ensemble empirical mode decomposition (EEMD), is applied to decompose the denoised series into a number of intrinsic mode function (IMF) components and one residual component. Next, the radial basis function neural network (RBFNN) is adopted to predict the trend of all of the components obtained in the decomposition stage. In the final ensemble prediction stage, the forecasting results of all of the IMF and residual components obtained in the third stage are combined to generate the final prediction results, using a linear neural network (LNN) model. For illustration and verification, six hydrological cases with different characteristics are used to test the effectiveness of the proposed model. The proposed hybrid model performs better than conventional single models, the hybrid models without denoising or decomposition and the hybrid models based on other methods, such as the wavelet analysis (WA)-based hybrid models. In addition, the denoising and decomposition strategies decrease the complexity of the series and reduce the difficulties of the forecasting. With its effective denoising and accurate decomposition ability, high prediction precision and wide applicability, the new model is very promising for complex time series forecasting. This new forecast model is an extension of nonlinear prediction models.


Introduction
Hydrological time series forecasting plays an increasingly important role in the planning, management and optimal allocation of water resources [1]. However, it is still a difficult task due to the complicated stochastic characteristics existing in hydrological series. Further, hydrological processes are affected not only by climate change [2]- [3], including precipitation, evaporation and temperature, but also by human activities and socioeconomic development [4]. Therefore, the hydrological time series always tend to be nonlinear and time-varying [5]. The complex nonlinearity, high irregularity and multi-scale variability make the forecasting of hydrological time series a difficult task. Although many researchers have investigated the problem of hydrological time series forecasting [6]- [7], completely understanding of hydrological processes has not yet been achieved. The forecast accuracy of the current forecasting models is still not high, especially for complex time series.
The current approaches to hydrological forecasting can be divided into two categories: the process-driven models and the data-driven models [8]. Models in the first category mainly consider the internal physical mechanisms of hydrological processes, and they usually need a large amount of data for calibration and validation. However, there is not always enough data available [9]- [10]. The data-driven models are known as black-box methods [11], and they do not consider the physical hydrological process, instead identifying the relationship between the inputs and the outputs mathematically. The data-driven models have been proved to have the advantage of lower demands for quantitative data, better prediction performance and simpler formulation than the process-driven models [12]- [13].
The data-driven models developed in recent decades contain two main categories: traditional statistical techniques and artificial intelligence (AI) tools [14]- [15]. The statistical models can provide good prediction results when the series are linear or near-linear, but they cannot capture the nonlinear patterns hidden in hydrological time series. The nonlinear and AI models include artificial neural networks (ANNs), genetic algorithms (GAs) and support vector machines (SVMs), which provide powerful solutions to nonlinear hydrological forecasting [16]- [18]. However, these AI methods have their own shortcomings and disadvantages. For example, ANNs often suffers from overfitting, and SVMs are usually sensitive to parameter selection.
To overcome the shortcomings of the data-driven models described above and obtain results that are more accurate in forecasting, many hybrid models have been proposed and applied in hydrological series forecasting [19]- [20]. Recently, some hybrid models based on the principle of 'decomposition and ensemble' have been proposed. The main purpose of decomposition is to simplify the forecasting process, and the results of ensemble are used to evaluate the forecast performance. Forecast models of this type have already been applied in the field of hydrology research. For example, Kisi [21] used a combination of linear regression model and discrete wavelet transform to predict the river stage. Nourani et al. [22] and Kisi [23] combined the wavelet technique with ANNs to predict rainfall or streamflow time series. Sang [24] developed a method for discrete wavelet decomposition of series and proposed an improved wavelet modeling framework for hydrologic time series forecasting. The results of these studies prove that the 'decomposition and ensemble' principle based forecasting methods reduce the difficulty of forecasting and outperform the single models.
However, there are still many problems in prediction when using the 'decomposition and ensemble' principle. First, previous researches show that the widely used method presently in decomposing the hydrological time series is the wavelet analysis. However, the effectiveness of the wavelet decomposition is affected by many factors. For example, the accurate wavelet decomposition of series is still a problem and it depends heavily on the choice of wavelet basis function [25]. Second, the results of ensemble are usually defined as the sum of the individual forecasting results [26]. However, this is unreasonable, primarily because the degrees of importance of the intrinsic mode function (IMF) and the residual components are different [27]. In addition, the complicated hydrological time series usually contain noises and show complex characteristics due to the random or uncertain factors of environment [28]- [29], and the hydrological time series forecasting models without considering denoising may influence the prediction accuracy [30]- [31].
To overcome the above three shortcomings, we propose three improvements to promote the prediction accuracy based on the principle of 'decomposition and ensemble'. The first improvement is to reduce the noises involved in hydrological time series before decomposition, which has a great influence on the forecasting accuracy and may result in over-fitting or underfitting problems [32]. Thus, a novel four-stage hybrid model is developed, consisting of denoising, decomposition, component prediction and ensemble. The second improvement is to adopt more accurate and effective methods for the four stages. During the denoising stage, the empirical mode decomposition (EMD)-based method is employed. The improved EMD method, ensemble empirical mode decomposition (EEMD), is selected as the decomposition tool. Unlike the wavelet analysis and almost other previous decomposition methods, the EMD method describes the local time scale instantaneous frequencies better and does not need any predetermined basis functions [33]- [34]. Another advantage of the EMD-based technique is that it is very suitable for nonlinear and nonstationary time series. The third improvement is utilizing the ANN model instead of simply adding together the individual forecasting results to obtain the ensemble results. Additionally, we choose more reasonable cases with different characteristics and comparison models to thoroughly evaluate the effectiveness of the proposed hybrid-forecasting model. In this paper, we compare our method with other six different methods using six different cases.
The remainder of this paper is organized as follows: Section 2 provides a brief introduction to the methodology of the four stages mentioned above. Section 3 introduces the cases and data used in the evaluation of the proposed model. Section 4 presents and discusses the results of the case study. Finally, Section 5 provides the conclusions of the paper.

Methods
There are four main stages involved in the proposed hybrid prediction model, i.e., denoising, decomposition, components prediction and ensemble prediction. The methods selected for each stage will be briefly introduced in this section.

The denoising stage
The EMD method. At the beginning of the proposed algorithm, the EMD-based denoising method is employed to reduce the noises contained in the time series. The EMD is an adaptive decomposition method, especially for nonlinear and nonstationary data. The essence of the EMD is to extract IMF components from complex signals. The IMF should satisfy the following two conditions [35]: (a) In the whole data set, the number of extrema and the number of zero crossings must either equal or differ at most by one; (b) The mean value of the envelope defined by the local maxima and minima should be zero at any point.
For an original time series x( t) ( t~1, 2, Á Á Á , m) , the main steps of the EMD are as follows: (1) Identify all of the local extrema of x( t) .
(2) Create the upper envelope e up ( t) and lower envelope e low (t) by the cubic spline interpolation, respectively. where the difference is defined as d(t) : (5) Check the properties of d( t) : (i) If d( t) satisfies the requirements (a) and (b), then d( t) is denoted as the i th IMF, and x( t) is replaced with the residuer( t)~x( t) { d( t) . The i th IMF is denoted as c i ( t) , and i is the order number of the IMF; (ii) If d( t) is not an IMF, replace x( t) with d( t) .
(6) Repeat steps 1-5 until the residue r( t) becomes a monotonic function or the number of extrema is less than or equal to one, from which no further IMF can be extracted.
Finally, the original signal x( t) can be expressed as the sum of the IMFs and the residue r( t) : where n is the number of IMFs, c i ( t) ( i~1, Á Á Á , n) is the ith IMF and r( t) is the final residue. The residue r( t) can be seen as the trend of the signal x( t) [36].

The EMD-based denoising method
Many algorithms have been proposed to reduce the noises in series, including spectral analysis, Fourier transforms, wavelet transform and empirical model decomposition (EMD) [37,38]. Fourier transforms mainly address linear and stationary signals, and the effectiveness of the wavelet-analysis-based denoising method depends on the choice of basic wavelet function and decomposition level [39]. While, the EMD method directly decomposes the original signal into a finite number of components and it performs much better for non-linear and non-stationary signals. In this study, the EMD-based denoising method proposed by Kopsinis and McLaughlin [40] is adopted to reduce the noises of the hydrological time series, and it is briefly introduced as follows: For the signalx( t) ( t~1, 2, Á Á Á , m) , using the EMD method described above, x( t) can be expressed as the sum of the IMFs and the residue r( t) : where n is the number of IMFs and r( t) is the residue of x( t) .
To reduce the noises, a generalized reconstruction of the denoised signal is given as follows: The core issue of EMD-based denoising is to reconstruct the signal using only the IMFs that contain useful information and discard those that carry primarily noises, i.e., the IMFs that have similar amounts of energy with the noise-only signal. According to the feature of EMD, the power spectra of all of the IMFs excepting the first noise-only IMF exhibit self-similar characteristics and the noise-only IMFs energies linearly decrease in a semilog way. Therefore, the first IMF carries the highest amounts of noise-only energy and noises. In practice, the noise-only IMF energies can be calculated according to [41]: where b and r are parameters and depend on the number of sifting iterations used in EMD implementation, which can be After the analysis and energy calculation of the IMFs, for a noisy signal x(t) , a generalized reconstruction of the denoised signal x(t) is given as follows: where the parameters M 1 and M 2 control the number of IMFs used in the reconstruction process [43], which give us flexibility on the exclusion of the noisy low-order IMFs and on the optional thresholding of the high-order ones. It has been empirically found that the good choice of M 1 and M 2 are 2 and n{ 2, respectively [40].c i (t) represents the i th thresholded IMF. Inspired by the wavelet thresholding scheme, the thresholded IMF can be obtained by setting the element of each IMF to zero if its amplitude is less than the threshold, and the denoised signal is reconstructed utilizing the high-amplitude elements only [40,44]. Generally,c i (t) can be obtained by two thresholding schemes: the soft thresholding, or the hard thresholding, where T i is the threshold of the i th IMF. In our algorithm, we adopt different thresholds T i for different IMFs. The adaptive threshold T i is defined as follows: where E i can be calculated using Eq. 4 and Eq. 5. n is the number of IMFs. m is the length of the original data series. C is a constant, which can be selected for each data according to the denoising performance by setting the constant from 0.4 up to 1.4 with step of 0.1 [40]. Actually, according to the results of [40], the optimal constant C for the denoising algorithm was found to be between 0.6 and 0.8 with a small performance discrepancy for any constant between the above values. Therefore, in our experiments, the constant C is set to 0.7.

The decomposition stage
The EMD method has been proven to be an effective decomposition method [35]. However, an obvious drawback of EMD is the frequent appearance of mode mixing. To overcome this defect, Wu and Huang [45] proposed the EEMD method, which is a substantial improvement of EMD and can better separate the scales by adding white noise series to the original time series. Therefore, the EEMD method is selected to decompose the hydrological time series in this stage. The steps of the EEMD are as follows: (1) Add a white noise series to the original data; (2) Decompose the data with added white noise into IMFs using EMD method mentioned above; (3) Repeat step (1) and step (2), but add different white noise series each time; (4) Obtain the ensemble means of corresponding IMFs as the final results.
The added white noise series can cancel each other by taking the average of the corresponding IMFs and the mean IMFs can be close to the original time series by adding noise repeatedly. Therefore, this can significantly reduce the chance of mode mixing and represent a substantial improvement over the original EMD. Nevertheless, how to select the optimal size of the ensemble and the amplitude of the added noise is still an open problem [46]. In fact, the effect of the added white noise can be decreased according to the well-established statistical rule [45]: where N is the number of ensemble members, " is the amplitude of the added noise, and " n is the final standard deviation of error, which is defined as the difference between the input signal and the corresponding IMF. In this paper, the number of ensemble members is set to 100 and the standard deviation of white noise series is set to 0.2 [47].

The component prediction stage
ANNs are considered as nonlinear statistical data modeling tools that can simulate the complex relationship between inputs and outputs. In this study, radial basis function neural network (RBFNN), as an ANN technique, is adopted to predict the decomposed IMFs and residual components. The main reason for selecting RBFNN for prediction is that it has a simple structure and a flexible number of neurons [48]. Unlike other types of feedforward neural networks, the RBFNN has only one hidden layer. Furthermore, RBFNN has a strong approximation ability and fast convergence rate [49]. RBFNN is a three-layer feed-forward neural network, consisting of an input layer, a hidden layer and an output layer. Fig. 1 shows the architecture of the RBFNN.
In Fig. 1, x~( x 1 , x 2 , Á Á Á , x m ) is the input vector; y~( y 1 , y 2 , Á Á Á , y p ) is the output vector; w ij andw ij 0 are connection weights. The radial basis function of the hidden layer is denoted as Q ( x, r i ) ( i~1, 2, Á Á Á , n) . The most popular radial basis function is the Gaussian function: where s is the standard deviation, r i is the i th node center of the hidden layer, and x { r i k kis the Euclidean distance between x and r i .
The main steps of the RBFNN-based forecasting model are shown as follows: Step 1. Standardization. The time series where x min and x max denote the minimum and the maximum of the time series x 1 , x 2 , . . . , x m f g .
Step 2. Forecasting. The forecasting process contains three stages, of which the first is determining the training set and the where y Nz 1 , y Nz 2 , . . . , y m f g is the final prediction result.

The ensemble stage
In the ensemble stage, we adopt a linear neural network (LNN) to integrate the prediction results of the above components, which has a simple structure and the characteristics with fast convergence and high precision. The LNN uses the Widrow-Hoff learning rule known as the least mean square (LMS) to train the neural nets [50]. The architecture of the LNN is shown in Fig. 2.
The LNN algorithm can be expressed as follows: where p~( p 1 , p 2 , Á Á Á , p m ) is the input vector, m is the number of neuron nodes in the input layer, yis the target output vector, bis a bias, w~( w 1 , w 2 , Á Á Á , w m ) is the connection weight vector, which connects the input variables to the neurons, and v( . ) is the transfer function of the single-layer LNN. The mean square error of the LNN neural can be expressed as follows: where err is the mean square error, nis the number of samples, y i ( p) is the network input and t i ( p) is the target output. The learning rule of LNN is to minimize the mean square error by adjusting the weight vector and the bias, which can be adjusted by the following formulae: where g is a learning rate. When g is larger, the learning and convergence speeds are faster, and when g is excessively large, the learning process will be unstable and the error will be bigger. Therefore, to obtain a good convergence to the optimal weight and bias, a suitable learning rate should be chosen.
The overall process of the proposed four-stage hybrid model   (1) Denoising. Remove noises inx( t) by the EMD interval thresholding-based method, and obtain the denoised time series ofx(t) , denoted asx(t) .

Case study and experimental design
To illustrate the effectiveness of our proposed four-stage hybrid forecasting model, several real hydrological cases are studied in this section.

Study area
The Haihe River Basin (HRB) is selected in this research. As the largest water system in northern China, the HRB is extremely important to China. Influenced by climatic change and human activities, the precipitation and streamfollow have a remarkably interannual and interdecadal variation in the HRB. The precipitation in flood season (June-September) generally accounts for 70%-85% of the annual precipitation. Annual variability of precipitation is also very large. For example, the annual precipitation is more than 800 mm in wet years but only approximately 270 mm in drought years. This leads to complex characteristics in hydrological series. Analyzing the complex characteristics of the hydrological time series and predicting their future trends are significant for water resources planning and ensuring sustainable economic and social development.

Data description
To thoroughly evaluate the effectiveness of the proposed hybrid forecasting model, six cases with different characteristics are analyzed. The time series of the six cases are denoted by S1, S2, S3, S4, S5 and S6, respectively. Table 1 provides the essential information of the six cases, including the lengths, the time ranges, etc. Specifically, the series S1 presents a 62-year (1951-2012) annual mean precipitation data of the whole HRB. The series S2 and S3 both present 62-year  precipitation data measured at the Beijing weather station, but the first one is annual precipitation data and the other is summer precipitation data, which is defined as the sum of the monthly precipitation in June, July and August. The series S4 presents annual runoff data from 1956 to 2000 measured at the Guantai hydrologic station. The series S5 and S6 are monthly runoff data from January 1956 to December 2000, 540 data points in total. S5 is measured at the Xiangshuibao station in the Yang River of the HRB and S6 is measured at the Miyun Reservoir station in the HRB.
For the six cases, the precipitation data are collected from the China meteorological data sharing service system, and the runoff data are extracted from China's hydrological yearbook. It should be noted that the data for S2, S3, S4 and S5 are directly collected from the observed stations, as shown in Fig. 4. The annual mean precipitation data of S1 are calculated from the precipitation data of the 44 meteorological stations (shown in Fig. 4) based on the Thiessen polygon theory in Geographic Information System (GIS) software ArcGIS 9.3. The main reason of selecting the six time series is that they have different spatial and temporal scales, including not only data on a wholebasin scale (S1) but also data for single hydrological stations (S2, S3, S4, S5). Additionally, they also have different time scales, including yearly, monthly and seasonal data. Selecting data of different types is helpful for validating the applicability of the model.

Model evaluation
Comparison models. To assess the effectiveness of the proposed EMD-EEMD-RBFNN-LNN hybrid prediction framework, we compared our method with other forecasting approaches. In our experiments, four types, altogether six comparison methods are set. These comparison methods are set in accordance with the following three purposes: (1) To verify the roles of the denoising and decomposition stages in improving the prediction performance, four types of comparison models are set: C1, C2, C3 and C4. (2) To verify the validity of the method selected in each stage of C4, other three hybrid prediction methods with the type of C4 are given. (3) To verify the effectiveness of the ANNs model compared with traditional statistical models, the ARIMA method is selected as a comparison with RBFNN in the prediction stage. Based on the three points above, the comparison models of EMD-EEMD-RBFNN-LNN are set as shown in Table 2. The detailed descriptions of the comparison models in Table 2 are given as follows: The first kind of comparison model C1 is set as a one-stage single prediction model and it predicts the trend of the original time series directly. For comparison, the three types of comparison models C1, C2 and C3 adopt the same approaches with C4 in the corresponding stages. Therefore, the method used by C1 is RBFNN or ARIMA, and the ARIMA method [51] is employed as the comparison method to test the effectiveness of the ANNs method. C2 is set as a two-stage hybrid model. The first stage is to reduce the noises in the original time series and the second stage is to predict the trend of the denoised series. Specifically, we utilize the EMD-based denoising method to reduce the noises of the original time series and then predict the trend of the denoised series with the RBFNN (or ARIMA) method, which is denoted as EMD-RBFNN (ARIMA). C3 is set as a three-stage hybrid prediction model. First, decompose the original time series into various components using EEMD method; then, predict the trends of the components with the RBFNN (ARIMA) model; and finally, integrate the prediction results of all the components based on LNN, which is denoted as EEMD-RBFNN (ARIMA)-LNN.
In addition, C4 is a type of four-stage hybrid model proposed in this paper. First, reduce the noises in the original time series; then decompose the denoised time series and predict the trend of each component; and finally, assemble the prediction results of all of the components. The method we employ for each stage of C4 has been introduced in Section 2, the EMD-EEMD-RBFNN-LNN is developed as our final proposed method. To verify the validity of the selected method in each stage of C4, other three hybrid prediction methods with the form of C4 are given as follows: (1) To test the effectiveness of the ensemble method LNN with the traditional simple addition (ADD) method, the first comparison method denoted by EMD-EEMD-RBFNN-ADD is set. (2) The second method is denoted as EMD-EEMD-ARIMA-LNN, which is set to test the effectiveness of the prediction method RBFNN compared with traditional statistic methods, for example, the ARIMA method. (3) The third method is denoted as EMD-WA-RBFNN-LNN, which is set to test the validity of the EEMD method in the decomposition stage, the wavelet analysis (WA) method is employed as a comparison method with EEMD.

Evaluation criteria
To evaluate the prediction accuracy, the data for the six forecasted series are divided into two parts (Table 3) for calibration and verification, respectively. The mean relative error (MRE), mean absolute error (MAE) and root mean square error (RMSE) are used for evaluating of different prediction methods, respectively, which are defined as follows: where r( i) is the real value, f ( i) is the forecasted data, and m is the size of the data. MRE, MAE and RMSE measure the deviation between the actual and predicted value.

Denoising results
The denoising results of the six hydrological time series by the EMD-based method are shown in Fig. 5. The statistical characteristic values, including mean ( x), standard deviation (s ), signalto-noise ratio (SNR) and the root mean square error (RMSE) are used to evaluate the denoising effectiveness, and these are listed in Table 4.
where x(t) is the original time series,x(t) is the denoised time series.
RMSE (Eq. 18) measures the differences between two series; actually, the smaller the RMSE is, the better the performance that can be obtained by denoising. The SNR (Eq. 18) quantifies how much a signal has been corrupted by noises. Table 4 shows the statistical characteristic values. In Table 4, x,x andNare the original time series, denoised time series and noisy time series, respectively. From Table 4, we can see that the mean values of the denoised and original series are similar. Further, the statistical values of wavelet analysis are also listed in Table 4, and its denoised results are shown in Fig. 5. Compared with the wavelet analysis, the SNR is larger and the RMSE is smaller in the EMDbased denoising method. Therefore, the EMD-based method shows much better performance in denoising, which can be clearly seen from Fig. 5.

Decomposition results and complexity analysis
Using the EEMD method, the six denoised hydrological time series obtained above are decomposed into several independent IMFs and one residue. For convenience, the i th IMF is denoted as IMFi (i~1, Á Á Á , n), and n is the number of IMFs. The decomposition results are illustrated in Fig. 6, which lists all of the IMFs in order from the highest frequency to the lowest frequency, andRis the residual component that maintains the trend of the original time series. It is easy to see that the series S1, S2, S3 and S4 are decomposed into four independent IMFs and one residue, whereas S5 and S6 are decomposed into seven and eight independent IMFs and one residue, respectively. The decomposition results indicate that the hydrological time series have complex multi-scale characteristics.
In this study, the complexities of the original hydrological time series and the IMFs obtained by decomposition are measured with the Lempel-Ziv complexity (LZC) theory, which has been used in multiple contexts and is considered an effective model for the measurement of complexity and randomness [52]- [53]. Fig. 7 shows the LZC of the six original series, the denoised series and the IMFs. From Fig. 7, we can see that most of the LZC of the original time series in the six cases are larger than 1, which indicates that the hydrological time series have complex characteristics [54]. The LZC values for the denoised time series are smaller than those for the original time series, meaning that the denoising process reduces the complexity of the original sequences. Fig. 7 also shows that the denoised series are decomposed into several IMFs with smaller LZC values, and they decrease from the highest frequency to the lowest frequency; furthermore, most of the LZC values of the IMFs are much smaller than those for the undecomposed series. Therefore, decomposition greatly reduces the difficulty of forecasting.

Forecasting results
For each extracted IMF and residual component, the RBFNN is adopted to forecast the decomposed components. Similarly, the ARIMA method is employed as a comparative forecast model; that is, each component is predicted by both RBFNN and ARIMA methods. One important part of RBFNN is the determination of the neural nodes of the layers, and the other is the selection of network parameters. In this study, the RBFNN is trained by the toolbox newrb in Matlab, which has the format: net = newrb (P, T, goal, spread, MN, DF), where the 'goal' and 'spread' are two important parameters. The neural node of the output layer is set as 1, and the number of neurons of the input layer is determined by training for many cycles; it does not need to select the neural node of the hidden layer because newrb adds neurons to the hidden layer adaptively. In an ARIMA (p-d-q) model, the best ARIMA model for each training sample is determined based on the Schwarz Criterion [55].
All of the IMFs and residual components of the six hydrological time series are predicted. Taking S1 and S5 as examples, Tables 5  and 6 show the optimal structure and prediction error of the RBFNN and ARIMA models when modeling the IMFs of the denoised series. The results of the prediction errors in both tables show that the RBFNN performs better than the ARIMA method. Furthermore, with the frequency of each component gradually reduced from IMF1 to the residue R, the prediction accuracy of each component gradually increases from IMF1 to R, which further implies that simply adding the forecasting results of the IMFs in the ensemble stage is unreasonable and inaccurate.
The results of evaluating the forecasting of six cases are shown in Table 7, where the values in brackets are the prediction results of the ARIMA method as a comparison with the RBFNN model, the text in bold shows the results of the proposed EMD-EEMD-RBFNN-LNN model, and the other values are its comparison models. Fig. 8 shows the prediction results of the six cases by using the four-stage hybrid forecasting model, which contains the EMD-EEMD-RBFNN-LNN method and its three comparison models: EMD-EEMD-RBFNN-ADD, EMD-EEMD-ARIMA-LNN and EMD-WA-RBFNN-LNN.

Discussion
Based on Table 7 and Fig. 8, we can get the following conclusions: (  Table 7. It has been shown that the forecasting results of any series by C4 are more accurate than those by C1, C2 and C3, indicating that C4 improves the prediction framework. Through comparing several methods in C4, in Fig. 8, we can see that the deviation between the actual value (in blue) and the predicted value is the smallest for the EMD-EEMD-RBFNN-LNN model (in red). This indicates that the proposed EMD-EEMD-RBFNN-LNN hybrid model has the best prediction performance and improves prediction accuracy. (2) Removing the noise of the original time series before forecasting improves the prediction accuracy. This can be seen from Table 7 respectively. This indicates that the nonlinear AI models are more suitable for prediction than traditional statistical models. (5) Compared with the single prediction model, the hybrid prediction model has better forecasting performance. This is because the forecasting results of the four models C1, C2, C3 and C4 have significant differences. Among them, C1 has the worst forecasting accuracy, while C4 has the best forecasting accuracy, especially for the time series with complex multitime characteristics, such as S5 and S6. (6) For several hybrid prediction models in C4, the prediction performances are different when using different decomposition and ensemble methods. Compared with other decomposition methods, as in Table 7 and Fig. 8, the proposed EMD-EEMD-RBFNN-LNN algorithm can yield much better prediction performance than the EMD-WA-RBFNN-LNN method, demonstrating that EEMD is much more efficient in decomposition than the WA method. For the ensemble strategy, the performances of EMD-EEMD-RBFNN-LNN and EMD-EEMD-ARIMA-LNN are both better than EMD-EEMD-RBFNN-ADD, indicating that LNN is the more powerful ensemble method. (7) Based on Table 7, it can be easily seen that the proposed four-stage model C4 performs better than the other three comparison models for all six cases. Therefore, the proposed 'denoising-decomposition-prediction-ensemble' framework has wide applicability for hydrological time series forecasting.

Conclusions
Considering the intrinsic complexity of hydrological time series, a new method with four stages, EMD-EEMD-RBFNN-LNN, is proposed for predicting the hydrological time series. The results of six cases show that the proposed hybrid prediction model improves the prediction performance significantly and outperforms some other popular forecasting methods. From the results of the six experiment cases, the following conclusions can be drawn: (1) The proposed EMD (denoising)-EEMD (decomposition)-RBFNN (prediction)-LNN (ensemble) model is significantly superior to all other comparison methods in terms of prediction accuracy, including the models ARIMA, EMD-RBFNN and EMD-WA-RBFNN-LNN, etc. (2) The time series denoising and decomposition enhance the forecasting performance, which suggests that the denoising and decomposition strategies are effective approaches for improving prediction accuracy. (3) The results of the complexity analysis show that the denoising and decomposition stages decrease the complexity of the series and reduce the difficulties of the forecasting. (4) As the nonlinear and non-stationary characteristics existed in the hydrological time series, the nonlinear model EMD-EEMD-RBFNN-LNN is more suitable for prediction than traditional statistic models. (5) The proposed four-stage hybrid model has wide applicability in hydrological time series forecasting as it can improve the prediction performance for several hydrological time series with different characteristics.
As the denoising and decomposition stages can decrease the complexity of the series, enhance the forecasting performance and reduce the difficulties of the forecasting, the proposed model can be utilized as a very promising method for complex time series forecasting, especially for hydrological time series with multiple components and high irregularity. In addition, this new forecast model is also capable of solving other nonlinear prediction problems. Certainly, there are still two problems which need to be further studied: (1) The prediction methods need to be further improved because any black-box model has some defects, such as the choice of reasonable parameters and structures; (2) The present study only considers univariate time series analysis, but some factors affecting hydrological time series such as climate change are not taken into consideration. If these factors can be included into the proposed hybrid prediction model, the forecasting performance will be greatly improved.