Short-term power load forecasting based on the CEEMDAN-TCN-ESN model

Ensuring an adequate electric power supply while minimizing redundant generation is the main objective of power load forecasting, as this is essential for the power system to operate efficiently. Therefore, accurate power load forecasting is of great significance to save social resources and promote economic development. In the current study, a hybrid CEEMDAN-TCN-ESN forecasting model based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and higher-frequency and lower-frequency component reconstruction is proposed for short-term load forecasting research. In this paper, we select the historical national electricity load data of Panama as the research subject and make hourly forecasts of its electricity load data. The results show that the RMSE and MAE predicted by the CEEMDAN-TCN-ESN model on this dataset are 15.081 and 10.944, respectively, and R2 is 0.994. Compared to the second-best model (CEEMDAN-TCN), the RMSE is reduced by 9.52%, and the MAE is reduced by 17.39%. The hybrid model proposed in this paper effectively extracts the complex features of short-term power load data and successfully merges subseries according to certain similar features. It learns the complex and varying features of higher-frequency series and the obvious regularity of the lower-frequency-trend series well, which could be applicable to real-world short-term power load forecasting work.


Introduction
In recent years, the global natural environment has continued to deteriorate, energy demand has exceeded supply, and many places in the world have varying degrees of power supply shortage phenomena, but in some areas, there is a serious waste of electricity, which is a great loss to human society.Ensuring a sufficient power supply while minimizing redundant power generation is the main objective of power load forecasting; therefore, in recent years, power load forecasting has become an important direction of research in the field of natural sciences.
Power load forecasting mainly includes short-term load forecasting and medium-to longterm load forecasting.Different forecasting methods have been proposed for short-term load forecasting in recent years, which can be broadly divided into four categories according to

Time series forecasting method
Sadaei et al. [3] proposed a combination of fuzzy time series (FTS) and seasonal autoregressive integrated sliding average (SARFIMA) for forecasting seasonal time series that follow a long memory process.Mi et al. [4] proposed a method based on an improved exponential smoothing gray model for forecasting short-term power load data.Gray correlation analysis was first used to identify the main influencing factors, exponential smoothing was then applied to process the original data and build a gray forecasting model using a smoothed series in line with the exponential trend, and inverse exponential smoothing was finally used to recover the forecast values.
All the above improved time series forecasting models have achieved good forecasting results, but major limitations still exist in dealing with complex and variable nonlinear relationships among data.Generally, they can only forecast lower-frequency data and perform poorly when dealing with higher-frequency data.

Machine learning
Hnin and Jeenanunta [5] used support vector regression (SVR) to build a prediction model for daily electricity demand and used the particle swarm optimization algorithm (PSO) to optimize the parameters of SVR, which substantially improved the prediction accuracy of the model.Albuquerque et al. [6] used a regularized machine learning model with random forest (RF) and the Lasso Lars method, successfully extracting trends and seasonality in each time region and thus achieving more accurate forecasts in all time domains.Fan et al. [7] proposed a combined SVR-GC-RF algorithm short-term load forecasting model, using SVR to address the nonlinear relationships among data and a garbage collection algorithm (GC) to extract mutation points from long-term data and reduce randomness.Finally, the prediction performance is optimized using RF.Ribeiro [8] proposed a hybrid learning model based on a dual decomposition approach.The locally weighted regression (STL) is used to decompose the original time series into seasonal, trend, and residual components.Then, variational mode decomposition (VMD) is adopted to decompose the STL residual into different frequencies, and seasonal naïve is used to handle the seasonal patterns.Moreover, in consideration of the nonlinearities of the remaining components, extreme learning machines, ridge regression, and support vector regression models are employed to handle the STL trend and VMD components.
The accuracy of machine learning in completing prediction tasks often requires excellent feature engineering, and the setting of model parameters also has a great impact on the prediction accuracy.Machine learning models often perform poorly on chaotic systems.

Deep learning
In recent years, deep learning models have been increasingly adopted in the research of time series such as wind speed and temperature.Compared with traditional statistical models, deep learning models often have better performance in the prediction task of multi-feature complex time series.Lv et al. [9] proposed an effective hybrid model for wind speed forecasting tasks based on an improved hybrid time series decomposition strategy (HTD) and the novel multi-objective binary backtracking search algorithm (MOBBSA).Furthermore, the advanced Sequence-to-Sequence (Seq2Seq) predictor is used to obtain the final forecasting result.In addition, Lv et al. [10] also proposed the FWNSDEC-SSA-ConvLSTM model for enhancing the short-term wind speed forecasting performance.The filter-wrapper non-dominated sorting differential evolution algorithm incorporating K-medoid clustering (FWNSDEC) is designed to select key meteorological factors, and singular spectrum analysis (SSA) is used to decompose the meteorological factors.Then, the convolutional long short-term memory (ConvLSTM) network is adopted to obtain the final forecasting result.Similarly, deep learning is very important in the prediction task of short-term power loads.
Zhou et al. [11] proposed a novel predictive system based on a data denoising strategy, statistical predictive systems, artificial intelligence forecasting systems and a multi-objective optimization strategy.After the data denoising process, the reconstructed data are used as the input of different subsystems.Then, to obtain stable forecasting results, a multi-objective dragonfly algorithm is used to estimate the weight coefficient of the subsystems.Trierweiler Ribeiro et al. [12] used an echo state network (ESN) for short-term load forecasting and proposed a Bayesian optimization algorithm to calculate the hyperparameters of the ESN model.Guo et al. [13] used convolutional neural networks (CNNs) to cascade shallow and deep features from different scales, and the feature vectors from different scales were fused and fed into a long short-term memory network (LSTM) for future load prediction.Khwaja et al. [14] used an integrated machine learning approach based on an artificial neural network (ANN) for short-term load forecasting and proposed a technique that combines bagging and boosting to improve ANN model performance.Peng et al. [15] proposed a long short-term memory-based model for energy consumption forecasting.The empirical wavelet transform (EWT) technology was first used to decompose the energy consumption sequence into several components.Then, each component is concatenated as the input of the attention-based LSTM model to obtain the predicted value of each component.Finally, the predicted value of each component is integrated to obtain the final output.
Compared with the methods, deep learning has stronger fault tolerance and can handle complex data information.However, it is limited due to the amount of sample data needed, slow convergence, complex optimization function, and being prone to fall into local extremes.

Research methods proposed in this paper
This paper proposes a combined model CEEMDAN-TCN-ESN to achieve point estimate forecasting of short-term electric load.In the first stage, this paper trains and predicts the original series that have been preprocessed but not decomposed by complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) using multiple models, compares the prediction accuracy of a single model on this dataset and finds the best single prediction model.Then, the original time series of the electric load are decomposed by using the CEEMDAN decomposition technique to obtain a total of 15 subseries, including trend terms, and the same set of models is used to forecast each subseries.All 15 forecasts are added up to obtain the final forecasting results and compare the forecasting effect of a single model on the corresponding datasets before and after decomposition to reflect the enhancement effect of CEEMDAN modal decomposition on the forecasting performance.Moreover, to improve the forecasting accuracy and efficiency, this paper reconstructs all 14 subseries obtained after CEEMDAN decomposition of the original series, except the trend term, into two combined series of high frequency and low frequency.
Considering the similarity of the characteristics of lower-frequency series and trend term series, (i.e., compared to higher-frequency series, both have relatively gentle changes and more obvious regularity).Therefore, the lower-frequency series and the trend term series are combined to obtain two merged series of higher-frequency and lower-frequency trends and use the same set of models as before to model and predict the two merged series, determine the best model for each of the two series, build a combined model for prediction, and calculate the performance improvement percentage relative to the best single model when the series are not decomposed and the best single model when the series are decomposed but not reconstructed.
In this study, a special combination model is designed by combining the modal decomposition technique and the dual prediction model.Compared with the research methods without time series decomposition or using only a single prediction model, it has higher prediction efficiency and accuracy.The main contributions of this paper are as follows.
1.A special combination model is designed by combining the modal decomposition technique and the dual prediction model.Compared with the research methods without decomposition techniques or using only a single prediction model, it has higher prediction efficiency and accuracy.
2. This paper is not limited to a single traditional statistical perspective or a single machine learning perspective but considers the prediction model from two fields at the same time.
The respective classical models are selected as experimental alternatives to discover the fitting effects of different classes of models on data with different characteristics.By observing their respective performances in the original series, higher-frequency, and lower-frequency-trend series to achieve a two-way comparison between horizontal and vertical.The rest of this paper is organized as follows.Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), reconstruction, temporal convolutional network (TCN), and echo state network (ESN) are introduced in detail in Section 2. The specific learning framework of CEEMDAN-TCN-ESN will be introduced in Section 3. Section 4 contains the parameter settings of the model, the output results of the model applied to the dataset, and the performance indexes of the proposed model and the benchmark model.Section 5 applies the proposed combined model to another dataset to check the robustness of the model.Finally, the conclusions are presented in Section 6.

Methodologies
In this section, CEEMDAN, intrinsic mode function (IMF) component reconstruction, TCN, and ESN are introduced one by one, which are the basis of the combinatorial model proposed in this paper.

CEEMDAN
Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is an algorithm implemented on top of the empirical mode decomposition (EMD) algorithm.
The basic principle of the EMD algorithm is based on the local characteristic time scale of the original signal.The original signal is decomposed into characteristic mode functions to obtain a series of intrinsic mode function (IMF) components from high frequency to low frequency.The EMD algorithm has excellent properties such as adaptability and multiresolution, but the algorithm also has some obvious disadvantages, including envelope fitting bias, endpoint effect, modal blending, etc.The modal aliasing phenomenon is manifested in these two aspects: (1) signals with different feature scales appear in the same IMF component, and (2) signals with the same feature scale appear in different IMF components.
To solve the problem of modal confusion, Chang et al. and Torres et al. [16,17] proposed two other decomposition methods, EEMD and CEEMDAN.The main improvement idea of the EEMD algorithm is that by using the zero-mean property of white noise, the white noise obeying uniform distribution is introduced several times in the process of decomposition, and the noise of the original signal itself is masked by the artificially added noise to obtain a more accurate upper and lower envelope.At the same time, the decomposition results are averaged, and the more times the averaging process is performed, the smaller the impact of noise on the decomposition.
Assuming that the original time series is X(t), the decomposition steps of CEEMDAN are as follows.
(1) First, a Gaussian white noise sequence of length t periods (t is the length of the original time series) is added to the original sequence, and the adaptive coefficient of Gaussian white noise in each phase is set to a 0 .Then, the first modal component IMF 1 is obtained by EMD decomposition of the time series after white noise has been added.Furthermore, the above procedure is repeated p times to obtain p IMF 1 and calculate their mean values to obtain IMF 1 .The specific process is demonstrated in Eqs (1) and (2): (2) Remove IMF 1 from the original time series X(t) and label the remaining sequence as r i 1 ðtÞ.The adaptive signal E 1 (wn i (t)) will be calculated by EMD and added to the remaining sequence r i 1 ðtÞ.Next, similar to step (1), Eqs (1) and ( 2) are repeated p times to obtain p IMF 2 , and then their mean values are calculated to obtain IMF 2 .The specific process is shown in Eqs (3)-( 5): (3) For the kth (k = 2,3. . .n) modal components, similar to step (2), IMF k is obtained.The specific process is shown in Eqs ( 6)-( 8): (4) The above steps will be repeated until the remaining components are no longer suitable for decomposition.Finally, all the IMFs that satisfy the conditions are successfully extracted, and the trend term r n (t) is obtained at the same time.

Reconstruction
The IMF components obtained from the CEEMDAN decomposition satisfy the local symmetry of the upper and lower envelopes with respect to the time axis.For high-frequency IMF components, the upper and lower envelopes are basically obtained by connecting numerous signal peaks, so the symmetry of the upper and lower envelopes means that the IMF component data are also basically symmetric and the mean value of the data tends to be close to zero.However, for low-frequency IMF components, due to their long signal period, the envelopes are obtained by interpolating a small number of signal peaks, resulting in a large deviation of the envelope trend from the original signal trend.Therefore, for low-frequency IMF components, the signal component data are often not symmetric when the envelope is symmetric or even far apart, and the mean value of the data generally does not tend to be close to zero [18].
Based on this, this paper first sums the IMF components obtained from CEEMDAN decomposition item by item, defines the sum of the first i IMFs as index i (i = 1,2. .... k), calculates the mean value of index 1 to index k , and conducts Student's t test on whether the mean value significantly deviates from zero.The specific process is shown in Eqs ( 9)-( 11):

TCN
The temporal convolutional network (TCN) stands for time domain short for convolutional network, which originally consisted of dilated, causal 1D convolutional layers with the same input and output lengths.The temporal convolutional network is improved based on the traditional 1D convolutional neural network while combining causal convolution, dilated convolution and residual linking.Some researchers have applied TCN model to predict short-term distance headway, ultra-short-term wind power, rockburst risk level and other things [19][20][21].
The convolutional structure of the TCN model is shown in Fig 1.
The receptive field of the TCN network neuron, i.e., the network memory length, is determined by the convolutional kernel size, the expansion coefficient and the number of convolutional layers, and the formula for the t-th operation after the expansion convolution operation is shown in Eq (12): In Eq (12), x is the input sequence, * is the convolution operation, u is the convolution kernel size, f(i) is the i-th element of the convolution kernel, and x t−p.i is the element of the input sequence that corresponds to the convolution kernel.
The TCN residual module contains the underlying causal expansion convolution layer, the weight normalization layer, the activation function and the dropout layer.The structure of the N residual module is shown in Fig 2: Weight normalization can solve the gradient explosion problem and speed up the computation effectively.The ReLU activation function is also used, and the dropout layer is added after the ReLU activation layer to prevent overfitting to achieve the regularization effect.Finally, the problem of different latitudes of the residual tensor is adjusted by nonlinear mapping (e.g., 1×1 convolution).

ESN
The echo state network (ESN), also known as reservoir computing, was proposed by Jaeger in 2001, and a reserve pool composed of neurons obtained by a randomly generated sparsely connected internal weight matrix is used as the hidden layer to project the input layer to a highdimensional, nonlinear space representation.The ESN is not trained to generate the hidden layer weights of the neural network but to generate the hidden layer weights in advance, and this process is performed separately from the process of training the weights from the hidden layer to the output layer.The basic idea is based on the premise that the generated reserve pool has some good properties, i.e., it is often guaranteed that excellent performance can be obtained by training the weights from the reserve pool to the output layer using only linear methods [22,23].Input layer: Input the tensor of k×1, multiply the input by W in (the weight matrix with the input unit to the inside of the reserve layer) and enter the resulting product into the reserve layer.
Reserve layer (N-node network): Each node in the reserve pool corresponds to a state and is represented using X N .The internal neuron connection weight matrix W is also present in the reserve layer.
Output layer: Multiply the output of the reserve pool by W out (the weight matrix with the inside of the reserve layer to the output layer) to obtain the target value y.
The respective values of the input layer u(t), reserve pool neuron hidden state x(t), and output layer y(t) at moment t are taken as in Eqs ( 13)-( 15 16) and (17).
In the above equations, f is the activation function of the reserve layer's cells, and f out is the activation function of the output layer's cells.W in is the connection weight matrix of the input layer to the reserve layer.W is the internal connection weight matrix of the reserve layer.W back is the connection weight matrix of the output layer to the reserve layer.All of them are preprogrammed before the model training, and only W out (the connection weight matrix of the reserve layer to the output layer) is obtained through the training model.
The research framework of this paper is shown in Fig 4.

Framework of the CEEMDAN-TCN-ESN
In this section, a detailed introduction to the CEEMDAN-TCN-ESN learning framework is presented.First, the algorithmic principles of the learning framework are shown in the form of pseudocode.The learning framework of CEEMDAN-TCN-ESN is shown in Fig 5.
The CEEMDAN-TCN-ESN learning framework can be specifically divided into four steps.
In this paper, first, after completing basic operations such as data preprocessing and smoothing for a time series, CEEMDAN is used to perform modal decomposition of the series to obtain n modal components.
2. Sequence Reconstruction.The modal components obtained in (1) are then reconstructed serially by summing the subseries obtained from the CEEMDAN decomposition item by item, defining the sum of the first i IMFs as index i (i = 1,2. .... n), calculating the mean of index 1 to index n , and conducting Student's t test on whether the mean value is significantly different from zero or not.At this point, the first k-1 signal components are the higher-frequency components; thus, the first k-1 modal components are combined to obtain the higher-frequency series, and the remaining components except the last component (trend term) are combined to obtain the lower-frequency series.The lower-frequency-trend series is obtained by merging the lowerfrequency series with the trend term, which completes the reconstruction task of the series.
3. Model Selection.The higher-frequency and lower-frequency-trend series obtained from the reconstruction in (2) are fitted and predicted by various models, and the optimal prediction models for the two series are selected based on the previously determined model performance indicators (e.g., RMSE and MAE).The prediction data of the training set and the test set are obtained for the two series when the optimal models are applied for prediction.

Experimental analysis
In this section, information about the dataset used, the indicators used to evaluate the predictive performance of the models, and the parameter settings of the various models chosen for the experiments are provided.

Data source
The data used in this paper are derived from the Panama national electricity load data.This dataset provides data on Panama's national electricity load at one-hour intervals and provides a total of 17,520 data points.The time span of the series is from 0:00 on January 1st, 2018, to 23:00 on December 31st, 2019.The statistical results of the raw data are shown in Table 1.

Evaluation indicators
To assess the prediction effectiveness of each model from a quantitative perspective, this paper chooses the root mean square error (RMSE), mean absolute error (MAE) and goodness-of-fit (R 2 ) to measure the prediction accuracy and generalization ability of different models.Assuming y i is the true data value and ŷi is the predicted value of the model, where i = 1,2. ..n (n is the sample size), the mathematical representation of the abovementioned evaluation indicators is shown in Eqs ( 18)- (20).

RMSE ¼
ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 n From the definitions of the above three evaluation indicators, it can be seen that if the values of RMSE and MAE are smaller and the value of R 2 is closer to 1, the prediction error of the model will be smaller and the prediction effect of the model will be better, suggesting that the model's generalization ability may be stronger.

Data preprocessing
The current study uses power load data of two years in length and hourly in frequency, and the dataset does not contain missing values.Outliers in short-term power load forecasting have a large negative impact on the establishment of the model, so the identification and processing of outliers is critical in the training process.To achieve convenient and rigorous identification of outlier points, it is assumed that the electricity load data in the dataset obey a normal distribution.Then, according to the 3δ principle of a normal distribution, the electricity load values that differ from the sample mean by more than 3 times the standard deviation are identified as outliers, whereby five outlier points are detected in the electricity load data of Panama for two years (2018-2019) and replaced with missing values.Since the electric load data are continuous and periodic to some extent, this paper chooses the linear interpolation method to fill the above missing values.In addition, to avoid the problem of increasing the training time or even failing to converge due to the existence of odd sample data in the dataset, the data are normalized before training.
In this paper, the first 80% of the data in the dataset are used as the training set, and the remaining 20% are divided into the test set.The training set is used to select key parameters for the training model and to build the model, while the test set is used to evaluate the prediction accuracy of the model.ARIMA: First, the series is tested for smoothness and white noise, and the test results show that the series is a smooth and nonwhite noise series after data preprocessing.Then, the autoregressive order, moving average order and differential order of the ARIMA model are set as p, q and d, respectively, and the ranges of p, q and d are set at (0, 2).Combining the AIC and SBC, the parameters of the ARIMA model are finally determined as follows: p = 2, q = 1, and d = 1, which means fitting an ARIMA( The parameters of the TCN and ESN are specified in ( 5) and (6).

CEEMDAN parameter settings
In this paper, the CEEMDAN algorithm is used in the PyEMD package to set different modal numbers to decompose and test the training set data.The results show that when the modal number is set to 15, the score of each subseries obtained after CEEMDAN decomposition is the most stable.

Sequence reconstruction parameter setting
This paper uses MATLAB to reconstruct all the subseries obtained after CEEMDAN decomposition except the trend term according to the high-and low-frequency characteristics and define the sum of the first i signal components as index i .The basis for dividing the higher-frequency series and lower-frequency series is set as whether the mean value of the first k indexes is significantly different from zero or not.If it is not significantly different from zero, the first k signal components will be divided into a higher-frequency series, and if it is significantly different from zero, then the first k-1 components will be classified as a higher-frequency series, and the remaining components will constitute a lower-frequencytrend series with a trend term.

TCN parameter settings
In this paper, the higher-frequency series obtained after reconstruction are used as the input series of the TCN prediction model.Based on the theoretical research in related aspects and combined with the specific dataset selected in this paper, the key parameters of the TCN are finally shown in Table 2. Since the trend of power load data tends to be temporally periodic, and generally on a daily basis, the input of each sample of the TCN network in this paper is the power load data of the previous 24 hours, and the output is the point estimate of the current power load forecast.
The TCN network consists of one residual module, followed by a fully connected layer, in which the residual module contains two convolutional units and one nonlinear mapping.To extract the sample information more fully, the convolutional kernel size of the convolutional unit is preset to the common Figs 2 and 3, and it is found that the prediction effect is significantly better when the convolutional kernel size is 3 than 2 after preliminary experiments.The dropout layer is established to prevent overfitting, and the dropout rate is set to 0.2 (i.e., some neurons will be randomly deleted with a probability of 0.2).The expansion coefficient (dilations) is set to [1,2,4,8] (exponentially increasing by 2), and the number of filters is set to 64.Since temporal signal modeling cannot violate the temporal order, which will produce causal convolution, the padding parameter is set as 'Causal'.For the choice of model optimizer, the Adam optimizer has a faster and more stable convergence speed in this experiment compared to stochastic gradient descent.For other parameters, the number of batch sizes and max epochs are set to 32 and 80, respectively, and the loss function is chosen as the commonly used MSE.
(2) ESN parameter settings: The lower-frequency series obtained after reconstruction are used as the input series of the ESN prediction model.Based on the theoretical research in related aspects and combined with the specific dataset selected in this paper, the key parameters of ESN are finally shown in Table 3.
For an echo state network (ESN), the main parameters include the size of the reserve pool (N R ), the spectral radius (ρ), the input scale factor (W in ), and the output regularization factor (λ r ).MATLAB is used to complete the ESN prediction experiment for lower-frequency-trend series, setting the number of samples in the initialized reserve pool to 100, the size of the reserve pool to 800, the update rate to 0.001, and the output regularization factor to 0.001.During the propagation of the echo state network, the weight matrix from the input layer to the reserve pool (W in ), the weight matrix of the output layer to the reserve pool (W back ), and the internal neuron connection weight matrix (W) in the reserve pool are set in advance and generated randomly.The spectral radius is the maximum value in the mode of all eigenvalues of the weight matrix W. For the selection of the optimizer, "Adam" is chosen.The loss function is still the commonly used MSE.

Single model prediction.
To demonstrate the enhancement effect of CEEMDAN decomposition on short-term electricity load forecasting, this paper first uses different single models for short-term electricity load forecasting without decomposing the sequence, and eight different forecasting methods (BiLSTM, ARIMA, GRU, RBF, CNN, LSTM, TCN, and ESN) are selected for experiments.The evaluation results of each model on the test set (Panama national electricity load data from 4:00 on August 8th, 2019, to 23:00 on December 31st, 2019) are shown in Table 4 and Fig 6.
After analyzing the results of the single model forecasting experiments and considering the three evaluation indicators, this paper finds that the TCN model has the best forecasting effect on this dataset, BILSTM is the second best single forecasting model, and the ARIMA model has the worst forecasting effect.This shows that the two neural network models, TCN and BILSTM, have excellent performance in the short-term electric load forecasting task.The possible reasons for this are as follows: (1) short-term power load data are often influenced by many factors, and they often have complex and variable characteristics with weak regularity, so using traditional statistical models (e.g., ARIMA) to forecast them can only extract linear trends or some simple features in the data, so the prediction results will be worse.Compared with the traditional statistical model, the neural network model has more advantages in processing data with complex and changeable characteristics.(2) For the short-term power load prediction problem, RNNs (such as LSTM and BILSTM) may work better than RBF and CNN, and LSTM has an extra gate compared with GRU, which can be used to control the flow of information, so it may improve the prediction effect.(3) Meanwhile, BILSTM, as a two-way (long-term and short-term memory), consists of a combination of forward LSTM and backward LSTM, which can better capture the relationship between the sequence of temporal features than the unidirectional LSTM, taking into account both the information in front and behind.Since time series prediction sometimes needs to use both previous and later data as input items to build the model, BILSTM can improve the prediction effect compared with LSTM.(4) Unlike LSTM, which can only process time series data sequentially, TCN can process time series data in parallel.At the same time, the TCN contains many parameters and has a more complex structure.TCN has a flexible perceptual field (jointly determined by the size of the convolution kernel, the dilation coefficient, and the number of residual blocks), so it can flexibly cope with different prediction problems by adjusting the parameters.In addition, LSTM often suffers from gradient disappearance and gradient explosion, while TCN circumvents these problems.In summary, TCN performs best when the original series is predicted directly using a single model after data preprocessing only.Considering the complexity of short-term power load data features, this paper next performs a modal decomposition of the series to fully extract the features of this series and complete further prediction tasks.The original Panamanian national electricity load data series is decomposed into several IMF components with different frequencies, and the IMF component series with lower frequencies show a smoother trend.The original time series is divided into 15 subseries after CEEMDAN decomposition.The frequency of the signal components from IMF1 to IMF15 gradually decreases, the amplitude gradually decreases, the wavelength gradually increases, and there is no obvious trend of change in each subseries.The smoothness of the decomposed subseries increases compared with the original series, and the noise contained in the sequence gradually decreases; thus, sequence decomposition is likely to improve the prediction effect.

IMF component reconstruction results.
Since the original Panamanian national electric load data series will obtain many decomposition modes after decomposition, the decomposition modes with high similarity need to be reconstructed to avoid many feature calculations and manual analysis.Considering that different models have significant differences in applicability in both high-and low-frequency data, while the characteristics between the series attributed to the same frequency are approximately the same, i.e., the models have basically the same prediction effect on them, higher-frequency, lower-frequency and trend components are selected for discriminant reconstruction.This paper first sums the IMF components obtained from CEEMDAN decomposition item by item, defines the sum of the first i IMFs as index i (i = 1,2. ....k), calculates the mean value of index 1 to index k , and performs Student's t test to determine whether the mean value is significantly different from zero to discriminate the high-and low-frequency components.The discriminant result is that the first four signal components, namely, IMF1-IMF4, are high-frequency components, IMF5-IMF14 are low-frequency components, and IMF15 is the trend term.The higher-frequency and lower-frequency components are then summed separately to achieve sequence reconstruction.The reconstructed higher-frequency, lower-frequency and trend (partial) series are shown in Fig 8 .4.4.4TCN experimental results.To compare the prediction performance of different models in terms of time series of higher-frequency electricity load data after reconstructing the IMF components, eight different prediction methods such as BiLSTM, ARIMA, GRU, RBF, Based on the graphs, the evaluation indicators of the TCN model are found to be optimal compared to the evaluation indicators of other models after analysis, which indicates that the TCN model has excellent performance in the prediction of higher-frequency power load data.The prediction process of the TCN model for higher-frequency electric load data is shown in Fig 10.This paper analyzes the possible reasons for the excellent performance of the TCN model compared to other models in predicting higher-frequency electric load data as follows: (1) First, TCN can perform parallel processing when making predictions while facing time series sensitive problem tasks, although RNN (e.g., LSTM) may usually be more suitable than BP and RBF.It can be used to control the information flow and thus may improve the prediction results, but compared with TCN, it can only perform sequential processing.Therefore, it may  be less effective than TCN for higher-frequency time series prediction.(2) Second, TCN has a flexible perceptual field, which is determined by the number of layers, convolutional kernel size, and expansion coefficient.It can be flexibly customized according to the different characteristics of different tasks.Therefore, when dealing with high-frequency data, there is no problem of gradient disappearance and gradient explosion, as other models often have.Some research results show that TCNs with the introduction of architectural elements such as null convolution and residual connectivity are more effective than recursive architectures such as LSTM in different time series modeling tasks.(3) While ARIMA is a traditional time series forecasting method, its accuracy is relatively poor compared to that of neural network forecasting models.Moreover, ESN presents crossover prediction results among the eight models, presumably because ESN is not suitable for time series prediction modeling of this dataset.Therefore, TCN is improved in different aspects compared with other models, so the TCN model is chosen to predict the time series of high-frequency power load data after the reconstruction of IMF components.Based on the graphs, the evaluation indicators of the ESN model are found to be optimal compared to the evaluation indicators of other models after analysis, which indicates that the ESN model has excellent performance in predicting low frequency and trend term power load data.
This paper analyzes the possible reasons for the excellent performance of the ESN model compared with other models as follows: (1) Compared with traditional recurrent neural networks, the biggest advantage of ESN is that it simplifies the training process of the network, solves the problems of the traditional recurrent neural network structure that is difficult to determine and overly complex training algorithms, and overcomes the problems of memory reduction in recursive networks.(2) Compared with CNN, LSTM and other neural network models with more complex structures, ESN uses a large-scale random sparse network (reserve pool) as the information processing medium, maps the input signal from the low-dimensional input space to the high-dimensional state space, and uses a linear regression method to train some connection rights of the network in the high-dimensional state space, while the other connection rights are randomly generated and kept constant during the network training process.It also effectively avoids the problem of overfitting.
Therefore, according to the above analysis, the ESN is improved in different aspects compared with other models, so the ESN model is chosen in this paper to predict the time series of lower-frequency and trend series.

CEEMDAN-TCN-ESN experimental results.
Based on the above analysis, the TCN model is chosen to predict the reconstructed higher-frequency series, while the ESN model is chosen to predict the reconstructed lower-frequency and trend series.The TCN model, which is the most effective among the single models, is then used to predict all 15 subseries obtained from the original series after CEEMDAN decomposition, and the evaluation indicators of the two methods are compared to highlight the superiority of the CEEMDAN-TCN-ESN model.That is, this paper compares the prediction performance of the two models resulting from whether the IMF components are reconstructed with respect to the time series of power load data and the improvement effect of the above two models with respect to the prediction effect of the best single prediction model TCN without decomposition.In this paper, based on the single model prediction in 4.   indicating that the target model proposed in this paper has a higher prediction accuracy, better prediction performance, and a significant improvement in the prediction effect.This may be due to the division into two series of high-frequency and low-frequency trends in the reconstruction process.Then, relatively suitable models are selected for modeling and prediction according to the characteristics of different sequences to enhance the final prediction effect.Additionally, the summation of high-frequency, low-frequency and trend terms after their  identification also has a certain degree of denoising effect, so the prediction effect is also improved.

Robustness analysis
Robustness analysis is critical and necessary for better application of the model to real-life applications.The data used in this section are derived from the solar power hourly data of the United States in the Kaggle data platform's short-term electricity load forecasting dataset.A total of 19445 solar power hourly data points (from 0:00 on January 1st, 2020   From Table 8 and Fig 13, it is clear that the hybrid CEEMDAN-TCN-ESN model proposed in this paper still has good prediction results when applied to other datasets, which suggests that the model has strong robustness, and the model's adaptability to different datasets and prediction results are more satisfactory.

Conclusion
Due to the complex characteristics of the data and many related influencing factors, this paper establishes a hybrid model of the TCN and ESN based on CEEMDAN decomposition and high-and low-frequency series reconstruction modes for the STLF problem.In this paper, CEEMDAN decomposition is used to decompose the original power load series into subseries with different frequency characteristics, and all the subseries are reconstructed according to their frequency characteristics to obtain the higher-frequency and the lower-frequency-trend series.Then, the same eight models are used to predict the higher-frequency and lower-frequency-trend series, and the best prediction models for the two reconstructed series are TCN and ESN.Finally, the prediction results for the two series are combined to output the final power load forecast results.Through the experimental analysis and the comparative analysis of different models, the results show the following: (1) For the prediction of the original electricity load series, TCN has the best performance among the single prediction models chosen for this experiment, indicating that it can extract the features of the original time series to a large extent.BILSTM second, and ARIMA is the worst, indicating that deep learning approaches generally outperform traditional statistical models in STLF.(2) CEEMDAN decomposition can fully extract time series features and significantly improve the prediction accuracy of the model.(3) Sequence reconstruction can fuse the sequences with similar frequency features in the subseries obtained from CEEMDAN decomposition to obtain two reconstructed series.The optimal model is selected for each of the two series to produce a hybrid model to further improve the prediction effect of the model.( 4) The RMSE, MAE, and R 2 of the proposed hybrid model are 15.081, 10.944, and 0.994, respectively, which are higher than those of the other models.Compared to the best single prediction model (TCN), the RMSE is reduced by 51.3%, the MAE is reduced by 52.85%, and the R 2 is improved by 2.16%.Compared to the second-best model (CEEMDAN-TCN), the RMSE is reduced by 9.52%, the MAE is reduced by 17.39%, and the R 2 is improved by 0.20%.
The CEEMDAN-TCN-ESN model proposed in this paper performs well in STLF and has certain application prospects, but the model still has the following problems: the expenditure of the model is large, the depth of the TCN is shallow, and only the hour-by-hour forecasting of short-term power load data is performed.In the future, if the experimental budget is sufficient, deepening the depth of the network to explore the prediction accuracy of the proposed model for 3 h, 6 h, 12 h or even 24 h in the future and expanding the scope of the model could be considered.

4 .
Integration of Predicted Results.The training-set prediction data of the two sequences in (3) are summed to obtain the final training-set prediction results and evaluation indicators of CEEMDAN-TCN-ESN; similarly, the test-set prediction data of the two sequences are summed to obtain the final test-set prediction results and evaluation indicators of CEEMDAN-TCN-ESN in this paper.

4 . 4 . 5
ESN experimental results.To compare the prediction performance of different models in terms of time series of electricity load data with low frequency and trend terms after reconstructing the IMF components, 8 different prediction methods (BiLSTM, ARIMA, GRU, RBF, CNN, LSTM, TCN, and ESN) were selected for experiments.The evaluation results and prediction results of each model on the test set are shown in Table 6 and Fig 11.

Fig 9 .Fig 10 .
Fig 9. Experimental results of the prediction of high-frequency data by each model.https://doi.org/10.1371/journal.pone.0284604.g009 4.1, 3 different prediction methods (CEEMDAN-ESN, CEEM-DAN-TCN and CEEMDAN-TCN-ESN) are further selected for the experiments.The evaluation indicators and forecasting results of the 5 models on the test set are shown in Table 7 and Fig 12.In this paper, the superiority of the CEEMDAN-TCN-ESN model is demonstrated based on the comparison model.Based on the results in Fig 12, the following conclusions can be drawn: comparing the evaluation indicators of the model without the reconstruction strategy with that of the model with the reconstruction strategy, the following conclusions can be

drawn: ( 1 )
The prediction effects of the CEEMDAN-TCN model and the CEEMDAN-ESN model with the CEEMDAN modal decomposition-based approach are significantly improved compared with the single TCN model and the ESN model, respectively.obtained a significant improvement, i.e., the CEEMDAN-TCN and CEEMDAN-ESN models reduced RMSE by 46.17% and 32.95% and MAE by 42.92% and 30.02%, respectively, compared to the TCN model and ESN model without the decomposition strategy.R 2 increased by 1.95% and 1.96%, respectively.This indicates that the high-frequency, low-frequency, and trend term subseries out of the CEEMDAN-based decomposition are smoother and more regular, which in turn reduces the effect of data instability on the prediction effect and indicates that TCN and ESN are more applicable to the prediction of the CEEMDAN-decomposed series.(2) The CEEM-DAN-TCN-ESN model with the reconstruction strategy is more suitable than the CEEM-DAN-TCN-ESN model without the reconstruction strategy.For the CEEMDAN-TCN model and CEEMDAN-ESN model, the RMSE decreased by 9.52% and 35.73%, and the MAE decreased by 17.39% and 40.15%, respectively.R 2 increased by 0.20% and 0.91%, respectively.It shows that the CEEMDAN-TCN-ESN model has a significant improvement in model evaluation indexes compared with the CEEMDAN-TCN model and the CEEMDAN-ESN model,

to 4 :
00 on March 21st, 2022) in 48 states of the United States are selected for the robustness analysis experiments of the proposed model.The first 16445 data points are used as the training set, and the last 3000 data points are used as the test set.The final prediction results of each model on the test set are shown in Table 8 and Fig 13.

Table 1 . The statistical results of raw data.
Single prediction model parameter settingsIn this experiment, eight single models are selected to model the time series directly after data preprocessing.The eight single models selected are ARIMA, LSTM, BILSTM, CNN, TCN, GRU, RBF, and ESN.The parameters of each single model are set as follows.
2,1,1) model.BILSTM: The model parameters of BILSTM in this test are set as follows.The number of hidden layer and hidden layer units is set to 2 and 32, respectively, the learning rate is set to 0.01, and the training round number is set to 80. "Adam" is chosen as the optimizer, the loss function is MSE, and the batch size is 32.LSTM: The model parameters of LSTM in this test are set as follows."ReLU" is chosen as the activation function, the learning rate and the discard rate are set to 0.01 and 0.2, respectively, and the training round number is set to 80. "Adam" is chosen as the optimizer, the loss function is MSE, and the batch size is 32.
16N: The model parameters of the CNN in this test are set as follows.The number of filters and the convolutional kernel size are set to 32 and 2, respectively, "ReLU" is chosen as the activation function, both the number of strides and pool_size are set to 1, the training round number is set to 80, "Adam" is chosen as the optimizer, the loss function is MSE and the batch size is16.GRU: The model parameters of the GRU in this test are set as follows.The number of hidden layer units and the discard rate are set to 100 and 0.2, respectively, the output layer contains 1 neuron, and the number of training rounds is set to 80. "Adam" is chosen as the optimizer, the loss function is MSE, and the batch size is 32.RBF: The model parameters of RBF in this test are set as follows.The radial basis function expansion rate is set to 1000.

Table 7 . Evaluation results of the forecast model on the indicators of electric load data.
https://doi.org/10.1371/journal.pone.0284604.t007

Table 8 . Evaluation results of the forecast model on the indicators of solar power hourly data.
https://doi.org/10.1371/journal.pone.0284604.t008