Abstract
This study focuses on improving short-term power load forecasting, a critical aspect of power system planning, control, and operation, especially within the context of China’s "dual-carbon" policy. The integration of renewable energy under this policy has introduced complexities such as nonlinearity and instability. To enhance forecasting accuracy, the VMD-SE-BiSATCN prediction model is proposed. This model improves computational efficiency and reduces prediction errors by analyzing and reconstructing sequence component complexity using sample entropy (SE) following variational mode decomposition (VMD). Additionally, a self-attention mechanism is integrated into the temporal convolutional network (TCN) to overcome the traditional TCN’s limitations in capturing long-term dependencies. The model was evaluated using data from the China Ninth Electrical Attribute Modeling Competition and validated with real-world data from a specific county in Shijiazhuang City, Hebei Province, China. Results indicate that the VMD-SE-BiSATCN model outperforms other models, achieving a mean absolute error (MAE) of 92.87, a root mean square error (RMSE) of 126.906, and a mean absolute percentage error (MAPE) of 0.81%.
Citation: Huang Y, Feng Q, Han F (2024) Short-term power load forecasting in China: A Bi-SATCN neural network model based on VMD-SE. PLoS ONE 19(9): e0311194. https://doi.org/10.1371/journal.pone.0311194
Editor: Nebojsa Bacanin, Univerzitet Singidunum, SERBIA
Received: June 19, 2024; Accepted: September 15, 2024; Published: September 30, 2024
Copyright: © 2024 Huang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Because electricity cannot be stored in large quantities, the generation and consumption of electrical power must occur simultaneously. Relying on historical data, short-term power load forecasting estimates near-future trends in power consumption. By leveraging forecasting results, the power generation department can develop production and scheduling plans, thereby significantly enhancing the efficiency of the power system. In recent years, the Chinese government declared the objective of "striving to peak carbon dioxide emissions by 2030 and achieve carbon neutrality by 2060" [1]. The incorporation of new energy sources leads to a more varied energy structure, which accentuates the nonlinearity and instability of loads. As a result, load forecasting faces increasingly complex challenges. Precise prediction plays a crucial role in ensuring the secure and stable operation, as well as the advancement, of the power system [2].
Short-term load forecasting is an essential foundation for power system scheduling and planning, necessitating the creation of increasingly accurate prediction models. Current models for short-term load forecasting can be broadly categorized into two groups: traditional methods [3,4] and artificial intelligence methods [5–7]. Common traditional techniques include regression analysis [8] and Kalman filtering [9]. However, conventional techniques are typically constrained by their frameworks and often perform poorly when handling complex nonlinear time-series data. With advancements in high-performance computing and big data technology, artificial intelligence methods have become the primary choice for prediction, resulting in significant improvements in modeling nonlinear temporal characteristics. Artificial intelligence techniques encompass machine learning [10–13] and deep learning [14–17], with deep learning techniques dominating the field. Presently, the prevailing approaches for short-term power load forecasting are primarily based on recurrent neural network (RNN) models [18]. Long short-term memory (LSTM) and gated recurrent unit (GRU) networks are the most representative, as they effectively mitigate the vanishing gradient problem that conventional RNNs encounter on long sequences. Kwon et al [19] used the LSTM model to estimate power load time series data through regression, considering the impact of various temporal dimensions on the load. Yan et al [20] employed the Bi-LSTM model as a power load prediction model to enhance the accuracy of forecasting results. Some researchers have integrated the feature extraction capabilities of convolutional neural networks (CNN) into hybrid prediction models. Wu et al [21] proposed a short-term power load forecasting model based on CNN and BiLSTM, where CNN effectively extracts features, while the LSTM-BiLSTM layers predict the load based on these extracted features.
Although the extraction capability can be improved by stacking CNN layers, the receptive field of deep CNN layers is limited by the fixed convolution kernel size. The temporal convolutional network (TCN) model [22–24] can effectively extract features from complex nonlinear time series and generate predictions. Its dilated causal convolutional component significantly improves the learning capability of deep networks. Wang et al [25] used the TCN model for power load forecasting. TCN offers the advantage of supporting massively parallel processing, unlike RNN forecasting. However, traditional TCNs still face the problem of limited receptive fields. Some researchers have incorporated an attention mechanism into the network architecture to improve the model’s performance, enabling it to extract more relevant information features. Cheng and colleagues [26] incorporated the Convolutional Block Attention Module (CBAM), which combines spatial and channel attention mechanisms, into a TCN model. The application of TCN-CBAM in predicting complex chaotic time series demonstrated its ability to extract time series features more efficiently. Tang et al [27] combined TCN with a temporal attention mechanism, effectively extracting the nonlinear relationship between meteorological factors and load. Tong et al [28] proposed an Attention-based Spatiotemporal Convolutional Network (ACN), which integrates a one-dimensional convolutional block attention module (CBAM) structure, achieving the highest prediction accuracy with fewer parameters.
Extracting beneficial trends from power load series data is challenging for a single forecasting model due to the nonlinear, variable, and stochastic nature of the data. Analyzing valuable data from power load series that exhibit significant fluctuations remains a major challenge. Time series decomposition techniques, such as wavelet analysis (WA) [29,30] and empirical mode decomposition (EMD) [31–33], have been widely employed by researchers in power load forecasting. Meng et al. [34] developed a prediction model integrating EMD with Bi-LSTM. However, this approach is susceptible to modal aliasing when sudden changes occur in the signal. As a result, more researchers are turning to variational mode decomposition (VMD) [35,36], which can dynamically determine the number of decompositions based on specific conditions. This method offers greater adaptability and effectively reduces the likelihood of modal aliasing. To further enhance prediction accuracy, Ran et al [37] employed sample entropy (SE) to recombine the data after mode decomposition, which enhances the efficiency of feature extraction. Table 1 summarizes the applications of machine learning and deep learning in power load forecasting. Therefore, the short-term power load forecasting model presented in this research employs a hybrid decomposition technique combining VMD and SE to effectively dissect load sequences and extract their characteristics.
In conclusion, this research proposes a bi-directional SATCN model for predicting power load data, with initial decomposition performed using VMD technology. To facilitate feature extraction for load sequences with high fluctuations, SE is also employed to reconstruct the sequences following modal decomposition. The experimental data for this study were obtained from the regional load dataset of the China Ninth Electrical Attribute Modeling Competition. The validation data were collected from a county in Shijiazhuang City, Hebei Province, China.
The following are the contributions of this study:
- The study develops a predictive model rooted in the principles of deconstruction and reconstruction. Most contemporary research employs modal decomposition of time series and subsequently inputs the decomposed individual component series directly into forecasting models. These forecasting approaches tend to ignore the interdependence among the decomposed series, which leads to a higher margin of error. This study proposes the use of sample entropy to reconstruct the time series data following VMD decomposition, resulting in enhanced prediction accuracy and decreased model training complexity.
- By incorporating a self-attention mechanism module into the TCN, it becomes possible to capture intrinsic sequence correlations and focus the network’s attention on the most relevant features.
- The construction of a bidirectional SATCN model enables feature extraction from both forward and backward directions, thereby improving load forecasting capability.
The other sections of the paper are organized as follows: section 2 describes the methodology, section 3 details the presented model, section 4 analyzes and discusses the results of the experiment, and section 5 concludes the research.
2. Theories and methods
2.1 Variational Mode Decomposition (VMD)
Proposed in 2014, variational mode decomposition (VMD) is a fully non-recursive and adaptive technique for signal decomposition. It can decompose a complex original signal into several simple intrinsic mode functions (IMFs) [38]. This approach effectively mitigates the endpoint effect and modal aliasing problems associated with empirical mode decomposition (EMD). As a result, VMD can process nonlinear and non-stationary signals more effectively. Because weather and seasonality affect short-term power load, and VMD adapts well to such fluctuations, it is particularly well-suited for short-term power load forecasting.
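Assume that the original sequence $f(t)$ is decomposed into $K$ intrinsic mode functions $u_k(t)$, $k = 1, \ldots, K$:
$$u_k(t) = A_k(t)\cos\big(\phi_k(t)\big) \tag{1}$$
where $A_k(t)$ is the instantaneous amplitude, with $A_k(t) \ge 0$; $\phi_k(t)$ is the phase; and $\omega_k(t) = \mathrm{d}\phi_k(t)/\mathrm{d}t$ is the instantaneous frequency.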
The flow of VMD decomposition is shown in Fig 1 with the following steps:
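- Introduce a quadratic penalty factor $\alpha$ and the Lagrange multiplier $\lambda(t)$ to construct an unconstrained variational problem. Set the initial values of $\{\hat{u}_k^{1}\}$, $\{\omega_k^{1}\}$, and $\hat{\lambda}^{1}$.
- Alternately update $\hat{u}_k^{n+1}$, $\omega_k^{n+1}$, and $\hat{\lambda}^{n+1}$ to find the saddle point of the augmented Lagrangian and thereby determine the optimal set of load decomposition modes. The update equations are:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\,(\omega - \omega_k^{n})^2} \tag{2}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\,\big|\hat{u}_k^{n+1}(\omega)\big|^2\,\mathrm{d}\omega}{\int_0^{\infty} \big|\hat{u}_k^{n+1}(\omega)\big|^2\,\mathrm{d}\omega} \tag{3}$$
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau\Big(\hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega)\Big) \tag{4}$$
where $n$ is the number of iterations; $\hat{f}(\omega)$, $\hat{u}_k(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms of $f(t)$, $u_k(t)$, and $\lambda(t)$; and $\tau$ is the update parameter of the Lagrange multiplier.
- Iterate until the convergence requirement is met. The resulting modes are the IMF components of the original power load sequence.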
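The alternating updates of Eqs (2) to (4) can be sketched in a few lines of NumPy. This is a simplified illustration and not the implementation used in the study: it works on the one-sided spectrum without the boundary mirroring that full VMD implementations usually apply, and the function name `vmd_sketch`, the default values (`alpha=2000`, `tau=0`, `tol=1e-7`), and the uniform initialization of the center frequencies are all assumptions made for the example.

```python
import numpy as np

def vmd_sketch(signal, K=2, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD for a 1-D real signal; returns (modes, center_freqs)."""
    signal = np.asarray(signal, dtype=float)
    N = signal.size
    freqs = np.fft.fftfreq(N)                  # cycles per sample, in [-0.5, 0.5)
    pos = freqs >= 0                           # keep only the one-sided spectrum
    f_hat = np.where(pos, np.fft.fft(signal), 0.0)

    u_hat = np.zeros((K, N), dtype=complex)    # spectra of the K modes
    omega = 0.5 * (np.arange(K) + 0.5) / K     # assumed uniform initial centers
    lam = np.zeros(N, dtype=complex)           # spectrum of the Lagrange multiplier

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Eq (2): filter the residual around the current center frequency
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            u_hat[k, ~pos] = 0.0
            # Eq (3): new center frequency is the power-weighted mean frequency
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / np.sum(power)
        # Eq (4): update the multiplier with the reconstruction residual
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Convergence check on the change of the mode spectra
        if np.sum(np.abs(u_hat - u_prev) ** 2) / N < tol:
            break

    spec = u_hat.copy()
    spec[:, 0] *= 0.5                          # the DC bin must not be doubled
    modes = 2.0 * np.real(np.fft.ifft(spec, axis=1))
    order = np.argsort(omega)
    return modes[order], omega[order]
```

For a test signal made of two cosines at 0.02 and 0.2 cycles per sample, calling `vmd_sketch(x, K=2)` should return two modes whose center frequencies settle near those two values. With `tau=0` the multiplier is never updated, which is a common choice when exact reconstruction is not enforced.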
2.2 Sample Entropy (SE)
Based on approximate entropy (AE), sample entropy (SE) provides a numerical measure of time series complexity [39]. Unlike AE, SE excludes self-matches when comparing data segments and depends less on data length, which offers better consistency and reduces the errors associated with AE.
After decomposing a time series, the resulting component sequences capture different frequency and amplitude components of the power load signal. These component sequences might share certain similarities. However, predicting each component sequence individually would escalate computational complexity and potentially amplify prediction errors. By reconstructing IMF component sequences using sample entropy, it becomes possible to more accurately and reliably depict the overall dynamic behavior of power load signals, thereby enhancing the precision and resilience of short-term power load forecasting. The specific steps for this calculation are outlined below:
Let the original data be {u(i),i =1,…,N} with N points.
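- Transform the sequence into m-dimensional vectors, i.e., $\mathbf{u}(i) = [u(i), u(i+1), \ldots, u(i+m-1)]$, $i = 1, 2, \ldots, N-m+1$. Define $d[\mathbf{u}(i), \mathbf{u}(j)]$ as the maximum absolute difference between the corresponding elements of two vectors $\mathbf{u}(i)$ and $\mathbf{u}(j)$:
$$d[\mathbf{u}(i), \mathbf{u}(j)] = \max_{0 \le k \le m-1} \big|u(i+k) - u(j+k)\big| \tag{5}$$
where $i, j \in [1, N-m+1]$, $i \neq j$.
- For each $\mathbf{u}(i)$, count the number of vectors $\mathbf{u}(j)$ with $d[\mathbf{u}(i), \mathbf{u}(j)] < r$ and express this count as a proportion:
$$B_i^{m}(r) = \frac{1}{N-m}\,\mathrm{num}\big\{d[\mathbf{u}(i), \mathbf{u}(j)] < r\big\} \tag{6}$$
- Compute the average of $B_i^{m}(r)$ over all $i$, denoted $B^{m}(r)$:
$$B^{m}(r) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1} B_i^{m}(r) \tag{7}$$
- Repeat steps (1) to (3) with the dimension increased from $m$ to $m+1$ and compute $B^{m+1}(r)$.
- The SampEn value of the time series is expressed as:
$$\mathrm{SampEn}(m, r, N) = -\ln\frac{B^{m+1}(r)}{B^{m}(r)} \tag{8}$$
where the similarity tolerance $r$ is usually 0.1 to 0.25 times the standard deviation of the time series, and the dimension $m$ is typically selected as 1 or 2.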
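The five steps above translate fairly directly into NumPy. The following is a minimal sketch rather than the code used in the study: the function name `sample_entropy` and the default `r_factor=0.2` are assumptions, and the pairwise distance matrix needs memory proportional to the square of the series length, so it is only practical for short series.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a 1-D series with tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * np.std(x)

    def match_fraction(dim):
        # Step 1: all template vectors of length `dim`
        templates = np.array([x[i:i + dim] for i in range(n - dim + 1)])
        # Eq (5): maximum absolute element-wise difference for every pair
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        count = templates.shape[0]
        # Steps 2 and 3: share of pairs closer than r, self-matches excluded
        matches = np.sum(dist < r) - count
        return matches / (count * (count - 1))

    # Steps 4 and 5: repeat for dimension m + 1 and take the negative log ratio
    return -np.log(match_fraction(m + 1) / match_fraction(m))
```

A regular signal such as a sine wave should give a lower value than white noise of the same length, which matches the interpretation of SE as a complexity measure. A constant series gives `r = 0` and no matches, so the sketch is not meaningful in that case.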
2.3 Temporal Convolutional Network (TCN)
The temporal convolutional network (TCN) is a temporal model derived from convolutional neural networks (CNNs). The main architecture of TCN primarily consists of residual blocks and dilated causal convolutions.
TCN has proven effective in short-term power load forecasting. It can effectively capture dependencies in time series data, enabling a deeper understanding of power load fluctuation patterns. Furthermore, the parallel processing capability of TCN allows for efficient processing of large datasets, while its residual module architecture mitigates the issue of gradient vanishing and enhances training stability. This is particularly crucial when handling complex time series data, such as power load.
2.3.1 Dilated Causal Convolution (DCC).
Temporal convolutional networks utilize dilated causal convolution (DCC) to expand the receptive field, as depicted in Fig 2. DCC samples the input data at intervals, where d denotes the size of the interval. This approach allows for a larger receptive field to be achieved with fewer convolutional layers. The expression for dilated convolution is given by Eq (9).
$$F(t) = \sum_{i=0}^{k-1} h(i)\,X_{t - d \cdot i} \tag{9}$$
where $X$ represents the input data sequence; $F(t)$ is the output, i.e., the convolution result for the t-th element given the inputs $(X_0, \ldots, X_t)$; $h(i)$ is the i-th element in the convolution kernel; $k$ represents the convolution kernel size; and $d$ is the dilation factor.
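To make the dilated causal convolution concrete, the sketch below evaluates the sum over kernel elements for a single channel, treating samples before the start of the sequence as zeros. The function names are illustrative, the zero padding is an assumption, and the receptive-field helper assumes one convolution per layer, whereas a TCN residual block contains two.

```python
import numpy as np

def dilated_causal_conv(x, h, d=1):
    """F(t) = sum_i h[i] * x[t - d*i], with x treated as zero before t = 0."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    out = np.zeros_like(x)
    for t in range(x.size):
        for i in range(h.size):
            idx = t - d * i
            if idx >= 0:              # causal: only present and past samples
                out[t] += h[i] * x[idx]
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked layers, assuming one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Kernel of size 2 with dilation 2 adds each sample to the one two steps back.
example = dilated_causal_conv([1, 2, 3, 4, 5], [1, 1], d=2)   # [1, 2, 4, 6, 8]
```

With kernel size 3 and dilations 1, 2, and 4, `receptive_field` gives 15, which illustrates how a small number of layers can cover a long history.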
2.3.2 Residual Block (RB).
The residual block (RB) is a proven technique for overcoming the challenges associated with training deep networks. The architecture of the RB is illustrated in Fig 3. In TCN, the input to the residual block is denoted by X, and the output is represented by o, as shown in Eq (10).
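$$o = \mathrm{Activation}\big(X + F(X)\big) \tag{10}$$
where $F(X)$ denotes the output of the block's convolutional branch and Activation is the activation function, which in this research is set to ReLU.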
The TCN network is constructed by stacking multiple residual blocks, each of which has two dilated causal convolution layers. The weight normalization layer (Weight Norm) standardizes the weights and normalizes the inputs to the hidden layers. The activation function, ReLU, introduces nonlinearities into the TCN network. Dropout regularization prevents overfitting, while residual connections directly map the inputs and mitigate network degradation caused by adding more layers.
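The structure of a residual block can be illustrated with a single-channel NumPy sketch: two dilated causal convolutions with ReLU form the branch, and the input is added back before the final activation. Weight normalization and dropout are omitted here because they only matter during training, and the kernels `h1` and `h2` are hypothetical placeholders rather than values from the study.

```python
import numpy as np

def causal_conv(x, h, d):
    """Dilated causal convolution of a 1-D float array with zero left padding."""
    out = np.zeros_like(x)
    for i, w in enumerate(h):
        shift = d * i
        if shift < x.size:
            out[shift:] += w * x[:x.size - shift]
    return out

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, h1, h2, d):
    """o = ReLU(x + F(x)), where F is two dilated causal convolutions with ReLU."""
    x = np.asarray(x, dtype=float)
    branch = relu(causal_conv(x, h1, d))
    branch = relu(causal_conv(branch, h2, d))
    return relu(x + branch)
```

If both kernels are zero, the branch contributes nothing and the block reduces to ReLU applied to the input, which shows how the skip connection lets information pass even when the convolutional branch has learned little.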
2.4 Self-Attention mechanism (SA)
Vaswani et al originally proposed the self-attention mechanism [40], which focuses on significant information while reducing attention to irrelevant data through weight allocation. The self-attention mechanism is specifically designed to capture interdependencies within a sequence. Fig 4 illustrates the self-attention architecture, where the input data is multiplied by three weight matrices, $W_q$, $W_k$, and $W_v$, to produce three vectors: Q (query), K (key), and V (value).
The specific computation is shown below:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_K}}\right)V \tag{11}$$
where $d_K$ is the dimension of $Q$ and $K$, and $\sqrt{d_K}$ acts as a scaling factor.
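The scaled dot-product computation described above can be written compactly in NumPy. This is a minimal single-head sketch with randomly chosen illustrative shapes; the function name and the weight matrices are assumptions and do not reflect the dimensions used in the study.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention for a sequence x of shape (length, features)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # scale by sqrt(d_K)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights
```

Each row of `weights` sums to 1 and states how strongly one time step attends to every other time step, which is how the mechanism can relate distant positions in the load sequence.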
3. The proposed hybrid model
This research introduces a novel forecasting model incorporating VMD, SE, and the Bi-SATCN network to accurately predict short-term power loads. Firstly, VMD is utilized to decompose the highly unstable load sequence into multiple intrinsic mode functions (IMFs) with different center frequencies. The decomposed IMF component sequences are subsequently reconstructed using their sample entropy values, creating new component sequences termed SE-IMF. Each component is integrated with meteorological factors. The extracted features are then fed into the Bi-SATCN network for training and prediction. The Bi-SATCN network enhances the TCN model by incorporating a self-attention mechanism module into its framework. This modification effectively captures long-term dependencies and critical information. Additionally, to address the issue of insufficient forward and backward data correlation, a bidirectional TCN model structure was developed. Fig 5 illustrates the basic structure of the prediction model.
The model consists of three primary components: a time series decomposition layer, a reconstruction layer for decomposition sequences based on sample entropy, and a Bi-SATCN neural network prediction layer.
3.1 Time series decomposition layer
The raw data consists of meteorological data m(t) gathered daily and load data f(t) gathered hourly. In the experiment, the load data f(t) was decomposed using VMD to obtain five relatively smooth IMF components. Fig 6 illustrates the outcome of this decomposition.
From top to bottom, Fig 6 shows the original, undecomposed power load data followed by the five IMF components derived from the decomposition. The IMF sequences exhibit certain similarities. IMF2, IMF3, IMF4, and IMF5 all oscillate symmetrically around the 0-axis, although IMF4 displays more tightly spaced oscillations. The fluctuations of IMF1 and IMF2 are comparatively infrequent, with IMF1 showing the smallest amplitude. The trend of IMF1 aligns with the general trend of the original load.
3.2 Decomposition sequence reconstruction layer based on sample entropy
Reconstructing highly similar IMFs is beneficial for enhancing the efficiency of load forecasting and decreasing the computational workload. In this subsection, sample entropy (SE) is used to calculate the complexity of each IMF derived from the decomposition in the preceding section. IMF components with similar SE values are then combined and reorganized into new components (SE-IMF). The similarity tolerance r is set to 0.1Std, and the parameter dimension m is set to 2, as introduced in Section 2.2. Fig 7 illustrates the calculation results of sample entropy.
The sample entropy value reflects the complexity or irregularity within the signal. Higher sample entropy values indicate a more complex signal, whereas lower sample entropy values suggest a more regular signal. It can be observed that the sample entropy value of the original load sequence is 0.56, and the sample entropy values for IMF1 to IMF5 are 0.11, 0.35, 0.23, 0.65, and 0.29, respectively. Since the sample entropy value of IMF4 is higher than that of the original sequence, it is classified as a new high-frequency sequence; IMF2, IMF3, and IMF5 are grouped into a mid-frequency sequence, and IMF1 is designated as a low-frequency sequence. Fig 8 illustrates the outcome of the recombination.
In Fig 8, IMF1 is renamed as the low-frequency sequence SE-IMF1, while IMF2, IMF3, and IMF5 are merged and renamed SE-IMF2. IMF4, which exhibits the most tightly fluctuating behavior, is named the fluctuation sequence SE-IMF3.
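The grouping described above can be expressed as a small rule-based function. The text states only that IMF4 is classed as high-frequency because its entropy exceeds that of the original sequence; the `low_threshold` of 0.2 used below to separate IMF1 from the mid-frequency group is a hypothetical value chosen so that the sketch reproduces the reported grouping, and summation is assumed as the way merged components are combined.

```python
import numpy as np

def group_imfs_by_entropy(imf_entropy, original_entropy, low_threshold):
    """Assign each IMF to a low, mid, or high group by its sample entropy."""
    groups = {"low": [], "mid": [], "high": []}
    for name, se in imf_entropy.items():
        if se > original_entropy:
            groups["high"].append(name)   # more complex than the original load
        elif se < low_threshold:          # hypothetical cut-off, not from the paper
            groups["low"].append(name)
        else:
            groups["mid"].append(name)
    return groups

def reconstruct(imfs, groups):
    """Sum the IMFs in each group (assumed merge rule) to form the SE-IMFs."""
    return {g: np.sum([imfs[n] for n in names], axis=0)
            for g, names in groups.items() if names}

# Sample entropy values reported in the text
se_values = {"IMF1": 0.11, "IMF2": 0.35, "IMF3": 0.23, "IMF4": 0.65, "IMF5": 0.29}
groups = group_imfs_by_entropy(se_values, original_entropy=0.56, low_threshold=0.2)
```

With these inputs, `groups` places IMF1 in the low group, IMF2, IMF3, and IMF5 in the mid group, and IMF4 in the high group, which corresponds to SE-IMF1, SE-IMF2, and SE-IMF3.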
3.3 Bi-SATCN neural network prediction layer
This research will utilize the Bi-SATCN network model to forecast power load. By incorporating self-attention into the TCN network architecture, it is possible to emphasize important features in the data that are crucial for load prediction. This is achieved by assigning different weights to emphasize and capture the internal structural features of the sequence. Furthermore, the Bi-SATCN network takes into account the correlation between data nodes in both forward and backward directions by utilizing bidirectional hidden layers in its learning mechanism. Fig 9 illustrates the architecture of the Bi-SATCN neural network.
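As illustrated in Fig 9, the Bi-SATCN architecture consists of a forward SATCN and a backward SATCN. Let $X = \{x_1, x_2, \ldots, x_N\}$ denote the load sequence, where $N$ is the length of $X$, and let $\overleftarrow{X}$ represent the reverse sequence of $X$. Feeding the forward sequence $X$ into the forward SATCN gives the output $\overrightarrow{H}$. Similarly, the reverse sequence $\overleftarrow{X}$ is fed into the backward SATCN, with the output denoted by $\overleftarrow{H}$. The Bi-SATCN can be expressed as follows:
$$\overrightarrow{H} = \overrightarrow{\mathrm{SATCN}}(X) \tag{12}$$
$$\overleftarrow{H} = \overleftarrow{\mathrm{SATCN}}(\overleftarrow{X}) \tag{13}$$
$$\mathrm{Output} = \overrightarrow{H} \oplus \overleftarrow{H}' \tag{14}$$
where $\overrightarrow{\mathrm{SATCN}}$ denotes the forward SATCN, $\overleftarrow{\mathrm{SATCN}}$ denotes the backward SATCN, $\oplus$ represents the matrix addition operation, $\overleftarrow{H}'$ is the inverse (time-reversed form) of $\overleftarrow{H}$, and Output represents the final output of Bi-SATCN.
The network output at the present time $t$ is given by:
$$\mathrm{Output}_t = \overrightarrow{h}_t \oplus \overleftarrow{h}'_t \tag{15}$$
where $\overrightarrow{h}_t$ and $\overleftarrow{h}'_t$ are the elements of $\overrightarrow{H}$ and $\overleftarrow{H}'$ at time $t$.
In the network, the forward SATCN operates on inputs from time $t$ and earlier $(x_1, x_2, \ldots, x_t)$, and the backward SATCN operates on inputs from time $t$ and later $(x_N, x_{N-1}, \ldots, x_t)$. This bidirectional approach enables the network to leverage deep interactions, fully extracting features through both forward and backward processing.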
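The bidirectional combination can be sketched independently of the network internals: run one causal model on the sequence, run a second one on the reversed sequence, flip the second output back into time order, and add the two. In the sketch below a running sum stands in for each SATCN purely for illustration; the function names are assumptions.

```python
import numpy as np

def bidirectional(forward_net, backward_net, x):
    """Add the forward output to the time-realigned backward output."""
    x = np.asarray(x, dtype=float)
    h_forward = forward_net(x)            # sees x_1 .. x_t at position t
    h_backward = backward_net(x[::-1])    # sees x_N .. x_t at position t
    return h_forward + h_backward[::-1]   # flip back so both align in time

def running_sum(seq):
    """Stand-in causal model: output at each step depends only on earlier inputs."""
    return np.cumsum(seq)

combined = bidirectional(running_sum, running_sum, [1, 2, 3])   # [7, 8, 9]
```

At each position the combined output mixes a summary of everything up to that point with a summary of everything from that point onward, which is the information a purely causal TCN cannot access.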
3.4 Forecasting model process
Fig 10 illustrates the flowchart for short-term power load forecasting using the proposed model. The steps involved in the procedure are as follows:
Step 1: Divide the data into load data f(t) and meteorological data m(t). Load forecasting results are influenced by various external factors, with changes in meteorological conditions having a direct and immediate impact on power loads. The most commonly considered meteorological data include temperature, humidity, rainfall, and other related factors.
Step 2: Using VMD, the highly fluctuating load data is decomposed into several IMF components of different frequencies.
Step 3: After calculating each IMF’s sample entropy values, new subsequences (SE-IMF) are created by recombining components with similar complexity.
Step 4: Each of the reconstructed components is individually fused with the meteorological data m(t).
Step 5: The fused data is fed into the Bi-SATCN network for training. The model’s parameters are updated by minimizing the MSE loss function.
Step 6: The Bi-SATCN network is used to make predictions for each subsequence. These predictions are output through the fully connected layer.
Step 7: The final forecast is obtained by aggregating the predicted values of the components. The assessment indicators used to evaluate the model’s predictive power include Mean Absolute Percentage Error (MAPE), Coefficient of Determination (R2), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE).
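Steps 4 and 7 involve data handling that is easy to get wrong when hourly load is combined with daily weather. The sketch below shows one plausible way to do it: each daily weather record is repeated for the 24 hours of that day, the result is concatenated with a component as extra feature columns, and sliding windows are cut for supervised training. The function names, the window length, and the choice of concatenation as the fusion method are assumptions, since the text does not specify them.

```python
import numpy as np

def fuse_with_weather(component_hourly, weather_daily, hours_per_day=24):
    """Stack an hourly component with daily weather repeated to hourly rows."""
    component_hourly = np.asarray(component_hourly, dtype=float)
    weather_daily = np.asarray(weather_daily, dtype=float)
    weather_hourly = np.repeat(weather_daily, hours_per_day, axis=0)
    if weather_hourly.shape[0] != component_hourly.shape[0]:
        raise ValueError("weather days do not cover the hourly series")
    return np.column_stack([component_hourly, weather_hourly])

def make_windows(features, target, window):
    """Inputs are `window` past rows; the label is the next target value."""
    X = np.stack([features[i:i + window] for i in range(len(features) - window)])
    y = np.asarray(target, dtype=float)[window:]
    return X, y

def aggregate(component_predictions):
    """Step 7: the final forecast is the sum of the component forecasts."""
    return np.sum(component_predictions, axis=0)
```

For two days of data with two weather variables, `fuse_with_weather` returns an array of shape (48, 3), and `make_windows` with a 24-hour window yields 24 training samples of shape (24, 3).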
4. Experimental analysis
4.1 Experimental setting and data analysis
The model was implemented with the TensorFlow framework in the PyCharm Community Edition 2023.2 (x64) development environment.
The power load dataset from Region 1 of the 9th China Electrical Mathematical Modeling Competition was used as the experimental data in this study. The dataset includes power load data and meteorological data from January 1, 2012, to December 31, 2012. The power load data was sampled at 1-hour intervals, with 24 time points per day, resulting in a total of 8,760 time points, as shown in Fig 11. The meteorological factors include the daily maximum temperature, daily minimum temperature, average temperature, relative humidity, and rainfall. Table 2 presents a portion of the meteorological data details (from January 1, 2012, to January 7, 2012). Additionally, real power load data from Shijiazhuang City, Hebei Province, was selected to validate the applicability and scalability of the proposed model. This data includes power load data and daily meteorological data from July 1, 2021, to April 26, 2022, also sampled at 1-hour intervals, with 8,760 time points in total. All datasets were divided, with the first 356 days used as the training set and the last 10 days as the test set, to evaluate the model’s performance.
4.2 Evaluation indicators
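The experiments used root mean square error (RMSE), mean absolute percentage error (MAPE), coefficient of determination (R2), and mean absolute error (MAE) as evaluation indexes. Smaller values of MAE, MAPE, and RMSE indicate higher accuracy. Similarly, the closer the R2 value is to 1, the higher the prediction accuracy. The definitions of these four evaluation indicators are given by Eqs (16) to (19), and the mean load used in R2 is defined in Eq (20).
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(Y_i - \hat{Y}_i\big)^2} \tag{16}$$
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{Y_i - \hat{Y}_i}{Y_i}\right| \tag{17}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\big(Y_i - \hat{Y}_i\big)^2}{\sum_{i=1}^{N}\big(Y_i - \bar{Y}\big)^2} \tag{18}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\big|Y_i - \hat{Y}_i\big| \tag{19}$$
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i \tag{20}$$
where $Y_i$ and $\hat{Y}_i$ are the actual and predicted values of the load at time $i$, $\bar{Y}$ is the mean of the actual values as defined in Eq (20), and $N$ is the total number of load points in the forecast.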
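The four indicators are straightforward to compute with NumPy. The sketch below uses the standard definitions of RMSE, MAPE, R2, and MAE; the function name is an assumption, and MAPE is only defined when no actual value is zero.

```python
import numpy as np

def evaluate_forecast(actual, predicted):
    """Return RMSE, MAPE (in percent), R2, and MAE for two equal-length series."""
    y = np.asarray(actual, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    err = y - y_hat
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(np.mean(np.abs(err / y)) * 100.0),   # requires y != 0
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
        "MAE": float(np.mean(np.abs(err))),
    }

# Actual loads 100, 200, 300 against forecasts 110, 190, 300
metrics = evaluate_forecast([100, 200, 300], [110, 190, 300])
```

For this toy example the errors are 10, 10, and 0, so MAE is 20/3, MAPE is 5%, and R2 is 0.99. Because MAPE is relative while RMSE and MAE carry the unit of the load, the three can rank models differently on datasets with very different load levels.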
4.3 Selection of hyperparameter
The key parameters of the Bi-SATCN network include the size of the convolutional kernel, the number of network layers, and the dilation factor. For small-scale parameter optimization, the grid search method is particularly efficient and feasible. The GridSearchCV method in Python was used to loop through the key parameters for optimization. The parameters were optimized using the Adam optimizer, with the mean square error (MSE) selected as the loss function. Additionally, an early stopping mechanism based on MSE was integrated into the model framework, with the patience parameter set to 15 and the significance threshold set to 0.1%. If the MSE does not decrease by more than 0.1% over 15 consecutive training epochs, training is stopped as a measure to prevent overfitting. Table 3 details the parameter settings for this model on both the public and real datasets. The optimal model parameter configuration was determined through experimentation.
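The early stopping rule can be illustrated without any deep learning framework. The sketch below interprets the 0.1% threshold as a relative improvement over the best MSE seen so far, which is an assumption because the text does not say whether the threshold is relative or absolute; the function name is also hypothetical.

```python
def early_stop_epoch(losses, patience=15, min_rel_improvement=0.001):
    """Return the 0-based epoch at which training stops, or None if it never does."""
    best = None
    stale = 0                      # consecutive epochs without enough improvement
    for epoch, loss in enumerate(losses):
        if best is None or loss < best * (1.0 - min_rel_improvement):
            best = loss            # improvement larger than 0.1%: reset the counter
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```

A loss curve that drops once and then stays flat stops after 15 stale epochs, while a curve that keeps improving by 1% per epoch never triggers the rule. In Keras, the `EarlyStopping` callback offers comparable `patience` and `min_delta` arguments, although its `min_delta` is an absolute rather than a relative change.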
4.4 Proposed method
The hyperparameters were set as indicated in Table 3. In addition to using public datasets, real datasets were also collected to validate the model’s effectiveness. The final predictions for the public dataset are illustrated in Fig 12.
In Table 4, the proposed method achieved a MAPE of 0.81%, RMSE of 126.9058 MW, MAE of 92.8696 MW, and an R2 of 99.18%.
The final prediction of the real data is shown in Fig 13. In Table 5, the proposed method achieved a MAPE of 3.19%, RMSE of 1043.1784 MW, MAE of 682.3129 MW, and an R2 of 90.48%. The comparative tests in the next section will provide further validation of the model’s predictive accuracy.
4.5 Comparative experiment
4.5.1 Ablation experiment.
To evaluate the predictive efficacy of each stage of the model described in this study, four models were constructed: the TCN model, the TCN model based on VMD, the BiTCN model based on VMD, and the BiSATCN model based on VMD. These models were then compared with the final VMD-SE-BiSATCN model proposed in this study. Fig 14 illustrates the predictive performance of the aforementioned comparison models, and the prediction error evaluation metrics for each model are presented in Table 6.
Table 6 demonstrates that the VMD-SE-BiSATCN model developed in this study exhibits the lowest MAPE, RMSE, and MAE values among the compared models: 0.81%, 126.9058 MW, and 92.8696 MW, respectively. Additionally, it achieves the highest R2 value of 0.9918. As the most fundamental model, the TCN model produces comparatively inferior predictions, and its gap to the VMD-TCN model demonstrates the efficacy of time series decomposition. Among the three VMD-based models without SE reconstruction, the VMD-BiSATCN model is the most accurate. This highlights the positive impact of incorporating the self-attention mechanism and constructing a bidirectional model. Finally, the accuracy of the VMD-SE-BiSATCN model surpasses that of the VMD-BiSATCN model, indicating the usefulness of sample entropy-based time series reconstruction. In conclusion, the VMD-SE-BiSATCN model developed in this study yields the most accurate prediction results.
4.5.2 Comparison and analysis of different models.
Experiments on an identical dataset are conducted to verify the effectiveness of the methodology proposed in this study. The evaluation metrics of various models are derived by comparing them against SVR, LSTM, Bi-LSTM, and TCN-LSTM models. The results of these metrics are then analyzed.
The following are the comparison models.
- SVR model [41]: A traditional forecasting model based on regression analysis.
- LSTM model [19]: This model effectively addresses the issues of gradient explosion and vanishing that occur in conventional recurrent neural networks (RNNs) when handling long-term sequences.
- Bi-LSTM model [20]: A bidirectional neural network based on LSTM that performs both forward and backward computation.
- TCN-LSTM model [42]: Combines the TCN and LSTM models to extract data features using TCN’s convolutional operations and predict time series changes using the LSTM model.
- VMD-SE-BiSATCN model: The model proposed in this study.
Fig 15 shows the predicted outcomes of each model. The VMD-SE-BiSATCN model produces predictions that are closest to the actual power loads in the public dataset. The evaluation indicators for each of the prediction models are presented in Table 7. The table demonstrates that our proposed model outperforms the other four experimental models. Specifically, the model achieved a MAPE of 0.81%, RMSE of 126.9058 MW, and MAE of 92.8696 MW. Relative to the SVR, LSTM, Bi-LSTM, and TCN-LSTM models, the MAPE, the primary evaluation index, was reduced by 85.9%, 79.7%, 78.9%, and 78.7%, respectively. In addition, the R2 for the proposed model is 0.9918, indicating a close fit to the actual data.
5. Conclusion
This study proposes a Bi-SATCN network for short-term power load forecasting, integrating VMD and SE. The SE technique is used to evaluate the complexity of each sequence produced by VMD. The components with similar features are subsequently recombined and fed into the forecasting model. Furthermore, the forecasting model utilizes the Bi-SATCN neural network, which incorporates the self-attention mechanism and establishes a bidirectional model structure. The final load prediction is obtained by summing the predicted outcomes from each component. This approach effectively improves the prediction accuracy for highly volatile and nonlinear load series.
To assess the efficacy of the VMD-SE-BiSATCN model, four comparable forecasting models were also constructed. The outcomes of the forecasting analysis demonstrate that the proposed VMD-SE-BiSATCN method outperforms the other four models in short-term power load prediction.
For future research, we propose incorporating additional influencing factors, such as market characteristics, population density, and the level of economic development. Furthermore, this technique could be applied to make predictions in other fields, such as forecasting solar power generation and air quality.
References
- 1. Li J, Ho M S, Xie C, Stern N. China’s flexibility challenge in achieving carbon neutrality by 2060[J]. Renewable and Sustainable Energy Reviews, 2022, 158: 112112.
- 2. Zhu J, Dong H, Zheng W, Li S, Huang Y, Xi L. Review and prospect of data-driven techniques for load forecasting in integrated energy systems[J]. Applied Energy, 2022, 321: 119269.
- 3. Dong X, Deng S, Wang D. A short-term power load forecasting method based on k-means and SVM[J]. Journal of Ambient Intelligence and Humanized Computing, 2022, 13(11): 5253–5267.
- 4. Goia A, May C, Fusai G. Functional clustering and linear regression for peak load forecasting[J]. International Journal of Forecasting, 2010, 26(4): 700–711.
- 5. Khwaja A S, Anpalagan A, Naeem M, Venkatesh B. Joint bagged-boosted artificial neural networks: Using ensemble machine learning to improve short-term electricity load forecasting[J]. Electric Power Systems Research, 2020, 179: 106080.
- 6. Bai S, He H, Luo D, Ge M, Yang R, Bi X. A Large‐Scale Group Decision‐Making Consensus Model considering the Experts’ Adjustment Willingness Based on the Interactive Weights’ Determination[J]. Complexity, 2022, 2022(1): 2691804.
- 7. Li D, Sun G, Miao S, Gu Y, Zhang Y, He S. A short-term electric load forecast method based on improved sequence-to-sequence GRU with adaptive temporal dependence[J]. International Journal of Electrical Power & Energy Systems, 2022, 137: 107627.
- 8. Kaytez F, Taplamacioglu M C, Cam E, Hardalac F. Forecasting electricity consumption: A comparison of regression analysis, neural networks and least squares support vector machines[J]. International Journal of Electrical Power & Energy Systems, 2015, 67: 431–438.
- 9. Moonchai S, Chutsagulprom N. Short-term forecasting of renewable energy consumption: Augmentation of a modified grey model with a Kalman filter[J]. Applied Soft Computing, 2020, 87: 105994.
- 10. Guo W, Che L, Shahidehpour M, Wan X. Machine-Learning based methods in short-term load forecasting[J]. The Electricity Journal, 2021, 34(1): 106884.
- 11. Jayasudha M, Elangovan M, Mahdal M, Priyadarshini J. Accurate estimation of tensile strength of 3D printed parts using machine learning algorithms[J]. Processes, 2022, 10(6): 1158.
- 12. Ma X, Dong Y. An estimating combination method for interval forecasting of electrical load time series[J]. Expert Systems with Applications, 2020, 158: 113498.
- 13. Rao C, Zhang Y, Wen J, Xiao X, Goh M. Energy demand forecasting in China: A support vector regression-compositional data second exponential smoothing model[J]. Energy, 2023, 263: 125955.
- 14. Yazici I, Beyca O F, Delen D. Deep-learning-based short-term electricity load forecasting: A real case application[J]. Engineering Applications of Artificial Intelligence, 2022, 109: 104645.
- 15. Fan W, Zhang R, He H, Hou S, Tan Y. A short-term price prediction-based trading strategy[J]. Plos one, 2024, 19(3): e0294970.
- 16.
Fan W, Wen J, Jia X, Shen L, Zhou J, Li Q. EPL: Empirical Prototype Learning for Deep Face Recognition[J]. arXiv preprint arXiv:2405.12447, 2024.
- 17. Eskandari H, Imani M, Moghaddam M P. Convolutional and recurrent neural network based model for short-term load forecasting[J]. Electric Power Systems Research, 2021, 195: 107173.
- 18. Aseeri A O. Effective RNN-Based Forecasting Methodology Design for Improving Short-Term Power Load Forecasts: Application to Large-Scale Power-Grid Time Series[J]. Journal of Computational Science, 2023, 68: 101984.
- 19. Kwon B S, Park R J, Song K B. Short-term load forecasting based on deep neural networks using LSTM layer[J]. Journal of Electrical Engineering & Technology, 2020, 15: 1501–1509.
- 20.
Yan L, Zhang H. A Variant Model Based on BiLSTM for Electricity Load Prediction[C]//2021 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS). IEEE, 2021: 404–411.
- 21. Wu K, Wu J, Feng L, Yang B, Liang R, Yang S, et al. An attention‐based CNN‐LSTM‐BiLSTM model for short‐term electric load forecasting in integrated energy system[J]. International Transactions on Electrical Energy Systems, 2021, 31(1): e12637.
- 22.
Bai S, Kolter J Z, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[J]. arXiv preprint arXiv:1803.01271, 2018.
- 23. Jiang F, Zhang C, Sun S, Sun J. Forecasting hourly PM2. 5 based on deep temporal convolutional neural network and decomposition method[J]. Applied Soft Computing, 2021, 113: 107988.
- 24. Chen J, Lv T, Cai S, Song L, Yin S. A novel detection model for abnormal network traffic based on bidirectional temporal convolutional network[J]. Information and Software Technology, 2023, 157: 107166.
- 25.
Wang H, Zhao Y, Tan S. Short-term load forecasting of power system based on time convolutional network[C]//2019 8th international symposium on next generation electronics (ISNE). IEEE, 2019: 1–3.
- 26. Cheng W, Wang Y, Peng Z, Ren X, Shuai Y, Zang S, et al. High-efficiency chaotic time series prediction based on time convolution neural network[J]. Chaos, Solitons & Fractals, 2021, 152: 111304.
- 27. Tang X, Chen H, Xiang W, Yang J, Zou M. Short-term load forecasting using channel and temporal attention based temporal convolutional network[J]. Electric Power Systems Research, 2022, 205: 107761.
- 28. Tong C, Zhang L, Li H, Ding Y. Attention-based temporal–spatial convolutional network for ultra-short-term load forecasting[J]. Electric Power Systems Research, 2023, 220: 109329.
- 29. Xia C, Zhang M, Cao J. A hybrid application of soft computing methods with wavelet SVM and neural network to electric power load forecasting[J]. Journal of Electrical Systems and Information Technology, 2018, 5(3): 681–696.
- 30. Karapetyan A, Khonji M, Chau S C K, Elbassioni K, Zeineldin H, El-Fouly THM, et al. A competitive scheduling algorithm for online demand response in islanded microgrids[J]. IEEE Transactions on Power Systems, 2020, 36(4): 3430–3440.
- 31. Jia Y, Li G, Dong X, He K. A novel denoising method for vibration signal of hob spindle based on EEMD and grey theory[J]. Measurement, 2021, 169: 108490.
- 32. He Y, Wang Y. Short-term wind power prediction based on EEMD–LASSO–QRNN model[J]. Applied Soft Computing, 2021, 105: 107288.
- 33. Mounir N, Ouadi H, Jrhilifa I. Short-term electric load forecasting using an EMD-BI-LSTM approach for smart grid energy management system[J]. Energy and Buildings, 2023, 288: 113022.
- 34. Meng Z, Xie Y, Sun J. Short-term load forecasting using neural attention model based on EMD[J]. Electrical Engineering, 2022: 1–10.
- 35. Yuan F, Che J. An ensemble multi-step M-RMLSSVR model based on VMD and two-group strategy for day-ahead short-term load forecasting[J]. Knowledge-Based Systems, 2022, 252: 109440.
- 36. Lu J, Feng W, Li Y, Zhang J, Zou Y, Li J. VMD and self-attention mechanism-based Bi-LSTM model for fault detection of optical fiber composite submarine cables[J]. EURASIP Journal on Advances in Signal Processing, 2023, 2023(1): 29.
- 37. Ran P, Dong K, Liu X, Wang J. Short-term load forecasting based on CEEMDAN and Transformer[J]. Electric Power Systems Research, 2023, 214: 108885.
- 38. Dragomiretskiy K.; Zosso D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
- 39. Wang J, Cao J, Yuan S, Cheng M. Short-term forecasting of natural gas prices by using a novel hybrid method based on a combination of the CEEMDAN-SE-and the PSO-ALS-optimized GRU network[J]. Energy, 2021, 233: 121082.
- 40. Vaswani A. Attention is all you need[J]. Advances in neural information processing systems, 2017.
- 41. Fan G F, Peng L L, Hong W C, Sun F. Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression[J]. Neurocomputing, 2016, 173: 958–970.
- 42.
Li H, Sun J, Liao X. A Novel Short-term Load Forecasting Model by TCN-LSTM Structure with Attention Mechanism[C]//2022 4th International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI). IEEE, 2022: 178–182.