Abstract
Forecasting the consumption of oil and coal can help governments optimize and adjust energy strategies to ensure energy security in China. However, such forecasting is extremely challenging because it is influenced by many complex and uncertain factors. To address this challenge, we propose a hybrid deep learning approach for forecasting oil and coal consumption in China. It consists of three parts, i.e., feature engineering, model building, and model integration. First, feature engineering distinguishes the different correlations between the targeted indicators and various features. Second, model building constructs five typical deep learning models with different characteristics to forecast the targeted indicators. Third, model integration ensembles the five built models with a tailored, self-adaptive weighting strategy. As such, our approach enjoys all the merits of the five deep learning models (their different learning structures and temporal constraints diversify them for ensembling), enabling it to comprehensively capture the characteristics of different indicators and achieve accurate forecasting. To evaluate the proposed approach, we collected 880 real data records covering 39 factors related to China's energy consumption from 1999 to 2021. By conducting extensive experiments on the collected datasets, we identified the optimal features for each of the four targeted indicators (i.e., import of oil, production of oil, import of coal, and production of coal). Besides, we demonstrated that our approach is significantly more accurate than state-of-the-art forecasting competitors.
Citation: He J, Li Y, Xu X, Wu D (2025) Energy consumption forecasting for oil and coal in China based on hybrid deep learning. PLoS ONE 20(1): e0313856. https://doi.org/10.1371/journal.pone.0313856
Editor: Jude Okolie, University of Oklahoma, UNITED STATES OF AMERICA
Received: July 7, 2024; Accepted: October 31, 2024; Published: January 6, 2025
Copyright: © 2025 He et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data used in this study is available at the link: https://github.com/wudi1989/Energy_datasets/tree/main.
Funding: Science and Technology Foundation of State Grid Corporation of China under grant 1400-202357341A-1-1-ZN (Identification of Energy Security Risks and Strategic Path Optimization Technology Research under Global Coal-Oil-Gas-Electricity Coupling in China). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
China is the world's largest importer and one of the largest consumers of oil and coal. Consumption forecasting of oil and coal is crucial in China, as it not only provides a clear understanding of the future energy landscape but also helps the government optimize and adjust strategies, thereby ensuring energy security [1]. For example, as China's economy has grown, oil and coal consumption has gradually increased [1, 2]. However, China faces the challenge that its reserves of oil and coal are not abundant. Since becoming a net oil importer in 1993, China's external dependence on oil has exceeded 65% [3]. China's reliance on imported energy has steadily risen, making it the world's largest energy-importing country. Hence, forecasting the energy consumption of oil and coal can help China develop its energy-importing strategies to ensure energy security [4].
In principle, forecasting the consumption of oil and coal is a time series forecasting problem. To date, numerous time series forecasting methods have been proposed [5], including statistical analysis-based [6], machine learning-based [6], and deep learning-based ones [7]. First, autoregressive integrated moving average (ARIMA), exponential smoothing, and grey forecasting models have emerged as notable statistical analysis-based methods for time series forecasting. ARIMA has shown promising results in forecasting future electricity consumption [8, 9]. For example, ARIMA was combined with bootstrap aggregation and exponential smoothing to achieve remarkable performance in mid- to long-term electricity consumption forecasting [10]. In terms of exponential smoothing, the double exponential smoothing model was employed to forecast the demands for coal, oil, natural gas, and primary electricity [11]. In terms of grey forecasting models, a grey forecasting model with simulated annealing exhibited higher accuracy than traditional grey models in forecasting coal consumption [12]. In addition, the grey Lotka-Volterra model was built on a competition and cooperation mechanism to forecast energy consumption [13]. However, these statistical analysis-based methods have their own limitations. For example, the exponential smoothing method cannot detect inflection points in data, and the grey forecasting model is sensitive to outliers. Furthermore, ARIMA, exponential smoothing, and grey forecasting methods rely heavily on historical data; if the historical data exhibit substantial variability, these methods may not be suitable for forecasting long-term time series.
Second, with the rapid development of machine learning, it has been employed to forecast time series in various industrial applications [6]. Among various machine learning approaches, Prophet, support vector machines (SVM), and artificial neural networks (ANNs) have gained significant attention because of their capability to learn feature mappings between input and output data [14]. For example, the Prophet model performs better than traditional models in forecasting India's monthly total energy demand and peak energy demand [15]. The advanced SVM achieves significantly higher accuracy than traditional models in forecasting solar and wind energy for most regions [16]. ANNs exhibit excellent performance in both real-time and short-term solar energy forecasting [17]. However, these machine learning-based methods are shallow models; they fail to effectively explore the deep potential correlations between the various features that are closely related to the forecasting targets.
Finally, by increasing the number of hidden layers in the neural network, deep learning methods excel at handling strongly non-linear deep characteristics with remarkable performance [18]. Recurrent neural networks (RNN) and long short-term memory (LSTM) are two commonly adopted deep learning algorithms for time series forecasting [15]. RNN was first introduced in 1990 to retain temporal information by incorporating a recurrent layer that decides whether to retain information from previous time steps [12]. It outperforms the algorithms employed by the Estonian energy regulatory authority in forecasting wind power generation [19]. However, RNN struggles to maintain long-term dependencies due to the exploding/vanishing gradient problem [20]. To address this issue, LSTM was proposed with a dedicated gate structure [21]. Besides, it preserves temporal correlation through the use of memory cells [22]. Hence, it generalizes well in forecasting energy-related time series [14].
However, the consumption demands of oil and coal are influenced by many factors [1, 23, 24], such as the economy, population, climate, international situation, natural environment, etc. As a result, it is very challenging to determine which features are most crucial for accurate forecasting. Besides, different energy consumption indicators (e.g., import of oil, production of oil, import of coal, and production of coal) have different inherent characteristics, and a single deep learning model cannot comprehensively capture all the inherent characteristics of different indicators, resulting in limited forecasting robustness.
Ensemble learning is an effective approach to enhance the forecasting performance of a single model [25]. Instead of relying on a single model, ensemble learning combines a collection of diverse models to produce more robust and accurate forecasts. Recently, ensemble learning has been applied in the energy area. For example, ensemble learning has shown quite promising results in forecasting building energy consumption [26]. A hybrid forecasting model based on a selective ensemble had a significant impact on forecasting issues related to energy consumption in China [27]. Furthermore, ensemble learning exhibited a significant accuracy improvement in predicting the electricity consumption of office buildings [28, 29]. These studies demonstrate that ensemble learning models can deal with the heterogeneity among different forecasting problems. However, previous ensemble learning studies still have limitations in forecasting the energy consumption of oil and coal in China. First, the consumption of oil and coal is influenced by many complex and uncertain factors, yet previous studies did not distinguish the correlations between targeted indicators and these factors. Besides, previous studies did not design a weighting strategy to control the ensembling effects, which makes them unsuitable for complex and uncertain forecasting scenarios.
Motivated by this, a hybrid deep learning approach is proposed for accurately forecasting the consumption of oil and coal in China. The proposed approach consists of three main parts, i.e., feature engineering, model building, and model integration. First, feature engineering adopts correlation analysis to distinguish the different correlations between the targeted indicators and various features. Second, model building constructs five typical deep learning models with different characteristics to forecast the targeted indicators. Third, model integration ensembles the five built deep learning models with a tailored, self-adaptive weighting strategy. As such, the proposed approach enjoys all the merits of the five deep learning models, making it able to comprehensively capture all the characteristics of different factors to achieve accurate energy consumption forecasting. To evaluate the proposed approach, we collected 880 real data records covering 39 factors from 1999 to 2021. Four factors (import of oil, production of oil, import of coal, and production of coal) were used as the targeted forecasting indicators because they are closely tied to the energy consumption of China. The remaining 35 factors (e.g., natural gas production, total construction industry output, corn yield, etc.) were used as the features. By conducting extensive experiments, we demonstrated that: a) the significant features of the four targeted indicators were identified, respectively, and b) the proposed approach significantly outperforms both state-of-the-art statistical and deep learning comparison models in forecasting the four targeted indicators.
2. Methodology
2.1 Design philosophy
Fig 1 illustrates the general process of our proposed approach with five steps. The specific steps are outlined as follows:
Step 1: Input Data.
Collect as much data as possible that may be related to energy consumption, covering the economy, culture, society, etc. Split these data into targeted forecasting indicators and relevant features. In this paper, we have collected 39 factors, where four factors (i.e., import of oil, production of oil, import of coal, and production of coal) are used as the targeted forecasting indicators and the remaining are used as the features.
Step 2: Feature Engineering.
Conduct correlation analysis based on the Pearson, Spearman, and Kendall correlation coefficients. The final comprehensive correlation is obtained by taking the weighted average of the coefficients derived from these methods. Features are then grouped according to their comprehensive correlation coefficients, and the grouped features are individually input into a forecasting model to select the best feature groups based on forecasting errors.
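The averaging-and-grouping logic of this step can be sketched in plain Python. The feature names, coefficient values, bin edges, and equal weights below are illustrative assumptions, not values from the paper's dataset:

```python
def comprehensive_corr(pearson, spearman, kendall, weights=(1/3, 1/3, 1/3)):
    """Weighted average of the three correlation coefficients (absolute values)."""
    return (weights[0] * abs(pearson)
            + weights[1] * abs(spearman)
            + weights[2] * abs(kendall))

def group_features(corrs, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Bin features by comprehensive correlation; anything below the lowest
    edge falls into the 0.00-0.50 group."""
    bounds = list(zip((0.0,) + edges[:-1], edges))
    groups = {f"{lo:.2f}-{hi:.2f}": [] for lo, hi in bounds}
    for name, c in corrs.items():
        for lo, hi in bounds:
            if lo <= c < hi or (hi == 1.0 and c == 1.0):
                groups[f"{lo:.2f}-{hi:.2f}"].append(name)
                break
    return groups

# Hypothetical features with illustrative per-method coefficients
corrs = {
    "natural_gas_production": comprehensive_corr(0.95, 0.93, 0.90),
    "corn_yield": comprehensive_corr(0.62, 0.58, 0.55),
}
groups = group_features(corrs)
```

Each resulting group can then be fed to the forecasting model in turn to select the best feature groups by forecasting error.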
Step 3: Model Building.
Five deep learning models of LSTM (long short-term memory network) [21], XGBOOST (eXtreme Gradient Boosting) [30], TCN (temporal convolutional network) [31], CNN (convolutional neural networks) [32], and TRMF (temporal regularized matrix factorization) [33] are employed as the forecasting models. The selected best feature groups are used to train these forecasting models to obtain the forecasting results.
Step 4: Model Integration.
The forecasting results from different forecasting models are ensembled through an adaptive weighting strategy to develop a robust forecasting model. As such, the integrated model can accurately forecast all four targeted indicators.
Step 5: Output Results.
The forecasting results for the four targeted indicators, i.e., import of oil, production of oil, import of coal, and production of coal, obtained from the integrated model are outputted.
2.2 Feature engineering
Correlation refers to the degree of association between two or more variables, serving as a measurement in the fields of statistics and data analysis to assess the interdependence among variables [34]. When variable x changes, another highly correlated variable y will also change accordingly. The correlation coefficient, COR(x, y), ranges from -1 to 1, and the closer the absolute value of COR(x, y) is to 1, the stronger the correlation between variables x and y.
Various statistical methods can be utilized to calculate the correlation. Among them, the Pearson correlation coefficient is commonly employed to quantify the linear relationship between two continuous variables. Additionally, several other correlation coefficients are employed, such as the Spearman correlation coefficient, utilized to gauge the relationship between two ordinal variables, and the Kendall correlation coefficient, used to measure non-linear relationships between two variables. These diverse correlation coefficient methods contribute to understanding relationships between variables of different types, thereby enhancing the capability for data interpretation and analysis.
2.2.1 Pearson correlation coefficient (PCC).
The Pearson correlation coefficient is a statistical measurement used to quantify the linear relationship between two random variables represented as real-valued vectors. It holds significant historical importance as the formal method for measuring correlation [14]. This linear correlation coefficient is specifically designed to assess the linear correlation between two normally distributed continuous variables, and its definition is as follows [35]:
COR(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} (1)

where \bar{x} denotes the mean of x, \bar{y} denotes the mean of y, and n is the number of observations in each group.
2.2.2 Spearman’s rank correlation coefficient.
Spearman's rank correlation coefficient is used to evaluate the non-linear correlation between two variables. Instead of using their original observed values, it is computed from the ranks (orderings) of the variables [36]. The Spearman correlation coefficient COR(x, y) is:
COR(x, y) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} (2)

where d_i is the difference between the ranks of x_i and y_i, and n is the sample size.
The Spearman correlation coefficient is insensitive to the distribution of data and is applicable to various types of data.
2.2.3 Kendall correlation coefficient.
The Kendall correlation coefficient serves as a metric to quantify the strength of the ordinal relationship between two variables. Its computation involves a comprehensive comparison of all pairs of observations within the dataset, aiming to discern their inherent ordinal hierarchy [37]. The Kendall correlation coefficient τ is as follows:
\tau = \frac{2(\varphi_1 - \varphi_2)}{n(n - 1)} (3)
where φ1 denotes the number of concordant pairs, i.e., pairs of observations whose relative ordering agrees in both variables, and φ2 denotes the number of discordant pairs, i.e., pairs whose relative ordering differs between the two variables. n represents the sample size.
Distinguishing itself from the Pearson correlation coefficient, which emphasizes linear associations, Kendall proves particularly adept at capturing non-linear relationships. Furthermore, in handling tied ranks, Kendall exhibits enhanced robustness compared to its counterpart, Spearman.
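As an illustration of how the three coefficients differ, they can be implemented directly from their definitions in Eqs. (1)-(3). This is a minimal NumPy sketch that assumes no tied ranks for the Spearman formula; the toy data are illustrative, not from the paper's dataset:

```python
import numpy as np

def pearson(x, y):  # Eq. (1): linear correlation
    xm, ym = x - x.mean(), y - y.mean()
    return (xm * ym).sum() / np.sqrt((xm**2).sum() * (ym**2).sum())

def spearman(x, y):  # Eq. (2): rank correlation, assuming no tied ranks
    n = len(x)
    d = x.argsort().argsort() - y.argsort().argsort()  # rank differences
    return 1 - 6 * (d**2).sum() / (n * (n**2 - 1))

def kendall(x, y):  # Eq. (3): concordant minus discordant pairs
    n, c, d = len(x), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            c += s > 0  # concordant pair
            d += s < 0  # discordant pair
    return 2 * (c - d) / (n * (n - 1))

x = np.arange(1.0, 11.0)
y = x ** 3  # monotone but non-linear: rank-based coefficients equal 1
comprehensive = (abs(pearson(x, y)) + abs(spearman(x, y)) + abs(kendall(x, y))) / 3
```

On this monotone-but-nonlinear pair, Spearman and Kendall both equal 1 while Pearson is below 1, which is why the paper averages all three into a comprehensive correlation.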
2.3 Forecasting models
A single deep learning model cannot comprehensively capture all the inherent characteristics of different indicators. Ensemble learning is an effective way to address this issue by combining several deep learning models. However, ensemble learning requires the base deep learning models to be both diverse and accurate. Diversity: the individual models need to make different kinds of predictions, which ensures that when one model makes a wrong prediction, another can potentially correct it. Accuracy: the individual models should still be reasonably accurate; if each model is weak (too inaccurate), the ensemble will not perform well, even with diversity. Following this principle, we employed the five typical deep learning models of LSTM [21], XGBOOST [30], TCN [31], CNN [32], and TRMF [33] for ensembling. First, they have different learning structures and forecasting mechanisms that make them diverse. Second, they have been demonstrated to be accurate and effective in forecasting energy-related time series. Hence, the ensembled model enjoys all the merits of the five deep learning models, making it able to comprehensively capture all the characteristics of different indicators to achieve accurate forecasting. Next, we introduce the five models.
2.3.1 Long short-term memory network (LSTM).
LSTM [21] is a specialized variant of the recurrent neural network. When dealing with sequence data, LSTM is capable of effectively capturing and utilizing long-term dependencies. In comparison to traditional RNN models, LSTM has been more successful in addressing issues like gradient vanishing and explosion.
The LSTM model is composed of a sequence of LSTM units, each containing three crucial gate mechanisms: the input gate, forget gate, and output gate. These gate mechanisms selectively filter and adjust input data, precisely controlling the flow of information and the transmission of memory. The gates are contained in the memory block; Fig 2 shows the structure of a memory block. The LSTM computations at time t are presented below:
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
g_t = \tanh(W_g x_t + U_g h_{t-1} + b_g)
c_t = f_t \odot c_{t-1} + i_t \odot g_t
h_t = o_t \odot \tanh(c_t) (4)
where i_t, f_t, o_t represent the input gate, forget gate, and output gate, respectively. g_t serves as an intermediate candidate value during the computation, and c_t is the memory cell state. W_i, U_i, W_f, U_f, W_o, U_o, W_g, U_g are weight matrices and b_i, b_f, b_o, b_g are bias vectors. h_t and h_{t-1} are the outputs at the current time t and the preceding time t-1, respectively, x_t is the current input, and \odot denotes element-wise multiplication. The sigmoid and hyperbolic tangent activation functions are defined as follows:
\sigma(z) = \frac{1}{1 + e^{-z}}, \quad \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} (5)
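A single memory-block update following the gate equations can be sketched in NumPy; the dimensions and random weights below are illustrative assumptions, not a trained model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM memory-block update: gates, candidate, cell state, output."""
    W, U, b = params["W"], params["U"], params["b"]
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate state
    c = f * c_prev + i * g                                # cell state update
    h = o * np.tanh(c)                                    # hidden output
    return h, c

# Toy dimensions: 3 input features, 2 hidden units (illustrative only)
rng = np.random.default_rng(0)
params = {
    "W": {k: rng.normal(size=(2, 3)) for k in "ifog"},
    "U": {k: rng.normal(size=(2, 2)) for k in "ifog"},
    "b": {k: np.zeros(2) for k in "ifog"},
}
h, c = lstm_step(rng.normal(size=3), np.zeros(2), np.zeros(2), params)
```

Because the output is gated through tanh and sigmoid, every component of h is bounded in (-1, 1), which helps keep the recurrence numerically stable over long sequences.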
2.3.2 eXtreme gradient boosting (XGBOOST).
XGBOOST [30] is a gradient boosting tree algorithm that iteratively trains multiple decision trees. Each tree corrects the residual errors of the previous one, and their outputs are aggregated for predictions. The model’s output is the cumulative sum of the outputs from multiple decision trees, with the weight of each tree controlled by a learning rate.
The forecast value of the i-th sample can be represented as the accumulation of the outputs from all trees:
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) (6)
where K is the number of trees and fk(xi) is the output of the k-th tree for the sample xi. To train the model, it is necessary to define the loss function and regularization term. The objective of XGBoost is to minimize the following loss function:
\mathcal{L} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) (7)
where the loss function l(y_i, \hat{y}_i) measures the difference between the true value y_i and the forecast value \hat{y}_i, while the regularization term \Omega(f_k) is used to control the complexity of each tree f_k.
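The additive prediction can be illustrated with a minimal residual-boosting sketch using depth-1 stumps. This is a conceptual sketch only: it omits XGBoost's second-order optimization and the regularization term Ω, and the data are toy values:

```python
import numpy as np

def fit_stump(x, residual):
    """Pick the split minimizing squared error; leaves predict the residual means."""
    best = None
    for s in np.unique(x)[:-1]:
        left, right = residual[x <= s], residual[x > s]
        err = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or err < best[0]:
            best = (err, s, left.mean(), right.mean())
    _, s, lv, rv = best
    return lambda q: np.where(q <= s, lv, rv)

def boost(x, y, K=20, lr=0.3):
    """Each tree fits the previous residuals; prediction is the scaled sum."""
    pred, trees = np.zeros_like(y), []
    for _ in range(K):
        t = fit_stump(x, y - pred)   # fit current residuals
        pred = pred + lr * t(x)      # cumulative sum of tree outputs
        trees.append(t)
    return lambda q: lr * sum(t(q) for t in trees)

x = np.arange(10.0)
y = np.where(x < 5, 1.0, 3.0)        # a step function a stump can learn exactly
model = boost(x, y)
```

With a learning rate of 0.3, the residual shrinks geometrically (by a factor 0.7 per round), so 20 rounds already fit this step function almost exactly.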
2.3.3 Temporal convolutional network (TCN).
TCN [31] is a deep learning architecture designed for sequence modeling. Unlike traditional Recurrent Neural Networks (RNN), TCN employs Convolutional Neural Networks (CNN) to capture long-range dependencies within sequences.
Given an input sequence X = (x1, x2,…,xT), TCN generates an output sequence Y = (y1, y2,…,yT) through a series of convolutional operations. Each convolutional layer applies an activation function and includes residual connections. The output of TCN can be computed as follows:
Y_t = H(X_t) + X_t (8)
where H(X_t) represents the output of the convolutional layer applied to the input X_t, and Y_t is the output of the network at time t.
TCN introduces residual connections, making the network easier to train and helping to prevent issues like gradient vanishing or exploding. It is particularly effective in handling deep networks.
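The causal convolution and residual connection can be sketched in NumPy. The single filter, the ReLU, and the toy data here are illustrative simplifications of a full TCN residual block (which stacks dilated convolutions, normalization, and dropout):

```python
import numpy as np

def causal_conv(x, w, dilation=1):
    """1-D causal convolution: y[t] = sum_k w[k] * x[t - k*dilation],
    with left zero-padding so y has the same length as x."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
                     for t in range(len(x))])

def tcn_block(x, w, dilation=1):
    """Residual block in the spirit of Eq. (8): output = activation(H(x)) + x."""
    h = causal_conv(x, w, dilation)
    return np.maximum(h, 0.0) + x  # ReLU activation plus residual connection

x = np.arange(8.0)
w = np.array([0.5, 0.5])
out = tcn_block(x, w)
```

The key property is causality: the output at time t depends only on inputs up to t, so editing a future value never changes past outputs.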
2.3.4 Convolutional neural networks (CNN).
The role of CNN [32] in time series forecasting is to enhance the model’s understanding of patterns and structures within the sequence through convolutional operations and feature learning, thereby improving predictive performance. In specific time series problems, the combination or nested use of CNN with other models can also yield promising results.
For an input matrix (or feature map) I and a convolutional kernel (or filter) K, the mathematical expression for the convolution operation is as follows:
S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n) \, K(i - m, j - n) (9)
In this context, S(i, j) represents the outcome of the convolution operation. The coordinates (i, j) signify the position within the resulting matrix, while m and n serve as indices for the convolutional kernel. I(m, n) denotes an element within the input matrix, and K(i−m, j−n) signifies the weight associated with the convolutional kernel.
This mathematical expression articulates that each element in the resulting matrix is obtained by performing a weighted summation of the input matrix and the convolutional kernel, adhering to specific computational rules. This convolutional operation adeptly captures local features inherent in the input data, enabling CNNs to discern spatial structures and patterns present in images.
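The weighted summation above can be verified with a brute-force implementation (a full 2-D convolution on a toy 2x2 input; the matrices are illustrative values only):

```python
import numpy as np

def conv2d_full(I, K):
    """Full 2-D convolution: S(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n)."""
    Mi, Ni = I.shape
    Mk, Nk = K.shape
    S = np.zeros((Mi + Mk - 1, Ni + Nk - 1))
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            for m in range(Mi):
                for n in range(Ni):
                    # only kernel indices inside the kernel contribute
                    if 0 <= i - m < Mk and 0 <= j - n < Nk:
                        S[i, j] += I[m, n] * K[i - m, j - n]
    return S

I = np.array([[1.0, 2.0], [3.0, 4.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0]])
S = conv2d_full(I, K)
```

For example, S(1, 1) = I(0, 0)K(1, 1) + I(0, 1)K(1, 0) + I(1, 0)K(0, 1) + I(1, 1)K(0, 0) = 1 + 0 + 0 + 4 = 5.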
2.3.5 Temporal regularized matrix factorization (TRMF).
TRMF [33] is a model designed for time series prediction. It captures the underlying structure in sequences by decomposing the time series data matrix into the product of two low-rank matrices. TRMF introduces regularization terms during the decomposition to prevent overfitting and enhance the model’s generalization capability. The simple formula for TRMF can be expressed as:
X_{ijt} = \sum_{l} F_{il} W_{jl} Y_{lt} + \epsilon_{ijt} (10)

where X_{ijt} is the observed value of variable i at time t in the time series data matrix, F is the temporal factor matrix representing the influence of time, W is the spatial factor matrix representing the relationships between variables, Y_{lt} is additional information at time t, and \epsilon_{ijt} is the error term.
The training process of the model involves finding appropriate factor matrices (F, W, etc.) through optimization algorithms to minimize the error between the actual observed values and the model predictions. TRMF excels in time series decomposition and prediction, particularly demonstrating advantages in handling missing data and large-scale datasets.
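The core idea, factorizing the data matrix while regularizing the temporal factors, can be illustrated with a toy gradient-descent sketch. The synthetic data, the first-order smoothness penalty (used here in place of TRMF's autoregressive regularizer from [33]), and all hyperparameters are assumptions for illustration only:

```python
import numpy as np

# Synthetic low-rank series: 5 variables, 40 time steps, rank 2
rng = np.random.default_rng(1)
n, T, r = 5, 40, 2
W_true = rng.normal(size=(n, r))
F_true = np.cumsum(rng.normal(scale=0.1, size=(r, T)), axis=1)  # smooth temporal factors
X = W_true @ F_true

# Factor matrices to learn, plus assumed regularization strengths
W = rng.normal(size=(n, r)) * 0.1
F = rng.normal(size=(r, T)) * 0.1
lr, lam, mu = 0.002, 0.1, 0.1

def loss(W, F):
    """Squared reconstruction error + temporal smoothness + weight decay."""
    err = X - W @ F
    smooth = F[:, 1:] - F[:, :-1]
    return (err ** 2).sum() + lam * (smooth ** 2).sum() + mu * (W ** 2).sum()

start = loss(W, F)
for _ in range(300):
    E = X - W @ F
    dW = -2 * E @ F.T + 2 * mu * W
    dF = -2 * W.T @ E
    diff = F[:, 1:] - F[:, :-1]
    dF[:, 1:] += 2 * lam * diff   # gradient of the smoothness term w.r.t. F_t
    dF[:, :-1] -= 2 * lam * diff  # ... and w.r.t. F_{t-1}
    W, F = W - lr * dW, F - lr * dF
final = loss(W, F)
```

The temporal regularizer is what allows forecasting: future factor columns can be extrapolated from the learned temporal structure and mapped back through W.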
2.4 Ensemble modeling
Adaptive aggregation ensemble learning [38] is an effective method for aggregating multiple models, and we employ it for model integration; its theoretical foundation has been established in [38]. Let Err_k(t) be the forecasting error of the k-th model among the five forecasting models at the t-th training iteration, where k ∈ {1, 2, 3, 4, 5}. The adaptive weight ε_k(t) of the k-th model at the t-th training iteration can be expressed as follows:
\varepsilon_k(t) = \frac{\exp\left(-\lambda \frac{t}{T} Al_k(t)\right)}{\sum_{j=1}^{5} \exp\left(-\lambda \frac{t}{T} Al_j(t)\right)} (11)

where t is the training iteration, \lambda is the equilibrium factor that governs the aggregation weights of the ensemble during the training process, T is the maximum training iteration, and Al_k(t) is the cumulative error of the k-th model over t iterations. The process of model ensemble is illustrated in Fig 3.
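The self-adaptive weighting can be sketched as a normalized exponential of the cumulative errors, annealed by the t/T schedule. Note the exact functional form follows [38]; the softmax form, the placement of the equilibrium factor λ, and the toy error values here are assumptions for illustration:

```python
import numpy as np

def adaptive_weights(cum_errors, t, T, lam=1.0):
    """Weights decay exponentially with cumulative error Al_k(t); the
    t/T schedule sharpens the distribution as training proceeds."""
    z = -lam * (t / T) * np.asarray(cum_errors, dtype=float)
    z -= z.max()              # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()        # normalize so the weights sum to 1

# Five models' cumulative errors at some iteration (illustrative values)
Al = [0.8, 1.2, 0.5, 2.0, 1.0]
early = adaptive_weights(Al, t=1, T=100)    # near-uniform early in training
late = adaptive_weights(Al, t=100, T=100)   # concentrated on the best model
```

Early in training the ensemble hedges across all five models; as t approaches T, weight shifts toward the model with the lowest cumulative error.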
3. Experiments and results
3.1 Data collection
The production data for Chinese oil and coal are sourced from the China Statistical Yearbook (stats.gov.cn) published by the National Bureau of Statistics. The import data for Chinese oil and coal are derived from China's import and export trade data published by the General Administration of Customs (customs.gov.cn). Additionally, we collected more than 50 sets of feature data related to various aspects of the economy, culture, and society from the aforementioned sources. Subsequently, we extracted data for the period from 1999 to 2021. Finally, we obtained four indicators (import of oil, production of oil, import of coal, and production of coal) with a total of 88 data entries, and 35 features (natural gas production, electricity production, total construction industry output, etc.) with a total of 770 data entries. More details can be found in the S1 Appendix.
3.2 Experimental settings
3.2.1 Experiment design.
To evaluate the proposed approach, we design four sets of experiments: feature engineering, comparison between the multi-feature trained model and univariate models, comparison between single models and our proposed ensemble approach, and hyperparameter analysis of the forecasting models. The first is feature engineering, which calculates three correlation coefficients between indicators and features and groups the features based on the obtained coefficients; the grouped features are then input into a multi-feature time series forecasting model to identify the optimal features. Due to limited page space, only LSTM is employed to identify the optimal features in the experiments. The second is the comparison between the multi-feature trained model and univariate models, which compares the model trained on the identified optimal features with several univariate models to validate that multi-feature data yields better forecasting accuracy than univariate data. The third is the comparison between single models and our proposed ensemble approach, which demonstrates that our approach improves the forecasting accuracy of each single model on all four indicators. The fourth is the hyperparameter analysis of the forecasting models, which conducts a hyperparameter sensitivity analysis of LSTM to show that a single model is sensitive to hyperparameters, and then monitors the ensemble weights and convergence curves of the ensemble model to show that our proposed approach adapts to achieve better forecasting performance.
3.2.2 Evaluation metric.
In the experiments, the training sets were constructed using the data from 1999 to 2016, and the remaining five years of data (i.e., 2017, 2018, 2019, 2020, and 2021) were used as the testing sets. The model hyperparameters were tuned by grid search. We repeated each experiment five times to obtain the average forecasting accuracy. Root mean square error (RMSE) and mean absolute percentage error (MAPE) are commonly adopted to assess forecasting accuracy in related time series forecasting scenarios [9]. The formulas for RMSE and MAPE are given as follows:
RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2}, \quad MAPE = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{Y_i - \hat{Y}_i}{Y_i} \right| (12)

where N represents the number of samples in the testing set, Y_i represents the real value of the i-th sample, and \hat{Y}_i represents the forecast value of the i-th sample. Lower RMSE and MAPE correspond to higher forecasting accuracy.
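Both metrics are straightforward to implement in NumPy (the values below are toy illustrations, not results from the paper):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires nonzero y_true)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

y = [100.0, 200.0, 400.0]  # hypothetical real values
p = [110.0, 190.0, 400.0]  # hypothetical forecasts
```

Note that MAPE is scale-free, which makes it convenient for comparing accuracy across the four indicators, while RMSE is in the indicator's original units.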
3.3 Feature engineering
To identify the optimal features for the four targeted indicators (import of oil, production of oil, import of coal, and production of coal), we first computed three types of correlation coefficients (i.e., the Pearson, Spearman, and Kendall correlation coefficients) between the four targeted indicators and the 35 features. Then, the three types of correlation coefficients were averaged to obtain the final averaged correlation coefficients. We grouped the features based on the final averaged correlation coefficients, as shown in Tables 1–4 (the detailed results of the correlation coefficients can be found in Tables B1–B4 in the S1 Appendix).
There are six groups divided by the correlation coefficient range, i.e., the correlation coefficient ranges of 0.90–1.00, 0.80–0.90, 0.70–0.80, 0.60–0.70, 0.50–0.60, and 0.00–0.50, respectively.
There are seven groups divided by the correlation coefficient range, i.e., the correlation coefficient ranges of 0.99–1.00, 0.90–0.99, 0.80–0.90, 0.70–0.80, 0.60–0.70, 0.50–0.60, and 0.00–0.50.
There are six groups divided by the correlation coefficient range, i.e., the correlation coefficient ranges of 0.90–1.00, 0.80–0.90, 0.70–0.80, 0.60–0.70, 0.50–0.60, and 0.00–0.50.
There are six groups divided by the correlation coefficient range, i.e., the correlation coefficient ranges of 0.90–1.00, 0.80–0.90, 0.70–0.80, 0.60–0.70, 0.50–0.60, and 0.00–0.50.
To identify the optimal features for forecasting, we sequentially added the grouped features into LSTM in descending order of correlation coefficient, as shown in Tables 5–8. The optimal features are identified as those yielding the highest forecasting accuracy. From Tables 5–8, we obtained the following four observations:
- Incorporating features with high correlation coefficients generally enhances the model's forecasting accuracy. For example, the models with added features exhibit higher forecasting accuracy than the "Single feature" group that uses no additional features. As another example, Table 8 shows that the model's forecasting accuracy improves as features are sequentially added until all groups are included (cumulative range 0.00–1.00).
- Incorporating features with relatively high correlation coefficients can also decrease forecasting accuracy. For example, Table 7 shows that the average MAPE increases from 1.88% to 2.41% after adding features with correlation coefficients of 0.90–0.99. The reason may be that some features within this range are redundant, i.e., they are already covered by the features within the correlation coefficients of 0.99–1.00.
- The production and import of oil and coal are related to the production and import of other energy sources, i.e., incorporating other energy-related data can improve forecasting accuracy. Such improvement could be attributed to the Chinese Energy Conservation and Emission Reduction Strategy introduced in 2006 and the Energy Development Strategic Action Plan formulated in 2014, which continuously optimized the Chinese energy structure.
- Incorporating data related to staple foods such as rice, barley, and wheat import volumes can also enhance forecasting accuracy. Besides, the population growth rate and birth rate can further enhance forecasting accuracy. These effects could be attributed to the increasing demand for food due to Chinese population growth, which strengthens the correlations between these features and energy consumption.
In summary, this set of experiments identifies the optimal features for the four targeted indicators (import of oil, production of oil, import of coal, and production of coal). These optimal features were employed to conduct the subsequent three sets of experiments.
The grouped features with different correlation coefficients were added in descending order. The column ‘Single feature’ denotes that only the targeted indicator (without additional features) was used for forecasting. The columns under ‘Grouped features with different correlation coefficient ranges’ denote that the features with the corresponding correlation coefficients were used for modeling. For example, the column ‘0.90–1.00’ denotes that the features with correlation coefficients of 0.90–1.00 were used.
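As a concrete illustration of this grouping step, the sketch below bins features by the absolute correlation with a targeted indicator. It is a minimal sketch under our own naming, using Pearson correlation and toy data; the paper's actual feature engineering may use different correlation measures and bins.

```python
import numpy as np

def group_features_by_correlation(X, y, bins=((0.90, 1.00), (0.80, 0.90), (0.70, 0.80))):
    """Group feature columns of X by the absolute Pearson correlation
    between each feature and the targeted indicator y."""
    groups = {b: [] for b in bins}
    for j in range(X.shape[1]):
        r = abs(np.corrcoef(X[:, j], y)[0, 1])
        for lo, hi in bins:
            if lo <= r <= hi:
                groups[(lo, hi)].append(j)
                break  # each feature falls into at most one bin
    return groups

# Toy example: a target series plus two features of varying relevance.
rng = np.random.default_rng(0)
y = np.arange(23, dtype=float)                       # 23 annual observations
X = np.column_stack([y + rng.normal(0, 0.1, 23),     # strongly correlated
                     y + rng.normal(0, 20.0, 23)])   # weakly correlated
groups = group_features_by_correlation(X, y)
```

The groups can then be added to the forecasting model in descending order of correlation, as done in the experiments above.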
3.4 Comparison between multi-feature trained model and univariate models
To explore whether the identified optimal features can enhance the forecasting accuracy of energy consumption demand, we compared the multi-feature trained LSTM with seven univariate forecasting models based on statistical analysis, machine learning, and deep learning. The seven univariate models were trained only on the data of the targeted indicators themselves; Table 9 briefly introduces them. The comparison results are presented in Tables 10–13, which show that the multi-feature trained LSTM achieves better forecasting accuracy than all seven univariate models. The reason is that a univariate forecasting model has limited learning ability: it cannot comprehensively capture the complex characteristics of the four targeted indicators. Specifically, compared with the best univariate model in each of Tables 10–13, the multi-feature trained LSTM reduces MAPE by 2.20%, 2.19%, 3.12%, and 1.34% across the four targeted indicators, respectively. Hence, we conclude that the identified optimal features can significantly improve the forecasting accuracy of energy consumption in China.
The results show that the multi-feature trained LSTM (i.e., LSTM_Multi) outperforms the seven univariate forecasting models.
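For reference, the MAPE metric used throughout these comparisons can be computed as below; the helper name and toy numbers are ours.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Example: hypothetical actuals vs. forecasts for three years.
actual   = [100.0, 110.0, 120.0]
forecast = [101.0, 109.0, 121.0]
err = mape(actual, forecast)  # ~0.91%
```

Lower values indicate more accurate forecasts; the percentage-point reductions reported above are differences between such scores.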
3.5 Comparison between single model and our proposed ensemble approach
This set of experiments compared our proposed ensemble approach with the five single models: LSTM [30], XGBoost [30], TCN [31], CNN [32], and TRMF [33]. All models were trained with the identical identified optimal features, and their hyperparameters were tuned by grid search. The LSTM model used a three-layer neural network with 50 neurons per layer, a batch size of 8, and the Adam optimizer with a learning rate of 0.001 for 100 epochs. The CNN model used a two-layer network with 101 neurons per layer, a batch size of 1, and the SGD optimizer with a learning rate of 0.01 for 100 epochs. The TCN model used a batch size of 5 and the Adam optimizer with a learning rate of 0.001 for 100 epochs. For the XGBoost model, we searched over the number of trees in [50, 100, 150, 200, 250, 300], learning rates in [0.01, 0.1, 0.2], and maximum tree depths in [1, 2, 3, 4, 5, 6]. The TRMF model used 4 factors, a time-delay list of [1, 2, 3], a noise variance of 2, and a maximum of 100 iterations. Table 14 shows the comparison results. To better understand these results, we conducted statistical analyses of loss/win counts, the Wilcoxon signed-rank test, and Friedman's test following prior studies [38, 45]. These results show that our proposed ensemble approach significantly outperforms the five single comparison models. Among the total 100 comparison cases (25 cases for each of the four indicators), our approach performs worse than LSTM in only one case and outperforms all comparison models in the remaining 99 cases. Moreover, our approach consistently achieves a MAPE below 0.9% in all cases, with a minimum MAPE as low as 0.0052%.
In conclusion, this set of experiments demonstrates that our proposed ensemble approach can effectively address the limitations of a single model in forecasting heterogeneous targeted indicators of energy consumption.
The results show that our approach significantly outperforms the five single comparison models.
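The paper's tailored self-adaptive weighting strategy is not reproduced here, but one common scheme with the same flavor weights each model inversely to its recent validation error, so more accurate models contribute more to the ensemble. The sketch below illustrates that idea under our own naming and toy numbers; it is an assumption-laden sketch, not the paper's exact method.

```python
import numpy as np

def adaptive_weights(errors, eps=1e-8):
    """Weight each model inversely to its validation error and normalize
    so that the weights sum to one (illustrative scheme, not the paper's)."""
    inv = 1.0 / (np.asarray(errors, dtype=float) + eps)
    return inv / inv.sum()

def ensemble_forecast(preds, errors):
    """Weighted average of per-model forecasts (rows = models)."""
    w = adaptive_weights(errors)
    return w @ np.asarray(preds, dtype=float)

# Five models' one-step forecasts and their validation MAPEs (toy values).
preds  = [[10.0], [12.0], [11.0], [9.0], [13.0]]
errors = [0.5, 2.0, 1.0, 0.8, 4.0]   # lower error -> larger weight
combined = ensemble_forecast(preds, errors)
```

Because the weights are recomputed from the latest errors, the ensemble adapts automatically as the relative accuracy of the base models changes across indicators.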
3.6 Hyperparameter analysis of forecasting models
3.6.1 Hyperparameter analysis of the single LSTM model.
To show that a single model is sensitive to hyperparameters, we conducted a hyperparameter sensitivity analysis of LSTM as a case study. Fig 4 records the results for the production of oil; the complete results on all datasets are recorded in Fig B2 in the S1 Appendix. From these results, we observe that the single LSTM model is significantly affected by its hyperparameters of batch size, dropout, and units. In other words, LSTM requires time-consuming manual hyperparameter tuning for each forecasting indicator. Therefore, it is challenging for a single model to tune its hyperparameters to accurately forecast all four indicators.
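A sensitivity sweep of this kind can be sketched as follows; the `sweep` helper, the grid values, and the stand-in evaluation function are ours (a real run would train the LSTM for each configuration and return its validation MAPE).

```python
from itertools import product

def sweep(param_grid, evaluate):
    """Evaluate every hyperparameter combination and return (config, error)
    pairs; a wide spread of errors indicates hyperparameter sensitivity."""
    results = []
    for combo in product(*param_grid.values()):
        cfg = dict(zip(param_grid.keys(), combo))
        results.append((cfg, evaluate(cfg)))
    return results

# Hypothetical grid over the hyperparameters analyzed in Fig 4.
grid = {"batch_size": [1, 5, 8], "dropout": [0.0, 0.2, 0.5], "units": [50, 100]}

# Stand-in evaluation; a real run would train the LSTM and score it.
toy_eval = lambda c: 0.5 * c["dropout"] + 0.01 * abs(c["units"] - 50) + 0.1 * c["batch_size"]

results = sweep(grid, toy_eval)
errs = [e for _, e in results]
spread = max(errs) - min(errs)  # large spread -> hyperparameter-sensitive model
```

Repeating such a sweep for each of the four indicators quickly becomes expensive, which is precisely the burden the ensemble approach avoids.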
3.6.2 Hyperparameter analysis of the ensemble model.
To visualize the integration process and performance of the ensemble model, we plot the ensembling weights and convergence curves of our proposed approach. Fig 5 records the adaptive ensembling weights of our approach for 2021, and Fig 6 records the convergence curves of all models for 2021; the results for other years are recorded in Figs B3 and B4 in the S1 Appendix. From these figures, we observe that although the convergence patterns of each model vary across indicators, all models consistently converge to better forecasting accuracy as training epochs increase. Moreover, our proposed model always achieves the best forecasting accuracy among all models because it adaptively adjusts its ensembling weights based on forecasting accuracy. Therefore, the results demonstrate that our approach can adaptively integrate the merits of each single model to obtain optimal ensemble forecasting performance.
Four figures show the results for the production of oil, production of coal, import of oil, and import of coal, respectively, from left to right. The results show that our approach can adaptively adjust its ensembling weights.
Four figures show the results for the production of oil, production of coal, import of oil, and import of coal, respectively, from left to right. The results show that all models consistently converge to better forecasting accuracy as training epochs increase.
4. Conclusion
This paper proposes a hybrid deep learning approach for consumption forecasting of oil and coal in China. In the experiments, we collected 880 real data records covering 39 factors regarding the energy consumption of China from 1999 to 2021. The four factors of import of oil, production of oil, import of coal, and production of coal were set as the targeted indicators representing the energy consumption of China, and the remaining 35 factors were set as features. Based on the experimental results, we have three main findings. First, feature engineering can not only identify the optimal features for forecasting energy consumption in China but also uncover new knowledge; for example, it reveals that the energy consumption of China is greatly influenced by staple foods, the population growth rate, and the birth rate. Second, the identified optimal features can significantly improve the forecasting accuracy of energy consumption in China, which indicates that it is critical to find the correct features before modeling. Third, by ensembling five deep learning models, our proposed ensemble approach can effectively address the limitations of a single model in forecasting heterogeneous targeted indicators of energy consumption in China. Note that the ensembled models are required to be diverse with different characteristics [46, 47]; one simple solution is to select different types of deep learning models, such as models with different learning principles and structures. Although our approach holds immense potential for energy consumption forecasting, it still has one limitation: its feature engineering needs to sequentially add the grouped features into a forecasting model to identify the optimal features in advance, which requires some manual adjustment. In the future, we plan to automatically identify the optimal features based on intelligent optimization methods such as differential evolution [45].
References
- 1. Wang T, Wu F, Dickinson D, et al. Energy price bubbles and extreme price movements: Evidence from China’s coal market. Energy Economics, 2024, 129: 107253.
- 2. Zhu Q, Sang X, Li Z. Economic growth and household energy footprint inequality in China. PLoS ONE, 2023, 18(3): e0282300. pmid:36857403
- 3. Wang Q, Song X. Forecasting China’s oil consumption: a comparison of novel nonlinear-dynamic grey model (GM), linear GM, nonlinear GM and metabolism GM. Energy, 2019, 183: 160–171.
- 4. Chen H, Chen J, Han G, et al. Winding down the wind power curtailment in China: What made the difference? Renewable and Sustainable Energy Reviews, 2022, 167: 112725.
- 5. Lin Y, Li J, Ruan X, et al. Energy consumption analysis of power grid distribution transformers based on an improved genetic algorithm. PeerJ Computer Science, 2023, 9: e1632. pmid:38077544
- 6. Miconi F, Dimitri G M. A machine learning approach to analyse and predict the electric cars scenario: The Italian case. PLoS ONE, 2023, 18(1): e0279040. pmid:36662837
- 7. Sen A, Dutta Choudhury K, Kumar Datta T. An analysis of crude oil prices in the last decade (2011–2020): With deep learning approach. PLoS ONE, 2023, 18(3): e0268996. pmid:36893097
- 8. Huang J, Zhang X, Jiang X. Short-term power load forecasting based on the CEEMDAN-TCN-ESN model. PLoS ONE, 2023, 18(10): e0284604.
- 9. Ouyang X, Yang Y, Du K, et al. How does residential electricity consumption respond to electricity efficiency improvement? Evidence from 287 prefecture-level cities in China. Energy Policy, 2022, 171: 113302.
- 10. de Oliveira E M, Oliveira F L C. Forecasting mid-long term electric energy consumption through bagging ARIMA and exponential smoothing methods. Energy, 2018, 144: 776–788.
- 11. Rao C, Zhang Y, Wen J, et al. Energy demand forecasting in China: A support vector regression-compositional data second exponential smoothing model. Energy, 2023, 263: 125955.
- 12. Tong M, Dong J, Luo X, et al. Coal consumption forecasting using an optimized grey model: The case of the world’s top three coal consumers. Energy, 2022, 242: 122786.
- 13. Duan H, Liu Y, Wang G. A novel dynamic time-delay grey model of energy prices and its application in oil price forecasting. Energy, 2022, 251: 123968.
- 14. Irfan M, Ayub N, Althobiani F, et al. Ensemble learning approach for advanced metering infrastructure in future smart grids. PLoS ONE, 2023, 18(10): e0289672. pmid:37851626
- 15. Chaturvedi S, Rajasekar E, Natarajan S, et al. A comparative assessment of SARIMA, LSTM RNN and Fb Prophet models to forecast total and peak monthly energy demand for India. Energy Policy, 2022, 168: 113097.
- 16. Zendehboudi A, Baseer M A, Saidur R. Application of support vector machine models for forecasting solar and wind energy resources: A review. Journal of cleaner production, 2018, 199: 272–285.
- 17. Jebli I, Belouadha F Z, Kabbaj M I, et al. Prediction of solar energy guided by pearson correlation using machine learning. Energy, 2021, 224: 120109.
- 18. Zhu C. An adaptive agent decision model based on deep reinforcement learning and autonomous learning. Journal of Logistics, Informatics and Service Science, 2023, 10(3): 107–118.
- 19. Shabbir N, Kütt L, Jawad M, Amadiahanger R, Iqbal M N, Rosin A. Wind energy forecasting using recurrent neural networks. 2019 Big Data, Knowledge and Control Systems Engineering (BdKCSE), Sofia, Bulgaria, 2019: 1–5.
- 20. Zhu Y, Li H, Liao Y, et al. What to do next: Modeling user behaviors by Time-LSTM. Proceedings of IJCAI, 2017, 17: 3602–3608.
- 21. Yu Y, Si X, Hu C, et al. A review of recurrent neural networks: LSTM cells and network architectures. Neural computation, 2019, 31(7): 1235–1270. pmid:31113301
- 22. Fu R, Zhang Z, Li L. Using LSTM and GRU neural network methods for traffic flow prediction. 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), IEEE, 2016.
- 23. Feng Y, Chen J, Luo J. Life cycle cost analysis of power generation from underground coal gasification with carbon capture and storage (CCS) to measure the economic feasibility. Resources Policy, 2024, 92: 104996.
- 24. Zhang L, Liu C, Jia Y, Mu Y, Yan Y, Huang P. Pyrolytic modification of heavy coal tar by multi-polymer blending: Preparation of ordered carbonaceous mesophase. Polymers, 2024, 16(1): 161. pmid:38201826
- 25. Tanveer M, Rastogi A, Paliwal V, et al. Ensemble deep learning in speech signal tasks: A review. Neurocomputing, 2023: 126436.
- 26. Wang R, Lu S, Feng W. A novel improved model for building energy consumption prediction based on model integration. Applied Energy, 2020, 262: 114561.
- 27. Xiao J, Li Y, Xie L, et al. A hybrid model based on selective ensemble for energy consumption forecasting in China. Energy, 2018, 159: 534–546.
- 28. Pinto T, Praça I, Vale Z, et al. Ensemble learning for electricity consumption forecasting in office buildings. Neurocomputing, 2021, 423: 747–755.
- 29. Li L, Han Y, Li Q, Chen W. Multi-dimensional economy-durability optimization method for integrated energy and transportation system of net-zero energy buildings. IEEE Transactions on Sustainable Energy, 2024, 15(1): 146–159.
- 30. Chen T, Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016: 785–794.
- 31. Hewage P, Behera A, Trovati M, et al. Temporal convolutional neural (TCN) network for an effective weather forecasting using time-series data from the local weather station. Soft Computing, 2020, 24: 16453–16482.
- 32. Xie L, Yuille A. Genetic CNN. Proceedings of the IEEE International Conference on Computer Vision, 2017: 1379–1388.
- 33. Yu H F, Rao N, Dhillon I S. Temporal regularized matrix factorization for high-dimensional time series prediction. Advances in neural information processing systems, 2016, 29.
- 34. Liu Z, Zhao Y, Wang Q, Xing H, Sun J. Modeling and assessment of carbon emissions in additive-subtractive integrated hybrid manufacturing based on energy and material analysis. International Journal of Precision Engineering and Manufacturing-Green Technology, 2024, 11(3): 799–813.
- 35. Wu D, Luo X, He Y, Zhou M. A prediction-sampling-based multilayer-structured latent factor model for accurate representation to high-dimensional and sparse data. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(3): 3845–3858. pmid:36083962
- 36. Morf H. A validation frame for deterministic solar irradiance forecasts. Renewable Energy, 2021, 180: 1210–1221.
- 37. Abdi H. The Kendall rank correlation coefficient. Encyclopedia of Measurement and Statistics. Thousand Oaks, CA: Sage, 2007: 508–510.
- 38. Wu D, He Y, Luo X. A graph-incorporated latent factor analysis model for high-dimensional and sparse data. IEEE Transactions on Emerging Topics in Computing, 2023, 11(4): 907–917.
- 39. Box G E P, Jenkins G M, Reinsel G C, et al. Time series analysis: Forecasting and control. John Wiley & Sons, 2015.
- 40. Hyndman R J, Athanasopoulos G. Forecasting: Principles and practice. OTexts, 2018.
- 41. Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 2012, 25.
- 42. Bai S, Kolter J Z, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.
- 43. Ju-Long D. Control problems of grey systems. Systems & control letters, 1982, 1(5): 288–294.
- 44. Taylor S J, Letham B. Forecasting at scale. The American Statistician, 2018, 72(1): 37–45.
- 45. Wu D, Sun B, Shang M. Hyperparameter learning for deep learning-based recommender systems. IEEE Transactions on Services Computing, 2023, 16(4): 2699–2712.
- 46. Wu D, Zhang P, He Y, Luo X. MMLF: Multi-metric latent feature analysis for high-dimensional and incomplete data. IEEE Transactions on Services Computing, 2024, 17(2): 575–588.
- 47. Wu D, Zhang P, He Y, Luo X. A double-space and double-norm ensembled latent factor model for highly accurate web service QoS prediction. IEEE Transactions on Services Computing, 2023, 16(2): 575–588.