A genetic-algorithm-based remnant grey prediction model for energy demand forecasting

Energy demand is an important economic index, and demand forecasting has played a significant role in drawing up energy development plans for cities or countries. As the use of large datasets and statistical assumptions is often impractical to forecast energy demand, the GM(1,1) model is commonly used because of its simplicity and ability to characterize an unknown system by using a limited number of data points to construct a time series model. This paper proposes a genetic-algorithm-based remnant GM(1,1) (GARGM(1,1)) with sign estimation to further improve the forecasting accuracy of the original GM(1,1) model. The distinctive feature of GARGM(1,1) is that it simultaneously optimizes the parameter specifications of the original and its residual models by using the GA. The results of experiments pertaining to a real case of energy demand in China showed that the proposed GARGM(1,1) outperforms other remnant GM(1,1) variants.


Introduction
Energy is necessary for the sustainable development and economic prosperity of a country [1], and this is evidenced by the fact that energy demand has emerged as an important economic index in recent years. With the rapid pace of industrialization, the global demand for energy has increased exponentially in the past decade. Worldwide energy consumption is expected to increase by over 50% before 2030 if the current pattern of global energy consumption continues [2]. Moreover, high energy consumption has a significant and deleterious impact on the environment. This means that the environmental impact of energy consumption will play a crucial role in guiding energy development policies for cities and countries in the future [1]. An important issue in this context is the ability to accurately predict energy demand.
Traditional methods of demand forecasting, including artificial intelligence techniques, multivariate regression, and time series analysis, have been frequently applied to predict energy demand [3][4][5][6][7][8]. However, a large sample size is usually required to achieve reasonable forecasting accuracy for these methods [9][10][11][12]. Furthermore, statistical methods usually require that the data conform to statistical assumptions such as following a particular distribution. However, using large sample sizes or conforming to statistical assumptions is often impractical [13]. Hence, a forecasting method is needed that can work with small samples without making statistical assumptions to construct an energy demand prediction model [10,11]. Grey prediction [14] has emerged as a popular technique in the past decade, and is suitable for forecasting energy demand because of its simplicity and ability to characterize an unknown system using a limited number of data points [1]. Grey prediction consists of several forecasting models, of which the GM(1,1) is commonly used for time series forecasting [15]. The GM(1,1) model needs only four recent sample data points to achieve reliable and acceptable prediction accuracy [16,17]. Its effectiveness has been verified through application to a wide range of real-world problems, including energy consumption forecasting [10][11][12][13][18][19][20][21][22][23], technology management [24,25], engineering problems [26], optimization model development [27,28], and general management [29][30][31].
PLOS ONE | https://doi.org/10.1371/journal.pone.0185478 | October 5, 2017
To increase the forecasting accuracy of the original GM(1, 1) model, further development of the residual GM(1,1) model has been recommended [14,15]. A residual modification model, also called a remnant GM(1,1) model, is commonly constructed by first building the original GM(1,1) model, and then constructing the residual GM(1,1) model to modify the predicted values obtained by the original model. A number of improved remnant GM(1, 1) models focusing on sign estimation for residual modification have been developed. For instance, Hsu and Chen [32] used a multi-layer perceptron (MLP) to estimate the signs of residual modification to forecast power demand; Hsu [33] used Markov-chain-based sign estimation to modify residuals for the global integrated circuit industry, whereas Lee and Tong [13] combined residual modification with residual genetic programming (GP) sign estimation to develop the GPGM(1, 1) model in view of the importance of forecasting energy demand.
Usually, the original and the residual models are set up separately for remnant models. It would be interesting to investigate whether the prediction accuracy of the traditional remnant GM(1,1) model improves when the GM(1,1) and its residual models are constructed simultaneously. This paper develops a grey forecasting model called the genetic-algorithm-based remnant GM(1,1) model (GARGM(1,1)) with sign estimation that delivers high prediction accuracy. Its distinctive feature is that it can simultaneously optimize the parameters required for the original GM(1,1) and its residual models by a powerful search and optimization method [34][35][36], the genetic algorithm (GA). This grey prediction model is then applied to forecast energy demand.
The remainder of the paper is organized as follows. Sections 2 and 3 introduce the traditional remnant GM(1,1) and the proposed GARGM(1,1) models, respectively. Section 4 examines the forecasting performance of the GARGM(1,1) model using a dataset collected from China Statistical Yearbook 2008. The results show that the GARGM(1,1) model can outperform other variants of the remnant GM(1,1) model. Section 5 contains a discussion and the conclusions of this study.

Remnant GM(1,1) model
This section introduces the traditional remnant GM(1,1) model used to improve the predictive accuracy of the original GM(1,1) model. It consists of two main components: the original GM(1,1) model described in Section 2.1, and the residual GM(1,1) model described in Section 2.2.

Original GM(1,1) model
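Let an original data sequence $x^{(0)} = (x^{(0)}_1, x^{(0)}_2, \ldots, x^{(0)}_n)$ be provided by one system and consist of $n$ samples. A new sequence $x^{(1)} = (x^{(1)}_1, x^{(1)}_2, \ldots, x^{(1)}_n)$ can then be generated from $x^{(0)}$ by the accumulated generating operation (AGO) [7,15] as follows:

$$x^{(1)}_k = \sum_{j=1}^{k} x^{(0)}_j, \quad k = 1, 2, \ldots, n.$$

$x^{(1)}_1, x^{(1)}_2, \ldots, x^{(1)}_n$ can then be approximated by a first-order differential equation:

$$\frac{dx^{(1)}}{dt} + a x^{(1)} = b,$$

where $a$ and $b$ are the developing coefficient and the control variable, respectively. The AGO is used because it can identify potential regularities hidden in the data sequences even if the original data are finite, insufficient, and chaotic.

$\hat{x}^{(1)}_k$, the predicted value of $x^{(1)}_k$, can be obtained by solving this differential equation with initial condition $x^{(1)}_1 = x^{(0)}_1$:

$$\hat{x}^{(1)}_k = \left(x^{(0)}_1 - \frac{b}{a}\right) e^{-a(k-1)} + \frac{b}{a}, \quad k = 1, 2, \ldots, n.$$

$a$ and $b$ can be estimated by means of a grey difference equation:

$$x^{(0)}_k + a z^{(1)}_k = b,$$

where the background value $z^{(1)}_k$ is formulated as follows:

$$z^{(1)}_k = \alpha x^{(1)}_k + (1 - \alpha) x^{(1)}_{k-1}.$$

$\alpha$ is usually specified as 0.5 for convenience, but this is not the optimal setting. By using $n-1$ grey difference equations ($k = 2, 3, \ldots, n$), $a$ and $b$ can be obtained by the ordinary least-squares method:

$$[a, b]^{T} = (B^{T}B)^{-1} B^{T} y, \quad B = \begin{bmatrix} -z^{(1)}_2 & 1 \\ -z^{(1)}_3 & 1 \\ \vdots & \vdots \\ -z^{(1)}_n & 1 \end{bmatrix}, \quad y = [x^{(0)}_2, x^{(0)}_3, \ldots, x^{(0)}_n]^{T}.$$

Using the inverse AGO, the predicted value, $\hat{x}^{(0)}_k$, of $x^{(0)}_k$ can be generated as follows:

$$\hat{x}^{(0)}_k = \hat{x}^{(1)}_k - \hat{x}^{(1)}_{k-1}, \quad k = 2, 3, \ldots, n.$$

Therefore, $\hat{x}^{(0)}_k$ can be formulated as follows:

$$\hat{x}^{(0)}_k = (1 - e^{a}) \left(x^{(0)}_1 - \frac{b}{a}\right) e^{-a(k-1)}, \quad k = 2, 3, \ldots, n.$$

Note that $\hat{x}^{(1)}_1 = \hat{x}^{(0)}_1$.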
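The construction of the original GM(1,1) model can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `ago`, `fit_gm11`, and `predict_gm11` are hypothetical, and the two-parameter least-squares problem is solved in closed form instead of through the matrix expression. It assumes the standard GM(1,1) relations: the AGO as a running sum, background values z_k = alpha * x1_k + (1 - alpha) * x1_{k-1}, and the grey difference equation x0_k + a * z_k = b.

```python
import math


def ago(x0):
    """Accumulated generating operation: running sums of the original series."""
    total, x1 = 0.0, []
    for value in x0:
        total += value
        x1.append(total)
    return x1


def fit_gm11(x0, alpha=0.5):
    """Estimate (a, b) by least squares on x0_k + a * z_k = b for k = 2..n."""
    x1 = ago(x0)
    # Background values z_k = alpha * x1_k + (1 - alpha) * x1_{k-1}, k = 2..n.
    z = [alpha * x1[k] + (1 - alpha) * x1[k - 1] for k in range(1, len(x0))]
    y = x0[1:]
    m = len(y)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    # Closed-form solution of the normal equations for y = -a * z + b.
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b


def predict_gm11(x0_first, a, b, n):
    """Return predictions for k = 1..n; the first value is the initial condition."""
    preds = [x0_first]
    for k in range(2, n + 1):
        preds.append((1 - math.exp(a)) * (x0_first - b / a) * math.exp(-a * (k - 1)))
    return preds


# Usage on a synthetic series that grows by 10% per step (illustrative data only).
series = [100 * 1.1 ** i for i in range(6)]
a_hat, b_hat = fit_gm11(series)
fitted = predict_gm11(series[0], a_hat, b_hat, len(series))
```

For a growing series the estimated developing coefficient is negative, which makes the predicted values increase with k.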

Residual GM(1,1) model
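Let $\varepsilon^{(0)} = (\varepsilon^{(0)}_2, \varepsilon^{(0)}_3, \ldots, \varepsilon^{(0)}_n)$ denote the sequence of absolute residual values, where

$$\varepsilon^{(0)}_k = \left| x^{(0)}_k - \hat{x}^{(0)}_k \right|, \quad k = 2, 3, \ldots, n.$$

Using the same manner of construction as for the original GM(1,1) model for $x^{(0)}$, a residual model can be constructed for $\varepsilon^{(0)}$. The predicted residual, $\hat{\varepsilon}^{(0)}_k$, of $\varepsilon^{(0)}_k$ can be derived as follows:

$$\hat{\varepsilon}^{(0)}_k = (1 - e^{a_\varepsilon}) \left(\varepsilon^{(0)}_2 - \frac{b_\varepsilon}{a_\varepsilon}\right) e^{-a_\varepsilon (k-1)},$$

where $a_\varepsilon$ and $b_\varepsilon$ are the developing coefficient and the control variable, respectively. In the remnant GM(1,1) model with sign estimation, $\hat{x}^{(0)}_k$ is modified by adding $\hat{\varepsilon}^{(0)}_k$ to, or subtracting $\hat{\varepsilon}^{(0)}_k$ from, $\hat{x}^{(0)}_k$, so that the modified predicted value is

$$\hat{x}^{(0)}_k + s(k)\, \hat{\varepsilon}^{(0)}_k,$$

where $s(k)$ denotes the sign (positive or negative) of $\hat{\varepsilon}^{(0)}_k$ with respect to the $k$-th year. Compared to the original remnant GM(1,1) model, the sign of each residual in the improved one is unknown and needs to be estimated.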
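The residual modification can be illustrated with a short Python sketch. It is not the authors' code: the names `predict_residual` and `remnant_predict` are hypothetical, the signs are passed as a list of +1 or -1 values for k = 2, ..., n, and the residual forecast is assumed to take the same exponential form as the original model, seeded with the first absolute residual and using the exponent -a_eps * (k - 1).

```python
import math


def predict_residual(eps2, a_eps, b_eps, k):
    """Assumed residual forecast: same exponential form as the original model."""
    return (1 - math.exp(a_eps)) * (eps2 - b_eps / a_eps) * math.exp(-a_eps * (k - 1))


def remnant_predict(base_preds, eps2, a_eps, b_eps, signs):
    """Add or subtract the predicted residual from each base prediction.

    base_preds[k - 1] holds the original GM(1,1) prediction for index k;
    signs[k - 2] is +1 or -1 for k = 2..n. The first value is left unchanged.
    """
    modified = [base_preds[0]]
    for k in range(2, len(base_preds) + 1):
        residual = predict_residual(eps2, a_eps, b_eps, k)
        modified.append(base_preds[k - 1] + signs[k - 2] * residual)
    return modified


# Illustrative numbers only: three base predictions, one upward and one downward correction.
adjusted = remnant_predict([10.0, 20.0, 30.0], eps2=2.0, a_eps=-0.1, b_eps=1.0, signs=[1, -1])
```

Because the predicted residual is a magnitude, the quality of the modified forecast depends entirely on how well the signs are estimated.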

Genetic-algorithm-based remnant GM(1,1) model
Two main issues need to be addressed in the traditional remnant GM(1,1) model. First, the determination of both the developing coefficient and the control variable, in the original GM(1,1) model and the residual GM(1,1) model, is completely dependent on the background values. However, as the background values cannot be easily determined in advance by decision makers, it is reasonable to try to find developing coefficients and control variables without using background values. The second issue that needs to be addressed is that $a_\varepsilon$ and $b_\varepsilon$ are determined only once $a$ and $b$ in the original GM(1,1) model have been created. However, in addition to sign estimation, to minimize the difference between the predicted and the actual values, it might be worth examining whether simultaneously determining the four crucial parameters (i.e., $a$, $b$, $a_\varepsilon$, $b_\varepsilon$) has an effect on the prediction accuracy of the remnant GM(1,1) model.
The objective of our optimization problem is to minimize the mean absolute percentage error (MAPE) of the training patterns:

$$\mathrm{MAPE} = \frac{1}{|TS|} \sum_{k \in TS} \frac{\left| x^{(0)}_k - \hat{x}^{(0)}_k \right|}{x^{(0)}_k} \times 100\%,$$

where $TS$ denotes the training or testing data. As the background values are not involved in the formulation of $\hat{x}^{(0)}_k$, the computation of MAPE is completely free of the influence of background values. The absolute percentage error (APE), which was used to compare $\hat{x}^{(0)}_k$ and $x^{(0)}_k$ with the time series data, was defined as:

$$\mathrm{APE}_k = \frac{\left| x^{(0)}_k - \hat{x}^{(0)}_k \right|}{x^{(0)}_k} \times 100\%.$$

A method based on the GA is developed to automatically determine the developing coefficients (i.e., $a$ and $a_\varepsilon$), the control variables (i.e., $b$ and $b_\varepsilon$), and the sign of the $k$-th year (i.e., $s_k$, $k = 2, 3, \ldots, n$) for the improved remnant GM(1,1) model (i.e., GARGM(1,1)). Let $P_m$ denote the population in generation $m$, in which chromosome $u$ carries real-valued genes for the four parameters and binary sign genes $s^m_{u,k}$ ($k = 2, 3, \ldots, n$). $s^m_{u,k}$ has a positive sign when it is one and a negative sign when it is zero. Let $n_{\mathrm{size}}$ and $n_{\mathrm{max}}$ denote the population size and the maximum number of generations, respectively. The MAPE for the training data serves as the fitness function. Once the fitness value of each chromosome in $P_m$ has been evaluated, selection, crossover, and mutation are applied until $n_{\mathrm{size}}$ new chromosomes have been generated for $P_{m+1}$. The GA is executed until $n_{\mathrm{max}}$ generations have been generated. The genetic operations were performed as described in detail in [38].
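To make the fitness evaluation concrete, the following Python sketch decodes one chromosome and returns its training MAPE. It rests on several assumptions not stated in the text: the chromosome layout `(a, b, a_eps, b_eps, bits)`, the helper names `mape`, `gm_term`, and `fitness`, the use of the first absolute residual as the seed of the residual model, and an error average taken over k = 2, ..., n. No background values appear anywhere in the computation, which is the point of the formulation.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(x - p) / x for x, p in zip(actual, predicted)) / len(actual)


def gm_term(first, a, b, k):
    """Exponential GM(1,1) response used for both the series and its residuals."""
    return (1 - math.exp(a)) * (first - b / a) * math.exp(-a * (k - 1))


def fitness(chromosome, x0):
    """Training MAPE of a chromosome (a, b, a_eps, b_eps, bits); lower is better.

    bits[k - 2] is 1 for a positive sign and 0 for a negative sign, k = 2..n.
    """
    a, b, a_eps, b_eps, bits = chromosome
    n = len(x0)
    base = [gm_term(x0[0], a, b, k) for k in range(2, n + 1)]
    eps2 = abs(x0[1] - base[0])  # assumed seed of the residual model
    preds = []
    for k, bit in zip(range(2, n + 1), bits):
        sign = 1 if bit == 1 else -1
        preds.append(base[k - 2] + sign * gm_term(eps2, a_eps, b_eps, k))
    return mape(x0[1:], preds)


# Illustrative call on synthetic data; the parameter values are arbitrary.
series = [100 * 1.1 ** i for i in range(6)]
error = fitness((-0.095, 95.0, -0.1, 0.01, [1, 0, 1, 0, 1]), series)
```

Since the GA described here prefers chromosomes with higher fitness, a lower MAPE would be treated as a higher fitness under this sketch.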

Selection
Using binary tournament selection, two chromosomes from the current population are randomly selected, and the one with the higher fitness is placed in a mating pool. This process is repeated until there are $n_{\mathrm{size}}$ chromosomes in the mating pool. $n_{\mathrm{size}}$ pairs of chromosomes from the pool are then randomly selected for mating. Crossover and mutation are applied to each selected pair of parents to produce children by altering the parents' chromosomal makeup.
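Binary tournament selection as described above can be sketched as follows. The function names are hypothetical, the two contestants are assumed to be distinct, and "higher fitness" is interpreted as a lower training MAPE, since the MAPE is being minimized.

```python
import random


def binary_tournament(population, errors, rng):
    """Draw two distinct chromosomes at random and return the one with lower error."""
    i, j = rng.sample(range(len(population)), 2)
    return population[i] if errors[i] <= errors[j] else population[j]


def build_mating_pool(population, errors, rng):
    """Repeat the tournament until the pool holds as many chromosomes as the population."""
    return [binary_tournament(population, errors, rng) for _ in range(len(population))]


# Illustrative usage: four labelled chromosomes with their training errors.
rng = random.Random(0)
pool = build_mating_pool(["c1", "c2", "c3", "c4"], [1.5, 2.6, 3.6, 4.1], rng)
```

With distinct contestants, the worst chromosome in the population can never win a tournament, so it never enters the mating pool, while every other chromosome keeps some chance of being selected.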

Mutation
Let $\mathrm{Pr}_m$ denote the probability that mutation is performed for each real-valued parameter in a new chromosome generated by crossover. To avoid excessive perturbation in the gene pool, a low mutation rate should be used. If mutation occurs for a real-valued gene, it is altered by adding a number randomly selected from a specified interval. For the binary part of each newly generated chromosome, mutation is performed on each bit of the string, so that each bit can be changed either from zero to one or from one to zero with probability $\mathrm{Pr}_m$.
After crossover and mutation, $n_{\mathrm{del}}$ ($0 \le n_{\mathrm{del}} \le n_{\mathrm{size}}$) chromosomes in $P_{m+1}$ are randomly removed from the set of new chromosomes (those formed by genetic operations) to make room for additional copies of the chromosome with the maximum fitness value in $P_m$. Fig 1 shows a flowchart of the construction of the proposed prediction model using the GA.
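The mutation and elitist replacement steps can be sketched in Python as below. This is an illustration under stated assumptions: a chromosome is represented as a pair of lists (real-valued genes, sign bits), the perturbation interval is taken to be symmetric, [-step, step], and the names `mutate` and `apply_elitism` are hypothetical.

```python
import random


def mutate(chromosome, pr_m, step, rng):
    """Perturb each real gene and flip each sign bit independently with probability pr_m."""
    reals, bits = chromosome
    # Assumed interval for the random perturbation: [-step, step].
    new_reals = [g + rng.uniform(-step, step) if rng.random() < pr_m else g for g in reals]
    new_bits = [1 - s if rng.random() < pr_m else s for s in bits]
    return new_reals, new_bits


def apply_elitism(new_population, best_previous, n_del, rng):
    """Randomly drop n_del new chromosomes and add n_del copies of the previous best."""
    survivors = list(new_population)
    for _ in range(n_del):
        survivors.pop(rng.randrange(len(survivors)))
    return survivors + [best_previous] * n_del


# Illustrative usage with arbitrary values.
rng = random.Random(1)
child = mutate(([-0.05, 90.0, -0.1, 1.0], [1, 0, 1]), pr_m=0.01, step=0.05, rng=rng)
next_gen = apply_elitism(["n1", "n2", "n3", "n4"], "elite", n_del=2, rng=rng)
```

The population size is unchanged by the elitist step, and the best chromosome of the previous generation is guaranteed to survive.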

Experimental results
Section 4.1 presents the parameter specifications of the GA-based learning algorithm and Section 4.2 reports the performance of different forecasting methods on a real-world case.

Parameter specifications of GA
A number of factors can influence the performance of the GA, including population size and the probabilities of applying the crossover and mutation operators. As a matter of fact, no optimal GA parameter specifications exist. The principles recommended by Osyczka [35] and Ishibuchi et al. [36] to specify the parameters of the GA are as follows:
1. The population size should commonly range from 50 to 500 individuals.
2. The stopping condition should be specified according to the available computation time.
3. Only a small number of elite chromosomes are needed.
4. The crossover probability should be set to a large value because it controls the range of exploration in the solution space.
5. The mutation probability should be set to a small value to avoid generating excessive perturbations.
Therefore, the parameters in the experiment were specified as: $n_{\mathrm{size}} = 200$, $n_{\mathrm{max}} = 1000$, $n_{\mathrm{del}} = 2$, $\mathrm{Pr}_c = 0.9$, and $\mathrm{Pr}_m = 0.01$.
This experiment constructed the proposed GARGM(1,1) without any complex mechanisms to tune its parameters.

Application to total energy demand in China
To examine the forecasting capability of the GARGM(1,1) model, an experiment was conducted to compare its performance with the original GM(1,1), the GPGM(1,1), and the improved grey forecasting model using MLP models (MLPGM(1,1)) on a dataset collected from the China Statistical Yearbook 2008. This dataset, which consists of historical annual total energy consumption in China, is given in [13]. With its rapid economic development and ongoing industrialization, China has played a vital role with regard to energy production and consumption [39]. Indeed, energy demand forecasting has become an increasingly important issue for China [12].
Data from 1990 to 2003 were used for model fitting and those from 2004 to 2007 for ex-post testing. The forecasting results reported in [13] for the original GM(1,1), the MLPGM(1,1), and the GPGM(1,1) models are summarized in Table 1 and illustrated in Fig 2. Table 1 shows that the MAPE values of the original GM(1,1), the MLPGM(1,1), the GPGM(1,1), and the GARGM(1,1) models for the training data were 4.13%, 3.61%, 2.59%, and 1.50%, respectively. Similarly, for the testing data, the MAPE values were 26.21%, 20.23%, 20.23%, and 17.51%, respectively. These results indicate that the GARGM(1,1) model outperformed the other forecasting methods on both training and testing data. The results of ex-post testing were relatively poor for every prediction model because total energy consumption rose sharply in 2004, as shown in Table 1.

Discussion and conclusions
Energy management is crucial for economic prosperity and environmental security [1]. Energy demand forecasting plays an important role in creating energy policy. GM(1,1) is an appropriate approach to predict energy demand because it uses a limited number of samples to construct a prediction model without statistical assumptions. The present study developed the GARGM(1,1) model to forecast energy demand. The proposed model is sufficiently simple to implement as a computer program. Although the parameter specifications are somewhat subjective, the experimental results showed that they are acceptable. Compared with the MLPGM(1, 1) and the GPGM(1, 1) models, the GARGM(1, 1) model has the advantage of directly determining the developing coefficients and the control variables by the GA without using background values. The required parameters for the RGM are also optimized simultaneously. Experimental results concerning a case of energy demand data from China showed the effectiveness of the proposed forecasting model. In addition to grey prediction models, we examined the prediction performance of two frequently used models, linear regression and the MLP with backpropagation learning. The MLP had an input node, a hidden layer with two neurons, and an output layer with one neuron; it was trained over 10,000 iterations at a learning rate of 0.8. The forecasting results obtained by linear regression and the MLP are summarized in Table 2. It is clear that the prediction accuracy values of linear regression on the training and testing data were 4.20% and 27.76%, respectively [13], whereas those of the MLP on the training and testing data were 3.85% and 18.30%, respectively [20]. Therefore, the proposed GARGM(1,1) outperforms linear regression and the MLP. 
In the case of linear regression, it is reasonable to speculate that the size of the training sample and the statistical assumptions (e.g., homoscedasticity) had a certain impact on the prediction performance of the statistical methods. Energy demand forecasting can be regarded as a grey system problem [1,12] because a few factors, such as income and population, influence energy demand, but how exactly they influence it is unclear. Therefore, based on the superior forecasting performance of the GARGM(1,1) model in terms of energy demand, the applicability of the proposed forecasting model to other energy forecasting problems, such as electricity consumption in certain developing countries, should be explored. Moreover, sign estimation for the remnant forecasting model should be explored using other artificial intelligence tools, such as the functional-link net [41,42] and other nonadditive neural networks [43][44][45], which could improve prediction accuracy.
Moreover, it is well known that the GA can be time consuming in searching for optimum solutions. Although forecasting energy demand cannot be treated as a large-scale optimization problem, several improved versions of the GA, such as the parallel GA [46][47][48][49][50], may be used to expedite the construction of the proposed GARGM(1,1) model or to increase the precision of its solutions.