An effective forecasting model for short-term load plays a significant role in promoting the management efficiency of an electric power system. This paper proposes a new forecasting model based on the improved neural networks with random weights (INNRW). The key is to introduce a weighting technique to the inputs of the model and use a novel neural network to forecast the daily maximum load. Eight factors are selected as the inputs. A mutual information weighting algorithm is then used to allocate different weights to the inputs. The neural networks with random weights and kernels (KNNRW) is applied to approximate the nonlinear function between the selected inputs and the daily maximum load due to the fast learning speed and good generalization performance. In the application of the daily load in Dalian, the result of the proposed INNRW is compared with several previously developed forecasting models. The simulation experiment shows that the proposed model performs the best overall in short-term load forecasting.
Citation: Lang K, Zhang M, Yuan Y (2015) Improved Neural Networks with Random Weights for Short-Term Load Forecasting. PLoS ONE 10(12): e0143175. https://doi.org/10.1371/journal.pone.0143175
Editor: Jesus Malo, Universitat de Valencia, SPAIN
Received: October 20, 2014; Accepted: November 2, 2015; Published: December 2, 2015
Copyright: © 2015 Lang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: The load data of Dalian from January 2012 to December 2013 are available at: http://www.ln.sgcc.com.cn/html/dl/col132/column_132_1.html and future researchers can contact the Dalian Electricity Corporation representative QL Luo (firstname.lastname@example.org) with further inquiries related to data access. Data of daily average temperature of Dalian from January 2012 to December 2013 are available at: http://www.cma.gov.cn/2011qxfw/2011qsjgx.
Funding: MZ received funding from National Natural Science Foundation of China with Grant No. 51208081. http://isisn.nsfc.gov.cn/egrantweb/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
As with water supply, gas supply, communications, and transportation systems, the electric power system is a necessary component of urban lifeline engineering. Accurate load forecasting is increasingly important because it is critical for the planning, operation, and investment of power systems. Improving the accuracy of load forecasting increases power supply efficiency and reduces operating costs.
Load forecasting can be classified into long-term, mid-term, short-term and very short-term forecasting, based on the forecasting horizon. During the past decades, researchers have developed many different kinds of methods to improve load forecasting accuracy, especially in the field of short-term load forecasting [3–5]. Most of these methods have been restricted in practical applications due to the randomness and nonlinearity of the short-term load. In contrast, some intelligent forecasting algorithms, such as the artificial neural network (ANN) [6,7] and the support vector machine (SVM), have been widely used. Park et al. first used ANN to forecast short-term load. Lee et al. analyzed the influence of different ANN structures on forecasting results. Hippert et al. gave a review of ANN methods for short-term load forecasting and pointed out the overfitting problems of these methods. Taylor et al. took the weather into account when modeling with ANN methods. SVM also performs well in short-term load forecasting. Moreover, because SVM is based on the structural risk minimization framework, it can overcome overfitting problems effectively. However, the effectiveness of SVM depends on the selection of the kernel, the kernel's parameters, and the regularization parameter. Typically, each combination of parameters is checked using cross validation, and the best combination is often selected by grid search, whose computational cost grows exponentially with the number of parameters. The simulated annealing algorithm, the genetic algorithm and particle swarm optimization have been used by some researchers to select proper SVM parameters.
Recently, researchers from all over the world have been improving the ANN for different forecasting tasks and have obtained some satisfying results. Nevertheless, traditional ANNs are usually trained with gradient-based learning algorithms, which may lead to drawbacks such as slow convergence, local minima, and overfitting. To address these problems, this paper focuses on an improved machine learning algorithm based on neural networks with random weights (NNRW). There are three layers in NNRW: the input layer, the hidden layer, and the output layer. In the NNRW, the weights connecting the input layer to the hidden layer, as well as the bias values of the hidden layer, are randomly generated before the learning process. Only the weights connecting the hidden layer to the output layer are trained, by fast linear regression. Because of its rapid learning speed and good generalization performance, NNRW has been successfully used in the computational intelligence and machine learning communities, in fields such as electricity price forecasting, power loss analysis, lying and truth-telling classification, and attention-deficit/hyperactivity disorder (ADHD) classification. The structure of NNRW, i.e. the number of hidden nodes, is one of the important factors that affect its performance, and it is determined empirically by the user. Recently, neural networks with random weights and kernels (KNNRW) [19–22] have been proposed by replacing the hidden node mapping with a kernel mapping. With KNNRW, the number of hidden nodes does not need to be determined.
Based on the analysis above, this paper proposes a short-term load forecasting method based on KNNRW, which can combine the fast learning speed of NNRW and the good generalization performance of SVM. Eight relevant factors (e.g., the historical load data, the temperature data, and the holiday data) are first selected as the inputs of the forecasting model. It is known that the inputs are treated equally in KNNRW. However, different inputs may have different influences on the forecasting values. As a result, a mutual information weighting algorithm is then applied to allocate different weights to the inputs according to the corresponding influences. Finally, the resulting improved neural networks with random weights is used to approximate the nonlinear function between the selected inputs and the daily maximum load.
Neural Networks with Random Weights and Kernels
2.1 Basic Neural Networks with Random Weights
NNRW was proposed by Schmidt et al. Similar ideas have also been put forward by other researchers, such as Pao et al. and Huang et al. Pao et al. described such randomized learner models as the random vector functional-link (RVFL) net, and Huang et al. defined such machine learning models as the extreme learning machine (ELM). Further research on RVFL and ELM has produced some theoretical results [22,25,26]. A feedforward NNRW has a simple three-layer structure: an input layer, an output layer, and a hidden layer consisting of a large number of nonlinear processing nodes. Mathematically, NNRW can be expressed as follows: $o_k = w^T g(W_{in} x_k + b),\ k = 1, 2, \ldots, N$ (1) where $W_{in} \in \mathbb{R}^{L \times m}$ is the input weight matrix, $b \in \mathbb{R}^{L}$ is the bias value vector of the hidden layer, $w \in \mathbb{R}^{L}$ is the output weight vector, $g(\cdot)$ is the activation function ($g(\cdot)$ could be almost any nonlinear piecewise continuous activation function or any linear combination of these functions), $N$ is the number of samples, $L$ is the number of hidden layer nodes, $x_k \in \mathbb{R}^{m}$ is the input vector with $m$ features, and $o_k \in \mathbb{R}$ is the output value.
The output of the proposed forecasting model is the maximum load of the next day. Consequently, we use the single output form of NNRW in this paper.
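Requiring the network output to match the desired output $t_k$ for every sample gives $w^T g(W_{in} x_k + b) = t_k,\ k = 1, 2, \ldots, N$ (2). The matrix-vector formulation of (2) can be written as $Hw = t$ (3) where $H = [g(W_{in} x_1 + b), \ldots, g(W_{in} x_N + b)]^T \in \mathbb{R}^{N \times L}$ is the hidden layer output matrix of NNRW, and $t = [t_1, t_2, \ldots, t_N]^T$ is the desired output vector.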
In the NNRW model, $W_{in}$ and $b$ are generated randomly beforehand and remain fixed during training. $w$ is the only parameter that needs to be tuned through training. It can be calculated analytically as follows: $w = H^{\dagger} t$ (4) where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $H$.
The training of the NNRW model can be summarized as follows:
- Randomly generate the input weight Win and the hidden layer bias b;
- Calculate the hidden layer output matrix H;
- Calculate the output weight w by (4).
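The three training steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the tanh activation, the uniform range [-1, 1] for the random weights, the number of hidden nodes, and the toy data are all assumptions chosen for the example.

```python
import numpy as np

def train_nnrw(X, t, n_hidden=50, seed=0):
    """Train a single-output NNRW. X has shape (N, m), t has shape (N,)."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    # Step 1: draw the input weights and hidden biases once; they stay fixed.
    W_in = rng.uniform(-1.0, 1.0, size=(n_hidden, m))   # assumed range
    b = rng.uniform(-1.0, 1.0, size=n_hidden)           # assumed range
    # Step 2: hidden layer output matrix H, shape (N, L); tanh is an assumed g.
    H = np.tanh(X @ W_in.T + b)
    # Step 3: output weights from the Moore-Penrose pseudoinverse, w = pinv(H) t.
    w = np.linalg.pinv(H) @ t
    return W_in, b, w

def predict_nnrw(X, W_in, b, w):
    """Evaluate o_k = w^T g(W_in x_k + b) for every row of X."""
    return np.tanh(X @ W_in.T + b) @ w

# Toy regression problem (synthetic, for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
t = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
W_in, b, w = train_nnrw(X, t)
train_rmse = float(np.sqrt(np.mean((predict_nnrw(X, W_in, b, w) - t) ** 2)))
```

No iterative optimization appears anywhere in the sketch: the only fitted quantity is `w`, obtained from one linear least-squares solve.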
As can be seen from the above, the training process of NNRW is a simple linear regression, which effectively overcomes the limitations of traditional ANNs. Despite the success of NNRW, there is still room for improvement, such as the determination of the structure (i.e., the number of hidden layer nodes) and the ill-conditioned solution in the training process.
2.2 Neural Networks with Random Weights and Kernels
In order to overcome the aforementioned shortcomings of NNRW, neural networks with random weights and kernels (KNNRW) have been proposed by introducing the kernel function mapping of SVM as the hidden node mapping of NNRW [19,21].
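The optimization problem of NNRW can be written as: $\min_{w,\,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 \quad \text{s.t.} \quad h(x_i)\,w = t_i - \xi_i,\ i = 1, \ldots, N$ (5) where $\xi_i$ is the training error related to the $i$th training sample $x_i$, $C$ is the regularization coefficient, and $h(x_i)$ denotes the $i$th row of $H$. The corresponding dual optimization problem of (5) can be formulated as: $L_D = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 - \sum_{i=1}^{N}\alpha_i\left(h(x_i)\,w - t_i + \xi_i\right)$ (6) where $\alpha_i$ is the Lagrange multiplier with respect to the $i$th training sample $x_i$. The corresponding Karush-Kuhn-Tucker (KKT) conditions are as follows: $\frac{\partial L_D}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{N}\alpha_i h(x_i)^T = H^T\alpha$ (7) $\frac{\partial L_D}{\partial \xi_i} = 0 \Rightarrow \alpha_i = C\xi_i,\ i = 1, \ldots, N$ (8) $\frac{\partial L_D}{\partial \alpha_i} = 0 \Rightarrow h(x_i)\,w - t_i + \xi_i = 0,\ i = 1, \ldots, N$ (9)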
It can be seen from (12) that the specific form of $h(x)$ is not important as long as the products $HH^T$ (or $h(x)H^T$) are known. As a result, if the hidden node mapping $h(x)$ is unknown, we can define the kernel matrix of KNNRW as follows: $\Omega = HH^T,\quad \Omega_{ij} = h(x_i)\,h(x_j)^T = K(x_i, x_j)$ (13)
In the kernel implementation of NNRW, $h(x)$ can be unknown, while the corresponding kernel function $K(u, v)$ usually should be given (e.g., $K(u, v) = \exp(-\gamma\|u - v\|^2)$, where $\gamma$ is the kernel width). Hence, the number of hidden layer nodes no longer needs to be determined. Moreover, the KNNRW has the following universal approximation capability:
Theorem: Universal Approximation Capability: According to NNRW, a widespread type of the hidden node mapping $h(x)$ can be used in NNRW so that NNRW can approximate any continuous target function. In other words, given any target continuous function $g(x)$, there is a weight vector $w$ such that $\lim_{L \to \infty}\left\|\sum_{i=1}^{L} w_i h_i(x) - g(x)\right\| = 0$ (15)
With this universal approximation capability, KNNRW can use a wide range of feature mappings, such as Sigmoid, radial basis function (RBF), trigonometric, and polynomial mappings. The optimization objective functions of KNNRW are similar to those of the traditional SVM and least squares support vector machine (LS-SVM). However, KNNRW places no constraints on the Lagrange multipliers. As a result, KNNRW can obtain a better solution than SVM/LS-SVM. In addition, because KNNRW does not need the bias term that SVM requires, it is superior to the traditional SVM/LS-SVM algorithms in scalability and learning speed.
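The kernel form described in this section can be sketched as follows. The sketch assumes the standard closed-form solution of the kernel formulation in [21], in which the dual coefficients solve $(I/C + \Omega)\alpha = t$ and a new input $x$ is predicted by $[K(x, x_1), \ldots, K(x, x_N)]\,\alpha$. The function names, the toy data, and the values of `C` and `gamma` are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2) for every row pair of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def train_knnrw(X, t, C, gamma):
    """Solve (I/C + Omega) alpha = t with Omega_ij = K(x_i, x_j)."""
    omega = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + omega, t)

def predict_knnrw(X_new, X_train, alpha, gamma):
    """Prediction is the kernel row vector times the dual coefficients."""
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

# Toy data (synthetic); C and gamma are arbitrary illustrative values.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
t = np.sin(3.0 * X[:, 0]) + X[:, 1]
C, gamma = 1e3, 1.0
alpha = train_knnrw(X, t, C, gamma)
fitted = predict_knnrw(X, X, alpha, gamma)
```

Neither the hidden mapping $h(x)$ nor a hidden layer size appears in the code; only the kernel and the two hyperparameters do, which is the practical point of the kernel variant.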
Short-Term Load Forecasting Model Based on KNNRW
3.1 Inputs of KNNRW
In this section, the proposed INNRW was used to forecast the short-term load of Dalian, China. The output of the model was the daily maximum load. According to the analysis in the literature, the load data had weekly and monthly characteristics. Taking Dalian as an example, it can be seen from Figs 1 and 2 that the load remains stable on weekdays and drops markedly at weekends, that values in the same month show approximately the same tendency, and that values in every week follow a regular pattern. Therefore, we took both weekly and monthly characteristics as inputs. Additionally, the temperature has been verified to be an essential factor influencing the maximum load, and it showed an obvious correlation with the maximum load. Therefore, we selected the temperature as another input.
Fig 1. The blue line represents the maximum load from November 5th of 2012 to December 2nd of 2012. The red line represents the maximum load from November 4th of 2013 to December 1st of 2013.
Fig 2. The blue line represents the maximum load from March 1st of 2012 to July 31st of 2012. The red line represents the maximum load from March 1st of 2013 to July 31st of 2013.
Meanwhile, the holiday data also affected the maximum load, because the drop in industrial power consumption during holidays decreases the total power consumption. For example, there were 6 Chinese legal holiday periods in 2012: from 1st January to 3rd January, from 22nd January to 28th January, from 2nd April to 4th April, from 29th April to 1st May, from 22nd June to 24th June, and from 30th September to 7th October. In addition, it can be clearly seen from Fig 3 that the load data have an obvious holiday characteristic, that is, the load drops sharply during the holidays. Consequently, the binary encoded holiday data served as an input in this paper. As the maximum load was closely related to the historical maximum load, which can be verified by analyzing the load data as a time series, we selected the maximum load of the previous day and that of the same day in the previous week as inputs of KNNRW.
Fig 3. The blue line represents the maximum load of 2012.
Finally, the inputs selected for the INNRW were month of the year, day of the month, day of the week, week number, holiday indicator, daily average temperature, maximum electricity load of the day before, and maximum electricity load of the day last week.
3.2 Mutual Information Weighting Algorithm
In order to further improve the forecasting accuracy, the contributions of the inputs to the output of KNNRW were calculated and the weight values were allocated to the inputs accordingly. The mutual information (MI) is a measurement of the variables’ mutual dependence [27–30]. Accordingly, the high mutual information indicates the high dependence, and the low mutual information indicates the low dependence.
For two given discrete variables $X$ and $Y$ with joint probability distribution $P_{XY}(x, y)$, the mutual information between $X$ and $Y$, denoted $I(X;Y)$, can be formulated as $I(X;Y) = \sum_{x}\sum_{y} P_{XY}(x, y)\log\frac{P_{XY}(x, y)}{P_X(x)\,P_Y(y)}$ (16) where $P_X(x)$ and $P_Y(y)$ are the marginal probability distributions $P_X(x) = \sum_{y} P_{XY}(x, y),\quad P_Y(y) = \sum_{x} P_{XY}(x, y)$ (17)
In the case of continuous variables, (16) is replaced by $I(X;Y) = \iint P_{XY}(x, y)\log\frac{P_{XY}(x, y)}{P_X(x)\,P_Y(y)}\,dx\,dy$ (18) where $P_{XY}(x, y)$ is the joint probability density function of $X$ and $Y$, and $P_X(x)$ and $P_Y(y)$ are the marginal probability density functions of $X$ and $Y$, respectively.
For discrete feature variables, both the joint and marginal probabilities can be estimated by tallying the samples of the categorical variables in the data. For continuous feature variables, the following Parzen window method was used to approximate $I(X;Y)$.
Given $N$ samples of a vector variable $x$, the approximate density function has the following form: $\hat{p}(x) = \frac{1}{N}\sum_{i=1}^{N}\delta\left(x - x^{(i)}, h\right)$ (19) where $x^{(i)}$ is the $i$th sample, $h$ is the window width, and $\delta(\cdot)$ is the Parzen window function: $\delta(z, h) = \frac{\exp\left(-\frac{z^T\Sigma^{-1}z}{2h^2}\right)}{(2\pi)^{d/2}\,h^{d}\,|\Sigma|^{1/2}}$ (20) where $z = x - x^{(i)}$, $d$ is the dimension of the sample $x$, and $\Sigma$ is the covariance of $z$. When $d = 1$, (19) returns the estimated marginal density; when $d = 2$, (19) can be used to estimate the density of the bivariate $(x, y)$, $P_{XY}(x, y)$, which is the joint density of $x$ and $y$.
Hence, in this paper, we used the mutual information to determine the contribution of the inputs to the output of the INNRW. First, the mutual information MIi, i = 1,…, m of the inputs to the output were calculated. Then the weights can be allocated to the corresponding inputs according to the following equation (21) where μi was the weight allocated to the ith input. Then, the input of KNNRW can be expressed as . And the resulting forecasting model was denoted as the improved neural networks with random weights.
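A minimal sketch of the weighting idea in this section is given below, under two loudly stated assumptions. First, it estimates the mutual information with a simple histogram (discretized) estimator instead of the Parzen window method used in the paper. Second, it assumes the weights are the mutual information values normalized to sum to one, and that each input column is multiplied by its weight; the paper's own weight equation is not reproduced here. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X;Y) in nats (swapped in for Parzen windows)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    mask = p_xy > 0                          # skip empty cells to avoid log(0)
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

def mi_weights(X, t, bins=8):
    """One weight per input column, proportional to its MI with the output."""
    mi = np.array([mutual_information(X[:, i], t, bins) for i in range(X.shape[1])])
    return mi / mi.sum()                     # assumed normalization

# Synthetic example: the output depends only on the first input column.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
t = 2.0 * X[:, 0] + 0.1 * rng.normal(size=2000)
mu = mi_weights(X, t)
X_weighted = X * mu                          # assumed form of the weighted input
```

In this toy case the first column receives by far the largest weight, because the other two columns are independent of the output and contribute only the small positive bias of the histogram estimator.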
In order to verify the effectiveness, the proposed model was applied to forecast the actual maximum load. The electricity load data from January 1, 2012 to November 30, 2013 from the Dalian Electricity Corporation in China, the temperature, the holiday indicator and some other data were used to train the forecasting model. Daily maximum load data of 31 days in December of 2013 were used to test the performance of the forecasting model. The forecasting results were described using Mean Absolute Percentage Error (MAPE), Maximum Error (ME) and Forecasting Error (FE) as follows: (22) (23) (24) where stood for the actual values of the daily maximum load, stood for the forecasting values of the daily maximum load, and n stood for the number of days.
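The error measures can be illustrated with a short sketch. MAPE below follows its standard definition. The second function is only an assumption about what a maximum-error index may look like (the largest absolute percentage error over the test days) and should not be read as the paper's exact ME definition. The numbers are made up for the example.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)

def max_abs_pct_error(actual, forecast):
    """Largest absolute percentage error, in percent (assumed form of ME)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.max(np.abs((actual - forecast) / actual)) * 100.0)

# Made-up values: relative errors of 10 %, 5 % and 0 %.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
mape_value = mape(actual, forecast)
max_value = max_abs_pct_error(actual, forecast)
```

Both measures divide by the actual value, so they are only meaningful when the actual load is strictly positive, which holds for daily maximum load.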
4.1 Simulation Experiment
Firstly, data sets were normalized. The inputs were normalized to [–1, 1] and the outputs were normalized to [0, 1]. According to (18), the weights were calculated and allocated to the corresponding inputs.
Secondly, the INNRW model was initialized, in which the Gaussian kernel function was used in the hidden layer, and the regularization coefficient and kernel width were determined by the grid search.
Thirdly, the INNRW model was trained by the training samples.
Fourthly, the testing samples based on the trained INNRW were forecasted, and the forecasting results of the daily maximum load of 31 days in December of 2013 were obtained.
Eventually, the residual errors between predicted values and actual values were calculated.
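The normalization in the first step can be sketched as a min-max scaling. The sketch assumes that the minimum and maximum are taken from the training samples and then reused for the test samples, and that forecasts are mapped back to the original units before the errors are computed; the paper does not state these details. All names and numbers are illustrative.

```python
import numpy as np

def fit_minmax(train):
    """Column-wise minimum and maximum of the training data."""
    return train.min(axis=0), train.max(axis=0)

def scale(a, lo, hi, new_min, new_max):
    """Map values from [lo, hi] linearly onto [new_min, new_max]."""
    return new_min + (a - lo) * (new_max - new_min) / (hi - lo)

def unscale(a, lo, hi, new_min, new_max):
    """Inverse of scale: map values back to the original units."""
    return lo + (a - new_min) * (hi - lo) / (new_max - new_min)

# Made-up inputs (two features) and outputs.
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
y_train = np.array([100.0, 150.0, 200.0])
X_test = np.array([[2.5, 15.0]])

x_lo, x_hi = fit_minmax(X_train)
y_lo, y_hi = fit_minmax(y_train)
X_train_s = scale(X_train, x_lo, x_hi, -1.0, 1.0)   # inputs to [-1, 1]
X_test_s = scale(X_test, x_lo, x_hi, -1.0, 1.0)     # reuse training range
y_train_s = scale(y_train, y_lo, y_hi, 0.0, 1.0)    # outputs to [0, 1]
y_back = unscale(y_train_s, y_lo, y_hi, 0.0, 1.0)
```

A column whose training minimum equals its maximum would cause a division by zero here; a constant input of that kind would have to be dropped or handled separately.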
4.2 Experiment Results
Based on the analysis above, the Gaussian kernel function $K(u, v) = \exp(-\gamma\|u - v\|^2)$, where $\gamma$ was the kernel width, was chosen to be the kernel function in the INNRW model. Fig 4(A) illustrates the relations among MAPE, the kernel width and the regularization coefficient, while Fig 4(B) illustrates the relations among ME, the kernel width and the regularization coefficient. It can be seen from Fig 4 that both the kernel width and the regularization parameter are key parameters influencing the forecasting performance of the INNRW. The grid search method was used to optimize the two parameters. The optimal kernel width was 3.7276e+03, and the optimal regularization parameter was 1.3895.
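The grid search over the two parameters can be sketched as below. This is an illustration under assumptions, not the authors' procedure: it scores each pair on a hold-out validation split by MAPE, uses logarithmically spaced candidate grids, and fits the kernel model through the closed form $(I/C + \Omega)\alpha = t$. The data are synthetic and the grids are arbitrary.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2) for every row pair of A and B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_predict(X_tr, t_tr, X_va, C, gamma):
    """Fit the kernel model on the training split and predict the validation split."""
    omega = gaussian_kernel(X_tr, X_tr, gamma)
    alpha = np.linalg.solve(np.eye(len(X_tr)) / C + omega, t_tr)
    return gaussian_kernel(X_va, X_tr, gamma) @ alpha

def grid_search(X_tr, t_tr, X_va, t_va, Cs, gammas):
    """Return (best MAPE, best C, best gamma) over all candidate pairs."""
    best = (np.inf, None, None)
    for C in Cs:
        for gamma in gammas:
            pred = fit_predict(X_tr, t_tr, X_va, C, gamma)
            score = float(np.mean(np.abs((t_va - pred) / t_va)) * 100.0)
            if score < best[0]:
                best = (score, float(C), float(gamma))
    return best

# Synthetic data with a strictly positive target so that MAPE is well defined.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(120, 2))
t = 2.0 + 0.5 * np.sin(2.0 * X[:, 0]) + 0.3 * X[:, 1]
X_tr, t_tr, X_va, t_va = X[:90], t[:90], X[90:], t[90:]

Cs = np.logspace(-2, 4, 7)        # assumed candidate grid for C
gammas = np.logspace(-3, 2, 6)    # assumed candidate grid for gamma
best_mape, best_C, best_gamma = grid_search(X_tr, t_tr, X_va, t_va, Cs, gammas)
```

Every candidate pair requires one linear solve of size N, so the total cost is the product of the two grid sizes times the cost of a solve, which is why the search becomes time-consuming as the grids get finer.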
Fig 4. (A) Kernel width and regularization coefficient of MAPE. (B) Kernel width and regularization coefficient of ME.
In order to further illustrate the effectiveness of the proposed method, a comparison was conducted between the INNRW method and several state-of-the-art load forecasting methods, such as back propagation (BP) neural network, RBF neural network, support vector regression (SVR), NNRW, online sequential extreme learning machine (OS-ELM) and KNNRW. The forecasting results were shown in Table 2, Table 3 and Table 4.
As can be seen from Table 4, KNNRW and the proposed INNRW obtain much better forecasting results in both the MAPE and ME indexes than the other methods. Moreover, the INNRW outperforms KNNRW in both indexes, which demonstrates the effectiveness of the weighting algorithm. The forecasting results for December 2013 based on the INNRW are shown in Fig 5. It is clear that the predicted values approximately follow the tendency of the actual values. Consequently, the effectiveness of the proposed method was well verified.
Fig 5. The blue solid line represents the forecasting maximum load of December in 2013. The red dotted line represents the actual maximum load of December in 2013. The green line represents the forecasting error between the forecasting values and the actual values.
A forecasting model based on the INNRW was proposed for the short-term load forecasting. Through the data pre-processing, eight features, i.e. month of the year, day of the month, day of the week, week number, holiday indicator, daily average temperature, maximum electricity load of the day before, and maximum electricity load of the day last week, were selected as the inputs of the INNRW. Then, in order to further improve the forecasting accuracy, different weights were allocated to the inputs according to their mutual information with the forecasting load values. A novel neural network, KNNRW, which combined the universal approximation ability and the fast learning speed of NNRW and the good generalization performance of SVM, was used to model the nonlinear function between the selected inputs and the maximum load. Simulation experiment results based on the actual load data from Dalian, China, showed that the proposed method can obtain smaller predicted errors than the traditional forecasting methods in both MAPE and ME.
The kernel types and kernel parameters were crucial to the forecasting performance of the INNRW, and they were selected by the time-consuming grid search in this paper. Multiple kernel learning is a potential solution, since it can combine kernel functions of different types or with different parameters. As a result, the investigation of multiple kernel learning in the INNRW will be a subject of further research.
This work was supported by the National Natural Science Foundation of China with Grant No. 51208081.
Conceived and designed the experiments: KL. Performed the experiments: KL. Analyzed the data: KL. Contributed reagents/materials/analysis tools: KL MZ YY. Wrote the paper: KL. Reviewed the manuscript: MZ YY.
- 1. Espinoza M, Suykens JA, Belmans R, De Moor B. Electric load forecasting. Control Systems, IEEE 2007, 27, 43–57.
- 2. Ortiz-Arroyo D, Skov MK, Huynh Q. In Accurate electricity load forecasting with artificial neural networks, Computational Intelligence for Modelling, Control and Automation, 2005 and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, International Conference on, 2005; IEEE: pp 94–99.
- 3. Yan J, Li K, Bai E-W. In Prediction error adjusted gaussian process for short-term wind power forecasting, Intelligent Energy Systems (IWIES), 2013 IEEE International Workshop on, 2013; IEEE: pp 173–178.
- 4. Hippert HS, Pedreira CE, Souza RC. Neural networks for short-term load forecasting: A review and evaluation. Power Systems, IEEE Transactions on 2001, 16, 44–55.
- 5. Lee K, Cha Y, Park J. Short-term load forecasting using an artificial neural network. Power Systems, IEEE Transactions on 1992, 7, 124–132.
- 6. Yu X, Efe MO, Kaynak O. In A backpropagation learning framework for feedforward neural networks, IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, 2001; Citeseer: pp 700–702.
- 7. Yu X, Efe MO, Kaynak O. A general backpropagation algorithm for feedforward neural networks learning. Neural Networks, IEEE Transactions on 2002, 13, 251–254.
- 8. Park DC, El-Sharkawi M, Marks R, Atlas L, Damborg M. Electric load forecasting using an artificial neural network. Power Systems, IEEE Transactions on 1991, 6, 442–449.
- 9. Taylor JW, Buizza R. Neural network load forecasting with weather ensemble predictions. Power Systems, IEEE Transactions on 2002, 17, 626–632.
- 10. Chen B-J, Chang M-W, Lin C-J. Load forecasting using support vector machines: A study on eunite competition 2001. Power Systems, IEEE Transactions on 2004, 19, 1821–1830.
- 11. Pai P-F, Hong W-C. Support vector machines with simulated annealing algorithms in electricity load forecasting. Energy Conversion and Management 2005, 46, 2669–2688.
- 12. Pai P-F, Hong W-C. Forecasting regional electricity load based on recurrent support vector machines with genetic algorithms. Electric Power Systems Research 2005, 74, 417–425.
- 13. He Z, Hu Q, Zi Y, Zhang Z, Chen X. Hybrid intelligent forecasting model based on empirical mode decomposition, support vector regression and adaptive linear neural network. In Advances in natural computation, Springer: 2005; pp 324–327.
- 14. Schmidt WF, Kraaijveld M, Duin RP. In Feedforward neural networks with random weights, Pattern Recognition, 1992. Vol. II. Conference B: Pattern Recognition Methodology and Systems, Proceedings., 11th IAPR International Conference on, 1992; IEEE: pp 1–4.
- 15. Chen X, Dong ZY, Meng K, Xu Y, Wong KP, Ngan H. Electricity price forecasting with extreme learning machine and bootstrapping. Power Systems, IEEE Transactions on 2012, 27, 2055–2062.
- 16. Nizar A, Dong Z, Wang Y. Power utility nontechnical loss analysis with extreme learning machine method. Power Systems, IEEE Transactions on 2008, 23, 946–955.
- 17. Gao J, Wang Z, Yang Y, Zhang W, Tao C, Guan J, et al. A novel approach for lie detection based on f-score and extreme learning machine. PloS one 2013, 8, e64704. pmid:23755136
- 18. Peng X, Lin P, Zhang T, Wang J. Extreme learning machine-based classification of adhd using brain structural mri data. PloS one 2013, 8, e79476. pmid:24260229
- 19. Blum A. Random projection, margins, kernels, and feature-selection. In Subspace, latent structure and feature selection, Springer: 2006; pp 52–68.
- 20. Balcan M-F, Blum A, Vempala S. Kernels as features: On kernels, margins, and low-dimensional mappings. Machine Learning 2006, 65, 79–94.
- 21. Huang G-B, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 2012, 42, 513–529.
- 22. Huang G-B. An insight into extreme learning machines: Random neurons, random features and kernels. Cognitive Computation 2014, 1–15.
- 23. Pao Y-H, Takefuji Y. Functional-link net computing. IEEE Computer Journal 1992, 25, 76–79.
- 24. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
- 25. Pao Y-H, Park G-H, Sobajic DJ. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 1994, 6, 163–180.
- 26. Igelnik B, Pao Y-H. Stochastic choice of basis functions in adaptive function approximation and the functional-link net. Neural Networks, IEEE Transactions on 1995, 6, 1320–1329.
- 27. Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. Pattern Analysis and Machine Intelligence, IEEE Transactions on 2005, 27, 1226–1238.
- 28. Lian C, Zeng Z, Yao W, Tang H. Extreme learning machine for the displacement prediction of landslide under rainfall and reservoir level. Stochastic Environmental Research and Risk Assessment 2014, 28, 1957–1972.
- 29. Frenzel S, Pompe B. Partial mutual information for coupling analysis of multivariate time series. Physical review letters 2007, 99, 204101. pmid:18233144
- 30. Kraskov A, Stögbauer H, Grassberger P. Estimating mutual information. Physical review E 2004, 69, 066138.