Electricity forecasting on the individual household level enhanced based on activity patterns

Leveraging smart metering solutions to support energy efficiency on the individual household level poses novel research challenges in monitoring usage and providing accurate load forecasting. Forecasting electricity usage is an especially important component that can provide intelligence to smart meters. In this paper, we propose an enhanced approach for load forecasting at the household level. The impacts of residents’ daily activities and appliance usages on the power consumption of the entire household are incorporated to improve the accuracy of the forecasting model. The contributions of this paper are threefold: (1) we addressed short-term electricity load forecasting for 24 hours ahead, not on the aggregate but on the individual household level, which fits into the Residential Power Load Forecasting (RPLF) methods; (2) for the forecasting, we utilized a household specific dataset of behaviors that influence power consumption, which was derived using segmentation and sequence mining algorithms; and (3) an extensive load forecasting study using different forecasting algorithms enhanced by the household activity patterns was undertaken.


Introduction and problem statement
Throughout the EU, there has been considerable interest in smarter electricity networks, where increased control over electricity supply and consumption is going to be achieved thanks to investments and improvements in new technologies such as Advanced Metering Infrastructure (AMI). Smart metering is part of this movement, and it is perceived as a necessary step to achieving EU energy policy goals by the year 2020, that is, to cut greenhouse gas emissions by 20%, to improve energy efficiency by 20% and to ensure that 20% of EU energy demand is supplied by renewable energy sources.
Smart metering systems are a part of micro-grid which includes a variety of operational and energy measures including smart appliances, renewable energy resources and energy efficient resources. One of the most challenging problems associated with operation of micro-grids is the optimal energy management of residential buildings with respect to multiple and often conflicting objectives [1]. Recently, attention is paid to smart grid vision and smart homes that can optimize energy consumption and lower electricity bills. Developing a smart home energy PLOS  management system has become a common global priority to support the trend towards a more sustainable and reliable energy supply for smart grid as indicated in [2][3][4][5]. First, the new metering infrastructure is expected to ensure automated reading and billing based on actual usage. Second, by collecting high frequency consumption data, the system meets prerequisite for the implementation of cost reflective prices, varying based on the time of consumption. Third, these new metering systems are intended to contribute to reductions in the overall energy consumption by increasing the energy awareness of the users. One of the most important aspects of smart metering systems is to encourage users to use less electricity by being better informed about their consumption patterns. Forecasting usage provides customers with the possibility of linking current usage behaviors with future costs. Therefore, customers may benefit from forecasting solutions through greater understanding of their own energy consumption and their future projections, allowing them to better manage the costs of their usage. By making energy consumption and future projections more transparent, it would be easy to understand how much we are actually using and how it would affect our budget in the future. Of course, we should remember that technology alone will not be enough to change the way people consume energy, but it provides a method for using energy in a deliberate and conscious way. Therefore, we believe that our research fits into an attempt to generate value added for individual customers within the field of Residential Power Load Forecasting (RPLF) methods.
In this paper, we will study an approach to forecast the hourly electricity loads of individual consumers for 24 hours by taking into account historical electricity consumption and the household's behavioral data. In particular, based on smart metering data, we aim to provide answers to the following research questions: 1. Is it possible to provide accurate load forecasting for 24 hours on the individual household level and to what extent?
2. Are the clustering and sequence recognition algorithms good tools for identifying patterns of household behavior?
3. Do the usage pattern variables of the household enhance the forecasting accuracy of individual consumer loads? 4. What kind of forecasting methods and algorithms are appropriate to address high volatility data?
The structure of the paper is organized as follows: a short literature review on similar problems is provided in section 2. In section 3, the approach to detect household activity patterns based on the Almanac of Minutely Power Dataset (AMPds) [6] is shown. The household specific behavioral data influencing power consumption are derived using the segmentation and sequence mining algorithms. Then, in section 4, a number of numerical experiments aimed to provide accurate 24-hour forecasts on the household level are presented. The scalability of the approach based on WikiEnergy data [7] gathered for 46 households is shown in section 5. Finally, section 6 concludes the paper.

Literature review of similar problems
The field of electricity load forecasting is mature with numerous approaches that have been proposed throughout the years. They have usually focused on system demand level forecasting, which has a load reaching tens of megawatts or gigawatts. A general overview of short-term load forecasting can be found in [8][9], and some more classic surveys are provided in [10] and [11]. Different methods have been proposed for forecasting the electric load demand in the last decades. Some of the most popular include time series analyses with the autoregressive integrated moving average (ARIMA) method [12], fuzzy logic [13], the neuro-fuzzy method [14], artificial neural networks (ANNs) [15][16][17], and support vector machines (SVMs) [18][19].
Recently, with advances in communication infrastructure for remote and automated data reading, there has been increasing interest in RPLF. However, patterns of electricity use at a system demand level and at an individual level are very different. For instance, Fig 1A) shows the pattern of electricity use for a single random dwelling extracted from the WikiEnergy data [7]. The profile shows a peak in the morning at approximately 7-8 am and a second peak that is smaller than the peak in the morning between 4 pm and 6 pm. In contrast, Fig 1B) reflects a distinctly different pattern of electricity use for a group of 46 households on the same day of the year. The figure shows a smooth profile shape with relatively little electricity consumption in the early afternoon, a clearly defined peak in the morning and a slightly smaller defined peak in the evening.
Load forecasting on the individual household level is a challenging task due to the extreme system volatility as a result of dynamic processes composed of many individual components. Typical home loads are between 1 and 3 kWh and can be influenced by a number of factors, such as the operational characteristics of devices, the behaviors of the users, economic factors, time of the day, day of the week, holidays, weather conditions, geographic patterns and other random effects. Aggregation reduces the inherent variability in electricity consumption resulting in increasingly smooth load shapes, and as a result, the relative forecasting errors typically seen at the level of substations and power systems has been quite low in terms of MAPE (1% − 2%) [20], [15], [21]. The forecasting performance at the individual level shows much higher errors ranging from 20% to 100% (and even higher), and it depends on dwelling lifestyle and regularity of appliance usage [22][23][24][25].
Some interesting examples of electricity forecasting on the individual household level are indicated below.
Given sensor data collected from three residential homes, the authors of [26] aimed to determine which machine learning technique performed best at predicting whole building energy consumption for the next hour. The results showed that LS-SVM is the best technique for predicting each home's future electrical consumption. In general, the proposed methods achieved MAPE of 1.60% to 13.41% for a 700 kWh commercial building and between 15% and 32% for three homes with mean consumptions close to 1.5 kWh.
In [27], the authors investigated high-resolution data from 3 private households collected over 30 days. The conclusion was that advanced forecasting methods can yield better forecasts, even when carried out on aggregated household consumption data that could be obtained from smart meters. Based on disaggregated data from smart homes, sensors with persistence and smart meter benchmarks reveal substantial forecast improvements of 4% to 33% in terms of the mean absolute error.
In [28], various methods were utilized to forecast peak demand for individual homes. The authors concluded that a home's historic peak load and occupancy is a better predictor of peak load than temperature or season. They also showed that Seasonal Auto-Regressive Moving Average (SARMA) can be used to model both the intrinsic load pattern and consumer activity in a home and that it has 30% lower mean square error than regression-based techniques.
In [22], Kalman filter-based forecasting resulted in load forecast with MAPE of 30% for a sampling period and forecasting horizon equal to one hour. Shorter time intervals between receiving real-time measurement data from the customers' smart meter improved the accuracy of the proposed method and resulted in MAPE of nearly 13%.
In [29], based on power consumption measurements from 23 households collected across Japan, the authors proposed a support vector regression model and an activity sequence driven approach for inferring future activities and enhancing load forecasting. The activity sequence variable turned out to be an influencing factor that could improve the accuracy of load forecasting 15 minutes ahead for individual households, reaching 42% MAPE on average. Additionally, the study revealed that half of the households could not benefit from activity sequences to reduce their forecasting error.
In [30], the authors applied a number of forecasting methods including ARIMA, neural networks, and exponential smoothening using several strategies for training data selection (including day type and sliding window) with forecasting horizons ranging between 15 minutes and 24 hours. The evaluation was performed on two data sets; the first one was a single household in Germany, and the second one was for six households in the United States. The results indicate that forecasting accuracy varies significantly depending on the choice of forecasting method/strategy and the parameter configuration. In general, the MAPE ranged between 5% and greater than 100%, and the average MAPE for the first data set was approximately 30%, while it was approximately 85% for the other data set.
In [23], the authors presented an approach to forecast electricity loads on the individual household level using CART, SVM and MLP neural networks for 24 hour short-term load forecasts. The study concluded that a combination of historical usage data and household behavioral data could greatly enhance the forecasting of individual consumer loads. The obtained MAPE were 51% for the neural networks and 48% for the SVM.
Based on the literature review, there is a clear and increasingly recognizable research trend that looks at challenges associated with behavioral factors that impact the energy usages of individual appliances observed at the household level. The rationale is to provide feedback on usage patterns and derive significant underlying associations between several contextual factors including time of use and user activities. It is expected that the insights may increase awareness and understanding of home energy consumption and may be used as an additional variable that can enhance electricity forecasting.

Data characteristics
The analysis was prepared based on the collection of electricity consumption data from a single house. The dataset is known as the Almanac of Minutely Power dataset (AMPds) [6] and contains two years of recorded energy consumption data (at one minute intervals) using 21 submeters and covering the time span between April 1st, 2012, and March 31st, 2014. The monitored house was built in 1955 in the greater Vancouver region in British Columbia, and it underwent major renovations in 2005 and 2006, which resulted in it receiving a Canadian Government EnerGuide rating of 82%.
The list of the monitored appliances is presented in Table 1. For the purpose of the numerical experiments, the cut-off value for each appliance was proposed based on visual analysis.
The analysis was narrowed to the most energy-intensive household appliances. These were Clothes Dryer (Dryer), Clothes Washer (Wash), Dishwasher (Dish), Heat Pump (Heat), and Instant Hot Water Unit (Instant). The other appliances were not considered due to their insignificant activities (e.g., Basement Plugs and Lights), continuous activity (e.g., Entertainment: TV, PVR, AMP), or those not showing any repetitive patterns (e.g., Electronics Workbench).
The starting point for the usage pattern detection was to prepare a matrix with switching on probabilities for each of the individual devices over a specified time period. The probabilities  Table 2 presents the matrix with observed probabilities for each appliance turn ON event over the analyzed period of 2 years. The probabilities for each appliance are equal to 1. The highest probabilities (more than 0.07) for each appliance are shown in bold.
It can be noticed that the highest probabilities to use the cloth dryer are at midday (between 12 am and 2 pm) and in the evening (10 pm). The clothes washer is used frequently between 9 am and 2 pm. The use of the dishwasher usually occur in the evening between 6 pm and 10 pm. The heat pump operates more frequently in the morning at 7 am, while the instant hot water unit reaches its peak in the evening, between 5 pm and 8 pm.
In the same manner, a larger table (S1 Table) has been created, which consists of 24 rows (representing hours) and 35 columns (representing appliances over the seven days of the week). For each appliance, seven columns show the probabilities of the turn ON events on a specified day of the week. In this case, the probabilities over the whole week are equal to 1, for each appliance.

Activity segmentation
In the industry, segmentation has been used by large electricity suppliers to group customers together that share similar characteristics in terms of electricity usage. However, its use at the individual household level has been somewhat limited. Currently, thanks to smart meter data, there is an opportunity to capture and present individual patterns of energy consumption derived by the clustering algorithms. The discovered patterns of home appliance usage can be visualized to help the users understand their own energy consumptions, and these patterns can be used to feed the forecasting models, which is our motivation. We believe such an enhanced set of explanatory variables can significantly improve the accuracy of the forecasts generated at the household level. For this purpose, we propose hierarchical clustering and grade data analysis. Hierarchical cluster analysis is an algorithmic approach to find discrete groups with varying degrees of similarity in a dataset represented by a similarity matrix. These groups are hierarchically organized as the algorithms proceed and may be presented as a dendrogram.
One of the most popular agglomerative clustering algorithm is Ward's method [31]. Basically, it looks at cluster analysis as an analysis of variance problem, instead of using distance metrics or measures of association. It looks for groups of leaves, which it forms into branches. Then, the branches are formed into limbs and eventually into the trunk. Ward's method starts out with n clusters of size 1 and continues until all of the observations are included in one cluster.
The purpose of this analysis is to discover similar profiles or, in other words, appliances with similar switch ON probability distributions throughout the day or the week. As a result of grouping using Ward's method with the Euclidean distance measure, the following dendrogram for the data presented in Table 2 was obtained and presented in Fig 2. The height of each edge of the dendrogram is proportional to the distance between the joined groups. As shown in Fig 2, there are two groups that are distinctly separated from each other. From the visual analysis of the dendrogram, it can be observed that the switch ON probabilities of the clothes washer and heat pump are very similar (cluster marked in blue). A similar correlation in periods of joint work can be seen in the case of the dishwasher, clothes dryer and instant hot water unit (clusters marked in red).
Graphical representation of the data from S1 Table is shown in Fig 3. In the case of those data, it is difficult to draw clear conclusions about the dependencies of the co-occurrences of the appliances by day of the week. However, at the bottom, there is a cluster associated with the dishwasher activity on all of the week days (cluster marked in gray). Another group (marked in green) represents the washing machine over the entire week except Sunday. The usage profile of this device is most similar to the usage profile of the clothes dryer on the weekends (yellow). Finally, the profiles of the heat pump over the entire week are clustered in one group (on top of the chart, in purple). They are the least similar to the profiles of the other devices.
Grade data analysis is an efficient technique that works on variables measured on any measurement scales (including categorical) because it is based on dissimilarity measures such as concentration curves and some precisely defined measure of monotonic dependence. Its main framework is composed of grade transformation proposed by [32]. The idea is to transform any distribution of two variables into a convenient form of the so called grade distribution. This transformation leaves the orders of the variables, ranks, and values of monotone dependence measures (such as Spearman's ρ Ã and Kendall's τ) unchanged. In case of empirical data, this approach is focused on analyzing the two-way table with objects/variables, which is preceded by proper recoding of variable values.
The main tool of the grade data methods is Grade Correspondence Analysis (GCA), which refers to classical correspondence analysis and extends it by the mean of the grade transformation. Briefly, GCA orders the variables/objects in a table in a way such that neighboring objects are more similar than those that are further apart, and at the same time, neighboring variables are also more similar than those that are further apart. After the optimal ordering is found, it is possible to aggregate neighboring objects and neighboring variables and, therefore, to build segments with similar distributions.
The data structure presented in Table 2 has been analyzed using the GradeStat tool [33], which was developed at the Institute of Computer Science Polish Academy of Science.
The first step was to calculate over-representation ratios for each field (cell) of the table. A given m × k data matrix with non-negative values can be visualized using an over-representation map in the same way as a contingency table [34]. Instead of frequency n ij , the value of the j-th variable for the i-th object is used. Next, it is compared in a contingency table with the corresponding neutral or fair representation n i• × n •j /∑∑n ij where n i• /∑ j n ij , n •j /∑ i n ij . The ratio of the first and second expressions is called the over-representation ratio. An over-representation surface over a unit square is divided into m × k rectangles situated in m rows and k columns, and the area of the rectangle placed in row i and column j being equal to fair representation of normalized n ij . For instance, taking into account the use of the dishwasher at 8 pm, the ratio would be equal to 1.79153 because the probability of using the dishwasher in this hour is 0.11 and the row sum is 0.307 (for five appliances). Thus, we have 1.79153 = 0.11/(0.307/5). Using the over-representation ratios, the over-representation map for the initial raw data can be constructed. The color of each field in the map depends on the comparison of two values: (1) the real value of the measure connected to the considered field and corresponding to the population element; (2) the expected value of the measure. Fig 4 presents the initial over-representation map for the analyzed data. The colors of the cells in the map are grouped into three classes: 1. gray-the measure for the element is neutral (ranging between 0.99 and 1.01). which means that the real value of the measure is equal to its expected value; 2. black or dark gray-the measure for the element is over-represented (between 1.01 and 1.5 for weak over-representation and more than 1.5 for strong), which means that the real value of the measure is greater than the expected one; 3. light gray or white the measure for the element is under-represented (between 0.66 and 0.99 for weak under-representation and less than 0.66 for strong under-representation). which means that the real value of the measure is less than the expected one.
The next step was to apply the grade analysis to measure the dissimilarity between two data distributions in order to reveal the structural trends in the data. For this reason. Spearman's ρ Ã was used as the total diversity index. The value of ρ Ã strongly depends on the mutual order of the map's rows and columns. To calculate ρ Ã the concentration indices of differentiation between the distributions are used. The basic procedure of GCA is executed through permuting the rows and columns of a table in order to maximize the value of ρ Ã . After each sorting, the ρ The result of the GCA procedure is presented in Fig 5. Additionally, cluster analysis was performed through the aggregation of some columns into one column (and for the rows respectively). The optimal number of clusters is obtained when the changes of the subsequent ρ Ã values appear to be negligible as referenced in [35]. The resulting order presents the structure of the underlying trends in the data. The clusters show typical usage patterns of home appliances. Two clusters in the top left corner (marked A1 and A2) are similar in terms of the same usage hours; they represent the dishwasher and the clothes dryer. This group of appliances is strongly over-represented in the evening hours, whereas they are strongly under-represented in the morning. The clusters in the bottom right corner are related to the heat pump and the clothes washer. The usages of these appliances usually occurred in the morning or at midday (marked with B1 and B2). The profile of the water heating device (located in the center of the over-representation map) is not related to the usages of the other appliances.
Additional, separate analysis was performed using the data from S1 Table. This was to inspect the differences in household usage patterns between weekdays, please see Fig 6 for the results.
The analysis revealed that the usage profiles for the devices broken down by day of the week differ only slightly from those observed as a whole. Only some cases, e.g. clothes dryer on Wednesday (marked with red border), are outside their natural group. In general, the patterns of behavior are similar to those presented in Fig 5.

Activity sequence mining
Sequence mining is an exploration technique that focuses on discovering statistically relevant patterns in the form of a sequence for a given data set. The resulting rules (patterns) adopt the following form of conditional statements: if appliance A was used, then appliance B will be used next.
In the home energy research field, sequence mining can be applied to capture the use of appliances in sequence. This kind of analysis gives insight that can help understand how Electricity forecasting on the individual household level enhanced based on activity patterns power consumption is influenced by certain activities and their sequences and how those activities are related to each other.
Within the collected data, it is possible to track sequences of different activities that a household performs throughout a day. To understand the daily activity sequences of a household, a sequence is given by all of the activities performed by the household during the day, ordered by the start time of the activity.
To discover only interesting and valuable rules from the dataset, three basic measures were used for their evaluation [36]: (1) support is defined as the proportion of days containing a specific itemset of the appliances to all of the days; (2) confidence is defined as the proportion of the observed support of the specific rule to the support of the left side item (corresponds to the conditional probability denoting if the left side occurred then also with some probability the right side of rule will occur); (3) lift is defined as the proportion of the observed support of the rule to the product of the supports of both sides of the rule, and it shows, in business terms, how many times more likely the appearance of appliance B with appliance A is than that with any other randomly chosen appliance. These measures are calculated according to following formulas: The goal of sequence rules is to find all of the strong rules, such that support level and confidence level are greater than a minimum threshold value. By considering only the rules with support greater than 1%, a minimum difference between the appliances' switch on events of at least 1 hour, and a maximum difference between consecutive elements in the sequence, the usage patterns presented in Table 3 were discovered.
All of the observed sequential rules have lift greater than one, which means that the occurrence of the elements on the left side of the rules influences the occurrence of the elements contained on the right side of the sequential rule. In particular, the following behavior patterns were observed: • with support equal to 4% and a confidence of 80%, if in a given hour the instant unit & clothes washer & dishwasher have been used, then in the next hours, the clothes washer & dishwasher will also have been used; • with support equal to 3% and a confidence of 56%, if in a given hour the instant unit & clothes washer have been used, then in the next hours, we could expect the dryer & dishwasher to be switched on, and they could operate for the next hour; • with support equal to 3% and a confidence of 67%, if in a given hour the heat pump & clothes washer have been used, then in the next hours, the instant unit & heat pump & clothes washer will also have been used and followed by the heat pump & dryer; • with support equal to 3% and a confidence of 64%, if in a given hour the instant unit & heat pump & clothes washer have been used, then in the next hours, the same set of appliances will also have been used and followed by the heat pump & dryer; • finally, with support equal to 5% and a confidence of 47%, if in a given hour the instant unit was switched on, then in the next hours, the heat pump & clothes washer will also have been used and followed by the instant unit & heat pump & dryer.

Accuracy measures
To assess the model performance for forecasting, three measures were used. These were precision, resistant mean absolute percentage error and accuracy [37]. Precision is defined as the measure of how close the model is able to forecast the actual load. To measure precision, the mean squared error (MSE) was used: where W hi is the observed load in hour i and P hi is the forecasted load in hour i. Mean Absolute Percentage Error (MAPE) was the second measure that was used, This measure satisfies the criteria of reliability. ease of interpretation and clarity of presentation. However, it does not meet the validity criterion because the distribution of the absolute percentage Electricity forecasting on the individual household level enhanced based on activity patterns errors is usually skewed to the right due to the presence of outlier values. In these cases, MAPE can be highly over-influenced by some very bad instances and can disrupt quite good forecasts. Therefore, we propose an alternative measure, called resistant MAPE or r-MAPE. based on the calculation of the Huber M-estimator, which helps to overcome the aforementioned limitation [38]. An M-estimator for the location parameter μ using the maximum likelihood (ML)-estimator is defined as a solution θ to or where φ = ρ 0 .σ is the scale parameter. For a given positive constant k, the Huber [39] estimator is defined by the following function in φ (3) where k is a tuning constant determining the degree of robustness and set at 1.5. The function is known as metric Winsorizing and brings in extreme observations to μ Ç k. In practice, σ is not known, thus, a MAD robust estimator was used: Finally, an accuracy measure was used, which identifies how many correct forecasts the model makes, where the term correctness is defined by the user. This can be done by defining correct forecasts as values within a percentage range of the actual load. However. for low loads, a percentage range may become insignificant. For a load of 0.1 kWh, a 15% range would be 0.085-0.115, and a forecast of 0.2 kWh will be considered wrong, but in practice, such a forecast would be acceptable. To address this false loss of accuracy, we set two scales to measure the accuracy. In this study, we set a 15% range of error for accuracy, but if the load was smaller than 1 kWh, then we considered the range of ±0.15 kWh as the range of acceptable forecasts. Therefore, accuracy for hour i was given as:

Predictors
In this research, we focused on forecasting the electricity usage of a particular household for 24 hours ahead. To forecast the load, we constructed a feature vector with attributes as presented in Table 4. The attributes were constructed based on time series with hourly electricity demand. Additionally, other variables were collected, including temperature, humidity, and date. Electricity demand varies over time depending on the time of day (daily cycles), day of the week (weekly cycles), day of the month (monthly cycles), season (seasonal cycles) and occurrence of holidays. Therefore, we enriched the analysis with an additional 76 dummy variables that described the hour (1-24), 31 variables associated with the day of the month, 7 variables associated with the day of the week, 12 variables associated with the month, one variable indicating a holiday and one variable indicating the sunset in a particular hour.
The main variables taken into account in the forecasting process are those derived directly from the time series. The features were created by the decomposition of the time series, and they define, among others, minimum, maximum and actual demand at certain intervals (up to 35 days back).
In addition to the attributes that describe the historical electricity consumption, a set of behavioral features describing the habits of the household was prepared. Those are associated with the use of certain electrical appliances as presented in Table 5. The presented set of attributes describes the household behavior patterns that were discovered during the earlier stages of the research with the method of segmentation and sequence analysis.
The feature vector in Table 5 includes a number of switch on states (activations) collected for individual appliances: Clothes Dryer (Dryer), Clothes Washer (Wash), Dishwasher (Dish), Heat Pump (Heat), and Instant Hot Water Unit (Instant). The data report the appliances' activations taking into account previous hours, previous days, previous weeks and the difference (in hours) between the most recent five successive activations (for each appliance). The features presented in Table 5 are the outcome of segmentation and sequence analysis, and they describe, as widely as possible, the existing dependencies in the data. In particular, they revealed the following relations: • The structures of the devices' profiles over days, weeks and months, which were discovered by analyzing appliances' switch on (activations) events in each hour (see Table 2 and S1 Table), and the outcome of the analysis of the contributions from variables no. 147 to 196, which are associated with the characteristics of a single device.
• The sequential patterns and the time periods between successive activations are reflected in variables no. 197 to 221 as a result of the sequence mining approach (see Paragraph 3.3).

An approach to forecasting
Building predictive models is a task that requires dealing with both huge data volumes and complex algorithms. Therefore, it becomes necessary to have an efficient computing environment with high-performance computers. In our case, all the numerical calculations were performed on computing clusters located at the Interdisciplinary Center for Mathematical and Computational Modelling at the University of Warsaw. The HYDRA engine with Scientific Linux 6 operating system was used with the following nodes and their parameters: • Istanbul-AMD Opteron(tm) 2435 2.6 GHz, 2 CPU x 6 cores, 32 GB RAM.
R-CRAN was used as the computing environment, which is an advanced statistical package, as well as an interpreted programming language that exists on Windows, Unix and MacOS platforms. It is licensed under the GNU GPL and based on the S language.
The starting point for the numerical experiments was the division of the AMPDs dataset into three parts, which corresponded to the training, validation and testing samples with the following proportions.
The main criterion taken into account while learning the models is to gain good generalization of knowledge with the least error. The most commonly used measure to assess the quality of forecasts in the electric power system is the MAPE. Therefore, to find the best parameters for all models and to assure their generalization, the following function was minimized: where MAPE U and MAPE W stand for the training and validation errors, respectively.
In the experiments, the broad set of the machine learning algorithms was tested including artificial neural networks, regression trees, random forest regression, k-nearest neighbors regression and support vector regression. In the following discussion, the algorithms are briefly introduced along with their settings.
Artificial neural network. An artificial neural network (ANN) is a network that is often used to estimate or approximate functions that can depend on a large number of inputs. In contrast to other machine learning algorithms, they required special preparation of the input data. The vector of continuous variables has been standardized, while the binary variables were converted such that the value of 0 was transformed into -1. Finally, the dependent variable was normalized by zero unitarization. To propose the forecast, the reverse transformation of the dependent variable was applied.
To train the neural networks, we used the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm, which belongs to the broad family of quasi-Newton optimization methods (available in the nnet library). The network had an input layer with 146 and 221 variables, depending on whether the additional variables with usage patterns were considered in the model. In the hidden layer, a different number of neurons was tested, starting from 10 to 50 by 5 (nine sets).
A logistic function was used to activate all of the neurons in the network, and a regularization factor was introduced to penalize weights that were too high in the network (to control overfitting). Three different values of the factor were considered in the experiments: 0.01, 0.1 and 0.5.
Each time, 27 neural networks were learned with various parameters (the number of neurons in the hidden layer multiplied by the number of penalties). To avoid overfitting, after the completion of each learning iteration (with a maximum of 50 iterations), the models were checked for the error measure defined in Eq (11).
At the end, out of 27 learned network, the one characterized by the smallest error defined in formula (11) was chosen as the best for delivering forecasts.
K-Nearest Neighbors regression. The K-Nearest Neighbors algorithm is a non-parametric method used for regression. The input consists of the k-closest training examples in the feature space, and the property of an object is obtained via a similar averaging process, where the value of the object is the average value of the k closest training points.
To improve the algorithm, we used the normalization of the explanatory variables (standardization for quantitative variables and replacement of 0 into -1 for the binary variables). The normalization assures that all dimensions for which the Euclidean distance is calculated have the same importance. Otherwise, it could lead to a situation in which a single dimension would dominate other dimensions.
Regression trees. To train regression trees, an rpart package that implements the CART algorithm was used. The criterion that was minimized in the process of dividing a multidimensional space was the dispersion (variance) around the mean value of the dependent variable for observations belonging to the same node (leaf). At each stage of the splitting node, the variable and its specific value that minimized the sum of squared was chosen. The minimum number of observations in the node was set to 20, and the leaf was set to at least 6 observations, otherwise the node was subject to splitting.
Instead of pruning the tree at the end of the algorithm, we used pruning during the growth of the tree. Generally, this approach will stop the process of creating new splits in case the previous splits provided only a slight increase in predictive accuracy. The complexity parameter (cp) was set from 0 to the value of 0.1 (with an increment value of 0.001), meaning that if any split does not increase the model's overall coefficient of determination by at least cp, then the split is decreed to be, a priori, not worth pursuing. The tree was built up to a depth of 30 levels.
Out of 1.000 regression trees that were tested, the final structure was chosen, based on the error measure defined according to formula (11).

Random regression forests.
To train the random regression forest, an algorithm from the randomForest library was used.
Each time prior to the training, the n-element samples with replacement were drawn, and they accounted for approximately 63% of the population. The samples were used to construct the CART tree. Each tree has been built to its maximum size (without pruning), preventing the occurrence of 5 or fewer observations in a leaf.
A randomized subset of variables was used to construct each tree. The number of variables used in the models varied from 10 to 142 for the base model (without usage pattern variables) and from 10 to 217 for the enhanced model with variables describing the appliances' usage patterns.
The total number of trees in the forest was 500. The final forecast was defined based on Huber's robust estimator, as defined in [39].
Finally, as in previous cases, the best forest structure was chosen, based on the error measure defined in (11). Support vector regression. To construct the support vector regression, the ε-SVR version was used from kernlab library with its SMO algorithm-Sequential Minimal Optimization to solve the quadratic programming problem. The linear function was used as a kernel function. The value of the ε-parameter, which defines the margin width for which the value of the error function is zero, was arbitrarily taken from the following set {0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1}.
The regularized parameter that controls the overfitting, C, was arbitrarily set, and the simulations were run for the following values {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1}. Finally, as in all previous cases, based on the models' results, the model that minimized function (11) was chosen.
The proposed machine learning algorithms were challenged against some typical approaches used for forecasting (benchmarks). Those were naive forecast, random forecast, the ARIMA model and stepwise regression.
Naive forecast. The naive forecast was constructed in the following manner: for the forecasting horizon of 24 hours, the value recorded on a previous day and at the respective hour was taken as the forecast.
Random forecast. The random forecast was constructed in the following way. Given the electricity consumption in a given hour with respect to the day of the week, empirical distribution functions were computed. Then, using a runif function, a value from a uniform distribution between 0 and 1 was drawn (p probability). This value was then used to estimate the quantile of empirical distribution (the final value of the forecast) by the weighted averaging of order statistics z g (quantile function): where γ = np + m − g, n is number of observations, g = floor(np + m) and m = 1 − p. ARIMA model. The third method used to compare was the forecast developed with the ARIMA(p,d,q) model. The model was estimated using the auto.arima function implemented in the forecast library. The function identifies and estimates the model by minimizing the Akaike information criterion.
To estimate the model, the maximum values for the AR and MA order were arbitrary set to p = 14 and q = 14. The degree of differencing was tested with the KPSS test (Kwiatkowski-Phillips-Schmidt-Shin), which was used for testing a null hypothesis that an observable time series is stationary around a deterministic trend Stepwise regression. The last method was stepwise linear regression as an automated tool used to identify a useful subset of predictors. Two procedures were tested: the one that adds the most significant variable (forward selection) and the second one that removes the least significant variable during each step (backward elimination). The forward approach stops when all variables not in the model have p-values that are greater than the specified alpha-to-enter value (5% significance level was used). The backward stops when all variables in the model have p-values that are less than or equal to the specified alpha-to-remove value (5% significance level was used).

The results of numerical experiments
Taking into account the clarity of the presented results, the following notations were introduced for the models: Z 24 -naive forecast, Fg−random forecast, ARIMA-ARIMA model, L_fstepwise regression with forward selection, L_b-stepwise regression with backward elimination, KNN-k-nearest neighbors regression, RPART-regression trees, RF-random regression forests, NNET-artificial neural network, and SVR-support vector regression.
The results of the models for the 24 hour forecasting horizon are presented in Table 6. For the testing sample, the MAPE varied from 56.27% for the random forecast (F g ) to 26.78% for support vector regression (SVR). In terms of the r-MAPE, the highest error were observed for random forecast (F g )-42.66%, while the lowest error was observed for support vector regression (SVR)-25.26%, as previously observed. A small difference between the MAPE and the r-MAPE is observed, suggesting that the models do not exhibit very large errors in terms of both underestimation and overestimation.
The precision of how close the model is able to forecast to the actual load (MSE) revealed that artificial neural networks (NNET) are able to forecast with the least error-0.21. In contrast, the random forecast (F g ) had an MSE of 0.63.
In the case of the accuracy (Acc), which measures how many correct forecasts the model makes, it was observed that the highest accuracy was achieved with support vector regression (SVR)-47.02%. Surprisingly, the second best result was observed for the naive forecast (Z 24 )-45.83%. The least precise forecast was obtained with the ARIMA model-an accuracy of only 20.83%.
In general, the proposed machine learning methods (except regression trees-RPART) in comparison to benchmarking methods had smaller MSE, MAPE, and r-MAPE and higher accuracy rates.
The same techniques were applied to the enhanced dataset including the behavioral variables. The results are summarized in Table 7.  Electricity forecasting on the individual household level enhanced based on activity patterns

MAPE (%) r_MAPE (%) Acc (%) MSE MAPE (%) r_MAPE (%) Acc (%) MSE MAPE (%) r_MAPE (%) Acc (%) MSE
To identify situations in which additional explanatory (behavioral) variables describing electricity usage patterns improved the final forecast, we introduced a percentage point sensitivity range, denoted with the following colors: • green shows forecast improvements in terms of Acc and MAPE/r-MAPE when building the model using the enhanced features dataset (with usage pattern variables), e.g., for the model that has 20% error, the improvement should be at least 0.5 p.p. so the model's error should be less than 19.5%; • red shows forecast worsening in terms of Acc and MAPE/r-MAPE when building the model using the enhanced features dataset (with usage pattern variables), e.g., for the model that has 20% error, the error increase should be at least 0.5 p.p. so the model's error should be greater than 20.5%; • no color shows neutral cases in which Acc and MAPE/r-MAPE stayed at similar levels, e.g., for the model that has 20% error, we define 1 p.p. range (19.5%-20.5%) to say that no improvement is observed when building the model using enhanced features dataset (with usage pattern variables).
In comparison to the modeling without behavioral pattern variables, the results associated with the testing dataset are better for two machine learning algorithms, i.e., the random forests and artificial neural networks, and for the stepwise regression model. In the case of the artificial neural network model, the improvement was substantial: the MAPE decreased by 6.66 p.p. The two techniques with the most accurate forecasts are presented in graphical form in Fig  7. For this reason, the real value and the forecasts provided by the neural networks and random forests were presented for two randomly selected test days-May 7th and 8th. 2013.
The following notations were used: NNET_1 and RF_1 indicate forecasts prepared using the dataset with past electricity consumption variables only, NNET_2 and RF_2 indicate forecasts build on the enhanced dataset including the household behavioral data.  For the neural networks (NNET_1) and the random forest regression (RF_1), which were trained on the dataset with a limited number of variables, it was observed that the forecasts were higher than the actual load value. Adding behavioral variables resulted in decreases in the error. However, overestimation is eliminated, but at the same time, there are some problems with underestimation. In particular, this is observed when relatively large energy loads occur.

MAPE (%) r_MAPE (%) Acc (%) MSE MAPE (%) r_MAPE (%) Acc (%) MSE MAPE (%) r_MAPE (%) Acc (%) MSE
In general, from the figure we can observe how the forecasts follow the real load curve. The trend is followed well enough, but as expected, due to household behavior and other immeasurable influences, there are some deviations when comparing the forecasts and the real load.

Scalability of the approach
To draw meaningful conclusions and to provide rationale for the proposed approach, additional experiments were undertaken. For this purpose, the numerical analyses were performed for the group of 46 households based on the WikiEnergy data [7]. This enabled us to formulate generalized conclusions about the modeling techniques and household specific behavioral data in terms of the applicability of both for the benefit of accurate forecasting modeling.
The WikiEnergy dataset by Pecan Street Inc. is a large database of consumer energy information. This database is highly granular, including the usage measurements collected from up to 24 circuits within the home. The investigated households were located in Austin, Texas, USA. From these data, we have extracted 14 months of data from 46 households at a granularity of 1 hour, covering the same time window from March 2013 until April 2014.
The dataset was split into three parts, which corresponded to the training, validation and testing samples with the following proportion. The same feature vectors describing basic electricity consumption and an enhanced dataset with usage patterns were prepared, and the same techniques were trained, as previously for AMPDs data.
The aggregated results for all 46 households are presented in Table 8. This is to check whether the enhanced dataset, including the usage patterns, improved the forecasts in terms of the MAPE observed on the test dataset for each of the modeling methods. The main finding is that the neural networks forecasts prepared on the enhanced dataset (including usage patterns) were more accurate than those prepared using the basic data set with only historical usage variables.
The neural network outperformed the other methods in terms of MAPE. The forecast proposed for 82.61% of the households (that is 38 households out of 46) had the lowest error, RPART was the second most accurate forecasting method, which delivered accurate forecasts for 39.13% of the households (that is 18 households out of 46).
The MAPE distributions were also analyzed as presented in Fig 8. The forecasts prepared based on the dataset with past electricity consumption variables only are shown with the solid line (e.g., NNET_1, SVR_1). The forecasts build on the enhanced datasets including household behavioral data are presented with dotted lines (e.g., NNET_2 and SVR_2). In general, it could be observed that the machine learning models exhibit smaller errors than the benchmarking methods.
Finally, to provide a quantitative summary of the experiments, we tested the following: 1. Whether there are statistically significant differences in terms of the errors between the forecasting algorithms, including the machine learning algorithms and the benchmarking methods-please refer to Table 9 for details. The Duncan multiple range test (DMRT) was used to compare differences among the error means between the modeling techniques and across datasets (basic consumption variables vs. enhanced dataset). Differences were found to be significant at P < 0.05.
2. Whether there are statistically significant differences in terms of the MAPE for the neural networks (as the most accurate forecasting technique) with respect to the data, i.e., the basic dataset vs. the enhanced dataset with household activity patterns. Please refer to Table 10 for details. The Kolmogorov-Smirnov (K-S) test was used to compare two empirical error distributions with respect to the data sets. The differences were found significant at P < 0.05. The results presented in Table 9 indicate that there are significant differences among the forecasting results from the machine learning techniques (capital letter D), namely, SVR, NNET, RF, RPART and the benchmarking methods (capital letter A), namely, Z 24 and Fg, observed when using the basic consumption dataset. No differences are observed between the results provided by L_f, L_b, KNN, RPART, RF and NNET (capital letter C). For the enhanced Electricity forecasting on the individual household level enhanced based on activity patterns dataset, the differences are less diverse, e.g., the NNET results are significantly different from Z 24 , Fg and the ARIMA results and, at the same time, not significantly different from those obtained with RF, RPART, KNN, L_B, and L_f (capital letter D). The analysis to determine the significance of the forecasting results between the basic dataset vs. the enhanced dataset for the neural networks (as the most accurate forecasting technique) revealed that a significant difference is observed for 50% of the households, although an improvement of at least 1 p.p. in terms of the MAPE was observed for 82.61% of the households (in green), as presented in Table 10. In three cases, the forecasts prepared using the enhanced dataset resulted in less accurate forecasts (in red); however, the differences were not significant. The richer dataset helped to reduce the MAPE by 8.5% on average, that is, from 45.5% to 41.7%, calculated over the whole population of 46 households.

Conclusions
In this paper, we presented an extensive analysis aimed at forecasting electricity loads on the individual household level, which potentially brings greater intelligence to smart meters and delivers value added for individual customers. The experiments were designed to find answers to research questions concerning the forecasting loads for individual customers. In particular, the findings are as follows: 1. Based on the results, we can conclude that it is possible to provide accurate load forecasting for 24 hours ahead on the individual household level, and this can be obtained with reasonable prediction accuracy. The forecasts of the neural network models show that they have good performance characterized by low errors obtained on both datasets, i.e., a basic one with past usage data and a richer dataset with usage patterns. Based on the AMPDs data, the richer dataset helped to reduce the MAPE from 30.28% to 23.62% and, on average, from 45.5% to 41.7% for the WikiEnergy data.
2. The clustering and sequence recognition algorithms are good tools for identifying patterns of household behavior. They allowed quickly grasping general trends in data and then clustering appliances based on their typical usage hours. The data obtained by grade analysis might be the basis for the decision support for individual customers to consider the price elasticity tariffs and for accurate forecasting in smart metering applications. Sequence analysis gave insight that can help understand how power consumption is influenced by certain activities and their sequences and how those activities are related to each other.
3. We showed through experiments that a combination of historical usage data and household behavioral data can greatly enhance the forecasting of individual consumer loads.
The richer data set can reduce MAPE by 8.5% on average and up to 41% for individual households (e.g., household #5 from the WikiEnergy dataset) as observed on the test set for the neural network model.

4.
The results indicate that there are significant differences in forecasting in favor of the machine learning techniques, namely, SVR, NNET, RF, and RPART, in comparison to random forecasts or ARIMA models. In particular, artificial neural networks through their hidden layers and ability to approximate complex nonlinear mappings directly from the input samples seem to be very effective at solving short-term forecasting when dealing with high volatility data. They are able to identify hidden trends and make use of richer data, thereby finding the trends in household consumption data.
As a future work, we will explore algorithmic approaches for mining typical usage patterns and utilize them for the purpose of energy consumption forecasting and the development of unique, individualized energy management strategies. Additionally, considerable interest and high expectations worldwide are associated with attempts to combine research on forecasting systems utilizing non-intrusive appliance recognition and user patterns with multi agent systems [40][41][42]. Such a multi-agent computer system can be used for managing the unbalanced energy in a microgrid, and the main goal of the system would be to control and minimize the differences between the current energy demand and the actual energy supply.
Supporting information S1 Table. The matrix with the probabilities of appliance turn ON events in each hour over whole week. (XLSX)