Stochastic and Statistical Analysis of Utility Revenues and Weather Data Analysis for Consumer Demand Estimation in Smart Grids

In the smart grid paradigm, consumer demands are random and time-dependent, and are therefore best described stochastically. Stochastically varying consumer demands put policy makers and supplying agencies in a difficult position for optimal generation management. Utility revenue functions depend strongly on stochastic consumer demand models. Sudden drifts in weather parameters affect the living standards of consumers, which in turn influence power demand. Considering the above, we analyze stochastically and statistically the effect of random consumer demands on the fixed and variable revenues of electrical utilities. Our work presents a Multi-Variate Gaussian Distribution Function (MVGDF) probabilistic model of utility revenues under time-dependent random consumer demands. The Gaussian probability estimates of the utility revenues are based on the data patterns of n consumer demand samples. Furthermore, Standard Monte Carlo (SMC) simulations are performed to validate the accuracy of the proposed probabilistic demand-revenue model. We also analyze the effect of weather parameters on consumer demands using correlation and multi-linear regression schemes. The statistical analysis of consumer demands provides a relationship between the dependent variable (demand) and the independent variables (weather data) for utility load management, generation control, and network expansion.


Introduction
For evaluating the probabilities of various events in Smart Grids (SGs), statistical analysis plays a fundamental role in modeling stochastic processes [1]. Every time-varying event in a smart grid, such as consumer demand or utility revenue, is stochastic. Probabilistic models of utility revenues can estimate present and future outcomes from random consumer demand. Random distributions, such as the Gaussian Distribution Function (GDF), can predict utility outcomes from various samples of random data. The GDF can help policy makers re-shape and modify current and future utility expansion plans for modeling large-scale distribution [1].
Utility costs, such as running costs, maintenance costs, operational costs, and crew wages and salaries, limit the net utility revenues (fixed and variable) [2]. Fluctuating load curves, seasonal variations, weather drifts, and consumer living standards result in stochastic energy demands [3], [4], [5]. Energy consumption in each interconnected area of the SG is time-variant. Flattening the utility demand curve across the off-peak, mid-peak, and peak periods remains a major issue for electrical utilities [6], [7]. In this scenario, probabilistic analysis of the demand-revenue model is a challenging task for optimizing utility outcomes.
Stochastic load growth in the smart grid results in various issues, such as demand-supply mismatch, voltage instability, transmission-line losses, and blackouts [8], [9]. These issues affect the steady-state performance, stability, and control of the SG. In a deregulated energy market, the revenue of the energy supplying agencies (utilities) depends strongly on consumer participation in energy demand-response programs [10]. These issues also degrade the reliability and quality of energy service, which results in lower customer participation and less revenue [11]. Without past, present, and future probabilistic demand information, utilities cannot manage demand and supply intelligently or maximize profits. Climate drifts affect consumer lifestyles and living standards, and this variation indirectly affects consumer demand [12], [13], [14]. Therefore, a model relating load to weather parameters is needed to capture this dependency [15].
In light of the above, there is a pressing need for stochastic smart grid models and statistical analysis of consumer demands. We present an in-depth analysis of the Multi-Variate Gaussian Distribution Function (MVGDF) model for utility revenues with stochastic consumer demands. Consumer demands are modeled as time-dependent stochastic processes with various numbers of data samples $n$, namely $n = 10^7$, $n = 10^8$, and $n = 10^9$. We also describe the random classification of consumer demand and utility revenue for the MVGDF outcomes. Furthermore, probabilistic utility revenues are estimated for present and future grid planning and management. Finally, the relative errors of the estimated models are evaluated for various data samples, and the statistics of the proposed models are comparatively analyzed. Our work also presents the effect of climate change on consumer demands and the degree to which weather parameters influence consumer load estimation. We believe that our contribution is more versatile than prior work and covers a broad area of SG stochastic processes and time-variant demand patterns.
The main contributions of this paper, in light of the issues stated above, are:
• We present a mathematical MVGDF model for estimating the probabilities of utility revenues with time-varying consumer demands during the off-peak, mid-peak, and peak periods;
• We provide a detailed statistical and comparative analysis of these models for various time-variant demand sample sizes, such as $n = 10^8$ and $n = 10^9$;
• The estimates are used to predict present and future utility revenues, which supports utility growth activities such as planning, economic and financial development, and network expansion;
• Relative Errors (REs), Confidence Intervals (CIs), and G-Matrix scatter plots are analyzed for these models, with a brief discussion of the accuracy of the proposed estimates;
• The MVGDF model is validated using Standard Monte Carlo (SMC) simulations for various consumer energy demands; and
• Correlation and regression analyses are presented to show the interdependency between weather parameters and consumer demand estimation.
The remainder of the paper is structured as follows. Section 2 discusses related work on stochastic processes in the SG and probability models for utility revenues. The mathematical MVGDF model is elaborated in Section 3. SMC simulations and weather data analysis are described in Section 4. Section 5 concludes the paper with a summary and proposals for future work.

Related Work
One thread of research focuses on stochastic models and processes in the SG environment. A stochastic prediction model for efficient energy flow, which achieves load balancing and minimizes fluctuations during demand-curve periods, is presented in [16]. The unit commitment problem for demand-supply balance with renewable energy resources is addressed using hidden Markov chains in [17]. The authors of [18] analyzed the impact of hybrid electric vehicle charging on an interconnected solar power grid using intelligent stochastic models. An optimized power consumption model for load scheduling based on a constrained Markov decision process is described in [19]. Although these stochastic schemes provide optimized models for power consumption and demand-supply balance, they do not analyze time-varying consumer demands, which our MVGDF model addresses. Moreover, they do not quantify the stochastic impact of power consumption on electrical utilities, for example on revenue generation.
A large body of research focuses on probabilistic load forecasting and demand-response models in SGs. The authors of [20] and [21] proposed stochastic load models based on advanced energy metering and consumer appliances. A similar scheme based on probabilistic demand response for consumer load management is presented in [22]. The authors of [23] formulated a robust uncertainty model for the SG, while generation expansion and planning schemes are proposed in [24]. These methodologies describe uncertainty in SG modeling and design, but they do not scale to simultaneous prediction of consumer demand. Moreover, they lack a comparative statistical analysis for SG modeling. Furthermore, they do not examine the impact of the forecasted load on utility revenues.
Apart from modeling uncertainty in consumer load, a few existing techniques use probabilistic models for revenue maximization and estimation. A day-ahead pricing model for revenue maximization is described in [25], and a similar approach based on agent-based SG planning is presented in [26]. An optimization problem for increasing revenues from renewable energy resources under dynamic weather conditions is addressed in [27]. The authors of [28] discussed optimal vehicle-to-grid charging and parking for increasing utility revenues. However, these schemes do not provide a statistical analysis of their probabilistic estimates. In contrast, we incorporate a detailed statistical analysis and the effects of random demand inputs on utility outcomes.
Most of the aforementioned approaches focus on either load forecasting or revenue maximization using renewable energy resources. These schemes do not thoroughly investigate the statistical behavior of random consumer demand and its effect on utility revenues. Moreover, the forecasting models do not estimate the relative errors of their predictions and probability estimates. Furthermore, the probabilistic models are not validated using distribution density functions, such as the GDF. In contrast, our work provides a thorough treatment of the problem at hand, with complete theoretical and simulation validation.

Problem Formulation
In SG systems, randomly varying consumer demand makes the optimal generation of electrical utility revenues a stochastic problem. The conceptual picture of a stochastic system is presented in Fig 1. Electrical utilities face the challenging problem of maximizing revenue under randomly varying daily, monthly, and yearly load curves. The running, operational, and maintenance costs of the electrical grid, together with the wages and salaries of electrical crews, limit the resulting profit of the supplying agencies. The probabilistic revenue function is a random process that depends on time-varying load patterns. Thus, revenue is a function of time whose randomness stems from the stochastic experiment of varying consumer demand.
The sample space consists of three random samples, namely $S_1$, $S_2$, and $S_3$. The outcomes of the random experiment vary stochastically. The sample function of each random outcome is defined as $X(t, S_1)$, $X(t, S_2)$, and $X(t, S_3)$. Given the varying dynamics of the SG, a stochastic process lends itself to the estimation of averages. Probability models with $n$ samples yield optimized outcomes from random, time-dependent inputs. From these probabilistic models, electrical utilities can predict present and future outcomes of time-varying load patterns. Moreover, the relative errors between actual and estimated values reflect the accuracy of the proposed model. Furthermore, the Gaussian Probability Density Function (GPDF) evaluated with the largest number of data samples produces the statistical results for the utility revenues. Finally, numerical and graphical results provide a complete analytical overview of utility revenues.
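To make the notion of sample functions and averages concrete, the sketch below generates several sample functions $X(t, S_k)$ of a hypothetical stationary Gaussian demand process. It then compares the ensemble average at a fixed time with the time average of each realization. The process, its mean of 100 and standard deviation of 10 (arbitrary units), and the function names are illustrative assumptions, not the paper's data. For an ergodic process such as this one, both kinds of average approach the same mean.

```python
import random
import statistics

def sample_functions(num_realizations, num_steps, mean=100.0, sd=10.0, seed=0):
    """Hypothetical stationary Gaussian demand process.

    Each realization S_k yields one sample function X(t, S_k), t = 0..num_steps-1,
    built from independent N(mean, sd) draws (assumed, not the paper's data).
    """
    rng = random.Random(seed)
    return [[rng.gauss(mean, sd) for _ in range(num_steps)]
            for _ in range(num_realizations)]

def ensemble_average(paths, t):
    """Average over all realizations S_1..S_K at one fixed time t."""
    return statistics.fmean(path[t] for path in paths)

def time_average(path):
    """Average of a single sample function X(t, S_k) over time."""
    return statistics.fmean(path)

# Three long realizations: each time average is close to the mean of 100.
paths = sample_functions(num_realizations=3, num_steps=10_000)
print([round(time_average(p), 1) for p in paths])

# Many short realizations: the ensemble average at t = 0 is also close to 100.
wide = sample_functions(num_realizations=2_000, num_steps=5, seed=1)
print(round(ensemble_average(wide, t=0), 1))
```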

Stochastic Analysis of the MVGDF
Normal random variables are also referred to as Gaussian random variables. The MVGDF is a mathematical model for $n$ variables that exhibit the Gaussian Property (GP). The GP states that the probability density function of every random variable is strictly Gaussian, and the variables described by an MVGDF are jointly Gaussian. The MVGDF is presented under the assumption that consumer demands are strictly Gaussian; the estimated utility revenues are then also Gaussian. Let $\hat{R}$ be a Gaussian $(\mu, \sigma)$ Random Variable (GRV) with Probability Density Function (PDF)

$$f_{\hat{R}}(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right).$$

The constraints of $f_{\hat{R}}(x)$ are

$$f_{\hat{R}}(x) \ge 0, \qquad \int_{-\infty}^{\infty} f_{\hat{R}}(x)\,dx = 1.$$

The standard normal Cumulative Distribution Function (CDF) of the random variable $Z$ is

$$\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-u^{2}/2}\,du.$$

Let $X$ be a Gaussian $(\mu_X, C_X)$ random vector with expected value $\mu_X$ and covariance $C_X$. Then $f_X(x)$ is given by

$$f_X(x) = \frac{1}{(2\pi)^{n/2}\,[\det(C_X)]^{1/2}}\exp\!\left(-\frac{1}{2}(x-\mu_X)^{T} C_X^{-1}(x-\mu_X)\right).$$

Suppose the probability estimate $\hat{P}_j$, the sample variance $s^2$, and the indicator function $I_j$ are used for estimating utility revenues, where $j$ indexes the bins of each histogram evaluation. These estimators are defined in Eq (5), where $\epsilon_R$ is the relative error of the probability estimate and the term $s^2_{\hat{P}_j}$ gives the estimated (absolute) error of the probability model.

Assume that the data samples $X_1, X_2, \ldots, X_n$ are independent and identically distributed (IID). We define $m$ intervals, called histogram bins, with edges $e_1 < e_2 < \cdots < e_{m+1}$:

$$B_j = [e_j, e_{j+1}), \quad j = 1, \ldots, m-1, \qquad B_m = [e_m, e_{m+1}].$$

To prevent data loss when $\max_i X_i = e_{m+1}$, the last bin of the edge sequence is taken as the closed interval $[e_m, e_{m+1}]$. The histogram count for bin $j$ is

$$N_j = \sum_{i=1}^{n} I_j(X_i).$$

The sample mean $M_n$ converges to the population mean as $n \to \infty$. The Confidence Interval (CI) bounds the difference between the sample mean and the expected value: it is a random interval satisfying

$$P\big(\mu \in [M_n - \delta, M_n + \delta]\big) = 1 - \alpha.$$

In probability theory, $(1-\alpha)$ is called the Confidence Level (CL) and $[M_n - \delta, M_n + \delta]$ is the CI.
The CL is the probability that the CI contains the population mean. In practical applications, we can write $\mu = M_n \pm \delta$ with $100(1-\alpha)\%$ confidence. The half-width $\delta$ of the confidence interval is characterized by the following theorem.

Theorem. Let $X$ be a Gaussian Random Variable (GRV) $(\mu, \sigma)$. The elements of the set defined as:

Proof. To verify the above statement, we consider $X$ to be strictly Gaussian. The elements of $X$ are normally distributed and IID.
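The histogram estimator described above can be sketched as follows. The bin edges $e_1 < \dots < e_{m+1}$ define half-open bins, except for the last bin, which is closed so that a sample equal to $e_{m+1}$ is not lost. The estimate is $\hat{P}_j = N_j / n$. As an assumption, the relative error uses the binomial standard error, $\sqrt{\hat{P}_j(1-\hat{P}_j)/n}\,/\,\hat{P}_j$. Treat it as an illustrative choice rather than the paper's exact formula. The function names and the standardized revenue samples are hypothetical.

```python
import math
import random
from bisect import bisect_right

def histogram_probabilities(samples, edges):
    """Estimate P_j = N_j / n for bins [e_j, e_{j+1}); the last bin is closed."""
    m = len(edges) - 1
    counts = [0] * m
    for x in samples:
        if x == edges[-1]:
            j = m - 1                      # closed last bin [e_m, e_{m+1}]
        else:
            j = bisect_right(edges, x) - 1
        if 0 <= j < m:                     # samples outside [e_1, e_{m+1}] are not counted
            counts[j] += 1
    n = len(samples)
    return [c / n for c in counts]

def relative_error(p_hat, n):
    """Assumed binomial standard error of p_hat, divided by p_hat."""
    if p_hat == 0:
        return math.inf
    return math.sqrt(p_hat * (1 - p_hat) / n) / p_hat

rng = random.Random(42)
n = 200_000
revenue = [rng.gauss(0.0, 1.0) for _ in range(n)]   # hypothetical standardized revenues
probs = histogram_probabilities(revenue, edges=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])                  # both close to 0.341
print([round(relative_error(p, n), 4) for p in probs])
```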
Definition. An $\mathbb{R}^d$-valued random variable $X$ is multivariate normal (Gaussian) if, for every vector $t \in \mathbb{R}^d$, the real-valued random variable $t \cdot X$ is normal.
Definition. Let $G$ be a group with a σ-field $\mathcal{F}$ such that the group operation $(x, y) \mapsto x + y$ is a measurable transformation $(G \times G, \mathcal{F} \otimes \mathcal{F}) \to (G, \mathcal{F})$. Let $(\Omega, \mathcal{M}, P)$ be a probability space. A measurable function $X: (\Omega, \mathcal{M}) \to (G, \mathcal{F})$ is called a $G$-valued random variable, and its distribution is a probability measure on $G$.
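For reference, the standard multivariate Gaussian density of a $d$-dimensional random vector with mean $\mu_X$ and covariance $C_X$ is $f_X(x) = (2\pi)^{-d/2}\det(C_X)^{-1/2}\exp\!\big(-\tfrac12 (x-\mu_X)^T C_X^{-1}(x-\mu_X)\big)$. The minimal NumPy sketch below evaluates it. The function name `mvn_pdf` is ours, and the code solves a linear system instead of inverting $C_X$ explicitly. The example mean and covariance for (fixed, variable) demand are hypothetical.

```python
import numpy as np

def mvn_pdf(x, mean, cov):
    """Multivariate Gaussian density f_X(x) for mean vector mu_X and covariance C_X."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = mean.size
    diff = x - mean
    # Quadratic form (x - mu)^T C^{-1} (x - mu), computed via a linear solve.
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)

# Hypothetical (fixed, variable) demand in MW with correlated fluctuations.
mu_demand = [500.0, 300.0]
c_demand = [[400.0, 120.0],
            [120.0, 900.0]]
print(mvn_pdf([510.0, 290.0], mu_demand, c_demand))
```

With a diagonal covariance the density factorizes into univariate Gaussian densities, which is a convenient sanity check.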
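For any constant $c > 0$, applying Chebyshev's inequality to the sample mean $M_n(X)$, whose variance is $\operatorname{Var}[X]/n$, gives

$$\Pr\big[\,|M_n(X) - \mu_X| \ge c\,\big] \le \frac{\operatorname{Var}[X]}{n c^{2}} = \alpha,$$

$$\Pr\big[\,|M_n(X) - \mu_X| < c\,\big] \ge 1 - \frac{\operatorname{Var}[X]}{n c^{2}} = 1 - \alpha,$$

$$\Pr\big[M_n(X) - c < \mu_X < M_n(X) + c\big] = \Pr\big[\mu_X - c < M_n(X) < \mu_X + c\big] = \Pr\big[-c < M_n(X) - \mu_X < c\big] \ge 1 - \alpha.$$

Hence $[M_n(X) - c, M_n(X) + c]$ is a CI with confidence level at least $1 - \alpha$, with $c$ playing the role of $\delta$.

The stochastic random demand $D(t)$ is categorized as (a) fixed demand $D_F(t)$ and (b) variable demand $D_V(t)$. Fixed demand is the energy consumption of fixed loads, such as lights and other constant-consumption loads. Variable demand is the energy consumption of variable loads, such as washing machines, electric car charging, and electric cooling systems.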
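The bound $\Pr[|M_n(X) - \mu_X| \ge c] \le \operatorname{Var}[X]/(n c^2)$ (Chebyshev's inequality for the sample mean) can be solved for the half-width. Setting the right-hand side equal to $\alpha$ gives $c = \sqrt{\operatorname{Var}[X]/(n\alpha)}$. The sketch below compares this distribution-free half-width with the Gaussian half-width $\delta = z_{1-\alpha/2}\,\sigma/\sqrt{n}$, which applies when $X$ is Gaussian. The variance and sample size are hypothetical.

```python
import math
from statistics import NormalDist

def chebyshev_half_width(variance, n, alpha):
    """Half-width c solving Var[X] / (n c^2) = alpha, so P(|M_n - mu_X| >= c) <= alpha."""
    return math.sqrt(variance / (n * alpha))

def gaussian_half_width(variance, n, alpha):
    """delta = z_{1 - alpha/2} * sigma / sqrt(n) for the mean of Gaussian samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * math.sqrt(variance / n)

# Hypothetical revenue samples: variance 100, n = 10_000, 99 % confidence level.
print(chebyshev_half_width(100.0, 10_000, 0.01))  # 1.0
print(gaussian_half_width(100.0, 10_000, 0.01))   # about 0.258
```

The Gaussian interval is roughly four times narrower here. It exploits the known shape of the distribution, whereas the Chebyshev interval holds for any distribution with finite variance.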
The utility revenue $R$ is a random function of consumer demand $D$ and the electricity price $p$. The total demand model is summarized as

$$D_F(t) = \big(D_{F_1}(t), \ldots, D_{F_n}(t)\big), \quad D_F(t) > 0, \qquad D_F^{\min}(t) \le D_F(t) \le D_F^{\max}(t).$$

The total revenue function $R(t)$ is classified as (a) fixed revenue $R_F(t)$ and (b) variable revenue $R_V(t)$. The fixed revenue model, variable revenue model, and revenue constraints are described as:

The above stochastic model is represented as an MVGDF. Consumer demands are time-dependent random functions, and the time-varying energy consumption is the sample outcome of random events. In the SG, consumer demands vary according to a GDF, and the outcomes of this Gaussian Event (GE) are therefore also Gaussian.
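The statement that Gaussian demand yields Gaussian revenue can be illustrated under one explicit assumption that the text does not state: revenue is linear in demand, $R(t) = p_F D_F(t) + p_V D_V(t)$, with hypothetical fixed prices $p_F$ and $p_V$. If $(D_F, D_V)$ is jointly Gaussian with mean $\mu$ and covariance $\Sigma$, then $R = p^T D$ is Gaussian with mean $p^T\mu$ and variance $p^T \Sigma p$. The sketch computes these moments in closed form and checks them against sampled values. All numbers are illustrative.

```python
import numpy as np

def revenue_moments(prices, mean_demand, cov_demand):
    """Mean and variance of R = p^T D for D ~ N(mean_demand, cov_demand)."""
    p = np.asarray(prices, dtype=float)
    mu = np.asarray(mean_demand, dtype=float)
    cov = np.asarray(cov_demand, dtype=float)
    return float(p @ mu), float(p @ cov @ p)

def sample_revenue(prices, mean_demand, cov_demand, n, seed=0):
    """Draw n jointly Gaussian (D_F, D_V) pairs and return the linear revenues."""
    rng = np.random.default_rng(seed)
    demand = rng.multivariate_normal(mean_demand, cov_demand, size=n)  # shape (n, 2)
    return demand @ np.asarray(prices, dtype=float)

prices = [0.10, 0.15]                      # hypothetical p_F, p_V per unit of energy
mu_d = [500.0, 300.0]                      # hypothetical mean fixed and variable demand
cov_d = [[400.0, 120.0], [120.0, 900.0]]   # hypothetical demand covariance

mean_r, var_r = revenue_moments(prices, mu_d, cov_d)
samples = sample_revenue(prices, mu_d, cov_d, n=200_000)
print(mean_r, var_r)                       # closed form: about 95.0 and 27.85
print(samples.mean(), samples.var())       # sampled values close to the closed form
```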

Stochastic Analysis
To validate the MVGDF, consumer demand data were collected from the local grid station [29], [30]. The data set includes consumer demands, weather parameters (temperature, humidity, and precipitation), and the generation capacity of the utility. The consumer demand is taken as a random input. The outcomes of the numerical simulations are the fixed revenue $R_F(t)$, the variable revenue $R_V(t)$, their probabilities, and the Relative Errors (REs) of the proposed models. The random inputs, with $10^7$, $10^8$, and $10^9$ samples, obey GDF characteristics. Utility revenues, probability estimates, and REs are evaluated for each data set. SMC simulations are performed in MATLAB to obtain optimized utility revenues. Moreover, the complete statistics of the MVGDF outcomes $R_F(t)$ and $R_V(t)$ are derived from the numerical simulation results. Furthermore, the Confidence Intervals (CIs) and REs of the stochastic model are evaluated for various data samples. and $R_V(t)$ for $n = 10^8$ data samples. The probability density increases for $R_F(t)$ and $R_V(t)$ compared with the $n = 10^7$ case.
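A Standard Monte Carlo estimate of a revenue probability, together with its relative error, can be sketched as follows. To keep the example fast, it uses $n$ from $10^3$ to $10^6$ rather than the $10^7$ to $10^9$ samples of the study. It also assumes a hypothetical Gaussian revenue with mean 100 and standard deviation 10 whose exact CDF is known, so the actual RE $|\hat{P} - P|/P$ can be compared with the RE estimated from the sample alone, $\sqrt{(1-\hat{P})/(n\hat{P})}$. The constants are assumptions, not the paper's MATLAB setup.

```python
import numpy as np
from statistics import NormalDist

MEAN_REVENUE = 100.0   # hypothetical mean revenue (arbitrary units)
SD_REVENUE = 10.0      # hypothetical standard deviation
THRESHOLD = 110.0      # hypothetical revenue level r in P(R <= r)

def smc_probability(n, seed=0):
    """Standard Monte Carlo estimate of P(R <= THRESHOLD) from n Gaussian draws."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(MEAN_REVENUE, SD_REVENUE, size=n)
    return float(np.mean(draws <= THRESHOLD))

exact = NormalDist(MEAN_REVENUE, SD_REVENUE).cdf(THRESHOLD)  # about 0.8413
results = {}
for n in (10**3, 10**4, 10**5, 10**6):
    p_hat = smc_probability(n)
    actual_re = abs(p_hat - exact) / exact                       # uses the exact value
    estimated_re = float(np.sqrt((1.0 - p_hat) / (n * p_hat)))  # uses the sample only
    results[n] = (p_hat, actual_re, estimated_re)
    print(f"n={n:>8}  p_hat={p_hat:.5f}  RE={actual_re:.2e}  est. RE={estimated_re:.2e}")
```

Both REs shrink roughly in proportion to $1/\sqrt{n}$, so each tenfold increase in $n$ reduces the RE by a factor of about 3.2.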
Figs 6 and 7 show the estimates for $n = 10^9$ energy demand samples. The REs decrease further, and the probability densities of $R_F(t)$ and $R_V(t)$ increase further. Figs 6 and 7 thus reflect higher probability densities and minimum REs for $n = 10^9$ random samples. Fig 7 shows the lowest RE and the highest density of all the cases considered. From probability theory, the REs approach zero as the number of input samples tends to infinity; for SMC estimates, the RE decreases in proportion to $1/\sqrt{n}$. The accuracy of the probability estimates increases accordingly. Figs 8 and 9 show the analysis of the multivariate data using two-dimensional matrix scatter plots, in which three outcomes are generated from one predictor variable.

Table 1 presents the monthly utility load consumption of the local grid station [29], [30]. The Fixed Load (FL) and Variable Load (VL) consumptions are listed for residential, commercial, and industrial loads. Energy consumption differs by load type; for example, industrial loads consume more energy than commercial and residential loads. Table 2 lists the Utility Load Curve Periods (ULCPs), namely the off-peak, mid-peak, and peak periods. Energy consumption during the peak period is higher than during the mid-peak and off-peak periods. Tables 3 and 4 give the statistics of $R_F(t)$ and $R_V(t)$ for various consumer demand sample sizes. The statistical analysis includes the maximum (max) value, minimum (min) value, Standard Deviation (SD), Variance (V), Co-Variance (CV), and CIs. The CIs are computed at a 99% confidence level. Tables 5 and 6 describe the statistics of the REs of $R_F(t)$ and $R_V(t)$ for the various data samples. The REs reach their minimum for $n = 10^9$ samples, where the mean RE of the estimates is on the order of $10^{-15}$.
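The statistics reported in Tables 3 and 4 (max, min, SD, variance, covariance, and a 99% CI) can be computed from simulated revenue samples as sketched below. Two choices here are assumptions about how such tables are typically built, not a description of the authors' code: the normal-approximation CI $\bar{x} \pm z_{0.995}\, s/\sqrt{n}$ and the sample (ddof = 1) estimators. The function name and the toy inputs are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def summary_statistics(fixed, variable, level=0.99):
    """Max, min, SD, variance and a normal-approximation CI for R_F and R_V,
    plus their sample covariance."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    table = {}
    for name, values in (("R_F", fixed), ("R_V", variable)):
        x = np.asarray(values, dtype=float)
        sd = float(x.std(ddof=1))
        half = z * sd / np.sqrt(x.size)          # CI half-width at the given level
        table[name] = {
            "max": float(x.max()),
            "min": float(x.min()),
            "sd": sd,
            "var": float(x.var(ddof=1)),
            "ci": (float(x.mean() - half), float(x.mean() + half)),
        }
    table["cov"] = float(np.cov(fixed, variable)[0, 1])  # sample covariance of R_F, R_V
    return table

stats = summary_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(stats["R_F"]["var"], stats["cov"])
```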
Finally, the variation of the estimated values across samples is negligible, which indicates that the estimates are highly reliable.

Statistical Analysis
From the above statistical analysis and SMC simulations, present and future SG quantities can be calculated and predicted. The important SG parameters for estimation are:
• Demand-supply management;
• Present and future energy demands based on past population estimates;
• Generation unit expansion;
• Energy cost estimates, such as fuel costs;
• Present and future transmission-line loss estimates based on past observations;
• Installation of Distributed Generators (DGs) in the SG system;
• Incorporation of statistical weather data for predicting power-system control parameters, such as power flows; and
• Utility financial and economic development plans.
The stochastic analysis can be used for planning, re-shaping, and modifying overall utility characteristics. With such demand and economic estimates, electrical utilities will be able to predict other electrical parameters, such as voltage instability and transmission-line losses. Moreover, the probabilistic estimates will help electrical utilities when bidding in demand-response programs in the deregulated energy market of the SG. Furthermore, consumer satisfaction and quality of service are expected to improve. Finally, several other predictive models of the SG can be estimated using the same methodology.

Correlation Schemes.
In this section, weather parameters are correlated with each other and with load using the Pearson (P), Spearman (S), and Kendall (K) correlation schemes. The summer weather parameters considered are Temperature T, Humidity H, and Precipitation P, recorded at a local grid station in Pakistan, as shown in Fig 10 [31].
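The three correlation schemes can be computed from paired observations as sketched below. The paper's coefficients were obtained in R. This Python version instead implements the standard definitions directly:
• Pearson's r;
• Spearman's ρ, computed as Pearson's r on average ranks; and
• Kendall's τ-b, which coincides with τ-a when there are no ties.

The short load and temperature series in the usage example are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant/discordant pair counts (O(n^2))."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / math.sqrt((pairs - tied_x) * (pairs - tied_y))

# Made-up hourly load (MW) and temperature (deg C) readings.
load = [310, 335, 360, 352, 390, 420]
temp = [29.0, 31.5, 33.0, 34.0, 36.5, 38.0]
for name, f in (("P", pearson), ("S", spearman), ("K", kendall_tau_b)):
    print(name, round(f(load, temp), 4))
```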
Consumer demand (power demand) L is the dependent variable, while the weather variables (T, H, and P) are independent. The objective is to investigate the correlations between the dependent and independent variables. The dependent variable (load) is shown graphically in Fig 11. The dependence of power and current (pu) demand on T is shown in Figs 12 and 13. Similarly, consumer power demand variations with respect to H and P are presented in Figs 14 and 15. The graphical analysis shows that consumer demands depend strongly on weather parameters, which directly affect people's lifestyles. Among the weather parameters, T and H affect consumer demand more than P does. Moreover, transmission-line current rises with T, which confirms that weather forecasts are directly related to consumer demand in the smart grid. Without such forecasting, generation capacity, load growth, losses, and revenue cannot be estimated optimally.
Tables 7, 8, 9, and 10 explore these interrelationships using the P, S, and K schemes. The terms V1, V2, V3, and V4 denote Load L, Temperature T, Humidity H, and Precipitation P, respectively. Table 7 examines the correlation of L with T, H, and P; the correlations of V1 with V2, V3, and V4 were computed in R. A positive coefficient indicates that two variables increase together, and a negative coefficient indicates an inverse relationship. The magnitude of the coefficient indicates the strength of the association. For example, the correlation of V1 with V2 is 60.22% under the Pearson scheme, and 56.78% and 40.82% under the Spearman and Kendall schemes, respectively. Similarly, there is a weak association of V1 with V2 and V3. Following the same methodology, the correlations of T with L, H, and P are presented in Table 8, and Tables 9 and 10 discuss the effects of H and P on the dependent and independent variables. L and T show a very strong association with H and P. This indicates that careful investigation of these parameters is essential for analyzing smart grid systems, load forecasting, demand prediction, and energy estimation. Moreover, among these weather parameters, temperature plays a significant role in analyzing smart dynamical energy systems. Therefore, load and temperature data should be used in probabilistic utility revenue estimation for revenue prediction, energy estimation, and demand-supply management.
4.3.2 Multi-Linear Regression System Model.
The multi-linear regression model is described in Eq (13) and Eq (14). The dependent and independent parameters are analyzed in R. This investigation is based on the conditional probability distribution of load given the weather parameters. $Y$ denotes the dependent variable (load), $X$ the independent weather parameters, and $\varepsilon$ the error term; $\beta$ denotes the regression coefficients and $B_0$ the intercept. Using this model, three varying weather parameters are analyzed for their association with load. In Table 11, the Load L parameter (V1) is varied as 2V1, 0.5V1, and V1.
The relation of this varying load to the varying weather data, or predictor variables, namely Temperature T (V2), Humidity H (V3), and Precipitation P (V4), is then analyzed. In each case there is a strong association between load and the weather parameters. This is expected: multiplying the dependent variable by a positive constant rescales the regression coefficients but leaves the goodness of fit (R and R²) unchanged. The association of 65.19% indicates strong evidence that load (demand) can be forecast from weather parameters in smart grid networks. The multi-regression scatter plots are shown in Fig 16. Various test cases with varying dependent and independent variables are analyzed and plotted in R. Strong associations, such as R = 1.0 and R = 0.60 (60%), exist between the dependent and independent variables. However, humidity and precipitation are only weakly related to the other dependent and independent parameters. We conclude that stability and steady-state performance are the foremost features of smart grids. The role of stochastic models, load prediction, and estimation is evident from the above analysis. Weather parameters influence the living standards of consumers, which in turn affect electricity demand. This shows that probabilistic evaluation and statistical analysis of consumer demands play a pivotal role in smart grid load monitoring, management, prediction, stability, and control. Moreover, economic and financial developments can be forecast using the above analysis.
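A multiple linear regression of load on the three weather predictors, $Y = B_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$, can be fitted by ordinary least squares as sketched below. The weather and load data are synthetic, and the coefficients used to generate them are arbitrary. The authors used R; this NumPy version is only an illustration. Refitting with the load multiplied by 2 and 0.5 mirrors the 2V1, 0.5V1, and V1 cases of Table 11. The coefficients scale with the load, while $R^2$ stays the same.

```python
import numpy as np

def fit_mlr(predictors, response):
    """OLS fit of response = B0 + sum_k beta_k * x_k + eps; returns (coefficients, R^2)."""
    X = np.asarray(predictors, dtype=float)
    y = np.asarray(response, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])   # first column gives the intercept B0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)

rng = np.random.default_rng(1)
n = 500
weather = np.column_stack([
    rng.normal(35.0, 4.0, n),     # synthetic temperature T (V2)
    rng.normal(50.0, 15.0, n),    # synthetic humidity H (V3)
    rng.exponential(2.0, n),      # synthetic precipitation P (V4)
])
load = (40.0 + 3.0 * weather[:, 0] + 0.2 * weather[:, 1] - 0.5 * weather[:, 2]
        + rng.normal(0.0, 10.0, n))   # synthetic load L (V1)

fits = {}
for scale in (2.0, 0.5, 1.0):     # the 2V1, 0.5V1 and V1 cases
    fits[scale] = fit_mlr(weather, scale * load)
    coef, r2 = fits[scale]
    print(scale, np.round(coef, 3), round(r2, 4))
```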

Conclusions and Future Work
Time-varying processes, such as consumer demand in the smart grid environment, are stochastic in nature. Probability estimates of uncertain events become more accurate as the number of random samples approaches infinity. We elaborated probabilistic utility revenue functions using the GDF together with a comprehensive statistical analysis. The energy demands of consumers are modeled as time-variant, random input functions. The probability estimation yields a demand-revenue model for consumer energy demand forecasting and utility revenue estimation. Correlation and regression analyses are also performed on the local grid data and weather data. We conclude that present and future estimates of utility revenues depend strongly on time-varying consumer demands and weather data. The effect of consumer demand on utility revenues can be analyzed in various domains of stochastic processes. The stochastic analysis of utilities can be extended to include all random input variables, such as the price of electricity and consumer participation in demand-response programs. The MVGDF and probabilistic demand-revenue models can be further evaluated using advanced MC and multicanonical MC simulations. These models can also be applied to various interconnected areas of the SG, such as local-area and wide-area SG systems. In future work, load forecasting will be carried out using various prediction schemes for short-term and long-term horizons.

Table 9. H correlation with respect to L, T, and P.