Local Polynomial Estimation of Heteroscedasticity in a Multivariate Linear Regression Model and Its Applications in Economics

Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.


Introduction
The heteroscedasticity in classical linear regression model is defined by the variances of random items and which are not the same for different explanatory variables and observations [1,2]. Especially the heteroscedasticity is usually inevitable when we study the cross-sectional data [3][4][5][6]. When there is a heteroscedasticity in a linear regression model, estimations of parameters we obtained by ordinary least squares estimation (OLS) are still linear and unbiased [7][8][9][10]. However, the efficiency is bad [11,12]. This could lead to a uncorrect statistical diagnosis for the parameters' significance test. Similarly, it unnecessarily enlarges the confidence interval when we estimate the parameter interval. Besides, the accuracy of predictive value may lower when we estimate with the regression model that we obtained. In order to solve the problem above, we can use generalized least squares estimation (GLS) when the covariance matrix of the random items is known. If it is unknown, we usually use two-stage least squares estimate. In other words, we first estimate variances of the residual error, and then the generalized least squares estimator is used to obtain the coefficients of the model by using the estimate of variances of the random items. However, the traditional estimation method is that we suppose the residual error variances as a certain parametric model. So estimations we obtained are always inaccurate. For a heteroscedasticity model which has only one explanatory variable, we have discussed in detail and given a rigorous proof in [13]. In this paper, we try to apply multivariate local polynomial fitting to random item variances as the first step, and then generalized least squares estimation is used to estimate the coefficients of the model. On the one hand, because of local polynomial fitting's various nice statistical properties, the estimations obtained with this technology also have the same good statistical properties [14]. On the other hand, we exploit a heteroscedastic regression model rather than the artificial structure of heteroscedasticity. Then, we can directly get the heteroscedastic function based on the nonparametric technique, which shows the relationship between variance function of random items and explanatory variables from regression results [15]. Thus, it is unnecessary to test heteroscedasticity of the model. Particularly, the estimated value by multivariate local polynomial fitting are more accurate than that by the traditional method and univariate local polynomial fitting.
The rest of this paper is organized as follows. The multivariate local polynomial estimation adopted is explained in detail in Section Methods [16][17][18]: in its the first Subsection we study multivariate estimation with local polynomial fitting. The second subsection contains selections of parameters. Estimations of model coefficients and heteroscedastic function are presented in the third subsection Parametric estimations with multivariate local polynomial regression. In Sections Results and Discussion, we firstly give two models, do the simulations, then collect some real data, and use the local polynomial estimating the coefficients, respectively. Finally, Section Conclusions conclude.

Methods
Multivariate local polynomial fitting is an attractive method both from theoretical and practical point of view. It possesses a smaller mean squared error than that of classical kernel Nadaraya-Watson estimator which leads to an undesirable form of the bias and the Gasser-Muller estimator which has to pay a price in variance when dealing with a random design model. It also possesses other advantages [19,20]. In this Section, we briefly outline the idea of the extension of multivariate local polynomial fitting to multivariate linear regression.

Multivariate Estimator with Local Polynomial Fitting
We treat the m-dimension estimation problem where the measured data Y at the position X~½x 1 ,x 2 , Á Á Á ,x m T is given by where f ( : ) is the regression function to be estimate, E(e)~0, Var(e)~I n (I n denoting the identity matrix of order n), and X and e are independent. We always denote the conditional variance of Y given X~X T by s 2 (X T ) and the marginal density of X , that is, the design density, by f ( : ). While the specific form of f ( : ) may remain unspecific, if we assume that the (pz1) th derivative of f (X ) at the point X T exsits, then in order to estimate the value at this point, we can rely on a generic local expansion of the function about this point.
Specifically, for X in a neighborhood of X T , a p-term Taylor expansion gives, where Given the series fX i ,Y i g n i~1 , this polynomial is fitted locally by a weighted least squares regression problem: minimize.
where M is a bandwidth matrix controlling the size of the local neighborhood and K( : ) is the kernel function which penalizes both geometric and radiometric distances, where K M ( : ) is defined as [21] Denote byĥ h k (X ) the solution to the least squares problem (3)). It is clear from the Taylor expansion in (2) that It is more convenient to work with matrix notation. For the weighted least squared problem, a matrix form can be depicted by where and We then have the least squared solution with multivariate local polynomial fitting [22].
where { À Á denotes pseudo-inverse, or when X x T WX x is inverse, the estimation can be written bŷ Then, we can get the estimationf f X T ð Þ, where E 1 is a column vector (the same size of a in (5)) with the first element equal to 1, and the rest equal to zero, that is, Computing theâ a will suffer from large computational cost. We can use the recursive least squared method to reduce the computation complexity, and it is very powerful especially in the local polynomial fitting problems. There are several important    issues about the bandwidth, the order of multivariate local polynomial function and the kernel function which have to be discussed. The three problems will be presented in following subsection.

Parameters Selections
For the multivariate local polynomial estimator, there are three important problems which have significant influence to the estimation accuracy and computational complexity [23].
First of all, there is the choice of the bandwidth matrix, which plays a rather crucial role. The bandwidth matrix M is taken to be a diagonal matrix. For simplification, the bandwidth matrix is designed into M~hI m . Therefore, the most important thing is to find the bandwidth h [24,25]. In theory, there exists a optimal bandwidth h opt in the meaning of mean integrated square error(MISE), fulfilling the equality However, the theoretical bandwidth h opt in formula (23) can not be directly calculated. Here, we propose a search method to select the bandwidth: Compare values of the objective function as the bandwidth h from small to large, and then find out the optimal bandwidth which minimize the objective function.
Suppose that h l~K h min , where h min is the minimum, K is coefficient of expansion. We search a bandwidth h to minimize the objective function in the interval ½h min ,h max , where the objective function refers to the prediction mean square error (MSE), denoted by E MS (h).
Firstly, we assume h~h min , then increase h by efficient of expansion K and calculate value of objective function for each h. Stop down when hwh max , and choose a bandwidth h which minimizes E MS (h) as the approximate optimal bandwidth. E MS (h) can be taken place by a estimation In this paper, we choose where h m~m ax jjX i {X j jj,K~1:1: Compared with other methods, this method is more convenient.    In order to closer to the ideal optimal bandwidth, we search once again by narrowing the interval on the basis of the above searching process. Supposing j is the bandwidth which make K j h min optimal in the above searching process. Now, divide the small interval ½K j{1 h min ,K jz1 h min into n equal intervals. Supposing among these n{1 bandwidths, the approximate optimal bandwidth is the one that makes e MS minimize.Obviously, this search method can quickly select the right bandwidth. Another issue in multivariate local polynomial fitting is the choice of the order of the polynomial. For a given bandwidth h, a large value of p would expectedly reduce the modeling bias, but would cause a large variance and a considerable computational cost. Since the bandwidth is used to control the modeling complexity, and due to the sparsity of local data in multidimensional space, a higher-order polynomial is rarely used. So we apply the local quadratic regression to fit the model (that is to say, p~2).

Parametric Estimations with Multivariate Local Polynomial Regression
Let the dependent variable x and the explanatory variable y fulfill the following regression model £u.
where y i are the observations and x 1i , Á Á Á ,x mi are independent variables. Denote . .
Therefore, the equation (15) can be abbreviated as Suppose that
or it can be written as where x i~1 ,x 1i ,x 2i , Á Á Á ,x mi ð Þ T , i~1,2, Á Á Á ,n: If covariance matrix S of e is known, formula (20) can be estimated. Equation (20) is considered as the weighted least squares estimate of b and it possesses nice qualities. However, in many practical situations, the form of S is unknown. Therefore, the so-called two-stage method of estimation is used to solve the heteroscedasticity problem. It can be depicted as follows: first, apply multivariate local polynomial fitting to get the estimateŝ s 2 i of s 2 i , and then we can obtain the estimateb b of b by using equation (20). The estimator follows that Because of E(s(X )e i jX ) 2~s i 2 , we construct the following regression model in order to estimate s 2 i , where u i is the difference between (s(X )e i ) 2 and its expectation.
Suppose that b~X x T X x À Á {1 X x T y is the OLS of model (18). Although the ordinary least squares estimate b is ineffective, it is still consistent. Therefore, the corresponding residuals hold that Consequently, we can approximately get It can be taken as a regression model, in which the variance function is regression function and the squared residuals e 2 i are dependent variables. In order to estimate this model, parameter estimation method would usually be taken in some articles. In other words, they suppose s 2 i~f (c T ,x 1i , Á Á Á :x mi ), where the form of f is known and c~c 0 ,c 1 , Á Á Á c n ð Þ T are the parameters to be estimated. Note that what we usually discuss about more and more detail are s 2 i~s 2 c T ,x 1i , Á Á Á ,x mi ð Þ , s i 2~s2 exp c T ,x 1i , Á Á Á x mi ð Þ and so on [26]. However, the discussion of these models requires that the analyst have a better understanding for the background in practical problems. As an example, variance of corporate profits is often in direct proportion with family income. Since the variance function must be non-negative, a non-parametric method is proposed to fit s 2 i . This method can be depicted as follows. Then,   the p-order local polynomial estimate of the variances function s 2 x 1i , Á Á Á ,x mi ð Þis according to formula (2). Using the least squares method for the data around the local window, we can estimate the local intercept via minimizing Furthermore, the solution vector can be written as where the weighted matrix the the design matrix X xx~1 and e 2~e 1 2 ,e 2 2 , Á Á Á ,e n 2 À Á T . Consequently, the estimated variance function isŝ s 2 X 1i , Á Á Á ,X mi ð ÞE 1 a(X T ). Finally, we can get twostage estimateb b of b by substituting estimateŝ s 2 i of s 2 i into equation (21).

Simulations and Analysis
In this section, we give the following model to discuss the qualities ofb b under the limited sample. Considering the practical background which is applied to economics, we assume the variance function of the following two forms. where x 1i and x 2i i~1,2, Á Á Á ,n ð Þ are independent variables. Here, we assume the variance function of the error term s 2 x 1 ,x 2 ð Þ~exp ({(x 1 2 zx 2 2 )).
Step 1: Firstly, obtain the estimation of two coefficients in model (28) with the ordinary least squares estimation. Second, calculate the squares of the residuals e 2 . Third, do the local polynomial regression on the model (24) Figure 1 is the result after 10000 replicates, where n~20, p~2, and the kernel function is given by.
In the kernel function, the ranges of x 1 and the same with x 2 are ½{4,5, and the optimal bandwidth matrix is That is MISE~0:00179.
Minimizing the asymptotic MISE with respect to the bandwidth parameter h results in a bandwidth, called the asymptotically optimal bandwidth or simply optimal bandwidth. So we can obtain the optimal bandwidth matrix M. Figure 2 shows the plot which generates from the variance function.
Þ~exp ({(x 1 zx 2 )): Figure 3 indicates the variance of the error term function and the fitting functions drawn in the same situation. Besides, we compare some fitted value with the true value in Table 1. It is worth pointing out that our fitting is very close to the real value from Figures 1, 2, 3 and Table 1.
Step 2: We substituteŝ s i 2 which obtained from step 1 into model (20), then we will getb b. Figures 4, 5, 6 depict the histograms and asymptotic distributions ofb b 0 ,b b 1 , andb b 2 , respectively, which we do 10000 replicates with GLS and choose n~20. It is easy to see from Figures 4, 5, 6 that the estimated distributions of parameters subject to normal asymptotically, which the proofs are in [27][28][29][30]. Further, we give normal Q-Q plots of residuals of estimated parameters in Figures 7,8,9. As illustrated in Figures 10, 11, 12, the histograms and asymptotic distributions ofb b 0 ,b b 1 , andb b 2 can be get by using OLS, respectively. Similarly, the normal Q-Q plots of residuals are depicted in Figures 13, 14, 15. However, the results are obviously unsatisfying in Figures 10,11,12,13,14,15. Besides, the fitted value and true value of GLS and ones of OLS are listed in Table 2. Therefore, we can conclude that our fitting is very perfect. Here, we define the relative error as Re~ĵ j{j j |100%, where j is a true value, andĵ j is a fitted value. So the GLS estimates of parameters are much better than OLS according to Table 2. Model 2. In formula (28), we assume the error term function is .and the optimal bandwidth matrix is M~0 :95 0 0 0:95 , where MISE~0:01748. The process of parameter's estimation and calculation is the same with Model 1. Also we choose p~2, the ranges of x 1 and the same with x 2 are ½{4,5. Besides, the kernel function is still defined as equation (29). We do a total of 10000 replicates. Figure 16 shows the estimated variance function s 2 x 1i , Á Á Á ,x mi ð Þ , where the selection of optimal bandwidth matrix is similar with that in step 1. Figure 17 shows the plot which generates from the variance function  Table 3. In order to obtain a visual effect, we calculate the relative errors and do a comparison. It is easy to see that the GLS estimates of parameters are much better than OLS from Table 3.

Application
As an example of pure cross-sectional data with potential for heteroscedasticity, consider the data given in Table 4, which gives data on per capita consumption expenditure (PCCE), agricultural business income (ABI) and other income (OI) for 31 different regions in China in 2009. The cross-sectional data presented in this table are quite heterogenous in a regression of PCCE on ABI and OI, so heteroscedasticity is likely.
We hope to understand the relationship of PCCE, ABI and OI, then the regression function is as follows: y i~b0 zb 1 x 1i zb 2 x 2i zs(x 1i ,x 2i )e i : . Firstly, the ordinary least squares estimators can be obtained, say,b b O0~3 56:4174,b b O1~0 :4282,b b O2~0 :6931. Then, inspect that if there is a heteroscedasticity. According to formula (23), the residual squares can be considered as variances. So we need only to fit residual squares, seeing Figure 31, where residual squares are variance squares. Obviously, there are biases among the residual squares fitted. Therefore, it can be concluded that there is a heteroscedasticity. Furthermore, according to the two-stage estimate, the generalized least squares estimation for b G0 , b G1 , and b G2 can be estimated after 10000 replicates provided with the bandwidth matrix M~0 : The scatter of original and fitting data is also given, see Figure 32.

Conclusions
In this paper £ -| we presented a new method for estimation of multivariate linear heteroscedastic regression model based on multivariate local polynomial estimation with non-parametric technique. The proposed scheme firstly adopted the local polynomial fitting to estimate heteroscedastic function, then the coefficients of regression model are obtained based on generalized least squares method. Our approach avoided the test of heteroscedasticity for the multivariate linear model. Due to nonparametric technique of local polynomial estimation, if the heteroscedastic function is unknown, the precision of estimation was improved. Furthermore, the asymptotic normality of parameters was verified by the results of numerical simulations normal Q-Q plots. Finally, the simulation results and local polynomial estimation of real data really indicated that our approach was effective in finite-sample situations, which did not need to assume the form of heteroscedastic function. The presented algorithm could be easily used to heteroscedastic regression model in some practical problems.