Minimum distance quantile regression for spatial autoregressive panel data models with fixed effects

This paper considers the quantile regression model with individual fixed effects for spatial panel data. Efficient minimum distance quantile regression estimators based on the instrumental variable (IV) method are proposed for parameter estimation. The proposed estimator is computationally fast compared with the IV-FEQR estimator proposed by Dai et al. (2020). Asymptotic properties of the proposed estimators are also established. Simulations are conducted to study the performance of the proposed method. Finally, we illustrate our methodology using a cigarette demand data set.


Introduction
In the last few decades, spatial autoregressive (SAR) models have been studied extensively and applied in many areas, such as economics, demography, geography and other scientific fields. Panel data with spatial interaction are also of great interest, as they make it possible to control for both heterogeneity and spatial correlation and to take dynamics into account (see [1-7]).
Recently, there has been a growing literature on estimation and testing of spatial panel data models. For instance, [7] proposed the maximum likelihood (ML) estimator for the SAR panel model with both spatial lag and spatial disturbances. Zhang and Shen [8] studied estimation of a semi-parametric varying coefficient spatial panel data model. Dai et al. [9] investigated fixed effects quantile regression for general spatial panel data models with both individual fixed effects and time period effects based on the instrumental variable method. Xu and Yang [10] proposed adjusted quasi score (AQS) tests for temporal heterogeneity in the slope and spatial parameters of spatial panel data (SPD) models with fixed effects. Bai and Li [11] studied the quasi-maximum likelihood estimator of dynamic spatial panel data models with common shocks to deal with both weak and strong cross-sectional correlations. Li and Yang [12] developed M-estimation and inference methods for spatial dynamic panel data models with correlated random effects based on short panels. Zhang et al. [13] studied penalized quantile regression for a spatial panel model with fixed effects. Except for [9, 13], all these works are based on (conditional) mean regression methods. Compared with mean regression, the quantile regression (QR) method is more robust and can handle data with different error distributions. However, in contrast to mean regression, there is no general transformation that can suitably eliminate the individual effects in the quantile regression framework (see [14, 15]). Thus the fixed effects quantile regression (FEQR) estimation (see [16]) is implemented by treating each individual effect as a parameter to be estimated, which creates computational difficulties. Hence, the IV-FEQR estimator (i.e., the FEQR estimator based on the instrumental variable method) used in [9] is also computationally cumbersome.
To address computational difficulties, [15] proposed the efficient minimum distance quantile regression (MDQR) method. Compared with the FEQR estimator, the MDQR estimator is computationally fast and is easy to implement in practice. The computing advantage is particularly obvious for large cross-sections.
In this paper, we employ the MDQR methodology for estimating the SAR panel data model with individual fixed effects. The instrumental variable (IV) method is employed to attenuate the estimation bias. The asymptotic properties of the IV-MDQR estimator are also developed. Monte Carlo simulations are conducted to assess the finite sample performance of the IV-MDQR, MDQR and IV-FEQR estimators. The computation speeds of IV-MDQR and IV-FEQR are also compared. Finally, we apply the proposed method to cigarette demand data.
The rest of the paper is organized as follows. Section 2 introduces the SAR panel data model with individual fixed effects. Section 3 proposes the IV-MDQR estimation procedure. The asymptotic properties of the IV-MDQR estimators are also discussed. Proofs of the theorems in Section 3 are given in the Appendix. Section 4 reports a simulation study assessing the finite sample performance of the proposed estimators. An empirical illustration is considered in Section 5. Section 6 concludes the paper.

The model with individual fixed effects
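Consider the following spatial autoregressive panel data (SARP) model with individual fixed effects:

$$y_{it} = \rho \sum_{j=1}^{N} w_{ij} y_{jt} + X_{it}^\top \beta + \eta_i + \varepsilon_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \tag{1}$$

where $y_{it}$ is the dependent variable for subject $i$ at time $t$, $X_{it}$ is a $p \times 1$ vector of explanatory variables, $w_{ij}$ is the $(i, j)$th element of $W$, $W$ is an $N \times N$ non-stochastic spatial weight matrix reflecting spatial dependence of $y_{it}$ among cross-sectional units, and $\varepsilon_{it}$ is the random disturbance term. The parameters $\eta_i$, $i = 1, \ldots, N$, are fixed effects for the regions. Interaction effects are reflected in the spatial lag variable $\sum_{j=1}^{N} w_{ij} y_{jt}$ (and the associated scalar parameter $\rho$). We consider the following conditional $\tau$-quantile of the response variable:

$$Q_\tau(y_{it} \mid D_{it}, X_{it}) = \rho(\tau) D_{it} + X_{it}^\top \beta(\tau) + z_{it}^\top \eta(\tau), \tag{2}$$

where $\tau$ is a quantile in the interval $(0, 1)$, $D_{it} = \sum_{j=1}^{N} w_{ij} y_{jt}$ is the spatial lag, $z_{it}$ is the $N \times 1$ vector with the $i$th element equal to 1 and the rest equal to 0, $Z = 1_T \otimes I_N$ is an $NT \times N$ matrix, and $1_T$ is the $T \times 1$ vector with all elements equal to 1.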
The FEQR estimator can then be obtained by minimizing the following objective function:

$$(\hat\rho(\tau), \hat\beta(\tau), \hat\eta(\tau)) = \arg\min_{\rho, \beta, \eta} \sum_{i=1}^{N} \sum_{t=1}^{T} \rho_\tau\left(y_{it} - \rho D_{it} - X_{it}^\top \beta - \eta_i\right),$$

where $D_{it} = \sum_{j=1}^{N} w_{ij} y_{jt}$ is the spatial lag, $\rho_\tau(u) = u(\tau - I(u \le 0))$ is the check function and $I(\cdot)$ is the indicator function (see, e.g., [17]). Galvao and Wang [15] argued that, unlike in mean regression, the individual effects cannot be suitably eliminated via transformation in the FEQR estimator. Thus the FEQR estimator is implemented by treating each individual effect $\eta_i$ as a parameter to be estimated. Therefore, if the number of individuals is large, the FEQR estimator involves optimization over a large number of parameters, which makes the problem computationally cumbersome. Inference using the FEQR estimator is also difficult to conduct in practice. For this reason, we employ the minimum distance quantile regression (MDQR) estimator (see [15]) for estimation.
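To make the size of the FEQR problem concrete, the following minimal Python sketch evaluates the check function and the pooled FEQR objective for given parameter values. It is an illustration only, not the authors' implementation: the function names `check_loss` and `feqr_objective` and the nested-list data layout are assumptions introduced here. The objective takes one scalar `rho`, `p` slope coefficients and `N` fixed effects, so a minimizer has to search over `N + p + 1` parameters jointly.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u <= 0})."""
    return u * (tau - (1.0 if u <= 0 else 0.0))


def feqr_objective(y, D, X, rho, beta, eta, tau):
    """Sum of check losses over all (i, t) for the fixed effects QR problem.

    Assumed layout (hypothetical): y[i][t] and D[i][t] are scalars,
    X[i][t] is a list of p regressors, eta[i] is the fixed effect of unit i.
    """
    total = 0.0
    for i in range(len(y)):
        for t in range(len(y[i])):
            fitted = rho * D[i][t] + eta[i]
            fitted += sum(b * x for b, x in zip(beta, X[i][t]))
            total += check_loss(y[i][t] - fitted, tau)
    return total
```

Evaluating the objective is cheap; the cost discussed in the text comes from minimizing it over all fixed effects at once when the number of individuals is large.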
Denote $\theta = (\rho, \beta^\top)^\top$. The MDQR estimation of model (1) can be implemented via the following two steps:

Step 1. Obtain the QR estimates $\hat\theta_i$ and $\hat\eta_i$ using the time series data of each individual $i$. Denote by $V_i$ the associated variance-covariance matrix of $\hat\theta_i$ for each individual.

Step 2. The MDQR estimator is then defined by

$$\hat\theta_{MD} = \left(\sum_{i=1}^{N} \hat V_i^{-1}\right)^{-1} \sum_{i=1}^{N} \hat V_i^{-1} \hat\theta_i,$$

where $\hat V_i$ is the estimator of $V_i$.
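The second step is a weighted average of the individual estimates. The sketch below, in plain Python, assumes the inverse-variance weighted form of the minimum distance estimator in [15], i.e. $(\sum_i \hat V_i^{-1})^{-1} \sum_i \hat V_i^{-1} \hat\theta_i$; the helper names `invert` and `md_combine` are hypothetical and the individual estimates and variance matrices are taken as given.

```python
def invert(M):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]


def md_combine(theta_hats, V_hats):
    """Return (sum_i V_i^{-1})^{-1} sum_i V_i^{-1} theta_i."""
    k = len(theta_hats[0])
    S = [[0.0] * k for _ in range(k)]  # running sum of V_i^{-1}
    b = [0.0] * k                      # running sum of V_i^{-1} theta_i
    for theta, V in zip(theta_hats, V_hats):
        V_inv = invert(V)
        for r in range(k):
            for c in range(k):
                S[r][c] += V_inv[r][c]
                b[r] += V_inv[r][c] * theta[c]
    S_inv = invert(S)
    return [sum(S_inv[r][c] * b[c] for c in range(k)) for r in range(k)]
```

When all variance matrices are equal, the result reduces to the simple average of the individual estimates. The work is N separate low-dimensional fits followed by this combination, which is where the computational advantage over a joint fit would come from.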

The IV-MDQR estimator
However, model (2) contains an endogenous variable, namely the spatial lag $D_{it}$, which can bias the estimation. The MDQR estimator of model (2) is therefore biased, especially for the spatial correlation coefficient $\rho$. This bias can be ameliorated through the use of instrumental variables, so we employ the instrumental variable method for bias reduction in this section. Suppose the endogenous variable $D_{it}$ is related to a vector of instruments $\omega_{it}$, and the instruments $\omega_{it}$ are independent of $\varepsilon_{it}$. Following [18-20], and assuming the availability of instrumental variables $\omega_{it}$, we can derive the IV-MDQR estimator via the following four steps:

Step 1. For each individual $i$ and a given quantile $\tau$, define a suitable set of values $\{\rho_j, j = 1, \ldots, J; |\rho_j| < 1\}$. For each $\rho_j$, obtain the ordinary QR estimates $\hat\beta_i, \hat\eta_i, \hat\gamma_i$ of individual $i$ from the time series data by minimizing the following objective function:

$$(\hat\beta_i(\rho_j, \tau), \hat\eta_i(\rho_j, \tau), \hat\gamma_i(\rho_j, \tau)) = \arg\min_{\beta, \eta_i, \gamma} \sum_{t=1}^{T} \rho_\tau\left(y_{it} - \rho_j D_{it} - X_{it}^\top \beta - \eta_i - \omega_{it}^\top \gamma\right),$$

where $\gamma$ is the coefficient of the instrumental variable $\omega_{it}$.

Step 2. Choose $\hat\rho(\tau)$ among $\{\rho_j, j = 1, \ldots, J\}$ as the value in $\mathcal{R}$, the parameter space of $\rho$, that makes a weighted distance function defined on $\gamma$ closest to 0.

Step 4. The IV-MDQR estimator of the SAR panel data model (1) is then defined by

$$\hat\theta = \left(\sum_{i=1}^{N} \hat V_i^{-1}\right)^{-1} \sum_{i=1}^{N} \hat V_i^{-1} \hat\theta_i,$$

where $\hat V_i$ is the estimator of $\tilde V_i$, and $\tilde V_i$ is the variance-covariance matrix of the IVQR estimator $\hat\theta_i$ for each individual $i$.
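The grid-selection step for one individual can be sketched as follows. This is a hedged illustration under stated assumptions: the instrument coefficients `gamma_hats[j]` are taken as already computed by some quantile regression routine at each grid value `rho_grid[j]`, the weighted distance is taken to be the quadratic form $\gamma^\top A \gamma$ with a positive definite weight matrix `A` (identity by default), and the name `select_rho` is hypothetical.

```python
def select_rho(rho_grid, gamma_hats, A=None):
    """Return the grid value whose instrument coefficient is closest to zero.

    gamma_hats[j] is the estimated coefficient vector on the instruments
    from the quantile regression run at rho_grid[j] (assumed given).
    A is an assumed positive definite weight matrix; identity if omitted.
    """
    def quad_form(g):
        if A is None:
            return sum(v * v for v in g)
        k = len(g)
        return sum(g[r] * A[r][c] * g[c] for r in range(k) for c in range(k))

    distances = [quad_form(g) for g in gamma_hats]
    j_best = min(range(len(rho_grid)), key=distances.__getitem__)
    return rho_grid[j_best], distances[j_best]
```

The idea behind the selection is that a valid instrument should have no direct effect on the response once the endogenous spatial lag enters with its true coefficient, so the grid value that drives the instrument coefficient towards zero is retained. A finer grid gives a more precise choice at the cost of one quantile regression per grid point.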

Remark 3.1. For each individual $i$, we need instruments for the endogenous variable

$$D_{it} = \sum_{j=1}^{N} w_{ij} y_{jt} = W_i y_t,$$

where $W_i$ is the $i$th row of the spatial weight matrix $W$ and $y_t = (y_{1t}, \ldots, y_{Nt})^\top$. The instruments need to satisfy the following two conditions: (i) the instruments $\omega_{it}$ affect the endogenous variable $D_{it}$; (ii) the instruments $\omega_{it}$ are independent of the random error $\varepsilon_{it}$. In practice, for the spatial autoregressive panel data model (1), we can choose the time lag of $y_{it}$, i.e., $y_{i,t-1}$, and the spatial lag of the explanatory variables, i.e., $w_i^\top X_t$, as instrumental variables.
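The two candidate instruments in the remark are simple transformations of the observed panel. The sketch below builds the spatial lag $D_{it}$, the time lag $y_{i,t-1}$ and the spatial lag of a regressor in plain Python. The function names, the `[t][i]` list layout, and the restriction to a single scalar regressor are simplifying assumptions made for illustration.

```python
def spatial_lag(W, v):
    """Return W v for an N x N weight matrix W and a length-N vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def build_instruments(W, y, X):
    """Build D_it and two candidate instruments for periods t = 1, ..., T-1.

    Assumed layout (hypothetical): y[t][i] is the response and X[t][i] a
    scalar regressor of unit i at time t. The first period is dropped
    because its time lag is unavailable.
    """
    D, time_lag, wx = [], [], []
    for t in range(1, len(y)):
        D.append(spatial_lag(W, y[t]))    # endogenous spatial lag D_it
        time_lag.append(list(y[t - 1]))   # instrument y_{i,t-1}
        wx.append(spatial_lag(W, X[t]))   # instrument w_i' X_t
    return D, time_lag, wx
```

Using the time lag shortens each individual series by one observation, which matters when T is small.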

Asymptotic theory
In this section, we investigate the asymptotic properties of the IV-MDQR estimator. We impose the following regularity conditions:

is independent across individuals, and is independent and identically
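A2 For all $\tau \in \mathcal{T}$, $(\rho(\tau), \beta(\tau), \eta(\tau))$ is in the interior of the set $\mathcal{R} \times \mathcal{B} \times \mathcal{E}$, and $\mathcal{R} \times \mathcal{B} \times \mathcal{E}$ is compact and convex.
A4 $W$ is a non-stochastic spatial weight matrix with zero diagonal elements. $W$ is uniformly bounded in both row and column sums in absolute value.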

A5 For each individual i, for
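$\Pi(\rho, \beta, \ldots$ where … The parameter space $\mathcal{R} \times \mathcal{B} \times \mathcal{E}$ is a connected set and the image of $\mathcal{R} \times \mathcal{B} \times \mathcal{E}$ under the map $(\rho, \beta, \eta_i) \mapsto \Pi(\rho, \beta, \eta_i, \tau)$ is simply connected.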

A6
The conditional density $f_i(\varepsilon \mid D, X)$ is continuously differentiable for each $i$. There exist $0 < C_L \le C_U < \infty$ such that $f_i(\varepsilon \mid D, X) \le C_U$ uniformly over $(\varepsilon, X)$ and $i \ge 1$, and $f_i(0 \mid D, X) \ge C_L$ uniformly over $X$ and $i \ge 1$; and there exists $C_f > 0$ such that $|f_i'(\varepsilon \mid D, X)| \le C_f$ uniformly over $(\varepsilon, X)$ and $i \ge 1$.

Assumptions A1-A3 and A6 are standard in the literature on quantile regression for panel data. Assumption A4 originates from [21, 22] and is also used in [7, 9]. Assumption A5 is a standard assumption in the instrumental variable quantile regression literature. Assumption A7 ensures that $S_i^{-1}(\tau)$ are bounded uniformly across $i$. Assumptions A3 and A7 guarantee that both … and their inverses are bounded uniformly across $i$. In applications, the variance-covariance matrices are unknown and need to be estimated. Following [15], when $T$ and $N$ tend to infinity sequentially, we impose the following assumption: … exists and is non-singular. When $N$ and $T$ tend to infinity simultaneously, we make the following assumption: … exists and is non-singular.
We can now establish consistency and asymptotic normality of the IV-MDQR estimator. Proofs are given in the Appendix.
For the disturbance errors, we consider the standard normal (i.e., N(0, 1)) distribution. Tables 1 and 2 report the bias and RMSE of the QR estimators in the homoscedastic and heteroscedastic cases, respectively. For IV-MDQR and IV-FEQR, we considered two different instruments, $y_{i,t-1}$ and the spatial lag of $X_{it}$. The results are similar in both cases, and we present only the results for the $y_{i,t-1}$ case. From Tables 1 and 2, we see that the bias and RMSE of all estimators except the MDQR estimator are clearly reduced as the sample size increases. The IV-MDQR estimator performs substantially better than the MDQR estimator, which shows that the instrumental variable method effectively reduces the estimation bias. For estimating the coefficient $\beta$, the IV-MDQR and IV-FEQR estimators perform similarly. For estimating the spatial correlation coefficient $\rho$, the IV-MDQR estimator has larger bias but smaller RMSE than the IV-FEQR estimator.

Next, we compare the computing time of the IV-MDQR and IV-FEQR estimators at one particular quantile, $\tau = 0.5$, in Example 1. We are interested in the elapsed time, i.e., the time required for one replication of the simulation. The results are summarized in Table 3. Table 3 shows that as the sample size increases, the computing times of both the IV-MDQR and the IV-FEQR estimators increase, but the computing time of the IV-FEQR estimator grows much faster than that of the IV-MDQR estimator. Moreover, we are also interested in whether the computing time of the estimators is more sensitive to $T$ or to $N$. We consider the following two situations: (1) fix $T = 100$ and let $N$ vary in {10, 50, 100, 250, 500}; (2) fix $N = 100$ and let $T$ vary in {10, 50, 100, 250, 500}. The results are summarized in Table 4, from which we can see that the computing times of both estimators are much more sensitive to $N$ than to $T$, but the IV-MDQR estimator is much less sensitive than the IV-FEQR estimator.
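For reference, the bias and RMSE summaries reported for the simulations can be computed from the Monte Carlo replications as sketched below. This assumes the usual definitions (mean estimation error, and root of the mean squared estimation error across replications); the function name `bias_and_rmse` is hypothetical and the numbers in the usage note are made up for illustration.

```python
import math


def bias_and_rmse(estimates, true_value):
    """Monte Carlo bias and RMSE of a scalar estimator.

    estimates holds one estimate per replication; true_value is the
    parameter value used to generate the data.
    """
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rmse
```

For example, `bias_and_rmse([0.4, 0.6, 0.5, 0.7], 0.5)` gives a bias of about 0.05 and an RMSE of about 0.122. An estimator can have larger bias but smaller RMSE than a competitor when its variance is sufficiently smaller, since the squared RMSE is the sum of the squared bias and the variance.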

Illustration
In this section, we employ the cigarette demand data set (https://spatial-panels.com/software/) to illustrate our methodology. The cigarette demand data set has been analyzed by many authors (see [9, 23-27]). The data set is a panel of 46 states covering 1963 to 1992. The spatial weight matrix $W$ is also given in the data set. We take the following two explanatory variables: the logarithm of the average retail price of a pack of cigarettes measured in real terms ($X_1$) and the logarithm of real per capita disposable income ($X_2$). The dependent variable $y_{it}$ is the logarithm of real per capita sales of cigarettes to persons of smoking age (14 years and older).
First, the Kolmogorov-Smirnov test is employed to test whether the standardized $y$ follows the standard normal distribution. The normality assumption is rejected at the 0.05 significance level. Fig 1 plots the density of the response $y$, which has larger kurtosis than N(0, 1).
Next, we employ the spatial autoregressive panel data model for the analysis. The fitted model takes the form:

$$y_{it} = \rho D_{it} + \beta_1 X_{1,it} + \beta_2 X_{2,it} + \eta_i + \varepsilon_{it},$$

where $D_{it} = \sum_{j=1}^{N} w_{ij} y_{jt}$. We estimate the parameters using the IV-MDQR, IV-FEQR, MLE, and OLS methods. The results are presented in Table 5. The first three columns are the IV-MDQR estimates for $\tau$ = 0.25, 0.5, 0.75, the middle three columns are the IV-FEQR estimates for $\tau$ = 0.25, 0.5, 0.75, and the last two columns correspond to the MLE and OLS estimates, respectively. We can see that the IV-MDQR and IV-FEQR estimates both vary across quantiles (i.e., $\tau$ = 0.25, 0.5, 0.75). Except for $\beta_2$, the signs of the estimates agree across the IV-MDQR, IV-FEQR, MLE and OLS methods. At quantiles 0.25, 0.5 and 0.75, cigarette sales in neighbouring states have a positive effect on each other, the log average retail price of cigarettes has a negative effect on cigarette sales, and log disposable income generally has a positive effect on cigarette sales.
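A normality check of this kind can be sketched in plain Python as follows. This is an illustration, not the authors' code: the function names are hypothetical, the sample is standardized with its own mean and standard deviation, and the decision rule uses the large-sample 5% critical value of about 1.358 for the scaled Kolmogorov-Smirnov statistic. That critical value is derived for a fully specified null distribution, so with estimated mean and standard deviation the rule is conservative.

```python
import math


def ks_statistic_standard_normal(sample):
    """KS distance between the standardized sample and the N(0, 1) cdf."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    z = sorted((v - mean) / sd for v in sample)
    d = 0.0
    for k, value in enumerate(z, start=1):
        cdf = 0.5 * (1.0 + math.erf(value / math.sqrt(2.0)))
        # Compare the cdf with the empirical cdf just after and just before value.
        d = max(d, k / n - cdf, cdf - (k - 1) / n)
    return d


def rejects_normality(sample, critical=1.358):
    """Large-sample 5% rule: reject when sqrt(n) * D exceeds the critical value."""
    n = len(sample)
    return math.sqrt(n) * ks_statistic_standard_normal(sample) > critical
```

A rejection of normality is one motivation for using quantile regression here, since it does not rely on a Gaussian error distribution.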

PLOS ONE
Fig 2 presents a more complete analysis, which considers other quantiles of the conditional cigarette demand distribution. The x-axis shows the quantiles and the y-axis shows the IV-MDQR estimates of the parameters (red lines) and their corresponding confidence intervals (blue lines). We find that the cigarette retail price has a negative effect on per capita cigarette sales and disposable income has a positive effect on per capita cigarette sales at all quantile levels. In addition, the effects of per capita cigarette sales and disposable income are both larger at extreme quantiles.

Conclusion
In this paper, we investigate instrumental variable minimum distance quantile regression (IV-MDQR) estimation of spatial autoregressive panel data models with fixed individual effects. The instrumental …

First, $\hat\theta_i(\tau)$ is the IVQR estimator, which is computed using the time series data. Following [18], under Assumptions A1-A6, the IVQR estimator satisfies $(\hat\rho_i(\tau), \hat\beta_i(\tau), \hat\eta_i(\tau)) \to_p (\rho_0(\tau), \beta_0(\tau), \eta_{i0}(\tau))$ as $T \to \infty$ for each individual $i$.

2.
To show the consistency of $\hat\theta$ under joint asymptotics, we carry out the following computation: Under Assumptions A1-A5, based on Lemma 1 in [15], we have $\max_{1 \le i \le N} \| \theta_{i0}(\tau) - \cdots$

Proof of Theorem 3.2
Proof. We first derive the asymptotic normality of the IV-MDQR estimator under sequential asymptotics. Under Assumptions A1-A6, for each individual, the IVQR estimator $\hat\theta_i(\tau)$ converges to a Gaussian distribution: $\sqrt{T}(\hat\theta_i(\tau) - \theta_0(\tau)) \to_d N(0, \tilde V_i)$; … conditional density of $\varepsilon_{it}$ at the quantile of interest (see [18, 19]).