Minimum distance quantile regression for spatial autoregressive panel data models with fixed effects

  • Xiaowen Dai,

    Roles Data curation, Writing – original draft

    Affiliations School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai, China, Interdisciplinary Research Institute of Data Science, Shanghai Lixin University of Accounting and Finance, Shanghai, China

  • Libin Jin

    Roles Methodology, Writing – original draft

    jinlb1987@hotmail.com

    Affiliations School of Statistics and Mathematics, Shanghai Lixin University of Accounting and Finance, Shanghai, China, Interdisciplinary Research Institute of Data Science, Shanghai Lixin University of Accounting and Finance, Shanghai, China

Abstract

This paper considers a quantile regression model with individual fixed effects for spatial panel data. Efficient minimum distance quantile regression estimators based on the instrumental variable (IV) method are proposed for parameter estimation. The proposed estimator is computationally fast compared with the IV-FEQR estimator proposed by Dai et al. (2020). Asymptotic properties of the proposed estimators are also established. Simulations are conducted to study the performance of the proposed method. Finally, we illustrate our methodology using a cigarette demand data set.

1 Introduction

In the last few decades, spatial autoregressive (SAR) models have been studied and applied in many fields, such as economics, demography and geography. Panel data with spatial interaction are also of great interest, as they allow researchers to control for both heterogeneity and spatial correlation and to take dynamics into account (see [1–7]).

Recently, there has been a growing literature on estimation and testing of spatial panel data models. For instance, [7] proposed the maximum likelihood (ML) estimator for the SAR panel model with both spatial lag and spatial disturbances. Zhang and Shen [8] studied estimation of semi-parametric varying-coefficient spatial panel data models. Dai et al. [9] investigated fixed effects quantile regression for general spatial panel data models with both individual fixed effects and time period effects based on the instrumental variable method. Xu and Yang [10] proposed adjusted quasi score (AQS) tests for the existence of temporal heterogeneity in slope and spatial parameters in spatial panel data (SPD) models with fixed effects. Bai and Li [11] studied the quasi-maximum likelihood estimator of dynamic spatial panel data models with common shocks to deal with both weak and strong cross-sectional correlations. Li and Yang [12] developed M-estimation and inference methods for spatial dynamic panel data models with correlated random effects based on short panels. Zhang et al. [13] studied penalized quantile regression for spatial panel models with fixed effects. Except for [9, 13], all these works are based on (conditional) mean regression methods. Compared with mean regression, the quantile regression (QR) method is more robust and can handle data characterized by different error distributions.

However, in contrast to mean regression, there is no general transformation that can suitably eliminate the individual effects in the quantile regression framework (see [14, 15]). Thus the FEQR estimation (see [16]) is implemented by treating each individual effect as a parameter to be estimated, which creates computational difficulties. Hence, the IV-FEQR estimator (i.e., the FEQR estimator based on the instrumental variable method) used in [9] is also computationally cumbersome. To address these difficulties, [15] proposed the efficient minimum distance quantile regression (MDQR) method. Compared with the FEQR estimator, the MDQR estimator is computationally fast and easy to implement in practice. The computational advantage is particularly pronounced for large cross-sections.

In this paper, we employ the MDQR methodology for estimating the SAR panel data model with individual fixed effects. The instrumental variable (IV) method is employed to attenuate the estimation bias. The asymptotic properties of the IV-MDQR estimator are also developed. Monte Carlo simulations are conducted to assess the finite sample performance of the IV-MDQR, MDQR and IV-FEQR estimators. The computation speeds of IV-MDQR and IV-FEQR are also compared. Finally, we apply the proposed method to data on the demand for cigarettes.

The rest of the paper is organized as follows. Section 2 introduces the SAR panel data model with individual fixed effects. Section 3 proposes the IV-MDQR estimation procedure and discusses the asymptotic properties of the IV-MDQR estimators. Proofs of the theorems in Section 3 are given in the Appendix. Section 4 reports a simulation study assessing the finite sample performance of the proposed estimators. An empirical illustration is considered in Section 5. Section 6 concludes the paper.

2 The model with individual fixed effects

Consider the following spatial autoregressive panel data (SARP) model with individual fixed effects:

$$y_{it} = \rho \sum_{j=1}^{N} w_{ij} y_{jt} + X_{it}^{\top}\beta + \eta_i + \varepsilon_{it}, \quad i = 1, \dots, N,\; t = 1, \dots, T, \tag{1}$$

where $y_{it}$ is the dependent variable for subject i at time t, $X_{it}$ is a p × 1 vector of explanatory variables, $w_{ij}$ is the (i, j)th element of W, W is an N × N non-stochastic spatial weight matrix reflecting spatial dependence in $y_{it}$ among cross-sectional units, and $\varepsilon_{it}$ is the random disturbance term. The parameters $\eta_i$, i = 1, ⋯, N, are fixed effects for the regions. Interaction effects are reflected in the spatial lag variable $\sum_{j} w_{ij} y_{jt}$ (and the associated scalar parameter ρ).

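We consider the following conditional τ-quantile of the response variable:

$$Q_\tau(y_{it} \mid D_{it}, X_{it}, Z_i) = \rho(\tau) D_{it} + X_{it}^{\top}\beta(\tau) + \eta_i(\tau), \tag{2}$$

where τ is a quantile level in the interval (0, 1), $Q_\tau(\varepsilon_{it} \mid D_{it}, X_{it}, Z_i) = 0$, $D_{it} = \sum_{j=1}^{N} w_{ij} y_{jt}$ is the spatial lag, $Z_i = Z h_i$ is an indicator variable for the individual effect $\eta_i$, $h_i$ is an NT × 1 vector with the ith element equal to 1 and the rest equal to 0, $Z = 1_T \otimes I_N$ is an NT × N matrix, and $1_T$ is the T × 1 vector with all elements equal to 1.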

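The FEQR estimator can then be obtained by minimizing the following objective function:

$$\min_{\rho, \beta, \eta_1, \dots, \eta_N} \sum_{i=1}^{N} \sum_{t=1}^{T} \rho_\tau\left(y_{it} - \rho D_{it} - X_{it}^{\top}\beta - \eta_i\right), \tag{3}$$

where $\rho_\tau(u) = u(\tau - I(u \le 0))$ is the check function and I(⋅) is the indicator function (see, e.g., [17]).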
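As a small illustration of the check function, the sketch below uses its standard form ρτ(u) = u(τ − I(u ≤ 0)) from [17] and shows that minimizing the summed check loss over a constant recovers a sample quantile. The function names are hypothetical and chosen only for this example.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u <= 0))."""
    indicator = 1.0 if u <= 0 else 0.0
    return u * (tau - indicator)


def total_check_loss(data, q, tau):
    """Sum of check losses of the residuals data[k] - q."""
    return sum(check_loss(y - q, tau) for y in data)


def quantile_by_check_loss(data, tau):
    """Return a minimiser of the total check loss over the observed values.

    A minimiser is always attained at one of the data points, so scanning
    the sorted sample is enough for this illustration.
    """
    return min(sorted(data), key=lambda q: total_check_loss(data, q, tau))
```

For τ = 0.5 the loss is half the absolute deviation, so the minimiser is a sample median; other values of τ tilt the penalty between positive and negative residuals.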

Galvao and Wang [15] argued that, unlike in mean regression, the individual effects cannot be suitably eliminated via transformation in the FEQR estimator. Thus the FEQR estimator is implemented by treating each individual effect ηi as a parameter to be estimated. Therefore, if the number of individuals is large, the FEQR estimator involves optimization over a large number of parameters, which makes the problem computationally cumbersome, and inference using the FEQR estimator is difficult to conduct in practice. For this reason, we employ the minimum distance quantile regression (MDQR) estimator (see [15]) for estimation.

Denote θ = (ρ, β). The MDQR estimation of model (1) can be implemented via the following two steps:

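  1. Step 1. For each individual i, obtain the QR estimates $\hat{\theta}_i$ and $\hat{\eta}_i$ using the time series data of that individual. Denote by $V_i$ the associated variance-covariance matrix of $\hat{\theta}_i$; its expression involves $f_i$, the conditional density of $\varepsilon_{it}$ at the quantile of interest.
  2. Step 2. The MDQR estimator is then defined by $$\hat{\theta}_{MD} = \Big(\sum_{i=1}^{N} \hat{V}_i^{-1}\Big)^{-1} \sum_{i=1}^{N} \hat{V}_i^{-1} \hat{\theta}_i,$$ where $\hat{V}_i$ is the estimator of $V_i$.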
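The second step is an inverse-variance weighted average of the individual estimates, following the minimum distance form in [15]. A minimal pure-Python sketch is given below; it assumes the individual estimates and their estimated variance-covariance matrices are already available from Step 1, and the helper names (`mat_inv`, `mat_vec`, `md_combine`) are hypothetical.

```python
def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_vec(a, x):
    """Matrix-vector product for lists of lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def md_combine(thetas, covs):
    """Minimum distance combination: (sum V_i^-1)^-1 * sum V_i^-1 theta_i."""
    k = len(thetas[0])
    weight_sum = [[0.0] * k for _ in range(k)]
    weighted_theta = [0.0] * k
    for theta_i, v_i in zip(thetas, covs):
        w_i = mat_inv(v_i)                      # weight = inverse covariance
        for r in range(k):
            for c in range(k):
                weight_sum[r][c] += w_i[r][c]
        contrib = mat_vec(w_i, theta_i)
        weighted_theta = [a + b for a, b in zip(weighted_theta, contrib)]
    return mat_vec(mat_inv(weight_sum), weighted_theta)
```

Because only N small matrices are inverted and summed, this step avoids a joint optimization over all individual effects; individuals with more precise estimates receive larger weights.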

3 The IV-MDQR estimator

However, model (2) contains an endogenous variable, namely the spatial lag Dit, which can cause biased estimation. The MDQR estimator of model (2) is therefore biased, especially for the spatial correlation coefficient ρ. This bias in quantile regression for spatial autoregressive panel data models can be ameliorated through the use of instrumental variables, which we employ in this section.
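To see where the endogeneity comes from, it can help to stack model (1) over the N units at time t. The derivation below is only a sketch: it assumes that $I_N - \rho W$ is invertible, and the stacked symbols $y_t$, $X_t$, $\eta$, $\varepsilon_t$ and the notation $[\cdot]_i$ for the ith element are introduced here for illustration.

```latex
\begin{aligned}
y_t &= \rho W y_t + X_t \beta + \eta + \varepsilon_t, \\
y_t &= (I_N - \rho W)^{-1}\left(X_t \beta + \eta + \varepsilon_t\right), \\
D_{it} &= [W y_t]_i
        = \left[ W (I_N - \rho W)^{-1}\left(X_t \beta + \eta + \varepsilon_t\right) \right]_i .
\end{aligned}
```

The spatial lag is thus a function of the whole error vector at time t. When the expansion $W(I_N - \rho W)^{-1} = W + \rho W^2 + \cdots$ is valid and ρ ≠ 0, the higher-order terms generally have non-zero diagonal elements, so $D_{it}$ depends on $\varepsilon_{it}$ itself even though W has a zero diagonal.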

Suppose the endogenous variable Dit is related to a vector of instruments ωit, and the instruments ωit are independent of εit. Following [18–20], and assuming the availability of instrumental variables ωit, we can derive the IV-MDQR estimator via the following four steps:

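  1. Step 1. For each individual i and a given quantile τ, define a suitable grid of values $\{\rho_j, j = 1, \cdots, J; |\rho_j| < 1\}$. For each $\rho_j$, obtain the ordinary QR estimates for individual i from the time series data by minimizing the following objective function: $$\big(\hat{\beta}_i(\rho_j, \tau), \hat{\eta}_i(\rho_j, \tau), \hat{\gamma}_i(\rho_j, \tau)\big) = \arg\min_{\beta, \eta_i, \gamma} \sum_{t=1}^{T} \rho_\tau\left(y_{it} - \rho_j D_{it} - X_{it}^{\top}\beta - \eta_i - \omega_{it}^{\top}\gamma\right), \tag{4}$$ where γ is the coefficient of the instrumental variable $\omega_{it}$.
  2. Step 2. Choose the value among $\{\rho_j, j = 1, \cdots, J\}$ that makes a weighted distance function defined on γ closest to 0: $$\hat{\rho}_i(\tau) = \arg\min_{\rho \in \mathcal{R}} \hat{\gamma}_i(\rho, \tau)^{\top} A \, \hat{\gamma}_i(\rho, \tau), \tag{5}$$ where A is a positive definite weighting matrix and $\mathcal{R}$ is the parameter space of ρ.
  3. Step 3. The IVQR estimates of $\beta$ and $\eta_i$ are then obtained as $\hat{\beta}_i(\tau) = \hat{\beta}_i(\hat{\rho}_i(\tau), \tau)$ and $\hat{\eta}_i(\tau) = \hat{\eta}_i(\hat{\rho}_i(\tau), \tau)$, respectively. Therefore, $\hat{\theta}_i = (\hat{\rho}_i(\tau), \hat{\beta}_i(\tau)^{\top})^{\top}$.
  4. Step 4. The IV-MDQR estimator of the SAR panel data model (1) is then defined by $$\hat{\theta}_{IVMD} = \Big(\sum_{i=1}^{N} \hat{V}_i^{-1}\Big)^{-1} \sum_{i=1}^{N} \hat{V}_i^{-1} \hat{\theta}_i, \tag{6}$$ where $\hat{V}_i$ is the estimate of $V_i$, the associated variance-covariance matrix of the IVQR estimator for each individual i; its expression involves $f_i$, the conditional density of $\varepsilon_{it}$ at the quantile of interest.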
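The grid search in Steps 1 to 3 can be sketched for a single individual as follows. This is an illustration under simplifying assumptions, not the authors' implementation: the regressor and the instrument are scalars, so the weighted distance in Step 2 reduces to |γ|, and the quantile regression is solved by brute-force enumeration of elemental subsets, which is exact but only practical for very short series (a linear-programming QR solver would be used in practice). All function names are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u <= 0))."""
    return u * (tau - (1.0 if u <= 0 else 0.0))


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan; None if singular."""
    n = len(a)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-10:
            return None
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def quantile_fit(rows, y, tau):
    """QR fit by enumerating fits through k observations (tiny T only)."""
    k = len(rows[0])
    best, best_loss = None, float("inf")
    for idx in combinations(range(len(y)), k):
        coef = solve([rows[i] for i in idx], [y[i] for i in idx])
        if coef is None:
            continue
        loss = sum(check_loss(yi - sum(c * r for c, r in zip(coef, row)), tau)
                   for row, yi in zip(rows, y))
        if loss < best_loss:
            best, best_loss = coef, loss
    return best


def ivqr_individual(y, d, x, w, tau, rho_grid):
    """Steps 1-3 for one individual: scan the grid, keep the rho with smallest |gamma|."""
    rows = [[1.0, xt, wt] for xt, wt in zip(x, w)]  # intercept, x, instrument
    best = None
    for rho in rho_grid:
        y_tilde = [yt - rho * dt for yt, dt in zip(y, d)]   # y - rho_j * D
        eta, beta, gamma = quantile_fit(rows, y_tilde, tau)
        if best is None or abs(gamma) < best[0]:
            best = (abs(gamma), rho, beta, eta)
    _, rho_hat, beta_hat, eta_hat = best
    return rho_hat, beta_hat, eta_hat
```

The idea is that at the true ρ the instrument should carry no remaining explanatory power, so its coefficient is driven toward zero. The individual estimates returned here would then be combined across i by the inverse-variance weighting of Step 4.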

Remark 3.1. For each individual i, we need instruments for the endogenous variable $D_{it} = W_i y_t$, where $W_i$ is the ith row of the spatial weight matrix W and $y_t = (y_{1t}, \dots, y_{Nt})^{\top}$. The instruments need to satisfy the following two conditions: (i) the instruments $\omega_{it}$ affect the endogenous variable $D_{it}$; (ii) the instruments $\omega_{it}$ are independent of the random error $\varepsilon_{it}$. In practice, for the spatial autoregressive panel data model (1), we can choose the time lag of $y_{it}$, i.e., $y_{i,t-1}$, and the spatial lag of the explanatory variable, i.e., $W_i X_t$ with $X_t = (X_{1t}, \dots, X_{Nt})^{\top}$, as instrumental variables.
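The two candidate instruments in Remark 3.1 can be built directly from the panel and the weight matrix. The sketch below assumes a scalar regressor and data stored as `y[t][i]` and `x[t][i]`; the function and field names are hypothetical.

```python
def spatial_lag(W, v):
    """Return W v, i.e. sum_j w_ij * v_j for every unit i."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def build_iv_panel(W, y, x):
    """Assemble D_it and the instruments y_{i,t-1} and (W x_t)_i.

    y[t][i] and x[t][i] hold T lists of N values. Records start at t = 1
    because the time-lag instrument is unavailable in the first period.
    """
    records = []
    for t in range(1, len(y)):
        d_t = spatial_lag(W, y[t])     # endogenous spatial lag D_it
        wx_t = spatial_lag(W, x[t])    # instrument: spatial lag of x
        for i in range(len(W)):
            records.append({
                "i": i, "t": t,
                "y": y[t][i], "x": x[t][i], "D": d_t[i],
                "iv_ylag": y[t - 1][i],
                "iv_wx": wx_t[i],
            })
    return records
```

Using the time lag as an instrument costs one period per individual, which matters little when T is large but should be kept in mind for short panels.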

3.1 Asymptotic theory

In this section, we investigate the asymptotic properties of the IV-MDQR estimator. We impose the following regularity conditions:

  1. A1 {(yit, Xit)} is independent across individuals, and is independent and identically distributed (i.i.d.) within each i.
  2. A2 For all , is in the interior of the set , and is compact and convex.
  3. A3 , and .
  4. A4 W is a non-stochastic spatial weight matrix with zero diagonal elements. W is uniformly bounded in both row and column sums in absolute value.
  5. A5 For each individual i, for (7) (8) where Δi = [ωi, Xi, 1T], Xi = (Xi1, ⋯, XiT), ωi = (ωi1, ⋯, ωiT). The Jacobian matrices and are continuous and have full rank uniformly over . The parameter space is a connected set and the image of under the map is simply connected.
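  6. A6 The conditional density $f_i(\varepsilon \mid D, X)$ is continuously differentiable for each i. There exist constants $0 < C_L \le C_U < \infty$ such that $f_i(\varepsilon \mid D, X) \le C_U$ uniformly over $(\varepsilon, D, X)$ and i ≥ 1, and $f_i(0 \mid D, X) \ge C_L$ uniformly over $(D, X)$ and i ≥ 1; and there exists $C_f > 0$ such that $|f_i'(\varepsilon \mid D, X)| \le C_f$ uniformly over $(\varepsilon, D, X)$ and i ≥ 1. Here, $f_i'(\varepsilon \mid D, X) = \partial f_i(\varepsilon \mid D, X) / \partial \varepsilon$ is the first-order derivative of the density $f_i(\varepsilon \mid D, X)$.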
  7. A7 There exists δS > 0 such that min1≤iN min eig(Si(τ)) ≥ δS, where , .

Assumptions A1-A3 and A6 are standard in the literature on quantile regression for panel data. Assumption A4 originates from [21, 22] and is also used in [7, 9]. Assumption A5 is a standard assumption in the instrumental variable quantile regression literature. Assumption A7 ensures that the inverses of $S_i(\tau)$ are bounded uniformly across i. Assumptions A3 and A7 guarantee that both the variance-covariance matrices $V_i$ and their inverses are bounded uniformly across i.
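For a given weight matrix, the finite-sample quantities behind Assumption A4 are easy to inspect. The sketch below, with a hypothetical function name, checks the zero diagonal and reports the largest absolute row and column sums. Uniform boundedness is a statement about the whole sequence of matrices as N grows, so a check at a single N is only indicative.

```python
def check_weight_matrix(W, tol=1e-12):
    """Report the A4-type diagnostics of a square weight matrix W."""
    n = len(W)
    zero_diag = all(abs(W[i][i]) <= tol for i in range(n))
    max_row_sum = max(sum(abs(v) for v in row) for row in W)
    max_col_sum = max(sum(abs(W[i][j]) for i in range(n)) for j in range(n))
    return {
        "zero_diagonal": zero_diag,
        "max_abs_row_sum": max_row_sum,
        "max_abs_col_sum": max_col_sum,
    }
```

A row-normalized matrix has all row sums equal to one by construction, so in that case the column sums are the quantity worth monitoring as N increases.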

In applications, the variance-covariance matrices are unknown and need to be estimated. Following [15], when T and N tend to infinity sequentially, we impose the following assumption:

  1. A8 for each i as T → ∞. Assume that , where exists and is non-singular.
    When N and T tend to infinity simultaneously, we make the following assumption:
  2. A8′ for some hN → 0 uniformly across i and as N → ∞. Assume that , where exists and is non-singular.

We can now establish consistency and asymptotic normality of the IV-MDQR estimator. Proofs are given in the Appendix.

Theorem 3.1. 1. Under assumptions A1-A6 and A8, (ρ(τ), β(τ)) is consistently estimable as (T, N)seq → ∞.

2. Under assumptions A1-A6 and A8′, (ρ(τ), β(τ)) is consistently estimable as (T, N)→∞ and logN/T → 0.

Theorem 3.2. 1. Under assumptions A1-A7 and A8, as (T, N)seq → ∞,

2. Under assumptions A1-A7 and A8′, as (T, N) → ∞ and ,

4 Monte Carlo simulations

In this section, we conduct Monte Carlo simulations to investigate the finite sample performance of the IV-MDQR estimator. We report the average bias and root mean squared error (RMSE). We are mainly interested in comparing the performance of the IV-MDQR estimator with that of two other QR estimators, MDQR and IV-FEQR. Each Monte Carlo experiment is repeated 1000 times. We consider several sample sizes and quantiles: N ∈ {50, 100, 200}, T ∈ {50, 100}, and τ ∈ {0.25, 0.5, 0.75}. For IV-MDQR and IV-FEQR, we search ρ from .

The samples are generated as follows:

Example 1. (homoscedastic case)

Example 2. (heteroscedastic case)

In both cases, we set ρ = 0.5 and β = 1. X and η are drawn from the U(−2, 2) and N(0, 1) distributions, respectively. The spatial weight matrix W is generated based on the mechanism considered in Zhang and Shen [8]. The errors are $\varepsilon_{it} = e_{it} - F^{-1}(\tau)$, where F is the common CDF of $e_{it}$, so that the random errors $\varepsilon_{it}$ are centered to have zero τth quantile. For the disturbances $e_{it}$, we consider the standard normal distribution N(0, 1).
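A data-generating process of this kind can be sketched as below for the homoscedastic design only, assuming that it follows model (1). The ring-shaped weight matrix is a stand-in chosen for illustration and is not the mechanism of [8]; the function names and the fixed-point solver are likewise assumptions of this sketch.

```python
import random
from statistics import NormalDist


def ring_weights(n):
    """Hypothetical row-normalised W: neighbours are i-1 and i+1 on a ring."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] += 0.5
        W[i][(i + 1) % n] += 0.5
    return W


def centred_errors(n, tau, rng):
    """Draw e ~ N(0, 1) and subtract F^{-1}(tau) so the tau-th quantile is 0."""
    shift = NormalDist().inv_cdf(tau)
    return [rng.gauss(0.0, 1.0) - shift for _ in range(n)]


def solve_sar(W, rho, rhs, n_iter=200):
    """Solve y = rho * W y + rhs by fixed-point iteration.

    Converges when |rho| times the largest absolute row sum of W is below 1.
    """
    y = list(rhs)
    for _ in range(n_iter):
        wy = [sum(w * v for w, v in zip(row, y)) for row in W]
        y = [rho * a + b for a, b in zip(wy, rhs)]
    return y


def simulate_panel(n, t_len, tau, rho=0.5, beta=1.0, seed=0):
    """Generate X[t][i] and Y[t][i] from the SAR panel model with fixed effects."""
    rng = random.Random(seed)
    W = ring_weights(n)
    eta = [rng.gauss(0.0, 1.0) for _ in range(n)]        # fixed effects
    X, Y = [], []
    for _ in range(t_len):
        x_t = [rng.uniform(-2.0, 2.0) for _ in range(n)]
        eps_t = centred_errors(n, tau, rng)
        rhs = [beta * x + a + e for x, a, e in zip(x_t, eta, eps_t)]
        X.append(x_t)
        Y.append(solve_sar(W, rho, rhs))
    return W, X, Y, eta
```

Subtracting $F^{-1}(\tau)$ changes only the location of the errors, so the slope parameters keep the same true values at every quantile in this design.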

Tables 1 and 2 report the bias and RMSE of the QR estimators in the homoscedastic and heteroscedastic cases, respectively. For IV-MDQR and IV-FEQR, we considered two different instruments, $y_{i,t-1}$ and the spatial lag of $X_{it}$. The results are similar for both instruments, so we present only the results for $y_{i,t-1}$. From Tables 1 and 2, we see that the bias and RMSE of all estimators except MDQR clearly decrease as the sample size increases. The IV-MDQR estimator performs considerably better than the MDQR estimator, which shows that the instrumental variable method effectively reduces the estimation bias. For estimating the coefficient β, the IV-MDQR and IV-FEQR estimators perform similarly. For estimating the spatial correlation coefficient ρ, the IV-MDQR estimator has larger bias but smaller RMSE than the IV-FEQR estimator.
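The two summary measures reported in the tables can be computed from the replicated estimates as in the following sketch, where `estimates` is assumed to hold one estimate of a scalar parameter per Monte Carlo replication and the function name is hypothetical.

```python
from math import sqrt


def bias_and_rmse(estimates, true_value):
    """Average bias and root mean squared error over Monte Carlo replications."""
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / len(errors)
    rmse = sqrt(sum(err * err for err in errors) / len(errors))
    return bias, rmse
```

Since the squared RMSE equals the squared bias plus the variance of the estimates, an estimator can have a larger bias and still a smaller RMSE when its variance is sufficiently smaller.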

Table 1. Bias and RMSE (in parentheses) of various estimators in the homoscedastic case.

https://doi.org/10.1371/journal.pone.0261144.t001

Table 2. Bias and RMSE (in parentheses) of various estimators in the heteroscedastic case.

https://doi.org/10.1371/journal.pone.0261144.t002

Next, we compare the computing time of the IV-MDQR and IV-FEQR estimators at one particular quantile, τ = 0.5, in Example 1. We are interested in the elapsed time, i.e., the time required for one replication of the simulation. The results are summarized in Table 3, which shows that as the sample size increases, the computing times of both estimators increase, but the computing time of the IV-FEQR estimator grows much faster than that of the IV-MDQR estimator.
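Elapsed time for a single replication can be measured with a small wrapper such as the one below. It is a generic sketch: `estimator` stands for any callable that runs one replication, and the wrapper name is hypothetical.

```python
import time


def time_one_replication(estimator, *args, **kwargs):
    """Run one replication and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = estimator(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Wall-clock timings depend on the hardware and on other processes running at the same time, so comparisons are most meaningful when both estimators are timed on the same machine and data.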

Table 3. The computing time (in seconds) of IV-FEQR and IV-MDQR estimators required for one replication under different sample size.

https://doi.org/10.1371/journal.pone.0261144.t003

Moreover, we are also interested in whether the computing time of the estimators is more sensitive to T or to N. We consider the following two situations: (1) fix T = 100 and let N vary in {10, 50, 100, 250, 500}; (2) fix N = 100 and let T vary in {10, 50, 100, 250, 500}. The results are summarized in Table 4. The computing times of both estimators are much more sensitive to N than to T, but the IV-MDQR estimator is much less sensitive than the IV-FEQR estimator.

Table 4. The computing time (in seconds) of IV-FEQR and IV-MDQR estimators required for one replication under different sample size.

https://doi.org/10.1371/journal.pone.0261144.t004

5 Illustration

In this section, we employ the cigarette demand data set (https://spatial-panels.com/software/) to illustrate our methodology. The cigarette demand data set has been analyzed by many authors (see [9, 23–27]). The data set is based on a panel of 46 states and covers 1963 to 1992. The spatial weight matrix W is also given in the data set. We take the following two variables as explanatory variables: the logarithm of the average retail price of a pack of cigarettes measured in real terms (X1), and the logarithm of real per capita disposable income (X2). The dependent variable yit is the logarithm of real per capita sales of cigarettes to persons of smoking age (14 years and older).

First, the Kolmogorov-Smirnov test is employed to test whether the standardized y follows the standard normal distribution. The normality assumption is rejected at the 0.05 significance level. Fig 1 plots the probability density of the response y, which shows that the density of y has larger kurtosis than N(0, 1).
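A one-sample Kolmogorov-Smirnov check of this kind can be sketched in pure Python as follows. The function name is hypothetical, the p-value uses the asymptotic Kolmogorov series, and standardizing with the sample mean and standard deviation makes the nominal p-value only approximate (the Lilliefors correction addresses this), so the sketch should be read as an illustration rather than the exact procedure used here.

```python
from math import exp, sqrt
from statistics import NormalDist, mean, stdev


def ks_normality_test(y, n_terms=100):
    """KS statistic and asymptotic p-value for standardized y against N(0, 1)."""
    n = len(y)
    m, s = mean(y), stdev(y)
    z = sorted((v - m) / s for v in y)
    cdf = NormalDist().cdf
    d_stat = 0.0
    for k, value in enumerate(z):
        f = cdf(value)
        # Compare F with the empirical CDF just after and just before z[k].
        d_stat = max(d_stat, (k + 1) / n - f, f - k / n)
    lam = sqrt(n) * d_stat
    if lam < 0.3:
        # The series converges slowly here and the p-value is numerically 1.
        return d_stat, 1.0
    series = sum((-1) ** (j - 1) * exp(-2.0 * j * j * lam * lam)
                 for j in range(1, n_terms + 1))
    p_value = min(1.0, max(0.0, 2.0 * series))
    return d_stat, p_value
```

A p-value below 0.05 would correspond to rejecting normality at the significance level used in the text.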

Fig 1. Probability density plots of y in the cigarette demand data set (blue line) and N(0, 1) (orange line).

https://doi.org/10.1371/journal.pone.0261144.g001

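Next, we employ the spatial autoregressive panel data model for the analysis. The fitted model takes the form:

$$y_{it} = \rho \sum_{j=1}^{N} w_{ij} y_{jt} + \beta_1 X_{1,it} + \beta_2 X_{2,it} + \eta_i + \varepsilon_{it}, \tag{9}$$

where i = 1, ⋯, 46 and t = 1, ⋯, 30.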

We estimate the parameters using the IV-MDQR, IV-FEQR, MLE, and OLS methods. The results are presented in Table 5. The first three columns are the IV-MDQR estimates for τ = 0.25, 0.5, 0.75, the middle three columns are the IV-FEQR estimates for τ = 0.25, 0.5, 0.75, and the last two columns correspond to the MLE and OLS estimates, respectively. We can see that the IV-MDQR and IV-FEQR estimates both vary across quantiles (i.e., τ = 0.25, 0.5, 0.75). Except for β2, the signs of the estimates are the same among the IV-MDQR, IV-FEQR, MLE and OLS methods. At quantiles 0.25, 0.5 and 0.75, cigarette sales in neighbouring states have a positive effect on each other, the log average retail price of cigarettes has a negative effect on cigarette sales, and the log disposable income generally has a positive effect on cigarette sales.

Table 5. Estimation results of cigarette demand based on spatial autoregressive panel data models.

https://doi.org/10.1371/journal.pone.0261144.t005

Fig 2 presents a more complete analysis, which considers other quantiles of the conditional cigarette demand distribution. The x-axis shows the quantiles and the y-axis shows the IV-MDQR estimates of the parameters (red lines) and their corresponding confidence intervals (blue lines). We find that the retail price of cigarettes has a negative effect on per capita sales of cigarettes and that disposable income has a positive effect on per capita sales of cigarettes at all quantile levels. In addition, the effects of per capita sales of cigarettes and disposable income are both larger at extreme quantiles.

Fig 2. Quantile effects of the spatial correlation coefficient, the log average retail price of a pack of cigarettes and the log disposable income.

The areas represent 95% point-wise confidence intervals.

https://doi.org/10.1371/journal.pone.0261144.g002

6 Conclusion

In this paper, we investigate instrumental variable minimum distance quantile regression (IV-MDQR) estimation of spatial autoregressive panel data models with individual fixed effects. The instrumental variable method is employed for bias reduction, and the asymptotic properties of the estimator are studied. Monte Carlo results show that the proposed methodology effectively reduces the estimation bias and is computationally advantageous.

7 Appendix: Proofs

7.1 Proof of Theorem 3.1

Proof. Denote . Firstly, is the IVQR estimator which is computed using the time series data. Following [18], under assumptions A1-A6, the IVQR estimation of as T → ∞ for each individual i.

  1. By assumption A8, for each i, it follows that for fixed N, as T → ∞ Hence, it follows that as (T, N)seq → ∞.
  2. To show the consistency of for joint asymptotics, we do the following computation: Under assumptions A1-A5, based on Lemma 1 in [15], we have as (N, T)→∞ and . Hence, the last equation above is equal to op(1).

7.2 Proof of Theorem 3.2

Proof. We first derive the asymptotic normality of the IV-MDQR estimator under sequential asymptotics. Under Assumptions A1-A6, for each individual, the IVQR estimator converges to a Gaussian distribution: where , , , , , is the conditional density of εit at the quantile of interest (see, [18, 19]).

By assumption A8 and Slutsky’s theorem, as T → ∞. Fix N and let T tend to infinity, then we have Then let N tend to infinity, we can obtain . Moreover, by Lyapunov Central Limit Theorem, it follows that . Hence, by Slutsky’s theorem, we can obtain the following result:

Following, we derive the asymptotic normality of the IV-MDQR estimator under joint asymptotics. Let Ξ = [0|Ip+1]. Under assumption A8′, we can do the following computation: Based on Lemmas 4 and 5 in [15], we can obtain that for each i, where , , , , .

Then it follows that

As (T, N)→∞ and , the second term of the above equation is op(1). Moreover, by Lyapunov central limit theorem, the first term of the above equation converges in distribution to . Hence, by Slutsky’s theorem, we can obtain the following result: as (T, N)→∞ and .

Acknowledgments

We thank the editor, the associate editor and two referees for their helpful comments which led to a considerable improvement of the original article.

References

  1. Anselin L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic Publishers, The Netherlands.
  2. Baltagi B.H., Song S.H., Koh W. (2003). Testing panel data regression models with spatial error correlation. Journal of Econometrics, 117, 123–150.
  3. Baltagi B.H., Egger P., Pfaffermayr M. (2013). A generalized spatial panel data model with random effects. Econometric Reviews, 32, 650–685.
  4. Elhorst J.P. (2003). Specification and estimation of spatial panel data models. International Regional Science Review, 26, 244–268.
  5. Yu J., de Jong R., Lee L.F. (2008). Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics, 146, 118–134.
  6. Yu J., Lee L.F. (2010). Estimation of unit root spatial dynamic panel data models. Econometric Theory, 26, 1332–1362.
  7. Lee L.F., Yu J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics, 154, 165–185.
  8. Zhang Y., Shen D. (2015). Estimation of semi-parametric varying-coefficient spatial panel models with random-effects. Journal of Statistical Planning and Inference, 159, 64–80.
  9. Dai X., Yan Z., Tian M., Tang M. (2020). Quantile regression for general spatial panel data models with fixed effects. Journal of Applied Statistics, 47(2), 45–60.
  10. Xu Y., Yang Z. (2020). Specification tests for temporal heterogeneity in spatial panel data models with fixed effects. Regional Science and Urban Economics, 81, 103488.
  11. Bai J., Li K. (2021). Dynamic spatial panel data models with common shocks. Journal of Econometrics, 224(1), 134–160.
  12. Li L., Yang Z. (2021). Spatial dynamic panel data models with correlated random effects. Journal of Econometrics, 221(2), 424–454.
  13. Zhang Y., Jiang J., Feng Y. (2021). Penalized quantile regression for spatial panel data with fixed effects. Communications in Statistics - Theory and Methods.
  14. Kato K., Galvao A., Montes-Rojas G. (2012). Asymptotics for panel quantile regression models with individual effects. Journal of Econometrics, 170, 76–91.
  15. Galvao A.F., Wang L. (2015). Efficient minimum distance estimator for quantile regression fixed effects panel data. Journal of Multivariate Analysis, 133, 1–26.
  16. Koenker R. (2004). Quantile regression for longitudinal data. Journal of Multivariate Analysis, 91, 74–89.
  17. Koenker R. (2005). Quantile Regression. Cambridge University Press, New York.
  18. Chernozhukov V., Hansen C. (2006). Instrumental quantile regression inference for structural and treatment effect models. Journal of Econometrics, 132, 491–525.
  19. Chernozhukov V., Hansen C. (2008). Instrumental variable quantile regression: A robust inference approach. Journal of Econometrics, 142, 379–398.
  20. Galvao A.F. (2011). Quantile regression for dynamic panel data with fixed effects. Journal of Econometrics, 164, 142–157.
  21. Kelejian H.H., Prucha I.R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, 17(1), 99–121.
  22. Kelejian H.H., Prucha I.R. (2001). On the asymptotic distribution of the Moran I test statistic with applications. Journal of Econometrics, 104, 219–257.
  23. Baltagi B.H. (2001). Econometric Analysis of Panel Data. John Wiley & Sons.
  24. Baltagi B.H., Li D. (2006). Prediction in the panel data model with spatial correlation. Spatial Economic Analysis, 1(2), 175–185.
  25. Yang Z. (2006). Quasi-maximum likelihood estimation for spatial panel data regressions. Research Collection School of Economics.
  26. Elhorst J.P. (2005). Unconditional maximum likelihood estimation of linear and log-linear dynamic models for spatial panels. Geographical Analysis, 37, 85–106.
  27. Kelejian H.H., Piras G. (2013). A J-test for panel models with fixed effects, spatial and time dependence. Regional Research Institute Working Paper Series.