DGQR estimation for interval censored quantile regression with varying-coefficient models

  • ChunJing Li,

    Roles Writing – original draft

    Affiliation School of Mathematics and Statistics, Changchun University of Technology, Changchun, China

  • Yun Li,

    Roles Software, Writing – original draft

    Affiliation School of Mathematics and Statistics, Changchun University of Technology, Changchun, China

  • Xue Ding,

    Roles Writing – original draft

    Affiliation School of Mathematics and Statistics, Changchun University of Technology, Changchun, China

  • XiaoGang Dong

    Roles Writing – review & editing

    dongxiaogang@ccut.edu.cn

    Affiliation School of Mathematics and Statistics, Changchun University of Technology, Changchun, China

Abstract

This paper proposes a direct generalization quantile regression (DGQR) estimation method for varying-coefficient quantile regression with interval censored data; it directly generalizes quantile regression estimation for completely observed data. The consistency and asymptotic normality of the estimators are established. An advantage of the proposed method is that it does not require the censoring vectors to be identically distributed. The effectiveness of the method is verified by simulation studies and a real data example.

Introduction

Varying-coefficient models are among the popular models proposed to mitigate the curse of dimensionality. They are natural extensions of classical parametric models and are widely used in data analysis thanks to their flexibility and interpretability. Varying-coefficient models were first introduced by Cleveland [1]. Hastie and Tibshirani [2] extended them to regression models and generalized regression models. Huang and Wu [3] proposed an inference procedure for the varying-coefficient model based on a resampling-subject bootstrap. There are now many results on parameter estimation in quantile regression for varying-coefficient models. For example, Honda [4] considered varying-coefficient quantile regression. Cai and Xu [5] studied quantile regression estimation for dynamic varying-coefficient models. Yuan and Ju [6] considered a varying-coefficient quantile regression model in which some covariates are missing at random, and proposed a weighted estimator based on empirical likelihood. Tang and Zhou [7] used the inverse probability weighted method in the varying-coefficient composite quantile regression model with covariates missing at random. Sun and Sun [8] proposed optimal inverse probability weighted estimation of the regression parameters in the varying-coefficient quantile regression model when the selection probabilities are known.

We focus on the following varying-coefficient quantile regression model in this article:

$$Q_\tau(y_i \mid x_i) = x_i^{\top} \beta_\tau(T),$$

where τ ∈ (0, 1), yi is the response variable of interest, which may represent the time to occurrence of some event, such as death or disease, or some transformation of that time [9], and xi is an observable covariate vector. Qτ(yi|xi) is the conditional quantile function [10] of yi given xi, and βτ(T) ∈ Rm is the coefficient function vector, which depends on τ.

However, in some practical applications, yi may not be fully observed because of censoring. For example, the response variable yi may be subject to interval censoring: one does not observe yi itself but only a censoring vector (t1i, t2i) satisfying P(t1i < yi ≤ t2i) = 1. Interval censored data arise naturally in many clinical trials and longitudinal studies in which individuals are examined regularly but not continuously. Sun [11] discusses several important topics concerning interval-censored failure time data that occur in practice. Feng and Duan [12] studied interval-censored data in which the distribution of, or the mechanism behind, the censoring variables may depend on the treatment, so that it differs between treatment groups. Chay and Powell [13], Ji and Peng [14], Li and Zhang [15], and Lin and He [16] considered linear regression with interval censored data. Zhou and Feng [17] proposed an estimation method for quantile regression models with interval censored data. For varying-coefficient quantile regression with censored data, Yin and Zeng [18] proposed a varying-coefficient quantile regression model subject to random censoring. Xie and Zhou [19] adopted an inverse probability weighting approach to estimate regression quantiles in a varying-coefficient model under random censoring. None of these studies considers estimation of the coefficient functions from interval censored data.

The primary goal of this article is to develop an estimation method for interval censored data. We estimate the coefficient function vector βτ(T) for general τ ∈ (0, 1). We propose a direct generalization quantile regression (DGQR) estimation method and are the first to develop theory and methodology for quantile regression in varying-coefficient models with interval censored data. Under some regularity conditions, we obtain the asymptotic normality of the proposed estimator, which is defined as the solution of a minimization problem with a convex objective function. Asymptotic normality is established with a bias converging to zero. We also compare the performance of our proposed method with that of other methods for quantile regression with varying-coefficient models.

The rest of this paper is arranged as follows. In Section 2, we put forward the DGQR estimation method for quantile regression in varying-coefficient models with interval censored response observations. In Section 3, we establish the asymptotic properties of the estimator. In Section 4, simulations are carried out to investigate the finite sample performance of the proposed methods; the results show that the proposed methods work well for various τ ∈ (0, 1). Section 5 gives an example analysis. Conclusions are given in Section 6. Technical proofs are given in the Appendix (Section 7).

DGQR estimation

We consider the following varying-coefficient model:

$$Y = X^{\top} \beta(T) + \varepsilon, \tag{1}$$

where Y ∈ R is a response variable, X = (X1, ⋯, Xp)⊤ ∈ Rp is a p-dimensional covariate, β(⋅) = (β1(⋅), ⋯, βp(⋅))⊤ is an unknown vector-valued function of a smoothing variable T, the components βj(⋅) (j = 1, 2, ⋯, p) are all differentiable functions, and ε is a random error whose τth quantile is zero, i.e.,

$$\int_{-\infty}^{0} f(\varepsilon)\, d\varepsilon = \tau,$$

where f(ε) denotes the density function of ε. ε is also assumed to be independent of X and T.

In what follows, we first briefly introduce the quantile regression (QR) estimator for complete data. Then we discuss in detail the quantile regression method for interval censored data. Throughout the paper, we denote by β′(t) the derivative function of β(t) and by ‖ ⋅ ‖ the L2 norm of the corresponding vector.

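Note that βj(T) is differentiable. By Taylor’s expansion, for T in a neighborhood of t we have [7]

$$\beta_j(T) \approx \beta_j(t) + \beta_j'(t)(T - t), \quad j = 1, \dots, p.$$

Thus, if all data are observable, the QR estimator of β(t) [4] is defined as

$$(\hat a, \hat b) = \arg\min_{a, b} \sum_{i=1}^{n} \rho_\tau\big(y_i - x_i^{\top}\{a + b(T_i - t)\}\big)\, K_h(T_i - t), \qquad \hat\beta(t) = \hat a,$$

for some fixed τ ∈ (0, 1), where a = (a1, ⋯, ap)⊤, b = (b1, ⋯, bp)⊤, K_h(⋅) = K(⋅/h)/h is a kernel function with bandwidth h, and ρτ(s) = s(τ − I(s < 0)) is the loss function (see, e.g., Koenker [20]), i.e.,

$$\rho_\tau(s) = \begin{cases} \tau s, & s \ge 0, \\ (\tau - 1)\, s, & s < 0. \end{cases}$$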
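As a small illustration of the check loss ρτ, the following Python sketch evaluates it and verifies on a toy sample that minimizing the summed loss over a constant recovers a sample quantile. The function names and the toy data are hypothetical and are not part of the paper.

```python
def check_loss(s, tau):
    """Quantile check loss rho_tau(s) = s * (tau - 1{s < 0})."""
    return s * (tau - (1.0 if s < 0 else 0.0))


def sample_quantile_by_loss(ys, tau):
    """Return the sample point q that minimizes sum_i rho_tau(y_i - q).

    The summed check loss is convex and piecewise linear in q, so a
    minimizer can always be found among the observed values.
    """
    return min(ys, key=lambda q: sum(check_loss(y - q, tau) for y in ys))


# Toy data (illustrative only): with tau = 0.5 the minimizer is the median.
ys = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0]
median = sample_quantile_by_loss(ys, 0.5)
```

The local linear estimator in the text applies the same loss to the residuals yi − xi⊤{a + b(Ti − t)}, weighted by the kernel.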

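Next, we focus on the interval censoring case, i.e., yi cannot be observed, and we only observe two points t1i and t2i satisfying t1i < yi ≤ t2i. Suppose the length of the interval, t2i − t1i, is small. Then yi will be close to both t1i and t2i. Under this assumption and some other regularity conditions, the probability that the fitted value falls strictly between t1i and t2i will be close to zero. We can therefore modify the loss function using the method proposed by Zhou and Feng [17]. We call this method DGQR estimation, i.e.,

$$F_\tau(t_{1i}, t_{2i}, u) = (1 - \tau)\,\big|t_{2i} - \max(t_{2i}, u)\big| + \tau\,\big|t_{1i} - \min(t_{1i}, u)\big|. \tag{2}$$

In (2), we use Fτ(⋅) instead of ρτ(⋅) to make the notation clearer. Based on (2), the DGQR estimator for the interval censored varying-coefficient model (1) can be obtained by minimizing the following criterion function, with the same kernel weights K_h(Ti − t) as in the complete-data case,

$$\sum_{i=1}^{n} F_\tau\big(t_{1i}, t_{2i}, x_i^{\top}\{a + b(T_i - t)\}\big)\, K_h(T_i - t), \tag{3}$$

i.e.,

$$(\hat a, \hat b) = \arg\min_{a, b} \sum_{i=1}^{n} F_\tau\big(t_{1i}, t_{2i}, x_i^{\top}\{a + b(T_i - t)\}\big)\, K_h(T_i - t). \tag{4}$$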
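The behavior of the interval-censored loss can be checked numerically. The sketch below assumes that Fτ has the form suggested by the terms of S(u1, u2) in Lemma 1 of the Appendix, namely (1 − τ)|t2 − max(t2, u)| + τ|t1 − min(t1, u)|. The function names are hypothetical.

```python
def check_loss(s, tau):
    """Complete-data check loss rho_tau(s) = s * (tau - 1{s < 0})."""
    return s * (tau - (1.0 if s < 0 else 0.0))


def dgqr_loss(t1, t2, u, tau):
    """Assumed interval-censored loss F_tau(t1, t2, u).

    It is zero when u lies in [t1, t2], grows with slope (1 - tau)
    above t2, and grows with slope tau below t1.
    """
    return (1.0 - tau) * abs(t2 - max(t2, u)) + tau * abs(t1 - min(t1, u))


inside = dgqr_loss(1.0, 2.0, 1.5, 0.3)   # fitted value inside the interval
above = dgqr_loss(1.0, 2.0, 3.0, 0.3)    # fitted value above t2
below = dgqr_loss(1.0, 2.0, 0.0, 0.3)    # fitted value below t1

# With t1 = t2 = y the loss coincides with the check loss of y - u.
exact = dgqr_loss(2.0, 2.0, 3.5, 0.25)
classic = check_loss(2.0 - 3.5, 0.25)
```

Under this assumed form, any fitted value inside the observed interval incurs no penalty, which is why short intervals are needed for the estimator to be informative.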

Obviously, if the yi are exactly observed, i.e., t1i = t2i holds for each i, the DGQR estimator defined in (4) reduces to the quantile regression estimator for completely observed data.
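A minimal sketch of how such an estimator might be computed at a single point t is given below. It is a simplification and not the authors' procedure: it assumes a scalar covariate, drops the slope term (a local-constant fit with b = 0), uses a uniform kernel, and assumes the loss form (1 − τ)|t2 − max(t2, u)| + τ|t1 − min(t1, u)|. Because the objective is convex and piecewise linear in a, a minimizer can be found among the breakpoints t1i/xi and t2i/xi. All names are hypothetical.

```python
def dgqr_loss(t1, t2, u, tau):
    """Assumed interval-censored loss F_tau(t1, t2, u)."""
    return (1.0 - tau) * abs(t2 - max(t2, u)) + tau * abs(t1 - min(t1, u))


def uniform_kernel(u):
    """Uniform kernel K(u) = 0.5 * 1{|u| <= 1}."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def local_constant_dgqr(t, data, tau, h):
    """Local-constant DGQR estimate of a scalar coefficient at point t.

    data is a list of tuples (t1, t2, x, T). The objective
    sum_i F_tau(t1_i, t2_i, x_i * a) * K((T_i - t) / h) is minimized
    exactly by evaluating it at its breakpoints.
    """
    local = []
    for t1, t2, x, T in data:
        w = uniform_kernel((T - t) / h)
        if w > 0.0 and x != 0.0:
            local.append((t1, t2, x, w))
    if not local:
        raise ValueError("no observations inside the kernel window")

    def objective(a):
        return sum(w * dgqr_loss(t1, t2, x * a, tau) for t1, t2, x, w in local)

    candidates = sorted({t1 / x for t1, t2, x, w in local}
                        | {t2 / x for t1, t2, x, w in local})
    return min(candidates, key=objective)


# Exactly observed responses (t1 = t2): the estimate is the local median.
exact_data = [(y, y, 1.0, 0.5) for y in [1.0, 2.0, 3.0, 4.0, 5.0]]
exact_fit = local_constant_dgqr(0.5, exact_data, 0.5, 0.2)

# Short censoring intervals around the same responses.
censored_data = [(y - 0.1, y + 0.1, 1.0, 0.5) for y in [1.0, 2.0, 3.0, 4.0, 5.0]]
censored_fit = local_constant_dgqr(0.5, censored_data, 0.5, 0.2)
```

The local linear version described in the text minimizes over both a and b, which calls for a linear programming or general convex solver rather than a one-dimensional breakpoint search.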

Asymptotic properties

To study the asymptotic properties of the varying-coefficient DGQR estimator, we first state some assumptions.

  1. C.1. The density function f(⋅) of ε has a continuous and uniformly bounded derivative, namely 0 < sup_s f′(s) < B0.
  2. C.2. are an independent and identically distributed (i.i.d.) sample from a random vector which is subject to the condition in Lemma 2.
  3. C.3. Matrix is a positive definite matrix, and E(Xi) = 0.
  4. C.4. The random variable T has a second-order differentiable density function fT(t) > 0 in some neighborhood of t [7].
  5. C.5. The kernel function K(⋅) is a symmetric density function with compact support, and the bandwidth h satisfies h → 0 and nh → ∞ as n → ∞ [7].
  6. C.6. (t1i, t2i) (i = 1, ⋯, n) are independent random vectors (not necessarily identically distributed) which satisfy sup_i |t2i − t1i| ≤ ϱn for some sequence ϱn → 0 as n → ∞. Moreover, and are the marginal distribution functions of t1i and t2i, which have continuous and bounded derivatives at the point .
  7. C.7. For each ϵ > 0, there is a finite M satisfying which holds for all n large enough.
  8. C.8. The sequence of the smallest eigenvalues of the matrices is bounded away from zero for n large enough, where .

Now we are ready to state the consistency and asymptotic normality of the estimators.

Theorem 1. For any τ ∈ (0, 1), under Assumptions C.1-C.8, holds as n → + ∞, where stands for convergence in probability, and , β0(t) = (β(t), β′(t)).

Theorem 2. For τ ∈ (0, 1), under Assumptions C.1-C.8, holds as n → + ∞, where Em denotes the identity matrix of order m, stands for convergence in distribution, and

Simulations

In all simulations, we use the Uniform kernel [21], that is, K(u) = 0.5 I(|u| ≤ 1), with bandwidth h = 0.5n^(−1/3). For each scenario, we report the BIAS and mean-squared error (MSE) of the parameter estimators based on 500 replications.
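The simulation settings above can be written out as a short Python sketch. The kernel and bandwidth follow the text. The BIAS and MSE helpers use the usual Monte Carlo definitions (average error and average squared error over replications), which is an assumption, since the paper's own formulas are not reproduced here. All function names are hypothetical.

```python
def uniform_kernel(u):
    """Uniform kernel K(u) = 0.5 * 1{|u| <= 1}."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def bandwidth(n):
    """Bandwidth rule h = 0.5 * n^(-1/3) used in the simulations."""
    return 0.5 * n ** (-1.0 / 3.0)


def bias_and_mse(estimates, truth):
    """Monte Carlo BIAS and MSE of replicated estimates of one target.

    Assumed definitions: BIAS is the mean of (estimate - truth) and
    MSE is the mean of (estimate - truth)^2 over the replications.
    """
    errors = [e - truth for e in estimates]
    bias = sum(errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return bias, mse


h_200 = bandwidth(200)                           # bandwidth for n = 200
bias, mse = bias_and_mse([1.1, 0.9, 1.0], 1.0)   # toy replications
```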

Example 1. In this example, we adopt a data generation process similar to that of Kim et al. [22]. With the regression model

$$y_i = x_i\, \beta(T_i) + \varepsilon_i,$$

where the coefficient function is β(Ti) = Ti, the observed data {(t1i, t2i, xi, Ti)} are generated as follows:

  1. Sample the covariates {xi} from the standard normal distribution Normal(0, 1).
  2. Generate {Ti} from Uniform(0.9, 1.1).
  3. For each i, to generate the censoring interval (t1i, t2i], we first let ui = min{yi} − 0.3 + ri, with ri ∼ Uniform(0, 0.3). Then choose as (t1i, t2i), where l0 = 0, lj is generated from Uniform(0, 0.3) independently for j = 1, ⋯, k, and k is a nonnegative integer which satisfies .
  4. {εi} are generated independently from the following four distributions: (a) Normal(0, 0.1); (b) Logistic(0, 0.3); (c) Lognormal(0, 0.1); (d) Weibull(2.0, 1.0).
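The generation steps above can be sketched in Python as follows. Several details are assumptions made for illustration: the censoring grid for subject i starts at ui and advances by independent Uniform(0, 0.3) increments until it brackets yi, only error distribution (a) is used, and the second parameter of Normal(0, 0.1) is read as a standard deviation. The function name is hypothetical.

```python
import random


def generate_example1(n, seed=0):
    """Simulate interval censored data in the spirit of Example 1.

    Returns (data, ys): data holds tuples (t1, t2, x, T) and ys holds
    the latent responses, which would be unobserved in practice.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]       # covariates
    Ts = [rng.uniform(0.9, 1.1) for _ in range(n)]     # smoothing variable
    # Model y = x * beta(T) + eps with beta(T) = T; sd 0.1 is assumed.
    ys = [x * T + rng.gauss(0.0, 0.1) for x, T in zip(xs, Ts)]
    y_min = min(ys)

    data = []
    for x, T, y in zip(xs, Ts, ys):
        u = y_min - 0.3 + rng.uniform(0.0, 0.3)        # grid start, below y
        left, right = u, u + rng.uniform(0.0, 0.3)
        # Advance the grid until the interval (left, right] contains y.
        while right < y:
            left, right = right, right + rng.uniform(0.0, 0.3)
        data.append((left, right, x, T))
    return data, ys


data, ys = generate_example1(200, seed=1)
```

Each interval has length at most 0.3, in line with the requirement that the censoring intervals be short.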

The method proposed by Zhou and Feng [17] (Zhou estimation) can also be applied directly to quantile regression with varying-coefficient models, so we are mainly interested in comparing its performance with that of our method (DGQR) in this setting. First, we run simulations to compare the two methods for models with τ = 0.5 and sample size n = 200. The simulation results for Zhou estimation and DGQR estimation, including BIAS and MSE, are presented in Table 1.

Table 1. BIAS and MSE of two methods simulation results for Example 1.

https://doi.org/10.1371/journal.pone.0240046.t001

Example 2. To examine the performance of the proposed method for interval censored quantile regression with varying-coefficient models for different τ ∈ (0, 1), we generate random data {(t1i, t2i, xi, Ti)} from the same models as in Example 1, except that the coefficient function is β(Ti) = sin(2πTi) and {Ti} are drawn from Uniform(0, 1). We compare the BIAS and MSE (in parentheses) for sample sizes n = 100, 200 and 300, and calculate them for four values of τ: 0.2, 0.4, 0.6, 0.8.

Example 3. We generate random data {(t1i, t2i, xi, Ti)} from the same models as in Example 2, except that the coefficient function is β(T) = 2T² + 6T, and calculate the BIAS and MSE for four values of τ: 0.2, 0.4, 0.6, 0.8.

Example 4. We generate random data {(t1i, t2i, xi, Ti)} from the same models as in Example 2, except that {xi} are drawn independently from the distribution Exp(1), and calculate the BIAS and MSE for four values of τ: 0.2, 0.4, 0.6, 0.8.

We summarize our findings below:

  1. From Table 1, we can see that, in terms of BIAS and MSE, the proposed estimation method (DGQR) is superior to the method proposed by Zhou and Feng [17] for quantile regression with varying-coefficient models.
  2. As seen in Tables 2–4, all the biases and MSEs decrease as n increases for the different values of τ, and the estimates appear to be unbiased. This suggests that our estimates are consistent for all the parameters.
  3. Table 2 shows the BIAS and MSE for the different residual distributions under the parameter settings of Example 2. The values of the bias do not differ much from the corresponding MSE, indicating that the estimators converge fast. Comparing Tables 2 to 4, all simulation results are good, regardless of the distribution of the covariates and the form of the coefficient function.
  4. Figs 1 and 2 show the DGQR estimator for Example 2 and Example 3, respectively, in the case τ = 0.5. From Figs 1 and 2, we can see that the bias of the estimator is very small. This further confirms that our proposed estimation method is effective.
Fig 1. Parameter setting based on Example 2 and τ = 0.5.

Solid curve: true function β(t); dotted line: estimated function β̂(t).

https://doi.org/10.1371/journal.pone.0240046.g001

Fig 2. Parameter setting based on Example 3 and τ = 0.5.

Solid curve: true function β(t); dotted line: estimated function β̂(t).

https://doi.org/10.1371/journal.pone.0240046.g002

Table 2. BIAS and MSE (in parentheses) of four distribution simulation result for Example 2.

https://doi.org/10.1371/journal.pone.0240046.t002

Table 3. BIAS and MSE (in parentheses) of four distribution simulation result for Example 3.

https://doi.org/10.1371/journal.pone.0240046.t003

Table 4. BIAS and MSE (in parentheses) of four distribution simulation result for Example 4.

https://doi.org/10.1371/journal.pone.0240046.t004

Empirical analysis

In this section, we use the proposed DGQR estimation and the interval generation mechanism to analyze the air pollution data set collected by the Norwegian Public Roads Administration. The data set consists of 500 observations and can be found in StatLib. The data include the concentration of NO2 (yi) per hour of the day, the number of cars per hour (x1i), the wind speed (x2i) and the hour (Ti). We fit the data with a quantile regression method based on a varying-coefficient model. We establish the following varying-coefficient model: (5) We use the interval generation mechanism from the simulations to generate an interval (t1i, t2i] containing yi.

To test whether the coefficient functions really vary with time, we consider the following testing problem: H0: β(T) = β versus H1: β(T) ≠ β, where β = (β1, β2) is a constant vector. Based on 200 bootstrap resamples, we analyze the interval censored data and give the estimated functions of β1(T) and β2(T), along with their 95% bootstrap confidence bands. The p-values of the test Tn are both 0.00. Therefore, we reject the null hypothesis H0 at the 0.05 significance level, which indicates that model (5) is a varying-coefficient model.

Fig 3 plots the confidence intervals for β1(T) and β2(T) from the quantile regression for varying-coefficient models with complete data. Fig 4 plots the confidence intervals for β1(T) and β2(T) with interval censored data. Fig 3 shows that β1(T) and β2(T) vary significantly with time for the complete data, and Fig 4 shows the same for the interval censored data. Furthermore, the confidence intervals of the DGQR estimators for the interval censored data are about as long as those for the complete data. Overall, the estimates of β1(T) and β2(T) from the complete data and from the interval censored data are consistent within the confidence intervals, with no apparent loss due to censoring.

Fig 3. Estimates and the corresponding pointwise confidence interval of β1(t), β2(t) for complete data.

https://doi.org/10.1371/journal.pone.0240046.g003

Fig 4. Estimates and the corresponding pointwise confidence interval of β1(t), β2(t) for interval censored data.

https://doi.org/10.1371/journal.pone.0240046.g004

To further illustrate the quality of the fit, we perform the following residual analysis. Fig 5 plots the residual histogram (a) and the ACF plot (b) of the model fitted to the data. The residual histogram (a) is close to a normal distribution, and the corresponding ACF plot (b) shows no visible correlation in the residual sequence. This fitting result also confirms the advantage of the varying-coefficient quantile model in fitting interval censored data. As the above results show, when the data cannot be fully observed, our proposed method can estimate the coefficient function well.
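For readers who wish to reproduce this kind of residual check, the sample autocorrelation function behind an ACF plot can be computed as in the following sketch. The function name and the toy series are hypothetical, and the estimator shown is the standard one that normalizes by the lag-0 sum of squares.

```python
def sample_acf(residuals, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a residual series.

    r_k = sum_i (e_i - mean)(e_{i+k} - mean) / sum_i (e_i - mean)^2.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Toy series (illustrative only): a strictly alternating sequence is
# strongly negatively correlated at lag 1.
acf = sample_acf([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0], 2)
```

For uncorrelated residuals, the values at positive lags are expected to stay close to zero.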

Conclusions

This paper proposes a coefficient function estimation method (DGQR estimation) for interval censored quantile regression with varying-coefficient models, which addresses interval censoring of the response variable under this model. Asymptotic normality is established with a bias converging to zero, and a rigorous proof is given. The proposed methods do not require the interval censoring vectors to be identically distributed, and can be applied to models with fixed, discrete random, or continuous random design covariates. Another important advantage of the proposed methods is their computational simplicity: all objective functions of the minimization problems involved are simple, convex, and easy to handle. In the simulations we use the Uniform kernel, and the simulation results support the validity of our methods. Finally, a real data analysis applies interval censored quantile regression with a varying-coefficient model to the air pollution data set, and the empirical results are significant. Therefore, DGQR estimation for interval censored quantile regression with varying-coefficient models can be applied to alleviate the curse of dimensionality.

Appendix

Noting that is free of a and the minimization in problem (3) is taken over a, we rewrite problem (3) in the following:

To prove the theorems, we establish the following four lemmas under Assumptions C.1–C.8 for any τ ∈ (0, 1).

Lemma 1. If S(u1, u2) = (1 − τ)|t2 − max(t2, u2)| + τ|t1 − min(t1, u2)| − (1 − τ)|t2 − max(t2, u1)| − τ|t1 − min(t1, u1)|, u2 = u1 + a, t1 < t2, P(t1 < u1 < t2) → 0, and t1 and t2 cannot both belong to Λ = [u1, u2), then we obtain where

Lemma 2. holds uniformly in n and uniformly over ||z(t)||≤v with v → 0.

where

Proof Lemma 2. We provide as Hence if we let We can decompose Then we have rewrite Sn(a, b, t) as Sn(z(t))

For notational convenience, let By Lemma 1 rewrite as u1, as u2, as a, then let Λi be the interval with and as its two endpoints, thus where

Noting that P(t1i < t2i) = 1, by Assumptions C.1–C.8, it is also easy to show that Using mean value theorems for definite integrals, we have where . By a Taylor expansion, Thus, we can obtain Imitating the calculation process of E(I1), we have Obviously, hold true, where

Based on the above result, we have holds uniformly in n and uniformly over ||z(t)|| < v with v → 0. This completes the proof of Lemma 2.

Define which is the derivative of fni(z(t)) at z(t) = 0 except or .

Lemma 3. Let . Then

Proof of Lemma 3. It follows directly from Lemma 2 in [17].

Lemma 4. For any τ ∈ (0, 1) holds for any bounded subset Ø ∈ ℜm as n → ∞; holds uniformly in n and uniformly over 0 < ||z(t)|| < Z as v → 0.

Proof of Lemma 4. It follows directly from Lemma 3 in [17].

Proof of Theorem 1. Note that holds for n large enough. By the fact P(t1i < yi ≤ t2i) = 1, we have

By Assumption C.1−C.8, we can get the following results Under the Assumption C.8 we know , and we know Hn is bounded away from zero for n large enough. Then we show that for any v > 0, holds for all n large enough, and v small enough.

By Lemma 2 we know for any v > 0 small enough, there is ϵ > 0 such that holds for any ||z(t)|| = v and n large enough. By Lemma 3 we have that for any δ > 0, holds for any n large enough. Noting that Sn(z(t)) is convex and Sn(0) = 0, we can conclude that holds true with probability tending to 1 as n → ∞.

Proof of Theorem 2. Let , and . where

According to the above conclusions and Lemma 2 we have Since is the minimization point of , then Let , by direct calculation we know then where Let , then Thus Then calculate the variance of , where Noting the fact that Then Therefore, Theorem 2 holds.

References

  1. Cleveland WS, Grosse E, Shyu WM. Local regression models. In: Chambers JM, Hastie TJ, editors. Statistical Models in S. 1993;8:309–376.
  2. Hastie TJ, Tibshirani R. Varying-coefficient models. Journal of the Royal Statistical Society Series B. 1993;55(5):757–796.
  3. Huang JZ, Wu CO, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002;89(1):111–128.
  4. Honda T. Quantile regression in varying coefficient models. Journal of Statistical Planning and Inference. 2004;121(1):113–125.
  5. Cai Z, Xu X. Nonparametric quantile estimations for dynamic smooth coefficient models. Journal of the American Statistical Association. 2009;104(485):371–383.
  6. Yuan XH, Ju TT. Weighted quantile regression for varying-coefficient models with missing covariates based on empirical likelihood. Journal of Jilin University (Science Edition). 2017;55(02):281–288.
  7. Tang LJ, Zhou ZG. Weighted local linear CQR for varying-coefficient models with missing covariates. TEST. 2015;24(3):583–604.
  8. Sun J, Sun QH. An improved and efficient estimation method for varying-coefficient model with missing covariates. Statistics and Probability Letters. 2015;107:296–303.
  9. Koenker R, Bassett G Jr. Regression quantiles. Econometrica. 1978;46:33–50.
  10. Zhao ZW, Wang DH, Peng CX. Coefficient constancy test in generalized random coefficient autoregressive model. Applied Mathematics and Computation. 2013;219(20):10283–10292.
  11. Sun JG. The statistical analysis of interval-censored failure time data. Publications of the American Statistical Association. 2012;102(480):1473–1474.
  12. Feng YQ, Duan R, Sun JG. Nonparametric comparison of survival functions based on interval-censored data with unequal censoring. Statistics in Medicine. 2017;36(12):1895–1906.
  13. Chay KY, Powell JL. Semiparametric censored regression models. Journal of Economic Perspectives. 2001;15(4):29–42.
  14. Ji S, Peng L, Cheng Y, et al. Quantile regression for doubly censored data. Biometrics. 2012;68(1):101–112. pmid:21950348
  15. Li G, Zhang CH. Linear regression with interval censored data. The Annals of Statistics. 1998;26(4):1306–1327.
  16. Lin GX, He XM, Portnoy S. Quantile regression with doubly censored data. Computational Statistics and Data Analysis. 2012;56(4):797–812.
  17. Zhou XQ, Feng YQ, Du XL. Quantile regression for interval censored data. Communications in Statistics-Theory and Methods. 2017;46(8):3848–3863.
  18. Yin GS, Zeng DL, Li H. Censored quantile regression with varying coefficients. Statistica Sinica. 2014;24(2):855–870.
  19. Xie SG, Wan ATK, Zhou Y. Quantile regression methods with varying-coefficient models for censored data. Computational Statistics and Data Analysis. 2015;88(02):154–172.
  20. Koenker R, Hallock KF. Quantile regression. Journal of Economic Perspectives. 2001;15(4):143–156.
  21. Wu CO, Chiang CT. Kernel smoothing on varying coefficient models with longitudinal dependent variable. Statistica Sinica. 2000;10(2):433–456.
  22. Kim YJ, Cho HJ, Kim J, et al. Median regression model with interval censored data. Biometrical Journal. 2010;52(2):201–208.