DGQR estimation for interval censored quantile regression with varying-coefficient models

This paper proposes a direct generalization quantile regression (DGQR) estimation method for quantile regression with varying-coefficient models under interval censored data, which directly generalizes the estimator for completely observed data. The consistency and asymptotic normality of the estimators are established. The proposed method has the advantage that it does not require the censoring vectors to be identically distributed. The effectiveness of the method is verified by simulation studies and a real data example.


Introduction
Varying-coefficient models are among the popular models that have been proposed to mitigate the curse of dimensionality. They are natural extensions of classical parametric models and have become popular in data analysis thanks to their flexibility and interpretability. Varying-coefficient models were first introduced by Cleveland [1]. Hastie and Tibshirani [2] extended them to regression models and generalized regression models. Huang and Wu [3] proposed an inference procedure for the varying-coefficient model based on the resampling-subject bootstrap. At present, there are many results on parameter estimation in quantile regression for varying-coefficient models. For example, Honda [4] considered varying-coefficient quantile regression. Cai and Xu [5] studied quantile regression estimation for dynamic models with varying coefficients. Yuan and Ju [6] considered a varying-coefficient quantile regression model in which some covariates are missing at random, and proposed a weighted estimator based on empirical likelihood. Tang and Zhou [7] used the inverse probability weighting method in the varying-coefficient composite quantile regression model with covariates missing at random. Sun and Sun [8] proposed optimal inverse probability weighted estimation of the regression parameters when the selection probabilities are known in the varying-coefficient quantile regression model.
We focus on the following varying-coefficient quantile regression model in this article:

$$Q_\tau(y_i \mid x_i) = x_i^\top \beta_\tau(T_i), \quad i = 1, \dots, n, \qquad (1)$$

where $\tau \in (0, 1)$, $y_i$ is the response variable of interest, which may represent the time at which some event occurs, such as death or the onset of disease, or some transformation of the time to the event [9], and $x_i$ is an observable covariate vector. $Q_\tau(y_i \mid x_i)$ is the conditional quantile function [10] of $y_i$ given $x_i$, and $\beta_\tau(T) \in \mathbb{R}^m$ is the coefficient function vector, which depends on $\tau$. However, in some practical applications, $y_i$ may not be fully observed because of censoring. For example, the response variable $y_i$ may be subject to interval censoring: one does not observe $y_i$ but only a censoring vector $(t_{1i}, t_{2i})$ which satisfies $P(t_{1i} < y_i \le t_{2i}) = 1$. Interval censored data arise naturally in many clinical trials and longitudinal studies where individuals are examined regularly but not continuously. Sun [11] discussed several important topics concerning interval-censored failure time data that can occur in practice. Feng and Duan [12] studied interval-censored data in which the distribution of, or the underlying mechanism behind, the censoring variables may depend on the treatment, so that it differs across subjects in different treatment groups. Chay and Powell [13], Ji and Peng [14], Li and Zhang [15], and Lin and He [16] considered linear regression with interval censored data. Zhou and Feng [17] proposed an estimation method for quantile regression models with interval censored data. For varying-coefficient quantile regression models with censored data, Yin and Zeng [18] proposed a varying-coefficient quantile regression model subject to random censoring. Xie and Zhou [19] adopted an inverse probability weighting approach to estimate regression quantiles in a varying-coefficient model under random censoring.
However, none of these studies considers the estimation of coefficient functions under interval censored data.
The primary goal of this article is to develop an estimation method for interval censored data. We estimate the coefficient function vector $\beta_\tau(T)$ for general $\tau \in (0, 1)$. We propose a direct generalization quantile regression (DGQR) estimation method and are the first to develop the theory and methodology of quantile regression for varying-coefficient models with interval censored data. Under some regularity conditions, we obtain the asymptotic normality of $\hat\beta_\tau(t)$. The proposed estimator is defined as the minimizer of a convex objective function. Asymptotic normality is established with a bias converging to zero. We also compare the performance of our proposed method with other methods for quantile regression with varying-coefficient models.
The rest of this paper is arranged as follows. In Section 2, we put forward the DGQR estimation method for quantile regression in varying-coefficient models with interval censored response observations. In Section 3, we establish the asymptotic properties of the estimator. In Section 4, simulations are carried out to investigate the finite sample performance of the proposed methods; the results show that the proposed methods work well for various $\tau \in (0, 1)$. Section 5 gives a real data analysis. Conclusions are given in Section 6. Technical proofs are given in the appendix in Section 7.
In what follows, we first briefly introduce the quantile regression (QR) estimator under complete data. Then we discuss in detail the quantile regression method under interval censored data. Throughout the paper, we denote by $\beta'(t)$ the derivative of $\beta(t)$ and by $\|\cdot\|$ the $L_2$ norm of the corresponding vector.
Note that $\beta_j(T)$ is differentiable. By Taylor's expansion, we have [7]

$$\beta_j(T) \approx \beta_j(t) + \beta_j'(t)(T - t) := a_j + b_j(T - t), \quad j = 1, \dots, p.$$

Thus, if all data $\{y_i\}_{i=1}^n$ are observable, the QR estimator $\hat\beta(t)$ of $\beta(t)$ [4] is defined as $\hat\beta(t) = \hat a$, where

$$(\hat a, \hat b) = \arg\min_{a, b} \sum_{i=1}^n \rho_\tau\big(y_i - x_i^\top[a + b(T_i - t)]\big) K_h(T_i - t),$$

$K_h(\cdot)$ is a kernel function with bandwidth $h$, and $\rho_\tau(s) = s(\tau - I(s < 0))$ is the loss function (see, e.g., Koenker (2001) [20]), i.e.,

$$\rho_\tau(s) = \begin{cases} \tau s, & s \ge 0, \\ (\tau - 1) s, & s < 0. \end{cases}$$

Next, we focus on the interval censoring case, i.e., $y_i$ cannot be observed, and we can only observe two points $t_{1i}$ and $t_{2i}$ satisfying $t_{1i} < y_i \le t_{2i}$. Suppose the length of the interval, $t_{2i} - t_{1i}$, is small. Then $y_i$ will be close to both $t_{1i}$ and $t_{2i}$. Under this assumption and some other regularity conditions, the probability $P(x_i^\top[a + b(T_i - t)] \in (t_{1i}, t_{2i}])$ will be close to zero. We can therefore modify the loss function $\rho_\tau(y_i - x_i^\top[a + b(T_i - t)])$ by using the method proposed by Zhou and Feng [17]. We call this method DGQR estimation; the modified loss is given in (2), where we write $F_\tau(\cdot)$ instead of $\rho_\tau(\cdot)$ to make the notation clearer. Based on (2), the DGQR estimator $\hat\beta_n(t)$ for the interval censored varying-coefficient model (1) is obtained by minimizing the criterion function (3), which gives the estimator in (4). Obviously, if the $y_i$ are exactly observed, i.e., $t_{1i} = t_{2i}$ holds for each $i$, the DGQR estimator $\hat\beta_n(t)$ defined in (4) reduces to the quantile estimator $\hat\beta(t)$ for completely observed data.
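To make the complete-data criterion concrete, the following minimal Python sketch evaluates the check loss $\rho_\tau$ and the kernel-weighted local linear objective at a point $t$. It covers only the complete-data QR case; the interval censored loss $F_\tau$ is not implemented here. The function names, the uniform kernel default, and the weight normalisation $K((T_i - t)/h)/h$ are assumptions made for illustration, not taken from the paper.

```python
from typing import Callable, Sequence


def check_loss(s: float, tau: float) -> float:
    """Quantile check loss rho_tau(s) = s * (tau - I(s < 0))."""
    return s * (tau - (1.0 if s < 0 else 0.0))


def box_kernel(u: float) -> float:
    """Uniform kernel on [-1, 1]; an illustrative default only."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def local_linear_objective(
    a: Sequence[float],
    b: Sequence[float],
    t: float,
    y: Sequence[float],
    x: Sequence[Sequence[float]],
    T: Sequence[float],
    tau: float,
    h: float,
    kernel: Callable[[float], float] = box_kernel,
) -> float:
    """Kernel-weighted check loss at the point t for complete data.

    The weight K((T_i - t) / h) / h is an assumed normalisation; rescaling
    the kernel does not change the minimiser over (a, b).
    """
    total = 0.0
    for y_i, x_i, T_i in zip(y, x, T):
        w = kernel((T_i - t) / h) / h
        if w == 0.0:
            continue  # observation lies outside the local window
        # Local linear predictor x_i^T [a + b (T_i - t)].
        fitted = sum(
            x_ij * (a_j + b_j * (T_i - t))
            for x_ij, a_j, b_j in zip(x_i, a, b)
        )
        total += check_loss(y_i - fitted, tau) * w
    return total


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    x = [[1.0], [1.0], [1.0]]  # intercept-only design
    T = [0.4, 0.5, 0.6]
    for a0 in (1.0, 2.0, 3.0):
        value = local_linear_objective([a0], [0.0], 0.5, y, x, T, tau=0.5, h=0.2)
        print(f"a = {a0}: objective = {value:.3f}")
```

In this toy example the objective is smallest at `a = 2.0`, the local median of the responses, which is what the QR estimator at $\tau = 0.5$ targets. An actual fit would minimise the objective over `(a, b)` with a linear programming or general convex solver.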

Asymptotic properties
To study the asymptotic properties of the varying-coefficient DGQR estimator $\hat\beta_n(t)$, we first give some assumptions.
C.1. The density function $f(\cdot)$ of $\varepsilon$ has a continuous and uniformly bounded derivative,
C.5. The kernel function $K(\cdot)$ is a symmetric density function with compact support, and the bandwidth satisfies $h \to 0$ and $nh \to \infty$ as $n \to \infty$ [7].
C.6. $(t_{1i}, t_{2i})$ $(i = 1, \dots, n)$ are independent random vectors (not necessarily identically distributed) which satisfy $\sup_i |t_{2i} - t_{1i}| \le \varrho_n$ for some sequence $\varrho_n \to 0$ as $n \to \infty$. Moreover, $G_i^1(\cdot)$ and $G_i^2(\cdot)$ are the marginal distribution functions of $t_{1i}$ and $t_{2i}$, which have continuous and bounded derivatives at the point which holds for all $n$ large enough.
C.8. The sequence of the smallest eigenvalues of the matrices is bounded away from zero for some $n$ large enough, where
Now we are ready to state the consistency and asymptotic normality of the QR estimators $\hat\beta_n(t)$.
holds as $n \to +\infty$, where $E_m$ denotes the identity matrix of order $m$, "$\xrightarrow{d}$" stands for convergence in distribution, and

Simulations
In all simulations, we use the uniform kernel [21], that is, $K(t) = \frac{1}{2} I(|t| \le 1)$, and the bandwidth $h = 0.5 n^{-1/3}$. For each scenario, we report the BIAS and mean-squared error (MSE) of the parameter estimators based on 500 replications.
Example 1. In this example, we adopt a data generation process similar to that of Kim et al. [22]. Under the regression model, the data are generated as follows:
(1) Sample the covariates $\{x_i\}$ from the standard normal distribution, Normal(0, 1).
(3) For each $i$, to generate the censoring interval $(t_{1i}, t_{2i}]$, we first let
The method proposed by Zhou and Feng [17] (Zhou estimation) can also be directly applied to quantile regression with varying-coefficient models, so we are mainly interested in comparing its performance with that of our method (DGQR) in this setting. First, we run simulations to compare the two methods for models with $\tau = 0.5$ and sample size $n = 200$. The simulation results for quantile regression with varying-coefficient models, Zhou estimation, and DGQR estimation, including BIAS and MSE, are presented in Table 1. We summarize our findings below:
(1) From Table 1, we can see that the proposed DGQR estimation is superior to the method of Zhou and Feng [17] in terms of BIAS and MSE for quantile regression with varying-coefficient models.
(2) As seen in Tables 2-4, the biases and MSEs decrease as $n$ increases for the different values of $\tau$, and the estimates appear to be approximately unbiased. This suggests that our estimates are consistent for all the parameters.
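The simulation settings can be sketched in Python as follows. The uniform kernel and the bandwidth rule $h = 0.5 n^{-1/3}$ are those stated in this section. The bias and MSE formulas are the usual Monte Carlo definitions and are an assumption here, since they may differ in detail from the ones used for the tables. The toy estimator (a sample median of standard normal draws) is a hypothetical stand-in for the DGQR fit and serves only to show how bias and MSE shrink as $n$ grows.

```python
import random
import statistics


def uniform_kernel(u: float) -> float:
    """Uniform kernel K(u) = 1/2 on [-1, 1], zero elsewhere."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def bandwidth(n: int) -> float:
    """Bandwidth rule h = 0.5 * n^(-1/3) quoted in the simulation section."""
    return 0.5 * n ** (-1.0 / 3.0)


def bias_and_mse(estimates, truth):
    """Monte Carlo bias and MSE of scalar estimates (assumed definitions)."""
    errors = [e - truth for e in estimates]
    bias = sum(errors) / len(errors)
    mse = sum(err * err for err in errors) / len(errors)
    return bias, mse


def replicate_medians(n, reps=500, seed=0):
    """Toy stand-in estimator: sample median of n standard normal draws."""
    rng = random.Random(seed)
    return [
        statistics.median([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]


if __name__ == "__main__":
    for n in (50, 200, 800):
        bias, mse = bias_and_mse(replicate_medians(n), truth=0.0)
        print(f"n={n:4d}  h={bandwidth(n):.4f}  bias={bias:+.4f}  mse={mse:.5f}")
```

Replacing `replicate_medians` with a routine that simulates interval censored data and returns the fitted coefficient at a fixed point would give a summary of the same kind as the one reported in the tables.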

Empirical analysis
In this section, we use the proposed DGQR estimation and the interval generation mechanism to analyze the air pollution data set collected by the Norwegian Public Roads Administration. The data set consists of 500 observations and can be found in StatLib. The data include the hourly concentration of NO$_2$ ($y_i$), the number of cars per hour ($x_{1i}$), the wind speed ($x_{2i}$) and the hour of the day ($T_i$). We use the varying-coefficient quantile regression method to fit the data, based on the varying-coefficient model (5). We use the interval generation mechanism from the simulation study to generate the interval $(t_{1i}, t_{2i}]$ from $y_i$.
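As an illustration of how an observed response can be turned into an interval $(t_{1i}, t_{2i}]$ with $t_{1i} < y_i \le t_{2i}$, the sketch below uses a hypothetical regular inspection grid of spacing `width`. This is not the generation mechanism of the simulation study, which is not reproduced here; the function name and parameters are assumptions chosen for illustration.

```python
import math


def censor_to_interval(y: float, width: float, offset: float = 0.0):
    """Return (t1, t2) on the grid offset + k * width with t1 < y <= t2.

    Hypothetical regular-inspection scheme: the response is only known to
    fall between two consecutive inspection times.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    k = math.ceil((y - offset) / width)
    t2 = offset + k * width
    while t2 < y:  # guard against floating-point rounding
        t2 += width
    t1 = t2 - width
    while t1 >= y:  # guard against floating-point rounding
        t2, t1 = t1, t1 - width
    return t1, t2


if __name__ == "__main__":
    for y in (1.3, 1.5, -0.1):
        t1, t2 = censor_to_interval(y, width=0.25)
        print(f"y = {y}: interval ({t1}, {t2}]")
```

Every interval produced this way has length `width`, so letting `width` shrink as the sample size grows is in the spirit of the requirement that the interval lengths tend to zero.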

PLOS ONE
In order to test whether the coefficient functions are really time varying, we consider the following testing problem: $H_0: \beta(T) = \beta$ versus $H_1: \beta(T) \ne \beta$, where $\beta = (\beta_1, \beta_2)^\top$ is a constant vector. Based on 200 bootstrap resamples, we analyze the interval censored data and give the estimated functions $\beta_1(T)$ and $\beta_2(T)$, along with their 95% bootstrap confidence bands. The p-values of the test $T_n$ are both 0.00. Therefore, we reject the null hypothesis $H_0$ at the 0.05 significance level, which indicates that model (5) is indeed a varying-coefficient model. Fig 4 also shows that $\beta_1(T)$ and $\beta_2(T)$ vary significantly over time under interval censored data. Furthermore, Fig 4 shows the confidence intervals of the DGQR estimators for the complete data as well as for the interval censored data. The estimates of $\beta_1(T)$ and $\beta_2(T)$ from the complete data and from the interval censored data are consistent within the confidence intervals, with no apparent loss caused by censoring.
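A pointwise percentile bootstrap interval of the kind used for such confidence bands can be sketched as follows. This is a generic nonparametric bootstrap under the assumption that observations are resampled with replacement; the exact resampling scheme and the test statistic $T_n$ used in the analysis are not reproduced. The `estimator` argument is a placeholder: for a confidence band it would be the fitted coefficient at a fixed point $t$, and the procedure would be repeated over a grid of $t$ values.

```python
import math
import random
import statistics


def percentile_interval(values, level=0.95):
    """Lower and upper percentile limits of a list of bootstrap replicates."""
    ordered = sorted(values)
    alpha = (1.0 - level) / 2.0
    lo_idx = math.floor(alpha * (len(ordered) - 1))
    hi_idx = math.ceil((1.0 - alpha) * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


def bootstrap_interval(data, estimator, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(data).

    data      : list of observations (resampled with replacement)
    estimator : callable mapping a resample to a scalar estimate
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return percentile_interval(replicates, level)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    lower, upper = bootstrap_interval(sample, statistics.median)
    print(f"95% bootstrap interval for the median: ({lower:.3f}, {upper:.3f})")
```

With `n_boot=200` the interval endpoints are themselves fairly noisy; more resamples would give smoother bands at a higher computational cost.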
To further illustrate the quality of the fit, we perform the following residual analysis. Fig 5 plots the residual histogram (a) and the ACF plot (b) of the model fitted to the data. The residual histogram (a) is close to a normal distribution, and the ACF plot (b) shows no visible autocorrelation in the residual sequence. This fitting result also confirms the advantage of the varying-coefficient quantile model in fitting interval censored data. As shown in the above results, when the data cannot be fully observed, our proposed method can estimate the coefficient functions well.
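The autocorrelation check described above can be reproduced with a short Python sketch of the sample autocorrelation function (ACF). The function names are illustrative, and the band $\pm 1.96/\sqrt{n}$ is the usual approximate 95% reference band under the assumption that the residuals are white noise; it is not a quantity taken from the paper.

```python
import math


def sample_acf(residuals, max_lag):
    """Sample autocorrelations at lags 0..max_lag (lag 0 is always 1)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        acf.append(num / denom)
    return acf


def white_noise_band(n, z=1.96):
    """Approximate 95% band for the ACF of n white-noise residuals."""
    return z / math.sqrt(n)


if __name__ == "__main__":
    residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # strongly alternating toy series
    acf = sample_acf(residuals, max_lag=2)
    band = white_noise_band(len(residuals))
    for lag, value in enumerate(acf):
        flag = "outside band" if lag > 0 and abs(value) > band else ""
        print(f"lag {lag}: {value:+.3f} {flag}")
```

Lags at which the autocorrelation falls outside the band indicate remaining serial dependence in the residuals; a fit with no visible correlation should keep nearly all nonzero lags inside it.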

Conclusions
In this paper, we propose, for the first time, a coefficient function estimation method (DGQR estimation) for interval censored quantile regression with varying-coefficient models, which addresses interval censoring of the response variable under this model. Asymptotic normality is established, with a bias converging to zero, and a rigorous proof is given. The proposed methods do not require the interval censoring vectors to be identically distributed, and can be applied to models with fixed, discrete random, or continuous random design covariates. Another important advantage of the proposed methods is their computational simplicity: all objective functions of the minimization problems involved are simple, convex, and easy to handle. In the simulations we use the uniform kernel, and the results support the validity of our methods. Finally, a real data analysis illustrates interval censored quantile regression with a varying-coefficient model on the air pollution data set, and the empirical results are significant. Therefore, DGQR estimation for interval censored quantile regression with varying-coefficient models can be applied in settings where the curse of dimensionality needs to be alleviated.