Quantile estimation of semiparametric model with time-varying coefficients for panel count data

Yijun Wang; Weiwei Wang

doi:10.1371/journal.pone.0261224

Abstract

Panel count data frequently occur in follow-up studies in areas such as medical research, social sciences, reliability studies, and tumorigenicity experiments. This type of data has been extensively studied using various statistical models with time-invariant regression coefficients. However, the assumption of invariant coefficients may be violated in practice, and the temporal covariate effects are often of great interest. This motivates us to consider a more flexible time-varying coefficient model. For statistical inference on the unknown functions, a quantile regression approach based on B-spline approximation is developed. Asymptotic results on the convergence of the estimators are provided. Simulation studies are presented to assess the finite-sample performance of the estimators. Finally, the proposed method is applied to two datasets: bladder cancer data and US flight delay data.

Citation: Wang Y, Wang W (2021) Quantile estimation of semiparametric model with time-varying coefficients for panel count data. PLoS ONE 16(12): e0261224. https://doi.org/10.1371/journal.pone.0261224

Editor: Feng Chen, Tongji University, CHINA

Received: June 7, 2021; Accepted: November 27, 2021; Published: December 13, 2021

Copyright: © 2021 Wang, Wang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The bladder cancer data can be found in Table 9.2 of the book “Statistical analysis of panel count data” (Sun and Zhao, 2013). The 2015 US flight delay data can be obtained from https://www.kaggle.com/usdot/flight-delays. The authors had no special access privileges to data that others would not have.

Funding: This paper was partially supported by the National Natural Science Foundation of China under Grant No. 12001485; the National Bureau of Statistics of China under Grant No. 2020LY073; and the Characteristic & Preponderant Discipline of Key Construction Universities in Zhejiang Province (Zhejiang Gongshang University-Statistics). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In longitudinal follow-up studies, panel count data are frequently encountered in fields such as medical research, social sciences, reliability studies, and tumorigenicity experiments, and have been analyzed by many authors. This type of data is usually collected from discrete observations of a recurrent event process, because continuous observation may be too expensive to carry out. Thus, we can only obtain the cumulative numbers of occurrences of the events of interest at these discrete observation times.

For the analysis of panel count data, [1, 2] developed regression analysis approaches for the panel count data model. [3] studied clustered mixed nonhomogeneous Poisson models for panel count data. [4] considered spline-based likelihood estimation of the proportional mean model. To describe the potential correlations in the recurrent event process, [5–7] developed joint models of panel count data that use frailty parameters to capture these correlations. Recently, semiparametric transformation models with informative observation times were studied by several authors, such as [8–10]. A more comprehensive introduction to this type of data can be found in the book [11].

In general, the existing approaches to modeling panel count data are based on the assumption of time-invariant coefficients, which may be violated in practice. In some applications, coefficients may be time-varying, and it is sometimes more important to detect the temporal impacts on the recurrent event process. For example, in medical studies, we are interested in detecting the temporal impact of a new drug. Recently, [12, 13] proposed varying coefficient models for recurrent events. However, the analysis of panel count data with varying coefficients is very limited. Most recently, [14] proposed a partially varying coefficient model for panel count data to account for the nonlinear interactions between covariates, and [15] proposed a nonparametric proportional mean model for panel count data with time-varying coefficients.

Quantile regression is widely used in the analysis of longitudinal data. It can provide more information about the shape of the response distribution and can be used to measure the effects of variables at different percentiles of that distribution. However, quantile regression methodologies for panel count data are lagging. Because panel count data are discrete, quantile regression cannot be applied directly. Instead, a smoothing technique (“jittering”) is first applied to this type of data, and quantile regression is then applied to the smoothed data.

In this paper, a semiparametric time-varying coefficient model is formulated. For inference on the unknown functions, a quantile regression method is used for the panel count data, with the unknown functions approximated by B-spline basis functions. The asymptotic results on the convergence of the estimators are established as well. The main contribution of the paper is a new spline-based quantile estimation procedure for the time-varying coefficient panel count data model, which has not been discussed in the literature.

Model specification

Suppose that n independent subjects are observed over time. Let N_i(t) denote the cumulative number of recurrent events occurring at or before time t for subject i, and let H̃_i(t) be a counting process with jumps at the discrete observation times t_i,1 < t_i,2 < ⋯. We assume that t lies in a fixed interval ℜ of finite length. In addition, there are two follow-up times: the potential censoring time C_i* and the observation endpoint T_i. Thus, only C_i = min(C_i*, T_i) can be observed in the process, with δ_i = I(C_i* ≤ T_i). C_i* is assumed to be independent of N_i(t) and H̃_i(t). Let H_i(t) denote the real observation process of subject i, with H_i(t) = H̃_i(min(t, C_i)), i = 1, ⋯, n. Then, N_i(t) can be acquired only at the time points where H_i(t) jumps. The total number of observations is defined as m_i = H_i(C_i). Let Z_i be a p × 1 vector of covariates. We therefore have the independent and identically distributed dataset {H_i(t), N_i(t)dH_i(t), C_i, δ_i, Z_i; t ≥ 0, i = 1, ⋯, n}.

To describe the possible time-varying effects of covariates on N_i(t), the time-varying coefficient model is proposed as follows.

(1) Given Z_i, the conditional mean function of N_i(t) is
E{N_i(t) | Z_i} = ∫_0^t λ₀(u) exp{β(u)^⊤ Z_i} du,    (1)
where λ₀(u) is an unspecified smooth baseline intensity function, and β(u) is an unknown p × 1 vector of time-varying regression coefficients.
(2) Conditional on Z_i, are mutually independent.

For the model defined above, [15] developed likelihood and pseudo-likelihood methods to estimate the baseline intensity function λ₀(u) and the varying coefficient functions β(u) under a Poisson distribution assumption on N_i(t). However, no distributional assumption is made in this paper, so the existing methods cannot be used. In the next section, a spline-based quantile regression is proposed to estimate the unknown functions. In the first step, the unknown baseline intensity function and the coefficients are approximated by B-splines. Next, the discrete panel count data are made continuous by a smoothing technique. In the last step, quantile regression is used for inference.

Estimation procedure

For inference on Eq (1), the model can be rewritten as
E{N_i(t) | X_i} = ∫_0^t exp{η(u)^⊤ X_i} du,
where X_i = (Z_i^⊤, 1)^⊤ and η(u) = (β(u)^⊤, log{λ₀(u)})^⊤.

Approximations of baseline and varying coefficients

Similar to [16], we use the basis expansion method to estimate the unknown functions in this paper. Suppose η_k(u), k = 1, 2, ⋯, p + 1, can be approximated by a basis expansion, that is,
η_k(u) ≈ Σ_{l=1}^{L_k} γ_kl B_kl(u) = γ_k^⊤ B_k(u),
where B_k(u) = (B_k1(u), ⋯, B_kL_k(u))^⊤ are basis functions, γ_k = (γ_k1, ⋯, γ_kL_k)^⊤ are the corresponding coefficients, and L_k is the number of basis functions. Various basis functions can be used in the expansion, such as Fourier basis functions, polynomial basis functions and B-spline functions. In this paper, the B-spline basis is selected for computational simplicity.

The tuning parameter L_k is given by L_k = n_k + q_k + 1, where n_k is the number of interior knots and q_k is the degree of the B-spline functions. The interior knots of the splines are either equally spaced or placed at the sample quantiles of the data in all simulations and applications. The tuning parameter L_k may differ across k. In this paper, we assume that L_k = L and q_k = q for all η_k(u). Thus, we write B_k(u) = B(u) for simplicity of presentation.
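As an illustration of the relation L = n + q + 1, the following minimal sketch evaluates a B-spline basis with the Cox-de Boor recursion in plain Python. It assumes a clamped knot vector with equally spaced interior knots on a hypothetical interval [0, 6]; the function names make_knots, bspline_basis and spline_value are illustrative and are not taken from the paper.

```python
def make_knots(lower, upper, n_interior, degree):
    """Clamped knot vector with equally spaced interior knots (an assumed layout)."""
    step = (upper - lower) / (n_interior + 1)
    interior = [lower + step * (i + 1) for i in range(n_interior)]
    return [lower] * (degree + 1) + interior + [upper] * (degree + 1)


def bspline_basis(u, knots, degree):
    """Evaluate all L = n_interior + degree + 1 basis functions at u."""
    n_basis = len(knots) - degree - 1
    # Degree-0 basis functions are indicators of the half-open knot spans.
    values = [0.0] * (len(knots) - 1)
    for i in range(len(knots) - 1):
        if knots[i] <= u < knots[i + 1]:
            values[i] = 1.0
    if u == knots[-1]:
        # Close the last non-empty span on the right so the upper endpoint is covered.
        values[n_basis - 1] = 1.0
    # Cox-de Boor recursion: raise the degree one step at a time.
    for d in range(1, degree + 1):
        new = [0.0] * (len(knots) - 1 - d)
        for i in range(len(new)):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (u - knots[i]) / left_den * values[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - u) / right_den * values[i + 1] if right_den > 0 else 0.0
            new[i] = left + right
        values = new
    return values


def spline_value(u, gamma, knots, degree):
    """Basis expansion eta(u) = gamma^T B(u)."""
    return sum(g * b for g, b in zip(gamma, bspline_basis(u, knots, degree)))
```

With three interior knots and cubic splines (q = 3), bspline_basis returns L = 7 values that are nonnegative and sum to one at any point of the interval, which is a quick sanity check on the construction.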

Quantile regression

Quantile regression is a useful alternative to conditional mean models, so it is considered here for the panel count data model. However, quantile regression cannot be applied directly because the data N_i(t_ij) are discrete. Following the method developed in [17], “jittering” is applied to construct continuous random variables. By adding noise U_ij generated from the uniform distribution on [0, 1), we obtain
Ñ_i(t_ij) = N_i(t_ij) + U_ij,
where the noise U_ij is independent of N_i(t_ij) and Z_i. The uniform distribution is used because it allows computational simplifications, but it is by no means necessary: the noise may be generated from any continuous distribution with support on [0, 1). Thus, we obtain the continuous data Ñ_i(t_ij), and there exists a one-to-one link between the quantiles of N_i(t_ij) and Ñ_i(t_ij). The regression model for Ñ_i(t_ij) can be written as
Ñ_i(t_ij) = ∫_0^{t_ij} exp{η(u)^⊤ X_i} du + ϵ_ij,
where the ϵ_ij are assumed to be independent of t_ij with unknown cumulative distribution function (cdf) G(⋅) and density function g(⋅). In addition, the τ-th conditional quantile of ϵ_ij is b_τ.
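The jittering step can be sketched in a few lines of Python. This is only an illustration under the uniform-noise choice described above; the names jitter_counts and recover_counts are hypothetical, and the seeded generator is used purely for reproducibility.

```python
import math
import random


def jitter_counts(counts, rng):
    """Add independent Uniform[0, 1) noise to each integer count."""
    return [n + rng.random() for n in counts]


def recover_counts(jittered):
    """The integer part of a jittered value gives back the original count."""
    return [math.floor(x) for x in jittered]


rng = random.Random(2021)          # seeded generator, an arbitrary choice
counts = [0, 2, 2, 5]              # hypothetical cumulative counts for one subject
jittered = jitter_counts(counts, rng)
```

Because the noise lies in [0, 1), each original count is the integer part of its jittered value, which is the reason the quantiles of the two variables can be linked.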

The quantile regression loss function is defined as ρ_τ(u) = u[τ − I(u < 0)], τ ∈ (0, 1). Quantile regression is then applied to the jittered counts Ñ_i(t_ij) = N_i(t_ij) + U_ij to estimate the unknown parameters. Thus, the unknown parameters ϕ = (γ^⊤, b_τ)^⊤ can be estimated by minimizing the following objective function Ψ(ϕ):
Ψ(ϕ) = Σ_{i=1}^n Σ_{j=1}^{m_i} ρ_τ[Ñ_i(t_ij) − ∫_0^{t_ij} exp{γ^⊤ W(u, X_i)} du − b_τ],
where W(u, X_i) = {I_{p+1} ⊗ B(u)} ⋅ X_i and γ = (γ_1^⊤, ⋯, γ_{p+1}^⊤)^⊤.
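To make the role of the check loss concrete, the sketch below implements ρ_τ(u) and verifies on a toy sample that minimizing the summed loss over a constant recovers a sample quantile. The helper best_constant and the toy data are illustrative assumptions, not part of the proposed estimator, which minimizes the same loss over spline coefficients.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(values, tau):
    """Return the data point c that minimizes sum_i rho_tau(y_i - c)."""
    return min(values, key=lambda c: sum(check_loss(y - c, tau) for y in values))


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # toy data for illustration only
median = best_constant(sample, 0.5)
lower_quartile = best_constant(sample, 0.25)
```

Positive residuals are weighted by τ and negative residuals by 1 − τ, so the minimizer shifts toward lower or higher values of the sample as τ moves away from 0.5.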

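For ease of calculation, the Gauss-Legendre formula is used to approximate the integral. Thus, we have
∫_0^{t_ij} exp{γ^⊤ W(u, X_i)} du ≈ (t_ij/2) Σ_{s=1}^S ω_s exp{γ^⊤ W(t_ij(1 + Δ_s)/2, X_i)},
where ω_s is the Gauss coefficient, S is the number of Gauss points and Δ_s is the Gauss point on [−1, 1]. The Gauss-Legendre approximation of the objective function Ψ(ϕ) is obtained by replacing the integral in Ψ(ϕ) with this weighted sum.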
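A minimal sketch of Gauss-Legendre quadrature in plain Python is given below. It computes the nodes and weights on [−1, 1] by Newton's method on the Legendre polynomial and maps them to a general interval [a, b]; the function names are illustrative, and the affine mapping of the nodes is the standard one, assumed here rather than quoted from the paper.

```python
import math


def gauss_legendre(n_points):
    """Nodes and weights on [-1, 1], found by Newton's method on P_n."""
    nodes, weights = [], []
    for k in range(1, n_points + 1):
        x = math.cos(math.pi * (k - 0.25) / (n_points + 0.5))  # initial guess
        for _ in range(100):
            p_prev, p = 1.0, x  # P_0(x) and P_1(x)
            for j in range(2, n_points + 1):
                # Three-term recurrence for Legendre polynomials.
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            dp = n_points * (x * p - p_prev) / (x * x - 1.0)  # P_n'(x)
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def integrate(func, a, b, n_points):
    """Approximate the integral of func over [a, b] with n_points Gauss points."""
    nodes, weights = gauss_legendre(n_points)
    half, mid = (b - a) / 2.0, (a + b) / 2.0
    return half * sum(w * func(mid + half * x) for x, w in zip(nodes, weights))


# Cumulative baseline of simulation Case I: integral of 2u + 1 over [0, 6].
cumulative_baseline = integrate(lambda u: 2.0 * u + 1.0, 0.0, 6.0, 3)
```

An S-point rule integrates polynomials up to degree 2S − 1 exactly, so a handful of points is usually enough for smooth integrands; this is one reason S can be treated as a tuning parameter.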

Let ϕ̂ = (γ̂^⊤, b̂_τ)^⊤ be the minimizer of the approximation of the objective function Ψ(ϕ). It is then natural to estimate the varying coefficients by β̂_k(u) = γ̂_k^⊤ B(u), k = 1, ⋯, p, and the baseline intensity function λ₀(u) by λ̂₀(u) = exp{γ̂_{p+1}^⊤ B(u)}.

Next, we discuss how to select the tuning parameter L and the Gauss point number S. As proposed by [16], we use the leave-one-subject-out cross-validation (CV) to choose L and S. Let and denote the estimators from the data with the i-th subject deleted. So the leave-one-subject-out CV can be written as

Thus, the tuning parameter L and S can be selected as

Remark 1 The number L_k of the basis expansion of β_k may be different from each other. However, we assume L_k = L for all k, for simplicity.

Asymptotic results

The asymptotic results are presented in this section. Before stating the results, we first introduce some regularity conditions.

(C1) Z_i is uniformly bounded.
(C2) The observation number m_i is bounded by a constant.
(C3) λ₀(u) and β_k(u), k = 1, ⋯, p, are l times differentiable and bounded.
(C4) There exists an open subset Ω ⊂ R^pL+1, which contains the true parameter ϕ*. The second derivative matrix ∇² h(t_ij, X_i;ϕ) of h(t_ij, X_i;ϕ) with respect to ϕ, satisfies for all ϕ ∈ Ω, with , for all j, k.
(C5) , , and 0 < d₁ < λ_min(Γ) ≤ λ_max(Γ) < d₂ < ∞, where λ_min(Γ) and λ_max(Γ) denote the smallest and the largest eigenvalues of Γ.
(C6) The ϵ_ij are independent with unknown distribution function G(⋅) and density g(⋅). In addition, the τ-th conditional quantile of ϵ_ij is b_τ.

Under the above regularity conditions, the asymptotic results on the convergence of the estimators are stated in the following theorem. For the proofs, a lemma on spline functions from [18] is needed. First, define

Let S_kn be the space of splines of degree q consisting of functions η_kn satisfying: (i) the function η_kn to each subinterval is a polynomial spline of degree q; (ii) for q ≥ 1 and 0 ≤ q′ ≤ q, η_kn is q′ times continuously differentiable on the support. Besides, η_k is assumed to satisfy the following regularity condition. Let l₁ ∈ [0, q] be a nonnegative integer. The l₁-th derivative, denoted as , exists and satisfies the Lipschitz condition of order v ∈ (0, 1] such that ρ = l₁ + v > 0.5 and , for s, t ∈ [0, C], where δ is a positive constant.

Lemma 1 There exists η_kn ∈ S_kn such that ‖η_kn − η_k‖₂ = O_p(L^{−ρ} + L^{1/2} m^{−1/2}). If L = O{m^{1/(2ρ+1)}}, then we have ‖η_kn − η_k‖₂ = O_p{(L/m)^{1/2}} = O_p{m^{−ρ/(2ρ+1)}}.

Theorem 1 Suppose that conditions (C1)–(C6) hold and that L = O{m^{1/(2ρ+1)}}. Then we have

Furthermore, we have

Ignoring the approximation error in the B-spline basis approximation of β_k(u), k = 1, ⋯, p, we can have the 100(1 − α)% pointwise confidence interval of β_k(u) under quantile τ, where z_{α/2} is the 100(1 − α/2)th percentile of the standard normal distribution and . A confidence interval for the baseline function λ₀(u) can be constructed similarly.

Simulation studies

Three simulation studies are carried out to evaluate the performance of the method developed in this paper. We generated 200 datasets from the time-varying coefficient model, each consisting of n = 100 or 200 independent subjects. For each subject i, the observation endpoint T_i is set to 6 and the censoring time follows the uniform distribution on [T_i/2, 3T_i/2]. The number of observation times m_i is generated from the discrete uniform distribution on {1, 2, 3, 4, 5}, and the observation times t_i,1 < ⋯ < t_i,m_i are the order statistics of a random sample of size m_i from the uniform distribution over (0, C_i). Given m_i and the observation times, the panel count data N_i(t_ij) are obtained as
N_i(t_ij) = Σ_{l=1}^{j} ΔN_il
for j = 1, ⋯, m_i and i = 1, ⋯, n, where ΔN_il is a random number generated from the Poisson distribution with mean
∫_{t_i,l−1}^{t_il} λ₀(u) exp{β(u)^⊤ Z_i} du, with t_i,0 = 0.
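The data-generating mechanism can be sketched for a single subject as follows, using the functions of Case I (λ₀(u) = 2u + 1, β(u) = sin(−πu/6)). This is a minimal illustration under stated assumptions: the observed follow-up time is taken to be the minimum of the censoring time and the endpoint, the Poisson means are approximated with a midpoint rule, and the Poisson sampler is Knuth's multiplication method. None of these implementation choices is confirmed by the paper.

```python
import math
import random

ENDPOINT = 6.0  # observation endpoint T_i from the simulation design


def baseline(u):
    return 2.0 * u + 1.0                   # lambda_0(u) of Case I


def beta(u):
    return math.sin(-math.pi * u / 6.0)    # beta(u) of Case I


def mean_increment(a, b, z, n_steps=200):
    """Midpoint-rule approximation of the integral of lambda_0(u) exp{beta(u) z} over [a, b]."""
    h = (b - a) / n_steps
    total = 0.0
    for k in range(n_steps):
        u = a + (k + 0.5) * h
        total += baseline(u) * math.exp(beta(u) * z)
    return h * total


def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate_subject(rng):
    z = rng.random()                                   # covariate Z_i on [0, 1]
    c_star = rng.uniform(ENDPOINT / 2, 1.5 * ENDPOINT)
    c = min(c_star, ENDPOINT)                          # assumption: C_i = min(C_i*, T_i)
    m = rng.randint(1, 5)                              # number of observation times
    times = sorted(rng.uniform(0.0, c) for _ in range(m))
    counts, total, previous = [], 0, 0.0
    for t in times:
        total += poisson(mean_increment(previous, t, z), rng)  # independent increments
        counts.append(total)                           # cumulative count N_i(t_ij)
        previous = t
    return z, c, times, counts


z, c, times, counts = simulate_subject(random.Random(1))
```

Summing independent Poisson increments over consecutive intervals yields cumulative counts whose conditional mean matches model (1) at every observation time.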

The following three cases are considered:

Case I: p = 1 and the covariate Z_i is generated independently from the [0, 1] uniform distribution. The baseline function is taken as λ₀(u) = 2u + 1 and the varying coefficient β(u) = sin(−πu/6).
Case II: p = 1 and the covariate Z_i is generated independently from the [0, 1] uniform distribution. The baseline function is taken as λ₀(u) = 2(u + τ) and the varying coefficient β(u) = sin(−τπu/6).
Case III: p = 2 and the covariates Z_i are generated from the [0, 1]² uniform distribution with correlation cor(Z_ik, Z_il) = 0.5^|k−l|. The baseline function is taken as λ₀(u) = 2u + 1 and the varying coefficient β₁(u) = sin(−πu/6) and β₂(u) = 2sin(−τπu/6).

To estimate the smooth functions log λ₀(u) and β(u), cubic B-spline functions are selected. The estimates for Cases I–III at quantiles τ ∈ {0.25, 0.5, 0.75} with sample size n = 100 or 200 are presented in Tables 1–3, respectively. The results include the average of the absolute bias values over 100 grid points (BIAS), the average of the sampling standard errors over 100 grid points (SSE), the average of the bootstrap standard errors over 100 grid points (BSE) and the average of the estimated 95% coverage probabilities over 100 grid points (CP). It can be seen that the estimators are essentially unbiased at the different quantiles. The values of SSE and BSE are close to each other and decrease as the sample size n increases. In addition, the CP results indicate that the Gaussian approximation is appropriate for the estimators.
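The summary measures can be computed from replicated estimates as in the sketch below. It assumes that estimates[r][g] holds the estimate from replication r at grid point g, that std_errors[r][g] holds the matching bootstrap standard error, and that coverage is judged by a normal-approximation interval with critical value 1.96; the function name and the toy inputs are hypothetical.

```python
import statistics


def summarize(estimates, std_errors, truth, z_crit=1.96):
    """Return (BIAS, SSE, BSE, CP), each averaged over the grid points."""
    n_rep, n_grid = len(estimates), len(truth)
    bias = sse = bse = cp = 0.0
    for g in range(n_grid):
        column = [estimates[r][g] for r in range(n_rep)]
        errors = [std_errors[r][g] for r in range(n_rep)]
        bias += abs(statistics.mean(column) - truth[g])   # absolute bias at grid point g
        sse += statistics.stdev(column)                   # sampling standard error
        bse += statistics.mean(errors)                    # mean estimated standard error
        covered = sum(abs(e - truth[g]) <= z_crit * s for e, s in zip(column, errors))
        cp += covered / n_rep                             # empirical coverage
    return bias / n_grid, sse / n_grid, bse / n_grid, cp / n_grid


# Toy inputs for illustration only: 2 replications, 2 grid points.
toy = summarize([[1.0, 2.0], [3.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]], [2.0, 2.0])
```

Comparing SSE with BSE checks whether the bootstrap standard errors track the true sampling variability, while CP close to 0.95 supports the normal approximation.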

Table 1. BIAS, SSE, BSE and CP of the estimated functions in Case I at different τ.

https://doi.org/10.1371/journal.pone.0261224.t001

Table 2. BIAS, SSE, BSE and CP of the estimated functions in Case II at different τ.

https://doi.org/10.1371/journal.pone.0261224.t002

Table 3. BIAS, SSE, BSE and CP of the estimated functions in Case III at different τ.

https://doi.org/10.1371/journal.pone.0261224.t003

Figs 1–3 display the estimated curves of the unknown functions log λ₀(t) and β(t) with n = 200. In the figures, the lines with points represent the estimated curves, the solid lines represent the true curves and the dotted lines represent the 95% confidence intervals. The figures show that the true curves and the estimated curves are very close, which indicates that the B-spline estimators of the unknown functions work well. The simulation results also show that the estimates of log λ₀(t) and β(t) are reasonable at the different quantiles.

Fig 1. Estimated curves of time-varying functions in case I at different τ with n = 200.

https://doi.org/10.1371/journal.pone.0261224.g001

Fig 2. Estimated curves of time-varying functions in case II at different τ with n = 200.

https://doi.org/10.1371/journal.pone.0261224.g002

Fig 3. Estimated curves of time-varying functions in case III at different τ with n = 200.

https://doi.org/10.1371/journal.pone.0261224.g003

Applications

Bladder cancer data

The bladder cancer data were collected by the Veterans Administration Cooperative Urological Research Group. In this study, 85 patients were randomly assigned to two treatment groups: a placebo group (47) and a thiotepa group (38). For each patient, the observation times and the cumulative numbers of bladder tumors occurring at or before the observation times are recorded. The observation endpoint is 53 months. In addition, the initial number of bladder tumors and the size of the largest initial tumor are recorded for each patient. In the literature, the dataset has been discussed by many authors, such as [5, 7, 19]. However, a time-varying coefficient panel count data model has not been considered for this dataset.

To describe the temporal impacts of the covariates on the bladder cancer data, the time-varying coefficient model proposed in this paper is applied to these data. For each patient i, N_i(t) denotes the cumulative number of bladder tumors occurring up to time t, and H_i(t) denotes the cumulative number of observations up to time t, i = 1, ⋯, 85. Furthermore, let Z_i1 = 1 if patient i belongs to the thiotepa group and Z_i1 = 0 otherwise. Z_i2 denotes the initial tumor number and Z_i3 is the natural logarithm of the largest initial tumor size plus 1 for each patient i. Therefore, we have the model
E{N_i(t) | Z_i} = ∫_0^t λ₀(u) exp{β₁(u)Z_i1 + β₂(u)Z_i2 + β₃(u)Z_i3} du.

Quantile regression estimation is then applied to these data. The estimation is repeated 200 times, each time on 100 samples drawn from the data. As in the numerical studies, the unknown functions λ₀(t) and β_k(t), k = 1, 2, 3, are approximated by cubic B-spline functions. The estimation is implemented at quantiles τ ∈ {0.25, 0.5, 0.75}.

The estimated curves of log λ₀(t) and β_k(t), k = 1, 2, 3, are displayed in Fig 4. In general, the thiotepa treatment and the tumor recurrence rate are negatively correlated at the different quantiles: patients in the thiotepa group tend to have a lower tumor recurrence rate than those in the placebo group. The initial tumor number is positively correlated with the recurrence rate, and the largest initial tumor size is negatively correlated with the recurrence rate. These conclusions are consistent with [19]. Furthermore, the covariate effects vary over the observation period and differ across quantiles. Thus, quantile regression for the time-varying coefficient panel count data model provides more information than the analyses in the existing literature.

Fig 4. Estimated curves of time-varying functions for bladder cancer data at different τ.

https://doi.org/10.1371/journal.pone.0261224.g004

US flight delay data

In this subsection, 2015 US flight delay data (available from https://www.kaggle.com/usdot/flight-delays) is analyzed with the time-varying coefficient panel count data model. This dataset was collected from the U.S. Department of Transportation’s (DOT) monthly Air Travel Consumer Report. The report contained information about the numbers of on-time, delayed, canceled, and diverted flights. The dataset included 9794 flights which were observed during 3 months in the year of 2015. The numbers of delays for each flight are recorded between the observation times. The observation times of each flight are the same and the observation interval is 7 days. Besides, the average departure delay time and the average flight distance of each flight are also recorded.

To describe the temporal covariate effects on flight delays, the time-varying coefficient model proposed in this paper is applied to these data. For each flight i, N_i(t) denotes the cumulative number of flight delays that had occurred up to time t, and H_i(t) denotes the cumulative number of observations up to time t, i = 1, ⋯, 9794. Furthermore, we define Z_i1 as the average departure delay time and Z_i2 as the average distance of flight i. Therefore, we have the model
E{N_i(t) | Z_i} = ∫_0^t λ₀(u) exp{β₁(u)Z_i1 + β₂(u)Z_i2} du.

Spline-based quantile estimation is then applied to these data. As before, the unknown functions λ₀(t) and β_k(t), k = 1, 2, are approximated by cubic B-spline functions. The estimation is implemented at quantiles τ ∈ {0.25, 0.5, 0.75}.

Because the sample size of the dataset is large, reading the entire dataset at once can be time-consuming or even impossible in practice because of limited memory, and a direct analysis can be infeasible because of the computing memory or computing time it requires. To deal with massive data, parallel computing methods were developed by [20, 21]. In the parallel computing method, we first split the original dataset into a family of disjoint sub-sample blocks of equal size. More precisely, the data structure can be written as
S = S_1 ∪ S_2 ∪ ⋯ ∪ S_K, with S_k ∩ S_l = ∅ for k ≠ l,
where the original dataset S of size n = K × m is partitioned into K sub-sample blocks S_k, each consisting of m samples randomly picked from S.

Thus, the proposed estimation procedure can be implemented for every block S_k, k = 1, ⋯, K, and the estimated values of the unknown parameters for block S_k are denoted as ϕ̂^(k). Similar to the method introduced in [21], the final full-sample estimators are then obtained by combining the K block-level estimates.
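The block-wise strategy can be sketched as follows. The partition step follows the description above; the combination step uses an unweighted average of the block estimates, which is only an assumption made for illustration, since the exact combination rule is not reproduced here. The function names and the toy vectors are hypothetical.

```python
import random


def partition_blocks(n_samples, n_blocks, rng):
    """Randomly split indices 0..n_samples-1 into disjoint blocks of equal size."""
    if n_samples % n_blocks != 0:
        raise ValueError("n_samples must equal n_blocks times the block size")
    indices = list(range(n_samples))
    rng.shuffle(indices)
    size = n_samples // n_blocks
    return [indices[k * size:(k + 1) * size] for k in range(n_blocks)]


def average_estimates(block_estimates):
    """Componentwise mean of the block-level parameter vectors (assumed rule)."""
    n_blocks = len(block_estimates)
    return [sum(column) / n_blocks for column in zip(*block_estimates)]


blocks = partition_blocks(12, 3, random.Random(0))       # toy sizes: n = 12, K = 3
combined = average_estimates([[1.0, 2.0], [3.0, 4.0]])    # toy block estimates
```

Each block can be fitted independently, for example on separate processes, so only one block of size m needs to be held in memory at a time.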

The estimated curves of log λ₀(t) and β_k(t), k = 1, 2, at quantiles τ ∈ {0.25, 0.5, 0.75} are displayed in Fig 5. Fig 5 shows that the departure delay time is positively correlated with the cumulative number of flight delays. In addition, the impact of the departure delay time varies over time and differs across quantiles. However, the effect of the flight distance on the number of flight delays is not significant.

Fig 5. Estimated curves of time-varying functions for US flight delay data at different τ.

https://doi.org/10.1371/journal.pone.0261224.g005

Concluding remarks

In this paper, we proposed a spline-based quantile regression estimation method for the time-varying coefficient panel count data model. The model discussed in this paper is more general than that of [15], with no Poisson restriction on the recurrent event process. To obtain the estimators, B-splines are first used to approximate the unknown functions log λ₀(t) and β(t), and a smoothing technique is then applied to make the discrete panel count data continuous. Finally, the spline-based quantile regression approach is applied at different quantiles. Simulations are presented to evaluate the performance of the proposed approach, and two applications are analyzed to demonstrate its effectiveness.

Recently, the Enron e-mail corpus, a massive set of e-mail messages, has been discussed by many authors, such as [22]. If we are interested in the number of interactions between all pairs of individuals in these longitudinal observations, and snapshots are used to model the longitudinal network, as is usual in network analysis, then the result is a standard panel count dataset with massive observations. Furthermore, in this paper we only considered the situation with low-dimensional covariates, which may not be realistic in some applications. Because high-dimensional covariates may be present, variable selection methods can be considered for the time-varying coefficient model. This will be an important topic for our future studies. In addition, reliability data and traffic data have been studied by many authors, such as [23–26]. It would be interesting to study quantile regression estimation for such data.

Proof of Theorem 1

Define as the true but unknown values of , , , , ϕ = (γ^⊤, b_τ)^⊤ and

Let where

By the Taylor expansion, we can have r_ij = o_p(1). Besides, where and is between ϕ* and . Define

By the identity of [27],

Hence, it can be obtained that

Thus, ΔH can be denoted as ΔH = ΔH₁ + ΔH₂, with

By calculating the expectation and variance of ΔH₂,

By condition (C5), E[{∇h(t_ij, X_i;ϕ*)}^⊗2] = Γ, we can have

Next, we calculate the variance of ΔH₂,

Hence, we can have . Before discussing ΔH₁, we first define .

Then, we have E(κ) = 0 and Var(κ) = τ(1 − τ)Γ. By the Cramér-Wold theorem and the Central Limit Theorem, we have κ →_d N{0, τ(1 − τ)Γ}.

Next, we define so that . By simple calculation, we have

Thus κ₁ →_p κ. By Slutsky’s theorem, κ₁ →_d N{0, τ(1 − τ)Γ}. Then, we can have that

By the epi-convergence results of [28], . Finally, the asymptotic normality is proved .

Since , we have

By the Lemma 1, . Thus, we can get and

References

1. Diggle PJ, Liang KY, Zeger SL. The analysis of longitudinal data. Oxford University Press New York; 1994.
2. Sun Y. Estimation of semiparametric regression model with longitudinal data. Lifetime Data Analysis. 2010;16(2):271–298. pmid:19890712
3. Nielsen JD, Dean CB. Clustered mixed nonhomogeneous Poisson process spline models for the analysis of recurrent event panel data. Biometrics. 2008;64(3):751–761. pmid:18047528
4. Lu M, Zhang Y, Huang J. Semiparametric estimation methods for panel count data using monotone B-splines. Journal of the American Statistical Association. 2009;104(487):1060–1070.
5. He X, Tong X, Sun J. Semiparametric analysis of panel count data with correlated observation and follow-up times. Lifetime Data Analysis. 2009;15(2):177–196. pmid:19082711
6. Zhao X, Tong X. Semiparametric regression analysis of panel count data with informative observation times. Computational Statistics and Data Analysis. 2011;55(1):291–300.
7. Zhao X, Tong X, Sun J, Azen SP. Robust estimation for panel count data with informative observation times. Computational Statistics and Data Analysis. 2013;57(1):33–40.
8. Li N, Sun L, Sun J. Semiparametric transformation models for panel count data with dependent observation process. Statistics in Biosciences. 2010;2(22):191–210.
9. Li N. Semiparametric transformation models for panel count data. University of Missouri–Columbia; 2011.
10. Li N, Zhao H, Sun J. Semiparametric transformation models for panel count data with correlated observation and follow-up times. Statistics in Medicine. 2013;32(17):3039–3054. pmid:23297190
11. Sun J, Zhao X. Statistical analysis of panel count data. Springer New York; 2013.
12. Chiang CT, Wang MC. Varying-coefficient model for the occurrence rate function of recurrent events. Annals of the Institute of Statistical Mathematics. 2009;61(1):197–213.
13. Sun L, Zhou X, Guo S. Marginal regression models with time-varying coefficients for recurrent event data. Statistics in Medicine. 2011;30(18):2265–2277. pmid:21590791
14. He X, Feng X, Tong X, Zhao X. Semiparametric partially linear varying coefficient models with panel count data. Lifetime Data Analysis. 2016;23(3):1–28. pmid:27118299
15. Zhao H, Tu W, Yu Z. A nonparametric time-varying coefficient model for panel count data. Journal of Nonparametric Statistics. 2018;30(3):640–661.
16. Huang JZ, Wu CO, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002;89(1):111–128.
17. Machado JAF, Silva JMCS. Quantiles for counts. Journal of the American Statistical Association. 2005;100(472):1226–1237.
18. Guo J, Tang M, Tian M, Zhu K. Variable selection in high-dimensional partially linear additive models for composite quantile regression. Computational Statistics and Data Analysis. 2013;65(9):56–67.
19. Sun J, Wei LJ. Regression analysis of panel count data with covariate-dependent observation and censoring times. Journal of the Royal Statistical Society. 2000;62(2):293–302.
20. Fan TH, Cheng KF. Tests and variables selection on regression analysis for massive datasets. Data & Knowledge Engineering. 2007;63(3):811–819.
21. Liquet B, Saracco J. BIG-SIR: a sliced inverse regression approach for massive data. Statistics and its Interface. 2016;9(4):509–520.
22. Perry PO, Wolfe PJ. Point process modelling for directed interaction networks. Journal of the Royal Statistical Society. 2013;75(5):821–849.
23. Xu A, Zhou S, Tang Y. A unified model for system reliability evaluation under dynamic operating conditions. IEEE Transactions on Reliability. 2021;70(1):65–72.
24. Chen F, Chen S, Ma X. Crash frequency modeling using real-time environmental and traffic data and unbalanced panel data models. International Journal of Environmental Research and Public Health. 2016;13(6):609. pmid:27322306
25. Chen F, Chen S, Ma X. Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. Journal of Safety Research. 2018;65:153–159. pmid:29776524
26. Dong B, Ma X, Chen F, Chen S. Investigating the differences of single-vehicle and multivehicle accident probability using mixed logit model. Journal of Advanced Transportation. 2018, UNSP 2702360.
27. Knight K. Limiting distributions for L₁ regression estimators under general conditions. Annals of Statistics. 1998;26(2):755–770.
28. Knight K, Fu W. Asymptotics for lasso-type estimators. Annals of Statistics. 2000;28(5):1356–1378.

