Abstract
Panel count data frequently occur in follow-up studies, such as medical research, social sciences, reliability studies, and tumorigenicity experiments. This type of data has been extensively studied with various statistical models that have time-invariant regression coefficients. However, the assumption of time-invariant coefficients may be violated in practice, and temporal covariate effects are often of great interest in research studies. This motivates us to consider a more flexible time-varying coefficient model. For statistical inference on the unknown functions, a quantile regression approach based on B-spline approximation is developed. Asymptotic results on the convergence of the estimators are provided. Simulation studies are presented to assess the finite-sample performance of the estimators. Finally, two applications, to bladder cancer data and US flight delay data, are analyzed with the proposed method.
Citation: Wang Y, Wang W (2021) Quantile estimation of semiparametric model with time-varying coefficients for panel count data. PLoS ONE 16(12): e0261224. https://doi.org/10.1371/journal.pone.0261224
Editor: Feng Chen, Tongji University, CHINA
Received: June 7, 2021; Accepted: November 27, 2021; Published: December 13, 2021
Copyright: © 2021 Wang, Wang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The bladder cancer data can be found in Table 9.2 of the book “Statistical analysis of panel count data” (Sun and Zhao, 2013). The 2015 US flight delay data can be obtained from https://www.kaggle.com/usdot/flight-delays. The authors had no special access privileges to data that others would not have.
Funding: This paper was partially supported by the National Natural Science Foundation of China under Grant No. 12001485, the National Bureau of Statistics of China under Grant No. 2020LY073, and the Characteristic & Preponderant Discipline of Key Construction Universities in Zhejiang Province (Zhejiang Gongshang University-Statistics). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In longitudinal follow-up studies, panel count data are frequently encountered in fields such as medical research, social sciences, reliability studies, and tumorigenicity experiments, and they have been widely analyzed. Such data usually arise from discrete observations of a recurrent event process, because continuous observation may be too expensive to carry out. Thus, only the cumulative numbers of occurrences of the events of interest at these discrete observation times are available.
For the analysis of panel count data, [1, 2] developed regression analysis approaches. [3] studied clustered mixed nonhomogeneous Poisson process models for panel count data. [4] considered spline-based likelihood estimation of the proportional mean model. To describe potential correlations in the recurrent event process, [5–7] developed joint models of panel count data that use frailty terms to capture these correlations. Recently, semiparametric transformation models with informative observation times have been studied by many authors, such as [8–10]. A more comprehensive introduction to this type of data can be found in the book [11].
In general, existing approaches to modeling panel count data rely on the assumption of time-invariant coefficients, which may be violated in practice. In some applications, coefficients may be time-varying, and it is sometimes essential to detect the temporal effects on the recurrent event process. For example, in medical studies, one may be interested in how the effect of a new drug changes over time. Recently, [12, 13] proposed varying coefficient models for recurrent events. However, work on panel count data with varying coefficients is very limited. [14] proposed a partially linear varying coefficient model for panel count data to account for nonlinear interactions between covariates, and [15] proposed a nonparametric proportional mean model for panel count data with time-varying coefficients.
Quantile regression is widely used in the analysis of longitudinal data. It provides more information about the shape of the response distribution and can measure covariate effects at different percentiles of that distribution. However, quantile regression methodology for panel count data is lagging. Because panel count data are discrete, quantile regression cannot be applied directly. Instead, a smoothing technique (“jittering”) is first applied to the data, and quantile regression is then applied to the smoothed data.
In this paper, a semiparametric time-varying coefficient model is formulated. For inference on the unknown functions, a quantile regression method is used for the panel count data, with the unknown functions approximated by B-spline basis functions. Asymptotic results on the convergence of the estimators are also established. The main contribution of the paper is a new spline-based quantile estimation procedure for the time-varying coefficient panel count data model, which, to our knowledge, has not been discussed in the literature.
Model specification
Suppose that n independent subjects are observed over time. Let Ni(t) denote the cumulative number of recurrent events occurring at or before time t for subject i. is a counting process with jumps at the discrete observation times ti,1 < ti,2 < ⋯. We assume that t lies in a fixed interval ℜ of finite length. In addition, there are two follow-up times: the potential censoring time
and the observation endpoint Ti. Thus, only
can be observed in the process, with
.
is assumed to be independent of Ni(t) and
. Let
denote the actual observation process of subject i, and
, i = 1, ⋯, n. Then Ni(t) is observed only at the time points where Hi(t) jumps. The total number of observations is defined as
. Let Zi be a p × 1 vector of covariates. The observed data then consist of the independent and identically distributed samples {Hi(t), Ni(t)dHi(t), Ci, δi, Zi; t ≥ 0, i = 1, ⋯, n}.
To describe the possible time-varying effects of covariates on Ni(t), we propose the following time-varying coefficient model.
- (1) Given Zi, the conditional mean function of Ni(t) is
$$E\{N_i(t) \mid Z_i\} = \int_0^t \lambda_0(u) \exp\{Z_i^\top \beta(u)\}\, du, \qquad (1)$$
where λ0(u) is an unspecified smooth baseline intensity function, and β(u) is an unknown p × 1 vector of time-varying regression coefficients.
- (2) Conditional on Zi,
are mutually independent.
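The mean function in (1) is an integral over time, and the estimation procedure later approximates such integrals by Gauss-Legendre quadrature. The minimal sketch below assumes the proportional form E{Ni(t) | Zi} = ∫_0^t λ0(u) exp{Zi⊤β(u)} du for a scalar covariate, and borrows the Case I functions from the simulation section; the helper names `gauss_legendre_integral` and `conditional_mean` are hypothetical, and `numpy.polynomial.legendre.leggauss` supplies the nodes and weights on [−1, 1].

```python
import numpy as np

def gauss_legendre_integral(f, t, S):
    """Approximate the integral of f over [0, t] with an S-point Gauss-Legendre rule.

    numpy returns nodes x_s and weights w_s for [-1, 1]; the change of variables
    u = t (x + 1) / 2 maps the nodes to [0, t] and rescales the weights by t / 2.
    """
    x, w = np.polynomial.legendre.leggauss(S)
    u = 0.5 * t * (x + 1.0)
    return 0.5 * t * np.sum(w * f(u))

def conditional_mean(t, z, lam0, beta, S=20):
    """E{N(t) | Z = z} = int_0^t lam0(u) exp{z * beta(u)} du for a scalar covariate (p = 1)."""
    return gauss_legendre_integral(lambda u: lam0(u) * np.exp(z * beta(u)), t, S)

# Case I functions: lambda0(u) = 2u + 1 and beta(u) = sin(-pi u / 6).
lam0 = lambda u: 2.0 * u + 1.0
beta = lambda u: np.sin(-np.pi * u / 6.0)

print(conditional_mean(6.0, 0.0, lam0, beta))  # z = 0: approximately int_0^6 (2u + 1) du = 42
print(conditional_mean(6.0, 0.5, lam0, beta))
```

An S-point rule integrates polynomials of degree up to 2S − 1 exactly, so smooth intensities such as these need only a moderate S.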
For the model defined above, [15] developed likelihood and pseudo-likelihood methods to estimate the baseline intensity function λ0(u) and the varying coefficient functions β(u) under a Poisson assumption on Ni(t). In this paper, however, no distributional assumption is imposed, so the existing methods cannot be used. In the next section, a spline-based quantile regression is proposed to estimate the unknown functions. First, the unknown baseline intensity function and the coefficients are approximated by B-splines. Next, the discrete panel count data are made continuous by a smoothing technique. Finally, quantile regression is developed for inference.
Estimation procedure
For inference on Eq (1), the model can be rewritten as
$$E\{N_i(t) \mid Z_i\} = \int_0^t \exp\{X_i^\top \eta(u)\}\, du,$$
where Xi = (Zi⊤, 1)⊤ and η(u) = (β(u)⊤, log{λ0(u)})⊤.
Approximations of baseline and varying coefficients
Similar to [16], we use the basis expansion method to estimate the unknown functions. Suppose ηk(u), k = 1, 2, ⋯, p + 1, can be approximated by a basis expansion, that is
$$\eta_k(u) \approx \sum_{l=1}^{L_k} \gamma_{kl} B_{kl}(u) = B_k(u)^\top \gamma_k,$$
where Bk(u) = (Bk1(u), ⋯, BkLk(u))⊤ are basis functions, γk = (γk1, ⋯, γkLk)⊤ are the corresponding coefficients,
and Lk is the number of basis functions. Various basis functions can be used in the expansion, such as Fourier basis functions, polynomial basis functions and B-spline functions. In this paper, the B-spline basis is used for computational simplicity.
The number of basis functions is Lk = nk + qk + 1, where nk is the number of interior knots and qk is the degree of the B-spline functions. In all simulations and applications, the interior knots are either equally spaced or placed at sample quantiles of the data. Although Lk may differ across k, in this paper we assume that Lk = L and qk = q for all ηk(u), and write Bk(u) = B(u) for simplicity of presentation.
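To make the construction concrete, here is a minimal NumPy sketch of a clamped B-spline basis built with the Cox-de Boor recursion, using equally spaced interior knots (one of the two placements mentioned above). It confirms that nk interior knots and degree q give L = nk + q + 1 basis functions. The helper `bspline_basis` and the coefficient vector are hypothetical illustrations, not the authors' code; SciPy's `BSpline` class is a library alternative.

```python
import numpy as np

def bspline_basis(u, n_interior, degree, lower=0.0, upper=1.0):
    """Evaluate L = n_interior + degree + 1 B-spline basis functions at points u.

    Interior knots are equally spaced on [lower, upper]; boundary knots are
    repeated degree + 1 times (a clamped knot vector).
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    interior = np.linspace(lower, upper, n_interior + 2)[1:-1]
    knots = np.concatenate([np.repeat(lower, degree + 1), interior,
                            np.repeat(upper, degree + 1)])
    # Degree-0 basis: indicator of each half-open knot span [t_j, t_{j+1}).
    basis = np.zeros((u.size, len(knots) - 1))
    for j in range(len(knots) - 1):
        basis[:, j] = (knots[j] <= u) & (u < knots[j + 1])
    # Assign the right endpoint to the last non-degenerate span.
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    basis[u == upper, last] = 1.0
    # Cox-de Boor recursion: raise the degree one step at a time.
    for d in range(1, degree + 1):
        new = np.zeros((u.size, len(knots) - d - 1))
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                new[:, j] += (u - knots[j]) / left_den * basis[:, j]
            if right_den > 0:
                new[:, j] += (knots[j + d + 1] - u) / right_den * basis[:, j + 1]
        basis = new
    return basis  # shape (len(u), n_interior + degree + 1)

# A coefficient function eta(u) = B(u) @ gamma on [0, 6] with cubic splines.
grid = np.linspace(0.0, 6.0, 7)
B = bspline_basis(grid, n_interior=4, degree=3, lower=0.0, upper=6.0)
gamma = np.ones(B.shape[1])  # hypothetical coefficients
eta = B @ gamma              # equals 1 everywhere (partition of unity)
```

Because the basis sums to one at every point, a constant coefficient vector reproduces a constant function, which is a quick sanity check on any implementation.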
Quantile regression
Since quantile regression is a useful alternative to conditional mean models, we consider it for the panel count data model. However, quantile regression cannot be applied directly because the data Ni(t) are discrete. Following [17], the “jittering” method is applied to construct continuous random variables. By adding Uij, generated from the uniform distribution on [0, 1), we have
where the noise Uij is independent of Ni(tij) and Zi. The uniform distribution is used because it allows computational simplifications; however, uniform noise is by no means necessary for jittering the data, and the noise may be generated from any continuous distribution with support on [0, 1). Thus, we obtain the continuous data
and there exists a one-to-one link between the quantiles of Ni(tij) and
. The regression model of
can be written as
where the ϵij are assumed to be independent of tij, with unknown cumulative distribution function (cdf) G(⋅) and density function g(⋅). In addition, the τ-th conditional quantile of ϵij is bτ.
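The jittering step can be illustrated with a minimal sketch, assuming (as described above, following [17]) that a jittered response is the observed count plus an independent Uniform[0, 1) draw. The helper `jitter_counts` and the count values are hypothetical.

```python
import numpy as np

def jitter_counts(counts, rng):
    """Return counts + U, with U ~ Uniform[0, 1) drawn independently of the counts."""
    counts = np.asarray(counts, dtype=float)
    return counts + rng.uniform(0.0, 1.0, size=counts.shape)

rng = np.random.default_rng(2021)
N = np.array([0, 1, 1, 3, 4])    # hypothetical cumulative counts N_i(t_i1), ..., N_i(t_i5)
N_tilde = jitter_counts(N, rng)  # continuous "jittered" responses
# Each jittered value lies in [N, N + 1), so floor() recovers the count
# (barring floating-point rounding when U is extremely close to 1).
print(np.floor(N_tilde))
```

Since no information about the count is lost, the quantiles of the jittered variable can be mapped back to those of the original count, which is the one-to-one link mentioned above.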
The quantile check loss function is defined as ρτ(u) = u[τ − I(u < 0)], τ ∈ (0, 1). Quantile regression is then applied to the smoothed data, and the unknown parameters ϕ = (γ⊤, bτ)⊤ are estimated by minimizing the objective function Ψ(ϕ), that is
where W(u, Xi) = Ip+1 ⊗ B(u) ⋅ Xi and
.
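The role of the check loss ρτ can be seen in its simplest use: minimizing the summed loss over a constant returns a sample τ-quantile. The sketch below uses a plain grid search on hypothetical data for transparency; it is not the optimizer used for Ψ(ϕ).

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0)), for tau in (0, 1)."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

# Minimizing the summed check loss over a constant c yields a sample tau-quantile.
y = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
grid = np.linspace(0.0, 12.0, 1201)
risk = np.array([check_loss(y - c, 0.5).sum() for c in grid])
c_hat = grid[np.argmin(risk)]
print(c_hat)  # the sample median, 3.0 (up to grid rounding)
```

With τ = 0.5 the loss is half the absolute error, so the minimizer is the median; other values of τ tilt the loss and move the minimizer to the corresponding quantile.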
For ease of computation, the Gauss-Legendre formula is used to approximate the integral. Thus, we have
where ωs are the Gauss weights, S is the number of Gauss points and Δs are the Gauss points (nodes). The Gauss-Legendre approximation of the objective function Ψ(ϕ) can then be defined as
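Let ϕ̂ = (γ̂⊤, b̂τ)⊤, with γ̂ = (γ̂1⊤, ⋯, γ̂p+1⊤)⊤, denote the minimizer of the Gauss-Legendre approximation of the objective function Ψ(ϕ). It is natural to estimate the varying coefficient βk(u), k = 1, ⋯, p, by
$$\hat\beta_k(u) = B(u)^\top \hat\gamma_k,$$
and the baseline intensity function λ0(u) by
$$\hat\lambda_0(u) = \exp\{B(u)^\top \hat\gamma_{p+1}\}.$$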
Next, we discuss how to select the tuning parameter L and the number of Gauss points S. As proposed by [16], we use leave-one-subject-out cross-validation (CV) to choose L and S. Let and
denote the estimators obtained from the data with the i-th subject deleted. The leave-one-subject-out CV criterion can then be written as
Thus, the tuning parameters L and S are selected as the values that minimize this CV criterion.
Remark 1 The numbers Lk of basis functions used in the expansions of the ηk may differ from each other. For simplicity, however, we assume Lk = L for all k.
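The leave-one-subject-out loop can be sketched generically. This is a minimal illustration under stated assumptions: the CV criterion is taken to be the check loss summed over each held-out subject's responses, and `fit` and `predict` are hypothetical placeholders (in the paper each candidate would be an (L, S) pair and the fit would be the spline-based quantile estimator). The toy least-squares polynomial fit only exercises the loop.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def loso_cv(subjects, candidates, fit, predict, tau):
    """Leave-one-subject-out CV: returns the best candidate and all CV scores.

    subjects : list of (t_i, y_i) array pairs, one pair per subject
    fit      : fit(train_subjects, candidate, tau) -> fitted model (hypothetical)
    predict  : predict(model, t_i) -> predictions at the held-out times (hypothetical)
    The score of a candidate is the check loss summed over every held-out subject.
    """
    scores = []
    for cand in candidates:
        total = 0.0
        for i, (t_i, y_i) in enumerate(subjects):
            train = subjects[:i] + subjects[i + 1:]  # drop the whole i-th subject
            model = fit(train, cand, tau)
            total += check_loss(y_i - predict(model, t_i), tau).sum()
        scores.append(total)
    return candidates[int(np.argmin(scores))], scores

# Toy stand-in for the spline fit: a least-squares polynomial whose degree is the
# tuning value (tau is unused here; a real fit would minimize the check loss).
def fit_poly(train, degree, tau):
    t = np.concatenate([t_i for t_i, _ in train])
    y = np.concatenate([y_i for _, y_i in train])
    return np.polyfit(t, y, degree)

rng = np.random.default_rng(0)
subjects = []
for _ in range(10):
    t_i = np.sort(rng.uniform(0.0, 6.0, size=4))
    subjects.append((t_i, 2.0 * t_i + rng.normal(0.0, 0.1, size=4)))

best, scores = loso_cv(subjects, [0, 1], fit_poly, np.polyval, tau=0.5)
print(best, scores)
```

Holding out whole subjects, rather than single observations, keeps the within-subject dependence of panel counts out of the validation error.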
Asymptotic results
The asymptotic results are presented in this section. Before presenting them, we introduce some regularity conditions.
- (C1) Zi is uniformly bounded.
- (C2) The observation number mi is bounded by a constant.
- (C3) λ0(u) and βk(u), k = 1, ⋯, p, are l-times differentiable and bounded.
- (C4) There exists an open subset Ω ⊂ ℝ^{pL+1} that contains the true parameter ϕ*. The second derivative matrix ∇²h(tij, Xi;ϕ) of h(tij, Xi;ϕ) with respect to ϕ satisfies
for all ϕ ∈ Ω, with
,
for all j, k.
- (C5)
,
, and 0 < d1 < λmin(Γ) ≤ λmax(Γ) < d2 < ∞, where λmin(Γ) and λmax(Γ) denote the smallest and the largest eigenvalues of Γ.
- (C6) The ϵij are independent with unknown distribution function G(⋅) and density g(⋅). In addition, the τ-th conditional quantile of ϵij is bτ.
Under the above regularity conditions, the asymptotic results on the convergence of the estimators are given in the following theorem. For the proof, we use a lemma on spline functions from [18]. First, define
Let Skn be the space of splines of degree q consisting of functions ηkn satisfying: (i) the restriction of ηkn to each subinterval is a polynomial of degree q; (ii) for q ≥ 1 and 0 ≤ q′ ≤ q − 1, ηkn is q′ times continuously differentiable on the support. In addition, ηk is assumed to satisfy the following regularity condition. Let l1 ∈ [0, q] be a nonnegative integer. The l1-th derivative, denoted as ηk^{(l1)}, exists and satisfies the Lipschitz condition of order v ∈ (0, 1], with ρ = l1 + v > 0.5, that is,
$$|\eta_k^{(l_1)}(s) - \eta_k^{(l_1)}(t)| \le \delta\, |s - t|^{v},$$
for s, t ∈ [0, C], where δ is a positive constant.
Lemma 1 There exists ηkn ∈ Skn such that ‖ηkn − ηk‖2 = Op(L^{−ρ} + L^{1/2} m^{−1/2}). If L = O{m^{1/(2ρ+1)}}, then ‖ηkn − ηk‖2 = Op{(L/m)^{1/2}} = Op{m^{−ρ/(2ρ+1)}}.
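The rate in Lemma 1 comes from balancing the two terms. A short worked check, assuming L is taken of exact order m^{1/(2ρ+1)} (written L ≍ m^{1/(2ρ+1)}), which is what makes the approximation term L^{−ρ} no larger than the term L^{1/2}m^{−1/2}:

```latex
L \asymp m^{1/(2\rho+1)}
\;\Longrightarrow\;
L^{-\rho} \asymp m^{-\rho/(2\rho+1)},
\qquad
L^{1/2} m^{-1/2} = \left(\frac{L}{m}\right)^{1/2}
\asymp \left(m^{\frac{1}{2\rho+1}-1}\right)^{1/2}
= m^{-\frac{\rho}{2\rho+1}} .
```

Both terms are therefore of order m^{−ρ/(2ρ+1)}; a smaller L would inflate L^{−ρ}, while a larger L would inflate (L/m)^{1/2}.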
Theorem 1 Suppose that conditions (C1)–(C6) hold. If L = O{m^{1/(2ρ+1)}}, then we have
Ignoring the approximation error of the B-spline basis approximation of βk(u), k = 1, ⋯, p, we obtain the 100(1 − α)% pointwise confidence interval of βk(u) at quantile τ,
where z_{α/2} is the upper α/2 quantile, that is, the 100(1 − α/2)% percentile, of the standard normal distribution and
. A pointwise confidence interval for the baseline function λ0(u) can be constructed similarly.
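Once standard errors are available on a grid of u values (from the asymptotic variance, or from the bootstrap as in the simulations below), the pointwise interval is estimate ± z_{α/2} × standard error. A minimal sketch with hypothetical estimates and standard errors; `pointwise_ci` is an illustrative helper, and `statistics.NormalDist` from the Python standard library supplies the normal quantile.

```python
import numpy as np
from statistics import NormalDist

def pointwise_ci(estimate, std_error, alpha=0.05):
    """Normal-approximation interval estimate -/+ z_{alpha/2} * std_error, pointwise."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 quantile of N(0, 1)
    estimate = np.asarray(estimate, dtype=float)
    std_error = np.asarray(std_error, dtype=float)
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical estimates of beta_k(u) and their standard errors at three grid points.
beta_hat = np.array([-0.10, -0.45, -0.80])
se = np.array([0.05, 0.06, 0.08])
lower, upper = pointwise_ci(beta_hat, se)
```

For α = 0.05 the multiplier is about 1.96.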
Simulation studies
Three simulation studies are carried out to evaluate the performance of the proposed method. We generate 200 datasets from the time-varying coefficient model, each with n = 100 or 200 independent subjects. For each subject i, the observation endpoint Ti is set to 6 and the censoring time follows the uniform distribution on [Ti/2, 3Ti/2]. The number of observation times mi is generated from the discrete uniform distribution on {1, 2, 3, 4, 5}. The observation times,
, are the order statistics of a random sample of size mi from the uniform distribution over (0, Ci). Given mi and
, the panel count data Ni(tij) can be obtained by the following formula
for j = 1, ⋯, mi and i = 1, ⋯, n.
is a random number generated from the Poisson distribution with mean
The following three cases are considered:
- Case I: p = 1 and the covariate Zi is generated independently from the [0, 1] uniform distribution. The baseline function is taken as λ0(u) = 2u + 1 and the varying coefficient β(u) = sin(−πu/6).
- Case II: p = 1 and the covariate Zi is generated independently from the [0, 1] uniform distribution. The baseline function is taken as λ0(u) = 2(u + τ) and the varying coefficient β(u) = sin(−τπu/6).
- Case III: p = 2 and the covariates Zi are generated from the uniform distribution on [0, 1]² with correlation cor(Zik, Zil) = 0.5^{|k−l|}. The baseline function is taken as λ0(u) = 2u + 1, and the varying coefficients are β1(u) = sin(−πu/6) and β2(u) = 2sin(−τπu/6).
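The Case I design can be sketched as a data generator. The exact formula for Ni(tij) and the Poisson mean did not survive extraction, so this sketch assumes independent Poisson increments over (ti,j−1, tij], with ti0 = 0, whose means are integrals of λ0(u) exp{Zi β(u)}; this makes Ni(t) a nonhomogeneous Poisson process with the proportional mean function of model (1). The function names are hypothetical.

```python
import numpy as np

def integrate(f, a, b, S=20):
    """S-point Gauss-Legendre approximation of the integral of f over [a, b]."""
    x, w = np.polynomial.legendre.leggauss(S)
    u = a + 0.5 * (b - a) * (x + 1.0)
    return 0.5 * (b - a) * np.sum(w * f(u))

def simulate_case1(n, rng, T=6.0):
    """Simulate n subjects under Case I (p = 1) with an assumed Poisson-increment mechanism."""
    lam0 = lambda u: 2.0 * u + 1.0
    beta = lambda u: np.sin(-np.pi * u / 6.0)
    subjects = []
    for _ in range(n):
        z = rng.uniform(0.0, 1.0)                # covariate Z_i ~ U[0, 1]
        c = rng.uniform(T / 2.0, 3.0 * T / 2.0)  # censoring time C_i ~ U[T/2, 3T/2]
        m = rng.integers(1, 6)                   # m_i uniform on {1, ..., 5}
        t = np.sort(rng.uniform(0.0, c, size=m))  # observation times in (0, C_i)
        edges = np.concatenate([[0.0], t])        # t_i0 = 0, t_i1, ..., t_im
        rate = lambda u: lam0(u) * np.exp(beta(u) * z)  # intensity given Z_i = z
        # Assumed: independent Poisson increments with mean equal to the integral
        # of the rate over (t_{i,j-1}, t_ij]; cumulative sums give N_i(t_ij).
        increments = [rng.poisson(integrate(rate, a, b))
                      for a, b in zip(edges[:-1], edges[1:])]
        subjects.append({"z": z, "C": c, "t": t, "N": np.cumsum(increments)})
    return subjects

data = simulate_case1(100, np.random.default_rng(2021))
```

Cases II and III only change λ0, β and the covariate distribution, so they fit the same template.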
To estimate the smooth functions log λ0(u) and β(u), cubic B-spline functions are used. Under quantiles τ ∈ {0.25, 0.5, 0.75}, the estimation results for Cases I–III with sample size n = 100 or 200 are presented in Tables 1–3, respectively. The results include the average absolute bias over 100 grid points (BIAS), the average sampling standard error over 100 grid points (SSE), the average bootstrap standard error over 100 grid points (BSE), and the average estimated coverage probability of the 95% confidence intervals over 100 grid points (CP). The estimators are approximately unbiased under the different quantiles. The SSE and BSE values are close and decrease as the sample size n increases. In addition, the CP results suggest that the Gaussian approximation is appropriate for the estimators.
Figs 1–3 display the estimated curves of the unknown functions log λ0(t) and β(t) with n = 200. In the figures, the point lines represent the estimated curves, the solid lines the true curves, and the dotted lines the 95% confidence intervals. The true and estimated curves are very close, which indicates that the B-spline estimators of the unknown functions work well. The simulation results show that the estimates of log λ0(t) and β(t) are reasonable under different quantiles.
Applications
Bladder cancer data
The bladder cancer data were collected by the Veterans Administration Cooperative Urological Research Group. In this study, 85 patients were randomly assigned to two treatment groups: a placebo group (47 patients) and a thiotepa group (38 patients). For each patient, the observation times and the cumulative numbers of bladder tumors occurring at or before these times were recorded. The observation endpoint is 53 months. In addition, the initial number of bladder tumors and the size of the largest initial tumor were recorded for each patient. This dataset has been discussed by many authors, such as [5, 7, 19]. However, a time-varying coefficient panel count data model has not been considered for it.
To describe the temporal effects of the covariates in the bladder cancer data, the proposed time-varying coefficient model is applied. For each patient i, let Ni(t) denote the cumulative number of bladder tumors occurring up to time t, and let Hi(t) denote the cumulative number of observations up to time t, i = 1, ⋯, 85. Furthermore, let Zi1 = 1 if patient i belongs to the thiotepa group and Zi1 = 0 otherwise, let Zi2 be the initial number of tumors, and let Zi3 be the natural logarithm of the largest initial tumor size plus 1 for patient i. Therefore, we have the model
$$E\{N_i(t) \mid Z_i\} = \int_0^t \lambda_0(u) \exp\{\beta_1(u) Z_{i1} + \beta_2(u) Z_{i2} + \beta_3(u) Z_{i3}\}\, du.$$
Quantile regression estimation is then applied to these data. In each of 200 repetitions, 100 samples are drawn from the data for estimation. As in the numerical studies, the unknown functions λ0(t) and βk(t), k = 1, 2, 3, are approximated by cubic B-spline functions. The estimation is carried out at quantiles τ ∈ {0.25, 0.5, 0.75}.
The estimated curves of log λ0(t) and βk(t), k = 1, 2, 3, are displayed in Fig 4. In general, thiotepa treatment is negatively associated with the tumor recurrence rate at different quantiles: patients in the thiotepa group tend to have a lower tumor recurrence rate than those in the placebo group. The initial number of tumors is positively associated with the recurrence rate, and the largest initial tumor size is negatively associated with it. These conclusions are consistent with [19]. Furthermore, the covariate effects vary over the observation period and differ across quantiles. Thus, quantile regression for the time-varying coefficient panel count data model provides more information than the analyses in the existing literature.
US flight delay data
In this subsection, the 2015 US flight delay data (available from https://www.kaggle.com/usdot/flight-delays) are analyzed with the time-varying coefficient panel count data model. The dataset was collected from the U.S. Department of Transportation’s (DOT) monthly Air Travel Consumer Report, which contains information about the numbers of on-time, delayed, canceled, and diverted flights. The dataset includes 9794 flights observed during 3 months of 2015. The number of delays of each flight is recorded between observation times. All flights share the same observation times, with an observation interval of 7 days. In addition, the average departure delay time and the average flight distance of each flight are recorded.
To describe the temporal effects of the covariates on flight delays, the proposed time-varying coefficient model is applied to these data. For each flight i, let Ni(t) denote the cumulative number of delays that occurred up to time t, and let Hi(t) denote the cumulative number of observations up to time t, i = 1, ⋯, 9794. Furthermore, let Zi1 be the average departure delay time and Zi2 the average distance of flight i. Therefore, we have the model
$$E\{N_i(t) \mid Z_i\} = \int_0^t \lambda_0(u) \exp\{\beta_1(u) Z_{i1} + \beta_2(u) Z_{i2}\}\, du.$$
Spline-based quantile estimation is then applied to these data. Similarly, the unknown functions λ0(t) and βk(t), k = 1, 2, are approximated by cubic B-spline functions, and the estimation is carried out at quantiles τ ∈ {0.25, 0.5, 0.75}.
Because the dataset is large, reading the entire dataset into memory may be time-consuming or even impossible, and direct analysis can be infeasible because of memory or computing-time limits. To handle massive data, parallel computing methods have been developed in [20, 21]. In the parallel computing method, the original dataset is first split into a family of disjoint subsample blocks of equal size. More precisely, the data structure is
$$S = S_1 \cup S_2 \cup \cdots \cup S_K, \qquad S_k \cap S_l = \emptyset \quad (k \neq l),$$
where the original dataset S of size n = K × m is partitioned into K subsample blocks Sk, each consisting of m samples randomly drawn from S.
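The random partition into K disjoint, equal-size blocks and the per-block estimation can be sketched as follows. The combination formula of [21] used in the paper is not reproduced here; as a labeled assumption, this sketch combines the block estimates by a plain average, a common divide-and-conquer choice. `random_blocks`, `divide_and_conquer` and the data are hypothetical, and `np.median` stands in for the per-block spline-based quantile estimator.

```python
import numpy as np

def random_blocks(n, K, rng):
    """Randomly partition the indices 0, ..., n-1 into K disjoint blocks of size m = n / K."""
    if n % K != 0:
        raise ValueError("n must equal K * m for equal-size blocks")
    return np.split(rng.permutation(n), K)

def divide_and_conquer(data, K, estimator, rng):
    """Apply `estimator` to each block and combine the K block estimates.

    The combination rule here is a plain average, an assumption made for
    illustration; it is not necessarily the aggregation used with [21].
    """
    blocks = random_blocks(len(data), K, rng)
    block_estimates = np.array([estimator(data[idx]) for idx in blocks])
    return block_estimates.mean(axis=0), blocks

rng = np.random.default_rng(7)
sample = rng.normal(loc=5.0, scale=1.0, size=10_000)  # hypothetical full dataset S
estimate, blocks = divide_and_conquer(sample, 10, np.median, rng)
```

Each block fits in memory on its own, and the K fits are independent, so they can run in parallel.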
The proposed estimation procedure can then be applied to each block Sk, k = 1, ⋯, K, and the estimates of the unknown parameters for block Sk are denoted as . Following the method introduced in [21], the final full-sample estimators are obtained by
The estimated curves of log λ0(t) and βk(t), k = 1, 2, under quantiles τ ∈ {0.25, 0.5, 0.75} are displayed in Fig 5. Fig 5 shows that the departure delay time is positively associated with the cumulative number of flight delays. Moreover, the effect of the departure delay time varies over time and differs across quantiles. However, the effect of flight distance on the number of flight delays is not significant.
Concluding remarks
In this paper, we proposed a spline-based quantile regression estimation method for the time-varying coefficient panel count data model. The model is more general than that of [15], as it imposes no Poisson restriction on the recurrent event process. To obtain the estimators, B-splines are first used to approximate the unknown functions log λ0(t) and β(t), and a smoothing technique is then applied to make the discrete panel count data continuous. Finally, the spline-based quantile regression approach is developed at different quantiles. Simulations are presented to evaluate the performance of the proposed approach, and two applications are analyzed to demonstrate its effectiveness.
Recently, the Enron e-mail corpus, a massive set of e-mail messages, has been discussed by many authors, such as [22]. If one is interested in the number of interactions between all pairs of individuals in such longitudinal observations and, as is usual in network analysis, models the longitudinal network through snapshots, the result is a standard panel count dataset with massive observations. Furthermore, in this paper we only considered low-dimensional covariates, which may be restrictive in applications. Since high-dimensional covariates may be present, variable selection methods can be considered for the time-varying coefficient model. This will be an important topic for our future studies. In addition, reliability data and traffic data have been studied by many authors, such as [23–26], and it would be interesting to study quantile regression estimation for such data.
Proof of Theorem 1
Define as the true but unknown values of
,
,
,
, ϕ = (γ⊤, bτ)⊤ and
By a Taylor expansion, rij = op(1). Moreover,
where
and
is between ϕ* and
. Define
By the identity of [27],
Hence, we obtain
Thus, ΔH can be denoted as ΔH = ΔH1 + ΔH2, with
By calculating the expectation and variance of ΔH2,
By condition (C5), E[{∇h(tij, Xi;ϕ*)}⊗2] = Γ, so we have
Next, we calculate the variance of ΔH2,
Hence, we can have . Before discussing ΔH1, we first define
.
Then E(κ) = 0 and Var(κ) = τ(1 − τ)Γ. By the Cramér-Wold device and the central limit theorem, κ →d N{0, τ(1 − τ)Γ}.
Next, we define
so that
. By simple calculation, we have
Thus κ1 →p κ. By Slutsky’s theorem, κ1 →d N{0, τ(1 − τ)Γ}. Then, we can have that
By the epi-convergence results of [28], . Finally, the asymptotic normality is proved
.
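For reference, the identity of Knight [27] invoked in the proof is usually stated in the following generic form, for real u and v, with ψτ(u) = τ − I(u < 0):

```latex
\rho_\tau(u - v) - \rho_\tau(u)
  = -v\,\psi_\tau(u) + \int_0^{v} \{ I(u \le s) - I(u \le 0) \}\, ds,
\qquad \psi_\tau(u) = \tau - I(u < 0).
```

The first term is linear in v and has mean zero when u has τ-th quantile 0, which typically yields the asymptotically normal linear part; the integral term is always nonnegative and yields the quadratic part of the limiting objective.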
References
- 1. Diggle PJ, Liang KY, Zeger SL. The analysis of longitudinal data. Oxford University Press, New York; 1994.
- 2. Sun Y. Estimation of semiparametric regression model with longitudinal data. Lifetime Data Analysis. 2010;16(2):271–298. pmid:19890712
- 3. Nielsen JD, Dean CB. Clustered mixed nonhomogeneous Poisson process spline models for the analysis of recurrent event panel data. Biometrics. 2008;64(3):751–761. pmid:18047528
- 4. Lu M, Zhang Y, Huang J. Semiparametric estimation methods for panel count data using monotone B-splines. Journal of the American Statistical Association. 2009;104(487):1060–1070.
- 5. He X, Tong X, Sun J. Semiparametric analysis of panel count data with correlated observation and follow-up times. Lifetime Data Analysis. 2009;15(2):177–196. pmid:19082711
- 6. Zhao X, Tong X. Semiparametric regression analysis of panel count data with informative observation times. Computational Statistics and Data Analysis. 2011;55(1):291–300.
- 7. Zhao X, Tong X, Sun J, Azen SP. Robust estimation for panel count data with informative observation times. Computational Statistics and Data Analysis. 2013;57(1):33–40.
- 8. Li N, Sun L, Sun J. Semiparametric transformation models for panel count data with dependent observation process. Statistics in Biosciences. 2010;2(22):191–210.
- 9. Li N. Semiparametric transformation models for panel count data. University of Missouri–Columbia; 2011.
- 10. Li N, Zhao H, Sun J. Semiparametric transformation models for panel count data with correlated observation and follow-up times. Statistics in Medicine. 2013;32(17):3039–3054. pmid:23297190
- 11. Sun J, Zhao X. Statistical analysis of panel count data. Springer, New York; 2013.
- 12. Chiang CT, Wang MC. Varying-coefficient model for the occurrence rate function of recurrent events. Annals of the Institute of Statistical Mathematics. 2009;61(1):197–213.
- 13. Sun L, Zhou X, Guo S. Marginal regression models with time-varying coefficients for recurrent event data. Statistics in Medicine. 2011;30(18):2265–2277. pmid:21590791
- 14. He X, Feng X, Tong X, Zhao X. Semiparametric partially linear varying coefficient models with panel count data. Lifetime Data Analysis. 2016;23(3):1–28. pmid:27118299
- 15. Zhao H, Tu W, Yu Z. A nonparametric time-varying coefficient model for panel count data. Journal of Nonparametric Statistics. 2018;30(3):640–661.
- 16. Huang JZ, Wu CO, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002;89(1):111–128.
- 17. Machado JAF, Santos Silva JMC. Quantiles for counts. Journal of the American Statistical Association. 2005;100(472):1226–1237.
- 18. Guo J, Tang M, Tian M, Zhu K. Variable selection in high-dimensional partially linear additive models for composite quantile regression. Computational Statistics and Data Analysis. 2013;65(9):56–67.
- 19. Sun J, Wei LJ. Regression analysis of panel count data with covariate-dependent observation and censoring times. Journal of the Royal Statistical Society: Series B. 2000;62(2):293–302.
- 20. Fan TH, Cheng KF. Tests and variables selection on regression analysis for massive datasets. Data & Knowledge Engineering. 2007;63(3):811–819.
- 21. Liquet B, Saracco J. BIG-SIR: a sliced inverse regression approach for massive data. Statistics and its Interface. 2016;9(4):509–520.
- 22. Perry PO, Wolfe PJ. Point process modelling for directed interaction networks. Journal of the Royal Statistical Society: Series B. 2013;75(5):821–849.
- 23. Xu A, Zhou S, Tang Y. A unified model for system reliability evaluation under dynamic operating conditions. IEEE Transactions on Reliability. 2021;70(1):65–72.
- 24. Chen F, Chen S, Ma X. Crash frequency modeling using real-time environmental and traffic data and unbalanced panel data models. International Journal of Environmental Research and Public Health. 2016;13(6):609. pmid:27322306
- 25. Chen F, Chen S, Ma X. Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. Journal of Safety Research. 2018;65:153–159. pmid:29776524
- 26. Dong B, Ma X, Chen F, Chen S. Investigating the differences of single-vehicle and multivehicle accident probability using mixed logit model. Journal of Advanced Transportation. 2018;2018:2702360.
- 27. Knight K. Limiting distributions for L1 regression estimators under general conditions. Annals of Statistics. 1998;26(2):755–770.
- 28. Knight K, Fu W. Asymptotics for lasso-type estimators. Annals of Statistics. 2000;28(5):1356–1378.