Integrating preliminary test and Stein-type techniques to improve estimation in the time-dependent Cox model

Rohollah Ramezani; Mohammad Reza Rabiei; Mohammad Arashi

doi:10.1371/journal.pone.0345123

Abstract

While shrinkage and preliminary test estimation have long been studied in linear and static Cox models, their theoretical integration within models featuring time-dependent covariates has remained unresolved due to the evolving risk set and nonhomogeneous information accumulation inherent in such data. In this study, we develop a unified framework for shrinkage estimation in the time-dependent Cox proportional hazards model by extending the classical Stein-type theory to a dynamic semiparametric survival setting. Our theoretical analyses reveal that the positive-rule Stein estimator preserves unbiasedness under valid restrictions while adaptively attenuating variance inflation when the restriction is approximately correct, striking a principled balance between efficiency and robustness. A comprehensive Monte Carlo simulation study and an empirical application to the Mayo Clinic primary biliary cirrhosis dataset substantiate the theoretical advantages, demonstrating that the positive-rule Stein estimator achieves substantial efficiency gains relative to both unrestricted and penalized estimators such as adaptive LASSO.

Citation: Ramezani R, Rabiei MR, Arashi M (2026) Integrating preliminary test and Stein-type techniques to improve estimation in the time-dependent Cox model. PLoS One 21(6): e0345123. https://doi.org/10.1371/journal.pone.0345123

Editor: Zakariya Yahya Algamal, University of Mosul, IRAQ

Received: February 26, 2026; Accepted: May 16, 2026; Published: June 8, 2026

Copyright: © 2026 Ramezani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are publicly available through the R package “survival” (Therneau & Lumley, 2015), which includes the Mayo Clinic Primary Biliary Cirrhosis (PBC) dataset. The package is accessible at https://cran.r-project.org/package=survival.

Funding: Mohammad Arashi received partial funding from the Iran National Science Foundation (INSF), Grant No. 4015320 (https://insf.org). The funder had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors declare no conflict of interest.

Introduction

Since the pioneering work of Cox (1972) [1], the proportional hazards model has become a foundational tool for survival analysis. Its semiparametric formulation enables flexible assessment of covariate effects on event times, yet the classical model assumes time-invariant effects—an assumption often violated in biomedical studies where risk factors evolve dynamically. Time-dependent extensions of the Cox model—originally formalized via the counting-process and martingale framework by Andersen and Gill (1982) [2] and subsequently advanced by Fleming and Harrington (2005) [3] and Andersen et al. (1995) [4]—provide a rigorous theoretical foundation for estimation and inference in the presence of dynamic covariates.

Although the partial likelihood estimator is consistent and asymptotically normal under standard regularity conditions [2], its finite-sample efficiency may deteriorate in the presence of weak signals, multicollinearity, or when credible prior information about linear constraints is available. Shrinkage methods offer a principled strategy for improving efficiency by introducing controlled bias. Originating from the seminal contributions of Stein (1956) [5] and James and Stein (1992) [6], this idea has stimulated extensive research across linear, ridge-type, robust, and high-dimensional settings [7–11]. A modern perspective on shrinkage and its empirical Bayes interpretation is given by Efron (2024) [12].

Despite these advances, shrinkage estimation for time-dependent Cox models remains comparatively underdeveloped. Existing work has largely focused on time-independent covariates, where risk sets and information accumulation are more tractable. In contrast, time-dependent covariates introduce evolving risk structures, nonhomogeneous information processes, and more intricate local asymptotic behavior.

While asymptotic properties under local alternatives have been studied in classical linear models and Cox models with time-independent covariates [13], existing results typically do not characterize the joint behavior of restricted and unrestricted estimators in the presence of time-dependent covariates. Moreover, the integration of such asymptotic frameworks with Stein-type shrinkage estimation remains limited. These considerations highlight the need for a unified treatment of shrinkage estimation in this semiparametric setting.

Parallel developments in variable selection for Cox models—ranging from classical stepwise procedures [13] to modern penalized likelihood methods such as LASSO [14], adaptive LASSO, Elastic Net (ENET), SCAD [15], and group LASSO [16]—have demonstrated excellent empirical performance, especially in correlated or moderately high-dimensional data. However, penalized estimators do not naturally incorporate approximate linear restrictions, nor do they exhibit the adaptive risk-balancing behavior characteristic of Stein-type methods.

This paper addresses this gap by developing a unified shrinkage framework for the time-dependent Cox proportional hazards model. We introduce three improved estimators—the Preliminary Test Estimator (PTE), the Stein-type Estimator (SE), and the Positive-Rule Stein Estimator (PSE)—and derive their asymptotic distributions under local alternatives using counting-process theory. Closed-form expressions for asymptotic bias, risk, and mean squared error are obtained, clarifying the conditions under which shrinkage provides meaningful efficiency gains relative to unrestricted partial likelihood and penalized estimators.

The theoretical results are complemented with extensive Monte Carlo experiments and a real-data analysis of the Mayo Clinic PBC cohort. Across diverse scenarios, the PSE achieves the most favorable bias–variance trade-off and often outperforms both the unrestricted estimator and modern penalized approaches such as adaptive LASSO.

The remainder of the paper is structured as follows. The section “The time-dependent Cox model” introduces the time-dependent Cox framework and outlines the associated asymptotic properties. “Shrinkage estimation under the time-dependent Cox model” presents the proposed shrinkage estimators and establishes their theoretical results. “Simulation study” reports Monte Carlo evidence evaluating finite-sample performance, whereas “Application to the Mayo Clinic PBC data” demonstrates the practical applicability of the proposed methodology using real-world data. Finally, “Conclusions” summarizes the main findings and discusses potential directions for future research.

The time-dependent Cox model

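Let $T_i$ and $C_i$ denote the event and censoring times for subject $i$, with observed time $t_i = \min(T_i, C_i)$ and event indicator $\delta_i = I(T_i \le C_i)$, $i = 1, \dots, n$. The covariate process is denoted by $\mathbf{Z}_i(t) \in \mathbb{R}^p$, allowing the predictors to vary over time. Let $Y_i(t)$ denote the at-risk indicator process.

The time-dependent Cox model specifies the hazard function as

$$h_i(t \mid \mathbf{Z}_i(t)) = h_0(t)\exp\{\boldsymbol{\beta}^{\top}\mathbf{Z}_i(t)\}, \tag{1}$$

where $h_0(t)$ is an unspecified baseline hazard and $\boldsymbol{\beta} \in \Theta \subset \mathbb{R}^p$ is a vector of regression coefficients, with $\Theta$ an open parameter space.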

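To establish the asymptotic properties of the proposed estimators, we adopt regularity conditions analogous to those of Andersen and Gill (1982) [2], rewritten in terms of the notation used in this paper. Throughout, $\mathbf{a}^{\otimes k}$ denotes the $k$-fold tensor product, so that $\mathbf{a}^{\otimes 0} = 1$, $\mathbf{a}^{\otimes 1} = \mathbf{a}$, and $\mathbf{a}^{\otimes 2} = \mathbf{a}\mathbf{a}^{\top}$.

(C1) Finite cumulative baseline hazard. The baseline hazard function satisfies
$$\int_0^{\tau} h_0(t)\,dt < \infty$$
for some finite $\tau > 0$.
(C2) Stability of risk processes. Define
$$S^{(k)}(\boldsymbol{\beta}, t) = \frac{1}{n}\sum_{i=1}^{n} Y_i(t)\,\mathbf{Z}_i(t)^{\otimes k}\exp\{\boldsymbol{\beta}^{\top}\mathbf{Z}_i(t)\}.$$
There exist a neighborhood $\mathcal{B}$ of the true parameter value and deterministic limit functions $s^{(k)}(\boldsymbol{\beta}, t)$ such that
$$\sup_{t \in [0,\tau],\ \boldsymbol{\beta} \in \mathcal{B}} \bigl\| S^{(k)}(\boldsymbol{\beta}, t) - s^{(k)}(\boldsymbol{\beta}, t) \bigr\| \xrightarrow{p} 0$$
for k = 0,1,2.
(C3) Lindeberg-type condition. For some $\delta > 0$,
$$n^{-1/2}\sup_{i,\,t}\ \|\mathbf{Z}_i(t)\|\,Y_i(t)\,I\bigl\{\boldsymbol{\beta}^{\top}\mathbf{Z}_i(t) > -\delta\,\|\mathbf{Z}_i(t)\|\bigr\} \xrightarrow{p} 0,$$
with $\boldsymbol{\beta}$ evaluated at its true value.
(C4) Regularity and boundedness. The limit functions $s^{(k)}(\boldsymbol{\beta}, t)$ are continuous in $\boldsymbol{\beta}$ uniformly in $t \in [0, \tau]$, and bounded. Moreover, $s^{(0)}(\boldsymbol{\beta}, t)$ is bounded away from zero.
Define
$$e(\boldsymbol{\beta}, t) = \frac{s^{(1)}(\boldsymbol{\beta}, t)}{s^{(0)}(\boldsymbol{\beta}, t)}, \qquad v(\boldsymbol{\beta}, t) = \frac{s^{(2)}(\boldsymbol{\beta}, t)}{s^{(0)}(\boldsymbol{\beta}, t)} - e(\boldsymbol{\beta}, t)^{\otimes 2}.$$
Then the matrix
$$\Sigma = \int_0^{\tau} v(\boldsymbol{\beta}, t)\,s^{(0)}(\boldsymbol{\beta}, t)\,h_0(t)\,dt$$
is positive definite.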

Partial likelihood

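Estimation of $\boldsymbol{\beta}$ is based on the partial likelihood [2,4]:

$$L(\boldsymbol{\beta}) = \prod_{i:\,\delta_i = 1} \frac{\exp\{\boldsymbol{\beta}^{\top}\mathbf{Z}_i(t_i)\}}{\sum_{j \in R(t_i)} \exp\{\boldsymbol{\beta}^{\top}\mathbf{Z}_j(t_i)\}}, \tag{2}$$

where $R(t_i) = \{j : Y_j(t_i) = 1\}$ is the risk set at time $t_i$. Maximizing the full likelihood with respect to the baseline cumulative hazard $H_0(t)$ shows that the profile likelihood is proportional to (2) [17].

In the presence of tied event times, Efron’s approximation [18] provides an accurate and computationally efficient modification:

$$L_E(\boldsymbol{\beta}) = \prod_{i} \frac{\exp\bigl\{\sum_{j \in D(t_i)} \boldsymbol{\beta}^{\top}\mathbf{Z}_j(t_i)\bigr\}}{\prod_{l=0}^{d_i - 1}\Bigl[\sum_{j \in R(t_i)} \exp\{\boldsymbol{\beta}^{\top}\mathbf{Z}_j(t_i)\} - \frac{l}{d_i}\sum_{j \in D(t_i)} \exp\{\boldsymbol{\beta}^{\top}\mathbf{Z}_j(t_i)\}\Bigr]}, \tag{3}$$

where the product runs over the distinct event times, $D(t_i)$ is the set of individuals failing at time $t_i$ and $d_i = |D(t_i)|$.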
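To make the tie handling concrete, the following is a minimal sketch of how an Efron-corrected log–partial likelihood could be evaluated on counting-process data. The function name, the row layout `(start, stop, status, z)`, and the convention that a row is at risk at time t when start < t ≤ stop are illustrative assumptions, not the authors' implementation.

```python
import math

def efron_log_partial_likelihood(rows, beta):
    """Efron-corrected log partial likelihood for counting-process rows.

    rows: list of (start, stop, status, z); z holds the covariate values
          valid on the interval (start, stop], status is 1 for an event at stop.
    beta: list of regression coefficients, same length as z.
    """
    def eta(z):
        # linear predictor beta' z
        return sum(b * v for b, v in zip(beta, z))

    # distinct event times
    event_times = sorted({stop for (_, stop, status, _) in rows if status == 1})
    loglik = 0.0
    for t in event_times:
        # rows at risk at time t: start < t <= stop
        at_risk = [r for r in rows if r[0] < t <= r[1]]
        # rows that fail exactly at t
        failures = [r for r in at_risk if r[1] == t and r[2] == 1]
        d = len(failures)
        risk_sum = sum(math.exp(eta(r[3])) for r in at_risk)
        fail_sum = sum(math.exp(eta(r[3])) for r in failures)
        loglik += sum(eta(r[3]) for r in failures)
        # Efron: remove a fraction l/d of the tied failures from the risk sum
        for l in range(d):
            loglik -= math.log(risk_sum - (l / d) * fail_sum)
    return loglik
```

With no ties (d = 1 at every event time) the inner loop reduces to the usual log of the risk-set sum, so the same function also evaluates the untied partial likelihood.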

Semiparametric estimation

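Given the conditional hazard function in (1), inference for $\boldsymbol{\beta}$ is based on the partial likelihood with Efron’s correction for tied event times [3,18,19]. Let $\ell(\boldsymbol{\beta})$ denote the corresponding log–partial likelihood.

The score function and observed information matrix are then

$$U(\boldsymbol{\beta}) = \frac{\partial \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}, \qquad \mathcal{I}(\boldsymbol{\beta}) = -\frac{\partial^{2} \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}^{\top}}.$$

The unrestricted estimator (UE), $\hat{\boldsymbol{\beta}}^{\mathrm{UE}}$, is the maximizer of the partial likelihood, computed via the Newton–Raphson/Fisher–scoring iteration

$$\boldsymbol{\beta}^{(m+1)} = \boldsymbol{\beta}^{(m)} + \mathcal{I}(\boldsymbol{\beta}^{(m)})^{-1}\,U(\boldsymbol{\beta}^{(m)}), \qquad m = 0, 1, 2, \dots$$

The asymptotic theory of [2] ensures that, under regularity conditions (C1–C4), $\sqrt{n}\,(\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \boldsymbol{\beta}) \xrightarrow{d} N_p(\mathbf{0}, \Sigma^{-1})$ and that the observed information satisfies $n^{-1}\mathcal{I}(\hat{\boldsymbol{\beta}}^{\mathrm{UE}}) \xrightarrow{p} \Sigma$, where $\Sigma$ is the positive definite limiting information matrix.

However, in many applications, partial prior information on the parameter may be available in the form of linear constraints $H_0: H\boldsymbol{\beta} = h$, where H is a q × p full–rank matrix and $h$ is a known $q$-vector. When $H_0$ holds, the restricted estimator (RE) is obtained by projecting the UE onto the constrained subspace:

$$\hat{\boldsymbol{\beta}}^{\mathrm{RE}} = \hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \mathcal{I}^{-1}H^{\top}\bigl(H\mathcal{I}^{-1}H^{\top}\bigr)^{-1}\bigl(H\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - h\bigr), \qquad \mathcal{I} = \mathcal{I}(\hat{\boldsymbol{\beta}}^{\mathrm{UE}}).$$

Under $H_0$, and using the convergence $n^{-1}\mathcal{I}(\hat{\boldsymbol{\beta}}^{\mathrm{UE}}) \xrightarrow{p} \Sigma$, it follows from Slutsky’s theorem that

$$\sqrt{n}\,(\hat{\boldsymbol{\beta}}^{\mathrm{RE}} - \boldsymbol{\beta}) \xrightarrow{d} N_p(\mathbf{0}, \Sigma_R),$$

where

$$\Sigma_R = \Sigma^{-1} - \Sigma^{-1}H^{\top}\bigl(H\Sigma^{-1}H^{\top}\bigr)^{-1}H\Sigma^{-1}.$$

This unified framework provides the basis for constructing the shrinkage and pretest estimators developed in the following section.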
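The projection step can be illustrated numerically. The sketch below assumes the standard information-weighted projection of the UE onto the subspace Hβ = h, and uses only the Python standard library; the helper names `solve`, `matmul`, `transpose`, and `restricted_estimate` are hypothetical and chosen for this example.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    a is n x n and b is n x m (lists of lists); returns the n x m solution."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def restricted_estimate(beta_ue, info, H, h):
    """Project the unrestricted estimate onto {beta : H beta = h},
    weighting by the inverse of the information matrix `info`."""
    Ht = transpose(H)                      # p x q
    info_inv_Ht = solve(info, Ht)          # I^{-1} H'          (p x q)
    M = matmul(H, info_inv_Ht)             # H I^{-1} H'        (q x q)
    resid = [[sum(c * b for c, b in zip(row, beta_ue)) - hv]
             for row, hv in zip(H, h)]     # H beta_ue - h      (q x 1)
    correction = matmul(info_inv_Ht, solve(M, resid))   # p x 1
    return [b - c[0] for b, c in zip(beta_ue, correction)]
```

For example, with `info = [[2, 1], [1, 2]]`, `H = [[1, -1]]`, `h = [0]`, and `beta_ue = [1.0, 0.0]`, the function returns approximately `[0.5, 0.5]`, which satisfies the restriction β₁ = β₂ exactly.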

Shrinkage estimation under the time-dependent Cox model

In semiparametric survival models, incorporating external information in the form of linear constraints can yield substantial efficiency gains when the restriction is valid. The UE and RE estimators therefore represent two extremes: full reliance on the data versus full reliance on the constraint. Since the validity of H₀ is typically uncertain, methods that adaptively interpolate between these extremes are of both practical and theoretical interest.

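To evaluate the plausibility of $H_0$, we employ the Wald statistic, built from the observed information matrix $\mathcal{I}(\cdot)$,

$$\mathcal{L}_n = \bigl(H\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - h\bigr)^{\top}\bigl[H\,\mathcal{I}(\hat{\boldsymbol{\beta}}^{\mathrm{UE}})^{-1}H^{\top}\bigr]^{-1}\bigl(H\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - h\bigr),$$

which is asymptotically $\chi^2_q$ under $H_0$ and noncentral $\chi^2_q$ under $H_A$. This leads to the classical preliminary test estimator,

$$\hat{\boldsymbol{\beta}}^{\mathrm{PTE}} = \hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \bigl(\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \hat{\boldsymbol{\beta}}^{\mathrm{RE}}\bigr)\,I\bigl(\mathcal{L}_n \le \chi^2_{q,\alpha}\bigr),$$

where $\chi^2_{q,\alpha}$ is the upper $\alpha$-quantile of the $\chi^2_q$ distribution and $I(\cdot)$ is the indicator function. The PTE, however, inherits the discontinuity and significance–level dependence typical of pretest procedures.

A continuous alternative is provided by a James–Stein shrinkage estimator with shrinkage constant $k_n > 0$,

$$\hat{\boldsymbol{\beta}}^{\mathrm{SE}} = \hat{\boldsymbol{\beta}}^{\mathrm{UE}} - k_n\,\mathcal{L}_n^{-1}\bigl(\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \hat{\boldsymbol{\beta}}^{\mathrm{RE}}\bigr),$$

which shrinks the UE toward the RE with data‐adaptive intensity. Since $\hat{\boldsymbol{\beta}}^{\mathrm{SE}}$ may overshoot the restricted subspace, we employ the positive–rule modification

$$\hat{\boldsymbol{\beta}}^{\mathrm{PSE}} = \hat{\boldsymbol{\beta}}^{\mathrm{RE}} + \bigl(1 - k_n\,\mathcal{L}_n^{-1}\bigr)^{+}\bigl(\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \hat{\boldsymbol{\beta}}^{\mathrm{RE}}\bigr), \qquad a^{+} = \max(0, a),$$

which enforces a nonnegative shrinkage factor and yields improved stability.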

These estimators form a coherent class of shrinkage procedures that continuously blend restricted and unrestricted inference, enabling principled incorporation of uncertain prior information in the time–dependent Cox proportional hazards model.
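The way the three estimators combine the UE and RE can be sketched in a few lines. The code below assumes the standard forms: the PTE switches to the RE when the Wald statistic does not exceed a critical value, the SE uses the factor 1 − k/W, and the PSE truncates that factor at zero. The argument names `wald`, `k`, and `crit` are illustrative, and the Wald statistic is assumed to be strictly positive.

```python
def shrinkage_estimators(beta_ue, beta_re, wald, k, crit):
    """Return the PTE, SE and PSE built from the UE and RE.

    beta_ue, beta_re: coefficient lists of equal length
    wald: value of the Wald statistic (assumed > 0)
    k:    Stein shrinkage constant
    crit: critical value of the pretest (e.g. an upper chi-square quantile)
    """
    diff = [u - r for u, r in zip(beta_ue, beta_re)]
    # preliminary test: keep the restriction unless the test rejects it
    pte = list(beta_re) if wald <= crit else list(beta_ue)
    # Stein-type: the factor may become negative when wald < k
    factor = 1.0 - k / wald
    se = [r + factor * d for r, d in zip(beta_re, diff)]
    # positive rule: truncate the factor at zero, so the estimate never
    # moves past the RE
    pos = max(0.0, factor)
    pse = [r + pos * d for r, d in zip(beta_re, diff)]
    return {"PTE": pte, "SE": se, "PSE": pse}
```

With `beta_ue = [1.0, 0.5]`, `beta_re = [1.2, 0.0]`, `k = 1`, and `wald = 0.5`, the SE factor is −1 and the SE lands on the far side of the RE, whereas the PSE returns the RE itself; this is the overshoot that the positive-rule modification prevents.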

Theoretical characteristics

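In this section, we introduce the asymptotic distributional bias (ADB), quadratic bias (ADQB), MSE-matrices (ADMSE), and quadratic risks (ADQR) associated with estimators of the parameter vector $\boldsymbol{\beta}$. To this end, let $\boldsymbol{\beta}^{*}$ denote a general consistent estimator of $\boldsymbol{\beta}$, and consider a positive semi-definite weight matrix W along with the following quadratic loss function:

$$L(\boldsymbol{\beta}^{*}; W) = n\,(\boldsymbol{\beta}^{*} - \boldsymbol{\beta})^{\top}W(\boldsymbol{\beta}^{*} - \boldsymbol{\beta}). \tag{4}$$

Let the asymptotic distribution matrix (ADM) be

$$\Gamma = \lim_{n \to \infty} E\bigl[n\,(\boldsymbol{\beta}^{*} - \boldsymbol{\beta})(\boldsymbol{\beta}^{*} - \boldsymbol{\beta})^{\top}\bigr].$$

Then the ADQR of $\boldsymbol{\beta}^{*}$ is given by $R(\boldsymbol{\beta}^{*}; W) = \operatorname{tr}(W\Gamma)$.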
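The step from a quadratic loss to a trace-form risk rests on the cyclic property of the trace. A short derivation follows, in notation assumed here for illustration ($\boldsymbol{\beta}^{*}$ for a generic estimator, $W$ for the weight matrix, $\Gamma$ for the limiting MSE matrix) and assuming that the limit and the expectation may be interchanged:

```latex
\begin{aligned}
E\bigl[n(\boldsymbol{\beta}^{*}-\boldsymbol{\beta})^{\top} W (\boldsymbol{\beta}^{*}-\boldsymbol{\beta})\bigr]
  &= E\Bigl[\operatorname{tr}\bigl\{ W\, n(\boldsymbol{\beta}^{*}-\boldsymbol{\beta})(\boldsymbol{\beta}^{*}-\boldsymbol{\beta})^{\top} \bigr\}\Bigr] \\
  &= \operatorname{tr}\Bigl\{ W\, E\bigl[n(\boldsymbol{\beta}^{*}-\boldsymbol{\beta})(\boldsymbol{\beta}^{*}-\boldsymbol{\beta})^{\top}\bigr] \Bigr\}
  \;\xrightarrow[n\to\infty]{}\; \operatorname{tr}(W\Gamma).
\end{aligned}
```

Choosing $W$ as the identity matrix recovers the unweighted sum of the diagonal entries of $\Gamma$, that is, the total asymptotic mean squared error.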

Under fixed alternative hypotheses , it follows that converges in distribution to the same limiting distribution as as . This follows from the consistency of for [2,20] together with Slutsky’s theorem.

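To obtain a meaningful asymptotic distribution of $\sqrt{n}\,(\boldsymbol{\beta}^{*} - \boldsymbol{\beta})$, we consider the class of local alternatives $\{K_{(n)}\}$ given by $K_{(n)}: H\boldsymbol{\beta} = h + n^{-1/2}\boldsymbol{\xi}$, where $\boldsymbol{\xi} \in \mathbb{R}^{q}$ is a fixed vector. Let the asymptotic cumulative distribution function (cdf) of $\sqrt{n}\,(\boldsymbol{\beta}^{*} - \boldsymbol{\beta})$ under $\{K_{(n)}\}$ be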

(5)

If the asymptotic cdf exists, then the ADB and the ADQB are given by

(6)

and , respectively, where is the MSE-matrix of as . Defining , we have the weighted risk of given by .

Asymptotics under the fixed alternatives.

We now turn to the asymptotic behavior of the Wald statistic when the null hypothesis is violated. In particular, we investigate its limiting distribution under fixed alternatives of the form $K_{\xi}: H\boldsymbol{\beta} = h + \boldsymbol{\xi}$, where $\boldsymbol{\xi}$ is a fixed vector in $\mathbb{R}^{q}$. Then

and the statistic can be written as

(7)

Under the assumed regularity conditions C1–C4 and invoking Theorem 4.2 of [2], as $n \to \infty$, we obtain

(8)

and

Moreover, the cross term satisfies

and hence diverges in probability as . Therefore, it follows that

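Consequently, for any fixed x > 0, the Wald statistic $\mathcal{L}_n$ satisfies

$$\lim_{n \to \infty} P\bigl(\mathcal{L}_n \le x\bigr) = 0$$

under the fixed alternatives $K_{\xi}: H\boldsymbol{\beta} = h + \boldsymbol{\xi}$. Then we have the following theorem:

Theorem 1. Under the regularity conditions C1–C4, and under the fixed alternatives $K_{\xi}: H\boldsymbol{\beta} = h + \boldsymbol{\xi}$, as $n \to \infty$, we have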

(9)

The proof of Theorem 1 is provided in the Appendix.

Theorem 1 shows that the asymptotic distribution of , , and is given by , and the ADB,ADQB, ADMSE, and ADQR of the estimators , , and are all equal and are given by

(10)

while and as . Thus, the asymptotic distributions of and are not equivalent as . Now,

(11)

(proof in the Appendix) implies that the asymptotic distribution of is degenerate under the fixed alternative .

Asymptotics under the local alternatives.

To obtain meaningful asymptotic distributions of the various estimators and the test statistics, we consider the following theorem.

Theorem 2. Under $\{K_{(n)}\}$ and under the regularity conditions C1–C4, the following results hold as $n \to \infty$:

(12)

where , and is the cdf of a p-variate normal distribution with mean and covariance matrix V, and is the cdf of a noncentral chi-square distribution with v d.f. and noncentrality parameter .

It is difficult to obtain the asymptotic distributions of and , but we can obtain an asymptotic representation of these estimators under {K_(n)} to facilitate the computation of ADB, ADQB, ADMSE and ADQR of these estimators. Hence, we have under {K_(n)}

where and .

The proof of Theorem 2 is provided in the Appendix.

Based on Theorem 2, we can readily obtain the ADB, ADQB, ADMSE and ADQR of the estimators. First, we consider the ADB and ADQB of the estimators. Clearly,

(13)

(14)

and Similarly,

(15)

The expressions for the ADMSE and ADQR of the estimators likewise follow from Theorem 2. The complete proofs are provided in the Appendix.

Remark 1. The asymptotic structure derived in Theorem 2 is based on martingale central limit theorems for counting processes (see, e.g., Andersen and Gill (1982) [2]). The present results extend these classical asymptotic arguments in several directions that are particularly relevant in the context of the time-dependent Cox model.

First, the analysis is carried out within a time-dependent Cox regression framework, where the covariates are stochastic processes. This setting introduces additional technical challenges due to the presence of counting processes and stochastic integrals.

Second, Theorem 2 establishes the joint asymptotic distribution of the unrestricted and restricted estimators, as well as their differences. This joint characterization plays a key role in deriving the asymptotic properties of the proposed shrinkage estimators and is not commonly developed in standard treatments.

Third, the result provides explicit asymptotic representations under local alternatives, which enable a unified derivation of ADB, ADQB, ADMSE, and ADQR for all estimators under consideration.

A comprehensive framework that simultaneously incorporates time-dependent covariates, linear restrictions, and Stein-type shrinkage estimation does not appear to have been systematically studied in the existing literature.

Simulation study

We conduct a Monte Carlo simulation study to assess the finite-sample behavior of the proposed shrinkage estimators in the time-dependent Cox model. The comparison includes the UE, RE, PTE, SE, and PSE, together with three widely used penalized likelihood estimators—LASSO [14], ALASSO [21], and ENET [22]—thereby enabling a comprehensive evaluation of classical, shrinkage-based, and regularized approaches.

Two practical considerations guide the simulation design. First, time-dependent covariates may induce immortal-time bias if risk intervals are not constructed correctly; to avoid this, all datasets are generated directly in the (start,stop,status) counting-process format [23]. Second, extremely large coefficients can lead to quasi-separation and numerical instability in the partial likelihood [19]; thus, coefficient magnitudes are restricted to moderate values. These adjustments ensure realistic hazard trajectories and unbiased assessment of estimator performance.

We consider the time-dependent Cox model (1) with parameter vector , where and . Covariates follow with constant baseline hazard h₀(t)=0.1. Survival times are generated via the inverse transform method,

and censoring times are drawn to yield approximately 15% censoring. Each trajectory is partitioned into three equal-length intervals to create a valid counting-process representation. Departures from the null constraint are introduced using the deviation index where and denotes the Euclidean norm. For example, when , the coefficient vector takes the form .
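The data-generating step can be sketched as follows. With a constant baseline hazard h₀ and a linear predictor η that is fixed within a subject, the inverse transform gives T = −log(U) / (h₀ e^η) for U uniform on (0, 1). The sketch assumes time-fixed covariates within a subject and exponential censoring; the censoring distribution, the default `censor_rate`, and the function name are hypothetical choices, not taken from the simulation design.

```python
import math
import random

def simulate_subject(beta, z, h0=0.1, censor_rate=0.0176, n_intervals=3, rng=random):
    """Simulate one subject and return its (start, stop, status, z) rows.

    censor_rate is an assumed exponential censoring rate; with rate c and
    event rate h0 * exp(eta), the censoring probability is c / (c + h0 * exp(eta)),
    which is about 15% for the default when eta = 0.
    """
    eta = sum(b * v for b, v in zip(beta, z))
    u = 1.0 - rng.random()                      # uniform on (0, 1], avoids log(0)
    event_time = -math.log(u) / (h0 * math.exp(eta))
    censor_time = rng.expovariate(censor_rate)
    follow_up = min(event_time, censor_time)
    status = 1 if event_time <= censor_time else 0
    # split follow-up into equal-length intervals; only the last row can
    # carry the event, which keeps the counting-process format valid
    width = follow_up / n_intervals
    rows = []
    for j in range(n_intervals):
        start = j * width
        last = j == n_intervals - 1
        stop = follow_up if last else (j + 1) * width
        rows.append((start, stop, status if last else 0, list(z)))
    return rows
```

Passing a seeded generator, for example `simulate_subject([0.5, 0.0], [1.0, -1.0], rng=random.Random(1))`, makes a replication reproducible.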

Simulation scenarios vary over sample sizes and numbers of potentially irrelevant covariates , with 100 replications per configuration. A key methodological component is the adaptive selection of the Stein shrinkage parameter . This parameter is chosen through a short pilot Monte Carlo procedure that estimates the empirical distribution of the Wald statistic and constructs a data-driven grid of candidate values. The value minimizing the Monte Carlo MSE of the positive-rule Stein estimator is selected (Algorithm 1).

Algorithm 1: Monte Carlo–based selection of the Stein shrinkage parameter
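A minimal sketch of such a pilot selection is given below. It assumes that each pilot replication supplies the UE, the RE, and a strictly positive Wald statistic, that candidate values are taken as empirical quantiles of the pilot Wald statistics (plus zero, which corresponds to no shrinkage), and that the true coefficient vector is known, as it is in a simulation. The function name, grid construction, and `n_grid` are illustrative assumptions rather than the published algorithm.

```python
def select_shrinkage_constant(pilot, beta_true, n_grid=20):
    """Pick the constant k that minimizes the pilot Monte Carlo MSE of the PSE.

    pilot: list of (beta_ue, beta_re, wald) tuples, one per pilot replication
    beta_true: true coefficient vector used in the simulation
    """
    walds = sorted(w for _, _, w in pilot)
    # data-driven grid: evenly spaced empirical quantiles of the Wald statistic
    grid = {walds[min(len(walds) - 1, (g * len(walds)) // n_grid)]
            for g in range(n_grid)}
    grid.add(0.0)                       # k = 0 reproduces the UE
    candidates = sorted(grid)

    def pse_mse(k):
        total = 0.0
        for beta_ue, beta_re, wald in pilot:
            shrink = max(0.0, 1.0 - k / wald)          # positive-rule factor
            est = [r + shrink * (u - r) for u, r in zip(beta_ue, beta_re)]
            total += sum((e - b) ** 2 for e, b in zip(est, beta_true))
        return total / len(pilot)

    return min(candidates, key=pse_mse)
```

When the restriction holds exactly in the pilot runs, the largest candidate is selected (full shrinkage toward the RE); when the RE is badly biased, the selection falls back to k = 0.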

The performance of the estimators is evaluated using standard finite-sample criteria, with the mean squared error (MSE) and the simulated relative efficiency (SREF) computed as follows:
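As an illustration of these criteria, the sketch below assumes the usual definitions: the MSE is the average squared Euclidean distance between the replicated estimates and the true coefficient vector, and the SREF of an estimator is the MSE of the UE divided by the MSE of that estimator, so that values above 1 indicate an improvement over the UE. The function names are hypothetical.

```python
def mse(estimates, beta_true):
    """Average squared Euclidean error over Monte Carlo replications.

    estimates: list of coefficient vectors, one per replication
    """
    total = sum(sum((e - b) ** 2 for e, b in zip(est, beta_true))
                for est in estimates)
    return total / len(estimates)

def sref(estimates_ue, estimates_other, beta_true):
    """Simulated relative efficiency of an estimator with respect to the UE."""
    return mse(estimates_ue, beta_true) / mse(estimates_other, beta_true)
```

By construction the UE has SREF equal to 1, which is why it serves as the reference line in efficiency plots.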

Coefficientwise empirical bias and standard deviation are also reported to assess systematic deviations and sampling variability. Together, these measures provide a coherent evaluation of the precision, robustness, and efficiency of the competing estimators across a broad range of data-generating mechanisms.

Table 1 and Figs 1–3 provide a unified assessment of the finite-sample performance of all estimators and empirically validate the theoretical results presented in the section “Shrinkage estimation under the time-dependent Cox model”. The UE exhibits small biases and standard deviations across all sample sizes and dimensions (Table 1), confirming both the numerical stability of the simulation design and the suitability of its empirical SDs as a reference scale for interpreting the deviation index. Notably, the largest deviations considered correspond to nearly ten times the empirical SD of the UE coefficient estimates.

Table 1. Estimated biases and standard deviations (SDs) of regression coefficients. Results are shown for different sample sizes and values of p₂.

https://doi.org/10.1371/journal.pone.0345123.t001

Fig 1. Efficiency of UE, RE, PTE, SE, and PSE estimators versus the deviation index. Results are shown across different sample sizes and values of p₂. Efficiency is calculated relative to the UE estimator.

https://doi.org/10.1371/journal.pone.0345123.g001

Fig 2. Efficiency of UE, LASSO, ALASSO, and ENET estimators versus the deviation index. Results are presented across different sample sizes and values of p₂. Relative efficiency is computed with respect to the UE estimator.

https://doi.org/10.1371/journal.pone.0345123.g002

Fig 3. Efficiency of UE, ALASSO, and PSE estimators versus the deviation index. Results are displayed across different sample sizes and values of p₂. Relative efficiency is calculated with respect to the UE estimator.

https://doi.org/10.1371/journal.pone.0345123.g003

Fig 1 shows that the PSE uniformly dominates the UE and PTE and performs on par with or better than the SE across most deviations. The RE attains the highest efficiency only when the deviation index is close to zero, reflecting the near-validity of the restriction; however, its efficiency deteriorates rapidly once the restriction is violated. In contrast, the PSE remains robust across a broad range of departures, fully consistent with its optimal ADMSE and ADQR properties.

Fig 2 compares the UE with penalized estimators. ALASSO achieves the highest efficiency when the deviation index is near zero, owing to its adaptive weighting strategy, but all penalized estimators lose efficiency as the deviation increases. When considered alongside Fig 1, ALASSO surpasses the PSE only under sufficiently large deviations where the restriction is clearly invalid.

Fig 3 further contrasts PSE and ALASSO across various (n,p₂) configurations. For small deviations, PSE consistently outperforms ALASSO and UE, and this dominance persists until the deviation reaches roughly an order of magnitude larger than the empirical SD of the coefficient estimates. Beyond this point, ALASSO begins to dominate, particularly in high-dimensional and small-sample settings (e.g., p = 15, n = 200), where penalization compensates for the invalidity of the restriction.

Across all scenarios, a consistent pattern emerges: the relative efficiency of both PSE and ALASSO improves as dimensionality increases or sample size decreases, reflecting the stronger stabilizing influence of shrinkage and penalization when information is limited. Additional experiments (not shown) confirm that the same trend holds for n < 200 and n > 500. For very small samples (n < 100), the inflated sampling variability shifts the PSE–ALASSO crossover point toward larger values (often near 0.5), supporting the empirical rule that ALASSO typically overtakes PSE only when the deviation exceeds approximately ten times the SD of the relevant coefficient estimates.

Overall, these findings demonstrate that PSE achieves the most favorable bias–variance trade-off across a broad range of practically relevant settings, occupying an effective middle ground between restricted and penalized estimators. The resulting efficiency hierarchy may be summarized as .

Additional Censoring-Level Experiments (Fig 4). To further examine robustness to censoring, an additional experiment was conducted under two scenarios: (i) no censoring and (ii) a higher censoring rate of 30%. Fig 4 compares the efficiencies of PSE, ALASSO, and UE in both settings. The overall qualitative patterns mirror those observed under the 15% censoring design: PSE dominates for small and moderate deviations, while ALASSO overtakes PSE only for sufficiently large deviations.

Fig 4. Efficiency of UE, ALASSO, and PSE estimators versus the deviation index. Results are shown for different sample sizes, values of p₂, and censoring rates. Relative efficiency is computed with respect to the UE estimator.

https://doi.org/10.1371/journal.pone.0345123.g004

A notable additional finding is that higher censoring amplifies the relative advantage of PSE at larger deviations. Specifically, as the censoring rate increases, the efficiency crossover point shifts to the right, indicating that PSE retains superior performance over a broader range of departures from H₀. This behavior highlights the resilience of PSE in scenarios where information loss due to censoring is more pronounced.

Application to the Mayo Clinic PBC data

We illustrate the proposed estimators using the Mayo Clinic PBC dataset [24], derived from a randomized clinical trial of D-penicillamine. The analysis includes 312 patients with complete longitudinal records, reformatted into (start, stop] counting-process intervals using the tmerge() function of the R package “survival”, resulting in a censoring rate of approximately 53.8%. The dataset contains thirteen demographic and biochemical predictors measured repeatedly over time; a concise description is provided in Table 2.
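The interval construction can be illustrated outside R. The sketch below is a Python analogue of the idea only, not of tmerge() itself: it assumes each subject has a list of visits sorted by day (the first at day 0), a follow-up time, and a terminal status, and it assigns each visit's covariate values to the interval running until the next visit or the end of follow-up. The function name and data layout are hypothetical.

```python
def to_counting_process(visits, follow_up, status):
    """Turn repeated measurements into (start, stop, event, covariates) rows.

    visits: list of (day, covariates), sorted by day, first visit at day 0
    follow_up: time of event or censoring
    status: 1 if the subject had the event at follow_up, else 0
    """
    rows = []
    for idx, (day, covariates) in enumerate(visits):
        if day >= follow_up:
            break                      # visits at or after the end carry no risk time
        next_day = visits[idx + 1][0] if idx + 1 < len(visits) else follow_up
        stop = min(next_day, follow_up)
        # the event indicator is attached to the final interval only
        event = status if stop == follow_up else 0
        rows.append((day, stop, event, covariates))
    return rows
```

For a subject seen at days 0, 180, and 365 who dies at day 400, this yields the intervals (0, 180], (180, 365], and (365, 400], with the event recorded on the last one.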

Table 2. List of variables in the pbcseq dataset.

https://doi.org/10.1371/journal.pone.0345123.t002

A preliminary BIC-based stepwise selection identified seven influential covariates—, albumin, , , hepato, stage, and sex. The remaining predictors were treated as irrelevant, defining a restricted subspace with p₂ = 6. To evaluate finite-sample properties, we employed subject-level case-resampling with B = 500 bootstrap replicates, preserving the within-subject dependence inherent to longitudinal measurements.
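Subject-level case resampling can be sketched as follows: subjects, not rows, are drawn with replacement, and every interval of a drawn subject is carried along, so the within-subject structure stays intact. The row layout `(subject_id, start, stop, status, covariates)` and the function name are assumptions made for this example; drawn subjects receive fresh identifiers so that a subject sampled twice is treated as two distinct individuals.

```python
import random
from collections import defaultdict

def bootstrap_subjects(rows, rng):
    """Draw one subject-level bootstrap sample from counting-process rows.

    rows: list of (subject_id, start, stop, status, covariates)
    rng:  a random.Random instance (seed it for reproducibility)
    """
    by_subject = defaultdict(list)
    for row in rows:
        by_subject[row[0]].append(row)
    ids = list(by_subject)
    sample = []
    # resample subject identifiers with replacement, keeping all their rows
    for new_id, sid in enumerate(rng.choices(ids, k=len(ids))):
        for (_, start, stop, status, cov) in by_subject[sid]:
            sample.append((new_id, start, stop, status, cov))
    return sample
```

Repeating this B times and refitting each estimator on every resample gives the bootstrap distributions from which standard errors and relative efficiencies are computed.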

Table 3 reports bootstrap estimates (Est), standard errors (SE), and simulated relative efficiencies (SREF). The Positive-rule Stein Estimator (PSE) achieved the highest overall efficiency (SREF = 1.72), followed by RE and ENET, whereas UE served as the baseline. SE and PTE showed reduced stability under moderate departures from the restriction, while PSE delivered consistently smaller SEs across several clinically relevant coefficients.

Table 3. Bootstrap estimates and simulated relative efficiencies for significant time-dependent covariates.

https://doi.org/10.1371/journal.pone.0345123.t003

Among penalized estimators, ALASSO produced the smallest SEs overall, with ENET exhibiting efficiency comparable to PSE. LASSO demonstrated stronger shrinkage, slightly attenuating both weak and strong effects; the adaptive weighting in ALASSO mitigated this bias.

The estimated effects aligned with established clinical understanding: elevated bilirubin, prolonged prothrombin time, and advanced histologic stage were associated with increased risk, whereas higher albumin, hepatomegaly, and female sex reduced the hazard.

Fig 5 depicts the bootstrap distributions of the coefficient estimates. PSE and ENET exhibit the narrowest dispersion, ALASSO shows comparably stable behavior, while LASSO displays overshrinkage and heavier tails for weaker predictors. UE yields the greatest variability. These graphical results reinforce the superior finite-sample efficiency and robustness of adaptive and positive-rule shrinkage estimators in time-dependent Cox regression with partial prior information.

Fig 5. Bootstrap distributions of regression coefficient estimates in the time-dependent PBC Cox model.

Boxplots are based on B = 500 bootstrap samples and compare eight estimation methods.

https://doi.org/10.1371/journal.pone.0345123.g005

Conclusions

This work develops a unified class of improved estimators for the time-dependent Cox proportional hazards model by incorporating linear subspace information through preliminary-test and Stein-type shrinkage strategies. Within the counting-process framework, we established the asymptotic distributions of the proposed estimators and derived explicit expressions for their asymptotic bias and risk under local alternatives, thereby elucidating the conditions under which principled shrinkage yields efficiency gains.

Monte Carlo simulations and the longitudinal Mayo Clinic PBC analysis provide strong empirical validation of the theoretical results. Across a wide range of settings, the positive-rule Stein estimator (PSE) achieves the most favorable bias–variance trade-off whenever the imposed restriction is approximately correct, and maintains competitive performance under moderate misspecification. A recurring empirical finding is that adaptive LASSO surpasses PSE only when the deviation from the restriction exceeds roughly an order of magnitude of the empirical standard deviation of the associated coefficient, offering a practical rule-of-thumb for applied analyses.

In summary, positive-rule Stein shrinkage constitutes a robust and interpretable alternative to both unrestricted and penalized estimators in semiparametric survival models with time-dependent covariates. The proposed framework suggests several directions for future research, including hybrid shrinkage–penalization approaches, data-adaptive selection of shrinkage intensity, and extensions to high-dimensional or partially sparse Cox-type models.

Appendix

A Proofs of main results

A.1 Proof of Theorem 1.

Let a be any vector and B be any positive definite matrix. Define the quadratic form

Moreover, let denote the upper -quantile of the asymptotic distribution of the Wald statistic . Finally, let k_n be the sequence of shrinkage constants appearing in the definition of the Stein-type estimator, satisfying as .

We first consider the quadratic form of the difference between and ,

Under the fixed alternative , we have , which implies

Since is a fixed constant, we obtain

Therefore,

which proves part (i). Next, we consider the Stein-type estimator. We have

Under the fixed alternative , we have , which implies that . Moreover, since for sufficiently large n, it follows by the dominated convergence theorem that

Since , we obtain

Consequently,

Finally, for the positive-part Stein estimator, we have

Each term on the right-hand side can be treated similarly to the previous cases. In particular, using the fact that and , together with boundedness arguments, we obtain

Hence,

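A.2 Proof of Equation (11).

We evaluate $\sqrt{n}\,(\hat{\boldsymbol{\beta}}^{\mathrm{RE}} - \boldsymbol{\beta})$ under the fixed alternative $K_{\xi}: H\boldsymbol{\beta} = h + \boldsymbol{\xi}$, writing $\mathcal{I} = \mathcal{I}(\hat{\boldsymbol{\beta}}^{\mathrm{UE}})$ for the observed information matrix and $I_p$ for the $p \times p$ identity matrix.

The restricted estimator is

$$\hat{\boldsymbol{\beta}}^{\mathrm{RE}} = \hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \mathcal{I}^{-1}H^{\top}\bigl(H\mathcal{I}^{-1}H^{\top}\bigr)^{-1}\bigl(H\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - h\bigr).$$

Subtracting $\boldsymbol{\beta}$ from both sides gives

$$\hat{\boldsymbol{\beta}}^{\mathrm{RE}} - \boldsymbol{\beta} = \bigl(\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \boldsymbol{\beta}\bigr) - \mathcal{I}^{-1}H^{\top}\bigl(H\mathcal{I}^{-1}H^{\top}\bigr)^{-1}\bigl(H\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - h\bigr).$$

Next, we decompose $H\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - h$ as

$$H\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - h = H\bigl(\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \boldsymbol{\beta}\bigr) + \bigl(H\boldsymbol{\beta} - h\bigr).$$

Under the alternative hypothesis $K_{\xi}$, it follows that $H\boldsymbol{\beta} - h = \boldsymbol{\xi}$. Substituting this back, and writing $A_n = \mathcal{I}^{-1}H^{\top}(H\mathcal{I}^{-1}H^{\top})^{-1}$, we obtain

$$\hat{\boldsymbol{\beta}}^{\mathrm{RE}} - \boldsymbol{\beta} = \bigl(I_p - A_n H\bigr)\bigl(\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \boldsymbol{\beta}\bigr) - A_n\boldsymbol{\xi}.$$

Finally, multiplying the entire expression by $\sqrt{n}$ yields

$$\sqrt{n}\,\bigl(\hat{\boldsymbol{\beta}}^{\mathrm{RE}} - \boldsymbol{\beta}\bigr) = \bigl(I_p - A_n H\bigr)\sqrt{n}\,\bigl(\hat{\boldsymbol{\beta}}^{\mathrm{UE}} - \boldsymbol{\beta}\bigr) - \sqrt{n}\,A_n\boldsymbol{\xi}.$$

The matrix $A_n$ is unchanged when $\mathcal{I}$ is replaced by $n^{-1}\mathcal{I}$, so $A_n \xrightarrow{p} \Sigma^{-1}H^{\top}(H\Sigma^{-1}H^{\top})^{-1}$. The first term is therefore bounded in probability, whereas the second term diverges for any $\boldsymbol{\xi} \neq \mathbf{0}$. This expresses $\sqrt{n}\,(\hat{\boldsymbol{\beta}}^{\mathrm{RE}} - \boldsymbol{\beta})$ in terms of $\hat{\boldsymbol{\beta}}^{\mathrm{UE}}$, $\boldsymbol{\beta}$, H, h, $\boldsymbol{\xi}$, and the information matrix $\mathcal{I}$.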

A.3 Proof of Theorem 2.

The proof follows standard martingale central limit arguments (see Andersen and Gill, 1982 [2]); however, its application in the present setting requires careful handling of the time-dependent covariates, counting process structure, and the imposed linear restrictions.

(i) See Theorem 4.2 of [2].
(ii) . By (i) as .
(iii) as .
(iv) and (v) follow similarly. To prove (vi), we note that as . Hence,
as where .
(vi)
.

Since and are independent, the first term reduces to as . The second term is obtained by conditional arguments as and is given by

where .

Proofs of (viii) and (ix) are obtained by writing the expressions in terms of and then applying the distribution of the related statistics.

A.4 Proofs for and equations.

Since is a symmetric idempotent matrix with rank , there exists an orthogonal matrix such that

The matrices A₁₁ and A₁₂ are of order q and , respectively. Hence,

Further, by Theorem 2(ii),

(16)

Let ; then, the r.h.s. of (16) becomes

(17)

Thus, reduces to

Next, we consider the risk expression for :

(18)

Finally, we consider the risk expression for :

References

1. Cox DR. Regression Models and Life-Tables. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1972;34(2):187–202.
2. Andersen PK, Gill RD. Cox’s Regression Model for Counting Processes: A Large Sample Study. Ann Statist. 1982;10(4):1100–20.
3. Fleming TR, Harrington DP. Counting processes and survival analysis. John Wiley & Sons. 2005. https://doi.org/10.1002/9781118150672
4. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical models based on counting processes. Springer New York. 1995. https://doi.org/10.1007/978-1-4612-4348-9
5. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Contribution to the Theory of Statistics. University of California Press. 1956. 197–206. https://doi.org/10.1525/9780520313880-018
6. James W, Stein C. Estimation with Quadratic Loss. Springer Series in Statistics. Springer New York. 1992. 443–60. https://doi.org/10.1007/978-1-4612-0919-5_30
7. Ahmed SE, Arabi Belaghi R, Hussein AA. Efficient Post-Shrinkage Estimation Strategies in High-Dimensional Cox’s Proportional Hazards Models. Entropy (Basel). 2025;27(3):254. pmid:40149178
8. Zareamoghaddam H, Ahmed SE, Provost SB. Shrinkage estimation applied to a semi-nonparametric regression model. Int J Biostat. 2020;17(1):23–38. pmid:32769222
9. Arashi M, Norouzirad M. Advances in shrinkage and penalized estimation strategies: Honoring the contributions of Professor A K Md. Ehsanes Saleh. Springer. 2025. https://doi.org/10.1007/978-3-031-94050-7
10. Shih J-H, Lin T-Y, Jimichi M, Emura T. Robust ridge M-estimators with pretest and Stein-rule shrinkage for an intercept term. Jpn J Stat Data Sci. 2020;4(1):107–50.
11. Raheem E, Ahmed SE, Liu S. Stein-rule M-estimation in sparse partially linear models. Jpn J Stat Data Sci. 2023;7(1):507–35.
12. Efron B. Machine learning and the James–Stein estimator. Jpn J Stat Data Sci. 2023;7(1):257–66.
13. Hossain S, Ahmed SE. Penalized and Shrinkage Estimation in the Cox Proportional Hazards Model. Communications in Statistics - Theory and Methods. 2014;43(5):1026–40.
14. Tibshirani R. The lasso method for variable selection in the Cox model. Stat Med. 1997;16(4):385–95. pmid:9044528
15. Ng CT, Yu CW. Modified SCAD penalty for constrained variable selection problems. Statistical Methodology. 2014;21:109–34.
16. Utazirubanda JC, Leon T, Ngom P. Variable selection with Group LASSO approach: Application to Cox regression with frailty model. Commun Stat Simul Comput. 2021;50(3):881–901. pmid:34248255
17. Elashoff R, Li N. Joint Modeling of Longitudinal and Time-to-Event Data. Chapman and Hall/CRC. 2016. https://doi.org/10.1201/978131537487
18. Efron B. The Efficiency of Cox’s Likelihood Function for Censored Data. Journal of the American Statistical Association. 1977;72(359):557–65.
19. Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. John Wiley & Sons. 2002. https://doi.org/10.1002/9781118032985
20. van der Vaart AW. Asymptotic statistics. Cambridge University Press. 1998. https://doi.org/10.1017/CBO9780511802256
21. Zou H. The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association. 2006;101(476):1418–29.
22. Zou H, Hastie T. Regularization and Variable Selection Via the Elastic Net. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2005;67(2):301–20.
23. Suissa S. Immortal Time Bias in Pharmacoepidemiology. American Journal of Epidemiology. 2008;167(4):492–9.
24. Therneau TM, Lumley T. Package ‘survival’. R Top Doc. 2015;128(10):28–33.
