Weighted Estimation of the Accelerated Failure Time Model in the Presence of Dependent Censoring

  • Youngjoo Cho,

    Affiliation Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania, 16802, United States of America

  • Debashis Ghosh

    Affiliation Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, Aurora, Colorado, 80045, United States of America



Abstract

Independent censoring is a crucial assumption in survival analysis. However, this assumption is unrealistic in many medical studies, where the presence of dependent censoring makes it difficult to analyze covariate effects on disease outcomes. The semicompeting risks framework offers one approach to handling dependent censoring. Two representative estimators for this data structure are based on an artificial censoring technique. However, neither of these estimators is better than the other with respect to efficiency (standard error). In this paper, we propose a new weighted estimator for the accelerated failure time (AFT) model under dependent censoring. One advantage of our approach is that the weights are optimal among all linear combinations of the two estimators mentioned above. To calculate these weights, a novel resampling-based scheme is employed. Attendant asymptotic statistical results for the estimator are established. In addition, simulation studies, as well as an application to real data, show the gains in efficiency for our estimator.


Introduction

In medical studies, it is very common for death or withdrawal from the study to occur alongside progression of the disease of interest. In this case, death or withdrawal may censor the development of the disease. This type of data structure is called ‘semicompeting risks data’ [4].

Semicompeting risks data have been widely studied in the past decade. Some researchers used a Gamma copula to estimate the association parameter between the event of interest and dependent censoring [2], [4]. Other work extended the methodology of [2] to the case where a nuisance parameter exists and also considered a more general copula model [22].

On the other hand, other researchers used semiparametric regression to model the event of interest and dependent censoring jointly. One approach is an estimation procedure based on the accelerated failure time (AFT) model [13], [17]. These methods use the artificial censoring technique to correct the bias of the usual estimator. While the estimating equation of [13] is a U-statistic of order 1, that of [17] is a U-statistic of order 2.

However, none of these papers fully discussed the optimality of the estimator. Choosing an estimator that is optimal from an efficiency viewpoint is therefore an important issue. Here, we adapt the idea of [25], which proposed an optimal estimator in the form of a linear combination of estimators for multivariate failure time data. That work built on [24], which proposed combining dependent tests in the presence of missing values. The idea of [24] is to construct a test that maximizes power from a linear combination of test statistics. The approach of [25] is simple and flexible, so it is sensible to apply it in our setting.

In this paper, we propose a weighted estimator using the methodology of [25]. Our weighted estimator combines those of [13] and [17]. The structure of this paper is as follows. In the methods section, we briefly review the estimators proposed by [13] and [17] and describe our new weighted estimator in detail. In the model checking section, the model checking procedure is briefly discussed. Results of simulation studies are given in the simulation studies section, and an application of our method to a real data example is presented in the real data analysis section. The discussion section concludes the paper.


Methods

Review of Model

Let $X$ be the time to the event of interest, $D$ the time to dependent censoring and $C$ the time to independent censoring. All these times are on a logarithmic scale. Let $\tilde X = X \wedge D \wedge C$ and $\tilde D = D \wedge C$. Define $\delta = I(X \le \tilde D)$ and $\Delta = I(D \le C)$, and let $Z$ be the covariates. The data contain $n$ independent and identically distributed observations $(\tilde X_i, \tilde D_i, Z_i, \delta_i, \Delta_i)$, $i = 1, \ldots, n$. The model is

$$X_i = Z_i^T\theta_0 + \epsilon_i^X, \qquad D_i = Z_i^T\eta_0 + \epsilon_i^D,$$

where $\theta_0$ and $\eta_0$ are $k \times 1$ vectors and $\epsilon_i = (\epsilon_i^X, \epsilon_i^D)$ are error terms with an unknown joint distribution $H$. We assume that the model is identifiable only in the upper wedge $X < D$ [4], [17]. The goal is to obtain an unbiased estimator of $\alpha = (\eta^T, \theta^T)^T$ without nonparametrically estimating the distribution of $\epsilon_i$, $i = 1, \ldots, n$. We further assume that, given $Z$, $C$ and $(X, D)$ are independent, but $X$ and $D$ can be dependent given $Z$. We now describe the procedures of [13] and [17] in turn.
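As a concrete illustration of the observed-data construction described above, the following minimal Python sketch maps latent log-scale times to the observed quantities. The function name `observe` is hypothetical, and the use of non-strict inequalities in the two indicators is an assumption about how ties are handled.

```python
def observe(x, d, c):
    """Map latent log-scale times (x, d, c) to the observed tuple.

    x: time to the event of interest, d: time to dependent censoring,
    c: time to independent censoring (all assumed to be on the log scale).
    """
    d_tilde = min(d, c)               # D~ = min(D, C)
    x_tilde = min(x, d, c)            # X~ = min(X, D, C)
    delta = 1 if x <= d_tilde else 0  # event of interest is observed
    big_delta = 1 if d <= c else 0    # dependent censoring is observed
    return x_tilde, d_tilde, delta, big_delta


# Event first, then dependent censoring, both before independent censoring.
print(observe(1.0, 2.0, 3.0))
# Dependent censoring occurs before the event, so the event is not observed.
print(observe(2.5, 2.0, 3.0))
```

In the second call the event time is replaced by the dependent censoring time, which is the situation that motivates the artificial censoring techniques reviewed next.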

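Since $\tilde D$ depends only on independent censoring, a standard rank regression approach is available for estimation [11], [13], [15], [17], [20], [23]. The estimating equation for $\eta$ is given by

$$S_n(\eta) = n^{-1/2}\sum_{i=1}^n \Delta_i\left[Z_i - \frac{\sum_{j=1}^n Z_j I\{\tilde D_j^*(\eta) \ge \tilde D_i^*(\eta)\}}{\sum_{j=1}^n I\{\tilde D_j^*(\eta) \ge \tilde D_i^*(\eta)\}}\right],$$

where $\tilde D_i^*(\eta) = \tilde D_i - Z_i^T\eta$. The estimator of $\eta$ can be obtained by solving $S_n(\eta) = 0$.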

For estimation of $\theta$, simply replacing $\tilde D_i - Z_i^T\eta$ with $\tilde X_i - Z_i^T\theta$ does not yield unbiased estimation of $\theta$. This is because the cause-specific hazard function for $\tilde X_i - Z_i^T\theta$ depends on $\tilde D_i - Z_i^T\theta$, which violates the model assumption [13]. To fix this problem, many authors use artificial censoring techniques [3], [6], [7], [10], [13], [17]. In [13], a single constant term $g(\alpha)$ is proposed so that the estimating equation is unbiased for estimation of $\theta$ in the two-sample case. The form of $g(\alpha)$ is $g(\alpha) = \max_{1 \le i \le n}\{0, Z_i^T(\theta - \eta)\}$. The proposed estimator in [13] is obtained by solving $U_n^L(\alpha) = 0$, where (1) and $a \wedge b$ means the minimum of $a$ and $b$. In [17], pairwise comparison of all the subjects is proposed so that each subject has a different degree of artificial censoring. The transformations suggested by [17] are (2) The proposed estimator according to [17] is obtained by solving $U_n^P(\alpha) = 0$, which is defined by Note that $X$ and $D$ are not observable, but we can express transformations (Eq 1) and (Eq 2) using observable quantities [7], [13].

Weighted estimator

Given these two estimation procedures, it is natural to compare their efficiencies with respect to standard error. From this point of view, however, neither estimator is superior to the other, and neither need be optimal. It has been argued that the estimator of [17] is more efficient than that of [13] because pairwise comparisons lead to less artificial censoring. However, this reasoning holds only when the estimators are compared in terms of bias and variance across replications in a simulation study. In terms of the standard error of an estimator in a single dataset, the estimator of [17] may not be better than that of [13]. This will be seen in the real data analysis section.

The reason lies in the estimation procedure of [17]. As discussed in [7], for $n$ samples, the number of comparisons needed for artificial censoring in [13] is of order $n$, while that in [17] is of order $n^2$. By the definition of $g_{ij}(\alpha)$, different degrees of artificial censoring are applied to different observations. This may introduce more variation between observations, which can make the standard error larger than that of [13].

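Having discussed our data structure and the estimators of [13] and [17], we now describe the estimation procedure proposed in this paper. Let $\hat\eta = (\hat\eta_1, \ldots, \hat\eta_k)^T$ be the estimator of $\eta_0$, $\hat\theta^L = (\hat\theta_1^L, \ldots, \hat\theta_k^L)^T$ the estimator of $\theta_0$ from [13] and $\hat\theta^P = (\hat\theta_1^P, \ldots, \hat\theta_k^P)^T$ the estimator of $\theta_0$ from [17]. Both $\hat\theta^L$ and $\hat\theta^P$ are asymptotically unbiased estimators of $\theta_0$.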

We extend the class of estimators that provide consistent estimation of $\theta_0$. A natural extension of the estimators of [13] and [17] is the collection of linear combinations of these two estimators whose weights sum to 1. By choosing proper weights, we can expect the variance of the combined estimator to be smaller than that of either $\hat\theta^L$ or $\hat\theta^P$.

The goal is to find weights such that the variance of the new estimator is smaller than the minimum of the variances of the estimators of [13] and [17], both of which have good theoretical properties. To obtain the estimator with the smallest variance among such combinations, we use the idea of [25], which was applied to the problem of modeling multivariate failure times.

In [25], the joint distribution of the estimators $\hat\gamma = \{\hat\gamma_{mr}\}$ is considered, where $m = 1, \ldots, k$ and $r = 1, \ldots, R$. Here $m$ indexes the regression parameters and $r$ indexes the events. To obtain an optimal estimator, they applied arguments from [24], which derived a linear combination of test statistics that maximizes power against every alternative hypothesis. Let $\hat H$ be the covariance matrix of the estimators $\hat\gamma$. For fixed $m$, define $\hat H_m$ to be the covariance matrix of $\hat\gamma_m = (\hat\gamma_{m1}, \ldots, \hat\gamma_{mR})$. It can be obtained from the entire covariance matrix by selecting the part corresponding to $\hat\gamma_{mr}$ for $r = 1, \ldots, R$ under fixed $m$. Now we can define $\sum_{r=1}^R d_r\hat\gamma_{mr}$, where $d = (d_1, d_2, \ldots, d_R)$ satisfies $\sum_{r=1}^R d_r = 1$ [25]. Then $d = (e^T\hat H_m^{-1}e)^{-1}\hat H_m^{-1}e$ is the vector of weights that gives the best estimator among linear combinations of the components of $\hat\gamma_m$, where $e$ is a vector of $R$ ones [24], [25].
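The weight formula $d = (e^T H^{-1} e)^{-1} H^{-1} e$ can be illustrated with a short, self-contained Python sketch. It is only a sketch under stated assumptions: the helper names `solve`, `optimal_weights` and `combined_variance` and the 2 × 2 covariance matrix are hypothetical, and the linear system is solved with plain Gaussian elimination rather than a numerical library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def optimal_weights(h):
    """Weights d = (e^T H^{-1} e)^{-1} H^{-1} e for a covariance matrix h."""
    e = [1.0] * len(h)
    u = solve(h, e)          # u = H^{-1} e
    total = sum(u)           # e^T H^{-1} e
    return [v / total for v in u]


def combined_variance(h, d):
    """Variance d^T H d of the linear combination with weights d."""
    n = len(d)
    return sum(d[i] * h[i][j] * d[j] for i in range(n) for j in range(n))


# Hypothetical covariance matrix of two estimators of the same parameter.
h = [[4.0, 1.0],
     [1.0, 2.0]]
d = optimal_weights(h)
print(d, combined_variance(h, d))
```

For this hypothetical matrix the weights are approximately (0.25, 0.75) and the combined variance is about 1.75, which is below the smaller of the two individual variances (2).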

We now apply the argument of the previous paragraph to our model by considering the joint distribution of $\hat\beta = \{\hat\eta^T, (\hat\theta^L)^T, (\hat\theta^P)^T\}^T$. Let $\beta_0 = (\eta_0^T, \theta_0^T, \theta_0^T)^T$ and $G_n(\beta) = [S_n^T(\eta), \{U_n^L(\alpha)\}^T, \{U_n^P(\alpha)\}^T]^T$, the vector of estimating functions for $\beta_0$. The strong consistency and asymptotic joint distribution of the three estimators, described in the following theorems, play a crucial role in our methodology.

To prove asymptotic results, several regularity conditions are required. As stated in [7] and [17], define

Let $\alpha_0 = (\eta_0^T, \theta_0^T)^T$. Define and

From the Appendix in [17], the additional conditions are as follows:

  1. The parameter space $\mathcal{W}$ is compact, and the true parameter $\alpha_0$ is an interior point of $\mathcal{W}$.
  2. $\theta_0$ is the only solution of the estimating equation $E\{n^{-1/2}U_n^P(\eta_0, \theta)\} = 0$.
  3. $E(\|Z\|^2) < \infty$, where $\|\cdot\|$ is the Euclidean norm; there exists a positive constant $K$ such that the partial derivatives of $F$ are bounded by $K$; and there exists a positive constant $K^*$ such that the marginal probability distribution of $F$ is bounded by $K^*$ almost surely.
  4. $\mathrm{cov}[(Z_1 - Z_2)\{T_1(Z_1, Z_2)\}^{1/2}]$ and $\mathrm{cov}[(Z_1 - Z_2)\{T_2(Z_1, Z_2)\}^{1/2}]$ are positive definite.

In many parts of the proofs, we adapt arguments from [13] and [17].

Theorem 1. Under conditions C1–C3 in the Appendix of [17] and the conditions in [27], $\hat\beta$ is strongly consistent.

Proof. Let $\hat\beta = \{\hat\eta^T, (\hat\theta^L)^T, (\hat\theta^P)^T\}^T$. It suffices to show that $\hat\eta$, $\hat\theta^L$ and $\hat\theta^P$ are each strongly consistent. Let $\alpha = (\eta^T, \theta^T)^T$. Note that the parameter space $\mathcal{W}$ is compact and that we assume the regularity conditions in [27]. By [27], there exists a nonrandom function $m_1$ such that $\sup_{\eta \in \mathcal{N}_0}\|n^{-1/2}S_n(\eta) - m_1(\eta)\|$ converges to 0 with probability 1, where $\mathcal{N}_0$ is a neighborhood of $\eta_0$. Thus $\hat\eta$ is strongly consistent. Similarly, there is another nonrandom function $m_2$ such that $\sup_{\alpha \in \mathcal{N}_1}\|n^{-1/2}U_n^L(\alpha) - m_2(\alpha)\|$ converges to 0 with probability 1, where $\mathcal{N}_1$ is a neighborhood of $\alpha_0$. Hence by [27], $\hat\alpha^L$ is strongly consistent.

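For $\hat\theta^P$, by the argument in the Appendix of [17], the U-statistic version of the law of large numbers implies that, for all $\alpha \in \mathcal{W}$, $\|n^{-1/2}U_n^P(\alpha) - \gamma(\alpha)\|$ converges to 0 in probability, where $\gamma(\alpha) = E\{n^{-1/2}U_n^P(\alpha)\}$. We can partition our compact space as $\mathcal{W}_1, \ldots, \mathcal{W}_k$ so that $\mathcal{W} \subset \bigcup_{j=1}^k \mathcal{W}_j$. Then for $\{\alpha_j \in \mathcal{W}_j, j = 1, \ldots, k\}$, $\max_{1 \le j \le k}\|n^{-1/2}U_n^P(\alpha_j) - \gamma(\alpha_j)\|$ converges to 0 in probability. Then by the Appendix of [17], and for all $\epsilon > 0$, there exists $\xi > 0$ such that Hence

Thus $\hat\theta^P$ is strongly consistent, and hence $\hat\beta$ is strongly consistent.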

Theorem 2. Under the technical conditions of [27] and [17], $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically normal with mean zero and covariance matrix $\Sigma_0 = \Gamma_0^{-1}\Omega_0\Gamma_0^{-1}$, where $\Gamma_0$ is a nonsingular matrix and $\Omega_0$ is the asymptotic covariance matrix of $G_n(\beta_0)$.

Proof. As for consistency, we assume the same regularity conditions as in [27]. Let $\beta_0 = (\eta_0^T, \theta_0^T, \theta_0^T)^T$ and $G_n(\beta) = [S_n^T(\eta), \{U_n^L(\alpha)\}^T, \{U_n^P(\alpha)\}^T]^T$. Similar to [13], let $\lambda_0^{(1)}(t)$ be the cause-specific hazard function for $\tilde D_i^*(\eta)$ and let $\lambda_0^{(2)}(t)$ be the cause-specific hazard function for $\tilde X_i^*(\alpha)$ under dependent censoring. Define (3) (4) Then $M_{1i}$ and $M_{2i}$ are martingales [5], [13]. By adapting a proof in the Appendix of [13], Rebolledo’s martingale central limit theorem [5] gives where $Z^{(1)}(u) = \lim_{n \to \infty}[\sum_{j=1}^n I\{\tilde D_j^*(\eta_0) \ge u\}Z_j]/[\sum_{j=1}^n I\{\tilde D_j^*(\eta_0) \ge u\}]$ and $Z^{(2)}(u) = \lim_{n \to \infty}[\sum_{j=1}^n I\{\tilde X_j^*(\alpha_0) \ge u\}Z_j]/[\sum_{j=1}^n I\{\tilde X_j^*(\alpha_0) \ge u\}]$. From the Appendix of [17], where $2h_1(v, \alpha_0) = 2E[h(v, V_2, \alpha_0)]$. For $j = 1, \ldots, n$, $M_{1j}(t)$ is the martingale associated with $\epsilon_j^D$, while $M_{2j}(t)$ is the martingale associated with $\epsilon_j^X$, and $h(V_i, V_j, \alpha) = (Z_i - Z_j)\phi_{ij}(\alpha)$ [13], [17]. For $j = 1, \ldots, n$, define By the Cramér-Wold theorem, $G_n(\beta_0)$ has an asymptotically normal distribution with mean zero and covariance matrix $\Omega_0$, where Note that $E\{n^{-1/2}U_n^P(\alpha)\} = \gamma(\alpha)$. As stated in the Appendix of [17], under conditions N1–N3 from [9], there exists an open neighborhood of $\alpha_0$, say $K_0$, such that (5) Using a Taylor series expansion of $\gamma(\alpha)$ around $\alpha_0$, (6) With these two results (Eq 5) and (Eq 6), by the Appendix of [17], (7) From [27], we have that (8) for any $\eta$ in a small neighborhood of $\eta_0$, where $P_0$ is a $k \times k$ nonsingular matrix. From the Appendix of [13], for $J_{1n}(\alpha) = [S_n^T(\eta), \{U_n^L(\alpha)\}^T]^T$, (9) for any $\alpha$ in a small neighborhood of $\alpha_0$, where $L_{10}$ is defined as is a $2k \times 2k$ nonsingular matrix and $M_0$ and $H_0$ are $k \times k$ constant matrices. Define $J_{2n}(\alpha) = [S_n^T(\eta), \{U_n^P(\alpha)\}^T]^T$. Using the expansion from [17], for any $\alpha$ in a small neighborhood of $\alpha_0$, (10) where $R_0 = \partial\gamma(\alpha)/\partial\eta\,|_{\alpha = \alpha_0}$ and $V_0 = \partial\gamma(\alpha)/\partial\theta\,|_{\alpha = \alpha_0}$.
Combining expansions (Eq 8), (Eq 9) and (Eq 10), we have for any $\beta$ in a small neighborhood of $\beta_0$, where $\Gamma_0$ is defined as The results from [9] and [27], along with the consistency of $\hat\beta$, imply that By combining the above results with Slutsky’s theorem, $n^{1/2}(\hat\beta - \beta_0)$ has an asymptotically normal distribution with mean zero and covariance matrix $\Gamma_0^{-1}\Omega_0\Gamma_0^{-1}$.

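Theorem 2 implies the asymptotic normality of $\hat\beta$, with $\Sigma_0$ of the form

$$\Sigma_0 = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix}.$$

Let $\hat\Sigma$ be the estimate of $\Sigma_0$, partitioned in the same way. In this covariance matrix, $\hat\Sigma_{11}$ is the $k \times k$ covariance matrix for $\hat\eta$, $\hat\Sigma_{22}$ is the $k \times k$ covariance matrix for $\hat\theta^L$ and $\hat\Sigma_{33}$ is the $k \times k$ covariance matrix for $\hat\theta^P$. Moreover, $\hat\Sigma_{12}$ and $\hat\Sigma_{13}$ represent the covariance terms between $\hat\eta$ and $\hat\theta^L$ and between $\hat\eta$ and $\hat\theta^P$, respectively. Define $\hat\Sigma_{23}$ as the covariance matrix between $\hat\theta^L$ and $\hat\theta^P$. Clearly, $\hat\Sigma_{21} = \hat\Sigma_{12}^T$, $\hat\Sigma_{31} = \hat\Sigma_{13}^T$ and $\hat\Sigma_{32} = \hat\Sigma_{23}^T$.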

The issue remains of how to obtain the matrix corresponding to $\hat H_m^{-1}$ in our context. Note that $\hat\eta$, $\hat\theta^L$ and $\hat\theta^P$ are correlated with each other. The structure of the estimating equations implies that $\hat\theta^L$ and $\hat\theta^P$ cannot be estimated separately from $\hat\eta$. Thus our matrix corresponding to $\hat H_m^{-1}$ should include the effect of $\hat\eta$. To obtain it, we need to invert the whole matrix and extract the submatrix corresponding to $\hat\theta^L$ and $\hat\theta^P$. There are two approaches to obtaining this submatrix.

The first approach is to invert $\hat\Sigma$ and obtain the submatrix of $\hat\Sigma^{-1}$ corresponding to $\hat\theta_m^L$ and $\hat\theta_m^P$. We denote this matrix by $\hat\Sigma_m^*$. This matrix is $2 \times 2$ and positive definite. Then we can calculate $\hat c_m = (\hat c_{m1}, \hat c_{m2})^T = (h^T\hat\Sigma_m^*h)^{-1}\hat\Sigma_m^*h$, where $h = (1, 1)^T$. Using the form of the optimal estimator in [25], we obtain the new weighted estimator for the $m$th covariate, say $\hat\theta_m^{MWE}$, where $\hat\theta_m^{MWE} = \hat c_{m1}\hat\theta_m^L + \hat c_{m2}\hat\theta_m^P$. We can repeat this step for the other regression coefficients to obtain $\hat\theta^{MWE} = (\hat\theta_1^{MWE}, \ldots, \hat\theta_k^{MWE})^T$. In this first approach, the weights are generated from $k$ matrices of size $2 \times 2$. We refer to this first approach as the ‘marginal approach’.

Sometimes it is desirable to consider all covariates at once when obtaining the weights. The second approach is to obtain the submatrix of $\hat\Sigma^{-1}$ corresponding to $\{(\hat\theta^L)^T, (\hat\theta^P)^T\}^T$. We denote this matrix by $\hat\Sigma^{**}$. This approach differs from the first in that $\hat\Sigma_m^*$ consists only of the elements corresponding to $\hat\theta_m^L$ and $\hat\theta_m^P$, whereas $\hat\Sigma^{**}$ contains the elements corresponding to the entire vector $\{(\hat\theta^L)^T, (\hat\theta^P)^T\}^T$. It therefore reflects the joint effect of $\{(\hat\theta^L)^T, (\hat\theta^P)^T\}^T$ on our new estimator. Let $E$ be a $2k \times k$ matrix such that

$$E = \begin{pmatrix} I_k \\ I_k \end{pmatrix},$$

with $I_k$ the $k \times k$ identity matrix.

$E$ is a multivariate extension of $h$. Note that $E$ is the row-wise concatenation of two $k \times k$ identity matrices. The entries equal to 1 in these two identity matrices are the source of the weights for $\hat\theta^L$ and $\hat\theta^P$. The next step is to construct $\hat B$, which is

Then $\hat B$ has the form This matrix is a multivariate extension of $\hat c_m$ from the first approach. It is a contrast matrix in the sense that $\hat c^*_{m,m} + \hat c^*_{(k+m),m} = 1$ for the $m$th regression coefficient of $\hat\theta^L$ and $\hat\theta^P$. Moreover, $\hat c^*_{p,m} + \hat c^*_{(k+p),m} = 0$ for $p \ne m$, $p = 1, \ldots, k$. In vector form, this approach gives our new estimator, say $\hat\theta^{JWE}$, We refer to this approach as the ‘joint approach’.

Now the key step is to obtain $\hat\Sigma$. We use the resampling approach of [16], which was also used in [13] and [17]. Let $\hat\alpha^L = \{\hat\eta^T, (\hat\theta^L)^T\}^T$ and $\hat\alpha^P = \{\hat\eta^T, (\hat\theta^P)^T\}^T$. From [13] and [17], we have and Define A consistent estimator of $\Omega_0$ is We then solve the estimating equation (11) where $Q_i$ ($i = 1, \ldots, n$) are standard normal random variables. Note that $G_n(\beta) = [S_n^T(\eta), \{U_n^L(\alpha)\}^T, \{U_n^P(\alpha)\}^T]^T$ is the joint estimating function for $(\eta_0^T, \theta_0^T, \theta_0^T)^T$. By solving this equation repeatedly, we obtain many realizations of $\hat\beta$, say $\hat\beta^R = \{(\hat\eta^*)^T, (\hat\theta^{L*})^T, (\hat\theta^{P*})^T\}^T$, which are solutions of (Eq 11). The next theorem, combined with Theorem 2, justifies the resampling approach for calculating $\hat\Sigma$.

Theorem 3. Under the technical conditions in [16], the unconditional distribution of $n^{1/2}(\hat\beta - \beta_0)$ is asymptotically the same as the conditional distribution of $n^{1/2}(\hat\beta^R - \hat\beta)$ given the observed data, where $\hat\beta^R$ are realizations of $\hat\beta$ from resampling.

Proof. Recall that for any $\beta$ in a small neighborhood of $\beta_0$, we have (12) Note that $\hat\beta^R$ are solutions of (Eq 11). By conditioning on the observed data and using expansion (Eq 12), as well as by adapting arguments in [13] and [16], and hence, Note that $n^{-1/2}\sum_{i=1}^n W_iQ_i$ is asymptotically normal with covariance matrix $\Omega_0$. Then, given the observed data, the distribution of $n^{1/2}(\hat\beta^R - \hat\beta)$ is asymptotically normal with covariance matrix $\Gamma_0^{-1}\Omega_0\Gamma_0^{-1}$. Hence the conditional distribution of $n^{1/2}(\hat\beta^R - \hat\beta)$ given the observed data is asymptotically the same as the unconditional distribution of $n^{1/2}(\hat\beta - \beta_0)$.

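For $m = 1, \ldots, k$ and $j = 1, \ldots, M$, let $(\hat\eta_m^*)^{(j)}$, $(\hat\theta_m^{L*})^{(j)}$ and $(\hat\theta_m^{P*})^{(j)}$ be the $j$th realizations of $\hat\eta_m$, $\hat\theta_m^L$ and $\hat\theta_m^P$, the elements corresponding to the $m$th covariate, respectively. The algorithm for the first approach is as follows.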

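  1. By resampling, calculate the covariance matrix $\hat\Sigma$ using the realizations $(\hat\eta_m^*)^{(j)}$, $(\hat\theta_m^{L*})^{(j)}$ and $(\hat\theta_m^{P*})^{(j)}$ ($m = 1, \ldots, k$ and $j = 1, \ldots, M$).
  2. From $\hat\Sigma^{-1}$, obtain the submatrix corresponding to $\hat\theta_m^L$ and $\hat\theta_m^P$, say $\hat\Sigma_m^*$.
  3. Calculate $\hat c_m = (\hat c_{m1}, \hat c_{m2})^T = (h^T\hat\Sigma_m^*h)^{-1}\hat\Sigma_m^*h$, where $h = (1, 1)^T$, and obtain the new estimate $\hat\theta_m^{MWE} = \hat c_{m1}\hat\theta_m^L + \hat c_{m2}\hat\theta_m^P$.
  4. Repeat steps 2 and 3 for all covariates.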
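The steps of the marginal approach can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names (`sample_cov`, `inverse`, `marginal_weighted_estimate`) are hypothetical, the resampled realizations are synthetic draws rather than solutions of (Eq 11), and each draw is assumed to be ordered as $(\eta^*, \theta^{L*}, \theta^{P*})$.

```python
import random


def sample_cov(draws):
    """Sample covariance matrix of a list of equal-length vectors."""
    n_draws, p = len(draws), len(draws[0])
    mean = [sum(d[j] for d in draws) / n_draws for j in range(p)]
    return [[sum((d[i] - mean[i]) * (d[j] - mean[j]) for d in draws) / (n_draws - 1)
             for j in range(p)] for i in range(p)]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def marginal_weighted_estimate(draws, theta_l, theta_p):
    """Return (weights, estimates), one entry per covariate."""
    k = len(theta_l)
    prec = inverse(sample_cov(draws))            # step 1, then invert Sigma-hat
    weights, estimates = [], []
    for m in range(k):                           # step 4: loop over covariates
        a, b = k + m, 2 * k + m                  # positions of theta^L_m and theta^P_m
        u1 = prec[a][a] + prec[a][b]             # step 2-3: first entry of Sigma*_m h
        u2 = prec[b][a] + prec[b][b]             # second entry of Sigma*_m h
        c1, c2 = u1 / (u1 + u2), u2 / (u1 + u2)  # divide by h^T Sigma*_m h
        weights.append((c1, c2))
        estimates.append(c1 * theta_l[m] + c2 * theta_p[m])
    return weights, estimates


# Synthetic resampled draws for k = 1 (hypothetical values, for illustration only).
rng = random.Random(0)
draws = []
for _ in range(500):
    shared = rng.gauss(0.0, 0.2)
    draws.append([rng.gauss(0.0, 0.2),
                  0.5 + shared + rng.gauss(0.0, 0.15),
                  0.5 + shared + rng.gauss(0.0, 0.25)])

weights, estimates = marginal_weighted_estimate(draws, [0.48], [0.46])
print(weights, estimates)
```

Because the two weights for each covariate sum to one, the combined estimate equals the common value whenever the two input estimates coincide.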

The algorithm for the second approach is as follows.

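  1. By resampling, calculate the covariance matrix $\hat\Sigma$ using the realizations $(\hat\eta_m^*)^{(j)}$, $(\hat\theta_m^{L*})^{(j)}$ and $(\hat\theta_m^{P*})^{(j)}$ ($m = 1, \ldots, k$ and $j = 1, \ldots, M$).
  2. Obtain $\hat\Sigma^{**}$ from $\hat\Sigma^{-1}$.
  3. From $\hat\Sigma^{**}$ and $E$, obtain $\hat B$.
  4. Calculate the new estimate $\hat\theta_m^{JWE} = \hat c^*_{m,m}\hat\theta_m^L + \hat c^*_{k+m,m}\hat\theta_m^P$, where $\hat c^*_{j,l}$ is the element in the $j$th row and $l$th column of $\hat B$.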
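A minimal Python sketch of steps 3 and 4 of the joint approach is given below. It rests on an explicit assumption: the weight matrix is taken to be $B = \Sigma^{**}E(E^T\Sigma^{**}E)^{-1}$, the direct matrix analogue of $\hat c_m$, since the displayed definition of $\hat B$ is not available here. The function names and the 4 × 4 example matrix (for $k = 2$) are hypothetical.

```python
def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def joint_weight_matrix(sigma_ss, k):
    """Assumed form B = S E (E^T S E)^{-1}, with S = sigma_ss (2k x 2k), E = [I_k; I_k]."""
    se = [[sigma_ss[i][j] + sigma_ss[i][k + j] for j in range(k)]
          for i in range(2 * k)]                          # S E, a 2k x k matrix
    ete = [[se[i][j] + se[k + i][j] for j in range(k)]
           for i in range(k)]                             # E^T S E, a k x k matrix
    inv = inverse(ete)
    return [[sum(se[i][l] * inv[l][j] for l in range(k)) for j in range(k)]
            for i in range(2 * k)]


def joint_weighted_estimate(b, theta_l, theta_p):
    """Step 4 as stated in the algorithm: combine the m-th components only."""
    k = len(theta_l)
    return [b[m][m] * theta_l[m] + b[k + m][m] * theta_p[m] for m in range(k)]


# Hypothetical positive definite Sigma** for k = 2 covariates.
sigma_ss = [[4.0, 1.0, 0.5, 0.2],
            [1.0, 3.0, 0.1, 0.4],
            [0.5, 0.1, 5.0, 1.0],
            [0.2, 0.4, 1.0, 6.0]]
b_hat = joint_weight_matrix(sigma_ss, 2)
print(joint_weighted_estimate(b_hat, [0.48, 1.02], [0.46, 0.97]))
```

Under the assumed form, $E^T B$ is the identity matrix, which reproduces the contrast property: for each column $m$, the entries in rows $m$ and $k + m$ sum to one, and the entries in rows $p$ and $k + p$ sum to zero for $p \ne m$.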

By Theorem 1 and Theorem 2, our new estimators are consistent and asymptotically normal.

Model checking

To assess the adequacy of the model, since our weighted estimator is based on the estimators of [13] and [17], it is reasonable to consider the score processes of both [13] and [17]. We extend the model checking technique of [13]. As defined in [13], let $N_{1i}(t; \eta) = \Delta_i I\{\tilde D_i^*(\eta) \le t\}$ and $N_{2i}(t; \alpha) = \tilde\delta_i^*(\alpha) I\{\tilde X_i^*(\alpha) \le t\}$, where $i = 1, \ldots, n$. Then the Nelson-Aalen estimators for the event of interest and dependent censoring are Note that by (Eq 3) and (Eq 4), the martingale residuals are defined as where $\hat\alpha$ can be either $\hat\alpha^L = \{\hat\eta^T, (\hat\theta^L)^T\}^T$ or $\hat\alpha^P = \{\hat\eta^T, (\hat\theta^P)^T\}^T$. Then as defined in [13],

Then, similar to [13] and [17], we can substitute $\hat\eta$ into $S_n(s; \eta)$ and $\hat\alpha^L$ and $\hat\alpha^P$ into $U_n(t; \alpha)$. The components of $[S_n^T(s; \hat\eta), \{U_n(t; \hat\alpha^L)\}^T, \{U_n(t; \hat\alpha^P)\}^T]^T$ are called the observed score processes with respect to dependent censoring and the event of interest, respectively [7], [13], [17]. We can construct $[\hat S_n^T(s; \hat\eta^*), \{\hat U_n^L(t; \hat\alpha^{L*})\}^T, \{\hat U_n^P(v; \hat\alpha^{P*})\}^T]^T$ [13], [17], where $\hat\alpha^{L*} = \{(\hat\eta^*)^T, (\hat\theta^{L*})^T\}^T$ and $\hat\alpha^{P*} = \{(\hat\eta^*)^T, (\hat\theta^{P*})^T\}^T$. These three processes are called bootstrapped processes [7], [13], [17]. We can plot the observed process together with 20 or 30 randomly selected bootstrapped realizations. Standard goodness-of-fit tests can be performed by calculating Kolmogorov-Smirnov type test statistics, defined by $\sup_s\|S_n(s; \hat\eta)\|$, $\sup_t\|U_n(t; \hat\alpha^L)\|$ and $\sup_v\|U_n(v; \hat\alpha^P)\|$. To calculate the null distribution of the test statistics, we first obtain the $j$th realizations of the bootstrap samples $(\hat\eta^*)^{(j)}$, $(\hat\theta^{L*})^{(j)}$ and $(\hat\theta^{P*})^{(j)}$. Then we compute $BS_j = \sup_s\|\hat S_n(s; (\hat\eta^*)^{(j)})\|$, $BS_j^L = \sup_t\|\hat U_n^L(t; (\hat\alpha^{L*})^{(j)})\|$ and $BS_j^P = \sup_v\|\hat U_n^P(v; (\hat\alpha^{P*})^{(j)})\|$, respectively, for $j = 1, \ldots, M$, where $(\hat\alpha^{L*})^{(j)}$ and $(\hat\alpha^{P*})^{(j)}$ are the $j$th realizations of the bootstrap samples of $\hat\alpha^{L*}$ and $\hat\alpha^{P*}$. The p-values can be defined by

[10]. If the p-value is smaller than a predetermined level, we reject the null hypothesis, which means that the data do not fit our bivariate model adequately. Note that a multiple testing problem arises in testing the models for θ. We address this by adjusting the p-values with a Bonferroni correction for two tests.
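The p-value computation and the Bonferroni adjustment can be sketched as follows. This is an assumption-laden illustration: the p-value is taken to be the proportion of bootstrapped supremum statistics that are at least as large as the observed one, which is a common convention but is not confirmed by the text here, and the function names and numbers are hypothetical.

```python
def bootstrap_p_value(observed_sup, bootstrap_sups):
    """Proportion of bootstrapped supremum statistics >= the observed statistic."""
    exceed = sum(1 for b in bootstrap_sups if b >= observed_sup)
    return exceed / len(bootstrap_sups)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, n_tests * p) for p in p_values]


# Hypothetical observed statistic and bootstrapped statistics.
p_l = bootstrap_p_value(1.0, [0.5, 1.5, 2.0, 0.8])
p_p = bootstrap_p_value(0.4, [0.5, 1.5, 2.0, 0.8])
# Two tests for the theta models (one per estimator), adjusted jointly.
print(bonferroni([p_l, p_p]))
```

With two tests, each raw p-value is doubled before being compared with the predetermined level.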

Simulation Studies

We consider two simulation settings. In the first simulation setting, the errors follow a bivariate normal distribution with mean (0, 1.2), variances 1 and correlation ρ = 0 or 0.25. The independent censoring time C is generated as log(U*), where U* has a uniform distribution with minimum value 0 and maximum value 20. The covariate is Z ∼ Bernoulli(0.5), where Bernoulli(0.5) is the Bernoulli distribution with success probability 0.5. We run 500 simulation runs. Within each simulation run, 500 resampling runs are used to calculate the covariance matrix. Sample sizes are N = 150 and N = 300. If there is only one covariate in the model, the first and second methods of weighted estimation are equivalent. Let this common weighted estimator be $\hat\theta^{WE}$. We calculate the bias (Bias), mean squared error (MSE), mean of the standard errors (SEE) and 95% coverage rate (Coverage). The coverage is based on the normal approximation. Moreover, to evaluate the robustness of the estimators, we also compute the median of the difference between the estimator and the true value (Dmedian), the median of the squared errors of the estimates (Mediansq) and the median of the standard errors (Sdmedian). Results are summarized in Table 1 and Table 2.

Table 1. Simulation result when N = 150 and N = 300, ρ = 0 with covariate Bernoulli(0.5).

Table 2. Simulation result when N = 150 and N = 300, ρ = 0.25 with covariate Bernoulli(0.5).

In the second simulation setting, we generate a Gamma random variable ν with mean μ = 1 and variance $\sigma^2$ = 0 or 1, then create W = exp($\epsilon^X$), which is an exponential random variable with rate $4\nu^{-1}$, and exp($\epsilon^D$), which is an exponential random variable with rate $\nu^{-1}$. Then we generate the time to the event of interest by $\exp(X) = \exp(\theta_0^TZ)\exp(\epsilon^X)$ and the time to dependent censoring by $\exp(D) = \exp(\eta_0^TZ)\exp(\epsilon^D)$. (In the notation of this paper, X, D and C are already log-transformed times, so exp(X), exp(D) and exp(C) are times on the original scale.) The independent censoring time exp(C) has a uniform distribution with minimum value 0 and maximum value 20. The true parameter values are $\theta_0 = (0.5, 1)^T$ and $\eta_0 = (1, 0.5)^T$, and the covariates are $Z_1 \sim U(0, 1)$, where U(0, 1) is the uniform distribution with minimum value 0 and maximum value 1, and $Z_2 \sim \mathrm{Bernoulli}(0.5)$. We run 500 simulation runs. Within each simulation run, 500 resampling runs are used to calculate the covariance matrix. Let $\hat\theta^{MWE}$ be the weighted estimator with weights calculated marginally (the first proposed method) and let $\hat\theta^{JWE}$ be the weighted estimator with weights calculated jointly (the second proposed method). We compute the same quantities as in the first simulation setting. Results are summarized in Table 3 and Table 4.
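The data-generating mechanism of the second setting can be sketched in Python with the standard library alone. Several points are assumptions made for illustration: the rates are read as 4/ν and 1/ν, the case σ² = 0 is treated as the degenerate frailty ν = 1, and the function name `simulate_subject` and the seed are hypothetical.

```python
import math
import random


def simulate_subject(rng, sigma2, theta0=(0.5, 1.0), eta0=(1.0, 0.5)):
    """Draw one subject; all returned times are on the log scale."""
    # Z1 ~ U(0, 1), Z2 ~ Bernoulli(0.5)
    z = (rng.random(), 1.0 if rng.random() < 0.5 else 0.0)
    # Gamma frailty with mean 1 and variance sigma2 (shape 1/sigma2, scale sigma2);
    # sigma2 = 0 is treated as the degenerate case nu = 1 (assumption).
    nu = rng.gammavariate(1.0 / sigma2, sigma2) if sigma2 > 0 else 1.0
    eps_x = math.log(rng.expovariate(4.0 / nu))  # exp(eps_X) ~ Exp(rate 4/nu), assumed reading
    eps_d = math.log(rng.expovariate(1.0 / nu))  # exp(eps_D) ~ Exp(rate 1/nu), assumed reading
    x = theta0[0] * z[0] + theta0[1] * z[1] + eps_x
    d = eta0[0] * z[0] + eta0[1] * z[1] + eps_d
    c = math.log(20.0 * (1.0 - rng.random()))    # exp(C) uniform on (0, 20], avoids log(0)
    d_tilde = min(d, c)
    x_tilde = min(x, d_tilde)
    return {"z": z, "x_tilde": x_tilde, "d_tilde": d_tilde,
            "delta": int(x <= d_tilde), "Delta": int(d <= c)}


rng = random.Random(2015)
sample = [simulate_subject(rng, sigma2=1.0) for _ in range(150)]
# Proportion of subjects whose event of interest is observed.
print(sum(s["delta"] for s in sample) / len(sample))
```

A shared frailty ν enters both error terms, which is what induces dependence between the event time and the dependent censoring time when σ² > 0.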

Table 3. Simulation result when N = 150 and N = 300, σ2 = 0 with two covariates (Z1: U(0, 1), Z2: Bernoulli(0.5)).

Table 4. Simulation result when N = 150 and N = 300, σ2 = 1 with two covariates (Z1: U(0,1), Z2: Bernoulli(0.5)).

In these simulation results, our weighted estimators perform well. In both settings, the bias and mean squared error of our new estimator are similar to those of the estimators of [13] and [17]. The mean and median of the standard errors are smaller than those of the estimators of [13] and [17]. Moreover, the median of the difference between the estimator and the true value and the median of the squared errors indicate that our proposed estimator is comparable with the estimators from the original methods.

In the first simulation setting, the difference in standard error between our proposed estimator and $\hat\theta^L$ is larger than that between $\hat\theta^P$ and the proposed estimator. In the second simulation setting, the opposite holds. Furthermore, in the first simulation setting, $\hat\theta^P$ has a lower standard error on average than $\hat\theta^L$, while in the second simulation setting $\hat\theta^L$ is more efficient (with respect to standard error) than $\hat\theta^P$. These results support our claim that neither estimator is better than the other. Our proposed estimator attains a smaller standard error while achieving small bias and correct coverage, except for N = 150 with $\sigma^2 = 1$ in the second simulation setting. In this scenario, the empirical coverage of the proposed estimators is lower than the nominal 95% level. This is due to the low coverage of $\hat\theta^L$. Since we combine $\hat\theta^L$ and $\hat\theta^P$, if one of them has low coverage, the coverage of the weighted estimator is also likely to fall below the nominal level.

Real data analysis

We applied our method to data from the AIDS Clinical Trial Group (ACTG) Study 364 [1], which was also analyzed in [17]. This multicenter randomized study investigated patients whose plasma RNA level was at least 500 copies per ml. Subjects were assigned to one of three treatments: nelfinavir (NFV), efavirenz (EFV), or a combination of nelfinavir and efavirenz (NFV + EFV). Details about this study can be found in [1].

The two failure times are the time to an HIV RNA level greater than 2000 copies per ml and the time to withdrawal from the study. Let X be the first time when the HIV RNA level is greater than 2000 copies per ml and D the time to withdrawal from the study. We considered four covariates and 194 observations. Z1 takes the value 1 if a patient receives EFV and 0 otherwise. Z2 takes the value 1 if a patient receives NFV + EFV and 0 otherwise. Z3 is New3TC, which takes the value 1 if lamivudine is given to a patient as a new nucleoside analogue therapy and 0 otherwise. Z4 is the logarithm of the RNA level at the start of the study.

Table 5 and Table 6 show the point estimates and standard errors of $\hat\eta$, $\hat\theta^L$, $\hat\theta^P$, $\hat\theta^{MWE}$ and $\hat\theta^{JWE}$. Our method works well for the models with and without New3TC, for all covariates. Some variables are statistically significant based on the weighted estimator but not based on [13] or [17]. For example, consider the effect of EFV on the time to first virologic failure. In Table 6, the estimated effect using the approach of [13] is 0.475 with standard error 0.250. Using the approach of [17], the estimate is 0.464 with standard error 0.281. Because the estimators are asymptotically normal, Wald tests based on [13] and [17] indicate that EFV is not statistically significant at the 5% significance level. On the other hand, the weighted estimate using the first approach is 0.471 with standard error 0.222. In this case, EFV is statistically significant at the 5% significance level.
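The Wald tests in this comparison can be reproduced from the reported estimates and standard errors. The sketch below assumes a two-sided test based on the standard normal approximation; the function name `wald_test` is hypothetical, and the three (estimate, standard error) pairs are the EFV values quoted in the text.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


results = {
    "approach of [13]": wald_test(0.475, 0.250),
    "approach of [17]": wald_test(0.464, 0.281),
    "weighted (marginal)": wald_test(0.471, 0.222),
}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.3f}, significant at 5%: {p < 0.05}")
```

The z statistics are about 1.90, 1.65 and 2.12, so only the weighted estimate exceeds the two-sided 5% critical value of 1.96, even though the three point estimates are nearly identical.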

Table 5. Point estimates with standard errors of covariates in AIDS study for model without New3TC (Standard errors are shown in parentheses).

Table 6. Point estimates with standard errors of covariates in AIDS study for model with New3TC (Standard errors are shown in parentheses).

The observed score process and bootstrapped processes for withdrawal from the study with respect to Z1 are shown in Fig 1. Fig 2 and Fig 3 show the observed score processes and bootstrapped processes for the first virologic failure using $\hat\alpha^L$ and $\hat\alpha^P$, respectively, with respect to Z1. These three plots are based on the model without New3TC. The processes fluctuate around zero, so there is no graphical evidence of lack of fit. The p-value of the lack-of-fit test for withdrawal is 0.952, and those for the first virologic failure using $\hat\alpha^L$ and $\hat\alpha^P$ are 0.918 and 0.959, respectively. Together with the graphical check, the p-values indicate no evidence of violation of the model assumptions.

Fig 1. Plot of observed score process and bootstrapped processes of time to withdrawal of study with respect to Z1.

The thick line is the observed process and the dashed lines are bootstrapped processes.

Fig 2. Plot of observed score process and bootstrapped processes of time to first virologic failure using α^L with respect to Z1.

The thick line is the observed process and the dashed lines are bootstrapped processes.

Fig 3. Plot of observed score process and bootstrapped processes of time to first virologic failure using α^P with respect to Z1.

The thick line is the observed process and the dashed lines are bootstrapped processes.

For purposes of interpretation, since D represents a standard survival time, $\hat\eta$ is interpreted as the covariate effect on survival time. However, since the observed time for X depends on D, interpretation of $\hat\theta$ is difficult. One way to interpret $\hat\theta$ is to assume that D does not exist and to interpret $\hat\theta$ as the effect on X only. This is possible if there exists a reasonable extrapolation mechanism for X [18]. However, considering the estimation structure for θ, it is difficult to separate the effect of $\hat\theta$ on X from the effect of $\hat\eta$ on D.


Discussion

In this paper, we have proposed optimal estimators formed as combinations of the two estimators from [13] and [17]. Our methodology can be extended to the case of recurrent events with dependent censoring, which has been studied extensively [6], [7], [10]. We are currently working on this extension.

Optimality of estimators has been discussed in other contexts. A recent publication proposed optimal additive functions based on score functions [14]. The main point of that method is to combine unbiased estimating functions. In our case, this would mean combining the estimating equations and obtaining a new solution from the combined estimating equation. Comparing the performance of this solution with that of our proposed estimator is of interest and is left to future research.

Another way of achieving optimality is to use the generalized method of moments estimator [8]. This estimator is based on a linear combination of estimating functions [19]. In this case, the estimating functions have a greater dimension than the parameter vector, and optimality is achieved through the linear combination. It has been shown that the estimator from this linear combination of estimating functions is consistent and asymptotically normal [8]. In the statistics literature, this idea has been applied to generalized estimating equations [19]. The estimating functions proposed by [19] are called quadratic inference functions. Recently, the quadratic inference function has been applied to the Cox model [26].

[8] and [19] derived new estimating functions, whereas we combine two estimators directly. The idea of the generalized method of moments is very appealing, but the estimating functions of [13] and [17] are nonsmooth. Finding the derivative of the linear combination of the estimating functions, which is a key step in the generalized method of moments, is challenging in our setting because the estimating functions proposed by [13] and [17] are not differentiable. Applying the idea of [8] to the AFT model will be interesting future research.

Our estimating equations involve nonsmooth functions of η and α. Several authors have used a linear programming approach to estimate θ [3], [11]. However, linear programming is very slow for computing estimators of θ, which makes it very inefficient when it is used to solve (Eq 11) for the estimation of Σ. Recently, a derivative-free spectral algorithm for nonlinear equations (DF-SANE) was proposed [12], and it was shown to perform better than the linear programming method in an example of estimating the parameters of AFT models under independent censoring [21]. However, under dependent censoring, the artificial censoring term leads to numerical instability in estimating the parameters and in calculating the resampled estimators. Moreover, in this setting DF-SANE does not converge well under the default tolerance settings of the implementation in [21], so using the algorithm requires changing the tolerance level. Developing efficient numerical algorithms for estimating the parameters is an important topic for future research.

Author Contributions

Conceived and designed the experiments: DG. Performed the experiments: YC DG. Analyzed the data: YC. Contributed reagents/materials/analysis tools: YC DG. Wrote the paper: YC DG.


References

  1. Albrecht MA, Bosch RJ, Hammer SM, Liou S-H, Kessler H, Para MF, et al. Nelfinavir, efavirenz, or both after the failure of nucleoside treatment of HIV infection. N Engl J Med. 2001; 345: 398–407. pmid:11496850
  2. Day R, Bryant J, Lefkopoulou M. Adaptation of bivariate frailty models for prediction, with application to biological markers as prognostic indicators. Biometrika. 1997; 84: 45–56.
  3. Ding AA, Shi G, Wang W, Hsieh J-J. Marginal regression analysis for semi-competing risks data under dependent censoring. Scand J Stat. 2009; 36: 481–500.
  4. Fine JP, Jiang H, Chappell R. On semi-competing risks data. Biometrika. 2001; 88: 907–919.
  5. Fleming TR, Harrington DP. Counting Processes and Survival Analysis. 2nd ed. New York: Wiley; 2005.
  6. Ghosh D, Lin DY. Semiparametric analysis of recurrent events data in the presence of dependent censoring. Biometrics. 2003; 59: 877–885. pmid:14969466
  7. Ghosh D. Semiparametric analysis of recurrent events: artificial censoring, truncation, pairwise estimation and inference. Lifetime Data Anal. 2010; 16: 509–524. pmid:20063182
  8. Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982; 50: 1029–1054.
  9. Honoré BE, Powell JL. Pairwise difference estimators of censored and truncated regression models. J Econom. 1994; 64: 241–278.
  10. Hsieh J-J, Ding AA, Wang W. Regression analysis for recurrent events data under dependent censoring. Biometrics. 2011; 67: 719–729. pmid:21039394
  11. Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based inference for the accelerated failure time model. Biometrika. 2003; 90: 341–353.
  12. La Cruz W, Martínez JM, Raydan M. Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Mathematics of Computation. 2006; 75: 1429–1448.
  13. Lin DY, Robins JM, Wei LJ. Comparing two failure time distributions in the presence of dependent censoring. Biometrika. 1996; 83: 381–393.
  14. Lindsay BG, Yi GY, Sun J. Issues and strategies in the selection of composite likelihoods. Statistica Sinica. 2011; 21: 71–105.
  15. Louis TA. Nonparametric analysis of an accelerated failure time model. Biometrika. 1981; 68: 381–390.
  16. Parzen MI, Wei LJ, Ying Z. A resampling method based on pivotal estimating functions. Biometrika. 1994; 81: 341–350.
  17. Peng L, Fine JP. Rank estimation of accelerated lifetime models with dependent censoring. J Am Stat Assoc. 2006; 101: 1085–1093.
  18. Prentice RL, Kalbfleisch JD, Peterson AV Jr, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978; 34: 541–554. pmid:373811
  19. Qu A, Lindsay BG, Li B. Improving generalised estimating equations using quadratic inference functions. Biometrika. 2000; 87: 823–836.
  20. Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. Ann Stat. 1990; 18: 354–372.
  21. Varadhan R, Gilbert PD. BB: An R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. J Stat Softw. 2009; 32: 1–26.
  22. Wang W. Estimating the association parameter for copula models under dependent censoring. J R Stat Soc Series B Stat Methodol. 2003; 65: 257–273.
  23. Wei LJ, Gail MH. Nonparametric estimation for a scale-change with censored observations. J Am Stat Assoc. 1983; 78: 382–388.
  24. Wei LJ, Johnson WE. Combining dependent tests with incomplete repeated measurements. Biometrika. 1985; 72: 359–364.
  25. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc. 1989; 84: 1065–1073.
  26. Xue L, Wang L, Qu A. Incorporating correlation for multivariate failure time data when cluster size is large. Biometrics. 2010; 66: 393–404. pmid:19673860
  27. Ying Z. A large sample study of rank estimation for censored regression data. Ann Stat. 1993; 21: 76–99.