## Abstract

Independent censoring is a crucial assumption in survival analysis. However, this assumption is impractical in many medical studies, where the presence of dependent censoring makes it difficult to analyze covariate effects on disease outcomes. The semicompeting risks framework offers one approach to handling dependent censoring. There are two representative estimators based on an artificial censoring technique in this data structure. However, neither of these estimators is better than the other with respect to efficiency (standard error). In this paper, we propose a new weighted estimator for the accelerated failure time (AFT) model under dependent censoring. One advantage of our approach is that the weights are optimal among all linear combinations of the two estimators mentioned above. To calculate these weights, a novel resampling-based scheme is employed. Attendant asymptotic statistical results for the estimator are established. In addition, simulation studies, as well as an application to real data, show the gains in efficiency for our estimator.

**Citation:** Cho Y, Ghosh D (2015) Weighted Estimation of the Accelerated Failure Time Model in the Presence of Dependent Censoring. PLoS ONE 10(4): e0124381. https://doi.org/10.1371/journal.pone.0124381

**Academic Editor:** Xi Luo, Brown University, UNITED STATES

**Received:** October 19, 2014; **Accepted:** March 1, 2015; **Published:** April 24, 2015

**Copyright:** © 2015 Cho, Ghosh. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** Third-party data are available from AIDS Clinical Trial Group 364. Requests for data may be sent to sdac.data@sdac.harvard.edu. Please also see the full AIDS Clinical Trial Group Access to Published Data page here: https://actgnetwork.org/clinical-trials/access-published-data.

**Funding:** This work was supported by National Institutes of Health (NIH) grant CA 129102-05. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

In medical studies, it is very common for death or withdrawal from the study to occur alongside progression of the disease of interest. In this case, death or withdrawal from the study may censor the development of disease. This type of data structure is called ‘semicompeting risks data’ [4].

Semicompeting risks data have been widely studied in the past decade. Some researchers used a Gamma copula to estimate the association parameter between the event of interest and dependent censoring [2], [4]. Later work extended the methodology of [2] to the case where a nuisance parameter exists and also considered a more general copula model [22].

Other researchers used semiparametric regression to model the event of interest and dependent censoring jointly. One approach is an estimation procedure based on the accelerated failure time (AFT) model [13], [17]. These authors used the artificial censoring technique to adjust for the bias of the usual estimator. While the estimating equation of [13] is a U-statistic of order 1, that of [17] is a U-statistic of order 2.

However, none of these papers fully discussed the optimality of the estimator. Choosing an estimator that is optimal from an efficiency viewpoint is therefore an important issue. Here, we adapt the idea of [25], which proposed an optimal estimator in the form of a linear combination of estimators for multivariate failure time data. That work built on [24], which proposed combining dependent tests in the presence of missing values; the idea of [24] is to construct a test that maximizes power using a linear combination of test statistics. The approach of [25] is simple and flexible, so it is sensible to apply it in our setting.

In this paper, we propose a weighted estimator using the methodology of [25]. Our weighted estimator combines those of [13] and [17]. The structure of this paper is as follows. In the Methods section, we briefly review the estimators proposed by [13] and [17] and describe our new weighted estimator in detail. The Model checking section briefly discusses the model checking procedure. The Simulation Studies section gives the results of simulation studies. An application of our method to a real data example is presented in the real data analysis section, and the paper concludes with a discussion.

## Methods

### Review of Model

Let $X$ be the time to the event of interest, $D$ the time to dependent censoring and $C$ the time to independent censoring. All these times are on a logarithmic scale. Let $\tilde{X}=X\wedge D\wedge C$ and $\tilde{D}=D\wedge C$. Define $\delta =I(X\le \tilde{D})$, $\Delta = I(D \le C)$ and let $\mathbf{Z}$ denote the covariates. The data contain $n$ independent and identically distributed observations $(\tilde{X}_{i},\tilde{D}_{i},\mathbf{Z}_{i},\delta_{i},\Delta_{i}),\ i=1,\dots ,n$. The model is

$$X_i = \mathbf{Z}_i^T\boldsymbol{\theta}_0 + \epsilon_i^X, \qquad D_i = \mathbf{Z}_i^T\boldsymbol{\eta}_0 + \epsilon_i^D, \qquad i = 1,\dots,n,$$

where $\boldsymbol{\theta}_0$ and $\boldsymbol{\eta}_0$ are $k \times 1$ vectors, and $\boldsymbol{\epsilon}_{i}\equiv (\epsilon_{i}^{X},\epsilon_{i}^{D})$ are error terms with an unknown joint distribution $H$. We assume that the model is identifiable only in the upper wedge $X < D$ [4], [17]. The goal is to obtain an unbiased estimator of $\boldsymbol{\alpha} = (\boldsymbol{\eta}^{T}, \boldsymbol{\theta}^{T})^{T}$ without nonparametrically estimating the distribution of $\boldsymbol{\epsilon}_{i}$, $i = 1,\dots,n$. We further assume that, given $\mathbf{Z}$, $C$ and $(X, D)$ are independent, but $X$ and $D$ can be dependent given $\mathbf{Z}$. We now describe the procedures of [13] and [17] in turn.

Since $\tilde{D}$ only depends on independent censoring, a standard rank regression approach is available for estimation [11], [13], [15], [17], [20], [23]. The estimating function for $\boldsymbol{\eta}$, denoted $\mathbf{S}_{n}(\boldsymbol{\eta})$, is based on the residuals $\tilde{D}_{i}^{*}(\boldsymbol{\eta})=\tilde{D}_{i}-\mathbf{Z}_{i}^{T}\boldsymbol{\eta}$, and the estimator of $\boldsymbol{\eta}$ is obtained by solving $\mathbf{S}_{n}(\boldsymbol{\eta}) = 0$.

For estimation of $\boldsymbol{\theta}$, simply replacing $\tilde{D}_{i}-\mathbf{Z}_{i}^{T}\boldsymbol{\eta}$ with $\tilde{X}_{i}-\mathbf{Z}_{i}^{T}\boldsymbol{\theta}$ does not yield unbiased estimation of $\boldsymbol{\theta}$. This is because the cause-specific hazard function for $\tilde{X}_{i}-\mathbf{Z}_{i}^{T}\boldsymbol{\theta}$ depends on $\tilde{D}_{i}-\mathbf{Z}_{i}^{T}\boldsymbol{\theta}$, which violates the model assumption [13]. To fix this problem, many authors use artificial censoring techniques [3], [6], [7], [10], [13], [17].

In [13], a single constant term $g(\boldsymbol{\alpha})$ is proposed so that the estimating equation is unbiased for estimation of $\boldsymbol{\theta}$ in the two-sample case. The form of $g(\boldsymbol{\alpha})$ is $g(\boldsymbol{\alpha})=\max_{1\le i\le n}\{0,\mathbf{Z}_{i}^{T}(\boldsymbol{\theta}-\boldsymbol{\eta})\}$. The estimator proposed in [13] is obtained by solving $\mathbf{U}_{n}^{L}(\boldsymbol{\alpha}) = 0$, where $\mathbf{U}_{n}^{L}$ is built from the transformed quantities in (Eq 1) and $a\wedge b$ denotes the minimum of $a$ and $b$.

In [17], pairwise comparisons of all subjects are proposed so that each subject has a different degree of artificial censoring; the transformations suggested by [17] are those in (Eq 2). The estimator proposed in [17] is obtained by solving $\mathbf{U}_{n}^{P}(\boldsymbol{\alpha})=0$. Note that $X$ and $D$ are not observable, but the transformations (Eq 1) and (Eq 2) can be expressed using observable quantities [7], [13].
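The constant $g(\boldsymbol{\alpha})$ of [13] is easy to evaluate once candidate values of $\boldsymbol{\theta}$ and $\boldsymbol{\eta}$ are available. The following is a minimal sketch only; the function name and the one-row-per-subject layout of `Z` are assumptions made for illustration and do not come from [13].

```python
def artificial_censoring_constant(Z, theta, eta):
    """Return g(alpha) = max_i {0, Z_i^T (theta - eta)}.

    Z is a list of covariate rows (one row of length k per subject);
    theta and eta are length-k coefficient lists. Names are hypothetical.
    """
    diff = [t - e for t, e in zip(theta, eta)]
    # Linear predictor difference Z_i^T (theta - eta) for every subject.
    shifts = [sum(z_ij * d_j for z_ij, d_j in zip(row, diff)) for row in Z]
    # Including 0.0 enforces the lower bound in the definition.
    return max([0.0] + shifts)


# Two-sample case: a single binary covariate.
Z = [[0], [1], [1], [0]]
g_pos = artificial_censoring_constant(Z, theta=[0.5], eta=[0.2])  # theta > eta
g_neg = artificial_censoring_constant(Z, theta=[0.1], eta=[0.4])  # theta < eta
```

In this two-sample illustration the constant is positive only when $\theta$ exceeds $\eta$; otherwise it is zero.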

### Weighted estimator

Given these two estimation procedures, it is natural to compare their efficiencies in terms of standard error. From this point of view, however, neither estimator is superior to the other. Moreover, neither estimator is guaranteed to be optimal with respect to the standard error. It has been argued that the estimator of [17] is more efficient than that of [13] because pairwise comparisons lead to less artificial censoring than in [13]. However, this logic holds only when the estimators are compared in terms of bias and variance across replications in a simulation study. In terms of the standard error of an estimator in a single dataset, the estimator of [17] may not improve on that of [13]. This will be seen in the real data analysis section.

The reason lies in the estimation procedure of [17]. As discussed in [7], for *n* samples, the number of comparisons used for artificial censoring is of order *n* in [13], while in [17] it is of order *n*^{2}. By the definition of *g*_{ij}(** α**), different degrees of artificial censoring are applied to different observations. This may introduce more variation between observations, which makes the standard error larger than that of [13].

Having discussed our data structure and the estimators from [13] and [17], we now describe the estimation procedure proposed in this paper. Let $\widehat{\mathit{\eta}}={({\widehat{\mathit{\eta}}}_{1},\dots ,{\widehat{\mathit{\eta}}}_{k})}^{T}$ be the estimator of *η*_{0}, ${\widehat{\mathit{\theta}}}^{L}={({\widehat{\mathit{\theta}}}_{1}^{L},\dots ,{\widehat{\mathit{\theta}}}_{k}^{L})}^{T}$ be the estimator of *θ*_{0} from [13] and ${\widehat{\mathit{\theta}}}^{P}={({\widehat{\mathit{\theta}}}_{1}^{P},\dots ,{\widehat{\mathit{\theta}}}_{k}^{P})}^{T}$ be the estimator of *θ*_{0} from [17]. Both ${\widehat{\mathit{\theta}}}^{L}$ and ${\widehat{\mathit{\theta}}}^{P}$ are asymptotically unbiased estimators of *θ*_{0}.

We extend the class of estimators that provide consistent estimation of *θ*_{0}. The natural extension of the estimators of [13] and [17] is to consider the collection of estimators that are linear combinations of these two estimators with weights summing to 1. By choosing proper weights, we can expect the variance of the combined estimator to be smaller than that of either ${\widehat{\mathit{\theta}}}^{L}$ or ${\widehat{\mathit{\theta}}}^{P}$.

The goal is to find weights such that the variance of the new estimator is smaller than the minimum of the variances of the estimators of [13] and [17], both of which have good theoretical properties. To obtain the estimator with the smallest variance while retaining these properties, we can use the idea of [25], which was applied to the problem of modeling multivariate failure times.
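To see why a suitable linear combination cannot do worse than either estimator, consider a single coefficient and, for this illustration only, ignore the dependence on $\widehat{\boldsymbol{\eta}}$. Writing $\sigma_{LL}$, $\sigma_{PP}$ and $\sigma_{LP}$ for the variances and covariance of $\widehat{\theta}_m^L$ and $\widehat{\theta}_m^P$ (notation introduced here, not taken from the original), and assuming $\sigma_{LL}+\sigma_{PP}-2\sigma_{LP} > 0$, the variance calculation can be sketched as:

```latex
\begin{align*}
V(c) &= \operatorname{Var}\{c\,\widehat{\theta}_m^{L} + (1-c)\,\widehat{\theta}_m^{P}\}
      = c^{2}\sigma_{LL} + (1-c)^{2}\sigma_{PP} + 2c(1-c)\sigma_{LP}, \\
V'(c) = 0 \;&\Longrightarrow\;
c^{\dagger} = \frac{\sigma_{PP}-\sigma_{LP}}{\sigma_{LL}+\sigma_{PP}-2\sigma_{LP}}, \\
V(c^{\dagger}) &= \frac{\sigma_{LL}\sigma_{PP}-\sigma_{LP}^{2}}{\sigma_{LL}+\sigma_{PP}-2\sigma_{LP}}, \\
\sigma_{LL} - V(c^{\dagger}) &= \frac{(\sigma_{LL}-\sigma_{LP})^{2}}{\sigma_{LL}+\sigma_{PP}-2\sigma_{LP}} \ge 0,
\qquad
\sigma_{PP} - V(c^{\dagger}) = \frac{(\sigma_{PP}-\sigma_{LP})^{2}}{\sigma_{LL}+\sigma_{PP}-2\sigma_{LP}} \ge 0 .
\end{align*}
```

Under these simplifying assumptions, the combined variance is never larger than the smaller of the two individual variances, with equality only when the covariance equals one of the variances.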

In [25], the joint distribution of estimators $\widehat{\mathbf{\gamma}}=\{{\widehat{\gamma}}_{mr}\}$ is considered, where *m* = 1,…, *k* and *r* = 1,…, *R*. Here, *m* indexes the regression parameters and *r* indexes the *r*th event. To obtain an optimal estimator, they applied arguments from [24], which derived a linear combination of test statistics that maximizes power against every alternative hypothesis. Let $\widehat{\mathbf{\text{H}}}$ be the covariance matrix for the estimators $\widehat{\mathbf{\gamma}}$. We then fix *m* and define ${\widehat{\mathbf{\text{H}}}}_{m}$ to be the covariance matrix of ${\widehat{\mathbf{\gamma}}}_{m}=({\widehat{\gamma}}_{m1},\dots ,{\widehat{\gamma}}_{mR})$. It can be obtained from the entire covariance matrix by selecting the part corresponding to $\widehat{\mathbf{\gamma}}$ for *r* = 1,…, *R* under fixed *m*. Now we can define ${\sum}_{r=1}^{R}{d}_{r}{\widehat{\gamma}}_{mr}$, where **d** = (*d*_{1}, *d*_{2},…, *d*_{R}) satisfies ${\sum}_{r=1}^{R}{d}_{r}=1$ [25]. Then $\mathbf{\text{d}}\equiv {({\mathbf{\text{e}}}^{T}{\widehat{\mathbf{\text{H}}}}_{m}^{-1}\mathbf{\text{e}})}^{-1}{\widehat{\mathbf{\text{H}}}}_{m}^{-1}\mathbf{\text{e}}$ is the vector of weights that yields the best estimator among linear combinations of the elements of ${\widehat{\mathbf{\gamma}}}_{m}$, where **e** is a vector consisting of *R* ones [24], [25].
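The weight formula $\mathbf{d} = (\mathbf{e}^{T}\widehat{\mathbf{H}}_{m}^{-1}\mathbf{e})^{-1}\widehat{\mathbf{H}}_{m}^{-1}\mathbf{e}$ can be evaluated numerically in a few lines. The sketch below uses NumPy; the function name and the example covariance matrix are made up for illustration and are not values from [24] or [25].

```python
import numpy as np


def optimal_weights(H):
    """Return d = (e^T H^{-1} e)^{-1} H^{-1} e for an R x R covariance matrix H."""
    H = np.asarray(H, dtype=float)
    e = np.ones(H.shape[0])
    H_inv_e = np.linalg.solve(H, e)  # H^{-1} e without forming the inverse
    return H_inv_e / (e @ H_inv_e)


# Hypothetical covariance matrix of R = 2 estimators of the same coefficient.
H = np.array([[1.0, 0.3],
              [0.3, 2.0]])
d = optimal_weights(H)
combined_variance = d @ H @ d  # variance of the weighted combination
```

With this illustrative matrix the weights sum to one and the combined variance falls below both diagonal entries of `H`, which is the behavior the formula is designed to deliver.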

We now apply the argument in the previous paragraph to our model by considering the joint distribution of $\widehat{\boldsymbol{\beta}}=\{\widehat{\boldsymbol{\eta}}^{T},(\widehat{\boldsymbol{\theta}}^{L})^{T},(\widehat{\boldsymbol{\theta}}^{P})^{T}\}^{T}$. Let $\boldsymbol{\beta}_{0}=(\boldsymbol{\eta}_{0}^{T},\boldsymbol{\theta}_{0}^{T},\boldsymbol{\theta}_{0}^{T})^{T}$ and $\mathbf{G}_{n}(\boldsymbol{\beta})=[\mathbf{S}_{n}^{T}(\boldsymbol{\eta}),\{\mathbf{U}_{n}^{L}(\boldsymbol{\alpha})\}^{T},\{\mathbf{U}_{n}^{P}(\boldsymbol{\alpha})\}^{T}]^{T}$, whose components are the estimating functions for $\boldsymbol{\beta}_{0}$. The strong consistency and asymptotic joint distribution of the three estimators, described in the following theorems, play a crucial role in our methodology.

To prove asymptotic results, several regularity conditions are required. As stated in [7] and [17], define

Let ${\mathbf{\alpha}}_{0}={({\mathit{\eta}}_{0}^{T},{\mathit{\theta}}_{0}^{T})}^{T}$. Define and

From the Appendix in [17], the additional conditions are as follows:

- The parameter space $\mathcal{W}$ is compact, and the true parameter $\boldsymbol{\alpha}_{0}$ is an interior point of $\mathcal{W}$.
- $\boldsymbol{\theta}_{0}$ is the only solution of the estimating equation $E\{n^{-1/2}\mathbf{U}_{n}^{P}(\boldsymbol{\eta}_{0},\boldsymbol{\theta})\}=0$.
- $E(\|\mathbf{Z}\|^{2}) < \infty$, where $\|\cdot\|$ is the Euclidean norm; there exists a positive constant $K$ such that the partial derivatives of $F$ are bounded by $K$; and there exists a positive constant $K^{*}$ such that the marginal probability distribution of $F$ is bounded by $K^{*}$ almost surely.
- $\mathrm{cov}[(\mathbf{Z}_{1}-\mathbf{Z}_{2})\{T_{1}(\mathbf{Z}_{1},\mathbf{Z}_{2})\}^{1/2}]$ and $\mathrm{cov}[(\mathbf{Z}_{1}-\mathbf{Z}_{2})\{T_{2}(\mathbf{Z}_{1},\mathbf{Z}_{2})\}^{1/2}]$ are positive definite.

**Theorem 1**. By conditions of *C*1 − *C*3 in Appendix of [17] and conditions in [27], $\widehat{\mathit{\beta}}$ is (strongly) consistent.

*Proof*. Let $\widehat{\boldsymbol{\beta}}=\{\widehat{\boldsymbol{\eta}}^{T},(\widehat{\boldsymbol{\theta}}^{L})^{T},(\widehat{\boldsymbol{\theta}}^{P})^{T}\}^{T}$. It suffices to show that $\widehat{\boldsymbol{\eta}}$, $\widehat{\boldsymbol{\theta}}^{L}$ and $\widehat{\boldsymbol{\theta}}^{P}$ are each strongly consistent. Let $\boldsymbol{\alpha} = (\boldsymbol{\eta}^{T}, \boldsymbol{\theta}^{T})^{T}$. Note that we have a compact region, say $\mathcal{W}$, and we assume the regularity conditions in [27]. By [27], there exists a nonrandom function $\mathbf{m}_{1}$ such that $\sup_{\boldsymbol{\eta}\in\mathcal{N}_{0}}\|n^{-1/2}\mathbf{S}_{n}(\boldsymbol{\eta})-\mathbf{m}_{1}(\boldsymbol{\eta})\|$ converges to 0 with probability 1, where $\mathcal{N}_{0}$ is a neighborhood of $\boldsymbol{\eta}_{0}$. Thus $\widehat{\boldsymbol{\eta}}$ is strongly consistent. Similarly, there is another nonrandom function $\mathbf{m}_{2}$ such that $\sup_{\boldsymbol{\alpha}\in\mathcal{N}_{1}}\|n^{-1/2}\mathbf{U}_{n}^{L}(\boldsymbol{\alpha})-\mathbf{m}_{2}(\boldsymbol{\alpha})\|$ converges to 0 with probability 1, where $\mathcal{N}_{1}$ is a neighborhood of $\boldsymbol{\alpha}_{0}$. Hence by [27], $\widehat{\boldsymbol{\alpha}}^{L}$ is strongly consistent.

For $\widehat{\boldsymbol{\theta}}^{P}$, by the argument in the Appendix of [17], note that by the U-statistics version of the law of large numbers, for all $\boldsymbol{\alpha}\in\mathcal{W}$, $\|n^{-1/2}\mathbf{U}_{n}^{P}(\boldsymbol{\alpha})-\boldsymbol{\gamma}(\boldsymbol{\alpha})\|$ converges to 0 in probability, where $\boldsymbol{\gamma}(\boldsymbol{\alpha})=E\{n^{-1/2}\mathbf{U}_{n}^{P}(\boldsymbol{\alpha})\}$. We can partition the compact space into $\mathcal{W}_{1},\dots,\mathcal{W}_{k}$ so that $\mathcal{W}\subseteq\cup_{j=1}^{k}\mathcal{W}_{j}$. Clearly, for $\{\boldsymbol{\alpha}^{j}\in\mathcal{W}_{j},\ j=1,\dots,k\}$, $\max_{1\le j\le k}\|n^{-1/2}\mathbf{U}_{n}^{P}(\boldsymbol{\alpha}^{j})-\boldsymbol{\gamma}(\boldsymbol{\alpha}^{j})\|$ converges to 0 in probability. Then, following the Appendix of [17], for all $\epsilon > 0$ there exists $\xi > 0$ such that the bound given there holds, and hence the convergence is uniform over $\mathcal{W}$.

Thus ${\widehat{\mathit{\theta}}}^{P}$ is strongly consistent and clearly, $\widehat{\mathit{\beta}}$ is strongly consistent.

**Theorem 2**. Assuming certain technical conditions from [27] and [17], ${n}^{1/2}(\widehat{\mathit{\beta}}-{\mathit{\beta}}_{0})$ is asymptotically normal with mean zero vector and covariance matrix **Σ**_{0} where ${\mathbf{\Sigma}}_{0}={\mathbf{\Gamma}}_{0}^{-1}{\mathbf{\Omega}}_{0}{\mathbf{\Gamma}}_{0}^{-1},$ where **Γ**_{0} is a nonsingular matrix and **Ω**_{0} is the asymptotic covariance matrix of **G**_{n}(*β*_{0}).

*Proof*. As for consistency, we assume the same regularity conditions as in [27]. Let $\boldsymbol{\beta}_{0}=(\boldsymbol{\eta}_{0}^{T},\boldsymbol{\theta}_{0}^{T},\boldsymbol{\theta}_{0}^{T})^{T}$ and $\mathbf{G}_{n}(\boldsymbol{\beta})=[\mathbf{S}_{n}^{T}(\boldsymbol{\eta}),\{\mathbf{U}_{n}^{L}(\boldsymbol{\alpha})\}^{T},\{\mathbf{U}_{n}^{P}(\boldsymbol{\alpha})\}^{T}]^{T}$. Similar to [13], let $\lambda_{0}^{(1)}(t)$ be the cause-specific hazard function for $\tilde{D}_{i}^{*}(\boldsymbol{\eta})$ and let $\lambda_{0}^{(2)}(t)$ be the cause-specific hazard function for $\tilde{X}_{i}^{*}(\boldsymbol{\alpha})$ under dependent censoring. Define
(3) (4)
Then $M_{1i}$ and $M_{2i}$ are martingales [5], [13]. By adapting a proof in the Appendix in [13], Rebolledo’s martingale central limit theorem [5] gives
where $\bar{\mathbf{Z}}^{(1)}(u)=\lim_{n\to \infty}[\sum_{j=1}^{n}I\{\tilde{D}_{j}^{*}(\boldsymbol{\eta}_{0})\ge u\}\mathbf{Z}_{j}]/[\sum_{j=1}^{n}I\{\tilde{D}_{j}^{*}(\boldsymbol{\eta}_{0})\ge u\}]$ and $\bar{\mathbf{Z}}^{(2)}(u)=\lim_{n\to \infty}[\sum_{j=1}^{n}I\{\tilde{X}_{j}^{*}(\boldsymbol{\alpha}_{0})\ge u\}\mathbf{Z}_{j}]/[\sum_{j=1}^{n}I\{\tilde{X}_{j}^{*}(\boldsymbol{\alpha}_{0})\ge u\}]$. From Appendix of [17],
where $2\mathbf{h}_{1}(\mathbf{v},\boldsymbol{\alpha}_{0}) = 2E[\mathbf{h}(\mathbf{v},\mathbf{V}_{2},\boldsymbol{\alpha}_{0})]$. For $j = 1,\dots,n$, $M_{1j}(t)$ is the martingale associated with $\epsilon_{j}^{D}$, $M_{2j}(t)$ is the martingale associated with $\epsilon_{j}^{X}$, and $\mathbf{h}(\mathbf{V}_{i},\mathbf{V}_{j},\boldsymbol{\alpha}) = (\mathbf{Z}_{i}-\mathbf{Z}_{j})\phi_{ij}(\boldsymbol{\alpha})$ [13], [17]. For $j = 1,\dots,n$, the $j$th summands of these representations are collected into a vector $\mathbf{W}_{j}$. By the Cramér-Wold theorem, $\mathbf{G}_{n}(\boldsymbol{\beta}_{0})$ has an asymptotically normal distribution with mean zero and covariance matrix $\boldsymbol{\Omega}_{0}$.

Note that $E\{n^{-1/2}\mathbf{U}_{n}^{P}(\boldsymbol{\alpha})\}=\boldsymbol{\gamma}(\boldsymbol{\alpha})$. As stated in the Appendix of [17], under conditions $N1$ to $N3$ from [9], there exists an open neighborhood of $\boldsymbol{\alpha}_{0}$, say $K_{0}$, on which (Eq 5) holds. A Taylor series expansion of $\boldsymbol{\gamma}(\boldsymbol{\alpha})$ around $\boldsymbol{\alpha}_{0}$ gives (Eq 6). With these two results, (Eq 5) and (Eq 6), the Appendix of [17] yields (Eq 7). From [27], the expansion (Eq 8) holds for any $\boldsymbol{\eta}$ in a small neighborhood of $\boldsymbol{\eta}_{0}$, where $\mathbf{P}_{0}$ is a $k \times k$ nonsingular matrix. From the Appendix in [13], for $\mathbf{J}_{1n}(\boldsymbol{\alpha})=[\mathbf{S}_{n}^{T}(\boldsymbol{\eta}),\{\mathbf{U}_{n}^{L}(\boldsymbol{\alpha})\}^{T}]^{T}$, the expansion (Eq 9) holds for any $\boldsymbol{\alpha}$ in a small neighborhood of $\boldsymbol{\alpha}_{0}$, where $\mathbf{L}_{10}$ is a $2k \times 2k$ nonsingular matrix and $\mathbf{M}_{0}$ and $\mathbf{H}_{0}$ are $k \times k$ constant matrices. Define $\mathbf{J}_{2n}(\boldsymbol{\alpha})=[\mathbf{S}_{n}^{T}(\boldsymbol{\eta}),\{\mathbf{U}_{n}^{P}(\boldsymbol{\alpha})\}^{T}]^{T}$. Using the expansion from [17], (Eq 10) holds for any $\boldsymbol{\alpha}$ in a small neighborhood of $\boldsymbol{\alpha}_{0}$, where $\mathbf{R}_{0}=\frac{\partial \boldsymbol{\gamma}(\boldsymbol{\alpha})}{\partial \boldsymbol{\eta}}\big|_{\boldsymbol{\alpha}=\boldsymbol{\alpha}_{0}}$ and $\mathbf{V}_{0}=\frac{\partial \boldsymbol{\gamma}(\boldsymbol{\alpha})}{\partial \boldsymbol{\theta}}\big|_{\boldsymbol{\alpha}=\boldsymbol{\alpha}_{0}}$.

Combining the expansions (Eq 8), (Eq 9) and (Eq 10) gives the corresponding expansion of $\mathbf{G}_{n}(\boldsymbol{\beta})$ for any $\boldsymbol{\beta}$ in a small neighborhood of $\boldsymbol{\beta}_{0}$, with $\boldsymbol{\Gamma}_{0}$ the nonsingular matrix of Theorem 2. The results from [9] and [27], along with the consistency of $\widehat{\boldsymbol{\beta}}$, allow this expansion to be applied at $\widehat{\boldsymbol{\beta}}$. By combining the above results with Slutsky’s theorem, $n^{1/2}(\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta}_{0})$ has an asymptotically normal distribution with mean zero and covariance matrix $\boldsymbol{\Gamma}_{0}^{-1}\boldsymbol{\Omega}_{0}\boldsymbol{\Gamma}_{0}^{-1}$.

Theorem 2 implies the asymptotic normality of $\widehat{\boldsymbol{\beta}}$, with $\boldsymbol{\Sigma}_{0}$ having the block form

$$\boldsymbol{\Sigma}_{0} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} & \boldsymbol{\Sigma}_{13} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} & \boldsymbol{\Sigma}_{23} \\ \boldsymbol{\Sigma}_{31} & \boldsymbol{\Sigma}_{32} & \boldsymbol{\Sigma}_{33} \end{pmatrix}.$$

Let $\widehat{\boldsymbol{\Sigma}}$ be the estimate of $\boldsymbol{\Sigma}_{0}$, partitioned in the same way. In this covariance matrix, $\widehat{\boldsymbol{\Sigma}}_{11}$ is a $k \times k$ covariance matrix for $\widehat{\boldsymbol{\eta}}$, $\widehat{\boldsymbol{\Sigma}}_{22}$ is a $k \times k$ covariance matrix for $\widehat{\boldsymbol{\theta}}^{L}$ and $\widehat{\boldsymbol{\Sigma}}_{33}$ is a $k \times k$ covariance matrix for $\widehat{\boldsymbol{\theta}}^{P}$. Moreover, $\widehat{\boldsymbol{\Sigma}}_{12}$ and $\widehat{\boldsymbol{\Sigma}}_{13}$ represent the covariance terms between $\widehat{\boldsymbol{\eta}}$ and $\widehat{\boldsymbol{\theta}}^{L}$ and between $\widehat{\boldsymbol{\eta}}$ and $\widehat{\boldsymbol{\theta}}^{P}$, respectively. Define $\widehat{\boldsymbol{\Sigma}}_{23}$ as the covariance matrix between $\widehat{\boldsymbol{\theta}}^{L}$ and $\widehat{\boldsymbol{\theta}}^{P}$. Clearly, $\widehat{\boldsymbol{\Sigma}}_{21}=\widehat{\boldsymbol{\Sigma}}_{12}^{T}$, $\widehat{\boldsymbol{\Sigma}}_{31}=\widehat{\boldsymbol{\Sigma}}_{13}^{T}$ and $\widehat{\boldsymbol{\Sigma}}_{32}=\widehat{\boldsymbol{\Sigma}}_{23}^{T}$.

The issue remains of how to obtain the matrix corresponding to ${\widehat{\mathbf{\text{H}}}}_{m}^{-1}$ in our context. Note that $\widehat{\mathit{\eta}},{\widehat{\mathit{\theta}}}^{L}$ and ${\widehat{\mathit{\theta}}}^{P}$ are correlated with each other. The structure of the estimating equations implies that ${\widehat{\mathit{\theta}}}^{L}$ and ${\widehat{\mathit{\theta}}}^{P}$ cannot be estimated separately from $\widehat{\mathit{\eta}}$. Thus our matrix corresponding to ${\widehat{\mathbf{\text{H}}}}_{m}^{-1}$ should include the effect of $\widehat{\mathit{\eta}}$. To obtain this matrix, we need to invert the whole covariance matrix and extract the submatrix corresponding to ${\widehat{\mathit{\theta}}}^{L}$ and ${\widehat{\mathit{\theta}}}^{P}$. There are two approaches to obtaining the submatrix.

The first approach is to invert $\widehat{\boldsymbol{\Sigma}}$ and obtain the submatrix of $\widehat{\boldsymbol{\Sigma}}^{-1}$ corresponding to $\widehat{\theta}_{m}^{L}$ and $\widehat{\theta}_{m}^{P}$. Let us denote this matrix by $\widehat{\boldsymbol{\Sigma}}_{m}^{*}$. Clearly, this matrix is $2 \times 2$ and positive definite. Then we can calculate $\widehat{\mathbf{c}}_{m}=(\widehat{c}_{m1},\widehat{c}_{m2})^{T}=(\mathbf{h}^{T}\widehat{\boldsymbol{\Sigma}}_{m}^{*}\mathbf{h})^{-1}\widehat{\boldsymbol{\Sigma}}_{m}^{*}\mathbf{h}$, where $\mathbf{h} = (1, 1)^{T}$. By using the form of the optimal estimator in [25], we obtain the new weighted estimator for the $m$th covariate, say $\widehat{\theta}_{m}^{MWE}$, where

$$\widehat{\theta}_{m}^{MWE}=\widehat{c}_{m1}\widehat{\theta}_{m}^{L}+\widehat{c}_{m2}\widehat{\theta}_{m}^{P}.$$

We can repeat this step for the other regression coefficients to obtain $\widehat{\boldsymbol{\theta}}^{MWE}=(\widehat{\theta}_{1}^{MWE},\dots ,\widehat{\theta}_{k}^{MWE})^{T}$. In this first approach, the weights are generated from $k$ separate $2 \times 2$ matrices. We refer to this first approach as the ‘marginal approach’.

Sometimes it is desirable to consider all covariates at once when obtaining the weights. The second approach is to obtain the submatrix of $\widehat{\boldsymbol{\Sigma}}^{-1}$ corresponding to $\{(\widehat{\boldsymbol{\theta}}^{L})^{T},(\widehat{\boldsymbol{\theta}}^{P})^{T}\}^{T}$. We denote this matrix by $\widehat{\boldsymbol{\Sigma}}^{**}$. This approach differs from the first one in that $\widehat{\boldsymbol{\Sigma}}_{m}^{*}$ consists of the elements corresponding to $\widehat{\theta}_{m}^{L}$ and $\widehat{\theta}_{m}^{P}$ only, whereas $\widehat{\boldsymbol{\Sigma}}^{**}$ contains the elements corresponding to the entire $\{(\widehat{\boldsymbol{\theta}}^{L})^{T},(\widehat{\boldsymbol{\theta}}^{P})^{T}\}^{T}$. This approach therefore reflects the joint effect of $\{(\widehat{\boldsymbol{\theta}}^{L})^{T},(\widehat{\boldsymbol{\theta}}^{P})^{T}\}^{T}$ on our new estimator. Let $\mathbf{E}$ be the $2k \times k$ matrix

$$\mathbf{E} = \begin{pmatrix} \mathbf{I}_{k} \\ \mathbf{I}_{k} \end{pmatrix},$$

where $\mathbf{I}_{k}$ is the $k \times k$ identity matrix. $\mathbf{E}$ is a multivariate extension of $\mathbf{h}$: it is the concatenation of two $k \times k$ identity matrices by row, and the entries equal to 1 in these two identity matrices are the source of the weights for $\widehat{\boldsymbol{\theta}}^{L}$ and $\widehat{\boldsymbol{\theta}}^{P}$. The next step is to construct $\widehat{\mathbf{B}}$, the matrix analogue of $\widehat{\mathbf{c}}_{m}$ with $\mathbf{h}$ replaced by $\mathbf{E}$ and $\widehat{\boldsymbol{\Sigma}}_{m}^{*}$ replaced by $\widehat{\boldsymbol{\Sigma}}^{**}$:

$$\widehat{\mathbf{B}} = \widehat{\boldsymbol{\Sigma}}^{**}\mathbf{E}\,(\mathbf{E}^{T}\widehat{\boldsymbol{\Sigma}}^{**}\mathbf{E})^{-1}.$$

Then $\widehat{\mathbf{B}}$ is the $2k \times k$ matrix with entries $\widehat{c}_{j,l}^{*}$, $j = 1,\dots,2k$, $l = 1,\dots,k$. This matrix is a multivariate extension of $\widehat{\mathbf{c}}_{m}$ from the first approach. It is a contrast matrix in the sense that $\widehat{c}_{m,m}^{*}+\widehat{c}_{(k+m),m}^{*}=1$ for the $m$th regression coefficient of $\widehat{\boldsymbol{\theta}}^{L}$ and $\widehat{\boldsymbol{\theta}}^{P}$. Moreover, $\widehat{c}_{p,m}^{*}+\widehat{c}_{(k+p),m}^{*}=0$ for $p \ne m$, with $p, m = 1,\dots,k$. From this approach, our new estimator $\widehat{\boldsymbol{\theta}}^{JWE}=(\widehat{\theta}_{1}^{JWE},\dots,\widehat{\theta}_{k}^{JWE})^{T}$ has components

$$\widehat{\theta}_{m}^{JWE}=\widehat{c}_{m,m}^{*}\widehat{\theta}_{m}^{L}+\widehat{c}_{k+m,m}^{*}\widehat{\theta}_{m}^{P}, \qquad m = 1,\dots,k.$$

We refer to this approach as the ‘joint approach’.

Now the key step is to obtain $\widehat{\mathbf{\Sigma}}$. We use the resampling approach of [16], which was also used in [13] and [17]. Let ${\widehat{\mathit{\alpha}}}^{L}={\left\{{\widehat{\eta}}^{T},{\left({\widehat{\theta}}^{L}\right)}^{T}\right\}}^{T}$ and ${\widehat{\mathit{\alpha}}}^{P}={\left\{{\widehat{\eta}}^{T},{\left({\widehat{\theta}}^{P}\right)}^{T}\right\}}^{T}$. From [13] and [17], we have
and
Define
A consistent estimator of **Ω**_{0} is
We then solve the estimating equation
(11)
where $Q_{i}$ ($i = 1,\dots,n$) are standard normal random variables. Note that $\mathbf{G}_{n}(\boldsymbol{\beta})=[\mathbf{S}_{n}^{T}(\boldsymbol{\eta}),\{\mathbf{U}_{n}^{L}(\boldsymbol{\alpha})\}^{T},\{\mathbf{U}_{n}^{P}(\boldsymbol{\alpha})\}^{T}]^{T}$ is the joint estimating function for $(\boldsymbol{\eta}_{0}^{T},\boldsymbol{\theta}_{0}^{T},\boldsymbol{\theta}_{0}^{T})^{T}$. By solving this equation repeatedly with fresh draws of the $Q_{i}$, we obtain many realizations of $\widehat{\boldsymbol{\beta}}$, say $\widehat{\boldsymbol{\beta}}^{R}=\{(\widehat{\boldsymbol{\eta}}^{*})^{T},(\widehat{\boldsymbol{\theta}}^{L*})^{T},(\widehat{\boldsymbol{\theta}}^{P*})^{T}\}^{T}$, each of which is a solution of (Eq 11). The next theorem, combined with Theorem 2, justifies the resampling approach for calculating $\widehat{\boldsymbol{\Sigma}}$.

**Theorem 3**. Based on the technical conditions in [16], the unconditional distribution of ${n}^{1/2}(\widehat{\mathit{\beta}}-{\mathit{\beta}}_{0})$ is asymptotically the same as the conditional distribution of ${n}^{1/2}({\widehat{\mathit{\beta}}}^{R}-\widehat{\mathit{\beta}})$ given the observed data, where ${\widehat{\mathit{\beta}}}^{R}$ are realizations of $\widehat{\mathit{\beta}}$ from resampling.

*Proof*. Recall that for any $\boldsymbol{\beta}$ in a small neighborhood of $\boldsymbol{\beta}_{0}$, the expansion (Eq 12) holds. Note that $\widehat{\boldsymbol{\beta}}^{R}$ are solutions of (Eq 11). By conditioning on the observed data, using the expansion (Eq 12) and adapting arguments in [13] and [16], $n^{1/2}(\widehat{\boldsymbol{\beta}}^{R}-\widehat{\boldsymbol{\beta}})$ can be expressed, up to asymptotically negligible terms, through $n^{-1/2}\sum_{i=1}^{n}\mathbf{W}_{i}Q_{i}$. Note that $n^{-1/2}\sum_{i=1}^{n}\mathbf{W}_{i}Q_{i}$ is asymptotically normal with covariance matrix $\boldsymbol{\Omega}_{0}$. Then, given the observed data, the distribution of $n^{1/2}(\widehat{\boldsymbol{\beta}}^{R}-\widehat{\boldsymbol{\beta}})$ is asymptotically normal with covariance matrix $\boldsymbol{\Gamma}_{0}^{-1}\boldsymbol{\Omega}_{0}\boldsymbol{\Gamma}_{0}^{-1}$, which equals $\boldsymbol{\Sigma}_{0}$ in Theorem 2. Hence the conditional distribution of $n^{1/2}(\widehat{\boldsymbol{\beta}}^{R}-\widehat{\boldsymbol{\beta}})$ given the observed data is asymptotically the same as the unconditional distribution of $n^{1/2}(\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta}_{0})$.
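The idea behind this resampling scheme can be illustrated on a toy problem that is much simpler than the AFT estimating equations. The sketch below estimates a sample mean, whose estimating function is $\sum_i (x_i - \beta)$, and perturbs it with standard normal multipliers $Q_i$. The form of the perturbed equation used here, $\sum_i (x_i - \beta) = \sum_i W_i Q_i$ with $W_i = x_i - \widehat{\beta}$, is an assumption made for this illustration in the spirit of [16]; it is not a reproduction of (Eq 11), and all names are hypothetical.

```python
import random
import statistics


def perturbation_resample_mean(x, n_resamples=2000, seed=0):
    """Toy perturbation resampling for the sample mean.

    Returns the point estimate and a list of resampled solutions beta*.
    """
    rng = random.Random(seed)
    n = len(x)
    beta_hat = sum(x) / n                  # solves sum_i (x_i - beta) = 0
    w = [xi - beta_hat for xi in x]        # per-subject contributions W_i
    draws = []
    for _ in range(n_resamples):
        q = [rng.gauss(0.0, 1.0) for _ in range(n)]   # multipliers Q_i
        perturbation = sum(wi * qi for wi, qi in zip(w, q))
        # Solve sum_i (x_i - beta) = perturbation for beta.
        draws.append(beta_hat - perturbation / n)
    return beta_hat, draws


data_rng = random.Random(1)
x = [data_rng.gauss(2.0, 1.5) for _ in range(200)]
beta_hat, draws = perturbation_resample_mean(x, seed=2)
resampled_variance = statistics.pvariance(draws)
```

In this toy setting the spread of the resampled solutions around `beta_hat` approximates the sampling variance of the mean, which mirrors how the realizations $\widehat{\boldsymbol{\beta}}^{R}$ are used to estimate $\widehat{\boldsymbol{\Sigma}}$.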

For $m = 1,\dots,k$ and $j = 1,\dots,M$, let $(\widehat{\eta}_{m}^{*})^{(j)}$, $(\widehat{\theta}_{m}^{L*})^{(j)}$ and $(\widehat{\theta}_{m}^{P*})^{(j)}$ be the $j$th realizations of the elements $\widehat{\eta}_{m}$, $\widehat{\theta}_{m}^{L}$ and $\widehat{\theta}_{m}^{P}$ corresponding to the $m$th covariate, respectively. The algorithm for the first approach is as follows.

1. By resampling, calculate the covariance matrix $\widehat{\boldsymbol{\Sigma}}$ using the realizations $(\widehat{\eta}_{m}^{*})^{(j)}$, $(\widehat{\theta}_{m}^{L*})^{(j)}$ and $(\widehat{\theta}_{m}^{P*})^{(j)}$ ($m = 1,\dots,k$ and $j = 1,\dots,M$).
2. From $\widehat{\boldsymbol{\Sigma}}^{-1}$, obtain the submatrix corresponding to $\widehat{\theta}_{m}^{L}$ and $\widehat{\theta}_{m}^{P}$, say $\widehat{\boldsymbol{\Sigma}}_{m}^{*}$.
3. Calculate $\widehat{\mathbf{c}}_{m}=(\widehat{c}_{m1},\widehat{c}_{m2})^{T}=(\mathbf{h}^{T}\widehat{\boldsymbol{\Sigma}}_{m}^{*}\mathbf{h})^{-1}\widehat{\boldsymbol{\Sigma}}_{m}^{*}\mathbf{h}$, where $\mathbf{h} = (1,1)^{T}$, and obtain the new estimate $\widehat{\theta}_{m}^{MWE}=\widehat{c}_{m1}\widehat{\theta}_{m}^{L}+\widehat{c}_{m2}\widehat{\theta}_{m}^{P}$.
4. Repeat steps 2 and 3 for all covariates.
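The steps of the marginal approach can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the resampled realizations are assumed to be supplied as arrays with one row per resample and one column per covariate.

```python
import numpy as np


def marginal_weights(eta_star, theta_l_star, theta_p_star):
    """Weights (c_m1, c_m2) per covariate from (M, k) arrays of realizations."""
    eta_star = np.asarray(eta_star, dtype=float)
    theta_l_star = np.asarray(theta_l_star, dtype=float)
    theta_p_star = np.asarray(theta_p_star, dtype=float)
    k = theta_l_star.shape[1]
    # Step 1: covariance of the stacked realizations (eta, theta^L, theta^P).
    # Rescaling by n would not change the weights, so it is omitted.
    stacked = np.hstack([eta_star, theta_l_star, theta_p_star])
    sigma_inv = np.linalg.inv(np.cov(stacked, rowvar=False))
    h = np.ones(2)
    weights = np.empty((k, 2))
    for m in range(k):
        # Step 2: 2 x 2 block of the inverse for (theta_m^L, theta_m^P).
        idx = [k + m, 2 * k + m]
        sigma_star_m = sigma_inv[np.ix_(idx, idx)]
        # Step 3: c_m = (h^T S h)^{-1} S h.
        weights[m] = sigma_star_m @ h / (h @ sigma_star_m @ h)
    return weights


def marginal_weighted_estimate(weights, theta_l, theta_p):
    """theta_m^MWE = c_m1 * theta_m^L + c_m2 * theta_m^P for every m."""
    theta_l = np.asarray(theta_l, dtype=float)
    theta_p = np.asarray(theta_p, dtype=float)
    return weights[:, 0] * theta_l + weights[:, 1] * theta_p
```

A call such as `marginal_weighted_estimate(marginal_weights(e, l, p), theta_l, theta_p)` then returns the vector of marginally weighted estimates; each row of the weight array sums to one by construction.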

The algorithm for the second approach is as follows.

1. By resampling, calculate the covariance matrix $\widehat{\boldsymbol{\Sigma}}$ using the realizations $(\widehat{\eta}_{m}^{*})^{(j)}$, $(\widehat{\theta}_{m}^{L*})^{(j)}$ and $(\widehat{\theta}_{m}^{P*})^{(j)}$ ($m = 1,\dots,k$ and $j = 1,\dots,M$).
2. Obtain $\widehat{\boldsymbol{\Sigma}}^{**}$ from $\widehat{\boldsymbol{\Sigma}}^{-1}$.
3. From $\widehat{\boldsymbol{\Sigma}}^{**}$ and $\mathbf{E}$, obtain $\widehat{\mathbf{B}}$.
4. Calculate the new estimate $\widehat{\theta}_{m}^{JWE}=\widehat{c}_{m,m}^{*}\widehat{\theta}_{m}^{L}+\widehat{c}_{k+m,m}^{*}\widehat{\theta}_{m}^{P}$, where $\widehat{c}_{j,l}^{*}$ is the element in the $j$th row and $l$th column of $\widehat{\mathbf{B}}$.
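The joint approach can be sketched in the same way. The code below takes $\widehat{\mathbf{B}} = \widehat{\boldsymbol{\Sigma}}^{**}\mathbf{E}(\mathbf{E}^{T}\widehat{\boldsymbol{\Sigma}}^{**}\mathbf{E})^{-1}$, the only dimensionally consistent matrix analogue of $\widehat{\mathbf{c}}_{m}$; that form, the function names and the array layout (one row per resample) are assumptions of this sketch.

```python
import numpy as np


def joint_weight_matrix(eta_star, theta_l_star, theta_p_star):
    """Return the 2k x k weight matrix B from (M, k) arrays of realizations."""
    eta_star = np.asarray(eta_star, dtype=float)
    theta_l_star = np.asarray(theta_l_star, dtype=float)
    theta_p_star = np.asarray(theta_p_star, dtype=float)
    k = theta_l_star.shape[1]
    # Step 1: covariance of the stacked realizations (eta, theta^L, theta^P).
    stacked = np.hstack([eta_star, theta_l_star, theta_p_star])
    sigma_inv = np.linalg.inv(np.cov(stacked, rowvar=False))
    # Step 2: block of the inverse for the whole (theta^L, theta^P) vector.
    sigma_2star = sigma_inv[k:, k:]
    # Step 3: E stacks two identity matrices; B is the assumed matrix analogue.
    E = np.vstack([np.eye(k), np.eye(k)])
    return sigma_2star @ E @ np.linalg.inv(E.T @ sigma_2star @ E)


def joint_weighted_estimate(B, theta_l, theta_p):
    """Step 4: theta_m^JWE = c*_{m,m} theta_m^L + c*_{k+m,m} theta_m^P."""
    theta_l = np.asarray(theta_l, dtype=float)
    theta_p = np.asarray(theta_p, dtype=float)
    k = theta_l.shape[0]
    return np.diag(B[:k]) * theta_l + np.diag(B[k:]) * theta_p
```

With this choice of $\widehat{\mathbf{B}}$, the product $\mathbf{E}^{T}\widehat{\mathbf{B}}$ equals the identity matrix, so the two weights attached to each coefficient sum to one.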

By Theorem 1 and Theorem 2, our new estimators are consistent and asymptotically normal.

## Model checking

To assess the adequacy of the model, since our weighted estimator is based on the estimators from [13] and [17], it is reasonable to consider the entire processes from [13] and [17]. We therefore extend the model checking technique of [13]. As defined in [13], let ${N}_{1i}(t;\mathit{\eta})={\Delta}_{i}I\{{\tilde{D}}_{i}^{*}(\mathit{\eta})\le t\}$ and ${N}_{2i}(t;\mathbf{\alpha})={\tilde{\delta}}_{i}^{*}(\mathbf{\alpha})I\{{\tilde{X}}_{i}^{*}(\mathbf{\alpha})\le t\}$, where *i* = 1,…, *n*. Then Nelson-Aalen estimators for the event of interest and dependent censoring are
Note that by (Eq 3) and (Eq 4), martingale residuals are defined as
where $\widehat{\mathbf{\alpha}}$ can be either ${\widehat{\mathit{\alpha}}}^{L}={\left\{{\widehat{\eta}}^{T},{\left({\widehat{\theta}}^{L}\right)}^{T}\right\}}^{T}$ or ${\widehat{\mathit{\alpha}}}^{P}={\left\{{\widehat{\eta}}^{T},{\left({\widehat{\theta}}^{P}\right)}^{T}\right\}}^{T}$. Then as defined in [13],

Then, similarly to [13] and [17], we can substitute $\widehat{\mathit{\eta}}$ into **S**_{n}(*s*; ***η***), and ${\widehat{\mathbf{\alpha}}}^{L}$ and ${\widehat{\mathbf{\alpha}}}^{P}$ into **U**_{n}(*t*; ***α***). The components of ${\left[{S}_{n}^{T}(s;\widehat{\eta}),{\left\{{U}_{n}(t;{\widehat{\mathit{\alpha}}}^{L})\right\}}^{T},{\left\{{U}_{n}(t;{\widehat{\mathit{\alpha}}}^{P})\right\}}^{T}\right]}^{T}$ are called the observed score processes with respect to dependent censoring and the event of interest, respectively [7], [13], [17]. We can construct ${\left[{\widehat{S}}_{n}^{T}(s;{\widehat{\eta}}^{*}),{\left\{{\widehat{U}}_{n}^{L}(t;{\widehat{\mathit{\alpha}}}^{L*})\right\}}^{T},{\left\{{\widehat{U}}_{n}^{P}(v;{\widehat{\mathit{\alpha}}}^{P*})\right\}}^{T}\right]}^{T}$ [13], [17], where ${\widehat{\mathit{\alpha}}}^{L*}={\left\{{\left({\widehat{\eta}}^{*}\right)}^{T},{\left({\widehat{\theta}}^{L*}\right)}^{T}\right\}}^{T}$ and ${\widehat{\mathit{\alpha}}}^{P*}={\left\{{\left({\widehat{\eta}}^{*}\right)}^{T},{\left({\widehat{\theta}}^{P*}\right)}^{T}\right\}}^{T}$. These three processes are called bootstrapped processes [7], [13], [17]. We can plot the observed process together with the bootstrapped processes by randomly selecting 20 or 30 observations. Standard tests for goodness of fit can be performed by calculating Kolmogorov-Smirnov type test statistics. The test statistics are defined by ${\mathrm{sup}}_{s}||{\mathbf{\text{S}}}_{n}(s;\widehat{\mathit{\eta}})||,{\mathrm{sup}}_{t}||{\mathbf{\text{U}}}_{n}(t;{\widehat{\mathbf{\alpha}}}^{L})||$, and ${\mathrm{sup}}_{v}||{\mathbf{\text{U}}}_{n}(v;{\widehat{\mathbf{\alpha}}}^{P})||$. To calculate the null distribution of the test statistics, we first obtain the *j*th realizations of the bootstrap samples ${({\widehat{\mathit{\eta}}}^{*})}^{(j)},{({\widehat{\mathit{\theta}}}^{L*})}^{(j)}$ and ${({\widehat{\mathit{\theta}}}^{P*})}^{(j)}$. Then we compute $B{S}_{j}={\mathrm{sup}}_{s}||{\widehat{\mathbf{\text{S}}}}_{n}(s;{({\widehat{\mathit{\eta}}}^{*})}^{(j)})||,B{S}_{j}^{L}={\mathrm{sup}}_{t}||{\widehat{\mathbf{\text{U}}}}_{n}^{L}(t;{({\widehat{\mathbf{\alpha}}}^{L*})}^{(j)})||$ and $B{S}_{j}^{P}={\mathrm{sup}}_{v}||{\widehat{\mathbf{\text{U}}}}_{n}^{P}(v;{({\widehat{\mathbf{\alpha}}}^{P*})}^{(j)})||$, respectively, for *j* = 1,…,*M*, where ${({\widehat{\mathbf{\alpha}}}^{L*})}^{(j)}$ and ${({\widehat{\mathbf{\alpha}}}^{P*})}^{(j)}$ are the *j*th realizations of the bootstrap samples of ${\widehat{\mathbf{\alpha}}}^{L*}$ and ${\widehat{\mathbf{\alpha}}}^{P*}$. The p-values can then be defined as in [10]. If a p-value is smaller than the predetermined level, we reject the null hypothesis, which means that the data are not adequately fit by our bivariate model. Note that a multiple testing problem arises when testing the models for ***θ***. We address this by adjusting the p-values with a Bonferroni correction for two tests.
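As a rough illustration of the testing step, the sketch below computes a supremum-type statistic for an observed process, compares it with the same statistic over bootstrapped processes, and applies a Bonferroni adjustment. The p-value formula is not reproduced in this text, so the empirical exceedance proportion used here is an assumption, and the function names `sup_norm`, `bootstrap_p_value`, and `bonferroni` are hypothetical.

```python
import math


def sup_norm(process):
    """Largest Euclidean norm of a vector-valued process over its time grid.

    `process` is a sequence of vectors, one vector per grid point.
    """
    return max(math.sqrt(sum(v * v for v in point)) for point in process)


def bootstrap_p_value(observed, bootstrapped):
    """Assumed p-value: share of bootstrap statistics at least as large as the observed one."""
    observed_stat = sup_norm(observed)
    boot_stats = [sup_norm(b) for b in bootstrapped]
    exceed = sum(1 for s in boot_stats if s >= observed_stat)
    return exceed / len(boot_stats)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]
```

In practice `bootstrapped` would hold the *M* bootstrapped processes evaluated on the same grid as the observed process, and `bonferroni` would be applied to the two p-values obtained for the models of the event of interest.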

## Simulation Studies

We consider two simulation settings. In the first simulation setting, the errors follow a bivariate normal distribution with mean (0, 1.2), variance 1, and correlation *ρ* = 0 or 0.25. The independent censoring time *C* is generated as *log*(*U**), where *U** has a uniform distribution with minimum value 0 and maximum value 20. The covariate is *Z* ∼ *Bernoulli*(0.5), where *Bernoulli*(0.5) is the Bernoulli distribution with success probability 0.5. We perform 500 simulation runs. Within each simulation run, 500 resampling runs are used to calculate the covariance matrix. Sample sizes are *N* = 150 and *N* = 300. If there is only one covariate in the model, the first and the second methods of weighted estimation are equivalent. Let this common weighted estimator be ${\widehat{\mathit{\theta}}}^{WE}$. We calculate the bias (Bias), mean squared error (MSE), mean of the standard errors (SEE), and 95% coverage rate (Coverage). The coverage is based on the normal approximation. Moreover, to evaluate the robustness of the estimators, we also compute the median of the difference between the estimator and the true value (Dmedian), the median of the squared errors of the estimates (Mediansq), and the median of the standard errors (Sdmedian). Results are summarized in Table 1 and Table 2.
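The summary quantities listed above can be computed from the per-run estimates and standard errors along the following lines. This is a minimal sketch for a single scalar coefficient; the function name `summarize` and the use of 1.959964 as the normal critical value are assumptions for illustration.

```python
from statistics import mean, median


def summarize(estimates, std_errors, true_value, z_crit=1.959964):
    """Summaries over simulation runs for one scalar coefficient."""
    errors = [est - true_value for est in estimates]
    squared = [err * err for err in errors]
    # A run "covers" when the normal-approximation interval contains the truth.
    covered = [abs(err) <= z_crit * se for err, se in zip(errors, std_errors)]
    return {
        "Bias": mean(errors),
        "MSE": mean(squared),
        "SEE": mean(std_errors),
        "Coverage": sum(covered) / len(covered),
        "Dmedian": median(errors),
        "Mediansq": median(squared),
        "Sdmedian": median(std_errors),
    }
```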

In the second simulation setting, we generate a Gamma random variable *ν* with mean *μ* = 1 and variance *σ*^{2} = 0 or 1, then create *W* = exp (*ϵ*^{X}), which is an exponential random variable with rate 4*ν*^{−1}, and exp (*ϵ*^{D}), which is an exponential random variable with rate *ν*^{−1}. Then we generate the time to the event of interest by $\mathrm{exp}(X)=\mathrm{exp}({\mathit{\theta}}_{0}^{T}\mathbf{\text{Z}})\mathrm{exp}({\epsilon}^{X})$ and the time to the dependent censoring by $\mathrm{exp}(D)=\mathrm{exp}({\mathit{\eta}}_{0}^{T}\mathbf{\text{Z}})\mathrm{exp}({\epsilon}^{D})$. (In the notation of this paper, *X*, *D* and *C* are already log-transformed times. Thus, in this context, exp (*X*), exp (*D*) and exp (*C*) are times on the original scale.) The independent censoring time exp (*C*) has a uniform distribution with minimum value 0 and maximum value 20. The true parameter values are *θ*_{0} = (0.5,1)^{T} and *η*_{0} = (1,0.5)^{T}, and the covariates are *Z*_{1} ∼ *U*(0,1), where *U*(0,1) is the uniform distribution with minimum value 0 and maximum value 1, and *Z*_{2} ∼ *Bernoulli*(0.5). We perform 500 simulation runs. Within each simulation run, 500 resampling runs are used to calculate the covariance matrix. Let ${\widehat{\mathit{\theta}}}^{MWE}$ be the weighted estimator obtained by calculating weights marginally (the first proposed method) and let ${\widehat{\mathit{\theta}}}^{JWE}$ be the weighted estimator obtained by calculating weights jointly (the second proposed method). We compute the same quantities as in the first simulation setting. Results are summarized in Table 3 and Table 4.
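The data-generating mechanism of the second setting can be sketched as follows. The Gamma frailty is parameterized with shape 1/*σ*^{2} and scale *σ*^{2} so that its mean is 1 and its variance is *σ*^{2} (with *ν* fixed at 1 when *σ*^{2} = 0). The way observed times and indicators are formed at the end is the usual semicompeting risks observation scheme and is an assumption here, as are the function and key names.

```python
import math
import random

THETA0 = (0.5, 1.0)  # true theta_0 from the text
ETA0 = (1.0, 0.5)    # true eta_0 from the text


def simulate_subject(sigma2, rng):
    """Draw one subject on the original (not log) time scale."""
    z1 = rng.uniform(0.0, 1.0)
    z2 = 1 if rng.random() < 0.5 else 0
    # Gamma frailty with mean 1 and variance sigma2; degenerate at 1 if sigma2 == 0.
    nu = rng.gammavariate(1.0 / sigma2, sigma2) if sigma2 > 0 else 1.0
    err_x = rng.expovariate(4.0 / nu)  # exp(eps^X): exponential with rate 4 / nu
    err_d = rng.expovariate(1.0 / nu)  # exp(eps^D): exponential with rate 1 / nu
    time_x = math.exp(THETA0[0] * z1 + THETA0[1] * z2) * err_x
    time_d = math.exp(ETA0[0] * z1 + ETA0[1] * z2) * err_d
    time_c = rng.uniform(0.0, 20.0)
    # Assumed observation scheme: D is censored by C, X is censored by min(D, C).
    obs_d = min(time_d, time_c)
    obs_x = min(time_x, obs_d)
    return {
        "z": (z1, z2),
        "obs_x": obs_x,
        "delta_x": int(time_x <= obs_d),
        "obs_d": obs_d,
        "delta_d": int(time_d <= time_c),
    }


def simulate_sample(n, sigma2, seed=0):
    """Draw n independent subjects with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_subject(sigma2, rng) for _ in range(n)]
```

Taking logarithms of `obs_x` and `obs_d` would return the data to the log scale used for *X*, *D* and *C* in the rest of the paper.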

In these simulation results, our weighted estimators perform well. In both settings, the bias and mean squared error of our new estimator are similar to those of the estimators of [13] and [17]. The mean and median of the standard errors are smaller than those of the estimators of [13] and [17]. Moreover, the median of the difference between the estimators and the true value and the median of the squared errors imply that our proposed estimator is comparable to the estimators from the original methods.

In the first simulation setting, the difference in standard error between our proposed estimator and ${\widehat{\mathit{\theta}}}^{L}$ is larger than that between ${\widehat{\mathit{\theta}}}^{P}$ and the proposed estimator. In the second simulation setting, the opposite holds. Furthermore, in the first simulation setting, ${\widehat{\mathit{\theta}}}^{P}$ has a lower standard error on average than ${\widehat{\mathit{\theta}}}^{L}$, while ${\widehat{\mathit{\theta}}}^{L}$ is more efficient (with respect to standard error) than ${\widehat{\mathit{\theta}}}^{P}$ in the second simulation setting. These simulation results support our claim that neither estimator is better than the other. Our proposed estimator achieves a smaller standard error while maintaining small bias and correct coverage, except for *N* = 150 with *σ*^{2} = 1 in the second simulation setting. In this scenario, the empirical coverage of the proposed estimators is lower than the nominal 95% level. This is due to the low coverage of ${\widehat{\mathit{\theta}}}^{L}$. Since we combine ${\widehat{\mathit{\theta}}}^{L}$ and ${\widehat{\mathit{\theta}}}^{P}$, if one of them has low coverage, the coverage of the weighted estimator is also likely to fall below the nominal level.
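To see why a linear combination cannot have a larger variance than either component, consider a single coefficient and resampled values of the two estimators. The sketch below uses the textbook minimum-variance weight for combining two correlated estimators with weights summing to one; it is meant only as an illustration under that assumption and is not necessarily the exact resampling scheme of this paper. All function names are hypothetical.

```python
from statistics import mean


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def min_variance_weight(resampled_l, resampled_p):
    """Weight w on the first estimator minimizing Var(w * L + (1 - w) * P)."""
    var_l = sample_cov(resampled_l, resampled_l)
    var_p = sample_cov(resampled_p, resampled_p)
    cov_lp = sample_cov(resampled_l, resampled_p)
    return (var_p - cov_lp) / (var_l + var_p - 2.0 * cov_lp)


def combine(estimate_l, estimate_p, weight):
    """Weighted estimate w * L + (1 - w) * P."""
    return weight * estimate_l + (1.0 - weight) * estimate_p
```

Because the weight minimizes a quadratic in *w* that equals the variance of the first estimator at *w* = 1 and that of the second at *w* = 0, the combined variance is never larger than the smaller of the two. The weight is not constrained to lie in [0, 1].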

## Real data analysis

We applied our method to data from the AIDS Clinical Trial Group (ACTG) Study 364 [1], which was also used in [17]. This multicenter randomized study investigated patients whose plasma RNA level was at least 500 copies per ml. Subjects were assigned to one of three treatments: nelfinavir (NFV), efavirenz (EFV), or a combination of nelfinavir and efavirenz (NFV + EFV). Details about this study can be found in [1].

The two failure times are the time to an HIV RNA level greater than 2000 copies per ml and the time to withdrawal from the study. Let *X* be the first time at which the HIV RNA level is greater than 2000 copies per ml and *D* be the time to withdrawal from the study. We considered four covariates and 194 observations. *Z*_{1} takes value 1 if a patient receives EFV and 0 otherwise. *Z*_{2} takes value 1 if a patient receives NFV + EFV and 0 otherwise. *Z*_{3} is New3TC, which takes value 1 if lamivudine is given to a patient as a new nucleoside analogue therapy and 0 otherwise. *Z*_{4} is the logarithm of the RNA level at the start of the study.

Table 5 and Table 6 show the point estimates and standard errors of $\widehat{\mathit{\eta}}$, ${\widehat{\mathit{\theta}}}^{L}$, ${\widehat{\mathit{\theta}}}^{P}$, ${\widehat{\mathit{\theta}}}^{MWE}$ and ${\widehat{\mathit{\theta}}}^{JWE}$. Our method works well for the models with and without New3TC on all covariates. Some variables are statistically significant based on the weighted estimator while they are not based on [13] or [17]. For example, consider the effect of EFV on the time to first virologic failure. In Table 6, the estimated effect using the approach of [13] is 0.475 with a standard error of 0.250. Using the approach of [17], the estimate is 0.464 with a standard error of 0.281. Because the estimators are asymptotically normal, a Wald test based on [13] or [17] shows that EFV is not statistically significant at the 5% significance level. On the other hand, the weighted estimate using the first approach is 0.471 with a standard error of 0.222. In this case, EFV is statistically significant at the 5% significance level.
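The Wald tests quoted above can be checked directly from the reported estimates and standard errors. The following sketch assumes a two-sided test with a standard normal reference distribution; the function name `wald_test` is hypothetical.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# EFV effect on time to first virologic failure, values quoted from Table 6.
efv_results = {
    "approach of [13]": wald_test(0.475, 0.250),
    "approach of [17]": wald_test(0.464, 0.281),
    "weighted (first approach)": wald_test(0.471, 0.222),
}

for label, (z, p_value) in efv_results.items():
    print(f"{label}: z = {z:.3f}, p = {p_value:.4f}")
```

Only the weighted estimate yields a z statistic above the 1.96 threshold, which matches the conclusion in the text.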

The observed score process with bootstrapped processes for withdrawal from the study with respect to *Z*_{1} is shown in Fig 1. Fig 2 and Fig 3 show the observed score processes and bootstrapped processes for the first virologic failure using ${\widehat{\mathbf{\alpha}}}^{L}$ and ${\widehat{\mathbf{\alpha}}}^{P}$, respectively, with respect to *Z*_{1}. These three plots are based on the model without New3TC. The processes fluctuate around zero, so there is no graphical evidence of lack of fit. The p-value of the lack-of-fit test for withdrawal is 0.952, and the p-values for the first virologic failure using ${\widehat{\mathbf{\alpha}}}^{L}$ and ${\widehat{\mathbf{\alpha}}}^{P}$ are 0.918 and 0.959, respectively. Together with the graphical checks, these p-values indicate that there is no evidence of violation of the model assumptions.

**In Figs 1–3, the thick line is the observed process and the dashed lines are the bootstrapped processes**.

For purposes of interpretation, since *D* represents a standard survival time, $\widehat{\mathit{\eta}}$ can be interpreted as a covariate effect on survival time. However, since the observed time for *X* depends on *D*, interpretation of $\widehat{\mathit{\theta}}$ is difficult. One way to interpret $\widehat{\mathit{\theta}}$ is to assume that *D* does not exist and to interpret the effect of $\widehat{\mathit{\theta}}$ on *X* only. This approach is possible if there exists a reasonable extrapolation mechanism for *X* [18]. However, considering the estimation structure for ***θ***, it is difficult to separate the effect of $\widehat{\mathit{\theta}}$ on *X* from the effect of $\widehat{\mathit{\eta}}$ on *D*.

## Discussion

In this paper, we have proposed optimal estimators based on combinations of the two estimators from [13] and [17]. Our methodology can be extended to the case of recurrent events with dependent censoring, which has been extensively studied [6], [7], [10]. We are currently working on this extension.

Optimality of estimators has been discussed in other contexts. Recently, optimal additive functions based on score functions were proposed in [14]. The main point of that method is to combine unbiased estimating functions. In our case, this would mean combining the estimating equations and obtaining a new solution from the combined estimating equation. Comparing the performance of this solution with that of our proposed estimator is of interest and is left for future research.

Another way of achieving optimality is to use the generalized method of moments estimator [8]. This estimator is based on a linear combination of estimating functions [19]. In this case, the estimating functions have a greater dimension than the parameter vector. The optimality is achieved by the linear combination. It has been shown that the estimator from this linear combination of estimating functions is consistent and asymptotically normal [8]. In the statistics literature, this idea has been applied to generalized estimating equations [19]. The estimating functions proposed by [19] are called quadratic inference functions. Recently, the quadratic inference function has been applied to the Cox model [26].

[8] and [19] derived new estimating functions, while we combined two estimators directly. The idea of the generalized method of moments is very appealing, but the estimating functions of [13] and [17] are nonsmooth. Finding the derivative of the linear combination of the estimating functions, which is a key step in the generalized method of moments, is challenging in our setting because the estimating functions proposed by [13] and [17] are not differentiable. Applying the idea of [8] to the AFT model would be interesting future research.

Our estimating equations involve nonsmooth functions of ***η*** and ***α***. Many studies have used a linear programming approach for estimating ***θ*** [3], [11]. However, this linear programming method is very slow for computing estimators of ***θ***. This approach is therefore very inefficient when used to solve (Eq 11) for estimation of **Σ**. Recently, a derivative-free spectral algorithm for nonlinear equations (DF-SANE) was proposed [12], and it has been shown to perform better than the linear programming method in an example of estimating the parameters of AFT models under independent censoring [21]. However, under dependent censoring, the artificial censoring term leads to numerical instability in estimating parameters and calculating resampled estimators. Moreover, DF-SANE does not converge well under its default tolerance settings [21]. Thus using this algorithm requires changing the tolerance level. Developing efficient numerical algorithms for estimating parameters is an important topic for future research.
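To illustrate what adjusting the tolerance of DF-SANE looks like in practice, the sketch below uses the DF-SANE solver available through SciPy's `scipy.optimize.root` rather than the R package BB referenced in [21]. The system being solved is a smooth toy problem chosen for illustration, not the nonsmooth estimating equations of this paper, and the value of `ftol` is an arbitrary assumption.

```python
import numpy as np
from scipy.optimize import root


def toy_system(x):
    """Toy nonlinear system F(x) = x - cos(x), applied componentwise."""
    return x - np.cos(x)


# Loosen the relative tolerance from SciPy's default (1e-8) and cap the
# number of function evaluations; DF-SANE needs no Jacobian.
solution = root(
    toy_system,
    x0=np.zeros(3),
    method="df-sane",
    options={"ftol": 1e-6, "maxfev": 2000},
)

print(solution.success, solution.x)
```

For estimating equations that are step functions of the parameters, an exact root may not exist, which is one reason a looser tolerance can be necessary.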

## Author Contributions

Conceived and designed the experiments: DG. Performed the experiments: YC DG. Analyzed the data: YC. Contributed reagents/materials/analysis tools: YC DG. Wrote the paper: YC DG.

## References

- 1. Albrecht MA, Bosch RJ, Hammer SM, Liou S-H, Kessler H, Para MF, et al. Nelfinavir, Efavirenz, or both after the failure of Nucleoside treatment of HIV infection. N Engl J Med. 2001; 345: 398–407. pmid:11496850
- 2. Day R, Bryant J, Lefkopoulou M. Adaptation of bivariate frailty models for prediction, with application to biological markers as prognostic indicators. Biometrika. 1997; 84: 45–56.
- 3. Ding AA, Shi G, Wang W, Hsieh J-J. Marginal regression analysis for semi-competing risks data under dependent censoring. Scand J Stat. 2009; 36: 481–500.
- 4. Fine JP, Jiang H, Chappell R. On semi-competing risks data. Biometrika. 2001; 88: 907–919.
- 5. Fleming TR, Harrington DP. Counting Processes and Survival Analysis. 2nd ed. New York: Wiley; 2005.
- 6. Ghosh D, Lin DY. Semiparametric analysis of recurrent events data in the presence of dependent censoring. Biometrics. 2003; 59: 877–885. pmid:14969466
- 7. Ghosh D. Semiparametric analysis of recurrent events: artificial censoring, truncation, pairwise estimation and inference. Lifetime Data Anal. 2010; 16: 509–524. pmid:20063182
- 8. Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982; 50: 1029–1054.
- 9. Honoré BE, Powell JL. Pairwise difference estimators of censored and truncated regression models. J Econom. 1994; 64: 241–278.
- 10. Hsieh J-J, Ding AA, Wang W. Regression analysis for recurrent events data under dependent censoring. Biometrics. 2011; 67: 719–729. pmid:21039394
- 11. Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based inference for the accelerated failure time model. Biometrika. 2003; 90: 341–353.
- 12. La Cruz W, Martínez JM, Raydan M. Spectral residual method without gradient information for solving large-scale nonlinear systems of equations. Mathematics of Computation. 2006; 75: 1429–1448.
- 13. Lin DY, Robins JM, Wei LJ. Comparing two failure time distributions in the presence of dependent censoring. Biometrika. 1996; 83: 381–393.
- 14. Lindsay BG, Yi GY, Sun J. Issues and strategies in the selection of composite likelihoods. Statistica Sinica. 2011; 21: 71–105.
- 15. Louis TA. Nonparametric analysis of an accelerated failure time model. Biometrika. 1981; 68: 381–390.
- 16. Parzen MI, Wei LJ, Ying Z. A resampling method based on pivotal estimating functions. Biometrika. 1994; 81: 341–350.
- 17. Peng L, Fine JP. Rank estimation of accelerated lifetime models with dependent censoring. J Am Stat Assoc. 2006; 101: 1085–1093.
- 18. Prentice RL, Kalbfleisch JD, Peterson AV Jr, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978; 34: 541–554. pmid:373811
- 19. Qu A, Lindsay BG, Li B. Improving generalised estimating equations using quadratic inference functions. Biometrika. 2000; 87: 823–836.
- 20. Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. Ann Stat. 1990; 18: 354–372.
- 21. Varadhan R, Gilbert PD. BB: An R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. J Stat Softw. 2009; 32: 1–26.
- 22. Wang W. Estimating the association parameter for copula models under dependent censoring. J R Stat Soc Series B Stat Methodol. 2003; 65: 257–273.
- 23. Wei LJ, Gail MH. Nonparametric estimation for a scale-change with censored observations. J Am Stat Assoc. 1983; 78: 382–388.
- 24. Wei LJ, Johnson WE. Combining dependent tests with incomplete repeated measurements. Biometrika. 1985; 72: 359–364.
- 25. Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc. 1989; 84: 1065–1073.
- 26. Xue L, Wang L, Qu A. Incorporating correlation for multivariate failure time data when cluster size is large. Biometrics. 2010; 66: 393–404. pmid:19673860
- 27. Ying Z. A large sample study of rank estimation for censored regression data. Ann Stat. 1993; 21: 76–99.