Nonparametric estimation of median survival times with applications to multi-site or multi-center studies

  • Mohammad H. Rahbar,

    Roles Conceptualization, Data curation, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Mohammad.H.Rahbar@uth.tmc.edu

    Affiliations Department of Epidemiology, Human Genetics, and Environmental Sciences, University of Texas School of Public Health at Houston, Houston, TX 77030, United States of America, Division of Clinical and Translational Sciences, Department of Internal Medicine, McGovern Medical School; University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America, Biostatistics/Epidemiology/Research Design component, Center for Clinical and Translational Sciences, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America

  • Sangbum Choi,

    Roles Formal analysis, Software, Visualization, Writing – review & editing

    Affiliation Department of Statistics, Korea University, Seoul, South Korea

  • Chuan Hong,

    Roles Formal analysis, Software

    Affiliation Department of Biostatistics, Harvard University, Boston, MA 02115, United States of America

  • Liang Zhu,

    Roles Data curation, Formal analysis, Software, Validation, Writing – review & editing

    Affiliations Division of Clinical and Translational Sciences, Department of Internal Medicine, McGovern Medical School; University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America, Biostatistics/Epidemiology/Research Design component, Center for Clinical and Translational Sciences, University of Texas Health Science Center at Houston, Houston, TX 77030, United States of America

  • Sangchoon Jeon,

    Roles Data curation, Formal analysis, Validation, Visualization, Writing – review & editing

    Affiliation Division of Acute Care/Health Systems, School of Nursing, Yale University, West Haven, CT 27399, United States of America

  • Joseph C. Gardiner

    Roles Writing – review & editing

    Affiliation Department of Epidemiology and Biostatistics, Michigan State University, Lansing, MI 48824, United States of America

Abstract

We propose a nonparametric shrinkage estimator for the median survival times from several independent samples of right-censored data, which combines the sample and hypothesis information to improve efficiency. We compare the efficiency of the proposed shrinkage estimator to that of the unrestricted and combined estimators through extensive simulation studies. Our results indicate that the performance of these estimators depends on the strength of homogeneity of the medians. When homogeneity holds, the combined estimator is the most efficient. However, it becomes inconsistent when homogeneity fails. The proposed shrinkage estimator, on the other hand, remains efficient. Its efficiency decreases as the survival medians depart from equality, but it is expected to be at least as good as the unrestricted estimator. Our simulation studies also indicate that the proposed shrinkage estimator is robust to moderate levels of censoring. We demonstrate the application of these methods to estimating the median time for trauma patients to receive red blood cells in the Prospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study.

Introduction

Multi-site and multi-center studies have become very popular in clinical and translational sciences in the past two decades. Although multi-site studies allow for increased enrollment rate and improved generalizability to the target population [1], they introduce additional statistical challenges in the study design and analysis of data from these studies. For example, multi-site studies could lead to non-homogeneous sub-samples due to the differences in the study sites. This is particularly relevant to trauma and emergency care research.

Having served as the Data Coordinating Center for the Prospective Observational Multicenter Major Trauma Transfusion (PROMMTT) study [1], we have identified a number of challenges in the analysis of time-to-event data from PROMMTT. For example, in blood transfusion research, time to receive the first unit of red blood cells (RBCs) is one of the important surrogates for how rapidly trauma patients receive blood transfusion. However, such data are often collected from multiple sites. Because trauma centers may have their own guidelines and practices for managing trauma patients, different sites may not only contribute different numbers of patients but also have different distributions of the time to the event of interest, with different levels of censoring and variability. As a result, analysis of data from multi-site studies requires an exploratory step prior to pooling samples from different sites.

In some medical research, particularly in multi-site randomized clinical trials (e.g., for trauma/blood transfusion), when the main outcome is the time to a certain event (e.g., time to receive the first unit of RBCs), complete observations are usually not available for all patients due to death before receiving a fixed number of units of RBCs, which results in right-censored data. [2] introduced nonparametric estimation of the survival curve based on right-censored data. Nonparametric estimation of the mean survival time has been studied by many investigators, including [3, 4] and [5, 6], and [7] provides an extensive discussion of nonparametric estimation of the mean and quantiles of the survival function. [8] introduced a nonparametric procedure (the BC test) for testing the equality of median survival times from k independent samples of right-censored data. [9] introduced an alternative to the BC test, which may perform better than the BC test in certain situations. More recently, to deal with inflated Type I error rates when sample sizes are small, [10] extended Mood’s median test for uncensored data to the setting of survival data.

The main aim of this research is to propose an improved nonparametric method for estimating the median survival time from right-censored data from k independent samples when there is uncertainty regarding the homogeneity of the k population medians. We compare the proposed estimator to two other commonly used estimators, both asymptotically and through extensive simulations. In addition, we demonstrate the application of this method to data from the PROMMTT study ([1, 11]).

The remainder of the article is organized as follows. In Section 2, we propose the estimation strategies for median survival times and make comparisons among different estimators. In Section 3, we present the results from our simulation study comparing the performance of the proposed estimator against the other two commonly used estimators based on mean square errors (MSE). In Section 4, we demonstrate an application of our proposed method to data from PROMMTT ([1, 11]). Section 5 is devoted to concluding remarks with some discussion.

Materials and methods

We consider the nonparametric estimation of median survival times based on right-censored data in the presence of uncertain prior information in a k-independent-sample setting. We assume that $T_{i1}, \cdots, T_{in_i}$, $i = 1, \cdots, k$, are random samples selected from k populations, with $n_i$ observations taken from the i-th population. We refer to $T_{ij}$ as the survival time of the j-th subject in the i-th population. Due to censoring at time $C_{ij}$, the survival time (time-to-event) may not be observable for some subjects. Therefore, for each subject, the data are recorded in the form $(Y_{ij}, \delta_{ij})$, $j = 1, \cdots, n_i$, $i = 1, \cdots, k$, where $Y_{ij} = \min(T_{ij}, C_{ij})$, $\delta_{ij} = I(T_{ij} \le C_{ij})$, and $I(\cdot)$ is the indicator function. We assume that the random variables $T_{ij}$ and $C_{ij}$ are independent with continuous survival distributions $F_i(x) = P(T_{ij} > x)$ and $G_i(y) = P(C_{ij} > y)$, respectively.
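As a small illustration of this data structure, the sketch below builds the observed pairs (Y, δ) from latent survival and censoring times for one site. The function name `observe` and the toy values are hypothetical and are not taken from the study.

```python
def observe(survival_times, censoring_times):
    """Return the observed pairs (Y, delta) for one site.

    Y = min(T, C) is the observed time; delta = 1 if the event was
    observed (T <= C) and 0 if the observation was right-censored.
    """
    data = []
    for t, c in zip(survival_times, censoring_times):
        y = min(t, c)
        delta = 1 if t <= c else 0
        data.append((y, delta))
    return data


# Hypothetical toy values for three subjects at one site.
T = [5.2, 7.1, 6.4]
C = [6.0, 6.5, 9.0]
print(observe(T, C))
```

In practice only (Y, δ) is available to the analyst; T and C appear together here only because the values are simulated.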

2.1 Unrestricted estimator (UE)

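A straightforward estimator of the median-parameter vector $\Theta = (\theta_1, \cdots, \theta_k)^T$ is

$$\hat\Theta^{UE} = (\hat\theta_1, \cdots, \hat\theta_k)^T \qquad (2.1)$$

where $\hat\theta_i = \inf\{t : \hat F_i(t) \le p\}$ with $p = 1/2$, and $\hat F_i$ is the Kaplan-Meier estimator [2] of $F_i$, for $i = 1, \cdots, k$.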

We call this the unrestricted estimator (UE) of Θ. This estimator is usually used when no hypothesis information is available on Θ. For example, in a multi-site study with k sites, one can provide site-specific estimates of the median survival time without combining the data from all sites. The estimators of the k components of the vector are mutually independent.

For each $i = 1, \cdots, k$, $\sqrt{n_i}(\hat\theta_i - \theta_i)$ converges in distribution to a normal random variable with mean 0 and variance $\Psi_i(p)$. Hence, operationally, $\hat\theta_i$ is asymptotically normal with mean $\theta_i$ and variance $\Psi_i(p)/n_i$.
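The site-specific median can be read off the Kaplan-Meier curve as the first observed event time at which the estimated survival probability drops to 1/2 or below. The following is a minimal sketch of that idea, not the authors' implementation; `km_median` is a hypothetical helper that takes the (Y, δ) pairs of one site.

```python
def km_median(data, p=0.5):
    """Kaplan-Meier estimate of the time at which survival first drops to <= p.

    data: list of (y, delta) pairs, delta = 1 for an event, 0 for censoring.
    Returns None when the estimated survival curve never reaches p
    (for example, under very heavy censoring).
    """
    event_times = sorted(set(y for y, d in data if d == 1))
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for y, _ in data if y >= t)          # still under observation at t
        events = sum(1 for y, d in data if y == t and d == 1)
        surv *= 1.0 - events / at_risk                       # product-limit update
        if surv <= p:
            return t
    return None


# Toy example: the censored observation at time 2 leaves the risk set
# without producing a drop in the survival curve.
print(km_median([(1, 1), (2, 0), (3, 1), (4, 1), (5, 1)]))
```

The quadratic-time risk-set count keeps the sketch short; a production implementation would sort once and update the risk set incrementally.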

2.2 Combined estimator (CE)

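Θ can also be estimated by combining the sample and hypothesis information under the assumption of homogeneity of the k medians, given by

$$H_0: \theta_1 = \theta_2 = \cdots = \theta_k = \theta_0 \qquad (2.2)$$

where $\theta_0$ denotes the unknown common median. We can use this additional information together with the sample information to obtain improved estimators. Under the null hypothesis (2.2), we consider the combined/restricted estimator (CE) of Θ defined by

$$\hat\Theta^{CE} = (\hat\theta^{CE}, \cdots, \hat\theta^{CE})^T \qquad (2.3)$$

This estimator is expressed as a linear combination of the $\hat\theta_i$, i.e., $\hat\theta^{CE} = \sum_{i=1}^{k} \omega_{i,n}\hat\theta_i$, where the weights $\omega_{i,n}$ sum to one and depend on the variances of the $\hat\theta_i$, for $i = 1, \cdots, k$. Estimating these variances directly requires an estimate of the density function. However, in order to avoid the difficulty in estimating the density function, we used the bootstrap to estimate the variance of the estimated median, following [9].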
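To make the idea of a weighted linear combination concrete, the sketch below pools site medians with inverse-variance weights. The specific weight form is an assumption made for illustration (the exact weights are not reproduced in this text), and the variance inputs are taken as given, for example from a bootstrap. `combined_estimate` is a hypothetical name.

```python
def combined_estimate(medians, variances):
    """Pool site-specific medians into one common value.

    ASSUMPTION: weights are proportional to 1 / variance, so more precise
    site estimates contribute more. The weights are normalized to sum to one.
    Returns (pooled median, list of weights).
    """
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    pooled = sum(w * m for w, m in zip(weights, medians))
    return pooled, weights


# Hypothetical inputs: two sites, the first estimated three times more precisely.
pooled, weights = combined_estimate([10.0, 20.0], [1.0, 3.0])
print(pooled, weights)
```

Under homogeneity any set of weights summing to one gives a consistent pooled estimate; variance-based weights are the usual choice because they minimize the variance of the pooled value.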

For the preliminary test of $H_0$ in (2.2), we consider the test statistic $\Lambda_n$ of (2.4) and (2.5), which is defined by the normalized distance of $\hat\Theta^{UE}$ from $\hat\Theta^{CE}$.

We assume that $\omega_i = \lim_{n \to \infty} \omega_{i,n}$ is fixed for $i = 1, \cdots, k$, and that the limiting covariance matrix exists and is nonsingular. It is shown that, under the null hypothesis and for large n, $\Lambda_n$ follows the central $\chi^2$ distribution with $k - 1$ degrees of freedom [9]. For given α, the critical value of $\Lambda_n$ may be approximated by $\chi^2_{k-1,\alpha}$, the upper 100α% point of the chi-square distribution with $k - 1$ degrees of freedom. More details can be found in [9].
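The sketch below shows one common way to turn "normalized distance between the site medians and the pooled median" into a statistic: a variance-weighted sum of squared deviations, which is asymptotically chi-square with k − 1 degrees of freedom when the site estimates are independent and asymptotically normal. This quadratic form is an assumption for illustration and may differ in detail from the statistic in [9]; `homogeneity_statistic` and `CHI2_3DF_UPPER_5PCT` are hypothetical names.

```python
# Upper 5% point of the chi-square distribution with 3 degrees of freedom (k = 4 sites).
CHI2_3DF_UPPER_5PCT = 7.815


def homogeneity_statistic(medians, variances):
    """Variance-weighted squared distance of the site medians from their pooled value.

    ASSUMPTION: the pooled value is the inverse-variance weighted mean, and
    'variances' are estimates of Var(median_i), e.g. from a bootstrap.
    """
    inverse = [1.0 / v for v in variances]
    pooled = sum(w * m for w, m in zip(inverse, medians)) / sum(inverse)
    return sum((m - pooled) ** 2 / v for m, v in zip(medians, variances))


# Hypothetical medians and variance estimates for four sites.
lam = homogeneity_statistic([6.1, 5.8, 6.3, 7.9], [0.09, 0.16, 0.09, 0.16])
print(lam, "reject H0" if lam > CHI2_3DF_UPPER_5PCT else "do not reject H0")
```

For a different number of sites or significance level, the critical value would need to be replaced by the corresponding chi-square quantile.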

2.3 Positive-part shrinkage estimator (PP)

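The combined estimator works well only when the null hypothesis (2.2) holds. If the null hypothesis (2.2) is rejected, we propose another estimator based on the James-Stein type shrinkage estimator (SSE) [12], which is defined by

$$\hat\Theta^{S} = \hat\Theta^{CE} + \{1 - (k - 3)\Lambda_n^{-1}\}(\hat\Theta^{UE} - \hat\Theta^{CE}), \quad k \ge 4 \qquad (2.6)$$

The Stein-type estimator in (2.6) is not sensitive to departure from $H_0$ and provides uniform improvement in efficiency over the entire parameter space of Θ. It is, however, not a convex combination of $\hat\Theta^{UE}$ and $\hat\Theta^{CE}$. Also, the shrinkage factor $1 - (k - 3)\Lambda_n^{-1}$ may not remain nonnegative. To avoid this undesirable behavior, we truncate at the positivity boundary by adding an extra term to (2.6), which leads to a convex combination of $\hat\Theta^{UE}$ and $\hat\Theta^{CE}$, namely, the positive-part shrinkage estimator (PP). When $k \ge 4$, the positive-part shrinkage estimator is defined as follows:

$$\hat\Theta^{PP} = \hat\Theta^{CE} + \{1 - (k - 3)\Lambda_n^{-1}\}^{+}(\hat\Theta^{UE} - \hat\Theta^{CE}) \qquad (2.7)$$

where $z^{+} = \max(0, z)$, and $\hat\Theta^{UE}$, $\hat\Theta^{CE}$ and $\Lambda_n$ were defined in Sections 2.1 and 2.2.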
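A positive-part shrinkage estimator of this kind can be sketched in a few lines. The code assumes the usual Stein-type shrinkage factor 1 − (k − 3)/Λ truncated at zero; the function name and the example inputs are hypothetical.

```python
def positive_part_shrinkage(ue, ce, lam):
    """Shrink the unrestricted estimates toward the combined estimate.

    ue:  list of k site-specific (unrestricted) medians
    ce:  list of k copies of the combined median
    lam: value of the homogeneity test statistic

    ASSUMPTION: the shrinkage factor is max(0, 1 - (k - 3) / lam), which
    requires k >= 4. A factor of 0 returns CE; a factor near 1 returns UE.
    """
    k = len(ue)
    if k < 4:
        raise ValueError("the shrinkage factor (k - 3) / lam needs k >= 4")
    factor = max(0.0, 1.0 - (k - 3) / lam) if lam > 0 else 0.0
    return [c + factor * (u - c) for u, c in zip(ue, ce)]


# Hypothetical values: strong evidence of heterogeneity keeps PP close to UE.
print(positive_part_shrinkage([18.0, 55.0, 65.0, 24.0], [20.0] * 4, lam=4.0))
# Weak evidence (lam below k - 3) truncates the factor at zero and returns CE.
print(positive_part_shrinkage([18.0, 55.0, 65.0, 24.0], [20.0] * 4, lam=0.5))
```

Because the truncated factor always lies in [0, 1], each component of the result lies between the corresponding CE and UE values, which is the convex-combination property described above.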

2.4 Comparison of $\hat\Theta^{UE}$, $\hat\Theta^{CE}$ and $\hat\Theta^{PP}$

In this section, we compare the performance of $\hat\Theta^{UE}$, $\hat\Theta^{CE}$ and $\hat\Theta^{PP}$ using the asymptotic distributional quadratic risk function [13]. For an estimator $\Theta^*$, define the weighted quadratic loss function $L(\Theta^*, \Theta) = n(\Theta^* - \Theta)^T W (\Theta^* - \Theta)$, where $W$ is a positive-definite matrix of weights. The expectation of the loss function, $R_0(\Theta^*, \Theta) = E[L(\Theta^*, \Theta)]$, is called the risk function. The performance of the estimators can be evaluated by comparing their risk functions; an estimator with a smaller risk is preferred.

Since the test of (2.2) based on $\Lambda_n$ is consistent for fixed $\Theta \notin H_0$, $\hat\Theta^{PP}$ is asymptotically equivalent to $\hat\Theta^{UE}$ for fixed alternatives, which makes it difficult to compare their performance [14]. Alternatively, we may evaluate the asymptotic performance of each estimator under the following contiguous sequence of alternatives:

$$K_n: \Theta = \Theta_0 + \frac{\varphi}{\sqrt{n}} \qquad (2.8)$$

where $\varphi$ is a fixed vector and $\Theta_0 = (\theta_0, \cdots, \theta_0)^T$. The risk function $R_0(\Theta^*, \Theta) = E[L(\Theta^*, \Theta)]$ can be written as $R_0(\Theta^*, \Theta) = nE[(\Theta^* - \Theta)^T W (\Theta^* - \Theta)] = n\,\mathrm{tr}(W\Gamma^*)$, where $\Gamma^*$ is the covariance matrix of $\Theta^*$. Then, considering the asymptotic distribution of $\sqrt{n}(\Theta^* - \Theta)$, we can define the asymptotic distributional quadratic risk (ADQR) as $R(\Theta^*, \Theta) = \mathrm{tr}(W\Gamma)$, where $\Gamma$ is the asymptotic covariance matrix. To facilitate numerical computation and general discussion, we consider the particular case $W = \Gamma^{-1}$. Then, following arguments similar to those in [14], and defining $\Delta = (J\Omega - I_k)\Theta$ with $J = 1_k 1_k^T$ and $\Omega = \mathrm{diag}(\omega_1, \cdots, \omega_k)$, and $\Delta^* = \Delta^T W \Delta$, we can obtain the ADQRs of the estimators as functions of $\Delta^*$ (Eqs 2.9 to 2.12).

An estimator $\Theta^*$ is said to asymptotically dominate an estimator $\Theta^0$ if $R(\Theta^*, \Theta) \le R(\Theta^0, \Theta)$, i.e., if the ADQR of $\Theta^*$ is smaller for at least some value of Θ and does not exceed that of $\Theta^0$ for any value of Θ. Further, $\Theta^*$ strictly dominates $\Theta^0$ if $R(\Theta^*, \Theta) < R(\Theta^0, \Theta)$ for some $(\Theta, W)$. At $\Delta^* = 0$, that is, under the null, the dominance of the estimators is usually observed as $\hat\Theta^{CE} \succ \hat\Theta^{PP} \succ \hat\Theta^{UE}$, where the notation ≻ stands for dominance in terms of risk performance. For all $\Delta^*$ and $k \ge 4$, $R(\hat\Theta^{PP}, \Theta) \le R(\hat\Theta^{UE}, \Theta)$ is satisfied, that is, $\hat\Theta^{PP}$ asymptotically dominates $\hat\Theta^{UE}$ under local alternatives. Thus, we conclude that $\hat\Theta^{PP}$ performs at least as well as $\hat\Theta^{UE}$ over the entire parameter space induced by $\Delta^*$. The gain in risk over $\hat\Theta^{UE}$ is substantial when $\Delta^*$ is 0 or near 0.

Simulation studies

We conducted extensive simulation studies to examine the performance of the proposed estimators in situations with different degrees of departure from the assumption of homogeneity and censoring rates.

In order to evaluate the effect of departure from the null hypothesis, we generated samples with medians $\Theta_\epsilon = (\theta_0 - 3\epsilon, \theta_0 - \epsilon, \theta_0 + \epsilon, \theta_0 + 3\epsilon)$, varying $\epsilon \ge 0$, from uniform distributions on $(\theta_i - a_i/c_i, \theta_i + a_i/c_i)$, where $a_i, c_i > 0$. Let $d_\epsilon = \|\Theta_\epsilon - \Theta_0\|$, where $\|\cdot\|$ is the Euclidean norm. Various values of $\epsilon \in [0, 1]$ were considered to achieve different $d_\epsilon$. The k samples of censoring variables $\{Y_{1i}, \cdots, Y_{n_i i}\}$ were generated from uniform distributions on $(\theta_i - a_i/c_i, \theta_i + a_i/c_i + \eta)$, where η is a fixed value chosen to achieve a desired level of censoring (e.g., p = 0.3). We set $\Theta_0 = (6, 6, 6, 6)$, $a = (2, 2, 2, 2)$, and $c = (1, 2, 1, 2)$. The simulation procedure was repeated 1000 times for k = 4 independent samples, with a sample size of 100 for each group.
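The data-generating step described above can be sketched as follows. Function names, the seed handling and the example value of η are hypothetical. The closed-form censoring rate in `expected_censoring_rate` is our own derivation for this uniform/uniform setup, P(C < T) = w / (2(w + η)) with w = 2a/c, and is not quoted from the paper.

```python
import random


def simulate_site(theta, a, c, eta, n, rng):
    """Simulate n right-censored observations (Y, delta) for one site.

    Survival times are uniform on (theta - a/c, theta + a/c), so the true
    median is theta; censoring times are uniform on the same interval
    extended by eta on the right.
    """
    half_width = a / c
    data = []
    for _ in range(n):
        t = rng.uniform(theta - half_width, theta + half_width)
        cens = rng.uniform(theta - half_width, theta + half_width + eta)
        data.append((min(t, cens), 1 if t <= cens else 0))
    return data


def simulate_study(theta0, eps, a, c, eta, n, seed=0):
    """Simulate k = 4 sites with medians theta0 + (-3, -1, 1, 3) * eps."""
    rng = random.Random(seed)
    medians = [theta0 + s * eps for s in (-3.0, -1.0, 1.0, 3.0)]
    return [simulate_site(m, ai, ci, eta, n, rng) for m, ai, ci in zip(medians, a, c)]


def expected_censoring_rate(a, c, eta):
    """P(C < T) for this setup (own derivation): w / (2 * (w + eta)), w = 2a/c."""
    width = 2.0 * a / c
    return width / (2.0 * (width + eta))


sites = simulate_study(theta0=6.0, eps=0.5, a=(2, 2, 2, 2), c=(1, 2, 1, 2), eta=1.0, n=100)
print([sum(1 for _, d in s if d == 0) / len(s) for s in sites])  # empirical censoring rates
```

Note that with a single fixed η the implied censoring rate differs between sites whose intervals have different widths, so matching a target rate exactly at every site would require site-specific values of η.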

The simulated REs and the comparative plots are presented in Table 1 and Fig 1. The performance of an estimator $\Theta^*$ was measured by its relative efficiency (RE), which compares its MSE with that of $\hat\Theta^{UE}$ and is defined as $RE(\Theta^*) = MSE(\hat\Theta^{UE})/MSE(\Theta^*)$, where $\Theta^*$ is one of the estimators ($\hat\Theta^{CE}$ or $\hat\Theta^{PP}$) considered in this study. The amount by which an RE exceeds 1 indicates the degree of superiority of the estimator over $\hat\Theta^{UE}$. As highlighted in [9], to avoid the difficulty in estimating the density function, we used bootstrapping to estimate the variance of the estimated median. We also compare the performance of the estimators by the asymptotic distributional quadratic risk described in Section 2.4.
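Given replicated estimates from a simulation, the relative efficiency can be computed as in the sketch below. Taking the MSE of a vector estimator as the squared Euclidean error averaged over replicates is an assumption made for illustration; the function names and the toy replicates are hypothetical.

```python
def mse(estimates, truth):
    """Mean over replicates of the squared Euclidean error of a vector estimate.

    ASSUMPTION: errors are summed over the k components and averaged over
    the simulation replicates.
    """
    total = 0.0
    for est in estimates:
        total += sum((e - t) ** 2 for e, t in zip(est, truth))
    return total / len(estimates)


def relative_efficiency(ue_estimates, other_estimates, truth):
    """RE = MSE(UE) / MSE(other); values above 1 favor the other estimator."""
    return mse(ue_estimates, truth) / mse(other_estimates, truth)


# Toy replicates: the second estimator has half the error of the UE in every
# replicate, so its MSE is a quarter of the UE's and the RE is 4.
truth = [0.0, 0.0]
ue_reps = [[1.0, 1.0], [-1.0, -1.0]]
other_reps = [[0.5, 0.5], [-0.5, -0.5]]
print(relative_efficiency(ue_reps, other_reps, truth))
```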

Table 1. Simulated relative efficiency (RE) for combined (CE) and positive part shrinkage (PP) estimators relative to unrestricted estimator (UE) for different values of ϵ.

https://doi.org/10.1371/journal.pone.0197295.t001

Fig 1. The asymptotic distributional quadratic risk (ADQR) performance of the estimators.

https://doi.org/10.1371/journal.pone.0197295.g001

Table 1 shows that when $d_\epsilon$ is zero or near zero, $\hat\Theta^{CE}$ outperforms all other estimators. However, as $d_\epsilon$ moves away from 0, the risk of $\hat\Theta^{CE}$ becomes unbounded, making it very inefficient. Overall, $\hat\Theta^{PP}$ maintains its superiority over $\hat\Theta^{UE}$ for a wide range of $d_\epsilon$. This clearly suggests that $\hat\Theta^{PP}$ is preferred, as there always remains uncertainty about the level of heterogeneity between survival medians. In situations where the assumed model is grossly wrong, $\hat\Theta^{PP}$ is expected to be at least as good as $\hat\Theta^{UE}$. The asymptotic behavior of the ADQR over $d_\epsilon$ is illustrated in Fig 1 under different significance (alpha) levels. It shows that $\hat\Theta^{CE}$ may have smaller risk, and thus be more efficient than the other estimators, under the null or near it. The ADQR of $\hat\Theta^{CE}$ approaches infinity as $d_\epsilon$ grows. Overall, $\hat\Theta^{PP}$ performs better over the entire parameter space.

Next, we study scenarios with different distributions and censoring rates. In each scenario, samples of size 50 or 100 were generated from each of k = 4 subpopulations. The distributions considered in Tables 2 and 3 are (i) a uniform distribution with medians Θ0 = (6, 6, 6.5, 6), (ii) a log-normal distribution with Θ0 = (403.43, 665.14, 403.43, 665.14), and (iii) an exponential distribution with Θ0 = (6.69, 6.58, 7.0, 6.43). The scenarios considered in Tables 4 and 5 mix distributions across subpopulations: (iv) two uniform and two log-normal distributions with medians Θ0 = (6, 6, 4.48, 7.39), (v) two uniform and two exponential distributions with medians Θ0 = (6, 6, 6.15, 6.15), and (vi) two log-normal and two exponential distributions with medians Θ0 = (6, 6, 6.89, 6.89). For each scenario, simulations were repeated 1000 times under 0%, 30%, and 50% censoring schemes, and the results are summarized in Tables 2 to 5. In Table 6, we also considered scenarios in which different sites had different censoring rates (0% to 70%), using the three mixed-distribution scenarios and a sample size of 50.

Table 2. Mean of 1000 estimated medians and REs for 0%, 30% and 50% censoring rates for uniform, log-normal and exponential distributions when sample size is 50.

https://doi.org/10.1371/journal.pone.0197295.t002

Table 3. Mean of 1000 estimated medians and REs for 0%, 30% and 50% censoring rates for uniform, log-normal and exponential distributions when sample size is 100.

https://doi.org/10.1371/journal.pone.0197295.t003

Table 4. Mean of 1000 estimated medians and REs for 0%, 30% and 50% censoring rates for (2 uniform+2 lognormal), (2 uniform+2 exponential), (2 lognormal +2 exponential) distributions when sample size is 50.

https://doi.org/10.1371/journal.pone.0197295.t004

Table 5. Mean of 1000 estimated medians and REs for 0%, 30% and 50% censoring rates for (2 uniform+2 lognormal), (2 uniform+2 exponential), (2 lognormal +2 exponential) distributions when sample size is 100.

https://doi.org/10.1371/journal.pone.0197295.t005

Table 6. Mean of 1000 estimated medians and REs when censoring rates are different among 4 sites (small censoring rates: 30%, 20%, 10%, 0%; large censoring rates: 70%, 50%, 30%, 10%), for (2 uniform+2 lognormal), (2 uniform+2 exponential), (2 lognormal +2 exponential) distributions when sample size is 50.

https://doi.org/10.1371/journal.pone.0197295.t006

In general, the results in Tables 2 to 6 show that the survival distribution and the censoring rate affect how far the different estimators deviate from the true value. The RE of $\hat\Theta^{CE}$ ranges from 20% to over 300%, which might be considered an approximate measure of departure from the homogeneity of survival medians. Overall, the shrinkage estimator performs at least as well as the UE in all aforementioned scenarios, yielding REs of 100% to 160%. The efficiency gain is more noticeable at higher censoring rates.

Application to the PROMMTT study

The PROMMTT study was a multi-site prospective observational cohort study of severely injured, transfused trauma patients, conducted at 10 level 1 trauma centers in the United States [1, 11]. The original objectives of the PROMMTT study were to accurately describe when blood components were infused and to assess the association between in-hospital mortality and the timing and amount of blood products. Understanding current blood product usage patterns and their impact on patient outcomes among a severely injured and substantially hemorrhaging cohort is critically important.

In our analysis, we applied our method to 698 patients from four major medical centers to examine differences in their median times to receiving the first unit of RBC infusion. The numbers of patients at the four sites are 303, 137, 133, and 125. The median ages of patients are 34, 37, 34, and 41 years. The percentages of male patients are 73.27%, 77.37%, 76.69%, and 75.20%. Around 61.06%, 84.67%, 54.26%, and 96.77% of patients are Caucasian at the four sites, respectively. Since no patients dropped out of the study before the first unit of RBC infusion, the censoring rate for each site is zero.

We apply the proposed method to these data in two steps. First, we formally test homogeneity of the medians using the test for randomly right-censored data. Second, based on the result of the homogeneity test, we determine whether the unrestricted estimator (UE), the combined estimator (CE), or the positive-part shrinkage estimator (PP) should be used to obtain a more efficient estimate. The primary outcome is the time to the first unit of RBC infusion, measured as the number of minutes from ED admission. This outcome may reflect the patient’s status and may differ across medical institutions because the management of severely hemorrhaging patients may differ. In these data, the UE median times to the first unit of RBC infusion are 18, 55, 65, and 24 minutes for the four sites, indicating potential heterogeneity of survival times across the sites. Fig 2 shows the estimated survival curves of time to the first unit of RBCs for each site, along with p-values from the test of (2.2) and the log-rank test. It shows that the median times are significantly different, suggesting that caution should be exercised when merging data sets from different sites. All of this motivates us to apply the proposed positive-part shrinkage estimator (PP).

Fig 2. The survival curves of the four Kaplan-Meier (KM) estimates for time to receiving the first unit of RBC from site 1 (n1 = 303, solid line), site 2 (n2 = 137, dashed), site 3 (n3 = 133, dotted), and site 4 (n4 = 125, dot-dashed), respectively.

https://doi.org/10.1371/journal.pone.0197295.g002

Table 7 summarizes the estimated median times and their 95% confidence intervals, based on bootstrap variance, for each site under the PP method, along with comparisons to the CE and UE methods. The PP median times are 18.08, 54, 63.71, and 23.91 minutes, compared to the CE median time of 20.82 minutes at all four sites. The CE median times are unreliable, since homogeneity of the median times to first RBC infusion is rejected (p-value < 0.0001). The PP median times are almost identical to the UE median times, with slightly narrower confidence intervals than the UE at sites 2, 3, and 4, indicating a slight efficiency gain.
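As an informal consistency check on the reported point estimates (not a computation taken from the paper), one can back out the shrinkage factor implied at each site if PP is a convex combination of the form PP = CE + factor × (UE − CE). The values below are the rounded medians quoted in the text, so the factors are only approximate.

```python
# Rounded point estimates quoted in the text (minutes).
ue = [18.0, 55.0, 65.0, 24.0]      # unrestricted estimates, sites 1 to 4
ce = 20.82                         # combined estimate, common to all sites
pp = [18.08, 54.0, 63.71, 23.91]   # positive-part shrinkage estimates

# Implied factor per site under PP = CE + factor * (UE - CE).
factors = [(p - ce) / (u - ce) for p, u in zip(pp, ue)]
print([round(f, 3) for f in factors])
```

All four implied factors are close to 0.97, which is what a single common shrinkage factor would produce up to rounding. If that factor has the usual form 1 − (k − 3)/Λ with k = 4, a value near 0.97 would correspond to a test statistic of roughly 34 to 35, in line with the strong rejection of homogeneity reported above.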

Table 7. Estimated median time to receive the first unit of red blood cell (RBC) infusion in PROMMTT study by study sites.

https://doi.org/10.1371/journal.pone.0197295.t007

Concluding remarks

In this paper, in addition to the unrestricted estimator and the simple combined estimator, we have presented a nonparametric shrinkage approach for estimating the median survival vector in a k-sample problem. Beyond the asymptotic comparison based on ADQR, extensive simulations were carried out to assess the performance of these estimators under various scenarios, with varying levels of censoring and different degrees of departure from homogeneity of the survival medians.

The performance of the combined estimator depends heavily on the strength of homogeneity. When homogeneity holds, $\hat\Theta^{CE}$ is more efficient than $\hat\Theta^{UE}$ and $\hat\Theta^{PP}$. However, $\hat\Theta^{CE}$ becomes inconsistent, and its efficiency decreases significantly, when homogeneity fails. On the other hand, $\hat\Theta^{PP}$ seems to be robust to non-homogeneity and to different levels of censoring. Although its relative efficiency against $\hat\Theta^{UE}$ decreases as we deviate from equality of the survival medians, it remains greater than or equal to 1. As with any other shrinkage estimation procedure, there is a bias-variance tradeoff for $\hat\Theta^{PP}$.

The proposed procedures have applications in epidemiologic and health care research. For example, when estimating the survival median based on data from a multi-site study, one always faces the challenge of whether or not to pool data from all sites. In this study, using PROMMTT data, we have demonstrated the utility of the various estimators as well as how one can decide on the most appropriate estimation procedure. On the other hand, if the median times are similar across sites, shrinkage-type methods may yield greater efficiency, depending on the distribution of the event time.

Supporting information

S1 File. Revised PROMMTT publication committee guidelines.

https://doi.org/10.1371/journal.pone.0197295.s001

(DOCX)

S1 Dataset. R code for generating simulated data and analysis.

https://doi.org/10.1371/journal.pone.0197295.s002

(ZIP)

Acknowledgments

This research is co-funded by the National Heart, Lung and Blood Institute (NHLBI R21 HL109479) and by the U.S. Army Medical Research and Materiel Command (subcontract W81XWH-08-C-0712), awarded to The University of Texas Health Science Center at Houston (UTHealth). This research is also partly supported by the NIH Centers for Translational Science Award (NIH CTSA) grant (UL1 RR024148), awarded to the Center for Clinical and Translational Sciences (CCTS) in UTHealth by the National Center for Research Resources (NCRR) and its renewal (UL1 TR000371) by the National Center for Advancing Translational Sciences (NCATS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NHLBI, NCATS or the U.S. Army.

References

  1. Rahbar MH, Fox EE, del Junco DJ, Cotton BA, Podbielski JM, Matijevic N et al. Coordination and management of multicenter clinical studies in trauma: Experience from the PRospective Observational Multicenter Major Trauma Transfusion (PROMMTT) Study. Resuscitation. 2012b; 83, 459–464.
  2. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958; 53, 457–481.
  3. Gill R. Large sample behavior of the product-limit estimator on the whole line. The Annals of Statistics. 1983; 11, 49–58.
  4. Rahbar MH. Sequential estimation of functionals of the survival curve under random censorship. Sequential Analysis. 1990; 9(2), 137–150.
  5. Rahbar MH, Gardiner JC. Nonparametric estimation of regression parameters using censored data with a discrete covariate. Statistics and Probability Letters. 1995; 24, 13–20.
  6. Rahbar MH, Gardiner JC. Nonparametric modeling of the mean survival time in a multi-factor design based on randomly right-censored data. Biometrical Journal. 2004; 46, 497–502.
  7. Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data, 2nd Edition. 2003; Springer-Verlag.
  8. Brookmeyer R, Crowley J. A confidence interval for the median survival time. Biometrics. 1982; 38, 29–41.
  9. Rahbar MH, Chen Z, Jeon S, Gardiner JC, Ning J. A nonparametric test for equality of survival medians. Statistics in Medicine. 2012a; 31, 844–854.
  10. Chen Z. Extension of Mood’s median test for survival data. Statistics & Probability Letters. 2014; 95, 77–84.
  11. Holcomb JB, del Junco DJ, Fox EE, Wade CE, Cohen MJ, Schreiber MA et al. The prospective, observational, multicenter, major trauma transfusion (PROMMTT) study: comparative effectiveness of a time-varying treatment with competing risks. Journal of the American Medical Association, Surgery. 2013; 148, 127–36.
  12. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceeding of the Third Berkeley Symposium on Mathematical Statistics and Probability. 1956; University of California Press, volume 1, 197–206.
  13. Sen PK. On the asymptotic distributional risk shrinkage and preliminary test version of the mean of a multivariate normal distribution. Sankhya. 1986; 48, 354–371.
  14. Ahmed SE. Simultaneous estimation of coefficient of variations. Journal of Statistical Planning and Inference. 1993; New York: Springer-Verlag.