BioPETsurv: Methodology and open source software to evaluate biomarkers for prognostic enrichment of time-to-event clinical trials

  • Si Cheng,

    Roles Formal analysis, Methodology, Software, Visualization, Writing – review & editing

    Affiliation Department of Biostatistics, University of Washington, Seattle, Washington, United States of America

  • Kathleen F. Kerr,

    Roles Conceptualization, Methodology, Supervision, Writing – original draft

    Affiliation Department of Biostatistics, University of Washington, Seattle, Washington, United States of America

  • Heather Thiessen-Philbrook,

    Roles Data curation, Project administration, Writing – review & editing

    Affiliation Division of Nephrology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America

  • Steven G. Coca,

    Roles Writing – review & editing

    Affiliation Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America

  • Chirag R. Parikh

    Roles Conceptualization, Funding acquisition, Writing – review & editing

    Affiliation Division of Nephrology, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America


Biomarkers can be used to enrich a clinical trial for patients at higher risk for an outcome, a strategy termed "prognostic enrichment." Methodology is needed to evaluate biomarkers for prognostic enrichment of trials with time-to-event endpoints such as survival. Key considerations when evaluating prognostic enrichment include: clinical trial sample size; the number of patients one must screen to enroll the trial; and total patient screening costs and total per-patient trial costs. The Biomarker Prognostic Enrichment Tool for Survival Outcomes (BioPETsurv) is a suite of methods for estimating these elements to evaluate a prognostic enrichment biomarker and/or plan a prognostically enriched clinical trial with a time-to-event primary endpoint. BioPETsurv allows investigators to analyze data on a candidate biomarker and potentially censored survival times. Alternatively, BioPETsurv can simulate data to match a particular clinical setting. BioPETsurv's data simulator enables investigators to explore the potential utility of a prognostic enrichment biomarker for their clinical setting. Results demonstrate that both modestly prognostic and strongly prognostic biomarkers can improve trial metrics such as reducing sample size or trial costs. In addition to the quantitative analysis provided by BioPETsurv, investigators should consider the generalizability of trial results and evaluate the ethics of trial eligibility criteria. BioPETsurv is freely available as a package for the R statistical computing platform, and as a webtool at


Biomarkers are used for various purposes across research and clinical contexts. In a clinical trial of an intervention intended to prevent or delay some unwanted clinical event, a biomarker may be useful for “prognostic enrichment” of the trial [1–5]. A prognostically enriched trial uses a biomarker to enroll only patients at relatively higher risk of the unwanted clinical event. Since study power depends on observing events, running a trial in an enriched study population can allow for a smaller trial compared to an unenriched trial [6, 7]. Moreover, it may be more ethically acceptable to test an intervention only on patients at high risk for the clinical event, and ethically preferable to test on a smaller study sample. Prognostic enrichment can produce greater efficiency in evaluating new interventions, potentially benefiting patients, sponsors, and public health.

There is a substantial literature on biomarkers that are predictive of treatment efficacy [8–12], also referred to as treatment-selection biomarkers [13–16]. In contrast, little has been written about evaluating biomarkers for prognostic enrichment [1, 2]. As noted by Temple [1], prognostic enrichment is well-established in cardiovascular disease, where it is common that interventions are first studied in individuals who are at high risk. CONSENSUS, the first trial of enalapril, enrolled only very high-risk patients (6-month mortality of 44%). CONSENSUS demonstrated efficacy of enalapril with only 253 patients. Subsequent trials in groups with less severe disease needed to be much larger.

In nephrology, a trial for patients with autosomal dominant polycystic kidney disease (ADPKD) enriched for those at greater risk of a substantial decline in renal function [17]. Total Kidney Volume (TKV), measured at baseline, was used in combination with patient age and estimated glomerular filtration rate (eGFR) to identify high-risk patients. Without TKV, it was determined that 13 patients would need to be screened to enroll 11 patients to observe one event. With TKV, 25 patients would need to be screened to enroll 9 patients and observe one event. The FDA qualified TKV as a prognostic biomarker for use in clinical trials for ADPKD on August 31, 2015 [18]. The PRIORITY trial in patients with type 2 diabetes enriched for patients at high risk of the primary endpoint, confirmed microalbuminuria, which occurred in 28% of participants classified as high-risk and only 9% of those classified as low-risk [19]. Although the trial did not establish that spironolactone is efficacious for the primary endpoint, without enrichment a sample size 3–4 times as large would have been needed and many more patients would have been exposed to a therapy that has side effects.

Despite prognostic enrichment being well-established in cardiology and employed in other clinical areas, little has been written about how to evaluate a biomarker for prognostic enrichment or to consider the trade-offs of an enriched vs. unenriched trial [4, 6]. For trials with a binary primary outcome, our group previously published the Biomarker Prognostic Enrichment Tool (BioPET) [7]. We identified key considerations for evaluating a biomarker for prognostic enrichment, including: clinical trial sample size; number of patients to screen to enroll the trial; total patient screening costs and the total of per-patient costs for patients in the trial. BioPET includes methods and graphical devices to evaluate a biomarker on these dimensions for trials with a binary outcome, but cannot be used for trials with a time-to-event outcome such as survival. Compared to trials with binary outcomes, trials with time-to-event outcomes can utilize more information in the data and accommodate the partial information available in censored outcomes. This article describes new methods and open source software, BioPETsurv, for such trials.

As a motivating example, consider the population of patients with a hospitalized episode of acute kidney injury [20] and a hypothetical intervention intended to prevent or delay the onset of chronic kidney disease. A randomized trial will compare the hazard for chronic kidney disease in a treatment group and a control group. As a proof-of-principle illustration, in this article we use synthetic data that mimic an existing cohort [20] to illustrate BioPETsurv for prognostic enrichment in this setting (Example 1).

BioPETsurv accommodates two trial designs. The first design is a fixed-duration trial, in which the observation period is the same for all patients. The second design has an accrual period plus a follow-up period. For example, there may be a 1-year accrual period and a 3-year follow-up period, so that the observation period varies between 3 and 4 years for study participants.

BioPETsurv can be used to evaluate a biomarker and (possibly right-censored) survival data on a sample of patients. Alternatively, investigators can specify some essential features such as the event rate without enrichment and the prognostic capacity of the biomarker in terms of a hazard ratio. BioPETsurv can simulate biomarker and survival data matching these specifications, allowing investigators to explore prognostic enrichment for their clinical setting.

In this article, “biomarker” can refer to either a single measured characteristic or a “composite biomarker” [2] combining multiple prognostic factors [7]. For simplicity, we use "survival" for any time-to-event variable.


Without loss of generality, assume that patients with higher levels of the biomarker tend to experience the unwanted clinical outcome sooner. For a binary outcome, the area under the ROC curve (AUC) summarizes the discrimination performance of a biomarker. For a survival outcome, BioPETsurv displays the Kaplan-Meier survival curves for the entire patient population and for enriched subsets.

A continuous biomarker can, in principle, be used for a low or high level of enrichment of a trial. The level of enrichment is the threshold (percentile in the biomarker) above which patients are eligible for the trial. For example, excluding patients from the trial below the 10th percentile in the biomarker would be a low level of enrichment; requiring patients in the trial to be in the top quartile would be a high level of enrichment. Based on the level of enrichment, the prognostic strength of the biomarker, and the length of the trial, BioPETsurv estimates the expected event rate absent intervention. The expected event rate together with statistical testing specifications (e.g., power) and the treatment effect to detect determine the trial sample size. The total number of patients screened to enroll the trial depends on the trial sample size and the level of enrichment. For example, a trial with a 50% level of enrichment requires, on average, 2 patients to be evaluated to identify one eligible for the trial. Under the assumption that patients express interest in enrolling in the trial at a constant rate over time, "total number of patients screened" is a proxy for the calendar time to enroll the trial [7].

For cost analysis, BioPETsurv allows the cost for a patient in the trial to be either constant, or depend on the time the patient is in the trial before the primary endpoint. The latter may be realistic if the endpoint is death. The cost of screening, such as assay costs or patient work-up, must also be specified. Based on these user-specified costs, BioPETsurv calculates total trial cost for each enrichment level.
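The time-dependent cost option can be illustrated with a minimal Python sketch. It is not the BioPETsurv implementation: the function names are hypothetical, the expected time in the trial is taken to be the area under a single survival curve up to the end of the trial (applied to both arms for simplicity), and screening is charged once per screened patient.

```python
import math


def expected_months_in_trial(survival, horizon, steps=1000):
    """Area under the survival curve on [0, horizon] (trapezoid rule).

    `survival` is any callable t -> S(t); the area is the expected time a
    patient spends in the trial before the endpoint or the end of the trial.
    """
    h = horizon / steps
    total = 0.5 * (survival(0.0) + survival(horizon))
    for i in range(1, steps):
        total += survival(i * h)
    return total * h


def total_cost_time_dependent(n_total, level, monthly_cost, screen_cost,
                              survival, horizon):
    """Total cost when a patient costs `monthly_cost` per month in the trial.

    Assumption: the same survival curve is used for both arms, and
    n_total / (1 - level) patients are screened at `screen_cost` each.
    """
    per_patient = monthly_cost * expected_months_in_trial(survival, horizon)
    n_screened = n_total / (1.0 - level)
    return n_total * per_patient + n_screened * screen_cost


# Illustration with an assumed exponential survival curve (hazard 0.005/month)
surv = lambda t: math.exp(-0.005 * t)
months = expected_months_in_trial(surv, horizon=48)
cost = total_cost_time_dependent(1000, 0.5, 100, 300, surv, horizon=48)
```

With a constant per-patient cost, the first term simply becomes the sample size times that cost.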

A key element in prognostic enrichment is the time-specific event rate by the end of the trial in enriched subgroups, which must be estimated. This can be done using Kaplan-Meier methods in subgroups. Alternatively, the nearest neighbor estimation method [4] allows the censoring process to depend on the biomarker and guarantees monotone estimated Receiver Operating Characteristic curves for time-specified outcomes. BioPETsurv offers both methods for fixed-duration trials and uses Kaplan-Meier methods for trials with an accrual period plus a follow-up period.
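As a hedged illustration of the Kaplan-Meier option, the sketch below estimates the event rate by a given time among patients at or above a biomarker quantile. The function names and the simple empirical-quantile cutoff are assumptions made for this example, not BioPETsurv's API; the nearest neighbor estimator is not shown.

```python
def kaplan_meier_survival(times, events, horizon):
    """Kaplan-Meier estimate of S(horizon) from right-censored data.

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group tied times; patients censored at t still count as at risk at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= removed
    return surv


def enriched_event_rate(biomarker, times, events, level, horizon):
    """Event rate by `horizon` among patients at or above the `level` quantile."""
    ranked = sorted(biomarker)
    n = len(ranked)
    cutoff = ranked[min(int(level * n), n - 1)]  # simple empirical quantile
    keep = [j for j, x in enumerate(biomarker) if x >= cutoff]
    s = kaplan_meier_survival([times[j] for j in keep],
                              [events[j] for j in keep], horizon)
    return 1.0 - s
```

Calling `enriched_event_rate` over a grid of enrichment levels gives the kind of event-rate curve that the later sample size and cost calculations take as input.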

Fixed-duration trials

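Given type I error rate α, power 1-β, and treatment hazard ratio HR, the number of events needed [21] is $N_o = 4(z_{1-\alpha/2} + z_{1-\beta})^2 / (\log HR)^2$. For a given enrichment level and trial length, let $\hat{S}$ be estimated survival; the event rate is $1 - \hat{S}$ in the control arm and $1 - \hat{S}^{HR}$ in the treatment arm. Let $n$ be the sample size in one arm of a trial planned to have equal sample size in each arm. Then $n(1 - \hat{S}) + n(1 - \hat{S}^{HR}) = N_o$, so total N is $N = 2n = 2N_o / (2 - \hat{S} - \hat{S}^{HR})$. Let $C_1$ be the cost for a patient in the trial and $C_2$ the cost of screening (such as assay cost). For enrichment at quantile t (patients with biomarker below quantile t are excluded), total cost is $N C_1 + N C_2 / (1 - t)$. Let $g(\hat{S}) = 2 / (2 - \hat{S} - \hat{S}^{HR})$, so that $N = N_o\, g(\hat{S})$. We calculate the standard deviation (SD) from the delta method, $SD(N) \approx N_o\, |g'(\hat{S})|\, SD(\hat{S})$, and $g'(\hat{S}) = 2(1 + HR\, \hat{S}^{HR-1}) / (2 - \hat{S} - \hat{S}^{HR})^2$. We treat $N_o$, which comes from a standard sample size formula, as fixed; variability comes from $\hat{S}$.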
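The fixed-duration calculation can be sketched in Python under stated assumptions: Schoenfeld's formula for the number of events with equal allocation, proportional hazards so that treatment-arm survival is the control-arm survival raised to the power HR, and a screening charge for every screened patient. The function names are hypothetical and the code is an illustration rather than the package's implementation.

```python
from math import log
from statistics import NormalDist


def events_needed(hr, alpha=0.05, power=0.9):
    """Schoenfeld's number of events for a two-sided test, 1:1 allocation."""
    z = NormalDist().inv_cdf
    return 4.0 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hr) ** 2


def trial_metrics(surv, level, hr, cost_trial, cost_screen,
                  alpha=0.05, power=0.9):
    """Sample size, screening total and cost for one enrichment level.

    surv  : estimated survival at the end of the trial in the enriched group
    level : enrichment level t in [0, 1); patients below quantile t are excluded
    """
    d = events_needed(hr, alpha, power)
    rate_control = 1.0 - surv
    rate_treat = 1.0 - surv ** hr          # proportional hazards assumption
    n_total = 2.0 * d / (rate_control + rate_treat)
    n_screen = n_total / (1.0 - level)     # on average 1/(1-t) screened per enrollee
    cost = n_total * cost_trial + n_screen * cost_screen
    return {"events": d, "n_total": n_total, "n_screen": n_screen, "cost": cost}


# Illustrative inputs only: 82% survival in the enriched group, 50% enrichment
m = trial_metrics(surv=0.82, level=0.5, hr=0.8, cost_trial=4000, cost_screen=500)
```

If an unenriched trial requires no biomarker measurement, pass `cost_screen=0` for `level=0` so that the comparison across enrichment levels is fair.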

Trials with an accrual period and a follow-up period

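Let a and f be the accrual and follow-up time respectively. The only difference from a fixed-duration trial is in estimating the event rates, $d_c$ in the control arm and $d_t$ in the treatment arm, when participants are followed for different periods of time. Following [21], from Simpson’s rule $d_c = 1 - \frac{1}{6}[\hat{S}(f) + 4\hat{S}(f + a/2) + \hat{S}(f + a)]$ and $d_t = 1 - \frac{1}{6}[\hat{S}(f)^{HR} + 4\hat{S}(f + a/2)^{HR} + \hat{S}(f + a)^{HR}]$, with standard errors estimated by bootstrapping. All other trial characteristics follow as for fixed-duration trials.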
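A short Python sketch of the Simpson's rule approximation follows. It assumes uniform accrual, so that follow-up times range evenly from f to a + f, and evaluates the survival curve at f, f + a/2 and f + a with weights 1, 4, 1. The function name and the exponential curve in the example are illustrative assumptions.

```python
import math


def average_event_rate(survival, accrual, followup, hr=1.0):
    """Average event rate for a trial with accrual and follow-up periods.

    survival : callable t -> S(t) for the (enriched) control population
    hr       : 1.0 for the control arm; the treatment hazard ratio for the
               treatment arm (survival is raised to the power hr)
    """
    s_start = survival(followup) ** hr
    s_mid = survival(followup + accrual / 2.0) ** hr
    s_end = survival(followup + accrual) ** hr
    return 1.0 - (s_start + 4.0 * s_mid + s_end) / 6.0


# Example: 12-month accrual, 36-month follow-up, assumed exponential survival
surv = lambda t: math.exp(-0.005 * t)
d_control = average_event_rate(surv, accrual=12, followup=36)
d_treat = average_event_rate(surv, accrual=12, followup=36, hr=0.8)
```

The two rates then replace the fixed-duration event rates in the sample size calculation; bootstrap standard errors are not shown here.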

Simulating data to allow investigators to explore prognostic enrichment

To allow investigators to explore prognostic enrichment without data on survival and the biomarker for a sample of patients, BioPETsurv can simulate data to mimic specific clinical parameters. Survival is simulated from a Weibull distribution with user-specified shape parameter k, which allows hazards to be constant, increasing, or decreasing. The user also specifies the survival probability p at time T; together with k, these are used to solve for the Weibull scale parameter a. We expect investigators will specify p based on knowledge of overall survival in the patient population. The data simulator takes p as the survival probability for individuals with mean biomarker level. The survival curve for this group is similar to the overall survival curve. The prognostic strength of the biomarker is specified by the hazard ratio for a 1 standard deviation difference in the biomarker. Without loss of generality, the biomarker X is mean-centered so that X = 0 is the baseline group. Given Weibull shape and scale parameters, baseline hazards are $h_0(t) = (k/a)(t/a)^{k-1}$ and under proportional hazards an individual with biomarker X has hazard $h_0(t)\exp(\beta X)$, where $\beta$ is the log hazard ratio per unit of X, which corresponds to a Weibull distribution with the same shape parameter k and scale parameter $a\exp(-\beta X / k)$. Biomarker data are simulated to have either a symmetric (normal) or right-skewed (lognormal) shape (user-specified). Based on biomarker value X = x, survival time is simulated from the appropriate Weibull distribution but censored at time T. The joint distribution of simulated biomarker and survival times is used for prognostic enrichment analysis.
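The simulation scheme can be sketched in Python as follows. The sketch assumes the parameterization S(t) = exp(-(t/a)^k), a standard normal biomarker (the lognormal option is omitted), and administrative censoring at T; `simulate_cohort` and its arguments are hypothetical names, not the BioPETsurv interface.

```python
import math
import random


def simulate_cohort(n, p, horizon, shape, hr_per_sd, seed=0):
    """Simulate (biomarker, observed time, event indicator) for n patients.

    p         : survival probability at `horizon` for a patient with X = 0
    shape     : Weibull shape k (1 gives constant hazards)
    hr_per_sd : hazard ratio for a 1 SD increase in the biomarker
    """
    rng = random.Random(seed)
    # Solve p = exp(-(horizon / a)^k) for the baseline scale a
    scale0 = horizon / (-math.log(p)) ** (1.0 / shape)
    beta = math.log(hr_per_sd)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                        # standardized biomarker
        scale = scale0 * math.exp(-beta * x / shape)   # proportional hazards
        t = rng.weibullvariate(scale, shape)           # arguments: (scale, shape)
        event = 1 if t <= horizon else 0
        rows.append((x, min(t, horizon), event))       # censor at the horizon
    return rows


# Example: 18% event rate at 48 months at the mean biomarker, constant hazards
cohort = simulate_cohort(n=5000, p=0.82, horizon=48, shape=1.0, hr_per_sd=2.0)
```

The simulated triples can then be analyzed exactly as observed data would be, for instance by estimating event rates in enriched subgroups.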


Example 1: A modestly prognostic biomarker and fixed-duration trial

Fig 1 and Table 1 show an example using BioPETsurv to evaluate a biomarker that is modestly prognostic of the event, with HR 1.2 corresponding to one standard deviation difference in the biomarker. The trial will be either 36 or 48 months. Fig 1A shows estimated survival curves for screening threshold 0% (top curve), i.e., for all patients (no enrichment). The plot shows that events accumulate more quickly in enriched subpopulations of patients, showing more quickly decreasing survival curves for enrichment levels 25%, 50%, and 75% (meaning that patients with biomarker below the 25th, 50th, or 75th percentile are excluded). Two vertical lines indicate the candidate trial lengths, 36 and 48 months. In all other panels in Fig 1, the horizontal axis is the level of enrichment, with 0% representing an unenriched trial. Fig 1B shows the estimated event rate as a function of the level of enrichment for both candidate trial lengths. Based on these event rates and specifying 90% power to detect treatment hazard ratio 0.8 (two-sided testing, α = 0.05), Fig 1C shows the sample size for each trial duration. As expected, the longer trial requires fewer patients than the shorter trial. Fig 1D shows the number of patients needed to screen to enroll the trial. With this modestly prognostic biomarker, the screening total increases with higher enrichment, although the increase is modest below 50% enrichment. Fig 1E and 1F display the cost analysis, with per-patient costs of $4000 (36-month trial) and $5000 (48-month trial). The screening cost was set at $500. In this example, with less than 25% enrichment an enriched trial is anticipated to be more expensive than an unenriched trial because the decrease in sample size is not enough to offset the cost of screening. The cost analysis shows cost savings for higher levels of enrichment.

Fig 1. BioPETsurv analysis of a modestly prognostic biomarker for a fixed-duration 36-month or 48-month trial.

Investigators are considering the biomarker for enrichment of either a 36-month or 48-month trial and specified 90% power to detect a hazard ratio of 0.8 using two-sided testing and α = 0.05. For cost analysis, the cost of screening was $500 and the cost of one patient in the trial was $4000 (36-month trial) and $5000 (48-month trial). The biomarker in this example has HR≈1.2 corresponding to a 1 SD difference in the marker.

Table 1. BioPETsurv analysis of a modestly prognostic biomarker (Example 1).

Example 2: Simulating data for a highly prognostic biomarker and a trial with accrual period and follow-up period

We illustrate the BioPETsurv data simulator. We set simulation parameters to mimic the clinical setting of Example 1 but anticipating a more highly prognostic biomarker. We simulated biomarker and survival data for n = 5,000 patients with event rate 18% at 48 months. We specified constant hazards, and a hazard ratio 2.8 corresponding to 1 standard deviation difference in the biomarker, which we simulated as normally distributed. Fig 2 and Table 2 give prognostic enrichment analysis using the simulated data and planning a trial with a 12-month accrual period and a 36-month follow-up. We again specified 90% power to detect 0.8 treatment hazard ratio, two-sided testing, and α = 0.05.

Fig 2. BioPETsurv analysis of simulated biomarker for a trial with a 12-month accrual period and 36-month follow-up period.

Investigators are planning a trial with a 12-month accrual period plus a 36-month follow-up period, and anticipate having a marker with HR≈2.8 corresponding to a 1 standard deviation difference in the marker. The BioPETsurv data simulator generated data for a normally distributed biomarker with this prognostic strength. Sample size calculations specified 90% power to detect a treatment hazard ratio of 0.8 using two-sided testing and α = 0.05. For cost analysis, patient screening cost was $300 and the cost of a patient in a trial was $100/month before the clinical endpoint. Numeric results are in Table 2.

Table 2. BioPETsurv analysis of simulated biomarker for a trial with a 12-month accrual period and 36-month follow-up period (Example 2).

Fig 2A shows estimated survival curves for no enrichment, and 25%, 50%, and 75% enrichment. Compared to Fig 1A, there is greater separation between the curves because the biomarker here is more prognostic. Fig 2B shows the average event rate for each level of enrichment (the average accounts for the variable length of patient follow-up). Sample size decreases steadily with greater enrichment (Fig 2C). The total number of patients to screen to enroll the trial is gradually increasing for lower levels of enrichment but increases rapidly at high levels of enrichment (Fig 2D). With $300 screening cost and a patient in the trial costing $100/month before the clinical event, there is potential for substantial savings from prognostic enrichment (Fig 2E and 2F).


In this work we demonstrated a comprehensive framework for evaluating a biomarker for prognostic enrichment of a clinical trial with a survival endpoint. In both Examples 1 and 2, total trial costs are nearly monotone decreasing with greater levels of enrichment, but this will not always be true. For example, one can use the data simulator with the following specifications: $100 screening cost, $1000 cost per patient in the trial, 50% survival at 10 years and a trial planned for 5 years for 90% power to detect a treatment hazard ratio of 0.8. If the biomarker is highly prognostic (effect size 2.0), the total trial cost is U-shaped with a minimum at about the 75% enrichment level (that is, the trial only enrolls patients in the highest quartile of the biomarker). See S1 Fig and S1 Table. On the other hand, if the biomarker is weakly prognostic (e.g., effect size 1.2), total cost is monotone increasing with the level of enrichment. See S2 Fig and S2 Table. That is, at a 1:10 ratio of per-patient screening and trial costs, it is not cost-effective to use prognostic enrichment at any level with the weakly prognostic biomarker.

Interestingly, the number of patients screened can be either an increasing or decreasing function of the level of enrichment. In both Examples 1 and 2 it was increasing. However, for highly prognostic biomarkers, the number of patients screened can be decreasing because the trial sample size drops precipitously and more than compensates for the additional patients who must be screened for an enriched trial (see [7] for examples).

The BioPETsurv data simulator requires specification of the biomarker distribution, the biomarker hazard ratio, the trend in hazards over time (increasing, constant, or decreasing), and the event rate without enrichment. These elements are realistic for area specialists to identify. The simulation methodology incorporates proportional hazards. As with any data simulation, results will be accurate only to the extent that the assumptions of the simulation hold.

When considering a prognostic enrichment strategy, investigators must consider multiple, sometimes conflicting goals: trial sample size, number of patients to screen for eligibility, and cost. BioPETsurv is useful for several types of questions in this arena. First, investigators with data on a prognostic biomarker can use BioPETsurv to evaluate that biomarker for its utility for prognostic enrichment for their clinical setting. Second, investigators with a prognostic biomarker who are planning a trial can use BioPETsurv to decide whether, and to what extent, to use the biomarker to plan and implement an enriched trial. Third, investigators working in a particular clinical setting can use BioPETsurv's simulation functionality to explore the prognostic capacity that would be needed in order for a biomarker to be useful for prognostic enrichment; results then inform biomarker research and development in that area [22–25]. BioPETsurv uses metrics that align with key trial elements. Together with important considerations around generalizability and ethics, BioPETsurv facilitates a comprehensive evaluation of competing dimensions in trial planning and the evaluation of prognostic enrichment biomarkers.


  1. Temple R. Enrichment of clinical study populations. Clin Pharmacol Ther. 2010;88(6):774–8. pmid:20944560.
  2. US Food and Drug Administration. Guidance for Industry and FDA Staff: Qualification Process for Drug Development Tools. 2014.
  3. Parikh CR, Moledina DG, Coca SG, Thiessen-Philbrook HR, Garg AX. Application of new acute kidney injury biomarkers in human randomized controlled trials. Kidney Int. 2016;89(6):1372–9. pmid:27165835; PubMed Central PMCID: PMC4869991.
  4. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56(2):337–44. pmid:10877287.
  5. Stanski NL, Wong HR. Prognostic and predictive enrichment in sepsis. Nat Rev Nephrol. 2020;16(1):20–31. Epub 2019/09/11. pmid:31511662.
  6. Vickers AJ, Bennette C, Kibel AS, Black A, Izmirlian G, Stephenson AJ, et al. Who should be included in a clinical trial of screening for bladder cancer?: a decision analysis of data from the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial. Cancer. 2013;119(1):143–9. pmid:22736219; PubMed Central PMCID: PMC4036636.
  7. Kerr KF, Roth J, Zhu K, Thiessen-Philbrook H, Meisner A, Wilson FP, et al. Evaluating biomarkers for prognostic enrichment of clinical trials. Clin Trials. 2017;14(6):629–38. Epub 2017/08/10. pmid:28795578; PubMed Central PMCID: PMC5714681.
  8. Perez-Gracia JL, Sanmamed MF, Bosch A, Patiño-Garcia A, Schalper KA, Segura V, et al. Strategies to design clinical studies to identify predictive biomarkers in cancer research. Cancer Treat Rev. 2017;53:79–97. Epub 2016/12/30. pmid:28088073.
  9. Satagopan JM, Iasonos A, Zhou Q. Prognostic and Predictive Values and Statistical Interactions in the Era of Targeted Treatment. Genet Epidemiol. 2015;39(7):509–17. Epub 2015/09/09. pmid:26349638; PubMed Central PMCID: PMC4784265.
  10. Wang SJ, O'Neill RT, Hung HM. Approaches to evaluation of treatment effect in randomized clinical trials with genomic subset. Pharm Stat. 2007;6(3):227–44. pmid:17688238.
  11. Taieb J, Jung A, Sartore-Bianchi A, Peeters M, Seligmann J, Zaanan A, et al. The Evolving Biomarker Landscape for Treatment Selection in Metastatic Colorectal Cancer. Drugs. 2019;79(13):1375–94. pmid:31347092; PubMed Central PMCID: PMC6728290.
  12. Janes H, Brown MD, Huang Y, Pepe MS. An approach to evaluating and comparing biomarkers for patient treatment selection. Int J Biostat. 2014;10(1):99–121. pmid:24695044; PubMed Central PMCID: PMC4341986.
  13. Song X, Dobbin KK. Evaluating biomarkers for treatment selection from reproducibility studies. Biostatistics. 2020. Epub 2020/05/18. pmid:32424421.
  14. Chen JJ, Lu TP, Chen YC, Lin WJ. Predictive biomarkers for treatment selection: statistical considerations. Biomark Med. 2015;9(11):1121–35. Epub 2015/10/28. pmid:26507127.
  15. Chen YC, Lee UJ, Tsai CA, Chen JJ. Development of predictive signatures for treatment selection in precision medicine with survival outcomes. Pharm Stat. 2018;17(2):105–16. Epub 2018/01/03. pmid:29297979.
  16. Simon R. Validation of pharmacogenomic biomarker classifiers for treatment selection. Cancer Biomark. 2006;2(3–4):89–96. pmid:17192063.
  17. Torres VE, Chapman AB, Devuyst O, Gansevoort RT, Grantham JJ, Higashihara E, et al. Tolvaptan in patients with autosomal dominant polycystic kidney disease. N Engl J Med. 2012;367(25):2407–18. Epub 2012/11/03. pmid:23121377; PubMed Central PMCID: PMC3760207.
  18. Biomarker Qualification Review Team (BQRT). Biomarker Qualification Review for Total Kidney Volume.
  19. Tofte N, Lindhardt M, Adamova K, Bakker SJL, Beige J, Beulens JWJ, et al. Early detection of diabetic kidney disease by urinary proteomics and subsequent intervention with spironolactone to delay progression (PRIORITY): a prospective observational study and embedded randomised placebo-controlled trial. Lancet Diabetes Endocrinol. 2020;8(4):301–12. Epub 2020/03/02. pmid:32135136.
  20. Go AS, Parikh CR, Ikizler TA, Coca S, Siew ED, Chinchilli VM, et al. The assessment, serial evaluation, and subsequent sequelae of acute kidney injury (ASSESS-AKI) study: design and methods. BMC Nephrol. 2010;11:22. Epub 2010/08/27. pmid:20799966; PubMed Central PMCID: PMC2944247.
  21. Schoenfeld DA. Sample-size formula for the proportional-hazards regression model. Biometrics. 1983;39(2):499–503. pmid:6354290.
  22. Ayas NT, Hirsch Allen AJ, Fox N, Peres B, Mehrtash M, Humphries KH, et al. C-Reactive Protein Levels and the Risk of Incident Cardiovascular and Cerebrovascular Events in Patients with Obstructive Sleep Apnea. Lung. 2019;197(4):459–64. Epub 2019/05/14. pmid:31089857.
  23. Parikh CR, Liu C, Mor MK, Palevsky PM, Kaufman JS, Thiessen Philbrook H, et al. Kidney Biomarkers of Injury and Repair as Predictors of Contrast-Associated AKI: A Substudy of the PRESERVE Trial. Am J Kidney Dis. 2020;75(2):187–94. Epub 2019/09/20. pmid:31547939; PubMed Central PMCID: PMC7012712.
  24. Nadkarni GN, Chauhan K, Verghese DA, Parikh CR, Do R, Horowitz CR, et al. Plasma biomarkers are associated with renal outcomes in individuals with APOL1 risk variants. Kidney Int. 2018;93(6):1409–16. Epub 2018/04/25. pmid:29685497; PubMed Central PMCID: PMC5918426.
  25. Greenberg JH, Zappitelli M, Jia Y, Thiessen-Philbrook HR, de Fontnouvelle CA, Wilson FP, et al. Biomarkers of AKI Progression after Pediatric Cardiac Surgery. J Am Soc Nephrol. 2018;29(5):1549–56. Epub 2018/02/22. pmid:29472416; PubMed Central PMCID: PMC5967766.