An alternative approach for estimating the number needed to treat for survival endpoints

This study investigates the limitations of the NNT based on the absolute risk reduction (ARR), namely NNTARR, and proposes an alternative definition and estimation procedure based on the restricted mean survival time (RMST), namely NNTRMST, for randomized controlled trials (RCTs). Three recent clinical trials with survival endpoints, representing different scenarios, were selected to compare the performance of the NNTARR and NNTRMST. For each trial, both versions of the NNT were estimated using the reconstructed individual-level data, and the average life gain (ALG) was derived to show the differences between the NNTARR and NNTRMST. Four hypothetical scenarios were constructed to further explore the advantages and disadvantages of each definition of the NNT for survival endpoints. For the illustrative trial examples, the NNTARR failed to capture the profile of the treatment effect over time because it is calculated at a specific time point. It may even result in misinterpretations of the treatment benefit, in particular when the observed event rates are low, the two survival curves cross, or a mixture of survival patterns exists. In contrast, the NNTRMST, based on the average survival (or event-free) time, can quantify the treatment effect more accurately, and its interpretation is more intuitive and clinically meaningful. The NNTRMST can be used as an alternative measure for quantifying treatment effect in RCTs, especially so in the case of the ALG, which helps practitioners to better understand the magnitude of the benefit conferred by treatment.


Introduction
Well-designed and properly executed randomized controlled trials (RCTs) provide the best evidence for the efficacy of health care interventions or new treatments. It is desirable to construct a single measure that can adequately summarize the treatment benefit and be easily conveyed to patients and clinicians. [1] The number needed to treat (NNT) is a popular and intuitive measure in RCTs to quantify the magnitude of the treatment effect. [2] For survival endpoints, the NNT (or NNT ARR ) is computed as the reciprocal of the absolute risk reduction (ARR) between the treatment and the control group, which is the difference in the Kaplan-Meier (KM) estimated survival rates or the difference in the cumulative incidences at a time point of clinical interest (see S1 Appendix). [3,4] For the past three decades, the NNT has been widely advocated by medical journals [5,6] as well as the Cochrane [7] and the Consolidated Standards of Reporting Trials (CONSORT) [8,9] groups, because it is more transparent to express the magnitude of the treatment effect using the number of patients needed to treat in order to prevent one additional adverse event during a specific follow-up period.
Despite its many advantages, [5,6,10,11] the NNT ARR has been criticized for certain poor statistical properties, [12,13] in particular when the observed event rates in both groups are low, the survival curves cross, or a mixture of survival patterns exists. In these situations, the NNT ARR may fail to capture the profile of the treatment effect over time and can lead to misinterpretation of the benefit conferred by treatment. For example, when the two survival curves are close or cross at a chosen time t of clinical interest, the corresponding difference in the KM estimates is close to zero or even negative, which results in either a very large or a negative NNT ARR. Moreover, the NNT ARR is calculated from binary endpoints truncated at time t and ignores the entire process of events and censoring during the t-period follow-up. It thus neglects critical information and cannot reflect the average survival (or event-free) time of patients in either group. Several other methods have been developed to compute the NNT, such as the pseudo-value-based method, [14] the hazard-ratio-based method, [3] and the risk-difference-based method, [15] but they require various assumptions. For example, the hazard-ratio-based method requires the proportional hazards assumption to hold between the two groups and yields an inaccurate NNT when that assumption is not satisfied. [16] Recently, the restricted mean survival time (RMST), corresponding to the average survival time of patients followed up to a specific time point, has been advocated to quantify the treatment effect. [17-19] The RMST is typically measured by the area under the KM curve or the area above the cumulative incidence curve from 0 to a specific time point.
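To make the two quantities in this paragraph concrete, the following minimal sketch computes the RMST as the area under a step-shaped (KM-type) curve and the NNT ARR as the reciprocal of the difference in survival probabilities at one time point. The function names and the two curves are hypothetical and chosen only for illustration; they are not taken from any of the trials discussed here, and censoring-based estimation of the curve itself is not covered.

```python
from bisect import bisect_right


def step_survival(times, surv, t):
    """Value at time t of a right-continuous step curve that equals 1 before the first drop."""
    i = bisect_right(times, t)
    return 1.0 if i == 0 else surv[i - 1]


def rmst(times, surv, tau):
    """Area under the step curve from 0 to tau (restricted mean survival time)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next drop
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)  # last rectangle up to tau


def nnt_arr(s_treat, s_ctrl):
    """Reciprocal of the difference in survival probabilities at one time point."""
    arr = s_treat - s_ctrl
    return float("inf") if arr == 0 else 1.0 / arr


# Hypothetical curves: survival drops at times 1, 2 and 3 in both groups.
times = [1.0, 2.0, 3.0]
s_ctrl = [0.8, 0.6, 0.4]
s_treat = [0.9, 0.6, 0.5]

# The curves touch at t = 2, so the NNT ARR at that time is infinite,
# although the treatment group has accumulated more survival time by t = 2.5.
print(nnt_arr(step_survival(times, s_treat, 2.0), step_survival(times, s_ctrl, 2.0)))
print(rmst(times, s_treat, 2.5) - rmst(times, s_ctrl, 2.5))
```

In this toy case the point-in-time measure reports no effect at t = 2, while the area-based measure still shows a small gain in average survival time.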
To better quantify the NNT for survival endpoints, we propose the RMST-based NNT (NNT RMST) as an alternative to the NNT ARR. We use survival data from three recent clinical trials, reconstructed with the algorithm of Guyot et al., [20] to illustrate the limitations of the NNT ARR and to discuss the advantages and disadvantages of the NNT RMST. A general guideline for calculating and reporting the NNT RMST is then provided to facilitate its use in clinical practice.

Number needed to treat based on the restricted mean survival time (NNT RMST )
As an alternative to the NNT ARR, the NNT RMST provides an intuitive measure to quantify the number of patients needed to treat in order to gain the observed difference in mean survival time for a death or an event (see S1 Appendix). The difference in RMSTs represents the average gain in survival time for patients receiving the experimental treatment in comparison with the control during the t-period follow-up. [21] However, the mean survival time for a death or an event is often unknown in clinical practice, and we therefore approximate it by the RMST of patients in the control group. The NNT RMST is defined as the RMST in the control group divided by the difference in RMSTs between the two groups up to a chosen time t. The NNT RMST can be interpreted as follows.
(1) If the RMST in the treatment group is larger than that in the control group at the chosen time t, the NNT RMST is the number of patients needed in the treatment group to prevent an extra death or an event in comparison with the control during the t-period follow-up.
(2) If the RMST in the treatment group is smaller than that in the control group, a negative NNT RMST is obtained, which conveys a poorer outcome for the treatment. The NNT RMST should then be interpreted as the number needed to treat to harm (NNTH), i.e., the number of patients needed in the treatment group to cause an extra death or an event in comparison with the control during the t-period follow-up.
To quantify the uncertainty of the NNT RMST, we construct its confidence interval (CI) by inverting the lower and upper boundaries of the ratio of RMSTs between the two groups minus 1 (see S1 Appendix for more details). For example, suppose the RMST for the control group is 1.0 year, the difference in RMSTs is 0.1 year, and the 95% CI for the ratio of RMSTs is 0.95 to 1.2 during 1-year follow-up. The point estimate of the NNT RMST is 1.0/0.1 = 10, and inverting the CI limits gives 1/(0.95 - 1) = -20 and 1/(1.2 - 1) = 5, i.e., a 95% CI of -20 to 5. In such cases, the 95% CI of the NNT RMST should be interpreted as the number needed to treat to benefit (NNTB RMST), which is from 5 to ∞, and the number needed to treat to harm (NNTH RMST), which is from 20 to ∞. In more concise notation, it can be written as NNT RMST = 10 (NNTB RMST 5 to ∞ to NNTH RMST 20).
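The worked example above can be sketched in a few lines of code. This is only an illustration of the inversion step under the stated definition; the function names are hypothetical, the output format is an assumption, and the code does not reproduce the estimation procedure of S1 Appendix or the "nnt" R package.

```python
import math


def nnt_rmst(rmst_ctrl, rmst_treat):
    """Control-group RMST divided by the RMST difference (treatment minus control)."""
    diff = rmst_treat - rmst_ctrl
    return math.inf if diff == 0 else rmst_ctrl / diff


def invert_ratio_limit(ratio):
    """Map one CI limit of the RMST ratio (treatment / control) to the NNT scale."""
    return math.inf if ratio == 1 else 1.0 / (ratio - 1.0)


def format_nnt_ci(ratio_lo, ratio_hi):
    """Express the inverted CI in NNTB / NNTH notation."""
    lo = invert_ratio_limit(ratio_lo)
    hi = invert_ratio_limit(ratio_hi)
    if ratio_lo > 1:  # whole interval indicates benefit
        return f"NNTB {hi:.1f} to {lo:.1f}"
    if ratio_hi < 1:  # whole interval indicates harm
        return f"NNTH {-lo:.1f} to {-hi:.1f}"
    # interval includes a ratio of 1, so it passes through infinity
    return f"NNTB {hi:.1f} to inf to NNTH {abs(lo):.1f}"


# Numbers from the worked example: control RMST 1.0 year, difference 0.1 year,
# 95% CI of the RMST ratio 0.95 to 1.2.
point = nnt_rmst(1.0, 1.1)
summary = format_nnt_ci(0.95, 1.2)
print(f"NNT RMST = {point:.1f} ({summary})")
```

Because the ratio interval contains 1, the interval on the NNT scale is not a single range but two half-lines joined at infinity, which is why the benefit and harm sides are reported separately.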

Average life gain (ALG)
Typically, there is no criterion to make a direct comparison between the NNT ARR and NNT RMST . Toward this goal, we introduce a new concept of the average life gain, which is defined as the additional average survival time of patients receiving the treatment in comparison with the control group during the t-period follow-up. For the NNT RMST , the ALG RMST is the difference in RMSTs between the treatment and the control group up to time t. For the NNT ARR , the ALG ARR is computed as the mean survival time for one death or an event in patients up to time t (which can be approximated as the RMST in the control group) divided by the NNT ARR .
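Both ALG definitions reduce to the control-group RMST divided by the corresponding NNT, because the NNT RMST is itself the control-group RMST divided by the RMST difference. The sketch below illustrates this with a hypothetical function name and made-up inputs (a control RMST of 1.0 year, an NNT RMST of 10, and an assumed NNT ARR of 4); none of these numbers come from the trials in this paper.

```python
import math


def average_life_gain(rmst_ctrl, nnt):
    """ALG implied by an NNT: control-group RMST divided by the NNT.

    An infinite NNT corresponds to no gain; a negative NNT (harm) gives a negative gain.
    """
    return 0.0 if math.isinf(nnt) else rmst_ctrl / nnt


rmst_ctrl = 1.0  # hypothetical control-group RMST, in years
alg_rmst = average_life_gain(rmst_ctrl, 10.0)  # recovers the RMST difference
alg_arr = average_life_gain(rmst_ctrl, 4.0)    # ALG implied by an assumed NNT ARR of 4
print(alg_rmst, alg_arr)
```

Putting both measures on the time scale in this way is what allows the NNT ARR and NNT RMST to be compared directly: in this made-up case the NNT ARR would imply a larger gain than the RMST difference supports.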

Example 1. Radical prostatectomy trial
The SPCG-4 (Scandinavian Prostate Cancer Group Study Number 4) [22] trial tested whether radical prostatectomy would reduce the mortality among men with localized prostate cancer in comparison with watchful waiting. A total of 695 patients were randomized to the radical prostatectomy or the watchful waiting group from October 1989 to February 1999 and followed until December 2017. The primary endpoint was death from prostate cancer with death from other causes treated as a competing event. Fig 1A shows the survival curves, where no significant treatment effect is observed during the first four years, and afterwards the survival curves appear to be notably different.
In addition to the cumulative incidences and the hazard ratios, Bill-Axelson et al. [22] also reported the absolute difference in risk and the corresponding NNT to quantify the treatment effect at the 23-year follow-up. The estimated NNT ARR was 8.8 (95% CI, 5.2 to 27.8), which indicated that 8.8 patients needed to undergo radical prostatectomy to prevent one death during the 23-year follow-up. Clinically, the NNT ARR reflects the cumulative treatment effect at the 23-year follow-up rather than the profile of the benefit conferred by the radical prostatectomy over time, as the KM curves are initially close and then diverge after 4 years (Fig 1A). Furthermore, the estimated NNT ARR s at 20 and 23 years were identical (8.8), whereas the average survival times for patients followed up to 20 and 23 years were different (Table 1). In such cases, the NNT ARR estimated at an arbitrary time point may not adequately account for the long-term follow-up effect, and it is rather difficult to explain the equality of the NNT ARR s at 20 and 23 years to clinicians and patients. Last but not least, the uncertainty of the estimated NNT ARR depends on the CI of the ARR at the specific time point, which in turn relies on the event rates in the treatment and control groups but not on patient exposure time. In some situations, this may lead to an unstable CI of the NNT ARR, particularly when the observed event rates are low during the long-term follow-up period (Table 1).
An alternative approach to summarizing the effect of the radical prostatectomy is based on the NNT RMST. The estimated NNT RMST decreased with the follow-up time, which qualitatively reflected the benefit of the radical prostatectomy. These trends differed from those in the NNT ARR, confirming the advantage of the NNT RMST in capturing the necessary information. This is why the NNT RMST and NNT ARR curves diverge in Fig 1B. Moreover, the NNT RMST s at 20 and 23 years were 16.4 (95% CI, 9.6 to 52.6) and 13.7 (95% CI, 8.1 to 40.0), respectively, which suggested that the number needed to treat at 23 years was smaller than that at 20 years owing to the cumulative effect conferred by the radical prostatectomy. In such cases, the NNT RMST provides a more reasonable measure than the NNT ARR. To further explore the difference between the NNT ARR and NNT RMST, Fig 1C presents the ALG based on the NNT RMST and on the NNT ARR during the follow-up period. The ALG based on the NNT ARR was overestimated compared with that based on the NNT RMST, leading to an exaggerated treatment effect. The NNT RMST inherits the nature of the survival time and provides a more accurate and intuitive estimate than the NNT ARR.

Fig 1. (A) Survival curves for the radical prostatectomy and watchful waiting groups. (B) The NNTs based on the difference in restricted mean survival times (NNT RMST) or the absolute risk reduction (NNT ARR). The y-axis is rescaled to accommodate infinity and to distinguish NNT to harm (NNTH) from NNT to benefit (NNTB). (C) The ALG for the NNT ARR and NNT RMST.

Example 2. Atezolizumab and nab-paclitaxel trial
The IMpassion130 trial, [23] a phase 3 trial of an anti-PD-L1 or anti-PD-1 antibody in patients with metastatic triple-negative breast cancer, was conducted to evaluate the potential benefit of first-line atezolizumab plus nab-paclitaxel (AP) in comparison with placebo plus nab-paclitaxel (PP). The trial enrolled 451 patients in each group with a median follow-up of 12.9 months. The primary endpoint was progression-free survival (PFS), as shown in Fig 2A. Although the survival curves are initially close, they diverge and then move close again multiple times during the follow-up period. The survival curve of the AP group always stays above that of the PP group, suggesting that patients receiving AP had improved PFS. We estimated the NNT ARR and its 95% CI at different time points. The NNT ARR was 10.1 (95% CI, 6.0 to 38.5) at 6 months and then continuously increased with the follow-up time, as shown in Table 1 and Fig 2B. Such NNT ARR s capture only the cumulative treatment effect at a specific time point and may be misinterpreted in clinical practice. For example, the NNT ARR s were 29.4 (95% CI, 11.2 to 47.6) and 55.6 (95% CI, 13.3 to ∞ to 25.6) at 21 and 24 months, respectively, which suggests that the number of patients who would need to receive AP to prevent one event (disease progression or death) at 24 months was 1.9 times that at 21 months. However, the average PFS times for patients followed up to 21 and 24 months were quite similar, at 7.3 and 7.5 months, respectively (Table 1). Such inconsistent results may cause confusion among patients and clinicians.
Moreover, the ALG based on the NNT ARR decreased, particularly at the end of the follow-up period (Fig 2C), and thus failed to depict the cumulative treatment effect of AP. In contrast, the estimated NNT RMST decreased with the follow-up time, indicating that patients receiving AP had improved PFS. For example, the NNT RMST s at 21 and 24 months were 6.0 (95% CI, 3.6 to 18.3) and 5.6 (95% CI, 3.3 to 17.5), with corresponding ALGs of 1.2 and 1.3 months. These results further confirm that the NNT RMST is a more rational choice than the NNT ARR when summarizing the study results and communicating with practitioners in clinical practice.

Example 3. Caplacizumab trial
The third example is a recent clinical trial reported by Scully et al., [24] which tested whether treatment with caplacizumab could expedite the confirmed normalization of the platelet count in comparison with placebo among patients with acquired thrombotic thrombocytopenic purpura (TTP). A total of 145 patients were randomly assigned to caplacizumab or placebo, and the time to normalization of the platelet count was recorded. The median time to normalization of the platelet count was shorter with caplacizumab (2.7 days; 95% CI, 1.9 to 2.8) than with placebo (2.9 days; 95% CI, 2.7 to 3.6). Fig 3A shows the proportion of patients without confirmed platelet normalization for the two groups during the follow-up period. The two survival curves are initially indistinguishable, then diverge and converge twice during the follow-up, and eventually cross after 18 days. However, the benefit of caplacizumab persists until the end of the trial, as the average time to confirmed platelet normalization in the caplacizumab group was shorter than that in the placebo group.
The estimated NNT ARR fluctuated with an unstable 95% CI during the follow-up period. For example, the NNT ARR s were 4.6 (95% CI, 2.7 to 15.9) and -52.6 (95% CI, 14.9 to ∞ to 9.5) at 3 and 20 days, respectively, indicating that the treatment effect of caplacizumab reversed around day 20 (Fig 3A and 3B). In addition, we estimated the NNT RMST at 3 and 20 days to be 13.4 (95% CI, 5.9 to ∞ to 49.5) and 2.3 (95% CI, 1.1 to ∞ to 25.4), with corresponding ALGs of 0.2 and 1.5 days. The results suggested that caplacizumab decreased the time to confirmed platelet normalization, although no significant difference was observed between caplacizumab and placebo, as shown in Table 1 and Fig 3C. In such cases, the NNT RMST outperforms the NNT ARR and provides a more sensible and accurate measure of the treatment effect.

Hypothetical example
To further explore the advantages and disadvantages of each definition of the NNT for survival endpoints, four hypothetical scenarios were constructed. Scenario 1 reflects two survival curves with an increasing treatment effect over time (Fig 4A). Both the NNT RMST and the NNT ARR decrease with the follow-up time, and the corresponding ALG increases. Compared with the NNT RMST, the NNT ARR may overestimate the treatment effect during the follow-up period as it does not account for patient exposure time. In Scenario 2, the two survival curves converge at the end of the follow-up (Fig 4B). The survival curve in the treatment group stays above that in the control group until the end of the study, which indicates that patients receiving the experimental treatment have prolonged survival time. The ALG RMST increases with the follow-up time, whereas the ALG ARR first increases but then decreases to 0. In Scenario 3, the two survival curves cross during the follow-up period (Fig 4C). The overall (or cumulative) treatment effect still exists, as the mean survival time in the treatment group is larger than that in the control group. The NNT ARR captures the local treatment effect rather than the overall treatment effect, e.g., the NNT ARR is infinite at time 1.5. In contrast, the NNT RMST depicts the cumulative treatment effect at a chosen time point and reflects the profile of the treatment effect over the follow-up time. In Scenario 4, two survival curves with a mixture of short- and long-term effects are presented (Fig 4D). The treatment effect clearly persists during the follow-up period. However, the NNT ARR remains constant at 4 from time 1 to 2, which fails to reflect the treatment benefit and may cause misunderstanding. In contrast, the NNT RMST provides a more reasonable measure as it decreases gradually until the end of the study.
In addition, we conducted simulation studies under the four hypothetical scenarios to examine the performance of NNT RMST and NNT ARR in terms of the biases of NNT and ALG. A total of 5000 simulation studies with a sample size of 500 were carried out, and the simulation results in Fig 5 show that the NNT RMST outperforms the NNT ARR , as the NNT RMST inherits the advantages of quantifying the average "survival" (event-free) time. It is worth noting that the performance of NNT RMST is worse only under scenario 2 at time point t = 0.5, because the follow-up time is short and there is little information and thus large variation in RMST by t = 0.5.
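The crossing-curve behaviour described for Scenario 3 can be checked numerically with a small sketch. The two piecewise-linear survival curves below are invented for illustration and are not the curves of Fig 4C; they are only constructed so that they cross at time 1.5. The helper names are hypothetical, and exact fractions are used to avoid rounding at the crossing point.

```python
import math
from fractions import Fraction as F


def interp(knots, t):
    """Linear interpolation of a survival curve given as (time, survival) knots."""
    for (t0, s0), (t1, s1) in zip(knots, knots[1:]):
        if t0 <= t <= t1:
            return s0 + (s1 - s0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside the range of the knots")


def area(knots, tau):
    """Exact area under the piecewise-linear curve from the first knot to tau."""
    total = F(0)
    for (t0, s0), (t1, _) in zip(knots, knots[1:]):
        if t0 >= tau:
            break
        right = min(t1, tau)
        total += (s0 + interp(knots, right)) / 2 * (right - t0)  # trapezoid
    return total


def nnt_arr(treat, ctrl, t):
    """Reciprocal of the survival difference at time t (infinite if the curves meet)."""
    arr = interp(treat, t) - interp(ctrl, t)
    return math.inf if arr == 0 else 1 / arr


def nnt_rmst(treat, ctrl, tau):
    """Control-group RMST divided by the RMST difference up to tau."""
    diff = area(treat, tau) - area(ctrl, tau)
    return math.inf if diff == 0 else area(ctrl, tau) / diff


# Hypothetical curves that cross at t = 1.5.
ctrl = [(F(0), F(1)), (F(2), F(1, 5))]
treat = [(F(0), F(1)), (F(1), F(4, 5)), (F(2), F(0))]

for tau in (F(3, 2), F(2)):
    print(float(tau), nnt_arr(treat, ctrl, tau), nnt_rmst(treat, ctrl, tau))
```

For these invented curves the NNT ARR is infinite at the crossing time and turns negative afterwards, while the NNT RMST stays positive because the treatment group has still accumulated more survival time.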

Discussion
As an essential component of RCTs, interpreting the evidence of the treatment effect for practitioners plays a vital role in their decision making under risk-benefit considerations. The popularity of the ARR in medical research makes the NNT ARR a primary tool for quantifying treatment effect, [2,6,25] although it has significant drawbacks. RMST-based quantitative measures have recently been advocated as a primary tool for clinical trials that helps practitioners to understand the treatment effect better. [17,18,26-29] We compared the pros and cons of the NNT ARR and NNT RMST via the ALG using three real examples and four hypothetical scenarios. The NNT RMST can accurately convey the likelihood of treatment success and aligns more closely with the patient perspective by converting the treatment effect into "the chance of benefiting 1 in X", which further facilitates rational decision making when several treatments are available. In addition, the uncertainty of the benefit is quantified by constructing the CI of the NNT RMST. Not only does the NNT RMST inherit the intuitive interpretation of the NNT ARR, but it also overcomes the shortcomings of the NNT ARR to some extent, providing a better alternative measure. For example, the NNT RMST (1) provides an easy-to-interpret, clinically meaningful summary of the treatment effect; (2) has a well-established calculation procedure; (3) conveys the uncertainty of the NNT RMST at a specific time point t; (4) reflects the profile of the treatment effect during the follow-up period, which further explains why the NNT RMST and NNT ARR curves diverge in the figures; (5) has a coherent estimate when the event rates are low, the survival curves cross, or a mixture of survival patterns exists; (6) makes full use of the available information and accounts for the follow-up effect; (7) quantifies the average "survival" (event-free) time directly.

Fig 2. (A) Survival curves for the atezolizumab plus nab-paclitaxel and the placebo plus nab-paclitaxel groups. (B) The NNTs based on the difference in restricted mean survival times (NNT RMST) or the absolute risk reduction (NNT ARR). The y-axis is rescaled to accommodate infinity and to distinguish NNT to harm (NNTH) from NNT to benefit (NNTB). (C) The ALG for the NNT ARR and NNT RMST during the follow-up time. https://doi.org/10.1371/journal.pone.0223301.g002
Fig 3. (B) The NNTs based on the difference in restricted mean survival times (NNT RMST) or the absolute risk reduction (NNT ARR). The y-axis is rescaled to accommodate infinity and to distinguish NNT to harm (NNTH) from NNT to benefit (NNTB). (C) The ALG for the NNT ARR and NNT RMST during the follow-up time. https://doi.org/10.1371/journal.pone.0223301.g003

Nevertheless, it is arguable that the choice of the more appropriate measure depends on the actual clinical situation rather than on the estimation procedure or the stability of the method throughout the follow-up period. For example, if survival to a specific time is a critical indicator of treatment effectiveness, then the NNT ARR at that time may be more appropriate. Furthermore, the NNT RMST has some limitations. For instance, it cannot reflect the importance of endpoints, fails to convey the cost-effectiveness of the treatment, and requires a prespecified follow-up time. Despite these limitations, the NNT RMST is still of great value as it reflects the treatment effect between the two groups accurately and intuitively. We have developed the R software package "nnt" to facilitate the calculation of the NNT RMST, which can be freely downloaded from CRAN.

Conclusion
The NNT RMST can be used as an alternative measure for quantifying treatment effect in RCTs, especially so in the case of the ALG, which helps practitioners to better understand the magnitude of the benefit conferred by treatment.

Supporting information
S1 Appendix. Technical notes. (DOCX)