The authors have declared that no competing interests exist.

In this Collection Review for the Novel Treatments for Tuberculosis Collection, Piero Olliaro and Michael Vaillant discuss the considerations in choosing a noninferiority margin that is meaningful from a statistical, ethical, clinical, and health standpoint.

The noninferiority design is being adopted in tuberculosis treatment trials to identify regimens that may have practical advantages over current standard therapy (e.g., being shorter or easier to adhere to) and thus are more efficient in real-life settings, even while accepting that they might be less effective to a certain degree.

This margin of acceptance is called the noninferiority margin, or delta. How narrow or wide the margin should be, and how this translates into acceptable losses and desired gains, is a matter of debate.

Noninferiority trials are trials of ‘trade-offs’, in which one has to decide what one can lose in terms of pure efficacy against what one expects to make up in terms of effectiveness, tolerability, deployability, affordability, or other attributes when replacing an existing intervention with a new one.

This paper is about the principles behind identifying a ‘meaningful noninferiority margin’—that is, a margin that is meaningful from a statistical, ethical, clinical, and health standpoint.

Pragmatic approaches that express treatment effects as the number needed to treat (NNT; the reciprocal of the absolute risk reduction), distinguishing the NNT for one patient to benefit (NNTB) from the NNT for one patient to be harmed (NNTH), help to clarify the implications of outcome definition and to quantify gains and losses.

Applying the noninferiority design to pragmatic (effectiveness) trials in addition to efficacy/safety trials would help quantify the trade-offs in real life.

Identifying effective regimens for tuberculosis (TB) is challenging: trials combine long treatment and follow-up periods with large sample sizes, so they are slow and expensive to complete. Often, they are also inconclusive. Lienhardt and Nahid [

Hardly present in the medical literature before the year 2000, the noninferiority design has gained in popularity across disciplines and medical interventions in the past 2 decades. A recent paper [

The noninferiority design is generally chosen when it is felt that a new medicine or intervention conveys benefits over the existing approved standard of care (such as better tolerability, real-life effectiveness, accessibility, or affordability), which would be enough to justify a ‘trade-off’ [

Central to the design of these trials is therefore establishing a noninferiority margin, which should ‘preserve a minimum clinically acceptable proportion of the effect of the active treatment compared with placebo. This margin cannot be greater than the smallest effect size for the active treatment that would be expected in a placebo-controlled trial’ [

With the noninferiority design, the null hypothesis is that the treatments differ (that is, the new treatment is worse than the standard by at least the noninferiority margin); a type I error is to wrongly accept an inferior intervention, and a type II error is to wrongly reject a noninferior intervention [

Error bars are 95% CI of treatment differences expressed as ARR. Arbitrary values are used for illustration purposes. Outcomes on the right of the ‘zero’ (no-difference) line favour the new treatment. Noninferiority margin set at −10%. ARR, absolute risk reduction; CI, confidence interval.
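The reading of such a plot can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed analysis: the function name `classify_noninferiority` and its four labels are hypothetical, the interval is taken to be the two-sided 95% CI of the ARR (positive values favour the new treatment), and the margin is entered as a negative proportion.

```python
def classify_noninferiority(ci_lower, ci_upper, margin=-0.10):
    """Classify a result from the CI of the ARR (hypothetical helper).

    ci_lower, ci_upper: limits of the CI of the ARR, as proportions;
        positive values favour the new treatment.
    margin: noninferiority margin as a negative proportion (-0.10 = -10%).
    """
    if margin >= 0:
        raise ValueError("margin must be negative, e.g. -0.10 for 10%")
    if ci_lower > ci_upper:
        # accept the limits in either order
        ci_lower, ci_upper = ci_upper, ci_lower
    if ci_lower > 0:
        return "superior"        # whole CI to the right of zero
    if ci_lower > margin:
        return "noninferior"     # whole CI to the right of the margin
    if ci_upper < margin:
        return "inferior"        # whole CI to the left of the margin
    return "inconclusive"        # CI straddles the margin


# Same interval, two different margins (6% and 10%)
for margin in (-0.06, -0.10):
    print(margin, classify_noninferiority(-0.08, 0.004, margin))
```

With the interval −8.0% to 0.4% (the unadjusted composite estimate quoted later in this paper for the OFLOTUB trial), the sketch returns "inconclusive" at a 6% margin and "noninferior" at a 10% margin. The data are identical; only the choice of margin changes the conclusion.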

Although opinions have shifted over the years, it is now generally agreed that conclusions should take into account the result of the analyses of both the modified intent-to-treat (mITT) and the per-protocol (PP) population and that conclusions are more robust when the results of the analyses of both sets are consistent [

In this paper, we consider the implications of the noninferiority design for TB treatment trials, identify specific issues, and propose practical options. In particular, we focus on the choice of the noninferiority margin and clinically relevant end points; how these can be taken into account to weigh losses versus gains; and how to link statistical, clinical, ethical, patient, and public health imperatives so that these studies can be designed and interpreted with a view to informing policy decisions and ultimately improving health outcomes.

The noninferiority design has been adopted in explanatory treatment trials of active TB for newly diagnosed (expectedly drug-sensitive) TB (DSTB) (five trials completed and reported [

Indication | Study [Reference] | Regimen | Comparator | Delta | Outcome | Note |
---|---|---|---|---|---|---|
DSTB | Jindani, 2004 [ | 8HRZE | 6HRZE | 5% | inferior | |
DSTB | Jindani, 2004 | 8HRZE (weekly*) | 6HRZE | 5% | inferior | |
DSTB | Gillespie, 2014 [ | 4HRZM | 6HRZE | 6% | inferior | |
DSTB | Gillespie, 2014 | 4RZEM | 6HRZE | 6% | inferior | |
DSTB | Jindani, 2014 [ | 4HRZM | 6HRZE | 6% | inferior | |
DSTB | Jindani, 2014 | 6HRZM | 6HRZE | 6% | noninferior | |
DSTB | Merle, 2014 [ | 4HRZG | 6HRZE | 6% | inferior | |
DSTB noncavitary | Johnson, 2009 [ | 4HRZE | 6HRZE | 5% | inferior | |
DSTB noncavitary | Alipanah, 2016 [ | 4HRZM/E | 6HRZE | 6% | noninferior | meta-analysis |
DSTB | Lienhardt, 2011 [ | 6HRZE fixed | 6HRZE loose | 4% | noninferior | |
DSTB | Aseffa, 2016 [ | 6HRZE fixed | 6HRZE loose | 4% | noninferior | |
DSTB | TBTC Study 31 | 4RHE; 4RptZHE | 6HRZE | 6.6% | enrolling | |
DSTB + DRTB | STAND | PaMZ | 6HRZE | 12% | active, nonrecruiting | NCT02342886 |
MDRTB | STREAM [ | 40–48 weeks | 18–24 months | 10% | noninferior | NCT02409290 |
MDRTB | endTB | 5 arms | SOC | 12% | enrolling | NCT02754765 |
MDRTB | PRACTECAL | 2 arms | SOC | 12% | enrolling | NCT02589782 |

* In the continuation phase.

Abbreviations: DRTB, drug-resistant TB; DSTB, drug-sensitive TB; E, ethambutol; G, gatifloxacin; H, isoniazid; M, moxifloxacin; MDRTB, multidrug-resistant TB; Pa, pretomanid; R, rifampicin; Rpt, rifapentine; SOC, standard of care; STAND, Shortening Treatments by Advancing Novel Drugs; TB, tuberculosis; TBTC, Tuberculosis Trials Consortium; Z, pyrazinamide.

The standard treatment for newly diagnosed DSTB is a 6 month regimen consisting of a 2 month, four-drug intensive phase with daily isoniazid (H), rifampicin (R), pyrazinamide (Z), and ethambutol (E), followed by a 4 month, two-drug continuation phase with H and R (2HRZE/4HR) [

The current standard WHO-recommended ‘conventional’ regimen for MDRTB requires 18–20 months [

Of the trials listed in

Nunn and colleagues [

Noninferiority trials of DSTB so far have used noninferiority margins ranging from 4% to 6.6% (mostly 6%) (

A 4-percentage-point shift from a 6% to a 10% margin (see

18m, 18 month follow-up from treatment start; 24m, 24 month follow-up after treatment end; ARR, absolute risk reduction; CI, confidence interval; DSTB, drug-sensitive tuberculosis; mITT, modified intent-to-treat set; PP, per-protocol set.

This example further illustrates how critical it is to establish a noninferiority margin that is ‘meaningful’ statistically, clinically, and programmatically and is ethically acceptable. But how can the results of a trial be made to speak to clinicians and policy makers?

We propose to translate the ARR into number needed to treat (NNT). Although objections have been raised to the use of NNTs [

The NNT is easy to calculate: it is the reciprocal of the ARR (NNT = 1 / ARR). Similarly, its CI is obtained by inverting and exchanging the upper limit (UL) and lower limit (LL) of the CI for the ARR: [1 / UL (ARR), 1 / LL (ARR)]. However, complications arise when there is no difference between treatments: when the ARR is zero, the NNT is infinite, and when the CI of the ARR includes zero, the CI of the NNT includes infinity, which breaks the continuity between the CI limits. The classical Wald CI suffers from a series of limitations (see, for instance, Newcombe [

ARR, absolute risk reduction; CI, confidence interval; DSTB, drug-sensitive tuberculosis; E, ethambutol; H, isoniazid; mITT, modified intent-to-treat; NI, noninferior; NNTB, number needed to treat for one patient to benefit; NNTH, number needed to treat for one patient to be harmed.
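The conversion from ARR to NNT described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, inputs are proportions with positive values favouring the new treatment, rounding to the nearest whole patient is an assumption made for display, and an interval that spans zero is written as running from NNTH through infinity to NNTB.

```python
import math


def _inv(x):
    """Reciprocal of |x|, with 1/0 treated as infinity."""
    return math.inf if x == 0 else 1.0 / abs(x)


def _fmt(v):
    """Round to the nearest whole patient for display (an assumption)."""
    return "∞" if math.isinf(v) else f"{v:.0f}"


def format_nnt(arr, ci_low, ci_high):
    """Express an ARR and its CI (all proportions) as NNTB or NNTH."""
    lo, hi = min(ci_low, ci_high), max(ci_low, ci_high)
    kind = "NNTB" if arr > 0 else "NNTH" if arr < 0 else "NNT"
    point = f"{kind} {_fmt(_inv(arr))}"
    if lo < 0 < hi:
        # CI of the ARR spans zero: the NNT interval passes through infinity
        return f"{point} (NNTH {_fmt(_inv(lo))} to ∞ to NNTB {_fmt(_inv(hi))})"
    # Both limits on the same side of zero: invert and exchange them
    a, b = sorted((_inv(lo), _inv(hi)))
    return f"{point} ({_fmt(a)} to {_fmt(b)})"


print(format_nnt(-0.075, -0.107, -0.042))  # relapse component quoted for OFLOTUB
print(format_nnt(0.036, 0.007, 0.066))     # on-treatment component quoted for OFLOTUB
print(format_nnt(0.01, -0.03, 0.05))       # hypothetical interval spanning zero
```

For the OFLOTUB relapse estimate quoted later in this paper (ARR −7.5%, 95% CI −10.7% to −4.2%), the sketch gives NNTH 13 (9 to 24), which agrees with the mITT figure tabulated for that trial. The third call uses made-up values only to show the discontinuous interval.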

Clinical trial | PP | mITT |
---|---|---|
OFLOTUB | NNTH = 16 (41–10) | NNTH = 13 (24–9) |
REMOX (H) | NNTH = 15 (NNTH 5 to ∞ to 39) | NNTH = 17 (NNTH 6 to ∞ to 47) |
REMOX (E) | NNTH = 9 (14–6) | NNTH = 11 (16–8) |
RIFAQUIN, 4M | NNTH = 8 (15–5) | NNTH = 9 (19–6) |
RIFAQUIN, 6M | NNTB = 59 (NNTB 17 to ∞ to NNTH 40) | NNTB = 120 (NNTB 25 to ∞ to NNTH 42) |

Abbreviations: CI, confidence interval; E, ethambutol; H, isoniazid; mITT, modified intent-to-treat population; NNT, number needed to treat; NNTB, NNT for one patient to benefit; NNTH, NNT for one patient to be harmed; PP, per-protocol; TB, tuberculosis.

Example 1: fluoroquinolone-substitution trials in DSTB. In the OFLOTUB trial [

By contrast, with a 6 month moxifloxacin regimen with rifapentine given intermittently in the continuation phase (the RIFAQUIN trial [

Example 2: STREAM trial in MDRTB [

Using a composite end point is practical (as it summarises findings into a single message), but we must be aware of two potential issues.

One is that, as mentioned earlier, changing the end point from ‘relapse’ to ‘unfavourable outcome’ (generally including primary failure, relapse, and death) inflates the failure rate and affects the power and sample size calculation of the study. For instance, a change from a 5% to an approximately 10%–15% failure rate (depending on the population analysed) means that the required sample size could double or triple; for a noninferiority margin set at 6%, the sample size would increase 1.8–2.6-fold, 1.9–2.8-fold, 2.1–3.1-fold, and 2.3–3.5-fold for risk differences from 1% to 4%, respectively. An example of implications for sample size calculation is presented in

The sample size was calculated with a one-sided chi-squared test for comparison of two groups by specifying the delta, the reference proportion, and the expected difference. The test statistic is assumed to have a null distribution of N(0,1). A description of the underlying calculations can be found in Julious and Owen [
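As a rough sketch of this kind of calculation, the snippet below uses a standard normal-approximation formula for a noninferiority comparison of two proportions. It is not claimed to reproduce the figures in this paper or the exact method of Julious and Owen: the function name, the one-sided alpha of 0.025, and the 90% power are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def ni_sample_size(p_ref, expected_diff, delta, alpha=0.025, power=0.90):
    """Approximate per-arm sample size for a noninferiority trial (sketch).

    p_ref: unfavourable-outcome proportion expected in the reference arm.
    expected_diff: expected excess of unfavourable outcomes in the new arm.
    delta: noninferiority margin, as a positive proportion.
    alpha: one-sided type I error (assumed); power: 1 - type II error (assumed).
    """
    if expected_diff >= delta:
        raise ValueError("expected difference must be smaller than the margin")
    p_new = p_ref + expected_diff
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_ref * (1 - p_ref) + p_new * (1 - p_new)
    # n = (z_alpha + z_beta)^2 * variance / (delta - expected_diff)^2
    return ceil((z_alpha + z_beta) ** 2 * variance / (delta - expected_diff) ** 2)


# Effect of the reference failure rate, for a 6% margin and a 1% expected difference
for p_ref in (0.05, 0.10, 0.15):
    print(p_ref, ni_sample_size(p_ref, expected_diff=0.01, delta=0.06))
```

Under these assumptions, the required size grows with the reference failure rate because the variance term p(1 − p) grows as p moves from 5% towards 15%, and it grows sharply as the expected difference approaches the margin, because the denominator shrinks.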

The other complication with a composite end point is that there might be discordant results within its individual components. An illustrative example can be found in the OFLOTUB trial (

ARR, absolute risk reduction; CI, confidence interval; NNTB, number needed to treat for one patient to benefit; NNTH, number needed to treat for one patient to be harmed.

The primary efficacy end point was a composite unfavourable outcome including on-treatment events (failure, death, other adverse event, drop-out, and withdrawal) plus posttreatment recurrence by month 24 in the mITT population. This gives an unadjusted ARR (95% CI) of −3.8% (0.4% to −8%). However, when one looks at these two components separately, the short regimen is significantly better than the standard treatment for on-treatment outcomes (ARR [95% CI] 3.6% [0.7%–6.6%]) but significantly worse for relapse (−7.5% [−4.2% to −10.7%]); at the same time, it also has significantly fewer posttreatment losses to follow-up (5.9% [2.4%–9.5%]).
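Component-wise estimates of this kind can be obtained from the event counts in each arm. The sketch below computes the ARR with a classical Wald 95% CI. The counts are hypothetical (the trial's counts are not given here), the function name is an assumption, and the Wald interval is used only for simplicity despite its known limitations.

```python
from math import sqrt
from statistics import NormalDist


def arr_wald_ci(events_new, n_new, events_std, n_std, level=0.95):
    """ARR = risk(standard) - risk(new), with a Wald CI (illustrative sketch).

    Positive values favour the new regimen, matching the sign convention
    used for the ARR in the text.
    """
    risk_new = events_new / n_new
    risk_std = events_std / n_std
    arr = risk_std - risk_new
    se = sqrt(risk_new * (1 - risk_new) / n_new
              + risk_std * (1 - risk_std) / n_std)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return arr, arr - z * se, arr + z * se


# Hypothetical counts for two components of a composite end point
components = {
    "on-treatment events": (10, 200, 18, 200),   # fewer events on the new regimen
    "relapse": (30, 200, 20, 200),               # more events on the new regimen
}
for name, counts in components.items():
    arr, low, high = arr_wald_ci(*counts)
    print(f"{name}: ARR {arr:+.1%} ({low:+.1%} to {high:+.1%})")
```

Reporting each component alongside the composite makes opposite-signed effects visible, which a single composite ARR would average away.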

It is therefore prudent to dissect composite end points in order to verify that the individual components do not have conflicting implications for regimen effectiveness [

The question now is: using the previously mentioned examples, how would stakeholders (national TB programme managers, caregivers, patients) weigh losses and gains?

In the case of a one-third reduction in treatment duration for DSTB given under directly observed treatment (DOT), what would be the practical gains for the health system (e.g., more time for patient visits, reduced costs, increased efficiency) versus having to deal with one more failure every 10 rather than every 40 cases? Would the advantages of a shortened treatment and faster resolution, along with the smaller loss of wages for patients, outweigh the disadvantages of excess relapses? Can a national TB programme gear up for actively and systematically following up with patients in order to identify and deal with relapses promptly?

Another example for DSTB: How would health providers and patients value a regimen that is given for the same total duration but weekly (instead of daily) in the 4 month continuation phase [

Similarly, for MDRTB, where a patient currently stays on treatment for 1.5–2 years, how would they value a regimen that is half as long and might either produce a benefit every 7–13 patients treated or one more failure every 11? The gains here may, however, be offset by the need for drug sensitivity testing and by the toxicity of injectable aminoglycosides used in these shorter regimens, at least until evidence is gained on replacing them with safer drugs [

Where would one draw the line? Information is required on a number of variables that, together, can help quantify gains and losses and thus inform both study design and treatment policy decisions. These cover a range of outcomes: not just efficacy and safety but also patients’ preferences and satisfaction, quality of life, healthcare providers’ preferences and performance, emergence of drug resistance, etc., which are rarely collected in clinical trials.

AE, adverse event; ARR, absolute risk reduction; CI, confidence interval; NNTB, number needed to treat for one patient to benefit; NNTH, number needed to treat for one patient to be harmed; SAE, serious adverse event.

Together with key stakeholders, we need to identify the critical questions and score the answers in order to inform both study design (and identify a ‘meaningful noninferiority margin’ in case of a noninferiority trial) and policy and practice (how to take advantage of the benefits and how to handle the potential problems).

Traditionally, we tend to derive this information from explanatory (‘phase 3’) trials. However, these suffer from an inherent limitation in this regard: they typically seek to measure the efficacy (and safety) gains introduced by a new intervention by minimising confounders and standardising eligibility criteria, which are the very elements we need to consider when deciding on the trade-offs. Instead, to better quantify the gains and losses of a new treatment and its associated effects, it would be useful to apply the noninferiority design to pragmatic (effectiveness) trials as well, analysed on the (m)ITT (as well as PP) population.

Though it has been wrongly used to justify ‘me-too’ medicines, the noninferiority design has a growing place in diseases like TB, which require treatments that are long and cumbersome for patients and health systems alike, and where attributes like adherence, user-friendliness, and tolerability are critical to real-life effectiveness. Although the noninferiority design may be applied to treatment trials in both DSTB and DRTB against the current standard of care, this does not remove the responsibility for finding regimens that are both more effective and easier to comply with, especially for DRTB.

This design responds to the need for a planned trade-off between what we think we can afford to lose in terms of efficacy against what we expect to gain in terms of safety, effectiveness, ease of use, costs, etc. It is generally applied when a net gain in efficacy cannot realistically be shown within the conditions of a typical trial, though a superiority test can be applied if noninferiority is demonstrated. However, more work is required to develop end points for TB treatment trials, which will identify regimens that better serve the needs of patients as well as country TB programmes and health providers. Also, using the noninferiority design in pragmatic trials would provide useful information.

The noninferiority margin is a central element in study design and interpretation. Identifying and weighing the appropriate parameters for gains and losses is crucial to defining a ‘meaningful’ noninferiority margin.

PLO was also a staff member of the UNICEF/UNDP/World Bank/WHO Special Programme on Research and Training in Tropical Diseases (TDR), World Health Organization, Geneva, Switzerland, at the time of the writing of this paper. MV is a staff member of LIH.

The opinions expressed are those of the authors and do not necessarily represent the views and opinions of their employers.

Abbreviations: ARR, absolute risk reduction; CI, confidence interval; CONSORT, Consolidated Standards of Reporting Trials; DOT, directly observed treatment; DRTB, drug-resistant TB; DSTB, drug-sensitive TB; E, ethambutol; H, isoniazid; LL, lower limit; MDRTB, multidrug-resistant TB; mITT, modified intent-to-treat; NNT, number needed to treat; NNTB, NNT for one patient to benefit; NNTH, NNT for one patient to be harmed; OR, odds ratio; Pa, pretomanid; PP, per-protocol; R, rifampicin; RR, relative risk; STAND, Shortening Treatments by Advancing Novel Drugs; TB, tuberculosis; TBTC, Tuberculosis Trials Consortium; UL, upper limit; Z, pyrazinamide.