Development of new TB regimens: Harmonizing trial design, product registration requirements, and public health guidance

Christian Lienhardt and colleagues discuss the importance of communication and coordination between regulators, researchers, and policy makers to ensure tuberculosis trials provide high-quality evidence for policy decisions.


Introduction
Under the paradigm of adding a new drug to a regimen or substituting single drugs in a regimen one at a time, it would take 15-20 years to develop an entirely new tuberculosis (TB) regimen comprising three to four new drugs [1]. As has been noted in the papers of this Special …

The regulatory needs
In principle, regulatory authorities overseeing drug development have the primary responsibility of ensuring that the quality, efficacy, and safety of marketed medicinal products are adequate, conforming to currently defined standards. A key role of the regulatory authorities is to determine whether there is a positive benefit-risk balance to support use of the drug for the proposed indication and patient population.
Regulators also continue to reevaluate the benefit-risk balance after approval through pharmacovigilance activities and postmarketing studies. New data that emerge in the postapproval phase are taken into consideration in reassessing the benefit-risk balance, and information is communicated in product labeling as appropriate. Regulators, however, are not expected to consider cost-effectiveness or to perform in-depth evaluations of comparative effectiveness in assessing benefit and risk or in defining treatment policies. This role lies, rather, within the scope of public health recommending bodies, and, even if at times there seems to be some overlap, it is important to recognize and understand the implications of this distinction. Some regulatory agencies have mechanisms for accelerated review and early approval of new drugs that address unmet needs according to specified criteria, for example, the conditional marketing authorization pathway in the European Union, which applies where the benefit-risk balance of the new drug is such that immediate availability justifies acceptance of less comprehensive data than normally required [7,8]. In the United States, the accelerated approval pathway allows approval of a product for a serious disease with an unmet need based on a surrogate or an intermediate clinical endpoint that is reasonably likely to predict clinical benefit [9]. The accelerated approval pathway has been used primarily in conditions in which the disease course is long and an extended period of time would be required to measure the intended clinical benefit of a drug. The implication is that, while further data are being generated postapproval, there may be limited data to support policy recommendations. Development of new TB drugs and regimens is a good example of a scenario in which regulators need to establish that a drug submitted for licensure is safe and effective for the proposed use, whereas recommending bodies need to define how to use the drug optimally within a regimen in a way that addresses the public health need. Often, demonstrating the safety and effectiveness of a drug is the first step. Although a single clinical study cannot answer all research questions at once, it is still worth exploring clinical study designs that maximize the chance of gathering evidence that is informative both for assessing the benefit-risk balance of individual drugs and for determining their optimal use in the context of TB regimens. In view of the shift in focus toward the development of new treatment regimens, the European Medicines Agency (EMA) has proactively issued updated guidance to developers to address such scenarios [10]. In July 2017, the US Food and Drug Administration (FDA) held a public workshop on scientific and clinical trial design considerations for the development of new TB drug regimens [11]. Of note, the FDA and EMA work collaboratively to advise pharmaceutical sponsors and investigators on various aspects of clinical trial design and to ensure that, whenever feasible, the same development program addresses the regulatory requirements of both agencies (for instance, the FDA pre-investigational new drug (IND) consultative process allows facilitated early communication between the FDA and potential drug sponsors or investigators [12]).

PLOS Medicine | https://doi.org/10.1371/journal.pmed.1002915 September 6, 2019

… supplies or funding for PK sub-studies. One company, Sanofi, has provided 6 unrestricted grants to the CDC Foundation over the years 2007-2015 totaling ~$2.8 million to facilitate or support TBTC work related to rifapentine. These funds have supported several PK sub-studies, supported 3 contract research staff, have funded travel to TBTC scientific meetings for invited speakers (all in coach class), and have supported expenses related to fulfillment of company requests for data and data formats as part of their efforts to use TBTC data to support regulatory filings. None of these funds have otherwise benefited members of his research group.

The public health needs
Countries, technical agencies, donors, and other TB stakeholders routinely seek guidance and advice from WHO on optimal disease management practices to be adopted based on the available evidence. Over the last decade, WHO has published a series of normative guidance documents for the diagnosis and treatment of all forms of TB, with a particular focus on the needs of low- and middle-income countries [13]. In 2007, WHO adopted a procedure to ensure that guidelines are based on the best available evidence and meet the highest international standards. This procedure uses the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) framework, which relies on systematic reviews and meta-analyses; the findings of these reviews are then considered in the context of implementation and feasibility issues in stakeholder countries [14,15]. The GRADE framework provides an explicit and transparent approach to assess the level of certainty in the evidence across relevant studies and outcomes and to translate that evidence into recommendations. This framework incorporates multiple processes to minimize bias and optimize usability and requires rigor, fairness, and transparency in all judgments and decision-making.
To formulate evidence-based recommendations, four key aspects are taken into account: (1) the magnitude of the benefits and harms conferred by the intervention under evaluation; (2) resource use, feasibility, acceptability, and equity; (3) the certainty ("quality") of the evidence; and (4) patients' values and preferences. Based on this assessment, the proposed recommendation is qualified as "strong" or "conditional" (i.e., "weak"), reflecting the extent to which one can be certain, across the range of patients for whom the recommendation is intended, that the desirable effects of the intervention outweigh the undesirable effects. Assessing each of these aspects understandably introduces a number of nuances when moving from clinical trial results to public health policy making. The final qualification of the recommendation, in turn, has implications for the way policy makers, clinicians, and patients interpret and adopt the guidance, as shown in Table 1.
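The certainty component of this assessment can be illustrated with a deliberately simplified sketch. It assumes the commonly described GRADE convention that a body of randomized evidence starts at "high" certainty and observational evidence at "low", and that a serious concern in any of the five downgrading domains (risk of bias, inconsistency, indirectness, imprecision, publication bias) lowers certainty by one level, or two levels if very serious. The function and field names are hypothetical, upgrading criteria are omitted, and in practice these judgments are made by guideline panels rather than computed; the strength of a recommendation is not derived mechanically from this score.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """GRADE certainty levels, ordered from lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


# The five GRADE domains that can lower the certainty of a body of evidence.
DOWNGRADE_DOMAINS = ("risk_of_bias", "inconsistency", "indirectness",
                     "imprecision", "publication_bias")


def rate_certainty(randomized: bool, concerns: dict) -> Certainty:
    """Hypothetical, simplified GRADE-style certainty rating.

    `concerns` maps a domain name to 0 (no serious concern), 1 (serious)
    or 2 (very serious). Upgrading criteria are deliberately not modeled.
    """
    for domain, levels in concerns.items():
        if domain not in DOWNGRADE_DOMAINS or levels not in (0, 1, 2):
            raise ValueError(f"invalid concern: {domain}={levels}")
    # Randomized evidence starts at "high", observational evidence at "low".
    start = Certainty.HIGH if randomized else Certainty.LOW
    level = start - sum(concerns.values())
    # Certainty cannot drop below "very low".
    return Certainty(max(level, Certainty.VERY_LOW))
```

For example, `rate_certainty(True, {"imprecision": 1, "indirectness": 1})` returns `Certainty.LOW`: randomized evidence with serious imprecision and serious indirectness ends two levels below its starting point.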
Recent developments highlight how trial results used as the basis for regulatory approval may allow only conditional recommendations for policy making because of the use of surrogate endpoints and limited data on patient- and population-relevant outcomes. For example, the accelerated approval of bedaquiline by the US FDA in December 2012, based on the surrogate endpoint of sputum culture conversion at 6 months, allowed the drug to be readily used in the treatment of multidrug-resistant (MDR)-TB under certain conditions in the field [16]. However, the data gathered from the pivotal Phase II trial appeared inadequate for policy decision-making because they lacked information on the outcome of interest (nonrelapsing cure); further, the selected design did not provide information on the optimal use of the drug in combination with others or on whether adding the drug would allow any modification in treatment duration. Finally, there was an excess of deaths in the experimental arm, the significance of which was uncertain given the small sample size and lack of long-term follow-up. These limitations in the evidence available at the time of regulatory review led to the adoption of a conditional recommendation, which had implications for wider scale-up of the intervention. Thus, for bedaquiline, the results of the pivotal Phase II trial, together with relevant safety data, were adequate for obtaining regulatory approval but appeared insufficient for wider policy recommendations [17], and postlicensure evidence generation was needed. A large body of observational data obtained subsequently, together with large individual-patient data meta-analyses, allowed WHO to update its recommendations for MDR-TB treatment in December 2018 [18], with significant changes in the assessment of the quality of evidence.
As a result, bedaquiline is now strongly recommended for use in the treatment of MDR-TB, based on moderate-quality evidence; this shows the importance of collecting additional data to complement early trial results. It should be noted that, at the time, the standard of care for rifampicin-resistant (RR)-TB treatment had low efficacy and high toxicity and was based on observational evidence. Although these conditions are now changing, a similar situation may arise again in the future. The experience with bedaquiline therefore raises the question of whether specific trial features and designs can be used to produce endpoints of value to both the regulator and the policy maker. It is with this objective in mind that the Task Force on New Drug Policy Development, established by WHO in 2012, worked together with drug developers, regulators, scientists, and program managers to define the policy needs and produce relevant documents [19].

Methodological issues: How to fit both regulatory and programmatic decision-making needs
Could outcome definitions in clinical trials be redesigned to satisfy both regulatory and programmatic decision-making needs? We argue that this is feasible; the WHO Technical Consultation on Advances in Clinical Trials Design for TB Treatment Regimen proposed features and designs that could address this need, and these are described in greater detail in relevant papers of this Collection [2,20]. Regulatory agencies rightfully seek to use conservative approaches to endpoint evaluation, relying upon the protection from bias provided by randomization. For certain diseases, including MDR-TB, the accelerated approval pathway can be used based on a surrogate or an intermediate clinical endpoint that is reasonably likely to predict clinical benefit. These endpoints, however, are not fit for purpose for programmatic and policy needs. Although intensive efforts are underway to identify improved intermediate surrogate markers of treatment outcome that can accurately measure and describe the effect an experimental regimen will likely have on achieving nonrelapsing cure [21,22], no marker has yet been identified that fully serves the needs of TB investigators and regulators, let alone policy makers [23]. An equivalent to viral load in HIV and viral hepatitis trials has often been sought but not yet found, and current efforts are directed toward identifying markers that might reliably predict efficacy. In addition, a combination of bacterial (e.g., minimum inhibitory concentration [MIC]) and host (e.g., pharmacokinetic characteristics, adherence, and perhaps genetic or other features) factors would be of value in dose selection and for predicting outcome [24,25].
Relevant surrogate markers providing highly reliable estimates of treatment outcome, once identified, could provide sufficient evidence for guideline development beyond market approval [4], but until then, the TB therapeutics field has to look to novel trial designs, long-term endpoint definitions, and other trial features as a means of generating data pertinent to policy decisions [3].
The "composite" clinical trial endpoint (comprising multiple events, such as a combination of failure, relapse, and death) has been used as a mechanism to capture multiple serious outcomes of interest from a programmatic perspective, often allowing for smaller sample sizes. The use of composite endpoints, however, poses some problems, the most significant being that the component endpoints differ in individual and public health value (e.g., death is always a worse outcome than any other). Further, there are often varying levels of certainty around different endpoints (for example, cause of death is often uncertain in trials performed in low-resource settings). The components of a composite endpoint should be chosen carefully: because the occurrence of any one of them counts as an endpoint event, each component carries equal weight in the analysis of the composite [26]. For these reasons, when composite outcomes are used, it is essential that information on all their components be collected in such a way that they can be disaggregated and individually reported. As an illustration, endpoints of ongoing Phase II and Phase III trials of TB drugs or regimens are shown in Table 2.
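To make the disaggregation point concrete, here is a minimal sketch using invented data for two hypothetical trial arms. Each participant record flags the components of a composite unfavorable endpoint (failure, relapse, death); the composite counts a participant once if any component occurs, while per-component tallies are kept for separate reporting. The class, field names, and counts are illustrative assumptions, not results from any trial.

```python
from collections import Counter
from dataclasses import dataclass

COMPONENTS = ("failure", "relapse", "death")


@dataclass(frozen=True)
class Participant:
    arm: str
    failure: bool = False
    relapse: bool = False
    death: bool = False

    def composite_event(self) -> bool:
        # Any single component counts as an endpoint event, with equal weight.
        return any(getattr(self, c) for c in COMPONENTS)


def summarize(participants, arm):
    """Return (composite events, per-component counts, n) for one arm."""
    group = [p for p in participants if p.arm == arm]
    composite = sum(p.composite_event() for p in group)
    components = Counter(c for p in group for c in COMPONENTS if getattr(p, c))
    return composite, components, len(group)


# Invented data: both arms have 10 composite events among 100 participants,
# but the mix of components differs.
trial = (
    [Participant("A", failure=True)] * 4 + [Participant("A", relapse=True)] * 5
    + [Participant("A", death=True)] * 1 + [Participant("A")] * 90
    + [Participant("B", failure=True)] * 2 + [Participant("B", relapse=True)] * 2
    + [Participant("B", death=True)] * 6 + [Participant("B")] * 90
)

for arm in ("A", "B"):
    composite, components, n = summarize(trial, arm)
    print(arm, f"composite {composite}/{n}", dict(components))
```

Both invented arms show the same composite rate of 10%, yet arm B has six times as many deaths as arm A; that difference disappears if only the composite is reported.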
Noninferiority (NI) design has become the design of choice in most Phase II and Phase III trials of new TB drugs and regimens over the last decade, either because of the high efficacy of the control regimens (as in drug-susceptible TB) or because of the interest in shortening treatment (as in DR-TB). NI trial designs, however, pose a number of methodological questions, particularly in terms of analysis [27]. In NI trials, two effects are of interest: the effect in all randomized patients and the effect in those who can adhere to treatment; these have historically been estimated using the intention-to-treat (ITT) and the per-protocol (PP) populations, respectively [28]. The ITT principle allows virtually all patients to contribute information to the primary trial analysis. In this approach, all randomized patients are included in the analysis, and favorable status is assigned only to those whose favorable outcome is documented; all others are deemed unfavorable or nonassessable (including those lost to follow-up, those whose therapy is altered, and those who die or withdraw early). The PP population, conversely, is composed of randomized and otherwise eligible participants who complete the trial without significant deviation from the intended trial conduct; in particular, such participants typically satisfy minimal requirements for adherence to the trial interventions. For a robust interpretation, analyses of the two populations should lead to similar conclusions [29]. The ICH E9 Guideline further specifies that "any differences between them can be the subject of explicit discussion and interpretation" [30]. Examining both populations matters in part because adherent participants may differ in unknown ways from nonadherent participants; for example, they may have more favorable outcomes regardless of their randomized therapy [31].
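The comparison of the two analysis populations can be sketched as follows. The sketch assumes a binary outcome analyzed as the difference in unfavorable-outcome proportions (experimental minus control), a simple Wald 95% confidence interval, and a hypothetical NI margin of 10 percentage points; NI is concluded when the upper confidence bound lies below the margin (the upper bound of a two-sided 95% interval corresponds to a one-sided 2.5% test). The counts, margin, and interval method are illustrative assumptions; real trials prespecify all three and often use more refined interval methods.

```python
from statistics import NormalDist


def risk_difference_ci(events_exp, n_exp, events_ctl, n_ctl, level=0.95):
    """Wald CI for the difference in unfavorable proportions (exp - ctl)."""
    p_exp, p_ctl = events_exp / n_exp, events_ctl / n_ctl
    diff = p_exp - p_ctl
    se = (p_exp * (1 - p_exp) / n_exp + p_ctl * (1 - p_ctl) / n_ctl) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


def noninferior(upper_bound, margin):
    # NI holds if the experimental arm is at most `margin` worse than control.
    return upper_bound < margin


MARGIN = 0.10  # hypothetical NI margin: 10 percentage points

# Invented counts. ITT: all randomized; anyone without a documented favorable
# outcome (lost, withdrawn, died, regimen changed) is counted as unfavorable.
itt = dict(events_exp=40, n_exp=200, events_ctl=30, n_ctl=200)
# PP: adherent completers only, after removing nonassessable participants.
pp = dict(events_exp=16, n_exp=170, events_ctl=14, n_ctl=175)

for name, counts in (("ITT", itt), ("PP", pp)):
    diff, lo, hi = risk_difference_ci(**counts)
    print(f"{name}: diff={diff:+.3f} 95% CI ({lo:+.3f}, {hi:+.3f}) "
          f"NI={noninferior(hi, MARGIN)}")
```

With these invented counts the PP analysis meets the margin (upper bound about 0.074) while the ITT analysis does not (about 0.124); under the principle cited above, such a discrepancy calls for explicit discussion rather than a claim of NI.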
The analyses of these trials are most robust when adherence is high. Inadequate therapy in all trial arms may lead to similarly poor performance across arms, and because nonadherers are imputed as treatment failures in the analysis of all randomized patients, this can mask real differences between regimens and risk a false conclusion of NI. Consequently, it is extremely important that trial protocols encourage a high level of adherence. Finally, the generalizability of findings from preapproval clinical trials to the populations and settings of interest to policy makers is also a significant concern. Some populations may be underrepresented in clinical trials conducted for approval (e.g., children, elderly people, pregnant women, persons with advanced comorbid illness), whereas others are excluded for reasons of feasibility (e.g., those living far from a clinic or deemed unreliable for follow-up). Significant problems have arisen from assuming generalizability [32]. When a successful trial establishes the efficacy of a new agent or regimen, efforts are then needed to evaluate the regimen in broader populations, for example through additional pragmatic trials such as the endTB trial [33]. The need for such trials is unlikely to be addressed through innovations in design, but the rationale for excluding special populations even from early and middle phases of development is currently being revisited in the TB therapeutics field [2,3,20].
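A small worked calculation shows how low adherence can push an ITT analysis toward NI. It assumes, as a simplification, that a fraction q of participants in each arm is nonadherent and imputed as unfavorable, and that adherent participants have unfavorable-outcome risks p_E and p_C. The ITT risk in each arm is then q + (1 - q)p, so the between-arm difference becomes (1 - q)(p_E - p_C): the true difference shrinks as nonadherence grows. The risks and nonadherence levels below are hypothetical, and the sketch ignores changes in confidence-interval width.

```python
def itt_unfavorable_risk(p_adherent, nonadherence):
    """ITT risk if every nonadherer is imputed as unfavorable (assumption)."""
    return nonadherence + (1 - nonadherence) * p_adherent


# Hypothetical risks among adherent participants: the experimental regimen
# is truly 12 percentage points worse than control.
P_EXP, P_CTL = 0.22, 0.10

for q in (0.0, 0.2, 0.4):
    diff = itt_unfavorable_risk(P_EXP, q) - itt_unfavorable_risk(P_CTL, q)
    print(f"nonadherence {q:.0%}: ITT difference = {diff:.3f}")
```

Against a hypothetical 10-point margin, the point estimate moves from 0.120 (outside the margin) to 0.096 at 20% nonadherence and 0.072 at 40%, even though the regimens themselves are unchanged.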

The link between registration and public health recommendations: Implications for national TB programs and the way forward
For TB program managers and policy makers at the country level, the successful registration of a candidate drug is only one component of the decision-making process around adoption and use. Feasibility, acceptability, resource use, equity, and quality of life are also considered when formulating public health recommendations, and these rely on qualitative data that need to be collected in parallel with the quantitative assessment of evidence. WHO guidelines are key for the development of national policies for the care of TB patients. However, when reliable data are lacking, recommendations are predominantly based on evidence of low or very low certainty, which creates challenges for the rapid adoption, successful implementation, and subsequent uptake of new therapies, as has been the case with the treatment of DR-TB [34,35]. Moreover, recommendations, even if based on low or very low certainty evidence, often create the perception of a new "standard of care" that subsequently complicates the ability to fund and conduct pragmatic trials that would address the uncertainty left by the lack of data. Policy makers, donors, and ethical review bodies should be aware that significant uncertainty persists when recommendations based on low or very low certainty evidence are adopted and that further research is essential to test the merits of the proposed new standard of care. Such additional research can generate postlicensure data that are important for updating policies, as in the case of the recent WHO DR-TB treatment guidelines [18,36] (Table 3).
Drug and regimen developers already have formal mechanisms of communication with regulators, but the engagement of policy-recommending institutions should be actively encouraged and pursued as early as possible, at the design stage. One example of the value of such communication relates to the definition of outcomes selected for trials. Discussions with regulatory authorities usually identify endpoints that establish the foundations of efficacy, safety, and tolerability in studies with shorter follow-up; however, these outcomes may not provide adequate information for guideline developers and policy makers to endorse a given drug for use in regimens. Integrating long-term outcomes into TB trials as far as feasible, along with standardizing outcomes, should be a top priority for the TB therapeutics field, using, for example, the novel Phase IIC design, in which follow-up is extended and the experimental regimens are given for their intended total duration [37].
Finally, regulatory bodies require standardized data collection and outcome definitions compatible with the Clinical Data Interchange Standards Consortium (CDISC) platforms. These standards have enhanced the ability to make optimal use of GRADE-based methodological approaches to evaluating the evidence and should be similarly considered by policy makers. Applying such data standards to cohorts and to the collection of national TB program data would be an invaluable step forward, allowing real-world data analyses that will greatly inform policy decisions. Until then, TB clinical trialists and regimen developers are strongly encouraged to share individual patient-level data with policy makers so that meta-analytic data synthesis approaches can be used in the GRADE methodology [38]. Data sharing in the domain of TB is a global public good, and funders, donors, and implementers of trials should not only mandate such expectations for their clinical trials but also allocate funding to support careful curation of data that are accessible to the public and to policy makers for future analyses. Postauthorization studies should also be considered to answer questions that cannot be addressed in the registrational trial(s) and to help bridge gaps in knowledge.
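Why shared outcome definitions matter for pooling individual patient-level data can be sketched briefly. In this sketch, each hypothetical trial records end-of-treatment outcomes under its own local labels; a mapping table translates them into one common outcome vocabulary before the records are pooled, and unmapped labels are rejected rather than guessed. The trial identifiers, labels, and mappings are invented for illustration and are not CDISC terminology.

```python
from collections import Counter

# Hypothetical common outcome vocabulary shared by all contributing trials.
COMMON_OUTCOMES = {"cure", "failure", "relapse", "death", "lost_to_follow_up"}

# Hypothetical mappings from each trial's local labels to the common vocabulary.
LOCAL_TO_COMMON = {
    "TRIAL-1": {"cured": "cure", "fail": "failure", "recurrence": "relapse",
                "died": "death", "LTFU": "lost_to_follow_up"},
    "TRIAL-2": {"CURED": "cure", "TREATMENT FAILURE": "failure",
                "RELAPSE": "relapse", "DEATH": "death",
                "DEFAULT": "lost_to_follow_up"},
}

# Check once that every mapping targets a defined common outcome.
for trial_id, mapping in LOCAL_TO_COMMON.items():
    undefined = set(mapping.values()) - COMMON_OUTCOMES
    if undefined:
        raise ValueError(f"{trial_id} maps to undefined outcomes: {undefined}")


def harmonize(trial_id, records):
    """Convert (patient_id, local_label) pairs into pooled, standardized rows."""
    mapping = LOCAL_TO_COMMON[trial_id]
    rows = []
    for patient_id, local_label in records:
        if local_label not in mapping:
            # Reject unknown labels rather than guessing a definition.
            raise ValueError(f"{trial_id}: unmapped outcome {local_label!r}")
        rows.append({"trial": trial_id, "patient": patient_id,
                     "outcome": mapping[local_label]})
    return rows


pooled = (harmonize("TRIAL-1", [("001", "cured"), ("002", "recurrence")])
          + harmonize("TRIAL-2", [("A7", "CURED"), ("A8", "DEFAULT")]))
print(Counter(row["outcome"] for row in pooled))
```

The design choice here is to fail loudly on any label without an agreed definition, since silently coercing outcomes is exactly what makes pooled analyses unreliable.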
Treatment success outcomes in recent trials of MDR-TB were much higher than those reported in prior trials and across program settings. Further research is needed to better understand the performance of the standard of care for rifampicin-susceptible and rifampicin-resistant TB in various conditions and settings to aid in the design of future studies.
How can current/novel clinical trial endpoints that are intended to support regulatory decisions be subsequently translated to support programmatic implementation?
Operational research can help to translate clinical trial outcomes into WHO guidance and add evidence for better programmatic implementation. Often, patients enrolled in trials are not reflective of the general population; consider ways to make trial populations more reflective of the patients who will be receiving treatment in real life. Also consider pragmatic studies for better evidence on programmatic implementation.
Should the assessment of clinical trial outcomes be updated for harmonization across regulatory and programmatic objectives, and if yes, how?
Communication between drug/regimen developers, regulators, and recommendation bodies is essential and should be encouraged and facilitated as early as possible at the design stage.
Approaches to collecting clinical outcomes data that can potentially address assessment of the safety and efficacy of the product and answer questions that are important from a programmatic perspective should address the following:
• secondary/exploratory analyses are an option, but caution is needed to avoid overinterpreting the data
• sample size implications if multiple primary analyses are considered
• importance of prespecifying analyses; consistent definitions across different trials are needed; limitations of using surrogate endpoints (e.g., 2-month culture conversion) for development of guidelines.
How to ensure that trial data at the individual-patient level can be pooled for enhanced meta-analysis when reviewing evidence for policy making by WHO and other professional bodies?
Data should be collected using standard definitions, and use of data standards for clinical trials is essential. Clinical trial data should be made available for sharing so as to conduct individual patient-level data analyses. Such databases are used by WHO and other recommending bodies for policy development. The GRADE method should be well understood by all stakeholders. As data quality improves, recommendations based on lower-quality data should be reexamined, and a relevant process to address this should be established.
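The sample-size point raised above can be quantified with the standard normal-approximation formula for an NI comparison of two proportions: n per arm = (z_{1-α} + z_{1-β})² [p_E(1 - p_E) + p_C(1 - p_C)] / (δ - (p_E - p_C))², where α is one-sided, 1 - β is the power, and δ is the NI margin for the difference in unfavorable proportions. The sketch assumes that success on either of two primary analyses would support a claim, and uses a Bonferroni split of α as one conventional option. (When both analyses must succeed, each can be tested at the full α, but the joint power falls instead.) The risks, margin, and power are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def ni_sample_size(p_exp, p_ctl, margin, alpha=0.025, power=0.9):
    """Per-arm n for an NI test on a difference in unfavorable proportions.

    alpha is one-sided; margin is the largest acceptable excess risk.
    """
    z = NormalDist().inv_cdf
    variance = p_exp * (1 - p_exp) + p_ctl * (1 - p_ctl)
    effect = margin - (p_exp - p_ctl)
    if effect <= 0:
        raise ValueError("expected difference must lie inside the margin")
    return ceil((z(1 - alpha) + z(power)) ** 2 * variance / effect ** 2)


# Hypothetical design: 10% unfavorable in both arms, 6-point margin.
single = ni_sample_size(0.10, 0.10, 0.06)
# Two primary analyses sharing the one-sided alpha through a Bonferroni split.
split = ni_sample_size(0.10, 0.10, 0.06, alpha=0.025 / 2)
print(single, split)
```

In this hypothetical design, splitting α raises the requirement from about 526 to about 621 participants per arm, roughly 18% more.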

Conclusion
Given the recent enthusiasm for pursuing novel trial designs in TB therapeutics [37,39], more interaction will be needed among researchers responsible for designing the next generation of TB trials, regulators, and policy makers. This will allow better harmonization across the research pipeline and subsequent policies on access to TB medicines. Further, stakeholders, including donors and funders, need to acknowledge that both explanatory and pragmatic trials are needed to answer questions about efficacy and safety (explanatory) as well as expected effectiveness under programmatic conditions (pragmatic). In all cases, endpoints should be fit for the specific purpose of the trial. Late-phase clinical trial outputs that serve the objective of registration of a new TB drug or regimen can indeed meet the needs for development of public health guidelines, provided that data on long-term, patient-relevant, and population-relevant outcomes are collected. Additionally, public health factors such as feasibility, acceptability, resource use, equity, and quality of life should be part of data collection, as these are necessary when formulating public health recommendations. The existing dialogue between drug developers and regulators should be expanded to include policy makers under formal mechanisms of consultation, such as the one offered by WHO Task Forces [19]. More effective input from policy makers could greatly streamline and strengthen the value of TB clinical trial data in clinical settings, and such input is most valuable at the design stage.
The broad discussions that we propose would also help ensure that secondary pooled analyses performed by WHO (or other policy-recommending bodies) are reliable, reduce the risk of conflicting interpretations and messaging from investigators and policy makers, and usefully contribute to the generation of reliable and relevant data for further policy guidance on the treatment of all forms of TB [2].