Effect of Common Comparators in Indirect Comparison Analysis of the Effectiveness of Different Inhaled Corticosteroids in the Treatment of Asthma

Purpose: Indirect comparison (IC) and direct comparison (DC) of four inhaled corticosteroid (CS) treatments for asthma were conducted, and the factors that may influence the results of IC were investigated. Among those factors, we focused on the effect of common comparator selection in the treatment of asthma, where little control group bias or placebo effect is expected.
Method: IC and DC were conducted using the change from baseline in forced expiratory volume in 1 s (FEV1(L)) as an outcome parameter. Differences between inhaled CS were evaluated to compare the results of IC and DC. As a common comparator for IC, placebo (PLB) or mometasone (MOM) was selected. Whether the results of IC are affected by the selection of a common comparator and whether the results of IC and DC are consistent were examined.
Results: Twenty-three articles were identified by a literature search. Our results showed that ICs yielded results similar to DCs in the change from baseline of FEV1(L). No statistically significant difference was observed in inconsistency analysis between ICs and DCs. It was clinically and statistically confirmed that ICs with PLB and those with MOM did not differ in terms of the results of FEV1(L) analysis in this dataset.
Conclusion: This study demonstrated that ICs among inhaled CS can deliver results consistent with those of DCs when using the change from baseline in FEV1(L) as an outcome parameter in asthma patients. It was also shown that using an active comparator yields similar results if there is no effect of control group bias. It should be emphasized that the investigation of control group bias is a key factor in conducting relevant ICs so that an appropriate common comparator can be selected.


Introduction
Indirect comparison (IC) analysis has recently been recognized as an alternative method for investigating the efficacy and safety of target interventions when head-to-head comparison data are not available. The number of studies reporting the results of ICs and network meta-analysis is increasing [1]. ICs are used not only in scientific investigations but also in healthcare decision making to assess the efficacy and safety of interventions. For healthcare decision making such as reimbursement evaluation and health technology assessment, some authorities, such as the Canadian Agency for Drugs and Technologies in Health, the National Institute for Health and Care Excellence in the UK, and the Institute for Quality and Efficiency in Health Care in Germany, have been increasingly accepting IC results [2]. At the same time, some reports on ICs and network meta-analysis may not have sufficiently investigated the statistical methods and/or appropriateness of the datasets analyzed [3], and therefore it is necessary to establish transparent, uniform methods to assess the quality of ICs.
In response to the above, recently the International Society for Pharmacoeconomics and Outcomes Research-Academy of Managed Care Pharmacy-National Pharmaceutical Council (ISPOR-AMCP-NPC) Good Practice Task Force has proposed using a consensus-based 26-item questionnaire to help decision makers assess the relevance and credibility of ICs of treatment options and network meta-analysis to help inform healthcare decision making [2]. The 26 items are divided into the following five categories: evidence base (selection of study); analysis (statistical method); report quality and transparency; interpretation; and conflict of interest.
We previously reported an IC study of antipsychotics to investigate factors that may influence the results [4]. Control group bias was found to cause differing results between DC and IC. Typical control group bias can be observed between active-controlled and placebo (PLB)-controlled studies of mental disorders. If such bias occurs, the absolute value of improvement in the efficacy outcome parameter is usually greater in active-controlled trials than in PLB-controlled trials, and the absolute dropout rate is usually higher in the latter than in the former. In other words, the difference in the control group can lead to the inflation of outcome parameter scores in some therapeutic areas.
We also pointed out that a well-defined endpoint should be used for IC analysis to obtain consistent results. At the same time, little control group bias and placebo effect are expected for inhaled corticosteroid (CS) studies in asthma because objective assessments such as spirometry measurement are commonly used for assessing the efficacy of inhaled CS, while subjective assessments are generally used for evaluating psychiatric diseases such as schizophrenia, depression, and anxiety disorders. In this paper, we not only report IC and DC results but also investigate factors that may influence the outcomes of IC to highlight points for consideration to ensure that it yields credible results. As one such factor, we focused on the effect of common comparator selection using inhaled CS studies for the treatment of asthma as an example.

Study selection
A literature search was conducted in PubMed and Embase, using the keywords "fluticasone," "budesonide," "beclomethasone," "mometasone," "forced expiratory volume," and "asthma." Those four interventions were selected because inhaled CS is recommended as the first intervention for mild-to-moderate asthma patients in the Global Initiative for Asthma guidelines [5] that are widely followed in clinical practice. The search was limited to "randomized controlled trial" and conducted in December 2013. All literature published in English from January 1990 through December 2013 was searched. After screening the search results, reports using similar doses and treatment periods ranging from 4 to 26 weeks were selected. If inhaler devices were different, for example, aerosol and dry-powder inhalers, it was first determined whether the conversion dosages were clinically equal. If they were equal, the data were included. Crossover studies were excluded from the analysis because the carry-over treatment effect may cause misleading results. The quality of the reports was evaluated based on the Jadad score [6], and those with scores of ≥3 were selected for this analysis. One of the authors (T.K.) initially selected the literature and extracted all the data. The literature was independently searched by another author (M.H.), who also independently confirmed each value.

Outcome parameters
The primary efficacy endpoint for this analysis was the change from baseline in forced expiratory volume in 1 s (FEV1(L)) as assessed using spirometry.

Statistical methods
As a common comparator for ICs, PLB or mometasone (MOM) was selected. We first investigated whether the results of ICs were affected by the selection of a common comparator and then examined the results of ICs with PLB and DCs.
As described previously [4], to conduct ICs we first carried out meta-analyses of the data reported in the literature for each pair of assessed interventions using Review Manager software version 5. Mean difference analysis was conducted to assess the change from baseline in FEV1(L). We applied the random-effects model in this study because some $I^2$ values in the meta-analyses suggested the existence of heterogeneity. In conducting the IC for each analysis, we followed Bucher et al.'s method [7], using the meta-analysis data obtained with Review Manager for inhaled CS vs MOM or vs PLB:
$$D_{IC} = D_1 - D_2, \qquad SE_{IC} = \sqrt{SE_1^2 + SE_2^2}$$
where $D_1$ and $D_2$ are the mean differences in the change from baseline in FEV1(L) obtained by meta-analysis between drug 1 or drug 2 and the common comparator; $SE_1$ and $SE_2$ are the standard errors of those mean differences; $D_{IC}$ is the mean difference in the change from baseline in FEV1(L) between drug 1 and drug 2 obtained by IC; and $SE_{IC}$ is the standard error of that mean difference.
The results were used to investigate statistical inconsistencies between IC and DC results or among ICs using different common comparators. The assumption of consistency can be evaluated by comparing $D_{DC}$ and $D_{IC}$ in a simple z-test [1]. We estimated the inconsistency in a closed loop as $D_{inconsis} = D_{DC} - D_{IC}$ (often called the inconsistency factor) and its 95% confidence interval (95% CI) using
$$SE_{inconsis} = \sqrt{SE_{DC}^2 + SE_{IC}^2}$$
where $D_{DC}$ is the mean difference obtained by DC and $D_{IC}$ is that obtained by IC; and $SE_{DC}$ and $SE_{IC}$ are the standard errors of the mean difference in the change from baseline in FEV1(L) obtained by DC and IC, respectively. Inconsistency between ICs with MOM and PLB can be calculated using the same method. The 95% CI can be calculated as
$$D_{inconsis} \pm 1.96 \times SE_{inconsis}$$
and can be applied to the statistical evaluation of whether there is consistency between IC and DC results.
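The adjusted indirect comparison and the inconsistency test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study (which relied on Review Manager for the pairwise meta-analyses): the function names `bucher_indirect` and `inconsistency` are hypothetical, and the input values in the usage lines are made up for illustration only.

```python
import math
from statistics import NormalDist


def bucher_indirect(d1, se1, d2, se2):
    """Indirect estimate of drug 1 vs drug 2 through a common comparator.

    d1, d2   : pooled mean differences of drug 1 and drug 2 vs the comparator
    se1, se2 : standard errors of those pooled mean differences
    """
    d_ic = d1 - d2
    se_ic = math.sqrt(se1 ** 2 + se2 ** 2)
    return d_ic, se_ic


def inconsistency(d_a, se_a, d_b, se_b, level=0.95):
    """Inconsistency factor between two estimates of the same contrast
    (e.g. DC vs IC, or IC via PLB vs IC via MOM), with CI and z-test."""
    diff = d_a - d_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    ci = (diff - z_crit * se, diff + z_crit * se)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return diff, ci, z, p


# Illustrative (made-up) inputs: each drug vs a common comparator.
d_ic, se_ic = bucher_indirect(0.30, 0.03, 0.20, 0.04)
# Illustrative (made-up) direct estimate for the same contrast.
diff, ci, z, p = inconsistency(0.12, 0.06, d_ic, se_ic)
print(d_ic, se_ic, diff, ci, p)
```

If the 95% CI of the inconsistency factor contains zero (equivalently, the two-sided p-value exceeds 0.05), the two estimates are not statistically inconsistent at that level.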

Eligible studies and characteristics
Twenty-three studies that fulfilled the selection criteria were identified by the literature search (Table 1, Fig. 1). The majority included a PLB arm, so PLB can be used as a common comparator in various comparisons. The number of studies that compared more than one active intervention was limited, however. Fluticasone propionate (FP) was compared with MOM in three studies, beclomethasone dipropionate (BDP) was compared with MOM in one study, and budesonide (BUD) was compared with MOM in one study (Fig. 2). Those studies contributed to the formation of a closed loop for the analysis of statistical inconsistency. There were no studies comparing FP vs BDP, FP vs BUD, or BDP vs BUD. As a result, MOM was selected as an active common comparator for further investigation of common comparator effects. When more than one dose was used in a report, the highest-dose arm was selected as long as it was within the US Food and Drug Administration-approved level. Safety endpoint analysis was not conducted in this study since no critical event has been reported with inhaled CS for acute-phase treatment, and limited safety information was available in this dataset.
Indirect analysis of changes in FEV1(L) using PLB or MOM as a common comparator
First, DCs between four inhaled CS and PLB or MOM were conducted. Subsequently, those DC data were applied for ICs between FP and BUD, FP and BDP, and BUD and BDP using PLB or MOM as a common comparator. The results of these analyses are shown in Table 2 as the mean difference (95% CI) of the change from baseline in FEV1(L). A forest plot of those comparisons showed that there was no significant difference between ICs using PLB or MOM as a common comparator (Fig. 3). For example, in a comparison between BUD and BDP, the point estimate of mean difference was -0.09 (-0.20, 0.02) and -0.02 (-0.21, 0.17) when using PLB and MOM as the common comparator, respectively. The inconsistency was also evaluated (Table 2). The inconsistency factor with 95% CI in each comparison was FP vs BUD, 0.
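As a rough check of the BUD vs BDP example quoted above, the inconsistency between the two ICs can be approximated from the reported point estimates and 95% CIs. The sketch below assumes symmetric normal-theory intervals and back-calculates each standard error from the rounded CI limits, so the result is approximate and may differ slightly from the values in Table 2; the helper name `se_from_ci` is hypothetical.

```python
import math

Z95 = 1.96  # normal quantile assumed for the reported 95% CIs


def se_from_ci(lower, upper, z=Z95):
    """Back-calculate a standard error from a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


# BUD vs BDP estimates quoted in the text (mean difference in FEV1 change, L).
d_plb, ci_plb = -0.09, (-0.20, 0.02)  # IC with PLB as common comparator
d_mom, ci_mom = -0.02, (-0.21, 0.17)  # IC with MOM as common comparator

se_plb = se_from_ci(*ci_plb)
se_mom = se_from_ci(*ci_mom)

# Inconsistency factor between the two ICs and its approximate 95% CI.
d_inc = d_plb - d_mom
se_inc = math.sqrt(se_plb ** 2 + se_mom ** 2)
ci_inc = (d_inc - Z95 * se_inc, d_inc + Z95 * se_inc)

print(f"inconsistency = {d_inc:.2f}, 95% CI ({ci_inc[0]:.2f}, {ci_inc[1]:.2f})")
```

Under these assumptions the difference is about -0.07 L with an interval that spans zero, which agrees with the statement that the choice of common comparator did not produce a significant difference for this comparison.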

Direct and indirect analysis of changes in FEV1(L)
All head-to-head study data were used for direct analysis, allowing three DCs to be conducted (Table 3). Subsequently, the same comparisons were calculated for ICs using PLB as a common comparator.

Discussion
Inhaled CS were first launched in the 1960s and are now widely used for the treatment of asthma, recommended both as a control-based first-line treatment for mild asthma and as subsequent therapy in combination with a beta-2 agonist or leukotriene-receptor antagonist. We compared the efficacy of FP, BUD, BDP, and MOM in asthma patients using the DC and IC methods and attempted to determine the factors that may influence the DC and IC results. The effects of common comparator selection for ICs were also examined in this dataset as an example where little control group bias and placebo effect were expected. FEV1(L) change was selected as the efficacy endpoint. A previous report [4] stated that commonly defined endpoints should be selected to obtain consistent results between IC and DC. FEV1(L) is an appropriate efficacy endpoint because it is widely used, well validated, and has less placebo effect and less control group bias compared with other assessments such as patient-reported outcomes, the PANSS for schizophrenia patients, which we previously investigated, or investigators' impression scales such as clinical global impression. Although control group bias potentially influences the results of ICs with different common comparators, this study provided useful insights on the effects of common comparators in ICs. Safety assessment data, such as all-cause dropout rate and incidence of adverse events, were collected as well in this dataset. However, safety data were insufficient in the reports examined, and therefore safety parameters could not be reliably assessed. The 23 studies included in this meta-analysis involved patients with similar demographic characteristics, such as age, duration of disease, and baseline FEV1(L) values, which may have affected the results of DC and IC. In addition, this dataset contained only five head-to-head comparisons, although 28 reports in the literature involved comparisons with PLB. The reason for this is assumed to be sponsors' or investigators' intent to confirm the efficacy of a treatment intervention compared with PLB rather than to show noninferiority to an active comparator. PLB-controlled studies are easier to conduct from the viewpoints of number of patients required, approval by regulatory authorities, or investigation of the comparative safety profile of an intervention. Active-comparator studies usually require more patients when a noninferiority/superiority confirmation study is designed.
No statistically significant difference was observed in inconsistency analysis between IC and DC. Clinically and statistically, ICs with PLB and MOM showed no difference in the results of FEV1(L) analysis in this dataset, where no control group bias was observed. The absolute difference in FEV1(L) in point estimates ranged from 0 to 0.14 L. A difference of 0.23 L in FEV1(L) is reported to be the minimal clinically meaningful change in PLB-controlled trials [31], so the observed differences fall below that threshold.
Inconsistent results were observed between DC and IC for BUD vs MOM. DC showed a statistically significant difference in favor of MOM, whereas IC did not. We included Corren et al.'s study [24] in this DC analysis, which suggested that a disparity in baseline lung function existed between the BUD and MOM patient groups. That baseline difference was assumed to result in an absolute value disparity in FEV1(L) between the MOM and BUD groups. That study may have affected the results of the mean FEV1(L) change analysis in the BUD groups in this dataset. While the value in the mean FEV1(L) change was approximately 0.25 L in both active-comparator and PLB-controlled studies, that in the BUD group in PLB-controlled studies was 0.12 L (Fig. 5). That 0.13-L difference is believed to be caused not by control group bias but by variability due to the inclusion of the study by Corren et al. and the small number of studies in the present meta-analysis. As summarized in Figs. 3 and 4 and Tables 2 and 3, our results showed that ICs yield results similar to DCs in the change from baseline in FEV1(L).
Regarding the selection of a common comparator, Salanti et al. addressed different PLB effects in ICs using four topical fluoride treatments and two control interventions (PLB and no treatment) in preventing dental caries in children [32]. They found that the no-treatment group and four PLB groups (i.e., toothpaste, gel, rinse, and PLB varnish) had different clinical effects, although they found no statistically significant difference in consistency analysis. Salanti et al. concluded that those comparators were not exchangeable and could not be merged to conduct mixed-treatment comparisons. Our results suggest a similar point. When we use a common comparator that has a different effect, e.g., placebo effect or control group bias, that effect may lead to differing results, and the comparators cannot be treated as equivalent even if statistical consistency is observed. It should be emphasized that clinical investigations on merging evidence should be carefully conducted. In our study, we did not observe any effect of common comparator selection in ICs, probably because we used the well-validated endpoint of FEV1(L) for assessment and investigated efficacy in asthma patients, where little placebo effect is expected, and because the double-dummy method and/or other measures were appropriately applied in the studies included in this analysis. Similar investigations should be conducted to obtain relevant results in other ICs.
In pharmaceutical development, a PLB arm is frequently used to investigate the efficacy and safety of drugs, especially in dose-finding and early phase studies. However, it is less common in confirmatory phase 3 studies, especially when a difference in efficacy is apparent between PLB and active comparators. As described in detail previously [4], ICs can potentially be used to shorten the total development period in such situations by eliminating the step of confirmatory studies against approved drugs: if credible IC results can be obtained by using dose-finding data with PLB or data from a trial with an active comparator, investigators can explore the efficacy and safety of a new investigational drug compared with currently approved treatments. Our study shows that if the number of DC studies between active comparators is limited, as in this dataset, one solution would be to use PLB-controlled studies to conduct ICs, provided that there is believed to be little placebo effect and little control group bias in the dataset and that a well-validated endpoint is used. Under these conditions, ICs of active comparators could be expected to yield clinically meaningful results.
This report highlights the importance of common comparator selection. If the effect of control group bias is limited and the number of DCs is small, researchers are encouraged to use IC for head-to-head evaluation because it can be expected to yield results similar to DC with fewer difficulties. One of the limitations of the present study is that the number of studies using DCs and ICs is limited. Therefore, further investigations and examples are necessary to clarify the importance of common comparator selection. Second, it should be noted that selecting a homogeneous population could be controversial with respect to the generalizability of the results. It allows good control of confounders, as we have shown; on the other hand, it might hinder generalization of the results to more diverse settings. This point needs to be considered as well.

Conclusions
This study demonstrated that IC between inhaled CS can deliver results consistent with those of DC when using the change from baseline in FEV1(L) in asthma patients. It was also shown that using active comparators yields similar results when control group bias is limited. It should be emphasized that determining the degree of control group bias is a key factor in conducting relevant, appropriate IC and selecting appropriate common comparators.