Heterogeneity in Comparisons of Discontinuation of Tumor Necrosis Factor Antagonists in Rheumatoid Arthritis - A Meta-Analysis

Objective
We conducted a systematic review of studies comparing discontinuation of tumor necrosis factor alpha (TNF) antagonists in rheumatoid arthritis (RA) patients, pooled hazard ratios, and assessed clinical and methodological heterogeneity.

Methods
We searched MEDLINE and EMBASE until June 2015 for pairwise hazard ratios for discontinuing infliximab, etanercept, and adalimumab from cohorts of RA patients. Hazard ratios were pooled using inverse variance weighting, and random effects estimates of the combined hazard ratio were obtained. Clinical and methodological heterogeneity was assessed using between-subgroup I-square statistics and meta-regression.

Results
Twenty-four unique studies were eligible, and large heterogeneity (I-square statistics > 50%) was observed in all comparisons. Type of data, location, and order of treatment (first or second line) modified the magnitude and direction of the hazard ratios for discontinuation when comparing infliximab with either adalimumab or etanercept; however, some heterogeneity remained. No effect modifier was identified when adalimumab and etanercept were compared.

Conclusion
Heterogeneity in studies comparing discontinuation of TNF antagonists in RA is partially explained by type of data, location, and order of treatment. Pooling hazard ratios for discontinuing TNF antagonists is inappropriate because largely unexplained heterogeneity was demonstrated when random effects estimates were calculated.



Introduction
The tumor necrosis factor alpha (TNF) antagonists target a cytokine that regulates inflammation in multiple diseases, including rheumatoid arthritis (RA) [1]. Evidence on the relative efficacy and safety of these medications is indirect and incomplete because no randomized controlled trials (RCTs) directly compare two or more TNF antagonists in RA patients [2]. Lack of efficacy and adverse effects are the most common reasons for discontinuing TNF antagonists [3][4][5][6][7][8][9], and therefore discontinuation risk is a good measure of the benefit-harm balance of these medications [10]. Hence, comparison of discontinuation risk of different TNF antagonists can help in treatment decisions, especially selection of an individual medication.
Since their introduction in the late 1990s, multiple observational studies have compared discontinuation of TNF antagonists, but the results were inconsistent [11][12][13][14][15] due to methodological and clinical heterogeneity. Methodological heterogeneity, defined as "variability in study design and risk of bias" [16], may be caused, for example, by differences in data collection. Clinical heterogeneity, defined as "variability in the participants, interventions and outcomes" [16], could be caused by differences in location and dates, or frequency of dose adjustments. A previous systematic review summarized hazard ratios for discontinuing TNF antagonists but failed to identify predictors of methodological or clinical heterogeneity [15]. The objective of this study is to investigate methodological and clinical heterogeneity in hazard ratios for discontinuing TNF antagonists in RA patients.

Selection criteria for studies
We included studies of RA patients treated with infliximab, adalimumab, or etanercept that met the following criteria:
Study design. Cohort studies with multiple TNF antagonists. RCTs were excluded due to differences between RA patients in RCTs and those treated in routine clinical practice [17][18][19][20]. Studies were selected regardless of the language and the type of publication (full articles, abstracts, or conference proceedings).
Participants. RA patients, based on either the American College of Rheumatology diagnosis criteria [21,22] or the clinical judgment of the care-providing physicians. Studies of multiple diseases were included only if the outcomes of interest were presented separately for RA.
Types of interventions. First or second line treatments with infliximab, adalimumab, or etanercept selected by the care-providing physician and/or the patient. Studies of the newer TNF antagonists, such as certolizumab pegol or golimumab, were excluded due to shorter availability and fewer studies [15].
Duration of follow-up. At least one year from treatment initiation.
Outcome of interest. Pairwise hazard ratios for discontinuation: infliximab vs. etanercept, infliximab vs. adalimumab, and adalimumab vs. etanercept.

Data extraction
Two reviewers (AF and GG/DS) independently selected studies and extracted data. In case of a discrepancy, a decision was reached by consensus. Authors of published studies were contacted when reports were incomplete, confusing, or difficult to interpret. The reviewers extracted as-reported hazard ratios, and 95% confidence intervals (CI) or p-value. If the hazard ratio for a specific comparison was missing, we attempted to calculate it using indirect comparison methodology [23] or synthesis of estimates from subgroups. To prevent the use of duplicate or overlapping data from the same source, we selected a single hazard ratio from a fully-published manuscript with the largest population for each comparison and data source.
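As an illustration of the extraction step, the sketch below shows one common way to recover the standard error of a log hazard ratio from a reported 95% CI or two-sided p-value, and to derive a missing pairwise hazard ratio through a common comparator. It assumes a Bucher-type adjusted indirect comparison and a Wald test behind the p-value; the text above does not specify the exact formulas used, and all function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

# Normal quantile for a two-sided 95% interval (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def se_from_ci(lower, upper):
    """Standard error of log(HR) recovered from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def se_from_p(hr, p_value):
    """Standard error of log(HR) from a two-sided p-value.

    Assumes a Wald test; undefined when hr == 1 or p_value == 1.
    """
    z = NormalDist().inv_cdf(1 - p_value / 2)
    return abs(math.log(hr)) / z


def indirect_hr(hr_ac, se_ac, hr_bc, se_bc):
    """Indirect estimate of A vs B from A vs C and B vs C.

    Assumed Bucher-type approach: the log hazard ratios are subtracted
    and their variances are added.
    """
    log_hr = math.log(hr_ac) - math.log(hr_bc)
    se = math.sqrt(se_ac ** 2 + se_bc ** 2)
    lower = math.exp(log_hr - Z95 * se)
    upper = math.exp(log_hr + Z95 * se)
    return math.exp(log_hr), lower, upper, se


# Hypothetical inputs, not taken from any included study.
se_ac = se_from_ci(1.20, 1.88)
se_bc = se_from_ci(0.95, 1.52)
print(indirect_hr(1.50, se_ac, 1.20, se_bc))
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from.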

Risk of bias
We identified two specific sources of bias in studies of discontinuation and included only studies with low risk of bias, defined as:
1. The study outcome was discontinuation of the individual medication or switching to a second biologic anti-rheumatic medication. Patients remaining on treatment at the end of the study period were censored.
2. Discontinuation was not associated with the likelihood of being included in the study; i.e., a new-user design without a mandatory minimum treatment duration. In a prevalent-user design, patients who started treatment before the study period are included only if they are still on treatment at the beginning of the study; hence, patients with longer use are overrepresented.

Statistical analysis
Hazard ratios for discontinuation with 95% CI were combined using an inverse variance approach, and data were recorded on the natural logarithm scale [24]. We calculated random effect estimates [25] because substantial heterogeneity has previously been observed [11,15].
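The pooling step can be sketched as follows. This is a minimal illustration that assumes the DerSimonian-Laird moment estimator for the between-study variance, a common choice for random effects estimates with inverse variance weights; the function name and the input values are hypothetical and are not data from the included studies.

```python
import math


def pool_random_effects(hrs, ses):
    """Inverse-variance random-effects pooling on the log scale.

    hrs: hazard ratios; ses: standard errors of log(HR).
    The between-study variance tau2 uses the DerSimonian-Laird
    moment estimator (an assumption of this sketch).
    """
    y = [math.log(h) for h in hrs]
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)

    # Fixed-effect estimate, needed for Cochran's Q.
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Moment estimate of the between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    # I-square: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical hazard ratios and standard errors.
print(pool_random_effects([1.4, 0.8, 1.1], [0.10, 0.15, 0.12]))
```

When tau2 is large relative to the within-study variances, the random-effects weights become nearly equal, so small studies influence the pooled estimate almost as much as large ones.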
In the absence of a definitive statistical test to assess whether a factor causes heterogeneity, we identified effect modifiers. We tested for association between the effect size and clinical factors (continent, order of treatment, age, sex, and baseline Disease Activity Score [DAS-28]) as well as methodological factors (type of data and duration of follow-up). Categorical factors consisted of continent (Europe, Asia, or America), order of treatment (first or second line), and type of data (clinical charts, disease or drug registries, or administrative claim data). For these factors, we calculated between-subgroup I-square statistics and estimated significance using the chi-squared test [26]. For continuous factors, i.e., age, sex (proportion of female patients), baseline DAS-28, and duration of follow-up, we conducted meta-regression [27] with a fixed effect model and weights based on the inverse of the variance of the logarithm of the hazard ratio. For factors that were reported as the average or the median of populations, we stratified the regression model by type of central measure. A significant association between a factor and the effect size was defined as a two-tailed p-value <0.05 for both categorical and continuous variables. Analyses were conducted using the Review Manager (RevMan) statistical software (Version 5.3, The Nordic Cochrane Centre, The Cochrane Collaboration, Denmark) and SAS software package (Version 9.4, SAS Institute Inc., Cary, NC).
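The between-subgroup test for categorical factors can be sketched as below. The sketch assumes that each subgroup has already been pooled to a single hazard ratio with a standard error on the log scale, and it treats the subgroup estimates as the units of a heterogeneity test; this mirrors the usual test for subgroup differences but is not necessarily the exact RevMan computation. The helper names and the input values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # closed form for df = 1
    # Recurrence on the regularized upper incomplete gamma function.
    while a < df / 2:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1
    return q


def subgroup_difference(estimates):
    """Between-subgroup Q, I-square (%) and chi-squared p-value.

    estimates: {subgroup name: (pooled HR, SE of log HR)}.
    """
    y = [math.log(hr) for hr, _ in estimates.values()]
    w = [1 / se ** 2 for _, se in estimates.values()]
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"q": q, "df": df, "i2": i2, "p": chi2_sf(q, df)}


# Hypothetical subgroup estimates by type of data.
print(subgroup_difference({
    "clinical charts": (0.85, 0.20),
    "registries": (1.50, 0.10),
    "claim data": (0.90, 0.15),
}))
```

A between-subgroup I-square near 0% means the subgroup estimates differ no more than chance would predict; values well above 50% with p < 0.05 would flag the factor as an effect modifier under the threshold stated above.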

Results
A total of 2,409 unique citations were identified and screened (Fig 1), and 24 unique studies were eligible for inclusion (Table 1). Forty studies reported hazard ratios for discontinuing TNF antagonists but were excluded, most commonly because the study drugs were not compared (S1 Table in the online supporting information). Two of the studies were excluded due to high risk of bias [28,29]. Three studies reported outcomes from the SSTAG/ARTIS Swedish registry [5,30,31], two studies from the Spanish BIOBADASER 2.0 registry or hospitals contributing to it [6,32], two studies from the Italian Monitornet registry [33,34], two studies from the American claim database MarketScan [35,36], and three studies from the national insurance claim data or hospitals in South Korea [37,38] (Table 1).
Fifteen studies (20,796 patients) from unique data sources compared infliximab and adalimumab, with an overall pooled hazard ratio of 1.08 (95% CI 0.92-1.27) (S1 Fig). There was significant heterogeneity between studies for all three comparisons, with I-square statistics of 86%, 92%, and 56%, respectively.
Assessment of methodological and clinical heterogeneity is presented in Table 2. In the analysis of categorical factors, effect modification by type of data (Fig 2), location (Fig 3), and order of treatment (Fig 4) was observed in comparisons of infliximab with adalimumab or etanercept, but not in the comparison of adalimumab with etanercept. This effect modification was expressed as I-square statistics of 69.1-92.7%, with p-values <0.05 in the chi-squared test. These percentages can be interpreted as follows: 69.1-92.7% of the variation across subgroups in each comparison is due to heterogeneity rather than chance. We also noticed that, in all comparisons, not all subgroup hazard ratios reached statistical significance, and in most cases residual within-subgroup heterogeneity was observed. For example, in the analysis of type of data (Fig 2), when comparing infliximab with etanercept, we observed significant heterogeneity between the three subgroups: studies based on clinical charts, studies conducted on registries, and analyses of claim data (I-square statistic of 69.1%). Only studies conducted on registries had a significant pooled hazard ratio, 1.49 (95% CI 1.23-1.81), but they also constituted the largest subgroup. A reversed direction of the hazard ratio, i.e., a lower risk of discontinuing infliximab, was estimated in two studies based on clinical charts and three analyses of claim data, but these pooled estimates did not reach statistical significance. We noticed residual heterogeneity within each subgroup: clinical charts, registries, and claim data.
In the analysis of continuous factors (Table 2), the proportion of female patients using infliximab modified the hazard ratio in the comparison of infliximab with etanercept. However, given the multiple comparisons and the absence of a similar effect for the proportion of female patients using etanercept, we discarded this finding. Finally, duration of follow-up, age, and baseline DAS-28 did not modify the hazard ratios (Table 2).
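The fixed effect meta-regression used for continuous factors can be sketched as a weighted least squares fit of the log hazard ratio on a single study-level covariate, with weights equal to the inverse variance of the log hazard ratio. This is a minimal single-covariate illustration under those assumptions, not the SAS code used for the analyses; the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def meta_regression(hrs, ses, x):
    """Fixed-effect meta-regression of log(HR) on one covariate.

    hrs: hazard ratios; ses: standard errors of log(HR);
    x: study-level covariate (e.g. mean age or proportion female).
    Within-study variances are treated as known, so the slope's
    standard error comes directly from the weights.
    """
    y = [math.log(h) for h in hrs]
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)

    # Weighted means of covariate and outcome.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw

    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)

    # Two-tailed p-value from a normal (Wald) test on the slope.
    z = slope / se_slope
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"slope": slope, "intercept": intercept, "se": se_slope, "p": p}


# Hypothetical studies: hazard ratio, SE of log(HR), proportion female.
print(meta_regression([1.3, 1.1, 0.9, 1.4],
                      [0.10, 0.12, 0.20, 0.15],
                      [0.72, 0.78, 0.81, 0.69]))
```

A slope whose two-tailed p-value falls below 0.05 would mark the covariate as an effect modifier under the threshold defined in the statistical analysis; with several covariates tested per comparison, isolated significant slopes may arise by chance.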

Discussion
This review explored sources of clinical and methodological heterogeneity in studies comparing discontinuation of TNF antagonists in RA patients. The type of data (i.e., charts, registries, or claims) modified the effect size in comparisons of infliximab with etanercept or adalimumab. However, this factor was not responsible for all the heterogeneity. Different types of data are susceptible to different types of biases. Registries are susceptible to selection bias caused by voluntary enrollment and data collection [56]. Administrative data are susceptible to confounding due to the absence of clinical variables and to exposure ascertainment bias because it is uncertain whether patients who refilled the medication actually used it. Type of data also determines how the outcome, discontinuation, is defined. In analyses of registries or medical charts, discontinuation is recorded by physicians, either during a routine visit or in real time.
In analyses of administrative data, discontinuation is usually ascertained using prescription-refill analysis and applying grace periods [57]. Comparisons of discontinuing TNF antagonists are especially sensitive to these differences in outcome definition because of the intermittent dosing schedules and the different lengths of dose interval for different medications. Comparisons of infliximab were more sensitive to the data source, probably because it has a significantly longer dose interval than adalimumab and etanercept. A second effect modifier is location. In European countries, the risk of discontinuing etanercept and adalimumab is lower compared to infliximab, but in America, patients on infliximab had a lower discontinuation risk compared to adalimumab and a similar risk to patients treated with etanercept. A previous review reported similar proportions of patients from European and non-European countries who discontinued any TNF antagonist [15], but the results were not presented separately for each individual medication. Souto et al [15] did not determine whether these findings are constant across different medications. Hazard ratio estimates were also modified by the order of treatment (first or second line) in comparisons of infliximab with adalimumab or etanercept. However, in these comparisons the only two studies that reported hazard ratios for second line treatment were American studies. Therefore, we cannot rule out that the modification observed is related to location and not to order of treatment.
Age, sex, baseline disease activity score (DAS), and duration of follow-up did not modify the hazard ratios. The absence of modification by baseline DAS argues against the hypothesis by Greenberg 2014 [58] that the difference in estimates between American and European studies is caused by differences in disease severity.
The results of this review question the reliability of hazard ratios for discontinuing TNF antagonists. Specifically, the residual heterogeneity within subgroups may indicate that stable results cannot be duplicated by different researchers and that conclusive scientific findings cannot be obtained. Alternatively, researchers may not be measuring the same outcome, because different types of data, and possibly different definitions of discontinuation, modified the hazard ratios. Standardization of methodological approaches may help achieve the requisite reliability.
There are several limitations to our study. First, we were unable to adequately assess risk of bias in the absence of a specific evaluation tool for discontinuation studies. Available tools for observational studies, such as the Newcastle-Ottawa scale [59], do not assess relevant items such as new-user design and ascertainment of discontinuation. Other tools, e.g., the STROBE statement [60], assess the quality of reporting and not the risk of bias. Second, in the absence of a statistical test to determine causes of heterogeneity between studies, we could only assess effect modification. Last, we found significant residual heterogeneity within many of the subgroups, and therefore pooled estimates were impossible to interpret.
This review had several strengths. First, its scope was wide, with no temporal or linguistic constraints. Second, to minimize bias, this review included only studies reporting adjusted hazard ratios for discontinuation. Earlier systematic reviews summarized proportions of discontinuation for each TNF antagonist individually [14,15]. Because these proportions were crude estimates from observational data, comparisons between medications were most likely confounded. Last, we identified two major risks of bias in discontinuation studies and applied them in study selection.

Conclusions
Substantial heterogeneity was found in studies estimating head-to-head hazard ratios for discontinuing TNF antagonists in RA patients, only part of which was explained by differences in type of data, location, and order of treatment. The heterogeneity observed shows that stable results have not been duplicated by different researchers and that conclusive scientific findings cannot be obtained by pooling results.