To compare treatment persistence between two dosages of interferon β-1a in a large observational multiple sclerosis registry and assess disease outcomes of first line MS treatment at these dosages using propensity scoring to adjust for baseline imbalance in disease characteristics.
Treatment discontinuations were evaluated in all patients within the MSBase registry who commenced interferon β-1a SC thrice weekly (n = 4678). Furthermore, we assessed 2-year clinical outcomes in 1220 patients treated with interferon β-1a in either dosage (22 µg or 44 µg) as their first disease modifying agent, matched on propensity score calculated from pre-treatment demographic and clinical variables. A subgroup analysis was performed on 456 matched patients who also had baseline MRI variables recorded.
Overall, 4054 treatment discontinuations were recorded in 3059 patients. The patients receiving the lower interferon dosage were more likely to discontinue treatment than those with the higher dosage (25% vs. 20% annual probability of discontinuation, respectively). This was seen in discontinuations with reasons recorded as “lack of efficacy” (3.3% vs. 1.7%), “scheduled stop” (2.2% vs. 1.3%) or without the reason recorded (16.7% vs. 13.3% annual discontinuation rate, 22 µg vs. 44 µg dosage, respectively). Propensity score was determined by treating centre and disability (score without MRI parameters) or centre, sex and number of contrast-enhancing lesions (score including MRI parameters). No differences in clinical outcomes at two years (relapse rate, time relapse-free and disability) were observed between the matched patients treated with either of the interferon dosages.
Treatment discontinuations were more common with interferon β-1a 22 µg SC thrice weekly. However, 2-year clinical outcomes did not differ between patients receiving the different dosages, thus replicating in a registry dataset derived from a “real-world” database the results of the pivotal randomised trial. Propensity score matching effectively minimised baseline covariate imbalance between two directly compared sub-populations from a large observational registry.
Citation: Kalincik T, Spelman T, Trojano M, Duquette P, Izquierdo G, Grammond P, et al. (2013) Persistence on Therapy and Propensity Matched Outcome Comparison of Two Subcutaneous Interferon Beta 1a Dosages for Multiple Sclerosis. PLoS ONE 8(5): e63480. https://doi.org/10.1371/journal.pone.0063480
Editor: Tobias Derfuss, University Hospital Basel, Switzerland
Received: January 27, 2013; Accepted: April 2, 2013; Published: May 21, 2013
Copyright: © 2013 Kalincik et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study was funded by MSBase Foundation, a not-for-profit organisation. The MSBase Foundation receives financial support from Merck Serono, Biogen Idec, Novartis Pharma, Bayer Schering and Sanofi Aventis. The study was also funded by Multiple Sclerosis Research Australia and MSAngels. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have the following conflicts: TK received travel grants and honoraria from Novartis, Biogen Idec, Sanofi Aventis, Teva and Merck Serono. TS received travel grant from Biogen Idec. MT received honoraria from Biogen-Idec, Bayer-Schering, Sanofi Aventis, Merck-Serono, Teva and Novartis; and research grants from Biogen-Idec, Merck-Serono, and Novartis. GI received honoraria from Biogen-Idec, Novartis, Sanofi, Serono and Teva. AL is a Bayer Schering, Biogen Idec, Genzyme, Merck Serono Advisory Board Member. She received travel and research grants and honoraria from Bayer Schering, Biogen Idec, Merck Serono, Novartis, Sanofi Aventis and Teva, travel and research grants from the Associazione Italiana Sclerosi Multipla and is consultant of Fondazione Cesare Serono. EC received honoraria as scientific advisory board consultant from Biogen-Idec, Bayer-Schering, Merck-Serono, Genzyme and Novartis; has participated in research projects by Merck-Serono, Roche and Novartis. VvP has served on advisory boards for Biogen Idec and Genzyme; received travel grants from Biogen Idec, Bayer Schering, Sanofi Aventis, Merck Serono and Novartis Pharma and honoraria from Biogen Idec, Teva and Novartis Pharma. FGM received honoraria from Biogen Idec, Genzyme, Novartis and Roche. COG received honoraria as scientific advisory board consultant from Biogen-Idec, Bayer-Schering, Merck-Serono, Teva and Novartis; has participated in research projects by Biogen-Idec, GSK, Teva and Novartis. MPA received honoraria as consultant on scientific advisory boards by Biogen-Idec, Bayer-Schering, Merck-Serono, Teva and Sanofi-Aventis; has received research grants by Biogen-Idec, Bayer-Schering, Merck-Serono, Teva and Novartis. TP has received funding or speaker honoraria from Biogen Idec, Merck Serono, Novartis, Bayer Schering, Sanofi-Aventis, Roche, and Genzyme.
RB received honoraria from Bayer Schering, Biogen, Novartis, Sanofi-Aventis, Teva; research grants from Bayer Schering, Biogen, Novartis, Sanofi-Aventis, Teva; congress and travel expense compensations from Bayer Schering, Biogen, Novartis, Sanofi-Aventis, Teva. GI had travel/accommodations/meeting expenses funded by Bayer Schering, Biogen Idec, Merck Serono, Novartis, Sanofi Aventis, and Teva. JLS’s institution receives support from Biogen Idec, Genzyme Sanofi, Merck Serono and Novartis. FM has participated in clinical trials sponsored by EMD Serono and Novartis. RA received honoraria from Biologix, Bayer, Merck Serono, GSK and Novartis, and served on advisory board for Biologix, Novartis and Merck Serono. HB has served on scientific advisory boards for Biogen Idec, Novartis and Sanofi-Aventis, has received travel support from Novartis, Biogen Idec and Sanofi Aventis, serves on steering committees for trials conducted by Biogen Idec and Novartis, and has received research support from Merck Serono, Novartis and Biogen Idec. PD, PG, RH, DLS, MER, SF, GG, AS, RFB, CB, ND, OG, FV, MF, MB, VS, MS, MLS, CS, KK, TPB, LdBM, JC, ES, JH, DP, MN, EABB, WOA, MP, SV and JACG did not declare any competing interests. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.
Primary evidence of therapeutic efficacy is provided by randomised controlled trials (RCT). However, RCTs require a substantial amount of resources, are time-consuming and costly, and employ highly specific selection criteria. Therefore, patients included in RCTs might not be representative of the general MS population. Additionally, many potential treatment comparisons will never be subjected to RCTs because of a lack of commercial interest and the large sample sizes required to show a difference.
Multicentre observational databases have the potential to describe large, longitudinally evaluated and prospectively assessed cohorts representative of general populations with specific conditions. The MSBase registry is an international, observational database collecting longitudinal data from a large population of patients with multiple sclerosis (MS; n = 18,886 in February 2012). This patient population is representative of patients managed in academic MS centres, which typically also recruit patients for RCTs. Analyses of treatment outcomes in observational registries such as MSBase are susceptible to significant biases, e.g. confounding by treatment indication, recall bias or detection bias. In such analyses, appropriate methods of bias reduction are required and need to be validated. The propensity scoring method is commonly employed to estimate the effect of multiple potential confounders on treatment assignment. The result, a single propensity score per case, is then used to adjust for individual confounders of treatment assignment through subject selection, matching or outcome weighting.
The pivotal RCT of interferon (IFN) β-1a SC three times weekly vs. placebo (Prevention of Relapses and Disability by IFN β-1a Subcutaneously in MS, PRISMS) provided the primary evidence of its clinical effect in relapsing-remitting MS. In this RCT, clinical efficacy was no different between the two tested dosages (22 µg vs. 44 µg). After documenting treatment persistence of first-line use of these IFN dosages in the MSBase dataset, we assessed clinical outcomes between two propensity score-matched subpopulations of patients treated with either of the dosages as first-line therapy and compared these results to those obtained in the PRISMS RCT.
Patients and Methods
The MSBase registry was approved by the Melbourne Health Human Research Ethics Committee, and by the local ethics committees in all participating centres (or exemptions granted, according to applicable local laws and regulations). If required, written informed consent was obtained from enrolled patients.
Database and Study Population
Data extracted from MSBase in February 2012 comprised longitudinal clinical data of more than 100,000 patient-years from 18,886 patients from 55 MS centres in 25 countries. All subjects with data recorded within the MSBase registry who received at least one dose of IFNβ-1a SC (Rebif; Merck Serono, Geneva, Switzerland) prior to February 2012 were included in the treatment discontinuation analysis.
The primary analysis of treatment outcomes was performed in patients treated with first-line Rebif in either available dose (i.e. 22 µg or 44 µg SC three times weekly) for at least two consecutive years, with no previous exposure to other disease modifying or immunosuppressive therapy and without switching between the doses. A prerequisite was availability of demographic and clinical information (including measures of disability and relapse activity) throughout the two-year follow-up period. Patients were excluded on the basis of long disease duration (>10 years from disease onset) and low disease activity (no relapses within the two years preceding baseline), in order to approximate the PRISMS study population.
A secondary analysis was performed in a subset of patients with investigator-classified cerebral MRI scans within the two years prior to the baseline visit. This subset was used to calculate a different propensity score including the MRI variables.
The data were recorded in a prospective, observational manner, as a part of routine clinical practice. Information about MS-related outcomes was updated during clinic visits, using the iMed patient record system to enter data at each of the participating centres. Disability was scored by accredited scorers using the Expanded Disability Status Scale (EDSS). Quality of the EDSS assessment was assured by the requirement of online Neurostatus certification at each of the participating centres. Date of onset of clinical relapses was recorded. Annualised relapse rate (ARR) was calculated based on the relapse onsets recorded within the year preceding treatment initiation (baseline relapse activity) and the two years following the baseline (on-treatment relapse activity). Duration of MS was estimated as the time since the patient-reported first clinical manifestation of the disease (recorded retrospectively). The presence, relationships and number of relatives with the diagnosis of MS were recorded in a proportion of patients. MRI brain scans were performed as part of routine clinical practice at each of the participating centres. Availability of T2-weighted imaging with locally reported number of hyperintense cerebral T2 lesions (categorised as 1–8 or 9+ per scan) was the minimum prerequisite for inclusion in the secondary analysis. If gadolinium-containing contrast was administered according to local procedures, gadolinium-enhancing lesions (Gd+) were evaluated as present or absent.
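The annualised relapse rate described above can be illustrated with a minimal sketch. It assumes relapse onsets are available as calendar dates and that the rate is the number of onsets in an observation window divided by the window length in years; the function and variable names are hypothetical and not taken from MSBase or iMed.

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365.25  # assumption: average year length used for annualisation


def annualised_relapse_rate(relapse_onsets, window_start, window_end):
    """Relapse onsets falling in [window_start, window_end) per year of observation."""
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    n_relapses = sum(window_start <= onset < window_end for onset in relapse_onsets)
    years = (window_end - window_start).days / DAYS_PER_YEAR
    return n_relapses / years


# Hypothetical patient: two relapses in the year before baseline, one on treatment.
baseline = date(2008, 1, 1)
onsets = [date(2007, 3, 10), date(2007, 11, 2), date(2008, 9, 15)]

arr_before = annualised_relapse_rate(onsets, baseline - timedelta(days=365), baseline)
arr_on_treatment = annualised_relapse_rate(onsets, baseline, baseline + timedelta(days=730))
```

In this toy case the baseline ARR is close to 2 and the on-treatment ARR close to 0.5, because the same counting rule is applied to a one-year and a two-year window.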
To assure quality of the analysed data, only information from centres with at least 10 active records was used, as stipulated in the study protocol. The minimum prerequisite was at least annual data updates. For all events, including new symptoms, clinical relapses, quantification of disability, changes in disease course, MRI and laboratory investigations and adverse events, a date of event onset was required. Prior to analysis the recorded data were verified using a series of automated procedures to identify any invalid or inconsistent entries.
Analysis of Treatment Discontinuation and Switch
Statistical analyses were carried out using Statistica 10 (Statsoft, Tulsa, OK, USA) and R software (http://www.R-project.org). Incidence of treatment discontinuation events with respect to the recorded reasons for discontinuation was compared between the treatment dosages using Andersen-Gill models with the Efron approximation method. These models are used to model time to recurrent events, compensating for highly variable treatment exposure and the fact that each subject could consecutively receive multiple treatments. The models were adjusted for patient age, sex and country. In selected variables, a “missing” value was allowed in order to avoid patient exclusions. Cases were censored at the time of the last visit unless the time of treatment discontinuation event was specified. Goodness of model fit was evaluated using the Akaike information criterion. Initiation of Rebif 44 µg within a month of discontinuing Rebif 22 µg was considered as treatment escalation. Similarly, treatment with Rebif 22 µg within a month of discontinuing Rebif 44 µg was viewed as treatment de-escalation.
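The escalation and de-escalation rule can be written down as a small classifier. This is a sketch under stated assumptions: "within a month" is taken as 30 days, doses are given in µg, and the function name and labels are illustrative rather than part of the study's actual code.

```python
from datetime import date

ONE_MONTH_DAYS = 30  # assumption: "within a month" interpreted as 30 days


def classify_switch(stopped_dose, stop_date, next_dose, next_start):
    """Label a stop as 'escalation', 'de-escalation' or plain 'discontinuation'.

    stopped_dose / next_dose are Rebif doses in micrograms (22 or 44);
    next_dose and next_start are None when no further Rebif treatment followed.
    """
    if next_dose is None or next_start is None:
        return "discontinuation"
    gap_days = (next_start - stop_date).days
    if not 0 <= gap_days <= ONE_MONTH_DAYS:
        return "discontinuation"
    if stopped_dose == 22 and next_dose == 44:
        return "escalation"
    if stopped_dose == 44 and next_dose == 22:
        return "de-escalation"
    return "discontinuation"


# Hypothetical example: 22 µg stopped, 44 µg started two weeks later.
example = classify_switch(22, date(2010, 1, 1), 44, date(2010, 1, 15))
```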
Analysis of Treatment Outcomes
Treatment outcomes were analysed within selected populations of patients (see above) matched based on their propensity of assignment to treatment dosage. All matching procedures were performed using the MatchIt package in R. The propensity score was calculated using a logistic regression model with the outcome variable represented by assignment to the Rebif dosage (with Rebif 22 µg set as the reference category). The model excluding MRI data was built using the following variables: age, disease duration, ARR, EDSS category, disease course, number of relatives with MS and MS centre. The model including the baseline MRI data contained two additional variables, the number of cerebral T2 lesions (categorical, 1–8 or 9+) and the Gd+ lesion status (not given, 0 or 1+). No interaction terms were included. The individual propensity scores (with and without MRI findings) were calculated as weighted sums of those variables with non-zero weights (at 0.1 level of statistical significance).
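To make the idea of a logistic-regression propensity score concrete, the following is a pure-Python sketch fitted by gradient ascent on toy data. It is not the R/MatchIt procedure used in the study and does not reproduce the variable selection at the 0.1 significance level; the two binary covariates (EDSS step 0, and a hypothetical "centre B" indicator) and all data values are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit logistic regression by batch gradient ascent; returns [intercept, *coefs]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step along the average gradient of the log-likelihood.
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, x):
    """Estimated probability of assignment to the 44 µg dosage for covariates x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Toy data: x = [EDSS step 0 (0/1), treated at hypothetical centre B (0/1)];
# y = 1 for the 44 µg dosage, 0 for 22 µg (the reference category).
X = [[0, 0]] * 4 + [[1, 0]] * 2 + [[0, 1]] * 3 + [[1, 1]] * 4
y = [0, 0, 0, 1] + [0, 1] + [1, 1, 0] + [1, 1, 1, 0]

weights = fit_logistic(X, y)
scores = [propensity(weights, x) for x in X]
```

Each patient ends up with a single score between 0 and 1; in this toy data set the centre indicator raises the score, loosely mirroring the finding that treating centre dominated dosage assignment.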
Patients in the two treatment groups were then matched in a 1∶1 ratio using nearest neighbour matching without replacement and discarding from both groups the cases outside the common support of the distance measure (i.e. the common hull of the pooled propensity scores). Closeness of the match between the matched patients was evaluated using cumulative and average distances, analysis of standardised differences and tests of statistical significance (paired t-test and McNemar test). After assessing normality of data distribution, treatment outcomes were compared between the propensity score-matched patients with the Wilcoxon signed-rank test (EDSS, change in EDSS and ARR) and the McNemar test (relapse status) as appropriate. Time free from relapse was estimated by Kaplan-Meier analysis and proportions of relapse-free patients were compared between the groups with the log-rank test censored at two years. Cumulative hazard of multiple relapses was estimated and compared between the groups with the Andersen-Gill model (see above). Since the differences in the baseline variables were accounted for during the matching procedure, no further adjustments for potential confounders were performed. All reported p-values are two-tailed and for each analysis p≤0.05 was considered significant. The number of hypothesis-testing procedures was low, therefore no adjustment for multiple hypothesis testing was applied. Power within the used statistical models was estimated.
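The matching step can be sketched as a greedy 1∶1 nearest-neighbour procedure without replacement, preceded by a common-support filter. This is an illustrative stand-in for the MatchIt routine, not its implementation: common support is approximated as the overlap of the two groups' score ranges, and the processing order (highest score first) is an arbitrary but deterministic choice made for this example.

```python
def match_nearest_neighbour(treated, control):
    """Greedy 1:1 nearest-neighbour matching on propensity score, without replacement.

    treated, control: dicts mapping patient id -> propensity score.
    Returns a list of (treated_id, control_id, distance) tuples.
    """
    # Common support: keep only scores inside the overlap of both groups' ranges.
    lo = max(min(treated.values()), min(control.values()))
    hi = min(max(treated.values()), max(control.values()))
    t = {k: s for k, s in treated.items() if lo <= s <= hi}
    c = {k: s for k, s in control.items() if lo <= s <= hi}

    pairs = []
    for tid, ts in sorted(t.items(), key=lambda kv: kv[1], reverse=True):
        if not c:
            break  # no controls left to match
        cid = min(c, key=lambda k: abs(c[k] - ts))
        pairs.append((tid, cid, abs(c[cid] - ts)))
        del c[cid]  # without replacement: each control is used at most once
    return pairs


# Hypothetical scores: t1 and c3 fall outside the common support and are discarded.
treated = {"t1": 0.9, "t2": 0.6, "t3": 0.3}
control = {"c1": 0.55, "c2": 0.35, "c3": 0.1, "c4": 0.7}
pairs = match_nearest_neighbour(treated, control)
mean_distance = sum(d for _, _, d in pairs) / len(pairs)
```

The mean pairwise distance computed at the end is one simple way to summarise closeness of the match, in the spirit of the cumulative and average distances mentioned above.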
Discontinuation of Treatment
Among the 18,886 patients included in the MSBase registry as of February 2012, we identified 4678 patients exposed to Rebif. Of these, 1188 (72% females) were treated with the 22 µg dosage, 2488 (71% females) were treated with the 44 µg dosage and 1002 (72% females) patients received both the dosages at various times. The average patient age was 36±10 years and disease duration was 7±7 years (mean ± SD), for both treatment dosages at the time of their first initiation. Median treatment period was 2.1 and 2.5 years for the 22 µg and 44 µg dosages, respectively. Total patient years of follow up were 6480 for the 22 µg and 11,432 for the 44 µg dosage. Distribution of time on treatment is shown in Figure 1. It can be seen that the number of patients treated with the 22 µg dose for less than 1 year was disproportionately high compared to the longer treatment durations. In total, 4054 treatment discontinuations were recorded in 3059 patients, 1808 from Rebif 22 µg and 2246 from Rebif 44 µg. There were 192 dosage escalations occurring within the initial 12 months of treatment with Rebif 22 µg, and these were excluded from further analyses (red bar in Figure 1). Table 1 provides an overview of the recorded reasons for treatment discontinuation. It is worth noting that in a substantial proportion of cases, the reason for discontinuation was not specified (68%). The annual probability of treatment discontinuation reached 25% in patients on Rebif 22 µg and 20% in patients on Rebif 44 µg. For a more detailed list of annual probabilities categorised by the recorded reasons for discontinuation, see Table 1. After adjusting for time on treatment, age, sex and country, the patients treated with Rebif 22 µg were more likely to discontinue treatment than those with Rebif 44 µg (hazard ratio (HR) = 1.4, p = 10⁻¹⁶, Andersen-Gill model, see Figure 2).
This difference was apparent in the sub-group analysis with the reason for discontinuation specified as lack of improvement/progression of disease (HR = 1.7, p = 10⁻⁶), scheduled stop/convenience (HR = 1.6, p = 0.001) or without the reason recorded (HR = 1.5, p = 10⁻¹⁶). In contrast, the discontinuation rates due to adverse events/lack of tolerance did not significantly differ between the treatment groups (p = 0.98, Andersen-Gill models).
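The reported annual discontinuation figures are consistent with a simple events-per-patient-year calculation, once the 192 planned first-year escalations are removed from the 22 µg group. That this is how the percentages were derived is an assumption made for illustration, not a formula stated in the text; the function name is hypothetical.

```python
def events_per_patient_year(n_events, patient_years):
    """Crude annual event rate: number of events divided by total follow-up in years."""
    return n_events / patient_years


# Counts and follow-up taken from the paragraph above.
rate_22 = events_per_patient_year(1808 - 192, 6480)   # 22 µg, planned escalations excluded
rate_44 = events_per_patient_year(2246, 11432)        # 44 µg

print(f"22 µg: {rate_22:.1%} per year, 44 µg: {rate_44:.1%} per year")
```

Rounded to whole percentages, these crude rates give 25% and 20%, matching the values quoted for the two dosages.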
Figure 1. Numbers of patients treated with Rebif recorded within the MSBase registry (n = 4678) and stratified by time on treatment are shown. Red bar in year 1 indicates the proportion of patients in whom dose escalation was a planned procedure. TIW, three times weekly.
Figure 2. Overall proportion of treatment discontinuations in patients treated with either Rebif dosage is shown (left). Discontinuation rates by the recorded reasons are shown. Hazard ratio (HR) is given where significantly different from 1, dashed lines represent 95% confidence intervals. Planned dose escalations within the first year of treatment are not included. HR, hazard ratio; TIW, three times weekly.
Of the recorded discontinuation events, 466 were evaluated as escalations of Rebif dosage (including the 192 escalations occurring within the initial year of treatment). Apart from the 356 events with the reason not recorded, the most frequent reason for escalation was lack of improvement/progression of disease (94). Similarly, 123 discontinuation events were considered to be de-escalations of the Rebif dosage. The reason was not specified in 79 cases and an adverse event/lack of tolerance was recorded in 41 cases.
Disease Outcomes: Validation of Propensity Matched Outcome Analysis
To directly compare clinical outcomes of treatment with Rebif 22 µg and 44 µg as the first disease modifying treatment used for at least two consecutive years, 614 and 682 patients were selected, respectively (for baseline characteristics see Table 2). The propensity score (i.e. the likelihood of assignment to the 44 µg Rebif dosage) not including any MRI parameters was determined predominantly by the MS centre (OR = 0.05–15, p≥10⁻⁷, logistic regression, see Table S1). In addition, the score was increased by the absence of neurological disability (i.e. by EDSS step 0; OR = 1.8, p = 0.07). After applying the nearest neighbour matching procedure, 610 patients were retained in each of the treatment groups. Summative distance between the propensity scores of the matched groups decreased from 229 to 159, with the average pairwise distance decreasing from 0.34±0.12 to 0.26±0.13 per patient (mean ± SD). Characteristics of the matched patients are given in Table 2. No marked differences in the recorded variables were seen between the matched groups.
Table 3 compares the clinical outcomes between the matched groups after two years of treatment with either Rebif dosage. Neither EDSS nor ARR differed significantly between the groups (p≥0.5, signed-rank test). ARR was reduced by 66% and 68% compared to baseline in the lower and the higher dosage groups, respectively. Proportions of patients free from relapses after two years were 49% and 50% in the Rebif 22 µg and 44 µg groups, respectively (p = 0.8, McNemar test), with time to first relapse (p = 0.9, Log-rank test, see Figure 3) and cumulative risk of relapses comparable between the treatment groups (p = 0.5, Andersen-Gill model). Power contained within the statistical models was sufficient to uncover treatment effects of the following sizes at 90% power and the specified level of statistical significance: EDSS, 0.25; change in EDSS, 0.18; ARR, 0.09; cumulative relapse risk, 0.1.
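The relapse-free proportions and time to first relapse rest on Kaplan-Meier estimation with censoring at two years. A minimal product-limit estimator is sketched below on invented follow-up data; it is a generic textbook implementation, not the software routine used in the study, and the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times:  follow-up time per patient (years to first relapse or to censoring)
    events: True if a relapse was observed at that time, False if censored
    Returns [(time, estimated relapse-free probability)] at each distinct relapse time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)        # all patients leaving at t
        d = sum(1 for tt, e in data if tt == t and e)        # relapses at t
        if d > 0:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t
        i += n_at_t
    return curve


# Hypothetical cohort of six patients; those relapse-free at 2.0 years are censored there.
times = [0.5, 1.0, 1.0, 1.5, 2.0, 2.0]
events = [True, True, False, True, False, False]
curve = kaplan_meier(times, events)
relapse_free_at_two_years = curve[-1][1]
```

Patients censored before two years (for example, lost to follow-up at one year) leave the risk set without counting as relapses, which is why the estimate differs from a raw proportion.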
No statistically significant differences between the treatment dosages were observed. MRI, magnetic resonance imaging; TIW, three times weekly.
The propensity score involving semi-quantitative MRI parameters at baseline was determined predominantly by the MS centre (OR = 0.2–7, p≥0.0001, logistic regression). In addition, men (OR = 2, p = 0.002) and patients with 9 or more T2 lesions (OR = 1.8, p = 0.09) were more likely to receive Rebif 44 µg. The matching procedure retained 226 patients in each group, with summative distance between the propensity scores of the groups decreasing from 105 to 44 and the average pairwise distance decreasing from 0.36±0.12 to 0.2±0.1 per patient (mean ± SD). Table 4 provides group characteristics before and after matching. Despite the overall decrease in distance between the two dosage groups, statistically significant differences in age and the number of hyperintense T2-lesions were not eliminated by the matching procedure.
Clinical outcomes in this analysis inclusive of baseline MRI were similar to the outcomes of the larger comparative analysis detailed above (Table 3). Both EDSS and ARR at two years were comparable between the matched groups (p≥0.9, signed-rank test). ARR was reduced by 72% and 71% compared to baseline in the lower and the higher dosage groups, respectively. Proportions of patients free from relapses at two years were 46% and 51% in the Rebif 22 µg and 44 µg groups, respectively (p = 0.7, McNemar test), with time to first relapse (p = 0.1, Log-rank test, see Figure 3) and cumulative risk of relapses similar in both groups (p = 0.9; Andersen-Gill model). The models contained 90% power at the specified level of statistical significance to uncover effect sizes as follows: EDSS, 0.4; change in EDSS, 0.31; ARR, 0.13; cumulative relapse risk, 0.2.
Using data from a large clinical practice MS registry, MSBase, we have shown that patients with IFNβ-1a SC thrice weekly (Rebif) in the 22 µg dosage are more likely to discontinue treatment than those receiving Rebif in the 44 µg dosage. Annual discontinuation rates reached 25% and 20% in the two treatment dosages, respectively. Compared to Rebif 44 µg, the 22 µg dosage was more often discontinued due to perceived insufficient effect or a scheduled stop. In order to compare clinical outcomes of the original PRISMS trial with real-world practice, we performed propensity score-matched pairwise analyses of patients receiving either dosage of Rebif as first-line MS therapy who continued on their respective dosage for at least two years. In agreement with the PRISMS trial, our closely matched populations did not show any effect of Rebif dosage on two-year clinical outcomes.
The mean annual probability of discontinuing Rebif within the MSBase registry was 23%, which markedly exceeded the treatment discontinuation rate reported in the PRISMS study (10–11% over two years). Similarly, the annual discontinuation rates due to reported adverse events were marginally higher in our study compared to the PRISMS trial (3% and 1.5–2.4%, respectively). Interestingly, the PRISMS and the EVIDENCE trials reported a dose-dependent incidence of adverse events. Namely, decreases in leukocyte, neutrophil and lymphocyte counts, increase in aminotransferase levels and injection site reactions were found to be more frequent in the groups with higher dosages of IFNβ. In the present study, we have shown a similar trend towards higher annual discontinuation rates due to adverse events/lack of tolerance in patients receiving Rebif in the higher dosage; however, this did not reach statistical significance.
It could be argued that an expected better efficacy of the higher Rebif dosage (as perceived by patients and clinicians) could have inflated the discontinuation rate in the Rebif 22 µg group. In this case the discontinuation events would most likely be followed by dose escalations. Since the instances of increase in the Rebif dosage from 22 µg to 44 µg were not included in the analysis of discontinuation events, we assume that the effect of the perceived different therapeutic efficacy on treatment discontinuation was minimal. Overall, dose escalation was a commonly observed phenomenon (466 cases, i.e. 26% of all discontinuation events in the Rebif 22 µg group). Even though lack of effect was the most commonly specified reason for escalation (in 20% of escalations), the reason was unspecified for 76% of escalation events. It is worth noting that almost half of the escalations took place within the first year of treatment initiation, of which 83% were unspecified. We presume that a high proportion of the early escalations were planned as part of the routine treatment initiation procedure used at some centres. In agreement with this is the observation that scheduled stop as a reason for discontinuation was more commonly recorded among patients treated with Rebif 22 µg.
Baseline characteristics of the MSBase cohort included in this study and the PRISMS study were remarkably similar. Patients had mean disease duration of 4 years in the MSBase study and 5.3 years in the PRISMS study, with the median EDSS of 2 and 2.5, respectively. Baseline mean ARR was only marginally different between the MSBase and PRISMS studies (1.3 vs. 1.5, respectively). Outcomes of the propensity-matched Rebif dosage comparison confirmed a lack of any statistically significant dose-dependent differences in relapse frequency or disability, as demonstrated in PRISMS. Interestingly, our observed on-treatment ARR was 0.4 (for each dosage), while the PRISMS trial reported ARRs of 0.91 and 0.86 after two years of treatment with Rebif 22 µg and 44 µg, respectively. If this difference is to be attributed to a potential under-reporting of relapses in the MSBase registry, it should be noted that this, if present, would in all likelihood apply to both treatment groups equally, and thus would be unlikely to confound the analysis comparing the outcomes of the two Rebif dosages. Reassuringly, our reported ARR is comparable to the ARR reported in patients receiving IFNβ-1a in the most recent RCTs (0.3–0.4). Also, the reduction of ARR (66–72%) and proportion of relapse-free patients (46–51%) at two years were substantially higher in our study than in the PRISMS trial (39–42% and 27–32%, respectively). Finally, we showed a much less pronounced increase in EDSS over two years (0–0.1) compared to the PRISMS study (0.23–0.24). The PRISMS trial also showed a dose-dependent effect of IFNβ-1a on MRI parameters, which we were not able to assess, as the quantitative MRI data are not routinely recorded in the MSBase registry. The major difference potentially accounting for these large absolute outcome differences between the MSBase study and the PRISMS randomised trial is the fact that we only included patients with a two-year treatment completion at either dose of Rebif.
We know that annualised discontinuation rates of Rebif in the MSBase dataset amount to 23%, therefore the patients with poor relapse control were likely to be differentially lost from the two studies. Nonetheless, the results suggest high treatment efficacy over two years in real-world patients treated with Rebif (at either dose) as their first DMD.
Importantly, we were able to derive a large patient sub-population from the MSBase clinical practice registry with different initial treatment assignations (largely determined by centre preference) whose two-year outcomes could be compared using patient pairs that were determined with propensity-score baseline covariate matching. We obtained a similar primary result (i.e. the lack of dosage-dependent treatment effect) to that obtained in the pivotal randomised trial examining the same treatment outcomes. We therefore believe that imbalance within patient populations non-randomly assigned to different treatment can potentially be controlled with propensity-based methods. Such methods include weighting, stratification, matching and covariate adjustment. Studies in observational cohorts of patients with MS had previously employed propensity score-weighted analyses to evaluate disease outcomes, propensity score-based stratification to assess long-term benefits of early versus delayed immunomodulatory treatment and propensity score matching to evaluate sex difference in response to IFNβ. Combinations of propensity score stratification with other methods, such as recursive partitioning, were also tested.
While our approach provided sufficient power for the subsequent analyses and resulted in a patient sample that was likely to be representative of patient populations at MS centres, it did not eliminate the bias potentially introduced by unknown confounders. To ameliorate this risk, we have accounted for the location-specific hidden confounders (e.g. centre-specific dose preferences) by adjusting our models for treating centre. As the matching algorithm, we have chosen the nearest neighbour procedure in a 1∶1 ratio with a relatively lenient criterion for excluding the cases outside the hull of the pooled distance measure. Even though this did not result in a perfect overlap of the propensity scores between the two matched populations, it still led to a marked decrease in the mean distance between the matched groups. For a perfect overlap to be achieved, a stricter matching criterion would have been required, which in turn would have resulted in exclusion of a high number of patients and unnecessary loss of power. We have therefore chosen to use the criterion that allowed us to preserve power while achieving a satisfactory match.
We also adjusted our statistical models for age, sex and country, which we have shown to be related to treatment discontinuation. However, we were unable to adjust the analyses for change in disability, as this was usually not recorded at the time of treatment discontinuation. Moreover, we were unable to include information about relapse severity and recovery, which was often missing; including it would most probably have led to overfitted statistical models. A potential under-estimation of the frequency of treatment discontinuations due to specific reasons could stem from the relatively high proportion of discontinuation events with the reason not specified. Also, baseline cerebral MRI data were missing in the majority of patients. However, a propensity-matched subgroup analysis including MRI did not yield results different to the subgroup analysis excluding MRI. It is of note that the quality of the MRI data was likely to be variable, as these were provided by the clinicians using a semi-quantitative evaluation of MRI lesions carried out in a number of scanners with variable protocols. However, the number of hyperintense T2-lesions and the presence/absence of Gd+ lesions were probably the MRI characteristics that were most likely to influence clinical decision-making with respect to DMD choice. It should also be noted that the quality of clinical data recorded in observational registries such as MSBase is unlikely to be similar to the quality of data originating from RCTs during the on-treatment period. Paradoxically, the quality of data pertaining to the pre-treatment time is actually likely to be better, as it is generally prospectively recorded in MSBase prior to treatment start, whereas in clinical trials disease and relapse history is typically collected retrospectively. Finally, the inclusion criterion of sustained therapy with Rebif for at least two years resulted in bias towards selecting patients with more satisfactory treatment response.
We presume that this bias influenced either of the dosage groups symmetrically and did not confound the comparison of disease outcomes between the groups.
In this study, we have shown that direct real-world treatment comparisons can be conducted on registry data. Using the global MSBase registry data, we applied a propensity score-based pairwise patient selection method to compare treatment outcomes between two doses of IFN β-1a thrice weekly (Rebif 22 µg vs. Rebif 44 µg). The dosage comparisons in our study with respect to differences in relapse rate and EDSS change mirrored those obtained from the pivotal RCT and enabled their broader generalisation. This method could be of increasing importance for head-to-head evaluation of the rapidly growing number of disease modifying therapies in MS, many of which will never be compared to each other in RCTs. Although we do not claim that results from analyses of observational registries can substitute for RCTs, we believe that the described technique represents a useful and practical option when RCTs are not feasible or are unlikely to be conducted.
Assignation to treatment dosage by treating centres. The table shows the number of patients assigned to either Rebif dosage at each of the participating centres. The odds of assignation to the higher dosage, relative to the reference centre (IT-002), are given. The results were incorporated in the individual propensity scores.
MSBase study group co-investigators: From the Centre hospitalier de l’Universite de Montreal, Hopital Notre-Dame, Canada, Dr Elaine Roger and Dr Pierre Despault; from the Royal Melbourne Hospital, Australia, Dr Mark Marriott, Dr Anneke Van der Walt, Dr John King, Dr Trevor Kilpatrick, Dr Katherine Buzzard, Dr Vilija Jokubaitis, Dr Jill Byron and Ms Lisa Morgan; from Box Hill Hospital, Monash University, Australia, Dr Olga Skibina and Ms Jodi Haartsen; from Department of Neuroscience and Imaging, University ‘G. d’Annunzio’, Italy, Dr Giovanna De Luca, Dr Valeria Di Tommaso, Dr Daniela Travaglini, Dr Erika Pietrolongo, Dr Maria di Ioia, Dr Deborah Farina and Dr Luca Mancinelli; from University of Bari, Italy, Dr Damiano Paolicelli and Dr Pietro Iaffaldano; from Hospital Italiano, Argentina, Dr Juan Ignacio Rojas and Dr Liliana Patrucco; from Hopital Tenon, Paris, France, Dr Etienne Roullet; from FLENI, Argentina, Dr Jorge Correale and Dr Celica Ysrraelit; from Ospedale di Macerata, Italy, Dr Elisabetta Cartechini and Mr Eugenio Pucci; from John Hunter Hospital, Australia, Dr David Williams and Dr Lisa Dark; from Al-Zahra Hospital, Iran, Dr Vahid Shaygannejad; from MS-Centrum Nijmegen, Nijmegen, The Netherlands, Dr Cees Zwanikken; from Mater Dei Hospital, Malta, Dr Norbert Vella; and from Central Clinical Emergency Military Hospital, Dr Carmen-Adella Sirbu.
Conceived and designed the experiments: TK HB. Performed the experiments: TS BS EvM TK MT AL EC VVP FGM CO-G MPA RB G. Izquierdo FM RA HB PD G. Iuliano PG RH DLS MER SF GG AS TP RF-B CB JL-S ND OG FV MF MB VS MS MLS CS KK TP-B LdB-M JC ES JH DP MN EABB WOA MP SV JACG. Analyzed the data: TK. Wrote the paper: TK.