Development and validation of a prognostic multivariable model to predict insufficient clinical response to methotrexate in rheumatoid arthritis

Objective The objective was to predict insufficient response to 3 months of methotrexate (MTX) treatment in DMARD-naïve rheumatoid arthritis (RA) patients. Methods A multivariable logistic regression model was developed in a derivation cohort of 285 RA patients starting MTX in a clinical multicentre, stratified single-blinded trial, performed in seven secondary care clinics and a tertiary care clinic. The model was validated in a validation cohort of 102 patients starting MTX at a tertiary care clinic. Outcome was insufficient response (disease activity score (DAS)28 >3.2) after 3 months of MTX treatment. Clinical characteristics, lifestyle variables, and genetic and metabolic biomarkers were determined at baseline in both cohorts. These variables were dichotomized and used to construct a multivariable prediction model with backward logistic regression analysis. Results The prediction model for insufficient response in the derivation cohort included: DAS28>5.1, Health Assessment Questionnaire>0.6, current smoking, BMI>25 kg/m2, ABCB1 rs1045642 genotype, ABCC3 rs4793665 genotype, and erythrocyte-folate<750 nmol/L. The AUC of the ROC curve was 0.80 (95% CI: 0.73–0.86) in the derivation cohort and 0.80 (95% CI: 0.69–0.91) in the validation cohort. Betas of the prediction model were transformed into a total risk score (range 0–8). At a cutoff of ≥4, the probability of insufficient response was 44%. Sensitivity was 71% and specificity 72%, with positive and negative predictive values of 72% and 71%. Conclusions A prognostic prediction model for insufficient response to MTX was developed and validated in 2 prospective RA cohorts by combining genetic, metabolic, clinical and lifestyle variables. This model satisfactorily identified RA patients with a high risk of insufficient response to MTX.

Introduction Methotrexate (MTX) is an anchor drug in the treatment of rheumatoid arthritis (RA) because of its safety and efficacy. [1,2] However, in a significant number of patients, MTX fails to achieve adequate suppression of disease activity. [3] According to the European League Against Rheumatism (EULAR) recommendations, therapy should be adjusted in patients who do not reach the treatment target, either remission or low disease activity, after 6 months of therapy; [1,2] or if no improvement has been achieved within 3 months of MTX treatment. [1] Adapting the treatment strategy as early as 3 months after MTX start is in line with the need for aggressive and individualised treatment to achieve early remission, and it follows clinical practice, in which patients receive step-up treatment after 3 months of treatment. Adjustment of therapy in non-responders mostly concerns switching to biologicals, alone or in combination with MTX. [2] Prediction of MTX non-response before MTX start is paramount since the first months after diagnosis represent a window of opportunity during which outcomes can be more effectively modulated by therapy. [4] It is necessary to identify non-responders at baseline to ensure that only patients unresponsive to MTX receive early additional treatment with biologicals or other disease-modifying anti-rheumatic drugs (DMARDs) and that those responsive to MTX are spared costly biologicals. [2] Prediction models for MTX non-response have been developed earlier for juvenile idiopathic arthritis (JIA) [5] and RA. [6][7][8][9] However, these models did not use metabolic predictors, and the model developed for RA was not validated, only moderately discriminated responders from non-responders, [6][7][8] and used remission or clinical response at the 6-month time-point as outcome measure. In RA, there is a need for earlier prediction of insufficient response to MTX in order to achieve early remission.
Therefore, the aim of this study was to predict insufficient response to 3 months MTX treatment in RA patients before DMARD initiation.

Study design and patients
This study followed the rules of Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD). [10] Data from two prospective cohorts with Caucasian patients were used. The derivation cohort to construct the prediction model consisted of patients who were enrolled in the treatment in Rotterdam Early Arthritis Cohort (tREACH). This is a clinical multicentre, stratified single-blinded trial (ISRCTN26791028) described elsewhere. [11] 285 DMARD naïve patients starting MTX between July 2007 and October 2011 were selected for this study (Fig 1). The external validation cohort consisted of 102 patients from the Methotrexate in Rotterdam (Netherlands) cohort (MTX-R) who started MTX between January 2006 and December 2010 in the department of Rheumatology, Erasmus University Medical Center, Rotterdam (Erasmus MC), the Netherlands (Fig 1). [12] The medical ethics committee from the Erasmus MC approved both studies and patients gave written informed consent before inclusion.
The derivation cohort included patients from the tREACH who were on MTX and fulfilled the 2010 American College of Rheumatology (ACR) / European League Against Rheumatism (EULAR) criteria for RA. Therefore, only tREACH patients from the high probability group and intermediate probability group A were used, since only these patient groups were on MTX. [11] These patients were included in seven secondary care centres and one tertiary care centre (Erasmus MC) in the South-West of the Netherlands. [13] Patients in the validation cohort were included when diagnosed with RA by the physician at the tertiary care centre (Erasmus MC). Inclusion and exclusion criteria for both cohorts are shown in S1 Table. Patients from the derivation cohort started with 25 mg/week MTX and glucocorticoids (GCs) and were randomized to treatment with or without sulfasalazine and hydroxychloroquine. In the derivation cohort, DMARD dosages were: MTX 25 mg/week orally (dosage reached after 3 weeks), sulfasalazine 2 g/day and hydroxychloroquine 400 mg/day. GCs were given either IM (methylprednisolone 120 mg or triamcinolone 80 mg) or as an oral tapering scheme (week 1-4: 15 mg/day, week 5-6: 10 mg/day, week 7-8: 5 mg/day, and week 9-10: 2.5 mg/day).
In the validation cohort, the physician was free to choose dosing and co-medication. In both cohorts, all patients received folic acid (10 mg/week) during MTX treatment.

Assessment of insufficient response
Primary outcome was disease activity score 28 (DAS28) at 3 month follow-up, which included 28 tender joint count (TJC), 28 swollen joint count (SJC), visual analog scale (VAS) for general health and the erythrocyte sedimentation rate (ESR). [14] Physicians used the cut-off values of DAS28>3.2 to step-up therapy after 3 months, because low disease activity was not reached. Therefore, insufficient response was defined as DAS28>3.2 after three months of treatment with MTX. DAS28 was assessed by (research) nurses, while being blinded for the study outcome measure.

Data collection
All potential predictors were associated with MTX inefficacy in previous association studies [5-9, 12, 15, 16] or were likely to be associated based on the physiology (hypothesis-driven approach). All predictors were dichotomized according to commonly used cut-off values or by dividing them into quartiles. A cut-off was chosen at the quartile that had the strongest association with insufficient response as indicated by the highest odds ratio or -2Log likelihood.

Clinical and lifestyle variables
All predictors were assessed at baseline/diagnosis before initiation of therapy. Before the treatment was started, we collected blood from each RA patient and determined the ESR, C-reactive protein (CRP), TJC, SJC, VAS and DAS28. DAS28>5.1 was used as dichotomous variable for high disease activity at baseline according to the European League Against Rheumatism (EULAR) criteria for response. [17] The Health Assessment Questionnaire (HAQ) was added as variable as it could possibly predict MTX response since mild functional impairment was associated with RA remission. [15] Lifestyle variables included body mass index (BMI; weight (kg) / (height (m))²), smoking, and consumption of alcohol and caffeine (Coca-Cola, coffee and tea). Caffeine is an adenosine receptor antagonist, which could diminish the anti-inflammatory effect of adenosine, thought to be stimulated by MTX, and therefore possibly decreases MTX response. [18] BMI and smoking were not measured in the validation cohort.

Metabolic and genetic variables
Metabolic variables were erythrocyte-folate, serum-folate, plasma-homocysteine, erythrocyte-vitamin B6, serum-vitamin B12, and estimated glomerular filtration rate (eGFR) calculated with the modification of diet in renal disease (MDRD) formula. [19] Three research blood sample tubes were obtained during every study visit besides the routine blood samples for erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), alanine aminotransferase (ALAT), leukocytes and thrombocytes. One serum tube was centrifuged for 10 min at 1700 g, 4˚C; serum was divided into aliquots and stored at -80˚C. One EDTA tube was immediately put on ice after collection, centrifuged for 10 min at 1700 g, 4˚C, and plasma and cell-pellet aliquots were stored at -80˚C. One EDTA tube was kept at room temperature and whole blood was divided into aliquots and stored at -80˚C.
Homocysteine was determined in EDTA-plasma using isotope-dilution liquid chromatography tandem mass spectrometry (LC-MS/MS; Waters Acquity UPLC Quattro Premier XE) by an adapted method. [20] For chromatographic separation, a Waters Symmetry C8 column 2.1x100 mm (Waters, Etten-Leur, Netherlands) with a precolumn (Waters) was used. Vitamin B12 and folate in serum were measured using an electrochemiluminescence immunoassay (Modular E170, Roche, Almere, Netherlands). Vitamin B6 was measured in whole blood with an isotope-dilution LC-MS/MS assay that we described elsewhere. [21] For the erythrocyte-folate assay, 100 μl whole blood was diluted with 1600 μl of a 10 g/l, pH 4, ascorbic acid solution and incubated for 3 hours at room temperature. Tubes were centrifuged at 2000 g and analysed with an electrochemiluminescence immunoassay for folate (Modular E170, Roche). Erythrocyte-folate was measured in whole blood from the room-temperature EDTA tube within 24 hours after sample collection. Erythrocyte-folate stability at room temperature has been proven up to 24 hours. [22] Erythrocyte-folate was corrected for serum-folate and hematocrit. Routine haematology parameters were measured using a Sysmex XE-2100 and ESR was measured using an InteRRliner (Sysmex, Etten-Leur, Netherlands). Routine chemistry parameters were measured on a Roche Modular P analyser (Roche). eGFR was added as a possible predictor of MTX outcome because it influences intracellular MTX polyglutamate concentrations. [23]
Genetic variables consisted of single nucleotide polymorphisms (SNPs) that were selected based on their involvement in the MTX metabolic pathways, their high polymorphic allele frequency and documented functional effects. In weekly low-dose MTX treatment, MTX polyglutamates accumulate intracellularly and as such inhibit several key enzymes in the folate metabolism and de novo purine synthesis. [16] MTX polyglutamates correlate with MTX efficacy in RA. [16] Non-responders accumulate fewer MTX polyglutamates in red blood cells than responders in an early phase of treatment. [16] SNPs in genes involved in MTX transport and polyglutamylation affect intracellular MTX accumulation. Inside cells, MTX polyglutamates (MTX-PGs) inhibit key enzymes in one-carbon metabolism, which is responsible for the therapeutic effects of MTX as well as its adverse-event profile. Intracellular MTX prevents cell proliferation and DNA methylation by displacing the preferred substrates of the folate-dependent enzymes. [24] DNA was obtained from whole blood. SNP selection, DNA isolation and genotyping were performed as we described earlier. [5,16] All SNPs were determined using real-time PCR with the Taqman technique. All laboratory parameters were measured at the clinical chemistry laboratory by technicians blinded for the study.

Statistical analysis
To construct a model to predict 3 months insufficient response, backward logistic regression analysis was performed in several stages. First, all continuous variables were dichotomized to facilitate the use of the models in daily clinical practice. Second, univariable odds ratios (ORs) with 95% confidence intervals (CI) were calculated. Third, potential predictors (p<0.20) were combined into a multivariable logistic regression model. The full model was simplified according to statistical strength (exclusion if p≥0.200, in each step deleting the variable with the highest p-value), correlations between predictors and practical considerations. If two potential predictors correlated strongly (Spearman's r≥0.40), the variable that was clinically more relevant or more strongly associated with the outcome measure in univariable analysis was given preference.
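As an illustration of the univariable step, the following is a minimal sketch of an odds ratio with a 95% Wald confidence interval computed from a 2x2 table for one dichotomized predictor. It is not the SPSS procedure used in this study; the function name `odds_ratio_ci` and the example counts are hypothetical. For a single binary predictor, the cross-product odds ratio coincides with the one obtained from a univariable logistic regression.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table (hypothetical helper).

    a: predictor present, insufficient response
    b: predictor present, sufficient response
    c: predictor absent,  insufficient response
    d: predictor absent,  sufficient response
    Assumes all four cells are non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) (Woolf's method)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Illustrative counts only, not data from either cohort
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f} (95% CI: {lo:.2f}-{hi:.2f})")
```

A predictor whose interval excludes 1 would be associated with the outcome at the conventional 5% level; the study itself used the more liberal p<0.20 threshold for entry into the multivariable model.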
To calculate predicted probabilities of 3 months DAS28>3.2, we used the following formula:

$$P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)}}$$

where $P$ is the predicted probability of having 3 months DAS28>3.2, $\beta_0$ is the constant and $\beta_1$, $\beta_2$ and $\beta_p$ represent the regression coefficients for each of the predictors $x_1$, $x_2$ and $x_p$.
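The predicted probability described above follows the standard logistic form, P = 1 / (1 + exp(-(β0 + β1x1 + ... + βpxp))). A minimal sketch of that calculation is given below; the function name `predicted_probability` and the example coefficients are hypothetical and are not the fitted values of the model.

```python
import math

def predicted_probability(intercept, coefficients, indicators):
    """Logistic-model probability for one patient (hypothetical helper).

    intercept:    the constant (beta_0)
    coefficients: regression coefficients beta_1..beta_p
    indicators:   dichotomized predictors x_1..x_p (0 = absent, 1 = present)
    """
    linear_predictor = intercept + sum(
        beta * x for beta, x in zip(coefficients, indicators)
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Illustrative coefficients only, not taken from the fitted model
example = predicted_probability(-2.0, [1.0, 0.5, 1.5], [1, 0, 1])
print(round(example, 3))
```

Because all predictors are dichotomized, each coefficient is simply added to the linear predictor when the corresponding characteristic is present.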
To evaluate the predictive power of the model, we used the predicted probabilities of insufficient response to construct a receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) measured the concordance of predicted values with actual outcomes, with an AUC of 0.5 reflecting no predictive power and an AUC of 1.0 reflecting perfect prediction. To assess whether the models fit the data well, we used the Hosmer-Lemeshow test.
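The concordance interpretation of the AUC can be made concrete with a small sketch: the AUC equals the proportion of (insufficient responder, responder) pairs in which the insufficient responder received the higher predicted probability, with ties counted as one half. The function `roc_auc` and the toy data are hypothetical and only illustrate the definition; they do not reproduce the SPSS output.

```python
def roc_auc(probabilities, outcomes):
    """AUC as pairwise concordance (hypothetical helper).

    probabilities: predicted probabilities of insufficient response
    outcomes:      1 = insufficient response (DAS28>3.2), 0 = sufficient response
    """
    positives = [p for p, y in zip(probabilities, outcomes) if y == 1]
    negatives = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both outcome classes are required")
    concordant = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                concordant += 1.0   # correctly ordered pair
            elif pos == neg:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))

# Toy data only: three of the four pairs are correctly ordered
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```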
To compute the risk score of being an insufficient responder to MTX for individual patients, the regression coefficients (β) of the predictors in the final model were transformed into simple scores that sum up to a total risk score. The total risk scores and probabilities of MTX insufficient response for each patient from the derivation cohort were computed. Mean probabilities for each risk score were calculated. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for each risk score cut-off by using the ROC curve of the derivation cohort. No model updating was performed based on the validation.
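The diagnostic measures named above can be sketched for a single risk-score cut-off as follows. This is an illustration under stated assumptions: the function `diagnostics_at_cutoff` and the toy scores are hypothetical, a patient is counted as predicted insufficient responder when the score is at or above the cut-off, and every denominator is assumed to be non-zero.

```python
def diagnostics_at_cutoff(risk_scores, outcomes, cutoff):
    """Sensitivity, specificity, PPV and NPV at one cut-off (hypothetical helper).

    risk_scores: total risk score per patient
    outcomes:    1 = insufficient response (DAS28>3.2), 0 = sufficient response
    cutoff:      scores >= cutoff are classified as predicted insufficient response
    """
    tp = fp = tn = fn = 0
    for score, outcome in zip(risk_scores, outcomes):
        predicted_insufficient = score >= cutoff
        if predicted_insufficient and outcome == 1:
            tp += 1
        elif predicted_insufficient and outcome == 0:
            fp += 1
        elif not predicted_insufficient and outcome == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Toy scores only, not patient data
result = diagnostics_at_cutoff([5, 4, 3, 6, 2, 4, 1, 0], [1, 1, 1, 1, 0, 0, 0, 0], cutoff=4)
print(result)
```

Repeating the calculation for every cut-off from 0 to the maximum score gives the trade-off between sensitivity and specificity from which a working cut-off can be chosen.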
The prediction model was externally validated in the validation cohort. The regression coefficients of the predictors obtained from the derivation cohort were entered in the above-mentioned formula. This was used to construct a ROC curve for the validation cohort. All statistical analyses were carried out with SPSS V.21.0.0.1 (SPSS, Chicago, Illinois, USA). No internal validation was performed because an external validation cohort was available. Furthermore, the MTX-R external validation cohort was initiated before the tREACH study and was primarily designed to develop prediction models for MTX efficacy and to investigate pharmacokinetics and dynamics. [12,25] The MTX-R study was targeted at a minimum of 100 patients in order to be able to include √100 = 10 predictors in the multivariable prediction model. However, there are no accepted approaches to estimate the sample size for prediction model development and validation. The sample size of the derivation cohort was based on the primary objective of the tREACH study. [11,26] The total number of patients (n = 387) starting MTX and the number of patients that responded insufficiently to MTX (n = 148) far exceeded the minimum number needed to develop a multivariable prediction model with 8-10 predictors. Therefore, the tREACH study was used to develop the prediction model and the MTX-R study, which was initiated earlier, was used for validation. Because of this, current smoking and BMI were not available in the validation cohort. Because the validation cohort has fewer parameters than the derivation cohort, model updating / recalibration arising from the validation cohort was not possible.
Missing data was not imputed in the tREACH or MTX-R. If data of a patient were lacking, the patient was left out of the analysis.

Patient characteristics
Three months data of the derivation and validation cohorts have been previously published. [12,26] For the present study, 285 patients from the derivation cohort participated at baseline, of whom 270 also participated at 3 months (Fig 1). From the validation cohort, 102 patients were included at baseline, of whom 84 participated after 3 months. MTX dose was higher in the derivation cohort than in the validation cohort (25 versus 15 mg/week) (Table 1). Patients in the validation cohort had lower DAS28, used more non-steroidal anti-inflammatory drugs (NSAIDs) and fewer glucocorticoids, and more often received MTX as subcutaneous injections than patients in the derivation cohort. [12] In both cohorts, disease activity decreased over time. [12] In the derivation cohort, mean DAS28 was 4.94 (SD = 1.15) at baseline and decreased to 3.12 (SD = 1.19) after three months. In the validation cohort, DAS28 decreased from 4.26 (SD = 1.43) to 2.92 (SD = 1.23). After three months of treatment, DAS28 was comparable between the derivation cohort (mean DAS28 = 3.12) and the validation cohort (mean DAS28 = 2.92) (p = 0.174). In the derivation cohort, 116 patients (43%) had a DAS28>3.2 after 3 months and in the validation cohort, 32 patients (38%).
Table 2 shows the SNPs and other baseline variables that were univariably associated (p<0.20) with insufficient response after 3 months of therapy in the derivation cohort. These variables were included in the multivariable logistic regression model with backward selection. The variables that remained in the final prediction model were: Adenosine triphosphate Binding Cassette transporter (ABC) family B member 1 (ABCB1) rs1045642 genotype, ABCC3 rs4793665 genotype, erythrocyte-folate<750 nmol/L, baseline DAS28>5.1, baseline HAQ>0.6, current smoking and BMI>25 kg/m2. The AUC of the prediction model was 0.80 (95% CI: 0.73-0.86), indicating that it classified 80% of patients correctly (Table 3; Fig 2). The Hosmer-Lemeshow goodness-of-fit test was not statistically significant (p = 0.82), indicating that the model fits the data well.

Prediction model for insufficient response to MTX
These predictors were used to test the model in the validation cohort. Smoking and BMI were not determined in the validation cohort and therefore could not be tested. The AUC in the validation cohort was also 0.80 (95% CI: 0.69-0.91) (Table 3), indicating that 80% of patients were classified correctly. The AUC of 0.80 of the validation cohort fits within the 95% CI of the AUC of the derivation cohort.
To make our prediction model suitable for daily practice, we transformed the regression coefficients (β) of the model's predictors into simple scores. Thereafter, individual risk scores for having DAS28>3.2 after 3 months of therapy were computed (Table 4). The constant of the multivariable model (β0 = -5.07) was omitted from the risk score to simplify it. The score ranged from 0 to 8, with a higher score reflecting a higher chance of treatment failure (insufficient response) after 3 months. The risk score of a patient who had all predictors of the final model was calculated by adding up the simple scores assigned to the individual predictors: 1+1+1+1+1+2+1, which results in a risk score of 8. If all predictors were present, the probability of 3 months DAS28>3.2 was 0.80. The risk score of a patient having no predictors would be equal to 0. If no predictors were present, the probability of insufficient response was 0.01. Within the 0-8 range, the diagnostic accuracy of different cut-offs for the prediction model was evaluated by calculating the risk scores, and the probability of having insufficient response, for each individual patient in the derivation cohort.
When only DAS28>5.1 was included, the AUC of the ROC curve was 0.66 (95% CI: 0.59-0.72) in the derivation cohort, and 0.66 (95% CI: 0.54-0.79) in the validation cohort. When HAQ>0.6, erythrocyte-folate<750 nmol/L, ABCB1 rs1045642 and ABCC3 rs4793665 were added, the AUC rose to 0.73 (95% CI: 0.66-0.80) in the derivation cohort and to 0.80 (95% CI: 0.69-0.91) in the validation cohort. When, finally, current smoking and BMI>25 were added to the prediction model in the derivation cohort, the AUC rose to 0.80 (95% CI: 0.73-0.86).
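The reported end points of the risk score can be checked against the logistic formula with a short back-of-the-envelope sketch. It uses only figures stated in the text (constant -5.07, probability 0.80 with all predictors present, simple scores 1+1+1+1+1+2+1); the helper names are hypothetical, and because the reported probabilities are rounded, the implied sum of coefficients is approximate and is not a value reported by the study.

```python
import math

def logistic(z):
    """Inverse logit: converts a linear predictor into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

intercept = -5.07  # constant of the multivariable model, as stated in the text

# No predictors present: the linear predictor equals the constant
p_no_predictors = logistic(intercept)

# All predictors present: reported probability is 0.80 (rounded), so the
# sum of all coefficients is approximately logit(0.80) - intercept
implied_coefficient_sum = logit(0.80) - intercept

# Simple scores in the order quoted in the text
simple_scores = [1, 1, 1, 1, 1, 2, 1]
maximum_risk_score = sum(simple_scores)

print(round(p_no_predictors, 4), round(implied_coefficient_sum, 2), maximum_risk_score)
```

The probability without any predictor comes out below 0.01 before rounding, which agrees with the reported 0.01, and the simple scores add up to the stated maximum of 8.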

Discussion
We developed and validated a model, which could predict insufficient response to MTX after 3 months of therapy, before start of MTX treatment in 2 large prospective cohorts including patients with RA. The model included DAS28>5.1 before start of MTX, HAQ>0.6, ABCB1 rs1045642 genotype, ABCC3 rs4793665 genotype, erythrocyte-folate<750 nmol/L, current smoking and BMI>25 kg/m 2 . The model classified 80% of patients correctly in the derivation and validation cohort.
Earlier studies did not include metabolic predictors [6][7][8] and one lacked a validation cohort. [6] Our study is the first validated and prospective study on prediction of MTX insufficient response that also incorporated metabolic predictors. One of the earlier reported models to predict MTX efficacy in MTX monotherapy in RA classified 85% of patients correctly. [6] This model was not yet validated. In agreement with our results, this model contained DAS at diagnosis, smoking status and genetic variables. However, in contrast to our study, this study did not assess BMI and folate status as possible predictors. Furthermore, they investigated other genetic factors, namely AMPD1, ATIC, ITPA and MTHFD1 genotypes, which were not included in our study.
Others [7] developed a prediction model using the EULAR response criteria [17] as dependent variable and current smoking, female gender, longer symptom duration and younger age as independent variables. EULAR response criteria can only be used in patients with baseline DAS28≥3.3 and not in all patients with RA. We also found that women had a higher risk of insufficient response, but this variable was not included in the final model since other variables were stronger predictors. [17] Conflicting results on the association of BMI and disease activity in RA have been reported; underweight and obesity both have been associated with worse disease activity in RA. [27,28]
The cut-off could be ≥4 (Fig 2, Table 4), with a probability of having 3 months DAS28>3.2 of 0.44. At this cut-off, 71% of the insufficient responders could have received other therapy (i.e. biologicals) earlier, which may prevent irreversible joint destruction. However, 28% of responders would receive other medication while MTX would have worked. Whether the cost reduction from less irreversible joint destruction exceeds the extra costs of biologicals should be further investigated.
The interest of the rheumatology community is focused more on biological therapies, although no clinical prediction rules have been published yet to individualize DMARD therapy. Both the ACR [29] and the EULAR [1] recommend MTX as first-line therapy in RA. Methotrexate is first-line therapy and should be prescribed at an optimal dose of 25 mg weekly and in combination with glucocorticoids; 40% to 50% of patients reach remission or at least low disease activity with this regimen. [30] If this treatment fails, sequential application of targeted therapies, such as biologic agents (eg, tumor necrosis factor inhibitors) or Janus kinase inhibitors in combination with methotrexate, has allowed up to 75% of these patients to reach the treatment target over time. [30] As methotrexate is still the first-line therapy today and is much cheaper than biologic agents, predicting insufficient response will help in choosing between biologic agents and methotrexate beforehand and will minimize the loss of precious time during which bone damage or severe adverse events may occur. Therefore, both prediction rules for first-line MTX non-response and second-line biological response are needed to individualize therapy in RA.
The AUC of the ROC curve of the final model without current smoking and BMI>25 was lower in the derivation cohort (AUC = 0.73 (95% CI: 0.66-0.80)) than in the validation cohort (AUC = 0.80 (95% CI: 0.69-0.91)). However, the two AUCs are not significantly different, since 0.73 is within the 95% CI of the AUC of the validation cohort. At 6 months, 64 of the 178 patients in the derivation cohort and 24 of the 64 patients in the validation cohort had a DAS28>3.2. When the model was applied at 6 months after start of MTX, the AUC in the derivation cohort was 0.73 (95% CI: 0.65-0.81) and the AUC in the validation cohort was 0.61 (95% CI: 0.46-0.77). At 6 months, too, the AUCs of the two cohorts are not significantly different, since 0.73 is within the 95% CI of the AUC of the validation cohort.
A limitation of this study was that current smoking and BMI>25 could not be replicated in the validation cohort. These variables have to be replicated in other cohorts. However, we decided to keep these predictors in the prediction model, because they are easily assessable and strong predictors that improve the predictive value of the model (AUC from 0.73 to 0.80). The impracticality of SNP and laboratory predictors could make future clinical utilization difficult. As a result, the current practice of a methotrexate trial may remain easier to carry out in most clinics. The biochemical parameters add complexity to the clinical implementation of the prediction model. However, the biochemical parameters added significant predictive power to the use of clinical parameters alone. Although erythrocyte-folate is not available in every laboratory, it is relatively easy to measure using a routine immunochemistry platform. Similarly, SNPs can be measured relatively easily these days using automated DNA extraction equipment and SNP platforms. With current smoking and BMI>25, in addition to DAS28>5.1 and HAQ>0.6, without laboratory predictors, the AUC was 0.76 (95% CI: 0.70-0.82) in the derivation cohort, indicating good predictive value (76% will be classified correctly). Inclusion of the laboratory predictors, erythrocyte-folate<750 nmol/L, ABCB1 rs1045642 and ABCC3 rs4793665, improved the AUC to 0.80, indicating that 80% of the patients will be classified correctly. We aimed for the highest possible AUC (AUC = 0.80 (95% CI: 0.73-0.86)) in the derivation cohort and therefore chose to keep the laboratory parameters in the model. Nowadays, measuring SNPs and erythrocyte-folate is easy and inexpensive. In addition, erythrocyte-folate is a routine laboratory test and genotyping might be converted into a fast test if useful.
Although 25 mg/week is quite a high dosage that is not always reached worldwide, mean doses of 25 mg/week (SD: 1 mg/week) were used in the derivation cohort and 15 mg/week (SD: 2 mg/week) in the validation cohort. Therefore, the prediction rule is generalizable to mean doses of MTX of 15-25 mg/week. The EULAR guideline actually states that MTX should be used in sufficient doses of 25-30 mg/week. Similarly, methotrexate is first-line therapy and should be prescribed at an optimal dose of 25 mg weekly and in combination with glucocorticoids; 40% to 50% of patients reach remission or at least low disease activity with this regimen. [30] As this is the optimal dose according to current guidelines, this should be the dose for which the model predicts insufficient response.
MTX-dose could not be added as a predictor itself because there was no variation in MTX-dose within the cohorts. We showed before that erythrocyte MTX polyglutamate levels are associated with treatment response in JIA [31] and RA. [25] It is worthwhile to examine this as a potential predictor in new studies.
A strong point of this study was that the developed prediction model was externally validated and that the predictive value was equally high in the validation cohort. This prediction model should be validated in a trial where insufficient response is predicted before treatment is started and patients are treated according to the predicted insufficient response.
In conclusion, we developed and validated a prediction model for insufficient response to MTX in 2 prospective RA cohorts by combining genetic, metabolic, clinical and lifestyle variables. This model can satisfactorily identify RA patients with a high risk of insufficient response to MTX after three months of treatment, and can, therefore, be used by clinicians as a tool for personalized treatment. RA patients who are likely to be unresponsive to MTX therapy may be (additionally) treated with biologicals or other DMARDs without further delay.