Skip to main content
Advertisement
  • Loading metrics

Patient groups in Rheumatoid arthritis identified by deep learning respond differently to biologic or targeted synthetic DMARDs

Abstract

Cycling of biologic or targeted synthetic disease modifying antirheumatic drugs (b/tsDMARDs) in rheumatoid arthritis (RA) patients due to non-response is a problem preventing and delaying disease control. We aimed to assess and validate treatment response of b/tsDMARDs among clusters of RA patients identified by deep learning. We clustered RA patients clusters at first-time b/tsDMARD (cohort entry) in the Swiss Clinical Quality Management in Rheumatic Diseases registry (SCQM) [1999–2018]. We performed comparative effectiveness analyses of b/tsDMARDs (ref. adalimumab) using Cox proportional hazard regression. Within 15 months, we assessed b/tsDMARD stop due to non-response, and separately a ≥20% reduction in DAS28-esr as a response proxy. We validated results through stratified analyses according to most distinctive patient characteristics of clusters. Clusters comprised between 362 and 1481 patients (3516 unique patients). Stratified (validation) analyses confirmed comparative effectiveness results among clusters: Patients with ≥2 conventional synthetic DMARDs and prednisone at b/tsDMARD initiation, male patients, as well as patients with a lower disease burden responded better to tocilizumab than to adalimumab (hazard ratio [HR] 5.46, 95% confidence interval [CI] [1.76–16.94], and HR 8.44 [3.43–20.74], and HR 3.64 [2.04–6.49], respectively). Furthermore, seronegative women without use of prednisone at b/tsDMARD initiation as well as seropositive women with a higher disease burden and longer disease duration had a higher risk of non-response with golimumab (HR 2.36 [1.03–5.40] and HR 5.27 [2.10–13.21], respectively) than with adalimumab. Our results suggest that RA patient clusters identified by deep learning may have different responses to first-line b/tsDMARD. Thus, it may suggest optimal first-line b/tsDMARD for certain RA patients, which is a step forward towards personalizing treatment. However, further research in other cohorts is needed to verify our results.

Author summary

Rheumatoid arthritis (RA) is an auto-immune disease affecting the joints of the body. RA is subject to poor treatment response to advanced antirheumatic therapy. While previous studies have used machine learning techniques to identify different RA patient populations, no study has used these populations to evaluate potentially different treatment response of advanced RA treatments such as tumor necrosis factor inhibitors. Using machine learning, we identified five distinct RA patient groups which mainly differed by sex, disease burden/duration, and concomitant traditional RA treatment use (i.e. prednisone, methotrexate). Patients with high frequency of use of traditional RA treatment use at advanced RA treatment initiation, male patients, as well as patients with a lower disease burden responded better to tocilizumab than to adalimumab. Furthermore, seronegative women without use of prednisone at advanced RA treatment initiation as well as seropositive women with a higher disease burden and longer disease duration had a higher risk of non-response with golimumab than with adalimumab. The results are a step towards personalizing treatment and shall encourage other researchers to embrace machine learning techniques to improve treatment response in RA and other disease areas.

Introduction

Rheumatoid arthritis (RA) is an heterogenic inflammatory disorder, often presenting a fluctuant disease activity over the disease course [1]. Despite the advent of validated treatment recommendations, it remains challenging in clinical practice to avoid disease flares or overmedication [2]. Moreover, specific biomarkers to guide optimal biologic or targeted synthetic disease modifying antirheumatic drug (b/tsDMARD) use are not available [3].

Intelligent clinical decision support systems (CDSS) use machine learning to generate personalized treatment strategies. For example, deep learning methods have shown great potential for disease prediction in the medical domain, [4,5] including the field of RA [6]. We have recently described an adaptive supervised deep learning method to predict the disease activity score using 28 joints (DAS28) erythrocyte sedimentation rate (esr) values at the next visit with a variation of 8% compared to the individual true values [7]. Furthermore, machine learning models taking into account data from genomic or microbiome data are increasingly used to classify the response to methotrexate or bDMARDs [8,9]. Machine learning has also been used in RA, using clinical data [1013] or additional tissue samples for example [14]. We hypothesize that the use of machine learning helps to identify homogeneous RA patient groups within this heterogeneous disease which may help with the prediction of treatment response. However, to date, no study has assessed treatment response studies in machine learned clusters. Yet, it may prove crucial to use clustering approaches for digital stewardship for the treatment of RA.

This study aimed to identify b/tsDMARDs that work better or less well for certain patient groups, and thereby preventing treatment cycling and allowing disease control. Thus, we applied deep learning clustering techniques to identify RA patient groups among which we subsequently assessed comparative effectiveness of b/tsDMARDs.

Methods

Ethics statement

Ethical approval for this study was obtained from the CER-VD (La Commission cantonale d’éthique de la recherche sur l’être humain, ID 2020–00033). All patients provided written informed consent.

Public and patient involvement

Neither patients nor the public was involved in the conduct of this study.

Study design and data source

We conducted a cohort study using data derived from patients in the Swiss Clinical Quality Management in Rheumatic Diseases (SCQM) registry. The SCQM registry was established in 1997 and is used to prospectively follow RA patients [15]. Regulatory health authorities in Switzerland have recommended continuous monitoring with the SCQM system for all patients receiving b/tsDMARDs [16]. Further information is available in S1 Text.

Study population

We identified all patients with an RA diagnosis who started their first b/tsDMARD between 1999 and 2018 while under observation in SCQM. We included patients who initiated a b/tsDMARD after or on their first visit in SCQM but not those who initiated a b/tsDMARD before their first visit in SCQM. The date of the first b/tsDMARD start is further called cohort entry. We excluded patients without a record of DAS28-esr score ≤6 months prior to their first b/tsDMARD start. An overview of the study conception is provided in Fig 1.

thumbnail
Fig 1. Overview of the study composition.

RA: rheumatoid arthritis; DAS28-esr: rheumatoid arthritis disease activity score based on 28 joints and erythrocyte sedimentation rate; b/tsDMARD: biologic or targeted synthetic disease modifying antirheumatic drug.

https://doi.org/10.1371/journal.pcbi.1011073.g001

Exposures

The exposure was defined as the first bDMARD (TNF inhibitors: adalimumab, certolizumab pegol, etanercept, infliximab, golimumab; non-TNF inhibitors: abatacept, rituximab, tocilizumab), or tsDMARD (baricitinib, tofacitinib). Comparative effectiveness analyses were performed on a class effect basis and on an individual b/tsDMARD basis. The reference group in the class effect analyses was TNF inhibitors, and the reference group in the individual b/tsDMARD analyses was adalimumab.

Outcome

We assessed 2 separate outcomes (i.e. non-response, treatment response) within a time window of 15 months. The outcome non-response was defined as discontinuation of the b/tsDMARD due to physician-recorded non-response (binary). The outcome treatment response was defined as a reduction in DAS28-esr by ≥20% (binary).

Follow-up

When assessing non-response, we followed all patients from the cohort entry (i.e. b/tsDMARD start) until occurrence of treatment discontinuation due to non-response (i.e. the outcome), or censoring due to other treatment discontinuation, end of follow-up (15 months), or end of patient record, whichever happened first (Fig 1).

When assessing treatment response, we followed all patients from the cohort entry (i.e., b/tsDMARD start) until occurrence of 20% reduction in DAS28-esr (i.e. the outcome), or censoring due to any treatment discontinuation, end of follow-up (15 months), or end of patient record, whichever happened first (Fig 1).

Clustering

We clustered patients at cohort entry (i.e., start of b/tsDMARD) using deep embedded clustering (DEC) [17] in combination with the AnyNets-Autoencoder, an adaptive deep adaptive neural network, which is especially designed to work with medical data with missing values [18]. The adaptive architecture outperformed classical feed-forward neural networks in combination with imputation methods (a naive approach to deal with missing values by replacing them with defaults such as e.g. zeros) [19]. DEC simultaneously learns feature representations and cluster assignments in a lower-dimensional space (latent space) using the adaptive deep autoencoder in order to reconstruct the input and iteratively optimizes a clustering objective [20]. Details about the algorithm and the architecture can be found in S2 Text and S1 Fig. The number of clusters is not decided by the algorithm and has to be given as hyperparameter. We evaluated the found clusters using the silhouette score, which measures the quality of the cluster assignment based on how similar data points are to their own cluster compared to other clusters. The score ranges from -1 to +1, where a high value denotes a well matched and separated clustering.

Since the aim of this study was to find patient groups that may respond differently to certain treatments, one clustering run yielding only a few clusters will not lead to robust results in comparative effectiveness analyses. Thus, taking into account the performance of clustering and sample size, we decided for 3, 4, and 5 clusters whose response to different treatments we were then able to compare. The clustering was implemented using Python in which we used the deep learning library Pytorch.

In post-hoc analyses to compare obtained clusters, k-means clustering without deep learning was performed. We did not obtain separated clusters (S2 to S7 Figs).

Features

Features for patient clustering included patient demographics and life-style factors (e.g., physical activity, body mass index, smoking) that were considered time-invariant and thus captured using an ever before lookback window. Additional features such as other medication use (e.g., conventional synthetic [cs] DMARDs) were assessed at cohort entry. Laboratory markers (e.g., esr), other clinical feature (e.g., number of swollen/tender joints), and health measurement / disability scores (e.g., health assessment questionnaire) were assessed from a 6-month lookback window and if not present considered as missing. Detailed information on used features and the level of missingness is available in S1 Table.

Additional clustering runs to further increase robustness of our results accounted for the recency of the last measurement of laboratory markers, disease activity, and health measurement / disability scores by adding the time since measurement (within 6 months) as an additional input feature. Thus, given the pre-defined number of 3, 4, and 5 clusters and having separate runs not accounting and accounting for the recency of the last clinical measurements, we will obtain a total of 24 clusters (i.e. (3+4+5)*2 = 24) which was a good number in terms of achieving robust results and manual cluster handling and result interpretation.

Data analyses

Descriptive analyses.

The workflow of performed steps following the clustering can be seen in Fig 2. We described the patient characteristics overall and visually inspected each cluster. Because findings from individual clusters may be chance findings and each cluster was similar in patient characteristics to at least one other cluster, we grouped clusters manually according to most distinct patient characteristics. The grouping was based upon visual comparison of patient characteristics. Most extreme characteristics (i.e. highest and lowest frequencies/values) were marked, and by “putting these together”, groups were formed. An example with few clusters and patient characteristics is given in S3 Text. Significance testing of patient characteristics using ANOVA (Analysis of Variance) and subsequent Least Significance Difference tests was not feasible given the number of clusters and variables.

thumbnail
Fig 2. Work flow of performed steps.

The top part depicts a 2D latent representation of all patients using t-distributed Stochastic Neighbor Embedding from deep embedded clustering with 3 clusters, 4 clusters, and 5 clusters in separate runs once accounting for the recency of the last clinical measurements (input features #2) and once not (input features #1). Different colors denote the different cluster assignments. Information on input features are available in S1 Table.

https://doi.org/10.1371/journal.pcbi.1011073.g002

When clustering patients, we use the strength of machine learning techniques and include many features (i.e. patient characteristics). Using these results in clinical decision-making implies that all characteristics would need to be assessed at the moment of decision-making which is not feasible. Therefore, we further employ stratified analyses which make results applicable to clinical practice. Stratification means that patients are categorized according to few (most important) characteristics only and it is the preferred method to validate results obtained through clustering. In this study, we used the strength of clustering to set the basis to define the most distinct patient characteristics that set the different groups apart, which would not have been possible otherwise. Categories of these most distinct features were then derived as parameters for the stratified (validation) analyses (Fig 2).

Comparative effectiveness analyses.

In each of the 24 clusters, we performed comparative effectiveness analysis of b/tsDMARDs by using Cox proportional hazard regression analysis to estimate crude hazard ratios (HRs) with 95% confidence intervals (CIs) of treatment non-response, and treatment response separately. Because findings from individual clusters may be chance findings, we compared results among clusters belonging to the same group. Furthermore, to increase robustness of our reported findings, we only report significant results that were present in the maximum number of clusters of the same group (Fig 2).

Validation.

Finally, to validate obtained results, we repeated comparative effectiveness analyses in stratified analyses according to parameters derived in the descriptive analyses part. We estimated age- (and sex) adjusted HR with 95% CIs. We performed the description of patient clusters and comparative effectiveness analyses using SAS statistical software version 9.4 (NC, USA).

Results

Descriptive analyses

We identified 3516 patients with a first-time b/tsDMARD in SCQM between 1999 and 2018 (flow chart in S8 Fig). Selected patient characteristics of the overall population can be seen in Table 1. Patients had a mean age of 55.4 years and 76.2% of patients were women. The median RA duration of patients was 6.2 years, 69.3% were rheumatoid factor (RF) positive, the mean DAS28-esr was 4.3, and 66.6% of patients also used methotrexate at their first b/tsDMARD use.

thumbnail
Table 1. Selected patient characteristics at first-time b/tsDMARD.

https://doi.org/10.1371/journal.pcbi.1011073.t001

Among 24 obtained clusters, the clusters differed in size and comprised a minimum of 362 and a maximum of 1481 patients. Most distinct patient characteristics between clusters were sex, seropositivity, csDMARD use, prednisone use, measures of disease activity and pain, and disease duration. Visual inspection of patient characteristics of all 24 clusters led to their grouping into 5 distinct groups. Patient characteristics of each cluster (already grouped into 5 groups) can be seen in S2 to S6 Tables. The first group comprised clusters with highest frequencies of individual csDMARD and prednisone use (≥75%, ≥31%, ≥19%, and ≥47% of patients used methotrexate, leflunomide, and sulfasalazine, and prednisone, respectively, at b/tsDMARD initiation), and with a tendency towards seropositivity and no family history of rheumatic diseases. Parameters derived for the validation analysis of this group were the use of ≥2 csDMARDs and prednisone. The second group comprised clusters of a majority of men (>99%) which further displayed the highest frequency of smoking and mean body mass index (both assessed from an ever before lookback window). The parameter derived for the validation analysis of this group was being male. The third group comprised clusters of mainly seronegative patients (>99% RF negative) with lowest frequency of prednisone use (<40%) and a tendency towards a higher proportion of women (>76%). Parameters derived for the validation analysis of this group were being seronegative and without prednisone use, and in a separate analysis additionally being female. The fourth group comprised clusters of mainly seropositive patients (>80% RF positive), with a rather high disease burden (mean DAS28-esr >4.4, mean HAQ score >1.2, mean VAS >4.7 [out of 10], and activity of disease >4.9 [out of 10]), long disease duration (median of >8 years), and a tendency towards being female (≥77%). Parameters derived for the validation analysis of this group were being seropositive, with a DAS28-esr score >5.1, a HAQ score >1.5, pain levels >6, or activity of disease levels >6, and disease duration >8 years, and in a separate analysis additionally being female. The fifth group comprised clusters with a rather low disease burden (mean DAS28-esr ≤4.2, mean HAQ score ≤0.8, mean VAS ≤3.9 [out of 10], and activity of disease ≤4.2 [out of 10]) and with a tendency towards seropositivity and a higher proportion of women. Parameters derived for the validation analysis of this group were having a disease activity of a DAS28-esr score ≤3.2, a HAQ score <0.7, pain levels of <4 (of maximum 10), or activity of disease levels of <4 (of maximum 10), and in a separate analysis additionally being female and seropositive.

Comparative effectiveness analyses

S4 Text depicts comparative effectiveness results obtained per cluster (grouped into the 5 distinct groups for ease of comparison). To increase robustness of our results, we only reported significant results that were observed in the majority of clusters per group. Clusters in the first group (i.e., high frequency of use of csDMARDs and prednisone) responded well to tocilizumab (significant HRs of 2.55–2.67, count of outcomes: 6–11, depending on the cluster). Clusters in the second group (i.e., men) also responded well to tocilizumab (significant HRs of 6.78–8.54, count of outcomes: 5–7, depending on the cluster). Clusters in the third group (i.e., seronegative patients with a tendency towards low use of prednisone and a higher proportion of women) as well as clusters in the fourth group (i.e., mainly seropositive women with high disease burden and long duration) had a high risk of non-response with golimumab (significant HRs of 2.10–2.53 and of 2.45–4.54, count of outcomes: 19–24 and 8–12, respectively, depending on the clusters). Clusters in the fifth group (i.e., patients with low disease burden and with a tendency towards seropositivity and a higher proportion of women) responded well to golimumab (significant HRs of 2.56–3.15, count of outcomes: 5–9, depending on the cluster) and tocilizumab (significant HRs of 3.39–6.43, depending on the cluster).

Validation

Comparative effectiveness results in each group were validated by stratified analyses according to most distinct patient characteristics derived from each group to make results applicable to clinical practice. Tables 26 show comparative effectiveness results in each derived patient strata. First, among 728 patients with ≥2 csDMARDs and use of prednisone at index date, we confirmed the good response to tocilizumab with an age and sex adjusted HR of 5.46 (95% CI 1.76–16.94) when compared to adalimumab (Table 2). Furthermore, we observed a significant good response to golimumab and tofacitinib, and an increased risk of non-response to golimumab. Second, among 837 men, the good response to tocilizumab was confirmed with an age adjusted HR of 8.44 (95% CI 3.43–20.74) [Table 3]. Moreover, we observed a significant good response to golimumab and tofacitinib, but also a significant non-response to golimumab. Third, among 590 seronegative patients without use of prednisone, an increased risk of non-response with golimumab was not confirmed (age and sex adjusted HR of 1.98, 95% CI 0.94–4.19) [Table 4]. However, in a more specific population containing only women (n = 459), we observed a significant result of non-response with golimumab (age adjusted HR of 2.36, 95% CI 1.03–5.40). Fourth, among both 717 patients and 587 women with seropositivity, high disease burden and disease duration, we confirmed a high risk of non-response with golimumab with adjusted HR of 3.75 (95% CI 1.54–9.12) and of 5.27 (95% CI 2.10–13.21), respectively [Table 5]. Fifth, among both 2466 patients with low disease burden and 1313 seropositive women with low disease burden, we confirmed a good response to golimumab with adjusted HRs of 2.72 (95% CI 1.63–4.57) and of 2.39 (95% CI 1.06–5.41), respectively (Table 6). In these strata, we further confirmed a good response to tocilizumab with adjusted HRs of 3.64 (95% CI 2.04–6.49) and of 2.98 (95% CI 1.31–6.74), respectively (Table 6). Furthermore, we observed significant good responses to tofacitinib, and a significant non-response to golimumab.

thumbnail
Table 2. Comparative effectiveness analyses in strata of patients with at least 2 csDMARDs and use of prednisone.

https://doi.org/10.1371/journal.pcbi.1011073.t002

thumbnail
Table 3. Comparative effectiveness analyses in men stratum.

https://doi.org/10.1371/journal.pcbi.1011073.t003

thumbnail
Table 4. Comparative effectiveness analyses in strata of seronegative patients without use of prednisone, and more specifically, among women only.

https://doi.org/10.1371/journal.pcbi.1011073.t004

thumbnail
Table 5. Comparative effectiveness analyses in the stratum of patients with seropositivity, with high disease activity and disease duration, and additionally among women only.

https://doi.org/10.1371/journal.pcbi.1011073.t005

thumbnail
Table 6. Comparative effectiveness analyses in the stratum of patients with low disease activity, and additionally among women with seropositivity only.

https://doi.org/10.1371/journal.pcbi.1011073.t006

Discussion

Through deep embedded clustering, we obtained 5 distinct RA patient groups which turned out to respond differently to certain b/tsDMARDs. Results were validated using stratified analyses with fewer, most important features to make results applicable for clinical practice.

A strength of this study is that we used deep learning-based clustering that was shown to perform better than regular clustering [21]. We included temporal trends of characteristics by adding the time since the last measurement as a feature into the second clustering run. Yet, incorporating so called attentions into our clustering algorithm leading towards a transformer architecture may have been even more powerful [22]. While the use of clustering in this study also implied an exploratory approach (since we did not know whether obtained groups may prove useful), manyfold exploratory comparative effectiveness analyses among strata of various combinations of patient characteristics will be impracticable. A further strength includes the use of several clustering runs that obtained 2 dozen different clusters which we grouped according to similar patient characteristics. We admit that this process made the study more complex and potentially less reproducible because the visual inspection of clusters to form distinct groups may lead to different grouping depending on who is inspecting the clusters. A continuation of this work could include the automation of grouping of clusters through learning of a meta-classifier for example. However, such a program does not exist yet. While grouping of clusters may take away some information on clusters that machine learning techniques provided, results also need to be robust. The comparison of comparative effectiveness results among groups of similar clusters made our results less prone to chance findings. Having assessed similar but not identical clusters can be considered like a type of re-sampling. The overlap between clusters was indeed desired and needed to obtain robust results insensitive to minor variation. This was important because sample size of individual clusters was small. Finally, since the particular b/tsDMARD agent was not a feature in the clustering and a study in SCQM suggested no changes in patient characteristics at bDMARD initiation over time, [23] our cluster results have not been biased from time trends. Furthermore, we can also likely rule out bias from patient and physician expectations because there is no reason for a differentially distributed expectation between individual b/tsDMARD treatments since EULAR RA treatment guidance does not commend one over another [2].

While machine learning is a powerful tool to detect patterns not visible to the human eye, the use of obtained clusters in clinical practice is not feasible because clusters display an abundance of features. In clinical practice, we need simple criteria–if possible–by which to categorize patients in whom treatment response may differ. Since stratification is the preferred method to validate results obtained in patient clusters, we attempted this in this study. However, also stratification was a manual process subject to subjectivity. And since sample size was small, we were limited in the number of strata. Thus, maybe we simplified the strata too much. However, more strata mean more multiple testing. Thus, it was a trade-off to have meaningful strata according to most distinct patient characteristics while keeping the number of strata to a minimum. Moreover, despite small sample size, we observed congruent findings between comparative effectiveness analyses among clusters and validation strata. Yet, also the process to derive most important characteristics may be improved and automatized in future studies to ensure reproducibility. A further limitation of our study is that we did not have an external cohort at hand to further validate our results externally. Thus, caution is warranted when using our findings in clinical practice. Furthermore, to assess a change in DAS is not a stringent outcome and can only be considered an approximation of treatment response. However, it allowed us to increase sample size without which an investigation in this direction would not have been possible. Finally, we were not able to adjust our analyses for confounding variables like obesity because of the small sample size. However, crude and age/sex adjusted HR were identical for most of our results. Thus, we do not expect an influence on our results from other measured or non-measured confounding. While time-varying confounder adjustment may be desirable in comparative effectiveness analyses in rheumatologic conditions, markers of pain, disability, and disease activity often lie on the causal pathway between exposure and outcome and should therefore not be included in the model.

Obtained clusters in three recent studies using k-means clustering to identify RA phenotypes were mainly driven by disease activity and comorbidities, sex and comorbidities, and disease activity and health care costs, respectively [1113]. In our study, we did not assess comorbidities or health care costs, but we can confirm the importance of disease activity and sex, and we additionally identified seropositivity, disease duration, and other RA medication use at b/tsDMARD start as important variables for differentiating between clusters. It seems that the RA population in the cited studies were rather diverse and not linked to RA treatment, whereas we clustered our patients at b/tsDMARD start. However, despite using different clustering techniques and patient populations, the overlap between results is promising. Our post-hoc approach using k-means clustering did not yield separate clusters which suggests that deep learned clusters may be superior to k-means clusters when working with missing data which is usually the case in health care data.

Epidemiologic studies to date have identified predictors of treatment response (or non-response) to certain b/tsDMARDs [2428]. We compared our results with those because there are no studies that identified response to b/tsDMARDs among RA phenotypes that consist of more than one trait. Observed non-response to golimumab among seronegative women without use of prednisone as well as seropositive patients with higher disease burden and duration may suggest to avoid this treatment in these patients. However, no other significant findings were identified in these strata to help with treatment choice. Thus, adalimumab, the reference treatment may be a suggestion; especially since a study which assessed patient characteristics associated with treatment response to adalimumab suggested also that concomitant csDMARD use as well as a high disease burden was predictive of treatment response [24].

A study assessing drug survival of golimumab within 2 years was not able to identify predictors of response to golimumab [25]. In our study, 3 strata (patients with ≥2 csDMARDs and prednisone use at b/tsDMARD initiation, male patients, as well as patients with a lower disease burden) yielded a good response to golimumab. Yet, it is likely that response was not sufficient to persist on golimumab therapy given the equally increased risk of treatment discontinuation with golimumab. This may have been due to our outcome proxy for good response, which was defined as a ≥20% reduction in DAS28-esr which only requires small improvement but yielded a sufficient number of outcomes to perform analyses.

The same 3 strata (patients with ≥2 csDMARDs and prednisone use at b/tsDMARD initiation, male patients, as well as patients with a lower disease burden) further yielded a good response to tofacitinib and tocilizumab. Our results are not consistent with another study suggesting that a higher disease burden is a predictor of treatment response to tocilizumab, however the outcome definition was different and the response was assessed after only 6 months [26]. Nonetheless, attention should be paid to tofacitinib which was recently shown to have an early effect on pain reduction [29,30] which may work most efficiently in these 3 strata (i.e. patients with ≥2 csDMARDs and prednisone use at b/tsDMARD initiation, male patients, as well as patients with a lower disease burden).

This is the first study to combine machine learning with comparative effectiveness analyses. Thereby, we hope to inspire other investigators to repeat our approach or to think of other good ways how to personalize treatments for RA patients by using all methods available (i.e., bridging machine learning with conventional statistical methods).

Conclusion

This study used deep learning to suggest RA patient groups which responded differently to certain b/tsDMARDs. Results were validated through stratified analyses according to few most distinct patient characteristics for application in clinical practice. Our results suggests optimal first-line b/tsDMARD use in certain patient groups which is a step forward towards personalizing treatment in RA patients. However, further research in other cohorts is needed to verify our results.

Supporting information

S1 Text. More information on Swiss Clinical Quality Management in Rheumatic Diseases (SCQM) registry.

https://doi.org/10.1371/journal.pcbi.1011073.s001

(DOC)

S2 Text. Clustering—Deep Neural Network Architecture and Training.

https://doi.org/10.1371/journal.pcbi.1011073.s002

(DOC)

S3 Text. Example of approach how cluster groups were formed.

https://doi.org/10.1371/journal.pcbi.1011073.s003

(DOC)

S4 Text. Comparative effectiveness analyses in each of the 24 clusters.

https://doi.org/10.1371/journal.pcbi.1011073.s004

(DOC)

S1 Fig. Scheme of the adaptive deep neural network architecture AnyNet-Autoencoder with deep embedded clustering.

https://doi.org/10.1371/journal.pcbi.1011073.s005

(TIF)

S2 Fig. Results of post hoc K-means clustering using specification #1 requesting 3 clusters.

https://doi.org/10.1371/journal.pcbi.1011073.s006

(TIF)

S3 Fig. Results of post hoc K-means clustering using specification #1 requesting 4 clusters.

https://doi.org/10.1371/journal.pcbi.1011073.s007

(TIF)

S4 Fig. Results of post hoc K-means clustering using specification #1 requesting 5 clusters.

https://doi.org/10.1371/journal.pcbi.1011073.s008

(TIF)

S5 Fig. Results of post hoc K-means clustering using specification #2 requesting 3 clusters.

https://doi.org/10.1371/journal.pcbi.1011073.s009

(TIF)

S6 Fig. Results of post hoc K-means clustering using specification #2 requesting 4 clusters.

https://doi.org/10.1371/journal.pcbi.1011073.s010

(TIF)

S7 Fig. Results of post hoc K-means clustering using specification #2 requesting 5 clusters.

https://doi.org/10.1371/journal.pcbi.1011073.s011

(TIF)

S8 Fig. Flowchart of the study population.

https://doi.org/10.1371/journal.pcbi.1011073.s012

(TIF)

S1 Table. Features used for clustering at cohort entry including variable type, missingness, lookback window, and clustering run.

https://doi.org/10.1371/journal.pcbi.1011073.s013

(DOC)

S2 Table. Clusters with high use of conventional synthetic DMARDs and prednisone.

https://doi.org/10.1371/journal.pcbi.1011073.s014

(DOC)

S4 Table. Clusters of seronegative patients with a tendency towards low use of prednisone and a higher proportion of women.

https://doi.org/10.1371/journal.pcbi.1011073.s016

(DOC)

S5 Table. Clusters of mainly seropositive patients with high RA disease burden and long RA disease duration, and with a tendency towards a higher proportion of women.

https://doi.org/10.1371/journal.pcbi.1011073.s017

(DOC)

S6 Table. Clusters of patients with a rather low RA disease burden and with a tendency towards seropositivity and a higher proportion of women.

https://doi.org/10.1371/journal.pcbi.1011073.s018

(DOC)

Acknowledgments

We thank all patients and rheumatologists contributing to the SCQM registry, as well as the entire SCQM staff. A list of rheumatology offices and hospitals which contribute to the SCQM registry and a list of supporters of SCQM can be found at http://www.scqm.ch.The professorship of Andrea M Burden is partially supported by PharmaSuisse and the ETH Foundation.

References

  1. 1. Smolen JS, Aletaha D, Barton A, Burmester GR, Emery P, Firestein GS, et al. Rheumatoid arthritis. Nat Rev Dis Primers. 2018;4:1–23.
  2. 2. Smolen JS, Landewé RBM, Bijlsma JWJ, Burmester GR, Dougados M, Kerschbaumer A, et al. EULAR recommendations for the management of rheumatoid arthritis with synthetic and biological disease-modifying antirheumatic drugs: 2019 update. Ann Rheum Dis. 2020;1–15.
  3. 3. Cuppen BVJ, Welsing PMJ, Sprengers JJ, Bijlsma JWJ, Marijnissen ACA, Van Laar JM, et al. Personalized biological treatment for rheumatoid arthritis: A systematic review with a focus on clinical applicability. Rheumatology (United Kingdom). 2016;55(5):826–39. pmid:26715775
  4. 4. Lu MT, Ivanov A, Mayrhofer T, Hosny A, Aerts HJWL, Hoffmann U. Deep Learning to Assess Long-term Mortality from Chest Radiographs. JAMA Netw Open. 2019;2(7):1–14. pmid:31322692
  5. 5. Choi GH, Yun J, Choi J, Lee D, Shim JH, Lee HC, et al. Development of machine learning-based clinical decision support system for hepatocellular carcinoma. Sci Rep. 2020;10(1):1–10.
  6. 6. Hügle M, Omoumi P, van Laar JM, Boedecker J, Hügle T. Applied machine learning and artificial intelligence in rheumatology. Rheumatol Adv Pract. 2021;4(1):1–10.
  7. 7. Kalweit M, Walker UA, Finckh A, Müller R, Kalweit G, Scherer A, et al. Personalized prediction of disease activity in patients with rheumatoid arthritis using an adaptive deep neural network. PLoS One. 2021;16(6):e0252289. pmid:34185794
  8. 8. Guan Y, Zhang H, Quang D, Wang Z, Parker SCJ, Pappas DA, et al. Machine Learning to Predict Anti–Tumor Necrosis Factor Drug Responses of Rheumatoid Arthritis Patients by Integrating Clinical and Genetic Markers. Arthritis and Rheumatology. 2019;71(12):1987–96. pmid:31342661
  9. 9. Artacho A, Isaac S, Nayak R, Flor-Duro A, Alexander M, Koo I, et al. The Pretreatment Gut Microbiome Is Associated With Lack of Response to Methotrexate in New-Onset Rheumatoid Arthritis. Arthritis and Rheumatology. 2021;73(6):931–42.
  10. 10. Lee YC, Frits ML, Iannaccone CK, Weinblatt ME, Shadick NA, Williams DA, et al. Subgrouping of rheumatoid arthritis patients based on pain, fatigue, inflammation and psychosocial factors. Arhtiritis Rheumatol. 2014;66(8):2006–14.
  11. 11. Curtis JR, Weinblatt M, Saag K, Bykerk VP, Furst DE, Fiore S, et al. Data-Driven Patient Clustering and Differential Clinical Outcomes in the Brigham and Women’s Rheumatoid Arthritis Sequential Study Registry. Arthritis Care Res (Hoboken). 2021;73(4):471–80. pmid:33002337
  12. 12. Jung S, Park K s, Kim K j, Min Jung S, Park KS, Kim KJ. Clinical phenotype with high risk for initiation of biologic therapy in rheumatoid arthritis: a data-driven cluster analysis Unsupervised clustering of RA / S.M. Jung et al [Internet]. Vol. 39, Clinical and Experimental Rheumatology. 2022. Available from: www.r-project.org.
  13. 13. Mars N, Kerola AM, Kauppi MJ, Pirinen M, Elonheimo O, Sokka-Isler T. Cluster analysis identifies unmet healthcare needs among patients with rheumatoid arthritis. Scand J Rheumatol. 2022;51(5):355–62. pmid:34511040
  14. 14. Orange DE, Agius P, Dicarlo EF, Robine N, Geiger H, Szymonifka J, et al. Identification of Three Rheumatoid Arthritis Disease Subtypes by Machine Learning Integration of Synovial Histologic Features and RNA Sequencing Data. Arthritis Rheumatol. 2018;70(5):690–701. pmid:29468833
  15. 15. Uitz E, Fransen J, Langenegger T, Stucki G. Clinical quality management in rheumatoid arthritis: Putting theory into practice. Rheumatology. 2000;39(5):542–9.
  16. 16. Finckh A, Simard JF, Duryea J, Liang MH, Huang J, Daneel S, et al. The effectiveness of anti-tumor necrosis factor therapy in preventing progressive radiographic joint damage in rheumatoid arthritis: A population-based study. Arthritis Rheum. 2006;54(1):54–9. pmid:16385495
  17. 17. Guo X, Gao L, Liu X, Yin J. Improved deep embedded clustering with local structure preservation. IJCAI International Joint Conference on Artificial Intelligence. 2017;0:1753–9.
  18. 18. Kalweit M, Kalweit G, Boedecker J. AnyNets: Adaptive Deep Neural Networks for Medical Data with Missing Values. Accepted at IJCAI 2021 Workshop on Artificial Intelligence for Function, Disability, and Health. 2nd Workshop on Artificial Intelligence for Function, Disability, and Health. 2021;
  19. 19. Roderick J. A. Little DBR. Statistical Analysis with Missing Data, 2nd Edition. In: John WileySons Inc. 2014.
  20. 20. Baldi P. Autoencoders, Unsupervised Learning, and Deep Architectures. ICML Unsupervised and Transfer Learning. 2012;37–50.
  21. 21. Karim MR, Beyan O, Zappa A, Costa IG, Rebholz-Schuhmann D, Cochez M, et al. Deep learning-based clustering approaches for bioinformatics. Brief Bioinform. 2021;22(1):393–415. pmid:32008043
  22. 22. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention Is All You Need. 2017 Jun 12; Available from: http://arxiv.org/abs/1706.03762.
  23. 23. Zufferey P, Dudler J, Scherer A, Finckh A. Disease activity in rheumatoid arthritis patients at initiation of biologic agents and 1 year of treatment: Results from the Swiss SCQM registry. Joint Bone Spine [Internet]. 2013;80(2):160–4. Available from: pmid:22771133
  24. 24. Kleinert S, Tony HP, Krause A, Feuchtenberger M, Wassenberg S, Richter C, et al. Impact of patient and disease characteristics on therapeutic success during adalimumab treatment of patients with rheumatoid arthritis: Data from a German noninterventional observational study. Rheumatol Int. 2012;32(9):2759–67. pmid:21822659
  25. 25. Pers YM, Fortunet C, Constant E, Lambert J, Godfrin-Valnet M, De Jong A, et al. Predictors of response and remission in a large cohort of rheumatoid arthritis patients treated with tocilizumab in clinical practice. Rheumatology (United Kingdom). 2014;53(1):76–84.
  26. 26. Iannone F, Santo L, Anelli MG, Bucci R, Semeraro A, Quarta L, et al. Golimumab in real-life settings: 2 Years drug survival and predictors of clinical outcomes in rheumatoid arthritis, spondyloarthritis, and psoriatic arthritis. Semin Arthritis Rheum. 2017;47(1):108–14. pmid:28216195
  27. 27. Rubbert-Roth A, Aletaha D, Devenport J, Sidiropoulos PN, Luder Y, Edwardes MD, et al. Effect of disease duration and other characteristics on efficacy outcomes in clinical trials of tocilizumab for rheumatoid arthritis. Rheumatology (United Kingdom). 2021;60(2):682–91. pmid:32844216
  28. 28. Maneiro RJ, Salgado E, Carmona L, Gomez-Reino JJ. Rheumatoid factor as predictor of response to abatacept, rituximab and tocilizumab in rheumatoid arthritis: Systematic review and meta-analysis. Semin Arthritis Rheum. 2013;43(1):9–17. pmid:23290690
  29. 29. Karateev A, Filatova E, Pogozheva E, Amirdzhanova V, Nasonov E, Lila A, et al. THU0206 А VERY EARLY (7–28 DAYS) RESPONSE ON JAK INHIBITOR TOFACITINIB IN PATIENTS WITH ACTIVE RHEUMATOID ARTHRITIS: EFFECT ON PAIN AND CENTRAL SENSITIZATION. Ann Rheum Dis. 2020;79(Suppl 1):327.
  30. 30. Joseph C, Bilgrami SM, Ottewell L, Gray L, Mitchell W, Wood F. THU0205 RESPONSE TO SMALL MOLECULES IS MOSTLY DRIVEN BY PATIENT GLOBAL ASSESSMENT OF DISEASE: A REAL WORLD OBSERVATION. Ann Rheum Dis. 2020;79(Suppl 1):327.