
Treatment effect modification due to comorbidity: Individual participant data meta-analyses of 120 randomised controlled trials



People with comorbidities are underrepresented in clinical trials. Empirical estimates of treatment effect modification by comorbidity are lacking, leading to uncertainty in treatment recommendations. We aimed to produce estimates of treatment effect modification by comorbidity using individual participant data (IPD).

Methods and findings

We obtained IPD for 120 industry-sponsored phase 3/4 trials across 22 index conditions (n = 128,331). Trials had to be registered between 1990 and 2017 and have recruited ≥300 people. Included trials were multicentre and international. For each index condition, we analysed the outcome most frequently reported in the included trials. We performed a two-stage IPD meta-analysis to estimate modification of treatment effect by comorbidity. First, for each trial, we modelled the interaction between comorbidity and treatment arm adjusted for age and sex. Second, for each treatment within each index condition, we meta-analysed the comorbidity–treatment interaction terms from each trial. We estimated the effect of comorbidity measured in 3 ways: (i) the number of comorbidities (in addition to the index condition); (ii) presence or absence of the 6 commonest comorbid diseases for each index condition; and (iii) using continuous markers of underlying conditions (e.g., estimated glomerular filtration rate (eGFR)). Treatment effects were modelled on the usual scale for the type of outcome (absolute scale for numerical outcomes, relative scale for binary outcomes). Mean age in the trials ranged from 37.1 (allergic rhinitis trials) to 73.0 (dementia trials) and the percentage of male participants ranged from 4.4% (osteoporosis trials) to 100% (benign prostatic hypertrophy trials). The percentage of participants with 3 or more comorbidities ranged from 2.3% (allergic rhinitis trials) to 57% (systemic lupus erythematosus trials). We found no evidence of modification of treatment efficacy by comorbidity, for any of the 3 measures of comorbidity. This was the case for 20 conditions for which the outcome variable was continuous (e.g., change in glycosylated haemoglobin in diabetes) and for 3 conditions in which the outcomes were discrete events (e.g., number of headaches in migraine).
Although all were null, estimates of treatment effect modification were more precise in some cases (e.g., sodium-glucose co-transporter-2 (SGLT2) inhibitors for type 2 diabetes—interaction term for comorbidity count 0.004, 95% CI −0.01 to 0.02) while for others credible intervals were wide (e.g., corticosteroids for asthma—interaction term −0.22, 95% CI −1.07 to 0.54). The main limitation is that these trials were not designed or powered to assess variation in treatment effect by comorbidity, and relatively few trial participants had >3 comorbidities.


Assessments of treatment effect modification rarely consider comorbidity. Our findings demonstrate that for trials included in this analysis, there was no empirical evidence of treatment effect modification by comorbidity. The standard assumption used in evidence syntheses is that efficacy is constant across subgroups, although this is often criticised. Our findings suggest that for modest levels of comorbidities, this assumption is reasonable. Thus, trial efficacy findings can be combined with data on natural history and competing risks to assess the likely overall benefit of treatments in the context of comorbidity.

Author summary

Why was this study done?

  • There is often uncertainty about how treatments for single conditions should be applied to people with 2 or more long-term conditions (multimorbidity).
  • People with multimorbidity are underrepresented in randomised controlled trials (RCTs), and trials rarely report whether the efficacy of treatment differs by the number of additional long-term conditions (comorbidities) or in the presence of specific comorbidities.

What did the researchers do and find?

  • We analysed individual-participant data from 120 RCTs including 128,331 participants across 23 index conditions to assess whether the efficacy of treatment differed depending on the number of comorbidities or in the presence of any of the most common comorbidities.
  • We found no evidence that treatment efficacy differed depending on the number of comorbidities, or by any specific comorbidities, for any of the index conditions and treatment comparisons included in this analysis.

What do these findings mean?

  • Within the range of comorbidities included within these trials, treatment effects did not vary by comorbidity. These findings can be used within evidence syntheses to estimate the likely overall benefit of treatments in the context of multimorbidity.
  • These findings are limited by the fact that people with multiple comorbidities are underrepresented in trials, and those with the highest degree of comorbidity are often excluded.


Multimorbidity, the presence of 2 or more long-term conditions, is a global clinical and public health priority [1,2]. Most people with a given long-term condition also have comorbidities (referring to additional long-term conditions in the context of an index condition). There is uncertainty about how individual long-term conditions should be managed in the presence of comorbidities [3]. A major driver of this uncertainty is the underrepresentation of people with multimorbidity in randomised controlled trials (RCTs) [4,5]. Trial populations are typically younger, healthier, and have fewer comorbidities than people treated in routine clinical practice. This has led clinical guideline developers to caution against the application of single-disease recommendations for people with multimorbidity [6]. However, despite the challenges to clinical management posed by this uncertainty, the efficacy of treatments in the context of comorbidity is rarely assessed. It is therefore not clear, for most treatments, whether relative treatment efficacy differs in people with comorbidity.

Assessing individual differences in response to medical treatments is a controversial topic. Differences in treatment efficacy are typically assessed using subgroup analyses. Subgroup analyses in RCTs seek to assess if treatment efficacy differs by patient characteristics [7]. Testing of prespecified subgroup effects is common practice in RCTs of medical therapies [8,9]. As such, subgroup analyses seek to inform stratified approaches to patient care by identifying groups for whom recommendations may be tailored [10]. However, trials rarely report subgroup analyses by levels of comorbidity or for specific comorbidities. Furthermore, subgroup analyses are inconsistently executed and reported and suffer from a number of well-documented statistical pitfalls [7,11], notably the risk of both false positive and false negative findings [11]. RCTs are generally not powered to detect subgroup effects, and as such, the sample size in subgroup analyses is frequently insufficient to detect clinically significant differences in treatment efficacy even if these were to exist [12]. Conversely, by testing multiple subgroups, the likelihood of chance findings (i.e., false positives) is increased [7,12].

The limitations of trial-level subgroup analyses can be reduced using meta-analyses. However, when considering whether treatment efficacy varies by comorbidity, traditional study-level meta-analyses of published findings are likely to be inadequate as trials rarely report subgroup effects by comorbidity, and those that do may be subject to publication bias. In such circumstances, any assessment of the effect of comorbidity is therefore based on between-trial comparisons that are prone to bias [13]. Individual-participant data meta-analysis has the potential to overcome these problems. We previously demonstrated, using data from >100 industry-sponsored clinical trials, that it was possible to identify comorbidities in most trials and that multimorbidity was common (although underrepresented) in trial populations [4,14,15]. Furthermore, in a recent simulation study, we demonstrated that combining trials on all comparisons for a given indication in Bayesian hierarchical models has several desirable properties in terms of estimating treatment effect modification by comorbidity [16]. First, precision is higher compared to single-comparison meta-analyses, increasing the likelihood of detecting small (but clinically relevant) subgroup effects where these are present. Second, extreme values are attenuated towards the null (shrinkage), reducing the risk of false positive findings [16]. Bayesian hierarchical models may therefore be a useful tool to assess treatment efficacy estimates in the context of multimorbidity.

This study aims to assess whether treatment effects are modified in the presence of comorbidity, using individual participant data (IPD) from 120 trials to assess whether treatment efficacy for 23 index conditions differs by (i) the number of additional long-term conditions (comorbidity count); (ii) the 6 commonest comorbidities for each index condition; and (iii) continuous biomarkers associated with comorbidity.


Study design

For trials of 23 index conditions, we identified comorbid long-term conditions using IPD for each trial. We then summarised these as a comorbidity count (in addition to the index condition) for each participant. Further, we identified the 6 commonest comorbidities for each index condition across trials and defined a presence/absence variable for each. We estimated differences in treatment efficacy by fitting regression models to IPD for each trial to obtain trial-level estimates of covariate–treatment interaction effects. We fit models for age and sex alone, for a comorbidity count, and for each of the 6 commonest comorbidities for each index condition. Trial-level estimates were then meta-analysed to obtain drug and index condition-specific estimates of treatment effect modification by comorbidity. This process is summarised in Fig 1 and explained in detail below.

Fig 1. Overview of analysis.

This figure gives an overview of the analysis structure and hierarchy. Analyses of individual-level data within each trial were conducted within 2 secure repositories (YODA shown in pink and CSDR shown in orange). For each trial, a summary of the results was exported and meta-analysed within each treatment indication (green). CSDR, Clinical Study Data Request; YODA, Yale Open Data Access.

All analyses were conducted in R (R Core Team, 2021). Analysis code, metadata (indicating, for example, how treatment arms and outcomes were selected), and data (except trial IPD) are available on the project GitHub repository.

Data sources

Trials were identified according to a prespecified protocol [17]. We focused on trials of pharmacological agents for 23 index conditions (Table 1). Eligibility criteria were RCTs for one of the index conditions; registered with the United States clinical trials registry (ClinicalTrials.gov) on or after January 1990; phase 2/3, 3, or 4; including ≥300 participants; and with eligibility defined using an upper age limit of 60 years or more or no upper age limit. Smaller studies and studies with an upper age limit below 60 years were excluded as they were considered less likely to include sufficient people with comorbidity. From a list of all registered, eligible trials, we then identified trials for which IPD were available from one of 2 repositories: Clinical Study Data Request (CSDR) or the Yale Open Data Access (YODA) repository. These repositories facilitate sharing of industry-sponsored trial data with third-party researchers. The process of trial identification is described in detail elsewhere [4].
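
The eligibility criteria above can be read as a simple filter over registry records. The following is a minimal sketch only: the `TrialRecord` class and its field names are hypothetical and do not correspond to the registry's actual schema or to the project's code, and the check that a trial addresses one of the index conditions is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Phases named in the eligibility criteria.
ELIGIBLE_PHASES = {"2/3", "3", "4"}


@dataclass
class TrialRecord:
    """Hypothetical registry record; field names are illustrative only."""
    registration_year: int
    phase: str
    enrolment: int
    upper_age_limit: Optional[int]  # None means no upper age limit


def is_eligible(trial: TrialRecord) -> bool:
    """Apply the registration, phase, size, and age-limit criteria."""
    return (
        trial.registration_year >= 1990
        and trial.phase in ELIGIBLE_PHASES
        and trial.enrolment >= 300
        and (trial.upper_age_limit is None or trial.upper_age_limit >= 60)
    )
```

A trial with 450 participants and no upper age limit would pass this filter, whereas the same trial with an upper age limit of 55 years would not.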

Table 1. Index conditions, outcomes, and treatment comparisons for included trials.

Quantifying comorbidity

For each participant with a specified index condition in each of the included trials, we identified comorbidities from a prespecified list of 21 conditions (cardiovascular disease, chronic pain, arthritis, affective disorders, acid-related disorders, asthma/chronic obstructive pulmonary disease (COPD), diabetes mellitus, osteoporosis, thyroid disease, thromboembolic disease, inflammatory conditions, benign prostatic hyperplasia, gout, glaucoma, urinary incontinence, erectile dysfunction, psychotic disorders, epilepsy, migraine, parkinsonism, and dementia) [4]. These comorbidities were based on previous work identifying comorbidities within trial IPD and were based on assessment of medical history and concomitant medication data. In this previous work, we demonstrated that while for many trials medical history had been redacted, data on concomitant medications were widely available and could be used to define comorbidities [4]. This involved combining some conditions into the same definition (e.g., asthma and COPD, which could not be differentiated based on medication use alone). These definitions were based on the World Health Organisation Anatomical Therapeutic Chemical (ATC) classification and are described in our previous publication and available on the project GitHub repository [4]. Where medical history data were available and coded using the Medical Dictionary for Regulatory Activities (MedDRA) coding system, we also identified the same conditions using MedDRA codes.

Comorbidity count

For the primary analysis, we created a comorbidity count for each participant. This was the total number of comorbidities present, not including the index condition. This count was used as a numerical variable in all analyses.

Individual comorbidities

For each index condition, we also identified the 6 most common comorbidities from the full list of 21 possible comorbidities. These individual comorbidities were analysed as binary variables (reflecting the presence or absence of that specific comorbidity).
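
As an illustration of the two comorbidity measures described above, the sketch below derives a comorbidity count and presence/absence flags from per-participant condition sets. The function names and the input layout (one set of condition labels per participant) are assumptions made for this example, not the structure of the trial data or of the project's code.

```python
from collections import Counter


def comorbidity_count(conditions, index_condition):
    """Number of conditions present, excluding the index condition."""
    return len(set(conditions) - {index_condition})


def commonest_comorbidities(participants, index_condition, n=6):
    """Return the n most frequent comorbidities across participants."""
    tally = Counter()
    for conditions in participants:
        tally.update(set(conditions) - {index_condition})
    return [name for name, _ in tally.most_common(n)]


def presence_flags(conditions, selected):
    """Binary presence/absence variable for each selected comorbidity."""
    present = set(conditions)
    return {name: name in present for name in selected}
```

For a diabetes trial, a participant recorded with diabetes mellitus, gout, and cardiovascular disease would have a comorbidity count of 2.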

Selected biomarkers/risk factors

In addition to the 21 comorbidities defined using medication and/or medical history, we identified 5 continuous biomarkers that may indicate comorbidity (e.g., renal impairment, hypertension, anaemia, or liver disease) or risk factors (e.g., obesity). These were based on baseline trial measurements: estimated glomerular filtration rate (eGFR, as a marker of renal impairment, taken from trial data where this was available and calculated from creatinine, age, sex, and race using the Modification of Diet in Renal Disease (MDRD) equations if it was not), body mass index (as recorded or calculated based on height and weight), fibrosis-4 (FIB-4) index (as a marker of liver disease calculated from aspartate aminotransferase, alanine transaminase, and platelet counts), haemoglobin, and mid-blood pressure (MBP, defined as 0.5 × (systolic blood pressure + diastolic blood pressure)).
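
The derived biomarkers can be sketched as follows. Several details are assumptions rather than statements from the text: the FIB-4 function uses the standard published formula, which also includes age; the eGFR function uses the 4-variable MDRD equation with the coefficient 175 (the version for standardised creatinine assays), although the text does not say which MDRD version was applied; and all function and parameter names are illustrative.

```python
import math


def mid_blood_pressure(systolic, diastolic):
    """MBP as defined in the text: 0.5 x (systolic + diastolic)."""
    return 0.5 * (systolic + diastolic)


def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2 from weight and height."""
    return weight_kg / height_m ** 2


def fib4_index(age_years, ast_u_per_l, alt_u_per_l, platelets_1e9_per_l):
    """Standard FIB-4 formula: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


def egfr_mdrd(creatinine_mg_dl, age_years, female, black):
    """4-variable MDRD eGFR in mL/min/1.73 m^2 (coefficient 175 assumed)."""
    egfr = 175.0 * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr
```

For example, a systolic pressure of 120 mm Hg and a diastolic pressure of 80 mm Hg give an MBP of 100 mm Hg.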


Age and sex were extracted from each trial based on the trial recorded values at randomisation.

Treatment arms

Treatment arm comparisons were prespecified prior to undertaking the outcome analyses. For multiarm trials, the most extreme arms were selected for comparison (e.g., if different dosages were used, the highest dose was compared to placebo or usual care—e.g., canagliflozin 300 mg, rather than 100 mg, versus placebo). Where placebo or usual care was included as a trial arm, this was selected as the comparator. Otherwise, we chose the arm with the least recently developed treatment as the comparator arm. This was to give the best chance of identifying effect modification, with the resulting analysis representing an upper limit on the degree of effect modification observed.


We aimed to identify outcomes common across trials to facilitate meta-analysis. We obtained information from ClinicalTrials.gov via the Database for Aggregate Analysis of ClinicalTrials.gov (AACT) on all outcomes (primary and secondary) for each trial. For each index condition, we then identified 1 or more outcomes that appeared to be common to multiple trials (e.g., forced expiratory volume in 1 s (FEV1) in COPD trials, 6-min walk distance (6MWD) in pulmonary hypertension trials). Within the trial repositories, we then reviewed the trial documentation to identify these outcomes for each trial. For trials of anticoagulants, in addition to the efficacy outcome, we also analysed bleeding events as these are a common and clinically important adverse outcome.

Statistical analyses

In 4 separate analyses we (i) estimated age and sex–treatment interactions without including comorbidity; (ii) estimated comorbidity–treatment interactions for the comorbidity count; (iii) estimated comorbidity–treatment interactions for the 6 commonest comorbidities for each index condition; and (iv) examined covariate–treatment interactions for continuous biomarkers. Full descriptions of the modelling are provided in the Supporting information appendix (S1 File) and are described briefly below.

IPD analysis

For trials where the outcome was a continuous variable, for each trial and analysis the change in each outcome was modelled using linear regression. For analysis (i), the final measure was regressed on the baseline measure, age (modelled as a continuous variable scaled to 15-year increments, which was close to the standard deviation for most trials), sex (male versus referent group of females), arm (binary variable treatment/control), and interactions with arm for each covariate. For American College of Rheumatology-N (ACR-N, a measure of improvement in disease activity in rheumatoid arthritis, which is itself a measure of change), we did not include the baseline measure as a covariate. We then repeated this modelling for the remaining analyses (ii to iv) adding comorbidity covariates in addition to age and sex (comorbidity count, specific comorbidities, and continuous biomarkers for analyses (ii to iv), respectively). From these models, the model coefficients, standard errors, and variance-covariance matrices were obtained and exported from the YODA and CSDR secure analysis platforms.

For trials where the outcome was a count or a binary variable, we fitted similar models using Poisson regression and logistic regression, respectively.
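
To make the structure of the trial-level model for continuous outcomes concrete, the sketch below builds a design matrix with baseline, age (per 15 years), sex, arm, comorbidity count, and the covariate-by-arm interactions, then fits it by ordinary least squares. It is an illustration under stated assumptions, not the authors' R code: the helper names are hypothetical, the data are synthetic and noise-free, and standard errors and the variance-covariance matrix (which the analysis also exported) are not computed here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(design, outcome):
    """Least-squares coefficients from the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, outcome))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def design_row(baseline, age_years, male, arm, count):
    """Main effects plus age, sex, and comorbidity-count interactions with arm."""
    age15 = age_years / 15.0  # age scaled to 15-year increments
    return [1.0, baseline, age15, male, arm, count,
            age15 * arm, male * arm, count * arm]


# Synthetic, noise-free data generated from known (made-up) coefficients.
true_coefs = [1.0, 0.6, 0.2, -0.1, -1.5, 0.3, 0.07, 0.05, 0.02]
rows, final_values = [], []
for baseline in (6.0, 7.0, 8.0):
    for age in (45.0, 60.0, 75.0):
        for male in (0.0, 1.0):
            for arm in (0.0, 1.0):
                for count in (0.0, 1.0, 2.0):
                    row = design_row(baseline, age, male, arm, count)
                    rows.append(row)
                    final_values.append(
                        sum(c * x for c, x in zip(true_coefs, row)))

fitted = fit_ols(rows, final_values)
comorbidity_interaction = fitted[8]  # coefficient on count x arm
```

The last coefficient is the comorbidity count–treatment interaction, which is the quantity carried forward to the meta-analysis stage. For count or binary outcomes, the same linear predictor would sit inside a Poisson or logistic model rather than a linear one.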


For the continuous outcomes, in order to convert the measures onto a similar scale, we divided the estimates and standard errors by the minimum clinically important difference (MCID) for that measure. For most outcome measures, higher scores indicate worse outcomes (e.g., Bath Ankylosing Spondylitis Disease Activity Index (BASDAI)). Where this was not the case (e.g., FEV1), we multiplied the values by minus one so that the direction of effect was the same for all trials. For the variance-covariance matrix, we divided each element by the MCID-squared. The MCID was selected using the published literature by hand-searching papers in the Core Outcome Measures in Effectiveness Trials (COMET) database for relevant conditions [18]. This search was supplemented by simple internet searches (Google searches using the full and abbreviated names for each outcome and MCID, MID, “minimum clinically important difference,” or “minimum important difference”). Where no published MCID recommendations could be found, we used the MCID defined in the power calculations in the trial protocols. At this stage, for each index condition, we restricted the analysis to the single most common outcome across trials. In 2 index conditions (ankylosing spondylitis and hypertension), 2 outcomes were equally common: BASDAI and Bath Ankylosing Spondylitis Functional Index (BASFI) for the former, and diastolic and systolic blood pressure for the latter. We arbitrarily chose BASDAI in the former case and chose systolic blood pressure in the latter as it is more prominent in clinical decision-making.
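
The rescaling described above can be sketched in a few lines. The function name and argument layout are illustrative assumptions. Note that multiplying every coefficient by minus one changes the sign of the estimates but leaves the standard errors and the variance-covariance matrix unchanged.

```python
def rescale_to_mcid(estimates, std_errors, vcov, mcid, higher_is_worse=True):
    """Express coefficients in multiples of the MCID with a common direction."""
    sign = 1.0 if higher_is_worse else -1.0
    scaled_estimates = [sign * e / mcid for e in estimates]
    scaled_std_errors = [s / mcid for s in std_errors]
    # Each covariance element is divided by MCID squared; the sign flip
    # cancels because it applies to both coefficients in every product.
    scaled_vcov = [[v / mcid ** 2 for v in row] for row in vcov]
    return scaled_estimates, scaled_std_errors, scaled_vcov
```

With an MCID of 4 (as used for HbA1c in mmol/mol in the results), a raw interaction of 0.28 becomes 0.07 MCID units.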

For each drug class, the model outputs were then meta-analysed. We used random-effects meta-analyses where 5 or more trials were included within the same drug class, and fixed effects where there were fewer than 5 trials. We used Bayesian models since this allowed us to simultaneously model multiple coefficients (e.g., age–treatment and sex–treatment interactions). The Bayesian models were fit using the brms package [19]. Samples from the posterior distribution were obtained and summarised as the mean and 95% credible intervals (CI). P-values were not presented as this was a Bayesian analysis. The full posteriors are provided in the project repository (doi:10.5281/zenodo.7713360).
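
For intuition about the second stage, the sketch below pools trial-level interaction estimates with a classical inverse-variance fixed-effect estimator. This is deliberately a simpler, frequentist stand-in and not the Bayesian brms model used in the analysis: it pools one coefficient at a time, ignores the covariance between coefficients, and reports a normal-approximation 95% confidence interval rather than a credible interval. The function name is hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted pooled estimate, its SE, and a 95% interval."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    interval = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, interval
```

Trials with smaller standard errors receive more weight, which is why precise estimates were obtained for drug classes with many large trials and wide intervals for classes with few.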

In case other researchers wish to use the results of our models of treatment–covariate interactions to inform subsequent analyses as informative priors, we obtained summaries of the posterior predictions. We did so only for analysis (ii) for continuous outcomes. In order to provide a more general set of priors, we also predicted the comorbidity count–treatment interaction for treatment comparisons/conditions not included in our model by obtaining samples from the posteriors. The latter are provided to allow researchers to conduct Bayesian analyses or probabilistic sensitivity analyses if studying conditions/treatment comparisons not included in our modelling as this represents a prediction for an unobserved index condition/treatment comparison (albeit one which is assumed to be exchangeable with the conditions/treatment comparisons included in the current analysis). We then summarised these samples by fitting a Student’s t-distribution. As with the main analysis, these models were fitted using the brms package (S1 File).

Ethical approval

This project had approval from the University of Glasgow, College of Medicine, Veterinary and Life Sciences ethics committee (200160070).


Trial characteristics

Trial baseline characteristics have been reported previously [4]. For trials with continuous outcomes, there were 20 index conditions and 47 treatment comparisons across a total of 106 trials (n = 88,150 participants). For 9 index conditions, there was only 1 treatment comparison across all trials. Diabetes, which was the condition for which there were the most trials (22), had the largest number of treatment comparisons (9) (Table 1). Within each model, all trials had a single common outcome except inflammatory bowel disease, where the ulcerative colitis trials used the MAYO score and Crohn’s disease trials used the Crohn’s Disease Activity Index score. For trials with categorical outcomes, there were 3 index conditions (migraine, osteoporosis, and thromboembolism) and 11 treatment comparisons across a total of 17 trials (n = 11,624 participants). For thromboembolism, there were 3 more specific categories of indication—primary prevention (5 trials), secondary prevention (2 trials), and treatment (2 trials).

Continuous outcomes—Age– and sex–treatment interactions

For all conditions with continuous outcomes, age– and sex–treatment interaction terms are shown in Table 2. For most drug classes, the 95% CIs for the age interaction terms included the null, indicating no clear evidence of modification of treatment efficacy by age. However, in the diabetes trials, there appeared to be an attenuation in the treatment effect with increasing age for 3 drug classes (0.07 (95% CI 0.00, 0.13) for sulfonylureas versus SGLT2 inhibitors, 0.09 (95% CI 0.01, 0.17) for DPP-4 inhibitors versus SGLT2 inhibitors, and 0.07 (95% CI 0.04, 0.11) for SGLT2 inhibitors versus placebo). Taking SGLT2 inhibitors versus placebo as an example, this can be read as follows: “the lowering effect on HbA1c of SGLT2 inhibitors versus placebo is 0.28 (95% CI 0.16, 0.44) mmol/mol smaller (since the MCID for HbA1c is 4 mmol/mol) per 15-year increment in age; for age 50 years versus age 80 years, this corresponds to the effect being 0.56 (95% CI 0.32, 0.88) mmol/mol smaller.” Similarly, most interaction terms for sex included the null, with a few exceptions (Table 2). For example, for glucagon-like peptide-1 (GLP-1) analogues, the interaction term for sex was 0.29 (95% CI 0.12, 0.49), indicating that the lowering effect on HbA1c of GLP-1 analogues is 1.16 (95% CI 0.48, 1.96) mmol/mol smaller in men than in women.
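
The conversion from MCID units back to the outcome's own units in the example above is simple arithmetic, sketched here with a hypothetical helper function. The MCID of 4 mmol/mol and the 15-year age increment are taken from the text.

```python
def interaction_in_outcome_units(interaction_mcid, mcid,
                                 gap_years=15.0, increment_years=15.0):
    """Convert an interaction in MCID units into outcome units for an age gap."""
    return interaction_mcid * mcid * (gap_years / increment_years)


# SGLT2 inhibitors versus placebo, age interaction of 0.07 MCID per 15 years.
per_increment = round(interaction_in_outcome_units(0.07, 4.0), 2)
age_50_vs_80 = round(interaction_in_outcome_units(0.07, 4.0, gap_years=30.0), 2)
```

Here `per_increment` is 0.28 mmol/mol and `age_50_vs_80` is 0.56 mmol/mol, matching the figures quoted in the text. The same multiplication by 4 turns the sex interaction of 0.29 for GLP-1 analogues into 1.16 mmol/mol.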

Table 2. Covariate–treatment interactions (expressed as multiples of minimal clinically important difference) by age and sex for continuous outcomes; point estimates and 95% CIs.

Continuous outcomes—Comorbidity–treatment interactions

For each drug class, Figs 2 to 6 show the main treatment effect (black points, expressed in multiples of the minimal clinically important difference) and the estimate for the comorbidity–treatment interaction based on a comorbidity count (red points) meta-analysed within treatment indications. Fig 7 shows the corresponding trial-level estimates for classes in which only a single trial was analysed. Comorbidity count was not associated with any attenuation or strengthening in treatment efficacy; in all cases, the 95% CIs included the null. This suggests that for all treatments and in all index conditions, it is plausible that there is no difference in treatment effect by comorbidity (on the absolute scale) within the range of comorbidity counts observed in the trials. When examining comorbidity–treatment interactions for the 6 most common comorbidities within each index condition, 95% CIs included the null for all estimates (S1 Table). Similarly, when assessing modification of treatment efficacy by continuous biomarkers, all estimates included the null (S2 Table).

Fig 2.

Main treatment effect and comorbidity–treatment interactions (ankylosing spondylitis, asthma, BPH, CIU, and dementia): This plot shows the main treatment effect (black) and the comorbidity–treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) and meta-analysed estimates (diamonds) are presented along with 95% CIs (whiskers). Details of effect estimates, heterogeneity, and model diagnostics can be found here: BPH, benign prostatic hypertrophy; CI, credibility interval; CIU, chronic idiopathic urticaria.

Fig 3.

Main treatment effect and comorbidity–treatment interactions (diabetes, GORD, hypertension, and IBD): This plot shows the main treatment effect (black) and the comorbidity–treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) and meta-analysed estimates (diamonds) are presented along with 95% CIs (whiskers). Details of effect estimates, heterogeneity, and model diagnostics can be found here: CI, credibility interval; GORD, gastro-oesophageal reflux disease; IBD, inflammatory bowel disease.

Fig 4.

Main treatment effect and comorbidity–treatment interactions (IBD, inflammatory arthropathy, and osteoporosis): This plot shows the main treatment effect (black) and the comorbidity–treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) and meta-analysed estimates (diamonds) are presented along with 95% CIs (whiskers). Details of effect estimates, heterogeneity, and model diagnostics can be found here: CI, credibility interval; IBD, inflammatory bowel disease.

Fig 5.

Main treatment effect and comorbidity–treatment interactions (Parkinson’s disease, psoriasis, COPD, and pulmonary fibrosis): This plot shows the main treatment effect (black) and the comorbidity–treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) and meta-analysed estimates (diamonds) are presented along with 95% CIs (whiskers). Details of effect estimates, heterogeneity, and model diagnostics can be found here: CI, credibility interval; COPD, chronic obstructive pulmonary disease.

Fig 6.

Main treatment effect and comorbidity–treatment interactions (restless legs syndrome and SLE): This plot shows the main treatment effect (black) and the comorbidity–treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) and meta-analysed estimates (diamonds) are presented along with 95% CIs (whiskers). Details of effect estimates, heterogeneity, and model diagnostics can be found here: CI, credibility interval; SLE, systemic lupus erythematosus.

Fig 7.

Main treatment effect and comorbidity–treatment interactions (single trial estimates): This plot shows the main treatment effect (black) and the comorbidity–treatment interaction (red) based on a comorbidity count. Trial-level estimates (circles) are presented along with 95% CIs (whiskers). Details of effect estimates, heterogeneity, and model diagnostics can be found here: CI, credibility interval.

In a sensitivity analysis, rather than using a fixed effects model for meta-analyses where there were fewer than 5 trials, we used a random effects model. The 95% CIs were wider, but the results of these models were otherwise similar to those presented in the main analysis (S3 Table).

Informative priors for subsequent analyses including different index condition/treatment comparisons

On predicting treatment effect modification by comorbidity count for a notional unobserved condition and notional unobserved treatment comparison, the samples from the posterior were approximately t-distributed (central estimate = 0.01, dispersion = 0.01, degrees of freedom = 3.24).
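
A researcher wishing to use this summary as an informative prior, for example in a probabilistic sensitivity analysis, could draw from it as sketched below. The sketch assumes that the reported "central estimate" and "dispersion" are the location and scale of a location-scale Student's t-distribution; that reading, the function name, and the seed are assumptions of this example. Draws are built from a standard normal and a chi-square variate using only the Python standard library.

```python
import math
import random


def sample_t_prior(n, location=0.01, scale=0.01, df=3.24, seed=2023):
    """Draw n samples from a location-scale Student's t-distribution."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # A chi-square variate with df degrees of freedom is
        # Gamma(shape=df/2, scale=2).
        chi_square = rng.gammavariate(df / 2.0, 2.0)
        draws.append(location + scale * z / math.sqrt(chi_square / df))
    return draws
```

Because the degrees of freedom are low, the draws have heavy tails: most values sit close to 0.01 MCID units per additional comorbidity, but occasional draws are much further from zero.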

Categorical outcomes—Comorbidity count–treatment interactions

For the 3 index conditions with categorical outcomes (Table 1), there was no evidence of any comorbidity count–treatment interactions. These findings are summarised in Table 3.

Table 3. Comorbidity–treatment interactions for binary and count outcomes; point estimates and 95% CIs.


In an IPD meta-analysis of 120 trials, we examined whether the efficacy of drug treatments differed by comorbidity. For 20 index conditions where the outcome variable was continuous (e.g., glycosylated haemoglobin in diabetes trials), efficacy did not differ by the total number of comorbidities or by the presence or absence of specific comorbidities. Similarly, for 3 conditions (17 trials) examining outcomes which were discrete events (e.g., thromboembolism, bleeding, headaches, and fractures), there was no evidence of treatment effect modification by comorbidity count or by specific comorbidities.

Several previous studies have reported findings on treatment effect modification in IPD meta-analyses and meta-analyses of reported subgroup effects. However, these have largely been confined to major cardiovascular disease trials (e.g., showing similar efficacy of statins in people with and without diabetes [20], differential benefit of blood pressure lowering therapy in people with and without diabetes [21], or questionable net benefit of aspirin in primary prevention [22]) or to concordant conditions, defined as those closely related to the index condition or target event for the trial (such as hypertension in stroke trials [23]). These studies have not considered the impact of comorbidity more broadly or of discordant comorbidities not related to the index condition of the trial. This represents an important omission, because there are a number of mechanisms by which the presence of discordant conditions might plausibly modify treatment efficacy (positively or negatively), including increased diagnostic misclassification, altered pharmacokinetics or pharmacodynamics (e.g., altered drug excretion in people with mild renal impairment or increased benefits of antiplatelet drugs in the presence of coexistent inflammatory conditions), and altered treatment-related behaviours (e.g., better or worse treatment adherence due to existing treatment regimens). Our study adds to this sparse literature, showing that, on average, treatment effects are similar across different populations within trials (at modest comorbidity counts of 3 or fewer). This supports the standard assumption that treatment effects are similar when generalising from trial to non-trial eligible populations, at least for populations with limited prevalence of comorbidity such as in these trials.

Although we found that treatment efficacy did not differ by comorbidity count, net overall treatment benefits may nonetheless differ in people with differing degrees of comorbidity. This is because differences in the baseline risk (e.g., the absolute risk of the outcome that the treatment is intended to prevent), differences in susceptibility to treatment-related adverse events, differences in competing risks (e.g., absolute risk of mortality from other causes), and differences in the burden of treatment (e.g., higher treatment burden in the context of multimorbidity leading to reduced concordance or reduced quality of life) may all lead to differences in the net overall benefit of treatment [24]. Therefore, where comorbidity alters the (baseline) natural history of diseases, the likelihood of adverse treatment effects (e.g., comorbid renal impairment), or life expectancy (e.g., via discordant comorbidities associated with mortality), the net overall benefits of treatment may differ even assuming that there is no difference in efficacy. For example, while there is strong evidence that the benefits of dual antiplatelet therapy (DAPT) following myocardial infarction (versus a single antiplatelet) outweigh the risks [25,26], this may not be true for patients with coexisting COPD. Cardiovascular mortality is more common in people with COPD than in the general population, favouring DAPT [27]. However, non-cardiovascular mortality is also higher [28], favouring single-antiplatelet therapy because of competing risks. Intensive control of blood glucose and other risk factors in diabetes [29,30] and anticoagulant use in atrial fibrillation [31] provide similar examples where the net overall treatment benefits are uncertain for people with comorbidity.

This is the first IPD clinical trial meta-analysis, as far as we are aware, to examine whether treatment efficacy differs by comorbidity. Nonetheless, there are several important limitations. First, while for some index conditions (e.g., diabetes) there were many trials, for others there were few trials and so relatively few participants, limiting the precision with which covariate–treatment interactions could be estimated. Furthermore, the individual trials were neither designed nor powered to measure comorbidity–treatment interactions. Specifically, the higher levels of comorbidity observed in clinical practice (e.g., 5 or more comorbidities) were rare among the trial participants. This reduces the likelihood of detecting a comorbidity–treatment interaction were one to exist. Therefore, while our results are consistent with there being no comorbidity–treatment interactions, this should be interpreted within the range of comorbidities, index conditions, and treatment comparisons that are presented.

Second, most trials were phase 3 trials focussed on efficacy outcomes (e.g., change in a disease marker such as blood pressure or glycosylated haemoglobin) rather than pragmatic trials focussed on harder outcomes (such as the incidence of specific adverse health outcomes). The findings for the smaller number of trials (17 in total) where we did have harder outcomes (headaches, bleeding, thromboembolism, and fracture) were similar to the findings for the remaining trials; there was no evidence of treatment effect modification by comorbidity count on the conventional scale (additive for continuous outcomes and relative for noncontinuous outcomes). Nonetheless, the small number of trials and indications where hard outcomes were studied means that caution is needed in extrapolating our findings to trials or meta-analyses focussing on such outcomes. Also, for some conditions and indications, the main effects were small or included the null. Where this was the case, the chances of detecting treatment effect modification are lower.

Third, while this analysis assesses treatment efficacy, we did not assess whether comorbidities lead to variation in adverse effects of treatment. An appreciation of both benefits and harms is required in order to inform judgements about the net benefits of treatment in the context of comorbidity.

Fourth, while we include a large number of trials across a range of index conditions, this is not a representative sample of the larger body of trials. Specifically, these were all industry-sponsored trials (as the CSDR and YODA repositories only held industry-sponsored trial data for the conditions of interest). Furthermore, not all sponsors share data in this way, nor do sponsors share data for all trials conducted. We have previously demonstrated that these trials were similar to the wider body of industry-sponsored trials in terms of characteristics such as size, phase, and significance of the primary outcome [4]. However, it is possible that, because we selected only industry-sponsored trials, inclusion criteria and selection processes were more restrictive than for other trials. This means that, while we did not detect any evidence of treatment effect modification by comorbidity, it cannot be assumed to be absent, particularly in other trials which may be more pragmatic or have less restrictive selection criteria.

Finally, while comorbidity was present in all the included trials, the trials remain unrepresentative in terms of the extent of comorbidity [4,32–34]. Specifically, there were few people in the included trials with high comorbidity counts (e.g., 4 or more conditions). This highly multimorbid population is not uncommon in routine clinical practice [35] and presents considerable challenges for clinical decision-making [3]. Their exclusion from these trials means that our findings cannot be assumed to be directly transferable to patient groups with the highest degree of multimorbidity, for whom uncertainties over the net benefit of treatments are often greatest [36,37].

Our findings have implications for the conduct of future evidence syntheses. In order to estimate net overall treatment benefits, clinical guidelines and health technology assessments routinely use evidence synthesis [38]. Such approaches combine (i) estimates of relative treatment efficacy with (ii) “natural” history (standard comparator rates) to calculate absolute effectiveness, commonly expressed as the absolute risk reduction (ARR) or number needed to treat [39]. However, evidence synthesis has so far rarely been used to estimate net overall treatment effects for people with multimorbidity. This may partly be due to uncertainty as to whether and how efficacy estimates differ in people with and without comorbidities. Natural history rates of target and adverse events for people with multimorbidity are relatively straightforward to estimate from routine healthcare data, because such data are sufficiently large and rich in people with multimorbidity. Within the limitations outlined above, our findings support the standard assumption that estimates of treatment efficacy are constant across levels of comorbidity (at least at the modest levels observed within trial populations).
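The combination of (i) relative efficacy and (ii) natural history rates can be illustrated with a minimal sketch. It assumes a binary outcome and a relative risk that is constant across levels of baseline risk; the function names and all numbers are hypothetical and are not taken from the included trials.

```python
import math


def absolute_risk_reduction(baseline_risk: float, relative_risk: float) -> float:
    """ARR = untreated (comparator) risk minus treated risk.

    The treated risk is assumed to be baseline_risk * relative_risk.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must lie in [0, 1]")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    treated_risk = min(baseline_risk * relative_risk, 1.0)
    return baseline_risk - treated_risk


def number_needed_to_treat(arr: float) -> float:
    """NNT is the reciprocal of the ARR (infinite when the ARR is zero)."""
    return math.inf if arr == 0.0 else 1.0 / arr


# Hypothetical example: the same relative risk applied to two populations
# that differ only in their baseline (natural history) risk.
RELATIVE_RISK = 0.8
for label, baseline in [("lower baseline risk", 0.05), ("higher baseline risk", 0.20)]:
    arr = absolute_risk_reduction(baseline, RELATIVE_RISK)
    print(f"{label}: ARR = {arr:.3f}, NNT = {number_needed_to_treat(arr):.0f}")
```

In this illustration the relative effect is identical in both populations, yet the absolute benefit is larger where the baseline risk is higher, which is why natural history estimates from routine data matter when efficacy is assumed constant.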

To support such evidence syntheses, we have provided a set of informative priors that can be used to propagate, into the final treatment effectiveness estimates, the additional uncertainty arising from applying estimates from clinical trials to populations rich in multimorbidity. We summarised the variation in treatment effects by comorbidity count as a set of Student’s t-distributions. These distributions can be used to inform modelling studies (e.g., health technology assessments) designed to extrapolate treatment effect estimates from trial populations to routine clinical practice where multimorbidity is more common. This has the potential to better inform regulatory bodies and guideline developers as they seek to make treatment recommendations for people with multimorbidity.
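As a rough illustration of how such a prior might be used, the sketch below propagates uncertainty in a comorbidity–treatment interaction into a treatment effect estimate by Monte Carlo simulation. It assumes a continuous outcome on the absolute scale, an interaction that is linear in comorbidity count, and a normal sampling distribution for the main effect. The degrees of freedom, location, scale, and effect values are hypothetical placeholders, not the distributions reported in this study.

```python
import math
import random
import statistics


def sample_student_t(rng: random.Random, df: float, loc: float, scale: float) -> float:
    """Draw one value from a location-scale Student's t-distribution."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-squared with df degrees of freedom
    return loc + scale * z / math.sqrt(chi2 / df)


def propagate_interaction(main_effect, main_se, comorbidity_count,
                          prior_df, prior_loc, prior_scale,
                          n_draws=10_000, seed=1):
    """Simulate the treatment effect for a given comorbidity count.

    Each draw adds comorbidity_count * interaction to a draw of the main
    effect, where the interaction comes from the Student's t prior.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        effect = rng.gauss(main_effect, main_se)
        interaction = sample_student_t(rng, prior_df, prior_loc, prior_scale)
        draws.append(effect + comorbidity_count * interaction)
    return draws


# Hypothetical inputs: a main effect of -0.5 units (SE 0.1) and a prior
# centred on no interaction.
for count in (0, 3):
    draws = propagate_interaction(-0.5, 0.1, count,
                                  prior_df=5, prior_loc=0.0, prior_scale=0.05)
    print(f"count={count}: mean={statistics.mean(draws):.3f}, "
          f"sd={statistics.stdev(draws):.3f}")
```

Under these assumptions the central estimate is unchanged at higher comorbidity counts, but its uncertainty widens, which is the additional uncertainty that the priors are intended to carry into effectiveness estimates.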

Our findings also have relevance for analyses of comorbidity subgroup findings in both single clinical trials and as part of meta-analyses. The lack of information for estimating subgroup effects in clinical trials and the dangers of falsely claiming spurious subgroup effects are well established, and a range of approaches have been advocated for dealing with this problem. These include limiting the number of subgroups and performing corrections for multiple testing (e.g., the Bonferroni technique used in frequentist analysis), the analysis of treatment effect modification according to participants’ prognostic risk scores at baseline (which reduces the dimensionality of the problem and prioritises characteristics known to predict differences in the rates of target events) [40], and, in a Bayesian context, the use of subject-matter expert knowledge (via prior elicitation). The prior distributions derived from our modelling for the comorbidity–treatment interactions can help inform such prior-elicitation exercises. Another technique used in Bayesian subgroup analyses is to use off-the-shelf conservative priors designed to avoid overfitting [41]; our findings will help provide reassurance that such priors are unlikely to be overly conservative for modelling comorbidity–treatment interactions.
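For readers less familiar with the Bonferroni technique mentioned above, a minimal sketch follows. It tests each of m subgroup interactions at a threshold of alpha / m; the p-values are hypothetical and the function name is illustrative.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction.

    Each of the m tests is compared against alpha / m, which controls the
    family-wise error rate at alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values for five subgroup interaction tests.
p_values = [0.004, 0.03, 0.2, 0.011, 0.6]
print(bonferroni_significant(p_values))
```

With five tests, only p-values at or below 0.01 are flagged, so interactions that would pass an uncorrected 0.05 threshold (here 0.03 and 0.011) are not declared significant.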

Finally, our results have relevance for reporting of clinical trial results. Both comorbidity and frailty can be measured using data already collected from clinical trials and, as we show, it is feasible to estimate comorbidity–treatment interactions using such measures. In our project, this required access to IPD, a process which is expensive (in terms of analysis time) and complex (requiring formal contractual agreements). The PATH statement advocated that clinical trials should report treatment effect modification by baseline prognostic risk score [40]. We agree that this is a useful approach because it reduces the complex problem of subgroup analysis into a single measure (reducing overfitting), and because, by definition, it targets variables which most strongly predict the risk of target events. This latter aspect is important as it helps inform evidence synthesis models applying trial results to a target population with a higher target event rate. For similar reasons, we propose that trials should also report evidence of treatment effect modification by comorbidity or degree of frailty; this would reduce the risk of overfitting by reducing comorbidity to a single variable that predicts rates of competing events. To inform judgements about net benefits, this same information should be provided for adverse events. In addition, more research is required to establish whether specific comorbidities may attenuate or strengthen treatment efficacy; if these effects were in opposite directions for different comorbidities, a cumulative count of comorbidities may obscure them.

In conclusion, we found no evidence that treatment efficacy differed by comorbidity within the levels of comorbidity observed within clinical trial populations. This finding held whether comorbidity was measured using a simple condition count or by the presence or absence of 6 common conditions. Nonetheless, comorbidity is underrepresented in trials, especially at higher levels often seen in clinical practice, and in these contexts, the applicability of trial effect estimates needs to be carefully considered. The analysis of these trials may be used to inform subsequent evidence syntheses, analysis and reporting of individual trials, meta-analyses, and health economic models. We provide model outputs in the form of prior distributions to support such analyses.

Supporting information

S1 File. Statistical methods.

This file contains a more detailed description of the statistical analysis and model specifications.


S1 Table. Comorbidity–treatment interactions for the 6 most common comorbidities within each index condition.

This table shows the coefficients and 95% CIs for the comorbidity–treatment interaction for each of the 6 most common comorbidities within each index condition. Each comorbidity was modelled separately.


S2 Table. Comorbidity–treatment interactions for continuous biomarkers.

This table shows the coefficients and 95% CIs for the comorbidity–treatment interaction for continuous biomarkers associated with comorbidity. Biomarkers assessed were estimated glomerular filtration rate (eGFR, as a marker of renal impairment, taken from trial data where this was available and calculated from creatinine, age, sex, and race using the MDRD equations if it was not), body mass index (as recorded or calculated based on height and weight), fibrosis-4 (FIB-4) index (as a marker of liver disease calculated from aspartate aminotransferase, alanine transaminase, and platelet counts), haemoglobin, and MBP (defined as 0.5 × (systolic blood pressure + diastolic blood pressure)).


S3 Table. Random effects meta-analyses for indications with <4 trials.

This table presents the results for meta-analyses of fewer than 5 trials using a random effects meta-analysis, presented alongside the fixed-effects findings from the main manuscript.



This study, carried out under YODA Project # 2017–1746, used data obtained from the Yale University Open Data Access Project, which has an agreement with JANSSEN RESEARCH & DEVELOPMENT, L.L.C. The interpretation and reporting of research using this data are solely the responsibility of the authors and does not necessarily represent the official views of the Yale University Open Data Access Project or JANSSEN RESEARCH & DEVELOPMENT, L.L.C. This study was also carried out under project number 1732 and used data from the repository, which provided data from Boehringer-Ingelheim, GSK, Lilly, Roche, Takeda, and Sanofi. The interpretation and reporting of research using these data are solely the responsibility of the authors and do not necessarily represent the official views of the repository or of Boehringer-Ingelheim, GSK, Lilly, Roche, Takeda, or Sanofi.


  1. Whitty CJ, MacEwen C, Goddard A, Alderson D, Marshall M, Calderwood C, et al. Rising to the challenge of multimorbidity. BMJ. 2020.
  2. Barnett K, Mercer SW, Norbury M, Watt G, Wyke S, Guthrie B. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. Lancet. 2012;380(9836):37–43. pmid:22579043
  3. Wallace E, Salisbury C, Guthrie B, Lewis C, Fahey T, Smith SM. Managing patients with multimorbidity in primary care. BMJ. 2015:350. pmid:25646760
  4. Hanlon P, Hannigan L, Rodriguez-Perez J, Fischbacher C, Welton NJ, Dias S, et al. Representation of people with comorbidity and multimorbidity in clinical trials of novel drug therapies: an individual-level participant data analysis. BMC Med. 2019;17(1):201. pmid:31711480
  5. Van Spall HG, Toren A, Kiss A, Fowler RA. Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review. JAMA. 2007;297(11):1233–1240. pmid:17374817
  6. National Institute for Health and Care Excellence. Multimorbidity: clinical assessment and management. NICE guideline [NG56] 2016. Available from:
  7. Sun X, Briel M, Walter SD, Guyatt GH. Is a subgroup effect believable? Updating criteria to evaluate the credibility of subgroup analyses. BMJ. 2010;340:c117. pmid:20354011
  8. Sun X, Briel M, Busse JW, You JJ, Akl EA, Mejza F, et al. The influence of study characteristics on reporting of subgroup analyses in randomised controlled trials: systematic review. BMJ. 2011;342:d1569. pmid:21444636
  9. Gabler NB, Duan N, Raneses E, Suttner L, Ciarametaro M, Cooney E, et al. No improvement in the reporting of clinical trial subgroup effects in high-impact general medical journals. Trials. 2016;17(1):320. pmid:27423688
  10. Kent DM, Hayward RA. Limitations of Applying Summary Results of Clinical Trials to Individual Patients: The Need for Risk Stratification. JAMA. 2007;298(10):1209–1212. pmid:17848656
  11. Kent DM, Rothwell PM, Ioannidis JP, Altman DG, Hayward RA. Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal. Trials. 2010;11(1):85. pmid:20704705
  12. Hernández AV, Boersma E, Murray GD, Habbema JDF, Steyerberg EW. Subgroup analyses in therapeutic cardiovascular clinical trials: Are most of them misleading? Am Heart J. 2006;151(2):257–64. pmid:16442886
  13. Schandelmaier S, Briel M, Varadhan R, Schmid CH, Devasenapathy N, Hayward RA, et al. Development of the Instrument to assess the Credibility of Effect Modification Analyses (ICEMAN) in randomized controlled trials and meta-analyses. Can Med Assoc J. 2020;192(32):E901. pmid:32778601
  14. Butterly EW, Hanlon P, Shah AS, Hannigan LJ, McIntosh E, Lewsey J, et al. Comorbidity and health-related quality of life in people with a chronic medical condition in randomised clinical trials: An individual participant data meta-analysis. PLoS Med. 2023;20(1):e1004154. pmid:36649256
  15. Lees JS, Hanlon P, Butterly E, Wild SH, Mair FS, Taylor RS, et al. The impact of age, sex and morbidity count on trial attrition: a meta-analysis of individual participant-level data from phase 3/4 industry-funded clinical trials. BMJ Med. 2022.
  16. Hannigan LJ, Phillippo DM, Hanlon P, Moss L, Butterly EW, Hawkins N, et al. Improving the estimation of subgroup effects for clinical trial participants with multimorbidity by incorporating drug class-level information in Bayesian hierarchical models: a simulation study. Med Decis Making. 2021:0272989X211029556. pmid:34407672
  17. McAllister D, Rodriguez-Perez JA, Hannigan L. Assessing heterogeneity in treatment efficacy by age, sex and multimorbidity. PROSPERO. 2018:CRD42018048202. Available from:
  18. Gargon E, Gorst SL, Williamson PR. Choosing important health outcomes for comparative effectiveness research: 5th annual update to a systematic review of core outcome sets for research. PLoS ONE. 2019;14(12):e0225980. pmid:31830081
  19. Bürkner P-C. brms: An R Package for Bayesian Multilevel Models Using Stan. J Stat Softw. 2017;80(1):1–28.
  20. Kearney P, Blackwell L, Collins R, Keech A, Simes J, Peto R, et al. Efficacy of cholesterol-lowering therapy in 18,686 people with diabetes in 14 randomised trials of statins: a meta-analysis. Lancet (London, England). 2008;371(9607):117–125. pmid:18191683
  21. Turnbull F, Neal B, Algert C, Chalmers J, Chapman N, Cutler J, et al. Effects of different blood pressure-lowering regimens on major cardiovascular events in individuals with and without diabetes mellitus: results of prospectively designed overviews of randomized trials. Arch Intern Med. 2005;165(12):1410–1419. pmid:15983291
  22. Baigent C, Blackwell L, Collins R, Emberson J, Godwin J, Peto R, et al. Aspirin in the primary and secondary prevention of vascular disease: collaborative meta-analysis of individual participant data from randomised trials. Lancet. 2009;373(9678):1849–1860. pmid:19482214
  23. Leonardi-Bee J, Bath PM, Bousser M-G, Davalos A, Diener H-C, Guiraud-Chaumeil B, et al. Dipyridamole for preventing recurrent ischemic stroke and other vascular events: a meta-analysis of individual patient data from randomized controlled trials. Stroke. 2005;36(1):162–168. pmid:15569877
  24. O’Hare AM, Hotchkiss JR, Tamura MK, Larson EB, Hemmelgarn BR, Batten A, et al. Interpreting treatment effects from clinical trials in the context of real-world risk information: end-stage renal disease prevention in older adults. JAMA Intern Med. 2014;174(3):391–397. pmid:24424348
  25. COMMIT Collaborative Group. Addition of clopidogrel to aspirin in 45 852 patients with acute myocardial infarction: randomised placebo-controlled trial. Lancet. 2005;366(9497):1607–1621. pmid:16271642
  26. Clopidogrel in Unstable Angina to Prevent Recurrent Events Trial Investigators. Effects of clopidogrel in addition to aspirin in patients with acute coronary syndromes without ST-segment elevation. New Engl J Med. 2001;345(7):494–502.
  27. MacNee W, Maclay J, McAllister D. Cardiovascular injury and repair in chronic obstructive pulmonary disease. Proc Am Thorac Soc. 2008;5(8):824–833. pmid:19017737
  28. McGarvey LP, John M, Anderson JA, Zvarich M, Wise RA. Ascertainment of cause-specific mortality in COPD: operations of the TORCH Clinical Endpoint Committee. Thorax. 2007;62(5):411–415. pmid:17311843
  29. Timbie JW, Hayward RA, Vijan S. Diminishing efficacy of combination therapy, response-heterogeneity, and treatment intolerance limit the attainability of tight risk factor control in patients with diabetes. Health Serv Res. 2010;45(2):437–456. pmid:20070387
  30. National Institute for Health and Care Excellence. Type 2 Diabetes in Adults: Management (NICE Guideline 28). 2019. Available from:
  31. Lopes LC, Spencer FA, Neumann I, Ventresca M, Ebrahim S, Zhou Q, et al. Systematic review of observational studies assessing bleeding risk in patients with atrial fibrillation not using anticoagulants. PLoS ONE. 2014;9(2):e88131. pmid:24523876
  32. Hanlon P, Butterly E, Lewsey J, Siebert S, Mair FS, McAllister DA. Identifying frailty in trials: an analysis of individual participant data from trials of novel pharmacological interventions. BMC Med. 2020;18(1):1–12.
  33. He J, Morales DR, Guthrie B. Exclusion rates in randomized controlled trials of treatments for physical conditions: a systematic review. Trials. 2020;21(1):1–11.
  34. Hanlon P, Corcoran N, Rughani G, Shah AS, Mair FS, Guthrie B, et al. Observed and expected serious adverse event rates in randomised clinical trials for hypertension: an observational study comparing trials that do and do not focus on older people. Lancet Healthy Longev. 2021.
  35. Hanlon P, Jani BD, Nicholl B, Lewsey J, McAllister DA, Mair FS. Associations between multimorbidity and adverse health outcomes in UK Biobank and the SAIL Databank: A comparison of longitudinal cohort studies. PLoS Med. 2022;19(3):e1003931. pmid:35255092
  36. Boyd CM, Kent DM. Evidence-based medicine and the hard problem of multimorbidity. J Gen Intern Med. 2014;29(4):552. pmid:24442331
  37. Guthrie B, Thompson A, Dumbreck S, Flynn A, Alderson P, Nairn M, et al. Better guidelines for better care: accounting for multimorbidity in clinical guidelines–structured examination of exemplar guidelines and health economic modelling. 2017.
  38. Dias S, Welton NJ, Sutton AJ, Ades A. Evidence synthesis for decision making 1: introduction. Med Decis Making. 2013;33(5):597–606. pmid:23804506
  39. Dias S, Welton NJ, Sutton AJ, Ades A. NICE DSU Technical Support Document 1: Introduction to evidence synthesis for decision making. University of Sheffield, Decision Support Unit. 2011:1–24.
  40. Kent DM, Paulus JK, Van Klaveren D, D’Agostino R, Goodman S, Hayward R, et al. The predictive approaches to treatment effect heterogeneity (PATH) statement. Ann Intern Med. 2020;172(1):35–45. pmid:31711134
  41. Tanniou J, van der Tweel I, Teerenstra S, Roes KCB. Subgroup analyses in confirmatory clinical trials: time to be specific about their purposes. BMC Med Res Methodol. 2016;16(1):20. pmid:26891992