Evidence for key individual characteristics associated with outcomes following combined first-line interventions for knee osteoarthritis: A systematic review

Objective
To identify individual characteristics associated with outcomes following combined first-line interventions for knee osteoarthritis.
Methods
MEDLINE, CINAHL, Scopus, Web of Science Core Collection and the Cochrane Library were searched. Studies were included if they reported an association between baseline factors and change in pain or function following combined exercise therapy, osteoarthritis education, or weight management interventions for knee osteoarthritis. Risk of bias was assessed using Quality in Prognostic Factor Studies. Data were visualised and a narrative synthesis was conducted for key factors (age, sex, BMI, comorbidity, depression, and imaging severity).
Results
32 studies were included. Being female compared to male was associated with 2–3 times the odds of a positive response. Older age was associated with reduced odds of a positive response. The effect size (less than 10% reduction) is unlikely to be clinically relevant. It was difficult to conclude whether BMI, comorbidity, depression and imaging severity were associated with pain and function outcomes following a combined first-line intervention for knee osteoarthritis. Low to very low certainty evidence was found for sex, BMI, depression, comorbidity and imaging severity and moderate certainty evidence for age. Varying study methods contributed to some difficulty in drawing clear conclusions.
Conclusions
This systematic review found no clear evidence to suggest factors such as age, sex, BMI, OA severity and presence of depression or comorbidities are associated with the response to first-line interventions for knee OA. Current evidence indicates that some groups of people may respond equally to first-line interventions, such as those with or without comorbidities. First-line interventions consisting of exercise therapy, education, and weight loss for people with knee OA should be recommended irrespective of sex, age, obesity, comorbidity, depression and imaging findings.


Introduction
Clinical practice guidelines recommend land-based exercise, education, and weight loss in those with knee osteoarthritis (OA) before undertaking total knee replacement [1]. The use of these first-line interventions remains suboptimal despite these recommendations [2]. Combined first-line interventions (or multi-component osteoarthritis interventions) consist of two or more non-surgical interventions of exercise therapy, osteoarthritis education and weight management [3]. Combined first-line interventions are increasingly provided internationally through specialist osteoarthritis management programs (OAMPs) as a complete package of care. These programs aim to deliver coordinated, evidence-based care to those with knee OA [3].
There is extensive research that demonstrates the effectiveness of exercise therapy for knee OA [4]. Less research has been conducted on outcomes following combined first-line interventions [5]. The evaluation of combined first-line interventions has been identified as a research priority, and recent reviews have examined their effectiveness [6,7]. OAMPs such as the Good Life with osteoarthritis: Denmark (GLA:D®) have reported improvements in pain and function and a reduced desire for surgery [8]. However, a proportion of people undertaking these programs do not improve [8–10]. For example, data from the Swedish Better Management of Patients with Osteoarthritis (BOA) registry indicate that up to 43% of those with knee OA were considered responders based on NRS pain [9]. Immediate outcomes following the GLA:D® program indicate that half of the participants were classified as responders for pain and function outcomes [8]. Predicting those who may benefit from combined first-line interventions for knee OA is important. This may assist clinicians in the early identification of alternative treatments (such as pharmacological interventions), improve the timeliness or suitability for total joint replacement surgery, and assist medical practitioners in providing appropriate referrals to first-line care [2].
Research has begun to identify subgroups or individual characteristics associated with outcomes in those with knee OA following conservative treatments including intra-articular glucocorticoid injections [11] and combined first-line interventions [9,12–16]. Many factors have been evaluated and include demographics, body mass index (BMI), comorbidities, psychological factors, and baseline disease severity [17,18]. Interpreting the results of primary studies that have evaluated combined first-line interventions is difficult because of the variety of factors identified, the different study methods and the contrasting results. A systematic review to collate these findings may provide clarity about which factors may influence the response to combined first-line interventions for knee OA and provide recommendations for clinical practice and future research. This systematic review aimed to identify individual characteristics associated with a response to combined first-line interventions of land-based exercise therapy, OA education and weight loss for knee osteoarthritis. The primary objective of this systematic review was to identify baseline characteristics associated with improvements in pain and function following a combined first-line intervention in people with knee osteoarthritis. The secondary objective was to evaluate baseline characteristics associated with a change in the willingness to undertake surgery or undertake total knee replacement surgery.

Protocol and registration
This systematic review was registered (PROSPERO, protocol number CRD42021234398 (www.crd.york.ac.uk/prospero)). There were no amendments to the protocol. Reporting follows the PRISMA 2020 statement [19] (S1 Table).

Eligibility criteria
Eligible studies investigated an association between a baseline prognostic factor and outcome following a multi-component intervention in those with knee osteoarthritis. Participants had knee osteoarthritis diagnosed either clinically or radiographically. The combined first-line intervention included (1) land-based exercise (any type) and either (2) arthritis education or self-management strategies or (3) weight loss or dietary management. The study design was not restricted and included secondary analysis of RCT data, case-series and cohort or longitudinal studies, including data from registries. The effect estimate was reported as a beta coefficient, odds ratio (OR), risk ratio (RR), hazard ratio (HR) or mean difference (MD). Primary outcome measures were a change in any measure of pain and function from baseline to follow-up with no restrictions to the length of follow-up. Secondary outcome measures were change in willingness to undertake joint replacement surgery or to have undertaken joint replacement.
Studies were excluded if participants had undergone total knee replacement (TKR) or had rheumatoid arthritis or other inflammatory conditions, if the study examined pharmacological interventions, or if it did not estimate a prognostic factor at baseline with a reported association measure. Studies of treatment moderators, and subgroup analyses that reported treatment effect sizes for baseline prognostic factors, were excluded. Conference abstracts and review studies were excluded.

Data extraction
The modified version of the CHARMS-PF checklist was used for data extraction [20]. Data extracted included study type, source of data, sample size and missing data, description of the intervention, outcomes to be predicted, the number and type of prognostic factors of interest, measurement of prognostic factors and cut-off points used, description of modelling method, reporting of adjusted or unadjusted effect estimates and the set of adjustment factors (covariates).
Data were extracted by the primary reviewer (JC) using Covidence and Microsoft Excel 2020. Secondary reviewers checked the strength of association measures and calculations (AW and JL).

Quality assessment
Risk of bias was assessed using the Quality in Prognostic Factor Studies (QUIPS) tool [20]. QUIPS consists of six domains: study participation, study attrition, prognostic factor measurement, outcome measurement, adjustment for other prognostic factors, and statistical analysis and reporting. Each domain was rated as low, moderate, or high risk of bias. A study was deemed low risk of bias overall if all or most domains were rated as low. Suggested signalling items for each domain were discussed a priori with the research team. A study was rated low risk of bias if most signalling items had been addressed. If one signalling item was deemed quite problematic, this was weighted more heavily in the decision process.
Three independent reviewers (JC, JL, and DS) assessed risk of bias using Covidence software. Conflicts were resolved by discussion with all three reviewers.

Data synthesis and analysis
The prognostic factors pre-identified in the study protocol were reported (age, sex, BMI, comorbidities, depression, and baseline OA severity). The effect estimate for each prognostic factor was summarised using odds ratio (OR), hazard ratio (HR), risk ratio (RR), or mean difference (MD) and the corresponding 95% confidence interval. Where relevant, both unadjusted and adjusted effect estimates were recorded. A meta-analysis was not performed due to the inconsistency in the intervention components, duration of follow-up, outcome measures, and methodology. Instead, a qualitative synthesis was conducted, and data visualisation was presented using R (https://www.r-project.org/) and 'ggplot2' (https://ggplot2.tidyverse.org/). The term 'responder' was used to refer to whether a person had improved in either a pain or function outcome measure. This response may be a positive or a negative response to the treatment and will depend on the outcome measure used to evaluate the response.
For all observations to be on a common scale, the OR and 95% CI for the prognostic factors of age, sex, BMI, comorbidity, and depression were rescaled when necessary, such that an odds ratio greater than 1 represents a positive response. Many studies reported several related outcomes, which resulted in an OR being reported multiple times within the same study [10,14]. Due to similarities in the outcome measures, we presented all multiple outcome measures within studies.
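The rescaling described above can be illustrated with a minimal Python sketch. It assumes the reported OR describes the opposite direction (for example, the odds of a negative response, or the reverse reference group), in which case the rescaled OR is the reciprocal and the confidence limits are the reciprocals in swapped order. The function name and the example values are hypothetical and are not taken from any included study.

```python
def invert_odds_ratio(or_value, ci_lower, ci_upper):
    """Re-express an OR for the opposite outcome or reference group.

    Taking reciprocals reverses the ordering, so the reciprocal of the
    original upper limit becomes the new lower limit, and vice versa.
    """
    if min(or_value, ci_lower, ci_upper) <= 0:
        raise ValueError("odds ratios and confidence limits must be positive")
    return 1 / or_value, 1 / ci_upper, 1 / ci_lower


# Hypothetical example: OR 0.50 (95% CI 0.25-0.80) for a negative response
rescaled = invert_odds_ratio(0.50, 0.25, 0.80)
print(tuple(round(x, 2) for x in rescaled))  # (2.0, 1.25, 4.0)
```

On the log scale used in the figures, this inversion is simply a sign change, so the width of the interval is unchanged.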
A recalculation for the prognostic factor of age was carried out, with the continuous cut-off point recalculated from age per 5 years to per year [15]. Two discrepancies observed in the reported data were resolved by contacting the author [10,12]. Multiple outcome variables were reported in two studies [15,21]. Gwynne-Jones (2018) used multinomial logistic regression with three categories (worse, stable and improved). The OR for 'better' versus 'stable' was extracted instead of the OR for 'worse' [15]. Lee (2018) used multinomial logistic regression and reported the OR for four pain and function trajectories measured over 12 weeks [21]. The lower pain, early improvement trajectory (versus higher pain, no improvement) and the higher function, early improvement trajectory (versus lower function, delayed improvement) were extracted for comorbidities [21].
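The per-5-year to per-year recalculation for age can be sketched as follows. This is an illustration only and assumes a logistic model in which age enters linearly on the log-odds scale, so the log OR scales in proportion to the unit of age; the review does not spell out the formula it used, and the function name and example values are hypothetical.

```python
def rescale_or_per_unit(or_value, ci_lower, ci_upper, from_units, to_units=1.0):
    """Convert an OR reported per `from_units` of a predictor to per `to_units`.

    Assumes a linear effect on the log-odds scale, so
    log(OR_new) = log(OR_old) * to_units / from_units.
    The same power is applied to both confidence limits.
    """
    if min(or_value, ci_lower, ci_upper) <= 0 or from_units <= 0:
        raise ValueError("inputs must be positive")
    exponent = to_units / from_units
    return tuple(x ** exponent for x in (or_value, ci_lower, ci_upper))


# Hypothetical example: OR 0.80 (95% CI 0.70-0.90) per 5 years of age
per_year = rescale_or_per_unit(0.80, 0.70, 0.90, from_units=5)
print(tuple(round(x, 3) for x in per_year))
```

Raising the per-year value back to the fifth power recovers the per-5-year estimate, which is a quick check that the conversion was applied in the right direction.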
The certainty in the estimates of association was rated using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) for prognostic factors [22,23]. Each prognostic factor was rated from very low to high with consideration to the domains of risk of bias, inconsistency, indirectness, imprecision and publication bias [22].
Reasons for exclusion were that the intervention was defined retrospectively from electronic health records [49] or by participant self-report [50,51], was an exercise intervention only [52–56], included exercise and manual therapy [57], or added joint injections to an exercise and education component [58]. Other reasons were that the study used analysis of variance, from which we were unable to extract meaningful estimates [59,60], did not report an association measure [61], or used individual participant data (IPD) from 7 RCTs that evaluated a range of interventions and different musculoskeletal conditions, including knee osteoarthritis [62]. A study that examined pre-treatment pain sensitivity was excluded (abstract only) [52].

Study characteristics
Twenty studies were prospective cohorts, and 12 were secondary analyses of RCT data (Table 1). In 31 of the 32 studies, the intervention consisted of land-based exercise therapy and education components. One study targeted weight loss in overweight and obese people with knee OA [34]. This study evaluated the association between baseline kinematics and knee pain but did not report data for age, sex, BMI, comorbidities, depression and OA severity. Seven studies were considered multidisciplinary with tailored interventions by health professionals such as dieticians or occupational therapists [10,12,15,18,30,39,43]. Fifteen studies were described as an OAMP, including GLA:D® [26,41,45,46], BOA [24,25,28,29,44,48], Osteoarthritis Chronic Care Program (OACCP) [10,12,43] and the Joint Clinic [15,30].
Study dates overlapped for several registry-based cohort studies [24,28,29,41,44,45,48,63]. Secondary analyses of RCTs either combined participant data from multiple different exercise interventions within one study, or pooled data from several RCTs [21,31,35,37,42]. The duration of follow-up ranged from immediately post-intervention to 3, 6 and 12 months post-intervention (S1 Fig). The knee joint was reported separately in twenty-one studies, but nine studies reported the hip and knee joints together.

Outcome measures
Primary outcome. Twenty-eight studies reported various pain and function measures (Table 2). Most outcomes were self-reported, including VAS, NRS and WOMAC for pain, self-reported minimum physical activity level [29,44,46] and composite measures such as WOMAC-G and OKS. Three studies reported a global rating of change [10,14,18]. Two studies measured walking speed with the 40-metre fast-paced walk test [26,41].
Responder definitions. Table 2 briefly summarises each study's "responder" definition. There was a range of different responder definitions, including:
• The OMERACT-OARSI responder criteria, a composite measure that uses improvements in pain, function and the patient's global assessment of improvement [65].
• The patient's global assessment of improvement in pain or function (Likert scale). The scale is dichotomised to reflect a responder or non-responder cut-off value [18].
• Patient-reported outcome measures such as WOMAC pain or function or WOMAC-global. A range of cut-off points were based on a minimal clinically important difference (MCID).
• Multiple responder definitions constructed within a single study, which produced several similar effect measures [10,12,14,39].
• Pain scales such as NRS pain (0-10) were dichotomised. A responder was defined as having greater than 50% reduction in pain following the intervention [17].
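As an illustration of the last definition in the list above, the dichotomisation of a 0-10 NRS pain score at a 50% reduction can be sketched in Python. This is a hypothetical helper, not code from any included study; in particular, treating a baseline score of zero as a non-responder is an assumption made here because the relative reduction is undefined in that case.

```python
def is_pain_responder(baseline, follow_up, threshold=0.5):
    """Return True if pain fell by more than `threshold` as a fraction of baseline."""
    if baseline <= 0:
        # Assumption: no pain at baseline means no relative reduction can be computed
        return False
    reduction = (baseline - follow_up) / baseline
    return reduction > threshold


# Hypothetical NRS pain scores (baseline, follow-up)
print(is_pain_responder(8, 3))  # True: 62.5% reduction
print(is_pain_responder(6, 3))  # False: exactly 50% is not "greater than 50%"
```

Small changes to the threshold, or to whether the comparison is strict, reclassify participants near the cut-off, which is one reason effect estimates differ between responder definitions.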

Secondary outcome
Four studies reported change in willingness to undertake joint surgery or undertake total knee joint replacement (secondary objective) [24,30,43,48]. Two studies reported hazard ratios and baseline characteristics associated with time to joint replacement surgery [30,48] and two studies reported change in willingness to undergo surgery [24,43]. Change in willingness to undertake surgery was assessed differently by the studies. One study asked the question "Are your knee/hip symptoms so severe that you wish to undergo surgery?" (Yes/No) [24] and one study used a 5-point scale rating willingness for surgery ('definitely willing', 'probably willing', 'unsure', 'probably unwilling' and 'definitely unwilling') [43] (S5 Table).

Prognostic factor identification
There was considerable variation in the studies' methods of identifying prognostic factors.

Risk of bias
Most studies were rated as moderate overall risk of bias (20/32), and the remainder (12/32) had a low overall risk of bias. Just over half of the studies (17/32) did not adequately account for adjustment for other prognostic factors, with 16 rated as moderate and one as high risk of bias. Most studies had a low risk of bias for statistical analysis and reporting (18/32), while the remainder (14/32) were rated as moderate or high. Study attrition was rated as moderate or high in 19 of the 32 studies; however, there was difficulty interpreting the loss-to-follow-up and response rate in large registry-based cohort studies [58]. A summary of the risk of bias assessment is shown in Fig 2 and ratings for the individual studies in S4 Table.

Prognostic factor results
Studies reported a wide range of prognostic factors. Table 2 includes the number and types of prognostic factors that were evaluated.
Due to the wide variation in reported factors, this review has focussed on the pre-identified factors of age, sex, BMI, depression, comorbidity and OA severity with a descriptive summary and visualisation of the odds of a positive response (OR and 95% CI). These factors have been examined in studies examining predictors of total knee joint replacement and OA progression but less commonly in predicting a response to first-line interventions [66,67]. Effect measure results (OR, MD, HR, beta coefficient (β)) and rescaling calculations for individual studies and prognostic factors are shown in S5 Table.
Studies examining baseline OA severity (imaging). Baseline OA severity (imaging) was reported in four small exploratory studies [16,18,31,68]. Comparison between studies was difficult because each used different imaging modalities, outcome measures and cut-off points for grading OA severity (S5 Table). Knoop (2014) found all grades of OA severity on MRI were associated with a positive response following an exercise and education intervention, but found the response was reduced with advanced patellofemoral (PF) OA [31]. O'Leary (2020) found severe medial compartment OA was associated with a poorer response compared to mild medial compartment OA. OA severity was assessed using radiological, CT or MRI results reported in the medical records. Severity was recorded in the medial, lateral and PF compartments as either absent, mild, moderate or severe. Lee (2018) examined the association between radiological Kellgren-Lawrence (KL) grade and four different pain and function trajectory groups over 12 weeks. KL grades were evenly distributed amongst the four groups and the authors reported no significant association between KL grade and trajectory group membership.
Effect of increasing age on the odds of positive response to intervention. We found moderate certainty evidence that older age may be associated with lower odds of responding to a combined first-line intervention (Fig 3). The effect estimates were small, precise, and slightly negative (less than 10%) in studies that reported age continuously (OR ≈ 0.9) (Fig 3). One study was imprecise but still reported a negative association between increasing age and the probability of a positive response (unadjusted OR 0.9, 95% CI 0.7-1.2) [12].
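The link between an OR of about 0.9 and a reduction of about 10% can be made explicit with a short Python sketch. It assumes the OR is expressed per one-unit increase in the predictor; the function name is hypothetical. Note that this is a change in odds, not in the probability of responding.

```python
def percent_change_in_odds(or_value):
    """Percentage change in the odds implied by an odds ratio."""
    if or_value <= 0:
        raise ValueError("odds ratio must be positive")
    return (or_value - 1.0) * 100.0


# An OR of 0.9 corresponds to 10% lower odds per one-unit increase
print(round(percent_change_in_odds(0.9), 1))  # -10.0
```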
The results for Ernstgard (2017) differed. This study examined a physical activity measure, with a responder defined as exceeding a self-reported minimum physical activity threshold of 150 minutes per week or greater than 30 minutes on four or more days per week [29]. Age was not reported continuously; instead, four age groups were compared (22-54, 55-64, 65-74 and 75+ years) [29]. Ernstgard (2017) found being older was associated with up to twice the odds of a positive response [29]. The OR for those aged 65-74 years compared to those aged 22-54 years was 2.13 (95% CI 1.85-2.38), suggesting that older people were more likely to be physically active than younger people. Comparing the results of this study with the others is difficult because of the different outcome measure and because age was not reported continuously. Additionally, only a small proportion of patients reported a change in the minimum physical activity threshold (a slight increase from 77% at baseline to 82% at three months, followed by a decrease to 76% at 12 months) [29].
Effect of female sex on the odds of positive response to intervention. Low certainty evidence indicated that being female was associated with a positive response following a combined first-line intervention (Fig 3). Females had up to 2-3 times the odds of a positive response compared to males. The effect estimates for females (compared to males) were positive but imprecise in 7 out of 8 studies (OR ranging between 1 and 3). Four of these studies included small cohorts of fewer than 300 participants [14,15,17,18,39]. Weigl (2006) reported the three largest effect estimates from three different responder definitions. The OR using WOMAC-G responder (based on MCID 18% improvement) was 2.11 (95% CI 1.05-4.25) [14]. Weigl (2006) was rated as moderate risk of bias in 5 of the 6 QUIPS domains and utilised a univariable screen as part of a 4-step modelling process.

Fig 3. Effect of increasing age and female sex on the odds of positive response following a combined first-line intervention for knee osteoarthritis.
All graphs report the log odds ratio and 95% CI. Repeated study labels by the same author represent multiple responder definitions within each study. The OR for sex reports the probability of a female (compared to a male) being a responder. OR > 1 = increased probability of a female being a responder compared to a male. OR > 1 for age is interpreted as increased probability of being a responder with increasing age. For age, original data from 8 studies reporting OR (7 cohorts, one secondary analysis of RCT). Studies not included: 2 reporting regression coefficients [25,42]. https://doi.org/10.1371/journal.pone.0284249.g003
Effect of increasing BMI on the odds of a positive response to intervention. It was difficult to conclude whether BMI was associated with pain and function outcomes following a combined first-line intervention (Fig 4). Only five studies reported BMI. The effect estimates for three studies were precise and close to 1 (OR ≈ 0.98 to 1), suggesting no association between BMI and a positive outcome. However, these studies reported an unadjusted effect estimate, which makes interpreting the results difficult [10,12,18]. There may be greater certainty in the effect estimate in studies that present a multivariable-adjusted analysis [22].

Fig 4. Graph reports the log odds ratio and 95% CI. Repeated study labels by the same author represent multiple responder definitions within each study. Graph plots the odds ratio for the probability of BMI or presence of depression on a positive response following intervention. For BMI, OR > 1 = increased probability of being a responder with increased BMI. For depression, OR > 1 = increased probability of being a responder with the presence of depression. For a continuous predictor, we interpret the odds ratio per one unit change, and for a dichotomised predictor, the OR is the probability compared to the reference group. For depression, original data from 7 studies (5 cohorts) reporting OR. Studies not included: 1 reporting a regression coefficient [42] and 2 mean difference [37,41]. For BMI, original data from 5 cohort studies reporting OR. Studies not included: 1 reporting regression coefficient [25], 2 MD [16,37] and 1 unadjusted HR [36].

The results of Tanaka (2021) are difficult to interpret. This study used two different responder definitions (a 5-point reduction in OKS and 50% reduction in NRS pain intensity) and reported a positive and negative OR (multidirectional) [17]. This was a small cohort study of 277 participants which had a dropout rate of close to 50%. The participants' average BMI was low (23 kg/m²) in comparison to the remaining studies, where the average participant BMI exceeded 28 kg/m².
Four studies reported BMI continuously per one unit increase. One study dichotomised BMI and found BMI was associated with a negative outcome [29]. This study found that those who were obese (compared with normal BMI) had half the odds of reaching a self-reported minimum physical activity threshold of 150 minutes per week (OR 0.52, 95% CI 0.46-0.58).
Presence of depression on the odds of a positive response to intervention. It was difficult to conclude whether depression was associated with pain and function outcomes following a combined first-line intervention (Fig 4). In four studies the effect estimates were small, precise, and negative (OR between 0.9 and 1.01), which may suggest a small negative association between depression and a positive outcome [18,21,39,42]. Overall, the interpretation of the 7 studies was challenging. An unadjusted effect estimate was reported in three studies [10,12,18] and the effect estimate was the largest for the three imprecise studies [10,12,14]. There was a large variation in depression measures and cut-off points. For the two studies that reported a pain outcome, the OR was precise and between 0.9 and 1.01 [21,39]. Although both studies used a WOMAC pain MCID responder, Lee (2018) used a trajectory-based analysis assessing pain weekly over 12 weeks [21]. Depression scores were measured differently using a cut-off point (CES-D of greater than 16) and a continuous measure (per one unit increase in BECK-11 score) [21,39]. Weigl (2006) and Eyles (2016) used identical outcome measures (WOMAC-G, a transition scale and a combination of both) but the results were conflicting. Within studies the results often differed. For Eyles (2016), the OR was multidirectional (positive or negative) depending on the responder definition used [10]. Depression was associated with over three times the odds of a positive response when the response was defined as improved WOMAC-G score (OR 3.33, 95% CI 1.27-9.09) [10]. When using a self-reported transition scale (much worse or moderately worse), the OR was 0.71 (95% CI 0.36-1.52). The conflicting result might be explained by the small number of participants (n = 34) classified as responders based on the WOMAC-G responder definition [10].

Fig 5. Graph reports the log odds ratio and 95% CI. Repeated study labels by the same author represent multiple responder definitions within each study. OR > 1 for comorbidity is interpreted as increased probability of being a responder with the presence of comorbidity. For a continuous predictor, we interpret the log-odds change with a one-unit change in comorbidity score. For a dichotomised predictor, the OR is the probability of the comorbidity category to the reference group of being a responder. Original data from 7 studies reporting OR. Studies not included: 2 reporting regression coefficient [25,47], 2 mean differences [37,41] and 1 unadjusted hazard ratio [36]. SCG = self-administered comorbidity questionnaire. Charnley classification = Charnley A (unilateral hip or knee OA), B (bilateral hip or knee OA), C (multiple joint sites hip and knee and presence of other disease affecting walking ability). Self-report = presence of one comorbidity or number of self-reported comorbidities.
1. Lee2018b used multinomial logistic regression with multiple outcome definitions based on four group-based trajectories of WOMAC pain and function. OR > 1 indicates an increased probability of being in the lower pain, early improvement group.
2. There are six OR reported for Eyles (2016) due to multiple responder definitions and the number of comorbidities reported as low, moderate, and high.
https://doi.org/10.1371/journal.pone.0284249.g005
Presence of comorbidity on the odds of a positive response to intervention. The evidence was inconclusive as to whether comorbidity was associated with pain and function outcomes following a combined first-line intervention (Fig 5). The effect estimates were multidirectional (positive or negative) and imprecise. In addition, there was considerable variation in comorbidity measures. For example, studies used the self-reported presence of a specific comorbidity or number of comorbidities [18,21], a validated comorbidity measure reported continuously per one-unit increase [42], or a dichotomised measure with cut-off points based on the number of self-reported comorbidities [10,12,14].

Discussion
This systematic review is the first to evaluate prognostic factors associated with pain and function outcomes following combined first-line interventions of exercise therapy, OA education or weight loss for knee OA. The rationale of this review was to identify individual characteristics that may influence a person's response to combined first-line interventions for knee OA. A meta-analysis could not be performed due to study heterogeneity; instead, a narrative synthesis and data visualisation were conducted. Thirty-two studies were included in this review. We found being female was associated with 2-3 times increased odds of a positive response. Older age was associated with lower odds of responding, but the effect was unlikely to be of clinical relevance. We could not conclude whether BMI, comorbidity, depression or OA severity (imaging) was associated with a positive response following combined first-line interventions.
Our review found being female (compared to male) was associated with 2-3 times increased odds of a positive response to a combined first-line intervention. Although the magnitude of this effect appears large, evaluating whether this is clinically meaningful is difficult as it compares a female to male response. In addition, the use of a variety of responder definitions does not allow any evaluation of individual treatment response. Other prognostic studies that have examined sex are inconclusive or report conflicting findings. There is limited evidence that female sex is associated with symptomatic OA progression [69,70], and there is conflicting evidence on whether being female is a predictor for future TKR [66,71].
Older age was associated with lower odds of responding to a combined first-line intervention. This small negative effect (less than 10% reduction in odds) is unlikely to be of clinical relevance. Other studies that have examined age are inconclusive or report conflicting findings. Age did not appear to be associated with WOMAC score following group exercise therapy, but interpretation of the results is difficult given age was dichotomised (cut-off point 65 years) [55]. Increasing age was found to be positively associated with progression to knee joint replacement following a first-line intervention of education and exercise; however, the effect size was very small [30,48]. Younger age was found to have a small association with becoming unwilling to undertake surgery following a multidisciplinary OA program [43]. The impact of increasing age on outcomes following TKR is also debated, with a 2021 systematic review concluding the evidence was inconsistent [72]. Our review suggests that age may not be a relevant factor in predicting response to first-line interventions for knee OA.
We could not conclude whether BMI was associated with a positive response following a combined first-line intervention. Few studies have evaluated BMI as a prognostic factor, and overall, the certainty of evidence was very low. Our analysis focused on OR; however, similar findings were reported by studies that used MD, HR and β coefficient [25,36,37]. Dell'Isola (2020) found increased BMI was associated with an increase in pain following the BOA intervention but concluded the difference was unlikely to be clinically important (β coefficient 0.02, 95% CI 0.02-0.03) (S5 Table). Studies that examined obesity as a treatment moderator of exercise therapy have been inconclusive, with few high-quality trials and conflicting evidence to date [16,73,74]. The results of an RCT comparing non-weight-bearing and weight-bearing exercise in those who are obese found no between-group difference in pain and function outcomes [73]. An IPD meta-analysis of 11 RCTs evaluating structured exercise programs for knee OA found lower BMI was associated with a small positive treatment response (OR 1.04, 95% CI 1.02-1.07) [75]. Our review suggests that BMI may not be a relevant factor in predicting a response to first-line interventions for knee OA.
We could not conclude whether the presence of comorbidity or depression was associated with a positive response following a combined first-line intervention. Few studies have specifically examined the impact of depression or comorbidity on conservative treatments for knee OA [37,41,76], despite much research examining predictors of musculoskeletal problems such as chronic lower back pain [77] or cross-sectional studies evaluating the association between comorbidities and clinical symptoms in those with knee OA [67,78,79].
Current evidence suggests that those with comorbidities or depression may respond to first-line interventions in a similar way to those without [41,47,75]. A subgroup analysis of an internet-based exercise and education program found little difference in pain and function outcomes according to the number of comorbidities present [47]. A large, well-designed registry-based cohort study examined the association between comorbidities and change in pain and function following the GLA:D® program. For both the primary outcome (change in 40-metre walk test) and secondary outcome (change in NRS pain), little difference was found in the adjusted mean difference between those with and without comorbidities (S5 Table). Although those with comorbidities had worse baseline scores across all outcomes, similar improvement was found in both groups following the intervention. A 2020 systematic review concluded there was insufficient evidence to determine whether comorbidity and depression moderate the effects of exercise therapy in people with hip and knee osteoarthritis [75]. That review also highlighted methodological limitations similar to those found in our review, such as failure to identify the moderator a priori, diversity in measurements and the use of arbitrary cut-off points [75].
We could not conclude whether OA severity (imaging) was associated with a positive response following a first-line intervention. Four small exploratory studies examined baseline imaging, and comparing them was challenging due to different imaging modalities, outcome measures and cut-off points for grading OA severity. Although the studies reported positive findings, such as advanced PF OA, severe medial compartment OA and higher KL grade being associated with a poor response, further research is required before any clear conclusions can be drawn [18,21,31].

Strengths and limitations
A limitation of this systematic review was the inability to pool the effect measures. A meta-analysis could not be performed because the studies varied on many dimensions, such as the intervention components, the joint analysed, the follow-up duration, outcome measures, the definition of a responder, prognostic factor measurement and the statistical methods. The knee joint was the focus of this review; however, some studies did report hip and knee data together. Separate reporting of the hip and knee in future primary studies is recommended as there are known differences in risk factors, prognosis, clinical presentation, and non-surgical recommendations [80].
We focussed our review on key prognostic factors identified in our protocol, which included demographics (age and sex), BMI, psychological factors, and OA severity (imaging). This may be considered a limitation of this review as we may have missed potentially important factors. Our initial research question was broad, and a more specific focus is preferable in order to draw useful conclusions [81]. It is acknowledged that some factors were not considered in detail. For example, several recent studies have examined the association between pre-treatment pain sensitisation and outcomes following exercise therapy [39,52,61]. As further evidence emerges, a systematic review focussing specifically on factors such as pain sensitisation may be warranted.
We did not include exercise-only interventions in our review. The decision to focus on any combination of land-based exercise therapy, education or weight management reflects the fact that these interventions are consistently recommended as first-line care in clinical guidelines and are commonly delivered together in clinical practice and OAMPs. The evaluation of these complex interventions remains a challenge. Suitable combinations, the most appropriate outcome measures, and the mechanisms that explain their effectiveness are yet to be determined [5] and are beyond the scope of this review.
Restricting our review to cohort studies may have simplified our analysis and increased certainty in the effect estimates. Longitudinal cohort studies may provide better prognostic estimates due to broad inclusion criteria and may provide more generalisable findings [22]. However, cohort studies have a limited ability to separate the treatment response from the natural history of the disease and to determine causation.
Our decision to focus on one effect measure was a limitation but unavoidable due to the OR being reported in most studies. A dichotomised outcome variable should be interpreted with caution [81,82]. Calculating the proportion of those who respond does not allow for visibility of individual variation in treatment response and may be subject to misclassification bias if the response definition is not well constructed [73]. MD, RR and HR were reported infrequently but still need to be considered when examining the evidence [25,41].
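The caution about dichotomised outcomes can be illustrated with a minimal Python sketch. All values below are hypothetical and were invented for illustration; they are not data from any included study, and the responder thresholds (20 and 19 points of improvement on a 0-100 scale) are assumptions rather than validated definitions.

```python
from statistics import mean

# HYPOTHETICAL responder definition: improvement of at least 20 points (0-100 scale).
RESPONDER_THRESHOLD = 20.0


def is_responder(improvement: float, threshold: float = RESPONDER_THRESHOLD) -> bool:
    """Classify one individual as a responder if improvement meets the threshold."""
    return improvement >= threshold


def responder_proportion(improvements, threshold: float = RESPONDER_THRESHOLD) -> float:
    """Proportion of individuals classified as responders."""
    flags = [is_responder(x, threshold) for x in improvements]
    return sum(flags) / len(flags)


# HYPOTHETICAL improvement scores for two groups of four people.
group_a = [19.0, 19.5, 21.0, 22.0]  # everyone improves by a similar amount
group_b = [0.0, 5.0, 40.0, 60.0]    # very different individual responses

for name, group in (("A", group_a), ("B", group_b)):
    print(
        f"Group {name}: responders {responder_proportion(group):.0%} at threshold 20, "
        f"{responder_proportion(group, 19.0):.0%} at threshold 19, "
        f"mean improvement {mean(group):.2f}"
    )
```

In this sketch both groups have the same proportion of responders at the 20-point threshold even though their individual improvements differ markedly, and lowering the threshold by a single point reclassifies every member of group A as a responder while group B is unchanged. This is the kind of hidden variation and sensitivity to the response definition described above.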
A strength of this review was the ability to make inferences using an estimation approach [74]. There are examples in the literature of prognostic factor studies that present results based on statistical significance, a practice increasingly being discouraged [61,75]. Our results were limited as we relied on the primary studies' chosen model. Depending on the study's variable selection method, a factor of interest may have been excluded based on a non-significant finding, in which case no information about this factor is presented in the study [77]. Other aspects that contribute to difficulty in interpreting the findings of this review include whether the study reported an unadjusted effect estimate, inconsistent adjustment for other prognostic factors, or a variable selection method based on the data rather than expert opinion, evidence, or biological plausibility [77]. This review consisted mainly of small-sample studies, which may result in a larger effect measure and potential publication bias or selective reporting [15].

Research and clinical implications
Based on the limited findings of this review, there is no reason to expect those with comorbidities, depression, increased BMI, more advanced imaging findings and increasing age would not respond to an intervention consisting of exercise therapy, OA education and weight loss. First-line interventions should continue to be recommended to these individuals with knee osteoarthritis. Exercise therapy has been shown to be safe and effective for a broad range of conditions [83] and current evidence suggests some subgroups of people may respond equally to first-line interventions [41,47,74]. Additionally, clinicians need to recognise that similar improvements may occur irrespective of a higher baseline score [41].
Future research on individual characteristics associated with a response to a first-line intervention might be strengthened by access to IPD. IPD meta-analysis would allow for standardising the inclusion and exclusion criteria, consistent adjustment of prognostic factors, maintaining continuous factors on their original scale, reducing the need for arbitrary cut-off points, reducing the need for reporting unadjusted effect estimates and allowing for large data sets to be analysed [81]. Current work is being undertaken by the Joint Effort Initiative using IPD from OAMPs, aiming to identify prognostic factors associated with improvements in pain and function [84].

Conclusion
Based on the limited findings of this review, clinicians should continue to recommend first-line interventions consisting of exercise therapy, education, and weight loss to people with knee OA, irrespective of sex, age, obesity, comorbidity, depression and imaging findings. This review found no clear evidence to suggest factors such as age, sex, BMI, OA severity and presence of depression or comorbidities are associated with the response to a first-line intervention for knee OA. Current evidence indicates that some groups of people, such as those with or without comorbidities, may respond equally to first-line interventions. Future research using IPD meta-analysis may help overcome some of the challenges found in this review, allowing for standardisation of inclusion and exclusion criteria, a more consistent study methodology and evaluation of larger datasets.