Patients with Severe Radiographic Osteoarthritis Have a Better Prognosis in Physical Functioning after Hip and Knee Replacement: A Cohort Study

Introduction Although Total Hip and Knee Replacements (THR/TKR) improve Health-Related Quality of Life (HRQoL) at the group level, up to 30% of patients are dissatisfied after surgery due to unfulfilled expectations. We aimed to assess whether the pre-operative radiographic severity of osteoarthritis (OA) is related to the improvement in HRQoL after THR or TKR, both at the population and at the individual level. Methods In this multi-center observational cohort study, HRQoL of OA patients requiring THR or TKR was measured 2 weeks before surgery and at 2–5 years follow-up, using the Short-Form 36 (SF36). Additionally, we measured patient satisfaction on an 11-point Numeric Rating Scale (NRSS). The radiographic severity of OA was classified according to Kellgren and Lawrence (KL) by an independent, experienced musculoskeletal radiologist who was blinded to the outcome. We compared the mean improvement and the probability of a relevant improvement (defined as a patient's change score ≥ the Minimal Clinically Important Difference) between patients with mild OA (KL grade 0–2) and severe OA (KL grade 3 or 4), whilst adjusting for confounders. Results Severe OA patients improved more and had a higher probability of a relevant improvement in physical functioning after both THR and TKR. For TKR patients with severe OA, larger improvements were found in General Health, Vitality and the Physical Component Summary Scale. The mean NRSS was also higher in severe OA TKR patients. Discussion Patients with severe OA have a better prognosis after THR and TKR than patients with mild OA. These findings might help to prevent dissatisfaction after THR and TKR by means of patient selection or expectation management.


Introduction
Total Hip Replacement (THR) and Total Knee Replacement (TKR) are effective surgical interventions that alleviate pain and improve Health-Related Quality of Life (HRQoL) at the population level in patients with hip or knee joint degeneration. [1] Although patients on average improve markedly after THR or TKR, not all patients benefit from these surgeries. Persistent pain is reported in 9% of THR patients and 20% of TKR patients at long-term follow-up. [2] Additionally, up to 30% of patients are dissatisfied after surgery, with higher reported dissatisfaction rates for TKR patients. [3][4][5][6][7][8][9] This relatively high dissatisfaction rate is especially worrying, as the therapeutic options for dissatisfied patients after joint replacement are limited. Moreover, given the projected increase in the annual number of THRs and TKRs performed in the United States, the absolute number of dissatisfied patients is expected to rise. [10]
Unfulfilled expectations of surgery are thought to play an important role in dissatisfaction after joint replacement. [3,4,6,11] To manage patient expectations successfully, accurate prediction of the probability of a meaningful improvement for each individual patient is of paramount importance. This probability can be assessed at the individual level using the Minimal Clinically Important Difference (MCID), defined as the smallest difference in the score of an outcome measure that patients perceive as beneficial or harmful. [12,13] MCIDs in HRQoL, measured using the Short-Form 36, have been established for THR and TKR. [14][15][16]
Reports of the effect of the preoperative radiographic severity of osteoarthritis (OA) on the outcome of THR are conflicting: at the population level, Nilsdotter et al showed no effect at one year follow-up, while Meding et al found less postoperative pain at one year follow-up in patients with more preoperative joint space narrowing. [17,18] At the individual level, patients with severe preoperative radiographic OA were more likely to improve in physical functioning. [19] We found no studies addressing the effect of the preoperative radiographic severity of OA on the outcome of TKR.
From a clinical perspective, the preoperative radiographic severity of OA would be a helpful predictor of improvement in HRQoL, as the radiographs are inexpensive and are already obtained routinely for preoperative templating. Moreover, the assessment of the severity of preoperative OA can be standardised, whereas this is more difficult for subjective symptoms such as pain.
We questioned whether the radiographic severity of OA affects the improvement in HRQoL after THR and TKR, both at the population and at the individual level. Additionally, we questioned whether patient satisfaction with the surgical result differed between patients with mild and severe preoperative radiographic OA.

Methods
We conducted a multi-center follow-up study at the departments of orthopaedic surgery of the Leiden University Medical Center, the Slotervaart hospital in Amsterdam, the Albert Schweitzer hospital in Dordrecht and the Groene Hart hospital in Gouda, the Netherlands, from August 2010 until August 2011. [20] The study was approved by the Medical Ethics Committee of the Leiden University Medical Center and the Medical Ethical Committees of all other participating centers; all patients gave written informed consent (CCMO-Nr: NL29018.058.09; MEC-Nr: P09.189). This study was registered in the Netherlands Trial Register (NTR2190). It concerned the clinical follow-up of a multi-center randomized controlled clinical trial, comparing different blood management modalities in THR and TKR surgery (Netherlands Trial Register: NTR303). In this trial, 2442 primary and revision hip or knee replacements in 2257 patients were included between 2004 and 2009.
All patients who participated in the randomized controlled trial, completed the preoperative HRQoL questionnaires, underwent primary THR or TKR for primary OA and were alive at the time of inclusion for the present follow-up study were eligible for inclusion. In this study, patients are the unit of analysis. Patients who participated more than once in the previous trial were allowed to participate only once in the current study; the first joint replacement performed in the previous trial was chosen as the index surgery.
Records of the financial administration of all participating centers were checked in order to ascertain that all eligible patients were still alive before being approached. All eligible patients were first sent an invitation letter signed by their treating orthopaedic surgeon, an information brochure and a reply card. Patients who did not respond within 4 weeks after the first invitation were sent another invitation letter. The remaining patients, who did not respond to this second invitation, were contacted by telephone.

Assessments
The assessments of the follow-up study consisted of patient-reported questionnaires, examination of patient records and preoperative radiographs.
Outcomes. HRQoL was measured preoperatively and in the present follow-up study using the SF36, which has been translated into and validated for Dutch. [21,22] Its 36 items cover eight domains (physical function, role physical, bodily pain, general health, vitality, social function, role emotional, and mental health), for each of which a sub-scale score is calculated (100 indicating no symptoms and 0 indicating extreme symptoms). Additionally, these scales are combined into two summary measures: a Physical Component Summary (PCS) and a Mental Component Summary (MCS).
At the population level, the HRQoL outcome measure was the mean change score, i.e. the mean of each patient's postoperative sub-scale score minus their preoperative sub-scale score. At the individual level, the change scores were used to categorise patients into responders and non-responders, using previously published MCIDs. [14][15][16] Patients with a change score equal to or larger than the MCID of a particular sub-scale were categorised as responders on that sub-scale; patients whose change score was smaller than the MCID were categorised as non-responders.
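As a minimal sketch of the change-score and responder definitions above, the Python below computes each sub-scale change score as the postoperative minus the preoperative score and labels a patient a responder on a sub-scale when that change is at least the MCID. The sub-scale keys and the MCID values in `EXAMPLE_MCID` are hypothetical placeholders, not the published THR/TKR thresholds from [14][15][16].

```python
from typing import Dict

# Hypothetical MCID thresholds (SF36 points); placeholders only,
# NOT the published THR/TKR values cited in the text.
EXAMPLE_MCID: Dict[str, float] = {
    "physical_function": 20.0,
    "general_health": 10.0,
}


def change_scores(pre: Dict[str, float], post: Dict[str, float]) -> Dict[str, float]:
    """Change score per sub-scale: postoperative score minus preoperative score."""
    return {scale: post[scale] - pre[scale] for scale in pre}


def classify_responders(changes: Dict[str, float], mcid: Dict[str, float]) -> Dict[str, bool]:
    """True (responder) if the change is >= that sub-scale's MCID, else False (non-responder)."""
    return {scale: changes[scale] >= threshold for scale, threshold in mcid.items()}


pre = {"physical_function": 30.0, "general_health": 55.0}
post = {"physical_function": 65.0, "general_health": 60.0}
changes = change_scores(pre, post)
responders = classify_responders(changes, EXAMPLE_MCID)
print(changes)     # {'physical_function': 35.0, 'general_health': 5.0}
print(responders)  # {'physical_function': True, 'general_health': False}
```

Note that the comparison is `>=`, so a change exactly equal to the MCID counts as a relevant improvement, matching the "equal to or larger than" definition.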
Patient satisfaction with the surgical result was measured using an 11-point Numeric Rating Scale of Satisfaction (NRSS; 0 indicating completely dissatisfied, 10 indicating completely satisfied). At the population level, the satisfaction outcome measure was the mean NRSS score. At the individual level, the satisfaction outcome measure was the proportion of patients who achieved a satisfactory outcome, defined as an NRSS > 8 according to Brokelman et al. [5]
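The two satisfaction outcome measures can be sketched as follows: the mean NRSS at the population level and the proportion of satisfied patients at the individual level. This assumes the cut-off is applied as "greater than 8", so that only scores of 9 and 10 count as satisfactory; the function names are illustrative.

```python
from statistics import mean
from typing import List, Tuple

# Satisfactory outcome: NRSS strictly above this cut-off
# (assumed reading of the Brokelman et al. threshold).
SATISFACTORY_CUTOFF = 8


def is_satisfactory(nrss: int) -> bool:
    """Classify one 0-10 NRSS score as satisfactory (True) or not (False)."""
    if not 0 <= nrss <= 10:
        raise ValueError(f"NRSS must lie between 0 and 10, got {nrss}")
    return nrss > SATISFACTORY_CUTOFF


def summarise_satisfaction(scores: List[int]) -> Tuple[float, float]:
    """Return (mean NRSS, proportion of patients with a satisfactory outcome)."""
    proportion = sum(is_satisfactory(s) for s in scores) / len(scores)
    return float(mean(scores)), proportion


print(summarise_satisfaction([10, 9, 8, 5]))  # (8.0, 0.5)
```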
Exposure. Pre-operative radiographs of the hips (anterior-posterior) and knees (posterior-anterior) were collected from the participating patients' medical records and radiology departments. These radiographs were routinely made in each participating center for pre-operative templating. All radiographs were assessed by an experienced musculoskeletal radiologist (HMK), who was blinded to patient characteristics and HRQoL assessments. OA was scored according to the method of Kellgren and Lawrence (KL) (0 indicating no OA, 1 doubtful OA, 2 minimal OA, 3 moderate OA and 4 severe OA). [23] All radiographs were scored twice, and both readings were used to establish intra-reader reliability (Intra-Class Correlation, hip radiographs: 0.85 (95%CI: 0.82-0.88); knee radiographs: 0.87 (95%CI: 0.83-0.89)). The second reading was used for further statistical analyses.
As KL grades 0 to 2, and grades 3 and 4, are each deemed similar from a clinical perspective, we grouped the severity of pre-operative OA into 2 categories: mild radiographic OA (KL grade 0, 1 or 2) and severe radiographic OA (KL grade 3 or 4).
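The text does not state which form of the Intra-Class Correlation was used for intra-reader reliability. As a hedged sketch, the Python below computes the two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in Shrout and Fleiss's notation) from a table with one row per radiograph and one column per reading; the choice of this ICC form and the example grades are assumptions, and no confidence interval is computed.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random, absolute-agreement, single-measure ICC (ICC(2,1)).

    ratings: one row per radiograph, one column per reading (k >= 2 readings).
    """
    n = len(ratings)     # number of radiographs (targets)
    k = len(ratings[0])  # number of readings per radiograph
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical KL grades from two readings of five radiographs.
readings = [[2, 2], [3, 3], [4, 3], [1, 1], [3, 4]]
print(round(icc_2_1(readings), 2))  # 0.84
```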
Potential confounders. Socio-demographic characteristics collected at baseline in the trial included age at joint replacement and gender. Additionally, the following variables were collected in the questionnaire of the follow-up study: height and weight, used to calculate the Body Mass Index (BMI; <25, 25–30, 30–35, >35 kg/m²), and the patient-reported Charnley classification of co-morbidity (Class A: only the index hip or knee is affected; Class B: the contralateral hip or knee is affected as well; Class C: a hip or knee replacement with other affected joints and/or a medical condition that affects the patient's ability to ambulate). [24,25]
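A short sketch of the BMI calculation (weight in kilograms divided by the square of height in metres) and of the four BMI categories listed above. It assumes height was reported in centimetres; the text also does not say to which category a value of exactly 25, 30 or 35 belongs, so the boundary handling below is an assumption.

```python
def bmi(height_cm: float, weight_kg: float) -> float:
    """Body Mass Index in kg/m^2 from self-reported height (cm) and weight (kg)."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the study's four categories.

    Assumed boundaries: [25, 30) -> "25-30", [30, 35] -> "30-35".
    """
    if value < 25:
        return "<25"
    if value < 30:
        return "25-30"
    if value <= 35:
        return "30-35"
    return ">35"


value = bmi(175, 80)
print(round(value, 1), bmi_category(value))  # 26.1 25-30
```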

Statistical Analysis
We performed descriptive analyses of patients' baseline characteristics. To investigate the possible extent of self-selection bias, we compared the age at THR or TKR and the gender of participants with those of non-participants.
Patients with missing pre-operative SF36 questionnaires, missing SF36 questionnaires at follow-up or missing pre-operative radiographs were excluded from the analyses, as we could not exclude a Missing Not At Random (MNAR) mechanism. Missing values of the Charnley co-morbidity classification and BMI were deemed Missing At Random and imputed using Multiple Imputation (MI), to improve the efficiency of the regression analyses and avoid biased regression coefficients. We performed MI (m = 10) using an Expectation-Maximization algorithm, [26] as implemented in the Amelia 2 package for R. [27,28]
We performed regression analyses in each imputed dataset to compare the mean improvement in HRQoL and the probability of achieving an MCID in HRQoL after THR and TKR between patients with KL grade 0, 1 or 2 and patients with KL grade 3 or 4. As MCIDs in HRQoL differ between THR and TKR patients, we performed all analyses separately for THR and TKR. Potential confounders in both THR and TKR patients were age, gender, BMI and poly-articular OA; we used the Charnley classification as a proxy for poly-articular OA. As the length of follow-up varied considerably, we first stratified the data into quartiles of follow-up length in each imputed dataset. Within each stratum of follow-up length, we performed multivariable mixed-effects linear regression analyses, with the improvement in HRQoL or the NRSS as the dependent variable, KL grade and confounders as independent variables, and center as a random effect. Stratum-specific mean differences in HRQoL between the KL grade groups were pooled using inverse variance weighting to produce an overall estimate of the mean difference in HRQoL for each imputed dataset. Finally, the m = 10 estimates of the mean differences in HRQoL were combined into one estimate according to Rubin's rules. [29]
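The stratification and pooling steps above can be sketched as follows, assuming the mixed-effects models have already produced a mean difference and its standard error for each follow-up stratum (the models themselves are not reproduced here). Quartile cut points come from Python's `statistics.quantiles` with its default method, and the assignment of values equal to a cut point is an assumption; the pooling is a standard fixed-effect inverse-variance weighted average, with hypothetical inputs.

```python
import math
from bisect import bisect_right
from statistics import quantiles
from typing import List, Sequence, Tuple


def follow_up_strata(follow_up_years: Sequence[float]) -> List[int]:
    """Assign each patient to a follow-up quartile stratum (0-3).

    Values equal to a cut point go to the higher stratum (an assumption).
    """
    cuts = quantiles(follow_up_years, n=4)
    return [bisect_right(cuts, years) for years in follow_up_years]


def pool_inverse_variance(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float]:
    """Fixed-effect inverse-variance pooling of stratum-specific estimates.

    Each stratum is weighted by 1 / SE^2; returns (pooled estimate, pooled SE).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


print(follow_up_strata([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.0]))  # [0, 0, 1, 1, 2, 2, 3, 3]
# Hypothetical stratum-specific mean differences (SF36 points) and their SEs.
print(pool_inverse_variance([4.0, 6.0, 5.0, 3.0], [1.0, 2.0, 1.5, 2.5]))
```

Strata with more precise estimates (smaller standard errors) dominate the pooled mean difference, which is the intended behaviour of inverse variance weighting.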
Within each stratum of follow-up length, we also performed multivariable mixed-effects logistic regression analyses, with attainment of an MCID in HRQoL or of a satisfactory NRSS as the dependent variable, KL grade and confounders as independent variables, and center as a random effect. Stratum-specific odds ratios of attaining an MCID in HRQoL between the KL grade groups were pooled using inverse variance weighting to produce an overall estimate of the odds ratio of attaining an MCID in HRQoL for each imputed dataset. Finally, the m = 10 estimates of the odds ratios were combined into one estimate according to Rubin's rules. [29]
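Rubin's rules combine the m per-imputation estimates into a pooled estimate (their mean) and a total variance (the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance). The sketch below applies them to odds ratios on the log scale, which is a common convention assumed here rather than stated in the text. The 95% interval uses a normal approximation, a simplification of Rubin's t-based interval with adjusted degrees of freedom; the inputs are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def rubin_combine(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, m - 1 denominator
    return q_bar, within + (1 + 1 / m) * between


def combine_odds_ratios(
    log_ors: Sequence[float], std_errors: Sequence[float], level: float = 0.95
) -> Tuple[float, float, float]:
    """Pool per-imputation log odds ratios; return (OR, lower, upper)."""
    q_bar, total_var = rubin_combine(log_ors, [se ** 2 for se in std_errors])
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    half_width = z * math.sqrt(total_var)
    return math.exp(q_bar), math.exp(q_bar - half_width), math.exp(q_bar + half_width)


# Hypothetical log odds ratios and SEs from m = 3 imputed datasets.
print(combine_odds_ratios([0.55, 0.60, 0.65], [0.20, 0.21, 0.19]))
```

The same `rubin_combine` function applies unchanged to the mean differences in HRQoL, which are already on an additive scale.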

Results
At 2 to 5 years after joint replacement, 723 patients agreed to participate and returned sufficiently completed questionnaires (participation rate: 46%, Figures 1 and 2). Non-participating THR patients were on average 4.32 years older than participants (95%CI: 2.93-5.70 years); non-participating TKR patients were on average 2.68 years older than participants (95%CI: 1.28-4.09 years). The proportion of males was similar in participants and non-participants. An overview of the patient characteristics is provided in Table 1. The Charnley classification was missing in 13 THR patients and 7 TKR patients, and BMI was missing in 9 THR patients and 11 TKR patients. These missing values were imputed using multiple imputation.
The mean improvement in HRQoL and the mean NRSS per KL grade group are shown in Table 2 for THR patients and Table 3 for TKR patients. In THR, patients with severe radiographic OA had a larger improvement in Physical Functioning than patients with mild radiographic OA. The improvement in other domains of HRQoL and the mean NRSS were similar for THR patients with mild and severe radiographic OA. In TKR, patients with severe radiographic OA had a larger improvement in Physical Functioning than patients with mild radiographic OA. Additionally, TKR patients with severe radiographic OA had a larger improvement in General Health, a larger improvement in the Physical Component Summary Scale and a higher NRSS than patients with mild radiographic OA.
The crude probabilities of achieving an MCID in each dimension of HRQoL are presented in Table 4 for THR patients and Table 5 for TKR patients. In THR, the probability of achieving a relevant improvement in Physical Functioning was higher in patients with severe radiographic OA than in patients with mild radiographic OA. The probability of achieving a satisfactory outcome was also higher in patients with severe radiographic OA. The probability of achieving a relevant improvement in other domains of HRQoL was similar for THR patients with mild and severe radiographic OA. In TKR, the probability of achieving a relevant improvement in Physical Functioning was higher in patients with severe radiographic OA than in patients with mild radiographic OA. Additionally, the probabilities of achieving a relevant improvement in General Health and of achieving a satisfactory outcome were higher in TKR patients with severe radiographic OA.

Discussion
At the population level, patients with severe radiographic OA improve more in Physical Functioning than patients with mild radiographic OA, both after THR and after TKR. At the individual level, THR and TKR patients with severe radiographic OA have a higher probability of a relevant improvement in Physical Functioning than patients with mild radiographic OA. The effects of the preoperative severity of radiographic OA on Physical Functioning are more pronounced in TKR patients than in THR patients. Other domains of HRQoL do not appear to be influenced by the preoperative severity of OA, except General Health and the Physical Component Summary Scale in TKR patients. Additionally, patient satisfaction appears to be higher in patients with more severe preoperative radiographic OA.
Limitations of the study include the participation rate and the range of follow-up periods after joint replacement. Although participation rates of 100% are feasible in small-scale studies with hard endpoints, [31,32] participation rates in epidemiological studies have been declining steadily over the last 30 years, [33] and even sharper declines have been reported in the past few years. [34] Unfortunately, the participation rate of this study (46%) follows this general trend; therefore, we cannot exclude the presence of self-selection bias. To limit the extent of this bias, we sent reminders and telephoned all patients who did not respond to the reminders and did not return the questionnaire. As incentives, we included an appealing information brochure explaining the primary goals of the follow-up study and a study pen as a small gift. Additionally, patients were urged to participate by their treating physician. However, the participation rate alone does not determine the extent of bias in a particular study; [34] the difference between participants and non-participants is far more important. [35] As the observed differences in demographics were of little clinical relevance, it is unlikely that the study results are severely biased. Moreover, the demographics of our study population were similar to those of large-scale national joint registry studies with regard to age, gender, Charnley classification and BMI. [36,37]
The follow-up period after joint replacement varied between 2 and 5 years, and we therefore stratified all analyses by quartile of follow-up length. Although a residual effect of follow-up length cannot be excluded, we consider it unlikely, as recent evidence suggests that the improvement in HRQoL is sustained up to 5 years after joint replacement surgery. [38,39]
Although joint replacements are highly effective in improving HRQoL at the group level, [1] this is not the case for every individual patient, judging from the relatively high dissatisfaction rates. [40,41] Studying HRQoL at the individual level, using the probability of achieving a clinically important difference as the outcome measure, enables a better prediction of a successful outcome. Moreover, it could help to fine-tune the indication for joint replacement, for which no clear cut-off points are currently available. [42] After adjustment for age, gender, co-morbidity and BMI, we have shown that joint replacement patients with severe preoperative radiographic OA have a better prognosis regarding improvement in Physical Functioning and satisfaction with the surgical result. These effects are more pronounced in TKR patients than in THR patients, which might be explained in part by biomechanical factors. The hip is a relatively simple ball-and-socket joint, which is adequately mimicked by a THR. The biomechanics of the knee are more difficult to imitate, as the knee is a pivotal hinge joint with 6 degrees of freedom. These degrees of freedom are generally not restored after TKR, as substantiated by kinematic and kinetic studies. [43] This additional disadvantage for TKR patients who underwent joint replacement for mild radiographic OA is reflected in a smaller improvement in Physical Functioning than in THR patients who underwent joint replacement for mild radiographic OA. Additionally, in TKR the odds of achieving an MCID in Physical Functioning are smaller for patients with mild radiographic OA, and the difference in satisfaction between mild and severe radiographic OA is larger.
Clinically, these are promising findings, as dissatisfaction rates are higher in TKR patients than in THR patients. [4,6] Patient satisfaction is thought to be closely related to unfulfilled expectations. Although patient expectations of THR and TKR are similar, recent evidence suggests that THR meets important patient expectations better than TKR. [6,11,44] Our findings could support more accurate expectation management regarding the expected improvement in Physical Functioning, using a single predictor. Such improved expectation management might lead to higher satisfaction rates.
Plain radiographs have a number of appealing aspects. First, they are inexpensive and readily available, as they are already part of the clinical work-up for joint replacement. Second, because the test is non-invasive, radiographs are a patient-friendly modality. Finally, they offer a more objective approach to joint complaints. These aspects would make it straightforward to implement the KL grade in clinical practice to predict HRQoL and satisfaction after joint replacement.