A comparison of comorbidity measures for predicting mortality after elective hip and knee replacement: A cohort study of data from the National Joint Registry in England and Wales

Background The risk of mortality following elective total hip (THR) and knee replacement (KR) may be influenced by patients’ pre-existing comorbidities. A variety of scores derived from individual comorbidities can be used to quantify this risk. The aims of this study were to a) identify which comorbidity score best predicts risk of mortality within 90 days and b) determine which comorbidity score best predicts risk of mortality at other relevant timepoints (30, 45, 120 and 365 days).
Patients and methods We linked data from the National Joint Registry (NJR) on primary elective hip and knee replacements performed between 2011 and 2015 with pre-existing conditions recorded in the Hospital Episodes Statistics. We derived comorbidity scores (Charlson Comorbidity Index, CCI; Elixhauser; Hospital Frailty Risk Score, HFRS). We used binary logistic regression models of all-cause mortality within 90 days and within 30, 45, 120 and 365 days of the primary operation, adjusted for age and gender. We compared the performance of these models in predicting all-cause mortality using the area under the receiver operating characteristic curve (AUROC) and the Index of Prediction Accuracy (IPA).
Results We included 276,594 elective primary THRs and 338,287 elective primary KRs for any indication. Mortality within 90 days was 0.34% (N = 939) after THR and 0.26% (N = 865) after KR. The AUROC for the CCI and Elixhauser scores in models of mortality ranged from 0.78–0.81 after THR and KR, which slightly outperformed models with ASA grade (AUROC = 0.77–0.78). HFRS performed similarly to ASA grade (AUROC = 0.76–0.78). The inclusion of comorbidities recorded prior to the primary operation offered no improvement beyond models with comorbidities recorded at the time of the primary. The discriminative ability of all prediction models was best for mortality within 30 days and worst for mortality within 365 days.
Conclusions Comorbidity scores add little improvement beyond simpler models with age, gender and ASA grade for predicting mortality within one year after elective hip or knee replacement. The additional patient-specific information required to construct comorbidity scores must be balanced against their prediction gain when considering their utility.

Background Elective knee (KR) and total hip (THR) replacement are amongst the most commonly performed elective operations. They are also highly successful procedures with typical 10-year revision rates of <5% [1]. Mortality after primary hip and knee replacement is rare and has decreased in recent years [2,3]. The National Joint Registry for England, Wales, Northern Ireland, the Isle of Man and the States of Guernsey (NJR) routinely monitors mortality outcomes at surgeon and unit level. This process includes case-mix adjustment for age, gender, indication for surgery and American Society of Anaesthesiologists physical status (ASA grade) which records the preoperative health of surgical patients.
The presence of comorbidities (pre-existing health conditions that coexist with an index disease) is associated with worse health outcomes and more complex clinical management [4]. Comorbidity has been found to be a predictor of perioperative and in-hospital mortality [5], and a risk factor for 90-day mortality after joint replacement [6]. Using a comorbidity score in place of ASA grade may improve prediction of mortality risk, but collecting comorbidities is much more complex and laborious than recording ASA grade.
Many summary indices of comorbidities based on diagnoses have been derived; however, the main focus within joint replacement surgery has been on the Charlson Comorbidity Index (CCI) and the Elixhauser index [7]. The Elixhauser index includes 30 conditions and is a composite measurement to assess the impact of comorbidity on surgical procedures [8], and the CCI includes 17 conditions [9]. The Elixhauser index predicted inpatient mortality after orthopaedic surgery better than the CCI [5]. However, comorbidity does not predict long-term mortality [10]. Recent developments in comorbidity scores include the Hospital Frailty Risk Score (HFRS) [11], designed to screen for frailty and identify a group of patients who are at greater risk of adverse outcomes. This was found to predict adverse events after THRs and KRs, but its performance was not compared against other comorbidity indices [12].
The aims of this study are:
1. To determine which comorbidity score best predicts risk of mortality within 90 days of elective primary hip and knee replacement
2. To determine which comorbidity score best predicts risk of mortality within other landmark postoperative timepoints (30, 45, 120, 365 days) after elective primary hip and knee replacement

Data are collected at the time of surgery on prosthesis and operative information, patient information, and surgical and unit information. We linked these records to Hospital Episodes Statistics (HES)-Admitted Patient Care data, established in 1989 [14], for all available episodes up to and including the primary joint replacement operation. For people who had contralateral primary operations we linked separate HES records for each primary operation. Date of death was linked at the person-level using civil registration mortality records.

Ethics approval and consent to participate
Patient consent was obtained for data collection by the NJR. According to the specifications of the NHS Health Research Authority, separate informed consent and ethical approval were not required for the present study.

Study sample
We included patients who received a primary elective THR or KR for any indication between 1st January 2011 and 31st December 2015. Patients were followed up until 31st December 2016. We only included primary operations that could be linked to HES records. This excluded privately funded operations since these episodes are not recorded in HES and hence comorbidity indices could not be derived at the time of the primary operation. This also excluded operations performed in Wales and Scotland, since HES data collection only covers operations performed in England. We excluded primary operations performed in Northern Ireland, the Isle of Man and Guernsey, since data collection in these regions only commenced in 2013, 2015, and 2020, respectively. We also excluded people who had not given consent for recording of personal details for research purposes and primary operations performed for trauma (see Figs 1 & 2).

Potential predictors
The patient's age at the time of surgery (in years, natural spline with knot points at 50 and 75 years) and gender (categorical) were included as potential predictors in all models and comprised our base model. Our reference model contained ASA grade (categorical predictor: 'I', 'II', 'III', 'IV & V'), which is routinely recorded in the NJR and has the advantage of not requiring linkage to other datasets. We used pre-existing conditions recorded in HES using ICD-10 codes to derive the following comorbidity scores:
• CCI with two weightings (original and SHMI)
• Elixhauser index
• HFRS
We derived the comorbidity scores using pre-existing conditions recorded over the following timeframes:
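As an illustration of how a weighted comorbidity score such as the CCI is assembled once conditions have been identified, the following minimal sketch sums per-condition weights. It assumes the commonly cited original Charlson weights for the 17 conditions; the study's exact weight tables (including the SHMI variant) and the ICD-10 code mappings are not reproduced in this text, and the condition labels and function name are hypothetical.

```python
# Commonly cited original Charlson weights (an assumption for illustration;
# the study's own weight tables and ICD-10 mappings are not shown here).
CHARLSON_ORIGINAL_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes_without_complications": 1,
    "diabetes_with_complications": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumour": 6,
    "aids_hiv": 6,
}


def charlson_score(conditions, weights=CHARLSON_ORIGINAL_WEIGHTS):
    """Sum the weights of the distinct conditions recorded for one patient."""
    present = set(conditions)  # each condition counts once, however often coded
    unknown = present - set(weights)
    if unknown:
        raise ValueError(f"Unrecognised conditions: {sorted(unknown)}")
    return sum(weights[c] for c in present)


# Example: chronic pulmonary disease (1) plus renal disease (2).
example_score = charlson_score(["chronic_pulmonary_disease", "renal_disease"])
```

A patient with no recorded conditions scores 0. Some implementations additionally apply a hierarchy so that only the more severe of two related conditions is counted; that step is omitted here.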

Outcomes
All-cause mortality. Our primary outcome was mortality from all causes within 90 days of the primary operation. Secondary outcomes were all-cause mortality within 30, 45, 120 and 365 days of the primary operation.
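The landmark outcomes can be sketched as binary indicators derived from the operation date and the linked date of death. This is a minimal illustration rather than the authors' code: the function name is hypothetical, and treating a death on exactly day 90 as "within 90 days" is an assumption.

```python
from datetime import date

# Primary (90 days) and secondary landmarks named in the text.
LANDMARK_DAYS = (30, 45, 90, 120, 365)


def mortality_flags(operation_date, death_date, landmarks=LANDMARK_DAYS):
    """Return {landmark: 0/1} for all-cause death within each landmark."""
    if death_date is None:  # no linked death record: alive at every landmark
        return {days: 0 for days in landmarks}
    days_to_death = (death_date - operation_date).days
    if days_to_death < 0:
        raise ValueError("Date of death precedes the operation date")
    # Assumption: the landmark day itself counts as "within" the window.
    return {days: int(days_to_death <= days) for days in landmarks}


# Example: death 45 days after the operation.
flags = mortality_flags(date(2014, 1, 1), date(2014, 2, 15))
```

In this example the patient counts as an event for the 45, 90, 120 and 365-day outcomes but not for the 30-day outcome.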

Statistical analysis
We analysed mortality outcomes for primary hip and knee replacements separately. We described the comorbidity of people undergoing elective THR and KR operations. We used predicted probabilities from logistic regression models to identify the best comorbidity predictors of mortality, and the optimal timeframe over which to define the comorbidity scores. We constructed the following regression models:

PLOS ONE
We compared models using area under the receiver operating characteristic curve (AUROC), a measure of how well the model discriminates between those who experience the outcome and those who don't (values 0 to 1, 0.5 = no discrimination, higher value = better classifier), and the Index of Prediction Accuracy (IPA) [16], calculated from the null model and model Brier scores to combine discrimination and calibration in a single value (values -1 to 1, 1 is a perfect model, <0 is a harmful model). We performed internal validation using 5-fold cross-validation, and reported the overall results and results of our primary analyses for each fold.
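The two performance measures can be sketched in a few lines. The AUROC is the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor (ties count one half), and the IPA is one minus the ratio of the model Brier score to the Brier score of a null model that predicts the observed prevalence for everyone. The function names are illustrative, and the pairwise AUROC loop is written for clarity rather than for datasets of the size used in this study.

```python
def auroc(y_true, y_prob):
    """Probability that a random event is ranked above a random non-event."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("Both outcome classes are required")
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return wins / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def ipa(y_true, y_prob):
    """Index of Prediction Accuracy: 1 - Brier(model) / Brier(null model)."""
    prevalence = sum(y_true) / len(y_true)
    null_brier = brier_score(y_true, [prevalence] * len(y_true))
    return 1.0 - brier_score(y_true, y_prob) / null_brier


outcomes = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
example_auroc = auroc(outcomes, predicted)
example_ipa = ipa(outcomes, predicted)
```

Because the null Brier score is approximately p(1 - p) for prevalence p, it is very small for an outcome as rare as 90-day mortality, which is one reason IPA values for such outcomes tend to be close to zero.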
Sensitivity analyses. Some patients received a second primary THR or KR on the opposite joint (contralateral primary) within our follow-up timeframe. These patients contribute twice to our analyses. We therefore excluded the earliest performed primary and repeated our main analyses.
Data were processed in Stata v15 (StataCorp) and all analyses were performed using R version 4.0 [17] and the 'tidymodels' packages [18]. Confidence intervals (95% CI) were derived using the exact method to evaluate the uncertainty of AUC developed by DeLong [19] and implemented using the algorithm proposed by Sun and Xu [20] in the 'pROC' package [21].
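For readers unfamiliar with the DeLong approach, the following sketch computes the AUC and a normal-approximation 95% confidence interval from the DeLong variance estimate. It is a direct pairwise implementation for illustration only, not the fast algorithm of Sun and Xu and not the 'pROC' code; the function name is hypothetical and clipping the interval to [0, 1] is a choice made here.

```python
import math


def delong_auc_ci(y_true, y_score, z=1.959964):
    """Return (auc, lower, upper) using the DeLong variance estimate."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    m, n = len(events), len(non_events)
    if m < 2 or n < 2:
        raise ValueError("At least two events and two non-events are required")

    def psi(a, b):
        # Pairwise comparison kernel: win = 1, tie = 0.5, loss = 0.
        return 1.0 if a > b else 0.5 if a == b else 0.0

    # Placement values: one per event (v10) and one per non-event (v01).
    v10 = [sum(psi(e, q) for q in non_events) / n for e in events]
    v01 = [sum(psi(e, q) for e in events) / m for q in non_events]
    auc = sum(v10) / m

    # Sample variances of the placement values around the AUC.
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)

    # Assumption: clip the interval to the valid AUC range.
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


auc, lower, upper = delong_auc_ci(
    [0, 0, 0, 1, 1, 1], [0.1, 0.4, 0.35, 0.3, 0.8, 0.7]
)
```

The pairwise loops scale with the product of the number of events and non-events, which is why faster rank-based algorithms are preferred for registry-sized data.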
In patients who died within 90 days of their primary operation the five most prevalent comorbidities from the CCI were very similar for people who had a THR or KR: COPD (THR 20%, KR 22%), diabetes without complications (THR 17%, KR 22%), renal disease (THR 16%, KR 17%), acute myocardial infarction (THR and KR 10%) and congestive heart failure (THR 11%, KR 9.7%) (Tables 1 and 2). In the same patients, the most prevalent comorbidities from the Elixhauser index were very similar: uncomplicated hypertension (THR 52%, KR 61%), arrhythmia (THR 25%, KR 24%), chronic pulmonary disease (THR 20%, KR 22%), diabetes without complications (THR 17%, KR 21%) and renal failure (THR 16%, KR 17%). There was a marked difference in the prevalence of metastatic cancer between people who died within 90 days of their THR and KR: 12% and 0.8% respectively. This likely reflects the prophylactic replacement of the hip in patients with metastasis in the proximal femur to prevent a femoral fracture. Metastases in the distal femur, which may require a prophylactic knee replacement, occur much less frequently.
A comparison of comorbidity scores derived from varying lead-up times with those derived from all available episodes (S1-S4 Figs) highlights differences in the capture of high comorbidity scores. The majority of patients had CCI score 0 and the median comorbidity score for all measures at all time points, apart from HFRS derived using 5-year lead-up and all episodes, was 0 (S1 Table). Increasing the timeframe for deriving comorbidity scores decreased the proportion of patients with CCI = 0 and increased the comorbidity scores of the upper quartile a modest amount and the maximum comorbidity scores considerably.
Comparison of models
1. Comorbidity indices using comorbidities at time of primary. The AUROC indicate that, using comorbidities recorded at the time of the primary operation, the CCI (original and SHMI) and Elixhauser index performed slightly better than ASA grade in predicting 90-day mortality (Table 3 and Figs 3 and 4). HFRS performed similarly to ASA grade in predicting 90-day mortality after THR and KR (AUROC THR = 0.77, AUROC KR = 0.78). All models performed better than the base model (age and gender only, AUROC THR = 0.72, AUROC KR = 0.74). IPA scores for all models with comorbidity predictors recorded at the time of the primary were comparable with or higher than those for models with ASA grade for THRs (IPA = 0.66% to 2.1% versus IPA = 0.67%) and KRs (IPA = 0.51% to 1.0% versus IPA = 0.56%) and higher than those for the base models (IPA THR = 0.36%, IPA KR = 0.38%). ROC curves using comorbidity scores derived from conditions recorded at the time of the primary are shown in Figs 5 and 6.
2. Comorbidity indices using history of comorbidities. There was little difference between the discriminative abilities of comorbidity scores derived over different timeframes. The AUROC varied by a maximum of one tenth of a percentage point (Table 3). ROC curves for all timeframes are shown in S5-S12 Figs. IPA scores for all models with comorbidity predictors were highest when derived using comorbidities recorded at the time of the primary compared with those derived over longer timeframes (Table 3). IPA scores for the CCI (original) and Elixhauser index were lowest when all available episodes were included, whereas IPA scores for CCI (SHMI) and HFRS were lowest when two to five years of preceding episodes were used to derive the scores.

Landmarks.
For all comorbidity scores the performance of the prediction models after THR and KR was best for the shortest timeframe (30 days) and worsened with increasing time (Table 4). For THRs, CCI (original and SHMI) and Elixhauser had marginally better discriminative ability than ASA. HFRS had better discriminative ability than ASA for mortality by 30 and 45 days, but was slightly worse for mortality by 120 and 365 days. For KRs, there was almost no difference in the discriminative ability of models with ASA grade compared with any comorbidity score, irrespective of the mortality timeframe. IPA scores increased with increasing time for all potential predictors, indicating improved accuracy for mortality predictions at one year compared with 30 days.

Sensitivity analyses
The results from the five-fold cross-validation show variability of approximately five to seven percentage points in the AUC THR and approximately two to four percentage points in the AUC KR between the best and worst performing folds (S4 and S5 Tables). IPA scores varied considerably, with two of the five folds indicating harmful models (negative IPA scores). We excluded 12,723 contralateral THRs and 20,703 contralateral KRs performed within one year of the corresponding first primary operation. Results of our primary analyses changed by only 0.3 percentage points (results not reported).
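Part of the fold-to-fold variability is a consequence of how few events each fold contains: with 939 deaths within 90 days among 276,594 THRs, a five-fold split leaves roughly 188 deaths per fold. The sketch below assigns folds separately within each outcome class so that the events are spread evenly. Stratifying by outcome is an assumption made here for illustration; the text does not state whether the study's folds were stratified, and the function names are hypothetical.

```python
import random


def stratified_folds(y_true, k=5, seed=0):
    """Assign each record a fold index 0..k-1, balancing outcome classes."""
    rng = random.Random(seed)
    folds = [None] * len(y_true)
    for label in sorted(set(y_true)):
        indices = [i for i, y in enumerate(y_true) if y == label]
        rng.shuffle(indices)
        # Deal the shuffled records of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[index] = position % k
    return folds


def events_per_fold(y_true, folds, k=5):
    """Count outcome events (y == 1) in each fold."""
    counts = [0] * k
    for y, fold in zip(y_true, folds):
        counts[fold] += y
    return counts


# Outcome vector with the THR counts reported in the text.
thr_outcomes = [1] * 939 + [0] * (276_594 - 939)
thr_folds = stratified_folds(thr_outcomes)
thr_event_counts = events_per_fold(thr_outcomes, thr_folds)
```

With unstratified random folds the number of deaths per fold would vary more, which would be expected to add further noise to fold-level AUROC and IPA estimates.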

Discussion
We compared the performance of four comorbidity scores (CCI with original and SHMI weights, Elixhauser Index and HFRS) in predicting the risk of all-cause mortality within 30, 45, 90, 120 and 365 days of primary elective THRs and KRs. We found that mortality predictions from models with comorbidity scores add only a modest improvement compared with those from models with ASA grade. The CCI (original and SHMI) and Elixhauser scores all performed slightly better than ASA grade in predicting mortality after THR. The inclusion of comorbidities either at the time of or prior to the primary operation offers little improvement beyond models with ASA grade in the prediction of the risk of dying up to one year.
The main strengths of this study relate to the size and completeness of the NJR dataset, and the HES linkage. Mortality within 90 days of elective hip or knee replacement is a rare event and remains so up to one year after the primary operation. The size of the NJR meant that we were able to use a more recent dataset and not rely on the outcomes of operations performed early in the NJR which may not reflect the current postoperative mortality trends, while still having sufficient events to be confident in our findings. The completeness of the NJR data is high. A recent NJR audit of procedure recording compliance found capture rates were 95.7% for primary procedures [22]. This reduces the likelihood of differential reporting which may have affected our models. Our ability to link with the HES data enabled us to derive four different comorbidity indices from the underlying ICD-10 codes and could potentially facilitate the derivation of more comorbidity scores in future.
The need for linkage to HES to derive comorbidity indices is also an important limitation of this study. The availability of HES data for linkage is variable, particularly for privately funded hospital episodes. Therefore, we were not able to derive comorbidity scores for many of the people who had privately funded joint replacements. These patients may have had fewer comorbidities, since private sector units tend to treat patients with fewer comorbidities than publicly funded units [23], although this may not have affected our findings. A further weakness of the HES data is that we do not know whether all pre-existing conditions are recorded for each episode, whether they are recorded accurately or whether incentives to report comorbidities have changed over time. A comparison of comorbidities recorded through HES with those from primary care records (Clinical Practice Research Datalink, CPRD) found that CPRD recorded more comorbidity than HES, but this did not adversely affect their models of mortality risk after gastrointestinal bleeding or diabetes [24]. This suggests that our HES records are likely to be missing some comorbidities, but these may not be important for modelling mortality risk. Some of the conditions recorded at the time of the primary operation may have been conditions which were not present on admission (i.e. complications) [25]. Our models of the risk of mortality may be missing important predictors. This study focussed on assessing whether comorbidity scores should be used instead of ASA grade in existing models, rather than building more comprehensive models to predict these outcomes. In future it may be valuable to consider which other variables should be included in these models. We treated some of the comorbidity scores as continuous variables; alternative parameterisations may be useful, although categorisation of continuous variables rarely increases the ability to detect differences. Although completeness of the NJR and linked mortality data is high, we do not know how many patients have missing dates of death, which may occur for example if someone emigrates after their primary operation. Given the study population and short follow-up time this is unlikely to change our main findings. Finally, we did not validate our models using an external dataset. This would be essential if we intended to develop new prediction models to be applied to new patients, but this is outside the scope of our study.
Our models including the CCI and Elixhauser indices predicted 90-day mortality (AUROC = 0.78-0.81) slightly worse than those of Menendez et al. (AUROC = 0.83-0.86) [5] and comparably with those of Inacio et al. (AUC THR = 0.79-0.80, AUC KR = 0.77) [7]. The timeframe for deriving comorbidity made little difference to model performance. The modest improvements in model fit, which are consistent with Bülow et al. [10], suggest that conditions recorded at the time of the primary joint replacement operation are likely sufficient for capturing comorbidities related to post-operative mortality.
Our models predicted earlier mortality risk better than one-year mortality risk. This is unsurprising given that ASA Grade, CCI and Elixhauser index were derived to better inform risk of death or adverse events during or immediately after surgery. Bülow et al. [10] found that, while comorbidity score (Elixhauser or Charlson) on its own was a poor predictor of mortality risk 5-14 years after primary THR, the performance of models which included age and gender was comparable with those for our much shorter time frame (AUROC = 0.74-0.76). This indicates that the decrease in discriminative ability we observed for models of 365-day mortality compared with 30-day mortality may plateau for risk of mortality beyond one year.
This research has confirmed, using a very large national dataset with very good coverage and completeness, that there is little advantage to using comorbidity scores rather than ASA grade to predict risk of mortality within one year of elective hip and knee replacement. Future research may explore whether these models can be improved by using other algorithms in addition to logit models, particularly for very rare outcomes such as mortality after elective replacement. However, logit models are generally considered to be robust and perform well. Although we have used the comorbidity indices as they have been used in many other studies, the additive approach used to combine conditions in the CCI is algebraically incorrect [26] and Elixhauser et al. intended the comorbidities to be retained as independent measures rather than used to derive a summary Elixhauser index [8]. It may therefore be valuable to explore the impact these accepted but incorrect approaches may have on mortality prediction. Finally, it may be beneficial to investigate whether comorbidity scores or specific comorbid conditions predict risk of revision after joint replacement surgery.

Conclusions
The comorbidity scores used in this study offered little to no improvements over ASA grade in models of mortality between 30 and 365 days after elective hip or knee replacement surgery. If ASA grade is already available and linkage between datasets is needed to derive comorbidity scores, the inability to link some operations and the additional technical and administrative burdens of including comorbidity scores in models of mortality are not justified.

Supporting information
S1 Table. ASA grade and comorbidity scores for the study sample of people having a primary THR and KR. (DOCX)
S2 Table. A comparison of the comorbidity scores of people having a primary THR who died within 90-days of their operation and those who were alive at 90-days. (DOCX)
S3 Table. A comparison of the comorbidity scores of people having a primary KR who died within 90-days of their operation and those who were alive at 90-days. (DOCX)
S4 Table. The area under the ROC curve and IPA scores from each of the 5 cross-validation folds for ASA grade and all comorbidity scores for models of 90-day mortality after THR, adjusted for age and gender. (DOCX)
S5 Table. The area under the ROC curve and IPA scores from each of the 5 cross-validation folds for ASA grade and all comorbidity scores for models of 90-day mortality after KR, adjusted for age and gender.