Ranking hospitals when performance and risk factors are correlated: A simulation-based comparison of risk adjustment approaches for binary outcomes

Background: The conceptualization of hospital quality indicators usually includes some form of risk adjustment to account for hospital differences in case mix. For binary outcome variables like in-hospital mortality, frequently utilized risk-adjusted measures include the standardized mortality ratio (SMR), the risk-standardized mortality rate (RSMR), and the excess risk (ER). All of these measures require the estimation of expected hospital mortality, which is often based on logistic regression models. In this context, an issue that is often neglected is correlation between hospital performance (e.g. care quality) and patient-specific risk factors. The objective of this study was to investigate the impact of such correlation on the adequacy of hospital rankings based on different measures and methods.

Methods: Using Monte Carlo simulation, the impact of correlation between hospital care quality and patient-specific risk factors on the adequacy of hospital rankings was assessed for SMR/RSMR and ER based on logistic regression and random effects logistic regression. As an alternative method, fixed effects logistic regression with Firth correction was considered. The adequacy of the resulting hospital rankings was assessed by the share of hospitals correctly classified into quintiles according to their true (unobserved) care qualities.

Results: The performance of risk adjustment approaches based on logistic regression and random effects logistic regression declined when correlation between care quality and a risk factor was induced. In contrast, fixed-effects-based estimations proved to be more robust. This was particularly true for fixed-effects-logistic-regression-based ER. In the absence of correlation between risk factors and care quality, all approaches showed similar performance.

Conclusions: Correlation between risk factors and hospital performance may severely bias hospital rankings based on logistic regression and random effects logistic regression.
ER based on fixed effects logistic regression with Firth correction should be considered as an alternative approach to assess hospital performance.

Introduction

Many data sources used to construct quality indicators, including the German Diagnoses-Related Groups Statistics [16], do not indicate whether a specific comorbidity had been present on admission (POA).
Against that background, the objective of this paper was to investigate the adequacy of different approaches to risk adjustment in the presence of correlation between hospital performance and risk factors. Using the example of logistic regression for a binary outcome, we highlight that common approaches implicitly rely on the assumption that risk factors and hospital performance are uncorrelated; violations of this assumption may therefore bias the hospital performance assessment. As an alternative method that allows for such correlation, we considered fixed effects logistic regression with Firth correction [17]. Furthermore, we compared the results of SMR/RSMR-based and ER-based assessments. The performance of these approaches was examined using Monte Carlo simulations.

Materials and methods
Note that this study focuses on hospital performance in terms of care quality. In the following, the terms hospital performance and care quality are therefore generally used interchangeably.

Data generation process
We considered a binary outcome variable Y_hi ~ Ber(p^Y_hi) (e.g. mortality), where h = 1, ..., H denotes hospitals and i = 1, ..., n_h denotes the patients treated within a specific hospital. The probability p^Y_hi of observing Y_hi = 1 (e.g. in-hospital death) was determined by the (unobserved) hospital-specific quality of care Q_h and a patient-specific risk factor X_hi according to

F(p^Y_hi) = β_0 + β_1 Q_h + X_hi, (1)

where F(·) is the logistic link function and β_0 and β_1 are fixed coefficients. We assumed that the quality of care follows a Beta distribution, i.e. Q_h ~ i.i.d. Beta(q_1, q_2). Depending on the parameters q_1 and q_2, this gave us the opportunity to consider scenarios with symmetric and skewed quality distributions (see Fig 1). Importantly, the data generation process allowed Q_h and X_hi to be correlated. This was achieved by specifying the generating equation for X_hi as

X_hi = γ Q_h + a_h + ε_hi,

where a_h ~ i.i.d. N(0, η²) is a normally distributed variable that induces hospital-specific differences with respect to the distribution of the risk factor and ε_hi ~ i.i.d. N(0, σ²) is a normally distributed patient-specific random term. The coefficient γ relates X_hi to Q_h and, therefore, induces a positive (negative) association between these variables if γ > 0 (γ < 0). If γ = 0, care quality and risk factor are independent. Note that γ does not have to be interpreted as inducing a causal effect but more generally determines the sign and strength of the correlation ρ = Corr[X_hi, Q_h]. In each simulated scenario, we chose a specific value of ρ and used the relation

γ = (ρ / σ_Q) · sqrt((η² + σ²) / (1 − ρ²)),

where σ_Q denotes the standard deviation of Q_h, to determine the value of γ.
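The data generation process above can be sketched in a few lines. The function name and default parameter values below are illustrative choices of ours, not taken from the study's simulation script:

```python
import numpy as np

def simulate_hospitals(H=100, n_h=50, q1=6.0, q2=6.0, beta0=-1.5,
                       beta1=np.log(0.5), rho=0.4, eta=0.2, sigma=0.6, seed=1):
    """Draw one dataset from the generating process sketched above."""
    rng = np.random.default_rng(seed)
    # standard deviation of a Beta(q1, q2) variable
    sd_q = np.sqrt(q1 * q2 / ((q1 + q2) ** 2 * (q1 + q2 + 1)))
    # gamma chosen so that Corr[X, Q] equals the requested rho
    gamma = (rho / sd_q) * np.sqrt((eta**2 + sigma**2) / (1 - rho**2))
    Q = rng.beta(q1, q2, size=H)                  # care quality per hospital
    a = rng.normal(0.0, eta, size=H)              # hospital-level shift of X
    hosp = np.repeat(np.arange(H), n_h)           # hospital index per patient
    X = gamma * Q[hosp] + a[hosp] + rng.normal(0.0, sigma, size=H * n_h)
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * Q[hosp] + X)))
    Y = rng.binomial(1, p)                        # binary outcome (death)
    return hosp, X, Y, Q
```

For simplicity this sketch uses a constant number of patients per hospital, whereas the study draws hospital sizes from the German Hospital Directory distribution.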
It is noteworthy that correlation between the risk factor and care quality influences the population average mortality rate p = E[Y_hi]. Since this parameter may be crucial for the detection of performance differences between hospitals, we used a Taylor series approximation of Eq 1 to choose β_0 in order to fix p at a specific value (details are provided in the supporting information S1). Another parameter that affects the chances of detecting differences in hospital performance is the effect size of care quality on the outcome, β_1. Given the logistic model, we set β_1 = ln(OR), where OR denotes the odds ratio of mortality for the highest possible care quality (Q_h = 1) relative to the lowest possible care quality (Q_h = 0).
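As a numerical alternative to the Taylor approximation used in the study, the intercept β_0 can also be calibrated by root finding on a large Monte Carlo sample of the remaining linear predictor. This is a hedged sketch with hypothetical parameter values (here with ρ = 0 for simplicity):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)

# Illustrative parameters (our own choices, not the paper's exact setup)
q1 = q2 = 6.0          # Beta(6, 6) care-quality distribution
beta1 = np.log(0.5)    # effect of care quality, OR = 0.5
eta, sigma = 0.2, 0.6  # SDs of hospital- and patient-level terms of X
target_p = 0.20        # desired population average mortality rate

# Large Monte Carlo sample of the linear predictor without the intercept
Q = rng.beta(q1, q2, size=200_000)
X = rng.normal(0.0, eta, size=Q.size) + rng.normal(0.0, sigma, size=Q.size)
lin = beta1 * Q + X

def mean_mortality_gap(beta0):
    """Simulated E[Y] minus the target rate for a given intercept."""
    return np.mean(1.0 / (1.0 + np.exp(-(beta0 + lin)))) - target_p

beta0 = brentq(mean_mortality_gap, -10.0, 10.0)  # root: E[Y] = target_p
```

Because the simulated mean mortality is monotone in β_0, the bracketing root finder converges reliably.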
Since datasets used for assessments of hospital performance usually include hospitals of different size, we simulated the number of patients treated in the hospitals according to the distribution of bed sizes reported in the German Hospital Directory 2016 provided by the German Federal Statistical Office [18] (see Fig 2). As outlined in detail below, the other parameters included in the data generation process were chosen such that the simulated datasets reflected properties (e.g. average mortality rates) comparable to real-world hospital data used for risk adjustment [1].

Methods and measures
In the following, the statistical models and measures compared in this study are described. All considered models were estimated using the maximum likelihood method [19][20][21][22][23].

a1) SMR based on logistic regression. Given data on outcome Y_hi and risk factor X_hi, one approach relied on the estimation of the logistic regression model

F(p^Y_hi) = α_0 + α_1 X_hi.

Based on the parameter estimates α̂_0 and α̂_1, patient-specific predicted probabilities of death were calculated as p̂^Y_hi = F^{-1}(α̂_0 + α̂_1 X_hi). The mean of these probabilities across all patients treated in a hospital then served as the expected mortality rate for this hospital, i.e.

Ê^Logit_h = n_h^{-1} Σ_{i=1}^{n_h} F^{-1}(α̂_0 + α̂_1 X_hi).

The logistic-regression-based SMR therefore is given by

SMR^Logit_h = O_h / Ê^Logit_h,

where O_h = n_h^{-1} Σ_{i=1}^{n_h} Y_hi is the observed mortality rate of hospital h.

a2) ER based on logistic regression. As an alternative to the SMR derived from the logistic regression model, the logistic-regression-based excess risk is defined as

ER^Logit_h = O_h − Ê^Logit_h.

b1) RSMR based on random effects logistic regression. Following the methodology of CMS [10], the calculation of the RSMR was based on the hierarchical logistic regression model

F(p^Y_hi) = α_0h + α_1 X_hi,

where α_0h ~ i.i.d. N(μ, ξ²) is a hospital-specific, normally distributed term with mean μ and variance ξ². Since α_0h is a random intercept, Eq 6 represents a random effects (RE) model. The random effects RSMR further differs from the logistic-regression-based SMR in that it does not relate the observed mortality rate to the expected mortality rate. Rather, it considers the expected mortality rate of a hospital conditional on its estimated performance level α̂_0h, i.e.

Ê^RE Logit_h = n_h^{-1} Σ_{i=1}^{n_h} F^{-1}(α̂_0h + α̂_1 X_hi),

relative to its expected mortality rate conditional on the estimated average hospital performance level μ̂, i.e.

Ē^RE Logit_h = n_h^{-1} Σ_{i=1}^{n_h} F^{-1}(μ̂ + α̂_1 X_hi).

The estimate of the random effects RSMR thus is

RSMR^RE Logit_h = Ê^RE Logit_h / Ē^RE Logit_h.

Since the expected mortality Ê^RE Logit_h lies between 0 and 1, the RSMR values of small hospitals are to some extent shrunk towards the overall mean. Note that for interpretability the RSMR is sometimes scaled by the average sample mortality rate. However, this linear transformation does not affect the hospital ranking.

b2) ER based on random effects logistic regression. Analogous to the logistic-regression-based excess risk, the excess risk based on the random effects logistic regression model is defined as

ER^RE Logit_h = Ê^RE Logit_h − Ē^RE Logit_h.
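A minimal sketch of the SMR and ER computations from plain logistic regression (a1/a2) might look as follows. The Newton-Raphson fitter below is a generic stand-in for any maximum likelihood logistic regression routine, and the function names are our own:

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, n_iter=25):
    """Plain ML logistic regression (Newton-Raphson) for design matrix X."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        W = p * (1.0 - p)                      # IRLS weights
        beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    return beta

def smr_er(hosp, x, y, H):
    """Hospital-level SMR and ER from the fitted logistic model."""
    X = np.column_stack([np.ones_like(x), x])
    a0, a1 = fit_logit(X, y)
    p_hat = expit(a0 + a1 * x)                 # predicted death probabilities
    smr, er = np.empty(H), np.empty(H)
    for h in range(H):
        m = hosp == h
        O_h = y[m].mean()                      # observed mortality rate
        E_h = p_hat[m].mean()                  # expected mortality rate
        smr[h], er[h] = O_h / E_h, O_h - E_h
    return smr, er
```

The RE-based RSMR/ER would additionally require fitting a random intercept model (e.g. with a mixed effects package), which is omitted here.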

c1) RSMR based on fixed effects logistic regression with Firth correction.
Both the hierarchical random effects and the non-hierarchical logistic regression approach implicitly rely on the assumption that risk factor and care quality are uncorrelated, i.e. ρ = 0. Fixed effects models relax this assumption. In panel econometrics, fixed effects models are routinely applied when there is reason to suspect that observed influence factors are correlated with unobserved, time-constant variables [24]. Although the data considered here do not have a time dimension, there is a related structure: while panel data are characterized by multiple time periods per unit [21,22,24], our data contain multiple patients nested within one hospital. To estimate a logistic fixed effects model, we included hospital-specific dummy variables D_h*,hi (equal to 1 if patient i is treated in hospital h* and 0 otherwise) and specified

F(p^Y_hi) = Σ_{h*=1}^H ω_h* D_h*,hi + α_1 X_hi.

Note that this specification relates to the data generating Eq 1 by α_1 = 1 and ω_h = β_0 + β_1 Q_h. The coefficients ω_h* reflect risk-factor-adjusted mortality differences between the hospitals. Since the model treats the hospital-specific dummy variables D_h* as regressors, Eq 8 is a multiple logistic regression model. Estimation of such models takes correlation between regressors into account [19]. Thus, we expected the assessment of hospital performance based on the fixed effects model to be more robust against correlation between X_hi and Q_h. When estimating the fixed effects model given by Eq 8, we accounted for the small sample bias of the maximum likelihood estimator [25] and potential convergence problems caused by separation [26] by applying Firth correction [17]. Instead of maximizing the ordinary likelihood function L(ω, α_1) of the logistic regression model, Firth's logistic regression maximizes the penalized likelihood function L(ω, α_1) · |V(ω, α_1)|^{1/2}, where |V(·)| is the determinant of the Fisher information matrix. Previous studies confirmed that fixed effects logistic regression with Firth correction performed well in related contexts and reported better convergence compared to ordinary logistic regression [11,27].
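Firth's penalized estimation can be implemented with the standard modified-score approach, in which each observation's score contribution is adjusted by its leverage h_i from the weighted hat matrix. The following is a bare-bones sketch of ours without the safeguards (step-halving, convergence diagnostics) a production routine would need:

```python
import numpy as np

def firth_logit(X, y, n_iter=50, tol=1e-8):
    """Firth-penalized logistic regression via Newton iterations.

    Maximizes L(beta) * |I(beta)|^(1/2) using the modified score
    U*(beta)_j = sum_i (y_i - p_i + h_i * (1/2 - p_i)) * x_ij.
    """
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        XWX = X.T @ (X * W[:, None])           # Fisher information I(beta)
        XWX_inv = np.linalg.inv(XWX)
        Xw = X * np.sqrt(W)[:, None]
        # leverages: diagonal of the weighted hat matrix
        h = np.einsum('ij,jk,ik->i', Xw, XWX_inv, Xw)
        U = X.T @ (y - p + h * (0.5 - p))      # Firth-modified score
        step = XWX_inv @ U
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Unlike ordinary maximum likelihood, this estimator stays finite under complete separation, which is exactly the situation small hospital dummy groups can create.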
Given the parameter estimates obtained from fixed effects logistic regression with Firth correction, the predicted mortality rate of hospital h was calculated as

Ê^FE Logit_h = n_h^{-1} Σ_{i=1}^{n_h} F^{-1}(ω̂_h + α̂_1 X_hi).

The predicted mortality rate of the hospital given the average hospital performance level ω̄ = H^{-1} Σ_{h=1}^H ω̂_h was derived as

Ē^FE Logit_h = n_h^{-1} Σ_{i=1}^{n_h} F^{-1}(ω̄ + α̂_1 X_hi).

Analogous to the RSMR based on the random effects model, the fixed effects RSMR then was obtained by

RSMR^FE Logit_h = Ê^FE Logit_h / Ē^FE Logit_h.

c2) ER based on fixed effects logistic regression with Firth correction. Given Ē^FE Logit_h, the ER based on fixed effects logistic regression with Firth correction was calculated as

ER^FE Logit_h = Ê^FE Logit_h − Ē^FE Logit_h.
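Given estimates of the hospital effects and the risk-factor coefficient (e.g. from a Firth-corrected fit), the fixed-effects RSMR and ER can be computed as sketched below; the function signature is our own:

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fe_rsmr_er(omega, alpha1, X, hosp):
    """RSMR and ER per hospital from fixed-effects estimates.

    omega: estimated hospital effects (length H); alpha1: risk-factor
    coefficient; X: patient risk factors; hosp: hospital index per patient.
    """
    H = omega.size
    omega_bar = omega.mean()                 # average performance level
    rsmr, er = np.empty(H), np.empty(H)
    for h in range(H):
        x_h = X[hosp == h]
        e_own = expit(omega[h] + alpha1 * x_h).mean()    # predicted rate
        e_avg = expit(omega_bar + alpha1 * x_h).mean()   # rate at average level
        rsmr[h] = e_own / e_avg
        er[h] = e_own - e_avg
    return rsmr, er
```

Because both rates are evaluated on the hospital's own patient mix, a hospital with an above-average effect ω̂_h always gets RSMR > 1 and ER > 0.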

Adequacy of the hospital performance assessment
Following [15], the adequacy of the hospital performance estimations was assessed by the proportion of hospitals correctly classified into quintiles according to their true care qualities Q h .
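The classification criterion can be sketched as a rank-based quintile comparison. Note that in practice the performance estimate must be oriented so that it increases with estimated quality (e.g. hospitals could be ranked by −SMR, since a higher SMR indicates worse performance); the helper below is our own illustration:

```python
import numpy as np

def share_correctly_classified(true_q, estimate, n_groups=5):
    """Share of hospitals placed in the same quintile by the estimate
    as by their true care quality."""
    def group(v):
        ranks = np.argsort(np.argsort(v))     # 0..n-1 ranks
        return ranks * n_groups // v.size     # nearly equal-sized bins
    return np.mean(group(true_q) == group(estimate))
```

A perfectly rank-preserving estimate yields a share of 1.0, while an uninformative estimate yields about 1/n_groups on average.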

Scenarios
The performance of the measures described above was assessed in multiple scenarios, which differed in specific parameter values relative to a baseline scenario (Table 1). The odds ratio of care quality was set to OR = 0.5. For the baseline mortality rate of 20%, this implied a mortality rate difference of 11 percentage points between the highest and the lowest care quality. The hospital-specific and the patient-specific standard deviations of X_hi were set to η = 0.2 and σ = 0.6, respectively. This resulted in average pseudo-R-squared values of approximately 0.1 when applying logistic regression to the simulated data. Another parameter introduced to the simulations was the minimum number of patients simulated for each hospital, n_h^min. In the baseline scenario, n_h^min = 15, implying that no considered hospital treated fewer than 15 patients. Variations of this parameter were used to investigate the robustness of the measures with respect to the presence of hospitals with small volumes. Given this set of parameter values, correlations between risk factor and care quality ranging from -0.8 to 0.8 were simulated. Following related studies [11,15], Monte Carlo estimates for each parameter constellation were based on 1,000 draws.

Simulation results
Without correlation between risk factor and care quality (i.e. ρ = 0), all considered methods resulted in similarly high proportions of hospitals correctly classified into quintiles in the baseline scenario (Fig 3). All measures based on logistic regression and random effects logistic regression performed worse when either positive or negative correlation between risk factor and care quality was induced. Particularly the RSMR based on random effects logistic regression was distorted by correlation. With regard to the fixed-effects-based estimations, there were notable differences between the performance of RSMR and ER. While the RSMR based on fixed effects logistic regression outperformed the other approaches in scenarios with high positive correlation between risk factor and care quality, it performed even worse than the SMR based on simple logistic regression in scenarios with negative correlation. In these cases, ER based on fixed effects logistic regression showed the best classification results. The fixed-effects-based ER also outperformed all measures based on logistic regression and random effects logistic regression in case of positive correlation between risk factor and care quality. Thus, the results of the baseline scenario indicated that ER based on fixed effects logistic regression was most robust against correlation between risk factor and care quality.
Holding the other parameter values of the baseline scenario constant, we also assessed performances for different sample sizes (Fig 4). All measures showed better classification results when the overall number of patients was increased. However, larger sample sizes did not reduce the distortion of the measures based on logistic regression and random effects logistic regression caused by correlation between risk factor and care quality as both positive and negative correlations resulted in worse classification results. These measures were almost always outperformed by RSMR and ER based on fixed effects logistic regression. The only exception was the slightly better classification result obtained by RSMR based on random effects logistic regression in the scenario characterized by 100,000 patients and the absence of correlation between risk factor and care quality (ρ = 0).
The results of the hospital performance assessment were found to depend on the distribution of hospital care quality (Fig 5). A uniformly distributed care quality (Beta(1,1)) generally led to better classification results compared to the baseline scenario (Beta(6,6)). Given a left-skewed distribution (Beta(3,1)), the fixed-effects-based measures performed better than the other measures for positive correlation between risk factor and care quality but slightly worse than the logistic-regression-based SMR for high negative correlation. The assumption of a right-skewed distribution (Beta(1,3)) resulted in a clear dominance of the fixed-effects-regression-based RSMR in case of positive correlation and of the fixed-effects-regression-based ER in case of negative correlation. The latter also performed better than all measures based on simple logistic regression or random effects regression in case of positive correlation.

A reduced influence of care quality on the outcome, as induced by an increase in the odds ratio of care quality to 0.7, resulted in worse classification results for all considered measures (Fig 6). The fixed-effects-based measures remained dominant for positive correlations between risk factor and care quality. For negative correlations, the best results were obtained from ER based on fixed effects regression and SMR based on simple logistic regression, which differed only slightly. Inducing greater mortality differences between high-quality and low-quality hospitals by reducing the odds ratio to 0.3 generally led to better classification results. However, there was also an increase in the distortion due to correlation between risk factor and care quality for those measures not based on fixed effects regression. Again, ER based on fixed effects regression was found to be most robust against both positive and negative correlation.
A reduction of the population average mortality rate from 20% to 10% was associated with a lower proportion of correctly classified hospitals (Fig 7). While the patterns of the classification results qualitatively remained stable, particularly the RSMR-based measures performed worse for strong negative correlations between risk factor and care quality. In most of these scenarios, ER based on fixed effects logistic regression performed best. The fixed-effects-based measures further dominated when positive correlation between risk factor and care quality was induced. Increasing the population average mortality rate to 30% further increased the dominance of the fixed-effects-regression-based measures.
Varying the minimum number of patients per hospital did not affect the general patterns observed in previous scenarios (Fig 8). However, particularly the performance of fixed-effects-based measures improved when the number of patients per hospital was increased.

Discussion
In empirical assessments, hospital performance may be correlated with patient-specific risk factors. Better performing hospitals may treat sicker patients than hospitals with worse performance (or vice versa). Such correlation may also arise when the risk adjustment includes comorbidities that had not been present on admission. These issues are neglected by many common approaches to risk adjustment. Against that background, this study assessed the impact of correlation between hospital performance and risk factors on the adequacy of hospital rankings based on different methods and measures for binary outcomes.
The results of Monte Carlo simulations highlighted that ignoring such correlation may lead to severe bias in the performance assessment. The results for the SMRs/RSMRs and ERs based on logistic regression and random effects logistic regression showed that these approaches generally performed worse when either positive or negative correlation between care quality and risk factor was induced. In contrast, measures based on fixed effects logistic regression with Firth correction were more robust to such correlation. This was particularly true for the fixed-effects-logistic-regression-based excess risk, which proved to be most robust against both positive and negative correlation between care quality and risk factor. In scenarios without correlation, all considered methods showed similar performance.

Strengths and limitations
Based on a simple simulation setup, this study contributes to the sparse literature on fixed effects approaches in the context of hospital performance measurement [11,[27][28][29][30] by highlighting the effects of correlation between hospital performance and risk factors on hospital rankings. The comparison of multiple methods and measures is one of the main strengths of the present analysis.
As a main result, measures based on fixed effects logistic regression proved to be relatively robust against correlation between risk factors and care quality. Estimation of fixed effects models is subject to several problems. One problem is the small sample bias of the maximum likelihood estimator of the logistic regression model, which may be substantial in magnitude [25]. Furthermore, the outcome of all or some patients may be perfectly predicted by covariates, particularly by the hospital dummies. This phenomenon is known as separation and may cause severe bias and convergence problems [26]. Separation is particularly likely if the dataset includes hospitals with a small number of patients. However, the results in this paper indicate that these issues can be addressed effectively by applying Firth correction, which is consistent with the findings of [17].
Following the methodology of CMS [10], the estimation of random effects RSMR and ER was based on a model that includes a random intercept at the hospital level. This random intercept accounts for correlation of patient outcomes within a hospital and is crucial for capturing quality differences between hospitals. Future studies may also consider random parameter models to allow for heterogeneous effects of risk factors on patient outcomes [20,22]. Furthermore, risk adjustment applications may include multiple hospitals over several time periods and be subject to unobserved spatially shared risk factors. While accounting for temporal and spatial correlation is beyond the scope of the present study, using appropriate modeling approaches [31][32][33][34] would be a promising route for future research.
Another general limitation is that hospital performance is unobservable in real-world applications. Hence, empirical examination of the advantages of fixed effects approaches for specific datasets is not feasible. On average, however, our simulations reveal that particularly the fixed-effects-logistic-regression-based ER outperforms approaches based on logistic regression and random effects logistic regression in most scenarios. Although many relevant scenarios have been covered in this simulation study, there may be other interesting scenarios that have not been considered here. As one limitation, this study did not examine the effects of confounding due to omitted relevant risk factors. Furthermore, the generalizability of the results to other outcome types and statistical models is open for exploration. As has been demonstrated in related contexts [23,31,32,34,35], the use of alternative approaches to logistic regression could also improve statistical modeling of hospital mortality. These topics could be addressed by future research.

Practical implications
The results of this study indicate that hospital quality indicators based on simple logistic regression and random effects logistic regression have to be interpreted with caution. These approaches may be severely biased when there is correlation between hospital performance and risk factors. Particularly ER based on fixed effects logistic regression with Firth correction was more robust to such correlation. Since we found no relevant differences between methods in the absence of correlation, ER based on fixed effects logistic regression with Firth correction should always be considered when the objective is to rank hospitals according to their performance.
Supporting information

S1 Appendix. Taylor approximation of average mortality rate. (PDF)

S1 Script. Script file for simulation.