Hospital performance is frequently evaluated by analyzing differences between hospital averages in some quality indicators. The results are often expressed as quality charts of hospital variance (e.g., league tables, funnel plots). However, those analyses seldom consider patient heterogeneity around averages, which is of fundamental relevance for a correct evaluation. Therefore, we apply an innovative methodology based on measures of components of variance and discriminatory accuracy to analyze 30-day mortality after hospital discharge with a diagnosis of Heart Failure (HF) in Sweden.
We analyzed 36,943 patients aged 45–80 treated in 565 wards at 71 hospitals during 2007–2009. We applied single and multilevel logistic regression analyses to calculate the odds ratios and the area under the receiver operating characteristic curve (AUC). We evaluated general hospital and ward effects by quantifying the intra-class correlation coefficient (ICC) and the increment in the AUC obtained by adding random effects in a multilevel regression analysis (MLRA). Finally, the Odds Ratios (ORs) for specific ward and hospital characteristics were interpreted jointly with the proportional change in variance (PCV) and the proportion of ORs in the opposite direction (POOR).
Overall, the average 30-day mortality was 9%. Using only patient information on age and previous hospitalizations for different diseases we obtained an AUC = 0.727. This value was almost unchanged when adding sex, country of birth as well as hospitals and wards levels. Average mortality was higher in small wards and municipal hospitals but the POOR values were 15% and 16% respectively.
Swedish wards and hospitals in general performed homogeneously well, resulting in a low 30-day mortality rate after HF. In our study, knowledge on a patient’s previous hospitalizations was the best predictor of 30-day mortality, and this information did not improve by knowing the sex and country of birth of the patient or where the patient was treated.
Citation: Ghith N, Wagner P, Frølich A, Merlo J (2016) Short Term Survival after Admission for Heart Failure in Sweden: Applying Multilevel Analyses of Discriminatory Accuracy to Evaluate Institutional Performance. PLoS ONE 11(2): e0148187. https://doi.org/10.1371/journal.pone.0148187
Editor: Pablo Garcia de Frutos, IIBB-CSIC-IDIBAPS, SPAIN
Received: September 10, 2015; Accepted: January 14, 2016; Published: February 3, 2016
Copyright: © 2016 Ghith et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The database we analyzed is not publicly available for ethical and data safety reasons according to the Swedish National Board of Health and Welfare. However, the same dataset can be constructed by request to the Swedish National Board of Health and Welfare after approval of the research project by an Ethical Committee and by the data safety committee at the Swedish National Board of Health and Welfare.
Funding: This work was supported by the Swedish Research Council (VR) [#2013-2484, Juan Merlo] and by grants from the University of Copenhagen and the Association of Hospitals in Copenhagen for Nermin Ghith as a PhD candidate. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Heart Failure (HF) is a serious, life-threatening condition that causes considerable disability. However, early diagnosis and adequate treatment for heart failure can improve life quality and prolong survival.
Short term mortality within 30 days after a discharge diagnosis of heart failure is a commonly used quality indicator for evaluating hospital performance [2–5] which, as any other quality indicator, requires regular assessment.
Nowadays, multilevel regression analysis (MLRA) is becoming established as a suitable methodology for the evaluation of institutional performance [6, 7]. MLRA takes into account the multilevel structure of the data (e.g., patients nested within hospitals). This technique allows a less biased estimation of uncertainty, providing a better ranking of the health care units under investigation. In addition, MLRA allows a flexible analysis of components of variance and permits estimating measures of association at different levels of analysis [8–11]. Nevertheless, many studies performed today still use single level designs analyzing, for instance, individual patient data with dummy variables for the hospitals, or information aggregated at the hospital level [12–15]. Even when high quality data are available at the patient, institutional and geographical levels, assessments are normally performed at one level by using, for instance, funnel plots, health league tables, or similar tools to compare hospitals or small geographical area averages [20, 21]. This single level approach is extensively used, but may provide misleading information for decision makers.
Conceptually, most studies evaluating institutional performance in health care (e.g., hospitals) make two implicit assumptions. First, it is assumed that over and above patient characteristics, the hospital context exerts a general, shared effect on all patients at the hospital. The existence of general, unspecified institutional effects originates differences between institutions that condition individual prognosis over and above individual characteristics. However, while the hospital level itself may play a direct role, most of this type of influence could result at the ward level within the hospitals where the patients are actually treated.
Second, it is often assumed that this general institutional effect can be measured by quantifying differences between averages in certain quality indicators. Here, researchers may produce funnel plots, control charts or ‘league tables’ where, for instance, hospitals are ranked according to their average 30-day mortality after admission for heart failure. However, for evaluating the general institutional effect, what matters most is not the existence of differences between hospital averages, but rather the share of the patient differences in the outcome that lies at the hospital level. This concept corresponds nicely with the idea of intra-class correlation (ICC) used to quantify general contextual effects in social epidemiology [23, 24], and it has been previously applied for assessing institutional performance [10, 25, 26]. This idea is also closely related to the notion of discriminatory accuracy (DA) developed for the evaluation of the performance of prognostic and screening markers in medicine [27, 28]. It is therefore possible to apply MLRA and use measures of DA like the area under the Receiver Operating Characteristic curve (AUC) to quantify general hospital effects [29–31]. In contrast with the ICC for binary outcomes, measures of DA like the AUC are well established in clinical and health care epidemiology and the information they give is relatively easy to interpret and communicate.
The application of AUC measures is also convenient as the evaluation of hospital performance (e.g., league tables) is often used as a basis for informed decisions. In this aspect, the evaluation of hospital performance resembles a screening test so we should know the discriminatory accuracy of, for instance, a funnel plot or a hospital league table before we use it to make decisions.
With the above background, our study quantifies the extent to which hospital and ward differences in averages are relevant for understanding patient disparities in short term mortality after heart failure in Sweden. For this purpose, we perform a multilevel analysis that distinguishes between hospital, ward and patient components of variance and informs on both general and specific contextual effects on patient survival. We also apply a novel methodological approach based on measures of discriminatory accuracy [29–31]. In this approach, we consider information on hospitals and wards (i.e., league tables) as a tool for classifying patients according to their 30-day survival and assess the AUC of this instrument over and above patients’ characteristics.
Population and Methods
The National Board of Health and Welfare linked information from the Swedish Patient Register and the Cause of Death Register using a unique personal identification number. This data was then linked to the Longitudinal Integrated Database for Health Insurance and Labour Studies (LISA) that is maintained by Statistics Sweden and contains demographic and socioeconomic information. Finally, the Swedish authorities delivered the research database to us without the personal identification numbers to ensure the anonymity of the subjects.
From the research database we identified all 60,130 patients aged 45 to 80 years with a discharge diagnosis of heart failure (International Classification of Diseases code I50) admitted to Swedish hospitals between 2007 and 2009. A major problem when comparing differences in outcome indicators like mortality is the threat of confounding by patient-mix. That is, some hospitals may provide specialized care to patients with a higher degree of co-morbidity than others and, in turn, co-morbidity is associated with mortality. Since previous heart failure is a strong determinant of mortality after a new hospitalization for heart failure, the most appropriate way of evaluating hospital quality in relation to survival after heart failure is to observe survival in patients with a first diagnosis of heart failure. In this way, we increase the homogeneity of this patient group, which gives a better basis for less confounded analyses. Therefore, we excluded 22,587 patients with a previous diagnosis of heart failure in the past five years.
Finally, as our focus was to evaluate hospitals, we excluded 448 patients treated at small nursing homes. In addition, to ensure the stability of the estimations, we arbitrarily excluded hospitals with fewer than 50 patients, which led to a further exclusion of 152 patients. The final data set included 36,943 patients within 565 wards from 71 hospitals (Fig 1).
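The exclusion steps above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in the text; the function and variable names are illustrative and not part of the original analysis.

```python
# Counts are taken from the text; names are illustrative only.
identified = 60_130
exclusions = {
    "previous HF diagnosis in the past five years": 22_587,
    "treated at small nursing homes": 448,
    "hospitals with fewer than 50 patients": 152,
}

def apply_exclusions(start, steps):
    """Return (reason, remaining count) after each exclusion step, in order."""
    remaining = start
    flow = []
    for reason, n_excluded in steps.items():
        remaining -= n_excluded
        flow.append((reason, remaining))
    return flow

flow = apply_exclusions(identified, exclusions)
for reason, remaining in flow:
    print(f"after excluding {reason}: {remaining}")
```

The last step leaves 36,943 patients, which matches the final data set described in the text.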
Assessment of Variables
The study outcome is all-cause mortality within 30 days after discharge from the hospital (coded yes vs. no).
Since survival after heart failure may be conditioned by the gender and ethnic origin of the patient, we created a combined variable with four categories: Swedish females, Swedish males, non-Swedish females, and non-Swedish males and used the Swedish females as the reference group in subsequent comparisons.
An intrinsic problem when analyzing outcome indicators like mortality is the peril of selection bias, whereby patients with a worse prognosis are channeled to certain hospitals. Hospital differences may thus be confounded by this selection of patients. Therefore, besides excluding patients with a previous diagnosis of HF, to reduce this compositional confounding (i.e., case-mix), we computed a risk score (RS) for 30-day mortality using previous discharge diagnoses (see the section on the risk score for mortality). We categorized the RS into 10 groups by deciles, using the first decile group as the reference in the comparisons.
Ward and hospital characteristics.
Results from previous studies [2, 3, 33, 34] suggest that, on average, the prognosis of patients with heart failure improves when they are treated at institutions with higher volumes of patients. This situation could be explained by the fact that the medical staff could have more expertise. It may also originate in a selective referral of patients to hospitals with a good reputation, which causes higher volumes of patients. This selective referral might in turn channel patients with more severe conditions and act as a confounder that impairs average mortality. The exclusive use of administrative data for volume–outcomes research may, therefore, generate spurious findings. Additionally, when studying volume of patients as a contextual variable rather than the hospital as a whole, the relevant level of analysis should be the ward where the patients are actually treated.
Aware of those difficulties, we computed a variable indicating the number of patients with heart failure admitted to each hospital ward. We then categorized this variable into three groups by tertiles, using the first tertile group as the reference in the comparisons.
We also categorized the hospitals according to the traditional Swedish classification as (i) regional, (ii) provincial and (iii) municipal since the type of hospital could have an impact on mortality over and above ward and patient characteristics.
Risk score (RS) for mortality.
Using a single level stepwise backward logistic regression, we modeled 30-day mortality as a function of age and previous diseases (ICD-10 codes) and obtained the predicted probability (i.e., individual risk score) of 30-day mortality. Table 1 shows the out- and inpatient diagnoses considered in an initial stepwise logistic regression to develop the RS equation. The purpose of developing the RS was not to create an equation for future prediction of 30-day mortality for patients with heart failure, like those currently available for patient-mix adjustment (see, for instance, the Charlson Risk Score, the Elixhauser Score, and the CMS Hierarchical Condition Category "CMS-HCC" risk adjuster). Rather, our aim was to perform a parsimonious analysis where the RS summarizes a large number of variables into a single construct. Otherwise, the adjusted models using the RS gave the same results as a model including the disease variables separately used for the computation of the RS.
Regression models analyses.
We applied an original, stepwise-multilevel, logistic regression analysis of discriminatory accuracy recently developed by this team. We used 36,943 patients nested within 565 wards that were, in turn, nested within 71 hospitals. We developed four consecutive logistic regression analyses to model mortality.
For every model equation, we obtained the predicted logit and computed the area under the receiver operating characteristic curve (AUC), or C statistic. The AUC measures the ability of the model to correctly classify individuals with or without the outcome (e.g., mortality within 30 days). The AUC assumes a value between 1 and 0.5, where 1 is perfect discrimination and 0.5 would indicate that the model is as informative as flipping a coin (i.e., the covariates have no predictive power).
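To make the AUC concrete, here is a minimal sketch that computes it as the proportion of case/non-case pairs in which the case receives the higher predicted value, counting ties as one half. This is the standard rank-based definition of the C statistic; the function name and the toy data are illustrative and are not taken from the study.

```python
def auc(scores, outcomes):
    """Rank-based AUC: probability that a randomly chosen case (outcome 1)
    has a higher score than a randomly chosen non-case (outcome 0),
    with ties counted as one half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))

# Hypothetical predicted logits (or probabilities) and observed deaths.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_deaths = [0, 0, 1, 1]
print(auc(toy_scores, toy_deaths))
```

Because the AUC depends only on the ordering of the predictions, the same value is obtained from predicted logits and from predicted probabilities. The pairwise loop is quadratic and is meant for illustration; a sort-based version would be preferable for a data set of this size.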
In the first model (model 1), we wanted to evaluate the influence of patients’ characteristics alone on short term mortality. We fitted a single level logistic regression model including only the risk score for mortality (RS). We obtained the AUC as a measure of the capacity of the individual level information (i.e. RS) to classify with accuracy the patients who survive from those who die.
In the second model (model 2), we added the combined explanatory variable of gender by migration/ethnic status. We aimed to know the value added of this information to classify with accuracy the patients who survive from those who die over and above the RS for mortality. For this purpose, we calculated the AUC and the difference between the AUC value of model 2 and 1.
In the next model (model 3), we expanded model 2 by including two random intercepts, one for the wards within hospitals and another for the hospital level, in a three-level multilevel regression model. We calculated the intra-class correlation coefficient for the hospital (ICCh) and for the ward level (ICCw) from this model.
The ICC measures the observational general hospital/ward effect. Expressed as a percentage, the value of the ICC goes from 0% to 100%. If the hospitals/wards were not relevant for understanding patient short term mortality differences in Sweden, the ICCh would be close to 0%. That is, the hospitals would be similar to random samples taken from the whole patient population.
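We also computed the median odds ratio (MOR) for the hospital (MORH) and ward (MORW) levels, which expresses the higher-level variance on the odds ratio scale. In the absence of variation, the MOR is equal to 1. The higher the MOR, the more relevant the hospital/ward is for understanding the individual outcome.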
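The text does not reproduce the formulas used for the ICC and the MOR. As a hedged sketch, the code below assumes the widely used latent-variable formulation for logistic models, in which the individual-level variance is fixed at π²/3, and the usual MOR expression exp(√(2σ²) × Φ⁻¹(0.75)). Both are assumptions about the method, and the function names are illustrative.

```python
import math
from statistics import NormalDist

# Assumption: latent-variable formulation, where the individual-level
# variance of a logistic model equals that of the standard logistic
# distribution.
LOGISTIC_VARIANCE = math.pi ** 2 / 3

def variance_shares(var_hospital, var_ward):
    """Share of the total latent variance located at the hospital level and
    at the ward level in a three-level random-intercept logistic model.
    Some authors instead report the ward-level ICC cumulatively
    (hospital + ward variance over the total)."""
    total = var_hospital + var_ward + LOGISTIC_VARIANCE
    return var_hospital / total, var_ward / total

def median_odds_ratio(variance):
    """Median odds ratio for a random-intercept variance on the logit scale."""
    z75 = NormalDist().inv_cdf(0.75)  # approximately 0.6745
    return math.exp(math.sqrt(2.0 * variance) * z75)

# Hypothetical variances, for illustration only.
share_h, share_w = variance_shares(var_hospital=0.01, var_ward=0.18)
print(f"hospital share: {100 * share_h:.1f}%, ward share: {100 * share_w:.1f}%")
print(f"ward MOR: {median_odds_ratio(0.18):.2f}")
```

With a variance of zero the function returns a MOR of exactly 1, in line with the statement that the MOR equals 1 in the absence of variation.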
In the calculation of the AUC for the MLRA models, the prediction equation includes the random effects (i.e., higher level residuals) as has been discussed elsewhere [29–31]. By computing the difference between the AUC of the models 3 and 2, we obtained information on the added value of knowing the hospital and the ward where the patient was treated in order to classify with accuracy the patients who survive from those who die, over and above patient information. This difference also assesses the relevance of the hospitals and the wards for patients’ short term mortality, which provides complementary information to that obtained by the ICC [29, 30]. We also calculated two alternative AUCs in model 3. One was based on an equation that excluded the hospital residuals and another that excluded the wards residuals. In this way, we could quantify the contribution of having only information from one of those two levels.
Furthermore, to illustrate the differences between wards and hospitals averages in short term mortality, we created league tables by ranking hospitals and wards according to their average 30-day mortality rate using the values of the shrunken residuals and their 95% confidence intervals. The value added of using the league tables alone for auditing outcome quality is quantified by the ICC in model 3 and also by the difference between AUCs of model 3 and 2.
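A league table of this kind can be sketched as follows. The code ranks units by their shrunken residual on the log-odds scale and flags those whose 95% interval excludes zero (the overall average). It assumes a normal-approximation interval (estimate ± 1.96 standard errors), which may differ from the intervals used in the study, and all unit names and values are hypothetical.

```python
def league_table(residuals):
    """Rank units by shrunken residual (log-odds scale).

    residuals: dict mapping unit name -> (estimate, standard_error).
    A unit is flagged as differing from the overall average only when
    its 95% interval excludes zero.
    """
    rows = []
    for unit, (estimate, std_error) in residuals.items():
        low = estimate - 1.96 * std_error
        high = estimate + 1.96 * std_error
        rows.append({
            "unit": unit,
            "estimate": estimate,
            "low": low,
            "high": high,
            "differs": low > 0 or high < 0,
        })
    rows.sort(key=lambda row: row["estimate"])
    return rows

# Hypothetical residuals for three units.
example = {"A": (0.30, 0.10), "B": (-0.10, 0.20), "C": (0.05, 0.10)}
for row in league_table(example):
    print(row["unit"], round(row["low"], 3), round(row["high"], 3), row["differs"])
```

A ranking in which most intervals overlap zero and each other, as in this toy example for units B and C, conveys little about which unit a given patient should prefer.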
In the next model (model 4), we added the ward and the hospital level specific variables indicating the volume of patients with heart failure (in tertile groups) as well as the hospital classification.
Our aim was to understand the mechanism underlying eventual ward and hospital general effects. For this purpose, besides measuring specific contextual effects (i.e., odds ratios), we also calculated the proportional change in variance (PCV) as the percentage of the wards and hospitals variance in model 3 that was explained by the specific wards and hospitals variables in model 4.
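The PCV described above reduces to a one-line calculation. The sketch below follows the definition given in the text (the percentage of the model 3 variance explained by the variables added in model 4); the function name and the example variances are illustrative.

```python
def proportional_change_in_variance(var_reference, var_adjusted):
    """Percentage of the reference-model variance (e.g., model 3) that is
    explained after adding contextual variables (e.g., model 4)."""
    if var_reference <= 0:
        raise ValueError("reference variance must be positive")
    return 100.0 * (var_reference - var_adjusted) / var_reference

# Hypothetical ward variances before and after adding ward volume.
print(proportional_change_in_variance(4.0, 1.0))
```

A PCV should always be read together with the size of the reference variance: explaining a large share of a very small variance has little practical meaning.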
We calculated the AUC for model 4 as well as the difference between the AUC values of model 4 and model 2. Note that model 4 cannot increase the AUC beyond the value obtained in model 3. Including specific contextual variables as fixed effects may explain some of the ward and hospital intercept variance (i.e., decrease the shrunken residuals), but the information removed from the residuals re-enters the prediction equation through the regression coefficients for ward volume of patients and hospital classification. That is, the increase in the AUC in model 3 compared to model 2 represents the ceiling of the hospital’s general effect. Assuming an appropriate patient mix adjustment, it represents both measurable and immeasurable institutional factors that condition the prognosis of the HF patients.
We used the Proportion of Opposed Odds Ratios (POOR) for the contextual variable on hospital type. The values of the POOR extend between 0% and 50%. A POOR of 0% means all ORs have the same sign. A POOR of 50% would mean that half of the ORs are of the opposite sign and so the association is very heterogeneous. We calculated the POOR for the hospital and the ward level. See S1 Appendix for specific information.
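The details of the POOR are in S1 Appendix, which is not reproduced here. As a hedged sketch, the code below assumes the formulation POOR = Φ(−|β| / √(2σ²)), where β is the log odds ratio of the contextual variable and σ² is the cluster-level variance; this is an assumption about the method, and the names and example values are illustrative.

```python
import math
from statistics import NormalDist

def poor(log_odds_ratio, cluster_variance):
    """Proportion of opposed odds ratios, as a percentage (0 to 50).

    Assumed formulation: the log OR comparing two units with different
    values of the contextual variable is normally distributed around the
    overall log OR with variance 2 * cluster_variance; the POOR is the
    share of that distribution with the opposite sign.
    """
    if cluster_variance < 0:
        raise ValueError("variance cannot be negative")
    if cluster_variance == 0:
        # No between-cluster variation: every OR equals the overall OR.
        return 0.0 if log_odds_ratio != 0 else 50.0
    z = -abs(log_odds_ratio) / math.sqrt(2.0 * cluster_variance)
    return 100.0 * NormalDist().cdf(z)

# Hypothetical odds ratio of 0.8 and ward variance of 0.18.
print(round(poor(math.log(0.8), 0.18), 1))
```

Under this formulation the POOR approaches 50% when the association is weak relative to the cluster variance, and approaches 0% when the association is strong or the variance is small, which matches the interpretation given in the text.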
For the estimation of the models we initially used the Restricted Iterative Generalized Least Squares (RIGLS) method to obtain starting values for the final Markov Chain Monte Carlo (MCMC) estimation method. We used the Bayesian Deviance Information Criterion (BDIC) as a measure of the goodness of fit of the models. The BDIC considers both the model deviance and its complexity. Models with smaller BDIC should be preferred to models with larger BDIC.
We estimated the variance as the median and 95% credible intervals of the posterior distribution obtained by the Markov Chain Monte Carlo (MCMC) method and included additional technical details on all the models and measures in S1 Appendix.
We performed the analyses using IBM SPSS Statistics for Windows, Version 21 (IBM Corp., Armonk, NY, USA); Stata Statistical Software, Release 13 (StataCorp LP, College Station, TX, USA; 2013); and MLwiN version 2.22 (Centre for Multilevel Modelling, University of Bristol).
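Summarizing a posterior distribution by its median and a 95% credible interval can be sketched as below. The code takes a list of MCMC draws for one parameter and returns the 2.5th, 50th and 97.5th percentiles using linear interpolation; the burn-in handling, the function names and the simulated draws are illustrative assumptions and do not reflect the settings used in MLwiN.

```python
import math
import random

def percentile(sorted_values, q):
    """Percentile by linear interpolation between order statistics (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction

def summarize_posterior(draws, burn_in=0):
    """Median and 95% credible interval of the draws kept after burn-in."""
    kept = sorted(draws[burn_in:])
    if not kept:
        raise ValueError("no draws left after discarding burn-in")
    return {
        "median": percentile(kept, 0.5),
        "low": percentile(kept, 0.025),
        "high": percentile(kept, 0.975),
    }

# Simulated, purely hypothetical draws for a variance parameter.
random.seed(1)
simulated = [abs(random.gauss(0.18, 0.04)) for _ in range(5000)]
print(summarize_posterior(simulated, burn_in=500))
```

The median is often preferred over the mean for variance parameters because their posterior distributions tend to be right-skewed.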
Characteristics of the population
Table 1 shows that most of the patients with heart failure were treated at the 21 provincial hospitals. However, the median number of patients was highest at the nine regional hospitals and lowest at the 41 municipal hospitals. Additionally, wards at the provincial hospitals had the highest volume of patients with heart failure. Crude 30-day mortality after heart failure was slightly lower at the municipal hospitals than in the other types of hospitals. Patients at the regional hospitals also appear to be slightly younger than patients at other hospitals. Municipal hospitals had the highest percentage of Swedish patients, and regional hospitals had the highest percentage of non-Swedish patients. On average, the RS was similar in the three hospital types even if the percentage of patients in the higher RS group was slightly higher in the regional hospitals.
Table 2 shows the principal causes of death for all patients with heart failure who died within 30 days of hospital discharge. Diseases of the circulatory system were the most common cause of death (51.6%), and within this category ischaemic heart disease was the most frequent (59.3%).
Measures of association and specific ward and hospital effects
Table 3 shows that, as expected, the RS was strongly associated with 30-day mortality (we show only the 1st, 4th, 7th and 10th decile groups, as this is sufficient to illustrate the association). Independently of the RS, we additionally observed that non-Swedish females had a lower mortality risk than Swedish females. Model 4 also shows that, in the adjusted analyses, patients treated at wards with the highest volume of patients had the lowest mortality risk compared with those treated at wards with the lowest volume. Yet the POOR was rather high for the 3rd category (POOR = 15%), and even higher for the 2nd category (POOR = 33%), which indicates the existence of considerable ward heterogeneity for this association.
Patients treated at municipal hospitals presented a somewhat lower mortality risk than those treated at the regional hospitals. However, the 95% CI included one and the POOR was 16%. In addition, on average, patients treated at the provincial hospitals had a slightly higher mortality and the POOR was only 3%.
Measures of variance and general wards and hospital effects
Models 3 and 4 in Table 4 present the results of the variance components multilevel analysis. Both the ward and hospital components of variance were small. However, the ICC for the ward level was much higher (i.e. 5.3%) than the ICC of the hospital which was very close to zero (i.e., 0.4%). Also the MORW from models 3 and 4 were higher than the MORH.
Values are median and 95% credible/confidence intervals (CI).
These findings need to be considered when interpreting Fig 2 (based on model 3 in Tables 3 and 4) which shows the ranking of the wards and hospitals according to their average short term mortality using the overall mortality risk as a reference. Besides the initial low general ward and hospital effects, there is a considerable uncertainty of the estimated averages, which expresses itself as a considerable overlapping of the 95% CIs.
Values are logarithm odds ratios (i.e., shrunken residuals) with 95% confidence intervals (vertical lines) adjusted for mortality risk score, sex and ethnic origin (see model 3 in Tables 3 and 4). The figure also indicates the values of the hospital and wards intra-class correlation coefficients (ICC) for 30-day mortality.
The inclusion of the hospital and ward characteristics in model 4 of Table 4 explained about 46% of the minor hospital variance and about a third of the small ward variance.
Measures of discriminatory accuracy (AUC)
The inclusion of the RS alone in model 1 (Table 4) produced an AUC of 0.727 that did not increase substantially when including information on gender and ethnicity in model 2. The same was true when including the random intercepts for the ward and hospital levels.
Fig 3 shows the ROC curves for the models distinguishing the separate contribution of the ward and hospital levels of model 3. It is clear that the (small) increase of the AUC for model 3 as compared with model 1 is due to the ward component. All the AUC values along with their 95% CIs are presented as part of Fig 3.
Model 1 (black line) is a simple logistic regression model including the individual risk score. Model 2 (grey line) is as model 1 but adding sex and ethnicity in categories. Model 3 is as model 2 but adding information on hospitals and wards in a multilevel logistic regression analysis. The ROC curve for model 3 is split showing the contribution of the ward level (thick dotted line) and of the hospital level (thin dotted line).
Bayesian Deviance Information Criterion (BDIC)
The BDIC value for the first reference model considerably decreased in model 2 that included the sex and ethnicity of the patients. However, we observed the highest reduction in models 3 and 4, which incorporated the random intercepts for the ward and hospital levels (see Table 4).
General study findings
We evaluated hospital performance in Sweden using an outcome indicator, 30-day mortality after heart failure. Beyond evaluating the hospital performance, our aim was to illustrate an innovative, stepwise-multilevel, logistic regression analysis of discriminatory accuracy recently developed by this team. This methodological approach combines single and multilevel regression models in order to evaluate the possible existence of general, unspecified institutional (i.e., wards and hospitals) effects which could condition individual prognosis over and above individual characteristics. Such observational effects are quantified by measures of clustering like the ICC [2, 10, 32], by heterogeneity measures like the MOR, as well as by measures of discriminatory accuracy like the AUC [29–31].
We found that, overall in Sweden, 30-day mortality after heart failure was around 9%. This percentage appears rather low compared with other countries such as the United States, with national values around 11–12% (2008–2012), including US Medicaid hospitals with values between 10.8% and 11.29% (2008–2011), but it is still higher than the 6.6–7.9% (2005–2008) and 7.5–8.2% (2008–2009) benchmarking values reported by some studies and hospitals in the United States. However, those figures are not fully comparable since there may be population differences in age structure and co-morbidities, as well as non-standardized diagnostic criteria, that need special consideration when comparing different health care settings.
We also observed differences between hospitals and, especially, between wards in average 30-day mortality around the national average of 30-day mortality. However, those differences between averages provide insufficient information that cannot be used for discriminating patients who will survive or not. Rather than differences between averages, what matters is to know the share of the individual differences in the propensity of dying that are at the hospital level. From this perspective, the almost non-existent hospital ICC (i.e., 0.04%) suggests that all hospitals in Sweden had a similar performance.
It is obvious that each individual hospital may have its own context that embraces factors like internal organization, treatment guidelines and harmonized clinical practices that could condition patient survival over and above patient characteristics. That is, the same patient would have a different prognosis if she/he was treated at a specific hospital rather than another. If this is true, we should expect a large general hospital effect that expresses itself as a high ICC. Also, the AUC would be expected to increase considerably after adding the hospital level to the predictive equation. Contrary to this, our results indicate that the Swedish hospitals were like random samples from the whole population of patients with heart failure. The ranking of hospitals by a so-called league table (Fig 2B) shows that none of the hospitals could be distinguished with certainty from the overall average mortality. In addition, the hospital ICC was close to null and the AUC analysis confirmed the interpretation of the ICC values. This demonstrates that knowledge of patients’ age and previous hospitalizations was enough to obtain a relatively high discriminatory accuracy (AUC = 0.727). Further information on a patient’s sex or ethnicity only resulted in a minor contribution and the same was true when adding information on the hospital or ward where the patient was treated. However, the hospital level hides some ward heterogeneity. In fact, although the ward ICC was small (i.e., 5.3%), it was much higher than the hospital ICC.
In practice, our results suggest that any intervention directed at decreasing 30-day mortality in patients with heart failure should be focused on all Swedish hospitals and not only on those with an average mortality higher than the national average. We could consider launching an intervention in specific wards (i.e., those with the highest mortality in Fig 2A) but again, neither the ward ICC nor the AUC analysis gives strong support for this initiative. Therefore, despite findings from traditional measures of association (OR and 95% CI) indicating that wards with higher patient volumes and municipal hospitals showed lower average mortality, we cannot qualify these wards and hospitals as better performers since the discriminatory accuracy of the ward and, more so, the hospital level was very low. Furthermore, the lower mortality risk found in municipal hospitals compared with regional hospitals was not conclusive and the POOR was 16%, suggesting the existence of a weak and heterogeneous pattern of association. Even if patients treated at the provincial hospitals had a slightly higher mortality and the POOR was only 3%, the initial general hospital effect was very low, which makes this finding rather irrelevant.
In summary, our study shows that we cannot count on traditional measures based on differences between averages to evaluate hospital performance. The multilevel methodological approach we are promoting needs a joint analysis including patient variables, hospital/ward units, and hospital/ward specific characteristics. The traditional use of league tables, funnel plots or reporting “significant” hospital variance alone should be reconsidered. Rather, a proper evaluation of hospital performance needs to include measures of association, variance and discriminatory accuracy.
The study includes three analytical steps. The first step (Models 1 and 2) analyzes patient-level covariates in standard (i.e., single-level) logistic regressions. The selection of these individual variables is based on the assumption that they are confounders (i.e., patient mix). The second step (Model 3) quantifies general hospital effects by measuring the ICC, the MOR and the increment in the AUC obtained by adding hospital/ward level information. In this step we include only the hospital/ward codes, without specifying any hospital/ward characteristics. The final step (Model 4) includes specific hospital/ward information (ward volume and type of hospital). In this model, the OR and the POOR must always be interpreted in relation to the hospital/ward variance obtained in the second step (Model 3) and the PCV associated with moving from Model 3 to Model 4.
For instance, suppose Model 3 estimated a high hospital variance and therefore a high ICCh for the binary outcome “30-day mortality”. Thereafter, in Model 4, we include contextual variables (e.g., type of hospital). If the type of hospital is associated with the outcome (a high OR) and it explains a large share of the hospital variance (the PCV is high), the POOR would be low. This case illustrates a situation where the hospital context conditions mortality (i.e., high hospital variance and ICCh). It also shows that this influence appears to be mediated by the contextual variable (type of hospital): the hospital variable is not only strongly associated with the outcome, but it also explains the hospital variance and thereby shows a low POOR. In other words, the conclusion would be that the hospital context influences individual mortality and that this influence has to do with the type of hospital.
However, there are other possible situations. For instance, the hospital variance could be very low from the beginning (Model 3), and the hospital variable could be significantly associated with the outcome yet explain little of that variance (i.e., low PCV) in Model 4. Nevertheless, since the hospital variance was low from the beginning, the POOR would be low. In this case, the hospital context would have a small influence on patient mortality even if the variable hospital type is, on average, associated with mortality and the POOR is low. Our results correspond to this last scenario.
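The two scenarios above can be made concrete with a minimal Python sketch. It assumes the formulas commonly used for these measures in multilevel logistic regression: MOR = exp(sqrt(2 × variance) × 0.6745), PCV as the relative reduction in hospital variance between two models, and POOR = Φ(−|ln OR| / sqrt(2 × variance)), with the variance taken from the model that contains the contextual variable. The function names and all numerical inputs are hypothetical illustrations and are not estimates from this study.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def median_odds_ratio(variance):
    """MOR for a given cluster-level variance on the log-odds scale."""
    return math.exp(math.sqrt(2.0 * variance) * STD_NORMAL.inv_cdf(0.75))


def proportional_change_in_variance(variance_before, variance_after):
    """PCV: share of the cluster-level variance explained by the added variable."""
    return (variance_before - variance_after) / variance_before


def proportion_opposed_odds_ratios(log_odds_ratio, residual_variance):
    """POOR: share of pairwise ORs that point in the opposite direction
    to the overall OR (0 = none, 0.5 = no consistent direction)."""
    if residual_variance == 0.0:
        return 0.0
    z = -abs(log_odds_ratio) / math.sqrt(2.0 * residual_variance)
    return STD_NORMAL.cdf(z)


# Hypothetical inputs: (Model 3 variance, Model 4 variance, OR of hospital type)
scenarios = {
    "high variance, largely explained": (1.00, 0.20, 3.0),
    "low variance, little explained": (0.05, 0.045, 1.5),
}

for label, (var_m3, var_m4, odds_ratio) in scenarios.items():
    pcv = proportional_change_in_variance(var_m3, var_m4)
    poor = proportion_opposed_odds_ratios(math.log(odds_ratio), var_m4)
    print(f"{label}: MOR(Model 3)={median_odds_ratio(var_m3):.2f}, "
          f"PCV={pcv:.0%}, POOR={poor:.1%}")
```

In both hypothetical scenarios the POOR is low, but only the MOR and the PCV reveal whether the hospital level mattered in the first place, which is why the text insists on reading these measures jointly.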
Strengths and weaknesses
This study has a number of strengths. The Swedish registers are of considerable quality and are based on standardized procedures for data collection and storage. As the follow-up only concerned the first 30 days after discharge from the hospital, the follow-up data for short-term mortality are, in principle, complete. Initially, we used data on all patients who met the inclusion criteria and were admitted to Swedish hospitals between 2007 and 2009. The registers cover the entire country, and information on mortality is recorded even when a Swedish resident dies outside the country.
We did, however, include only Swedish hospitals that admitted at least 50 patients, which may have introduced some selection bias and limits the extrapolation of the hospital-specific associations obtained in the MLRA. However, the variance is not misestimated by this procedure, since in MLRA the variance concerns the shrunken residuals, and small hospitals are shrunken towards the mean to avoid statistical noise.
To adjust for differences in patient mix, we covered a wide range of co-morbidities that are known in the literature to be related to mortality after HF [48, 49]. Additionally, we used a risk score (RS), which reduced the number of predictors, and the number of outcome events per variable was large enough.
On the other hand, this study has specific limitations. First, the models lack information on previous and current medications, as well as lifestyle indicators such as obesity, smoking and alcohol consumption [48, 49]. Additionally, no model validation was performed. Yet, if the models were overfitted, our prediction would be an overestimation, so the small contribution of the hospital and ward components would be even lower. Second, the purpose of our study was not primarily to create a new risk score equation but rather to adjust for patient mix. Finally, although the analytical methodology we propose can be implemented in many diverse contexts, the results of our study concern only Swedish hospitals and wards, and the RS equation we obtained cannot be used for prediction outside Sweden.
The use of ROC plots and the calculation of the AUC are common practices for evaluating model fit in health care epidemiology, and this approach has been criticized by Katzan et al. 2014 [53] (p. 922). These authors concluded that "An important limitation of the C statistic is that it only measures the predictive power of a model at the patient level and does not directly express the ability of the model to accurately profile hospitals with respect to the hospital specific risk-standardized ratios". However, our approach offers a response to that criticism, as the quantification of general hospital effects allows hospitals to be profiled accurately. For this purpose, we can calculate the hospital ICC adjusted for patient variables (i.e., Model 3 in our study) or perform a two-step analysis using the AUC. That is, we first fit a customary single-level logistic regression model including only individual characteristics; thereafter, we fit a multilevel regression model that includes a random effect for the hospital (and ward) level; and, finally, we calculate the improvement in the AUC that is due to the general hospital effect. For more information on the link between the AUC and the ICC and the particularities of each measure, please see S1 Appendix.
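The two-step AUC comparison described above can be sketched as follows. This is only an illustration: the `auc` helper computes the AUC as the proportion of case/non-case pairs in which the case receives the higher predicted risk (ties count one half), and the outcome vector and predicted probabilities are invented toy values standing in for the output of a fitted single-level model and a fitted multilevel model. They are not data from this study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Returns the probability that a randomly chosen case (label 1) has a
    higher score than a randomly chosen non-case (label 0); ties count 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Toy data (hypothetical): 1 = died within 30 days, 0 = survived
died = [0, 0, 0, 0, 0, 1, 0, 1]

# Step 1: predicted risks from a single-level model (patient variables only)
p_patient_only = [0.03, 0.05, 0.06, 0.08, 0.10, 0.12, 0.15, 0.30]

# Step 2: predicted risks after adding hospital/ward random effects
p_with_hospital = [0.03, 0.05, 0.06, 0.07, 0.10, 0.13, 0.14, 0.31]

auc_step1 = auc(p_patient_only, died)
auc_step2 = auc(p_with_hospital, died)
auc_increment = auc_step2 - auc_step1  # gain attributable to the general hospital effect

print(f"AUC patient-only: {auc_step1:.3f}")
print(f"AUC with hospital effects: {auc_step2:.3f}")
print(f"Increment: {auc_increment:+.3f}")
```

In this toy example the hospital information shifts individual predictions slightly but does not change how patients are ranked, so the increment in the AUC is nil, which mirrors the kind of result the approach is designed to detect.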
As far as we know, the vast majority of previous studies have used the AUC or C-statistic solely to check model performance in fixed-effect analyses, ignoring the multilevel nature of the variance [54–57]. On some occasions, researchers have applied mixed-effect models that consider the multilevel structure of the data [58, 59], but they have not used the AUC for evaluating general hospital effects as this study does. Those multilevel studies did not explicitly use the AUC to quantify whether the hospital/ward level information could discriminate between patients who will and who will not suffer the outcome, as is done in the current and in previous studies [29, 30, 60]. As far as we know, the AUC approach for evaluating hospital performance was introduced by this team [29, 30] and was followed by a French publication.
In essence, the idea of using measures of variance like the ICC [2, 10] for the evaluation of institutional performance is not new, and this team already applied it to evaluate survival after initial hospitalization for heart failure and myocardial infarction in Sweden in 2001. Using this methodology we also audited neonatal mortality in the Swedish acute care hospitals and evaluated different quality indicators for medication use in the country. We have applied a similar approach to investigate geographical differences in health outcomes. The methodology we use has also been recommended in a white paper published in 2011 by the committee assigned to set statistical guidelines for assessing Centers for Medicare and Medicaid Services (CMS) hospital performance in the USA. S1 Appendix gives more details on estimating the ICC, a conceptual link between the ICC and the AUC, and alternative measures of clustering.
We agree with Krumholz et al. 2013 [66] that, in evaluating hospital quality, we are seeking to measure a latent variable of quality. This idea approximates the concept of general (or “latent”) hospital effect in our Model 3 and the use of the ICC we have promoted in this and previous studies of ours [25, 67]. However, the innovation of our study is that, by including the hospital residuals from the MLRA in the prediction and calculating the AUC, we are able to quantify general (i.e., “latent”) hospital effects in the same way we evaluate the predictive ability of patient-level variables [29, 30].
Observe that the hospital variance can be very small but still “statistically significant” if the sample is large. What matters most is not statistical significance but the size of the ICC. This idea can also be expressed by the AUC approach discussed in this study.
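As a worked illustration of why size matters more than significance, assume the latent-variable formulation of the ICC for a logistic model, in which the patient-level variance is fixed at π²/3, and take a hypothetical hospital variance of 0.05 on the log-odds scale (the symbol σ²ₕ and the value are assumptions made for this example only):

```latex
\mathrm{ICC}_h
  = \frac{\sigma_h^2}{\sigma_h^2 + \pi^2/3}
  = \frac{0.05}{0.05 + 3.29}
  \approx 0.015
```

With a large enough sample such a variance could well be “statistically significant”, yet only about 1.5% of the total individual variance in the latent propensity for death would lie at the hospital level.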
A simple way of evaluating performance using multilevel analysis of discriminatory accuracy
Two sources of information are needed to evaluate hospital and ward performance: 1) the overall average of short-term patient mortality in Sweden, and 2) the variation around this average. The multilevel analysis decomposes this variance into the hospital, ward and patient levels, which allows us to identify which of those levels is actually relevant for the outcome. With regard to this study, it is important to stress that a low ICC/AUC by itself does not mean that the hospital level has no influence on patient survival. It is obvious that the care and treatment of heart failure patients at the hospital conditions their prognosis. A low ICC/AUC means that there are no remarkable differences between hospitals. In other words, information at the hospital and ward levels does not contribute to understanding individual differences in short-term mortality after admission with heart failure. One explanation could be that all hospitals in Sweden are performing homogeneously.
In practice, we could imagine four extreme situations combining the overall survival (good vs. bad) and the ICC/AUC (low vs. high), with intermediate situations between them. We formulate these scenarios for the hospital level. If the overall survival is good (i.e., high) and the ICC is low, all hospitals are doing homogeneously well. However, if the overall survival is low and the ICC is low, all hospitals are doing homogeneously badly. In practice, a low ICC suggests that we do not need to single out specific hospitals to improve patient survival. Rather, interventions should target all hospitals in the country. However, the higher the ICC, the more appropriate it is to act on the hospitals with the worst survival. A similar reasoning can be applied when using the AUC in combination with the overall average 30-day mortality. If the hospitals and wards do not add discriminatory accuracy, it would not be effective to focus on certain hospitals. If we aim to decrease mortality, we should focus on improving care for the patients with higher mortality, regardless of the hospital of treatment. In our study, knowing the patient's age and previous diseases (i.e., the RS) was enough to achieve an AUC = 0.727, which improved only slightly when the sex and ethnic status of the patient were known (+0.002 units). Moreover, this patient-level information was only marginally improved by knowing the hospital or the ward where the patient was treated.
In summary, we may find a good or a bad overall average for the quality indicator, but before planning an intervention at the actual institutional level (e.g., hospitals, wards) we need to consider other complementary sources of information, such as the size of the ICC or the AUC or both, depending on the purpose of the evaluation. Regarding 30-day survival after HF in Sweden, we observed that the Swedish hospitals were performing homogeneously well. However, to further increase survival after HF, it is necessary to improve treatment in high-risk patients regardless of which hospital they are treated in.
We would like to thank Ms. Raquel Perez Vicente for preparing the study dataset, as well as Ms. Sharon Miller who provided language revision of the final version of the manuscript.
Conceived and designed the experiments: JM NG. Analyzed the data: NG JM. Contributed reagents/materials/analysis tools: NG JM AF PW. Wrote the paper: NG JM AF PW. Wrote the initial manuscript, applied the methodology and performed the analyses in collaboration with JM: NG. Had the initiative of the study and developed the methodology in collaboration with PW: JM. Provided special knowledge on measurement of hospital quality: NG AF. Contributed to the design and writing of the study and interpretation of the results: NG JM AF PW.
- 1. Schaufelberger M, Swedberg K, Köster M, Rosén M, Rosengren A. Decreasing one-year mortality and hospitalization rates for heart failure in Sweden. European Heart Journal. 2004;25(4):300–7. pmid:14984918
- 2. Merlo J, Ostergren PO, Broms K, Bjorck-Linne A, Liedholm H. Survival after initial hospitalisation for heart failure: a multilevel analysis of patients in Swedish acute care hospitals. J Epidemiol Community Health. 2001;55(5):323–9. Epub 2001/04/12. pmid:11297650; PubMed Central PMCID: PMC1731888.
- 3. Bhatia RS, Austin PC, Stukel TA, Schull MJ, Chong A, Tu JV, et al. Outcomes in patients with heart failure treated in hospitals with varying admission rates: population-based cohort study. BMJ Quality & Safety. 2014.
- 4. Harris S, Tepper D, Ip R. Divergent Trends in Survival and Readmission Following a Hospitalization for Heart Failure in the Veterans Affairs Health Care System 2002 to 2006. Congestive Heart Failure. 2011;17(1):47–.
- 5. Krumholz HM, Merrill AR, Schone EM, Schreiner GC, Chen J, Bradley EH, et al. Patterns of Hospital Performance in Acute Myocardial Infarction and Heart Failure 30-Day Mortality and Readmission. Circulation: Cardiovascular Quality and Outcomes. 2009;2(5):407–13.
- 6. Goldstein H, Spiegelhalter D. League Tables and Their Limitations: Statistical Issues in Comparisons of Institutional Performance. Journal of the Royal Statistical Society, Series A. 1996;159:385–443.
- 7. Leckie G, Goldstein H. Understanding Uncertainty in School League Tables. Fiscal Studies. 2011;
- 8. Merlo J. Changing analytical approaches in European epidemiology—a short comment on a recent article. European journal of epidemiology. 2005;20(8):737; author reply 8. Epub 2005/09/10. pmid:16151890.
- 9. Merlo J, Gerdtham UG, Eckerlund I, Hakansson S, Otterblad-Olausson P, Pakkanen M, et al. Hospital level of care and neonatal mortality in low- and high-risk deliveries: reassessing the question in Sweden by multilevel analysis. Med Care. 2005;43(11):1092–100. Epub 2005/10/15. pmid:16224302.
- 10. Ohlsson H, Librero J, Sundquist J, Sundquist K, Merlo J. Performance evaluations and league tables: do they capture variation between organizational units? An analysis of 5 Swedish pharmacological performance indicators. Med Care. 2011;49(3):327–31. Epub 2011/01/26. pmid:21263360.
- 11. Ohlsson H, Merlo J. Is there important variation among health care institutions? Med Care. 2010;48(8):757–8; author reply 8. Epub 2010/07/22. pmid:20647865.
- 12. Vernaz N, Huttner B, Muscionico D, Salomon J-L, Bonnabry P, López-Lozano JM, et al. Modelling the impact of antibiotic use on antibiotic-resistant Escherichia coli using population-based data from a large hospital and its surrounding community. Journal of Antimicrobial Chemotherapy. 2011.
- 13. Harbarth S, Harris AD, Carmeli Y, Samore MH. Parallel Analysis of Individual and Aggregated Data on Antibiotic Exposure and Resistance in Gram-Negative Bacilli. Clinical Infectious Diseases. 2001;33(9):1462–8. pmid:11588690
- 14. Vernaz N, Sax H, Pittet D, Bonnabry P, Schrenzel J, Harbarth S. Temporal effects of antibiotic use and hand rub consumption on the incidence of MRSA and Clostridium difficile. Journal of Antimicrobial Chemotherapy. 2008;62(3):601–7. pmid:18468995
- 15. Tu JV, Ko DT. Ecological Studies and Cardiovascular Outcomes Research. Circulation. 2008;118(24):2588–93. pmid:19064693
- 16. Bottle A, Middleton S, Kalkman CJ, Livingston EH, Aylin P. Global Comparators Project: International Comparison of Hospital Outcomes Using Administrative Data. Health Services Research. 2013;48(6pt1):2081–100. pmid:92599947.
- 17. Mohammed MA, Deeks JJ, Girling A, Rudge G, Carmalt M, Stevens AJ, et al. Evidence of methodological bias in hospital standardised mortality ratios: retrospective database study of English hospitals. 2009.
- 18. Collaboration. Atlas VPM. Atlas de Variaciones en la Práctica Médica en el Sistema Nacional De Salud http://wwwatlasvpmorg/avpm/inicioiniciodo. 2010.
- 19. Bernal E, (Coordinator). The European Collaboration for Health Optimization (ECHO). http://echo-health.eu/partners/ accessed 11-02-2015. European Comission, Research & Innovation (Health). 2015.
- 20. Bernal-Delgado E, Christiansen T, Bloor K, Mateus C, Yazbeck AM, Munck J, et al. ECHO: health care performance assessment in several European health systems. European journal of public health. 2015;25 Suppl 1:3–7. pmid:25690123.
- 21. Garcia-Armesto S, Angulo-Pueyo E, Martinez-Lizaga N, Mateus C, Joaquim I, Bernal-Delgado E, et al. Potential of geographical variation analysis for realigning providers to value-based care. ECHO case study on lower-value indications of C-section in five European countries. European journal of public health. 2015;25 Suppl 1:44–51. pmid:25690129.
- 22. Merlo J, Viciana-Fernandez FJ, Ramiro-Farinas D, Research Group of Longitudinal Database of Andalusian P. Bringing the individual back to small-area variation studies: a multilevel analysis of all-cause mortality in Andalusia, Spain. Soc Sci Med. 2012;75(8):1477–87. Epub 2012/07/17. pmid:22795359.
- 23. Merlo J, Chaix B, Yang M, Lynch J, Rastam L. A brief conceptual tutorial of multilevel analysis in social epidemiology: linking the statistical concept of clustering to the idea of contextual phenomenon. J Epidemiol Community Health. 2005;59(6):443–9. Epub 2005/05/25. pmid:15911637; PubMed Central PMCID: PMC1757045.
- 24. Merlo J. Invited Commentary: Multilevel Analysis of Individual Heterogeneity-A Fundamental Critique of the Current Probabilistic Risk Factor Epidemiology. Am J Epidemiol. 2014. pmid:24925064.
- 25. Merlo J, Östergren P-O, Broms K, Bjorck-Linné A, Liedholm H. Survival after initial hospitalisation for heart failure: a multilevel analysis of patients in Swedish acute care hospitals. Journal of Epidemiology and Community Health. 2001;55(5):323–9. pmid:11297650
- 26. Ash AS, Fienberg E, Louis A, Norm S-lT, Stukel A, Utts PJ. STATISTICAL ISSUES IN ASSESSING HOSPITAL PERFORMANCE Commissioned by the Committee of Presidents of Statistical Societies The COPSS-CMS White Paper Committee. 2011.
- 27. Khoury MJ, Newill CA, Chase GA. Epidemiologic evaluation of screening for risk factors: application to genetic screening. American journal of public health. 1985;75(10):1204–8. Epub 1985/10/01. pmid:3862352; PubMed Central PMCID: PMC1646387.
- 28. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol. 2004;159(9):882–90. Epub 2004/04/24. pmid:15105181.
- 29. Wagner P, Merlo J. Measures of discriminatory accuracy in multilevel analysis. European journal of epidemiology. 2013;28(1, Supplement):135.
- 30. Wagner P, Merlo J. Discriminatory accuracy of a random effect in multilevel logistic regression. 20th IEA World Congress of Epidemiology (WCE2014). 2014.
- 31. Merlo J, Wagner P, Ghith N, Leckie G. An original stepwise multilevel logistic regression analysis of discriminatory accuracy: the case of neighbourhoods and health. PLOS-ONE. 2015 (PONE-D-15-36083_R1).
- 32. Merlo J, Chaix B, Ohlsson H, Beckman A, Johnell K, Hjerpe P, et al. A brief conceptual tutorial of multilevel analysis in social epidemiology: using measures of clustering in multilevel logistic regression to investigate contextual phenomena. J Epidemiol Community Health. 2006;60(4):290–7. Epub 2006/03/16. pmid:16537344; PubMed Central PMCID: PMC2566165.
- 33. Joynt KE, Orav EJ, Jha AK. The association between hospital volume and processes, outcomes, and costs of care for congestive heart failure. Annals of Internal Medicine. 2011;154(2):94–102. pmid:PMC3336194.
- 34. Ross JS, Normand S-LT, Wang Y, Ko DT, Chen J, Drye EE, et al. Hospital Volume and 30-Day Mortality for Three Common Medical Conditions. New England Journal of Medicine. 2010;362(12):1110–8. pmid:20335587.
- 35. Hughes RG, Garnick DW, Luft HS, McPhee SJ, Hunt SS. Hospital volume and patient outcomes. The case of hip fracture patients. Med Care. 1988;26(11):1057–67. pmid:3185017.
- 36. Tsai AC, Votruba M, Bridges JF, Cebul RD. Overcoming bias in estimating the volume-outcome relationship. Health services research. 2006;41(1):252–64. pmid:16430610; PubMed Central PMCID: PMC1681538.
- 37. Moons KG, de Groot JA, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical appraisal and data extraction for systematic reviews of prediction modelling studies: the CHARMS checklist. PLoS Med. 2014;11(10):e1001744. pmid:25314315; PubMed Central PMCID: PMC4196729.
- 38. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. Journal of Chronic Diseases. 1987;40(5):373–83. doi: http://dx.doi.org/10.1016/0021-9681(87)90171-8. pmid:3558716
- 39. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity Measures for Use with Administrative Data. Medical Care. 1998;36(1):8–27. pmid:9431328
- 40. Li P, Kim M, Doshi J. Comparison of the performance of the CMS Hierarchical Condition Category (CMS-HCC) risk adjuster with the charlson and elixhauser comorbidity measures in predicting mortality. BMC Health Services Research. 2010;10(1):245.
- 41. Larsen K, Merlo J. Appropriate assessment of neighborhood effects on individual health: integrating random and fixed effects in multilevel logistic regression. Am J Epidemiol. 2005;161(1):81–8. Epub 2004/12/24. pmid:15615918.
- 42. Larsen K, Petersen JH, Budtz-Jorgensen E, Endahl L. Interpreting parameters in the logistic regression model with random effects. Biometrics. 2000;56(3):909–14. pmid:10985236
- 43. Browne WJ. MCMC Estimation in MLwiN v2.29. Centre for Multilevel Modelling, University of Bristol; 2013.
- 44. Spiegelhalter DJ, Best N, Carlin BP, Linde AVD. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B (Statistical Methodology). 2002;64:583–639.
- 45. Health Indicators Warehouse: Medicare Hospital Compare (CMS); 2012 [cited 2015 17 May]. Available from: http://www.healthindicators.gov/Indicators/Hospital-30-day-death-mortality-heart-failure-patients-percent_344/Profile/ClassicData.
- 46. Northwestern Memorial HealthCare: DEATH FROM ANY CAUSE WITHIN 30 DAYS OF HOSPITALIZATION FOR HEART FAILURE: U.S. Department of Health & Human Services; 2012 [cited 2015 12 April]. Available from: www.hospitalcompare.hhs.gov; http://www.nmh.org/nm/quality-death-from-any-cause-within-30-days-of-heart-failure.
- 47. Ludvigsson JF, Andersson E, Ekbom A, Feychting M, Kim JL, Reuterwall C, et al. External review and validation of the Swedish national inpatient register. BMC public health. 2011;11:450. pmid:21658213; PubMed Central PMCID: PMC3142234.
- 48. Lang CC, Mancini DM. Non‐cardiac comorbidities in chronic heart failure. Heart. 2007;93(6):665–71. pmid:PMC1955190.
- 49. Lee C, Chien C, Bidwell J, Gelow J, Denfeld Q, Creber R, et al. Comorbidity profiles and inpatient outcomes during hospitalization for heart failure: an analysis of the U.S. Nationwide inpatient sample. BMC Cardiovascular Disorders. 2014;14(1):73.
- 50. Hosmer DW Jr, Lemeshow S, Sturdivant RX. Applied logistic regression: John Wiley & Sons; 2013.
- 51. Moons KGM, de Groot JAH, Bouwmeester W, Vergouwe Y, Mallett S, Altman DG, et al. Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist. PLoS Med. 2014;11(10):e1001744. pmid:25314315
- 52. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology. 49(12):1373–9. pmid:8970487
- 53. Katzan IL, Spertus J, Bettger JP, Bravata DM, Reeves MJ, Smith EE, et al. Risk Adjustment of Ischemic Stroke Outcomes for Comparing Hospital Performance: A Statement for Healthcare Professionals From the American Heart Association/American Stroke Association. Stroke. 2014;45(3):918–44. pmid:24457296
- 54. Coiera E, Wang Y, Magrabi F, Concha O, Gallego B, Runciman W. Predicting the cumulative risk of death during hospitalization by modeling weekend, weekday and diurnal mortality risks. BMC Health Services Research. 2014;14(1):226.
- 55. Umegaki T, Sekimoto M, Hayashida K, Imanaka Y. An outcome prediction model for adult intensive care. Crit Care Resusc. 2010;12(2):96–103. pmid:20513217.
- 56. Gastmeier P, Sohr D, Breier A, Behnke M, Geffers C. Prolonged duration of operation: an indicator of complicated surgery or of surgical (mis)management? Infection. 2011;39(3):211–5. pmid:21509426
- 57. Brinkman S, Abu-Hanna A, de Jonge E, de Keizer N. Prediction of long-term mortality in ICU patients: model validation and assessing the effect of using in-hospital versus long-term mortality on benchmarking. Intensive Care Med. 2013;39(11):1925–31. pmid:23921978
- 58. Krumholz HM, Lin Z, Drye EE, Desai MM, Han LF, Rapp MT, et al. An Administrative Claims Measure Suitable for Profiling Hospital Performance Based on 30-Day All-Cause Readmission Rates Among Patients With Acute Myocardial Infarction. Circulation Cardiovascular Quality and Outcomes. 2011;4(2):243–52. pmid:PMC3350811.
- 59. Varagunam M, Hutchings A, Black N. Do patient-reported outcomes offer a more sensitive method for comparing the outcomes of consultants than mortality? A multilevel analysis of routine data. BMJ Quality & Safety. 2014.
- 60. Wagner P, Merlo J. Measures of discriminatory accuracy are an alternative to the Median Odds Ratio and intra-class correlation for estimating contextual effects in multilevel analyses. EUROEPI (submitted) Lund University, working paper ©. 2013.
- 61. Saunders L, Perennec-Olivier M, Jarno P, L’Hériteau F, Venier A-G, Simon L, et al. Improving Prediction of Surgical Site Infection Risk with Multilevel Modeling. PLoS ONE. 2014;9(5):e95295. pmid:24835189
- 62. Merlo J, Broms K, Ostergren PO, Hagberg O, Norlund A, Lithman T. [Multilevel analysis of regional disparities in survival after heart failure: differences between county health services affect little patients' prognosis]. Lakartidningen. 2001;98(44):4838–44. Epub 2001/12/04. pmid:11729797.
- 63. Lynch KF, Subramanian SV, Ohlsson H, Chaix B, Lernmark A, Merlo J. Context and disease when disease risk is low: the case of type 1 diabetes in Sweden. J Epidemiol Community Health. 2010;64(9):789–95. Epub 2009/10/17. pmid:19833608.
- 64. Merlo J. Multilevel analytical approaches in social epidemiology: measures of health variation compared with traditional measures of association. J Epidemiol Community Health. 2003;57(8):550–2. Epub 2003/07/29. pmid:12883048; PubMed Central PMCID: PMC1732554.
- 65. Merlo J. Invited commentary: multilevel analysis of individual heterogeneity-a fundamental critique of the current probabilistic risk factor epidemiology. American Journal of Epidemiology. 2014;180(2):208–12. Epub 2014/06/14. pmid:24925064.
- 66. Krumholz HM, Lin Z, Normand S-LT. Measuring hospital clinical outcomes. BMJ. 2013;346.
- 67. Merlo J, Chaix B, Ohlsson H, Beckman A, Johnell K, Hjerpe P, et al. A brief conceptual tutorial of multilevel analysis in social epidemiology: using measures of clustering in multilevel logistic regression to investigate contextual phenomena. Journal of Epidemiology and Community Health. 2006;60(4):290–7. pmid:16537344