Can we trust the standardized mortality ratio? A formal analysis and evaluation based on axiomatic requirements

Background The standardized mortality ratio (SMR) is often used to assess and compare hospital performance. While it has been recognized that hospitals may differ in their SMRs due to differences in patient composition, there is a lack of rigorous analysis of this and other, largely unrecognized, properties of the SMR.
Methods This paper proposes five axiomatic requirements for adequate standardized mortality measures: strict monotonicity (monotone relation to actual mortality rates), case-mix insensitivity (independence of patient composition), scale insensitivity (independence of hospital size), equivalence principle (equal rating of hospitals with equal actual mortality rates in all patient groups), and dominance principle (better rating of unambiguously better performing hospitals). Given these axiomatic requirements, effects of variations in patient composition, hospital size, and actual and expected mortality rates on the SMR were examined using basic algebra and calculus. In this regard, we distinguished between standardization using expected mortality rates derived from a different dataset (external standardization) and standardization based on a dataset including the considered hospitals (internal standardization). The results were illustrated by hypothetical examples.
Results Under external standardization, the SMR fulfills the axiomatic requirements of strict monotonicity and scale insensitivity but violates the requirement of case-mix insensitivity, the equivalence principle, and the dominance principle. All axiomatic requirements not fulfilled under external standardization are also not fulfilled under internal standardization. In addition, the SMR under internal standardization is scale sensitive and violates the axiomatic requirement of strict monotonicity.
Conclusions The SMR fulfills only two (none) out of the five proposed axiomatic requirements under external (internal) standardization. Generally, the SMRs of hospitals are differently affected by variations in case mix and actual and expected mortality rates unless the hospitals are identical in these characteristics. These properties hamper valid assessment and comparison of hospital performance based on the SMR.


Introduction
Assessing quality of care in hospitals is of high interest to patients, healthcare professionals, and political decision makers. Consequently, there are multiple attempts to characterize and compare hospitals based on quality indicators [1][2][3][4][5]. The design of those indicators usually includes some form of risk adjustment. Utilizing statistical methods and measures, risk adjustment aims to facilitate comparison of hospitals with differences in case mix (e.g. different shares of high-risk patient groups) that induce outcome differences between the hospitals irrespective of the true quality of care. Such adjustment is particularly relevant for quality indicators based on in-hospital mortality, which is one of the most frequently considered hospital outcomes.
A frequently used measure of risk-adjusted mortality is the standardized mortality ratio (SMR) [6][7][8][9][10][11][12]. Using indirect standardization, the SMR relates the observed mortality rate of a hospital to its expected mortality rate. The latter is derived by estimating expected mortality rates for predefined strata of patients (i.e. patients with similar risk factor characteristics) and aggregating these stratum-specific expected mortality rates according to the hospital's case mix (details on calculation of the SMR are provided below). In this way, the SMR aims to describe the ratio of the observed mortality at a specific hospital to a benchmark that projects stratum-specific mortality rates averaged over the entire population onto the hospital's own patient population distribution.
While the SMR is the dominant measure in empirical applications of hospital quality assessment, some basic methodological issues were recognized in previous work [13][14][15][16][17][18]. Notably, evidence from empirical and simulation studies suggested that the SMR is case-mix sensitive, implying that two hospitals with identical mortality rates in all patient groups may differ in their SMRs due to differences in patient composition [19][20][21][22][23]. While those analyses can provide evidence on specific properties of the SMR, this evidence provides no explanation for why a measure designed to account for case-mix would appear to vary when mortality rates are constant but patient composition differs. Against that background, this paper uses rigorous formal analysis to provide reliable and generalizable insights on basic properties of the SMR.
By drawing on a formal approach, our study is closely related to previous work on mathematical properties of statistical measures used for standardization and comparison of rates and ratios [24]. Methodological issues related to indirect standardization were already revealed by Yule in 1934, who highlighted that the quotient of two SMRs cannot be expressed as a weighted mean of the stratum-specific mortality rates with constant weights [25]. As the latter ensures comparability of mortality indices between multiple study populations, Yule posits that fulfilling this property is essential for all standardization methods. Accordingly, Yule concluded that indirect standardization "is not fully a method of standardization at all, but is only safe for the comparison of single pairs of populations" [25]. Freeman and Holford showed that comparison of indirectly standardized rates may only be valid if there is proportionality between the stratum-specific mortality rates across populations [26]. An additional requirement for validity revealed by Freeman and Holford is that the stratum-specific mortality rates in the standard population are also proportional for every stratum. This requirement is reflected in the condition of proportionality formulated by Breslow and Day, which states that "for an SMR analysis to be completely appropriate [. . .] the stratum-specific death rates for each exposure class [must] be proportional to the external standard rates" [27]. Based on this insight, the authors demonstrated relationships between calculation of the SMR and the fitting of multiplicative regression models. Breslow and Day also showed that indirect standardization based on average stratum-specific mortality rates violates the condition of proportionality. Moreover, the authors highlighted that the SMR is sensitive to patient composition and pointed to potential bias arising from the choice of external or internal standard populations. 
Since the construction of an internal standard population uses information from the current sample, the authors noted that it may be dominated by few large exposure groups [27].
Given these cautionary insights from related studies and the frequent use of the SMR in the context of hospital performance assessments, a clear and comprehensive formal characterization of the SMR is of high relevance. Against that background, we extended previous work by systematically investigating and evaluating multiple basic properties of the SMR. This includes analysis of properties other than case-mix sensitivity and consideration of both external and internal approaches to standardization. In a first step, we proposed general properties that characterize adequate measures of standardized mortality. In a second step, we utilized these proposed characteristics to derive a set of axiomatic requirements that should be fulfilled by standardized mortality measures. Formulation of axiomatic requirements for adequate statistical measures is long-established in literature on measurement of income inequality [28] and facilitates clear evaluation of the measures' mathematical properties. In a third step, we examined properties of the SMR by drawing on analytical mathematical methods. This approach allowed us to formally investigate the behavior of the SMR given variations in case mix, hospital size, and actual and expected mortality rates. The insights on properties of the SMR were evaluated with respect to the formulated axiomatic requirements for standardized mortality measures. In this way, this paper clarifies and extends the results of previous analyses by providing a comprehensive, systematic, and transparent examination and assessment of the SMR's basic properties.

Methods
All formal analyses relied on basic algebra and differential calculus. In preparation for these analyses, the following sections outline the definition and the analyzed properties of the SMR and the notational conventions used throughout this paper.

Definition and interpretation of the standardized mortality ratio
We considered $H$ hospitals, indexed by $h = 1, \ldots, H$. Each patient treated in one of these hospitals belonged to one of $S$ strata, indexed by $s = 1, \ldots, S$. Each stratum represents a group of patients with the same risk factor characteristics. Let $n_{hs} \in \mathbb{N}$ denote the number of patients belonging to stratum $s$ treated in hospital $h$ and $n_h = \sum_{s=1}^{S} n_{hs}$ denote the total number of patients treated in hospital $h$. Note that we also refer to $n_h$ as a measure of hospital size. Furthermore, assume that each hospital was characterized by actual stratum-specific mortality rates $p_{hs} \in [0, 1]$. Given expected stratum-specific mortality rates $p^{e}_{s} \in [0, 1]$, the SMR of hospital $h$ is defined as the relation between its actual mortality rate $\bar{p}_h$ and its expected mortality rate $\bar{p}^{e}_h$:
$$SMR_h = \frac{\bar{p}_h}{\bar{p}^{e}_h} = \frac{\sum_{s=1}^{S} \frac{n_{hs}}{n_h}\, p_{hs}}{\sum_{s=1}^{S} \frac{n_{hs}}{n_h}\, p^{e}_{s}} = \frac{\sum_{s=1}^{S} n_{hs}\, p_{hs}}{\sum_{s=1}^{S} n_{hs}\, p^{e}_{s}} \qquad (1)$$
Note that while actual mortality rates $p_{hs}$ may be specific for each stratum in a hospital, expected mortality rates $p^{e}_{s}$ may only vary by stratum but are the same for all considered hospitals. Hence, the SMR may be interpreted as evaluating actual mortality rates of all hospitals relative to the same "benchmark" (i.e. expected) mortality rates, where both actual and expected mortality rates are weighted by each hospital's stratum-specific patient numbers. If the SMR of a hospital exceeds the value of 1, the hospital is judged to perform worse than expected. An SMR smaller than 1 is interpreted as better-than-expected performance. The relative performance of hospitals is often assessed by comparison of their SMRs.
In practice, the hospital-specific mortality rates $p_{hs}$ are unknown and may be estimated by the hospital's observed stratum-specific mortality rates. Under this approach, the numerator of Eq 1 becomes the hospital's observed number of deaths while the denominator is the hospital's expected number of deaths. However, since our analysis does not focus on issues of estimation but examines general properties of the SMR, we treat the actual mortality rates $p_{hs}$ as known (or perfectly estimated) throughout the paper.
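The reading of Eq 1 as observed deaths divided by expected deaths can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function name `smr`, its argument names, and the example hospital are assumptions introduced here.

```python
from typing import Sequence


def smr(n: Sequence[float], p: Sequence[float], p_exp: Sequence[float]) -> float:
    """SMR of one hospital: observed deaths divided by expected deaths.

    n[s]     -- number of patients in stratum s
    p[s]     -- actual mortality rate in stratum s (treated as known)
    p_exp[s] -- expected ("benchmark") mortality rate in stratum s
    """
    if not (len(n) == len(p) == len(p_exp)):
        raise ValueError("n, p and p_exp need one entry per stratum")
    observed = sum(n_s * p_s for n_s, p_s in zip(n, p))
    expected = sum(n_s * e_s for n_s, e_s in zip(n, p_exp))
    if expected == 0:
        raise ValueError("expected number of deaths is zero; SMR is undefined")
    return observed / expected


# Hypothetical hospital (not taken from the paper): 30 and 10 patients in two strata.
example = smr(n=[30, 10], p=[0.05, 0.30], p_exp=[0.05, 0.20])
print(f"SMR = {example:.3f}")
```

Dividing numerator and denominator by the hospital size gives the ratio of overall actual to overall expected mortality rates, so both readings of Eq 1 lead to the same value.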

Axiomatic requirements for standardized mortality measures
The objective of this paper is to evaluate properties of the SMR in a systematic way. While Eq 1 provides the basis for formal analysis, evaluation of the SMR's properties also requires general assumptions on desirable properties of standardized mortality measures. Those properties should be relevant for fair comparison of hospital performance in terms of mortality, which, by assumption, is influenced by the hospitals' care qualities. For this purpose, we propose that a well-behaved measure of standardized mortality should be characterized by the following properties:
• Increases (decreases) in actual mortality rates should, ceteris paribus, always be reflected in increased (decreased) values of the standardized mortality measure. Rationale: Keeping (all relevant) patient-specific risk factors constant, increasing (decreasing) mortality in patients treated in a hospital indicates worse (better) performance of the hospital.
• The measure should be independent of the hospital's patient composition.
Rationale: The hospital's case mix does not reflect the hospital's care quality and, thus, should not influence the performance assessment.
• The measure should be independent of hospital size. Rationale: Hospital size per se does not reflect quality of care and, thus, should not influence the performance assessment.
• The measure should assign the same value to hospitals with identical performance in terms of mortality. Rationale: Fair comparison of hospital performance requires that hospitals with identical care quality are not evaluated differently.
• The measure should always rank one hospital better than another hospital if the former unambiguously performs better in terms of care-quality related mortality.
Rationale: Lower mortality rates of all patient groups in one hospital compared to another hospital imply that each patient's risk of death is lower when being admitted in the former hospital.
Based on these necessary properties for valid comparisons of quality of care, we postulate the following five axiomatic requirements for standardized mortality measures:
• Strict monotonicity: Increases (decreases) in a hospital's stratum-specific mortality rates $p_{hs}$ always induce increases (decreases) in the value of the measure assigned to the hospital if the hospital treated patients belonging to that stratum ($n_{hs} > 0$).
• Case-mix insensitivity: Holding actual stratum-specific mortality rates $p_{hs}$, expected mortality rates $p^{e}_{s}$, and the hospital's number of patients $n_h$ constant, the value of the measure is insensitive to the hospital's case mix, i.e. the hospital's stratum-specific patient shares $n_{hs}/n_h$, $s = 1, \ldots, S$.
• Scale insensitivity: Holding case mix ($n_{hs}/n_h$, $s = 1, \ldots, S$), actual mortality rates $p_{hs}$, and expected mortality rates $p^{e}_{s}$ constant, the measure is insensitive to the hospital's total number of patients $n_h$.
• Equivalence principle: The measure assigns the same value to two hospitals with identical stratum-specific mortality rates $p_{hs}$ or identical deviations of actual stratum-specific mortality rates $p_{hs}$ from expected stratum-specific mortality rates $p^{e}_{s}$.
• Dominance principle: The measure always ranks hospital 1 better than hospital 2 if the actual mortality rates of all patient groups treated in hospital 1 are equal to or lower than the mortality rates of these patient groups in hospital 2 ($p_{1s} \le p_{2s} \;\forall s = 1, \ldots, S$) and the mortality rate of at least one patient group is lower in hospital 1 than in hospital 2 ($\exists k \in \{1, \ldots, S\}: p_{1k} < p_{2k}$).
Given these axiomatic requirements, we examined effects of variations in case mix, hospital size $n_h$, actual mortality rates $p_{hs}$, and expected mortality rates $p^{e}_{s}$ on the SMR. In this regard, the following terminological distinctions are noteworthy:
1) Direct vs. indirect standardization: Direct standardization applies the stratum-specific mortality rates of each hospital to the case mix of the same "reference"/"standard" hospital. In contrast, the SMR as a measure of indirect standardization applies stratum-specific expected mortality rates $p^{e}_{s}$ to the specific case mix of each hospital. Since we focused on properties of the SMR, our formal analysis considers indirect rather than direct standardization.
2) External vs. internal standardization: This distinction refers to the way in which the expected mortality rates $p^{e}_{s}$ are derived:
• External standardization: Expected mortality rates may be derived from data that are not included in the analysis of the hospitals under consideration, e.g. from a dataset of hospitals from a different geographical region. This approach is referred to as external standardization.
• Internal standardization: Alternatively, expected mortality rates may be derived from the same dataset used to calculate the SMRs of the considered hospitals. In this case, the performance of the hospitals usually is evaluated against their average performance in terms of mortality rates. This approach is referred to as internal standardization.
Taking the difference between external and internal standardization into account, we examined properties of the SMR for both standardization approaches separately.

Notation
For notational brevity, arguments of functions are stated explicitly only when they are relevant for the analysis. For instance, the SMR of a specific hospital $h$, which depends on the stratum-specific numbers of patients $n_{h1}, \ldots, n_{hS}$, the hospital's stratum-specific mortality rates $p_{h1}, \ldots, p_{hS}$, and the expected mortality rates $p^{e}_{1}, \ldots, p^{e}_{S}$, is written as $SMR_h(n_{hk}, n_{hl})$ when only the patient numbers of strata $k$ and $l$ are varied. In the same way, the overall mortality rate of hospital $h$ when multiplying all stratum-specific patient numbers $n_{hs}$, $s = 1, \ldots, S$, with a common factor $\lambda \in \mathbb{R}^{+}$ is written as $\bar{p}_h(\lambda)$. To distinguish in notation between external and internal standardization, variables that are affected by the choice of standardization approach are tagged with the superscripts "ext" and "int", respectively.

Results
In the following, effects of variations in case mix, hospital size, and actual and expected mortality rates are examined formally. Analyses were first conducted for the SMR under external standardization and subsequently for the SMR under internal standardization.

External standardization
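As noted above, external standardization refers to the case in which the stratum-specific expected mortality rates are derived from a different dataset. Letting $p^{e,ext}_{s}$ denote these expected mortality rates, the externally standardized SMR of hospital $h$ is
$$SMR^{ext}_h = \frac{\bar{p}_h}{\bar{p}^{e,ext}_h} = \frac{\sum_{s=1}^{S} n_{hs}\, p_{hs}}{\sum_{s=1}^{S} n_{hs}\, p^{e,ext}_{s}} \qquad (5)$$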

PLOS ONE
It is noteworthy that Eq 6 implies that the SMR generally changes due to a shift of patients from stratum $l$ to stratum $k$ even if the hospital's mortality rates in both strata are equal to the expected mortality rates ($p_{hk} = p^{e,ext}_{k}$, $p_{hl} = p^{e,ext}_{l}$) as long as $p^{e,ext}_{k} \neq p^{e,ext}_{l}$. Hence, performance in line with expected mortality for both strata generally does not imply that the SMR is insensitive to the number of patients belonging to these strata. This result demonstrates the SMR's violation of the axiomatic requirement of case-mix insensitivity.
For further investigation, the change in the SMR due to the shift in case mix is defined as
$$\Omega^{ext}_{hkl}(\eta) = SMR^{ext}_h(n_{hk} + \eta, n_{hl} - \eta) - SMR^{ext}_h(n_{hk}, n_{hl}) = \frac{\eta \left[ (p_{hk} - p_{hl}) - SMR^{ext}_h(n_{hk} + \eta, n_{hl} - \eta)\,(p^{e,ext}_{k} - p^{e,ext}_{l}) \right]}{n_h\, \bar{p}^{e,ext}_h(n_{hk}, n_{hl})} \qquad (7)$$
Eq 7 shows that the change in the SMR due to a shift of patients from stratum $l$ to stratum $k$ is, in absolute terms, large if the number of shifted patients $\eta$ is large, the number of patients treated in the hospital $n_h$ is small, and the hospital's overall expected mortality rate $\bar{p}^{e,ext}_h(n_{hk}, n_{hl})$ is low. Since $\bar{p}^{e,ext}_h(n_{hk}, n_{hl})$ depends on the stratum-specific patient numbers $n_{hs}$, $s = 1, \ldots, S$, the latter implies that the change in the SMR due to a variation in case mix depends on the initial case mix of the hospital.
The sign of Eq 7 is determined according to
$$\Omega^{ext}_{hkl}(\eta) \gtreqless 0 \quad \text{if} \quad p_{hk} - p_{hl} \gtreqless SMR^{ext}_h(n_{hk}, n_{hl})\,(p^{e,ext}_{k} - p^{e,ext}_{l})$$
Hence, the direction of the change in the SMR due to a shift of patients from stratum $l$ to stratum $k$ depends on the difference between the hospital's mortality rates of these strata ($p_{hk} - p_{hl}$), the difference between the strata's expected mortality rates ($p^{e,ext}_{k} - p^{e,ext}_{l}$), and the hospital's SMR. If the hospital's mortality rate in stratum $k$ is higher than in stratum $l$ ($p_{hk} - p_{hl} > 0$) while the opposite is true for the expected mortality rates ($p^{e,ext}_{k} - p^{e,ext}_{l} < 0$), the hospital's SMR increases due to the shift of patients, and vice versa. However, if both actual and expected mortality rate differences are positive ($p_{hk} - p_{hl} > 0$ and $p^{e,ext}_{k} - p^{e,ext}_{l} > 0$), hospitals with high SMRs experience a reduction in the SMR whereas hospitals with low SMRs experience an increase in the SMR. Similarly, concordant negative hospital-specific and expected mortality rate differences ($p_{hk} - p_{hl} < 0$ and $p^{e,ext}_{k} - p^{e,ext}_{l} < 0$) imply that the SMR of a hospital increases (decreases) if the initial SMR of the hospital is high (low). The SMR generally changes in accordance with the relation between actual and expected mortality rate differences only if the hospital's SMR equals 1, in which case $\Omega^{ext}_{hkl}(\eta) \gtreqless 0$ if $(p_{hk} - p_{hl}) \gtreqless (p^{e,ext}_{k} - p^{e,ext}_{l})$. As noted above, performance in line with expected mortality rates ($p_{hk} = p^{e,ext}_{k}$ and $p_{hl} = p^{e,ext}_{l}$) does not imply that the SMR is insensitive to the number of patients belonging to the considered strata. Under this condition, $p_{hk} - p_{hl} > 0$ implies that $\Omega^{ext}_{hkl}(\eta) \gtreqless 0$ if $SMR^{ext}_h(n_{hk}, n_{hl}) \lesseqgtr 1$. Thus, a shift of patients from a stratum with a lower to a stratum with a higher mortality rate leads to an increase (decrease) in the SMR if the hospital's SMR initially is smaller (greater) than 1. By the same token, a shift of patients from a stratum with a higher to a stratum with a lower mortality rate ($p_{hk} - p_{hl} < 0$) implies that $\Omega^{ext}_{hkl}(\eta) \gtreqless 0$ if $SMR^{ext}_h(n_{hk}, n_{hl}) \gtreqless 1$ if the hospital performs in line with expected mortality rates. In this scenario, the SMR of the hospital increases (decreases) if its initial SMR is above (below) unity.
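The sign conditions discussed above can be retraced with a short derivation from the definition of the externally standardized SMR. The following is a sketch only; the shorthand symbols $D_h$, $E_h$, $a$, and $b$ are introduced here for illustration and are not part of the paper's notation, and expected deaths are assumed to be positive before and after the shift.

```latex
\begin{align*}
D_h &= \sum_{s=1}^{S} n_{hs}\, p_{hs}, \qquad
E_h = \sum_{s=1}^{S} n_{hs}\, p^{e,ext}_{s}, \qquad
a = p_{hk} - p_{hl}, \qquad
b = p^{e,ext}_{k} - p^{e,ext}_{l}, \\[4pt]
\Omega^{ext}_{hkl}(\eta)
  &= \frac{D_h + \eta a}{E_h + \eta b} - \frac{D_h}{E_h}
   = \frac{\eta\,(a E_h - b D_h)}{E_h\,(E_h + \eta b)} \\[4pt]
  &= \frac{\eta\,\bigl(a - b\; SMR^{ext}_h(n_{hk}, n_{hl})\bigr)}{E_h + \eta b}
   = \frac{\eta\,\bigl(a - b\; SMR^{ext}_h(n_{hk} + \eta, n_{hl} - \eta)\bigr)}{E_h}.
\end{align*}
```

Because $E_h > 0$ and $E_h + \eta b > 0$, the sign of the change equals the sign of $a E_h - b D_h$, whichever of the two equivalent forms is used. If the hospital performs in line with expected mortality rates in both strata, then $a = b$ and the change reduces to $\eta\, a\, (E_h - D_h) / \bigl(E_h (E_h + \eta b)\bigr)$, which vanishes only if $a = 0$ or the initial SMR equals 1.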
In the extreme case, in which all patients are concentrated in a specific stratum $k$ ($n_{hk} = n_h$), the hospital's SMR equals the relation between the actual and the expected mortality rate of that stratum, i.e. $SMR^{ext}_h = p_{hk} / p^{e,ext}_{k}$.
For illustration of case-mix sensitivity under external standardization, we considered two hospitals and three strata of patients (Table 1). Both hospitals had the same case mix, with 20 − η patients belonging to stratum 1, η patients belonging to stratum 2 and 5 patients belonging to stratum 3. The parameter η is used to determine the allocation of patients to stratum 1 and stratum 2. If η = 0, both hospitals had 20 patients in stratum 1 and 0 patients in stratum 2. If η = 20, 20 patients were allocated to stratum 2 while the hospitals had no patients in stratum 1. Furthermore, both hospitals performed in line with expected mortality rates in strata 1 and 2.
The only difference between the hospitals is that hospital 1 had a higher-than-expected mortality rate in stratum 3 (0.2 > 0.15) while hospital 2 performed better than expected in this stratum (0.1 < 0.15). Accordingly, the SMR of hospital 1 exceeds the value of 1 while the SMR of hospital 2 is below unity. Fig 1 shows the SMRs of the hospitals for different allocations of patients to strata 1 and 2 as induced by different values of η. Although both hospitals were identical in case mix and performed in line with expected mortality rates in both affected strata, their SMRs are affected by a shift of patients from stratum 1 to stratum 2. As indicated by Eqs 8-10, hospital 1 experiences an increase in its SMR whereas the SMR of hospital 2 decreases when the number of patients allocated to stratum 2 is increased (i.e. η is increased). This is because mortality in stratum 2 was lower than mortality in stratum 1 and the SMR of hospital 1 exceeds unity while the SMR of hospital 2 is below unity.
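The pattern described for this example can be reproduced numerically. The sketch below uses the stratum sizes and the stratum 3 rates stated in the text; the mortality rates of strata 1 and 2 (0.10 and 0.05) are assumed values chosen only so that stratum 2 has the lower rate, since the table values are not reproduced here. The helper `smr` and all variable names are hypothetical.

```python
def smr(n, p, p_exp):
    """Observed deaths divided by expected deaths for one hospital."""
    observed = sum(n_s * p_s for n_s, p_s in zip(n, p))
    expected = sum(n_s * e_s for n_s, e_s in zip(n, p_exp))
    return observed / expected


# Expected rates per stratum; the values for strata 1 and 2 are ASSUMED.
P_EXP = [0.10, 0.05, 0.15]
# Both hospitals match the expected rates in strata 1 and 2 and differ in stratum 3.
P_HOSP1 = [0.10, 0.05, 0.20]  # worse than expected in stratum 3
P_HOSP2 = [0.10, 0.05, 0.10]  # better than expected in stratum 3


def case_mix(eta):
    """Patient numbers per stratum after shifting eta patients to stratum 2."""
    return [20 - eta, eta, 5]


etas = range(0, 21, 5)
smr1 = [smr(case_mix(eta), P_HOSP1, P_EXP) for eta in etas]
smr2 = [smr(case_mix(eta), P_HOSP2, P_EXP) for eta in etas]

for eta, s1, s2 in zip(etas, smr1, smr2):
    print(f"eta={eta:2d}  SMR hospital 1={s1:.3f}  SMR hospital 2={s2:.3f}")
```

Under these assumed rates the SMR of hospital 1 moves further above 1 and the SMR of hospital 2 moves further below 1 as η grows, although neither hospital's stratum-specific performance changes.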
Variations in hospital size under external standardization. To examine variations in hospital size, we considered a proportional shift in the numbers of patients treated in all strata of a hospital by the scale factor $\lambda \in \mathbb{R}^{+}$, where λ = 1 is the initial scale of the hospital. For λ > 1, this reflects a situation in which the total number of patients treated in the hospital is increased by factor λ while the case mix (i.e. the shares of the strata in the hospital's total number of patients) is held constant. For the SMR under external standardization it follows that
$$SMR^{ext}_h(\lambda) = \frac{\sum_{s=1}^{S} \lambda\, n_{hs}\, p_{hs}}{\sum_{s=1}^{S} \lambda\, n_{hs}\, p^{e,ext}_{s}} = \frac{\sum_{s=1}^{S} n_{hs}\, p_{hs}}{\sum_{s=1}^{S} n_{hs}\, p^{e,ext}_{s}} = SMR^{ext}_h$$
Since the value of the scaled SMR is the same as the value of the original SMR, the SMR fulfills the axiomatic requirement of scale insensitivity under external standardization. Increases in hospital size do not change the value of the SMR, ceteris paribus.

Scale insensitivity under external standardization is illustrated by the example of a hospital with two strata, containing 20 and 40 patients, respectively, in the initial situation (Table 2). The mortality rates were assumed to be 0.05 in the first and 0.15 in the second stratum. Expected mortality rates in both strata were fixed at the value of 0.1. In the initial situation, this corresponds to 7 observed and 6 expected deaths, which results in an SMR of 1.17. Doubling the size of the hospital while holding case mix constant (λ = 2) doubles both the number of patients and the number of deaths in each stratum. However, the SMR of the hospital remains constant at the value of 1.17. The same is true for further increases of hospital size as induced by higher values of λ.
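The numbers of this example can be checked directly. The following sketch uses the values stated in the text (20 and 40 patients, actual rates 0.05 and 0.15, expected rate 0.1 in both strata); the helper `smr` and the chosen scale factors beyond λ = 2 are illustrative assumptions.

```python
def smr(n, p, p_exp):
    """Observed deaths divided by expected deaths for one hospital."""
    observed = sum(n_s * p_s for n_s, p_s in zip(n, p))
    expected = sum(n_s * e_s for n_s, e_s in zip(n, p_exp))
    return observed / expected


n_initial = [20, 40]   # patients per stratum at scale lambda = 1
p_actual = [0.05, 0.15]
p_expected = [0.10, 0.10]

# Scale every stratum by the same factor so that the case mix stays constant.
results = {
    lam: smr([lam * n_s for n_s in n_initial], p_actual, p_expected)
    for lam in (1, 2, 5, 10)
}

for lam, value in results.items():
    print(f"lambda={lam:2d}  SMR={value:.2f}")
```

The common factor λ cancels between observed and expected deaths, which is why every scale factor yields the same ratio of 7 to 6.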
Variations in actual mortality rates under external standardization. Effects of variations in actual mortality rates were examined by calculating the marginal effect (i.e. the partial derivative) [29] of an increase in the mortality rate of stratum $k$ in hospital $h$ on the hospital's SMR:
$$ME^{ext}_{h,p_{hk}} = \frac{\partial SMR^{ext}_h}{\partial p_{hk}} = \frac{n_{hk}}{n_h\, \bar{p}^{e,ext}_h} \qquad (13)$$
If $n_{hk} > 0$, Eq 13 implies that $ME^{ext}_{h,p_{hk}} > 0$, i.e. an increase in the mortality rate of a specific stratum increases the SMR of the hospital. The SMR under external standardization therefore fulfills the axiomatic requirement of strict monotonicity. The increase in the SMR induced by an increase in the stratum-specific mortality rate is relatively large (small) if the patients included in the stratum account for a large (small) share $n_{hk}/n_h$ of patients treated in the hospital. Furthermore, the marginal effect decreases in the hospital's expected overall mortality rate $\bar{p}^{e,ext}_h$. The latter implies that an increase in stratum-specific mortality generally affects hospitals differently, as $\bar{p}^{e,ext}_h$ depends on a hospital's case mix. This result also applies in the case in which the hospital's actual mortality rates of all strata are increased by the absolute amount of $dp$. This corresponds to a situation in which the overall mortality rate of the hospital is increased by $dp$. Calculating the differential of Eq 5 in all actual mortality rates and using $dp_{hs} = dp$, $s = 1, \ldots, S$, yields
$$dSMR^{ext}_h = \sum_{s=1}^{S} \frac{\partial SMR^{ext}_h}{\partial p_{hs}}\, dp_{hs} = \frac{dp}{\bar{p}^{e,ext}_h} \qquad (14)$$
Similar to an increase in the mortality rate of a single stratum, increases in the mortality rates of all strata have a large (small) impact on the hospital's SMR when the hospital's overall expected mortality rate is small (large).
The results on mortality rate variations under external standardization are illustrated by the example of two hospitals and three strata of patients (Table 3). Both hospitals treated 10 patients, with 5 belonging to stratum 1. The difference between the hospitals was that the remaining 5 patients of hospital 1 belonged to stratum 2 whereas those of hospital 2 belonged to stratum 3. In all strata, the hospitals performed in line with expected mortality rates, such that the SMR of both hospitals in the initial situation is 1.
Holding the remaining parameter values constant, Fig 2 shows the SMRs of the hospitals for different mortality rates in stratum 1. Note that the mortality rates in stratum 1 were varied simultaneously for hospital 1 and hospital 2 in each scenario ($p_{11} = p_{21}$), such that there is no difference in the performance of the hospitals with respect to stratum 1. While the SMRs of both hospitals are equal in the initial situation, lower-than-expected mortality rates in stratum 1 ($p_{11} = p_{21} < 0.1$) imply that the SMR of hospital 1 is lower than the SMR of hospital 2. For higher-than-expected mortality rates ($p_{11} = p_{21} > 0.1$), the SMR of hospital 1 is higher than the SMR of hospital 2. The reason for this result is that (actual and expected) mortality rates of hospital 2 in stratum 3 are higher than (actual and expected) mortality rates of hospital 1 in stratum 2. This implies that the expected overall mortality rate of hospital 2 is higher than the expected overall mortality rate of hospital 1. According to Eq 13, this implies that the SMR of hospital 1 reacts more sensitively to changes in the actual mortality rate of stratum 1 than the SMR of hospital 2. The example therefore demonstrates that two hospitals with identical deviations of actual from expected mortality rates generally do not have the same SMR value. Hence, the SMR under external standardization violates the equivalence principle.
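A small numerical sketch makes the violation of the equivalence principle concrete. The patient numbers and the expected rate of 0.1 in stratum 1 follow the text; the expected rates of strata 2 and 3 (0.05 and 0.20) are assumed values chosen only so that stratum 3 has the higher rate. The helpers `smr` and `smr_pair` are hypothetical names.

```python
def smr(n, p, p_exp):
    """Observed deaths divided by expected deaths for one hospital."""
    observed = sum(n_s * p_s for n_s, p_s in zip(n, p))
    expected = sum(n_s * e_s for n_s, e_s in zip(n, p_exp))
    return observed / expected


N_HOSP1 = [5, 5, 0]           # hospital 1 treats strata 1 and 2
N_HOSP2 = [5, 0, 5]           # hospital 2 treats strata 1 and 3
P_EXP = [0.10, 0.05, 0.20]    # rates for strata 2 and 3 are ASSUMED


def smr_pair(p_stratum1):
    """SMRs of both hospitals when both share the same actual rate in stratum 1.

    Strata 2 and 3 are kept in line with the expected rates.
    """
    p_actual = [p_stratum1, P_EXP[1], P_EXP[2]]
    return smr(N_HOSP1, p_actual, P_EXP), smr(N_HOSP2, p_actual, P_EXP)


for p1 in (0.05, 0.10, 0.15):
    s1, s2 = smr_pair(p1)
    print(f"p_11 = p_21 = {p1:.2f}  SMR hospital 1={s1:.3f}  SMR hospital 2={s2:.3f}")
```

Both hospitals deviate from the expected rates in exactly the same way, yet their SMRs coincide only when the deviation is zero, because the same excess or shortfall in deaths is divided by different expected death counts.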
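Variations in expected mortality rates under external standardization. The marginal effect of an increase in the expected mortality rate of stratum $k$ on the SMR of hospital $h$ is
$$ME^{ext}_{h,p^{e}_{k}} = \frac{\partial SMR^{ext}_h}{\partial p^{e,ext}_{k}} = -\frac{n_{hk}}{n_h} \cdot \frac{SMR^{ext}_h}{\bar{p}^{e,ext}_h} \qquad (15)$$
According to Eq 15, $n_{hk} > 0$ implies that $ME^{ext}_{h,p^{e}_{k}} < 0$, i.e. an increase in the expected mortality rate of a stratum reduces the hospital's SMR if it treated patients belonging to this stratum. This reduction is (in absolute terms) larger for hospitals with higher SMRs, a larger share $n_{hk}/n_h$ of patients belonging to the considered stratum, and lower expected overall mortality rates $\bar{p}^{e,ext}_h$. Thus, effects of variations in stratum-specific expected mortality rates depend on the hospital's case mix and the initial value of the hospital's SMR.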
The same applies to an increase in all stratum-specific expected mortality rates by the absolute amount of $dp^{e,ext}_{s} = dp$, $s = 1, \ldots, S$, as the associated change in the SMR depends on both the size of the hospital's SMR and the expected overall mortality rate:
$$dSMR^{ext}_h = \sum_{s=1}^{S} \frac{\partial SMR^{ext}_h}{\partial p^{e,ext}_{s}}\, dp^{e,ext}_{s} = -\frac{SMR^{ext}_h}{\bar{p}^{e,ext}_h}\, dp$$
To illustrate the effects of changes in expected mortality rates under external standardization, we considered two hospitals and two strata of patients (Table 4). Both hospitals had 5 patients with a mortality rate of 0.1 in stratum 1. The hospitals differed with respect to stratum 2, where hospital 1 had 5 patients with a mortality rate of 0.2 and hospital 2 had 15 patients with a mortality rate of 0.15. Note that both hospitals performed worse than expected in stratum 2 as the expected mortality rate was 0.1. Overall, hospital 2 was performing better than hospital 1 due to equal actual mortality rates in stratum 1 and a lower mortality rate in stratum 2. The accompanying figure shows the effects of variations in the expected mortality rate of stratum 1 on the SMRs of the hospitals. Starting at low expected mortality rates of stratum 1, the SMR of hospital 1 is higher than the SMR of hospital 2, implying that hospital 2 performed better than hospital 1. Increasing the expected mortality rate of stratum 1 reduces the SMRs of both hospitals. However, since hospital 1 has a higher share of patients in stratum 1 and a higher initial SMR, it experiences a stronger decrease in its SMR. At an expected mortality rate of $p^{e,ext}_{1} = 0.14$, the SMRs of both hospitals are equal. For further increased expected mortality rates of stratum 1, the SMR of hospital 1 becomes lower than the SMR of hospital 2 although the overall performance of hospital 2 was better than the performance of hospital 1. This result is driven by the fact that stratum 1 accounts for a higher share of patients in hospital 1 than in hospital 2. By virtue of Eq 15, this implies that hospital 1 "benefits" more from increases in the expected mortality rate of this stratum in terms of reductions in the SMR even if the SMRs of both hospitals are equal. With respect to the formulated axiomatic requirements, the example therefore demonstrates that the SMR under external standardization violates the dominance principle.
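The reversal described in this example can be verified with the values given in the text (5 and 5 patients with rates 0.1 and 0.2 in hospital 1, 5 and 15 patients with rates 0.1 and 0.15 in hospital 2, expected rate 0.1 in stratum 2). The sketch below is illustrative; the helper names `smr`, `dominates`, and `smr_pair` and the evaluated grid of expected rates are assumptions.

```python
def smr(n, p, p_exp):
    """Observed deaths divided by expected deaths for one hospital."""
    observed = sum(n_s * p_s for n_s, p_s in zip(n, p))
    expected = sum(n_s * e_s for n_s, e_s in zip(n, p_exp))
    return observed / expected


def dominates(p_a, p_b):
    """True if hospital a is at least as good in every stratum and better in one."""
    return all(a <= b for a, b in zip(p_a, p_b)) and any(a < b for a, b in zip(p_a, p_b))


N1, P1 = [5, 5], [0.10, 0.20]     # hospital 1
N2, P2 = [5, 15], [0.10, 0.15]    # hospital 2


def smr_pair(p_exp_stratum1, p_exp_stratum2=0.10):
    """SMRs of both hospitals for a given expected rate in stratum 1."""
    p_exp = [p_exp_stratum1, p_exp_stratum2]
    return smr(N1, P1, p_exp), smr(N2, P2, p_exp)


print("hospital 2 dominates hospital 1:", dominates(P2, P1))
for p_exp_1 in (0.05, 0.10, 0.14, 0.20):
    s1, s2 = smr_pair(p_exp_1)
    print(f"expected rate stratum 1 = {p_exp_1:.2f}  SMR 1 = {s1:.3f}  SMR 2 = {s2:.3f}")
```

Hospital 2 dominates hospital 1 in the sense of the dominance principle for every value of the expected rate, but the ranking implied by the SMRs flips once the expected rate of stratum 1 exceeds 0.14.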


Internal standardization
Under internal standardization, the stratum-specific expected mortality rates $p^{e,int}_{s}$ are calculated from the same dataset that is used for the later analysis. The SMR of hospital $h$ based on the internal standard therefore is expressed as
$$SMR^{int}_h = \frac{\bar{p}_h}{\bar{p}^{e,int}_h},$$
where
$$\bar{p}^{e,int}_h = \sum_{s=1}^{S} \frac{n_{hs}}{n_h}\, p^{e,int}_{s}$$
is the hospital's internally standardized expected mortality rate. Since expected mortality rates were derived from the same dataset used for calculation of the hospitals' SMRs, they implicitly depend on the stratum-specific mortality rates $p_{js}$ and the number of patients $n_{js}$ of the included hospitals $j = 1, \ldots, H$. The internal standard may be chosen in different ways. In nonparametric SMR estimations, the internal standard is often chosen such that the expected mortality rate of each stratum equals the weighted average mortality rate of that stratum across all hospitals, i.e.
$$p^{e,int}_{s} = \bar{p}_s = \frac{1}{n_s} \sum_{j=1}^{H} n_{js}\, p_{js},$$
where $n_s = \sum_{j=1}^{H} n_{js}$ is the total number of patients in stratum $s$. In terms of interpretability, this approach to standardization has the advantage that a hospital with average mortality rates in all strata ($p_{hs} = \bar{p}_s$, $s = 1, \ldots, S$) has $SMR^{int}_h = 1$.
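The dependence of the internal standard on the included hospitals can be illustrated with a small sketch. It assumes the nonparametric choice described above (expected rate of a stratum equals the patient-weighted average rate across hospitals) and that every stratum contains at least one patient; the two hospitals, their patient numbers, and all function names are hypothetical and not taken from the paper's tables.

```python
def internal_expected_rates(n, p):
    """Patient-weighted average mortality rate per stratum across all hospitals.

    n[h][s] -- patients of hospital h in stratum s
    p[h][s] -- actual mortality rate of hospital h in stratum s
    """
    rates = []
    for s in range(len(n[0])):
        patients = sum(n_h[s] for n_h in n)
        deaths = sum(n_h[s] * p_h[s] for n_h, p_h in zip(n, p))
        rates.append(deaths / patients)
    return rates


def internal_smrs(n, p):
    """Internally standardized SMR of every hospital in the dataset."""
    p_exp = internal_expected_rates(n, p)
    smrs = []
    for n_h, p_h in zip(n, p):
        observed = sum(a * b for a, b in zip(n_h, p_h))
        expected = sum(a * e for a, e in zip(n_h, p_exp))
        smrs.append(observed / expected)
    return smrs


# Hypothetical dataset with two hospitals and two strata.
rates = [[0.05, 0.15], [0.10, 0.20]]
base = internal_smrs([[20, 40], [40, 20]], rates)

# Hospital 1 grows tenfold with unchanged case mix and unchanged mortality rates.
scaled = internal_smrs([[200, 400], [40, 20]], rates)

print("initial SMRs:", [round(v, 3) for v in base])
print("after scaling hospital 1:", [round(v, 3) for v in scaled])
```

In this hypothetical dataset the SMR of hospital 1 moves toward 1 when the hospital grows, because a larger hospital contributes more to the averages it is compared against. This is one way to see why scale insensitivity, which holds under external standardization, is lost under internal standardization.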

Variations in case mix under internal standardization
For a shift of $\eta$ patients from stratum $l$ to stratum $k$, the threshold mortality rates $\tilde{p}_{hk}$ and $\tilde{p}_{hl}$ represent weighted averages of the hospital's stratum-specific mortality rates ($p_{hk}$, $p_{hl}$) and the respective average stratum-specific mortality rates ($\bar{p}_k$, $\bar{p}_l$). The weights $\alpha_{hk} = (n_{hk} + \eta)/(n_k + \eta)$ and $\alpha_{hl} = (n_{hl} - \eta)/(n_l - \eta)$ reflect the degree to which hospital $h$ accounts for the total number of patients in the considered strata. Similar to the SMR under external standardization, the SMR under internal standardization generally changes due to a change in case mix. Thus, it does not fulfill the axiomatic requirement of case-mix insensitivity. Similar to the results for case-mix variations under external standardization (Eqs 8-10), the direction of change in the SMR induced by a shift of patients from stratum $l$ to stratum $k$ depends on the difference of the actual stratum-specific mortality rates ($p_{hk} - p_{hl}$) and the SMR-weighted difference in the endogenous threshold mortality rates ($\tilde{p}_{hk} - \tilde{p}_{hl}$).

In the extreme case in which the hospital accounts for the total number of patients in both strata ($n_{hk} = n_k$, $n_{hl} = n_l$), it holds that $\alpha_{hk} = \alpha_{hl} = 1$, which implies that $\tilde{p}_{hk} = p_{hk}$ and $\tilde{p}_{hl} = p_{hl}$. The direction of change in the SMR is then governed by the sign of $(1 - SMR^{int}_h)(p_{hk} - p_{hl})$. Hence, a shift of patients from a stratum with a lower to a stratum with a higher mortality rate increases (decreases) the SMR of hospitals with below-average (above-average) SMRs. Similarly, a shift of patients from a stratum with a higher to a stratum with a lower mortality rate decreases (increases) the SMR of hospitals with below-average (above-average) SMRs. These results are driven by the assumption that the hospital fully serves as its own reference in both strata. Hence, a concentration of patients in a stratum with a relatively high actual mortality rate implies a greater "benefit" in terms of a lower SMR for hospitals with above-average SMRs and vice versa.

In the other extreme case, the hospital accounts for a negligible share of the strata's total number of patients. Holding $n_{hk}$ and $n_{hl}$ constant, it can be derived that $\lim_{n_k \to \infty} \alpha_{hk} = \lim_{n_l \to \infty} \alpha_{hl} = 0$, which implies that $\lim_{n_k \to \infty} \tilde{p}_{hk} = \bar{p}_k(n_{hk})$ and $\lim_{n_l \to \infty} \tilde{p}_{hl} = \bar{p}_l(n_{hl})$.
For large values of $n_k$ and $n_l$ relative to $n_{hk}$ and $n_{hl}$, respectively, the stratum-specific average mortality rates $\bar{p}_k(n_{hk})$ and $\bar{p}_l(n_{hl})$ are almost exclusively determined by the mortality rates of the other hospitals.

For illustration, we considered the case of two hospitals and two strata of patients (Table 5). With regard to stratum 1, both hospitals were characterized by a mortality rate of 0.1, implying that the expected mortality rate of stratum 1 is also 0.1. With respect to stratum 2, hospital 1 was characterized by a higher mortality rate than hospital 2 (0.3 > 0.1). While the patient numbers of hospital 2 allocated to the strata were fixed at values of 25 and 10, respectively, the parameter $\eta$ determined the number of patients treated in hospital 1 belonging to stratum 1 and 2, respectively. If $\eta = 0$, all patients of hospital 1 were allocated to stratum 1 and no patient was allocated to stratum 2. If $\eta = 50$, no patient of stratum 1 was treated in hospital 1 while the hospital treated 50 patients belonging to stratum 2.
The SMRs of both hospitals for different values of $\eta$ are shown in Fig 4. As the mortality rates of hospital 1 were higher than or equal to those of hospital 2, the SMR of hospital 1 is at or above unity while the SMR of hospital 2 is at or below unity. Starting from the situation in which all patients of hospital 1 are allocated to stratum 1 ($\eta = 0$), increases in $\eta$ lead to an increase in the SMR of hospital 1 and a decrease in the SMR of hospital 2. This behavior is in line with the fact that higher values of $\eta$ imply that more patients of hospital 1 are shifted from a stratum with a mortality rate equal to expected mortality to the stratum with a higher-than-expected mortality rate. However, at a certain number of patients shifted from stratum 1 to stratum 2, the SMR of hospital 1 no longer changes with $\eta$. At this point, the size of the hospital's SMR and its influence on the expected mortality rate of stratum 2 have become sufficiently large to meet the condition stated by Eq 23. When the number of patients of hospital 1 shifted from stratum 1 to stratum 2 is further increased, the SMR of hospital 1 even starts to decrease, as implied by Eq 24.
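The non-monotonic pattern described for hospital 1 can be reproduced with a short Python sketch. It uses the mortality rates and the patient numbers of hospital 2 stated in the text and assumes, based on the description of the cases η = 0 and η = 50, that hospital 1 treats 50 patients in total. The helper names are illustrative.

```python
def internal_smrs(patients, rates):
    """Internally standardized SMR of every hospital.

    patients[j][s] and rates[j][s] refer to hospital j and stratum s.
    Expected rates are patient-weighted averages across all hospitals.
    """
    n_strata = len(patients[0])
    expected = []
    for s in range(n_strata):
        n_s = sum(row[s] for row in patients)
        deaths_s = sum(n[s] * p[s] for n, p in zip(patients, rates))
        expected.append(deaths_s / n_s)
    smrs = []
    for n, p in zip(patients, rates):
        observed = sum(n_s * p_s for n_s, p_s in zip(n, p))
        expected_deaths = sum(n_s * e_s for n_s, e_s in zip(n, expected))
        smrs.append(observed / expected_deaths)
    return smrs


def smrs_for_shift(eta, total_hospital_1=50):
    """SMRs of hospitals 1 and 2 when eta patients of hospital 1 are in stratum 2.

    total_hospital_1 = 50 is an assumption inferred from the text.
    """
    patients = [[total_hospital_1 - eta, eta], [25, 10]]
    rates = [[0.1, 0.3], [0.1, 0.1]]
    return internal_smrs(patients, rates)


if __name__ == "__main__":
    for eta in range(0, 51, 10):
        smr_1, smr_2 = smrs_for_shift(eta)
        print(f"eta = {eta:2d}  SMR1 = {smr_1:.3f}  SMR2 = {smr_2:.3f}")
```

Under these assumptions the SMR of hospital 1 first rises with η and then falls again, whereas the SMR of hospital 2 declines throughout, although the mortality rates of hospital 2 never change.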

Variations in hospital size under internal standardization
Since Eq 32 shows that the stratum-specific expected mortality rates depend on $\lambda$, the SMR does not fulfill the axiomatic requirement of scale insensitivity under internal standardization. For further investigation, the change in the SMR due to increasing the number of patients by factor $\lambda$ is defined as the difference between the SMR after and before scaling (Eq 33). Eq 33 implies that the SMR of hospital $h$ increases (decreases) due to an increase in hospital size if the hospital's expected overall mortality rate decreases (increases) due to scaling. Furthermore, the magnitude of change induced by scaling is (in absolute terms) higher (lower) for hospitals with higher (lower) initial SMRs.

The direction of change in the SMR depends on how the hospital's stratum-specific mortality rates compare with the stratum averages. For a hospital with above-average stratum-specific mortality rates in all strata, Eqs 37-39 imply a decrease in the SMR when the scale of that hospital is increased. In contrast, the SMR of a hospital performing better than average in all strata increases when its size is increased while holding case mix constant. In accordance with these results, it can be derived that

$$\lim_{\lambda \to \infty} SMR^{int}_h = 1$$

since Eq 32 implies that $\lim_{\lambda \to \infty} p^{e,int}_s(\lambda n_{hs}) = p_{hs}$. Hence, the SMR under internal standardization approaches (but does not cross) unity when the scale of a hospital is increased. These results reflect that the hospital increasingly becomes its own reference when its size is increased because it increasingly dominates the value of the stratum-specific expected mortality rates.
For illustration of scale sensitivity under internal standardization, we considered three hospitals and three strata of patients (Table 6). Hospitals 1 and 2 had patients in strata 1 and 2 and no patient belonging to stratum 3. Hospital 3 treated patients belonging to strata 2 and 3 but no patient belonging to stratum 1. In terms of mortality rates, hospital 2 performed better than the other hospitals in all strata. Hospital 2 performed better than hospital 3 in stratum 2. The patient numbers of hospital 1 in all strata were scaled by factor λ.
As depicted in Fig 5, in the initial situation ($\lambda = 1$) hospital 1 has the highest SMR while hospital 2 has the lowest SMR. The SMR of hospital 3 exceeds unity, indicating worse-than-average performance, but is lower than the SMR of hospital 1. Doubling the size of hospital 1 ($\lambda = 2$) leads to a decrease in the SMRs of all three hospitals. This is due to the increased weight of hospital 1 in the calculation of the expected mortality rates of strata 1 and 2. Since the stratum-specific mortality rates of hospital 1 are higher than the average mortality rates in the initial situation, this results in an increase in expected mortality rates (see Eqs 37-39). However, the induced decrease in the SMR is strongest for hospital 1, implying that it moves closer to hospital 3 in terms of the overall performance assessment. For further increased scales of hospital 1 ($\lambda \geq 3$), this trend continues and the SMR of hospital 1 becomes lower than the SMR of hospital 3.
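Scale sensitivity can also be explored numerically. The Python sketch below scales the patient numbers of one hospital by a factor λ and recomputes all internally standardized SMRs. The data are hypothetical and only mimic the structure of the example (hospitals 1 and 2 in strata 1 and 2, hospital 3 in strata 2 and 3); they are not the values of Table 6, and all names are illustrative.

```python
def internal_smrs(patients, rates):
    """Internally standardized SMR of every hospital.

    patients[j][s] and rates[j][s] refer to hospital j and stratum s.
    Expected rates are patient-weighted averages across all hospitals.
    """
    n_strata = len(patients[0])
    expected = []
    for s in range(n_strata):
        n_s = sum(row[s] for row in patients)
        deaths_s = sum(n[s] * p[s] for n, p in zip(patients, rates))
        expected.append(deaths_s / n_s)
    smrs = []
    for n, p in zip(patients, rates):
        observed = sum(n_s * p_s for n_s, p_s in zip(n, p))
        expected_deaths = sum(n_s * e_s for n_s, e_s in zip(n, expected))
        smrs.append(observed / expected_deaths)
    return smrs


# Hypothetical data (not Table 6). A rate of 0.0 is a placeholder for
# strata in which a hospital treats no patients.
PATIENTS = [[20, 20, 0], [20, 20, 0], [0, 20, 20]]
RATES = [[0.2, 0.4, 0.0], [0.1, 0.2, 0.0], [0.0, 0.3, 0.3]]


def smrs_after_scaling(scaled_hospital, lam):
    """SMRs of all hospitals after scaling one hospital's patient numbers by lam."""
    patients = [
        [lam * n for n in row] if j == scaled_hospital else list(row)
        for j, row in enumerate(PATIENTS)
    ]
    return internal_smrs(patients, RATES)


if __name__ == "__main__":
    for lam in (1, 2, 10, 100):
        smrs = smrs_after_scaling(0, lam)
        print(f"lambda = {lam:3d}  " + "  ".join(f"{s:.3f}" for s in smrs))
```

With these hypothetical numbers, scaling the worse-than-average hospital 1 pulls its SMR down towards 1 without crossing it, while scaling the better-than-average hospital 2 (first argument 1) pushes its SMR up towards 1.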

Variations in actual mortality rates under internal standardization
The marginal effect of an increase in the mortality rate of stratum $k$ in hospital $h$ can be derived as

$$\frac{\partial SMR^{int}_h}{\partial p_{hk}} = \frac{n_{hk}}{n_h} \frac{1}{p^{e,int}_h} - \frac{n_{hk}}{n_h} \frac{SMR^{int}_h}{p^{e,int}_h} \frac{\partial p^{e,int}_k}{\partial p_{hk}}$$

Note that the first term on the right-hand side of Eq 41 is similar to the first term on the right-hand side of Eq 13. Thus, this term represents the direct effect of an increase in the stratum-specific mortality rate on the SMR of hospital $h$. However, when using an internal standard there is also an indirect effect, represented by the second term on the right-hand side of Eq 41. This indirect effect emerges from the fact that the expected mortality rate $p^{e,int}_k$ depends on the mortality rate $p_{hk}$ of patients included in this stratum treated in hospital $h$. Given that $\partial p^{e,int}_k / \partial p_{hk} = n_{hk}/n_k > 0$ if $n_{hk} > 0$, the second term on the right-hand side of Eq 41 is negative, which indicates that the indirect effect counteracts the positive direct effect of an increase in the stratum-specific mortality rate.
It follows that

$$\frac{\partial SMR^{int}_h}{\partial p_{hk}} = \frac{n_{hk}}{n_h} \frac{1}{p^{e,int}_h} \left( 1 - SMR^{int}_h \frac{n_{hk}}{n_k} \right), \quad \text{which is negative if} \quad SMR^{int}_h > \frac{n_k}{n_{hk}}$$

As shown by Eq 45, the marginal effect of $p_{hk}$ may even be negative if the hospital's SMR is high and the hospital accounts for a large share of patients in stratum $k$, i.e. if $n_k/n_{hk}$ is small. This corresponds to the paradoxical situation in which increasing mortality in a stratum of patients treated in a specific hospital reduces the SMR of that hospital. Hence, the SMR under internal standardization does not fulfill the axiomatic requirement of strict monotonicity. Since $n_k/n_{hk} \geq 1$, Eq 45 further shows that a negative marginal effect can only arise in hospitals with above-average SMRs, i.e. in hospitals with $SMR^{int}_h > 1$.

Considering a change in the mortality rates of all strata by the amount of $dp_{hs} = dp$, $s = 1, \ldots, S$, yields

$$dSMR^{int}_h = \frac{1}{p^{e,int}_h} \left( 1 - SMR^{int}_h \sum_{s=1}^{S} \frac{n_{hs}}{n_h} \frac{n_{hs}}{n_s} \right) dp$$

The expression describing the change in the internally standardized SMR (Eq 46) differs from the expression for the change in the externally standardized SMR (Eq 14) due to the factor in parentheses included in Eq 46. This factor is smaller than 1 if $SMR^{int}_h > 0$, implying that the increase in the SMR of a hospital due to an increase in its overall mortality rate is, generally, smaller under internal than under external standardization. Moreover, the sign of Eq 46 is ambiguous since

$$0 < \sum_{s=1}^{S} \frac{n_{hs}}{n_h} \frac{n_{hs}}{n_s} \leq 1$$

An increase in the overall mortality rate of a hospital therefore may reduce its SMR if the hospital's SMR exceeds the reciprocal of this sum. The upper bound of 1 is reached if all patients of the hospital are concentrated in a specific stratum $k$ ($n_{hk}/n_h = 1$) and the hospital accounts for all patients belonging to this stratum ($n_{hk}/n_k = 1$). Thus, paradoxical effects of mortality rate increases on the SMR may particularly arise in specialized hospitals treating specific patient groups that are seldom treated in other hospitals.

For illustration, we considered three hospitals and two strata of patients (Table 7). Hospitals 1 and 2 treated patients belonging to strata 1 and 2 whereas hospital 3 treated patients belonging to stratum 2 only.
With respect to stratum 2, hospital 1 had the highest mortality rate whereas hospital 3 had the lowest mortality rate. In the following, the mortality rate $p_{11} \in [0, 1]$ in stratum 1 of hospital 1 is varied for different shares $w_{11} \in \{0.6, 0.8, 1\}$ of patients in stratum 1 treated in hospital 1. The larger $w_{11}$, the higher the share of patients belonging to stratum 1 that were treated in hospital 1.
The results are shown in Fig 6. If 60% of all patients in stratum 1 are allocated to hospital 1 ($w_{11} = 0.6$), an increase in the mortality rate of these patients is associated with an increase in the SMR of hospital 1 and a decrease in the SMR of hospital 2. Note that the SMR of hospital 3 is not affected by variations in the mortality rate of stratum 1 as no patients belonging to this stratum were treated in hospital 3. When the share of patients belonging to stratum 1 allocated to hospital 1 is increased to 80% ($w_{11} = 0.8$), the SMR of hospital 1 decreases in the mortality rate $p_{11}$. This illustrates the paradoxical situation captured by Eq 45, in which increasing mortality in a stratum of patients reduces the SMR of the considered hospital. In the extreme case in which all patients in stratum 1 are treated in hospital 1 ($w_{11} = 1$), the inverse relationship between stratum-specific mortality and the SMR of hospital 1 becomes even more pronounced. If the mortality rate of stratum 1 in hospital 1 reaches 100%, the SMR of hospital 1 gets close to unity. Note that this scenario also illustrates that the SMR of hospital 1 can become lower than the SMR of hospital 3 although the mortality rate in stratum 2 (the only stratum with a positive number of patients treated in hospital 3) is 10% lower in hospital 3 than in hospital 1.
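The violation of strict monotonicity can be reproduced with hypothetical numbers. The Python sketch below follows the structure of the example (three hospitals, two strata, hospital 3 only in stratum 2, hospital 1 with the highest and hospital 3 with the lowest rate in stratum 2), but the patient numbers and rates are assumptions chosen for illustration and are not the values of Table 7. A total of 100 patients in stratum 1 is likewise assumed.

```python
def internal_smrs(patients, rates):
    """Internally standardized SMR of every hospital.

    patients[j][s] and rates[j][s] refer to hospital j and stratum s.
    Expected rates are patient-weighted averages across all hospitals.
    """
    n_strata = len(patients[0])
    expected = []
    for s in range(n_strata):
        n_s = sum(row[s] for row in patients)
        deaths_s = sum(n[s] * p[s] for n, p in zip(patients, rates))
        expected.append(deaths_s / n_s)
    smrs = []
    for n, p in zip(patients, rates):
        observed = sum(n_s * p_s for n_s, p_s in zip(n, p))
        expected_deaths = sum(n_s * e_s for n_s, e_s in zip(n, expected))
        smrs.append(observed / expected_deaths)
    return smrs


def smr_hospital_1(p11, w11):
    """SMR of hospital 1 given its stratum-1 mortality rate p11 and its
    share w11 of all stratum-1 patients (hypothetical data throughout)."""
    n11 = 100 * w11  # assumed: 100 stratum-1 patients in total
    patients = [[n11, 50], [100 - n11, 50], [0, 50]]
    rates = [[p11, 0.50], [0.10, 0.25], [0.0, 0.15]]
    return internal_smrs(patients, rates)[0]


if __name__ == "__main__":
    for w11 in (0.6, 0.8, 1.0):
        row = "  ".join(f"{smr_hospital_1(p, w11):.3f}" for p in (0.1, 0.3, 0.5))
        print(f"w11 = {w11:.1f}  SMR1 at p11 = 0.1, 0.3, 0.5:  {row}")
```

With these assumed values, the SMR of hospital 1 rises with its own stratum-1 mortality rate when it treats 60% of the stratum-1 patients, but falls when it treats 80% or all of them.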

Variations in expected mortality rates under internal standardization
Analogous to the SMR under external standardization, the SMR under internal standardization is affected by changes in expected mortality rates, which leads to a violation of the dominance principle. However, due to the endogeneity of the stratum-specific expected mortality rates p e;int s under internal standardization, variations in these expected mortality rates may be driven by variations in the mortality rates and patient compositions of all hospitals in the sample. The analyses shown above already highlighted effects of variations in mortality rates and patient composition of a hospital on its own SMR. In the following, we examine the influence of other hospitals.
First, the change in the SMR due to a change in the expected mortality rate of stratum $k$ is expressed as

$$\frac{\partial SMR^{int}_h}{\partial p^{e,int}_k} = -\frac{n_{hk}}{n_h} \frac{SMR^{int}_h}{p^{e,int}_h}$$

Eq 50 indicates that the SMR decreases (increases) if the expected mortality rate increases (decreases). Using Eq 18, the marginal effect of an increase in the mortality rate of stratum $k$ in hospital $i$ on the expected mortality rate of that stratum is

$$\frac{\partial p^{e,int}_k}{\partial p_{ik}} = \frac{n_{ik}}{n_k} \geq 0$$

In combination, Eqs 50-51 imply that the SMR of a hospital decreases when the stratum-specific mortality rates of other hospitals in the sample are increased. The reason is that such increases in mortality rates unambiguously increase the expected mortality rates of the affected strata. This effect is illustrated by the behavior of the SMR of hospital 2 in Fig 6, which decreases in the mortality rate of stratum 1 in hospital 1 as long as hospital 2 accounts for a positive number of patients in that stratum.
Second, adding $\eta$ patients to stratum $k$ in hospital $i$ implies

$$p^{e,int}_k(n_{ik} + \eta) - p^{e,int}_k(n_{ik}) = \frac{\eta}{n_k + \eta} \left[ p_{ik} - \bar{p}_k(n_{ik}) \right]$$

Hence, increasing the size of stratum $k$ in hospital $i$ (and, thus, its share in the total number of patients belonging to stratum $k$) increases (decreases) the expected mortality rate of that stratum if hospital $i$'s mortality rate of that stratum $p_{ik}$ is higher (lower) than the average mortality rate of that stratum $\bar{p}_k(n_{ik})$. The direction of change in the SMR of a hospital $h$ (Eq 50) due to a change in other hospitals' stratum-specific patient numbers therefore depends on whether those hospitals perform better or worse than average in the affected strata.
An illustration of this result is given by the variation in the SMR of hospital 2 in Fig 4. Since hospital 1 performs worse than average in stratum 2, increasing the number of patients η in this hospital belonging to that stratum reduces the SMR of hospital 2.
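The effect of additional patients on a stratum's expected mortality rate follows directly from the weighted-average definition of the internal standard: adding η patients with mortality rate p_ik to a stratum with n_k patients changes the pooled rate by η/(n_k + η) times the gap between p_ik and the current average. The Python sketch below checks this closed form against a direct recalculation, using hypothetical numbers for a single stratum; the function names are illustrative.

```python
def pooled_rate(patients, rates):
    """Patient-weighted average mortality rate of one stratum across hospitals."""
    return sum(n * p for n, p in zip(patients, rates)) / sum(patients)


def change_by_recalculation(patients, rates, i, eta):
    """Change in the pooled rate after adding eta patients to hospital i."""
    enlarged = list(patients)
    enlarged[i] += eta
    return pooled_rate(enlarged, rates) - pooled_rate(patients, rates)


def change_by_closed_form(patients, rates, i, eta):
    """eta / (n_k + eta) * (p_ik - current pooled rate)."""
    n_k = sum(patients)
    return eta / (n_k + eta) * (rates[i] - pooled_rate(patients, rates))


# Hypothetical stratum with two hospitals: hospital 0 performs worse than
# average, hospital 1 better than average.
PATIENTS = [20, 10]
RATES = [0.3, 0.1]

if __name__ == "__main__":
    for i in (0, 1):
        direct = change_by_recalculation(PATIENTS, RATES, i, eta=10)
        closed = change_by_closed_form(PATIENTS, RATES, i, eta=10)
        print(f"hospital {i}: direct = {direct:+.4f}, closed form = {closed:+.4f}")
```

Enlarging the worse-than-average hospital raises the expected rate of the stratum, which lowers the SMRs of the hospitals treating patients in that stratum; enlarging the better-than-average hospital has the opposite effect.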

Summary of results
Evaluating the derived properties of the SMR using the five axiomatic requirements formulated above yielded differences between external and internal standardization (Table 8). Under external standardization, the SMR fulfills the requirements of strict monotonicity and scale insensitivity but violates the requirement of case-mix insensitivity, the equivalence principle, and the dominance principle. All axiomatic requirements not fulfilled by the SMR under external standardization are also not fulfilled by the SMR under internal standardization due to similarity in their mathematical structure. Additionally, higher mortality rates may induce lower SMR values and the SMR of large hospitals is driven towards unity under internal standardization. The internally standardized SMR therefore also violates the requirements of strict monotonicity and scale insensitivity and, thus, fulfills none of the postulated axiomatic requirements.

Discussion
This paper proposed five axiomatic requirements for risk-standardized mortality measures (strict monotonicity, case-mix insensitivity, scale insensitivity, equivalence principle, dominance principle). Given these axiomatic requirements, properties of the SMR were formally investigated and evaluated. The results of our analyses indicate that several properties of the SMR hamper valid assessment and comparison of hospital performance based on this measure. This finding is highly relevant to public health, as clinicians, healthcare decision makers, the public, and all other users of quality of care information based on SMRs are confronted with potentially biased information and, thus, may draw inappropriate conclusions. Effects of variations in case mix on the SMR were found to depend not only on hospital size and the initial patient composition of a hospital but also on the size of its SMR. The effects of variations in actual mortality rates on the SMR depend on the hospital's expected overall mortality rate and, thus, on its case mix. Under external standardization, the stratum-specific expected mortality rates have a crucial influence on the size of the SMR. Paradoxically, variations in these expected mortality rates may reverse the rank of two hospitals although one of the hospitals unambiguously performs better than the other in terms of actual mortality rates.
While hospital size has no effect on the SMR under external standardization, this desirable property of scale insensitivity is absent under internal standardization. In this case, the SMR of large hospitals is, ceteris paribus, closer to 1 than the SMR of small hospitals. This result is driven by the fact that large hospitals have more influence on expected mortality rates than small hospitals under internal standardization. This influence on expected mortality rates also modifies the effect of variations in actual mortality rates on the SMR. In extreme cases, higher actual mortality rates may be related to a lower SMR of the considered hospital. This paradoxical effect may arise particularly in specialized hospitals that almost exclusively treat specific patient groups.
In summary, our findings significantly extend previous research on properties of the SMR [19-27] by formally deriving expressions and conditions describing the behavior of the SMR. In this way, this study provides a comprehensive and exact characterization of this commonly used hospital performance measure.

Limitations and prospects
The analyses presented in this paper provide a clear description of central properties of the SMR. However, although we constructed hypothetical examples illustrating these properties, we did not provide empirical examples based on real-world data. Presumably, the extent to which the described drawbacks of the SMR are empirically relevant depends on the considered indication, the choice of risk factors used to define patient strata, and the similarity of the considered hospitals with respect to case mix and mortality rates. While investigating these issues in specific settings is beyond the scope of this paper, future studies may examine the insights highlighted in this paper empirically. Such analyses may also make the SMR's properties, which were derived analytically in this paper, more accessible and understandable for a broader audience.
Some methodological issues related to the SMR that are known from previous studies were not presented again in detail in our analysis. SMR values were found to be sensitive to the choice of the estimation method [13], readmission rates [14], differences between hospitals with respect to coding quality [15], and correlation between quality of care and risk factors [16]. In some cases, changes in the SMR over time were primarily driven by changes in expected rather than actual mortality rates [17]. Moreover, violations of the assumption of identical relationships between mortality and its risk factors across all analyzed hospitals were shown to induce bias in the estimation of the SMR [18].
Furthermore, while this study revealed several undesirable properties of the SMR under both external and internal standardization, it did not provide an alternative measure of hospital performance. Some studies point to certain advantages of measures like the comparative mortality figure (CMF) or excess risk (ER) [16,30]. A problem with direct standardization approaches (as underlying the CMF) in the context of hospital performance assessment is that the number of considered risk strata is often large, which increases the likelihood that some hospitals have treated only a few or no patients in some strata. In this case, direct standardization may assign huge weights to small quantities of data. In the extreme case of zero observations for specific strata, the corresponding estimators are even undefined [31]. Hence, conventional approaches to direct standardization are often not applicable in the context of hospital performance assessment. Systematic analysis and development of suitable approaches to measurement of hospital performance [30,32,33] therefore may be a promising route for further methodological development. In this regard, regression approaches relying on multiplicative model formulations may be of special interest due to their direct relation to the calculation of the SMR outlined above [27]. While heterogeneity between hospitals in terms of case mix and mortality rates also reduces the validity of model-based approaches, they offer the advantage of making the assumptions required for valid analysis more explicit. This, in turn, may facilitate empirical assessment of the validity of model-based SMR estimations [27].

Practical implications
In contrast to internal standardization, external standardization ensures strict monotonicity and scale insensitivity. Hence, external standardization should generally be preferred over internal standardization in practical applications. This is particularly true when the number of analyzed hospitals is small or when there are large and/or specialized hospitals that almost exclusively treat specific patient groups. Nonetheless, practitioners should be aware of the potential drawbacks related to the use of the SMR under both standardization approaches. The SMR generally violates the requirement of case-mix insensitivity, the equivalence principle, and the dominance principle. Particularly in the presence of large heterogeneity of the analyzed hospitals in terms of case mix and mortality rates, the SMR cannot be trusted. As a general recommendation, empirical studies should therefore assess and report the degree of heterogeneity of the considered hospitals and take effects of heterogeneity into account when interpreting calculated SMRs. Useful approaches to assessing potential bias could build on the above-mentioned condition of proportionality, which must hold for SMR estimations to be valid [27]. However, further research is required to derive specific, reliable recommendations to assess potential bias in practical applications of hospital performance measurement.