Abstract
Objectives
The University of Wisconsin Population Health Institute has published the County Health Rankings since 2010. These rankings use population-based data to highlight health outcomes and the multiple determinants of these outcomes and to encourage in-depth health assessment for all United States counties. A significant methodological limitation, however, is the uncertainty of rank estimates, particularly for small counties. To address this challenge, we explore the use of longitudinal and pooled outcome data in hierarchical Bayesian models to generate county ranks with greater precision.
Methods
In our models we used pooled outcome data for three measure groups: (1) poor physical and poor mental health days; (2) percent of births with low birth weight and fair or poor health prevalence; and (3) age-specific mortality rates for nine age groups. We used the fixed and random effects components of these models to generate posterior samples of rates for each measure. We also used time-series data in longitudinal random effects models for age-specific mortality. Based on the posterior samples from these models, we estimated ranks and rank quartiles for each measure, as well as the probability of a county ranking in its assigned quartile. Rank quartile probabilities for univariate, joint outcome, and longitudinal models were compared to assess improvements in rank precision.
Results
The joint outcome model for poor physical and poor mental health days resulted in improved rank precision, as did the longitudinal model for age-specific mortality rates. Rank precision for low birth weight births and fair/poor health prevalence was equivalent between the univariate and joint outcome models.
Conclusion
Incorporating longitudinal or pooled outcome data may improve rank certainty, depending on characteristics of the measures selected. For measures with different determinants, joint modeling neither improved nor degraded rank precision. This approach suggests a simple way to use existing information to improve the precision of small-area measures of population health.
Citation: Athens JK, Remington PL, Gangnon RE (2015) Improving the Rank Precision of Population Health Measures for Small Areas with Longitudinal and Joint Outcome Models. PLoS ONE 10(6): e0130027. https://doi.org/10.1371/journal.pone.0130027
Academic Editor: Thomas Niederkrotenthaler, Medical University of Vienna, AUSTRIA
Received: February 27, 2013; Accepted: May 15, 2015; Published: June 22, 2015
Copyright: © 2015 Athens et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The work represented in this paper was supported by the Robert Wood Johnson Foundation under grant 65017, Mobilizing Action Toward Community Health (MATCH). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The County Health Rankings, first published in 2010 by the University of Wisconsin Population Health Institute, provide population health measures for nearly all United States counties. The Rankings are designed to direct media and policy-maker attention toward the multiple determinants of health and encourage in-depth community health assessment [1]. The resulting community engagement and discussion about health and its determinants are intended to motivate the implementation of evidence-based programs to improve population health.
One challenge for the Rankings is their reliance on small-area estimates, which are commonly affected by small sample sizes, large standard errors, and statistical outliers. These features of small-area estimates lead to uncertainty about the health of counties in the Rankings—especially small counties. Hierarchical Bayesian models, however, can be used to improve small-area estimates of health-related measures and the resulting ranks. The benefits of Bayesian estimates are well known: by using information from related units, they draw in extreme values that are often statistical artifacts of data sparsity [2]. Furthermore, Bayesian estimates allow us to more adequately estimate uncertainty in performance across units—a critical feature when comparing (or ranking) entities [3].
Bayesian estimates can be generated from hierarchical models with no fixed effects (empty models), or estimates can be informed by adding covariates to the model, which in most cases improves their precision [4]. Longitudinal and joint outcome models are another way to leverage information and more accurately estimate county-level health measures and ranks. Longitudinal data are ideally suited for hierarchical models, as data from different time points are nested within counties. Joint outcome or shared component models are a generalization of the longitudinal model: two or more measures that share common determinants (i.e., risk factors) can be modeled together to pool data and “borrow strength” for more reliable point and rank estimation [5,6]. These features of joint or longitudinal modeling could prove valuable for the Rankings, as earlier work has demonstrated that applying univariate hierarchical Bayesian models results in limited improvement in rank precision relative to ranks based on observed data, particularly for middle-ranked counties [4].
Methods
Data
The County Health Rankings report data for more than 25 health-related measures. Five of these measures represent the “health outcomes” or the current health of a community. Health outcomes measures include premature mortality, self-reported fair or poor health prevalence, average poor mental health days per month, average poor physical health days per month, and percent of live births with low birth weight. The remaining measures in the Rankings model represent the determinants of these health outcomes and are divided into four categories: health care, health behaviors, socioeconomic factors, and the physical environment [7]. Not all of the measures have longitudinal data available or are appropriate for modeling together, so we restricted our analyses to the five health outcome measures. More information on the measures and methods used in the County Health Rankings is available at www.countyhealthrankings.org.
The measures and data sources for health outcomes in the 2010 Rankings are listed in Table 1. For all measures but premature mortality, the analyses used National Vital Statistics System (NVSS) and Behavioral Risk Factor Surveillance System (BRFSS) data aggregated to the county level and reported to the Institute.
The Rankings represent premature mortality by the rate of years of potential life lost before the age of 75 (YPLL-75) per 100,000 population, where each death in a county is weighted based on the difference between age 75 and the age at death [8]. Because YPLL-75 rates are a composite of age-specific death rates, we directly modeled the underlying mortality rates for nine age groups and used the resulting output to calculate posterior samples of YPLL-75. Raw mortality and population counts for age < 1, ages 1–4, and 10-year age groups between ages 5 and 74 were accessed from the CDC WONDER underlying cause of death query system for four 3-year time periods: 1995–97, 1998–2000, 2001–03, and 2004–06 [9].
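The YPLL-75 calculation described above can be sketched as follows. This is a minimal illustration, not the Institute's production code: all counts are hypothetical, and we assume the common convention that deaths occur at the midpoint of each age band, so each death is weighted by 75 minus that midpoint.

```python
import numpy as np

# Age groups as in the paper: <1, 1-4, and 10-year bands up to age 74.
# Assumed convention: each death is weighted by 75 minus the band midpoint.
age_midpoints = np.array([0.5, 3.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
ypll_weights = 75.0 - age_midpoints

def ypll75_rate(deaths, population):
    """Years of potential life lost before age 75, per 100,000 population."""
    deaths = np.asarray(deaths, dtype=float)
    return 100_000 * np.sum(ypll_weights * deaths) / np.sum(population)

# Hypothetical county counts by age group
deaths = [2, 1, 0, 3, 4, 6, 10, 18, 30]
population = [600, 2400, 6000, 6500, 6200, 6800, 6400, 5000, 3500]
rate = ypll75_rate(deaths, population)  # YPLL-75 per 100,000
```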
The percentage of counties with missing health outcome data ranged from 3.1% to 13.7%. Counties with missing data were disproportionately rural. Values for vital statistics measures—YPLL-75 and low birth weight births—were suppressed if based on five or fewer events. BRFSS censored values for counties with fewer than 50 respondents or a 95% confidence interval width greater than 20% of the point estimate.
Longitudinal model
Mortality data for each of nine age groups included number of events (y) and population denominator (n) for the years 1995–97 (t = -3), 1998–2000 (t = -2), 2001–03 (t = -1), and 2004–06 (t = 0). Estimates from the years 2004–06 were our primary interest, so we set that time period as our intercept. For each age group, we fit the following generalized linear mixed effects log-linear regression models for age-specific mortality with state- and county-level random intercepts and slopes.
yjkt ~ Poisson(njkt ρjkt)
log(ρjkt) = α + ak + ajk + (β + bk + bjk)t
ak ~ Normal(0, σ²a,k); ajk ~ Normal(0, σ²a,jk); bk ~ Normal(0, σ²b,k); bjk ~ Normal(0, σ²b,jk)
Where
yjkt = number of deaths in county j in state k during period t
njkt = population for the age group in county j in state k during period t
ρjkt = mortality rate for the age group in county j in state k during period t
α = fixed effects intercept parameter for mortality at time t = 0 (2004–06)
ak = state-specific random intercept parameter
σ²a,k = variance of state-specific random intercept parameters
ajk = county-specific random intercept parameter
σ²a,jk = variance of county-specific random intercept parameters
β = fixed effects slope parameter for time
bk = state-specific random slope parameter
σ²b,k = variance of state-specific random slope parameters
bjk = county-specific random slope parameter
σ²b,jk = variance of county-specific random slope parameters
for j = 1, 2, …, mk; k = 1, 2, …, 51; and t = -3 (1995–97), -2 (1998–2000), -1 (2001–03), 0 (2004–06)
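To give a concrete sense of how posterior samples of the 2004–06 rates follow from this specification, the sketch below draws from normal approximations to the random-effect posteriors for a single age group in a single county. All parameter values are hypothetical, not fitted values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior summaries (illustrative only)
alpha = -6.2                      # fixed intercept: log mortality rate at t = 0
a_k_mean, a_k_sd = 0.10, 0.05     # state random intercept, approx. posterior
a_jk_mean, a_jk_sd = -0.08, 0.12  # county random intercept, approx. posterior

n_draws = 10_000
a_k = rng.normal(a_k_mean, a_k_sd, n_draws)
a_jk = rng.normal(a_jk_mean, a_jk_sd, n_draws)

# At t = 0 (2004-06) the slope terms drop out of the linear predictor,
# so the current-period log rate is alpha + a_k + a_jk.
rho = np.exp(alpha + a_k + a_jk)   # posterior draws of the mortality rate
rate_per_100k = 100_000 * rho
```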
We used the resulting posterior samples of age-specific death rates in 2004–06 (ρjkt) to calculate premature mortality (YPLL-75) rates by county. For more information on the calculation and use of YPLL rates, see Wise et al., 1988 [8] and Gardner and Sanborn, 1990 [10].
Joint outcome models
We also examined the performance of joint outcome models for 2004–06 age-specific mortality data—borrowing strength across age groups rather than over time—and for two additional sets of measures: (1) average poor physical and poor mental health days per month and (2) percent reporting fair or poor health and percent of live births with low birth weight. In our measure dyads (1) and (2) we chose to jointly model measures that followed the same distribution (e.g., Gaussian, binomial). Although one could jointly model measures from different families of distributions [5,11], we do not consider such models in this paper.
The first measure pair was selected due to the positive correlation between average poor mental health days and poor physical health days per month. High county-level averages for poor mental and poor physical health days are both strongly associated with low levels of physical activity, high poverty rates, and inadequate social support [12]. Allostatic load or stress response is considered a common determinant for mental and physical health [13]. Though—unlike the model for mortality rates—this joint modeling does not fundamentally increase sample size, we hypothesized that including more information per respondent in a joint outcome model could improve the accuracy of the resulting estimates for each measure. The second measure pair—fair/poor health prevalence and low birth weight births—was chosen to demonstrate that, for unrelated or weakly related measures, the joint modeling procedure will neither improve nor degrade the accuracy of the resulting estimates.
Premature mortality.
Using cross-sectional mortality data (2004–06), we created the following joint outcome model for premature mortality across the 9 age groups.
yajk ~ Poisson(najk ρajk)
log(ρajk) = α + ak + ajk + βa + bak + bajk, with β1 = b1k = b1jk = 0 for the reference age group (age < 1 year)
ak ~ Normal(0, σ²a,k); ajk ~ Normal(0, σ²a,jk); bak ~ Normal(0, σ²b,k,a); bajk ~ Normal(0, σ²b,jk,a)
Where
yajk = number of deaths for age group a in county j in state k
najk = population for age group a in county j in state k
ρajk = mortality rate for age group a in county j in state k
α = fixed effects intercept parameter for the reference age group (age < 1 year)
ak = state-specific random intercept parameter for the reference age group
σ²a,k = variance of state-specific random intercept parameters for the reference age group
ajk = county-specific random intercept parameter for the reference age group
σ²a,jk = variance of county-specific random intercept parameters for the reference age group
βa = fixed effects slope parameter for age group a
bak = state-specific random slope parameter for age group a
σ²b,k,a = variance of state-specific random slope parameters for age group a
bajk = county-specific random slope parameter for age group a
σ²b,jk,a = variance of county-specific random slope parameters for age group a
for a = 1, 2, …, 9; j = 1, 2, …, mk; k = 1, 2, …, 51
Average poor physical and poor mental health days.
The average number of poor physical health days per month and the average number of poor mental health days per month were reported from the BRFSS. The data reported included the observed mean value by county (m), the number of respondents (n), and the 95% confidence limits for the county-specific means, from which standard errors (s) could be calculated. By the Central Limit Theorem, the sampling distribution of the observed county-specific mean is approximately normal regardless of the underlying distribution of poor physical and poor mental health days. Using these data, we fit the following joint model for the two outcomes.
log(mijk) ~ Normal(log(μijk), s²ijk)
log(μijk) = α + ak + ajk + (β + bk + bjk)i
ak ~ Normal(0, σ²a,k); ajk ~ Normal(0, σ²a,jk); bk ~ Normal(0, σ²b,k); bjk ~ Normal(0, σ²b,jk)
Where
mijk = observed mean for outcome i in county j in state k
μijk = population mean for outcome i in county j in state k
s²ijk = sampling variance for log(mijk)
α = fixed effects intercept parameter for the reference measure (poor physical health days, i = 0)
ak = state-specific random intercept parameter for the reference measure
σ²a,k = variance of state-specific random intercept parameters for the reference measure
ajk = county-specific random intercept parameter for the reference measure
σ²a,jk = variance of county-specific random intercept parameters for the reference measure
β = fixed effects slope parameter for poor mental health days (i = 1)
bk = state-specific random slope parameter for poor mental health days
σ²b,k = variance of state-specific random slope parameters for poor mental health days
bjk = county-specific random slope parameter for poor mental health days
σ²b,jk = variance of county-specific random slope parameters for poor mental health days
for i = 0, 1; j = 1, 2, …, mk; k = 1, 2, …, 51
Fair or poor health prevalence and percent low birth weight births.
Data on fair or poor health prevalence, reported from BRFSS, consisted of a point estimate (p0jk) with nominal 95% confidence limits, which were used to calculate the county-specific standard error of p0jk (s0jk). From the estimated standard error, we determined the effective sample size: n0jk = p0jk(1 − p0jk)/s²0jk. Data on low birth weight births consisted of a census of all live births (n1jk), a count of births for which birth weight was less than 2500 grams (y1jk), and the observed proportion of low birth weight births (p1jk = y1jk/n1jk). The following joint logistic regression model was fit for these outcomes.
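The effective sample size step can be illustrated with a short sketch (all numbers hypothetical): the nominal 95% confidence limits imply a standard error, and inverting the binomial variance formula s² = p(1 − p)/n gives the effective n.

```python
def effective_sample_size(p, ci_lower, ci_upper):
    """Effective n implied by a proportion and its nominal 95% CI."""
    s = (ci_upper - ci_lower) / (2 * 1.96)  # standard error from CI half-width
    return p * (1 - p) / s ** 2             # invert s^2 = p(1 - p)/n

# Example: 18% fair/poor health prevalence with a 95% CI of (0.14, 0.22)
n_eff = effective_sample_size(0.18, 0.14, 0.22)
```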
yijk ~ Binomial(nijk, πijk)
logit(πijk) = α + ak + ajk + (β + bk + bjk)i
ak ~ Normal(0, σ²a,k); ajk ~ Normal(0, σ²a,jk); bk ~ Normal(0, σ²b,k); bjk ~ Normal(0, σ²b,jk)
Where
yijk = count of outcome i in county j in state k
πijk = population proportion for outcome i in county j in state k
nijk = (effective) sample size for outcome i in county j in state k
α = fixed effects intercept parameter for the reference measure (fair or poor health prevalence, i = 0)
ak = state-specific random intercept parameter for the reference measure
σ²a,k = variance of state-specific random intercept parameters for the reference measure
ajk = county-specific random intercept parameter for the reference measure
σ²a,jk = variance of county-specific random intercept parameters for the reference measure
β = fixed effects slope parameter for percent low birth weight births (i = 1)
bk = state-specific random slope parameter for low birth weight births
σ²b,k = variance of state-specific random slope parameters for low birth weight births
bjk = county-specific random slope parameter for low birth weight births
σ²b,jk = variance of county-specific random slope parameters for low birth weight births
for i = 0, 1; j = 1, 2, …, mk; k = 1, 2, …, 51
Cross-sectional models
In addition to the longitudinal and joint outcome models described above, each of the five health outcome measures was estimated using cross-sectional, univariate data in a generalized linear mixed effects model with a fixed intercept and state- and county-level random effects. An example of the Poisson model specification is below.
yjk ~ Poisson(njk ρjk)
log(ρjk) = β0 + ek + ejk
ek ~ Normal(0, σ²e,k); ejk ~ Normal(0, σ²e,jk)
Where
yjk = number of events in county j within state k
njk = population of county j within state k
ρjk = event rate for county j in state k
β0 = intercept, or the average log event rate across all counties in all states
ek = state-specific random effect parameter
σ²e,k = variance of state-specific random effects
ejk = county-specific random effect parameter
σ²e,jk = variance of county-specific random effects
for j = 1, 2, …, mk; k = 1, 2, …, 51
Estimation and model fit
Model parameters (regression coefficients and variances of the state- and county-level random effects) were estimated by maximum likelihood. Empirical Bayes estimates of the random effects were obtained by conditioning on the estimated variance parameters. Samples from the joint posterior distribution of the regression coefficients, state-level random effects, and county-level random effects were drawn from (multivariate) normal approximations to their conditionally independent posterior distributions. Posterior samples of the county-specific measures were then derived from these samples. Estimation and simulations were performed in R (version 2.15.1), using the lme4 and MASS libraries [14–16].
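The simulation step can be sketched in Python (the paper's own code is in R with lme4 and MASS): given a normal approximation to each county's conditional posterior on the log scale, we draw samples and transform them to rates. The posterior means and standard deviations below are hypothetical stand-ins for fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical normal approximations to five counties' posteriors (log scale)
post_mean = np.array([-6.1, -6.3, -6.0, -6.4, -6.2])
post_sd = np.array([0.05, 0.15, 0.08, 0.20, 0.10])

n_draws = 4000
# One row per draw, one column per county
log_rate_samples = rng.normal(post_mean, post_sd, size=(n_draws, post_mean.size))
rate_samples = np.exp(log_rate_samples)   # posterior samples of county rates
```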
To test our models’ performance, we used random number generation procedures for binomial, Poisson, and normal distributions to produce posterior predictive data sets based on the posterior samples and the reported population data for each measure. Similarities between parameters calculated from the posterior predictive data sets and the original data were quantified by posterior predictive p-values. These p-values are similar to those in frequentist statistics, in which a probability is set as a threshold below which the null hypothesis is rejected. In this case, a significant p-value (p < 0.05) suggested a poor fit [17,18]. Mean values would be replicated even with a poorly fitted model, so we selected inter-quartile range and skew as summary measures of a variable’s distribution.
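The posterior predictive check can be sketched as follows for the inter-quartile range, with simulated counts standing in for the fitted model's output (all quantities hypothetical; in the paper the rate draws come from the fitted hierarchical models).

```python
import numpy as np

rng = np.random.default_rng(7)

def iqr(x):
    """Inter-quartile range of a sample."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25

# Simulated "observed" county event counts
n_counties, pop = 200, 50_000
observed = rng.poisson(pop * rng.gamma(shape=20, scale=1e-5, size=n_counties))

# Posterior predictive data sets generated from posterior rate draws
n_reps = 1000
rate_draws = rng.gamma(shape=20, scale=1e-5, size=(n_reps, n_counties))
replicated = rng.poisson(pop * rate_draws)

# Posterior predictive p-value: share of replicated data sets whose IQR
# is at least as large as the observed IQR; values near 0 or 1 flag misfit.
p_iqr = np.mean([iqr(rep) >= iqr(observed) for rep in replicated])
```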
Ranking
Posterior samples of county-specific ranks in the entire nation were obtained by ranking each draw of the county-specific means or rates. Point estimates of ranks were obtained by ranking the posterior mean ranks, which is equivalent to minimizing squared error loss on the ranks. (See Louis, 2001 for the comparative advantages of different loss functions using ranks [19].) We used these ranks to assign counties to (national) quartiles of performance: quartile 1 represents the healthiest counties, quartile 4 the least healthy. The probability of a county ranking in its assigned quartile across the posterior samples was also calculated as a means to represent rank certainty. We then mapped the posterior probability of a county ranking in its assigned quartile using the maps library in R (2.15.1) [20].
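The ranking procedure above can be sketched with simulated posterior samples (all values hypothetical): rank each posterior draw, rank the posterior mean ranks to get point estimates, then compute each county's probability of falling in its assigned quartile.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated posterior samples: rows = draws, columns = counties
n_draws, n_counties = 2000, 100
county_means = rng.normal(0.0, 1.0, n_counties)
samples = county_means + rng.normal(0.0, 0.5, size=(n_draws, n_counties))

# Rank each posterior draw (rank 1 = lowest rate = healthiest)
ranks = samples.argsort(axis=1).argsort(axis=1) + 1

# Point-estimate ranks: rank the posterior mean ranks
# (equivalent to minimizing squared error loss on the ranks)
point_ranks = ranks.mean(axis=0).argsort().argsort() + 1

# Assign quartiles from the point-estimate ranks, then compute the
# probability of each county ranking in its assigned quartile
assigned_q = np.ceil(4 * point_ranks / n_counties).astype(int)
draw_q = np.ceil(4 * ranks / n_counties).astype(int)
p_assigned = (draw_q == assigned_q).mean(axis=0)
```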
Results
Estimates and model fit
For premature deaths, our posterior predictive checks initially suggested that the joint outcome model—modeling mortality rates for all age groups simultaneously—best captured the distribution of the original data. The posterior predictive p-values for inter-quartile range and skew were both insignificant (p > 0.05), suggesting good model fit. The cross-sectional and longitudinal models, conversely, were unable to capture the dispersion of observed premature mortality rates, represented by highly significant posterior predictive p-values for inter-quartile range. However, posterior predictive p-values for individual age-group mortality rates indicate that the joint outcome model performed poorly at replicating the original data for the following age groups: ages 1–4, ages 5–14, and ages 65–74. In contrast, posterior predictive p-values for individual age-group mortality rates based on the cross-sectional and longitudinal models were all > 0.05 and indicative of reasonable model fit.
In our two other joint outcomes models, we saw some improvement in model fit for poor physical health days using the joint outcome model. The posterior predictive checks indicated that samples from the joint outcomes model better captured the dispersion in the observed data for poor physical health days, but there was little to no improvement in model fit for poor mental health days. For the final measure pair, fair/poor health prevalence and percent of births with low birth weight, the model fits were equivalent between the cross-sectional and joint outcome models. For fair/poor health prevalence, both models successfully replicated the interquartile range and skew of the original data; for low birth weight births, the models captured the dispersion of the observed measure, but not the skew. (See Tables 2–9 for estimates of fixed and random effects from all measure-model combinations.)
Rank estimates
We used the posterior ranks to determine the probability of a county ranking in a quartile as a means to explore rank precision. For each measure-model combination, we assigned counties to a quartile based on their mean percentile rank and calculated the probability with which each county ranked in its assigned quartile. This probability, which typically ranged between 0.30 and 1.0, reflects the degree of certainty with which we know a county’s rank. We divided the counties within each quartile into three classes based on the probability (certainty) with which they ranked in that quartile: low certainty (p < 0.50), medium certainty (p = 0.50–0.75), and high certainty (p ≥ 0.75).
Estimates of county ranks for premature death are similar across the three models considered (cross-sectional, longitudinal, and joint outcomes). Based on the posterior probabilities of counties ranking in their assigned quartile, ranks are most precise using the longitudinal model and least precise using the joint outcomes model. Table 10 shows that the percent of counties ranking in their assigned quartiles with high probability (p ≥ 0.75) is highest for the longitudinal model (71.1% of all counties) and lowest for the joint outcomes model (46.8% of all counties). The improvement in precision with the longitudinal model over the cross-sectional model is expected. The degradation in performance of the joint outcomes model relative to the univariate model, though surprising, may reflect challenges in model fit for the age groups with low mortality rates.
Joint modeling of poor physical and poor mental health days produced modestly better rank precision for poor mental health days compared with the univariate model: approximately 50% of counties ranked with high probability in their assigned quartile for poor mental health days under the joint outcomes approach, compared to 44% of counties with a univariate approach. Improvements for poor physical health days, however, were marginal. Exploiting the association between these two measures improved our ability to differentiate the counties in terms of performance on poor mental health days and, to a lesser degree, on poor physical health days (Table 10).
As expected, joint modeling of fair or poor health prevalence and percent of births with low birth weight had little, if any, impact on the precision of the estimated ranks relative to separate univariate models for the two outcomes. The percent of counties ranking with high certainty in their assigned quartiles is similar between the joint outcome and univariate models, indicating no improvement—and, importantly, no degradation—in rank precision.
Maps of rank performance
In the following choropleth maps, we used four colors (purple, blue, orange, red) to represent the quartiles. The color ramps indicate precision; greater saturation indicates greater certainty that the county ranks in its assigned quartile. Maps of quartiles for all measure-model combinations are shown in Figs 1 through 4B.
Fig 1. Results based on independent model (A), longitudinal model (B), and joint outcome model (C).
Fig 2. Results based on independent model (A) and joint outcome model (B).
Fig 1A–1C show U.S. counties mapped by quartile for premature death (quartile 1 representing the “healthiest” counties, quartile 4 representing the “least healthy”), based on the independent, longitudinal, and joint outcome models. These maps show differences in our ability to distinguish among counties based on our modeling approach. Ideally, a map would have very few light-colored counties, indicating that counties ranked in the same quartile across the posterior samples. The map representing rank performance from the longitudinal model (Fig 1B), though similar to the independent model (Fig 1A), shows greater rank precision. Seventy-one percent of counties rank with high certainty (p ≥ 0.75) in their assigned quartiles using posterior samples from the longitudinal model; using posterior samples from the cross-sectional model, only 59% of counties rank with high certainty (p ≥ 0.75) in a quartile. Conversely, Fig 1C demonstrates how differences among counties are attenuated when using posterior samples from the joint outcome model, with only 47% of counties ranking in their respective quartiles with high certainty (p ≥ 0.75). In these three maps, counties in the Southeastern United States rank among the least healthy, whereas the Midwest, New England, and coastal California rank among the healthiest regions.
Fig 2A and 2B show U.S. counties mapped by quartile and precision for poor mental health days, based on posterior ranks from the independent and joint outcome models. The precision of results from the independent model is poor for this measure, with nearly 20% of counties ranking in their respective quartiles with low certainty (p < 0.5). Only 14% of counties rank with such uncertainty based on posterior samples from the joint outcome model. Poor physical health days do not demonstrate notable improvement with a joint modeling approach, as greater than 20% of counties rank with low certainty in their assigned quartiles in both the univariate and joint outcome models (maps not shown). For both measures, high-performing (healthier) counties tend to cluster in the Plains states and along the mid-Atlantic seaboard. Low-performing counties are grouped in the states of Oklahoma, Kentucky, Mississippi, and Alabama.
In contrast to improvements in rank certainty that result from the longitudinal models for premature death and the joint outcomes model for poor mental health days, rank certainty based on the independent and joint outcome models is similar for percent reporting fair or poor health and percent of births with low birth weight (Figs 3 and 4 show the map results for the univariate models for fair/poor health and low birth weight births, respectively). The geographic distribution of counties ranking in the top and bottom quartiles for fair or poor health mimics that of the premature death measure: New England and the Midwest tend to rank best, with Southeastern counties disproportionately ranking in the lower quartile. High-performing counties in the low birth weight measure are clustered in the Upper Midwest and along the West coast. Counties that rank in the bottom quartile for low birth weight births are commonly located in the Southeast as well as the frontier states of Wyoming and Colorado.
Discussion
Overall, the expanded models in this paper—using either longitudinal or pooled data—can improve rank estimation and precision compared to cross-sectional or univariate models (for motivations for national rankings versus in-state rankings, see Athens et al. 2013 [4]). In the case of pooled or joint outcome models, advantages are not necessarily observed when the measures modeled jointly are insufficiently related, such as fair or poor health prevalence and percent of births with low birth weight. In this case, however, the quality of the estimates and ranks mimic that of the univariate models, and there is no degradation of rank precision through a joint modeling approach.
The finding that posterior samples for premature deaths from the joint outcome model resulted in less rank precision than the univariate models was not anticipated. One cause of this rank degradation is poor model fit for age groups 1–4, 5–14, and 65–74. Mortality rates among the young age groups are particularly low; a number of counties had zero events over the 2004–06 time period. Though we did not allow distributions to vary within a joint outcome model, a negative binomial or zero-inflated Poisson model would better replicate the observed data in these age groups [21]. For the oldest age group, ages 65–74, the joint outcome model overestimated the variance in mortality rates: the independent and longitudinal models estimate only about one-third as much variance for this age group as the joint outcome model. Though a death in the 65–74 age group is given little weight in the composite YPLL measure, deaths in this age group comprise about 10% of total years of potential life lost, given the very high mortality rates at these ages. Consequently, the excess variance from the joint outcome model fit for age 65–74 mortality is reflected in the overall premature mortality estimates.
Refinement in rank estimates was the goal of the expanded models explored in this paper. For select measures, we saw improvement in the certainty with which we could assign counties to a specific quartile. Even so, the quartiles for which the most counties are placed with “high certainty” (p ≥ 0.75) are the top and bottom quartiles (Table 10). Ranks for middle-ranking counties remain more volatile. These results are consistent with Hall and Miller’s examination of rank performance in which a small group of “highly performing” (or, alternately, poorly performing) entities tends to remain fixed in rank [22]. Adding more information confirms the rank location of counties at the extremes, but improving precision for middle-ranking counties is more difficult.
Despite these limitations, these models may be useful in addressing another aspect of data sparseness reflected in the Rankings. The Institute currently aggregates up to seven years of data for its measures to improve estimate stability. This approach, however, limits the reactivity of the Rankings to changes in community health over time. A potential alternative is to report Bayes estimates and ranks based on data averaged over fewer years. Data from earlier years (or from other, related measures) could be leveraged in the modeling stage to provide reasonable estimates over a shorter time period.
Conclusion
Using longitudinal or pooled outcome data in Bayesian hierarchical models to create empirical estimates of county event rates and ranks in some cases reduced rank intervals, though some joint models had no impact on rank precision. Further, given the fundamentally different variance in mortality rates across age groups, the joint model for mortality resulted in less precision. These results demonstrate the importance of proper selection of measure combinations for joint modeling to improve rank performance. For models that resulted in improvements, those improvements appeared relatively modest. However, these models can be built upon by including all available data—time series, related outcomes, and perhaps other fixed effects—to maximize the performance of the models for rank estimation.
Acknowledgments
The authors would like to thank Dr. Bridget Catlin for her thoughtful review of this manuscript.
Author Contributions
Conceived and designed the experiments: JA RG PR. Analyzed the data: JA. Wrote the paper: JA PR RG.
References
- 1. Remington PL, Booske BC (2011) Measuring the health of communities—How and why? J Public Health Manag Pract 17(5):397–400. pmid:21788775
- 2. Louis TA (2010) On Bayesian shared component disease mapping and ecological regression with errors in covariates. Statist Med 29: 1239–1249.
- 3. Goldstein H, Spiegelhalter DJ (1996) League tables and their limitations: Statistical issues in comparisons of institutional performance. J R Statist Soc A 159(3): 385–443.
- 4. Athens JK, Catlin BB, Remington PL, Gangnon RE (2013) Using empirical Bayes methods to rank counties on population health measures. Prev Chronic Dis 10: 130028.
- 5. MacNab YC (2006) An innovative application of Bayesian disease mapping methods to patient safety research: A Canadian adverse medical event study. Statist Med 25: 3960–3980. pmid:16498611
- 6. Schur N, Gosoniu L, Raso G, Utzinger J, Vounatsou P (2011) Modelling the geographical distribution of co-infection risk from single-disease surveys. Statist Med 30: 1761–1776.
- 7. University of Wisconsin Population Health Institute (2012) Ranking methods. Madison, WI: UWPHI. Available: www.countyhealthrankings.org/ranking-methods. Accessed 9 January 2013.
- 8. Wise RP, Livengood JR, Berkelman RL, Goodman RA (1988) Methodological alternatives for measuring premature mortality. Am J Prev Med 4(5):268–73. pmid:3224004
- 9. Centers for Disease Control and Prevention (2012) WONDER databases: Underlying cause of death (compressed mortality) database, 1999–2009. Atlanta: Centers for Disease Control and Prevention. Available: http://wonder.cdc.gov/mortSQL.html. Accessed 11 January 2013.
- 10. Gardner JW, Sanborn JS (1990) Years of potential life lost (YPLL)—what does it measure? Epidemiology 1:322–29. pmid:2083312
- 11. Schwartz SL, Gelfand AE, Miranda ML (2010) Joint Bayesian analysis of birthweight and censored gestational age using finite mixture models. Statist Med 29: 1710–23.
- 12. Jia H, Moriarty DG, Kanarek N (2009) County-level social environment determinants of health-related quality of life among US adults: A multilevel analysis. J Community Health 34:430–9. pmid:19554435
- 13. Beckie TM (2012) A systematic review of allostatic load, health, and health disparities. Biol Res Nurs 14: 311–346. pmid:23007870
- 14. R Development Core Team (2009) R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available: http://www.R-project.org. Accessed 4 February 2013.
- 15. Bates D, Maechler M (2009) lme4: Linear mixed-effects models using S4 classes. R package version 0.999375–32. Available: http://CRAN.R-project.org/package=lme4. Accessed 9 January 2013.
- 16. Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th ed. New York: Springer, 214 p.
- 17. Meng XL (1994) Posterior predictive p-values. Ann Stat 22(3):1142–60.
- 18. Gelman A, Meng XL, Stern H (1996) Posterior predictive assessment of model fitness via realized discrepancies. Stat Sin 6(4):733–807.
- 19. Louis TA (2001) Bayes/EB ranking, histogram, and parameter estimation: Issues and research agenda. In: Ahmed SE, Reid N, editors. Empirical Bayes and likelihood inference. New York: Springer, pp 1–15.
- 20. Brownrigg R (2012) Package ‘maps’. Available: http://cran.r-project.org/web/packages/maps/. Accessed 4 February 2013.
- 21. Lord D, Washington SP, Ivan JN (2005) Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accid Anal Prev 37(1): 35–46. pmid:15607273
- 22. Hall P, Miller H (2010) Modeling the variability of rankings. Ann Statist 38(5): 2652–77.