The Role of Geography in the Assessment of Quality: Evidence from the Medicare Advantage Program

The Affordable Care Act set in motion a renewed emphasis on evaluating quality of care. However, the quality evaluation strategies of the Centers for Medicare and Medicaid Services (CMS) do not consider geography when plans are compared. Using an overall measure of plan quality in the public sector, the Medicare Advantage (MA) star ratings, we explored the impact of geography on these ratings. We identified 2,872 U.S. counties in 2010. Geographic factors explained a larger fraction of the variance in the MA ratings than socio-demographic factors did. After risk adjustment, almost half of the U.S. states moved five or more positions in the star rating rankings, and lower MA star ratings were identified in the Southeastern region. These findings suggest that the effect of geography on the ratings is not trivial and should be considered in future adjustments of the metric, which may enhance transparency and accountability and, importantly, level the playing field more effectively when quality is compared across health plans.


Introduction
Geographic variation has long been considered an important case-mix variable in its relationship with patient and clinical outcomes [1]. Several factors explain the importance of geographic variation in measuring health status. First, many health care systems are organized on a geographic basis; hence, the distribution of health care resources is tailored to local demands. Second, health care facilities such as hospitals and clinics are concentrated in specific locations, making geography a predictor of health care utilization and outcomes. Third, there is substantial evidence that "area effects" drive health inequality even after controlling for social and economic factors. These "area effects" represent relationships between area characteristics and individual behavior that cannot be explained by individual attributes alone [2]. Therefore, since the star ratings are the main driver of bonuses to MA plans, the strategy for adjusting the ratings cannot ignore contextual factors embedded in geography (such as the type of population covered and regular, close access to health care facilities), and it should reward plans that offer care tailored to the needs of their specific geographic areas.
In a pioneering study, Wennberg and Gittelsohn [3] demonstrated that small area variation is a fundamental variable in the analysis of processes of care. Since then, several studies have documented large geographic variation in health expenditure [4][5][6], surgical procedures [7][8][9], and utilization of services [10][11][12][13]. Although the large majority of these studies adjusted their results for factors such as demographics, comorbidities, and socioeconomic status, the relationship between small-area geographic variation and quality of care requires more attention. Moreover, our knowledge of how patient-reported outcomes vary across both small and large geographic areas and across insurance arrangements is still limited, although previous evidence offers some insights. For example, in earlier work Kazis et al. [14] demonstrated small to moderate differences across Veterans Administration hospitals nationwide using the Veterans RAND 36-Item Health Survey, a patient-reported outcome measure that assesses a person's physical and mental health.
Given the documented extent of geographic variation in Medicare utilization of services [10], policymakers need a more complete understanding of the underlying sources of this variation in terms of quality of care before formulating policies to reduce it. In this context, the Medicare Advantage (MA) program adopted the star ratings system to rank plans on quality of care and consumer satisfaction. The rankings have important implications for reimbursement, bonuses, and the expansion of plan businesses. They are computed from four data sources: (1) the Healthcare Effectiveness Data and Information Set (HEDIS), (2) the Consumer Assessment of Healthcare Providers and Systems (CAHPS), (3) the Medicare Health Outcomes Survey (HOS), which includes the Veterans RAND 12-Item Health Survey (VR-12), and (4) CMS administrative data, which include information about member satisfaction and disenrollment, as well as plans' appeals processes, audit results, and customer service [15].
The MA star ratings have been shown to vary substantially with plan characteristics and profit status. Xu et al. [16] demonstrated that for-profit versus non-profit status is an important predictor of the star ratings, as are the number of subscribers in a plan and the plan's longevity. This paper investigates the association between geographic variation and the MA star ratings nationally to determine the importance of geography in predicting the ratings.

Methods
The summary MA star rating provides an overall measure of a plan's quality based on indicators related to process of care, health outcomes, access to care, and beneficiary satisfaction (Kaiser Family Foundation 2009). One star indicates poor performance, two stars below-average performance, and three stars average performance; four and five stars signify above-average and excellent performance, respectively. At the time of the study, the star rating for MA covered 36 topics in 5 categories: staying healthy, managing chronic (long-term) conditions, member experience, member complaints, and customer service. If the contract provides medication benefits only, the star rating covers 15 topics in 4 categories: customer service, member complaints, member experience, and patient safety. If the contract provides health and medication services together, the star rating covers all of the topics listed above.

Data
Four data sources were used in this study: 1) the 2010 MA plan quality database, which contains star rating information at the contract level; 2) the Medicare Advantage contract enrollment database, which reflects enrollment status as of October 1, 2010; 3) the 2000 Census database, which was merged with the MA plan quality database to extract the demographic information described in Table 1; and 4) U.S. county FIPS codes, the five-digit Federal Information Processing Standards (FIPS) codes that uniquely identify counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states.

Study Data and Methods
Inclusion/Exclusion Criteria. We included only contracts offering health services ("Part C" or "Part C + Part D") and excluded medication-only (Part D) contracts from our computations. This approach makes the scores comparable across contracts, because medication-only contracts are rated on a different set of topics. Counties in Puerto Rico, Guam, and the Virgin Islands were excluded because these are territories rather than states. As a result, 409 (71.1%) of 575 MA contracts were included in the final study sample, representing 86.3% of the total Medicare Advantage population (11.7 million).
Institutional Review Board approval was obtained through Boston University Medical Center. Records were anonymized and de-identified prior to analysis.

Unit of analysis
The MA star ratings are reported at the contract level. We began with an analysis at the FIPS level, by county and contract, with the U.S. county as our unit of analysis. A county is a geographic subdivision of a state (or federal territory), usually assigned some governmental authority. The county was chosen because census information is collected at the county level and Medicare Advantage contracts also locate their health plans at the county level. In a subsequent analysis, we also used MA contracts as a separate unit of analysis.

County Level Analysis
This study focuses on the variation in MA star ratings associated with geography, namely U.S. counties identified by FIPS code. First, the heterogeneity of the MA star ratings across the 50 states was tested. Contract variables reflected the make-up of each contract, based on weighted values from the counties in which it operated. Within each county, each Medicare Advantage contract was assigned a weight reflecting population density (i.e., contracts with more enrollees received larger weights because they account for a larger share of the Medicare Advantage market). The ratings of all contracts operating in each county were then averaged to obtain the county's aggregated MA star rating. This methodology is similar to that of Schneider et al. [17].
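A minimal sketch of this county aggregation, assuming each input record pairs a county FIPS code with one contract's enrollment and star rating. The FIPS codes, contract IDs, and counts are hypothetical placeholders, and the code only illustrates the enrollment weighting described above; it is not the authors' SAS implementation.

```python
from collections import defaultdict

# Hypothetical records: (county FIPS code, contract ID, MA enrollees, contract star rating).
records = [
    ("00001", "H0001", 12_000, 4.0),
    ("00001", "H0002", 4_000, 3.0),
    ("00002", "H0001", 8_000, 4.0),
]


def county_star_ratings(rows):
    """Enrollment-weighted mean star rating of the contracts operating in each county."""
    weighted_sum = defaultdict(float)  # sum of (enrollees x stars) per county
    enrollees = defaultdict(int)       # total enrollees per county
    for fips, _contract_id, n_enrolled, stars in rows:
        weighted_sum[fips] += n_enrolled * stars
        enrollees[fips] += n_enrolled
    return {fips: weighted_sum[fips] / enrollees[fips] for fips in weighted_sum}


ratings = county_star_ratings(records)
print(ratings)  # {'00001': 3.75, '00002': 4.0}
```

The larger contract dominates county 00001, which scores 3.75 rather than the unweighted mean of 3.5.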

Statistical analysis
The analyses of county-level MA contract data and of differences in star ratings proceeded in a series of steps. First, the MA star ratings were measured with a multicomponent index scale that takes values from 1 to 5 in increments of 0.5. Because the underlying variables that comprise this index are continuous by nature, we analyzed the star ratings as a continuous variable, as in previously published analyses [16]. Initial analyses applied general linear models to obtain the unadjusted MA star ratings by state. Multivariate linear regression models were weighted by county to reflect the fraction of the nationwide enrollee population contributed by each county (weight = number of enrollees / 10.1 million × 2873 counties). Then, in univariate analyses, we examined the data set of 409 contracts with respect to age, race, education, poverty status (below the poverty level or not), rural versus urban residence, and MA enrollment penetration, expressing these categorical variables as percentages. Median household income is summarized with measures of central tendency (mean and median).
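The county weight can be read as a county's share of the 10.1 million enrollees, rescaled by the 2,873 counties so that the weights average 1, assuming county enrollments sum to the national total. A short sketch of that arithmetic; the helper name `county_weight` is hypothetical.

```python
TOTAL_ENROLLEES = 10_100_000  # nationwide MA enrollees in the sample (10.1 million, per the text)
N_COUNTIES = 2873             # county count used in the weight formula


def county_weight(enrollees: float,
                  total: float = TOTAL_ENROLLEES,
                  n_counties: int = N_COUNTIES) -> float:
    """Hypothetical helper: weight = enrollees / total enrollees x number of counties.

    If county enrollments sum to `total`, the weights sum to `n_counties`,
    so the mean weight is 1 and a county with average enrollment gets weight 1.
    """
    return enrollees / total * n_counties


# A county with exactly average enrollment receives a weight of about 1.0.
print(county_weight(TOTAL_ENROLLEES / N_COUNTIES))

# Toy check with three counties whose enrollments sum to the total of 300.
toy = [county_weight(n, total=300, n_counties=3) for n in (100, 50, 150)]
print(toy, sum(toy))
```

Scaling by the number of counties keeps the weighted sample size equal to the number of counties, so weighted standard errors are not inflated or deflated by the choice of units.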
Separately, we determined the independent (pure) effects of geography and socio-demographics on the MA star ratings using a variance components analysis in multivariate regressions. We defined the total effect as G (pure geography effect) + S (pure socio-demographic effect) + J (joint effect). We then calculated the explained sum of squares from three regression models: one with the geography component only, one with the socio-demographic component only, and a third with both (G + S). Finally, the independent (pure) effects were obtained by subtraction: the pure geography effect is the variance explained by the combined model minus that explained by the socio-demographic model, the pure socio-demographic effect is the combined model minus the geography model, and the joint effect is the remainder. We next analyzed the data using multivariate ordinary least squares regression models of the MA star ratings, adjusting for demographics and geographic variation. The adjusted star ratings are reported by state, aggregating county data to the state level. A separate analysis used a mixed model with random effects to control for clustering at the county level (data available on request). All analyses were conducted in SAS 9.1.
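The subtraction step can be expressed with R², the share of the total sum of squares each of the three models explains: G = R²(G+S) − R²(S), S = R²(G+S) − R²(G), and J = R²(G+S) − G − S. A minimal sketch of that arithmetic, assuming the three R² values have already been obtained from the fitted regressions; the input values are hypothetical and are not the study's estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    pure_geography: float  # G: explained only by geography
    pure_sociodemo: float  # S: explained only by socio-demographics
    joint: float           # J: explained jointly by both sets (shared variance)
    total: float           # G + S + J, i.e. the R^2 of the combined model


def decompose(r2_geo: float, r2_socio: float, r2_both: float) -> VarianceComponents:
    """Partition R^2 from three OLS fits (G only, S only, G + S) by subtraction."""
    pure_g = r2_both - r2_socio        # what geography adds beyond socio-demographics
    pure_s = r2_both - r2_geo          # what socio-demographics add beyond geography
    joint = r2_both - pure_g - pure_s  # equivalently r2_geo + r2_socio - r2_both
    return VarianceComponents(pure_g, pure_s, joint, r2_both)


# Hypothetical R^2 values for illustration only (not the study's results).
vc = decompose(r2_geo=0.50, r2_socio=0.30, r2_both=0.65)
print(vc)
```

The joint term is not forced to be positive; it can be negative when one set of predictors acts as a suppressor for the other.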

Results
In our analysis of 409 MA contracts, there were 10.1 million beneficiaries, representing 86.3% of the total Medicare Advantage population (11.7 million). The mean star rating across all contracts was 3.33, on a scale where 5 denotes excellent performance. Table 1 reports the descriptive characteristics of the FIPS counties nationally. We observed that 14.5% of the sample were older than 65 years of age, about 84% were White, and 9% were African American. With respect to education, 10% had a bachelor's degree or higher; about 13% were below the poverty level, 57% lived in rural areas, and MA enrollment penetration averaged 17%. Both the mean and the median household income were about $35,000.
Table 2 gives the variance components analysis results. Geography alone (G) explained 59.5% of the total variance and demographic characteristics alone (S) explained 31.2%; together (G + S + J), all the covariates explained 71.7% of the total variance (p<0.0001).
Table 3 reports the results of the FIPS county-level analysis, at the state level, for the current MA star rating and the adjusted star rating (with geographic location and population density factored in). States are rank ordered, and the table highlights the "difference" between the unadjusted, unweighted star rank (current MA rating) and the adjusted, weighted star rank (adjusted MA rating). In other words, the rank difference shows how each state's ranking changes when both geography and population density are accounted for. Positive values represent an improvement in ranking after the adjustments. Rankings changed positively in 20% of the states (differences of ≥5 positions), did not change in 6% of the states, and changed negatively in 24% of the states (differences of ≥5 positions). For example, the current MA star rating placed Michigan 26th among the 50 states; after the adjustment, it ranked fourth highest in the country, a change of +22 positions. The opposite occurred for Hawaii, with a change of -22 positions after adjustment. Only three states did not change ranking after the adjustments. Overall, adjustment for the S effect had a small impact on the rankings compared with the G effect (results available on request).
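The rank difference is the current rank minus the adjusted rank, so positive values mean a state moved toward first place (Michigan: 26 − 4 = +22). A sketch of that computation with hypothetical state labels and ratings; breaking ties alphabetically is an assumption not stated in the text, and the helper names are illustrative.

```python
def rank_descending(scores: dict[str, float]) -> dict[str, int]:
    """Rank 1 = highest rating; ties broken alphabetically (an assumption)."""
    ordered = sorted(scores, key=lambda state: (-scores[state], state))
    return {state: pos for pos, state in enumerate(ordered, start=1)}


def rank_changes(current: dict[str, float], adjusted: dict[str, float]) -> dict[str, int]:
    """Current rank minus adjusted rank; positive values mean improvement."""
    cur, adj = rank_descending(current), rank_descending(adjusted)
    return {state: cur[state] - adj[state] for state in current}


def summarize(changes: dict[str, int], threshold: int = 5) -> tuple[float, float, float]:
    """Shares of states that rose by >= threshold, did not move, or fell by >= threshold."""
    n = len(changes)
    up = sum(d >= threshold for d in changes.values()) / n
    same = sum(d == 0 for d in changes.values()) / n
    down = sum(d <= -threshold for d in changes.values()) / n
    return up, same, down


# Hypothetical ratings for three states labelled A, B, C.
current = {"A": 3.5, "B": 3.2, "C": 3.0}
adjusted = {"A": 3.1, "B": 3.4, "C": 3.3}
changes = rank_changes(current, adjusted)
print(changes)  # {'A': -2, 'B': 1, 'C': 1}
```

Because every position gained by one state is lost by another, the rank changes always sum to zero.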
To assess spatial patterns, the adjusted star ratings were plotted by state (Fig 1). Results suggested lower star ratings in the southeastern states and in the upper and lower central sections of the U.S. than in the northeastern and western states.
Finally, we reported the state-level adjusted star ratings by mean and standard deviation (Fig 2). A total of 42 states had average star ratings (between 3 and 4), only three states were above average after the adjustments, and seven states performed below average. Compared with the current star ratings, three states moved from below average to average and one state moved from average to above average after the adjustments. In summary, relative to the current MA ratings, the adjustments changed the star ratings by +0.18 in the positive direction and by -0.15 in the negative direction.
Separately, we conducted sensitivity analyses at the contract level to test whether contracts, which can operate in more than one FIPS county, influence the findings on ranking variation at the state level. Of the 409 contracts, 80 (19.6%) spanned all 50 states, 55 (13.4%) operated in 2 to 23 states, and 274 (67.0%) operated in only one state. Analysis of variance in balanced data showed no significant differences in star ratings among these three contract groups (P = 0.32 in unadjusted models), compared with the FIPS analysis in Table 3.
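The contract-level comparison is a one-way analysis of variance across three groups (contracts in all 50 states, in 2 to 23 states, and in one state). A pure-Python sketch of the F statistic follows; the group ratings are hypothetical, and the P value is omitted here (a library routine such as SciPy's `scipy.stats.f_oneway` returns both the statistic and the P value).

```python
from statistics import fmean


def one_way_anova_f(*groups: list[float]) -> float:
    """F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)  # df_between = k - 1
    ms_within = ss_within / (n - k)    # df_within = n - k
    return ms_between / ms_within


# Hypothetical star ratings for the three contract groups.
national = [1.0, 2.0, 3.0]
multi_state = [2.0, 3.0, 4.0]
single_state = [3.0, 4.0, 5.0]
print(one_way_anova_f(national, multi_state, single_state))  # 3.0
```

The formula handles unequal group sizes, which matters here because the three contract groups (80, 55, and 274 contracts) are far from equal in size.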

Discussion
Geographic variation in health care remains a pervasive, persistent, and substantial concern for policymakers, providers, and the public in general. This and previous work [1,4,7,12,18,19] have established that, first, geographic variation is not random; second, demographics and health status explain some of the variation, but much remains unexplained; and third, local contextual factors that are geographically based need to be considered more carefully when comparisons are made among plans.
This study found extensive geographic variation in quality of care throughout the U.S. Overall, almost half of the states moved five or more positions up or down from their original MA star rank when controlling for small area variation and demographics. Also, the geography effect explained a larger fraction of the variability in the star ratings than socio-demographic factors did. Finally, these results suggest lower adjusted star ratings in mid-central and southeastern states. To our knowledge, this is the first study to aggregate the MA star ratings at the state level and assess the effects of the beneficiaries' demographics and small area variation (FIPS counties) on the ratings. We postulate that the geographic variability in quality of care likely arises from the interaction of several components, such as differences in the demographics and socio-economic status of populations, the underlying prevalence of morbidities, differences in treatment approaches, and overuse and misuse of medical technologies. Thus far, the literature has offered possible explanations for these differences in quality of care rather than systematic investigations of the factors associated with geographic variability in the utilization of services and its impact on the quality of health care.
The MA star ratings offer a unique opportunity to assess, in a single metric, information relevant to consumers: health effectiveness data, patient-reported outcomes, and beneficiary satisfaction. Previous work assessing geographic variation and quality has important limitations. First, many studies rely on mortality to measure performance and outcomes across geographic regions [20,21]. Mortality is a reliable and valid measure for conditions with a high probability of premature death, where reducing mortality is a meaningful endpoint. However, for chronic conditions that affect individuals' physical and psychological status (often seen in MA beneficiaries), increased survival among the elderly pushes research toward intermediate and long-term outcome measures with proven reliability, validity, and responsiveness to change over time, such as components of the MA star ratings including the VR-12 physical and mental summary measures. Second, analyses of variation in quality of care should distinguish between "acceptable" and "unacceptable" factors that contribute to processes of care, including different comorbidity profiles, demographic characteristics, and geography. Unexplained variability across geographic regions may reflect market characteristics and discretionary inefficiencies in the care process, as highlighted by the 2013 report from the Institute of Medicine [22]. It is important to recognize these factors when drawing conclusions about why utilization varies from one part of the country to another. Unfortunately, adjusting for some of the reasons behind this variation, including socioeconomic status and race, is controversial because these are often seen as factors beyond the scope of the health care system, as noted by CMS. However, CMS has also been diligent in revising the MA star ratings each year. For example, mammography will be introduced in the 2016 version of the ratings, a change that illustrates CMS's ongoing efforts to incorporate broader appropriateness measures [23].
In addition, this study found substantial variation in quality of care across counties, which warrants deeper analysis of the underlying factors driving this variability.
It remains unclear what degree of geographic variation should be considered relevant from a societal or health system perspective. The amount of geographic variability reported differs across studies, with differences in health status ranging from 18% [4] to almost 70% [6]. This spread is partly explained by differences in the geographic unit of analysis and the risk adjustment methodology. In addition to individual characteristics and insurance provisions, other factors such as local and regional markets for pharmaceuticals, supply of providers, hospital size, and level of competition among institutions may have an impact on quality of care. For example, Wennberg and colleagues found that higher ratios of beds per capita (larger institutions) are associated with higher health care utilization in inpatient settings [24]. Another study found that a higher percentage of primary care physicians (PCPs) predicts lower spending per beneficiary within a region [25].
Overall, the vast majority of studies address regional variability in terms of price [26][27][28]. However, measurement of quality of care across geographic locations is sparse in the published literature. Even though price is an important predictor for MA as a system, measuring quality across regions of the country would allow the development of more sensitive and useful indicators for the consumer and, more importantly, would identify avoidable utilization, which may be easier to address than price [18].
Medical practice patterns may also help explain the geographic variation. Proposed guidelines and formularies may raise controversies within providers' practices about alternative therapies, and additional information may be needed to settle these differences. Describing geographic variation does not necessarily provide solutions. Also, what is considered "proper treatment" may differ across regions. For example, provider groups located in urban areas may be more willing to adopt newer, more innovative treatment options than providers practicing in more rural, isolated areas. Therefore, more studies are needed at finer levels of granularity. For example, Barnato et al. [29] explored treatment choices for critically ill patients in Pittsburgh and found that clinicians overestimated patients' preferences for intensive treatment. Such preferences may be specific to clinicians in Pittsburgh and may not hold in other regions. Thus, identifying "hot spots" of overuse is a good example of a policy-relevant intervention that can be planned when small-area geographic variation is identified, improving the likelihood of fairer comparisons.
The characteristics of the insurance arrangement within MA may also play a role in the geographic variation. Even though previous studies did not find an association between plan characteristics and quality metrics [30], no previous literature has explored the interactions between regions and insurance arrangements, which might shed light on region-specific insurance mechanisms. A report on commercial insurance in the U.S. related to depression treatment found that HMOs in the South were less likely to offer guideline-concordant care than HMOs in the West (unpublished data). Hence, more research is needed to disentangle the contextual factors affecting insurance plans' operations.
One of the main aspects of the 2009 debate around the passage of the Affordable Care Act was the relationship between geographic variation and health care cost [19]. One of the main objectives of the law was universal health insurance coverage; however, concerns were raised over the additional costs of this coverage. The discussion included identifying geographic variation in costs and health outcomes (mortality) in order to reduce overuse of supply-sensitive services and to transform high-cost, ineffective regions into regions delivering low-cost, highly efficient, quality care [19]. Rather than relying on mortality alone to assess variation in care across six hospitals, Mangione and colleagues [31] examined survival after heart failure in relation to resource use and found higher survival rates among teaching hospitals with greater resource use. They concluded that "much more work is needed to truly distinguish inefficient from beneficial resource use" [31].
We acknowledge several limitations of this study. First, our findings may not generalize to health care systems whose structure, financial incentives, or provider training differ from those of MA; in particular, they may not generalize to health contracts covering other populations, such as Medicaid and commercial enrollees. Second, our sample consisted mostly of MA enrollees aged 75 years or younger; as a result, the findings may not be generalizable to people older than 75. In addition, several measures in the Medicare plan star ratings were revised in 2012; although CMS continues to update the ratings each year, we do not expect these changes to alter our major results in important ways. Third, our results would be strengthened by access to the individual data components that CMS aggregates into the star ratings; however, the 2010 individual component or domain data were not publicly available at the time this study was done. Future studies that examine how contract characteristics affect specific individual measures will be of great clinical value. Although such studies should also include the perspectives of both patients and medical providers, we focused on the star ratings in this study.
Evaluating quality at the aggregated level allows better "prioritization" of resources when comparisons are made at the county or state level. The results of this work highlight the need for better tools to quantify and explore what drives this substantial geographic variation.

Conclusion
The current health reform underway in the United States focuses on changing the behavior of individual providers to increase access to health services, improve quality of care, and decrease health care costs. The results of this paper suggest that the rationale for a geographic focus is strong, to the extent that the explained variation may reflect a number of factors, including system inefficiencies, local context, or diverse health care processes. The factors associated with population health in different geographic areas are largely local, rooted in the environmental, social, economic, and behavioral determinants of health. These findings should encourage CMS to consider stratification by geographic unit to allow fairer comparisons among plans. The pursuit of better quality of care should encourage more refined approaches, such as geospatial statistical methods that incorporate local information into the calculation of composite measures such as the star ratings.