The Charlson Comorbidity Index Can Be Used Prospectively to Identify Patients Who Will Incur High Future Costs

Background: Reducing health care costs requires the ability to identify patients most likely to incur high costs. Our objective was to evaluate the ability of the Charlson comorbidity score to predict which individuals would incur high costs in the subsequent year and to contrast its predictive ability with that of other commonly used predictors.

Methods: We contrasted the prior year Charlson comorbidity index, costs, Diagnostic Cost Group (DCG) and hospitalization as predictors of subsequent year costs, using claims data from a fund that provides comprehensive health benefits to a large union of health care workers. Total costs in the subsequent year were the principal outcome.

Results: Of the 181,764 predominantly Black and Latino beneficiaries, 70% were adults (mean age 45.7 years; 62% women). As the comorbidity index increased, total yearly costs increased significantly (P<.001). At lower comorbidity, costs were similar across different chronic diseases. In regressions predicting total costs and the top 5th and 10th percentiles of costs, the comorbidity index, prior costs and DCG achieved almost identical explained variance in both adults and children.

Conclusions and Relevance: The comorbidity index predicted health costs in the subsequent year, performing as well as prior cost and DCG in identifying those in the top 5% or 10%. The comorbidity index can be used prospectively to identify patients who are likely to incur high costs.

Trial Registration: ClinicalTrials.gov NCT01761253


The population
The population is drawn from a union of health and hospital workers in the Northeast whose self-insured trust fund administers comprehensive health benefits to the members, their spouses and their children. The fund covers all medically necessary hospital, medical, maternity, behavioral health and pharmacy services, and maintains its own claims data repository. There were 226,157 individuals who were eligible for benefits for at least 11 months between January and December 2009; of those, 185,294 (81.9%) remained eligible for at least 11 months from January to December 2010. Overall, 18.1% (n = 40,862) did not remain eligible, for reasons such as retirement or dependent children reaching the maximum age of eligibility. This analysis focuses on the 181,764 beneficiaries who were consistently eligible for benefits over at least 22 months.

Demographic and clinical data
Beneficiary age and gender were available. Comorbidity was assessed with the Charlson comorbidity index [19]. Each condition in the index carries a specific weight, and a patient's index is the sum of the weights of the conditions that patient has (e.g., a patient with depression, COPD and lymphoma would have an index of 4) (Table 1). Data on warfarin use were not available.
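A minimal sketch of the weighted summation just described. The weight table is an assumption chosen only so that the text's example (depression, COPD and lymphoma giving an index of 4) works out; it is not a reproduction of Table 1, and `ILLUSTRATIVE_WEIGHTS` and `charlson_index` are hypothetical names.

```python
# Illustrative weights only (assumed): chosen to reproduce the text's example,
# not copied from the study's Table 1.
ILLUSTRATIVE_WEIGHTS = {"depression": 1, "copd": 1, "lymphoma": 2}


def charlson_index(conditions, weights=ILLUSTRATIVE_WEIGHTS):
    """Sum the weights of the distinct conditions a patient has.

    A condition counts once however often it appears; conditions that are
    not in the weight table contribute nothing.
    """
    return sum(weights.get(c, 0) for c in set(conditions))


print(charlson_index(["depression", "copd", "lymphoma"]))  # 4
```

Production coding schemes usually also apply hierarchy rules (for example, counting only the more severe form of a condition recorded at two severity levels); this sketch omits them.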
The comorbidity index was assessed from claims data for services provided between January and December 2009; each claim had one primary ICD-9 diagnosis code and up to three secondary codes. The index was derived from these codes using the Deyo strategy [23]. The first-year index was computed from all diagnoses recorded on all claims during that year. Overall, 10.2% of beneficiaries had no claims in 2009; of those, 63% were adults and 37% children. Among adults with no claims in 2009, 53% also had none in 2010; among children, 43% of those with no claims in 2009 also had none in 2010. Costs were zero for those without claims. The prospective DCG model uses claims data and age, sex, diagnoses and their interactions to predict future costs; the 2009 prospective DCG was used to
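The claims-based step can be sketched as follows, assuming each claim is a record with one primary and up to three secondary ICD-9 codes. The prefix map is a deliberately tiny, hypothetical subset for illustration (the actual Deyo code lists [23] are longer), and the weights are likewise assumed. Conditions are collected across all of a patient's claims for the year, so a diagnosis recorded on many claims counts once; a patient with no claims gets no index (`None`), since nothing can be computed without claims.

```python
# Hypothetical, deliberately tiny subset of ICD-9 prefixes for illustration;
# the full Deyo code lists [23] are longer and should be used in practice.
ICD9_PREFIXES = {
    "chf": ("428",),
    "copd": ("490", "491", "492", "493", "494", "495", "496"),
    "lymphoma": ("200", "201", "202"),
}
# Illustrative weights (assumed, not copied from Table 1)
WEIGHTS = {"chf": 1, "copd": 1, "lymphoma": 2}


def conditions_from_claims(claims):
    """Collect distinct index conditions across all of a patient's claims."""
    found = set()
    for claim in claims:
        # One primary code plus at most three secondary codes per claim
        codes = [claim["primary"]] + list(claim.get("secondary", []))[:3]
        for code in codes:
            digits = code.replace(".", "")
            for condition, prefixes in ICD9_PREFIXES.items():
                if digits.startswith(prefixes):
                    found.add(condition)
    return found


def yearly_index(claims):
    """Index for one patient-year; None when the patient has no claims."""
    if not claims:
        return None
    return sum(WEIGHTS[c] for c in conditions_from_claims(claims))


claims_2009 = [
    {"primary": "428.0", "secondary": ["496", "401.9"]},
    {"primary": "201.90"},
    {"primary": "428.0"},  # repeated CHF claim: still counted once
]
print(yearly_index(claims_2009))
```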

Health care costs
Claims data were used to document the cost and utilization of services among these consistently eligible beneficiaries. The type and place of service were available. Prior year hospitalization was measured as the actual number of hospitalizations. The outcome was the total amount paid for all services, including inpatient, outpatient, emergency room, laboratory tests, behavioral health and prescription drugs, from January to December 2010.

Data Analysis
To compare costs for patients with a given chronic disease alone versus those with that disease and other comorbid diseases, an adjusted comorbidity index was calculated by subtracting the Charlson weight for that disease from the patient's total comorbidity score. For example, patients with congestive heart failure (CHF) who have an adjusted index of zero have only CHF, while those with an adjusted index of one or more have other illnesses as well. Four predictors of subsequent yearly costs (prior year costs, prior year hospitalization, prior year comorbidity and prior year DCG) were compared using four analytic approaches: a two-part regression modeling strategy often used in econometric analysis; quantile regression to predict the upper 5% and 10% of the cost distribution; logistic regression to predict whether a specific individual would be in the upper 5% or 10% cost strata, evaluated by positive predictive value; and receiver operating characteristic (ROC) analysis to evaluate sensitivity and specificity. Because the cost data were skewed both by high-cost patients and by those with zero costs, standard regression could not be used. Age, gender and mental health diagnosis were controlled for in all regressions. The claims contained no data on race/ethnicity; however, the beneficiaries are predominantly Black and Latino. The data did not include the specific location where beneficiaries received services.
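The adjusted index can be sketched as below, under the assumption that each patient is represented by the set of index conditions recorded in the prior year. The weights in the usage example are illustrative (not Table 1), and `adjusted_indices` is a hypothetical name.

```python
def adjusted_indices(patients, disease, weights):
    """Adjusted comorbidity index for every patient who has `disease`.

    The adjusted index is the patient's total index minus the weight of the
    disease itself, so 0 means the patient has that disease and nothing else
    that the index counts.
    """
    adjusted = []
    for conditions in patients:
        if disease in conditions:
            total = sum(weights.get(c, 0) for c in set(conditions))
            adjusted.append(total - weights[disease])
    return adjusted


# Illustrative weights (assumed) and three toy patients
weights = {"chf": 1, "copd": 1, "lymphoma": 2}
patients = [{"chf"}, {"chf", "copd", "lymphoma"}, {"copd"}]
print(adjusted_indices(patients, "chf", weights))
```

Patients without the disease are skipped, so each disease gets its own distribution of adjusted scores, which is what allows costs to be compared across diseases at the same level of additional comorbidity.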
A two-part mean regression framework for modeling total health care cost
The first part of the two-part model is a binary outcome model that distinguishes non-users (zero cost) from users of services (non-zero cost); the second part is a linear regression that describes the distribution of total health care cost among patients who used services (see File S1) [25]. As a diagnostic of the functional form for current and prior year (log) cost, we fit a nonparametric estimate of the relationship using a local polynomial smoother and found that a linear term was adequate. Higher-order terms added little to the model fit statistics, so we retained the linear term.
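The sketch below illustrates how a two-part model combines its pieces into an expected cost, E[cost | x] = Pr(cost > 0 | x) × E[cost | cost > 0, x]. It assumes a single predictor, a logistic first part fit by Newton-Raphson, and a second part fit by ordinary least squares on log cost among users, with Duan's smearing estimator to convert log-scale predictions back to dollars. The log-scale second part and the smearing step are assumptions for illustration (the study's actual specification is in File S1), as are all function names and the toy data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(xs, ys, iters=25):
    """Part 1: logistic regression of any use (0/1) on one predictor."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (information matrix)^-1 @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fit_ols(xs, ys):
    """Least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def fit_two_part(xs, costs):
    """Return a function mapping a predictor value to expected cost."""
    use = [1.0 if c > 0 else 0.0 for c in costs]
    a0, a1 = fit_logit(xs, use)
    users = [(x, math.log(c)) for x, c in zip(xs, costs) if c > 0]
    b0, b1 = fit_ols([x for x, _ in users], [ly for _, ly in users])
    # Duan smearing: average exp(residual) rescales exp(log-scale prediction)
    smear = sum(math.exp(ly - (b0 + b1 * x)) for x, ly in users) / len(users)

    def expected_cost(x):
        return sigmoid(a0 + a1 * x) * math.exp(b0 + b1 * x) * smear

    return expected_cost


# Toy data (not study data): predictor value and next-year cost
xs = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
costs = [0, 0, 400, 0, 900, 1100, 0, 2000, 2600, 0, 4000, 5000]
predict = fit_two_part(xs, costs)
for x in range(4):
    print(x, round(predict(x)))
```

Separating the zero-cost decision from the amount spent is what lets the model handle both the spike at zero and the long right tail that the text cites as the reason standard regression was unsuitable.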
Quantile regression was used to assess the relationship between predictors and the upper tail of the cost distribution, controlling for age, gender and mental health diagnoses [26]; because it focuses on the upper tail, those with zero costs in the lower tail of the distribution do not heavily affect the estimates. Model fit was measured by the pseudo-R² [27]. Fitting a quantile regression model involves minimizing a weighted sum of absolute deviations. The pseudo-R² in quantile regression is calculated as 1 − (sum of weighted deviations about the estimated quantile)/(sum of weighted deviations about the raw quantile) (see Koenker, R. 2005. Quantile Regression. New York: Cambridge University Press). We used this definition of pseudo-R² for each of the models. It is analogous to the classical $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ from multiple regression, where SSE is the error sum of squares and SST the total sum of squares.
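The pseudo-R² above can be computed from the check loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$, the weighted absolute deviation that quantile regression minimizes. The sketch assumes the fitted conditional quantiles are already available (for example from a quantile regression routine such as statsmodels' `QuantReg`); it computes only the fit statistic, and the function names and toy data are hypothetical.

```python
import math


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile(y, tau):
    """A tau-th sample quantile: a constant minimizing total check loss."""
    ordered = sorted(y)
    return ordered[max(0, math.ceil(tau * len(ordered)) - 1)]


def pseudo_r2(y, fitted, tau):
    """1 - (loss about fitted quantiles) / (loss about the raw quantile)."""
    v_model = sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))
    q = sample_quantile(y, tau)
    v_null = sum(check_loss(yi - q, tau) for yi in y)
    return 1.0 - v_model / v_null


# Toy yearly costs (not study data), evaluated at the 90th percentile
y = [100, 250, 300, 450, 600, 800, 1200, 2000, 5000, 9000]
print(pseudo_r2(y, y, 0.9))                          # perfect fit
print(pseudo_r2(y, [sample_quantile(y, 0.9)] * 10, 0.9))  # no better than raw quantile
```

As with the classical R², a value of 0 means the predictors do no better than the raw quantile and 1 means the fitted quantiles leave no weighted deviation at all.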
Positive predictive value was assessed with logistic regression models that predicted whether an individual would fall into the top 5% or 10% cost groups [28]. From a 50% training set, 1000 new training sets were generated by sampling uniformly and with replacement (the bootstrap samples); models were fit to each bootstrap sample and their predictions combined to classify individuals into the top 5% and 10% groups. This bagging (bootstrap aggregation) procedure was used to generate positive predictive values [29].
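A small sketch of the bagging scheme, assuming a single predictor (for example the prior year comorbidity index) and a 0/1 label for membership in a top cost group. It splits the data in half, draws bootstrap samples from the training half, fits a one-predictor logistic regression to each, averages the predicted probabilities on the held-out half, flags the patients with the highest averaged scores and reports the share of flagged patients who truly were high cost (the PPV). Using the other half as the evaluation set, flagging the top `flag_share` of scores, and all function names are assumptions for illustration; the study's models also adjusted for age, gender and mental health diagnosis, which this sketch omits.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(xs, ys, iters=10):
    """One-predictor logistic regression by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # (near-)separable sample: stop rather than diverge
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def bagged_ppv(xs, high_cost, n_boot=1000, flag_share=0.10, seed=0):
    """PPV on a held-out half of bagged logistic-regression predictions."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    scores = [0.0] * len(test)
    for _ in range(n_boot):
        boot = [rng.choice(train) for _ in train]  # uniform, with replacement
        b0, b1 = fit_logit([xs[i] for i in boot], [high_cost[i] for i in boot])
        for j, i in enumerate(test):
            scores[j] += sigmoid(b0 + b1 * xs[i]) / n_boot  # running average
    # Flag the held-out patients with the highest bagged scores
    n_flag = max(1, round(flag_share * len(test)))
    ranked = sorted(range(len(test)), key=lambda j: scores[j], reverse=True)
    true_pos = sum(high_cost[test[j]] for j in ranked[:n_flag])
    return true_pos / n_flag
```

With a discrete predictor such as the comorbidity index many patients share a score, so which tied patients are flagged depends on their order; a fuller analysis would handle ties explicitly.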
Receiver operating characteristic (ROC) analysis evaluated the sensitivity and specificity of prior year costs, prior year comorbidity, prior year hospitalizations and prior year DCG, across a range of thresholds, for predicting membership in the top 5% and 10% cost groups [30]. The area under the curve (AUC) was used to compare these classifiers.
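The AUC can be computed without tracing the curve, using its equivalence with the Mann-Whitney statistic: it equals the probability that a randomly chosen high-cost patient has a higher predictor value than a randomly chosen other patient, with ties counted as one half. Tie handling matters here because the comorbidity index and hospitalization counts take only a few distinct values. A minimal sketch; the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for a patient in the top cost group, 0 otherwise.
    Tied scores receive their average rank, so each tie counts 1/2.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy example: prior year index vs. top-group membership (not study data)
print(auc([0, 1, 1, 2], [0, 0, 1, 1]))
```

The rank-based form runs in O(n log n), which matters for a population of this size, where comparing every high-cost patient with every other patient directly would be slow.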

Results
Overall, 69.3% of the predominantly Black and Latino beneficiaries were adults (mean age 45.7±12.3 years; 62.3% women), while 30.7% were children under 18 years of age (3.9% under one year; 21.3%, 1-6 years; 34.0%, 6-11 years; and 40.8%, 12-17 years). Of the adults, 75.3% were union members; 20.5%, their spouses; and 4.2%, their covered adult children (mean age 19.5±1.2 years). Of the beneficiaries, 99.2% live in the New York tristate area, where they receive care through many different physicians, practices and hospitals. Table 2 shows the beneficiaries, adults or children, according to the prior year comorbidity index and their subsequent adjusted yearly costs. (Table S1 in File S1 shows the unadjusted costs; it also shows that the results were monotonic and did not depend on specific groupings.) Overall, 89.0% of the costs were incurred by adults and 11.0% by children. Adults had a mean yearly cost of $4,371 and children a mean cost of $1,324 (median costs $1,879 and $533, respectively).
The total health care costs in 2010 were $670 million. Beneficiaries with a comorbidity index ≥4 constituted only 6.5% of the population, but accounted for $154 million, or 23.0% of total 2010 costs. As expected, adults and children differed significantly in comorbidity (p<.001); only 0.2% of children had a comorbidity index ≥4 in comparison to 9.2% of the adults.
Comorbidity and health care costs
Figure 1 shows the distribution of beneficiaries according to the adjusted comorbidity index for the specific chronic diseases. Those with an adjusted comorbidity index of zero have only the specific chronic disease. Regardless of specific chronic disease, most patients had low comorbidity. Only a small proportion of patients with any specific chronic disease had high comorbidity. (See Table S2 in File S1 for numbers of patients in each category.) Figure 2 shows the adjusted comorbidity index and total costs. As the index increased, total yearly cost increased significantly, regardless of the specific chronic disease (P<.001). At lower comorbidity, the costs did not differ significantly for different chronic diseases.

Comorbidity and hospitalization
Overall, 6.4% of adults and 2.0% of children were hospitalized. Table 3 shows hospitalization rates according to comorbidity. As the comorbidity index increased, the proportion of beneficiaries hospitalized once, twice or three or more times increased significantly (P<.001). Patients with higher comorbidity had more hospitalizations (p<.001) and more repeated hospitalizations (p<.001). Table 3 also shows the costs for beneficiaries according to the comorbidity index and the number of hospitalizations. Costs steadily rose with increasing comorbidity index (p<.001) and increasing hospitalizations (p<.001).

Comorbidity and mental/behavioral health
The only mental or behavioral health conditions in the comorbidity index are depression and dementia. Of the beneficiaries, 0.9% had dementia, and only 2.9%

[Figure 2: Total yearly costs by adjusted comorbidity index for each chronic disease. The y-axis shows total yearly costs for patients with that disease at each comorbidity level; the x-axis shows the adjusted comorbidity index (the patient's index minus the weight of the stated disease), so a patient with an adjusted comorbidity index of 0 has only that chronic disease.]

Predictive modeling
Table 4 shows predictions of 2010 total expenditure from regressions that included prior year costs, the prior year comorbidity index, prior year hospitalization or the DCG prospective risk score. Total costs in 2009 were $4,256

Overall, 9.7% of beneficiaries had zero costs in 2010. Those with lower comorbidity (p<.001), those without a mental health diagnosis (p<.001), men (p<.001), and those with zero costs in 2009 (p<.001) were more likely to have zero costs in 2010. In adults, prior year costs explained 31% of the variance, while the comorbidity index and prospective DCG each explained 20%; prior hospitalization explained only 11%. In children, prior costs explained 20% of the variance and the comorbidity index 17%, while prospective DCG and prior hospitalization each explained 7%. Each model in Table 4 is controlled for age, gender and mental health diagnosis. Random effects for zip code and fixed effects for employer type were not significant and were omitted from the regression models in Table 4.

Table 5 shows the results of quantile regressions for predicting the top 5th and 10th percentiles of costs for both adults and children. Prior year costs, prior year comorbidity, prior year DCG and prior year hospitalizations were each evaluated as predictors of the upper 5% and upper 10% of subsequent (2010) costs in separate models controlling for age, gender and mental health diagnosis.
In adults, the comorbidity index was equivalent to DCG and prior cost in predicting the top 5% and 10% of costs, while prior hospitalization had much lower ability to identify such patients. In children, the comorbidity index, prior costs and DCG also had similar predictive ability for the top 5% and 10% of costs. Table S3 in File S1 shows the percentage of patients in each prior year comorbidity rank who fell into the upper 0.5%, 1.0%, 2.0%, 5.0% and 10% of subsequent year costs. Because costs rise monotonically with increasing comorbidity, the findings are not contingent on a specific cutoff.

Another approach to classification assessed how likely an individual was to be in the top 5% or 10% of 2010 costs based on prior year characteristics. Table S4 in File S1 shows the bootstrapped positive predictive values of prior year costs, prior year comorbidity, prior year hospitalizations and prior year DCG for predicting membership in the top 5% and 10% of 2010 costs, controlling for age, gender and mental health diagnosis. Prior year comorbidity had a slightly higher positive predictive value than prior hospitalization for identifying those who would enter the top 5% and 10% of subsequent year costs, and both were better than prior year costs and prior year DCG score. The positive predictive values were relatively stable across the top 5% and 10% cost groups. In the ROC analysis, the comorbidity index achieved almost the same area under the curve as prospective DCG and prior cost in predicting the top 5% and 10% of subsequent costs; these three measures were slightly better than prior year hospitalization. The small discrepancy between the positive predictive value and area under the curve findings arises because positive predictive value depends on the prevalence of high costs as well as on sensitivity and specificity, whereas the ROC curve depends only on sensitivity and specificity.
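The prevalence point can be made concrete with Bayes' rule: PPV = sensitivity × prevalence / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]. In the sketch below the operating point (sensitivity 0.6, specificity 0.9) is hypothetical, not a result from Table S4; holding it fixed while moving from a top-5% to a top-10% outcome changes the PPV, whereas the ROC curve, which plots sensitivity against 1 − specificity, is unaffected.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, held fixed while prevalence changes
for prevalence in (0.05, 0.10):
    print(prevalence, round(ppv(0.6, 0.9, prevalence), 3))
```

With these assumed values the PPV rises from 0.24 at 5% prevalence to 0.40 at 10%, although sensitivity, specificity and hence the AUC are identical in both cases.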
Each model in Table 5 is controlled for age, gender and mental health diagnosis. Again, in this set of regression models, random effects for zip code and fixed effects for employer type were not significant and were omitted from the regression models in Table 5.

Discussion
In this study, the burden of multiple chronic diseases was operationally defined using the Charlson comorbidity index. Cited in more than 9,500 publications, the comorbidity index is the most extensively validated measure of the prognostic impact of multiple chronic illnesses. The prior year comorbidity index was compared with prior year cost, prior year DCG and prior year hospitalization as a predictor of subsequent year costs using two-part regression and quantile regression as well as several classification approaches.
With two-part regression, the comorbidity index explained 20% of the overall variance in adults, vs. 31% for the benchmark prior year costs, 20% for DCG and 11% for prior hospitalizations. In children, the comorbidity index did almost as well as prior cost (17% vs. 20%) and better than DCG (7%) or hospitalizations (7%). A similar analysis evaluating predictors of costs among VA patients enrolled in an ambulatory care quality improvement trial found adjusted R² values of 4.9% for prior year costs, 7.2% for DCG and 3.6% for the comorbidity index; however, the explained variance was low because costs were estimated based on units of service (visits, hospitalizations) rather than on actual expenditures [31]. In the quantile regression analysis, for both adults and children, the comorbidity index did slightly less well than prior year costs and as well as or better than DCG in predicting the crucially important highest-cost patients, represented by the upper 5th and 10th percentiles. One study that used Medical Expenditure Panel Survey data to predict the top 10% of costs found a pseudo-R² for DCG of 20%, identical to that in our study [32]. A recent evaluation of adults in the upper 10% of costs who receive care from the Mayo Employee and Community Health practice found that the CMS-HCC and the Charlson comorbidity index had identical explanatory power as assessed by the C statistic [33]. In both of these analyses, the amount of explained variation, while relatively low, is typical of models predicting yearly costs, including those currently used by CMS [20].
Two classification approaches, positive predictive value and ROC analysis, were also used to classify patients according to their likely membership in high-cost groups. Prior year comorbidity had a slightly higher positive predictive value than prior hospitalization, and both were better than prior costs and prior DCG score. In the ROC analysis, the comorbidity index achieved almost the same area under the curve as prospective DCG and prior cost in predicting the top 5% and 10% of subsequent costs.

Limitations
These results were obtained from analysis of a specific population and cannot be directly compared with the explanatory power of other published cost models, in part because any analysis of costs, including this one, encompasses differentials in provider pricing. However, contrasting the different models in this population provides a context for evaluating the ability of prior comorbidity, prior cost, prospective DCG and prior hospitalizations to predict subsequent costs in the same population. The comorbidity index could not be calculated for the 10% who did not have claims in the first year.
In addition, the comorbidity index has not been independently validated in children, although it has been used in some pediatric studies [34][35][36]. The comorbidity index does not capture a variety of conditions unique to pediatrics. Although children who have illnesses that are in the index do accumulate higher costs, further modification would be required to adapt it to the chronic conditions unique to children. It should also be noted that supply-side factors were not considered in this analysis.

The importance of the findings
To be successful in reducing health care expenditures, interventions have to identify the patients who are most likely to have sustained high costs. If not targeted precisely toward such patients, interventions will cost more than they will save. This was the experience with disease management interventions [37][38][39][40]. Studies have shown that both prior costs and prior hospitalization overestimate subsequent health care utilization and neither reliably identifies those who have stable high costs from year to year [3,6]. Likely as a result, intervention programs that have targeted beneficiaries using these definitions have not demonstrated savings [3,41]. Thus, to date, most interventions have not been successful in reducing costs in patients with multiple chronic diseases. Our hypothesis is that they did not efficiently target the patients at highest risk of high future costs.
Focusing on adults, a comorbidity index ≥5 identifies 5% of patients, who will incur 17.5% of costs, while a comorbidity index ≥4 identifies 9.3% of patients, who will incur 25.6% of costs. A recent study using the VA criteria found that 32% of patients had 65% of costs [42]. The issue is the tradeoff between the percentage of patients targeted and the percentage of future costs that could potentially be reduced.
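The tradeoff can be tabulated for any cutoff by computing, for patients at or above a given prior year index, their share of the population and their share of subsequent costs. A minimal sketch with made-up toy data (not study data); the function name is hypothetical.

```python
def threshold_tradeoff(indices, costs, threshold):
    """Return (share of patients, share of total costs) for index >= threshold."""
    total_cost = sum(costs)
    selected = [c for i, c in zip(indices, costs) if i >= threshold]
    return len(selected) / len(indices), sum(selected) / total_cost


# Toy data: prior year comorbidity index and subsequent year cost per patient
indices = [0, 0, 1, 2, 4, 5]
costs = [100, 100, 200, 300, 1300, 3000]
for cutoff in (4, 5):
    patients_share, cost_share = threshold_tradeoff(indices, costs, cutoff)
    print(f"index >= {cutoff}: {patients_share:.1%} of patients, {cost_share:.1%} of costs")
```

Lowering the cutoff enlarges the targeted group and the costs it covers at the same time, which is exactly the choice a program designer has to weigh against the cost of the intervention per patient.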
This study used the comorbidity index calculated for patients from claims data from the prior year to predict subsequent year costs and contrasted it with commonly used predictors of cost including prior hospitalization, prior costs, and DCG. Prior costs and prior DCG usually require 12 months of claims data, which generally creates a lag time of 6-18 months [43]. Accordingly, such data cannot be used to identify risk for high costs among patients for whom health plans or employers have no prior utilization data, including those who may gain insurance under the Affordable Care Act. Prior hospitalization might be ascertainable at the time of enrollment but clearly affords less explanatory power.
Importantly, the Charlson comorbidity index can also be documented prospectively by a short questionnaire as well as through claims or chart data. For more than 20 years, we and others have ascertained the comorbidity index in less than 10 minutes through interview in person or by telephone [44,45].

Conclusions
Among 181,764 beneficiaries, the Charlson comorbidity index from a prior year predicted high health costs in the subsequent year. The index performed as well as prior cost or DCG in identifying those who would have higher costs, and those who would fall in the top 5% or 10% of costs in both adults and children. The comorbidity index also predicted subsequent hospitalization and repeated hospitalization, the largest component of costs. The comorbidity index provides a reproducible and prospectively applicable method of identifying patients at high risk of incurring high subsequent costs. As a reproducible, clinically validated and widely used method that has commensurate face validity for clinicians, the Charlson comorbidity index shows where the cost curve bends [46].

Supporting Information
Data S1. doi:10.1371/journal.pone.0112479.s001 (ZIP)
File S1. Contains the description of the two-part regression. Table S1, Yearly costs for adults and children according to each rank of the comorbidity index. Table S2, Number of patients with specific chronic diseases according to the adjusted comorbidity score.