The Drug Derived Complexity Index (DDCI) Predicts Mortality, Unplanned Hospitalization and Hospital Readmissions at the Population Level

Objective
To develop and validate the Drug Derived Complexity Index (DDCI), a predictive model derived from drug prescriptions able to stratify the general population according to the risk of death, unplanned hospital admission, and readmission, and to compare the new predictive index with the Charlson Comorbidity Index (CCI).

Design
Population-based cohort study, using a record-linkage analysis of prescription databases, hospital discharge records, and the civil registry. The predictive model was developed based on prescription patterns indicative of chronic diseases, using a random sample of 50% of the population. Multivariate Cox proportional hazards regression was used to assess the weights of different prescription patterns and drug classes. The predictive properties of the DDCI were confirmed in the validation cohort, represented by the other half of the population. The performance of the DDCI was compared to that of the CCI in terms of calibration, discrimination and reclassification.

Setting
6 local health authorities with 2.0 million citizens aged 40 years or above.

Results
One-year and overall mortality rates, unplanned hospitalization rates and hospital readmission rates progressively increased with increasing DDCI score. In the overall population, the model including age, gender and DDCI showed a high performance. DDCI predicted 1-year mortality, overall mortality and unplanned hospitalization with an accuracy of 0.851, 0.835, and 0.584, respectively. Compared to the CCI, DDCI showed very similar discrimination and reclassification properties, and improved prediction when used in combination with the CCI.

Conclusions and Relevance
DDCI is a reliable prognostic index, able to stratify the entire population into homogeneous risk groups. DDCI can represent a useful tool for risk adjustment, policy planning, and the identification of patients needing a focused approach in everyday practice.


What is new?
Administrative health databases can be used to obtain algorithms useful to forecast readmission and reduce in-care cost. Validated comorbidity indexes (such as the Charlson Comorbidity Index) have been applied to hospitalization data to predict the risk of death or readmission, but these models do not allow the outpatient risk profile to be defined. More complex predictive models were obtained to overcome this limitation through the integration of several data sources, including outpatient data, accident and emergency data, electronic clinical data from general practitioners, socio-economic data, and community dispensed prescriptions. Unfortunately, the different databases required are not always available and/or standardized. Our data show that a much simpler scoring system, solely based on drug prescriptions, can accurately predict one-year and long-term mortality, as well as the risk of unplanned hospitalization and hospital readmission.

Background
Healthcare utilization, unnecessary care and health care spending increase linearly with the number of chronic conditions affecting an individual. In the U.S., the 25% of the population with multiple chronic conditions accounts for two-thirds of total health care spending [1,2].
An accurate prediction of the risk of poor outcomes in individuals with multiple comorbidities would allow health care professionals to focus on patients who are at highest risk of hospital readmissions, inappropriate care, elevated healthcare costs, and mortality. Stratifying patients according to risk can help identify candidates for an appropriate intervention in order to improve health outcomes, allocate resources more efficiently, reduce costs and facilitate better planning. As an example, several studies have shown that focused care after discharge can decrease the risk of readmission to hospital [3][4][5][6][7][8]. Several predictive models have been developed, mainly based on clinical data, hospital discharge data, or validated comorbidity indexes [7,9,10]. The main limitation of these tools is the difficulty of applying them at the population level, and not only to individuals admitted to hospital or undergoing ad hoc assessments.
An alternative approach is the use of drug prescription data, taking the chronic use of specific classes of drugs as a proxy of chronic diseases and an expression of healthcare complexity. The possibility of using prescription data as indicators of underlying diseases has been explored in many clinical contexts [11][12][13][14][15], and their use to define the clinical risk profile represents the evolution of this process. We developed and validated the Drug Derived Complexity Index (DDCI), assessed whether it was able to stratify the general population according to the risk of death, unplanned hospital admission, and readmission, and compared it with the Charlson Comorbidity Index in terms of discrimination and reclassification.

Research design and methods
We conducted a population-based retrospective cohort study, using a record-linkage analysis of prescription databases, hospital discharge records, and the civil registry, including data on the population aged 40 years or over of the Puglia region in Italy (approximately 2 million out of 4.1 million citizens in 6 local health authorities).

Data sources
All Italian citizens have equal access to health care services and are cared for by a general practitioner as part of the National Health System (NHS). With the sole exception of some drugs delivered directly by hospital pharmacies (biological agents, some anticancer drugs), prescription databases provide information on all community prescriptions reimbursed by the NHS, with drugs coded according to the Anatomical Therapeutic Chemical (ATC) classification system [16]. Hospital discharge records include information about primary diagnoses and up to five co-existing conditions, performed procedures, and in-hospital death. All diagnoses are coded according to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) [17]. The civil registry provides information on age, sex, and death or migration.
The reliability of data sources and their linkage to produce epidemiological information have been previously described [18,19]. All security and protection measures for data from patients were performed according to the national law [20]. Data were obtained from the regional health authority in Puglia, Italy, providing data on all residents and were not generated or collected for this study. Data protection was ensured by the Healthcare Agency of Puglia. All data were anonymized prior to being accessed by the authors and none of the authors were involved in data anonymization. In Italy no ethical approval is required for aggregated-anonymous data.

Baseline risk factors
Data from January 1st 2003 to December 31st 2010 were used to develop and validate the DDCI. A fixed cohort of all residents aged 40 years or above on January 1st 2004 was identified from the civil registry of the Puglia region. The index date was January 1st 2004 for all citizens registered at the local health authority and alive at that time. The year of observation prior to the index date was used to define baseline characteristics. Patients were followed up from their index date to the earliest of death, migration, or the end of the study period. Prescription patterns indicative of chronic diseases (PPCD) and chronic exposure to drugs were derived from prescription databases.
Charlson Comorbidity Index (CCI) was calculated based on the diagnoses contained in hospital discharge databases.

Outcome variables
The main outcome was overall mortality. The first unplanned hospital admission occurring after the index date, hospital readmissions, and 1-year mortality were considered as secondary outcomes. Overall survival was defined as the time between the index date and death. For subjects who did not die, survival time was censored at the end of the follow-up period or the date of leaving the region. The time horizon for risk prediction was set at 7 years.
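The follow-up and censoring rule described above can be sketched in a few lines. This is only an illustrative sketch: the function name `survival_time` and the choice of days as the time unit are assumptions, while the index date and the end of the data window are taken from the text.

```python
from datetime import date
from typing import Optional, Tuple

INDEX_DATE = date(2004, 1, 1)    # index date stated in the text
STUDY_END = date(2010, 12, 31)   # end of the data window stated in the text


def survival_time(death: Optional[date], migration: Optional[date]) -> Tuple[int, bool]:
    """Return (days of follow-up, death observed) for one subject.

    Follow-up ends at the earliest of death, migration, or study end.
    Subjects who did not die during follow-up are censored.
    """
    end = STUDY_END
    if migration is not None and migration < end:
        end = migration
    died = death is not None and death <= end
    if died:
        end = death
    return (end - INDEX_DATE).days, died
```

For instance, a subject who left the region in 2006 and died elsewhere in 2009 contributes follow-up only until the migration date and is treated as censored.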

Study design
To control the accuracy of predictions and to increase the reliability of all statistical analyses, the whole population was divided into 2 random samples (Fig 1):
• Training set, including 50% of subjects, in which different PPCDs and drug classes were included in a predictive model to develop the DDCI;
• Validation set, including the other half of the population, in which the predictive properties of the DDCI were confirmed.
In addition, two cohorts were selected from the validation set:
1. a first cohort of randomly selected residents, in which the discrimination and reclassification power of DDCI were assessed;
2. a second cohort of all subjects hospitalized during 2003, in which the performance of DDCI was compared to that of the Charlson Comorbidity Index.
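A random split into training and validation sets can be sketched as follows. The paper does not describe the exact randomization procedure; the per-subject coin flip, the function name `split_population` and the `seed` parameter are assumptions made for illustration. A coin flip yields two samples of nearly, but not exactly, equal size.

```python
import random
from typing import Iterable, List, Tuple


def split_population(subject_ids: Iterable[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Assign each subject to the training or validation set with probability 0.5."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    training: List[int] = []
    validation: List[int] = []
    for sid in subject_ids:
        # Independent coin flip per subject (assumed procedure, not from the paper)
        if rng.random() < 0.5:
            training.append(sid)
        else:
            validation.append(sid)
    return training, validation
```

Fixing the seed makes the assignment reproducible, which matters when the same split has to be regenerated for later analyses.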

Model building and statistical analysis
Patients' baseline characteristics are reported as frequency (percentage) and mean ± standard deviation (SD). A multivariate Cox proportional hazards regression model including all the PPCDs and drugs to which patients were chronically exposed at baseline was fitted to identify predictors of overall mortality. All drug classes significantly associated with mortality risk were included in the final model. The weight assigned to each drug class was obtained by dividing its regression coefficient by 0.3 and rounding the result to the nearest integer, as proposed by Gagne et al. [21]. The DDCI score was the sum of the weights.
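The weighting rule above (coefficient divided by 0.3, rounded to the nearest integer, then summed) can be illustrated with a short sketch. The coefficient values below are hypothetical placeholders and are not the values of Table 2; the function names are also assumptions.

```python
from typing import Dict, Iterable

# Hypothetical Cox regression coefficients, NOT the values reported in Table 2.
EXAMPLE_COEFFICIENTS: Dict[str, float] = {
    "opioids": 0.92,
    "nsaids": 0.10,
    "lipid_modifying_agents": -0.25,
}


def weight_from_coefficient(beta: float, divisor: float = 0.3) -> int:
    """Divide the coefficient by 0.3 and round to the nearest integer.

    Python's round() sends exact .5 ties to the even integer; how ties
    were handled in the paper is not stated, so this is an assumption.
    """
    return round(beta / divisor)


def ddci_score(exposures: Iterable[str], weights: Dict[str, int]) -> int:
    """Sum the weights of the drug classes a subject is chronically exposed to."""
    return sum(weights.get(drug, 0) for drug in exposures)


WEIGHTS = {drug: weight_from_coefficient(b) for drug, b in EXAMPLE_COEFFICIENTS.items()}
```

With these placeholder coefficients, a small positive coefficient rounds to a weight of 0 and a negative coefficient yields a negative weight, which mirrors the two situations described later for the real model.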
Overall mortality and time-to-first unplanned hospitalization analyses were performed using multivariate Cox proportional hazards regression models, and risks were reported as hazard ratios (HRs) with their 95% confidence intervals (95% CI). Survival curves and probabilities were estimated according to the Kaplan-Meier method. We evaluated the performance of the score in terms of discrimination and reclassification, with mortality and first unplanned hospitalization as outcomes. A model including age and sex was considered the reference model to which DDCI was added. In hospitalized subjects selected from the validation sample the same age and sex based reference model was used. Models adding separately the Charlson Comorbidity Index (CCI) or the DDCI were tested. The final model included age, sex and both CCI and DDCI. Accurate predictions discriminate between those with and those without the outcome. Discrimination power was assessed by estimating survival C indices with their 95% CI [22]. Reclassification extends the concept of discrimination by evaluating separately the subjects with and without the outcome. The interpretation is opposite in the two groups: for subjects with the outcome, reassignment to a higher risk is an improvement, whereas for subjects without the outcome, reassignment to a lower risk is an improvement. The proportion of events correctly reclassified and the proportion of non-events correctly reclassified are summed. This sum is the Net Reclassification Improvement (NRI) [22,23]. Subsequently, indices of discrimination and reclassification of the models were compared with those of the reference model (adjusted only for age and sex). Readmissions to hospital were estimated as incidence rates (IRs; number of hospital readmissions per person-year). A Poisson regression model was applied to estimate incidence rate ratios (IRRs) for individuals in the different DDCI score classes, taking the lowest class as the reference category. All statistical analyses were performed using SAS Software Release 9.4 (SAS Institute, Cary, NC).
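The reclassification logic can be made concrete with a minimal sketch of a category-free NRI for a binary outcome. It ignores censoring, so it is a simplification and not necessarily the survival-adapted estimator of references [22,23]; the function name and the example values are assumptions.

```python
from typing import Sequence, Tuple


def net_reclassification_improvement(
    old_risk: Sequence[float],
    new_risk: Sequence[float],
    event: Sequence[bool],
) -> Tuple[float, float, float]:
    """Return (NRI among events, NRI among non-events, total NRI).

    Events: net proportion moved to a higher predicted risk.
    Non-events: net proportion moved to a lower predicted risk.
    """
    ev_up = ev_down = ne_up = ne_down = n_ev = n_ne = 0
    for old, new, had_event in zip(old_risk, new_risk, event):
        if had_event:
            n_ev += 1
            ev_up += new > old
            ev_down += new < old
        else:
            n_ne += 1
            ne_up += new > old
            ne_down += new < old
    nri_events = (ev_up - ev_down) / n_ev
    nri_non_events = (ne_down - ne_up) / n_ne
    return nri_events, nri_non_events, nri_events + nri_non_events


# Toy data (hypothetical): risks from a reference model and from an extended model
result = net_reclassification_improvement(
    old_risk=[0.2, 0.4, 0.3, 0.5],
    new_risk=[0.5, 0.6, 0.1, 0.6],
    event=[True, True, False, False],
)
```

In the toy data both subjects with the event move to a higher risk, while the two subjects without the event move in opposite directions and cancel out.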

Results
Overall, a cohort of 1,998,948 subjects aged ≥ 40 years (mean age 60.17 ± 13.57 years, males 46.29%) was identified. The mean follow-up period was 6.62 ± 1.28 years; 12.53% of subjects had experienced at least 1 hospitalization for any cause and 76.25% had at least one drug prescription in the 12 months prior to the index date. The training and the validation data sets included 999,391 and 999,557 subjects, respectively (Fig 1). During 7 years, 106,664 deaths were registered in the training set and 106,590 in the validation set, corresponding to a cumulative mortality proportion of 10.67% and 10.66%, respectively. There were no statistically significant differences between the two groups in terms of age, gender, previous hospitalization, chronic exposure to the different drug types, and mortality rates (Table 1).

Development of DDCI in training sample
In the training set, time-to-death analysis was used to assess the weights of different PPCDs and drug classes. All the classes of drugs not significantly associated with overall mortality were excluded from the final model. Among the drug classes included in the final model, some (such as "antidepressants" and "nonsteroidal anti-inflammatory drugs") contributed little to mortality risk and received a weight of 0. Opioids were the drugs most strongly related to mortality risk. The use of lipid modifying agents and immunosuppressants was associated with a risk of death lower than that of the reference category, represented by individuals not taking any of the drugs considered, and therefore a negative weight was assigned to them. The regression coefficient values calculated on overall mortality and the weight assigned to each drug class of the best-in-class model are shown in Table 2. DDCI was obtained through the algebraic sum of the weights of the different drug classes. The DDCI score ranges between -3 and 33. Due to the small number of cases, subjects with a negative DDCI score were incorporated into the lowest risk class, represented by individuals with a score of 0; similarly, the upper class contains values of 11 or greater. Therefore, 12 classes of DDCI score were eventually identified.
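The collapsing of raw scores into 12 classes described above amounts to clamping the score to the interval from 0 to 11. A minimal sketch, with `ddci_class` as a hypothetical function name:

```python
def ddci_class(score: int) -> int:
    """Map a raw DDCI score to one of the 12 risk classes (0 to 11).

    Negative scores are merged into class 0; scores of 11 or more
    are merged into class 11.
    """
    return max(0, min(score, 11))


# The raw score range reported in the text is -3 to 33
classes = sorted({ddci_class(s) for s in range(-3, 34)})
```

Applied to every value in the reported range, the mapping yields exactly the 12 classes 0 through 11.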

Results of DDCI application on validation set
A statistically significant increase in hazard ratio values with increasing DDCI score was documented in the validation cohort, closely reproducing the results obtained in the training sample (Table 3). In particular, overall mortality was below 5% in the lowest risk group, while it exceeded 70% in the highest risk group. Kaplan-Meier survival curves according to DDCI score values are reported in Fig 2. DDCI was also able to predict time to first unplanned hospitalization; Kaplan-Meier curves showed a progressive increase in unplanned hospital admission risk with increasing DDCI score (Fig 2, Panel B). Among the 485,445 subjects with a first unplanned hospitalization in the follow-up period, the DDCI score also predicted hospital readmission. The age and gender adjusted risk of hospital readmission during the period of observation increased with the DDCI score, and the highest risk group had an incidence rate ratio of hospital readmission per person-year equal to 5.62 (5.48-5.66) when compared to the lowest risk class (Table 4).
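To make the incidence rate ratio concrete, the sketch below computes a crude (unadjusted) IRR with a Wald 95% confidence interval on the log scale. It is not the age and gender adjusted Poisson regression used in the study, and the counts and person-years are hypothetical.

```python
import math
from typing import Tuple


def incidence_rate_ratio(
    events: int, person_years: float, events_ref: int, person_years_ref: float
) -> Tuple[float, float, float]:
    """Return (IRR, lower 95% limit, upper 95% limit) versus the reference class."""
    rate = events / person_years              # readmissions per person-year
    rate_ref = events_ref / person_years_ref
    irr = rate / rate_ref
    # Standard error of log(IRR) for two independent Poisson counts
    se_log = math.sqrt(1 / events + 1 / events_ref)
    lower = irr * math.exp(-1.96 * se_log)
    upper = irr * math.exp(1.96 * se_log)
    return irr, lower, upper


# Hypothetical counts: highest DDCI class versus lowest class
irr, lower, upper = incidence_rate_ratio(300, 1000.0, 100, 1000.0)
```

An adjusted estimate, as reported in Table 4, would instead come from a Poisson model with person-years as the offset and age and gender as covariates.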
The evaluation of the accuracy, discrimination and reclassification capacity of DDCI was performed for the residents of the largest local health authority (N = 306,016) (Table 5, Model A). As for 1-year mortality, the estimated survival C index showed that the discrimination power of the model with age, sex and DDCI was slightly better than that of the model with only age and sex (0.851 [0.846-0.856] vs 0.815 [0.809-0.820]). In other words, given a comparable pair of subjects, the model with age, sex and DDCI had a probability of 85.1% of assigning the higher predicted risk to the subject who died earlier. As for overall mortality, the discrimination power of the model including DDCI was slightly lower (0.835 [0.83-0.837]) than that of the analogous model for 1-year mortality. A slight improvement in terms of discrimination masked a much greater improvement in terms of reclassification. In fact, the  compared to overall mortality and first unplanned hospitalization (with respect to overall mortality and first unplanned hospitalization).
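The interpretation of the C index as a probability of correct ranking can be illustrated with a small sketch of Harrell's concordance index. The estimator used in the study (reference [22]) may differ in how it handles censoring and ties, so this is only an assumed, simplified version with toy data.

```python
from typing import Sequence


def concordance_index(
    time: Sequence[float], event: Sequence[bool], risk: Sequence[float]
) -> float:
    """Harrell's C: share of comparable pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when subject i has an observed event
    before subject j's follow-up time. Tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier event of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical): 4 subjects, the third one censored
c_index = concordance_index(
    time=[1, 2, 3, 4],
    event=[True, True, False, True],
    risk=[0.9, 0.7, 0.8, 0.1],
)
```

In the toy data 4 of the 5 comparable pairs are ranked correctly, so the index is 0.8. The double loop is quadratic in the number of subjects and is meant for illustration, not for a cohort of this size.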

Comparison between DDCI and Charlson Comorbidity Index
In the validation data set, 125,094 citizens had had at least one hospitalization during the 12 months prior to the index date. The clinical risk for these subjects was assessed through the application of the Charlson Comorbidity Index and the DDCI. When compared to the reference model (age and sex), both DDCI (Table 5, Model B) and CCI (Table 5, Model C) markedly improved prediction. In particular, CCI showed a better survival C index on short and long-term outcomes and a better performance than DDCI in reclassification on overall mortality. In the final model (Table 5, Model D), the 1-year mortality analysis showed that the model also including DDCI was more accurate than the model with age, sex and CCI (NRI = 0.263 [0.239-0.284]), with 11.8% of events correctly reclassified and 14.5% of non-events correctly reclassified. The overall mortality analysis showed that the model also including DDCI was more accurate than the model with age, sex and CCI (NRI = 0.342 [0.329-0.357]), with 5.4% of events correctly reclassified and 28.7% of non-events correctly reclassified. As for the prediction of first unplanned hospitalization, both DDCI and CCI showed a better performance when compared to the reference model, with a further improvement when used simultaneously.
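As a check on how the reported NRI values relate to their components, the sums can be written out. The small discrepancy for overall mortality is presumably due to rounding of the published percentages; this is an assumption, not a statement from the paper.

```latex
\begin{align*}
\mathrm{NRI}_{\text{1-year}}  &= 0.118 + 0.145 = 0.263 \\
\mathrm{NRI}_{\text{overall}} &\approx 0.054 + 0.287 = 0.341 \quad (\text{reported: } 0.342)
\end{align*}
```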

Main findings
Our study shows that a simple score, based on drug prescriptions as proxies of chronic conditions, is able to stratify the risk of the general population in terms of short and long-term mortality, unplanned hospital admission and readmission. In hospitalized individuals, the performance of the DDCI score was similar to that of the Charlson index. When used in combination with the Charlson index, the DDCI significantly improved the prediction, thus representing an added value even in the presence of clinical information. Since the DDCI score is solely based on drug prescriptions, it allows the risk stratification of entire populations without the need for clinical data, which are hardly available at the population level.

Comparison with existing data
Many clinical risk models have been proposed to predict the risk of death or unplanned hospitalization [4], using different approaches:
• "threshold modelling" [24,25], which has proven to be more accurate when used within a specific clinical context than within a general population [26][27][28];
• "clinical knowledge", in which physicians may be able to identify current high risk patients. The exportability of these models is limited by the difficulty of standardizing the clinical judgements of different physicians in predicting which citizens may become high risk patients [29];
• "predictive modeling", which uses regression models and appears to be more effective than other techniques [4].
A variety of predictive tools are used for the identification of high risk patients through the integration of clinical, laboratory, functional, socio-familial, and care variables into statistical predictive models [30][31][32][33][34][35][36][37]. Obtaining all this information in large populations requires an expensive ad hoc clinical data collection, making this approach unfeasible in many instances. Alternatively, administrative health databases can be used to obtain algorithms useful to forecast readmission [38,39] and reduce in-care cost [7,40]. Validated comorbidity indexes have also been applied to hospitalization data to predict the risk of death or readmission [8,9]. These models do not allow the outpatient risk profile to be defined [35], unless full clinical documentation is available. More complex predictive models were obtained to overcome this limitation through the integration of several data sources [41][42][43][44], including outpatient data, accident and emergency data, electronic clinical data from general practitioners [35], socio-economic data [45], and community dispensed prescriptions. Unfortunately, the different databases required are not always available and/or standardized. Our data show that a much simpler scoring system, solely based on drug prescriptions, can accurately predict one-year and long-term mortality, as well as the risk of unplanned hospitalization and hospital readmission, thus overcoming many of the limitations of previous predictive instruments. These outcomes are the most important determinants of "frailty" [46,47], and a standardized tool able to stratify the clinical risk profile of patients is a priority in many healthcare settings [48].
Some predictive models derived from prescribed drug registers have been tested [49]; nevertheless, the drug-based comorbidity scores previously developed address other issues or show some limitations: 1) a small sample [50]; 2) arbitrary choices of the classes of drugs to be included in the model [50,51]; 3) an objective limited to the prediction of health care costs [52]; 4) a poor ability to predict mortality [53]; 5) the lack of sufficient spatial and temporal coverage to perform the analysis [50][51][52].

Implications for clinical practice
The DDCI score can help policy planners to identify, at the population level, those individuals showing a higher likelihood of intensive resource utilization. A focused, proactive approach targeted to these individuals may prevent their health from deteriorating to the point that they need to be hospitalized, with positive implications also in terms of costs of care. As an example, several studies suggest that focused care after discharge can improve post-discharge outcomes and avoid readmission [54][55][56].
Since hospital admissions, particularly early readmissions, can be considered an indicator of poor quality of care (i.e. unsuccessful discharge processes or inadequate social care [57]), the DDCI score can also be used as a case-mix measure to compare the performance of different structures/health districts. The availability of a tool capable of capturing the level of clinical complexity of patients could assist physicians in their everyday practice, making their clinical judgments more objective. The possibility of a uniform assessment of the complexity of patients could also facilitate the definition of guidelines and care models centered on patients with multiple morbidities and carrying a high risk of unfavorable outcomes.
Moreover, since the last year of life is characterized by high healthcare costs, the identification of individuals at high risk of short-term death can be of great significance to health providers and insurers [58].

Strengths and limitations
The major strength of the DDCI score is its reliance on a single source of administrative data based on the ATC coding system, widely utilized in many countries. For this reason the DDCI can be applied in all the healthcare contexts in which there is a lack of clinical data; it is easily applicable at the population level, requiring only the availability of data on drug prescriptions.
The study also has limitations. First, the application of DDCI at the population level is not generalizable outside of comprehensive health networks or all-payer administrative datasets. Nevertheless, the score can be used in any database containing drug prescriptions (for example general practitioners' databases) or within studies to stratify the clinical risk profile of specific cohorts. It can also be used as a case-mix adjustment measure whenever the clinical data needed to calculate the CCI are not available.
Second, in our cohort, the use of lipid modifying agents and immunosuppressants was associated with a risk of death lower than that of the reference category. These results could appear counterintuitive. However, the use of lipid lowering drugs was very limited (6.7% of the population). We can thus hypothesize that the reference group, represented by individuals not treated with any of the drug classes considered, also included candidates for lipid lowering treatment who were not receiving the drug. This may be responsible for a higher risk of death in the reference category as compared to people treated with statins. As for immunosuppressants, we do not have an obvious explanation. Nevertheless, it should be noted that this class represented less than 0.2% of the whole cohort, making its contribution to the scoring almost irrelevant. Third, it should be emphasized that the scoring system can underestimate the risk in some individuals, due to the lack of information on those drugs, particularly cancer drugs, dispensed at the hospital. Furthermore, it reflects the risk of the Italian population, and its testing in other healthcare systems is warranted.
Finally, DDCI was not externally validated, but the random split into 2 equally large datasets (training and validation set) of a whole regional population of approximately 2 million people constitutes a reliable methodology for validation. Nevertheless, further studies are needed to evaluate the performance of DDCI as compared to CCI in an external outpatient population.

Conclusions
We developed and validated a prognostic index derived from prescription data, able to stratify the entire population into homogeneous risk groups. DDCI can represent a useful tool for risk adjustment and for policy planning, as well as an instrument for the identification of patients needing a focused approach in everyday practice. Its use can thus help improve the quality of care and optimize resource allocation.