Predictive risk score for unplanned 30-day rehospitalizations in the French universal health care system based on a medico-administrative database

Background
Reducing unplanned rehospitalizations is one of the priorities of health care policies in France and other Western countries. An easy-to-use algorithm for identifying patients at higher risk of rehospitalizations would help clinicians prioritize actions and care concerning discharge transitions. Our objective was to develop a predictive unplanned 30-day all-cause rehospitalization risk score based on the French hospital medico-administrative database.
Methods
This was a retrospective cohort study of all 2015 discharges from acute-care inpatient hospitalizations in a tertiary-care university center comprising four hospitals. The study endpoint was unplanned 30-day all-cause rehospitalization via emergency departments, and we collected sociodemographic, clinical, and hospital characteristics based on hospitalization database computed for reimbursement of fees. We derived a predictive rehospitalization risk score using a split-sample design and multivariate logistic regression, and we compared the discriminative properties with the LACE index risk-score.
Results
Our analysis included 118,650 hospitalizations, of which 4,127 (3.5%) led to rehospitalizations via emergency departments. Variables independently associated with rehospitalization were age, gender, state-funded medical assistance, as well as disease category and severity, Charlson comorbidity index, hospitalization via emergency departments, length of stay (LOS), and previous hospitalizations 6 months before. The predictive rehospitalization risk score yielded satisfactory discriminant properties (C statistic: 0.74) exceeding the LACE index (0.66).
Conclusion
Our findings indicate that the possibility of unplanned rehospitalization remains high for some patient characteristics, indicating that targeted interventions could be beneficial for patients at the greatest risk. We developed an easy-to-use predictive rehospitalization risk-score of unplanned 30-day all-cause rehospitalizations with satisfactory discriminant properties. Future works should, however, explore if other data from electronic medical records and other databases could improve the accuracy of our predictive rehospitalization risk score based on medico-administrative data.

south France (Assistance Publique-Hôpitaux de Marseille, APHM). All data were collected from the APHM hospital database computed for reimbursement of fees, which includes both the PMSI (Programme de Médicalisation des Systèmes d'Information) database and administrative data. The PMSI is the French medico-administrative database covering all hospitalizations. It is based on diagnosis-related groups (DRG), which can be grouped into significant diagnostic categories.

Study setting and inclusion criteria
The APHM is a public tertiary-care center comprising four hospitals (La Timone, La Conception, Sainte-Marguerite, and North), 3,400 beds, and 2,000 physicians. Approximately 300,000 hospitalizations are recorded every year at the APHM, involving approximately 210,000 patients. All acute-care hospitalizations were included in this study. We excluded hospitalizations in ambulatory care units (i.e., ambulatory surgery, radiotherapy, dialysis, chemotherapy, transfusions) as well as hospitalizations ending in in-hospital death.

Study outcome
The study outcome was unplanned 30-day all-cause rehospitalization, defined as any admission via an emergency department, whatever the cause, to any acute-care ward within 30 days of discharge. To calculate this outcome, the unique patient identifier was used to track rehospitalizations in the 30 days following discharge. We excluded patients with identification problems in the database. No more than one rehospitalization was counted for each discharge. Readmission via the emergency department was used to identify unplanned rehospitalizations [30].
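The outcome definition above can be illustrated with a minimal sketch. The record layout, the patient identifiers, and the function name `flag_unplanned_30d` are hypothetical and are not taken from the PMSI; the sketch only assumes that each stay carries a patient identifier, admission and discharge dates, and a flag for admission via an emergency department.

```python
from datetime import date

# Hypothetical stays: (patient_id, admission_date, discharge_date, via_emergency)
stays = [
    ("p1", date(2015, 1, 2), date(2015, 1, 5), False),
    ("p1", date(2015, 1, 20), date(2015, 1, 22), True),   # ED admission 15 days after discharge
    ("p1", date(2015, 3, 1), date(2015, 3, 3), True),     # ED admission 38 days after discharge
    ("p2", date(2015, 2, 1), date(2015, 2, 4), True),
    ("p2", date(2015, 2, 10), date(2015, 2, 11), False),  # within 30 days, but not via ED
]

def flag_unplanned_30d(stays, window_days=30):
    """Return one boolean per stay: True if its discharge is followed by an
    admission via an emergency department within `window_days` days."""
    by_patient = {}
    for idx, (pid, adm, dis, via_ed) in enumerate(stays):
        by_patient.setdefault(pid, []).append((adm, dis, via_ed, idx))

    flags = [False] * len(stays)
    for records in by_patient.values():
        records.sort(key=lambda r: r[0])  # chronological order of admissions
        for i, (_, dis, _, idx) in enumerate(records):
            for adm_next, _, via_ed_next, _ in records[i + 1:]:
                delay = (adm_next - dis).days
                if delay > window_days:
                    break  # later admissions are even further away
                if via_ed_next and delay >= 0:
                    flags[idx] = True  # at most one rehospitalization per discharge
                    break
    return flags

flags = flag_unplanned_30d(stays)
rate = sum(flags) / len(flags)
```

Because the unit of analysis is the hospitalization, the rate is computed over stays rather than over patients.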

Collected data
The following data were collected from the PMSI:
• socio-demographic characteristics: age, gender, state-funded medical assistance (Aide Médicale d'Etat, AME) (i.e., health cover for undocumented migrants), and free universal health care (Couverture Maladie Universelle, CMU) (i.e., universal health coverage for those not covered by employment/business-based schemes);
• clinical characteristics: category of disease based on the 10th revision of the International Statistical Classification of Diseases [31], disease severity and the Charlson comorbidity index based on the algorithm developed by Quan et al. [32]; type of hospitalization (medical, surgical or obstetrical). Disease severity (no or low severity, moderate-high severity, or not determined for short hospitalizations) and the categories of disease are derived from the diagnosis-related groups produced by the PMSI algorithm, which takes into account age, comorbidities, and medical or surgical procedures. This algorithm is available on the ATIH website [33]. Categories of disease are clusters of distinct DRG. This algorithm is used in all French hospitals (private, public and university ones) and is reproducible;
• the LACE index: a widely used instrument for predicting the risk of unplanned rehospitalization within 30 days of discharge [34,35]. It is computed from four variables: length of stay (LOS), admission via emergency departments, Charlson comorbidity index [32], and previous admission via emergency departments six months before. Scores range from 0, indicating the lowest risk, to 19, indicating the highest risk;
• hospitalization characteristics: patient origin (home or other hospital institution), hospitalization via emergency departments, LOS, destination after hospital discharge (home or transfer to other hospital institution), hospitalization via emergency departments 6 months before.
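To make the four LACE components concrete, here is a minimal sketch of the index. The point weights below are not stated in this article; they are the weights commonly cited from the original LACE publication and should be checked against [34] before any use. The function name and arguments are illustrative, and LOS is assumed to be a whole number of days.

```python
def lace_index(los_days, emergency_admission, charlson, ed_visits_6m):
    """LACE index (0 to 19). Point weights are assumed from the commonly
    cited LACE scoring table, not taken from this article."""
    # L: length of stay in whole days
    if los_days < 1:
        l_points = 0
    elif los_days <= 3:
        l_points = los_days      # 1, 2 or 3 days give 1, 2 or 3 points
    elif los_days <= 6:
        l_points = 4
    elif los_days <= 13:
        l_points = 5
    else:
        l_points = 7
    # A: acuity of admission (admitted via an emergency department)
    a_points = 3 if emergency_admission else 0
    # C: Charlson comorbidity index, with 4 or more scored as 5 points
    c_points = charlson if charlson <= 3 else 5
    # E: emergency department visits in the previous 6 months, capped at 4
    e_points = min(ed_visits_6m, 4)
    return l_points + a_points + c_points + e_points

example = lace_index(los_days=5, emergency_admission=True, charlson=2, ed_visits_6m=1)
```

With these weights the maximum is 7 + 3 + 5 + 4 = 19, which matches the 0-to-19 range given above.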

Statistical analyses
The unit of analysis was the hospitalization. Descriptive analyses for the socio-demographic, clinical, and hospitalization data were expressed as frequencies and percentages. Chi-squared tests were employed to compare socio-demographic, clinical, and hospitalization data between unplanned 30-day all-cause rehospitalized and non-rehospitalized patients. Multivariate logistic regression was then performed to identify variables potentially associated with unplanned 30-day all-cause rehospitalization, after adjusting for confounding factors. Variables were selected for the model based on a threshold p-value (≤0.2) in the univariate analysis and had to be non-collinear with the other variables introduced in the model. Odds ratios (OR) with confidence intervals (CI) were calculated. Based on the beta coefficients from the multivariate logistic regression, we developed a 0-to-100 point score for rehospitalization risk prediction using a regression-coefficient-based scoring method [36,37]. The number of points assigned to each modality equaled its regression (beta) coefficient multiplied by 100 and divided by the highest possible total (the sum, over all variables, of each variable's highest beta coefficient). We then calculated each patient's final score by totaling their points. The area under the receiver operating characteristic curve (AUC) was derived to evaluate the risk score's capacity to discriminate between rehospitalized and non-rehospitalized patients. The AUC typically ranges from 0.5 (no better than chance) to 1 (perfect discrimination), with higher values signifying better model discrimination. The AUC of the predictive rehospitalization risk-score was then compared to the AUC of the LACE index score [38], computed on the same database. To compare these two AUCs, we used the chi-square statistic developed by Gönen [39].
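The coefficient-based scoring rule can be sketched as follows. The variables, modalities, and beta values are invented for illustration and are not the coefficients of Table 2. Rounding points to integers is also an assumption, since the text does not say how fractional points were handled.

```python
# Hypothetical beta coefficients per variable and modality (reference modality = 0.0)
betas = {
    "age": {"<50": 0.0, "50-74": 0.2, ">=75": 0.5},
    "emergency_admission": {"no": 0.0, "yes": 0.6},
    "previous_hospitalization_6m": {"no": 0.0, "yes": 0.9},
}

def build_points(betas):
    """Points per modality = 100 * beta / (sum of each variable's highest beta)."""
    max_total = sum(max(mods.values()) for mods in betas.values())
    return {
        var: {mod: round(100 * beta / max_total) for mod, beta in mods.items()}
        for var, mods in betas.items()
    }

def total_score(points, profile):
    """Sum the points of the modalities observed for one hospitalization."""
    return sum(points[var][mod] for var, mod in profile.items())

points = build_points(betas)
profile = {"age": ">=75", "emergency_admission": "yes", "previous_hospitalization_6m": "no"}
score = total_score(points, profile)
```

By construction, a hospitalization presenting the highest-risk modality of every variable scores 100, which is what bounds the score to the 0-to-100 range.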
In addition to the AUC, we used other metrics based on a threshold value determined as the best trade-off between sensitivity and specificity. For this cutoff, we provide the following metrics for each score (i.e., the predictive rehospitalization risk-score and the LACE index): sensitivity, specificity, the Youden index, accuracy, and the F1 score. We then assessed the robustness of the predictive rehospitalization risk score. Following the methodology proposed by Tuffery [37], the data were split into two samples: a training sample including 2/3 of the data and a test sample including the remaining 1/3. Using the training dataset, risk scores were built for each of the factors independently associated with 30-day all-cause rehospitalization, as previously determined with the multivariate logistic regression. The AUCs obtained for the training and test datasets were then compared. The model was considered robust if the AUCs of the two datasets were similar.
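The threshold-based metrics listed above can be computed as in the following sketch. The text does not specify how the best trade-off between sensitivity and specificity was located; maximizing the Youden index is assumed here as one common choice. The data and function names are illustrative.

```python
def threshold_metrics(y_true, scores, cutoff):
    """Confusion-matrix metrics when a score >= cutoff is classified as high risk."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s >= cutoff
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }

def best_cutoff(y_true, scores):
    """Assumed rule: pick the observed score value that maximizes the Youden index."""
    return max(sorted(set(scores)),
               key=lambda c: threshold_metrics(y_true, scores, c)["youden"])

# Toy data: 1 = rehospitalized, 0 = not rehospitalized
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [10, 20, 25, 30, 35, 40, 45, 50, 55, 60]
cutoff = best_cutoff(y_true, scores)
metrics = threshold_metrics(y_true, scores, cutoff)
```

Accuracy and the F1 score depend on the proportion of rehospitalizations in the data, whereas sensitivity, specificity, and the Youden index do not.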
Lastly, we computed the 30-day rehospitalization rate for each class of the predictive risk score (classes of 10 points) and tested the association using the chi-square test.
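A minimal sketch of this last step is given below, assuming score classes of 10 points and a standard Pearson chi-square statistic on the class-by-outcome table. The data are invented, and the p-value is omitted because it requires the chi-square distribution function, which is not in the Python standard library.

```python
def counts_by_class(scores, outcomes, width=10, max_score=100):
    """Group stays into score classes of `width` points.
    Returns {class_index: (number_of_stays, number_of_rehospitalizations)}."""
    n_classes = max_score // width
    counts = {}
    for s, y in zip(scores, outcomes):
        k = min(int(s // width), n_classes - 1)  # a score of 100 joins the top class
        n, events = counts.get(k, (0, 0))
        counts[k] = (n + 1, events + y)
    return {k: counts[k] for k in sorted(counts)}

def chi_square_statistic(counts):
    """Pearson chi-square statistic for the (score class x outcome) table."""
    total_n = sum(n for n, _ in counts.values())
    total_events = sum(e for _, e in counts.values())
    p = total_events / total_n  # overall rehospitalization rate
    chi2 = 0.0
    for n, e in counts.values():
        expected_events = n * p
        expected_non_events = n * (1 - p)
        chi2 += (e - expected_events) ** 2 / expected_events
        chi2 += ((n - e) - expected_non_events) ** 2 / expected_non_events
    degrees_of_freedom = len(counts) - 1
    return chi2, degrees_of_freedom

# Toy data: four stays scoring 5 points and four stays scoring 55 points
scores = [5, 5, 5, 5, 55, 55, 55, 55]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
counts = counts_by_class(scores, outcomes)
rates = {k: e / n for k, (n, e) in counts.items()}
chi2, dof = chi_square_statistic(counts)
```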
Statistical significance was defined as p <0.05. The statistical analyses were performed with Statistical Analysis Software (SAS), Version 9.4 (SAS Institute).

Ethics and consent to participate
The datasets generated and/or analyzed during the current study come from the Assistance Publique des Hôpitaux de Marseille (APHM) hospitalization database computed for reimbursement of fees. Patients are informed by the hospital that their data may be analyzed for research purposes, in accordance with French law, under which this type of research does not require explicit or written patient consent. No data from outside this database were used for the study. Use of the database is overseen by a local ethics committee (CIL-APHM) and declared under authorization numbers 1305855 for medical data and 2012-1 for administrative data. Under French law, no approval from another ethical review board or regulatory body is required.

Rates of unplanned 30-day all-cause rehospitalization
A total of 289,358 hospitalizations (112,662 patients) were recorded in 2015 at this French university hospital. After excluding in-hospital deaths and hospitalizations in ambulatory care units (ambulatory surgery, radiotherapy, dialysis, chemotherapy, transfusions), 118,650 hospitalizations (82,862 distinct patients) were included. The most common diseases were digestive diseases, nervous system conditions, and cardiovascular and pulmonary diseases. We excluded 4 patients with identification problems in the database. In total, 4,127 hospitalizations (3,294 distinct patients) resulted in rehospitalization via emergency departments within 30 days of discharge (30-day rehospitalization rate: 3.5%). Thirty-day rehospitalization rates according to socio-demographic, clinical, and hospitalization characteristics are presented in Table 1.

Factors associated with rehospitalizations
The results of the univariate and multivariate analyses are presented in Tables 1 and 2. Overall, the multivariate analysis confirmed the findings of the univariate analysis, except that patients who returned home were at higher risk of rehospitalization than those discharged to other hospitals or institutions. The following variables were found to be independently associated with rehospitalization: age, gender, state-funded medical assistance, as well as disease category and severity, Charlson comorbidity index, hospitalization via emergency departments, LOS, and previous hospitalizations 6 months before. The type of hospitalization (medical, surgical or obstetrical) was not introduced into the multivariate model due to collinearity with the disease category. The Charlson comorbidity types are provided in S1

Development and performance of the predictive rehospitalization risk score
The scores for the predictive rehospitalization risk calculation are presented in Table 2. The characteristics accounting for the highest risks of rehospitalization were certain disease categories, such as newborn and perinatal diseases (+27 points), toxicology (+26 points), pulmonology (+26 points), psychiatry (+25 points), chronic pain and palliative care (+24 points), and digestive diseases (+24 points). Previous hospitalization via emergency departments in the preceding 6 months was also a major factor in rehospitalization (+20 points), as was simply being hospitalized via emergency departments (+15 points). Concerning socio-demographic factors, patients who benefited from state-funded medical assistance were at higher risk of being rehospitalized (+12 points). The ROC curves of the predictive rehospitalization risk score and LACE index score are presented in Fig 1. The predictive rehospitalization risk-score yielded a better AUC than the LACE index (0.74 vs. 0.66). For the rehospitalization risk score, the best trade-off between sensitivity and specificity corresponds to a probability of 0.03 (score near 42) and yields a sensitivity of 0.65, a specificity of 0.70, a Youden index of 0.35, an accuracy of 0.69, and an F1 score of 0.13. For the LACE score, the best trade-off between sensitivity and specificity corresponds to a probability of 0.03 (score near 6) and yields a sensitivity of 0.63, a specificity of 0.60, a Youden index of 0.23, an accuracy of 0.60, and an F1 score of 0.10. Moreover, we confirmed the accuracy of the predictive rehospitalization risk score given that the 30-day rehospitalization rate increased with the predictive risk score (classes of 10 points), as shown in Fig 2 (p <0.0001).
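As a consistency check on the reported metrics, the short derivation below recomputes the Youden index from the rounded sensitivity (Se) and specificity (Sp), then shows why the F1 score is low despite acceptable sensitivity and specificity. It assumes that the prevalence π equals the overall 3.5% rehospitalization rate; the positive predictive value (PPV) is derived here and is not reported in the study, so all values are approximate.

```latex
\begin{align*}
J_{\text{score}} &= Se + Sp - 1 = 0.65 + 0.70 - 1 = 0.35, &
J_{\text{LACE}} &= 0.63 + 0.60 - 1 = 0.23 \\
\mathrm{PPV} &= \frac{Se \cdot \pi}{Se \cdot \pi + (1 - Sp)(1 - \pi)} \\
\mathrm{PPV}_{\text{score}} &\approx \frac{0.65 \times 0.035}{0.65 \times 0.035 + 0.30 \times 0.965} \approx 0.073 \\
F_{1,\text{score}} &= \frac{2 \cdot \mathrm{PPV} \cdot Se}{\mathrm{PPV} + Se} \approx \frac{2 \times 0.073 \times 0.65}{0.073 + 0.65} \approx 0.13
\end{align*}
```

The same calculation for the LACE index gives a PPV of about 0.054 and an F1 score of about 0.10. Under this assumption, the low F1 values mainly reflect the low prevalence of rehospitalization rather than poor discrimination.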
The robustness of the predictive rehospitalization risk score was confirmed by the similar AUCs obtained for the training (0.74) and test (0.73) datasets.
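The robustness check can be sketched as follows, assuming a random 2/3 versus 1/3 split and an AUC computed as the proportion of (rehospitalized, non-rehospitalized) pairs in which the rehospitalized stay has the higher score, with ties counted as one half. The function names and the fixed seed are illustrative choices, not details from the study.

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a rehospitalized stay outscores a
    non-rehospitalized one (ties count for one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def split_two_thirds(records, seed=0):
    """Shuffle a copy of the records and return (training 2/3, test 1/3)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

# Toy records: (outcome, score)
records = [(0, 10), (0, 20), (1, 30), (0, 40), (1, 50), (1, 60), (0, 15), (1, 45), (0, 35)]
train, test = split_two_thirds(records)
auc_all = auc([y for y, _ in records], [s for _, s in records])
```

The pairwise loop is quadratic, so on a dataset of the size analyzed here a rank-based formulation would be preferable. The AUC would then be computed separately on `train` and `test` and the two values compared.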

Discussion
The principal findings of this study can be summarized as follows. In a large sample of acute-care inpatients (82,862 patients and 118,650 hospitalizations), the rate of unplanned 30-day all-cause rehospitalization in four French university hospitals proved to be low (3.5%). Several factors predicted these rehospitalizations (i.e., age, gender, state-funded medical assistance, disease category and severity, Charlson comorbidity index, hospitalization via emergency departments, LOS, and previous hospitalization 6 months before), which could be targeted in a French national rehospitalization reduction program. Finally, we developed and internally validated an easy-to-use predictive rehospitalization risk score of unplanned 30-day all-cause rehospitalization with satisfactory discriminatory properties that can help physicians identify patients at high risk and then propose adapted transitional care interventions.
The 3.5% unplanned rehospitalization rate in our study appears substantially lower than that of studies performed in other countries, even though such comparisons should be interpreted with caution due to differences in methodology (e.g., definition of unplanned rehospitalizations), and given that population studies commonly focus on particular conditions (e.g., older people, heart failure, diabetes [13,15,40]). In the few studies performed on all-cause unplanned rehospitalizations, the rates were always higher (i.e., 5.2% [41], 8.5% [19], 16.7% [42], 17.6% [3]) than in our study. We cannot exclude that patients could have been rehospitalized in another hospital and that our rate is underestimated. However, it is likely that this represents a small proportion of patients. In a study conducted in Switzerland, Halfon et al. [41] found that 17% of the avoidable readmissions occurred in a different hospital. This low unplanned 30-day all-cause rehospitalization rate may also be explained by the French universal insurance coverage since, according to Gusmano et al. [26], inadequate insurance coverage may result in more severe illness and consequently more hospitalizations. Such a finding has been reported in a recent study carried out in the elderly, where the rehospitalization rate was 20% in the USA vs. 15% in France [4]. The authors hypothesized that this discrepancy between countries was probably due to a combination of better access to primary care and longer average length of inpatient stay in France [4]. It is interesting to note that the French rehospitalization rate remains low despite a recognized lack of coordination between hospitals and primary care, in addition to a lack of preparation of discharge from the hospital in France, two factors known to be associated with rehospitalizations [16,43].
This suggests that insurance coverage may be an important factor in controlling rehospitalization that should be kept in mind in health policies, in addition to more targeted interventions (e.g., the development of safety-net institutions to improve access to primary care, interventions for improving coordination of care and discharge planning, involvement of patients and caregivers in discharge). In addition, French hospitals are under pressure to make cost savings, and reducing LOS is strongly advocated. Although the average LOS has decreased substantially over the years in France, there is still pressure to pursue reductions. Future studies should thus explore the consequences of this health policy, and in particular its impact on rehospitalization and more generally on quality of care.
Despite this low unplanned rehospitalization rate, our findings also indicate that the possibility of unplanned rehospitalization remains high for patients with certain characteristics, suggesting it could be beneficial to target interventions for patients at the greatest risk. The association between older age and rehospitalization had already been found in previous studies [4,15,16,44,45], confirming the frailty of the elderly at discharge and the need to develop specific care transition interventions for them (e.g., comprehensive inpatient geriatric health care assessment followed by ongoing multidisciplinary support after discharge, plus involvement of the patients and their caregivers) [9,46,47]. Furthermore, men were more often rehospitalized than women. Gender differences in the use of ambulatory care, which is higher in women than in men, have previously been described [26,48,49]. They reflect a complex phenomenon involving differences in illness severity along with social and cultural specificities, which should be further explored in France to provide equal care for both men and women. State-funded medical assistance was associated with higher rehospitalization. This finding is not surprising, as recent works have already reported that undocumented migrant patients had high levels of chronic illness and low rates of physician consultation in France [50,51]. In addition, this state medical assistance is underused, accessible to only 10.2% of undocumented migrant patients [52]. It should probably be improved and incorporated with France's free universal health care, despite the current unfavorable political climate, in order to improve access to healthcare for migrants and reduce their level of rehospitalization. As in previous studies, several categories of disease, illness severity, and higher Charlson comorbidity indices were associated with higher readmission rates [42,53,54].
The strongest association concerned newborns and perinatal diseases, toxicology, pulmonology, psychiatry, chronic pain and palliative care, and digestive diseases. These particular conditions must therefore be prioritized to reduce rehospitalization.
Prior hospitalizations, especially via emergency departments, long LOS, and hospitalizations via emergency departments were important predictors of unplanned rehospitalization. These factors may account for the total burden of illness, functional status, and social environment [19,34,55,56], causing more frequent rehospitalization. Importantly, short LOS was also associated with rehospitalization, confirming that current financial incentives, particularly those that encourage the development of ambulatory care, should be accompanied by appropriate reorganization of care processes to avoid detrimental effects on quality of care [57]. Lastly, return to home was associated with more rehospitalizations, thus confirming the necessity for clinicians to better prepare discharges, check the availability of home-based services, and carefully plan the transition of care. This is clearly a weak point of the French health system (FHS) [16,43].
Lastly, our final aim was to propose a predictive rehospitalization risk score and, to our knowledge, this was the first unplanned 30-day rehospitalization risk model to use a comprehensive set of factors sourced directly from the French hospital medico-administrative database. This score is easy to use and accurate in predicting the risk of rehospitalization via emergency departments. It already presents higher discriminative properties than the LACE index (C statistic = 0.74 vs. 0.66 for the LACE index), even though the latter is recommended by the French Health Authority. The French policy is mainly based on the publication of guidelines by the French Health Authority (HAS) [2], which recommends identifying patients at risk using the LACE index [34] or the 8Ps risk assessment tool used in the BOOST (Better Outcomes for Older adults through Safe Transitions) program involving 11 hospitals in the USA [58]. Despite the interest of these two instruments, they have not yet been rigorously validated in France. The LACE index presented poor discriminative ability from its construction [18,34], and the authors themselves later improved it (AUC for 30-day urgent readmission between 0.743 and 0.753, depending on the inclusion of case mix groups) by adding other covariates close to those used in our model, although this version was validated on the Ontario administrative database [35]. The 8Ps checklist from the BOOST program could be cumbersome in routine practice if performed for every hospitalized patient, since it requires physicians to identify and address each of these factors and then propose an appropriate intervention [59].
An important advantage of our predictive rehospitalization risk score is that it does not require additional completion by physicians to that already required for the PMSI; consequently, even if analyzing such a medico-administrative database may require computational aid, it is important to use the data already available and not to increase doctors' medico-administrative work burdens [60].

Perspectives and limitations
Our findings must be interpreted in the context of our study's limitations.
Despite the large overall sample size of this multi-center study, our findings may not be applicable to all French hospitals, particularly general hospitals, whose patients may have different characteristics from those of university hospitals. In addition, the four university hospitals included in our study were located in only one geographical area, even though social and healthcare geographical characteristics (e.g., poverty, density of physicians, number of beds, and private hospitals) are known to influence the risk of rehospitalization [4,20,40,61]. Future studies should thus be conducted in different categories of hospitals and several geographical areas to confirm the properties and interest of our predictive risk score. An external validation, in addition to the internal validation performed in this study, is needed to establish the generalizability of this score.
Our model does not take into account deaths outside the hospital since this information is not available in our database. Other studies with available data on outpatient events are needed to investigate to what extent this could impact our predictive risk score, using a competing risk model for example.
In this study, we excluded ambulatory surgery from the analyses. However, this specific topic should be better studied in the French context, which is strongly marked by pressures to reduce length of stay and consequently cost of care.
Even if our predictive risk score achieved a better AUC than existing scores, more advanced methods such as machine learning should be investigated to determine whether they offer better predictive power than our score derived from logistic regression. The advantage of these methods is their ability to use information that cannot readily be used with classical statistical methods such as logistic regression, in particular textual information in electronic patient records. Future works should use all the relevant data in hospital databases and these new methods to improve the level of prediction.
A substantial amount of data (e.g., polypharmacy, socio-economic status, and self-reported functional status) has been reported as predictive of rehospitalization [9,20,61,62,63], though it is not currently available in the French PMSI database. Future works should explore how to systematically collect this data from the other databases available in hospitals (e.g., electronic medical records) and whether such data could improve the accuracy of our predictive rehospitalization risk score based on the PMSI database.
Another perspective of our research would be to distinguish rehospitalizations for a previously known condition from those for other, previously unknown conditions, and thus to better estimate the share of avoidable hospitalizations. This difference has been explored by Halfon et al. [41], who developed the SQLape [64] algorithm to identify avoidable hospitalizations based on specific diagnoses and specific interventions.
Lastly, the majority of predictive risk scores are based on data at discharge, whereas they should ideally give information early enough during the hospitalization to trigger transitional care interventions [20]. While instruments based on discharge data have been shown to yield models with better performance [20,65] than models based solely on admission data, some authors have argued that this improvement was limited [65]. To address this tension, an interesting perspective would be to implement real-time predictive rehospitalization risk scores during hospitalization, updated as new data become available, and then to propose early alerts for high risk of rehospitalization. Recent works reported that machine learning methods can be used for real-time predictions using routinely collected clinical data exclusively, without the need for any manual processing [66].

Conclusion
The 3.5% unplanned rehospitalization rate was substantially lower in our study than that of studies performed in other countries, suggesting that universal insurance coverage may be a key factor for controlling rehospitalization. Despite this low unplanned rehospitalization rate, our findings likewise indicate that the possibility of unplanned rehospitalization remains high for patients with certain characteristics, suggesting the interest of proposing targeted interventions for patients at the greatest risk. We also developed an easy-to-use predictive rehospitalization risk-score of unplanned 30-day all-cause rehospitalizations with satisfactory discriminant properties. Future works should, however, explore if other data available in electronic medical records and other databases could improve the accuracy of our predictive rehospitalization risk score based on medico-administrative data. Finally, further research is required to determine whether such risk quantification ultimately changes real-life patient care and outcomes.
Supporting information S1