Effects of pain, sedation and delirium monitoring on clinical and economic outcome: A retrospective study

Background Significant improvements in clinical outcome can be achieved by implementing effective strategies to optimise pain management, reduce sedative exposure, and prevent and treat delirium in ICU patients. One important strategy is the monitoring of pain, agitation and delirium (PAD bundle). We hypothesised that there is no sufficient financial benefit to implementing a monitoring strategy in a Diagnosis Related Group (DRG)-based reimbursement system; we therefore expected better clinical but worse economic outcome for monitored patients.
Methods This is a retrospective observational study using routinely collected data. We used univariate and multiple linear analysis, machine-learning analysis and a novel correlation statistic (maximal information coefficient) to explore the association between monitoring adherence and the resulting clinical and economic outcome. For univariate analysis, we split patients into an adherence-achieved and an adherence-not-achieved group.
Results In total, we included 1,323 adult patients from two campuses of a German tertiary medical centre who spent at least one day in the ICU between admission and discharge between 1 January 2016 and 31 December 2016. In univariate analysis, adherence to PAD monitoring was associated with shorter hospital length of stay (LoS) (e.g. pain monitoring 13 vs. 10 days; p<0.001), shorter ICU LoS and shorter duration of mechanical ventilation. Despite the improved clinical outcome, adherence to PAD elements was associated with a decreased case mix per day and profit per day in univariate analysis. Multiple linear analysis did not confirm these results. In the machine-learning analysis, PAD monitoring was important for clinical as well as economic outcome and predicted case mix better than severity of illness.
Conclusion Adherence to PAD bundles is important for clinical as well as economic outcome. It is associated with improved clinical and worse economic outcome in comparison to non-adherence in univariate analysis, but this was not confirmed by multiple linear analysis.
Trial registration clinicaltrials.gov NCT02265263, registered 15 October 2014.

incentivisation of these measures by the reimbursement system. We therefore hypothesise that although adherence to the PAD bundle is linked to improved clinical outcome, there is no sufficient financial benefit to implementing these methods in a Diagnosis Related Group (DRG)-based reimbursement system. This absence of a financial benefit is in turn associated with a worse economic outcome for the hospital, that is, lower daily revenues (case mix) and profits.

Material and methods
The institutional review board ("Ethics Committee of Charité-Universitätsmedizin Berlin") approved the analysis and waived informed consent (EA 2/092/14). We accessed the data between June 2017 and August 2018. We assigned an alias to the data immediately after export. The treatment period was January 2016 to December 2016. All data come from our hospital.
We set up a retrospective cohort study analysing both clinical and economic outcome. In addition to classical statistical approaches, we used a machine-learning algorithm (Boruta) and the maximal information coefficient (MIC) to analyse the importance (Boruta) and strength (MIC) of effects of monitoring adherence on clinical and economic outcome.
We used routinely collected data and included DRG-invoiced ICU patients who were admitted to and discharged from one of the centre's eight ICUs in 2016 and who did not receive PAD monitoring as part of a clinical trial. We excluded non-adults, patients with re-admissions to more than one ICU ward, patients without a documented day between admission and discharge, deceased patients and cases in which CAM-ICU monitoring was not possible (see the explanation of the main predictor variables).
The study has been registered at ClinicalTrials.gov, number NCT02265263.

Data sources
Routine clinical data were acquired from the two electronic patient data management systems used at the hospital (COPRA, Berlin, Germany and SAP, Walldorf, Germany).

Measures
Main predictor variables. The main predictor variables were adherence to pain, sedation and delirium monitoring. Pain, sedation and delirium monitoring were aligned in an algorithm: starting with sedation monitoring, all patients with a Richmond Agitation Sedation Scale (RASS) of −3 or greater were monitored for delirium with the Confusion Assessment Method for the Intensive Care Unit (CAM-ICU) [26]. Patients with negative CAM-ICU results were assessed for pain using the Visual Analogue Scale (VAS) [27]. In case of a positive CAM-ICU, or if screening for sedation revealed a RASS of −4 or less, patients were screened for pain using the Behavioural Pain Scale (BPS) [28].
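The monitoring algorithm described above can be written as a small decision function. The following Python sketch is only an illustration of the stated rules; the function name, the return format and the handling of a missing CAM-ICU result are assumptions and do not come from the study's data systems.

```python
from typing import Optional


def select_assessments(rass: int, cam_icu_positive: Optional[bool]) -> dict:
    """Return the delirium and pain instruments that apply to one assessment.

    rass: Richmond Agitation Sedation Scale value.
    cam_icu_positive: CAM-ICU result, or None if no CAM-ICU was performed.
    """
    if rass <= -4:
        # RASS of -4 or less: no CAM-ICU screening, pain is screened with the BPS.
        return {"delirium_tool": None, "pain_tool": "BPS"}
    # RASS of -3 or greater: the patient is monitored for delirium with CAM-ICU.
    if cam_icu_positive is None:
        # Hypothetical handling: the text does not say what happens here.
        raise ValueError("CAM-ICU result required when RASS is -3 or greater")
    # Positive CAM-ICU -> BPS, negative CAM-ICU -> VAS.
    pain_tool = "BPS" if cam_icu_positive else "VAS"
    return {"delirium_tool": "CAM-ICU", "pain_tool": pain_tool}
```

For example, `select_assessments(0, False)` selects CAM-ICU and the VAS, whereas `select_assessments(-4, None)` selects the BPS only.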
The adherence to pain, sedation and delirium monitoring for a patient was calculated as follows: the number of "adhered shifts" was divided by the total number of ICU shifts in which the patient was treated. Shifts on the day of admission and the day of discharge were not considered. Accordingly, the adherence could assume a value between 0 and 100%. A patient's ICU shift was rated as "adhered" if the patient was monitored at least once per shift. Patients with an adherence of 100% were classified as "achieved" for univariate analysis. If the patient received no delirium monitoring, the shift was rated as adhered only when there was no RASS assessment or CAM-ICU was not possible. Patients for whom CAM-ICU monitoring was not possible were not considered in the analysis of CAM-ICU monitoring adherence. We evaluated pain, sedation and delirium monitoring separately, not as a bundle.
Outcome variables. The clinical outcome variables were hospital LoS, ICU LoS and duration of mechanical ventilation. The economic outcome variables were case mix per day and profit per day, chosen because of their incentive effect for hospitals. Profit and turnover (case mix) influence the management of a hospital: profit per day combines a per-day view with the difference between turnover and costs, while turnover reflects the market share of a hospital. Therefore, we used profit per day rather than total costs or costs per day.
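The adherence calculation for the main predictor variables (adhered shifts divided by all counted ICU shifts, excluding the days of admission and discharge) can be sketched as follows. The `Shift` record and its field names are hypothetical, and returning `None` for patients without any counted shift is an assumption made for the illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Shift:
    day: date        # calendar day on which the ICU shift took place
    monitored: bool  # True if at least one assessment was documented in the shift


def adherence(shifts: List[Shift], admission: date, discharge: date) -> Optional[float]:
    """Share of adhered shifts (0.0 to 1.0); admission and discharge days are ignored."""
    counted = [s for s in shifts if s.day not in (admission, discharge)]
    if not counted:
        return None  # assumption: no evaluable shift means no adherence value
    return sum(s.monitored for s in counted) / len(counted)
```

A patient with six counted shifts, five of them monitored, would have an adherence of 5/6 (about 83%) and would therefore fall into the "not achieved" group in the univariate analysis.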

PLOS ONE
The case mix (measured in case mix points) was derived from Diagnosis Related Groups (DRG). These points multiplied by a base rate (measured in EUR) make up the substantial part of hospital revenue for a hospital case. Profit per day was calculated as case mix multiplied by the base rate, plus other receipts, minus the case costs documented for the German national institute for hospital revenue and cost calculations (InEK: Institut für das Entgeltsystem im Krankenhaus).
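As a worked illustration of the two economic variables, the following Python sketch applies the calculation described above. The function and parameter names are hypothetical, the example figures are invented for the arithmetic only, and dividing by the hospital length of stay in days is an assumption about how the per-day values were formed.

```python
def case_mix_per_day(case_mix_points: float, los_days: float) -> float:
    """Case mix points of a hospital case divided by its length of stay (assumed: hospital LoS)."""
    return case_mix_points / los_days


def profit_per_day(case_mix_points: float, base_rate_eur: float,
                   other_receipts_eur: float, case_costs_eur: float,
                   los_days: float) -> float:
    """(Case mix x base rate + other receipts - case costs) per day of stay."""
    revenue = case_mix_points * base_rate_eur + other_receipts_eur
    return (revenue - case_costs_eur) / los_days


# Invented example: 4.0 case mix points, base rate 3,000 EUR, other receipts 500 EUR,
# case costs 11,000 EUR, 10 days of stay.
example_profit = profit_per_day(4.0, 3000.0, 500.0, 11000.0, 10)  # 150.0 EUR per day
example_case_mix = case_mix_per_day(4.0, 10)                      # 0.4 points per day
```

In this form, a shorter stay raises both per-day values for an unchanged case, which is why the per-day view matters for the incentive argument.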
Covariates. The control variables for the multiple linear analysis and machine-learning algorithm were determined a priori based on available literature and clinical experience. They included age and the Acute Physiology and Chronic Health Evaluation II (APACHE II) scale score.

Analysis
We used both a classical statistical approach and a machine-learning analysis approach (the Boruta algorithm with MIC). While the classical approach uses all available data to explain or best describe particular linear associations and correlations, the use of cross-validation via a non-linear machine-learning method allowed us to identify robust predictor variables for our outcome measures that are potentially non-linearly related to the latter. Cross-validation is the process of training the model on a subset of the data and then allowing it to assess the remainder of the cases in the dataset. This reduces the chance of model overfitting, i.e., of capturing spurious correlations. In short, we supplemented the classical statistical analyses with alternative approaches for the identification of variables that are non-linearly related to our outcome and that are likely to also be predictive in new patient populations, i.e., that are likely to generalise.
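The idea of cross-validation described above (fit on one part of the data, assess on the rest) can be sketched in a few lines of Python. This is a generic k-fold illustration, not the procedure used in the study; the function names, the number of folds and the error measure are assumptions.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validated_mse(x, y, fit, predict, k=5, seed=0):
    """Mean squared error on held-out cases, averaged over all cases."""
    errors = []
    for train, test in k_fold_splits(len(x), k, seed):
        # The model only sees the training part of the data.
        model = fit([x[i] for i in train], [y[i] for i in train])
        # It is then assessed on cases it has not seen.
        errors.extend((predict(model, x[i]) - y[i]) ** 2 for i in test)
    return sum(errors) / len(errors)
```

Any model can be plugged in through `fit` and `predict`; a model that only memorises the training cases will show a low training error but a high cross-validated error.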
For all classical statistical analyses, we used SPSS Version 24.0.0.0 (IBM SPSS Statistics). For univariate analysis, we split patients into groups. To differentiate between the monitored (monitoring adherence achieved) and not-monitored (monitoring adherence not achieved) patients, we set the adherence threshold at 100%. To differentiate by disease severity, we formed three groups: APACHE ≤10, APACHE 11-20 and APACHE >20. For testing the association between monitoring adherence and outcome, we also used linear regression models.
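The grouping used for the univariate analysis can be illustrated with a short Python sketch. The study itself used SPSS; the dictionary keys, function names and the use of group medians below are assumptions made only to show how the two splits combine.

```python
from collections import defaultdict
from statistics import median


def apache_group(score):
    """Disease severity group based on the APACHE II score."""
    if score <= 10:
        return "APACHE <=10"
    if score <= 20:
        return "APACHE 11-20"
    return "APACHE >20"


def adherence_group(adherence_percent):
    """Only 100% adherence counts as achieved."""
    return "achieved" if adherence_percent >= 100.0 else "not achieved"


def stratified_medians(patients, outcome):
    """Median of one outcome per (severity group, adherence group) cell."""
    groups = defaultdict(list)
    for p in patients:
        key = (apache_group(p["apache_ii"]), adherence_group(p["adherence_percent"]))
        groups[key].append(p[outcome])
    return {key: median(values) for key, values in groups.items()}
```

Each resulting cell can then be compared between "achieved" and "not achieved" within the same severity group.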
For the machine-learning analysis, we used RStudio version 1.1.419 (R Foundation for Statistical Computing), and the "Boruta" package (for details, see [29]). Boruta is a random-forest-based method of feature selection. A random forest [30] is an ensemble model that constructs a multitude (often thousands) of decision trees based on the data and then makes a committee prediction (e.g., the algorithm predicts whatever the majority of the individual decision trees predict). These models can capture non-linear and non-monotonic relationships between the input variables and outcome criteria that linear models would not be able to capture. The Boruta algorithm builds these random forests from the dataset. It then randomly shuffles the values of each variable one by one to test whether the forest's classification performance declines when a variable's potential statistical relationship with the criterion is eliminated by this process of value randomisation. This method is also known as permutation-based variable importance.
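The shuffling idea behind permutation-based variable importance can be sketched in plain Python. This is not the Boruta package, which works on random forests and compares each variable against shuffled "shadow" copies; the sketch below uses an arbitrary stand-in `predict` function and hypothetical names to show only the core step of breaking one variable's relationship with the outcome and measuring the loss in performance.

```python
import random


def mean_squared_error(predict, rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Average increase in error after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column so its link to the outcome is destroyed.
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mean_squared_error(predict, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A variable the model does not rely on yields an importance of zero, because shuffling it leaves every prediction unchanged; a variable the model depends on yields a positive value.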
The Boruta algorithm only provides the relative strength of association of the different input variables; therefore, the association strength cannot be easily compared across the different outcome criteria. To identify the strength of the associations, we used the maximal information coefficient (MIC) [31], a measure of information entropy that, like random forests, is not limited to specific types of functions (linear, non-linear, or non-monotone). MIC values can range from 0 to 1 and tend to be similar to R² in size and interpretation.

Results
We included 1323 patients for pain and agitation/sedation monitoring and 1266 patients for delirium monitoring (Fig 1). The groups differed because some patients had insufficient RASS data: for example, a patient with a RASS of −4 or less during their hospital stay could not be monitored for delirium (see the explanation of the main predictor variables).
The median age of the admitted patients was 68 years, 84.4% received surgery, and 60% were male. The median ICU LoS was 4 days, while the median hospital LoS was 12 days, and the median duration of mechanical ventilation was 32 hours. The median monitoring adherence was 91.7% for pain, 90.4% for agitation/sedation and 100% for delirium (Table 1). The median economic outcomes were 0.44 case mix per day and 11.05 EUR profit per day. The median APACHE II score at admission was 14.
Relating monitoring and outcome
a) Two-group analyses. For pain monitoring adherence (Tables 2 and 3), in most cases, we observed statistically significant improvement of clinical outcome and worse economic outcome for all patients and for each disease severity patient group. Exceptions were case mix per day for the APACHE >20 group and profit per day for the APACHE ≤10, APACHE 11-20 and APACHE >20 groups. None of the exceptions was significant.
For sedation monitoring adherence (Tables 2 and 3), in most cases, we observed statistically significant improvement of clinical outcome and worse economic outcome for all patients and for each disease severity patient group. Exceptions were hospital LoS and ICU LoS for the APACHE ≤10 group and profit per day for the APACHE 11-20 and APACHE >20 groups. None of the exceptions was significant.
For delirium monitoring adherence (Tables 2 and 3), in most cases, we observed significantly better clinical outcome and worse economic outcome for all patients and for each disease severity patient group. Exceptions were hospital LoS for the APACHE ≤10 group, case mix per day for the APACHE >20 group and profit per day for the APACHE ≤10, APACHE 11-20 and APACHE >20 groups. None of the exceptions was significant.
b) Multiple linear regression. Multiple linear regression found different associations (Table 4). An increase in APACHE II score was associated with an increased hospital LoS. An increase in sedation monitoring adherence and APACHE II score was associated with an increase in days of ICU LoS. An increase in pain monitoring adherence was associated with a decreased ICU LoS. An increase in sedation monitoring adherence and APACHE score was associated with an increase in duration of MV. An increased age was associated with a decreased duration of MV. An increase in pain monitoring adherence, age and APACHE score was associated with an increased case mix per day. An increase in delirium monitoring adherence was associated with a decreased case mix per day. An increase in pain monitoring adherence, age and APACHE score was associated with an increased profit per day. Adjusted R-square was very low for all multiple linear regressions.
Predicting outcome based on monitoring adherence
a) Relative predictability. The machine-learning analysis showed that pain monitoring adherence was the most important predictor of clinical outcome (hospital LoS, ICU LoS, duration of MV) and case mix per day (see Boruta in Fig 2). Furthermore, sedation monitoring was more important to clinical outcome and case mix per day than the APACHE II score. The APACHE II score was more important to profit per day than the other variables. Delirium monitoring was also important to clinical and economic outcome and was more important than the APACHE II score for hospital LoS, ICU LoS and case mix per day, but not for duration of MV. Delirium monitoring was, furthermore, not predictive of profit per day. Age was not important for hospital LoS, ICU LoS or case mix per day. Age was important for the duration of MV but was less important than PAD monitoring and the APACHE II score. Age was also important for profit per day but less important than the APACHE II score, pain monitoring and agitation monitoring.
b) Associations between monitoring adherence and outcome. The strongest associations were found between PAD monitoring and ICU LoS, with MICs of 0.47, 0.51 and 0.27 for pain, agitation/sedation and delirium monitoring, respectively (see MIC in Fig 2). A strength of association higher than 0.2 was found between pain and sedation monitoring and the duration of MV (see MIC in Fig 2). The strengths of the associations between pain, sedation and delirium monitoring and hospital LoS were all lower than 0.15. The strengths of the associations between pain, agitation/sedation and delirium monitoring and case mix per day were all lower than 0.14.

Discussion
We found that adherence to PAD bundles was associated with improved clinical outcome (hospital LoS, ICU LoS, duration of MV) but worse economic outcome (case mix per day, profit per day). However, this result could not be confirmed in a multiple linear regression. One reason for the less pronounced effect in our cohort could be the overall high adherence and, thus, a smaller effect size. In addition to the classical statistical evaluation, we used the machine-learning algorithm Boruta and a novel statistic (MIC). The Boruta algorithm revealed that monitoring adherence was important for clinical and economic outcome. Pain monitoring was the most important predictor of clinical outcome and case mix per day. The analysis showed that sedation and delirium monitoring are less important than pain monitoring but in most cases are more important than APACHE II score and age. While age was of little importance, Boruta showed that the APACHE II score is an important predictor of clinical and economic outcome. This matches studies showing that the APACHE II score can predict hospital and ICU LoS as well as mortality rates [32].
While the Boruta algorithm only describes the relative strength of the associations of different input variables, MIC analysis identifies the strength of the individual associations. The strongest associations were shown between pain, agitation/sedation and delirium monitoring and ICU LoS.
Our clinical data are in line with most previous studies showing that adherence to the PAD bundle is independently associated with improved clinical outcome. For example, Luetz and colleagues found an independent association between delirium monitoring and in-hospital mortality for ventilated patients, Mansouri et al. found a substantial reduction in the duration of MV, ICU LoS, and mortality through protocol-directed PAD management and Dale et al. found decreases in delirium, duration of MV as well as ICU and hospital LoS [33][34][35].
Despite the favourable clinical outcome, adherence was associated with a decrease in both case mix per day and profit per day within the German DRG system in univariate analysis, but this was not confirmed by multiple linear analysis. It has to be considered, however, that adjusted R-square was very low for all multiple linear regressions. The DRG system was implemented primarily to keep quality at the existing level while reining in costs. The focus on better quality and a change of incentive structure is a current trend (see below).
DRGs are more relevant in Germany than in other health systems: in Germany, 80% of hospital finances are covered by DRGs, whereas in other countries the share is much lower (40%-50%) [36]. Because of the DRG system, there were no surcharges for fixed costs or quality until 2020. Although no studies have shown false incentives created by the DRG system, there is a trend towards introducing surcharges for quality and financing some fixed costs through surcharges. For example, Germany has applied a surcharge for nursing expenses since 2020. A recent study showed that the type and amount of reimbursement has a strong influence on the chosen treatment strategy [37]. This could be a way to optimise incentives in a DRG-dominated system.

Strengths and limitations of this study
Our study has several strengths. This is the first study to use machine-learning analyses to examine the importance of monitoring in terms of both clinical and economic outcome. Furthermore, this study is the first to evaluate the economic effects of all PAD bundle monitoring values in a DRG system by case mix and profit per day.
Our study has several limitations. The first limitation of our results is that we used routine data from our clinical systems. Because of permanent validation procedures, the quality of the data can be assumed to be high, but incorrect entries in individual datasets cannot be excluded.
A further limitation is that we measured economic outcome only by turnover (case mix) and profit per day. We did not separately analyse the total costs. With shorter ICU and hospital LoS, the total costs are probably lower than with regular LoS. Our economic view is based only on the figures that carry an (economic) incentive for hospitals (see the explanation of the outcome variables).
Additionally, case mix is a highly context-sensitive system. Hence, the results might not be applicable to different international health care systems. Additionally, the German national system is constantly changing, and the results for years other than 2016 could be different.
The economic analysis applies only to Germany. A prediction for other health systems is not possible because of the different reimbursement systems. Many countries have a DRG system but use their own databases and cost accounting guidelines.
The above cited study regarding APACHE and ICU LoS used the APACHE IV, while our hospital uses the APACHE II [32]. Furthermore, we assume in our economic analysis that reductions in LoS could allow additional patients to be treated. That is the reason for using case mix and profit per day and not per case. Factors other than the PAD bundle certainly also influenced clinical and economic outcome, but these factors could not be quantified.
A further limitation is that we treated anything less than full adherence as non-adherence in univariate analysis. The reason for this choice was the high implementation rates at our hospital. The national guideline recommends that monitoring should occur during a minimum of 70% of all shifts [17]. The 70% threshold is known from other contexts (antibiotics stewardship) to be an effective margin for reaching a significant effect by implementing standard operating procedures [38]. However, greater implementation still has effects, as our clinical data show. Using a lower percentage threshold would decrease the effect size and is therefore the more conservative approach. Our results might be a reason to conduct further studies concerning the need for higher target values.
Adherence to monitoring and the analysis of its effects depend on the guideline used by the institution because guidelines often differ in their details; e.g., the Spanish guideline requires sedation monitoring every six hours only for mechanically ventilated patients and gives a target value of 95% [39]. The German national guideline requires sedation monitoring every eight hours for every patient and gives a standard of 70% [17]. However, independent of the individual recommendations of a guideline, all advice concerning PAD management increases the awareness of employees in intensive care and helps to increase quality of care.
There is a need to conduct prospective studies on the topic of PAD monitoring adherence to validate our results and focus more closely on economic outcome to improve incentives for quality in a DRG-based system. Further studies should also aim at cohorts with larger differences in PAD adherence for confirmatory analysis.

Conclusion
Adherence to the PAD bundle is important for clinical as well as economic outcome. It is associated with improved clinical and worse economic outcome in comparison to non-adherence in univariate analysis, but this was not confirmed by multiple linear analysis.