SafeNET: Initial development and validation of a real-time tool for predicting mortality risk at the time of hospital transfer to a higher level of care

Background Processes for transferring patients to higher acuity facilities lack a standardized approach to prognostication, increasing the risk for low value care that imposes significant burdens on patients and their families with unclear benefits. We sought to develop a rapid and feasible tool for predicting mortality using variables readily available at the time of hospital transfer.

Introduction Each year, nearly 1.6 million patients are transferred to referral centers, accounting for as much as 3.5% of all inpatient admissions [1]. Many transferred patients are critically ill and high-risk, requiring facilities equipped to provide specialized services for their complex needs [2]. However, securing these services often requires travelling burdensome distances away from patients' homes and communities of support. Furthermore, these patients often arrive with poorly defined goals of care and experience unfavorable outcomes including higher rates of mortality [3][4][5][6]. The challenge is compounded by the fact that transferred patients and their families frequently do not understand the severity of illness, leading to unrealistic expectations and a potentially false sense of hope [7]. These circumstances impose significant burdens with unclear benefits, thus increasing the chance of rendering low value care [8].
Current practice leaves risk assessment to the ad-hoc judgment of bedside clinicians, suggesting an opportunity for better care coordination by more systematic procedures that should inform shared decisions about transferring critically ill patients or preparing the referral center to manage patients in line with patient preferences. This gap in coordination of patient care is largely due to the paucity of real-time tools to rapidly risk-stratify patients at the time of transfer. There are several mortality prediction tools used in ICU and other hospital settings. For example, the Acute Physiology, Age, Chronic Health Evaluation (APACHE) [9] and Sequential Organ Failure Assessment (SOFA) scores [10] are validated mortality predictors in the ICU, but because these tools are not feasibly implemented at the point of care, they have limited utility. Available admission mortality predictors have variable degrees of performance, and some are adapted for specific patient populations (e.g. surgical patients, geriatric trauma patients), limiting their utility for disease-agnostic acute settings [11][12][13][14][15][16]. Of the tools available, the 3-item quick SOFA (qSOFA) is the most feasible bedside mortality predictor with its rapid assessment in real-time. To date, it has been predominately tested in patients with infection and has shown modest discrimination and unclear predictive power [17]. Taken together, existing tools either fall short or are understudied in regard to reliable prediction of recovery from an acute episode in high-risk patients in real-time.
Based on the limitations of existing tools, the clinical leadership of a large, multi-hospital healthcare system defined a pressing need for a novel tool that could identify high-risk patients at the time and point of transfer. The overarching goal of this quality improvement project was to provide needed information to front line physicians to inform shared decisions about the highest risk patients, directing additional resources to these patients to ensure that the plan of care was consistent with their values and goals. In this study, we describe how we used guided machine learning to develop and validate a tool called "SafeNET" (Safe Nonelective Emergent Transfers) that predicts expected mortality at the time of transfer based on variables available to bedside clinicians. We compared SafeNET's predictive capabilities with qSOFA, the only published tool to our knowledge suitable to ascertain mortality risk in this transfer patient population.

Setting
This study was conducted at the University of Pittsburgh Medical Center (UPMC), a large multiple-hospital integrated healthcare system. Work was approved by UPMC's Quality Review Committee as a quality improvement project (QRC #2040). Data are reported according to SQUIRE 2.0 standards for quality improvement reporting excellence [18].

Population for model development
We developed the SafeNET mortality risk tool using retrospective data from patients aged 18 or older who were transferred to a UPMC hospital during a 12-month period. Transfer status was derived from the admission source on billing data indicating that the patient was transferred from a hospital, hospice, skilled nursing, or other health care facility with an admission priority of direct emergency admit. For patients with multiple inpatient stays during this time period, one record was randomly selected. Patients were excluded if their discharge disposition indicated they were discharged against medical advice, eloped, or had an unknown destination.

Independent and dependent variables
Tool development began with a focused review of available literature on mortality risk assessment models currently used in ICU and admission settings. This literature review yielded 8 relevant articles describing 7 tools predicting mortality [10][11][12][13][14][15][16][17]. These included a tool developed for use in emergency department triage in Vietnam [11], the Early Warning Score [12,13], Simple Clinical Score [14], Rapid Emergency Medicine Score [16], Worthing Physiologic Score [15], SOFA score [10], and quick SOFA Score [17]. From these validated risk models, we constructed an extensive list of 70 independent variables used in one or more of these models including demographics, vital signs, lab tests, functional status, comorbidities, therapeutic maneuvers (e.g., respiratory support or blood product transfusion) and code status (S1 Table). We then queried UPMC billing data and the inpatient electronic health record (EHR) to determine if these variables were recorded by the receiving facility, focusing only on values recorded within 3 hours of transfer to most closely approximate the patient's condition at the time of transfer. After reviewing which variables were reliably available within this timeframe, we retained 54 of the 70 variables. If more than one value was recorded during this time, we selected the value closest to the time of admission. Functional status was assessed with the Boston University Activity Measure for Post-Acute Care (AM-PAC) Short Form "6-Clicks" [19]. We further linked each record to the patient's vital status and date of death, if deceased, using a proprietary file maintained by UPMC that combines the social security death index with other sources to render the best available record of vital status. We used the date of death to calculate the dependent variable of in-hospital mortality, which was the primary outcome.
We additionally examined secondary endpoints of mortality occurring 30 days and 90 days from the date of admission as well as a composite outcome for patients who survived, but who were transitioned to either hospice care or made "comfort measures only" (CMO) during that admission.
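The rule of keeping the value recorded closest to admission can be illustrated with a minimal sketch. The function name `closest_value`, the example heart-rate readings, and the choice to treat the 3-hour window as symmetric around the admission time are assumptions made for illustration; the text above does not specify how the window was applied.

```python
from datetime import datetime, timedelta


def closest_value(observations, admit_time, window_hours=3):
    """Return the value recorded nearest to admit_time, or None if no
    observation falls within the window (assumed symmetric here)."""
    window = timedelta(hours=window_hours)
    in_window = [
        (abs(timestamp - admit_time), value)
        for timestamp, value in observations
        if abs(timestamp - admit_time) <= window
    ]
    if not in_window:
        return None  # left missing; no value is imputed
    # Smallest absolute time difference wins.
    return min(in_window, key=lambda pair: pair[0])[1]


# Hypothetical heart-rate readings around a 14:00 admission.
admit = datetime(2018, 1, 15, 14, 0)
heart_rates = [
    (datetime(2018, 1, 15, 9, 30), 88),    # 4.5 h before: outside window
    (datetime(2018, 1, 15, 13, 10), 112),  # 50 min before
    (datetime(2018, 1, 15, 14, 20), 104),  # 20 min after
]
print(closest_value(heart_rates, admit))  # 104
```

Returning `None` when nothing falls inside the window mirrors the decision to leave such values missing.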

Guided machine learning modeling
We developed SafeNET using gradient boosting because of its ability to automatically incorporate all variable interactions, account for any amount of missing information, and easily rank variables in terms of their predictive performance. We randomly divided the data into training (80%) and test (20%) sets to first develop the model and then test internal validity, respectively. Differences in training and test sets were examined with likelihood ratios, chi-squared and Student's t-tests for categorical and continuous variables, respectively. Three separate stochastic gradient boosting (XGBoost) algorithms with boosted decision trees and logarithmic loss function were employed on the training set to determine the influence and rank of variables for predicting each of the three mortality outcomes (in-hospital, 30-day and 90-day mortality) and CMO status. In order to focus on variables reliably available at the time of transfer, we modeled only those variables that were either (a) part of the Elixhauser Comorbidity Index [20] and thus reliably applied to all inpatient records or (b) for which at least 50% of our sample had valid values recorded within 3 hours of arrival. The model was then internally validated by using the same seed and running the algorithm a single time. Model discrimination was assessed with the c-statistic (i.e., area under the Receiver Operating Characteristic Curve); differences between training and test set c-statistics were assessed using the methods described by DeLong et al. [21]. Model calibration was assessed with Spiegelhalter's z-test and by plotting observed and expected results across the range of risk. For each model, the variables were ranked according to importance and reviewed by a clinician with expertise in risk stratification (DEH) to select a limited set of variables for a parsimonious model that could feasibly be collected at the bedside yet capture most of the predictive power of the full model.
The number of trees was set to 1,000 to tune model hyperparameters and subsequently monitor model performance.
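The two performance measures named above, the c-statistic and Spiegelhalter's z-test, can be sketched in a few lines. This is an illustrative Python implementation of the standard definitions, not the R code used in the study; the function names and the toy outcome and risk vectors are hypothetical.

```python
import math


def c_statistic(y_true, y_prob):
    """Fraction of (event, non-event) pairs in which the event received the
    higher predicted risk; ties count as half. Equals the area under the ROC."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter's calibration z-statistic and its two-sided p-value."""
    numerator = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    if variance == 0:
        raise ValueError("statistic undefined for these predicted risks")
    z = numerator / math.sqrt(variance)
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Toy data: observed deaths (1) and predicted risks; values are made up.
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
risks = [0.1, 0.2, 0.7, 0.3, 0.8, 0.4, 0.5, 0.9]
print(c_statistic(outcomes, risks))  # 0.9375
z, p = spiegelhalter_z(outcomes, risks)
print(z, p)
```

A p-value at or above 0.05 is read as no detectable difference between observed and predicted risk. The pairwise loop is quadratic in sample size, so a rank-based formula would be preferable for cohorts as large as the one described here.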
In addition to calculating each patient's SafeNET score, we also calculated quick SOFA (qSOFA) scores on patients in the test set for whom all necessary information was available (respiratory rate, systolic blood pressure, Glasgow Coma Score). A Spearman correlation was used to assess the association between the two models. C-statistics, ROC curves, and calibration plots were obtained to compare the overall effectiveness of SafeNET and qSOFA scores to predict in-hospital mortality. All statistical analyses were conducted using R (version 3.4.1 "Single Candle"; R Core Team, 2017, Vienna, Austria [22]).
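For reference, a qSOFA score can be computed from the three inputs listed above as sketched below. The thresholds (respiratory rate of at least 22 breaths/min, systolic blood pressure of 100 mmHg or lower, Glasgow Coma Score below 15) follow the published qSOFA definition and are not restated in this article; the function name and the handling of missing inputs are assumptions that mirror the restriction to patients with all three values available.

```python
def qsofa(respiratory_rate, systolic_bp, gcs):
    """Return the qSOFA score (0-3), or None if any input is missing."""
    if respiratory_rate is None or systolic_bp is None or gcs is None:
        return None  # score not computed without all three inputs
    return (
        int(respiratory_rate >= 22)  # tachypnea
        + int(systolic_bp <= 100)    # hypotension
        + int(gcs < 15)              # altered mentation
    )


print(qsofa(24, 95, 14))     # 3
print(qsofa(18, 120, 15))    # 0
print(qsofa(None, 120, 15))  # None
```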

Implementation and validation
After developing and internally validating SafeNET in the retrospective sample, we sought to determine SafeNET's validity and feasibility for prospective use in real time at the point of care. We built SafeNET as a web-based application that was easily accessible from any browser behind the UPMC firewall (S1 Fig). The application guided users to enter as many of the variables as were immediately available and then generated the predicted risk of each outcome by running a cloud-based instance of R pre-loaded with the gradient boosting algorithm. After securing support from the Vice President of Medical Affairs and the Chief Nursing Officer at each of three pilot hospitals within the UPMC system, a variety of training opportunities were made available to teach bedside nurses how to use the SafeNET tool. The UPMC Medcall team, which coordinates all transfers into and between UPMC facilities, was also enlisted.
After completing this academic detailing, we began piloting the prospective use of SafeNET with the following workflow: (1) When a bedside nurse from a piloting facility contacted Medcall to request a transfer to a higher level of care, the Medcall agent initiated the transfer process as usual, but asked the bedside nurse to complete the SafeNET score while they waited for bed assignment; (2) the nurse then accessed the web-based tool, recorded patient identifiers and generated a SafeNET score which s/he (3) reported to the Medcall agent at the time that the agent connected the bedside nurses of the sending and receiving facilities for the typical sign-out discussion. The pilot was limited to patients older than 18 years transferred to a single, tertiary care hospital from select emergency departments or intensive care units at one of the three UPMC referring hospitals. We excluded patients transferred for myocardial infarction (both those with and without ST-segment elevation) or maternal labor and delivery so as not to disturb existing transfer algorithms for these patient populations. Daily transfers were monitored by an implementation specialist (MKW) who made regular audit and feedback reports to participating facilities regarding their compliance with the SafeNET initiative as defined by the proportion of patients transferred for whom a SafeNET score was recorded. If compliance lagged, MKW offered additional academic detailing to lagging sites. We examined feasibility with compliance rates, data missingness, and the time to administer the tool. Prospective validation was assessed with c-statistics, calibration plots, Spiegelhalter's z, sensitivity, specificity, and both positive and negative predictive values for in-hospital, 30-day, and 90-day mortality.
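Sensitivity, specificity, and the predictive values listed above all derive from a 2x2 table of predicted versus observed outcomes at a chosen risk cut-off. The sketch below shows the arithmetic; the function name, the toy data, and the 0.5 cut-off are hypothetical, since this section does not state which threshold was applied.

```python
def classification_metrics(y_true, y_prob, threshold):
    """Compute sensitivity, specificity, PPV and NPV at a risk threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold  # predicted high risk
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data: three deaths among eight transfers; risks are made up.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
risks = [0.9, 0.6, 0.2, 0.7, 0.1, 0.3, 0.4, 0.05]
metrics = classification_metrics(outcomes, risks, threshold=0.5)
print(metrics)
```

With these toy values, two of three deaths are flagged (sensitivity 2/3) and four of five survivors are not (specificity 0.8).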

SafeNET development
A total of 20,985 patients were identified as transfer patients at a UPMC hospital between July 2017 and June 2018. Of these patients, 10,696 (51.0%) were directly admitted as inpatients whereas 10,289 (49.0%) were admitted as inpatients via the emergency department. Demographically, patients averaged 65 years of age, and the cohort was 52.0% male and 82.9% white race (Table 1). The patients had a mean Elixhauser Comorbidity Index of 4.2 conditions, and common Major Diagnostic Categories (MDC) of their inpatient stays were Diseases and Disorders of the Circulatory System or Nervous System (Table 1). There were 1,937 (9.2%) in-hospital mortalities, 2,884 (13.7%) 30-day mortalities, 3,899 (18.6%) 90-day mortalities, and 1,944 (9.3%) transitions to hospice or CMO among the transfer patients (Table 1). No differences were detected in patient demographics or characteristics between the training (N = 16,788) and test (N = 4,197) sets (Table 1).
After building models for each outcome using all 54 retained independent variables, the variables were ranked according to importance in predicting the outcome (S2 Table). Across the 4 outcomes, the top variables of importance included patient age, AM-PAC Activity score, blood urea nitrogen (BUN) levels, AM-PAC Mobility Score, fluid and electrolyte disorders, body temperature, mechanical ventilation, albumin, glucose, heart rate, systolic blood pressure, platelet count, and white blood cell count. There was a significant positive correlation between AM-PAC Activity Score and AM-PAC Mobility Score, so we retained the former to alleviate redundancy. Similarly, BUN and creatinine levels were highly correlated, so we included only BUN levels in the final model. Moreover, we chose to include a composite measure of cancer status (both metastatic cancer and solid tumor without metastases), both of which ranked highly in importance. These 14 variables were used to build the parsimonious model. Table 2 describes the proportion of patients with data for each of these variables along with estimates of central tendency. With the exception of albumin levels, data were available for all variables in >90% of the cases. In the event that one or more variables were missing for any individual patient in the dataset, the SafeNET score was computed based only on the available data (i.e., no imputation of missing values). This approach mimics real-world conditions and leverages gradient boosting's ability to render the best possible prediction given the available data with any amount of missing data.
As expected, the full 54-variable model had the best discrimination across all mortality outcomes with c-statistics of 0.903 (95% CI: 0.891-0.916), 0.877 (95% CI: 0.864-0.890), and 0.869 (95% CI: 0.857-0.881) for in-hospital, 30-day and 90-day mortality, respectively. As indicated by a Spiegelhalter's Z-test p-value ≥ 0.05, consistent with no significant difference between observed and predicted values across the range of risk, calibration of the 54-variable model was good for predicting in-hospital and 30-day mortality (Fig 1A & 1B) but was less accurate for 90-day mortality (Spiegelhalter's Z-test p-value < 0.05; Fig 1C). The parsimonious 14-variable model demonstrated better calibration for predictions of in-hospital and 30-day mortality because, unlike the 54-variable model, it was tuned to adjust for hyperparameters (Spiegelhalter's Z-test p-value ≥ 0.05; Fig 1E & 1F). However, calibration of the 14-variable model was less accurate for predicting 90-day mortality (Spiegelhalter's Z-test p-value < 0.05; Fig 1G). After tuning, the parsimonious models achieved most of the full model's discrimination with c-
We also investigated the ability of SafeNET to predict patients whose care would transition to comfort measures only or hospice during the hospital stay. Both the full 54-variable model and 14-variable model showed good discrimination with c-statistics of 0.867 (95% CI: 0.852-0.881) and 0.840 (95% CI: 0.822-0.857), respectively. However, calibration plots indicated that both models were less accurate at predicting CMO/hospice care transition (Spiegelhalter's Z-test p-value < 0.05 for both models) (Fig 1D and 1H), with the model overpredicting the risk of the composite CMO/hospice transition. However, these models were not tuned for hyperparameters.
We next compared the capabilities of SafeNET versus qSOFA at predicting in-hospital mortality. Among the 4,197 patients randomly selected to be a part of the test cohort (Table 1), 2,260 had sufficient clinical information to calculate a qSOFA score. We found a significant positive association between SafeNET and qSOFA scores (Spearman Correlation Coefficient = 0.536; p<0.001). Evaluation of ROC curves indicated that SafeNET had superior discrimination to qSOFA (c = 0.836, 95% CI: 0.814-0.857 vs. 0.713, 95% CI: 0.686-0.739, p<0.001; Fig 2).
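A Spearman correlation between a continuous risk score and a 0-3 integer score such as qSOFA involves many tied values, so ranks must be averaged across ties. The following is a minimal sketch of that calculation in plain Python; the helper names and toy values are hypothetical, and the study itself used R.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Toy data: continuous predicted risks against integer scores with ties.
predicted_risk = [0.1, 0.2, 0.3, 0.4]
integer_score = [0, 0, 1, 1]
print(spearman(predicted_risk, integer_score))
```

For the toy data above the result equals 2/sqrt(5), roughly 0.894; the ties pull it below 1 even though the ordering never reverses.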

Discussion
Although interhospital transfer of critically ill patients is a common occurrence, there is no standardized process to assess these patients' risk for poor outcomes and communicate that crucial information at the initial point-of-care. This often leads to care plans misaligned with patient values and poor outcomes. As part of a quality improvement initiative, our institution developed SafeNET, a robust mortality prediction tool with the sensitivity to accurately risk-stratify critically ill patients at the time of hospital transfer. Feasibility of SafeNET was demonstrated by increasing compliance rates and ability to complete and automatically generate predicted mortality within minutes at the point of care. Moreover, performance of SafeNET was validated in a prospective cohort which demonstrated discrimination and calibration on par with that produced in the development phase, particularly for predictions of in-hospital and 30-day mortality.

PLOS ONE

The SafeNET tool has several advantages over current mortality prediction tools developed for admission and ICU settings. First, SafeNET uses only 14 variables that are readily available at the time of presentation, taking only 4 minutes to complete at the bedside. By contrast, existing tools rely on clinical data collected over the course of 24 hours, which cannot accommodate the clinically acute need to identify high-risk patients during the critical timeframe of interhospital transfer [23][24][25][26][27]. Moreover, some of the existing tools require input of a large number of variables, which may be difficult to obtain and time consuming to complete [23,28].
Second, although rapid, brief and thus feasible for bedside assessment, SafeNET's predictive ability outperforms qSOFA, the only other tool to our knowledge proven feasible for bedside assessment, and it does so in a diverse population agnostic of disease process. The qSOFA was designed as an abbreviated version of the SOFA score and was shown to outperform SOFA in non-ICU settings at predicting in-hospital mortality in patients with infection and suspected sepsis [17]. It can be feasibly implemented at the bedside and requires only three clinical criteria: respiration rate, systolic blood pressure, and the Glasgow Coma Score [17]. However, the qSOFA model was built as a short-term mortality prediction tool using patients with infection, and reports on its predictive capabilities in more generalized non-infected patient populations have varied findings [29][30][31]. In the current study, we found that SafeNET is superior to qSOFA at prediction of in-hospital mortality (c = 0.836 vs c = 0.713, p < .001), and that it does so agnostic of disease process in a diverse sample. This is likely due to the more complex nature of our 14-variable model, but we have shown that these variables are readily obtainable at the bedside with minimal disruption to clinical workflow.
Third, SafeNET is robust to missing data, effectively making risk predictions with whatever data are readily at hand. Previous mortality prediction models had to exclude records for missing data [16]. However, the gradient boosting algorithm used to develop SafeNET is able to account for missingness so that not all variables are needed to make a prediction. In the bedside setting, a degree of missingness of clinical variables is expected due to timing issues and difficulty obtaining information, but the SafeNET tool was able to provide strong discrimination and calibration even in a setting where 54% of predictions were made with at least one missing variable.
Fourth, the manual data entry format of the online SafeNET tool is implementable at any bedside with ready access to an internet connection. This is critically important considering that at our center, approximately 75% of patients transferred into our hospitals come from outside institutions with isolated electronic records. As such, even if SafeNET were automated for within-system transfers based on data extant in the electronic record, a manual alternative would be required for the majority of patients arriving from outside institutions where a link to the SafeNET tool could be easily delivered via a variety of electronic media.
Finally, our data demonstrate that SafeNET has good discrimination for predicting transition to comfort measures only or hospice care, although this prediction was not as well calibrated as the model for predicting mortality as we did not tune this for hyperparameters. There are limited resources for predicting patients whose care will be transferred to hospice or comfort measures only. The Hospital End-of-Life Prognostic Score (HELPS) predicts the aggregate outcome of in-hospital mortality and discharge to hospice using variables such as patient demographics, resuscitation status, nutrition status, and comorbidities, and its retrospective development demonstrated discrimination (C = 0.866) comparable to SafeNET [32]. However, it has not been prospectively validated at the bedside as done here, and it relies on the calculation of potentially cumbersome and time consuming subscores such as the Inpatient Physiologic Failure Score that itself requires an assessment of 12 variables including vital signs, blood chemistry and consciousness. Other tools have been developed with the specific goal of identifying patients who would benefit from palliative care, but these are tailored for more specific populations such as cancer patients [33]. Predicting these transitions to end-of-life care is especially important, perhaps even before transferring to a higher level of care, because such predictions can trigger conversations clarifying patient goals and preferences, but future work is necessary to determine if and how these end-of-life transition predictions translate to more goal-concordant care plans.
There are several limitations to employing SafeNET as a point-of-care mortality risk model. First, data are restricted to a single, multi-hospital healthcare system and findings may not generalize to other settings. Second, some of the information that feeds SafeNET's algorithms may be subject to bias (e.g. rater bias with completing the AM-PAC Activity score) and bedside assessment of comorbidities is likely different from the post-facto administrative ICD-10 coding on which the models were developed. However, model discrimination and calibration performed as expected, and the real-time calculation proved feasible with modest effort of bedside clinicians. Third, manual data entry is required. Future automation may further expedite implementation for patients already within a hospital system's data infrastructure, but manual data entry is feasible for immediate implementation and may actually facilitate calculation when patients are transferred from an outside hospital that does not share data infrastructure.
In conclusion, the development of SafeNET provides an objective and systematic way to risk-stratify patients at the time of hospital transfer; its use and generalizability across other health systems and settings remain to be determined and are a focus of future work. SafeNET is not meant to supersede clinical judgement, but rather it is intended as a means to trigger a pause so that clinicians are better prepared to inform high-risk patients (or their surrogates) of the severity of their illness and address goals of care when they arrive at the receiving facility, or in some cases, before transferring patients outside their communities of support. Patients who are aware of their condition and participate in conversations with physicians about their values tend to receive care that is consistent with these care preferences [34]. Moreover, both patient and family satisfaction are significantly improved among patients whose care included physician-directed advance care planning [35]. Ongoing work detailing the implementation process to mainstream SafeNET is a necessary next step to facilitate these efforts.
[…] to help coordinate implementation of SafeNET across pilot sites within the UPMC system and Adam Yee and Jeremy Wells for their work in building the SafeNET tool as a web-based application.