Comparative prognostic accuracy of sepsis scores for hospital mortality in adults with suspected infection in non-ICU and ICU at an academic public hospital

Background Sepsis is a global healthcare challenge, and reliable tools are needed to identify patients and stratify their risk. Here we compare the prognostic accuracy of the sepsis-related organ failure assessment (SOFA), quick SOFA (qSOFA), systemic inflammatory response syndrome (SIRS), and national early warning score (NEWS) for hospital mortality and other outcomes amongst patients with suspected infection at an academic public hospital.
Measurements and main results 10,981 adult patients with suspected infection hospitalized at a U.S. academic public hospital between 2011–2017 were retrospectively identified. Primary exposures were the maximum SIRS, qSOFA, SOFA, and NEWS scores upon inclusion. Comparative prognostic accuracy for the primary outcome of hospital mortality was assessed using the area under the receiver operating characteristic curve (AUROC). Secondary outcomes included mortality in ICU versus non-ICU settings, ICU transfer, ICU length of stay (LOS) >3 days, and hospital LOS >7 days. Adjusted analyses were performed using a model of baseline risk for hospital mortality. 774 patients (7.1%) died in hospital. Discrimination for hospital mortality was highest for SOFA (AUROC 0.90 [95% CI, 0.89–0.91]), followed by NEWS (AUROC 0.85 [95% CI, 0.84–0.86]), qSOFA (AUROC 0.84 [95% CI, 0.83–0.85]), and SIRS (AUROC 0.79 [95% CI, 0.78–0.81]; p<0.001 for all comparisons). NEWS (AUROC 0.94 [95% CI, 0.93–0.95]) outperformed other scores in predicting ICU transfer (qSOFA AUROC 0.89 [95% CI, 0.87–0.91]; SOFA AUROC 0.84 [95% CI, 0.82–0.87]; SIRS AUROC 0.81 [95% CI, 0.79–0.83]; p<0.001 for all comparisons). NEWS (AUROC 0.86 [95% CI, 0.85–0.86]) was also superior to other scores in predicting ICU LOS >3 days (SOFA AUROC 0.84 [95% CI, 0.83–0.85]; qSOFA AUROC 0.83 [95% CI, 0.83–0.84]; SIRS AUROC 0.75 [95% CI, 0.74–0.76]; p<0.002 for all comparisons).
Conclusions Multivariate prediction scores, such as SOFA and NEWS, had greater prognostic accuracy than qSOFA or SIRS for hospital mortality, ICU transfer, and ICU length of stay. Complex sepsis scores may offer enhanced prognostic performance as compared to simple sepsis scores in inpatient hospital settings where more complex scores can be readily calculated.


Introduction
Sepsis is a major healthcare challenge in the United States and globally, and is associated with profound mortality, morbidity, and healthcare costs [1–7]. Early recognition and treatment of sepsis improve outcomes; reliable tools are needed to identify patients at increased risk of developing sepsis and to prognosticate their mortality and other complications [8,9]. Sepsis-3 authors recommend the quick sepsis-related organ failure assessment (qSOFA) score to identify patients at high risk of developing sepsis outside the intensive care unit (ICU) and the SOFA score for patients in the ICU [10,11]. Internal and external validation studies have demonstrated the superiority of the qSOFA and SOFA scores for the identification and mortality prognostication of sepsis patients, when compared to the systemic inflammatory response syndrome (SIRS) criteria [10,12–15]. While the Sepsis-3 authors initially proposed the qSOFA and SOFA scores as tools to identify patients with organ dysfunction among those with suspected infection, there is widespread interest in using these and other scores in prognosticating patient outcomes secondary to sepsis [16–21]. A recent meta-analysis comparing the qSOFA score with SIRS criteria concluded that the qSOFA score was more predictive of hospital mortality but SIRS was superior for sepsis diagnosis [22]. However, other studies have shown that alternative scores, such as the national early warning score (NEWS), may be superior [20,23–26]. The ideal sepsis identification and outcome prognostication scoring system remains uncertain. We compared the prognostic accuracy of sepsis scores for hospital mortality among patients with suspected infection presenting to the emergency department (ED) and then admitted to either the acute care service or ICU of an academic public hospital.
We hypothesized that there were important differences between scores which may impact score performance among different hospitalized populations.

Study design and population
A retrospective cohort study was performed using all patients ≥18 years of age with suspected infection who presented to the ED and were admitted to Harborview Medical Center, a tertiary academic public hospital in Seattle, WA with 413 beds, between January 2011 and March 2017. Patients were identified on the basis of suspected infection given that there is no gold standard for the diagnosis of sepsis [10]. Suspected infection was defined as (1) any blood, urine, or sputum culture order followed by clinician order of an intravenous (IV) antibiotic within 72 hours, or (2) clinician order of an IV antibiotic followed by a culture order within 24 hours. This method was chosen for consistency with recent major sepsis studies [7,10,14,15,23]. All patients in the ED, acute care service, or ICU were eligible for inclusion in the study. The time at which a patient met the definition of suspected infection was used as the time of study inclusion. Primary exposures were the maximum SIRS, qSOFA, SOFA, and NEWS scores upon inclusion. Patients who were directly admitted to the hospital without being evaluated in the ED, those transferred from the ED or inpatient wards of another hospital, those who were evaluated in the ED and then discharged, and those admitted to inpatient psychiatric or rehabilitation services were excluded. The study was approved by the University of Washington Institutional Review Board (IRB #00002870).
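The two-part definition of suspected infection can be illustrated with a minimal Python sketch. This is not the study's extraction code: the function name `suspected_infection_time` and its inputs (lists of order timestamps) are hypothetical, and the choice to treat the later of the two paired orders as the moment the definition is met is an assumption, since the text does not specify it.

```python
from datetime import datetime, timedelta

# Time windows taken from the definition in the text.
CULTURE_THEN_ABX = timedelta(hours=72)   # culture order, then IV antibiotic
ABX_THEN_CULTURE = timedelta(hours=24)   # IV antibiotic, then culture order


def suspected_infection_time(culture_times, antibiotic_times):
    """Return the earliest time the definition is met, or None if never met.

    Assumption (not stated in the source): the definition is considered met
    at the later of the two paired orders.
    """
    candidates = []
    for c in culture_times:
        for a in antibiotic_times:
            if c <= a <= c + CULTURE_THEN_ABX:
                # (1) culture first, IV antibiotic ordered within 72 hours
                candidates.append(a)
            elif a < c <= a + ABX_THEN_CULTURE:
                # (2) IV antibiotic first, culture ordered within 24 hours
                candidates.append(c)
    return min(candidates) if candidates else None


# Toy example with made-up timestamps.
t0 = datetime(2015, 1, 1, 8, 0)
inclusion = suspected_infection_time([t0], [t0 + timedelta(hours=10)])
```

The asymmetric windows (72 hours versus 24 hours) are the only parameters; everything else in the sketch is bookkeeping over pairs of orders.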

Data collection
Patient demographic data, vital signs, laboratory values, orders (e.g., medications, cultures, oxygen therapy, vasopressors), hospital mortality data, and ICU and hospital length of stay (LOS) were extracted from the electronic health record, de-identified, and made available on a secure server for analysis. Each patient's qSOFA, SOFA, SIRS, and NEWS scores were calculated at time of inclusion in the study using the most deranged physiologic and laboratory parameters recorded within the 24 hours preceding and the 24 hours following time of inclusion [11,12,25,27]. Standard criteria for score positivity were applied, using a threshold of 2 or more points for each of SIRS, qSOFA, and SOFA, and a threshold of 5 or more points for NEWS. Glasgow coma scale (GCS) ≤14 was used to define altered mental status [10,28]. No contribution was made to the total score if an individual component of the score was missing. Patients in whom all components of any score were missing were excluded from analysis.
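The handling of missing components and the positivity thresholds can be sketched in Python as follows. This is an illustrative sketch rather than the study's code: the function and variable names are hypothetical, only qSOFA and a simplified SIRS are shown, and the SIRS version omits the PaCO2 and band-count criteria as a stated simplification. Inputs are assumed to already be the most deranged values in the 48-hour window.

```python
# Positivity thresholds as stated in the text.
POSITIVE_AT = {"SIRS": 2, "qSOFA": 2, "SOFA": 2, "NEWS": 5}


def _total(components):
    """Sum met criteria; a missing component (None) contributes nothing.

    Returns None when every component is missing, mirroring the exclusion
    of patients with no data for a score.
    """
    observed = [c for c in components if c is not None]
    if not observed:
        return None
    return sum(observed)


def qsofa(resp_rate=None, systolic_bp=None, gcs=None):
    """qSOFA (0-3): respiratory rate >=22/min, systolic BP <=100 mmHg, GCS <=14."""
    return _total([
        None if resp_rate is None else resp_rate >= 22,
        None if systolic_bp is None else systolic_bp <= 100,
        None if gcs is None else gcs <= 14,
    ])


def sirs(temp_c=None, heart_rate=None, resp_rate=None, wbc=None):
    """SIRS (0-4). Simplification: PaCO2 and band-count criteria are omitted."""
    return _total([
        None if temp_c is None else (temp_c > 38.0 or temp_c < 36.0),
        None if heart_rate is None else heart_rate > 90,
        None if resp_rate is None else resp_rate > 20,
        None if wbc is None else (wbc > 12.0 or wbc < 4.0),  # 10^9 cells/L
    ])


def is_positive(score_name, score):
    """Apply the positivity threshold; a missing score is never positive."""
    return score is not None and score >= POSITIVE_AT[score_name]


# Toy patient with a missing blood pressure: the score is built from the rest.
example = qsofa(resp_rate=24, systolic_bp=None, gcs=14)
```

Treating a missing component as zero points means a score can only be underestimated, never overestimated, by incomplete data.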

Outcomes
The primary outcome was prognostic accuracy of individual scores for hospital mortality. We did not compare the ability of scores to identify patients with sepsis because of the limitations of the retrospective study design and the lack of a gold standard for comparison. Secondary outcomes included hospital mortality stratified by non-ICU and ICU setting at time of inclusion, transfer to the ICU from a non-ICU setting, ICU LOS >3 days, and overall hospital LOS >7 days following study inclusion. ICU transfer was further defined as patient transfer from a non-ICU to ICU setting within the 24 hours preceding and the 24 hours following time of study inclusion with subsequent ICU duration of at least 24 hours or death within 24 hours of transfer. Secondary outcomes were chosen to reflect clinical events significant to both individual patients and more broadly to hospitals and health systems.

Statistical analysis
All analyses were performed using Stata version 15 (StataCorp, College Station, TX). Patient characteristics are presented as number (%), mean ± standard deviation (SD) for quantification of normally distributed variables, or median and interquartile range (IQR) for non-normally distributed variables. For comparison of continuous variables, Student's t-test was used. For comparison of dichotomous variables, the chi-square test was applied. Comparative prognostic accuracy for the primary and secondary outcomes was assessed using the area under the receiver operating characteristic curve (AUROC) for each score individually (crude analysis) and in conjunction with a baseline risk model (adjusted analysis) to demonstrate the additional prognostic value of sepsis scores beyond potentially confounding demographic factors. Age, sex, and race were used to calculate a baseline level of risk for mortality and other outcomes based on sepsis data from the United States demonstrating disparities according to these factors [29–31]. Adjusted risk ratios for outcomes comparing positive vs. negative scores (e.g., SOFA ≥2 vs. SOFA <2) were assessed. A 2-sided p-value of <0.01 was used to indicate statistical significance and ensure a robust analysis based on the Bonferroni correction for multiple comparisons [32].
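For readers less familiar with the AUROC, the crude (unadjusted) statistic for a single score can be sketched in a few lines of Python. The study used Stata; this sketch is only an illustration of the quantity being compared, the function name `auroc` is hypothetical, and the data below are made-up toy values, not study data. The adjusted analysis, which combines each score with a baseline risk model, is not reproduced here.

```python
def auroc(scores, died):
    """Crude AUROC: the probability that a randomly chosen patient who died
    has a higher score than a randomly chosen survivor, counting ties as 1/2.
    """
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: four patients, two of whom died.
toy_scores = [0, 1, 2, 3]
toy_died = [0, 1, 0, 1]
toy_auc = auroc(toy_scores, toy_died)
```

An AUROC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of deaths from survivors. The pairwise loop is quadratic in cohort size; a rank-based formulation would be preferable for a cohort of this study's size.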

Results
There were 125,431 patient encounters during the study period, of which 10,981 met study inclusion criteria. Thirty-nine patients had incomplete records precluding the calculation of at least one of the scores and were omitted from analysis; 10,942 patients were included in the study (S1 Fig). Median age was 52 years (IQR, 29–75 years), 70% were male (n = 7,645), and 30% were female (n = 3,297) (Table 1). 61% were Caucasian (n = 6,710), 18% African-American (n = 1,922), and 21% other racial/ethnic groups (n = 2,349). 7,193 patients (66%) were in a non-ICU setting at the time they met inclusion criteria, compared to 3,749 (34%) in the ICU. 774 patients (7.1%) died in the hospital; of these, 116 (15%) deaths occurred in a non-ICU setting and 658 (85%) occurred in the ICU. 313 patients were transferred from a non-ICU setting to the ICU. Of those patients who were admitted to the ICU at any point during their
Within the study cohort, 8,534 (78%) had a SIRS score ≥2; 4,864 (44%) had a qSOFA score ≥2; 9,746 (89%) had a NEWS score ≥5; and 6,219 (57%) had a SOFA score ≥2. The full distributions of scores and their relationship with hospital mortality are presented in Figs 1 and 2. The incidence of missing score components in the study cohort was low (S1 Table). However, the PaO2/FiO2 ratio and serum bilirubin level were missing for 77% (n = 8,489) and 36% (n = 3,927) of patients, respectively.
After adjusting for baseline risk factors for death, discrimination of hospital mortality was significantly higher for SOFA (
Analyses using crude data without adjustment for baseline risk of mortality are reported in the supplement and resulted in similar estimates of AUROC for outcomes by each scoring system (S2 Table, S3 Table, S8-S13 Figs).
Adjusted risk ratios for outcomes comparing positive vs. negative scores in a binary fashion (e.g., qSOFA ≥2 vs. qSOFA <2) are also reported in the supplement (S14 Fig

Discussion
In this study comparing the prognostic accuracy of SOFA, qSOFA, SIRS, and NEWS scores for clinically relevant outcomes in a large population of adult patients with suspected infection at an academic public hospital, more detailed multivariate prediction scores outperformed simpler scores. SOFA, a complex score combining physiologic and laboratory data, demonstrated superior prognostic accuracy for overall hospital mortality and ICU mortality compared to all other scores. NEWS, which utilizes physiologic data only, outperformed all other scores in predicting transfer to the ICU and for ICU LOS >3 days. There was no significant difference between qSOFA and SIRS in predicting the primary outcome of hospital mortality. However, qSOFA was superior to SIRS in predicting transfer to the ICU, ICU LOS >3 days, and hospital LOS >7 days in the study cohort. This study's finding that SIRS is a poor predictor of mortality is consistent with prior studies [33,34]. In the sepsis consensus definition paper, the ability of SOFA, qSOFA, and SIRS to predict hospital mortality was determined in a mixed cohort of ICU and non-ICU encounters [10]. Seymour and colleagues found that 1) the predictive validity of qSOFA for hospital mortality was statistically greater than SOFA or SIRS for non-ICU encounters, and 2) the predictive validity of SOFA for hospital mortality was superior to qSOFA or SIRS for ICU encounters [10]. Freund et al. conducted an international prospective cohort study that showed the superiority of qSOFA in prognosticating hospital mortality as compared to SIRS in patients presenting to the emergency department with suspected infection [13]. Raith et al. performed a retrospective cohort analysis on a large cohort of patients in Australian and New Zealand ICUs and found that SOFA had greater prognostic accuracy for hospital mortality compared to SIRS or qSOFA [14].
Moreover, qSOFA was found to have superior discrimination of mortality as compared to SIRS in adult patients with suspected infection hospitalized in low-and middle-income countries [15]. A recent meta-analysis of 38 studies comparing the prognostic accuracy of qSOFA and SIRS for hospital mortality among patients with suspected infection reported that qSOFA was more predictive of mortality but SIRS was superior for sepsis diagnosis [22]. However, these findings are inconclusive given that the included studies varied significantly in the studied patient population (e.g., ED vs. acute care vs. ICU), outcome measures (hospital mortality vs. 28-day mortality), and on the definition of "suspected infection," with only 10 of the 38 studies using a standardized approach incorporating antibiotic treatment or initiation of body fluid cultures.
This study found that SOFA outperformed qSOFA, SIRS, and NEWS in predicting overall hospital mortality and ICU mortality. Nonetheless, the observed event rate of non-ICU fatalities was low in this cohort. Given that the majority of deaths occurred in the ICU, we found few significant differences between scores in prognosticating non-ICU mortality. This study further confirms the superiority of SOFA to discriminate ICU mortality, as previously reported by others [14,16]. In this cohort qSOFA was superior to SIRS in predicting overall hospital mortality, transfer to the ICU, ICU LOS >3 days, and hospital LOS >7 days. This finding is consistent with the meta-analysis by Fernando et al. in which qSOFA was found to be superior to SIRS in predicting hospital mortality [22]. However, we found that NEWS had better discriminative value than qSOFA for hospital mortality and was superior to all other scores in predicting ICU transfer and ICU LOS >3 days. This finding strongly supports mounting evidence that qSOFA should not replace general early-warning scores in risk-stratifying patients with suspected infection in high-resource hospitalized settings [20,23,24,26]. However, qSOFA may have utility in environments where calculation of complex scores is a challenge, such as outpatient clinics, emergency departments, or low-resource settings.
This study has several strengths. The analysis is based on a large dataset encompassing patients in non-ICU and ICU settings and includes both medical and surgical patient cohorts. The dataset was designed to address the study question and was further strengthened by the low incidence of missing data elements. The data have excellent external validity as evidenced by a hospital mortality rate of 1.7% in the non-ICU cohort and 16.4% in the ICU cohort that is highly consistent with other published reports [10,14]. This work benefitted from its use of a reproducible identification schema for suspected infection and use of similar methodology to the consensus paper and other major studies [7,10,14,15,23]. Hospital mortality was the primary outcome, but the study also measured the ability of scores to prognosticate other clinically relevant outcomes pertinent to individual patients, hospitals, and healthcare systems that have not been previously reported in other large studies of this kind.
Data were retrospectively collected for this analysis and were limited to a single academic public hospital in the United States. The majority of patients in this cohort were male and/or Caucasian. While this is consistent with the hospital's overall patient demographics and previously published studies from this center, the generalizability of the results to other hospitals and healthcare settings with different patient demographics is unclear [35]. Calculation of scores was based upon the most deranged physiologic and biochemical score components within the 24 hours preceding and 24 hours following inclusion in the study, consistent with other studies in the field [7,10,14,15,23]. Thus, these data may bias towards higher scores. Additionally, missing data regarding the respiratory and hepatic components of the SOFA score may have limited its prognostic accuracy for mortality outcomes in this analysis. While adjusted analyses were performed to demonstrate the additive power of sepsis scores to predict outcomes beyond baseline risk, the variables used to generate this model were limited in scope due to lack of administrative data pertaining to relevant comorbid conditions on admission. This study reports on the performance of scores in ICU and non-ICU settings but does not compare score performance across the ED, acute care service, and ICU due to its retrospective nature and limitations of the available administrative data. Finally, the majority of deaths in this study cohort occurred in the ICU and thus the present work may be underpowered to determine which score performs best in non-ICU environments.

Conclusions
In this large single-center retrospective cohort study of adult medical and surgical inpatients with suspected infection, multivariate prediction scores such as SOFA and NEWS demonstrated superior prognostic accuracy for hospital mortality, ICU transfer, and ICU LOS as compared to qSOFA and SIRS. These findings suggest that complex scores such as SOFA and NEWS may offer enhanced prognostic performance over simple sepsis scores such as qSOFA and SIRS in inpatient hospital settings where more complex scores can be readily calculated.

Supporting information
S1 Table. Missing score components in the final study cohort. Abbreviations: FiO2, fraction of inspired oxygen; GCS, Glasgow coma scale; ICU, intensive care unit; MAP, mean arterial pressure; NEWS, national early warning score; O2, oxygen; PaO2, partial pressure of arterial oxygen; qSOFA, quick sequential organ failure assessment; SBP, systolic blood pressure; SIRS, systemic inflammatory response syndrome; SOFA, sequential organ failure assessment; WBC, white blood cell count. (DOCX)

S2 Table. Crude AUROCs and comparisons for prediction of hospital mortality outcomes. Abbreviations: AUROC, area under the receiver operating characteristic curve; CI, confidence interval; ICU, intensive care unit; NEWS, national early warning score; qSOFA, quick sequential organ failure assessment; SIRS, systemic inflammatory response syndrome; SOFA, sequential organ failure assessment. N values correspond to the number of patients included in the analysis who were eligible to experience the outcome. (DOCX)

S3 Table. Crude AUROCs and comparisons for prediction of ICU transfer and length of stay outcomes. Abbreviations: AUROC, area under the receiver operating characteristic curve; CI, confidence interval; ICU, intensive care unit; NEWS, national early warning score; qSOFA, quick sequential organ failure assessment; SIRS, systemic inflammatory response syndrome; SOFA, sequential organ failure assessment. N values correspond to the number of patients included in the analysis who were eligible to experience the outcome. For ICU transfer, the reported n of 7,287 indicates the 7,193 non-ICU patients and an additional 94 patients who were in the ICU at time of inclusion but had been transferred to the ICU within the preceding 24 hours and met the definition for ICU transfer. (DOCX)

S1 Fig. Flow diagram of eligible patient population and exclusion criteria. Abbreviations: ICU, intensive care unit. 125,431 patient encounters were screened for eligibility. Following exclusion of patients <18 years of age, patients who were directly admitted to the hospital or were transferred from outside institutions, were admitted to inpatient psychiatric or rehabilitation services, were evaluated in the ED and discharged, or whose encounters did not meet criteria for suspected infection, 10,981 patients remained.