The use of early warning system scores in prehospital and emergency department settings to predict clinical deterioration: A systematic review and meta-analysis

Background: It is unclear which Early Warning System (EWS) score best predicts in-hospital deterioration of patients when applied in the Emergency Department (ED) or prehospital setting.
Methods: This systematic review (SR) and meta-analysis assessed the predictive abilities of five commonly used EWS scores (National Early Warning Score (NEWS) and its updated version NEWS2, Modified Early Warning Score (MEWS), Rapid Acute Physiological Score (RAPS), and Cardiac Arrest Risk Triage (CART)). Outcomes of interest included admission to intensive care unit (ICU), and 3-to-30-day mortality following hospital admission. Using DerSimonian and Laird random-effects models, pooled estimates were calculated according to the EWS score cut-off points, outcomes, and study setting. Risk of bias was evaluated using the Newcastle-Ottawa scale. Meta-regressions investigated between-study heterogeneity. Funnel plots tested for publication bias. The SR is registered in PROSPERO (CRD42020191254).
Results: Overall, 11,565 articles were identified, of which 20 were included. In the ED setting, MEWS and NEWS at cut-off points of 3, 4, or 6 had similar pooled diagnostic odds ratios (DOR) to predict 30-day mortality, ranging from 4.05 (95% Confidence Interval (CI) 2.35–6.99) to 6.48 (95% CI 1.83–22.89), p = 0.757. MEWS at a cut-off point ≥3 had a similar DOR when predicting ICU admission (5.54 (95% CI 2.02–15.21)). MEWS ≥5 and NEWS ≥7 had DORs of 3.05 (95% CI 2.00–4.65) and 4.74 (95% CI 4.08–5.50), respectively, when predicting 30-day mortality in patients presenting with sepsis in the ED. In the prehospital setting, the EWS scores significantly predicted 3-day mortality but failed to predict 30-day mortality.
Conclusion: EWS scores’ predictability of clinical deterioration is improved when the score is applied to patients treated in the hospital setting. However, the high thresholds used and the failure of the scores to predict 30-day mortality make them less suited for use in the prehospital setting.


Introduction
Initially used in the intensive care unit (ICU), Early Warning System (EWS) scores have been employed in multiple health care settings including hospital wards, the emergency department (ED), and pre-hospital community settings (1,2). The scores primarily aim to detect clinical deterioration in patients by tracking their vital signs, with high EWS scores triggering a response to prevent any potential clinical deterioration. Patients' vital signs commonly change before clinical deterioration (3,4), and if early and timely interventions are adequately performed, adverse outcomes may be prevented. The earliest EWS score was validated in 1981 for ICU patients only (5). Variations were developed over time to suit different in-hospital ward settings (3), with some being more specific to certain conditions such as blunt trauma or sepsis (6-9). However, the fundamentals of the scores have not changed. They are determined by measuring variations in vital signs (e.g., systolic blood pressure, oxygen saturation, temperature, heart rate, and Glasgow Coma Scale), of which one or all are assessed against predefined parameters to calculate an aggregated score. Other versions of the EWS scores incorporate advanced therapies, laboratory testing and patients' demographic information (10,11). The use of EWS scores constitutes a standardised practice across the UK in both pre-hospital and in-hospital settings (7), and in some parts of Australia, a few selected EWS scores are applied in the in-hospital setting (12).
Numerous systematic reviews and meta-analyses have identified the best-performing EWS scores in in-hospital settings (e.g., wards or the ED) (3,13-17); however, only a limited number of reviews have focused on pre-hospital settings (18,19). Available systematic reviews have demonstrated that EWS scores can potentially improve patient outcomes (11-15), but since several EWS scores are used across different settings, it is unknown which score should be used in the ED or pre-hospital setting to best predict clinical outcomes (20,21). Furthermore, it is not known which EWS scores and which cut-off points best predict outcomes such as short-term and long-term mortality or ICU admission (7-9).
This systematic review and meta-analysis aimed to estimate the pooled odds of predicting in-hospital deterioration, including short-term (≤ 3-day) and long-term (30-day) mortality, and ICU admission, stratified by the EWS score cut-off points as used in the ED and pre-hospital settings. Length of stay in hospital and cardiac or respiratory arrests were also investigated.

Methods
This systematic review examined the five most used EWS scores in either the ED or pre-hospital settings: the National Early Warning Score (NEWS) and its updated version the National Early Warning Score 2 (NEWS2), the Modified Early Warning Score (MEWS), the Rapid Acute Physiological Score (RAPS) and the Cardiac Arrest Risk Triage (CART) (13,15-18,22). These scores were selected since they rely solely on observations that are readily available to health care professionals in the pre-hospital setting and are easy to calculate and apply. This is important as expensive and time-consuming pathological and other complex testing is less commonly performed in the pre-hospital environment (23).
A PICO framework was used to inform the literature search strategy (Appendix 1).

Protocol And Registration
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was used to design and report this systematic review (24).
The protocol was registered with PROSPERO, the international prospective register of systematic reviews (registration number CRD42020191254).

Inclusion Criteria And Exclusion Criteria
Experimental, quasi-experimental or observational studies using EWS scores in either the ED or pre-hospital setting were eligible for inclusion. Studies were considered eligible if they reported on medical or trauma patients aged ≥ 14 years with study outcomes including in-hospital mortality up to 3 or 30 days, ICU admission, cardiac arrest/respiratory arrest, and length of stay. Publication language was restricted to English, but no restrictions were applied to year of publication.
Studies focusing on obstetric, maternal, or palliative care patients were excluded. Similarly, articles focusing only on patients experiencing sepsis or septic shock or Severe Acute Respiratory Syndrome Coronavirus 2 infections were excluded. Such patients were not representative of the general population treated in either setting. Articles focusing on rare or specific conditions (e.g., portal hypertension, traumatic brain injuries and splenic abscess) were excluded. Articles containing duplicate data were excluded.

Search Strategy
Four databases (CINAHL, Embase, PubMed/MEDLINE, and Web of Science) were systematically searched in April 2020 and updated in February 2021. Key search terms included pre-hospital/ambulance/paramedic, ED/emergency room and the selected EWS scores. All synonyms and MeSH terms were included in the searches (Appendix 2). The references of the identified articles were hand-searched for additional articles that could have been missed in the electronic searches.

Selection Process And Data Extraction
Three reviewers (GG, GM, CL) independently screened all articles based on title and abstract. In addition, reviewer GG screened all potential articles, which were cross-checked by CL (80%) and GM (20%). Any disagreements or conflicts were discussed to make the final decision for inclusion.
Data extracted included author name, year and country of publication, study setting (ED or pre-hospital), study outcomes, EWS score applied, cut-off points of the EWS scores, sample size, sensitivity and specificity of the EWS score to predict investigated study outcomes, mean or median age of study population, sex proportion, study design and study inclusion and exclusion criteria. The Newcastle-Ottawa Scale was used to assess the methodological quality of the included articles (23). The assessments were independently conducted by GG and SB, with conflicts resolved after discussion with co-authors.

Statistical analysis
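An odds ratio (OR) was computed as a summary measure of the predictive accuracy of each EWS score as OR = (TP × TN) / (FP × FN), in which TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively (25). The confidence interval (CI) of the OR was estimated on the log scale as $\exp\left(\ln(\mathrm{OR}) \pm Z_{\alpha/2} \times \mathrm{SE}(\ln(\mathrm{OR}))\right)$, with $\mathrm{SE}(\ln(\mathrm{OR})) = \sqrt{1/\mathrm{TP} + 1/\mathrm{TN} + 1/\mathrm{FP} + 1/\mathrm{FN}}$, where $Z_{\alpha/2}$ denotes the critical value of the normal distribution at α/2 (e.g., for a 95% CI, α is 0.05, and the critical value is 1.96).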
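The calculation described above can be illustrated with a short sketch. It assumes the conventional log-scale standard error for a 2×2 table, sqrt(1/TP + 1/TN + 1/FP + 1/FN); the function name and the example counts are hypothetical and are not taken from any included study.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Return (OR, lower, upper) for one 2x2 table.

    Assumes the conventional log-scale standard error
    sqrt(1/TP + 1/FP + 1/FN + 1/TN); z=1.96 gives a 95% CI.
    """
    if min(tp, fp, fn, tn) <= 0:
        # A zero cell makes the OR or its standard error undefined;
        # a continuity correction would be needed (not shown here).
        raise ValueError("all four cells must be positive")
    odds_ratio = (tp * tn) / (fp * fn)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
or_, ci_low, ci_high = diagnostic_odds_ratio(tp=40, fp=60, fn=10, tn=190)
print(f"OR = {or_:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Working on the log scale keeps the interval asymmetric around the OR and strictly positive, which is why the bounds are exponentiated at the end.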
To express the diagnostic accuracy of the EWS scores, the log ORs, together with their corresponding log standard errors, were meta-analysed using DerSimonian and Laird random-effects models (26). The analyses were conducted by type of EWS score and the cut-off points utilised, patient outcomes (short- and long-term mortality from hospital admission, and ICU admission) and study setting (ED or pre-hospital). This systematic review defined short-term mortality as death within 3 days from admission and long-term mortality as death within 30 days from admission. Between-study heterogeneity was estimated with the I² statistic. Meta-regressions were constructed to quantify the proportion of between-study variance explained by sample size, sex proportion, and age. Deeks' funnel plot asymmetry test was used to test for publication bias. Pooled ORs were compared after converting them to z scores. The p values were estimated using the normal distribution table. Sensitivity analyses were conducted by risk of bias.
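A minimal sketch of the pooling step follows, using the standard DerSimonian and Laird moment estimator of the between-study variance (tau²) and the I² statistic derived from Cochran's Q. The comparison function is an assumption: the text does not state how pooled ORs were converted to z scores, so the sketch uses the difference in pooled log ORs divided by the combined standard error. All function names and input values are hypothetical.

```python
import math
from statistics import NormalDist


def dersimonian_laird(log_ors, ses):
    """Pool study-level log ORs with a DerSimonian-Laird random-effects model."""
    k = len(log_ors)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching inputs")
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # moment estimate, truncated at zero
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"log_or": pooled, "se": se_pooled, "tau2": tau2, "i2": i2}


def compare_pooled(log_or_a, se_a, log_or_b, se_b):
    """Two-sided z test for the difference between two pooled log ORs.

    Assumed form of the comparison; treats the two estimates as independent.
    """
    z = (log_or_a - log_or_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical study-level inputs for illustration only.
result = dersimonian_laird([1.2, 1.8, 1.5], [0.30, 0.25, 0.40])
print(math.exp(result["log_or"]), result["i2"])
```

Exponentiating the pooled log OR and its interval bounds returns the estimate to the OR scale reported in the results.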

Results
The electronic database searches identified 11,565 potential references. After removing duplicates and excluding irrelevant articles based on the title and abstract, 260 articles were included in the full-text review. Of these, 15 articles with a total sample of 79,214 patients were meta-analysed (Fig. 1).

Risk of bias and quality assessment
Quality and risk of bias assessments of each included study are found in Appendix 3. Of all studies, two were rated poor (13.3%), and the remaining 13 (86.7%) were rated good. A sensitivity analysis was conducted with and without studies with high risk of bias (ROB) in the ED setting, as shown in Figures 2a and 2b.

Meta-analysis: ED setting
The pooled diagnostic ORs (DOR) of MEWS (cut-off points of ≥3 and ≥4) and NEWS (cut-off point ≥6) to predict 30-day mortality and of MEWS (cut-off point ≥3) to predict ICU admission were estimated ( Figure 2

Meta-analysis: Pre-hospital setting
As illustrated in Figure 3, NEWS2 was evaluated for short-term mortality with cut-off points at 5, 7 and 9.

Between-study variances
Studies reporting on 30-day mortality had moderate to high heterogeneity (I² > 70%) in both the ED and pre-hospital settings. Meta-regressions, including studies from the ED setting, were constructed to detect which known study variables contributed to the between-study differences. Of the study variables, only patients' age contributed to heterogeneity, explaining 92% of the between-study variance. In the pre-hospital setting, meta-regression could not be performed due to the limited number of studies investigating 30-day mortality.
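The proportion of between-study variance explained by a covariate is conventionally computed by comparing tau² from a model without covariates against the residual tau² from the meta-regression. The sketch below assumes that convention; the function name and the tau² values are hypothetical and were chosen only so that the arithmetic yields a figure of the same order as the one reported.

```python
def variance_explained(tau2_null, tau2_model):
    """Share of between-study variance accounted for by the covariates.

    tau2_null:  tau^2 from the random-effects model without covariates.
    tau2_model: residual tau^2 from the meta-regression.
    Truncated at zero because the residual estimate can exceed the null one.
    """
    if tau2_null <= 0:
        raise ValueError("no between-study variance to explain")
    return max(0.0, 1.0 - tau2_model / tau2_null)


# Hypothetical values, not taken from the review.
share = variance_explained(tau2_null=0.50, tau2_model=0.04)
print(f"{share:.0%} of between-study variance explained")
```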

Publication bias
Since studies often reported multiple results for different EWS scores, the highest and lowest reported ORs in each study were included in two separate Deeks' funnel plot asymmetry tests to detect any publication bias, as shown in Figures 4a and 4b. No evidence of publication bias was detected (p = 0.82 using the highest ORs and p = 0.44 using the lowest ORs).
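Deeks' test regresses the log diagnostic OR on the inverse square root of the effective sample size (ESS), weighting each study by its ESS; a slope that differs from zero suggests funnel plot asymmetry. The sketch below is a plain weighted least-squares implementation under that description. It returns the slope and its standard error rather than a p value, because the t distribution needed for the p value is not in the Python standard library. The function name and inputs are hypothetical.

```python
import math


def deeks_regression(log_dors, n_pos, n_neg):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    n_pos / n_neg: per-study counts of patients with and without the outcome.
    ESS = 4 * n_pos * n_neg / (n_pos + n_neg).
    """
    n = len(log_dors)
    if n < 3 or n != len(n_pos) or n != len(n_neg):
        raise ValueError("need at least three studies with matching inputs")
    ess = [4.0 * a * b / (a + b) for a, b in zip(n_pos, n_neg)]
    x = [1.0 / math.sqrt(e) for e in ess]
    sw = sum(ess)
    x_bar = sum(w * xi for w, xi in zip(ess, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(ess, log_dors)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(ess, x))
    if sxx == 0:
        raise ValueError("effective sample sizes must differ between studies")
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(ess, x, log_dors))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual sum of squares and slope standard error (n - 2 df).
    rss = sum(w * (yi - intercept - slope * xi) ** 2
              for w, xi, yi in zip(ess, x, log_dors))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return {"slope": slope, "intercept": intercept,
            "se_slope": se_slope, "df": n - 2}


# Hypothetical inputs for illustration only.
fit = deeks_regression([1.4, 1.9, 1.6, 1.7],
                       [50, 100, 200, 400], [150, 300, 600, 1200])
print(fit["slope"], fit["se_slope"])
```

The ratio slope / se_slope is then referred to a t distribution with n - 2 degrees of freedom, for example via a statistics package.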

Discussion
The use of EWS scores in the ED is well documented, while only limited studies have been published in the pre-hospital setting. This systematic review and meta-analysis explored the ability of different EWS scores, as utilised in the ED or pre-hospital setting, to predict up-to-3-day and 30-day mortality and admission to an ICU. The systematic review found that different cut-off points of different EWS scores applied in the ED had similar ability to predict clinical deterioration. However, in the pre-hospital setting, where relatively high cut-off points were utilised, the EWS scores only predicted short-term clinical deterioration and failed to predict 30-day mortality.
Similar to other studies, our systematic review demonstrated the ability of EWS scores to predict 30-day mortality when applied in the ED setting (41,42). However, unlike other studies that suggested optimal cut-off points of different EWS scores to predict clinical deterioration (43-47), this systematic review did not detect any significant variation in the predictability of different scores by different cut-off points when applied in the ED. The choice of cut-off points mostly depends on the severity of illness and the acuteness of the investigated condition. Patient presentation varies across different settings; in the pre-hospital setting, the general population attended by paramedics is less severely ill (with a considerable majority having non-urgent, low-acuity presentations) than patients who were sick enough to have been brought to the ED (48). In critically ill patients, lower cut-off points are able to predict clinical deterioration, which may indicate that the predictability of the score is outcome and patient population specific, as evidenced by this systematic review with both settings requiring different optimal thresholds. The cut-off points applied in the ED are typically lower than those used in the pre-hospital setting. The high cut-off points in the pre-hospital setting may imply that EWS scores are adequate to predict short-term clinical deterioration among the critically ill, as the higher EWS scores target more severely ill patients who are at high risk of clinical deterioration in the short term (19,49,50). Thus, the high thresholds used in the pre-hospital setting may be targeting the critically ill patients who will constitute a relatively small proportion of the overall pre-hospital patient population that paramedics treat.
The review conducted by Williams et al. on the use of EWS scores in the pre-hospital setting also suggests that critically ill patients are the best candidates for the use of pre-hospital EWS scores; the authors also argue that achieving an optimal EWS score is difficult due to the short duration of the interaction between paramedics and patients (19). The reporting of high cut-off points in the pre-hospital setting is also due to a trade-off between sensitivity and specificity. Lower cut-off points often result in poor sensitivity and specificity in the pre-hospital setting. Conversely, the cut-off points in the ED are often similar to the cut-off points used in in-hospital settings, suggesting that EWS scores can be compared between the ED and in-hospital wards, whereas this comparison becomes less valid when it is conducted against the pre-hospital setting (51). In our review, NEWS2 with a cut-off point of 5, 7 or 9 and NEWS with a cut-off point of 7 had similar predictability. This is supported by a study that compared NEWS and NEWS2 at the same threshold of 7 without detecting a significant difference between the two scores when predicting short-term mortality (3). The Royal College of Physicians in London argues that NEWS2 is superior to NEWS in predicting clinical deterioration (52,53); however, this was not supported by our findings. Similarly, Hodgson and colleagues demonstrated that NEWS2 did not outperform NEWS in predicting clinical deterioration in patients admitted to hospital with acute exacerbation of Chronic Obstructive Pulmonary Disease (53), which was one of the main reasons why oxygen saturation was added as an additional parameter in NEWS2. Based on the available studies and our systematic review, NEWS and NEWS2 had similar predictabilities.

Limitations
The number of articles included in this systematic review was limited due to the use of multiple EWS scores with different thresholds in different settings.
The results of the systematic review apply only to the most commonly used EWS scores assessed in this study. The analysis lacked power to assess medical versus trauma conditions separately. Patients' main complaints and diagnoses were not known and could not be accounted for in the systematic review. The number of studies reporting on cardiac and/or respiratory arrest and on length of stay was limited, so these outcomes could not be meta-analysed.

Conclusions
The accuracy and predictability of the EWS scores depend on numerous factors such as the outcome measure, population and setting being investigated.
In the ED setting, the patient population is by default more morbid than those managed by caregivers in the community. This, in turn, explains why in the ED, low cut-off points of such scoring systems predict clinical deterioration of patients. We report that different EWS cut-off points in the ED have similar predictability.
Studies using EWS scores in the pre-hospital setting utilised relatively high cut-off points. This may indicate that early warning scoring systems may be less applicable for the general population treated by paramedics in the pre-hospital setting. These scoring systems may only be suited for critically ill patients treated in the pre-hospital setting. Our findings suggest that EWS scores applied in the pre-hospital setting cannot accurately predict long-term events such as 30-day mortality. EWS scores used in the pre-hospital setting can predict immediate outcomes when applied to a relatively sicker patient population compared to the general population seen by paramedics.

Declarations
Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Figure 3. Meta-analysis result for pre-hospital setting
Figure 4. (a) Deeks' funnel plot testing for publication bias using the highest OR; (b) Deeks' funnel plot testing for publication bias using the lowest OR