Comparison of CATs, CURB-65 and PMEWS as Triage Tools in Pandemic Influenza Admissions to UK Hospitals: Case Control Analysis Using Retrospective Data

Triage tools have an important role in pandemics to identify those most likely to benefit from higher levels of care. We compared Community Assessment Tools (CATs), the CURB-65 score and the Pandemic Medical Early Warning Score (PMEWS) as predictors of higher levels of care (high dependency care, Level 2; or intensive care, Level 3) and/or death in patients assessed at or shortly after admission to hospital with A/H1N1 2009 pandemic influenza. This was a case-control analysis using retrospectively collected data from the FLU-CIN cohort (1040 adults, 480 children) with PCR-confirmed A/H1N1 2009 influenza. Areas under receiver operating characteristic curves (AUROC), sensitivity, specificity, positive predictive values and negative predictive values were calculated. CATs best predicted Level 2/3 admissions in both adults [AUROC (95% CI): CATs 0.77 (0.73, 0.80); CURB-65 0.68 (0.64, 0.72); PMEWS 0.68 (0.64, 0.73), p<0.001] and children [AUROC: CATs 0.74 (0.68, 0.80); CURB-65 0.52 (0.46, 0.59); PMEWS 0.69 (0.62, 0.75), p<0.001]. CURB-65 and CATs were similar in predicting death in adults, with both performing better than PMEWS, and CATs best predicted death in children. CATs were the best predictor of Level 2/3 care and/or death for both adults and children. CATs are potentially useful triage tools for predicting need for higher levels of care and/or mortality in patients of all ages.


Introduction
Triage tools identifying need for higher levels of care and risk of severe outcome have an important role in pandemic situations where secondary care capacity may be insufficient to meet demand [1]. The time available for clinical decision making may be limited by workload pressures, and healthcare workers unfamiliar with clinical assessment and admission decisions may be asked to fulfil 'gatekeeper' roles. The CURB-65 score is a validated predictor of 30-day mortality from community acquired pneumonia in adults but was never intended for use in children [2,3]. The CURB-65 score does not perform as well in predicting need for higher levels of care and was not designed to predict mortality from non-pneumonic presentations [4,5]. Challen et al proposed the Pandemic Medical Early Warning Score (PMEWS) as a clinical triage tool to aid hospital admission decisions for adults in a pandemic situation [6]. They validated PMEWS in adults presenting to hospital with community acquired pneumonia and found that it was better than the CURB-65 score for predicting need for admission and higher levels of care but had limited ability to predict mortality.
In 2009, the Department of Health England published a package of care that included Community Assessment Tools (CATs) and patient pathways for use by the NHS in a severe pandemic event [7]. CATs were developed to help non-specialist front-line staff identify which sick children and adults are most likely to benefit from interventions and levels of care only available in hospitals when resources are limited. CATs use six objective criteria and one subjective criterion based on simple clinical assessment. Meeting any CATs criterion warrants referral and admission to hospital. The criteria are: A) severe respiratory distress, B) increased respiratory rate, C) oxygen saturation ≤92% on pulse oximetry breathing air, or on oxygen, D) respiratory exhaustion, E) severe dehydration or shock, F) altered consciousness level and G) causing other clinical concern.
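The referral rule above can be sketched in Python. This is a minimal illustration only, assuming each criterion has already been judged against the age-appropriate definitions (which are not reproduced here); the class and function names are hypothetical, and unrecorded criteria are treated as absent, mirroring the coding rule described under Limitations.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CATsAssessment:
    """Hypothetical record of the seven CATs criteria (A-G) for one patient.

    Each flag is True (criterion met), False (not met) or None (not recorded).
    The age-specific thresholds behind each flag are NOT encoded here.
    """
    severe_respiratory_distress: Optional[bool] = None  # A
    increased_respiratory_rate: Optional[bool] = None   # B
    low_oxygen_saturation: Optional[bool] = None        # C: SpO2 <= 92% in air or on oxygen
    respiratory_exhaustion: Optional[bool] = None       # D
    severe_dehydration_or_shock: Optional[bool] = None  # E
    altered_consciousness: Optional[bool] = None        # F
    other_clinical_concern: Optional[bool] = None       # G: subjective, at most one point


def cats_score(assessment: CATsAssessment) -> int:
    """One point per criterion met (range 0-7); None (not recorded) counts as absent."""
    return sum(1 for f in fields(assessment) if getattr(assessment, f.name) is True)


def cats_refer(assessment: CATsAssessment) -> bool:
    """Meeting any single criterion warrants referral and admission."""
    return cats_score(assessment) >= 1


patient = CATsAssessment(increased_respiratory_rate=True, low_oxygen_saturation=True)
print(cats_score(patient), cats_refer(patient))  # 2 True
```

Holding one boolean per criterion makes the "maximum of one point" for other clinical concern (G) automatic, and the count of criteria met is the CATs score used in the Methods.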
While criterion fields are common to adult and paediatric CATs, the abnormal physiological thresholds and clinical signs are age-appropriate. Like PMEWS, CATs require no laboratory investigation to complete the assessment. CATs were only intended for use "during severe and exceptional circumstances when surge demand for healthcare services leads to a need for strict triage" and, as such, were not deployed during the 2009/10 pandemic.
Goodacre and colleagues (2010) evaluated the discriminatory value of the CURB-65 score, PMEWS and CATs for predicting severe illness or mortality in patients with suspected pandemic influenza, but were unable to draw any conclusions regarding their clinical utility in a pandemic situation because of insufficient case numbers, especially of adults, and a low incidence of severe outcomes [8]. We aimed to use data from the much larger Influenza Clinical Information Network (FLU-CIN) cohort to compare the clinical validity and utility of CATs, CURB-65 and PMEWS as predictors of higher levels of care, in-patient mortality and combined severe outcome in pandemic influenza. The details of data collection and the findings have been described elsewhere [9]. A/H1N1 2009 influenza infection was diagnosed by a positive reverse transcription polymerase chain reaction (RT-PCR) result from respiratory samples obtained during the admission episode. Data were gathered from routine case notes using the first recorded routine clinical assessment on or shortly after admission. A case-control analysis of the predictive ability of CATs, CURB-65 and PMEWS, using retrospective data, was conducted on the full FLU-CIN cohort. Analyses were conducted by age group. A complete-case analysis was used.

Methods
CATs scores were calculated by awarding a single point for each of the following, assessed on or shortly after admission to hospital: severe respiratory distress, increased respiratory rate, oxygen saturation ≤92% (in air or on supplemental oxygen), respiratory exhaustion, severe clinical dehydration, altered consciousness, and a maximum of one point for causing any other clinical concern to the attending clinicians. The definitions for CATs criteria differ for children and adults and are provided in Appendix S1. CURB-65 scores were calculated by awarding one point for each of the following: confusion, urea >7 mmol/l, respiratory rate ≥30/minute, low systolic (<90 mmHg) or diastolic (≤60 mmHg) blood pressure, and age ≥65 years [2]. PMEWS scores were calculated using the algorithm described by Challen et al., with points allocated on a weighted basis for varying values of the following indicators: respiratory rate, oxygen saturation, heart rate, systolic blood pressure, temperature and neurological signs (level of alertness). In addition, a point was awarded for age ≥65 years, social isolation, chronic disease and performance status of limited activity (modified Karnofsky >2) [6].
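The CURB-65 rule above translates directly into code. This sketch assumes urea in mmol/l and blood pressure in mmHg, and the function name is hypothetical. PMEWS is not sketched because its weighted matrix is given only in the cited reference.

```python
def curb65_score(confusion: bool, urea_mmol_l: float, resp_rate_per_min: float,
                 systolic_mmhg: float, diastolic_mmhg: float, age_years: float) -> int:
    """Return the CURB-65 score (0-5) using the cut-offs listed in the Methods."""
    criteria = [
        confusion,                                   # C: confusion
        urea_mmol_l > 7,                             # U: urea > 7 mmol/l
        resp_rate_per_min >= 30,                     # R: respiratory rate >= 30/min
        systolic_mmhg < 90 or diastolic_mmhg <= 60,  # B: low systolic or diastolic BP
        age_years >= 65,                             # 65: age >= 65 years
    ]
    return sum(bool(c) for c in criteria)


# Urea 8.2 mmol/l, RR 32/min, BP 110/70 mmHg, age 70, no confusion
print(curb65_score(False, 8.2, 32, 110, 70, 70))  # 3
```

Note the mixed strict and non-strict inequalities: a urea of exactly 7 mmol/l scores nothing, whereas a diastolic pressure of exactly 60 mmHg scores a point.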
The discriminatory value of the three tools was initially compared using logistic regression to assess whether patients with each outcome (admission to higher levels of care, i.e. high dependency care, Level 2, or intensive care, Level 3; death; or severe outcome as a whole, a combined measure indicating either Level 2/3 admission or death) were more likely to have higher scores than controls. Each scoring system was included in a univariable logistic model as a continuous variable, on the assumption that the log-odds of the outcome would follow a linear trend across scores.
Results were presented as unadjusted odds ratios (ORs) and 95 per cent confidence intervals (95% CI). Each OR can therefore be interpreted as the factor by which the odds of a given clinical outcome increase for every one-unit increase on the scoring scale.
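To make the per-unit interpretation concrete, the following sketch fits a univariable logistic model, logit P(outcome) = b0 + b1 × score, by Newton-Raphson in plain Python and reports exp(b1) with a Wald 95% CI. It is an illustration only, not the Stata procedure used in the study; the function names are hypothetical, and it assumes there is no perfect separation between cases and controls.

```python
import math


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _information(b0: float, b1: float, xs):
    """Entries of the 2x2 Fisher information matrix X'WX, with w = p(1 - p)."""
    h00 = h01 = h11 = 0.0
    for x in xs:
        p = _sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return h00, h01, h11


def fit_univariable_logit(xs, ys, max_iter=50, tol=1e-10):
    """Return (OR per one-unit increase in score, (lower, upper) Wald 95% CI)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Gradient of the log-likelihood
        residuals = [y - _sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals)
        g1 = sum(r * x for r, x in zip(residuals, xs))
        h00, h01, h11 = _information(b0, b1, xs)
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information times gradient, written out for 2x2
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Standard error of b1 from the inverse information at the final estimate
    h00, h01, h11 = _information(b0, b1, xs)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return math.exp(b1), (math.exp(b1 - 1.96 * se_b1), math.exp(b1 + 1.96 * se_b1))


# Toy data: 1 of 4 patients with score 0 and 3 of 4 with score 1 have the outcome
scores = [0, 0, 0, 0, 1, 1, 1, 1]
outcomes = [1, 0, 0, 0, 1, 1, 1, 0]
odds_ratio, ci = fit_univariable_logit(scores, outcomes)
print(round(odds_ratio, 3))  # 9.0
```

Because the score enters the model linearly, a k-point increase multiplies the odds by OR to the power k; for example, an OR of 2 per point implies four-fold odds for a two-point difference.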
The three tools were then compared on their ability to predict admission to higher levels of care, death, or severe outcome (combined higher level of care and/or death), using comparisons of the area under the receiver operating characteristic (ROC) curve (AUROC) with 95% confidence intervals. Calibration of the logistic regression models was tested using the Hosmer-Lemeshow goodness-of-fit test.
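For a single score, the AUROC equals the probability that a randomly chosen patient with the outcome scores higher than a randomly chosen patient without it, counting ties as one half (the Mann-Whitney formulation). A minimal sketch with a hypothetical function name follows; it does not compute confidence intervals or compare correlated AUROCs, both of which the study reports.

```python
def auroc(scores, outcomes):
    """Empirical AUROC via pairwise comparison of cases and controls (ties = 0.5)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


print(auroc([0, 1, 1, 2, 3], [0, 0, 1, 0, 1]))  # 0.75
```

With integer scores such as CATs or CURB-65, ties between cases and controls are common, so the half-credit rule matters; an AUROC of 0.5 corresponds to chance-level discrimination.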
The sensitivity (the proportion of patients with the outcome who are correctly identified by the test), specificity (the proportion of patients without the outcome who are correctly identified by the test), positive predictive value (PPV; the proportion of test-positive patients who actually had the outcome) and negative predictive value (NPV; the proportion of test-negative patients who did not have the outcome) were calculated for each of the tools using various score thresholds. All analyses were carried out using Stata version 11.0 (StataCorp, 2009).
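The four measures follow from a 2×2 table once a cut-off is chosen. This sketch assumes a patient is "test positive" when the score is at or above the threshold; for integer scores, a ">k" cut-off such as PMEWS >9 corresponds to a threshold of k+1. The function name is hypothetical.

```python
def predictive_values(scores, outcomes, threshold):
    """Sensitivity, specificity, PPV and NPV for the rule 'score >= threshold'."""
    tp = fp = tn = fn = 0
    for score, outcome in zip(scores, outcomes):
        test_positive = score >= threshold
        if test_positive and outcome:
            tp += 1
        elif test_positive:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined (NaN) when the denominator is empty
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # cases correctly flagged
        "specificity": ratio(tn, tn + fp),  # controls correctly not flagged
        "ppv": ratio(tp, tp + fp),          # flagged patients who had the outcome
        "npv": ratio(tn, tn + fn),          # unflagged patients without the outcome
    }


# sensitivity 1.0, specificity 0.67, PPV 0.75, NPV 1.0
print(predictive_values([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1], threshold=2))
```

Repeating the calculation over every possible threshold yields the sensitivity/specificity pairs that trace out the ROC curve.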
Before commencement, FLU-CIN procedures were reviewed by the Ethics and Confidentiality Committee of the National Information Governance Board for Health and Social Care in England and approved for collection, storage and use of personal data for surveillance purposes.

Results
The study sample comprised 1040 (68.4%) adults and 480 (31.6%) children (age <16 years) admitted to hospital in two pandemic waves: Spring/Summer 2009 (n = 601) and Autumn/Winter 2009/10 (n = 919). The median age was 26 years (interquartile range 9 to 44 years). There were 800 (52.6%) females, of whom 83 aged 14 to 44 years were pregnant (20.8%). The clinical characteristics of the first-wave cohort have been described previously [9]. Tables 1 and 2 present the distribution of CATs, CURB-65 and PMEWS scores by admission to higher levels of care and by mortality, with unadjusted ORs and 95% CIs. For each of the triage tools, adult patients with any severe outcome (higher level of care and/or in-patient death) were more likely to have higher scores than controls. In children, both CATs and PMEWS scores were more likely to be higher in patients with severe outcomes.
Calibration, i.e. the agreement between observed and expected values (goodness-of-fit) of the logistic regression models, was tested using the Hosmer-Lemeshow goodness-of-fit test. In adults, for the outcomes 'Level 2 or 3 admission' and 'death', the logistic regression models for all three triage tools (CATs, CURB-65 and PMEWS) showed good calibration. For combined severe outcomes (Level 2/3 admission or death), only CATs and CURB-65 demonstrated good calibration between observed and expected values; PMEWS had a poor fit (p = 0.0453). In children, CATs was the only triage tool for which the logistic regression model showed good calibration for all three outcomes. Both CURB-65 and PMEWS showed good calibration for 'death' but poor calibration for 'Level 2 or 3 admission' (p = 0.0204 and p = 0.0176 respectively).
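The Hosmer-Lemeshow statistic groups patients by predicted probability, compares observed with expected numbers of events in each group, and refers the sum to a chi-square distribution with (groups - 2) degrees of freedom; a small p-value, as for PMEWS above, indicates poor fit. The sketch below is a simplified version under stated assumptions: groups are formed by splitting the sorted predictions into near-equal-sized chunks (tied predictions may be split across groups, unlike some software implementations), and only an even number of groups is supported so that the chi-square tail probability has a closed form. Function names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for chi-square with even df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(pred_probs, outcomes, groups=10):
    """Return (statistic, df, p-value) for a simplified Hosmer-Lemeshow test."""
    n = len(pred_probs)
    if groups % 2 or n < groups:
        raise ValueError("need an even number of groups and at least one patient per group")
    # Stable sort on predicted probability only (ties keep their input order)
    ordered = sorted(zip(pred_probs, outcomes), key=lambda pair: pair[0])
    stat = 0.0
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        # Events and non-events terms combined: (O - E)^2 / (E * (1 - E / n_g))
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Perfect agreement: every group of two has one event and one expected event
print(hosmer_lemeshow([0.5] * 20, [0, 1] * 10))  # (0.0, 8, 1.0)
```

With discrete scores such as CATs (0 to 7), many patients share the same predicted probability, so the way ties are assigned to groups can change the statistic; statistical packages handle this in their own ways.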
The ROC curves and AUROC values comparing the predictive value of the three clinical triage tools are described in Figure 1 … score >9, both significantly better than CURB-65. In children, a PMEWS score >9 was the best predictor of mortality, performing marginally better than a CATs score >3, both significantly better than CURB-65.

Discussion
There has been only one head-to-head validation of the performance of CURB-65, PMEWS and CATs during the 2009 pandemic period [8]. Our study has the advantages of large size (n = 1520), confirmation of cases by standardised PCR criteria, and relatively few missing data. Reported cases were followed up without selection, and the acquisition of cases closely mirrored the national epidemic curve geographically and temporally [10].
Overall, 16.5% of patients required high dependency or intensive care and 5.3% died.
Two characteristics are crucial when evaluating a clinical prediction test or algorithm: clinical validity and clinical utility. Simon defines clinical validity as the ability of the test result to correlate with a clinical end point or characteristic [11]. Our results show that for each of the three clinical triage tools, a higher score is associated with a greater likelihood of severe clinical outcomes in adults, indicating that all three demonstrate clinical validity. In children, however, only CATs and PMEWS demonstrate this linear relationship. The ROC curves and AUROC analysis show that, in terms of overall performance, CATs are significantly better than CURB-65 or PMEWS as a predictor of combined severe outcomes across all age groups. CURB-65 and CATs are similar in their ability to predict mortality in adults, but CATs perform better in predicting admission to higher levels of care. The latter outcome is arguably more meaningful for clinicians, as the primary aim of triage tools is to identify patients who are most likely to benefit from higher levels of care rather than those most likely to die.
The CURB-65 score is validated only for use in adults with community acquired pneumonia to predict 30-day mortality [12,13]. CURB-65 was not developed for use in non-pneumonic respiratory tract infections nor to predict need for intensive care admission. Results from the current study reinforce these points.
A predictive test has clinical utility only if use of the test results in improved outcomes for patients [11]. Although clinical utility can only be fully evaluated in a separate prospective cohort, the first step towards this is to determine a suitable threshold value that can discriminate between alternative clinical outcomes. Ideally, a good prediction test should have both high sensitivity and high specificity, but there is usually a trade-off between the two. The AUROC provides a combined measure of all the sensitivity/specificity pairs resulting from varying the decision threshold over the entire range of results [14]. In some scenarios, however, it is very important not to miss a 'diagnosis', and one may favour higher sensitivity over specificity, e.g. for a disease with high mortality for which an effective treatment is available [15]. Lower PMEWS thresholds (cut-off values of 1, 2, 3 or 4) demonstrated high sensitivity (77 to 98%), but in a pandemic situation where surge capacity is reached, these low thresholds would probably not offer sufficient discrimination for healthcare prioritisation. Positive predictive values across the various thresholds were generally low for all scoring systems, but this may well reflect the general mildness of 2009 pandemic influenza and the associated low incidence of severe outcomes. As such, these measures may not predict the performance of these tools during a more severe influenza pandemic or other highly pathogenic pandemic.
Another aspect of clinical utility is the ease of applicability of the test [14]. CURB-65 scores require serum urea measurements, which are not easily or rapidly available in community settings, and the PMEWS algorithm uses a complex weighted matrix to calculate scores [6]. CATs, on the other hand, rely on clinical indicators that can be easily and immediately assessed in community settings and can be repeated and compared in any setting. The sensitivity analysis restricted to adults with proven A/H1N1 2009 infection and a diagnosis of community acquired pneumonia validated by reported radiographs shows that in the setting of triage for this pandemic event (and only in this setting), this group of adults would not have been disadvantaged if they had been assessed using the adult CATs.
Table 4. Predictive values of CATs, CURB-65 and PMEWS scores for predicting severe outcomes in children (<16 years, n = 480).
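The point that low PPVs may partly reflect the low incidence of severe outcomes follows from Bayes' theorem: for fixed sensitivity and specificity, PPV falls as prevalence falls. The sketch below uses hypothetical sensitivity and specificity values of 0.8 (not taken from the study's tables); the prevalences 0.053 and 0.165 are the observed proportions of this cohort who died or required Level 2/3 care, and 0.30 is an arbitrary illustration of a more severe pandemic.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical test with sensitivity = specificity = 0.8:
# PPV rises from about 0.18 to 0.63 while NPV falls from about 0.99 to 0.90
for prevalence in (0.053, 0.165, 0.30):
    print(prevalence, round(ppv(0.8, 0.8, prevalence), 2), round(npv(0.8, 0.8, prevalence), 2))
```

The same tool can therefore show much higher PPVs in a more severe pandemic without any change in its sensitivity or specificity.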
This study shows that, on the basis of AUROC values, a CATs score ≥3 offers the best predictive value for Level 2/3 admissions and death in adults, whether these are considered as independent or combined outcomes. In children, a CATs score ≥3 is the best predictor of need for higher levels of care and of combined severe outcome, while a PMEWS score >9 was marginally the better predictor of mortality, followed closely by a CATs score ≥3. However, as the 95% CIs for the two AUROCs overlap, a CATs score ≥3 would offer a reasonable substitute, given its better overall performance across age groups for predicting higher levels of care and combined severe outcomes.
A CATs score ≥3 could therefore be used to fast-track patients of any age to critical care earlier, in the hope that their survival will improve. In a pandemic situation, when critical care is overburdened, clinical decision-makers may face very difficult ethical dilemmas concerning access to critical care. CATs allow both children and adults to be triaged within the same conceptual framework. This will be important if scarce resources are to be shared across wider age groups than would occur under normal conditions. The use of CATs scores may help to ensure that treatment access is determined fairly, using an objective measure of the likelihood of benefit from such care. The ethical dilemmas arising in this situation have been considered elsewhere [1].
Appropriate use of triage tools should expedite referral to hospital and, where scores are high, prompt consideration of admission to Level 2/3 care. This may be associated with improved patient outcomes. A study using the FLU-CIN cohort found that delayed admission to hospital (≥4 days after symptom onset) was significantly associated with increased likelihood of admission to critical care and death [16].
This study confirms the lack of effectiveness of the CURB-65 score as a triage tool for children during an influenza pandemic. The AUROC values for CURB-65 scores in children all approximate 0.5, not significantly different from chance. CURB-65 should not be considered for use in this setting, and probably not in any setting involving children.
The validity of the CURB-65 score for predicting mortality in adults with A/H1N1 2009 infection, both with and without radiograph-validated pneumonia, is confirmed. Limited access to laboratory and radiological investigations during a severe pandemic may restrict the utility of this tool.
Ideally, the clinical validity and utility of triage tools should be studied prospectively in parallel in a community cohort of pandemic influenza patients, to establish whether they can be used by general practitioners to decide which patients could benefit from hospitalisation.

Limitations
This was a case-control analysis using retrospectively collected data derived from physicians' first routine clinical assessment of patients during a pandemic event. By design, it is not possible to assess intra-observer agreement, inter-observer agreement or ability to detect change.
A potential limitation of this study relates to possible missing data for some criteria. This applies in particular to criteria that depend upon clinicians routinely recording the presence or absence of a finding such as "capillary refill time >2 seconds or other evidence of shock". As this is a secondary analysis based on pragmatic recording of routine clinical assessments, the underpinning assumption is that the data recorded on criteria are reasonably complete; however, there is no way to verify this. By default, some missing data will be incorrectly attributed to the control group in each analysis: where a criterion is not recorded as being present, it is assumed to be absent. Attempts were made to overcome this by applying criterion definitions to clinical data in other sections to validate and, if necessary, update variable values. Using this approach, we were able to impute 20-35% of data values that would otherwise have been missing. This limitation is common to the whole data set, reflects the reality of clinical practice, and does not preclude fair comparison of the validity and utility of the three tools.
A further possible limitation is our use of a complete-case analysis approach. This could bias our results if the data are not 'missing completely at random' (MCAR). Multiple imputation is often recommended, but it is still based on the assumption that every subject in a randomly chosen sample can be replaced by a new subject randomly chosen from the same source population as the original subject, without compromising the conclusions [17]. However, given that the three tools share some variables in their construction (particularly those with missing values), one could still argue that any bias would be non-differential, so our comparison still stands.
This study does not include comparative assessment of the triage tools in the community.The validity and utility of triage tools in the community remains untested.
Morbidity and mortality rates were low during this event compared with some previous influenza pandemics, and the use of antiviral therapy was generally low in our cohort despite it being widely available at the time. A more severe pandemic may be associated with greater acceptance of antiviral therapy, and this may affect the need for higher levels of care and the risk of death.

Generalisability
CATs and PMEWS were developed for use during pandemic events, and their criteria address the most likely modes of critical illness arising from influenza or its complications. Both were also designed to identify sick patients most likely to benefit from higher levels of care because of other illnesses that are indistinguishable from influenza-like illness at presentation. CATs may have value in other scenarios where high-bar triage is required for both adults and children, such as other severe acute respiratory pandemic events and possibly some mass casualty events.

Conclusions
This study shows that CATs appear better suited as a predictive tool for severe outcomes in pandemic influenza than the CURB-65 score and PMEWS. We propose a CATs score ≥3 as a decision threshold prompting consideration of admission to higher levels of care. This was a retrospective study, and the validity and utility of CATs need to be assessed in a separate prospective cohort that includes triage in the community. Conducting this study prospectively in a community cohort linked to hospital outcome during a future pandemic would also enable researchers to assess and compare the validity and utility of CATs and other triage tools in relation to hospital admission. Since pandemics are unpredictable and infrequent, limited but potentially useful information could be gained from a prospective evaluation during seasonal influenza periods.
FLU-CIN was an 'emergency' surveillance network established by the Department of Health England. FLU-CIN used a purposive sampling frame based on 13 sentinel hospitals situated in five clinical 'hubs' in Nottingham, Leicester, London, Sheffield and Liverpool, with contributions from a further 45 non-sentinel hospitals in England and 17 in Scotland, Wales and Northern Ireland. Between April 2009 and January 2010, clinical, epidemiological and outcome data were collected on 1520 patients (800 female, 480 children <16 years) admitted to participating UK hospitals with confirmed A/H1N1 2009 influenza infection.