The Predictive Value of the NICE “Red Traffic Lights” in Acutely Ill Children

  • Evelien Kerkhof, 
  • Monica Lakhanpaul, 
  • Samiran Ray, 
  • Jan Y. Verbakel, 
  • Ann Van den Bruel, 
  • Matthew Thompson, 
  • Marjolein Y. Berger, 
  • Henriette A. Moll, 
  • Rianne Oostenbrink, 
  • for the European Research Network on recognising serious InfEctions (ERNIE) members



Early recognition and treatment of febrile children with serious infections (SI) improves prognosis; however, early detection can be difficult. We aimed to validate the predictive rule-in value of the National Institute for Health and Clinical Excellence (NICE) most severe alarming signs or symptoms to identify SI in children.

Design, Setting and Participants

The 16 most severe (“red”) features of the NICE traffic light system were validated in seven different primary care and emergency department settings, including 6,260 children presenting with acute illness.

Main Outcome Measures

We focussed on the individual predictive value of single red features for SI and their combinations. Results were presented as positive likelihood ratios, sensitivities and specificities. We categorised “general” and “disease-specific” red features. Changes in pre-test probability versus post-test probability for SI were visualised in Fagan nomograms.


Almost all red features had rule-in value for SI, but only four individual red features substantially raised the probability of SI in more than one dataset: “does not wake/stay awake”, “reduced skin turgor”, “non-blanching rash”, and “focal neurological signs”. The presence of ≥3 red features improved prediction of SI but still lacked strong rule-in value as likelihood ratios were below 5.


The rule-in value of the most severe alarming signs or symptoms of the NICE traffic light system for identifying children with SI was limited, even when multiple red features were present. Our study highlights the importance of assessing the predictive value of alarming signs in clinical guidelines prior to widespread implementation in routine practice.


Fever is one of the most common symptoms among children presenting to ambulatory care.[1]–[3] The majority of children presenting with an acute illness to ambulatory care will have self-limiting viral infections, with only a small proportion having a serious infection (SI).[1], [4]–[6] Early recognition and treatment of children with SI are related to better prognosis;[7], [8] however, identification of SI at first presentation can be difficult.

The National Institute for Health and Clinical Excellence (NICE) 2013 guideline for the management of children with feverish illness provides comprehensive guidance on the assessment, investigation and management of children presenting at different settings, including primary care and pediatric specialty settings.[6], [9] One of the key elements of the guideline is a “traffic light” system for the diagnostic assessment of children under five years of age presenting with a feverish illness. This evidence- and consensus-based system includes clinical features identified from existing scoring systems for acutely ill children,[10]–[13] and disease-specific signs and symptoms. Children with the most alarming (or “red”) features are considered at higher risk of SI, for whom subsequent management includes invasive investigations, treatment, and hospital admission.

As one of the few evidence-based guidelines for children with fever [14], [15] and the only one covering both primary and secondary care, the NICE febrile child guideline has been implemented in many settings, not only in the United Kingdom but also in other countries. Recently, two studies reported low specificities for the approach in which any abnormal amber or red feature would indicate possible SI.[16], [17] This could be due to the inclusion of amber features, whose association with SI may be weaker.

In this study we aimed to determine the predictive ("rule-in") value of the red features of the NICE traffic light system, both for the individual red features and for their combinations, for identifying children with SI in various acute pediatric settings in Europe.


Identification of datasets

We used data on seven independent cohorts [4], [18]–[23] collected by collaborators of the European Research Network on recognising serious InfEctions (ERNIE) group.[24] Data were prospectively collected at first contact using standardised (site-specific) documentation of patient characteristics, except for Monteny et al [19] where data were collected using structured clinical proformas separate from the consultation. All datasets were cohort studies of children in various age ranges (0–16 years), presenting to ambulatory care settings (i.e. general or family practice, pediatric outpatient clinic, pediatric assessment unit or emergency department) with an acute illness or infection.

Two datasets based on primary care settings were considered as low prevalence settings of SI (<5%) and five datasets based on emergency care settings as high prevalence settings (>5%).[25] More details on the original cohorts have been published elsewhere ([4], [18]–[23]).

Ethical approval

This research conforms to the Helsinki Declaration and to local legislation. The original study authors have all agreed to share their data, and had obtained ethical approval from their local research ethics committees for the initial data collection, prior to this study.

Processing of included datasets

Key characteristics of each dataset are shown in table 1. We selected children under the age of five years with an acute illness based on general symptoms [4], [21], [22] or specifically on the presence of fever [18]–[20], [23], as this is the target group of the NICE guideline (table 1).

Table 1. Characteristics of datasets with children <5 years of age suspected of acute illness*.

The NICE traffic light system includes 16 red features, which are categorised into 5 main domains: Colour (1 red feature), Activity (4 red features), Respiratory (3 red features), Hydration (1 red feature), and Other (7 red features).[6], [9] When study variables were not entirely identical to the red features in the NICE febrile child guideline, we identified proxies where possible. Identification and handling of variables have been described earlier;[17] a full list of all approximations is given in table S1. When a red feature was not recorded in the dataset and no suitable proxy was identified, this item was excluded from that specific dataset. Table S2 outlines the unrecorded and missing data from each dataset separately.

Missing values were not imputed because the necessary missing-at-random assumption was likely to be incorrect. We considered red features that were “not documented” in individual patient records as “absent”, provided that the red feature or its proxy was recorded in that particular dataset.[17]
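The handling rule above can be illustrated with a minimal sketch. The feature names, the record layout (None standing for "not documented") and the function name are hypothetical and chosen only for illustration; they are not taken from the study datasets or the original analysis.

```python
def count_red_features(record, recorded_features):
    """Count red features scored as present for one child.

    record            -- dict: feature name -> True, False, or None (not documented)
    recorded_features -- set of features (or proxies) recorded in this dataset
    """
    count = 0
    for feature in recorded_features:
        # "Not documented" (None) or a missing key is treated as absent,
        # but only for features that the dataset recorded at all.
        if record.get(feature) is True:
            count += 1
    # Features outside recorded_features are never consulted, which mirrors
    # excluding an unrecorded item from that specific dataset.
    return count


# Hypothetical dataset that recorded two features but not "bulging_fontanelle".
recorded = {"grunting", "non_blanching_rash"}
child = {"grunting": True, "non_blanching_rash": None, "bulging_fontanelle": True}
n_red = count_red_features(child, recorded)
```

In this sketch the undocumented rash counts as absent and the unrecorded fontanelle item is ignored, so only one red feature is counted for this hypothetical child.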

The translation, recoding and data-checking were performed by two authors (EK, JV) and the results of each step were discussed with all primary study authors.[17]

Outcome measures

Serious infections (SI) were defined as sepsis (including bacteremia), meningitis, pneumonia, osteomyelitis, cellulitis, and complicated urinary tract infections.[25] Final diagnoses of SI were not based on clinical diagnosis alone; reference standard test criteria were also used. Detailed descriptions of these reference standard test criteria are available in the original study papers.[4], [18]–[23] Assessment of the diagnoses to ensure comparability of outcomes was discussed with the lead investigator of each study as described earlier.[17]

Statistical analysis

The individual red features were analysed in every dataset separately. Additionally, results were categorised as “general” red features (items 1–7 and 9–10) and “disease-specific” red features (items 8 and 11–16).

We assessed the rule-in value for SI for each red feature separately by calculating positive likelihood ratios (LR+). Red features were considered to have rule-in value if they raised the probability of illness with a positive likelihood ratio of more than 5.0.[25] The univariable association between each individual red feature and the presence of SI was tested by Chi-square analysis. Likelihood ratios, sensitivity and specificity were measured for the presence of ≥1, ≥2 and ≥3 red traffic lights (RTLs). The sensitivity and specificity for “general” and “disease-specific” red features were plotted in receiver operating characteristic (ROC) space.
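As an illustration of these measures, the sketch below derives sensitivity, specificity and LR+ from a 2×2 table. The counts are hypothetical, and the confidence interval uses the common log-method approximation, which is an assumption here: the text does not state how the reported intervals were obtained (the analyses themselves were run in SPSS).

```python
from math import exp, log, sqrt


def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity and LR+ (with approximate 95% CI) from 2x2 counts.

    tp/fn -- children with SI in whom the red feature was present/absent
    fp/tn -- children without SI in whom the red feature was present/absent
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity)
    lr_pos = sensitivity / (1 - specificity)
    # Standard error of ln(LR+), log method (assumed; requires tp > 0 and fp > 0).
    se_log = sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    ci_low = exp(log(lr_pos) - 1.96 * se_log)
    ci_high = exp(log(lr_pos) + 1.96 * se_log)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "ci": (ci_low, ci_high),
        # Rule-in threshold used in this study: LR+ above 5.0
        "rule_in": lr_pos > 5.0,
    }


# Hypothetical counts: 100 children with SI, 900 without.
result = diagnostic_accuracy(tp=20, fp=30, fn=80, tn=870)
```

With these made-up counts the feature is present in 20% of children with SI and in about 3% of children without, giving an LR+ of about 6, which would meet the rule-in threshold despite the low sensitivity.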

The incremental diagnostic value of additional red features (up to four or more) compared with a single red feature was evaluated by logistic regression analyses with forward selection (Wald test, p-value <0.05).

We visualised the change in pre-test probability versus post-test probability for SI in a Fagan nomogram.[26]
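The Fagan nomogram is a graphical form of Bayes' theorem on the odds scale: post-test odds equal pre-test odds multiplied by the likelihood ratio. A minimal sketch of that calculation follows; the function name and the example inputs are illustrative assumptions, not values from the study.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Implements the relationship that a Fagan nomogram displays:
    post-test odds = pre-test odds * likelihood ratio.
    """
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre_test_prob must lie strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Illustrative inputs only: 10% pre-test probability and LR+ = 5.0
example = post_test_probability(0.10, 5.0)
```

For these illustrative inputs the post-test probability is roughly 36%, which shows why even an LR+ at the rule-in threshold leaves considerable uncertainty when the pre-test probability is low.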

No overall pooled likelihood ratios were calculated because of the substantial clinical heterogeneity between datasets (differences in setting, inclusion criteria, immunisation schedules and definition of serious infection).[17] All analyses were done with SPSS software (version 20.0, SPSS Inc, Chicago).


Included datasets

We selected 6,260 children under five years of age from seven pre-existing datasets (n = 6,260/10,812, 58%) for diagnostic studies in children with an acute illness (table 1). Children were included based on fever,[19], [20], [23] acute illness,[4], [18] acute infection,[21] and referral for meningeal signs.[22] Children with various severities of co-morbidity were excluded in five studies,[4], [19]–[23] one study excluded children if the acute episode was caused by an exacerbation of a chronic condition,[4] and one study excluded children who required immediate resuscitation [18] (table 1). All studies included sepsis, meningitis, pneumonia and complicated urinary tract infections in their outcome definition. Osteomyelitis and cellulitis were explicitly mentioned in five and three datasets, respectively.

The median age of the selected children ranged from 0.8 years to 1.9 years. The prevalence of SI ranged from 1.2% to 4.1% in two datasets from general practice [4], [19] and from 9.3% to 40.2% in five datasets from emergency departments and a pediatric assessment unit [18], [20]–[23].

Red traffic lights included in the datasets

Data on all red features included in domains “Colour” and “Hydration” were available in all datasets. The red features “no response to social cues” and “weak, high-pitched or continuous cry” of domain “Activity” were not recorded in two [20], [23] and one dataset [18], respectively. Other red features in this domain were available in all datasets. Red features related to the “Respiratory” domain were not recorded in four (“grunting”) [4], [21]–[23], one (“tachypnoea”) [22], and two (“chest indrawing”) [22], [23] datasets, respectively. “Disease-specific” red features (items 8 and 11–16) were recorded less frequently in all datasets, but in particular in low prevalence settings (range of missing values 0–50%; see table S2).

Performance of individual red traffic lights

Table 2 shows positive and negative likelihood ratios of the 16 individual red features for each dataset separately. All red features with high rule-in value (LR+ >5) are highlighted in bold.

Four of the 16 red features did not achieve high rule-in value (LR+ <5) in any dataset, including two red features that were either not available in the datasets or did not reach significance (p<0.05) when present.

The one red feature that provided high rule-in value in two datasets from both low and higher prevalence settings was “does not wake or if roused does not stay awake” (LR+ 5.9, 95% CI 3.5–10.0 and LR+ 7.8, 95% CI 4.4–13.6, respectively). The red features “reduced skin turgor”, “non-blanching rash”, and “focal neurological signs” showed high rule-in value in two high prevalence settings each (range LR+ 5.0–9.7).[18], [20], [22] The red features “pale/mottled/ashen/blue”, “appears ill to a healthcare professional”, “weak, high-pitched or continuous cry”, “tachypnoea”, “moderate or severe chest indrawing”, and “age 0–3 months & temperature ≥38°C” showed high rule-in value in one low prevalence setting (range LR+ 5.9–83.6).[4] High rule-in value for the red features “grunting” and “bulging fontanelle” was observed in one high prevalence dataset (range LR+ 7.8–11.3).[20] In two high prevalence settings, none of the red features showed high rule-in value.[21], [23]

Performance of multiple red traffic lights

Table 3 shows the association between SI and the number of positive red features, expressed as positive likelihood ratios, sensitivity and specificity. We measured the maximum predictive value of multiple red features by logistic regression analysis and the slope of the ROC-curve. We noted a significant increase (p-value <0.05) in rule-in value with the number of positive red features in most datasets (range LR+ 2.1–10.0 for ≥3 red features), with the exception of Monteny et al.[19] This was also reflected in the higher specificity when more red features were present. The presence of 4 or more red features did not add discriminative value compared to up to 3 red features. The proportion of children having ≥3 red features ranged from 2% to 50% and did not differ between low and high prevalence settings. “General” red features were almost entirely responsible for the total ROC-area (table 3). We did not test disease-specific red features on disease-specific outcome measures due to the small numbers of these events. In figure 1 we visualised the change in pre-test to post-test probability for SI when three or more (general or disease-specific) red features were present in a Fagan nomogram.[27] For example, the 9% pre-test probability of having an SI for a child in the Brent et al dataset increases to 28% (95% CI 17–42%) post-test probability when three or more red features are present, but decreases only to 7% (95% CI 6–9%) if fewer than three red features are present.
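As a rough check on the Brent et al example, the rounded percentages quoted above can be converted to odds to recover the likelihood ratio they imply. This is a back-calculation from rounded figures, not a value reported in table 3, so it should be read as approximate.

```latex
\begin{align*}
\text{odds}_{\text{pre}} &= \frac{0.09}{1 - 0.09} \approx 0.099,\\
\text{odds}_{\text{post}\,(\geq 3)} &= \frac{0.28}{1 - 0.28} \approx 0.389
  \quad\Rightarrow\quad
  LR^{+} \approx \frac{0.389}{0.099} \approx 3.9,\\
\text{odds}_{\text{post}\,(<3)} &= \frac{0.07}{1 - 0.07} \approx 0.075
  \quad\Rightarrow\quad
  LR^{-} \approx \frac{0.075}{0.099} \approx 0.76.
\end{align*}
```

On these approximate figures, three or more red features in this dataset would fall below the rule-in threshold of 5.0, and fewer than three red features would lower the probability of SI only slightly.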

Figure 1. Calculation of post-test probability for serious infections if ≥3 red traffic lights present using Fagan nomogram.

Table 3. Likelihood ratios and ROC-areas of combinations of multiple red traffic lights.


Main findings

This is the first study to broadly validate the diagnostic performance of the individual red features of the NICE febrile child guideline, and of their combinations, in acutely ill children in various settings in Europe. Although we observed rule-in value for almost all individual red features in at least one dataset, only four red features raised the probability of SI with a positive likelihood ratio of more than 5.0 in more than one setting: “does not wake or if roused does not stay awake”, “reduced skin turgor”, “non-blanching rash”, and “focal neurological signs”. Children with more than one red feature had an increased risk of SI; however, more than three red features did not further increase disease probability.

Comparison with other studies

To our knowledge there are three previous studies that estimated the predictive value of any amber or red feature for the detection of SI, but they did not evaluate the individual features of the NICE traffic light system separately. De et al.[16] found that the NICE traffic light system failed to identify a substantial proportion of children with serious bacterial infections. Combining the amber and red feature categories resulted in a sensitivity of 85.8% and specificity of 28.5% for the detection of any serious bacterial infections. Within the original data of Thompson et al. the diagnostic value of vital signs and the NICE traffic light system for identifying children with SI was assessed in a pediatric assessment unit.[21] They stated that the presence of one or more amber and red features was 85% sensitive, but only 29% specific in identifying serious or intermediate infections.[21] However, this original study was performed in children up to 16 years of age in contrast to this present study limited to children up to 5 years of age. Finally, a previous study assessing the diagnostic value of any abnormal amber or red feature (not considering combinations) of the NICE traffic light system to rule-out SI, had sensitivity of 97–100% in low and intermediate prevalence settings and 87–99% in high prevalence settings.[17] The results of all three validation studies suggest possible clinical value for ruling-out SI using both amber and red features, but at the expense of a large group of children testing false positive. However, up to 15% of children with a serious infection will be missed. Alternatively, the presence of any amber or red feature does not allow ruling-in SI considering the very low specificity. 
In low prevalence settings, alarming signs are preferably highly sensitive to correctly rule-out SI in order to limit incorrect referral.[24] In high prevalence settings specificity is more important because a high rate of false positive children could result in high admission rates and unnecessary investigations.[24] Unfortunately there was too much heterogeneity in our datasets to stratify according to prevalence.

Clinical and research implications

With decreasing incidence of SI, clinicians may increasingly rely on alarming symptoms described in (inter)national clinical guidelines. Broad validation could support the wider adoption of the NICE guideline in various settings in Europe and other high-income countries. Although the traffic light system of the NICE febrile child guideline is mostly based on systematic literature reviews and consensus, only four red features achieved high rule-in value in more than one dataset and none of them across all settings. Moreover, these four red features failed to achieve high rule-in value in at least as many datasets, which hampers strong conclusions.

The rule-in value of several other red features was not confirmed in multiple settings either, questioning their inclusion in this setting-independent traffic light system.

Our observations of varying rule-in values of red features in the seven datasets did not support the development of one prediction model including the most important red features. However, we consistently observed an association between three or more red features and SI, although combinations of red features will never be able to definitively rule in an SI without uncertainty. This could be due to dilution of their accuracy by the inclusion of nonspecific red features or because of the interaction between different red features.

The relatively lower recording of “disease-specific” features hampered our analyses, in particular in low prevalence settings. This may in part have been caused by the fact that it is more difficult to identify proxies for such features, in contrast to more general features.

The main findings of our study correspond with the limited performance of the Yale Observation Scale, on which the NICE traffic light system is partly based.[17], [25] In the revised 2013 guideline[9] two red features were removed from the previous 2007 protocol[6] or transferred to amber features: “Age 3–6m & temperature ≥39°C” and “bile-stained vomiting”. Our findings support this for the former, for which we found no rule-in value; for the latter, only one dataset was available, and it did show high rule-in value. Next, as disease-specific red features are strongly related to specific but rare diseases, their positive documentation rate is expected to be low. Although these disease-specific red features may be relevant for one specific outcome, it is difficult to evaluate them in the general population of children with fever and a broad differential diagnosis. However, achieving complete certainty with clinical features is not the goal here. Rather, red features should lift the probability of SI over a certain decision threshold: either to refer, request additional testing or start empiric treatment. As we do not know at what specific risk thresholds we (intuitively) undertake action, clinical interpretation of post-test probabilities as expressed in Fagan nomograms (figure 1) remains difficult. As diagnostic assessment is a dynamic process and may be influenced by the evolution of symptoms over time, repeated assessment of red features, in particular in children with only one or two features, may improve the evaluation of SI.

Finally, the NICE traffic light system could also be improved by taking more recent evidence into account, such as on peripheral circulation, parental concern [25] or urine analysis [16].

Strengths and limitations

We assessed the NICE red traffic lights in 6,260 children from seven existing datasets with various pediatric populations and settings including two low prevalence primary care settings, which are usually underrepresented in diagnostic studies in this area.[24] In addition, we validated the red features separately to identify their individual predictive value.

Despite the large amount of data, not all red features had been recorded in all datasets, necessitating the use of proxy variables.[17] Furthermore, differences in population characteristics (table 1), such as age distribution or prevalence of specific diagnoses within the group of SI, prevented the calculation of overall diagnostic performance measures.

Furthermore, by treating undocumented red features as absent, and because red features are likely to be documented more completely in ill children, we may have overestimated our likelihood ratios by increasing the contrast between children with and without SI.

However, the variability in variables and case-mix reflects clinical practice and therefore will strengthen generalizability of our results.


Our results support rule-in value of several individual red features from the NICE febrile child guideline in specific settings, although not consistently. However, most features had little rule-in value across multiple settings. The NICE red traffic lights, even when three or more features are present, seem to have limited value for ruling-in serious infections. Our results underline the importance of widely validating the predictive value of individual red features, and of combinations of multiple red features, in clinical guidelines prior to widespread dissemination and adoption.

Supporting Information


The authors wrote the paper on behalf of the European Research Network on Recognising Serious Infection (ERNIE). The principal investigators are: Marjolein Berger, Frank Buntinx, Bert Aertgeerts, Monica Lakhanpaul, David Mant, Henriette Moll, Rianne Oostenbrink, Richard Stevens, Matthew Thompson, Ann Van den Bruel and Jan Verbakel. We gratefully acknowledge the members of the ERNIE group for the collaboration and their advice on the message of the manuscript. We want to thank the emergency-personnel of all ambulatory care settings for their participation and careful collection of the required data. We acknowledge the researchers Sacha Bleeker, Andrew Brent, Samiran Ray, Jolt Roukema, Matthew Thompson, Miriam Monteny and Ann Van den Bruel for the data acquisition and contribution of the original studies.

Disclaimer: for this study we selected children with an acute illness from seven cohorts collected by the collaboration of the European Research Network on recognising serious InfEctions (ERNIE) group.[4], [18], [19], [20]–[23]

Author Contributions

Conceived and designed the experiments: EK ML SR HAM RO. Performed the experiments: EK ML SR JV AVDB MT MYB HAM RO. Analyzed the data: EK RO. Contributed reagents/materials/analysis tools: ML JV AVDB MT MYB HAM RO. Wrote the paper: EK RO. Reviewed and revised the manuscript: EK ML SR JV AVDB MT MYB HAM RO. Translation, synopsis and recoding of the datasets: EK SR JV. Participated in recoding and data checking and in discussion about each step of the results: ML SR JV AVDB MT MYB HAM RO.


  1. Armon K, MacFaul R, Hemingway P, Werneke U, Stephenson T (2004) The impact of presenting problem based guidelines for children with medical problems in an accident and emergency department. Arch Dis Child 89: 159–164.
  2. Bruijnzeels MA, Foets M, van der Wouden JC, van den Heuvel WJ, Prins A (1998) Everyday symptoms in childhood: occurrence and general practitioner consultation rates. Br J Gen Pract 48: 880–884.
  3. Moll van Charante EP, van Steenwijk-Opdam PC, Bindels PJ (2007) Out-of-hours demand for GP care and emergency services: patients' choices and referrals by general practitioners and ambulance services. BMC Fam Pract 8: 46.
  4. Van den Bruel A, Aertgeerts B, Bruyninckx R, Aerts M, Buntinx F (2007) Signs and symptoms for diagnosis of serious infections in children: a prospective study in primary care. Br J Gen Pract 57: 538–546.
  5. Craig JC, Williams GJ, Jones M, Codarini M, Macaskill P, et al. (2010) The accuracy of clinical symptoms and signs for the diagnosis of serious bacterial infection in young febrile children: prospective cohort study of 15 781 febrile illnesses. BMJ 340: c1594.
  6. National Institute for Health and Clinical Excellence (2007) Feverish illness in children - Assessment and initial management in children younger than 5 years. London: National Institute for Health and Clinical Excellence.
  7. Inwald DP, Tasker RC, Peters MJ, Nadel S, Paediatric Intensive Care Society Study Group (2009) Emergency management of children with severe sepsis in the United Kingdom: the results of the Paediatric Intensive Care Society sepsis audit. Arch Dis Child 94: 348–353.
  8. Kumar A (2009) Optimizing antimicrobial therapy in sepsis and septic shock. Crit Care Clin 25: 733–751, viii.
  9. Chen SM, Chang HM, Hung TW, Chao YH, Tsai JD, et al. (2013) Diagnostic performance of procalcitonin for hospitalised children with acute pyelonephritis presenting to the paediatric emergency department. Emerg Med J 30: 406–410.
  10. McCarthy PL, Sharpe MR, Spiesel SZ, Dolan TF, Forsyth BW, et al. (1982) Observation scales to identify serious illness in febrile children. Pediatrics 70: 802–809.
  11. Teach SJ, Fleisher GR (1995) Efficacy of an observation scale in detecting bacteremia in febrile children three to thirty-six months of age, treated as outpatients. Occult Bacteremia Study Group. J Pediatr 126: 877–881.
  12. McCarthy PL, Lembo RM, Baron MA, Fink HD, Cicchetti DV (1985) Predictive value of abnormal physical examination findings in ill-appearing and well-appearing febrile children. Pediatrics 76: 167–171.
  13. Baker MD, Avner JR, Bell LM (1990) Failure of infant observation scales in detecting serious illness in febrile, 4- to 8-week-old infants. Pediatrics 85: 1040–1043.
  14. Berger MY, Boomsma LJ, Albeda FW, Dijkstra RH, Graafmans TA, et al. (2008) Guideline Children with fever [NHG-standaard Kinderen met koorts]. Dutch College of General Practitioners [Nederlands Huisartsen Genootschap].
  15. Chiappini E, Principi N, Longhi R, Tovo PA, Becherucci P, et al. (2009) Management of fever in children: summary of the Italian Pediatric Society guidelines. Clin Ther 31: 1826–1843.
  16. De S, Williams GJ, Hayen A, Macaskill P, McCaskill M, et al. (2013) Accuracy of the "traffic light" clinical decision rule for serious bacterial infections in young children with fever: a retrospective cohort study. BMJ 346: f866.
  17. Verbakel JY, Van den Bruel A, Thompson M, Stevens R, Aertgeerts B, et al. (2013) How well do clinical prediction rules perform in identifying serious infections in acutely ill children across an international network of ambulatory care datasets? BMC Med 11: 10.
  18. Brent AJ, Lakhanpaul M, Thompson M, Collier J, Ray S, et al. (2011) Risk score to stratify children with suspected serious bacterial infection: Observational cohort study. Arch Dis Child 96: 361–367.
  19. Monteny M, Berger MY, van der Wouden JC, Broekman BJ, Koes BW (2008) Triage of febrile children at a GP cooperative: determinants of a consultation. Br J Gen Pract 58: 242–247.
  20. Roukema J, Steyerberg EW, van der Lei J, Moll HA (2008) Randomized trial of a clinical decision support system: impact on the management of children with fever without apparent source. J Am Med Inform Assoc 15: 107–113.
  21. Thompson M, Coad N, Harnden A, Mayon-White R, Perera R, et al. (2009) How well do vital signs identify children with serious infections in paediatric emergency care? Arch Dis Child 94: 888–893.
  22. Oostenbrink R, Moons KG, Derksen-Lubsen AG, Grobbee DE, Moll HA (2004) A diagnostic decision rule for management of children with meningeal signs. Eur J Epidemiol 19: 109–116.
  23. Bleeker SE, Derksen-Lubsen G, Grobbee DE, Donders AR, Moons KG, et al. (2007) Validating and updating a prediction rule for serious bacterial infection in patients with fever without source. Acta Paediatr 96: 100–104.
  24. Oostenbrink R, Thompson M, Steyerberg EW, ERNIE members (2012) Barriers to translating diagnostic research in febrile children to clinical practice: a systematic review. Arch Dis Child 97: 667–672.
  25. Van den Bruel A, Haj-Hassan T, Thompson M, Buntinx F, Mant D, et al. (2010) Diagnostic value of clinical features at presentation to identify serious infection in children in developed countries: a systematic review. Lancet 375: 834–845.
  26. Nomogram for Bayes's Theorem. New England Journal of Medicine 293: 257–257.
  27. Fagan TJ (1975) Letter: Nomogram for Bayes theorem. N Engl J Med 293: 257.