Can clinical prediction models assess antibiotic need in childhood pneumonia? A validation study in paediatric emergency care

Objectives Pneumonia is the most common bacterial infection in children at the emergency department (ED). Clinical prediction models for childhood pneumonia have been developed (using chest x-ray as their reference standard), but have not been implemented in clinical practice. Given current insights into the diagnostic limitations of chest x-ray, this study aims to validate these prediction models for a clinical diagnosis of pneumonia, and to explore their potential to guide decisions on antibiotic treatment at the ED. Methods We systematically identified clinical prediction models for childhood pneumonia and assessed their quality. We evaluated the validity of these models in two populations, using a clinical reference standard (1. definite/probable bacterial, 2. bacterial syndrome, 3. unknown bacterial/viral, 4. viral syndrome, 5. definite/probable viral), measuring performance by the ordinal c-statistic (ORC). Validation populations included prospectively collected data of children aged 1 month to 5 years attending the ED of Rotterdam (2012–2013) or Coventry (2005–2006) with fever and cough or dyspnoea. Results We identified eight prediction models and could evaluate the validity of seven, with good original performance. In the Dutch population 22/248 (9%) had a bacterial infection and in Coventry 53/301 (17%); antibiotic prescription was 21% and 35%, respectively. Three models predicted a higher risk in children with bacterial infections than in those with viral disease (ORC ≥0.55) and could identify children at low risk of bacterial infection. Conclusions Three clinical prediction models for childhood pneumonia could discriminate fairly well between a clinical reference standard of bacterial versus viral infection. However, they all require the measurement of biomarkers, raising questions on the exact target population when implementing these models in clinical practice.
Moreover, choosing optimal thresholds to guide antibiotic prescription is challenging and requires careful consideration of potential harms and benefits.


Introduction
Community-acquired pneumonia is the second largest cause of childhood mortality worldwide [1]. Despite improvements over the past decades, lower respiratory tract infections are still responsible for 103.3 deaths per 100,000 children under five years globally, with large differences across regions [2]. Respiratory tract infections are also a common reason for emergency department (ED) visits and the most frequent indication for antibiotic prescription in children [1,3]. Discriminating bacterial infections that require antibiotic treatment from viral, self-limiting disease is one of the biggest diagnostic challenges in childhood pneumonia. Chest x-ray is no longer recommended as the gold standard for bacterial pneumonia [4], and routinely available biomarkers are not pathognomonic for this diagnosis [5]. At the same time, accurate diagnosis of bacterial infection is crucial, since misuse of antibiotics is associated with increased antimicrobial resistance, which in turn also causes morbidity and mortality [6]. Current antibiotic prescription for suspected pneumonia in Western countries ranges from 23% to 59%, and it is widely acknowledged that a considerable proportion of these antibiotics are not necessary [3,7].
In order to standardize the evaluation and treatment of children suspected of pneumonia, clinical decision support systems could be useful tools to classify children into a high or low risk profile [8]. Multiple clinical prediction models for childhood pneumonia have been developed. Even though their current use in clinical practice is limited, they may play a role as treatment decision support, thereby improving rational antibiotic prescription. However, since those models are mainly developed with chest x-ray as their reference standard, it is unclear if they can also validly predict a clinically based diagnosis of pneumonia. Moreover, the question is whether these models can be translated into clinical practice by guiding decisions on antibiotic treatment.
This study aims to systematically search available clinical prediction models for childhood pneumonia in ED settings in high-income countries, to evaluate their validity using a new, clinical diagnosis reference standard, and to explore their potential to guide decisions on antibiotic treatment.

Selection and quality assessment of prediction models
A systematic search for prediction models of childhood pneumonia was performed in Embase, Medline Ovid, Web of Science, PubMed and Google Scholar in September 2017. We included studies on diagnosis and treatment of uncomplicated childhood pneumonia in ED settings in Western countries published since 2000 (see search strategy and exclusion criteria, S1 Text).
imputation model included information about clinical signs and symptoms, referral, diagnostic tests and treatment. We performed all analyses of the validation on the 10 imputed datasets and then averaged the results [15]. When a variable of a prediction model was completely missing in our database, multiple imputation was not possible and we used a proxy (e.g. 'retractions' as a proxy variable for 'dyspnoea', if 'dyspnoea' was not available). For continuous variables, the prevalence of that variable in the original derivation population of the prediction model was used (mean imputation) [16]. CRP-level was truncated at the level of 225 mg/L, following the study of Nijman [17].
We evaluated the validity of those prediction models for which more than 50% of the predictors were available in our database, assuming this as a minimum for credible predictions [16]. We calculated the risk of bacterial pneumonia using each of the included prediction models for all children in our study populations, illustrated by histograms and boxplots. To measure performance, we calculated the ordinal c-statistic (ORC), a measure similar to the area under the receiver operating characteristic curve (AUC), but for ordinal instead of dichotomous outcomes. This statistic can be interpreted as the probability that two cases from randomly selected outcome categories are correctly ranked [18]. We defined models with an ORC of at least 0.55 as performing well and explored their potential to guide antibiotic prescription. For this purpose, we evaluated the harms and benefits of withholding antibiotics in low-risk patients, compared to the observed usual care in which treatment decisions were based on clinical judgment and routine diagnostic tests. Benefit was defined as the potential reduction of antibiotic prescription and harm as the potential risk of undertreatment. Undertreatment was defined as children who were classified as having a bacterial infection and had been treated with antibiotics, but whom the prediction model classified as low-risk. We explored different thresholds for the prediction models to define low-risk and evaluated their effect on harms and benefits. All analyses were performed using SPSS (IBM version 24.0) and R (version 3.3.2).
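The interpretation of the ORC given above can be made concrete with a minimal sketch. It assumes that the ORC is computed as the unweighted average of the pairwise c-statistics over all pairs of outcome categories, and that categories are coded so that a higher number means "more bacterial" (the reverse of the 1 to 5 numbering of the reference standard). The function names and the toy data are hypothetical and do not reproduce the R code used for the analyses.

```python
from itertools import combinations


def pairwise_c(risks_lower, risks_higher):
    """Share of case pairs in which the case from the higher category
    has the higher predicted risk; ties count as one half."""
    concordant = 0.0
    for low in risks_lower:
        for high in risks_higher:
            if high > low:
                concordant += 1.0
            elif high == low:
                concordant += 0.5
    return concordant / (len(risks_lower) * len(risks_higher))


def ordinal_c(predicted_risk, category):
    """Unweighted mean of the pairwise c-statistics over all category pairs.

    Assumption: a larger category code means a more bacterial diagnosis.
    """
    levels = sorted(set(category))
    groups = {
        level: [p for p, c in zip(predicted_risk, category) if c == level]
        for level in levels
    }
    pair_values = [
        pairwise_c(groups[a], groups[b]) for a, b in combinations(levels, 2)
    ]
    return sum(pair_values) / len(pair_values)


# Hypothetical toy data: 0 = viral, 1 = unknown, 2 = bacterial.
toy_risk = [0.05, 0.10, 0.20, 0.30, 0.60, 0.80]
toy_category = [0, 0, 1, 1, 2, 2]
print(ordinal_c(toy_risk, toy_category))  # 1.0, every pair is ranked correctly
```

A model that assigns the same risk to every child yields 0.5 under this definition, which is why values only slightly above 0.5 indicate modest discrimination.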

Identification, quality and original performance of prediction models
We identified 4324 unique articles (after removal of duplicates). Based on title and abstract, 4176 articles were excluded as not relevant (see S2 Fig). After full-text selection and searching references, 11 articles were eligible for inclusion (see Table 1). Eight were primary derivation studies, describing different prediction models [17,19-25]; three were validation or impact studies of three of these models [12,26,27]; and one derivation study also included the validation of another model [25]. Even though VandenBruel's model was derived mainly in a general practice setting, it was also validated in an ED setting, and was therefore included in our study. Most studies included children up to the age of 16, but the majority of the included patients in all studies were under five. Most studies had radiographic pneumonia as their reference standard, except for VandenBruel's study, which used hospitalization for radiographic pneumonia as its reference standard (Table 1). All prediction models aimed to improve clinical decision-making in the child suspected of bacterial pneumonia. Three studies mainly focused on decisions on diagnostic tests [19,21,23]; the other studies also mentioned the potential of the models to improve management decisions on antibiotic treatment, admission or referral [17,20,22,24,25].
In general, the quality of the prediction models was moderate (see Table 1 and S3 Fig), with three models having some risk of bias [19,21,24] and one study with concerns about applicability [20]. Nijman's model was evaluated most thoroughly, including impact analysis [17]. The models by VandenBruel, Lynch and Oostenbrink were broadly validated in previous studies [19,20,24]; those by Mahabee-Gittens, Neuman, Craig and Irwin were only derived or validated in one setting by the original authors [21-23,25]. Three prediction models provided a risk classification (high versus low risk), based on the presence of specific symptoms [20,21,23]. Of these models, sensitivity at model development was moderate to good, with varying specificity (see Table 1). Only VandenBruel's model was validated in different settings, performing poorly due to high sensitivity and low specificity in three settings, the opposite in another setting, and both poor sensitivity and poor specificity in a last setting [26]. The other four prediction models provided a probability (predicted risk in %) of pneumonia, based on a multiple logistic regression model [17,19,24,25]. These models showed moderate to good performance at development (AUC ranging from 0.67 to 0.84) as well as in the validation studies [22,24,26].
Table 2 shows the baseline characteristics of the two populations. Using the clinical diagnosis, the bacterial infection rate ranged from 9% to 17%, and 38-41% of children were classified as 'unknown'. Of this latter category, 74-87% recovered without antibiotics. We included seven prediction models in our validation study. We did not assess the validity of Craig's model, as only 14/28 variables were present in both databases. Lynch's model, with only 2/4 variables available, was not validated in the Coventry database. The supplementary S1 Table gives an overview of all variables and proxies of the validated prediction models.
Mahabee-Gittens published a regression model providing a probability, but the coefficients to calculate this probability were not available from the author [23]. We therefore used the presence of one or more of the included variables to classify patients as being at high risk of bacterial pneumonia. VandenBruel published a general prediction model for febrile children, and one for pneumonia; for this review we only used the pneumonia model [20]. Neuman used a decision tree to classify patients into 3 categories (high/intermediate/low risk of pneumonia) [21]. In this model 'history of fever' discriminated intermediate from low risk, but since fever was an inclusion criterion of all our validation populations, only high and low risk patients were identified, based on the first step of the decision tree (oxygen saturation <92%).

Validation study
Performance of prediction models. The performance of the three models with a risk classification (high/low risk) is shown in Fig 1A. The white bars indicate the number of children with predicted low risk of pneumonia and the grey bars the number of patients with predicted high risk, across the five reference standard categories (bacterial to viral infection). For example, when we used Mahabee-Gittens' model to predict the risk of having a bacterial pneumonia

[...] categories. Almost all children were assigned to a low risk group using Neuman's model, including children with bacterial infections. Fig 1B shows the performance of the prediction models providing a probability. Again, predictions are shown across the five diagnosis categories for each model and for both populations, illustrated by a boxplot. Lynch's model predicted high risk of pneumonia (around 90%) for all children, with little variation across the different outcome categories (see S4 Fig), and did not contribute to discrimination between bacterial or viral disease. The models by Oostenbrink, Nijman and Irwin assigned higher risks to children with bacterial infections than to the children with viral infections, confirmed by a moderate ordinal c-statistic of ≥0.55 (see Fig 1B).
To assess the clinical relevance of these findings, we explored the potential of the last three models to define low-risk patients possibly not needing antibiotic treatment. For example, applying a risk threshold of 10% using Nijman's model would classify 130 children (52%) in the Rotterdam population as being at low risk of bacterial pneumonia (see Table 3, details in S2 Table). Of these children, 16 were currently treated with antibiotics. If this threshold were used in clinical practice, and antibiotics were withheld in all low-risk children, the overall antibiotic prescription rate would fall from 21% (observed antibiotic prescription) to 14% (expected antibiotic prescription) in the Rotterdam population and from 35% to 16% in the Coventry population (Table 3). The potential risk of undertreatment (i.e. withholding antibiotics in children with a bacterial infection who were currently treated with antibiotics) would be 2% (Rotterdam) and 5% (Coventry). Similar benefits and harms were observed when applying the models of Oostenbrink and Irwin. A threshold of 15% would lead to a greater reduction in antibiotic prescription, but at a higher risk of undertreatment.

Table 3. Clinical consequences of using prediction models to guide antibiotic prescription.
Populations: Rotterdam, n = 248; Coventry, n = 301
Observed antibiotic prescription, n (%): Rotterdam 51 (21%); Coventry 105 (35%)
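The harm-benefit calculation described above can be sketched as follows. The sketch assumes that "low risk" means a predicted risk strictly below the threshold and that the undertreatment percentage uses the whole population as its denominator; the text does not state either explicitly. The function name and the ten-child cohort are hypothetical; only the counts 51, 16 and 248 are taken from the Rotterdam results.

```python
def threshold_consequences(risk, treated, bacterial, threshold):
    """Consequences of withholding antibiotics below a risk threshold.

    risk      : predicted probability of bacterial pneumonia per child
    treated   : True if the child received antibiotics in usual care
    bacterial : True if the reference standard classified the child as bacterial
    Assumptions: low risk means risk < threshold, and all proportions use
    the whole population as denominator.
    """
    n = len(risk)
    low = [r < threshold for r in risk]
    still_treated = sum(t and not l for t, l in zip(treated, low))
    undertreated = sum(l and t and b for l, t, b in zip(low, treated, bacterial))
    return {
        "low_risk": sum(low) / n,
        "observed_prescription": sum(treated) / n,
        "expected_prescription": still_treated / n,
        "undertreatment": undertreated / n,
    }


# Hypothetical cohort of ten children, for illustration only.
risk = [0.02, 0.05, 0.08, 0.09, 0.12, 0.20, 0.30, 0.40, 0.60, 0.07]
treated = [False, True, False, False, False, True, True, False, True, True]
bacterial = [False, False, False, False, False, False, True, False, True, True]
print(threshold_consequences(risk, treated, bacterial, threshold=0.10))

# Check against the reported Rotterdam counts: 51 of 248 children were treated,
# and 16 of the treated children fell below the 10% threshold.
observed_n, withheld_n, cohort_n = 51, 16, 248
print(round(100 * observed_n / cohort_n),
      round(100 * (observed_n - withheld_n) / cohort_n))  # 21 14
```

Raising the threshold moves more children into the low-risk group, so the expected prescription rate can only fall and the undertreatment rate can only rise, which is the trade-off reported for the 15% threshold.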

Discussion
We identified eight clinical prediction models for childhood pneumonia by literature review.
Following changing perspectives on a relevant reference standard for childhood pneumonia, we could assess the validity of seven of them for a clinical diagnosis of bacterial, unknown bacterial/viral or viral infection. Three models, with good original performance and quality, assigned a higher risk to children with bacterial infection than to those with viral infection, with the potential of proper selection of children who may recover without antibiotics.
An important strength of our study is the broad validation of multiple prediction models in prospective cohorts including over 500 patients in two different European acute care settings. Our populations were rather heterogeneous in terms of their clinical characteristics, increasing the generalizability of our findings. A limitation is the heterogeneity of the information available, and missing values in general, which is related to the use of already existing datasets. We have accounted for this by multiple imputation or by using proxies where possible. Another limitation is the retrospective classification of the clinical diagnosis, based on the working diagnosis by the treating physician, who was not blinded to clinical features and diagnostic tests. Because none of these clinical features or tests alone determined classification into a final diagnosis category, we believe this potential bias is limited. Diagnostic tests were performed at the discretion of the treating clinician, and mainly included chest x-rays. For 22 patients a definite viral or bacterial test was recorded as positive; however, we had no data on the total number of viral/bacterial tests performed. Previous studies in these settings have shown that these are performed in about 10% of febrile children [12,13]. Validity assessment of the model by Mahabee-Gittens was limited by the absence of the original coefficients. Of Irwin's model, only 3 out of 5 predictor variables were present; for the other two variables we used mean imputation. This may have underestimated the model's discriminative value, but given the small effect sizes of the missing variables, we consider this effect limited [16].
We should appreciate several differences between our study populations and the populations the models were originally derived on. Since our populations included febrile children at the ED, it is not surprising that we observed less variability in the predicted probabilities in the validation of Neuman's and Lynch's models, since fever was one of their predictor variables. Furthermore, differences in pneumonia prevalence in the derivation populations (6-36%) of the models may explain systematic differences in predicted probabilities in 4 models [17,19,24,25,28]. In general, correcting for this involves recalibration (calibration-in-the-large) of the model to a new target population [28]. However, this type of recalibration does not influence discrimination (the ordinal c-statistic), and thus not our conclusions. It may, however, explain the variable impact the suggested thresholds have using the different models. Next, the type of reference standard (radiographic pneumonia vs. clinical diagnosis) differed between derivation and validation studies, as was the purpose of our study. Given the diagnostic limitations of chest x-rays, we chose to define our reference standard following Herberg's classification [14]. It must be noted that this choice was not proposed as a new gold standard, but rather used as a model that may reflect our best current practice. In our aim to translate prediction models into clinical practice, we observed that the performance varied by type of model. We observed that the models using the probability scale had better diagnostic performance (reflected by a higher ORC) than those using a risk classification (high/low risk). This can partly be explained by the ability to adjust risk thresholds, which are directly linked to the harm-benefit ratio, more easily in models using the probability scale. Models using a risk classification have a fixed threshold and lack this flexibility, and may therefore show lower diagnostic performance when validated according to a new reference standard.
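The statement that calibration-in-the-large leaves discrimination unchanged can be illustrated with a minimal sketch. It assumes recalibration is done by adding a constant to the model intercept on the logit scale; the shift value, the function names and the toy data are hypothetical, and estimating the shift from a target population is not shown. For brevity the sketch uses the binary c-statistic rather than the ORC.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_in_the_large(risks, intercept_shift):
    """Add a constant to every prediction on the logit scale."""
    return [expit(logit(p) + intercept_shift) for p in risks]


def c_statistic(risks, outcome):
    """Share of (case, non-case) pairs in which the case has the higher risk."""
    cases = [r for r, y in zip(risks, outcome) if y == 1]
    controls = [r for r, y in zip(risks, outcome) if y == 0]
    score = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                score += 1.0
            elif case == control:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical predictions and outcomes (1 = bacterial, 0 = not bacterial).
risks = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70]
outcome = [0, 0, 1, 0, 1, 1]
shifted = recalibrate_in_the_large(risks, intercept_shift=-1.0)

# The ranking of children is unchanged, so the c-statistic is identical,
# while every predicted risk is lower than before.
print(c_statistic(risks, outcome) == c_statistic(shifted, outcome))  # True
```

Because the shift changes the absolute predicted risks, a fixed threshold such as 10% selects a different group of children before and after recalibration, which is consistent with the variable impact of the suggested thresholds across models.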
In order to improve rational use of antibiotics in children with respiratory infections, there is a need to improve discrimination between bacterial and viral, self-limiting disease. We showed that three of seven tested clinical prediction models could identify a low-risk group of children with self-limiting disease in an ED population fairly well and we believe those three have the potential to improve treatment decisions. Those models include a combination of signs of general illness and/or respiratory distress and biomarkers. The availability of biomarkers will influence the feasibility of implementation of these models in clinical practice. The models of Oostenbrink and Nijman include CRP measurement, Irwin's model includes CRP, procalcitonin and resistin. Given the wide availability of point-of-care CRP tests the first two models will be most feasible for routine use in the ED.
Another important challenge to be faced before prediction models can be implemented as decision tools in clinical practice is to choose optimal decision thresholds, adapted to the appropriate target population. A balance is needed between the benefit of reducing unnecessary antibiotic prescription and the harm of potential undertreatment of bacterial infections. The prior risk of severe illness in a population is an important consideration. For example, in settings with a high prevalence of comorbidity, the course of pneumonia will generally be more severe and missing a serious infection will have worse consequences than in a low-risk population. Next, the natural course of the disease should be taken into account. Last, access to (good quality) healthcare is important. In a setting with limited possibility for patient follow-up, the potential risks of undertreatment will be higher. Given the natural course of pneumonia (developing over days instead of hours), a watchful waiting approach instead of immediate antibiotic treatment in children with uncomplicated pneumonia with a predicted risk <10-15% might be justified in settings with good access to care, in the presence of a proper safety-netting strategy for unexpected disease course. In low resource settings or high-risk populations lower thresholds may be reasonable. Before implementing treatment interventions based on these prediction models in clinical practice, a prospective study is needed to evaluate the overall impact of treating children according to such a prediction model, compared to usual care. Such a study should assess the feasibility and safety of the suggested thresholds for that specific setting.
Three out of seven clinical prediction models for pneumonia could discriminate fairly well between a new reference standard of bacterial and viral infection in children presenting at the ED. However, they all require the measurement of biomarkers, raising questions on the exact target population when implementing these models in clinical practice. Moreover, choosing optimal decision thresholds to guide antibiotic prescription is challenging and requires careful consideration of potential harms and benefits. Future research should focus on the feasibility and safety of treatment based on chosen decision thresholds for specific settings.