Derivation and validation of a clinical prediction model for risk-stratification of children hospitalized with severe pneumonia in Bangladesh

Children with severe pneumonia in low- and middle-income countries (LMICs) suffer from high rates of treatment failure despite appropriate World Health Organization (WHO)-directed antibiotic treatment. Developing a clinical prediction rule for treatment failure may allow early identification of high-risk patients and timely intervention to decrease mortality. We used data from two separate studies conducted at the Dhaka Hospital of the International Centre for Diarrheal Disease Research, Bangladesh (icddr,b) to derive and externally validate a clinical prediction rule for treatment failure of children hospitalized with severe pneumonia. The derivation dataset was from a randomized clinical trial conducted from 2018 to 2019, studying children aged 2 to 59 months hospitalized with severe pneumonia as defined by WHO. Treatment failure was defined by the persistence of danger signs at the end of 48 hours of antibiotic treatment or the appearance of any new danger signs within 24 hours of enrollment. We built a random forest model to identify the top predictors. The top six predictors were the presence of grunting, room air saturation, temperature, the presence of lower chest wall indrawing, the presence of respiratory distress, and central cyanosis. Using these six predictors, we created a parsimonious model with a discriminatory performance of 0.691, as measured by area under the receiving operating curve (AUC). We performed external validation using a temporally distinct dataset from a cohort study of 191 similarly aged children with severe acute malnutrition and pneumonia. In external validation, discriminatory performance was maintained with an improved AUC of 0.718. In conclusion, we developed and externally validated a parsimonious six-predictor model using random forest methods to predict treatment failure in young children with severe pneumonia in Bangladesh. These findings can be used to further develop and validate parsimonious and pragmatic prognostic clinical prediction rules for pediatric pneumonia, particularly in LMICs.


Introduction
Despite advances in modern medicine, pneumonia remains among the leading causes of mortality in children under 5, with annual deaths estimated at 740,000 [1].Pneumonia alone accounts for 15% of an estimated 5.4 million global deaths in children under 5 years of age [2].As a result, a recent report commissioned by the WHO emphasized the importance of further research in the risk stratification of children with pneumonia [3].
The greatest burden of pneumonia is concentrated in low-and middle-income countries (LMICs), where resources for medications and hospital-based management are limited [4].The rates of treatment failure are considerably higher in LMICs compared to high-income countries [5].A randomized controlled trial on under-five children conducted in a tertiarylevel hospital in Bangladesh found that 14% of participants experienced treatment failure, with 53% of patients requiring switching to second-line antibiotics [6].Another cohort study from Tanzania found that the rate of first-line antibiotic treatment failure was over 50% among children with severe community-acquired pneumonia [7].Identifying children with a high risk of treatment failure will help to triage appropriate resources for children who need closer inpatient monitoring and post-discharge follow-up.
Prior research has identified predictors associated with treatment failure.For example, Chisti et al. in Bangladesh found that dehydration, hypocalcemia, and bacteremia were independently associated with treatment failure [8].Likewise, Muro et al. found that in their Tanzanian facility, HIV positivity, acute malnutrition, poor oxygen saturation, convulsions, central cyanosis, and abnormal chest X-rays were independently associated with treatment failure [7].The goal of our study is to build on this body of research by using machine learning methods on datasets of children with severe pneumonia to build a pragmatic clinical prediction model to identify children at high risk of treatment failure.

Derivation dataset
The data for the derivation of the clinical prediction rule was obtained from a recently completed randomized controlled trial comparing antibiotic regimens for the treatment of severe pneumonia in 308 children aged 2-59 months in a tertiary center in Bangladesh.Full details of the study design are described elsewhere [9,10].Briefly, the study enrolled patients from the Dhaka Hospital of International Centre for Diarrheal Disease Research, Bangladesh (icddr,b), an urban facility that provides care for nearly 200,000 patients a year according to the icddr,b annual report, 2021.Children aged 2-59 months were recruited from January 2018 to October 2019.The median age of the children among control and intervention groups were 7.1 and 7.2 months respectively with a male predominance (60% and 65% respectively).All participants had severe pneumonia according to the revised 2014 WHO guidelines for pediatric pneumonia [11], based on cough and difficulty in breathing, plus at least two of the following "danger signs": (i) central cyanosis or oxygen saturation <90% on pulse oximetry, (ii) severe respiratory distress (e.g., grunting, very severe chest indrawing), (iii) signs of pneumonia with a general danger sign (e.g., inability to breast-fed or drink, lethargy or unconsciousness, convulsion).Exclusion criteria included: those who received antibiotics at home or another hospital for �48 hours prior to admission, had known congenital or chromosomal anomaly (e.g., congenital heart disease, laryngomalacia, cleft lip, cleft palate, and trisomy 21), or required cardio-pulmonary resuscitation or referral to another facility for management of acute renal failure.Treatment failure was defined as either persistence of danger signs (central cyanosis, hypoxemia, grunting, inability to breast-fed or drink, lethargy or convulsions) at the end of 48 hours, or the appearance of new danger signs including requiring mechanical ventilation, development of severe sepsis or septic shock, death, or referral to another specialized hospital (e.g. for acute kidney injury requiring dialysis) within 24 hours of enrollment.Among the participants, 14 (4.5%) children died [9].

External validation dataset
For the external validation of our prediction model, we used data from a cohort study of 191 children aged 0-59 months with severe acute malnutrition and severe pneumonia, hospitalized between April 2015 and March 2017 in the Dhaka Hospital of icddr,b.Clinical and demographic information was collected at enrollment, and information regarding outcomes was recorded at discharge and at 30-day follow-up from admission.The median age of the enrolled children was 8.0 months with a male predominance (61%).78% had been born preterm.A total of 14 (7.3%) children died within 30 days of hospital admission; of these, 12 (6.3%)died in-hospital and two (1.0%) died after discharge.These deaths were coded as treatment failures for the validation of our model.Details of this study can be found elsewhere [12].For the external validation, only parameters identified in the derivation analysis and selected for inclusion in the prediction rule were used.

Derivation of a clinical prediction rule
The derivation dataset contained patient information for 185 candidate predictors that were assessed at the time of presentation.We conducted an initial variable screening in which predictors with few positives or negatives (i.e. less than 5% of patients) were excluded.For example, 99% of participants had a cough so it would be unlikely to be helpful in a prediction model.Other reasons for initial exclusion were similarity (such as heart rate and pulse) or lack of clinical relevance.The derivation dataset had also randomized patients to amoxicillin and ampicillin arms in a 1:1 fashion.In order to create a more generalizable prediction rule, the antibiotic assignment was excluded.After this initial screening, we further analyzed 131 predictors.Missing data were imputed by a single imputation using the multiple imputations by chained equations (MICE) package in R [13].
We used repeated 5-fold cross-validation using 100 iterations to estimate the generalizable performance of the random forest model and determine the optimal number of predictors.This was done with the permimp package and cforest function in R, a conditional permutation approach that is particularly robust to co-linearity when estimating variable performance [14].Following cross-validation, the optimal model was applied to the entire dataset.We assessed discriminatory performance by calculating the AUC (area under the receiver operating characteristic curve (ROC).The ROC curve plots the true positive rate against the false positive rate, with the AUC providing the probability that our model correctly predicted the outcome (ie.treatment failure).Calibration was assessed by using calibration-in-the-large and calibration slope [15].External validation was performed using similar methods on the validation dataset described above [12].
For the development of the prediction models and further analysis, R was used.The TRI-POD (transparent reporting of a multivariable prediction model for individual diagnosis) guidelines were followed (S1 Text).

Ethical approval statement
This is a secondary analysis from the data of two different studies among the children.Ethical approval for both of the studies had been given by the Institutional Review Board (IRB) of International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b).Informed written consent was obtained from the legal guardian of all participants before enrolling into the studies. 1 describe the process of developing the random forest model, with a representative sample of the types of predictors that were included.There was no missing data for the outcome of treatment failure, with 62 (20%) of patients experiencing treatment failure.Treatment failure was due to persistence of danger signs in 30 patients (10%) and development of new danger signs in 32 patients (10%).Predictors that required imputation included family income, adequacy of oral rehydration, type of diarrhea, height, mid upper arm circumference (MUAC), weight for height z-score (WHZ), height for age z-score (HAZ), pulse, room air saturation, systolic and diastolic blood pressure.For these predictors, less than 5% of patients had a missing value.Further description of baseline patient characteristics was published in the original study [9].

Fig 1 and Table
The top 10 predictors for the random forest model are provided in Table 2, with the six most important predictors being the presence of grunting (yes or no), room air saturation, temperature, presence of lower chest wall indrawing (yes or no), presence of respiratory distress (yes or no), and presence of central cyanosis (yes or no).The order of importance for these predictors did not change with the adjustment of various tuning parameters of our model, such as the number of trees included in the model.The most important predictors could be assessed during the initial physical exam.Indeed, two of the top three predictors are vital signs.Duration of poor feeding was the only predictor among the top 10 that would need to be assessed by history taking.
Partial dependency plots of the top six predictors are shown in Fig 2 .These help to demonstrate the magnitude and effect direction of our top predictors on treatment failure.For example, the top predictor, the presence of grunting, was associated with a roughly 78% chance of treatment failure if it was present.Conversely, treatment failure was seen in roughly 14% of patients who did not have grunting on examination.The discriminatory power of predictors like the presence of lower chest wall indrawing was less pronounced, with 24% of those with this finding experiencing treatment failure compared to 14% of those without.The continuous variables of room air saturation and temperature demonstrated some dichotomous characteristics with sharp changes in treatment failure probability at certain values.For example, treatment failure significantly decreased in patients with room air saturations greater than 89 percent and temperatures less than 37.5 degree Celsius.We performed repeated five-fold cross-validation with our random forest models across 100 iterations, achieving an average AUC that ranged from 0.641 (2 variables) to 0.696 (7 variables), with standard deviations ranging from 0.0686 to 0.0936 for other models using between one and ten predictors.Interestingly, increasing the number of variables beyond 7 decreased the discriminatory performance of our model.Likewise, the improvement in AUC from 1 variable to 7 variables, or certainly 3 variables (AUC of 0.663) to 7 variables was modest (Fig 3).We ultimately used a model composed of 6 predictors, as the improvement in discriminatory performance from 6 to 7 predictors (0.691 to 0.696) was too modest to justify adding an additional predictor in a pragmatic clinical prediction tool.
The 6-variable model was externally validated by applying it to a cohort study of 191 children with severe pneumonia from Dhaka Hospital between 2015 and 2017, which examined  intercept suggests an overestimation of the risk of treatment failure [16].A slope that is smaller than 1 suggests that the estimated probabilities of our model are too extreme, meaning that the calculated probabilities of treatment failure are too high for those at high risk of treatment failure and vice versa.A slope less than 1 may be due to overfitting of the model, though overfitting would tend to decrease discriminatory performance as well [15].

Discussion
Our study used data from a randomized control trial of 308 children between the ages of 2 and 59 months in 2018-2019 to derive a prognostic model for severe pneumonia using machine learning methods.We built a parsimonious model of six predictors, which was externally validated by a dataset that was temporally distinct, achieving a modest discriminatory performance with an AUC of 0.718.Development of clinical prediction models such as ours may assist in triaging patients and resources in the management of severe pneumonia in LMICs.The top six predictors in our model were, in order of importance: the presence of grunting, room air saturation, temperature, the presence of lower chest wall indrawing, the presence of respiratory distress, and the presence of central cyanosis.The addition of one more variable to our model would have modestly improved the AUC but for the sake of pragmatism, a model of six predictors was chosen, achieving an AUC of 0.6911.There is some collinearity in the predictors of our model as they focus on the respiratory system.However, the presence or absence of collinearity in a model has not been demonstrated to affect predictive outcomes [17].Indeed, a random forest model that excluded collinear predictors such as severe respiratory distress and central cyanosis achieved a similar AUC of 0.6945.Prior studies have demonstrated that hypoxemia, central cyanosis, tachypnea, and acute malnutrition were associated with treatment failure [7,18].This is in line with our findings.Measures of developmental status such as mid-upper arm circumference and adjusted weight for age were also in our top 10 most important predictors.
Our work builds upon this body of knowledge by providing a model that combines these variables into a prognostic prediction rule that could be used for decision support in severe pediatric pneumonia.Our model is both parsimonious and pragmatic allowing it to be especially useful in LMIC settings.The variables screened are able to be assessed by history and physical as soon as a patient presents to medical care, potentially obviating the need for more advanced and costly diagnostics.Furthermore, several of the variables, such as room air saturation and temperature, can be assessed by non-physician support staff with minimal clinical training.Patients at high risk of treatment failure could immediately be triaged to wards with higher levels of nursing care better able to provide more frequent monitoring and essential supportive cares such as aggressive secretion management.
Our model could be easily operationalized by incorporation into the electronic medical record or as mobile applications.A similar alert system using anthropometric measurements and the CURB-65 score is currently being used at the study site though our model would be more specific to the pediatric population.As a clinical support tool, our model can assist clinicians by identifying patients who would benefit from use of limited diagnostic resources to assess for alternative etiologies of respiratory failure as well as those who may benefit from more advanced respiratory support devices.Our tool would not necessarily guide antibiotic selection or escalation as the derivation study was not designed to assess for antibiotic resistance or other microbiologic factors as a cause of treatment failure.It may, however, suggest a patient population who would benefit from diagnostic testing in the form of blood or sputum cultures to ensure that they receive appropriate antibiotics.In sum, our model could be easily incorporated into the healthcare systems of LMIC and provide high value care by identifying high risk patients, thereby maximizing efficient use of diagnostic resources, support staff, respiratory devices, and intensive care units.
Our study has several limitations.First, our model was derived from data collected from a single-center study of hospitalized children in Bangladesh.Though the derivation study enrolled children younger than 5 years of age, it may be most applicable to younger populations, as the median age was around 7 months with an interquartile range of 6.5 months.It is likely that many of the top predictors identified would be generalizable other settings as they have been identified by other studies conducted in other LMICs to be associated with treatment failure.Secondly, some of the predictors used in our model, such as the presence or absence of respiratory distress, are subjective and may depend on the experience of the clinician.We do not have data regarding an interobserver agreement for these predictors.Thirdly, our external validation was performed using data from the same center, albeit from an earlier period.However, to our knowledge, the validation dataset was the most appropriate dataset with openly accessible data, as it examined similarly aged children with severe pneumonia as defined by the most recent guidelines for pediatric pneumonia put forth by the WHO in 2014.Though the discriminatory performance of our model was actually improved with external validation, the calibration slope and intercept suggest that our model may be systematically extreme and that the calculated probabilities of treatment failure may be too high for those at high risk of treatment failure and vice versa.In addition, our negative calibration intercept suggests that our model may overestimate the risk of treatment failure.This may be due to the higher incidence of treatment failure in the primary dataset than in the external validation dataset.In addition, the derivation study found that the amoxicillin treatment was associated with a reduction in treatment failure compared with the ampicillin arm.We removed the treatment arm as a predictor in this study to make our model more generalizable, though this potentially reduces its predictive performance.Finally, the sample sizes of our studies for the derivation and validation of our clinical prediction rule were relatively small, at 308 and 191 patients, respectively, resulting in relatively large standard deviations in our AUC.

Conclusions
We used random forest models to develop a parsimonious, 6-predictor model for treatment failure in children aged 2-59 months with severe pneumonia, achieving an externally validated AUC of 0.718.We identified grunting, room air saturation, temperature, lower chest wall indrawing, respiratory distress, and central cyanosis as the most important predictors of treatment failure.Future studies should reproduce our methods in other centers with the aim of developing and validating clinical prediction rules that may help improve patient care and triage healthcare resources in diverse global settings.

Table 2 . Top 10 predictors for treatment failure.
Importance is defined as the amount that the AUC would decrease if the predictor were to be excluded.For categorical variables, the number of patients positive for a given predictor are provided.For continuous variables, the median value is presented. https://doi.org/10.1371/journal.pgph.0002216.t002