Development of a risk prediction model (Hangang) and comparison with clinical severity scores in burn patients

Purpose The purpose of this study was to develop a new prediction model to reflect the risk of mortality and severity of disease and to evaluate the ability of the developed model to predict mortality among adult burn patients. Methods This study included 2009 patients older than 18 years who were admitted to the intensive care unit (ICU) within 24 hours after a burn. We divided the patients into two groups: those admitted from January 2007 to December 2013 were included in the derivation group and those admitted from January 2014 to September 2017 were included in the validation group. Shrinkage methods with 10-fold cross-validation were performed to identify variables and limit overfitting of the model. Discrimination was analyzed using the area under the curve (AUC) of the receiver operating characteristic curve. The Brier score, integrated discrimination improvement (IDI), and net reclassification improvement (NRI) were also calculated. Calibration was analyzed using the Hosmer-Lemeshow goodness-of-fit test (HL test). Clinical usefulness was evaluated using a decision-curve analysis. Results The Hangang model showed good calibration with the HL test (χ2 = 8.785, p = 0.361); it had the highest AUC (0.943) and the lowest Brier score (0.068) among the models compared. The NRI and IDI were 0.124 (p = 0.003) and 0.079 (p < 0.001), respectively, when compared with FLAMES. Conclusions This model reflects the current risk factors of mortality among adult burn patients. Furthermore, it was a highly discriminatory and well-calibrated model for the prediction of mortality in this cohort.


Introduction
Systematic prediction of outcomes in critically ill patients based on clear, objective data is an essential part of care in an intensive care unit (ICU). Severity scoring systems have evolved to predict outcomes in a more objective and reliable way and have, in turn, influenced management decisions, including do-not-resuscitate status and the withdrawal of life support [1]. Severity scoring systems have continued to be developed, and various scoring systems have been used for critically ill patients. Severity scoring systems should have validity, calibration, and discrimination to predict the severity of disease and mortality, as well as repeatability and reliability in different populations and diseases [2,3]. There are generally two kinds of prediction models, reflecting the different characteristics of individual diseases. One kind is used for general intensive care patients and focuses on the acute physiological status and associated comorbidities; examples are the Acute Physiology and Chronic Health Evaluation (APACHE) II [4], Simplified Acute Physiology Score (SAPS) II [5], Logistic Organ Dysfunction Score (LODS) [6], and Sequential Organ Failure Assessment (SOFA) [7]. The other kind is specific to an individual disease and consists of disease-related features. Among burn patients, the abbreviated burn severity index (ABSI) [8], FLAMES (Fatality by Longevity, APACHE II score, Measured Extent of burn, and Sex) [9], the revised Baux index (rBaux) [10], the model developed by Ryan et al [11], and the model of the Belgian Outcome in Burn Injury (BOBI) group [12] are well known and widely used. These burn-specific prediction models, with the exception of FLAMES, consist of patient-related factors only; no laboratory variables are included, and even FLAMES contains no burn-specific laboratory factors. Therefore, these models are only able to determine some of the risk factors for mortality rather than a continuous range of risk factors [9].
It is necessary to develop a prediction model that includes a wider range of treatment-related biological variables as well as patient-related variables to accurately reflect the rapid progress in burn treatment. Additionally, the existing scoring systems for the general critically ill patients do not accurately predict the severity and the risk of mortality in the burn patients because they were developed from the general ICU, and did not specifically take into consideration burn populations [13].
The purpose of this study was to develop a new prediction model for mortality among burn patients that included specific laboratory tests to better reflect the risk of mortality and severity of disease as survival rates have increased due to the development of burn treatment in recent years. Additionally, we aimed to evaluate whether the newly developed prediction model could predict the risk of mortality more accurately than existing scoring systems.

Patients
This study included 2009 patients older than 18 years who were admitted within 24 hours after a burn to the burn intensive care unit (BICU) of Hangang Sacred Heart Hospital, Hallym University Medical Center from January 2007 to September 2017. The criteria for admission to the BICU were as follows: 1) partial thickness burn of more than 20% of total body surface area (TBSA) for adults, or partial thickness burn of more than 10% of TBSA if the patient was over 65 years of age, 2) inhalation injury, 3) electrical burn, 4) pre-existing medical disorder that could incur complications or affect mortality, and 5) concomitant trauma that could elevate the risk of morbidity or mortality. We divided the patients into two groups to develop and validate the new Hangang model: the patients who were admitted from January 2007 to December 2013 were included in the derivation group and the patients who were admitted from January 2014 to September 2017 were included in the validation group. This study was approved by the Institutional Review Board of Hangang Sacred Heart Hospital. Informed consent was waived due to the retrospective nature of the study.

Variables and prediction models
All medical records of patients were retrieved from the clinical data warehouse, which stores all electronic medical records of Hallym University Medical Center anonymously. The following demographic variables were collected: age, sex, type of burn, percentage of TBSA (% TBSA) burned, presence of inhalation injury, and pre-existing medical history. There were no missing data; all subjects had complete data. The outcome of the prediction models was 60-day mortality. We evaluated 10 prediction models. Five burn-specific models that are the most well known [2], namely ABSI [8], Ryan [11], FLAMES [9], BOBI [12], and the revised Baux [10], were calculated from electronic medical records. The ABSI score consisted of five variables: age (1-5 points), % TBSA burned (1-10 points), female gender (1 point), the presence of inhalation injury (1 point), and the presence of full-thickness burn (1 point). The Ryan score was the number of risk factors present among three (greater than 60 years of age, greater than 40% TBSA burned, and the presence of inhalation injury). The FLAMES score was calculated using age, the percentage of partial and full thickness burns, gender, and the APACHE II score on day 1. The BOBI score was calculated using age (0-3 points), % TBSA burned (0-4 points), and the presence of inhalation injury (3 points). The revised Baux score was calculated as age + % TBSA burned + 17 × (inhalation injury; 1 if present, 0 if absent). Inhalation injuries were diagnosed based on the patients' history (burned in a closed space, unconscious at the scene, prolonged extrication), physical findings such as singed facial hair, carbonaceous deposits in the nose or mouth, or facial burns, and other diagnostic modalities such as bronchoscopy, carbon monoxide levels, and serial chest x-rays. Medical comorbidity was defined as the presence of one or more of the following: cardiac disease, liver or kidney disease, and diabetes mellitus.
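The two scores that the text defines completely, the revised Baux score and the Ryan score, can be written out directly. The following is a minimal sketch under the definitions given above; the function names and the example patient are hypothetical and serve only as an illustration. ABSI, BOBI, and FLAMES are omitted because their point tables are not reproduced in the text.

```python
def revised_baux(age_years: float, tbsa_pct: float, inhalation_injury: bool) -> float:
    """Revised Baux score: age + % TBSA burned + 17 if inhalation injury is present."""
    return age_years + tbsa_pct + (17 if inhalation_injury else 0)


def ryan_score(age_years: float, tbsa_pct: float, inhalation_injury: bool) -> int:
    """Ryan score: number of risk factors present (0 to 3).

    Risk factors: age > 60 years, > 40% TBSA burned, inhalation injury.
    """
    return int(age_years > 60) + int(tbsa_pct > 40) + int(inhalation_injury)


# Hypothetical patient, for illustration only (not taken from the study data).
patient = {"age_years": 47, "tbsa_pct": 30.0, "inhalation_injury": True}
print(revised_baux(**patient))  # 94.0
print(ryan_score(**patient))    # 1
```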
APACHE II [4], SAPS II [5], LODS [6], and SOFA [7], which are models generally used for critically ill patients, were also calculated using the electronic medical records. APACHE II and SAPS II, LODS, and SOFA consisted of 12, 11, and six physiologic variables, respectively [3]. All laboratory variables were used in these prediction models; for each laboratory variable, the worst value during the first 24 hours after admission was used.

Burn management
All patients who were admitted to the BICU received initial fluid resuscitation using the modified Parkland formula (4 mL × body weight in kg × % TBSA burned); the fluid volume was adjusted as needed to maintain a minimum urine output of 0.5 mL/kg/hour. Enteral feeding was the first choice and was initiated within 48 hours if there was no ileus; parenteral nutrition was supplemented to meet the target caloric requirements, which were determined using the European Society of Parenteral and Enteral Nutrition (ESPEN) guidelines for intensive care [14]. Burn wound dressing was performed daily using hydrofoam and topical antimicrobials. Early excision and grafting with auto-/allograft was performed within 5 days after admission.
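As a worked example of the resuscitation arithmetic described above, the sketch below evaluates the formula and the urine output target for a hypothetical 70 kg patient with 30% TBSA burned. The function names are assumptions made for illustration. The text does not state the period over which the volume is given; the Parkland formula is conventionally applied to the first 24 hours after injury.

```python
def parkland_volume_ml(weight_kg: float, tbsa_pct: float) -> float:
    """Initial resuscitation volume: 4 mL x body weight (kg) x % TBSA burned."""
    return 4.0 * weight_kg * tbsa_pct


def min_urine_output_ml_per_hour(weight_kg: float, target_ml_per_kg_h: float = 0.5) -> float:
    """Minimum hourly urine output used to adjust the fluid rate (0.5 mL/kg/hour)."""
    return target_ml_per_kg_h * weight_kg


# Hypothetical patient: 70 kg, 30% TBSA burned.
print(parkland_volume_ml(70, 30))        # 8400.0
print(min_urine_output_ml_per_hour(70))  # 35.0
```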

Statistical analysis
Normally and non-normally distributed continuous variables were presented as means ± standard deviation (SD) and as medians (interquartile range [IQR], 25th-75th percentile), respectively. The paired t-test or Wilcoxon signed rank test, depending on the normality of the data, was used to determine differences between the two groups. Categorical variables were presented as proportions, and differences between them were analyzed using Chi-square tests. Two-sided p-values <0.05 were considered statistically significant. All statistical analyses were conducted using R version 3.5.1 (R Project for Statistical Computing).

Development of the new prediction model (Hangang)
A total of 30 variables were obtained: eight patient variables (six demographic variables, medical history, and the Glasgow Coma Scale) and 22 physiologic variables (Table 1). To detect multicollinearity among all variables in this model, we used variance inflation factors. A shrinkage method (the least absolute shrinkage and selection operator [LASSO]) with 10-fold cross-validation was applied using the 'glmnet' package in R to determine the smallest set of variables for the model and to limit overfitting. Ten variables were finally included in the Hangang model: age, % TBSA burned, inhalation injury, serum lactate, pH, prothrombin time (PT), serum bilirubin, serum myoglobin, serum creatinine, and lactate dehydrogenase (LD). The continuous variables were divided into groups that were mapped to the target variable (mortality) by supervised discretization using algorithms such as recursive partitioning, which can identify optimal cut points and evaluate the relationship with the outcome using the Weight of Evidence and Information Value [15]. The optimal cut points were adjusted to keep the model simple and easy to interpret, and the variables were then categorized by the adjusted cut points. Points were assigned to each categorized variable using the coefficients calculated with the 'smbinning' package in R (Table 1). A nomogram of the Hangang model shows the scores of each variable (Table 2). The Hangang score ranged from a minimum of 91 to a maximum of 216; the probability (%) of mortality according to the score is shown in Table 3.
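The Weight of Evidence (WoE) and Information Value (IV) used to judge a set of cut points can be illustrated with a short calculation. This is a minimal sketch, not the 'smbinning' implementation: it assumes the common convention WoE = ln(share of survivors in the bin / share of deaths in the bin) and IV = Σ (share of survivors − share of deaths) × WoE, and the bin labels and counts are hypothetical.

```python
import math


def woe_iv(bins):
    """Compute WoE per bin and the total IV for one discretized variable.

    bins: list of (label, n_survived, n_died) tuples.
    Assumes every bin contains at least one survivor and one death,
    otherwise the logarithm is undefined.
    """
    total_survived = sum(survived for _, survived, _ in bins)
    total_died = sum(died for _, _, died in bins)
    woe = {}
    iv = 0.0
    for label, survived, died in bins:
        pct_survived = survived / total_survived  # share of all survivors in this bin
        pct_died = died / total_died              # share of all deaths in this bin
        woe[label] = math.log(pct_survived / pct_died)
        iv += (pct_survived - pct_died) * woe[label]
    return woe, iv


# Hypothetical counts for % TBSA burned after discretization (illustration only).
tbsa_bins = [("<20", 400, 20), ("20-39", 300, 60), (">=40", 100, 120)]
woe, iv = woe_iv(tbsa_bins)
for label, value in woe.items():
    print(f"{label}: WoE = {value:.3f}")
print(f"IV = {iv:.3f}")
```

A bin with a strongly negative WoE under this convention holds a disproportionate share of deaths, and a larger IV indicates that the binned variable separates survivors from non-survivors more strongly.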

Model performance
Discrimination was analyzed by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve; as the AUC approaches one, the discriminating power increases [16]. The Brier score was calculated; a Brier score of 0 indicates total accuracy [17]. The integrated discrimination improvement (IDI) and the net reclassification improvement (NRI) between the Hangang model and the existing models were also calculated, with the categorical option, using the 'PredictABEL' package in R. Calibration was analyzed using the Hosmer-Lemeshow goodness-of-fit test (HL test), which assesses how well the model describes the mortality pattern in the data under analysis; non-significant p-values indicate that the fit of the model is good [18]. Clinical usefulness, or the ability to make better decisions with a model than without, is not assessed by discrimination and calibration [19]. Therefore, we also performed a decision-curve analysis. The code and manual for the decision-curve analysis are publicly available (www.decisioncurveanalysis.org).
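The two discrimination and accuracy measures defined above can be computed from first principles. The sketch below is an illustration with hypothetical data, not the R code used in the study: the Brier score is the mean squared difference between the predicted probability and the observed outcome, and the AUC is computed as the proportion of (death, survivor) pairs in which the death received the higher predicted risk, with ties counted as one half.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 = perfect)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as the probability that a death is ranked above a survivor (ties count 0.5)."""
    deaths = [p for y, p in zip(y_true, y_prob) if y == 1]
    survivors = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pd > ps else 0.5 if pd == ps else 0.0
        for pd in deaths
        for ps in survivors
    )
    return wins / (len(deaths) * len(survivors))


# Hypothetical outcomes (1 = died) and predicted mortality probabilities.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p))          # 0.75
print(brier_score(y, p))  # approximately 0.158
```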

Comparison of baseline characteristics between the derivation group and validation group
In total, 2009 patients were included in this study; they were then divided into the derivation (n = 1406) and validation (n = 603) groups. The overall median age was 47.0 (38.0-56.0) years and participants were older in the validation group than in the derivation group (49.0 years vs 46.0 years, p = 0.003). The overall % TBSA burned was 30.0%; there was no significant difference between the two groups based on % TBSA burned (p = 0.127). Inhalation injuries were significantly more frequent in the derivation group (57.3% vs 51.1%, p = 0.011) and the patients in the validation group had more medical comorbidities (46.8% vs 21.0%, p<0.001).
Overall mortality was 21.7% and there was no significant difference between the two groups. The validation group differed significantly from the derivation group in patient characteristics, with the exception of sex (p = 0.409) and the type of burn (p = 0.099). The scores of the burn-specific prediction models (ABSI, rBaux, Ryan, BOBI, and FLAMES) did not differ significantly between the two groups. Among the prediction models for ICU patients, only the SOFA score did not differ significantly (Table 4). All physiologic variables are shown in Table 4.

Validation of the new model (Hangang) in the validation cohort
Even though the demographic and physiologic characteristics of the validation group differed from those of the derivation group, the Hangang model showed improved prediction of the risk of mortality. The Hangang model showed good calibration (HL test, χ2 = 8.785, p = 0.361) for predicting mortality; this was reinforced by the highest AUC (0.943) and the lowest Brier score (0.068). The NRI and IDI when compared with FLAMES were 0.124 (p = 0.003) and 0.079 (p < 0.001), respectively. Among the prediction models tested for ICU patients, SAPS II had the highest AUC (0.860), an accuracy of 0.786, the lowest Brier score (0.115), and an HL test χ2 of 6.489 (p = 0.593) (Table 6). The calibration plots for all the existing models included in this study are shown in Figs 1 and 2. The decision curve indicates that the Hangang model was the best for predicting the probability of mortality (Fig 3).
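To make the two improvement measures reported above concrete, the sketch below computes the IDI and a categorical NRI for a new model against a reference model. It is an illustration and not the 'PredictABEL' implementation; the risk cutoff of 0.5 and the data are hypothetical, since the risk categories used in the study are not stated in the text.

```python
def _mean(values):
    return sum(values) / len(values)


def idi(y_true, p_old, p_new):
    """IDI: gain in mean predicted risk among events minus gain among non-events."""
    events = [(po, pn) for y, po, pn in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(po, pn) for y, po, pn in zip(y_true, p_old, p_new) if y == 0]
    gain_events = _mean([pn for _, pn in events]) - _mean([po for po, _ in events])
    gain_nonevents = _mean([pn for _, pn in nonevents]) - _mean([po for po, _ in nonevents])
    return gain_events - gain_nonevents


def categorical_nri(y_true, p_old, p_new, cutoffs):
    """Categorical NRI: net upward moves among events plus net downward moves among non-events."""
    def category(p):
        return sum(p >= c for c in cutoffs)  # index of the risk category

    up_e = down_e = up_n = down_n = 0
    for y, po, pn in zip(y_true, p_old, p_new):
        move = category(pn) - category(po)
        if y == 1:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    n_events = sum(y_true)
    n_nonevents = len(y_true) - n_events
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


# Hypothetical outcomes (1 = died) and predictions from a reference and a new model.
y = [1, 1, 0, 0]
p_reference = [0.4, 0.6, 0.3, 0.6]
p_new_model = [0.7, 0.8, 0.2, 0.4]
print(round(idi(y, p_reference, p_new_model), 3))          # 0.4
print(categorical_nri(y, p_reference, p_new_model, [0.5]))  # 1.0
```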

Discussion
Despite the existence of several prediction models, there are not many realistic models that accurately predict the outcomes of burn patients [20]. The variety of prediction models suggests that there is no ideal model to predict outcomes accurately in every population [2]. The ideal prediction model is generally simple, reliable, and objective (observer independent) [13]. However, most burn-specific prediction models may have difficulty accurately reflecting the risk of mortality, which has changed as a result of advances in burn treatment. This is because these models consist of patient-related variables (such as age and % TBSA) and do not contain objective laboratory values [21]. The % TBSA burned was the most powerful predictor in this study; however, its measurement depends on the experience of the treating physician, and the estimation error can be up to 20% among inexperienced physicians [22]. Therefore, in hospitals that are not specialized in treating burn patients, such errors can affect the model and make it difficult to accurately predict mortality. To compensate for these errors, prediction models should include objective laboratory results.
We assessed the validity of the prediction models by calibration and discrimination. Additionally, we assessed the ability to make better decisions with a model than without by conducting a decision-curve analysis [23]. The net benefit (NB) of the Hangang model was higher than that of the other prediction models for ICU patients and, with the exception of the extremes, higher than that of the other models for burn patients. These findings suggest that the Hangang model assists in making better decisions for the prediction of mortality. Our model ensured accuracy, reliability, and objectivity by adding seven treatment-associated variables (lactate, pH, creatinine, PT, bilirubin, LD, serum myoglobin) to the three variables (age, % TBSA burned, inhalation injury) that are commonly used in existing burn-specific prediction models, with the exception of FLAMES. Our model showed superiority when compared to the other existing models.
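The net benefit plotted in a decision curve follows a simple formula: at a threshold probability pt, NB = TP/n − (FP/n) × pt/(1 − pt), where TP and FP are the numbers of true and false positives when patients with predicted risk at or above pt are classified as high risk. The sketch below is a minimal illustration with hypothetical data and is not the code distributed at www.decisioncurveanalysis.org.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of classifying patients with predicted risk >= threshold as high risk."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = died) and predicted mortality probabilities.
y = [1, 1, 0, 0, 0]
p_model = [0.9, 0.6, 0.7, 0.2, 0.1]
p_treat_all = [1.0] * len(y)  # reference strategy: everyone classified as high risk

for pt in (0.25, 0.5, 0.75):
    print(pt, net_benefit(y, p_model, pt), net_benefit(y, p_treat_all, pt))
```

A model is considered useful over the range of thresholds where its net benefit exceeds both the "treat all" strategy shown here and the "treat none" strategy, whose net benefit is zero.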
Among the laboratory variables included in this model, serum myoglobin and LD were not used as predictors for mortality in other prediction models. Serum myoglobin is associated with the depth and severity of the burn; previous studies have shown that burn patients with high myoglobinemia have a high risk of mortality [24,25]. LD is also associated with burn disease and mortality in patients with major burns [25-27]. FLAMES, like our Hangang model, includes physiologic variables (in the form of the APACHE II score); we inferred that the Hangang model had better predictive ability than FLAMES because burn-specific serum myoglobin and LD were included and other non-significant variables were excluded. Prediction models for the general ICU such as APACHE II, SAPS II, SOFA, and LODS showed poor predictability in critical burn patients. Therefore, caution should be taken when applying general prediction models to burn patients because they do not take into account the profound physiological effects of the burn itself, although they may prove valid in general critically ill patients [20].
This study was subject to several limitations. First, we did not validate the Hangang prediction score externally at other hospitals, because our burn center is the only burn center run by Hallym University and has been designated as "The Emergency Center for Burn Care" by the Ministry for Health, Welfare, and Family Affairs in South Korea. However, validation was performed in the cohort recently treated in our center. Second, our study group did not include pediatric burn patients due to their different physiologic characteristics. Further studies including pediatric burn patients are needed. Third, not all patients who were admitted to the BICU were included in this study; only acute burn patients who were admitted within 24 hours after injury were included in order to exclude other confounding factors. Fourth, although we collected the worst value over 24 hours for each laboratory variable to minimize other influencing factors, the seven laboratory variables included in our model might have been affected by the level of fluid resuscitation, which in turn could have affected our model.
Despite these limitations, it is important to recognize that our model reflects outcomes as a result of care provided under the current standards. Although our Hangang prediction model was developed in a single center, this is, to our knowledge, the largest study to date to test a new model for the prediction of mortality among burn patients. In the future, the model might require modification to assist with decision-making as new therapies are introduced. We advocate that physicians who do not have much experience treating burns should consult experienced doctors when using this prediction model.

Conclusions
There are many severity scoring systems widely used in the ICU to predict outcomes and characterize the severity of the disease. All of these scoring systems have been developed for the mixed population in the ICU. Their accuracy among subgroups, such as burn patients, is questionable and therefore, burn-specific scoring systems are required for accurate prediction. This model reflects the burn specific risk factors such as serum myoglobin and LD as well as current risk factors for mortality; it is a highly discriminatory and well-calibrated model for the prediction of mortality in adult burn patients.