Testing the Prognostic Accuracy of the Updated Pediatric Sepsis Biomarker Risk Model

Background: We previously derived and validated a risk model to estimate mortality probability in children with septic shock (PERSEVERE; PEdiatRic SEpsis biomarkEr Risk modEl). PERSEVERE uses five biomarkers and age to estimate mortality probability. After the initial derivation and validation of PERSEVERE, we combined the derivation and validation cohorts (n = 355) and updated PERSEVERE. An important step in the development of updated risk models is to test their accuracy using an independent test cohort. Objective: To test the prognostic accuracy of the updated version of PERSEVERE in an independent test cohort. Methods: Study subjects were recruited from multiple pediatric intensive care units in the United States. Biomarkers were measured in 182 pediatric subjects with septic shock using serum samples obtained during the first 24 hours of presentation. The accuracy of the PERSEVERE 28-day mortality risk estimate was tested using diagnostic test statistics, and the net reclassification improvement (NRI) was used to test whether PERSEVERE adds information to a physiology-based scoring system. Results: Mortality in the test cohort was 13.2%. Using a risk cut-off of 2.5%, the sensitivity of PERSEVERE for mortality was 83% (95% CI 62–95), specificity was 75% (68–82), positive predictive value was 34% (22–47), and negative predictive value was 97% (91–99). The area under the receiver operating characteristic curve was 0.81 (0.70–0.92). The false positive subjects had a greater organ failure burden and a longer intensive care unit length of stay than the true negative subjects. When PERSEVERE was added to a physiology-based scoring system, the net reclassification improvement was 0.91 (0.47–1.35; p < 0.001). Conclusions: The updated version of PERSEVERE reliably estimates mortality probability in a heterogeneous test cohort of children with septic shock and provides information beyond that of a physiology-based scoring system.


Introduction
Heterogeneity is a major feature of pediatric septic shock, including widely variable mortality risk [1]. In the absence of tools to accurately assess mortality risk, clinicians have little objective information with which to benchmark septic shock outcomes, adjust for risk in analyses of clinical data, risk-stratify patients for interventional clinical trials, and decide which patients need the most aggressive treatment and which do not. We recently reported the derivation and validation of the pediatric sepsis biomarker risk model (PERSEVERE; PEdiatRic SEpsis biomarkEr Risk modEl) [2]. PERSEVERE was derived using a Classification and Regression Tree (CART) approach to predict 28-day mortality. The derivation selected five biomarkers and age from among twelve biomarkers (serum proteins) and clinical variables potentially associated with outcome. Importantly, PERSEVERE was derived using data measured during the first 24 hours of presentation to the pediatric intensive care unit (PICU) with septic shock, which is an optimal time for risk stratification. In addition, participants were drawn from multiple centers in the United States [3][4][5].
Updating risk models using larger learning data sets can enhance generalizability and reliability. After the initial derivation and validation of PERSEVERE, we therefore combined the derivation and validation cohorts (n = 355) and updated PERSEVERE [2]. The purpose of the current study is to formally test the prognostic accuracy of the updated version of PERSEVERE using an independent test cohort, which is a critical next step after updating the model. The study is reported following the STARD (STAndards for the Reporting of Diagnostic accuracy studies) initiative [6].

Ethics statement and test cohort study subjects
The test cohort subjects were pooled from four sources, all of which used the same definition of septic shock [7]. Eighty-seven subjects were included from an ongoing genomics study in pediatric septic shock being conducted at 17 participating institutions [8][9][10][11][12][13][14][15][16][17]. Briefly, children ≤10 years of age admitted to the PICU and meeting pediatric-specific criteria for septic shock were eligible for enrollment. After written informed consent was obtained from parents or legal guardians, serum samples were obtained within 24 hours of initial presentation to the PICU with septic shock. The current analysis included subjects enrolled between September 2011 and May 2013.
Sixty subjects were included from among those enrolled in an ongoing quality improvement program at Cincinnati Children's Hospital Medical Center (CCHMC), Cincinnati, Ohio. The program uses PERSEVERE to benchmark septic shock outcomes for all patients admitted to the CCHMC PICU with septic shock. Enrollment procedures are identical to those described above, except that there is no age restriction and the CCHMC IRB has granted a waiver of informed consent. Serum samples are obtained from residual blood samples in the clinical laboratory. Subjects from this source were enrolled between May 2012 and May 2013.
Nineteen subjects (age range: 8 days to 18 years) were participants in a prospective, observational study at Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, Illinois, evaluating nitric oxide metabolism and mitochondrial function in children with septic shock [18]. Of the 30 subjects with septic shock enrolled in that study, 19 had serum samples available for analysis. The current analysis included subjects enrolled between May 2009 and June 2010.
Sixteen subjects (age range: 2 to 20 years old) were participants in a prospective, observational study at Yale-New Haven Children's Hospital, New Haven, Connecticut, evaluating angiopoietin levels in children with septic shock [19]. Of the 17 subjects with septic shock enrolled in that study, 16 had serum samples available for analysis. The current analysis included subjects enrolled between September 2009 and December 2011.

Study procedures
For all studies, annotated clinical and laboratory data were collected daily while the participant was in the PICU. Illness severity was calculated prospectively using the Pediatric Risk of Mortality (PRISM) score [20]. The number of organ failures during the initial 7 days of PICU admission was recorded using pediatric-specific criteria [7]. PICU free days were calculated by subtracting the actual PICU length of stay from a theoretical maximum PICU length of stay of 28 days. Patients with a PICU length of stay greater than 28 days and patients who died during the 28-day study period were classified as having zero PICU free days. All-cause mortality was tracked up to 28 days after meeting criteria for septic shock.
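The PICU free days rule can be stated as a small function. This is a minimal sketch assuming length of stay is recorded in days; the function name `picu_free_days` and its parameters are hypothetical and not taken from the study's data system.

```python
def picu_free_days(picu_los_days: float, died_by_day_28: bool, horizon: int = 28) -> float:
    """PICU free days: theoretical maximum (28 days) minus actual PICU length of stay.

    Deaths within the 28-day period and stays longer than 28 days both score zero.
    """
    if picu_los_days < 0:
        raise ValueError("length of stay cannot be negative")
    if died_by_day_28 or picu_los_days > horizon:
        return 0.0
    return horizon - picu_los_days


# A survivor discharged from the PICU after 5 days has 23 PICU free days.
print(picu_free_days(5, died_by_day_28=False))
```

Scoring deaths as zero keeps the outcome on a single scale in which fewer free days always means a worse course.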

Statistical Analysis
Initially, data are described using medians, interquartile ranges, frequencies, and percentages. Comparisons between groups used the Mann-Whitney U-test, Chi-square, or Fisher's Exact tests as appropriate. Descriptive statistics and comparisons used SigmaStat Software (Systat Software, Inc., San Jose, CA).
CART analysis was used to derive, validate, and update PERSEVERE (Salford Predictive Modeler v6.6, Salford Systems, San Diego, CA) [2,21,22]. Performance of the resulting decision tree in this new test cohort is reported using diagnostic test statistics with 95% confidence intervals computed using the score method as implemented by the VassarStats Website for Statistical Computation [23]. The net reclassification improvement (NRI) was used to estimate the incremental predictive ability of the biomarker-based model compared to using PRISM scores alone [24]. The NRI was computed using the R package Hmisc. Table 1 describes the new test cohort (n = 182) and compares it to the previously published derivation cohort (n = 355). The test cohort had a higher median age and a greater proportion of subjects with no race reported. No other differences were observed. Within the test cohort, the only difference between survivors and non-survivors was the median PRISM score.
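The score-method confidence intervals can be sketched in a few lines. This is an illustrative Python version, not the VassarStats code; we assume the continuity-corrected Wilson score interval (Newcombe's method 4), and the function name `score_ci_cc` is hypothetical. With the 2 × 2 counts implied by the classification results (20 true positives, 4 false negatives, 39 false positives, 119 true negatives), this variant reproduces the intervals quoted in the abstract.

```python
import math


def score_ci_cc(x: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval with continuity correction for a proportion x/n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p = x / n
    z2 = z * z
    denom = 2 * (n + z2)
    # Lower bound is fixed at 0 when no successes are observed.
    if x == 0:
        lower = 0.0
    else:
        lower = (2 * n * p + z2 - 1
                 - z * math.sqrt(z2 - 2 - 1 / n + 4 * p * (n * (1 - p) + 1))) / denom
    # Upper bound is fixed at 1 when every observation is a success.
    if x == n:
        upper = 1.0
    else:
        upper = (2 * n * p + z2 + 1
                 + z * math.sqrt(z2 + 2 - 1 / n + 4 * p * (n * (1 - p) - 1))) / denom
    return max(0.0, lower), min(1.0, upper)


# Sensitivity (20/24), specificity (119/158), PPV (20/59), NPV (119/123).
for label, x, n in [("sensitivity", 20, 24), ("specificity", 119, 158),
                    ("PPV", 20, 59), ("NPV", 119, 123)]:
    lo, hi = score_ci_cc(x, n)
    print(f"{label}: {100 * x / n:.0f}% (95% CI {100 * lo:.0f}-{100 * hi:.0f})")
```

The continuity correction widens the interval slightly compared with the plain Wilson interval, which matters most for small denominators such as the 24 non-survivors.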

Testing the model
The test cohort subjects were classified based on the decision rules of the updated model, without any modifications. Figure 1 shows the classification of the test cohort subjects according to the updated decision tree, which includes three low-risk terminal nodes (TN2, TN4, and TN7; mortality probability 0.000 to 0.025), three intermediate-risk terminal nodes (TN1, TN3, and TN5; mortality probability 0.182 to 0.267), and two high-risk terminal nodes (TN6 and TN8; mortality probability 0.472 to 0.625). There were 123 test cohort subjects classified as low risk and 59 subjects classified as either intermediate or high risk. Among the low-risk subjects, four (3.3%) had died by 28 days. Among the intermediate- and high-risk subjects, 20 (33.9%) had died by 28 days. Table 2 shows the diagnostic test characteristics of the decision tree in the test cohort.
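The diagnostic point estimates follow directly from these counts once intermediate or high risk is treated as a predicted death. The sketch below is an illustration under that dichotomous reading; the class name `TwoByTwo` is hypothetical. It should yield the sensitivity, specificity, and predictive values reported in the abstract (about 83%, 75%, 34%, and 97%), the overall mortality of 13.2%, and the roughly ten-fold mortality ratio between the two groups.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # predicted intermediate/high risk, died by day 28
    fp: int  # predicted intermediate/high risk, survived
    fn: int  # predicted low risk, died by day 28
    tn: int  # predicted low risk, survived

    @classmethod
    def from_risk_groups(cls, n_low: int, deaths_low: int,
                         n_high: int, deaths_high: int) -> "TwoByTwo":
        return cls(tp=deaths_high, fp=n_high - deaths_high,
                   fn=deaths_low, tn=n_low - deaths_low)

    def metrics(self) -> dict[str, float]:
        n = self.tp + self.fp + self.fn + self.tn
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "mortality_overall": (self.tp + self.fn) / n,
            # Mortality in the intermediate/high group divided by mortality in the low group.
            "mortality_ratio": (self.tp / (self.tp + self.fp)) / (self.fn / (self.fn + self.tn)),
        }


# Counts from the text: 123 low risk (4 deaths), 59 intermediate/high risk (20 deaths).
table = TwoByTwo.from_risk_groups(n_low=123, deaths_low=4, n_high=59, deaths_high=20)
for name, value in table.metrics().items():
    print(f"{name}: {value:.3f}")
```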
When the information in PERSEVERE was added to the information in PRISM, the NRI was 0.906 (95% CI: 0.465–1.350; p < 0.001). The NRI measures how much the classification of subjects improves when new information is added to an existing model [24]. The NRI ranges between −2 and +2. A score of −2 indicates that all true positives are reclassified as false negatives and all true negatives are reclassified as false positives, and no false classifications are reclassified as true classifications. Conversely, a score of +2 indicates that adding the information correctly reclassifies every case. Our results demonstrate that PERSEVERE provides significant additional classification value beyond the information included in PRISM.
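To make the −2 to +2 range concrete, the NRI can be computed from paired risk estimates. This is a minimal sketch of the category-free form (the variant computed by Hmisc's `improveProb`); the text does not state which variant was used, so that choice is an assumption. `category_free_nri` is a hypothetical name, and the toy probabilities below are invented for illustration, not study data. The step of building the combined PRISM plus PERSEVERE model is not reproduced here.

```python
from typing import Sequence


def category_free_nri(p_old: Sequence[float], p_new: Sequence[float],
                      died: Sequence[int]) -> float:
    """(share of events moved up - moved down) + (share of non-events moved down - moved up)."""
    if not (len(p_old) == len(p_new) == len(died)):
        raise ValueError("inputs must have equal length")
    events = [(o, n) for o, n, y in zip(p_old, p_new, died) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(p_old, p_new, died) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one death and one survivor")
    up_ev = sum(n > o for o, n in events) / len(events)
    down_ev = sum(n < o for o, n in events) / len(events)
    up_ne = sum(n > o for o, n in nonevents) / len(nonevents)
    down_ne = sum(n < o for o, n in nonevents) / len(nonevents)
    # Each bracket lies in [-1, 1], so the total lies in [-2, 2].
    return (up_ev - down_ev) + (down_ne - up_ne)


# Hypothetical toy data: "old" = physiology score alone, "new" = score plus biomarkers.
p_old = [0.20, 0.30, 0.10, 0.20, 0.10, 0.30, 0.05]
p_new = [0.40, 0.35, 0.05, 0.10, 0.05, 0.30, 0.02]
died = [1, 1, 1, 0, 0, 0, 0]
print(round(category_free_nri(p_old, p_new, died), 3))
```

In the toy data, two of three deaths move up and one moves down (+1/3), while three of four survivors move down and none move up (+3/4).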

Secondary considerations
In our prior study, we noted that subjects classified as false positives (i.e. those predicted to die, but who actually survived) had greater illness severity than the true negative subjects (i.e. those predicted to survive, who did survive), as measured by PICU length of stay and organ failure burden [2]. We conducted a similar secondary analysis for the current test cohort. Table 3 shows that the false positive subjects in the test cohort had greater illness severity than the true negative subjects as measured by PICU length of stay, PICU free days, organ failure burden, and organ failure duration.

Discussion
Risk models require updating and ongoing prospective evaluation in order to enhance generalizability and acceptance. We have prospectively evaluated the prognostic accuracy of the updated version of PERSEVERE and found that it reliably estimates mortality probability in a heterogeneous test cohort. Among subjects predicted to be at intermediate or high risk, the overall mortality rate was 33.9%, whereas subjects classified as low risk had an overall mortality rate of 3.3%. This dichotomous interpretation of PERSEVERE partitions a heterogeneous cohort of patients with septic shock into two groups with a ten-fold difference in mortality. A more comprehensive approach is to consider each terminal node in the decision tree and to assign each patient a mortality risk based on the probability of death in that terminal node. This yields a range of clinically relevant mortality probabilities and allows patients to be partitioned into low-, intermediate-, and high-risk groups. The current study also demonstrates that PERSEVERE adds significant prognostic value to a physiology-based scoring system. PERSEVERE generates reliable mortality risk predictions but is imperfect; 21% of the test cohort subjects were false positives. This is to be expected if therapeutic interventions modify the outcomes of higher risk patients; the false positive subjects likely represent patients for whom therapeutic interventions prevented the predicted death. Our secondary analysis of the false positive and true negative subjects supports this interpretation: false positive subjects had a greater burden and duration of organ failure, and a longer PICU length of stay, than true negative subjects, suggesting that PERSEVERE accurately identified higher acuity patients. Thus, even when the prediction is a false positive, the information is likely clinically relevant.
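The two ways of reading the tree can be expressed as a lookup from terminal node to risk group. This sketch uses only the node labels and probability ranges given for Figure 1; the dictionary `RISK_GROUP` and the helper functions are hypothetical names, and the individual node probabilities would need to be read from the figure.

```python
# Terminal-node risk groups of the updated decision tree, as listed for Figure 1.
RISK_GROUP = {
    "TN2": "low", "TN4": "low", "TN7": "low",                             # 0.000 to 0.025
    "TN1": "intermediate", "TN3": "intermediate", "TN5": "intermediate",  # 0.182 to 0.267
    "TN6": "high", "TN8": "high",                                         # 0.472 to 0.625
}


def risk_group(terminal_node: str) -> str:
    """Three-level reading: low, intermediate, or high risk."""
    return RISK_GROUP[terminal_node]


def predicted_death(terminal_node: str) -> bool:
    """Dichotomous reading: intermediate or high risk counts as a predicted death."""
    return RISK_GROUP[terminal_node] != "low"


# Hypothetical list of terminal nodes reached by five patients.
nodes = ["TN2", "TN5", "TN8", "TN7", "TN1"]
print([risk_group(n) for n in nodes])
print(sum(predicted_death(n) for n in nodes), "predicted deaths")
```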
The current test cohort was significantly older than the derivation cohort, with almost one-third of the subjects older than 10 years of age. The issue of age is particularly important because the original derivation of PERSEVERE was based exclusively on children less than or equal to 10 years of age, and because developmental age strongly influences the host response during septic shock [13,25]. Beyond extending the likely generalizability of PERSEVERE to subjects older than 10 years, the test cohort was pooled from four different sources, each with its own potential for selection bias. This suggests that PERSEVERE has the potential both for broad applicability in pediatric septic shock and for reliable performance in future prospective testing.
PERSEVERE has various potential clinical applications. First, it can be used as a benchmark to objectively evaluate septic shock outcomes. Poor outcomes in patients with a low PERSEVERE-based mortality risk could suggest clinical underperformance and the need to review the clinical care process, while good outcomes in patients with a high PERSEVERE-based mortality risk could indicate better than expected clinical performance. We note that the actual mortality rate of the test cohort (13.2%) was higher than the overall mortality predicted by PERSEVERE (9.3%; 95% CI 7.2 to 11.3). This discrepancy reflects the four false negative classifications in terminal nodes 2 and 7. Three of these deaths occurred at a single center, and in the quality peer review of these subjects, the deaths were deemed unlikely to reflect a deficit in the care process. All three subjects had "do not resuscitate" orders in place and died after removal of advanced life support, upon determination by the family and health care team that further support was futile. Two subjects had chronic multiorgan dysfunction associated with complications following bone marrow transplantation. The third subject had a lethal, progressive neurodegenerative disease. This illustrates how PERSEVERE can prompt a quality review of the care process, and the challenges inherent in assigning a mortality probability to patients with complex co-morbidities. Future calibrations of PERSEVERE may require the consideration of a co-morbidity variable, including immune function status. We note, however, that many test cohort subjects with significant co-morbidities (n = 45) or immune suppression (n = 12) were correctly classified by PERSEVERE.
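The benchmarking comparison can be summarized as an observed-to-expected (O/E) mortality ratio, a common benchmarking statistic that the text itself does not report. This sketch assumes expected mortality is the mean of each patient's terminal-node probability, which is our assumption about how the 9.3% figure was obtained; the function names are hypothetical. The observed count of 24 deaths among 182 subjects comes from the classification results.

```python
from typing import Sequence


def expected_mortality(predicted_risks: Sequence[float]) -> float:
    """Expected mortality as the mean of each patient's terminal-node mortality probability."""
    if not predicted_risks:
        raise ValueError("need at least one patient")
    return sum(predicted_risks) / len(predicted_risks)


def observed_to_expected(deaths: int, n: int, expected: float) -> float:
    """O/E ratio: values above 1 mean more deaths than the model predicted."""
    if n <= 0 or expected <= 0:
        raise ValueError("n and expected must be positive")
    return (deaths / n) / expected


# Test cohort: 24 deaths among 182 subjects versus 9.3% predicted mortality.
print(round(observed_to_expected(24, 182, 0.093), 2))
```

An O/E ratio above 1, as here, is the kind of signal that would prompt the case-level quality review described above.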
Second, PERSEVERE could be used to conduct risk-stratified analyses of clinical data, as demonstrated in a recent study by our group [26]. We found that the influence of positive fluid balance on pediatric septic shock outcomes depended on risk as predicted by PERSEVERE. A positive fluid balance was associated with poor outcomes in the low mortality risk group, but not in the intermediate or high mortality risk groups. Third, PERSEVERE could be used to stratify patients for interventional clinical trials, and possibly to inform individual patient decision-making. These latter two applications will require the development of a rapid assay platform to generate biomarker data in a timely manner. No such assay platform currently exists, but the technology to develop one is readily available.
In conclusion, we have taken an important next step in the development of PERSEVERE. We have prospectively tested the prognostic accuracy of the updated version of PERSEVERE and found that it can be used to assign a reliable mortality probability in children with septic shock. This tool has various potential applications in the field of pediatric septic shock.