
Fig 1.

Experimental workflow.

Data were obtained from the CHOP NICU Sepsis Registry (NSR). Domain expert review was used to identify an initial feature set. Continuous data were normalized, and mean imputation was used to complete missing data. Nested k-fold cross-validation, in which the complete dataset is divided into k stratified bins of approximately equal size (k = 10 in our study), was used to train and evaluate models. The curved arrows indicate loops over the data folds. The outer loop runs over all k folds; at each iteration, one fold is reserved for testing. The remaining k-1 folds are passed to the inner loop, which performs standard k-fold cross-validation to automatically select features and model tuning parameters. Mutual information between individual features and the target class was used for automated feature selection. The model is then trained on the k-1 folds and evaluated on the held-out fold. This process is repeated k times so that each data fold is used exactly once for evaluation.
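A minimal sketch of the preprocessing steps described above (mean imputation of missing values followed by normalization of continuous features), using scikit-learn; the toy feature matrix is purely illustrative:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with one missing entry (hypothetical values).
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [2.0, 4.0],
              [3.0, 5.0]])

# Mean imputation completes missing data column by column.
X_complete = SimpleImputer(strategy="mean").fit_transform(X)

# Z-score normalization of the continuous features.
X_norm = StandardScaler().fit_transform(X_complete)
```

In practice the imputer and scaler would be fit only on training folds and applied to the held-out fold, e.g. inside a scikit-learn Pipeline.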


Fig 2.

Timeline representation of a hypothetical NICU hospitalization and corresponding sepsis data sampling scheme.

Sepsis evaluation times are indicated by t0 and k0. For this hypothetical scenario, case data are taken from the two 44-hour windows, [t-48, t-4] and [k-48, k-4], ending 4 hours prior to the blood draws at t0 and k0 for the two sepsis evaluations. Time indices, t-n, indicate times n hours prior to blood draw. In this scenario, individual control start times are randomly selected from all candidate control start times (CCST), indicated by the shaded regions. CCST include all times, starting on day 3 after admission (indicated by x0), that are separated by at least 10 days from any sepsis evaluation time. For a randomly selected control start time, b0, control data are taken from the 44-hour window [b-48, b-4].
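The sampling rule above can be sketched in a few lines; hour indices and the helper name are assumptions for illustration, not taken from the study code:

```python
import random

def candidate_control_starts(stay_hours, sepsis_eval_hours,
                             earliest=3 * 24, min_gap=10 * 24):
    """All hours from day 3 of admission onward (x0) that are at least
    10 days away from every sepsis evaluation time."""
    return [t for t in range(earliest, stay_hours + 1)
            if all(abs(t - s) >= min_gap for s in sepsis_eval_hours)]

# Hypothetical 40-day stay with one sepsis evaluation at hour 500.
ccst = candidate_control_starts(stay_hours=40 * 24, sepsis_eval_hours=[500])
b0 = random.choice(ccst)        # randomly selected control start time
window = (b0 - 48, b0 - 4)      # 44-hour control window [b-48, b-4]
```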


Fig 3.

Study flow diagram.

Excluded episodes, Indeterminate*: episodes with cultures pending at the time of data extraction, results most likely representing contaminants, and episodes with bacteria isolated from sources other than blood.


Table 1.

Demographics at time of initial sepsis evaluation.

Columns 2–4 indicate values by evaluation result (individuals may have multiple evaluations). The last column indicates values for the overall study population. Values in brackets indicate the number of individuals.


Table 2.

Domain-expert-identified features with percent missing values.

Heart rate, temperature, respiratory rate, and mean arterial blood pressure differences are the differences between the most recent measurement and the average over the previous 24 hours.


Fig 4.

Pseudo-code for nested k-fold cross-validation.

The inner loop performs cross-validation to identify the best features and model hyper-parameters using the k-1 data folds available at each iteration of the outer loop. The model is trained once for each outer loop step and evaluated on the held-out data fold. This process yields k evaluations of the model performance, one for each data fold, and allows the model to be tested on every sample.
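The loop structure described above maps directly onto scikit-learn primitives. A minimal sketch, assuming a toy dataset and a hypothetical parameter grid (the study's actual grids are in Table 3): the inner GridSearchCV selects features and hyper-parameters on the k-1 available folds, and the outer cross_val_score holds each fold out once:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Toy stand-in for the registry data.
X, y = make_classification(n_samples=150, n_features=10, random_state=0)

# Inner loop: univariate feature selection by mutual information plus
# hyper-parameter search, each evaluated by standard k-fold CV.
pipe = Pipeline([
    ("select", SelectKBest(mutual_info_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])
inner = GridSearchCV(
    pipe,
    param_grid={"select__k": [3, 5], "clf__C": [0.1, 1.0]},  # hypothetical grid
    cv=StratifiedKFold(n_splits=5),
)

# Outer loop: each stratified fold is held out once for evaluation,
# yielding k performance estimates.
scores = cross_val_score(inner, X, y, cv=StratifiedKFold(n_splits=10))
```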


Table 3.

Hyper-parameters and value ranges evaluated in each fold of nested cross-validation procedure.

For models with more than one parameter, all combinations of parameter values (the cross-product of the value ranges) were evaluated. Detailed definitions of each parameter are available in the Python scikit-learn documentation (https://scikit-learn.org/stable/modules/classes.html).
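scikit-learn's ParameterGrid materializes exactly this cross-product; the grid below is hypothetical, not the study's:

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical two-parameter grid: 4 values x 2 values = 8 combinations.
grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]}
combos = list(ParameterGrid(grid))
```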


Fig 5.

Receiver operating characteristic (ROC) curves for the logistic regression model and the gradient boosting model on the CPOnly (left) and CP+Clinical (right) datasets, respectively. Each panel presents three curves: the solid black curve corresponds to the iteration of the nested cross-validation procedure with the median area under the curve (AUC); the two dashed curves represent the iterations with the minimum and maximum AUC.
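A per-iteration ROC curve and its AUC can be computed from held-out labels and predicted probabilities; the arrays below are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical held-out labels and predicted sepsis probabilities
# from one cross-validation iteration.
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # points on the ROC curve
auc = roc_auc_score(y_true, y_prob)               # area under that curve
```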


Table 4.

Area under the receiver operating characteristic curve for CPOnly (controls and culture positive cases) and CP+Clinical (controls, culture positive cases, and clinically positive cases) for each model.

Each value is computed as the mean over 10 iterations of cross-validation. Values in brackets indicate performance range over the 10 iterations. Bold text indicates highest performance in each column. The null hypothesis of equal inter-model distributions was rejected by the Friedman rank sum test with p-values of <0.001 for both the CPOnly and CP+Clinical datasets.
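The Friedman rank sum test compares the models' per-fold scores over the same cross-validation folds. A sketch with SciPy, using made-up per-fold AUCs for three models (the study's values are in the table):

```python
from scipy.stats import friedmanchisquare

# Hypothetical per-fold AUCs for three models on the same 10 folds.
model_a = [0.90, 0.88, 0.91, 0.89, 0.92, 0.90, 0.87, 0.93, 0.91, 0.90]
model_b = [0.85, 0.84, 0.86, 0.83, 0.87, 0.85, 0.82, 0.88, 0.86, 0.85]
model_c = [0.80, 0.79, 0.81, 0.78, 0.82, 0.80, 0.77, 0.83, 0.81, 0.80]

# Null hypothesis: the models' score distributions are equal across folds.
stat, p = friedmanchisquare(model_a, model_b, model_c)
```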


Table 5.

Classifier model prediction performance on CPOnly (controls and culture positive cases) for a fixed sensitivity of 0.8.

The probability-of-sepsis threshold was adjusted individually for each model in each cross-validation run to achieve 0.8 sensitivity. Each metric value is computed as the mean over 10 iterations of cross-validation. Values in brackets indicate the performance range over the 10 iterations. Bold text indicates the highest performance in each column.
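One way to fix sensitivity at 0.8 is to walk the ROC curve and take the first operating point whose true positive rate reaches 0.8; the labels and probabilities below are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical held-out labels and predicted sepsis probabilities.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_prob = np.array([0.05, 0.1, 0.2, 0.3, 0.6, 0.35, 0.55, 0.7, 0.8, 0.9])

# Keep every threshold so the exact 0.8-sensitivity point is available.
fpr, tpr, thr = roc_curve(y_true, y_prob, drop_intermediate=False)
idx = np.argmax(tpr >= 0.8)   # first operating point reaching 0.8 sensitivity
threshold = thr[idx]
sensitivity = tpr[idx]
```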


Table 6.

Classifier model prediction performance on CP+Clinical (controls, culture positive cases, and clinically positive cases) for a fixed sensitivity of 0.8.

The probability-of-sepsis threshold was adjusted individually for each model in each cross-validation run to achieve 0.8 sensitivity. Each metric value is computed as the mean over 10 iterations of cross-validation. Values in brackets indicate the performance range over the 10 iterations. Bold text indicates the highest performance in each column.


Fig 6.

Density estimates of continuous-valued features and distributions of binary-valued features.

Only features for which the mean magnitude of the logistic regression coefficient was ≥ 0.1 and which were selected by the univariate feature selection process in more than half of the cross-validation iterations for the CPOnly dataset are shown. Dashed lines indicate controls; solid lines indicate cases. The horizontal axis indicates the normalized feature value. The vertical axis indicates the proportion of samples with that feature value.


Table 7.

Features selected by the univariate feature selection process in more than half of the cross-validation iterations for the CPOnly dataset, for which the mean magnitude of the logistic regression coefficient was ≥ 0.1.

The CPOnly Count column indicates the number of iterations out of ten for which the feature was selected. All features were used in every iteration for the CP+Clinical dataset. The CPOnly Coefficient and CP+Clinical Coefficient columns indicate the mean coefficient for the feature as learned by the logistic regression classifier for the CPOnly and CP+Clinical datasets, respectively. Positive coefficients (bold text) indicate features for which positive values are associated with an increase in the predicted sepsis probability. Negative coefficients (italic text) indicate features for which positive values are associated with a decrease in the predicted sepsis probability. The “difference” variables are positive when the value has increased compared to the patient’s average over the previous 24 hours.
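The sign interpretation of logistic regression coefficients can be seen on synthetic data: a feature that rises with the positive class gets a positive coefficient, and one that falls with it gets a negative coefficient. The construction below is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, n)

# Feature 0 increases with the positive class; feature 1 decreases with it.
X = np.column_stack([y + 0.1 * rng.normal(size=n),
                     -y + 0.1 * rng.normal(size=n)])

clf = LogisticRegression(max_iter=1000).fit(X, y)
coef = clf.coef_[0]
# coef[0] > 0: larger values raise the predicted probability of class 1.
# coef[1] < 0: larger values lower it.
```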


Table 8.

SVM with radial basis kernel performance when removing input features.

Features are removed cumulatively; that is, each row represents performance when all features listed in that row and the rows above are removed. Metric values are computed as the mean over 10 iterations of cross-validation. Bold text indicates the best performance in each column.
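A cumulative-removal ablation of this kind can be sketched as a loop that drops one more feature at each step and re-scores the model; the dataset, feature names, and removal order below are all hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy stand-in for the registry data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
feature_names = [f"f{i}" for i in range(8)]   # hypothetical feature names
removal_order = ["f3", "f0", "f5"]            # hypothetical removal order

remaining = list(range(8))
results = {}
for name in removal_order:
    remaining.remove(feature_names.index(name))   # cumulative: never re-added
    score = cross_val_score(SVC(kernel="rbf"), X[:, remaining], y,
                            cv=10).mean()
    results[name] = score   # one table row per cumulative removal
```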


Fig 7.

Learning curves (LC) for: (A) SVM on the CPOnly dataset; (B) SVM on the CP+Clinical dataset; (C) logistic regression on the CPOnly dataset; (D) logistic regression on the CP+Clinical dataset. Performance was evaluated by 10-fold cross-validation. Symbols indicate the mean value over the 10 folds, and the shaded region indicates one standard deviation. The optimal F1 score (y-axis) is 1.0.
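Curves of this shape come directly from scikit-learn's learning_curve helper, which re-trains the model on increasing fractions of the training data under 10-fold cross-validation; the dataset here is a toy substitute:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.svm import SVC

# Toy stand-in for the registry data.
X, y = make_classification(n_samples=300, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    SVC(kernel="rbf"), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5),   # 5 points along the curve
    cv=StratifiedKFold(n_splits=10),
    scoring="f1",
)
mean_f1 = val_scores.mean(axis=1)   # plotted symbols: mean over the 10 folds
std_f1 = val_scores.std(axis=1)     # shaded region: one standard deviation
```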


Table 9.

Selected studies applying machine learning for sepsis recognition.

Prediction time indicates the time of model prediction relative to the time of blood culture draw.
