Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning

doi:10.1371/journal.pone.0174708

Table 1.

Data types used in models and amount of missing or out of range data.

Table 2.

Features used for the predictive models.

Fig 1.

Pipeline for natural language processing and prediction.

Our algorithm first takes a triage note as input and processes it by tokenization followed by bigram and negation detection, the latter using a customized version of the NegEx tool [14]. The processed text is then transformed into a set of features. The Bag-of-Words features count how many times each word in our vocabulary appears in the processed note, and the Topic model features (derived using the Mallet [17] tool) measure how strongly certain topics are represented in the note. A Support Vector Machine (SVM) is then trained on these sets of features, using the SVM^perf software [15], to determine whether the patient presents with an infection.
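To make the text-processing steps in this caption concrete, the sketch below walks a single note through tokenization, negation marking, bigram generation, and Bag-of-Words counting. It is a simplified illustration, not the authors' pipeline: the trigger list, the fixed three-token negation window, the `neg_` prefix, the step ordering, and all function names are assumptions made for the example, and the rule-based negation step is only a rough stand-in for NegEx. Topic features and SVM training are omitted.

```python
import re

# Hypothetical, tiny trigger list; a real negation lexicon is far larger.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
# Tokens that close a negation scope in this sketch.
SCOPE_TERMINATORS = {".", ",", ";", "but"}
PUNCTUATION = {".", ",", ";"}
NEGATION_WINDOW = 3  # assumed maximum number of tokens one trigger negates


def tokenize(note):
    """Lowercase the note and split it into word and punctuation tokens."""
    return re.findall(r"[a-z0-9]+|[.,;]", note.lower())


def mark_negations(tokens):
    """Prefix tokens inside a negation scope with 'neg_'."""
    marked = []
    remaining = 0  # how many upcoming tokens are still negated
    for tok in tokens:
        if tok in SCOPE_TERMINATORS:
            remaining = 0
            marked.append(tok)
        elif tok in NEGATION_TRIGGERS:
            remaining = NEGATION_WINDOW
            marked.append(tok)
        elif remaining > 0:
            marked.append("neg_" + tok)
            remaining -= 1
        else:
            marked.append(tok)
    return marked


def add_bigrams(tokens):
    """Drop punctuation, then append every adjacent word pair as a bigram."""
    words = [t for t in tokens if t not in PUNCTUATION]
    bigrams = [a + "_" + b for a, b in zip(words, words[1:])]
    return words + bigrams


def featurize(note, vocabulary):
    """Return one count per vocabulary entry for the processed note."""
    terms = add_bigrams(mark_negations(tokenize(note)))
    return [terms.count(entry) for entry in vocabulary]


vocabulary = ["fever", "neg_fever", "cough", "has_cough"]
features = featurize("Pt denies fever, has cough.", vocabulary)
```

In this sketch a negated mention such as "denies fever" contributes to the `neg_fever` count rather than the `fever` count, which is the reason for running negation detection before counting. The resulting count vector is the kind of input a linear classifier such as an SVM would be trained on.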

Table 3.

Statistics of data set.

Table 4.

Performance characteristics for the SVM models.

Fig 2.

Receiver operating characteristic curve.

Vitals: Age, Gender, Severity, Temperature, Heart Rate, Respiratory Rate, Oxygen Saturation, Systolic Blood Pressure, Diastolic Blood Pressure, Pain Scale. Chief Complaint: Chief Complaint + Vitals. Bag of Words: Vitals + Chief Complaint + Triage Assessment. Topics: Vitals + Chief Complaint + Triage Assessment.

Fig 3.

Calibration plots.

We assess the models’ calibration by plotting, for each predicted probability range in increments of 0.1, the fraction of patients in that range who truly had an infection. Perfect calibration would correspond to the straight line from (0,0) to (1,1). We additionally show bar plots of the number of predictions made by each method within each probability interval. The Vitals model, which has the least data to go on, makes very few predictions of infection with probability greater than 0.5, leading to very large confidence intervals toward the upper right of the plot. The Bag of Words and Topics models are better calibrated, and are particularly accurate for the highest risk patients. Vitals: Age, Gender, Severity, Temperature, Heart Rate, Respiratory Rate, Oxygen Saturation, Systolic Blood Pressure, Diastolic Blood Pressure, Pain Scale. CC: Chief Complaint + Vitals. BoW (Bag of Words): Vitals + Chief Complaint + Triage Assessment. Topics: Vitals + Chief Complaint + Triage Assessment.
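The binning behind a calibration plot of this kind can be sketched in a few lines. The example below groups predicted probabilities into ten intervals of width 0.1 and reports, for each interval, the number of predictions and the observed fraction of true infections. The function name `calibration_table` and the toy inputs are assumptions for illustration and are not data from the study; confidence intervals are not computed here.

```python
def calibration_table(probs, labels, n_bins=10):
    """Bin predictions into equal-width intervals on [0, 1].

    Returns one row per bin: (lower edge, upper edge, number of
    predictions, observed fraction of positives or None if empty).
    """
    counts = [0] * n_bins
    positives = [0] * n_bins
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
        positives[idx] += y
    rows = []
    for i in range(n_bins):
        fraction = positives[i] / counts[i] if counts[i] else None
        rows.append((i / n_bins, (i + 1) / n_bins, counts[i], fraction))
    return rows


# Toy predictions and outcomes (1 = infection), invented for this example.
probs = [0.05, 0.15, 0.12, 0.95, 0.91, 1.0]
labels = [0, 0, 1, 1, 1, 1]
rows = calibration_table(probs, labels)
```

A well-calibrated model has an observed fraction close to the midpoint of each interval. Bins with few predictions, as described for the Vitals model above 0.5, give unstable fractions, which is why the per-bin counts are reported alongside them.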
