Diagnostic suspicion bias and machine learning: Breaking the awareness deadlock for sepsis detection

Many early warning algorithms are downstream of clinical evaluation and diagnostic testing, which means that they may not be useful when clinicians fail to suspect illness and fail to order appropriate tests. Depending on how such algorithms handle missing data, they could even indicate “low risk” simply because the tests were never ordered. We considered predictive methodologies to identify sepsis at triage, before diagnostic tests are ordered, in a busy Emergency Department (ED). One algorithm used “bland clinical data” (data available at triage for nearly every patient). The second algorithm added three yes/no questions to be answered after the triage interview. Retrospectively, we studied adult patients from a single ED between 2014 and 2016, separated into training (70%) and testing (30%) cohorts, and a final validation cohort of patients from four EDs between 2016 and 2018. Sepsis was defined per the Rhee criteria. Investigational predictors were demographics and triage vital signs (downloaded from the hospital EMR); past medical history; and the auxiliary queries (answered by chart reviewers who were blinded to all data except the triage note and initial HPI). We developed L2-regularized logistic regression models using greedy forward feature selection. There were 1164, 499, and 784 patients in the training, testing, and validation cohorts, respectively. The bland clinical data model yielded ROC AUCs of 0.78 (0.76–0.81) and 0.77 (0.73–0.81) for training and testing, respectively, and ranged from 0.74–0.79 in the four-hospital validation. The second model, which included the auxiliary queries, yielded 0.84 (0.82–0.87) and 0.83 (0.79–0.86), and ranged from 0.78–0.83 in the four-hospital validation. The first algorithm did not require clinician input but yielded middling performance. The second showed a trend towards superior performance, though it required additional user effort.
These methods are alternatives to predictive algorithms downstream of clinical evaluation and diagnostic testing. For hospital early warning algorithms, consideration should be given to bias and usability of various methods.


Supplementary methods
Variable
• GCS
• Age (years)
• Temperature high
• Temperature low
• Pulse pressure (i.e., systolic blood pressure minus diastolic blood pressure)
• Product of pulse pressure and heart rate (surrogate for cardiac output)
• Mean arterial pressure divided by product of pulse pressure and heart rate (surrogate for peripheral vascular resistance)
• Systolic blood pressure (mmHg)
• Ratio of heart rate and systolic blood pressure (Shock index, bpm/mmHg)
• SpO2 (%)
• Heart rate high (bpm)
• Heart rate low (bpm)
• Respiratory rate (breaths/min)

Next, we sought to accommodate major nonlinear associations between these predictors and sepsis using a principled, data-driven approach. For each continuous-valued feature, we performed a univariate locally weighted scatterplot smoothing (LOWESS) fit to non-parametrically assess the trend of the relationship between sepsis incidence and the variable over the whole domain of values that the variable was observed to take on.
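The univariate smoothing step can be illustrated with a minimal sketch. It assumes a single-pass, tricube-weighted local linear fit without robustness iterations; the function name `lowess`, the `frac` bandwidth parameter, and the synthetic SBP values and outcomes are illustrative assumptions, not the authors' implementation or data.

```python
import math


def lowess(x, y, frac=0.5):
    """Single-pass LOWESS: tricube-weighted local linear fit at each x.

    Assumption: no robustness iterations; `frac` is the fraction of
    points used in each local fit.
    """
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        # Tricube weights; points at or beyond the bandwidth get weight 0.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Synthetic example (not study data): triage SBP in mmHg and a binary
# sepsis label, with more positives at low SBP.
sbp = [float(v) for v in range(70, 170, 5)]
sepsis = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
smoothed = lowess(sbp, sepsis, frac=0.6)
```

Plotting `smoothed` against `sbp` gives the kind of incidence trend that could then be inspected for linear, saturating, or biphasic behaviour.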
In general, this analysis resulted in one of three possible actions for each variable:
a) For most continuous-valued variables (for example, age), the relationship appeared to be essentially linear over the entire domain of the variable present in our data. In these cases, we incorporated the variables into the model with only a simple z-standardization.
b) The second group consisted of the five variables for which there was clearly a saturation in sepsis incidence at one or both ends of the variable's domain (Fig SM-1, top row). For example, the incidence of sepsis clearly decreases as triage SBP increases until an SBP of about 100 mmHg. From this point and above, sepsis incidence is not associated with SBP. To build this relationship into the models, we implemented a saturation nonlinearity, clipping any variable identified as having such a relationship at the threshold value, which was identified by manual review (Fig SM-1, bottom row). All SBP measurements, for instance, whether at triage or later in the ED stay, were clipped such that any value above 100 mmHg was used in further analysis as exactly 100 mmHg.
c) The third group consisted of variables with a biphasic association with sepsis. For example, as the SIRS criteria would suggest, there is a positive association with sepsis for both abnormally elevated and abnormally decreased body temperature. To capture this biphasic relationship in the models, we transformed each variable in which we observed such a relationship into two variables, with one representing the relationship at the low end, clipped above a high threshold, and the other representing the relationship at the high end, clipped below a low threshold (Fig SM-2, bottom row).
The full list of nonlinearly scaled continuous variables and their cut-offs is summarized in the table below.
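The three transformations (z-standardization, saturation clipping, and the biphasic split into two variables) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the population standard deviation is an arbitrary choice, and the temperature thresholds are placeholders, since the actual cut-offs are given only in the authors' table. The 100 mmHg SBP threshold is the one quoted in the text.

```python
from statistics import mean, pstdev


def z_standardize(train_values, new_values):
    """Scale new_values with the mean and SD estimated on train_values.

    Assumption: population SD; the source does not say which estimator
    was used.
    """
    mu, sigma = mean(train_values), pstdev(train_values)
    return [(v - mu) / sigma for v in new_values]


def saturate_above(value, threshold):
    """Saturation nonlinearity: values above the threshold are set to it."""
    return min(value, threshold)


def biphasic_split(value, low_end_cap, high_end_floor):
    """Split one biphasic variable into two clipped variables.

    low_end:  varies only below low_end_cap (clipped above it).
    high_end: varies only above high_end_floor (clipped below it).
    """
    low_end = min(value, low_end_cap)
    high_end = max(value, high_end_floor)
    return low_end, high_end


SBP_THRESHOLD = 100.0        # mmHg, threshold quoted in the text
TEMP_LOW_END_CAP = 36.0      # degrees C, hypothetical placeholder
TEMP_HIGH_END_FLOOR = 37.5   # degrees C, hypothetical placeholder

# Synthetic values, not study data.
ages = [20.0, 40.0, 60.0]
z_ages = z_standardize(ages, ages)

clipped_sbp = [saturate_above(v, SBP_THRESHOLD) for v in [85.0, 100.0, 130.0]]

temp_features = [
    biphasic_split(t, TEMP_LOW_END_CAP, TEMP_HIGH_END_FLOOR)
    for t in [35.0, 37.0, 39.0]
]
```

In this sketch a temperature inside the two thresholds maps to the same pair of constants, so only hypothermic or febrile values move either of the two derived variables.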

Adjudication of "auxiliary queries"
The first auxiliary query was "Was there a report of a 'bacterial infection symptom complex' (BISC)?" A priori, our team developed criteria that we hypothesized would indicate that a patient likely has a bacterial infection based on symptoms alone. The intent of the BISC criteria was to provide an objective clinical tool for determining whether a patient likely had a bacterial infection. The BISC was developed after prior work by our team on assessing clinical probability of infection, in which we noted excessive inter-rater variability. The BISC criteria are as follows:
• BISC Criterion A: At least one symptom localizing the source of infection;
• BISC Criterion B: At least one symptom indicative of an infectious process (constitutional or local inflammation);
• BISC Criterion C: No alternative diagnosis that was substantially more likely to cause those symptoms than bacterial infection.
In Table SM-3, we illustrate different subtypes of BISC that were also developed a priori, before chart review. After completing chart review for Interval-1, we examined the test characteristics of the BISC criteria for predicting sepsis. Recall that this was a study population of ED patients with at least one vital-sign abnormality documented during their ED visit (SBP < 100 mmHg or HR > SBP). Meeting criteria for any of the BISC subtypes had a sensitivity for sepsis of 36% (28–45%), a specificity of 95% (93–96%), and a positive predictive value (PPV) of 62% (50–72%). Therefore, the majority of ED patients who had at least mild vital-sign abnormalities and who also met the BISC criteria were septic [1].
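For readers less familiar with these test characteristics, the arithmetic can be sketched as below. The function name and the 2×2 counts are hypothetical and chosen only to illustrate the definitions; they are not the study's counts and do not reproduce the reported percentages.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and PPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # septic patients flagged by the test
        "specificity": tn / (tn + fp),  # non-septic patients not flagged
        "ppv": tp / (tp + fp),          # flagged patients who are septic
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=20, fn=60, tn=380)
```

A PPV above 50%, as reported for the BISC criteria, is what supports the statement that most patients meeting the criteria were septic.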
The second auxiliary query was "Was there a report of fatigue or altered mental status?" The third auxiliary query was "Was there concern for bacterial infection prior to arrival in the Emergency Department?", which included either referral from an outpatient clinic with concern for infection or a report of fever/chills/rigors prior to arrival. Our guidelines for chart review for those questions are shown in Table SM-4.
Table SM-4. Criteria for chart review for "altered mental status", "fatigue/malaise", or "pre-arrival concern for infection".

"Altered mental status"
Was there written evidence to suggest the patient had altered mentation in the ED documentation (ED nursing notes, mid-level provider notes, MD notes)? In addition to notes explicitly stating that the patient is confused or experiencing altered mental status, documentation of the patient being somnolent, unresponsive, and/or found down (unless the patient fell due to weakness) also counts.

"Fatigue/Malaise"
If there is written evidence in the ED documentation of the patient experiencing malaise or being lightheaded/weak/lethargic, check "yes" for this field.

"Pre-arrival concern for infection"
Referred in for an infectious diagnosis or diagnostic data suggestive of infection? This does not include explicitly viral processes such as diagnosed influenza. There must be labs or other diagnostic data that suggest infection, such as elevated WBC count, leukocytosis, positive blood culture, positive urine culture, or chest x-ray findings (consolidation, PNA), or the patient must have been referred in for an explicit infectious diagnosis, such as sepsis, UTI, SBO, diverticulitis, etc.

"Fever/chills/rigors prior to ED arrival"
Was there written evidence to suggest the patient had fevers/chills/rigors prior to ED arrival? The fever can be a subjective fever, and shaking can be considered chills if the patient doesn't have something else going on that could cause shaking. It's also important to note that "sweats" and "feels cold" don't count.

Adjudication of major comorbidities
Finally, we studied major comorbidities documented within the patient's medical record. "Any major comorbidity" for the Essential Model was taken as one or more from a list of major comorbidities. Criteria for the individual comorbidities are given in Table SM-5. In this analysis, these items were determined by manual chart review. In the future, the patient's EMR problem list could be electronically filtered to automatically determine whether such major comorbidities were present.
Table SM-5. Criteria for chart review for major comorbidities.

"active cancer"
If there has been resection/remission and there is no current treatment (chemo, hormone-based chemo, radiation (XRT), trial drugs, etc.), then the cancer is not considered to be active. Active liquid tumors count for this field (e.g., myelodysplastic syndrome and multiple myeloma count).

"coronary artery disease"
Check "yes" for this field if there is written evidence in the admission note/discharge summary of past medical history of CAD, STEMI, NSTEMI, ischemic cardiomyopathy, or coronary atherosclerosis.

"congestive heart failure"
Check "yes" for this field if there is written evidence in the admission note/discharge summary of past medical history of CHF, HF, HFpEF, HFrEF.

"COPD or chronic respiratory illness"
Check "yes" for this field if there is written evidence in the admission note/discharge summary of past medical history of COPD, home O2 use, interstitial lung disease, or chronic asthma.

"connective tissue disease"
Mark this field as "yes" if there is evidence of SLE, polymyositis, mixed CTD, polymyalgia rheumatica, or moderate to severe RA.

"cerebrovascular accident"
Check "yes" for this field if there is written evidence in the admission note/discharge summary of past medical history of CVA, ischemic CVA, hemorrhagic CVA, stroke.

"diabetes mellitus with end-stage complications"
Select "none or uncomplicated" when the patient does not have diabetes, or when the patient has diabetes but it is controlled by medication or diet. Select "end-organ damage" when there is written evidence of end-organ dysfunction (e.g., retinopathy, neuropathy).

"immunocompromised"
If the patient is immunocompromised due to transplant, chronic immunosuppression medication, no spleen, AIDS, or primary immunodeficiency, check "yes" for this field.

"liver disease"
If there is no evidence of liver disease, select "none". Check "mild" if there is cirrhosis without PHT, or if there are elevated LFTs with a history of HCV. Check "moderate to severe" if there is evidence of cirrhosis with portal hypertension +/- variceal bleeding.

"chronic disability"
Chronic disability due to quadriplegia, hemiplegia/paraplegia, dementia, inability to walk or care for self, chronic trach/vent, tubes/drains, chronic wound, self-cath bladder.

"chronic kidney disease"
Check "yes" for this field if there is evidence of moderate CKD (creatinine >3 mg/dL [0.27 mmol/L]) or evidence of severe CKD (on dialysis, s/p kidney transplant, uremia).

"major surgery within one month"
Any incision or manipulation that went deeper than the skin and is more than a vascular puncture/percutaneous procedure. Examples of just a vascular puncture are a needle biopsy, stent, or urinary stenting. Of note, I&D does count, as it is more than a puncture and there is a possibility of pus re-accumulation.

"IV drug use"
Active IV drug use, not just history of IV drug use.

* CMS SEP-1 metrics data from: [5]; Massachusetts average for CMS SEP-1 metrics 55% and US National average 58%. Furthermore, note that the meaning of these measures remains controversial: "experts have continued to raise concerns that SEP-1 remains overly prescriptive, lacks a sound scientific basis and presents risks (overuse of antibiotics and inappropriate fluids not titrated to need)." [6]

2. Supplementary results

Model composition (additional details)
Additional technical details about the model, including details about the pre-processing of the predictor variables, are available in the first author's doctoral thesis [7].
For the "Essential Model", the final input variables were, in order of weight, as follows: pre-ED concern for infection; major comorbidities; fatigue or confusion; bacterial symptom complex; respiratory rate; shock index; gender; age; temperature; GCS; SBP; SpO2. Using the testing dataset, the calibration of this model was evaluated (Fig SM-3).
For the "Bland Model", the final input variables were, in order of weight, as follows: respiratory rate; shock index; temperature (fever only); age; gender; GCS; SBP; temperature (hypothermia only); and SpO2.
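The modelling approach named in the abstract, L2-regularized logistic regression with greedy forward feature selection, can be sketched in a few lines. This is an illustrative sketch only: the gradient-descent fit, the regularization strength `lam`, the use of held-out ROC AUC as the selection criterion, the `min_gain` stopping rule, and the synthetic data are all assumptions, since the source does not specify these details.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_l2(X, y, lam=0.1, lr=0.1, epochs=500):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def greedy_forward_selection(X_tr, y_tr, X_va, y_va, min_gain=0.005):
    """Add one feature at a time, keeping the one with the best held-out AUC.

    Stops when the best candidate improves AUC by less than min_gain
    (an assumed stopping rule).
    """
    remaining = list(range(len(X_tr[0])))
    chosen, best_auc = [], 0.5
    while remaining:
        trials = []
        for j in remaining:
            cols = chosen + [j]
            w, b = fit_logistic_l2([[r[c] for c in cols] for r in X_tr], y_tr)
            scores = predict([[r[c] for c in cols] for r in X_va], w, b)
            trials.append((roc_auc(scores, y_va), j))
        top_auc, top_j = max(trials)
        if top_auc - best_auc < min_gain:
            break
        chosen.append(top_j)
        remaining.remove(top_j)
        best_auc = top_auc
    return chosen, best_auc


# Synthetic data (not study data): feature 0 is informative, feature 1 is noise.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    signal, noise = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([signal, noise])
    y.append(1 if signal + 0.5 * rng.gauss(0, 1) > 0 else 0)

chosen, best_auc = greedy_forward_selection(X[:140], y[:140], X[140:], y[140:])
```

In this toy setup the informative feature is selected first; with standardized inputs, the magnitudes of the fitted weights give one way to order variables "by weight", as in the lists above.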

Audit for biases related to social determinants of health
In multivariable analysis of the relationship between the Essential Model output and sepsis, race/ethnicity and gender were not significant factors (p > 0.05), indicating that there was no statistically significant global bias towards positive or negative predictions by the Essential Model as a function of race/ethnicity or gender. See Table SM-9.

Fig SM-1: Transformations of continuous variables that were found to have monotonic associations with sepsis. Top row includes LOWESS regression with sepsis incidence, along with the cumulative distribution function of the variable at triage and a smoothed probability density function. Density functions that are very broad were multiplied by five for visualization purposes where noted. Vertical black lines indicate the thresholds below or above which variable values were clipped.
and the other representing the relationship at the high end, clipped below a low threshold (Fig SM-2, bottom row).

Fig SM-2: Transformations of continuous variables that were found to have biphasic associations with sepsis. Top row includes LOWESS regression with sepsis incidence, along with the cumulative distribution function of the variable at triage and a smoothed probability density function. Density functions that are very broad were multiplied by five for visualization purposes where noted. Vertical black lines indicate the thresholds below or above which variable values were clipped. Bottom row shows two lines, one for each of the two transformed variables created.

Fig SM-3: Calibration curve for Essential Model using testing dataset.
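A calibration curve of this kind compares predicted risk with observed outcome frequency. The sketch below assumes equal-width probability bins; the function name, the number of bins, and the example values are illustrative assumptions, and the source does not state how the published curve was binned.

```python
def calibration_curve(probs, labels, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) per bin.

    Assumption: equal-width bins on [0, 1]; empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for p, label in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, label))
    curve = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(label for _, label in members) / len(members)
            curve.append((mean_pred, observed, len(members)))
    return curve


# Synthetic predictions and outcomes, not study data.
example_curve = calibration_curve(
    probs=[0.1, 0.1, 0.9, 0.9], labels=[0, 0, 1, 1], n_bins=2
)
```

For a well-calibrated model, the mean predicted risk and the observed event rate in each bin lie close to the diagonal.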

Continuous variables sourced from the EMR or computed from such variables
Continuous variables sourced from the EMR or computed from such variables are listed in Table SM-1. Systolic blood pressure, diastolic blood pressure, and heart rate were normalized by their values at triage.
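The computed variables described in the variable list (pulse pressure, the cardiac output surrogate, the peripheral vascular resistance surrogate, and the shock index) can be derived from triage vital signs as sketched below. The function name and dictionary keys are hypothetical, and the mean arterial pressure formula (diastolic pressure plus one third of pulse pressure) is a common approximation assumed here, since the source does not state how MAP was obtained.

```python
def derived_features(sbp, dbp, hr):
    """Compute derived hemodynamic features from SBP, DBP (mmHg) and HR (bpm)."""
    pulse_pressure = sbp - dbp
    co_surrogate = pulse_pressure * hr        # surrogate for cardiac output
    # Assumption: MAP approximated as DBP + pulse pressure / 3.
    mean_arterial_pressure = dbp + pulse_pressure / 3.0
    # Surrogate for peripheral vascular resistance; undefined if the
    # denominator is zero.
    pvr_surrogate = (
        mean_arterial_pressure / co_surrogate if co_surrogate else None
    )
    shock_index = hr / sbp if sbp else None   # bpm/mmHg
    return {
        "pulse_pressure": pulse_pressure,
        "co_surrogate": co_surrogate,
        "map": mean_arterial_pressure,
        "pvr_surrogate": pvr_surrogate,
        "shock_index": shock_index,
    }


# Synthetic vital signs, not study data.
features = derived_features(sbp=120.0, dbp=80.0, hr=60.0)
```

A shock index above 1 corresponds to HR > SBP, the vital-sign abnormality used elsewhere in the study's inclusion criteria.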