A Risk Prediction Model for Screening Bacteremic Patients: A Cross Sectional Study

Background: Bacteraemia is a frequent and severe condition with a high mortality rate. Despite profound knowledge about the pre-test probability of bacteraemia, blood culture analysis often yields low rates of pathogen detection and therefore increases diagnostic costs. To improve the cost-effectiveness of blood culture sampling, we computed a risk prediction model based on highly standardizable variables, with the ultimate goal of identifying patients at very low risk for bacteraemia via an automated decision support tool.
Methods: In this retrospective hospital-wide cohort study evaluating 15,985 patients with suspected bacteraemia, 51 variables were assessed for their diagnostic potency. A derivation cohort (n = 14,699) was used for feature and model selection as well as for cut-off specification. Models were established using the A2DE classifier, a supervised Bayesian classifier. Two internally validated models were further evaluated in a validation cohort (n = 1,286).
Results: The proportion of neutrophil leukocytes in the differential blood count was the best individual variable to predict bacteraemia (ROC-AUC: 0.694). Applying the A2DE classifier, two models, model 1 (20 variables) and model 2 (10 variables), were established with an area under the receiver operating characteristic curve (ROC-AUC) of 0.767 and 0.759, respectively. In the validation cohort, ROC-AUCs of 0.800 and 0.786 were achieved. Using predefined cut-off points, 16% and 12% of patients were allocated to the low risk group with a negative predictive value of more than 98.8%.
Conclusion: Applying the proposed models, more than ten percent of patients with suspected blood stream infection were identified as having minimal risk for bacteraemia. Based on these data, the application of this model as an automated decision support tool for physicians is conceivable, leading to a potential increase in the cost-effectiveness of blood culture sampling. External prospective validation of the model's generalizability is needed for further appreciation of the usefulness of this tool.


Background
Bacteraemia is a frequent and severe condition with an annualized incidence of 122 per 100,000 people. The mortality rate ranges between 14% and 37% [1][2][3]. Risk factors for bacteraemia are advanced age, urinary or indwelling vascular catheters, fulfilment of two or more SIRS criteria, impaired renal or liver function, malignancy or other chronic co-morbidities [4][5][6][7][8]. Although blood culture analysis is considered the gold standard for diagnosing bacteraemia in patients with suspected blood stream infection, the clinical decision of when to take a blood culture is not trivial. Despite profound knowledge about the pre-test probability of positive blood culture results, which is strongly influenced by the site of infection, true positive rates identifying a causative pathogen are low when consecutively assessed (4.1%-7%) [9][10][11]. False positive results due to contamination occur at a similar or even higher rate, ranging from 0.6% to over 8% [11][12][13]. These imperfections of blood culture analysis have an important economic impact, resulting in a 20% increase of total hospital costs for patients with false positive blood cultures [14][15][16][17]. Economic analyses estimate the costs related to a single false positive blood culture result at between $6,878 and $7,502 per case [17][18][19]. To increase the cost-effectiveness of blood culture analysis, the identification of targeted patient cohorts is therefore highly needed. Several prediction systems for bacteraemia in special patient cohorts have been published with ROC-AUCs in a moderate range [20][21][22][23][24]. However, physicians are arguably inefficient in applying a multitude of available prediction scores for specific conditions and specific patient cohorts [25,26].
Table 1. Patient characteristics and variables analysed.
The aim of the current study was therefore to establish a machine learning based prediction system for inpatients and outpatients with suspected bacteraemia using highly standardized and routinely available laboratory parameters to identify those patients for whom blood culture sampling may safely be omitted due to very low pre-test probability for bacteraemia.

Study Design and Data Collection
The current study was designed as a retrospective cohort study, including inpatients and outpatients at the Vienna General Hospital, Austria, a 2,116-bed tertiary teaching facility. Between January 2006 and December 2010, patients with clinically suspected bacteraemia were included if blood culture analysis was requested by the responsible physician and blood was sampled for assessment of haematology and biochemistry. Patients younger than 18 years and patients with unavailable laboratory parameter results were excluded. Patients with a potential blood culture contaminant and those with missing or inaccurate identification to the species level were excluded from further analysis. Blood culture contamination was defined according to the criteria of Hall and Lyman [27]. Furthermore, patients with rare blood culture isolates (less than 0.15% frequency of positives) were also excluded. Patients' age, gender and 49 laboratory parameters (see table 1) were used in the analysis. All laboratory parameters had been assessed in accordance with parameter-specific SOPs at the Clinical Department of Laboratory Medicine, Medical University Vienna, an ISO 9001:2008 certified and ISO 15189:2008 accredited facility. Anonymous raw data can be requested by contacting the corresponding author. Following national regulations, each request will be evaluated for approval by the local human data safety commission.

Ethical Considerations
The study was approved by the local Ethics Committee of the Medical University Vienna (EC-Nr.: 333/2011) and conducted in accordance with the Declaration of Helsinki (1964, including current revisions), the rules of Good Clinical Practice (GCP, European Union) and the standards for the reporting of diagnostic accuracy studies (STARD). Since a retrospective study design was applied, informed consent was not sought from study participants. To assure anonymity, every study participant was assigned a consecutive identification number, which was exclusively used for further analysis.

Evaluation method
The data set was divided into a derivation set (Jan 1, 2006 to Jul 31, 2010) and a validation set (Aug 1, 2010 to Dec 31, 2010) based on the date of inclusion. The derivation set was used for feature selection and model training. Feature selection and internal validation of the trained model were performed using a 10-fold cross validation scheme. Results of the internal validation were taken to set cut-off points for risk stratification of the study population. The Youden index method was applied to set optimal cut-off points [28,29]. Using likelihood ratios (LR; LR−: 0.12, LR+: 4.93, see figure S1) of corresponding cut-off values, three strata were established to group the patients into a low risk, intermediate risk and high risk group. For the low risk group, a cut-off point for the classification probability was set to yield 1% post-test probability for bacteraemia. For the high risk group, a cut-off point resulting in more than 30% post-test probability was predefined. Patients with classification probabilities between these cut-off points were allocated to the intermediate risk group. To externally validate the discriminatory potency of the previously trained algorithm and risk strata, the validation set was used.
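The relation between the likelihood ratios and the post-test probabilities used for the risk strata can be illustrated with a short calculation. The following is a minimal sketch, assuming that the overall prevalence of bacteraemia in the study population (8%) serves as the pre-test probability; the function name `post_test_probability` is introduced here for illustration only and is not part of the original analysis.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability into a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumption: the 8% prevalence of the study population is the pre-test probability.
PREVALENCE = 0.08

low_risk = post_test_probability(PREVALENCE, 0.12)   # negative likelihood ratio from the text
high_risk = post_test_probability(PREVALENCE, 4.93)  # positive likelihood ratio from the text

print(f"low risk cut-off:  {low_risk:.3f}")
print(f"high risk cut-off: {high_risk:.3f}")
```

Under this assumption the two likelihood ratios correspond to post-test probabilities of approximately 1% and 30%, in line with the thresholds stated for the low risk and high risk groups.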

Statistical Analysis
For statistical analysis, WEKA (Version 3.7.10, GNU General Public License) and R (Version 3.0.2, GNU General Public License) were used [30]. Descriptive statistics of all variables indicated are given as median and interquartile range. For single variable analysis, the Mann-Whitney U-test, Pearson's chi-squared test and area under the receiver operating characteristic curve (ROC-AUC) analysis of individual variables were applied [31]. To train the multivariable models, variables with a high discriminative power were selected using the wrapper subset evaluator algorithm and the correlation feature selection (CFS) subset evaluator of WEKA. The wrapper approach aims at selecting a relevant set of variables for a specific classification algorithm (in our case the A2DE algorithm, see below) [32]. The CFS subset evaluator assesses the discriminatory power of a variable subset while taking the inter-correlation among its variables into account [33]. Furthermore, the effect of each variable was evaluated by a step-wise deletion of variables in the order of their individual Pearson's correlation coefficient with respect to the outcome.
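The ROC-AUC of a single variable is closely related to the Mann-Whitney U statistic: it equals the probability that a randomly chosen bacteraemic patient has a higher value than a randomly chosen non-bacteraemic patient. The sketch below illustrates this relation in plain Python; it is not the WEKA or R routine used in the study, the function name `roc_auc` is hypothetical, and the values are invented toy data.

```python
from typing import Sequence


def roc_auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correctly ranked pair, which makes the result
    equal to the Mann-Whitney U statistic divided by the number of pairs.
    The double loop is quadratic and meant for illustration only.
    """
    if len(positives) == 0 or len(negatives) == 0:
        raise ValueError("both groups must contain at least one value")
    correctly_ranked = 0.0
    for positive_value in positives:
        for negative_value in negatives:
            if positive_value > negative_value:
                correctly_ranked += 1.0
            elif positive_value == negative_value:
                correctly_ranked += 0.5
    return correctly_ranked / (len(positives) * len(negatives))


# Invented toy values, e.g. a percentage measured in two small groups.
bacteraemic = [80.0, 85.0, 90.0, 70.0]
non_bacteraemic = [60.0, 65.0, 75.0, 85.0]

auc = roc_auc(bacteraemic, non_bacteraemic)
print(f"ROC-AUC: {auc:.5f}")
```

For cohorts of the size analysed here, a rank-based implementation would be preferable to the pairwise loop.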
Table 2. Differences between derivation cohort and validation cohort.
Table 3. Results of the models' diagnostic performances at predefined cut-off points.
For statistical modelling, several major groups of supervised machine learning algorithms were applied, including Bayesian classifiers such as Naïve Bayes, artificial neural networks such as multilayer perceptrons, and support vector machines. The best results were consistently achieved with the averaged 2-dependence estimators (A2DE) algorithm. The A2DE, belonging to the averaging n-dependence estimator classifier group, is a semi-Naïve Bayes method [34]. This group of algorithms assumes that each predicting variable depends on the outcome class and n other variables. In the case of the A2DE classifier, n equals two, whereas the classic Naïve Bayes algorithm is a zero-dependence estimator, assuming that all variables are conditionally independent of each other [35,36]. In many real-world applications, this independence assumption is violated, leading to inadequate results. The Naïve Bayes algorithm requires a two-dimensional table (outcome class and predicting variable) for indexing the probability estimates. In contrast, the A2DE requires two additional dimensions for the estimation of the two additional variable dependencies. Further, these classifiers aggregate the predictions made by a collection of n-dependence estimators [37]. These procedures decrease the bias but slightly increase the model's variance [38]. However, comprehensive experimental evaluations indicate that the A2DE's trade-off between bias and variance results in a good predictive accuracy for many applications and data sets [39][40][41]. For ROC-curve comparison, a paired t-test (comparison of paired cross validation folds), the DeLong test or the Hanley and McNeil comparison test was applied to values of the ROC-AUC [42][43][44].
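The dependence structure of the A2DE classifier described above can be written out explicitly. The following is a sketch of the general form of such estimators, not a transcription of the WEKA implementation; the notation (outcome class $y$, variable values $x_1, \dots, x_a$, parent pair $\{j, k\}$) is introduced here for illustration, and smoothing as well as minimum-frequency rules for parent values are omitted.

```latex
% Naive Bayes (n = 0): all variables are conditionally independent given the class y
\hat{P}_{\mathrm{NB}}(y, \mathbf{x}) = \hat{P}(y) \prod_{i=1}^{a} \hat{P}(x_i \mid y)

% A2DE (n = 2): each variable depends on the class and on a pair of parent variables;
% the estimates obtained for all parent pairs are averaged
\hat{P}_{\mathrm{A2DE}}(y, \mathbf{x}) =
  \binom{a}{2}^{-1} \sum_{1 \le j < k \le a}
  \hat{P}(y, x_j, x_k) \prod_{i=1}^{a} \hat{P}(x_i \mid y, x_j, x_k)
```

The conditional term $\hat{P}(x_i \mid y, x_j, x_k)$ is indexed by four quantities instead of the two needed for $\hat{P}(x_i \mid y)$, which corresponds to the two additional table dimensions mentioned above. The predicted class is the $y$ that maximises $\hat{P}_{\mathrm{A2DE}}(y, \mathbf{x})$.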
Ninety-five percent confidence intervals of performance measures, including sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV), were calculated with bootstrapping (2,000 iterations) [45]. Where appropriate, the Bonferroni-Holm method was used to control for type I errors related to multiple testing. Statistical significance was defined as a p-value less than 0.05.
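A bootstrap confidence interval of a performance measure can be sketched as follows. The bootstrap variant used in the study is not specified in the text, so the percentile method shown here is an assumption; the function names and the toy counts are likewise invented for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[bool, bool]  # (predicted positive, truly bacteraemic)


def npv(pairs: Sequence[Pair]) -> float:
    """Negative predictive value: true negatives / all predicted negatives."""
    truths = [truth for predicted, truth in pairs if not predicted]
    if not truths:
        return math.nan  # undefined when nobody is predicted negative
    return sum(1 for truth in truths if not truth) / len(truths)


def bootstrap_ci(
    pairs: Sequence[Pair],
    statistic: Callable[[Sequence[Pair]], float],
    iterations: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval (assumed variant) for a statistic."""
    rng = random.Random(seed)
    n = len(pairs)
    estimates: List[float] = []
    for _ in range(iterations):
        # Resample patients with replacement and recompute the statistic.
        resample = [pairs[rng.randrange(n)] for _ in range(n)]
        value = statistic(resample)
        if not math.isnan(value):
            estimates.append(value)
    if not estimates:
        raise ValueError("statistic was undefined in every resample")
    estimates.sort()
    tail = (1.0 - level) / 2.0
    last = len(estimates) - 1
    return estimates[int(tail * last)], estimates[int((1.0 - tail) * last)]


# Invented toy data: 95 true negatives, 5 false negatives,
# 20 true positives and 30 false positives.
toy_pairs: List[Pair] = (
    [(False, False)] * 95 + [(False, True)] * 5
    + [(True, True)] * 20 + [(True, False)] * 30
)

point_estimate = npv(toy_pairs)
lower, upper = bootstrap_ci(toy_pairs, npv)
print(f"NPV {point_estimate:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

The same function can be reused for sensitivity, specificity or PPV by passing a different statistic.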

Study population
Between January 2006 and December 2010, blood culture analysis was requested for 23,765 patients. Figure 1 presents the selection process of patients. Patients younger than 18 years (n = 3,879), patients with unavailable laboratory parameter results (n = 3,389), patients with blood culture contamination, with blood culture results having missing or inaccurate identification to the species level, or with fungal growth (n = 464), and patients with rare blood culture isolates (n = 48) were excluded from analysis. The final study population consisted of 15,985 patients. Among them, 1,286 patients (8%) had a positive blood culture result. The most prevalent bacteria were E. coli (n = 406, 31.5%), S. aureus (n = 297, 23.1%), and K. pneumoniae (n = 83, 6.5%). Patient characteristics are presented in Table 1. According to a predefined temporal criterion (cut-off date: Aug 1, 2010), the data set was divided into a derivation set (n = 14,691, 8% bacteraemia) and a validation set (n = 1,294, 8.2% bacteraemia).

Feature selection and model training
Among 51 available variables in the derivation set, 40 variables showed a statistically significant difference between bacteraemia and non-bacteraemia patients. The best individual discriminatory variable was the proportion of neutrophil leukocytes in the differential blood count (p < 0.0001) with an ROC-AUC of 0.694 (CI: 0.686-0.702). At the Youden Index cut-off point, the relative amount of neutrophils yielded a sensitivity of 61.95% (59.1%-64.7%) and a specificity of 67.6% (66.8%-68.4%). Among all variables, 20 variables were selected by the wrapper approach (model 1), which were further evaluated by the CFS subset evaluator (model 2). Finally, model 2 consisted of ten variables, including patient's age, proportion of neutrophils, monocytes (absolute and relative value), eosinophils (absolute value), lymphocytes (absolute value), sodium, C-reactive protein, creatinine and total bilirubin (Table 2). Other feature selection approaches were also evaluated but resulted in models with lower ROC-AUCs than those described below.
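The Youden Index cut-off point for a single variable, as used above for the neutrophil proportion, is the threshold that maximises J = sensitivity + specificity − 1. The following minimal sketch shows one way to find it by exhaustive search; the function name `youden_cutoff`, the convention that values at or above the cut-off count as positive, and the toy data are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def youden_cutoff(values: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (cut-off, J) maximising J = sensitivity + specificity - 1.

    A value >= cut-off is classified as positive. If several cut-offs
    reach the same J, the smallest one is returned.
    """
    n_positive = sum(1 for label in labels if label)
    n_negative = len(labels) - n_positive
    if n_positive == 0 or n_negative == 0:
        raise ValueError("both outcome classes must be present")
    best_cutoff: Optional[float] = None
    best_j = -1.0
    for cutoff in sorted(set(values)):
        true_positives = sum(1 for v, l in zip(values, labels) if l and v >= cutoff)
        true_negatives = sum(1 for v, l in zip(values, labels) if not l and v < cutoff)
        j = true_positives / n_positive + true_negatives / n_negative - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    assert best_cutoff is not None
    return best_cutoff, best_j


# Invented toy data: a percentage value and the blood culture outcome.
toy_values = [55.0, 60.0, 65.0, 70.0, 75.0, 80.0, 85.0, 90.0]
toy_labels = [False, False, False, True, False, True, True, True]

cutoff, j_statistic = youden_cutoff(toy_values, toy_labels)
print(f"cut-off: {cutoff}, Youden J: {j_statistic:.2f}")
```

The same search can be applied to the classification probabilities of a multivariable model instead of a single laboratory value.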
A number of applicable classes of supervised machine learning techniques, including artificial neural networks and support vector machines, were screened in the model selection process. Figure S2 presents ROC-curves of various classifiers. The best results in ROC curve analysis were achieved by applying the A2DE classifier, yielding an ROC-AUC of 0.767 (CI: 0.754-0.781) in model 1 and of 0.759 (CI: 0.745-0.773) in model 2. This classifier is conceptually simpler than other available algorithms and consistently presented better results in ROC-AUC analysis than the other classifiers tested. Generally, the models' calibration appears to be good. Calibration plots are shown in figure S3. Model 1 shows a modest risk of overestimation for patients at higher bacteraemia risk. This overestimation effect is not seen in model 2, which therefore appears to be very well calibrated.
Using the Youden Index method to set an optimal cut-off point, model 1 yielded 72.1% sensitivity and 70.3% specificity with 17.3% PPV and 96.7% NPV. Model 2 yielded 67.7% sensitivity and 72.8% specificity with 17.8% PPV and 96.7% NPV. Different cut-off points were used to establish a low risk, an intermediate risk and a high risk group for bacteraemia. Table 3 summarizes diagnostic prediction measures when using different cut-off points. Importantly, the low risk group demonstrates an NPV of 98.84% (model 1) and 99.14% (model 2), respectively.
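Predictive values depend on the prevalence of bacteraemia as well as on sensitivity and specificity. The sketch below illustrates this dependence using Bayes' rule; the function name `predictive_values` is hypothetical, and the prevalence of 8% in the derivation set is taken as an assumption from the description of the study population.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    # Expected fractions of the whole population in each cell of the 2x2 table.
    true_positive = sensitivity * prevalence
    false_negative = (1.0 - sensitivity) * prevalence
    true_negative = specificity * (1.0 - prevalence)
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_positive / (true_positive + false_positive)
    npv = true_negative / (true_negative + false_negative)
    return ppv, npv


# Model 1 at the Youden Index cut-off, assuming a prevalence of 8%.
ppv_model_1, npv_model_1 = predictive_values(0.721, 0.703, 0.08)
print(f"PPV: {ppv_model_1:.3f}, NPV: {npv_model_1:.3f}")
```

With these rounded inputs the calculation gives values close to the 17.3% PPV and 96.7% NPV reported for model 1; small deviations are to be expected because the inputs are rounded. The low PPV despite moderate sensitivity and specificity is a direct consequence of the low prevalence.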

Effects of feature reduction and missing values
To estimate the effect of omitting variables with low predictive power, variables of model 1 were ranked according to their individual Pearson correlation coefficient against the outcome variable and deleted step by step in that order. The majority of deletion steps led to a significant decrease of the ROC-AUC. Figure 2 summarizes this deletion procedure.
Owing to the retrospective study design, some variables were not available for all patients (Table 2).

Validation set
To test the generalizability of the established models, a validation set (n = 1,294) was used. Model 1 achieved an ROC-AUC of 0.80 (CI: 0.76-0.84, see figure S4). Model 2 yielded an ROC-AUC of 0.79 (CI: 0.74-0.83). No significant differences were found between ROC-AUCs derived from the validation set and the corresponding ROC-AUCs derived from the derivation set (model 1: p = 0.1542, model 2: p = 0.2594).
When applying the cut-off points predefined by the Youden index method in the derivation cohort, model 1 yielded a sensitivity of 79.3% and a specificity of 68.4% with 18.4% PPV and 97.4% NPV. Model 2 achieved a sensitivity of 80.2% and a specificity of 70.0% with 19.3% PPV and 97.5% NPV. Using the predefined cut-off points for the risk model, model 1 allocated 16% of the patients (n = 202) to the low risk group and 7% (n = 89) to the high risk group. Among the patients in the low risk group, only 2 patients were false negatives. Similarly, applying model 2, 157 patients (12%) were allocated to the low risk group with 3 false negatives. Details of the risk model are provided in table 2, while figure 3 shows a tree-based representation of the prediction outcome.
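The negative predictive values of the low risk groups in the validation set follow directly from the counts given above, since every patient in the low risk group who is not a false negative is a true negative:

```latex
\mathrm{NPV} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}}

\mathrm{NPV}_{\text{model 1}} = \frac{202 - 2}{202} = \frac{200}{202} \approx 99.0\%

\mathrm{NPV}_{\text{model 2}} = \frac{157 - 3}{157} = \frac{154}{157} \approx 98.1\%
```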

Discussion
The goal of the current study was to assess the discriminatory power of machine learning models with frequently requested variables for predicting negative blood culture results in inpatients and outpatients with suspected bacteraemia. The cost-effectiveness of blood culture analysis depends strongly on the diagnostic yield, and an automated tool improving the selection of patients may therefore increase cost-effectiveness. Several scoring systems predicting the probability of a positive blood culture result in a specific patient cohort have been published previously [20,21,[46][47][48]. However, since these scores require manual calculation by the physician, they are often not applied. Our approach was to compute a potentially automated decision support tool to improve the cost-effectiveness of blood culture sampling using highly standardized data, resulting in ROC-AUCs between 0.759 and 0.804. Based on these models, the NPV was 99.01% for model 1 and 98.1% for model 2 for patients at low risk for bacteraemia. Based on these results, the proposed support tool could safely reduce blood culture sampling by 12-16%, leading to a reduction of costs.
In this study, statistical analysis was restricted to laboratory parameters as well as gender and patient's age, which are all readily available and highly standardized. These variables combine the advantage of reproducibility and availability as opposed to most clinical variables.
Pre-test probability of bacteraemia may vary considerably between studies, potentially affecting the diagnostic accuracy of prediction models [10,11]. Our results are similar to those of a previous study by Pfitzenmeyer et al. reporting an 8.2% prevalence of bacteraemia [49]. Nakamura et al. published a hospital-based study with a 19.5% prevalence of bacteraemia, predicting bacteraemia with an ROC-AUC of 0.73 [47]. The prevalence of bacteraemia (19.5%) in that study is higher than generally reported for hospital-based studies, and its results may therefore lack generalizability [10,11]. Finally, Jin et al. evaluated a Bayesian algorithm for the prediction of bacteraemia in 19,303 patients, yielding an ROC-AUC of 0.70 [50]. In contrast to our study, however, laboratory markers included in the analysis were allowed a considerable lag time to blood culture sampling of up to 72 hours, or even 7 days in the case of albumin and alkaline phosphatase. Considering the dynamic evolution of inflammation markers, this discrepancy in sampling times may have had an important impact on their results.
Several limitations have to be acknowledged in this study. Firstly, the retrospective nature of the study may introduce bias in the analysis of the results. Secondly, although the data set has been split into a sub-set used for model generation and one for validation, the external generalizability needs to be addressed prospectively at other health care institutions. Finally, the applicability of an automated decision support tool needs to be tested in clinical practice. The potential trade-off between diagnostic certainty and economic aspects must be well-balanced and may vary between different settings [51,52].
In conclusion, our data show the utility of highly standardized variables for predicting bacteraemia with an ROC-AUC between 0.759 and 0.800. This prediction model may be tested for implementation as a clinical support tool to exclude blood culture sampling in patients with very low probability for bacteraemia. A prospective evaluation of the model's generalizability would be indicated.
Figure S1. Fagan's nomogram, graphically representing the relationship between pre-test probability, likelihood ratio and post-test probability; left side: negative likelihood ratio for low risk group cut-off point specification; right side: positive likelihood ratio for high risk group cut-off point specification.