The authors have declared that no competing interests exist.
Circulatory shock is a life-threatening disease that accounts for around one-third of all admissions to intensive care units (ICU). It requires immediate treatment, which is why tools for planning therapeutic interventions are needed to deal with shock in the critical care environment. In this study, the original database of the ShockOmics European project is used to extract attributes capable of predicting mortality due to shock in the ICU. Missing data imputation techniques and machine learning models were used, followed by feature selection from different data subsets. Selected features were later used to build Bayesian Networks, revealing causal relationships between features and ICU outcome. The main result is a subset of predictive features that includes well-known indicators such as the SOFA and APACHE II scores, but also less commonly considered ones related to cardiovascular function assessed through echocardiography or shock treatment with pressors. Importantly, certain selected features are shown to be most predictive at certain time-steps. This means that, as shock progresses, different attributes could be prioritized. Clinical traits obtained at 24 h after ICU admission are shown to accurately predict cardiogenic and septic shock mortality, suggesting that relevant life-saving decisions could be made shortly after ICU admission.
Shock (or circulatory shock) is a life-threatening medical condition that requires immediate treatment. It is prevalent in the Intensive Care Unit (ICU) and a major concern in Critical Care in general. This condition occurs when the organs and tissues of the body do not receive enough blood; as a result, the supply of oxygen and nutrients to cells is restricted and organs become damaged. Hypotension, tissue hypoperfusion and hyperlactatemia are amongst the most common symptoms [
Four types of shock are commonly defined: hypovolaemic shock (e.g. hemorrhagic shock), cardiogenic shock, distributive shock (e.g. septic shock) and obstructive shock. The mortality rate of the condition remains very high and depends on its type. It is quantified at 30% for septic shock (according to the new definition of such condition [
Machine Learning (ML) techniques can help find a list of attributes to predict the outcome of shock patients from clinical data (i.e. data routinely monitored in the ICU). However, a large number of attributes characterize each patient, ranging from base pathology information to lab procedures, hemodynamic data and treatment, to name a few. As a result, data are likely to be very high dimensional and, therefore, feature selection (FS) methods are often required to reduce data dimensionality and its associated variability. The combination of statistics and ML may provide an appropriate framework to retrieve new knowledge from data, as well as causal relationships between different features from high-dimensional datasets [
This study analyzes the clinical database gathered within the
When it comes to the detection of circulatory shock, physicians and therapists largely depend on a combination of clinical, hemodynamic and biochemical signs. For a treatment to be chosen, prompt identification of shock is necessary. Appropriate treatment is based on a good understanding of the physiological mechanisms behind the condition. The management of shock is challenging and patient survival is highly dependent on the timely administration of the appropriate treatment. Normally, it requires the administration of vasoactive drugs and fluid resuscitation. This treatment is usually given to counteract the other conditions that go along with shock (e.g. hypotension/hemodynamic instability, inflammation and multiple organ failure (MOF)). The fact that they have similar symptoms makes addressing the causes of shock very difficult. As a result, decision making at the onset of the condition is not a trivial task and, therefore, it is important to design fast, reliable and interpretable tools to plan therapeutic interventions to prevent mortality, or any irreversible consequences caused by shock.
The study of the pathophysiology of shock and its management has been an active area of research in ML applications for Critical Care. For example, fuzzy decision support systems (DSS) for the management of post-surgical cardiac intensive care patients have been described in [
ML methods have also been used with varying success for the more specific problem of the prediction of mortality caused by sepsis. A diagnostic system for septic shock based on ANNs (Radial Basis Functions -RBF- and supervised Growing Neural Gas) was presented in [
The work reported in this paper attempts to identify clinical traits that can be used as predictors of mortality in patients with cardiogenic or septic shock. Missing data imputation techniques and ML models are used, followed by different approaches to FS from different data subsets. Selected features are later used to build causal Bayesian Networks (CBN), in order to reveal potential causal relationships between features and ICU outcome.
The remainder of the paper is structured as follows. The section Materials and methods describes the
The ML pipeline presented in this paper is divided into two main experimental phases: FS and causal discovery.
The FS experiments deal with the problem of finding the most promising attributes (clinical traits) for the prediction of mortality related to shock. To do this, three subsets of the
The evaluated feature sets were obtained by applying four different FS techniques to the aforementioned datasets: Univariate FS based on ANOVA F-value (
Finding a subset of attributes that are predictive of mortality due to shock may not suffice in a clinical setting. The causal discovery experiments reported next allow us to go one step further and analyze the causal relationships between the data features that were selected in the previous phase of the ML pipeline. In order to reveal these causal relationships, the Fast Greedy Equivalence Search (FGES) algorithm was used. The algorithm was applied twice for each of the feature subsets obtained in the FS experiments: first to the features alone, excluding the outcome, and then to both the features and the outcome. The expectation was to find few or no differences between the two runs in the edges that do not involve the outcome. That would mean that the CBNs are stable enough to draw some clear conclusions. The CBNs were built using the whole dataset with the selected features imputed by RF.
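The stability check described above can be illustrated with a minimal sketch. It assumes each learned structure is available as a list of node pairs; the function name and the example edges are hypothetical and are not results from the study. Edge orientation is ignored here for simplicity.

```python
def compare_feature_edges(edges_without_outcome, edges_with_outcome, outcome):
    """Compare two edge lists, ignoring orientation and edges touching the outcome."""
    def feature_edges(edges):
        # frozenset makes (A, B) and (B, A) compare equal
        return {frozenset(e) for e in edges if outcome not in e}

    a = feature_edges(edges_without_outcome)
    b = feature_edges(edges_with_outcome)
    return {"kept": a & b, "lost": a - b, "gained": b - a}


# Hypothetical edges, for illustration only
run_a = [("SOFA T1", "Norepinephrine T1"), ("Respiratory rate T1", "SOFA T1")]
run_b = [("Norepinephrine T1", "SOFA T1"),
         ("SOFA T1", "Result in ICU"),
         ("Platelet count T1", "SOFA T1")]
diff = compare_feature_edges(run_a, run_b, outcome="Result in ICU")
```

Under this reading, a pair of networks would count as stable when the `lost` and `gained` sets are small relative to `kept`.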
For the implementation and more detailed information about the methods, please refer to the corresponding sections.
This study is part of the prospective observational trial
The complete database integrates blood samples and hemodynamic recordings from septic shock and cardiogenic shock patients, and from septic patients, obtained in the ICU [
Parameter | Cardiogenic shock | Septic shock | p-values |
---|---|---|---|
Number of patients | 25 | 50 | - |
Gender (Males:Females) | 20:5 | 34:16 | - |
Outcome (Alive:Dead) | 19:6 | 38:12 | - |
Body mass index | 26.8 ± 5.93 | 25.22 ± 5.54 | 0.2591 |
Total days in ICU | 6.6 ± 5.42 | 7.56 ± 6.54 | 0.5289 |
SOFA T1 | 10.72 ± 4.27 | 11.50 ± 3.7 | 0.4165 |
SOFA T2 | 8.33 ± 4.92 | 8.77 ± 4.3 | 0.699 |
SOFA T3 | 5.87 ± 3.68 | 6.75 ± 4.7 | 0.5334 |
APACHE II T1 | 24.12 ± 9.79 | 23.16 ± 7.52 | 0.6396 |
APACHE II T2 | 18.42 ± 9.41 | 16.36 ± 8.45 | 0.3530 |
APACHE II T3 | 12.8 ± 5.35 | 14.83 ± 5.93 | 0.2727 |
Lactate levels T1 | 6.04 ± 4.3 | 4.84 ± 2.58 | 0.1431 |
Lactate levels T2 | 1.52 ± 0.91 | 2.1 ± 1.52 | 0.1126 |
Lactate levels T3 | 1.25 ± 0.5 | 1.43 ± 0.58 | 0.3309 |
Summary of patient populations with cardiogenic and septic shock. Numerical parameters of both populations (
The
The rows and the columns of the map correspond to observations and features, respectively. The black color represents missing values, the white color corresponds to present values in the
The
One research hypothesis is that the closer in time a feature is measured to the final outcome, the better it predicts mortality. To test this hypothesis, two further datasets were created, namely
A fourth feature set was built with features that are commonly assumed to be associated with the mortality of patients with shock, according to current practice. This feature set is referred to as
Name of the feature | Observations | Type |
---|---|---|
Which type of shock | 75 | categorical (4 val) |
Lactate levels (mmol/L) T1 | | numerical (cont) |
Lactate levels (mmol/L) T2 | | numerical (cont) |
Lactate levels (mmol/L) T3 | | numerical (cont) |
Mean arterial pressure (mmHg) T1 | 75 | numerical (cont) |
Mean arterial pressure (mmHg) T2 | | numerical (cont) |
Mean arterial pressure (mmHg) T3 | | numerical (cont) |
SOFA T1 | 75 | numerical (cont) |
SOFA T2 | | numerical (cont) |
SOFA T3 | | numerical (cont) |
APACHE II T1 | 75 | numerical (cont) |
APACHE II T2 | 71 | numerical (cont) |
APACHE II T3 | 44 | numerical (cont) |
Result in ICU | 75 | categorical (2 val) |
The columns of the table correspond to the name of the feature, the number of available observations and the type of the feature.
In order to identify promising feature sets, five FS techniques were applied to the first three data sets described in the previous section. The resulting feature subsets were used to train ML models and five performance measures were recorded. The first four measures are defined through the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN):
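The four count-based measures are accuracy, the Matthews correlation coefficient (MCC), sensitivity and specificity. A minimal sketch of their standard definitions follows. The function name is hypothetical, and two choices are assumptions: deceased patients are taken as the positive class, and a ratio with a zero denominator is returned as 0.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Standard measures from confusion-matrix counts (positive class assumed = deceased)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of actual positives that are detected
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives that are detected
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "mcc": mcc,
            "sensitivity": sensitivity, "specificity": specificity}
```

For example, a classifier that always predicts survival on 57 survivors and 18 deaths gives `confusion_metrics(0, 57, 0, 18)`, with sensitivity 0, specificity 1 and MCC 0.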
The experiment pipeline for the FS procedure can be described as follows. First, 100 random splits were created for each of the datasets, using 75% for training and the rest for testing. Then, for each pair of training and test sets RF imputation [
The process of FS and classification consists of the following steps: 1) create 100 random splits of the dataset, using 75% for training and 25% for testing; 2) for each split, impute both sets separately, using the imputed training set to impute the test set; 3) apply the FS technique to the imputed training set, varying the size of the selected feature set (from a minimum of 2 to a maximum of 60), and keep the selected features in both sets, creating new pairs of training and test sets; 4) finally, for each feature-set size, train and evaluate an ML model and choose the one with the highest AUC, recording which and how many features were used to train this model, together with its performance measures. These steps are repeated for all 100 random splits, for each dataset and FS technique.
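The steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's implementation: the data are synthetic, median imputation stands in for RF imputation, univariate ANOVA F-value selection is the only FS technique shown, a random forest is an assumed classifier, the splits are stratified, and only 5 splits and a reduced range of feature-set sizes are used to keep the example fast. The function name `run_split` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def run_split(X, y, k_values, seed):
    """One random 75/25 split: impute, select k features, keep the k with the best test AUC."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    # Stand-in for RF imputation: fitted on the training set, applied to both sets
    imputer = SimpleImputer(strategy="median").fit(X_tr)
    X_tr, X_te = imputer.transform(X_tr), imputer.transform(X_te)
    best = None
    for k in k_values:
        selector = SelectKBest(f_classif, k=k).fit(X_tr, y_tr)
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(selector.transform(X_tr), y_tr)
        scores = model.predict_proba(selector.transform(X_te))[:, 1]
        auc = roc_auc_score(y_te, scores)
        if best is None or auc > best["auc"]:
            best = {"auc": auc, "k": k,
                    "features": selector.get_support(indices=True)}
    return best


# Synthetic data with about 5% of the values removed at random
X, y = make_classification(n_samples=75, n_features=30, n_informative=5,
                           random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

k_values = list(range(2, 21, 3))
results = [run_split(X, y, k_values, seed=s) for s in range(5)]
```

Each entry of `results` records the best AUC for one split, the number of features that achieved it and their column indices, which is the information needed for the stability analysis.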
Now that the best feature sets in all pairs of splits were selected, the stability scores were calculated as frequencies: the individual feature occurrences in all chosen sets were counted and then divided by the number of pairs (100 in the reported experiments). In the
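The frequency computation described above can be sketched as follows. The function name and the toy feature sets are hypothetical.

```python
from collections import Counter


def stability_scores(selected_sets):
    """Fraction of splits in which each feature appears in the chosen feature set."""
    n_splits = len(selected_sets)
    # set(s) guards against a feature being counted twice within one split
    counts = Counter(f for s in selected_sets for f in set(s))
    return {feature: c / n_splits for feature, c in counts.items()}


# Toy example with 4 splits instead of 100
toy = [["SOFA T1", "Lactate T1"], ["SOFA T1"],
       ["SOFA T1", "APACHE II T1"], ["Lactate T1"]]
scores = stability_scores(toy)
```

In this toy example a feature selected in three of the four splits receives a score of 0.75.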
The RF data imputation was performed in
The scikit-learn [
The FGES algorithm was used to obtain the CBN structures that would reveal the interactions between the features. This is an optimized and parallelized version of the Greedy Equivalence Search algorithm. It heuristically searches the space of CBNs and returns the model with the highest value for the Bayesian Information Criterion (BIC) [
The algorithm works on the assumption that the causal process generating the data is accurately modeled by a CBN. Each node in this CBN is a linear function of its parents, plus a finite additive Gaussian noise term. Each observation in the data is assumed to be independent and obtained by randomly sampling all the variables from the joint distribution. Given all these assumptions, the FGES procedure outputs the CBN structure that contains:
an arc X → Y, if and only if X causes Y;
an undirected edge X - Y, if and only if either X causes Y or Y causes X;
no edge between X and Y, if and only if X and Y have no direct causal relationship between them.
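To make the scoring idea concrete, the sketch below computes a BIC-style local score for one node under the linear-Gaussian assumption. It is not the FGES implementation used in the study: the function name is hypothetical, the score is written up to an additive constant, and the sign convention (higher is better) is an assumption chosen to match a search that maximizes the criterion.

```python
import numpy as np


def local_bic(data, child, parents):
    """BIC-style score of column `child` regressed on columns `parents` (higher is better)."""
    n = data.shape[0]
    y = data[:, child]
    # Design matrix: intercept plus one column per parent
    design = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid_var = np.mean((y - design @ coef) ** 2)
    # Twice the maximized Gaussian log-likelihood (up to a constant) minus the penalty
    return -n * np.log(resid_var) - design.shape[1] * np.log(n)


# Synthetic example: column 1 depends linearly on column 0
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=0.5, size=200)
data = np.column_stack([x0, x1])

score_with_parent = local_bic(data, child=1, parents=[0])
score_without_parent = local_bic(data, child=1, parents=[])
```

A greedy search of this kind adds an edge when doing so raises the total score; here the score of column 1 is higher with column 0 as a parent than without it.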
The
The FS experiments are divided into three groups. Each group corresponds to its own dataset:
# Features | Data | Method | Accuracy | MCC | Sensitivity | Specificity | AUC |
---|---|---|---|---|---|---|---|
- | - | Majority class | 0.763 ± 0.09 | 0.0 ± 0.0 | 0.0 ± 0.0 | 1.0 ± 0.0 | 0.5 ± 0.0 |
11 | IFS | - | 0.644 ± 0.124 | 0.028 ± 0.255 | 0.269 ± 0.252 | 0.767 ± 0.14 | 0.518 ± 0.139 |
13.5 ± 13.6 | T1 | UFS | 0.828 ± 0.083 | 0.573 ± 0.183 | 0.731 ± 0.217 | 0.873 ± 0.095 | 0.802 ± 0.108 |
16.5 ± 13.2 | T1 | RFE | 0.823 ± 0.079 | 0.549 ± 0.19 | 0.702 ± 0.23 | 0.876 ± 0.089 | 0.789 ± 0.114 |
14.5 ± 13.0 | T1 | UFS+RFE | 0.814 ± 0.077 | 0.52 ± 0.181 | 0.656 ± 0.226 | 0.876 ± 0.092 | 0.766 ± 0.106 |
18.1 ± 10.8 | T1 | RF | | | | | |
15.7 ± 12.8 | T1 | Aggr. | 0.834 ± 0.08 | 0.577 ± 0.186 | 0.711 ± 0.221 | 0.887 ± 0.09 | 0.799 ± 0.109 |
10.8 ± 14.2 | T1+T2 | UFS | 0.839 ± 0.085 | 0.584 ± 0.207 | 0.715 ± 0.229 | 0.889 ± 0.091 | 0.802 ± 0.116 |
17.5 ± 15.8 | T1+T2 | RFE | 0.807 ± 0.089 | 0.503 ± 0.205 | 0.637 ± 0.265 | 0.874 ± 0.101 | 0.755 ± 0.122 |
12.9 ± 10.7 | T1+T2 | UFS+RFE | 0.796 ± 0.074 | 0.465 ± 0.175 | 0.596 ± 0.251 | 0.873 ± 0.097 | 0.734 ± 0.109 |
21.2 ± 14.4 | T1+T2 | RF | | | | | |
15.6 ± 13.9 | T1+T2 | Aggr. | 0.83 ± 0.087 | 0.561 ± 0.208 | 0.678 ± 0.246 | 0.89 ± 0.095 | 0.784 ± 0.12 |
17.9 ± 17.2 | Full | UFS | 0.754 ± 0.105 | 0.31 ± 0.23 | 0.431 ± 0.256 | 0.861 ± 0.129 | 0.646 ± 0.119 |
19.7 ± 16.5 | Full | RFE | 0.758 ± 0.088 | 0.334 ± 0.213 | 0.483 ± 0.278 | 0.855 ± 0.114 | 0.669 ± 0.118 |
17.4 ± 15.2 | Full | UFS+RFE | 0.743 ± 0.09 | 0.311 ± 0.21 | 0.477 ± 0.251 | 0.839 ± 0.123 | 0.658 ± 0.111 |
21.9 ± 15.5 | Full | RF | | | | | |
19.2 ± 16.2 | Full | Aggr. | 0.769 ± 0.096 | 0.371 ± 0.228 | 0.501 ± 0.267 | 0.864 ± 0.117 | 0.683 ± 0.123 |
Best feature sets obtained in the FS experiments and their general performance across 100 random data splits. The columns correspond to the number of features, the data that were used to obtain the feature set, the FS method and five performance measures (
As it can be seen from
For the causal discovery experiments, it was decided to choose the 20 features with the highest stability scores for the models whose results are shown in
# Features | Data | Method | Accuracy | MCC | Sensitivity | Specificity | AUC |
---|---|---|---|---|---|---|---|
18.1 ± 10.8 | T1 | RF | 0.87 ± 0.07 | 0.668 ± 0.152 | 0.755 ± 0.198 | 0.922 ± 0.074 | 0.839 ± 0.092 |
15.7 ± 12.8 | T1 | Aggr. | 0.834 ± 0.08 | 0.577 ± 0.186 | 0.711 ± 0.221 | 0.887 ± 0.09 | 0.799 ± 0.109 |
21.2 ± 14.4 | T1+T2 | RF | | | | | |
15.6 ± 13.9 | T1+T2 | Aggr. | 0.83 ± 0.087 | 0.561 ± 0.208 | 0.678 ± 0.246 | 0.89 ± 0.095 | 0.784 ± 0.12 |
21.9 ± 15.5 | Full | RF | 0.819 ± 0.082 | 0.53 ± 0.176 | 0.614 ± 0.244 | 0.902 ± 0.087 | 0.758 ± 0.111 |
19.2 ± 16.2 | Full | Aggr. | 0.769 ± 0.096 | 0.371 ± 0.228 | 0.501 ± 0.267 | 0.864 ± 0.117 | 0.683 ± 0.123 |
The columns of the table correspond to the number of features, the data that was used to obtain the feature set, the FS method used and five performance measures (
[T1, RF]: APACHE II T1, SOFA T1, Respiratory rate T1, Urine Output (mL/day) T1, E wave (cm/s) T1, Fluid Balance (ml) T1, K Ur T1, Lactate levels (mmol/L) T1, Tidal volume (VT) T1, Pulmonary artery systolic pressure (TR jet by CW + CVP) (mmHg) T1, * Norepinephrine (mg/kg/min) T1, PCT Value (mg/mL) T1, Heart rate (bpm) T1, E/e’ T1, PEEP T1, Pplat T1, Platelet count T1, Na Ur T1, pH T1, PaO2/FiO2 T1;
[T1, Aggr.]: APACHE II T1, SOFA T1, Respiratory rate T1, Glasgow Coma Scale T1, K Ur T1, Platelet count T1, Tidal volume (VT) T1, Tricuspid regurgitation maximal velocity (by CW) (cm/s) T1, HCO3 (mmol/L) T1, LA dilatation by eyeballing T1, Pulmonary artery systolic pressure (TR jet by CW + CVP) (mmHg) T1, Base Excess (mmol/L) T1, E wave (cm/s) T1, PCT Value (mg/mL) T1, Respiratory rate (rpm) T1, Pplat T1, Tricuspid annular tissular doppler S wave (DTI) (cm/s) T1, * Norepinephrine (mg/kg/min) T1, * Dobutamine (mg/kg/min) T1, Lactate levels (mmol/L) T1;
[T1+T2, RF]: Lactate levels (mmol/L) T2, SOFA T2, Tidal volume (VT) T2, APACHE II T1, Urine Output (mL/day) T1, Urine Output (mL/day) T2, APACHE II T2, Respiratory rate T1, K Ur T2, SOFA T1, * Norepinephrine (mg/kg/min) T2, Neutro abs count T2, E wave (cm/s) T1, Na Ur T2, pH T2, K Ur T1, Glasgow Coma Scale T2, Tricuspid regurgitation maximal velocity (by CW) (cm/s) T1, PT T2, Pulmonary artery systolic pressure (TR jet by CW + CVP) (mmHg) T1;
[T1+T2, Aggr]: APACHE II T1, SOFA T2, Tricuspid regurgitation maximal velocity (by CW) (cm/s) T1, Glasgow Coma Scale T1, Neutro abs count T2, Glasgow Coma Scale T2, Platelet count T1, K Ur T1, APACHE II T2, SOFA T1, Respiratory rate T1, HCO3 (mmol/L) T1, Lactate levels (mmol/L) T2, E wave (cm/s) T1, Tidal volume (VT) T1, Respiratory rate (rpm) T1, K Ur T2, Na Ur T2, PT T2, LA dilatation by eyeballing T1;
[Full, RF]: Neutro abs count T1, Platelets (10 3/mm 3) T3, E/e’ T2, Lateral e’ (cm/s) T2, Creat Ur T2, Urine Output (mL/day) T2, Lateral e’ (cm/s) T1, Respiratory rate (rpm) T1, Lympho abs count T1, Mean arterial pressure (mmHg) T1, Diastolic Blood Pressure (mmHg) T2, E wave deceleration time (ms) T2, Tricuspid annular tissular doppler S wave (DTI) (cm/s) T1, Sat O2/FiO2 T2, Platelets (10 3/mm 3) T1, FiO2 T3, Tidal volume (VT) T1, RBC count T3, Fluid Balance (ml) T2, Platelet count T2;
[Full, Aggr]: A wave (cm/s) T2, Systolic blood pressure (mmHg) T2, Heart rate (bpm) T1, Respiratory rate (rpm) T1, Weight (kg), Height (cm), A wave (cm/s) T1, PT T1, E wave deceleration time (ms) T3, Hematocrit (%) T2, Platelet count T3, Creat Ur T1, E wave deceleration time (ms) T1, PaCO2 (mmHg) T1, K Ur T1, E wave (cm/s) T2, aPTT T1, Mean arterial pressure (mmHg) T1, Base Excess (mmol/L) T1, PaO2 (mmHg) T2.
Within square brackets: the subset of data that was used to obtain the feature set and the applied FS technique.
For each of the feature sets there are two corresponding CBNs. The first one (the CBN ‘a’ was built without the target feature (
The CBN for the
The CBN for the
In most CBN pairs, the presence of the target value did not imply much change. The majority of edges of the
Analyzing the results of the FS experiments, one may notice that, for each of the datasets, many features with high stability scores recur. These features can be clearly seen from the
Regarding the
In contrast, the worst performance was that obtained for the
The performance of features from different datasets and models was further compared to that of the model that showed the best results, which was the
The CBN structures revealed connections between the features and the outcome. The assumption behind the evaluation of the most important clinical attributes for outcome prediction was that such features are the closest to the outcome in a graph. The direction of the edge from feature to the outcome is also a good indicator that a certain feature is important.
According to the
Some parallels can be seen between the features that achieved high
High stability scores features: APACHE II T1, SOFA T1, Respiratory rate T1, Glasgow Coma Scale T1, K Ur T1, SOFA T2, Tricuspid regurgitation maximal velocity T1, Neutro abs count T2, Glasgow Coma Scale T2.
Causal discovery features: Norepinephrine T1, SOFA T1, Respiratory rate T1, Platelet count T1, E wave T1, Dobutamine T1, Tricuspid regurgitation maximal velocity T1, Neutro abs count T2, Norepinephrine T2, SOFA T2, APACHE II T2.
List of attributes that were considered promising for the outcome prediction in patients with shock.
These findings agree with other studies that support the use of certain attributes for mortality prediction. For instance, the SOFA score has previously been shown to have a significant prognostic value for in-hospital mortality prediction [
Finally, it should be noted that some features seem more important at certain time-steps. For example, the difference in stability scores shows that
In this paper, we have analyzed and experimentally compared several models for the prediction of mortality in patients with cardiogenic or septic shock, identifying the clinical traits that are most relevant for such prediction in the ICU at different time-points. These models result from applying different FS techniques to the available clinical data. In particular, we were interested in obtaining an actionable classifier for the acute phase of shock, which is the most critical for implementing an appropriate therapeutic response and, therefore, has the greatest impact on patient prognosis. For the acute phase (i.e. the first 24 hours of evolution of shock), we have obtained a classifier that leverages well-established attributes for septic shock such as the SOFA or APACHE II scores and treatment with vasopressors (for example, norepinephrine), which are part of the cardiovascular SOFA score. This classifier also relies on other valuable attributes such as those obtained through echocardiography during cardiogenic shock (e.g. E wave velocity and the ratio of early mitral inflow velocity to mitral annular early diastolic velocity (E/e’)). Other important attributes are related to respiratory function (such as respiratory rate, tidal volume and ventilator pressures), fluid balance, pulmonary artery pressure, renal function (such as urine output and urea levels), lactate levels, acidosis and C-reactive protein levels. These significant attributes are very closely related to the new official definition of shock and its corresponding management guidelines, in particular those related to organ dysfunction and respiratory function. However, our causal discovery experiments showed that treatment with pressors, respiratory rate, platelet count and the SOFA score presented the strongest dependence on ICU outcome. The best classifier for the acute phase yielded an accuracy of 0.870, a sensitivity of 0.755 and a specificity of 0.922.
The accuracy results were not statistically different from those obtained with the best classifier at T1 and T2.
The main conclusion of this study is that it is possible to predict the risk of death in the acute phase of septic and cardiogenic shock with quite acceptable results by taking into consideration the attributes routinely measured through echocardiography in the ICU. The use of these data to assess prognosis is considered valuable for the clinical management of patients with shock, either at ICU admission or during the first day of evolution.
The results presented here open up a new line of research for the study of the pathophysiology of shock when combined with other data related to organ dysfunction beyond the commonly used scales such as the SOFA score. In the next step of this research, we plan to add other types of data that are related to the patient’s response to shock and assess to what extent the expanded models may yield improved results. This new layer of data will include transcriptomics, proteomics and metabolomics features as detailed in the scope of the ShockOmics European project.
The resulting features from applying multiple FS techniques to the three available datasets, their stability scores, additional CBNs and a full list of ShockOmics attributes.
(PDF)
This research was supported by the European FP7 project ShockOmics (Nr. 602706), Multiscale approach to the identification of molecular biomarkers in acute heart failure induced by shock, and Spanish research project TIN2016-79576-R.