Feature selection for the accurate prediction of septic and cardiogenic shock ICU mortality in the acute phase

Circulatory shock is a life-threatening disease that accounts for around one-third of all admissions to intensive care units (ICU). It requires immediate treatment, which is why tools for planning therapeutic interventions are needed to deal with shock in the critical care environment. In this study, the original database of the ShockOmics European project is used to extract attributes capable of predicting mortality due to shock in the ICU. Missing data imputation techniques and machine learning models were used, followed by feature selection from different data subsets. Selected features were later used to build Bayesian Networks, revealing causal relationships between features and ICU outcome. The main result is a subset of predictive features that includes well-known indicators such as the SOFA and APACHE II scores, but also less commonly considered ones related to cardiovascular function assessed through echocardiography or shock treatment with pressors. Importantly, certain selected features are shown to be most predictive at certain time-steps. This means that, as shock progresses, different attributes could be prioritized. Clinical traits obtained at 24 h after ICU admission are shown to accurately predict cardiogenic and septic shock mortality, suggesting that relevant life-saving decisions could be made shortly after ICU admission.


Introduction
categorical values, but the majority of data is numerical. The ShockOmics dataset was split into three datasets. The first dataset was named Full and it was obtained by filtering features from the original ShockOmics dataset. Certain features that did not make sense from the classification viewpoint were manually removed. They were either very general comments in natural language or explicitly revealed some information about the outcome of the patient. Both cases were deemed not to be reliable as features to feed the ML model. In detail, the following features were manually removed: Reason for admission; ICU admission; RV area/LV area (T1, T2, T3); Microorganisms (three columns with the same name); ID; Death due to withdrawal of care; Mortality 28 days, 100 days (two columns); Hospital results; Total days in ICU, in Hospital (two columns). The resulting dataset had 316 features with one target feature. A research hypothesis is that the closer in time the feature to the final outcome, the better the prediction of mortality. To test this hypothesis, two further datasets were

A fourth feature set was built with features that are commonly assumed to be associated with the mortality of patients with shock, according to current practice. This feature set is referred to as the initial feature set (IFS).

Table 3 shows the comparison of different feature sets: their size, the dataset and the FS method used. The feature sets from the same dataset are grouped together. The additional IFS dataset is used as a baseline against which the performance of the remaining feature sets is compared. A naive majority classifier, which always predicts the most frequent label in the training set, was also added for baseline comparison. This table presents only the best results for each FS method.

The columns correspond to the number of features, the data that was used to obtain the feature set, the FS method and five performance measures (mean ± std (p-value)). Welch's t-test is used to obtain p-values for the null hypothesis that two performance measures have identical values. Each measure was tested against the same measure of the Full (7, UFS) feature set. The IFS is used as a baseline for comparison. The best results for accuracy are highlighted in bold.
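As a minimal sketch of the comparison described in the caption, Welch's t statistic and its Welch-Satterthwaite degrees of freedom can be computed from two samples of a performance measure as follows. The function name `welch_t` and the score lists are hypothetical and invented purely for illustration; they are not results from this study. Turning the statistic into a p-value additionally requires the Student t distribution, which a library routine such as SciPy's `ttest_ind` with `equal_var=False` provides.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance uses n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Unequal variances are allowed, so the degrees of freedom are estimated.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical repeated-run accuracies (illustrative values only).
reference_scores = [0.80, 0.82, 0.78, 0.81, 0.79]
candidate_scores = [0.70, 0.75, 0.65, 0.72, 0.68]

t_stat, dof = welch_t(reference_scores, candidate_scores)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Unlike Student's t-test, this form does not assume that the two sets of scores share a common variance, which is a reasonable default when the compared models differ in stability.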

As can be seen from Table 3, the UFS and the RF models produce consistently good results. Their performance is better than that of the models with the IFS, and this is especially noticeable when it comes to the MCC. The best results in the experiments were found for the UFS FS method in the Full dataset, while the second best was the UFS+RFE technique with the same dataset. This is interesting, since UFS+RFE showed poor performance in the other two datasets. The RFECV features achieved the worst results. RFECV uses the SVC model for cross-validation, whereas the performance was measured with the G-NB. The G-NB model could not be used within RFECV, since it cannot evaluate the importance of features. When the SVC model was used for testing the performance, it showed slightly better results. The RFE features showed reasonably good performance, but worse than the UFS and the RF.
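One way to see why the MCC separates models from a trivial baseline more clearly than accuracy does is to score the naive majority classifier on an imbalanced label set. The sketch below is illustrative only: the helper names and the toy labels are assumptions, not data from this study, and returning 0 when the MCC denominator vanishes is a common convention rather than something stated in the text.

```python
from collections import Counter
from math import sqrt


def majority_label(train_labels):
    """Most frequent label in the training set (naive majority baseline)."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed here): MCC is 0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced labels: 0 = survived, 1 = died (illustration only).
train = [0] * 7 + [1] * 3
test = [0] * 7 + [1] * 3

baseline_pred = [majority_label(train)] * len(test)
baseline_acc = accuracy(test, baseline_pred)
baseline_mcc = mcc(test, baseline_pred)
print(baseline_acc, baseline_mcc)
```

On this toy split the baseline is right on every majority-class case, so its accuracy looks respectable, while its MCC is zero because it never identifies a positive case.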

In order to obtain the most promising feature sets, the results from both tables were filtered based on their performance (see Table 4). For this purpose, different

List of promising feature sets: the subset of data that was used to obtain the feature set; the size and the applied FS technique are in parentheses.

For each of the feature sets there are two corresponding CBNs. The first one (the CBN "a") was built without the target feature (Result in ICU), the second one (the CBN "b") included all features and the target.

Features like SOFA, APACHE II, Lactate levels, Respiratory rate, Urine Output and X Norepinephrine usually scored high. In most of the cases, SOFA and APACHE II were also highly rated by other FS methods. Additionally, it seems that certain features are highly valuable only at certain time steps. For example, Lactate levels seem to have the most predictive value at T2, X Norepinephrine and Respiratory rate are particularly valuable at T1, Urine Output at T1 and T2, and Heart rate at T3. These hypotheses were further tested in the causal discovery experiments.

Although such patterns of features are noticeable, the differences between feature

The assumption behind the evaluation of the most important clinical attributes for outcome prediction was that such features are the closest to the outcome in a graph.

The direction of the edge from feature to the outcome is also a good indicator that the

close connection to the target feature makes them valuable for mortality prediction, as well. Such features include X Norepinephrine at T1, FiO2 at T2, Sedation Scale SAS at T2, FiO2 at T2, Lactate levels at T2, SOFA at T1, Creatinine at T3. SOFA at T1 and FiO2 at T2 appeared close to Result in ICU twice, which increases the confidence in their importance.
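The idea of ranking features by how close they sit to the outcome in a graph, and of checking which edges point into the outcome, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the CBNs: the graph is a hypothetical toy example with placeholder nodes F1 to F4, closeness is taken to be the number of edges on the shortest path ignoring edge direction, and the function names are invented here.

```python
from collections import deque


def distances_to_target(edges, target):
    """Shortest path length (in edges, ignoring direction) from each node to target."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:  # breadth-first search outwards from the target
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def parents_of(edges, target):
    """Nodes with a directed edge pointing into the target."""
    return sorted(src for src, dst in edges if dst == target)


TARGET = "Result in ICU"
# Hypothetical directed edges (source, destination); not taken from the study.
toy_edges = [("F1", TARGET), ("F2", "F1"), (TARGET, "F3"), ("F4", "F2")]

dist = distances_to_target(toy_edges, TARGET)
ranking = sorted((d, name) for name, d in dist.items() if name != TARGET)
print(ranking)
print(parents_of(toy_edges, TARGET))
```

In this toy graph F1 and F3 are both one edge away from the outcome, but only F1 has an edge directed into it, which is the kind of distinction the edge direction is meant to capture.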

These findings agree with other studies that support the use of certain attributes for mortality prediction. For instance, the SOFA score has previously been shown to have a significant prognostic value for in-hospital mortality prediction [

The main conclusion of this study is that it is possible to predict risk of death in the acute phase of septic and cardiogenic shock with quite acceptable results by taking into consideration the attributes routinely measured through echocardiography in the ICU. Use of this data for assessing the prognosis of patients is considered valuable for the clinical management of patients with shock in the ICU.

The first 30 features stability scores using the T1+T2 dataset: for the RF (Random Forest) feature selection and for the rest of the feature selection techniques (T1+T2).

The first 30 features stability scores using the Full dataset: for the RF (Random Forest) feature selection and for the rest of the feature selection techniques (Full).