Machine learning-based prediction of in-hospital mortality using admission laboratory data: A retrospective, single-site study using electronic health record data

doi:10.1371/journal.pone.0246640

Fig 1.

Schematic of admission case selection and data preprocessing.

Fig 2.

Schematic of multiple imputation, cross-validation, model training, and model testing.

Missing data were filled by multiple imputation repeated m (= 20) times, which generated m (= 20) complete data sets. In the training phase, k-fold cross-validation was performed with k = 5. Four machine learning methods (logistic regression, random forest, multilayer perceptron, and gradient boosting decision tree) were applied in this study.
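The workflow in this caption can be illustrated with a minimal sketch, assuming toy data and a deliberately simple imputation rule. The random hot-deck draw below is a stand-in chosen to keep the example dependency-free; it is not the imputation method used in the study, and the helper names `impute_once` and `k_fold_indices` are hypothetical.

```python
import random

M_IMPUTATIONS = 20  # m in the caption
K_FOLDS = 5         # k in the caption


def impute_once(rows, rng):
    """Return a copy of rows with each None replaced by a randomly drawn
    observed value from the same column (hot-deck stand-in, an assumption)."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [
        [v if v is not None else rng.choice(observed[j]) for j, v in enumerate(r)]
        for r in rows
    ]


def k_fold_indices(n_rows, k, rng):
    """Yield (train_idx, valid_idx) index lists for k-fold cross-validation."""
    idx = list(range(n_rows))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        valid = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, valid


rng = random.Random(0)

# Toy training rows; missing laboratory values are encoded as None.
train_rows = [
    [rng.gauss(0, 1) if rng.random() > 0.2 else None for _ in range(3)]
    for _ in range(50)
]

# Step 1: m completed copies of the data, each with different random draws.
completed_sets = [impute_once(train_rows, rng) for _ in range(M_IMPUTATIONS)]

# Step 2: a k-fold split for each completed data set.
splits_per_set = [
    list(k_fold_indices(len(data), K_FOLDS, rng)) for data in completed_sets
]
```

Each entry of `splits_per_set` holds five train/validation index pairs, so a model could be fitted and validated five times per completed data set before being evaluated on held-out test data.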

Fig 3.

(A) The receiver operating characteristic (ROC) curves and the area under the ROC curve (AUROC) of the models. The results of prediction using all test data are shown. AUROC is shown with its 95% confidence interval (CI). (B) The precision-recall curves and the area under the precision-recall curve (AUPRC) of the models. The results of prediction using all test data are shown. AUPRC is shown with its 95% CI. (C) The distribution of the predicted probability of in-hospital mortality and the observed in-hospital mortality. The results of logistic regression (LR), random forest (RF), multilayer perceptron (MLP), and gradient boosting decision tree (GBDT) are shown. The bar graph (left) shows the distribution of observed in-hospital mortality in the test data (n = 33,970) and of the predicted probabilities of in-hospital mortality obtained from the prediction models. The table (right) shows the observed in-hospital mortality and the number of patients in each range of predicted probability obtained from the prediction models.
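The two summary metrics in panels (A) and (B) can be computed as in the following minimal sketch, which uses toy labels and scores. The caption does not state how the 95% CIs were obtained; the percentile bootstrap used here is an assumption, and `bootstrap_ci` is a hypothetical helper.

```python
import random


def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney) formula; tied scores share ranks."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision)."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    tp = prev_tp = 0
    total = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]  # consume every sample tied at this threshold
            j += 1
        # Precision at this threshold times the recall gained at it.
        total += (tp / j) * (tp - prev_tp) / n_pos
        prev_tp = tp
        i = j
    return total


def bootstrap_ci(metric, y_true, y_score, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method, not taken from the caption)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # both classes are needed to compute the metric
            stats.append(metric(yt, [y_score[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Toy test set: roughly 20% positives, with scores shifted upward for positives.
rng = random.Random(1)
y_true = [1 if rng.random() < 0.2 else 0 for _ in range(300)]
y_score = [rng.gauss(1.0 if y else 0.0, 1.0) for y in y_true]

auroc_point = auroc(y_true, y_score)
auroc_ci = bootstrap_ci(auroc, y_true, y_score)
auprc_point = average_precision(y_true, y_score)
auprc_ci = bootstrap_ci(average_precision, y_true, y_score)
```

Unlike AUROC, AUPRC depends on the prevalence of the positive class, which is one reason both curves are commonly reported together for a rare outcome such as in-hospital mortality.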

Fig 4.

Results of the Shapley additive explanations (SHAP) value calculation.

The results of logistic regression, random forest, multilayer perceptron, and gradient boosting decision tree are shown. The plots on the right side show the distribution of SHAP values calculated with the test data set (n = 33,970). Each point represents one prediction in the test data set. Point color indicates the variable value, as shown in the color scale bars on the right side of the figures. A positive SHAP value indicates a contribution toward in-hospital mortality within 14 days, and a negative SHAP value indicates the opposite. The bar graphs on the left side show the mean absolute SHAP value of each variable. Variable names are displayed in the center, ordered by mean absolute SHAP value.
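The ranking by mean absolute SHAP value can be sketched for the simplest case, a linear model whose features are treated as independent. In that case the SHAP value of feature j for one sample is the coefficient times the feature's deviation from its mean, on the log-odds scale for logistic regression. The coefficients, variable names, and data below are hypothetical, and this hand-computed form does not apply to the tree-based or neural network models, which need a dedicated explainer.

```python
import random


def linear_shap(weights, X):
    """SHAP values of a linear model under feature independence:
    phi[i][j] = weights[j] * (X[i][j] - mean of column j)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(weights))]
    return [[w * (x - m) for w, x, m in zip(weights, row, means)] for row in X]


def mean_abs_by_feature(shap_values):
    """Mean absolute SHAP value per feature (the bar graph quantity)."""
    n = len(shap_values)
    n_features = len(shap_values[0])
    return [sum(abs(row[j]) for row in shap_values) / n for j in range(n_features)]


rng = random.Random(0)
feature_names = ["lab_a", "lab_b", "lab_c"]  # hypothetical variable names
weights = [1.5, -0.2, 0.7]                   # hypothetical log-odds coefficients
X = [[rng.gauss(0, 1) for _ in weights] for _ in range(200)]  # toy test set

shap_values = linear_shap(weights, X)        # one row of SHAP values per prediction
importance = mean_abs_by_feature(shap_values)

# Variables ordered by mean absolute SHAP value, largest first.
ranking = sorted(zip(feature_names, importance), key=lambda p: -p[1])
```

For every sample the SHAP values sum to the model output minus the average output, which is why each point in the right-hand plots can be read as a signed contribution to that sample's prediction.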
