A machine learning based exploration of COVID-19 mortality risk

doi:10.1371/journal.pone.0252384

Fig 1.

Illustration of the modeling framework.

Three machine learning models were developed using the SVM framework with three input groups: invasive, non-invasive, and their combination (joint). The invasive group comprises laboratory results. The non-invasive group comprises patient clinical and demographic data. The joint group combines the invasive and non-invasive features. P1, P2, and P3 represent the prediction performance of the non-invasive, joint, and invasive models, respectively. The non-invasive model predicted well for the more distant future (P1), whereas the invasive model predicted well for the near future (P3). Neighborhood Component Analysis (NCA), recursive feature elimination via Support Vector Machine (SVM-RFE), and linear SVM with least absolute shrinkage and selection operator (Lasso) sparsity regularization (Sparse Linear SVM) were used to inspect feature contributions and their dynamics with respect to the outcome.
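
For orientation, a linear SVM with Lasso (L1) sparsity regularization is commonly written as the objective below. This is a textbook form given as a sketch, not necessarily the exact loss used in the study; the symbols (weights w, bias b, regularization strength λ, labels y_i in {-1, +1}, feature vectors x_i, sample count n) are assumptions introduced here for illustration.

```latex
\min_{w,\,b} \;\; \lambda \,\lVert w \rVert_1
  \;+\; \frac{1}{n} \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_i \,(w^{\top} x_i + b)\bigr)
```

The L1 penalty drives some weights w_j to exactly zero, so increasing λ leaves fewer active features. This is what makes it possible to compare models at a fixed number of predictors.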

Table 1.

Criteria for disease diagnosis and severity assessment upon hospital admission.

Table 2.

Demographic, clinical features and mortality outcome of patients collected from medical records.

Table 3.

Patients’ laboratory data collected from medical records.

Fig 2.

Contribution of demographic, clinical, and laboratory features to mortality prediction.

(A) The results of the regularized NCA analysis display the contribution of single features to mortality prediction. Features are sorted by contribution importance and category. Features with prominent weights are marked with orange squares for visual convenience. (B) A favorable feature space (PTT and age), where the information content of the features with respect to the outcome is high, so many data points can be visually separated by an illustrative decision border. (C) In contrast, an unfavorable feature space (sex and Hgb), where the low information content of the features leaves the data points crowded together and hard to distinguish. Panels B and C were created using half of the data and Principal Component Analysis (PCA) for illustrative purposes.

Fig 3.

Comparison of mortality prediction of invasive and non-invasive models.

(A) ROC curves of the joint, invasive, and non-invasive models. (B) Investigation of the models’ performance and robustness with respect to sample size. For each data point, a model was trained and evaluated using 90% of the data, randomly bootstrapped from the main dataset while maintaining the original discharged-to-expired ratio. The models were robust to the sample size, and no significant difference was observed between the performance of the invasive and non-invasive models. (C) Performance table of the invasive, non-invasive, and joint models. Performances are reported as means with standard deviations. (D) Comparison of the dynamics of laboratory and non-invasive features for randomly selected combinations of features. (E) Recursive feature elimination. Compared with invasive features, the prominent non-invasive features carried significant predictive information. In general, the three non-invasive features that contributed most to improving the non-invasive model’s performance were SPO₂, age, and presence of cardiovascular disorders; the first three invasive features were BUN, LDH, and PTT. (F) Sparsity analysis. Sparse linear SVM was used to investigate optimal feature combinations for fixed numbers of predictors. For a given sparsity level (number of features), the non-invasive model performs better than the invasive model. Green and gray represent the non-invasive and invasive models, respectively.
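
The resampling described for panel (B) can be sketched as a class-stratified draw. The snippet below is a minimal illustration under stated assumptions, not the authors' code: the function name `stratified_subsample`, the 0/1 label encoding, and the toy counts are all hypothetical, and it samples without replacement, which the caption does not specify.

```python
import random


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a random subsample that keeps the class ratio.

    labels: list of 0/1 outcomes (assumed encoding: 0 = discharged, 1 = expired).
    fraction: share of each class to draw, e.g. 0.9.
    """
    rng = random.Random(seed)
    chosen = []
    for cls in sorted(set(labels)):
        # Indices of all patients belonging to this outcome class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Draw the same fraction from every class (without replacement).
        k = max(1, round(fraction * len(idx)))
        chosen.extend(rng.sample(idx, k))
    return sorted(chosen)


# Toy data: 80 discharged, 20 expired (made-up numbers, not from the paper).
labels = [0] * 80 + [1] * 20
subset = stratified_subsample(labels, 0.9)
n_expired = sum(labels[i] for i in subset)
print(len(subset), n_expired)
```

Drawing the same fraction from each class keeps the discharged-to-expired ratio of the subsample equal to that of the full dataset, up to rounding.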

Fig 4.

Temporal range model predictions.

(A) Temporal distribution of patient expiration intervals. The black vertical dashed line marks the peak of the expiration distribution, 3 days after admission. The gray vertical dashed line marks the median expiration interval, 7 days after admission. (B) and (C) Prediction performance of the invasive and non-invasive models across the expiration temporal spectrum. For panel (B), the invasive and non-invasive models were trained on the entire dataset; expiration prediction performance was then evaluated for 8 different expiration intervals. Days to outcome is the number of days between patient admission and expiration. For panel (C), patient data were divided into three expiration intervals: from admission to day 3, from day 3 to day 7, and after day 7. For each interval, independent SVM models were trained, and the true expiration ratio (true positive rate) was reported for each interval’s model. While invasive features were better predictors of imminent expiration, they were outperformed by non-invasive features over longer expiration intervals. Green and gray represent the non-invasive and invasive models, respectively.
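
The interval binning and the true positive rate reported per interval can be sketched as follows. This is an illustrative sketch only: it shows the metric and the grouping, not the per-interval model training, and the function names, the boundary handling (day 3 and day 7 are assigned to the earlier interval), and the toy data are assumptions rather than details from the study.

```python
def interval_of(days):
    """Map days from admission to expiration onto the three caption intervals."""
    if days <= 3:
        return "admission to day 3"
    if days <= 7:
        return "day 3 to day 7"
    return "after day 7"


def tpr_by_interval(days_to_expiration, predicted_expired):
    """True positive rate among expired patients, grouped by expiration interval.

    Both arguments describe expired patients only, so every correct
    "expired" prediction counts as a true positive.
    """
    hits, totals = {}, {}
    for days, pred in zip(days_to_expiration, predicted_expired):
        key = interval_of(days)
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + int(pred)
    return {key: hits[key] / totals[key] for key in totals}


# Toy data for expired patients only (made-up values, not from the paper).
days = [1, 2, 3, 5, 6, 9, 12, 20]
pred = [1, 1, 0, 1, 0, 1, 1, 0]  # 1 = model predicted expiration
rates = tpr_by_interval(days, pred)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```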
