Using tree-based ensemble methods to produce a population-based mortality risk score in Ontario, Canada

doi:10.1371/journal.pone.0347302

Table 1.

Prevalence and crude risk of 1-year all-cause mortality by comorbidity in the full cohort (N = 12,080,801).

Table 2.

Model performance statistics on the test set.

Fig 1.

Calibration curves and receiver operating characteristic (ROC) curves for the best-performing models.

A) Model 1H (CatBoost; learning rate = 0.05, min samples leaf = 10, no regularization, 1000 iterations, depth = 6); B) Model 2E (CatBoost; same as Model 1H, but with the inclusion of primary cancer types, chronic kidney disease stages, and comorbidities categorized by hospital encounter source). AUROC – area under the ROC curve.
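
The two quantities plotted in this figure can be illustrated with a small, dependency-free sketch. It is not the authors' code: the function names `auroc` and `calibration_bins`, the equal-width binning, and the toy data in the usage note are assumptions made for illustration only. AUROC is computed here through the rank-sum (Mann-Whitney) identity, and the calibration curve is summarized as mean predicted probability versus observed event rate per bin.

```python
from typing import List, Tuple


def auroc(y_true: List[int], y_score: List[float]) -> float:
    """AUROC: probability that a random positive outranks a random negative."""
    # Rank all scores (1-based); tied scores share their average rank.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    # Mann-Whitney U statistic, normalised to the range [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_bins(
    y_true: List[int], y_score: List[float], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted, observed rate, count) for each non-empty bin."""
    # Each accumulator holds [sum of predictions, number of events, count].
    acc = [[0.0, 0, 0] for _ in range(n_bins)]
    for y, p in zip(y_true, y_score):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        acc[b][0] += p
        acc[b][1] += y
        acc[b][2] += 1
    return [(s / n, d / n, n) for s, d, n in acc if n > 0]
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75. A well-calibrated model yields bins whose first two entries are close to each other.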

Table 3.

1-year mortality by summary score in the test set (n = 3,624,241).

Fig 2.

Kaplan-Meier plots for 1-year all-cause mortality stratified by the predicted probability of death.

A) Model 1H (CatBoost; learning rate = 0.05, min samples leaf = 10, no regularization, 1000 iterations, depth = 6); B) Model 2E (CatBoost; same as Model 1H, but with the inclusion of primary cancer types, chronic kidney disease stages, and comorbidities categorized by hospital encounter source).
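
As a rough illustration of the estimator behind these plots, the following is a minimal Kaplan-Meier sketch in plain Python. It is an assumption-laden example, not the authors' implementation: the function name `kaplan_meier` and the input layout (follow-up times plus a 1 = death, 0 = censored indicator) are hypothetical. A stratified plot like the one described would call it once per predicted-probability stratum; the stratum cut-points are not given here and are left to the caller.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time."""
    # Sort subjects by follow-up time.
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        at_risk -= leaving
    return curve
```

With `times = [1, 2, 2, 3, 4]` and `events = [1, 1, 0, 1, 0]`, the survival estimate steps down at times 1, 2, and 3; the censored subjects reduce the risk set without producing a step.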

Fig 3.

Feature importance: A) Built-in CatBoost feature importance (Model 1H), computed internally from the model structure; B) Permutation feature importance (Model 1H), model-agnostic; C) Importance from the Explainable Boosting Machine (Model 3D).

Model 1H: CatBoost; learning rate = 0.05, min samples leaf = 10, no regularization, 1000 iterations, depth = 6. Model 3D: Explainable Boosting Machine; max rounds = 1000, learning rate = 0.01, max leaves = 3, max bins = 255, interactions = 20. DAD – Discharge Abstract Database (hospitalizations); NACRS – National Ambulatory Care Reporting System (ambulatory hospital visits); OHIP – Ontario Health Insurance Plan (physician billing); ASA – American Society of Anesthesiologists physical status classification; ADL – Activities of Daily Living score.
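
Permutation feature importance (panel B) is model-agnostic because it only needs predictions: shuffle one feature column, re-score the model, and record how much the metric drops. The sketch below shows that idea in plain Python. The names `permutation_importance` and `accuracy`, the 0.5 threshold, the number of repeats, and the toy model in the usage note are all assumptions for illustration; the metric and settings used for the figure are not stated here.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: List[int], y_prob: List[float]) -> float:
    """Fraction of correct labels at a 0.5 threshold (illustrative metric)."""
    return sum(int(p >= 0.5) == t for t, p in zip(y_true, y_prob)) / len(y_true)


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    X: List[List[float]],
    y: List[int],
    metric: Callable[[List[int], List[float]], float] = accuracy,
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in `metric` when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances
```

As a usage example, with `X = [[0, 5], [1, 3], [0, 9], [1, 1]]`, `y = [0, 1, 0, 1]`, and a toy model `predict = lambda row: float(row[0])` that ignores the second feature, the importance of the second feature is exactly 0, since shuffling it never changes a prediction.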

Fig 4.

Calibration curve and receiver operating characteristic (ROC) curve for the best-performing model (2E) applied to the 2024 validation cohort.

AUROC – area under the ROC curve.
