Table 1.
Prevalence and crude risk of 1-year all-cause mortality by comorbidity in the full cohort (N = 12,080,801).
Table 2.
Model performance statistics on the test set.
Fig 1.
Calibration and receiver operating characteristic (ROC) curves for the best-performing models.
A) Model 1H (CatBoost; learning rate = 0.05, min samples leaf = 10, no regularization, 1000 iterations, depth = 6); B) Model 2E (CatBoost; same as Model 1H, but with the inclusion of primary cancer types, chronic kidney disease stages, and comorbidities categorized by hospital encounter source). AUROC – area under the ROC curve.
Table 3.
1-year mortality by summary score in the test set (n = 3,624,241).
Fig 2.
Kaplan-Meier plots for 1-year all-cause mortality stratified by the predicted probability of death.
A) Model 1H (CatBoost; learning rate = 0.05, min samples leaf = 10, no regularization, 1000 iterations, depth = 6); B) Model 2E (CatBoost; same as Model 1H, but with the inclusion of primary cancer types, chronic kidney disease stages, and comorbidities categorized by hospital encounter source).
Fig 3.
Feature importance. A) Built-in CatBoost feature importance (Model 1H), derived from the model structure; B) Permutation feature importance (Model 1H), a model-agnostic measure; C) Feature importance from the Explainable Boosting Machine (Model 3D).
Model 1H: CatBoost; learning rate = 0.05, min samples leaf = 10, no regularization, 1000 iterations, depth = 6. Model 3D: Explainable Boosting Machine; max rounds = 1000, learning rate = 0.01, max leaves = 3, max bins = 255, interactions = 20. DAD – Discharge Abstract Database (hospitalizations); NACRS – National Ambulatory Care Reporting System (ambulatory hospital visits); OHIP – Ontario Health Insurance Plan (physician billing); ASA – American Society of Anesthesiologists physical status classification; ADL – Activities of Daily Living score.
Fig 4.
Calibration and receiver operating characteristic (ROC) curves for the best-performing model (Model 2E) applied to the 2024 validation cohort.
AUROC – area under the ROC curve.
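For readers unfamiliar with the AUROC, the following is a minimal sketch of how the statistic can be computed from predicted probabilities and observed outcomes, using its interpretation as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The function name `auroc` and the toy data are hypothetical illustrations; this is not the code used to produce the figures.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0/1 outcomes (1 = died within 1 year).
    scores: iterable of predicted probabilities, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")

    # Count positive/negative pairs ranked correctly; ties count as half.
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical toy data: four patients, two of whom died.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auroc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in cohort size, so it is suitable only for illustration; for a test set of millions of records a rank-based implementation would be needed.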