COVID-19 mortality risk assessment: An international multi-center study

Timely identification of COVID-19 patients at high risk of mortality can significantly improve patient management and resource allocation within hospitals. This study seeks to develop and validate a data-driven personalized mortality risk calculator for hospitalized COVID-19 patients. De-identified data were obtained for 3,927 COVID-19-positive patients from six independent centers, comprising 33 different hospitals. Demographic, clinical, and laboratory variables were collected at hospital admission. The COVID-19 Mortality Risk (CMR) tool was developed using the XGBoost algorithm to predict mortality. Its discrimination performance was subsequently evaluated on three validation cohorts. The derivation cohort of 3,062 patients has an observed mortality rate of 26.84%. Increased age, decreased oxygen saturation (≤ 93%), and elevated levels of C-reactive protein (≥ 130 mg/L), blood urea nitrogen (≥ 18 mg/dL), and blood creatinine (≥ 1.2 mg/dL) were identified as primary risk factors, validating clinical findings. The model obtains an out-of-sample AUC of 0.90 (95% CI, 0.87–0.94) on the derivation cohort. In the validation cohorts, the model obtains AUCs of 0.92 (95% CI, 0.88–0.95) on Seville patients, 0.87 (95% CI, 0.84–0.91) on Hellenic COVID-19 Study Group patients, and 0.81 (95% CI, 0.76–0.85) on Hartford Hospital patients. The CMR tool is available as an online application at covidanalytics.io/mortality_calculator and is currently in clinical use. The CMR model leverages machine learning to generate accurate mortality predictions using commonly available clinical features. This is the first risk score trained and validated on a cohort of COVID-19 patients from Europe and the United States.

The XGBoost algorithm is one of the most popular ensemble methods for binary classification in the machine learning field [2]. It is based on a large number of trees that are built in an iterative fashion. Later trees are constructed based on the errors made by earlier trees, giving the model more power to handle "harder" cases. This error-correction ability often gives XGBoost a performance edge over other linear or tree-based methods. There is a multitude of hyperparameters that need to be tuned for this algorithm. Three of them are particularly important: the number of trees, the depth of the trees, and the learning rate. In this study, we tune the following parameters: learning rate, γ, λ, α, minimum child weight, maximum tree depth, and number of estimators. The learning rate, also called shrinkage factor or η, controls the weighting factor for corrections by new trees added to the model: it takes values between 0 and 1, with values closer to 1 allowing larger corrections from each tree and carrying a higher risk of overfitting on the training data. Gamma (γ) is a regularization parameter controlling the minimum loss reduction required to make a further partition on a leaf node of a tree: it takes positive values, with larger ones defining a more conservative model. Lambda (λ) is the L2 regularization parameter on the feature weights: it takes positive values, with larger ones encouraging smaller weights, thus making the model more conservative. Alpha (α) is the L1 regularization parameter on the feature weights: it takes positive values, with larger ones driving weights to 0, defining a more conservative model. Minimum child weight is the minimum Hessian weight required to create a new node, with a role similar to that of γ, i.e. regularization at the splitting step: it takes positive values, with higher values making the model more conservative.
The maximum depth of a tree controls the maximum number of nodes that can exist between the root node and the farthest leaf in the tree: it is a positive integer, and large values usually lead to overfitting on the training data. The number of estimators determines the number of trees to fit in the model: it is a positive integer, and large values usually lead to overfitting on the training data. All remaining parameters are set to their default values.

S4 Text. Methods comparison.
We compared three different machine learning methods in the development of our model. In all cases, we formulate a binary classification problem to predict mortality (1) or discharge (0) as the endpoint of a patient's hospitalization. Predictive models are trained using XGBoost, Logistic Regression, and Classification And Regression Trees (CART); all methods are implemented within the Scikit-learn framework [3]. Logistic Regression assumes an additive relationship between risk factors, whereas CART and XGBoost are able to capture non-linearities and feature interactions. While CART fits a single decision tree, XGBoost is an ensemble method: it constructs a set of decision trees which are then combined to yield a single prediction for a given patient.
S3 Table reports the AUC and various threshold-based metrics for the three algorithms. For each method, we select the threshold that yields a sensitivity of at least 80%, reflecting the priority of correctly identifying mortality. Of the three methods, XGBoost captures the most sophisticated interactions between features and accordingly demonstrates the strongest performance. Logistic Regression achieves a strong test-set AUC but loses specificity and precision at the chosen threshold. CART has the highest negative predictive value but is outperformed by both other models on all other metrics.
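The threshold selection described above can be sketched as follows; this is one simple way to pick the largest score cutoff whose sensitivity still meets the target, and the helper name is ours, not from the study code:

```python
import numpy as np

def threshold_for_sensitivity(y_true, y_score, target=0.80):
    """Return the largest threshold whose sensitivity (recall on the
    positive class) is at least `target`."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Scan candidate thresholds (the observed scores) from highest to lowest;
    # sensitivity only grows as the threshold decreases.
    for t in np.sort(np.unique(y_score))[::-1]:
        pred = y_score >= t
        sensitivity = pred[y_true == 1].mean()
        if sensitivity >= target:
            return t
    return y_score.min()

# Toy example: 5 deaths (1) and 5 discharges (0) with predicted risks.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.2, 0.5, 0.4, 0.3, 0.1, 0.05]
t = threshold_for_sensitivity(y_true, y_score)  # classifies 4/5 deaths correctly
```

Scanning from the highest threshold downward preserves as much specificity as possible while still hitting the sensitivity floor.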

S5 Text. Parameter tuning.
In this project, we leverage the hyperparameter optimization framework Optuna [4] as follows. We first identify the corresponding parameter spaces for the Scikit-learn implementations of XGBoost, Logistic Regression, and CART [3]. Second, we define the objective function as the 300-fold cross-validation area under the curve (AUC). Finally, we employ a pipeline to maximize the objective over a maximum of 500 iterations, parallelized across multiple cores.

S6 Text. SHAP methodology.
SHapley Additive exPlanations (SHAP) are a useful tool to interpret model predictions and risk drivers [5,6]. The SHAP methodology explains a patient's risk prediction (normalized between 0 and 1) by computing the contribution of each feature. This is obtained by approximating the nonlinear XGBoost prediction model with a linear model around the patient's prediction. The coefficients of the linear approximation are estimated by introducing each feature one at a time and comparing the resulting variations in model output. We use the SHAP Python package [6], which features an efficient algorithm to compute the SHAP values as well as plot generation functions, to interpret the outcomes of the XGBoost model in Figure 1.
Tables
S1 Table. Descriptive summary of the derivation population broken down by study site.
S2 Table. Descriptive summary of the validation population broken down by study site.
S3 Table. AUC performance and threshold-based metrics for different machine learning methods, evaluated on the test set from the derivation cohort.
S4 Table. Overview of participating institutions in the Hellenic COVID-19 Study Group.

Figures
S1 Figure. Receiver operating characteristic (ROC) curves evaluating the model's performance on the testing set for patient subgroups.

S1. The General University Hospital of Larissa is the referral center of the 5th Health Region of Central Greece for the management of COVID-19 patients, covering a population of more than 1,000,000. Since March 2020, COVID-19 patients have been managed in its Infectious Disease Unit. Patients were treated according to the therapeutic algorithms proposed by the Greek Committee of Public Health of the Ministry of Health, using hydroxychloroquine and azithromycin as the first-line antiviral agents.