Development and external validation of a prediction risk model for short-term mortality among hospitalized U.S. COVID-19 patients: A proposal for the COVID-AID risk tool

Background The 2019 novel coronavirus disease (COVID-19) has created unprecedented medical challenges. There remains a need for validated risk prediction models to assess short-term mortality risk among hospitalized patients with COVID-19. The objective of this study was to develop and validate a 7-day and 14-day mortality risk prediction model for patients hospitalized with COVID-19.
Methods We performed a multicenter retrospective cohort study, with a separate multicenter cohort for external validation, using two hospitals in New York, NY, and 9 hospitals in Massachusetts, respectively. A total of 664 patients in NY and 265 patients in Massachusetts, all hospitalized with COVID-19 from March to April 2020, were included.
Results We developed a risk model consisting of patient age, hypoxia severity, mean arterial pressure and presence of kidney dysfunction at hospital presentation. The multivariable regression model was based on risk factors selected from univariable and Chi-squared automatic interaction detection analyses. Validation was by receiver operating characteristic curve (discrimination) and Hosmer-Lemeshow goodness of fit (GOF) test (calibration). In internal cross-validation, prediction of 7-day mortality had an AUC of 0.86 (95%CI 0.74–0.98; GOF p = 0.744), while prediction of 14-day mortality had an AUC of 0.83 (95%CI 0.69–0.97; GOF p = 0.588). External validation was achieved using 265 patients from an outside cohort and confirmed 7- and 14-day mortality prediction performance with an AUC of 0.85 (95%CI 0.78–0.92; GOF p = 0.340) and 0.83 (95%CI 0.76–0.89; GOF p = 0.471) respectively, along with excellent calibration. Retrospective data collection, short follow-up time, and development in a COVID-19 epicenter may limit model generalizability.
Conclusions The COVID-AID risk tool is a well-calibrated model that demonstrates accuracy in the prediction of both 7-day and 14-day mortality risk among patients hospitalized with COVID-19. This prediction score could assist with resource utilization and patient and caregiver education, and provide a risk stratification instrument for future research trials.


Introduction
The 2019 novel coronavirus disease (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become an international pandemic. Although the original outbreak was attributed to zoonotic transmission in Wuhan, China, human-to-human transmission through respiratory droplets and aerosolization has resulted in rapid disease spread across the world. As of May 11, 2020, there have been over 4 million confirmed cases of COVID-19, with many more likely infected, and more than 280,000 associated deaths worldwide [1]. Clinical presentations of COVID-19 have been heterogeneous, ranging from mild flu-like symptoms (fever, cough, and fatigue) to severe respiratory symptoms and hypoxia resulting in acute respiratory distress syndrome (ARDS). Given the wide spectrum of symptoms, there have been varied clinical trajectories, ranging from outpatient management to hospital admission, need for intensive care and/or mechanical ventilation, multisystem organ failure, and death. In select cases, the progression of disease may be extremely rapid, with the observed time between onset of symptoms and the development of ARDS as short as 9 days [2].
With a vast number of individuals affected by the disease, there has been an imbalance between the supply of and demand for hospital and intensive care unit (ICU) beds, straining available healthcare resources [3]. Attempts have been made to clarify the relationship of risk factors with clinical prognosis in order to risk stratify patients and appropriately allocate limited healthcare resources. Several observational studies have noted that patients who are older or carry various comorbidities, such as diabetes, cardiovascular disease [4], and hypertension [5,6], have higher risk for in-hospital mortality from COVID-19 [7]. Other studies have shown certain biomarkers such as ferritin, lactate dehydrogenase (LDH), D-dimer, and C-reactive protein (CRP) to predict COVID-19 severity [8,9]. While there have been attempts at creating prediction models that combine several variables to estimate prognosis, including the use of scoring systems and machine learning [10], many of these models have been suboptimal due to high risk of bias, restricted sample sizes, and a limited number of outcomes of interest [10].
Given the paucity of comprehensive data to guide providers and caregivers on the prognosis of COVID-19 patients and the potential for rapid disease progression, there remains a need to develop a prediction model for mortality. As New York City has become the epicenter of the COVID-19 pandemic in the United States, we sought to use our large cohort of patients to develop a prognostic model that could predict the risk of death within 7 or 14 days from admission. Using data from hospitalized patients infected with COVID-19 from two New York City hospitals, the primary objective of this study is to construct an accurate prognostic model, called the COVID-AID (Admission to Death) risk tool, and externally validate the model using another large cohort from a different region of the United States.

Patient population and data collection for independent variables
This was a retrospective study performed at two hospitals in Manhattan (an academic tertiary referral center and a smaller community hospital). Adult patients (age ≥18 years) with a positive real-time reverse-transcription polymerase chain reaction (RT-PCR) from a respiratory sample (naso- or oropharyngeal, or bronchial/sputum samples) for SARS-CoV-2 between March 4 and April 9, 2020 were included. All included patients had laboratory-confirmed COVID-19. Patients who were admitted (including temporary observation defined as admission to emergency department and discharge within 24 hours) were included in this analysis. The study was reviewed and approved by the institutional review board (Weill Cornell Medicine: 2004021793). We followed TRIPOD guidelines for reporting multivariable prediction models (See S1 Table in S1 File) [11].
The date of first recorded symptoms and the date of positive SARS-CoV-2 PCR were recorded, as were the initial vital signs upon presentation. The first set of recorded vital signs including temperature (with fever defined as T ≥37.8), respiratory rate (RR), heart rate (HR), systolic, diastolic, and mean arterial pressures (SBP, DBP, MAP, respectively), and body mass index (BMI) were extracted. BMI was categorized into normal weight (BMI ≥18.5 and <25 kg/m²; reference category), underweight (BMI <18.5 kg/m²), overweight (BMI ≥25 and <30 kg/m²), obese (BMI ≥30 and <40 kg/m²), and morbidly obese (BMI ≥40 kg/m²).
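The BMI categories above amount to a simple threshold lookup. The following is a minimal sketch in Python; the function name `bmi_category` and the label strings are hypothetical, and only the cut-offs are taken from the text.

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI in kg/m^2 to the categories described in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"  # reference category in the analysis
    if bmi < 30:
        return "overweight"
    if bmi < 40:
        return "obese"
    return "morbidly obese"
```

Each lower bound is inclusive and each upper bound exclusive, mirroring the "≥" and "<" signs in the definitions.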
A comprehensive set of laboratory studies was also extracted upon admission. This included complete blood count (white blood cell, absolute neutrophil, absolute lymphocyte, and platelet counts), serum creatinine (sCr), liver tests including alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin, alkaline phosphatase, and albumin, as well as serum troponin, procalcitonin, lactate dehydrogenase, fibrinogen, lactate levels, and inflammatory markers including C-reactive protein (CRP), D-dimer, and ferritin. Patients were considered to have biochemical indication of liver injury at presentation if they had ALT or AST >40 U/L, total bilirubin >1.2 mg/dL, or alkaline phosphatase >150 U/L (upper limit of normal at our laboratory). Kidney dysfunction was defined as Kidney Disease Improving Global Outcomes Acute Kidney Injury (KDIGO AKI) stage 2 or greater, where sCr was at least 2 times the reference value, with the reference estimated at 1 mg/dL (i.e., sCr ≥2.0 mg/dL) [12].
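The two laboratory-based definitions above can be written as boolean flags. This is an illustrative sketch only: the function and parameter names are hypothetical, all values are assumed to be present (no missing-data handling), and the thresholds are those stated in the text.

```python
def has_liver_injury(alt_u_l: float, ast_u_l: float,
                     total_bilirubin_mg_dl: float, alk_phos_u_l: float) -> bool:
    """Biochemical indication of liver injury: any value above its upper limit of normal."""
    return (alt_u_l > 40 or ast_u_l > 40
            or total_bilirubin_mg_dl > 1.2 or alk_phos_u_l > 150)


def has_kidney_dysfunction(scr_mg_dl: float, reference_scr_mg_dl: float = 1.0) -> bool:
    """Kidney dysfunction: sCr at least 2 times the reference value (1 mg/dL by default)."""
    return scr_mg_dl >= 2.0 * reference_scr_mg_dl
```

With the default reference of 1 mg/dL, `has_kidney_dysfunction` reduces to the sCr ≥2.0 mg/dL rule used in the model.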
Patients' degree of hypoxia on admission was categorized based on pulse oximetry as a) no hypoxia (defined as an oxygen saturation of ≥95% on room air), b) moderate hypoxia (defined as maintaining an oxygen saturation of 90-95% on room air or ≥90% with 4 liters or less supplemental oxygen through a nasal cannula), or c) severe hypoxia (defined as needing more than 4 liters of supplemental oxygen, non-rebreather mask or non-invasive (e.g. BiPAP) or invasive ventilation to maintain an oxygen saturation of ≥90%, or failure to maintain an oxygen saturation of ≥90%).
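One way to read the three hypoxia categories as a decision rule is sketched below. The function name, argument names, and input encoding are hypothetical. Two interpretive assumptions are made that the text does not spell out: a saturation of exactly 95% on room air is treated as no hypoxia, and a saturation below 90% on the support being given is treated as failure to maintain ≥90%.

```python
def hypoxia_category(spo2_percent: float,
                     o2_flow_l_min: float = 0.0,
                     advanced_support: bool = False) -> str:
    """Classify admission hypoxia as 'none', 'moderate', or 'severe'.

    o2_flow_l_min: nasal cannula flow (0 means room air).
    advanced_support: non-rebreather mask, non-invasive or invasive ventilation.
    """
    # Severe: more than 4 L/min, advanced support, or saturation not maintained at >= 90%
    if advanced_support or o2_flow_l_min > 4 or spo2_percent < 90:
        return "severe"
    # Moderate: saturation >= 90% on 4 L/min or less of supplemental oxygen
    if o2_flow_l_min > 0:
        return "moderate"
    # Room air: >= 95% is no hypoxia, 90 to <95% is moderate (assumed boundary handling)
    return "none" if spo2_percent >= 95 else "moderate"
```

The prediction model itself uses only the severe category, compared with no or moderate hypoxia.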
Outcomes. Data were extracted regarding need for supplementary oxygen, non-invasive positive pressure ventilation (NIPPV), or invasive ventilatory support with mechanical ventilation, ICU admission, and death. The main outcome of this study was 14-day mortality. The secondary outcome of interest was 7-day mortality.
Follow-up, survival modeling, and prediction. Patients were followed from time of admission until May 24th, 2020. Survival time was defined as the time between admission to death (failure time) or the date when patients were last known to be alive (censoring time). To calculate 7-day mortality, deaths that occurred within 7 days of admission were kept while deaths occurring after 7 days were recorded as non-events. To calculate 14-day mortality, deaths that occurred within 14 days of admission were kept while deaths occurring after 14 days were recorded as non-events.
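The derivation of the two binary outcomes from admission and death dates can be sketched as follows. The function name is hypothetical, and "within N days" is assumed to include day N itself. Patients with no recorded death are coded as non-events, and the sketch does not handle patients censored before the horizon.

```python
from datetime import date
from typing import Optional


def died_within(admission: date, death: Optional[date], horizon_days: int) -> int:
    """Return 1 if death occurred within horizon_days of admission, otherwise 0."""
    if death is None:
        return 0  # alive at last follow-up: non-event
    return int((death - admission).days <= horizon_days)


# Example: a death 10 days after admission counts for the 14-day outcome only.
admitted = date(2020, 3, 10)
died = date(2020, 3, 20)
seven_day_event = died_within(admitted, died, 7)
fourteen_day_event = died_within(admitted, died, 14)
```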
Univariable logistic regression models were created for each of the aforementioned independent variables. To lessen the influence of extreme values, we transformed continuous variables into natural logarithms. Selection of predictors was performed using Chi-square automatic interaction detection (CHAID) modeling in order to decrease the dimensionality of the data and explore the most informative variables for identifying patient groups with the highest risk of mortality [13]. Variables which were significant in univariable analysis (defined as p-value <0.05) were included in the CHAID model with adjusted significance testing (Bonferroni method) without limitation on the number of nodes and branches [14,15]. Independent risk factors that were chosen from the CHAID algorithm ($X_j$) were subsequently included in a multivariable logistic regression and their regression coefficients ($\beta_j$) were stored. An individual odds ratio of mortality ($\mathrm{OR}_i$) was calculated for each patient by adding the product of each individual risk factor level and its corresponding coefficient:
$$\mathrm{OR}_i = \exp\left(\beta_0 + \sum_j \beta_j X_{ij}\right)$$
where $\mathrm{OR}_i$ is each individual's OR of mortality, $X_{ij}$ is the individual's level of the $j$'th risk factor, $\beta_j$ is the coefficient for the $j$'th risk factor, and $\beta_0$ is the intercept for the logistic regression. An individual probability of death ($P_i$) was calculated from the individual's $\mathrm{OR}_i$:
$$P_i = \frac{\mathrm{OR}_i}{1 + \mathrm{OR}_i}$$
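The scoring step described above (sum the coefficient-weighted risk factors with the intercept, exponentiate, then convert to a probability with the standard logistic relationship) can be sketched in Python. The coefficients below are hypothetical placeholders chosen only to make the example run; they are NOT the fitted values reported in Table 2, and the variable names are assumptions. Inputs must be supplied on whatever scale the real coefficients were fitted on.

```python
import math

# Hypothetical placeholder coefficients, NOT the fitted values from Table 2.
EXAMPLE_COEFFICIENTS = {
    "intercept": -6.0,
    "age_years": 0.05,
    "map_mmhg": -0.02,
    "severe_hypoxia": 1.5,
    "kidney_dysfunction": 1.0,
}


def predicted_probability(risk_factors: dict, coefficients: dict) -> float:
    """Convert one patient's risk factor levels into a predicted probability of death."""
    # Linear predictor: intercept plus the sum of coefficient * risk factor level
    linear = coefficients["intercept"] + sum(
        coefficients[name] * level for name, level in risk_factors.items()
    )
    odds = math.exp(linear)   # the quantity the text calls OR_i
    return odds / (1 + odds)  # P_i


patient = {"age_years": 70, "map_mmhg": 80, "severe_hypoxia": 1, "kidney_dysfunction": 0}
example_risk = predicted_probability(patient, EXAMPLE_COEFFICIENTS)
```

Because the same function accepts any coefficient dictionary, the 7-day and 14-day models would differ only in the coefficients passed in.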

Internal and external validation of the prediction model
Ten-fold cross-validation was used for internal validation of the prediction models, and mean (95%CI) performance characteristics across the 10 internal validations as well as overall performance characteristics in the whole development cohort were reported [16]. Discrimination in the internal and external validation sets was assessed using receiver operating characteristic (ROC) curves. Area under the curve (AUC) and its 95% confidence intervals were reported. Calibration was assessed using visual calibration plots of observed versus predicted risk of death within groups formed by 10 quantiles of predicted risk of death, with overlying linear predictions and their 95% confidence intervals. Chi-squared statistics and corresponding p-values (DF = 8) were reported from the Hosmer-Lemeshow goodness-of-fit test for calibration.
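For readers who want to reproduce the two validation metrics, the sketch below computes an AUC and a Hosmer-Lemeshow chi-squared statistic using only the Python standard library. It is an illustration, not the Stata routine used in the study: the function names are hypothetical, groups are formed by simple equal-size splits of the sorted predictions, and predicted probabilities are assumed to lie strictly between 0 and 1.

```python
def auc(y_true, y_score):
    """AUC as the chance that a random death is scored above a random survivor (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_chi2(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-squared statistic over groups of increasing predicted risk."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths in the group
        observed = sum(y for _, y in chunk)  # observed deaths in the group
        # Contributions from deaths and from survivors
        chi2 += (observed - expected) ** 2 / expected
        chi2 += (observed - expected) ** 2 / (size - expected)
    return chi2
```

The statistic is referred to a chi-squared distribution with `groups - 2` degrees of freedom, which gives the DF = 8 quoted above for 10 groups; with SciPy available, `scipy.stats.chi2.sf(statistic, 8)` would return the p-value.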
External validation was separately performed using a cohort of 265 adult patients (age ≥18 years) admitted with a positive RT-PCR for SARS-CoV-2 from a respiratory sample between March 7 and April 2, 2020, in 2 tertiary care and 7 community hospitals from a single healthcare system in Massachusetts. Identical definitions of independent and outcome variables were used in the external validation analysis.
The study was reviewed and approved by the corresponding institutional review board (Partners Healthcare: 2020P0000983).
Chained multiple imputations (50 repetitions) using linear and logistic regressions for continuous and categorical variables, respectively, were used to impute missing data on independent variables (S2 Table in S1 File) [17]. All tests were two-tailed with a significance level of alpha < 0.05, except when adjusted for multiple comparisons as described above. All analyses were performed with Stata 13.0 for Windows (StataCorp LP, College Station, TX). The 3D graphs of risk were generated using Microsoft Mathematics (Microsoft Corporation, Redmond, WA).

Analysis of mortality risk factors
In univariable analysis, age, race/ethnicity, history of hypertension or cardiovascular disease, history of chronic kidney disease, mean arterial pressure (MAP), respiratory rate and presence of hypoxia on presentation, serum creatinine level, presence of kidney dysfunction, platelet count, procalcitonin, lactate dehydrogenase, lactic acid, troponin, ferritin, D-dimer, C-reactive protein, and AST levels on presentation were significantly associated with 14-day mortality (Table 1).
These variables were then included in the CHAID algorithm to find an optimal decision tree for splitting patients into low- and high-risk categories and predicting the risk of death. Age, admission MAP, presence of severe hypoxia (compared to no or moderate hypoxia) on presentation, and presence of kidney dysfunction on admission were selected as the most informative risk factors for categorizing patients according to their risk of 14-day mortality (Fig 2).

Development of 7-day and 14-day mortality prediction models
These four risk factors (age, MAP, presence of severe hypoxia, presence of kidney dysfunction) were then included in two separate multivariable regression models, one with 7-day mortality as the outcome, and another with 14-day mortality as the outcome, in order to estimate the corresponding coefficient for each risk factor and outcome pair. The details of the regression parameters are included in Table 2.

External validation
The external validation cohort consisted of 265 patients admitted with laboratory-confirmed COVID-19. Patients in this cohort had a mean age of 65 years (SD = 17), and 56% were male. Thirty-nine deaths occurred within 14 days of admission; the 7-day and 14-day mortality rates were 7.5% (95%CI 4.4-10.8%) and 14.7% (95%CI 10.4-19.0%), respectively (Fig 1). The same prediction models were used to predict each patient's probability of 7-day and 14-day mortality within the external validation cohort. The model had excellent discrimination for both 7-day mortality, with an AUC of 0.851 (95%CI 0.781-0.921; Fig 4A), and 14-day mortality, with an AUC of 0.825 (95%CI 0.764-0.887; Fig 4C).

Discussion
The COVID-AID risk tool is a prediction model that accurately estimates the 7- and 14-day risk of death following admission for patients hospitalized with COVID-19 using four simple, well-defined variables that are all available at initial presentation: patient age, mean arterial pressure, serum creatinine, and severity of hypoxia. We demonstrated that this prognostic model had consistent test performance in forecasting mortality risk using an independent COVID-19 positive population from another U.S. region in external validation. While other groups around the world have attempted prognostic modeling for COVID-19 disease severity or mortality, a recent systematic review found a lack of generalizability, poor reporting, and severe biases limiting their use [10]. In addition, while some studies have established the severity of respiratory distress by applying a scoring system (Brescia-COVID Respiratory Severity Scale) based on oxygenation status and chest imaging among a critically ill Italian population affected with COVID-19 [18,19], there remains great need for a precise prediction score based on variables available at first encounter in order to add clinical meaningfulness and practicality. More recently, a prediction score, COVID-GRAM, was constructed in China to predict development of critical illness among hospitalized patients with COVID-19 [20]. While this 10-item prediction rule was found to have good predictive value (AUC 0.88), there are several limitations that may affect its applicability and generalizability. The study and model's endpoint, an aggregate outcome of ICU admission and intubation, may be affected by non-clinical factors such as differences in local policy, demand, and resources available. Furthermore, the hospitalized COVID-19 population in China may differ from that in the U.S., as the threshold for admission may vary. Certain factors in the prediction rule, such as imaging results or state of consciousness, are also subject to interpretation of the user, thereby adding another layer of subjectivity. These factors may all limit the generalizability of this prediction score, particularly among the U.S. hospitalized COVID-19 population. Therefore, we sought to address this need by developing the COVID-AID 7-day and 14-day mortality risk tool and validating its discrimination and calibration using two independent U.S. populations with COVID-19 disease. The COVID-AID risk tool is an easy-to-use, bedside clinical decision instrument that may assist healthcare workers in determining resource utilization and hospital triage of patients infected with SARS-CoV-2. Additionally, the calculator may allow patients, family members, and additional caregivers to gain helpful insight on disease severity and prognosis. Moreover, we propose that the COVID-AID risk tool might assist future therapeutic trial design as a validated tool to be used for risk stratification. The strength of this instrument is the simplicity of the variables used in the model, including the generalizability of included admission vitals, age, and serum creatinine, all of which can be readily obtained at all hospitals at the time of hospital or emergency room presentation (See case examples in S3 Table in S1 File). The use of only objective parameters would also help reduce inter-user variability.

PLOS ONE
Each of the variables in the model has a scientific rationale in the pathophysiology of COVID-19 disease, which strengthens the generalizability of the model (See predicted mortality risks in S1 and S2 Figs). Not surprisingly, the degree of hypoxia at presentation has been well-defined as a significant indicator of severity of illness, particularly in acute respiratory distress conditions, and carries strong justification to be a significant risk factor in the clinical course of severe COVID-19 [18,19,21,22]. In addition, given our focus on short-term mortality as an outcome, we found that lower MAP at presentation was linked with early death and remained a consistent, early adverse predictor among patients in our cohort. We hypothesize that this is because these patients were more likely to be suffering from systemic vasodilatory states, such as sepsis [23] or the inflammatory cytokine storm syndrome, which has been associated with severe COVID-19 and ARDS [24-26]. Interestingly, we also found that kidney dysfunction at presentation (defined as serum creatinine ≥2 mg/dL), regardless of chronicity, was the extra-thoracic organ dysfunction with the most significant impact on short-term mortality among hospitalized patients with COVID-19. This, too, is not surprising, as kidney dysfunction, particularly acute kidney injury, is associated with increased mortality among critically ill patients [27,28] and is commonly associated with episodes of hypotension [29]. Further supporting our findings, elevated serum creatinine was observed more often in international COVID-19 cohorts among those who died [5,7,30], and a recent study on a large cohort of admitted patients with COVID-19 in New York, USA, reported that 22% of total admissions and more than two thirds of admissions leading to death were complicated by acute kidney injury, making it the most common end-organ failure among the admitted patients with COVID-19 [31].
Kidney dysfunction in COVID-19 is hypothesized to be either a direct consequence of a local inflammatory response in renal epithelial cells during viral inclusion or an indirect result of pro-inflammatory and immune-mediated kidney damage [32,33]. Lastly, patient age, perhaps the most common risk factor for adverse outcomes in acute and chronic illnesses, maintained significance in our prediction model, and thus corroborated prior literature citing age as an important risk factor in COVID-19 adverse outcomes [10,30,33-35]. Taken together, these four variables indicate that older patients who present with poor oxygenation, hypotension, and kidney dysfunction have a plausible and generalizable increased risk of short-term death from COVID-19.
Our study relies on the retrospective collection of clinical and outcome data. However, we used a structured data abstraction tool and increased the generalizability of the results by obtaining data from a large cohort of patients admitted at two different New York City hospitals (a tertiary care center and a smaller non-teaching hospital). Furthermore, we externally validated our model in a large healthcare system (composed of both academic tertiary centers and affiliated community hospitals) in Massachusetts in an effort to ensure consistency in the model's performance. The COVID-AID score also provides specific advantages over the recently published COVID-GRAM score developed in China, as it requires fewer data inputs (4 variables versus 10), uses only objective parameters easily obtained upon presentation, and predicts the universal outcome of death with comparable performance (AUC 0.825-0.851). Our model requires global validation; however, we focused on easily reproducible and generalizable variables, namely age, initial vital signs (hypoxia and blood pressure), and one laboratory test (serum creatinine), that can be easily obtained in a uniform fashion from a variety of healthcare settings.
In conclusion, the COVID-19 pandemic continues to impose a catastrophic burden on international healthcare and the global economy. As research continues to elucidate effective therapies for SARS-CoV-2, healthcare workers and patients alike need assistance in understanding what clinical parameters on admission might predict increased severity of disease and short-term mortality. We have developed an easy-to-use clinical prognostic score that accurately predicts risk of mortality with excellent calibration and consistency in test performance using an external validation cohort. The COVID-AID risk tool calculator is also available online at www.covidaidscore.com. While there is need for international validation, this novel mortality prediction model may help providers understand the expected risk of death for patients presenting to the hospital. We propose that the COVID-AID risk tool can enhance our knowledge of how to successfully manage these patients, lead to more effective healthcare resource utilization, and provide patients and their loved ones with improved understanding of disease severity and prognosis. Additionally, the COVID-AID risk tool delivers an accurate risk stratification estimate for researchers to properly design future trials in hopes of discovering effective therapies against this virus.