Development and validation of a predictive model of in-hospital mortality in COVID-19 patients

We retrospectively evaluated 2879 hospitalized COVID-19 patients from four hospitals to evaluate the ability of demographic data, medical history, and on-admission laboratory parameters to predict in-hospital mortality. Association of previously published risk factors (age, gender, arterial hypertension, diabetes mellitus, smoking habit, obesity, renal failure, cardiovascular/pulmonary diseases, serum ferritin, lymphocyte count, APTT, PT, fibrinogen, D-dimer, and platelet count) with death was tested by a multivariate logistic regression, and a predictive model was created, with further validation in an independent sample. A total of 2070 hospitalized COVID-19 patients were finally included in the multivariable analysis. Age 61–70 years (p<0.001; OR: 7.69; 95%CI: 2.93 to 20.14), age 71–80 years (p<0.001; OR: 14.99; 95%CI: 5.88 to 38.22), age >80 years (p<0.001; OR: 36.78; 95%CI: 14.42 to 93.85), male gender (p<0.001; OR: 1.84; 95%CI: 1.31 to 2.58), D-dimer levels >2 ULN (p = 0.003; OR: 1.79; 95%CI: 1.22 to 2.62), and prolonged PT (p<0.001; OR: 2.18; 95%CI: 1.49 to 3.18) were independently associated with increased in-hospital mortality. A predictive model built with these parameters showed an AUC of 0.81 in the development cohort (n = 1270) [sensitivity of 95.83%, specificity of 41.46%, negative predictive value of 98.01%, and positive predictive value of 24.85%]. These results were then validated in an independent data sample (n = 800). Our predictive model of in-hospital mortality of COVID-19 patients has been developed, calibrated and validated. The model (MRS-COVID) included age, male gender, and on-admission coagulopathy markers as positively correlated factors with fatal outcome.

Early and effective predictive models of clinical outcomes are necessary for risk stratification of hospitalized COVID-19 patients, especially when a high volume of patients presents to the emergency departments [11]. Clinicians need better predictors of mortality and tools capable of detecting which patients are prone to deteriorate rapidly. Our aim was to evaluate the ability of demographic data, medical history, and on-admission laboratory parameters to predict mortality in hospitalized COVID-19 patients.

Patients and sample handling
Two thousand eight hundred and seventy-nine consecutive hospitalized adult patients with confirmed moderate or severe COVID-19 from four hospitals [Hospital General de Villalba (Collado Villalba, Madrid), Hospital Infanta Elena (Valdemoro, Madrid), Hospital Universitario Rey Juan Carlos (Móstoles, Madrid) and Hospital Universitario Fundación Jiménez Díaz in Madrid] from February 27 to April 17, 2020, were retrospectively evaluated. COVID-19 was considered at least moderate and required hospitalization if any of these criteria was met: CURB-65 score >2 or FINE >II, peripheral capillary oxygen saturation (SpO2) <93% or respiratory rate >20 breaths per minute or PaO2 <65 mmHg, bilateral infiltrates in chest X-ray, ARDS or sepsis/septic shock. All patients received protocolized pharmacological and supportive treatment after admission, and VTE prophylaxis with low molecular weight heparin. Demographic data and medical history of arterial hypertension, diabetes mellitus, smoking habit, obesity [body mass index (BMI) ≥30 kg/m²], renal failure [estimated glomerular filtration rate (eGFR) by CKD-EPI <60 ml/min/1.73 m²], cardiovascular diseases and pulmonary diseases were obtained. Cardiovascular diseases included arrhythmia, congestive heart failure, ischemic heart disease, valvulopathy and hypertensive cardiomyopathy. Pulmonary diseases included chronic obstructive pulmonary disease, asthma, obstructive sleep apnea and pulmonary tuberculosis. Patients were considered to have thrombocytopenia when platelet count was lower than 140 × 10⁹/l, prolonged PT when PT was higher than 14 seconds, and elevated ferritin when serum ferritin levels were higher than 400 ng/ml. Data were obtained from a big data research using extract transform load (ETL) tools and natural language processing (NLP) with our Huawei (Huawei Technologies Co., Ltd., Shenzhen, China) platform and the collaboration of Indizen-Scalian (Madrid, Spain).
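The laboratory definitions given above can be written as a small set of threshold rules. The following Python sketch is illustrative only: the function and field names are hypothetical and are not part of the study's ETL/NLP pipeline; only the three cut-offs (platelets <140 × 10⁹/l, PT >14 seconds, ferritin >400 ng/ml) come from the text.

```python
from dataclasses import dataclass


@dataclass
class AdmissionFlags:
    """Dichotomized on-admission laboratory findings (hypothetical container)."""
    thrombocytopenia: bool
    prolonged_pt: bool
    elevated_ferritin: bool


def admission_flags(platelets_e9_per_l: float,
                    pt_seconds: float,
                    ferritin_ng_ml: float) -> AdmissionFlags:
    """Apply the cut-offs stated in the text to raw laboratory values."""
    return AdmissionFlags(
        thrombocytopenia=platelets_e9_per_l < 140,   # lower than 140 x 10^9/l
        prolonged_pt=pt_seconds > 14,                # higher than 14 seconds
        elevated_ferritin=ferritin_ng_ml > 400,      # higher than 400 ng/ml
    )


# Illustrative values only, not patient data from the study.
print(admission_flags(platelets_e9_per_l=120, pt_seconds=15.2, ferritin_ng_ml=350))
```

Values lying exactly on a cut-off (for example, PT of 14 seconds) are treated as normal, following the strict inequalities in the definitions.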
The clinical outcomes were monitored up to April 17, 2020. Only those patients that had been discharged from hospital or those who had died were finally recruited. Exclusion criteria: patients who remained hospitalized at the time of analysis and patients on chronic anticoagulant treatment before hospitalization. A flow diagram of the sample selection and study design is shown in Fig 1. The diagnosis of COVID-19 was made according to World Health Organization interim guidance [21] and confirmed by RNA detection of the 2019-nCoV in the clinical laboratory of Hospital Universitario Fundación Jiménez Díaz.

Laboratory tests
D-dimer levels were determined on an ACL Top 700 analyzer (Instrumentation Laboratory, Bedford, MA, USA) using a highly sensitive assay (IL D-dimer HS 500). Prothrombin time (PT), activated partial thromboplastin time (APTT), and fibrinogen were also determined on the ACL Top 700 analyzer. Complete blood count was determined on a Sysmex XN-1000 analyzer (Sysmex, Kobe, Japan). Serum ferritin levels were determined on a Roche Cobas 6000 (Roche Diagnostics, Mannheim, Germany).

Ethics statement
This observational study followed the ethical principles of the Helsinki Declaration and was previously approved by the Ethics Committee for Clinical Research of the Hospital Universitario Fundación Jiménez Díaz on April 14, 2020. Medical records of all the patients included were accessed from April 1 to May 15, 2020. All data were fully anonymized before we accessed them. Due to the retrospective nature of our study, the ethics committee waived the requirement for informed consent.

Statistical analysis
All the laboratory results analyzed (serum ferritin, lymphocyte count, APTT, PT, fibrinogen levels, D-dimer levels, and platelet count) were the first determination of each parameter, which had been performed either in the emergency department or within 3 days from admission to ward. Age, gender and chronic comorbidities (arterial hypertension, diabetes mellitus, obesity, smoking habit, renal failure, cardiovascular disease and pulmonary disease) were also analyzed. Statistical comparisons of survivors and non-survivors were calculated using the chi-square test for categorical variables and Student's t test for continuous variables. The results were expressed as mean ± standard deviation if normally distributed, as median (25th-75th percentiles) if skewed, and as numbers (percentage) for categorical variables. Two-sided p values less than 0.05 were considered statistically significant.
In order to simplify the score and increase its reproducibility and applicability in other hospitals and countries, the statistically significant quantitative variables were categorized. Age was split into 5 subgroups (≤50, 51-60, 61-70, 71-80, and >80 years old) since it is the most important prognostic factor [5]. We applied a previously published cut-off for D-dimer levels, ≤1000 μg/l [two-fold increase of upper limit of normality (ULN)] or >2 ULN [3], whereas the other two variables were categorized into two subgroups according to their normality range: PT ≤14 or >14 seconds and platelet count ≤140 × 10⁹/l or >140 × 10⁹/l. Statistically significant variables in the categorical analysis were included in a logistic regression model, performed in a randomly selected training cohort including around 60% of the total number of patients. Missing data were estimated by multiple imputation with 50 different estimations performed [22]. The enter method was employed with Wald P values. In order to achieve a better adjustment of the model, once significant variables were identified, a new model including only these variables was estimated. Logistic regression coefficients and P values shown were obtained from the pooled analysis. The Brier score was calculated, and the odds predicted by the model were evaluated by ROC analysis. Prognostic features of the model in both cohorts were calculated by using a complete case analysis. A cut-off was selected based on its sensitivity and specificity. The logistic regression coefficients and the cut-off selected were validated in a different cohort composed of around 40% of total patients. Sensitivity, specificity, and predictive values in both cohorts were assessed and two-sided confidence intervals (CI) were calculated by the Wilson method. This was carried out using the Domenech macro !DT for SPSS (http://www.metodo.uab.cat/macros.htm). All statistical tests were performed in SPSS version 19.0 statistics package.
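For readers unfamiliar with the Wilson method mentioned above, the following Python sketch computes a two-sided Wilson score interval for a proportion such as sensitivity or specificity. It is a generic textbook implementation, not the SPSS macro used in the study; the function name and the example counts are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Illustrative counts only (90 correctly classified out of 100), not study data.
low, high = wilson_interval(90, 100)
print(f"95% CI: {low:.4f} to {high:.4f}")
```

Unlike the simple normal approximation, the Wilson interval stays within 0 and 1 and behaves sensibly for proportions close to the boundaries, which matters for a sensitivity above 95%.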
We adhered to the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement for reporting [23].

Results
A total of 2879 moderate to severe COVID-19 hospitalized patients were initially evaluated for inclusion in the development cohort. Of these, 809 were excluded: 515 remained hospitalized at the time of analysis and 294 were on chronic anticoagulant treatment before hospitalization. The final sample consisted of 2070 patients (884 females and 1186 males) with definite outcomes: 1677 (81.01%) patients had been discharged (survivors) and 393 (18.99%) patients had died (non-survivors).
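The cohort counts reported above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated in the text; the variable names are chosen for illustration.

```python
# Counts taken directly from the text.
evaluated = 2879
still_hospitalized = 515
chronic_anticoagulation = 294
females, males = 884, 1186
survivors, non_survivors = 1677, 393

excluded = still_hospitalized + chronic_anticoagulation
included = evaluated - excluded

# Percentages of the final sample, rounded to two decimals as in the text.
survivor_pct = round(100 * survivors / included, 2)
non_survivor_pct = round(100 * non_survivors / included, 2)

print(excluded, included, survivor_pct, non_survivor_pct)
```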
The laboratory parameters and clinical characteristics of the patients at baseline are presented in Table 1 for all patients, survivors and non-survivors; data on some variables were missing for some patients. The mean age at disease onset was 65.68 years (range, 20-104). The proportion of male patients was higher in non-survivors (20.92% vs. 16.29%, p = 0.008). The mean length of hospital stay was 6.87 days (range, 0-41) in survivors and 6.51 days (range, 0-35) in non-survivors. Compared with survivors, non-survivors showed higher D-dimer levels on admission, prolonged PT and lower platelet count (Table 1). No significant differences were found in smoking, obesity, lymphocyte count, APTT and fibrinogen levels. Additionally, there were no differences in the duration of hospitalization between survivors and non-survivors (6.87 ± 5.86 days vs. 6.51 ± 5.25 days, p = 0.232).
Significant differences in their in-hospital mortality were observed in categorized quantitative variables. In-hospital mortality was 2.73% in patients younger than 50 years old (used as the reference category) (p<0.001), 5% in those between 51 and 60 years [Odds ratio (OR) 1 . Additionally, OR of in-hospital mortality was higher in patients with arterial hypertension, cardiovascular diseases, pulmonary diseases, and renal failure. Although non-survivors had slightly lower serum ferritin levels, when categorized according to elevated ferritin (yes/no), no differences were found between both groups.
A total of ten parameters showed statistically significant differences between survivors and non-survivors. They were then examined in a multivariate logistic regression model including 1270 patients to identify independent prognostic factors of moderate/severe COVID-19 in-hospital mortality (Table 2). Arterial hypertension, diabetes mellitus, cardiovascular disease, pulmonary disease, renal failure, and thrombocytopenia lost their significance and were not included in the final model. D-dimer levels >1000 μg/l (2 ULN), prolonged PT, male gender, and age were associated with an increased probability of death. The model showed no overdispersion. The formula of MRS-COVID-19 (Mortality Risk prognostic Score for hospitalized  Table 3). Mortality rate in the validation cohort was 17.39%, comparable to that of the first cohort. Brier score was 0.11 and 0.12 in the development and validation cohorts, respectively.
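As a reminder of what the Brier score measures, it is the mean squared difference between the predicted probability of death and the observed outcome (1 = died, 0 = survived), so lower values are better. The Python sketch below is a generic implementation with hypothetical names and made-up example data; it does not reproduce the SPSS computation used in the study.

```python
def brier_score(outcomes: list[int], probabilities: list[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not outcomes or len(outcomes) != len(probabilities):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for y, p in zip(outcomes, probabilities)]
    return sum(squared_errors) / len(outcomes)


# Made-up outcomes and predicted probabilities, for illustration only.
observed = [0, 0, 1, 0, 1]
predicted = [0.1, 0.2, 0.8, 0.3, 0.6]
print(round(brier_score(observed, predicted), 3))
```

For orientation, an uninformative model that assigned every patient the overall mortality rate of the final sample (18.99%) would score roughly 0.1899 × 0.8101 ≈ 0.154, so the reported values of 0.11 and 0.12 lie below that reference; this comparison is approximate because the cohort-specific mortality in the complete-case analysis may differ.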
An interactive risk calculator for the application of individual combinations of the five parameters is available on Google Play under the name MRS-COVID-19. This calculator allows for the classification of patients into low-risk or high-risk of in-hospital mortality and estimates OR values using a young female without coagulopathy markers as the reference category.
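To illustrate how an OR relative to the reference category can be obtained for a combination of parameters, the following Python sketch multiplies the adjusted odds ratios reported in the abstract, which is what a logistic model without interaction terms implies. This is an assumption-laden sketch, not the published calculator: the function names are hypothetical, the OR for the 51-60 age band is not available in this text and is therefore left undefined, and the model intercept and high-risk cut-off are not reproduced here, so no risk class is assigned.

```python
import math

# Adjusted odds ratios as reported in the abstract; <=50 years is the reference.
AGE_BAND_OR = {"<=50": 1.0, "51-60": None, "61-70": 7.69, "71-80": 14.99, ">80": 36.78}
MALE_OR = 1.84
DDIMER_OVER_2ULN_OR = 1.79
PROLONGED_PT_OR = 2.18


def age_band(age_years: float) -> str:
    """Map age to the five bands used in the study."""
    if age_years <= 50:
        return "<=50"
    if age_years <= 60:
        return "51-60"
    if age_years <= 70:
        return "61-70"
    if age_years <= 80:
        return "71-80"
    return ">80"


def combined_odds_ratio(age_years: float, male: bool,
                        ddimer_ug_l: float, pt_seconds: float) -> float:
    """OR versus a female aged <=50 with D-dimer <=1000 ug/l and PT <=14 s."""
    band_or = AGE_BAND_OR[age_band(age_years)]
    if band_or is None:
        raise ValueError("OR for the 51-60 band is not available in this text")
    log_or = math.log(band_or)
    if male:
        log_or += math.log(MALE_OR)
    if ddimer_ug_l > 1000:          # D-dimer > 2 ULN
        log_or += math.log(DDIMER_OVER_2ULN_OR)
    if pt_seconds > 14:             # prolonged PT
        log_or += math.log(PROLONGED_PT_OR)
    return math.exp(log_or)


# Illustrative patient, not study data.
print(round(combined_odds_ratio(75, male=True, ddimer_ug_l=1500, pt_seconds=15), 1))
```

Summing log odds ratios rather than multiplying the ORs directly mirrors how a logistic regression score is built from its coefficients.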

Discussion
The main finding of our study was the development and validation of a predictive model of in-hospital mortality based on age, gender, and on-admission coagulopathy markers of COVID-19. The current COVID-19 pandemic has become a huge challenge for the health care systems of many countries due to the massive number of infected subjects. Emergency departments have been overwhelmed due to insufficient medical personnel and resources and patient overcrowding [24]. The access to invasive ventilation and/or intensive care units has been limited or prioritized to patients developing severe hypoxemic respiratory failure. In order to address these shortages and their consequences, it is essential that health care systems develop efficient strategies and plans to effectively deal with them. A risk model or score capable of predicting on admission which COVID-19 patients will most probably survive would be a strategy of great interest, in order to avoid the collapse of acute care hospitals as far as possible. Thus, predictive models with high sensitivity and, therefore, high negative predictive value would be desirable, since low-risk patients could either be discharged or referred to other support institutions that lack intensive care units.

PLOS ONE
Our study demonstrates that in-hospital mortality among moderate or severe hospitalized COVID-19 patients is predicted by the combination of age, gender, and coagulopathy markers (D-dimer and PT). The regression coefficients and cut-off selected were then validated in an independent data sample. Because our aim was to create a screening tool, we intentionally used a cut-off with high sensitivity and NPV, but low specificity and PPV. Therefore, the proportion of patients misclassified as high-risk will be elevated. However, patients classified as low-risk on admission could either be discharged early or referred to other centers without intensive care units with the certainty that their likelihood of dying is not as high as that of patients classified as high-risk, based on our arbitrarily selected and afterwards validated cut-off. Further external validation of our findings should be performed. Although COVID-19 mortality rates may be lower in future outbreaks due to improvements in its management and better access to medical infrastructures, the predictive capacity of our model should not be worse.
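The trade-off described above (high sensitivity and NPV at the cost of low specificity and PPV) follows from Bayes' rule once the mortality rate is fixed. The Python sketch below shows the relationship; the function name is hypothetical, and the prevalence used in the example is the overall mortality of the final sample (18.99%), which is an assumption because the mortality in the complete-case development cohort is not stated in the text. The output is therefore not expected to match the reported predictive values exactly.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity from the development cohort; prevalence is assumed.
ppv, npv = predictive_values(sensitivity=0.9583, specificity=0.4146, prevalence=0.1899)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Because NPV depends on prevalence, a lower mortality rate in future outbreaks would raise the NPV of the same cut-off and lower its PPV, assuming sensitivity and specificity remain unchanged.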
The model could be easily implemented in any laboratory information system (LIS), so that clinicians may automatically have the prognostic information. Additionally, in clinical trials that include adult COVID-19 patients of all ages, our model could be useful to ensure the comparability of included comparison groups.
Based on the logistic regression model coefficients, age was confirmed to be the strongest predictor of mortality in our cohort. Most COVID-19 patients aged 50 years or less (97.27%) or between 51 and 60 years old (95%) will be discharged within a few days regardless of their laboratory parameters on admission. On the other hand, nearly half of the patients over 80 years old died (45.77%), probably owing to a less vigorous immune response, thus suggesting that our predictive model seems to be less helpful in extreme age ranges. However, the addition of coagulopathy markers to age and gender may help clinicians refine the prognosis of hospitalized COVID-19 patients, especially those aged between 50 and 70 years.
Moderate or severe COVID-19 cases are more likely to occur in older men with comorbidities [1]. A recent meta-analysis with aggregated data, including a total of 3027 COVID-19 patients, confirmed that male sex, age over 65 years, smoking and comorbidities such as hypertension, diabetes, cardiovascular disease, and respiratory diseases were risk factors for severe disease and mortality [2]. More than 60% of our 2070 cases were over 60 years old, and the likelihood of dying was higher in men compared to women. Non-survivors from our cohort were older, had more chronic pathologies (with the exception of obesity and smoking habit), and showed a higher proportion of males. Our findings are in agreement with previous reports, since the outcome was significantly worse in male patients and those with chronic pathologies. However, all of these comorbidities were excluded from our final model.
Although the pathophysiology underlying severe COVID-19 remains poorly understood, a lung-centric coagulopathy is believed to play an important role [25]. COVID-19 associated coagulopathy correlates with illness severity and mortality, and may include increased D-dimer levels, mild PT prolongation and mild thrombocytopenia [10,13,26]. Thrombotic complications have emerged as an important issue in COVID-19 patients as a result of the inflammatory response to SARS-CoV-2. COVID-19 prothrombotic status seems to be multifactorial. The illness severity and hypoxia, hemostatic abnormalities, the severe inflammatory response, plus any other underlying thrombotic risk factors can lead to a thrombotic event [27].
Compared to survivors, the COVID-19 non-survivors from our cohort presented significantly higher D-dimer levels, prolonged PT and lower platelet counts. These results are in agreement with previously published data [10-14]. Similar to our approach, Zhang et al. retrospectively analyzed 343 COVID-19 hospitalized patients and reported that a four-fold increase of on-admission D-dimer levels could effectively predict their in-hospital mortality [20]. To our knowledge, there are two studies reporting predictive models of mortality in adult hospitalized COVID-19 patients based on baseline clinical and laboratory data [28,29]. Their risk of bias is high, either because the sample size is small or because they are not validated.
Wang and colleagues developed (n = 296) and validated (n = 44) two models, both based on age: one clinical (including age, hypertension and coronary heart disease sensitivity), and one based on laboratory parameters [age, C-reactive protein, SpO2, neutrophil and lymphocyte count, D-dimer, aspartate aminotransferase (AST) and GFR] which had a significantly stronger discriminatory power than the clinical model [28]. The model from Chen and colleagues was developed from a bigger retrospective cohort (n = 1590), and included age, coronary heart disease, cerebrovascular disease, dyspnea, procalcitonin level >0.5 ng/mL, and AST [29]. However, it has not been validated.
Although most predictive models have been reported to be at high risk of bias [30], we adhered to the TRIPOD reporting guideline [23] when developing our model, and the Brier score results support its good calibration.
The strengths of our model include the study population size, the multicenter nature of data and the inclusion of a validation cohort. However, the model has some limitations. First of all, it is a retrospective analysis. Second, no data from possible hospital readmission of survivors were available, and it is possible that some initially recovered patients may have worsened a few days later. Finally, although we obtained dichotomized variables in order to simplify the model and increase its applicability, the use of continuous variables has the potential to provide more refined information.
In conclusion, we developed and validated a predictive model for in-hospital mortality of moderate or severe COVID-19 patients, which included D-dimer levels >2 ULN, prolonged PT, male gender and age as positively correlated factors with fatal outcome. Our findings, obtained and validated from a large series of hospitalized COVID-19 patients, support the use of this prognostic tool on admission to identify a low-risk group that may benefit from early discharge or referral to support institutions, in order to prevent acute care hospitals from becoming overwhelmed. Prospective studies are needed to confirm our findings.