Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Development of a severity of disease score and classification model by machine learning for hospitalized COVID-19 patients

  • Miguel Marcos ,

    Contributed equally to this work with: Miguel Marcos, Moncef Belhassen-García

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing

    mmarcos@usal.es

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Moncef Belhassen-García ,

    Contributed equally to this work with: Miguel Marcos, Moncef Belhassen-García

    Roles Conceptualization, Data curation, Writing – original draft, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Antonio Sánchez-Puente,

    Roles Data curation, Formal analysis, Investigation, Software, Writing – original draft, Writing – review & editing

    Affiliations Department of Cardiology, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain, CIBERCV, Instituto de Salud Carlos III, Madrid, Spain

  • Jesús Sampedro-Gomez,

    Roles Conceptualization, Data curation, Investigation, Methodology, Software, Validation, Writing – review & editing

    Affiliations Department of Cardiology, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain, CIBERCV, Instituto de Salud Carlos III, Madrid, Spain

  • Raúl Azibeiro,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Hematology, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Pedro-Ignacio Dorado-Díaz,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliations Department of Cardiology, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain, CIBERCV, Instituto de Salud Carlos III, Madrid, Spain

  • Edgar Marcano-Millán,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Intensive Care Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Carolina García-Vidal,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Infectious Diseases, Hospital Clínic-Universitat de Barcelona, IDIBAPS, Barcelona, Spain

  • María-Teresa Moreiro-Barroso,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Noelia Cubino-Bóveda,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • María-Luisa Pérez-García,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Beatriz Rodríguez-Alonso,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Daniel Encinas-Sánchez,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Sonia Peña-Balbuena,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Eduardo Sobejano-Fuertes,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Hematology, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Sandra Inés,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Cristina Carbonell,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Miriam López-Parra,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Hematology, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Fernanda Andrade-Meira,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Infectious Diseases, Hospital Clínic-Universitat de Barcelona, IDIBAPS, Barcelona, Spain

  • Amparo López-Bernús,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Catalina Lorenzo,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Adela Carpio,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • David Polo-San-Ricardo,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Miguel-Vicente Sánchez-Hernández,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Anesthesiology and Reanimation, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Rafael Borrás,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Emergency Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Víctor Sagredo-Meneses,

    Roles Data curation, Investigation, Writing – review & editing

    Affiliation Department of Intensive Care Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • Pedro-Luis Sanchez,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Writing – review & editing

    Affiliations Department of Cardiology, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain, CIBERCV, Instituto de Salud Carlos III, Madrid, Spain

  • Alex Soriano ,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Supervision, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Infectious Diseases, Hospital Clínic-Universitat de Barcelona, IDIBAPS, Barcelona, Spain

  •  [ ... ],
  • José-Ángel Martín-Oterino

    Roles Conceptualization, Funding acquisition, Investigation, Supervision, Writing – review & editing

    ‡ These authors also contributed equally to this work.

    Affiliation Department of Internal Medicine, University Hospital of Salamanca-IBSAL, University of Salamanca, Salamanca, Spain

  • [ view all ]
  • [ view less ]

Development of a severity of disease score and classification model by machine learning for hospitalized COVID-19 patients

  • Miguel Marcos, 
  • Moncef Belhassen-García, 
  • Antonio Sánchez-Puente, 
  • Jesús Sampedro-Gomez, 
  • Raúl Azibeiro, 
  • Pedro-Ignacio Dorado-Díaz, 
  • Edgar Marcano-Millán, 
  • Carolina García-Vidal, 
  • María-Teresa Moreiro-Barroso, 
  • Noelia Cubino-Bóveda
PLOS
x

Abstract

Background

Efficient and early triage of hospitalized Covid-19 patients to detect those with higher risk of severe disease is essential for appropriate case management.

Methods

We trained, validated, and externally tested a machine-learning model to early identify patients who will die or require mechanical ventilation during hospitalization from clinical and laboratory features obtained at admission. A development cohort with 918 Covid-19 patients was used for training and internal validation, and 352 patients from another hospital were used for external testing. Performance of the model was evaluated by calculating the area under the receiver-operating-characteristic curve (AUC), sensitivity and specificity.

Results

A total of 363 of 918 (39.5%) and 128 of 352 (36.4%) Covid-19 patients from the development and external testing cohort, respectively, required mechanical ventilation or died during hospitalization. In the development cohort, the model obtained an AUC of 0.85 (95% confidence interval [CI], 0.82 to 0.87) for predicting severity of disease progression. Variables ranked according to their contribution to the model were the peripheral blood oxygen saturation (SpO2)/fraction of inspired oxygen (FiO2) ratio, age, estimated glomerular filtration rate, procalcitonin, C-reactive protein, updated Charlson comorbidity index and lymphocytes. In the external testing cohort, the model performed an AUC of 0.83 (95% CI, 0.81 to 0.85). This model is deployed in an open source calculator, in which Covid-19 patients at admission are individually stratified as being at high or non-high risk for severe disease progression.

Conclusions

This machine-learning model, applied at hospital admission, predicts risk of severe disease progression in Covid-19 patients.

Introduction

Since late 2019, a pneumonia outbreak caused by coronavirus SARS-CoV-2 began in the Chinese city of Wuhan and has evolved into a global pandemic [1]. Clinical manifestations of patients with SARS-CoV-2 infection range from mild disease (e.g., only fever or cough) to critically ill cases with acute respiratory distress syndrome and septic shock. In a large report from the Chinese Center for Disease Control and Prevention, with 44415 cases, 36160 (81%) were described as mild, 6168 (14%) as severe, and 2087 (5%) as critical illness, with a mortality of 49% in the latter group [2]. Due to this variability, several factors have been identified to predict increased severity, such as older age, neutrophilia, organ dysfunction, coagulopathy, or elevated D-dimer levels [3].

Machine-learning is a subfield of computer science and statistics that has received growing interest in medicine, especially in infectious diseases, and has allowed to develop tools to predict clinical outcomes such as the occurrence of sepsis in intensive care units or the diagnosis of surgical site infection [4]. Therefore, in this context of worldwide health emergency, early detection of patients who are likely to develop critical illness is of paramount importance and may aid in delivering proper care and optimizing use of limited intensive care resources.

For this purpose, we report here a machine-learning model able to predict risk of severity of disease progression in Covid-19 patients at the time of admission, developed and validated in two large cohorts of patients from two university hospitals, including easy-to-collect variables such as peripheral blood oxygen saturation (SpO2)/fraction of inspired oxygen (FiO2) ratio, age, estimated glomerular filtration rate, procalcitonin, C-reactive protein, updated Charlson comorbidity index and lymphocytes.

Material and methods

Study design and data sources

We conducted a training, validation an external-testing study on an intelligence-based machine-learning model [5], using clinical and laboratory features obtained at hospital admission. A data set from 918 confirmed Covid-19 patients from the University Hospital of Salamanca, Spain, was used for training and internal validation. For external testing we included 352 Covid-19 patients from another university hospital (Hospital Clinic of Barcelona, Spain). A flowchart illustrating the detailed steps involved in our score development is provided as S1 Fig.

Institutional approval was provided by the Ethics Committee of the University Hospital of Salamanca (2020/03/470) and the Comité Ètic d’Investigació Clínica of the Hospital Clínic of Barcelona (HCB/2020/0273), which waived the need for informed consent. All data set were anonymously analyzed, and the study was performed following current recommendation of the Declaration of Helsinki [6].

Task definition

The aim of our study was to develop and validate a machine-learning model to predict, at the moment of hospital admission, the likelihood that a Covid-19 patient will die or require invasive mechanical ventilation during hospitalization. A secondary objective was to deploy this model into a simple clinical digital application to facilitate its use in real time.

Input data (features) consists in demographic variables (including age and sex), individual comorbidities and Charlson Comorbidity Index, chronic medical treatment, clinical characteristics, physical examination parameters, and biochemical parameters available at hospital admission (Tables 1 and 2). As for the corresponding outcome (label), we defined severity of disease progression during hospitalization as the use of mechanical ventilation or death.

thumbnail
Table 1. Admission characteristics of patients from internal validation cohort by outcome.

https://doi.org/10.1371/journal.pone.0240200.t001

thumbnail
Table 2. Admission laboratory findings of patients from internal validation cohort by outcome.

https://doi.org/10.1371/journal.pone.0240200.t002

Data preparation

The data was preprocessed by one-hot encoding multicategory features and completing missing values with the trimmed mean between 5–95 percentiles and mode of each continuous and categorical feature, respectively.

Training and validation of the classification machine-learning model

Three machine-learning classifiers typically used in data sets composed by heterogeneous features [7] were trained: random forest [8], xgboost [9] and regularized logistic regression. The development cohort data was split in a train and validation data set following a 10-stratified fold cross-validation scheme with 10 repetitions [10], and the validation results in each of these splits were averaged to assess the performance of the classifiers. In the training phase, all models were optimized by fine tuning their hyperparameters with 10-fold cross-validation scheme and a grid search algorithm, configuring a nested cross-validation scheme to first perform this hyperparameter optimization and secondly internally evaluate the classifier. The fixed values of not optimized hyperparameters and the ranges of optimized ones for each classification and feature selection algorithm can be consulted in S1 Table.

The code to develop the models was written in Python and open source libraries scikit-learn [11], xgboost and eli5 were used for the implementation of the machine-learning classifiers and cross-validation schemes. The code can be consulted at http://github.com/hus-ml/covid19salamanca-score

In order to better assess the clinical significance of our results, a real-world application of the model was evaluated with patients from a second tertiary university center, the Hospital Clinic of Barcelona.

Evaluation metrics

The differences in clinical, epidemiological and analytical variables between patients with and without severe disease progression at both hospitals were compared using χ2 tests for categorical variables and Student’s t-test for continuous variables.

The performance of the model was evaluated by calculating the area under the receiver-operating-characteristic curve (AUC) and its confidence interval for each prediction model [12, 13]. The classification performance at particular cutoff thresholds based on the receiver-operating-characteristic curve were evaluated according to its sensitivity, specificity, positive predictive value, and negative predictive value.

Severity of disease classification calculator

The developed machine-learning model was deployed in an open source calculator that can be run on a web application (https://covid19salamanca-score.herokuapp.com), in which Covid-19 patients at hospital admission can be individually stratified as high and non-high risk for severity of disease progression.

To develop a friendly and practical calculator, the number of features used by the machine-learning model was reduced from 140 to less than 10. In order to ensure that all relevant clinical features were present a number of additional models were built using combinations of outcomes (death or death plus mechanical ventilation as labels) and restricting the data set to subgroups (older of 75 years of age, younger than 75 years of age or without age-restriction). The performance of these models were compared and the importance of each feature for these models was computed using Mean Decrease Accuracy [9]. We tallied the number of times each feature appeared as one of the most important in a model and chose the most frequent features. Additionally, correlated features with similar importance were chosen by clinical significance and by their availability in the external data set. We selected a final number of seven features as 8th and 9th variables were of much less importance according to mentioned criteria. The new model developed with the selected features was validated to ensure similar results to the original one with all the features.

Results

Development cohort

Between March 1st and April 23rd 2020, among 918 patients that had been admitted at the University Hospital of Salamanca because of SARS-CoV-2 pneumonia, 363 patients (39.5%) died or required mechanical ventilation by May 15th (312 patients died and 82 required mechanical ventilation -31 of them finally died-) and 555 patients (60.5%) did not progress to critical illness and had been discharged by that date. Cause of death was directly related to Covid-19 in 297 patients and to other causes in 15 patients. Diagnosis was confirmed by RT-PCR assay from nasopharyngeal swab or immunochromatography assay in 859 and 59 patients, respectively. Table 1 shows features of patients of this cohort by severity of disease progression.

Concerning clinical variables, patients with severe disease progression were older (average age 79.2 years) and presented with a higher updated Charlson comorbidity index (mean value of 1.8). Overall, patients who developed critical illness had more cardiovascular and central nervous system diseases, and 35 out of 50 patients (70%) with chronic kidney disease had severe disease progression. Cancer was more prevalent in those with severity of disease progression (19.3% vs. 10.8%). Regarding clinical manifestations at admission, shortness of breath and labored breathing were present in 68.9% and 55.7% of the patients who progressed to severe disease, respectively. This group of patients had a significantly lower ratio of oxygen saturation as measured by pulse oximetry divided by the fraction of inspired oxygen (391.7 vs. 296.3) and 66.4% of them required oxygen supplement at admission whilst only 38.6% in the non-severe progression group.

Table 2 represents the laboratory findings at the time of admission by outcome. The patients with severity of disease progression presented at admission with neutrophilia, lymphopenia and higher levels of D-dimer, ferritin, C-reactive protein, procalcitonin and fibrinogen. The critically ill group patients had altered renal function at admission, measured by increased urea and creatinine levels and reduced estimated glomerular filtration rate.

Risk model performance

In order to develop the risk model, we first selected all variables included in the Tables 1 and 2. Using all the cohort patients and variables, the best model obtained in the internal cross-validation an AUC of 0.86 (CI: 0.83–0.88). With the aim of developing a more user-friendly application and according to the described methodology, we identified 7 variables present in all models with independent prognostic significance: peripheral blood oxygen saturation (SpO2)/fraction of inspired oxygen (FiO2) ratio, age, estimated glomerular filtration rate (calculated using Chronic Kidney Disease Epidemiology Collaboration [CKD-EPI] equation), procalcitonin, C-reactive protein, updated Charlson comorbidity index (detailed in S2 Table) and lymphocytes. By restricting to these 7 variables, out of the 3 trained machine-learning classifiers, the best classifier achieved a highest mean AUC of 0.85 (CI: 0.82–0.87) from our development cohort without significant difference among them (Fig 1).

thumbnail
Fig 1. Receiver operating characteristic curves of the machine-learning model for the different classification algorithms.

Panel A shows the internal crossvalidation results and panel B shows the external testing results. The results of a risk classification only based on age are also shown for comparison.

https://doi.org/10.1371/journal.pone.0240200.g001

External testing cohort

Between February 15th and April 28th 2020, 352 patients were admitted at the Clinic Hospital of Barcelona because of their first episode of SARS-CoV-2 pneumonia confirmed by RT-PCR assay from nasopharyngeal swab. Among them, 128 (36.3%) patients developed critical illness (64 died and 77 required mechanical ventilation -13 of them finally died-) and 224 (63.6%) did not and were discharged by May 20th. Cause of death was directly attributed to Covid-19 in all patients but three. The baseline characteristics and laboratory findings at the time of admission in this external testing cohort are represented in the Table 3. Patients with severity of disease progression were older (median age of 68.7), with lower SpO2/FiO2%, lower glomerular filtration rate and higher procalcitonin and C-reactive protein values. In addition, they presented with lower lymphopenia count and higher updated Charlson Comorbidity index scores.

thumbnail
Table 3. Admission demographic and clinical characteristics of patients from external testing cohort by outcome.

https://doi.org/10.1371/journal.pone.0240200.t003

The three trained classifiers restricted to the 7 most relevant variables were externally validated on this cohort. In this case, the best classifier obtained a mean AUC of 0.83 (CI: 0.81–0.85), again without significant differences respect to the other classifiers (Fig 1) and very consistent with the results obtained in the development cohort.

The relative contribution to the AUC of each feature both in the development and testing populations are shown in Table 4. In both cohorts, SpO2/FiO2 and C-reactive protein were the best predictors of critical evolution of disease, while procalcitonin and lymphocyte count showed lower contribution to the prediction.

thumbnail
Table 4. Relative importance of each variable according to mean decrease accuracy, scaled to the most important one.

https://doi.org/10.1371/journal.pone.0240200.t004

Calculator application

The 7-variable model based on the regularized logistic regression, which obtained the best result in the external testing cohort, has been deployed in an open-source web calculator (https://covid19salamanca-score.herokuapp.com/) to predict the risk of severity of disease progression, with the possibility of selecting different cut-off thresholds according to the desired sensitivity for the detection of high-risk patients. This also allows to individualize this threshold depending on the availability of hospital resources (e.g., higher resources may allow more sensitivity to detect high-risk patients). As an example, we have predefined a high availability resource cut-off threshold, which is estimated to obtain in the internal validation cohort a sensitivity of 0.90 and specificity of 0.52 for detecting high-risk patients. This threshold results in the identification of 2 groups of patients representing the 64.6% and 35.4% of the cohort with 55.1% and 11.2% of them developing severe disease, respectively (S2 Fig). These values of sensitivity and specificity are susceptible to change in populations with different risk distributions (e.g., younger populations) or if there are other pre-admission criteria that skew the population. As a consequence, this high availability resource cut-off threshold, evaluated on the external setting cohort (younger population), identified groups including the 39.2% and 60.8% of the population with 65.9% and 17.3% of them developing severe disease, respectively. S3 Table shows the values of sensibility, specificity, precision, and negative predictive value for these two possible thresholds (high and low resource availability).

Discussion

In this study, we have developed and validated, through machine-learning, a clinical risk score to predict at the moment of hospital admission by Covid-19, the risk of mechanical ventilation or death. This score is also provided as an open-source web-based calculator, which allows clinicians to estimate an individual Covid-19 patient risk and make decisions based on availability of resources for critical patients and patient overload.

This score includes several common and readily available variables that may be collected at admission in most hospitals. Both development and testing cohorts of patients are representative series for gaining insights into the prediction of disease severity in Covid-19 patients because both are university institutions, patients were in charge of Infectious Diseases/Internal Medicine Departments and treatment protocols were quite homogeneous due to the recommendations of the Spanish Agency of Medicines and Medical Devices (AEMPS). Further, the selected time frame corresponds to the peak Covid-19 incidence and excess mortality in Spain.

As far as the variables included in the risk model here presented, age has been described as one of the main risk factors predicting severity and inpatient mortality in Covid-19 and other scores have also included this variable [2, 14]. Concerning comorbidities, although the exact type and number of comorbidities posing more risk for adverse outcomes is still unknown, our analysis has shown that updated Charlson comorbidity index was the most powerful variable to integrate and combine comorbidities at admission and resulted better than individual variables, such as hypertension or heart failure, or the classical Charlson index. Considering that the updated Charlson index is an improved and more parsimonious prognostic score than the classical one, has been previously shown to be a useful tool to reduce potential confounding in epidemiological research, and has been described as a prognostic tool in many settings, including infectious diseases [15, 16], this score may therefore serve to adjust for comorbidity in other Covid-19 studies.

The ratio of oxygen saturation as measured by pulse oximetry divided by the fraction of inspired oxygen is a simple measure, which has been previously used in the setting of acute respiratory distress syndrome instead of more complex variables [17], and thus can be evaluated in each patient with Covid-19 pneumonia to help identify patients at higher risk of severe disease.

Regarding laboratory variables, decreased estimated glomerular filtration rate and increased acute phase reactants like procalcitonin or C-reactive protein are associated with higher risk of severe disease. Although renal disease is part of the Charlson index as a comorbid disease, decreased estimated glomerular filtration rate may indicate not only the presence of this comorbidity but also acute kidney injury due to disease severity (e.g., septic shock). Therefore, it is a simple variable to assess severity of disease progression at the time of first visit. Indeed, kidney disease as a predictor of increased Covid-19 inpatient mortality rate has been previously described in a single-center study in China [18].

Increased levels of C-reactive protein and its association with prognosis and severity in Covid-19 have been reported and correlated with pro-inflammatory response [19]. Disease severity has also been linked with increased procalcitonin levels and described in some series although its elevation might be likely associated with the presence of bacterial superinfection [20]. Low lymphocyte count has already been linked to poorer outcomes in Covid-19 inpatients and other viral infections such as influenza [19, 21]. In addition, lymphopenia may play a pathogenic role in this disease due to a decrease of specific lymphocyte subpopulations and tissue infiltration [22].

Machine-learning models incorporate classical methods such as multivariate logistic regression but also add regularization terms and cross validation schemes, which makes the models more robust against overfitting and allows more prediction accuracy for each variable. The score presented here exhibits a very good performance and accuracy, as well as excellent validation in the testing cohort with an easy-to-use web interface. It is of note that our results are also quite consistent with a recent study from Spain which identified advanced age, several comorbidities included in the Charlson index, age-adjusted oxygen saturation, higher concentrations of C-reactive protein, and lower estimated glomerular filtration rate as independent factors associated with increased hazard of death [23]. Although previous scores for Covid-19 risk prediction have been developed and validated in Chinese [14, 19], European [24, 25] or North American patients [26], our score offers an open-source web calculator based in machine learning methodology, shows a very good AUC value with excellent replication in a testing cohort from another center and uses easy-to-collect variables on admission. For instance, our score offers better AUC (development cohort: 0.85, 95% CI 0.82 to 0.87; validation cohort: 0.83, 0.81 to 0.85) than the 4C mortality score [24] developed in European population (development cohort: 0.79, 95% CI 0.78 to 0.79; validation cohort: 0.77, 0.76 to 0.77). As a potential limitation, we have to acknowledge that elderly patients with several comorbidities may have not been candidates for certain therapies such as invasive or non-invasive mechanical ventilation, which may influence mortality in this subgroup of patients and limit the generalization of our findings. However, the results of our score were very similar between development and external testing cohorts and both centers followed the same recommendations for intensive care treatment during COVID-19 pandemic [27]. In any case, additional validation outside of Spain is needed to ensure generalizability. We would also like to highlight the possibility of the open-source web calculator to select different sensitivity cut-off thresholds to classify patients depending on health-care resources and population risk distributions. This possibility may improve the efficiency of triage of Covid-19 patients at hospital admission through a real-time, automated and personalized method that would also take into account hospital intensive care unit availability within this pandemic situation. Thus, patients with a high risk of severe disease could be early transferred to tertiary care hospitals or to intermediate care units if available in order to close monitoring and early access to non-invasive or invasive mechanical ventilation.

This study, to assure uniformity, only focused on patients admitted to a university hospital after an emergency department visit and did not include those Covid-19 cases managed in the outpatient setting. However, it would be optimal to validate this risk score at the time of first evaluation by family physicians to potentially identify patients at risk of progressive disease and thus allow early hospital referral.

In summary, this risk model may represent a reliable system that uses widely available clinical and laboratory parameters at hospital admission. The application of machine-learning methods has led to better prediction of the outcome for the identification of Covid-19 inpatients that will likely develop progressive disease after admission.

Supporting information

S1 Table. List of adjusted hyperparameters of each classifier, their tested values and optimized value.

https://doi.org/10.1371/journal.pone.0240200.s002

(DOCX)

S2 Table. Definition and weights of comorbid conditions included in the updated Charlson comorbidity index.

https://doi.org/10.1371/journal.pone.0240200.s003

(DOCX)

S3 Table. Evaluation metrics as measured in the internal validation dataset for the suggested thresholds in the calculator.

https://doi.org/10.1371/journal.pone.0240200.s004

(DOCX)

S1 Fig. Flow chart illustrating steps involved in prognostic score development.

https://doi.org/10.1371/journal.pone.0240200.s005

(TIF)

S2 Fig. Caption of web based calculator.

A high availability resource cut-off threshold is estimated to obtain in the internal validation cohort a sensitivity of 0.90 and specificity of 0.52 for detecting high-risk patients. Web based calculator available at https://covid19salamanca-score.herokuapp.com/.

https://doi.org/10.1371/journal.pone.0240200.s006

(TIF)

Acknowledgments

We acknowledge to María-Victoria Mateos and Jose-Ramón González-Porras from the Hematology Department of the University Hospital of Salamanca for her contribution with the analysis and writing of this manuscript as well as to all doctors and healthcare personnel involved in the COVID-19 Working Team from the Internal Medicine Department and others departments of the University Hospital of Salamanca: José-Ignacio Martín-González, José-Ignacio Madruga-Martín, María-José García-Rodríguez, Ángela Romero-Alegría, Nora Gutiérrez-San Pedro, Leticia Moralejo-Alonso, José-Ignacio Herrero-Herrero, Antonio Chamorro, Mercedes Martín-Ordiales, Celestino Martín-Álvarez, Guillermo Hernández, María Sánchez-Ledesma, María-José Sánchez-Crespo, Felipe Álvarez-Navia, Patricia Araoz-Sánchez, Judit Aparicio-García, Jacinto Herráez, Ronald Macías, Alejandro Rolo, Juan-Francisco Soto, Laura Manzanedo, Luis Seisdedos, Juan-Miguel Manrique, Alfredo-Javier Collado, Sandra Rodríguez, Ana Rodríguez, Silvia Ojea, Laura Burgos, Carlos Reina, Eugenia López, José-María Bastida, María Díez-Campelo, Alberto Hernández-Sánchez, Luis-Mario Vaquero, Ignacio Martín, Cristina de-Ramón, Estefanía Pérez, Borja Puertas, Daniel Presa, Ana Yeguas, Ana África, María-Victoria Coral, Rosa Tejera, Laura Gil, Javier Fernández, Elisa Acosta, Fátima Bouhmir, Sonia Pastor, Marta Fonseca, María-de-los-Ángeles Pérez-Nieto, Javier Fernández, Ernesto Parras, María Cartagena, Víctor Barreales, Óscar Humberto, María Bartol, Olga Compán, Ana Ramón, Raquel Rodríguez, Silvia Ruiz, Sonsoles Garrosa, Alexis-Alan Rodrigo, Sara Alonso, Raquel Domínguez, Felipe Peña, María García-Duque, Ángela Romero, Ana Menéndez, Esther Puerto, Ana Rodrigo, Juan Hernández-Goenaga, Monica Sánchez and all the staff members. We want also to thank all COVID-19 patients admitted at the University Hospital of Salamanca and their families. We acknowledge to the Infectious Diseases’ Research Group of the Hospital Clínic of Barcelona: Albiach L, Agüero D, Ambrosioni J, Bodro M, Blanco JL, Cardozo C, Chumbita M, De la Mora L, García-Alcaide F, García-Pouton N, González-Cordón A, Hernández-Meneses M, Inciarte A, Laguno M, Leal L, Linares L, Macaya I, Mallolas J, Martínez E, Martínez M, Meira F, Miró JM, Mensa J, Moreno A, Moreno A, Moreno-García E, Morata L, Martínez JA, Puerta-Alcalde P, Rico V, Rojas J, Solá M, Tomé A, Torres B, Torres M, and all the staff members. We also acknowledge to the Medical Intensive Care Unit: Adrian Téllez, Sara Fernández, Pedro Castro, Josep M Nicolás, and all the staff members.

References

  1. 1. Lu RJ, Zhao X, Li J, Niu P, Yang B, Wu H et al. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. Lancet, 2020;395:565–74. pmid:32007145
  2. 2. Wu Z, McGoogan JM. Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA, 2020;24.
  3. 3. Yang XB, Yu Y, Xu JQ, Shu H, Xia J, Liu H et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respiratory Medicine, 2020;8:475–81. pmid:32105632
  4. 4. Peiffer-Smadja N, Rawson TM, Ahmad R, Georgiou P, Lescure FX, Birgand G, et al. Machine learning for clinical decision support in infectious diseases: a narrative review of current applications. Clinical Microbiology and Infection, 2020;26:584–95. pmid:31539636
  5. 5. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med, 2019;380:1347–58. pmid:30943338
  6. 6. World Med A. World Medical Association Declaration of Helsinki Ethical Principles for Medical Research Involving Human Subjects. Jama-Journal of the American Medical Association, 2013;310:2191–4. pmid:24141714
  7. 7. Dorado-Diaz PI, Sampedro-Gomez J, Vicente-Palacios V, Sanchez PL. Applications of Artificial Intelligence in Cardiology. The Future is Already Here. Revista Espanola De Cardiologia, 2019;72:1065–75. pmid:31611150
  8. 8. Breiman L. Random forests. Machine Learning, 2001;45:5–32.
  9. 9. Chen TQ, Guestrin C, Assoc Comp M. XGBoost: A Scalable Tree Boosting System. Kdd’16: Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, 2016:785–94.
  10. 10. Rodriguez JD, Perez A, Lozano JA. Sensitivity analysis of kappa-fold cross validation in prediction error estimation. IEEE Trans Pattern Anal Mach Intell, 2010;32:569–75. pmid:20075479
  11. 11. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2011;12:2825–30.
  12. 12. Bouckaert R. Choosing Between Two Learning Algorithms Based on Calibrated Tests. Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC, 2003.
  13. 13. Nadeau C, Bengio Y. Inference for the generalization error. Machine Learning, 2003;52:239–81.
  14. 14. Liang W, Liang H, Ou L, Chen B, Chen A, Li C, et al. Development and Validation of a Clinical Risk Score to Predict the Occurrence of Critical Illness in Hospitalized Patients With COVID-19. JAMA Intern Med, 2020;180(8):1081–1089 pmid:32396163
  15. 15. Ternavasio-de la Vega HG, Castano-Romero F, Ragozzino S, Sánchez González R, Vaquero-Herrero MP, Siller-Ruiz M, et al. The updated Charlson comorbidity index is a useful predictor of mortality in patients with Staphylococcus aureus bacteraemia. Epidemiology and Infection, 2018;146:2122–30. pmid:30173679
  16. 16. Quan H, Li B, Couris CM, Fushimi K, Graham P, Hider P, et al. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol 2011;173:676–82 pmid:21330339
  17. 17. Schmidt MFS, Gernand J, Kakarala R. The use of the pulse oximetric saturation to fraction of inspired oxygen ratio in an automated acute respiratory distress syndrome screening tool. Journal of Critical Care, 2015;30:486–90. pmid:25746583
  18. 18. Cheng Y, Luo R, Wang K, Zhang M, Wang Z, Dong L, et al. Kidney disease is associated with in-hospital death of patients with COVID-Covid-19. Kidney Int, 2020;97:829–838 pmid:32247631
  19. 19. Yan L, Zhang HT, Goncalves J, Xiao Y, Wang M, Guo Y, et al. An interpretable mortality prediction model for COVID-19 patients. Nat Mach Intell, 2020;2:283–8.
  20. 20. Lippi G, Plebani M. Procalcitonin in patients with severe coronavirus disease 2019 (COVID-19): A meta-analysis. Clinica Chimica Acta, 2020;505:190–1. pmid:32145275
  21. 21. Lalueza A, Folgueira D, Diaz-Pedroche C, Hernández-Jiménez P, Ayuso B, Castillo C, et al. Severe lymphopenia in hospitalized patients with influenza virus infection as a marker of a poor outcome. Infectious Diseases, 2019;51:543–6. pmid:31012776
  22. 22. Chen G, Wu D, Guo W, Cao Y, Huang D, Wang H, et al. Clinical and immunological features of severe and moderate coronavirus disease 2019. The Journal of clinical investigation, 2020;130:2620–9. pmid:32217835
  23. 23. Berenguer J, Ryan P, Rodríguez-Baño J, Jarrín I, Carratalà J, Pachón J, et al. Characteristics and predictors of death among 4,035 consecutively hospitalized patients with COVID-19 in Spain. Clin Microbiol Infect, 2020;S1198-743X(20)30431-6.
  24. 24. Knight Stephen R, Antonia Ho, Riinu Pius, Iain Buchan, Gail Carson, Drake Thomas M et al. Risk stratification of patients admitted to hospital with covid-19 using the ISARIC WHO Clinical Characterisation Protocol: development and validation of the 4C Mortality Score. BMJ 2020; 370:m3339. pmid:32907855
  25. 25. Galloway JB, Norton S, Barker RD, Brookes A, Carey I, Clarke BD, et al. A clinical risk score to identify patients with COVID-19 at high risk of critical care admission or death: An observational cohort study. J Infect. 2020;81:282–288 pmid:32479771
  26. 26. Sun H, Jain A, Leone MJ, A labsi HS, Brenner LN, Ye E et al. CoVA: An Acuity Score for Outpatient Screening that Predicts COVID-19 Prognosis [published online ahead of print, 2020 Oct 24]. J Infect Dis. 2020;jiaa663. pmid:33098643
  27. 27. Rubio O, Estella A, Cabre L, Saralegui-Reta I, Martin MC, Zapata L, et al. Ethical recommendations for a difficult decision-making in intensive care units due to the exceptional situation of crisis by the COVID-19 pandemia: A rapid review & consensus of experts. Med Intensiva. 2020;44:439–445. pmid:32402532