Development and validation of a nomogram for predicting the risk of severe COVID-19: A multi-center study in Sichuan, China

Background Coronavirus disease 2019 (COVID-19) emerged in Wuhan in December 2019 and has since spread across the globe. The objective of this study was to build and validate a practical nomogram for estimating the risk of severe COVID-19.
Methods A cohort of 366 patients with laboratory-confirmed COVID-19 was used to develop a prediction model using data collected from 47 locations in Sichuan province from January 2020 to February 2020. The primary outcome was the development of severe COVID-19 during hospitalization. The least absolute shrinkage and selection operator (LASSO) regression model was used to reduce data dimensionality and select relevant features. Multivariable logistic regression analysis was applied to build a prediction model incorporating the selected features. The performance of the nomogram was assessed in terms of discrimination (C-index), calibration, and clinical usefulness. Internal validation was performed by bootstrapping.
Results The median age of the cohort was 43 years. Severe patients were older than mild patients by a median of 6 years. Fever, cough, and dyspnea were more common in severe patients. The individualized prediction nomogram included seven predictors: body temperature at admission, cough, dyspnea, hypertension, cardiovascular disease, chronic liver disease, and chronic kidney disease. The model had good discrimination, with an area under the curve of 0.862 and a C-index of 0.863 (95% confidence interval, 0.801–0.925), and good calibration. A high C-index of 0.839 was still reached in the internal validation. Decision curve analysis showed that the prediction nomogram was clinically useful.
Conclusion We established an early warning model incorporating clinical characteristics that could be quickly obtained on admission. This model can be used to help predict severe COVID-19 and identify patients at risk of developing severe disease.


Introduction
In December 2019, several cases of coronavirus disease 2019 (COVID-19) emerged in Wuhan, Hubei Province, China [1][2][3]. SARS-CoV-2 is transmitted by respiratory droplets and aerosols, direct contact, and stool [8,9]. COVID-19 is very contagious, and the general population is highly susceptible to infection. The number of affected countries and the number of deaths have increased dramatically since the beginning of the outbreak [4], and COVID-19 has reached more than 130 countries. According to the World Health Organization, as of April 17, 2020, 2,034,802 confirmed cases had been reported worldwide, and 135,163 infected patients had died [5]. COVID-19 is therefore a serious health problem worldwide.
Coronaviruses can affect multiple organs, including the lungs [1]. The main clinical presentation of COVID-19 is pneumonia. Most patients have mild disease, with common respiratory symptoms and a good prognosis [6]. The most common clinical symptoms are fever and cough. However, it has been reported that only 43.8% of patients present with fever on admission, and radiologic abnormalities were not observed on initial presentation in approximately 20% of cases [7]. According to the sixth edition of the Novel Coronavirus Pneumonia Diagnosis and Treatment Plan, severe cases should meet any of the following criteria: 1) shortness of breath (respiratory rate ≥30 breaths per min), 2) oxygen saturation ≤93% at rest, or 3) arterial partial pressure of oxygen/fraction of inspired oxygen ≤300 mm Hg [8]. A small percentage of patients present with severe disease before or during hospitalization, including severe pneumonia, acute respiratory distress syndrome, or multiple organ failure, which are associated with worse outcomes [9][10][11]. As of February 15, 2020, the mortality rate and percentage of severe cases in Hubei Province were 2.6% and 15.2%, respectively [11,12]. Therefore, specific methods are urgently required to predict the risk of severe COVID-19 [9]. Among existing prediction tools, the nomogram allows for individualized and evidence-based risk estimation, facilitating management-related decision-making [13,14]. To the best of our knowledge, no previous studies have evaluated early warning models for predicting the risk of severe COVID-19.
Previous studies have shown that there are significant regional differences in the mortality rate and percentage of severe cases [12,15]. The aim of this study was to describe the clinical characteristics of confirmed cases of COVID-19 in Sichuan, China, and to construct an early warning prediction nomogram incorporating clinical characteristics to estimate the risk of developing severe COVID-19.
respiratory tract [16]. All study patients were diagnosed with COVID-19 according to the WHO interim guidance [17]. The study was approved by the Research Ethics Committee of West China Hospital, and data were collected retrospectively after patients gave written informed consent.

Demographical and risk variables
The following data were obtained from electronic medical records: demographics (age and gender), clinical signs on admission, clinical symptoms, clinical risk factors, and exposure to infection. The duration of clinical symptoms was defined as the interval between the onset of clinical symptoms and the date of admission. Exposure to infection was defined as contact with a source of infection in the past 14 days, including travel to Wuhan or other COVID-19-affected areas, contact with febrile patients or COVID-19 patients, and involvement in a case cluster. The definition of exposure risk changed over the study period as the relevant definitions in the COVID-19 guidelines of the National Health Commission of the People's Republic of China were revised. The National Early Warning Score (NEWS) [18] was calculated on admission. If data were missing from the records or clarification was needed, data were obtained by direct communication with attending physicians and other health care providers. All data were analyzed by two physicians (He YQ and Zhou YW), and a third researcher was consulted in cases of disagreement. The clinical and demographic features of our cohort are summarized in Table 1.

Definition of outcomes
The primary outcome was severe COVID-19 during hospitalization, defined according to the American Thoracic Society guidelines for community-acquired pneumonia [19]. Severe cases should meet one major criterion (septic shock with need for vasopressors, or respiratory failure requiring mechanical ventilation) or at least three minor criteria: respiratory rate ≥30 breaths per min, arterial partial oxygen pressure/fraction of inspired oxygen ≤250 mmHg, multilobar infiltrates, confusion/disorientation, uremia (blood urea nitrogen level ≥20 mg/dL), leukopenia (white blood cell count <4000 cells/μL), thrombocytopenia (platelet count <100,000/μL), hypothermia (core temperature <36˚C), and hypotension requiring aggressive fluid resuscitation [19].
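The decision rule described above (one major criterion, or at least three minor criteria) can be sketched as a small function. This is an illustrative sketch only: the criterion labels and the function name are hypothetical, and the numeric thresholds are assumed to have been applied upstream when each finding was recorded.

```python
# Hypothetical labels for the criteria listed in the text; each label is
# assumed to be recorded only when the corresponding threshold is met.
MAJOR_CRITERIA = {"septic_shock_with_vasopressors", "mechanical_ventilation"}
MINOR_CRITERIA = {
    "high_respiratory_rate",
    "low_pao2_fio2_ratio",
    "multilobar_infiltrates",
    "confusion",
    "uremia",
    "leukopenia",
    "thrombocytopenia",
    "hypothermia",
    "hypotension_needing_fluids",
}


def is_severe(findings):
    """Return True if findings include >= 1 major or >= 3 minor criteria."""
    present = set(findings)
    unknown = present - MAJOR_CRITERIA - MINOR_CRITERIA
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    has_major = bool(present & MAJOR_CRITERIA)
    minor_count = len(present & MINOR_CRITERIA)
    return has_major or minor_count >= 3
```

For example, a patient with confusion and uremia alone would not be classified as severe under this rule, whereas adding hypothermia as a third minor criterion would change the classification.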

Statistical analysis
Statistical analyses were performed using R software version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria) and SPSS version 25.0 (IBM Corporation, Armonk, NY). Continuous variables were expressed as median and interquartile range. Categorical variables were expressed as absolute values and percentages. Continuous variables were compared using the independent-samples t-test for normally distributed data and the Mann-Whitney test for non-normally distributed data. The χ² test or Fisher's exact test was used to compare proportions.
The least absolute shrinkage and selection operator (LASSO) method, which is suitable for analyzing high-dimensional data, was used to select the most significant predictive features [20,21]. Features with non-zero coefficients in the LASSO regression model were entered into the forward stepwise logistic regression model [22]. The effect of each feature was reported as an odds ratio (OR) with 95% confidence interval [23] and two-tailed p-value. Variables with p-values smaller than 0.1 in the univariate analysis and potentially significant in the multivariate analysis were included in the logistic regression analysis, and the forward selection procedure was used to develop a parsimonious model for predicting severe COVID-19 in our cohort.
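To illustrate how LASSO performs feature selection, the sketch below fits an L1-penalized logistic regression by proximal gradient descent in plain Python. It is not the analysis code of the study, which was run in R: the function names, the toy data, the penalty value, and the optimizer settings are all assumptions chosen for illustration. The point is only that the L1 penalty shrinks the coefficient of an uninformative feature to exactly zero, so that the features with non-zero coefficients are the ones retained.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Shrink toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(rows, labels, penalty, step=0.5, n_iter=3000):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is not penalized. Returns (intercept, coefficients).
    """
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residual = predicted probability minus observed 0/1 label.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - y
            for row, y in zip(rows, labels)
        ]
        gradients = [
            sum(r * row[j] for r, row in zip(residuals, rows)) / n
            for j in range(p)
        ]
        intercept -= step * sum(residuals) / n
        coefs = [
            soft_threshold(c - step * g, step * penalty)
            for c, g in zip(coefs, gradients)
        ]
    return intercept, coefs


def build_toy_data():
    """Toy cohort: feature 0 is informative, feature 1 is pure noise."""
    rows, labels = [], []
    for noise in (0, 1):
        for informative, outcome, count in ((1, 1, 4), (1, 0, 1), (0, 1, 1), (0, 0, 4)):
            rows += [[informative, noise]] * count
            labels += [outcome] * count
    return rows, labels


toy_rows, toy_labels = build_toy_data()
toy_intercept, toy_coefs = fit_lasso_logistic(toy_rows, toy_labels, penalty=0.08)
selected = [j for j, c in enumerate(toy_coefs) if c != 0.0]
```

In practice the penalty would be tuned, typically by cross-validation, rather than fixed by hand as in this toy example.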

PLOS ONE
A nomogram is a graphical statistical tool for individualized risk assessment. A predictive nomogram was developed using the independent factors selected by LASSO to generate a combined indicator for estimating the severity of COVID-19, providing a quantitative tool for physicians to assess the individual probability of severe disease. The total score for each patient was calculated from the nomogram, and the nomogram was evaluated by internal validation. Discrimination and calibration were assessed to test the predictive accuracy of the nomogram model [24]. Discrimination was quantified using Harrell's concordance index (C-index), in which a value close to 1 indicates strong predictive ability. The nomogram was further validated by bootstrapping (1000 bootstrap replicates) to calculate the corrected C-index. Calibration plots were developed to assess the agreement between predicted and observed severity. Decision curve analyses (DCAs) were performed to assess the clinical usefulness of the nomogram. The net benefit was calculated by subtracting the proportion of patients with false-positive results from the proportion of patients with true-positive results, with the false-positive proportion weighted by the relative harm of forgoing an intervention compared with the adverse effects of an unnecessary intervention. The precision of the predictions was also evaluated using the area under the receiver-operating characteristic curve (AUC). Two-sided p-values of less than 0.05 were considered to indicate a statistically significant difference.
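The net benefit used in decision curve analysis can be made concrete with a short sketch. It assumes the standard formulation, in which the false-positive proportion is weighted by the odds of the threshold probability; whether the study used exactly this formulation is an assumption, and the function names and toy values are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for risk, y in pairs if risk >= threshold and y == 1)
    false_pos = sum(1 for risk, y in pairs if risk >= threshold and y == 0)
    # Odds of the threshold: how many false positives one true positive is worth.
    weight = threshold / (1.0 - threshold)
    return true_pos / n - weight * false_pos / n


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene in every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example with five patients (1 = severe disease).
toy_risks = [0.9, 0.8, 0.3, 0.2, 0.1]
toy_outcomes = [1, 0, 1, 0, 0]
nb_model = net_benefit(toy_risks, toy_outcomes, threshold=0.2)
nb_all = net_benefit_treat_all(toy_outcomes, threshold=0.2)
```

A decision curve plots these quantities over a range of thresholds; a model is considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit of zero).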

Clinical characteristics of patients
A total of 366 patients with COVID-19 who had been hospitalized in 47 regions of Sichuan were enrolled until January 20, 2020. Most patients were admitted to a public health clinical center located in Chengdu. The demographic and clinical characteristics of our cohort are shown in Table 1. The median age was 43 years (interquartile range, 31.8–51.0), and 56.6% were male. Fever was present in 42.9% of patients at symptom onset and in 29.5% on admission. The second most common symptom was cough (31.4%). Digestive symptoms, including vomiting and diarrhea, were present in 7.4% of cases. In our cohort, 25.9% had at least one coexisting disease (hypertension, diabetes, or chronic obstructive pulmonary disease).
Disease severity was considered mild in 323 patients and severe in 43 patients. Patients with severe disease were older than those with mild disease by a median of 6 years. More than 50% of severe patients had systolic blood pressure greater than 110 mmHg. Moreover, fever, cough, and dyspnea were more common in patients with severe disease than in those with mild disease (58.1% vs. 40.9%, 69.8% vs. 26.3%, and 27.9% vs. 3.4%, respectively). Comorbidities were more prevalent in severe patients than in mild patients, including hypertension (32.6% vs. 7.4%), cardiovascular disease (16.3% vs. 0.6%), diabetes (14.0% vs. 4.6%), chronic liver disease (9.3% vs. 1.2%), and chronic kidney disease (7.0% vs. 0.3%). More than 37% of severe patients had visited Wuhan or other COVID-19-affected areas in the past 14 days. However, the history of contact with febrile or COVID-19 patients was similar between the two groups.

Selection of independent predictive factors
Based on demographics, clinical signs on admission, clinical symptoms, clinical risk factors, and exposure to infection, seven potential predictors with non-zero coefficients were selected in the LASSO logistic regression model (Fig 1A and 1B).
The selected predictors were body temperature on admission, cough, dyspnea, hypertension, cardiovascular disease, chronic liver disease, and chronic kidney disease. The results of the logistic regression analysis are shown in Table 2.

Building and validating a prediction nomogram model
The nomogram for predicting severe COVID-19 was constructed using the significant independent factors: body temperature at admission, cough, dyspnea, hypertension, cardiovascular disease, chronic liver disease, and chronic kidney disease. The nomogram showed that the strongest predictors of severity were comorbidities, namely chronic kidney disease, cardiovascular disease, and chronic liver disease. Each variable was assigned a score according to the demographic and clinical characteristics of each patient, and the total score was computed by summing the individual scores. The probability of severe disease for each patient was then read from the nomogram (Fig 2).
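The scoring mechanism of a nomogram can be sketched as follows. The coefficients and intercept are hypothetical values chosen for illustration and are not the fitted estimates of this study; the sketch also handles only binary predictors with positive coefficients, whereas the actual nomogram includes body temperature. Each predictor is rescaled so that the largest effect is worth 100 points, and the total points are mapped back to a probability through the logistic function.

```python
import math

# Hypothetical log-odds coefficients for binary predictors (1 = present).
# Illustrative values only, NOT the fitted estimates from the study.
INTERCEPT = -3.0
COEFFICIENTS = {
    "cough": 1.2,
    "dyspnea": 1.5,
    "hypertension": 1.0,
    "chronic_kidney_disease": 2.5,
}


def predictor_points(coefficients):
    """Rescale coefficients so the largest effect is worth 100 points."""
    if any(beta <= 0 for beta in coefficients.values()):
        raise ValueError("this sketch only handles positive coefficients")
    largest = max(coefficients.values())
    return {name: 100.0 * beta / largest for name, beta in coefficients.items()}


def total_points(patient, points):
    """Sum the points of every predictor present in the patient."""
    return sum(points[name] for name, present in patient.items() if present)


def probability_from_points(total, coefficients, intercept):
    """Convert total points to the linear predictor, then to a probability."""
    largest = max(coefficients.values())
    linear_predictor = intercept + total * largest / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


points = predictor_points(COEFFICIENTS)
patient = {
    "cough": True,
    "dyspnea": False,
    "hypertension": False,
    "chronic_kidney_disease": True,
}
risk = probability_from_points(total_points(patient, points), COEFFICIENTS, INTERCEPT)
```

Because the points are a linear rescaling of the coefficients, the probability obtained from the total score is identical to the one obtained by evaluating the logistic regression model directly.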
The C-index of the nomogram was 0.863 (95% CI, 0.801–0.925) in our model and 0.839 by bootstrapping analysis, suggesting that the model had good discriminative ability. The calibration plots of the nomogram showed good agreement between predicted and observed severity (Fig 3A). In addition, DCA showed that the predictive model had a net benefit for almost all threshold probabilities, demonstrating the potential clinical benefit of the predictive model (Fig 3B). The AUC of the nomogram was 0.862, indicating good accuracy in predicting severe disease (Fig 3C).
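For a binary outcome, the C-index is the proportion of (severe, non-severe) patient pairs in which the severe patient receives the higher predicted risk, which is the same quantity as the AUC. The sketch below computes it by direct pair counting and adds a bootstrap optimism correction in the spirit of the internal validation described in the Methods. It is a sketch under assumptions: the function names are hypothetical, `fit` stands for any user-supplied routine that returns a scoring function, and the exact bootstrap procedure used in the study is not specified in the text.

```python
import random


def c_index(predicted_risks, outcomes):
    """Share of case/control pairs ranked correctly; ties count as 0.5."""
    cases = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    controls = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def optimism_corrected_c_index(rows, outcomes, fit, n_boot=1000, seed=0):
    """Apparent C-index minus the mean bootstrap optimism.

    `fit(rows, outcomes)` must return a function mapping a row to a risk.
    """
    rng = random.Random(seed)
    model = fit(rows, outcomes)
    apparent = c_index([model(row) for row in rows], outcomes)
    optimism = []
    n = len(rows)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot_rows = [rows[i] for i in idx]
        boot_outcomes = [outcomes[i] for i in idx]
        if len(set(boot_outcomes)) < 2:
            continue  # resample lacks one class; C-index undefined
        boot_model = fit(boot_rows, boot_outcomes)
        in_sample = c_index([boot_model(r) for r in boot_rows], boot_outcomes)
        original = c_index([boot_model(r) for r in rows], outcomes)
        optimism.append(in_sample - original)
    if not optimism:
        return apparent
    return apparent - sum(optimism) / len(optimism)
```

The corrected value is usually somewhat lower than the apparent value, because a model tends to perform better on the data it was fitted to than on new patients.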

Discussion
We developed and validated a prediction nomogram based on clinical features to identify patients who might develop severe disease. The nomogram included vital signs, symptoms, and comorbidities, and showed good discrimination and calibration. Our model is useful for predicting severe COVID-19. Traditional scoring tools, including NEWS, qSOFA, and CURB-65, are used to assess disease severity in emergency departments [23,25,26]. However, there is no evidence that these tools are useful for the early assessment of COVID-19 severity. Compared with other diseases, the early symptoms of COVID-19 are more insidious and the disease progresses faster, so severity cannot be identified promptly and early detection is challenging. Therefore, our nomogram is a convenient and valuable clinical tool for predicting severe COVID-19.
Previous studies have shown that age is an important independent prognostic factor in patients with severe infectious diseases, such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) [20,26]. It has been demonstrated that the prognosis of older COVID-19 patients, especially those aged >65 years, is worse than that of younger patients [1,5]. In our study, the severe group was older than the non-severe group. However, age was not an independent predictive factor for severe disease.
Underlying diseases are more prevalent in severe COVID-19 patients than in mild patients, and the most common comorbidities are hypertension, diabetes, and coronary heart disease [11,27]. In the univariate analysis, hospital mortality was significantly higher in patients with underlying diseases (i.e., diabetes and coronary heart disease) than in patients without these comorbidities. To the best of our knowledge, no other studies have evaluated the relationship between underlying diseases and COVID-19 severity. Our study found that chronic cardiovascular disease, hypertension, kidney disease, and liver disease were risk factors for the development of severe illness. COVID-19 impairs the function of multiple organs, including the heart, liver, and kidneys. Existing research suggests that angiotensin-converting enzyme 2 (ACE2) may be a functional receptor for SARS-CoV-2 entry into human cells, and that the virus may increase pulmonary vascular permeability and induce acute lung injury by down-regulating ACE2 expression and increasing angiotensin II levels [28][29][30]. ACE2 receptors are highly expressed in cells of the bronchial epithelium, alveoli (type 2 cells), myocardium, renal proximal tubule epithelium, bladder epithelium, esophagus, and ileum, suggesting that SARS-CoV-2 infection not only affects the respiratory system but may also affect the circulatory, urinary, and digestive systems [31]. Severe patients have multiple organ damage, potentially leading to multiple organ failure. However, additional studies are needed to confirm whether underlying diseases (i.e., cardiovascular disease or kidney disease) accelerate this series of processes in patients infected with SARS-CoV-2 and to clarify the underlying mechanisms.
In the early stages of COVID-19, the diversity of symptoms and imaging manifestations limits diagnosis [32][33][34]. Fever and cough are common, and gastrointestinal symptoms are rare in COVID-19 [35]. Absence of fever is more common in early-stage COVID-19 than in SARS and MERS [11]. Therefore, patients without fever may remain undiagnosed. This study found that body temperature higher than 37.3˚C was not a risk factor in COVID-19, and patients without fever in the early stage of the disease had a higher risk of developing severe conditions. One possible reason is that fever may encourage the patient to seek medical treatment promptly, allowing early disease detection and implementation of medical interventions. In addition, fever can inhibit the reproduction or growth of the virus; however, this process is complex, and the effect of fever on viral replication should be investigated further.
The COVID-19 early warning system relies on clinical data obtained on admission, which makes it simple, practical, reliable, and fast. It can be used to assess the risk of developing critical illness in the emergency department and allows medical staff to intervene at an early stage and to determine where patients should be treated and which interventions they should receive. This system is more practical for evaluating COVID-19 patients than other scoring tools.
Our study has some limitations. First, the design was retrospective. Second, some cases had incomplete data on symptoms, laboratory tests, and imaging examinations, given the variation in the structure of electronic databases across participating hospitals and an urgent data extraction schedule. Third, some patients were not discharged during the study period, and their final prognosis could not be determined. Fourth, the model was validated only internally, by bootstrapping, and has not been validated in an external cohort. Fifth, although the C-index was high (0.863) and the 95% CI was adequate (0.801–0.925), the number of severe cases was small; therefore, future studies with larger sample sizes are warranted to validate our results. Sixth, severe patients were older than non-severe patients, and this difference in age may be a confounding factor. Seventh, although the study was multicenter, the results cannot be generalized to other populations.

Conclusion
We established an early warning model incorporating clinical characteristics that could be quickly obtained on hospital admission. This model can be conveniently used to predict the individual risk of severe COVID-19 and to help identify, at an early stage, patients who might develop severe disease.