Machine learning methods to predict mechanical ventilation and mortality in patients with COVID-19

Background
The coronavirus disease 2019 (COVID-19) pandemic has affected millions of people across the globe. It is associated with a high mortality rate and has created a global crisis by straining medical resources worldwide.

Objectives
To develop and validate machine-learning models for prediction of mechanical ventilation (MV) for patients presenting to the emergency room and for prediction of in-hospital mortality once a patient is admitted.

Methods
Two cohorts were used for the two different aims. For prediction of MV, 1,980 COVID-19 patients were enrolled. Data including demographics, past smoking and drinking history, past medical history, vital signs at the emergency room (ER), laboratory values, and treatments were collected for 1,306 patients for training, and 674 patients were enrolled for validation, using the XGBoost algorithm. For the second aim, prediction of in-hospital mortality, 3,491 patients hospitalized via the ER were enrolled. CatBoost, a newer gradient-boosting algorithm, was applied for training and validation in this cohort.

Results
Older age, higher temperature, increased respiratory rate (RR), and lower oxygen saturation (SpO2) in the first set of vital signs were associated with an increased risk of MV among the 1,980 patients in the ER. The model had an accuracy of 86.2% and a negative predictive value (NPV) of 87.8%. Requirement for MV, a higher RR, a higher body mass index (BMI), and a longer hospital length of stay were the major features associated with in-hospital mortality. The second model had an accuracy of 80% with an NPV of 81.6%.

Conclusion
Machine learning models using the XGBoost and CatBoost algorithms can predict the need for mechanical ventilation and mortality with high accuracy in COVID-19 patients.



Introduction
The number of infections with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes coronavirus disease 2019 (COVID-19), has increased exponentially, with over 4 million cases reported in the US alone. Many states and hospital systems have faced considerable challenges from the unexpected number of cases, which has strained an already fragile health system and caused multiple hospitals to reach or exceed capacity [1]. The majority of patients experience mild disease, but approximately 15% to 20% of symptomatic patients progress to severe pneumonia requiring hospitalization [2]. Current evidence suggests important derangements of the immune system and the coagulation cascade in COVID-19 patients [3,4].
Observational studies have identified several features associated with an increased risk of hospitalization in COVID-19, including older age, male sex, obesity, admission oxygen saturation (SpO2) below 88%, respiratory rate greater than 24/minute, comorbid conditions such as diabetes, hypertension, and chronic kidney disease, and laboratory values such as elevated troponin, C-reactive protein level > 200, and D-dimer level > 2500 [5][6][7]. These studies also report mortality above 50% among intubated patients. Other reports suggest that patients over the age of 65 and those with comorbid conditions are at a higher risk of mortality, ranging from 4.3% to 7.5% [8][9][10].
There is an urgent need for disease stratification during the pandemic, and several statistical models based on observational studies are being developed. However, despite various retrospective associations, it is still unclear whether an individual patient in the emergency room (ER) with mild to moderate disease is at risk of progression to severe disease. Machine learning (ML) algorithms are designed to analyze large volumes of both structured and unstructured data and to extract information with few prior assumptions. Efficient real-time management of patients and allocation of hospital resources would require a predictive model that can accurately classify COVID-19 patients at risk of invasive mechanical ventilation (MV) and death.
We hypothesized that a parsimonious model with fewer parameters, including vital signs and demographics at the time of presentation, would help ER physicians determine the need for intubation and mechanical ventilation, whereas a more complex profile including laboratory values would be beneficial for prognosticating mortality in hospitalized patients.
Health Institutional Review Board 2020-125 and all data were fully anonymized. COVID-19 infection was confirmed in all patients by a positive real-time fluorescent RT-PCR test for SARS-CoV-2 nucleic acid in respiratory tract or blood specimens.
Participants
a. Prediction of MV: A total of 1,980 COVID-19 patients who were evaluated in the ER between 2/20/2020 and 5/5/2020 were enrolled. Patients who visited an ER between 2/20/2020 and 4/17/2020 formed the training and testing cohort, and those who visited an ER between 4/18/2020 and 5/5/2020 were enrolled as the prospective validation cohort.
b. Prediction of mortality: A total of 3,491 hospitalized COVID-19 patients were enrolled. These patients presented to ER departments of Beaumont Health and were subsequently admitted between 2/1/2020 and 5/4/2020. Survivors were hospitalized for 8.4 days on average, and patients who died were hospitalized for 11.1 days on average. More extensive clinical and laboratory data were collected for this cohort.
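The date-based split of the MV cohort in item a can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the visit-record layout (an `er_date` field) and the function name `temporal_split` are hypothetical; only the cut-off dates come from the text.

```python
from datetime import date

# Cut-off dates taken from the cohort description; the record layout is hypothetical.
STUDY_START = date(2020, 2, 20)
TRAIN_END = date(2020, 4, 17)   # last day of the training/testing period
STUDY_END = date(2020, 5, 5)

def temporal_split(visits):
    """Split ER visits into a training/testing cohort and a later,
    prospective validation cohort according to the visit date."""
    in_window = [v for v in visits if STUDY_START <= v["er_date"] <= STUDY_END]
    train_test = [v for v in in_window if v["er_date"] <= TRAIN_END]
    validation = [v for v in in_window if v["er_date"] > TRAIN_END]
    return train_test, validation

# Toy visits, not study data.
visits = [
    {"id": 1, "er_date": date(2020, 3, 1)},
    {"id": 2, "er_date": date(2020, 4, 17)},
    {"id": 3, "er_date": date(2020, 4, 18)},
    {"id": 4, "er_date": date(2020, 5, 10)},  # outside the study window, dropped
]
train_test, validation = temporal_split(visits)
print([v["id"] for v in train_test], [v["id"] for v in validation])
```

Splitting by visit date rather than at random keeps every validation patient strictly later than the training period, which approximates prospective use of the model.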

Data collection
Information including demographics, past smoking and drinking history, past medical history, vital signs at the ER, laboratory values, and treatments was used as independent features of the prediction models. These data were extracted from the EPIC electronic medical record (EMR) system at Beaumont Health using Structured Query Language (SQL) queries.

Statistics
In the baseline characteristics, continuous features are presented as means with standard deviations (SDs) and were compared between groups using two-sided Student's t-tests. Categorical variables are presented as frequencies and percentages and were compared using the Chi-square test (if cell counts were 5 or more) or Fisher's exact test (if cell counts were below 5). Statistical analyses were conducted in SAS (version 9.4, SAS Institute, Cary, NC).
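The analyses above were run in SAS. As an illustration only, the same test-selection rule can be expressed in Python with SciPy, one of the libraries listed in the Methods; the helper names and all numbers below are hypothetical toy values, not study data. Following the text, the rule is applied to observed cell counts (many guidelines apply it to expected counts instead).

```python
import numpy as np
from scipy import stats

def compare_continuous(group_a, group_b):
    """Two-sided Student's t-test assuming equal variances; returns the p-value."""
    return stats.ttest_ind(group_a, group_b, equal_var=True).pvalue

def compare_categorical(table):
    """Compare a 2x2 contingency table: Chi-square test when every cell count
    is at least 5, otherwise Fisher's exact test. Returns (test name, p-value)."""
    table = np.asarray(table)
    if (table >= 5).all():
        # For 2x2 tables SciPy applies Yates' continuity correction by default.
        _, p, _, _ = stats.chi2_contingency(table)
        return "chi-square", p
    _, p = stats.fisher_exact(table)
    return "fisher", p

# Toy values, not study data.
age_mv = [71, 68, 75, 80, 66]
age_no_mv = [55, 60, 49, 62, 58]
p_age = compare_continuous(age_mv, age_no_mv)

# Rows: MV / no MV; columns: feature present / absent.
test_dm, p_dm = compare_categorical([[12, 3], [20, 25]])    # one cell < 5
test_sex, p_sex = compare_categorical([[30, 20], [15, 35]])  # all cells >= 5
print(f"age p={p_age:.4f}; DM: {test_dm} p={p_dm:.4f}; sex: {test_sex} p={p_sex:.4f}")
```

Passing `correction=False` to `scipy.stats.chi2_contingency` gives the uncorrected Pearson Chi-square statistic instead.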

Machine learning algorithms
For prediction of mechanical ventilation, we implemented a classification algorithm based on XGBoost (https://github.com/dmlc/xgboost/). Designed for speed and performance, XGBoost is a decision-tree-based ensemble machine learning algorithm [11]. It builds the ensemble iteratively, fitting each new tree to the residuals of the previous predictions; the same principle applies to both regression and classification trees. Since its introduction in 2016, it has been credited with winning numerous data science competitions and with improving industry applications [12]. We used k-fold cross-validation during training and hyperparameter optimization to reduce overfitting. Prediction of mortality was performed using CatBoost (https://catboost.ai/), a newer gradient-boosting algorithm. CatBoost handles categorical features out of the box and has been reported to outperform other state-of-the-art machine learning algorithms on popular publicly available data sets. In our implementation, categorical features were specified explicitly, and CatBoost encoded them using one-hot encoding.
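The residual-fitting idea described above can be illustrated with a deliberately simplified sketch, assuming squared-error loss and single-split trees (stumps) on one feature. This is not XGBoost's implementation, which adds second-order gradient information, regularization of tree complexity, and deeper trees over many features; for classification, the residuals are replaced by pseudo-residuals derived from the log loss. All names and data are hypothetical.

```python
import numpy as np

def fit_stump(x, residuals):
    """Least-squares decision stump on a 1-D feature: returns the split
    threshold and the constant predicted on each side of it."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds (x needs >= 2 distinct values)
        left = x <= t
        left_val, right_val = residuals[left].mean(), residuals[~left].mean()
        sse = ((residuals - np.where(left, left_val, right_val)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left_val, right_val)
    return best[1:]

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss: each new stump is fit to the
    residuals y - F(x) of the current ensemble F, then added with shrinkage."""
    base = y.mean()
    pred = np.full(y.shape, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred
        t, left_val, right_val = fit_stump(x, residuals)
        pred += learning_rate * np.where(x <= t, left_val, right_val)
        stumps.append((t, left_val, right_val))
    return base, stumps, pred

# Toy data: a step function of one feature.
x = np.arange(10, dtype=float)
y = np.where(x < 5, 1.0, 3.0)
base, stumps, pred = boost(x, y)
print("first split at", stumps[0][0], "final MSE", ((y - pred) ** 2).mean())
```

On this toy data each round shrinks the remaining error by a factor of 1 - learning_rate, which illustrates why the learning rate and the number of rounds are hyperparameters that are typically tuned with k-fold cross-validation.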
Accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) were used to evaluate the performance of the prediction models. Our algorithms were developed in Python (3.6.3) for data collection, data cleaning, feature engineering, and machine learning training and testing. The development environments included PyCharm and Jupyter Notebook. The key libraries included NumPy, pandas, scikit-learn, SciPy, XGBoost, CatBoost, imbalanced-learn (imblearn), and Matplotlib.
The last decade has witnessed rapid progress in machine learning and AI, but their adoption in medicine lags behind other industries, and lack of explainability is one of the major criticisms. In this study, we used SHAP (SHapley Additive exPlanations) to shed light on how the ML models predict COVID-19 patients' clinical outcomes. SHAP is a game-theoretic approach to explaining the output of any machine learning model [13]. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. A feature's SHAP value is its marginal contribution to the prediction, averaged across all permutations (orderings) of the features; aggregated across patients, SHAP values provide a global ranking of features, and for an individual patient they can be displayed as a force plot.
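The permutation definition of Shapley values given above can be computed exactly when there are only a few features. The sketch below assumes a hypothetical risk score `toy_risk` whose value depends on which features are "known"; the SHAP library itself uses faster algorithms (such as TreeSHAP for tree ensembles) and defines missing features relative to background data, so this illustrates the definition, not the library.

```python
from itertools import permutations

def shapley_values(value_fn, features):
    """Exact Shapley values: for every ordering of the features, record how much
    value_fn changes when each feature is added; then average over orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            totals[f] += value_fn(frozenset(present)) - before
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_risk(known):
    """Hypothetical risk score given the subset of features that are known."""
    score = 0.10
    if "RR" in known:
        score += 0.30
    if "SpO2" in known:
        score += 0.20
    if "RR" in known and "SpO2" in known:
        score += 0.10  # interaction term, shared equally by RR and SpO2
    if "age" in known:
        score += 0.05
    return score

phi = shapley_values(toy_risk, ["RR", "SpO2", "age"])
print({f: round(v, 3) for f, v in phi.items()})
```

In this toy model, RR receives 0.35 (its own 0.30 plus half of the RR-SpO2 interaction), SpO2 receives 0.25, and age 0.05; the three values sum to the difference between the full-information score and the baseline score, the efficiency (local accuracy) property that SHAP relies on.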

Prediction of mechanical ventilation
A total of 1,980 unique patients were analyzed (Table 1). The mean age was 63.2 ± 17.1 years, and 1,013 (51.2%) were male. Of these, 1,306 patients visited an ER in the Beaumont Health system between 2/20/2020 and 4/17/2020 and 674 between 4/18/2020 and 5/6/2020 for COVID-19-related symptoms. There were statistically significant differences in sex, race, BMI, smoking history, and history of diabetes mellitus (DM), lung disease, and heart disease between patients who were mechanically ventilated and those who were not.
Performance of the model
The cohort of 1,306 patients (2/20/2020 to 4/17/2020) was used for training and validation of the XGBoost model. After the model was trained and its hyperparameters were optimized with k-fold cross-validation, its performance on a randomly selected 20% of patients was summarized in Table 2. The  Table 3. The prediction accuracy was 86.2% (95% CI: ±0.026), with an NPV of 87.8% and a specificity of 97.6%. The AUC of the ROC curve was 68% (Fig 1).
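The reported metrics follow directly from a binary confusion matrix and the model's predicted scores. The sketch below uses hypothetical toy counts, not the values in Tables 2 and 3, to show these definitions, with AUC computed in its rank (Mann-Whitney) form; in practice scikit-learn's `confusion_matrix` and `roc_auc_score` provide the same quantities.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard metrics from the four cells of a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy counts and scores, not study data.
m = confusion_metrics(tp=10, fp=5, tn=80, fn=5)
auc = roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0])
print(m, auc)
```

When positive cases are rare, accuracy and NPV can be high even if positives are ranked only moderately well, which is why AUC is reported alongside them.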

PLOS ONE
Machine learning algorithm to predict outcomes in COVID-19

Feature importance
The features are ranked in descending order of their impact on prediction outcomes in Figs 2 and 3. Fig 2 shows higher chance of MV. Patients with elevated temperature and an elevated RR had a higher chance of requiring MV. Likewise, lower SpO2, a history of DM, and smoking were associated with an increased chance of MV.

Prediction of mortality
This cohort included 3,491 COVID-19 patients who visited ER departments of Beaumont Health and were subsequently hospitalized between 2/1/2020 and 5/4/2020 (Table 4). Their mean age was 62.3 ± 17.5 years, and 51.4% were female. As with the MV cohort, the mortality cohort showed statistically significant differences between deceased and surviving patients in several categories, including age, sex, race, BMI, smoking history, alcohol history, and history of DM, lung disease, heart disease, and kidney disease (Table 4).

Performance of the model
The patient cohort was randomly split into training (80%) and testing (20%) groups to train and test the CatBoost model. The confusion matrix is shown in Table 5. The accuracy of the model reached 88.3% (95% CI: ±0.024; Table 5), and the AUC of the ROC curve was 90% (Fig 4). Because mortality outcomes were imbalanced among COVID-19 patients, surviving patients were randomly down-sampled to create a balanced cohort of 506 surviving and 506 deceased patients. This sample was then randomly split into training (80%) and testing (20%) groups to retrain the CatBoost model. Its performance on the testing group is shown in the confusion matrix in Table 6. The accuracy remained high at 80.3% (95% CI: ±0.025), with an NPV of 81.6% and a PPV of 79.0%. The AUC of the ROC curve was 85% (Fig 5).
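The balancing step described above (random down-sampling of survivors to a 1:1 cohort, then an 80/20 split) can be sketched as follows, assuming deceased patients are the minority class coded as 1. The function names and the toy cohort are hypothetical; imbalanced-learn's `RandomUnderSampler` and scikit-learn's `train_test_split` are library alternatives.

```python
import random

def downsample_majority(records, label_key="died", seed=0):
    """Randomly keep as many majority-class (label 0) records as there are
    minority-class (label 1) records, giving a 1:1 balanced cohort."""
    rng = random.Random(seed)
    minority = [r for r in records if r[label_key] == 1]
    majority = [r for r in records if r[label_key] == 0]
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

def split_train_test(records, test_fraction=0.2, seed=0):
    """Random split into training (1 - test_fraction) and testing sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy cohort: 900 survivors and 100 deaths, not study data.
cohort = [{"id": i, "died": int(i < 100)} for i in range(1000)]
balanced = downsample_majority(cohort)
train, test = split_train_test(balanced)
print(len(balanced), sum(r["died"] for r in balanced), len(train), len(test))
```

Down-sampling discards most survivors, so the balanced model is trained on far fewer patients, but its accuracy is no longer dominated by the majority class.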

Feature importance
The top 20 predictors from the CatBoost model are ranked in descending order of feature importance in Figs 6 and 7. Requirement for MV was the most important predictor of mortality. Other important features included admission to the ICU, need for vasopressors, and elevated respiratory rate and pulse rate.

Discussion
This study has three main highlights. First, we used a parsimonious ML model to predict a hard end point, mechanical ventilation, with high specificity and NPV. The model relies on initial triage vital signs in the ER, such as temperature and minimum oxygen saturation, and on basic demographics such as age and BMI. Thus, it can assist physicians during the pandemic in making critical decisions about discharge home versus hospital admission. The model may also help crisis management teams with resource allocation and the flow of operations, especially when ventilators are scarce. Second, the mortality prediction algorithm uses several key laboratory and other features in addition to patient characteristics, and it achieved an accuracy of 80.3% to 88.3% (balanced and unbalanced cohorts, respectively). The major features were whether the patient was receiving MV, a high initial respiratory rate, a longer length of stay, and increased BMI. From a clinical standpoint, daily mortality assessment is crucial to determine the need for escalation of therapy, site-of-care decisions, and goals-of-care discussions. The model may provide an avenue for dynamic deployment in hospitals across the country to give a point-in-time risk of mortality for admitted patients. Finally, the gradient-boosting algorithms used in model development can account for missing data and the categorical nature of real-world data.
Arvind et al. studied predictors of MV among 4,087 patients from New York City using a random forest classifier, a supervised ML algorithm, and demonstrated an AUC of 0.84, similar to our findings [14]. Unlike our model, theirs used 24-hour data to predict the 72-hour risk of intubation in a time-series manner. Interestingly, the most heavily weighted feature in their model was elevated RR, which was also one of the major features in our initial predictive model. In contrast to their study, we aimed to identify patients at risk of intubation from the time of presentation to the ER.
Yan and colleagues from Wuhan, China, predicted mortality from different biomarkers using ML and AI algorithms [15]. They studied patients with COVID-19 from January to February 2020, before the pandemic reached the United States, and used an XGBoost classifier as the predictive model. In their model, high lactate dehydrogenase (LDH), low lymphocyte count (lymphopenia), and high levels of high-sensitivity C-reactive protein (hs-CRP) were predictors of mortality. Their model predicted mortality 10 days in advance with high accuracy (90%), but their sample size was small (n = 485). Whereas Yan and colleagues studied only biomarkers, we included demographics, vital signs, comorbidities, and other variables such as need for MV and need for vasopressors, which likely explains why biomarkers did not rank high in our prediction model. In another study, Wu et al. studied COVID-19 patients from China as well as from other countries, such as Italy and Belgium, to train and validate a clinical prediction model for severity of pneumonia [16]. Non-severe patients were treated at home or in mobile hospitals, while severe patients required a higher level of care, including ICU care. Rather than machine learning, they used a clinical scoring system built from all available demographic, comorbidity, and investigational data. During validation, their model achieved high accuracy, with AUCs ranging from 0.84 to 0.89. Age was one of the important predictors, and our machine learning algorithms likewise found age to be significant in mortality prediction.
Cheng and colleagues used a random forest model to predict ICU admission within 24 hours among 1,987 COVID-19 patients admitted to non-ICU units of a large hospital system in New York [17]. Their model had a specificity of 76.3% and an accuracy of 76.2% (95% CI: 74.6-77.7%). However, the population consisted mostly of women and patients younger than 65. They used 9,639 feature vectors with data from each day of non-ICU hospital stay. The final model found the strongest predictors of ICU admission to be respiratory rate and white blood cell count; other features included markers of respiratory failure, systemic inflammation, shock, and renal failure. Paradoxically, patients older than 65 years had a lower ICU transfer rate despite higher mortality. In contrast to Cheng's study, our first algorithm predicted the need for MV rather than ICU admission, because the boundaries of ICU care were not clear during the surge in our hospital system: many patients were admitted to progressive care units that were equipped and staffed to function as ICUs. Further, we included all ER patients at the time of evaluation, not only admitted patients. Our aim was to develop a practical model using minimal features to predict who is not at risk of MV and may therefore be safely discharged.
Yadaw and associates examined mortality predictors in a large cohort of 5,051 COVID-19 patients using XGBoost [18]. Of the 20 features initially selected, those that consistently yielded reliable mortality prediction in a SHAP analysis were minimum oxygen saturation, age, type of encounter, maximum body temperature, and use of hydroxychloroquine during treatment. Their model performed similarly to ours, with an AUC > 0.9. The findings of the Yadaw study are similar to ours and reiterate the robustness of ensemble-based ML classification algorithms. Although we used XGBoost for MV prediction, we considered CatBoost better suited to mortality prediction because of its ability to handle both categorical and numerical data.
We acknowledge that our study has some limitations. Although we had a large number of patients with several important features, our datasets for MV and mortality were imbalanced, which affected the sensitivity of the results. Because our goal was high accuracy and NPV in the ER, we kept the original, imbalanced sample for the MV model. We did down-sample the mortality data to balance the groups, and accuracy remained high. We used admission vital signs instead of time-series data. Consistent with our decision to use admission vital signs for disease stratification, Fernandes and colleagues used ML and natural language processing algorithms to predict ICU admission among patients presenting to the ER [19]. As in our results for COVID-19 patients, they found initial vital signs, including heart rate, oxygen saturation, RR, and systolic blood pressure (sBP), to be highly correlated with ICU admission. We also did not use time-series data for mortality prognostication for patients in the ICU, although length of stay was one of the major predictive features for mortality in our model. In future studies, we plan to further strengthen the model by using time-series data.
Lastly, we want to highlight the importance of contextual factors. Overall mortality in the mortality prediction cohort was around 14%. At the time of this study, the state of Michigan had a large number of COVID-19 cases compared with other states in the United States. COVID-19 is still not fully understood, and understanding of the disease was minimal early in the pandemic. Crisis conditions, fear among healthcare workers, lack of resources, and limited understanding might have contributed to a higher need for mechanical ventilation and/or higher mortality. These contextual factors are important when comparing data across studies. Meaningful conclusions cannot be drawn without taking these broader factors into consideration, which may be a limitation of our study as well as of most of the studies we have cited.

Conclusion
Machine learning models using XGBoost to predict the need for MV and CatBoost to predict mortality among COVID-19 patients are accurate, with high specificity and NPV. Simple factors such as age and vital signs can predict the need for mechanical ventilation, thus helping ER physicians decide between hospital admission and discharge home. Requirement for mechanical ventilation, a higher respiratory rate, and a higher BMI were among the top predictors of mortality.