A machine learning algorithm to increase COVID-19 inpatient diagnostic capacity

  • David Goodman-Meza ,

    Contributed equally to this work with: David Goodman-Meza, Akos Rudas, Jeffrey N. Chiang

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    dgoodman@mednet.ucla.edu

    Affiliation Division of Infectious Diseases, David Geffen School of Medicine at UCLA, Los Angeles, California, United States of America

  • Akos Rudas ,

    Contributed equally to this work with: David Goodman-Meza, Akos Rudas, Jeffrey N. Chiang

    Roles Formal analysis

    Affiliations Department of Computational Medicine, UCLA, Los Angeles, California, United States of America, Faculty of Informatics, Eötvös Loránd University (ELTE), Budapest, Hungary

  • Jeffrey N. Chiang ,

    Contributed equally to this work with: David Goodman-Meza, Akos Rudas, Jeffrey N. Chiang

    Roles Formal analysis

    Affiliation Department of Computational Medicine, UCLA, Los Angeles, California, United States of America

  • Paul C. Adamson,

    Roles Conceptualization, Writing – review & editing

    Affiliation Division of Infectious Diseases, David Geffen School of Medicine at UCLA, Los Angeles, California, United States of America

  • Joseph Ebinger,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Cardiology, Cedars-Sinai Medical Center, Los Angeles, California, United States of America

  • Nancy Sun,

    Roles Data curation

    Affiliation Department of Cardiology, Cedars-Sinai Medical Center, Los Angeles, California, United States of America

  • Patrick Botting,

    Roles Data curation

    Affiliation Department of Cardiology, Cedars-Sinai Medical Center, Los Angeles, California, United States of America

  • Jennifer A. Fulcher,

    Roles Conceptualization, Writing – review & editing

    Affiliation Division of Infectious Diseases, David Geffen School of Medicine at UCLA, Los Angeles, California, United States of America

  • Faysal G. Saab,

    Roles Conceptualization, Writing – review & editing

    Affiliation Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, United States of America

  • Rachel Brook,

    Roles Conceptualization, Writing – review & editing

    Affiliation Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, United States of America

  • Eleazar Eskin,

    Roles Supervision, Writing – review & editing

    Affiliations Department of Computational Medicine, UCLA, Los Angeles, California, United States of America, Department of Computer Science, UCLA, Los Angeles, California, United States of America, Department of Human Genetics, UCLA, Los Angeles, California, United States of America

  • Ulzee An,

    Roles Formal analysis

    Affiliation Department of Computer Science, UCLA, Los Angeles, California, United States of America

  • Misagh Kordi,

    Roles Formal analysis, Software

    Affiliation Department of Computational Medicine, UCLA, Los Angeles, California, United States of America

  • Brandon Jew,

    Roles Formal analysis

    Affiliation Department of Computational Medicine, UCLA, Los Angeles, California, United States of America

  • Brunilda Balliu,

    Roles Formal analysis

    Affiliation Department of Computational Medicine, UCLA, Los Angeles, California, United States of America

  • Zeyuan Chen,

    Roles Formal analysis

    Affiliation Department of Computer Science, UCLA, Los Angeles, California, United States of America

  • Brian L. Hill,

    Roles Formal analysis

    Affiliation Department of Computer Science, UCLA, Los Angeles, California, United States of America

  • Elior Rahmani,

    Roles Formal analysis

    Affiliation Department of Computer Science, UCLA, Los Angeles, California, United States of America

  • Eran Halperin ,

    Roles Formal analysis, Methodology, Supervision, Writing – review & editing

    ‡ These authors jointly supervised the work.

    Affiliations Department of Computational Medicine, UCLA, Los Angeles, California, United States of America, Department of Computer Science, UCLA, Los Angeles, California, United States of America, Department of Human Genetics, UCLA, Los Angeles, California, United States of America, Department of Anesthesiology, David Geffen School of Medicine at UCLA, Los Angeles, California, United States of America

  •  [ ... ],
  • Vladimir Manuel

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    ‡ These authors jointly supervised the work.

    Affiliations Faculty Practice Group, David Geffen School of Medicine at UCLA, Los Angeles, California, United States of America, UCLA Clinical and Translational Science Institute, Los Angeles, California, United States of America


Abstract

Worldwide, testing capacity for SARS-CoV-2 is limited and bottlenecks in the scale up of polymerase chain reaction (PCR)-based testing exist. Our aim was to develop and evaluate a machine learning algorithm to diagnose COVID-19 in the inpatient setting. The algorithm was based on basic demographic and laboratory features to serve as a screening tool at hospitals where testing is scarce or unavailable. We used retrospectively collected data from the UCLA Health System in Los Angeles, California. We included all emergency room or inpatient cases receiving SARS-CoV-2 PCR testing who also had a set of ancillary laboratory features (n = 1,455) between 1 March 2020 and 24 May 2020. We tested seven machine learning models and used a combination of those models for the final diagnostic classification. In the test set (n = 392), our combined model had an area under the receiver operator curve of 0.91 (95% confidence interval 0.87–0.96). The model achieved a sensitivity of 0.93 (95% CI 0.85–0.98) and a specificity of 0.64 (95% CI 0.58–0.69). We found that our machine learning algorithm had excellent diagnostic metrics compared to SARS-CoV-2 PCR. This ensemble machine learning algorithm to diagnose COVID-19 has the potential to be used as a screening tool in hospital settings where PCR testing is scarce or unavailable.

Introduction

Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) is a worldwide public health emergency [1, 2]. Polymerase chain reaction (PCR) testing for SARS-CoV-2 is critical to the public health response to coronavirus disease 2019 (COVID-19). PCR testing capacity is especially important in the hospital setting for clinical decision making and infection control procedures [3]. Yet, the inability to scale up testing has been one of the most discussed topics in both the scientific and popular literature [3, 4].

In many hospital settings, PCR testing capacity remains limited. Many PCR assays have a short analysis time; however, many hospitals lack on-site PCR capabilities and must send samples to centralized laboratories. Transport times and queues lengthen the turnaround time, and results can be delayed by 48 to 96 hours [5–7]. This wait time slows the clinical decision-making process and wastes scarce personal protective equipment.

Machine learning could help fill this gap. Ancillary laboratory values in blood samples of patients with COVID-19 demonstrate a pattern distinct from that of other diseases [3, 8–11]. These changes include elevations in inflammatory markers (ferritin, lactate dehydrogenase [LDH], C-reactive protein, among others), decreases in certain blood cell counts (absolute lymphocyte count), and an increase in the neutrophil to lymphocyte ratio. Since the SARS-CoV-2 epidemic reached pandemic status, research groups have developed prediction algorithms applicable to their particular context [12–16]. One of the major limitations of these previous approaches is that the datasets used to train and test them were small. Our aim was to develop a machine learning algorithm using the largest dataset to date, to serve as a COVID-19 diagnostic proxy in hospitals where SARS-CoV-2 specific PCR testing is unavailable or scarce. We hypothesized that a machine learning algorithm based on a parsimonious set of blood markers, including inflammatory markers, could predict the presence or absence of COVID-19 with high sensitivity and potentially be used as a screening tool in clinical practice.

Methods

Study design

We used electronic health data from the UCLA Health System (Los Angeles, California, USA) to develop a machine learning algorithm to serve as a proxy to diagnose COVID-19 in the hospital setting. Our set of features was selected based on prior studies reporting a difference in these features between patients with and without COVID-19, and higher values in those with severe COVID-19 compared to mild COVID-19 [3, 8–11]. This study was deemed non-human-subjects research by the institutional review board (IRB) at UCLA as all analyses used de-identified data. We report our findings based on STARD-2015 guidelines [17].

Data sources

We retrospectively considered all cases that were tested for SARS-CoV-2 in the emergency room or inpatient setting within the UCLA Health System between 1 March 2020 and 24 May 2020. After constructing our initial pool of cases, we included only cases with a complete blood count and at least one inflammatory marker (C-reactive protein, ferritin, or LDH) within 48 hours of the sample collection for SARS-CoV-2 PCR testing.

All data were extracted from the electronic medical record. Features included in the models were age, gender, hemoglobin, red blood cell count, absolute neutrophil, absolute lymphocyte, absolute eosinophil and absolute basophil counts, the neutrophil to lymphocyte ratio, C-reactive protein, ferritin, and LDH. Prior to entering the model, all features were normalized to have zero mean and unit standard deviation. The normalization parameters (e.g., mean and standard deviation) were computed using the training set, and the features in the test set were scaled using these values. After scaling, missing lab values were imputed with zero, effectively inserting the mean feature value from the training set. Mean imputation was determined appropriate after evaluating several imputation methods (K-nearest neighbor and Iterative Imputation), which did not result in significant improvements.
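The scaling and imputation steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `fit_scaler` and `transform` and the two-column example values are hypothetical. The study used Scikit-learn, where `StandardScaler` would most likely play this role. The point it shows is that, once features are standardized with training-set parameters, filling a missing value with zero is equivalent to filling it with the training-set mean.

```python
import math

def fit_scaler(train_rows):
    """Per-feature mean and standard deviation, ignoring missing (None) values."""
    params = []
    for j in range(len(train_rows[0])):
        observed = [row[j] for row in train_rows if row[j] is not None]
        mean = sum(observed) / len(observed)
        var = sum((v - mean) ** 2 for v in observed) / len(observed)
        std = math.sqrt(var) or 1.0  # guard against constant features
        params.append((mean, std))
    return params

def transform(rows, params):
    """Scale with training-set parameters, then impute missing values with 0."""
    return [
        [0.0 if v is None else (v - mean) / std
         for v, (mean, std) in zip(row, params)]
        for row in rows
    ]

# Hypothetical values: columns are (CRP, ferritin); None marks a missing lab.
train = [[1.0, 100.0], [3.0, None], [5.0, 300.0]]
test = [[3.0, None], [7.0, 200.0]]

params = fit_scaler(train)            # computed on the training set only
train_scaled = transform(train, params)
test_scaled = transform(test, params)  # test set reuses training parameters
```

Fitting the parameters on the training set alone keeps information from the test set out of the preprocessing step.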

Gold standard

Diagnosis of SARS-CoV-2 was confirmed by PCR testing assays performed at the UCLA Microbiology Laboratory. These assays included the 2019-nCoV Real-Time (RT)-PCR Diagnostic Panel (CDC, Atlanta, GA), the Diasorin Simplexa COVID-19 Direct RT-PCR (Diasorin Molecular LLC, Cypress, CA), and the TaqPath COVID-19 Combo Kit (Thermo Fisher Scientific Inc., Waltham, MA).

Machine learning analysis

We compared seven machine learning models: random forest, logistic regression, support vector machine, multilayer perceptron (neural network), stochastic gradient descent, XGBoost, and AdaBoost. An ensemble (combined) model was then created based on those seven individually trained machine learning models. The final classification as positive or negative was decided by a vote of the classifiers, calculated by averaging their respective predicted probabilities. The dataset was split 60% for training, 10% for validation, and 30% for testing. The discriminatory operating threshold was determined using a validation set held out from the training set and selected such that the sensitivity on the validation set would be above a predefined threshold of 0.95 by configuring the beta parameter of the F-score. The resulting model was then evaluated on the held-out test set using the following diagnostic metrics: area under the receiver operator curve (AUROC), area under the precision recall curve (AUPRC), sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). Confidence intervals were constructed for each metric using a bootstrapping procedure in which the test set was resampled with replacement 1000 times. Feature importance was assessed using a permutation test. To test the contribution of each feature to model performance, the feature values were randomly shuffled, thereby disrupting their correlations with the outcome, and the decrease in model performance (f1-score) was recorded. All machine learning analyses were performed using Python, making extensive use of the Scikit-learn package.
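The probability-averaging vote and the choice of operating threshold can be illustrated with a short sketch. It is not the authors' implementation: the three probability lists stand in for the seven trained classifiers, all values are hypothetical, and the threshold rule is a simplification. Where the paper tunes the beta parameter of the F-score, this sketch directly picks the highest threshold whose validation sensitivity meets the target.

```python
def soft_vote(prob_lists):
    """Average the positive-class probabilities of several classifiers."""
    n_models = len(prob_lists)
    return [sum(ps) / n_models for ps in zip(*prob_lists)]

def sensitivity(y_true, y_prob, threshold):
    """Fraction of true positives scored at or above the threshold."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    return sum(p >= threshold for p in pos) / len(pos)

def pick_threshold(y_true, y_prob, target_sensitivity=0.95):
    """Highest candidate threshold whose sensitivity meets the target.

    Assumes y_true contains at least one positive case.
    """
    for t in sorted(set(y_prob), reverse=True):
        if sensitivity(y_true, y_prob, t) >= target_sensitivity:
            return t
    return 0.0  # only reached if y_prob is empty

# Hypothetical validation-set probabilities from three stand-in models.
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
model_b = [0.7, 0.6, 0.6, 0.5, 0.2, 0.3]
model_c = [0.8, 0.7, 0.5, 0.1, 0.2, 0.2]
y_val = [1, 1, 1, 0, 0, 0]

ensemble_prob = soft_vote([model_a, model_b, model_c])
threshold = pick_threshold(y_val, ensemble_prob, target_sensitivity=0.95)
predictions = [int(p >= threshold) for p in ensemble_prob]
```

Selecting the threshold on a validation set, rather than on the test set, keeps the reported test metrics free of tuning bias. Favoring sensitivity this way lowers specificity, which suits a screening tool.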

Results

Descriptive

In total, 3,444 cases were tested for SARS-CoV-2 and considered in our analysis. After exclusion of patients who did not have the minimal necessary features to make predictions (a complete blood cell count and at least one inflammatory marker), 1,455 cases remained (1,273 negative and 182 positive cases) (see Fig 1). All cases were either from the emergency room or inpatient settings. Mean age was 58.1 (SD 22.3), 53% were men, 49% white, 24% Latino, and 29% immunosuppressed. See Table 1 for descriptive characteristics for included features by SARS-CoV-2 status.

Fig 1. Diagram of eligible, included and excluded cases, and diagnostic cross tabulation.

https://doi.org/10.1371/journal.pone.0239474.g001

Machine learning model: Diagnostic metrics

The AUROC of the model in the held-out test set (n = 392) was 0.91 (95% confidence interval [CI] 0.87–0.96) and the AUPRC was 0.76 (95% CI 0.66–0.83). The model achieved a sensitivity of 0.93 (95% CI 0.84–0.98), specificity of 0.64 (95% CI 0.59–0.69), NPV of 0.98 (95% CI 0.96–1.00), and PPV of 0.29 (95% CI 0.23–0.36). Receiver operator curves and precision-recall curves are presented in Fig 2. Using a feature importance analysis, we found that the features that contributed the most information to the model were C-reactive protein and LDH (see Fig 3).
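The low PPV alongside a high NPV follows from the low proportion of positive cases. The sketch below applies Bayes' rule to the reported sensitivity and specificity. It assumes, for illustration only, that the test set has the same proportion of positives as the full cohort (182 of 1,455); the actual test-set split is given in Fig 1 and is not reproduced here. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    return tp / (tp + fp), tn / (tn + fn)

# Assumption: test-set prevalence equals the whole-cohort prevalence.
prevalence = 182 / 1455
ppv, npv = predictive_values(0.93, 0.64, prevalence)
print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
```

Under this assumption the calculation gives a PPV of about 0.27 and an NPV of about 0.98, close to the reported 0.29 and 0.98. A negative result therefore carries much more weight than a positive one, which is the behavior expected of a screening test.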

Fig 2. Performance of the model on the held-out test set (N = 392).

A) Receiver operator curve. B) Precision-recall curve. At a sensitivity-optimized operating threshold, sensitivity and specificity were 0.93 (95% CI 0.85–0.98) and 0.64 (95% CI 0.59–0.69), respectively. Red solid lines are the mean receiver operator curve and mean precision-recall curve, respectively; the purple shaded lines are the curves obtained from the bootstrapping procedure used to calculate the 95% confidence intervals.

https://doi.org/10.1371/journal.pone.0239474.g002
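The bootstrapped confidence intervals reported for the test-set metrics can be sketched as a percentile bootstrap. This is an assumed reading of the procedure, since the paper does not state which interval method was used. The labels and predictions are hypothetical, and sensitivity is used as the example metric.

```python
import random

def sensitivity(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp / (tp + fn)

def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric on a fixed test set."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_resamples:
        # Resample case indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        if 1 not in yt:
            continue  # metric undefined without positives; redraw
        stats.append(metric(yt, yp))
    stats.sort()
    lower = stats[int(n_resamples * alpha / 2)]
    upper = stats[int(n_resamples * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical test set: 40 positives (37 detected) and 160 negatives.
y_true = [1] * 40 + [0] * 160
y_pred = [1] * 37 + [0] * 3 + [0] * 100 + [1] * 60

point = sensitivity(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, sensitivity)
```

Because only the test set is resampled and the model is not retrained, the interval reflects sampling variability in the evaluation data rather than variability in model fitting.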

Fig 3. Combined model feature importance.

Decrease in model performance (f1-score) after randomly shuffling the respective feature values. Higher values represent important features for classification. Abbreviations: LDH, lactate dehydrogenase; NLR, neutrophil to lymphocyte ratio; RBC, red blood cells.

https://doi.org/10.1371/journal.pone.0239474.g003
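The permutation procedure behind Fig 3 can be sketched as follows. The classifier here is a hypothetical stand-in (`toy_predict`, which looks at one feature only) and the data are invented; the study applied the idea to the trained ensemble. Scikit-learn offers a comparable routine in `sklearn.inspection.permutation_importance`, though the paper does not say whether that function was used.

```python
import random

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in F1 after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = f1_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - f1_score(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical stand-in for a trained classifier: it uses feature 0 only.
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0

X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.2, 0.9], [0.1, 0.4], [0.3, 0.6]]
y = [1, 1, 1, 0, 0, 0]
importances = permutation_importance(toy_predict, X, y)
```

In this toy setup, shuffling feature 1 cannot change any prediction, so its importance is zero, while shuffling feature 0 lowers the F1 score on average.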

In sensitivity analyses, we calculated AUROC and AUPRC when adding the inflammatory features relative to the baseline model of only demographic characteristics and features of the complete blood cell count (see Fig 4). The AUROC of the baseline model was 0.79 (95% CI 0.71–0.85). Then, we added the inflammatory markers to the model one at a time. With ferritin, the AUROC was 0.83 (95% CI 0.78–0.88); with C-reactive protein, 0.86 (95% CI 0.79–0.92); with LDH, 0.87 (95% CI 0.82–0.92). The AUPRC of the baseline model was 0.50 (95% CI 0.36–0.65); with ferritin, 0.56 (95% CI 0.45–0.68); with LDH, 0.66 (95% CI 0.55–0.77); with C-reactive protein, 0.66 (95% CI 0.50–0.80). Through these analyses we observed that adding inflammatory markers, especially LDH, CRP, and the combination of the three, resulted in statistically significant improvements relative to the baseline model.
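To make the comparison metric concrete: the AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The two score lists are hypothetical and merely mimic a baseline model and a model with an added marker; they are not study data.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical scores for three positive and four negative cases.
y = [1, 1, 1, 0, 0, 0, 0]
baseline_scores = [0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.45]
augmented_scores = [0.8, 0.6, 0.9, 0.5, 0.3, 0.2, 0.45]

baseline_auc = auroc(y, baseline_scores)    # 10 of 12 pairs ranked correctly
augmented_auc = auroc(y, augmented_scores)  # all 12 pairs ranked correctly
```

The pairwise form is quadratic in the number of cases and is meant only to show what the metric measures; library routines compute the same quantity from sorted scores.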

Fig 4. Performance of models while removing one of the features.

All analyses were performed on the held-out test set (N = 392). A) Receiver operating curve. B) Precision-recall curve. Base model includes only demographic features and complete blood cell count. Abbreviations: CRP, C-reactive protein; LDH, lactate dehydrogenase.

https://doi.org/10.1371/journal.pone.0239474.g004

Discussion

This is the largest study to date using a machine learning algorithm as a proxy to diagnose COVID-19. We built the algorithm based on a set of basic demographic characteristics and frequently obtained blood biomarkers that could be easily obtained in many hospital settings. Thus, the most likely application of the approach presented in this work is the use of these biomarkers as a proxy for testing in locations where COVID-19 testing is scarce. We showed a high sensitivity for COVID-19 diagnosis when compared to SARS-CoV-2 RT-PCR testing as the gold standard. The blood biomarkers included in the model can be obtained with a single blood draw, and turnaround time is typically within 24 hours at most hospital centers with laboratory capabilities. Due to the model’s high sensitivity and rapid turnaround time, the proposed algorithm lends itself to practical use in hospital facilities as a screening tool. At the time of submission, this model was being actively developed into a web or mobile application, whereby a clinician inputs the obtained values and receives an immediate prediction of the probability of a particular patient having COVID-19. Further validation will be required to ascertain its performance in other medical centers.

Our set of features performed as well as, or better than, the three diagnostic algorithms with the largest number of cases known to us at this time [12, 13, 16]. A report by Sun et al. used epidemiologic, clinical, laboratory and imaging features in their algorithms and reported AUROCs of 0.91 (full model), 0.88 (without epidemiologic features), 0.88 (without imaging features), and 0.65 (with clinical features alone) [12]. They used features from a complete blood cell count and from a basic chemistry panel (sodium and creatinine), whereas we used inflammatory markers (ferritin, C-reactive protein, LDH) instead of sodium, potassium, and creatinine, as we did not suspect significant differences a priori in sodium, potassium, or creatinine. Meng et al. reported an AUROC of 0.89 using a different set of features that included activated partial thromboplastin time, triglycerides, uric acid, albumin/globulin, sodium, and calcium [16]. Batista et al. developed an algorithm aimed for use in lower resource settings and reported an AUROC of 0.87 in a sparser dataset that only included basic demographics and complete blood cell counts [13]. In fact, our model, which incorporated inflammatory markers, significantly improved upon this set of features in terms of both AUROC and AUPRC. For a full comparison of diagnostic algorithms related to COVID-19, we refer the reader to a living systematic review [15].

Our findings should be considered in light of the following limitations. First, we included data from only one medical center in Los Angeles. Incorporating data from other medical centers in other geographic areas would provide a higher likelihood of generalizability. Second, although many of our patients either had immunosuppressive conditions (e.g., solid organ transplants) or were taking immunosuppressive medications (e.g., steroids), immunosuppressed hosts are a heterogeneous group and their immunosuppression may impact the laboratory values we used in our models. We would need more cases with those conditions to understand how the algorithm would perform in these populations. It is likely that specific models tailored to the immunocompromised host should be developed to improve accuracy in this population. Third, it is also possible that other community respiratory viral infections (e.g., influenza, RSV) could cause a similar laboratory profile; however, incidence of these other community respiratory viruses was low during the case inclusion period. Further validation comparing COVID-19 cases to cases of other community respiratory viruses is needed. Finally, as all of our patients’ blood was tested in the emergency department or as an inpatient, the applicability of this model in the outpatient setting or milder cases of COVID-19 is unclear.

Our report, in combination with others [12, 13, 15, 16], demonstrates the high diagnostic accuracy of machine learning models based on early available data. Other models have also been developed based on characteristic imaging changes [15]. We and others were able to demonstrate impressive results in our data silos [12–15]. Yet, to realize the full potential of machine learning and its applicability to clinical medicine, collaborations from the international community are crucial, both for the sharing of data and for the development and validation of advanced algorithms.

It is unclear if testing capacity for active disease using PCR-based methods will ever meet the expanding need globally. In fact, countries in low-resource settings, such as in Sub-Saharan Africa or Latin America, face bottlenecks in the testing supply chain, and are unable to compete with affluent nations for prohibitively expensive PCR test kits. Even in developed nations, scale up of PCR-based testing has many bottlenecks that include purchase of new testing platforms, sample acquisition, availability of reagents, swabs and transport media, and the technical human expertise in performing PCR tests.

In summary, by using readily available laboratory tests combined with machine learning we achieved a high sensitivity comparable to that of PCR. This machine learning modality may be especially useful as a screening test in smaller medical centers or those in resource-poor regions that may have limited capacity for COVID-19 PCR-based diagnosis, or in instances where testing capacity is in danger due to low supplies. Further validation is necessary in diverse geographic settings and in a prospective manner before it can be used as a reliable tool to support clinical decision making.

References

  1. Team CC-R. Geographic Differences in COVID-19 Cases, Deaths, and Incidence—United States, February 12-April 7, 2020. MMWR Morb Mortal Wkly Rep. 2020;69(15):465–71. Epub 2020/04/17. pmid:32298250.
  2. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020. Epub 2020/02/23. pmid:32087114; PubMed Central PMCID: PMC7159018.
  3. Cheng MP, Papenburg J, Desjardins M, Kanjilal S, Quach C, Libman M, et al. Diagnostic Testing for Severe Acute Respiratory Syndrome–Related Coronavirus 2. Annals of Internal Medicine. 2020;172(11):726–34. pmid:32282894.
  4. Cheng MP, Papenburg J, Desjardins M, Kanjilal S, Quach C, Libman M, et al. Diagnostic Testing for Severe Acute Respiratory Syndrome-Related Coronavirus-2: A Narrative Review. Ann Intern Med. 2020. pmid:32282894; PubMed Central PMCID: PMC7170415.
  5. Ward S, Lindsley A, Courter J, Assa'ad A. Clinical Testing For Covid-19. J Allergy Clin Immunol. 2020. Epub 2020/05/24. pmid:32445839; PubMed Central PMCID: PMC7237919.
  6. Sheridan C. Coronavirus and the race to distribute reliable diagnostics. Nature Biotechnology. 2020;38(4):382–4. pmid:32265548.
  7. Service R. The standard coronavirus test, if available, works well—but can new diagnostics help in this pandemic? Science. 2020.
  8. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395(10229):1054–62. Epub 2020/03/15. pmid:32171076.
  9. Wang D, Hu B, Hu C, Zhu F, Liu X, Zhang J, et al. Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China. JAMA. 2020. pmid:32031570; PubMed Central PMCID: PMC7042881.
  10. Guan WJ, Ni ZY, Hu Y, Liang WH, Ou CQ, He JX, et al. Clinical Characteristics of Coronavirus Disease 2019 in China. N Engl J Med. 2020;382(18):1708–20. Epub 2020/02/29. pmid:32109013; PubMed Central PMCID: PMC7092819.
  11. Petrilli CM, Jones SA, Yang J, Rajagopalan H, O'Donnell LF, Chernyak Y, et al. Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease in New York City. medRxiv. 2020:2020.04.08.20057794.
  12. Sun Y, Koh V, Marimuthu K, Ng OT, Young B, Vasoo S, et al. Epidemiological and Clinical Predictors of COVID-19. Clin Infect Dis. 2020. Epub 2020/03/27. pmid:32211755.
  13. Batista AFdM, Miraglia JL, Donato THR, Chiavegatto Filho ADP. COVID-19 diagnosis prediction in emergency care patients: a machine learning approach. medRxiv. 2020:2020.04.04.20052092.
  14. Feng CH Zhi; Wang Lili; Chen Xin; Zhai Yongzhi; Zhu Feng; Chen Hua; et al. A Novel Triage Tool of Artificial Intelligence Assisted Diagnosis Aid System for Suspected COVID-19 Pneumonia in Fever Clinics. SSRN. 2020. Epub 3/8/2020.
  15. Wynants L, Van Calster B, Bonten MMJ, Collins GS, Debray TPA, De Vos M, et al. Prediction models for diagnosis and prognosis of covid-19 infection: systematic review and critical appraisal. BMJ. 2020;369:m1328. Epub 2020/04/09. pmid:32265220.
  16. Meng Z, Wang M, Song H, Guo S, Zhou Y, Li W, et al. Development and utilization of an intelligent application for aiding COVID-19 diagnosis. medRxiv. 2020:2020.03.18.20035816.
  17. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015;351:h5527. Epub 2015/10/30. pmid:26511519; PubMed Central PMCID: PMC4623764.