Empirical data on conditions that increase risk of coronavirus disease 2019 (COVID-19) progression are needed to identify high risk individuals. We performed a comprehensive quantitative assessment of pre-existing clinical phenotypes associated with COVID-19-related hospitalization.
We conducted a phenome-wide association study (PheWAS) of SARS-CoV-2-positive patients from an integrated health system (Geisinger) with system-level outpatient/inpatient COVID-19 testing capacity, using retrospective electronic health record (EHR) data to assess pre-COVID-19 pandemic clinical phenotypes associated with hospital admission (hospitalization).
Of 12,971 individuals tested for SARS-CoV-2 with sufficient pre-COVID-19 pandemic EHR data at Geisinger, 1604 were SARS-CoV-2 positive and 354 required hospitalization. We identified 21 clinical phenotypes in 5 disease categories meeting phenome-wide significance (P<1.60x10-4), including: six kidney phenotypes, e.g. end stage renal disease or stage 5 CKD (OR = 11.07, p = 1.96x10-8), six cardiovascular phenotypes, e.g. congestive heart failure (OR = 3.8, p = 3.24x10-5), five respiratory phenotypes, e.g. chronic airway obstruction (OR = 2.54, p = 3.71x10-5), and three metabolic phenotypes, e.g. type 2 diabetes (OR = 1.80, p = 7.51x10-5). Additional analyses defining CKD based on estimated glomerular filtration rate confirmed a high risk of hospitalization associated with pre-existing stage 4 CKD (OR 2.90, 95% CI: 1.47, 5.74), stage 5 CKD/dialysis (OR 8.83, 95% CI: 2.76, 28.27), and kidney transplant (OR 14.98, 95% CI: 2.77, 80.8) but not stage 3 CKD (OR 1.03, 95% CI: 0.71, 1.48).
Citation: Oetjens MT, Luo JZ, Chang A, Leader JB, Hartzel DN, Moore BS, et al. (2020) Electronic health record analysis identifies kidney disease as the leading risk factor for hospitalization in confirmed COVID-19 patients. PLoS ONE 15(11): e0242182. https://doi.org/10.1371/journal.pone.0242182
Editor: Harald Mischak, University of Glasgow, UNITED KINGDOM
Received: August 21, 2020; Accepted: October 28, 2020; Published: November 12, 2020
Copyright: © 2020 Oetjens et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting information files.
Funding: This work was supported by GM111913 from the NIH-NIGMS (NIH.GOV) to T.M. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Coronavirus disease 2019 (COVID-19) is an emerging illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. COVID-19 was declared a pandemic by the World Health Organization in March 2020. The United States reported the first case on January 22, 2020; by October 12th, there were >7,740,000 total cases and >214,000 deaths (cdc.gov). The severity of COVID-19 illness is variable, ranging from asymptomatic infection to severe complications that require hospitalization. Several pre-existing conditions have been identified as risk factors for COVID-19-related hospitalization and death [3, 4]. A recent study developed a risk score that predicted progression to intensive care in hospitalized patients based on present and pre-existing risk factors (e.g. chest radiographic abnormality, hemoptysis, dyspnea, history of cancer and other comorbidities). Comprehensive quantitative data on the contribution of pre-existing conditions to COVID-19 disease severity are still needed. We applied an agnostic cross-disease approach to electronic health record (EHR) data from SARS-CoV-2-positive patients to identify associations between pre-existing conditions and COVID-19-related hospitalization.
This study was conducted at Geisinger, an integrated health system in central and northeastern Pennsylvania. This study was reviewed and approved by the Geisinger Institutional Review Board. This analysis includes patients with a laboratory-confirmed diagnosis of COVID-19 reported between March 7, 2020 and May 19, 2020. All patients displayed symptoms that met CDC screening criteria for COVID-19 at the time of testing.
International Classification of Diseases Ninth (ICD-9) and Tenth (ICD-10) revision disease diagnosis codes and the last outpatient serum creatinine value were extracted from patients’ EHR dated prior to January 1st, 2020. Potential risk factor phenotypes were defined by PheCodes mapped from ICD codes using PheCodes Map 1.2 (https://phewascatalog.org/phecodes). For each individual, duplicate PheCode occurrences on the same date were dropped such that only one occurrence per date for a given PheCode remained. Cases for a phenotype were defined as having at least three occurrences of the PheCode; individuals with one or two occurrences were excluded from analysis of the phenotype, and the remaining individuals were classified as controls. To ensure that individuals in the study were adequately assessed for clinical history during clinical care, we restricted the analyses to individuals who were cases for at least one phenotype, which denotes that they have been clinically assessed on at least three distinct occasions. Our analysis required at least 20 cases and 20 controls for each phenotype among the 1,604 SARS-CoV-2 positive subjects, resulting in 313 distinct phenotypes. S1 Fig shows a flow diagram of the study design. The ICD code terminology used for the PheWAS data exactly reflects the codes used by PheCode Map 1.2.
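The case, control, and exclusion rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `build_phenotypes`, the `(subject_id, phecode, date)` record layout, and the `min_group` parameter are assumptions introduced here, and the sketch applies all rules to a single cohort for simplicity.

```python
from collections import defaultdict

MIN_OCCURRENCES = 3  # distinct dates required for case status (from the text)
MIN_GROUP_SIZE = 20  # minimum cases and controls per phenotype (from the text)


def build_phenotypes(records, subjects, min_group=MIN_GROUP_SIZE):
    """Classify subjects as case / control / excluded for each PheCode.

    records  -- iterable of (subject_id, phecode, date) tuples (hypothetical layout)
    subjects -- iterable of all subject ids in the cohort
    """
    # Storing dates in a set keeps one occurrence per date for each PheCode.
    dates = defaultdict(set)
    for subject, phecode, date in records:
        dates[(subject, phecode)].add(date)

    counts = defaultdict(dict)  # phecode -> {subject: number of distinct dates}
    for (subject, phecode), seen in dates.items():
        counts[phecode][subject] = len(seen)

    # Keep only subjects who are a case for at least one phenotype.
    assessed = {
        s for per_subject in counts.values()
        for s, n in per_subject.items() if n >= MIN_OCCURRENCES
    } & set(subjects)

    phenotypes = {}
    for phecode, per_subject in counts.items():
        cases = {s for s in assessed if per_subject.get(s, 0) >= MIN_OCCURRENCES}
        # One or two occurrences: neither case nor control for this phenotype.
        excluded = {s for s in assessed if 0 < per_subject.get(s, 0) < MIN_OCCURRENCES}
        controls = assessed - cases - excluded
        if len(cases) >= min_group and len(controls) >= min_group:
            phenotypes[phecode] = {
                "cases": cases, "controls": controls, "excluded": excluded,
            }
    return phenotypes
```

Excluding individuals with one or two occurrences, rather than treating them as controls, keeps people with an uncertain diagnosis out of both comparison groups.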
We conducted additional analyses to further explore the relationship between kidney diseases and risk of COVID-19 hospitalization. We used estimated glomerular filtration rate (eGFR), calculated by the CKD Epidemiology Collaboration equation, data up until August 2018 from the United States Renal Data System (USRDS; https://www.usrds.org/2018), and ICD codes to categorize patients into 1 of 5 groups: 1) eGFR ≥ 60 ml/min/1.73m2 without kidney transplant; 2) eGFR 30–59 ml/min/1.73m2 without kidney transplant; 3) eGFR 15–29 ml/min/1.73m2 without kidney transplant; 4) eGFR <15 ml/min/1.73m2 or on dialysis; 5) kidney transplant with eGFR ≥ 15 ml/min/1.73m2.
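The five-group categorization can be expressed as a small function. This is a sketch under stated assumptions: the name `kidney_group` and its boolean flags are hypothetical, the order of the checks (dialysis or eGFR < 15 first, then transplant) is one reading of the group definitions, and non-integer eGFR values between the listed ranges (for example 59.5) are assigned to the lower group.

```python
def kidney_group(egfr, on_dialysis=False, transplant=False):
    """Return the kidney-function group (1-5) for one patient.

    egfr        -- estimated GFR in ml/min/1.73m2
    on_dialysis -- True if the patient is on dialysis (hypothetical flag)
    transplant  -- True if the patient has a kidney transplant (hypothetical flag)
    """
    # Group 4: eGFR < 15 or on dialysis. Checked first (an assumption), so a
    # transplant recipient with eGFR < 15 lands here rather than in group 5.
    if on_dialysis or egfr < 15:
        return 4
    # Group 5: kidney transplant with eGFR >= 15.
    if transplant:
        return 5
    # Groups 1-3: no transplant, stratified by eGFR.
    if egfr >= 60:
        return 1
    if egfr >= 30:
        return 2
    return 3


for egfr, dialysis, tx in [(75, False, False), (45, False, False),
                           (20, False, False), (10, False, False),
                           (50, False, True)]:
    print(egfr, dialysis, tx, "-> group", kidney_group(egfr, dialysis, tx))
```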
A phenome-wide association study (PheWAS) was performed to identify pre-existing conditions associated with hospitalization of patients with SARS-CoV-2 infection. Each phenotype was tested with Firth’s logistic regression, with hospitalization as the outcome and adjustment for age, sex and race.
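The model equation is not reproduced in this text. One plausible way to write the per-phenotype model, consistent with the description above, is sketched below. The symbols are assumptions introduced here: H_i is hospitalization of patient i, P_i is case status for the phenotype being tested, r_i is a vector of race indicators, and theta collects all coefficients. The last line is the general form of Firth's penalized log-likelihood, where I(theta) is the Fisher information matrix.

```latex
% Sketch only: notation is assumed, not taken from the original article.
\begin{align}
\operatorname{logit}\,\Pr(H_i = 1) &= \beta_0 + \beta_1 P_i
    + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{sex}_i
    + \boldsymbol{\gamma}^{\top}\mathbf{r}_i, \\
\mathrm{OR} &= e^{\beta_1}, \\
\ell^{*}(\boldsymbol{\theta}) &= \ell(\boldsymbol{\theta})
    + \tfrac{1}{2}\log\det I(\boldsymbol{\theta}).
\end{align}
```

The penalty term reduces small-sample bias and keeps estimates finite when a phenotype has few cases, which is relevant here because phenotypes with as few as 20 cases were tested.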
Odds ratios (ORs) indicate the relative odds of COVID-19 related hospital admission given the presence of a pre-existing phenotype. We defined phenome-wide significance using a Bonferroni corrected p-value for the number of clinical PheCodes tested (p<0.05/313 = 1.60x10-4).
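The threshold arithmetic can be checked directly. The snippet below is a minimal sketch; the function names are introduced here for illustration, and the two p-values are the ones reported in this article for end stage renal disease and type 2 diabetes.

```python
ALPHA = 0.05   # family-wise error rate (from the text)
N_TESTS = 313  # number of PheCodes tested (from the text)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def is_phenome_wide_significant(p_value, alpha=ALPHA, n_tests=N_TESTS):
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"threshold = {threshold:.2e}")
print(is_phenome_wide_significant(1.96e-8))  # end stage renal disease
print(is_phenome_wide_significant(7.51e-5))  # type 2 diabetes
```

Bonferroni correction is conservative when phenotypes are correlated, as many PheCodes are, so phenotypes just above the threshold are not necessarily unassociated.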
Of 18,372 individuals tested for SARS-CoV-2 at Geisinger between March 7, 2020 and May 19, 2020, 15,707 tested negative, 2,665 tested positive, and 565 were admitted to the hospital. Among the total number tested, 12,971 met inclusion criteria for PheWAS analysis (Methods). Of the 12,971 SARS-CoV-2 tested patients used in PheWAS, 1,604 were positive for SARS-CoV-2, of whom 354 (22.1%) were admitted to the hospital (Table 1; demographics). Admitted patients were more likely to be older and male (p < 0.0001, Table 1). Of the 354 hospitalized patients, 106 were admitted to the ICU, 70 required ventilation, 71 died, and 54 remained hospitalized as of May 19, 2020.
We performed a PheWAS analysis to test for associations between COVID-19 related hospital admission and 313 clinical phenotypes (Fig 1; Table 2). Phenotypes that reached phenome-wide significance (p < 1.60x10-4) fell into five disease categories: renal, cardiovascular, endocrine/metabolic, respiratory, and hematopoietic. The most significant associations (smallest p value and largest OR) were related to disorders of renal function, including chronic kidney disease (unspecified stage) (OR = 3.43, 95% CI [2.36,5], p = 1.33 x 10−10), end stage renal disease or stage 5 CKD (OR = 11.07, 95% CI [4.54,26.97], p = 1.96 x 10−8), stage III chronic kidney disease (OR = 2.68, 95% CI [1.76,4.06], p = 4.74 x 10−6) and acute renal failure (OR = 3.26, 95% CI [1.89,5.62], p = 3.08 x 10−5). Six disorders in the cardiovascular disease category, including nonhypertensive congestive heart failure (OR = 3.35, 95% CI [2.16,5.2], p = 8.13 x 10−8) and peripheral vascular disease (OR = 3.25, 95% CI [1.84,5.71], p = 6.37 x 10−5), reached phenome-wide significance. Type 2 diabetes (OR = 1.8, 95% CI [1.35,2.41], p = 7.51 x 10−5) was among three disorders in the endocrine/metabolic disease category that reached phenome-wide significance. Within the respiratory disease category, 5 conditions were significant, including chronic airway obstruction (OR = 2.54, 95% CI [1.65,3.93], p = 3.71 x 10−5), pneumonia (OR = 3.17, 95% CI [1.89,5.33], p = 2.48 x 10−5), and chronic bronchitis (OR = 5.9, 95% CI [2.58,13.48], p = 3.26 x 10−5). Lastly, we identified a single hematopoietic association with anemia of chronic disease (OR = 4.86, 95% CI [2.33,10.15], p = 4.36 x 10−5). A list of all 313 conditions tested in PheWAS is shown in S1 Table.
Fig 1. Using a minimum case count of 20, we identified 313 clinical phenotypes from PheCode Map 1.2 that could be used for these association studies. The dashed line denotes the Bonferroni significance threshold (p = 1.60x10-4).
In addition to PheWAS findings, we also used several algorithms that have been extensively validated to define disease phenotypes using EHR data (Table 1, S1 Methods). We observed significantly higher frequencies of several of these phenotypes in hospitalized patients; chronic kidney disease was most strongly associated with hospitalization (S2 Fig). In analyses using eGFR data and USRDS data, stage 4 CKD (OR 2.90, 95% CI: 1.47, 5.74) and stage 5 CKD/dialysis (OR 8.83, 95% CI: 2.76, 28.27) were associated with increased risk of COVID-19 hospitalization, whereas stage 3 CKD was not (OR 1.03, 95% CI: 0.71, 1.48). Five (71%) out of 7 patients with a history of kidney transplant were hospitalized (OR 14.98, 95% CI: 2.77, 80.88). Among 565 hospitalized patients, which included those who were not included in the PheWAS due to limited EHR data, 122 had some history of CKD. The metabolic burden among these 122 patients was higher than among those without CKD. These patients were typically older and had significantly higher rates of death (Table 3).
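The reported intervals can be read with two simple checks: whether the 95% CI excludes an odds ratio of 1, and whether the point estimate sits at the midpoint of the interval on the log scale. The sketch below uses the eGFR-based estimates reported above. The function names are introduced here, and the midpoint and standard-error calculations assume the intervals are symmetric on the log-odds scale (Wald-type), which the text does not state.

```python
import math

# (odds ratio, lower 95% bound, upper 95% bound), as reported in the text
ESTIMATES = {
    "stage 3 CKD": (1.03, 0.71, 1.48),
    "stage 4 CKD": (2.90, 1.47, 5.74),
    "stage 5 CKD/dialysis": (8.83, 2.76, 28.27),
    "kidney transplant": (14.98, 2.77, 80.88),
}


def ci_excludes_one(lower, upper):
    """True if the interval does not contain the null odds ratio of 1."""
    return lower > 1.0 or upper < 1.0


def log_scale_midpoint(lower, upper):
    """Geometric mean of the bounds: the midpoint on the log-odds scale."""
    return math.sqrt(lower * upper)


def log_or_standard_error(lower, upper, z=1.96):
    """Standard error of log(OR) implied by an assumed Wald-type 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


for name, (odds_ratio, lower, upper) in ESTIMATES.items():
    print(name,
          "| excludes 1:", ci_excludes_one(lower, upper),
          "| midpoint:", round(log_scale_midpoint(lower, upper), 2),
          "| SE(log OR):", round(log_or_standard_error(lower, upper), 2))
```

The wide interval for kidney transplant reflects the small group size (7 patients), so the point estimate should be read with that uncertainty in mind.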
The outbreak of COVID-19 has spurred unprecedented efforts to characterize biological and clinical aspects of the disease [10, 11]. COVID-19 is the first pandemic in the digital health age, which has allowed rapid epidemiologic studies [1, 13]. Data from government agencies such as the Centers for Medicare & Medicaid Services (cms.gov/covid-19-data-snapshot-fact-sheet) encompass large cohorts but are mostly snapshots that lack granular data and rely heavily on claims and provider-supplied data. Here, we used data from an integrated health system with outpatient and inpatient COVID-19 testing capacity and utilized a PheWAS study design to conduct a comprehensive analysis of clinical phenotypes associated with increased risk of COVID-19 related hospital admission. To control for potential bias related to exposure to the SARS-CoV-2 virus, we limited our study population to SARS-CoV-2 positive patients screened at Geisinger. Additional analyses using eGFR and USRDS data confirmed our findings that patients with stage 4–5 CKD, ESRD on dialysis or with kidney transplant are at extremely high risk for severe complications due to COVID-19 (Table 4). These findings complement those of the OpenSAFELY study, which found similar results but was limited by including clinically suspected (non-laboratory-confirmed) COVID-19, as well as the CMS reports showing higher risk of hospitalization among ESKD patients (not adjusted for covariates). Our finding of high risk of hospitalization in kidney transplant patients mirrors that of a case series of 36 consecutive kidney transplant patients at Montefiore, where 28/36 (78%) were hospitalized. The findings reported here identify co-morbidities that impact the clinical course of COVID-19 and may be used to identify individuals at greatest risk for COVID-19-related complications.
The majority of conditions associated with increased risk of COVID-19 related hospital admission have been suggested in previous studies, including diabetes, heart failure, hypertension, and chronic kidney disease [14–16]. What is striking from our results is the magnitude of the kidney disease-related risk. Patients with end-stage renal disease had 11-fold increased odds of hospitalization (Table 2). How clinical conditions increase the risk of COVID-19-related complications is not yet fully clear. The physiological stress caused by an excessive inflammatory response to SARS-CoV-2 infection could destabilize organs already weakened by chronic disease. Alternatively, direct organ-specific injury from SARS-CoV-2 infection could act as a “second hit” to these organs. Consistent with this hypothesis, kidney and heart are among the tissues with the highest expression of ACE2, a SARS-CoV-2 receptor.
The current study has several limitations. The sample size is relatively small, and the available data are limited to information captured in the EHR. Nevertheless, we were able to identify several highly significant traits associated with hospitalization, many of which are consistent with previous reports. The study population is subject to potential bias resulting from the availability of testing, which largely excluded asymptomatic individuals and led to an enrichment of nursing home residents and healthcare workers. To partially overcome this, we restricted our analyses to individuals who were tested in a single health system. This population is predominantly Caucasian, which may limit the generalization of our findings to other racial and ethnic groups. A recent study indicated that hospitalization rates may differ between racial and ethnic groups with COVID-19.
In conclusion, this study leverages extensive longitudinal EHR data prior to the COVID-19 pandemic to identify pre-existing clinical phenotypes associated with increased risk of COVID-19 hospitalization. These results provide key information for public policymakers highlighting the need to prevent COVID-19 related illness in patients with kidney disease and other high-risk conditions.
S1 Table. A list of all 313 conditions tested in PheWAS.
S2 Fig. Prevalence of validated disease phenotypes using EHR data among the total EHR population, all those tested for COVID-19, those who tested negative for COVID-19, COVID-19(+) individuals not needing admission, and COVID-19(+) individuals who were hospitalized.
Data reported here have been supplied by the United States Renal Data System (USRDS). The interpretation and reporting of these data are the responsibility of the author(s) and in no way should be seen as an official policy or interpretation of the U.S. government.
- 1. Gandhi RT, Lynch JB, Del Rio C. Mild or Moderate Covid-19. N Engl J Med. 2020. pmid:32329974
- 2. Arentz M, Yim E, Klaff L, Lokhandwala S, Riedo FX, Chong M, et al. Characteristics and Outcomes of 21 Critically Ill Patients With COVID-19 in Washington State. JAMA. 2020. pmid:32191259
- 3. Kim L, Garg S, O’Halloran A, Whitaker M, Pham H, Anderson EJ, et al. Risk Factors for Intensive Care Unit Admission and In-hospital Mortality among Hospitalized Adults Identified through the U.S. Coronavirus Disease 2019 (COVID-19)-Associated Hospitalization Surveillance Network (COVID-NET). Clin Infect Dis. 2020.
- 4. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584: 430–436. pmid:32640463
- 5. Liang W, Liang H, Ou L, Chen B, Chen A, Li C, et al. Development and Validation of a Clinical Risk Score to Predict the Occurrence of Critical Illness in Hospitalized Patients With COVID-19. JAMA Intern Med. 2020. pmid:32396163
- 6. Bush WS, Oetjens MT, Crawford DC. Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet. 2016;17: 129–145. pmid:26875678
- 7. Carey DJ, Fetterolf SN, Davis FD, Faucett WA, Kirchner HL, Mirshahi U, et al. The Geisinger MyCode community health initiative: an electronic health record-linked biobank for precision medicine research. Genet Med. 2016;18: 906–913. pmid:26866580
- 8. Carroll RJ, Bastarache L, Denny JC. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics. 2014;30: 2375–2376. pmid:24733291
- 9. Saran R, Robinson B, Abbott KC, Agodoa, Bragg-Gresham J, Balkrishnan R, et al. US Renal Data System 2018 Annual Data Report: Epidemiology of Kidney Disease in the United States. Am J Kidney Dis. 2019;73: A7–A8. pmid:30798791
- 10. Hamer M, Kivimaki M, Gale CR, Batty GD. Lifestyle risk factors, inflammatory mechanisms, and COVID-19 hospitalization: A community-based cohort study of 387,109 adults in UK. Brain Behav Immun. 2020;87: 184–187. pmid:32454138
- 11. Hamer M, Gale CR, Kivimaki M, Batty GD. Overweight, obesity, and risk of hospitalization for COVID-19: A community-based cohort study of adults in the United Kingdom. Proc Natl Acad Sci U S A. 2020;117: 21011–21013. pmid:32788355
- 12. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, et al. OpenSAFELY: factors associated with COVID-19 death in 17 million patients. Nature. 2020.
- 13. Akalin E, Azzi Y, Bartash R, Seethamraju H, Parides M, Hemmige V, et al. Covid-19 and Kidney Transplantation. N Engl J Med. 2020;382: 2475–2477. pmid:32329975
- 14. Ng JH, Hirsch JS, Wanchoo R, Sachdeva M, Sakhiya V, Hong S, et al. Outcomes of patients with end-stage kidney disease hospitalized with COVID-19. Kidney Int. 2020.
- 15. Caillard S, Anglicheau D, Matignon M, Durrbach A, Greze C, Frimat L, et al. An initial report from the French SOT COVID Registry suggests high mortality due to Covid-19 in recipients of kidney transplants. Kidney Int. 2020. pmid:32853631
- 16. Petrilli CM, Jones SA, Yang J, Rajagopalan H, O’Donnell L, Chernyak Y, et al. Factors associated with hospital admission and critical illness among 5279 people with coronavirus disease 2019 in New York City: prospective cohort study. BMJ. 2020;369: m1966. pmid:32444366
- 17. Jose RJ, Manuel A. COVID-19 cytokine storm: the interplay between inflammation and coagulation. Lancet Respir Med. 2020;8: e46–e47. pmid:32353251
- 18. Li MY, Li L, Zhang Y, Wang XS. Expression of the SARS-CoV-2 cell receptor gene ACE2 in a wide variety of human tissues. Infect Dis Poverty. 2020;9: 45–x. pmid:32345362
- 19. Griffith G, Morris TT, Tudball M, Herbert A, Mancano G, Pike L, et al. Collider bias undermines our understanding of COVID-19 disease risk and severity. medRxiv. 2020: 2020.05.04.20090506.
- 20. Price-Haywood EG, Burton J, Fort D, Seoane L. Hospitalization and Mortality among Black Patients and White Patients with Covid-19. N Engl J Med. 2020;382: 2534–2543. pmid:32459916