Electronic health record analysis identifies kidney disease as the leading risk factor for hospitalization in confirmed COVID-19 patients

Background: Empirical data on conditions that increase the risk of coronavirus disease 2019 (COVID-19) progression are needed to identify high-risk individuals. We performed a comprehensive quantitative assessment of pre-existing clinical phenotypes associated with COVID-19-related hospitalization.
Methods: We conducted a phenome-wide association study (PheWAS) of SARS-CoV-2-positive patients from an integrated health system (Geisinger) with system-level outpatient/inpatient COVID-19 testing capacity, using retrospective electronic health record (EHR) data to assess pre-pandemic clinical phenotypes associated with hospital admission (hospitalization).
Results: Of 12,971 individuals tested for SARS-CoV-2 at Geisinger with sufficient pre-pandemic EHR data, 1,604 were SARS-CoV-2 positive and 354 required hospitalization. We identified 21 clinical phenotypes in 5 disease categories meeting phenome-wide significance (P < 1.60×10⁻⁴), including six kidney phenotypes, e.g. end-stage renal disease or stage 5 chronic kidney disease (CKD) (OR = 11.07, p = 1.96×10⁻⁸); six cardiovascular phenotypes, e.g. congestive heart failure (OR = 3.8, p = 3.24×10⁻⁵); five respiratory phenotypes, e.g. chronic airway obstruction (OR = 2.54, p = 3.71×10⁻⁵); and three metabolic phenotypes, e.g. type 2 diabetes (OR = 1.80, p = 7.51×10⁻⁵). Additional analyses defining CKD by estimated glomerular filtration rate (eGFR) confirmed a high risk of hospitalization associated with pre-existing stage 4 CKD (OR 2.90, 95% CI: 1.47, 5.74), stage 5 CKD/dialysis (OR 8.83, 95% CI: 2.76, 28.27), and kidney transplant (OR 14.98, 95% CI: 2.77, 80.8), but not with stage 3 CKD (OR 1.03, 95% CI: 0.71, 1.48).
Conclusions: This study provides quantitative estimates of the contribution of pre-existing clinical phenotypes to COVID-19 hospitalization and highlights kidney disorders as the factors most strongly associated with hospitalization in an integrated US healthcare system.


Introduction
Coronavirus disease 2019 (COVID-19) is an emerging illness caused by infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The World Health Organization declared COVID-19 a pandemic in March 2020. The United States reported its first case on January 22, 2020; by October 12, 2020, there were >7,740,000 total cases and >214,000 deaths (cdc.gov). The severity of COVID-19 illness varies widely, from asymptomatic infection [1] to severe complications requiring hospitalization [2]. Several pre-existing conditions have been identified as risk factors for COVID-19-related hospitalization and death [3,4]. A recent study developed a risk score that predicted progression to intensive care in hospitalized patients based on presenting and pre-existing risk factors (e.g. chest radiographic abnormality, hemoptysis, dyspnea, history of cancer, and other comorbidities) [5]. Comprehensive quantitative data on the contribution of pre-existing conditions to COVID-19 severity are still needed. We applied an agnostic, cross-disease approach [6] to data captured in the electronic health records (EHRs) of SARS-CoV-2-positive patients to identify associations between pre-existing conditions and COVID-19-related hospitalization.

Methods
This study was conducted at Geisinger, an integrated health system in central and northeastern Pennsylvania [7], and was reviewed and approved by the Geisinger Institutional Review Board. This analysis includes patients with a laboratory-confirmed diagnosis of COVID-19 reported between March 7, 2020 and May 19, 2020. All patients had symptoms that met CDC screening criteria for COVID-19 at the time of testing.
International Classification of Diseases, Ninth and Tenth Revision (ICD-9 and ICD-10) diagnosis codes and the last outpatient serum creatinine value dated before January 1, 2020 were extracted from patients' EHRs. Potential risk factor phenotypes were defined by PheCodes mapped from ICD codes using PheCode Map 1.2 [8] (https://phewascatalog.org/phecodes). For each individual, duplicate occurrences of a PheCode on the same date were collapsed so that at most one occurrence per date remained for each PheCode. Cases for a phenotype were defined as individuals with at least three occurrences of the PheCode; individuals with one or two occurrences were excluded from the analysis of that phenotype, and the remaining individuals were classified as controls. To ensure that study participants had been adequately assessed during clinical care, we restricted the analyses to individuals who were cases for at least one phenotype, which indicates that they had been clinically assessed on at least three distinct occasions. We required at least 20 cases and 20 controls for each phenotype among the 1,604 SARS-CoV-2-positive subjects, yielding 313 distinct phenotypes. S1 Fig shows a flow diagram of the study design. Phenotype names used in the PheWAS results follow the terminology of PheCode Map 1.2 exactly.
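The case/control rules above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `classify_phenotypes` and the input format (one `(patient_id, phecode, encounter_date)` tuple per ICD code already mapped to a PheCode) are hypothetical, and the records are assumed to be restricted to the analysis cohort beforehand.

```python
from collections import defaultdict


def classify_phenotypes(records, case_threshold=3, min_cases=20, min_controls=20):
    """Assign cases and controls per PheCode following the rules in Methods (sketch).

    records: iterable of (patient_id, phecode, encounter_date) tuples, one per
             ICD code already mapped to a PheCode (hypothetical input format).
    Returns {phecode: {"cases": set_of_ids, "controls": set_of_ids}}.
    """
    # Collapse duplicate occurrences of a PheCode on the same date.
    dates = defaultdict(set)  # (patient_id, phecode) -> set of distinct dates
    for pid, code, day in records:
        dates[(pid, code)].add(day)
    counts = {key: len(days) for key, days in dates.items()}

    # Keep only patients who are a case (>= 3 distinct dates) for at least one
    # PheCode, i.e. who were clinically assessed on at least three occasions.
    eligible = {pid for (pid, _), n in counts.items() if n >= case_threshold}

    result = {}
    for code in sorted({code for _, code in counts}):
        cases, excluded = set(), set()
        for pid in eligible:
            n = counts.get((pid, code), 0)
            if n >= case_threshold:
                cases.add(pid)
            elif n > 0:
                excluded.add(pid)  # 1-2 occurrences: dropped for this PheCode only
        controls = eligible - cases - excluded
        # Keep only PheCodes with enough cases and controls to test.
        if len(cases) >= min_cases and len(controls) >= min_controls:
            result[code] = {"cases": cases, "controls": controls}
    return result
```

Counting distinct dates rather than raw code rows is what makes "three occurrences" mean three separate clinical encounters; the exclusion of one-or-two-occurrence individuals is applied per PheCode, so the same person can be a control for one phenotype and excluded from another.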

Statistics
A phenome-wide association study (PheWAS) was performed to identify pre-existing conditions associated with hospitalization of patients with SARS-CoV-2 infection. Associations were tested with Firth's logistic regression [8], adjusted for age, sex, and race. Odds ratios (ORs) indicate the relative odds of COVID-19-related hospital admission given the presence of a pre-existing phenotype. We defined phenome-wide significance using a Bonferroni-corrected p-value threshold for the number of clinical PheCodes tested (p < 0.05/313 = 1.60×10⁻⁴).
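To make the model concrete, the sketch below fits a Firth-penalized logistic regression with NumPy and computes the Bonferroni threshold used above. It is an illustrative sketch only, not the software used in this study: the function name `firth_logit` is hypothetical, covariates (age, sex, race) are assumed to be supplied as columns of the design matrix, and two-sided Wald p-values are used for brevity, whereas penalized likelihood-ratio tests are a common alternative.

```python
import math

import numpy as np


def firth_logit(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression fitted by Newton-Raphson (sketch).

    X: (n, k) design matrix that already includes an intercept column, e.g.
       [1, phenotype_indicator, age, sex, race dummies].
    y: length-n vector of 0/1 outcomes (1 = hospitalized).
    Returns (coefficients, standard errors, two-sided Wald p-values).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X))  # (X'WX)^-1
        # Leverages: diagonal of W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: X'(y - p + h * (1/2 - p))
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        largest = np.max(np.abs(step))
        if largest > max_step:  # cap the step size for numerical stability
            step *= max_step / largest
        beta = beta + step
        if largest < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ (w[:, None] * X))))
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    pvals = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in beta / se])
    return beta, se, pvals


# Bonferroni-corrected phenome-wide significance threshold from the text.
N_PHECODES = 313
PHENOME_WIDE_ALPHA = 0.05 / N_PHECODES  # about 1.60e-4
```

For one phenotype, the OR reported in the PheWAS corresponds to `exp(beta[1])` when the phenotype indicator is the second column; that phenotype is phenome-wide significant if its p-value falls below `PHENOME_WIDE_ALPHA`. Firth's penalty keeps estimates finite even when a rare phenotype perfectly separates hospitalized from non-hospitalized patients, which is why it suits phenotypes with as few as 20 cases.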

Results
Of 18,372 individuals tested for SARS-CoV-2 at Geisinger between March 7, 2020 and May 19, 2020, 15,707 tested negative, 2,665 tested positive, and 565 were admitted to the hospital. Of these, 12,971 met the inclusion criteria for the PheWAS (Methods). Among the 12,971 tested patients included in the PheWAS, 1,604 were positive for SARS-CoV-2, of whom 354 (22.1%) were admitted to the hospital (Table 1; demographics). Admitted patients were more likely to be older and male (p < 0.0001, Table 1). Of the 354 hospitalized patients, 106 were admitted to the ICU, 70 required ventilation, 71 died, and 54 remained hospitalized as of May 19, 2020.
We performed a PheWAS to test for associations between COVID-19-related hospital admission and 313 clinical phenotypes (Fig 1; Table 2). Phenotypes reaching phenome-wide significance (p < 1.60×10⁻⁴) fell into five disease categories, including renal, cardiovascular, respiratory, and metabolic phenotypes. In addition to the PheWAS, we used several extensively validated algorithms to define disease phenotypes from EHR data (Table 1, S1 Methods). Several of these phenotypes were significantly more frequent in hospitalized patients, and chronic kidney disease was the most strongly associated with hospitalization (S2 Fig). In analyses using eGFR and United States Renal Data System (USRDS) data, stage 4 CKD (OR 2.90, 95% CI: 1.47, 5.74) and stage 5 CKD/dialysis (OR 8.83, 95% CI: 2.76, 28.27) were associated with an increased risk of COVID-19 hospitalization, whereas stage 3 CKD was not (OR 1.03, 95% CI: 0.71, 1.48). Five of 7 patients (71%) with a history of kidney transplant were hospitalized (OR 14.98, 95% CI: 2.77, 80.88). Among all 565 hospitalized patients, including those excluded from the PheWAS because of limited EHR data, 122 had a history of CKD. These 122 patients had a higher metabolic burden than hospitalized patients without CKD, were typically older, and had significantly higher rates of death (Table 3).
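The eGFR-based CKD categories above can be illustrated with a short sketch. The exact definitions are given in S1 Methods, which is not reproduced here, so the following are assumptions: eGFR is estimated from the last outpatient serum creatinine with the CKD-EPI 2009 creatinine equation, stages use the conventional GFR cut-offs (stage 3: 30 to 59, stage 4: 15 to 29, stage 5: below 15 mL/min/1.73 m²), and the `on_dialysis` and `kidney_transplant` flags are hypothetical stand-ins for USRDS status.

```python
def egfr_ckd_epi_2009(scr_mg_dl, age, female, black=False):
    """CKD-EPI 2009 creatinine equation, in mL/min/1.73 m^2 (assumed equation)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def ckd_stage(egfr, on_dialysis=False, kidney_transplant=False):
    """Map one eGFR value plus hypothetical USRDS-style flags to the Results categories."""
    if kidney_transplant:
        return "kidney transplant"
    if on_dialysis or egfr < 15:
        return "stage 5 / dialysis"
    if egfr < 30:
        return "stage 4"
    if egfr < 60:
        return "stage 3"
    return "no CKD stage 3-5"
```

For example, a 60-year-old man with a serum creatinine of 1.0 mg/dL has an estimated eGFR of about 81 under these assumptions and would not fall into stage 3 to 5. Note that a single creatinine value cannot establish chronicity, which clinical CKD staging normally requires.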

Discussion
The COVID-19 outbreak has spurred unprecedented efforts to characterize the biological and clinical aspects of the disease [10,11]. COVID-19 is the first pandemic of the digital health age, which has enabled rapid epidemiologic studies [1,13]. Data from government agencies such as the Centers for Medicare & Medicaid Services (cms.gov/covid-19-data-snapshot-fact-sheet) encompass large cohorts but are mostly snapshots that lack granular data and rely heavily on claims and provider-supplied data. Here, we used data from an integrated health system with outpatient and inpatient COVID-19 testing capacity and a PheWAS design to comprehensively analyze clinical phenotypes associated with increased risk of COVID-19-related hospital admission. To control for potential bias related to exposure to SARS-CoV-2, we limited our study population to SARS-CoV-2-positive patients screened at Geisinger. Additional analyses using eGFR and USRDS data confirmed that patients with stage 4-5 CKD, ESRD on dialysis, or a kidney transplant are at extremely high risk for severe complications of COVID-19 (Table 4). These findings complement those of the OpenSAFELY study, which reported similar results but was limited by its inclusion of clinically suspected (not laboratory-confirmed) COVID-19 [12], as well as CMS reports showing a higher risk of hospitalization among patients with ESRD (not adjusted for covariates). Our finding of a high risk of hospitalization in kidney transplant recipients mirrors a case series of 36 consecutive kidney transplant recipients at Montefiore, of whom 28 (78%) were hospitalized [13]. The findings reported here identify comorbidities that affect the clinical course of COVID-19 and may help identify individuals at greatest risk of COVID-19-related complications.
Most of the conditions associated with increased risk of COVID-19-related hospital admission, including diabetes, heart failure, hypertension, and chronic kidney disease, have been suggested in previous studies [14-16]. What is striking in our results is the magnitude of the kidney disease-related risk: patients with end-stage renal disease had 11-fold increased odds of hospitalization (Table 2). How clinical conditions increase the risk of COVID-19-related complications is not yet fully understood. The physiological stress caused by an excessive inflammatory response to SARS-CoV-2 infection could destabilize organs already weakened by chronic disease [17]. Alternatively, direct organ-specific injury from SARS-CoV-2 infection could act as a "second hit" to these organs. Consistent with this hypothesis, the kidney and heart are among the tissues with the highest expression of ACE2, a SARS-CoV-2 receptor [18].
The current study has several limitations. The sample size is relatively small, and the available data are limited to information captured in the EHR. Nevertheless, we identified several highly significant traits associated with hospitalization, many of which are consistent with previous reports. The study population is subject to potential bias from the availability of testing, which largely excluded asymptomatic individuals and was enriched for nursing home residents and healthcare workers [19]. To partially mitigate this, we restricted our analyses to individuals tested within a single health system. This population is predominantly Caucasian, which may limit the generalizability of our findings to other racial and ethnic groups; a recent study indicated that hospitalization rates for COVID-19 may differ between racial and ethnic groups [20].
In conclusion, this study leverages extensive longitudinal EHR data from before the COVID-19 pandemic to identify pre-existing clinical phenotypes associated with increased risk of COVID-19 hospitalization. These results provide key information for public policymakers, highlighting the need to prevent COVID-19-related illness in patients with kidney disease and other high-risk conditions.
Supporting information
S1 Table.