Characterizing COVID-19 Clinical Phenotypes and Associated Comorbidities and Complication Profiles

Background: There is limited understanding of heterogeneity in outcomes across hospitalized patients with coronavirus disease 2019 (COVID-19). Identification of distinct clinical phenotypes may facilitate tailored therapy and improve outcomes. Objective: Identify specific clinical phenotypes across COVID-19 patients and compare admission characteristics and outcomes. Design, Settings, and Participants: Retrospective analysis of 1,022 COVID-19 patient admissions from 14 Midwest U.S. hospitals between March 7, 2020 and August 25, 2020. Methods: Ensemble clustering was performed on a set of 33 vitals and labs variables collected within 72 hours of admission. K-means based consensus clustering was used to identify three clinical phenotypes. Principal component analysis was performed on the average covariance matrix of all imputed datasets to visualize clustering and variable relationships. Multinomial regression models were fit to further compare patient comorbidities across phenotype classification. Multivariable models were fit to estimate the association between phenotype and in-hospital complications and clinical outcomes. Main outcomes and measures: Phenotype classification (I, II, III), patient characteristics associated with phenotype assignment, in-hospital complications, and clinical outcomes including ICU admission, need for mechanical ventilation, hospital length of stay, and mortality. Results: The database included 1,022 patients requiring hospital admission with COVID-19 (median age, 62.1 [IQR: 45.9-75.8] years; 481 [48.6%] male, 412 [40.3%] required ICU admission, 437 [46.7%] were white). Three clinical phenotypes were identified (I, II, III); 236 [23.1%] patients had phenotype I, 613 [60%] patients had phenotype II, and 173 [16.9%] patients had phenotype III. 
When grouping comorbidities by organ system, patients with respiratory comorbidities were most commonly characterized by phenotype III (p=0.002), while patients with hematologic (p<0.001), renal (p<0.001), and cardiac (p<0.001) comorbidities were most commonly characterized by phenotype I. The adjusted odds of respiratory (p<0.001), renal (p<0.001), and metabolic (p<0.001) complications were highest for patients with phenotype I, followed by phenotype II. Patients with phenotype I had far greater odds of hepatic (p<0.001) and hematological (p=0.02) complications than the other two phenotypes. Phenotypes I and II were associated with 7.30-fold (HR: 7.30, 95% CI: 3.11-17.17, p<0.001) and 2.57-fold (HR: 2.57, 95% CI: 1.10-6.00, p=0.03) increases in the hazard of death, respectively, when compared to phenotype III. Conclusion: In this retrospective analysis of patients with COVID-19, three clinical phenotypes were identified. Future research is urgently needed to determine the utility of these phenotypes in clinical practice and trial design.


Data Collection
The data source for this study included electronic health record (EHR) reports from 14 U.S. Midwest hospitals and 60 primary care clinics. Patient and hospital-level data were available for 7,538 patients with PCR-confirmed COVID-19. Of these, 1,022 required hospital admission and were included in this analysis. For each patient, the database included all comorbidities reported from March 29, 1997 up to their COVID-19 diagnosis. The database also included home medications, laboratory values, clinic visits, social history, and patient demographics (age, gender, race/ethnicity, language spoken, zip code, socioeconomic status indicators; Table 3). All comorbidities were identified based on ICD-9, ICD-10, or problem list documentation within the electronic health record. An indicator variable was created for each comorbidity to denote the presence of the selected ICD-9, ICD-10, or problem list documentation at any time in the medical record. To facilitate analysis, comorbidities were grouped by organ system into the following categories: cardiac, respiratory, hematologic, metabolic, renal, hepatic, autoimmune, cancer, and cerebrovascular disease.
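The indicator-variable construction and organ-system grouping described above can be sketched as follows. This is a minimal illustration rather than the study's code: the code-to-system mapping, the function name, and the patient records are hypothetical, and only three of the nine organ-system categories are shown.

```python
# Hypothetical mapping from ICD-10 category prefixes to organ-system groups.
# The study's actual code lists are not given in the text; these are illustrative.
SYSTEM_BY_PREFIX = {
    "I50": "cardiac",      # heart failure
    "J44": "respiratory",  # chronic obstructive pulmonary disease
    "J45": "respiratory",  # asthma
    "N18": "renal",        # chronic kidney disease
}
SYSTEMS = sorted(set(SYSTEM_BY_PREFIX.values()))


def comorbidity_indicators(codes):
    """Return a 0/1 indicator per organ system for one patient's code history."""
    flags = {system: 0 for system in SYSTEMS}
    for code in codes:
        system = SYSTEM_BY_PREFIX.get(code[:3])
        if system is not None:
            flags[system] = 1  # present at any time in the record
    return flags


# Made-up patient histories (not study data).
patients = {
    "p1": ["I50.9", "N18.3"],
    "p2": ["J45.909"],
    "p3": [],
}
indicators = {pid: comorbidity_indicators(codes) for pid, codes in patients.items()}
```

Each patient ends up with one binary flag per organ system, which is the form needed to tabulate comorbidities by phenotype.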

Complications and Clinical Outcomes
We selected 30 in-hospital complications measured during each patient's hospital stay for COVID-19, categorized into the following systems: cardiovascular, respiratory, hematologic, renal, hepatic, metabolic, and infectious (Supplemental Table 4). Where applicable, a complication could be counted under more than one organ system. For example, ventilator-associated pneumonia was included in both infectious and respiratory complications. Additional clinical outcomes included hospital length of stay (LOS), need for intensive care unit (ICU) admission, need for mechanical ventilation, and mortality.
Mortality was defined as any in-hospital or out-of-hospital death based on death certificate data. All complications and outcomes were followed for a minimum of 2 weeks following hospital admission.

Statistical Analysis
The overall rate of missingness of the 33 variables used for phenotyping, which included the first vitals and labs recorded for each inpatient within 72 hours of admission, was 19% (range 0%-50%; Figure 5).
While phenotypes II and III overlap substantially, phenotype I is more clearly defined on the right-hand side of the score plot of the first two principal components (Figure 1). Notably, this figure shows that distinctions between phenotypes are primarily driven by variation in PC1 as opposed to PC2. The variable contributions to PC1

Phenotype Characteristics
Differences across phenotypes with respect to patient demographics, admission vitals and labs, complications, comorbidities, and clinical outcomes are presented in Table 1. Patients with
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted September 14, 2020. All rights reserved. No reuse allowed without permission.
When grouping comorbidities by organ system, cardiac (p<0.001), respiratory (p=0.002), hematologic (p<0.001), and renal (p<0.001) comorbidities were found to be significantly associated with phenotype. Cancer, hepatic, autoimmune, cerebrovascular, and metabolic comorbidities were not significantly associated with phenotype (Table 1

Association between Phenotype and Clinical Outcomes
Clinical phenotypes I and II were associated with increased odds of respiratory complications (Table 2). There was a trend towards increased odds of hematologic complications among patients with phenotype I (OR: 2.11, 95% CI: 0.99-4.48, p=0.05) compared to phenotype III. Phenotype was associated with hepatic complications (p<0.001); however, while
Phenotype II was associated with a 2.57-fold (HR: 2.57, 95% CI: 1.10-6.00, p=0.03) increase in the hazard of death compared to phenotype III. We performed a sensitivity analysis to assess the impact of mortality as a competing risk by fitting the LOS model before and after removing the 127 patients who died. The estimated effect sizes were similar between these two models (data not shown).
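As a reading aid for the reported hazard ratios, the sketch below recovers the log-scale standard error, z statistic, and two-sided p-value from a ratio estimate and its 95% confidence interval. It assumes a Wald-type interval that is symmetric on the log scale; the text does not state how the intervals were computed, so this is an assumption, and the function name is hypothetical.

```python
import math


def wald_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover the log-scale standard error, z statistic, and two-sided p-value
    from a ratio estimate (HR or OR) and its 95% confidence interval.

    Assumes a Wald interval: exp(log(estimate) +/- z_crit * se).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Hazard ratios for death reported in the text, each compared with phenotype III.
se_ii, z_ii, p_ii = wald_from_ci(2.57, 1.10, 6.00)   # phenotype II
se_i, z_i, p_i = wald_from_ci(7.30, 3.11, 17.17)     # phenotype I
```

Under this assumption the recovered p-values are consistent with the reported p=0.03 for phenotype II and p<0.001 for phenotype I.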

Discussion
This is one of the first studies to report on clinical phenotypes associated with COVID-19. We identified three clinical phenotypes for patients with COVID-19 on hospital presentation. Most patients presented with phenotype II, which is associated with a moderate course and an approximately 10% mortality. A subset of patients presented with the more severe phenotype I, which is associated with a staggering 27% mortality. Patients with cardiac, hematologic, and renal comorbidities were most likely to be characterized by phenotype I. Surprisingly, respiratory comorbidities appeared less related to phenotypes I or II and were most associated with phenotype III, which had the most indolent course.
Despite this indolent course, patients with phenotype III had the highest rate of readmission, which is likely in part due to the high survival rate. This also suggests that patients with pre-existing respiratory comorbidities, while not at highest risk for mortality, may be at highest risk for long-term sequelae following COVID-19. Patients who presented with phenotype I were most strongly associated with the development of respiratory, hematologic, renal, metabolic, hepatic, and infectious complications.
Surprisingly, cardiovascular complications did not significantly differ between phenotypes.
Elucidating patient risk factors and severe COVID-19 disease markers may allow early treatment implementation that may improve the patient's outcome. Multiple studies have documented COVID-19 risk factors; however, most have done so from a homogeneous lens. For example, a prospective cohort study from New York City identified that the most considerable risks for hospital admission were age, male sex, heart failure, chronic kidney disease, and high BMI.22 A large observational study conducted in the UK reported that increasing age, male gender, and comorbidities such as cardiac disease, chronic lung disease, chronic kidney disease, and obesity were associated with higher mortality in COVID-19 positive patients admitted to the hospital.14 A study from China found that increased odds of in-hospital death due to COVID-19 were associated with older age, higher SOFA score, and D-dimers > 1.0 µg/mL on admission.23 Another retrospective study reported that patients with severe COVID-19 disease and diabetes had increased leukocyte and neutrophil counts and increased C-reactive protein (CRP), D-dimer, and fibrinogen levels.24 A systematic review and meta-analysis found that the biomarkers associated with increased mortality include higher CRP, higher D-dimers, increased creatinine, and lower albumin levels.25 However, it is well known that patients do not have a singular natural history of disease. Multiple studies, including this one, found that only half of patients suffer a primarily respiratory disease.26,27
Similar to our analysis, they identified three distinct clinical phenotypes. Their low mortality cluster, which they called cluster 1, was very similar to our phenotype III, with a predominance of females, a lower mortality rate, and lower D-dimer and CRP levels.
Similarly, their high mortality cluster was predominantly male, with elevated inflammation markers on ICU presentation. In this study, we not only characterized three clinical phenotypes, but extended findings outside of the ICU by characterizing the association of comorbidities with clinical phenotype and the association of clinical phenotypes with in-hospital complication and clinical outcomes.
Phenotype I can be termed the "Adverse phenotype" and was associated with the worst clinical outcomes. LDH, Absolute Neutrophil Count, D-dimer, AST, and CRP were most influential in phenotype I determination. The strong association of RDW with phenotype I was interesting. RDW was strongly associated with genetic age, which is hypothesized to be a risk factor in COVID-19.30 As people age, variability in red blood cell volumes increases. Similarly, Gamma Gap, a marker of immunoglobulin levels, was elevated in all three phenotypes (median > 3.5).32 However, patients with clinical phenotype I were noted to have the largest increase in Gamma Gap. In this scenario, elevated Gamma Gap was likely an indicator of systemic inflammation and has been associated with prognosis in other inflammatory disease processes. Other groups have previously reported on the importance of the absolute neutrophil to absolute lymphocyte count ratio (ANC/ALC); here we noted that ANC/ALC was lowest for phenotype III and highest for phenotype I, in line with previous reports.
Patients with cardiac, hematologic and renal comorbidities were most prone to develop phenotype I.
Phenotype I was associated with numerous complications (hematologic, hepatic, metabolic, renal, respiratory, and infectious) when compared to other phenotypes. It is interesting to note that, despite a higher rate of baseline cardiac comorbidities, phenotype I was not associated with increased cardiac complications.
Phenotype III was associated with the best clinical outcomes and can be termed the "Favorable Phenotype". Surprisingly, patients with phenotype III had a very high rate of respiratory comorbidities and the best clinical outcomes. What is most surprising is that, despite the lowest complication rate and mortality, this phenotype was associated with a greater than 10% rate of hospital readmission. It is possible that patients' pre-existing respiratory comorbidities predisposed them to longer-term sequelae, which may have resulted in this readmission rate, although additional studies are needed to better elucidate these findings, specifically controlling for differences in survival. Patients with respiratory comorbidities such as asthma and COPD routinely use medications that may be protective in SARS-CoV-2 pathogenesis, which may explain the favorable outcomes observed. For example, our group has previously identified reduced mortality in COVID-19 for patients with asthma treated with beta2-agonists.16 Patients with phenotype III were more likely to use inhaled steroids, nasal fluticasone, albuterol, and antihistamines.
Ultimately, a deeper investigation into clinical phenotypes and associated genomic, transcriptomic, and proteomic data is needed. The ability to classify patients into clinical phenotypes can facilitate the linkage of exome data to better understand SARS-CoV-2 pathogenesis and natural history.

Limitations
Our study has several limitations, including that this is a retrospective study and therefore results may be biased or subject to residual confounding. Second, patients were followed for

NIH NHLBI T32HL129956 (JP, LS)

Figure 4: Relative Risk Ratio of Comorbidities to Clinical Phenotypes
Relative Risk ratios of comorbidities of phenotypes I and II compared to the reference group phenotype III.
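To make the quantity in this figure concrete, the sketch below computes an unadjusted relative risk ratio from counts. In a multinomial logistic model with a single binary predictor, the exponentiated coefficient equals this ratio; any covariate adjustment the study's models may have applied is not reproduced here. The function name and all counts are made up for illustration and are not study data.

```python
def relative_risk_ratio(exposed, unexposed, outcome, reference):
    """Unadjusted relative risk ratio for `outcome` vs `reference`, comparing
    patients with a comorbidity (exposed) to those without (unexposed).

    `exposed` and `unexposed` map phenotype label -> patient count.
    """
    ratio_exposed = exposed[outcome] / exposed[reference]
    ratio_unexposed = unexposed[outcome] / unexposed[reference]
    return ratio_exposed / ratio_unexposed


# Made-up counts for illustration only (not taken from the study).
with_comorbidity = {"I": 60, "II": 90, "III": 15}
without_comorbidity = {"I": 40, "II": 120, "III": 30}

rrr_i = relative_risk_ratio(with_comorbidity, without_comorbidity, "I", "III")
rrr_ii = relative_risk_ratio(with_comorbidity, without_comorbidity, "II", "III")
```

A ratio above 1 means the comorbidity shifts patients towards that phenotype relative to the reference phenotype III.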

Supplemental Figure 1: Consensus Cumulative Distribution Functions
Cumulative distribution functions (CDF) for a randomly selected imputed dataset are shown. A range of phenotypes (2-7) were considered, and the optimal choice of phenotypes is 3.

Supplemental Figure 2: Delta Area
The relative change in delta area under the cumulative distribution function is shown for the range of phenotypes (k=2-7) for a randomly selected imputed dataset. The optimal choice of phenotypes is 3.
medRxiv preprint doi: https://doi.org/10.1101/2020.09.12.20193391

Supplemental Figure 3: Consensus matrix with 3 clusters
A consensus matrix heatmap is shown for a randomly selected imputed dataset clustered into 3 phenotypes. The heatmap allows visualization of consensus cluster assignments to evaluate cluster stability. Darker shades of green indicate higher stability.

Supplemental Figure 4: Consensus matrix with 4 clusters
A consensus matrix heatmap is shown for a randomly selected imputed dataset clustered into 4 phenotypes. The heatmap allows visualization of consensus cluster assignments to evaluate cluster stability. Darker shades of green indicate higher stability. The choice of 4 clusters shows less stability than 3 clusters (see Supplemental Figure 3).
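The consensus matrices shown in these heatmaps can be sketched as below. This is an illustrative implementation under stated assumptions, not the study's pipeline: the hand-written k-means, the 80% subsampling fraction, the number of resamples, and the synthetic data are all assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns a cluster label for each row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def consensus_matrix(X, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of rows shares a cluster,
    among the resamples in which both rows were drawn."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


# Synthetic two-dimensional data with three groups (not study data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
C = consensus_matrix(X, k=3)
```

Entries near 1 or 0 indicate pairs that are consistently clustered together or apart, which is what a stable choice of cluster number looks like in the heatmap.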

Supplemental Figure 5: Cumulative Proportion of Variance Explained
The proportion of variance explained by each principal component is summed over all principal components. For example, PC1 and PC2 cumulatively explain 20% of the variation in the dataset.
Abbreviations: PC1 (principal component 1); PC2 (principal component 2)
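The cumulative proportion of variance explained can be computed as sketched below, applying principal component analysis to the covariance matrix averaged over imputed datasets, as the abstract describes. The function name and the random stand-in data are assumptions, the text does not say whether variables were standardized first, and NumPy is assumed to be available.

```python
import numpy as np


def cumulative_variance_explained(imputed_datasets):
    """PCA on the covariance matrix averaged over imputed datasets.

    Returns the cumulative proportion of variance explained by PC1, PC1+PC2, ...
    """
    covs = [np.cov(data, rowvar=False) for data in imputed_datasets]
    mean_cov = np.mean(covs, axis=0)
    eigvals = np.linalg.eigvalsh(mean_cov)[::-1]  # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)         # guard against tiny negatives
    return np.cumsum(eigvals) / eigvals.sum()


# Stand-in "imputed datasets": random data, not the study's 33 variables.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 5))
imputed = [base + rng.normal(scale=0.05, size=base.shape) for _ in range(3)]
cum_var = cumulative_variance_explained(imputed)
```

The second entry of the returned array is the share of variance captured jointly by PC1 and PC2, the quantity quoted in the caption.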

Supplemental Figure 6: Comorbidities by Phenotype
Chord diagram illustrates the prevalence of comorbidities (% observed) for the three clinical phenotypes.

Supplemental Figure 7: Complications by Phenotype
Chord diagram illustrates the prevalence of complications (% observed) for the three clinical phenotypes.

Supplemental Figure 8: Clinical Outcomes by Phenotype
Chord diagram illustrates the prevalence of clinical outcomes (% observed) for the three clinical phenotypes.
Tables: Table 1

Score Plot: PC2 vs. PC1

Contribution of Variables to PC1
Figure 3

Figure 4