Trends in COVID-19 patient characteristics in a large electronic health record database in the United States: A cohort study

Background Electronic health record (EHR) databases provide an opportunity to characterize patients with COVID-19 and to describe trends in their characteristics over time. Methods Patients with COVID-19 were identified based on an ICD-10 diagnosis code for COVID-19 (U07.1) and/or a positive SARS-CoV-2 viral lab result from January 2020 to November 2020. Patients were characterized in terms of demographics, healthcare utilization, clinical comorbidities, therapies, laboratory results, and procedures/care received, including critical care and intubation/ventilation. These characteristics and the occurrence of death were described overall and by month. Results There were 393,773 patients with COVID-19 and 56,996 with a COVID-19 associated hospitalization. A greater percentage of patients hospitalized with COVID-19 relative to all COVID-19 cases were older, male, African American, and lived in the Northeast and South. The most common comorbidities before admission/infection date were hypertension (40.8%), diabetes (29.5%), and obesity (23.8%), and the most common diagnoses during hospitalization were pneumonia (59.6%), acute respiratory failure (44.8%), and dyspnea (28.0%). A total of 85.7% of patients hospitalized with COVID-19 had CRP values > 10 mg/L, 75.5% had fibrinogen values > 400 mg/dL, and 76.8% had D-dimer values > 250 ng/mL. Median values for platelets, CRP, lactate dehydrogenase, D-dimer, and fibrinogen tended to decrease from January-March to November. The use of chloroquine/hydroxychloroquine during hospitalization peaked by March (71.2%), was rare by May (5.1%), and fell below 1% afterwards, while the use of remdesivir had increased by May (10.0%), followed by dexamethasone by June (27.7%). All-cause mortality was 3.2% overall and 15.0% among those hospitalized; 21.0% received critical care and 16.0% received intubation/ventilation/ECMO. Conclusions This study characterizes US patients with COVID-19 and their management during hospitalization over the first eleven months of this disease pandemic.


Introduction
Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) was first reported in Wuhan, China at the end of 2019 [1]. On March 11, 2020, the World Health Organization (WHO) declared coronavirus disease 2019 (COVID-19), the disease that SARS-CoV-2 causes, a pandemic. As of June 28, 2021, there were approximately 181 million confirmed cases and 3.9 million deaths worldwide, including approximately 33.6 million confirmed cases and 604,000 deaths in the United States (US) [2].
As a novel disease, COVID-19 requires an extensive description of patients' characteristics and correlates, along with information on management and outcomes to inform prevention and treatment strategies, especially among immunocompromised patients such as those with cancer [3]. Much of the existing description is in the form of case reports or case series, collected at a single site or hospital system, or one time point [4,5]. To address the limited availability of large-scale, longitudinal, geographically diverse information on patients with COVID-19, Optum has developed a large electronic health record (EHR) database sourced from providers actively diagnosing and treating patients with COVID-19 throughout the US. This centralized EHR database of patients with COVID-19 provides a source within which to characterize patients with COVID-19 and evaluate their clinical outcomes, along with time trends in the first year of the pandemic.

Study design, setting, and participants
This study included patients with COVID-19 identified between January 2020 and November 2020. The subset of patients with COVID-19 who were hospitalized during the same period was also identified. The presence of COVID-19 infection was based on a record of a specific International Classification of Diseases, Tenth Revision (ICD-10) diagnosis code for COVID-19 (U07.1) and/or a positive SARS-CoV-2 viral test, which included real-time reverse transcription-polymerase chain reaction (RT-PCR) tests and nucleic acid amplification tests (NAATs). Patients with serologic antibody tests only (i.e., with neither a confirmatory COVID-19 diagnosis code nor a positive SARS-CoV-2 viral laboratory test result) were not included. The infection date was set as the earlier of the diagnosis date and the positive lab test result date. The cohort entry date for the hospitalized patients was set to the later of the infection date and the hospital admission date, as some patients were already hospitalized when the diagnosis of infection was made, and some patients were admitted to the hospital after being diagnosed. Setting the cohort entry date in this way allows for the description of patients' clinical characteristics at the time of hospitalization with COVID-19. The presence of comorbidities and concomitant medication use was assessed in the 21 days before and including the cohort entry date. Patient characteristics were captured from cohort entry to the earlier of the discharge date or 30 days after cohort entry.
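The date rules described above can be illustrated with a minimal Python sketch. This is not the study's code (the analyses were conducted in SAS), and the record layout and all field and function names are hypothetical, introduced only to make the logic concrete.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass
class PatientRecord:
    # Hypothetical field names; the database schema is not described in the text.
    diagnosis_date: Optional[date]      # first U07.1 diagnosis code, if any
    positive_test_date: Optional[date]  # first positive viral test, if any
    admission_date: Optional[date]      # hospital admission, if hospitalized
    discharge_date: Optional[date]      # hospital discharge, if known


def infection_date(p: PatientRecord) -> Optional[date]:
    """Earlier of the diagnosis date and the positive test date."""
    candidates = [d for d in (p.diagnosis_date, p.positive_test_date) if d is not None]
    return min(candidates) if candidates else None


def cohort_entry_date(p: PatientRecord) -> Optional[date]:
    """Later of the infection date and the admission date (hospitalized patients only)."""
    inf = infection_date(p)
    if inf is None or p.admission_date is None:
        return None
    return max(inf, p.admission_date)


def baseline_window(entry: date, days: int = 21) -> Tuple[date, date]:
    """Window covering the 21 days before and including cohort entry."""
    return entry - timedelta(days=days), entry


def follow_up_end(entry: date, discharge: Optional[date], days: int = 30) -> date:
    """Earlier of the discharge date or 30 days after cohort entry."""
    limit = entry + timedelta(days=days)
    return min(discharge, limit) if discharge is not None else limit


# Example: a patient admitted on March 30 who tested positive on April 1.
example = PatientRecord(
    diagnosis_date=date(2020, 4, 3),
    positive_test_date=date(2020, 4, 1),
    admission_date=date(2020, 3, 30),
    discharge_date=date(2020, 4, 10),
)
entry = cohort_entry_date(example)
```

In this example the patient was already hospitalized when the infection was identified, so the cohort entry date is the infection date (April 1) rather than the admission date.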

Variables
All variables in this study were taken from the structured EHR data and were code-based. Patient demographics were derived from the EHR at the time of cohort entry (age in years, sex [male/female], race [African American/Asian/White/Other or Unknown], ethnicity [Hispanic/Not Hispanic/Unknown], region [Northeast/Midwest/South/West/Other or Unknown]). Comorbidities (diabetes, obesity, COPD, asthma, hypertension, coronary artery disease, congestive heart failure, liver disease, cancer) were assessed based on ICD-10 codes in the 21 days before or on the cohort entry date. Patient-reported medication use (statins, angiotensin-converting enzyme inhibitors/angiotensin receptor blockers [ACEs/ARBs], nonsteroidal anti-inflammatory drugs [NSAIDs], corticosteroids for systemic use, and proton pump inhibitors [PPIs]) was assessed via Anatomic Therapeutic Chemical (ATC) codes over the same 21-day window. For those hospitalized, information was collected on the duration of the hospitalization, vital signs (temperature, oxygen saturation), and laboratory tests and biomarkers (platelet count, C-reactive protein [CRP], ferritin, total lactate dehydrogenase, D-dimer, fibrinogen) via the initial occurrence of Logical Observation Identifiers Names and Codes (LOINCs) on or after hospital admission. Presenting symptoms or complications (hypoxemia, fever, cough, nausea/vomiting, malaise and fatigue, dyspnea/shortness of breath, acute respiratory failure, pneumonia, sepsis, coagulation defects or hemorrhagic conditions, arrhythmia, heart failure, myocardial infarction) during hospitalization were assessed via ICD-10 codes. Medications administered (chloroquine/hydroxychloroquine, lopinavir/ritonavir, remdesivir, dexamethasone, ACEs/ARBs, anticoagulants, immunosuppressants, antibacterials for systemic use, antivirals for systemic use, and corticosteroids for systemic use) were assessed via ATC and Optum proprietary codes.
During follow-up, mechanical ventilation (including intubation/ventilation/extracorporeal membrane oxygenation) and critical care were ascertained using Current Procedural Terminology (CPT) and ICD-10 procedure codes during hospitalization. Mortality was determined via linkage to the Social Security Administration's Death Master File or as indicated within the medical record. Data lag for the Death Master File is approximately 6-9 months, while the lag for death indicated in the medical record is approximately 2 months.

Data source
Given the importance of describing the clinical course of infection with COVID-19, Optum developed a low latency data pipeline that balances shortened data lag with the completeness of clinical information. The patients in this study were identified from Optum's EHR Database, which is derived from the electronic health records of a network of healthcare provider organizations across the United States that includes more than 700 hospitals and 7000 clinics. A total of 48% of the contributing electronic medical record (EMR)/EHR systems are from the Midwest, 20% from the South, 20% from the Northeast, and 8% from the West. This database incorporates clinical and medical administrative data from both inpatient and ambulatory EMRs, practice management systems, and numerous other internal systems. The data are processed from across the continuum of care, including acute inpatient stays and outpatient visits, and are incorporated into the database on a biweekly basis, representing a near-real-time capture of information. Optum's COVID-19 EHR database captures diagnostics specific to the COVID-19 patient during initial presentation, acute illness, and convalescence with over 500 mapped labs and bedside observations, including COVID-19 specific testing. The database is certified as de-identified by an independent statistical expert following Health Insurance Portability and Accountability Act (HIPAA) statistical de-identification rules and managed according to Optum customer data use agreements. The database was de-identified before the authors gained access. Data were extracted on December 10, 2020.

Statistical methods
Frequencies and percentages were reported for binary and categorical variables, while medians and interquartile ranges (IQRs) were reported for continuous variables. Characterizations were described and visualized overall and by cohort entry month or week. All analyses were conducted in SAS 9.4 (SAS Institute Inc., Cary, NC). Strengthening the Reporting of Observational studies in Epidemiology (STROBE) guidelines were followed in the reporting of this research [6].
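As an illustration of these summary statistics, the following Python sketch computes frequencies with percentages and medians with quartiles using only the standard library. It is not the study's SAS code; the function names are hypothetical, and the quartile interpolation method used here (`method="inclusive"`) is an assumption that may differ from the percentile definition applied in the original analysis.

```python
from collections import Counter
from statistics import median, quantiles


def frequency_table(values):
    """Return {category: (count, percent)} for a binary or categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


def median_iqr(values):
    """Return (median, Q1, Q3) for a continuous variable."""
    # Assumption: linear interpolation with inclusive endpoints.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Toy inputs for illustration only; these are not study data.
sex_summary = frequency_table(["M", "F", "F", "F"])
age_summary = median_iqr([1, 2, 3, 4, 5])
```

Stratified summaries, such as those by cohort entry month, follow by applying the same functions to each month's subset of patients.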

Results
The flow chart for the identification of COVID-19 cases within the Optum COVID-19 EHR database is shown in Fig 1. There were 393,773 patients with COVID-19, of whom 56,996 had a COVID-19 associated hospitalization. Among hospitalized patients, 21.0% received critical care and 9,136 (16.0%) were treated with mechanical ventilation. There were 12,456 deaths among all identified cases of COVID-19, corresponding to an overall mortality of 3.2%, and 8,526 deaths among patients hospitalized with COVID-19, corresponding to a mortality of 15.0%.
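The percentages above follow directly from the reported counts, using the cohort sizes given in the abstract (393,773 cases and 56,996 hospitalizations). The short Python check below is illustrative only; the helper name `percent` is hypothetical.

```python
def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


ALL_CASES = 393_773      # patients with COVID-19
HOSPITALIZED = 56_996    # patients with a COVID-19 associated hospitalization

overall_mortality = percent(12_456, ALL_CASES)      # 3.2
hospital_mortality = percent(8_526, HOSPITALIZED)   # 15.0
ventilated = percent(9_136, HOSPITALIZED)           # 16.0
```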
Basic demographics for patients with COVID-19 and the subset of patients hospitalized are displayed in Fig 2. A greater percentage of patients hospitalized with COVID-19 relative to all COVID-19 cases were older (35.0% aged 70 years and older among hospitalized vs. 13.8% among all cases), male (49.5% vs. 44.6%), and African American (20.5% vs. 12.9%). Fig 3 demonstrates the trend and distribution of COVID-19 cases biweekly by region in the left bar of each pair. Three peaks in case counts were observed: the first in late March to April, the second in late June to July, and the third and largest in early November. A greater percentage of patients were from the Northeast in March, April, and May, while from June through November the largest share of patients was from the Midwest.
The trend and distribution of patients hospitalized with COVID-19 were also visualized in the right bar of each pair in Fig 3. The number of hospitalized cases peaked in early April when the majority were from the Northeast. In the summer months, hospitalized cases declined overall though a larger percentage were from the South. Hospitalized cases grew again in the fall, with the largest percentage from the Midwest.
Comorbidities and patient-reported medication use among those hospitalized with COVID-19 are shown in Table 1. The most common insurance type was multiple types at 28.5%, followed by commercial only at 27.7%. The majority (76.7%) of patients presented to the emergency department before hospital admission. A total of 26.0% of the hospitalized patients had a diagnostic code only as their cohort qualifying event, 3.8% had a lab result only, and 70.2% had both. The most common comorbidities were hypertension (40.8%), diabetes (29.5%), and obesity (23.8%), while the most common patient-reported medication used was statins (26.2%).
Characterization of the hospitalized population during the hospitalization, both overall and by admission month, is presented in Table 2. From January through March, men and African Americans made up a larger percentage of patients hospitalized with COVID-19, while by the summer the hospitalized population was younger, more female, and more Hispanic. By November, the median age of the patients hospitalized with COVID-19 peaked at 65 years. Among hospitalized patients, 6.6% had a temperature > 38 degrees Celsius and 10.1% had an oxygen saturation < 90%. A total of 85.7% of patients hospitalized with COVID-19 had CRP values > 10 mg/L, 76.8% had D-dimer values > 250 ng/mL, and 75.5% had fibrinogen values > 400 mg/dL. Median values for platelets, CRP, lactate dehydrogenase, D-dimer, and fibrinogen tended to decrease from January-March to November. The most common presenting symptoms or complications during hospitalization were pneumonia (59.6%), followed by acute respiratory failure (44.8%), and dyspnea (28.0%). The prevalence of most symptoms and complications tended to decline from January through March to November.
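The threshold-based laboratory percentages can be sketched as follows. This Python example is illustrative and not the study's code: the dictionary keys are hypothetical, and the choice to exclude patients without a recorded value from the denominator is an assumption, since the text does not state which denominator was used.

```python
from typing import Dict, List, Optional

# Thresholds as reported in the text; the key names are hypothetical.
THRESHOLDS = {
    "crp_mg_per_l": 10.0,
    "d_dimer_ng_per_ml": 250.0,
    "fibrinogen_mg_per_dl": 400.0,
}


def percent_elevated(patients: List[Dict[str, Optional[float]]], key: str) -> Optional[float]:
    """Percent of patients whose first recorded value exceeds the threshold.

    Assumption: patients with no recorded value are excluded from the denominator.
    """
    values = [p[key] for p in patients if p.get(key) is not None]
    if not values:
        return None
    above = sum(v > THRESHOLDS[key] for v in values)
    return round(100 * above / len(values), 1)


# Toy records for illustration only; these are not study data.
toy_patients = [
    {"crp_mg_per_l": 12.0},
    {"crp_mg_per_l": 8.0},
    {"crp_mg_per_l": None},
    {},
]
crp_elevated = percent_elevated(toy_patients, "crp_mg_per_l")
```

Here two of the four toy records have a recorded CRP value and one of those exceeds 10 mg/L, so the result is 50.0.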
The most common medications administered during hospitalization were anticoagulants (79.8%) followed by systemic antibacterials (66.8%) and corticosteroids (46.7%). The use of chloroquine/hydroxychloroquine declined throughout the study, with 71.2% of patients receiving it in January-March, 5.1% in May, and less than 1% from August to November. In contrast, remdesivir (1.4% in January-March vs. 46.8% in November) and dexamethasone (4.3% in January-March vs. 48.6% in November) increased during the same period. Use of lopinavir/ritonavir, anticoagulants, immunosuppressants, systemic antibacterials, and systemic antivirals declined from January to November.
The severity of disease and associated outcomes, as measured by receipt of critical care, mechanical ventilation, and occurrence of death, decreased over time. From January-March to November, the proportion of hospitalized patients receiving critical care dropped from 30.9% to 13.3%, the proportion receiving mechanical ventilation dropped from 31.1% to 10.1%, and the proportion who died dropped from 23.1% to 4.6%.

Discussion
This characterization of US patients hospitalized with COVID-19 illustrates the clinical correlates and management of the disease over time in a large geographically diverse EHR database. The distribution of patients with COVID-19 and the subset who were hospitalized varied by age, region, race, and time. A change in treatment pattern was also observed, as the use of hydroxychloroquine decreased while the use of dexamethasone and remdesivir increased.

Most presenting symptoms and complications, as well as severe outcomes during hospitalization, improved throughout the study period. The EHR data source used in this study provides an opportunity to identify and characterize large numbers of patients with COVID-19, facilitating rapid observational assessments and providing insight into treatment and the outcome of infection. While randomized controlled trials continue to be seen as the gold standard to determine treatment efficacy for narrowly-defined hypotheses, real-world data sources provide useful insight into treatment effects during an evolving pandemic [7]. Because this database includes close to real-time data and is updated at least monthly, it can also serve as an important source for surveillance activities related to the COVID-19 pandemic in the US. Since it is larger than databases in previously published studies [8,9], it may also have the power to detect rare correlates and outcomes.
In this database, the number of both cases of COVID-19 and hospitalized patients with COVID-19 peaked in early April and then declined until early July, when the number of cases had a second large peak while the number of hospitalized cases increased slightly. Both cases of COVID-19 and hospitalized patients with COVID-19 reached another peak in November. These peaks of cases were likely due to surges in different underlying regions and age groups over time. In March, hospitalized patients with COVID-19 were predominantly older and resided in the Northeast, with many outbreaks reported in nursing homes or assisted living facilities [10]. In contrast, as the pandemic continued into the summer, the median age of cases declined in the US and in our data, and the median age peaked again in November [11]. Because the percentage of COVID-19 cases that are hospitalized increases with age [12], this may be a potential explanation for our findings. In addition, the differences observed between cases of COVID-19 and hospitalized patients with COVID-19 may reflect a lag between the timing of infection and the time of admission to the hospital. However, there was little difference between the admission date and the COVID-19 infection date. Consistent with previous studies [13], we also observed that a higher percentage of hospitalized patients were older, male, and African American compared with overall patients with COVID-19. The disproportionate burden of COVID-19 in these groups may indicate differences in underlying diseases, socioeconomic and occupational status, crowded housing, immune response, clinical features, and other factors compared with younger populations, females, and other races.
In line with other studies [13,14], we observed that hypertension, diabetes, and obesity were common underlying comorbidities among patients with COVID-19, and these conditions may represent correlates of the demographic characteristics of affected patients. The prevalence of these conditions in our study was lower than that reported in previous studies. Possible reasons for this disparity include the broader region and longer period of the pandemic covered in the analysis, or under-captured comorbidities. These reasons may also explain the relatively lower mortality rate in the overall population. The mortality rates in our study were 15% for hospitalized patients and 3.2% for overall patients, consistent with the 15% reported for hospitalized patients in the MMWR [15]. However, we observed a higher prevalence of severe outcomes than in previous reports; this could be explained by our use of the broader category of critical care instead of ICU only and the broader category of mechanical ventilation instead of intubation, ventilation, or ECMO individually. We found that 21.0% of hospitalized patients received critical care and 16.0% received mechanical ventilation, in contrast to a previous report in which 14% of patients were treated in the ICU and 12% received invasive mechanical ventilation in the New York City area. In a study of patients readmitted for COVID-19, 15% were admitted to an ICU and 13% required invasive mechanical ventilation during their first admission [15].
The reduction of symptoms, conditions, and severe outcomes among hospitalized patients over time aligns with the lessons and knowledge learned during this pandemic as treatment patterns were adjusted. We observed that the use of chloroquine/hydroxychloroquine peaked in January-March and dramatically dropped by May, while the use of remdesivir and dexamethasone increased over the study period. This reflects the change in evidence and FDA emergency use authorizations for treatments throughout the pandemic. While hydroxychloroquine was originally touted as a promising treatment, recent randomized trials have demonstrated that this drug is no more effective than a placebo at preventing COVID-19 illness and is associated with more adverse events [16,17]. In contrast, separate trials have shown that the use of remdesivir is associated with lower median recovery time in hospitalized adults compared to placebo and that the use of dexamethasone is associated with lower 28-day mortality among hospitalized patients with COVID-19 on respiratory support compared to usual care [18,19]. This suggests that treating clinicians rapidly changed their prescribing patterns as additional data were released. The slight decline in the use of anticoagulants, immunosuppressants, antibacterials for systemic use, and antivirals for systemic use over time aligns with the improvement observed in inflammation-related biomarkers and coagulopathy profiles, corresponding to the decrease in severe outcomes over time. However, it is also possible that this decline is due in part to data lag.
This study has several limitations. While EHR data are valuable for the examination of clinical health care outcomes and treatment patterns, EHR databases have certain inherent limitations because the data are collected for clinical patient management, not research. The patient-reported medication use data are self-reported and do not indicate that medication was prescribed, filled, or consumed, or that it was taken as prescribed. However, for medications administered during hospitalization, underestimation is less likely. The presence of a diagnosis code may not always represent the presence of disease, as the diagnosis code may be incorrect or may represent a condition being evaluated or ruled out. Also, diagnoses and treatments received from providers outside of the network may not be recorded. Because the clinical information was recorded in the EHR during a stressful pandemic situation, it is possible that typical coding conventions were not followed [7]. Cases found in January and February may represent data errors instead of true cases. Additionally, some laboratory observations were not recorded for all patients. There may be a data lag in identifying hospitalizations, which would underestimate the number of hospitalized patients. In addition, although deaths were ascertained using all data available, underestimation is likely due to data lag. While the death indicator from the EHR does capture medically attended deaths, there may be a lag of up to 2 months for this field to be up to date. For medically unattended deaths, we relied on linkage to the Social Security Death Master File, which has a lag of 6-9 months. Thus, deaths occurring outside of interaction with the EHR provider network may have been missed.
In conclusion, this study provides a characterization of US patients identified with COVID-19 and those hospitalized with COVID-19 and their management over the first 11 months of this disease outbreak. This characterization provides insight into the populations most affected by the disease and evidence for future observational research and may facilitate inferential study design.