Covid-19 by Race and Ethnicity: A National Cohort Study of 6 Million United States Veterans

Background: There is growing concern that racial and ethnic minority communities around the world are experiencing a disproportionate burden of morbidity and mortality from symptomatic SARS-Cov-2 infection or coronavirus disease 2019 (Covid-19). Most studies investigating racial and ethnic disparities to date have focused on hospitalized patients or have not characterized who received testing or those who tested positive for Covid-19. Objective: To compare patterns of testing and test results for coronavirus 2019 (Covid-19) and subsequent mortality by race and ethnicity in the largest integrated healthcare system in the United States. Design: Retrospective cohort study. Setting: United States Department of Veterans Affairs (VA). Participants: 5,834,543 individuals in care, among whom 62,098 were tested and 5,630 tested positive for Covid-19 between February 8 and May 4, 2020. Exposures: Self-reported race/ethnicity. Main outcome measures: We evaluated associations between race/ethnicity and receipt of Covid-19 testing, a positive test result, and 30-day mortality, accounting for a wide range of demographic and clinical risk factors including comorbid conditions, site of care, and urban versus rural residence. Results: Among all individuals in care, 74% were non-Hispanic white (white), 19% non-Hispanic black (black), and 7% Hispanic. Compared with white individuals, black and Hispanic individuals were more likely to be tested for Covid-19 (tests per 1000: white=9.0, [95% CI 8.9 to 9.1]; black=16.4, [16.2 to 16.7]; and Hispanic=12.2, [11.9 to 12.5]). While individuals from minority backgrounds were more likely to test positive (black vs white: OR 1.96, 95% CI 1.81 to 2.12; Hispanic vs white: OR 1.73, 95% CI 1.53 to 1.96), 30-day mortality did not differ by race/ethnicity (black vs white: OR 0.93, 95% CI 0.64 to 1.33; Hispanic vs white: OR 1.07, 95% CI 0.61 to 1.87). Conclusions: Black and Hispanic individuals are experiencing an excess burden of Covid-19 not entirely explained by underlying medical conditions or where they live or receive care. While there was no observed difference in mortality by race or ethnicity, our findings may underestimate risk in the broader US population as health disparities tend to be reduced in VA.


Introduction
The United States has the highest number of reported symptomatic severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections and related deaths in the world, accounting for one-third of global totals. 1 There is growing concern that racial and ethnic minority communities are experiencing a disproportionate burden of morbidity and mortality from symptomatic SARS-Cov-2 infection or coronavirus disease 2019 (Covid-19). [2][3][4][5] One report from Milwaukee County, Wisconsin found that Covid-19 cases were largely clustered in predominately African American areas and that this clustering was independent of income. 6 The largest study to date on racial and ethnic disparities in the US conducted by the Centers for Disease Control and Prevention (CDC) found that black individuals were overrepresented in a national sample of 580 patients hospitalized with Covid-19. 7 Thus, racial and ethnic disparities in the incidence of and outcomes from Covid-19 have been deemed an urgent public health research priority. 8 However, most studies investigating racial and ethnic disparities have focused on hospitalized patients or have not characterized which individuals were tested or those who tested positive for Covid-19. [9][10][11][12][13] In addition, it is not yet known whether such disparities are driven, at least in part, by differences in underlying health conditions, smoking and alcohol use, geographic location, or urban versus rural residence -essential information if we are to design effective interventions.
The electronic health record database of the Department of Veterans Affairs (VA) offers the single largest national data resource available with the necessary information on system-wide testing and detailed medical histories to examine racial and ethnic disparities in the US. We evaluated associations between race/ethnicity and receipt of Covid-19 testing, a positive test result, and 30-day mortality, each outcome conditional on the previous and accounting for a wide range of demographic and clinical risk factors and geographic site of care through May 4, 2020.

Data Source
The VA is the largest integrated healthcare system in the United States and comprises over 1200 points-of-care nationally including hospitals, medical centers, and community outpatient clinics. All care is recorded in a national electronic health record with daily uploads into the VA Corporate Data Warehouse. Available data include demographics, outpatient and inpatient encounters, diagnoses, smoking and alcohol health behaviors, and pharmacy dispensing records.
This study was approved by the Institutional Review Boards of VA Connecticut Healthcare System and Yale University. It has been granted a waiver of informed consent and is Health Insurance Portability and Accountability Act compliant.

Sample, Follow-up, and Outcomes
All individuals active in clinical care (defined as having at least one clinical encounter between January 1, 2018 and December 31, 2019 and alive as of January 1, 2020) were included in this analysis. We identified individuals tested for Covid-19 from date of first recorded test on February 8, 2020 through May 4, 2020 by using text searching of laboratory results containing terms consistent with SARS-CoV-2 or Covid-19. Nearly all tests utilized nasopharyngeal swabs, 1% were from other sources. Testing was performed in VA, state public health, and commercial reference laboratories using emergency use authorization approved SARS-CoV-2 assays. We did not include antibody tests in this analysis.
If an individual had more than one test and all were negative we selected the date of the first negative test, otherwise we used the date of the first positive test. Baseline for individuals tested for Covid-19 was defined as the date of specimen collection unless testing occurred during hospitalization, in which case it was defined as date of admission. If the admission began more than 14 days prior to testing, which may indicate hospital-acquired infection, we set baseline to 14 days prior to testing to better capture health status prior to SARS-CoV-2 infection. We examined three outcomes: 1) receipt of Covid-19 testing among all in care, 2) a positive test result among individuals tested for Covid-19, and 3) 30-day mortality among Covid-19 cases. Deaths were captured using inpatient records and VA death registry to capture deaths outside of hospitalization.

Covariates
We selected previously validated cohort characteristics and those that have been evaluated in prior Covid-19 reports. Demographics included age at baseline, sex, race/ethnicity (non-Hispanic white [white], non-Hispanic black [black], and Hispanic), and rural/urban residence. While the VA cares for individuals of other racial and ethnic backgrounds, there were too few tested for Covid-19 to obtain estimates in adjusted analyses at this time. Residence was defined using geographic information system coding based upon established criteria. 14 We extracted diagnostic codes for asthma, any cancer, chronic obstructive pulmonary disease (COPD), chronic kidney disease, diabetes mellitus, hypertension, liver disease, vascular disease, and alcohol use disorder (definitions provided in Supplementary Table S1). Presence of conditions was determined by one inpatient or two outpatient diagnoses in the two years prior to baseline, except for cancer which was ever prior to baseline. Diagnoses made in the 7 days prior to baseline were not included. We used a validated algorithm to capture smoking status 15 and alcohol consumption. 16 We collected pharmacy fills for angiotensin converting enzyme (ACE) inhibitors and angiotensin II receptor blockers (ARBs) and identified individuals with active prescriptions in the 30 days prior to baseline. Missing data for smoking and alcohol consumption only affected 8% of individuals included in multivariable models, thus complete case analysis was performed.

Statistical Analysis
We calculated number of Covid-19 tests per 1,000 individuals in care and Clopper-Pearson 95% confidence intervals (CIs) by race/ethnicity category. Among those tested for Covid-19, we calculated percent testing positive for Covid-19 and 95% CIs by race/ethnicity category. Logistic regression models were used to estimate age-adjusted and fully adjusted associations between race/ethnicity and testing positive for Covid-19. Fully adjusted models included all pre-specified covariates listed above. We further fitted a model conditioned on site of care among sites with . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 18, 2020. . https://doi.org/10.1101/2020.05. 12.20099135 doi: medRxiv preprint at least five positive tests (which captured 99% of Covid-19 cases) to account for site-specific differences in Covid-19 testing. We repeated this modelling procedure to estimate odds ratios (ORs) and 95% CIs between race/ethnicity and 30-day mortality among those testing positive for Covid-19 on or prior to April 3, 2020 to allow all individuals 30 days of follow-up. In sensitivity analyses, we compared associations with testing positive for Covid-19 over two time periods (February 8 to April 3, 2020 and April 4 to May 4, 2020). We also restricted the outcome for 30-day mortality to inpatient deaths to test robustness of associations found in a model including all deaths. Analyses were performed using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA). R version 3.6.3 was used to map Covid-19 cases nationally.
The geographic distribution of all 5,630 Covid-19 cases in the VA were similar to known hotspots in the general population, including Northeastern and Midwestern states, Louisiana, and Colorado (Figure 1a). Several facilities performing the highest number of tests also had the is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 18, 2020. . https://doi.org/10.1101/2020.05.12.20099135 doi: medRxiv preprint highest proportion of black individuals in care, including New York City, Chicago, New Orleans, and Detroit (Figure 1b).

Rate of Testing and Testing Positive
Testing rates for Covid-19 was higher among black (tests per 1000 in care: 16.4, 95% CI 16.2-16.7) and Hispanic individuals (12.2, 95% CI 11.9-12.5) than white individuals (9.0, 95%CI 8.9-9.1). Testing rates also varied by age, sex, residence type ( Table 2). Among white individuals in care, those aged 60-69 years were most likely to be tested (tests per 1000: 13.9, 95% CI 13.6-14.1). However, among black and Hispanic individuals in care, testing rates steadily increased with age with those aged ≥80 years being most likely to be tested (tests per 1000: 52.2, 95% CI 50.4-54.0 for black individuals; 24.3, 95% CI 22.7-26.0 for Hispanic individuals). Among white individuals, women were more likely than men and those residing in urban areas versus rural areas were more likely to be tested for Covid-19. Conversely, among black and Hispanic individuals, men were more likely than women and those residing in rural areas versus urban areas were more likely to be tested for Covid-19.
Among individuals tested for Covid-19, the proportion with a positive test varied by race/ethnicity ( Table 2); 5.8% (95% CI 5.6-6.1%) of white, 15.2% (95% CI 14.7-15.8%) of black, and 11.7% (95% CI 10.9-12.6%) of Hispanic individuals tested positive for Covid-19. For all race/ethnicity groups, positive results were highest at older ages, among men, and those residing in urban settings, apart from Hispanic individuals whereby positive results were fairly similar across all ages. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 18, 2020.  Figure 2a). Independent of race/ethnicity and all other adjusting factors, men were more likely to test positive than women (OR 1.74, 95% CI 1.55-1.96) and those living in urban settings were more likely than to test positive than those living in rural settings (OR 1.38, 95% CI 1.23-1.55).

Multivariable Models of 30-day Mortality
There were 2,420 individuals who tested positive for Covid-19 on or before April 3, 2020, of whom 284 died within 30 days. Most deaths (n=228, 80.3%) occurred in hospital. In age-adjusted, multivariable, and conditional analyses ( Table 3), black race (OR 0.93, 95% CI 0.64-1.33) and Hispanic ethnicity (OR 1.07, 95% CI 0.61-1.87) were not associated with 30-day mortality (Figure 2b). Urban residence was associated with 30-day mortality in an age-adjusted model (OR 2.01, 95% CI 1.10-3.65), and this association remained elevated but lost significance once accounting for  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 18, 2020. . https://doi.org/10.1101/2020.05. 12.20099135 doi: medRxiv preprint other demographic, clinical, and behavioral factors (OR 1.84, 95% CI 0.87-3.89). There was no association between sex and 30-day mortality in any model, but confidence intervals were wide likely due to the low number of women.

Discussion
This study is the largest in the world to examine racial and ethnic disparities in testing and subsequent Covid-19 mortality representing approximately 6 million individuals receiving care in the US. We found that black and Hispanic individuals were more likely to be tested and test positive for Covid-19 than white individuals, even after accounting for underlying health conditions, other demographics, and geographic location. While individuals from minority backgrounds appeared to experience excess burden of Covid-19, among those infected, there was no observed difference in 30-day mortality by race/ethnicity group. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Key strengths and limitations
This study was the first to elucidate racial disparities in testing patterns of Covid-19 independent of underlying health status and other key factors. Key strengths of this study were that it was based on well annotated national electronic health record data from a team with decades of experience using VA data, enabling a rapid and reliable analysis of Covid-19 by race and ethnicity. This analysis utilized patients' records from an entire healthcare system, which made it less prone to collider bias (i.e., non-random selection of individuals into a study) than other Covid-19 studies limited to individuals testing positive and admitted to hospital. 17 Unlike other national healthcare systems, linkage to Covid-19 testing data or outcomes was not required as the integrated nature of VA healthcare provided at over 1200 sites nationally allows all information to be stored in its national Corporate Data Warehouse. We used validated algorithms to accurately extract information on and adjust models for a wide range of clinical, behavioral, and geographic factors with very little missingness in the data. The scale of VA data also allowed us to assess the impact of Covid-19 across separate race/ethnicity groups, which would have masked important differences between black and Hispanic individuals. We continue to monitor numbers for individuals of other minority backgrounds and plan to follow-up these analyses when there are sufficient numbers for analysis.
While this analysis adds information, its limitations must be kept in mind. First, this study was conducted on Veterans currently receiving care in the VA who are older and have a higher prevalence of chronic health conditions and risk behaviors than the general US population.  Log odds ratio Log odds ratio

(a) Positive test result among tested (b) 30-day mortality among cases
Abbreviations: OR, odds ratio; CI, confidence interval. Notes: Both models conditioned on site of care and adjusted for baseline comorbidity (asthma, cancer, chronic kidney disease, chronic obstructive pulmonary disease, diabetes mellitus, hypertension, liver disease, vascular disease), substance use (alcohol consumption, alcohol use disorder, smoking status), and medication history (angiotensin converting enzyme inhibitor, angiotensin II receptor blocker). *Low number of mortality events in age groups 20-39 (0 events) and 40-49 (3 events) thus grouped with 50-59 (23 deaths).  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 18, 2020. . https://doi.org/10.1101/2020.05. 12.20099135 doi: medRxiv preprint However, prior research has established that after adjusting for age, sex, race/ethnicity, region, and residence type, all of which were included in this study, there is no difference between total disease burden between Veterans and non-Veterans. 20 Therefore, associations reported in this study are likely generalizable to the wider US population. Second, while individuals in VA care represent a diversity of backgrounds, women represented a small proportion of individuals in the sample. Thus, we were not powered to assess interactions between sex and race/ethnicity. Third, beyond adjusting for rural/urban and site of testing, we were not able to explore likely causes of the pronounced differential burden of Covid-19 among minority individuals. More detailed information on nursing home residence, socioeconomic status (e.g., type of employment, income, number of individuals in household) were unavailable or not consistently recorded in VA data as in most other electronic health record data sources. Fourth, as is true outside the VA, a small proportion of individuals have been tested (~1%) and rates of testing vary widely by site and within important subgroups. However, while initial testing was limited, by mid-April the VA began testing all individuals admitted to hospital and before any inpatient or outpatient procedures are done for clearance even in those not suspected to have Covid-19. We performed sensitivity analyses to see if associations with testing positive changed over time and found conclusions were broadly similar but with some important distinctions. However, our models for testing positive should not be interpreted as a proxy of likelihood of infection since those with mild symptoms were unlikely to have received testing, particularly in the early stages of the outbreak.

Findings in context
Our findings of racial and ethnic disparities in Covid-19 provide important distinctions from previous reports in the US and other countries with ethnically diverse populations. To our knowledge, the largest national study of race/ethnicity and Covid-19 in the US to date included 580 individuals across 14 states, showing that black individuals represented 33% of Covid-19 hospitalizations despite only making up 18% of the total population studied. 7 In the UK, which was the first country with an ethnically diverse population to experience a Covid-19 outbreak, 21 a study of 17 million individuals showed that those from minority backgrounds had a substantially higher risk of in-hospital death from Covid-19, which was not fully explained by underlying health conditions or deprivation. 22 While our study also found racial and ethnic disparities that were not fully explained by health status, we found that these disparities occurred primarily at stages prior to hospitalization (i.e., testing positive for Covid-19). We found no evidence of racial or ethnic disparities in 30-day mortality once models were restricted to those who tested positive for Covid-19.
We demonstrated that black and Hispanic individuals were more likely to test positive than their white counterparts even after accounting for underlying health conditions, other demographics, residence type, and site of care. Our finding may still be an underestimate of the US population risk as health disparities in VA tend to be smaller than in the private sector. 23 Based on experience with 1918 Spanish Flu and the 2009 H1N1 epidemic, public health experts have warned that racial and ethnic minority populations may be at higher risk during infectious disease outbreaks due to underlying health conditions, lower access to care, and socioeconomic conditions. 24 25 Prior reports have highlighted that members of racial and ethnic minorities are . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 18, 2020. . https://doi.org/10.1101/2020.05. 12.20099135 doi: medRxiv preprint more likely to live in densely populated areas or multi-generational households, and minority groups are overrepresented in jails, prisons, and detention centers, all of which lead to reduced capacity to implement physical distancing. [26][27][28][29][30] Similarly, black and Hispanic workers are more likely than their white counterparts to be workers in essential industries who continue to work outside the home despite outbreaks in their communities making them more prone to infection. [30][31][32] Testing rates for Covid-19 in the VA were higher among black and Hispanic individuals compared to white individuals. Local reporting from metropolitan areas with large minority populations, including New York City 33 and Chicago, 34 have highlighted the disproportionate impact of Covid-19 in minority communities. We showed that VA facilities in these cities and others around the country that conducted the highest number of Covid-19 tests also had the highest proportion of black individuals in care. There was a striking difference in the rate of Covid-19 testing by age across race/ethnicity groups, whereby higher rates of testing were observed for middle-aged, white individuals while testing rates steadily increased with age and was highest among those aged 80 or more years among black and Hispanic individuals. White individuals living in urban residences were more likely to be tested than those living in rural residences while the opposite was true among black and Hispanic individuals. Among all individuals tested, those living in urban residences were more likely to test positive for Covid-19 than those living in rural residences. Finally, women were more likely to be tested for Covid-19 than men, but men were twice as likely to test positive. This association strengthened after adjustment and in conditional analyses but should be considered preliminary given limited numbers of women in this analysis.

Policy implications
We have been able to confirm several local and national reports that black and Hispanic individuals in the US are impacted by Covid-19 differentially to their white counterparts. While other studies have focused on hospitalized individuals, this study is the first and largest to show that, at least through May 4, 2020, black and Hispanic patients were more likely to be tested and to test positive. We found no difference in 30-day mortality among Covid-19 cases. These findings highlight the urgent need for improved strategies to contain and prevent further outbreaks in the US, particularly within black and Hispanic communities.

Future research
We appeal to other researchers investigating racial and ethnic disparities to perform analyses on the entire population at-risk for Covid-19, where data are available, and compare findings associated at each stage in the clinical course of Covid-19, from testing to outcomes. In this paper, we focused only on 30-day mortality among Covid-19 cases. We plan to explore other outcomes, including hospitalization, intensive care, and intubation, in subsequent analyses to examine whether racial and ethnic disparities exist in the clinical course of Covid-19 after testing positive and before death. Among other factors, future research should consider the role of employment type, nursing home residence, and incarceration. Other racial and ethnic minorities in the US deserve attention and while we did not have enough power to include other groups in this analysis, we will continue to monitor these numbers for future research.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Conclusion
While minority individuals with Covid-19 did not appear to have worse outcomes, black race and Hispanic ethnicity were strongly associated with testing positive for Covid-19 suggesting a substantial excess burden of Covid-19 in US minority populations.

Acknowledgements
The views and opinions expressed in this manuscript are those of the authors and do not necessarily represent those of the Department of Veterans Affairs or the United States Government. The authors wish to recognize Dr. Kendall Bryant as the NIAAA Scientific Collaborator.

Funding
This work was supported by National Institute on Alcohol Abuse and Alcoholism [U01-AA026224, U24-AA020794, U01-AA020790, U10-AA013566]. The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Conflicts of interest
All authors have completed the ICMJE Unified Competing Interest form (available on request from the corresponding author) and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work. In summary, the authors declare no conflicts of interest. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

Contributorship
The copyright holder for this preprint this version posted May 18, 2020. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 18, 2020.  Additionally adjusted for baseline comorbidity (asthma, cancer, chronic kidney disease, chronic obstructive pulmonary disease, diabetes mellitus, hypertension, liver disease, vascular disease), substance use (alcohol consumption, alcohol use disorder, smoking status), and medication history (angiotensin converting enzyme inhibitor, angiotensin II receptor blocker) a Low number of mortality events in age groups 20-39 (0 events) and 40-49 (3 events) thus grouped with 50-59 (23 deaths) . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 18, 2020. . https://doi.org/10.1101/2020.05.12.20099135 doi: medRxiv preprint