As diagnostic tests for COVID-19 were broadly deployed under Emergency Use Authorization, there emerged a need to understand the real-world utilization and performance of serological testing across the United States.
Six health systems contributed electronic health records and/or claims data, jointly developed a master protocol, and used it to execute the analysis in parallel. We used descriptive statistics to examine demographic, clinical, and geographic characteristics of serology testing among patients with RNA positive for SARS-CoV-2.
Across datasets, we observed 930,669 individuals with positive RNA for SARS-CoV-2. Of these, 35,806 (4%) were serotested within 90 days; 15% of these serology tests occurred <14 days from the RNA positive test. The proportion of people with a history of cardiovascular disease, obesity, chronic lung, or kidney disease, or presenting with shortness of breath or pneumonia, appeared higher among those serotested compared to those who were not. Even in a population of people with active infection, race/ethnicity data were largely missing (>30%) in some datasets, limiting our ability to examine differences in serological testing by race. In datasets where race/ethnicity information was available, we observed a greater distribution of White individuals among those serotested; however, the time between RNA and serology tests appeared shorter in Black compared to White individuals. Test manufacturer data were available in half of the datasets contributing to the analysis.
Our results inform the underlying context of serotesting during the first year of the COVID-19 pandemic and differences observed between claims and EHR data sources, a critical first step to understanding the real-world accuracy of serological tests. Incomplete reporting of race/ethnicity data and a limited ability to link test manufacturer data, lab results, and clinical data challenge the ability to assess the real-world performance of SARS-CoV-2 tests in different contexts and the overall U.S. response to current and future disease pandemics.
Citation: Rodriguez-Watson CV, Sheils NE, Louder AM, Eldridge EH, Lin ND, Pollock BD, et al. (2023) Real-world utilization of SARS-CoV-2 serological testing in RNA positive patients across the United States. PLoS ONE 18(2): e0281365. https://doi.org/10.1371/journal.pone.0281365
Editor: AbdulAzeez Adeyemi Anjorin, Lagos State University, NIGERIA
Received: October 5, 2021; Accepted: January 22, 2023; Published: February 10, 2023
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: All relevant data are contained within the paper and its Supporting information files.
Funding: Financial support for this work was provided in part by a grant from The Rockefeller Foundation (HTH 030 GA-S). BDP, CK, WGJ used funding provided by Yale University-Mayo Clinic Center of Excellence in Regulatory Science and Innovation (CERSI), a joint effort between Yale University, Mayo Clinic, and the U.S. Food and Drug Administration (FDA) (3U01FD005938) (https://www.fda.gov/). JA was supported by award number A128219 and Grant Number U01FD005978 from the FDA, which supports the UCSF-Stanford Center of Excellence in Regulatory Sciences and Innovation. AJB was funded by award number A128219 and Grant Number U01FD005978 from the FDA, which supports the UCSF-Stanford Center of Excellence in Regulatory Sciences and Innovation (CERSI). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the HHS or FDA. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Natalie E Sheils is an employee of Optum Labs and owns stock in the parent company UnitedHealth group. Anthony M Louder is a paid employee of Aetion and holds Aetion stock options. Nancy D. Lin was an employee of Health Catalyst at the time the work was performed. Jennifer Gatz is a full-time employee of Regenstrief Institute, which provides independent research services to entities including those within the pharmaceutical and medical device industries. Shaun J. Grannis serves as Chief Medical Information Officer for the Indiana Health Information Exchange, and is a founding partner of Uppstroms, LLC. Carly Kabelac is a paid employee of Aetion and holds Aetion stock options. Carrie L. Byington has intellectual property in and receives royalties from BioFire, Inc. She serves as a scientific advisor to IDbyDNA (San Francisco, CA and Salt Lake City, UT). Dr. Byington is on the Board of the Commonwealth Fund. Atul J. Butte is a co-founder and consultant to Personalis and NuMedii; consultant to Samsung, Mango Tree Corporation, and in the recent past, 10x Genomics, Helix, Pathway Genomics, and Verinata (Illumina); has served on paid advisory panels or boards for Geisinger Health, Regenstrief Institute, Gerson Lehman Group, AlphaSights, Covance, Novartis, Genentech, Merck, and Roche; is a shareholder in Personalis and NuMedii; is a minor shareholder in Apple, Facebook, Alphabet (Google), Microsoft, Amazon, Snap, Snowflake, 10x Genomics, Illumina, Nuna Health, Assay Depot (Scientist.com), Vet24seven, Regeneron, Sanofi, Royalty Pharma, Pfizer, BioNTech, AstraZeneca, Moderna, Biogen, Twist Bioscience, Pacific Biosciences, Editas Medicine, Invitae, Doximity, and Sutro, and several other non-health related companies and mutual funds; and has received honoraria and travel reimbursement for invited talks from Johnson and Johnson, Roche, Genentech, Pfizer, Merck, Lilly, Takeda, Varian, Mars, Siemens, Optum, Abbott, Celgene, AstraZeneca, AbbVie, Westat, several investment and venture capital firms, and many academic institutions, medical or disease specific foundations and associations, and health systems. Atul Butte receives royalty payments through Stanford University, for several patents and other disclosures licensed to NuMedii and Personalis. Carla Rodriguez-Watson receives research support from Merck, Novartis, Pfizer, Lilly, Janssen, and AbbVie. She holds minor stock in Gilead.
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), originally identified in Wuhan, China, in December 2019. In January 2020, COVID-19 was declared a public health emergency in the United States as the disease continued to spread worldwide. As new variants continue to threaten health and well-being across the globe, valid serology tests are needed to support the characterization of immune response, overall and within different subpopulations, and to identify effective treatments, prophylaxis, and mitigation strategies [2, 3]. Given the public health emergency, currently authorized serologic assays to test for antibodies against SARS-CoV-2 have not undergone the same evidentiary review standards required for Food and Drug Administration (FDA) approval [4, 5]. A collaboration among the US National Cancer Institute, Centers for Disease Control and Prevention (CDC), Biomedical Advanced Research and Development Authority (BARDA), and the FDA led to the development of a dataset to compare the performance characteristics of different serological tests that were independently evaluated using sample panels of patients who were positive and negative for SARS-CoV-2 antibodies. However, as the sample size of the dataset is limited, more robust population-based studies on the accuracy of serology tests are needed to support assay selection and implementation, interpretation of seroepidemiologic studies, and estimates of COVID-19 prevalence and immune response. Additionally, given disproportionate infection rates in communities of color and asymptomatic spread and carriage of COVID-19 [9–12], understanding the best use of serologic tests to estimate the true prevalence of disease and immunity is critical to developing sound public health mitigation strategies that serve all communities.
A critical piece to enable the assessment of real-world performance is the ability to link manufacturer test information, lab results, and patient healthcare data. Despite several initiatives to improve interoperability of healthcare data, there are few incentives to create digital “bridges” enabling public health and research networks to leverage more complete data sets for rapid analysis and discovery. The absence of unique device identifiers (UDIs) for clear and unambiguous identification of specific diagnostic tests, together with the limited integration and flow of manufacturer assay information, impedes the interpretation of seroepidemiologic studies and estimates of COVID-19 prevalence.
An initial step to address this challenge is to identify which metadata can be captured and explore approaches to transmitting data between the instrument, laboratory information system (LIS), and electronic health record (EHR). Enabling such interoperability would likewise allow us to assess the real-world performance of serological tests and describe results in the context of clinical symptoms. Additionally, disproportionately high infection rates in underserved communities and asymptomatic carriage and spread of SARS-CoV-2 [9, 11] underscore the need for reliable serologic test reporting to accurately estimate disease prevalence and to develop equitable public health mitigation strategies [14, 15]. Recent studies by the CDC describe SARS-CoV-2 seroprevalence across the U.S. from convenience samples retrieved from routine blood chemistry, and others describe the duration of antibody response [17–20]. However, to our knowledge, few studies characterize the real-world use of serological testing for COVID-19, particularly in the context of symptoms and race.
To address these gaps, the Reagan-Udall Foundation for the FDA, in collaboration with the FDA and Friends of Cancer Research, has convened the COVID-19 Evidence Accelerator (EA). The EA is a consortium of leading experts in health systems research, regulatory science, data science, and epidemiology, specifically assembled to analyze health system data to address key questions related to COVID-19. The EA provides a platform for rapid learning and research using a common analytic plan. In May 2020, the EA launched the Diagnostics EA. As part of the Diagnostics EA, we examined patterns of COVID-19 serological testing using real-world data among the different populations and clinical characteristics. Specifically, our objectives were to 1) understand the current state of data interoperability across instrument, laboratory, and clinical data; 2) describe serological testing by demographic, environmental characteristics (e.g., geographic location), baseline clinical presentation, key comorbidities (e.g., diabetes and cardiovascular disease), and bacterial/viral co-infections (e.g., influenza); and 3) assess the timing of serology testing relative to molecular testing date by the characteristics listed above. Characterizing how serology tests were used (including which tests were used, when, and in whom), as well as potential gaps in data, provides important context for interpreting future results on diagnostic accuracy.
Materials and methods
Study population and setting
A call to participate in this descriptive analysis was put out to the Evidence Accelerator (EA) community. Six health systems answered the call and collaborated on the Diagnostics EA: Aetion and HealthVerity, Health Catalyst, Mayo Clinic, OptumLabs, Regenstrief Institute, and the University of California Health System. Health Catalyst, Mayo Clinic, and the University of California Health System all utilized EHR data from their respective healthcare delivery systems, Regenstrief Institute accessed EHR clinical data from the Indiana health information exchange [22, 23], while Aetion and OptumLabs utilized medical and pharmacy claims, as well as data directly from laboratories. Furthermore, Aetion drew hospital billing data from the HealthVerity Marketplace. OptumLabs utilized administrative claims data from a single, large, U.S. insurer. We refer to these health systems as partners A-F for the purposes of anonymity. Data sources included in the analysis are generally categorized as either payer (claims) or healthcare delivery systems. As illustrated in Fig 1, data were drawn from across the U.S. with heavy representation in California, Illinois, Ohio, and Michigan. Characteristics of participating data sources and representative populations are described in S1 Table.
Reprinted from brightcarbon.com under a CC BY license, with permission from Bright Carbon, original copyright (2021). Each color represents the number of data partners with a presence in each state but does not necessarily correspond to the number of people. The darkest color represents those where all six partners had a presence.
Each partner analyzed data collected from their distinct sources according to a master protocol and identified patients across settings (e.g., inpatient, outpatient, or long-term care facility) who tested positive for SARS-CoV-2 ribonucleic acid (RNA) by molecular test between March and September 2020, except one partner whose data extended through April 30, 2021 (Fig 2). “Date of RNA positive” served as the index (cohort entry) date and was defined hierarchically as the date of 1) sample collection; 2) accession; or 3) result, whichever was available first in that order. Among datasets that included primarily claims data, our protocol excluded persons who did not have evidence of enrollment for at least six months in the year before the index to decrease bias in the capture of pre-existing conditions. We did not implement similar data requirements for healthcare delivery systems and health information exchanges (HIEs), given the lack of membership data. We identified comorbidities (pre-existing conditions) in the 365 days before the index date.
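As an illustration only, the hierarchical index-date rule described above could be sketched as follows. The function and argument names are hypothetical and are not taken from the master protocol; the sketch assumes that a missing date is represented as None.

```python
from datetime import date
from typing import Optional


def index_date(
    collection: Optional[date],
    accession: Optional[date],
    result: Optional[date],
) -> Optional[date]:
    """Return the first available date in the order collection, accession, result.

    Hypothetical helper: a missing date is passed as None.
    """
    for candidate in (collection, accession, result):
        if candidate is not None:
            return candidate
    # No usable date: the record cannot be assigned an index date.
    return None


# Collection date missing, so the accession date is used.
example = index_date(None, date(2020, 4, 2), date(2020, 4, 3))
```

Under this ordering, a partner that records only result dates falls through to the third option, which is one reason same-day samples may carry different dates across partners.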
Follow up for serological testing, excluding immunoglobulin M tests, went through 90 days after the index date in all but one partner who identified all RNA positive and serology tests through April 30, 2021 without additional follow-up time for serology. Multiple serological measures were captured. Among those who received a serological test, we described the prevalence of presenting symptoms; concomitant infections with influenza and respiratory syncytial virus; time (in days) to the first serological test; and the number of serological and molecular tests in the 90 days after index.
To minimize the effect of differential missingness between partners, we did the following: 1) included all persons with an office or telephone visit in the +/- 14 days around the index date to enable as complete an assessment of presenting symptoms as possible; 2) in claim systems, included only persons with at least six months of enrollment in the year before index; 3) estimated the proportion of patients at each site who had zero encounters in the prior year to contextualize our capture of pre-existing conditions; and 4) excluded variables from analysis if ≥30% of values were missing. Between 35–65% of patients identified from health care delivery systems had no documented encounter in the system in the 365 to 15 days before the index date. In contrast, only 11% of patients from national insurers had zero claims in the baseline period. We also assessed the distribution of age, sex, and geography in those with and without data on serology manufacturers. We did not observe any difference by age or sex in those with known versus unknown serology manufacturer information. In a single partner reporting <30% missing race/ethnicity, we observed over-representation of White and Hispanic individuals in those with known serology manufacturer data.
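The ≥30% missingness rule can be illustrated with a short sketch. This is not the partners' code; the function name and the convention that a missing value is stored as None are assumptions made for the example.

```python
from typing import Optional, Sequence


def keep_variable(values: Sequence[Optional[object]], threshold: float = 0.30) -> bool:
    """Return True if the variable is retained, i.e. fewer than `threshold` of values are missing.

    Hypothetical helper: missing values are represented as None, and an
    empty variable is treated as fully missing.
    """
    if len(values) == 0:
        return False
    missing = sum(1 for v in values if v is None)
    # Variables with a missing fraction at or above the threshold are excluded.
    return (missing / len(values)) < threshold


race_partner_x = ["White", None, "Black", "White"]   # 25% missing: retained
race_partner_y = [None, None, None, "White", "Black",
                  "White", "White", "Black", "White", "White"]  # 30% missing: excluded
```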
Demographic and environmental characteristics, baseline clinical presentation, key comorbidities, bacterial/viral co-infections, and test characteristics potentially related to serological testing were included in the analysis (S1 Fig). We identified comorbidities and clinical presentation using phenotypes defined by ICD-10, and/or National Drug Codes. We provided coding algorithms used for other EA studies and from FDA’s Sentinel Initiative for partners to use, while some partners used existing algorithms generated within their systems. The ICD-10 codes used to identify comorbidities are listed in S2 Table. Given differences in data availability across partners, each partner identified which of the prescribed covariates could be included in their analyses.
We interviewed diagnostic manufacturers, clinical laboratory directors, middleware and information technology vendors, and clients to understand the data generated by the instrument and the data flow from the instrument to information systems for laboratory and clinical data.
Descriptive analyses were performed separately by each contributing data partner in accordance with a common analytic plan. Among persons with and without serology, we calculated the distribution by age, sex, race, ethnicity, U.S. region, pre-existing medical conditions (including but not limited to cardiovascular disease, hypertension, kidney disease, asthma, dementia, and chronic liver disease), smoking status, and obesity. We also analyzed body mass index (BMI), pregnancy status, presenting symptoms, and RNA test manufacturer. Among those with at least one serology test after index, we described the frequency of presenting symptoms and the specific manufacturer/assays at the time of the first serology test, and the time to the first test. We calculated the median and interquartile range (IQR) for the number of days between the RNA test and the first serology test. Separately, we included all serology and RNA tests after the index date to describe the median and IQR for the number of molecular and serological tests conducted after the index date.
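A minimal sketch of the time-to-first-serology summary is given below, assuming a 90-day follow-up window in which same-day tests count as day 0. The helper names are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption; the analytic plan does not specify one, and partners' statistical software may use a different definition.

```python
from datetime import date
from statistics import median, quantiles
from typing import Optional, Sequence, Tuple


def days_to_first_serology(
    index: date, serology_dates: Sequence[date], window: int = 90
) -> Optional[int]:
    """Days from the index date to the earliest serology test within the window, or None."""
    deltas = [(d - index).days for d in serology_dates]
    eligible = [x for x in deltas if 0 <= x <= window]
    return min(eligible) if eligible else None


def median_iqr(days: Sequence[int]) -> Tuple[float, float, float]:
    """Median with first and third quartiles (requires at least two values)."""
    q1, _, q3 = quantiles(days, n=4, method="inclusive")
    return median(days), q1, q3


# One patient with a same-day serology test and a later one outside the window.
first = days_to_first_serology(
    date(2020, 5, 1), [date(2020, 5, 15), date(2020, 5, 1), date(2020, 9, 1)]
)
```

Reporting the two quartiles rather than only their difference keeps the summary interpretable when the distribution is skewed toward same-day testing.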
The WCG Institutional Review Board (IRB), the IRB of record for the Reagan-Udall Foundation for the FDA, reviewed the study and determined it to be non-human subjects research.
Study samples ranged from 36,319 to 363,653 individuals per dataset, for a total of 930,669 people with a confirmed SARS-CoV-2 infection by molecular test across all partners, five of which contributed data from March 1 to September 30, 2020, and a sixth that captured data through April 30, 2021. As described in Table 1, the study population across all datasets was predominantly female, White, and 45–64 years of age. The geographic distribution of patients included in the analyses represented the population in each of the health systems, with two national datasets drawing primarily from the Mid-Atlantic region. Among two datasets, a majority of the sample population had no evidence of pre-existing conditions, whereas in two nationally representative samples, 30–50% had evidence of such. The most prevalent pre-existing conditions across healthcare partners were diabetes, hypertension, cardiovascular disease, obesity, and lung conditions. Across all healthcare partners, 4–11% of the female population were pregnant in the 40 weeks before the index date. The most common presenting symptoms at index were cough, shortness of breath, and pneumonia. The prevalence of lab-confirmed concomitant respiratory syncytial virus or influenza was <1%.
Serological testing (serotesting)
Generally, 3–6% of those with confirmed infection were serotested, for a total of 35,806 people observed across all datasets. Nearly all follow-up serological tests were immunoglobulin G (IgG) tests (Table 2). Generally, each partner utilized one or two primary serology tests and did not support a large number of tests.
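As a quick arithmetic check, the overall share serotested can be recomputed from the two totals reported in this paper (35,806 serotested among 930,669 RNA-positive individuals). Because data were not pooled and partners may overlap, this is a crude summary rather than a pooled estimate.

```python
serotested = 35_806
rna_positive = 930_669

# Crude overall percentage serotested within the follow-up window.
pct_serotested = 100 * serotested / rna_positive
rounded = round(pct_serotested, 1)
```

The result is about 3.8%, consistent with the reported 4% and with the 3–6% range across partners.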
Serology manufacturer and test name were captured by four analytic partners, and mostly complete (<30% missing) for three included in this analysis (A, C, E). One of our largest partners was missing manufacturer data in 85% of the sample, and two partners were missing it completely. While manufacturer and assay name, as well as other metadata, are typically captured and available for export from the instrument, oftentimes laboratory information systems are not configured to receive or store this information. Constraints on integration include technical limitations of software and middleware, as well as a lack of clinical need, business case, or regulatory incentive. Capturing, storing, and transferring this additional data would require a substantial investment of resources to modify and/or reconfigure existing instruments, laboratory information systems, connective middleware, and EHRs. Absent a regulatory or reimbursement requirement, companies perceive little need to invest such resources given other competing priorities.
Serotesting by demographic characteristics
Overall, we observed a higher distribution of persons aged 45–64 among those serotested compared to those not serotested. Four partners representing healthcare delivery systems reported race with <30% missing. Across three of these partners, we observed a higher distribution of White individuals among those serotested compared to those not. We did not observe a consistent pattern in serotesting by sex.
Five partners had representation across more than one region of the U.S. In partners with national representation, patients from the West North Central (Iowa, Nebraska, Kansas, North Dakota, Minnesota, South Dakota, Missouri) and West South Central (Arkansas, Louisiana, Oklahoma, Texas) regions were under-represented among the serotested. Two partners operated primarily in a single U.S. state and thus did not allow assessment of geographic differences.
Serotesting by care-setting, symptoms, and pre-existing conditions
Half of the partners reported care-setting. Generally, most of the population was seen in the outpatient setting for their index visit. Large national insurer data did not suggest any differences in the distribution of index visit care settings among serotested vs. non-serotested. However, EHR data from a large national health data consortium revealed a higher distribution of patients in the inpatient setting among the serotested compared to non-serotested (13% vs. 2%, respectively).
As shown in Table 3, four of six partners reported presenting symptoms at index. Patterns in serotesting by symptoms seem to align with the data source. In partners who relied on claims data, we generally see no systematic trend in serotesting by presenting symptoms at the time of the index visit. Among systems that relied on EHR data, we see a higher distribution of patients with shortness of breath (15–20%), pneumonia (15–37%), and cardiovascular conditions (29–35%) among the serotested vs. non-serotested (10–15%, 10–16%, 17%, respectively).
All but one data partner reported pre-existing conditions. We found individuals with pre-existing cardiovascular disease tended to have greater representation in the serotested (35%–48%) vs. non-serotested group (17%–40%). In partners with EHR data, a greater distribution of patients with pre-existing obesity and kidney disease were also observed among the serotested compared to non-serotested. We did not observe a differential in testing among pregnant women–although only half of the contributing partners reported pregnancy status. We observed similar patterns of pregnancy among women with serological testing (4–13%) compared with women without serological testing (2–11%), with a slightly higher range in prevalence of pregnancy among women with serological testing.
As shown in Table 3 and Fig 3, many of the same symptoms at the time of RNA testing persisted at the time of serotesting, which may be attributed to the high volume of same-day molecular and serological testing.
Frequency and time to serological testing
In all but one healthcare system, serological testing increased substantially after May 1, 2020 (Table 1). Serological testing among those with positive RNA ranged from 3–6% across our contributing partners. Among all people with follow-up samples, 15% had a follow-up serology within 14 days of the index RNA test (Fig 3).
Overall, the median time to serotesting from RNA per data partner ranged from 10–31 days and was shorter in datasets from systems with data from EHRs (Table 4). In terms of age, adults 85 years and older tended to have the shortest time to follow-up between molecular and serology testing (median range: 1–25 days). In partners with robust capture of race and ethnicity, Black patients (median: 7–15 days) tended to experience a shorter time to serotesting as compared to White individuals (median: 13–21 days). In half of the analytic datasets, time to serotesting tended to be shortest in people with a history of dementia (median: 2–15 days). Within and across datasets, there was substantial variability in time to serotesting by presenting symptoms at index. In the two partners reporting on pregnancy, time to serotesting did not tend to differ by pregnancy status.
In general, we did not observe repeat molecular or serological testing within the 90-day time frame. In partners A–E, the median (IQR) number of both tests was 1 (0), while in partner F it was 1 (1). Time to serotesting tended to be shorter for IgG tests as compared to total antibody. There was substantial variation in time to serological testing across manufacturer assays (both molecular and serological). We observed differences in time to serological testing across care settings in only one dataset, with the median time to serotesting being 0 days in the inpatient setting and almost one month in the outpatient setting. Patients with index dates after May 1, 2020 tended to wait fewer days for serological testing (median: 7–27) compared to those with index dates before May 1, 2020 (median: 28–43). This difference may be explained by the lower availability of SARS-CoV-2 tests before May 1, since serology tests were not authorized before April 15, 2020, and molecular tests were not authorized before March 15, 2020.
The Centers for Disease Control and Prevention has initiated several large-scale population-based seroprevalence studies throughout the U.S. We conducted this study to characterize the real-world use of COVID-19 serological testing. We identified a number of key findings: 1) a substantial proportion of serology tests were conducted within 14 days of the RNA test, the majority of which occurred on the same day as the positive RNA test; 2) a lack of data interoperability between the instrument, laboratory, and clinical data could limit the ability to conduct a large-scale assessment of the real-world performance of not only COVID-19 tests, but other diagnostic and laboratory tests; 3) missing race/ethnicity data may impede a comprehensive understanding of racial disparities involved in COVID-19 serology and immunity; and 4) important differences in the testing landscape presented from claims vs. EHR data sources may impact results generated from these data sources.
We assumed the date of a positive SARS-CoV-2 molecular test would be a reasonable proxy for symptom onset. We did not expect that 15% of serotesting would occur within 14 days of the RNA test, and most often on the same day. This is an important finding because, given known viral kinetics, we would not expect concordance between molecular and serology tests taken in close proximity [25–27]. After consulting with our analytic partners, we discovered the implementation of policies within health systems to screen patients admitted for procedures for active or past SARS-CoV-2 to evaluate the risk of nosocomial infections. These policies may be driving observed differences in the median time between molecular and serology tests in claims (31 days), compared to EHR datasets (10–24 days), with the nuance being washed out in larger claims datasets that incorporate a mix of care settings. Clinicians may also be serotesting because they do not believe that patients are presenting close to the time of exposure, because they desire a better understanding of patients’ disease progression, or to assist in determining the clinical course of care, which may depend on whether patients are at increased risk for severe illness due to insufficient antibody response. For all diagnostic and serological tests authorized by the FDA, the FDA produces fact sheets for healthcare providers to provide information about the assay and its limitations. Continued guidance and communication are needed to help clinicians understand how to best use serological tests for SARS-CoV-2 [30, 31].
A higher distribution of patients presenting with respiratory, metabolic, and cardiovascular symptoms among the serotested compared to non-serotested is consistent with an evaluation by the CDC that indicated such factors are associated with severe COVID-19 illness. Patients with a pre-existing history of cardiovascular disease (including hypertension) and liver disease were over-represented among those serotested vs. those not serotested in multiple datasets. These conditions have been shown to be associated with excess risk in other studies [33, 34]. It was surprising that we did not observe any differences in the distribution of cancer in those serotested compared to the non-serotested. More research is needed to understand why some patients with known active SARS-CoV-2 infection receive a serology test, while others do not.
Across care delivery systems, a notable observation was increased serological testing in White as compared to Black individuals. However, when Black patients did receive serology testing, the time to testing was shorter, which may be due to a pressing need to identify the presence of antibodies/past infection in populations who have been shown to be at higher risk of COVID-19 morbidity and mortality. More importantly, data on race from a large national insurer were missing in about 80% of the sample. Without data on race and ethnicity, the racial disparities in COVID-19 outcomes, and in healthcare in general, cannot be addressed.
Another important information gap is in manufacturer data. Despite targeted conversations with technology teams and experts in technical, syntactic, and semantic interoperability, only half of analytic partners were able to integrate test manufacturer data with LIS and EHR data. A lack of data interoperability within healthcare is a historic problem. Such interoperability is the foundation for public health surveillance, research, artificial intelligence, medical advances, and quality assurance in the context of EUA [36, 37]. Healthcare systems, manufacturers, and information technology vendors should move to fill information gaps to improve response to COVID-19 and future public health threats.
Differences in results reported by claims vs. EHR-based systems
Analytic partners ran their analyses in parallel and aligned on a common analytic plan. We did not pool data, which allowed us to highlight, rather than control for, differences across partners. Different patterns between EHR and claims systems were apparent in our analysis. In general, claims datasets showed no difference in serotesting by care setting or presenting symptoms, whereas EHR systems did. And while all datasets showed an elevated prevalence of pre-existing cardiovascular disease among those serotested (compared to the non-serotested), EHR datasets also showed a greater distribution of people with pre-existing obesity, kidney disease, and chronic lung conditions among the serotested. Because healthcare delivery systems generally have a limited ability to capture all clinical events for a given patient, sicker patients may be driving identification within certain health systems, and pre-existing conditions may have been missed in patients who do not regularly attend the facility for care but were diverted to the facility. Our data support this hypothesis on both points: increased illness among patients and lower identification of pre-existing conditions among patients identified from EHR compared to claims data sources. These differences may influence the interpretation of serology tests [38–42].
Strengths and limitations
Our study has many strengths. This was a large assessment of serotesting across the U.S. in diverse datasets leveraging either EHR or claims data. We developed a protocol that incorporated the unique characteristics of each data source and provided a forum to transparently communicate and collaborate on study design and interpretation. We also established a platform to rapidly collect and analyze data from various systems and to identify important trends over time. Such a platform may be used to evaluate process improvement and to make comparisons within data systems.
Our study also has some important limitations. First, we were unable to assess the independence of samples across the healthcare partners directly. Three partners provide national coverage, and thus large sample sizes. The geographic distribution of their populations does not suggest overlap. However, single health systems included in the same geographic region as the larger healthcare partners (specifically in the Pacific and Mountain regions) may be double counted. Second, smoking status, BMI, and race were largely missing in our analysis. These are important characteristics in assessing the impact of COVID-19 on the health of the population. Third, the sample collection date was not always available, and the result date was used by some partners. As such, it is possible that samples collected on the same day may have different result dates if tests were run sequentially. Fourth, manufacturer information was largely missing from two of our largest datasets because instrument data either did not flow to the laboratory information system (LIS), or those results were not transmitted from the LIS to the EHR or payer database. However, we did not find differential missingness by age, sex, or geography among individuals with and without manufacturer data. Finally, lack of data on COVID-19 exposure and symptom onset limits our ability to make future inferences on appropriate pairs of molecular and serological tests to assess serological performance for past infection. We note that assumptions regarding the proximity of RNA testing to symptom onset may not be reliable over time. Testing for active infection has gone from severely limited at the start of the pandemic (March–April 2020) to widely available today. People may receive serial RNA testing without suspected exposure for purposes of employment or recreational gathering with friends and family.
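To make the third limitation concrete, the following is a minimal sketch of how substituting the result date for a missing collection date can shift the computed interval between the RNA and serology tests. It is an illustration only and not the code used by the analytic partners. The function names, the fallback rule, and the treatment of the 90-day window as 0 to 90 days inclusive after the RNA test are all assumptions; the example patient is hypothetical.

```python
from datetime import date
from typing import Optional


def effective_date(collected: Optional[date], resulted: Optional[date]) -> Optional[date]:
    """Return the collection date when present, otherwise the result date (assumed rule)."""
    return collected if collected is not None else resulted


def days_from_rna_to_serology(
    rna_collected: Optional[date],
    rna_resulted: Optional[date],
    sero_collected: Optional[date],
    sero_resulted: Optional[date],
) -> Optional[int]:
    """Days from the RNA test to the serology test, or None if either date is unknown."""
    rna = effective_date(rna_collected, rna_resulted)
    sero = effective_date(sero_collected, sero_resulted)
    if rna is None or sero is None:
        return None
    return (sero - rna).days


def interval_category(days: Optional[int]) -> str:
    """Bucket the interval; the 0 to 90 day inclusive window is an assumption."""
    if days is None:
        return "unknown"
    if days < 0 or days > 90:
        return "outside window"
    return "<14 days" if days < 14 else "14-90 days"


# Hypothetical patient: both samples were collected on 1 June 2020.
# With collection dates available, the interval is 0 days.
true_gap = days_from_rna_to_serology(date(2020, 6, 1), None, date(2020, 6, 1), None)

# Same patient, but only result dates are available, and the serology
# result was reported two days after the RNA result.
observed_gap = days_from_rna_to_serology(None, date(2020, 6, 1), None, date(2020, 6, 3))

print(true_gap, observed_gap)
```

In this sketch the same pair of samples yields an interval of 0 days when collection dates are used and 2 days when only result dates are available, which is the kind of shift that could move a test across the 14-day boundary.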
As in all observational datasets, the completeness of our assessment is dependent on the capture of events in each of our healthcare data partners. Indeed, we observed that a greater proportion (35–65%) of patients identified in EHR data had no encounter in the year prior to index, compared to 11% among those identified from payer data. Coupled with our observation from EHRs that there seemed to be a greater number of pre-existing conditions for which there was preferential serotesting, these data provide additional evidence that patients identified through EHR data sources may tend to be sicker than those identified in claims. Furthermore, not knowing "care setting" for a large portion of tests could affect interpretation of the performance of serology testing as well, since the sensitivity of serology assays appears to be lower in mildly sick and/or asymptomatic cohorts.
Our results inform the underlying context of serotesting during the first year of the COVID-19 pandemic and differences in serotesting trends observed from claims and EHR data sources, a critical first step to understanding the real-world accuracy of serological tests. The limited ability to link test manufacturer data with lab results and clinical data, and incomplete reporting of race/ethnicity data challenge the ability to assess real-world performance of SARS-CoV-2 tests in different populations and settings. These shortcomings challenge the overall U.S. response to current and future disease pandemics.
S1 Table. Characteristics of participating data sources and representative populations.
S2 Table. Phenotype (code-lists) for specified presenting symptoms & pre-existing conditions.
We would like to thank Christina Silcox, Shamiram Feinglass, Roland Romero, James Okusa, Elijah Mari Quinicot, Amar Bhat, Susan Winckler, Alecia Clary, Sadiqa Mahmood, Philip Ballentine, Perry L. Mar, Cynthia Lim Louis, Connor McAndrews, Elitza S. Theel, Cora Han, Pagan Morris, Charles Wilson, and Bridgit O Crews for their engagement and assistance with this manuscript. We would also like to thank Daniel Caños, Sara Brenner, Wendy Rubinstein, Veronica Sansing-Foster, and Sean Tunis for their support and feedback during this work. A special thanks and recognition for the contributions and sacrifice of Dr. Michael Waters, our dear colleague and friend who will be forever in our thoughts. We thank Amir Alishahi Tabriz MD, PhD for his assistance with manuscript preparation.
- 1. Velavan TP, Meyer CG. The COVID‐19 epidemic. Tropical medicine & international health. 2020;25: 278. pmid:32052514
- 2. Hanson KE, Caliendo AM, Arias CA, Englund JA, Lee MJ, Loeb M, et al. Infectious Diseases Society of America guidelines on the diagnosis of COVID-19. Clinical infectious diseases. 2020.
- 3. Cheng MP, Yansouni CP, Basta NE, Desjardins M, Kanjilal S, Paquette K, et al. Serodiagnostics for Severe Acute Respiratory Syndrome–Related Coronavirus 2: A Narrative Review. Annals of internal medicine. 2020;173: 450–460. pmid:32496919
- 4. Lassaunière R, Frische A, Harboe ZB, Nielsen AC, Fomsgaard A, Krogfelt KA, et al. Evaluation of nine commercial SARS-CoV-2 immunoassays. MedRxiv. 2020.
- 5. Whitman JD, Hiatt J, Mowery CT, Shy BR, Yu R, Yamamoto TN, et al. Test performance evaluation of SARS-CoV-2 serological assays. MedRxiv. 2020.
- 6. CDC. Cases, Data, and Surveillance. In: Centers for Disease Control and Prevention [Internet]. 11 Feb 2020 [cited 2 Jan 2022]. https://www.cdc.gov/coronavirus/2019-ncov/covid-data/serology-surveillance/serology-test-evaluation.html
- 7. U.S. Food and Drug Administration. In vitro diagnostics EUAs. Food and Drug Administration. 2020.
- 8. Gold JA, Rossen LM, Ahmad FB, Sutton P, Li Z, Salvatore PP, et al. Race, ethnicity, and age trends in persons who died from COVID-19—United States, May–August 2020. Morbidity and Mortality Weekly Report. 2020;69: 1517. pmid:33090984
- 9. Arons MM, Hatfield KM, Reddy SC, Kimball A, James A, Jacobs JR, et al. Presymptomatic SARS-CoV-2 infections and transmission in a skilled nursing facility. New England journal of medicine. 2020;382: 2081–2090. pmid:32329971
- 10. Wei WE, Li Z, Chiew CJ, Yong SE, Toh MP, Lee VJ. Presymptomatic transmission of SARS-CoV-2—Singapore, January 23–March 16, 2020. Morbidity and Mortality Weekly Report. 2020;69: 411. pmid:32271722
- 11. Mizumoto K, Kagaya K, Zarebski A, Chowell G. Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020. Eurosurveillance. 2020;25: 2000180. pmid:32183930
- 12. Lavezzo E, Franchin E, Ciavarella C, Cuomo-Dannenburg G, Barzon L, Del Vecchio C, et al. Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo’. Nature. 2020;584: 425–429.
- 13. Dzau VJ, McClellan MB, McGinnis JM, Burke SP, Coye MJ, Diaz A, et al. Vital directions for health and health care: priorities from a National Academy of Medicine initiative. JAMA. 2017;317: 1461–1470. pmid:28324029
- 14. Shah M, Sachdeva M, Dodiuk-Gad RP. COVID-19 and racial disparities. Journal of the American Academy of Dermatology. 2020;83: e35. pmid:32305444
- 15. Chowkwanyun M, Reed AL Jr. Racial health disparities and Covid-19—caution and context. New England Journal of Medicine. 2020;383: 201–203. pmid:32374952
- 16. Havers FP, Reed C, Lim T, Montgomery JM, Klena JD, Hall AJ, et al. Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23-May 12, 2020. JAMA internal medicine. 2020;180: 1576–1586. pmid:32692365
- 17. Garcia-Beltran WF, Lam EC, Astudillo MG, Yang D, Miller TE, Feldman J, et al. COVID-19-neutralizing antibodies predict disease severity and survival. Cell. 2021;184: 476–488. e11. pmid:33412089
- 18. Dan JM, Mateus J, Kato Y, Hastie KM, Yu ED, Faliti CE, et al. Immunological memory to SARS-CoV-2 assessed for up to 8 months after infection. Science. 2021. pmid:33408181
- 19. Yu H, Sun B, Fang Z, Zhao J, Liu X, Li Y, et al. Distinct features of SARS-CoV-2-specific IgA response in COVID-19 patients. European Respiratory Journal. 2020;56. pmid:32398307
- 20. To KK-W, Tsang OT-Y, Leung W-S, Tam AR, Wu T-C, Lung DC, et al. Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: an observational cohort study. The Lancet Infectious Diseases. 2020;20: 565–574. pmid:32213337
- 21. Mackey K, Ayers CK, Kondo KK, Saha S, Advani SM, Young S, et al. Racial and ethnic disparities in COVID-19–related infections, hospitalizations, and deaths: A systematic review. Annals of internal medicine. 2021;174: 362–373. pmid:33253040
- 22. McDonald CJ, Overhage JM, Barnes M, Schadow G, Blevins L, Dexter PR, et al. The Indiana network for patient care: a working local health information infrastructure. Health affairs. 2005;24: 1214–1220.
- 23. Dixon BE, Whipple EC, Lajiness JM, Murray MD. Utilizing an integrated infrastructure for outcomes research: a systematic review. Health Information & Libraries Journal. 2016;33: 7–32.
- 24. Bajema KL, Wiegand RE, Cuffe K, Patel SV, Iachan R, Lim T, et al. Estimated SARS-CoV-2 Seroprevalence in the US as of September 2020. JAMA internal medicine. 2021;181: 450–460. pmid:33231628
- 25. Sethuraman N, Jeremiah SS, Ryo A. Interpreting diagnostic tests for SARS-CoV-2. JAMA. 2020;323: 2249–2251. pmid:32374370
- 26. Wajnberg A, Amanat F, Firpo A, Altman DR, Bailey MJ, Mansour M, et al. Robust neutralizing antibodies to SARS-CoV-2 infection persist for months. Science. 2020;370: 1227–1230. pmid:33115920
- 27. Huang AT, Garcia-Carreras B, Hitchings MD, Yang B, Katzelnick LC, Rattigan SM, et al. A systematic review of antibody mediated immunity to coronaviruses: kinetics, correlates of protection, and association with severity. Nature communications. 2020;11: 1–16.
- 28. Fact Sheet for Health Care Providers: Emergency Use Authorization (EUA) of Bamlanivimab and Etesevimab 12222021.: 45.
- 29. EUAs IVD. Serology and Other Adaptive Immune Response Tests for SARS-CoV-2. 2021.
- 30. West R, Kobokovich A, Connell N, Gronvall GK. COVID-19 Antibody Tests: A Valuable Public Health Tool with Limited Relevance to Individuals. Trends in Microbiology. 2020. pmid:33234439
- 31. Qaseem A, Yost J, Etxeandia-Ikobaltzeta I, Forciea MA, Abraham GM, Miller MC, et al. What Is the Antibody Response and Role in Conferring Natural Immunity After SARS-CoV-2 Infection? Rapid, Living Practice Points From the American College of Physicians (Version 1). Annals of Internal Medicine. 2021;174: 828–835. pmid:33721518
- 32. National Center for Immunization and Respiratory Diseases. Science Brief: Evidence used to update the list of underlying medical conditions that increase a person’s risk of severe illness from COVID-19. CDC COVID-19 Science Briefs [Internet]. Centers for Disease Control and Prevention (US); 2021.
- 33. Williamson EJ, Walker AJ, Bhaskaran K, Bacon S, Bates C, Morton CE, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584: 430–436. pmid:32640463
- 34. Rosenthal N, Cao Z, Gundrum J, Sianis J, Safo S. Risk factors associated with in-hospital mortality in a US national sample of patients with COVID-19. JAMA network open. 2020;3: e2029058–e2029058. pmid:33301018
- 35. Richwine C, Marshall C, PMP CJ. Challenges to Public Health Reporting Experienced by Non-Federal Acute Care Hospitals, 2019.
- 36. Salmon D, Yih WK, Lee G, Rosofsky R, Brown J, Vannice K, et al. Success of program linking data sources to monitor H1N1 vaccine safety points to potential for even broader safety surveillance. Health Affairs. 2012;31: 2518–2527. pmid:23129683
- 37. Lehne M, Sass J, Essenwanger A, Schepers J, Thun S. Why digital medicine depends on interoperability. NPJ digital medicine. 2019;2: 1–5.
- 38. Zeltzer D, Balicer RD, Shir T, Flaks-Manov N, Einav L, Shadmi E. Prediction accuracy with electronic medical records versus administrative claims. Medical care. 2019;57: 551–559. pmid:31135691
- 39. Friedman A, Crosson JC, Howard J, Clark EC, Pellerano M, Karsh B-T, et al. A typology of electronic health record workarounds in small-to-medium size primary care practices. Journal of the American Medical Informatics Association. 2014;21: e78–e83. pmid:23904322
- 40. Paul MM, Greene CM, Newton-Dame R, Thorpe LE, Perlman SE, McVeigh KH, et al. The state of population health surveillance using electronic health records: a narrative review. Population health management. 2015;18: 209–216. pmid:25608033
- 41. O’Malley AS, Draper K, Gourevitch R, Cross DA, Scholle SH. Electronic health records and support for primary care teamwork. Journal of the American Medical Informatics Association. 2015;22: 426–434. pmid:25627278
- 42. Darmon D, Sauvant R, Staccini P, Letrilliart L. Which functionalities are available in the electronic health record systems used by French general practitioners? An assessment study of 15 systems. International journal of medical informatics. 2014;83: 37–46. pmid:24231269