Epidemiology of atrial fibrillation in the All of Us Research Program

Background: The prevalence, incidence and risk factors of atrial fibrillation (AF) in a large, geographically and ethnically diverse cohort in the United States have not been fully described.
Methods: We analyzed data from 173,099 participants of the All of Us Research Program recruited in the period 2017–2019, with 92,318 of them having electronic health records (EHR) data available, and 35,483 having completed a medical history survey. Presence of AF at baseline was identified from self-report and EHR records. Incident AF was obtained from EHR. Demographic, anthropometric and clinical risk factors were obtained from questionnaires, baseline physical measurements and EHR.
Results: At enrollment, mean age was 52 years old (range 18–89). Females and males accounted for 61% and 39% respectively. Non-Hispanic Whites accounted for 67% of participants, with non-Hispanic Blacks, non-Hispanic Asians and Hispanics accounting for 26%, 4% and 3% of participants, respectively. Among 92,318 participants with available EHR data, 3,885 (4.2%) had AF at the time of study enrollment, while the corresponding figure among 35,483 with medical history data was 2,084 (5.9%). During a median follow-up of 16 months, 354 new cases of AF were identified among 88,433 eligible participants. Individuals who were older, male, non-Hispanic white, had higher body mass index, or a prior history of heart failure or coronary heart disease had higher prevalence and incidence of AF.
Conclusion: The epidemiology of AF in the All of Us Research Program is similar to that reported in smaller studies with careful phenotyping, highlighting the value of this new resource for the study of AF and, potentially, other cardiovascular diseases.


Introduction
Atrial fibrillation (AF) is a common cardiac arrhythmia associated with reduced quality of life and increased rates of stroke, heart failure, dementia, and overall mortality [1]. Numerous studies have identified major risk factors for AF, including older age, white race, male sex, hypertension, obesity and other cardiovascular diseases [2]. In the United States, studies on the prevalence, incidence and risk factors of AF have been limited to relatively small but well-phenotyped cohorts [3][4][5], or to larger administrative or clinical databases restricted to individuals enrolled in geographically restricted healthcare plans [6], receiving care in selected medical institutions [7], enrolled in specific insurance programs [8], or residing in specific regions of the United States [9]. No prior study has described the epidemiology of AF in a large cohort representing the broad diversity of the US population in terms of ancestry, demographic variables, socioeconomic status, geographic location, and other characteristics.
Launched in 2018, the All of Us Research Program aims to recruit one million participants in the United States with the primary goal of accelerating biomedical research and improving the health of the public [10]. The program obtains data from participants through self-reported surveys, physical measurements, electronic health records (EHRs), and the collection and analysis of biological samples. Over 80% of study participants are from groups that have been traditionally underrepresented in biomedical research [10]. Still, the reliability of the data being collected, particularly with regard to their ability to identify risk factors for disease, including AF, has not yet been established. Therefore, our goal was to characterize the epidemiology of AF in the All of Us Research Program and to evaluate whether demographic patterns and risk factors of AF established in prior literature replicate in this novel research resource. Replicating these associations would bolster the value of All of Us data for future innovative research on AF and other cardiovascular diseases that takes advantage of the diversity of the study population.

The All of Us Research Program
The All of Us Research Program is a prospective cohort study aiming to recruit at least one million individuals in the United States, with the overall goal of providing a unique resource to study the effects of lifestyle, environment and genomics on health and health outcomes [10]. Participant recruitment is predominantly done through participating health care provider organizations and in partnership with Federally Qualified Health Centers, with an emphasis on recruiting persons affiliated with those centers. Interested potential participants can also enroll in the program as direct volunteers, visiting community-based enrollment sites. Initial enrollment, informed consent (including consent to share EHRs), and baseline health surveys are done digitally through the All of Us program website (https://joinallofus.org). Once this step is completed, the participant is invited to undergo a basic physical exam and biospecimen collection at the affiliated health care site. Participant follow-up is done in two ways, passively via linkage with EHR and actively by periodic follow-up surveys.
For this study, we included data from participants enrolled in the study between May 2017 and August 2019. This was one of several demonstration projects supported by the All of Us Research Program. Demonstration projects were designed to describe the cohort, replicate previous findings for data validation, and avoid novel discovery in line with the program value to ensure equal access by researchers to the data [11]. The work described here was proposed by Consortium members, reviewed and overseen by the program's Science Committee, and was confirmed as meeting criteria for non-human subjects research by the All of Us Institutional Review Board. The initial release of data and tools used in this work was published recently [11]. Results reported are in compliance with the All of Us Data and Statistics Dissemination Policy disallowing disclosure of group counts under 20.

The All of Us research hub
This work was performed on data collected by the previously described All of Us Research Program using the All of Us Researcher Workbench, a cloud-based platform where approved researchers can access and analyze All of Us data. All of Us data currently available include surveys, EHR, and physical measurements, with more data and data types slated to be available in future releases. The details of the surveys are available in the Survey Explorer found in the Research Hub, a website designed to support researchers [12]. Each survey includes branching logic, and all questions are optional and may be skipped by the participant. Physical measurements recorded at enrollment include systolic and diastolic blood pressure, height, weight, heart rate, waist and hip measurement, wheelchair use, and current pregnancy status. EHR data were linked for participants who consented to share them. All three data types (survey, physical measurements, and EHR) are mapped to the Observational Medical Outcomes Partnership (OMOP) common data model v 5.2 maintained by the Observational Health Data Sciences and Informatics (OHDSI) collaborative. To protect participant privacy, a series of data transformations were applied. These included data suppression of codes with a high risk of identification such as military status; generalization of categories, including age, sex at birth, gender identity, sexual orientation, and race; and date shifting by a random (less than one year) number of days, implemented consistently across each participant record. Documentation on privacy implementation and creation of the curated data repository is available in the All of Us Registered Tier CDR Data Dictionary [13]. The Researcher Workbench currently offers tools with a user interface built for selecting groups of participants (Cohort Builder), creating datasets for analysis (Dataset Builder), and Workspaces with Jupyter Notebooks (Notebooks) to analyze data. The Notebooks enable use of saved datasets and direct query using the R and Python 3 programming languages.
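The date-shifting transformation described above can be illustrated with a minimal Python sketch. It is not the program's actual implementation: the hashing scheme, the `secret` value, the backward direction of the shift, and the function names are all assumptions made for illustration. The only properties taken from the text are that the offset is shorter than one year and is applied consistently within a participant's record.

```python
import hashlib
from datetime import date, timedelta


def participant_shift_days(participant_id: str, secret: str = "demo-secret") -> int:
    """Derive one offset of 1..364 days per participant (hypothetical scheme)."""
    digest = hashlib.sha256(f"{secret}:{participant_id}".encode()).digest()
    # Map the first four bytes of the hash onto the range 1..364.
    return int.from_bytes(digest[:4], "big") % 364 + 1


def shift_dates(participant_id: str, dates, secret: str = "demo-secret"):
    """Shift every date in one participant's record by the same offset.

    Assumption: dates are moved backward; the source does not state a direction.
    """
    offset = timedelta(days=participant_shift_days(participant_id, secret))
    return [d - offset for d in dates]


# Example record: an enrollment date and a later diagnosis date.
original = [date(2018, 3, 1), date(2018, 9, 15)]
shifted = shift_dates("P001", original)
# The interval between the two events is unchanged by the shift.
interval_preserved = (shifted[1] - shifted[0]) == (original[1] - original[0])
```

Because the same offset is applied to every date of a participant, intervals such as time from enrollment to diagnosis are preserved, which is what allows follow-up time to be analyzed on the shifted data.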

Sample selection
Among All of Us enrollees, we selected participants 18 to 90 years of age at the time of initial enrollment, who reported either male or female sex at birth, and reported race as white, black or Asian and ethnicity as Hispanic or non-Hispanic.

Characterization of atrial fibrillation
Presence of AF was determined through answers to self-reported questionnaires and from data obtained from EHRs. At enrollment, participants complete a series of online questionnaires. Participants who answered yes to the question "Has a doctor or health care provider ever told you that you have atrial fibrillation?" were considered to have prevalent AF.
Presence of AF in the EHR was determined if the EHR data contained two or more instances of atrial fibrillation, identified with selected Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) codes. A complete list of SNOMED CT codes and their corresponding OMOP concept IDs is provided in S1 Table. If the EHR-based diagnosis occurred before participant enrollment, AF was considered to be prevalent; otherwise, the diagnosis of AF was considered incident.
Of note, presence or absence of AF was not noted on the physical measurements recorded at enrollment.
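The EHR-based definition above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study code: the code values in `AF_CODES` are hypothetical placeholders (the actual SNOMED CT codes and OMOP concept IDs are in S1 Table), and the use of the earliest AF code as the diagnosis date is an assumption, since the text does not specify which of the two or more instances dates the diagnosis.

```python
from datetime import date

# Hypothetical placeholder codes; the real list is given in S1 Table.
AF_CODES = {"AF_CODE_A", "AF_CODE_B"}


def classify_ehr_af(records, enrollment_date, af_codes=AF_CODES):
    """Classify one participant's EHR as 'none', 'prevalent' or 'incident' AF.

    records: iterable of (code, date) pairs from the participant's EHR.
    AF requires two or more instances of an AF code. Assumption: the
    diagnosis is dated at the earliest AF code.
    """
    af_dates = sorted(d for code, d in records if code in af_codes)
    if len(af_dates) < 2:
        return ("none", None)
    diagnosis_date = af_dates[0]
    # Diagnosis before enrollment is prevalent; otherwise it is incident.
    status = "prevalent" if diagnosis_date < enrollment_date else "incident"
    return (status, diagnosis_date)


enrolled = date(2018, 6, 1)
example = [
    ("AF_CODE_A", date(2018, 9, 10)),
    ("OTHER_CODE", date(2018, 9, 10)),
    ("AF_CODE_B", date(2019, 1, 5)),
]
status, diagnosed_on = classify_ehr_af(example, enrolled)
```

A participant with a single AF code is classified as having no AF under this rule, which trades some sensitivity for specificity.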

Covariates
Date of birth, sex assigned at birth, race, Hispanic ethnicity, and smoking status were self-reported using online surveys. Ever smoking was defined as having smoked over 100 cigarettes over the lifetime. Height and weight were measured by trained study technicians following a standard protocol. Systolic and diastolic blood pressure were measured three times with the participant seated, and the mean of the second and third measurements was used for analysis. The following comorbidities were ascertained via self-report in surveys and in the EHR: stroke, heart failure, coronary artery disease, and diabetes.
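The derived covariates described above can be computed as in the short Python sketch below. The function names and input units (kilograms, meters, mmHg) are assumptions for illustration. The body mass index formula is the standard weight divided by height squared, which the text does not spell out.

```python
def analysis_blood_pressure(readings):
    """Return the mean of the second and third of three seated readings."""
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    return (readings[1] + readings[2]) / 2


def ever_smoker(lifetime_cigarettes: int) -> bool:
    """Ever smoking: more than 100 cigarettes over the lifetime."""
    return lifetime_cigarettes > 100


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Standard BMI in kg/m^2 from measured weight and height."""
    return weight_kg / height_m ** 2


systolic = analysis_blood_pressure([130, 124, 120])  # first reading is discarded
```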

Statistical analysis
Participant characteristics were summarized by race/ethnicity and by participation in study components (EHR, medical history survey). Prevalence of AF was calculated as the proportion of participants with AF among all participants. We performed separate analyses among individuals participating in the medical history survey and among those whose EHR data were available. For each analysis, we calculated overall prevalence as well as sex, race/ethnicity, and age-specific prevalence. The incidence rates of newly diagnosed AF were calculated as the number of new AF diagnoses after enrollment date divided by person-years of follow-up among those without evidence of AF at enrollment and with EHR data. We calculated incidence of AF overall and by sex, race/ethnicity, and age. Association of risk factors with prevalent AF was evaluated with odds ratios (OR) and 95% confidence intervals (95% CI) from multivariable logistic regression adjusting for age, sex, and race/ethnicity in all models, while associations with incident AF were estimated with hazard ratios (HR) and 95% CI obtained from Cox regression models, also adjusting for age, sex, and race/ethnicity. Time to event was defined as the time in days between baseline and date of incident AF or August 31, 2019, whichever occurred earlier.
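The prevalence, follow-up time and incidence rate calculations described above can be sketched in Python as follows. This is a minimal illustration, not the analysis code: the function names, the 365.25-day year and the per-1,000 person-years scaling are assumptions. The regression models are not reproduced.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2019, 8, 31)  # administrative end date from the text


def prevalence(cases: int, total: int) -> float:
    """Proportion of participants with AF among all participants."""
    return cases / total


def follow_up(baseline: date, af_date=None, end: date = END_OF_FOLLOW_UP):
    """Return (days at risk, event indicator) for one participant.

    Follow-up stops at incident AF or at the end date, whichever is earlier.
    """
    if af_date is not None and af_date <= end:
        return ((af_date - baseline).days, True)
    return ((end - baseline).days, False)


def incidence_rate_per_1000_py(events: int, person_days: float) -> float:
    """New AF diagnoses divided by person-years, scaled to 1,000 person-years."""
    person_years = person_days / 365.25  # assumption: 365.25 days per year
    return 1000 * events / person_years


ehr_prevalence = round(100 * prevalence(3885, 92318), 1)     # 4.2
survey_prevalence = round(100 * prevalence(2084, 35483), 1)  # 5.9
```

The two prevalence values use the counts reported in this study and match the 4.2% and 5.9% given in the Results. Total person-years of follow-up are not reported in the text, so no incidence rate is computed here.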

Results
Between May 2017 and August 2019, 173,099 individuals enrolled in the All of Us program and met inclusion criteria. Of these individuals, 35,483 (21%) had completed the medical history survey and 92,318 (53%) had EHR data available at the time of dataset creation. There were 20,683 participants who had both medical history survey and EHR data (12% of all eligible individuals). Characteristics of study participants by race/ethnicity are presented in Table 1, while characteristics by participation in study components are provided in S2 Table. Approximately two thirds of participants were non-Hispanic Whites, with non-Hispanic Blacks, non-Hispanic Asians and Hispanics accounting for 26%, 4%, and 3% of participants, respectively. Overall mean age (standard deviation) was 52 (17) years, and females accounted for 61% of all participants. Prevalence of most cardiovascular diseases and risk factors was highest in non-Hispanic Blacks and lowest in non-Hispanic Asians (Table 1). Participants with both EHR and survey data were slightly older, more likely to be female and non-Hispanic White, and had lower prevalence of cardiovascular risk factors (smoking, diabetes) and cardiovascular diseases (stroke, heart failure, coronary heart disease) than those without data from the EHR or the medical history survey (S2 Table). Fig 2 presents the overlap in participants based on availability of EHR and medical history data and diagnosis of AF. In the 20,683 participants with overlapping EHR and survey data, 1,486 had AF either from EHR or medical history survey, with 717 reported in both (48%), 208 in the EHR only (14%), and 561 in the medical survey only (38%).
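The overlap percentages for participants with both data sources can be checked with a few lines of Python. The counts are those reported above; the variable names are illustrative only.

```python
# AF counts among the 20,683 participants with both EHR and survey data.
af_by_source = {"both": 717, "ehr_only": 208, "survey_only": 561}

# Participants with AF in at least one source.
any_af = sum(af_by_source.values())

# Share of each category among participants with AF in any source.
shares = {name: round(100 * n / any_af) for name, n in af_by_source.items()}
```

The total is 1,486 and the rounded shares are 48%, 14% and 38%, in agreement with the reported figures.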

PLOS ONE
Among eligible All of Us participants, 35,483 provided data on prior history of medical conditions, including AF. Of them, 2,084 (5.9%) reported a diagnosis of AF. Prevalence of self-reported AF by age group, race/ethnicity and sex is reported in Table 2. Prevalence increased with age, from <1% among participants younger than 40 to >20% in those 80 and older. Males and non-Hispanic Whites reported higher prevalence than females and other racial/ethnic groups. These differences persisted after simultaneous adjustment for age, sex and race/ethnicity, with males having approximately 70% higher odds of AF than females (OR 1.7, 95% CI 1.5, 1.8), and groups other than non-Hispanic Whites having 25–43% lower odds of self-reported AF than non-Hispanic Whites. The associations were similar when the analysis was restricted to the 20,683 participants who completed the medical history survey and had available EHR data (S3 Table).
A similar pattern was observed among 92,318 All of Us participants with linked EHR. A total of 3,885 (4.2%) participants had evidence of prevalent AF in their EHR at the time of enrollment. During a median follow-up of 16 months, an additional 354 (0.4%) participants were diagnosed with AF after enrollment. Age, sex, and race/ethnicity-specific prevalence and incidence of AF are shown in Table 3. Older age was strongly associated with higher prevalence of AF. For example, compared to those younger than 40, the OR of prevalent AF among those 80 and older was 66.2 (95% CI 51.5, 85.1). Males had 90% higher odds of prevalent AF than females (OR 1.9, 95% CI 1.8, 2.0), while odds of prevalent AF were lower in all race/ethnicity groups compared to non-Hispanic Whites. Similarly, older age and male sex were also associated with higher rates of incident AF. Compared to non-Hispanic Whites, both non-Hispanic Asians and non-Hispanic Blacks experienced lower rates of AF (HR 0.41, 95% CI 0.15, 1.1, and HR 0.55, 95% CI 0.40, 0.76, respectively). Rates of AF in Hispanics were higher than in non-Hispanic Whites, but this estimate is based on a small number of AF events (<20). Table 4 shows the association of selected cardiovascular risk factors and cardiovascular diseases with prevalent and incident AF among participants with EHR data. Higher BMI and prior history of heart failure or coronary heart disease were associated with increased prevalence and incidence of AF. Diabetes and stroke history were associated with prevalent AF but not incident AF. Finally, neither blood pressure nor smoking status was a risk factor for AF in this population. The pattern of associations was comparable among the 20,683 participants who completed the medical history survey and had available EHR data (S3 Table).

Discussion
Our analysis describes the epidemiologic characteristics of AF among the diverse participants in the All of Us Research Program. The main findings were that both prevalence and incidence of AF increased with age, were higher in males than in females, and were higher in non-Hispanic Whites than in other racial/ethnic groups. These patterns are comparable to those from other epidemiologic studies in the US (S4 and S5 Tables). Previously described risk factors for AF, including obesity, diabetes, and prevalence of some cardiovascular diseases, were associated with the risk of AF. The overall findings from this analysis are consistent with established observations about the epidemiology of AF. Over the last two decades, seminal community-based cohort studies have demonstrated that older age, male sex and being non-Hispanic White are strongly associated with a higher prevalence and incidence of AF [3][4][5][6]. These same studies have also helped identify the major risk factors for AF, including hypertension, obesity, and prior history of other cardiovascular diseases [3,14].
Fig 2. The intersection of these variables leads to eight different groups: (1) among participants with both EHR and medical survey, those with AF in the EHR and the medical survey (n = 717), AF in the medical survey but not in the EHR (n = 561), AF in the EHR but not in the medical survey (n = 208), and no AF in any data source (n = 19,197); (2) among participants with EHR but not medical survey, AF in the EHR (n = 3,314) and no AF (n = 68,322); and (3) among participants with medical history survey but not EHR, AF in the medical survey (n = 806) and no AF (n = 13,994). The diagram does not include 65,982 participants with neither EHR nor medical history survey.
https://doi.org/10.1371/journal.pone.0265498.g002

Analyses of administrative databases, such as Medicare, have complemented findings from traditional cohort studies on the epidemiology of AF, offsetting their limitations in validity of disease and risk factor phenotyping by their large sample size and broader geographic distribution [8,9]. In the United States, however, these administrative databases are limited to certain age groups (Medicare beneficiaries older than 65), individuals from specific socioeconomic strata (Medicaid beneficiaries), types of healthcare facilities, or geographic area (state-based databases). Nonetheless, their findings are consistent with those of traditional cohort studies [3][4][5][14].
In comparison with other studies, the All of Us Research Program provides a unique opportunity to characterize the epidemiology and risk factors of common chronic conditions, including AF, in a large and extremely diverse population. Given its specific focus on recruiting groups previously underrepresented in biomedical research, findings from All of Us have the potential of being more generalizable to the overall population. In this context, our analysis has replicated previously described aspects of the epidemiology of AF. As the All of Us Research Program develops and other types of data become available, including genomic information, the rich demographic and geographic diversity among participants will enhance research among underrepresented groups. For example, the most recent published genome-wide association study of AF included fewer than 9,000 Blacks, of whom 1,307 had AF, and approximately 3,000 Hispanics (277 AF cases) [15]. The All of Us Research Program will eventually provide data orders of magnitude larger, making it possible to understand the influence of ancestry on AF risk and the impact that socioeconomic adversity, which disproportionately affects some underrepresented groups, has on AF risk and outcomes [16,17].
Most of the explored associations were consistent with previous literature. Older age, male sex and non-Hispanic white race, prevalent diabetes, a prior history of cardiovascular disease (heart failure, stroke, coronary artery disease), as well as higher body mass index, were all associated with higher risk of AF. Most of these risk factors are part of established scores for the prediction of AF [14]. However, elevated systolic blood pressure, which is a consistently described risk factor for AF, was not associated with higher AF risk in the All of Us sample. This may be because the analysis did not account for the use of antihypertensive medication.
We draw two major conclusions from these results. First, by observing the expected associations of demographic and clinical factors with the risk of AF, we are indirectly supporting the validity of the All of Us Research Program data for future studies on the epidemiology of AF and its outcomes. Second, our findings provide additional evidence suggesting that AF rates are highest among non-Hispanic Whites. Ancestry-related genetic factors seem to be partly responsible for this, as shown by higher risk of AF associated with increased degrees of European ancestry among African Americans [18]. Though a few reports suggest that under-ascertainment among non-Whites could partially explain these differences [19], most studies using extended electrocardiographic recordings or pacemaker data and, therefore, less likely to have biased ascertainment, confirm these racial/ethnic differences [20][21][22][23].
Our analysis has some major strengths, including the large sample size and the racial/ethnic diversity of the study population. Absence of validated AF diagnoses, lack of AF ascertainment at the time of enrollment physical measurements, and missing EHR or medical survey data in large subsets of the study population can be considered major weaknesses. Previous studies, however, have demonstrated the validity of administrative diagnostic data to identify AF diagnoses [24]. Also, using medical history or EHR data resulted in comparable patterns of AF prevalence by age, sex and race/ethnicity, indicating that both sources of information may offer valid data.
In conclusion, in a novel research resource such as the All of Us Research Program, we identified epidemiologic patterns of AF similar to what studies with different designs and methods have reported. These findings provide indirect evidence of the validity of data collected by the All of Us Research Program and support the value of this resource to conduct studies on the epidemiology of AF and, potentially, other cardiovascular diseases.