Genome-wide study of resistant hypertension identified from electronic health records

Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study for resistant hypertension, we defined and identified cases of resistant hypertension and hypertensives with treated, controlled hypertension among >47,500 adults residing in the US linked to electronic health records (EHRs) and genotyped as part of the electronic MEdical Records & GEnomics (eMERGE) Network. Electronic selection logic using billing codes, laboratory values, text queries, and medication records was used to identify resistant hypertension cases and controls at each site, and a total of 3,006 cases of resistant hypertension and 876 controlled hypertensives were identified among eMERGE Phase I and II sites. After imputation and quality control, a total of 2,530,150 SNPs were tested for an association among 2,830 multi-ethnic cases of resistant hypertension and 876 controlled hypertensives. No test of association was genome-wide significant in the full dataset or in the dataset limited to European American cases (n = 1,719) and controls (n = 708). The most significant finding was CLNK rs13144136 at p = 1.00x10^-6 (odds ratio = 0.68; 95% CI = 0.58–0.80) in the full dataset with similar results in the European American only dataset. We also examined whether SNPs known to influence blood pressure or hypertension also influenced resistant hypertension. None was significant after correction for multiple testing. These data highlight both the difficulties and the potential utility of EHR-linked genomic data to study clinically-relevant traits such as resistant hypertension.


Introduction
Hypertension, or high blood pressure, affects nearly a third of the US adult population and is a major risk factor for coronary heart disease, stroke, and kidney disease [1,2]. Pharmacological treatment of hypertension has been shown to lower blood pressure and substantially reduce the risk of coronary heart disease and stroke [3,4]. While the number of patients tested and receiving treatment for hypertension has increased substantially over the last 40 years, the blood pressure of roughly half (~50%) of hypertensive Americans remains uncontrolled [1,5,6]. Resistant hypertension is defined as blood pressure that remains ≥140/90 mm Hg despite use of three concurrent antihypertensive agents from different classes, one of which is a diuretic [7]. Other definitions also include individuals who require four or more medications to achieve a blood pressure <140/90 mm Hg. The prevalence of resistant hypertension is progressively increasing and is currently estimated to affect 8-12% of adults with hypertension [8].
Along with age and obesity, well-established independent risk factors of resistant hypertension, several other lifestyle and biological risk factors are believed to contribute to resistant hypertension, including excess alcohol use, increased dietary sodium intake, and use of several classes of medications (such as non-steroidal anti-inflammatory drugs, corticosteroids, and calcineurin inhibitors) [9]. Genetic factors may also play a role [10]. Numerous genetic variants have been identified in studies of the genetic architecture of hypertension and variation in blood pressure [11,12]. Additionally, hypertension pharmacogenomics research has implicated genes important in the inter-individual variability of response to specific antihypertensive drugs [13][14][15]. However, due to the small sample sizes, genetic studies of resistant hypertension have been limited in power and in scope (i.e., candidate gene studies) [16][17][18][19][20][21].
To help overcome the obstacle of limited sample size, we used data available from seven electronic health record (EHR) systems to identify both resistant hypertension cases and hypertensive individuals who responded well to a single antihypertensive medication (controlled hypertensives) in the electronic MEdical Records & GEnomics (eMERGE) Network [22,23]. We then performed a genome-wide association study (GWAS) to identify common variants associated with resistant hypertension in 2,830 cases and 876 controls. We also assessed whether SNPs previously associated with blood pressure or hypertension in the literature were associated with resistant hypertension, a complex and clinically-relevant phenotype.

Materials and methods

eMERGE Network
The eMERGE Network and its studies are approved by the Institutional Review Board at each study site, which include Geisinger Health System, Group Health, Marshfield Clinic, Mayo Clinic, Mount Sinai School of Medicine, Northwestern University, University of Washington, and Vanderbilt University [22,23]. Participants at all study sites except for Vanderbilt University provided written, informed consent. Vanderbilt University's biobank BioVU followed an opt-out model where DNA was extracted from discarded blood drawn for clinical purposes. The DNA linked to de-identified EHRs is considered "non-human subjects" as no personal identifying information is available to the investigators (The Code of Federal Regulations, 45 CFR 46.102 (f)) [24,25].
The eMERGE Network, initially funded in 2007 by the National Human Genome Research Institute (NHGRI), consisted of five EHR-linked biorepositories and a coordinating center at the initiation of this study [22]. In brief, the five biorepositories included in eMERGE Phase I were: Group Health/University of Washington (GH/UW), Marshfield Clinic (MFC), Mayo Clinic (MC), Northwestern University (NU), and Vanderbilt University (VU) (Fig 1). Among the five original biorepositories, two used locally-developed EHR systems (MFC and VU), and three used commercial systems with local modifications (NU, MC, and GH/UW) [26]. A total of 16,029 European Americans and 2,634 African Americans were selected from eMERGE Phase I study sites and genotyped on the Illumina 660W-Quad or 1M as previously described [27] and briefly described below.
In the second phase of eMERGE, the number of study sites expanded to include Children's Hospital of Philadelphia, Boston Children's Hospital, Cincinnati Children's Hospital Medical Center, Geisinger Health System (GHS), and Mount Sinai School of Medicine (MSSM) [23]. In this study of adults with resistant hypertension, data from the eMERGE Phase II study sites GHS and MSSM were included (Fig 1), both of which used a commercial EHR system [23].

Selection of resistant hypertension cases and controls
There are multiple definitions of resistant hypertension in the literature [28], including multiple subgroups and classifications of "uncontrolled" and "controlled." International guidelines such as The National Institute for Health and Care Excellence define uncontrolled resistant hypertension as an individual with systolic or diastolic blood pressure measures >140 or >90 mm Hg, respectively, after use of three classes of anti-hypertensive medications concurrently [29]. Controlled resistant hypertension is defined by the American Heart Association Scientific Statement [7] as the concurrent use of at least four antihypertensive medication classes.
Defining any resistant hypertension group within the EHR requires a patient's hypertensive status pre- and post-antihypertensive medication use (based on issued prescriptions) and the medication classes of the anti-hypertensives prescribed. To capture these data, two algorithms were deployed in the eMERGE Network to identify resistant hypertension cases. For the first case algorithm ("controlled case"), each case required the concurrent use of at least four antihypertensive medication classes. Medication classes considered for these definitions included angiotensin converting enzyme inhibitors or angiotensin receptor blockers, beta blockers, non-dihydropyridine calcium channel blockers, dihydropyridine calcium channel blockers, hydralazine, minoxidil, central alpha agonists, direct renin antagonists, aldosterone antagonists, alpha antagonists, and diuretics (thiazides, K-sparing, and loop diuretics) (S1 Table). Direct alpha antagonists (e.g., phentolamine and phenoxybenzamine) were excluded from the antihypertensive medication class as they are typically given to counteract pathologic adrenergic states (e.g., pheochromocytoma), which were excluded from case and control groups. For eMERGE study sites using natural language processing as part of the data extraction process for medications, the algorithm required a dose, strength, route, or frequency present with the medication name to ensure the medication mentioned represents a prescribed medication.
For the second case algorithm ("uncontrolled case"), each case required the concurrent use of at least three antihypertensive medication classes and a systolic blood pressure >140 mm Hg or diastolic blood pressure >90 mm Hg for at least one month after meeting the medication criteria. For both case definitions, patients were excluded if they had systolic heart failure (defined as an ejection fraction ≤35%) or chronic kidney disease. Chronic kidney disease was defined as an estimated glomerular filtration rate ≤30 ml/min as calculated using the Modification of Diet in Renal Disease formula [30]. Other exclusion criteria for cases and controls are given in S2 Table. Patients meeting either resistant hypertension case definition were combined for all analyses.
Controlled hypertensives (the "controls" for this association study) were defined as patients who met three criteria. First, they had a measurement of systolic blood pressure >140 mm Hg or diastolic blood pressure >90 mm Hg prior to meeting the medication criteria, or ICD-9 codes for hypertension (401.x) at any time. Second, they were prescribed one medication from the medication classes described above and never had more than one medication class simultaneously, although the medication class could change. Third, all of their systolic and diastolic measurements were <135 mm Hg and <90 mm Hg, respectively, from one month after blood pressure medications were prescribed (this requires at least one blood pressure measurement). For both cases and controls, simultaneous medication class usage was defined as evidence that the patient was taking the medication concurrently based on the presence of the medications in the same medication list (e.g., problem list, clinic note, or discharge summary) or via medication refill data with accompanying evidence of overlapping prescriptions for each drug. Like the resistant hypertension cases, individuals with systolic heart failure or chronic kidney disease were excluded from the controls.
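The case and control definitions above can be summarized as a small decision function. The following Python sketch is illustrative only and is not the eMERGE algorithm deployed at the study sites (which is available at PheKB): the `PatientSummary` record, its field names, and the `classify` function are hypothetical, the follow-up blood pressure list is assumed to contain only readings taken at least one month after the medication criteria were met, and all exclusion criteria are collapsed into a single flag.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary already extracted from the EHR."""
    max_concurrent_classes: int          # most antihypertensive classes in use at once
    followup_bp: List[Tuple[int, int]]   # (systolic, diastolic) in mm Hg, >= 1 month after medication criteria
    hypertensive_at_baseline: bool       # elevated BP before medication, or an ICD-9 hypertension code
    has_exclusion: bool                  # systolic heart failure, chronic kidney disease, other exclusions


def classify(p: PatientSummary) -> Optional[str]:
    """Return a case or control label, or None if the patient fits neither group."""
    if p.has_exclusion:
        return None
    # First case algorithm: at least four concurrent medication classes.
    if p.max_concurrent_classes >= 4:
        return "case (controlled)"
    # Second case algorithm: at least three classes and blood pressure still elevated.
    elevated = any(sbp > 140 or dbp > 90 for sbp, dbp in p.followup_bp)
    if p.max_concurrent_classes >= 3 and elevated:
        return "case (uncontrolled)"
    # Controls: a single class at a time and every follow-up reading at goal.
    all_at_goal = bool(p.followup_bp) and all(
        sbp < 135 and dbp < 90 for sbp, dbp in p.followup_bp
    )
    if p.max_concurrent_classes == 1 and p.hypertensive_at_baseline and all_at_goal:
        return "control"
    return None
```

Patients on two or three classes whose follow-up readings are at goal fall into neither group in this sketch, which mirrors the gap between the case and control definitions in the text.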
Electronic selection logic using billing codes, laboratory values, text queries, and medication records was used to identify resistant hypertension cases and controls at each site (S1 File). The initial algorithm used to identify cases and controls for this study of resistant hypertension was created and iteratively refined at VU with input from local experts as well as the eMERGE Phenotyping Workgroup and then deployed at the other study sites. At VU, physicians not associated with the algorithm development reviewed the de-identified clinical record of a fraction of identified cases and controls to assist in the refinement of the algorithm. In this iterative process, the algorithm was refined until a positive predictive value (PPV) of 92% was achieved based on reviews of 50 randomly selected clinical cases and a PPV of 84% for 50 randomly selected controls. Each round of reviews was independent and did not draw from the same set of identified cases and controls. Evaluation at the other four eMERGE I study sites confirmed PPVs of 84%-100%, including some sites that augmented the algorithmic selection with manual review. The general process of phenotype algorithm development in the eMERGE Network has been previously described [31], and the final version of workflows and algorithms are available at PheKB [32].
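As a worked illustration of the validation step, the sketch below computes a positive predictive value from chart-review counts and adds a Wilson score interval to show how much uncertainty a 50-record review carries. The counts 46 of 50 and 42 of 50 are implied by the reported PPVs of 92% and 84% rather than stated in the text, and the interval is an addition for illustration, not part of the reported analysis; the function names are hypothetical.

```python
import math


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: confirmed records / all records flagged by the algorithm."""
    return true_positives / (true_positives + false_positives)


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (default: approximately 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Counts implied by the reported PPVs for 50 reviewed cases and 50 reviewed controls.
case_ppv = ppv(46, 4)
control_ppv = ppv(42, 8)
case_low, case_high = wilson_interval(46, 50)
print(f"case PPV = {case_ppv:.2f}, approximate 95% interval = ({case_low:.2f}, {case_high:.2f})")
print(f"control PPV = {control_ppv:.2f}")
```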

Genotyping
Genotyping was performed for eMERGE I study sites using two Illumina arrays at two genotyping centers. Individuals of self-identified or administratively-assigned European-descent were genotyped on the Illumina 660W-Quad, while individuals of self-identified or administratively-assigned African-descent were genotyped on the Illumina 1M. For the majority of patients, genotyping was performed at one of two centers: the Center for Inherited Disease Research (CIDR) at Johns Hopkins University and the Center for Genotyping and Analysis at the Broad Institute as previously described [33,34]. Existing genotype data available for eMERGE II study sites included data from the Illumina 550 (MC; n = 18), Illumina 610 (MC; n = 7), Illumina HumanOmni Express (GHS; n = 3,111), and Affymetrix 6.0 (MSSM; n = 2,775) [35].
Quality control (QC) measures were developed by the eMERGE Genomics Workgroup [33] and were implemented by the Coordinating Center. Briefly, the QC process included examination of sample quality and composition (i.e., sex inconsistency checks, sample call rates, sample relatedness, and population stratification), marker quality (i.e., marker call rate, duplicate concordance, minor allele frequency, and Hardy-Weinberg equilibrium), and of batch effects (i.e., average call rates and minor allele frequency per plate) [33, 34]. Following QC, analyses were limited to patients with call rates >98% and to SNPs with call rates >99% and minor allele frequencies >5%. Race/ethnicity was self-identified or administratively-assigned, and genetic ancestry was assessed using STRUCTURE [36] and EIGENSTRAT [37]. As reported previously, administratively-assigned race/ethnicity is highly concordant with genetic ancestry for European and African Americans [38, 39].
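The three post-QC thresholds can be made concrete with a small sketch. The function below is a simplified illustration, not the eMERGE QC pipeline: it assumes genotypes are stored as a samples-by-SNPs list of lists with allele counts 0, 1, or 2 and `None` for a missing call, and it assumes samples are filtered before SNPs, an ordering the text does not specify. The name `qc_filter` and its parameters are hypothetical.

```python
def qc_filter(genotypes, sample_min=0.98, snp_min=0.99, maf_min=0.05):
    """Return (kept sample indices, kept SNP indices) after call-rate and MAF filters.

    genotypes: list of per-sample lists; each entry is 0, 1, 2 (coded-allele count)
    or None (missing call).
    """
    n_snps = len(genotypes[0])
    # Sample call rate: fraction of SNPs with a non-missing call.
    keep_samples = [
        i for i, row in enumerate(genotypes)
        if sum(g is not None for g in row) / n_snps > sample_min
    ]
    keep_snps = []
    for j in range(n_snps):
        calls = [genotypes[i][j] for i in keep_samples if genotypes[i][j] is not None]
        # SNP call rate among the retained samples.
        if not keep_samples or len(calls) / len(keep_samples) <= snp_min:
            continue
        freq = sum(calls) / (2 * len(calls))   # coded-allele frequency
        maf = min(freq, 1 - freq)              # minor allele frequency
        if maf > maf_min:
            keep_snps.append(j)
    return keep_samples, keep_snps
```

With the default arguments the function applies the thresholds stated above (sample call rate >98%, SNP call rate >99%, minor allele frequency >5%).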
All genotype data were imputed by Pennsylvania State University Center for Systems Genomics as part of the eMERGE Coordinating Center as previously described [35]. Briefly, all autosomes were imputed using the 1000 Genomes cosmopolitan reference panel (n = 1,092) [40] using SHAPEIT2 [41] and IMPUTE2 [42] for phasing and imputation, respectively. Imputation was performed for each study site and genotyping assay separately and then merged for further quality control and analysis.

Statistical methods
For the GWAS, single-SNP tests of association were performed in PLINK [43] and PLATO [44] using logistic regression and assuming an additive genetic model. All associations were adjusted for sex, decade of birth, median body mass index (BMI in kg/m^2), genotyping platform, and genetic ancestry (via ten principal components [PCs] for multi-ethnic analyses and sex-stratified analyses, and three PCs for European-descent only analyses). Analyses were performed in the combined eMERGE I and eMERGE II datasets (3,006 cases and 876 controls). The models did not converge in the combined dataset, and inspection of the distribution of cases and controls by study site and genotyping platform revealed several strata with only case counts but no control counts. Removal of the case-only counts (231) enabled the models to converge.
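Under an additive model, logistic regression estimates a per-allele log-odds coefficient (beta), and the reported odds ratio is exp(beta) with a 95% confidence interval of exp(beta ± 1.96 × SE). The sketch below runs that relationship in reverse, using the odds ratio and confidence interval reported in the abstract for CLNK rs13144136 to recover an approximate beta, standard error, and Wald p-value. It is a back-of-the-envelope check under the assumption of a Wald test, not output from PLINK or PLATO, and because the inputs are rounded to two decimals the result matches the reported p-value only in order of magnitude. The function names are hypothetical.

```python
import math


def beta_and_se_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-odds coefficient and its standard error from an OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta, se


def two_sided_wald_p(beta, se):
    """Two-sided p-value for z = beta / se under a standard normal distribution."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported for CLNK rs13144136 in the full dataset.
beta, se = beta_and_se_from_or(0.68, 0.58, 0.80)
p_value = two_sided_wald_p(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}, approximate p = {p_value:.1e}")
```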
The results of the GWAS tests of association were plotted using a Manhattan plot. Manhattan plots were generated by the R statistical package using code provided by the blog "Getting Genetics Done" (http://www.gettinggeneticsdone.com/2011/04/annotated-manhattan-plotsand-qq-plots.html), and regional plots were generated using LocusZoom at default settings [45]. Reported p-values were not corrected for multiple comparisons.
SNPs previously associated with blood pressure, systolic blood pressure, diastolic blood pressure, or hypertension among adults at genome-wide significance were drawn from the NHGRI European Bioinformatics Institute (NHGRI-EBI) GWAS Catalog (http://www.ebi.ac.uk/gwas/; accessed September 2016) [46,47]. We then performed a look-up of these previously-associated blood pressure variants in the eMERGE Network resistant hypertension multi-ethnic GWAS dataset. Ad hoc power calculations were performed using CaTS [48].

Identification of resistant hypertensive cases and controls
A total of 3,006 resistant hypertension (uncontrolled and controlled) cases and 876 controlled hypertensives were identified from the EHRs in eMERGE I and II (Table 1). Slightly more than half the cases (55%) were identified by the first case algorithm (see Methods). Cases and controls were drawn from seven study sites of eMERGE I and II, with VU (31%) and MFC (21%) contributing the largest percentage of individuals (S3 Table). For both cases and controls, nearly half were male, and the median BMI was in the overweight category (Table 1).

Discovery
To identify common variants associated with risk of resistant hypertension, a genome-wide association study of 2,530,150 SNPs was performed among all resistant hypertension cases (n = 2,830) and controls (n = 876) adjusted for sex, decade of birth, median BMI, genotyping platform, and ten PCs. A single association at genome-wide significance was observed between case status and ESR1 rs9479122 at p = 6.28x10^-11 with an odds ratio of 0.46 (S1 and S2 Figs). When restricted to European Americans only (n = 1,609 cases and n = 667 controls), ESR1 rs9479122 remained the most significant association, p = 1.12x10^-16 (odds ratio = 0.29; Table 2; S3-S5 Figs).
To corroborate these potential findings, we performed a look-up for ESR1 rs9479122 in two independent datasets, INVEST [21,49] and SPS3 [50]. INVEST was a large hypertension outcomes trial, and SPS3 was a secondary stroke prevention trial with a blood pressure arm. Both trials conducted genetic sub-studies and resistant hypertension phenotypes were constructed in a manner similar to that constructed in eMERGE. In INVEST and SPS3, ESR1 rs9479122 is nearly monomorphic (coded allele G frequency 0.996 and 0.99937, respectively), which is in agreement with HapMap CEU estimates. ESR1 rs9479122 is imputed in this eMERGE dataset [35].
Given the differences in the frequency of the imputed variant in eMERGE versus reference data and the look-up datasets, we deemed ESR1 rs9479122 a likely false positive, removed it from the dataset, and re-generated the Manhattan plots (Figs 2 and 3). No tests of association were genome-wide significant in either the overall dataset or among European Americans only.

Table 2. SNPs associated with resistant hypertension at p<10^-6 in the eMERGE Network in the genome-wide association study.
After removal of ESR1 rs9479122, single-SNP tests of association were performed for 2,530,149 SNPs in eMERGE I and II using logistic regression, assuming an additive genetic model, adjusted for sex, decade of birth, median body mass index, genotyping platform, and genetic ancestry (principal components 1 through 10). Results are shown for tests of association in the eMERGE I and II Network at p<10^-6. Tests of association were repeated for European Americans only using logistic regression, assuming an additive genetic model, adjusted for sex, decade of birth, median body mass index, genotyping platform, and genetic ancestry (principal components 1 through 3). Results are also shown for European Americans for SNPs associated with resistant hypertension at p<10^-6 in the eMERGE I and II Network. Abbreviations: basepair (bp), chromosome (chr), coded allele (CA), coded allele frequency (CAF).

Association of previously-identified hypertension and blood pressure variants with resistant hypertension
We examined whether SNPs known to influence blood pressure or hypertension also influenced resistant hypertension. The NHGRI-EBI GWAS catalog was used to select SNPs previously associated (p<5.0x10^-8) with blood pressure, systolic blood pressure, diastolic blood pressure, or hypertension. From published GWAS papers [51][52][53][54][55][56][57][58][59][60][61][62][63][64][65][66][67][68][69], 118 SNPs were selected for examination in the current genome-wide studies (S4 Table).
Of the 118 SNPs, 50 were available in the eMERGE I and II imputed dataset. Of the 50 tests of association performed in the present resistant hypertension study, no association was both significant (p<0.001) and in the same direction as in the previously published literature. The most significant finding in the present study was TBX3 rs35444, previously associated with increased diastolic blood pressure in East Asians [51], which was associated with resistant hypertension (eMERGE I and II combined odds ratio = 1.27; p = 1.30x10^-3) in this multi-ethnic clinical study. With 2,830 cases and 876 controls, we were powered to replicate genetic associations for common variants (allele frequency ≥5%) with an odds ratio ≥1.5, assuming an additive genetic model and a significance threshold of 0.001. As might be expected given that the SNPs tested were from previously reported GWAS, all but one (SLC39A8 rs13107325) of the 50 SNPs met the minor allele frequency threshold assumed in the power calculations based on the 1000 Genomes Project phase 3 genotype data from 2,500 worldwide individuals [70]. Most of the previously-reported associations tested here for associations with resistant hypertension were studies of quantitative traits (such as diastolic and systolic blood pressures) with effect sizes (β) ranging from 0.17 to 5.48 (median = 0.521). Of the few studies reporting odds ratios for associations with hypertension, only one reported an odds ratio >1.5 [71]. Although direct comparisons are difficult to make between the present study's dichotomous outcome and previous studies' quantitative outcomes (S4 Table), it is likely that the present study was under-powered to replicate or generalize previous findings from blood pressure GWAS.
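The look-up threshold of 0.001 is what a Bonferroni correction yields for 50 tests if the family-wise error rate is set to 0.05; that starting value of 0.05 is an assumption here, since the text reports only the threshold and the number of tests. The short sketch below applies it to the most significant look-up result to show why TBX3 rs35444 falls just short. The function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


N_LOOKUP_TESTS = 50   # previously reported SNPs available in the imputed dataset
FAMILY_ALPHA = 0.05   # assumed family-wise error rate (not stated in the text)

threshold = bonferroni_threshold(FAMILY_ALPHA, N_LOOKUP_TESTS)
tbx3_p = 1.30e-3      # reported p-value for TBX3 rs35444
tbx3_significant = tbx3_p < threshold

print(f"per-test threshold = {threshold:.4f}")
print(f"TBX3 rs35444 significant after correction: {tbx3_significant}")
```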

Discussion
We demonstrate here that DNA biobanks linked to EHRs can be used to classify resistant hypertension cases and controls and to perform large-scale genome-wide association studies. Using an algorithm based on a combination of billing codes, laboratory values, text queries, and prescription records, we were able to identify 876 controlled hypertensives and 3,006 cases of resistant hypertension from a sample of >55,000 US adults available in eMERGE Phase I and II [72]. Overall, no variants were associated with resistant hypertension at genome-wide significance in either the full dataset or the dataset limited to European Americans. Also, none of the loci previously identified for hypertension and/or blood pressure was associated with resistant hypertension in cases and controls drawn from eMERGE I and II after correction for multiple testing. It is unclear if the lack of associations observed here represents a lack of sufficient power, a lack of accurate phenotyping, or in the case of the hypertension/blood pressure-associated variants, a lack of biological relevance or connection to the phenotype of resistant hypertension.
This present work expands on earlier studies of the genetics of resistant hypertension. The handful of studies that have specifically investigated resistant hypertension have mostly focused on candidate genes: β and γ subunits of the epithelial sodium channel (ENaC) [17], cytochrome P450 gene (CYP3A5) [18], glutathione S-transferase mu type 1 gene (GSTM1) [16], and endothelial nitric oxide synthase gene (NOS3) [19,20]. However, these studies suffered from several limitations including small sample sizes (ranging from 11 to 347 cases) and less-than-stringent significance thresholds. Of note is the large-scale candidate gene study in INVEST [21] that tested for an association between resistant hypertension (with >500 total cases) and approximately 50,000 common genetic variants targeted by the Illumina HumanCVD Genotyping BeadChip [73]. The most significant finding in INVEST (ATP2B1 rs12817819) did not replicate in an independent dataset, but reached chip-wide significance after meta-analysis [21].

Strengths and limitations
Major strengths of our study include the scale and longitudinal nature of the EHR. The combination of collaboration among research groups with access to biobanks and advances in high-throughput technology allowed us to genotype hundreds of thousands of genetic variants and impute to more than two million SNPs in >2,800 resistant hypertensive patients, enabling us to test for novel associations that may not have been interrogated via traditional candidate gene studies. However, compared with GWAS of common diseases, studies requiring exposure to medications or other treatments face additional challenges [74]. While GWAS of common diseases such as type 2 diabetes [75,76] or lipid levels [77,78] are able to accrue tens to hundreds of thousands of individuals for study, obtaining adequate numbers for some clinical traits is challenging as observational cohorts often do not have sufficient data related to drug exposure or treatments, and existing sets of genotyped clinical records are still relatively small. Indeed, even with our fairly large sample size, we were still underpowered to detect small effects at genome-wide significance. Specifically, for common SNPs (minor allele frequency = 0.20), we had 80% power to detect an odds ratio of 1.41.
Statistical power has been a major obstacle for studies designed to identify genetic variants associated with blood pressure in general. Indeed, even contemporary studies with access to genome-wide and imputed data of millions of common variants estimate that one-third to one-half the variability observed for systolic and diastolic blood pressure levels can be attributed to the additive effects of many genetic variants; to date, only ~2% of blood pressure variation can be explained by known blood pressure-associated variants [79], each with very small contributions, thus requiring large sample sizes to detect the associations via GWAS [79]. Given the genetic and environmental complexity of blood pressure, it is not surprising that the present study was unable to identify novel variants associated with resistant hypertension.
The upcoming NIH Precision Medicine Initiative Cohort Program ("All of Us"), which seeks to accrue >1 million individuals with longitudinal EHR data, may facilitate future studies on drug effects on a scale not possible today [80,81]. Also, longitudinal studies such as those proposed by the NIH Precision Medicine Initiative Cohort Program increase the chances that secondary causes of resistant hypertension will be identified and eliminated to reduce phenotypic heterogeneity. However, it is worth noting that while potentially large, the Precision Medicine Initiative Cohort Program may suffer from limitations similar to those experienced by the individual study sites of the eMERGE Network. That is, the eMERGE Network is large in overall sample size, but the requirements of repeated blood pressure measurements and prescription information over the course of time severely limited the case and control counts available for the association study. This dramatic loss of sample size and resulting loss of power is a reflection of the fragmented health care system in the United States. Studies requiring extensive longitudinal data may be better powered in health care settings where patients regularly receive the vast majority of their care within a single setting such as an integrated managed care consortium (Kaiser Permanente [82,83]) or the Veterans Health Administration [84].
Phenotypic heterogeneity in general is a major challenge faced by most studies [74]. Our study is no exception. While our algorithm and subsequent review removed potential sources of case misclassification (such as acute myocardial infarction patients on more than one class of medication), it is important to note that elevated blood pressure readings may result from measurement error, other concurrent medical conditions (e.g., pain or anxiety), poor patient compliance, or "white-coat hypertension." Patients who lack blood pressure control despite appropriate treatment due to the aforementioned reasons may present with resistant hypertension but are actually "pseudoresistant" [9]. Potential lifestyle causes of resistant hypertension, such as excessive dietary salt intake and heavy alcohol use, are also not measured and recorded in the EHR. While large-scale biobanks linked to EHRs offer important advantages in research settings, misclassification bias must be recognized and minimized by carefully characterizing the phenotype and exploiting the longitudinal nature of the EHR [85].
Of special note is the issue of poor compliance or non-compliance. Most studies involving medications or self-administered treatments have a problem with non-compliance. For hypertension, clinical and epidemiologic surveys consistently suggest that a high proportion of patients do not achieve target blood pressure levels after treatment [86,87]. This observed resistance to treatment can reflect true resistance, the outcome of interest in the present study, or apparent resistance, which often reflects poor adherence to treatment. Previous retrospective studies have noted that approximately 40% of newly diagnosed hypertensive patients discontinue treatment within a year [7,88,89]. Multiple factors have been associated with poor adherence to hypertensive treatment or treatment in general including non-patient (such as drug class [90]) and patient factors (such as sex, race/ethnicity, and socioeconomic status [91]), and some data suggest that women prescribed anti-hypertensives are more compliant than men [92]. Nearly half the sample in this present study was male. Additional information such as prescription claims data could help clarify this issue of non-compliance.
Finally, a major challenge in designing optimal studies to identify and characterize the genetic architecture of resistant hypertension may be related to the as-of-yet unknown etiology of the phenotype(s) [10]. For example, more traditional pharmacogenomics studies of blood pressure and response to anti-hypertensive treatments concentrate on a single class of drug (such as hydrochlorothiazide monotherapy [93]) representing a specific pathway or mechanism of action. It could be argued that resistant hypertension represents the dichotomized version of the extremes of the drug-response distribution as the majority of resistant hypertension cases studied here required four medications from different classes. It is unlikely that a single genetic variant affects all pathways represented by these medication classes but rather a complex combination or interaction of variants that phenotypically result in resistant hypertension. This genetic heterogeneity in combination with the aforementioned phenotypic heterogeneity and phenotyping requirements makes the genetic study of resistant hypertension especially challenging.
In conclusion, we describe an approach to identify resistant hypertension cases and controls from seven different institutions' EHRs. To our knowledge, this is the only GWAS of resistant hypertension in any population. These results highlight the utility of EHR-linked genomic data to further refine our understanding of disease and its subtypes, with the goal of uncovering new therapeutic targets and providing better targeting of current therapies.

A genome-wide association study was performed for 1,719 European American cases of resistant hypertension and 708 controls from the eMERGE Network adjusted for sex, decade of birth, median body mass index, genotyping platform, and genetic ancestry (principal components 1-3). The left y-axis is the -log10(p-value) of the tests of association and the right y-axis is the recombination rate (cM/Mb). The most significant result in European Americans (rs9479122) is plotted as the index variant color-coded as a purple diamond. Surrounding SNPs are circles color-coded by strength of linkage disequilibrium (LD calculated as r^2), where red is complete or very strong LD and blue is weak LD or independent variants. LD was calculated using HapMap CEU data (release 22) in the default version of LocusZoom. Gene names and position on chromosome 6 (Mb) are given on the x-axis. The most significant finding (ESR1 rs9479122) is likely a false positive due to poor genotyping prior to imputation and was removed. (DOCX)

S6 Fig. Genome-wide association study of European Americans with resistant hypertension versus controlled hypertensives.
A total of 2,530,150 SNPs were tested for an association with resistant hypertension (1,719 cases and 708 controls) among Europeans from the eMERGE I and II network. After removal of ESR1 rs9479122, tests of association were performed using logistic regression assuming an additive genetic model and adjusting for sex, decade of birth, genotyping platform, median body mass index, and principal components 1-3. Each test of association was plotted on a Manhattan plot where the x-axis is the chromosomal location and the y-axis is -log10(p-value).

S2 Table. Exclusions from case and control definitions of resistant hypertension in the eMERGE Network.
Individuals were excluded from case or control status based on ICD-9-CM codes and situations as described. In addition to exclusions based on codes, individuals were excluded from case status if there was evidence of chronic kidney disease within six months after meeting the definition for "controlled" resistant hypertension (four medication classes concurrently) or heart failure within one year before or after meeting the definition for "controlled" resistant hypertension. Chronic kidney disease was defined by an estimated glomerular filtration rate (eGFR) ≤30 ml/min, as calculated by the Modification of Diet in Renal Disease formula. Heart failure was defined as an ejection fraction (EF) or left ventricular ejection fraction (LVEF) ≤35%. Individuals with evidence of heart failure were also excluded from control status. (DOCX)

S3 Table. Resistant hypertension cases and controls for discovery, by eMERGE study site and population.
Counts within parentheses represent number of additional samples identified as cases or controls but not included in the final analyses due to model convergence issues. (DOCX)

S4 Table. Variants previously identified in GWAS of blood pressure or hypertension in current GWAS of resistant hypertension.
The "Published GWAS" columns represent SNPs previously associated with blood pressure, systolic blood pressure, diastolic blood pressure, or hypertension among adults at genome-wide significance drawn from the NHGRI European Bioinformatics Institute (NHGRI-EBI) GWAS Catalog (http://www.ebi.ac.uk/gwas/; accessed September 2016). Published GWAS data are compared with eMERGE I and II association results for resistant hypertension among all racial/ethnic adults if the SNP is present in the dataset. Abbreviations: beta (β), p (p-value), odds ratio (OR), lower 95% confidence interval (L95), and upper 95% confidence interval (U95). Data from eMERGE I and eMERGE II are denoted by the subscripts. (DOCX)