Rare variants in fox-1 homolog A (RBFOX1) are associated with lower blood pressure

Many large genome-wide association studies (GWAS) have identified common blood pressure (BP) variants. However, most of the identified BP variants do not overlap with the linkage evidence observed from family studies. We thus hypothesize that multiple rare variants contribute to the observed linkage evidence. We performed linkage analysis using 517 individuals in 130 European families from the Cleveland Family Study (CFS) who have been genotyped on the Illumina OmniExpress Exome array. The largest linkage peak was observed on chromosome 16p13 (MLOD = 2.81) for systolic blood pressure (SBP). Follow-up conditional linkage and association analyses in the linkage region identified multiple rare, coding variants in RBFOX1 associated with reduced SBP. In a 17-member CFS family, carriers of the missense variant rs149974858 are normotensive despite being obese (average BMI = 60 kg/m2). Gene-based association test of rare variants using SKAT-O showed significant association with SBP (p-value = 0.00403) and DBP (p-value = 0.0258) in the CFS participants and the association was replicated in large independent replication studies (N = 57,234, p-value = 0.013 for SBP, 0.0023 for PP). RBFOX1 is expressed in brain tissues, the atrial appendage and left ventricle in the heart, and in skeletal muscle tissues, organs/tissues which are potentially related to blood pressure. Our study showed that associations of rare variants could be efficiently detected using family information.

For complex traits such as BP, rare variants are suggested to play a greater role in heritability than anticipated in the common disease-common variant hypothesis [22]. A Framingham Heart Study reported rare mutations in three renal salt handling genes causing large reductions in blood pressure and estimated that the overall prevalence of hypertension is reduced by about 1% because of the effects [23]. Linkage studies of family data can be used to uncover missing heritability and identify genetic markers linked to BP [24,25]. However, the identified linkage regions from well-designed linkage studies such as the US Family Blood Pressure Program (FBPP) and the UK Medical Research Council British Genetics of Hypertension (BRIGHT) study [26][27][28][29] do not overlap with many BP loci identified by large BP GWAS of mostly unrelated individuals. In general, GWAS have good power to detect common variants of modest effect with attainable sample sizes, but less power for detecting rare variants with intermediate effect. In contrast, linkage analysis can have good power to detect multiple rare or lower frequency BP variants in a gene or region with relatively larger effect sizes [25]. Thus, we hypothesize that a linkage region observed in a family study, if not overlapping with the BP loci in reported GWAS, may harbor multiple rare or lower frequency BP variants.
Recently, many statistical approaches for rare variant association analyses have been developed for unrelated samples [30][31][32][33][34] and family data [35][36][37][38]. It has been suggested that rare or lower frequency variants can be enriched in families [35,37], and therefore improving the statistical power for their identification. However, the existing rare variant association methods have not incorporated linkage evidence. In this study, we performed variance-component linkage analysis with BP traits, including systolic blood pressure (SBP), diastolic blood pressure (DBP), and pulse pressure (PP) in the Cleveland Family Study (CFS). We searched the published GWAS to examine whether there are reported BP variants in the linkage regions. Using the National Heart, Lung and Blood Institute, AG009740 from the National Institute on Aging (to SLRK), and HL121429 from the National Institutes of Health (to TLE). The Health and Retirement Study is supported by the National Institute on Aging (NIA U01AG009740), and funding for this analysis was also provided by the NIA (R03 AG046498). The genotyping was funded separately (RC2 AG036495, RC4 AG039029) and was conducted by the NIH Center for Inherited Disease Research (CIDR) at Johns Hopkins University. Genotyping quality control and final preparation of the data were performed by the University of Michigan School of Public Health. The Women's Health Initiative (WHI) program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, US Department of Health and Human Services, through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C. The BioVU dataset used in the analyses described were obtained from Vanderbilt University Medical Center's BioVU, which is supported by institutional funding and by the Vanderbilt CTSA grant ULTR000445 from NCATS/NIH. Genotyping was funded by NIH grants RC2GM092618 from NIGMS/OD and U01HG004603 from NHGRI/NIGMS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
the combined linkage and association analyses, we searched for potential functional variants that can explain linkage evidence and replicated the variants in independent cohorts. Table 1 presents the characteristics of white participants in the CFS data. S1 Fig presents the distributions of the residuals of SBP, DBP, and PP, which are all approximately normally distributed. Linkage analysis identified a peak (LOD = 2.81) on chromosome 16p13 linked to SBP (S2 Fig, Materials and Methods). Linkage analysis of further pruned markers using a R 2 threshold of 0.1 or modeling marker-marker linkage disequilibrium resulted in a slight decrease of LOD score in the same region (LOD = 2.30-2.42, S3 Fig). We selected a candidate region of 20cM with 2-LOD score drop for association analysis (Fig 1). This region did not overlap with published GWAS of BP variants. Therefore, we tested the hypothesis that the observed linkage evidence for SBP is due to the presence of multiple rare, coding variants in a gene(s) within the region.

Results
The CFS was genotyped by an exome array, with most of the variants being coding variants. Within the linkage region, there are 454 exonic variants defined by the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium that are genotyped on the exome array [39]. We identified 13 exonic variants (S1 Table) that satisfy the following filtering criteria: 1) either have a SBP association p-value 0.1 or absolute regression coefficient beta ! 5; and 2) present at least twice in at least one family with a family-specific LOD score ! 0.1. A risk score based on the 13 identified SNPs has an effect size of 0.948 ± 0.135 for association with SBP in CFS. After adding this risk score as a covariate in linkage analysis, the MLOD score dropped from 2.809 to 1.055, suggesting that these 13 variants were able to account for most of the observed linkage evidence. To further assess the significance of this LOD drop, we sampled 1,000 independent SNPs from chromosomes other than chromosome 16. Hence, these SNPs should not contribute to the LOD score observed on chromosome 16. We performed linkage analysis with each of these 1,000 SNPs as a covariate in the linkage analysis. We calculated the differences between the original MLOD score and the MLOD scores of the 1,000 linkage analyses with a SNP as a covariate. The largest LOD score drop in these 1,000 linkage analyses was 0.347, suggesting the observed LOD score drop on the risk scores of 13 selected variants is statistically significant (p-value<0.001). Among these 13 exonic variants, two variants (rs149974858 and rs145873257) are present in RBFOX1 and the remaining 11 variants are each in separate genes. When adjusting for the risk scores of rs149974858 and rs145873257, we also observed a drop in LOD score (LOD = 1.97), which suggests that rs149974858 and rs145873257 account for a portion of the observed linkage evidence. The variant rs149974858 shows association evidence with SBP (p-value = 0.0016) in CFS. The minor allele frequency of rs149974858 is 0.0036 and only segregates within a 17-member RBFOX1 associated with lower blood pressure family with family-specific LOD of 0.697 (Fig 2). This missense variant (c.112C>G) results in a proline to alanine substitution (p.Pro38Ala). Five members from this family carrying this rare missense mutation had on average lower SBP (carrier average = 117 mmHg, noncarrier average = 125 mmHg) but higher BMI than other family members (carrier average = 60 kg/ m 2 , noncarrier average = 31 kg/m 2 ). A single SNP association test revealed that rs149974858 is also significantly associated with BMI in CFS (beta = -26.8±4.35, asymptotic p-value = 7.28E-10). The exonic variant rs145873257 segregates within a different family with family-specific LOD score of 0.215. A c.1072G>A base change resulted in a glycine to serine substitution (p. Gly358Ser). Six members of this family are heterozygous for the AT genotype. The estimated effect of this variant is protective in CFS, although it is not statistically significant (S1 Table).
Since both variants rs149974858 and rs148751394 consistently show a protective effect in two large families, we examined the other coding and rare variants in RBFOX1 genotyped on the exome array. Two exonic variants, rs151214012 and rs145873257, and one rare, intronic variant rs2345080 are available in the exome array. These three variants show protective effect despite not satisfying the filtering criteria (S2 Table). Single SNP associations and annotations for all exome array variants of RBFOX1 are provided in the S3 Table. Applying either the family-based burden or SKAT analysis, these five variants are significantly associated with SBP and DBP (Table 2).
We next sought the replication of the rare variants in RBFOX1 in four large independent cohorts (ARIC, WHI, BioUV, and HRS) of whites for the traits SBP, DBP and PP. We specifically looked at the five rare variants of RBFOX1 found in CFS and their associations with BP traits. Within the ARIC data, 4 of the 5 variants were present and all the 4 variants showed a consistent protective effect. For WHI, all 5 variants were present but only 1 variant (rs145873257) had a protective effect size. BioUV contained 4 out of 5 variants found in CFS; 2 variants were protective for SBP and 3 variants were protective for DBP. HRS contained all 5 variants found in CFS; 4 variants were protective for SBP and 3 variants were protective for DBP. Among the total 23 tests (CFS: 5, ARIC: 4, BioUV: 4, WHI: 5, HRS: 5) conducted across all cohorts, 16 of them were protective (p-value = 0.0173 based on binomial distribution), suggesting a consistent protective effect.
We next conducted a gene-based association analyses for RBFOX1 and BP traits using exome array data from the CFS with weights Beta (1,25). Burden, SKAT, and SKAT-O tests were performed using the 5 rare variants of RBFOX1 found in CFS (Table 2). In CFS, the association between RBFOX1 and SBP was found to be significant by the burden test (pvalue = 0.00214) and SKAT-O (p-value = 0.00403), but not by SKAT (p-value = 0.0702). All three gene-based tests for DBP were statistically significant (burden test p-value = 0.0148, SKAT p-value = 0.0494, SKAT-O p-value = 0.0258). When we conducted gene-based analyses using only the 4 coding variants (rs149974858, rs148751394, rs151214012, rs145873257) identified in CFS, the associations are also significant for SBP and DBP (p-value<0.037).
We observed that the variant rs149974858 co-segregated with BMI in the 17-member CFS family. Subsequently, we performed linkage analysis for BMI in CFS on chromosome 16, after adjusting for gender, age, age 2 , and PC1. We did not observe linkage evidence in this region  (Table 2).

Discussion
We performed a linkage analysis of BP traits using the families collected in CFS. The largest linkage peak identified is on 16p13 linked to SBP. The 16p13 region has been reported of linkage evidence with BP in two studies: the Victorian Family Heart Study [40], the extreme-sibpair study in Chinese adults [41]. In addition, the longitudinal change of BP in Mexican Americans [42], and the Hypertension Genetic Epidemiology Network Study in whites [43] reported linkage regions partially overlapped with the current study. The reported linkage evidence from multiple ethnic populations strongly suggests the linkage evidence on 16p13 is real. Linkage analysis using microsatellite markers was performed with 363 sib-pairs of CFS whites for a hypertensive status, adjusted for age, age 2 , sex, BMI, and BMI 2 . This analysis did not find linkage evidence in the 16p13 region. This is unsurprising because the power of using a binary hypertensive status is lower than that of quantitative phenotypes, such as SBP, DBP, or PP. In addition, hypertensive status was defined as either SBP ! 140, DBP ! 90, or taking antihypertensive medications and the sample size in the sib-pair analysis was smaller than the current study, all of which contribute to the lack of linkage evidence observed. Our study demonstrates that high-density SNP genotyping arrays are informative for detecting linkage signals.
We searched among published large GWAS studies of BP traits [5, 7-10] and did not identify any BP variants previously reported on 16p13, suggesting that multiple low frequency or rare variants with relatively large effect sizes are possibly contributing to the observed linkage evidence. We further assumed that variants with relatively large effect sizes are more likely to be coding and rare variants. Thus, we limited our search to only the coding and rare variants under the linkage peak genotyped on the exome array. By examining the associated variants that are able to account for the observed linkage evidence on 16p13, we were able to identify 13 exonic variants. Among these 13 variants, 2 of them fall in RBFOX1, which encodes for the ataxin-2 binding protein 1 (also known as A2BP1), and show a consistent protective effect for SBP in CFS. Gene-based analysis of the four available exonic variants and one rare intronic variant in RBFOX1 are significantly associated with SBP and DBP (p-value = 0.00403, 0.0258, respectively) using SKAT-O. Replication analysis of the rare variants at the gene level (but not at the variant level) is also significant for SBP and PP in the meta-analysis of four large cohorts of whites with a total replication sample size N = 57,234. This study also provides evidence that rare variants within RBFOX1 are protective for BP traits among obese individuals. Among individuals of European ancestry within the CFS, 5 individuals within the same family carried the minor allele for rs149974858, the variant showing significantly protective effect with SBP by single SNP association test. All of the 5 individuals are morbidly obese (with average BMI of 60). However, their SBP (mean = 117) and DBP (mean = 78) are within the normotensive range. Ma et al. conducted a GWAS of BMI in Pima Indians using Affymetrix 100K array and identified two common variants in RBFOX1 associated with BMI [44]. The same two variants could be replicated in non-overlapped Pima Indians but not in French Caucasians, Amish Caucasians, German Caucasians, or Native Americans [44]. In our analysis, we identified four exonic and one intronic rare variants in RBFOX1 that are significantly associated with BMI but the association evidence could not be replicated (Table 2). Therefore, it is inconclusive whether RBFOX1 is an obesity gene. In all our analysis, either linkage or association analysis, BMI is included as a covariate. Furthermore, no linkage evidence was found for BMI on chromosome 16, after adjusting for gender, age, age 2 , and PC1. Our result indicates the RBFOX1 contributes to BP variation independent of obesity, although we are unclear whether RBFOX1 has a pleiotropic effect on both BP and obesity.
We also observed that the effect direction of single variant replication analysis in the four cohorts is not always consistent with that of CFS. However, 16 of the 23 tests were protective (p-value = 0.017), suggesting a consistent protective effect. Assuming a causal rare variant with an effect size equal to one quarter of the BP standard deviation, we estimate the probability of observing an opposite direction in a study to be 40.1%, which is consistent with 7 opposite directions among 18 single SNP replication tests in 4 replication cohorts.
It is interesting that all the four exonic variants in RBFOX1 are either monomorphic or extremely rare in African ancestry populations (S4 Table). Furthermore, the BP admixture mapping analysis by Zhu et al. also suggest local ancestry in this region is associated with SBP and DBP; however, the evidence is not genome-wide significant [11]. Thus, our result suggests that the rare exonic variants in RBFOX1 may contribute to a protective effect for hypertension and further work will be needed to establish whether the lack of these protective variants contribute to the disparity in hypertension occurrence and early age of onset between African Americans and whites.
To our knowledge, only one GWAS study so far has reported on the association of RBFOX1 variants with blood pressure using human genotyping data. Wang et al. reported an association between rs1507023, a candidate SNP in RBFOX1 involved in vitamin D metabolism and signaling, and SBP, DBP, and PP. Its association with blood pressure was significant before, but not after correction for multiple testing [45].
Under the linkage region of 16p13, there are 11 additional variants that either have an association test p-value less than 0.1 or an effect size larger than 5 mmHg in CFS. When we used the risk scores of these 11 variants as a covariate in linkage analysis, the MLOD dropped to 1.932, suggesting that there may be additional variants that contribute to linkage evidence in this region. However, the current exome array data is limited for further dissection of genes or variants contributing the linkage evidence. Whole genome sequencing data, including the sequencing data from the Trans-Omics for Precision Medicine (TOPMed) Program (https:// www.nhlbi.nih.gov/research/resources/nhlbi-precision-medicine-initiative/topmed), will be necessary to identify the genes contributing blood pressure variation in this region.
Gene expression data and previous studies have demonstrated that RNA splicing factor RBFOX1 is important for heart and skeletal muscle development and function [46][47][48]. RBFOX1 expression has been associated with cardiac hypertrophy and heart failure in mice models [49]. Gao et al. found that RBFOX1 expression was significantly diminished in both mouse and human failing hearts [49]. We searched the GTEx database and RBFOX1 is highly expressed in multiple human brain tissues, atrial appendage and left ventricle of the heart, as well as muscle skeletal tissues (Fig 3; http://www.gtexportal.org/home/gene/RBFOX1). Further biological studies are needed to establish the direct role of RBFOX1 in regulating blood pressure.
Our study suggests that family-based linkage evidence can be extremely successful in searching for rare variants contributing to complex traits. In summary, we identified rare, exonic variants in RBFOX1 that have a protective effect on BP traits, which can be important in searching new drugs for cardiovascular disease. However, it should be pointed out that association analysis was performed using variants available in the exome array of this study. The variants identified in RBFOX1 may still be reflecting in LD with the causal variants to BP. While RBFOX1 is expressed in multiple tissues that may relate to blood pressure, the mechanism underlying how this gene contributes to BP variation needs to be further studied. The identification of these rare coding variants will facilitate precision medicine in treating cardiovascular disease.

Materials and methods
The CFS is a family-based longitudinal study comprised of patients with laboratory-diagnosed sleep apnea, their family members, and neighborhood control families, as described before [50]. The data were analyzed anonymously at Case Western Reserve University. The CFS study was approved by Partners Human Research Committee with the proposal number 2011D001860. The analytical sample includes 517 white participants in 130 families who were genotyped with the Illumina OmniExpress Exome array, which includes both GWAS and exome variants (Table 1). Standard quality controls were performed, including checking Hardy-Weinberg equilibrium, Mendelian inconsistences, and verifying pedigree structure using the genetic markers by the software PLINK [51]. BP traits, including SBP and DBP were each determined following standardized guidelines using a calibrated sphygmomanometer [52]. Height and weight were directly measured and antihypertensive medications were ascertained by questionnaire. Data for this analysis were from the last available examination for each participant.
The samples used for replication analysis include five independent cohorts. We included 10,864 unrelated white participants from the Atherosclerosis Risk in Communities (ARIC) Study. The ARIC study is a prospective epidemiologic study designed to investigate the natural history and etiology of atherosclerosis (https://www2.cscc.unc.edu/aric/). There were 18,050 unrelated white participants from the Women's Health Initiative (WHI), a study of postmenopausal women focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures [53,54]. From the Vanderbilt University Biobank (BioVU), we included 18,977 unrelated white individuals. BioVU uses leftover blood samples collected during routine clinical care from consented individuals who visit the Vanderbilt University Medical Center [55]. Lastly, we included 9,343 unrelated white participants from the Health and Retirement Study (HRS). This is a longitudinal survey of a representative sample of Americans over the age of 50 [56][57][58].
SBP and DBP for an individual taking antihypertensive medication were imputed using the standard approach in literature, by adding 15 mmHg and 10 mmHg, respectively. Pulse pressure (PP) was calculated as the difference between imputed SBP and DBP [5].

Linkage analysis in CFS
We calculated residuals of SBP, DBP, and PP after adjusting for gender, age, age 2 , body mass index (BMI; kg/m 2 ) and the first principal component of genotype values in CFS separately. The residuals of these regressions were used for linkage analysis using the software MERLIN [59]. The principal components (PCs) were calculated using the software FamCC, which can be applied to family data [60]. Since the results were essentially the same for including the first PC or the first 10 PCs, we reported the linkage results including the first PC. We used the pairwise linkage disequilibrium (LD) pruning approach with a window size of 50 kb, step size of 5 variants, and R 2 threshold of 0.2. We also required a minor allele frequency (MAF) ! 0.2. This resulted in 56,992 autosomal SNPs using PLINK [51]. Because marker-marker LD may result in biased linkage calculations, we performed linkage analysis by further reducing the R 2 threshold to 0.1 and by modeling the marker-marker LD using MERLIN [59]. Linkage analysis using MERLIN decomposes phenotypic variance into three parts: the variance contributes to the quantitative trait locus (s 2 QTL ), the variance contributes to the polygenetic effect (s 2 G ), and the variance contributes to the random effect (s 2 E ). It also tests the null hypothesis of no linkage H 0 :

Identify coding variants accounting for linkage evidence
We examined exonic variants genotyped in the exome array in the region of 2-LOD score drop from the linkage peak. We performed the family-based association analysis for the exonic variants only in the 2-LOD drop region using the ASSOC package in S.A.G.E [61]. The familybased association analysis was conducted using a linear mixed model y = β 0 + β 1 g + δ + ε, where g is a genotype value vector, β 0 is the intercept, β 1 is the regression coefficient, d $ Nð0; 2Fs 2 G Þ where F is the kinship coefficient matrix and s 2 G is the polygenic variance, and ε $ Nð0; Is 2 ε Þ where s 2 ε is the random error. ASSOC applies the likelihood ratio test to test the null hypothesis of H 0 : β 1 = 0. For each of the variants, we first performed an association analysis with a BP trait using ASSOC and identified variants with either p-value 0.1 (marginal effect) or absolute regression coefficient beta ! 5 (large effect). We next estimated family-specific LOD scores and identified families with LOD score ! 0.1. We kept the variants with association p-value 0.1 or absolute regression coefficient ! 5, and that were present at least twice in at least one family with family-specific LOD score ! 0.1.
We defined the risk score as where β is the regression coefficients of the SNPs, and x i is a vector of the number of risk alleles carried by individual i for these SNPs. Linkage analysis was further performed conditional on the risk scores.

SNP-set burden and SKAT test
We performed family-based burden and SKAT tests for CFS using the software famSKAT and for the replication cohorts using the R package SKAT [30,32,62]. The weight was set to ffiffiffiffi ffi w j p ¼ BetaðMAF j ; 1; 25Þ as suggested to increase the weight of rare variants.