Genotypic and Allelic Variability in CYP19A1 among Populations of African and European Ancestry

CYP19A1 facilitates the bioconversion of estrogens from androgens. CYP19A1 intron single nucleotide polymorphisms (SNPs) may alter mRNA splicing, resulting in altered CYP19A1 activity, and potentially influencing disease susceptibility. Genetic studies of CYP19A1 SNPs have been well documented in populations of European ancestry; however, studies in populations of African ancestry are limited. In the present study, ten ‘candidate’ intronic SNPs in CYP19A1 from 125 African Americans (AA) and 277 European Americans (EA) were genotyped and their frequencies compared. Allele frequencies were also compared with HapMap and ASW 1000 Genomes populations. We observed significant differences in the minor allele frequencies between AA and EA in six of the ten SNPs: rs10459592 (p<0.0001), rs12908960 (p<0.0001), rs1902584 (p = 0.016), rs2470144 (p<0.0001), rs1961177 (p<0.0001), and rs6493497 (p = 0.003). While there were no significant differences in allele frequencies between EA and CEU in the HapMap population, a 1.2- to 19-fold difference in allele frequency for rs10459592 (p = 0.004), rs12908960 (p = 0.0006), rs1902584 (p<0.0001), rs2470144 (p = 0.0006), rs1961177 (p<0.0001), and rs6493497 (p = 0.0092) was observed between AA and the Yoruba (YRI) population. Linkage disequilibrium (LD) blocks and haplotype clusters that are unique to the EA population and absent in AA were also observed. In summary, we demonstrate that differences in the allele frequencies of CYP19A1 intron SNPs are not consistent between populations of African and European ancestry. Thus, investigations into whether CYP19A1 intron SNPs contribute to variations in cancer incidence, outcomes and pharmacological response seen in populations of different ancestry may prove beneficial.

Introduction
Cytochrome P450 19A1 (CYP19A1) encodes the enzyme aromatase, which catalyzes the conversion of the C19 androgens, androstenedione and testosterone, to estrone and estradiol, respectively [1,2]. Specific single nucleotide polymorphisms (SNPs) in the intronic regions of CYP19A1 have been shown to play a role in altering regulation of transcription and/or splicing of CYP19A1, producing different enzyme products with variable enzymatic activity compared to the normal gene product [3,4]. Studies have identified SNPs in CYP19A1 that are associated with cancer risk primarily in European Americans (EA), North Indian and Chinese populations [5,6]. Variations in the allele frequencies of several CYP19A1 SNPs and their haplotype distributions, especially rs10459592, rs749292, and rs6493497, have also been documented within South Indian, Korean, Hawaiian, Japanese, Latina and populations of European descent within the United States [7][8][9]. It is thus likely that ancestral differences in the frequencies of functional CYP19A1 SNPs can influence disease susceptibility and risk prediction. However, genetic studies of CYP19A1 SNPs in populations of African ancestry are limited.
Human CYP19A1 (GenBank accession number: NC_000015.10) is mapped to the positive strand of the long arm of chromosome 15 at 15q21.2 (chr15:51,222,349-51,338,598). CYP19A1 is approximately 116 kb long and comprises nine protein-coding exons and a number of alternative non-coding first exons that regulate tissue-specific expression [10]. Several genetic variants of CYP19A1 are localized within the introns. Intronic SNPs can potentially influence mRNA splicing, leading to CYP19A1 dysfunction. Variability in the frequencies of functional CYP19A1 SNPs can impact a multiplicity of functional elements, including intron splice enhancers and silencers that regulate alternative splicing, trans-splicing elements [11], and other regulatory elements. Several intronic SNPs located within the regulatory regions of CYP19A1 have been shown to influence estrogen-dependent disease risk, serum estrogen levels and/or aromatase production [12][13][14][15][16]. Furthermore, SNPs located within introns of CYP19A1 have been implicated in the development of multicentric adenocarcinomas in the peripheral lung [17], Alzheimer's disease [18], and neuroprotection mediated by estrogens [19].
In light of these considerations, we hypothesized that the frequency distribution of CYP19A1 intron SNPs that are associated with disease risk would differ between populations of European and African ancestry. To test our hypothesis, we determined the allele frequencies of ten candidate CYP19A1 SNPs, constructed haplotypes, and assessed their distributions in populations of European and African ancestry from Arkansas.

CYP19A1 SNP Selection and Genotyping
Ten CYP19A1 SNPs genotyped in this study were selected based upon their previously published associations with cancer risk and outcomes (Table 1), predicted localization within regulatory binding regions, and/or their predicted association with regulatory proteins involved in pre-mRNA processing, mRNA metabolism and transport, and gene expression (Table 2). Possible functional roles of the variants were identified using HaploReg version 2 [20], Human Splicing Finder (version 2.4.1) [21], and TFSEARCH [22]. Queries used the dbSNP rs number or the 75 bp nucleotide sequences upstream and downstream of the CYP19A1 SNP to identify potential target sites containing the test SNP, proteins that regulate expression of CYP19A1, and transcription factor binding sites with a probability score threshold of 90%.
Genotyping of CYP19A1 SNPs was conducted using the TaqMan allelic discrimination assay on the ABI PRISM 7900 HT platform (Life Technologies, Carlsbad, USA) according to the manufacturer's recommendations and the laboratory's established quality control measures for reliable genotyping results. In each 384-well reaction plate, two negative controls and positive DNA controls with known SNP genotypes at CYP19A1 were included for quality control. In addition, each DNA sample (20 ng) was genotyped in duplicate. The genotype data were analyzed using SDS 2.3 Allelic Discrimination Software (Applied Biosystems).

Table 1. Previously published associations of the CYP19A1 SNPs genotyped in this study (doi:10.1371/journal.pone.0117347.t001)
(rs number not given): Predictor of clinical outcomes and adverse events associated with letrozole use in patients with metastatic breast cancer [25]
rs12591359: Significantly associated with increased risk of colon cancer development [26]
rs749292: Associated with an increase in circulating estrogen levels in postmenopausal women and increased endometrial cancer risk [13], [24]
rs2470152: Associated with estradiol levels; significantly associated with increased risk of vertebral fractures; associated with worse Glasgow Outcome Scale-6 scores after traumatic brain injury [29], [30]
rs1902584: Associated with increased colorectal cancer risk in women; associated with estradiol levels in overweight postmenopausal women [6,31]
rs2470144: Associated with worse Glasgow Outcome Scale-6 scores after traumatic brain injury; significantly associated with increased annual sagittal maxillary growth and mandibular growth in boys [30,32]
rs1961177: Significantly associated with an increased likelihood of an MSI+ colon tumor; significantly associated with increased aortic diameter in men [26,33]
rs6493497: Significantly associated with a greater change in aromatase activity after AI treatment and higher plasma estradiol levels in patients pre-AI and post-AI treatment [34]

Statistical Analyses
Genotype and allele frequencies with the 95% confidence interval (95% CI) for each SNP were calculated using SAS version 9.2 (SAS Institute, Cary, NC) and PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) [23]. Ethnic differences in genotype frequency for each SNP were compared using the Pearson chi-square test. A test for deviation from Hardy-Weinberg equilibrium (HWE) was performed for all SNPs included in the study. Any additional analyses were done using SAS version 9.2. Linkage disequilibrium (LD) between pairs of alleles at different loci (i.e., SNPs) was calculated by computing the standardized LD value (D'). D' is the normalization of LD, obtained by dividing it by the theoretical maximum value for the observed allele frequencies (D' = LD/LDmax). |D'| = 1 indicates complete LD and D' = 0 corresponds to total absence of LD. Pairwise LD for the SNPs was evaluated and visualized in HAPLOVIEW software (Version 4.2) [24]. After assessing the LD patterns, non-missing genotype frequencies from all SNPs were used to reconstruct haplotypes and estimate their respective frequencies in the two populations using the PHASE v2.1.1 program [25]. This program has been shown to be well suited to generating haplotypes from multiple loci or long DNA sequence stretches. We ran PHASE in three different settings: i) 100 [...]. In addition, the seed of the pseudo-random number generator was changed for the second and third runs with the -S option (i.e., -S211 for the second run and -S3253 for the third run). The estimated haplotype frequencies were fairly consistent across the different runs.
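The definition D' = LD/LDmax given above can be illustrated with a minimal Python sketch. It assumes that the frequency of the two-locus haplotype is already known; estimating that frequency from unphased genotypes is a separate step that is not shown here. The function name, the argument names and the example frequencies are hypothetical and are not taken from the study data.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Standardized LD coefficient D' for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2 (assumed known, i.e. phased data)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    if d == 0:
        return 0.0
    # The theoretical maximum of |D| depends on the sign of D.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # The sign of D is preserved; |D'| = 1 indicates complete LD.
    return d / d_max


# Illustrative values only (not study data).
print(d_prime(p_ab=0.5, p_a=0.5, p_b=0.5))    # alleles A and B always co-occur
print(d_prime(p_ab=0.25, p_a=0.5, p_b=0.5))   # loci are independent
```

Because the sign of D depends on which alleles are labeled A and B, the absolute value |D'| is the quantity usually compared against a threshold.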

Results
Comparison of CYP19A1 genotype and allele frequencies between AA and EA subjects from Arkansas
We genotyped ten candidate CYP19A1 SNPs in AA and EA populations. Significant differences in the minor allele frequencies (MAF) between EA and AA populations were observed in six of the ten CYP19A1 SNPs (Table 3). Specifically, a significant difference in the MAF between AA and EA was observed for rs10459592 (p<0.0001), rs12908960 (p<0.0001), rs1902584 (p = 0.016), rs2470144 (p<0.0001), rs1961177 (p<0.0001), and rs6493497 (p = 0.003). There was no difference in the MAF between AA and EA for the remaining four SNPs, i.e., rs12591359 (p = 0.891), rs11856927 (p = 0.439), rs749292 (p = 0.434), and rs2470152 (p = 0.19). All SNPs, except for rs1902584, were consistent with HWE in both populations. Genotypes generated from rs1902584 were confirmed to be free from genotyping error.
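The two kinds of test reported in this subsection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SAS or PLINK procedure used in the study: the Pearson chi-square is computed without continuity correction, the HWE check uses a one-degree-of-freedom chi-square goodness-of-fit test (the text does not say which HWE test was applied), and every count shown is invented for illustration.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    table of counts (rows = populations, columns = genotypes or alleles).
    Assumes every row and column total is greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi2_p_value(chi2, df):
    """Upper-tail p-value; closed forms exist for 1 and 2 degrees of freedom."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        return math.exp(-chi2 / 2.0)
    raise ValueError("only df = 1 or df = 2 is handled in this sketch")


def hwe_chi2(n_hom_major, n_het, n_hom_minor):
    """Chi-square goodness-of-fit test (1 df) against HWE proportions."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)   # major allele frequency
    q = 1.0 - p
    observed = (n_hom_major, n_het, n_hom_minor)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_p_value(chi2, 1)


# Hypothetical genotype counts for one SNP in two groups (not study data).
genotype_table = [[10, 20, 30],
                  [30, 20, 10]]
stat, df = pearson_chi2(genotype_table)
print(stat, df, chi2_p_value(stat, df))
print(hwe_chi2(25, 50, 25))
```

A 2 x 3 genotype table gives two degrees of freedom, whereas a 2 x 2 table of allele counts gives one; the same `pearson_chi2` function covers both layouts.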
We also compared the CYP19A1 MAF in the EA and AA populations with those from HapMap and the ASW 1000 Genomes population. Population allele and genotype frequencies for all the CYP19A1 SNPs analyzed in this study are summarized in Table 4. The HapMap and other populations analyzed included Utah residents with Northern and Western European ancestry (CEU), Yoruba in Ibadan, Nigeria (YRI), Han Chinese in Beijing, China (HCB), Japanese in Tokyo, Japan (JPT), and Gujarati Indians in Houston, Texas (GIH), as previously reported in the NCBI Entrez database, as well as South Indians (SI) and Koreans (KOR). Significant differences in the MAF for rs12591359, rs12908960, rs749292, and rs2470144 were observed between AA and EA populations and the SI, HCB, JPT, YRI, and GIH populations. When comparing EA in our study with CEU, no significant differences in the MAF between the two populations were evident (p = 0.871). However, a 1.2-fold to 19-fold difference in the MAF for rs10459592 (p = 0.004), rs1902584 (p<0.0001), rs2470144 (p = 0.0006), and rs12908960 (p = 0.0006) was observed between AA and YRI, but not between AA and ASW, although both YRI and ASW are populations of African ancestry. There were no significant differences in the allele and genotype frequencies for the CYP19A1 SNPs analyzed in this study by sex (data not shown).
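The fold differences quoted above can be reproduced from genotype counts with a short Python sketch. It assumes that a fold difference is the ratio of the larger MAF to the smaller one, which the text does not state explicitly; the function names and the numbers are hypothetical and do not come from Table 4.

```python
def minor_allele_frequency(n_hom_major, n_het, n_hom_minor):
    """Minor allele frequency from genotype counts at a biallelic SNP."""
    total_alleles = 2 * (n_hom_major + n_het + n_hom_minor)
    # Each minor homozygote carries two minor alleles, each heterozygote one.
    return (2 * n_hom_minor + n_het) / total_alleles


def fold_difference(maf_a, maf_b):
    """Ratio of the larger MAF to the smaller MAF (assumed definition)."""
    low, high = sorted((maf_a, maf_b))
    if low <= 0:
        raise ValueError("fold difference is undefined when a MAF is zero")
    return high / low


# Illustrative values only (not study data).
maf_group_1 = minor_allele_frequency(60, 30, 10)   # 0.25
maf_group_2 = 0.30
print(fold_difference(maf_group_1, maf_group_2))
```

A ratio is sensitive to small denominators: a MAF of 0.01 against 0.19 gives a fold difference near 19 even though the absolute difference is only 0.18, so both the ratio and the underlying frequencies are worth reporting.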
Comparison of the D' linkage disequilibrium (LD) patterns of the CYP19A1 gene between Arkansas-AA and Arkansas-EA populations
We made separate D' LD plots of the CYP19A1 gene for the EA and AA populations in order to examine the similarities and differences in the LD pattern of CYP19A1 SNPs between these two populations. Pairwise LD analysis revealed that the sizes and patterns of LD blocks at the CYP19A1 gene differed between the two ethnic groups (Fig. 1). In EA, a relatively strong LD pattern (defined as a pairwise D' > 0.8) was observed among rs10459592, rs12591359, and rs12908960 and among rs6493497, rs1961177 and rs2470144, while in AA only rs6493497, rs1961177 and rs2470144 showed a strong LD pattern. In EA, six CYP19A1 SNPs (rs2470152, rs749292, rs11856927, rs12908960, rs12591359, and rs10459592) were also in linkage and formed a 58 kb haplotype block. A smaller block of 19 kb (defined by rs6493497, rs1961177, rs2470144, and rs1902584) was also observed in EA, while only one much smaller LD block of 9 kb was observed in AA. The boundary between the two blocks in the EA population lay between rs1902584 and rs2470152. Therefore, the LD blocks extended over 70 kb, with a 16 kb inter-block distance.

Genetic diversity of CYP19A1 haplotypes between AA and EA populations
Based on the LD patterns, we were able to construct the haplotype blocks shown in Fig. 2. Towards the 5' end of CYP19A1, we observed the 58 kb EA-specific haplotype block 1, comprising haplotypes defined by nine different allele arrangements. Towards the 3' end of the gene, we observed EA-specific haplotype block 2, defined by four SNPs (with five different allele arrangements), and the single, smaller AA-specific haplotype block 1, defined by three SNPs (with four different allele arrangements). The most common haplotypes in EA were AGCG (35%), AACG (44%), GGAGAC (36%), and TAGTGT (27%), while GCG (44%), GTA (23%), ACG (16%) and GTG (16%) were the most common haplotypes in the AA population (Fig. 2).
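Haplotype percentages such as those above are, in essence, relative counts of phased haplotypes within a block. The following Python sketch shows only that tallying step; it assumes a list of already-phased haplotype strings is available (for example, best-guess reconstructions exported from a phasing program) and ignores the uncertainty attached to each reconstruction. The function name and the input list are hypothetical and are not the study data.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequency of each haplotype string, most common first.

    haplotypes : iterable of strings, one per chromosome, each string
                 listing the alleles at the SNPs of a single block
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    # most_common() orders haplotypes from most to least frequent.
    return {hap: n / total for hap, n in counts.most_common()}


# Illustrative three-SNP haplotypes for four chromosomes (not study data).
phased = ["GCG", "GCG", "GTA", "ACG"]
for hap, freq in haplotype_frequencies(phased).items():
    print(f"{hap}: {freq:.0%}")
```

Each individual contributes two chromosomes, so a sample of n genotyped subjects yields 2n haplotype strings per block.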

Discussion
To our knowledge, this is the first study to report the allele and genotype frequencies, LD patterns, Hardy-Weinberg equilibrium and haplotype structures of CYP19A1 intron SNPs in populations of African and European ancestry from Arkansas. Arkansas is a primarily rural state that has a high incidence of breast cancer and other health-related disparities, particularly among populations of African ancestry. Patterns of genetic variation in the population of Arkansas have been influenced by a region-specific demographic history (e.g., changes in population size, short- and long-range migration events, admixture and environment) as well as locus-specific forces such as natural selection, recombination, and mutation [Unpublished data] [23]. Thus, we hypothesized that the allele frequency distributions in CYP19A1, because of its association with breast cancer and role in estrogen biosynthesis, may differ between populations of African and European ancestry from Arkansas. For this study, we focused on CYP19A1 SNPs that were predicted to localize within regulatory binding regions, and/or predicted to associate with regulatory proteins involved in pre-mRNA processing, mRNA metabolism and transport, and that were previously associated with cancer risk and patient outcomes. Furthermore, the ten CYP19A1 SNPs genotyped have been reported to influence estradiol levels in postmenopausal women [13,24], predict clinical outcome in metastatic breast cancer patients treated with letrozole [25], and significantly increase risk of colon [26] and endometrial cancer development [13,24]. Given the clinical impact of CYP19A1 on disease risk, this study was initiated to determine whether stratification by ethnicity would reveal region-specific significant differences in allele frequencies of CYP19A1 SNPs between populations of African and European ancestry in Arkansas.
The allele frequencies of six of the ten CYP19A1 SNPs, rs10459592, rs12908960, rs1902584, rs2470144, rs1961177, and rs6493497, were significantly different between populations of European and African ancestry from Arkansas and when compared to international HapMap populations. Similar findings have also been reported in populations of South Indian (SI) and Korean (KOR) origin [9,10]. Studies by Umamaheswaran et al. [9] and Lee et al. [10] demonstrated that the minor allele frequencies for several CYP19A1 intronic SNPs were significantly different in SI and KOR populations, respectively, compared with HapMap populations of similar ethnicity. Using TaqMan SNP genotyping assays on 163 healthy subjects of South Indian origin, Umamaheswaran et al. observed significant differences in the minor allele frequencies for rs10459592, rs749292, and rs6493497 when compared to HapMap populations [7], and the study by Lee et al. examined 50 unrelated, healthy Koreans [10]. These genetic differences in CYP19A1 between ethnic groups stress the importance of considering ancestral differences when determining causal SNPs for disease association studies.
For instance, Haiman et al. demonstrated that the rs749292 CYP19A1 SNP was a predictor of circulating estrogen levels in white women of primarily European descent [13]. In the Breast and Prostate Cancer Cohort Consortium (BPC3), a large collaborative prospective study of over 8,000 prostate cancer cases and 9,000 age- and ethnicity-matched controls consisting of EA, Latinos, Japanese Americans, and Native Hawaiians, several haplotype-tagging SNPs, including rs749292, were found to be in LD and were significantly associated with a 5% to 10% difference in estradiol concentrations in men [16]. Another population-based case-control study of colon cancer patients of European descent showed that individuals homozygous for the "A" minor allele of rs12591359 had an increased risk of colon cancer (OR 1.44, 95% CI 1.16-1.80) and that rs2470144 was associated with reduced risk of rectal cancer [26]. In our study, we did not observe a significant difference in the minor allele frequency for rs749292 and rs12591359 between our AA and EA populations. This implies that circulating estrogen levels may not be significantly different between ethnic groups with similar minor allele frequencies; however, it is highly unlikely that only one SNP would give rise to a given phenotype. On the other hand, the minor allele frequency for rs2470144 was significantly different between AA and EA and between the YRI and JPT international HapMap populations. We also showed that frequencies were similar between the CEU and EA groups but, interestingly, not between AA and YRI. Recently, the rs10459592 SNP was significantly associated with a higher clinical benefit rate from letrozole, an aromatase inhibitor, in 109 Korean patients with hormone receptor-positive metastatic breast cancer [27], which further supports the importance of investigating tagged SNPs in ethnically diverse populations.
Stratification of AA and EA populations by ethnicity and region also revealed significant differences in haplotype frequencies and LD patterns, some unique to EA and others common to both populations. Two large LD blocks of 58 kb and 19 kb were observed among EA, while a smaller 9 kb LD block was observed in AA. The haplotype blocks clustered in smaller blocks in the AA population than in the EA population, which showed evidence of haplotypes clustering in larger blocks. This feature is most likely attributable to the higher effective population size and genetic diversity of populations of African ancestry. Furthermore, similar LD patterns in the human genome across populations have also been reported previously [27,28]. Therefore, we expected to observe larger LD blocks for EA compared to AA from Arkansas. In our analyses, the AA-specific block 1 harboring GCC (AA-block 1, Fig. 2) is unique and is found in both a very large clade and smaller clades arising from unique common ancestry. Understanding the allele profiles upstream of AA-block 1 can help map the influence of adjacent SNPs along the haplotypes with regard to the severity of certain phenotypes. Similarly, EA-specific blocks 1 and 2 (Fig. 2) harbor haplotypes that may provide EA-specific adjacent genetic information for mapping purposes and for understanding the behavior of phenotypes.
In summary, our results provide evidence of ethnic differences in the frequency of CYP19A1 SNPs between populations of African and European ancestry from Arkansas. Because CYP19A1 is critical for estrogen biosynthesis, identifying genetic variants in CYP19A1 is necessary for assessing cancer risk and predicting response to aromatase inhibitor drugs across ethnically diverse and disparate populations. Furthermore, CYP19A1 contains thousands of intronic SNPs, some of which may lie in regulatory regions and may independently alter the function of the enzyme. Analyzing low-frequency SNPs and identifying rare haplotypes among ethnic groups that have higher disease risk is therefore critical for understanding how SNPs that define an individual's genetic background, within or adjacent to functional domains, may influence drug response and disease risk.