Microsatellites in the Estrogen Receptor (ESR1, ESR2) and Androgen Receptor (AR) Genes and Breast Cancer Risk in African American and Nigerian Women

Genetic variants in hormone receptor genes may be crucial predisposing factors for breast cancer, and microsatellites in the estrogen receptor (ESR1, ESR2) and androgen receptor (AR) genes have been suggested to play a role. We studied 258 African-American (AA) women with breast cancer and 259 hospital-based controls, as well as 349 Nigerian (NG) female breast cancer patients and 296 community controls. Three microsatellites, ESR1_TA, ESR2_CA and AR_CAG, in the ESR1, ESR2 and AR genes, respectively, were genotyped. Their repeat lengths were then analyzed as continuous and dichotomous variables. Analyses of continuous variables showed no association with breast cancer risk in either AA or NG at ESR1_TA; AA cases had shorter repeats in the long allele of ESR2_CA than AA controls (Mann-Whitney P = 0.036; logistic regression P = 0.04, OR = 0.91, 95% CI 0.83–1.00), whereas NG patients had longer repeats in the short allele than NG controls (Mann-Whitney P = 0.0018; logistic regression P = 0.04, OR = 1.06, 95% CI 1.00–1.11); and AA cases carried longer repeats in the short allele of AR_CAG than AA controls (Mann-Whitney P = 0.038; logistic regression P = 0.03, OR = 1.08, 95% CI 1.01–1.15). When allele sizes were categorized as dichotomous variables, we discovered that women with two long alleles of ESR2_CA had increased risk of breast cancer (OR = 1.38, 95% CI 1.10–1.74; P = 0.006). This is the first study to investigate these three microsatellites in hormonal receptor genes in relation to breast cancer risk in an indigenous African population. After adjusting for multiple testing, our findings suggest that ESR2_CA is associated with breast cancer risk in Nigerian women, whereas ESR1_TA and AR_CAG seem to have no association with the disease among African American or Nigerian women.


Introduction
Breast cancer is a major global health problem and one of the most commonly diagnosed cancers among women. In the United States, breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer death among women [1], while in western Africa, breast cancer has the second highest incidence and death rates [2]. Given the complex and multifactorial nature of breast cancer, it is reasonable to hypothesize that genetic factors, together with endogenous and exogenous hormonal and other environmental factors, jointly contribute to an increased risk of breast cancer development.
Attention has long been paid to hormonal influences in breast cancer pathogenesis. Substantial evidence from both epidemiologic and experimental studies demonstrates a crucial role of hormones in the etiology of breast cancer. Indirect epidemiological support comes from the link between established reproductive breast cancer risk factors, such as early age at menarche and late age at menopause, and exposure to estrogens and progesterone [3]. Consistent observations of higher levels of circulating estrogens and androgens in postmenopausal women with breast cancer compared with controls provide direct epidemiological evidence linking hormones to breast cancer risk, whereas the link between sex steroids and premenopausal breast cancer risk remains unclear [4]. Laboratory studies have shown that estrogens contribute to breast carcinogenesis via estrogen receptor (ER)-mediated cell proliferation, genotoxic effects of their metabolites, or the induction of aneuploidy [5]. Androgens have been suggested to influence breast cancer risk either directly, by binding to androgen receptors and increasing or decreasing breast cell growth and proliferation, or indirectly, through their conversion to estradiol or competitive binding to ER-α [6].
Microsatellites, or short tandem repeats (STRs), are informative genetic markers in the human genome, and they can have biological functions depending on their location, such as affecting protein coding (in coding regions) or regulating gene expression (in regulatory regions) [7]. Microsatellites in the estrogen receptor (ESR1, ESR2) and androgen receptor (AR) genes have been hypothesized to be predisposing factors for breast cancer, but the mechanisms are unknown. A dinucleotide TA repeat polymorphism (ESR1_TA) is located in the promoter region of ESR1 isoforms 1 and 2, as well as intron 1 and intron 2 of ESR1 isoforms 3 and 4, respectively (6q25.1), while a polymorphic dinucleotide CA tandem repeat (ESR2_CA) is located in intron 5 of ESR2 (14q23.2). To date, the functional properties of these two microsatellites remain unknown, although gene expression could be modulated by such repeat nucleotide sequences [7]. AR (Xq12) is a ligand-dependent transcription factor, and its first exon contains a trinucleotide CAG repeat (AR_CAG), which has attracted more interest than ESR1_TA and ESR2_CA. AR_CAG encodes a polyglutamine (PolyGln) tract in the N-terminal transactivation domain of the AR protein, and PolyGln length has been shown to correlate inversely with AR transcriptional competence [8]. Over the last decade, many association studies have tested the relationship between these three microsatellites and breast cancer risk in men and women, predominantly focusing on AR_CAG (Table 1). However, the results are inconsistent, possibly because of different study designs and different races/ethnicities of the study subjects, as population-specific effects could exist. In addition, most of the previous study populations were Caucasians, some were Asians, and only one study population was African American (Table 1).
In the present study, we aimed to investigate whether ESR1_TA, ESR2_CA, and AR_CAG could be breast cancer susceptibility markers in African American (AA) and Nigerian (NG) women. We used a case-control study design with 258 AA cases and 259 AA controls, together with 349 NG cases and 296 NG controls.

Ethics Statement
Written informed consent was obtained from all participants. This study was approved by the Institutional Review Boards of the University of Chicago and the University of Ibadan.

Study Populations
The Chicago Cancer Prone Study (CCPS): CCPS is an ongoing hospital-based case-control study aimed at understanding the genetic basis of young-onset breast cancer. Histologically confirmed breast cancer patients were recruited through the Cancer Risk Clinic at the University of Chicago. Cancer-free healthy controls were gender- and age-matched with cases and enrolled from individuals who visited the same hospital (Translational Research Initiative in the Department of Medicine, TRIDOM) (Table S1 and Table S2).

Sample Quality Control
Genomic DNA was extracted from whole blood and evaluated for integrity by electrophoresis on agarose gel. Double-stranded DNA was quantified using the Quant-iT™ PicoGreen dsDNA kit (Invitrogen, CA, USA) on an Infinite® 200 PRO NanoQuant (Tecan, Männedorf, Switzerland), according to the manufacturer's instructions. To examine potential sample contamination and gender discrepancy, samples were amplified by polymerase chain reaction (PCR) using the AmpFℓSTR® Identifiler® PCR Amplification kit (Applied Biosystems, CA, USA) and the PCR products were analyzed on an Applied Biosystems 3130 DNA Analyzer (Applied Biosystems, CA, USA). DNA fragment data were collected and then individually checked using GeneMapper® software (Applied Biosystems, CA, USA), according to the manufacturer's protocol.

Determination of Microsatellite Allele Sizes
Primers were designed to amplify the fragments encompassing microsatellites ESR1_TA, ESR2_CA and AR_CAG present in the ESR1, ESR2 and AR genes, respectively (AR_CAG amplicon: chrX:66765056-66765343, GRCh37/hg19). PCR was performed using the PCRx Enhancer System (Invitrogen, CA, USA) with conditions as follows: initial denaturation at 95°C for 5 min, followed by 35 amplification cycles of denaturation at 95°C for 30 sec, annealing at 55°C for 50 sec and extension at 72°C for 50 sec, followed by a final extension step at 72°C for 10 min. Fluorescently labeled fragments generated by PCR were run on an Applied Biosystems 3130 DNA Analyzer and the repeat lengths were subsequently checked and assigned in GeneMapper® software. Genotypes were determined by two independent investigators who were blinded to subject disease status and all clinical information. Multiple homozygous subjects for each individual microsatellite were randomly chosen from the same studied populations and sequenced. Repeat lengths read by sequencing were successfully assigned to corresponding peak positions determined by fluorescence-based genotyping. Ninety-six samples were randomly selected and genotyped again to test the assay reproducibility.

Post-genotyping Quality Measurements
Tests of conformance to Hardy-Weinberg genotypic expectations were carried out with Genepop v4.1 (available at http://genepop.curtin.edu.au/). In addition, we checked the microsatellite data with Micro-Checker v2.2.3 (available at http://www.microchecker.hull.ac.uk/), which was designed to identify genotyping errors due to an excess of homozygotes caused by nonamplified alleles (null alleles), stutter peaks, or short allele dominance (large allele dropout). Genotypes that did not deviate from Hardy-Weinberg Equilibrium (HWE) were eligible for statistical association analyses.
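The notion of an "excess of homozygotes" can be illustrated by comparing observed heterozygosity with the value expected under HWE for a multi-allelic locus. The sketch below is only an illustration under that simple definition; it is not the exact test implemented in Genepop or the procedure used by Micro-Checker, and the function name and toy genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for a list of (allele1, allele2) pairs.

    Expected heterozygosity under HWE is 1 - sum(p_i^2), where p_i is the
    frequency of allele i. An observed value well below the expected value
    indicates an excess of homozygotes.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    observed = sum(1 for a, b in genotypes if a != b) / n
    return observed, expected


# Hypothetical toy data: repeat lengths for four women at one microsatellite.
toy = [(20, 20), (20, 22), (22, 22), (20, 22)]
obs, exp = heterozygosity(toy)
print(f"observed = {obs:.2f}, expected = {exp:.2f}")
```

A formal decision about HWE would still require a proper test (for example an exact test), since this comparison gives no P value.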

Statistical Analysis
The repeat lengths for ESR1_TA, ESR2_CA and AR_CAG were analyzed as continuous and categorical variables, separately. For the continuous variable analysis, the two alleles of a single microsatellite carried by each woman were assigned as the short allele (S) and the long allele (L), according to the smaller and larger allele sizes determined, respectively. For homozygotes, whose two alleles are identical in peak position, one allele was assigned as S and the other as L, so that both were included in the short and long allele analyses. For each microsatellite, the repeat lengths (mean ± standard deviation [SD]) were calculated under three categories: S, L, and the average of the two. The Wilcoxon rank-sum (Mann-Whitney) test was applied to compare the distributions of the repeat lengths between case and control groups of AA and NG. Van Elteren's test, a stratified version of the Wilcoxon rank-sum test, was used to compare the repeat lengths in the pooled AA and NG sample set. Odds ratios (ORs) with 95% confidence intervals (CIs) were also calculated by logistic regression analysis controlling for ascertainment, for the three microsatellites, in AA, NG, and AA + NG. In addition, to avoid overly strong assumptions, allele sizes were classified into dichotomous groups. Given that there is no a priori cut-off point that satisfactorily distinguishes the short and long alleles for ESR1_TA, ESR2_CA or AR_CAG, the mean repeat lengths of these three microsatellites in the control groups were chosen as the respective cut-off points. The cut-off limits were 18 for ESR1_TA, 23 for ESR2_CA, and 20 for AR_CAG (Table 2). The comparisons of genotype distributions between case and control groups were then performed in unconditional logistic regression models for AA and NG separately and as pooled samples, based on the categorical variables corresponding to the three microsatellites.
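The allele coding described above can be sketched in a few lines. This is a minimal illustration, not the authors' Stata or SAS code: the function names are hypothetical, and the rule that an allele with repeat length at or above the cut-off counts as long is an assumption, since the text does not state on which side of the cut-off the boundary value falls.

```python
def split_alleles(genotype):
    """Return (S, L): the shorter and longer repeat length of one woman's genotype.

    For homozygotes both values are identical, so the same length enters
    both the short-allele and the long-allele analysis.
    """
    a, b = genotype
    return min(a, b), max(a, b)


def average_length(genotype):
    """Average repeat length of the two alleles."""
    s, l = split_alleles(genotype)
    return (s + l) / 2


def dichotomize(genotype, cutoff):
    """Code a genotype as 'SS', 'SL' or 'LL' relative to a cut-off.

    Assumption: repeat length >= cutoff is treated as long (L).
    """
    s, l = split_alleles(genotype)
    return "".join("L" if x >= cutoff else "S" for x in (s, l))


# Hypothetical ESR2_CA genotypes, coded with the cut-off of 23 used in the text.
for g in [(25, 21), (23, 24), (18, 22)]:
    print(g, split_alleles(g), average_length(g), dichotomize(g, cutoff=23))
```

The resulting S, L and average values would feed the rank-sum and logistic regression analyses, while the genotype codes would feed the categorical models.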
Further, a cut-off point of 22 for AR_CAG was also chosen to allow direct comparison of our results with data previously reported in the literature. Since Wang and colleagues tested the association between AR_CAG and breast cancer risk in AA using a dichotomized cut-off point of 22 [9], we also compared our data with theirs to see whether the findings were consistent in the same ethnic population. Moreover, we conducted a similar analysis according to the ER status of breast cancer cases. Given the limited sample size, we combined AA and NG for both continuous and categorical variable analyses stratified by ER status (Table S1 and Table S2). The statistical analysis was conducted using Stata 11.1 software (StataCorp, TX, USA) and the SAS 9.2 package (SAS Institute, NC, USA). All statistical tests were two-sided. Given three tested loci and two populations, the number of tests considered for multiple-testing correction was 6. Thus, the significance threshold was set at 0.05/6 = 0.0083.
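As a quick arithmetic check of the multiple-testing threshold, the following sketch divides the nominal alpha by the six tests and compares a few of the Mann-Whitney P values quoted in the abstract against the result. The dictionary labels and function name are illustrative only.

```python
ALPHA = 0.05
N_TESTS = 6  # three loci x two populations, as stated in the text


def passes_threshold(p_value, alpha=ALPHA, n_tests=N_TESTS):
    """True if a P value is below alpha divided by the number of tests."""
    return p_value < alpha / n_tests


# Mann-Whitney P values quoted in the abstract (labels are illustrative).
reported = {
    "ESR2_CA, AA, long allele": 0.036,
    "ESR2_CA, NG, short allele": 0.0018,
    "AR_CAG, AA, short allele": 0.038,
}

print(f"threshold = {ALPHA / N_TESTS:.4f}")
for label, p in reported.items():
    status = "below" if passes_threshold(p) else "above"
    print(f"{label}: P = {p} is {status} the threshold")
```

Only the ESR2_CA short-allele result in NG falls below the threshold, in line with the conclusion that the AA findings do not survive correction.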

Genotyping Quality
No samples were detected to have contamination or gender issues, no discrepancy of allele calling was found between two investigators, and genotypes of 96 repeated samples were 100% consistent with previous determinations. Genotyping call rates were 99.8%, 100%, and 100% for ESR1_TA, ESR2_CA, and AR_CAG, respectively, in AA; genotyping call rates were 99.5%, 100%, and 99.8% for ESR1_TA, ESR2_CA, and AR_CAG, respectively, in NG. Calculations from Genepop and Micro-Checker showed that genotypes for all three microsatellites were in HWE and no potential genotyping errors were detected.

Distribution of ESR2_CA Alleles in Cases and Controls
The repeat length of ESR2_CA ranged from 9 to 31 (S: 9-26; L: 18-31; average [Ave.] 22.61 ± 1.74) in AA cases and 9 to 34 (S: ...) (Figure 1). When the repeat lengths of both ESR2_CA alleles were analyzed as continuous variables, AA cases appeared to have shorter L than AA controls (Mann-Whitney P = 0.036; logistic regression P = 0.04, OR = 0.91, 95% CI 0.83-1.00), whereas NG patients had longer S than NG controls (Mann-Whitney P = 0.0018; logistic regression P = 0.04, OR = 1.06, 95% CI 1.00-1.11) (Table 2). Thus, L appeared to be the protective allele in AA and S the risk allele in NG. In addition, a longer average repeat length of the ESR2_CA alleles was associated with breast cancer in NG (Mann-Whitney P = 0.0047; logistic regression P = 0.03, OR = 1.09, 95% CI 1.01-1.17) (Table 2). Using the categorical repeat length cut-off of 23, comparisons of the ESR2_CA genotypes between case and control groups showed no significance in either AA or AA + NG, but there was significance in NG (P = 0.0004) (Table 3). NG individuals with the LL genotype had significantly increased risk of developing breast cancer compared with those carrying the SS or SL genotypes (P < 0.001, OR = 1.86, 95% CI 1.36-2.54) (Table 3). A similar trend was observed in AA + NG (P = 0.006, OR = 1.38, 95% CI 1.10-1.74) (Table 3).

Distribution of AR_CAG Alleles in Cases and Controls
The microsatellite AR_CAG, located in exon 1 of the AR gene, has repeat lengths ranging from 10 to 32 (Figure 1).
In the analysis of continuous variables, AA cases carried longer S of AR_CAG than AA controls (Mann-Whitney P = 0.038; logistic regression P = 0.03, OR = 1.08, 95% CI 1.01-1.15), but this difference did not reach the multiple-testing threshold of 0.0083. No statistically significant signal was obtained in either NG or AA + NG. In addition, we applied the dichotomous cut-offs of 20 and 22 for AR_CAG and then conducted genotypic logistic regression analysis. The results provided no evidence that the AR_CAG genotypes significantly influence the risk of breast cancer in any population (Table 3). Furthermore, the combined data set from our study and that of Wang et al. [9] showed no association between AR_CAG genotype and breast cancer risk in AA (SL + LL vs. SS, P = 0.61, OR = 1.07, 95% CI 0.82-1.38) (Table 4).

Discussion
Breast cancer is a hormone-dependent malignancy, and cumulative exposure to sex hormones has been proposed to be linked to its development. Microsatellites ESR1_TA and ESR2_CA in the ESR1 and ESR2 genes, respectively, have also been reported to be associated with other traits and diseases such as bone mineral density [10], osteoarthritis [11], and endometriosis [12]. Similarly, the PolyGln tract of AR (encoded by AR_CAG) has been reported to be associated with susceptibility to a number of human diseases, such as prostate cancer, male infertility, cryptorchidism, hirsutism, and spinal and bulbar muscular atrophy (Kennedy's disease), among others [8]. The roles of ESR1_TA, ESR2_CA and AR_CAG in breast cancer have also been of scientific interest, especially that of AR_CAG (Table 1). Nonetheless, the findings have been inconsistent, and although AR_CAG association studies have been widely conducted in different racial/ethnic populations, populations of African ancestry have rarely been studied. In the present study, we targeted these three microsatellites and tested for germline susceptibility to female breast cancer in two populations of African descent that have historically been understudied in breast cancer genetics: African Americans and Nigerians. ESR1_TA, ESR2_CA and AR_CAG were genotyped in 1,162 women, consisting of 258 AA breast cancer cases, 259 AA controls, 349 NG breast cancer patients, and 296 NG controls. We then compared the repeat lengths of these three microsatellites between case and control groups using statistical methods that treat them as continuous or categorical variables. With 607 case patients and 555 controls, the current study provides 80% power at an alpha level of 0.05 to detect an OR of 1.39 if we assume, based on a previous study, that the probability of AR_CAG long alleles greater than 22 is 0.5, under a dominant genetic model.
For ESR1_TA and ESR2_CA, the study provides 80% power at an alpha level of 0.05 to detect an OR of 1.44 under a recessive genetic model or 1.49 under a dominant genetic model if we categorize these two microsatellite biomarkers according to the median. The power for detecting a case-control difference is higher if we analyze these three microsatellite biomarkers as continuous variables.
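The AR_CAG power statement above can be approximately reproduced with a textbook two-proportion calculation. The sketch below assumes that 0.5 is the proportion of controls in the exposed category and uses a normal approximation for a two-sided test; the authors do not describe their power software or formula, so this is an independent plausibility check with a hypothetical function name, not their method.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided test comparing exposure proportions.

    The exposure proportion in cases is derived from the control proportion
    and the odds ratio; power uses the usual normal approximation.
    """
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)
    p_pooled = (n_cases * p_case + n_controls * p_control) / (n_cases + n_controls)
    # Standard error under the null (pooled) and under the alternative.
    se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / n_cases + 1 / n_controls))
    se_alt = sqrt(p_case * (1 - p_case) / n_cases
                  + p_control * (1 - p_control) / n_controls)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = (abs(p_case - p_control) - z_crit * se_null) / se_alt
    return NormalDist().cdf(z)


# Values taken from the text: 607 cases, 555 controls, OR 1.39, exposure 0.5.
power = power_two_proportions(p_control=0.5, odds_ratio=1.39,
                              n_cases=607, n_controls=555)
print(f"approximate power = {power:.2f}")
```

Under these assumptions the approximation lands close to the stated 80%, which suggests the reported figure is consistent with a standard unmatched case-control calculation.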
Analyzing each individual allele of a microsatellite is one way to examine its genetic/biological function. However, it is not easy to test whether an effect from a combination of alleles exists, and which alleles are actually involved. The combination could depend on repeat length (a joint effect from particular alleles with a repeat-length-dependent functional property) or on allele frequency (a quantitative-like effect in which the functional alleles together reach a threshold). Testing each allele therefore requires too many assumptions and comparisons, especially when the function (if any) of the exact allele(s) of a microsatellite remains unclear. Previous studies applied either the mean repeat length as a cut-off, or a cut-off previously reported, sometimes even ignoring the allele distribution differences among different ethnic populations. We chose to set the mean repeat length as a cut-off, because it divides the distribution of microsatellite repeat lengths approximately in half, maintains adequate numbers in each allele category, and therefore provides the greatest power to detect statistically significant differences between cases and controls. Overall, the allele distributions of each individual microsatellite are similar in AA and NG (Figure 1). The allele distribution of AR_CAG in our AA sample is comparable to that in other AA cohorts [9]. Additionally, the spectrum of AR_CAG in the present study is consistent with previous observations that Africans have the shortest AR_CAG repeats, Asians bear the longest ones, while Caucasians and Mexican Americans are in the middle [13].
With the cut-off of ESR1_TA repeat length defined as 18, we found a P value of 0.039 in logistic regression in AA + NG, but not in AA or NG; however, this P value did not reach the significance threshold of 0.0083. Additionally, no statistical significance was observed in the continuous variable analysis. It seems that the effect of ESR1_TA on breast cancer risk is weak or very likely absent in AA and NG. A large case-control study of ESR1 haplotypes and postmenopausal breast cancer risk in Sweden has been carried out, but the ESR1_TA genotype distribution deviated from HWE [14]. Studies in French and Greek populations suggested that ESR1_TA does not contribute to the development of breast cancer in women [15,16].
In recent years, many genome-wide association studies (GWAS) have been conducted to search for breast cancer susceptibility loci. One of the GWAS hits, 6q25.1 (CCDC170-ESR1 region), was identified in a Chinese population [17]. The most significant single nucleotide polymorphism (SNP), rs2046210, is located 3′ downstream (6 kb away) of CCDC170 and 5′ upstream (63 kb away) of the nearest ESR1 isoform 4, a region covered by a single linkage disequilibrium (LD) block. This SNP has been replicated in Chinese and Japanese populations [18] but not in African Americans [18-21]. Furthermore, fine-mapping studies suggested that rs9397435 (2.9 kb away from rs2046210) could confer risk in women of Asian, European, and African origin [22]. These two SNPs are 119 kb away from ESR1_TA, which is located in another LD block. Thus, the role of germline genetic variants of ESR1 in breast cancer etiology remains to be further clarified and explored. To date, neither the ESR2 nor the AR gene has been identified by breast cancer GWAS.
For ESR2_CA, our analysis of continuous variables showed that AA cases had shorter L than AA controls (Mann-Whitney P = 0.036; logistic regression P = 0.04, OR = 0.91, 95% CI 0.83-1.00). Thus, L appeared to be the protective allele in AA. However, the association was no longer significant after multiple-testing correction, so an inherited predisposition to breast cancer conferred by ESR2_CA in AA women was not fully supported. On the other hand, NG patients appeared to have longer S than NG controls (Mann-Whitney P = 0.0018; logistic regression P = 0.04, OR = 1.06, 95% CI 1.00-1.11); that is, S was the risk allele in NG. In the logistic regression analysis using the categorical cut-off of 23, the overall ESR2_CA genotype distribution in cases and controls showed no significance in either AA or AA + NG, but was significant in NG (P = 0.0004). When comparing LL vs. SS + SL, statistical significance was also obtained in NG (P < 0.001, OR = 1.86, 95% CI 1.36-2.54) and AA + NG (P = 0.006, OR = 1.38, 95% CI 1.10-1.74). Although no association between ESR2_CA and breast cancer risk has been reported in Finnish [23] or French populations [15], shorter repeats of ESR2_CA have been found to be associated with higher breast cancer risk in Romanians [24], while longer repeats have been linked to lower breast cancer risk in Greeks [16]. Our AA data were in line with the above observations in Romanians and Greeks.
The reversed association directions of ESR2_CA observed in our AA and NG cohorts could be explained as a flip-flop phenomenon, which has been frequently observed across different or even the same ethnic groups [25]. If a genetic variant is a functional causal variant (e.g. a confirmed missense mutation or splicing variant), its risk allele should, in theory, be the same in different association tests. If a flip-flop association occurs for a variant with unknown function (e.g. ESR2_CA or an intergenic SNP), it could indicate either a false positive or a true association in which different risk alleles from different study populations tag the same risk allele of another, genuinely causal variant in LD. In addition to different LD architectures, flip-flop associations could also be attributable to sample heterogeneity, sampling variation, multilocus interactions, and gene-environment interactions across different racial/ethnic populations. Although African Americans and Nigerians share ancestral origin, they differ in lifestyle, environmental exposure, mutation-selection balance, local recombination rate, and LD pattern, among others. To the best of our knowledge, this is the first reported association of ESR2_CA with breast cancer risk in women of African ancestry; its flip-flop associations underscore the need for further and deeper validation in the same or similar populations.
Among the three microsatellites examined, AR_CAG is the most commonly tested one in the literature (Table 1). In the analysis of continuous variables, AA cases carried longer S of AR_CAG than AA controls (Mann-Whitney P = 0.038; logistic regression P = 0.03, OR = 1.08, 95% CI 1.01-1.15), but no statistical significance remained after adjusting for multiple testing. Furthermore, logistic regression analysis gave no significant result when a dichotomized cut-off of 20 was applied. Wang et al. genotyped AR_CAG in 239 AA women with breast cancer and 249 controls and found no significant association between AR_CAG and overall breast cancer risk using a cut-off of 22 AR_CAG repeats [9]. Nonetheless, longer AR_CAG repeats were found to be associated with increased risk in AA women with a first-degree family history of breast cancer [9]. In order to compare our results with theirs, we also classified the AR_CAG repeat lengths using the same cut-off of 22. Our own data, together with the results from the combined data set, revealed no association between AR_CAG and breast cancer risk in AA women. We were unable to perform statistical comparisons in women with a first-degree family history of breast cancer, due to the very limited number of such individuals in the present study. Based on the results above, we can only cautiously conclude that AR_CAG influences overall breast cancer risk weakly at most in AA, though it might affect disease susceptibility in women with a family history of breast cancer.
It is noteworthy that there have been divergent findings about AR_CAG and breast cancer risk (Table 1). One possible explanation is that the previous studies differed in study design (case-control study vs. case-only study), population (Caucasians, Asians, etc.), subjects' gender (female vs. male), sample size, DNA source (germline vs. somatic), or dichotomous repeat length cut-offs. In addition, X-inactivation has been suspected to bias risk estimates, since AR is located on the X chromosome. However, it has been reported that the short and long AR_CAG alleles were subjected to skewed inactivation with similar frequency [26], and it has been suggested that X-inactivation might not be responsible for the estimation bias due to its early occurrence in the embryo and later in the different lobes within the same breast [27].
Since it has become clear in recent years that breast cancer genetic susceptibility is subtype specific, we conducted a similar subset analysis stratified by ER status but found no significance (Table S1 and Table S2). It is worth noting that we were unable to obtain sufficient statistical power for the subtype analysis due to the limited sample size. Our work is a retrospective study primarily designed for overall breast cancer risk. In addition, performing immunohistochemistry in Nigeria is challenging because scientific and medical support is still urgently needed there. Future investigations are warranted to determine the risk conferred by these three microsatellites in breast cancer patients classified by subtype.
To our knowledge, there have been only three previous reports investigating the association between all three microsatellites (ESR1_TA, ESR2_CA, and AR_CAG) and risk of breast cancer in women [15,16,24], and the present study is the first to evaluate the impact of these microsatellites in hormonal receptor genes on germline susceptibility to breast cancer in an indigenous African population. The ability of these three microsatellites to estimate breast cancer risk requires further replication in larger and more diverse populations.