A Common Polymorphism near the ESR1 Gene Is Associated with Risk of Breast Cancer: Evidence from a Case-Control Study and a Meta-Analysis

Background Genome-wide association studies have reported that a polymorphism near the estrogen receptor gene (ESR1) (rs2046210) is associated with a risk of breast cancer, with the A allele conferring an increased risk. However, considering the controversial results from more recent replicated studies, we conducted a case-control study in an independent Chinese Han population and a meta-analysis to clarify the association of this polymorphism with breast cancer risk. Method and Findings A hospital-based case-control study including 461 cases and 537 controls from a Chinese Han population was conducted initially, and this study showed that the rs2046210 A allele was significantly associated with breast cancer risk, with an OR of 1.32 (95% CI  = 1.10–1.59). Subsequently, a meta-analysis integrating the current study and previous publications with a total of 53,379 cases and 55,493 controls was performed to further confirm our findings. Similarly, a significant association between this polymorphism and breast cancer risk was also observed in the overall population especially among Asians, with ORs for per A allele of 1.14 (95% CI  = 1.10–1.18) in the overall population and 1.27 (95% CI  = 1.23–1.31) in the Asian population. Conclusion Our results provide strong evidence to support that the common polymorphism near the ESR1 gene, rs2046210, is associated with an increased risk of breast cancer in Asian and European populations but not in Africans, although the biological mechanisms need to be further investigated.


Introduction
According to global cancer statistics in 2008, breast cancer (BC) is the most common malignant tumor among women worldwide, accounting for 23% of new cancer cases and 14% of cancer deaths. In several developing countries, such as China, BC has surpassed cervical cancer and become the leading cause of cancer death among females [1]. Although the underlying mechanism of BC pathogenesis is still unclear, accumulating evidence shows that development of BC is a multifactorial complex process influenced by multiple genetic variants and environmental factors [2]. Given that only a few individuals receiving the same environmental exposure will develop BC, and that women with a family history of BC are at high risk for the cancer, it appears that genetic factors may play an important role in the etiology of BC. Previous studies have revealed that rare but high-penetrance mutations in BRCA1/2 and a few other inherited variants explain only up to 5% of overall BC incidence, whereas more common but low-penetrance susceptibility alleles may be responsible for a substantial fraction of BC [3][4][5].
Recent genome-wide association studies (GWAS) have demonstrated that numerous low-penetrance susceptibility loci are significantly associated with BC [6][7][8][9][10][11][12][13][14][15][16][17]. Among them, a common single nucleotide polymorphism (SNP) in the vicinity of the ESR1 gene was highlighted for its potential biological plausibility in breast carcinogenesis. The ESR1 gene encodes estrogen receptor α (ERα), which stimulates proliferation and differentiation of mammary epithelial tissue by binding estrogen, an established risk factor for BC [18]. Because the biological roles of estrogen are achieved through high-affinity binding to ERs, genetic variants in ER genes have become a focus of molecular epidemiological studies on BC susceptibility [19,20]. rs2046210, which is located 180 kb upstream of the transcription initiation site of the first coding exon of the ESR1 gene, was first reported by Zheng et al. [21] to be associated with an increased risk of BC. However, several subsequent replication studies did not reach consistent results; for example, Stacey et al. [22] failed to validate the association in Europeans, and similarly, Campa et al. [23] were also unable to replicate the findings in Asians. Potential explanations for the discrepancy include the modest effect of this SNP and the diverse genetic backgrounds of the different ethnic groups. Additionally, because of the "winner's curse" phenomenon, replication studies are likely to be underpowered, and may therefore fail, if their sample size calculations are based on the overestimated effect sizes generated by the initial study [24]. Nevertheless, meta-analysis, a powerful tool that combines data across studies to achieve a substantial increase in sample size, could resolve the discordances in genetic association studies [25].
Thus, in this study, we carried out a case-control study to validate the association between rs2046210 and BC risk in the Han Chinese population, along with a meta-analysis that integrated the current study and previous publications on this polymorphism, to derive a more accurate estimate of the association between this polymorphism and BC risk.

Study population
A total of 461 incident cases and 537 controls were enrolled between June 2009 and December 2011 from Union Hospital of Huazhong University of Science and Technology, Wuhan, China. All the cases were histopathologically confirmed with primary BC and none of them had received neo-adjuvant treatment. The controls were randomly selected from a health check-up program at

Genotyping
For each subject, genomic DNA was extracted from a 2-ml peripheral blood sample using the RelaxGene Blood System DP319-02 (Tiangen, Beijing, China) according to the manufacturer's instructions. The genotype of rs2046210 was carried out

Statistical Analysis
Differences in the distribution of demographic characteristics between the cases and controls were evaluated using the χ² test and t-test. Hardy-Weinberg equilibrium (HWE) for genotypes in the controls was assessed by the goodness-of-fit χ² test. An unconditional multivariate logistic regression model was used to estimate the associations between genotypes and BC risk by calculating the odds ratios (ORs) and 95% confidence intervals (CIs) after adjusting for age and menopausal status. To avoid assuming a single genetic model, dominant, recessive and additive models for rs2046210 were also assessed. In addition, stratified analyses by menopausal status, estrogen receptor (ER) status and progesterone receptor (PR) status were carried out to evaluate the role of rs2046210 in BC risk. All statistical tests were two-sided and performed using the SPSS 12.0 computer program.
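As an illustration of two of the calculations described above, the following minimal Python sketch computes the goodness-of-fit χ² test for HWE (1 degree of freedom) and a crude allelic OR with a Woolf-type 95% CI. It is not the SPSS procedure used in this study: the function names are hypothetical, the genotype and allele counts in the usage note are invented for illustration, and the OR is unadjusted, whereas the reported ORs were adjusted for age and menopausal status by logistic regression.

```python
import math

def hwe_chi2(n_gg, n_ga, n_aa):
    """Goodness-of-fit chi-square test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_gg + n_ga + n_aa
    p = (2 * n_gg + n_ga) / (2 * n)        # observed frequency of the G allele
    q = 1.0 - p                            # observed frequency of the A allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_gg, n_ga, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def allelic_or(case_a, case_g, ctrl_a, ctrl_g, z=1.96):
    """Crude (unadjusted) allelic OR for A vs. G with a Woolf-type CI."""
    odds_ratio = (case_a * ctrl_g) / (case_g * ctrl_a)
    se_log_or = math.sqrt(1 / case_a + 1 / case_g + 1 / ctrl_a + 1 / ctrl_g)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, `hwe_chi2(25, 50, 25)` describes hypothetical controls whose genotype counts match HWE expectations exactly, and `allelic_or(20, 10, 10, 20)` gives a crude OR of 4 for hypothetical allele counts.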

Meta-analysis of rs2046210 in association with BC risk
To further investigate the association between rs2046210 and BC risk, we carried out a meta-analysis combining previous publications and the current study. We searched PubMed, EMBASE, ISI Web of Science and CNKI (China National Knowledge Infrastructure) for literature published in any language up to June 2012, using keyword combinations of "rs2046210 or 6q25.1" and "breast cancer, breast carcinoma or breast neoplasm". The references of retrieved articles and reviews were also checked for missing information. Included studies had to meet the following criteria: (1) the study evaluated the association between rs2046210 and BC risk; (2) it provided data for calculating genotypic ORs with corresponding 95% CIs; (3) the genotypes in controls conformed to Hardy-Weinberg equilibrium (P > 0.05). Reviews, simple commentaries and case reports were excluded. If studies had overlapping subjects, only the one with the largest sample was included.
For each study, the following data were extracted: first author, year of publication, geographic location, ethnicity of the study population, study design, study method, control source, sample size, and frequencies of genotypes in cases and controls. The ORs were calculated for the risk allele A versus the wild-type allele G. ORs for genotype AG versus GG, AA versus GG, and the dominant, recessive and additive models were recalculated for some of the included studies because they did not report sufficient data. Between-study heterogeneity was assessed by Cochran's Q test and the I² statistic. Heterogeneity was considered significant at P < 0.10 for the Q statistic [27]. The I² statistic was then used to evaluate heterogeneity quantitatively (I² < 25%, low heterogeneity; I² = 25-75%, moderate heterogeneity; I² > 75%, high heterogeneity) [28]. A fixed-effects model using the Mantel-Haenszel method was applied to pool data from studies if the heterogeneity was negligible, based on a P value greater than 0.1 for the Q statistic; otherwise, a random-effects model using the DerSimonian and Laird method was used [29]. To explore sources of heterogeneity across studies, a meta-regression model was employed [30]. The covariates assessed as potential sources of heterogeneity were: ethnicity (Asian, European and African), study design (GWA studies and replication), study method (case-control studies and nested case-control studies), sample size (≤2000 and >2000 subjects), and source of control (population- and hospital-based controls). Stratified analysis was then conducted according to the potential sources of heterogeneity identified by meta-regression analysis. Subgroup meta-analyses stratified by ER and menopausal status were also performed. Additionally, sensitivity analysis was performed by omitting each study in turn to assess the influence of each study on the overall estimate [31]. Cumulative meta-analysis was performed by adding studies in order of publication date [32].
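The heterogeneity statistics and random-effects pooling described above can be sketched as follows. This is a minimal illustration, not the STATA routine used in this meta-analysis: it takes per-study log ORs and their standard errors as inputs, uses inverse-variance weights (rather than Mantel-Haenszel weights) for the fixed-effect estimate that enters Cochran's Q, and the function name and returned keys are hypothetical.

```python
import math

def dersimonian_laird(log_ors, ses):
    """Cochran's Q, I-squared and a DerSimonian-Laird random-effects pooled OR."""
    w = [1.0 / se ** 2 for se in ses]                 # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    # I-squared: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Method-of-moments estimate of the between-study variance tau-squared
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "Q": q, "df": df, "I2": i2, "tau2": tau2,
        "OR": math.exp(pooled),
        "CI": (math.exp(pooled - 1.96 * se_pooled),
               math.exp(pooled + 1.96 * se_pooled)),
    }
```

The P value for Q would come from a χ² distribution with `df` degrees of freedom; it is left out here to keep the sketch free of external dependencies.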
Publication bias was assessed by a funnel plot and Egger's test [33,34]. All statistical analyses were carried out in STATA 10.0, and all P values were two-sided with a significance level of 0.05. To ensure the rigor of this meta-analysis, we designed and reported it according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement (http://www.prisma-statement.org); the checklist is shown in Table S1.
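Egger's test can be sketched as an ordinary least-squares regression of each study's standardized effect (log OR divided by its standard error) on its precision (1 divided by the standard error); an intercept far from zero suggests funnel-plot asymmetry. The sketch below is an assumption-laden illustration rather than the STATA implementation used here: the function name is hypothetical, and it returns the intercept, its standard error and the t statistic without a P value, which would require a t distribution with n - 2 degrees of freedom.

```python
import math

def egger_regression(log_ors, ses):
    """Regress standardized effects on precision; return intercept statistics."""
    x = [1.0 / se for se in ses]                   # precision
    z = [y / se for y, se in zip(log_ors, ses)]    # standardized effect
    n = len(x)
    if n < 3:
        raise ValueError("at least three studies are needed")
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same standard error")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    se_intercept = math.sqrt(rss / (n - 2) * (1.0 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_intercept, "t": t_stat}
```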

Results of case-control study
Population characteristics. The characteristics of the cases and controls are listed in Table 1. A total of 461 BC cases and 537 frequency-matched controls were enrolled in this study. Mean age was 48.41 (±9.85) for cases and 49.04 (±12.45) for controls, and there was no significant difference between the two groups (P = 0.369). The percentage of premenopausal women was 54.7% among the BC cases compared with 46.6% among controls, and the P value for the distribution of menopausal status between the cases and controls was 0.011.
We then stratified the data according to menopausal status and ER status. The results demonstrated that rs2046210 was associated with an elevated risk of BC in an allelic model among both pre- and post-menopausal individuals. The positive association of this SNP with BC risk was also found for both ER positive and ER negative women, with adjusted ORs of 1.27 (P = 0.029) and 1.38 (P = 0.009), respectively.

Result of meta-analyses
Characteristics of included studies. As shown in Figure S1, 23 potentially relevant publications were initially identified through PubMed, EMBASE, ISI Web of Science and CNKI, of which 17 were judged to preliminarily meet the inclusion criteria described above. Seven articles were excluded because their cases largely overlapped with the samples of previous studies [35][36][37][38][39][40][41]. The multicenter research reported by Cai et al. [42] contained samples that duplicated those in the research conducted by Han et al. [43]; therefore, the corresponding study with fewer cases was excluded. Finally, 10 previous publications [21][22][23][42][43][44][45][46][47][48] (Table 3) and the current study, together comprising 36 studies with 53,379 cases and 55,493 controls, were included in this meta-analysis based on our search strategy and eligibility criteria. Among them, the publication by Stacey et al. [22] provided only an allelic OR, and was thus included only in the pooled analysis for the allelic model of A vs. G. The study reported by Jiang et al. [47] did not provide detailed genotype counts, so we included it only in the pooled analyses for which it provided data.
Overall meta-analyses of rs2046210 in association with BC risk. As shown in Table 4, the P values for heterogeneity were less than 0.1 in all genetic models; therefore, ORs were pooled under a random-effects model. In the allelic model, the A allele conferred a pooled OR of 1.14 (95% CI = 1.10-1.18, P < 0.001) compared to the G allele (Figure 1). Genotypic ORs of GA versus GG, AA versus GG, and the dominant model combined both crude and adjusted ORs because a study of Asians only provided adjusted ORs for these three models, as mentioned previously [47]; the pooled ORs were 1.17 (95% CI = 1.11-1.24, P < 0.001), 1.33 (95% CI = 1.24-1.44, P < 0.001) and 1.21 (95% CI = 1.14-1.28, P < 0.001), respectively. Significant associations between rs2046210 and BC risk were also observed in the recessive model (OR = 1.21, 95% CI = 1.15-1.28, P < 0.001) and the additive model (OR = 1.15, 95% CI = 1.11-1.20, P < 0.001) in this meta-analysis.
Meta-regression analyses and stratified analyses. To investigate the potential sources of between-study heterogeneity under the allelic model of A vs. G, meta-regression analyses were performed. An empty regression was initially run to estimate the baseline value of τ² (τ² = 0.0073), and we then fitted a series of univariate models by adding single covariates, including ethnicity of population, study design, study method, sample size, and source of control. In the univariate analyses, we found that the τ² value was reduced to 0.0014 (adjusted R² = 81.04%) in the model including ethnicity, suggesting that ethnicity could explain 81.04% of the heterogeneity across studies in this allelic model. Stratified analyses by ethnicity were then carried out (Table 4). In Asian and European populations, the polymorphism presented a significantly increased risk of BC in all genetic models; however, there was no obvious association between the SNP and BC risk in the African population in any genetic model (Figure 2). This demonstrated that the A variant played disparate roles in different ethnic populations. We found that moderate heterogeneity still existed in Europeans under the allelic model (I² = 47.8%, P = 0.013); therefore, a further meta-regression was carried out, which revealed that the source of control could explain 53.61% of the heterogeneity. After excluding the multicenter research reported by Steven et al. [44], which combined hospital- and population-based controls, the heterogeneity was clearly reduced (I² = 21.6%, P = 0.202). When we subsequently stratified the data by ER and menopausal status, the between-study heterogeneity was obvious, but it was reduced notably after further stratifying by ethnicity. The pooled ORs of A vs. G were 1.23 (95% CI = 1.14-1.32, P < 0.001) in the ER negative population and 1.12 (95% CI = 1.04-1.20, P = 0.002) in the ER positive population.
When ER negative cases were compared with ER positive cases, the OR for ER negative BC was 1.11 (95% CI = 1.06-1.15) for A vs. G, which indicated that the association was stronger for ER negative BC than for ER positive BC. Meanwhile, the positive association of this SNP with BC risk was also found in both pre- and post-menopausal women (OR = 1.18, 95% CI = 1.13-1.24, P < 0.001 and OR = 1.22, 95% CI = 1.10-1.36, P < 0.001, respectively); however, the association was not stronger in post-menopausal cases than in their pre-menopausal counterparts (P = 0.706).
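The proportion of heterogeneity attributed to ethnicity in the meta-regression can be checked from the two reported between-study variance estimates. The expression below assumes that the adjusted R² is defined as the proportional reduction in the between-study variance when the covariate is added, which is a common definition for random-effects meta-regression; the subscripts 0 and 1 (empty model and model with ethnicity) are notation introduced here for illustration.

```latex
R^2_{\mathrm{adj}}
  = \frac{\hat{\tau}_0^{2} - \hat{\tau}_1^{2}}{\hat{\tau}_0^{2}}
  \approx \frac{0.0073 - 0.0014}{0.0073}
  \approx 0.808
```

With the rounded variance estimates this gives about 80.8%, close to the reported 81.04%; the small gap is presumably due to rounding of the reported variance values.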
Sensitivity analyses and cumulative meta-analyses. Since significant heterogeneity across studies existed in all genetic models of the overall population and in the allelic model of the European population, we carried out sensitivity analyses to evaluate the effect of each study on the pooled estimate under a random-effects model by removing each study sequentially. As shown in Table 5 and Table S2, the pooled ORs were similar before and after deletion of each study. We obtained similar results in the other genetic models, and no single study changed the OR values markedly; therefore, the current results are stable and credible. Cumulative meta-analyses were carried out in all genetic models by adding studies in chronological order. As shown in Figure 3, the 95% CIs for the pooled ORs became progressively narrower as more studies were accumulated in all models, indicating that the precision of the estimate improved as more samples were added.
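The leave-one-out procedure used in the sensitivity analysis can be sketched as a loop that re-pools the data with each study omitted in turn. In the hypothetical sketch below, the default pooling function is a simple inverse-variance fixed-effect estimator chosen only to keep the example short; the analyses reported here used a random-effects model, and any pooling function that returns a pooled log OR and its standard error can be passed in through the `pool` argument.

```python
import math

def pool_inverse_variance(log_ors, ses):
    """Inverse-variance fixed-effect pooled log OR and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total
    return pooled, math.sqrt(1.0 / total)

def leave_one_out(log_ors, ses, pool=pool_inverse_variance):
    """Re-pool the studies with each one omitted in turn."""
    log_ors, ses = list(log_ors), list(ses)
    results = []
    for i in range(len(log_ors)):
        kept_y = log_ors[:i] + log_ors[i + 1:]
        kept_se = ses[:i] + ses[i + 1:]
        pooled, se = pool(kept_y, kept_se)
        results.append({
            "omitted": i,                          # index of the omitted study
            "OR": math.exp(pooled),
            "CI": (math.exp(pooled - 1.96 * se),
                   math.exp(pooled + 1.96 * se)),
        })
    return results
```

A cumulative meta-analysis follows the same pattern, pooling the first k studies in chronological order for k = 2, 3, and so on.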
Publication bias. A funnel plot (Figure S2) and Egger's test (all P values for Egger's test > 0.05) showed no evidence of publication bias in any of the genetic models.

Discussion
This study demonstrated a significant association between rs2046210 and an increased risk of BC in a Han Chinese population. The subsequent meta-analysis, based on 36 studies with 53,379 cases and 55,493 controls, also confirmed the strong association under all genetic models in the overall population. To the best of our knowledge, this is the first meta-analysis seeking to clarify the association between this polymorphism and BC risk, and the sensitivity and cumulative analyses confirmed that the positive finding was stable and that the precision of the estimate improved as more studies were included. These results support a role for this polymorphism, which is near the ESR1 gene, in BC susceptibility.
In the overall meta-analyses, all genetic models presented significant heterogeneity. However, according to the meta-regression analyses, the heterogeneity was mostly explained by the ethnicity of the study population. After stratification by ethnicity, this polymorphism had a significant association with BC risk in Asians and only a weaker and unstable association in Europeans, whereas the association could not be validated in Africans. The strength of the association with rs2046210 thus varies greatly across ethnic groups. One probable reason is the considerable difference in genetic architecture across ethnic groups. Another plausible hypothesis is that rs2046210 is only a marker SNP for causative variants and resides in different linkage disequilibrium (LD) patterns among the three ethnic populations.
Intriguingly, in further analysis, we found that this association was stronger in ER negative than in ER positive BC. Two recent studies [45,48] indicated that this polymorphism was associated with an increased risk of BC among BRCA1 mutation carriers, but not among BRCA2 mutation carriers. Remarkably, accumulating evidence shows that the large majority of BRCA1 mutation carriers present with ER negative tumors [49], which could partly explain why ER negative cases showed a stronger association. Additionally, recent studies in mice have revealed that the mammary stem cell compartment could be regulated by estrogen and progesterone through a paracrine signaling mechanism from ER positive cells to ER negative cells [50,51]. Thus, polymorphisms near the ESR1 locus could affect the occurrence and development of ER negative tumors through the paracrine pathway. In the stratified meta-analysis, we also found that rs2046210 was significantly associated with BC risk in both premenopausal and postmenopausal women under the allelic model, which was in line with the result of our case-control study. However, there was no evidence that the association was stronger in post- than in pre-menopausal women.
Considering the relative vicinity of rs2046210 to the ESR1 gene, it has been speculated that the SNP itself or causal variants in LD with it might alter ESR1 gene expression, thus affecting susceptibility to BC. However, the functional genomic analyses and in vitro functional experiments conducted by Cai et al. [42] provided no support for the potential involvement of this polymorphism in the regulation of ESR1. Although dozens of SNPs have been reported in high LD with this polymorphism, functional evaluations of them and their related genes are still warranted. Herein, we conjecture that this SNP might communicate with the ESR1 gene via a long-range chromatin loop. Nevertheless, this is only a postulation and needs to be confirmed by further longitudinal studies.
In conclusion, our case-control study and the subsequent meta-analysis effectively corroborated the impact of rs2046210 near the ESR1 gene on BC risk, and showed that the polymorphism had a larger effect on Asians than on Europeans or Africans. However, the function of this SNP is still unclear; future fine-mapping of the BC susceptibility loci tagged by rs2046210 is warranted and the underlying biological mechanism of this polymorphism still needs further investigation.