Comparison of 6q25 Breast Cancer Hits from Asian and European Genome Wide Association Studies in the Breast Cancer Association Consortium (BCAC)

The 6q25.1 locus was first identified via a genome-wide association study (GWAS) in Chinese women and is marked by the single nucleotide polymorphism (SNP) rs2046210, approximately 180 kb upstream of ESR1. There have been conflicting reports about the association of this locus with breast cancer in Europeans, and a GWAS in Europeans identified a different SNP, tagged here by rs12662670. We examined the associations of both SNPs in up to 61,689 cases and 58,822 controls from forty-four studies collaborating in the Breast Cancer Association Consortium, of which four studies were of Asian and thirty-nine of European descent. Logistic regression was used to estimate odds ratios (OR) and 95% confidence intervals (CI). Case-only analyses were used to compare SNP effects in estrogen receptor-positive (ER+) versus estrogen receptor-negative (ER−) tumours. Models including both SNPs were fitted to investigate whether the SNP effects were independent. Both SNPs are significantly associated with breast cancer risk in both ethnic groups. Per-allele ORs are higher in Asian than in European studies [rs2046210: OR (A/G) = 1.36 (95% CI 1.26–1.48), p = 7.6×10−14 in Asians and 1.09 (95% CI 1.07–1.11), p = 6.8×10−18 in Europeans; rs12662670: OR (G/T) = 1.29 (95% CI 1.19–1.41), p = 1.2×10−9 in Asians and 1.12 (95% CI 1.08–1.17), p = 3.8×10−9 in Europeans]. SNP rs2046210 is associated with a significantly greater risk of ER− than ER+ tumours in Europeans [OR (ER−) = 1.20 (95% CI 1.15–1.25), p = 1.8×10−17 versus OR (ER+) = 1.07 (95% CI 1.04–1.1), p = 1.3×10−7, pheterogeneity = 5.1×10−6]. In the Asian studies, by contrast, there is no clear evidence of a differential association by tumour receptor status. Each SNP is associated with risk after adjustment for the other SNP. These results suggest the presence of two variants at 6q25.1, each independently associated with breast cancer risk in Asians and in Europeans. Of these two, the one tagged by rs2046210 is associated with a greater risk of ER− tumours.


Introduction
A genome-wide association study (GWAS) in Chinese women by Zheng et al. [1] identified a novel breast cancer susceptibility locus at 6q25.1. The most strongly associated single nucleotide polymorphism (SNP) was rs2046210, with an estimated per-allele (A/G) odds ratio (OR) of 1.29 (95% confidence interval (CI) 1.21–1.37, p = 10−15). SNP rs2046210 did not show a clear association in GWAS carried out in women of European ancestry. Replication studies also indicated that its effect, if any, was weaker in Europeans [per-allele (A/G) OR = 1.04 (95% CI 0.99–1.08), p = 0.09 in a combined analysis of European studies [2]]. More recent studies in European women suggested stronger associations with other SNPs in the region. Turnbull et al. [3] found the most significantly associated SNP to be rs3757318, which is only weakly correlated with rs2046210 in Europeans (r2 = 0.09 in HapMap2 CEU), while Stacey et al. [2] suggested that SNPs closer to ESR1 may be more strongly associated. It is as yet unclear whether this difference in breast cancer-associated SNPs between Asians and Europeans indicates a single causative variant or multiple causative variants at this locus. If there is only one, it is unlikely to be highly correlated with the best tags identified from either the Asian or the European GWAS. It could then be either a common variant with a small effect or a rarer one with a larger effect on breast cancer risk.
In this, by far the largest study to date, we investigate associations with SNP rs2046210 and with SNP rs12662670 in forty-four case-control studies within the Breast Cancer Association Consortium (BCAC). These two SNPs have been genotyped in a total of 120,511 female subjects, of whom 110,265 are of European ancestry and 8,559 are Asian. SNP rs2046210 is the best tag from the original Asian GWAS [1]. SNP rs12662670 is an easier-to-genotype surrogate for SNP rs3757318, the best tag SNP at the 6q25.1 locus from a European GWAS [3]. Our aims were to compare the effects of these tags in well-powered studies of both Asian and European ancestry, and to test whether these known SNP associations are shared by the different ethnic groups. Our analyses address both aims and provide additional insights into the nature of this locus.

Ethics Statement
Approval of the studies was obtained from the ethics committees listed in Table S1. All studies conform to the Declaration of Helsinki and all study participants gave written informed consent.

Study Populations
Data from forty-five BCAC case-control studies from Australia, Europe, North America, and South-East Asia were available for inclusion in this analysis (see Table S1 for a description of the individual studies). To be eligible for BCAC, studies needed to include at least 500 cases of invasive breast cancer and 500 controls, with DNA samples available for genotyping. The controls needed to be broadly from the same population as the cases (http://www.srl.cam.ac.uk/consortia/bcac/about/about.html). Some studies selected cases preferentially on the basis of age and/or family history.
All studies provided information on disease status (58,822 controls/62,061 invasive cases/2,769 in-situ cases/1,435 cases of unknown invasiveness), age at diagnosis or interview, and ethnicity (Asian/European/other). Forty studies also provided information on estrogen receptor (ER) status for a total of 40,508 cases (9,878 ER-negative (ER−) and 30,630 ER-positive (ER+)).

Laboratory Methods
In most studies, SNPs were assayed by TaqMan™ (Applied Biosystems, Foster City, USA). Primers, probes and master mix were ordered in a single batch, and aliquots were shipped to each study. Reactions were performed according to the manufacturer's instructions, using the following thermal cycling profile: 95°C for 10 min, followed by 40–60 cycles of 92°C for 15 s and 60°C for 1 min.
SNP rs12662670 was chosen as the most easily assayable surrogate for the best European GWAS hit, rs3757318, for which no working TaqMan™ assay could be designed. These two SNPs are correlated at r2 = 0.89 in the European samples used in Turnbull et al. [3]. The correlations in populations of Asian ancestry are somewhat weaker: r2 = 0.72 and r2 = 0.66 in HapMap2 JPT and CHB samples, respectively. Tables S2a and S2b give information on the respective genotyping phases for SNPs rs2046210 and rs12662670. All studies followed standard quality control guidelines (for details see http://www.srl.cam.ac.uk/consortia/bcac/about/about.html). Data were excluded for any sample that failed genotyping for >20% of the SNPs typed in a given phase of genotyping. All of a study's data for a given SNP were excluded if any of the following applied:
- the SNP had an overall call rate <95%;
- duplicate concordance was <94% (based on at least 2% of samples in each study being genotyped in duplicate);
- the genotype distribution in controls departed from Hardy-Weinberg equilibrium (p < 0.005).
In addition, all genotyping centres assayed an identical plate of 80 control DNA samples (referred to as the Coriell plate, which also included 14 internal duplicates). Each centre had to achieve call rates and duplicate concordance >98% on this plate for its data to be included. After these quality control rules were applied, data for both SNPs from one study (NBCS) were excluded from further analyses. Quality control data for the individual studies are shown in Tables S2a and S2b. Thus, forty of the forty-one studies assayed for SNP rs2046210 (56,607 cases/49,559 controls) were included in the statistical analysis. For SNP rs12662670, thirty-three of the thirty-four assayed studies (47,251 cases/40,161 controls) were included.
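The sample- and SNP-level exclusion rules above reduce to simple threshold checks, sketched below in Python. The sketch is illustrative only and the function and field names are hypothetical. The text does not say which Hardy-Weinberg test was used, so a standard one-degree-of-freedom chi-square goodness-of-fit test on control genotype counts is assumed; an exact test is a common alternative. The Coriell-plate criterion is not modelled.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-study QC summary for one SNP (hypothetical structure)."""
    call_rate: float              # fraction of samples with a genotype call
    duplicate_concordance: float  # concordance among duplicated samples
    control_counts: tuple         # (hom. major, het., hom. minor) counts in controls


def hwe_chi2_pvalue(n_hom_major, n_het, n_hom_minor):
    """Assumed HWE test: 1-df chi-square goodness of fit on genotype counts."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)  # major-allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_major, n_het, n_hom_minor)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))


def sample_passes_qc(n_failed_snps, n_snps_typed, max_fail_fraction=0.20):
    """A sample failing genotyping for >20% of the SNPs in a phase is excluded."""
    return n_failed_snps / n_snps_typed <= max_fail_fraction


def snp_passes_qc(qc, min_call_rate=0.95, min_concordance=0.94, hwe_alpha=0.005):
    """A study's data for a SNP are excluded if any of the three rules fails."""
    return (qc.call_rate >= min_call_rate
            and qc.duplicate_concordance >= min_concordance
            and hwe_chi2_pvalue(*qc.control_counts) >= hwe_alpha)
```

The thresholds are keyword arguments, so a sensitivity analysis with stricter cut-offs would not require code changes.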

Statistical Analyses
ORs were estimated using logistic regression. To provide reliable estimates of effect sizes, study-specific ORs were derived only for studies that provided at least 100 cases and controls for the respective (sub)group of interest.
The primary analysis estimated ORs for the main effect of the SNP, adjusted for study. That is, S−1 indicator variables were entered into the logistic regression model, where S was the number of studies that provided data for the respective analysis. ORs adjusted for both study and age were essentially identical, so we do not present the age-adjusted analyses. Per-allele ORs were estimated under the assumption of a log-additive mode of inheritance: the SNP was coded as the number of minor alleles (0, 1 or 2). ORs by genotype were also calculated. For these, two indicator variables were entered into the model, one for the heterozygous genotype and one for the genotype homozygous for the minor allele. The primary p-values were derived from a Wald test assuming a log-additive mode of inheritance (one degree of freedom). Following Laird and Mosteller, heterogeneity of per-allele ORs between studies was assessed using the p-value derived from the Q statistic [4] and using I2. All tests were two-sided.
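Two of the quantities described above can be illustrated with a short Python sketch. The first is the two-sided Wald p-value obtained from a log OR and its standard error. The second is the Q statistic (Cochran's Q) and I2, which summarize between-study heterogeneity of study-specific per-allele log ORs. The sketch assumes inverse-variance weights and is not the code used for the published analyses, which were run in R. The p-value for Q, which comes from a chi-square distribution with k−1 degrees of freedom, is omitted to keep the example dependency-free.

```python
import math


def wald_p(log_or, se):
    """Two-sided Wald p-value for H0: log OR = 0 (normal approximation)."""
    z = log_or / se
    return math.erfc(abs(z) / math.sqrt(2))


def heterogeneity(log_ors, ses):
    """Inverse-variance pooled log OR, Cochran's Q, its df, and I^2 as a fraction."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2 = (Q - df) / Q, truncated at zero when Q does not exceed its df
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2
```

For a study reporting only an OR and a 95% CI, the log-scale standard error is commonly approximated as (ln(upper) − ln(lower)) / (2 × 1.96).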
Genetic main effects by ER status were estimated using case-control logistic regression, restricting the case sample to ER+ or ER− cases, respectively. To test for differences between the main effects of rs2046210 or rs12662670 in ER+ versus ER− cases, logistic regression analyses were conducted in cases only. In these case-only analyses, binary ER status was the outcome (dependent) variable. The respective SNP and the study indicator variables were the independent variables.
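The model used in these analyses is a logistic regression on SNP dosage (0, 1 or 2 minor alleles) plus S−1 study indicators, and it can be fitted by Newton-Raphson. The pure-Python sketch below is a hypothetical illustration, not the authors' code, and it rests on the following assumptions:
- Records are (study, dosage, outcome) tuples.
- In the case-only analysis, the outcome is 1 for ER− and 0 for ER+; in a case-control analysis it would be 1 for case and 0 for control.
- The alphabetically first study serves as the reference, which is an arbitrary choice.

The function returns exp(β) for the SNP term, i.e. the per-allele OR, together with its log-scale standard error. In R, the environment used in the paper, the same model would be fitted with glm(..., family = binomial).

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def design_row(dosage, study, studies):
    """Intercept, SNP dosage (log-additive coding), then S-1 study indicators."""
    return [1.0, float(dosage)] + [1.0 if study == s else 0.0 for s in studies[1:]]


def score_and_info(x, y, beta):
    """Score vector X'(y - mu) and Fisher information X'WX at beta."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, yi in zip(x, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * row[j]
            for k in range(p):
                info[j][k] += w * row[j] * row[k]
    return grad, info


def per_allele_or(records, max_iter=50, tol=1e-10):
    """Fit the model; return (per-allele OR, SE of the log OR) for the SNP term."""
    studies = sorted({s for s, _, _ in records})
    x = [design_row(d, s, studies) for s, d, _ in records]
    y = [float(o) for _, _, o in records]
    beta = [0.0] * len(x[0])
    for _ in range(max_iter):
        grad, info = score_and_info(x, y, beta)
        step = solve(info, grad)  # Newton-Raphson update: info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_info(x, y, beta)  # information at the final estimate
    unit = [0.0] * len(beta)
    unit[1] = 1.0
    var_snp = solve(info, unit)[1]  # diagonal element [I^-1]_11 for the SNP term
    return math.exp(beta[1]), math.sqrt(var_snp)
```

A saturated 2×2 table makes a useful sanity check. With a single study and dosages restricted to 0 and 1, the fitted OR equals the cross-product ratio and the standard error equals Woolf's formula.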
Variation in OR by age was evaluated by testing for an interaction between age group (<40, 40 to 49, 50 to 59, ≥60 years) and SNP, separately for each subgroup defined by ethnicity and ER status. To do so, the multiplicative SNP-by-age-group interaction term was entered into the model in addition to the main-effect terms for SNP, age group and study. We also tested whether the association with breast cancer risk could be explained by one SNP or whether both SNPs had independent effects on disease risk. For this, we fitted logistic regression models that included both SNPs and the study indicator variables as independent variables. Analyses were carried out separately for Europeans and Asians, and separately for ER− and ER+ cases versus controls. Additionally, haplotype analyses were performed using logistic regression models with the study indicator variables and the estimated two-marker haplotypes, coded according to a log-additive model. All haplotypes were included except the reference haplotype (i.e. the most frequent haplotype). Haplotypes were estimated using the expectation-maximization (EM) algorithm.
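With two biallelic SNPs, an individual's haplotype phase is ambiguous only for double heterozygotes, so the EM step is easy to write out. The sketch below is a simplified stand-alone illustration of two-locus haplotype-frequency estimation under Hardy-Weinberg equilibrium. It is not the haplo.stats implementation used in the paper. The genotype coding (minor-allele counts for each SNP) and the function names are assumptions.

```python
from collections import Counter
from itertools import product

# Haplotype = (allele at SNP 1, allele at SNP 2); 1 = minor allele, 0 = major allele
HAPLOTYPES = list(product((0, 1), repeat=2))


def alleles(g):
    """Minor-allele count (0, 1, 2) -> the two alleles carried at that SNP."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-10):
    """EM estimate of two-SNP haplotype frequencies from unphased genotype pairs."""
    known = Counter()  # haplotype counts from individuals with unambiguous phase
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase is either 11/00 (coupling) or 10/01 (repulsion)
        else:
            # With at most one heterozygous SNP, pairing alleles by position fixes the phase
            a, b = alleles(g1), alleles(g2)
            known[(a[0], b[0])] += 1
            known[(a[1], b[1])] += 1
    n_chrom = 2 * len(genotypes)
    freq = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes in coupling phase (11/00)
        coupling = freq[(1, 1)] * freq[(0, 0)]
        repulsion = freq[(1, 0)] * freq[(0, 1)]
        total = coupling + repulsion
        share = coupling / total if total > 0 else 0.5
        counts = Counter(known)
        counts[(1, 1)] += n_double_het * share
        counts[(0, 0)] += n_double_het * share
        counts[(1, 0)] += n_double_het * (1 - share)
        counts[(0, 1)] += n_double_het * (1 - share)
        # M-step: frequencies = expected haplotype counts / number of chromosomes
        new = {h: counts[h] / n_chrom for h in HAPLOTYPES}
        converged = max(abs(new[h] - freq[h]) for h in HAPLOTYPES) < tol
        freq = new
        if converged:
            break
    return freq
```

Real implementations also handle missing genotypes, multiple loci, and several random starting values. These are omitted here for brevity.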
All analyses were performed using R version 2.11.0 [5] and the R packages meta, rmeta and haplo.stats.

Results
Key characteristics of each participating study are shown in Table S1. In addition to the originally discovered SNP rs2046210, SNP rs12662670 was genotyped as a surrogate for the best tag from Turnbull et al. (rs3757318) [3], for which no working TaqMan™ assay could be designed. The genotype distributions by ethnicity and study for SNPs rs2046210 and rs12662670 in cases and controls are given in Tables S3a and S3b. The associations of each SNP are presented in Table 1 and as forest plots in Figure 1 […] Table 1) and Asians (p = 0.028 for rs2046210, p = 0.012 for rs12662670). In each ethnicity, the estimated ORs for each SNP after adjustment for the other SNP are of similar magnitude. In Europeans, OR (A/G) = 1.08 (95% CI 1.05–1.11) for rs2046210 and OR (G/T) = 1.07 (95% CI 1.02–1.12) for rs12662670. In Asians, OR (A/G) = 1.17 (95% CI 1.02–1.36) for rs2046210 and OR (G/T) = 1.21 (95% CI 1.04–1.40) for rs12662670. Similar effect estimates are obtained for the haplotypes carrying one minor allele (Table 2). The exception is the very rare haplotype carrying the major (G) allele of rs2046210 together with the minor (G) allele of rs12662670, whose estimates do not reach statistical significance. Of the four observed haplotypes, effects are strongest, and highly statistically significant, for the haplotype carrying both minor alleles: OR (AG) = 1.16 (95% CI 1.11–1.21) in Europeans and OR (AG) = 1.42 (95% CI 1.30–1.56) in Asians.

Table 3. Association of rs2046210 and rs12662670 with risk of ER−/ER+ breast cancer.
The OR estimates for in-situ cancer are similar to those for invasive cancer for both SNPs in Europeans, although, owing to small numbers, the association of rs12662670 with in-situ tumours does not reach statistical significance (Table S4). Each Asian study has fewer than 100 in-situ cases, so effect estimates are imprecise, but they do not differ from those for invasive cancer (data not shown). The associations of these two SNPs with tumour subtypes defined by ER status (ER+ and ER−) are presented in Table 3 and Figures 3, 4, 5, and 6. In Europeans, SNP rs2046210 is associated with a greater OR for ER− than for ER+ tumours: OR (ER−) = 1.20 (95% CI 1.15–1.25), p = 1.8×10−17 vs. OR (ER+) = 1.07 (95% CI 1.04–1.1), p = 1.3×10−7, pheterogeneity = 5.1×10−6. This difference remains significant after adjustment for rs12662670. A similar, although non-significant, difference is observed in European women for SNP rs12662670 (Table 3). In the Asian studies, however, there is no clear evidence of a differential association by tumour receptor status for either SNP (Table 3).
We further investigated whether the magnitudes of these SNP associations with tumour subtypes differed by age at diagnosis/interview (see Table S5). In the Asian studies, the data are too sparse to give meaningful results. In the combined ethnicities and in the European studies alone, the observed associations are stronger in younger women.
Fourteen of the European studies had been designed to oversample cases with a family history of breast cancer (see Table S1), which could have led to an overestimation of the ORs relative to those expected in a population-based case-control study. However, exclusion of these studies does not materially affect the estimated ORs for either SNP (see Table S6).

Discussion
In this large collaborative study of up to 61,689 cases and 58,822 controls, we demonstrate a highly statistically significant association between the A allele of rs2046210 and increased breast cancer risk in women of both Asian and European ancestry. This extends the association previously observed in Asian populations. Consistent with previous reports [1][2][3], the effect sizes are significantly greater in Asians than in Europeans. Our study also shows that the G allele of SNP rs12662670 is significantly associated with increased breast cancer risk in both ethnicities. SNP rs12662670 is used here as a surrogate for SNP rs3757318, the most strongly associated SNP at this locus in the European GWAS described by Turnbull et al. [3]. In addition, and in contrast to Stacey et al. [2], we find that the OR for rs12662670 is greater in Asians than in Europeans (Table 1, Figures 1 and 2). In contrast to previous reports, our study indicates that both SNPs (rs2046210 and rs12662670) may be independently associated with breast cancer risk: in models including both SNPs, each maintains a significant OR after adjustment for the other. The AT and GG haplotypes each carry only one minor allele, and the haplotype analyses give effect estimates for them very similar to those of the single-SNP analyses for the respective minor alleles. Furthermore, the effect of the AG haplotype, which carries both minor alleles, is clearly stronger than the effects of the AT and GG haplotypes. This supports the hypothesis that there may be two different causative variants: one on each haplotype carrying a single minor allele, and both on the haplotype carrying both minor alleles. Under this hypothesis, the stronger effect of the AG haplotype would reflect the joint effect of the two minor alleles it carries.
However, an alternative explanation cannot yet be completely excluded. A single causative variant may exist that is phylogenetically intermediate between the two SNPs, i.e. present on the AG haplotype and on some of the AT haplotypes. Such a variant could also explain the stronger effect of the AG haplotype compared with the AT and GG haplotypes.
We also find evidence that SNP rs2046210 is more strongly associated with ER− than ER+ disease in both European and Asian women. In the present study, this differential association with receptor status is statistically significant in the European studies and persists after adjustment for rs12662670. It is not quite significant in Asians, which may reflect a lack of power due to the comparatively small number of Asian individuals in our study (Table 3). However, this same SNP had previously been reported to be more strongly associated with ER− tumours in the original Chinese cases [1], as well as in a recent replication study in Chinese women [6]. In line with these reports, we carried out a meta-analysis (14,231 cases, 10,244 controls) of this SNP-disease association by ER status in Asians, incorporating published results as well as those presented here. It reveals a significant difference between the ORs for ER− and ER+ tumour risk [OR (A/G, ER−) = 1.37 (95% CI 1.30–1.44), p = 3.7×10−33 vs. OR (A/G, ER+) = 1.27 (95% CI 1.22–1.34), p = 2.2×10−24; pheterogeneity = 0.04]. A stronger association of SNP rs2046210 with ER− tumours is also consistent with a report from the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA) [7]. CIMBA found that the same allele is associated with an increased hazard ratio of breast cancer in BRCA1 mutation carriers, who predominantly develop ER− tumours. The CIMBA study also observed that this allele conferred increased hazard ratios among younger mutation carriers, and we observed similar trends towards greater SNP ORs in younger age groups (Table S5).
By contrast, the CIMBA consortium used SNP rs9397435 as its tag for rs12662670 (r2 = 0.61, 0.50 and 0.85 in HapMap2 CEU, JPT and CHB samples, respectively). It reported that rs9397435 shows evidence of modifying risk in both BRCA1 and BRCA2 mutation carriers, who mainly develop ER− and ER+ tumours, respectively [7]. Similarly, we find that SNP rs12662670 is associated with increased risks of both ER− and ER+ tumours.
Previous fine-scale mapping publications on this locus [2,8] have sought a single variant to explain the associations seen with all SNPs in the region. Stacey et al. [2] proposed SNP rs9397435 as a possible single causative variant, since it was more strongly associated than rs2046210 in women of European, African and Asian ancestry. We cannot comment on this variant, as it has not been genotyped in BCAC. However, our findings suggest that there could be two independent associations at this locus. The first, better tagged by SNP rs2046210, predisposes to ER− tumours. The second, better tagged by rs12662670, confers similar risks of both tumour types. Although physically close, SNPs rs2046210 and rs12662670 are not highly correlated with each other, particularly in Europeans (in BCAC, r2 = 0.12 in Europeans and r2 = 0.56 in Asians). All four possible combinations (haplotypes) of these two SNPs clearly exist.
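The r2 values quoted above measure pairwise linkage disequilibrium. Let p1 and p2 be the minor-allele frequencies of the two SNPs and p11 the frequency of the haplotype carrying both minor alleles. The disequilibrium coefficient is D = p11 − p1·p2, and r2 is D squared divided by p1(1 − p1)p2(1 − p2). The sketch below assumes the two-SNP haplotype frequencies are already available, for example estimated from genotypes. The function name and the dictionary layout are hypothetical.

```python
def ld_r2(freq):
    """Pairwise r^2 from two-SNP haplotype frequencies.

    freq maps (allele at SNP 1, allele at SNP 2) -> frequency, with 1 = minor
    allele and 0 = major allele; the four frequencies should sum to 1.
    """
    p1 = freq[(1, 0)] + freq[(1, 1)]  # minor-allele frequency at SNP 1
    p2 = freq[(0, 1)] + freq[(1, 1)]  # minor-allele frequency at SNP 2
    d = freq[(1, 1)] - p1 * p2        # disequilibrium coefficient D
    return d * d / (p1 * (1 - p1) * p2 * (1 - p2))
```

When only two haplotypes are present (here 00 and 11), r2 is 1. When haplotype frequencies equal the products of allele frequencies, r2 is 0. Intermediate values such as those reported for rs2046210 and rs12662670 arise when all four haplotypes occur.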
Examination of linkage disequilibrium plots of the regions surrounding these two SNPs in Europeans (Figure 7) reveals little, if any, physical overlap between the SNPs highly correlated (r2 > 0.9) with rs2046210 and those highly correlated with rs12662670. A single causal variant directly responsible for the associations seen with both SNPs would need to be correlated with both SNPs. Such a variant has not yet been identified (e.g. by the 1000 Genomes Project) and would presumably be relatively rare. An alternative, and we think more plausible, explanation for the pattern of associations is the existence of two independent causative variants, one correlated with rs2046210 and another correlated with rs12662670. If this is the case, the former variant may be more strongly associated with ER− breast cancer than the latter.

Why both SNPs confer higher relative risks in Asians than in Europeans is unclear. Within the BCAC studies, ER− tumours are relatively more prevalent among Asian (36%) than among European cases (23%). This is not sufficient to explain the higher ORs in Asians, however, since the differences persist after stratification by ER status. The higher relative risks may instead be due to differential patterns of linkage disequilibrium, if the as-yet-unidentified causal variants are not strongly correlated with the SNPs identified to date. These questions may be resolved by comprehensive re-sequencing of this locus and by fine-scale mapping to identify the causal variant (or variants) responsible for the observed breast cancer risks. One aim of the iCOGS Project [9], which is currently underway, is to address these questions. However, the observed differences between Asians and Europeans may also reflect interactions with lifestyle risk factors or with other, unlinked genetic loci. Another possible explanation is that the estimated SNP effects in Asians are inflated by the phenomenon known as the "winner's curse". The pool of Asian studies has suboptimal power because of the small number of Asian individuals. Together with the common requirement that a published association pass a predefined p-value threshold, this may have resulted in biased SNP effect estimates [10,11].
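A small simulation can illustrate the winner's-curse argument. The parameters below are hypothetical and are not estimates from this study:
- the true per-allele log OR is 0.1;
- each study estimates it with a standard error of 0.08, i.e. an underpowered design;
- only estimates with z > 1.96 in the risk-increasing direction are "reported".

Averaging only the reported estimates overstates the true effect, whereas averaging all estimates does not.

```python
import random
import statistics


def winners_curse(true_log_or=0.1, se=0.08, z_threshold=1.96, n_studies=20000, seed=1):
    """Compare the mean of all simulated log-OR estimates with the mean of 'significant' ones.

    All parameter values are hypothetical. Each estimate is drawn from a normal
    distribution centred on the true log OR with the given standard error.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_log_or, se) for _ in range(n_studies)]
    # Selection step: keep only estimates passing the one-sided significance threshold
    reported = [b for b in estimates if b / se > z_threshold]
    return {
        "mean_all": statistics.mean(estimates),
        "mean_reported": statistics.mean(reported),
        "fraction_reported": len(reported) / n_studies,
    }
```

Analytically, the reported estimates follow a normal distribution truncated at 1.96 × se. Their expected mean is μ + σφ(a)/(1 − Φ(a)), with a = (1.96σ − μ)/σ, which exceeds μ whenever power is well below 1.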
Although there are eleven genes within 1 Mb of this locus, attention has focused on the ESR1 gene, whose transcription start site is located approximately 180 kb downstream of SNP rs2046210. ESR1 encodes ERα and has long been implicated in breast carcinogenesis. However, the proximity of this SNP to ESR1 may be a false lead. Both SNPs (rs2046210 and rs12662670) lie in the flanking region of C6orf97, and there are numerous other genes in close physical proximity (see Figure 7). Nevertheless, it is notable that SNPs mapping to this region have also been identified in GWAS for bone mineral density, another phenotype in which estradiol metabolism is clearly implicated [12,13]. Furthermore, a recent paper [14] demonstrates that a number of genes at this locus, including ESR1 and C6orf97, are co-regulated, although the functions of most of these co-regulated genes have not yet been elucidated. The SNP associations presented here may provide a basis for exploring in more detail the biological role of this locus in estrogen signalling and cancer development.
Taken together, our findings suggest that two different causative variants may be present at the 6q25.1 locus. They also indicate that fine-scale mapping efforts aimed at finding a single variant that accounts for the associations with both marker SNPs may not be successful.

Supporting Information
Table S1 Characteristics of 45 case-control studies within the Breast Cancer Association Consortium (BCAC). (DOC)
Table S2 Characteristics of the study populations genotyped for rs2046210 (a) and rs12662670 (b). (DOC)

Figure 7. Linkage disequilibrium blocks in the ESR1 region. Five SNPs tagged (at r2 > 0.9) by rs12662670 and three tagged by rs2046210 are marked by arrows (dark and light grey, respectively). rs12662670 and rs2046210 are marked by stars, and rs3757318 and rs9397435 by points. Blocks were generated using data from the 1000 Genomes Project and HapMap and include all single nucleotide polymorphisms with a minor allele frequency > 0.05. The directions of translation of ESR1 and C6orf97 are marked, and other genes in the locus are listed. doi: 10.1371/journal.pone.0042380.g007