Identification of a Breast Cancer Susceptibility Locus at 4q31.22 Using a Genome-Wide Association Study Paradigm

More than 40 single nucleotide polymorphisms (SNPs) for breast cancer susceptibility were identified by genome-wide association studies (GWASs). However, additional SNPs likely contribute to breast cancer susceptibility and overall genetic risk, prompting this investigation for additional variants. Six putative breast cancer susceptibility SNPs identified in a two-stage GWAS that we reported earlier were replicated in a follow-up stage 3 study using an independent set of breast cancer cases and controls from Canada, with an overall cumulative sample size of 7,219 subjects across all three stages. The study design also encompassed the 11 variants from GWASs previously reported by various consortia between the years 2007–2009 to (i) enable comparisons of effect sizes, and (ii) identify putative prognostic variants across studies. All SNP associations reported with breast cancer were also adjusted for body mass index (BMI). We report a strong association with 4q31.22-rs1429142 (combined per allele odds ratio and 95% confidence interval = 1.28 [1.17–1.41] and P combined = 1.5×10−7), when adjusted for BMI. Ten of the 11 breast cancer susceptibility loci reported by consortia also showed associations in our predominantly Caucasian study population, and the associations were independent of BMI; four FGFR2 SNPs and TNRC9-rs3803662 were among the most notable associations. Since the original report by Garcia-Closas et al. 2008, this is the second study to confirm the association of 8q24.21-rs13281615 with breast cancer outcomes.


Introduction
Breast cancer is the most common cancer in women in the developed world, with 22,700 new cases and 5,100 deaths anticipated in Canada for 2012 [1]. While environmental and lifestyle risk factors contribute to most of the variation in breast cancer risk, twin studies have shown substantial contribution of inherited genetic risk factors to disease susceptibility [2,3]. Linkage and family-based studies have identified high and moderate penetrance mutations in genes such as BRCA1 [4], BRCA2 [5], PTEN [6], ATM [7], TP53 [8], BRIP1 [9], PALB2 [10] and CHEK2 [11] contributing to hereditary breast cancer; however, these mutations occur rarely in the general population. Further, linkage studies failed to identify additional genes/mutations associated with high or moderate risk of breast cancer. Therefore, it has been hypothesized that most of the genetic risk of breast cancer, for both familial and sporadic cases in the general population, may involve a combination of multiple low penetrance genes/loci, each contributing to an overall genetic risk of breast cancer [12].
Over the past five years, several genome-wide association studies (GWASs) reported breast cancer susceptibility variants (i.e., single nucleotide polymorphisms, SNPs) at multiple loci [13][14][15][16][17][18][19][20][21][22]. A large-scale candidate gene study also identified an additional locus (caspase 8 coding SNP, rs1045485) associated with breast cancer risk [23]. The low penetrance common SNPs identified to date explain less than 10% of the genetic risk of breast cancer [22]. Taken together, pathogenic germline mutations and low penetrance variants identified thus far only account for a small fraction of the genetic risk of breast cancer, suggesting that additional variants remain to be identified [24].
Recently, we conducted a two-stage GWAS using sporadic breast cancer cases and apparently healthy controls and identified six SNPs (located on chromosomes 4, 5, 16 and 19) that appeared to be associated with breast cancer susceptibility [21]. In a combined sample of 1,455 breast cancer cases and 1,536 healthy controls from two independent stages, these SNPs were associated with a modest increase in breast cancer risk (observed odds ratio (OR) range: 1.22-1.45).
It is an internationally accepted practice to replicate GWAS findings in multiple independent studies with cases and controls of both similar and diverse ethnic backgrounds to assess the robustness and generalizability of the identified associations, respectively. Therefore, in the current study, we further investigated the six putative breast cancer susceptibility SNPs that we have reported previously [21] by conducting an independent replication study (stage 3), using breast cancer cases and controls. The study subjects were predominantly of Caucasian origins drawn from the same geographical region in Canada as in our previous study. Further, we evaluated the GWAS variants for breast cancer susceptibility reported by various consortia. These include the Breast Cancer Association Consortium [13,18], the Effectiveness of Additional Reductions in Cholesterol and Homocysteine Collaborative Group [13], the Nurses' Health Study [14], the National Cancer Institute Cancer Genetic Markers of Susceptibility Project [14] and the National Heart, Lung, and Blood Institute Framingham Heart Study [15]. We compared the consortia reported variants with our study population to explore the extent of conformity to previous findings within the Caucasian populations, and for the strengths of associations for the sample size utilized in this study. Since obesity is a well-established risk factor for post-menopausal breast cancer [25] and is a heritable trait [26], we also adjusted the identified variant-breast cancer associations for body mass index (BMI) to examine whether the variants are associated with breast cancer risk, through BMI or through different pathways. We assessed variability in disease susceptibility by clinicopathological characteristics such as menopausal status, family history of breast cancer, luminal A status of tumors, tumor grade and tumor stage. 
Finally, we explored the associations of the six putative susceptibility SNPs identified in our earlier study and the previously published consortia SNPs with breast cancer outcomes.

Study participants
All breast cancer cases (n = 2,750) used in this study had a confirmed diagnosis of breast cancer in the province of Alberta, Canada, and participated in provincial tumor bank projects in operation since 2001 (the PolyomX Project, 2001-2005, subsequently merged with the Canadian Breast Cancer Foundation (CBCF) Tumor Bank, 2005 to present; http://www.abtumorbank.com/) [21,27]. The tumor bank accrues tumor tissues and blood samples from patients with confirmed diagnoses of breast and other cancer types, through hospitals (publicly funded comprehensive cancer care centres managed by the Alberta Health Services (AHS)) in Edmonton and Calgary in the province of Alberta, Canada. The tumor bank database contains well-annotated clinicopathological information for the banked specimens. The CBCF Tumor Bank currently holds blood from more than 8,000 individuals with various cancer types, as a source of germline DNA for genotyping, in addition to tumor tissue specimens. Apparently healthy (i.e., confirmed not to have had a diagnosis of any cancer) controls (n = 4,472) were obtained from the Tomorrow Project (http://in4tomorrow.ca) and were frequency matched to cases based on ten-year age group. The Tomorrow Project is a large prospective cohort study that started in 2000 and successfully recruited approximately 42,000 Albertans (64% women) by 2012 using a combination of random digit dialling (RDD) and random mail-outs, augmented by email campaigns and social media. Inclusion criteria for initial recruitment to the Tomorrow Project were as follows: (i) aged 35-69 years; (ii) no personal history of cancer, other than non-melanoma skin cancer; (iii) able to complete written questionnaires in English and (iv) currently living in Alberta. Upon enrolment to the Tomorrow Project, participants completed a health and lifestyle questionnaire (including family history of major diseases).
The participants gave written consent to be contacted in the future to provide a blood sample for banking to support research in cancer or chronic diseases, to receive invitations to provide updated health and lifestyle information or additional samples in the future, and to linkage with administrative health data to understand patterns of health services utilization and disease occurrence [28]. Absence of prior history of cancer upon study enrolment was confirmed by performing linkage with the Alberta Cancer Registry (http://www.albertahealthservices.ca/poph/hi-poph-surv-canceralta-cancer-registry-2009.pdf). As of late 2012, approximately 19,000 Tomorrow Project participants from across Alberta had given a 50 ml non-fasting venous blood sample for banking in multiple aliquots of buffy coat, serum, plasma and red blood cells. Breast cancer cases in this study were of predominantly Caucasian ancestry, and resided in the Edmonton and Calgary regions (sites of tertiary cancer centres in Alberta). The population in these regions accounts for two thirds of the total population of the province of Alberta. Thus, in addition to age matching, the controls were selected from the Tomorrow Project using the same ethnicity and geographic location criteria. Although socioeconomic status (SES) plays a role in health outcomes, differences in SES between the cases and controls used in this study, and the underlying assumptions, need to be validated independently. However, given the universal access to health care in Canada, the influence of SES was considered minimal, if any. A brief description of demographic characteristics of breast cancer cases and controls is presented in Table 1. Written informed consent to use banked samples to support research was obtained from all the study participants, and the study was approved by the Alberta Cancer Research Ethics Committee, Alberta, Canada.

SNPs genotyping and quality control
Germline DNA was extracted from peripheral blood samples of both cases and controls using commercially available Qiagen (Mississauga, ON, Canada) DNA isolation kits. All genotyping assays were performed on the Sequenom iPLEX Gold platform (San Diego, CA, USA) using services from the McGill University and Genome Quebec Innovation Center, Montreal, Canada. Within-stage (stage 3 for the six SNPs from our previous GWAS and a single stage for the 11 consortia SNPs) genotype concordance was assessed with 66 duplicate samples (8 cases and 58 controls). Cross-platform genotype concordance (Affymetrix vs. Sequenom, i.e., stage 1 vs. stage 3 for the six SNPs) was assessed with 17 duplicate samples (5 cases and 12 controls). Between-stage (stage 2 vs. stage 3 for the six SNPs) genotype concordance was assessed with 632 cases and 452 controls. Duplicate samples used for assessing genotype concordances among various stages were randomly selected. A stringent SNP call rate criterion of > 99% was applied to minimize false positive associations due to missing genotype counts, and a Hardy-Weinberg equilibrium (HWE) criterion of P > 10−6 in control subjects was adopted.
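The two quality control criteria described above (call rate > 99% and HWE P > 10−6 in controls) can be illustrated with a minimal sketch. The function names are hypothetical, and the one-degree-of-freedom chi-square test for HWE is an assumption made for illustration; the text does not state which HWE test was used.

```python
import math

def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call (None = missing)."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)

def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-d.f. chi-square HWE test from genotype counts (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(genotypes, control_counts, min_call_rate=0.99, min_hwe_p=1e-6):
    """Apply both thresholds; control_counts = (n_aa, n_ab, n_bb) in controls."""
    return (call_rate(genotypes) > min_call_rate
            and hwe_chi2_p(*control_counts) > min_hwe_p)
```

A SNP would be retained only if both conditions hold; the thresholds are exposed as parameters so that other cut-offs can be tried.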

Association analyses and statistical considerations
Overall analyses. Allelic associations of SNPs with breast cancer susceptibility were evaluated with correlation/trend tests with one degree of freedom (d.f.). The strengths of allelic and genotypic associations were estimated using unconditional logistic regressions and reported as ORs and 95% confidence intervals (CIs). To increase sample size and hence the statistical power to better capture SNP-breast cancer associations, cases and controls from all independent stages were pooled together and combined analyses were conducted. BMI was included as a covariate in the logistic models to calculate adjusted ORs, 95% CIs and P values in Stage 3 and in combined stages.
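As an illustration of the one-d.f. trend test and the per allele OR described above, the following sketch computes a Cochran-Armitage trend statistic with additive weights (0, 1, 2) and an allelic OR with a Woolf-type 95% CI from genotype counts. This is a simplified stand-in: the study estimated ORs with unconditional logistic regression, and the function names and example counts used here are hypothetical.

```python
import math

def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 d.f.).

    cases/controls: genotype counts ordered as (wild type homozygotes,
    heterozygotes, variant homozygotes). Returns (chi2, p).
    """
    r_total, s_total = sum(cases), sum(controls)
    n = [r + s for r, s in zip(cases, controls)]
    n_total = r_total + s_total
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * k for w, k in zip(weights, n))
    sum_w2n = sum(w * w * k for w, k in zip(weights, n))
    numerator = n_total * (n_total * sum_wr - r_total * sum_wn) ** 2
    denominator = r_total * s_total * (n_total * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def allelic_or(cases, controls, z=1.96):
    """Allelic OR and Woolf 95% CI from genotype counts (unadjusted)."""
    a = cases[1] + 2 * cases[2]        # variant alleles in cases
    b = 2 * cases[0] + cases[1]        # reference alleles in cases
    c = controls[1] + 2 * controls[2]  # variant alleles in controls
    d = 2 * controls[0] + controls[1]  # reference alleles in controls
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
chi2, p = trend_test((300, 500, 200), (400, 450, 150))
odds_ratio, lower, upper = allelic_or((300, 500, 200), (400, 450, 150))
```

Covariate adjustment (for example, for BMI) is not possible with this count-based form and requires a regression model, as used in the study.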
Subgroup analyses. To evaluate variations in SNP-breast cancer associations by clinicopathological characteristics (to address potential heterogeneity in the observed overall associations), we conducted subgroup analyses (unconditional logistic regressions adjusted for BMI) within the combined breast cancer cases based on menopausal status, luminal A status, family history of breast cancer (captured under a single category representing cases with first, second or third degree relatives), tumor stage and grade. A common set of healthy controls was used to test the SNP-breast cancer associations in these subgroup analyses. Breast tumors that were either estrogen receptor (ER) or progesterone receptor (PR) positive and human epidermal growth factor receptor 2 (HER2) negative were classified as luminal A, and the remainder were classified as non-luminal A. Cases with unknown ER, PR or HER2 status were excluded from the luminal A subgroup analyses. Breast tumors with operable tumor stages (I-IIIA) were classified as one subgroup while tumors with non-operable tumor stages (IIIB, IIIC) were classified as the other subgroup. Heterogeneity in ORs between the subgroups was assessed using multinomial logistic regressions ('mlogit') and linear combination of estimators ('lincom') implemented in Stata 12.0 (www.stata.com). Statistical significance of this heterogeneity test was reported as P for heterogeneity (P het).
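The luminal A classification rule stated above can be written as a short sketch. The function name and the encoding of receptor status (True for positive, False for negative, None for unknown) are assumptions made for illustration.

```python
def classify_luminal_a(er, pr, her2):
    """Classify a tumor by receptor status.

    er, pr, her2: True (positive), False (negative) or None (unknown).
    Returns "luminal A", "non-luminal A", or None when any status is
    unknown (such cases are excluded from the subgroup analysis).
    """
    if er is None or pr is None or her2 is None:
        return None
    if (er or pr) and not her2:
        return "luminal A"
    return "non-luminal A"
```

For example, an ER positive, PR negative, HER2 negative tumor is classified as luminal A, whereas any HER2 positive tumor falls in the non-luminal A group.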
Associations of SNPs with breast cancer outcomes. We also evaluated the potential prognostic value of the SNPs for breast cancer outcomes, namely recurrence-free survival (RFS) and overall survival (OS), by fitting Cox proportional hazards models, adjusted for BMI, using the "survival" package [29] implemented in R 2.15.1 [30]. The associations were reported as hazard ratios (HRs), 95% CIs and adjusted P values. Genotypes were recoded to 0 (wild type homozygotes), 1 (heterozygotes) and 2 (variant homozygotes) before fitting the Cox models.
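The additive genotype coding used before model fitting can be sketched as follows. The function name and the two-character genotype strings are hypothetical; only the 0/1/2 coding itself comes from the text.

```python
def recode_genotype(genotype, variant_allele):
    """Return the number of variant-allele copies (0, 1 or 2).

    genotype: two-character string such as "AG"; None or malformed
    input is treated as a missing call and returns None.
    """
    if genotype is None or len(genotype) != 2:
        return None
    return sum(1 for allele in genotype if allele == variant_allele)

# Hypothetical genotypes with G as the variant allele
coded = [recode_genotype(g, "G") for g in ("AA", "AG", "GG", None)]
```

Under this coding, the fitted coefficient corresponds to a per allele effect, so the exponentiated coefficient is a per allele HR.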
All statistical tests were two-sided. We assumed an additive model of genetic inheritance to calculate power, as described earlier [21]. As such, our study had adequate power (> 80%) to detect associations that were larger than a genotypic relative risk of ≥ 1.2. Whenever multiple SNPs were tested, correction for multiple hypothesis testing was performed using a threshold of P = 0.05/number of tests. We considered all SNPs from our stage 1 GWAS (782,838 SNPs) to calculate genome-wide significance (P < 6.4×10−8) for the six replicated SNPs. Correlation/trend tests were carried out using SNP and Variation Suite v7.6.11 (Golden Helix, Inc., Bozeman, MT, www.goldenhelix.com) [31]. The observed and adjusted allelic and genotypic ORs and 95% CIs and adjusted P values were estimated using logistic models in PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) [32]. All the general statistical analyses were conducted using R 2.15.1.
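The multiple-testing thresholds quoted in this paper follow directly from dividing 0.05 by the number of tests. A minimal sketch (the function name is hypothetical) reproduces the arithmetic:

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

genome_wide = bonferroni_threshold(782838)  # stage 1 GWAS SNPs, about 6.4e-8
consortia = bonferroni_threshold(11)        # 11 consortia SNPs, about 0.0045
putative = bonferroni_threshold(6)          # six putative SNPs, about 0.0083
```

The values 0.004 and 0.008 reported in the Results are these thresholds written to three decimal places.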

Results
Genotyping assays of the 17 SNPs considered in this study were successful with a SNP call rate of > 99%. Average within-stage genotype concordance was 100% while cross-platform genotype concordance was > 99%; between-stage average genotype concordance was also 100%. We reasoned that this negligible percentage (< 1%) of discordance was unlikely to influence SNP-breast cancer associations (Tables 2 and S1) and confidence in the reported associations.

Association of previously identified (consortia SNPs) breast cancer susceptibility loci
Except for COL1A1-rs2075555, we successfully replicated the associations of ten consortia-reported breast cancer susceptibility loci in our study population at P < 0.05 (Table S1). These SNPs remained statistically significant after correction for multiple hypothesis testing (P < 0.05/11 = 0.004). Four FGFR2 SNPs and TNRC9-rs3803662 showed the strongest associations, attaining the commonly adopted genome-wide significance level (P < 5.0×10−8), with ORs similar to the original study findings [13,14,19]. After adjusting for BMI, these five SNPs remained statistically significant (adjusted P < 4.2×10−8) (Table S1). The adjusted per allele ORs and 95% CIs were also similar to the observed ORs and 95% CIs (Table S1), indicating that these SNP-breast cancer associations are independent of the pathway linking BMI and risk of breast cancer.

Replication of the six putative SNPs in stage 3 analyses
Of the six putative breast cancer susceptibility SNPs that we reported earlier, 4q31.22-rs1429142 showed consistent reproducibility across all three stages. The variant at 5p15.2-rs1092913 also retained statistical significance for increased breast cancer risk in the current independent replication stage 3 study at P < 0.05 (Table 2), and remained statistically significant after correction for multiple hypothesis testing (P < 0.05/6 = 0.008). The magnitude and direction of per allele ORs and 95% CIs of both SNPs were consistent with our previous findings [21], while slightly elevated ORs and 95% CIs were observed for heterozygotes and variant homozygotes (Table 2), conforming to the additive model of genetic inheritance. After adjustment for BMI, both 4q31.22-rs1429142 and 5p15.2-rs1092913 remained statistically significant at adjusted P < 0.05, while both adjusted per allele and genotypic ORs and 95% CIs of 4q31.22-rs1429142 were larger than the observed ORs. The remaining four SNPs did not show statistical significance at P < 0.05 in this stage 3 study.

Combined analyses of the six putative SNPs (stages 1+2+3)
In the combined analyses (stages 1+2+3), five of the six SNPs were significantly associated with increased breast cancer risk at P < 0.05, the exception being 16q23.2-rs1981867, which showed marginal statistical significance (P = 0.06) (

Subgroup analyses
The previously reported GWAS variants (consortia SNPs), except COL1A1-rs2075555, remained statistically significant in subgroups with both pre and postmenopausal women, luminal A cases, cases with or without family history of breast cancer, low tumor grade and operable tumor stage at adjusted P < 0.05 (Table S2). The adjusted per allele ORs, 95% CIs and P values were also comparable to the overall analyses, with similar magnitudes and directions of risk (Tables S1 and S2). Of these, the four FGFR2 SNPs retained genome-wide significance level in subgroups with luminal A cases, cases with family history of breast cancer, low tumor grade and operable tumor stage, while 8q24.21-rs13281615 and TNRC9-rs3803662 showed genome-wide significance level associations only in cases with low tumor grade and operable tumor stage, respectively. SLC4A7-rs4973768, 5q11.2-rs889312, 8q24.21-rs13281615 and TNRC9-rs3803662 showed marginal associations in the subgroup with non-luminal A cases. Similarly, 5q11.2-rs889312 and TNRC9-rs3803662 showed significant associations in cases with high tumor grade. None of the SNPs showed significant associations in cases with non-operable tumor stage, with the possible exception of 5q11.2-rs889312, which showed a marginally statistically significant association (adjusted P = 0.04).
The associations of the six GWAS-identified putative SNPs from our populations with breast cancer were consistent across the subgroups, without any substantial modifications in SNP-breast cancer associations observed in overall analyses (Table 2). 4q31.22-rs1429142 and 5p15.2-rs1092913 remained significantly associated in subgroups with both pre and postmenopausal women, luminal and non-luminal A cases and cases without family history of breast cancer, high and low tumor grades and operable tumor stage at adjusted P < 0.05 (Tables 3 and 4). Moreover, 4q31.22-rs1429142 attained genome-wide significance level in the subgroup with premenopausal women (adjusted P = 6.2×10−10), while a strong statistical association was also observed in cases with operable tumor stages (adjusted P = 1.6×10−7). The ZNF577 SNPs (rs10411161, rs3848562 and rs11878583) also showed statistically significant associations in subgroups with postmenopausal women, luminal A cases, cases without family history and operable tumor stages (Tables 3 and 4).

Association of SNPs with breast cancer outcomes
Of the 17 SNPs tested for their associations with breast cancer outcomes, 8q24.21-rs13281615 was significantly associated with a reduced risk of both recurrence (RFS: adjusted P = 0.001, adjusted per allele HR and 95% CI = 0.77 [0.65-0.90]) and death (OS: adjusted P = 0.003, adjusted per allele HR and 95% CI = 0.76 [0.64-0.91]) (Table S3). The remaining 16 SNPs did not show statistically significant associations with breast cancer outcomes at adjusted P < 0.05.

Discussion
In this independent replication study in Canadian women involving 2,750 breast cancer cases and 4,472 healthy controls, we successfully reproduced the associations of ten previously GWAS-identified breast cancer susceptibility loci, indicating the robustness of the associations of the consortia-identified SNPs with breast cancer. In addition, two of the six putative breast cancer susceptibility SNPs (4q31.22-rs1429142 and 5p15.2-rs1092913) from our previous two-stage GWAS also showed robust associations in an independent set of breast cancer cases and healthy controls (stage 3). After adjusting for BMI, 4q31.22-rs1429142 attained near genome-wide significance level (adjusted P = 1.5×10−7) (Table 2). A major strength of this study is the consideration of BMI, which allowed confirmation that the genetic contributions to breast cancer are independent of one of the major risk factors for breast cancer. An additional strength was our evaluation of the SNP-breast cancer associations as potential prognostic factors for RFS and OS after diagnosis and their relationships with breast cancer clinical and molecular subtypes. The most notable associations among the ten previously GWAS-identified breast cancer susceptibility loci replicated in this study were with four FGFR2 SNPs (rs2981579, rs1219648, rs2420946 and rs2981582) and TNRC9-rs3803662 (observed P < 7.0×10−10 and adjusted P < 4.2×10−8) (Table S1). The magnitude and direction of the associations were similar to those reported in the original GWASs (observed per allele OR range: 1.17-1.26) [13][14][15][16]18,19], suggesting the robustness of these associations with breast cancer susceptibility.
Table 4. Subgroup analyses of the six putative breast cancer susceptibility SNPs.
Further, results from the subgroup analyses were consistent with the previous reports [33][34][35], supporting the hypothesis that FGFR2 loci (rs1219648, rs2420946 and rs2981582) are associated with increased risk of breast cancer, especially in familial breast cancer cases (P het < 0.02), and associated with the better-prognosis luminal A type or estrogen receptor positive breast cancers (P het < 0.001) (Table S2) [33][34][35].
Of the six putative breast cancer susceptibility SNPs reported in our previous two-stage GWAS, our independent stage 3 analyses successfully replicated the associations of 4q31.22-rs1429142 and 5p15.2-rs1092913 with increased risk of breast cancer. In the combined analyses, five of the six reported associations from our previous GWAS retained statistical significance, the exception being 16q23.2-rs1981867. These five SNPs should be further tested independently in additional cases and controls to assess their role in breast cancer etiology. When adjusted for BMI, we observed a near genome-wide significant association for 4q31.22-rs1429142 (adjusted P = 1.7×10−7) while 5p15.2-rs1092913 remained statistically significant (adjusted P = 1.9×10−4). For 4q31.22-rs1429142, there was a substantial increase from the observed ORs (per allele = 1.22, OR heterozygote = 1.26 and OR homozygote = 1.36) to adjusted ORs (per allele = 1.28, OR heterozygote = 1.32 and OR homozygote = 1.52). These results indicate that the 4q31.22-rs1429142-breast cancer association may be linked to the BMI pathway of breast cancer risk elevation. This observation is in contrast to the ten GWAS-identified consortia-reported SNP-breast cancer associations, and hence requires replication in an independent set of breast cancer cases and controls, probably through collaborative efforts involving large international consortia. Both 4q31.22-rs1429142 and 5p15.2-rs1092913 showed statistically significant associations with breast cancer in subgroups with pre and postmenopausal women, cases with luminal and non-luminal A tumors, with and without family history of breast cancer, low and high tumor grade and operable tumor stage at adjusted P < 0.05 (Tables 3 and 4). However, the association of 4q31.22-rs1429142 was stronger in pre than postmenopausal women (P het = 0.002), suggesting that the 4q31.22-rs1429142-breast cancer association may vary by menopausal status.
Except for 8q24.21-rs13281615, none of the breast cancer susceptibility SNPs, including 4q31.22-rs1429142, showed significant association with breast cancer outcomes. 8q24.21-rs13281615 was significantly associated with better RFS and OS (adjusted P < 4.5×10−3) (Table S3). Similar results for 8q24.21-rs13281615 were also observed in another study involving 13,527 invasive breast cancer cases [33]. To our knowledge, this is the second study to identify the potential prognostic value of 8q24.21-rs13281615 and hence this locus merits further investigation. These results provide further evidence supporting the hypothesis that the SNPs with prognostic value are yet to be identified using whole genome approaches and that they are distinct from the SNPs associated with breast cancer susceptibility (etiology).
4q31.22-rs1429142 is located in a gene desert, with the closest gene, endothelin receptor type A (EDNRA) (Figure 1), located ~112 kb downstream of the SNP. The protein encoded by EDNRA is a cell surface receptor involved in several fundamental cellular processes through its interaction with endothelins (cytokines widely expressed in various tissues) [36]. SNPs in or near the EDNRA gene have been associated with intracranial aneurysm risk [37], hypertension [38] and migraines [39]. Because this SNP is ~112 kb away from the EDNRA gene locus, we queried the SCAN database [40], which uses HapMap human lymphoblastoid cell lines to identify putative expression quantitative trait loci. We found that 4q31.22-rs1429142 is associated with differential expression of five other genes (quantitative transmission disequilibrium test P < 0.0001, implemented in the SCAN database) involved in at least one type of cancer, i.e., kinesin family member 3B (KIF3B) [41], paxillin (PXN) [42], general transcription factor IIA, 12 kDa (GTF2A2) [43], PTPRF interacting protein, binding protein (liprin beta 2) (PPFIBP2) [44] and tumor protein p63 regulated 1-like (TPRG1L) [45]. However, the allele of 4q31.22-rs1429142 responsible for these expression differences is unknown, and future fine mapping studies to identify the causal variant and to investigate its allele-specific effects are warranted.
5p15.2-rs1092913 is also located in a gene desert. The closest gene is rhophilin associated tail protein 1-like (ROPN1L), located ~2.5 kb upstream of the polymorphism. The ROPN1L gene encodes a sperm protein, which interacts with A-kinase anchoring protein.
Recently, an independent study (n = 4,325 cases and controls) also showed a significant association of 5p15.2-rs1092913 with breast cancer risk in estrogen receptor positive breast cancer cases of Korean ethnicity, suggesting the potential generalizability of this SNP-breast cancer association to the Korean population [46]. Furthermore, a meta-analysis of two GWASs also found multiple SNPs within the ROPN1L locus at 5p15.2 associated with the phenotype of BMI, suggesting that this region is important for both breast cancer susceptibility and BMI [47].
In summary, our study not only provided supportive evidence for the robustness of the breast cancer susceptibility SNPs previously identified by consortia, but also identified a new locus at 4q31.22 (rs1429142) contributing to breast cancer susceptibility, lending credence to continued research efforts in search of common variants for breast cancer.

Supporting Information
Table S1 Associations of the previously identified (consortia SNPs) breast cancer susceptibility loci in the current study. (XLSX)