A Genome-Wide “Pleiotropy Scan” Does Not Identify New Susceptibility Loci for Estrogen Receptor Negative Breast Cancer

Approximately 15–30% of all breast cancer tumors are estrogen receptor negative (ER−). Compared with ER-positive (ER+) disease, they have an earlier age at onset and a worse prognosis. Despite the vast number of risk variants identified for numerous cancer types, only seven loci have been unambiguously identified for ER-negative breast cancer. With the aim of identifying new susceptibility SNPs for this disease, we performed a pleiotropic genome-wide association study (GWAS). We selected 3079 SNPs associated with a human complex trait or disease at genome-wide significance level (P<5×10−8) to perform a secondary analysis of an ER-negative GWAS from the National Cancer Institute's Breast and Prostate Cancer Cohort Consortium (BPC3), including 1998 cases and 2305 controls from prospective studies. We then tested the top ten associations (i.e. those with the lowest P-values) using three additional populations with a total sample size of 3509 ER+ cases, 2543 ER− cases and 7031 healthy controls. None of the 3079 selected variants in the BPC3 ER− GWAS was significant at the adjusted threshold. 186 variants were associated with ER− breast cancer risk at a conventional threshold of P<0.05, with P-values ranging from 0.049 to 2.3×10−4. None of the variants reached statistical significance in the replication phase. In conclusion, this study did not identify any novel susceptibility loci for ER− breast cancer using a “pleiotropic approach”.


Introduction
Estrogen receptor-negative (ER−) breast cancer (BC) comprises 15 to 30% of all breast tumours (depending on the population) and has an earlier age at onset and a worse prognosis compared with estrogen receptor-positive (ER+) disease. It is more common among women of African-American origin and it is also the breast cancer type associated with BRCA1 mutations [1,2]. Genome-wide association studies (GWAS) have identified thousands of common human genetic variants associated with risk of hundreds of quantitative traits and human diseases [3,4]. Only seven susceptibility loci have been specifically identified for ER− BC [5,6,7]. In a GWAS, hundreds of thousands or even millions of polymorphisms are interrogated at the same time in a strictly agnostic way, i.e. ignoring any possible a priori knowledge of the SNPs tested. This model requires use of a stringent significance threshold (P<5×10−8) to correct for the numerous statistical tests performed and to avoid false positive findings. As a consequence, it is possible that variants with a truly positive but weak association are not detected and, therefore, not reported. A possible drawback of GWAS is that strict avoidance of false positives may lead to false negatives [8]. By running secondary analyses using a reduced number of SNPs defined by biological knowledge or hypothesis, the required threshold of significance may be lowered and the power to detect real associations of modest statistical effect may be increased.
Pleiotropy, a genetic mechanism defined as one gene (or, in this case, one allele) having an effect on multiple phenotypes [9], offers one basis for selecting candidate SNPs for such a secondary analysis. There are regions in the human genome, called Nexus, which have been associated with more than one distinct cancer type [10]. The most striking examples for cancer are: the 8q24 region, which harbors multiple loci associated with breast, colon, prostate, bladder and/or ovarian cancers, the TERT region, which has been associated with pancreatic, bladder, lung and prostate cancers, the p16 region on chromosome 9p21, and 6q25, and 11q13 associated, respectively, with non-Hodgkin lymphoma (NHL) and nasopharyngeal carcinoma and with bladder, breast and prostate cancer [10]. To the best of our knowledge, a pleiotropic approach to identify novel cancer risk SNPs has been reported only once [11]. A pleiotropic GWAS performed to examine gene regions associated with pancreatic cancer identified a region (HNF1A) previously associated with several diseases, including Type-2 diabetes [12,13].
We used a similar approach to search for new genetic variants associated with estrogen receptor negative breast cancer susceptibility. We selected all the SNPs that had been associated with a human disease trait or phenotype at genome-wide level (P<5×10−8) and performed a secondary analysis on data from a GWAS of ER− breast cancer by the National Cancer Institute's Breast and Prostate Cancer Cohort Consortium (BPC3) [7]. We then tested the top associations using three additional populations with a total sample size of 3509 ER+ cases, 2543 ER− cases and 7031 healthy controls.

Ethics statement
The Mammary Carcinoma Risk Factor Investigation (MARIE) study was approved by the ethics committees of the University of Heidelberg and the University of Hamburg. Written informed consent was obtained from all subjects.
For the BPC3 study, written informed consent was obtained from all subjects and ethical approval was obtained from the relevant institutional review boards of each cohort.

Study populations
We performed the study in two phases: first, we analysed data from the BPC3 ER− GWAS and second, for replication purposes, we used genotyping or existing data from selected breast cancer cases and controls collected by three different studies: CPS-II, MCCS and MARIE. Individuals from CPS-II contributed cases and controls to both the initial GWAS and the replication phase, but there was no overlap between the sample sets used in the two phases of this study. The BPC3 has been described extensively elsewhere [14]. It consists of cases and controls selected from large cohorts assembled in Europe, Australia and the United States that have both biological samples and extensive questionnaire information collected prospectively. Cases were women who were diagnosed with invasive BC after enrolment; the diagnosis was confirmed by tumor registries or by medical records. Controls were considered eligible if they were free of BC until the follow-up time for the matched case subject. Case and control subjects were matched for ethnicity and age and, for some cohorts, also for additional criteria, such as country of residence. Laboratory techniques and relevant quality controls for the BPC3 ER− GWAS are extensively reported elsewhere [7]. Briefly, genotyping was performed at three centers (Imperial College London, UK, University of Southern California, USA, and the NCI Core Genotyping Facility, USA). Subjects from CPS-II, EPIC, MEC, PLCO and PBCS were genotyped using the Illumina Human 660k-Quad SNP array (Illumina, San Diego, CA, USA); NHSI/NHSII and part of the PLCO study were genotyped previously using the Illumina Human 550 SNP array (Illumina, San Diego, CA, USA) [15]. For this study, 1998 ER− invasive cancer cases and 2305 controls belonging to the BPC3 cohort were used.
The MARIE study population comprises BC patients who participated in a population-based case-control study conducted in two study regions in Germany (Hamburg and Rhine-Neckar-Karlsruhe). Cases were women diagnosed with histologically confirmed primary invasive or in situ breast tumor, aged 50 to 74 years, and residents of the study regions. Detailed information on tumor hormone receptor status was collected using clinical and pathology records. Controls were randomly selected from population registries and frequency-matched by year of birth and study region. The study has been described in more detail elsewhere [16]. For the present analyses, 2027 cases (370 ER−/1657 ER+) and 1778 controls were included.

SNP selection (phase one) and genotyping
The SNPs to be analysed in phase one were selected using the National Human Genome Research Institute's (NHGRI's) catalog of published GWA studies (http://www.genome.gov/gwastudies/) [4]. It contains summary information on polymorphic variants reported to be associated with a human disease, trait or phenotype in a GWA setting at the significance level of P<1.0×10−5. The data from the catalogue were downloaded in May 2012 and comprised 7986 SNPs. Approximately 60% (n = 4794) of the polymorphic variants reported in the catalogue had a P value higher than 5×10−8 and were, therefore, excluded from further analysis. Of the remaining 3192 SNPs, 1688 (52.9%) were genotyped in the BPC3 scan. For 452 variants (14.2% of the total selected SNPs), PLINK [17] was used to identify highly correlated (r2>0.9 in Hapmap3 CEU) SNPs genotyped in the BPC3 GWAS. Data for 939 SNPs were imputed: 901 (28.3% of the total selected SNPs) from Hapmap 2 and 38 (1.1% of the total selected SNPs) from Hapmap3. The remaining 113 variants (3.6% of the total selected SNPs) were dropped from the analysis since no surrogate was found and it was not possible to impute data. Thus, data for 3079 out of 3192 catalogued SNPs (96.4%) were used for this study.
The 3079 remaining SNPs were looked up in the BPC3 GWAS and ranked by P-value, from the most to the least significant, to check for their association with ER− BC. All already known breast cancer risk SNPs were excluded from the analysis.
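The selection and look-up steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names `select_candidates` and `top_hits`, and the toy SNP identifiers and P-values are hypothetical and are not study data; only the 5×10−8 threshold and the choice of ten top hits come from the text.

```python
GENOME_WIDE_P = 5e-8  # genome-wide significance threshold stated in the text


def select_candidates(catalog, known_bc_snps, threshold=GENOME_WIDE_P):
    """Keep catalogue SNPs below the threshold, minus known breast cancer SNPs."""
    return sorted(
        {rec["snp"] for rec in catalog
         if rec["p"] < threshold and rec["snp"] not in known_bc_snps}
    )


def top_hits(gwas_p, candidates, n=10):
    """Rank candidates by their P-value in the GWAS (smallest first), keep n."""
    found = [(snp, gwas_p[snp]) for snp in candidates if snp in gwas_p]
    return sorted(found, key=lambda item: item[1])[:n]


# Toy catalogue records with made-up identifiers and P-values.
catalog = [
    {"snp": "rsA", "p": 3e-9},
    {"snp": "rsB", "p": 2e-6},   # above 5e-8, so excluded
    {"snp": "rsC", "p": 1e-12},
    {"snp": "rsD", "p": 4e-10},  # treated as a known breast cancer SNP, so excluded
]
candidates = select_candidates(catalog, known_bc_snps={"rsD"})

# Toy GWAS look-up table (hypothetical P-values for the ER- scan).
ranked = top_hits({"rsA": 0.20, "rsB": 0.01, "rsC": 0.003}, candidates, n=10)
print(candidates)
print(ranked)
```

In this sketch a catalogue SNP that has no entry in the GWAS table is silently skipped, which loosely mirrors the variants that were dropped because they could be neither tagged nor imputed.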

Replication (phase two) genotyping
In order to confirm the ten most significant findings, we used additional BC cases and controls from three studies of women of Caucasian descent as a replication set: the CPS-II [18], consisting of 1530 estrogen receptor positive (ER+) cases, 53 ER− cases and 2395 healthy controls; the MCCS [19], with 322 ER+ cases, 122 ER− cases and 823 healthy controls; and the MAmmary carcinoma Risk factor InvEstigation (MARIE) [16], with 1657 ER+ cases, 370 ER− cases and 1778 healthy controls, for a total of 4054 cases and 4996 controls. Specifically, rs498872, rs2000999, rs12150660, rs780094, rs11229030 and rs13397985 were replicated in silico for the MARIE, CPS-II and MCCS studies. These six SNPs were genotyped as part of the iCOGS study using a custom Illumina array. In the original iCOGS publications, SNPs with MAF <1%, call rate <95%, or call rate <99% and MAF <5%, and all SNPs with genotype frequencies that departed from Hardy-Weinberg equilibrium at P<1×10−6 for controls or P<1×10−12 for cases, were excluded [5,20]. The remaining four SNPs (rs8396, rs4788815, rs2571391 and rs780092) were not present in the iCOGS array and were, therefore, genotyped de novo for the MARIE study by TaqMan. The mean genotyping success rate was 94.4% (range 88.2%-96.7%). The percentage of samples genotyped twice for quality assurance was 9.5%, and the genotyping concordance was 99.99%. Departure from Hardy-Weinberg equilibrium was tested for the ten SNPs in the respective control subjects from each study.
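As an illustration of the quality-control rules quoted above, the following sketch encodes the iCOGS exclusion criteria and a Hardy-Weinberg check. The text does not state which Hardy-Weinberg test was used, so the one-degree-of-freedom chi-square goodness-of-fit test here is an assumption (an exact test is a common alternative), and the function names and genotype counts are hypothetical.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test for departure from Hardy-Weinberg equilibrium.

    Takes the three genotype counts and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def excluded_by_qc(maf, call_rate, hwe_p, is_control):
    """Apply the exclusion rules quoted in the text (proportions, not percent)."""
    hwe_limit = 1e-6 if is_control else 1e-12
    return (
        maf < 0.01
        or call_rate < 0.95
        or (call_rate < 0.99 and maf < 0.05)
        or hwe_p < hwe_limit
    )


# Hypothetical genotype counts in a set of controls.
chi2, p_value = hwe_chi2(360, 480, 160)
print(chi2, p_value)
print(excluded_by_qc(maf=0.40, call_rate=0.995, hwe_p=p_value, is_control=True))
```

The asymmetric thresholds (1×10−6 for controls, 1×10−12 for cases) are handled by the `is_control` flag; cases are allowed a larger departure because a true disease association can itself distort genotype frequencies among cases.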

Statistical analysis
Logistic regression adjusted for five principal components, age (at diagnosis for cases and at selection for controls) and cohort was used to generate ORs, 95% CIs, and P values for each of the 3079 SNPs selected from the BPC3 ER-negative GWA data set and for the 10 SNPs in the replication phase. The replication was performed using ER− and ER+ breast cancer cases, and the analysis was conducted using ER− cases alone and in combination with ER+ cases. Because several ER− SNPs are also associated with ER+ BC, we analyzed both overall BC risk (ER+ and ER− cases combined) and ER−-specific risk (ER− cases alone) to increase our power to find a true association. At an alpha of 0.05, we had more than 90% power to replicate any of the associations observed in the discovery phase when considering all BC cases, and over 50% (53%-72%) power when considering only ER− cases. Using a conservative Bonferroni correction, we considered P<1.6×10−5 (0.05/3079) as statistically significant.
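The Bonferroni threshold and the quantities reported per SNP can be reproduced with simple arithmetic. The sketch below is a minimal illustration, assuming a Wald test on a log-odds coefficient from a fitted logistic model; the function names and the example coefficient and standard error are hypothetical, and no adjustment covariates are modelled here.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def wald_summary(beta, se, z_crit=1.959964):
    """Turn a log-odds coefficient and its standard error into
    (odds ratio, (95% CI lower, 95% CI upper), two-sided P-value)."""
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return odds_ratio, ci, p_value


threshold = bonferroni_threshold(0.05, 3079)
print(f"{threshold:.2e}")  # 1.62e-05

# Hypothetical per-allele coefficient and standard error, for illustration only.
odds_ratio, ci, p_value = wald_summary(beta=-0.17, se=0.05)
print(odds_ratio, ci, p_value, p_value < threshold)
```

Dividing 0.05 by the 3079 tests gives about 1.62×10−5, which the text rounds to 1.6×10−5; this is roughly 300 times less stringent than the genome-wide threshold of 5×10−8.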

Results
None of the 3079 selected variants in the BPC3 ER− GWAS was significant at the adjusted threshold. 186 variants were associated with ER− breast cancer risk at a conventional threshold of P<0.05, with P values ranging from 0.049 to 2.3×10−4 (Figure 1). The strongest observed association was a decreased risk of ER− BC with rs8396 (ORhetero 0.84, 95% CI 0.76-0.92; ORhomo 0.71, 95% CI 0.58-0.85). We selected the 10 most significant SNPs (shown in Table 1) and analyzed them using independent samples to determine whether they were genuinely associated with BC overall and with ER− breast cancer in particular. All the polymorphic variants were in Hardy-Weinberg equilibrium, with the exception of rs12150660 in the CPS-II and MARIE cohorts and rs13397985 in the CPS-II cohort. Therefore, CPS-II was not used as a replication set for rs12150660 and rs13397985, and MARIE was not used for rs12150660. In addition, one polymorphic variant, rs8396, was not used in the analysis because it had a call rate lower than 95% (88.2%).
Only rs11229030, a variant originally found associated with risk of Crohn's disease, was nominally associated with a decreased risk of ER− BC (OR 0.85, 95% CI 0.75-1.00, P value = 0.049). The association was observed only in the MARIE study. The results of all the analyses are shown in Table 1. Additional information on the original reports can be found at http://www.genome.gov/gwastudies/. We also performed a meta-analysis across the studies, but the results were very heterogeneous, clearly suggesting a negative finding (forest plots, heterogeneity P-values and I2 statistics are shown in Figure S1).
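For readers unfamiliar with the heterogeneity statistics mentioned above, the following sketch computes Cochran's Q and I2 from per-study estimates. The text does not specify the meta-analysis model, so the inverse-variance fixed-effect pooling used here is an assumption, and the function name and per-study values are hypothetical rather than study results.

```python
def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect meta-analysis.

    estimates: list of (beta, se) pairs, one per study, on the log-odds scale.
    Returns (pooled_beta, pooled_se, cochran_q, i2_percent).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    # I2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical log-odds estimates and standard errors for three studies.
studies = [(-0.16, 0.08), (0.05, 0.07), (0.10, 0.12)]
pooled, pooled_se, q, i2 = fixed_effect_meta(studies)
print(pooled, pooled_se, q, i2)
```

Under the null hypothesis of homogeneity, Q follows approximately a chi-square distribution with one fewer degree of freedom than the number of studies, which is where the heterogeneity P-value comes from; an I2 close to 0% indicates little heterogeneity, while high values indicate that the study estimates disagree more than chance would explain.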

Discussion
Pleiotropy is a fairly common phenomenon that is defined as one gene or allelic variant having an effect on multiple phenotypes. In a recent paper based on data from the catalogue of published GWAS, Sivakumaran and colleagues reported that 4.6% of the SNPs and 16.9% of the genes present in the catalogue have pleiotropic effects [9]. These percentages probably underestimate the real biological significance, since they were obtained using very conservative criteria, namely considering only the SNPs available in the catalogue and associated with a particular disease or trait at a genome-wide level. Using data from GWAS meta-analyses, pleiotropy seems to play a much stronger role for specific diseases; for example, Cotsapas and collaborators reported that 44% of the susceptibility loci for autoimmune diseases overlap [21]. In a two-stage analysis of 3509 ER+ cases, 2543 ER− cases and 7031 healthy controls, none of the SNPs showed a statistically significant association with breast cancer in the replication analysis. The strongest signal in the replication analysis was given by rs11229030 (a Crohn's disease susceptibility allele), which was associated with a decreased risk of ER− BC (P value = 0.049) only in the MARIE study, but not in CPS-II or MCCS, suggesting that the association found is probably due to chance.
There are several possible reasons why our pleiotropic approach failed to identify new SNPs associated with ER− BC. First, ER− BC may be associated with uncommon biologic pathways that are not shared with many other diseases and, therefore, may not be influenced by pleiotropy. This is consistent with the fact that there are several SNPs which are specifically associated with ER− but not ER+ disease. Alternatively, ER− BC may share genetic risk factors with other common disease traits and phenotypes, but not with those we included in our analysis. The pleiotropic approach we used is necessarily limited by the number of disease traits and phenotypes that have been examined with enough statistical power to identify GWAS hits. It is possible that disease traits and phenotypes with biologic pathways similar to ER− BC have not been examined adequately and are yet to be included in the NHGRI database.
We are aware of several limitations of this work. First, we were not able to include all the SNPs from the catalogue, because 113 variants (3.6% of the total selected SNPs) were dropped from the analysis since no surrogate was obtained and it was not possible to impute data. Second, we replicated, as an exploratory analysis, only the 10 most significant SNP associations; thus we cannot exclude that a true positive association lies among those SNPs that we did not attempt to replicate in the second phase, although, given the complete lack of replication of the first ten SNPs, this possibility seems unlikely. Third, we included only the SNPs present in the GWAS catalogue and did not include other SNPs present in the same regions. Since in pleiotropic regions the SNPs associated with different traits or diseases are not always the same, we cannot exclude the possibility that we left out SNPs that are truly associated with ER− BC but that are not yet present in the GWAS catalogue. Finally, the sample size of the replication set, although quite large considering the rarity of the disease, might have been inadequate to detect weaker associations.
In conclusion, and given the limitations summarized, we did not identify any pleiotropic SNP associated with ER− breast cancer.

Figure S1. Forest plots, I2 and heterogeneity P-values for the selected polymorphisms in the meta-analysis of the three studies.