Multi-Variant Pathway Association Analysis Reveals the Importance of Genetic Determinants of Estrogen Metabolism in Breast and Endometrial Cancer Susceptibility

Despite the central role of estrogen exposure in breast and endometrial cancer development and numerous studies of genes in the estrogen metabolic pathway, polymorphisms within the pathway have not been consistently associated with these cancers. We posit that this is due to the complexity of multiple weak genetic effects within the metabolic pathway that can only be effectively detected through multi-variant analysis. We conducted a comprehensive association analysis of the estrogen metabolic pathway by interrogating 239 tagSNPs within 35 genes of the pathway in three tumor samples. The discovery sample consisted of 1,596 breast cancer cases, 719 endometrial cancer cases, and 1,730 controls from Sweden; and the validation sample included 2,245 breast cancer cases and 1,287 controls from Finland. We performed admixture maximum likelihood (AML)–based global tests to evaluate the cumulative effect from multiple SNPs within the whole metabolic pathway and three sub-pathways for androgen synthesis, androgen-to-estrogen conversion, and estrogen removal. In the discovery sample, although no single polymorphism was significant after correction for multiple testing, the pathway-based AML global test suggested association with both breast (p global = 0.034) and endometrial (p global = 0.052) cancers. Further testing revealed the association to be focused on polymorphisms within the androgen-to-estrogen conversion sub-pathway, for both breast (p global = 0.008) and endometrial cancer (p global = 0.014). The sub-pathway association was validated in the Finnish sample of breast cancer (p global = 0.015). Further tumor subtype analysis demonstrated that the association of the androgen-to-estrogen conversion sub-pathway was confined to postmenopausal women with sporadic estrogen receptor positive tumors (p global = 0.0003). Gene-based AML analysis suggested CYP19A1 and UGT2B4 to be the major players within the sub-pathway. Our study indicates that the composite genetic determinants related to the androgen–estrogen conversion are important for the induction of two hormone-associated cancers, particularly for the hormone-driven breast tumour subtypes.


Introduction
Estrogen exposure is critical for the development of both breast and endometrial cancers and represents the most well-established risk factors for both diseases. Estrogen is a metabolic product whose circulating level is determined by de novo synthesis, conversion from other steroid hormones, and mechanisms of estrogen elimination. These metabolic processes are regulated by a network of enzymes encoded by different genes, suggesting that genetic variation within these metabolic genes may impact on breast and endometrial cancer risk. Genetic variation within the estrogen metabolic pathway has been intensively investigated, mostly by analyzing single variant effects in a limited number of candidate genes, SNPs and study subjects. The inadequacies of study design and analytical methodology have caused these studies to be underpowered for detecting moderate genetic effects which, not surprisingly, has led to inconsistent results [1][2][3][4][5][6][7][8]. We surmised that strategies for assessing the synergistic effect of multiple genetic variants within the estrogen metabolic pathway may provide a more realistic determination of genetic effect than a single gene, single SNP approach.
Herein, we present a comprehensive analysis of genetic variation in the estrogen metabolism pathway and its association with breast and endometrial cancer risk using a pathway-based approach.

Single SNP Association Analysis
We performed single SNP association analysis in 1596 breast cancer cases, 719 endometrial cancer cases and 1730 population controls from Sweden. Of the 239 tagSNPs analyzed, 17 SNPs (7.1%) had p-values less than 0.05 for breast cancer, and 18 SNPs (7.5%) had p-values less than 0.05 for endometrial cancer (Table  S4 and Table S5). For breast cancer, the smallest p-value was 0.00034 at rs7167936 within CYP19A1, and for endometrial cancer, the smallest p-value was 0.00017 at rs12595627 in CYP19A1. The single-SNP associations were all moderate. Only rs12595627 (for endometrial cancer) survived the conservative Bonferroni correction for multiple testing at a = 0.05. Overall, however, the single-SNP p values appeared to deviate from their null distribution of no association (formally tested below). The single-SNP associations were suggestive, but instead of any single variant having a strong effect, there appeared to be multiple weak associations within the metabolic pathway.

Multi-SNP Pathway Analysis
To evaluate the cumulative effect from multiple variants we employed the AML method [9] that assesses the experiment-wide significance of association by analyzing multiple SNPs through a single global test. The whole metabolic pathway can be sub-divided into three a priori defined sub-pathways, each performing specific metabolic function (Figure 1). Sub-pathway 1 is involved in the synthesis of androgen, sub-pathway 2 is involved in the conversion of androgens to estrogens, and sub-pathway 3 is responsible for removing estrogens. To investigate whether there is multi-SNP association for the whole pathway and whether any of the three subpathways is particularly important in influencing disease risk, we performed the progressive pathway-based global test on the whole metabolic pathway as well as the three sub-pathways using the AML method. The global test yielded marginally significant association for the whole metabolic pathway in both breast (p global = 0.034) and endometrial (p global = 0.052) cancers (Table 1). Dividing the metabolic pathway into three functional sub-pathways for the global test revealed strong association between the androgen-to-estrogen conversion sub-pathway and both breast (p global = 0.008) and endometrial (p global = 0.014) cancer (Table 1). The association evidence survived correction for performing 4 pathway-based tests in each cancer (p global corrected = 0.032 for breast and 0.056 for endometrial). In contrast, the other two sub-pathways showed no association with either form of cancer. For approximately half of the Swedish subjects in the breast cancer study (797 cases and 764 controls) we have genome wide association study (GWAS) data available. We used this to assess the possible influence of population stratification on our results. For the GWAS dataset, the genomic inflation factor, l gc , was1.015. Assuming an equal level of population stratification (in terms of the fixation index F ST ) in the current study and the GWAS sub-study, we estimated the genomic inflation factor, l gc , to be 1.030 in the current study, using the relationship between F ST , sample size and l gc described in [10]. Using the l gc value of 1.030 for genomic control-based correction of population stratification, the corrected global AML p-values for breast cancer are 0.052 for the entire pathway and 0.011 for the androgen-estrogen conversion sub-pathway, leaving our results largely unchanged. Even if l gc was as large as 1.05 in the current study, the global test p-value for the androgen-estrogen conversion sub-pathway would still be as low as 0.014. To further ensure that the observed associations could not be due to the employment of 319 paraffinembedded tissue samples in the analysis, we re-ran analyses excluding 319 paraffin-embedded tissue samples, and (at the same time) excluding 33 SNPs with call rates of less than 95%. Results were very similar. For example, for breast cancer, p-values were 0.028 and 0.009 for the entire pathway and for the androgenestrogen conversion sub-pathway, respectively. To validate the association in the androgen-to-estrogen conversion sub-pathway, we genotyped the 120 SNPs of this sub-pathway in an additional 2245 breast cancer cases and 1287 controls from Finland and performed the same AML analysis by using the 118 successfully genotyped SNPs. The validation analysis in the Finnish sample revealed similar evidence of association between the androgen-to-estrogen conversion sub-pathway and breast cancer (p global = 0.015) ( Table 1). The non-centrality parameter from the AML analysis of the androgen-toestrogen conversion sub-pathway, which represents the size of the common effect of the associated SNPs, was estimated as 2.90 for the Swedish sample and 2.94 for the Finnish sample. The similar values indicate a consistent size of the genetic effect in the two samples. A joint analysis of the Swedish and Finnish samples further yielded a global p-value of 0.001 ( Table 1). The SNPs with the lowest p-values in the Finnish sample are listed in Table S6.

Analysis of the Androgen-to-Estrogen Conversion Sub-Pathway in Breast Cancer Patient Subgroups
Hormone-related risk factors may play a differential role in breast cancer subtypes. In particular, estrogens appear to drive the development of ER positive tumors. This prompted us to investigate the association in the androgen-to-estrogen conversion sub-pathway in hormone-related breast tumor subtypes. As

Author Summary
Estrogen exposure is the most important risk factor for breast and endometrial cancers. Genetic variation of the genes involved in estrogen metabolism has, however, not been consistently associated with these two cancers. We posited that the genetic risk associated with the estrogen metabolic genes is likely to be carried by multiple variants and is therefore most effectively detected by multi-variant analysis. We carried out a comprehensive association analysis of the estrogen metabolic pathway by interrogating SNPs within 35 genes of the pathway in three tumor samples from Sweden and Finland. Through pathwaybased multi-variant association analysis, we showed that the genetic variation within the estrogen metabolic pathway is associated with risk for breast and endometrial cancers and that the genetic variation within the genes involved in androgen-to-estrogen conversion is particularly important for the development of ER-positive and sporadic breast tumors in postmenopausal women. Our study has demonstrated that the influence of genetic variation on hormone exposure has an impact on breast cancer development, especially on the development of hormone-driven breast tumor subtypes. Our study has also highlighted that future genetic studies of the estrogen metabolic genes should focus on the androgen-toestrogen conversion process.
surrogate markers for hormone driven tumour subtypes we constructed variables as combinations of menopausal status, family history and estrogen receptor (ER) status and divided all the patients into subgroups. We then compared subgroups of patients, defined on values of these variables, with controls, to evaluate the role of the androgen-to-estrogen conversion subpathway in different patient subgroups. First, we compared patient subgroups against all the controls in the combined Swedish and Finnish samples. The subgroup results showed that in the combined samples, significant association was observed in postmenopausal patients (p global = 0.009 and 0.018 respectively), postmenopausal patients without family history (p global = 0.001 and 0.04 respectively), and postmenopausal patients with estrogen receptor positive (ER+) tumors (p global = 0.0006 and 0.05 respectively) ( Table 2). No significant association was observed in either premenopausal patients or postmenopausal patients with family history or estrogen receptor negative (ER2) tumors.
Then, to rule out the possibility that the above subgroup results were caused by the mismatch between the patient subgroups and the controls in terms of the variables which defined patient subgroups,  Table 1. P values of the global tests of genetic association between the SNPs in the estrogen metabolic pathways and breast/ endometrial cancer risk. we performed the second subgroup analysis where the controls were also divided into subgroups according to family history and menopausal status ( Table 3). The second subgroup analysis was only performed in the Swedish sample, because the Finnish controls lack information on family history and menopausal status. This yielded similar evidence for the association of the sub-pathway with the hormone-driven subtypes of breast cancer as in Table 2.

Analysis of Reproductive Risk Factors' Impact on the Association of the Androgen-to-Estrogen Conversion Sub-Pathway with Breast Cancer
We further investigated the impact of reproductive risk factors on the genetic association of the androgen-to-estrogen conversion subpathway with breast cancer. Because the risk factor information is not available for the Finnish controls, the analysis of the reproductive risk factors was performed in the Swedish samples where information on such factors is available. We performed the AML analysis of the androgen-to-estrogen conversion sub-pathway with adjustment for the reproductive risk factors (parity, age at the first birth, age at menarche and age of menopause) and HRT use. We investigated this primarily to assess whether any of the reproductive risk factors could be in the causal pathway. Since p-values remained almost unchanged in all analyses (Table 4), it appears that none of the reproductive risk factors are likely to be in the causal pathway.

Gene-Based Analysis of the Androgen-to-Estrogen Conversion Sub-Pathway in Breast and Endometrial Cancers
Attempting to refine the association within the androgen-toestrogen conversion sub-pathway, we performed a gene-based AML analysis in the combined Swedish/Finnish breast cancer sample and the Swedish endometrial cancer sample. Among the 15 genes tested (Table 5), strong association was observed for CYP19A1 with both breast (p global = 0.003) and endometrial (p global = 0.006) cancer and UGT2B4 (p global = 0.002) with breast cancer only. The associations in breast cancer survived correction for multiple testing of 15 genes (p global corrected = 0.045 for CYP19A1 and 0.03 for UGT2B4). We also observed suggestive association for UGT2B11 in breast and endometrial cancer as well as for HSD11B1, SULT2A1 and SULT2B1 in breast cancer. Consistent with the pathway-based associations, the gene-based associations are generally more significant in sporadic postmenopausal patient samples than in the whole breast cancer sample (except SULT2B1). Furthermore, the importance of CYP19A1 and UGT2B4 in breast cancer risk is supported by the fact that excluding either gene from the global test of the sub-pathway reduced the global significance of association for the sub-pathway, from 0.0015 to 0.011 for CYP19A1, and to 0.010 for UGT2B4. However, the fact that the association for the sub-pathway remained significant, after excluding either gene, suggests that, although CYP19A1 and UGT2B4 are the major players, genetic variation within other genes also contributes to the association within the sub-pathway.

Discussion
Our pathway-based multi-SNP association analysis revealed a significant association between genetic variants in the androgento-estrogen conversion sub-pathway and the risk of two hormone dependent cancers. The association was particularly strong for ER+, sporadic breast cancer. Single SNP analysis did not reveal a similar association. We used the AML-based multi-SNP analysis,  Table 3. Subgroup analysis of the androgen-to-estrogen conversion sub-pathway in the Swedish samples. which has been shown to be more powerful than single SNP tests to yield significant and consistent association, when genetic risk is carried by multiple risk alleles each with moderate effect [11]. Pathway-based approaches are just beginning to be applied in association analysis [12]. Recently, an association study of 9 candidate gene groups (involving 120 candidate genes) was performed in breast cancer by using the AML approach, and interestingly, only the group of 8 genes involved in the steroid hormone signalling were significantly associated [13]. Our study has moved one step further and highlights the fact that the power of the pathway-based association analysis can be increased when analysis is guided by well-defined biological information. We believe that pathway approaches have potential to move genomewide association studies beyond their initial success of identifying some 'low-hanging fruits' to revealing many weak genetic risk alleles that have been missed by single SNP analysis.
Unless one enzyme is the rate limiting step for the entire metabolic pathway, it is not likely that small functional perturbations of individual variants would have a major impact on the overall effect of the metabolic pathway. To test the hypothesis that several genetic variants, each conferring weak to moderate effects, contribute to genetic risk, we adopted a systematic pathway-based approach for association analysis by testing the joint effect of multiple genetic variants in a progressive fashion from the whole metabolic pathway to biochemical subpathways and further down to individual genes. Such a progressive approach allows us to not only establish consistent association in three cancer samples from two different populations but also to refine the association of the androgen-to-estrogen conversion component of the metabolic pathway. Our study may therefore have advanced our understanding of the role of estrongen metabolism in breast and endometrial cancers by 1) accounting for the ambiguity surrounding the genetic association results and 2) indicating the androgen-to-estrogen conversion to be the important component of the metabolic pathway in modulating the risk and therefore to be a worthy focus for future studies.
After menopause, ovarian estrogen production dramatically declines and conversion of adrenal androgens to estrogens in peripheral tissues becomes the major source of circulating estrogens. The final step of this conversion is catalyzed by aromatase, encoded by CYP19A1 [11]. Thus, there is biological plausibility in the association between CYP19A1 polymorphisms and postmenopausal breast cancer. Moreover, pharmacological inhibition of aromatase prevents recurrences in postmenopausal women with estrogen-receptor-positive breast cancer and new contralateral primaries [14], which has challenged the previous routine of a 5-year course of tamoxifen alone [15]. Our study has advanced our understanding of CYP19A1 by suggesting that the modulation of aromatase activity by either germ-line variation or pharmacological agents can influence the development of ER+ tumour in postmenopausal women. Furthermore, the convergence of genetic and pharmacological effects of CYP19A1 also raises therapeutic possibilities. For example, other genes implicated by our genetic study, such as UGT2B4, might also be pharmacological targets for treating breast cancer.
Hormone exposure is a common risk factor for breast and endometrial cancer. Our employment of the three samples of two different hormone-related cancers from two different populations allowed us to apply a very stringent criterion for declaring an  association. Furthermore, results of our breast cancer patient subgroup analysis indicate that the genetic determinants within the androgen-to-estrogen conversion sub-pathway may play a more prominent role in postmenopausal women with sporadic ER+ tumors, further suggesting that the modulation of hormone exposure by genetic variation may have a differential impact on breast tumor subtypes. Endogenous sex hormone level appears to be associated with breast cancer risk in postmenopausal women [16], and particularly with the risk of ER+/PR+ breast tumors [17]. The effect of hormone-related factors on breast cancer risk apparently differs by ER status [18] and menopause status [19,20]. It could also differ by the status of family history of the disease, as suggested by a recent study showing that most cases of hereditary breast cancer are probably not related to cumulative hormone exposure [21]. Our findings may have therefore advanced the development of a general model for breast cancer risk: hormonal factors, both genetic and reproductive, can play a key role in the genesis of post-menopausal and ''sporadic'' breast cancer, whereas genes involved in DNA repair, checkpoints, and genetic stability (such as BRCA1, BRCA2, p53, ATM, CHK2) appear to be more involved in predominantly breast cancers associated with family history of disease. It is worth noting that the contribution of genetic polymorphisms to risk is a function of both their prevalence and penetrance and thus the relative importance of individual SNPs may vary from population to population. More studies in different populations are needed to fully understand the role of the androgen-to-estrogen conversion sub-pathway in breast cancer. We also want to highlight that our results are of genetic association in nature, and further studies are needed to confirm the findings and to identify functional variants causally linked to cancer risk.

Study Subjects
Swedish subjects were from a population-based case control study of breast and endometrial cancer as described [22,23]. Briefly, the study included all incident primary invasive breast and endometrial cancers among Swedish-born postmenopausal women between 50 and 74 years of age at diagnosis, diagnosed with breast cancer between October 1993 and March 1995 and endometrial cancer between January 1994 and December 1995. All cases were identified through six regional cancer registries in Sweden, and all controls were randomly selected from the Swedish Registry of Total Population and frequency matched to the expected age distribution of the cases.
Finnish breast cancer cases consist of two series of unselected breast cancer patients and additional familial cases ascertained at the Helsinki University Central Hospital. The first series of 884 patients was collected in 1997-1998 and 2000 and covers 79% of all consecutive, newly diagnosed cases during the collection periods [24,25]. The second series, containing 986 consecutive newly diagnosed patients, was collected in 2001-2004 and covers 87% of all such patients treated at the hospital during the collection period [26]. An additional 538 familial breast cancer cases were collected at the same hospital as described [27][28][29][30]. 1287 anonymous, healthy female population controls were collected from the same geographical regions in Southern Finland as the cases and have been used in several studies previously [31][32][33].
Risk factor information and tumour characteristics were available for all the Swedish samples and the Finnish cases, but were missing for the Finnish controls. The Finnish samples (mean age = 56 for the cases and 41 for the controls) were younger than the Swedish samples (mean age = 63 for both the cases and controls). All the risk factor and tumour characteristics information of the subjects are summarized in Table S1 and Table S2.
Written informed consent was obtained from all participating subjects, and the study was approved by the Institutional Review Boards in Sweden, Finland and at the National University of Singapore.

DNA Isolation
DNA was extracted from 4 ml of whole blood using the QIAamp DNA Blood Maxi Kit (Qiagen)and non-malignant cells in paraffin-embedded tissue using a standard phenol/chloroform/ isoamyl alcohol protocol [34].

Gene and SNP Selection
We selected 35 genes involved in estradiol or estrone metabolism and expressed in the breast (based on published literatures). We selected 1007 single nucleotide polymorphisms (SNPs) in these genes and their 30kb flanking sequences from the dbSNP (build 124) and Celera databases, aiming for a marker density of at least one SNP per 5kb (Table S3). These SNPs were genotyped in 92 Swedish control samples to assess linkage disequilibrium pattern and coverage. Haplotypes were reconstructed using the PLEM algorithm [35] implemented in the tagSNPs program [36]. A subset of SNPs, tagSNPs, were selected based on the R 2 coefficient, which quantifies how well the tagSNP haplotypes predict the genotype or haplotypes an individual carries. We chose tagSNPs so that common SNP genotypes and haplotypes (frequency $0.03) were predicted with R 2 $0.8 [37]. To evaluate our tagSNPs' performance in capturing unobserved SNPs within the genes, we performed a SNP-dropping analysis [38,39]. In brief, each of the genotyped SNPs was dropped in turn and tagSNPs were selected from the remaining SNPs so that their haplotypes predicted the remaining SNPs with an R 2 value of 0.85. We then estimated how well the tagSNP haplotypes of the remaining SNPs predicted the dropped SNP, an evaluation that can provide an unbiased and accurate estimate of tagSNP performance [38,39]. Overall, we selected and genotyped 302 tagSNPs from the 35 genes in all the Swedish cases and controls.

Genotyping
Genotyping was performed using the Sequenom system (San Diego, California). All genotyping results were generated with positive and negative controls and checked by laboratory staff unaware of case-control status. Of the 302 tagSNPs, 42 SNPs failed in the development stage of Sequenom genotyping assays. SNPs with a call rate ,85% (8 SNPs), minor allele frequency ,1% (9 SNPs) or out of Hardy-Weinberg Equilibrium (p,0.05/252, 4 SNPs) were excluded from further analysis. Overall, 239 tagSNPs from the 35 genes were successfully genotyped (Table S3). The genotype concordance was .99%, suggesting high genotyping accuracy.

Statistical Analysis
The Cochran-Armitage trend test was performed for each of the 239 SNPs. One approach for assessing the departure of the distribution of the (Cochran-Armitage) test statistics from the (global) null distribution (no SNPs associated) has been described by Tyrer et. al. [9]. The approach is based upon fitting a mixture model to the distribution of the test statistics, with two components, one representing SNPs which are independent of the case-control status, the other representing SNPs associated with case-control status. The Cochran-Armitage test statistics for the associated SNPs are assumed to all have the same (chi-squared) non-centrality parameter value. The distributed software for the ''admixture maximum likelihood'' (AML) test of Tyrer et. al. [9] calculates empirical p-values based on a ''pseudo-likelihood ratio'' test, comparing the ratio of values of the optimized likelihoods under the null and alternative hypotheses for the observed data, with the corresponding values obtained from a large number of data sets with case-control status permuted randomly. It also provides an estimate of the non-centrality parameter which is a measure of the common effect size of the associated SNPs within the pathway. We performed the AML-based global test of association for the full metabolic pathway as well as for 3 subpathways (see results section). In addition, we performed genespecific analyses, using the AML-based global test on SNPs within genes, within the androgen-estrogen conversion sub-pathway. We also carried out AML tests adjusted for a non-genetic risk factor using software provided by the authors of Tyrer et al. [9].