Association of Genetic Polymorphisms in CDH1 and CTNNB1 with Breast Cancer Susceptibility and Patients' Prognosis among Chinese Han Women

This study aimed to investigate whether germline variants in CDH1 and CTNNB1 affect breast cancer susceptibility and patients’ prognosis among Chinese Han women using a haplotype-based association analysis. We genotyped 12 haplotype-tagging single nucleotide polymorphisms (htSNPs) in CDH1 and CTNNB1 among 1,160 breast cancer (BC) cases and 1,336 age-matched cancer-free controls using the TaqMan® Genotyping Assay. In the association analyses of germline variants with breast cancer susceptibility, rs7200690, rs7198799, rs17715799, rs13689 and diplotype CGC/TGC (rs7200690 + rs12185157 + rs7198799) in CDH1 as well as rs2293303 in CTNNB1 were associated with increased breast cancer risk. In addition, the Generalized Multifactor Dimensionality Reduction (GMDR) and logistic regression analyses predicted interactions on breast cancer risk between rs17715799 and rs13689 as well as between rs13689 and the menarche-FFTP (First Full-Term Pregnancy) interval. In the survival analyses, the minor allele homozygotes of rs13689 and haplotype TGC in CDH1 were linked with unfavorable event-free survival of breast cancer, whereas rs4783689 of CDH1 showed the opposite effect under the dominant model. Notably, the stratified analysis revealed that rs7186053 was associated with favorable event-free survival among patients with estrogen receptor (ER)-positive, progesterone receptor (PR)-positive or lymph node metastasis-negative tumors. Moreover, rs7200690 and rs7198799 in CDH1 as well as rs4533622 in CTNNB1 were associated with worse event-free survival among patients with clinical stage 0-I tumors. This study indicated that the genetic polymorphisms of CDH1 and CTNNB1 were associated with breast cancer susceptibility and patients’ prognosis.


Introduction
Breast cancer (BC) is by far the most frequent cancer and the most common cause of cancer death among women [1]. Epithelial-mesenchymal transition (EMT) has been regarded as a potentially important event in the metastatic spread of tumor cells, in which epithelial tumor cells acquire a more motile and invasive phenotype and escape from the primary tumor [2,3]. In addition, induction of EMT elicits numerous other properties that likely contribute to tumor development and progression, including carcinogenesis, stem cell-like generation, resistance to chemotherapy and senescence, and evasion of the immune system [3,4]. The CDH1 and CTNNB1 genes, which encode the proteins E-cadherin and β-catenin respectively, are two crucial factors involved in the regulation of the EMT process [5]. We therefore hypothesized that single nucleotide polymorphisms (SNPs) in the CDH1 and CTNNB1 genes would contribute to BC development and progression. E-cadherin, a tumor and invasion suppressor [6], is a homophilic cell-to-cell adhesion protein localized to the adherens junctions of all epithelial cells [7]. In breast cancer, partial or total loss of E-cadherin expression correlates with loss of differentiation characteristics, acquisition of invasiveness, increased tumor grade, metastatic behavior and poor prognosis [8]. Somatic inactivation of the CDH1 gene by mutations or allelic deletions, as well as promoter methylation, is frequent in BC [9]. Although somatic and germline mutations in CDH1 are restricted to lobular breast tumors [8][9][10][11], ductal breast carcinomas often show strikingly reduced E-cadherin mRNA and protein expression [8]. This reduced expression could be explained by mechanisms such as chromatin rearrangements, hypermethylation and alterations in trans-factor binding [8]. SNPs, a common type of genetic variation, may also contribute to this reduced expression.
A functional polymorphism (rs16260, −160 C/A) in promoter of CDH1 was found to reduce E-cadherin expression [12], and linked with 30% increased risk of BC by the minor allele A [13]. In addition, several other SNPs in CDH1 such as rs13689, rs2059254 and rs12919719 were found to be associated with BC susceptibility [14].
β-catenin has two roles in the cell. It forms a functional cadherin-catenin adhesive complex and is involved in cell-cell adhesion at the membrane, while its nuclear pool participates in signaling pathways and regulates a remarkable variety of cellular processes such as cell proliferation, cell survival and migration [15]. β-catenin is involved in the carcinogenesis of infiltrative ductal carcinoma [16], and is associated with increased BC risk and a worse prognostic phenotype [16][17][18]. Although somatic mutation of CTNNB1 is rare in BC [19,20], mounting evidence has revealed that somatic mutations in CTNNB1 are often associated with the upregulation of β-catenin and the pathogenesis of endometrioid-type endometrial cancer and ovarian cancer [21,22]. Germline mutations in CTNNB1 have not been found in BC. Null mutations of β-catenin in mouse models are reported to result in gastrulation defects and embryonic lethality [23]. However, several germline variants of CTNNB1 were found to be associated with BC risk [24,25]. One study found that rs4135385 was linked with increased BC risk [24], while another study indicated that rs4135385 was associated with decreased BC risk [25].
Until now, there have been no comprehensive association studies of germline variants of the two genes with BC in the Chinese Han population. Based on linkage disequilibrium (LD), a set of associated SNP alleles in a region of a chromosome forms a "haplotype", while a pair of haplotypes forms a diplotype. It is believed that a small number of informative SNPs, called haplotype-tagging SNPs (htSNPs), can capture the contribution of almost all of the SNPs in a target gene to a specific phenotype [26,27]. In this study, we selected htSNPs in these two genes and comprehensively investigated the associations of genetic polymorphisms of CDH1 and CTNNB1 with BC susceptibility and event-free survival in the Chinese Han population.

Study population
This case-control study included 1,160 female BC patients and 1,336 cancer-free controls. All 1,160 cases were pathologically diagnosed with primary infiltrating ductal carcinoma of the breast at the Beijing Cancer Hospital in China during the period 1995-2007. Their epidemiological information was obtained from their clinical records, including age at diagnosis, height, weight, age at menarche and/or menopause, menopause status, age at first full-term pregnancy and family history of cancer in first-degree relatives. All the clinicopathological parameters were also collected from their clinical records, including estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (Her2), tumor size, lymph node status, clinical stage (based on the 6th edition of TNM staging of the American Joint Committee on Cancer system), chemotherapy and endocrine therapy status [28]. The event-free survival time was defined as the time from surgery to a breast event such as breast carcinoma recurrence, metastasis, or death caused by BC. Cases were censored if the patients were still alive, voluntarily withdrew, or died of a cause other than BC before the latest follow-up (August 31, 2010) [29]. The median follow-up time after surgery was 3.4 years. Of the 1,160 cases, 51 cases had no operation, 17 cases were lost to follow-up and 1 case died of unknown cause. Thus, 1,091 cases remained in the event-free survival analysis. The 1,336 controls were selected from cancer-free women participating in a community-based screening programme for non-infectious diseases conducted in Beijing, China. The selection criteria were no history of cancer, Chinese Han ethnic background and age matching to cases (same 5-year group) [29]. All eligible controls completed an epidemiological questionnaire.
DNA isolation, genotyping assay, and quality control
Genomic DNA was extracted from blood leukocytes. Genotyping was carried out on the ABI 7900HT Real-Time PCR System (Applied Biosystems, Foster City, CA, USA) using the TaqMan® Assay according to the manufacturer's instructions. Primers and probes were supplied directly by Applied Biosystems as Assays-by-Design and Assays-on-Demand products, and PCR conditions were the same as described previously [30]. Positive and negative controls were included in each 384-well genotyping plate. As a quality control, genotyping was repeated in 2% of the samples, and the concordance between the duplicates was more than 99%.

LD block determination and haplotype construction
Linkage disequilibrium (LD) plots of the Lewontin coefficient (D') and the squared correlation coefficient (r²) between the genotyped SNPs were produced based on our genotyping data using the Haploview program [31]. Haplotype blocks in cases and controls were then reconstructed separately with the Haploview 4.2 software. For haplotype estimation, the most probable haplotypes for each participant were estimated using the SAS 9.1 PROC HAPLOTYPE procedure [28].
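For readers unfamiliar with the two LD measures, the following minimal Python sketch shows how D' and r² are conventionally defined for a pair of biallelic SNPs. It is an illustration only and not the Haploview implementation; the function name ld_stats and the example frequencies are hypothetical, and the haplotype frequency is assumed to be already known (in practice it has to be estimated from unphased genotypes).

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    All inputs are hypothetical, assumed-known population frequencies.
    """
    # Raw disequilibrium coefficient D
    d = p_ab - p_a * p_b
    # Largest |D| attainable given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # Squared correlation between the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Two SNPs whose minor alleles always travel together: D' and r^2 are both ~1
print(ld_stats(0.3, 0.3, 0.3))
```

D' reaches 1 whenever at least one of the four possible haplotypes is absent, whereas r² reaches 1 only when the two SNPs carry the same information, which is why r² is the measure usually used to decide whether one SNP tags another.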

Statistical analysis
Differences in demographic characteristics and selected variables between cases and controls were compared by the two-sided chi-square (χ²) test or Student's t test. Hardy-Weinberg equilibrium was evaluated for each SNP among the controls using a goodness-of-fit test with one degree of freedom, with a cut-off threshold of 0.05 [27]. A two-sided χ² test was employed to compare differences in the distributions of genotypes and alleles between cases and controls. Each genotype was assessed according to codominant, dominant and recessive models. In addition, the Cochran-Armitage trend test was performed to estimate the association between BC risk and allele dose for each SNP (P trend). Furthermore, odds ratios (ORs) with 95% confidence intervals (95% CIs) were calculated to evaluate the effects of genotypes or haplotypes on BC risk using both univariate and multivariate unconditional logistic regression models, adjusted for age at menarche, age of first birth and family history of cancer in first-degree relatives [29,30]. The Generalized Multifactor Dimensionality Reduction (GMDR) method (GMDR Beta program, version 7 software) combined with logistic regression models was used to analyze gene-gene and gene-environment interactions. GMDR is a nonparametric and genetic model-free alternative to linear or logistic regression for detecting and characterizing nonlinear interactions among discrete genetic and environmental attributes. Compared with the MDR method, the GMDR method can use score statistics to process both quantitative and dichotomous traits and permits adjustment for covariates. Gene-gene interactions were examined for one- to five-locus models by GMDR, which included 12 SNP loci and adjusted for age, Body Mass Index (BMI), age at menarche, age of first birth, and family history of cancer in first-degree relatives.
Gene-environment interactions were also examined for one- to five-order models by GMDR, which handled 16 attributes including 12 SNP loci, age, BMI, family history of cancer in first-degree relatives and menarche-FFTP (First Full-Term Pregnancy) interval [27], and adjusted for 5 covariates including age, BMI, age at menarche, age of first birth, and family history of cancer in first-degree relatives. We further explored the effects of these interactions on BC risk using logistic regression models.
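The Hardy-Weinberg check described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually run in the study: the function name hwe_chi_square and the genotype counts are hypothetical, and the P value relies on the identity that the upper tail of a χ² distribution with one degree of freedom equals erfc(sqrt(χ²/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit test for Hardy-Weinberg equilibrium (1 degree of freedom).

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Returns (chi-square statistic, P value).
    """
    n = n_aa + n_ab + n_bb
    # Allele frequencies estimated from the genotype counts
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts; P > 0.05 would be read as
# "no evidence of departure from Hardy-Weinberg equilibrium"
chi2, p_value = hwe_chi_square(640, 560, 136)
print(round(chi2, 3), round(p_value, 3))
```

The test has one degree of freedom because three genotype classes are compared after estimating one allele frequency from the same data.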
Survival estimates were computed using the Kaplan-Meier method, and differences between survival times were evaluated using the log-rank test. To further investigate the associations of clinicopathological parameters, genotypes and haplotypes with event-free survival, hazard ratios (HRs) and 95% CIs were calculated using univariate and multivariate Cox proportional hazards models, adjusted for ER status, PR status, Her2 status, tumor size, clinical stage, lymph node metastasis, chemotherapy and endocrine therapy status. Stratified association analysis was conducted to determine the effects of interaction between genetic variants and clinical risk factors on BC survival. All statistical analyses were performed with the Statistical Analysis System software (v.9.1; SAS Institute, Cary, NC). A two-sided P value <0.05 was considered statistically significant.
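To make the survival estimate concrete, the Kaplan-Meier product-limit estimator can be sketched as follows. The analyses in this study were run in SAS; this Python function (kaplan_meier, a hypothetical name) and the toy follow-up data are only an illustration of the method, with an event flag of 1 for a breast event and 0 for a censored case.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : follow-up time for each patient
    events : 1 if the event occurred at that time, 0 if the patient was censored
    Returns a list of (time, estimated survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censored cases leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


# Toy data: four patients, the second one censored at year 2
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
# -> [(1, 0.75), (3, 0.375), (4, 0.0)]
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is how cases still alive at the latest follow-up are handled.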
Written informed consent was obtained from the BC patients who could read and write. For the BC patients who could not read and write, verbal consent was obtained and written consent was signed by the patient's next of kin. The written consent procedure was approved by the IRB. The data/samples were used anonymously. The PKU IRB approved our application to waive informed re-consent for the already collected BC samples in the tissue/blood biobank. This study used only this part of the BC samples. This study was approved by the Peking University IRB (reference no. IRB00001052-11029).

Characteristics of the study population
The epidemiological characteristics of the 1,160 infiltrating ductal BC cases and 1,336 cancer-free controls included in this study are summarized in S1 Table. There were no significant differences in the distributions of age (P = 0.397), BMI (P = 0.661), age at menopause (P = 0.845), menopause status (P = 0.629) and number of childbirths (P = 0.362). As expected, cases were more likely to have an early age at menarche (P = 0.0001), a late age at first full-term pregnancy (FFTP) (P<0.0001), and a family history of cancer in first-degree relatives (P = 0.0064). The clinicopathological parameters of cases are also summarized in S1

LD degree between SNPs
The 12 SNPs were all in agreement with Hardy-Weinberg equilibrium (P>0.05) in the controls (S2 Table). The D' and r² between the 9 SNPs in CDH1 and between the 3 SNPs in CTNNB1 within our cases, controls and the HapMap CHB population were calculated using the Haploview 4.2 software. The LD degree of all SNPs in the HapMap CHB population and in our control and case populations is shown in Fig 1. We reconstructed a 3-SNP haplotype block1 [rs7200690 (C>T) + rs12185157 (A>G) + rs7198799 (C>T)] and a 2-SNP haplotype block2 [rs10431923 (T>G) + rs7186053 (G>A)] for CDH1 according to our genotyping data in controls (Fig 1). For CTNNB1, we did not reconstruct any haplotype block because of the weak LD among the 3 SNPs in cases, controls and the HapMap CHB population (Fig 1).

Associations of genotypes, haplotypes and diplotypes with BC susceptibility
As shown in Table 1, among the 12 SNPs the two-sided χ² test indicated a significant difference in allele frequencies between cases and controls only for rs2293303 in CTNNB1 (P = 0.0404), but showed significant differences between cases and controls in the genotype frequencies of rs7200690, rs7198799, rs17715799 and rs13689 in CDH1 as well as rs2293303 in CTNNB1. Both univariate and multivariate unconditional logistic regression analyses demonstrated that the minor allele homozygotes of these five SNPs were associated with increased BC susceptibility compared with their corresponding heterozygotes and common homozygotes (Table 1). To assess the relative importance of these five at-risk SNPs, multiple logistic regression analysis was performed, which included all five at-risk SNPs in the full model and used stepwise procedures to select the relatively important SNPs associated with BC risk. After adjusting for the other four SNPs, rs13689 and rs2293303 stood out as significantly associated with increased BC risk (rs13689: aOR = 1.87, 95% CI = 1.25-2.81, P = 0.0024; rs2293303: aOR = 1.88, 95% CI = 1.04-3.40, P = 0.0373), whereas the statistical significance for rs7200690, rs7198799 and rs17715799 disappeared (Table 2).
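As a rough illustration of what an odds ratio under the recessive model means, the sketch below collapses genotype counts into a 2x2 table (minor allele homozygotes versus all other genotypes) and computes a crude OR with a Wald 95% CI. The counts are invented for illustration and are not taken from Table 1, the function names are hypothetical, and the estimate is unadjusted, unlike the covariate-adjusted logistic regression reported above.

```python
import math


def recessive_table(case_counts, control_counts):
    """Collapse genotype counts under a recessive model.

    Each argument is (common homozygotes, heterozygotes, minor homozygotes).
    Returns (a, b, c, d): minor-homozygote cases, other cases,
    minor-homozygote controls, other controls.
    """
    a = case_counts[2]
    b = case_counts[0] + case_counts[1]
    c = control_counts[2]
    d = control_counts[0] + control_counts[1]
    return a, b, c, d


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval (z=1.96 gives ~95%)."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Invented genotype counts, for illustration only
table = recessive_table((50, 30, 20), (60, 30, 10))
print(odds_ratio_wald(*table))
```

A confidence interval that includes 1 indicates that the crude association is not significant at the 0.05 level.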
For gene-gene interaction, the GMDR analysis indicated that, with adjustment for covariates (age, BMI, age at menarche, age of first birth and family history of cancer in first-degree relatives), the 2-order model including rs17715799 and rs13689 was the best model for predicting BC risk, with a high Cross Validation Consistency (8/10) and Testing Balanced Accuracy (0.5165) and a Sign Test P = 0.0107 (Table 3). The multiple logistic regression analysis demonstrated that women harboring rs17715799 TT and rs13689 CC had a significantly higher risk of BC after adjustment for age, BMI, age at menarche, age of first birth and family history of cancer in first-degree relatives (OR = 1.68, 95% CI = 1.28-2.20, P = 0.0002) (Table 4). For gene-environment interaction, after adjustment for age, BMI, age at menarche, age of first birth and family history of cancer in first-degree relatives, the best model included rs13689 and menarche-FFTP interval, with the greatest Cross Validation Consistency (10/10) and Testing Balanced Accuracy (0.5955) (Sign Test P = 0.0010), indicating that the interaction of rs13689 and menarche-FFTP interval plays an important role in BC risk (Table 3). Notably, menarche-FFTP interval was the best one-order model and emerged in all five models, suggesting that menarche-FFTP interval was an extremely important factor affecting BC susceptibility. Therefore, we further performed logistic regression analysis to assess the joint effect of these at-risk SNPs and menarche-FFTP interval on BC risk. The results revealed that women harboring the minor allele homozygotes of one of the five at-risk SNPs and a longer menarche-FFTP interval (>11 years) had a remarkable increase in BC risk (all P<0.05), indicating a synergistic effect of these at-risk SNPs and menarche-FFTP interval on BC susceptibility (Table 5). Table). In CTNNB1, no genotypes, haplotypes or diplotypes were associated with event-free survival (data not shown).
We then performed stratified analysis. In CDH1, patients harboring the AA or GA genotype of rs7186053 (G>A) had favorable event-free survival in less aggressive tumor subgroups, namely the ER-positive, PR-positive and lymph node metastasis-negative groups (Fig 3, Table 6). The minor allele homozygote carriers of rs7200690 (C>T) or rs7198799 (C>T) had unfavorable event-free survival among patients with clinical stage 0-I tumors (Fig 4, Table 7). In CTNNB1, the AA genotype of rs4533622 (C>A) showed a trend toward worse BC event-free survival among patients with clinical stage 0-I tumors, although the adjusted estimate was of borderline significance (aHR = 9.04, 95% CI = 0.93-87.96, P = 0.0580) (Fig 4, Table 7).

Discussion
To our knowledge, this is the first haplotype-based association study of CDH1 and CTNNB1 with breast cancer susceptibility and patients' survival in the Chinese Han population.
As CDH1 mutations are frequent in gastric cancer (GC) [32,33], several studies have examined the association of CDH1 genetic polymorphisms with GC risk [34][35][36]. Zhan Z et al. genotyped four potentially functional polymorphisms (rs13689, rs1801552, rs16260 and rs17690554) of the CDH1 gene in a case-control study of 387 gastric cancer cases and 392 controls. They found no association of these four SNPs with overall gastric cancer risk; however, subgroup analysis revealed that rs16260 and rs17690554 were associated with the risk of diffuse gastric cancer [36]. Another two studies, performed by Cui Y et al. and Zhang XF respectively, did not find any association between rs16260 and gastric cancer risk [34,35]. In this study, we found that rs7200690 (C>T), rs7198799 (C>T), rs17715799 (A>T) and rs13689 (T>C) conferred around 50%, 80%, 50% and 90% increased risk of BC, respectively, in the recessive model. Alicia Beeghly-Fadiel and colleagues genotyped 40 SNPs of CDH1 in 2,083 BC cases and 2,152 controls from urban Shanghai. They demonstrated that BC risk was associated not only with rs13689, but also with rs2059254 and rs12919719, which were in high LD with rs7198799 and rs17715799 respectively [14]. Their results are therefore consistent with ours. A functional polymorphism in the CDH1 promoter (rs16260, −160 C/A), whose minor allele A was reported to reduce E-cadherin expression [12], was found to be linked with a 30% increased risk of BC [13]. However, no effect of rs16260 was seen on BC risk in a European population-based study or on BC survival in a British population-based study [37,38]. Although rs16260 was not directly genotyped in our study, the genetic variation of this polymorphism was captured. The SNP rs7200690, genotyped in our study, is reported to be in perfect LD (D' = 1.0, r² = 1.0) with rs16260 [39]. In the present study, rs7200690 was demonstrated to be associated with BC risk.
Notably, the minor allele homozygotes of rs7200690 (C>T) were shown to be associated with worse BC event-free survival among patients with clinical stage 0-I tumors. Statistical differences remained significant after adjustment for other factors affecting survival. The htSNP rs7200690 tagged 19 SNPs (r² = 1.0), including a functional SNP, rs16260, in the CDH1 promoter and 18 intronic SNPs. Therefore, we speculated that the association of rs7200690 with BC was most likely driven by its tagged SNP rs16260. An additional fine-mapping study and the corresponding functional study will be helpful in identifying the causal variants. In addition, we found that rs13689, located in the 3'UTR of CDH1, was associated with increased risk and unfavorable survival of BC. We also found that rs13689 interacted with rs17715799 and menarche-FFTP interval, and these factors jointly affected BC risk. The 3'UTR is essential for mRNA stability and localization [28], and it may also be a binding site for miRNA. Polymorphisms in the 3'UTRs of several genes have been reported to be associated with diseases by affecting miRNA-regulated gene/protein expression [40]. Therefore, functional studies are needed to determine how the SNP rs13689 influences BC susceptibility and event-free survival.
Given that the combined effect of multiple alleles may provide a stronger predictor than individual SNPs, we investigated disease associations with haplotypes and diplotypes. The 3-SNP diplotype CGC/TGC [rs7200690 (C>T) + rs12185157 (A>G) + rs7198799 (C>T)] could increase BC risk by about 60%. Moreover, haplotype TGC was correlated with unfavorable event-free survival.
For CTNNB1, we could not reconstruct haplotypes, so we conducted genotype analysis only. We found that rs2293303 conferred a 1.0-fold increased risk of BC in the recessive model. Wang et al. analyzed five tagSNPs of CTNNB1 in 944 gastric cancer cases and 848 controls in a Chinese population, and found that rs2293303 was correlated with increased risk of gastric cancer [41]. This indicates that rs2293303 may play a similar role in breast and gastric cancer susceptibility in the Chinese population. Importantly, rs2293303, located in the coding region of CTNNB1, is a synonymous SNP (sSNP). Although sSNPs do not change the amino acid composition of the encoded proteins owing to the degeneracy of the genetic code, considerable evidence has accumulated to show that synonymous substitutions can affect mRNA splicing, mRNA stability, splicing accuracy, mRNA structure and translation fidelity, and thus protein expression and enzymatic activity [42]. In addition, sSNPs can affect protein folding and conformation because codon bias can affect tertiary protein structure [42]; they may therefore have functional and clinical consequences. Additional mechanistic studies are needed to determine how the SNP rs2293303 influences BC susceptibility. Alanazi and colleagues genotyped rs13072632 and rs4135385 in CTNNB1 in 99 cases and 93 controls in a Saudi population, and found that rs4135385 was linked with increased BC risk [24]. Lee et al. analyzed 1,536 SNPs in 203 genes among 209 cases and 209 controls in Korean women, and indicated that rs4135385 was associated with decreased BC risk [25]. In our study, no association was found between rs4135385 and BC risk. The discrepancies among these results could be due to the limited sample size of each study, the ethnic diversity of the populations and complicated environmental factors.
Mostowska et al. genotyped SNPs rs4533622 and rs2953 of CTNNB1 in 228 women with ovarian cancer and 282 controls, and found no association between the two SNPs and ovarian cancer risk [43]. Similarly, no association between rs4533622 and BC risk was found in the present study. Ting and colleagues genotyped 10 tagSNPs of CTNNB1 and APC in 282 Chinese colorectal cancer patients, and found no associations between the analyzed SNPs and colorectal cancer survival [44]. Wang et al. demonstrated that rs4135385 AG/AA genotypes were significantly associated with favorable gastric cancer survival [41]. In our event-free survival analysis, we identified no association between the three SNPs of CTNNB1 and BC event-free survival. However, the stratified analysis demonstrated that rs4533622 was associated with worse BC event-free survival among patients with clinical stage 0-I tumors.
Breast cancer is a complex disease, resulting from the interaction of multiple environmental, hormonal, and lifestyle risk factors with the individual's genetic factors [45]. We therefore analyzed the gene-gene and gene-environment interactions on BC susceptibility using the GMDR method and logistic regression analysis. We observed interactions on BC risk between rs17715799 and rs13689 as well as between rs13689 and menarche-FFTP interval. Furthermore, the five at-risk SNPs and a longer menarche-FFTP interval (>11 years) jointly affected BC susceptibility. All these results suggest putative gene-gene and gene-environment interactions in BC susceptibility. In summary, this study indicated that the genetic polymorphisms of CDH1 and CTNNB1 were associated with breast cancer susceptibility and prognosis. Because the SNPs examined in this study were htSNPs of the two genes, additional studies are warranted to verify these results and identify the causal variants.
Supporting Information S1