Association Study of Germline Variants in CCNB1 and CDK1 with Breast Cancer Susceptibility, Progression, and Survival among Chinese Han Women

The CCNB1 and CDK1 genes encode the proteins CyclinB1 and CDK1, respectively, which interact with each other and are involved in cell cycle regulation, centrosome duplication and chromosome segregation. This study aimed to investigate, using haplotype-based analysis, whether genetic variants in these two genes affect breast cancer (BC) susceptibility, progression, and survival in the Chinese Han population. A total of ten tag SNPs (tSNPs) spanning from 2 kb upstream to 2 kb downstream of these genes were genotyped in 1204 cases and 1204 age-matched cancer-free controls. Haplotype blocks were determined from our genotyping data and the linkage disequilibrium (LD) status of these SNPs. For CCNB1, rs2069429 was significantly associated with increased BC susceptibility under the recessive model (OR=2.352, 95%CI=1.480-3.737), as was the diplotype TAGT/TAGT (OR=1.947, 95%CI=1.154-3.284, P=0.013). In addition, rs164390 was associated with Her2-negative BC. For CDK1, rs2448343 and rs1871446 were significantly associated with decreased BC risk under dominant models, as was the haplotype ATATT. These two SNPs also showed a dose-dependent effect on BC susceptibility. Stratified association analysis showed that, among women with BMI<23, carriers of the heterozygous or minor allele homozygous genotypes of rs2448343 had markedly lower BC susceptibility. In CDK1, three closely located SNPs, rs2448343, rs3213048 and rs3213067, were significantly associated with tumor PR status: heterozygotes of rs2448343 were associated with PR-positive tumors, while minor allele homozygotes of rs3213048 and heterozygotes of rs3213067 were associated with PR-negative tumors. In survival analysis, rs1871446 was associated with unfavorable event-free survival under the recessive model, as was the CDK1 diplotype ATATG/ATATG, which carries the minor allele homozygote of rs1871446.
Our study indicates that genetic polymorphisms of CCNB1 and CDK1 are related to BC susceptibility, progression, and survival in Chinese Han women. Independent replication in other populations is needed to verify these results.


Introduction
Breast cancer (BC) has become the most common cancer among women worldwide, and its incidence also ranks first among female cancers in China [1]. Several breast cancer predisposition genes have been identified, including the low-frequency, high-penetrance genes BRCA1, BRCA2, PTEN and p53, as well as the low-frequency, intermediate-penetrance genes CHEK2, ATM and PALB2 [2]. However, mutations in these genes explain only a small proportion of the total genetic risk of BC. According to the "common disease, common variants" hypothesis [3], BC, as a common complex disease, is also expected to be influenced by high-frequency, low-penetrance genetic variants.
Single nucleotide polymorphisms (SNPs), of which there are approximately 15 million in the human genome [4], are sites where the genomes of different people differ by a single base. Based on linkage disequilibrium (LD) theory [5], a set of informative SNPs (tag SNPs, tSNPs) can capture most of the common variation at the other SNPs in a chromosomal region, which makes genotyping tSNPs a cost-effective strategy. A set of SNP alleles located together on one chromosome within a region is called a "haplotype", and the pair of haplotypes an individual carries on the two homologous chromosomes forms a "diplotype".
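To make the tag-SNP idea concrete, the sketch below shows a greedy pairwise selection: it repeatedly picks the SNP that captures the most not-yet-tagged SNPs at r² at or above a threshold. This is an illustrative assumption, not the exact procedure used in this study; the 0.8 threshold, the function name `greedy_tag_snps`, and the SNP names and r² values are all hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedy pairwise tagging (illustrative sketch).

    Repeatedly picks the SNP that covers the most still-untagged SNPs at
    r^2 >= threshold; a SNP always covers itself.
    """

    def ld(a, b):
        # Symmetric lookup of pairwise r^2; unlisted pairs are treated as r^2 = 0
        if a == b:
            return 1.0
        return r2.get((a, b), r2.get((b, a), 0.0))

    untagged = set(snps)
    tags = []
    while untagged:
        # Candidate that would capture the largest number of untagged SNPs
        best = max(snps, key=lambda s: sum(ld(s, u) >= threshold for u in untagged))
        tags.append(best)
        untagged -= {u for u in untagged if ld(best, u) >= threshold}
    return tags


# Hypothetical SNP names and r^2 values, for illustration only
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {("snp1", "snp2"): 0.90, ("snp2", "snp3"): 0.85,
      ("snp1", "snp3"): 0.50, ("snp3", "snp4"): 0.10}
print(greedy_tag_snps(snps, r2))  # ['snp2', 'snp4']
```

Here snp2 tags snp1 and snp3 (r² ≥ 0.8 with both), while snp4 is in weak LD with everything and must be genotyped itself.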
CyclinB1 and CDK1, two crucial regulatory proteins of the centrosome, form a complex (M-phase promoting factor, MPF) that regulates entry into mitosis [11] and promotes chromosome condensation and nuclear envelope breakdown [12,13]. Overexpression of CyclinB1 has been found in brain astrocytoma, cervical carcinoma, lung cancer, and many other cancers [14,15]. CCNB1 amplification has also been reported in colorectal adenocarcinomas [16]. One study found that docetaxel could suppress the expression of CCNB1 in non-small cell lung cancer NCI-H460 cells [17]. Hui Cai and colleagues found that rs2069433 in CCNB1 was associated with a reduced endometrial cancer risk [18]. H Ma and colleagues reported that rs2069429 in CCNB1 was associated with non-small cell lung cancer survival [19]. CDK1 overexpression has been found in human gliomas [20]. Previous research in our laboratory found that CyclinB1 and CDK1 were highly expressed in BC and associated with patients' overall survival (unpublished data).
Based on these previous studies, we hypothesized that genetic variants in CCNB1 and CDK1 contribute to BC susceptibility, progression and patient survival. To test this hypothesis, we selected 4 and 6 tSNPs to represent CCNB1 and CDK1, respectively, spanning from 2 kb upstream to 2 kb downstream of each gene (chromosome 5:68,496,669...68,511,822 for CCNB1; chromosome 10:62,206,242…62,225,930 for CDK1). In this study, we comprehensively investigated the associations of tSNPs, haplotypes, and diplotypes in CCNB1 and CDK1 with BC susceptibility, clinicopathological parameters and event-free survival in the Chinese Han population.

Study Population
This study included 1204 female BC patients and 1204 unrelated cancer-free female individuals. All 1204 patients were pathologically diagnosed with primary invasive ductal breast carcinoma at Beijing Cancer Hospital between 1995 and 2007. Their epidemiological information was obtained from their clinical records, including age at diagnosis, height, weight, age at menarche and/or menopause, menopausal status, age at first full-term pregnancy and family history of cancer in first-degree relatives. Body mass index (BMI) was calculated as weight (kg) divided by the square of height (m) and was used to quantify obesity (BMI≥23 as overweight; BMI<23 as normal) [21]. All clinicopathological parameters, including ER, PR, Her2, tumor size, lymph node status, and clinical stage (based on the 6th edition of the American Joint Committee on Cancer TNM staging system), were also collected from the clinical records. Event-free survival time was defined as the time from surgery to the occurrence of a breast event, such as breast carcinoma recurrence, metastasis, or death caused by BC. Cases were censored if, at the latest follow-up (August 31, 2010), the patients were still alive, had voluntarily withdrawn, or had died of a cause other than BC. Of the 1204 cases, 48 had no surgery, 20 died of unknown causes, and 131 were lost to follow-up; therefore, 1005 cases remained in the event-free survival analysis.
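The BMI grouping and the event/censoring rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the outcome labels (`recurrence`, `metastasis`, `bc_death`, with any other label treated as censored) and the example values are hypothetical, and follow-up time is assumed to be recorded in months.

```python
from dataclasses import dataclass

BMI_CUTOFF = 23.0  # kg/m^2; the study classes BMI >= 23 as overweight

# Hypothetical outcome labels; only these count as breast events
BREAST_EVENTS = {"recurrence", "metastasis", "bc_death"}


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_group(weight_kg: float, height_m: float) -> str:
    return "overweight" if bmi(weight_kg, height_m) >= BMI_CUTOFF else "normal"


@dataclass
class EventFreeRecord:
    months: float  # time from surgery to a breast event or to censoring
    event: int     # 1 = breast event observed, 0 = censored


def event_free_record(months_from_surgery: float, outcome: str) -> EventFreeRecord:
    # Alive at last follow-up, withdrawn, or death from a non-BC cause -> censored
    return EventFreeRecord(months_from_surgery, int(outcome in BREAST_EVENTS))


print(round(bmi(60, 1.60), 2), bmi_group(60, 1.60))  # 23.44 overweight
print(event_free_record(36.0, "metastasis"))  # EventFreeRecord(months=36.0, event=1)
```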
The 1204 cancer-free female individuals were selected from a community-based screening program for non-infectious diseases conducted in Beijing. The controls were age-matched to cases by 5-year age groups. All the epidemiological information was collected from the questionnaire they completed.
This study was approved by the Peking University (PKU) IRB (reference No. IRB00001052-11029). Breast cancer samples were initially collected for research purposes in the tissue/blood biobank. Written consent was obtained from BC patients who could read and write. Verbal consent was obtained from BC patients who could not read and write; for these cases, written consent was signed by the patient's next of kin. Written consent was obtained from all controls. The IRB approved the written consent procedure. The data and samples were used anonymously. The PKU IRB approved our application to waive informed re-consent for the BC samples already collected in the tissue/blood biobank, and this study used only those samples.
Using the SNP-tagging method [23] implemented in the HaploView software 4.2 [24], we identified 4 tSNPs in CCNB1 (rs350104, rs2069429, rs164390, rs2069433) and 6 tSNPs in CDK1 (rs2448343, rs3213048, rs3213067, rs1871446, rs10711, rs1060373) to best capture the common genetic variation within the genes.

LD block determination and haplotype construction
The most probable haplotypes for each participant were estimated using PROC HAPLOTYPE in SAS 9.1. Based on the tSNP genotyping results, linkage disequilibrium (LD) between the genotyped SNPs was calculated as Lewontin's coefficient (D') and the squared correlation coefficient (r²) [24]. Haplotype blocks were then reconstructed separately in cases, controls, and all participants with the HaploView 4.2 software.
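For a pair of biallelic SNPs, D' and r² follow directly from the estimated haplotype and allele frequencies. The sketch below assumes those frequencies have already been estimated (for example by the haplotype estimation step above) and that both SNPs are polymorphic; the function name and example frequencies are hypothetical, and |D'| is returned because LD plots usually display the absolute value.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B (assumed strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D_max depends on the sign of D (Lewontin's normalisation)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies: approximately (0.6, 0.36)
print(ld_measures(0.40, 0.50, 0.50))
```

The two measures can disagree: D' = 1 only says that at most three of the four possible haplotypes are present, whereas r² also reflects how similar the allele frequencies are, which is why r² is the usual criterion for tagging.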

Statistical analysis
For each tSNP, Hardy-Weinberg equilibrium (HWE) was tested in control subjects. Two-sided t tests (for continuous variables) and chi-square (χ²) tests (for categorical variables) were used to compare cases and controls. Each tSNP was evaluated under codominant, dominant, and recessive models [26]. Two-sided chi-square tests were also used to compare genotype distributions between cases and controls and to evaluate the association of alleles or genotypes with clinical parameters. The effects of alleles or genotypes on breast carcinoma risk and progression were estimated as odds ratios (ORs) with 95% confidence intervals (95% CIs) using both univariate and multivariate logistic regression models [25,27,28]. Gene-gene and gene-environment interactions were assessed by stratified association analysis. Kaplan-Meier curves were generated for event-free survival and compared with the log-rank test. Both univariate and multivariate Cox proportional hazards models were used to estimate hazard ratios (HRs) and the corresponding 95% CIs. A two-sided P value<0.05 was considered statistically significant. All analyses were performed using the Statistical Analysis System software (SAS v9.1, SAS Institute, Cary, NC).
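As an illustration of the three genetic models and of how an OR and its 95% CI are obtained, the sketch below codes a genotype by its number of minor alleles and computes a crude OR with a Woolf (Wald) interval from a 2x2 table. For a single binary predictor without covariates, unconditional logistic regression gives the same OR and Wald interval; the adjusted ORs reported in this study additionally include covariates, which this sketch does not model. Function names and counts are hypothetical.

```python
import math


def code_genotype(minor_alleles: int, model: str) -> tuple:
    """Code a genotype (0, 1 or 2 copies of the minor allele) under a genetic model.

    Returns indicator variables relative to the common homozygote reference.
    """
    if minor_alleles not in (0, 1, 2):
        raise ValueError("minor_alleles must be 0, 1 or 2")
    if model == "codominant":  # separate effects for heterozygote and minor homozygote
        return (int(minor_alleles == 1), int(minor_alleles == 2))
    if model == "dominant":    # any copy of the minor allele vs. none
        return (int(minor_alleles >= 1),)
    if model == "recessive":   # two copies vs. zero or one
        return (int(minor_alleles == 2),)
    raise ValueError(f"unknown model: {model}")


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude OR with Woolf (Wald) 95% CI from a 2x2 table.

    a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts: 50 of 200 cases and 25 of 200 controls carry the exposure genotype
print(code_genotype(2, "recessive"), odds_ratio_wald(50, 150, 25, 175))
```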

Characteristics of the population
The demographic data were analyzed by the chi-square test (for categorical variables) and the two-sided t test (for continuous variables) (Table S1). The cases and controls appeared to be adequately matched on age (P=0.437). As expected, compared with the controls, the cases had an earlier age at menarche (P<0.0001), fewer births (P<0.0001) and an older age at first full-term pregnancy (P<0.0001). In addition, cases were more likely to have a family history of cancer (P=0.045) and a high BMI (P=0.015), and to have breastfed for less than 6 months (P<0.0001).

LD degree between SNPs
All 10 tSNPs were in Hardy-Weinberg equilibrium (P>0.1) in the controls (Table S2). Table 1 shows the allele and genotype frequency distributions of the ten tSNPs among cases and controls. The LD patterns of all tSNPs in cases, controls, and all subjects (cases plus controls) are shown in Figure 1. The haplotype block of CCNB1 in cases was consistent with that in controls, so the 4-SNP haplotype block (rs350104, rs2069429, rs164390, and rs2069433) was chosen. In CDK1, however, rs3213048 and rs10711 were in strong LD in controls but in weak LD in cases. We therefore chose the 5-SNP haplotype block for CDK1 based on our genotyping data in controls (rs2448343, rs3213048, rs3213067, rs1871446, rs10711).
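The HWE check in controls can be illustrated with the Pearson chi-square goodness-of-fit test (1 degree of freedom). The text does not state which HWE test was used, so this choice is an assumption; exact tests are also common. The function name and genotype counts are hypothetical, and all three expected counts are assumed to be non-zero.

```python
import math


def hwe_chi_square(n_common_hom: int, n_het: int, n_minor_hom: int):
    """Pearson chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_common_hom + n_het + n_minor_hom
    p = (2 * n_common_hom + n_het) / (2 * n)  # common-allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_common_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_chi_square(30, 40, 30))  # hypothetical counts; chi2 = 4.0, P ≈ 0.0455
```

A SNP passing this check in controls (as all ten tSNPs did, with P>0.1) gives no evidence of genotyping error or population stratification at that locus.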

Associations of genotypes, haplotypes, and diplotypes with BC susceptibility
Two-sided chi-square tests indicated significant case-control differences in both allele and genotype frequencies of rs2069429 (CCNB1), rs2448343 (CDK1), and rs1871446 (CDK1) (Table 1). In CCNB1, both univariate and multivariate logistic regression showed that rs2069429 (G>A) was associated with increased BC risk under the recessive model (OR=2.352, 95% CI=1.480-3.737). In CDK1, the heterozygotes and minor allele homozygotes of both rs2448343 (G>A) and rs1871446 (C>T) had a lower BC risk than the common homozygotes (Table 1). To identify which of these SNPs was more important for BC susceptibility, we performed a multiple logistic regression analysis including both SNPs in the full model. The statistical significance of rs2448343 disappeared (OR=0.935, 95% CI=0.804-1.086, P=0.379), while the OR for rs1871446 decreased slightly (OR=0.763, 95% CI=0.644-0.904, P=0.002). The joint effects of these two protective loci in CDK1 were also examined (Table 2). A dose-dependent effect of rs2448343 and rs1871446 was observed (P for trend=0.0002); that is, women harboring both protective loci had a lower BC risk than those harboring only one.
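The text reports a P for trend without naming the test. One common choice for a dose-response across ordered groups (here, carrying 0, 1 or 2 protective loci) is the Cochran-Armitage trend test, sketched below as an assumption; a logistic model with the number of protective loci as an ordinal covariate is an alternative. The function name and counts are hypothetical.

```python
import math


def cochran_armitage_trend(cases, controls, scores=None):
    """Cochran-Armitage test for trend in the case proportion across ordered groups.

    cases[i], controls[i]: counts in group i; scores default to 0, 1, 2, ...
    Returns (Z, two-sided P).
    """
    if scores is None:
        scores = list(range(len(cases)))
    n = [r + s for r, s in zip(cases, controls)]
    big_n = sum(n)
    p_bar = sum(cases) / big_n  # overall proportion of cases
    t_stat = sum(t * (r - ni * p_bar) for t, r, ni in zip(scores, cases, n))
    sum_nt = sum(ni * t for ni, t in zip(n, scores))
    sum_nt2 = sum(ni * t * t for ni, t in zip(n, scores))
    var = p_bar * (1 - p_bar) * (sum_nt2 - sum_nt ** 2 / big_n)
    z = t_stat / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts by number of protective loci carried (0, 1, 2)
z, p = cochran_armitage_trend(cases=[60, 40, 20], controls=[40, 40, 40])
print(round(z, 3), p)  # negative Z: case proportion falls as protective loci accumulate
```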
Given that age at menarche, number of births, age at first full-term pregnancy, family history of cancer and BMI are well-known risk factors for BC (Table S1), we then assessed whether these risk factors interact with the genetic variants to jointly affect BC susceptibility. We conducted a stratified association analysis of the genetic variants in CDK1 by these risk factors. The association between the AG or AA genotype of rs2448343 and decreased breast cancer risk was significant only among subjects with BMI<23 (adjusted OR=0.579, 95% CI=0.465-0.720) (Table 3). Compared with the effect of the AG or AA genotype of rs2448343 alone (adjusted OR=0.796, 95% CI=0.708-0.895) (Table 1) or of BMI<23 alone (adjusted OR=0.806, 95% CI=0.713-0.912), the combination of the AG or AA genotype of rs2448343 and BMI<23 was associated with lower BC susceptibility. No other significant association was observed in the stratified analysis.
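Stratified analysis amounts to estimating the genotype-BC association separately within each stratum of the risk factor. The sketch below computes crude stratum-specific ORs and, as one common summary across strata, the Mantel-Haenszel pooled OR; the study itself reports covariate-adjusted ORs from logistic regression, so this is an illustration only. The counts and function names are hypothetical.

```python
import math


def crude_or(a, b, c, d):
    """OR and Woolf 95% CI; a/b = carrier/non-carrier cases, c/d = carrier/non-carrier controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - 1.96 * se), math.exp(math.log(or_) + 1.96 * se)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel OR pooled over strata, each stratum given as (a, b, c, d)."""
    tables = list(tables)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical 2x2 tables of carrier status (AG/AA vs. GG) by BMI stratum
strata = {
    "BMI<23": (100, 200, 150, 170),
    "BMI>=23": (150, 250, 140, 230),
}
for name, table in strata.items():
    print(name, [round(x, 3) for x in crude_or(*table)])
print("MH pooled OR:", round(mantel_haenszel_or(strata.values()), 3))
```

A protective OR in one stratum and an OR near 1 in the other, as in these made-up tables, is the pattern that motivates reporting stratum-specific estimates rather than only a pooled one.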
As haplotype and diplotype analyses may provide more power to detect associations than single-marker analyses alone [29], we also examined the associations of haplotypes and diplotypes in CCNB1 and CDK1 with breast cancer risk. The 4-SNP haplotypes in CCNB1 showed no significant association with BC susceptibility (Table S3). However, in CCNB1, the 4-SNP haplotype pair (diplotype) TAGT/TAGT (alleles listed in the order rs350104, rs2069429, rs164390, rs2069433), which carried the minor allele homozygote of the risk SNP rs2069429, could increase about    Table S3). When it came to diplotypes, GCACG/GTGCT could increase the risk of BC (Table S4).
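Because each haplotype string lists one allele per tSNP in a fixed order, the number of minor-allele copies a diplotype carries at a given SNP can be read off directly. The sketch below uses the SNP orders and major>minor alleles stated in this paper; the helper names are hypothetical, and treating a diplotype as an unordered pair is an assumption about notation.

```python
# SNP order within each haplotype string, as given in the text
HAPLOTYPE_ORDER = {
    "CCNB1": ("rs350104", "rs2069429", "rs164390", "rs2069433"),
    "CDK1": ("rs2448343", "rs3213048", "rs3213067", "rs1871446", "rs10711"),
}
# Minor alleles stated in the text (major>minor): rs2069429 G>A, rs164390 G>T,
# rs2448343 G>A, rs1871446 C>T
MINOR_ALLELE = {"rs2069429": "A", "rs164390": "T", "rs2448343": "A", "rs1871446": "T"}


def parse_diplotype(diplotype):
    """Split 'HAP1/HAP2' into an order-independent (sorted) pair of haplotypes."""
    hap1, hap2 = (h.strip() for h in diplotype.split("/"))
    return tuple(sorted((hap1, hap2)))


def minor_allele_count(diplotype, gene, snp):
    """Number of minor-allele copies of `snp` carried by a diplotype (0, 1 or 2)."""
    position = HAPLOTYPE_ORDER[gene].index(snp)
    return sum(hap[position] == MINOR_ALLELE[snp] for hap in parse_diplotype(diplotype))


print(minor_allele_count("TAGT/TAGT", "CCNB1", "rs2069429"))   # 2
print(minor_allele_count("ATATT/ATATT", "CDK1", "rs1871446"))  # 2
print(parse_diplotype("GTACG/ ATATT"))                          # ('ATATT', 'GTACG')
```

This confirms, for example, that TAGT/TAGT is the minor allele homozygote (AA) at rs2069429, and that CGGT and TAGT both carry the major allele G at rs164390.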

Association of genotypes, haplotypes, and diplotypes with clinicopathological parameters
The associations of genotypes, haplotypes and diplotypes with clinicopathological parameters (ER status, PR status, Her2 status, tumor size, lymph node status, and clinical stage) were also examined.
In CCNB1, patients harboring the TT genotype of rs164390 (G>T) were less likely to have Her2-positive tumors (OR=0.573, 95% CI=0.379-0.868, P=0.009; Table S5). Patients harboring the 4-SNP haplotypes CGGT and TAGT, both of which carry the major allele of rs164390, were more likely to have Her2-positive BC than patients harboring the common haplotype TGTT.

Associations of genotypes, haplotypes, and diplotypes with event-free survival
First, the association of the clinicopathological parameters with event-free survival was analyzed. As expected, aggressive clinicopathological features, such as PR-negative status, Her2-positive status, tumor size >2 cm, lymph node metastasis and clinical stage II-IV, were all associated with worse survival in both the Kaplan-Meier log-rank analysis and the Cox proportional hazards model (Table 4).
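The Kaplan-Meier estimate and the two-group log-rank test used for event-free survival can be sketched in plain Python as follows. The analyses in this study were run in SAS; this is only an illustration of the calculations, with hypothetical function names and data, and it does not cover the Cox proportional hazards model. The log-rank variance uses the usual hypergeometric form, and the sketch assumes at least one event so that the variance is non-zero.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(time, survival)] at each event time.

    events[i] is 1 for a breast event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t, n_events, n_leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censorings both leave the risk set
    return curve


def logrank(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square with 1 df, P)."""
    pooled = [(t, e, 0) for t, e in zip(times1, events1)] + \
             [(t, e, 1) for t, e in zip(times2, events2)]
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n1 = sum(1 for tt, _, g in pooled if g == 0 and tt >= t)
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        d1 = sum(1 for tt, e, g in pooled if g == 0 and tt == t and e)
        d = sum(1 for tt, e, _ in pooled if tt == t and e)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events, group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical follow-up times (months) and event indicators
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))  # survival 0.8, 0.6, 0.3
print(logrank([5, 8, 12, 20], [1, 1, 0, 1], [10, 15, 30, 36], [0, 1, 0, 0]))
```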

Discussion
The Chinese Han population is the largest ethnic group in the People's Republic of China, constituting about 92% of its population. Many studies have investigated the associations between SNPs and breast cancer susceptibility in the Chinese Han population. Some SNPs in DNA repair-related genes, such as APE1, XRCC1, ERCC1 and XPF [30][31][32], were found to be associated with breast cancer susceptibility in this population, as were some SNPs in cell-cycle genes such as CCNE1 and CDK2 [33]. Some meta-analyses also found that SNPs in the MDR, MTR, SLC4A7, ATR and CHEK1 genes were significantly associated with breast cancer susceptibility [34][35][36][37]. However, to our knowledge, this is the first study to comprehensively evaluate the association of germline variation in CCNB1 and CDK1, two essential centrosome-regulating cell-cycle genes, with BC risk, progression and survival in the Chinese Han population.
CCNB1 and CDK1 encode CyclinB1 and CDK1, two critical proteins that interact with each other in the cell cycle. Accumulating evidence has suggested that overexpression of CyclinB1 and CDK1 could contribute to cancer risk and progression [14,15,20]. In this study, we hypothesized that genetic variation in CCNB1 and CDK1 has a substantial impact on the susceptibility, progression and survival of breast cancer. In a case-control study of 1204 breast cancer patients and 1204 age-matched controls, we genotyped 10 tSNPs in these two genes.
For CCNB1, 4 tSNPs (rs350104, rs2069429, rs164390 and rs2069433) were analyzed, and these 4 SNPs formed a single haplotype block in our control population. rs2069429 was significantly associated with increased BC susceptibility under the recessive model. Diplotype analysis in CCNB1 showed that carriers of the diplotype TAGT/TAGT, which contains the minor allele homozygote of rs2069429, were more likely to have BC than carriers of the common diplotypes. The SNP rs2069429 is located about 0.1 kb upstream of CCNB1, in a region that may regulate CCNB1 transcription. Notably, Hui Cai and colleagues genotyped 3 tagging SNPs of CCNB1 in 1449 newly diagnosed endometrial cancer cases from the Shanghai Cancer Registry in China and found that rs2069433 was associated with a reduced endometrial cancer risk [18]; however, no significant association was observed between rs2069433 and breast carcinoma susceptibility in our study. H Ma and colleagues genotyped four SNPs of CCNB1 (rs352626, rs350104, rs2069429, rs164390) in 828 non-small cell lung cancer (NSCLC) cases and found that rs2069429 was associated with NSCLC survival with a log-rank P<0.1 under the recessive model [19]. In our survival analysis, no significant association was observed for CCNB1. This discrepancy may be explained as follows. First, our cases were 1204 breast carcinomas, whereas theirs were 828 NSCLC patients. Second, we considered a two-sided P value<0.05 statistically significant rather than P<0.1. Our results also showed that the TT genotype of rs164390 in CCNB1 was associated with Her2-negative tumors, consistent with the findings that the haplotypes CGGT and TAGT were associated with Her2-positive tumors and the diplotype TGTT/TGTT was associated with Her2-negative tumors. To our knowledge, no previous study has analyzed the association of rs164390 with Her2 status in any tumor. The SNP rs164390 is located in the 5'-UTR of CCNB1.
The 5'-UTR has mainly been implicated in translational control, affecting post-transcriptional processes such as mRNA stability, folding, and interactions with the ribosomal machinery [38,39]. Nevertheless, there is also the
Table 4. Association analysis of the clinicopathological parameters in relation to event-free survival of breast cancer patients (n=1005).
For CDK1, 6 tSNPs were genotyped: rs2448343, rs3213048, rs3213067, rs1871446, rs10711 and rs1060373. The first five SNPs formed a 5-SNP haplotype block in our control population. The minor alleles of rs2448343 and rs1871446 were significantly associated with lower BC risk. Multiple logistic regression analysis including both SNPs in the full model indicated that rs1871446 had a stronger effect on reducing BC risk than rs2448343. These two SNPs showed a dose-dependent effect on BC risk (P for trend=0.0002). Haplotype analysis also showed that ATATT, which contains the minor alleles of both rs2448343 and rs1871446, was significantly associated with lower BC susceptibility. For gene-gene and gene-environment interaction analysis, some previous studies used multifactor dimensionality reduction (MDR) [40,41]. MDR is a nonparametric, genetic-model-free method designed to overcome the limitations of small sample sizes. Because our sample size was sufficient, we used stratified association analysis to test the interaction between tSNPs and clinical parameters on BC risk. We observed a significant joint effect of rs2448343 and BMI status on BC susceptibility: compared with the effect of the AG or AA genotype of rs2448343 alone or of BMI<23 alone, the combination of the AG or AA genotype of rs2448343 and BMI<23 was associated with lower BC susceptibility. The SNP rs2448343 is located in an intron of CDK1; intronic variants may influence disease risk by affecting mRNA expression levels, alternative splicing, mRNA structure and mRNA stability [42,43].
The SNP rs1871446 is located in the 3'-UTR of CDK1, a region important for mRNA stability [44,45] and localization [46]. The 3'-UTR may also contain miRNA binding sites. Our results also indicated that three closely located SNPs, rs2448343, rs3213048 and rs3213067, were significantly associated with tumor PR status: heterozygotes of rs2448343 were associated with PR-positive tumors, while minor allele homozygotes of rs3213048 and heterozygotes of rs3213067 were associated with PR-negative tumors. Haplotype analysis indicated that patients with GTACG, ATACT, and GTATT were more likely to develop PR-positive tumors than those with the common haplotype GCACG. In addition, compared with the common diplotype GCACG/GTACG, GTACG/ATATT was associated with less aggressive tumors, such as node-negative, ≤2 cm, and clinical stage 0-I tumors, while GTGCT/ATACT was associated with less aggressive tumors, such as ER-positive or PR-positive tumors. This is consistent with the survival analysis, in which patients harboring the diplotypes GTACG/ATATT and GTGCT/ATACT had favorable event-free survival. In a survival analysis of NSCLC, H Ma and colleagues genotyped 3 SNPs of CDK1 (rs2127355, rs2170006 and rs1871446) but observed no significant association between these SNPs and NSCLC survival [19]. In our study, the minor allele homozygote TT of rs1871446 was associated with unfavorable breast carcinoma survival under the recessive model. Diplotype analysis also showed that ATATT/ATATT, which carries the minor allele homozygote of rs1871446, had a negative impact on event-free survival.
In summary, in CCNB1, rs2069429 and the diplotype TAGT/TAGT were associated with increased BC susceptibility. In CDK1, rs2448343 and rs1871446 were associated with decreased BC risk, as was the CDK1 haplotype ATATT. These two SNPs also showed a dose-dependent effect on BC susceptibility. Notably, the minor allele homozygote of rs1871446 was associated with unfavorable event-free survival, as was the diplotype ATATT/ATATT in CDK1. Nevertheless, these results need to be verified in other populations, and functional studies are needed to determine how these SNPs influence BC susceptibility and event-free survival.