The Relationship between Seven Common Polymorphisms from Five DNA Repair Genes and the Risk for Breast Cancer in Northern Chinese Women

Background Converging evidence supports the central role of DNA damage in progression to breast cancer. We therefore aimed to assess the potential interactions of seven common polymorphisms from five DNA repair genes (XRCC1, XRCC2, XRCC3, XPA and APEX1) in association with breast cancer among Han Chinese women. Methodology/Principal Findings This was a case-control study involving 606 patients diagnosed with sporadic breast cancer and 633 age- and ethnicity-matched cancer-free controls. The polymerase chain reaction-ligase detection reaction method was used to determine genotypes. All seven polymorphisms were in accordance with Hardy-Weinberg equilibrium in controls. Differences in the genotypes and alleles of XRCC1 gene rs25487 and XPA gene rs1800975 were statistically significant between patients and controls, even after the Bonferroni correction (P<0.05/7). Accordingly, the risk for breast cancer was significantly increased for rs25487 (OR = 1.28; 95% CI: 1.07–1.51; P = 0.006), but decreased for rs1800975 (OR = 0.77; 95% CI: 0.67–0.90; P = 0.001) under an additive model at a Bonferroni corrected alpha of 0.05/7. Allele combination analysis showed a higher frequency of the most common combination C-G-G-C-G-G-G (alleles in order of rs1799782, rs25487, rs3218536, rs861539, rs1800975, rs1760944 and rs1130409) in controls than in patients (PSim = 0.002). In further interaction analysis, the two-locus model including rs1800975 and rs25487 was deemed the overall best model, with the maximal testing accuracy of 0.654 and a cross-validation consistency of 10 out of 10 (P = 0.001). Conclusion Our findings provide clear evidence that XRCC1 gene rs25487 and XPA gene rs1800975 might exert both independent and interactive effects on the development of breast cancer among northern Chinese women.


Introduction
Breast cancer is the most common invasive cancer in women, and like other forms of cancer it results from multiple hereditary and environmental modulators, possibly in an interactive manner. Established risk factors such as ionizing radiation and alcohol consumption account for approximately 30% of breast cancer cases [1]. Family studies found that the risk for first-degree relatives of affected individuals is more than twice that of the general population [2,3], confirming a strong genetic component underlying the etiology of breast cancer [4]. Pasche et al have written an excellent review on the genetic underpinnings of breast cancer [5]; however, determining how many genes and which genetic determinants are actually involved in the pathogenesis of breast cancer remains an interpretive challenge.
Converging evidence supports the central role of DNA damage in progression to breast cancer. In fact, exposure to ionizing radiation, which can cause double-strand DNA breaks, increases the risk of developing breast cancer [6,7]. In-vitro studies also observed that radiation-induced damage can remarkably reduce the repair proficiency of DNA double-stranded breaks in breast cancer patients [8]. As such, it is reasonable to hypothesize that deficiency in DNA repair proteins induced by genetic mutations can initiate or aggravate the development of breast cancer. However, most published studies assessing the relationship between DNA repair genes and breast cancer risk have focused on a single gene or a single polymorphism and have overlooked potential gene-to-gene interactions, a ubiquitous phenomenon in human genetics. Accordingly, we conducted the present study to assess the genetic interactions of seven common polymorphisms from five DNA repair genes in association with breast cancer among Han Chinese women.

Study participants
In total, 1239 study participants were enrolled on a hospital-based design from Chengde city, Hebei province, China. Approval of this study was obtained from the Ethics Committee of Chengde Medical College, and each participant read and signed the informed consent before entering this study, which was carried out according to the principles of the Declaration of Helsinki.
All breast cancer patients had no prior history of any cancer, reported no family history of breast cancer, and were newly diagnosed with invasive ductal carcinoma on the basis of pathological confirmation; they then received surgical intervention plus adjuvant chemotherapy at the Affiliated Hospital of Chengde Medical College. Clinical information on breast cancer was obtained via a full clinical examination by specialists. All controls were women who underwent breast cancer screening and were clinically confirmed to be free of breast cancer at the same hospital, and they had a negative history of all forms of cancer in their first-degree relatives. All study participants were genetically unrelated women of Han Chinese descent who were consecutively recruited from the Affiliated Hospital of Chengde Medical College between September 2009 and March 2013.
All study participants were classified into two study groups: the breast cancer group and the cancer-free control group. Overall, 606 patients with a mean age of 54.36 years (standard deviation: 12.33) were diagnosed with sporadic breast cancer, and the remaining 633 participants, who had no manifestation of cancer, formed the age- and ethnicity-matched control group with a mean age of 55.15 years (standard deviation: 9.38).
At enrollment, baseline data on age, family history of cancers, age at menarche and menopausal status were recorded. Moreover, additional data on tumor size (from T1 to T4), tumor grade (from G1 to G3), and lymph node status (positive or negative) were recorded exclusively for breast cancer patients.
Genotyping
EDTA blood samples were obtained from all study participants at the time of enrollment. Genomic DNA was isolated from peripheral blood leukocytes by using TIANamp Blood DNA Kit (Tiangen Biotect Co., Beijing, China), and was then stored at −40°C until required for batch genotyping. The polymerase chain reaction-ligase detection reaction (PCR-LDR) method [16] was adopted to determine the genotypes of the seven examined polymorphisms in this study.
To discriminate specific bases of each polymorphism, we synthesized two specific probes and one common probe, and the common probe was labeled with 6-carboxy-fluorescein (FAM) at the 3′ end and phosphorylated at the 5′ end. The multiplex ligation reaction was conducted in a volume of 10 μl containing 2 μl of PCR product, 1 μl of 10× Taq DNA ligase buffer, 1 mM of each discriminating probe, and 5 U of Taq DNA ligase. After the ligation reaction, 1 μl of LDR reaction product was mixed with 1 μl of ROX passive reference and 1 μl of loading buffer before being denatured at 95°C for 3 min and chilled rapidly on ice. The fluorescent products of the LDR were differentiated using an ABI 3730XL sequencer (Applied Biosystems, California, USA).
To test the accuracy of the PCR-LDR method, 48 DNA samples were randomly selected and run in duplicates with 100% concordance.

Statistical analysis
Continuous and categorical variables were compared between breast cancer patients and controls by the unpaired t-test and the χ² test, respectively. A Pearson goodness-of-fit test was conducted to assess the Hardy-Weinberg equilibrium. Binary logistic regression models were used to evaluate the additive (major homozygotes versus heterozygotes versus minor homozygotes), dominant (major homozygotes versus heterozygotes plus minor homozygotes), and recessive (major homozygotes plus heterozygotes versus minor homozygotes) models of inheritance after controlling for age at enrollment, and risk estimates were expressed as odds ratio (OR) and 95% confidence interval (95% CI). The statistical analyses described above were completed with the SAS software for Windows (version 8.1) (SAS Institute, Cary, North Carolina, USA). Statistical power was estimated by PS (Power and Sample Size Calculations) software (version 3.0.7, Nashville, TN, USA).
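The Hardy-Weinberg goodness-of-fit test and the three genotype codings described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SAS procedure used in the study: the function names `hwe_chi_square` and `code_genotype` are hypothetical, the genotype counts in the usage note are invented, and the sketch assumes a biallelic locus with all three expected counts greater than zero.

```python
import math


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Pearson goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major-allele frequency
    q = 1.0 - p                              # minor-allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def code_genotype(minor_allele_count, model):
    """Numeric predictor for a genotype carrying 0, 1 or 2 minor alleles."""
    if model == "additive":    # 0 versus 1 versus 2 minor alleles
        return minor_allele_count
    if model == "dominant":    # major homozygotes versus all carriers
        return int(minor_allele_count >= 1)
    if model == "recessive":   # minor homozygotes versus everyone else
        return int(minor_allele_count == 2)
    raise ValueError(f"unknown model: {model}")
```

For example, `hwe_chi_square(25, 50, 25)` describes a hypothetical sample in perfect equilibrium, and `code_genotype(1, "dominant")` codes a heterozygote as a carrier. The coded value would then enter a logistic regression together with age at enrollment.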
Analysis of allele combinations was adopted to examine the joint effect of seven polymorphisms on breast cancer risk, and their frequencies were estimated by the haplo.em program implemented in Haplo.stats software (version 1.4.0, Rochester, MN, USA). The haplo.em program computes the maximum likelihood estimates of allele combination probabilities using the progressive insertion algorithm, which progressively inserts batches of loci into allele combinations of growing length. To avoid false-positive results, only allele combinations with a frequency of over 3% in all study participants were considered in this analysis. P values were calculated based on 1000 simulations.
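To give a feel for how maximum likelihood estimation of allele combination frequencies works, the following sketch runs a plain expectation-maximization loop for just two biallelic loci. It is a deliberate simplification and not the progressive insertion algorithm of haplo.em: the function name `em_two_locus` is hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1 or 2), and missing data are not handled.

```python
def em_two_locus(genotypes, n_iterations=50):
    """Estimate frequencies of the four allele combinations at two loci.

    genotypes: list of (a, b) minor-allele counts, each 0, 1 or 2.
    Returns a dict keyed by (allele at locus 1, allele at locus 2),
    where 0 is the major allele and 1 the minor allele.
    """
    keys = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = dict.fromkeys(keys, 0.25)  # uninformative starting values
    total = 2 * len(genotypes)         # two chromosomes per person
    for _ in range(n_iterations):
        counts = dict.fromkeys(keys, 0.0)
        for a, b in genotypes:
            if a == 1 and b == 1:
                # Double heterozygote: phase is ambiguous, either
                # (0,0)+(1,1) or (0,1)+(1,0). E-step: weight each phase
                # by its probability under the current frequencies.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1.0 - w
                counts[(1, 0)] += 1.0 - w
            else:
                # At most one heterozygous locus: phase is unambiguous.
                counts[(int(a == 2), int(b == 2))] += 1.0
                counts[(int(a >= 1), int(b >= 1))] += 1.0
        # M-step: new frequencies are the expected counts per chromosome.
        freqs = {k: counts[k] / total for k in keys}
    return freqs
```

With seven loci the number of possible allele combinations grows to 2^7 = 128, which is why dedicated software adds loci in batches rather than enumerating every phase at once.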
To explore the potential interactions of multiple polymorphisms of DNA repair genes, a promising open-source data-mining approach, multifactor dimensionality reduction (MDR), was employed (version 3.0, available at the website http://www.epistasis.org) [17,18]. This approach aims to identify the best combination of loci for each model size (from one locus to seven loci). The accuracy of each best model was evaluated by a Bayes classifier in the context of 10-fold cross-validation. A single best model has the maximal testing accuracy and cross-validation consistency simultaneously. The cross-validation consistency is the number of the 10 divisions of the dataset in which the same best model is identified. Permutation testing corrects for multiple testing by repeating the entire analysis on 1000 datasets that are consistent with the null hypothesis.
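The core idea of MDR can be illustrated with a short sketch: each multilocus genotype cell is labeled high or low risk from the training data, and the resulting one-dimensional rule is scored on held-out data. The code below is an assumption-laden illustration rather than the MDR 3.0 software: the function names are hypothetical, the risk rule is the simple case:control ratio threshold commonly described for MDR (not necessarily the classifier the software applies), accuracy is taken as balanced accuracy, and cells unseen in training are treated as low risk.

```python
from collections import Counter


def mdr_fit(genotypes, labels, loci):
    """Label each multilocus genotype cell as high risk (True) or low risk (False).

    genotypes: list of tuples of genotype codes; labels: 1 = case, 0 = control;
    loci: indices of the loci forming the model. Assumes both groups are present.
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in loci)
        (cases if y == 1 else controls)[cell] += 1
    threshold = sum(cases.values()) / sum(controls.values())  # overall case:control ratio
    high_risk = {}
    for cell in set(cases) | set(controls):
        if controls[cell] == 0:
            high_risk[cell] = True  # only cases observed in this cell
        else:
            high_risk[cell] = cases[cell] / controls[cell] >= threshold
    return high_risk


def balanced_accuracy(high_risk, genotypes, labels, loci):
    """Mean of sensitivity and specificity of the high/low risk rule."""
    tp = tn = fp = fn = 0
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in loci)
        predicted = high_risk.get(cell, False)  # unseen cells: low risk (assumption)
        if y == 1:
            tp, fn = tp + predicted, fn + (not predicted)
        else:
            fp, tn = fp + predicted, tn + (not predicted)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2
```

In a 10-fold cross-validation, `mdr_fit` would be run on nine tenths of the data for every candidate set of loci, and `balanced_accuracy` on the remaining tenth; the cross-validation consistency counts how often the same set of loci wins across the 10 divisions.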

Baseline characteristics
Details of the study population are shown in Table 1. Age at enrollment did not differ significantly between breast cancer patients and controls (P = 0.205). The percentage of family history
Table 2 shows the genotype and allele comparisons of seven polymorphisms under study between patients and controls and their risk prediction for breast cancer under three genetic models of inheritance. No deviation from Hardy-Weinberg equilibrium was noted in controls for any polymorphism. The genotypes and alleles of XRCC1 gene rs25487 and XPA gene rs1800975 differed significantly between the two groups, even after the Bonferroni correction (Bonferroni significance threshold: P = 0.05 divided by the total number of examined polymorphisms (n = 7), that is, P = 0.007). The power to reject the null hypothesis of no differences in mutant allele frequencies of rs25487 and rs1800975 between the two groups was 81.9% and 92.8%, respectively. There was marginal significance in the alleles of XRCC3 gene rs861539 (P = 0.037) and in the genotypes of APEX1 gene rs1130409 (P = 0.026), but neither remained significant after the Bonferroni correction.
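The Bonferroni threshold used here is simple arithmetic, sketched below for concreteness. The helper name `bonferroni_threshold` and the dictionary labels are chosen for this illustration only; the P values are those quoted in this paper (the additive-model values for rs25487 and rs1800975, and the marginal allele and genotype values for rs861539 and rs1130409).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# P values quoted in the paper; the keys are labels chosen for this sketch.
reported_p = {
    "rs25487": 0.006,
    "rs1800975": 0.001,
    "rs861539 (allele)": 0.037,
    "rs1130409 (genotype)": 0.026,
}

threshold = bonferroni_threshold(0.05, 7)  # about 0.0071
significant = [name for name, p in reported_p.items() if p < threshold]
```

Only rs25487 and rs1800975 fall below the corrected threshold, whereas the two marginal signals pass the uncorrected 0.05 level but not 0.05/7.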
In addition, given that 10.56% of breast cancer patients (n = 64) had reported a family history of other forms of cancer, further subgroup analysis was conducted by excluding these 64 breast cancer patients in order to eliminate the potential confounding of family history (Table S1). Overall there were no material changes in the comparative results and risk estimates of all polymorphisms under study, except for a slight change for rs861539 in XRCC3 gene. The association of this polymorphism with breast cancer was slightly strengthened (P for χ² test: 0.036 for genotype and 0.010 for allele), although significance was not reached after applying the stringent Bonferroni correction (Bonferroni significance threshold P = 0.05/7).

Allele combination analysis
The joint effect of seven polymorphisms under study from five DNA repair genes is summarized in Table 3. The most common allele combination was C-G-G-C-G-G-G (alleles in order of rs1799782, rs25487, rs3218536, rs861539, rs1800975, rs1760944 and rs1130409), which was overrepresented in controls (6.66% versus 2.96% in breast cancer patients, PSim = 0.002, significant at a Bonferroni corrected alpha of 0.05/11). Except for this combination, there was no observable significance for the other allele combinations.

Interaction analysis
A data-mining analytical approach, MDR, was adopted to explore the potential interactions of multiple polymorphisms of five DNA repair genes, and the results are summarized in Table 4. The best model of each size is judged by testing accuracy and cross-validation consistency. Overall, the two-locus model including rs1800975 and rs25487 emerged as the best MDR model. This model had the maximal testing accuracy of 0.654 and the maximal cross-validation consistency of 10 out of 10, and was significant at P = 0.001, indicating that a model this good or better was observed in only one out of 1000 permutations and is thus unlikely under the null hypothesis of no association.
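The permutation P value quoted above can be understood through a generic sketch: case-control labels are shuffled many times, the analysis is repeated on each shuffled dataset, and the P value is the fraction of shuffles that score at least as well as the real data. The function name `permutation_p_value`, the fixed random seed and the `statistic` callback are assumptions of this sketch; in the study the statistic would be the outcome of the entire MDR analysis.

```python
import random


def permutation_p_value(statistic, genotypes, labels, n_permutations=1000, seed=0):
    """Fraction of label permutations scoring at least as well as the real data.

    statistic: callable taking (genotypes, labels) and returning a number,
    with larger values meaning a better model.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(genotypes, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        if statistic(genotypes, shuffled) >= observed:
            at_least_as_good += 1
    return at_least_as_good / n_permutations
```

Under this definition, one permuted dataset out of 1000 scoring as well as the observed model corresponds to P = 0.001.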

Discussion
In this study, we sought to explore the potential interactions of seven common polymorphisms of five DNA repair genes in association with breast cancer among 1239 Han Chinese women.
The key finding was that two polymorphisms, XRCC1 gene rs25487 and XPA gene rs1800975, might exert both independent and interactive effects on the development of breast cancer. This study, to the authors' knowledge, is the first report assessing the association of multiple DNA repair genes and polymorphisms, both individually and interactively, with breast cancer risk in Han Chinese women.
In view of the ubiquity of epistasis in determining susceptibility to common human diseases [19], examining the interactions of multiple genes in common pathologic pathways should be a priority. In this context, MDR has been developed as a promising data-mining approach for overcoming some limitations of traditional parametric statistics such as logistic regression for the detection and characterization of high-order gene-gene and gene-environment interactions [17,18]. This approach is nonparametric and model-free in design, and has been successfully applied to detect and characterize high-order gene-gene and gene-environment interactions in studies with relatively small samples [20,21]. For the present study, application of MDR to the breast cancer case-control data set identified a statistically significant two-locus best model from five DNA repair genes. It is not surprising that the two polymorphisms in the overall best model were strikingly significant in our single-locus analysis, reinforcing the robustness of the MDR approach. Moreover, the interactive role of these two polymorphisms was particularly evident in protection against the development of breast cancer, as our allele combination analysis indicated that the estimated frequencies of combinations were consistently higher in controls than in patients for those carrying rs25487-G and rs1800975-G alleles, especially for the most common allele combination. Although empirical and theoretical studies have suggested that MDR is a useful method for identifying epistasis, the power of MDR in the presence of noise that is common to many epidemiological studies is unknown. Furthermore, we cannot exclude the possible existence of residual confounding from incompletely measured or unmeasured physiologic covariates. However, considering the magnitude of risk estimates and the mutual validation of different analytical methods, it seems unlikely that our findings could be explained by confounding.
Table 3. The allele combinations of seven polymorphisms under study between breast cancer patients and controls.
Although epidemiological studies on DNA repair genes and breast cancer risk have been undertaken extensively across different populations, the results are inconsistent and inconclusive. For example, Roberts et al observed in Caucasians an increased risk of breast cancer associated with XRCC1 gene rs25487 in postmenopausal women [22], which was consistent with the results of the present study, as well as with a recent meta-analysis by Wu et al of 44 independent case-control studies [23]. However, Al Mutairi et al failed to confirm this association in Saudi patients, and instead found that another polymorphism in XRCC1 gene, rs1799782, was significantly associated with the risk of breast cancer [24]. Besides environmental and cultural divergences, it cannot be ruled out that linkage disequilibrium patterns, shaped by evolutionary history, vary significantly across ethnic populations. A locus may be in close linkage with a nearby causal locus in one ethnic group but not in another [25]. As a consequence, there is a need to construct a database of breast cancer-susceptibility genes or polymorphisms in each racial/ethnic group. It is also of clinical importance to incorporate joint and synergistic analytical strategies for potential disease-susceptibility genetic defects by scanning DNA repair genes, to facilitate the identification of individuals at high risk of developing breast cancer in future clinical screening.
Several limitations of the present study merit consideration. First, this study was conducted on a retrospective case-control design, which has inherent drawbacks and precludes causal inferences [26]. Second, this study of 1239 participants might not be sufficiently powered to detect small risk effects. Third, owing to a flaw in our design, some baseline data on age at menarche and menopausal status, as well as on other reproductive risk factors for breast cancer, were not available for controls, which prevented further adjustment of risk estimates and may have led to overestimation of the true effect size. However, this lack of information is unlikely to affect the validity of our findings, because this study involved homogeneous breast cancer patients and well-matched controls. Fourth, only seven common polymorphisms from five DNA repair genes were evaluated in this study, and future studies are encouraged to incorporate other polymorphisms, especially the low-penetrance polymorphisms of DNA repair genes. Fifth, although MDR is a method to improve the identification of polymorphism combinations associated with disease risk, it is not without drawbacks, such as computational intensiveness, indistinct interpretation, lack of sensitivity, and heterogeneity-free assumption [27,28]. Last but not least, because our study sample was entirely of Han Chinese ancestry, we avoided confounding by ethnicity but at the same time reduced the generalizability of our findings to other ethnic populations.
In conclusion, our findings provide clear evidence that XRCC1 gene rs25487 and XPA gene rs1800975 might exert both independent and interactive effects on the development of breast cancer. As breast cancer is a multifactorial complex disorder, large well-designed longitudinal studies attempting to account for high-order gene-gene and gene-environment interactions, as well as in-vitro and in-vivo studies seeking to provide biological or clinical implications of DNA repair genes in susceptibility to breast cancer, are required in future investigation.

Supporting Information
Table S1 Genotype distributions and allele frequencies of seven polymorphisms under study between breast cancer patients without a family history of other cancers and controls, as well as their risk prediction for breast cancer under three genetic models of inheritance. (DOC)