Genetic Variation in the Base Excision Repair Pathway, Environmental Risk Factors, and Colorectal Adenoma Risk

Cigarette smoking, high alcohol intake, and low dietary folate levels are risk factors for colorectal adenomas. Oxidative DNA damage caused by these three factors can be repaired through the base excision repair (BER) pathway. We hypothesized that genetic variation in BER genes might modify colorectal adenoma risk. In a sigmoidoscopy-based study, we examined associations between 182 haplotype-tagging SNPs in 14 BER genes and colorectal adenoma risk, and examined their potential role as modifiers of the effects of cigarette smoking, alcohol intake, and dietary folate levels. Among all individuals, no statistically significant associations between BER SNPs and adenoma risk persisted after correction for multiple comparisons. However, two SNPs in FEN1 and one in NTHL1 were associated with colorectal adenoma risk among Asian-Pacific Islanders, as was one SNP in APEX1 among African-Americans. Significant associations were also observed between SNPs in the NEIL2 gene and rectal adenoma risk. Three SNPs modified the effect of smoking (MUTYH interaction p = 0.002; OGG1 interaction p = 0.013; FEN1 interaction p = 0.013), one SNP in LIG3 modified the effect of alcohol consumption (interaction p = 0.024), and two SNPs in LIG3 modified the effect of dietary folate (interaction p = 0.001 and p = 0.08) on colorectal adenoma risk. These findings support a role for genetic variants in the BER pathway as potential modifiers of colorectal adenoma risk, and they strengthen the evidence that oxidative damage induced by key lifestyle and dietary risk factors contributes to colorectal adenoma formation.


Introduction
Colorectal cancer (CRC) is the third most commonly diagnosed cancer in both men and women and the third leading cause of cancer death in the United States [1]. In 2011, approximately 141,210 individuals were diagnosed with CRC and 49,380 died from this disease in the United States [1]. More than 80% of CRCs evolve from neoplastic adenomatous polyps, or adenomas [2]. The prevalence of adenomas is estimated at 30% at age 50 and 40% by age 60 [3]. Compared to small (<1 cm) adenomas, larger ones (≥1 cm) have a greater potential to grow and progress to CRC [4]. Endoscopic screening with polyp removal has significantly reduced CRC incidence and mortality [5,6], reinforcing the pathogenetic relationship between adenomas and CRC. Therefore, the identification of risk factors for adenoma development has significant public health implications.
Cigarette smoking [7,8] and folate deficiency [9] are established adenoma risk factors, whereas folate supplementation may promote progression of established adenomas [10]. Cigarette smoke contains reactive oxygen species (ROS) and chemical carcinogens [11], which can damage DNA [11,12]. Additionally, smoking has been associated with decreased folate levels [13], which may contribute to folate deficiency; folate deficiency, in turn, is associated with DNA double-strand breaks and abasic sites [14].
Alcohol consumption is currently considered a convincing CRC risk factor [15,16], and increasing alcohol intake has previously been shown to reduce folate levels [17,18]. Alcohol metabolism generates acetaldehyde, a known mutagen that generates ROS in the colon lumen [19,20], directly inhibits O6-methylguanine-DNA methyltransferase, an enzyme involved in removing alkylation-induced DNA damage, and leads to hyperregeneration of crypt cells, which induces multiple forms of DNA damage [19,21]. In addition, alcohol metabolism can induce cytochrome P450 enzymes, which can increase activation of chemical pro-carcinogens [19,22,23].
Overall, smoking, alcohol consumption, and decreased folate availability can lead to oxidative DNA damage and excess uracil misincorporation, all of which may contribute to adenoma development. The base excision repair (BER) pathway is the predominant mechanism for repairing oxidative DNA damage [24]. We hypothesized that genetic variation in BER genes may modify the risk of colorectal adenomas, particularly in combination with smoking, alcohol, and low dietary folate. Using data and samples from a sigmoidoscopy-based study conducted in Los Angeles County, we comprehensively investigated associations between single nucleotide polymorphisms (SNPs) in 14 BER genes and distal colorectal adenoma risk, and considered their role as potential modifiers of the effects of cigarette smoking, alcohol consumption, and dietary folate intake.

Ethics Statement
The research protocol was approved by the institutional review boards at both USC and Kaiser Permanente, and all subjects signed a written informed consent approved by both institutions.

Study Subjects
Study participants were enrolled in a University of Southern California/Kaiser Permanente study of risk factors for colorectal adenomas. All individuals were examined by flexible sigmoidoscopy from 1991 to 1993 (phase I) or from 1993 to 1995 (phase II) at one of two southern California Kaiser Permanente clinics (Bellflower or Sunset) and were recruited using identical criteria, as we have previously described [25,26]. Briefly, cases were individuals with a first-time diagnosis of a histologically confirmed adenoma. Controls were selected from the remaining eligible individuals who had no polyps at sigmoidoscopy and no past history of histologically confirmed adenomas. Controls were individually matched to cases on gender, age (within 5 years), sigmoidoscopy date (within 3 months), and Kaiser Permanente clinic. All subjects signed a written informed consent approved by the Institutional Review Board, donated a blood sample, and completed two questionnaires. A risk factor questionnaire, administered during an in-person interview, collected data on demographics, smoking history, family history of cancer, physical activity, and other factors described previously [27,28]. A semi-quantitative food frequency questionnaire assessed diet during the year before the sigmoidoscopy examination, as previously described [29].

SNP Selection and Genotyping
TagSNPs for each BER gene were selected with Haploview Tagger [30] based on the HapMap CEPH (CEU) population, using the following criteria: minor allele frequency (MAF) ≥5%, pairwise r² ≥0.95, and a distance from the closest SNP greater than 60 base pairs on the Illumina platform. Linkage disequilibrium (LD) blocks were defined using data from HapMap data release #16c.1 (June 2005; NCBI B34 assembly, dbSNP b124). For each gene, we included the 5′-most and 3′-most SNPs within the LD block, within ~10 kb upstream and ~5 kb downstream. In regions of no or low LD, tagSNPs with MAF ≥5% were selected from either HapMap or dbSNP at a density of ~1 per kb. In this analysis, we report results for 182 tagSNPs across 14 genes that participate in the BER pathway (Table 1 and Table S1). TagSNPs were genotyped on the Illumina GoldenGate platform, as we previously described [31]. We required that all 182 BER tagSNPs have call rates ≥0.90 and show no statistically significant deviation of observed from expected genotype frequencies under Hardy-Weinberg equilibrium (HWE) in exact tests, using a Bonferroni-corrected threshold of p ≥0.00027 (α = 0.05/182). All tagSNPs met both requirements. Additionally, we required individuals to have overall call rates ≥90%, which led to the exclusion of 89 individuals from analyses. Final analyses therefore included genotype data from 1,368 (94%; 677 cases and 691 controls) of the 1,457 subjects in this study. Demographic and matching characteristics did not differ statistically significantly between individuals with and without genotype data.
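The two SNP-level quality-control filters described above can be sketched as follows. This is a minimal illustration rather than the study's pipeline: it assumes genotype counts per SNP are already tabulated, implements the exact HWE test with the recurrence of Wigginton, Cutler and Abecasis (2005) (the text does not say which exact-test implementation was used), and the names `hwe_exact_p` and `passes_snp_qc` are hypothetical.

```python
"""Sketch of SNP-level QC: call rate and exact Hardy-Weinberg test.

Hypothetical helper names; thresholds mirror the text (call rate >= 0.90,
HWE p >= 0.05/182). The exact test follows the Wigginton, Cutler and
Abecasis (2005) recurrence over heterozygote counts.
"""

N_TAGSNPS = 182
HWE_ALPHA = 0.05 / N_TAGSNPS  # Bonferroni threshold, about 0.000275


def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Two-sided exact HWE p-value from the genotype counts of one SNP."""
    n = n_het + n_hom1 + n_hom2
    if n == 0:
        raise ValueError("no genotyped individuals")
    n_rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    probs = [0.0] * (n_rare + 1)

    # Start near the expected heterozygote count, with the correct parity.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk down in heterozygote count, two at a time.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Walk up in heterozygote count, two at a time.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    p_obs = probs[n_het]
    # Sum the (unnormalized) probabilities of configurations no more likely than observed.
    return min(1.0, sum(p for p in probs if p <= p_obs) / total)


def passes_snp_qc(call_rate: float, n_het: int, n_hom1: int, n_hom2: int,
                  min_call_rate: float = 0.90, alpha: float = HWE_ALPHA) -> bool:
    """True if the SNP meets both the call-rate and the HWE requirement."""
    return call_rate >= min_call_rate and hwe_exact_p(n_het, n_hom1, n_hom2) >= alpha
```

A SNP with genotype counts 25/50/25 and a 0.95 call rate passes, whereas one with no heterozygotes among 100 subjects fails the HWE check regardless of call rate.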

Statistical Analysis
SNP main effect analyses. Deviations of observed genotype frequencies from those expected under HWE were examined among controls, by race/ethnicity, using exact tests. We used unconditional logistic regression, assuming a log-additive model, to estimate per-allele odds ratios (ORs) and corresponding 95% confidence intervals (95% CIs) for the association between genotype and adenoma risk, adjusting for race/ethnicity (non-Hispanic White, Latino, African-American, Asian/Pacific Islander), study phase (phase I/phase II), age at sigmoidoscopy (continuous), gender, exam date, and clinic (Bellflower or Sunset). In this study, unconditional logistic regression adjusting for the matching factors yields results similar to those of conditional logistic regression, with the benefit of using all available genotype information in the study population [26]. Additional adjustment for the following adenoma risk factors did not change ORs by more than 10%, so they were not included in final analyses: alcohol intake (g/day), smoking status, dietary folate intake (mcg/day), body mass index, multivitamin use (yes/no), total caloric intake, and total dietary fiber (g/day).
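Under the log-additive model, each genotype is coded as its number of minor alleles (0, 1, 2), so the fitted per-allele OR implies an OR of OR² for two copies versus none. The sketch below is a minimal, dependency-free illustration assuming a design matrix whose first column is an intercept; it fits the model by Newton-Raphson and reports a Wald 95% CI. It is not the Stata or R code used in the study, the helper names are hypothetical, and covariates such as the matching factors would simply be extra columns of `X`.

```python
"""Minimal logistic regression (Newton-Raphson) for a log-additive SNP model.

Hypothetical helpers for illustration only. Each row of X starts with 1.0
(intercept), followed by the minor-allele count (0/1/2) and any covariates.
"""
import math


def _sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _invert(m):
    """Gauss-Jordan inverse of a small square matrix given as nested lists."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def _score_info(beta, X, y):
    """Gradient of the log-likelihood and the Fisher information matrix."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = _sigmoid(sum(b * x for b, x in zip(beta, xi)))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Return (coefficients, covariance matrix) at the maximum-likelihood estimate."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = _score_info(beta, X, y)
        step = [sum(c * g for c, g in zip(row, grad)) for row in _invert(info)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_info(beta, X, y)  # covariance evaluated at the final estimate
    return beta, _invert(info)


def per_allele_or(beta, cov, index=1, z=1.959963984540054):
    """Odds ratio and Wald 95% CI for the coefficient at `index`."""
    b, se = beta[index], math.sqrt(cov[index][index])
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)
```

As a sanity check, when the SNP column takes only the values 0 and 1 the fitted OR reduces to the familiar cross-product ratio of the 2 × 2 table, and its variance to the sum of the reciprocal cell counts.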
When testing SNP main effects, we corrected for multiple testing both within each gene region and across all BER gene regions investigated. Specifically, p-values for each tagSNP under the log-additive model were corrected for multiple testing within each gene region, accounting for the correlation among tests induced by LD between SNPs, using the P_ACT (p-value adjusted for correlated tests) method implemented in the P_ACT R package [32]. To determine overall pathway significance (p_pathway), we additionally applied a Bonferroni correction across gene regions by multiplying each P_ACT p-value by 14, the number of BER gene regions investigated [33]. We assessed potential heterogeneity of SNP main effects by race/ethnicity (among tagSNPs with MAF >5% in controls of each race/ethnicity) using 3-df likelihood-ratio tests of interaction between genotype and race/ethnicity. We used multinomial logistic regression, with the control group as reference, to examine differences in SNP main effects by adenoma location (rectum versus left colon) and adenoma size (<1 cm versus ≥1 cm). SNP main effect p-values from stratified analyses were corrected using P_ACT as described above. To assess linkage between SNPs, we estimated pairwise LD as the squared correlation coefficient (r²) using Haploview [30].
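Two of the quantities above can be written down directly. The sketch below assumes two-locus haplotype frequencies are already known (Haploview estimates them from unphased genotypes, a step not reproduced here) and computes r² = D² / [p_A(1 - p_A) p_B(1 - p_B)], where D = p_AB - p_A p_B. It also applies the across-region Bonferroni step, p_pathway = min(1, 14 × P_ACT). P_ACT itself requires the correlated-test machinery of the P_ACT package and is treated as a precomputed input; the function names are hypothetical.

```python
"""LD r^2 from haplotype frequencies and the pathway-level Bonferroni step.

Assumes phased two-locus frequencies are given; function names are
hypothetical and P_ACT values are treated as precomputed inputs.
"""

N_BER_REGIONS = 14


def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between allele A at one locus and B at another."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    return d * d / denom


def pathway_p(p_act: float, n_regions: int = N_BER_REGIONS) -> float:
    """Bonferroni correction of a P_ACT p-value across gene regions, capped at 1."""
    return min(1.0, p_act * n_regions)
```

Because the Bonferroni product is capped at 1, any P_ACT value above 1/14 (about 0.071) maps to p_pathway = 1.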
TagSNP × environmental risk factor interactions. We investigated whether BER gene tagSNPs modify the effects of alcohol use, dietary folate intake, or smoking, in all individuals and in non-Hispanic Whites (NHW) only. Smoking was categorized as smoking status (never, quit, current) and, using median values among smoking controls as cut points, as years of smoking (0, ≤26 years, >26 years) and pack-years smoked (0, ≤21 pack-years, >21 pack-years). We also considered two alcohol intake variables: number of drinks per day (never, ≤1 drink/day, >1 drink/day) and daily alcohol intake categorized at the median intake among drinking controls (0, ≤6 g/day, >6 g/day), with one alcoholic drink defined as approximately 15 grams of ethanol. Dietary folate intake was categorized as low, medium, or high using tertiles of intake among controls as cut points (≤267 mcg/day, >267 to 387 mcg/day, ≥388 mcg/day).
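The categorization rules above can be expressed compactly. The sketch assumes intakes are already in the stated units, that 0 denotes unexposed, and that pack-years use the conventional 20 cigarettes per pack; the helper names are hypothetical, and the default folate cut points are the control tertiles reported in the text.

```python
"""Sketch of the exposure categories used in the GxE analyses (hypothetical helpers)."""
from statistics import median

GRAMS_ETHANOL_PER_DRINK = 15.0  # approximate, as stated in the text
CIGARETTES_PER_PACK = 20        # conventional pack size (assumption)


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Packs smoked per day multiplied by years of smoking."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked


def drinks_to_grams(drinks_per_day: float) -> float:
    """Convert drinks/day to grams of ethanol/day."""
    return drinks_per_day * GRAMS_ETHANOL_PER_DRINK


def exposed_control_median(values, is_case):
    """Median among exposed (value > 0) controls, used as the cut point."""
    return median(v for v, case in zip(values, is_case) if not case and v > 0)


def three_level(value: float, cut: float) -> int:
    """0 = unexposed, 1 = exposed at or below the cut, 2 = above the cut."""
    if value == 0:
        return 0
    return 1 if value <= cut else 2


def folate_tertile(mcg_per_day: float, low: float = 267.0, high: float = 388.0) -> int:
    """0 = low (<= 267), 1 = medium, 2 = high (>= 388 mcg/day)."""
    if mcg_per_day <= low:
        return 0
    return 1 if mcg_per_day < high else 2
```

For example, with the reported cut point of 21 pack-years, a subject who smoked 20 cigarettes a day for 30 years (30 pack-years) falls in the >21 pack-years category.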
Gene-environment interactions (GxE) were analyzed by testing interaction terms between tagSNPs (assuming a log-additive model) and each environmental exposure, using likelihood ratio tests based on unconditional logistic regression. To reduce the multiple testing burden and avoid possible failure of asymptotic tests, tagSNPs with fewer than 10 individuals in any genotype category within any stratum of the environmental factor considered were excluded from these analyses. Gene-exposure interaction models were mutually adjusted for all three exposures considered. Further adjustment for body mass index, multivitamin use (yes/no), total caloric intake, or total dietary fiber (g/day) produced almost identical estimates; therefore, we did not keep these variables in the models.
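The likelihood ratio test compares the log-likelihood of a model with the SNP × exposure interaction term(s) against one without; twice the difference is referred to a chi-square distribution whose degrees of freedom equal the number of interaction parameters (for example, 2 if a three-level exposure enters as two indicator variables, which is an assumption, as the text does not state the coding). The sketch below computes that p-value with a closed-form chi-square tail for integer degrees of freedom, and implements the sparse-cell screen under the assumption that the fewer-than-10 rule counts cases and controls together. All names are hypothetical.

```python
"""LRT p-value for GxE terms and the sparse-cell screen (hypothetical helpers)."""
import math
from collections import Counter


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df >= 1.

    Uses Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1), with y = x / 2,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5
    while s + 1.0 <= df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def lrt_p(loglik_reduced: float, loglik_full: float, df: int) -> float:
    """p-value for nested models; df = number of interaction parameters."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return chi2_sf(stat, df)


def has_sparse_cell(genotypes, strata, min_count: int = 10) -> bool:
    """True if any genotype (0/1/2) has fewer than min_count subjects in any stratum."""
    counts = Counter(zip(strata, genotypes))
    return any(counts[(s, g)] < min_count for s in set(strata) for g in (0, 1, 2))
```

The same `chi2_sf` helper, with df = 3, would give the p-value for the race/ethnicity heterogeneity test described in the main-effect analyses.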
We corrected for multiple testing by applying two Bonferroni corrections to the crude interaction p-values, which we report separately: first, a within-gene-region correction (interaction p_gene) based on the number of tagSNPs in the respective gene region; second, an overall pathway correction (interaction p_pathway) based on the 14 investigated BER gene regions. Statistical significance was declared if either corrected p-value was <0.05. All tests were two-sided, and all statistical analyses were conducted using Stata 11 SE (Stata Corporation, College Station, TX) and the R programming language (The R Project for Statistical Computing, http://www.r-project.org).
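The two interaction corrections and the decision rule can be sketched as below. The text does not state whether the pathway factor of 14 multiplies the crude or the within-gene-corrected p-value; this sketch assumes the latter, mirroring the main-effect procedure in which P_ACT values were multiplied by 14. The function name and the example numbers are hypothetical.

```python
"""Two-stage Bonferroni correction for GxE interaction p-values (hypothetical helper)."""


def correct_interaction_p(p_crude: float, n_snps_in_gene: int,
                          n_regions: int = 14, alpha: float = 0.05):
    """Return (p_gene, p_pathway, significant) for one SNP x exposure test."""
    p_gene = min(1.0, p_crude * n_snps_in_gene)   # within-gene-region correction
    p_pathway = min(1.0, p_gene * n_regions)      # assumed: applied on top of p_gene
    significant = p_gene < alpha or p_pathway < alpha
    return p_gene, p_pathway, significant
```

For instance, a hypothetical crude p-value of 0.002 for a SNP in a gene region with 3 tagSNPs gives p_gene = 0.006 and p_pathway = 0.084, which counts as significant under the either-value rule.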

Results
Demographic characteristics of all cases (N = 721) and controls (N = 736) in our study are summarized in Table 2. As we previously described [25], there were no differences in age, gender, ethnicity, alcohol intake, or smoking patterns between phase I and phase II participants. However, phase II participants had higher dietary folate intake than phase I participants (p < 0.001). Fifty-two percent of enrolled subjects were NHW. The mean age was 61.46 years (±6.75) for cases and 61.67 years (±6.88) for controls. Cases smoked longer and more intensely than controls and were more likely to be current smokers (p < 0.001). Cases also had lower mean dietary folate intake (mcg/day) and lower mean dietary fiber intake (g/day) than controls (p = 0.013 and p = 0.036, respectively). Approximately 81% of adenomas were colon adenomas, and approximately 67% were small adenomas (<1 cm).

BER Genes and Colorectal Adenoma Risk
Among all individuals combined, only one of the 182 tagSNPs, NEIL2 rs11785481, showed a nominally statistically significant association with adenoma risk, and it was not statistically significant after multiple comparisons adjustment within gene region (OR = 0.70; 95% CI = 0.55-0.90; p = 0.006; P_ACT = 0.140) (Table 3).

BER Genes and Adenoma Risk Taking into Account Adenoma Size and Location
We did not find evidence of any statistically significant per-allele associations within either the small polyp (<1 cm) or large polyp (≥1 cm) group after multiple comparisons adjustment within gene region (P_ACT). When we considered adenoma location (colon versus rectum), two NEIL2 tagSNPs were associated with increased risk of rectal adenomas: rs7015453 (OR = 1.72; 95% CI = 1.24-2.39; P_ACT = 0.025; p_heterogeneity = 0.003) and rs3757949 (OR = 1.58; 95% CI = 1.18-2.13; P_ACT = 0.044; p_heterogeneity = 0.004) (Table 4). These two tagSNPs were not in LD among NHW (r² = 0.11), among whom we observed similar findings (data not shown). Complete data for the analysis by adenoma location are provided in Table S3.

Genetic Variation in BER Genes, Adenoma Risk and Alcohol
Statistically significant interactions were observed between LIG3 rs1052536 and both the amount (0 g/day, ≤6 g/day, >6 g/day; interaction p_gene = 0.019, interaction p_pathway = 0.241) and the frequency (never, ≤1 drink/day, >1 drink/day; interaction p_gene = 0.029, interaction p_pathway = 0.345) of alcohol intake (Table 5). Specifically, among carriers of two major (C) alleles, those who drank more than one drink per day had an 84% increased risk of adenoma compared with non-drinkers (OR = 1.84; 95% CI = 1.09-3.11; p = 0.022; p for trend = 0.024), and those who drank more than 6 grams per day had a 77% increased risk compared with non-drinkers (OR = 1.77; 95% CI = 1.17-2.68; p = 0.006; p for trend = 0.003). There was no association between alcohol and adenoma risk among subjects with one minor (T) allele (Table 5). When restricting analyses to NHW, we observed interactions of similar magnitude for LIG3 rs1052536, although they were not statistically significant (data not shown).
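Genotype-specific exposure ORs such as those above can be derived from a single interaction model. Assuming the exposure enters as an indicator (for example, >1 drink/day versus never), the SNP as a minor-allele count g, and a product term, the log OR for exposure among subjects with g minor alleles is β_E + g·β_GxE, with variance Var(β_E) + g²·Var(β_GxE) + 2g·Cov(β_E, β_GxE). This is one standard way to obtain such estimates; the text does not state whether Table 5 used this approach or stratified models, and the function name and coefficient values here are hypothetical.

```python
"""Exposure OR within genotype strata from an interaction model (hypothetical helper)."""
import math


def exposure_or_by_genotype(b_exp: float, b_int: float, var_exp: float,
                            var_int: float, cov_exp_int: float, n_minor: int,
                            z: float = 1.959963984540054):
    """OR and Wald 95% CI for the exposure among carriers of n_minor alleles."""
    log_or = b_exp + n_minor * b_int
    var = var_exp + n_minor ** 2 * var_int + 2 * n_minor * cov_exp_int
    se = math.sqrt(var)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

With g = 0 (two major alleles) the result is simply the main exposure OR; each additional minor allele multiplies it by exp(β_GxE).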

Genetic Variation in BER Genes, Adenoma Risk and Dietary Folate Intake
We observed that the association between folate and adenoma risk was modified by two LIG3 tagSNPs (rs1052536: interaction p_gene = 0.006, interaction p_pathway = 0.081; rs3744358: interaction p_gene = 0.032, interaction p_pathway = 0.451) (Table 5). For each of … Among individuals with two copies of the major alleles, there was no statistically significant trend across increasing levels of dietary folate intake. When restricting analyses to NHW, similar statistically significant trends were observed for rs1052536 and rs3744358 (data not shown). There was no evidence of LD between LIG3 tagSNPs rs3744358 and rs1052536 among NHW.

Genetic Variation in BER Genes, Adenoma Risk and Smoking
No statistically significant interactions were observed for smoking status (never/quit/current) or smoking duration after applying within-gene Bonferroni corrections. However, SNPs in three genes modified the association between smoking pack-years and adenoma risk, with interaction tests that remained statistically significant after within-gene Bonferroni correction. First, the association with smoking pack-years was modified by MUTYH rs10890324 (interaction p_gene = 0.007, interaction p_pathway = 0.091) (Table 5). Among individuals with two copies of the major (A) allele, there was no association between increasing pack-years and adenoma risk (p for trend = 0.871), whereas among carriers of one copy of the minor (G) allele, having smoked more than 21 pack-years was associated with a 76% increased adenoma risk compared with never smokers (OR = 1.76; 95% CI = 1.29-2.42; p < 0.001; p for trend < 0.001). Among carriers of two minor (G) alleles, smoking more than 21 pack-years was associated with a 3-fold increased risk of adenoma (p for trend < 0.001). Analysis among NHW was not performed due to sparse data.
Second, OGG1 rs159153 modified the association between smoking pack-years and adenoma risk (interaction p_gene = 0.040, interaction p_pathway = 0.517). Among individuals with two copies of the major (A) allele, there was no association between increasing pack-years and adenoma risk (p for trend = 0.610), whereas among carriers of one copy of the minor (C) allele, having smoked more than 21 pack-years was associated with a 75% increased adenoma risk compared with never smokers (OR = 1.75; 95% CI = 1.26-2.44; p = 0.001; p for trend < 0.001). Among carriers of two minor (C) alleles, smoking more than 21 pack-years was associated with an almost 3-fold increased risk of adenoma compared with never smokers (p for trend < 0.001). Adenoma risk also increased significantly with years of smoking among carriers of one and two minor alleles, but there was no statistically significant evidence of interaction between OGG1 rs159153 and smoking years.
Finally, FEN1 rs108499 also modified the association between smoking pack-years and adenoma risk (interaction p_gene = 0.039, interaction p_pathway = 0.507). Among individuals with two copies of the major (C) allele, smoking more than 21 pack-years was associated with a 2-fold increased risk of adenoma compared with never smokers (p for trend < 0.001), but there was no association among individuals with one or two copies of the minor (T) allele (Table 5). A similar significant trend of increasing adenoma risk with increasing smoking years was observed, but there was no statistically significant evidence of interaction. Analyses among NHW only were not performed due to sparse data.

Discussion
We investigated potential associations between 182 tagSNPs in 14 BER gene regions and colorectal adenoma risk, and their role in modifying the effects of smoking, dietary folate intake, and alcohol consumption on that risk. We observed statistically significant associations between colorectal adenoma risk and polymorphisms in FEN1 (two tagSNPs) and NTHL1 (one tagSNP) among Asian-Pacific Islanders, and in APEX1 (one tagSNP) among African-Americans. Significant associations were also observed between two unlinked NEIL2 tagSNPs and rectal adenoma risk. None of these six tagSNPs modified the effects of dietary folate or alcohol on adenoma risk; however, one of them, FEN1 rs108499, modified the effect of smoking. Moreover, we observed evidence that SNPs in other BER genes modified the effects of smoking (MUTYH, OGG1), alcohol (LIG3), and dietary folate (LIG3) on colorectal adenoma risk. Overall, our findings support the hypothesis that oxidative damage induced by these exposures may play an important role in colorectal adenoma development.
Previous studies have investigated BER gene polymorphisms and adenoma risk [25,26,34,35,36,37,38]. With one exception [38], they have been limited to a handful of candidate SNPs within the XRCC1, PARP1, OGG1, and APEX genes, and even fewer have considered the modifying role of alcohol, smoking, or dietary folate intake [26,37,38]. To our knowledge, therefore, this is the first comprehensive examination of the BER pathway and colorectal adenoma risk that takes relevant environmental risk factors into account.
We report a two-fold increased adenoma risk associated with SNP rs17111750, located 5′-upstream of APEX1, among African-Americans only. The human APEX1 protein is an apurinic/apyrimidinic endonuclease that repairs DNA damage caused by oxidative and alkylating agents [39]. One previous study reported a positive association between this SNP and colorectal adenoma, and another reported no association with prostate cancer risk [38,40]. Among African-Americans, APEX1 rs17111750 is not in LD (r² < 0.05) with two other APEX1 SNPs, rs1048945 and rs1130409, which we and others have previously reported to be associated with CRC and adenoma risk [37,41,42,43].
We observed that the minor alleles of the FEN1 SNPs rs509360 and rs108499 were associated with a two-fold increased and a two-fold decreased risk of adenoma, respectively, among Asian-Pacific Islanders. Moreover, among all subjects combined, rs108499 modified the effect of smoking on adenoma risk. Both rs509360 and rs108499 are located 5′-upstream of FEN1, within intronic regions of C11orf9. A previous study among predominantly NHW subjects found no associations between these FEN1 SNPs and colorectal adenoma [38]. The FEN1 endonuclease is involved in BER and DNA replication and has been reported to play important roles in genomic stability [44], chronic inflammation, autoimmunity, and cancer [45,46]. In two studies conducted in Chinese populations, polymorphisms in the FEN1 promoter and 3′-UTR were associated with reduced FEN1 expression, increased DNA damage, and increased risks of lung cancer and CRC [47,48].
The NTHL1 SNP rs2516781 was associated with a two-fold decreased risk of adenoma only among Asian-Pacific Islanders. The NTHL1 protein has DNA N-glycosylase activity as well as apurinic/apyrimidinic endonuclease activity [49,50,51]. rs2516781 lies 3′-downstream of NTHL1, within an intron of the solute carrier family 9 (sodium/hydrogen exchanger), member 3 regulator 2 (SLC9A3R2) gene. SLC9A3R2 is involved in regulating SLC9A3, the sodium/hydrogen exchanger involved in intestinal sodium absorption [52]. This SNP has not previously been assessed in studies of colorectal adenoma or CRC.
Two NEIL2 SNPs, rs11785481 and rs3757949, were associated with risk of rectal adenomas. Among NHW, these two SNPs are not in LD (r² < 0.1). The NEIL2 protein has DNA glycosylase and apurinic/apyrimidinic endonuclease activity [39]. rs3757949 is an intronic SNP located within GATA4, a gene encoding a transcription factor relevant for gene expression in the gastrointestinal epithelium [53,54]. Promoter hypermethylation and subsequent transcriptional silencing of GATA4 are commonly seen in CRC cell lines and primary CRCs [55].
We found evidence that MUTYH rs10890324, which flanks the 3′ end of MUTYH, modified the association between smoking and adenoma risk. MUTYH encodes a DNA glycosylase, and germline mutations in highly conserved residues of MUTYH predispose individuals to MUTYH-associated polyposis coli [56] as well as to sporadic CRC [57,58]. We also found that OGG1 rs159153, located 5′-upstream of OGG1, modified the association between smoking and adenoma risk. The 8-oxoguanine glycosylase 1 (OGG1) enzyme can excise the highly mutagenic 8-oxoguanine lesions induced by ROS and normal cellular metabolism [59]. The more commonly investigated OGG1 Ser326Cys SNP (rs1052133) is not in LD with rs159153 among NHW (r² = 0.08).
Finally, we found evidence that LIG3 rs1052536 modified the association between alcohol intake and adenoma risk, and that LIG3 SNPs rs1052536 and rs3744358 modified the association between dietary folate intake and adenoma risk. LIG3 encodes a DNA ligase involved in maintaining genomic integrity; its protein products are present in both the mitochondria and the nucleus.
Our study has several strengths, including the relatively large sample size and the use of a comprehensive tagSNP approach to investigate genetic variation in 14 BER genes. Among its limitations, our results apply only to the sigmoid colon and rectum, and cases, but not controls, underwent colonoscopy; therefore, some controls might have had more proximal polyps beyond the reach of the sigmoidoscope [60]. Although we conducted a large number of tests, we believe that our approach to multiple comparisons adjustment in both the SNP main effect and the gene-environment analyses is a conservative way of reducing the number of false positive results.
In conclusion, our findings suggest that genetic variation in noncoding BER gene regions can modify the risk of adenoma, particularly in combination with key environmental exposures. These findings highlight a relevant role for oxidative damage induced by these environmental exposures in colorectal adenoma development.

Supporting Information
Table S1 Genes/SNPs included in study and minor allele frequencies. (XLSX)