Polymorphisms in the ERCC5 Gene and Risk of Esophageal Squamous Cell Carcinoma (ESCC) in Eastern Chinese Populations

Background Excision repair cross complementing group 5 (ERCC5 or XPG) plays an important role in regulating DNA excision repair; its functional single nucleotide polymorphisms (SNPs) may alter DNA repair capacity and thus contribute to cancer risk. Methodology/Principal Findings In a hospital-based case-control study of 1115 esophageal squamous cell carcinoma (ESCC) cases and 1117 cancer-free controls, we genotyped three potentially functional SNPs of ERCC5 (rs2296147T>C, rs2094258C>T and rs873601G>A) and estimated crude and adjusted odds ratios (ORs) and 95% confidence intervals (CIs) for their associations with risk of ESCC using unconditional logistic regression models. We also calculated false-positive report probabilities (FPRPs) for significant findings. We found that, compared with the TT genotype, ERCC5 rs2296147 C variant genotypes were associated with a significantly lower ESCC risk (CT: adjusted OR = 0.76, 95% CI = 0.63–0.93; CT/CC: adjusted OR = 0.80, 95% CI = 0.67–0.96); however, no such association was observed for the other two SNPs (rs2094258C>T and rs873601G>A), nor in further stratification and haplotype analyses. Conclusions/Significance These findings suggest that ERCC5 polymorphisms may contribute to risk of ESCC in Eastern Chinese populations, but the effect was weak and needs further validation in larger population-based case-control studies.


Introduction
Esophageal cancer is the sixth leading cause of cancer-related death worldwide. In 2008, there were 482,300 new cases, and 406,800 patients died of this disease around the world. In the highest incidence areas that extend from northern Iran and through the central Asian republics to north-central China, esophageal squamous cell carcinomas (ESCC) account for 90% of the cases [1]. Although cigarette smoking, alcohol intake, nutritional deficiencies and dietary carcinogen exposure may contribute to the etiology of ESCC [2], only a small fraction of exposed individuals develop ESCC, suggesting that genetic susceptibility may also play a vital role in the etiology of ESCC [3,4]. Recently, genetic polymorphisms in genes involved in multiple biological pathways, including DNA repair, have been identified as potential risk factors for ESCC [5,6].
DNA repair is essential for maintaining genomic stability in response to the insults of environmental carcinogens that cause DNA damage; if left unrepaired, such DNA damage can lead to mutation fixation and initiation of carcinogenesis. To date, there are at least five known major DNA repair pathways, including more than 150 human DNA repair genes, among which the nucleotide-excision repair (NER) pathway is the most versatile and particularly important in association with cancer risk. This is because NER repairs several kinds of bulky and helix-distorting lesions in DNA, involving coordinated actions of damage detection and recognition by helicase and nuclease proteins [7]. More importantly, NER defects can lead to severe diseases, such as xeroderma pigmentosum, Cockayne syndrome and trichothiodystrophy, and some of these patients are prone to cancer [8,9]. The excision repair cross complementing group 5 (ERCC5) gene, also known as the xeroderma pigmentosum group G (XPG) gene, is an indispensable component of NER and belongs to the flap structure-specific endonuclease 1 (FEN1) family. ERCC5 encodes a single-strand specific DNA endonuclease that cleaves the damaged DNA strand at the 3′ end [10]. The protein may also function in other cellular processes, including RNA polymerase II transcription and transcription-coupled DNA repair [11,12]. Numerous studies indicate that a defective ERCC5 plays a pivotal role in the initiation of carcinogenesis and that its deficiency leads to DNA repair defects, genomic instability, and failure of gene transcription modulation [13][14][15]. To date, some studies have reported that polymorphisms in ERCC5 are associated with risk of cancers of the breast [16][17][18], lung [19][20][21], skin [22][23][24] and bladder [25,26], but there is only one published study of a nonsynonymous SNP (rs17655) and esophageal cancer, conducted in Caucasians with a small sample size [27].
Furthermore, several studies suggested that variations in ERCC5 transcript and protein levels as well as ERCC5 variant genotypes were associated with risk of squamous cell carcinoma of the head and neck (SCCHN) [28][29][30], which shares similar risk factors with ESCC. Therefore, we hypothesized that potential functional ERCC5 SNPs are associated with ESCC risk, and we tested this hypothesis in an Eastern Chinese population.

Study Subjects
This hospital-based case-control study included 1115 cases of ESCC and 1117 cancer-free controls. All subjects were genetically unrelated ethnic Han Chinese from Eastern China, including Shanghai, Jiangsu province and the surrounding regions. Patients newly diagnosed with histopathologically confirmed primary ESCC were recruited from Fudan University Shanghai Cancer Center (Shanghai, China) between March 2009 and September 2011. Blood samples of all patients were collected and processed by the Institutional Tissue Bank at Shanghai Cancer Center as a daily routine practice. Patients who had primary tumors outside the esophagus, tumors of an unknown origin, or any histopathologic diagnosis other than ESCC were excluded. Cancer-free controls were all recruited from a large prospective cohort, the Taizhou Longitudinal Study (TZL) in Taizhou, Jiangsu province, China, during the same period as the recruitment of cases [31]. Control subjects were frequency matched to the cases by age (±5 years) and sex. The response rates for cases and controls were approximately 93.0% and 90.0%, respectively. Each participant signed a written informed consent form, or next of kin or guardians consented on behalf of participants whose capacity to consent was reduced, before we interviewed each eligible participant to obtain data on demographic characteristics and environmental exposure history, such as age, sex, ethnicity, body mass index (BMI), smoking and alcohol consumption. Individuals who had smoked <100 cigarettes in their lifetime were considered "non-smokers", and the others were considered "smokers". Similarly, subjects who drank alcoholic beverages at least once a week for ≥1 year were defined as "drinkers", and the others were "non-drinkers".
Additionally, BMI was divided into two groups, <24.0 and ≥24.0 kg/m², according to the recommendations of the Working Group on Obesity in China, in which the BMI cut-off point for overweight in Chinese is 24 kg/m² [32]. At the end of the interview, a 10-mL venous blood sample was collected from each subject. The research protocol was approved by the Institutional Review Board of the Shanghai Cancer Center, Fudan University.

SNP Selection
Because the published genome-wide association studies (GWASs) of ESCC included all tagging SNPs of DNA repair genes but did not report any such SNPs to be positively associated with cancer risk, our strategy in the present study was to include functional SNPs of ERCC5, a candidate gene rarely investigated for ESCC, that were not included in previously published GWASs. We searched the NCBI dbSNP database (http://www.ncbi.nlm.nih.gov/) and SNPinfo (http://snpinfo.niehs.nih.gov/) for putatively functional SNPs of ERCC5 that are located in the regulatory region (i.e., the 5′ near gene, 3′ near gene, 5′ untranslated region (UTR) and 3′ UTR) with a minor allele frequency (MAF) ≥5% in Chinese Han, Beijing (CHB) descendants. As a result, five SNPs (rs2094258, rs2016073, rs751402, rs2296147, and rs873601) were selected as previously described [33]. Although both rs2094258 and rs2016073 are predicted to have similar functions, rs2094258 has a higher allele frequency in CHB; therefore, we genotyped rs2094258 but not rs2016073, although it is possible that rs2016073 has some functionality that may result in a different outcome in a larger association study. As a result, we genotyped three potentially functional SNPs (rs2094258C>T in the 5′ near gene, rs2296147T>C in the 5′ UTR, and rs873601G>A in the 3′ UTR), of which the two SNPs in the 5′ region were predicted to affect transcription factor binding site (TFBS) activity, and the 3′ UTR SNP may have an impact on splicing and miRNA binding sites. These three selected SNPs also captured 20 other untyped SNPs in the nearby genes [33], including one non-synonymous SNP (rs17655), the only common one with MAF >5% in CHB (LD for rs873601 and rs17655: r² = 0.910).

Genotyping
Genomic DNA was extracted from the buffy-coat fraction of the blood samples using the Qiagen Blood DNA Mini Kit (Qiagen Inc., Valencia, CA). All three SNPs were genotyped using the Taqman real-time PCR method with a 7900 HT sequence detector system (Applied Biosystems, Foster City, CA). To ensure high genotyping accuracy, strict quality control procedures were implemented, and four duplicated positive controls and four negative controls (no DNA) were used in each 384-well plate. Approximately 5% of the samples were genotyped repeatedly, and the results were 100% concordant.

Statistical Methods
Differences in the frequency distributions of the selected demographic variables, risk factors, and the alleles and genotypes of each selected SNP between patients with ESCC and control subjects were evaluated using the χ² test. Univariate and multivariate logistic regression analyses were used to calculate crude and adjusted odds ratios (ORs) and 95% confidence intervals (95% CIs) to estimate the risk of ESCC. Multivariate adjustments were made, where appropriate, for age, sex, BMI, and smoking and drinking status, including in further stratification analysis of genotype data. The heterogeneity of associations between subgroups was estimated using the χ²-based Q test. Hardy-Weinberg equilibrium for the genotype distribution in controls was tested by a goodness-of-fit χ² test. Haplotype frequencies and individual haplotypes were estimated using unphased genotype data. The haplotype with the highest frequency was used as the reference group to calculate ORs for haplotypes associated with ESCC risk in logistic regression analysis.
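The goodness-of-fit χ² test for Hardy-Weinberg equilibrium mentioned above can be illustrated with a minimal sketch. The analyses in this study were run in SAS; the Python function below, its name, and the example genotype counts are illustrative assumptions and are not taken from the study data.

```python
import math

def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Takes observed genotype counts for a biallelic SNP and returns
    (chi-square statistic, P value) with 1 degree of freedom.
    """
    n = n_major_hom + n_het + n_minor_hom
    # Major-allele frequency estimated by allele counting.
    p = (2 * n_major_hom + n_het) / (2 * n)
    q = 1.0 - p
    observed = (n_major_hom, n_het, n_minor_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control genotype counts (not from Table 2).
chi2, p_value = hwe_chi_square(360, 480, 160)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A large P value, as reported for the three SNPs in the controls, gives no evidence of departure from equilibrium.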
We also calculated the false-positive report probability (FPRP) to detect false-positive association findings. For all the significant results, a prior probability of 0.01 was assigned to detect an OR of 1.56 (for a risk effect) or 0.67 (for a protective effect) for an association with genotypes and alleles of each SNP. Only significant results with an FPRP value less than 0.20 were considered noteworthy associations. Statistical significance was established at P < 0.05, and all tests were two-sided and performed with the SAS software (version 9.1; SAS Institute, Cary, NC).
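As a minimal sketch of the FPRP calculation, assuming the standard definition FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P value, 1 − β is the statistical power to detect the assumed OR, and π is the prior probability: the function names and the example inputs below are hypothetical, and the sketch is not intended to reproduce the study's tabulated values, which depend on unrounded estimates.

```python
def fprp(p_value, power, prior):
    """False-positive report probability.

    p_value: observed significance level (alpha)
    power:   statistical power (1 - beta) to detect the assumed OR
    prior:   prior probability (pi) that the association is real
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    false_positive = p_value * (1.0 - prior)
    true_positive = power * prior
    return false_positive / (false_positive + true_positive)

def is_noteworthy(fprp_value, threshold=0.20):
    """Apply the FPRP < 0.20 criterion described in the text."""
    return fprp_value < threshold

# Hypothetical inputs, evaluated at several prior probabilities.
for prior in (0.25, 0.1, 0.01):
    value = fprp(p_value=0.01, power=0.90, prior=prior)
    print(f"prior = {prior}: FPRP = {value:.3f}, noteworthy = {is_noteworthy(value)}")
```

The same P value becomes less noteworthy as the prior probability decreases, which is why the choice of prior matters for interpretation.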

Population Characteristics
The final analysis included 1115 cases of confirmed ESCC and 1117 healthy controls, whose characteristics are summarized in Table 1. By study design, there were no statistical differences in the distributions of age and sex between the cases (mean age of 60.40 ± 8.31 years with 80.81% males) and controls (mean age of 60.76 ± 10.26 years with 77.98% males), indicating that the frequency matching on age and sex was adequate (P = 0.110 and 0.100, respectively). However, the cases were more likely to be smokers (61.61% vs. 54.16%, P = 0.0004) and drinkers (44.30% vs. 32.59%, P < 0.0001) than the controls. In addition, the cases exhibited a lower proportion of BMI ≥24.0 than the controls (43.23% vs. 61.24%, P < 0.0001). Therefore, these variables were further adjusted for in subsequent multivariate logistic regression analyses.

ERCC5 Allele and Genotype Distributions and Association with ESCC Risk
The allele and genotype frequencies of the three selected SNPs in cases and controls are summarized in Table 2. All the observed genotype frequencies agreed with Hardy-Weinberg equilibrium in the controls (P = 0.860 for rs2296147, P = 0.793 for rs2094258, and P = 0.601 for rs873601). The rs2296147 C allele and rs873601 A allele were less frequent among cases than among controls (0.18 vs. 0.21, P = 0.025 and 0.46 vs. 0.47, P = 0.812, respectively), whereas the rs2094258 T allele was more frequent among cases than among controls, though the difference was not statistically significant (0.39 vs. 0.38, P = 0.602). In the single-locus analyses, only the genotype frequency of rs2296147 was significantly different between the cases and the controls (P = 0.016). After adjustment for age, sex, BMI, and smoking and drinking status in multivariate logistic regression analysis, rs2296147 variant C genotypes were associated with a significantly decreased risk of ESCC (adjusted OR = 0.76, 95% CI = 0.63–0.93, P = 0.006 for CT vs. TT and adjusted OR = 0.80, 95% CI = 0.67–0.96, P = 0.016 for CT/CC vs. TT). In addition, the rs2296147 variant C allele showed a borderline significant association with a lower risk of ESCC (adjusted OR = 0.87, 95% CI = 0.74–1.01, P = 0.066). However, there was no evidence of an association between the other two SNPs (rs2094258 and rs873601) and overall ESCC risk.
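The relationship between a reported OR, its 95% CI and the P value can be checked with a short sketch, assuming Wald-type intervals on the log-odds scale (an assumption about how the intervals were constructed; the function name is illustrative). Because the published ORs and CIs are rounded to two decimals, recovered P values are only approximate.

```python
import math

def wald_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided P value
    from an odds ratio and its 95% confidence limits, assuming a
    Wald interval that is symmetric on the log scale."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p_value

# Adjusted estimates quoted in the text for rs2296147.
for label, estimate in {"CT vs. TT": (0.76, 0.63, 0.93),
                        "CT/CC vs. TT": (0.80, 0.67, 0.96)}.items():
    se, z, p_value = wald_from_ci(*estimate)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, P = {p_value:.3f}")
```

For both comparisons the recovered P values fall close to those reported (0.006 and 0.016), with small differences attributable to rounding of the published estimates.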

Stratification Analysis of ESCC Risk by ERCC5 Polymorphisms
We further evaluated the association between ERCC5 SNPs and ESCC risk stratified by selected variables including age, sex, BMI, smoking status and drinking status. As shown in Table 3, the stratification analysis indicated that, compared with the TT genotype, the protective effect of rs2296147 CT/CC genotypes was more evident in older subjects (adjusted OR = 0.75, 95% CI = 0.57–0.97), never-drinkers (adjusted OR = 0.75, 95% CI = 0.59–0.95), and subjects with BMI <24.0 (adjusted OR = 0.77, 95% CI = 0.59–1.00). However, further homogeneity tests did not show statistical evidence of any difference (P > 0.05) between the two strata of each variable, suggesting no risk modification by the variables under investigation. Additionally, no associations were observed between the genotypes of the other two SNPs (rs2094258 and rs873601) and risk of ESCC in the stratification analysis, nor was there any statistical evidence for gene-environment interactions between the variant genotypes and these variables on risk of ESCC (data not shown).
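A homogeneity test between two strata can be sketched as follows, assuming the usual inverse-variance form of the χ²-based Q statistic computed on log ORs. The function names are illustrative, and the second stratum in the example is hypothetical because the complementary stratum estimates are given only in Table 3.

```python
import math

def log_or_and_se(odds_ratio, lower, upper, z_crit=1.959964):
    """Log odds ratio and its standard error from a 95% Wald interval."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    return math.log(odds_ratio), se

def q_test_two_strata(stratum_a, stratum_b):
    """Chi-square-based Q test for homogeneity of two stratum-specific ORs.

    Each stratum is a tuple (OR, lower 95% limit, upper 95% limit).
    Returns (Q, P value); with two strata Q has 1 degree of freedom.
    """
    estimates = [log_or_and_se(*s) for s in (stratum_a, stratum_b)]
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    p_value = math.erfc(math.sqrt(q / 2.0))
    return q, p_value

# First stratum from the text (older subjects); second stratum is hypothetical.
q, p_value = q_test_two_strata((0.75, 0.57, 0.97), (0.86, 0.66, 1.12))
print(f"Q = {q:.2f}, P = {p_value:.3f}")
```

A P value above 0.05 would indicate no statistical evidence that the genotype effect differs between the two strata, which is the interpretation applied in this analysis.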

ERCC5 Haplotypes and ESCC Risk
We found that the LD among these three SNPs of ERCC5 was incomplete in all control subjects (rs2094258 and rs2296147: r² = 0.163; rs2094258 and rs873601: r² = 0.438; rs873601 and rs2296147: r² = 0.102), suggesting that variant haplotypes may play a role in cancer risk. We inferred eight ERCC5 haplotypes, four of which had a frequency less than 0.05 and were merged into one group of "minor haplotypes" for further analysis. However, no ERCC5 haplotype in this study population was associated with ESCC risk when the most common haplotype, T-T-G, was used as the reference (Table 4).
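The pairwise r² values above follow the standard definition r² = D² / [p_A(1 − p_A) p_B(1 − p_B)], with D = p_AB − p_A p_B. The sketch below assumes that two-locus haplotype frequencies have already been estimated (in this study they were inferred from unphased genotypes, a step not shown here); the function name and example frequencies are hypothetical.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying both variant alleles
    p_a:  variant-allele frequency at the first SNP
    p_b:  variant-allele frequency at the second SNP
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denominator

# Hypothetical frequencies (not taken from the study data).
print(f"r^2 = {ld_r_squared(p_ab=0.15, p_a=0.20, p_b=0.40):.3f}")
```

Values well below 1, as observed for all three SNP pairs, indicate that no SNP fully predicts another, so haplotype analysis may carry information beyond the single-locus tests.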
Finally, the FPRP values were calculated at different prior probability levels for all significant findings. For a prior probability of 0.01, assuming the OR for a specific genotype or allele was 0.67, with statistical power of 0.918, 0.963 and 0.998, the FPRP values were 0.301, 0.451 and 0.721, respectively, for an association of the CT genotype, CT/CC genotypes and C allele with a reduced risk of ESCC in all individuals (Table 5).

Discussion
In the present case-control study, we investigated the associations between three common, putatively functional SNPs of the ERCC5 gene and the risk of ESCC in an Eastern Chinese population. When we evaluated each SNP separately, only the rs2296147 variant CT/CC genotypes were associated with significantly reduced ESCC risk; no such association was observed for the other two SNPs, rs2094258C>T and rs873601G>A. To the best of our knowledge, this is the first study to investigate the association of these three ERCC5 SNPs with ESCC risk.
Given the role of the ERCC5 gene in the NER pathway, it is biologically plausible that functional ERCC5 SNPs may contribute to ESCC susceptibility. To date, only three reported studies have investigated the association between rs2296147 and risk of cancers other than ESCC, of which one was conducted for SCCHN with 1059 white patients and 1066 healthy controls [30]; another included mixed populations with 722 cases of endometrial cancer and 726 cancer-free controls [34]; and the third was performed in a Chinese population with 1125 gastric cancer cases and 1196 healthy controls [33]. Although the rs2296147 C allele was associated with a non-significantly altered risk of cancer in these three studies, the rs873601 A variant genotypes were found to be associated with an increased risk of gastric cancer [33]. For the rs2094258C>T SNP, only one study investigated its association with risk of SCCHN, but the result was null, which is in agreement with the finding in the present study of ESCC. The possible discrepancy of the results among different studies may be due to different genetic backgrounds and etiologies. For example, the frequency of the rs2296147 C allele in non-Hispanic White populations was 0.47 in the study by Ma et al. [30] but was 0.20 in the present study (in a Chinese population). Another possibility may be related to different etiologies of the primary tumors. Although some studies have shown that some sequence variants in the regions of chromosomes 5p15.33 and 8q24 are associated with risk of different cancer types [35][36][37][38][39], most published studies reported that the same SNP may be cancer-specific. It is likely that our present study still had limited statistical power to detect a weak effect. For instance, with the sample size of the present study, we had only 54.9% power to detect an OR of 1.23 for rs873601 under a recessive model, as reported by He et al. [33].
Meanwhile, greater FPRP values were observed for the significant associations between rs2296147 C variant genotypes and ESCC risk, which further suggests a relatively small sample size in the present study. Therefore, whether or not an association between ERCC5 SNPs with cancer risk is cancer-specific needs to be validated in additional large studies of different cancer types, either in additional groups of homogenous ethnicity or in more ethnically diverse groups.
Moreover, in the published GWASs for cancers, including ESCC, based on common tagging SNPs, none of the SNPs in NER genes has been identified as a susceptibility locus. This poses a challenge to the common variants and common diseases theory [40]. Because NER genes are well-known susceptibility genes, it is plausible that individuals carrying NER variants may be prone to exposures that instigate damage to DNA and thus to cancer. Consequently, without comprehensive information about such exposures for additional adjustment or stratification, the actual associations may be either biased or masked, as is likely the case in the published GWASs. XP patients, for instance, can greatly decrease their risk of developing skin cancer if they effectively avoid sun exposure throughout their lifetime. Alternatively, common variants may be unlikely to have a substantial biological effect, or, when their effects are weak, sufficient study power is needed to detect them, even in GWASs. For example, when 500,000 SNPs are analyzed in a GWAS, multiple comparisons demand a P value less than 10⁻⁷ or 10⁻⁸ to avoid false significant associations, which markedly decreases the power to detect SNPs associated with the disease phenotype [41]. This is simply because common variants may contribute only moderately to risk of cancer due to their weak penetrance. In that case, the identification of additional rare but functional variants requires a single large GWAS or a combined analysis of several published GWASs in the future. There is some biological plausibility that functional genetic variants, like those of ERCC5, may contribute to risk of carcinogenesis initiated by carcinogen-induced DNA damage. Under such circumstances, a candidate-gene approach, rather than a GWAS, may be more appropriate and feasible. Acting as a structure-specific endonuclease and also a 5′–3′ exonuclease, the ERCC5 protein is required for two sub-pathways in NER.
One is transcription-coupled DNA repair (TCR), which preferentially removes DNA damage in the transcribed DNA strand of active genes; the other is global genomic repair (GGR), which removes lesions throughout the genome [42,43]. It has been observed that rare mutations in ERCC5 cause DNA repair-deficient diseases characterized by abnormal apoptosis and a high level of cancer risk [11,12]. Additionally, ERCC5 has also been shown to play a role in the regulation of transcription [11][44][45][46]. Numerous studies have reported that altered ERCC5 transcript expression modulates transcription domain-associated repair capacity [13,47] and other important phenotypic effects, including inter-individual variation in the incidence of several cancer types [28,29].
The polymorphic site of rs2296147 is located in the 5′ UTR of ERCC5 and is predicted to be a putative p53 consensus site [48]. Although the biological significance of this SNP has not been elucidated, a growing body of evidence indicates that SNPs located at TFBSs are likely to affect gene expression by altering the DNA binding properties of transcription factors and thus may contribute to susceptibility to cancer [49,50]. Indeed, previous functional studies have revealed that cis-acting genetic variation at a putative TP53 recognition site (for the rs2296147 T allele) and at an E2F1/YY1 response site (for the rs751402 A allele) is associated with higher levels of allele-specific ERCC5 transcripts in the normal human bronchial epithelium among lung cancer patients [51], which is in agreement with our current finding that the rs2296147 C allele may be protective against ESCC. Interestingly, expression data from the HapMap suggested a nonsignificant correlation of the rs2296147 C variant genotypes with low ERCC5 mRNA expression levels in EBV-transformed lymphoblastoid cell lines [33]. One possible reason is that the functional effect of rs2296147 is modest and that the SNP may be in LD either with other untyped functional SNPs, thereby altering the function of ERCC5, or with an adjacent susceptibility gene. Another possibility is that a study with a relatively small sample size has limited statistical power to detect a weak effect. Additionally, rs2296147 in the 5′ UTR may have some effect on post-transcriptional processing, because the 5′ UTR length of ERCC5 may be critical for its recruitment to the polyribosomal cellular fraction, which could increase translation in response to protein kinase C activity as part of the DNA damage response [52]. However, our results would be more conclusive if we had data on ERCC5 mRNA/protein expression by genotype. Unfortunately, the target tissues were not available to us at the time the study was initiated.
Taken together, because we do not have direct biological evidence, it may be premature to conclude that rs2296147 is a causal SNP. There are several potential limitations in the present study that should be addressed. First, although our study was relatively large, the small sample size in subgroup analyses may have limited the statistical power to detect significant associations in each of the strata, let alone the power to assess gene-gene or gene-environment interactions adequately. Second, we used common, potentially functional SNPs instead of tagging SNPs, which may miss some important genetic variations within the gene. Third, because this was a hospital-based case-control study, there may be inherent biases from selection of a non-representative population and retrospective collection of exposure data. Finally, because of our tissue access constraint, this study did not examine the influence of the ERCC5 variants on protein expression in the target tissue. These limitations can only be overcome in larger, well-designed prospective studies in the future.
In summary, our findings suggested that ERCC5 polymorphisms may contribute to risk of ESCC among Eastern Chinese populations, but the effect was weak and needs to be further validated in larger studies in the future.