A Novel Functional TagSNP Rs7560488 in the DNMT3A1 Promoter Is Associated with Susceptibility to Gastric Cancer by Modulating Promoter Activity

DNA methyltransferase 3A (DNMT3A), which comprises the DNMT3A1 and DNMT3A2 isoforms, has been suggested to play a crucial role in carcinogenesis and shows aberrant expression in most cancers. Accumulating evidence also indicates that single nucleotide polymorphisms (SNPs) in DNMT genes are associated with susceptibility to different tumors. We hypothesized that genetic variants in the DNMT3A1 promoter region are associated with gastric cancer (GC) risk. We selected tagSNPs from the HapMap database for the Chinese population and genotyped them in a case-control study to evaluate their association with GC in a Chinese population. We found that the functional tagSNP rs7560488 T>C was associated with a significantly increased risk of GC. In vitro functional analysis by luciferase reporter assay and EMSA indicated that rs7560488 T>C substantially altered the transcriptional activity of the DNMT3A1 gene by influencing the binding of transcription factors, although the specific transcription factor remains to be established. Compared with TT homozygotes, TC heterozygotes and CC homozygotes exhibited reduced expression of DNMT3A1. Furthermore, stratified analysis showed that individuals younger than 60 years who carry the TC or CC genotype were more susceptible to GC. Our results suggest that genetic variation in the DNMT3A1 promoter contributes to susceptibility to GC, and that tagSNP rs7560488 T>C may be a promising biomarker for predicting genetic susceptibility to GC and a source of valuable information on GC pathogenesis.


Introduction
Gastric cancer is one of the most common malignant tumors in China, with particularly high incidence and mortality in Jiangsu province [1,2]. It can spread throughout the stomach and to other organs, including the esophagus, lungs, lymph nodes and liver, and it is the second leading cause of cancer-related death in the world [3]. Surgical resection can be a primary curative treatment for patients with early-stage GC [4]. Unfortunately, most gastric cancers are detected at an advanced stage, by which time the tumor is no longer resectable. Furthermore, relapse after surgery contributes to a poor 5-year survival rate. For patients with advanced or recurrent gastric cancer, biomarkers applied alongside traditional diagnosis could be a valuable aid in formulating prevention and treatment strategies. So far, however, few measurable biomarkers for predicting GC recurrence have been identified.
Tumorigenesis is known to be a multistep process that results not only from genetic alterations but also from epigenetic changes [5]. DNA methylation, the most commonly studied form of epigenetic modification, is an essential regulator of transcription and chromatin structure and plays an essential role in development, differentiation, genomic stability, X-inactivation, and imprinting through specific regulation of gene expression. Aberrant DNA methylation patterns in a genetically susceptible background may be associated with increased risk of a series of human disorders [6,7], including GC [8]. DNMT3A, a de novo DNA methyltransferase that comprises the DNMT3A1 and DNMT3A2 isoforms, plays a crucial role in embryonic development and in aberrant DNA methylation during carcinogenesis. Some polymorphisms of the DNMT3A gene may regulate its expression, influence its enzymatic activity and contribute to susceptibility to cancer. Accumulating evidence in molecular genetics indicates that SNPs in DNMT genes are associated with susceptibility to cancer [9,10]. Recent genome-wide association studies (GWAS) have also identified new susceptibility SNPs for GC, which helps to clarify the mechanisms by which genetic variation contributes to the development of GC [11][12][13][14]. Our previous study found a functional SNP, rs1550117, in the DNMT3A promoter that can increase its transcriptional activity and contribute to genetic susceptibility to gastric cancer in a Chinese population [15,16].
GWAS has yielded numerous SNPs associated with many cancers. TagSNPs, which represent the SNPs in a genomic region of high linkage disequilibrium, can capture genetic variation without genotyping every SNP in a chromosomal region, so they are useful in whole-genome SNP association studies of cancers such as prostate, breast, ovarian, colorectal and brain cancers [17][18][19]. In the present study, we selected the tagSNP rs7560488 from the HapMap database for Chinese subjects to evaluate the association between genetic variants in the DNMT3A1 promoter and gastric cancer risk in a Chinese population. We identified a risk-associated rs7560488 T>C polymorphism in the DNMT3A1 promoter, and our further work suggested that this variant could alter the promoter activity and weaken the binding of transcription factors.

Study Subjects
A total of 405 patients with histologically confirmed gastric cancer and 408 cancer-free controls were recruited in this case-control study; the characteristics of the cases and controls are detailed in Table 1. Cases and controls were matched by age and sex and were selected from the First Affiliated Hospital of Nanjing Medical University. All samples were obtained with written consent and analyzed anonymously. This study was performed with the approval of the Medical Ethical Committee of the Medical School of Southeast University.

TagSNP Selection and the TF Binding Site Prediction
The principal hypothesis underlying this experiment was that one or more SNPs in the DNMT3A1 promoter region are associated with the risk of gastric cancer. Depending on the linkage disequilibrium (LD) structure at a particular locus, tagSNPs may be surrogates for many thousands of other SNPs. We postulated that such tagSNPs are also likely to tag any hitherto identified SNPs in the DNMT3A1 promoter. Thus, we selected SNPs in the DNMT3A1 promoter region with a minor allele frequency (MAF) of >5% from both the HapMap and dbSNP databases. To select potentially functional tagSNPs, we used data from the International HapMap Project together with freely available web-based tagSNP selection tools, and we used the TFSEARCH algorithm (http://mbs.cbrc.jp/research/db/TFSEARCH.html) to predict transcription factor (TF) binding sites at rs7560488.
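The LD-based selection described above can be illustrated with a minimal greedy tagging sketch. This is not the web-based tool used in the study; the function name, the SNP labels and the pairwise r² values are hypothetical, and the thresholds (MAF of at least 5%, r² of at least 0.80) are passed in as parameters.

```python
def select_tag_snps(maf, r2, maf_min=0.05, r2_min=0.80):
    """Greedy LD-based tagging (illustrative sketch, not the study's tool).

    maf: dict mapping SNP id -> minor allele frequency
    r2:  dict mapping frozenset({snp1, snp2}) -> pairwise r^2
    """
    # Keep only common SNPs.
    candidates = sorted(s for s, f in maf.items() if f >= maf_min)

    def tagged_by(snp, pool):
        # A SNP tags itself and every SNP in strong LD with it.
        return {o for o in pool
                if o == snp or r2.get(frozenset((snp, o)), 0.0) >= r2_min}

    untagged = set(candidates)
    tags = []
    while untagged:
        # Pick the candidate that captures the most still-untagged SNPs;
        # ties are broken alphabetically because candidates is sorted.
        best = max(candidates, key=lambda s: len(tagged_by(s, untagged)))
        tags.append(best)
        untagged -= tagged_by(best, untagged)
    return tags


# Hypothetical inputs: snpD is too rare, snpA and snpB are in strong LD.
maf = {"snpA": 0.20, "snpB": 0.18, "snpC": 0.30, "snpD": 0.02}
r2 = {
    frozenset(("snpA", "snpB")): 0.95,
    frozenset(("snpA", "snpC")): 0.10,
    frozenset(("snpB", "snpC")): 0.12,
}
print(select_tag_snps(maf, r2))
```

In this toy input one SNP of the correlated pair is enough to represent both, so only two tags are needed for the three common SNPs.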

DNA Extraction and HRM Genotyping
To study the DNMT3A1 promoter tagSNP rs7560488, genomic DNA was isolated from 1 ml of peripheral blood from patients and healthy individuals; it was extracted from white blood cells by proteinase K digestion within a week after sample collection, as previously described [20]. TagSNP rs7560488 was genotyped using the dsDNA dye LC Green in combination with High Resolution Melting (HRM) analysis. In detail, the PCR primers were designed with the LightScanner primer design software (Idaho Technology) (forward primer: 5'-AGGCAGACACAAATGCATAAAT-3'; reverse primer: 5'-GTCATAAGTACAACCACCACCG-3') and produce a single 208 bp fragment. Each PCR reaction was performed in a final reaction volume of 10 µL, using 25 ng of genomic DNA, 0.2 pmol of each primer, 0.8 µL of 2.5 mM dNTPs, 1 µL of 25 mM MgCl2, 1 µL of 10× Taq buffer with (NH4)2SO4, 0.4 U Taq DNA Polymerase (Fermentas), 1 µL of 1× LC Green PLUS (Idaho Technology) and 0.4 µL of dimethyl sulfoxide (DMSO). The reaction mixture was incubated at 95°C for 5 min and then subjected to 40 cycles of 95°C for 30 sec, 57°C for 30 sec, and 72°C for 30 sec, followed by 72°C for 7 min, using a PTC-200 thermal cycler (Bio-Rad). The PCR reactions were transferred to 96-well plates (Bio-Rad) and analyzed on the LightScanner (Idaho Technology). Fluorescence data were collected over a temperature range of 70-97°C, and melting curve analysis was performed with the manufacturer's software. HRM could directly discriminate the heterozygous (TC) from the homozygous (CC or TT) genotypes of tagSNP rs7560488 T>C through melt scanning. The CC and TT genotypes were then distinguished by mixing the homozygous DNA with an equal amount of a known PCR product (e.g., CC) and repeating the melt analysis. For further confirmation, 5% of the samples from each group detected by HRM were randomly selected and subjected to DNA sequencing to ensure reliability and reproducibility.

Construction of Luciferase Reporter Plasmid
To construct the DNMT3A1 tagSNP rs7560488 reporter plasmids, we amplified by PCR from genomic DNA the 948 bp fragment from 25422345 to 25422345 of the DNMT3A1 promoter region, which contains either the T or the C allele of the SNP. The primers used for the PCR amplifications were: forward, 5'-TACGCTAGCATACCAAGTCCCCATTCCCC-3'; reverse, 5'-GTATAAGCTTTCGGCTTCTACACCCCTCAC-3'. The PCR products were subcloned into the NheI and HindIII restriction sites of the pGL3-Basic vector (Promega, Madison, WI). We verified all recombinant clones by DNA sequencing.

Transient Transfection and Dual Luciferase Reporter Assay
Human gastric cancer AGS and BGC-823 cells (ATCC) were grown in RPMI-1640 medium supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin solution (10 000 U/mL and 10 mg/mL, respectively). AGS and BGC-823 cells (1×10⁵) were seeded in 24-well culture plates and cultured overnight to reach 90%-95% confluence at the time of transfection. After 24 hours of culture, the cells were transfected using Lipofectamine 2000 (Invitrogen, Carlsbad, CA, USA) with 0.8 µg of each constructed vector, carrying either the T allele or the C allele. Simultaneously, 10 ng of pRL-TK plasmid (Promega) per well was co-transfected as an internal control to correct for transfection efficiency. Twenty-four hours after transfection, luciferase activity was measured with the Dual-Luciferase Reporter Assay System (Promega, Madison, WI, USA) and expressed as the ratio of firefly luciferase to Renilla luciferase activity. Three independent transfection experiments were performed under the same conditions, and each luciferase assay was carried out in triplicate.
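The normalization described above (firefly signal divided by the Renilla internal control, averaged over triplicate wells) can be sketched as follows. The function names and all luminescence readings are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def normalized_ratios(firefly, renilla):
    """Firefly/Renilla ratio for each well (corrects for transfection efficiency)."""
    if len(firefly) != len(renilla):
        raise ValueError("each well needs one firefly and one Renilla reading")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_difference(ratios_a, ratios_b):
    """Mean normalized activity of construct A relative to construct B."""
    return mean(ratios_a) / mean(ratios_b)


# Hypothetical triplicate readings (arbitrary luminescence units).
t_allele = normalized_ratios([4000.0, 4200.0, 3800.0], [1000.0, 1000.0, 1000.0])
c_allele = normalized_ratios([2000.0, 2100.0, 1900.0], [1000.0, 1000.0, 1000.0])
print(fold_difference(t_allele, c_allele))
```

Dividing by the Renilla signal before averaging means that a well with poor transfection lowers both readings together rather than distorting the comparison between constructs.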

Electrophoretic Mobility Shift Assay (EMSA)
The 5'-biotinylated oligos, 25 bp in length, were obtained from Beijing Genomics Institute (BGI). Oligo sequences were rs7560488  (BOSTER, China). Complexes were separated by electrophoresis on native 6% PAGE in 0.5× TBE buffer at 110 V. Gels were transferred to Biodyne B pre-cut modified nylon membranes (Pierce/Thermo Fisher Scientific) using a Trans-Blot SD semi-dry transfer cell (Bio-Rad Laboratories). Membranes were cross-linked (UVC-508 UV Cross-linker, Ultra LUM) and the signal was detected with a chemiluminescent detection system (Pierce/Thermo Fisher Scientific) according to the manufacturer's instructions.

Detection of DNMT3A1 Transcripts by Quantitative RT-PCR (Q-PCR)
To further examine the correlation between DNMT3A1 mRNA levels and the rs7560488 polymorphism, total RNA was extracted from 44 gastric cancer tissues with different genotypes using Trizol Reagent (Invitrogen, Inc.). The DNMT3A1 mRNA level was measured by quantitative real-time PCR after reverse transcription on a Prism 7900 Real-Time PCR machine (Applied Biosystems, Foster City, CA). β-actin was used as an internal quantitative control for each sample. The primers used for DNMT3A1 amplification were F: 5'-GAACAGAAGGAGACCAACATCGAA-3' and R: 5'-GCGCTTGCTGATGTAGTAGGG-3'; the primers for β-actin were F: 5'-GACCTCTATGCCAACACAGT-3' and R: 5'-AGTACTTGCGCTCAGGAGGA-3'. Relative quantification of DNMT3A1 mRNA was calculated using the 2^(-ΔΔCT) method, and each assay was done in triplicate.
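For readers unfamiliar with the 2^(-ΔΔCT) calculation, a minimal sketch is given below. The function name and the CT values are hypothetical; the reference gene stands in for β-actin and the calibrator for whichever sample group is taken as the baseline.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative mRNA level by the 2^(-ddCT) method.

    ct_target / ct_reference:         CT of the target and reference gene in the sample
    ct_target_cal / ct_reference_cal: the same two CT values in the calibrator sample
    """
    delta_ct_sample = ct_target - ct_reference               # normalize to reference gene
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator   # compare with calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical CT values: the target amplifies one cycle later in the sample
# than in the calibrator, with identical reference-gene CT values.
print(relative_expression(26.0, 18.0, 25.0, 18.0))  # prints 0.5
```

A one-cycle delay corresponds to half the starting template, which is why the example returns 0.5 under the method's assumption of roughly 100% amplification efficiency.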

Statistical Analyses
All data were analyzed with SPSS version 13.0 (SPSS Inc., Chicago, IL, USA). Patients and controls were compared using Student's t-test for continuous variables and the chi-square (χ²) test for categorical variables. Allele and genotype frequencies in control and GC subjects were compared using the chi-square test, and the standard goodness-of-fit test was used to test for Hardy-Weinberg equilibrium. A P value of less than 0.05 was considered statistically significant.
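The Hardy-Weinberg goodness-of-fit test mentioned above can be sketched in a few lines. The study itself used SPSS; this standalone version is only an illustration, and the control genotype counts in the usage example (326 TT, 75 TC, 7 CC) are an assumption, back-calculated from the percentages reported in the Results for 408 controls rather than taken from Table 2.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Returns (chi2, p_value). Counts are the observed numbers of AA, AB and BB
    genotypes for a biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                        # frequency of allele B
    if p == 0.0 or q == 0.0:
        raise ValueError("both alleles must be observed")
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed control counts (TT, TC, CC), back-calculated from reported percentages.
chi2, p_value = hwe_chi_square(326, 75, 7)
print(round(chi2, 3), round(p_value, 3))
```

With these assumed counts the P value lands close to the 0.274 reported for the controls, which is above 0.05 and therefore consistent with Hardy-Weinberg equilibrium.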

Characteristics of Study Subjects
The frequency distributions of the cases and controls are presented in Table 1. There was no significant difference in the frequency distributions between the cases and controls (P = 0.243 for age and P = 0.355 for sex). The average age of patients and controls was 59.8 years (range 20-93 years) and 60.6 years (range 25-90 years), respectively. No significant difference was found in average age or sex, suggesting that matching on these two variables was adequate.

Candidate tagSNP Selection and Genotyping
Among the candidate SNPs in DNMT3A1, we focused on the tagSNPs in the DNMT3A1 promoter and predicted their potential effect on the binding of transcription factors, which affect the qualitative and quantitative expression of DNMT3A1. We applied an LD-based tagSNP selection algorithm (r² ≥ 0.80, MAF ≥ 5%), which identified two tagSNPs representing common genetic variation in the CHB population: the candidate tagSNP rs7560488 and rs1550117, a functional polymorphism that we previously confirmed to modify susceptibility to gastric cancer [15,16]. The TFSEARCH algorithm predicted that rs7560488 T creates a binding site for AP-1 (Figure 1). The samples were genotyped by HRM and sequenced on an ABI 3730 automated sequencer (Figure 2).

TagSNP rs7560488 Variant T>C in DNMT3A1 Promoter Significantly Increases the Risk of GC
The genotype distributions and allele frequencies of rs7560488 are presented in Table 2. The genotype frequencies in the controls were in agreement with the Hardy-Weinberg model (P = 0.274). As shown in Table 2, the genotype frequencies of rs7560488 were 68.9%, 27.4%, and 3.7% for the TT, TC, and CC genotypes among the cases, and 79.9%, 18.4%, and 1.7% among the controls, respectively; the difference between the cases and controls was statistically significant (P < 0.05). The T allele frequency was significantly lower among cases than controls (82.6% versus 89.2%, P < 0.001), and the combined TC/CC genotype frequency was higher among cases than controls (31.1% versus 20.1%, P = 0.002). Taking the TT genotype and the T allele as references, we found that the variant genotypes (TC and CC) were associated with an increased risk of GC (OR = 1.653, 95% CI = 1.194-2.287; P = 0.002). Similarly, the C allele frequency was significantly higher in cases than in controls (OR = 1.744, 95% CI = 1.310-2.321; P < 0.001). Taken together, these data suggest that the TC and CC genotypes are associated with genetic susceptibility to GC and that the DNMT3A1 rs7560488 T allele may be a putative protective allele. Among GC patients, the frequencies of rs7560488 did not differ significantly between those aged >60 years and those aged ≤60 years (P = 0.756), or between males and females (P = 0.459) (Table 3).
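As a reminder of how an odds ratio and its 95% confidence interval are obtained from a 2×2 table, a minimal sketch using the standard log (Woolf) method follows. The counts are hypothetical and are not the study data; the ORs reported above were produced in SPSS and may rest on a different model, so this sketch is not expected to reproduce them.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Crude odds ratio with a Woolf (log-based) confidence interval.

    "Exposed" here means carrying the variant genotype (or allele);
    z = 1.96 gives an approximate 95% interval.
    """
    cells = (case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # Standard error of ln(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical 2x2 table: 20 of 100 cases and 10 of 100 controls carry the variant.
print(odds_ratio_ci(20, 80, 10, 90))
```

An interval that excludes 1 corresponds to a significant association at the 5% level; in this toy table the lower bound falls just below 1, so the apparent doubling of odds would not be significant.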

Individuals Less than 60 Years Old Were More Susceptible to Gastric Cancer with tagSNP rs7560488 Variant T>C
Age and sex are important factors in carcinogenesis, including that of gastric cancer. When the analyses were stratified by the age and sex of the patients, individuals carrying the TC/CC genotypes showed increased genetic susceptibility to GC in both the male and the female groups, and the rs7560488 C allele was associated with a significantly increased risk compared with the T allele (Table 4). Further stratification evaluated the association of rs7560488 T>C with gastric cancer at different ages. The TC/CC genotypes were associated with genetic susceptibility to GC in individuals aged ≤60 years (OR = 1.794, 95% CI = 1.118-2.877; P = 0.015) but not in those older than 60 years; similarly, the C allele frequency was significantly higher in cases than in controls (OR = 1.720, 95% CI = 1.127-2.622; P = 0.011). These results suggest that the TC and CC genotypes are associated with genetic susceptibility to GC, particularly in individuals aged 60 years or younger (Table 4).

The rs7560488 T>C Variant Affects DNMT3A1 Transcriptional Activity
To evaluate the functional effect of the rs7560488 polymorphism on DNMT3A1 transcription, we constructed luciferase reporter vectors (pGL3) spanning bases 4389823 to 4390770 of the DNMT3A1 promoter, carrying either the wild type (T allele) or the mutant type (C allele), and transfected them into BGC-823 and AGS cells (Figure 3A). As shown in Fig. 3B, the transcriptional activity of the T allele was approximately 2-fold higher than that of the C allele in both cell lines, suggesting that the rs7560488 T allele may protect against gastric cancer by increasing the transcription of DNMT3A1.

The rs7560488 T>C Variant Attenuates Transcription Factor Affinity
Because tagSNP rs7560488 is located in the DNMT3A1 promoter region, we hypothesized that it might alter the binding of a transcription factor (TF). Indeed, using the TFSEARCH algorithm (www.cbrc.jp/research/db/TFSEARCH.html), we predicted that rs7560488 T creates a TF binding site for AP-1. To determine whether this polymorphism affects transcription factor binding, we conducted an electrophoretic mobility shift assay (EMSA) to analyze the binding of oligo probes containing either the T or the C allele to nuclear proteins extracted from AGS cells. As shown in Fig. 4A, a specific shifted DNA/nuclear protein complex band was generated by both the C and the T allele probes (Fig. 4A lanes 2, 5). However, binding to the T allele probe was not fully competed away (Fig. 4A lane 4), whereas the shifted band of the C allele probe was abolished by a 50-fold excess of unlabeled C probe (Fig. 4A lane 1). This suggests that the sequence containing the rs7560488 T allele binds more strongly than the one containing the C allele, and that the transcription factor might preferentially bind to the T allele. Moreover, super-EMSA using an AP-1 antibody did not cause a supershift of the biotin-labeled probe/nuclear protein complex (Fig. 4B lanes 2, 5), indicating that AP-1 may not be the transcription factor that binds to the promoter region containing the T or C allele. These results indicate that the rs7560488 C allele decreases nuclear protein binding activity, although this effect does not appear to be mediated by the transcription factor AP-1.

Association between DNMT3A1 rs7560488 Polymorphism and the Expression Levels of DNMT3A1 mRNA
Forty-four gastric cancer tissues with different genotypes of DNMT3A1 rs7560488 were available in the present study. Because of the low frequency of the CC genotype, we combined it with the TC genotype for analysis. As shown in Fig. 5, the expression level of DNMT3A1 mRNA was lower in individuals with the TC or CC genotype than in those with the TT genotype (P < 0.05).

Discussion
Genome-wide hypomethylation and promoter hypermethylation are hallmarks of a great variety of cancers and contribute to tumorigenesis, and DNA methylation plays key roles in regulating gene expression and maintaining genomic stability [21,22]. DNA methylation is performed by the DNA methyltransferases (DNMTs) DNMT1, DNMT3B and DNMT3A [23,24]. The de novo methyltransferase DNMT3A is highly expressed during early embryonic development and down-regulated in most differentiated somatic cells [25]. The role of DNMT3A in human cancer was highlighted by reports of DNMT3A mutations in approximately 20% of patients with acute myeloid leukemia [26,27]. The occurrence of these mutations correlated with reduced enzymatic activity and with genomic regions of decreased methylation. DNMT3A mutations were also identified in 8% of patients with myelodysplastic syndrome [28]. DNMT3A also plays a critical role in the epigenetic silencing of hematopoietic stem cell (HSC) regulatory genes, enabling efficient differentiation [29].
The DNMT3A genomic locus produces two transcripts giving rise to two proteins, the longer DNMT3A1 and the shorter DNMT3A2, which differ in that a 219-amino-acid amino (N)-terminal tail is present only in DNMT3A1 [30,31]. The N-terminal domain of DNMT3A1 is called a "regulatory" domain because it does not possess enzymatic DNA methyltransferase activity. This domain does not share significant homology with any other known protein. DNMT3A1 is concentrated in heterochromatin, which is considered to be transcriptionally silent, and functions primarily as a transcriptional repressor [30]. However, other research showed that DNMT3A1 was efficiently recruited to the silenced Oct3/4 and the activated vitronectin (Vtn) gene promoters via its unique N-terminal domain [32].
It has been reported that genetic variations in the DNMT3A gene contribute to carcinogenesis and are associated with GC in particular [15,33-36]. Further exploration of the relationship between SNPs and the translational regulation of their target genes has therefore been proposed. However, ascertaining the biological function of each SNP often requires time-consuming molecular biology experiments. Thus, analyzing the large number of SNPs linked to any particular locus in practice requires systematic bioinformatic evaluation and prioritization to narrow the set of likely functional candidate variants. Because most SNPs are in LD, haplotype-based association studies are considered more powerful than single-SNP analysis for identifying causal genetic variants underlying the etiology of complex diseases such as cancer [37]; moreover, the use of tagSNPs that capture most of the haplotypic diversity in association studies has been suggested [38]. Although GWAS has yielded numerous SNPs or tagSNPs significantly associated with cancer, most of these tagSNPs lie in non-protein-coding regions (intergenic and intronic regions). Identifying their functional and/or causal variants remains an important limitation of GWAS data interpretation, and assigning putative functionality to GWAS tagSNPs has been successful only when fine mapping around a known risk region was performed [39][40][41].
In the present study, we selected a putative functional tagSNP, rs7560488, which represents the SNPs of the DNMT3A1 promoter that are in high linkage disequilibrium with it; this makes it possible to identify genetic variation without genotyping every SNP in the DNMT3A1 promoter region and improves the efficiency of the association analysis. We observed that subjects carrying the rs7560488 TT genotype had a significantly reduced gastric cancer risk compared with individuals carrying the TC or CC genotype, indicating that the T allele of this tagSNP may be protective. Moreover, our assays provided further evidence that the TC and CC genotypes are associated with decreased expression of DNMT3A1 mRNA in gastric cancer tissues. These results suggest that the DNMT3A1 tagSNP rs7560488 T>C polymorphism may regulate the expression of DNMT3A1 and thereby contribute to GC susceptibility. These data also indicate that DNMT3A1 may play a role in the progression of gastric cancer, but this finding needs to be confirmed in a larger population study. To our knowledge, this is the first report demonstrating that DNMT3A1 transcription is directly influenced by the functional tagSNP rs7560488 in the DNMT3A1 promoter region and that rs7560488 T>C is associated with a significantly increased risk of GC. Next, we performed an EMSA experiment to analyze the biological consequences of the rs7560488 polymorphism in BGC-823 cells. Both the T allele and the C allele probes showed two gel-shift bands, but competition experiments showed that the binding affinity of the T allele probe for nuclear proteins was greater than that of the C allele probe. The enhanced DNA-protein binding ability of the T allele may therefore be responsible for the increased DNMT3A1 promoter activity that we observed in our promoter assays.
The super-EMSA experiment with AP-1 antibodies did not produce a supershift, indicating that the transcription factor AP-1 may not be involved in the formation of transcriptional complexes at the rs7560488 site. Another novel result came from the stratified analysis of age and the rs7560488 T>C polymorphism, which implied that individuals younger than 60 years carrying rs7560488 T>C were more susceptible to gastric cancer. It is likely that DNMT3A1 affects the transcription of specific genes and that changes in certain genes increase the risk of GC. These results also suggest that rs7560488 T>C is a stronger risk factor for GC in younger individuals.
Taken together, this study provides the first mechanistic insight into how the novel functional tagSNP rs7560488 T>C variant is significantly associated with the risk of gastric cancer in a Chinese population. We found that the T to C change substantially altered the transcriptional activity of the DNMT3A1 gene by influencing the binding of transcription factors, thereby changing the DNMT3A1 expression level, although the specific transcription factor remains to be established. Our findings suggest that the DNMT3A1 promoter rs7560488 T>C variation is a promising biomarker for evaluating population susceptibility to GC and provide valuable information for future research on gastric cancer pathogenesis.