Association among Polymorphisms in EGFR Gene Exons, Lifestyle and Risk of Gastric Cancer with Gender Differences in Chinese Han Subjects

Background The epidermal growth factor receptor (EGFR) gene plays a key role in tumor survival, invasion, angiogenesis, and metastatic spread. Recent studies showed that gastric cancer (GC) was associated with polymorphisms of the EGFR gene and environmental influences, such as lifestyle factors. In this study, seven known SNPs in EGFR exons were investigated in a high-risk Chinese population in Jiangsu province to test whether genetic variants of EGFR exons and lifestyle are associated with an increased risk of GC. Methodology/Principal Findings A hospital-based case-control study was performed in Jiangsu province. The results showed that smoking, drinking and preference for salty food were significantly associated with the risk of GC. Differences in lifestyle between males and females might account for the higher incidence rates in males than in females. Seven exon SNPs were genotyped: rs2227983, rs2072454, rs17337023, rs1050171, rs1140475, rs2293347, and rs28384375. The variant rs2072454 T allele and TT genotype were significantly associated with an increased risk of GC. Interestingly, our results suggested that the ACAGCG haplotype might be associated with a decreased risk of GC. However, no significant association was observed between the other six SNPs and the risk of GC in either the total population or the age-matching population, even after stratification by gender. Conclusions Smoking, drinking and preference for salty food were significantly associated with the risk of GC in Jiangsu province, with gender differences. Although only one SNP (rs2072454) was significantly associated with an increased risk of GC, analyzing the six EGFR exon SNPs in combination may be useful for predicting the risk of GC.


Introduction
Although both the incidence and mortality of gastric cancer (GC) have declined in recent years, GC was the fourth most common malignancy in the world in 2008, with approximately 989,600 new cases. Men generally develop GC twice as frequently as women and about 72% of new cases occur in developing countries. In general, the highest incidence rates are in Asia, particularly in East Asian countries such as Korea, Japan, and China [1].
Indeed, almost 40% of all GC cases occur in China, and there is a remarkable geographical variation in GC rates throughout China [2]. More than two-thirds of the patients diagnosed with GC in China have unresectable disease and a median survival of six to nine months. Moreover, in patients with resectable tumors, the local and distant recurrence rates are high, and the 5-year survival rate is less than 30% [3]. Sun et al. [4] assessed the impact of GC on the Chinese population by epidemiological analysis of its mortality distribution from 1990 to 1992; the results showed that GC was the leading cause of cancer-related mortality in China.
It is now generally accepted that the pathogenesis of GC involves a multi-factorial interaction between environmental triggers and genetic susceptibility. Epidemiological studies have identified age, gender differences and a number of environmental factors that may contribute to the development of GC, including a salty diet, tobacco smoking, alcohol consumption and Helicobacter pylori infection [5][6][7][8]. Other studies have identified host factors and genetic alterations that also play an important role in the development and progression of GC through gene-environment interactions [9,10].
The epidermal growth factor receptor (EGFR) gene is located on the short arm of human chromosome 7 and produces a glycoprotein with a molecular weight of 170 kDa that has a high affinity for its ligands, including epidermal growth factor (EGF) and transforming growth factor alpha (TGF-α). EGFR participates in several essential tumorigenic mechanisms, such as tumor survival, invasion, angiogenesis, and metastatic spread. EGFR expression has been observed in numerous human tumors, and several studies demonstrated that overexpression of EGFR correlates with a poor outcome [11]. In GC, EGFR overexpression correlates with advanced tumor stage and a poor clinical outcome [12]. However, the roles that EGFR overexpression and genetic alterations play in gastric carcinogenesis remain unclear. Moreover, only a few single nucleotide polymorphisms (SNPs) have been found to be associated with GC development and outcome [11,13].
In this study, we hypothesized that environmental exposures and gender differences act as effect modifiers on a background of genetic variation in EGFR exons, which may affect EGFR function and thereby shape GC susceptibility. To test this hypothesis, a hospital-based study was performed in which 387 GC cases and 392 cancer-free controls in a high-risk Chinese population were genotyped for seven known SNPs in EGFR exons, namely, rs2227983 A>G, rs2072454 C>T, rs17337023 A>T, rs1050171 G>A, rs1140475 C>T, rs2293347 G>A, and rs28384375 T>C.

Recruitment of Cases and Control Participants
Initially, 401 GC cases and 420 controls were identified. Three cases lacking questionnaires and 11 cases with tumors other than adenocarcinoma were excluded; 18 controls were excluded because of abnormal serum cancer-related biomarkers and 10 controls because of geographical deviation. Overall, a total population of 387 cases and 392 controls was available for the current study on the basis of prospective power analyses. Because the controls were about 10 years younger than the cases, an age-matching population of 294 cases and 294 controls was extracted from the total population. All subjects were genetically unrelated ethnic Han Chinese. The patients with primary GC were recruited from the Department of Surgical Gastroenterology in the Jiangsu Province Hospital of Traditional Chinese Medicine (TCM) in Nanjing city, Jiangsu province, between January 2008 and July 2010. The cancer-free healthy controls were consecutively recruited from Jiangsu Hospital of TCM among hospital visitors attending an annual check-up during the same period. To be included in the study, the patients (a) had to be males or females over 20 but under 80 years old, (b) had to be of Han Chinese ethnicity (self-reported), (c) had to come from three regions of Jiangsu province and to have been local residents for at least 5 years, (d) had to have newly histopathologically diagnosed primary GC, (e) had to lack previous malignant tumors in other organs, and (f) must not have received antitumor therapy before recruitment, including chemotherapy and radiotherapy. Trained interviewers used a pre-tested questionnaire to collect epidemiological data from the participants, namely, demographic factors such as age and gender, and known risk factors for GC (such as tobacco smoking, alcohol consumption, and a family history of digestive tract cancer).
Individuals who smoked one or more times a day for over a year were defined as smokers, and those who consumed three or more alcoholic drinks a week for over 6 months were considered to be chronic drinkers [14,15]. After signing informed consent forms, each subject donated 3-5 ml of peripheral blood to be used for genomic DNA extraction. The research protocol was approved by the institutional review board of Jiangsu Province Hospital of TCM.

Genomic DNA Isolation from Peripheral Blood Cells
A commercial blood DNA extraction kit (AxyPrep-96 kit, Axygen, CA, USA) was used to extract genomic DNA from the blood samples. The purified DNAs were stored at -20 °C until they were used for genotype testing. The quality of DNA was assessed by agarose gel electrophoresis.

Genotyping
Polymerase chain reaction-ligation detection reaction (PCR-LDR) methods were used for genotyping [16]. Primers were synthesized by Shanghai Sangon Biological Engineering Technology and Services (No.698, Xiangmin Rd, Songjiang District, Shanghai, China). Each set of ligase detection reaction probes comprised one common probe and two discriminating probes for the two types (Table S1).
The target DNA sequences were amplified using a multiplex PCR method. PCRs for each subject were carried out in a final volume of 20 µl containing 1× PCR buffer, 3.0 mmol/l MgCl2, 2.0 mmol/l deoxynucleotide triphosphates, 1 µl primers, 0.2 µl QIAGEN HotStarTaq Polymerase (QIAGEN, China), 4 µl of 1× Q-solution, and 10-20 ng genomic DNA. Thermal cycling was performed for five SNPs (rs2227983, rs17337023, rs1140475, rs2293347 and rs2072454) in the GeneAmp PCR System 9600 (PerkinElmer) with an initial denaturation for 15 min at 95 °C, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 59 °C for 1 min, and extension at 72 °C for 1 min, followed by a final extension at 72 °C for 7 min. The protocol for rs28384375 amplification consisted of an initial denaturation for 15 min at 95 °C, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 56 °C for 1 min, and extension at 72 °C for 1 min, followed by a final extension at 72 °C for 7 min. The protocol for rs1050171 consisted of an initial denaturation for 15 min at 95 °C, followed by 35 cycles of denaturation at 94 °C for 30 s, annealing at 53 °C for 1 min, and extension at 72 °C for 1 min, followed by a final extension at 72 °C for 7 min.
The ligation reaction for each subject was carried out in a final volume of 10 µl containing 1× NEB Taq DNA ligase buffer, 12.5 pmol of each probe mix, 0.05 µl Taq DNA ligase [NEB Biotechnology (Beijing)], and 1 µl of multi-PCR product. The probe sequences are shown in Table S2. In total, 35 cycles consisting of 95 °C for 2 min, 94 °C for 30 s, and 60 °C for 2 min were performed. The fluorescent products of the ligase detection reactions were differentiated by an ABI sequencer 377 (Figure S1). To confirm the accuracy of the PCR-LDR genotyping method, direct DNA sequencing of randomly selected PCR products was performed. About 5% of the samples were sequenced. The PCR-LDR genotyping results showed complete agreement with the direct DNA sequencing results.
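The three amplification protocols described here differ only in annealing temperature, which can be captured in a small data structure. The following is a minimal sketch for illustration only; the names `PcrProgram`, `ANNEAL_BY_SNP` and `program_for` are hypothetical and are not part of the original protocol, and the computed time covers programmed hold steps only (instrument ramp time is ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """One thermal cycling program; defaults follow the steps stated in the text."""
    anneal_c: int                      # annealing temperature in degrees C
    initial_denature_min: float = 15.0 # 95 C initial denaturation
    cycles: int = 35
    denature_s: int = 30               # 94 C
    anneal_s: int = 60
    extend_s: int = 60                 # 72 C
    final_extend_min: float = 7.0      # 72 C final extension

    def hold_time_min(self) -> float:
        """Total programmed hold time in minutes (ramp time not included)."""
        per_cycle_s = self.denature_s + self.anneal_s + self.extend_s
        return (self.initial_denature_min
                + self.cycles * per_cycle_s / 60
                + self.final_extend_min)


# Annealing temperature per SNP, as listed in the text.
ANNEAL_BY_SNP = {
    "rs2227983": 59, "rs17337023": 59, "rs1140475": 59,
    "rs2293347": 59, "rs2072454": 59,
    "rs28384375": 56,
    "rs1050171": 53,
}


def program_for(rsid: str) -> PcrProgram:
    """Return the cycling program for a given SNP identifier."""
    return PcrProgram(anneal_c=ANNEAL_BY_SNP[rsid])


if __name__ == "__main__":
    for rsid in ANNEAL_BY_SNP:
        prog = program_for(rsid)
        print(rsid, prog.anneal_c, "C,", prog.hold_time_min(), "min")
```

Because only the annealing step differs, every program has the same programmed hold time, so the three groups of SNPs could in principle be run back to back on one instrument with a single parameter change.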

Statistical Analyses
All statistical analyses were conducted using SPSS software version 16.0 (SPSS Inc., Chicago, IL, USA). To compare the observed genotype frequencies in the control group with the expected frequencies, Hardy-Weinberg equilibrium (HWE) was tested by a goodness-of-fit χ² test. Cases and controls were compared in terms of demographic characteristics, lifestyle factors, and the allele frequencies of each SNP using the χ² test. For each polymorphism, odds ratios (OR) and 95% confidence intervals (CI) were calculated from conditional logistic regression models to estimate the main effect of each polymorphism on GC while adjusting for continuous age, gender and lifestyle factors. Logistic regression analyses using the major allele as a reference were employed to estimate adjusted ORs, 95% CIs, and P values. However, we found that the gender and age variables did not conform to the assumption of proportionality. Thus, gender stratification and age-matching were used to estimate ORs and 95% CIs. For lifestyle variables that showed significant relations with GC in analyses controlled for matching factors only, we further assessed their independence in analyses adjusted for additional potential confounders (i.e., smoking, drinking, salty food, eating time and eating breakfast) in the age-matching population. The PHASE software package version 2.0 was used to infer haplotype frequencies based on the observed EGFR genotypes. All P values were two-sided and a P value <0.05 was considered statistically significant.
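The HWE goodness-of-fit test mentioned above can be sketched in a few lines. This is an illustrative sketch only, not the SPSS procedure used in the study; the function name `hwe_chi_square` and the genotype counts in the usage example are hypothetical. It assumes a biallelic SNP with both alleles observed (otherwise an expected count is zero and the statistic is undefined).

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Takes observed genotype counts (AA, AB, BB) and returns
    (chi-square statistic, P value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical control genotype counts, for illustration only.
    chi2, p_value = hwe_chi_square(120, 190, 82)
    print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P value above 0.05 from this test is the usual criterion for treating the control genotypes as consistent with HWE.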

Characteristics of the Study Subjects
The characteristics of these study subjects were consistent with those previously described [17,18]. In this paper, the total population of 387 cases and 392 controls and an age-matching population of 294 cases and 294 controls were included in the analyses. The gender, age, and geographic region distributions of the study subjects are shown in Table 1.
To examine the differences in GC incidence between males and females, the distributions of selected lifestyle variables were analyzed by gender. The results showed that six lifestyle factors (i.e., regularly taking meals, preference for salty food, eating time, smoking status, drinking status, and eating breakfast) were significantly correlated with the risk of GC, and their effects might be modified by gender and age. Four lifestyle factors, i.e., preference for salty food, eating time, smoking status and drinking status, differed significantly between males and females (P<0.05) (Table 2). The percentages of males with unhealthy habits, such as preference for salty food, short eating time, smoking and drinking, were significantly higher than those of females (P<0.05), which might account for the higher incidence rates in males than in females. In the total population, regularly taking meals and eating breakfast decreased the risk of GC in females, while drinking increased the risk of GC in males (Table 3). In addition, both preference for salty food and smoking increased the risk of GC in males and females (P<0.05). Similar results were found after age-matching: regularly taking meals decreased the risk of GC, and preference for salty food and smoking increased the risk of GC in both males and females (P<0.05, Table 4). However, there were two exceptions: drinking increased the risk of GC in males (P<0.05) but not in females (P>0.05), perhaps because of the limited female population, while eating breakfast daily decreased the risk of GC in females (P<0.05) but not in males (P>0.05) (Table 4).

Genotyping Distribution and Risk of GC
In the present study, only the T allele of the missense locus rs28384375 was detected. For the six remaining SNPs, the observed genotype frequencies in the controls were in HWE. Because adjustment did not change statistical significance, only the adjusted results are shown in the tables. In the total population, the association between the distribution of EGFR gene alleles and genotypes and the risk of GC is shown in Table 5. Only a slight difference was observed in the allelic distribution of rs2072454 (χ² = 3.844, P = 0.050), and logistic regression analyses revealed that the variant allele may be associated with GC risk (adjusted OR = 1.23, 95% CI = 1.00-1.50). However, no significant differences between cases and controls in allele distribution were observed after stratification by gender. The rs2293347 polymorphism was associated with GC in males because 73.0% of GC cases carried a G allele compared with 29.5% of healthy controls (adjusted OR = 6.45, 95% CI = 4.89-8.51, P<0.001), but no genotypes were significantly associated with the risk of GC. In contrast, in the female population the GA genotype appeared to decrease the risk of GC, with 34.5% of GC cases carrying a GA genotype compared with 38.9% of healthy controls, but there were no significant differences in the distribution of allele and genotype frequencies (P>0.05). In addition, no significant differences between cases and controls were observed for the other four SNPs (P>0.05), i.e., rs2227983, rs17337023, rs1050171 and rs1140475, even after stratification by gender.
In the age-matching population, the association between the distribution of EGFR gene alleles and haplotypes and the risk of GC is shown in Table 6. Unlike the results in Table 5, a significant difference was observed in the allelic distribution (χ² = 4.795, P = 0.029) and genotype distribution (χ² = 6.668, P = 0.036) of rs2072454. In addition, logistic regression analyses revealed that the variant rs2072454 T allele could increase the risk of GC (adjusted OR = 0.77, 95% CI = 0.61-0.97; P<0.05). For the genotype distribution, a significant difference was also observed in the male population (χ² = 7.914, P = 0.019) but not in the female population (χ² = 0.452, P = 0.798). However, there were no significant differences between cases and controls for the remaining SNPs, namely, rs2227983, rs17337023, rs1140475, rs1050171 and rs2293347, even after stratification by gender (P>0.05).
To exclude environmental influence, we investigated the association between the three potentially GC-related SNPs and the risk of GC in the subset of the age-matching population with a healthy lifestyle (Table 7). One distinct difference was observed in the genotype distribution of rs2072454 (χ² = 6.036, P = 0.049). Specifically, logistic regression analyses revealed that the TT genotype significantly increased the risk of GC in the population eating breakfast daily (adjusted OR = 1.92, 95% CI = 1.07-3.44, P<0.05).
Exons not only encode the amino acid sequence of the protein, but also contain sequences that influence translation or mRNA degradation [19]. The loci were combined and subjected to haplotype inference analysis using the PHASE 2.0 program. There were four possible haplotypes in the total population and three possible haplotypes in the age-matching population with a frequency of >4% (Table 8, Table 9). Compared to the GTTGCG haplotype, logistic regression analyses revealed that the ACAGCG haplotype was associated with a significantly decreased risk of GC in the total population (OR = 0.67, 95% CI = 0.49-0.92, P<0.05, adjusted for age, gender and lifestyle factors) (Table 8) but not in the age-matching population (OR = 0.83, 95% CI = 0.58-1.19, P>0.05, adjusted for age, gender and lifestyle factors) (Table 9). The other haplotypes were associated with a significantly decreased risk of GC (adjusted OR = 0.69, 95% CI = 0.52-0.90 for the total population; adjusted OR = 0.74, 95% CI = 0.56-0.99 for the age-matching population). Compared to the ACAGCG haplotype, the ACAACG haplotype could decrease the risk of GC (OR = 0.61, 95% CI = 0.40-0.94, adjusted for age, gender and lifestyle) in the total population, but the P value did not reach statistical significance after Bonferroni correction (Table 8).
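For readers who want to see how an allele-level χ² statistic and an odds ratio of the kind reported above are obtained, the following minimal sketch computes an unadjusted 2×2 comparison. It is an illustration under stated assumptions, not the adjusted logistic regression run in SPSS for this study: the function name `allele_association` and the allele counts are hypothetical, and the 95% CI uses the standard Wald approximation on the log odds ratio.

```python
import math


def allele_association(case_var: int, case_ref: int,
                       ctrl_var: int, ctrl_ref: int):
    """Unadjusted allele test on a 2x2 table.

    Returns (Pearson chi-square with 1 df, P value, odds ratio,
    (lower, upper) 95% Wald confidence limits). All four cells must be > 0.
    """
    a, b, c, d = case_var, case_ref, ctrl_var, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square upper tail, 1 df
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, p_value, odds_ratio, (lower, upper)


if __name__ == "__main__":
    # Hypothetical allele counts (variant, reference) in cases and controls.
    chi2, p_value, odds_ratio, ci = allele_association(330, 444, 300, 484)
    print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}, "
          f"OR = {odds_ratio:.2f}, 95% CI = {ci[0]:.2f}-{ci[1]:.2f}")
```

Adjusted estimates such as those in Table 5 additionally condition on covariates (age, gender, lifestyle), so they will generally differ from the crude values produced by this sketch.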
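The Bonferroni correction referred to above multiplies each P value by the number of comparisons (capped at 1) before comparing it with the significance level. A minimal sketch follows; the function name `bonferroni` and the P values in the usage example are hypothetical and are not taken from Table 8.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for multiple comparisons.

    Returns a list of (adjusted P value, is_significant) tuples, where the
    adjusted P value is min(1, p * m) for m tests.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)
        results.append((adjusted, adjusted < alpha))
    return results


if __name__ == "__main__":
    # Hypothetical unadjusted P values for three haplotype comparisons.
    for adjusted, significant in bonferroni([0.012, 0.024, 0.40]):
        print(f"adjusted P = {adjusted:.3f}, significant = {significant}")
```

This shows how a nominally significant comparison (for example, an unadjusted P value between 0.05/m and 0.05) can lose significance once the number of haplotype comparisons is taken into account.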

Discussion
Although a great number of new GC cases have been seen throughout the world in recent decades, the exact mechanisms underlying gastric carcinogenesis are not yet fully understood. Similar to previous research [20,21], our results showed that men develop GC about twice as frequently as women in China. Michael et al. suggested that much of the global variation in cancer incidence can be attributed to environmental influences, including dietary preferences and unhealthy lifestyle factors [22]. In the present study, therefore, we tested whether some unhealthy lifestyle factors could increase the risk of GC, with gender differences, in China. Six lifestyle factors, including regularly taking meals, preference for salty food, eating time, smoking status, drinking status, and eating breakfast, were found to influence the risk of GC with gender differences, which was consistent with previous studies in east China [20,23,24]. In particular, preference for salty food, drinking and smoking were the strongest and most consistent risk factors for GC.
Recent studies suggested that a high intake of salt (sodium) could increase the risk of GC [21,24-26]. This was supported by our observation that the GC patients in Jiangsu province preferred salty foods, such as salted meat, pickled vegetables, and pickled vegetable juice, which might be contaminated by N-nitroso compounds. N-nitroso compounds are the agents most frequently proposed to be related to the increased risk of upper gastrointestinal cancers [27]. Drinking and smoking, two other dominant risk factors for GC worldwide [28], were also examined in this study. We found a significant association between drinking and GC risk in Jiangsu province. A laboratory study showed that smoking could increase apoptosis in the rat gastric mucosa through an increase in XO activity [29]. Alcohol can also influence acid secretion, gastric emptying, and certain acid-related diseases, such as gastritis accompanied by damage to the gastric mucosa, and the subsequent inflammatory reaction in turn promotes gastric cell proliferation and differentiation [30]. In this process, the N-nitroso compounds and mycotoxins from some salty foods may induce gene mutations [31,32]; thus, preference for salty food may be the initiating risk factor for GC. Evidence also pointed to an association with pathways involved in developmental processes [33]. Key molecules of these pathways are the receptor tyrosine kinases, which were found to be aberrantly activated or overexpressed in a variety of tumors and therefore represent promising targets for therapeutic intervention [13].
EGFR is one of these key molecules, and many lines of evidence suggest that highly invasive GC is associated with the aberrant activation or overactivation of EGFR due to gene amplification or structural alterations [13]. Very recently, a case-control study of 61 cases and 20 controls in Henan province, located in central China, showed that the EGFR rs28384375 C/T polymorphism may promote the occurrence and development of GC [34]. Moreover, another case-control study of 138 cases and 170 controls in Jiangxi province, located in south-east China, revealed that the EGFR rs763317 G/A polymorphism may be associated with an increased risk of GC [35].
EGFR is a growth factor receptor tyrosine kinase and belongs to the receptor tyrosine kinase superfamily, whose members are characterized by an extracellular domain (where ligand binding takes place), a short lipophilic transmembrane domain, and an intracellular domain that harbors the tyrosine kinase activity [36]. EGFR can be activated by binding ligands, such as EGF and TGF-α, and it plays pivotal roles in development, proliferation and differentiation. The activation of EGFR may contribute to the transformation of cellular phenotypes and provide tumor cells with substantial growth and survival advantages [37]. Many human tumors exhibit EGFR overexpression, which is correlated with an advanced tumor stage or a poor clinical outcome, in cancers such as non-small cell lung cancer [38], colorectal cancer [39], breast cancer [40], head and neck cancer [41], bladder cancer [42], and GC [43].
Table 6. Association between EGFR exon polymorphisms and the risk of GC in the age-matching population with gender differences.
There is currently increasing interest in SNP mutations in EGFR, given that they could affect the efficacy of EGFR tyrosine kinase inhibitor (TKI) treatment in various cancers [44], including colorectal cancer [39], non-small cell lung cancer [45], and GC [46][47][48][49]. It is well known that exons not only encode the amino acid sequence of the protein, but also contain sequences that influence translation or mRNA degradation [19]. Thus, EGFR exon SNPs could influence EGFR gene expression and/or protein activity and thereby alter the affinity of EGFR not just for its ligands, but also for anticancer agents that target this protein. Indeed, Puyo et al. found that mutations in exons 18-21 of EGFR enhance the activity of TKIs, while the deletion of exons 2-7 is associated with glioblastoma oncogenesis [49]. Moreover, two clinical studies showed that SNPs in EGFR exons 18, 19, 21, and 25 affect the clinical efficacy of gefitinib and may be potential biomarkers for predicting the clinical outcome of gefitinib-treated patients with advanced non-small cell lung cancer (NSCLC) [50,51].
In the present study, the six exon SNPs in EGFR showed only slight association with the risk of GC in both the total population and the age-matching population, even after stratification by gender. The SNP rs2072454 was significantly associated with the risk of GC; in particular, the variant rs2072454 T allele and TT genotype were associated with an increased risk of GC. However, the rs2227983, rs17337023, rs1050171 and rs1140475 SNPs were not related to GC risk. With regard to the missense locus rs28384375, only the T allele was detected in cases and controls. Many reports suggested that several EGFR SNPs, including rs2227983 (also designated as Arg521Lys or R497K) [39,52,53], rs1050171 (also designated as Q787Q) [54,55] and rs2293347 (C2982T) [56], were more likely to affect the biological behavior of tumors (such as tumor growth, invasion, metastasis, and progression) than to define susceptibility to cancer development. This may be because EGFR, a key mediator of angiogenesis, regulates target genes such as VEGF, which can directly affect tumor biological behavior. These results would also explain why EGFR SNPs can serve as key determinants of the response to EGFR TKI-based chemotherapy [11]. Therefore, these gene mutations may complicate GC treatment and affect survival [57]. When Puyo et al. [49] analyzed the association between specific EGFR functional polymorphisms and anticancer drug activity in 60 human tumor cell lines established by the National Cancer Institute, the frequency of the nonsynonymous SNP rs28384375 (also designated as Val592Ala) was 0.5, while the heterozygous frequency of the -216G>T SNP (also designated as rs287129) was 0.346. The cell lines that were heterozygous and variant homozygous for the -216G>T SNP showed significantly higher expression of the EGFR gene than the homozygous wild-type lines.
Moreover, compared with cell lines without a variant allele, the cell lines with at least one variant T allele at the -216G>T SNP were more sensitive to erlotinib and less sensitive to geldanamycin, topoisomerase I and II inhibitors, and alkylating agents. Interestingly, our results showed that the GTTGCG haplotype was more prevalent in GC cases than in cancer-free controls, and that the ACAGCG haplotype and the other haplotypes were associated with a significantly decreased risk of GC (adjusted OR = 0.67, 95% CI = 0.49-0.92 for ACAGCG; adjusted OR = 0.69, 95% CI = 0.52-0.90 for the others) in the total population (Table 8). However, in the age-matching population, no significant association was observed between the ACAGCG haplotype and the risk of GC (adjusted OR = 0.83, 95% CI = 0.58-1.19) (Table 9). Thus, the T alleles of rs2072454 and rs17337023, especially the former, might be associated with the risk of GC. Therefore, combined analysis of the six SNPs, especially the T alleles of rs2072454 and rs17337023, may be useful for predicting the risk of GC.
Several limitations in the present study need to be addressed: 1) its sample size may not have been large enough to detect SNPs with low variant frequency, such as rs28384375; 2) the polymorphisms that were investigated in this study were selected on the basis of their effects on EGFR function and may not give a comprehensive view of the genetic variability in EGFR exons; 3) detailed information about the GC cases was not collected, including patient survival, whether the tumors were the intestinal or diffuse type, whether there was metastasis, and what the effect of drug therapy was.
Table 9. Association study with haplotypes consisting of pairwise combination of six SNPs between cases and controls in the age-matching population.
In conclusion, the present study suggested that differences in lifestyle between males and females might account for the higher incidence rates in males than in females. Although only one SNP (rs2072454) was significantly associated with an increased risk of GC, analyzing the six EGFR exon SNPs in combination may be useful for predicting the risk of GC. Further studies are warranted to confirm these findings and to address the underlying mechanisms.
Figure S1 The fluorescent products of LDR were differentiated by ABI sequencer 377 for the seven SNPs in EGFR exons. In total, more than 90% of the products were successfully differentiated by ABI sequencer 377. (TIF)