Genetic Variants in Epidermal Growth Factor Receptor Pathway Genes and Risk of Esophageal Squamous Cell Carcinoma and Gastric Cancer in a Chinese Population

The epidermal growth factor receptor (EGFR) signaling pathway regulates cell proliferation, differentiation, and survival, and is frequently dysregulated in esophageal and gastric cancers. Few studies have comprehensively examined the association between germline genetic variants in the EGFR pathway and risk of esophageal and gastric cancers. Based on a genome-wide association study in a Han Chinese population, we examined 3443 SNPs in 127 genes in the EGFR pathway for 1942 esophageal squamous cell carcinomas (ESCCs), 1758 gastric cancers (GCs), and 2111 controls. SNP-level analyses were conducted using logistic regression models. We applied the resampling-based adaptive rank truncated product approach to determine the gene- and pathway-level associations. The EGFR pathway was significantly associated with GC risk (P = 2.16×10−3). Gene-level analyses found 10 genes to be associated with GC, including FYN, MAPK8, MAP2K4, GNAI3, MAP2K1, TLN1, PRLR, PLCG2, RPS6KB2, and PIK3R3 (P<0.05). For ESCC, we did not observe a significant pathway-level association (P = 0.72), but gene-level analyses suggested associations between GNAI3, CHRNE, PAK4, WASL, and ITCH, and ESCC (P<0.05). Our data suggest an association between specific genes in the EGFR signaling pathway and risk of GC and ESCC. Further studies are warranted to validate these associations and to investigate underlying mechanisms.


Introduction
ERBBs or epidermal growth factor receptors (EGFRs) belong to the receptor tyrosine kinase (RTK) superfamily and are important signaling proteins in normal physiological conditions [1,2]. For example, ligand-bound EGFRs are regulators of cell-cycle progression, proliferation, survival, invasion, and other cancer-contributing processes [3,4]. Not surprisingly, therefore, members of the EGFR family, particularly EGFR (also known as ERBB1 or HER1) and ERBB2 (HER2), have been implicated in the development of numerous human cancers and are pursued as therapeutic targets [3,4,5]. With regard to esophageal and gastric cancers, higher EGFR and ERBB2 levels have been correlated with poor survival [4,6,7]. Therapies targeting the EGFR family were shown to improve esophageal and gastric cancer prognosis [4]. Several studies have also revealed somatic mutations of genes in the EGFR family in esophageal and gastric cancers [8,9,10,11,12]. In addition, a role for downstream signaling of the EGFR family has also been found, with molecules involved in the MAPK/ERK pathway activated in esophageal and gastric cancers [13,14,15].
Given the significance of this pathway, genetic variations in EGFR signaling proteins could correlate with predisposition to esophageal and gastric cancers. However, only a few studies have investigated the role of germline single nucleotide polymorphisms (SNPs) in these cancers. These few prior studies had only limited coverage of the genes in this pathway [16,17,18,19,20,21,22,23]. Although SNPs in this pathway have not reached genome-wide significance in published genome-wide association studies (GWAS) [24,25,26,27,28,29,30,31,32], such a criterion may be overly conservative for detecting modest associations. Therefore, pathway analysis may help to identify important genetic contributions whose individual effect sizes may be too small to be detected using the GWAS significance criteria [33,34]. Based on our GWAS data in ethnic Chinese subjects [24], we comprehensively evaluated associations between genetic variants in the EGFR pathway and the risk of esophageal squamous cell carcinoma (ESCC) and gastric cancer (GC) in 1942 ESCC cases, 1758 GC cases (1126 cases of gastric cardia cancer (GCA) and 632 of gastric noncardia cancer (GNCA)), and 2111 controls living in the Taihang Mountain region of China, an area with a high risk of ESCC and GC.

Ethics Statement
The Shanxi upper gastrointestinal (UGI) Cancer Genetics Project (Shanxi, registered at ClinicalTrials.gov as NCT00341276) obtained written informed consent from subjects to participate in the Shanxi parent study and the overall GWAS (current study), and all procedures were approved by the Shanxi Cancer Hospital and Institute Institutional Review Board. The Linxian Nutrition Intervention Trials (NITs, registered at ClinicalTrials.gov as NCT00342654) obtained written informed consent from subjects to participate in the NIT parent study and the overall GWAS (current study), and all procedures were approved by the Cancer Institute of the Chinese Academy of Medical Sciences Institutional Review Board. The NCI Special Studies Institutional Review Board approved both the Shanxi and NIT parent studies as well as the overall GWAS (current study).

Study Population
The study participants were enrolled from two upper gastrointestinal (UGI) cancer projects conducted in the Taihang Mountain area in China: the Shanxi study and the NITs. The Shanxi study was initiated in 1997 and had a case-control portion and a case-only portion. We enrolled newly diagnosed, histologically confirmed ESCC and GC cases, and, in the case-control portion of this study, age (±5 years)-, sex-, and neighborhood-matched controls were enrolled within 6 months of the identification of each case [35]. Blood samples were collected at enrollment. The NITs were initiated in Linxian in 1985 and tested the effect of multiple vitamin and mineral combinations taken daily for up to six years on the outcome of esophageal and gastric cancers [36]. We collected blood in 1999 and 2000 specifically to obtain DNA from NIT participants. During the follow-up through December 31, 2010, all newly diagnosed, histologically confirmed ESCC and GC cases, along with controls from an age- and gender-stratified randomly sampled subcohort, were included in the current genetic analysis. All examined esophageal cancers were ESCC, and all GCs were adenocarcinomas. GCAs were defined as those located in the proximal 3 cm of the stomach, whereas GNCAs were those in the remainder of the stomach.

Gene and SNP Selection
We performed an extensive literature search of the EGFR pathway genes [1,2,3,4,5]. A gene was included in our analysis if it was referenced in at least one of the following databases: ErbB signaling pathway in KEGG (http://www.genome.jp/dbget-bin/www_bget?pathway:map04012, retrieved Dec 20, 2012), EGF signaling pathway in BioCarta (http://www.biocarta.com/pathfiles/h_egfPathway.asp, retrieved Dec 20, 2012), or ErbB receptor signaling, ErbB2/ErbB3 signaling, EGF receptor signaling, or ErbB4 signaling pathway in the NCI Pathway Interaction Database (http://pid.nci.nih.gov/browse_pathways.shtml, retrieved Dec 20, 2012). We identified a total of 131 EGFR pathway genes. No SNPs mapped to AREGB, EIF4EBP1, PAK3, and SHC1 in our dataset, leaving 127 genes (Table S1) for analysis. A total of 3443 SNPs located within these genes and their flanking areas (20 kb upstream and 10 kb downstream), with a minor allele frequency of >1% (in cases and controls combined), were included in our analysis, and the full list of these SNPs is shown in Table S2.
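The positional assignment of SNPs to genes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, dictionary keys, and toy coordinates are hypothetical, and the sketch assumes that "upstream" and "downstream" are defined relative to the strand of the gene, which the text does not specify.

```python
UPSTREAM_BP = 20_000    # 20 kb flank upstream of the gene, as stated in the text
DOWNSTREAM_BP = 10_000  # 10 kb flank downstream of the gene, as stated in the text
MAF_MIN = 0.01          # keep SNPs with minor allele frequency > 1%


def gene_window(start, end, strand):
    """Return (low, high) genomic bounds of the gene plus its flanks.

    Assumption: upstream means 5' of the gene, so the longer flank
    sits before `start` on the + strand and after `end` on the - strand.
    """
    if strand == "+":
        return start - UPSTREAM_BP, end + DOWNSTREAM_BP
    return start - DOWNSTREAM_BP, end + UPSTREAM_BP


def snps_for_gene(gene, snps):
    """Ids of SNPs on the gene's chromosome, inside the window, passing the MAF filter."""
    low, high = gene_window(gene["start"], gene["end"], gene["strand"])
    return [
        s["id"]
        for s in snps
        if s["chrom"] == gene["chrom"]
        and low <= s["pos"] <= high
        and s["maf"] > MAF_MIN
    ]


# Hypothetical toy data (not real coordinates or SNP ids).
toy_gene = {"name": "GENE_A", "chrom": "1", "start": 100_000, "end": 150_000, "strand": "+"}
toy_snps = [
    {"id": "snp1", "chrom": "1", "pos": 85_000, "maf": 0.20},    # 15 kb upstream: kept
    {"id": "snp2", "chrom": "1", "pos": 165_000, "maf": 0.20},   # 15 kb downstream: dropped
    {"id": "snp3", "chrom": "1", "pos": 120_000, "maf": 0.005},  # inside gene but rare: dropped
    {"id": "snp4", "chrom": "2", "pos": 120_000, "maf": 0.30},   # other chromosome: dropped
]
selected = snps_for_gene(toy_gene, toy_snps)
print(selected)
```

With asymmetric flanks, the strand convention matters: the same SNP 15 kb beyond the gene end is excluded for a plus-strand gene but would be included for a minus-strand gene.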

Genotyping and quality control
Genome-wide scanning was performed using the Illumina 660W array, which has been detailed in our published GWAS on UGI cancer [24]. After that report, we scanned additional subjects on the same platform at the same facility. The initial and additional subject scan data underwent similar processing and quality control filtering metrics. We excluded SNPs with a missing rate >5%, subjects with a completion rate of all SNPs <94%, subjects with abnormal mean heterozygosity values (>30% or <25%), gender discordant subjects, or unexpected duplicate pairs. The GWAS data on UGI cancer in the study populations have been deposited on the database of Genotypes and Phenotypes (dbGaP, http://www.ncbi.nlm.nih.gov/projects/gap/, study accession number: phs000361.v1.p1).
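The exclusion rules listed above can be expressed as simple predicates. The following is a minimal sketch using the thresholds from the text; the function names, record fields, and toy subjects are hypothetical and are not taken from the authors' quality control software.

```python
SNP_MAX_MISSING = 0.05          # exclude SNPs with missing rate > 5%
SUBJECT_MIN_COMPLETION = 0.94   # exclude subjects with completion rate < 94%
HET_LOW, HET_HIGH = 0.25, 0.30  # exclude mean heterozygosity < 25% or > 30%


def keep_snp(missing_rate):
    """True if the SNP passes the missing-rate filter."""
    return missing_rate <= SNP_MAX_MISSING


def keep_subject(subject):
    """True if the subject passes all four subject-level filters.

    `subject` is a dict with hypothetical keys; see the toy data below.
    """
    return (
        subject["completion_rate"] >= SUBJECT_MIN_COMPLETION
        and HET_LOW <= subject["mean_het"] <= HET_HIGH
        and subject["reported_sex"] == subject["genetic_sex"]
        and not subject["unexpected_duplicate"]
    )


# Hypothetical toy subjects, one failing each rule in turn.
toy_subjects = [
    {"id": "s1", "completion_rate": 0.99, "mean_het": 0.27,
     "reported_sex": "F", "genetic_sex": "F", "unexpected_duplicate": False},  # passes
    {"id": "s2", "completion_rate": 0.90, "mean_het": 0.27,
     "reported_sex": "M", "genetic_sex": "M", "unexpected_duplicate": False},  # low completion
    {"id": "s3", "completion_rate": 0.99, "mean_het": 0.32,
     "reported_sex": "M", "genetic_sex": "M", "unexpected_duplicate": False},  # high heterozygosity
    {"id": "s4", "completion_rate": 0.99, "mean_het": 0.27,
     "reported_sex": "M", "genetic_sex": "F", "unexpected_duplicate": False},  # gender discordant
    {"id": "s5", "completion_rate": 0.99, "mean_het": 0.27,
     "reported_sex": "F", "genetic_sex": "F", "unexpected_duplicate": True},   # duplicate
]
kept_ids = [s["id"] for s in toy_subjects if keep_subject(s)]
print(kept_ids)
```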

Statistical analysis
We investigated the association between genes in the EGFR signaling pathway and risk of ESCC and GC. To conduct gene-level analysis, we first carried out SNP-level analysis. We calculated the odds ratios (ORs) and 95% confidence intervals (CIs) for risk of ESCC or GC associated with having one minor allele, using unconditional logistic regression in an additive model for each SNP, adjusting for age, gender, and study (Shanxi or NIT). We did not consider population stratification because there was no evidence for significant problems with population substructure [24]. We used a dominant model for SNPs when the expected number of subjects carrying the minor allele was less than five.
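To make the per-allele odds ratio concrete, the sketch below fits a logistic regression with the minor-allele count (0, 1, or 2) as the only predictor and converts the coefficient into an OR with a Wald 95% CI. It is a simplified illustration: the adjustment for age, gender, and study described above is omitted to keep the example short, the function names and genotype counts are hypothetical, and the authors' analysis was run in R rather than with this code.

```python
import math


def fit_additive_logistic(genotypes, is_case, n_iter=25):
    """Fit logit P(case) = b0 + b1 * g by Newton-Raphson.

    g is the minor-allele count (0, 1, 2). Returns (b1, se_b1).
    Simplification: no covariates, unlike the adjusted models in the text.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        s0 = s1 = 0.0          # score vector
        i00 = i01 = i11 = 0.0  # information matrix (2 x 2, symmetric)
        for g, y in zip(genotypes, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            s0 += y - p
            s1 += (y - p) * g
            i00 += w
            i01 += w * g
            i11 += w * g * g
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} * score
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    # Variance of b1 from the inverse information of the final iteration,
    # which is evaluated at the converged estimate for well-behaved data.
    return b1, math.sqrt(i00 / det)


def odds_ratio_ci(b1, se, z=1.96):
    """Per-allele OR and Wald 95% CI from the log-odds coefficient."""
    return math.exp(b1), math.exp(b1 - z * se), math.exp(b1 + z * se)


# Hypothetical genotype counts: cases (30, 50, 20), controls (50, 40, 10).
case_g = [0] * 30 + [1] * 50 + [2] * 20
ctrl_g = [0] * 50 + [1] * 40 + [2] * 10
genotypes = case_g + ctrl_g
is_case = [1] * len(case_g) + [0] * len(ctrl_g)

beta, se = fit_additive_logistic(genotypes, is_case)
or_add, lo_add, hi_add = odds_ratio_ci(beta, se)
print(f"additive OR per minor allele = {or_add:.2f} ({lo_add:.2f}-{hi_add:.2f})")

# Dominant coding for sparse SNPs: carrier (1) versus non-carrier (0).
dominant = [min(g, 1) for g in genotypes]
beta_d, se_d = fit_additive_logistic(dominant, is_case)
```

Under the additive coding, the OR is the multiplicative change in odds for each additional minor allele; under the dominant coding, it compares carriers of at least one minor allele with non-carriers.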
Gene-level associations were then calculated using the adaptive rank truncated product (ARTP) approach, which applied rank-truncated test statistics and a permutation-based sampling procedure (1,000,000 resamplings) [34]. Association signals over a set of SNPs within a gene were combined while accounting for SNP linkage disequilibrium (LD) structures and multiple comparisons. We also evaluated the association of the overall EGFR pathway with ESCC and GC, which globally combined the associations between each outcome and genes within the pathway. We used the ARTP method with 1,000,000 resamplings to obtain a single summary pathway-level P-value for each cancer type.
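The idea behind combining SNP-level P-values with an adaptive rank truncated product can be sketched as follows. This is a simplified, hypothetical illustration of the general technique rather than the published ARTP software: the function names, truncation points, and toy P-values are assumptions, and the permutation P-values are drawn here as independent uniform numbers, whereas a real analysis would recompute every SNP test after permuting case/control labels so that LD between SNPs is preserved.

```python
import math
import random


def rtp_stat(pvalues, k):
    """Rank truncated product on the log scale: sum of logs of the k smallest P-values."""
    return sum(math.log(p) for p in sorted(pvalues)[:k])


def artp(observed, permuted, truncation_points):
    """Adaptive rank truncated product P-value for one set of SNPs.

    observed: SNP-level P-values from the real data.
    permuted: list of P-value lists, one per permutation of the phenotype.
    The same permutations are used to (1) estimate the P-value of each
    truncated statistic and (2) adjust for picking the best truncation point.
    """
    all_sets = [observed] + permuted   # index 0 holds the observed data
    n = len(all_sets)
    min_p = [1.0] * n
    for k in truncation_points:
        stats = [rtp_stat(p, k) for p in all_sets]
        for b in range(n):
            # Empirical P-value of set b at this truncation point
            # (a smaller statistic means a stronger signal).
            p_b = sum(1 for w in stats if w <= stats[b]) / n
            min_p[b] = min(min_p[b], p_b)
    # Adjusted P-value: how often a data set attains a minimum P-value
    # at least as small as the observed one.
    return sum(1 for m in min_p if m <= min_p[0]) / n


rng = random.Random(0)
n_perm, n_snps = 199, 5
# Toy null P-values in (0, 1]; independent draws ignore LD (see lead-in).
permuted = [[1.0 - rng.random() for _ in range(n_snps)] for _ in range(n_perm)]

strong_gene = [1e-8, 1e-7, 1e-6, 0.5, 0.9]   # hypothetical gene with a clear signal
null_gene = [0.5, 0.5, 0.5, 0.5, 0.5]        # hypothetical gene with no signal
p_strong = artp(strong_gene, permuted, [1, 2, 3])
p_null = artp(null_gene, permuted, [1, 2, 3])
print(p_strong, p_null)
```

The smallest attainable P-value is 1/(number of permutations + 1), which is one reason a large number of resamplings is needed to resolve small P-values. The same combining step can be applied a second time to gene-level P-values to obtain a pathway-level P-value.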
In secondary analyses, we additionally adjusted for cigarette smoking (ever or never), alcohol intake (ever or seldom/never), and family history of UGI cancer (yes or no). Since results of these SNP-level secondary analyses showed essentially similar results as those from the primary models, we present only the primary analyses in the paper.
We tested the association between SNPs and ESCC and GC by subgroups of sex, smoking, alcohol intake, and family history of UGI cancer. The P-values for interaction between SNPs and these variables were obtained using likelihood ratio tests.
Statistical significance for gene- and pathway-based analyses was defined as P<0.05. Since none of the SNPs reached the Bonferroni-corrected significance level (1.45×10−5, 0.05/3443 SNPs), statistical significance for SNP-level analyses was defined as P<0.001. Statistical analyses were performed using the R language. We evaluated the linkage disequilibrium (LD) between SNPs across specific gene regions with Haploview version 4.1.
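The SNP-level thresholds follow from simple arithmetic, shown here as a short sketch. The function name and the category labels are hypothetical and serve only to illustrate how the two thresholds stated above partition P-values.

```python
N_SNPS = 3443
ALPHA = 0.05
BONFERRONI = ALPHA / N_SNPS   # Bonferroni-corrected level for 3443 tests
RELAXED = 0.001               # relaxed SNP-level threshold stated in the text


def classify_snp(p_value):
    """Label a SNP-level P-value against the two thresholds (labels are illustrative)."""
    if p_value < BONFERRONI:
        return "bonferroni"
    if p_value < RELAXED:
        return "relaxed"
    return "not significant"


print(f"{BONFERRONI:.2e}")   # prints 1.45e-05
print(classify_snp(5e-4))    # prints relaxed
```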

Results
A total of 1942 cases of ESCC, 1758 cases of GC (1126 GCA and 632 GNCA cases), and 2111 controls were included from the Shanxi and NIT studies (Table S3). Overall, the mean age was 56.0 years in controls, 56.0 in ESCCs, and 56.3 in GCs.
The pathway-level analysis revealed a statistically significant association of the overall EGFR pathway with GC risk (P = 2.16×10−3), but not with ESCC risk (P = 0.72). However, the association was not significant for either GCA (P = 0.12) or GNCA (P = 0.097).
The SNP-level associations are shown in Table 3. Although none of the SNPs exceeded the significance level after correcting for multiple comparisons, at a reduced threshold of 0.001, rs1884361 (NRG3) was associated with ESCC risk, and rs9387033 (FYN), rs9788973 (MAP2K4), rs7187863 (PLCG2), and rs7720677 (PRLR) were associated with GC risk. We also identified a correlation for rs549386 (TGFA) with GCA, as well as correlation for rs16947307 and rs9923225 (both in WWOX) with GNCA. In the subgroup analyses, we did not observe significant interactions between SNPs and other characteristics at the threshold of 0.001 (data not shown).

Discussion
Somatic mutations and altered regulation of EGFR pathway genes have been widely implicated in the development and prognosis of esophageal and gastric cancers [6,7,8,9,10,11,12,13,14,15]. In contrast, it is less clear whether germline genetic variants in the EGFR pathway are associated with these cancers. Recent GWASs have identified numerous risk loci associated with ESCC or GC, but thus far, there has been no evidence for an association with genetic variants in the EGFR pathway. Pathway-based approaches have been developed to utilize genome-wide data more efficiently, and they hold the potential to yield novel findings [33,34]. We comprehensively evaluated genes in the EGFR pathway and risk of ESCC and GC using the ARTP approach. Although none of the genes met the Bonferroni-corrected threshold for multiple comparisons, at a threshold of 0.05, we observed that several genes, as well as the overall EGFR pathway, were associated with risk of GC. The results also suggested associations between multiple EGFR-related genes and ESCC risk.
We identified five genes significantly associated with ESCC risk. Among them, GNAI3 and CHRNE were significant in both ESCC and GCA, but not in GNCA. GNAI3 in 1p13.3, encoding guanine nucleotide-binding protein G(k) subunit alpha, was the most significant gene for ESCC and also correlated with risk of GC, particularly GCA. Guanine nucleotide-binding proteins (G proteins) are involved as modulators or transducers in various transmembrane signaling pathways. One previous study suggested an association between rs11184738 (PRMT6, located in 1p13.3) and ESCC risk in a GWAS scan but not in the validation stage [28]. GNAI3 is located 2.3 Mbps downstream of PRMT6, and the top SNP in GNAI3 (rs1434285) was not in high LD with rs11184738 in our GWAS dataset (r² < 0.01). CHRNE in 17p13.2, encoding acetylcholine receptor subunit epsilon precursor, was correlated with risk of both ESCC and GCA, but not with GC overall. One GWAS reported that rs17761864 (SMG6, located in 17p13.3) was associated with risk of ESCC [32], but SMG6 is located more than 2.5 Mbps downstream from CHRNE.
Ten genes were significantly associated with GC risk in our study. FYN in 6q21 was the most significant gene in GC overall, but was associated only with GCA and not with GNCA. FYN protein belongs to the membrane-associated Src tyrosine kinase family and has a pivotal role in cell adhesion, proliferation and apoptosis [37]. MAPK8 in 10q11 was the most significant gene for GNCA and was also associated with GCA. MAPK8 is a member of the mitogen-activated protein kinases and is involved in cell proliferation, differentiation, apoptosis and transcription. Recent pathway-based research indicated that MAPK8 was associated with rectal cancer and pancreatic cancer [38,39].
Since the standard single-locus methods may miss SNPs with moderate effect size, we used a resampling-based ARTP method, which combines association signals across individual SNPs within a gene, to calculate gene-level associations. In addition to the above-highlighted genes, our results suggested that some other genes were also associated with risk of GC or ESCC, even though individual SNPs in these genes were not reported in previous GWASs. We also found significant genes in the gene-level analysis for which the individual SNPs were not significant at the pre-defined threshold for SNP-level analysis, underscoring the necessity of a more integrated understanding of the genetic contributions than the SNP-level perspective alone.
Our results are biologically plausible. The EGFR family has been found to be upregulated and is the target of somatic mutations in UGI cancers, and a clinical trial indicated improved cancer prognosis for therapies targeting the EGFR family [4,6,7]. Prior studies have also reported the role of downstream signaling of EGFR family genes in UGI cancers. One recent report indicated that the MAPK pathway was commonly stimulated in esophagogastric cancer following activation of RTKs [13]. A second study showed that oncogenic CagA promoted GC risk by activating ERK signaling pathways [15]. Previous GWASs indicated genetic variants in PLCE1 as common susceptibility loci for ESCC and GCA but not for GNCA [24,27]. In our analyses, we found two genes significant for ESCC and GCA but not for GNCA, further suggesting that a common genetic mechanism might contribute to the development of ESCC and GCA.

Table 2. Epidermal growth factor receptor pathway genes significantly associated with risk of gastric adenocarcinoma overall and by anatomic sites. Columns: Gene; Chr. (cytoband); Gene-level P.

In our study, we used prior biological knowledge to systematically investigate associations between genes in the EGFR pathway and risk of ESCC and GC in a high-risk population in north central China. To our knowledge, this is the first study to comprehensively investigate the role of genetic variation in EGFR pathway genes and risk of UGI cancers. The relatively large sample size allowed us to assess the associations for ESCC, GC overall and by anatomic sites with a reasonable power. We also acknowledge, however, the limitations of our study. First, we had no information on Helicobacter pylori (H. pylori) infection [40], which could be a concern particularly for the analysis of GNCA. However, a recent survey among NIT plasma samples showed a prevalence of H. pylori seropositivity of 96.6% among GNCA, 95.8% among GCA, and 93.9% among controls (unpublished data), using a multiplex assay with H. pylori positivity defined as three or more antigens being positive [41]. Although the multiplex method tends to be more sensitive than traditional ELISAs, this serological examination revealed a very high H. pylori infection rate in this area even among controls, suggesting that our results were less likely to be greatly distorted by the lack of information on H. pylori infection.
Second, further replications in independent populations are required to determine if the associations we observed between EGFR pathway genes and the risk of ESCC and GC are real. Third, the pre-defined EGFR pathway that we tested may not represent all functionally-related EGFR genes due to limitations in current knowledge. Fourth, further generalizability to other populations requires caution since our study was conducted only among high-risk Han Chinese.

Table 3 footnotes. All SNPs with P-value <0.002 for esophageal squamous cell carcinoma (ESCC), gastric cancer (GC) overall or by anatomic sites are listed. The top SNPs for total GC (P-value <0.002) which have P-value <0.05 for cardia or noncardia cancer are also listed. Results were derived from logistic regression models using genotype trend tests adjusted for age (10-year categories), sex and study. (b) These SNPs were significant only for gastric cardia or noncardia cancer, but not for total gastric cancer. doi:10.1371/journal.pone.0068999.t003

In conclusion, our study identified significant associations between the germline genetic variations of the overall EGFR signaling pathway and several individual genes and the risk of GC, as well as individual genes and the risk of ESCC, suggesting a possible role for EGFR pathway genes in the development of UGI cancers. Further studies are warranted to confirm the associations in independent populations and to explore the underlying biological mechanisms.

Table S1 The associations between all EGFR pathway genes and risk of esophageal squamous cell carcinoma and gastric adenocarcinoma.