RTK/ERK Pathway under Natural Selection Associated with Prostate Cancer

Prostate cancer (PCa) is a global disease that causes a large number of deaths every year. Recent studies have indicated that the RTK/ERK pathway might be a key pathway in the development of PCa. However, the exact association and the evolution-based mechanism remain unclear. This study combined genotypic and phenotypic data from the Chinese Consortium for Prostate Cancer Genetics (ChinaPCa) with related databases such as the HapMap Project and Genevar. The analysis involved expression quantitative trait loci (eQTL) analysis, natural selection analysis, and gene-based pathway analysis. The pathway analysis confirmed a positive relationship between PCa risk and several key genes. In addition, combined with the natural selection results, it seems that four genes (EGFR, ERBB2, PTK2, and RAF1) with five SNPs (rs11238349, rs17172438, rs984654, rs11773818, and rs17172432), especially rs17172432, might be pivotal factors in the development of PCa. The results indicate that the RTK/ERK pathway under natural selection is a key link in PCa risk. The joint effect of the genes and loci under positive selection might be one reason for the development of PCa. Dealing with all the factors simultaneously might give insight into prevention and aid in predicting the success of potential therapies for PCa.


Introduction
As one of the most destructive cancers worldwide [1], prostate cancer (PCa) is increasing according to recent cancer statistics in the United States [2][3][4]. Although death rates continue to decline (by 1.9% per year from 2000 to 2008), an estimated 238,590 new cases might be diagnosed, with a mortality rate of 12.5% [5]. In Europe, the incidence has tripled in the last 40 years, with about 328,000 men diagnosed with prostate cancer in 2008 [6]. To identify the potential role of genetic factors in the development of PCa, several recent studies have been based on analysis at the gene level. One of the most effective recent methods, the genome-wide association study (GWAS), uses high-throughput SNP analysis to reveal significant loci that influence the disease. However, because of its strict significance standards and its SNP-by-SNP analysis, some potentially useful information might be ignored. Further study that could reveal more comprehensive information therefore seemed important.
Considering the complexity of the factors inducing disease, especially cancer, studying the interlacing signaling pathways and the cell signal transmission involved in biological activities such as cell survival, development, and apoptosis [7][8][9] might make the pathogenesis clearer. The RTK/ERK pathway, which contains a number of genes (EGFR, EGF, CCND1, MAPK3, etc.), is regarded as one of the crucial links in cancer development. In this pathway, receptor tyrosine kinases (RTKs) are enzyme-linked receptors that regulate cell proliferation and differentiation, promote cell migration and survival, and modulate cellular metabolism in response to extracellular cues [10][11][12]. However, most RTK pathway functions require activation of the essential effector mitogen-activated protein kinase (MAPK) cascade, which is composed of the Raf, MEK, and extracellular signal-regulated kinase (ERK) kinases (known as the ERK cascade). With activation of the MAPK cascade, a signal is transferred from a receptor on the cell surface to the DNA in the nucleus by proteins that influence the regulation, transcription, and synthesis of genes [13,14]. The growth of an organism, angiogenesis, and carcinogenesis therefore all involve the critical functions of the RTK pathway.
PCa is one of the most common malignancies in the world, and its high morbidity and mortality bring about enormous losses. Studying the pathogenesis of PCa, including determining which genetic factors might be significant in the process, is therefore urgent. Recent studies suggest that the pathway might be a vital factor in the development and progression of PCa. In 2007, Caraglia M et al. [15] found that zoledronic acid (ZOL) and the farnesyltransferase inhibitor R115777 (FTI, Zarnestra) were synergistic in destroying the Ras/Erk and Akt survival pathways and decreasing the phosphorylation of both mitochondrial bcl-2 and bad proteins, and caspase activation, which led to both growth inhibition and apoptosis in prostate adenocarcinoma cells. Moreover, Milone et al. [16] studied ZOL further, and their results suggested that the p38-MAPK pathway played a key role in inducing a more aggressive and invasive phenotype and in resistance to ZOL. In addition, our previous pathway enrichment and gene meta-analysis by Gene Set Enrichment Analysis (GSEA) [17] suggested that significantly expressed genes were located in the RTK/ERK pathway.
Meanwhile, recent studies have discovered natural selection in genes and networks based on populations [18][19], which gave a new view at the evolutionary level into the pathogenesis of diseases. However, no related studies have revealed the potential relationship between the RTK/ERK pathway under natural selection and PCa risk. Prompted by this, the present study was conducted.
This study was based on the key role of the RTK/ERK pathway, the HapMap database, and our own GWAS data on PCa risk. We explored the pathway via several methods: (1) association analysis between gene polymorphisms in the RTK-ERK pathway and PCa; (2) gene-based pathway testing; and (3) natural positive selection and expression quantitative trait loci (eQTL) analysis. By combining these analyses, this study might give a more comprehensive understanding of the RTK/ERK pathway at the gene and SNP levels, which would help to discover potentially crucial loci or genes and give a wider view on the treatment, prevention, and forecasting of PCa.

Study subjects
The analyses in this study were mainly based on the Chinese Consortium for Prostate Cancer Genetics (ChinaPCa) GWAS, the HapMap Project (http://www.hapmap.org), and other databases (NCBI36). Subjects in ChinaPCa were male Han Chinese recruited from southeastern China. They came from different areas, including Shanghai, Suzhou, Guangxi, Nanjing, etc. There were 1417 hospital-based cases with histologically confirmed primary prostate cancer and 1008 healthy controls who had undergone routine physical examination in local hospitals. Their characteristics are shown in Table 1. A blood sample was obtained from each subject for DNA extraction after consent was obtained. On the basis of our pathway analysis, 50,459 polymorphic SNPs were potentially involved. All genes were extended by 100 kb at both ends in order to ensure that all loci in the pathway were included. Comprehensive epidemiological and clinical data were collected with standardized questionnaires and from the associated PCa diagnosis. Essential information was included, such as age, PSA level, cancer history, and other related data. All the details are given in Jianfeng Xu et al. [20].
In addition, data from HapMap Project Phases II and III were used in the study in combination with the ChinaPCa data.
To avoid the influence of relatedness between individuals, we selected 60 unrelated samples each for the CEU and YRI populations. Meanwhile, the CHB (Han Chinese from Beijing) and JPT (Japanese from Tokyo) samples were pooled into 89 unrelated Asian individuals, referred to as ASN. In this study, the numbers of SNPs in each population were 791,208 (ASN), 849,575 (CEU), and 885,926 (YRI) [21][22][23].
Associations between gene polymorphisms of the RTKs-ERK pathway and prostate cancer
As an important pathway in the development of cancer, the RTKs-ERK pathway could easily induce cancer when its proteins become mutated. Many genes are known to be involved in this pathway, and some related ones are as yet undiscovered. In our study, we included almost all the genes known in the pathway, comprising 40 genes: EGF, EGFR, PDGF, VEGF, MAP2K1, CCND1, and others distributed across the chromosomes. To ensure that all the SNPs in the pathway were included, we extended each gene by 100 kb in both directions. In total, nearly 50,000 SNPs were involved. With these SNPs, systematic association tests for the risk of PCa were conducted, including single-SNP association tests and a gene-based pathway association test. A logistic regression model implemented in Plink [24] was applied to test the associations between the selected SNPs and PCa, with age as a covariate.
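To illustrate the single-SNP test described above, the following minimal Python sketch fits a logistic regression of case status on an additively coded genotype (0, 1, or 2 copies of the tested allele) with age as a covariate. It is not the Plink implementation. The function names (`solve`, `fit_logistic`, `snp_log_odds`), the additive coding, the mean-centering of age, and the fixed number of Newton-Raphson iterations are all assumptions made for this example.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logistic(design, status, n_iter=25):
    """Newton-Raphson maximum-likelihood fit of a logistic model.

    design: list of covariate rows (first entry 1.0 for the intercept)
    status: list of 0/1 outcomes (1 = case, 0 = control)
    """
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(design, status):
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += x[i] * (y - p)
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        # Newton step: beta <- beta + (X' W X)^-1 X' (y - p)
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def snp_log_odds(genotypes, ages, status):
    """Return (intercept, per-allele log odds ratio, age coefficient)."""
    mean_age = sum(ages) / len(ages)  # centering age is an assumption of this sketch
    design = [[1.0, float(g), a - mean_age] for g, a in zip(genotypes, ages)]
    return tuple(fit_logistic(design, status))
```

Exponentiating the second returned coefficient gives the age-adjusted per-allele odds ratio. A Wald or likelihood-ratio test would still be needed to obtain a P-value; it is omitted here for brevity.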
In the gene-based pathway test, the adaptive rank truncated product (ARTP) method was applied. The ARTP method, proposed by Yu K et al. [25], uses the adaptive rank truncated product statistic to combine evidence of association over different SNPs and genes within a biological pathway. It can be divided into two steps: first, the association evidence between each gene and the outcome is summarized as a gene-level P-value; second, these gene-level P-values are combined into a test statistic for the disease-pathway association. As a powerful statistical method, ARTP remedies the shortcomings of the rank truncated product (RTP) method, in which the product of the K most significant P-values is used as the summary statistic [26]. With this method, potentially significant associations between genes, the pathway, and related diseases can be revealed. In this study, the SNPs with significant association (P < 0.05) were included in the gene-based ARTP analysis. In addition, 10,000 permutations were used in the analysis in order to make the statistics more convincing.
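The combination step can be sketched as follows. This is a minimal, single-level Python illustration of the adaptive rank truncated product idea, not the authors' software: it assumes that the P-values under permutation (for example, from re-running the association test after shuffling case and control labels) are already available, and the names `rtp_stat`, `artp_pvalue`, and `trunc_points` are hypothetical. In the two-step procedure described above, the same routine would be applied first to the SNPs within each gene and then to the resulting gene-level P-values.

```python
import math


def rtp_stat(pvals, k):
    """Log of the product of the k smallest P-values (more negative = stronger evidence)."""
    return sum(math.log(p) for p in sorted(pvals)[:k])


def artp_pvalue(observed, permuted, trunc_points):
    """Adaptive rank truncated product P-value from a single layer of permutations.

    observed:     list of P-values from the real data
    permuted:     list of P-value lists, one per permutation (same length as observed)
    trunc_points: candidate truncation points K_1, ..., K_J
    """
    rows = [observed] + list(permuted)  # row 0 is the observed data
    n = len(rows)
    stats = [[rtp_stat(row, k) for k in trunc_points] for row in rows]

    # For every row and every truncation point, estimate the P-value of the
    # RTP statistic from its rank among all rows, then keep the minimum over K.
    min_p = []
    for b in range(n):
        per_k = []
        for j in range(len(trunc_points)):
            rank = sum(1 for other in stats if other[j] <= stats[b][j])
            per_k.append(rank / n)
        min_p.append(min(per_k))

    # Adjust for having picked the best truncation point: compare the observed
    # minimum with the minima obtained for the permuted data sets.
    return sum(1 for m in min_p if m <= min_p[0]) / n
```

With B permutations the smallest attainable P-value is 1/(B + 1), which is one reason a large number of permutations (10,000 in this study) is used.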

Natural positive selection loci and eQTL analysis
Although the GWA study indicated that a number of SNPs were associated with prostate cancer, it did not include a regional haplotype analysis, so possibly related loci might have been neglected. Moreover, the interactions among these loci, in terms of their functional consequences and evolutionary history, remain largely unknown. Recently, several studies have examined natural positive selection based on haplotype analysis in various populations [27][28]. Candidate regions that are experiencing positive selection can be identified by a number of sophisticated statistical methods. In this study, we applied the integrated haplotype score (iHS), which was developed in 2006 [23,29]. This test statistic identifies potential positive selection loci with an empirical threshold of 2 or 1.65. The iHS is derived from the extended haplotype homozygosity (EHH) statistic [30], which measures the decay of identity of haplotypes that carry a specified "core" allele at one end. When an allele rises rapidly in frequency due to strong selection, it tends to have high levels of haplotype homozygosity, extending much further than expected under a neutral model. Building on the EHH, Voight et al. [23] proposed the integrated EHH (iHH), which is divided into iHH A and iHH B, computed with respect to an ancestral or derived core allele. In addition, Voight et al. adjusted the score for the influence of the allele frequency at the core SNP. The iHS was computed for all SNPs with minor allele frequency > 5%, each treated as a core SNP. Both extremely positive and extremely negative iHS scores are potentially interesting.
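The relationship between EHH, iHH, and the unstandardized iHS can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Haplotter implementation: haplotypes are lists of 0/1 alleles, allele 0 at the core SNP is assumed to be ancestral, integration uses the trapezoid rule and simply stops at the first marker where EHH falls below 0.05, and the final standardization of the score within allele-frequency bins is omitted. All function names are hypothetical.

```python
import math
from collections import Counter


def ehh(haps, core, end):
    """Probability that two random haplotypes are identical from core to end."""
    n = len(haps)
    lo, hi = min(core, end), max(core, end)
    counts = Counter(tuple(h[lo:hi + 1]) for h in haps)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def ihh(haps, core, positions, cutoff=0.05):
    """Area under the EHH curve on both sides of the core SNP (trapezoid rule)."""
    total = 0.0
    for step in (1, -1):  # walk right, then left, away from the core
        prev_pos, prev_ehh = positions[core], 1.0
        i = core + step
        while 0 <= i < len(positions):
            cur = ehh(haps, core, i)
            if cur < cutoff:  # simplification: stop without interpolating
                break
            total += 0.5 * (prev_ehh + cur) * abs(positions[i] - prev_pos)
            prev_pos, prev_ehh = positions[i], cur
            i += step
    return total


def unstandardized_ihs(haps, core, positions, ancestral=0):
    """ln(iHH of ancestral-allele haplotypes / iHH of derived-allele haplotypes)."""
    anc = [h for h in haps if h[core] == ancestral]
    der = [h for h in haps if h[core] != ancestral]
    if len(anc) < 2 or len(der) < 2:
        raise ValueError("need at least two haplotypes per core allele")
    ihh_anc = ihh(anc, core, positions)
    ihh_der = ihh(der, core, positions)
    if ihh_anc == 0.0 or ihh_der == 0.0:
        raise ValueError("EHH decays immediately; iHS is undefined")
    return math.log(ihh_anc / ihh_der)
```

Under this sign convention, a strongly negative value means that haplotypes carrying the derived allele are unusually long, and a strongly positive value means the same for the ancestral allele, which is why both tails are of interest.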
In our study, the web-based tool Haplotter [30] was used to obtain iHS scores from HapMap phase II for three different populations (ASN, CEU and YRI). Because physical locations differ between databases, the SNPs in the RTKs-ERK pathway were matched by their rs numbers. In this way, the iHS values derived from HapMap phase II were assigned to the pathway SNPs. In addition, Plink was used to explore the association with the matched genotypes of all ChinaPCa samples in the pathway. A logistic regression model was used with the PSA value, and the covariate was age. On the basis of the SNP rs number, the iHS scores and the association P-values were matched, and the SNPs with significant associations were collected. Meanwhile, to assess the robustness of the selection signals, we also conducted a gene-based analysis. In this analysis, a window of 50 SNPs centered on each gene closest to the index SNP was created. The candidate targets of selection were defined as those in the upper 10% of the empirical distribution of the number of significant SNPs.
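The window-based ranking can be illustrated with a short Python sketch. It assumes that the iHS scores of the SNPs in each window have already been collected into a dictionary keyed by a window label, and that a SNP counts as significant when |iHS| exceeds a chosen threshold; the names `count_extreme` and `candidate_windows`, the default threshold of 2, and the handling of ties are assumptions made for the example.

```python
def count_extreme(ihs_scores, threshold=2.0):
    """Number of SNPs in a window whose |iHS| exceeds the threshold."""
    return sum(1 for score in ihs_scores if abs(score) > threshold)


def candidate_windows(window_scores, threshold=2.0, fraction=0.10):
    """Return the labels of windows in the upper `fraction` of the empirical
    distribution of extreme-SNP counts.

    window_scores: dict mapping a window label to the list of iHS scores
                   of the SNPs inside that window (e.g. 50 SNPs per window)
    """
    counts = {name: count_extreme(scores, threshold)
              for name, scores in window_scores.items()}
    ranked = sorted(counts.values(), reverse=True)
    n_top = max(1, int(round(len(ranked) * fraction)))
    cutoff = ranked[n_top - 1]
    # Windows tied with the cutoff are kept; windows with no extreme SNP never qualify.
    return {name for name, c in counts.items() if c >= cutoff and c > 0}
```

Because ties at the cutoff are retained, the returned set can be slightly larger than the nominal 10% when many windows share the same count.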
For the expression quantitative trait loci (eQTL) analysis, both cis- and trans-eQTLs were examined, and the remaining SNPs with |iHS| > 2 were included. The cis-eQTL analysis considered all association signals from SNPs within 1 Mb upstream and downstream, and trans-eQTLs were identified as associations with SNPs located more than 1 Mb from the probe set. The correlation with nearby gene expression, as measured by probes, was available in the eQTL database Genevar (http://www.sanger.ac.uk/resources/software/genevar/). The HapMap III data with 726 individuals (CEU, CHB, GIH, JPT, LWK, MEX, MKK, and YRI), the Geneva data with 75 individuals (three types of samples: fibroblast, lymphoblastoid cell line, and T cell), and the MuTHER healthy female twins data with three tissue types (166 adipose, 156 lymphoblastoid cell line, and 160 skin) were included in this analysis. A linear regression model was used to take the distribution of normalized expression levels between genotypes into consideration. A significance threshold of P < 0.05 with 10,000 permutations was applied to avoid false positive associations. All analyses were performed using Statistical Analysis System (SAS) software (version 9.0; SAS Institute, Cary, NC, USA) and Plink. All statistical tests were two sided.
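As a rough illustration of the eQTL test, the Python sketch below regresses normalized expression on genotype dosage (0, 1, or 2) and obtains an empirical two-sided P-value by permuting the expression values. It is not the Genevar implementation; the function names, the use of the absolute slope as the test statistic, and the fixed random seed are assumptions made for the example.

```python
import random


def slope(genotypes, expression):
    """Least-squares slope of expression on genotype dosage."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    return sxy / sxx


def permutation_pvalue(genotypes, expression, n_perm=10000, seed=0):
    """Two-sided empirical P-value for the genotype-expression slope."""
    rng = random.Random(seed)
    observed = abs(slope(genotypes, expression))
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-expression link
        if abs(slope(genotypes, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed data as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

An association would be reported when the returned value falls below the chosen threshold (P < 0.05 in this study).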

Results
As a key pathway in carcinogenesis, the RTKs-ERK pathway could influence the cell, tissue, and organism directly by mutations in its proteins, which would block normal signal transmission from the extracellular milieu to the DNA. In recent years, many studies have focused on this pathway for the pathogenesis and treatment of diseases, particularly cancers. On the basis of the genes and SNPs of this pathway, the study was conducted with the positive selection loci and eQTL analysis involved.

SNP-level association and gene-based adaptive rank truncated product (ARTP)
We tested the association between each SNP and PCa risk using a logistic regression model after adjusting for age (Table S1). The results showed that 317 loci reached the significance threshold (P < 0.05). Among the genes in the RTKs-ERK pathway, three SNPs in IGF1R (rs1815009, rs3743250, and rs3743249), all located in the 3′ untranslated region (UTR), showed strong P-values (2.84 × 10^-4, 2.98 × 10^-4, and 4.35 × 10^-4, respectively). In addition, ERBB2 with six loci (rs2517959, rs2643194, rs2517960, rs2088126, rs903506, and rs2643195) and CCND3 with three loci (rs115597780, rs149917140, and rs4331978) were significantly associated with PCa risk according to their P-values.

Natural positive selection loci and related eQTL
Using the web-based tool Haplotter, iHS values for all three ethnic groups (ASN, CEU, and YRI) in HapMap phase II were obtained. Because our samples were Asian, the iHS values in ASN were selected and matched by SNP rs number between the two data sets. In order to analyze the data comprehensively, we defined the P-value threshold as 0.05. In addition, |iHS| > 1.65 was used in the next selection step. This yielded 26 SNPs, which were mainly concentrated in two genes, EGFR (in the pathway) and MKRN2, located on chromosomes 7 and 16, respectively (Table 3). The maximum |iHS| value was 2.967 for rs17172432 (EGFR), which also showed a cis-eQTL (P = 0.0424, JPT). In addition, the results suggested that the association with PCa risk was mainly concentrated in EGFR. After selecting the potential natural selection loci, we identified the genes under selection (Table 4). The cis-eQTLs and trans-eQTLs with the related genes are given in Table S2, and the significant associations were collected. The results indicated that additional SNPs in the pathway (rs11238349, rs17172438, rs984654, rs11773818, and rs17172432) might have both cis-regulatory and trans-regulatory effects on the genes themselves and on nearby genes (Table 5). However, these effects could not be replicated in all samples, whether HapMap samples derived from similar lymphocytes, different cell types, or twins, which suggests sample-specific influences.

Discussion
PCa is a worldwide disease that causes enormous losses each year because of its high morbidity and mortality. Recently, many studies have focused on the function of regulatory pathways in the development and progression of related diseases, especially cancers [31][32], which could provide a route to targeted therapy [33][34]. In this study, we proposed that the RTK/ERK pathway could influence various cancers, including PCa, when the relevant proteins are mutated. On the basis of this hypothesis, the association between this pathway and PCa risk was analyzed. By combining positive selection analysis, the eQTL method, SNP-level association analysis, and a gene-based pathway analysis, we described the essential status of the pathway with regard to PCa. Our results suggest that the RTK/ERK pathway has the potential to be a key factor regulating the development and progression of PCa, as we had determined before [17]. According to our analysis, many genes were significantly associated with PCa risk. After combining all the results, we proposed four genes (EGFR, ERBB2, PTK2, and RAF1) with five SNPs (rs11238349, rs17172438, rs984654, rs11773818, and rs17172432) as the key factors influencing PCa. These four genes were not only associated with PCa risk but also under natural selection. This study drew on natural selection theory to discover potential positive selection in the pathway associated with PCa risk, and it gave us new insights into the relationship between this pathway and the disease.

Loci-genes analysis and prostate cancer
As one of the significant genes, EGFR proved to be important in PCa risk. The gene, located on chromosome 7p12, is also known as ERBB1, a member of the HER/ERBB/EGFR family of receptors. Recent studies have revealed its close relationship with cancers [35][36]. Cell proliferation, motility, and invasion are induced through both homo- and heterodimeric HER complexes. In addition, it has been shown that obstructing the expression and activity of HER-family members could prevent human neoplasia [37]. Therefore, specific anti-EGFR monoclonal antibodies could inhibit cell growth and downregulate the related genes, which could reduce the development of PCa [38]. Another member of the family, ERBB2, a known proto-oncogene located on the long arm of human chromosome 17 (17q12), is overexpressed in cancers such as breast cancer, with amplification or overexpression of the gene in approximately 30% of cases. Overexpression of the gene can drive aggressive disease, making it a potential therapeutic target [39][40]. Recent studies have suggested that EGFR or ERBB2 contributes to PCa progression by activating the androgen receptor (AR) under hormone-poor conditions [41]. In 2001, Chen L et al. [42] suggested that dual EGFR/HER2 inhibition combined with androgen withdrawal therapy could sensitize prostate cancer cells to apoptosis. In addition, PTK2 has also been suggested to be associated with cancer development. Also known as focal adhesion kinase (FAK), PTK2 is a focal adhesion-associated protein kinase involved in cellular adhesion. Lacoste J et al. [43] presented FAK as necessary for the cell motility of PCa and for enhancing metastasis. Sumitomo M et al. [44] indicated that neutral endopeptidase (NEP) could inhibit FAK phosphorylation on tyrosine, which contributes to invasion and metastasis in PC cells through multiple pathways. In addition, our analysis suggested that a well-known proto-oncogene, RAF1, might play a key role in carcinogenesis among humans [45][46].
RAF1 is a member of the Raf kinase family of serine/threonine-specific protein kinases, in which the related kinase B-Raf is the major player in carcinogenesis in humans. Approximately 20% of all examined human tumor samples display a mutated B-Raf gene [47]. Recently, Ren et al. [48] suggested that the RAF gene might be the main contributor to the activation of the RAS/RAF/MEK/ERK pathway in Chinese PCa. Keller et al. [49] and Escara-Wilke et al. [50] reported that Raf kinase inhibitory protein (RKIP) could be a promising factor in therapy.
In summary, we concluded from our analysis that four genes (EGFR, ERBB2, PTK2 and RAF1) undergoing positive selection might be important in the regulation of PCa development via various mechanisms. Combined with the close relationship between PCa and these genes, we suggested that these four genes might be key in the RTK/ERK pathway. Through adhesion, cell signaling transduction, regulation of related genes, and assisting gene-gene or protein-protein interactions, the development, progression, and prognosis of the cancers could be regulated. The four genes might be the most promising targets for therapy in the treatment of PCa, especially by virtue of their combined effects.
In this analysis, the four genes we proposed (EGFR, ERBB2, PTK2, and RAF1) were under natural selection, which supported the views presented above. There are three different types of natural selection: positive selection, negative selection, and balancing selection. Different types of selection guide the development or purification of a species, and either positive or negative selection could be more important. At the level of loci and genes, we presented the evolutionary status of the RTK/ERK pathway. Positive loci were selected according to the suggestive threshold (|iHS| > 1.65). After combining the association analysis and the eQTL method, five SNPs (rs11238349, rs17172438, rs984654, rs11773818, and rs17172432) were identified as significant loci that might play a key role in PCa risk. Recent research has also pointed to these loci as possible key factors in cancer risk. In 2009, Dong et al. [51] studied the role of growth, differentiation, and apoptosis genes in renal cancer, and their results implied that rs11238349 in EGFR might be statistically significantly associated with the risk of renal cancer. Hong et al. [52] reported a potentially significant association between EGFR loci (rs11773818 and rs17172432) and breast cancer risk in their stage II analysis, although without replication. Considering the relationship of these SNPs with other cancers and the role of the EGFR gene in angiogenesis, we inferred that the SNPs we proposed might be associated with the metastasis of PCa to the kidney, breast, and other viscera.
Given the natural selection effects, the four genes seemed to be more important in regulating the development and progression of PCa, based on human population analysis using linkage disequilibrium (LD)-based methods. Explanations for the development of PCa might be associated with evolution at the gene and locus levels. In addition, by combining the gene-based pathway analysis with the molecular evolution method, we not only explained the key role of the pathway in the development of PCa but also gained new insight into the potential role of molecular evolution in the pathway.

Limitation
The present study, which combined various analyses, has some limitations. First, there are many criteria for evaluating natural selection in a population. As just one of these measures, iHS might be limited in its estimation. Second, the eQTL and positive selection analyses were based on the available databases, which made it difficult to avoid deficiencies in the results. Finally, every statistical analysis is subject to deviations that should be verified, so the real association needs further research.

Conclusions
In conclusion, prostate cancer is a worldwide disease that brings about inestimable losses for health and the economy. The RTKs-ERK pathway has been reported to be an important link in this cancer's development and progression. Combining natural selection, gene-based pathway analysis, and other association analyses, we proposed that the pathway was significantly associated with PCa risk, in which mutations of EGFR, ERBB2, PTK2, and RAF1 and the related SNPs in these genes might be the key link in the evolution of the cancer. However, given that the specific locus in EGFR with positive selection was located in a non-functional intron, the real function of this intron in the evolution of cancers requires further attention.

Table S1 The results of the association analysis for PCa risk at the SNP level with a logistic regression model adjusting for age. (DOC)