Characterization of Differentially Expressed Genes Involved in Pathways Associated with Gastric Cancer

To explore the patterns of gene expression in gastric cancer, a total of 26 paired gastric cancer and noncancerous tissues from patients were enrolled for gene expression microarray analyses. Limma methods were applied to analyze the data, and genes were considered to be significantly differentially expressed if the False Discovery Rate (FDR) value was < 0.01, P-value was <0.01 and the fold change (FC) was >2. Subsequently, Gene Ontology (GO) categories were used to analyze the main functions of the differentially expressed genes. According to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, we found pathways significantly associated with the differential genes. Gene-Act network and co-expression network were built respectively based on the relationships among the genes, proteins and compounds in the database. 2371 mRNAs and 350 lncRNAs considered as significantly differentially expressed genes were selected for the further analysis. The GO categories, pathway analyses and the Gene-Act network showed a consistent result that up-regulated genes were responsible for tumorigenesis, migration, angiogenesis and microenvironment formation, while down-regulated genes were involved in metabolism. These results of this study provide some novel findings on coding RNAs, lncRNAs, pathways and the co-expression network in gastric cancer which will be useful to guide further investigation and target therapy for this disease.


Introduction
Gastric cancer (GC) is one of the most common cancers worldwide, and its incidence is particularly high in Eastern Asia, especially in China. Approximately 952,000 new cases of stomach cancer were diagnosed worldwide in 2012, and half of them occurred in Eastern Asia (mainly in China) [1]. In China, the majority of patients with GC are diagnosed at a late stage with poor prognosis. Therefore, elucidating the molecular mechanisms underlying GC progression is essential to identifying key biomarkers and developing effective targeted therapies.
Over the last decade, gene expression microarrays have become a common tool for examining gene transcript levels in cancer research. Microarray data is used for a wide variety of analyses, such as unsupervised clustering, classification, differential expression analysis, and expression mapping of quantitative trait loci [2]. It not only helps to identify key dysfunctional genes in cancer but provides genome-wide information on gene expression at one time as well [3,4]. In this study, we performed a genome-wide survey of the expression of lncRNAs and mRNAs from paired samples of primary gastric cancer tissues and noncancerous tissues, to profile the differentially expressed lncRNAs and coding RNAs. Study of these data will provide valuable information on the mechanism of carcinogenesis and allow discovery of key genes that may act as future targets of anti-cancer therapy.

Ethical statement
Written informed consent was obtained from all participants. The study was approved by the Human Research Ethics Committee of Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine.

Tissue samples
Tissues were taken from primary gastric carcinomas from untreated patients who underwent D2 radical gastrectomy in Shanghai Ruijin Hospital. For each cancer tissue, a paired noncancerous tissue sample was collected from the adjacent region at the same time. The size of each sample was around 0.1cm 3 . All the samples were placed in RNALater within 15 minutes after excision and stored in liquid nitrogen until RNA extraction. In this study, 32 paired tissues were collected for the microarray and 26 paired samples were enrolled for the next-step analysis of GO, pathway and network after quality control using 3D Principal component analysis (3D-PCA) and Cluster analysis.

Microarray experiments
Agilent SurePrint G3 Human GE 8x60K Microarray (Design ID: 028004) was employed in this study. Total RNA was isolated and amplified using a Low Input Quick Amp Labeling Kit, One-Color (Cat#5190-2305, Agilent technologies, US). Then, the labeled cRNAs were purified by a RNeasy mini kit (Cat#74106, QIAGEN, Germany).
Based on the manufacturer's instructions, each slide was hybridized with 600ng Cy3-labeled cRNA using a Gene Expression Hybridization Kit (Cat#5188-5242, Agilent technologies, US) and washed by the Gene Expression Wash Buffer Kit (Cat#5188-5327, Agilent technologies, US).
An Agilent Microarray Scanner (Cat#G2565CA, Agilent technologies, US) and Feature Extraction software 10.7 (Agilent technologies, US) were applied to scan each slide with the same settings shown as follow, Dye channel: Green, Scan resolution = 3μm, 20bit. The raw data were normalized by the Quantile algorithm, Gene Spring Software 11.0 (Agilent technologies, US) (detailed in S5 Table).

Co-expression network
Gene co-expression Network was built according to the normalized signal intensity of specific expression genes. Degree centrality is defined as the number of links one node has to another, which determines the relative importance of genes. What's more, k-cores were applied as a method of simplifying the graph topology analyses. Core regulatory factors (genes) which have the highest degrees connect most adjacent genes and build the structure of the network (detailed in S5 Table).

Real-time quantitative PCR
Total RNA was extracted from tissues using the Trizol reagent (Invitrogen) according to the manufacturer's instructions. The quantitative real-time polymerase chain reaction (PCR) was performed by using SYBR-green PCR Master Mix in a Fast Real-time PCR 7500 System (Applied Biosystems). The primers of the 10 genes were showed in S4 Table. PCR reactions were performed at 50°C for 2 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. ΔCt was calculated by subtracting the Ct of β-actin RNA (control) from the Ct of the RNA of sample, respectively. ΔΔCt was then calculated by subtracting the ΔCt of the control from the ΔCt of the sample. Fold change was calculated by the equation 2-ΔΔCt.

Statistical analysis
SPSS software 19 and Microsoft Excel 2010 was used to analyze the data. Expression levels between cancer tissues and adjacent noncancerous tissues were analyzed by paired-sample t-tests. P-values below 0.05 were regarded as statistically significant.

Microarray analyses
In total, 42,405 human genes were profiled in our study by using an Agilent G3 Human GE 8x60K microarray. We have submitted our dataset in the repository of "Gene Expression Omnibus" and the accession number was "GSE65801" (http://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc=GSE65801). We used linear models and empirical Bayes methods to analyze the data (see Methods). There were 2371 mRNAs and 350 lncRNAs considered as the differentially expressed genes by limma for the next-step analysis (Fig 1A). Among all 2371 differential mRNAs, there are 1142 mRNAs down-regulated and 1229 mRNAs up-regulated in our observation on alterations of gene expression between gastric cancer and control tissues ( Fig 1C). Most of the differential mRNAs have been proven to be correlated with carcinogenesis and metastasis in most types of cancer ( Table 1). The genes such as GKN2, PGC, MUC6, CHIA, PSCA and FBP2 were among the top 20 down-regulated genes, while KLK8, SFRP4, INHBA, CLDN1, CST1, FAP, SPP1, OLFM4, and KRT17 were among the top 20 up-regulated genes (Table 1). However, some genes such as HOXC9, FNDC1, STRA6, KCNE2, PGA3 and KCNJ16 haven't been reported in gastric cancer and their roles remain unknown (Table 1).
In addition, we found 193 down-regulated lncRNAs and 156 up-regulated lncRNAs among a total of 350 differential lncRNAs based on the profiling (Fig 1B). Most of the lncRNAs have not been given an official names and their functions remain unknown. However, some have been reported playing critical roles in cancer, such as H19, GUCY1B2, MEG3 and AKR7L ( Table 2).
In our previous report [36], the fold change (FC) of H19 in 74 gastric cancer versus paired noncancerous tissues was 6.015, with a P-value of 0.017. This result was consistent with the data of H19 (Absolute FC = 6.06) in this microarray analyses. Furthermore, over-expression of H19 contributes to the proliferation, migration, invasion and metastasis of gastric cancer.

Gene Ontology categories
All the differentially expressed genes were classified into different functional categories according to the Gene Ontology (GO) project for biological processes. Based on our microarray data, GO analyses indicated that 208 GO terms were enriched (P<0.01, FDR<0.01) (S1 Table). The primary GO categories for 170 up-regulated GO terms were focused on cell adhesion, angiogenesis, multicellular organism development, axon guidance, skeletal system development, collagen fibril organization, positive regulation of angiogenesis, wounding and negative regulation of cell proliferation (Fig 2A). The main GO categories for down-regulated genes were digestion, xenobiotic metabolic process, transmembrane transport, ion transport, small molecule metabolic process, negative regulation of growth, glutathione metabolic process, cellular response to cadmium ion and metabolic process ( Fig 2B).
According to the differential genes and functions, we built a GO Tree to explore the interactions among all the differential GO categories. The diversity in these categories when comparing cancerous and control tissues suggested that gastric cancer may be associated with significantly up-regulated cell migration, cell proliferation, angiogenesis, cell-cell adhesion Each column represents one sample and each row represents one differential lncRNA. C) Clustering heatmap showing the differential mRNAs. Each column represents one sample and each row represents one differential mRNA. and cell surface receptor signaling pathways, while cell metabolism processes and ion transmembrane transport are down-regulated (Fig 3).

Pathway analyses
Pathway analyses were used to identify the significant pathways associated with the differentially expressed genes according to KEGG. There were 32 up-regulated pathways and 31 downregulated pathways based on our data (Fig 4). Furthermore, the pathway profiling was consistent with the results for the GO categories in cancer-related biological functions. Our data showed some differential genes highly up-regulated which suggested their involved pathways were activiated. For example, SFRP4, WNT11, FZD2, MYC were highly expressed in cancer tissues which represent the Wnt pathway was activiated and BCL2A1, ICM1, TNFSF14 in NF-κB pathway were highly expressed as well. Most of the cancer-related signaling pathways such as JAK/STAT, Wnt, NF-κB, PI3K, mTOR, Hedgehog and Notch pathways were activated in gastric cancer compared with noncancerous tissues based on our data (S2 Table). The up-regulated pathways which were focused on cell adhesion, transcriptional dysregulation, carcinogenesis and differentiation were correlated with tumorogenesis and metastasis ( Fig 4A). However, the down-regulated pathways were generally responsible for metabolism ( Fig 4B).

Gene-Act network
Based on GO categories and pathway analysis, one gene may be involved in several pathways or interact with several other genes. We pooled the differential genes and built a network of the interactions of differentially expressed genes. A high degree protein regulates or is regulated by many other proteins, which implies an important role in the Gene-Act network (S3 Table). The glutathione S-transferase (GST) family, cytochrome P450 (CYP) family, UDP glucuronosyltransferase 2 (UGT2) family, Epidermal Growth Factor Receptor (EGFR) family and cAMPdependent protein kinase catalytic beta (PRKACB) were at the core of the gene-gene interaction network. They may play key roles in the network because they possessed the strongest degree (degree >25) centralities (gene-gene interactions) (Fig 5). It has been reported that GST, EGFR and PRKACB are responsible for signal transduction pathways involved in tumor growth and differentiation in different type of cancers [42,43].

Gene co-expression network
We produced a gene co-expression network based on the differentially expressed genes, proteins and protein complex in cancer tissues and noncancerous (control) tissues, respectively. Compared with the control, the connections between genes in cancer tissues were less, which suggested that most of the physiological gene-gene interactions and linkages in normal tissues had been broken or lost in the cancer tissues (Fig 6A and 6B). The genes with high degree and k-core which means they possessed most of the interactions with other geneswere known as key genes in the interaction network ( Fig 6B) including TRO, GPR124, TIMP2, EMCN, SLIT3, HTRA1, SPARC, LAMA4 and MEOX2 (Table 3). They were responsible for cell signaling, adhesion, angiogenesis, migration, growth and metastasis.

Confirmation of microarray results by qPCR
We performed Quantitative Real-time PCR (qPCR) on 6 up-regulated genes (COL1A, BGN, SPP1, MELK, IGFBP4, SPARC) and 4 down-regulated genes (PGC, SST, MT1X, S100P) to verify our data in gastric cancer tissues (Tumor) and noncancerous tissues (Normal). The expression ratios of these 10 genes (Tumor/Normal) from qPCR are consistent with those from  Table). It suggested the data of differential genes expression from microarray was reliable. What's more, our team has been worked on some of the differential genes such as PHF10 [55], CEACAM6 [56], SFRP1 [57], SOX11 [58], CLDN1 [59] to investigate their expression and functions in gastric cancer and the results perfect proved our microarray data.

Discussion
Microarray gene-expression analyses on gastric cancer have previously been used to predict diagnostic markers [60] and to identify gene expression patterns associated with prognosis [61,62], but it hasn't been used to reveal molecular interactions among lncRNAs and mRNAs in GC. In this study, we analyzed 26 gastric cancer tissues with paired noncancerous tissues and profiled the genes differentially expressed according to their GO categories, pathways, Gene-Act network and Co-Expression network.  The gene expression results were obtained by using an Agilent G3 Human GE 8x60K microarray, which not only covers the transcriptome databases for mRNA targets but also includes probes for lncRNAs (long non-coding RNAs). With the combination of mRNA and lncRNAs, it can perform two experiments on a single microarray and predict lncRNA function and interaction with mRNAs. The analyses revealed a set of genes that were differentially expressed between gastric cancer and normal tissue. Some of them have been reported previously in gastric or other cancers. For example, expression of gastrokine-2 (GKN2) was significantly downregulated or absent in gastric cancer cell lines, gastric intestinal metaplasia, and tumor tissues. Over-expression of GKN2 contributed to cell proliferation, migration, and invasion of gastric cancer and arrested the cell cycle at the G1-S transition phase [6]. In contrast, levels of expression of inhibin beta A (INHBA) were significantly higher in cancer tissue than in adjacent normal mucosa, and it is regarded as an independent prognostic factor in gastric cancer [22]. In addition, we discovered some novel genes, such as TMEM184A, PSAPL1, KIAA1199, CLRN3 and FNDC1, which have not been reported in gastric cancer previously, and their roles in cancer remain unknown.
One of the advantages of our gene expression microarray analysis is that it represented the expression of lncRNAs and mRNAs so that both could be investigated together. Our previously report on the role of lncRNA H19 and its network in GC [36] was based on this microarray data. However, most of the lncRNAs such as DRD5, FMO6P, SNAR-A3 and TPRXL showed in our microarray haven't been identified and need further investigation to clarify their roles in gastric cancer.
Based on our gene expression profiling data, the genes and their functions activated in gastric cancer were responsible for proliferation, adhesion, migration and metastasis, which was consistent with the results from pathway analyses. Interestingly, we discovered that most of the cancer-related signaling pathways reported previously such as Notch, mTOR and Hedgehog were activated in GC based on our data. These results support the viewpoint that heterogeneity Differentially Expressed Genes Involved in Pathways Associated with GC is the characteristic of GC. Comparison of the co-expression network between normal tissues and cancer suggested that the expression, functions and interactions of the majority of physiological gene were lost or damaged in gastric cancer, whereas proliferation, migration and metastasis were abnormally enhanced. These interesting findings match the characteristics of cancer, such as anaplasia and dedifferentiation. These differentially expressed genes involved in signaling pathways acted as key genes in co-expression network might be the potential targets of anti-cancer therapy or diagnostic markers in the future.