Abnormal expression of mRNA, microRNA alteration and aberrant DNA methylation patterns in rectal adenocarcinoma

Aim Rectal adenocarcinoma (READ) is a malignancy cancer with the high morbidity and motility worldwide. Our study aimed to explore the potential pathogenesis of READ through integrated analysis of gene expression profiling and DNA methylation data. Methods The miRNA, mRNA expression profiling and corresponding DNA methylation data were downloaded from The Cancer Genome Atlas (TCGA) database. Differentially expressed mRNAs/ miRNAs/methylated regions (DEmRNA/DEmiRNAs) were identified in READ. The negatively correlation of DEmiRNA-DEmRNAs and DNA methylation-DEmRNAs were obtained. DEmRNAs expression was validated through quantitative real-time polymerase chain reaction (qRT-PCR) and microarray expression profiling analyses. Results 1192 dysregulated DEmRNAs, 27 dysregulated DEmiRNAs and 6403 aberrant methylation CpG sites were screened in READ compared to normal controls. 1987 negative interaction pairs among 27 DEmiRNAs and 668 DEmRNAs were predicted. 446 genes with aberrant methylation were annotated. Eventually, 50 DEmRNAs (39 down- and 11 up-regulated DEmRNAs) with hypermethylation, synchronously negatively targeted by DEmiRNAs, were identified through the correlation analysis among 446 genes with aberrant methylation and 668 DEmRNAs. 50 DEmRNAs were significantly enriched in cAMP signaling pathway, circadian entrainment and glutamatergic synapse. The validation results of expression levels of DEmRNAs through qRT-PCR and microarray analyses were compatible with our study. Conclusion 7 genes of SORCS1, PDZRN4, LONRF2, CNGA3, HAND2, RSPO2 and GNAO1 with hypermethylation and negatively regulation by DEmiRNAs might contribute to the tumorigenesis of READ. Our work might provide valuable foundation for the READ in mechanism elucidation, early diagnosis and therapeutic target identification.


Introduction
Colorectal cancer (CRC) is a leading cause of cancers with high morbidity and mortality. CRC is the third leading cause of cancer both in male and female and worldwide number of death in 2012 is more than 690,000 [1]. Australia/New Zealand, Europe and Northern America possess the highest incident rates, Africa and Northern America possess the low incidence rates of CRC [1,2].
CRC is classified as colon cancer and rectum cancer according to the anatomical location. Development of metastasis often meddles with homeostasis and predicts unfavorable prognosis [3]. Approximately 20% of CRC patients lose opportunity for radical surgery on account of metastases [4]. Rectal cancer spreads more frequently to the thoracic organs, bone and nervous system and less frequently to peritoneum compared to colon cancer [5]. Age, gender, smoking and diabetes mellitus are risk factors of rectal cancer [6][7][8].
Numerous data indicate that the aberrant accumulation of genetic and epigenetic changes function as vital roles in initiation and development of rectal cancer.
DNA methylation typically occurs at cytosine-phosphate-guanine (CpG) sites, regulates gene expression, protein function and RNA processing. Patients with hypomethylation of LINE-1 show shorter survival time and a higher incidence rate of tumor recurrence in earlystage rectal cancer (stage I-II) [9]. KRAS mutations and CDKN2A promoter methylation are closely related to the poor overall survival in rectal cancer [10]. XRCC3 is over-expressed in rectal cancer patients responding to preoperative chemoradiotherapy (pCRT) followed by surgery compared with those non-responders, the deregulation of which is extensively involved in the chemo-resistance mechanism [11]. Recent articles demonstrate that two gene sets of dysregulated genes could predict pCRT response in patients with rectal adenocarcinoma, one gene set is AKR1C3, CXCL11, CXCL10, IDO1, CXCL9, MMP12, HLA-DRA and another gene set is TMEM188, ITGA2, NRG, TRAM1, BCL2L13, MYO1B, KLF7,and GTSE1 [12,13]. microRNAs (miRNAs) are small non-coding RNAs, which negatively regulates mRNA expression of targeted genes. In rectal cancer, miR-92a expression is inversely associated with KLF4 and IQGAP2 expression [14]. miR-573,miR-579 and miR-802 display the significant correlation with overall survival of patients, in addition to, miR-573 is significantly correlated with tumor grade of patients after preoperative chemoradiotherapy [15]. In patients with rectal adenocarcinoma, serum level of miR-125b is significantly over-expressed in pCRT non-responders compared to those responders [16].
In order to explore the tumorigenesis mechanism and potential biomarkers for early diagnosis of rectal adenoacrcinoma, integrated analyses of miRNA expression profiling, mRNA expression profiling and DNA methylation data based on The Cancer Genome Atlas (TCGA) database were performed. Our study might be the ground work for further mechanism elucidation of rectal adenocarcinoma and identification of the diagnostic biomarkers for early stage of rectal adenocarcinoma.

Source of data
The mRNA expression data, miRNA expression data, methylation data and the corresponding clinical data for retal adenocarcinoma (READ) were downloaded from TCGA database (https:// tcga-data.nci.nih.gov/tcga/, Oct 20, 2015). Total of 16 READ patients were excluded from 171 READ patients based on the criteria including patients without the history of other malignancy and without the history of neoadjuvant treatment before collection of tumor specimens. The sequencing platforms of mRNA expression data, miRNA expression data, methylation data were respective UNC_ IlluminaHiseq_RNASeqV2, BCGSC_IlluminaHiSeq_miRNASeq and JHU_USC_HumanMethylation450. There were respective 29 READ patients with TNM sage I, 49 patients with TNM sage II, 48 patients with TNM sage III, 23 patients with TNM sage IV and 6 patients with unknown TNM stage in our study.

Identification of differentially expressed mRNAs and miRNAs
The mRNA expression profiling and miRNA expression profiling of READ and normal control tissues were downloaded from TCGA data portal. The mRNA and miRNA expression level were calculated and demonstrated as reads per million miRNA mapped data. The significantly differentially expressed mRNAs (DEmRNAs) and differentially expressed miRNAs (DEmiRNA) were identified inREAD compared to normal control tissues through DESeq2 repackage [17]. The calculated raw P-value was performed to multiple testing corrections through Benjamini and Hochberg method [18]. The adjusted P-value was described as false discovery rate (FDR). mRNA and miRNA with FDR<0.0001 and abs (log 2 fold change)>2 was identified as DEmRNA and DEmiRNA, respectively.
Differentially methylated CpG sites analysis COHCAP package in R was applied to identify differentially methylated sites between READ and normal control samples [19]. t-test was used to determine differences in two group comparison. The methylation score for each CpG site was described as a beta value according to the fluorescent intensity ratio. The differentially methylated sites with FDR <0.05 and abs (delta beta) >0.2 were selected. The corresponding genes and CpG islands of the differentially methylated sites were annotated.

Manhattan plot analysis and heatmap analysis
The distribution of differentially methylated CpG sites in 22 chromosomes was described as Manhattan plot by "qqman" package in R language. The pattern of differentially methylated CpG sites in READ and normal controls were visualized by "pheatmap" package in R language.

DEmiRNA target genes prediction analysis
The target DEmRNAs of DEmiRNAs were predicted through mirWalk database [20], in which the targeted correlations between miRNAs and mRNAs have been confirmed through in vivo and in vitro experiments. 6 algorithms including RNA22, miRanda, miRDB, miRWalk, PICTAR2 and Targetscan were conducted to predict target-genes of miRNAs. The genes, simultaneously predicted by more than 4 algorithms, were identified as the target-genes of miRNAs [20].

Visualization of DEmiRNA-DEmRNA network
The identified negative correlations between DEmiRNAs and DEmRNAs through PCC analysis and mirWalk database were visualized by Cytoscape software (http://cytoscape.org) [21]. In the network, a circular node represented the miRNA and a rectangle node represented the mRNA, and their association was represented by the solid line.

Correlation analysis of aberrant gene expression and DNA methylation in READ
The identified DEmRNAs and reference genes with aberrantly methylated CpG islands were screened out to analyze the correlation between gene expression and DNA methylation. There was a negative correlation between DNA methylation and mRNA expression. The genes with hypermethylation and lower expression or genes with hypomethylation and higher expression in READ were our concern.
KEGG pathway enrichment analysis In order to explore the enriched signaling pathways of aberrantly methylated genes and aberrantly expressed genes, the online software of KOBAS 2.0 was used for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment [22,23]. KOBAS 2.0 was updated on January 26th, 2015, contains the latest data of signaling pathways. P<0.05 was set as the threshold of significantly enriched pathways.
Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR) 5 paired READ and adjacent normal tissues were obtained from patients underwent radical surgery in Department of Colorectal Surgery, Tianjin Union Medical Center. Our study was approved by the Ethics committee of Tianjin Union Medical Center and complies with the Declaration of Helsinki. All of participants provided their written informed consent to participate in this study. Total RNA of fresh samples of 5 paired READ tumor and adjacent nontumor tissues and was extracted by using TRIzol reagent (Invitrogen, CA, USA). The Super-Script III Reverse Transcription Kit (Invitrogen, CA, USA) was used to synthesize the cDNA. qRT-PCR reactions were carried out using Power SYBR Green PCR Master Mix (Applied Biosystems, Foster City, CA) on the Applied Biosystems 7500 (Applied Biosystems, Foster City, CA).The relative expression of genes was calculated using the 2 -ΔΔct methods [24].GAPDH was used as internal control gene and all of primers used were shown in Table 1. Each assay was performed as in triplicate. The GraphPad Prism version 6.0 software packages (GraphPad Software, San Diego, CA, USA) was used to output the Figures.

The expression levels of DEmRNAs were validated based on Gene Expression Omnibus database
Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) is a freely public data repository that archives microarray, next-generation sequencing, and other forms of highthroughput functional genomics data submitted by research community. In order to performed the further validation of dysregulated genes in READ, public expression profiling deposited in GEO database was used in our study.
Firstly, RNA-seq expression profiling of READ tissues were searched, however, none of dataset was retrieved. Then, microarray expression profiling of READ tissues were searched and the inclusion criteria of datasets were complied with following details: (1) the dataset was generated from mRNA expression profiling of READ patients; (2) both of READ and matched mucosa tissues samples were available in the dataset; (3) the sample size of dataset was more than 50. Finally, GSE75970 (230 READ versus 230 matched mucosa tissues) and GSE20842 (65 READ versus 65 matched mucosa tissues) of mRNA expression profiling were incorporated into our study.
The mRNA expression profiling of GSE75970 and GSE20842 were integrated and the expression levels of f DEmRNAs in READ were detected. Box-plot analysis was performed to describe the expression of DEmRNAs both in READ and matched mucosa tissues. P-value indicating the difference between two group was calculated and P<0.05 was significant difference.

Dysregulated mRNAs in READ
The mRNA expression profiling of READ tumor samples and normal control tissues were conducted to differentially expressed genes analysis based on TCGA database ( Table 2). Total of 1192 DEmRNAs including 439 up-regulated and 753 down-regulated DEmRNAs were identified in READ compared to normal controls according to the threshold of FDR<0.001 Table 1. The primers for qRT-PCR validation.

Genes Primers
qRT-PCR: quantitative real-time polymerase chain reaction.

Aberrant methylation CpG sites in READ
In order to investigate the differentially methylated sites between READ and normal control samples, methylation score of each gene, described as a beta value, was compared between READ and normal controls. Finally, 6403 differentially methylated sites including 4275 hypermethylated sites and 2128 hypomethylated sites were identified in READ (S1 Table).  mRNA, miRNA and DNA methylation alteration in rectal adenocarcinoma 555 hypermethylated and 6 hypomethylation CpG islands, associated with 446 annotated genes (S2 Table).

DEmiRNA-DEmRNA interaction in READ
In order to explore the interaction between DEmiRNA and DEmRNA, DEmiRNA was subjected to mirWalk database, and then the predicted target-genes of DEmiRNAs in READ were obtained. Total of 668 DEmRNAs were identified as the target-genes of 27 DEmiRNAs in READ.

Correlation analysis of aberrant gene expression and DNA methylation in READ
In order to determine the potential effects of the aberrant DNA methylation on the abnormal gene expression in READ, the association between DNA methylated status and mRNA expression level were investigated. DNA methylation status could positively or negatively correlate with mRNA expression profiling. In our study, negative correlation between DNA methylation and gene expression in READ were concerned. mRNA, miRNA and DNA methylation alteration in rectal adenocarcinoma 555 hypermethylated CpG islands were associated with 441 annotated genes and 6 hypormethylated CpG islands were associated with 5 annotated genes, respectively. 446 genes with aberrant methylation CpG islands were overlapped with 668 genes with aberrant expression, total 50 dysregulated genes including 11 up-regulated genes and 39 down-regulated genes with hypeymethylated CpG islands were identified, and there was no dysregulated genes with hypomethylated CpG sites were identified in READ as Fig 4 shown (S3 Table).

Functional annotation
In order to investigate the potential signaling pathways of 50 dysregulated gene with aberrant methylation in READ, KEGG database was used to enrich the pathways. As Table 5 shown, total of 9 pathways including cAMP signaling pathway, circadian entrainment, glutamatergic synapse, nicotine addiction, amphetamine addiction, morphine addiction, melanogenesis,

qRT-PCR verification
In our study, dysregulated genes were identified in READ through bioinformatics analysis. qRT-PCR method was used to validate the expression levels of dysregulated genes in 5 READ tumor samples and 5 adjacent non-tumor tissues. As Fig 5A, 5C and 5D shown, the expression levels of ATP2B4 (P<0.05), ROR1 (P<0.05) and PRKCB (P<0.05) were significantly downregulated and NR3C1 (Fig 5B) was down-regulated in READ compared to adjacent nontumor tissues. TCF7, SLC6A6, PDPN, WNT2 and ONECUT2 were up-regulated in READ, as Fig 5E-5I shown.
The expression levels of DEmRNAs were analyzed in the GSE75970 and GSE20842 datasets Integrated analyses were performed to GSE75970 and GSE20842 datasets for detecting the expression levels of identified DEmRNAs in READ through the larger sample size. In our study, 33 DEmRNAs' expression were detected in GSE75970 and GSE20842 datasets, including 10 DEmRNAs randomly selected from top 20 up-regulated DEmRNAs, 10 DEmRNAs randomly selected from top 20 down-regulated DEmRNAs, 9 DEmRNAs examined by qRT-PCR and 7 DEmRNAs discussed in the manuscript (Fig 6 and S2 Fig).
As Fig 6A-6E revealed, the expression levels of KRT23, MMP7, STRA6, WNT2, GRHL3 were significantly up-regulated in READ tissues; moreover, the expression levels of DPP6, CA7, BAI3, FIGF, RSPO2, SORCS1, PDZRN4, LONRF2, CNGA3, HAND2 and GNAO1 were obviously down-regulated in READ tissues compared with matched mucosa tissues based on the microarray analyses (Fig 6F-6P). The expression levels of other 17 DEmRNAs in READ tissues through microarray analyses were shown in S2 Fig. In summary, the expression levels of those 33 DEmRNAs in READ tissues based on microarray analyses were completely compatible with our bioinformatics analyses based on TCGA datasets.

Discussion
In our study, DEmiRNAs, DEmRNAs and differentially methylated genes were identified in READ compared with normal control tissues based on the TCGA database. DEmiRNA-DEmRNA regulatory network was constructed and DEmRNAs associated with differentially methylated genes were recognized. qRT-PCR was applied to verify the dysregulated genes mRNA, miRNA and DNA methylation alteration in rectal adenocarcinoma through bioinformatics analysis. 9 candidate genes were randomly selected for qRT-PCR examination, ATP2B4, NR3C1, ROR1 and PPKCB were down-regulated in READ, TCF7, SLC6A6, PDPN, WNT2 and ONECUT2 were up-regulated in READ compared to paired adjacent non-tumor tissues, which were accordance with our analyses. In order to further validate the expression levels of DEmRNAs in READ, microarray expression profiling with larger sample size of READ and matched mucosa tissues generated from GEO database were applied for validation. The expression status of 33 representative DEmRNAs in READ tissues based on microarray analyses were completely compatible with our bioinformatics analyses based on TCGA datasets. Both of qRT-PCR validation and microarray analyses indicated our bioinformatics analyses based on TCGA datasets was acceptable. 7 genes SORCS1, PDZRN4, LONRF2, CNGA3, HAND2, RSPO2 and GNAO1 were our concerns by reasons of those down-regulated genes had increased methylation and were negatively targeted by up-regulated miRNAs in READ.  RSPO2 encodes R-spondin 2, belongs to R-spondin family. RSPO2, is a typical secretory protein and activates the canonical Wnt/beta-catenin signaling pathway as an agonist. It is negatively regulated by mir-335, mir-141, mir-19b-2 and let-7f-2 in the DEmiRNA-DEmRNA network (Fig 3). RSPO2 with hypermethylation was the top 10 down-regulated DEmRNAs in READ compared to normal control tissues. Dysregulated expression of RSPO2 is closely related to the tumor invasiveness and aggressiveness in papillary thyroid cancer, pancreatic cancer and lung adenocarcinoma [25][26][27]. However, RSPO2 functions as a tumor driver or suppressor in CRC is still controversial [28][29][30]. Our study indicated RSPO2 was significantly down-regulated in READ and kept the line with the published research, which demonstrates RSPO2 functions as a tumor suppressor; over-expression of it inhibits CRC cell proliferation and tumorigenicity [28]. RSPO2 is highly expressed in colon cancer stem cells and promotes the invasion of CRC cells through enhancing epithelial-mesenchymal transition [26]. The biological functions of RSPO2 in CRC and READ are unclear and the expression pattern of RSPO2 in READ regulated by DNA methylation modification and miRNA needs to be further investigated through experiments.
GNAO1 and PDZRN4 with increased methylation were down-regulated in READ. GNAO1 encodes the guanine nucleotide-binding protein, α-activating activity polypeptide O, is a member of the subunit family of Gα proteins. GNAO1 with hypermethylation was down-regulated in READ, and was negatively targeted by mir-182, mir-19b-2 and let-7f-2 (Fig 3). GNAO1 was significantly enriched in circadian entrainment, glutamatergic synapse, morphine addiction, melanogenesis, retrograde endocannabinoid signaling and cholinergic synapse pathways. A number of articles have illuminated its biological roles in cancers including hepatocellular carcinoma (HCC), gastric cancer and renal cell carcinoma [31][32][33].In HCC, GNAO1 acts as tumor suppressor depending on down-regulation of GNAO1 promoting cell proliferation and suppressing the senescence of HCC cells [31]. Nevertheless, GNAO1 functions as tumor driver in gastric cancer. The silencing of GNAO1 in gastric cancer cells inhibits cell proliferation and enhances cell apoptosis; increased GNAO1 predicts the unfavorable prognosis in patients with gastric cancer [32]. PDZRN4 encodes PDZ domain containing ring finger 4, PDZRN4 was negatively targeted by mir-182 (Fig 3). PDZRN4 is described as a potential tumor suppressor with down-regulation in HCC. Increased expression of it inhibits cell proliferation and colony formation [34].
HAND2 encodes heart and neural crest derivatives expressed 2, belongs to the basic helixloop-helix family of transcription factors. HAND2 was negatively regulated by mir-1-2 (Fig 3), which is down-regulated in READ with increased methylation. It is related to endometrial carcinoma, non-small cell lung cancer and neuroblastoma [35][36][37]. HAND2 is absent in endometrioid carcinoma and reduced in atypical hyperplasia compared to benign endometrium; knockout of HAND2 leads to continuous proliferation in mice model [35]. HAND2, as the transcript factor, is over-expressed in early stage of lung squamous cell carcinoma instead of lung adenocarcinoma [36]. HAND2 and DEIN expression is well correlated in neuroblastoma, HAND2 is highly expressed in neuroblastoma [37]. SORCS1 and CNGA3 with increased methylation were down-regulated in READ. In our work, SORCS1 were negatively regulated by mir-335 and CNGA3 was negatively regulated by mir-182 and mir-19b-2 (Fig 3). SORCS1 encodes sortilin related VPS10 domain containing receptor 1,is associated with the Alzheimer's disease, type 2 diabetes and obese women with polycystic ovary syndrome [38][39][40]. CNGA3 encodes cyclic nucleotide gated channel alpha 3, a member of cyclic nucleotide-gated cation channel protein family, is required for normal vision. The missense mutations for c.633T>A (p.D211E) and c.1006G>T (p.V336F) and the combined heterozygous mutations for c.997_998delGA and p.M424V in the CNGA3 gene is the cause for achromatopsia [41,42].
In conclusion, DEmRNAs, DEmiRNAs and differentially methylated regions were identified in READ compared to normal control tissues based on TCGA database. 39 down-regulated DEmRNAs with hypermethylation, had potentially negative correlation with DEmiRNAs, were identified in READ. Among these 39 genes, RSPO2, GNAO1, PDZRN4, HAND2, SORCS1, CNGA3 and LONRF2 might play essential roles in the tumorigenesis of READ. GNAO1, PDZRN4, HAND2, SORCS1, CNGA3 and LONRF2 are not previously described in relation to tumorigenesis of READ. The role of RSPO2 in colorectal cancer is arguable according to the published article. Our results provide the foundation for further READ research in mechanism elucidation and early diagnosis, and these results must be confirmed through additional experiments.