A Meta-Analysis: Identification of Common Mir-145 Target Genes that have Similar Behavior in Different GEO Datasets

Background MicroRNAs, which are small regulatory RNAs, post-transcriptionally regulate gene expression by binding 3'-UTR of their mRNA targets. Their deregulation has been shown to cause increased proliferation, migration, invasion, and apoptosis. miR-145, an important tumor supressor microRNA, has shown to be downregulated in many cancer types and has crucial roles in tumor initiation, progression, metastasis, invasion, recurrence, and chemo-radioresistance. Our aim is to investigate potential common target genes of miR-145, and to help understanding the underlying molecular pathways of tumor pathogenesis in association with those common target genes. Methods Eight published microarray datasets, where targets of mir-145 were investigated in cell lines upon mir-145 over expression, were included into this study for meta-analysis. Inter group variabilities were assessed by box-plot analysis. Microarray datasets were analyzed using GEOquery package in Bioconducter 3.2 with R version 3.2.2 and two-way Hierarchical Clustering was used for gene expression data analysis. Results Meta-analysis of different GEO datasets showed that UNG, FUCA2, DERA, GMFB, TF, and SNX2 were commonly downregulated genes, whereas MYL9 and TAGLN were found to be commonly upregulated upon mir-145 over expression in prostate, breast, esophageal, bladder cancer, and head and neck squamous cell carcinoma. Biological process, molecular function, and pathway analysis of these potential targets of mir-145 through functional enrichments in PPI network demonstrated that those genes are significantly involved in telomere maintenance, DNA binding and repair mechanisms. Conclusion As a conclusion, our results indicated that mir-145, through targeting its common potential targets, may significantly contribute to tumor pathogenesis in distinct cancer types and might serve as an important target for cancer therapy.


Introduction
MicroRNAs (miRNAs) are 18-22 nucleotide long, non-coding RNAs, which are transcribed by RNA polymerase II. They regulate their target genes expression at the post-transcriptional level through binding to the 3'-untranslated regions (3' UTR) of specific mRNAs and either cause mRNA degradation or translational inhibition [1,2] MiRNAs are implicated in several central biological processes such as cell development, proliferation, differentiation, and apoptosis. They also have been shown to possess crucial roles in cancer initiation, progression, and metastasis [3,4].
The involvement of miRNAs in cancer pathogenesis is well established, as they can behave as oncogenes or tumor suppressor genes depending on the cellular functions of their targets. As tumor suppressors, miRNAs are downregulated in cancer tissues and repress their oncogenic targets. On the other hand, some miRNAs are upregulated in tumor samples and trigger cancer growth. Additionally, both tumor suppressor and oncogenic miRNAs can induce multiple cancer traits by targeting different cellular pathways [5].
Various pathways can be affected by deregulation of miRNAs, since a single miRNA can target hundreds of mRNAs in the same or different biological pathways, and they can regulate the expressions of human genes up to %60 [6].
DNA Microarray is one such technology, which provides profiling of thousands of genes expressions at the same time. Microarrays are effective tools in cancer research, where they can be used to predict tumor development and progression, to evaluate drug response, and to find out biomarkers [7]. miR-145 is a well-studied miRNA, which is located at 5p32 chromosomal region and its expression is controlled by p53 and some other transcriptional factors like RREB1, FoxO, and C/EBP-β [8].
miR-145 has been found to be associated with tumorigenesis via suppressing the expression of several genes such as Insulin-like growth factor 1 in colorectal cancer [20], c-Myc and Cyclin-dependent kinase 6 (Cdk6) in oral squamous cell cancer [21], ER-α in breast cancer, SOX2 in larynx and prostate cancer, and several other genes in distinct cancer types [22]. However, to the best of our knowledge there is no a meta-analysis study investigating mir-145 targets and this is the first study, which combines and correlates miR-145 and mRNA microarray data in the literature.
In this study, we aimed to show potential common target genes of miR-145 in several cancer types including prostate, breast, esophageal, bladder, head, and neck squamous cell carcinoma cancer, using GEO database and to unravel the underlying molecular pathways associated with mir-145 in tumor pathogenesis.

Literature search
A systematic review of the microarray literature from GEO database was documented to identify studies, where expression profiling was performed for miR-145 over-expressing cancer cell lines, published up to Jun 15, 2014. Medical subjective heading (MeSH) was "miR-145 in human cancer". A total of 55 studies were identified through GEO database searching. Of these, 20 studies were retained after rejecting replications. A total of 9 articles were excluded according to the title of samples (GSMs). The reason for the exclusion was the following: the title of samples no association with miR-145. The full-text articles were evaluated for the remaining 11 studies, and 7 were recruited in the final meta-analysis. The other 4 investigations were excluded for these reasons: the median-centered across samples is not zero (not suitable for comparison) and data sets are containing null values. Remained 7 Microarray datasets were obtained from National Centers for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The entire study selection process for meta-analysis is shown in Fig 1. All GSE Series Matrix files, platform sets and annotations files were downloaded and parsed by GEOquery package in Bioconducter 3.2 with R version 3.2.2. Box-plots were drawn for each selected GSE microarray data for visual comparison of inter-group variability in a statistical population. These box plots were used to display the statistical distribution of different data values of mir-145 GSMs.

Data Preparation and Statistical Analysis
Before being able to do statistical analysis, we need to be prepared to meet the following requirements: raw data must be log2-scaled and all datasets must exhibit the same data precision. In our dataset except GSE58295 all of GSEs were in the form of (log) 10 (ratio), therefore, we converted them to (log) 2 (ratio) format. All GSEs have 16 bit precision. Since the comparable data are provided, the analysis should be limited to genes that are expressed in all data sets. In 16 GSMs (samples) 17085 common genes were founded. Then, we checked for the crossplatform bias (batch effect) by computing and plotting the Principal Components Analysis (PCA) for combined dataset. PCA plot (Fig 2, left panel) shows that sample of the each technology (GSE) clusters together. This means that we have batch effect in our data. We have removed batch effect by "removeBatchEffect" function in limma package. The right panel of Fig 2 shows PCA plot after removing the batch effect. Note that the expression values of combined data will change after removing batch effect. In the next step we try to design and make proper contrast matrix again by limma package. We have five groups (each disease) with eight replicates in group1, one replicate in group2, three replicates in group3, two replicates in group 4, and two replicates in group5. In order to test the interaction effect of the different disease, the meaningful contrast was considered to be group1-group2-group3 -group4 + control. Then, differentially expressed genes with P-value < 0.01 were selected as potential candidates in different cancer types.

Cluster Analysis based on Hierarchical Clustering
In this study, two-way Hierarchical Clustering, based on correlation distance and average linkage, was used for gene expression data analysis. In two-way hierarchical clustering, related experiments and genes with similar expression patterns were combined together and were connected by a series of branches to identify which genes are the most important for experiment clustering. These branches are added to the left side and the top of the heat map.

Visualization based on Heat Map
Graphical representation of the changes in genes behavior in cancer data can be represented by heat map. We used heat map to visualized expression level of 8 significantly deregulated genes via color codes. The significantly deregulated genes are shown in rows and the samples are shown in columns. Red, black, and green represent the levels of expression: downregulated, not changed, and upregulated, respectively.

Results
The aim of this study was to identify common target genes of tumor suppressor miR-145 via meta-analysis of microarray-based gene expression profiles in several human cancer types. In this study, we collected a total of 7 gene expression profile data set from previously published studies considering the inclusion criteria. We retrieved the GSE Series Matrix files, platform sets and annotations files from the Gene Expression Omnibus (GEO) including 16 array samples using GEOquery package in Bioconducter with R version. Then, did transformation on raw data, found common genes, and removed batch effects.
Details of the each individual microarray studies are summarized in Table 1. Data set 1 (GSE47657) is obtained from a microarray analysis, which was performed in human prostate cancer cell lines including PC3, DU145, and LNCaP cells, treated with miR-145 to investigate the differentially expressed genes using SurePrint G3 Human GE 8×60K Microarray (Agilent Technologies, Santa Clara, CA, USA) containing 62,976 probes.
In the data set 2 (GSE24782), The microarray analysis was generated from PC3 and DU145 human prostate cancer cell lines which were transfected with miR-145 were using Agilent-026652 Whole Human Genome Microarray 4x44K v2 with 44,495 probes.
Data set 3 (GSE58295) was generated from mir-145 transfected PC3 cells which were collected at 8, 16 and 24 hours after transfection along with untransfected control PC3 cells using Agilent-014850 Whole Human Genome Microarray 4x44K G4112F containing 45,015 probes.
Besides, data set 4 (GSE37119) is obtained from a microarray analysis which was performed in human head and neck squamous cell carcinoma cell lines HNSCC and IMC3 transfected with miRNA 145 utilizing Agilent-026652 Whole Human Genome Microarray 4x44K v2 to arrays spotted with 44,495 probes.
Data set 5 (GSE18625) microarray analysis was performed using DLD-1, colon carcinoma cell line, transfected with miR-145 and collected 24 hours after transfection. Gene expressions were profiled on Affymetrix Human Genome U133 Plus 2.0 Array containing 54675 probes.
Data set 6 (GSE19737) was generated from miR-145 or a negative control pre-miRNA transfected MDA-MB-231 cell line which is most typical cell line with highly metastatic features in breast cancer using Affymetrix Human Genome U133 Plus 2.0 Array with 54,675 probes. Data set 7 (GSE20028) identified miR-145 targets in squamous cell carcinoma. The aim of study was to explore of miR-145 target genes using Agilent-014850 Whole Human Genome Microarray 4x44K G4112F which includes 45,015 probes.
Lastly, data set 8 (GSE19717) gene expression profiles of bladder cancer cell line KK-47, and urinary bladder cancer cell line T24 were investigated upon miR-145 transfection using Agilent-014850 Whole Human Genome Microarray 4x44K G4112F with 45,015 probes.
In order to clarify whether the microarray data were comparable, we initially prepared the Box plots representation of median-centered gene expression as provided in Fig 3. It shows the common target gene expression levels for probe set over all arrays. According to the result all GSE datasets were centered on zero except from GSE18625 (colon cancer). Therefore, we excluded this data set before the statistical analyses. Expression data in all GSEs were converted from (log 10 ) to (log 2 ) to eliminate variability among the datasets and batch effect was removed. Then, by making proper contras matrix significantly differentially expressed genes were found. As a result of the meta-analysis, we found that UNG, FUCA2, DERA, GMFB, TF, and SNX24 are significantly downregulated, and MYL9 and TAGLN are significantly upregulated in all GSM data. As a result, we found eight common target genes of mir-145 that have similar behavior in different GEO datasets. A heat map representation of these genes is demonstrated in Fig 4. In silico analysis tools predicted these genes as potential targets of miR-145 (Table 2).
Biological process (Table 3), molecular function (Table 4), cellular component (Table 5), and KEGG pathways (Table 6) analysis of these potential targets of mir-145 through functional enrichments in PPI network, demonstrated that those genes are significantly involved in telomere maintenance, DNA binding and repair mechanisms. Besides, PPI network of commonly deregulated mir-145 targets and pathway analysis of MYL9, UNG, TAGLN, FUCA2, DERA, GMFB, TF, and SNX24 are represented in Figs 5 and 6, respectively.

Discussion
miRNAs are frequently located in the cancer-associated genomic regions or in fragile sites of the genome. In addition to in vitro and in vivo tools, bioinformatics approaches are of paramount importance to evaluate their roles in the pathogenesis of different types of cancer [30]. Expression of miR-145 has been commonly identified as downregulated in several human cancer types. Several studies suggested that low levels of miR-145 might contribute to  pathogenesis and progression of human tumors [31,32]. miR-145 is a well characterized tumor suppressor in human malignancies which targets various oncogenes in cancer cells. Functional analyses of target genes, which are repressed by mir-145, are crucial to explain their roles in cancer development.
The aim of the present study is to investigate the commonly targeted genes by mir-145 and relevant pathways through evaluating publicly available microarray datasets. We selected miR-145 in our meta-analysis due to its well-known function as a tumor suppressor and since its presence has been reported in a variety of cancers including prostate, esophageal, head and neck, breast, bladder cancer, and squamous cell carcinoma [33,34]. In this study, we extracted 7 gene expression microarray datasets from GEO database (1 of them in Affymetrix Array, others Agilent), which are generated from cell line samples. We identified six genes including UNG, FUCA2, DERA, GMFB, TF, and SNX24 as significantly downregulated and two genes including MYL9 and TAGLN as significantly upregulated upon mir-145 over-expression in distinct cancer types.
The tumorigenic potentials of those genes have not been studied extensively until now. Considering our results, we suggest them as important contributors in tumorigenesis, which should be demonstrated in in vitro and in vivo studies. UNG, FUCA2, DERA, GMFB, TF, and SNX24 as significantly downregulated upon mir-145 over-expression, are expected to have elevated expressions in tumor samples considering low levels of miR-145 in several cancer types. Among those genes, elevated expression of UNG, an essential enzyme for post-replicative repair of uracil in DNA [35], has been found to be associated with pemetrexed-resistance and present in cell lines derived from pemetrexed-resistant histologic subtypes [36]. Besides, high GMFB expression was related to poor disease-free survival and overall survival in patients with SOC (serous ovarian cancer) [37].
In order to minimize false positives, we used target prediction tools, including miRwalk, TargetScan, RNA22, and miRanda. The significantly downregulated genes that we found as a Common Mir-145 Target Genes in Different Cancer Types result of our meta-analysis were predicted to be targeted by mir-145 in at least one in-silico tools out of four. Interestingly, biological process, molecular function, cellular component, and KEGG pathways analysis of these potential targets of mir-145 through functional enrichments in PPI network showed that UNG, FUCA2, DERA, GMFB, TF, and SNX24 are significantly involved in telomere maintenance, DNA binding and repair mechanisms.
As a conclusion, our results pointed out the importance of mir-145 and its targets and suggested that they contribute to carcinogenesis in distinct tumor types.
To the best of our knowledge, this is the first study retrieving gene expression data from several cancer types and investigating the common targets of mir-145 to help enlightening the roles of mir-145 in cancer pathogenesis.