Mirna Expression Profiles Identify Drivers in Colorectal and Pancreatic Cancers

Background and Aim Altered expression of microRNAs (miRNAs) hallmarks many cancer types. The study of the associations of miRNA expression profile and cancer phenotype could help identify the links between deregulation of miRNA expression and oncogenic pathways. Methods Expression profiling of 866 human miRNAs in 19 colorectal and 17 pancreatic cancers and in matched adjacent normal tissues was investigated. Classical paired t-test and random forest analyses were applied to identify miRNAs associated with tissue-specific tumors. Network analysis based on a computational approach to mine associations between cancer types and miRNAs was performed. Results The merge between the two statistical methods used to intersect the miRNAs differentially expressed in colon and pancreatic cancers allowed the identification of cancer-specific miRNA alterations. By miRNA-network analysis, tissue-specific patterns of miRNA deregulation were traced: the driving miRNAs were miR-195, miR-1280, miR-140-3p and miR-1246 in colorectal tumors, and miR-103, miR-23a and miR-15b in pancreatic cancers. Conclusion MiRNA expression profiles may identify cancer-specific signatures and potentially useful biomarkers for the diagnosis of tissue specific cancers. miRNA-network analysis help identify altered miRNA regulatory networks that could play a role in tumor pathogenesis.


Introduction
MicroRNAs (miRNAs) are small non-protein coding RNA molecules that regulate gene expression mainly at the level of protein synthesis [1]. They represent an evolutionary highly conserved system that controls crucial cellular processes, such as development, differentiation, proliferation, apoptosis, and metabolism [2]. Aberrant expression of miRNAs can arise from deletion, mutation, and methylation of miRNA-encoding genes, many of which located at genomic fragile sites or regions frequently deleted or amplified in cancer [3]. Based on these premises, they have been proven to interact with potential oncogenes or tumor suppressors [4]. Many miRNAs are expressed in a tissue-specific manner, with profiles differentially expressed in either normal and neoplastic tissues, and in tumors with distinct biological properties. In addition, some evidences indicate miRNA profiles to allow reliable identification of the cell-of-origin of tumors [5].
Classically, differential expression of miRNAs has been evaluated either as single contribution or as predictive signature [6,7]. However, much of scientific effort is spent currently to elucidate their functions: most miRNAs control at least one mRNA, and most mRNAs are controlled by more than one miRNA [8]. The complexity behind this sophisticated mechanism can be partly alleviated by the knowledge of the conservation levels of each miRNA. Indeed, high conservation across wide phylogenetic distances helps restricting the range of functions that a miRNA can play to regulate the control of species development and physiology.
In this study, we investigated expression patterns of miRNAs in colorectal (CRC) and pancreatic cancer (PC) with the intent to identify miRNA regulatory networks likely involved in oncogenic pathways by evaluating cancer-specific signatures through the analysis of the relationship between miRNA expression profile and cancer lineages.

Samples Selection and RNA Extraction
Primary tumor and neighbouring non-tumorous tissues were obtained from two training cohorts of 19 CRC patients and 17 PC patients, and from two validation cohorts of 14 CRC and 21 PC patients.. The study was performed with the approval of the Scientific and Ethic committees of the IRCCS ''Casa Sollievo della Sofferenza'' Institute, San Giovanni Rotondo, FG (Italy). The patients gave informed written consent according to the Italian law on privacy (providing for the protection of personal data), so individuals can not be identified from data or images included in this publication, and approved by the Scientific and Ethic committees of the IRCCS ''Casa Sollievo della Sofferenza'' Institute. All clinical investigations have been conducted according to the principles expressed in the Declaration of Helsinki.
Patient clinical data, tumor location, and staging are shown in Table S1. Tissue samples were flash frozen in liquid nitrogen and stored at 280uC until nucleic acids extraction. About 150-200 mg fresh frozen tissues were used to isolate total RNA by phenol extraction (TRIzol Reagent, Invitrogen Corporation, Carlsbad, CA, USA). RNA concentration and purity were controlled by Nano Drop Specthophotometer. The Agilent 2100 Bioanalyzer was used to measure the quantity, integrity and purity of small RNAs and total RNA, and only non degraded RNA characterized by an RNA integrity number .7 with no DNA contamination signs was processed.

MiRNA Microarrays
MiRNA expression profile was determined by using Gene-ChipH miRNA Array (www.affymetrix.com). This array contains 46,228 probes comprising 7,815 probe sets, including controls, and covers 71 organisms such as human, mouse, rat and dog. Content is derived from the Sanger miRBase miRNA database v. 11  All microarray data are MIAME compliant and the raw data has been deposited in ArrayExpress (accession number E-MTAB-752 for colon cancer miRNA expression profiles and E-MTAB-753 for pancreatic cancer miRNA expression profiles).

Quantitative estimation of miRNA by reverse transcription Real-Time PCR assay
Quantitative real-time polymerase chain reaction (qPCR) of miRNAs was performed using TaqMan MicroRNA Assay (Applied Biosystems, Foster City, CA, USA) with ABI-PRISM 7700 Sequence Detection System. A two-step protocol requires reverse transcription with a miRNA-specific primer, followed by a real-time PCR with TaqMan probes. The assays target only mature miRNAs, not their precursors. In brief, reverse transcriptase reactions contained 10 ng of RNA samples, 50 nM stemloop RT primer, 10X RT buffer, 0.25 mM each of dNTPs, 3.33 U/ml MultiScribe RT and 0.25 U/ml RNase inhibitor (all purchased from cDNA Archive kit of Applied Biosystems) were performed in final volume of 15 ml and incubated in termal cycler for 30 min at 16uC, 30 min at 42uC, 5 min at 85uC and then held at 4uC. The 20 ml PCR reaction mixture included 1.3 ml RT product (1:15 diluition), 10 ml of TaqMan 2X Universal PCR Master Mix (NoUmpErase UNG) and 1 ml of TaqMan 20X MicroRNA Assay (Applied Biosystems, Foster City, CA, USA). Reactions were incubated in a 96-well optical plate at 95uC for 10 min, followed by 40 cycles of 95uC for 15 sec and 60uC for 10 min. All assays were performed in triplicate.

Statistical Analysis
Patients' baseline characteristics were reported as mean 6 standard deviation (SD) or frequencies and percentages for continuous and categorical variables, respectively.
For each of the two training cohorts, microRNA chip data were normalized using Robust Multi-array Average (RMA) algorithm [9]. In order to identify differentially expressed miRNAs between paired normal and tumor tissues, two approaches were followed. Firstly, a classical paired t-test controlling for false discovery rate (fdr) allowed ranking miRNAs according to their p-values. Secondly, random forest (RF) analysis [10] was applied to detect miRNAs with the best capability in discriminating tumor from paired normal tissues. A RF is a classifier consisting of an ensemble of tree-structured classifiers. According to this technique, 100,000 trees were built to classify tissues. The learning set used to grow each tree was a .632+ bootstrap resample of the observations. Trees were allowed to grow to their full size without pruning. The best split at each node was selected from a random subset of miRNAs. The left-out observations (i.e., ''out of bag'' observations) were then predicted to obtain the classification error rate of the considered tree. Predictive ability of the classifier was assessed aggregating the single tree error rates. Furthermore, the random forest framework allowed us to estimate the importance of a variable by looking at how much the classification error increases when ''out of bag'' data for that variable are permuted while all others are left unchanged. The importance metric used was the Mean Decrease in Accuracy (MDA). The MDA is constructed by permuting the values of each variable of the test set, recording the prediction and comparing it with the un-permuted test set prediction of the variable. Therefore, it is the increase in the percentage of times a test set is misclassified when the variable is permuted. We followed Strobl et al. [11] to avoid possible bias in variable selection: individual classification trees were built using subsampling without replacement and adopting a conditional permutation scheme [12]. We obtained a miRNAs ranking in accordance to the variable importance measure. Finally, the two miRNA rankings, one from the classical analysis and one from RF analysis, were merged to obtain a list of differentially expressed miRNAs. Correlations between miRNA expression were estimated using Spearman coefficient. A p value,0.05 was considered for statistical significance. All analyses were performed using SAS Release 9.1 (SAS Institute, Cary, NC, USA) and R for random forest analyses (randomForest package).
Quantitative real-time PCR (qPCR) assay was used on external validation cohorts to confirm microRNA array results, which were assessed using the Comparative Threshold cycle (CT) method and default threshold settings. Relative expressions of miRNA were computed by 2 2DDCt formula [13] using U6 small nuclear RNA (RNU6B) as normalization control. Since the 2 2DDCt transformed values were not normally distributed, comparisons were made using the non-parametric Wilcoxon signed rank test in order to assess the statistical significance of the up or down regulation. For the qPCR validation analyses, we considered a sample size of 14 subjects to provide a 90% power (with a two-sided alpha = 0.05) to detect an up-regulation of at least 2 or a down-regulation of at most 0.5 of each miRNA's expression in terms of 2 2DDCt using the non-parametric Wilcoxon signed rank test and 1 as the null hypothesis.

Network analysis
For each of the two training cohorts an undirected and weighted network was built, where nodes and edges represent miRNAs and their correlations (when significant), respectively. Thus, edges are not oriented since correlation is symmetric, and weighted with the Spearman coefficient. The thickness of edges mirrors the magnitude of the weights. In order to dig critical nodes out of both CRC and PC networks, we borrowed some methods from the social sciences, and evaluated the following centrality measures: degree, betweenness and clustering coefficient [14,15] for each network. Finally, miRNAs have been ranked accordingly.
All these measures (or metrics) are based on enumeration of links or shortest paths. In details, let us define a path from s [ V to t [ V , with V the set of nodes, as an alternating sequence of nodes and edges, beginning with s and ending with t, such that each edge connects its preceding with its succeeding vertex. We set the length of a path as to be the sum of the inverse weights of its edges. The idea is that highly correlated miRNAs minimize the distance between nodes or, from another perspective, the closer two nodes are the more they are correlated. Therefore, we compute the distance between two nodes s and t, written d G (s,t), as the minimum length of any path connecting s and t in G. By Degree centrality is based on the idea that important nodes are those with the largest number of ties to other nodes in the graph. It is often interpreted in terms of the immediate involvement of nodes in relationships established through the network. Let w us be the weight of the edge, which connects the node u with the node s, the degree centrality of a node u is C d (u)~P u,s[E w us , with E the set of edges, according to which the higher is the degree value, the more important (globally correlated) is the node.
Betweenness centrality measures the influence a node has over the indirect correlation between not neighbor nodes. Betweenness, in its basic version, is computed as the fraction of shortest paths between node pairs that pass through the node of interest. Its mathematical where s s,t is the number of shortest paths from s to t, and s s,t (u) is the number of shortest paths from s to t that pass through a vertex u. Vertices that occur on many shortest paths between other vertices have higher betweenness, and then higher relevance in indirect correlations. The Clustering coefficient is a measure of degree to which nodes in a graph tend to cluster together, or in other terms, it quantifies nodes because of the extent to which their neighbors are to being a clique (complete graph) with it. The clustering coefficient of a node is then given by the proportion of links between the nodes within its neighbourhood divided by the number of links that could possibly exist between them. Succinctly, it is expressed where N u defines the immediately connected neighbours set of u and k u~Nu j j. Its 'weighted' formulation builds on the weights of the triangles centered on the nodes of a network. It is defined by Horvath and Zhang [16] as C HZ (u)~P i=u : where w ij is the weight over the edge e ij , and thus w ui w ij w ju the product of the weights of the edges which form a closed triangle (u?i?j?u) of nodes. Starting from the local definition of the clustering coefficient, we report its averaged formulation by Schank and Wagner [17] for the whole network as the ratio between the sum of the products of the clustering coefficients and a general weight function, and the weight function is the general weight function. The weight function is usually chosen among the metrics that better capture the topology of the network under examination. A lower Average Weighted Clustering Coefficient measure indicates a more important node in relationship to the network robustness.
Networks have been drawn and analyzed by a custom standalone tool written in C# and built over the library NodeXL 1.0.1.174 [18].

Altered miRNA expression in patients with CRC and PC
We compared miRNA profiles of 36 pairs of solid tumors and adjacent nontumorous tissues in the training cohorts (19 CRC and 17 PC) by means of microRNA microarrays. Human miRNAs (has-miR-) and tissues were grouped by a hierarchical clustering analysis ( Figure 1). Each plotted probe was color-coded to equate the level of expression of the miRNA relative to its median expression level across the entire tissue samples set (blue, low; red, high) ( Figure 1).

Differentially expressed miRNA in CRC and PC
In order to identify miRNAs differentially expressed in paired normal and tumor tissues, two approaches were followed: a classical paired t-test controlling for false discovery rate (fdr) and a RF analysis. By the first analysis, 42 miRNAs were differentially expressed in CRC (Table 1). Twenty-five of them were overexpressed, with hsa-miR1246 showing the highest fold-change value (12.0-fold). In PC, 128 miRNAs were differentially expressed (see Table S2). In particular, of 34 miRNAs with a p,0.01, 30 showed higher expression levels in tumor tissue, with hsa-miR-23a displaying the highest foldchange value (9.3-fold), whereas 4 more miRNAs were downregulated (Table 2).
To ascertain the eventual existence of orthologues in other species, and to use them as internal methodological control, the human miRNAs found altered in the two solid tumors were checked in 71 other organisms present in the array: all deregulated orthologues of human miRNAs (from now on, has-miRwill be referred to as miR-) showed a significant differential expression with similar fold-change in these organisms (Table  S3).
To check the accuracy of the RF, classification performances in discriminating tumor from normal tissue were reported in multidimensional scaling of the estimated proximity matrix plots ( Figure 2) displaying similarities or dissimilarities in data. For each tumor type, the ranks of the most important miRNAs, according to MDA, were derived (Figure 3, panels A and B).

Comparison of miRNA signature detection methods
When the results with the t-test and RF were merged, a significant overlap among miRNAs detected in either CRC and PC tissues was found. By intersecting the miRNAs with p,0.001 at t-test analysis (Table 1) with those with the highest MDA at RF analysis ( Figure 3A), the expression of 24 miRNAs was significantly altered in CRC (Figure 4). By a similar analytical approach (Table S2, and Figure 3B), 23 miRNAs were significantly altered in PC ( Figure 4).
As shown in Figure 4, miRNA expression was heavily tissue specific; indeed, most miRNAs with altered expression in CRC were not differentiallly expressed in pancreatic tumoral tissue. The altered expression of only two miRNA was shared by CRC and PC: miR-145 appeared 2.2-fold down-regulated (p,0.001) in CRC, and 5.7-fold up-regulated (p,0.01) in PC, while miR-1280 was 2.1-fold up-regulated (p,0.001) and 2.3-fold down-regulated (p,0.05) in the former and latter cancers, respectively (see Table 1, Table 2 and Table S2).

miRNA correlations and cancer-miR network
Next we tried to verify the inter-relationship of previously selected miRNAs expressed in either CRC and PC tissues by means of the Spearman correlation coefficient method.
For CRC, different degrees of correlation were found for the 24 miRNAs exhibiting altered expression: miR-106a was not correlated at all, while miR-195 exhibited the largest number of ties and was taken as a root node with 9 edges (degree 15.86); 8 links were detected for miR-28-3p (degree 15.66), and 7 links for miR-1280 (degree 13.79), miR-1246 (degree 13.05), and miR-140-3p (degree 12.12) ( Figure 5, panel A). The latter 3 nodes were also interconnected with miR-28-3p, which showed a high betweenness value. For this last measure, the highest value was observed for miR-1246 node on which many shortest paths between other vertices occur (Table S4). Another important node was represented by miR-18a with 6 relationships, 3 of them established with several miRNAs belonging to the miRNA 17-92a cluster. When we measured the degree of nodal clustering (clustering coefficient), the miR-378, miR-10b and miR-31 appeared to be a clique with vertice in miR-28-3p. To verify the robustness of the network, we calculated the average weighted clustering coefficient, and evaluated the contribution of a miRNA to the overall compactness of the network. The lowest scores were observed for miR-1280, miR-195 and miR-140-3p which were the vertices of complete graph (Figures 5, panel A and Table S4).
Twenty out of 23 miRNAs with a significant deregulation in PC tissues were linked to others vertices of the network by a number of edges varying from one to five (degree: from 8.88 to 1.92); for only 3 miRNAs (i.e. miR-30d*, miR-1280 and miR-493*) no significant correlation was observed (degree: 0), ( Figure 5, panel B). In particular, the maximum number of ties was found for miR-103 (degree: 8.88), miR-23a (degree: 8.84) and miR-15b (degree: 8.38). These 3 main nodes were all linked to miR-199a-3p (degree: 6.03) and these four miRNA appeared inter-crossed to constitute a fournode clique. The highest betweenness centrality measure was observed for miR-384 (betweenness: 0.31), found to be critically positioned in between (thus, responsible of many) medium-range correlations. We verified the essentiality of the four-node clique by measuring the overall robustness of the network (in terms of paths redundancy) after removing it. We computed the average weighted clustering coefficient for the whole network by removing in turn a node. In particular, after excluding the nodes of the clique (miR-103, miR-15b, miR-199a-3p and miR-23a) the lowest ranks, namely  5.78, 5.85, 6.05 and 6.14 were obtained, respectively. This finding pinpoints to these nodes as functionally relevant and as considerable constituent of the cohesiveness of the network ( Figure 5, panel B and in Table S4).

Validation qPCR
Among significantly deregulated miRNAs in the microarray study, 18 were selected for further validation by quantitative realtime PCR (qPCR). Selection of these miRNAs was based on two criteria: the outcome of the previously described intersection analysis, and/or their involvement in colon or pancreas tumorigenesis (according to literature data). Quantitative PCR was applied to analyse altered expression of the selected miRNAs, as well as that of the RNAU6 control, in the tumor and normal tissues taken from a new validation cohorts of 14 patients with CRC and 21 PC patients.

Discussion
MiRNAs that are differentially expressed in various cancers may participate in common altered regulatory pathways [19]. Thus, the knowledge of their interacting network can provide a perspective of altered miRNAs and their pattern of deregulation (i.e., up-or down-regulation). In this study, we profiled miRNAs in two different solid cancer tissues, the CRC and PC, to test whether any plausible ''signatures'' was detectable. To this purpose, we initially looked for miRNAs differentially expressed in tumors and normal tissue of the two different lineages, and subsequently identified ''tissue-specific'' miRNAs by using a novel statistical approach, i.e. the RF classifier. For each cancer tissue we were able to generate a list of the most specific and significantly deregulated miRNAs. Interestingly, in the two lists only two miRNAs recurred but with opposite patterns of expression: miR-145 and miR-1280. However, while different patterns of the expression of miR-145 are recognized [20], nothing is known about miR-1280 and its behavior in solid tumor.
A comprehensive analysis of interactive networks of these miRNAs showed that their deregulation patterns are tissue specific, as dissimilar connections (i.e., different correlation patterns) were observed between them in the two lineages. Moreover, we determined the most critical miRNAs and verified that they were different in CRC and in PC networks.
To understand the regulatory mechanisms of the miRNAs linked with significant connections in both networks, we conducted a three-step in-silico analysis. First, we ranked nodes by means of well-known centrality indices: weighted degree and betweenness. While the former index provided insights of the direct effect exerted by each node to its neighborhood (in our case, their correlation), the latter identified nodes essential for indirect Figure 5. CRC and PC network. Green points depict the nodes corresponding to 24 miRNAs of CRC network and 23 miRNAs of PC network. The edges characterize the significant correlations, whose thickness reflects Spearman Correlation Coefficients (blue and green edges are used for positive and negative correlations respectively). In CRC network, as shown in panel A, miR-195, miR-28-3p, miR-1280, miR-18a and miR-1246 exhibit the highest weighted degree rank. No correlation is found for miR-106a. MiR-1246 has also the highest betweenness rank. In PC network, as shown in panel B, miR-103, miR-23a and miR-15b have the highest degrees and are all linked to miR-199a-3p as well as to each other forming a four nodes clique. In addition, the removal of these nodes causes the deepest drop of the average clustering coefficient rank. MiR-384 exhibits the highest betweenness centrality measure. doi:10.1371/journal.pone.0033663.g005 correlations, namely nodes that lie in paths of correlation and that somehow contributes to long range correlations. Consequently, we ranked nodes with the clustering coefficient index, in order to measure their degree of cohesiveness with the immediate neighbors. The aim was to quantify the robustness of the networks, by looking for the nodes that would otherwise disrupt the networks architecture when not taken into account. To this purpose, we computed the cluster coefficient for the whole networks by cyclically removing one node with corresponding edges, and by employing the weighted degree as weight function. Generally, networks lacking nodes essential to the overall robustness exhibited lower scores. As last step, we validated our in-silico estimations by manually inspecting target genes and their signaling pathways. With these approaches, we observed that the majority of miRNAs with similar deregulation patterns were associated with common target genes and/or regulatory pathways (see Table S5). In particular, in CRC three principal nodes were found: miR-195, miR-18a, and miR-1246. The first node had a high relationship with miRNAs involved in MAPK signaling pathway, as shown in-silico analysis by using DIANA-mirPath (http://diana.cslab.ece.ntua.gr/pathways/ index_multiple.php), except miR-31 and miR-28-3p, which belong to the EGF pathway. The second node, the miR-18a, included several miRNAs (miR-17, miR-19b, miR-92a) belonging to the same cluster: the miRNA 17-92a. This cluster plays an important role in carcinogenesis and embryogenesis by regulating the proliferation process through E2F1 gene under-expression [21]. The other three correlations (miR-145, miR-21 and miR-182) were found associated with IGF1R [22], EGFR [23] and AKT [24] signaling pathway. The last relationship has miR-1246 as principal node. Recently, Zhang et al. [25] have shown that miR-1246 is a new target of p53 family members. TP53 induces the expression of miR-1246 which, in turn, reduces the level of DYRK1A, a Down syndrome-associated protein kinase. Overexpression of miR-1246 reduces DYRK1A levels and decreases the induction of apoptosis [25].
Focusing on the principal nodes detected in PC network, miR-103 was associated with alteration of TGF-b signaling pathway, and miR-23a with KRAS-mediated signaling pathway, while for miR-15b a number of biological relevant associations with cellular signaling pathways were observed [26]. On the other hand, the commonly correlated node miR-199a-3p, which form a complete graph with the three main vertices of the network, was associated with MAPK signaling pathway. It is to note that both TFG-b and KRAS pathways belong to the ''core signaling pathways'' and processes reported to be altered in many PC [26]. In addition, considering that a lower Average Weighted Clustering Coefficient measure indicates a more important node in relationship to the network robustness, the low value obtained for miR-103 underlines the central role for TGF-b signaling in PC, in agreement with previous observations [27]. Moreover, when we performed in-silico analysis of these four miRNAs inter-crossed to form a clique by using DIANA-mirPath, (http://diana.cslab.ece.ntua.gr/pathways/ index_multiple.php), a previously undescribed link with Focal Adhesion pathway was put in evidence.
The approaches used in this study allowed us to connect deregulation of miRNA expression to oncogenic pathways and identify links undescribed up until now between altered specific miRNA expression and pathways involved in organ-specific cancer: even though the oncogenic pathways identified in both cancers were similar, the miRNAs linked to them were specific for CRC or PC highlighting the organ-specificity of miRNA networks.
In conclusion, our observations highlight network connections between miRNAs in CRC and PC, and suggest that the regulatory miRNAs, as network nodes, might have combinatorial effects on driving specific cellular regulatory pathways involved in cancer development and progression. Data obtained in our study encourage to further investigate details of miRNA involvement in signaling pathways deregulated in CRC and PC, and to understand whether correlations between specific miRNAs might be in turn accompanied by connections between the respective influenced signaling pathways.