Predicting miRNA Targets by Integrating Gene Regulatory Knowledge with Expression Profiles

Motivation: microRNAs (miRNAs) play crucial roles in post-transcriptional gene regulation in both plants and mammals, and miRNA dysfunctions are often associated with tumorigenesis and development through their effects on target messenger RNAs (mRNAs). Identifying miRNA functions is critical for understanding cancer mechanisms and determining the efficacy of drugs. Computational methods that analyze high-throughput data offer great assistance in understanding the diverse and complex relationships between miRNAs and mRNAs. However, most existing methods do not fully utilise the available biological knowledge to reduce uncertainty in the modeling process. Therefore, it is desirable to develop a method that can seamlessly integrate existing biological knowledge and high-throughput data into the process of discovering miRNA regulation mechanisms.

Results: In this article we present an integrative framework, CIDER (Causal miRNA target Discovery with Expression profile and Regulatory knowledge), to predict miRNA targets. CIDER can utilise a variety of gene regulation knowledge, including transcriptional and post-transcriptional knowledge, and exploits gene expression data to discover miRNA-mRNA regulatory relationships. The benefits of our framework are demonstrated by both a simulation study and the analysis of the epithelial-to-mesenchymal transition (EMT) and breast cancer (BRCA) datasets. Our results reveal that even a limited amount of either Transcription Factor (TF)-miRNA or miRNA-mRNA regulatory knowledge improves the performance of miRNA target prediction, and that combining the two types of knowledge improves it further. Another useful property of the framework is that its performance increases monotonically as more regulatory knowledge is added.


Algorithm 1
We formally summarize the procedure of causal structure construction. Suppose that we are interested in the regulatory relationships among $p$ miRNAs and $q$ mRNAs, represented as $X = \{X_1, \dots, X_p, X_{p+1}, \dots, X_{p+q}\}$, where $X_1, \dots, X_p$ denote the miRNAs and $X_{p+1}, \dots, X_{p+q}$ denote the mRNAs (including TF-coding mRNAs). The inputs are the expression profile matrix $X_{s \times n}$ of $s$ samples for the $n = p + q$ miRNAs and mRNAs, and the prior knowledge matrix $M_{n \times n}$, in which an entry $m_{i,j} = 1$ indicates that the prior knowledge contains a regulatory relationship from the $i$th to the $j$th gene. Let $\mathrm{ci\_test}$ denote the conditional independence test procedure in the PC algorithm: $\mathrm{ci\_test}(i, j) = 0$ if the $i$th and $j$th variables are dependent given every conditioning set, and $\mathrm{ci\_test}(i, j) = 1$ if they are independent given some conditioning set $S$. The details of constructing the causal structure are given in Algorithm 1.
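This section does not state which conditional independence test CIDER uses. A common choice for continuous expression data in PC implementations is the Fisher z test on partial correlations, and the sketch below assumes that test. The function name `ci_test`, its extra arguments (the data matrix `X`, the conditioning set `S`, the significance level `alpha`) and the 0.01 default are illustrative assumptions; indices are 0-based. It follows the convention above: 1 means independent, 0 means dependent.

```python
import math

import numpy as np


def ci_test(X, i, j, S, alpha=0.01):
    """Hypothetical Fisher z test: return 1 if columns i and j of X look
    conditionally independent given the columns in S, otherwise 0."""
    idx = [i, j] + list(S)
    # Correlation matrix of X_i, X_j and the conditioning variables.
    corr = np.corrcoef(X[:, idx], rowvar=False)
    # Partial correlation of X_i and X_j given X_S from the precision matrix.
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    s = X.shape[0]  # number of samples
    z = math.sqrt(s - len(S) - 3) * math.atanh(r)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return 1 if p_value > alpha else 0
```

For a linear chain $X_0 \to X_1 \to X_2$, the test should report $X_0$ and $X_2$ as dependent marginally but independent given $X_1$, which is exactly the situation in which the PC algorithm deletes the edge between them.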

Algorithm 2
We summarize the details of Algorithm 2 in this section. Given the expression profile matrix $X$ and the causal structure $G$ constructed by Algorithm 1, our goal is to estimate a matrix $C$ in which each entry $C(i, j)$ is the causal effect of miRNA $i$ on mRNA $j$.
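A minimal sketch of the estimation loop in Algorithm 2, under stated assumptions. It does not enumerate the candidate DAGs $G_1, \dots, G_m$ itself; instead it takes, for each candidate DAG, a hypothetical dict mapping each mRNA index to its parent set $pa_j(G_t)$. We read $\beta_{ij \mid pa_j(G_t)}$ as the ordinary least-squares coefficient of $X_i$ when $X_j$ is regressed on $X_i$ together with $pa_j(G_t)$. That reading, the helper names `regression_effect` and `causal_effect_matrix`, and the 0-based indexing (miRNAs in columns 0 to p-1, mRNAs in columns p to p+q-1) are assumptions, not the authors' implementation.

```python
import numpy as np


def regression_effect(X, i, j, adjust):
    """OLS coefficient of column i when column j is regressed on column i
    plus the columns in `adjust` (an intercept is always included)."""
    cols = [i] + [k for k in sorted(adjust) if k not in (i, j)]
    design = np.column_stack([np.ones(X.shape[0]), X[:, cols]])
    coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
    return coef[1]  # coef[0] is the intercept, coef[1] belongs to X_i


def causal_effect_matrix(X, p, q, parents_per_dag):
    """parents_per_dag[t][j] is the (hypothetical) parent set of vertex j
    in candidate DAG t; missing entries are treated as empty sets."""
    n = p + q
    C = np.zeros((n, n))
    for i in range(p):                  # miRNAs
        for j in range(p, p + q):       # mRNAs
            thetas = [regression_effect(X, i, j, dag.get(j, set()))
                      for dag in parents_per_dag]
            # Conservative summary over DAGs: the smallest absolute effect.
            C[i, j] = min(abs(t) for t in thetas)
    return C
```

Taking the minimum absolute effect over all candidate DAGs, as Algorithm 2 does, gives a lower bound on the effect size that holds whichever DAG in the equivalence class is the true one.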

Additional results when utilizing transcriptional knowledge
To demonstrate the effectiveness of CIDER when utilizing transcriptional knowledge, Fig. 1 shows additional miRNA targets predicted by CIDER using expression profiles and TransmiR.

Algorithm 1 Construct the causal structure $G$
Require: Gene expression profile data $X_{s \times n}$, prior knowledge matrix $M_{n \times n}$.
Ensure: Constructed causal graph $G$.
    $G \leftarrow$ fully connected graph with $n$ vertices
    // Constructing the graph with prior knowledge
    Repeatedly apply the following rules to orient as many edges as possible
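The pseudocode above starts from a fully connected graph and uses the prior knowledge matrix, but the exact rules are not reproduced in this section. The sketch below shows one plausible way to combine the two in the PC skeleton phase, assuming (our assumption, not stated here) that an edge with $m_{i,j} = 1$ or $m_{j,i} = 1$ is never removed, and that it is oriented $i \to j$ when $m_{i,j} = 1$. The function `pc_skeleton_with_prior` is hypothetical; it accepts any `ci_test(i, j, S)` callable that returns 1 for independence, and it omits the orientation rules for the remaining edges.

```python
from itertools import combinations


def pc_skeleton_with_prior(n, prior, ci_test):
    """Hypothetical PC skeleton search that never deletes prior-knowledge edges.

    prior[i][j] == 1 means the knowledge base asserts i -> j.
    Returns (adjacency sets, directed prior edges, separating sets)."""
    adj = {i: set(range(n)) - {i} for i in range(n)}  # fully connected start
    sepset = {}
    level = 0  # size of the conditioning sets tested in this round
    while any(len(adj[i]) - 1 >= level for i in range(n)):
        for i in range(n):
            for j in sorted(adj[i]):  # iterate over a copy of i's neighbours
                if prior[i][j] == 1 or prior[j][i] == 1:
                    continue  # assumed: prior-knowledge edges are protected
                for S in combinations(sorted(adj[i] - {j}), level):
                    if ci_test(i, j, set(S)) == 1:
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[(i, j)] = sepset[(j, i)] = set(S)
                        break
        level += 1
    # Orient every surviving edge that the prior knowledge asserts as i -> j.
    directed = {(i, j) for i in range(n) for j in adj[i] if prior[i][j] == 1}
    return adj, directed, sepset
```

With this design the prior knowledge overrides the data for protected edges, so a noisy test cannot delete them, but any false positives in the knowledge base are carried into $G$.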

Additional experiments using post-transcriptional knowledge
In Table 1 we show additional results of CIDER using post-transcriptional knowledge and expression profiles. Specifically, we use the miRNA targets predicted by miRANDA [1] as regulatory knowledge, together with the expression profiles described in the main article. To validate the results, we used the same combined experimentally validated databases as in the main article.

Algorithm 2 Estimate the causal effects between miRNA $i$ and mRNA $j$
Require: Gene expression profile data $X_{s \times n}$, causal structure $G$.
Ensure: Causal effects matrix $C$, where $C(i, j)$ is the causal effect of miRNA $i$ on mRNA $j$.
    $C \leftarrow n \times n$ zero matrix
    Determine all possible causal DAGs $G_1, \dots, G_m$ by iterating directions over the undirected edges in $G$
    for $i = 1$ to $p$ do
        for $j = p + 1$ to $p + q$ do
            for $t = 1$ to $m$ do
                $\theta_{ijt} = \beta_{ij \mid pa_j(G_t)}$
            end for
            $C(i, j) = \min_{t \in \{1, \dots, m\}} |\theta_{ijt}|$
        end for
    end for
    return $C$
Similar to TargetScan, miRANDA uses sequence binding information to predict miRNA/mRNA binding sites, and the mRNAs carrying binding sites for a given miRNA are taken as its predicted targets.
The results show that CIDER is able to use miRANDA to improve prediction performance, even though the knowledge in miRANDA is also prone to false positives.
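As we read Algorithm 1, post-transcriptional knowledge such as the miRANDA predictions is encoded in the prior knowledge matrix $M_{n \times n}$, with miRNAs occupying the first $p$ indices and mRNAs the remaining $q$. The sketch below builds $M$ from a list of predicted (miRNA, mRNA) pairs; the function name `prior_matrix`, the pair-list input format and the placeholder gene names are hypothetical, and pairs involving genes absent from the expression data are simply dropped.

```python
import numpy as np


def prior_matrix(mirnas, mrnas, predicted_pairs):
    """Hypothetical helper: set M[i, j] = 1 for each predicted miRNA i -> mRNA j."""
    names = list(mirnas) + list(mrnas)  # miRNAs first, then mRNAs
    index = {name: k for k, name in enumerate(names)}
    M = np.zeros((len(names), len(names)), dtype=int)
    for mir, target in predicted_pairs:
        # Skip predictions for genes that are not in the expression data.
        if mir in index and target in index:
            M[index[mir], index[target]] = 1
    return M


M = prior_matrix(["miR-A"], ["geneX", "geneY"],
                 [("miR-A", "geneX"), ("miR-B", "geneY")])
```

Here only the pair (miR-A, geneX) is recorded, because miR-B is not among the profiled miRNAs.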

Pathway analysis for predicted miRNA targets
We show the results of pathway analysis for the miRNA targets predicted by CIDER when utilizing all three types of knowledge in Table 2.