CSmiRTar: Condition-Specific microRNA targets database

MicroRNAs (miRNAs) are functional RNA molecules that play important roles in post-transcriptional regulation. miRNAs regulate their target genes by repressing translation or inducing degradation of the target genes’ mRNAs. Many databases have been constructed to provide computationally predicted miRNA targets. However, they cannot provide the miRNA targets that are both expressed in a specific tissue and related to a specific disease. Moreover, they cannot provide both the common targets of multiple miRNAs and the common miRNAs of multiple genes. To solve these two problems, we constructed a database called CSmiRTar (Condition-Specific miRNA Targets). CSmiRTar collects computationally predicted targets of 2588 human miRNAs and 1945 mouse miRNAs from the four most widely used miRNA target prediction databases (miRDB, TargetScan, microRNA.org and DIANA-microT) and implements functional filters that allow users to search for (i) a miRNA’s targets expressed in a specific tissue and/or related to a specific disease, (ii) multiple miRNAs’ common targets expressed in a specific tissue and/or related to a specific disease, (iii) a gene’s miRNAs related to a specific disease, and (iv) multiple genes’ common miRNAs related to a specific disease. We believe that CSmiRTar will be a useful database for biologists studying the molecular mechanisms of post-transcriptional regulation in human or mouse. CSmiRTar is available at http://cosbi.ee.ncku.edu.tw/CSmiRTar/ or http://cosbi4.ee.ncku.edu.tw/CSmiRTar/.

Understanding miRNA-target interactions is a crucial step in discerning the roles of miRNAs in different biological processes [10]. Many databases have been constructed to provide miRNA target information. For example, TarBase [11] and miRTarBase [12] collect manually curated miRNA targets with experimental evidence from the literature, but they are far from complete. Other databases, such as TargetScan [10], miRDB [13], microRNA.org [14], DIANA-microT [15], miRecords [16], MAGIA [17], mirDIP [18], miRSystem [19] and miRGator [20], collect computationally predicted miRNA targets generated by various algorithms. However, these databases usually return thousands of predicted targets for a query miRNA, so researchers must spend extra effort to extract the targets of interest from a large number of irrelevant ones. Since miRNAs regulate their targets in specific tissues, cell types and disease states, it is advantageous to have a database that can return miRNA targets in a specific physiological condition. Three existing databases have attempted to meet this need. miTALOS [21] can provide miRNA targets of a specific tissue or cell line. miRWalk [22] can provide miRNA targets related to a specific OMIM disorder. starBase [23] can provide miRNA targets whose expression is anti-correlated with the miRNA's expression in specific cancer types. However, none of them can provide the miRNA targets that are both expressed in a specific tissue and related to a specific disease. Therefore, there is still a need for a database that implements both tissue and disease filters.
The complex circuitry of miRNA-mRNA interactions shows the dynamic regulation of gene expression. A recent study showed that overexpressed MRE (miRNA response element)-containing transcripts can soak up a miRNA and upregulate its target genes [24]. Moreover, competing endogenous RNAs (ceRNAs), transcripts that cross-regulate each other by competing for shared miRNAs, were proposed to describe a new layer of post-transcriptional regulation that links the functions of coding and non-coding RNAs [25]. Several studies have indicated deregulation of the ceRNA network in cancer development [26]. Because most existing databases do not provide the common miRNAs of a set of genes, they cannot be used to find the shared miRNAs of ceRNAs. Therefore, it is advantageous to have a database that provides researchers with the common miRNAs of multiple genes and the common targets of multiple miRNAs.
To meet these two needs, we developed a database called CSmiRTar (Condition-Specific miRNA Targets). CSmiRTar collects computationally predicted targets of 2588 human miRNAs and 1945 mouse miRNAs from the four most widely used miRNA target prediction databases (miRDB, TargetScan, microRNA.org and DIANA-microT). CSmiRTar implements (i) a tissue filter for users to search the miRNA targets expressed in a specific tissue, (ii) a disease filter for users to search the miRNA targets related to a specific disease, and (iii) a database filter for users to search the miRNA targets supported by multiple existing miRNA target prediction databases. Moreover, CSmiRTar allows users to search the common targets of a set of input miRNAs under a specific physiological condition and the common miRNAs of a set of input genes under a specific physiological condition. We believe that CSmiRTar will be a useful database for biologists studying the molecular mechanisms of post-transcriptional regulation in human or mouse.

Construction and contents

Data collection and processing
Five data sources were used to construct CSmiRTar. First, the experimentally validated human and mouse miRNA targets were retrieved from miRTarBase [12], which manually collects miRNA-target interactions with experimental evidence from the literature. Second, the computationally predicted human and mouse miRNA targets were retrieved from the four most widely used miRNA target prediction databases (TargetScan [10], miRDB [13], microRNA.org [14] and DIANA-microT [15]). The miRNA-target interactions in these four databases were predicted by the TargetScan algorithm, the MirTarget algorithm, the miRanda algorithm and the DIANA microT-CDS algorithm, respectively. Since miRNA-target interactions collected from different databases may use different identifiers (IDs), ID conversion is needed to integrate the data. In CSmiRTar, we used the miRBase ID as the miRNA identifier and the NCBI gene ID as the gene identifier. That is, all miRNA-target interactions were recorded as miRBase ID-NCBI gene ID pairs in CSmiRTar. Third, the tissues in which a human or a mouse gene is expressed were retrieved from Expression Atlas [27]. Expression Atlas, maintained by EMBL-EBI, provides the genes expressed in a specific tissue by analysing microarray and RNA-seq data in ArrayExpress [28]. Fourth, the diseases to which a human gene is related were retrieved from DisGeNET [29], which manually collects gene-disease associations from the literature and other expert-curated databases. Fifth, the diseases to which a human miRNA is related were retrieved from PhenomiR [30], which manually collects miRNA-disease associations from the literature. The statistics of CSmiRTar can be found in Table 1. The collected dataset is already very large. On average, a human gene has 572 predicted miRNAs and a mouse gene has 231 predicted miRNAs. Therefore, biologists already have difficulty identifying the functional miRNAs (among so many predicted miRNAs) for a gene of interest.
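The ID conversion and integration step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the lookup tables `MIRNA_ALIASES` and `GENE_SYMBOL_TO_NCBI`, the function name `integrate`, and the toy records are hypothetical, and the IDs shown are illustrative values that should be checked against miRBase and NCBI Gene.

```python
from collections import defaultdict

# Hypothetical lookup tables (assumption): a real pipeline would build these
# from miRBase and NCBI Gene. The IDs below are illustrative only.
MIRNA_ALIASES = {
    "miR-16": "hsa-miR-16-5p",
    "hsa-miR-16-5p": "hsa-miR-16-5p",
}
GENE_SYMBOL_TO_NCBI = {"PPM1D": 8493, "MECP2": 4204}


def integrate(records):
    """Merge (source, miRNA name, gene symbol) records into ID pairs.

    Returns a dict mapping (miRBase ID, NCBI gene ID) to the set of source
    databases that predict the pair. Records whose names cannot be
    converted are skipped.
    """
    support = defaultdict(set)
    for source, mirna_name, gene_symbol in records:
        mirna_id = MIRNA_ALIASES.get(mirna_name)
        gene_id = GENE_SYMBOL_TO_NCBI.get(gene_symbol)
        if mirna_id is None or gene_id is None:
            continue  # unknown identifier: cannot be integrated
        support[(mirna_id, gene_id)].add(source)
    return dict(support)


# Toy records (not real database content).
records = [
    ("TargetScan", "hsa-miR-16-5p", "PPM1D"),
    ("miRDB", "miR-16", "PPM1D"),
    ("microRNA.org", "hsa-miR-16-5p", "UNKNOWN_GENE"),
]
integrated = integrate(records)
```

Keying every interaction by the same pair of identifiers is what makes it possible to count, for each miRNA-target pair, how many prediction databases support it.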

Implementation of CSmiRTar website
CSmiRTar was built using the scripting language PHP and the CodeIgniter framework. A crawler was used to retrieve raw data from other databases and Python was used to process the raw data. The processed data were stored in MySQL. The interactive bar chart was generated by Highcharts.

Database interface
CSmiRTar provides both a search mode and a browse mode. In the search mode, users have four possible ways to search CSmiRTar. First, users can input a miRNA and search its targets which are (i) expressed in a specific tissue, (ii) related to a specific disease, and/or (iii) supported by multiple existing miRNA target prediction databases. After submission, users will see the search results sorted by the number of supporting databases or the average normalized score (see Fig 1). Second, users can input a set of miRNAs and search their common targets which are (i) expressed in a specific tissue, (ii) related to a specific disease, and/or (iii) supported by multiple existing miRNA target prediction databases. After submission, users will see the search results sorted by the number of supporting databases or the mean average normalized score (see Fig 2). Third, users can input a gene and search its miRNAs which are (i) related to a specific disease and/or (ii) supported by multiple existing miRNA target prediction databases. After submission, users will see the search results sorted by the number of supporting databases or the average normalized score (see Fig 3). Fourth, users can input a set of genes and search their common miRNAs which are (i) related to a specific disease and/or (ii) supported by multiple existing miRNA target prediction databases. After submission, users will see the search results sorted by the number of supporting databases or the mean average normalized score (see Fig 4).
In the browse mode, users have two possible ways to browse CSmiRTar. First, users can click on a human/mouse miRNA name and get the miRNA's targets supported by one or multiple existing miRNA target prediction databases (see Fig 5). Second, users can click on a human/mouse gene name and get the miRNAs that regulate the gene and are supported by one or multiple existing miRNA target prediction databases (see Fig 6).
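The ranking of common targets in the search mode can be sketched as follows. The text does not state how the mean average normalized score is computed, so this sketch assumes that each miRNA-target pair already has an average normalized score and that the mean average normalized score is the arithmetic mean of those values over the input miRNAs. The function name `rank_by_mans`, the gene names and the scores are hypothetical.

```python
from statistics import mean


def rank_by_mans(ans_by_gene):
    """Rank common targets by an assumed mean average normalized score.

    ans_by_gene maps a gene to the list of its average normalized scores,
    one per input miRNA. The score used for ranking is the arithmetic mean
    of that list (an assumption, not the documented CSmiRTar formula).
    """
    scored = [(gene, mean(values)) for gene, values in ans_by_gene.items()]
    # Highest score first, as in a result table sorted in descending order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy scores for two input miRNAs (not real CSmiRTar values).
toy_scores = {
    "GENE_A": [0.5, 0.75],
    "GENE_B": [1.0, 0.5],
    "GENE_C": [0.25, 0.25],
}
ranked = rank_by_mans(toy_scores)
```

Sorting by the number of supporting databases instead would only require swapping the sort key, provided that count is stored alongside each gene.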
Our database/tissue/disease filters can significantly reduce the number of predicted miRNA targets but still keep the functional ones
Identifying the functional targets is a crucial step to dissect the function of miRNAs. Using existing miRNA target prediction databases usually returns thousands of predicted targets per miRNA. Therefore, it is very hard for researchers to choose the biologically plausible candidates for further experimental validation. Besides, it can be expected that many of the predicted targets are non-functional since miRNAs only regulate their target genes in specific tissues, cell types and disease states.
To solve this problem, we implemented three kinds of filters (a database filter, a tissue filter and a disease filter) to efficiently reduce the number of predicted miRNA targets while still keeping the functional ones. To show the effectiveness of our filters, we prepared a benchmark set by randomly selecting several experimentally validated miRNA-target pairs in specific tissues and cancers from OncomiRDB [31]. As shown in Table 2 for 10 case studies, our filters can reduce the number of predicted miRNA targets by more than 90% while still keeping the experimentally validated miRNA targets. For example, human miR-16-5p is known to regulate the gene PPM1D in breast cancer cells [32]. Even when considering only the common predicted targets from the four existing miRNA target prediction databases (i.e. setting the database filter equal to four), PPM1D is still hidden among 883 predicted targets of miR-16-5p, suggesting that applying the database filter alone is not an efficient way to reduce the non-functional miRNA targets. If we further apply the tissue filter (selecting breast) and the disease filter (selecting invasive breast cancer), PPM1D is hidden among only 16 predicted targets. Researchers then have a high chance of picking out the functional targets (e.g. PPM1D) of miR-16-5p for further experimental investigation. In contrast, when using existing miRNA target prediction databases (e.g. miRecords [16], miRWalk [22], miRSystem [19] and starBase [23]), researchers will have difficulty picking out PPM1D among hundreds or even thousands of predicted targets of miR-16-5p (see Table 3).
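The way the three filters narrow a target list can be sketched in Python. This is an illustration under stated assumptions, not the CSmiRTar implementation: the `PredictedTarget` class, the `apply_filters` function and all genes other than PPM1D are hypothetical, and the toy annotations do not reproduce the counts reported in Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictedTarget:
    gene: str
    databases: frozenset  # prediction databases supporting the pair
    tissues: frozenset    # tissues in which the gene is expressed
    diseases: frozenset   # diseases to which the gene is related


def apply_filters(targets, min_databases=1, tissue=None, disease=None):
    """Keep targets that pass the database, tissue and disease filters.

    A filter set to None is not applied.
    """
    kept = []
    for target in targets:
        if len(target.databases) < min_databases:
            continue
        if tissue is not None and tissue not in target.tissues:
            continue
        if disease is not None and disease not in target.diseases:
            continue
        kept.append(target)
    return kept


ALL_FOUR = frozenset({"miRDB", "TargetScan", "microRNA.org", "DIANA-microT"})
BREAST_CANCER = "invasive breast cancer"

# Toy annotations (hypothetical, for illustration only).
toy_targets = [
    PredictedTarget("PPM1D", ALL_FOUR, frozenset({"breast"}),
                    frozenset({BREAST_CANCER})),
    PredictedTarget("GENE_A", ALL_FOUR, frozenset({"breast"}), frozenset()),
    PredictedTarget("GENE_B", frozenset({"miRDB", "TargetScan"}),
                    frozenset({"breast"}), frozenset({BREAST_CANCER})),
    PredictedTarget("GENE_C", ALL_FOUR, frozenset({"liver"}),
                    frozenset({BREAST_CANCER})),
]

database_only = apply_filters(toy_targets, min_databases=4)
all_filters = apply_filters(toy_targets, min_databases=4,
                            tissue="breast", disease=BREAST_CANCER)
```

In this toy set the database filter alone keeps three genes, while adding the tissue and disease filters leaves only PPM1D, mirroring the kind of reduction described for miR-16-5p.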
Our database/disease filters can significantly reduce the number of predicted miRNAs of a gene but still keep the functional ones
As shown in Table 4 for 10 case studies, our filters can reduce the number of predicted miRNAs of a gene by 63% to 95% while still keeping the experimentally validated miRNAs which really regulate the gene. For example, human gene MECP2 is known to be regulated by miR-212-3p in gastric cancer cells [33]. Even when considering only the common predicted miRNAs from three existing miRNA target prediction databases (i.e. setting the database filter equal to three), miR-212-3p is still hidden among 537 predicted miRNAs of the gene MECP2, suggesting that applying the database filter alone is not an efficient way to reduce the non-functional miRNAs of MECP2. If we further apply the disease filter (selecting gastric cancer in the stomach), miR-212-3p is hidden among only 22 predicted miRNAs of MECP2. Researchers then have a high chance of picking out the functional miRNAs (e.g. miR-212-3p) of MECP2 for further experimental investigation.
[Figure caption, beginning missing] ... returns 117 common target genes sorted by the mean average normalized score (MANS). (c) When clicking on a gene name in the "Common Target Gene" column, it opens a webpage showing the basic information of this gene, the tissues in which this gene is expressed and the diseases to which this gene is related. (d) An orange bar means that the miRNA-target pair has been experimentally validated. When clicking on the orange bar, it links to miRTarBase to show the experimental evidence of the selected miRNA-target pair. (e) When clicking on a score in the "MANS" column, it opens a webpage showing how the MANS is calculated.

Identifying the shared miRNAs of ceRNAs
An important step in reconstructing the ceRNA network is to identify the shared miRNAs of ceRNAs. CSmiRTar allows users to input a set of genes (e.g. ceRNAs) to search the shared miRNAs which regulate these genes. As shown in Table 5 for five case studies, CSmiRTar can identify the experimentally validated shared miRNAs of ceRNAs. For example, it is known that the human ceRNAs PTEN, VAPA and CNOT6L are all regulated by miR-17-5p, miR-19a-3p, miR-20a-5p and miR-106b-5p in human prostate cancer cells [34]. By considering the common predicted miRNAs from three existing databases (i.e. setting the database filter equal to three) and applying the disease filter (selecting prostate cancer), CSmiRTar returns 13 predicted shared miRNAs, which contain all four experimentally validated shared miRNAs of the input ceRNAs.
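Finding the shared miRNAs of a set of genes amounts to intersecting the per-gene sets of predicted miRNAs. The sketch below shows that idea in Python; the function name `common_mirnas` and the toy predictions are hypothetical (the `hsa-miR-EXAMPLE-*` entries are invented placeholders), so the output does not reproduce the 13 miRNAs reported above.

```python
def common_mirnas(gene_to_mirnas, genes):
    """Return the miRNAs predicted to regulate every gene in `genes`.

    gene_to_mirnas maps a gene symbol to its set of predicted miRNAs,
    assumed to be already restricted by the database and disease filters.
    """
    mirna_sets = [gene_to_mirnas.get(gene, set()) for gene in genes]
    if not mirna_sets:
        return set()
    return set.intersection(*mirna_sets)


# The four miRNAs reported as shared regulators of PTEN, VAPA and CNOT6L.
VALIDATED = {"hsa-miR-17-5p", "hsa-miR-19a-3p",
             "hsa-miR-20a-5p", "hsa-miR-106b-5p"}

# Toy predictions (hypothetical): validated miRNAs plus invented placeholders.
toy_predictions = {
    "PTEN": VALIDATED | {"hsa-miR-EXAMPLE-1", "hsa-miR-EXAMPLE-2"},
    "VAPA": VALIDATED | {"hsa-miR-EXAMPLE-1"},
    "CNOT6L": VALIDATED | {"hsa-miR-EXAMPLE-1", "hsa-miR-EXAMPLE-3"},
}
shared = common_mirnas(toy_predictions, ["PTEN", "VAPA", "CNOT6L"])
```

The same intersection, applied to per-miRNA sets of predicted target genes, gives the common targets of a set of input miRNAs.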

Identifying the common target genes of a set of miRNAs
In CSmiRTar, users can input a set of miRNAs to search their common target genes. As shown in Table 6 for five case studies, CSmiRTar can successfully identify the experimentally validated common target genes of multiple miRNAs. For example, it is known that human miR-186-5p, miR-216b-5p, miR-337-3p, and miR-760 cooperatively induce cellular senescence by targeting the gene CSNK2A1 in human colorectal cancer cells [35]. By considering the predicted target genes supported by three existing miRNA target prediction databases (i.e. setting the database filter equal to three) and applying the tissue/disease filter (selecting colon/colorectal carcinoma), CSmiRTar returns 23 predicted common target genes which contain the experimentally validated common target gene CSNK2A1 of the input set of miRNAs.

Conclusions
In this article, we present CSmiRTar, which provides computationally predicted targets of 2588 human miRNAs and 1945 mouse miRNAs. CSmiRTar implements (i) a tissue filter for users to search the miRNA targets expressed in a specific tissue, (ii) a disease filter for users to search the miRNA targets related to a specific disease, and (iii) a database filter for users to search the predicted miRNA targets supported by multiple existing databases. Moreover, CSmiRTar allows users to search the common targets of a set of input miRNAs under a specific physiological condition and the common miRNAs of a set of input genes under a specific physiological condition. We provide many case studies to show the effectiveness of our filters in reducing the number of predicted miRNA targets while still keeping the functional ones. However, users should note that some functional miRNA targets may not be kept when applying both the tissue and disease filters if they are not expressed in normal tissues but are abnormally expressed in disease states. Nevertheless, we believe that CSmiRTar will be a useful database for biologists studying the molecular mechanisms of post-transcriptional regulation in human and mouse.