Identification of Real MicroRNA Precursors with a Pseudo Structure Status Composition Approach

doi:10.1371/journal.pone.0121501

Fig 1.

An illustration of the biogenesis of miRNAs and a model of miRNA-mediated translational repression or mRNA degradation.

MiRNA genes are transcribed by RNA polymerase II [2,90], yielding primary transcripts termed pri-miRNAs. The pri-miRNAs are processed by the enzyme Drosha to release hairpin-shaped intermediates (pre-miRNAs) [3], which are typically 60–70 nucleotides long. The pre-miRNAs are then exported into the cytoplasm by Exportin V and its Ran-GTP cofactor [4–6], where they are cleaved by the enzyme Dicer to yield miRNA/miRNA* duplexes [7–11].

Fig 2.

Illustration to show the 6 structure statuses of paired nucleic acid residues.

Note that a paired nucleotide near the 5' end is distinguished from one near the 3' end: (a) the base pairs A-U and U-A have 2 hydrogen bonds; (b) the base pairs G-C and C-G have 3 hydrogen bonds; and (c) the wobble base pairs G-U and U-G have 2 weaker hydrogen bonds. See the main text for further explanation.
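As a rough sketch of how the structure statuses in this figure could be assigned in practice, the Python snippet below labels each nucleotide from a sequence and its dot-bracket string. The function names, and the convention that a paired nucleotide is labelled "own base-partner base" (so that the 5' and 3' members of one pair receive different statuses), are assumptions made for illustration and are not taken from the paper's implementation.

```python
# The six paired statuses from the figure, plus the four unpaired bases,
# give ten structure statuses in total.
PAIRED = {"A-U", "U-A", "G-C", "C-G", "G-U", "U-G"}
STATUSES = ["A", "C", "G", "U"] + sorted(PAIRED)


def pair_table(dot_bracket):
    """Return, for each position, the index of its partner (-1 if unpaired)."""
    partner = [-1] * len(dot_bracket)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected symbol %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return partner


def structure_statuses(seq, dot_bracket):
    """Label every nucleotide with one of the ten structure statuses.

    Assumed convention: a paired nucleotide is labelled with its own base
    first and its partner's base second.
    """
    seq = seq.upper().replace("T", "U")
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure differ in length")
    partner = pair_table(dot_bracket)
    statuses = []
    for i, base in enumerate(seq):
        if partner[i] < 0:
            statuses.append(base)
        else:
            label = base + "-" + seq[partner[i]]
            if label not in PAIRED:
                raise ValueError("non-canonical pair %s" % label)
            statuses.append(label)
    return statuses


example = structure_statuses("GGAAAAUC", "((....))")
```

For the toy hairpin above, the outer G near the 5' end is labelled G-C while its partner near the 3' end is labelled C-G, which is the distinction the caption describes.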

Fig 3.

A flowchart to show the process of generating the feature vector for an RNA sequence by its structure status composition.

Given an RNA sequence R (cf. Equation 2), its secondary structure sequence was derived with the Vienna RNA software package, as formulated in Equation 4. In that package, each nucleotide has one of two statuses: unpaired or paired. The former is denoted by a dot “.” and the latter by a bracket “(” or “)”. The left bracket “(” stands for a paired nucleotide near the 5'-end, and the right bracket “)” for one near the 3'-end. Since the structure status sequence thus obtained contains 10 different structure elements (cf. Equation 5), its n-tuple element composition contains 10ⁿ components (cf. Equation 6). For simplicity, only the case of n = 2 is shown here; i.e., the 2-tuple element composition, which contains 10² = 100 components formed by the different pairs of the most contiguous (i.e., adjacent) secondary structure status elements.
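The counting step of the flowchart can be illustrated with a short Python sketch. It takes an already-labelled structure status sequence and returns the 10ⁿ-component composition for a chosen n. The function name, the status labels, and the choice to normalize counts by the number of sliding windows are assumptions for illustration; the paper's Equation 6 should be consulted for the exact definition.

```python
from collections import Counter
from itertools import product

# Ten structure status elements: four unpaired bases and six paired statuses.
STATUSES = ["A", "C", "G", "U", "A-U", "U-A", "G-C", "C-G", "G-U", "U-G"]


def tuple_composition(statuses, n=2):
    """Return the n-tuple structure status composition as a dict.

    Keys are all 10**n possible n-tuples of statuses; values are the
    occurrence frequencies of each tuple among the overlapping windows
    of length n (normalization by window count is an assumption).
    """
    windows = len(statuses) - n + 1
    if windows < 1:
        raise ValueError("sequence is shorter than n")
    counts = Counter(tuple(statuses[i:i + n]) for i in range(windows))
    return {key: counts[key] / windows for key in product(STATUSES, repeat=n)}


# Toy status sequence for a short hairpin (hypothetical example data).
toy = ["G-C", "G-U", "A", "A", "A", "A", "U-G", "C-G"]
composition = tuple_composition(toy, n=2)
```

With n = 2 the dictionary has 100 entries, most of them zero for a short sequence; flattening the values in a fixed key order yields the local part of the feature vector.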

Fig 4.

A schematic illustration to show the correlation of structure statuses along an RNA sequence.

(a) The first-tier correlation reflects the structure-order mode between all the most contiguous nucleotides. (b) The second-tier correlation reflects the structure-order mode between all the second-most contiguous nucleotides. (c) The third-tier correlation reflects the structure-order mode between all the third-most contiguous nucleotides. In this way, the global or long-range sequence-order information of RNA can be approximately and indirectly incorporated into the current prediction model, as the PseAAC approach does for proteins [30].
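The tiered correlation can be sketched in Python by analogy with PseAAC: for tier j, a coupling value is averaged over all pairs of positions that are j apart. The squared-difference coupling function and the numeric score assigned to each status below are hypothetical placeholders (loosely motivated by the hydrogen-bond counts in Fig 2), not the values or the exact formula used in the paper.

```python
# Hypothetical per-status scores: 0 for unpaired nucleotides, and
# illustrative values for the paired statuses. These are assumptions.
SCORE = {
    "A": 0.0, "C": 0.0, "G": 0.0, "U": 0.0,
    "A-U": 2.0, "U-A": 2.0,
    "G-C": 3.0, "C-G": 3.0,
    "G-U": 1.0, "U-G": 1.0,
}


def tier_correlation(statuses, tier, score=SCORE):
    """Average squared score difference between positions `tier` apart.

    This follows the PseAAC-style pattern (assumed here):
    theta_j = 1/(L - j) * sum_i (score[S_i] - score[S_(i+j)])**2
    """
    length = len(statuses)
    if not 1 <= tier < length:
        raise ValueError("tier must satisfy 1 <= tier < sequence length")
    diffs = [
        (score[statuses[i]] - score[statuses[i + tier]]) ** 2
        for i in range(length - tier)
    ]
    return sum(diffs) / len(diffs)


def correlation_features(statuses, max_tier):
    """Return the correlation factors for tiers 1..max_tier as a list."""
    return [tier_correlation(statuses, j) for j in range(1, max_tier + 1)]


# Toy status sequence for a short hairpin (hypothetical example data).
toy = ["G-C", "G-U", "A", "A", "A", "A", "U-G", "C-G"]
features = correlation_features(toy, max_tier=3)
```

Each tier contributes one number, so choosing a maximum tier of λ adds λ components to the feature vector on top of the local composition.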

Table 1.

Comparison of different predictors by jackknife tests on the same benchmark dataset (S1 Dataset).

Fig 5.

A graphical illustration to show the performance of different methods by means of the receiver operating characteristic (ROC) curves.

The areas under the ROC curves (AUCs) are 0.93, 0.96, 0.90, and 0.94 for iMcRNA-PseSSC, iMcRNA-ExPseSSC, Triplet-SVM, and MiPred, respectively. See section “Comparison with Other Methods” for further explanation.

Fig 6.

Visualizing the discriminative power with a heat map.

(a) The discriminative power of the 100 local structure status compositions. The structure statuses marked on the vertical and horizontal axes indicate the first and the second structure status, respectively, in each local structure status composition. (b) The discriminative power of the 13 features incorporating the structure-order effect. The λ values are marked on the horizontal axis.

Fig 7.

A semi-screenshot to show the top page of the web-server iMcRNA.

The web-server is available at http://bioinformatics.hitsz.edu.cn/iMcRNA/.

Fig 8.

A semi-screenshot to show the output obtained by the web-server.

See the text for further explanation.
