miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs

doi:10.1371/journal.pcbi.1007309

Fig 1.

Outline of miRWoods Pipeline.

After alignment to the genome, overlapping reads are grouped into read stacks. Read stacks are scored using the Mature Product Random Forest (MPRF) to predict a set of putative mature microRNAs. Products that meet the minimum MPRF score threshold are combined with the surrounding region to form hairpins, and each hairpin is folded. Hairpins are then scored using the Hairpin Random Forest (HPRF), and those that meet the minimum HPRF score threshold form the final set of predictions.
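The read-stacking step described in this caption can be illustrated with a minimal sketch. It assumes reads are given as (chromosome, start, end) tuples with inclusive coordinates and ignores strand. The function name `group_read_stacks` and the input layout are hypothetical and are not taken from the miRWoods code.

```python
from typing import List, Tuple

# Hypothetical read representation: (chromosome, start, end), end inclusive.
Read = Tuple[str, int, int]


def group_read_stacks(reads: List[Read]) -> List[List[Read]]:
    """Group reads that overlap on the same chromosome into stacks.

    Assumption: strand is ignored and any positional overlap joins two reads.
    """
    stacks: List[List[Read]] = []
    current: List[Read] = []
    current_chrom = None
    current_end = -1

    # Sorting by (chromosome, start, end) lets one pass find all overlaps.
    for read in sorted(reads):
        chrom, start, end = read
        if current and chrom == current_chrom and start <= current_end:
            # The read overlaps the running stack, so extend the stack.
            current.append(read)
            current_end = max(current_end, end)
        else:
            # No overlap: close the previous stack and start a new one.
            if current:
                stacks.append(current)
            current = [read]
            current_chrom = chrom
            current_end = end

    if current:
        stacks.append(current)
    return stacks


example_reads = [
    ("chr1", 10, 30),
    ("chr1", 25, 45),
    ("chr1", 100, 120),
    ("chr2", 5, 25),
]
example_stacks = group_read_stacks(example_reads)
```

In this sketch the first two reads on chr1 form one stack, and the remaining two reads each form their own stack. Each stack would then be passed to a scoring step such as the MPRF.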

Table 1.

Features used in the mature products random forest (MPRF).

Table 2.

Features used in the hairpin products random forest (HPRF).

Fig 2.

Improved hairpin precursor span identification.

a miRWoods generates several potential hairpin precursor spans from each product that passes the MPRF. Duplex-focused spans take the region between the product and the optimal duplex, and product-focused spans take the region between the product and other products more than 4 nt away. Hairpins are selected based on HPRF score. b The miRBase annotation for hsa-mir-4721 crosses an intron boundary. miRWoods corrects the annotation by recognizing a second read stack and producing a precursor span that perfectly matches an intron, suggesting that mir-4721 is a mirtron. c The miRWoods prediction for mmu-let-7c-2 in mouse is consistent with the miRBase annotation, while the best miRDeep2 prediction, albeit below the default signal-to-noise threshold, only partially overlaps the miRBase annotation.

Table 3.

Percentage of predicted hairpin spans matching the miRBase annotation.

The method with the highest percentage for each sample is shown in bold.

Table 4.

Performance of miRWoods compared with miRDeep2 and miReap.

The method with the highest F-score for each sample is shown in bold.
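For readers unfamiliar with the metric, the following minimal sketch shows how an F-score can be computed from prediction counts. It assumes the balanced F1 form (the harmonic mean of precision and recall). The caption does not state which F-score variant was used or how true and false positives were counted, and the function name `f_score` is hypothetical.

```python
def f_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the F1 variant is used; the caption does not specify it.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not values from the paper.
example_f = f_score(true_pos=80, false_pos=20, false_neg=20)
```

With these illustrative counts, precision and recall are both 0.8, so the F-score is also 0.8.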

Fig 3.

Evaluation of miRWoods performance.

a Euler diagrams comparing predictions from miRWoods and miRDeep2 with annotations from miRBase for human MCF-7 cytoplasmic extract. b Scatterplot comparing the miRWoods decision value to the log fold change in Dicer knockdown cells relative to wild-type cells. c Scatter-boxplot comparing the log fold change between Dicer knockdown and wild type for unprocessed read regions, miRBase annotations, and predictions from miRWoods, miRDeep2, and miReap for MCF-7 (cytoplasmic fraction). Black dots indicate predictions that are unique to the given method. d Precision-recall (PR) curve and AUPRC of miRWoods predictions for human samples, including MCF-7 (total cell content), MCF-7 (cytoplasmic fraction), cell lines, and liver. e Euler diagrams comparing predictions from miRWoods and miRDeep2 with annotations from miRBase for human liver. f Precision-recall curve and AUPRC of miRWoods predictions for mouse tissues, including brain, embryo, newborn, testes, and ovary sets. g Euler diagrams comparing predictions from miRWoods and miRDeep2 with annotations from miRBase for mouse ovary.
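The log fold change plotted in panels b and c can be sketched as below. This is an illustration under stated assumptions: a base-2 logarithm and a pseudocount of 1 to avoid division by zero. The caption specifies neither the base nor any pseudocount, and the name `log_fold_change` is hypothetical.

```python
import math


def log_fold_change(knockdown_count: float, wild_type_count: float,
                    pseudocount: float = 1.0) -> float:
    """Log2 ratio of knockdown to wild-type expression.

    Assumptions: base-2 logarithm and a pseudocount of 1; neither is
    stated in the caption. Negative values mean lower expression in the
    knockdown, as expected for Dicer-dependent microRNAs.
    """
    ratio = (knockdown_count + pseudocount) / (wild_type_count + pseudocount)
    return math.log2(ratio)


# Illustrative read counts only, not values from the paper.
example_lfc = log_fold_change(knockdown_count=3, wild_type_count=15)
```

Here the ratio is 4/16, giving a log2 fold change of -2, which indicates reduced expression after knockdown.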

Fig 4.

miRWoods predictions in the feline genome.

a Euler diagram comparing the predictions from miRWoods with predictions from Sun et al. (2014) and Lagana et al. (2017). b Expression in skin and muscle for miR-133-Novel-3p. c Hairpin for the mir-133-Novel precursor. d Expression in skin and muscle for Novel110-3p. e Hairpin for the Novel110 precursor. f Scatterplot of average muscle expression vs. average skin expression for each mature microRNA.

Fig 5.

Novel let-7 microRNAs in the feline genome.

a RNA-seq of the cluster containing fca-let7-Novel2, fca-let7f, and fca-let7-Novel3 for each skin and muscle sample from Felis catus. b Hairpin structures for fca-let7-Novel2, c fca-let7-Novel3, and d fca-let7f. e Phylogenetic tree of let-7 miRs, including those previously found by Lagana et al. (2017).

Fig 6.

Novel microRNA predictions in the bovine genome.

a Euler diagram comparing miRWoods predictions in the cow genome with miRBase annotations. b Scatterplot and best-fit line comparing the normalized RT-qPCR expression and RNA-seq for the control miR bta-miR-7. c Scatterplot and best-fit line comparing the normalized RT-qPCR expression and RNA-seq for a novel predicted miR with enriched expression in brain stem. d Scatterplot and best-fit line comparing the normalized RT-qPCR expression and RNA-seq for a novel predicted miR with enriched expression in corium feet. e Heat map of RT-qPCR expression values over the tissues examined.

Fig 7.

mir-2284/mir-2285 family miRs in Bos taurus.

a Heat map of the expression of annotated and novel mir-2284/mir-2285 family miRs. b Phylogenetic tree for the bta-2284/bta-2285 family. Variants of bta-mir-2284 appear in red and variants of bta-mir-2285 appear in blue. Colors for novel predictions appear lighter than those for annotated miRs. c Abundance of miRs from the 5′ and 3′ arms of the mir-2284/mir-2285 family. The 5′ product tends to show greater expression in the mir-2284 family, while the 3′ product shows greater expression in the mir-2285 family.

Fig 8.

Novel microRNA families identified by miRWoods.

a hsa-novel-8 is a mirtron predicted for both MCF-7 sets, where expression was decreased in the Dicer knockdown sets. b hsa-Novel-185 is a mirtron predicted within the human cell lines set and the MCF-7 (cytoplasmic fraction) set. It also shows reduced expression in the Dicer knockdown version of the MCF-7 set. c Structure of hsa-novel-8. d Structure of hsa-Novel-185. e Phylogeny comparing the LAMA5 intron and CHD3 intron for several mammals. f Novel miR predicted in the bovine genome in an intron of TYK2. g Novel miR predicted in the feline genome in the same intron of TYK2. h Structure of the novel feline miR. i Structure of the novel bovine miR. Eight nucleotides were removed from the 5′ end, and two were added to the 3′ end, to match the feline hairpin precursor boundaries. j Phylogeny comparing the TYK2 intron in several mammals.
