Fig 1.
The benefits of collapsing reads in short RNA-seq data.
Collapsing identical reads is advantageous for miRNAs because the species length (17-24 bp) is less than the sequencing read length (50 bp), so many reads from the same miRNA are identical. Collapsing is not advantageous for mRNAs or DNA.
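The collapsing step described in this caption can be sketched as follows. This is a minimal illustration, not miRge's own code: the function name `collapse_reads` and the toy sequences are assumptions made for the example, and adaptor trimming is assumed to have already happened.

```python
from collections import Counter


def collapse_reads(reads):
    """Return a Counter mapping each distinct read sequence to its copy number."""
    return Counter(reads)


# Toy input (illustrative only): three copies of one sequence, one of another.
toy_reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TAGCTTATCAGACTGATGTTGA",
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
]
collapsed = collapse_reads(toy_reads)

# Downstream alignment would now handle len(collapsed) unique sequences
# instead of len(toy_reads) individual reads.
for sequence, copies in collapsed.most_common():
    print(sequence, copies)
```

The saving comes from aligning each unique sequence once and carrying its count along, which only pays off when many reads are exact duplicates.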
Fig 2.
miRge: multi-sample quantitation of unique sequences followed by a single sequential annotation method for miRNA-seq analysis.
First, sequencing data undergo a quality control and length filtering step. Sequences are trimmed of adaptors (optional) and unique sequences are quantitated per sample. The unique sequences identified across all samples examined then undergo 5 separate alignment steps against 4 libraries using Bowtie. Only reads > 25 bp are aligned to the hairpin miRNAs. The resulting data are organized, and miRge outputs several files, including a final miRNA-oriented data table in both absolute counts and RPM.
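The conversion from absolute counts to RPM (reads per million) mentioned in this caption can be sketched as below. This is a hedged illustration: the function `to_rpm`, the miRNA labels, and the counts are hypothetical, and the choice of denominator (here the sum of the supplied counts unless `total_reads` is given) is an assumption, since the caption does not state which total miRge normalizes against.

```python
def to_rpm(counts, total_reads=None):
    """Scale raw counts to reads per million.

    counts: dict mapping a miRNA label to its absolute read count.
    total_reads: denominator to use; defaults to the sum of `counts`
                 (an assumption made for this sketch).
    """
    total = sum(counts.values()) if total_reads is None else total_reads
    if total <= 0:
        raise ValueError("total read count must be positive")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Hypothetical counts for one sample.
sample_counts = {"miR-a": 250, "miR-b": 750}
sample_rpm = to_rpm(sample_counts)
print(sample_rpm)
```

Reporting both tables is useful because absolute counts suit count-based statistical models, while RPM allows rough comparison between samples sequenced to different depths.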
Table 1.
Profiling and miRNA assignment across 5 methods in 3 separate samples.
Fig 3.
Comparisons across 8 methods of miRNA identification.
The miRQC sample A RNA-seq Illumina data set was analyzed by 7 methods and compared to the original data. For each method, a histogram is given of log2-normalized miRNA read counts for 333 shared miRNAs. A Pearson correlation was computed for each comparison, and a scatter plot with a loess curve is presented.
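The pairwise comparison described in this caption, a Pearson correlation on log2-transformed counts, can be sketched with the standard library alone. The helper names, the pseudocount of 1, and the toy counts are assumptions for illustration; the caption does not specify how zeros were handled or which software was used.

```python
import math


def log2_counts(counts, pseudocount=1.0):
    """Log2-transform counts; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy counts for the same four miRNAs as reported by two hypothetical methods.
method_a = [1, 3, 7, 15]
method_b = [3, 15, 63, 255]
r = pearson_r(log2_counts(method_a), log2_counts(method_b))
print(round(r, 3))
```

Working on the log scale keeps a handful of very abundant miRNAs from dominating the coefficient.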
Fig 4.
The spectrum of miRNA entropy.
Kernel density estimates of the distribution of normalized miRNA entropy in two sample sets. A) As embryonic stem cells (ESCs) differentiate towards retinal pigment epithelial cells (RPE), the distribution of miRNA entropy shifts towards more order (Spearman correlation coefficient 0.14, p < 0.001). B) No significant difference in the distribution of miRNA entropy is observed between normal pancreas and pancreatic adenocarcinoma (Kolmogorov-Smirnov test, p > 0.05).
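One common way to obtain a normalized entropy per miRNA is to compute the Shannon entropy of the read counts of its sequence variants and divide by the maximum possible value, giving a score between 0 (fully ordered) and 1 (maximally disordered). The sketch below assumes that definition; the caption does not give the formula used here, so the function `normalized_entropy`, the normalization by the log of the number of observed variants, and the toy counts should all be read as assumptions.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy of variant counts, scaled to [0, 1].

    counts: read counts of the distinct sequence variants of one miRNA.
    Returns 0.0 when fewer than two variants are observed (assumed convention).
    """
    positive = [c for c in counts if c > 0]
    if len(positive) < 2:
        return 0.0
    total = sum(positive)
    entropy = -sum((c / total) * math.log2(c / total) for c in positive)
    return entropy / math.log2(len(positive))


# Toy examples: evenly spread variants versus one dominant variant.
even = normalized_entropy([5, 5, 5, 5])
skewed = normalized_entropy([97, 1, 1, 1])
print(even, round(skewed, 3))
```

Under this definition, a shift "towards more order" corresponds to scores moving closer to 0, meaning reads concentrate on fewer variants.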
Table 2.
A comparison of common miRNA alignment methods.