Figure 1.
Conceptual Model of Intragenomic Matching
mRNA sequences are matched against the genome and matches are prefiltered. Matches with miRNA precursor potential are selected for further processing.
Figure 2.
Overview of the Number of miRNA Candidates at Successive Steps of the Procedure
A genome assembly and a set of annotated mRNA transcripts are input to the intragenomic matching.
Intragenomic matching. This step generates “micromatches”, each pairing a genome segment with an mRNA segment. Also shown is the recovery of miRBase 8.2 loci and families.
miSVM. The numbers of miRNA loci and families remaining after miSVM classification are shown in green. The numbers of miRNA candidate loci and families not overlapping repeat/CDS regions are shown in blue.
miHomology. Conservation filters were applied to detect the subset of miRNA candidates that have at least one homolog in one of the other two organisms.
miSquare. Conserved miRNA candidates that also satisfy the additional requirement of orthologous targets.
Figure 3.
The Structural Feature Space of miSVM
Distribution of structural features in the positive (blue) and negative (red) examples used to train miSVM. Arrows illustrate the feature on an example miRNA precursor, with the mature miRNA sequence highlighted in red.
Figure 4.
Density of the miSVM score for positive (blue) and negative (red) examples.
Table 1.
Performance of miSVM Estimated from Cross-Validation
Figure 5.
The Principle of the miSquare Conservation Criteria
When two orthologous miRNAs have at least one instance of orthologous targets in the two organisms, we call this a miSquare.
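The miSquare criterion can be illustrated with a minimal sketch. It assumes the two miRNAs have already been established as orthologous, and that target ortholog relationships are available as a set of gene-identifier pairs. The function name, data layout, and gene identifiers are hypothetical and are not taken from the original method.

```python
from itertools import product


def is_misquare(targets_a, targets_b, target_orthologs):
    """Return True if at least one target of miRNA A (organism 1) is
    orthologous to at least one target of miRNA B (organism 2).

    Assumption: miRNA A and miRNA B are themselves already known to be
    orthologous; only the target side of the "square" is checked here.
    """
    return any(
        (target_a, target_b) in target_orthologs
        for target_a, target_b in product(targets_a, targets_b)
    )


# Hypothetical identifiers, for illustration only.
target_orthologs = {("geneA1", "geneB1"), ("geneA2", "geneB7")}
targets_in_organism_1 = {"geneA1", "geneA9"}
targets_in_organism_2 = {"geneB1", "geneB5"}

print(is_misquare(targets_in_organism_1, targets_in_organism_2, target_orthologs))
```

A single orthologous target pair is enough to close the square, which is why the check stops at the first hit.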
Figure 6.
Distribution of Family Sizes and Target Numbers
miRNA candidates outside coding sequences and repeat regions are counted and density plots constructed.
Top row: Distribution of the number of targets per miRNA family.
Bottom row: Distributions of family sizes. The conserved candidates generally have larger family sizes.
Figure 7.
Conservation of miRNA Candidates and miRBase miRNAs
(A) Species conservation (miHomology) of all candidate miRNA families predicted with miSVM and not overlapping repeat or coding sequence. The Venn diagram shows the number of families that are species specific and those that are conserved within another species (see Materials and Methods).
(B) Species conservation (miHomology) of only miRBase (version 8.2) miRNA families (repeat/CDS overlapping families). We only include miRBase miRNAs that can be mapped exactly to the genome according to the reported precursor sequence and where we can predict at least one target.
Table 2.
Novel miSquare Conserved miRNA Candidates
Figure 8.
Distribution of miRNAs in the Genomic Landscape
A histogram for each of the three organisms showing the genomic origin of the miRNAs. The first histogram group in each plot shows the relative abundance of coding (CDS), untranslated (UTR), intron, repeat, and intergenic (IGR) regions in the genome. The second histogram group shows the relative abundance of miRBase miRNAs among these regions, with different colors for sense and antisense overlap. The last three histogram groups capture the same measurements for predicted miSVM, miHomology, and miSquare miRNAs. Novel predicted miRNAs (not found in miRBase) in these groups are illustrated with darker colors, whereas miRBase miRNAs found among our candidates have lighter colors (see legend).
Figure 9.
miRNA Candidates Targeting TFs in Arabidopsis
Enrichment of Arabidopsis TF targets in different sets of miRNAs, comparing the relative abundance of TFs among the miRNA targets with the relative abundance of TFs in the Arabidopsis genome (∼5.9%). For the nonfiltered miRNA sets (red), the numbers of TF targets among all targets are miRBase, 59 of 440; miSVM, 87 of 782; miHomology, 60 of 429; and miSquare, 59 of 408. For the repeat/CDS filtered miRNA sets (green), the numbers are miRBase, 42 of 133; miSVM, 73 of 442; miHomology, 43 of 116; and miSquare, 42 of 103.
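The counts in this caption can be converted into TF fractions and compared with the genome-wide background. The sketch below uses only the counts quoted above and the approximate 5.9% background. Expressing the enrichment as a simple ratio is an illustrative assumption, not necessarily the measure used in the figure.

```python
GENOME_TF_FRACTION = 0.059  # approximate background quoted in the caption

# (TF targets, all targets) per miRNA set, as quoted in the caption.
counts = {
    "nonfiltered": {
        "miRBase": (59, 440),
        "miSVM": (87, 782),
        "miHomology": (60, 429),
        "miSquare": (59, 408),
    },
    "filtered": {
        "miRBase": (42, 133),
        "miSVM": (73, 442),
        "miHomology": (43, 116),
        "miSquare": (42, 103),
    },
}


def tf_enrichment(tf_targets, all_targets, background=GENOME_TF_FRACTION):
    """Return (fraction of targets that are TFs, ratio to the background)."""
    fraction = tf_targets / all_targets
    return fraction, fraction / background


for group, sets in counts.items():
    for name, (tf_targets, all_targets) in sets.items():
        fraction, ratio = tf_enrichment(tf_targets, all_targets)
        print(f"{group:12s} {name:11s} {fraction:6.1%}  ratio={ratio:.1f}")
```

Because the background is only given approximately, the ratios are rough indications rather than exact values.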
Figure 10.
miRNA Overlap with Sequenced Small RNAs
Percentage of Arabidopsis miRNAs with 20–23 nt coordinate overlap with sequenced and genome-mapped small RNAs from [36]. Three different sets are shown (all filtered for CDS/repeat overlap).
(A) Random 22mers: 21,549 loci sampled randomly from the genome.
(B) A set of 1,886 miRNA loci classified as non-miRNAs by miSVM.
(C) A set of 334 miRNA loci classified as miRNAs by miSVM.
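The overlap percentage for each set could be computed along the following lines. This is a minimal sketch under stated assumptions: coordinates are 1-based and inclusive, strand is ignored, and a locus counts as overlapping if any small RNA on the same chromosome shares 20 to 23 nt with it. The function names and example coordinates are hypothetical.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of shared positions between two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def percent_overlapping(loci, small_rnas, min_nt=20, max_nt=23):
    """Percentage of loci sharing min_nt..max_nt positions with any small RNA.

    Both inputs are lists of (chromosome, start, end) tuples.
    """
    if not loci:
        return 0.0
    hits = 0
    for chrom, start, end in loci:
        if any(
            rna_chrom == chrom
            and min_nt <= overlap_length(start, end, rna_start, rna_end) <= max_nt
            for rna_chrom, rna_start, rna_end in small_rnas
        ):
            hits += 1
    return 100.0 * hits / len(loci)


# Hypothetical coordinates, for illustration only.
loci = [("chr1", 100, 121), ("chr1", 500, 521), ("chr2", 100, 121)]
small_rnas = [("chr1", 101, 121), ("chr2", 300, 320)]

print(f"{percent_overlapping(loci, small_rnas):.1f}% of loci overlap a small RNA")
```

The linear scan is adequate for a few thousand loci. For larger sets, sorting by coordinate or using an interval index would be a natural refinement.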