Fig 1.
A schematic of the core Mirage2 algorithm.
Isoforms are first mapped back to their coding exons on the genome. Once all isoforms within a gene family have been mapped, those genome mapping coordinates serve as the basis for intra-species alignment, resulting in an MSA with explicit splice site awareness and exon delineation.
Fig 2.
An example of a “species guide” file.
The top line is a Newick-formatted species tree to set the merge order of species during Mirage2’s interspecies alignment phase. Each subsequent line associates a species with the location of its reference genome and a GTF index. Sequences belonging to species that aren’t listed in the species guide are treated as “miscellaneous” and are the last to be integrated into interspecies MSAs.
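As a rough illustration of the layout described in this legend, the sketch below reads a species guide into a tree string plus a per-species lookup. It assumes whitespace-separated fields in the order species, genome path, GTF path. The legend does not state the delimiter or field order, and `parse_species_guide`, `GuideEntry`, and `merge_group` are hypothetical names, not part of Mirage2.

```python
from typing import Dict, NamedTuple, Tuple


class GuideEntry(NamedTuple):
    """One species line of the guide (field layout is an assumption)."""
    species: str
    genome: str
    gtf: str


def parse_species_guide(text: str) -> Tuple[str, Dict[str, GuideEntry]]:
    """Split guide text into the Newick tree line and the species entries."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("species guide is empty")
    tree = lines[0]  # first non-blank line: Newick-formatted species tree
    entries: Dict[str, GuideEntry] = {}
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {ln!r}")
        species, genome, gtf = fields
        entries[species] = GuideEntry(species, genome, gtf)
    return tree, entries


def merge_group(species: str, entries: Dict[str, GuideEntry]) -> str:
    """Species absent from the guide fall into the 'miscellaneous' group."""
    return species if species in entries else "miscellaneous"


# Hypothetical guide contents, for illustration only.
example = """((human,(mouse,rat)));
human /data/human.fa /data/human.gtf
mouse /data/mouse.fa /data/mouse.gtf
rat /data/rat.fa /data/rat.gtf
"""
tree, entries = parse_species_guide(example)
```

Here `merge_group("zebrafish", entries)` returns `"miscellaneous"`, mirroring the rule that unlisted species are integrated last.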
Table 1.
Mirage2’s mapping methods map nearly all SwissProt sequences.
Fig 3.
FastMap and Spaln2 are complementary mapping methods.
The majority of sequences that Mirage2 is able to map back to the genome can be mapped using either FastMap or Spaln2, although one tool or the other is specifically required to map 14.0% of human sequences, 15.0% of mouse sequences, and 12.1% of rat sequences.
Fig 4.
Mirage2 MSAs have extremely high percent column identity.
Percent column identity distributions for intra-species Mirage2 multiple-sequence alignments (excluding “alignments” with only 1 sequence) and Mirage2 inter-species alignments for genes present in at least 2 species.
Fig 5.
Differences between the percent column identity of Mirage2 MSAs and that of alignments produced by general-purpose MSA tools.
Values were computed by subtracting the percent column identity of each Mirage2 MSA from the percent column identity of the corresponding MSA produced by an alternative tool.
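The subtraction described in this legend can be sketched as follows. The legend does not define percent column identity, so the code assumes one plausible definition: the percentage of columns whose non-gap residues are all identical, with all-gap columns ignored. Both function names are hypothetical, and the toy alignments are made up for illustration.

```python
from typing import Sequence


def percent_column_identity(msa: Sequence[str], gap: str = "-") -> float:
    """Percent of columns whose non-gap residues all match (assumed definition)."""
    if not msa:
        raise ValueError("MSA has no sequences")
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("MSA rows must all have the same length")
    scored = 0
    identical = 0
    for column in zip(*msa):
        residues = {c for c in column if c != gap}
        if not residues:
            continue  # skip columns that contain only gaps
        scored += 1
        if len(residues) == 1:
            identical += 1
    return 100.0 * identical / scored if scored else 0.0


def identity_difference(mirage2_msa: Sequence[str], alt_msa: Sequence[str]) -> float:
    """Alternative tool's value minus Mirage2's value, as in the legend."""
    return percent_column_identity(alt_msa) - percent_column_identity(mirage2_msa)


# Toy alignments of the same three sequences (not real data).
mirage2_like = ["AC-G", "ACTG", "AGTG"]
alternative = ["ACG-", "ACTG", "AGTG"]
diff = identity_difference(mirage2_like, alternative)
```

Under this sign convention, a negative difference means the Mirage2 alignment has the higher percent column identity.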
Fig 6.
The length compaction factors of alternative alignment methods relative to Mirage2.
Alignment length is defined as the number of columns in an MSA and the compaction factor is computed by dividing the length of an alternative tool’s MSA by the length of the corresponding Mirage2 MSA.
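A minimal sketch of the compaction factor as defined in this legend, treating an MSA as a list of equal-length gapped strings. The function names and the toy alignments are illustrative assumptions.

```python
from typing import Sequence


def alignment_length(msa: Sequence[str]) -> int:
    """Number of columns in an MSA; all rows must share one length."""
    lengths = {len(row) for row in msa}
    if len(lengths) != 1:
        raise ValueError("MSA must be non-empty with rows of equal length")
    return lengths.pop()


def compaction_factor(alt_msa: Sequence[str], mirage2_msa: Sequence[str]) -> float:
    """Length of the alternative tool's MSA divided by the Mirage2 MSA length."""
    return alignment_length(alt_msa) / alignment_length(mirage2_msa)


# Toy alignments of the same two sequences (not real data).
mirage2_like = ["ACGT", "AC-T"]
alternative = ["AC--GT", "AC-T--"]
factor = compaction_factor(alternative, mirage2_like)
```

A factor above 1 means the alternative tool's alignment has more columns than the Mirage2 alignment; a factor below 1 means it has fewer.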
Fig 7.
A partial comparison of the alignments of human DMBT1 sequences produced by Mirage2 and MAFFT.
The underlined segments highlight sequence regions where the tools are generally in agreement, but the segments are spaced significantly further apart in the MAFFT alignment than they are in the Mirage2 alignment. This illustrates how, in cases of erroneous alignment, using the comparative lengths of isoform MSAs can be an imperfect quantification of relative alignment quality.
Table 2.
Mapping performance comparison between Mirage2 and the original Mirage implementation.
Table 3.
Runtime comparison between Mirage2 and alternative MSA tools.