Fig 1.
Flow chart of the steps performed by LEMONS.
(A) The default and primary LEMONS database, taken from the UCSC Genome Browser (hg19 assembly), encompasses all non-redundant human RefSeq proteins, together with the locations of their known splice junctions. Arrowhead-like gaps correspond to splice junctions. (B) LEMONS uses BLASTX pairwise alignment to compare each identified transcript with its orthologous protein in the reference database and predicts splice junctions based on the conserved gene structure. (C) LEMONS uses all predicted exons whose boundaries do not split codons to establish the motif at the 3' end of exons. (D) The identified motif assists in choosing between adjacent potential splice junctions and between the two potential splice junctions that split codons. (E) Using more than one reference database enhances the accuracy of splice-junction prediction (again, in combination with the motif search).
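The projection in (B) and the motif-based choice in (C) and (D) can be sketched in Python. This is a minimal illustration under stated assumptions, not the LEMONS implementation: it assumes a plus-strand BLASTX hit described by 0-based start coordinates and aligned residue strings, reference junctions expressed as nucleotide offsets within the reference coding sequence, and a simple log-odds position frequency matrix over the last few nucleotides of each exon. The function names (`project_junctions`, `exon_end_profile`, `motif_score`, `choose_junction`), the motif width and the pseudocount are hypothetical.

```python
import math
from collections import Counter


def project_junctions(q_start, s_start, q_aln, s_aln, ref_junctions):
    """Map reference-CDS junction offsets onto transcript coordinates.

    Assumes a plus-strand, frame-consistent BLASTX hit (hypothetical inputs):
    q_start -- 0-based transcript offset of the first aligned codon
    s_start -- 0-based index of the first aligned reference residue
    q_aln, s_aln -- aligned residue strings, '-' marking gaps
    ref_junctions -- junction offsets in the reference CDS (nucleotides
        before the junction, counted from the CDS start)
    """
    codon_start = {}  # reference residue index -> transcript offset of its codon
    q_pos, s_pos = q_start, s_start
    for q_char, s_char in zip(q_aln, s_aln):
        if q_char != "-" and s_char != "-":
            codon_start[s_pos] = q_pos
        if q_char != "-":
            q_pos += 3  # each translated query residue consumes one codon
        if s_char != "-":
            s_pos += 1
    projected = {}
    for n in ref_junctions:
        residue, phase = divmod(n, 3)  # phase 0: junction falls between codons
        if residue in codon_start:  # junctions facing an alignment gap are skipped
            projected[n] = codon_start[residue] + phase
    return projected


def exon_end_profile(transcript, junctions, width=3):
    """Count bases in the last `width` nucleotides upstream of each junction."""
    counts = [Counter() for _ in range(width)]
    for j in junctions:
        if j >= width:
            for i, base in enumerate(transcript[j - width:j]):
                counts[i][base] += 1
    return counts


def motif_score(profile, transcript, j, pseudo=1.0):
    """Log-odds score of the exon end before position j vs. a uniform background."""
    width = len(profile)
    if j < width:
        return float("-inf")
    score = 0.0
    for i, base in enumerate(transcript[j - width:j]):
        total = sum(profile[i].values()) + 4 * pseudo
        score += math.log((profile[i][base] + pseudo) / total / 0.25)
    return score


def choose_junction(profile, transcript, candidates):
    """Pick the candidate junction whose upstream sequence best fits the motif."""
    return max(candidates, key=lambda j: motif_score(profile, transcript, j))


# Toy usage: residue 3 of the reference faces a gap, so offset 10 is dropped.
print(project_junctions(0, 0, "MKV-LA", "MKVGLA", [6, 10, 13]))  # {6: 6, 13: 10}
transcript = "AAGTTTAAGCCCAAGGGG"
profile = exon_end_profile(transcript, [3, 9, 15])
print(choose_junction(profile, transcript, [4, 9]))  # 9
```

In this sketch, only junctions whose reference offset is divisible by 3 (phase 0, i.e. not splitting a codon) would be passed to `exon_end_profile`, mirroring the use of such exons in (C); the resulting profile then ranks competing candidates as in (D).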
Table 1.
Summary statistics following analysis of the new version of the chameleon transcriptome (TransCham v2.0).
Fig 2.
LEMONS sensitivity and precision assessment.
(A) Illustration of the different splice-junction predictions made by LEMONS and their occurrence in the coding regions of the examined organism, according to genome annotation. P: "true" splice junction; TP (true positive): splice junction correctly identified by LEMONS; FN: false negative; FP: false positive. TP+FN (true positive + false negative): total number of true splice junctions in the examined organism, according to genome annotation; TP+FP (true positive + false positive): total number of splice junctions predicted by LEMONS. (B-C) LEMONS-based identification of splice junctions. Our analysis accounted for the distance (in nucleotides) between each splice junction predicted by LEMONS and the true splice junction. Five species were analyzed: M. musculus, G. gallus, A. carolinensis, X. tropicalis and D. rerio. (D) Comparison of LEMONS similarity, sensitivity and precision for the five species tested. For absolute numbers, see S3 Table.
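The counts defined in (A) give sensitivity = TP/(TP+FN) and precision = TP/(TP+FP). The sketch below computes them while allowing a tolerance of `tol` nucleotides between predicted and annotated junctions, in the spirit of the distance analysis in (B-C). It assumes both junction sets are given in the same coordinate system and uses a greedy one-to-one nearest match; the function name and the matching rule are illustrative assumptions, and the "similarity" measure in (D) is not reproduced because its definition is not given in this legend.

```python
def evaluate_junctions(predicted, annotated, tol=0):
    """Count TP/FP/FN, matching each prediction to at most one annotated junction.

    A prediction counts as a true positive if an unmatched annotated junction
    lies within `tol` nucleotides; the nearest such junction is consumed.
    Greedy matching in sorted order (not a globally optimal assignment).
    """
    remaining = sorted(set(annotated))
    preds = sorted(set(predicted))
    tp = 0
    for p in preds:
        best = None
        for t in remaining:
            if abs(p - t) <= tol and (best is None or abs(p - t) < abs(p - best)):
                best = t
        if best is not None:
            remaining.remove(best)
            tp += 1
    fp = len(preds) - tp
    fn = len(remaining)  # annotated junctions never matched
    return {
        "TP": tp, "FP": fp, "FN": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # TP / (TP+FN)
        "precision": tp / (tp + fp) if tp + fp else 0.0,    # TP / (TP+FP)
    }


annotated = [100, 250, 400]
predicted = [100, 253, 600]
for tol in (0, 3):
    r = evaluate_junctions(predicted, annotated, tol)
    print(tol, r["TP"], r["FP"], r["FN"])
# 0 1 2 2
# 3 2 1 1
```

Sweeping `tol` over increasing distances shows how many predictions fall near, but not exactly on, an annotated junction.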
Fig 3.
LEMONS sensitivity and precision assessment using a motif search and multiple reference databases.
(A-B) Identification of splice junctions by LEMONS. Our analysis accounted for the distance (in nucleotides) between splice junctions predicted by LEMONS and the true splice junctions. (C) Comparison of LEMONS similarity, sensitivity and precision for the five species tested. The analysis was performed using five reference databases: the human database and those of the four model organisms other than the tested organism. The nomenclature used is as in the legend to Fig 2.
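The reference panel used here (human plus the four model organisms other than the one being tested) is a leave-one-out selection. The sketch below builds that panel and, purely as a hypothetical way of pooling per-reference predictions, keeps junctions supported by at least `min_support` reference databases. The species list follows the legends, but the function names and the pooling rule are assumptions; the actual LEMONS combination rule is not described in this legend and may differ.

```python
from collections import Counter

SPECIES = ["H. sapiens", "M. musculus", "G. gallus",
           "A. carolinensis", "X. tropicalis", "D. rerio"]


def reference_panel(tested, species=SPECIES):
    """Leave-one-out panel: every database except the tested organism's own."""
    if tested not in species:
        raise ValueError(f"unknown species: {tested}")
    return [s for s in species if s != tested]


def consensus_junctions(predictions_by_ref, min_support=2):
    """Hypothetical pooling rule: keep junctions predicted from >= min_support references."""
    support = Counter()
    for junctions in predictions_by_ref.values():
        support.update(set(junctions))  # one vote per reference database
    return sorted(j for j, n in support.items() if n >= min_support)


panel = reference_panel("D. rerio")
print(len(panel))  # 5
print(consensus_junctions({"a": [10, 20], "b": [10, 30], "c": [10, 20, 40]}))  # [10, 20]
```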
Fig 4.
Comparing the performance of LEMONS and CEPiNS.
LEMONS performance in terms of similarity, sensitivity and precision is as shown in Fig 3. CEPiNS performance was evaluated using a human database (the same RefSeq protein sequences as in the LEMONS human database, together with their genomic sequences). The same RNA sequences were used as input in both analyses. CEPiNS was run with default settings, except that the filter for alternative splicing in the input sequences was disabled (see Materials and Methods).
Fig 5.
Sequencing of three representative chameleon genes amplified from DNA templates.
Shown are alignments of sequences extracted from the RNA-seq data (cDNA) and the corresponding genomic DNA sequences. Exons are shaded in gray. The genes analyzed were HSD17B4 (a single exon and its adjacent intron), DDX56 (two exons and the intervening intron) and AQR (a single exon).