Figure 1.
A) RNA-Seq produces millions of short reads, some of which will span the exon boundaries of hypothetical fusion transcripts between Gene 1 and Gene 2. Two fusion isoforms involving different exons are shown, left and right, each with a single read spanning its breakpoint. Each read is split into smaller pseudo paired-end (PE) reads, which can be aligned independently to a reference transcriptome. B) Alignment of the pseudo PE reads against the reference transcriptome: one read of each pair aligns to an exon of Gene 1 and the other aligns to an exon of Gene 2. Repeating this process for all other RNA-Seq reads creates “alignment blocks”, formed from overlapping groups of aligned 5′ and 3′ pseudo PE reads and their genomic coordinates. Multiple alignment blocks on either gene (as for Gene 1 in the example) provide evidence for different isoforms of the fusion.
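The two steps in panels A and B can be sketched in Python. This is a minimal illustration, not FusionFinder's implementation. It assumes each read is split into two equal halves (the actual pseudo-read lengths are not stated here) and represents an alignment as a hypothetical `Hit` record with a gene name and half-open coordinates. It also assumes that overlapping hits on the same gene are merged into alignment blocks.

```python
from __future__ import annotations

from dataclasses import dataclass


def split_read(seq: str) -> tuple[str, str]:
    """Split a read into 5' and 3' pseudo reads (equal halves: an assumption)."""
    mid = len(seq) // 2
    return seq[:mid], seq[mid:]


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment record: gene name and 0-based half-open span."""
    gene: str
    start: int
    end: int


def is_fusion_pair(five_prime: Hit, three_prime: Hit) -> bool:
    """A pseudo PE pair is fusion evidence if its halves hit different genes."""
    return five_prime.gene != three_prime.gene


def alignment_blocks(hits: list[Hit]) -> list[tuple[str, int, int, int]]:
    """Merge overlapping hits on the same gene into (gene, start, end, n_hits) blocks."""
    blocks: list[tuple[str, int, int, int]] = []
    for hit in sorted(hits, key=lambda x: (x.gene, x.start)):
        # Half-open intervals overlap when the new start is before the block end.
        if blocks and blocks[-1][0] == hit.gene and hit.start < blocks[-1][2]:
            gene, start, end, n = blocks[-1]
            blocks[-1] = (gene, start, max(end, hit.end), n + 1)
        else:
            blocks.append((hit.gene, hit.start, hit.end, 1))
    return blocks
```

Under this sketch, a 76mer read yields two 38-base pseudo reads. Two separate Gene 1 blocks produced by the merge would correspond to the multiple-isoform evidence described for panel B.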
Figure 2.
Identification of the transcript breakpoint in each PRIM1:NACA isoform.
Alignments of the full-length 76mer reads providing evidence for the two isoforms of PRIM1:NACA (top, the isoform originally identified by Levin et al.; bottom, the novel isoform identified by FusionFinder) against the last 30 bases of the implicated PRIM1 (G1) exon and the first 30 bases of the NACA (G2) exon. The transcript breakpoint is clearly visible where the PRIM1 exon ends and the NACA exon begins. Also shown is an in-frame translation of the G1 exon from wild-type PRIM1 running into the fused NACA exon; both isoforms retain an open reading frame despite their different exon usage.
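The open reading frame check described in this caption can be illustrated with a minimal Python sketch. It assumes the G1 exon tail and G2 exon head are available as plain A/C/G/T strings and that the wild-type G1 reading frame is known as an offset. The function names and the `frame` parameter are hypothetical rather than FusionFinder's code. The example sequences are invented and are not the PRIM1:NACA sequences.

```python
from __future__ import annotations

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(seq: str) -> str:
    """Translate the complete codons of an A/C/G/T sequence; '*' marks a stop."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        idx = (BASES.index(codon[0]) * 16
               + BASES.index(codon[1]) * 4
               + BASES.index(codon[2]))
        protein.append(AMINO_ACIDS[idx])
    return "".join(protein)


def translate_junction(g1_tail: str, g2_head: str, frame: int = 0) -> tuple[str, bool]:
    """Translate across a fusion breakpoint in the G1 reading frame.

    frame: offset (0, 1 or 2) of the first complete codon in g1_tail, taken
    from the wild-type G1 reading frame (hypothetical parameter).
    Returns the peptide and whether it is free of stop codons.
    """
    peptide = translate(g1_tail[frame:] + g2_head)
    return peptide, "*" not in peptide
```

For example, `translate_junction("ATGGCCAA", "GAAATGGCC")` yields the peptide `MAKKW` with no stop codon. By contrast, `translate_junction("ATGGCCT", "AAGG")` reaches a TAA stop codon at the junction.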
Table 1.
Summary file showing the 9 candidates from the FusionFinder analysis of the Levin dataset.
Table 2.
FusionFinder isoforms file for the six common fusion candidates reported by both FusionFinder and Levin et al.
Figure 3.
RT-PCR validation of the fusion candidates.
Primers were designed around the individual fusion breakpoints, and cDNA was synthesised using gene-specific primers. Products were successfully amplified for the following fusion isoforms: BCR:ABL (380 bp, lane 1), PRIM1:NACA isoform 1 (400 bp, lane 2), PRIM1:NACA isoform 2 (340 bp, lane 3), C3orf10:VHL isoform 2 (340 bp, lane 6), ACCS:EXT2 isoform 3 (230 bp, lane 9) and SLC29A1:HSP90AB1 (340 bp, lane 10). No product could be amplified from CEP170:RAD51L1 (lane 4), C3orf10:VHL isoform 1 (lane 5), or ACCS:EXT2 isoforms 1 and 2 (lanes 7 and 8). The negative control for each reaction is in the lane adjacent to it. All detected fusion products were validated by Sanger sequencing.
Table 3.
Performance comparison of FusionFinder, FusionMap and Tophat-Fusion in an analysis of the Levin dataset.
Table 4.
Summary of the overall comparative performance of FusionFinder, FusionMap and Tophat-Fusion on a simulated dataset.
Figure 4.
Comparison of sensitivity and PPV for FusionFinder, FusionMap and Tophat-Fusion.
To compare the sensitivity and positive predictive value (PPV) of FusionFinder, FusionMap and Tophat-Fusion for detecting fusion genes, each tool was used to analyse a randomly generated dataset simulating normal genes and 55 fusion genes. Sensitivity and PPV were calculated for subgroups of the results, defined by the number of reads supporting each fusion gene predicted by each tool. FusionFinder shows consistently higher sensitivity than both FusionMap and Tophat-Fusion, generally higher PPV than FusionMap, and PPV similar to that of Tophat-Fusion.
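The subgroup calculation can be sketched as follows. This sketch assumes that subgroups are cumulative thresholds on supporting read count, meaning predictions with at least k reads. The caption does not say whether thresholds or fixed bins were used. Sensitivity is TP divided by the number of true fusions (55 in the simulated dataset), and PPV is TP / (TP + FP). The function and argument names, and the "gene1:gene2" identifiers, are hypothetical.

```python
from __future__ import annotations


def metrics_by_min_reads(
    predictions: dict[str, int],  # predicted fusion id -> supporting read count
    truth: set[str],              # ids of the fusions truly present (simulated)
    thresholds: list[int],        # minimum read counts defining each subgroup
) -> list[tuple[int, float, float]]:
    """Return (threshold, sensitivity, PPV) for each read-support threshold."""
    rows = []
    for k in thresholds:
        called = {fusion for fusion, n in predictions.items() if n >= k}
        tp = len(called & truth)   # true fusions that were predicted
        fp = len(called - truth)   # predictions not in the truth set
        sensitivity = tp / len(truth) if truth else 0.0
        # PPV is undefined when nothing passes the threshold.
        ppv = tp / (tp + fp) if called else float("nan")
        rows.append((k, sensitivity, ppv))
    return rows
```

Consider four true fusions and predictions `{"A:B": 10, "C:D": 3, "X:Y": 2, "E:F": 1}`. A threshold of 1 gives sensitivity 0.75 and PPV 0.75. A threshold of 3 gives sensitivity 0.5 and PPV 1.0. This illustrates the usual trade-off as the read-support cutoff rises.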
Table 5.
Summary file showing the 7 candidates from the FusionFinder analysis of the MCF-7 breast cancer cell line paired-end dataset.