Table 1.
Summary of RNA-Seq and fusion analysis.
Figure 1.
The deFuse gene fusion discovery method.
a) Discordant alignments are clustered based on the likelihood that those alignments were produced by reads spanning the same fusion boundary. Ambiguous alignments are resolved by selecting the most likely set of fusion events and the most likely assignment of paired end reads to those events; the remaining alignments are discarded. b) Paired end reads for which one end aligns near the approximate fusion boundary are mined for split alignments of the other end of the read. c) The predicted fusion boundary is used to calculate the fragment length of each spanning paired end read. These fragment lengths are tested against the hypothesis that they were drawn at random from the fragment length distribution.
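Panel c) can be illustrated with a small sketch. It assumes 1-based inclusive transcript coordinates, a fragment that covers `start_x..boundary_x` of the 5' partner X and `boundary_y..end_y` of the 3' partner Y, and a normal fragment length distribution with known mean and standard deviation. The function names are hypothetical, and the two-sided z-test on the mean fragment length is a simple stand-in; the statistic deFuse actually uses may differ.

```python
import math
from statistics import NormalDist
from typing import Sequence


def spanning_fragment_length(start_x: int, boundary_x: int,
                             boundary_y: int, end_y: int) -> int:
    """Fragment length implied by a predicted fusion boundary.

    Hypothetical convention: 1-based inclusive transcript coordinates; the
    fragment covers start_x..boundary_x of the 5' partner X and
    boundary_y..end_y of the 3' partner Y.
    """
    return (boundary_x - start_x + 1) + (end_y - boundary_y + 1)


def fragment_length_pvalue(lengths: Sequence[int], mean: float, sd: float) -> float:
    """Two-sided z-test of the mean observed fragment length against a normal
    fragment length distribution (a stand-in test, assumed for illustration)."""
    n = len(lengths)
    if n == 0:
        raise ValueError("need at least one spanning read")
    z = (sum(lengths) / n - mean) / (sd / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Toy example: three spanning reads and a predicted boundary at X:200 / Y:51.
reads = [(101, 150), (121, 170), (111, 160)]  # (start_x, end_y) for each read
lengths = [spanning_fragment_length(sx, 200, 51, ey) for sx, ey in reads]
print(lengths, fragment_length_pvalue(lengths, mean=200.0, sd=30.0))
# prints [200, 200, 200] 1.0
```

A small p-value would indicate that, under the predicted boundary, the spanning reads imply fragment lengths that are unlikely under the library's fragment length distribution.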
Figure 2.
Conditions for considering two paired end reads to have originated from the same fusion transcript.
a) Fusion transcript X-Y supported by a paired end read spanning the fusion boundary. b) Discordant paired end reads represent reads potentially spanning a fusion boundary. Each discordant alignment suggests fusion boundaries in the regions adjacent to the alignments in each transcript. The fusion boundary region, shown in gray, is the region in which we expect a fusion boundary to occur. c) The overlapping boundary region condition requires that the fusion boundary regions of the two reads overlap in each transcript. d) The difference between the fragment lengths of two paired end reads spanning the same fusion boundary. e) The similar fragment length condition is the constraint that this difference must be no more than a maximum allowed value.
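The two conditions in Figure 2 can be sketched as a pairwise test. This sketch makes several assumptions not stated in the caption: the `DiscordantRead` layout and all function names are hypothetical, coordinates are 1-based and inclusive, and the boundary regions are a simplified bound that ignores the length of the other read end. One useful observation follows from the fragment length formula: for two reads sharing a boundary, the difference in their fragment lengths does not depend on where that boundary lies, so it can be computed before the boundary is known.

```python
from dataclasses import dataclass
from typing import Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


@dataclass(frozen=True)
class DiscordantRead:
    """Hypothetical layout of a discordant paired end alignment to a fusion X-Y."""
    x_start: int  # end aligned to the 5' partner X
    x_end: int
    y_start: int  # other end aligned to the 3' partner Y
    y_end: int


def boundary_region_x(r: DiscordantRead, max_frag: int) -> Interval:
    # Boundary in X lies after the X alignment, within max_frag of its start.
    return (r.x_end, r.x_start + max_frag - 1)


def boundary_region_y(r: DiscordantRead, max_frag: int) -> Interval:
    # Boundary in Y lies before the Y alignment, within max_frag of its end.
    return (r.y_end - max_frag + 1, r.y_start)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def fragment_length_difference(r1: DiscordantRead, r2: DiscordantRead) -> int:
    # With l = (b_x - x_start + 1) + (y_end - b_y + 1), the difference l1 - l2
    # for a shared boundary (b_x, b_y) does not depend on the boundary itself.
    return (r2.x_start - r1.x_start) + (r1.y_end - r2.y_end)


def same_fusion(r1: DiscordantRead, r2: DiscordantRead,
                max_frag: int, max_diff: int) -> bool:
    # Overlapping boundary region condition: overlap in both transcripts.
    regions_overlap = (
        overlaps(boundary_region_x(r1, max_frag), boundary_region_x(r2, max_frag))
        and overlaps(boundary_region_y(r1, max_frag), boundary_region_y(r2, max_frag))
    )
    # Similar fragment length condition.
    return regions_overlap and abs(fragment_length_difference(r1, r2)) <= max_diff


r1 = DiscordantRead(101, 150, 251, 300)
r2 = DiscordantRead(121, 170, 261, 310)
print(fragment_length_difference(r1, r2), same_fusion(r1, r2, max_frag=400, max_diff=30))
```

Here `max_frag` and `max_diff` are illustrative parameters; in practice they would be derived from the fragment length distribution.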
Figure 3.
Searching for candidate split reads.
a) Approximate fusion boundaries, shown as dashed rectangles, are the intersection of the fusion boundary regions of the discordant alignments supporting a potential fusion. b) The mate alignment region, shown as a dashed rectangle, is the union of possible alignment locations for the other end of a single end anchored alignment. c) The approximate fusion boundary in one transcript is projected into a second transcript by remapping the start of the approximate fusion boundary from the first transcript to the genome, and then from the genome to the second transcript.
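The three operations in Figure 3 reduce to interval arithmetic and coordinate mapping. The sketch below assumes 1-based inclusive coordinates, transcripts given as sorted lists of plus-strand genomic exons, and hypothetical function names; strand handling and the exact bookkeeping used by deFuse are not shown.

```python
from functools import reduce
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


def intersect(a: Optional[Interval], b: Interval) -> Optional[Interval]:
    if a is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def approximate_boundary(regions: List[Interval]) -> Optional[Interval]:
    # Panel a): intersection of the boundary regions of all supporting reads.
    if not regions:
        raise ValueError("need at least one boundary region")
    return reduce(intersect, regions[1:], regions[0])


def union(intervals: List[Interval]) -> List[Interval]:
    # Panel b): merge overlapping or adjacent candidate mate locations.
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def transcript_to_genome(exons: List[Interval], pos: int) -> int:
    offset = pos
    for g_start, g_end in exons:
        length = g_end - g_start + 1
        if offset <= length:
            return g_start + offset - 1
        offset -= length
    raise ValueError("position beyond transcript end")


def genome_to_transcript(exons: List[Interval], gpos: int) -> Optional[int]:
    offset = 0
    for g_start, g_end in exons:
        if g_start <= gpos <= g_end:
            return offset + gpos - g_start + 1
        offset += g_end - g_start + 1
    return None  # genomic position is not exonic in this transcript


def project(pos: int, exons_from: List[Interval], exons_to: List[Interval]) -> Optional[int]:
    # Panel c): transcript -> genome -> other transcript.
    return genome_to_transcript(exons_to, transcript_to_genome(exons_from, pos))


print(approximate_boundary([(150, 500), (170, 520), (100, 480)]))
print(union([(10, 20), (15, 30), (40, 50)]))
print(project(120, [(1000, 1099), (2000, 2099)], [(1050, 1099), (2000, 2099)]))
```

Returning `None` from the projection makes it explicit when the boundary falls in a region that the second transcript does not contain.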
Table 2.
RT-PCR validated novel deFuse predictions.
Figure 4.
ROC curve for deFuse, annotated with the threshold for the AdaBoost probability estimate. The threshold corresponds to a false positive rate of 10% and a true positive rate of 82%.
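The relationship between a probability threshold and a point on the ROC curve can be sketched as follows. This assumes labelled predictions with classifier scores (here standing in for the AdaBoost probability estimates) and picks the lowest threshold whose false positive rate stays at or below a target such as 10%; the function names are hypothetical, and the caption does not say that deFuse selects its threshold this way.

```python
from typing import List, Optional, Sequence, Tuple


def rates_at_threshold(scores: Sequence[float], labels: Sequence[bool],
                       threshold: float) -> Tuple[float, float]:
    # Predictions with score >= threshold are called positive.
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    return fp / neg, tp / pos  # (false positive rate, true positive rate)


def threshold_for_fpr(scores: Sequence[float], labels: Sequence[bool],
                      max_fpr: float) -> Optional[Tuple[float, float, float]]:
    # Lowering the threshold can only raise the FPR, so scan downward and stop
    # at the first threshold that exceeds the target.
    best = None
    for t in sorted(set(scores), reverse=True):
        fpr, tpr = rates_at_threshold(scores, labels, t)
        if fpr > max_fpr:
            break
        best = (t, fpr, tpr)
    return best


scores: List[float] = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels: List[bool] = [True, True, False, True, True, False, False, False]
print(threshold_for_fpr(scores, labels, max_fpr=0.10))
```

The scores and labels above are toy values for illustration only.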
Figure 5.
Variable importance plot for the deFuse classifier.
Relative importance of each of the 11 features used by the deFuse classifier.
Table 3.
Fusion predictions compared between deFuse and FusionSeq.
Table 4.
Comparison of accuracy metrics for FusionSeq and deFuse.
Table 5.
deFuse predictions for existing datasets with known fusions.
Figure 6.
Evidence for the FRYL-SH2D1A fusion showing the validated fusion boundary (vertical red line).
a) Validation evidence from a FISH come-together assay, with fusion probes circled in white. b) FISH probe selection. c) FRYL exonic coverage showing fewer reads aligning after the fusion boundary. FRYL exons are shown in blue, with narrower boxes denoting untranslated sequence. d) SH2D1A exonic coverage showing significant coverage after the fusion boundary. SH2D1A exons are shown in green, with narrower boxes denoting untranslated sequence. e) FRYL-SH2D1A exons, shown in blue or green according to their gene of origin; the whole transcript is predicted to be untranslated. f) Positions of spanning reads supporting the fusion. g) Split alignments supporting the fusion prediction. h) Chromatogram of a sequenced PCR product supporting the fusion.
Figure 7.
a) Read depth across HNF1A exonic positions shows that only the region after the fusion boundary is expressed, suggesting possible biallelic inactivation of HNF1A. b) Putative RREB1-TFE3 chimeric protein showing preservation of TFE3's basic helix-loop-helix (bHLH) leucine zipper (LZ) domain and N-terminal activation domain (ATA), in addition to four of RREB1's zinc finger (ZF) motifs.