Figure 1.
Transformation of the overlap alignment graph into the path graph.
(A) shows an overlap graph together with the underlying alignment information for some of the read sequences. (B) shows the corresponding path graph. For each unique path in the overlap graph, an inner edge is introduced in the path graph, drawn as a solid line. For example, the unique path between b and d in the overlap graph is represented by the inner edge (b′″, d′) in the path graph. If two vertices in the path graph represent the two ends of the same read, they are connected by a real edge, drawn as a dashed line. An example is the real edge (d′, d″): both vertices correspond to vertex d in the overlap graph, but they represent different ends of the read, because the read of d overlaps the reads of b and e at different ends, as shown in (A).
Figure 2.
Workflow of the SUPERLOCAS algorithm.
The initial steps are illustrated first: the left-over reads with the left-over overlap graph constructed from them, and the reads that are aligned against the reference sequence and partitioned into blocks. Next, the steps that are executed consecutively for each block are shown: the construction of the block's overlap graph, the insertion of edges between the two graphs, and the subsequent procedure up to the point where contigs are reported for the merged graph.
Figure 3.
Performance comparison of low sequencing depth assembly.
Illumina GAIIx reads were simulated at a sequencing depth of 7.5× for the first chromosome of A. thaliana Col-0. The reads were assigned to the reference sequence according to their positions of origin and partitioned into blocks of 10 kb. The avgN50 (average N50) is plotted against the avgERR (average error rate) for the assembly tools LOCAS, EULER-SR, ABySS, VELVET and soapDeNovo. Each point represents one run; for each assembler, several runs with different parameter settings are displayed. The data points of ABySS are drawn in orange, EULER-SR in green, LOCAS in red, VELVET in blue and soapDeNovo in turquoise.
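The block partitioning described in this caption can be sketched as follows. This is only a minimal illustration, not the procedure used by the authors: the function name `partition_into_blocks`, the 0-based start coordinates, and the rule that a read belongs to the block containing its start position are all assumptions made for the example.

```python
from collections import defaultdict

def partition_into_blocks(read_positions, block_length=10_000):
    """Group read ids by the fixed-size block that contains their start position.

    read_positions: iterable of (read_id, start) pairs, with start given as a
    0-based coordinate on the reference (an assumption of this sketch).
    """
    blocks = defaultdict(list)
    for read_id, start in read_positions:
        # Integer division maps a coordinate to the index of its block.
        blocks[start // block_length].append(read_id)
    return dict(blocks)

# Hypothetical reads with their positions of origin on the reference.
reads = [("r1", 0), ("r2", 9_999), ("r3", 10_000), ("r4", 25_300)]
blocks = partition_into_blocks(reads)
```

With a block length of 10 kb, reads `r1` and `r2` fall into block 0, `r3` into block 1 and `r4` into block 2. Reads that span a block boundary would need an explicit policy, which the caption does not specify.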
Figure 4.
Performance comparison of homology-guided assembly on simulated data.
We simulated a resequencing study of an artificial A. thaliana strain using a sequencing depth of 7.5×. The simulated Illumina reads were aligned to the reference genome Col-0 and partitioned into blocks of 25 kb using SHORE. The assembly tools SUPERLOCAS and VELVET (in left-over incorporation mode) were applied to assemble the mapped reads of the first chromosome and the left-over reads. The avgN50 (average N50) is plotted against the avgERR (average error rate) for both tools. SUPERLOCAS is displayed in red and VELVET in blue.
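For readers unfamiliar with the avgN50 metric, the following sketch computes the standard N50 of a set of contig lengths and averages it over blocks. Treating avgN50 as the arithmetic mean of the per-block N50 values is an assumption based on the caption's wording ("average N50"); the function names and the example contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembled length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # no contigs were given

def avg_n50(contigs_per_block):
    """Mean of the per-block N50 values (assumed definition of avgN50)."""
    values = [n50(lengths) for lengths in contigs_per_block]
    return sum(values) / len(values) if values else 0.0

# Hypothetical contig lengths (in bp) for two blocks.
example_blocks = [[100, 200, 300, 400], [50]]
block_n50s = [n50(lengths) for lengths in example_blocks]
average = avg_n50(example_blocks)
```

In the first block the contigs of 400 bp and 300 bp together cover 700 of 1000 bp, so its N50 is 300; the second block has a single contig, so its N50 is 50.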
Figure 5.
Number of detected insertion regions in homology-guided assembly on simulated data.
For the artificial A. thaliana strain in the simulated resequencing study, the total number of insertion regions in the target genome is plotted for different region lengths. In addition, the numbers of regions assembled without errors by VELVET and by SUPERLOCAS are shown.
Figure 6.
Performance comparison of homology-guided assembly on real world data without utilizing left-over reads.
Paired-end reads with a length of 80 bp were produced on an Illumina GAIIx to a depth of ∼7× for the first chromosome of A. thaliana strain Ler-1. Reads were aligned against the complete reference sequence (Col-0) and partitioned into blocks of 25 kb with SHORE. LOCAS and VELVET were applied in paired-end mode to all blocks, each of which contains the reads that are aligned to the same region of the reference sequence. The x-axis shows the avgN50 (average N50) and the y-axis the avgDISS (average dissimilarity). The runs of LOCAS produced with different parameter settings are drawn in red and those of VELVET in blue.
Figure 7.
Performance comparison of homology-guided assembly on real world data utilizing left-over reads.
Illumina reads of the first chromosome of A. thaliana strain Ler-1 were aligned against the reference sequence (Col-0) and partitioned into blocks of 25 kb with SHORE. Local assemblies were performed with SUPERLOCAS and VELVET in order to incorporate left-over reads. While SUPERLOCAS provides algorithms specifically adjusted to this task, VELVET had to assemble each block together with the complete set of left-over reads. The bar plot shows the avgN50 (average N50) achieved by each assembler.