Figure 1.
Max-clique enumeration and edge definitions.
(A) Example of a read alignment graph based on the insert size criterion. Alignments of read pairs are shown in gray and the corresponding nodes in the graph representation are depicted in blue. The four bottom-most alignment pairs stem from a haplotype harboring a deletion (shown in orange in the reference genome) and therefore display a larger insert size than the remaining alignment pairs. Note that the four deletion-indicating alignment pairs form a max-clique (circled in orange). (B) Illustration of the compatible gaps condition of the sequence similarity criterion. Two reads are aligned against the reference (left). This induces a direct read-to-read alignment of the two reads (right). Case (1): No gaps in the reference alignments lead to a gapless read-to-read alignment, which renders the pair of reads an edge candidate. Case (2): Gaps in the reference alignment lead to gaps in the read-to-read alignment, excluding the possibility of an edge. See also Figure S6 in the appendix for more complicated cases involving gaps.
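The situation in panel (A) can be mimicked with a toy example. The sketch below is illustrative only and is not the authors' implementation: it assumes a simplified edge rule (two alignment pairs are connected when their insert sizes differ by at most a hypothetical tolerance `tol`) and enumerates max-cliques with the standard Bron-Kerbosch algorithm with pivoting. The insert sizes, `tol`, and the function names `build_graph` and `max_cliques` are made up for illustration.

```python
from itertools import combinations


def build_graph(insert_sizes, tol):
    """Connect two alignment pairs when their insert sizes differ by <= tol.

    Assumption: a simplified stand-in for the insert size criterion.
    """
    adj = {i: set() for i in range(len(insert_sizes))}
    for i, j in combinations(range(len(insert_sizes)), 2):
        if abs(insert_sizes[i] - insert_sizes[j]) <= tol:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def max_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already processed nodes
        if not p and not x:
            found.append(sorted(r))
            return
        # Pivot on the node with the most neighbours among the candidates
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(found)


# Hypothetical insert sizes: four regular pairs, four deletion-spanning pairs
insert_sizes = [300, 305, 295, 310, 500, 505, 495, 510]
adj = build_graph(insert_sizes, tol=20)
cliques = max_cliques(adj)
print(cliques)
```

Under these assumed values, the four alignment pairs with enlarged insert sizes end up in a max-clique of their own, separate from the four regular pairs.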
Figure 2.
Performance in (A) frequency estimation and (B) distinguishing reconstructed local haplotypes.
(A) Ten haplotypes were sampled with different frequencies (x-axis, logarithmic scale), and the mean deviations of the estimated from the true frequencies are reported for ten repetitions of the simulation (y-axis). The different symbols represent data sets with coverages of 400×, 800×, and 1600×. Color indicates whether the genome was fully covered by predicted haplotypes (blue) or not (orange). (B) Performance in distinguishing reconstructed local haplotypes, depending on pairwise distance and coverage. The displayed percentages are the fractions of super-reads that do not match any true haplotype without error. Color encodes the fraction of super-reads that match exactly one true haplotype (100%, orange; , blue; , violet).
Figure 3.
Estimated deletion size deviation and false negative rate for true deletion sizes of (A) 100, (B) 500, and (C) 1000 bp.
For each deletion length and each coverage of 5, 12, 24, 48, 96, and 144×, a boxplot summarizes the deviations of the estimated from the true deletion size in 100 simulated samples. The blue line represents the number of false negatives, that is, the number of the 100 samples in which the deletion was not predicted.
Figure 4.
Global haplotype assembly results.
(A) Minimum, maximum, and mean read lengths and (B) the total number of reads for the global haplotype assembly of the lab-mix, shown for the first 13 iterations and the final iteration (iteration 30).
Table 1.
Global haplotype assembly comparison.