Figure 1.

Time taken by k-mer counting tools to calculate k-mer abundance histograms, with time (y axis, in seconds) plotted against data set size (x axis, in number of reads).

All programs ran in time approximately linear in the number of input reads.

Figure 2.

Memory usage of k-mer counting tools when calculating k-mer abundance histograms, with maximum resident program size (y axis, in GB) plotted against the total number of distinct k-mers in the data set (x axis, billions of k-mers).

Table 1.

Benchmark soil metagenome data sets for k-mer counting performance, taken from [11].

Figure 3.

Disk storage used by different k-mer counting tools when calculating k-mer abundance histograms (y axis, in GB), plotted against the number of distinct k-mers in the data set (x axis).

Note that khmer does not use the disk during counting or retrieval, although its hash tables can be saved for reuse.

Figure 4.

Time for several k-mer counting tools to retrieve the counts of 9.7 million randomly chosen k-mers (y axis), plotted against the number of distinct k-mers in the data set being queried (x axis).

BFCounter, DSK, Turtle, KAnalyze, and KMC do not support this functionality.

Figure 5.

Relation between average miscount (the amount by which the count for a k-mer is incorrect) on the y axis and false positive rate on the x axis, for five data sets.

The five data sets were chosen to have the same total number of distinct k-mers: one metagenome data set; a set of randomly generated k-mers; a set of reads, chosen with 3x coverage and 1% error, from a randomly generated genome; a simulated set of error-free reads (3x coverage) chosen from a randomly generated genome; and a set of E. coli reads.

Table 2.

Data sets used for analyzing miscounts.

Figure 6.

Relation between percent miscount (the amount by which the count for a k-mer is incorrect, relative to its true count) on the y axis and false positive rate on the x axis, for five data sets.

The five data sets are the same as in Figure 5.
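
As a rough illustration of the two quantities plotted in Figures 5 and 6, the sketch below computes an average miscount and a percent miscount from true and reported k-mer counts. The averaging convention (a mean over all distinct k-mers, with the percent value taken as the mean per-k-mer relative error) is an assumption made for illustration, since the captions do not specify it. The function names and the toy counts are hypothetical.

```python
def average_miscount(true_counts, reported_counts):
    """Mean absolute difference between reported and true counts.

    Assumption: the mean is taken over every distinct k-mer in true_counts.
    """
    diffs = [abs(reported_counts.get(kmer, 0) - n)
             for kmer, n in true_counts.items()]
    return sum(diffs) / len(diffs)


def percent_miscount(true_counts, reported_counts):
    """Mean per-k-mer miscount relative to the true count, in percent.

    Assumption: every true count is at least 1, so the division is safe.
    """
    rel = [abs(reported_counts.get(kmer, 0) - n) / n
           for kmer, n in true_counts.items()]
    return 100.0 * sum(rel) / len(rel)


# Hypothetical toy data: two of three k-mers are overcounted by one.
true_counts = {"ACGT": 2, "CGTA": 4, "GTAC": 1}
reported_counts = {"ACGT": 2, "CGTA": 5, "GTAC": 2}

print(average_miscount(true_counts, reported_counts))
print(percent_miscount(true_counts, reported_counts))
```

The toy data show why the two measures can diverge: the same miscount of one is a 25% error for a k-mer seen four times but a 100% error for a k-mer seen once.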

Figure 7.

Number of unique k-mers (y axis) by starting position within read (x axis) in an untrimmed E. coli 100-bp Illumina shotgun data set, for k = 17 and k = 32.

The increasing number of unique k-mers is a sign of increasing sequencing error towards the 3′ end of reads. Note that there are only 69 starting positions for 32-mers in a 100-base read.
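
The figure of 69 starting positions follows from the fact that a read of length L contains L - k + 1 k-mers. A minimal Python sketch checks this arithmetic and shows how k-mers could be enumerated by starting position; the function names are illustrative and are not taken from khmer.

```python
def num_kmer_starts(read_length, k):
    """Number of positions at which a k-mer can start in a read."""
    return max(read_length - k + 1, 0)


def kmers_by_position(read, k):
    """Yield (start_position, k-mer) pairs for one read, 0-based."""
    for start in range(num_kmer_starts(len(read), k)):
        yield start, read[start:start + k]


print(num_kmer_starts(100, 32))  # 69
print(num_kmer_starts(100, 17))  # 84
```

Grouping the k-mers from all reads by `start_position` and counting those that occur only once would give a per-position tally of the kind plotted in Figure 7.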

Table 3.

Iterative low-memory k-mer trimming.

Table 4.

Low-memory digital normalization.

Table 5.

E. coli genome assembly after low-memory digital normalization.
