The Fast Changing Landscape of Sequencing Technologies and Their Impact on Microbial Genome Assemblies and Annotation

doi:10.1371/journal.pone.0048837

Figure 1.

The distribution of projects among the 12 sequencing methods used.

Dark green marks the sequencing methods that have more than 5 sequenced projects and were used in downstream analysis.

Table 1.

Methods used in this comparison.

Figure 2.

Assembly quality as assessed by the number of scaffolds in draft assemblies.

Data is shown for the six sequencing methods with more than 5 projects. Boxes span the lower to upper quartile, the thick black line marks the median, and the minimum and maximum values are also indicated.
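
The summary statistics named in this caption (minimum, quartiles, median, maximum) can be reproduced from a list of per-project scaffold counts. The following is a minimal sketch, not the authors' plotting code: the function name `five_number_summary` is hypothetical, and the "inclusive" quartile convention is an assumption, since the caption does not say which convention the figure uses.

```python
import statistics


def five_number_summary(scaffold_counts):
    """Return (min, lower quartile, median, upper quartile, max).

    Assumption: quartiles use the 'inclusive' convention of
    statistics.quantiles; the figure may use a different one.
    """
    values = sorted(scaffold_counts)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]


if __name__ == "__main__":
    # Hypothetical scaffold counts for one sequencing method.
    print(five_number_summary([12, 30, 45, 7, 88, 19]))
```

Applying the function once per sequencing method gives the values needed to draw one box per method.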

Figure 3.

Assembly quality for the draft genomes included in this analysis.

Assembly quality is assessed by (a) the number of gaps in the draft assemblies, and (b) gap size expressed as a percentage of genome length. Data is shown for the six sequencing methods with more than 5 projects.
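
As an illustration of the two measures in this caption, the sketch below counts gaps and expresses their total size as a percentage of genome length. It rests on two assumptions that the caption does not state: gaps are represented as runs of `N` characters inside scaffold sequences, and the caller supplies the genome length to normalize by. The function name `gap_stats` is hypothetical.

```python
import re


def gap_stats(scaffolds, genome_length):
    """Return (number of gaps, total gap size as % of genome length).

    Assumption: each run of N/n characters in a scaffold is one gap.
    """
    gap_lengths = [
        len(match.group())
        for sequence in scaffolds
        for match in re.finditer(r"[Nn]+", sequence)
    ]
    gap_percent = 100.0 * sum(gap_lengths) / genome_length
    return len(gap_lengths), gap_percent


if __name__ == "__main__":
    # Two toy scaffolds with one gap each, in a 100 bp "genome".
    print(gap_stats(["ACGTNNNNACGT", "ACNNGT"], genome_length=100))
```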

Figure 4.

Genes missed in draft assemblies.

Data is shown for the sequencing methods with more than 5 projects. (a) Missed gene sequences, i.e., the number of genes in the finished genome whose nucleotide sequence is absent from the draft assembly. (b) Unrecognized genes, i.e., the number of genes whose nucleotide sequence is present in the draft assembly but that were not predicted by Prodigal (v2.5).

Figure 5.

Misassemblies as detected by low gene quality.

Low-quality genes are genes present in the finished genome that had a tBLASTn hit in the draft genome, but whose alignment was either short (<50% of the gene length) or had <90% identity. Data is shown for the six sequencing methods with more than 5 projects.
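
The low-quality criterion in this caption reduces to two threshold checks per gene. The sketch below applies them to a single alignment; the function name and parameters are hypothetical, and it assumes the alignment length and gene length are given in the same units (for example, amino acids for a tBLASTn hit).

```python
def is_low_quality(alignment_length, gene_length, percent_identity,
                   min_coverage=0.5, min_identity=90.0):
    """Flag a gene whose best hit in the draft is short or divergent.

    A gene is low quality if the alignment covers less than 50% of the
    gene length or the identity is below 90% (thresholds from the caption).
    """
    coverage = alignment_length / gene_length
    return coverage < min_coverage or percent_identity < min_identity


if __name__ == "__main__":
    print(is_low_quality(40, 100, 99.0))  # short alignment
    print(is_low_quality(80, 100, 85.0))  # low identity
    print(is_low_quality(80, 100, 95.0))  # passes both checks
```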

Figure 6.

Distributions of functions, based on COG group assignments, of gene sequences missing in draft assemblies.

Data is shown for six sequencing technologies. Illumina PacBio is omitted: it currently has only eight genome projects, none of which has missing genes.

Table 2.

Correlation of the number of contigs with genome GC%, repeat content, and size.
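
A correlation of this kind can be sketched as follows. The caption does not say which coefficient the table reports, so the Pearson coefficient used here is an assumption, and the helper names `gc_percent` and `pearson_r` are hypothetical.

```python
import math


def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-genome values: GC% and number of contigs.
    gc_values = [35.0, 50.0, 65.0, 70.0]
    contig_counts = [20, 35, 80, 95]
    print(pearson_r(gc_values, contig_counts))
```

The same `pearson_r` call can be repeated with repeat content or genome size in place of GC%.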
