Figure 1.
The distribution of projects among the 12 sequencing methods used.
Methods with more than 5 sequenced projects, which were used in downstream analysis, are shown in dark green.
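The selection rule in Figure 1 keeps only methods with more than 5 projects. It can be sketched in Python as follows. The input format (a flat list with one method label per project) and the labels `A`, `B` and `C` are hypothetical placeholders, not data from this study.

```python
from collections import Counter


def methods_for_analysis(project_methods, min_projects=5):
    """Return the methods with strictly more than `min_projects` projects.

    `project_methods` is a hypothetical list holding one sequencing-method
    label per genome project.
    """
    counts = Counter(project_methods)
    return sorted(m for m, n in counts.items() if n > min_projects)


# Toy input: method A has 7 projects, B has exactly 5, C has 8.
projects = ["A"] * 7 + ["B"] * 5 + ["C"] * 8
print(methods_for_analysis(projects))  # ['A', 'C']
```

Because the threshold is "more than 5", a method with exactly 5 projects (B above) is excluded.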
Table 1.
Methods used in this comparison.
Figure 2.
Assembly quality as assessed by the number of scaffolds in draft assemblies.
Data is shown for the six sequencing methods with more than 5 projects. Boxes span the lower to upper quartile, the thick black line marks the median, and the minimum and maximum values are also indicated.
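The statistics drawn in each box plot are the minimum, lower quartile, median, upper quartile and maximum. They can be computed as below. The figure does not state which quartile convention was used. This sketch assumes Python's `statistics.quantiles` with `method="inclusive"`, and the scaffold counts are made up.

```python
import statistics


def box_summary(values):
    """Five-number summary matching the box-plot elements in the caption.

    Assumption: quartiles use the 'inclusive' convention of
    statistics.quantiles; other tools may interpolate differently.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


scaffold_counts = [12, 30, 45, 60, 150]  # toy per-project scaffold counts
print(box_summary(scaffold_counts))
```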
Figure 3.
Assembly quality for the draft genomes included in this analysis.
Assembly quality is assessed by (a) the number of gaps in the draft assemblies, and (b) gap size expressed as a percentage of genome length. Data is shown for the six sequencing methods with more than 5 projects.
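The two gap metrics can be sketched as follows. This assumes that gaps are encoded as runs of `N` (or `n`) in the scaffold sequences and that the caller supplies the genome length used as the denominator. Both the encoding and the function name `gap_stats` are assumptions for illustration.

```python
import re

# Assumption: each maximal run of N/n inside a scaffold is one gap.
GAP_RUN = re.compile(r"[Nn]+")


def gap_stats(scaffolds, genome_length):
    """Return (number of gaps, total gap length as % of genome_length)."""
    gap_lengths = [
        m.end() - m.start() for seq in scaffolds for m in GAP_RUN.finditer(seq)
    ]
    percent = 100.0 * sum(gap_lengths) / genome_length
    return len(gap_lengths), percent


# Toy scaffolds: gaps of length 4, 2 and 6 (12 bp total) in a 1,000 bp genome.
print(gap_stats(["ACGTNNNNACGT", "GGGNNCCCNNNNNNA"], 1000))  # (3, 1.2)
```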
Figure 4.
Genes missed in draft assemblies.
Data is shown for the sequencing methods with more than 5 projects. (a) Missed gene sequences, i.e., the number of genes in the finished genome whose nucleotide sequence is absent from the draft assembly. (b) Unrecognized genes, i.e., the number of genes whose nucleotide sequence is present in the draft assembly but that were not predicted by Prodigal (v2.5).
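The split between missed and unrecognized genes reduces to set operations once two facts are known for each finished-genome gene. The first is whether its nucleotide sequence occurs in the draft. The second is whether a Prodigal prediction on the draft corresponds to it. How those memberships are determined is not specified here. The sketch below takes them as precomputed sets of hypothetical gene IDs.

```python
def classify_genes(finished_genes, present_in_draft, predicted_in_draft):
    """Split finished-genome genes into (a) missed and (b) unrecognized.

    All arguments are sets of hypothetical gene IDs:
      finished_genes     - genes annotated in the finished genome
      present_in_draft   - genes whose nucleotide sequence occurs in the draft
      predicted_in_draft - genes matched by a gene prediction on the draft
    """
    missed = finished_genes - present_in_draft
    unrecognized = (finished_genes & present_in_draft) - predicted_in_draft
    return missed, unrecognized


finished = {"g1", "g2", "g3", "g4"}
missed, unrecognized = classify_genes(finished, {"g1", "g2", "g3"}, {"g1"})
print(sorted(missed), sorted(unrecognized))  # ['g4'] ['g2', 'g3']
```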
Figure 5.
Misassemblies as detected by low gene quality.
Low-quality genes are genes in the finished genome that have tBLASTn similarity to the draft genome, but for which either the alignment is short (<50% of the gene length) or the identity is <90%. Data is shown for the six sequencing methods with more than 5 projects.
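The low-quality criterion can be expressed as a predicate on the tBLASTn hit of each finished-genome gene. This sketch assumes one best hit per gene, described by query coordinates and percent identity as in BLAST+ tabular output (`qstart`, `qend`, `pident`). The `Hit` class and the `is_low_quality` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Best tBLASTn hit of one gene (hypothetical structure)."""
    q_start: int   # 1-based alignment start on the query protein
    q_end: int     # 1-based alignment end on the query protein
    pident: float  # percent identity of the alignment


def is_low_quality(gene_length_aa: int, hit: Optional[Hit]) -> bool:
    """Apply the caption's thresholds: coverage < 50% or identity < 90%.

    gene_length_aa is the protein length, because tBLASTn query
    coordinates are in amino acids.
    """
    if hit is None:
        return False  # no similarity at all: outside this definition
    coverage = (hit.q_end - hit.q_start + 1) / gene_length_aa
    return coverage < 0.5 or hit.pident < 90.0


print(is_low_quality(300, Hit(1, 120, 99.0)))  # True (40% coverage)
print(is_low_quality(300, Hit(1, 290, 98.5)))  # False
```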
Figure 6.
Distributions of functions, based on COG group assignments, of gene sequences missing in draft assemblies.
Data is shown for six sequencing technologies; Illumina PacBio is omitted because its eight genome projects currently have no missing genes.
Table 2.
Correlation of the number of contigs with genome GC%, repeat content, and size.
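Table 2 does not state which correlation coefficient was used. As an assumption, the sketch below computes Spearman's rank correlation between per-project contig counts and one genome property. Spearman's coefficient is the Pearson correlation of the ranks, with tied values given their average rank. The values are toy numbers.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either input is constant."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


contigs = [10, 25, 40, 80]             # toy contig counts per project
gc_percent = [35.0, 42.0, 51.0, 66.0]  # toy GC% for the same projects
print(spearman(contigs, gc_percent))   # 1.0
```

In practice a library routine such as SciPy's `scipy.stats.spearmanr` would also report a p-value.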