Figure 1.

Overview of genome assembly and annotation improvement process.

Table 1.

Summary of assembly metrics after Pilon improvement.

Table 2.

Summary of annotation changes in protein coding genes.

Figure 2.

Examples of an artifactual insertion and an artifactual deletion that were corrected during the update of the P. brasiliensis Pb03 genome sequence.

Screenshots of Pilon-generated genome browser tracks in GenomeView v1.0 [35] show the evidence used by Pilon to recognize and correct an incorrect insertion in the gene PABG_00120 (left) and an incorrect deletion in the gene PABG_00790 (right). Tracks (top panels) depict paired-end reads (green) aligned to the corresponding region of the reference assembly v1; the reads shown are a subset of the total depth of ∼150X or ∼170X. Pilon used these alignments to refine the consensus sequence, generating the improved Pb03 assembly v2. Dashed red boxes indicate positions in the v1 assembly where the aligned reads suggest a change, due to either a gap (red box) or an insertion (black line). The changes suggested by Pilon are also supported by conservation of the changed bases in a multiple alignment (bottom panels) with the corresponding region of P. brasiliensis Pb18 and P. lutzii Pb01.

Figure 3.

Improved consistency of gene annotation in v2 genomes.

The final predicted gene sets of the three Paracoccidioides strains were clustered using OrthoMCL, for both v1 and v2. (A) The scatterplots compare, for each clustered group, the maximum length versus the minimum length of the three Paracoccidioides genes in the same cluster, contrasting the maximum-minimum pairs from annotation v1 (red points) with those from annotation v2 (blue points). Blue points lie closer to the diagonal, illustrating that annotation v2 was more consistent across the three genomes, with smaller differences in gene length. (B) The rank plots show the difference between maximum and minimum length for each clustered group, for each of the two versions. Here again, annotation v2 (blue line) showed fewer differences (later increase) and smaller differences (more gradual increase), corresponding to the improvement of the genome annotation in v2.
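
The comparison described in this caption can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' analysis code: the function names, the cluster identifiers, and the gene lengths below are all hypothetical, and the input is assumed to be a mapping from each ortholog cluster to the lengths of its member genes.

```python
from typing import Dict, List, Tuple


def length_spread(clusters: Dict[str, List[int]]) -> Dict[str, Tuple[int, int, int]]:
    """For each cluster, return (min_len, max_len, max_len - min_len)."""
    spread = {}
    for cluster_id, lengths in clusters.items():
        if not lengths:
            continue  # skip empty clusters rather than fail on min()/max()
        shortest, longest = min(lengths), max(lengths)
        spread[cluster_id] = (shortest, longest, longest - shortest)
    return spread


def ranked_differences(spread: Dict[str, Tuple[int, int, int]]) -> List[int]:
    """Sort the per-cluster length differences, as one would for a rank plot."""
    return sorted(diff for _, _, diff in spread.values())


# Hypothetical gene lengths (one value per strain) for two made-up clusters.
v1 = {"grp1": [900, 1200, 1210], "grp2": [450, 450, 300]}
v2 = {"grp1": [1200, 1200, 1210], "grp2": [450, 450, 447]}

for name, clusters in (("v1", v1), ("v2", v2)):
    print(name, ranked_differences(length_spread(clusters)))
```

In a scatterplot like panel A, each cluster would contribute one (max_len, min_len) point per annotation version; clusters whose members have identical lengths fall exactly on the diagonal.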

Figure 4.

Diverse error correction for the 90 kDa heat shock protein (HSP90 gene) of Paracoccidioides spp.

(A) In this example, different annotation errors were present in v1 of all three Paracoccidioides reference strains, all of which were fixed in v2 after Pilon improvement and re-annotation. The example also illustrates how one or more single-nucleotide errors, unknown single nucleotides (N's), or single nucleotides that were erroneously reported as absent or duplicated by a Sanger sequencer can be amplified during annotation, generating radically different gene structure (intron/exon and/or gene boundary) predictions. (B) Five changes are shown at the assembly (DNA sequence) level, one of which was a single-nucleotide error in a stop codon; as a result, the gene-calling program did not recognize the end of an exon and it was not reported.
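
The effect of a single-nucleotide error in a stop codon can be shown with a toy example. The sketch below is not the gene-calling program used in the study; it is a minimal in-frame stop-codon scan, and the function name and both sequences are invented for illustration.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq: str, start: int = 0) -> Optional[int]:
    """Return the index of the first in-frame stop codon, or None if absent."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3].upper() in STOP_CODONS:
            return i
    return None


# Hypothetical sequences: the second differs by one base (TAA -> TAC).
correct = "ATGGCCTAAGGATGA"
erroneous = "ATGGCCTACGGATGA"

print(first_stop(correct))    # stop found at the third codon
print(first_stop(erroneous))  # stop missed; scan runs on to the next one
```

With the substituted base, the scan no longer sees the original stop and continues to a later one, which is the kind of shift in predicted boundaries that a single assembly error can cause.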

Table 3.

Changes in updated annotations of known yeast-phase specific genes or virulence factors of Paracoccidioides and other dimorphic human pathogenic fungi.
