
Figure 1.

Heptanucleotide Signature Based Cladogram.

Cladogram derived from heptanucleotide signatures using Euclidean distances between 1,424 sequenced microbes. Terminal branches are color-coded by nearest-neighbor taxonomic relationship: strong relationships (same species or same genus) in red, good relationships (phylum or better) in blue, same domain in yellow, and different domain in black. This figure shows that heptanucleotide signatures are conserved among phylogenetically similar organisms across the tree of life. The tendency for phylogenetically similar organisms to maintain similar oligonucleotide biases is the basis of oligonucleotide-based clustering techniques.
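
As an illustration of the distance underlying this cladogram, the following is a minimal sketch. It assumes a signature is the vector of relative k-mer frequencies over the alphabet A, C, G, T; the caption does not state the exact normalization used, and the function names and the two short test sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_signature(seq, k):
    """Relative frequency of every possible k-mer (4**k bins) in seq.

    Assumption: windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def euclidean_distance(sig_a, sig_b):
    """Euclidean distance between two signatures of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


# Toy sequences, invented for illustration only.
a = kmer_signature("ACGTACGTACGTTTGACA", 2)
b = kmer_signature("ACGTACGTACGAATGACA", 2)
print(len(a), euclidean_distance(a, b))
```

For heptanucleotides, `k = 7` gives 16,384 bins per signature; a matrix of such pairwise distances is the kind of input a cladogram construction would take.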

Figure 2.

Oligonucleotide vs. 16S rRNA Comparisons.

The ability to place phylogenetically similar organisms together on a cladogram using mononucleotide through nonanucleotide signatures was tested against a cladogram generated using 16S rRNA for 1,424 completed prokaryotic genomes. This figure shows the percentage of correct cladogram placements for oligonucleotide signatures (x-axis) versus the percentage of correct cladogram placements for 16S rRNA (y-axis). Taxonomic level is shown along the top axis: same species (S), same genus (G), same family (F), same phylum (P), and same domain (D). Trend lines for mononucleotide through nonanucleotide signatures are color-coded (see figure legend).

Figure 3.

Improvement in Placement vs. CPU Time.

The summed percent improvement in placing identical taxonomic levels together on a cladogram as oligonucleotide length is increased, versus the increase in CPU time required to calculate all Euclidean distances between 1,424 genomes. CPU time increases are due to the exponential increase in signature bins (and therefore in the variables of each Euclidean distance calculation) as oligonucleotide length increases.
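
The exponential growth in bins can be made concrete with a short calculation. This sketch assumes one bin per possible oligonucleotide over a four-letter alphabet (no merging of reverse complements) and counts each unordered genome pair once; the product printed in the last column is only a rough proxy for work, not a measured CPU time.

```python
# Number of signature bins for oligonucleotide length k over {A, C, G, T}.
bins = {k: 4 ** k for k in range(1, 10)}

n_genomes = 1424
n_pairs = n_genomes * (n_genomes - 1) // 2  # unordered genome pairs

for k, n_bins in bins.items():
    # One squared difference per bin per pair: a cost proxy, not a timing.
    print(k, n_bins, n_pairs * n_bins)
```

Under these assumptions, moving from tetranucleotides (256 bins) to heptanucleotides (16,384 bins) multiplies the number of terms in each distance calculation by 64.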

Figure 4.

Tetranucleotide & Heptanucleotide vs. 16S rRNA Identity.

Plot of 16S percent identity versus genus-normalized Euclidean distance for tetranucleotide (A) and heptanucleotide (B) signatures. Points are colored by the highest taxonomic level shared by the two organisms being compared: same species (orange), same genus (purple), same family (green), same order (red), same phylum (blue), same domain (yellow), and different domain (black). Vertical lines are added at a Euclidean distance of 0.3 for visual reference. By plotting 16S identity versus Euclidean distance, this plot demonstrates the range of oligonucleotide Euclidean distances useful for discerning the taxonomic relationships between sequences. Additionally, it shows that low oligonucleotide Euclidean distances are a strong indicator that sequences are from phylogenetically close organisms.

Figure 5.

Tetranucleotide vs. Heptanucleotide.

Plot of tetranucleotide versus heptanucleotide Euclidean distance for 1,424 genomes, shown up to a genus-normalized Euclidean distance of 2.0 (A) and 0.20 (B). Points are colored by the highest taxonomic level shared by the two organisms being compared: same species (orange), same genus (purple), same family (green), same order (red), same phylum (blue), same domain (yellow), and different domain (black). Plots include a 1∶1 line to mark equivalence between tetranucleotide and heptanucleotide Euclidean distances. These plots show that heptanucleotide signatures give lower Euclidean distances for closely related organisms (same genus or species), whereas tetranucleotide signatures tend to give the shorter Euclidean distances for distantly related organisms.

Figure 6.

Leave-one-out Histograms.

Histograms show the results of a leave-one-out analysis in which the oligonucleotide-based Euclidean distance was calculated between all organisms (excluding self-comparisons), and the percentage of organism matches with identical taxonomy was binned by genus-normalized Euclidean distance for tetranucleotide (A) and heptanucleotide (B) signatures. Bars are colored by the highest taxonomic level shared by the two organisms being compared: same species (orange), same genus (purple), same family (green), same order (red), same phylum (blue), same domain (yellow), and different domain (black). These plots are useful for estimating the likelihood of a taxonomic match between unknown sequences: once the Euclidean distance between two unknown sequences has been calculated, the percentages in the corresponding bin indicate how likely the sequences are to share each taxonomic level.
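
The binning step described here might be sketched as follows. This is a simplified illustration, not the analysis code behind the figure: it tracks a single taxonomic level (genus) rather than all seven, takes precomputed distances as input, and uses invented labels and distances. The function name and the bin width are hypothetical.

```python
from collections import defaultdict


def match_rate_by_distance(distances, genera, bin_width):
    """Percent of pairs sharing a genus, per distance bin.

    distances: dict mapping (i, j) index pairs to a Euclidean distance.
    genera: list of genus labels indexed by organism.
    Returns {bin_index: percent}, where bin_index = floor(d / bin_width).
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for (i, j), d in distances.items():
        if i == j:
            continue  # leave out self-comparisons
        b = int(d // bin_width)
        totals[b] += 1
        if genera[i] == genera[j]:
            matches[b] += 1
    return {b: 100.0 * matches[b] / totals[b] for b in sorted(totals)}


# Toy input, invented for illustration only.
genera = ["Escherichia", "Escherichia", "Bacillus", "Bacillus"]
distances = {
    (0, 1): 0.05, (2, 3): 0.08, (1, 2): 0.12,
    (0, 2): 0.45, (0, 3): 0.47, (1, 3): 0.48,
}
rates = match_rate_by_distance(distances, genera, bin_width=0.1)
print(rates)
```

In this toy input, every pair in the lowest-distance bin shares a genus and no pair in the higher bins does, which mirrors the way the histograms are meant to be read.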

Figure 7.

Random Mutations.

This figure shows how a one million base pair DNA sequence responds to random mutations. Euclidean distance from the initial sequence is plotted versus iteration number for tetranucleotide (A) and heptanucleotide (B) signatures. Panel C shows tetranucleotide versus heptanucleotide Euclidean distance by iteration, with a 1∶1 line (red) to show equivalence. These plots show that, compared to tetranucleotide signatures, heptanucleotide signatures increase faster in Euclidean distance in response to small changes in the DNA sequence, but level off and respond little to changes beyond approximately 600,000 iterations. Conversely, tetranucleotide signatures show smaller increases in Euclidean distance in response to small perturbations in the DNA sequence, but continue to fluctuate out to one million iterations.
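
A scaled-down version of this kind of experiment can be sketched as below. Several choices are assumptions not stated in the caption: the starting sequence is uniformly random, each iteration substitutes one randomly chosen position with a different base, and the signature is the relative k-mer frequency. With a random starting sequence the curves should not be expected to match the figure; the sketch only shows the mechanics.

```python
import random
from collections import Counter
from math import sqrt


def signature(seq, k):
    """Sparse relative k-mer frequencies of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def distance(sig_a, sig_b):
    """Euclidean distance between two sparse signatures."""
    keys = set(sig_a) | set(sig_b)
    return sqrt(sum((sig_a.get(x, 0.0) - sig_b.get(x, 0.0)) ** 2 for x in keys))


def mutation_trajectory(length, iterations, k, step, seed=0):
    """Distance from the initial sequence, sampled every `step` mutations."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGT") for _ in range(length)]  # assumed random start
    start = signature("".join(seq), k)
    trajectory = [(0, 0.0)]
    for it in range(1, iterations + 1):
        pos = rng.randrange(length)
        # Assumed mutation model: substitute with a different base.
        seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
        if it % step == 0:
            trajectory.append((it, distance(start, signature("".join(seq), k))))
    return trajectory


traj = mutation_trajectory(length=2000, iterations=500, k=4, step=100, seed=1)
for iteration, d in traj:
    print(iteration, d)
```

Running the same function with `k=4` and `k=7` on identical seeds would give paired trajectories comparable in form to panels A and B.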

Figure 8.

Metagenomic Sized Fragments.

Completed prokaryotic genomes were broken into metagenomically relevant fragment sizes of 1,000 bp, 2,500 bp, 5,000 bp, 10,000 bp, 15,000 bp, 25,000 bp, and 50,000 bp by extracting a random fragment of each length from each of the 1,424 genomes. The tetranucleotide- and heptanucleotide-based Euclidean distances were calculated between all fragments, and these distances were used to construct cladograms. Each cladogram was analyzed for the percentage of organisms with a nearest neighbor belonging to the same genus, and this percentage is plotted versus fragment length. Improvement is seen as fragment length is increased, but the improvement levels off at approximately 10,000 bp for tetranucleotide signatures and approximately 5,000 bp for heptanucleotide signatures, with heptanucleotide signatures performing better at all fragment lengths.
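
The fragment extraction step could look roughly like this. It is a sketch under assumptions: the genome is treated as a linear string (no wrap-around for circular chromosomes), the start position is drawn uniformly, and a random 60,000 bp string stands in for a real genome. The function name is hypothetical.

```python
import random

# Fragment lengths (bp) listed in the caption.
FRAGMENT_LENGTHS = [1000, 2500, 5000, 10000, 15000, 25000, 50000]


def random_fragment(genome, length, rng):
    """Return one contiguous fragment of `length` bp from a random start."""
    if length > len(genome):
        raise ValueError("fragment is longer than the genome")
    start = rng.randrange(len(genome) - length + 1)
    return genome[start:start + length]


rng = random.Random(42)
# Stand-in genome, invented for illustration only.
genome = "".join(rng.choice("ACGT") for _ in range(60000))
fragments = {n: random_fragment(genome, n, rng) for n in FRAGMENT_LENGTHS}
print({n: len(f) for n, f in fragments.items()})
```

Each fragment would then be converted to a tetranucleotide or heptanucleotide signature before distances are computed.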
