Assembly-Driven Community Genomics of a Hypersaline Microbial Ecosystem

doi:10.1371/journal.pone.0061692

Figure 1.

Length-weighted %G+C nucleotide composition of unassembled reads, assembled scaffolds, and composite population genomes.

Genomes were constructed by targeted assembly of scaffolds with a uniform signature of phylogenetic binning properties, as described in Materials and Methods. Genome names, percent G+C, and other general properties of assembled genomes are shown in Table 1.
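
As an illustration of what "length-weighted" means in this caption, the following is a minimal sketch, not the authors' implementation. It assumes each read or scaffold is available as a plain nucleotide string; the function name `length_weighted_gc_histogram` and the `bin_width` parameter are hypothetical. Each sequence contributes to its %G+C bin in proportion to its number of called bases rather than counting once per sequence.

```python
from collections import defaultdict


def length_weighted_gc_histogram(sequences, bin_width=5):
    """Return {bin_start_percent: fraction_of_all_called_nucleotides}.

    Hypothetical helper: bins sequences by %G+C and weights each bin
    by sequence length instead of by sequence count.
    """
    weights = defaultdict(int)
    total = 0
    for seq in sequences:
        seq = seq.upper()
        gc = seq.count("G") + seq.count("C")
        # Only A, C, G, T are counted; N and other ambiguity codes are ignored.
        called = gc + seq.count("A") + seq.count("T")
        if called == 0:
            continue
        percent_gc = 100.0 * gc / called
        bin_start = int(percent_gc // bin_width) * bin_width
        weights[bin_start] += called  # weight by length, not by count
        total += called
    if total == 0:
        return {}
    return {b: w / total for b, w in sorted(weights.items())}
```

With this weighting, a single long scaffold influences the distribution as much as the many short reads that would cover the same number of bases, which is what makes the three sequence sets in the figure comparable.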

Table 1.

Consensus population genome properties.

Figure 2.

Phylogenetic distribution of archaeal 16S rRNA gene sequences in assembled scaffolds and population genomes.

Names in bold indicate new 16S rRNA sequences identified in this study. Boxed names indicate sequences contained within Lake Tyrrell-specific population genomes. Asterisks indicate isolated individual sequences found on small scaffolds that were not associated with any assembled population genome.

Figure 3.

Relative abundance of microbial population groups.

Colors correspond to taxonomically related microbial populations, including both assembled genome sequences and non-genomic scaffolds containing less abundant variant sequences. Percentages are based on the total number of assembled nucleotides in reads associated with each group, normalized for the group's average genome size. The percentage of unclassified sequences was calculated using an estimated genome size of 3 MB, the approximate abundance-weighted average for all other groups. Known viral and plasmid sequences, representing approximately 0.2% of assembled nucleotides, have been excluded from these calculations.
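
The genome-size normalization described in this caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name `relative_abundance` and its argument names are hypothetical, and the 3,000,000-nucleotide default stands in for the estimated genome size the caption applies to unclassified sequences. Each group's nucleotide total is divided by its average genome size to give "genome equivalents", which are then converted to percentages.

```python
def relative_abundance(assembled_nt, genome_size, default_genome_size=3_000_000):
    """Percent abundance per group after genome-size normalization.

    assembled_nt: {group: assembled nucleotides in reads assigned to the group}
    genome_size:  {group: average genome size in nucleotides}
    Groups missing from genome_size (e.g. unclassified sequences) fall back
    to default_genome_size, an assumed value taken from the caption.
    """
    equivalents = {
        group: nt / genome_size.get(group, default_genome_size)
        for group, nt in assembled_nt.items()
    }
    total = sum(equivalents.values())
    if total == 0:
        return {group: 0.0 for group in equivalents}
    return {group: 100.0 * eq / total for group, eq in equivalents.items()}
```

Without this normalization, a population with a large genome would appear more abundant than a population with the same number of cells but a smaller genome.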

Figure 4.

Phylogenetic distribution of protein BLAST matches for assembled population genomes and unclassified scaffolds.

The taxonomic distribution of non-self matches against the GenBank nr database was calculated using the DarkHorse algorithm at a filter threshold setting of 0.05, including only alignments covering at least 70% of both query and target sequences with an e-value of 1e-5 or better.
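
The alignment pre-filter stated in this caption (at least 70% coverage of both query and target, e-value of 1e-5 or better) can be expressed as a simple predicate. The sketch below is an assumption-laden illustration: the `Hit` record and its field names are hypothetical, coordinates are assumed to be 1-based and inclusive, and coverage is computed from the aligned span. It does not reproduce DarkHorse itself or its 0.05 filter threshold setting.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one protein BLAST alignment."""
    query_len: int
    target_len: int
    query_start: int   # 1-based, inclusive
    query_end: int
    target_start: int  # 1-based, inclusive
    target_end: int
    evalue: float


def passes_filter(hit, min_coverage=0.70, max_evalue=1e-5):
    """True if the alignment spans enough of BOTH sequences and is significant."""
    query_cov = (hit.query_end - hit.query_start + 1) / hit.query_len
    target_cov = (hit.target_end - hit.target_start + 1) / hit.target_len
    return (
        query_cov >= min_coverage
        and target_cov >= min_coverage
        and hit.evalue <= max_evalue
    )
```

Requiring coverage on both sides excludes matches where a short query aligns to a single domain of a much longer database protein, which would otherwise suggest a misleading taxonomic affiliation.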

Figure 5.

Metabolic connectivity graph showing community distribution of protein family clusters.

Cohesive populations are shown as similarly colored nodes and vectors according to numbers of shared features, based on unsupervised protein family clustering of 12 habitat-specific genomes.
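
The "numbers of shared features" underlying such a graph can be illustrated with a short sketch. It assumes the output of protein family clustering is already available as a set of family identifiers per genome; the function name `shared_family_counts` and the input layout are hypothetical, and the clustering method and graph layout used for the figure are not reproduced here.

```python
from itertools import combinations


def shared_family_counts(families_by_genome):
    """Return {(genome_a, genome_b): number of protein families present in both}.

    families_by_genome: {genome_name: set of protein family cluster IDs}
    Pairs are keyed in sorted name order so each pair appears once.
    """
    return {
        (a, b): len(families_by_genome[a] & families_by_genome[b])
        for a, b in combinations(sorted(families_by_genome), 2)
    }
```

Each pair count could then serve as an edge weight between two genome nodes, so that genomes sharing many families sit close together in the graph.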

Table 2.

Population-unique protein family clusters.
