Fig 1.
Quality metrics of low-coverage SAG assemblies.
A faceted plot of histograms of the quality metrics used to describe the assembled SAGs. The facets display the following metrics: A) total number of contigs, B) total assembled length (in base pairs), C) length of the longest contig in each assembly (in base pairs), D) CheckM-estimated completeness (as percentage), and E) GC content. Tukey five-number summaries (minimum, 25% quantile, median, 75% quantile, maximum) are overlaid on each metric’s panel.
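As an illustration of the five-number summaries overlaid on each panel, the following is a minimal sketch using only the Python standard library. The function name `five_number_summary` and the input values are hypothetical, and the quantile convention (`method="inclusive"`) is an assumption; the convention used for the figure is not stated, and other conventions (for example Tukey's hinges) can give slightly different quartiles on small samples.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, 25% quantile, median, 75% quantile, maximum).

    Assumption: quartiles use the "inclusive" interpolation method of
    statistics.quantiles; the figure may have used another convention.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values to compute quartiles")
    # n=4 yields the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


if __name__ == "__main__":
    # Hypothetical contig counts for five assemblies (not data from the study).
    contig_counts = [5, 1, 3, 2, 4]
    print(five_number_summary(contig_counts))
```

The same function could be applied to each metric in turn (contig count, assembled length, longest contig, completeness, GC content) to produce the values overlaid on the corresponding panel.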
Fig 2.
SAGs increase phylogenetic diversity and contain distinct genomic features.
The central part of this circular figure contains a heat tree reflecting the number of SAG assemblies placed at different sub-branches of the GTDB v86 bacterial genome tree (represented by node size), and the percentage phylogenetic gain achieved by the insertion of the new genome assemblies (represented by color scale). The outer rings of the figure contain additional genomic feature information inferred about the successfully placed SAG assemblies. Proceeding radially outwards, the additional markings denote the predicted CRISPR-Cas system type and the confidence of this prediction (ring of single point symbols of variable opacity), estimates of genome completeness (inner ring of barplots), the number of genes contributing to predicted biosynthetic gene clusters (ring of colored polygons), and the number of open reading frames contributing novel entries to the gene catalog when compared against a previously published resource (outer ring of barplots).
Fig 3.
A gene catalog derived from SAGs shows substantial novelty when compared against other microbiome gene catalogs.
Venn diagrams reflect the shared and unique counts of genes when comparing the set of non-redundant genes from this study’s data against previously published gene catalogs derived from metagenomic sequencing efforts in A) mice, B) humans, and C) marine samples.
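The shared and unique counts behind a two-set Venn diagram can be sketched with plain set operations. This is only an illustration: the function name `venn_counts` and the gene identifiers are hypothetical, and it assumes genes from both catalogs have already been mapped to common cluster identifiers, a step the caption does not describe.

```python
def venn_counts(catalog_a, catalog_b):
    """Count genes unique to each catalog and genes shared by both.

    Assumption: each gene is represented by an identifier that is
    comparable across catalogs (e.g. a shared cluster ID).
    """
    a, b = set(catalog_a), set(catalog_b)
    return {
        "only_first": len(a - b),   # genes found only in the first catalog
        "shared": len(a & b),       # genes found in both catalogs
        "only_second": len(b - a),  # genes found only in the second catalog
    }


if __name__ == "__main__":
    # Hypothetical cluster identifiers, not data from the study.
    sag_genes = ["c1", "c2", "c3", "c4"]
    published_genes = ["c3", "c4", "c5"]
    print(venn_counts(sag_genes, published_genes))
```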
Fig 4.
Taxonomic classifiers perform better on murine metagenomic samples when their reference databases are augmented with SAG data.
Boxplots and swarmplots show the performance metrics of two classifiers (left column: mean marker gene coverage by MIDAS; right column: total number of k-mer hashes assigned by sourmash) on three test metagenomic inputs (first row: samples from DNR mice; second row: samples from other lab mice; third row: samples from wild mice). Brackets over the boxplots display FDR-adjusted (Benjamini-Hochberg procedure) p-values from pairwise comparisons with the unpaired Wilcoxon rank-sum test (MIDAS) and the paired Wilcoxon signed-rank test (sourmash). Text over significance brackets is colored red where the adjusted p-value is less than 0.1.
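The Benjamini-Hochberg adjustment named in this legend can be sketched as follows. This is a minimal standard-library illustration, not the code used to produce the figure; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then
    a running minimum from the largest rank downwards keeps the adjusted
    values monotone and capped at 1.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from four pairwise comparisons.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        # Flag comparisons at the 0.1 threshold used for red text in the figure.
        print(f"raw={p:.3f} adjusted={q:.3f} flagged={q < 0.1}")
```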