Table 1.
Organisms of concern to planetary protection.
Table 2.
Samples used in the study.
Table 3.
BactQuant and FungiQuant results for extracted DNA samples.
Fig 1.
The sequencing effort was sufficient to capture most species from each sample.
The plots were generated by randomly subsampling 10% to 90% of reads from each dataset and estimating the number of species identified using MTSv and Bracken. As the fraction of subsampled reads approaches 100%, the number of species identified levels off, which suggests that all but the rarest species were captured. Bracken identified more species because it reallocates genus-level assignments to the species level. For MTSv, the species count included only species with reads that aligned uniquely at the species level. The values are the average of 10 replicates for each subsample size. The results shown are from the 20 high-quality libraries sequenced using Illumina NextSeq High Output runs.
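The subsampling procedure in this caption can be illustrated with a minimal sketch. It assumes each classified read has already been reduced to a single species label; the function name `rarefaction_curve`, its parameters, and the toy data are hypothetical and are not taken from MTSv or Bracken.

```python
import random
from statistics import mean

def rarefaction_curve(read_species, fractions, replicates=10, seed=0):
    """Mean number of distinct species observed at each subsampling fraction.

    read_species: list with one species label per classified read (assumed input).
    """
    rng = random.Random(seed)
    curve = {}
    for frac in fractions:
        k = round(frac * len(read_species))
        # Draw k reads without replacement and count distinct species, per replicate.
        counts = [len(set(rng.sample(read_species, k))) for _ in range(replicates)]
        curve[frac] = mean(counts)
    return curve

# Toy data (hypothetical): five species with very uneven read counts.
reads = ["A"] * 500 + ["B"] * 300 + ["C"] * 150 + ["D"] * 45 + ["E"] * 5
curve = rarefaction_curve(reads, [0.1, 0.3, 0.5, 0.7, 0.9])
```

A curve that flattens as the fraction grows indicates that additional reads are adding few new species.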
Table 4.
MTSv sequence fragments and alignment characteristics.
Fig 2.
Few organisms were detected in the samples with low amounts of input DNA, and these were likely contaminants that were also found in controls and blank buffer samples.
Staphylococcus aureus and Ralstonia solanacearum were identified in all samples, including the PCR no template control (PCRNTC), the elution buffer (EBJPL), and buffer blanks. Seven additional species were found among 7 of the samples. The heat map shows the hierarchical clustering of all species-level bacterial candidate taxa with medium confidence or better (i.e., 300+ signature reads and the ratio ) detected using MTSv. Sample type and description grouping are shown on the x-axis. The heat map colors indicate the number of signature hits assigned to each species (the values were converted to a log scale for better visualization). The values have been normalized by row to show the relative difference in species hits between samples.
Fig 3.
Substantial taxonomic diversity was observed among the high-quality libraries, and sample types showed some clustering.
The heat map shows the hierarchical clustering of species- and genus-level bacterial candidate taxa with medium confidence or better (i.e., 300+ signature reads and the ratio ) detected using MTSv. Sample type and description grouping are shown on the x-axis. The heat map colors indicate the number of signature hits assigned to each taxon (the values were converted to a log scale for better visualization). The values have been normalized by row to show the relative difference in hits between samples.
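The log conversion and row normalization used for the MTSv heat maps can be sketched as follows. The caption does not state the exact transform, so two details here are assumptions: a log10 with a pseudocount of 1, and scaling of each row by its maximum. The function name `log_row_normalize` is hypothetical.

```python
import math

def log_row_normalize(hits):
    """hits: {taxon: [signature hits per sample]} -> row-scaled log10 values."""
    out = {}
    for taxon, row in hits.items():
        # Assumption: log10(x + 1) so that zero hits map to zero.
        logged = [math.log10(v + 1) for v in row]
        top = max(logged)
        # Assumption: divide by the row maximum so each taxon spans 0..1.
        out[taxon] = [x / top if top > 0 else 0.0 for x in logged]
    return out

# Toy input (hypothetical): one taxon across three samples.
scaled = log_row_normalize({"taxon_1": [0, 9, 99]})
```

Scaling within each row keeps a highly abundant taxon from dominating the color range, so differences between samples remain visible for every taxon.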
Fig 4.
Heat map and hierarchical clustering of the top 40 taxa mapped by MetaPhlAn2. Sample category and number are shown on the x-axis.
Whole metagenome sequences from each sample were characterized by MetaPhlAn2 and merged into a relative abundance table. A heatmap with hierarchical clustering of the top 40 taxa was generated to illustrate similarity.
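The merge-and-rank step behind the taxon heat maps can be sketched as below. This is an illustration only: the per-sample input format, the function names `merge_profiles` and `top_taxa`, and the choice to rank taxa by mean relative abundance are assumptions, since the caption does not say how the top 40 were selected.

```python
def merge_profiles(profiles):
    """profiles: {sample: {taxon: count}} -> {taxon: {sample: relative abundance}}."""
    table = {}
    for sample, counts in profiles.items():
        total = sum(counts.values())
        for taxon, n in counts.items():
            table.setdefault(taxon, {})[sample] = n / total if total else 0.0
    # Taxa absent from a sample get an explicit zero.
    for row in table.values():
        for sample in profiles:
            row.setdefault(sample, 0.0)
    return table

def top_taxa(table, n=40):
    """Rank taxa by mean relative abundance across samples (assumed criterion)."""
    return sorted(table, key=lambda t: -sum(table[t].values()) / len(table[t]))[:n]

# Toy profiles (hypothetical sample and taxon names).
profiles = {"s1": {"A": 8, "B": 2}, "s2": {"A": 5, "C": 5}}
table = merge_profiles(profiles)
best = top_taxa(table, n=2)
```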
Fig 5.
Heat map and hierarchical clustering of the top 40 taxa mapped by Bracken. Sample category and number are shown on the x-axis.
Taxonomic assignments from whole metagenome sequences from each sample were assigned by Kraken2 and abundances were estimated using Bracken. The results for each sample were merged into a relative abundance table. A heatmap with hierarchical clustering of the top 40 taxa was generated to illustrate similarity.
Fig 6.
Heat map of genus-level overlap among three classification methods.
The heatmap shows the number of methods (MTSv, Bracken, and MetaPhlAn2) that agree with each call at the genus level. Shown are 78 genera identified by all three methods in at least two samples. Across all samples, a total of 111 unique genera were identified by all three methods, 383 unique genera were identified by at least two methods, and 1,081 unique genera were identified by at least one method. When only two methods agreed, it was usually MTSv and Bracken; MetaPhlAn2 missed 93% of calls made by the other two tools.
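The agreement counts reported in this caption amount to counting, for each taxon, how many methods called it. A minimal sketch, assuming each method's calls are available as a set of taxon names (the function names and toy taxa are hypothetical):

```python
from collections import Counter

def agreement_counts(calls):
    """calls: {method: set of taxa} -> Counter mapping taxon to number of methods."""
    counts = Counter()
    for taxa in calls.values():
        counts.update(set(taxa))  # each method contributes at most once per taxon
    return counts

def tally(counts, n_methods=3):
    """Summarize how many taxa were called by all, at least two, or at least one method."""
    return {
        "all": sum(1 for c in counts.values() if c == n_methods),
        "at_least_two": sum(1 for c in counts.values() if c >= 2),
        "at_least_one": len(counts),
    }

# Toy calls (hypothetical taxa g1..g4).
calls = {
    "MTSv": {"g1", "g2", "g3"},
    "Bracken": {"g1", "g2", "g4"},
    "MetaPhlAn2": {"g1"},
}
counts = agreement_counts(calls)
summary = tally(counts)
```

The same tally applies at the species level by swapping in species-level call sets.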
Fig 7.
Heat map of species-level overlap among three classification methods.
The heatmap shows the number of methods (MTSv, Bracken, and MetaPhlAn2) that agree with each call at the species level. Shown are the 54 species identified by all three methods in at least two samples. A total of 317 unique species were identified by all three methods, 3,698 unique species were identified by at least two methods, and 26,542 unique species were identified by at least one of the methods. When only two methods agreed, it was usually MTSv and Bracken; MetaPhlAn2 missed 96% of the total species-level calls.
Fig 8.
Hierarchical clustering of the top 40 pathways identified using HUMAnN2 for each sample (x-axis).
See S11 Table in S2 File (https://zenodo.org/record/7041654) for full pathway names. Pathways from the whole metagenome sequence data for each sample were estimated using HUMAnN2. The results for each sample were merged into a single table. A heatmap with hierarchical clustering was generated for the top 40 pathways to provide a qualitative snapshot of grouping between samples.
Fig 9.
Sample clustering by KEGG function composition.
A library of 6,741 KEGG Orthology (KO) functional sequences was collected for species and genera identified in the metagenomic analysis. Reads that uniquely mapped to a particular KO were counted, and the counts were normalized for read depth and unique sequence length to generate RPKM-like statistics. The top 500 RPKM values were used to estimate the functions with high copy count for each sample. The union of these functions identified a set of 2,704 KOs. The figure shows the results of a principal components analysis (based on Bray-Curtis dissimilarity) across these KOs for all samples. Colors indicate the ISO facility (building or source), and enclosing circles indicate sample types.
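The two quantities named in this caption can be written out in a short sketch. The RPKM form below is the conventional one (reads per kilobase per million mapped reads); the caption says only "RPKM-like", so the exact normalization used in the study is an assumption, as are the function names and toy values.

```python
def rpkm(count, length_bp, total_reads):
    """Conventional RPKM: reads per kilobase of sequence per million reads."""
    return count * 1e9 / (length_bp * total_reads)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {KO: value} profiles."""
    keys = set(a) | set(b)
    num = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    den = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return num / den if den else 0.0

# Toy values (hypothetical KO identifiers and counts).
value = rpkm(count=10, length_bp=1000, total_reads=1_000_000)
d = bray_curtis({"K1": 1.0, "K2": 3.0}, {"K1": 3.0, "K2": 1.0})
```

Bray-Curtis dissimilarity ranges from 0 (identical profiles) to 1 (no shared KOs), and a matrix of these pairwise values is the input to the ordination.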
Fig 10.
Taxonomy of the assembled metagenome of sample 4–1.
Each contig greater than 1 kb from the assembly of sample 4–1 was analyzed with blastn. The contribution of each organism was estimated. A pie chart was generated for the top hits to illustrate the taxonomic composition.
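The caption does not say how each organism's contribution was estimated. One plausible reading, sketched here purely as an assumption, is to assign each contig to its top blastn hit and sum contig lengths per organism. The function name `organism_shares` and the toy contigs are hypothetical.

```python
from collections import defaultdict

def organism_shares(contigs, min_len=1000):
    """contigs: iterable of (length_bp, top_hit_organism) -> {organism: fraction of bases}.

    Assumption: contribution is the share of assembled bases, using only
    contigs longer than min_len.
    """
    totals = defaultdict(int)
    for length, organism in contigs:
        if length > min_len:
            totals[organism] += length
    grand = sum(totals.values())
    return {org: n / grand for org, n in totals.items()} if grand else {}

# Toy contigs (hypothetical lengths and organism labels).
shares = organism_shares([(6000, "org1"), (2000, "org2"), (500, "org2")])
```

Counting contigs rather than bases would be an equally simple alternative, but it would overweight fragmented genomes.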
Fig 11.
Taxonomy of the assembled metagenome of sample 3–7.
Each contig greater than 1 kb from the assembly of sample 3–7 was analyzed with blastn. The contribution of each organism was estimated. A pie chart was generated for the top hits to illustrate the taxonomic composition.
Fig 12.
Circular map with open reading frames of the assembled genome of Moraxella osloensis from sample 4–1.
The map was created with BLAST Ring Image Generator (BRIG) [28].
Table 5.
Scaffold taxonomic identity.