Skip to main content
Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Multi-faceted metagenomic analysis of spacecraft associated surfaces reveal planetary protection relevant microbial composition

  • Sarah K. Highlander †,

    † Deceased.

    Roles Formal analysis, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Pathogen and Microbiome Division, The Translational Genomics Research Institute, Flagstaff, Arizona, United States of America

  • Jason M. Wood,

    Roles Conceptualization, Formal analysis, Investigation, Writing – review & editing

    Affiliation Jet Propulsion Laboratory, California Institute of Technology, Biotechnology and Planetary Protection Group, Pasadena, California, United States of America

  • John D. Gillece,

    Roles Data curation, Formal analysis, Software, Writing – review & editing

    Affiliations Pathogen and Microbiome Division, The Translational Genomics Research Institute, Flagstaff, Arizona, United States of America, Pathogen & Microbiome Institute (PMI), Northern Arizona University, Flagstaff, Arizona, United States of America

  • Megan Folkerts,

    Roles Data curation, Formal analysis, Investigation, Visualization, Writing – review & editing

    Affiliation Pathogen and Microbiome Division, The Translational Genomics Research Institute, Flagstaff, Arizona, United States of America

  • Viacheslav Fofanov,

    Roles Data curation, Formal analysis, Methodology, Project administration, Supervision, Writing – review & editing

    Affiliations Pathogen & Microbiome Institute (PMI), Northern Arizona University, Flagstaff, Arizona, United States of America, School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, Arizona, United States of America

  • Tara Furstenau,

    Roles Data curation, Formal analysis, Visualization, Writing – review & editing

    Affiliation Pathogen & Microbiome Institute (PMI), Northern Arizona University, Flagstaff, Arizona, United States of America

  • Nitin K. Singh,

    Roles Formal analysis, Software, Writing – review & editing

    Affiliation Jet Propulsion Laboratory, California Institute of Technology, Biotechnology and Planetary Protection Group, Pasadena, California, United States of America

  • Lisa Guan,

    Roles Formal analysis, Investigation, Writing – review & editing

    Affiliation Jet Propulsion Laboratory, California Institute of Technology, Biotechnology and Planetary Protection Group, Pasadena, California, United States of America

  • Arman Seuylemezian,

    Roles Data curation, Formal analysis, Investigation, Writing – review & editing

    Affiliation Jet Propulsion Laboratory, California Institute of Technology, Biotechnology and Planetary Protection Group, Pasadena, California, United States of America

  • James N. Benardini,

    Roles Funding acquisition, Project administration, Writing – review & editing

    Affiliation Jet Propulsion Laboratory, California Institute of Technology, Biotechnology and Planetary Protection Group, Pasadena, California, United States of America

  • David M. Engelthaler,

    Roles Conceptualization, Investigation, Project administration, Supervision, Writing – review & editing

    Affiliation Pathogen and Microbiome Division, The Translational Genomics Research Institute, Flagstaff, Arizona, United States of America

  • Kasthuri Venkateswaran ,

    Roles Conceptualization, Funding acquisition, Investigation, Project administration, Writing – review & editing (PSK); (KV)

    Affiliation Jet Propulsion Laboratory, California Institute of Technology, Biotechnology and Planetary Protection Group, Pasadena, California, United States of America

  • Paul S. Keim

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing (PSK); (KV)

    Affiliations Pathogen and Microbiome Division, The Translational Genomics Research Institute, Flagstaff, Arizona, United States of America, Pathogen & Microbiome Institute (PMI), Northern Arizona University, Flagstaff, Arizona, United States of America, Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona, United States of America


The National Aeronautics and Space Administration (NASA) has been monitoring the microbial burden of spacecraft since the 1970’s Viking missions. Originally culture-based and then focused 16S sequencing techniques were used, but we have now applied whole metagenomic sequencing to a variety of cleanroom samples at the Jet Propulsion Lab (JPL), including the Spacecraft Assembly Facility (SAF) with the goals of taxonomic identification and for functional assignment. Our samples included facility pre-filters, cleanroom vacuum debris, and surface wipes. The taxonomic composition was carried out by three different analysis tools to contrast marker, k-mer, and true alignment approaches. Hierarchical clustering analysis of the data separated vacuum particles from other SAF DNA samples. Vacuum particle samples were the most diverse while DNA samples from the ISO (International Standards Organization) compliant facilities and the SAF were the least diverse; all three were dominated by Proteobacteria. Wipe samples had higher diversity and were predominated by Actinobacteria, including human commensals Cutibacterium acnes and Corynebacterium spp. Taxa identified by the three methods were not identical, supporting the use of multiple methods for metagenome characterization. Likewise, functional annotation was performed using multiple methods. Vacuum particles and SAF samples contained strong signals of the tricarboxylic acid cycle and of amino acid biosynthesis, suggesting that many of the identified microorganisms have the ability to grow in nutrient-limited environments. In total, 18 samples generated high quality metagenome assembled genomes (MAG), which were dominated by Moraxella osloensis or Malassezia restricta. One M. osloensis MAG was assembled into a single circular scaffold and gene annotated. This study includes a rigorous quantitative determination of microbial loads and a qualitative dissection of microbial composition. Assembly of multiple specimens led to greater confidence for the identification of particular species and their predicted functional roles.


One of the key goals of space exploration is the quest for extraterrestrial life; critically, it is important to prevent contamination of extraterrestrial sites with terrestrial organisms (“forward contamination”). Spacecraft must be assembled in metagenomically-monitored cleanrooms. The Jet Propulsion Laboratory (JPL) has several such cleanrooms, with various International Organization for Standardization (ISO) classifications from 5 to 8.5 (105–5x108 particles/m3). These cleanrooms are maintained with rigorous cleaning routines and high-efficiency particulate air (HEPA) or ultra-low particulate air (ULPA) air filtration systems. Depending on ISO classification, different levels of personal protective equipment are required to maintain the respective standard of cleanliness. As a result, samples obtained from these facilities are ultra-low biomass. Most of the organisms expected to be found in these environments are of human origin, although previous studies have identified the presence of oligotrophs in such facilities [1].

The National Aeronautics and Space Administration (NASA) has been monitoring the microbial bioburden of spacecraft since the 1970’s Viking missions using culture-based techniques [2] to abide by the international treaty set forth by the Committee on Space Research (COSPAR). Over the years, more comprehensive, non-culture-based techniques have been developed and tested for profiling microbial communities in the built environment [3,4]. NASA is now evaluating the feasibility of a metagenomics-based approach for the quantification, phylogenetic identification, and functional assessment of microbes on spacecraft-associated surfaces, including the cleanrooms in which they are assembled. Of particular concern are organisms resistant to decontamination methods or those with phenotypic traits that may allow survival in space environments, such as resistance to desiccation, radiation, ionization, etc. (Table 1).

Two prior studies have used whole metagenome sequencing (WMGS) to evaluate microbial contamination in spacecraft assembly facilities. In one study, which included samples from the JPL Spacecraft Assembly Facility (SAF) [8], Acinetobacter lwoffii was dominant. The second study examined three different cleanroom facilities, including the JPL SAF [9], where again, the dominant genus during metagenome assembly was Acinetobacter. While culture dependent methods were not used in this study, the metagenomic approach was previously validated using focused 16S gene sequencing [8]. Pathogens and corresponding virulence genes were also found. A limitation of this study was that multiple displacement amplification was used to amplify each sample. This approach overcame technical problems with low biomass samples, but it is also known to introduce bias and contamination [10].

Here we used only unbiased shotgun metagenomics to evaluate microbial burden and composition in different cleanrooms and different types of samples collected at the JPL, developing a pipeline for continued monitoring of relevant built environments where NASA’s life-detection mission components were assembled. The assembly facilities are controlled cleanroom environments (certified ISO 5 to ISO 8.5) and were sampled using surface liquid swipes and air vacuum devices. Procedural and reagent controls were included to understand background signal. To these samples, we applied methods first to quantitate both bacterial and fungal load, and developed a two-tiered approach to generate high-quality and low-quality sequencing libraries based on biomass. We used three different classification methods to capture each sample’s most complete taxonomic profile. Likewise, we annotated the data to make functional predictions that could be important to identify unknown species representing potential high consequence space contaminants. Because of the low diversity of samples, genome assembly from the WMGS data was possible and demonstrated here for a strain identified as Moraxella osloensis.

Materials and methods

Samples and DNA extraction

The project’s experimental design was developed to comprehensively evaluate a metagenomic processing workflow and analysis pipeline for spaceflight hardware cleanroom samples (Table 2). Samples were chosen to represent 1) a range of cleanroom classifications (from ISO 5 to ISO 8.5), 2) several sample types (surface, filter, vacuum), and 3) a range of expected biomass content (from below detection limit to nanograms). Samples and their respective negative controls were sent for metagenomic analysis at various stages of the processing workflow to capture any variability and bias introduced at each critical step.

Sample collection and processing

A total of 50 samples (41 samples and 9 controls), collected from controlled cleanroom environments (certified ISO 5 to ISO 8.5) at JPL, were sent on dry ice at -20°C to the Translational Genomics Research Institute (TGen) for DNA extraction and WMGS analysis. Samples came in the form of extracted DNA, (n = 18) and pre-extraction homogenized samples (n = 32). The samples encompassed five sample categories: 1) DNA from cleanroom facilities; 2) wipe solution from cleanroom facilities; 3) bench pre-filter solution; 4) vacuum particle solution; 5) DNA technical replicates from one cleanroom (SAF). All samples are described in detail in Table 2. All pre-extraction samples were kept cold at 4°C until processed. Following extraction, all DNA samples were frozen at -20°C.

Category 1: DNA samples extracted at JPL (n = 12).

A subset of the sample types described above were extracted for DNA at JPL (n = 9). All sample solutions underwent filter concentration to increase biomass concentration prior to extraction. Samples were filtered (Millipore Sigma, 50kD Amicon® Ultra 15 mL Centrifugal Filters) to a final volume of 250 μL. Following the kit protocol, 200 μL of lysis buffer (Qiagen, Buffer ATL, UCP Pathogen Mini Kit) was added to each concentrated sample and incubated at 56°C for 10 min with continuous shaking at 600 rpm. Following chemical lysis, each sample was added to bead beating tubes (OPS Diagnostics, 100 μm/400 μm Acid-Washed Silica Beads) and vortexed at max speed for 10 min ± 30 sec. Final lysate was centrifuged for 2 min at 13,000 x g and 400ul were aspirated from the tubes and loaded onto a QIAcube automated DNA extraction instrument (Qiagen) and extracted following manufacturer specifications for the UCP Pathogen Mini Kit. Ten μL of DNA extract per sample were sent to TGen for sequencing and analysis. The materials such as facility wipe without active sample collection, extraction reagent, and filter were included as background controls (n = 3).

Category 2: Facility surface wipe samples (n = 12).

Cleanroom surface samples (n = 10) were collected from the floor, walls, and workbenches of several JPL cleanrooms used for spaceflight hardware assembly. Pre-moistened, sterile polyester wipes (Texwipe Co., TX3211 SterileWipe, ThermoFisher Scientific, UltraPure DNase/RNase-Free Distilled Water) were used for sample collection and stored in sterile 50 mL polystyrene tube (BD Falcon, 50mL Conical Tube). Wipes were transferred to individual 500 mL polystyrene storage bottles (Corning) with 100mL of dissociation buffer (10 mM Tris, 1 mM EDTA, pH 8.0, 0.05% Tween 80 [v/v]). The bottles were then sonicated for 5 min ± 15 sec at 25 kHz in an ultrasonic bath containing 0.05% Tween (v/v). For additional physical agitation, bottles were shaken at 200 rpm for 30 min ± 1 min on an orbital shaker. Since wipe solutions were expected to have highly diluted biomass, up to 2 samples from the same cleanroom and collected on the same day were combined and filter concentrated using a filter size that is small enough to retain microbial cells (Millipore Sigma, 50kD Amicon® Ultra 15 mL Centrifugal Filters). 40 mL of wipe solution were sent to TGen to be used for extraction and analysis. A background control wipe solution (n = 2) followed the same protocol as all surface wipe samples, but was not exposed to the cleanroom environment, was included.

Category 3: Flow bench pre-filter samples (n = 10).

JPL cleanrooms contain ISO 5 flow benches that are often used for hardware assembly and 8 samples, and 2 controls were included in this category. These flow benches contain pre-filters and HEPA filters. The pre-filters are replaced at the time of recertification and collected for this study. To remove particles on the surface and within the fibers of the pre-filter, a forensic vacuum (3M, Trace Evidence Vacuum A-6510) with pre-sterilized vacuum filters (3M, Trace Evidence Vacuum Filters A-6512, 97% retention rate for 0.1-micron particles) were used. To maximize recovery, the inlet side of the flow bench pre-filters were vacuumed a total of three times. Vacuum filters were individually transferred to a polystyrene 500 ml storage bottle (Corning, Corning, NY) and 50mL of dissociation buffer (10 mM Tris, 1 mM EDTA, pH 8.0, 0.05% Tween 80 [v/v]) was added. Similar to the cleanroom wipes, bottles were sonicated for 5 min ± 15 sec at 25 kHz in a waterbath containing 0.05% Tween (v/v). Bottles were shaken at 200rpm for 30 min ± 1 min on an orbital shaker. 20 or 30mL of pre-filter solution was sent for extraction and analysis. Negative control filter solutions for each type of flow bench pre-filter (carbon or non-carbon) were obtained from new and unused flow bench pre-filters and followed the same particle removal protocol for further analysis.

Category 4: Cleanroom vacuum samples (n = 11).

Cleanroom vacuum dust bags (n = 10) were included in this study because they are likely to contain a high concentration of biomass while still being representative of microbes present in a cleanroom environment. These samples were collected from certified cleanroom vacuum dust bags (Nilfisk Inc., GM 30CR Cleanroom Vacuum) at the time of vacuum recertification. Dust bag particles were transferred polystyrene 500 mL storage bottles (Corning, Corning, NY) and 50mL of dissociation buffer (10 mM Tris, 1 mM EDTA, pH 8.0, 0.05% Tween 80 [v/v]) was added to each bottle. Bottles were sonicated for 5 min ± 15 sec at 25 kHz in an ultrasonic bath containing 0.05% Tween (v/v). Bottles were shaken at 200 rpm for 30 min ± 1 min on an orbital shaker. For each sample, 30 mL of dissociation solution was used for extraction and analysis. The dissociation buffer (10 mM Tris, 1 mM EDTA, pH 8.0, 0.05% Tween 80 [v/v]) was processed without any additional input and used as a background control (n = 1).

Category 5: Technical DNA replicate samples (n = 5).

To assess variability between technical replicates, five 10 μL aliquots from one DNA sample obtained from surface samples of the Spacecraft Assembly Facility (SAF, ISO 7) at JPL were sent to TGen. A background control containing DNA from a wipe that did not come into contact with any facility surfaces and followed the same extraction protocol was included.

DNA samples extracted at TGen

DNA was extracted from the particulate and solution samples received from JPL (n = 32) using the MagMax Microbiome Ultra Nucleic Acid Isolation kit (ThermoFisher, #ABA423577) using the Kingfisher Flex System (Thermofisher, #5400620). Briefly, 20 mL of each homogenized solution or particulate sample was transferred to a 50 mL high-speed conical tube and centrifuged for 10 min at 13,000 x g. The supernatant was decanted and the resulting pellet was suspended in 200 μL molecular biology grade water. The suspension was transferred to a MagMax bead plate, and the manufacturer’s protocol was followed for the remainder of the extraction. DNAs were resuspended in 50 μL MagMax Elution Solution (ThermoFisher, #ABA423577).

Bacterial and fungal quantitative analyses

The bacterial and fungal load of each sample (n = 50) was quantified using BactQuant and FungiQuant qPCR assays, respectively, as described previously [11] (Table 3 and S1 Table in S2 File Since the quantity of fungal DNA in a sample is much lower than the amount of bacterial DNA, it is not as useful a predictor of whole metagenome library preparation success. FungiQuant was only performed on samples extracted at TGen to conserve sample volume on previously extracted DNA samples from JPL. The detection limits of the BactQuant and FungiQuant assays are approximately 100 16S rRNA copies/μL, and ten 18S rRNA copies/μL, respectively. Thus, samples with values close to these limits (i.e., within the same order of magnitude) are reported as having no quantifiable bacterial or fungal DNA.

Table 3. BactQuant and FungiQuant results for extracted DNA samples.

Since the BactQuant and FungiQuant assays only account for potential bacterial and fungal DNA in a sample, respectively, additional testing was performed to determine if additional DNA, such as human or plant, might be present. A subset of the samples extracted at both TGen, and JPL was assessed using the Agilent Fragment Analyzer (Agilent Technologies) to visualize DNA content. Samples were analyzed using Agilent’s High Sensitivity Genomic DNA Kit (#DNF-488-0500), which has a detection limit of five picograms (equivalent to cal. 30–200 16S rRNA gene copies). Examples of two analyses are shown in S1 Fig in S1 File ( The results of fragment analysis agreed with BactQuant and FungiQuant data; that is, no quantifiable DNA was present in these samples (e.g., sample 3–6). Thus, the combination of qPCR and fragment analysis provided sufficient sample quality information to guide subsequent whole metagenome library preparations.

DNA libraries and sequencing

We prepared 53 samples (consisting of 50 samples provided by JPL and one positive and two negative controls included by TGen) for WMGS using the Celero DNA-seq Kit (Tecan Genomics, #0360A-UDI). The Celero kit uses Tecan’s adapter-dimer-free technology, which makes it an ideal choice for low-biomass samples. Samples were sonicated to fragment genomic DNA, then size selection was performed to obtain a size range of approximately 600–700 bp. Libraries were quantified using KAPA’s Library Quantification Kit for Illumina (KAPA Biosystems, #KK4973), then were visualized on an Agilent Technologies Bioanalyzer to assess library size and quality. As an additional quality-control step, all samples were first sequenced using a Nano Kit v2 (2 x 150 bp) on the Illumina MiSeq platform. Although the Celero DNA-seq kit used is advertised to be dimer free, in situations where input DNA concentration is low or absent, adapter dimer has been observed. Since it is extremely difficult to completely remove this dimer via conventional clean-up methods, when sequenced, the dimer overwhelms larger, genomic DNA-derived fragments on the flow cell. Thus, based on the performance of individual samples on this trial sequencing run, and on the prior qPCR and fragment analysis data, only samples with appropriate quality libraries were sequenced separately on Illumina’s NextSeq platform using the 2 x 150 bp High Output Kit v2.

Twenty high-quality libraries (1–11, 2–5, 2–6, 2–7, 2–8, 2–9, 2–11, 3–7,3–8, 4–1, 4–2,4–3, 4–4, 4–5, 5–1, 5–2, 5–3, 5–4, 5–5, 5–6) were split evenly among two Illumina NextSeq High Output runs (2–150 bp). The remaining low quality libraries (1–1, 1–2, 1–3, 1–4, 1–5, 1–6, 1–7, 1–8, 1–9, 1–10, 1–12, 2–1, 2–2, 2–3, 2–4, 2–10, 3–1, 3–2, 3–3, 3–4, 3–5, 3–6, 3–9, 3–10, 4–6, 4–7, 4–8, 4–9, 4–10, and controls) were sequenced on a MiSeq (v2, 2 x 150 bp) to limit sequencing reagent waste, since the majority, if not all, of these samples were expected to fail. In addition, the MiSeq is less impacted by adapter dimer (Illumina, personal communication) so the probability of obtaining useable data was greater with this method.

Sequencing metrics and quality control of sequence data

The paired reads for each sample were first analyzed with FastQC ( to assess quality and adapter levels. Most of the low quantity libraries (from the MiSeq pool) contained greater than 90% adapter, as expected. However, there were a few samples (see below) that were identified as mid to high quality in the MiSeq pool that were also sequenced on the NextSeq to increase the yield for improved WMGS analysis.

Prior to read-based analysis and assembly, reads were run through KneadData (v 0.7.2) ( to remove adapters, low quality reads and human reads. KneadData incorporates FastQC, Trimmomatic [12], and BMTagger ( for mapping and filtering using the latest assembly of the human genome. The results from KneadData for each sample are shown in S2 Table in S2 File ( Note that most of the 20 samples that we selected for NextSeq sequencing (in black) performed well, although samples 5–6, 1–11, and 2–7 had significant numbers of reads that were dropped post-filter (87.7%, 35.0% and 29.6%, respectively). For the samples sequenced on the NextSeq, human filtering was very efficient, with a few exceptions, notably 2–5, where 9.1% human reads remained (an enrichment from 0.6%, probably due to loss of poor-quality reads) and sample 3–7 where 10% of reads remained from a pre-filter high of 78.4%. Examination of the MiSeq data revealed four samples (i.e., 1–8, 1–10, 2–1 and 4–7) that could be promoted to NextSeq sequencing to gather additional data.

Sequencing depth metrics

To assess whether sufficient amounts of sequence data were generated for accurate and sensitive taxonomic characterization for the 20 high-quality libraries, we performed a series of subsampling experiments to assess the rate of species accumulation (Fig 1), similar to a rarefaction curve. The goal was to determine whether additional sequencing was needed and/or would provide substantial improvements in the number of candidate species detected for each of the samples.

Fig 1. The sequencing effort was sufficient to capture most species from each sample.

The plots were generated by randomly subsampling 10% to 90% of reads from each dataset and estimating the number of species identified using MTSv and Bracken. As the number of subsampled reads approach 100%, the number of species identified leveled off which suggests that all but the rarest species were captured. Bracken identified more species because it reallocates genus-level assignments to the species level. For MTSv, the species count included only species with reads that aligned uniquely at the species level. The values are the average for 10 replicates for each subsample size. The results shown are from the 20 high-quality libraries sequenced using Illumina NextSeq High Output runs.

The sequence reads generated for each of the high-quality libraries were subsampled (10X times) at 10% intervals from 10% to 90% of total reads for a total of 20 X 10 X 9 = 1,800 simulations. Each subsampled read set was processed via MTSv (v 1.0.0) to identify candidate species. In all cases, the number of candidate species plateaued well before the actual number of reads (100%) was achieved, suggesting that additional sequencing would not detect additional (sufficiently common) species in the samples. This also provides a prediction of the complexity of a sample. For example, the category 5 samples are predicted to have low diversity.

Sample classification by MTSv

All sequence reads were segmented into non-overlapping 50-mer fragments (with read pairing information ignored) prior to MTSv analyses (Table 4). The resulting segments were taxonomically classified to the genus and species levels using the MTSv pipeline. The MTSv pipeline ( uses a local copy of the GenBank database (accessed 06/15/2018) as the reference. Briefly, the flat files of the entire NCBI GenBank were downloaded; all sequences without a valid (species level or below) taxonomic classification or labeled as “environmental” samples were discarded; all sub-species (strain-level and below) taxonomic classifications were collated into their parent species classification. The resulting sequences were then used to build a custom FM-Index for fast taxonomic classification of high throughput sequencing reads. The pipeline performed a full alignment with up to three mismatches tolerated against all taxonomically classified sequences. Only the sequences that aligned uniquely to a taxonomic unit (species or genus levels) within the database were used for taxonomic classification. In this fashion, only the sequence fragments that unambiguously supported the presence of an organism were used. This significantly reduced the false positive rate associated with metagenomic characterization of complex samples.

Table 4. MTSv sequence fragments and alignment characteristics.

For each taxonomic unit within GenBank, MTSv provides the number of total hits, the number of unique hits (i.e., unique 50-mer sequence), total signature hits (reads aligning to one and only one species), and the total unique signature hits. The candidates were then ranked (high to low) in terms of total signature hits (unique in terms of sequence, reads aligning to one and only one species). This allows MTSv to avoid issues associated with spurious genomic similarity between organisms when assessing taxonomic composition of each sample. Thus, top ranked candidate taxa are both present in high proportion in the sample and identified with the highest confidence.

For each candidate taxon identified as part of the initial run of the MTSv pipeline, we performed a proportion test to provide a confidence score for the presence of the taxon. Briefly, assume that is the proportion of signature to total hits for a taxon Ti, where i = 1,…,# of candidate taxa, whereas is the true proportion of signature to total hits for this taxon (estimated from the reference sequences of the taxon). Assuming a Null Hypothesis H0: , and an Alternative Hypothesis of HA: , we can then perform the proportion test to assign a likelihood that the observed proportion of signature reads conforms to expectations. The proportion test involves computing the test statistic , where n is the total number of trials (100,000 in our example above). With this approach, for each of the candidate taxa, we computed a p-value to assess our confidence in the call (i.e. our confidence in observing the signature reads we have, under the assumption that the organism was actually there). In addition to the p-value, we also computed the Cohen’s H statistic to measure the distance between the two proportions, as well as the actual ratio of observed versus expected proportion ratio = .

Candidate taxa with at least 300 unique signature hits, a Cohen’s H value of < 0.4, and a proportion ratio < 2.5 were considered to have very high confidence. Candidates with at least 300 unique signature hits, Cohen’s H value of < 0.5, and proportion ratio < 2.5 were considered to have high confidence. Candidates with at least 300 unique signature hits and a proportion ratio < 4 were considered to have medium confidence. Candidates not fitting these criteria were considered to have low confidence and were discarded from summary and analysis. The raw data, including ratios, and confidence determinations are available in the S3–S6 Tables in S2 File (

Taxonomic classification using Bracken and MetaPhlAn2

In addition to MTSv, two additional tools were used to taxonomically classify the reads in the high-quality samples. The first, Bracken (Bayesian Reestimation of Abundance after Classification with Kraken) [13,14] is a k-mer-based classifier that includes estimations of species abundance in a sample. Bracken was run using default parameters. MetaPhlAn2 [15] is a tool that uses a marker gene database to identify bacteria, archaea, eukaryotes, and viruses in metagenomic sequencing samples. The current database represents about 13,500 bacterial and archaeal strains, 3,500 viral, and 110 eukaryote isolates. MetaPhlAn 2.0 was run using the version 2.0 database with default parameters. Heatmaps were constructed using hclust2 (

Read-based functional annotation

Metabolic capacity was predicted using two methods. The first was using HUMAnN2 [16], which uses the marker species information from MetaPhlAn2 to align reads to species pangenomes, then performs a translated search of unclassified reads using a comprehensive non-redundant protein database (UniRef50) [17] followed by normalization to reads per kilobase (RPK) units. We also performed a KEGG-based functional analysis using the same segmented 50-mer fragments used in the MTSv analysis. Based on the MTSv metagenomic analysis, an estimate of the likely translation space was used to assemble a library of 6,741 functions using the KEGG Orthology (KO). Briefly, a total of 117 bacterial species with annotations in KEGG, each containing between 677 and 3,868 annotated functions, were identified. The union of these yielded 6,741 unique functions (with unique KO numbers), with many functions supported by multiple candidate bacterial taxa. The 50-mers were mapped against these 6,741 functions. Because a single sequence fragment can (and often does) align to multiple KO numbers, the signature space annotation was carried out for each of the KO number bins. Using this approach, only the reads that unambiguously supported a single function were used in the functional analysis. Reads were reported as RPK per million reads (RPKM).

Assembly and annotation

High-quality metagenomic assembled genomes (MAG) were produced using IDBA-UD [18] using the option—pre_correction—min_contig 500. Assembly statistics were generated using assembly-stats ( The M. osloensis MAG was assembled by recruiting reads to the M. osloensis strain MOXF1 assembly ( We then assembled the reads with SPAdes v. 3.14.0 with the non-default option—careful to reduce the number of mismatches and indels [19]. Annotation was performed using Prokka v 1.13 [20] using the reference genome with the options prokka—proteins—prefix M-osloensis-Assembly contigs.fasta—mincontiglen 200—rawproduct.

Taxonomic assignment of MetaGenomic assemblies

Taxonomic classification was done using a local copy of NCBI’s nt database [21] downloaded on 12/07/2021. Scaffolds were aligned to the database using BLAST+ 2.12.0 software package with default parameters and “-max_target_seqs 5” flag enabled. The top hit for each scaffold was selected on the basis of NCBI’s bit score, provided that alignment covered at last 25% of the original query with at least 80% sequence identity and an e-value of 1*10−7 or lower. Taxonomic classification of top alignments was done using taxonomizr 0.7.1 ( software package and a local version NCBI’s accession2taxid and taxdump databases downloaded on 12/07/2021.


We employed three separate established metagenomic analysis pipelines (MTSv, MetaphlAn2, and Bracken) that are based upon different methods (true alignment-based, marker-based, k-mer-based, respectively) the assign microbial content and predict function withing various low microbial burden environmental samples collected in the JPL space craft manufacturing facilities. These results are discussed individually and collectively below and put into context with current understanding of the risks of microbial flora contamination in space exploration.

MTSv analysis

As expected, when there were low amounts of input DNA the sequencing libraries were of low quality and very few organisms were detected. As shown in the Table 4 MTSv alignment metrics, this included samples:1–1, 1–2, 1–3, 1–4, 1–5, 1–6, 1–7, 1–8, 1–9, 1–10, 1–12, 2–1, 2–2, 2–3, 2–4, 2–10, 3–1, 3–2, 3–3, 3–4, 3–5, 3–6, 3–9, 3–10, 4–6, 4–7, 4–8, 4–9, 4–10, and controls (not shown). These samples contained trace amounts of Staphylococcus aureus and Ralstonia solanacearum, which are probably low-level contaminants, since they were also detected in the controls and the buffer blank samples (Fig 2). Many of these samples also contained significant human DNA (as per the MTSv pipeline, human DNA was not filtered for the MTSv analysis). Samples 1–8 and 2–10 had the largest number of non-contaminant candidate taxa, including Cutibacterium (formerly Propionibacterium) acnes, Micrococcus luteus and Pseudomonas stutzeri. Samples 1–10, 1–12, 2–1, 3–6, and 4–9 also had a few taxa, but the remaining samples had none.

Fig 2. Few organisms were detected in the samples with low amounts of input DNA, and these were likely contaminants that were also found in controls and blank buffer samples.

Staphylococcus aureus and Ralstonia solanacearum were identified in all samples including the PCR no template control (PCRNTC), the elution buffer (EBJPL) and buffer blanks. Seven additional species were found among 7 of the samples. The heat map shows the hierarchical clustering of all species-level bacterial candidate taxa with medium confidence or better (i.e., 300+ signature reads and the ratio ) detected using MTSv. Sample type and description grouping are shown on the x-axis. The heat map colors indicate the number of signature hits assigned to each species (the values were converted to a long scale for better visualization). The values have been normalized by row to show the relative difference in species hits between samples.

In contrast, there was substantial taxonomic diversity observed in the high-quality libraries resulting from higher input samples. Taxonomic classification of samples was carried out at the species level (Fig 2 and S3 and S4 Tables in S2 File, the genus level (S5 Table in S2 File, and a combination of species and genus calls when confident species calls were not possible (Fig 3 and S6 Table in S2 File The vacuum samples from SAF (ISO 7 samples, 4–1, 4–3, 4–4, and 4–5) formed the tightest cluster (i.e., having the least number of differences in taxonomic composition), and were dominated by M. osloensis and several organisms of concern for planetary protection. All samples contained Methylobacterium and some contained a combination of Blastococcus saxobsidens, Modestobacter marinus, and Geodermatophilus obscurus. These three organisms are members of the family Geodermatophilaceae and were originally isolated from harsh environments [22]. They are able to resist UV light, ionizing radiation, heavy metals, desiccation, starvation and oxidative stress [22,23]. Other organisms of concern in this cluster are Sphingomonas (a psychrophile) and Roseomonas, one species of which (R. radiodurans), is radiation resistant [24]. The SAF DNA replicates (category 5) formed another tight, yet less taxonomically diverse, cluster dominated by Methylobacterium populi, Massilia spp., Sphingomonas spp., Clavibacter spp., Serratia marcescens and the organo-oligotroph Chryseobacterium indologenes. They also had a high relative abundance of Acinetobacter species. The similar profiles were expected since these were replicates. A notable exception and outlier was sample 5–6, which had no PCR detectable 16S or ITS sequences compared to all the other JPL samples. The wipe solution samples (category 2) did not form a cluster and had few consistently dominant organisms; these included Acinetobacter, Cutibacterium, M. osloensis and Staphylococcus. The category 3 samples (filters; n = 2) that were sequenced had profiles similar to most of the category 2 samples (wipes).

Fig 3. Substantial taxonomic diversity was observed among the high-quality libraries and there is some clustering of sample types.

The heat map shows the hierarchical clustering of species- and genus-level bacterial candidate taxa with medium confidence or better (i.e., 300+ signature reads and the ratio ) detected using MTSv. Sample type and description grouping are shown on the x-axis. The heat map colors indicate the number of signature hits assigned to each species (the values were converted to a long scale for better visualization). The values have been normalized by row to show the relative difference in species hits between samples.

Taxonomic classification using MetaPhlAn2 and Bracken

Metagenomic reads from the high-quality libraries were also analyzed using MetaPhlAn2 [15] and Bracken [13] to identify taxa and calculate their relative abundances. Heatmaps were generated to visualize the 40 most abundant organisms assigned by each tool and to visualize clustering by sample (Figs 4 and 5). The complete data tables are available in the S7 and S8 Tables in S2 File ( As seen in the MTSv analysis, samples from categories 4 (vacuum particle solution samples) and 5 (DNA replicates) formed distinct clusters using both tools. As mentioned above, sample 5–6 was unlike any of the other samples and did not have any reads that could be classified by MetaPhlAn2, so it was not carried forward in that analysis. Category 5 had a high abundance of taxa in the Oxalobacteraceae family, specifically Massilia spp. and Janthinobacterium spp.; A. lwoffii and P. acnes was also present. As observed in the MTSv analysis, vacuum debris samples had a high abundance of the chemo-organotroph, Enhydrobacter aerosaccus, Massilia and presence of A. lwoffii and P. acnes. These samples also contained the desiccation and radiation resistant species identified by MTSv (Fig 5), including B. saxobsidens, G. obscurus and M. marinus. MetaPhlAn2 identified the genus Deinococcus; Bracken identified the spore-former Streptomyces fradiae. Vacuum debris samples also contained Pseudomonas spp., including P. putida, and the heavy metal resistant species Stenotrophomonas maltophilia [7]. Three wipe samples clustered (2–6, 2–7, 2–8) with the HEPA filter samples (3–7 and 3–8) and had a high abundance of C. acnes, Pseudomonas, and several species of Corynebacterium and Sphingomonas (Figs 4 and 5). They also had Massilia spp. and the presence of the fungus Malassezia restricta. Planetary Protection relevant taxa, mentioned above, were also present at low relative abundance. Overall, vacuum debris (category 3) samples appeared to be the most diverse and were composed mostly of members of the Actinobacteria and Proteobacteria phyla, including many hardy environmental organisms. The wipe (category 2) and ULPA filter (category 4) samples had less diversity and were predominated by the Actinobacteria phylum, including human commensal taxa. Categories 1 (one sample) and 5 were the least diverse and also dominated by Proteobacteria of both human and environmental origin.

Fig 4. Heat map and hierarchical clustering of the top 40 taxa mapped by MetaPhlAn2. Sample category and number is shown on the x-axis.

Whole metagenome sequences from each sample were characterized by MetaPhlAn2 and merged into a relative abundance table. A heatmap with hierarchical clustering of the top 40 taxa was generated to illustrate similarity.

Fig 5. Heat map and hierarchical clustering of the top 40 taxa mapped by Bracken. Sample category and number is shown on the x-axis.

Taxonomic assignments from whole metagenome sequences from each sample were assigned by Kraken2 and abundances were estimated using Bracken. The results for each sample were merged into a relative abundance table. A heatmap with hierarchical clustering of the top 40 taxa was generated to illustrate similarity.

Metagenomic analysis of sample composition—Cross approach comparison

There was considerable overlap of the genus-level calls between the three classification approaches used (Fig 6 and S9 Table in S2 File Of the total 10,599 genus-level calls across all samples, 505 (5%) were made by all three methods, 1,767 (16.7%) were made by two of the methods and 8,327 (78.6%) were made by only one of the methods. When calls were made by two of the methods, MetaPhlAn2 was the method that more often failed to make a matching call (1646/1767, 93.2%). This is not surprising, since MetaPhlAn2 uses a marker gene database to map reads so it is most prone to sensitivity issues (with a tradeoff of increased specificity when the marker gene is present in the sequencing read set). Lower overlaps but similar patterns were observed when comparing species level calls for the three methods (Fig 7 and S10 Table in S2 File Of the 30,557 species level calls across all samples, 317 (1%) were called by all three methods, and 3,698 were called by two methods. Again, MetaPhlAn2 was lowest with only 1,099 out of 30,557 calls. While abundance may play a role for rare taxa, the methods’ intrinsic identification differences are more important. This emphasizes the utility of using multiple methods for taxonomic composition and highlights the importance of not relying on one method or database.

Fig 6. Heat map of genus-level overlap among three classification methods.

The heatmap shows the number of methods (MTSv, Bracken, and MetaPhlAn2) that agree with each call at the genus level. Shown are 78 genera identified by all three methods in at least two samples. A total of 111 unique genera were identified by all three methods, 383 unique genera were identified in at least two methods, and 1,081 unique genera were identified in at least one method, across all samples. When only two methods agreed it was usually MTSv and Bracken, MetaPhlAn2 missed 93% of calls made by the other two tools.

Fig 7. Heat map of species-level overlap among three classification methods.

The heatmap shows the number of methods (MTSv, Bracken, and MetaPhlAn2) that agree with each call at the species level. Shown are the 54 species identified by all three methods in at least two samples. A total of 317 unique species were identified by all three methods,3,698 unique species were identified in at least two methods, and 26,542 unique species were identified by at least one of the methods. When only two methods agreed it was usually MTSv and Bracken, MetaPhlAn2 missed 96% of the total species-level calls.

Functional analysis of samples using HUMAnN2

A heatmap of relative abundances generated by HUMAnN2 for the 40 most abundant pathways in each sample is shown in Fig 8. The complete dataset is available in S11 Table in S2 File ( All samples had a high signal for the valine and isoleucine synthesis pathway and samples 1–11, 2–5 and all category 5 samples had a high abundance of tricarboxylic acid (TCA)-glyoxylate and glyoxylate bypass, TCA cycle, histidine and arginine synthesis and tRNA charging pathways. Category 5 also had a dominance of branched chain amino acid biosynthesis amino acid pathways. This suggests that these communities have the ability to grow in nutrient limited environments.

Fig 8. Hierarchical clustering of the top 40 pathways identified using HUMAnN2 for each sample (x-axis).

See S11 Table in S2 File ( for full pathway names. Pathways from the whole metagenome sequence data for each sample were estimated using HUMAnN2. The results for each sample were merged into a single table. A heatmap with hierarchical clustering was generated for the top 40 pathways to provide a qualitative snapshot of grouping between samples. Complete pathway names are located in S11 Table in S2 File (

HUMAnN2 associates pathways with significant contributing organisms. These were plotted for the pathways mentioned above. For the L-arginine biosynthesis I pathway (ARGSYN) (S2 Fig, Pseudomonas mendocina was a contributor of the pathway in samples 2–5, 2–7 and 2–11, while Staphylococcus epidermidis was the only contributor in sample 2–6. Enhydrobacter aerosaccus was an important contributor in samples 3–7, 4–1, 4–2 and 4–3, yet much of the ARGSYN pathway was contributed by unclassified organisms in most samples, especially category 5, 2–5 and 1–11. The L-arginine biosynthesis II pathway (acetyl cycle; ARGSYNBSUB) was also mostly contributed by unclassified organisms (S3 Fig in S1 File, but P. mendocina was again present in 2–5, 2–7 and 2–11, by E. aerosaccus in category 4 vacuum debris samples and by S. epidermidis alone in sample 2–6. A somewhat different pattern was observed for the L-histidine biosynthesis pathway (HISTSYN) (S4 Fig in S1 File, where P. mendocina was again prevalent in 2–7 and 2–11 and also present in 2–5, but Modestobacter multiseptatus was a contributor of the pathway in 4–1, 4–2, 4–3 and 4–4, C. acnes in 2–8, 3–7 and 3–8, and S. epidermidis alone in 2–6. Many more organisms were assigned to the superpathway of branched amino acid biosynthesis (BRANCHED-CHAIN-AA-SYN) but again many were unclassified (S5 Fig in S1 File Samples 2–5, 2–7 and 2–11 had major contributions by P. mendocina, 4–1, 4–2, 4–3 and 4–4 by E. aerosaccus with 4–3 and 4–4 including contributions from Sphingobium yanoikuyae and Clavibacter michiganensis. Category 5 samples were almost entirely unclassified. The TCA cycle I (prokaryotic) was largely contributed by two classes of unclassified organisms (dark blue and dark red in S6 Fig in S1 File There was a contribution to the TCA cycle by Massilia timonae, especially in sample 2–9 and vacuum debris samples, a small contribution by M. multiseptatus in vacuum debris and a small contribution by A. lwoffii in category 5 samples. The glyoxylate cycle (GLYOXYLATE-BYPASS) (S7 Fig in S1 File was completely contributed by unclassified organism in eleven samples, by E. aerosaccus and M. multiseptatus in category 4 samples and a very small contribution by Kocuria rhizophila (a persister species) in samples 3–7, 3–8 and 4–2. The superpathway of glyoxylate bypass and TCA (TCA-GLYOX-BYPASS) (S8 Fig in S1 File and tRNA charging pathway (TRNA-CHARGING) (S9 Fig in S1 File were unremarkable because nearly all the assignments were to unclassified organisms. The abundance of these organisms and their contributions to the pathways called agrees with the relative abundances of organisms predicted by both MetaPhlAn2 and Bracken (Figs 4 and 5; S7 and S8 Tables in S2 File

A limitation of HUMAnN2 is that it uses the output from MetaPhlAn2. If MetaPhlAn2 cannot assign many of the reads because it is a marker-based database, then HUMAnN2’s output will also be limited. An alternative approach is to conduct read based annotation or assembly, as described below.

KEGG-based functional analysis

The KEGG orthologs (Kos) with top 500 RPKM (Reads Per Kilobase per Million) values were used to estimate the functions with high copy count for each sample. The union of these functions identified a set of 2,704 KO numbers. The full table of these Kos is provided in S12 Table in S2 File ( We examined these KOs for the presence of functions related to spore formation, radiation resistance, anaerobiosis, peroxide resistance, etc. and found very few examples of proteins that have such functions. Only five proteins predicted for sporulation associated functions were annotated in a few of the samples. Most samples had one of four anaerobic reductases. All had catalase-peroxidase and organisms from category 4 samples had a nickel superoxide dismutase, yet no protein with annotated function associated with radiation resistance was observed.

The Bray-Curtis dissimilarity between these KOs for all samples was used to identify the principal components (PCs) shown in Fig 9. Substantial clustering was observed on the basis of sample type and location. These results closely mirrored sample organization observed from taxonomic clustering of these samples. Sample 5–6 (not shown in Fig 9) was substantially different from all other samples. Of the top 500 functions drawn from each sample, 277 functions were unique to this sample. As mentioned above, this sample could not be classified using MetaPhlAN2 so it could not be analyzed using HUMANn2. This suggests a significantly different functional composition of the sample that is highly correlated to its taxonomic composition.

Fig 9. Sample clustering by KEGG function composition.

A library of 6,741 KeGG Orthology (KO) functional sequences was collected for species and genera identified in the metagenomic analysis. Reads that uniquely mapped to a particular KO were counted and the counts were normalized for read depth and unique sequence length to generate RPKM-like statistics. The top 500 RPKM values were used to estimate the functions with high copy count for each sample. The union of these functions identified a set of 2,704 KOs. The figure shows the results of a principal components analysis (based on Bray-Curtis dissimilarity) across these KOs for all samples. Colors indicate ISO facilities and enclosing circles indicate different sample types. ISO represents building (or source), while circles identify sample types.

Metagenome assembled genomes and annotation

We assembled sequence reads from the high-quality metagenome data sets using IDBA-UD [18], which is an assembler optimized for sequencing data that has highly uneven depth. Although the samples had very low microbial biomass, some were also low in diversity, so we expected that we might be able to capture near complete MAGs from these assembly efforts. In fact, we were able to generate MAGs from all (n = 18) but two of the metagenome data sets (S13 Table in S2 File The assembly attempts for samples 4–3 and 5–6 never completed due to computational memory constraints, despite allocating 300 GB of memory to the job. The assembly-stats script was set to ignore contigs smaller than 1 kb. In many cases we were able to assemble contigs that were larger than 50 kb and in four cases the largest contig was larger than 150 kb.

Taxonomic assignment of scaffolds

To conduct a detailed analysis on two assembled data sets, 4–1 (vacuum debris) and 3–7 (filter sample), blastn was run for each scaffold in the assembly that was ≥ 1 kb using the current version of GenBank to obtain the top hit for each contig (Table 5). We estimated the contribution of each organism in the assembly by summing the length of the alignments to each organism and dividing by the total number of bases in the assembly (see S14 and S15 Tables in S2 File for contig blastn results and sums). A pie chart was generated for the top hits in samples 4–1 and 3–7 (Figs 10 and 11). In Fig 10, note that the most abundant species, M. osloensis (37%), is the same species identified as most abundant by Bracken (Fig 5) and MTSv (Fig 3). The assembly of sample 3–7 revealed a different distribution of species. Malassezia restricta, a skin fungus, was dominant (29%) and hits to all nine chromosomes were observed. It was only identified at the genus level by MTSv, but it was identified as Malassezia globosa by MetaPhlAn2 (S7 Table in S2 File The second most abundant species, C. acnes (15%) was identified by all three read-based methods. M. osloensis (8%) was also identified in this sample. The M. osloensis reads from sample 4–1 were assembled from the metagenome by taxonomically binning the reads and running a whole genome assembly assembled using SPAdes [19] (Fig 12). The genome was also annotated using Prokka (S16 Table in S2 File The annotation provided 1992 predicted open reading frames, four ribosomal RNA genes, three of which were partial, and 41 tRNA genes. Although classical typing characterized M. osloensis to be non-motile [25], we identified proteins associated with the production of type IV pilus, which is common in other Moraxella species [26]. Several chemotaxis proteins (CheB, CheW, and CheZ) were also identified. This reference-free MAG generated a single circular scaffold of ~2.89 mbp, which is remarkably close to the published ~2.5 mbp chromosome sizes from cultured M. osleonsis [27]. In addition to the larger assemblies, scaffolds assembled from each data set allowed the identification of their closest taxonomic relative from the NCBI nt database. This was highly successful with high quality identities for 75%, or more, of the scaffolds across the study (S17 Tbale in S3 File and S18 Table in S3 File In all, over 800,000 taxonomic identities were assigned to scaffolds.

Fig 10. Taxonomy of the assembled metagenome of sample 4–1.

Each contig greater than 1kb from the assembly of sample 4–1 was analyzed with blastn. The contribution of each organism was estimated. A pie chart was generated for the top hits to illustrate the taxonomic composition.

Fig 11. Taxonomy of the assembled metagenome of sample 3–7.

Each contig greater than 1kb from the assembly of sample 3.7 was analyzed with blastn. The contribution of each organism was estimated. A pie chart was generated for the top hits to illustrate the taxonomic composition.

Fig 12. Circular map with open reading frames of the assembled genome of Moraxella osloensis from sample 4–1.

Map was created with BLAST Ring Image Generator (BRIG) [28].


Environmental metagenomic characterization of the spacecraft assembly facilities is critical to understand their potential microbial diversity and will lead to better risk assessment and improve methods to reduce the risk of accidental interplanetary contamination. Using whole metagenome sequencing and multi-classification bioinformatic analysis, we generated high confidence identification of microorganisms associated with floor wipe and filter samples at the JPL SAF. The 16S rRNA and 18S rRNA counts are generally correlated with ISO classification, with the SAF (Blg 179) having the highest counts of bacteria and fungi. Use of three different classification methods provided a comprehensive classification of samples. Not surprisingly, dominantly identified organisms were mostly those associated with human skin, such as C. acnes, A. lwoffii and Malassezia, or were environmental organisms in the phyla Actinobacteria and Proteobacteria. Category 4 samples (vacuum particles from cleanroom vacuums) were of most concern to Planetary Protection because they included a number of taxa with the potential to be resistant to desiccation, radiation, UV irradiation, low temperature, heavy metals, etc. Many of these and others were oligotrophs that can survive under nutrient limiting conditions. These included members of the Geodermatophilaceae (Blastococcus, Geodermatophilus and Modestobacter), Methylobacterium, Deinococcus, sphingomonads, and Stenotrophomonas. We identified 17 (12 in top 40%) of 22 genera recently described as being newly isolated from the International Space Station [29], suggesting that our methods were comprehensive.

Functional annotation failed to identify abundant protein families associated with spore formation, desiccation resistance, radiation resistance or other properties that would allow organisms to survive in extreme environments. Nevertheless, our HUMANn2 analysis of pathways and organisms provided a clue. One of the key mechanisms of radiation and other defiance in D. radiodurans is resistance to reactive oxygen species (ROS) [30]. Catalase and superoxide dismutase can contribute to ROS resistance, but another significant mechanism is via the glyoxylate bypass of the TCA cycle [31,32], which avoids the free radical generating steps of the TCA cycle. In the HUMANnN2 analysis, the glyoxylate bypass of the TCA cycle was only associated with Modestobacter multiseptatus in vacuum debris samples, yet the glyoxylate cycle was associated with additional organisms in the ULPA filter and 4 vacuum debris samples.

Our analyses were limited by low biomass in samples, as such, some could not be submitted to WGS. Focused DNA sequencing strategies, such as PCR based amplicons, can provide a superior limit of detection (LOD) while still generating genomic sequences to enhance sensitivity and specificity in the analysis. In contrast, WMGS, as performed in this study, provided the opportunity to query samples in an unbiased fashion to identify unanticipated environmental components. Clearly, both are valuable approaches with particular attributes that provide complementary information. Focused sequencing amplicons can be developed for specific microbial targets from comparative genomic analysis. This would provide for the appropriate sensitivity and specificity, while gaining the superior LOD. When there are known high consequence microbes (examples of which are reported here) that must be detected, a focused amplicon approach should be developed and included in an ongoing monitoring program. Future decreases in cost per read, increases in sequencing fidelity, and longer read lengths will make MWGS even more powerful.


As humanity embarks on interplanetary exploration and travel, we must continue to improve our methods to reduce chances of space contamination with earth-based microbes. Recent advances in metagenomic sequencing and bioinformatic analyses is making this possible. Here we gained valuable insight on both microbial burden in spacecraft and instrument manufacturing facilities but also in appropriate analysis methodologies. Classification of metagenomic reads is both algorithm and database dependent, and individually these methods provide unique but incomplete views of the environmental microbiome, even in a highly controlled, low microbial burden setting as the JPL facilities described here. Using a multi-classification approach, we believe that we have captured a majority of the known microbial taxa within the JPL cleanroom environments. WMGS is clearly a powerful technique for unbiased monitoring of microbial burden of spacecraft assembly facilities, both in terms of presence and likely functional impact. Assembled WMGS data can also be used to capture new microbial “dark matter”, microbes that lack current database representation and, as such, may represent previously unforeseen challenges to space exploration. A comprehensive, mulit-classification metagenomic approach is therefore critical for environmental monitoring of ongoing and future spacecraft and instrument manufacturing.


  1. 1. La Duc MT, Dekas A, Osman S, Moissl C, Newcombe D, Venkateswaran K: Isolation and Characterization of Bacteria Capable of Tolerating the Extreme Conditions of Clean Room Environments. Applied and Environmental Microbiology 2007, 73(8):2600–2611. pmid:17308177
  2. 2. Gradini R, Chen F, Tan R, Newlin L: A summary on cutting edge advancements in sterilization and cleaning technologies in medical, food, and drug industries, and its applicability to spacecraft hardware. Life Sciences in Space Research 2019, 23:31–49. pmid:31791604
  3. 3. Gilbert JA, Stephens B: Microbiology of the built environment. Nature Reviews Microbiology 2018, 16(11):661–670. pmid:30127345
  4. 4. Chase J, Fouquier J, Zare M, Sonderegger DL, Knight R, Kelley ST, Siegel J, Caporaso JG, Gilbert JA: Geography and Location Are the Primary Drivers of Office Microbiome Composition. mSystems 2016, 1(2):e00022–00016. pmid:27822521
  5. 5. Jacobs JL, Sundin GW: Effect of Solar UV-B Radiation on a Phyllosphere Bacterial Community. Applied and Environmental Microbiology 2001, 67(12):5488–5496. pmid:11722897
  6. 6. DeBruyn JM, Fawaz MN, Peacock AD, Dunlap JR, Nixon LT, Cooper KE, Radosevich Met al: Gemmatirosa kalamazoonesis gen. nov., sp. Nov., a member of the rarely-cultivated bacterial phylum Gemmatimonadetes. The Journal of General and Applied Microbiology 2013, 59(4):305–312.
  7. 7. Xiong W, Yin C, Peng W, Deng Z, Lin S, Liang R: Characterization of an 17β-estradiol-degrading bacterium Stenotrophomonas maltophilia SJTL3 tolerant to adverse environmental factors. Applied Microbiology and Biotechnology 2020, 104(3):1291–1305.
  8. 8. Minich JJ, Zhu Q, Janssen S, Hendrickson R, Amir A, Vetter R, Hyde J, Doty MM, Stillwell K, Benardini J et al: KatharoSeq Enables High-Throughput Microbiome Analysis from Low-Biomass Samples. mSystems 2018, 3(3):e00218–00217. pmid:29577086
  9. 9. Bashir M, Ahmed M, Weinmaier T, Ciobanu D, Ivanova N, Pieber TR, Vaishampayan PA et al: Functional metagenomics of spacecraft assembly cleanrooms: presence of virulence factors associated with human pathogens. Front Microbiol 2016, 7:1321. pmid:27667984
  10. 10. Woyke T, Sczyrba A, Lee J, Rinke C, Tighe D, Clingenpeel S, Malmstrom R, Stepanauskas R, Cheng J-F et al: Decontamination of MDA Reagents for Single Cell Whole Genome Amplification. PLOS ONE 2011, 6(10):e26161. pmid:22028825
  11. 11. Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, Creasy HH, Earl AM, FitzGerald MG, Fulton RS et al: Structure, function and diversity of the healthy human microbiome. Nature 2012, 486(7402):207–214. pmid:22699609
  12. 12. Bolger AM, Lohse M, Usadel B: Trimmomatic: a Flexible Trimmer for Illumina Sequence Data. Bioinformatics 2014, 30(15):2114–2120. pmid:24695404
  13. 13. Lu J, Breitwieser FP, Thielen P, Salzberg SL: Bracken: Estimating Species Abundance in Metagenomics Data. PeerJ Computer Science 2017, 3(e104).
  14. 14. Wood DE, Salzberg SL: Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biology 2014, 15(3):R46. pmid:24580807
  15. 15. Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, Tett A, Huttenhower C, Segata N et al: MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nature Methods 2015, 12(10):902–903. pmid:26418763
  16. 16. Franzosa EA, Morgan XC, Segata N, Waldron L, Reyes J, Earl AM, Giannoukos G, Boylan MR, Ciulla D, Gevers D et al: Relating the metatranscriptome and metagenome of the human gut. Proceedings of the National Academy of Sciences 2014, 111(22):E2329–E2338. pmid:24843156
  17. 17. Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH, UniProt C: UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 2015, 31(6):926–932. pmid:25398609
  18. 18. Peng Y, Leung HCM, Yiu SM, Chin FYL: IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 2012, 28(11):1420–1428.
  19. 19. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD et al: SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology 2012, 19(5):455–477. pmid:22506599
  20. 20. Seemann T: Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014, 30(14):2068–2069. pmid:24642063
  21. 21. Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S et al: Database resources of the national center for biotechnology information. Nucleic Acids Res 2022, 50(D1):D20–D26. pmid:34850941
  22. 22. Gtari M, Essoussi I, Maaoui R, Sghaier H, Boujmil R, Gury J, Pujic P, Brusetti L, Chouaia B, Crotti E et al: Contrasted resistance of stone-dwelling Geodermatophilaceae species to stresses known to give rise to reactive oxygen species. FEMS Microbiology Ecology 2012, 80(3):566–577.
  23. 23. Sghaier H, Hezbri K, Ghodhbane-Gtari F, Pujic P, Sen A, Daffonchio D, Boudabous A, Tisa LS, Klenk H-P, Armengaud J et al: Stone-dwelling actinobacteria Blastococcus saxobsidens, Modestobacter marinus and Geodermatophilus obscurus proteogenomes. The ISME Journal 2016, 10(1):21–29.
  24. 24. Kim J-Y, Kim D-U, Kang M-S, Jang JH, Kim SJ, Kim MJ, Lee JY, Lee YS, Zhang J, Lim S et al: Roseomonas radiodurans sp. nov., a gamma-radiation-resistant bacterium isolated from gamma ray-irradiated soil. International Journal of Systematic and Evolutionary Microbiology 2018, 68(8):2443–2447.
  25. 25. Vandamme P, Gillis M, Vancanneyt M, Hoste B, Kersters K, Falsen E: Moraxella lincolnii sp. nov., Isolated from the Human Respiratory Tract, and Reevaluation of the Taxonomic Position of Moraxella osloensis. International Journal of Systematic and Evolutionary Microbiology 1993, 43(3):474–481.
  26. 26. Luke NR, Howlett AJ, Shao J, Campagnari AA: Expression of Type IV Pili by Moraxella catarrhalis Is Essential for Natural Competence and Is Affected by Iron Limitation. Infection and Immunity 2004, 72(11):6262–6270.
  27. 27. Lim JY, Hwang I, Ganzorig M, Huang SL, Cho GS, Franz C, Lee K et al: Complete Genome Sequences of Three Moraxella osloensis Strains Isolated from Human Skin. Genome Announc 2018, 6(3). pmid:29348360
  28. 28. Alikhan N-F, Petty NK, Ben Zakour NL, Beatson SA: BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics 2011, 12(1):402. pmid:21824423
  29. 29. Mora M, Wink L, Kögler I, Mahnert A, Rettberg P, Schwendner P, Demets R, Cockell C, Alekhova T, Klingl A et al: The International Space Station selects for microorganisms adapted to the extreme environment, but does not induce genomic and physiological changes relevant for human health. bioRxiv 2019:533752.
  30. 30. Krisko A, Radman M: Biology of Extreme Radiation Resistance: The Way of Deinococcus radiodurans. Cold Spring Harbor Perspectives in Biology 2013, 5(7).
  31. 31. Liu Y, Zhou J, Omelchenko MV, Beliaev AS, Venkateswaran A, Stair J, Wu L, Thompson DK, Xu D, Rogozin IB et al: Transcriptome dynamics of Deinococcus radiodurans recovering from ionizing radiation. Proceedings of the National Academy of Sciences 2003, 100(7):4191–4196.
  32. 32. Ghosal D, Omelchenko MV, Gaidamakova EK, Matrosova VY, Vasilenko A, Venkateswaran A, Zhai M, Kostandarithes HM, Brim H, Makarova KS et al: How radiation kills cells: survival of Deinococcus radiodurans and Shewanella oneidensis under oxidative stress. FEMS Microbiol Rev 2005, 29(2):361–375.