The Approach to Sample Acquisition and Its Impact on the Derived Human Fecal Microbiome and VOC Metabolome

Recent studies have illustrated the importance of the microbiota in maintaining a healthy state, as well as promoting disease states. The intestinal microbiota exerts its effects primarily through its metabolites, and metabolomics investigations have begun to evaluate the diagnostic and health implications of volatile organic compounds (VOCs) isolated from human feces, enabled by specialized sampling methods such as headspace solid-phase microextraction (hSPME). The approach to stool sample collection is an important consideration that could potentially introduce bias and affect the outcome of a fecal metagenomic and metabolomic investigation. To address this concern, a comparison of endoscopically collected (in vivo) and home collected (ex vivo) fecal samples was performed, revealing slight variability in the derived microbiomes. In contrast, the VOC metabolomes differ widely between the home collected and endoscopy collected samples. Additionally, as the VOC extraction profile is hyperbolic, with short extraction durations more vulnerable to variation than extractions continued to equilibrium, a second goal of our investigation was to ascertain if hSPME-based fecal metabolomics studies might be biased by the extraction duration employed. As anticipated, prolonged extraction (18 hours) results in the identification of considerably more metabolites than short (20 minute) extractions. A comparison of the metabolomes reveals several analytes deemed unique to a cohort with the 20 minute extraction, but found common to both cohorts when the VOC extraction was performed for 18 hours. Moreover, numerous analytes perceived to have significant fold change with a 20 minute extraction were found insignificant in fold change with the prolonged extraction, underscoring the potential for bias associated with a 20 minute hSPME.


Introduction
Recent animal and human studies have highlighted the importance of the microbiota in maintaining a healthy state as well as promoting disease states, including not only gastrointestinal diseases but also chronic systemic metabolic and inflammatory diseases [1]. The role of the intestinal microbiota in regulating metabolism as well as intestinal and systemic immunity is now well established [2,3]. The intestinal microbiota exerts its profound physiological and pathological effects primarily through its metabolites, and not surprisingly metabolomics investigations have begun to evaluate the diagnostic and health implications of volatile organic compounds (VOCs) isolated from human feces [4][5][6][7][8].
Related by their volatility at ambient temperatures, the VOCs comprise a large and structurally diverse family of carbon-based molecules, of both natural and man-made origin. Specialized sampling methods, such as headspace solid-phase microextraction (hSPME), have greatly enabled the isolation of VOCs from a wide array of biological samples [9][10][11][12], including feces [4][5][6][7][8][13]. hSPME typically involves the partitioning of the VOCs from the headspace above a sample into a polymeric sorbent adhered to a fused silica rod (fiber), subsequent desorption of the VOCs into the heated inlet of a gas chromatograph, separation of the VOC mixture by gas-liquid partition chromatography, and detection by mass spectrometry. Spectral comparison to a reference database enables VOC identification.
The approach to stool sample collection is an important consideration that could potentially introduce bias and profoundly affect the outcome of a fecal metagenomic and metabolomic investigation. In vivo sample collection is generally desirable, as the resulting microbiome/metabolome is then a reflection of the native biological context, without potential ex vivo effects. However, to date, human fecal VOC investigations have typically examined samples collected after passage, allowing the stool to become exposed to the ambient environment. Since the descending and sigmoid colon are predominantly anaerobic, with the resident bacteria primarily comprised of obligate anaerobes [14,15], there is the potential that ex vivo bacterial metabolism, occurring between the time of passage and the freezing of the stool, may significantly alter the composition of the fecal VOC metabolome. While home stool collection is undoubtedly easier, far more feasible, and more economical to perform, it remained unclear if significant differences in fecal composition would appear if the samples were alternatively collected directly from the sigmoid lumen and immediately snap frozen to avoid any ex vivo effects. This information is critical for interpretation of stool metabolomics results, particularly before stool VOCs can be reliably used as diagnostic and risk stratification tools and/or stool VOC data can be exploited to better understand the role of intestinal microbiota in the pathogenesis of gastrointestinal and systemic diseases.
Potential sampling bias might also be introduced by the approach to the hSPME technique. Fiber choice and extraction duration are two prime considerations when performing a hSPME analysis of feces. While numerous fiber types are commercially available, the polarity of the targeted analytes generally dictates fiber selection [12]. As the fecal VOC metabolome is chemically diverse, a rational combination of several different sorbent chemistries is essential for a global metabolomic analysis of all the indigenous analytes [8]. Regardless of the sorbent type employed, the fecal VOC extraction profile is hyperbolic, with short duration extractions (e.g. 20 minutes) more susceptible to variable analyte titers as a consequence of subtle deviations in extraction duration [8]. While several fecal VOC investigations have utilized short hSPME durations [4][5][6][7], a quantitative hSPME analysis is ideally performed when the analyte distribution is in equilibrium between the sample and the fiber coating, during the plateau of a hyperbolic extraction profile (i.e. 16-18 hrs for fecal VOC extraction) [8,16]. It remains unclear if metabolome differences observed using short extraction durations simply reflect sampling bias due to the innate variability of the hSPME technique more so than variability in metabolite abundance among the samples. It is noteworthy however, that for some metabolites, titers plateau then subsequently wane with prolonged extraction duration [8], a phenomenon attributed to higher affinity compounds displacing those with lower affinity for the fiber, thereby lowering the titer of the latter [17].
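The sensitivity of short extractions to small timing deviations can be illustrated with a simple saturation curve. The sketch below assumes a rectangular hyperbola, A(t) = A_max * t / (t_half + t); the functional form, the function names, and the half-saturation time of 60 minutes are illustrative assumptions, not values measured in this study. The displacement of low affinity analytes during prolonged extraction is not modeled.

```python
def uptake(t_min, a_max=1.0, t_half=60.0):
    """Hypothetical hyperbolic extraction profile.

    Returns the signal at extraction time t_min (minutes). a_max is the
    plateau signal and t_half the time to reach half of it; both values
    are assumptions chosen for illustration only.
    """
    return a_max * t_min / (t_half + t_min)


def relative_error(t_min, dt_min=1.0):
    """Relative change in signal caused by extracting dt_min minutes too long."""
    return (uptake(t_min + dt_min) - uptake(t_min)) / uptake(t_min)


short = relative_error(20.0)        # 20 minute extraction
long_ = relative_error(18 * 60.0)   # 18 hour extraction

print(f"20 min extraction: {short:.2%} signal change per extra minute")
print(f"18 hr extraction:  {long_:.4%} signal change per extra minute")
```

Under these assumed parameters, a one minute deviation changes the signal far more on the rising part of the curve than on the plateau, which is the qualitative point made in the text.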
We describe here a comparative microbiome and VOC metabolome analysis of fecal samples collected directly from the sigmoid lumen (via un-prep sigmoidoscopy) then frozen right away with those collected after passage (at home) and then frozen after a period of time. Additionally, we compare the derived VOC metabolome obtained using 20 min and 18 hr hSPME durations.

Fecal samples
This investigation was approved by the Institutional Review Boards at George Mason University and Rush University Medical Center and conducted after an informed, written research consent was signed by all study participants. Fecal samples were collected from 17 healthy subjects (a total of two samples were obtained from each subject, each sample collected on separate visits to the hospital, as detailed below). Table 1 depicts the demographic characteristics of the study subjects. Each subject completed a detailed health questionnaire that showed that none had any chronic GI or systemic disease, none had any GI symptoms, none were taking any regular medication except for blood pressure and cholesterol, and none used supplements including probiotics or prebiotics. No subject took antibiotics, for at least three months, and none were excessive drinkers of alcohol (less than 2 drinks per sitting per day for women and less than 4 drinks for men). The study participants were instructed not to change their usual dietary consumption and, as verified by dietary questionnaire, all participants had no change in their typical diet or health status between the two stool collections. Each subject had a stool collection on two occasions: once during sigmoidoscopy, Visit 1, and another time at home, within 24 hrs prior to Visit 2. The interval between the study subject's two stool collections was never more than 7 days (Table 1). For endoscopy stool collection, Visit 1, each study subject underwent a limited un-sedated sigmoidoscopy after an informed, written consent. There was no colon preparation prior to sigmoidoscopy. The stool in the lumen of distal sigmoid was grabbed by a Roth Net (Ref 00711052; US Endoscopy, Mentor, OH) and removed with the sigmoidoscope. From the sigmoidoscope, the stool was then placed in a cryovial and placed in liquid nitrogen in the endoscopy room. 
Upon removal from the liquid nitrogen, the cryovial was immediately stored in a -80 °C freezer until analysis. At the completion of the study subject's sigmoidoscopy, each subject was given a home stool collection kit to be returned to the hospital at Visit 2.
For home stool collection, study subjects were provided with the supplies and instructions that informed them on how to put their stool into a BD Gaspak EZ Anaerobe Gas Generating Pouch System with Indicator (Ref 260683; Becton, Dickinson and Company, Sparks, MD) in order to minimize the exposure of stool to high oxygen ambient atmosphere. Subjects were asked to have a bowel movement, within 24 hours of Visit 2, to keep the sealed anaerobic stool bag in a cold environment, and to bring the anaerobic stool bag to the hospital. The stool was then immediately stored in a -80 °C freezer. The interval between passage of stool and storage at -80 °C was within 12 to 24 hours.

Microbiome analysis
We interrogated the microbial taxa associated with the gut fecal microbiome using multi-tag pyrosequencing (MTPS). This technique allows the rapid sequencing of multiple samples at one time, yielding thousands of sequence reads per sample [18]. Specifically, we have generated a set of 96 emulsion PCR fusion primers that contain the 454 emulsion PCR linkers on the 27F primer (5′-AGAGTTTGATCCTGGCTCAG-3′) and 355R′ (5′-GCTGCCTCCCGTAGGAGT-3′) and a different eight-base "barcode" between the A adapter and the 27F primer. Thus each fecal sample was amplified with unique barcoded forward 16S rRNA primers, and then up to 96 samples were pooled and subjected to emulsion PCR and pyrosequenced using a GS-Junior pyrosequencer (Roche). Data from each pooled sample were "deconvoluted" by sorting the sequences into bins based on the barcodes using custom PERL scripts. Reads were filtered based on quality scores (>30 quality units assigned by the 454) and length (>180 bp). Thus we were able to normalize each sample by the total number of reads from each barcode. We have noted that ligating tagged primers to PCR amplicons distorts the abundances of the communities, and thus it is critical to incorporate the tags during the original amplification step [18]. We identified the taxa present in each sample using the Bayesian analysis tool in Version 10 of the Ribosomal Database Project (RDP10). The abundances of the bacterial identifications were then normalized using a custom PERL script, and taxa present at >1% of the community were tabulated. We chose this cutoff because 2,000 reads per sample will only reliably identify community components that are >1% in abundance, while acknowledging that those less than 1% may indeed have significant biological and clinical relevance. A Pearson (n) principal component analysis was performed using the taxa abundance table and the statistical package XLSTAT 2012.6.02. Pearson's correlation coefficients were calculated using Microsoft Excel.
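The per-sample normalization and the >1% abundance cutoff described above can be sketched as follows. This is an illustration in Python, not the custom PERL script used in the study; the function name and the example read counts are hypothetical.

```python
def abundant_taxa(read_counts, min_fraction=0.01):
    """Convert per-taxon read counts for one sample into relative abundances
    and keep only taxa strictly above min_fraction of that sample's reads.

    read_counts: dict mapping taxon name -> number of reads assigned to it.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    fractions = {taxon: n / total for taxon, n in read_counts.items()}
    return {taxon: f for taxon, f in fractions.items() if f > min_fraction}


# Hypothetical sample with 2,000 reads; at this depth the 1% cutoff
# corresponds to 20 reads.
sample = {
    "Bacteroides": 900,
    "Faecalibacterium": 600,
    "Roseburia": 485,
    "Rare genus": 15,
}
kept = abundant_taxa(sample)
print(kept)
```

In this made-up sample the rare genus (15 of 2,000 reads, 0.75%) falls below the cutoff and is not tabulated.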
Unsupervised hierarchical clustering and heatmap generation were performed using the statistical package R (http://www.r-project.org/) using the Ward method and Pearson correlation for the distance measure.

hSPME procedure
Aliquots (0.2 g) of each frozen fecal sample were dispensed into 4 mL WISP style screw thread amber glass vials, sealed with Black Top Hat PTFE/Silicone caps (J.G. Finneran, Vineland, NJ), and stored at -80 °C until analyzed. The following three SPME fibers (Supelco, Bellefonte, PA) were used in our investigation: 75 μm carboxen-polydimethylsiloxane (CAR-PDMS), 85 μm polyacrylate (PA), and 50/30 μm divinylbenzene (DVB)-CAR-PDMS. All fibers were preconditioned before use, as per the manufacturer's instructions. All analyses were performed in duplicate. Sample vials were heated to 60 °C for 30 minutes, then the hSPME fiber was positioned into the headspace above the feces and the fiber exposed to the volatiles for 20 minutes or 18 hours. The sample vial temperature was held at 60 °C for the duration of the exposure. The fiber assembly was then placed into the GC inlet for thermal desorption of the analytes.

GC-MS Instrument
Samples were analyzed using an Agilent 7890A GC equipped with a DB5-MS capillary column (Agilent, Palo Alto, CA; 30 m length, 0.25 mm ID, and 0.25 μm film thickness), a 0.75 mm ID SPME injection port liner, and a 5975 inert XL mass selective detector (MSD) with triple axis detector. The GC injection port was operated in splitless mode at select inlet temperatures, dependent upon the SPME fiber used (300 °C, CAR-PDMS; 280 °C, PA; 270 °C, DVB-CAR-PDMS). Helium carrier gas was set to a flow rate of 1.17 mL/min. The GC oven was held at an initial temperature of 35 °C for 1 min, ramped at 3 °C/min to 80 °C, then 10 °C/min to 120 °C, and finally 40 °C/min to 260 °C, where the temperature was held for 1.5 min. The total run time for the analysis was 25.0 min. The MSD was scanned from 30 to 550 amu at a rate of 2.81 scans/sec.
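The stated total run time follows directly from the oven program. As a quick check, the short sketch below adds up the hold and ramp segments; the function name and argument layout are illustrative only.

```python
def run_time_min(initial_temp, initial_hold, ramps, final_hold):
    """Total GC oven program time in minutes.

    initial_temp: starting oven temperature (deg C)
    initial_hold: hold time at the starting temperature (min)
    ramps: list of (rate in deg C per min, target temperature in deg C)
    final_hold: hold time at the final temperature (min)
    """
    total = initial_hold
    temp = initial_temp
    for rate, target in ramps:
        total += (target - temp) / rate  # time spent on this ramp
        temp = target
    return total + final_hold


# Oven program as described in the text:
# 35 C for 1 min; 3 C/min to 80 C; 10 C/min to 120 C; 40 C/min to 260 C; hold 1.5 min.
total = run_time_min(35, 1.0, [(3, 80), (10, 120), (40, 260)], 1.5)
print(total)  # 1 + 15 + 4 + 3.5 + 1.5 = 25.0
```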

Data processing and analysis
VOCs were identified in the GC-MS chromatograms using the National Institute of Standards and Technology (NIST, Washington, DC) Automated Mass Spectral Deconvolution and Identification System (AMDIS, ver. 2.69) software and mass spectral library (NIST08). Only compounds with 85% or greater probability of match to a molecule in the NIST08 library were considered. Each AMDIS outfile, containing a list of identified metabolites and their corresponding peak height values, was filtered by custom Perl scripts designed to remove background analytes (e.g. siloxanes) and eliminate metabolite redundancies (retaining the replicate with the highest peak value). Duplicate sample data sets were combined by merging their AMDIS outfiles and averaging the corresponding peak height values. A comprehensive, three-fiber metabolite dataset was prepared for each sample by pooling the metabolites obtained using the CAR-PDMS, PA, and DVB-CAR-PDMS fibers and summing the corresponding peak height values (a peak height of zero was imputed for missing metabolites). A Perl script was then used to assemble a complete metabolite matrix containing all of the endoscopy and passaged home collected samples and their accompanying metabolites. Metabolites present in ≤20% of the samples were treated as one-offs and were removed. The metabolite matrix was arranged into two cohorts (an endoscopy cohort and a passaged home collected cohort) and outlier peak height values were identified in each cohort using a plot of (mean - median)/median for each analyte and a cutoff value ≥1.5. Outliers were replaced with the median value for that metabolite within the cohort. Metabolite peak height values were then standardized across the two cohorts by conversion to Z-scores ((peak height - mean)/standard deviation). A Pearson (n) principal component analysis was then performed using the standardized metabolite matrix and the statistical package XLSTAT 2012.6.02.
Pearson's correlation coefficients were calculated using Microsoft Excel. Unsupervised hierarchical clustering and heatmap generation were accomplished using the R statistical package and the Ward method and Pearson correlation for the distance measure. Fold change analysis was performed using MetaboAnalyst 2.0 [19]. Custom Perl scripts were used to combine and compare the cohort metabolites to identify common and unique metabolites and to group the metabolites and their relative abundance into defined chemical classes. Bar graphs were prepared using GraphPad Prism ver. 4.0.
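Two of the matrix preparation steps described in this section, removal of infrequent metabolites and Z-score standardization, can be sketched as follows. This is an illustration in Python rather than the custom Perl scripts used in the study. The function names and peak heights are hypothetical, "present" is taken to mean a peak height above zero, and the use of the sample standard deviation is an assumption, since the text does not say which form was used. The outlier replacement step is not shown.

```python
from statistics import mean, stdev


def filter_infrequent(matrix, max_fraction=0.20):
    """Drop metabolites detected in max_fraction or fewer of the samples.

    matrix: dict mapping metabolite name -> list of peak heights, one per
    sample, with zero imputed where the metabolite was not detected.
    """
    kept = {}
    for name, heights in matrix.items():
        detected = sum(1 for h in heights if h > 0)
        if detected / len(heights) > max_fraction:
            kept[name] = heights
    return kept


def z_scores(heights):
    """Standardize one metabolite across all samples:
    (peak height - mean) / standard deviation."""
    m = mean(heights)
    sd = stdev(heights)  # sample SD; an assumption, see lead-in
    if sd == 0:
        return [0.0 for _ in heights]
    return [(h - m) / sd for h in heights]


# Hypothetical peak heights for five samples.
matrix = {
    "indole": [10.0, 20.0, 30.0, 40.0, 50.0],
    "one-off": [0.0, 0.0, 0.0, 0.0, 7.0],  # present in 1 of 5 samples (20%)
}
filtered = filter_infrequent(matrix)
standardized = {name: z_scores(h) for name, h in filtered.items()}
print(standardized)
```

The metabolite detected in exactly 20% of the samples is removed, consistent with the ≤20% rule, and the remaining metabolite is centered on zero.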

Results and Discussion
To ascertain if the fecal composition is affected by the approach to sample acquisition, we obtained 34 stool samples from 17 healthy volunteers (two samples from each volunteer were collected, one in vivo via endoscopy and another ex vivo by home collection after passage) and the samples were compared in terms of their microbiome and VOC metabolome.
Using multi-tag pyrosequencing, the microbiome composition of the fecal samples was determined and the identified taxa were compared among the samples. A total of 50 different taxa (22 families, 50 genera) were identified in the feces. Figure 1 shows the distribution of the samples along the first two axes of a principal component analysis (PCA). As seen in Figure 1A, among the two cohorts as a whole, the majority of the samples tightly cluster on the PCA plot, indicative of relatively small variance in the overall microbiome composition (in contrast to the metabolome, in which the two cohorts are well segregated, as discussed below). A pairwise comparison of matched home passage and endoscopy collected samples reveals that individual pairs demonstrate some degree of variance in their microbiome composition (Figure 1B and Figure S1). Thus, the majority (81%) of the home collected and corresponding endoscopy collected samples have strong to very strong correlations in their microbiome content (0.7<r<0.99). Additionally, unsupervised hierarchical clustering analysis does not significantly differentiate the microbiomes derived from the home collected and endoscopy collected fecal samples (Figure S2). Collectively, considering the degree of similarity observed in the derived microbiome content among the home and endoscopy collected samples, the significant effort and expense associated with the endoscopic collection of stool may not be justified for a metagenomics investigation of human feces.
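The pairwise comparison of matched samples rests on Pearson's correlation coefficient between the two taxa abundance profiles of a subject. The sketch below, in Python rather than the spreadsheet used in the study, computes r for one hypothetical pair and bins it using the ranges quoted in the Figure S1 legend (r>0.7 strong, 0.2<r<0.7 weak to moderate, r<0.2 very weak). The abundance values and the handling of values falling exactly on a boundary are assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strength(r):
    """Bin a correlation coefficient using the ranges given in the text."""
    if r > 0.7:
        return "strong"
    if r >= 0.2:  # boundary handling is an assumption
        return "weak to moderate"
    return "very weak"


# Hypothetical relative abundances of four taxa in one subject's
# home collected and endoscopy collected samples.
home = [0.40, 0.30, 0.20, 0.10]
endoscopy = [0.35, 0.35, 0.18, 0.12]

r = pearson_r(home, endoscopy)
print(round(r, 3), strength(r))
```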
To determine if the VOC metabolome is affected by the fecal collection technique, VOCs from the endoscopy and passage collected samples were extracted by hSPME then identified by GC-MS. To ensure greater metabolome coverage while still accommodating reasonable throughput, 3 different hSPME fiber types were chosen for this analysis (CAR-PDMS, PA, and DVB-CAR-PDMS). Whenever possible (sample abundance permitting), the extractions were performed in duplicate (using different aliquots) and the replicates combined by averaging chromatographic peak height values. As the fecal VOC extraction profile is hyperbolic, we elected to perform both 20 minute and 18 hour extractions of each sample (using different aliquots), to permit a comparison of the results. Hence, a total of 408 chromatograms were acquired from the 34 participant fecal samples.
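The chromatogram count follows from the study design, assuming every sample was extracted in duplicate with each fiber at each duration. A minimal check, with variable names chosen for illustration:

```python
samples = 34      # 17 subjects x 2 collection methods
fibers = 3        # CAR-PDMS, PA, DVB-CAR-PDMS
durations = 2     # 20 minute and 18 hour extractions
replicates = 2    # duplicate extractions (sample abundance permitting)

chromatograms = samples * fibers * durations * replicates
print(chromatograms)  # 408
```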
Utilizing a 20 minute hSPME, a combined total of 1371 different VOCs were identified in the endoscopy collected cohort. Similarly, when extracted for an identical duration, the home passage collected cohort contains 1404 total analytes, a difference of only 33 analytes relative to the endoscopy group. As anticipated, given the hyperbolic nature of the fecal VOC extraction profile [8], an 18 hour extraction isolates considerably more VOCs; 2097 total metabolites are associated with the endoscopy cohort and 2190 are found in the home passaged group (a difference between the cohorts of only 93 metabolites). Overall, regardless of the extraction duration used, it is apparent that both approaches to fecal collection yield a similar number of total VOCs. Figure 2 compares the composition of the cohorts in terms of the number of identified analytes and the relative abundance in each of the indicated chemical classes. With the 20 minute extraction duration, the overall chemical distribution appears quite similar among the two collection techniques (Figure 2A), with a slight bias towards oxidized metabolites in the home passaged cohort (alcohols, aldehydes, acids/esters) and reduced metabolites in the endoscopy cohort (alkanes, alkenes). With an 18 hour hSPME, the similarities among the cohorts are even more pronounced (Figure 2B), implicating the variability associated with short hSPME durations [5] as a primary contributor to the variability observed between the 20 minute metabolomes (Figure 2A). A similar trend is also observed when comparing the relative analyte abundance ( Figures 2C and 2D); specifically, greater variability associated with the 20 minute hSPME, yet a very similar distribution of metabolite abundance among the chemical classes, whether the feces is isolated in vivo by endoscopy or ex vivo using a home passage collection technique.
Although the overall chemical milieu appears similar between the two cohorts (based on the comparisons above), noteworthy differences arise when comparing the specific composition of analytes within each of the chemical classes. Figure S3A presents the similarities and differences within each of the chemical classes, comparing the 20 minute metabolomes. While a significant number of metabolites are common to both of the cohorts, in most of the chemical classes a substantial number are uniquely associated with either the endoscopy or home collected samples. Equivalent results are obtained when comparing the 18 hr metabolomes (data not shown). While these metabolomic differences between cohorts could be a reflection of the fecal collection technique, the 'cohort-unique' metabolites are more likely attributed to variations in dietary intake, as nearly all of these unique analytes appear in only a small proportion (20% or fewer) of the stool samples analyzed (Figure S4). When these low frequency metabolites are omitted, the metabolome composition among the cohorts appears nearly identical (Figure S3B and S3C and Table 2). One notable exception, 1,3-bis(1,1-dimethylethyl)-benzene, is exclusively linked to the endoscopy collected fecal samples and occurs at a high frequency among those samples, in both the 20 minute and 18 hour metabolomes (Table 2). A radiolysis product of polypropylene, 1,3-bis(1,1-dimethylethyl)-benzene is generated during gamma irradiation sterilization of microcentrifuge tubes [20], the same type of tubes we used for storing the endoscopy collected samples (the home passage collected samples were placed in different types of tubes). Thus, 1,3-bis(1,1-dimethylethyl)-benzene is most likely an exogenous analyte derived directly from the plasticware. Methyl- and propylparaben are two additional analytes also found uniquely associated with the endoscopy collected samples (Table 2). Present as preservatives in lubricating jelly, it is probable that these two exogenous compounds were introduced when lubricant became unintentionally incorporated into a subset (44%) of the fecal samples, during the endoscopic collection of stool (the parabens were also detected in 15% of the endoscopic samples with a 20 minute hSPME).
Figure 2 (partial caption). The graphs indicate the total number of analytes in each class for the 20 minute hSPME (A) or the 18 hour hSPME (B) metabolomes. The sum of the standard deviation across the chemical classes is 134 for the 20 minute data and 108 for the 18 hour data, emphasizing the greater similarity between the cohorts in the latter. C and D) The relative abundance (peak height) of the analytes present in each cohort were distributed among the indicated chemical classes and then summed. The graphs indicate the relative abundance of each class for the 20 minute hSPME (C) or the 18 hour hSPME (D) metabolomes. Although the cohort similarity is apparent regardless of extraction duration, the sum of the coefficient of variation across the chemical classes totals 3.5 for the 20 minute data and 1.1 for the 18 hour data, highlighting the greater similarity between the cohorts in the latter.
doi: 10.1371/journal.pone.0081163.g002
Given their relatively low frequency of appearance, and considering their known affiliations with food, the unique cohort associations of 1-ethynyl-4-methylbenzene (a contaminant of cow's milk [21]), as well as 5-methyl-2-(1-methylethyl)cyclohexanone and undecanoic γ-lactone (two well-known phytochemicals with diverse taxonomic distribution [22][23][24][25][26][27][28][29]), may simply reflect dietary variations among the study participants, more so than alterations in the metabolome due to fecal collection technique. This may also be true for 4-methylquinazoline, uniquely detected in 25% of the 18 hr endoscopy collected samples (Table 2). While it is a known bacterial metabolite [30][31][32] derived from the shikimate metabolic pathway, quinazolines are also produced by plants and thus might reflect the dietary composition before the fecal sample was obtained. Similar relationships among the remainder of the cohort-unique metabolites could also be rationalized.
Table 2 (notes). Several analytes (~30) deemed unique in the 20 minute metabolomes were found to be common to both cohorts after 18 hour extraction, and were thus excluded from the list (presumably a consequence of incomplete extraction with the shorter extraction duration). The inverse scenario was not found to occur (99% of the total metabolites compared among the 18 hour metabolomes appear common to both cohorts (Figure S3C), only half of which were detected with a 20 minute extraction duration). Metabolites highlighted with an asterisk are also uniquely associated with the same cohort after an 18 hour extraction, but appeared in less than 20% of the total number of samples in the 18 hour data set, so are not listed in the 18 hour column. Competitive dissociation by higher affinity analytes may account for the absence of these metabolites with prolonged extraction [17]. The frequency is the percentage of cohort members containing this metabolite.
doi: 10.1371/journal.pone.0081163.t002
While a few cohort-unique metabolites can be differentiated, the vast majority (>99%) of the identified fecal VOCs are present in both the endoscopy and home passage collected groups. To determine how the relative metabolite abundance relates among the two cohorts, a PCA was performed using the 20 minute and 18 hour metabolomes (Figure 3). In stark contrast to the relatively invariant microbiomes (Figure 1), the endoscopy collected and home passage collected samples clearly differ from each other in terms of their metabolomes, as evidenced by the samples segregating into separate cohorts on the PCA plots. With both the 20 minute and 18 hour metabolomes, the first principal component clearly discriminates between the two collection techniques, whereas the second component contributes variation within the cohorts (particularly evident with the 18 hour extraction data). Numerous metabolites collectively contribute to the segregation of the cohorts (Figure S5), the top 20 of which are identified in Figure S6. A dendrogram and accompanying heat map derived from the 20 minute and 18 hour metabolomes (Figure 4) further illustrate the clear differentiation of the endoscopy collected and home passage collected samples, demonstrating that while the two collection cohorts are nearly identical in their specific VOC composition, changes in the abundance relationships among these metabolites differentiate the cohorts from one another. Figure 5 presents the fold change of metabolite abundance between the two cohorts. Nearly half (41%) of the total metabolome exhibits a fold change greater than 1.5. Exclusion of these 'hypervariable' metabolites significantly reduces the variance observed among the two cohorts, as the endoscopy collected and home passage collected cohorts no longer segregate on a PCA plot (Figure S7), clearly illustrating that the metabolome differences associated with the fecal collection technique are primarily attributed to global changes in relative metabolite abundance, rather than alterations in the specific composition of the metabolome itself.
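The fold change comparison described above can be sketched in a few lines. This is an illustration only: the study used MetaboAnalyst 2.0, whereas the code below simply takes the ratio of cohort mean peak heights and flags a metabolite as 'hypervariable' when the ratio exceeds 1.5 in either direction. Treating the threshold symmetrically is an assumption, as are the function names and the peak heights.

```python
from statistics import mean


def fold_change(endoscopy, home):
    """Ratio of mean abundance in the endoscopy cohort relative to the home
    cohort. Assumes the home cohort mean is non-zero."""
    return mean(endoscopy) / mean(home)


def is_hypervariable(fc, threshold=1.5):
    """True when abundance differs by more than threshold-fold in either
    direction (symmetric treatment is an assumption)."""
    return fc > threshold or fc < 1.0 / threshold


# Hypothetical peak heights: (endoscopy cohort, home collected cohort).
cohorts = {
    "metabolite A": ([100.0, 120.0, 110.0], [105.0, 115.0, 110.0]),
    "metabolite B": ([300.0, 340.0, 320.0], [100.0, 90.0, 110.0]),
    "metabolite C": ([20.0, 30.0, 25.0], [60.0, 70.0, 50.0]),
}

flags = {
    name: is_hypervariable(fold_change(endo, home))
    for name, (endo, home) in cohorts.items()
}
fraction = sum(flags.values()) / len(flags)
print(flags, fraction)
```

Removing the flagged metabolites and repeating the PCA on the remainder would correspond to the exclusion step reported for Figure S7.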

Conclusions
While in vivo sample collection is typically desirable for metagenomic and metabolomic investigations, there are several challenges associated with the endoscopic collection of human stool, not the least of which is the significant cost compared to fecal collection at home after passage. Hence, a primary goal of our investigation was to evaluate if sample acquisition significantly biases the derived fecal microbiome and VOC metabolomes, thereby justifying the additional expense associated with endoscopic collection. While pairwise comparison of matched endoscopic and home collected samples did indeed illustrate some variability in the derived microbiomes, the vast majority (81%) of the paired samples demonstrate a strong correlation in their microbiome composition, and the two cohorts as a whole are seen to cluster together in a PCA plot, indicative of relatively little variance between them. In contrast, regardless of the hSPME extraction duration used, the VOC metabolomes differ widely between the home collected and endoscopy collected samples. While the specific VOC composition remains nearly identical, the relative abundance relationships among the metabolites were found to vary among the home and endoscopy collected samples. This is clearly depicted in Figure S8, illustrating metabolite correlation maps of the home passage collected and endoscopy collected samples. While similarities among the maps can be discerned, differences in correlation patterns are also readily distinguished. A PCA, hierarchical clustering analysis, and fold change analysis also clearly differentiate the metabolomes derived from the home and endoscopy collected samples (as detailed above), illustrating the bias of sample acquisition on the derived fecal VOC metabolome. Hence, the use of endoscopy collected samples appears justified for fecal VOC investigations (alternatively, studies evaluating methods of fecal collection and preservation are needed).
As the VOC extraction profile is hyperbolic, with short extraction durations more vulnerable to variation than extractions continued to equilibrium, a second goal of our investigation was to ascertain if a hSPME-based fecal metabolomics study might be biased by the extraction duration employed. In agreement with our previous observations [8], a  prolonged hSPME duration (18 hours) resulted in the identification of significantly more metabolites (~750) than short (20 minute) extraction durations. Additionally, comparison of the pooled home collected metabolome with the combined endoscopy collected metabolome reveals less overall variability associated with the 18 hour extraction duration (see Figure 2). Surprisingly however, the sample to sample variation within the cohorts appears more prominent with the 18 hour extraction duration (i.e. the 20 minute extraction samples cluster more tightly in the PCA plots than do the 18 hour extraction samples), as illustrated in Figure 3 and Figure S9. While this supports the preference for a 20 minute hSPME, it is important to note that several analytes (~30) deemed unique to a cohort in the 20 minute metabolomes were found common to both the home and endoscopy collected cohorts when the VOC extraction was performed for 18 hours (Table 2). Additionally, numerous analytes perceived to have significant fold change with a 20 minute extraction duration are found insignificant with prolonged extraction (Figure 6), underscoring the potential for bias associated with a 20 minute hSPME. Hence, 18 hour hSPME appears better suited to fecal VOC metabolomics.
Our results also indicate the importance of diet to the stool VOC composition. As illustrated in Figure S4, given the geographical and cultural similarities among the majority of our study participants, we attribute nearly 50% of the entire VOC metabolome to dietary variability, with these low frequency analytes appearing in ≤20% of the total fecal samples analyzed. However, detailed and comprehensive dietary information should ideally be obtained and considered when comparing the fecal VOC profiles between cohorts involving geographic and cultural diversity, where dietary habits could be markedly different. It should also be considered that we focused our study on stool samples collected from healthy subjects, and it is possible that an abnormal intestinal microbiota composition (i.e., dysbiotic microbiota) in disease states might be more or less susceptible to exposure to the ambient environment when the stool is collected at home.
Figure 6. Metabolome bias introduced by short hSPME durations. The box plots illustrate the relative abundance of 25 select metabolites extracted from the home passage collected and endoscopy collected fecal samples. hSPME was performed for 20 minutes or 18 hours, as indicated. Fold change values, calculated as the log transformation of the ratio of the mean metabolite abundance in the endoscopy cohort to that in the home passage cohort, are presented over each paired endoscopy and home collection cohort. For many of the analytes, the fold change appears significant (>2) in the 20 minute metabolomes, but is deemed insignificant with the 18 hour extraction, indicative of incomplete extraction of the analyte at 20 minutes and underscoring the potential for bias associated with short, non-equilibrium extractions. Examples of analytes demonstrating insignificant fold change with a 20 minute hSPME but significant fold change with an 18 hour hSPME were also observed, but are not depicted in the figure.
Supporting Information
Figure S1. Microbiome correlation between matched home and endoscopy collected samples. Pearson's correlation coefficients (r) were calculated by comparison of the microbiome content (taxa and abundance) derived from matched home and endoscopy collected samples. As depicted in the plot, 81% of the sample pairs are strongly correlated (r>0.7), whereas 19% have only weak to moderate correlation among their microbiome content (0.2<r<0.7). In contrast, comparable analysis of the derived VOC metabolomes indicates very weak (r<0.2) correlation among the samples (not shown). (TIF)
Figure S2. Heat map showing the unsupervised hierarchical arrangement of the fecal samples according to the similarity of microbiome composition. The samples are arranged in rows and the taxa in columns; shades of red represent elevated abundance and shades of blue represent decreased abundance relative to the median values (see color scale). In the dendrogram, the fecal collection technique is indicated by the colored bars (green = home passage collected, red = endoscopy collected). The naming and numbering convention of the samples is described in Figure 1. Many of the matched sample pairs appear juxtaposed in the dendrogram, a reflection of their similarity. There is no significant segregation of the samples by fecal collection technique (in contrast to Figure 4). (TIF)
Figure S3. Distribution of metabolites among the endoscopy and home passage collected cohorts. The composition of each chemical class was compared between the two cohorts and the proportions of common and unique metabolites are indicated. A) The analysis was performed using the 20 minute VOC metabolomes. A comparison of the 18 hour metabolomes produced similar results (data not shown). In B) and C), infrequent metabolites were disregarded by restricting the analysis to analytes that appeared in a minimum of 20% of all samples in each cohort. The analysis was performed using the B) 20 minute extraction and C) 18 hour extraction metabolomes. The graphs in (B) have been corrected to account for the observation that several analytes identified as unique in the 20 minute extraction data set were actually found to be common to both cohorts after prolonged (18 hour) extraction. The inverse was not found to occur. (TIF)
Figure S9. Principal component analysis of the VOC metabolomes derived from the home passage collected human fecal samples. PCA plots reflect the identified metabolites and their abundance obtained with either a 20 minute or 18 hour hSPME. The analysis was restricted to analytes appearing in a minimum of A) 20% of all samples in each cohort or B) 80% of all samples in each cohort. In either case, the 20 minute metabolomes clearly segregate from the 18 hour metabolomes and the 20 minute extraction samples cluster more tightly than do the 18 hour extraction samples. The naming and numbering convention of the samples is described in Figure 1. (TIF)
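The frequency filter and fold change calculation described in the figure legends can be sketched as follows. This is an illustrative outline, not the analysis pipeline used in the study: the data layout (one dictionary of analyte abundances per sample), the function names, and the toy values are all hypothetical. The legends do not state the base of the log transformation or how undetected analytes were treated, so log base 2 and an abundance of zero for undetected analytes are assumptions.

```python
import math


def frequent_analytes(samples, min_fraction=0.20):
    """Analytes detected in at least min_fraction of the samples in a cohort."""
    counts = {}
    for sample in samples:
        for analyte, abundance in sample.items():
            if abundance > 0:
                counts[analyte] = counts.get(analyte, 0) + 1
    return {a for a, n in counts.items() if n / len(samples) >= min_fraction}


def mean_abundance(samples, analyte):
    """Mean abundance across a cohort; undetected is counted as zero
    (an assumption, since the source does not specify this)."""
    return sum(s.get(analyte, 0.0) for s in samples) / len(samples)


def log2_fold_change(endoscopy, home, analyte):
    """Log2 of the ratio of mean abundance, endoscopy relative to home.
    The log base is an assumption."""
    return math.log2(mean_abundance(endoscopy, analyte) /
                     mean_abundance(home, analyte))


# Hypothetical toy cohorts of six samples each.
home = [{"analyte_A": 2.0} for _ in range(6)]
home[0]["analyte_B"] = 1.0            # 1 of 6 samples: below the 20% cutoff

endoscopy = [{"analyte_A": 16.0} for _ in range(6)]
endoscopy[0]["analyte_C"] = 3.0       # 2 of 6 samples: above the cutoff
endoscopy[1]["analyte_C"] = 3.0

home_set = frequent_analytes(home)
endo_set = frequent_analytes(endoscopy)

common = home_set & endo_set          # analytes found in both cohorts
unique_home = home_set - endo_set     # analytes unique to home collection
unique_endo = endo_set - home_set     # analytes unique to endoscopy

fold_change_a = log2_fold_change(endoscopy, home, "analyte_A")
print(common, unique_home, unique_endo, fold_change_a)
```

Running the same classification on the 20 minute and the 18 hour metabolomes and comparing the resulting `unique_home` and `unique_endo` sets is one way to flag analytes that appear unique to a cohort only under the shorter extraction.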