An Integrated Metabolomic and Microbiome Analysis Identified Specific Gut Microbiota Associated with Fecal Cholesterol and Coprostanol in Clostridium difficile Infection

Clostridium difficile infection (CDI) is characterized by dysbiosis of the intestinal microbiota and a profound derangement in the fecal metabolome. However, the contribution of specific gut microbes to fecal metabolites in C. difficile-associated gut microbiome remains poorly understood. Using gas-chromatography mass spectrometry (GC-MS) and 16S rRNA deep sequencing, we analyzed the metabolome and microbiome of fecal samples obtained longitudinally from subjects with Clostridium difficile infection (n = 7) and healthy controls (n = 6). From 155 fecal metabolites, we identified two sterol metabolites at >95% match to cholesterol and coprostanol that significantly discriminated C. difficile-associated gut microbiome from healthy microbiota. By correlating the levels of cholesterol and coprostanol in fecal extracts with 2,395 bacterial operational taxonomic units (OTUs) determined by 16S rRNA sequencing, we identified 63 OTUs associated with high levels of coprostanol and 2 OTUs correlated with low coprostanol levels. Using indicator species analysis (ISA), 31 of the 63 coprostanol-associated bacteria correlated with health, and two Veillonella species were associated with low coprostanol levels that correlated strongly with CDI. These 65 bacterial taxa could be clustered into 12 sub-communities, with each community containing a consortium of organisms that co-occurred with one another. Our studies identified 63 human gut microbes associated with cholesterol-reducing activities. Given the importance of gut bacteria in reducing and eliminating cholesterol from the GI tract, these results support the recent finding that gut microbiome may play an important role in host lipid metabolism.


Introduction
The known microbial community imbalance associated with Clostridium difficile infection (CDI) [1][2][3][4][5][6][7][8] also implies disrupted metabolic profiles. Restoration of colonic microbiota is one of the most effective approaches for the treatment of CDI, which affects nearly half a million individuals per year in the US [9]. Since the gut microbiome of patients with CDI is significantly different from that of healthy individuals [2], differences in microbial composition are likely accompanied by alterations in the fecal metabolites that define these two populations. Given the known depletion of gut microbiota in CDI, we hypothesized that an integrative analysis of the fecal metabolome and microbiome would lead to the identification of fecal metabolites associated with specific gut microbes.
Using a gas chromatography-mass spectrometry (GC-MS) based fecal metabolomics approach, we observed that the levels of cholesterol and its reduced metabolite coprostanol in fecal samples were significantly different between CDI and healthy controls. Previous studies in gut physiology have established a role for gut bacteria in cholesterol metabolism. Such microorganisms were first described in 1934 [10,11] and later identified as constituents of the human intestinal microbiota [12][13][14]. Given their cholesterol-reducing activity, these microbes have been investigated as potential agents for the treatment of hypercholesterolemia [15] and as additives to dairy products [16]. Cholesterol comprises up to 20% of the metabolites in fecal matter, and its byproducts such as coprostanol and cholestanone contribute an additional 5% of neutral sterol material [17].
Certain bacteria enzymatically reduce the double bond between carbons 5 and 6 of cholesterol to yield coprostanol, a reduced sterol that is excreted in feces. It has been suggested that a high efficiency of cholesterol to coprostanol metabolism may reduce the risk of cardiovascular disease [18]. When coprostanol is conjugated with oligosaccharides, the resulting compounds have shown some activity against certain cancers [19,20]. Low rates of cholesterol to coprostanol conversion have been implicated in the progression of ulcerative colitis [21,22] and colon cancer [17]. Cholesterol reduction by microbiota can be achieved by bile-salt hydrolase (BSH) activity, binding to cell walls, enzymatic deconjugation, or direct uptake by the host bacteria [23,24]. In in vitro culture assays, certain strains of Lactobacillus, Bifidobacterium, Enterococcus, and Streptococcus have all been shown to decrease the level of cholesterol [16,24]. Together, the available data suggest a role for gut microbiota in fecal sterol metabolism. However, the identity of human endogenous gut microbes associated with cholesterol reduction remains poorly understood.
Here, we measured cholesterol and coprostanol levels in fecal samples using GC-MS fecal metabolomics and found that levels of these two fecal metabolites differed significantly between subjects with CDI and healthy controls. Using multivariate Spearman rank correlation and 16S rRNA deep sequencing, we identified 65 bacterial phylotypes that were significantly associated with cholesterol or coprostanol, which included 63 phylotypes that correlated strongly with high coprostanol levels. Functional analysis of the 65 bacteria identified here would be of great interest for future studies.

Fecal coprostanol and cholesterol levels in fecal samples distinguished CDI from healthy controls
To identify fecal metabolites associated with specific gut microbes, we devised an integrative approach to correlate GC-MS metabolomics and 16S rRNA microbiome datasets (Fig 1). First, we examined metabolomics profiles of all samples collected longitudinally from seven subjects with CDI and six healthy controls (Table 1). Partial least squares-discriminant analysis (PLS-DA) showed a clear separation of metabolomics datasets between CDI and healthy controls, with 72.7% of the variation explained in three components (Fig 2A). The cross-validated predictive ability Q² was 0.66, indicating that a random fecal GC-MS spectrum discriminates CDI from healthy controls 66% of the time. The explained variance R² was 0.88. We next divided the CDI cohort according to the antimicrobial treatment they received (either Metronidazole or Vancomycin), and the healthy controls according to their history of antibiotic exposure (HAbx: presence of recent antibiotic exposure, Healthy: absence of recent antibiotic exposure). PLS-DA using a 4-state model (Healthy, HAbx, Met, and Vanc) showed that 59.5% of the variation in the GC-MS dataset was explained (Fig 2B), with Q² of 0.57 and R² of 0.86 for a typical chromatogram. Thus, these results indicate distinct clustering of the fecal metabolome between groups.

Fig 1. Flow-chart of integrative scheme between genomics and metabolomics to identify bacterial OTUs associated with cholesterol and coprostanol. Genomic DNA (gDNA) from longitudinal fecal samples emanating from Healthy or CDI subjects (over 90 days) was isolated and deep sequenced on the V1-V3 hypervariable region of the 16S rRNA gene before being classified to 2,395 refOTUs (Right). The same longitudinal fecal sample was extracted with dichloromethane and injected on a GC-MS instrument, where the retention times of discriminatory peaks were determined based on PLS-DA VIP scores (Left). The discriminatory peaks, cholesterol and coprostanol, were Spearman correlated to refOTUs based on NMDS and ANOVA. As a further step, ISA was used to determine whether refOTUs associated with high coprostanol or cholesterol were enriched in Healthy or CDI cohorts. Red arrows represent feedback and integration between chart items, whereas black arrows show the directional flow of the pipeline. Abbreviations: ANOVA: analysis of variance; ISA: indicator species analysis; NMDS: non-metric multidimensional scaling; PLS-DA: partial least squares discriminant analysis; refOTUs: reference operational taxonomic units; RT: retention time.

Funding: [...] Infectious Diseases KO8 AI077713 awarded to GPW (http://www.niaid.nih.gov/Pages/default.aspx), and Merck Investigator-Initiated grant IISP 38992 awarded to GPW (http://engagezone.merck.com/clostridium.html). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: One of the co-authors of the manuscript, Dr. Aaron T. Dossey, is the President, Founder and Owner of All Things Bugs LLC. The company does research on insect-based food and insect farming, and produces insect/cricket powder/flour. It is not currently involved in research involving cholesterol metabolism or Clostridium difficile infection. For this specific manuscript, Dr. Dossey served as a collaborator assisting with study design, protocol development and data analysis. Neither he nor his company (All Things Bugs LLC) provided financial support for this project. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

To determine differentially abundant metabolites, Variable Importance in Projection (VIP) scores from 114 longitudinal fecal chromatograms (S1 Fig) were generated for all longitudinal subject participants. As part of the metabolomics pipeline [25,26], each retention time (RT) was assigned a VIP score for all components fitted to a PLS-DA model. We summed the contributions of each RT to the first three components (component 1, component 2, and component 3), which correspond to the x-, y-, and z-axes in Fig 2. The highest ranked VIP scores were in the range of 31.900 ± 0.150 and 31.700 ± 0.150 minutes, which represented attractive targets for further identification by mass spectrometry (Part A in S2 Fig). Peaks corresponding to retention times identified as cholesterol clustered with respect to their contributions to the PLS-DA plot. Fecal coprostanol was identified in the Healthy, HAbx, and Metronidazole cohorts. The retention times of its chromatographic peaks varied within a time window, as shown by mapping the VIP scores for each cohort. This scattering of the retention times identified as coprostanol across fecal samples is reflected in the dispersion of the coprostanol contributions to the PLS-DA model (Part B in S2 Fig). In general, coprostanol and cholesterol peak RTs were identified within a range of ±0.150 min, as stated above.
As a second method to identify discriminating RTs for subsequent analysis, we examined a two-group model (CDI vs Healthy), for which the lowest p-value from a t-test was observed at 31.927 min (p = 9.40 × 10⁻⁶) for the CDI group and at 31.681 min (p = 1.40 × 10⁻¹¹) for the Healthy group. For a 4-group model (Met vs Vanc vs Healthy vs HAbx), the lowest p-value from ANOVA was observed at 31.843 min (p = 7.25 × 10⁻¹⁶) for the HAbx cohort and at 32.05 min (p = 7.25 × 10⁻¹³) for the Vancomycin cohort. Thus, using a combination of PLS-DA VIP scores, t-test, and ANOVA (when stratifying Healthy and CDI), we identified a range of retention times that represented the most attractive targets for chemical identification by mass spectrometry.
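The retention-time bookkeeping described above can be illustrated with a minimal sketch. It assumes two things that are not stated as code in the text: that VIP contributions are simply summed per retention time over the fitted components, and that a peak is assigned to the nearest of the two window centers (31.700 and 31.900 min) when it lies within ±0.150 min of it. The function names `summed_vip` and `assign_window` are hypothetical and are not part of the authors' pipeline.

```python
from typing import List, Optional

# Window centers (minutes) and half-width are the values quoted in the text.
# The centers are used only as labels here, not as compound identifications.
WINDOW_CENTERS = (31.700, 31.900)
HALF_WIDTH = 0.150
EPSILON = 1e-9  # tolerance for floating-point comparison at the window edge


def summed_vip(vip_by_component: List[List[float]]) -> List[float]:
    """Sum each retention time's VIP score over the fitted components.

    vip_by_component[c][i] is the VIP score of retention time i on component c.
    """
    return [sum(scores) for scores in zip(*vip_by_component)]


def assign_window(rt: float) -> Optional[float]:
    """Return the nearest window center if rt is within HALF_WIDTH of it."""
    nearest = min(WINDOW_CENTERS, key=lambda center: abs(rt - center))
    if abs(rt - nearest) <= HALF_WIDTH + EPSILON:
        return nearest
    return None


if __name__ == "__main__":
    # The four retention times reported for the t-test and ANOVA scans.
    for rt in (31.927, 31.681, 31.843, 32.05):
        print(rt, "->", assign_window(rt))
```

Under these assumptions, all four retention times reported by the t-test and ANOVA scans fall inside one of the two VIP windows, which is the consistency the paragraph relies on.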
Using positive mode mass spectrometry, we identified the top two retention times as cholesterol and coprostanol, with a >97% match to the NIST database (S1 Fig). Thus, we chose cholesterol and coprostanol as two metabolites that we could confidently characterize and that distinguish CDI from Healthy fecal samples. The levels of these two compounds were inversely correlated with each other, and a subset of chromatograms derived from vancomycin-treated CDI subjects showed high levels of cholesterol and very low levels of coprostanol (Fig 3A).
To compare the proportional abundance of coprostanol and cholesterol between samples, we quantified peak areas in total ion current (TIC) for both metabolites in the injected fecal extract. Grouped analysis using all samples revealed that coprostanol levels were highest and cholesterol levels were lowest in healthy subjects. In contrast, the mean coprostanol level was lowest and the cholesterol level was highest in subjects with CDI (p < 0.001; Student's t test) (Fig 3B). Subgroup analysis showed a significant difference in mean coprostanol total ion current among the four groups (F(3,9) = 9.797, p < 0.01, ANOVA). Healthy subjects (HAbx and Healthy) had a significantly higher mean coprostanol TIC percentage than subjects with CDI (Metronidazole and Vancomycin subgroups) (Fig 3C). Additionally, the HAbx, Healthy, and Metronidazole groups had mean coprostanol TIC percentages that were significantly greater than that of the Vancomycin group.

Fig 3. (A) The relationship between retention times for cholesterol and coprostanol as determined by mass spectrometry (x-axis) and their relative abundance (y-axis). The inverse relationship between the two compounds based on fold change in fecal composition is highlighted in blue and red circles. (B) Box-plots showing the distribution of average total ion current of coprostanol (left) and cholesterol (right) for all fecal samples from the Healthy or the CDI group. The TIC of the two metabolites was normalized by auto-scaling before plotting. (C) Percentage of coprostanol TIC relative to the sum of coprostanol and cholesterol TIC for each subgroup. ANOVA on the ranked coprostanol TIC values indicated a significant difference among the four cohorts (F(3,9) = 9.797, p < 0.01). For the 13 subjects, ranks were highest for Healthy (10) and HAbx (10), followed by Met (6) and Vanc (2); numbers in parentheses indicate mean ranks. Letters above whiskers indicate similar groups based on ranks according to the Tukey HSD test. Fecal samples of Healthy, HAbx, or Metronidazole origin could be grouped together according to coprostanol levels. Likewise, fecal samples from Metronidazole- and Vancomycin-treated subjects could be grouped together based on coprostanol levels.
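The two normalizations mentioned for Fig 3 can be written out as a short sketch. It assumes that the coprostanol percentage is the coprostanol TIC divided by the sum of the coprostanol and cholesterol TIC, as the caption states, and that "auto-scaling" means mean-centering followed by division by the sample standard deviation, which is the usual meaning of the term in metabolomics. The function names are hypothetical and the numbers in the usage lines are made up for illustration; they are not study data.

```python
from statistics import mean, stdev
from typing import List


def coprostanol_percent(coprostanol_tic: float, cholesterol_tic: float) -> float:
    """Coprostanol TIC as a percentage of coprostanol + cholesterol TIC."""
    total = coprostanol_tic + cholesterol_tic
    if total <= 0:
        raise ValueError("the two TIC values must sum to a positive number")
    return 100.0 * coprostanol_tic / total


def auto_scale(values: List[float]) -> List[float]:
    """Mean-center the values and scale them to unit sample variance."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("cannot auto-scale values with zero variance")
    return [(value - center) / spread for value in values]


if __name__ == "__main__":
    # Illustrative peak areas only.
    print(coprostanol_percent(3.0, 1.0))
    print(auto_scale([1.0, 2.0, 3.0]))
```

Because the percentage is a ratio within one injection, it does not depend on the absolute amount of extract injected, which is presumably why it is the quantity compared across subgroups.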

NMDS (Non-metric multidimensional scaling) analysis identified 63 gut bacteria associated with coprostanol
After bioinformatic classification of the 16S reads, the matrix of 2,395 OTUs was reduced to two dimensions by NMDS using the Bray-Curtis distance, which does not count shared zero values in the matrix as a sign of similarity. The two-dimensional solution (Stress = 0.165) correlated at 0.747 with the original data matrix. This new 2-dimensional matrix was then regressed against High or Low coprostanol levels for each subject/time point based on TIC levels (see Methods section). After controlling for disease/drug status and variability associated with the repeated measures (time within subject), 66.5% of the remaining SSE (sum of squared errors) could be attributed to coprostanol on NMDS axis 1 (F(2,103) = 102.41, p < 0.001). Coprostanol was not significantly related to the second axis (F(2,103) = 0.572, p = 0.566) and explained only 1.1% of the variability after controlling for covariates (Fig 4). In total, 63 bacterial OTUs correlated positively with coprostanol levels and two correlated negatively with coprostanol (Fig 4A, Table 2, and S3 Fig).
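The property of the Bray-Curtis distance noted above, that OTUs absent from both samples do not make the samples look more alike, can be seen in a minimal sketch. The function name `bray_curtis` and the count vectors in the usage lines are illustrative assumptions, not the study's pipeline or data.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    0 means identical composition; 1 means no shared taxa.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


if __name__ == "__main__":
    sample_1 = [6, 7, 4]
    sample_2 = [10, 0, 6]
    print(bray_curtis(sample_1, sample_2))
    # Appending an OTU that is absent from both samples leaves the value unchanged.
    print(bray_curtis(sample_1 + [0], sample_2 + [0]))
```

A taxon with zero counts in both samples adds nothing to either the numerator or the denominator, so a sparse OTU table with thousands of mostly absent taxa does not inflate apparent similarity.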
The NMDS analysis revealed two predominant clusters. One cluster was dominated by the Vancomycin cohort and was driven primarily by two Veillonella species that correlated negatively with coprostanol (Fig 4B). We examined the relationship between the levels of coprostanol (TICs) and the relative abundance of coprostanol-associated gut bacteria (i.e. the total abundance of the 63 coprostanol-associated bacteria based on 16S rRNA sequence reads). A strong, positive correlation between levels of fecal coprostanol and the abundance of coprostanol-associated bacteria in healthy subjects could be modeled monotonically (r_s = 0.868, p < 0.001) (Fig 5). Both a linear (R² = 0.731) and an exponential (R² = 0.736) model fit the relationship between these two metrics equally well, although inter-individual and longitudinal variations in coprostanol levels, cholesterol levels, and abundance of coprostanol-associated bacteria were also observed (S4-S16 Figs). With the exception of one subject (M4) in the metronidazole group, the abundance of coprostanol-associated bacteria and coprostanol levels in CDI subjects were generally low compared to healthy controls. The depletion of coprostanol-associated bacteria and low coprostanol levels were particularly pronounced in the vancomycin subgroup (Fig 5).

Fig 4. NMDS analysis of bacterial OTUs and relative coprostanol TICs. Fecal samples were assigned as either "High" or "Low" coprostanol formers. Data were reduced by the NMDS approach using Bray-Curtis distances, followed by Spearman rank correlation to identify OTUs associated with coprostanol TIC levels. Dimension 1 represents coprostanol levels; Dimension 2 represents CDI treatment or antibiotics exposure for each subject. doi:10.1371/journal.pone.0148824.g004

Table 2. Bacterial operational taxonomic units (OTUs) associated with coprostanol total ion current. For each OTU, the taxonomy (u = uncultured), Spearman correlation coefficient (r_s), relative abundance (RA) and relative frequency (RF) based on 16S rRNA sequence reads are shown. A reference OTU (refOTU) sequence accession number from the Silva (release 108) database is shown for all uncultured species. Superscripts after the OTU number indicate whether the species is an indicator species for a specific cohort. HAbx: healthy subjects with prior antibiotic exposure; H: healthy subjects with no prior antibiotic exposure; Vanc: subjects with CDI who received vancomycin therapy. P-values are shown with Bonferroni correction.
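The Spearman coefficient used throughout this section is the Pearson correlation of the ranks, which is why it captures any monotonic relationship, linear or not. The sketch below is a plain-Python illustration under that standard definition, with tied values given their average rank; the function names and the toy vectors are assumptions for illustration and are not the authors' code or data.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 1-based average rank of the tied run
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x = sum(rank_x) / len(rank_x)
    mean_y = sum(rank_y) / len(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    abundance = [1, 2, 3, 4]
    coprostanol = [1, 4, 9, 16]  # monotonic but not linear in abundance
    print(spearman(abundance, coprostanol))
```

A perfectly monotonic but curved relationship still gives a coefficient of 1, which is consistent with the observation that linear and exponential fits described the same data about equally well.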

The association of specific gut bacteria with individual cohorts
We asked which of the 65 bacterial OTUs identified by the NMDS analysis were associated with each of the four subject cohorts. Using Indicator Species Analysis (ISA), we found that 31 of 65 bacterial OTUs were associated with health. Of these, 20 phylotypes were "indicators" of healthy subjects with antibiotic exposure, and 11 were associated with healthy subjects without prior antibiotic exposure. Two OTUs were associated with CDI subjects who received vancomycin. No indicator species were found for the metronidazole subgroup (Fig 4A, Table 2). In total, 33 OTUs were associated with one of the four cohorts (S3 Fig). Next, we combined the vancomycin and metronidazole subgroups as the "disease" group and the Healthy and HAbx subgroups as the "health" group, and asked whether the remaining 32 OTUs could be indicators of health or disease. Of the 32 OTUs, 19 were indicators of "health" (Table 3), but no additional indicator species were identified for the "disease" group. The remaining 12 OTUs that were associated with coprostanol could not be assigned as an indicator for any of the four subject cohorts.

Co-occurrence of coprostanol-associated bacteria in gut microbiota
To identify bacterial communities whose members co-occur with one another more frequently than with other members of the 65-OTU community, we performed an agglomerative hierarchical clustering of the 65 coprostanol-associated bacteria. Using this algorithm, we identified 12 clusters of microbial communities ranging from a single-species community to a 14-member community (Fig 6). The 12 community clusters determined from hierarchical clustering analysis were found in a majority of the subjects studied, though not all 12 communities were present in every subject. Furthermore, each hierarchically clustered community contained both our proposed coprostanoligenic bacteria and non-coprostanoligenic bacteria. We performed a nested ANOVA on ranked 16S rRNA sequence data after summing the sequences for each cluster to generate a total cluster score. In all cases, there was a significant cohort effect and a significant difference among subjects within cohort, but no significant effect of time except for clusters 8 and 11. This provided confidence that we could collapse the nested variability by averaging each subject's longitudinal data into a single cluster score. This new collapsed dataset was then subjected to a separate ANOVA. For each cluster, ANOVA was performed on the ranked abundance under two separate models: one with a single two-level factor (Healthy vs CDI), and the other with a single factor having four levels (Healthy, HAbx, Vanc, Met). This analysis showed that clusters 3-5 and 10-12 statistically distinguished the groups in the two-level model, while clusters 2-5, 7, and 9-12 distinguished the groups in the four-level model (p < 0.05; Fig 6).

Table 3. Indicator species analysis (ISA) of the remaining 32 OTUs that were not previously assigned to one of the four individual cohorts. Groups were designated as "Health" for healthy volunteers (combining the two groups with and without prior antibiotic exposure), and as "Disease" for subjects with CDI (combining the metronidazole and vancomycin groups). A representative sequence accession number from the Silva (release 108) database is shown for all uncultured species as a reference OTU. The indicator value of species j in group k is the product of the percent relative abundance of the organism in a specific cohort and its percent relative frequency. Those species that are significant after 100,000 Monte-Carlo randomizations of ecological communities are listed below.
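The indicator value described in the Table 3 caption can be sketched as follows. This is a minimal illustration that assumes the commonly used Dufrêne-Legendre form: relative abundance is the mean abundance of the species in the group divided by the sum of its mean abundances over all groups, relative frequency is the fraction of samples in the group where the species is present, and the indicator value is 100 times their product. The function name, the group labels used as dictionary keys, and the counts are hypothetical, and the Monte-Carlo significance test is not shown.

```python
from typing import Dict, Sequence


def indicator_value(abundance_by_group: Dict[str, Sequence[float]], group: str) -> float:
    """Indicator value (0-100) of one species for one group.

    abundance_by_group maps a group label to that species' abundance in each
    sample of the group. Every group is assumed to contain at least one sample.
    """
    group_means = {
        label: sum(samples) / len(samples)
        for label, samples in abundance_by_group.items()
    }
    total_of_means = sum(group_means.values())
    if total_of_means == 0:
        return 0.0  # species absent everywhere: no indicator value

    # Specificity: how concentrated the species is in this group.
    relative_abundance = group_means[group] / total_of_means

    # Fidelity: how consistently the species occurs within this group.
    samples = abundance_by_group[group]
    relative_frequency = sum(1 for value in samples if value > 0) / len(samples)

    return 100.0 * relative_abundance * relative_frequency


if __name__ == "__main__":
    # Illustrative read counts for one OTU; not study data.
    counts = {"Health": [4, 6, 0, 10], "Disease": [0, 0, 0, 0]}
    print(indicator_value(counts, "Health"))
    print(indicator_value(counts, "Disease"))
```

An OTU reaches the maximum value only when it is both confined to one group and present in every sample of that group, which is why an OTU can correlate with coprostanol and still fail to qualify as an indicator for any cohort.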

Discussion
In healthy populations, coprostanol conversion in the gastrointestinal tract is influenced by demographics and gender [27][28][29][30][31]. Midtvedt et al. showed that the coprostanol conversion phenotype is established early in the first year of life [32]. However, antibiotic treatment can influence the rate of conversion [33]. Healthy individuals can be classified as high or low coprostanol formers [34,35], and these high or low metabolic phenotypes could be replicated in animals by transplantation of human fecal material into gnotobiotic rats [36]. These observations suggest that the coprostanol-associated phenotype is determined by the composition of the gut microbiota. However, the identity of gut microbes that can reduce cholesterol to coprostanol has not been well defined. By correlating 16S rRNA microbiome and GC-MS metabolomics datasets, we identified 65 gut bacteria associated with fecal coprostanol and cholesterol. Given the recent report that showed an association between specific intestinal microbiota and blood lipid levels in human subjects [37], and the discovery of a new cholesterol-reducing species, Bacteroides sp. D8 [38], cultivation and functional analysis of the coprostanol-associated bacteria identified in our study would be of great interest in cardiovascular disease. A majority of the bacterial phylotypes associated with high coprostanol levels belonged to the Lachnospiraceae and Ruminococcaceae families of the order Clostridiales, suggesting that some members of these families may harbor coprostanoligenic activity. Fu et al. recently analyzed the bacterial taxa of a large population cohort and identified 34 gut bacterial taxa that were strongly associated with blood lipid levels. These investigators showed that the gut microbiome explained a large percentage of the variation in blood HDL levels independent of age, gender and host genetics.
Interestingly, 30 of 34 bacterial taxa identified in this study belonged to the Clostridiales order, most of which were either Lachnospiraceae or Ruminococcaceae (Tables 2 and 3). Thus, our results are consistent with these findings pertaining to the association of health with many of these putative coprostanoligenic bacteria. They also motivate studying the relationship, if any, between the levels of the gut microbes identified here and blood levels of cholesterol in a susceptible CDI population.
The majority of Lachnospiraceae and Ruminococcaceae species identified in Table 2 and Fig 4A have not been previously cultivated, making functional assays to confirm coprostanoligenic activities challenging. Nonetheless, since some of these organisms were more likely to co-occur in community clusters (Fig 6), they may share similar physiologic, functional or growth requirements. As an example, conditions that are favorable to Barnesiella intestinihominis and Prevotella stercorea may be favorable to the other five co-occurring uncultured bacteria (Fig 6; see Group #3), since the metabolism of a functional gut ecosystem depends on cross-feeding among bacteria to generate metabolites. For instance, the nitrogen cycle requires communities of microbes in which species act in succession, whereby the metabolic product of one species feeds directly into the metabolic reaction of another [39].
Human gut microbiota is known to confer colonization resistance against C. difficile, but the underlying mechanisms remain poorly understood. Recent data suggest that resistance mechanisms may involve cholesterol and bile acid metabolism [40]. Schwan et al. showed that entry of C. difficile toxin into colonocytes is cholesterol dependent and can be inhibited by methyl-β-cyclodextrin, which depletes cholesterol [41]. Clostridium difficile toxin A binding to target cells is facilitated by cholesterol-enriched lipid rafts [42], and a number of sterols and bile acids can inhibit C. difficile binding [43,44]. Clinically, the use of statins (which inhibit cholesterol biosynthesis) is associated with a reduced risk of Clostridium difficile infection [45,46]. While the precise mechanisms underlying colonization resistance remain unknown, low levels of luminal coprostanol and high levels of cholesterol in the setting of altered gut microbiota may play a role in the susceptibility to C. difficile infection.
The role of bile acid metabolism in C. difficile pathogenesis is also of significance. Bile acids are the main metabolites of cholesterol in the liver. Primary bile acids (e.g. cholic acid and chenodeoxycholic acid) are produced by endogenous enzymes in the liver and conjugated to taurine or glycine to form bile salts that assist in lipid digestion in the small intestine. About 5% of the secreted bile salts reach the colon (95% are reabsorbed via the enterohepatic circulation), where primary bile acids are deconjugated and dehydroxylated by intestinal microbiota to form the secondary bile acids deoxycholic acid and lithocholic acid. Therefore, antibiotics that alter gut microbiota are expected to reduce the transformation of primary bile acids into secondary bile acids [47]. In our study, the depletion of these essential commensals may have led to elevated fecal cholesterol, especially in the CDI cohort administered vancomycin. Since vancomycin treatment is generally considered for more severe forms of infection, we speculate that the loss of these commensals leads to the loss of bile acid components, which ultimately gives rise to the abnormal fecal cholesterol levels seen here.
Interestingly, primary bile acids are potent germinants for C. difficile spores, triggering their transformation into vegetative bacteria [48]. In contrast, secondary bile salts, which are generated by gut microbes through enzymatic action on primary bile acids, inhibit the vegetative growth of C. difficile in vitro [49], and likely also inhibit germination and/or vegetative growth in vivo. In patients treated with fecal microbiota transplant for recurrent CDI, an increase in deoxycholic acid and lithocholic acid was observed, which was accompanied by an increase in Clostridial clusters IV and XIVa (which include members of the Lachnospiraceae and Ruminococcaceae families). Members of these families harbor numerous genes for 7α-dehydroxylation and deconjugation of primary bile acids [40]. Administration of Clostridium scindens, an organism expressing enzymes involved in secondary bile acid generation, enhances resistance to C. difficile infection in mice [50,51]. Taken together, perturbation of gut microbiota may decrease the conversion of primary bile salts to secondary bile salts, leading to increased vegetative growth, toxin production and colitis. Consistent with this, we have previously shown that members of the Lachnospiraceae and Ruminococcaceae families are depleted in patients with CDI [2]. In the present study, many OTUs that belonged to these two families correlated strongly with high coprostanol levels and were indicator species for health (Fig 4A).
Spearman correlation and NMDS analysis revealed a negative correlation between coprostanol level and two Veillonella species (Figs 4 and 5). Endogenous to the oral cavity, Veillonella spp. have also been found in atherosclerotic plaques, fecal samples, and oral washings from subjects with known cardiac events [52]. These observations suggest an association between Veillonella and cholesterol, and are consistent with our data indicating a reduction of coprostanol levels that was selectively associated with the vancomycin subgroup (see S7-S9 Figs). Some bacteria may travel to target sites in the body by way of cholesterol-laden foam cells [53], suggesting that cholesterol may be exploited for additional functionality by pathogenic gut microbes. It should be noted that variance exists between subject cohorts and time points (see S4-S16 Figs), and that Spearman correlations for specific individuals are widely dispersed compared to Spearman correlations within a specific cohort. Despite this, a global assessment of metabolite and microbial features shows that healthy subjects have more candidate bacteria associated with high fecal coprostanol levels than CDI subjects do. Separate evidence in our lab, based on longitudinal sequencing of a larger cohort of CDI subjects, has shown local fluctuations in day-to-day sampling but global stability over 100 days (manuscript in progress). This dynamical trend of local instability and global equilibrium was also found in a recent analysis of CDI microbiota recovery after fecal microbial transplantation (FMT) [54].

In summary, we have identified 63 human endogenous gut microbes associated with coprostanol. While our study points towards an association of the human microbiome with cholesterol metabolism, further studies should focus on assessing the roles of these candidate bacteria in recovery from C. difficile infection in animal studies, and the function of the fecal coprostanol/cholesterol ratio in such a recovery.
Shotgun or whole genome sequencing of fecal extracts, used to identify genes involved in cholesterol metabolism that differ between CDI and healthy controls, can extend our findings to cholesterol and bile acid pathways that are absent in CDI due to antibiotic perturbation. In short, microbiota-mediated transformation of fecal cholesterol to coprostanol may enhance resistance to C. difficile infection by altering toxin entry and decreasing the availability of cholesterol substrates for primary bile acid generation. Given the recent report that suggests a role for the gut microbiome in blood lipid levels [37] and the inverse relationship between serum cholesterol levels and the fecal coprostanol/cholesterol ratio [22], cultivation and functional analysis of the coprostanol-associated bacteria identified in our study would be of great interest for both cardiovascular disease and C. difficile infection.

Subject recruitment and longitudinal fecal sampling
Fecal samples (mean of 8.7 fecal samples per subject) were collected longitudinally from 13 subjects over a three-month period. Time between sample collections ranged from 9 to 17 days, with a mean of 11 days. Fecal samples were collected for a period of up to 90 days between 2011 and 2013. Subject characteristics and prior antibiotic history for all 13 participants are shown in Table 1. Of the 7 subjects with CDI, 3 were treated with vancomycin (Vanc) and 4 received metronidazole therapy (Met) for their underlying CDI pathology. Healthy subjects were recruited and analyzed according to prior history of antibiotic exposure (within three months of the first fecal sample collection): three subjects had prior antibiotic exposure (HAbx), and three subjects had no prior antibiotic exposure (Healthy). All subjects were recruited from the University of Florida Health System in Gainesville, FL as part of a larger longitudinal Clostridium difficile gut microbiome study. The University of Florida IRB approved the study, and all subjects provided written informed consent to participate.

16S rRNA Sequencing and Bioinformatics analysis
Fecal samples were collected in Invitek PSP Stool Collection vials, which hold up to 10 mL of fecal material with stabilization buffer. Samples delivered to the laboratory via postal service or through the study coordinator were immediately aliquoted into 1.5 mL Eppendorf tubes, assigned a code for de-identification, and stored at -80°C until genomic DNA extraction. Genomic DNA extraction using a bead-beating technique, bacterial 16S rRNA gene amplification, and sequence analysis were performed as described previously [2]. Briefly, the V1-V3 hypervariable region of the 16S rRNA gene was amplified in quadruplicate using barcoded PCR primers with denaturing and amplification times derived from the Human Microbiome Project (www.hmpdacc.org). The primers contained a titanium-flex region for compatibility with the Roche/FLX sequencer. Amplicons were excised under fluorescence, gel purified (Qiagen gel extraction kit), quantified using Qubit (Invitrogen), and pooled in equimolar concentrations of 100 ng per amplicon. Sequencing was performed on the Roche/454 FLX pyrosequencing platform. Using a custom bioinformatic pipeline [2], 2,395 unique reference operational taxonomic units (refOTUs) were obtained post-filtering based on 97% sequence similarity to the Silva 16S rRNA database. The taxonomy at the nearest phylogenetic level for all 2,395 16S rRNA refOTUs obtained in this study is shown in S1 Table.

GC-MS Sample Preparation
A dichloromethane microextraction was performed on a 1 mL aliquot of each fecal sample from the PSP 1 kit (Invitek, Germany) that was used for 16S rRNA deep sequencing. Each aliquot was vortexed and placed in a 13x15 mm glass test tube with a screw-top lid; 1 mL (1:1 vol/vol) of dichloromethane was added using a gas-tight Hamilton syringe, and the mixture was vortexed for 1 minute. Samples were then centrifuged at ~3000 g for 20 min at 4°C, and the organic fraction was removed using a glass pipette. The resultant organic layer was transferred to GC-MS autosampler vials and sealed with a PTFE-faced rubber cap. The volume obtained from each extraction was variable; for small volumes, samples were placed in vial inserts. The extracted organic phase was diluted 1:100 and run on an Agilent 5973 GC-MS with autosampler. 1 μL of sample was injected (splitless) onto an HP-5MS capillary column (0.25 mm x 30 mm x 25 μm) with helium as the carrier gas flowing at 1 mL/min. The solvent delay time was 6 minutes. The oven temperature was held at 30°C for 3 minutes, ramped to 320°C at 10°C/min, and finally held for 10 minutes. Raw GC-MS files acquired from the instrument were converted to compatible files using Xcalibur v8 (ThermoScientific) and subsequently manipulated and processed in OpenChrom (San Diego, CA). A representative GC-fecal chromatogram, mass spectrometry fragmentation of both cholesterol and coprostanol, and molecular structures of the two sterols are shown in S1 Fig.
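The oven program described above fixes the length of each GC run and the nominal oven temperature at any retention time. The following is a minimal sketch of that arithmetic; the constant and function names are illustrative and are not taken from the instrument method, and the calculation assumes an ideal linear ramp.

```python
# Oven program as described in the text: 30 °C held for 3 min, ramped at
# 10 °C/min to 320 °C, then held for 10 min. Names are illustrative only.
START_C, FINAL_C = 30.0, 320.0
INITIAL_HOLD_MIN, RAMP_C_PER_MIN, FINAL_HOLD_MIN = 3.0, 10.0, 10.0

RAMP_MIN = (FINAL_C - START_C) / RAMP_C_PER_MIN          # 29 min of ramping
TOTAL_RUN_MIN = INITIAL_HOLD_MIN + RAMP_MIN + FINAL_HOLD_MIN


def oven_temperature(t_min: float) -> float:
    """Nominal oven temperature (°C) t_min minutes into the program."""
    if not 0.0 <= t_min <= TOTAL_RUN_MIN:
        raise ValueError("time is outside the programmed run")
    if t_min <= INITIAL_HOLD_MIN:
        return START_C
    # Linear ramp, capped at the final hold temperature.
    ramped = START_C + RAMP_C_PER_MIN * (t_min - INITIAL_HOLD_MIN)
    return min(ramped, FINAL_C)


if __name__ == "__main__":
    print(f"total run time: {TOTAL_RUN_MIN:.1f} min")
    print(f"oven at 31.7 min: {oven_temperature(31.7):.1f} °C")
```

Under these assumptions the program lasts 42 minutes, so analytes with retention times near 31-32 minutes elute late in the ramp, shortly before the final hold.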

Metabolomic Analysis
To extract differentially abundant features, peak areas for all chromatograms were combined into a format for discriminant analysis using the Metaboanalyst software suite [25,26]. A matrix with all 118 samples as rows and retention times as columns was generated by merging all raw data files. Each sample and sample time point was annotated as either CDI or Healthy (for a 2-component model), or as Healthy, HAbx, Met, or Vanc (for a 4-component model). The elements of this input array were peak intensities in units of total ion current (TIC). Using the Metaboanalyst suite, each spectrum was normalized by sum and autoscaled, which resulted in a Gaussian distribution of intensities for all chromatograms. The output gave the retention times that differed significantly between the CDI and healthy cohorts (irrespective of prior antibiotic exposure and antibiotic treatment) and the VIP scores of the retention times that contributed to the PLS-DA. 155 retention times were found to be statistically significant (t test). From the t test and VIP scores, retention times 31.700 ± 0.1000 mins and 31.900 ± 0.1000 mins were the top discriminating features in both the two- and the four-component models. The chemical identity of these two peaks was investigated by mass spectrometry (Chemstation, Agilent), and compounds were matched to cholesterol and coprostanol manually for all subjects to avoid complications from overlapping retention times (see Results section). Each chromatogram was then deconvoluted by extracting the leading molecular ions for both molecules, cholesterol (m/z 353.4, 368.4, 386.4) and coprostanol (m/z 355.4, 373.4, 388.4), using XCalibur v8. The percentage of coprostanol as a function of the sum of coprostanol and cholesterol TIC was then computed for each chromatogram.
A Spearman's rank correlation coefficient (rho) was calculated for each subject across all time points (S4-S16 Figs). Other discriminating metabolites identified in this targeted screen included Vitamin E (RT 32.10 ± 0.150 mins), the fatty acids hexadecanoic acid and octadecanoic acid (22.00-24.00 mins), and squalene (~33.00-34.00 mins). In addition, we observed that cholestanone, another cholesterol derivative, co-eluted with coprostanol peaks in some but not all chromatograms. However, due to the high inter-individual variation of these other metabolites, the lack of agreement and ambiguity in their NIST database matches, and their relatively low overall contribution to PLS-DA VIP components, we selected cholesterol and coprostanol for subsequent analysis, as identification of these two metabolites was consistent across all 118 fecal samples.
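The preprocessing and the coprostanol percentage described in this section can be illustrated with a short sketch. This is not the Metaboanalyst or XCalibur implementation. It assumes that normalization by sum divides each sample's intensities by their total, that autoscaling mean-centers each retention-time column and divides by its sample standard deviation, and that the function names and toy intensities are hypothetical.

```python
from statistics import mean, stdev


def normalize_by_sum(sample):
    """Scale one sample's peak intensities so that they sum to 1."""
    total = sum(sample)
    if total <= 0:
        raise ValueError("sample has no signal")
    return [x / total for x in sample]


def autoscale(matrix):
    """Mean-center each column (retention time) and divide by its sample SD."""
    scaled_cols = []
    for col in zip(*matrix):                      # iterate over columns
        mu, sd = mean(col), stdev(col)
        scaled_cols.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]   # back to rows


def coprostanol_percent(coprostanol_tic, cholesterol_tic):
    """Coprostanol as a percentage of (coprostanol + cholesterol) TIC."""
    total = coprostanol_tic + cholesterol_tic
    if total <= 0:
        raise ValueError("neither sterol was detected")
    return 100.0 * coprostanol_tic / total


if __name__ == "__main__":
    # Toy intensities (rows = samples, columns = retention times); not study data.
    toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(autoscale(toy))
    print(coprostanol_percent(3.0e6, 1.0e6))
```

A sample with three times as much coprostanol signal as cholesterol signal gives 75%, which keeps the measure independent of the absolute amount of sterol extracted from each aliquot.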

Nonmetric Multidimensional Scaling (NMDS)
To identify bacterial OTUs associated with cholesterol and coprostanol levels, two complementary methods were employed. First, Nonmetric Multidimensional Scaling (NMDS) using a Bray-Curtis distance was used as a data reduction technique. The original community matrix of 118 individual-time samples by 2,395 bacterial OTUs was reduced to two orthogonal dimensions using NMDS. The NMDS algorithm generated a best solution, a 118 by 2 matrix, after 1000 random starts (Stress = 0.165). This allowed for a statistical analysis using ANOVA to identify unique bacterial species associated with one of the two dimensions. As an additional step, a Spearman rank correlation test determined which of the 2,395 OTUs were most strongly related to the NMDS dimensions, in order to indicate which OTUs were related to coprostanol TIC levels. Spearman rank correlation, rather than a parametric alternative, was used because (a) there were a large number of zeroes in the community matrix, and (b) we had no a priori reason to suspect a linear relationship between OTUs and coprostanol or cholesterol levels. Since most samples had some amount of cholesterol and coprostanol in their chromatograms, we designated "High" coprostanol formers as those whose ratio of coprostanol/(coprostanol + cholesterol) TIC was >50%, and "Low" coprostanol formers as those whose ratio was <50%. Each sample was then graphed in an NMDS plot as either a "High" or a "Low" coprostanol former.
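Three building blocks of this section lend themselves to a short illustration: the Bray-Curtis distance between two samples, Spearman's rho computed on tie-averaged ranks, and the High/Low designation. The sketch below is a plain-Python illustration with hypothetical function names, not the pipeline used in the study, and it does not attempt the NMDS ordination itself.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    shared = sum(min(x, y) for x, y in zip(a, b))   # joint absences add nothing
    return 1.0 - 2.0 * shared / total


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    var_x = sum((p - mx) ** 2 for p in rx)
    var_y = sum((q - my) ** 2 for q in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant vector")
    return cov / (var_x * var_y) ** 0.5


def coprostanol_class(percent):
    """'High' above 50%, 'Low' below 50%; exactly 50% is not defined in the text."""
    if percent > 50.0:
        return "High"
    if percent < 50.0:
        return "Low"
    return None
```

Because rho depends only on ranks, an OTU that is absent from many samples (many tied zeroes) and rises nonlinearly with coprostanol can still show a strong correlation, which matches the rationale given above for preferring a rank-based test.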

Indicator Species Analysis (ISA)
Indicator species analysis (ISA) [55] was used to identify bacterial OTUs (from NMDS and Spearman rank) that were significantly associated with a specific cohort. The procedure is a statistical method for determining whether a particular species is found in predefined groups significantly more often than if the same species were randomly assigned to the groups. An ISA combines measures of relative abundance (i.e., the abundance of a species in one group relative to that species' abundance in all groups) and relative frequency (i.e., the proportion of samples in a group in which the species occurs at least once relative to the proportion of samples in which the species occurs in all groups) into a single measure (i.e., an indicator value). The indicator value is computed as $IV_{kj} = 100 \times (RA_{kj} \times RF_{kj})$, where $IV_{kj}$ is the indicator value associated with each species $j$ in each group $k$, $RA_{kj}$ is the relative abundance of each species $j$ in each group $k$ relative to all other groups, and $RF_{kj}$ is the relative frequency of species $j$ in each sample of group $k$ relative to all other groups. Indicator values range from 0 (no indication) to 100 (perfect indication). Once indicator values were assigned for each species to each group, a Monte Carlo randomization procedure determined whether observed indicator values were higher than indicator values associated with randomized ecological communities [55]. We used 100,000 randomizations and considered an observed $IV_{kj}$ significant if it was higher than the $IV_{kj}$ of 95% of the randomizations after Bonferroni correction of the probability values for experiment-wise error. The R package labdsv was used for ISA and was run in the R console v3.1.2.
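The indicator value and its randomization test can be sketched for a single taxon as follows. This is an illustrative plain-Python version, not the labdsv code used in the study. It assumes that relative abundance is the mean abundance in a group divided by the sum of the group means, and that relative frequency is the proportion of samples in a group in which the taxon is present. The function names, the default of 999 randomizations, and the toy data are hypothetical.

```python
import random


def indicator_values(abundances, groups):
    """Return {group: IV} for one taxon, with IV = 100 * RA * RF."""
    labels = sorted(set(groups))
    mean_abund, freq = {}, {}
    for g in labels:
        vals = [a for a, lab in zip(abundances, groups) if lab == g]
        mean_abund[g] = sum(vals) / len(vals)
        freq[g] = sum(1 for v in vals if v > 0) / len(vals)
    total = sum(mean_abund.values())
    return {
        g: 100.0 * (mean_abund[g] / total if total > 0 else 0.0) * freq[g]
        for g in labels
    }


def indicator_p_value(abundances, groups, n_perm=999, seed=0):
    """Monte Carlo p-value for the largest observed indicator value."""
    rng = random.Random(seed)
    observed = max(indicator_values(abundances, groups).values())
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                     # randomize group membership
        if max(indicator_values(abundances, shuffled).values()) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Toy taxon present in every "Healthy" sample and absent from "CDI" samples.
    abund = [5, 3, 4, 6, 2, 0, 0, 0, 0, 0]
    labels = ["Healthy"] * 5 + ["CDI"] * 5
    print(indicator_values(abund, labels))
    print(indicator_p_value(abund, labels))
```

A taxon found in every sample of one group and in no sample of the other reaches the maximum value of 100 for that group. When many taxa are tested, the adjusted p-values from `bonferroni` are the ones compared against the significance threshold.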
ISA was performed on coprostanol-associated bacteria with three sequential strategies for defining groups. In the first strategy, we used cohort groupings (i.e., Met, Vanc, HAbx, and Healthy). In the second strategy, any non-significant coprostanol-associated bacteria that could not be placed into one of the 4 initial groups were used in a second ISA using disease (combining Met and Vanc subgroups) or no disease (combining HAbx and Healthy subgroups) groupings. In the final ISA strategy, bacteria were analyzed without a priori defined groups (i.e., cohorts or disease); instead, groups were determined using the method of Dufrene and Legendre (1997), which relies on agglomerative clustering to determine independent community clusters (see the section on agglomerative hierarchical clustering below).

Agglomerative hierarchical clustering
We used agglomerative hierarchical clustering on a distance matrix generated from a species by sample matrix. The method finds the groups that are most similar, combines them, and then combines the combined groups into higher clumps until all species are accounted for. The method produces a dendrogram (Fig 6) with nodes that divide species groups hierarchically into fewer and fewer different groups. The number of unique communities can be objectively determined by running an ISA for each possible number of groupings and returning the average p-value for the entire sample. The method of Dufrene and Legendre [55] was used to combine agglomerative hierarchical clustering of taxa with ISA. The most parsimonious number of groups is the number of groups with the lowest average p-value resulting from the ISA. As recommended by [56], we used the flexible beta method with β = -0.25 on the Bray-Curtis distance matrix. A Bray-Curtis distance is based on the proportion of shared taxa, weighted by their abundance, between any two samples. It does not count shared zeroes as a similarity and as such is well suited to sparse matrices such as those generated by 16S rRNA sequencing.
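The flexible beta method can be written as a Lance-Williams update: when clusters i and j merge, the distance from any other cluster k to the new cluster is α·d(k,i) + α·d(k,j) + β·d(i,j), with α = (1 - β)/2. The sketch below is a minimal illustration of that update with β = -0.25; it is not the software used in the study, the function name and toy distance matrix are hypothetical, and the step that selects the number of groups by ISA is not shown.

```python
from itertools import combinations


def flexible_beta_linkage(dist, beta=-0.25):
    """Agglomerative clustering with the Lance-Williams flexible-beta update.

    dist: symmetric square matrix (list of lists) of pairwise distances.
    Returns the merges in order as (members_a, members_b, merge_height).
    """
    alpha = (1.0 - beta) / 2.0                    # 0.625 when beta = -0.25
    members = {i: (i,) for i in range(len(dist))}
    d = {frozenset((i, j)): dist[i][j] for i, j in combinations(members, 2)}
    merges = []
    next_id = len(dist)
    while len(members) > 1:
        # Merge the closest pair of current clusters.
        i, j = min(combinations(sorted(members), 2),
                   key=lambda pair: d[frozenset(pair)])
        d_ij = d.pop(frozenset((i, j)))
        for k in members:
            if k in (i, j):
                continue
            d_ki = d.pop(frozenset((k, i)))
            d_kj = d.pop(frozenset((k, j)))
            d[frozenset((k, next_id))] = alpha * d_ki + alpha * d_kj + beta * d_ij
        merged = members.pop(i) + members.pop(j)
        merges.append((merged[:len(merged) - len(members_j := ())] if False else merged, None, d_ij))
        merges[-1] = (tuple(m for m in merged if m in _left(merged, i, dist, merges)),) if False else merges[-1]
        members[next_id] = merged
        next_id += 1
    return merges
```

Reading the merge heights from first to last gives the dendrogram from its leaves to its root; cutting it at different heights yields the candidate groupings that the ISA step then compares.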

Supporting Information
S1 Fig. GC-MS reveals cholesterol and coprostanol as differential metabolites in CDI. (A) Representative GC-chromatogram of typical healthy subject time point (above) and CDI subject time point (below). SF is the solvent front and FA represents fatty acid peaks that consistently eluted in the time window but whose identity and features could not be discriminated.
Coprostanol, Cholesterol TIC content and 16S Sequence percent abundance for Subject 303 (Vancomycin treated CDI-subject). Spearman's correlation coefficient (rho) between 16S sequence abundance and coprostanol also shown. (PNG)
S11 Fig. Subject 501 Sequencing and Metabolite Profile. Coprostanol, Cholesterol TIC content and 16S Sequence percent abundance for Subject 501 (Healthy subject with 90 days prior antibiotic exposure). Spearman's correlation coefficient (rho) between 16S sequence abundance and coprostanol also shown. (PNG)
S12 Fig. Subject 502 Sequencing and Metabolite Profile. Coprostanol, Cholesterol TIC content and 16S Sequence percent abundance for Subject 502 (Healthy subject with 90 days prior antibiotic exposure). Spearman's correlation coefficient (rho) between 16S sequence abundance and coprostanol also shown. (PNG)
S13 Fig. Subject 503 Sequencing and Metabolite Profile. Coprostanol, Cholesterol TIC content and 16S Sequence percent abundance for Subject 503 (Healthy subject with 90 days prior antibiotic exposure). Spearman's correlation coefficient (rho) between 16S sequence abundance and coprostanol also shown. (PNG)
S14 Fig. Subject 701 Sequencing and Metabolite Profile. Coprostanol, Cholesterol TIC content and 16S Sequence percent abundance for Subject 701 (Healthy control subject). Spearman's correlation coefficient (rho) between 16S sequence abundance and coprostanol also shown. (PNG)
S15 Fig. Subject 702 Sequencing and Metabolite Profile. Coprostanol, Cholesterol TIC content and 16S Sequence percent abundance for Subject 702 (Healthy control subject). Spearman's correlation coefficient (rho) between 16S sequence abundance and coprostanol also shown. (PNG)
S16 Fig. Subject 703 Sequencing and Metabolite Profile. Coprostanol, Cholesterol TIC content and 16S Sequence percent abundance for Subject 703 (Healthy control subject). Spearman's correlation coefficient (rho) between 16S sequence abundance and coprostanol also shown. (PNG)
S1 Table. Reference OTU (refOTU) Identification number and corresponding phylogeny based on Silva database used in this study. (XLSX)