Computational Modeling of the Gut Microbiota Predicts Metabolic Mechanisms of Recurrent Clostridioides difficile Infection

Approximately 30% of patients who have a Clostridioides difficile infection (CDI) will suffer at least one incident of reinfection. While the underlying causes of CDI recurrence are poorly understood, interactions between C. difficile and other commensal gut bacteria are thought to play an important role. In this study, an in silico metagenomics pipeline was used to process taxa abundance data from 225 CDI patient stool samples into sample-specific models of bacterial community metabolism. The predicted metabolite production capabilities of each community were shown to provide improved recurrence prediction compared to direct use of taxa abundance data. More specifically, clustered metabolite synthesis rates generated from post-diagnosis samples produced a high Enterobacteriaceae cluster with disproportionate numbers of recurrent samples and patients. This cluster was predicted to have significantly reduced capabilities for secondary bile acid synthesis but elevated capabilities for aromatic amino acid catabolism. When applied to 40 samples from fecal microbiota transplantation (FMT) patients and their donors, community modeling generated a high Enterobacteriaceae cluster with a disproportionate number of pre-FMT samples. This cluster also was predicted to exhibit reduced secondary bile acid synthesis and elevated aromatic amino acid catabolism. Because clustering of CDI and FMT samples did not identify statistical differences in C. difficile abundances, these model predictions support the hypothesis that Enterobacteriaceae may create a gut environment favorable for C. difficile spore germination and toxin synthesis.

Importance
Clostridioides difficile is an opportunistic human pathogen responsible for acute and sometimes chronic infections of the colon. Elderly individuals who are immunocompromised, frequently hospitalized and recipients of antibiotics are particularly susceptible to infection. Approximately 30% of treated patients will suffer at least one episode of reinfection, commonly termed recurrence. The objective of the current study was to utilize computational metabolic modeling to investigate the hypothesis that recurrent infections are related to the composition of the gut bacterial community within each patient. Our model predictions suggest that patients who have high abundances of the bacterial family Enterobacteriaceae during antibiotic treatment are more likely to develop recurrent infections due to a metabolically disrupted gut environment. Successful treatment of recurrent patients with transplanted fecal matter is predicted to correct this metabolic disruption, suggesting that interactions between C. difficile and Enterobacteriaceae are worthy of additional study.


Introduction
The anaerobic bacterium Clostridioides difficile is an opportunistic pathogen responsible for infections, primarily in the human colon (1). C. difficile infection (CDI) is most common in elderly patients previously treated with broad spectrum antibiotics that disrupt the healthy gut microbiota and produce a dysbiotic environment conducive to C. difficile germination, expansion and pathogenicity (2,3). CDI has become particularly common in hospital settings due to the ability of C. difficile to form spores that adhere to surfaces and resist common disinfectant protocols.
Some C. difficile strains have developed resistance to common antibiotics while also exhibiting more severe pathogenicity (4). Studies estimate that 500,000 CDI cases occur in the U.S. annually (5), resulting in 29,000 deaths and over $4.8 billion in associated costs in acute care facilities alone (6).
Approximately 10% of healthy adults are asymptomatically colonized with C. difficile (7)(8)(9). Commensal species in the healthy gut can provide resistance against C. difficile pathogenic colonization through a variety of metabolic mechanisms, including competition for dietary nutrients such as carbohydrates and amino acids (10) and conversion of host-derived primary bile acids that promote C. difficile spore germination to secondary bile acids that inhibit germination and growth (11). Recurrence is a major challenge associated with CDI treatment, as approximately 30% of patients develop at least one occurrence of reinfection (12). The host-microbiota mechanisms underlying recurrence are not well understood, as microbiota composition alone is a poor predictor of patient recovery versus recurrence (13)(14)(15). For patients who suffer from repeated episodes of recurrence, fecal microbiota transplantation (FMT) is the last-resort treatment. Despite its remarkable success rate approaching 90% (16), FMT remains controversial (17) as the donor microbiota confer poorly understood functions to the endogenous community (18) and may contain pathogenic strains not recognized during screening of donor stool (19).
The advent of metagenomic technologies such as 16S rRNA-encoding gene sequencing has yielded unprecedented insights into the composition of in vivo bacterial communities (20)(21)(22).
Furthermore, the microbial community being transplanted with FMT is poorly understood both with regard to its composition and the health-promoting metabolic functions being introduced (33)(34)(35). Uncertainty at this level can decrease therapeutic efficacy and increase the risk of adverse events (19,36).
Translating composition data derived from 16S sequencing into an understanding of community function is a challenging problem. Gut bacteria often possess overlapping metabolic functions, such as their ability to synthesize secondary bile acids (37)(38)(39) and short-chain fatty acids like butyrate and propionate (40,41). Furthermore, numerous studies (42)(43)(44)(45)(46)(47) have demonstrated that microbiota taxonomic composition is an individual characteristic and usually an inadequate measure for assessing disease states. These critical gaps in knowledge exist because bacterial composition data alone is insufficient to characterize the metabolic state of the diseased gut and nutritional environments that are protective against CDI. The next step in metagenomic applications to microbiome research needs to be the translation of taxa composition data into quantitative information about bacterial community dynamics and function (48)(49)(50). In this study, a recently developed in silico metagenomics pipeline (mgPipe; 51) was applied to the problem of identifying microbiota-based determinants of recurrent CDI. The pipeline was used to translate 16S-derived taxa abundances from stool samples of CDI and FMT patients into sample-specific models to quantify the metabolic capabilities of the modeled communities, which have been shown to correlate with clinical states in other microbiota-based disease processes (52,53).

Patient Data
Gut microbiota composition data were obtained from two published studies (13,54) in which patient stool samples were subjected to 16S rRNA gene amplicon library sequencing. The first study (13) included 225 longitudinal samples from 93 CDI patients ranging in age from 18 to 89 years. Each patient was characterized as nonrecurrent if a non-reinfected sample was collected >14 days after a previous C. difficile positive sample, recurrent if a positive sample was collected 15-56 days after a previous positive sample, or reinfected if a positive sample was collected >56 days after a previous positive sample (Table 1). Because patients in both groups were ultimately reinfected, the recurrent and reinfected patients were lumped together in this study and termed recurrent. A sample was defined as an index sample if it returned the first C. difficile positive for that patient, a pre-index sample if it was collected before the index sample, and a post-index sample if it was collected after the index sample. The second study (54) included 40 samples from 14 FMT patients and 10 of their stool donors (Table 1).
The 16S rRNA OTU reads available in the two original studies were generally at the genus and family taxonomic levels. These reads were mapped into taxa abundances for development of sample-specific community metabolic models. Using the 100 most abundant OTUs across the samples in each study, taxa abundances were derived as follows: (1) all OTUs belonging to the same taxonomic group were combined; (2) OTUs belonging to higher taxonomic groups (i.e. order and above) were eliminated to maintain modeling at the genus and family levels; and (3) the reduced set of OTUs was normalized such that the abundances of each sample summed to unity.
To quantify the effect of eliminating higher-order taxa, the total reads in (3) were divided by the total reads in (2) to generate an unnormalized total abundance for each sample. For the CDI dataset, this procedure resulted in 48 taxa (40 genera, 8 families) that accounted for an average of 97.7% of the top 100 OTU reads across the 225 samples (Table S1). Due to the non-negligible level of the class Gammaproteobacteria in the FMT dataset (average abundance of 3.4%), this class was retained to generate 39 taxa (30 genera, 8 families, 1 class) that accounted for an average of 99.3% of the top 100 OTU reads across the 40 samples (Table S2).
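The three-step abundance derivation described above can be illustrated with a minimal Python sketch. The input layout (a list of taxon name, rank, and read count per OTU), the function name derive_abundances, and the example read counts are hypothetical and chosen only for illustration; the sketch also reports the fraction of reads retained after dropping higher-rank taxa, which is one plausible reading of the unnormalized total abundance.

```python
from collections import defaultdict

# Ranks retained for modeling; higher ranks (order and above) are dropped.
KEPT_RANKS = {"genus", "family"}


def derive_abundances(otus):
    """Derive normalized taxa abundances for one sample.

    otus: list of (taxon_name, rank, read_count) tuples (hypothetical layout).
    Returns (normalized_abundances, retained_fraction).
    """
    # Step 1: combine all OTUs assigned to the same taxonomic group.
    combined = defaultdict(float)
    rank_of = {}
    for taxon, rank, reads in otus:
        combined[taxon] += reads
        rank_of[taxon] = rank
    total_before = sum(combined.values())

    # Step 2: eliminate taxa above the family level.
    kept = {t: r for t, r in combined.items() if rank_of[t] in KEPT_RANKS}
    total_kept = sum(kept.values())
    if total_kept == 0:
        raise ValueError("no genus- or family-level reads in this sample")

    # Step 3: normalize so that the abundances sum to unity.
    normalized = {t: r / total_kept for t, r in kept.items()}
    return normalized, total_kept / total_before


# Made-up read counts for a single sample.
sample = [
    ("Bacteroides", "genus", 600),
    ("Bacteroides", "genus", 200),
    ("Enterobacteriaceae", "family", 150),
    ("Clostridiales", "order", 50),
]
abundances, retained = derive_abundances(sample)
```

In this toy sample the order-level OTU is discarded, and the retained fraction indicates how much of the original signal the modeled taxa still represent.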

Community Metabolic Modeling
Taxa represented in the normalized CDI and FMT samples were modeled using genome-scale metabolic reconstructions from the Virtual Metabolic Human (VMH) database (55; www.vmh.life; Figure S1). The function createPanModels within the metagenomics pipeline (mgPipe; 51) of the MATLAB Constraint-Based Reconstruction and Analysis (COBRA) Toolbox (56) was used to create higher taxa models from the 818 strain models available in the VMH database. The sample taxa were mapped to these pan-genome models according to their taxonomy (e.g. Clostridium cluster XI containing C. difficile was mapped to the family Peptostreptococcaceae). The function initMgPipe was used to construct a community metabolic model for each of the 225 CDI and 40 FMT samples. Model construction required specification of taxa abundances for each sample and maximum uptake rates of dietary nutrients, which were specified according to an average European diet (53;  performed flux variability analysis (FVA) for each model with respect to each of the 411 metabolites assumed to be exchanged between the microbiota and the lumen and fecal compartments. The FVA results were used to compute the net maximal production capability (NMPC; 51) of each metabolite by each model (Table S4 for CDI; Table S5 for FMT) as a measure of community metabolic capability.
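The text does not spell out how an NMPC is obtained from the FVA results. As a hedged illustration only, the sketch below assumes the definition commonly associated with mgPipe: the absolute value of the sum of the maximal fecal secretion flux and the maximal dietary uptake flux, with uptake carrying a negative sign. The function name nmpc and the flux values are hypothetical.

```python
def nmpc(max_secretion, max_uptake):
    """Net maximal production capability of one metabolite (assumed definition).

    max_secretion: maximal flux through the fecal secretion exchange (>= 0).
    max_uptake: maximal uptake flux through the diet exchange, written with
        the usual sign convention that uptake is negative (<= 0).
    """
    # Assumption: NMPC = |max secretion + max uptake|, i.e. what the community
    # can secrete beyond what it takes up from the diet.
    return abs(max_secretion + max_uptake)


# Made-up fluxes in mmol/day for a single metabolite in a single sample.
example_value = nmpc(max_secretion=120.0, max_uptake=-20.0)
```

Under this assumed definition, a metabolite that is secreted only as fast as it is taken up has an NMPC of zero, so the measure reflects net community production rather than throughput.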

Data analysis
Patient data consisted of normalized taxa abundances and model data consisted of calculated NMPCs, both of which could be connected to associated metadata on a sample-by-sample basis (Tables S1 and S2). Both types of data were subjected to unsupervised learning techniques including k-means clustering and principal component analysis (PCA) to extract relationships between partitioned samples/patients and clinical parameters such as recurrence. Statistical significance of associations between categorical variables (e.g. recurrent/nonrecurrent) across sample/patient groups was assessed using Fisher's exact test. Correlations between taxa based on their abundances across samples/patients were calculated using the proportionality coefficient (57), which accounts for the effects of data normalization. Statistically significant differences between metabolite NMPCs across samples/patients were assessed using the Wilcoxon rank-sum test.
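The exact form of the proportionality coefficient is not given in the text. The sketch below assumes one widely used variant, the rho statistic computed on centered log-ratio (clr) transformed abundances, and is meant only to show why such a measure is less sensitive to normalization than ordinary correlation. The function names, the pseudocount for zero abundances, and the toy data are assumptions.

```python
import math
from statistics import pvariance


def clr(sample, pseudocount=1e-6):
    """Centered log-ratio transform of one sample's abundances."""
    # The pseudocount (an assumption) avoids log(0) for absent taxa.
    logs = [math.log(x + pseudocount) for x in sample]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def proportionality_rho(samples, i, j, pseudocount=1e-6):
    """Assumed rho statistic between taxa i and j.

    samples: list of per-sample abundance lists (rows are samples).
    rho = 1 - var(clr_i - clr_j) / (var(clr_i) + var(clr_j)); lies in [-1, 1].
    """
    transformed = [clr(s, pseudocount) for s in samples]
    xi = [t[i] for t in transformed]
    xj = [t[j] for t in transformed]
    diff = [a - b for a, b in zip(xi, xj)]
    denom = pvariance(xi) + pvariance(xj)
    if denom == 0:
        raise ValueError("both taxa are constant across samples")
    return 1.0 - pvariance(diff) / denom


# Toy data: taxon 1 is always twice taxon 0, so the two are proportional.
toy = [[0.1, 0.2, 0.7], [0.2, 0.4, 0.4], [0.3, 0.6, 0.1]]
rho_01 = proportionality_rho(toy, 0, 1, pseudocount=0.0)
```

Values near 1 indicate taxa whose abundances rise and fall in a fixed ratio, while values near zero indicate no proportional relationship.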

Clustered index samples were not predictive of recurrence
The 90 index samples remaining after removal of 3 samples containing less than 90% of modeled taxa were clustered using their normalized taxa abundances. The Davies-Bouldin criterion (58) indicated the optimal number of clusters to be 3, and the abundance data were clustered using the k-means method. The index samples were clustered into 20 samples with elevated Enterobacteriaceae and Enterococcus, 33 samples dominated by Bacteroides, and 37 samples with elevated Escherichia and Akkermansia (Figure 1A). While the Enterobacteriaceae/Enterococcus cluster exhibited a higher proportion of recurrent samples than the other two clusters and the entire index dataset (Figure 1B), none of these differences were significant (Fisher's exact test, p > 0.5).
When the index samples were analyzed with PCA, the abundance data showed structure with respect to the three clusters but not with respect to recurrent/nonrecurrent samples (Figure 1C).  Figure S1A), demonstrating that the abundance data and the model-processed abundance data produced different clustering results (Figure S1B). None of the clusters exhibited a higher proportion of recurrent samples (p > 0.25; Figure S1C), and PCA showed no distinct structure with respect to recurrent/nonrecurrent samples (Figure S1D). Therefore, the index samples, which were collected prior to antibiotic treatment, were deemed to have little predictive value with respect to CDI recurrence.
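The Davies-Bouldin criterion used in this section to choose the number of clusters can be sketched in a few lines of Python. This is the standard index with Euclidean distances, not the authors' implementation; the function names and the four toy points are illustrative assumptions. Lower values indicate compact, well-separated clusters, so the candidate number of clusters with the lowest index is preferred.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a clustering (requires at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    cents = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance of each cluster's members to its centroid.
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    labs = list(clusters)
    total = 0.0
    for i in labs:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in labs
            if j != i
        )
    return total / len(labs)


# Toy example: two tight groups far apart along the first axis.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = davies_bouldin(pts, [0, 0, 1, 1])  # groups match the structure
bad = davies_bouldin(pts, [0, 1, 0, 1])   # groups cut across the structure
```

In practice the index would be evaluated on the k-means labels obtained for each candidate number of clusters.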

Post-index samples clustered by metabolic capability were predictive of recurrence
The 119 post-index samples remaining after removal of samples containing less than 90% of  Figure 2B). Therefore, the metabolic model generated a larger cluster of high-recurrence samples compared to the abundance data (28 samples from 22 patients versus 15 samples from 11 patients) at a higher level of statistical significance. The high recurrence Enterobacteriaceae cluster was distinguishable in the upper left quadrant of a PCA plot of the model-processed abundance data due to the unique metabolic capabilities of these clustered samples (Figure 2C), an issue explored below in detail. Despite having 411 possible PCA components compared to the abundance data with 48 possible components, the model output data was more efficiently compressed with a small number of principal components (e.g. 58.2% versus 48.0% variance captured for 2 components; Figure 2D). Collectively, these results demonstrate the potential benefit of model-processed abundance data to quantify metabolic functions of sampled communities rather than relying on sample compositions alone.
The number of clusters was varied to further explore partitioning of the 119 model-processed post-index samples. Interestingly, 2 clusters also produced a relatively small group with elevated Enterobacteriaceae and Escherichia (34 samples) as well as generating a second larger group with elevated Enterococcus, Bacteroides and Lactobacillus (85 samples; Figure 3A). As the number of clusters was increased, the Enterobacteriaceae/Escherichia group split into two separate clusters and the Enterococcus/Bacteroides/Lactobacillus group split into three separate clusters (Figure 3B-E). The high Enterobacteriaceae clusters retained their property of disproportionate recurrence compared to the Enterococcus-elevated clusters for all cases (p < 0.04; Figure 3F), suggesting a possible supportive role for Enterobacteriaceae with respect to CDI recurrence during antibiotic treatment.

Clustered post-index samples exhibited distinct bile acid and aromatic amino acid metabolism
NMPCs of the 119 post-index samples with respect to each of the 411 exchanged metabolites were statistically analyzed to assess metabolic differences between the 3 clusters. For each pair of clustered samples, the Wilcoxon rank-sum test was applied to the NMPCs on a metabolite-by-metabolite basis. To reduce the number of reported metabolites, statistically different metabolite NMPCs (p < 0.05) also were required to have an average NMPC > 50 mmol/day in at least one cluster and average NMPC that differed between the clusters by at least 100%. A comparison of the high Enterobacteriaceae cluster (HEb, 28 samples) and the high Enterococcus cluster (HEc, 28 samples) generated 44 differentially produced metabolites (Figure S5, Table S6), with 19 metabolites associated with aromatic amino acid (AAA), bile acid (BA) and butanoate metabolism.
The HEb cluster and the high Bacteroides cluster (HBo, 63 samples) had 47 differentially produced metabolites (Figure S6, Table S6), including 7 metabolites associated with AAA degradation elevated in the HEb cluster. Interestingly, 11 secondary BA metabolites were elevated in the HEc cluster compared to the HBo cluster, accounting for 25% of the differentially produced metabolites (Figure S7, Table S6). Due to their differential utilization across the 3 clusters, the BA and AAA pathways were examined more carefully by collecting all metabolites belonging to these pathways that were allowed to be exchanged according to the metabolic models. The HEb cluster had the highest production capabilities for the two unconjugated primary BAs (Figure 3A), which have been reported to either promote (cholate) or inhibit (chenodeoxycholate, C02528) C. difficile germination (59,60). By contrast, the HBo cluster generated the highest production of most secondary BAs, which are known to be generally protective against CDI (2,61,62). Interestingly, the HEc cluster had much lower production capabilities for secondary BAs. The HEb cluster consistently generated higher production of metabolites involved in AAA catabolism but not significantly higher production of the AAAs themselves (Figure 3B, Table S6). This predicted AAA degradation ability was decreased in the HBo cluster and substantially lower in the HEc cluster, with the notable exceptions of the tyrosine degradation product tyramine (tyr) and the tryptophan-derived metabolite tryptamine (trypta). Interestingly, the key AAA precursor chorismate (chor) was significantly elevated in the HEc cluster, yet the production capabilities of the AAAs themselves were reduced in this cluster. Since the HEb cluster contained a disproportionate number of recurrent samples compared to the other 2 clusters, these predictions suggest a possible role for AAA metabolism in CDI recurrence.
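The three criteria used in this section to decide which metabolites to report (p < 0.05, average NMPC above 50 mmol/day in at least one cluster, and averages differing by at least 100%) can be written as a small filter. The sketch takes the Wilcoxon rank-sum p-value as an input rather than computing it, and it assumes that the 100% difference is measured relative to the smaller of the two cluster averages; that reading, the function name, and the example numbers are assumptions.

```python
def is_reported(p_value, mean_a, mean_b,
                alpha=0.05, min_mean=50.0, min_rel_diff=1.0):
    """Decide whether a metabolite passes the reporting criteria.

    p_value: rank-sum test p-value comparing NMPCs of the two clusters.
    mean_a, mean_b: average NMPC (mmol/day, non-negative) in each cluster.
    """
    # Criterion 1: statistically different NMPCs.
    if p_value >= alpha:
        return False
    hi, lo = max(mean_a, mean_b), min(mean_a, mean_b)
    # Criterion 2: sizeable production in at least one cluster.
    if hi <= min_mean:
        return False
    # Criterion 3: averages differ by at least 100% (assumed to be relative
    # to the smaller average; zero versus a positive value always passes).
    if lo <= 0:
        return True
    return (hi - lo) / lo >= min_rel_diff


# Made-up values: significant, large, and more than twofold apart.
keep = is_reported(p_value=0.01, mean_a=120.0, mean_b=50.0)
```

Applying the magnitude and fold-change criteria on top of the p-value keeps the reported list focused on metabolites with both statistical and practical differences.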
Transient presence in the high Enterobacteriaceae cluster was sufficient for elevated patient recurrence
In fact, Enterobacteriaceae and Peptostreptococcaceae abundances were only weakly correlated within the entire HEb group (proportionality coefficient = -0.01). Therefore, transient presence in the HEb cluster was hypothesized to temporarily create a metabolic environment that promoted CDI recurrence through an increase in C. difficile toxicity rather than C. difficile expansion. To explore this hypothesis, the metabolite production capabilities of the HEb, HBo and HEc groups were compared. The metabolic signature of the HEb group (Figure S9) was similar to that predicted when the HEb and HEc clusters were compared (Figure S6, Table S6) and included elevated synthesis of metabolites known to induce (e.g. butyrate) and suppress (e.g. cysteine) the toxicity of C. difficile (67,68).  Figure 5A).
Since Cronobacter belongs to the family Enterobacteriaceae and these two taxa averaged 56.4% across the 11 samples, the small cluster was considered to be dominated by Enterobacteriaceae.
When PCA was performed on the model-processed abundance data, the Enterobacteriaceae-dominated cluster was clearly distinguishable and appeared to have an overrepresentation of pre-FMT patient samples (Figure 5C). In fact, this cluster contained a disproportionately large number of pre-FMT samples (10/11) compared to both the Bacteroides-elevated cluster (4/29; p < 0.0001) and the entire sample set (14/40; p = 0.0014; Figure 5D).
Additionally, the Enterobacteriaceae-dominated cluster had a disproportionately small number of donor samples (0/11) and post-FMT patient samples (1/11) compared to the Bacteroides-elevated cluster (p = 0.038 and 0.027, respectively). The findings that the high Enterobacteriaceae (HEb) cluster studied earlier contained a disproportionately large number of recurrent CDI samples (see Figure 2) and the high Enterobacteriaceae cluster found here contained a disproportionately large number of pre-FMT samples provide additional support for the hypothesis that elevated Enterobacteriaceae is associated with recurrent CDI.
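The pre-FMT comparisons above rest on Fisher's exact test for 2x2 tables. The following Python sketch implements the standard two-sided test from the hypergeometric distribution and applies it to the reported counts (10 of 11 versus 4 of 29, and 10 of 11 versus 14 of 40). How the original analysis arranged each table is not stated, so the layout used here, in particular treating the cluster and the entire sample set as two separate rows, is an assumption, as is the function name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Rows: cluster; columns: pre-FMT samples, all other samples.
p_vs_bacteroides = fisher_exact_two_sided(10, 1, 4, 25)
# Assumed layout for the comparison against the entire sample set.
p_vs_all = fisher_exact_two_sided(10, 1, 14, 26)
```

The two resulting p-values can be compared against the thresholds quoted in the text as a sanity check on the assumed table layouts.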
When similar analyses were applied directly to the abundance data, the dataset was split  Figure S11). Only 23 of these 60 metabolites were identified as being differentially produced between the HEb and HBo clusters defined from model processing of CDI post-index samples (Figure S5; Table S6). Interestingly, 10 secondary BA metabolites and 4 AAA catabolic products were among the 37 newly identified metabolites. Therefore, BA and AAA metabolism in the HEb-FMT and HBo-FMT clusters were examined more carefully by comparing all secreted metabolites belonging to these pathways. The HEb-FMT cluster had decreased production of all 17 BA metabolites (Figure 6A), including significantly reduced synthesis of 10 secondary BAs generally correlated with recurrent CDI (59,70,71). By contrast, the HEb-FMT cluster had enhanced AAA metabolism as evidenced by elevated production of all 3 AAAs and 15 AAA degradation products, including significantly increased synthesis of 8 degradation products (Figure 6B). Given that the HEb-FMT cluster was overrepresented in pre-FMT samples and underrepresented in donor and post-FMT samples, these predictions provide additional support for the hypothesis that BA and AAA community metabolism may play key roles in CDI recurrence and treatment.
When the same analysis procedure was applied to NMPCs clustered according to 16S-derived abundance data, 46 metabolites differentially produced between the Enterobacteriaceae-elevated and Bacteroides-elevated clusters were identified (Figure S12). Overproduction of AAA catabolic products in the Enterobacteriaceae-elevated cluster continued to be pronounced, but differences in secondary BAs between the two clusters were no longer evident. The inability of the clustered abundance data to generate differential predictions of BA metabolism was attributed to the Enterobacteriaceae-elevated cluster containing 1 donor and 5 post-FMT samples in addition to all 14 pre-FMT samples. Therefore, clustering the samples according to model-processed abundance data appeared to offer advantages for understanding community metabolic changes resulting from FMT.

Discussion
An in silico metagenomics pipeline was used to translate 16S-derived abundance data into sample-specific community models for investigating the metabolic determinants of recurrent CDI. The models allowed sample-by-sample predictions of metabolite production rates that were used both to cluster samples according to their functional metabolic capabilities and to provide mechanistic insights into clusters exhibiting high recurrence. Community model predictions were dependent both on the taxonomic groups represented in the 16S data and the fidelity of individual taxa metabolic models. The CDI (13) and FMT (54) datasets used in this study captured taxonomic differences primarily at the genus and family levels and therefore precluded modeling metabolism at the strain and species levels (53). Despite this limitation, the pan-genome metabolic models used for community modeling allowed substantial differentiation of samples according to their functional capabilities.
Taxa abundance data and model-processed abundance data were clustered to determine if the resulting clusters exhibited statistically significant differences between the number of recurrent CDI samples. No significant differences were observed when only index samples were tested, suggesting that community composition prior to CDI treatment may provide limited information about recurrence. By contrast, both abundance data and model-processed abundance data derived from post-index samples identified high Enterobacteriaceae, low Bacteroides clusters as having disproportionate numbers of recurrent samples. Numerous studies have identified Enterobacteriaceae as positively associated and Bacteroides as negatively associated with primary CDI (63)(64)(65) and to a lesser extent with subsequent reinfection (72,73). The analyses presented here suggest CDI recurrence is more dependent on community response to antibiotic therapy than on the community composition entering therapy. Indeed, first-line antibiotics for CDI treatment including metronidazole and vancomycin are known to collaterally target Bacteroides (74,75) while having little efficacy against Enterobacteriaceae (76)(77)(78). Unfortunately, the metadata available for these samples only reported if the patient received antibiotic therapy prior to CDI diagnosis. Since next generation antibiotics such as fidaxomicin used for recurrent CDI are more specific for C. difficile and are known to spare Bacteroides (79, 80), knowledge of which antibiotics were used to treat the patients represented in the high Enterobacteriaceae clusters would enable additional analysis.
As compared to direct use of abundance data, an advantage of utilizing predicted metabolite production rates for sample clustering was that the high Enterobacteriaceae (HEb) cluster contained more samples (28 vs. 15) representing more patients (22 vs. 11). The model-based cluster included samples with a high combination of Enterobacteriaceae and Escherichia, which have similar metabolic capabilities since Escherichia is a genus within Enterobacteriaceae.
The capability to collapse samples with different compositions but similar metabolic features is useful when dealing with 16S-derived abundance data at several taxonomic levels, a common situation in human microbiome research.
Another benefit of quantifying metabolic capabilities through modeling was the ability to predict differentially synthesized metabolites across sample groups. When compared to a more taxonomically diverse cluster with elevated Bacteroides (HBo cluster) and no statistical difference in recurrence, the HEb cluster was predicted to have significantly reduced capabilities for secondary bile acid (BA) synthesis. These predictions were generally consistent with the established role of BA metabolism in recurrent CDI, as elevated primary BA and reduced secondary BA levels are known to be a disease signature (59,70,71). The specific effects of individual BA metabolites are more nuanced, as the primary BA cholate is known to induce germination of C. difficile spores, the primary BA chenodeoxycholate suppresses both germination and vegetative growth and the secondary BA deoxycholate induces germination but suppresses growth (64,81). To achieve prediction at this level of granularity, the metabolic models would need to be constructed with 16S-derived abundance data at lower taxonomic levels since individual species and strains are known to have distinct BA metabolism (53).
Despite having no statistical difference in recurrence, a third cluster elevated in Enterococcus and to a lesser extent Lactobacillus (HEc cluster) had significantly reduced capabilities for secondary BA synthesis compared to both the HEb and HBo clusters. These predictions underscore the fact that recurrent CDI is a complex disease and not likely to be completely explained by a single factor such as community BA metabolism (54,65). Interestingly, model-based analysis revealed aromatic amino acid (AAA) metabolism as a second putative mechanism underlying increased recurrence in the HEb cluster. More specifically, this cluster was predicted to have significantly increased synthesis of numerous AAA degradation products compared to the two lower recurrence clusters. Enterobacteriaceae is thought to be largely responsible for AAA catabolism in the gut (41, 82), and AAA synthesis has been implicated as a metabolic function protective against CDI (83). C. difficile isolates have been shown to have highly variable AAA metabolisms (84), opening the possibility that Enterobacteriaceae interactions with C. difficile are isolate dependent. However, the 22 patients represented in the HEb cluster were reported to have been infected with at least 9 different C. difficile ribotypes. While evidence directly linking AAA metabolism and CDI is currently lacking, the modeling work presented here suggests that this putative connection could be a fruitful area for future experimental studies.
All samples from each patient with at least one sample in the HEb cluster were collected to allow longitudinal analysis of individual patients. This HEb group had a disproportionate number of recurrent patients (19/22) compared to the larger patient population. HEb group patients exhibited compositionally variable communities that routinely switched between clusters, suggesting that transient presence in the HEb cluster could be sufficient for CDI recurrence. Since Enterobacteriaceae and C. difficile abundances had a very weak negative correlation within the HEb group, Enterobacteriaceae did not seem to support C. difficile vegetative growth but may have induced spore germination and/or enhanced toxicity of vegetative cells. As discussed above, the BA metabolite profile predicted for the HEb cluster was consistent with enhanced germination.
C. difficile toxicity is thought to be regulated by a number of metabolites (67,68,85). Two of the most potent regulators are toxicity-inducing butyrate and toxicity-suppressing cysteine, both of which were predicted to be elevated in the HEb cluster so as to have opposing effects. An intriguing but entirely speculative possibility is that AAA degradation products from Enterobacteriaceae induced C. difficile toxicity.
To test consistency of model predictions derived from the CDI dataset, the in silico metagenomics pipeline was applied to 40 samples obtained from FMT patients and their stool donors (54). Clustering of model-processed abundance data generated a cluster with a disproportionate number of pre-FMT samples, suggesting distinct metabolic function compared to donor and post-FMT communities as has been reported (70,86,87). This cluster had elevated Cronobacter and Enterobacteriaceae with very low Bacteroides. Because Cronobacter is a member of Enterobacteriaceae, this cluster was identified as high Enterobacteriaceae and was compositionally similar to the high recurrence HEb cluster found in the CDI dataset. A second cluster composed mainly of donor and post-FMT samples was elevated in Bacteroides and Lachnospiraceae and compositionally similar to the HBo cluster identified from CDI samples.
Consistent with these results, Cronobacter was found to be strongly positively correlated with Enterobacteriaceae and strongly negatively correlated with Bacteroides across the FMT dataset.
These predictions agreed with observations that FMT tends to decrease the abundances of Enterobacteriaceae and other Proteobacteria (54,73,88)  limitations of metabolic modeling at higher taxonomic levels and the potential value of more resolved 16S rRNA sequence data. Despite these differences, the HEb-FMT cluster still exhibited reduced secondary BA levels observed in recurrent CDI (59,70,71) and resolved through FMT (70,71,91). The HEb-FMT cluster also was predicted to have the capability for elevated AAA degradation including increased synthesis of the catabolic products phenylpyruvic acid, tyramine and tryptamine derived from phenylalanine, tyrosine and tryptophan, respectively. Because they also were elevated in the HEb-CDI cluster compared to the high Bacteroides (HBo-CDI) cluster, these 3 metabolites might make interesting experimental targets for their ability to induce germination and/or enhance toxicity of C. difficile clinical isolates. an HEb cluster-like composition, as Enterobacteriaceae would require sufficient time to establish favorable metabolic conditions for C. difficile pathogenicity. While intriguing, such speculation was impossible to test with the available dataset due to infrequent and irregular sampling. The most obvious explanation for the observed discrepancies is that recurrent CDI has a very complex disease etiology that depends on host-microbiota-environment interactions, both metabolic and non-metabolic. Therefore, the inability to fully predict patient recurrence based only on model-processed 16S-derived abundance was hardly surprising. However, the hypotheses that high Enterobacteriaceae-containing communities are more prone to recurrence and that recurrence may be partially attributable to the combination of disrupted BA and AAA metabolism seem worthy of further investigation through the type of integrated metagenomics-modeling framework utilized in this study.   Table S7.