Stool Microbiome and Metabolome Differences between Colorectal Cancer Patients and Healthy Adults

In this study we used stool profiling to identify intestinal bacteria and metabolites that are differentially represented in humans with colorectal cancer (CRC) compared to healthy controls, with the aim of understanding how microbial functions may influence CRC development. Stool samples were collected from healthy adults (n = 10) and colorectal cancer patients (n = 11) prior to colon resection surgery at the University of Colorado Health-Poudre Valley Hospital in Fort Collins, CO. The V4 region of the 16S rRNA gene was pyrosequenced, and both short chain fatty acids and global stool metabolites were extracted and analyzed using Gas Chromatography-Mass Spectrometry (GC-MS). There were no significant differences in the overall microbial community structure associated with the disease state, but several bacterial genera, particularly butyrate-producing species, were under-represented in the CRC samples, while a mucin-degrading species, Akkermansia muciniphila, was about 4-fold higher in CRC (p < 0.01). Proportionately higher amounts of butyrate were seen in stool of healthy individuals, while relative concentrations of acetate were higher in stools of CRC patients. GC-MS profiling revealed higher concentrations of amino acids in stool samples from CRC patients and higher poly- and monounsaturated fatty acids and ursodeoxycholic acid, a conjugated bile acid, in stool samples from healthy adults (p < 0.01). Correlative analysis between the combined datasets revealed some potential relationships between stool metabolites and certain bacterial species. These associations could provide insight into microbial functions occurring in a cancer environment and will help direct future mechanistic studies. Using integrated “omics” approaches may prove a useful tool in identifying functional groups of gastrointestinal bacteria and their associated metabolites as novel therapeutic and chemopreventive targets.


Introduction
A healthy gastrointestinal system relies on a balanced commensal biota to regulate processes such as dietary energy harvest [1], metabolism of microbial and host derived chemicals [2], and immune modulation [3]. Accumulating evidence suggests that the presence of microbial pathogens or an imbalance in the native bacterial community contributes to the development of certain gastrointestinal cancers. A causal relationship between gastric cancer and Helicobacter pylori has been established [4], leading to the hypothesis that other host-associated organisms are involved in cancer etiology.
An association between colorectal cancer (CRC) and commensal bacteria has been suspected for decades. For example, Streptococcus infantarius (formerly S. bovis) became diagnostically important after it was recognized that bacteremia due to this organism was often associated with colorectal neoplastic disease [5,6]. However, early studies associating genera of bacteria with colon cancer risk were limited to culture-based methods that did not reflect the complexity of the gastrointestinal microbiota [7][8][9]. Development of high-throughput sequencing has facilitated detailed surveys of the gut microbiota, and a more thorough and complex picture of the CRC-associated microbiome is emerging. Sobhani et al. [10] found that the Bacteroides/Prevotella group was over-represented in both stool and mucosa samples from individuals with colon cancer compared to their cancer-free counterparts. They also found that Bifidobacterium longum, Clostridium clostridioforme, and Ruminococcus bromii were under-represented in samples from these individuals and concluded that a lack of correlation between tumor stage/size and the over-represented populations suggested a contributory role of the bacteria in tumor development. Two additional studies, published concurrently, examined the microbiota present in the tumor mucosa and adjacent healthy tissue of individuals with colon cancer, and both revealed an over-representation of Fusobacterium spp. [11,12], while others have revealed an abundance of Coriobacteria and other probiotic species [13,14].
The question remains whether over-representation of particular microbial species in stool and mucosal samples is indicative of a contributory role in the development of CRC or a consequence of the tumor environment. Although a causal role of intestinal biota in CRC development has not been demonstrated, there is evidence to suggest that induction of pro-inflammatory responses by commensals contributes to tumor initiation and development [10,14]. Production of genotoxins and DNA-damaging superoxide radicals is another mechanism by which commensals can contribute to CRC development [15]. Alternatively, it has been hypothesized that certain probiotic bacteria act as tumor foragers, taking advantage of an ecological niche created by the physiological and metabolic changes in the tumor microenvironment [14].
To clarify the role of intestinal biota in the development of CRC, it will be necessary to move beyond taxonomic over-representation and examine changes in the CRC-associated microbiome in a more functional context. One important functional parameter is how commensal organisms contribute to the flux of metabolites and the breakdown of dietary components. Thus, metabonomics, the study of global changes in metabolites in response to biological stimuli [16], is being applied to identify and characterize the functional microbiome that drives metabolic changes associated with different diets, genotypes, and disease states [17][18][19]. Stool metabolite profiles have been validated as a means of assessing gut microbial activity [20]. The current study not only contributes to the growing list of gut microbes in the CRC microbiome, but also uses a metabonomics approach to identify potential microbiome-metabolome interactions.

Ethics Statement
All individuals provided written informed consent prior to participating in the study. All study protocols were approved by Colorado State University (Protocol numbers 10-1670H and 9-1520H) and Poudre Valley Hospital-University of Colorado Health System's Institutional Review Boards (Protocol numbers 10-1038 and 10-1006).

Sample Collection and DNA Extraction
Stool samples were collected from healthy individuals (n = 11) and recently diagnosed colon cancer patients (n = 10) prior to surgery for colonic resection (Table 1; note that not all samples were subjected to all analyses, see Table 1 footnote). Exclusion criteria for all participants included use of antibiotics within two months of study participation, and regular use of NSAIDs, statins, or probiotics. Individuals who reported chronic bowel disorders or food allergies/dietary restrictions were also excluded from the study. Additional exclusion criteria for CRC patients included chemotherapy or radiation treatments prior to surgery. Stool samples were provided for analyses prior to administration of any preoperative antibiotics or bowel preparation. Samples were collected by study participants and transported to the laboratory within 24 hours of collection. Stool samples were homogenized, and three subsamples were collected with sterile cotton swabs. DNA was extracted from all samples using MoBio Powersoil DNA extraction kits (MoBio, Carlsbad, CA) according to the manufacturer's instructions and stored at −20 °C prior to amplification steps.

Pyrosequencing Analysis
Amplification of the V4 region of the bacterial 16S rRNA gene was performed in triplicate using primers 515F and 806R labeled with 12-bp error correcting Golay barcodes [21]. Twenty ml  [22] using the default settings unless otherwise noted. Briefly, sequence reads were (i) trimmed (bdiff = 0, pdiff = 0, qaverage = 25, minlength = 100, maxambig = 0, maxhomop = 10); (ii) aligned to the bacterial-subset SILVA alignment available at the Mothur website (http://www.mothur.org); (iii) filtered to remove vertical gaps; (iv) screened for potential chimeras using the uchime method; (v) classified using the Green Genes database (http://www.mothur.org) and the naïve Bayesian classifier [23] embedded in Mothur, with all sequences identified as chloroplast removed; (vi) screened (optimize = minlength-end, criteria = 95) and filtered (vertical = T, trump = .) so that all sequences covered the same genetic space; and (vii) pre-clustered (diff = 2) to remove potential pyrosequencing noise and clustered (calc = onegap, countends = F, method = nearest) into OTUs [24]. To remove the effect of sample size on community composition metrics, sub-samples of 1250 reads were randomly selected from each stool sample. After clustering sequence reads into OTUs (i.e., nearest-neighbors at 3% genetic distance) or phylotypes (i.e., sequences matching a common genus in the Green Genes Database), the replicate sub-samples were averaged to yield a single community profile for each sample. Sample size independent values for alpha diversity community descriptors such as observed species richness (S_obs), Chao1 estimates of total species richness (S_Chao), Shannon's diversity (H') and evenness (E_H), and Simpson's diversity (1-D) and evenness (E_D) were determined by fitting a 3-parameter exponential curve, $y = y_0 + a(1 - e^{-bx})$, to rarefied parameters over a range of 100 to 1250 sequence reads, where the asymptotic maximum is equal to the sum of $y_0$ and $a$.
Effective numbers of species were calculated as S_H = exp(H') for Shannon's index and S_D = 1/D for Simpson's. All sequence data are publicly available through the Sequence Read Archive (SRA) under study accession number ERP002217, which is available at the following link: http://www.ebi.ac.uk/ena/data/view/ERP002217.
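The diversity calculations described above can be illustrated with a minimal sketch. It assumes the simple sum-of-squared-proportions form of Simpson's D (the exact estimator used by Mothur may differ), and the function names and the example read counts are hypothetical, chosen only for illustration.

```python
import math


def shannon_index(counts):
    """Shannon's diversity H' (natural log) from per-OTU read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_dominance(counts):
    """Simpson's D as the sum of squared proportions (simple form; an assumption)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def effective_species(counts):
    """Return (S_H, S_D) = (exp(H'), 1/D), the effective numbers of species."""
    return math.exp(shannon_index(counts)), 1.0 / simpson_dominance(counts)


def saturation_curve(x, y0, a, b):
    """Three-parameter exponential y = y0 + a * (1 - exp(-b * x))."""
    return y0 + a * (1.0 - math.exp(-b * x))


def asymptotic_maximum(y0, a):
    """Value the curve approaches as the number of reads x grows: y0 + a."""
    return y0 + a


# Hypothetical sample: 1250 reads spread evenly over 5 OTUs.
even_counts = [250, 250, 250, 250, 250]
s_h, s_d = effective_species(even_counts)
```

For a perfectly even community both effective numbers equal the number of OTUs, which is a quick sanity check on the two formulas.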

Nontargeted Metabolite Profiling and Data Processing Methods
One hundred milligrams of lyophilized stool sample were extracted two times with 1 mL of 3:2:2 isopropanol:acetonitrile:water, spun at 14,000 rpm for 5 minutes, and the supernatants were combined. The extract was dried using a speedvac, resuspended in 50 µL of pyridine containing 15 mg/mL of methoxyamine hydrochloride, incubated at 60 °C for 45 min, sonicated for 10 min, and incubated for an additional 45 min at 60 °C. Next, 50 µL of N-methyl-N-trimethylsilyltrifluoroacetamide with 1% trimethylchlorosilane (MSTFA + 1% TMCS, Thermo Scientific) was added and samples were incubated at 60 °C for 30 min, centrifuged at 3000 × g for 5 min, cooled to room temperature, and 80 µL of the supernatant was transferred to a 150 µL glass insert in a GC-MS autosampler vial. Metabolites were detected using a Trace GC Ultra coupled to a Thermo DSQ II (Thermo Scientific). Samples were injected in a 1:10 split ratio twice in discrete randomized blocks. Separation occurred using a 30 m TG-5MS column (Thermo Scientific, 0.25 mm i.d., 0.25 µm film thickness) with a 1.2 mL/min helium gas flow rate, and the program consisted of 80 °C for 30 sec, a ramp of 15 °C per min to 330 °C, and an 8 min hold. Masses between 50-650 m/z were scanned at 5 scans/sec after electron impact ionization. For each sample, a matrix of molecular features as defined by retention time and mass (m/z) was generated using XCMS software [25]. Features were normalized to total ion current, and the relative quantity of each molecular feature was determined by the mean area of the chromatographic peak among replicate injections (n = 2). Molecular features were formed into peak groups using AMDIS software [26], and spectra were screened in the National Institute of Standards and Technology (www.nist.gov) and Golm (http://gmd.mpimp-golm.mpg.de/) metabolite databases for identifications.

SCFA Determination
Stool samples were extracted for short chain fatty acids by mixing 1 g of frozen feces with acidified water (pH 2.5) and sonicating for 10 min. Samples were centrifuged, filtered through 0.45 µm nylon filters, and stored at −80 °C prior to analysis. The samples were analyzed using a Trace GC Ultra coupled to a Thermo DSQ II scanning from m/z 50-300 at a rate of 5 scans/second in electron impact mode. Samples were injected at a 10:1 split ratio, and the inlet was held at 22 °C and the transfer line was held at 230 °C. Separation was achieved on a 30 m TG-WAX-A column (Thermo Scientific, 0.25 mm ID, 0.25 µm film thickness) using a temperature program of 100 °C for 1 min, ramped at 8 °C per minute to 180 °C, held at 180 °C for one minute, ramped to 200 °C at 20 °C/minute, and held at 200 °C for 5 minutes. Helium carrier flow was held at 1.2 mL per minute. Peak areas were integrated by Thermo Quan software using selected ions for each of the short chain fatty acids, and areas were normalized to total signal.
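Both the nontargeted and SCFA workflows normalize peak areas to the total signal of an injection before comparing samples. The following is a minimal sketch of that idea, not the XCMS or Thermo Quan implementation; the feature names, the dictionary layout, and the choice to treat a feature missing from one injection as zero are assumptions made for illustration.

```python
def normalize_to_total_signal(peak_areas):
    """Divide each peak area by the summed area of all peaks in one injection."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {feature: area / total for feature, area in peak_areas.items()}


def mean_across_injections(injections):
    """Average normalized areas over replicate injections of one sample.

    A feature absent from an injection is counted as zero (an assumption).
    """
    normalized = [normalize_to_total_signal(inj) for inj in injections]
    features = set().union(*normalized)
    return {
        f: sum(n.get(f, 0.0) for n in normalized) / len(normalized)
        for f in features
    }


# Hypothetical duplicate injections (n = 2) of a single stool extract.
replicates = [
    {"butyrate": 30.0, "acetate": 70.0},
    {"butyrate": 50.0, "acetate": 50.0},
]
relative_quantity = mean_across_injections(replicates)
```

Normalizing each injection separately before averaging keeps a single high-signal injection from dominating the mean.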

Statistical Analysis
Differences in bacterial phylotypes and global metabolites between samples from healthy individuals and colon cancer patients were determined using AMOVA and Student's t-tests with a significance cutoff of p < 0.01. Phylotypes and metabolites that were significantly different between groups were further refined by removing markers that had fewer than 25 total reads (bacteria) or borderline background signals (metabolites), or that were present in fewer than 3 individual samples. Short chain fatty acid concentrations were determined in two separate chromatographic runs, so a weighted mean was calculated for each quantified compound, and statistical differences between stool samples from healthy individuals and colon cancer patients were determined using a mixed model ANOVA with experiment as a random effect and disease status as a fixed effect (XLSTAT 2011.1, Addinsoft Corp, Paris, France). Correlations between metabolites and bacteria were determined using Pearson's r, with a moderate correlation denoted by r ≥ 0.50 and a strong correlation denoted by r ≥ 0.70.
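The correlation thresholds above can be expressed as a short sketch. It assumes the cutoffs apply to the magnitude of r, so that inverse associations are graded the same way; the label "below threshold" and the example values are hypothetical and not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Grade |r| with the cutoffs from the text: 0.70 strong, 0.50 moderate."""
    magnitude = abs(r)
    if magnitude >= 0.70:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    return "below threshold"


# Hypothetical relative abundances of one taxon and one metabolite.
taxon = [1.0, 2.0, 3.0, 4.0]
metabolite = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(taxon, metabolite)
```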

Alpha and Beta Diversity in Stool Biota
Typical community descriptors of alpha diversity for molecular microbial data include actual and estimated OTU richness, and indices of population diversity and evenness. In systems where pathogens are introduced (e.g. Helicobacter pylori), there are marked decreases in estimates of diversity and evenness [27], suggesting that these indices may be useful predictors of infection. We examined these parameters in stool samples from healthy individuals and those with CRC to see if they could be used as predictors of disease state. We observed no significant differences at the 3% genetic distance in the average diversity or evenness of stool microbial communities from healthy individuals compared to those with CRC (Table S1). The average coverage obtained from 1250 reads per sample was 84% and 86% in healthy and colon cancer samples, respectively. The average effective diversity of each group suggested a trend toward higher bacterial diversity in stool samples of healthy individuals (S_H = 63, S_D = 20) compared to those from CRC patients (S_H = 46, S_D = 15); however, the inter-individual variation was too great to achieve statistical significance. Based on these data, we suggest that alpha diversity descriptors of stool microbiota are not indicative of disease state in CRC, although a limitation of this study is that only stool samples and not tissue mucosa were analyzed. However, despite inherent differences in stool and mucosal microbial communities, our findings are consistent with other published reports of total bacterial diversity and evenness estimates between CRC and healthy stool and tissue/mucosa samples [10].
This inter-individual variation was also apparent in estimates of beta diversity, where a low degree of similarity in overall microbial community composition between individuals was observed, as determined using the unweighted Jaccard distance (J_class) to compare community membership (Figure 1A) and Yue and Clayton's [28] index (θ_YC) to compare community structures (Figure 1B). Because of this variation, no patterns in the overall community composition were noted between stool samples from CRC patients and healthy individuals.
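The two beta diversity measures named above differ in what they compare: the Jaccard distance uses only presence or absence of OTUs, while Yue and Clayton's index uses relative abundances. The sketch below uses the standard textbook forms of both, expressed as dissimilarities (1 minus similarity); it is not Mothur's code, and the OTU labels and counts are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - (shared OTUs / OTUs present in either sample); membership only."""
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        raise ValueError("both samples are empty")
    return 1.0 - len(present_a & present_b) / len(union)


def theta_yc_distance(a, b):
    """1 - theta, where theta = sum(p*q) / (sum(p^2) + sum(q^2) - sum(p*q))
    and p, q are the relative abundances of each OTU in the two samples."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sample needs at least one read")
    otus = set(a) | set(b)
    p = {o: a.get(o, 0) / total_a for o in otus}
    q = {o: b.get(o, 0) / total_b for o in otus}
    sum_pq = sum(p[o] * q[o] for o in otus)
    sum_p2 = sum(v * v for v in p.values())
    sum_q2 = sum(v * v for v in q.values())
    return 1.0 - sum_pq / (sum_p2 + sum_q2 - sum_pq)


# Hypothetical OTU read counts for two stool samples.
sample_1 = {"otu1": 10, "otu2": 10}
sample_2 = {"otu1": 10, "otu3": 10}
membership_distance = jaccard_distance(sample_1, sample_2)
structure_distance = theta_yc_distance(sample_1, sample_2)
```

Identical samples give a distance of 0 under both measures, and samples with no OTUs in common give 1.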

Taxonomic Differences between CRC and Healthy Stool Samples
The disease status of study participants did not drive overall community structure of the stool microbiota, and the composition and relative abundance of the major phyla were similar, although there was a non-significant trend towards higher Verrucomicrobia in samples from colon cancer patients (Figure 2). There were also higher levels of Synergistetes in the cancer group, but this was driven by a single individual with an extremely high proportion of this phylum and was not representative of the entire sequenced cancer population. However, at the genus/species level there were a number of OTUs that were significantly under-represented in the stool of colon cancer patients compared to healthy individuals (Table 2). These include several Gram-negative Bacteroides and Prevotella spp. that have previously been isolated from human stool, but are not well characterized with regard to their role in intestinal function or general health. Two of the Prevotella species identified were not only under-represented, but were completely absent from the colon cancer samples analyzed. Prevotella was a dominant genus reported in stool from children in a rural community in Burkina Faso but absent from a cohort of Italian children, and the study authors hypothesized that Prevotella helped maximize energy harvest from a plant-based diet [29]. Therefore, it is possible that the higher levels of Prevotella in the healthy cohort may reflect differences in the intake of fiber and other plant compounds compared to the individuals with colon cancer. At the genus level, Shen et al. [30] found Bacteroides spp. to be enriched in colonic tissue from healthy individuals when compared to adenoma tissue. Lachnospiraceae and members of the genera Dorea and Ruminococcus were also previously reported as dominant phylotypes driving differences between healthy and cancerous tissue samples [13]. The other OTUs that we identified, such as the Dialister spp. and Megamonas spp., have not previously been reported in association with colon cancer; however, decreased populations of Dialister invisus have been reported in Crohn's disease [31]. There were fewer identifiable bacteria that were over-represented in the colon cancer population (Table 3). Most notably, we observed that the mucin-degrading bacterium Akkermansia muciniphila, which represented a relatively large percentage of the total sequences, was present in a significantly greater proportion in the feces of colon cancer patients. This bacterium is a common member of the colonic microbiota and was recently shown to be reduced in irritable bowel syndrome and Crohn's disease [32]; however, a more recent report showed increased A. muciniphila in ulcerative colitis-associated pouchitis [33]. Two types of mucins, MUC1 and MUC5AC, are reportedly overexpressed in colon cancers [34], suggesting that our observed CRC-related increases in A. muciniphila populations may be due to increased substrate availability. Citrobacter farmeri, which can utilize citrate as a sole carbon source, was also higher in samples from colon cancer patients, but represented a much smaller proportion of the total bacterial sequences. Citrobacter farmeri is among a group of gut bacteria that includes multiple pathogenic species like Salmonella and Shigella, and which has arylamine N-acetyltransferase activity that may be involved in activation of carcinogens and xenobiotic metabolism [35].
Age and BMI represent other factors that play a role in shaping the intestinal microbial communities. Several reports have demonstrated a correlation between the ratio of Bacteroidetes to Firmicutes and obesity [1]. We conducted linear regressions between BMI and the relative abundance of each of the taxa that significantly differed between CRC and healthy stools (see Tables 2 and 3) and saw no significant correlations (Table S2). In addition, aging has been associated with a decrease in protective commensal anaerobes, such as Faecalibacterium prausnitzii, and an increase in E. coli [36]. We did find a negative correlation between the age of participants and Dorea formicigenerans (R² = 0.354; p = 0.041) and Ruminococcus obeum (R² = 0.434; p = 0.020), both members of the Clostridium XIVa group, suggesting that differences between cohorts with regard to these two species may be a result of differences in the mean age of participants in each group rather than CRC disease status. To our knowledge, a decline in the population of Clostridium XIVa group members has not been previously associated with aging, but has been associated with dysbiosis related to intestinal inflammatory conditions such as Crohn's disease [37]. None of the other bacterial taxa identified were correlated with age (Table S3). Therefore, we conclude that the differences in the majority of taxa that significantly differed in stool samples between healthy and CRC cohorts were a result of disease status and not of differences in age or BMI.

Short Chain Fatty Acid Analysis
Short chain fatty acids (SCFAs), particularly butyrate, are widely studied microbial metabolites reported to have anti-tumorigenic effects [38]. SCFAs are readily absorbed and utilized in host tissues, so detection in stool is typically considered an indication of production in excess of that which can be utilized by the host [29]. We and others [10,13] have observed that species of butyrate-producing bacteria, such as Ruminococcus spp. and Pseudobutyrivibrio ruminis, were lower in stool samples from CRC patients compared to healthy controls. Therefore, we quantified several short chain fatty acids from frozen stool samples. The three major SCFAs produced as microbial metabolites, acetate, propionate, and butyrate, were all detected, as were valeric, isobutyric, isovaleric, caproic, and heptanoic acids. Among these, acetic and valeric acids were significantly higher in stool samples from CRC patients (p < 0.0001 and p = 0.024, respectively), while butyric acid was significantly higher in the feces of healthy individuals (p < 0.0001; Figure 3). No differences in propionic acid were detected between the two groups. Butyrate is regarded as one of the most important nutrients for normal colonocytes, and alone or in combination with propionate it has been shown to reduce proliferation and induce apoptosis in human colon carcinomas [39]. Although acetate is an important SCFA for maintaining colonic health and as a precursor molecule for endogenous cholesterol production, elevated levels of this metabolite have previously been associated with CRC in humans [40]. Acetate can be used to produce butyrate, and proportional differences in these metabolites between CRC and healthy samples may reflect a depletion of colonic microbes that can carry out this reaction in CRC samples, or may be a result of degradation of butyrate to acetate under the low colonic pH associated with CRC.
We also observed significantly higher relative concentrations of isobutyric (p < 0.0001) and isovaleric acid (p = 0.002) in samples from individuals with CRC (Figure 3). These two SCFAs result from bacterial metabolism of the branched chain amino acids valine and leucine, which were also higher in CRC stool samples (Table 4) and may account for the significant increases observed in these two SCFAs in the CRC population.

Global Stool Metabolites
Stool samples allow for evaluation of bacteria residing in the intestinal lumen, and therefore, stool small molecules are considered to result from co-metabolism or metabolic exchange between microbes and host cells [13]. Global metabolite profiling performed herein on lyophilized stool samples provided insights into the relationship between microbial populations and metabolites, and may contribute to the identification of novel CRC metabolic biomarkers. The supervised multivariate analysis technique, Orthogonal Projection to Latent Structures-Discriminant Analysis (OPLS-DA), which facilitates interpretation by separately modeling predictive and orthogonal (non-predictive) variables, was used to determine if non-targeted GC-MS profiles were predictive of disease state of the donor. The OPLS-DA demonstrated satisfactory modeling and predictive capabilities for this dataset (R²Y = 0.986; Q²Y = 0.927), revealing a distinct separation between stool metabolic features of the two groups (Figure 4) and suggesting that presence or absence of CRC is an important factor driving the variability in stool metabolites.
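For readers unfamiliar with the two OPLS-DA statistics quoted above, the sketch below shows the general definition they share: one minus the ratio of the residual to the total sum of squares of the class variable. Applied to fitted values it gives the goodness of fit (R²Y), and applied to cross-validated predictions it gives the predictive ability (Q²Y). The software and cross-validation scheme used in the study are not stated here, so the coding of classes as 0 and 1 and all example values are assumptions for illustration only.

```python
def fraction_explained(observed, predicted):
    """Return 1 - SS_residual / SS_total for a response variable."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values must not all be identical")
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical class labels: 0 = healthy donor, 1 = CRC donor.
disease_state = [0.0, 0.0, 1.0, 1.0]
# Hypothetical predictions from a model fitted to all samples.
fitted = [0.05, -0.05, 0.95, 1.05]
# Hypothetical predictions for samples held out during cross-validation.
cross_validated = [0.2, -0.1, 0.8, 1.1]

r2y = fraction_explained(disease_state, fitted)
q2y = fraction_explained(disease_state, cross_validated)
```

Because held-out samples are predicted less accurately than fitted ones, Q²Y is normally lower than R²Y, as in the values reported for this dataset.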
Compared to healthy controls, stool metabolome analysis revealed 11 amino acids that showed a 41-80% increase in stool samples of individuals with CRC (Table 4). Reasons that could account for this CRC-associated increase in amino acid concentrations include, but are not limited to, differences in protein consumption patterns, inflammation-induced reduction in nutrient absorption, and increased autophagy associated with tumor cells resulting in accumulation of free amino acid pools [41]. Microbial degradation of dietary proteins in the distal colon is a putrefactive process that results in the production of toxic amines, and may account for the increased free amino acids we observed in CRC stool samples. An increased concentration of all amino acids except glutamine was previously reported in stomach and colon tumor tissues compared to healthy tissue [42]. The authors hypothesized that tumor cells may exhibit increased glutaminase activity resulting in glutamine conversion to glutamate. Consistent with these findings, we also saw a large increase, approximately 76%, in glutamate without a corresponding increase in glutamine in stool samples from colon cancer patients. Another recent study using NMR to identify and detect metabolites from stool water extracts from healthy and CRC samples showed that the CRC samples had approximately 1.5-fold higher levels of cysteine, proline, and leucine [43]. The increased concentrations of proline, serine, and threonine that were observed in CRC samples could also result from degradation of intestinal mucins, which are primarily comprised of glycoproteins rich in these amino acids [44]. This is consistent with the enrichment of Akkermansia muciniphila, a mucin-degrading bacterium, observed in CRC stool samples, although we saw no strong correlations between the relative proportion of this bacterium and specific amino acid concentrations.
There were higher levels of glycerol as well as several unsaturated fatty acids detected in the stool samples of healthy individuals. Human cancer cells have a known transport system for the uptake of glycerol, suggesting stool glycerol may be lower in CRC because it is being taken up by the tumor cells. Alternatively, bacterial lipases present in healthy individuals may facilitate the metabolism of dietary and endogenously produced triacylglycerols, resulting in the final degradation products of glycerol and free fatty acids. In addition to glycerol, fatty acids most closely matching metabolomic signatures for linoleic acid and stereoisomers of oleic acid were also higher in controls (Table 4). Finally, ursodeoxycholic acid (UDCA), a secondary bile acid produced by intestinal bacteria, was approximately 63% higher in healthy individuals compared to CRC. While several bile acids such as lithocholic acid and deoxycholic acid have been associated with tumorigenesis, UDCA has shown chemopreventive effects in preclinical and animal models of CRC [45].
Correlation analysis of the microbiome and metabolome data revealed strong associations between some members of the stool microbiota and candidate metabolites. Bacteroides finegoldii, two Dialister spp., and P. ruminis were strongly correlated, and Bacteroides intestinalis and Ruminococcus obeum were moderately correlated, with increased stool free fatty acids and glycerol (Figure 5). These same bacteria were inversely associated with a cholesterol derivative and one or more of the amino acids that were over-represented in stool samples from CRC patients. The two Ruminococcus spp. also showed a strong positive correlation with the presence of UDCA, in concurrence with previous reports that Ruminococcus species exhibit 7α- and 7β-hydroxysteroid dehydrogenase activities to produce this metabolite [46]. Two of the bacterial genera over-represented in CRC, Phascolarctobacterium and Acidaminobacter, showed a strong positive association with the amino acids phenylalanine and glutamate, and were moderately correlated with increased serine and threonine (Figure 5). Glutamate can be utilized by these bacteria as a substrate, but their association with serine and threonine could also be indicative of involvement in mucin degradation or putrefactive processes in the colon and warrants further study.
Extensive attempts to characterize CRC microbiota have led to new hypotheses as to how the gut microbiota influences CRC development. One hypothesis suggests that there are “driver bacteria” with pro-carcinogenic features that contribute to tumor development and “passenger bacteria” that may outcompete drivers to flourish in the tumor environment as the cancer progresses [47]. Available metabolites, both those produced by bacteria and those that they utilize as substrates, will largely drive these host-microbiome interactions. Integrating metabolome and microbiome datasets is a novel approach to functionally characterizing the microbiota in terms of their metabolic activity relative to cancer, and it will greatly assist our understanding of this complex host-microbe interaction.

Supporting Information
Table S1 Comparison of observed and estimated OTU richness and diversity and evenness indices between microbial communities from stool of CRC patients and healthy adults.