Taxonomic differences of gut microbiomes drive cellulolytic enzymatic potential within hind-gut fermenting mammals

Host diet influences the diversity and metabolic activities of the gut microbiome. Previous studies have shown that the gut microbiome provides a wide array of enzymes that enable processing of diverse dietary components. Because the primary diet of the porcupine, Erethizon dorsatum, is lignified plant material, we reasoned that the porcupine microbiome would be replete with enzymes required to degrade lignocellulose. Here, we report on the bacterial composition in the porcupine microbiome using 16S rRNA sequencing and bioinformatics analysis. We extended this analysis to the microbiomes of 20 additional mammals located in Shubenacadie Wildlife Park (Nova Scotia, Canada), enabling the comparison of bacterial diversity amongst three mammalian taxonomic orders (Rodentia, Carnivora, and Artiodactyla). 16S rRNA sequencing was validated using metagenomic shotgun sequencing on selected herbivores (porcupine, beaver) and carnivores (coyote, Arctic wolf). In the microbiome, functionality is more conserved than bacterial composition, thus we mined microbiome data sets to identify conserved microbial functions across species in each order. We measured the relative gene abundances for cellobiose phosphorylase, endoglucanase, and beta-glucosidase to evaluate the cellulose-degrading potential of select mammals. The porcupine and beaver had higher proportions of genes encoding cellulose-degrading enzymes than the Artic wolf and coyote. These findings provide further evidence that gut microbiome diversity and metabolic capacity are influenced by host diet.


Introduction
The microbiome supports animal digestion by detoxifying and breaking down indigestible compounds [1]. There is accumulating evidence that microbial enzymes responsible for these processes are highly conserved across species, and strongly influence overall composition of the microbiome [2]. Intensive study of the human gut microbiome provides a framework for investigating the phylogenetic diversity of wildlife gut microbiomes [2][3][4][5][6][7][8][9][10][11][12][13][14]. Similar  has previously been observed in human microbiomes [15], the gut microbial diversity of beavers [10], ruminant mammals [14], and giant pandas [16], is strongly influenced by diet. This provides ample rationale for exploring the gut microbiome as a genetic repository of novel enzymes involved in processing diverse dietary components. Animal gut microbiomes provide a potentially rich source of novel microbial genes that can be leveraged for bioengineering applications, including breakdown of complex plant-derived materials [17]. Lignocellulose, a component of the plant cell wall, is an attractive low-cost substrate for biofuel production via microbial fermentation [10]. Before lignocellulose can be fermented, it must be broken down into component sugars by cellulases, hemicellulases, and various debranching enzymes [18]. The porcupine, Erethizon dorsatum, is an herbivore that feeds on lignified plants, coniferous (preferred) and deciduous cambium (inner bark), and flowers [19]. As a hindgut fermenter, the porcupine has an enlarged cecum where microbial digestion is largely confined [20]. Because the primary diet of porcupines is lignified plant material, we hypothesized that the porcupine microbiome would be replete with enzymes required to degrade lignocellulose.
The primary objective of this study, completed by undergraduate students of the Dalhousie iGEM team, was to characterize the gut microbiome of the porcupine along with 20 other species (within the Rodentia, Carnivora, and Artiodactyla orders) from the Shubenacadie Wildlife Park in Nova Scotia, Canada. All species were profiled using 16S rRNA gene sequencing. Mammals in the same order had more similar microbiome profiles compared to those with differing taxonomic classification. Metagenomic shotgun sequencing was conducted on select representative herbivores (porcupine, Castor canadensis (beaver)) and carnivores (Canis latrans (coyote), Canis lupus arctos (Arctic wolf)). In these four selected mammals, dramatic bacterial taxonomic differences were detected using both 16S and metagenomic sequencing methods. Importantly, the genes encoding cellulolytic enzymes were significantly enriched in porcupine and beaver microbiomes. Taken together, these findings confirm that the microbiomes of hind-gut fermenters are a rich source of cellulose-degrading enzymes.

Data availability
Quality controlled and processed metagenomic and 16S rDNA sequence data are available through the Sequence Read Archive (NCBI) at accession number SRP115632 and SRP115643 respectively.

Animal containment and diet
All 21 animals were sampled from Shubenacadie Wildlife Park. Animals lived in outdoor pens modeled after their natural habitats. For instance, roaming animals, such as deer and elk, were held in large pastures with unlimited access to grass. Animals were co-housed regardless of whether they were born in captivity or rescued from the wild. All animals received diets that met their species-specific requirements ( Table 1).

DNA extraction and sequencing
Fecal samples from 21 different mammals at Shubenacadie Wildlife Park were collected by park staff and stored at -80˚C [21]. DNA was extracted using a Mobio PowerFecal1 extraction kit (QIAGEN, Germany) according to the manufacturer's protocol. DNA purity was assessed using a Nanodrop spectrophotometer (NanoDrop Technologies Inc., USA). DNA samples were submitted for 16S rRNA gene sequencing at the Integrated Microbiome Resource (IMR) at Dalhousie University. 16S rRNA sequencing primers (Table 2), provided by the IMR, were modified from the primers used by Comeau et. al. to improve compatibility with the Illumina platform [22]. Variable regions V6-V8 of the bacterial 16S rRNA gene were amplified from all purified DNA samples and sequenced on an Illumina MiSeq using paired-end 300 bp sequencing [23]. For metagenomic shotgun sequencing, extracted DNA from coyote, porcupine, Arctic wolf and beaver samples was PCR-amplified using primers from Nextera following the Nextera XT (Illumina) protocol. Then, sequencing libraries were prepared from purified PCR products (1 ng) using the Nextera XT Library Preparation Kit (Illumina). Libraries were cleaned-up and normalized using the Just-a-Plate 96 PCR Purification and Normalization Kit (Charm Biotech). Complete libraries were then pooled and sequenced in a portion of a 300+300 bp Pair-End MiSeq run (Illumina 600-cycle v3 kit).

16S sequencing data analysis
Analysis of 16S rRNA sequence data was performed as previously described using Microbiome Helper [23]. Samples from each of the 21 mammals were sequenced only once, with exceptions in the beaver, porcupine, Arctic wolf and coyote which were sequenced in biological triplicate. Raw data in FASTQ format from Illumina sequencing was analyzed for quality via FASTQC. Paired-end reads were stitched together using PEAR [24]. Stitched reads were filtered for quality (q-score >30 over at >90% of the read) and to ensure forward and reverse primers matched using FASTX-toolkit and BBMAP. FASTQ files were converted to FASTA format and all sequences containing a variable nucleotide 'N' were removed. Chimeric reads were removed using VSEARCH [25]. Taxonomic assignments were made by matching sequences into Operational Taxonomic Units (OTUs) with 97% sequence identity and comparing to the Green-Genes 16S rRNA database using QIIME [26][27]. Samples were normalized for differences in sequencing depth using DeSeq2's negative binomial distribution [28] and principle coordinate analysis was performed using UniFrac beta-diversity through QIIME [29]. KEGG Orthologs (KO) [30][31][32] were inferred using PICRUSt to estimate functional-gene profiles from the triplicate porcupine, beaver, coyote, and Arctic wolf triplicate 16S sequencing data [33]. Differences in KO relative abundances across species were analyzed using STAMP, and subjected to an ANOVA analysis with Benjamini-Hochberg FDR correction to generate the predicted mean proportion of sequences and the box-plots [34].

Metagenomic sequencing data analysis
Raw sequences were inspected for overall quality using FASTQC and then stitched using the program PEAR [24], followed by screening for potential contaminant sequences from human and spiked-in PhiX using BowTie2 [35]. We opted to only use stitched reads to avoid artificial annotation due to short read fragments, and to ensure higher quality annotation. After removal of contaminant data, the software program Trimmomatic was used with default quality cut offs based on FASTQC inspection [36]. MetaPhlAn2 was used to screen the data for taxonomy, and assign reads to bacterial species [37]. Following taxonomic assignment, DIAMOND and HUMAnN1 were used to assign functionality and metabolic profiles to the sequenced metagenomes. DIAMOND was used to perform the initial Pre-HUMAnN1 search against the KEGG database with a default e-value cutoff of 0.001 (stringent selection for protein matches) [38]. The HUMAnN1 software tool was used to get a functional profile of the metagenomic sequences which was then converted to STAMP format [39]. STAMP was used to perform an ANOVA analysis with Benjamini-Hochberg FDR correction and to generate box plots for comparison to the PICRUSt analysis from the 16S rRNA sequencing above. The data output from MetaPhlAn2 was visualized using GraPhlAn, after conversion of the MetaPhlAn2 output data with the Export2GraPhlAn scripts. Summary heat maps of the HUMAnN1 output were generated using Morpheus hierarchal clustering with a one minus Pearson correlate matrix and an average linkage method [40]. All four animals were sequenced in triplicate.
HUMAnN2 was run on the metagenomic dataset to confirm the HUMAnN1 results and to identify taxa likely contributing to the gene families of interest in each sample [38]. HUMAnN2 was run in uniref-50 mode using the same KEGG database as used for HUMAnN1 analysis. HUMAnN2 gene family abundance data was analyzed using a two-way ANOVA with the Holm-Sidak multiple test correction.

Captive mammals display overlapping microbiome composition
Illumina sequencing of the 16S rRNA gene (V6-8 region) for the 21 Shubenacadie Wildlife Park mammals produced a total of 1,002,901 paired-end overlapping reads (Table 3). After merging paired-end reads and filtering for sequences longer than 400 base pairs (90% readquality of >30), samples were left with at least 84% of their initial number of sequences. The number of OTUs identified per sample at 97% sequence identity when clustering ranged from 786 (Skunk) to 5,310 (Red Deer). Alpha rarefaction curves confirmed OTU saturation for most of the sequences, although the porcupine, moose, and red deer samples appeared to have less saturated sequencing than the others (S1 Fig).
The 21 mammalian microbiomes were then compared using weighted unifrac beta-diversity analysis of the 16S rRNA sequencing data (Fig 1). The principal component analysis showed that mammals in the same order (Carnivora, Rodentia, Artiodactyla, etc.) had more similar microbiomes compared to those from different orders. The order Carnivora was observed twice in the beta-diversity plot representing two different clusters of Carnivora: the Table 3. Summary of profiled species and 16S rRNA sequencing details. Presented is the common animal name, the scientific animal name, the total number of sequences, the number of OTUs per sample, number of unique OTU's per sample, and the number of remaining sequences after stitching and quality filtering. Canidae family (including foxes, wolves, and coyotes), and the Musteloidae superfamily (including the fisher, skunk, otter, mink, marten, and bear) [41][42]. The two distinct clusters indicated that the microbiomes of the Carnivora (super)families were divergent, and warranted separate locations on the beta-diversity plot. Next, the specific composition of the microbiome was analyzed through the 16S data of the sampled animals.
Further taxonomic characterization of the porcupine, beaver, coyote, and Arctic wolf microbiomes was conducted by analyzing the metagenomic shotgun sequencing data. The microbial taxonomy of the beaver and porcupine microbiomes were similar to each other but diverged from microbiome taxonomies of the Arctic wolf and coyote (Fig 3). Perhaps not surprisingly, the most prevalent bacteria in both groups were Clostridiales (49.46% and 38.78% in the cellulose-consuming and carnivorous groups, respectively) and Bacteroidales (22.77% and 21.08% respectively). In the cellulose-consumer group, the next most prevalent bacterial orders were: Pseudomonales (13.22%), Burkholderiales (3.99%), Methanobacteriales (1.49%), Selenomonadales (1.42%), and Sphingobacteriales (1.26%). In the carnivorous group, the next most prevalent bacterial orders were: Coriobacteriales (14.49%), Erysipelotrichales (7.82%), Burkholderiales (4.94%), Fusobacteriales (4.99%), and Flavobacteriales (3.72%). Apart from the most prominent bacteria (Clostridiales and Bacteroidales) and the least prominent (Burkholderiales)  that appear in both the carnivore and herbivore groups, bacterial composition is distinct. The complete taxonomic assignment for the metagenomic shotgun sequencing can be found in S3 Table. Cellulose metabolism genes are more abundant in hind-gut fermenters than carnivores Next, the gene family profile was annotated using HUMAnN1 to determine the abundances of KEGG pathways in the four select mammalian samples.  (Table 6). These pathways were expected to be among the highest as they are core metabolic pathways [43]. Complete functional assignments for metagenomic shotgun sequencing can be found in S4 Table. A heat-map was then generated to visualize the starch and sucrose pathway families identified from the KEGG database. Here cellulolytic and sugar transport genes were displayed from the metagenomic data from the four animals (Fig 4). The map demonstrated that the porcupine and the beaver had high abundances of cellulose-metabolizing genes and low abundances of transport genes. The opposite result was true for the coyote and Arctic wolf.

Fig 3. Metagenomic taxonomic summary of select Rodentia (porcupine and beaver) and Carnivora mammals (coyote and Arctic wolf).
Each dot represents a node in the phylogenetic tree, whose size indicates the prevalence of representatives of that node in the sample. Colours are used to group bacteria into orders, indicated in each panel. Species identified with high abundance are labelled (A through W, and A through K in Panels A and B, respectively), while some with low abundance are not. Both the Rodentia mammals (A) and the Carnivora mammals (B) are represented.
https://doi.org/10.1371/journal.pone.0189404.g003 Table 6. The 13 most abundant gene pathways identified using HUMAnN1 (metagenomic sequencing) in the Arctic wolf, coyote, porcupine, and beaver. Units are relative abundance of gene pathway sequences expressed as a percentage average. Standard deviation (StDev) is shown in brackets.

Discussion
Here we report high abundance of cellulolytic enzymes in the porcupine gut microbiome. We initially surveyed 21 mammalian microbiomes using 16S rRNA sequencing, and clustered them into taxonomic groups (Fig 1). From this, we identified differences in bacterial composition amongst mammalian orders. We focused our attention on four mammals for in-depth 16S analysis in triplicate: the porcupine and beaver (Rodentia), and the coyote and Arctic wolf (Canidae). Clostridiales and Bacteroidales were abundant in both the Rodentia and Canidae groups. In addition, the Arctic wolf and coyote samples contained Fusobacterium and Burkholderiales, and the beaver and porcupine contained RF39 and Verrucomicrobiales. We observed that not all species grouped within their particular order (ex. the groundhog was Functional assignment of the metagenomic sequencing was done using HUMAnN and the KEGG database. Generation of the above heatmap was done using Morpheus (see methods) using only genes in the starch and sucrose metabolism pathway, which has been simplified for clarity. Transport and degradation genes in the starch and sucrose metabolism pathway are labelled with black lines. Rows and columns were clustered using the hierarchal clustering tool in Morpheus, using the one minus Pearson correlation matrix and the average linking method. AR, Arctic wolf replicate; CR, coyote replicate; BR, beaver replicate; PR, porcupine replicate. https://doi.org/10.1371/journal.pone.0189404.g004 Cellulolytic potential of hind-gut fermenter microbiomes separate from the rest of the Rodentia order), which may be in part due to the constraints of 16S rRNA sequencing. If we had selected a different 16S rRNA region, or had a larger database of known microorganisms, we may have seen an increased congruence of microbiomes amongst these animals [44]. However, without further sequencing of these microbiomes, we are unable to make any formal conclusions. We re-sequenced the Arctic wolf, coyote, porcupine, and beaver in biological triplicate via metagenomic shotgun sequencing. Within the Rodentia mammals, Clostridiales and Bacteroidales still had the highest abundance, but Burkholderiales and Pseudomonadales were also prevalent. The Pseudomonadales are known to denitrify organic matter, and are part of the carbohydrate-metabolizing Gammaproteobacteria phylum [45]. A few examples of cellulosemetabolizing Gammaproteobacteria are Acinetobacter, Pseudomonas, and Staphylococcus, which have been isolated from termite gut microbiomes and have been shown to grow when cellulose is the sole carbon source [46]. Our metagenomic sequencing data did not precisely match our 16S rRNA sequencing data; we reasoned that our metagenomic sequencing data was likely more accurate due to previously reported biases in 16S primer sets [47]. In the Canidae mammals, the most abundant bacterial orders were Clostridiales and Bacteroidales, followed by Coriobacteriales, Erysipelotrichales and Burkholderiales. The roles of specific bacterial families in the Canidae gut microbiome may be inferred from human microbiome studies. For example, the Coriobacteraceae family (of the Coriobacteriales order) consists of bacteria that degrade bile salts and steroids [48] and the Erysipelotrichaceae family (of the Erysipelotichales order) have been linked to lipidemic profiles of human hosts [49]. Thus, the high prevalence of Erysipelotrichaceae in Canidae may be attributed to high fat diets. sequence proportion of genes encoding the indicated enzymes are shown for feces of four indicated mammals. Predicted data were obtained via PICRUSt analysis from 16S rRNA sequencing, in triplicate. Mean sequence proportions were obtained via HUMAnN1 analysis from shotgun metagenomic sequencing, in triplicate. Statistical significance was determined using ANOVA with Benjamini-Hochberg FDR correction for multiple tests.
https://doi.org/10.1371/journal.pone.0189404.g005 Fig 6. HUMAnN2 analysis gene family abundance and these gene families contributing taxa. Whisker plots for the relative abundance (measured in reads mapped per kilobase) were generated from HUMAnN2 functional analysis for endoglucanase (K01179) and beta-glucosidase (K05349). A two-way ANOVA using Holm-Sidak multiple test correction we used to test significance (A, p = 0.0137; B, not significant). The taxonomic breakdown for each gene family was also generated using HUMAnN2 output and presented at the genus level (C).
Beavers and porcupines are monogastric hindgut fermenters that rely on symbiotic gut bacteria to break down plant material. Fermentation is restricted to the cecum in the porcupine, whereas the beaver uses both its cecum and the proximal colon [20]. Both animals feed on tree Table 7. Average percent abundances of bacterial genus' contributing beta-glucosidase (K05349) in the Arctic wolf, coyote, beaver and porcupine microbiomes as analyzed by HUMAnN2. Average percent abundances were obtained by calculating the percentage of each organism in each run (relative abundance divided by total abundance) and then taking the average of these means. Each sample was sequenced in biological triplicate times, except the porcupine which was sequenced in biological quadruplicate. bark and similar plant material which likely explains certain similarities between their microbiomes, notably including high levels of known cellulose-degrading bacteria. Gruninger, et. al. recently sequenced the beaver microbiome from cecum and fecal samples, examining the microbial differences at the phylum level [10]. They found that the cecum and fecal microbiomes were both dominated by Bacteroidetes and Firmicutes. Our study did not sample from the beaver cecum, but we also found abundant Bacteroidales (class of Bacteroidetes) and Clostridiales (class of Firmicutes) bacteria in the fecal microbiome. Because porcupines and beavers have enlarged cecum's housing complex microbial communities, we speculate that the beaver cecum microbiome would also be dominated by Bacteroidetes and Firmicutes. Unfortunately, V1-V3 primers were used to sequence the 16S rRNA in the Gruninger et. al. study, preventing a direct comparison between our datasets. The functional diversity of the Canidae and Rodentia microbiomes can be attributed to distinct diets of carnivores and herbivores. Diet, in association with phylogeny, drives the evolution of mammalian gut microbiota [50] to the point that species-specific microbiomes share metabolic pathways with other animals that eat similar diets [51]. In this study, analysis of mammalian microbiomes might be considered biased as we are not sampling wild mammals, but instead park mammals that are fed a consistent diet based on species-specific nutritional requirements. The influence of captivity on microbial diversity remains poorly understood [11,50,52]. In studies in which captivity appeared to affect diversity, captive animals were fed a diet that differed from what they would experience in the wild [50,53]. In our study, we observed low deviation between sample replicates (Fig 5) indicating that the microbiomes of both wild and captive animals are similar and stable (an observation similar to that made by Song et al. in cohabiting human microbiomes [54][55]).

K05349: Beta-glucosidase
Beyond taxonomic analysis, we compared the cellulose-degradation potential of the Arctic wolf, coyote, beaver, and porcupine microbiomes. To do so we measured starch and sucrose metabolism pathway abundance (which contains the cellulose metabolism pathway) [56]. We found that the porcupine and beaver microbiomes contained large abundances of cellulose metabolism genes (as expected) [10,14,57,58], but low abundances of sugar transporter genes. The opposite result was observed in the Arctic wolf and coyote microbiomes. This result may be due to the physiological differences between the digestive tracts of herbivores, especially hind-gut fermenters, and carnivores [59]. The major feature differentiating these two groups is the cecum, which is located at the start of the large intestine [59]. In comparison to carnivores, hind-gut fermenters have a working, enlarged cecum which acts as the site of microbial digestion and fermentation [59]. This means that in hind-gut fermenters, the majority of food passes through the small intestine undigested. Hind-gut fermentation enables the breakdown of cellulose-rich plant material, but does not support uptake of nutrients, as reflected in our results (high abundances metabolizing genes, but low abundances of transporters) [59]. For this reason, hind-gut fermenters have evolved a number of behavioral and physical adaptations, such as eating shed feces, to maximize nutrient absorption [59]. Alternatively, carnivores rely on the small intestine, which is replete with carbohydrate transporters, for digestion and absorption of nutrients (as reflected in our results) [59].
Next, we selected three genes involved in cellulose crystal metabolism, di-and tri-saccharide subunit metabolism, and sugar utilization (endoglucanase, beta-glucosidase and cellobiose phosphorylase, respectively). We identified that the porcupine and beaver had significantly increased abundances of endoglucanase, and cellobiose phosphorylase (Fig 5). The abundance of endoglucanase in the porcupine and beaver were corroborated by PICRUSt, HUMAnN1, and HUMAnN2 analysis (Figs 5 and 6). HUMAnN2 analysis also provided a taxonomic breakdown of organisms, such as Rhodospirillum and Cellulophaga, likely contributing beta-glucosidase and endoglucanase in the porcupine, beaver, Arctic wolf, and coyote microbiomes ( Fig  6C). Although Bacteroides are abundant in the beta-glucosidase functional assignment for each animal, Clostridium are responsible for roughly a two-fold increase of the enzyme in the beaver and porcupine in comparison to the coyote and Arctic wolf. This is to be expected, as Clostridium are often noted as important cellulose degraders [57]. For endoglucanase, Eubacterium and Clostridium play a nearly two-fold greater role in cellulosic breakdown in the herbivores, than in the Arctic wolf and coyote. This is perhaps not surprising, as the organisms that metabolize cellulose (like Clostridium spp.) are more abundant, thus there is a larger abundance of reads matching cellulose degradation.
Most of the enzymes needed to degrade complex plant polysaccharides are not present in mammalian genomes, resulting in a mutual dependence between the mammalian host and gut microbiota [58]. Zhu et al. investigated the bacterial diversity of the giant panda (Ailuropoda melanoleuca) which possesses a typical carnivore gastrointestinal tract, but which also consumes roughly 12.5 kg of bamboo each day [60]. Using 16S rRNA sequencing and gene function classification they identified cellulose-metabolizing bacteria in the panda microbiome. Separate groups have identified the cellulolytic potential of other microbiomes such as beavers [10], ruminants [14], and termites [61]. In our datasets, the porcupine and beaver had statistically higher abundances of endoglucanases and cellobiose phosphorylases than carnivores. Thus, mining animal gut microbiomes could provide a potentially rich source of novel microbial genes that could be leveraged for bioengineering applications.
Metagenomic libraries derived from hind-gut fermenter fecal samples could accelerate the discovery of novel cellulolytic enzymes using an approach known as synthetic metagenomics [62][63]. This technique chemically synthesizes selected genes of interest identified from functional metagenomic screens [63]. Synthetic metagenomics enables the identification and synthesis of thousands of novel enzyme sequences from complex environmental isolates. Combining traditional metagenomic library analysis with new synthetic metagenomics approaches will undoubtedly accelerate efforts to "mine the microbiome" of mammals of interest.