Deep Illumina-Based Shotgun Sequencing Reveals Dietary Effects on the Structure and Function of the Fecal Microbiome of Growing Kittens

Background Previously, we demonstrated that dietary protein:carbohydrate ratio dramatically affects the fecal microbial taxonomic structure of kittens using targeted 16S gene sequencing. The present study, using the same fecal samples, applied deep Illumina shotgun sequencing to identify the diet-associated functional potential and analyze taxonomic changes of the feline fecal microbiome. Methodology & Principal Findings Fecal samples from kittens fed one of two diets differing in protein and carbohydrate content (high–protein, low–carbohydrate, HPLC; and moderate-protein, moderate-carbohydrate, MPMC) were collected at 8, 12 and 16 weeks of age (n = 6 per group). A total of 345.3 gigabases of sequence were generated from 36 samples, with 99.75% of annotated sequences identified as bacterial. At the genus level, 26% and 39% of reads were annotated for HPLC- and MPMC-fed kittens, with HPLC-fed cats showing greater species richness and microbial diversity. Two phyla, ten families and fifteen genera were responsible for more than 80% of the sequences at each taxonomic level for both diet groups, consistent with the previous taxonomic study. Significantly different abundances between diet groups were observed for 324 genera (56% of all genera identified) demonstrating widespread diet-induced changes in microbial taxonomic structure. Diversity was not affected over time. Functional analysis identified 2,013 putative enzyme function groups were different (p<0.000007) between the two dietary groups and were associated to 194 pathways, which formed five discrete clusters based on average relative abundance. Of those, ten contained more (p<0.022) enzyme functions with significant diet effects than expected by chance. Six pathways were related to amino acid biosynthesis and metabolism linking changes in dietary protein with functional differences of the gut microbiome. Conclusions These data indicate that feline feces-derived microbiomes have large structural and functional differences relating to the dietary protein:carbohydrate ratio and highlight the impact of diet early in life.


Introduction
Animals co-evolved with the microbiota that inhabit their guts, with direct host-microbiota interactions and microbe-microbe interactions that may impact the host indirectly [1]. The subsequent relationship between the host and the microbiota is complex and has an impact across a broad range of physiology that may impact health maintenance and disease progression [2,3]. The importance of the microbiota in health and disease is well established in humans, with studies highlighting a wide range of associations within the immune system (such as autoimmunity and allergy), metabolism (metabolic syndrome [4], obesity [5] and diabetes [6]), disease (periodontitis [7]) as well as cognition (depression [8], anxiety [9] and autism [10]).
Characterization of the gut microbiota and its surrogate, the fecal microbiota, has identified different ecotypes and taxonomic structures associated with ethnicity, diet and genetics as well as health status in humans [11][12][13]. Furthermore, dietary interventions such as probiotics and prebiotics have been suggested as mechanisms to manipulate microbiota structure and microbiome function [14,15].
Characterization of the gut microbiota has undergone many developments over recent decades. Much early work was focused on the enumeration and isolation and identification of bacteria to characterize their physiology in vitro. The development of DNA-based techniques allowed microbial profiling by DGGE and hypothesis-driven studies quantifying bacteria of interest using 16S rRNA primers. With the advent of next generation sequencing (NGS) and the greater availability of bioinformatics tools, detailed 16S rRNA classification of the gut microbiota could be generated. To transition from observation to interpretation of these more complex datasets, there is an increased requirement to use bioinformatics approaches to analyze data and to translate it into forms suitable for biological interpretation. The functional potential of a microbiome may be hypothesized from sequenced bacterial genomes. However, the development of NGS tools and bioinformatics databases to enable non-targeted sequencing of the genome has provided an opportunity to undertake global analysis of the functional potential of the whole microbiome and to develop a better understanding of the gut microbial ecosystem. Through a better understanding of the functional potential of different microbial structures, it may be possible to gain insights into how they establish, how host health-associated populations can be sustained and how best to reduce the load of potentially pathogenic bacteria through various life stages.
Careful study designs enable the transition from characterization and observation to hypothesis generation. One approach to support better interpretation of large microbiome datasets is to use genetically-related individuals, housed together in a similar environment from an early age that consume nutritionally complete diets over a period of time. Such conditions should support the investigation of nutritional interventions on the gut microbiota.
One group of individuals that provide a unique opportunity to investigate the effect of diet on the microbiome is companion animals because they are fed specific diets. Responsible pet food companies provide nutritionally complete food available from weaning and such diets may be fed for the entire lifetime of these individuals. Furthermore, the nutritional composition may influence the development and stability of the gut microbiome. One companion animal species of interest is the cat, which despite adaptations to being an obligate carnivore, can be fed a wide range of macronutrients (i.e. protein, fat and carbohydrate) within existing commercial cat diets. There is little known about the feline gut microbiome and how it is impacted by diet. Previously, a 16S rRNA sequencing project in cats identified that diet (protein: carbohydrate ratio) can influence the taxonomic distribution of the fecal microbiota [16]. Kittens were fed a high-protein, lowcarbohydrate (HPLC) or a moderate-protein, moderate-carbohy-drate (MPMC) diet from weaning. Fecal samples collected at 8, 12 and 16 weeks of age identified bacterial species with statistically significant differences in their abundances between the two diets, especially within the phyla Firmicutes, Actinobacteria and Fusobacteria. However, in that study it was not possible to explore potential functional aspects of these taxonomic differences.
To progress from taxonomic characterization to functional interpretation we have used the same fecal samples from these kittens to analyze them using Illumina shotgun sequencing. Homology searches of sequenced short random DNA fragments against a database of known genes enables insights into microbial taxonomy and gene functions encoded in the metagenome of a microbiome. Protein coding genes can subsequently be assigned to ortholog groups (set of genes with the same putative enzyme function in different genomes inferred based on amino acid sequence similarity) and mapped to reference pathways such as KEGG [17] or SEED [18] to classify enzyme function (e.g. noted as KEGG level four) by biochemical pathways (noted as KEGG level three).
The purpose of this current study was to interpret shotgun data derived from deep Illumina shotgun sequencing of cat fecal samples and to determine whether functional attributes that were found to be most significant could be interpreted within the context of a diet intervention study.

Results
Annotating 345 Gb of feline gut microbiome data DNA originally isolated from 36 kitten fecal samples for 16S amplicon analysis by Hooda et. al [16] was shotgun sequenced on the Hi-Seq Illumina platform for metagenomics analysis. A total of 345.3 gigabases (Gb) of sequence was generated, resulting in an average of 95.9 million reads (range = 63.6 to 136.2 million reads) per sample. Less than one percent of sequence reads mapped to the cat genome (data not shown) thus no filtering for host sequences was performed. Short reads were trimmed and compared against a reference database of 5,488,681 genes from 1,691 fully-sequenced genomes (1,685 bacteria, 4 archaea and 2 viruses) using the composition vector approach implemented in MetaCV [19]. 32% and 19% of reads were annotated to the genus taxonomic level and to KEGG ortholog groups for functional inference, respectively (Table S1).
Overall, 99.75% of all annotated reads were identified as coming from bacteria, with approximately 0.25% and less than   Table S1). Although the percentage of annotated reads was different between diet groups, this difference was consistent over time (e.g., week 8,12,16).

Characterization of taxonomic structure
Overall, 32 phyla, 230 families, 578 genera and 1,114 species of prokaryotes were identified in kitten feces. All taxonomic annotations for superkingdom, phylum, class, order, family, genus and species are provided in Table S2. Although many taxa were identified at each level of the NCBI taxonomy hierarchy, only a few taxa dominated. For example, 2 phyla, 10 families, 15 genera and 30 species contributed to more than 80 percent of the sequences at the respective taxonomic levels. Conversely, a lot of rare taxa contributed only few sequences each. In total, 23, 116, 241 and 525 taxa contributed less than one percent of the sequences at these same taxonomic levels.
Species diversity estimates indicated a significantly increased (with p,0.001) fecal microbial diversity in kittens fed HPLC compared to MPMC, with average Shannon indices of 4.1 and 3.1, respectively ( Figure 1a, Table S3). The residual variability was significantly smaller on the HPLC diet, after taking into account the between kitten variability, by likelihood ratio tests, p, 0.001. No statistically significant differences in Shannon diversity were identified over time for kittens fed either diet (with p,0.05, Table S3). A rarefaction analysis also indicated higher species richness and diversity in HPLC compared to MPMC, as the respective curve reaches an earlier and lower plateau ( Figure 1b). The curves were modeled and a subsequent statistical analysis identified significant differences between diet groups (with p, 0.001, Table S3).

Diet significantly affected microbial abundances
The aim of this study was to identify significant differences in taxa and the functional potential of the microbiome between diet groups and/or over time. Statistical analysis of all 578 genera and 230 families present in feline feces identified 324 genera and 138 families that were significantly different between diets, as assessed by generalized estimating equations (with p,0.00009 and 0.00022 respectively using a Sidak adjustment level to maintain an overall error rate of 5% per data set, see methods for details). Another five families and six genera had significant diet by time interactions. All genera and families with significant differences, p-values and 95% confidence intervals are listed in Table S4, with predominant bacterial genera presented in Table 1. Figure 2 illustrates taxonomic profiles of all 36 microbiome samples based on 20 bins; the 19 most abundant genera accounting for a mean of 83.4% of total reads, plus one bin summarizing the remaining 16.6% of reads for all other genera ( A higher number of taxa were more abundant in HPLC but those more abundant in MPMC had stronger ORs Diet not only significantly affected the 19 most prevalent taxa shown in Figure 2, but also less prevalent taxa (Table S4). In total, of the 324 bacterial genera that were significantly different by diet, 30 were more common in MPMC (average odds ratio (OR) of 8.61) and 294 were more common in HPLC (average OR of 0.62). Twofold or higher differences were observed for 7 and 48 genera, respectively (OR. Differences at the family level of bacterial taxonomy were very similar to those at the genus level (Table S4). Of the 138 families that were significantly different by diet, 12 were more common in MPMC (average OR of 10.06) and 126 were more common in HPLC (average OR of 0.65). Twofold or stronger differences were observed for 4 and 20 genera, respectively. Fourfold or bigger ORs were detected for Veillonellaceae (62.86), Bifidobacteriaceae (31.94), Acidaminococcaceae (12.81), Eubacteriaceae (0.24), Desulfovibrioceae (0. 19) and Fusobacteriaceae (0.06).
Over two thousand ortholog groups were significantly different by diet In total, 2,182,187 unique protein-coding genes were identified in the microbiomes of feline feces (1,702,265 in HPLC and 1,440,712 in MPMC). Of those 1,217,213 (56%) had a KEGG annotation that was used to relate them to 6,660 KEGG ortholog groups (KEGG level 4; Table S5). The identified KEGG ortholog groups mapped to 280 biochemical pathways (KEGG level three groups). Similar to the taxonomy section, the distribution of individual ortholog groups was skewed. Only 10.1% of the ortholog groups contributed to more than 80% of the sequences, while 70.1% of ortholog groups contributed to less than 1%.
Statistical analyses using generalized estimating equations were carried out analogously to the taxonomy section with adjustments for multiplicity to identify ortholog groups with significant diet and time effects (with p,0.000007). Significant differences between kittens fed MPMC and those fed HPLC were observed for 2,013 unique groups contributing to 194 pathways (Table S4). Significant diet by time interactions were observed for 129 unique ortholog groups contributing to 66 KEGG pathways.
Of the 2,013 ortholog groups that were significantly different by diet, 1,183 had increased ORs with MPMC (average OR of 11.22) and 830 had increased ORs in the direction of HPLC (average OR of 0.32). This is similar to the taxonomy results whereby changes in the direction of MPMC were larger, but a higher number of changes were detected in the direction of HPLC. In comparison to the taxonomy results, very strong ORs were observed for the biggest differences in ortholog groups between diets. Tenfold or greater differences were observed for 45 and 211 ortholog groups with HPLC and MPMC, respectively (average OR of 48.17 and 0.06). Hundredfold or greater ORs with MPMC were observed for 19 ortholog groups (average OR of 202.25).

Protein-related pathways were ranked highly among pathways differing between diets
To identify the biochemical pathways containing the majority of ortholog groups significantly different by diet, we carried out a permutation analysis. This approach identified 10 out of 194 pathways that contained more significantly different ortholog groups than expected by chance (with p,0.022; Table 2, Table  S6). Those 10 pathways contained 263 unique (300 in total, as the same ortholog group can be associated with more than one pathway) ortholog groups with significant diet effects. The overrepresented KEGG pathways involved six pathways of amino acid biosynthesis and metabolism and one pathway each from the respective categories: amino acid related enzymes, ribosome biogenesis, terpenoid backbone biosynthesis and flagellar assembly. 198 out of the 300 ortholog groups were more prevalent in MPMC and 102 ortholog groups were more prevalent in HPLC.
An overview of all 280 biochemical pathways identified in feline feces in this study and their average prevalence in the MPMC and HPLC microbiomes is given in Figure 3 (underlying data available in Table S7). Mean relative abundance was calculated by adding up the relative abundances of all ortholog groups in the respective pathways (for all samples in the respective diet group). Fold changes between HPLC and MPMC diets were obtained by dividing mean relative abundances of both groups. Clustering based on the abundance patterns revealed five major groups ( Table 3). The clusters differed with regards to pathway completeness (the number of enzymes identified vs. the number of enzymes in a pathway), average prevalence and fold changes ( Table 3). On average, the pathways in cluster A contained 84.0 out of 134.5 total enzymes (62.5% completeness). Clusters B, C, D and E were 50.1%, 34.1%, 8.6% and 70.4% complete, respectively.
The six pathways of amino acid biosynthesis and metabolism and the ribosome genesis pathway were grouped together in cluster A. The pathways of terpenoid backbone biosynthesis, flagellar assembly and amino acid related enzymes were grouped in clusters B, C and E, respectively ( Table S7).
The MinPath [20] method was used to identify the minimal set of pathways necessary and sufficient to explain the enzyme functions. There were 78 pathways identified as potential artifacts: 13 from cluster C and 65 from cluster D ( Table 3 and Table S7). In cluster C 19% of pathways from the category the KEGG top level category metabolism, 80% from organismal systems and 43% from human disease were potential artifacts. In cluster D 24% of pathways from the category metabolism, 75% from environmental information processing, 82% from cellular systems, 74% from organismal systems and 63% from human disease were potential artifacts.
Based on the data presented in Table 2, Figure 3, and Table  S4, the microbiomes of kittens fed MPMC included genes with a greater abundance in pathways associated with the biosynthesis of amino acids, vitamins, fatty acids, and peptidoglycans; the glycolytic, TCA, and pentose phosphate pathways; oxidative phosphorylation; and the metabolism of purines, pyrimidines, and sugars. In contrast, the microbiomes of kittens fed the HPLC diet had greater abundance of genes associated with flagellar assembly, chemotaxis, sporulation, and two-component systems.

Time-related differences
Significant differences in relative abundances over time were detected for 18 genera, 11 families and 360 ortholog groups with p,0.00009, p,0.00022 and p,0.000007, respectively (Table S4).
With the exception of one genus (Parabacteroides; 2.1%) and the family that genus belongs to (Porphyromonadaceae; 3.6%) all taxa concerned had relatively low overall abundances (average relative abundances of 0.2% for genera and 0.4% for families). Parabacteroides had decreasing odds with time in addition to increased odds with HPLC. Of the 360 ortholog groups with significant time differences, 56 and 304 had increased and decreased odds with time, respectively. Cumulative relative abundance of all ortholog groups with significant time differences was 0.8%.

Discussion
Recently, a growing number of 16S rRNA-based taxonomy and shotgun metagenomics studies have started to build our understanding of gut microbiota. Such studies are focused almost exclusively on humans, other omnivores or herbivores. The gut microbiota of cats and obligate carnivores in general are still poorly understood. There is a limited number of taxonomy studies [16,[21][22][23] and only two other shotgun studies examining the potential gene functions of the feline gut microbiome [24,25]. Moreover, except from the original 16S rRNA study of the same samples analyzed here by shotgun metagenomics [16], no other longitudinal cat study has been published to date. To our knowledge, this is by far the largest and most comprehensive analysis of the feline gut microbiome to date. It provides a broad view of the microbial taxa and gene catalogue present in the cat, and what taxa/functional groups are most impacted by altering the dietary carbohydrate: protein ratio.

Metagenome annotation through short read binning
When processing a large dataset of shotgun sequences generated on the Illumina platform there is the choice of two fundamentally different approaches; assembly of the short reads into longer contiguous sequences (contigs) or binning them into groups of taxonomy and function. For an overview on the strategies for analyzing shotgun metagenomics data see [26]. Each approach has its strengths and weaknesses, but the literature is inconclusive about what approach is best. For each strategy there is a vast choice of analytical methods and algorithms available. The advantage of contigs is that open reading frames (ORFs) can be identified and longer sequences compared to reference databases thereby increasing annotation accuracy. The disadvantage is that although attempts have been made to modify short read assemblers to perform better for metagenomics datasets (e.g. metavelvet [27], an extension to the widely used velvet assembler [28]), de Brujn graph assemblers assume a clonal genome. Assembly of metagenomics datasets often results in extremely fragmented assemblies and can also lead to chimerical contigs of short reads belonging to two or more species. Also due to the complexity of metagenomes compared to genomes, the memory requirements are very large (in excess of 256 Gigabytes). The issue with binning approaches is that to reach acceptable levels of sensitivity, very strict thresholds have to be applied thus reducing the total number of reads assigned taxonomy or function.
For practical reasons, we decided to use MetaCV [19], a binning approach with similar accuracy to blastx [29], the traditional approach for annotating nucleotide sequences against a protein database, while performing ,300 times faster [29]. MetaCV is a novel tool that applies a composition vector method to the amino acid space. Such methods are well established for nucleotide-nucleotide comparisons (e.g. S-GSOM [30] or TA-COA [31]), but novel for protein sequences. Table 3. Clusters of biochemical pathways identified in feline feces. An average annotation rate of 32% was observed at the genus level despite the increasing number of sequenced microbial genomes (1,691 in reference database used for this study and 2,406 available on NCBI as of April 2013). This indicates that the majority of microbial diversity in feline fecal samples is not accounted for in the reference database of sequenced genomes. Another consequence of the short read binning approach was the high number of taxa or enzyme functions with very low sequence counts. They might represent truly rare taxa or orthologous gene groups. However, it is more likely that they are false positives; a side effect of the very high sequencing depth in this study. In the taxonomy results for example, the method identified 1,114 out of the 1,115 species present in the database.
Significant effects of dietary protein on large parts of microbiota are mostly consistent with previous studies To date, only two studies have characterized the taxonomic composition and functional capacity of the feline fecal microbiome using shotgun sequencing [24,25]. Moreover, few studies have evaluated the effects of diet on the feline fecal microbiome using any high-throughput strategy [16,21,25]. Similar to what has been reported in those studies and that of other host species, a few taxa accounted for a majority of sequences in the current study. Firmicutes (46.0-61.8% of sequences) was the predominant bacterial phylum in all kittens, followed by Bacteroidetes (17.1-36.0%), Actinobacteria (4.2-17.7%), Proteobacteria (2.0-5.9%), and Fusobacteria (0.2-2.4%). Although the same five phyla were predominant in the current dataset (using Illumina shotgun sequencing) and that of Hooda et al. [16] whereby 16S rRNAbased primers and 454 pyrosequencing was used, the abundance of Bacteroidetes differed greatly between studies. The 16S rRNA primers used in that study appear to be biased against Bacteroidetes, with only 0.2-0.7% being reported in these same samples [16]. Given the lack of Bacteroidetes reported in that study, the other predominant phyla were represented in greater proportions (71.0-80.1% Firmicutes; 4.6-28.5% Actinobacteria; . Pathways were clustered and sorted by their average relative abundance in the two diets. Cluster structure is indicated by colors overlaid onto the dendogram. Fold changes (full bar equals to a six fold change) to the left indicate a higher abundance in high protein diet (HPLC). The next column indicates ten pathways with an over-representation of ortholog groups (KEGG level 4) with significant diet differences. Six out of ten pathways (names given in right column) are related to amino acid metabolism highlighting the effect of differences in dietary protein on the microbiome. doi:10.1371/journal.pone.0101021.g003 0.1-12.9% Fusobacteria; 0.1-3.7% Proteobacteria). Although the proportions between studies differ, Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria, and Fusobacteria are the consistently predominant phyla in cat feces [21,24,25].
Despite the differences in technology used, the changes due to diet were relatively similar between the current study and what has been reported previously [16,32]. At the phylum level, the current study identified fewer Firmicutes and Actinobacteria, but greater Fusobacteria in HPLC-fed kittens consistent with previous studies [16,32]. Proteobacteria were also greater in HPLC-fed kittens consistent with Hooda et al. [16] but inconsistent with Bermingham et al. [32] who compared a wet HPLC to a dry MPMC diet. At the genus level, many of the predominant groups and dietaryinduced changes were consistent with what has been reported previously [16,32]. In all studies [16,32], Megasphaera was a predominant taxon and present in much greater proportions in kittens fed MPMC compared to HPLC. Megasphaera are predominant and important bacteria in ruminants because of their ability to ferment lactate into several short-chain fatty acids (SCFA), including butyrate [33,34]. As probiotic treatments, Megasphaera have been reported to improve gastrointestinal health of rats [35] and pigs [36]. Because the MPMC diet contained more dietary fiber than the HPLC diet (6.9 vs. 2.0%; [37]), a greater population may be expected. The high abundance of this bacterium in the feline gut, accounting for .30% of sequences at 12 and 16 week of age, however, indicates a major role in an obligate carnivore. A greater research focus on this genus in the future is justified.
Bifidobacterium and Acidaminococcus were predominant genera and in much greater abundance in MPMC-fed vs. HPLC-fed kittens. This finding is consistent with Hooda et al. [16] but inconsistent with Bermingham et al. [32] who either reported very low abundance or no presence at all. Bifidobacteria are predominant bacteria in human infant feces [38,39] and are associated with gastrointestinal health in humans [40] and cats [41]. Even though some studies have had difficulty detecting bifidobacteria in cat feces [42,43], recent studies have verified their presence and ability to increase with the addition of fermentable carbohydrates to the diet [44][45][46].
The current study highlights Selenomonas and Prevotella as being more abundant in MPMC-fed kittens, taxa not reported as being different between dietary groups previously [16,32]. In general, selenomonads are metabolically diverse, possessing the ability to use a wide variety of nitrogen-and carbon-based substrates. Selenomonas ruminatium, the member of this genus most studied because of its importance in ruminant nutrition, is an important lactate-fermenting organism [47] that grows well in environments with a high lactate and organic acid production [48] and is often more abundant in animals fed fermentable carbohydrate-containing diets [49]. Previously reported diet related differences in Lactobacillus (greater in MPMC-fed kittens [16,32]) were not statistically significant in the current study.
Fusobacterium, Clostridium, Eubacterium, Ruminococcus, Bacteroides and Desulfovibrio were significantly more prevalent in HPLC vs. MPMC. Our findings for Fusobacterium and Clostridium are in agreement with both previous studies [16,32], while our results for Eubacterium and Ruminococcus are consistent with Hooda et al. [16] and our findings for Bacteroides are in agreement with Bermingham et al. [32]. Our findings for Desulfovibrio, a sulfate-reducing bacteria and member of the Proteobacteria, have not been described previously. Although members of the genera Fusobacterium, Clostridium, and Desulfovibrio are sometimes associated with gastrointestinal disease [41,50,51], the kittens in the current study were healthy throughout the study so their presence was more likely due to their ability to ferment protein-based substrates.
The Bacteroides and Prevotella shift noted in the current study is consistent with data from humans, with Bacteroides being more abundant in those eating high protein and animal fats and Prevotella being more abundant in those eating more carbohydrates [52]. Gut bacteria of kittens consuming the MPMC diet would be expected to have more total and carbohydrate-based substrate than those consuming the HPLC diet.

Significant effects on taxonomy are in agreement with targeted 16S sequencing
Statistical analyses were carried out to identify significant differences due to diet, time, and diet by time interactions similar to that performed on 16S sequence data [16]. This enabled a comparison of two studies using different sequencing approaches on the same fecal samples. Hooda [16] presented statistical results for 18 genera and ten families. Our study included data for ten of those genera and all ten families while eight genera were lacking from our reference database (Table S8). Choosing the number of annotated 16S reads as a denominator this allows a comparison based on 56% and 86% of sequences for genera and families, respectively (or 46% and 54% when the number of annotated shotgun reads was chosen). We consider this a very good overlap given the disparity of more than two orders of magnitude between the number of taxa in the Hooda [16] reference database (. 300,000 16S bacterial sequences) and the reference database of this study (1,691 fully-sequenced genomes).
Among the ten genera available for comparison, eight had significant diet differences, while no significant associations with time or diet by time interactions were observed (Table S8). Hooda et al. [16] also identified significant diet differences for the same eight species, but for two of those (Acidaminococcus and Megasphaera) they also identified significant time effects and diet by time interactions. They also found a significant time difference for Lactobacillus. Out of the ten families available for comparison, we found seven with significant diet differences, one family with diet by time interactions (Erysipelotrichaceae) while no significant associations with time were observed (Table S8). Hooda et al. [16] identified significant diet differences for six families (all of which are included in our list), but did not find significant diet differences for Lachnospiraceae. They also identified one family with significant diet differences (Lactobacillaceae) and one family with significant time differences and diet by time interactions (Erysipelotrichaceae). In summary, the statistical results were in agreement for seven out of ten genera and seven out of ten families (Table S8).
We conclude that where a direct comparison between the 16S and the shotgun sequencing data was possible the statistical analyses generated similar results. This is despite the fact that proportions inferred could be very different between methods. For example, the genus Ruminococcus had average percentages of 3.2 in MPMC and 13.1 in HPLC with 16S and 0.4 and 1.2 with shotgun sequencing. Despite those differences, the statistical comparisons found significant diet difference in both studies.

HPLC microbiota are more diverse and stable
Diversity of the HPLC microbiota was significantly higher but residual variability was lower in comparison with MPMC. In a previous study [32], kittens fed a wet HPLC diet had a more diverse fecal microbiota than those fed a dry MPMC diet. We speculate that the more diverse and less variable taxonomic structure observed for the high protein diet reflects specialized microbiota, while the less diverse and more variable structure on the moderate protein diet reflects more generalized microbiota. The fact that the microbiota of the cats fed HPLC were more similar to each other than those of cats fed MPMC (with shorter branch lengths in the cluster dendogram of Figure 2 and a tighter 95% confidence interval in Figure 1a) could also reflect that the HPLC group consisted of two litters and the MPMC group of four litters. However, this may be unlikely as studies examining the impact of host genetics on the microbiota have been inconclusive and most studies could not identify associations [53].

Little evidence of microbiome development between 8 and 16 weeks of age
It is known that the microbiome develops early in life [54], however in this eight week period little effect was seen. Only 18 bacterial genera and 11 families showed significant differences in their relative abundances over time and with the exception of the genus Parabacteroides and the family Porphyrmonadaceae (the family this genus belongs to), all taxa were rare. Those two taxa were also significantly different by diet. No significant differences in Shannon diversity over time were found. 360 ortholog groups had significant differences over time but their cumulative abundance contributed to less than one percent of the annotated sequences. These data can be used to suggest that the microbiome of growing kittens with access to mothers diet, are established by week eight and little changes happen between eight and 16 weeks of age.
Kittens started eating solid cat food at three weeks of age and gradually increased their intake of solid food and decrease nursing until weaned by their mothers (starting at week five). As kittens were housed with their mothers, who were fed the same diets later assigned to the kittens, this exposure could have shaped the kittens' microbiota from a very early age. Studies in humans have shown the mother's milk microbiota to be complex [55] and concluded the bacterial composition plays an important role in the colonization of the infant gut [55,56]. One hypothesis is that evolution has shaped obligate carnivores and their gut microbiota in a way that facilitates a rapid transition from the digestion of mother's milk to prey.
Although few differences over this relatively short period of time were identified in this study, future longitudinal studies of the feline fecal microbiota may provide further insights into the development and maintenance of microbiota. Studies designed to characterize the development of the gut/fecal microbiota over a longer period of time and into adulthood are needed.

Gene identification and biochemical pathway inference from metagenomes
The number of unique genes identified in the metagenomes of feline feces was approximately 108-fold higher than the number of open reading frames identified in the cat genome [57]. This ratio is consistent with what was proposed for humans (100-fold; [58]) and later verified experimentally (150-fold; [59]). To understand better the metabolic potential related to diet the genes were mapped on to the KEGG database. Using this approach 55.8% of the genes in the feline feces were mapped to ortholog groups. Whilst this approach excluded proteins without enzymatic function (e.g. structural proteins) it did not affect our central question about dietinduced differences in the functional potential of the microbiomes.
The inference of biochemical pathways does not only depend on the number of genes and ortholog groups identified, but also on how they are mapped to the KEGG reference pathways. A naïve approach that called a pathway present when it had at least one enzyme present identified 280 pathways in the microbiomes of feline feces. This approach is problematic because a single enzyme may function in more than one pathway [60] with ''promiscuous enzymes'' (enzymes occurring in more than one pathway) resulting in an over-estimation of the number of pathways.
Clustering the pathways based on their average relative abundance highlighted cluster D, a group of 136 pathways with very low relative abundances and low pathway completeness. It is very likely those pathways are false positives caused by enzymes with low sequence counts and promiscuous enzymes. This is also the cluster from which the largest proportion of pathways was removed by MinPath [20].
MinPath [20] reduced the number of pathways from 280 to 202 by identifying promiscuous enzymes. This more conservative estimation affected clusters C (13) and D (65) and removed mostly (in terms of the proportion of pathways removed) pathways related to cellular systems, organismal systems and human disease; all pathways one would not expect to find in the prokaryotic genomes in feline feces analyzed in this study. This shows that most falsely identified pathways (e.g., human disease related) are not the consequence of an erroneous sequence annotation during the MetaCV binning (the reference database contained no eukaryotic sequences), but indeed caused by promiscuous enzymes playing roles in more than one pathway. We recommend that any study working with pathway data inferred by automatic annotation of sequence apply MinPath or a similar approach to reduce the number of false positives.
It is important to note that although the concept of separate biochemical pathways can facilitate understanding, in the cell all pathways are connected to some degree by the enzymes they contain [61]. This makes it difficult to assess in an automated way whether a pathway is functional or not. In some cases, a lot more enzyme functions can also be included into a single pathway than are needed for the pathway to function biochemically. A good example is the DNA replication pathway in which we identified 24 out of 51 enzymes. This would suggest an incomplete and potentially non-functional pathway. Highlighting the enzymes identified in the respective pathway revealed that in the KEGG database prokaryotic and eukaryotic DNA replication are combined into a single pathway and the microbes in feline feces have all the enzymes necessary for prokaryotic DNA replication.
Strongest diet-related differences between pathways were related to amino acid metabolism We identified 2,013 enzyme functions that were significantly different by diet. Interpreting all of those functions and the 194 pathways they belong to, in the context of diet, is prohibitive. It also bears the risk of over-interpreting the results before having established the robustness of the inference methods given all their shortcomings (see discussion on methods for annotating shotgun metagenomics data and pathway inference). Therefore, we first took a top level view at the data and used an objective method to identify the biochemical pathways containing most diet-related changes before addressing any specific changes. Our permutation testing approach identified ten pathways that contained more significant changes in their enzymes than expected by chance (p,0.022). Six of those pathways were related to the biosynthesis and metabolism of amino acids. Given that the main difference between the diets fed in this study was their protein content, it is of note that the biggest differences in the metabolic potential of the two fecal microbiomes lie in their ability to metabolize amino acids. This finding is an indication that the fecal microbiomes have functional differences related to the contrasting protein content in the diets and that they may reflect the functional differences of the microbiomes of the GI tract, making the fecal microbiome a suitable surrogate for that of the GI tract. It may also be interpreted as a validation of the bioinformatics and statistics methods applied in this study, as they are robust enough to identify biologically-relevant diet differences.

Diet-related differences in enzyme functions and biochemical pathways
In the current study, the microbiome of MPMC-fed kittens had a greater relative abundance of genes associated with the biosynthesis and metabolism of several amino acids, including alanine, aspartate/asparagine, and glutamate/glutamine; cysteine and methionine; lysine; tyrosine; and phenylalanine, tyrosine, and tryptophan. The abundance of genes associated with the biosynthesis of several vitamins (thiamin, riboflavin, vitamin B6, niacin, biotin, and pantothenate), fatty acids, and peptidoglycans and genes associated with the glycolytic, TCA, and pentose phosphate pathways; oxidative phosphorylation; and the metabolism of purines, pyrimidines, and sugars also were more prevalent in MPMC-fed kittens. Biosynthesis of amino acids and fatty acids are critical for the growth of all microbes, but microbial activity level, the amount and/or profile of amino acids produced, and production of other protein-containing metabolites have been reported to vary depending on dietary composition in pigs [62], cattle [63], and cats [44,64]. Abundance of urease subunits-a, b, and c were also greater in MPMC-fed kittens. Urease activity is important for nitrogen recycling, especially when low-protein diets are consumed [65]. Both diets fed in this study provided sufficient dietary protein for growing kittens. However, a greater amount of fermentable carbohydrates and lower amount of dietary nitrogen reaching the large intestine in MPMC-fed kittens would be expected to require greater amino acid biosynthetic and urease capacity and/or activity.

Potential for mucin degradation is higher in HPLC microbiomes
Mucins are high-molecular-weight glycoproteins that not only form a protective mucus barrier in the gut, but may have a role in regulating bacterial colonization and metabolism [66,67]. Many bacteria may adhere to mucus, using it as an anchor for colonization, while others have developed an ability to utilize it as a substrate, possessing glycoside hydrolase, sulfatase, fucosidase, or protease enzymes. The abundance of three genes related to mucin degradation, including arylsulfatase (OR 0.21), alpha-Lfucosidase (OR 0.55), and mannosyl-glycoprotein endo-b-Nacetylglucosaminidase (OR 0.27), was greater in HPLC-fed kittens. A fourth, O-sialoglycoprotein endopeptidase, was slightly more abundant (OR 1.35) in MPMC-fed kittens. Even though exogenous carbohydrates are often the focus of gut microbiome metabolism, many taxa have the ability to degrade GI mucins. The difference in this study may be due to the greater presence of Bacteroides and lower presence of Bifidobacterium in the HPLC-fed kittens. The ability of Bacteroides to utilize mucins is well established [68,69]. In contrast, other than B. bifidum that possesses several glycoside hydrolases capable of mucin degradation [70], most Bifidobacterium are restricted in that regard [71].

Summary and conclusion
We carried out shotgun sequencing of DNA isolated from fecal samples of kittens fed a high-and a medium-protein diet to determine the effect of protein content on the taxonomic structure and functional potential of the gut microbiome. We identified 324 genera, 138 families and 2,013 KEGG ortholog groups with significant diet differences. Biochemical pathways related to amino acid metabolism were highly ranked among the pathways containing KEGG ortholog groups with diet differences. This result links differences in protein content in the hosts' diet with differences in the functional potential of the microbiomes to metabolize protein. It also suggests that feces can be used as a surrogate for the gut in comparative analyses of microbiomes, and that short read binning approaches is able to detect those differences. The naïve approach of pathway identification greatly overestimated the number of pathways in feline feces and methods like MinPath need to be applied to give more conservative estimates. Very few taxa and KEGG orthology groups had significant differences over time between weeks 8 and 16, but more studies investigating changes further into adulthood are needed.

Animals and experimental design
The animal protocol was approved by the University of Illinois Animal Care and Use Committee and was conducted at the Edward R. Madigan Laboratory of the University of Illinois. Eight domestic shorthair female cats [3.560.24 kg body weight (BW) and 1.360.02 yr of age] and one male domestic shorthair cat (5.8 kg; 1.3 yr of age) were parents to these kittens. Female cats were randomly assigned to two test diets, one month before mating, and continued throughout gestation and lactation. Kittens from mothers fed the MPMC diet (n = 6) or HPLC diet (n = 6) were studied herein. Kittens were housed with dams until eight weeks of age, weaned and then fed the same diets as mothers. After weaning, kittens were twin-or triple-housed within dietary group in cages (160.7660.7 m) in temperature-controlled rooms with ad libitum food and water intake to allow for adequate growth. Fresh fecal samples (within 15 min of defecation) were collected from all kittens at 8, 12, and 16 weeks of age. All fecal samples were stored immediately at 280uC.

Diets
Complete dietary information is provided in [37]. Briefly, both diets were formulated to meet or exceed all nutrient requirements for growth according to the Association of American Feed Controls Officials [72] and were based on chicken meal, dried potato product, chicken fat, dried egg, herring meal, and beet pulp. The MPMC diet contained approximately 78 g crude protein and 44 g fat/1000 kcal ME (dry matter basis), while the HPLC diet contained approximately 110 g crude protein and 49 g fat/1000 kcal ME (dry matter basis).

DNA Extraction and Illumina Sequencing
Bacterial DNA was extracted using a QIAamp DNA stool mini kit (Qiagen, Valencia, CA, USA) using the repeated bead beating plus column (RBB+C) method [73]. Fecal DNA was quantified using a NanoDrop ND-1000 spectrophotometer (

Sequence data annotation
Sequencing reads were trimmed using sickle (https://github. com/najoshi/sickle.git) version 1.200 with cutoff of 20 and 50 for phred score and length, respectively. Trimmed short reads were annotated using a composition vector approach as implemented in MetaCV version 0.226 [19] against a reference database of 5,488,681 genes from 1,691 fully-sequenced genomes (supplied with MetaCV). MetaCV results files contain the following data entries for each individual short read in the FASTQ files of the respective samples: Read identifier, quality score of the annotation, GI number of the best protein match, KEGG entry identifier (if available), eggNOG identifier (if available), NCBI taxon ID and strain name. MetaCV applies a lowest common ancestor (LCA) approach on the NCBI taxonomy tree when reliable annotation down to strain level is not possible. Results were filtered by a quality score of 20 (as recommended in the original publication and during personal communication with the authors for read lengths of 100 nucleotides) and collated into a summary table using MetaCV. Summary tables are automatically generated for the following levels of taxonomical hierarchy: superkingdom, phylum, class, order, family and genus. A summary table on the species level (which is required for the calculation of species diversity) was generated separately by parsing the results files. In terms of functional annotation, summary tables were automatically generated for KEGG [17] orthology groups (KEGG level 4), biochemical pathways (KEGG level 3) and the higher level groupings KEGG level 2 and 1, as well as eggNOG [74] gene clusters. Uniprot tables of April 2012 were used to relate genes to KEGG orthology groups and eggNOG clusters. KEGG tables of May 2011 were used to relate ortholog groups to pathways and higher level groupings.

Statistical analyses
Individual genera, families and ortholog groups were analyzed univariately by generalized estimating equations, with a binomial distribution (for proportional responses) and a logit-link, to investigate the diet effects over time. Kitten was specified as a random effect to account for the lack of independence in observations over time within kittens and diet, week and their interaction were fitted as categorical fixed effects. If effects were found to be non-significant they were dropped from the model. All effects were tested against a data set type I error rate of 5% by Sidak [75] adjustment, thus the critical p-values used were 0.000089 for genera, 0.000223 for families and 0.000007 for ortholog groups. Prior to fitting of the univariate model sets, potential responses were filtered if the proportion of zero counts was greater than 25%. All odds ratios (OR) reported were significant at the respective adjusted p-value cutoff.
Permutation testing was performed to identify pathways containing more ortholog groups with significant diet changes than would be expected by chance. The number of ortholog groups in each KEGG pathway and the subset with significant diet changes was calculated. One thousand random subsets of 2,013 ortholog groups were then taken (to represent random significant ortholog groups) and the number found in each pathway calculated. The probability of a pathway containing more significant ortholog groups than would be expected by chance was calculated as the percentage of subsets where the random number in each pathway was greater or equal to the number of significant ortholog groups in each pathway. A Sidak [75] adjusted critical p-value of 0.002 was used to maintain a 5% type 1 error rate for the 228 pathways tested.
Diversity was estimated by calculation of Shannon indices [76] from the normalized species counts for each sample according to the formula: H 0~{ X R i~1 ln pi, where R is the number of species observed and p is the proportion of the i-th species. These i were then analyzed by linear mixed models with kitten as a random effect, with weighting by diet specific variability, and diet, time and their interaction as categorical fixed effects. Fixed effects were tested against a cut-off of p,0.05 and dropped from the model if non-significant. In addition, the heterogeneity of variance between diets was tested by likelihood ratio tests for nested random effect models. Shannon indices of individual samples and the results of the statistical analysis are provided in Table S3. Species richness and diversity were also estimated by calculation of rarefaction curves for each sample. Random permutations were generated by using the Linux command shuf on the MetaCV results files. To reduce noise a higher QC cutoff of 40 was applied. Sub-samples were taken at depths of 1, 5, 10, 20, 40 and 60 million reads (unpaired) with ten replicates per depth and sample (2160 data points in total). These were then analyzed by linear mixed models with time nested in kitten as the random effects, with variability weighting by the number of sequences. Diet, time, the number of sequences and their interactions were fitted as categorical fixed effects. Effects were tested against a cut-off of p,0.05 and dropped from the model if non-significant. The number of species per sub-sample and the results of the statistical analysis are reported in Table S3.
Taxonomic profiles (Figure 2). Raw genus abundance data as inferred by MetaCV [19] was converted to relative abundances using the number of annotated sequence reads as a denominator on a per-sample basis. The 19 most abundant genera irrespective of diet were used for a stacked per-sample bar plot with the remaining genera pooled as 'others'. The samples were ordered as inferred by the following clustering approach. Clustering was performed on the relative abundances of all genera using R. Pairwise distances were calculated using Euclidean distances. Hierarchal clustering was performed according to the average agglomeration method which was previously shown to perform best on taxonomic profiles of environmental data [77].
Functional profiles (Figure 3). Raw pathway abundance data (KEGG level 3) as inferred by MetaCV [19] was converted to relative abundances using the number of annotated sequence reads as a denominator on a per-sample basis. Mean relative abundances were calculated and plotted across all HPLC and MPMC samples, respectively. Clustering was performed on mean pi relative abundances for HPLC and MPMC using Euclidean distance with hierarchal clustering and average agglomeration in R. Fold change was calculated by dividing mean relative abundances of HPLC and MPMC using the smaller value as denominator and plotting the values in the respective direction. The axis of the graph illustrates a minimum fold change of 1 up to a maximum fold change of 6.

Supporting Information
Table S1 Total taxonomy and functional assignment counts and percentages of fecal samples from kittens fed a high-protein, low-carbohydrate or moderate-protein, moderate-carbohydrate diet. (XLSX)