Metagenomic Analysis of a Tropical Composting Operation at the São Paulo Zoo Park Reveals Diversity of Biomass Degradation Functions and Organisms

Composting operations are a rich source for prospection of biomass degradation enzymes. We have analyzed the microbiomes of two composting samples collected in a facility inside the São Paulo Zoo Park, in Brazil. All organic waste produced in the park is processed in this facility, at a rate of four tons/day. Total DNA was extracted and sequenced with Roche/454 technology, generating about 3 million reads per sample. To our knowledge this work is the first report of high-throughput sequencing and analysis of a whole composting microbial community. The phylogenetic profiles of the two microbiomes analyzed are quite different, with a clear dominance of members of the Lactobacillus genus in one of them. We found a general agreement of the distribution of functional categories in the Zoo compost metagenomes compared with seven selected public metagenomes of biomass deconstruction environments, indicating the potential for different bacterial communities to provide alternative mechanisms for the same functional purposes. Our results indicate that biomass degradation in this composting process, including deconstruction of recalcitrant lignocellulose, is fully performed by bacterial enzymes, most likely by members of the Clostridiales and Actinomycetales orders.


Introduction
Decomposition of organic matter in a typical composting process is carried out by a complex microbial community whose structure changes depending on temperature, pH, aeration, water content, and type and amount of organic solids [1][2][3][4][5][6]. The aerobic microbial metabolism drives pH changes and a rapid temperature increase above 50 °C, followed by sustained high temperatures between 60 and 80 °C and then gradual cooling of the composting mass [7].
Analyses of different composting environments by cultivation-dependent methods or by community fingerprinting using amplified rDNA restriction analysis, denaturing gradient gel electrophoresis (DGGE), DNA hybridization techniques and phospholipid fatty acid determination have shown that Actinomycetales, Bacillales, Clostridiales and Lactobacillales are among the major bacterial orders identified in composting processes [6,8,9,10,11,12]. For instance, Lactobacillales have been associated with the initial mesophilic stage in the composting of organic household waste, which often has a low initial pH [2,6,9]. On the other hand, Bacillales, Clostridiales and Actinomycetales have been shown to constitute a substantial part of the community in the thermophilic stages of composting of organic household waste [6,10] or a mixture of livestock manure and shredded plant waste [8,11]. In addition, a few fungal species have also been identified among compost microbial communities during the thermophilic stage as well as upon cooling [1,13,14].
The above mentioned composting studies were focused on the detection of abundant microbial groups and limited by biases imposed by rRNA gene-cloning or probing approaches [15][16][17][18]. These limitations could potentially be overcome by advances in DNA extraction protocols [19] and sequencing technologies [20][21][22] as well as by computational methods for whole-community sequence data analysis [22][23][24], which together allow a comprehensive overview of the phylogenetic composition and diversity of genes in complex microbial communities. For instance, metagenomic approaches are guiding discovery of enzymes and organisms for biomass deconstruction using samples from complex environments such as cow and yak rumen [25][26][27] and switchgrass-adapted compost [20,28,29].
Here we present analyses of a large data set (1.6 Gbp) generated by direct pyrosequencing of metagenomic DNA from composting samples, with the goal of investigating their microbial community composition and prospecting for genes and functions related to biomass degradation. Samples were collected at a composting facility inside the São Paulo Zoo Park, which is located within the urban area of the São Paulo megalopolis (Brazil), and includes a significant remnant patch of Atlantic rain forest. The composting facility is designed to compost four tons/day of all organic waste produced in the park. Dropped tree leaves, plant debris and grass clippings collected from the Atlantic rain forest fragment and gardens located inside the park, water recycling slurry from its artificial lake, waste water treatment sludge, bedding materials and animal feed wastes, plus animal excrements from about 400 species are blended and composted by a standardized management procedure in several 8 m³ open concrete chambers, followed by stabilization in windrows (unpublished procedure). The end compost humus-rich material obtained after 80-100 days is used as fertilizer and soil amendment in the São Paulo Zoo Farm, thus completing the full cycle of recycling. About 600 tons of compost end product is generated per year. The hypothesis that guided our study was that given its peculiar composition, the São Paulo Zoo Park compost process would host a large microbial diversity, combining the phylogenetic richness of soil and forest microbial communities [30][31][32] with that of the microbiota associated with zoo animals [33][34][35]. To our knowledge this work is the first report of high-throughput sequencing and analysis of a composting whole-DNA microbial community.

Shotgun Pyrosequencing of Compost Microbiomes
To assess the microbial diversity and the metabolic potential for biomass degradation in the composting process from the São Paulo Zoo we applied a sequence-based metagenomic approach. Samples were collected during the composting operation, one from a chamber 8 days after the beginning of the composting process (Zoo Compost 1, ZC1) and another from a chamber 60 days after the beginning of the composting process (Zoo Compost 2, ZC2); the latter had been thoroughly mixed and aerated eight days before sampling. In both operations the total composting time was about 90 days. High molecular weight DNA extracted from samples ZC1 and ZC2 was submitted to shotgun sequencing using the Roche 454 GS FLX Titanium technology. Four sequencing runs yielded over 2,900,000 reads per sample with 276 and 299 nt mean length, totaling 836 Mbp and 842 Mbp, for ZC1 and ZC2, respectively (Table 1). Assembly of these two metagenomic sequence datasets yielded 52,953 contigs for ZC1 and 52,182 contigs for ZC2, each one using, respectively, 37.2% and 48.8% of the total reads. N50 contig lengths of 1,734 bp and 1,516 bp were obtained for ZC1 and ZC2, respectively.
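For readers unfamiliar with the N50 statistic quoted above, the following minimal Python sketch shows one common way to compute it. The function name `n50` and the toy contig lengths are illustrative assumptions; they are not taken from the assembler output of this study.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example with hypothetical contig lengths (not data from this study).
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

Contigs are accumulated from longest to shortest, so the value returned is the length of the contig at which the running sum first reaches half of the assembly size.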
The ZC1 metagenome exhibits a higher average GC content than ZC2 (Table 2) and its sequence reads also present a very distinct GC content profile when compared with ZC2 (Fig. 1). Besides differing between themselves in GC content, both ZC1 and ZC2 are also markedly different in GC content from three publicly available high-throughput sequencing datasets related to biomass degradation (soil from a Puerto Rico rain forest, termite gut and cow rumen planktonic microbiomes [25,36,37]) (Fig. 1), suggesting differences in their respective microbial composition [38,39], which is supported by results shown below.
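A per-read GC content profile of the kind compared above can be sketched as follows. This is an illustration only: the function names, the 5% bin size and the toy reads are assumptions, and ambiguous bases (such as N) are simply excluded from the denominator.

```python
from collections import Counter


def gc_percent(seq):
    """GC content of one read in percent, ignoring non-ACGT characters."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return None  # read has no unambiguous bases
    return 100.0 * gc / (gc + at)


def gc_profile(reads, bin_size=5):
    """Fraction of reads falling in each GC bin (keys are lower bin edges)."""
    bins = Counter()
    for read in reads:
        pct = gc_percent(read)
        if pct is None:
            continue
        bins[int(pct // bin_size) * bin_size] += 1
    total = sum(bins.values())
    if total == 0:
        return {}
    return {edge: count / total for edge, count in sorted(bins.items())}


# Toy reads (hypothetical, far shorter than real 454 reads).
print(gc_profile(["ATGC", "GGCC", "ATAT", "GCNN"]))
```

Plotting such a profile for each metagenome gives curves comparable in spirit to those in Fig. 1.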

Compost Microbial Community Composition
Overall community structure analyses performed with M5RNA (Non-redundant multisource ribosomal RNA annotation) and M5NR (M5 non-redundant protein) databases available within MG-RAST [40] showed that ZC1 and ZC2 are dominated by species in the Bacteria domain (84-89% and 93-96%, respectively), regardless of the database used (Table S1). The remaining sequences match Archaea (<1%), Virus (<0.25%) and Eukaryota (<3%) sequences, or were unassigned. The few Eukaryota rRNA sequences found in both samples are mostly related to Streptophyta, Nematoda, and Arthropoda phyla and possibly correspond to residual DNA from the compost start substrate. We observed that the fraction of ZC1 and ZC2 protein-coding sequences related to fungi was negligible (less than 0.02% of all reads in either sample).
The Bacteria domain composition of ZC1 and ZC2 metagenomes was further investigated using the RDP [41] and M5NR databases available within MG-RAST [40]. Despite the striking differences in abundance, most bacterial orders found in both samples (Table S2) are among major bacterial classes previously identified in composting processes [6,8,9,11,12,42,43,44]. (The baseline for all fractions reported henceforth refers to all reads assigned to the Bacteria domain.) Proteobacteria is by far the most abundant phylum in ZC1 (58% and 48% according to RDP and M5NR, respectively), while Firmicutes dominates the ZC2 bacterial community (88% and 67% according to RDP and M5NR, respectively). The ten most abundant orders in ZC1 and ZC2 bacterial communities are shown in Figure 2. The observed difference in abundance is significant (p<0.01) as determined by the RDP library compare tool using the Naive Bayesian classifier [45]. While in ZC1 75% of the total bacterial orders are represented by Xanthomonadales, Pseudomonadales, Clostridiales, Burkholderiales and Bacillales, in ZC2 ~75% are solely represented by Lactobacillales. This high abundance of Lactobacillales might reflect the more advanced stage of the compost process of the ZC2 sample relative to ZC1 or unknown characteristics of the ZC2 initial composting substrate. In contrast, an early work by rRNA cloning and sequencing has shown that members of the lactic acid bacteria were present during the initial stages of composting in a model bench-scale reactor system, and their presence correlates with low pH in the feeding and mesophilic composting conditions [46]. In our case, the observed differences could not be correlated with pH or temperature, since at the moment of sampling temperatures were 66 °C and 67 °C for ZC1 and ZC2, respectively, and pH was 7.0 for both samples.
Although a composting process such as the one we prospected here is aerobic, we found a noteworthy abundance of Clostridiales (~15% in ZC1; ~6% in ZC2), which is a bacterial order known to include anaerobic or micro-aerophilic species. This probably reflects the semi-static conditions of the compost we sampled, which favor the formation of anaerobic micro-environments, and also the high metabolic activity of the bacterial community [1,6,7,47]. Anaerobic microorganisms have been proposed to play an important role in biomass degradation [47,48] and, indeed, Clostridium appears to be responsible for cellulose degradation in composting [1,11,49,50]. Therefore, the appearance of Clostridiales since the initial stages of composting seems important for degradation of complex biopolymers such as hemicellulose and cellulose. Degradation of complex polymers in compost appears to be performed also by Actinomycetales, Bacillales and fungi, whose presence has been associated with age and temperature of composting [1,6]. In these studies Actinomycetales have been shown to be abundant in thermophilic stages, while fungi appear towards the end of the composting process, in the cooling and maturation phase. Even though fungi are well-known agents of lignocellulose degradation, cumulative evidence suggests that members of Actinomycetales and Bacillales, among other bacterial orders, possess the ability to degrade cellulose and solubilize lignin [48,51]. Moreover, they tolerate higher temperatures and higher pH than fungi, and usually colonize the substrate once the less complex carbon sources have been exhausted [1,6,52,53,54,55]. Our results show that, despite their relatively low sequence abundance, Actinomycetales and Bacillales (respectively, 1.8% and 5.9% in ZC1; 2.5% and 4.9% in ZC2) are among the 10 top bacterial orders in our compost samples, which were both collected at thermophilic stages.
These results are in line with cultivation-dependent observations showing Bacillus among the dominant bacterial taxa recovered from compost during the thermophilic phase [1].
ZC1 and ZC2 16S-rRNA reads were further taxonomically classified at the level of genus by means of the RDP Naive Bayesian Classifier [45] (Fig. 3). In ZC1 the five most abundant genera are Acinetobacter, Stenotrophomonas, Xanthomonas, Comamonas and Clostridium, which account for more than half (~52%) of all identified genera, while in ZC2 about 70% of the 16S-rRNA sequences were assigned to genus Lactobacillus. An analysis performed with the M5NR database shows similar results (data not shown). The remaining bacterial community in both samples appears to be distributed in more than two hundred different genera (Table S3).
Rarefaction curves from the samples were determined at a genetic distance of 3% by using rRNA-related sequences retrieved from the whole metagenomic sequences dataset (4,420 sequence reads for ZC1 and 5,616 sequence reads for ZC2). The rarefaction curves (Fig. 4) did not reach saturation, with the number of species sampled being 2,260 and 2,816 for ZC1 and ZC2, respectively. These numbers are lower bounds on the species richness of the two samples and they support our initial hypothesis that the Zoo composting process would host a large microbial diversity. We do not report diversity estimators as given by indexes such as Chao1 or Shannon because such estimators are strongly biased by sample sizes and do not seem to yield reliable results [56].
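To illustrate what a rarefaction curve measures, the sketch below computes the expected number of OTUs observed in a random subsample of n reads, using the standard analytic (hypergeometric) rarefaction formula. This is an assumption made for illustration: the curves reported here were produced with the RDP pipeline tools, whose exact procedure may differ, and the OTU counts below are a toy example.

```python
from math import comb


def rarefied_richness(otu_counts, n):
    """Expected number of OTUs seen when n reads are drawn without
    replacement from a sample with the given per-OTU read counts."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of reads")
    denom = comb(total, n)
    # An OTU with c reads is missed with probability C(total - c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)


# Toy sample (hypothetical): three OTUs with 2, 1 and 1 reads.
counts = [2, 1, 1]
curve = [rarefied_richness(counts, n) for n in range(sum(counts) + 1)]
print(curve)
```

A curve that is still rising at the full sample size, as in Fig. 4, indicates that additional sequencing would be expected to reveal further OTUs.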

Species Diversity of Lactobacilli in ZC2
As discussed above, the genus Lactobacillus predominates in the ZC2 metagenome (Fig. 3). This result is consistent with previously reported results from a recent study of the microbial diversity of a composting process in pilot and full-scale operations performed in drum units fed with organic municipal waste [6]. Other studies have also reported the presence of Lactobacilli in composting [8,57,58,59]. In the Partanen et al. study [6], based on analyses of 1,560 reads generated from 16S rRNA gene libraries from 18 samples, Lactobacillus was found to be highly abundant at the start of the process (reaching more than 90% in one of the samples, 4 days into the composting process [6]). The presence of Lactobacillus in these samples correlated with low pH (4.7-5.9) and mesophilic temperatures, except for one sample where pH was 7.8 [6]. This contrasts to some extent with the ZC2 sample conditions, which had thermophilic temperatures and pH 7.0. Presence of Lactobacillus under thermophilic conditions is consistent with previous reports [60,61].
Lactobacilli are almost ubiquitous and found in environments where carbohydrates are available such as dairy products, fermented fish and sourdoughs [64][65][66]. As members of the lactic acid bacteria (LAB) group, a number of Lactobacillus species are recognized as safe bacteria and are used as probiotics and/or starter cultures in food and feed fermentation [62,67]. Due to their competitiveness and adaptation to the environmental conditions, certain LAB species dominate specific fermentation processes, and it is believed that production of bacteriocins plays an important role in this competitive advantage [61], which might justify the dominance of Lactobacillus in the ZC2 metagenome. Moreover, the ZC2 sample was collected after 60 days of composting, when most of the hemicelluloses and cellulose have been converted to less complex carbohydrates, allowing colonization by thermophilic Lactobacilli. A recent study [68] identified Lactobacillus species in the feces of 16 animals classified as carnivores, omnivores and herbivores. L. johnsonii and L. reuteri were among the most abundant species isolated from carnivores (though also present in omnivore and herbivore feces), and L. plantarum, L. brevis and L. casei were isolated from omnivores. Such results are consistent with our observations of Lactobacillus diversity in ZC2 and the use of diverse animal fecal material in the ZC2 composting substrate.

Functional Profiling of Compost Metagenomes
The functional profiles of the ZC1 and ZC2 metagenomes were determined by classification of predicted genes based on Clusters of Orthologous Groups (COG/KOG) [69] assignments. At the highest level of the COG category system, ZC1 and ZC2 exhibit a similar profile (Fig. 5). Moreover, ZC1 and ZC2 exhibit approximately the same COG functional categories distribution seen in general for prokaryotes [69], reflecting the dominance of the Bacteria domain in these microbiomes. As expected, typical eukaryotic KOG functional categories (RNA processing and modification, Chromatin structure and dynamics, Extracellular structures, Cytoskeleton and Nuclear structure) are not represented in our sequence data set.
Functional specificities of ZC1 and ZC2 are revealed using deeper levels of the COG hierarchy. Among assigned COG functions we observed many that are relevant to the expected characteristics of a complex microbial community engaged in biodegradation. For instance, some of the abundant COG functions in ZC1 and/or ZC2 (Table 3), such as hydrolases and dehydrogenases (COG1012, COG1960, COG1028, COG0673 and COG0561) and proteins involved with carbohydrate transport and metabolism (COG0395, COG1175, COG1129, COG1109, COG2814 and COG2723), can be related directly to the dynamics and recycling power in the microbial community structure in a biomass degrading environment. In addition, among the most abundant functions present in ZC1 and/or ZC2 metagenomes (Table 3), we found several COGs associated with bacterial efflux pumps (COG1132, COG0841, COG0534, COG1131 and COG1136), which are known to export substances such as antibiotics and toxic molecules [70]. We hypothesize that ZC1 and ZC2 proteins with these functions may play a role in bacterial defense against toxic metabolites such as antibiotic compounds and anti-microbial peptides, produced by many bacteria (e.g. lactic acid bacteria, Staphylococcus and Bacillus) during the composting process [57]. The 30 most abundant COG functions (Table 3) also include functions related to regulation in response to environmental stimuli such as histidine kinases and response regulators (COG0642 and COG0745) and transcriptional regulators (COG1609 and COG0583). The high proportion of these COGs could be indicative of the need to respond to the constant changes in the composting environment and to the interactions required by its microbial community.
The ZC1 set includes a group of predicted genes annotated as coding for cellulase M and related proteins (COG1363 and EC 3.2.1.4). An alignment of two of these ZC1 predicted protein sequences (349 and 350 aa) with Clostridium thermocellum cellulase M results in 50% identity (Figure S1). Despite the difficulty in distinguishing CelM from the M42 family of peptidases based on sequence similarity [71], C. thermocellum CelM shows endoglucanase activity and appears to be noncellulosomal [72]. The ZC1 metagenome includes other predicted genes related to cellulose degradation activities in higher abundance when compared with the ZC2 metagenome. For instance, while the ZC1 metagenome has 112 predicted protein sequences annotated as cellulase (glycosyl hydrolase family 5) and 32 predicted protein sequences annotated as proteins with cellulose binding domain, ZC2 has only 19 and two sequences, respectively, with the same annotation. In addition, we were able to identify 65 predicted protein sequences containing the dockerin domain (pfam00404) and 36 predicted protein sequences with the cohesin domain (pfam00963) in the ZC1 metagenome. Although in much lower abundance, the ZC2 metagenome also contains predicted genes annotated with these functions, six and eight sequences with the dockerin and cohesin domains, respectively. These enzymes and protein modules are known components of the cellulosome, a multienzyme complex that mediates the deconstruction of hemicellulosic substrates by anaerobic bacteria [73]. Accordingly, 867 predicted genes annotated with COG3459 (cellobiose phosphorylase EC:2.4.1.20), for an enzyme family that is key for microbial cellulose utilization [48], are found in the ZC1 metagenome, while ZC2 contains 267 such sequences.
The degradation of other components of the plant cell wall, such as pectin, contributes to reduction of plant biomass. Together the ZC1 and ZC2 metagenomes have 584 predicted genes related to pectin degradation, such as pectate lyase (COG3866), endopolygalacturonase (COG5434) and pectin methylesterase (COG4677). In ZC1 contig 00009.9 (27,919 bp) we found genes encoding these three enzymes along with predicted genes related to carbohydrate metabolism and other functions (Fig. 6). This contig appears to belong to a member of the Bacteroidales order (data not shown). Altogether these results provide strong evidence for the notion that at the composting stage when ZC1 was sampled the microbial community has high metabolic potential for complex carbohydrate deconstruction and utilization of released oligosaccharides.

Putative Lignin-degrading Genes
Aware of the considerable interest in lignin breakdown methods for conversion of lignocellulose into second-generation biofuels and renewable aromatic chemicals [74], we searched for predicted genes related to lignin peroxidases and copper-dependent laccases in the ZC1 and ZC2 metagenomes. These are extracellular enzymes produced by ligninolytic white-rot and brown-rot fungi [75]. As noted above, fungi were essentially absent from ZC1 and ZC2, but several reports have described the ability of bacteria to break down lignin [74]. We found 43 (ZC1) and 190 (ZC2) predicted genes coding for iron-dependent peroxidases, which include Dyp-type peroxidases (pfam04261). For instance, the complete coding sequence of a Dyp-type peroxidase found in ZC1, with 307 aa, is 94% identical to a putative Dyp-type peroxidase from Acinetobacter sp. (GI:389721224) (Figure S2). In ZC2 we identified a Dyp-type peroxidase complete coding sequence (318 aa) that is 100% identical to a Dyp-type peroxidase from Lactobacillus acidipiscis KCTC 13900 (GI:366090439) (Figure S2). However, neither was predicted to be a secreted enzyme. The Dyp-type peroxidase family appears to contain bifunctional enzymes, with hydrolase or oxygenase, as well as typical peroxidase activities [76]. It has been suggested that secreted bacterial Dyp-type peroxidases may represent the bacterial counterpart of the fungal lignin peroxidases, with examples being the ones produced by the Actinomycetales Rhodococcus sp. and Thermobifida fusca [77,78]. On the other hand, the ZC1 and ZC2 metagenomes contain, respectively, 224 and 110 sequences encoding genes with similarity to heme-dependent bifunctional catalase-peroxidase (EC:1.11.1.7/EC:1.11.1.6), a family of enzymes recently proposed to contribute to lignin degradation in the Actinomycetales Amycolatopsis sp. [79]. In ZC1 we found a predicted gene 60% identical to a catalase-peroxidase from Amycolatopsis sp. (GI:385676086) (Figure S3).
Thus, it appears that ZC1 and ZC2 have the potential for lignin degradation of the compost lignocellulosic biomass. Based on the above observations, we hypothesize that this capability is due to Actinomycetales species present in both microbiomes (Fig. 2).

Comparison with Seven Other Metagenomes
We compared the two composting microbiomes with seven public metagenomes: benzene-degrading bioreactor, biofuel reactor, compost minireactor, termite hindgut, poplar biomass bioreactor, lake sediment, and rain forest soil. The general features of these metagenomes are listed in Table S5. Among the criteria for selecting these public metagenomes for our comparative analyses were their relatedness to biomass deconstruction environments, whole shotgun sequencing strategy, and annotation of assembled sequences publicly available on IMG/M [80]. The COG functional categories overall distribution for the seven public metagenomes reflects the dominance of the Bacteria domain, similarly to what was seen for the ZC1 and ZC2 metagenomes (Fig. 7), even though each individual microbiome composition is quite different. As described above, ZC1 presents a significant abundance of Clostridiales, but Lactobacillales predominate in ZC2 (Fig. 2). The termite hindgut microbiome is enriched in Spirochaetales and Fibrobacterales [37], and the biofuel reactor metagenome is highly enriched in Bacteroidales and Clostridiales (IMG/M unpublished data).
Again here, at the highest level of the COG system, we found general agreement of the distribution in ZC1 and ZC2 compared with the selected seven public metagenomes, but with some differences (Fig. 7). Among the broad differences we highlight the following. In the ZC2, biofuel reactor, and rain forest soil metagenomes COGs belonging to functional category G (Carbohydrate transport and metabolism) are statistically overrepresented compared with the other metagenomes except termite hindgut. The functional category K (Transcription) is also statistically overrepresented in the rain forest soil compared with the other metagenomes. On the other hand, secondary metabolite biosynthesis-related COGs are statistically overrepresented in the compost minireactor, poplar biomass bioreactor and lake sediment metagenomes, but less abundant in the termite hindgut microbiome (Fig. 7, Functional category Q). Also, the termite hindgut metagenome is particularly rich in cell motility COGs in comparison with the other metagenomes (Fig. 7, Functional category N), as has already been noted [81]. Even though functions related to signal transduction mechanisms are enriched in the ZC1 and ZC2 metagenomes as discussed above (Table 3), the other seven metagenomes are even more enriched in this category (Fig. 7, Functional category T). At deeper levels of the COG system, a comparison of COG functions present in the compost metagenomes and in the seven selected metagenomes revealed a set of 35 and 179 COGs statistically overrepresented respectively in ZC1 (15,623 predicted genes) and ZC2 (76,175 predicted genes) (Table S6). Among these overrepresented COGs are those associated with bacterial efflux pumps (COG1132 and COG0534), which are abundant within the ZC1 and ZC2 metagenomes, as already noted above.
The set of COGs statistically overrepresented in ZC2 with respect to the other seven metagenomes includes predicted genes related to fermentation, such as Pyruvate/2-oxoglutarate dehydrogenase complex and L-lactate dehydrogenase, which is consistent with a metagenome in which Lactobacillus species predominate. Also, predicted genes related to the phosphotransferase system (COG1455, COG1263 and COG1264) and to ABC-type transport systems (Table S6) are overrepresented in the ZC2 metagenome, revealing its high potential for sugar uptake.
A hierarchical clustering of functional gene groups based on COG functional categories and on COG functions of ZC1, ZC2 and the seven public metagenomes (Fig. 8) emphasizes the points made above. In both diagrams ZC1 and ZC2 cluster together, demonstrating their similar functional profile, despite large differences in microbial species composition. In the clustering using the highest COG categories (Fig. 8A), branch lengths are short, giving evidence of the compositional similarity among the metagenomes compared. In the clustering using COG functions (Fig. 8B) we see much longer branch lengths, denoting their specificities.
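The idea behind such a clustering can be sketched in a few lines of Python. The distance measure (Euclidean distance between relative-abundance profiles), the linkage rule (average linkage) and the three toy samples are all assumptions made for illustration; the method and parameters behind Fig. 8 are not specified in this text and may differ.

```python
from itertools import combinations
from math import dist


def relative(counts):
    """Convert raw category counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def average_linkage(profiles):
    """Agglomerative clustering of samples by functional profile.

    profiles maps sample name -> list of counts per functional category.
    Returns the merges in order as (cluster_a, cluster_b, distance)."""
    points = {name: relative(counts) for name, counts in profiles.items()}
    clusters = [(name,) for name in points]
    merges = []

    def cluster_distance(a, b):
        # Mean pairwise distance between members of the two clusters.
        pairs = [dist(points[x], points[y]) for x in a for y in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical counts for three functional categories in three samples.
toy = {
    "sample_A": [30, 20, 50],
    "sample_B": [28, 22, 50],
    "sample_C": [10, 60, 30],
}
for a, b, d in average_linkage(toy):
    print(a, b, round(d, 3))
```

Samples with similar category distributions merge first and at a small distance, which corresponds to the short branches in a dendrogram; dissimilar profiles join later at larger distances.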

Concluding Remarks
Composting is a highly dynamic process involving changing microbial communities that are very efficient in organic matter decomposition. Here, the complexity of this process was analyzed at a detailed level by shotgun metagenomic sequencing. Our results fit well with the current understanding that biomass degradation in composting, including deconstruction of recalcitrant lignocellulose, is fully performed by bacterial enzymes, possibly derived from Clostridiales and Actinomycetales [20,74]. Although fungi are generally considered the main microbial decomposers of plant material [75], their role in composting is possibly diminished because of frequent anaerobic and thermophilic conditions in semi-static composting processes like the São Paulo Zoo composting operation, similarly to what has been observed in the anaerobic decomposing of poplar wood chips [82].
Our results suggest that cellulose and hemicellulose deconstruction during the composting process is performed by cellulosomal enzymes. Indeed, it has been proposed that the cellulosome is more efficient in degrading complex plant polysaccharides than "free enzymes" produced by aerobic bacteria and fungi [73].
Despite the differences in the phylogenetic profile of the two microbiomes we have analyzed, their overall functional profile is similar. Moreover, we found a general agreement of the Zoo compost metagenomes functional categories distribution in comparison with seven selected metagenomes of biomass deconstruction environments. On the other hand, the organism composition of these microbiomes is quite different, indicating the potential for distinct bacterial communities to provide alternative mechanisms for the same functional purposes. If correct, this suggests that complex microbial environments harbor functional capabilities carried out in novel ways. In support of this we note that a new strategy for lignocellulose degradation has been recently described in yak rumen, which does not involve either cellulosomes or a free-enzyme system [27].
It is also notable that genes encoding proteins related to pectin degradation are present in the Zoo compost metagenomes. Pectin-rich biomass has been considered as an alternative feedstock for biofuel production [83]. Thus, a composting operation such as the one we analyzed here can be considered a rich source for prospection of biomass degradation enzymes. Moreover, continued exploration of complex environments such as composting will foster the discovery of compounds (e.g. antibiotics) and/or mechanisms (e.g. interspecies bacterial communication) relevant to the understanding of how particular environments drive the functional structure of microbial communities.

Sample Collection and DNA Extraction
Two 8 m³ concrete chambers, ZC1 and ZC2, were established on 01/26/2011 and 07/21/2009, respectively, for composting, following routine procedures at the São Paulo Zoo Park composting facility with minor modifications from a previously described method [84] to meet the needs of a large composting operation. The two cells were fed with similar biosolids composed of shredded tree branches and leaves from the surrounding Atlantic rain forest, plus manure, beddings and food residues from about 400 species of zoo animals (mammals, birds and reptiles), so that both reached a carbon:nitrogen ratio of roughly 30:1. Adequate aerobic conditions were maintained by having air pipes at the bottom of the chamber and by arranging the bio-residues in a way that permits air to flow from bottom to top through shredded tree branches and wood chips. The chambers were watered once a week to maintain proper humidity levels (50-60%) and to avoid excessive heating. Moisture content was estimated by microwave oven drying as previously described [84]. Temperature was measured weekly at five points in each chamber; reported temperatures are averages of the five measures. Over the course of the composting process temperatures in the composting mass oscillated between 50 and 72 °C. The compost was thoroughly mixed using a BobCat skid-steer loader around day 40 after temperature dropped below 55 °C; immediately after, temperatures climbed back to the 70-72 °C range, thus ensuring thermophilic conditions. No undesirable odors were detected during the composting process, indicating that a desirable aerobic level was reached. After ~90 days the compost material was removed and aged for an additional ~10 days in windrows.
Samples were collected following the protocol previously described [85], at day 8 of composting from one chamber (Zoo Compost 1, ZC1) and at day 60 of composting from another chamber (Zoo Compost 2, ZC2) which had been aerated 8 days earlier. In brief, each sample of approximately 300 g was made by pooling 5 subsamples taken from 5 points of each compost pile. At the moment of sampling, average temperature was 65.8 °C and 67.2 °C for ZC1 and ZC2, respectively, and pH was 7.0 for both. Samples were stored at -80 °C until DNA extraction. Aliquots of the ZC1 and ZC2 samples were lyophilized and macerated, and approximately 2 g of dried material was used for DNA extraction with MoBio DNA Power Soil kit (MoBio Laboratories, Carlsbad, CA). Some samples (including ZC2, but not ZC1) were pre-treated with lysozyme, Proteinase K and sodium dodecyl sulfate prior to purification with the MoBio kit. The critical step for DNA extraction was the maceration with grinding mortar and pestle, and both ZC1 and ZC2 samples were macerated under the same conditions. Mechanical cell lysing through maceration was shown to be more effective than chemical or enzymatic lysing. Thus we believe it is highly unlikely that enzymatic pre-treatment in the DNA extraction procedure would have favored DNA extraction of selected bacterial groups. DNA purity and concentration were analyzed by spectrophotometric quantification at 260 nm, 280 nm and 230 nm and using Invitrogen's Quant-iT Picogreen dsDNA BR assay kit. Metagenomic DNA integrity was examined using Agilent Bioanalyser DNA 7500 LabChip.

Pyrosequencing and Sequence Analysis
The two DNA samples (500 ng) were submitted to pyrosequencing following standard Roche 454 GS FLX Titanium protocols (Roche Applied Science). Shotgun libraries for ZC1 and ZC2 DNA were constructed using the GS Titanium Rapid Library Prep Kit and submitted to four sequencing runs. Sequencing reads were quality-filtered and assembled using the 454 Newbler assembler software, version 2.5.3. The resulting sets of contigs (including singlets) were submitted to the IMG/M annotation pipeline [80]. Unassembled raw reads were submitted for annotation to the MG-RAST metagenomics analysis server [40] using its default quality control pipeline.
Microbial composition analyses were performed using the MG-RAST best hit classification tool against the databases M5RNA (Non-redundant multisource ribosomal RNA annotation) or RDP (Ribosomal Database Project) available within MG-RAST (version 3.2.4.2) [40], using a minimum identity of 98%, a maximum e-value cutoff of 10⁻³⁰ and a minimum alignment length of 50 bp. Analyses were also done against M5NR (M5 non-redundant protein) using a minimum identity of 60%, a maximum e-value cutoff of 10⁻⁵ and a minimum alignment length of 50 bp.
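The three cutoffs above act as a joint filter on each best hit. As a minimal sketch of that logic (not the MG-RAST implementation), the following Python code applies the thresholds quoted in the text; the `Hit` record, the field names and the two cutoff dictionaries are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One best hit for a read (hypothetical record layout)."""
    read_id: str
    identity: float   # percent identity of the alignment
    evalue: float     # e-value reported for the alignment
    aln_length: int   # alignment length, in the units used by the text


# Thresholds quoted in the text; the dictionary names are illustrative.
RRNA_CUTOFFS = {"min_identity": 98.0, "max_evalue": 1e-30, "min_length": 50}
PROTEIN_CUTOFFS = {"min_identity": 60.0, "max_evalue": 1e-5, "min_length": 50}


def passes(hit, min_identity, max_evalue, min_length):
    """A hit is kept only if it satisfies all three cutoffs at once."""
    return (
        hit.identity >= min_identity
        and hit.evalue <= max_evalue
        and hit.aln_length >= min_length
    )


def filter_hits(hits, cutoffs):
    """Return the hits that satisfy every cutoff, preserving input order."""
    return [h for h in hits if passes(h, **cutoffs)]
```

A call such as `filter_hits(hits, RRNA_CUTOFFS)` would then keep only the rRNA hits that meet the stricter identity and e-value requirements, while `PROTEIN_CUTOFFS` reflects the more permissive M5NR settings.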
Bacterial taxonomy classification and rarefaction were obtained using rRNA-related sequences retrieved from the whole metagenomic sequence data sets (4,420 sequences for ZC1 and 5,616 sequences for ZC2, annotated as rRNA-related by MG-RAST) and the Classifier and PYRO pipeline tools in the Ribosomal Database Project [41].
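For readers unfamiliar with rarefaction, the idea can be illustrated with the standard analytical (Hurlbert) formula for the expected number of taxa in a random subsample of n sequences. This is only a sketch of the general technique, assuming per-taxon sequence counts are available; it is not the procedure implemented by the RDP tools, and the function name is hypothetical.

```python
from math import comb


def expected_richness(counts, n):
    """Expected number of taxa observed in a random subsample of n sequences.

    counts: number of sequences assigned to each taxon.
    Uses the analytical rarefaction formula
    E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), with N = sum(counts).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(k, n) is 0 when k < n, so taxa too abundant to be missed contribute 1.
    return sum(1 - comb(total - c, n) / denom for c in counts)
```

Evaluating `expected_richness` over a range of subsample sizes gives the points of a rarefaction curve for one sample.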
Lactobacillus species identification in ZC2 was done by comparing ZC2 reads using BLAST against three different databases. The first was the RDP database of 16S rRNA sequences (version 10) [41]; the second was the NT database from GenBank (downloaded on 6/19/2012); and the third was the M5NR database available within MG-RAST (version 3.2.4.2) [40]. For the RDP and NT databases (searched with BLASTN) we used the following conservative criteria: we only considered alignments with at least 200 positions and at least 98% identity to subject sequences, and only comparison results in which a defined Lactobacillus species (as opposed to Lactobacillus sp.) was the first hit. Moreover, a species assignment was considered positive only when the bit score of the first hit was larger than the bit score of the second hit (hits were sorted by bit score) and when there were at least five different reads supporting the assignment (for RDP) or at least 50 (for NT). The criteria for species assignment against the M5NR database (searched with BLASTX) were those adopted by the MG-RAST pipeline. In defining the final species tally we considered only our results based on the RDP and NT databases, although we do report the M5NR number of hits as well (in Table S4). We also used the software MetaPhlAn [86] to confirm these identifications and to provide abundance figures.
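The BLASTN criteria above can be read as a per-read decision followed by a per-species read count. The sketch below is one possible reading of those rules, not the scripts actually used: the `BlastHit` record and all function names are hypothetical, length and identity filters are applied before ranking, and a read with a single qualifying hit is assumed to be acceptable (the text does not cover that case).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLASTN alignment for a read (hypothetical record layout)."""
    read_id: str
    subject_species: str   # e.g. "Lactobacillus brevis" or "Lactobacillus sp."
    identity: float        # percent identity
    aln_length: int        # aligned positions
    bit_score: float


def is_defined_lactobacillus(name):
    """True for a named Lactobacillus species, False for 'Lactobacillus sp.'."""
    parts = name.split()
    return len(parts) >= 2 and parts[0] == "Lactobacillus" and parts[1] not in ("sp.", "sp")


def assign_read(hits, min_length=200, min_identity=98.0):
    """Return the species assigned to one read, or None if the criteria fail."""
    ok = [h for h in hits if h.aln_length >= min_length and h.identity >= min_identity]
    ranked = sorted(ok, key=lambda h: h.bit_score, reverse=True)
    if not ranked or not is_defined_lactobacillus(ranked[0].subject_species):
        return None
    # The first hit must score strictly higher than the second one.
    # Assumption: a read with only one qualifying hit is accepted.
    if len(ranked) > 1 and ranked[0].bit_score <= ranked[1].bit_score:
        return None
    return " ".join(ranked[0].subject_species.split()[:2])


def tally_species(hits_by_read, min_reads):
    """Count supporting reads per species; keep species with >= min_reads."""
    counts = Counter()
    for hits in hits_by_read.values():
        species = assign_read(hits)
        if species is not None:
            counts[species] += 1
    return {sp: n for sp, n in counts.items() if n >= min_reads}
```

Under this reading, `min_reads` would be set to 5 for searches against RDP and to 50 for searches against NT.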
Functional classification and comparative analyses of metagenomes were performed based on COG categories, Pfam families and EC numbers for the metagenomic data sets annotated by the IMG/M pipeline [80], using the function comparison tool and its statistical parameters (binomial test). For all tests of statistical overrepresentation we used a maximum p-value of 0.05.
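To make the overrepresentation criterion concrete, the following sketch shows a generic one-sided binomial test at the 0.05 level. It is an illustration under stated assumptions, not a reproduction of the IMG/M function comparison tool: the null proportion is taken from the second metagenome, and all function and parameter names are hypothetical.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Natural log of P(X = k) for X ~ Binomial(n, p)."""
    if p <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p >= 1.0:
        return 0.0 if k == n else float("-inf")
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_coeff + k * log(p) + (n - k) * log(1.0 - p)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by direct summation of the tail."""
    return min(1.0, sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1)))


def is_overrepresented(count_a, total_a, count_b, total_b, alpha=0.05):
    """Test whether a category is overrepresented in A relative to B.

    Assumption: the proportion observed in B is used as the null probability.
    """
    p_ref = count_b / total_b
    return binomial_upper_tail(count_a, total_a, p_ref) <= alpha
```

Direct summation is adequate for small counts; for data sets with millions of assignments, a statistics library with an exact or approximate binomial tail would be the more practical choice.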

Protein Sequence Comparison and Alignments
Protein-coding gene sequences retrieved from IMG/M were further compared against the NR database of GenBank [87] using BLAST [88] with a maximum e-value of 10⁻⁵ and aligned to orthologs using ClustalW [89].

Hierarchical Clustering
Hierarchical clustering was performed on a matrix of the number of reads assigned to COGs in each metagenome, using the "Compare Genomes" tool in IMG/M [80], which uses uncentered correlation as the distance measure and pairwise single-linkage clustering.
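The two ingredients named above, uncentered correlation and single linkage, can be sketched in a few lines. The code below is a naive illustration and not the IMG/M implementation; it assumes the distance is taken as one minus the uncentered correlation, and the function names and the `profiles` layout (metagenome name mapped to a list of per-COG read counts) are hypothetical.

```python
from math import sqrt


def uncentered_correlation(x, y):
    """Correlation computed without subtracting the means (cosine similarity)."""
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    if den == 0:
        raise ValueError("profiles must not be all zeros")
    return num / den


def single_linkage(profiles):
    """Agglomerative clustering of COG count profiles.

    profiles: dict mapping metagenome name -> list of read counts per COG.
    Returns the merges in order as (cluster_a, cluster_b, distance) tuples,
    where clusters are tuples of metagenome names.
    """
    clusters = [(name,) for name in profiles]

    def dist(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(
            1.0 - uncentered_correlation(profiles[a], profiles[b])
            for a in c1
            for b in c2
        )

    merges = []
    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

Because the uncentered correlation depends only on the direction of each count vector, two metagenomes with proportional COG profiles are at distance zero regardless of sequencing depth.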