The Metagenome of an Anaerobic Microbial Community Decomposing Poplar Wood Chips

This study describes the composition and metabolic potential of a lignocellulosic biomass degrading community that decays poplar wood chips under anaerobic conditions. We examined the community that developed on poplar biomass in a non-aerated bioreactor over the course of a year, with no microbial inoculation other than the naturally occurring organisms on the woody material. The composition of this community contrasts in important ways with biomass-degrading communities associated with higher organisms, which have evolved over millions of years into a symbiotic relationship. Both mammalian and insect hosts provide partial size reduction, chemical treatments (low or high pH environments), and complex enzymatic ‘secretomes’ that improve microbial access to cell wall polymers. We hypothesized that, in order to efficiently degrade coarse untreated biomass, a spontaneously assembled free-living community must both employ alternative strategies, such as enzymatic lignin depolymerization, for accessing hemicellulose and cellulose, and have a much broader metabolic potential than host-associated communities. This would suggest that such a community would make a valuable resource for finding new catalytic functions involved in biomass decomposition and for gaining new insight into the poorly understood process of anaerobic lignin depolymerization. Therefore, in addition to determining the major players in this community, our work specifically aimed to identify functions potentially involved in the depolymerization of cellulose, hemicelluloses, and lignin, and to assign specific roles to the prevalent community members in the collaborative process of biomass decomposition. A bacterium similar to Magnetospirillum was identified among the dominant community members, which could play a key role in the anaerobic breakdown of aromatic compounds. We suggest that these compounds are released from the lignin fraction in poplar hardwood during the decay process, which would point to lignin modification or depolymerization under anaerobic conditions.


Introduction
In recent years, biofuels have attracted great interest as an alternative, renewable source of energy in the face of the ongoing depletion of fossil fuels, our energy dependence on them, and our growing environmental awareness of the negative consequences of burning such fuels. Plant biomass represents a globally abundant carbon source with the potential to provide a sustainable source of mixed sugars for biofuels production [Billion Ton Report; http://feedstockreview.ornl.gov/pdf/billion_ton_vision.pdf]. However, breakthrough technologies are still needed to overcome the critical barriers to developing cost-effective processes for converting recalcitrant biomass to fuels and chemicals, such as the high cost of commercially available lignocellulolytic enzymes and hardware for thermal chemical pretreatment [1].
In nature, cellulosic biomass is decomposed by complex and efficient processes, usually by microbial communities that produce cellulolytic enzymes that function synergistically to decompose plant biomass [2][3][4]. The enzymes primarily involved in plant biomass breakdown are the glycoside hydrolases (also known as glycosidases), which catalyze the hydrolysis of the β-(1-4) main chain bonds in cellulose and hemicelluloses, as well as the considerable diversity of side chain bonds in hardwood hemicellulose [5]. In the decomposition of terrestrial biomass, bacterial and fungal communities often occupy various temporal stages of the conversion process, depending on the microenvironment of the decay site [6][7][8][9].
Prokaryotic microorganisms have evolved and accumulated remarkable physiological and functional diversity, and thereby harbor a major reserve of genetic potential. The traditional method to tap this reservoir of functional potential is by cultivating microorganisms and screening individuals for the requisite phenotypes. However, 95 to 99.9% of microorganisms are not readily cultured by standard laboratory techniques [10]. Metagenomic approaches provide a powerful tool to bypass the limitation of cultivation-based methodologies, as they make it possible to predict the biochemical potential of a particular community or to prospect novel biocatalysts from environmental samples for experimental validation and characterization [11][12][13][14][15][16][17], including those involved in biomass decomposition [13,18]. In the current study, we examined a free-living microbial consortium capable of degrading poplar biomass in the absence of oxygen or other nutrient sources, which, we hypothesized, would possess a rich supply of enzymes for degrading cellulose, hemicelluloses and possibly lignin. We generated several hundred megabases of metagenomic sequence data, which proved sufficient to assemble draft genomes of dominant community members and annotate several thousand putative biomass degrading genes.

Imaging of biomass decomposition
After one year of incubation at 30°C in a closed system, which created a gradient from micro-aerobic to anaerobic, the poplar biomass showed strong visual signs of decomposition. The volume of the biomass had compacted by approximately 25%, the biomass particles had become softer, and the color had changed from light to dark brown (see Fig. S1 and Methods S1 for sample preparation in the supplementary materials). To confirm the decay process, transmission electron microscopy (TEM) was used to examine the structural integrity of the biomass at the cell wall level. By comparing samples from untreated (Fig. 1A) and incubated biomass (Fig. 1B), clear signs of cell wall decomposition were observed. More detailed imaging revealed attached microbial cells (Fig. 2A), indentations in the cell wall (Fig. 2B), and invading cells covered with cellulosome-like structures (Fig. 2C).
To further investigate the effects of bioreactor incubation on cell wall structure, a combination of imaging techniques was used. Coherent anti-Stokes Raman (CARS) microscopy, used to visualize the lignin, showed that microbial digestion under anaerobic conditions resulted in a relative enhancement of the lignin signal (Fig. 3), thus pointing towards preferential degradation of hemicellulose and cellulose. The unchanged thickness of the plant cell walls in the anaerobic samples suggests that lignin breakdown in the bioreactor is limited, but sufficient to gain access to the cellulose and hemicellulose. Degradation of cellulose and hemicellulose was confirmed by immunostaining with fluorescent antibodies followed by epifluorescent light microscopy (Fig. S2 and Methods S1). In comparison with untreated biomass, samples from the composted poplar biomass showed a decrease in the fluorescent signals for both cellulose and hemicellulose. CARS microscopy also showed that under aerobic conditions, in the upper part of the bioreactor, polysaccharides and lignin were simultaneously degraded and/or modified, resulting in a thinner plant cell wall that showed relatively unchanged lignin signal intensity as compared to that of untreated cell walls.

Metagenome sequencing
DNA extracted from the bioreactor material was subjected to extensive 16S/18S profiling by V6-V8 pyrotag sequencing (~1.5 million tags). To characterize the functional diversity and potential of the community, we performed metagenome sequencing on both the Sanger and 454 sequencing platforms (see Methods). A total of approximately 675 Mb of sequence was generated, and assembly resulted in 44,600 contigs and 1.24 M singletons, totaling 382 Mb, that contain a total of 893,380 putative genes. A summary of the metagenome sequencing statistics is provided in Table 1.

Microbial community composition
Analysis of 1,516,612 V6-V8 pyrotags (after filtering for length and quality; see Methods) indicated the presence of a diverse community degrading the poplar biomass under anaerobic conditions (see Fig. 4). As we found in the metagenome binning, the Firmicutes and Proteobacteria represented the most abundant microbial phyla (with 45.9% and 32.3%, respectively), followed by the Bacteroidetes (9.9%). The Firmicutes were dominated by the Clostridia, which form the most abundant class of microbes in this community. The Clostridia are mainly represented by members of the order Clostridiales (37.5%), which was dominated by members of the families Clostridiaceae (9.9%) and Catabacteriaceae (15.8%). Of the Clostridiales, species of the genus Clostridium, which encompasses numerous highly active anaerobic cell wall degraders (e.g. C. cellulolyticum and C. thermocellum), comprised nearly 64% of this order (6.3% of the total community).
The Proteobacteria, representing 32.4% of the total community, were dominated by the alpha-Proteobacteria, which comprise 28.2% of the tag data. Of this class, 22.9% belonged to the order Rhodospirillales, the family Rhodospirillaceae (21.1%) and the genus Magnetospirillum (19.9%), making this the most abundant genus in the community. The Bacteroidetes are the third most abundant group of microorganisms, representing 9.9% of the total community. Finally, we also observed a group of Cyanobacteria, which represented approximately 4% of the community. The Archaea formed only a small part of the total pyrotag data, largely because of a mismatch in the forward primer with most archaeal lineages. We found 11,155 tags representing Euryarchaeota, of which 10,258 tags could be assigned to the Methanomicrobia. Only 678 tags were derived from Eukaryota, indicating that fungi do not play a major role in the decomposition of biomass in this community.
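The tag counts quoted above can be converted into community shares with simple arithmetic. The following Python sketch only illustrates that conversion; the function name `relative_abundance` is hypothetical and is not part of the study's pipeline, while the counts are those reported in the text.

```python
def relative_abundance(tag_count: int, total_tags: int) -> float:
    """Return the share of tags assigned to a group, as a percentage."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return 100.0 * tag_count / total_tags


# Quality-filtered V6-V8 pyrotags reported in the text.
TOTAL_TAGS = 1_516_612

# Tag counts quoted in the text for the poorly represented groups.
counts = {
    "Euryarchaeota": 11_155,
    "Methanomicrobia": 10_258,
    "Eukaryota": 678,
}

for group, n in counts.items():
    print(f"{group}: {relative_abundance(n, TOTAL_TAGS):.2f}% of all tags")
```

On these numbers the Euryarchaeota account for well under 1% of the tags, and the Eukaryota for less than 0.1%, which is the basis for the statement that fungi are at most a minor component of the pyrotag data.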

Metagenome assembly and binning
This deep sequencing effort resulted in significant assembly, such that >52% of the quality filtered, non-redundant reads assembled into 34,156 contigs greater than 500 bp in length totaling 46.6 Mb in size. While many of these contigs were small (average size 1364 bp), there were many well-assembled contigs of up to ~80 kb and ~20× depth when all the data were assembled with Newbler [19]. Moreover, there were several scaffolds over 1 Mb in size that could represent draft genomes of bioreactor populations.
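As a quick consistency check on these assembly statistics, the reported totals (34,156 contigs, 46.6 Mb) imply a mean contig length close to the stated 1364 bp. The Python sketch below shows that calculation and adds an N50 helper. N50 is not reported in this study, and the toy contig lengths are invented purely for illustration.

```python
def mean_length(total_bases: float, n_contigs: int) -> float:
    """Average contig length in bp from summary totals."""
    return total_bases / n_contigs


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Totals taken from the text: 46.6 Mb in 34,156 contigs.
print(round(mean_length(46.6e6, 34_156)))  # 1364, matching the reported average

# Hypothetical contig lengths (bp), not data from the study.
toy_lengths = [80_000, 5_000, 1_500, 900, 600, 500]
print(n50(toy_lengths))
```

A mean dominated by many short contigs alongside a few long ones, as in the toy list, is the pattern described in the text.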
Phylogenetic marker genes, including 16S ribosomal RNA and ribosomal protein genes, were identified in the sequence data, and the contigs/scaffolds in which they resided were used to train a phylogenetic classifier using ClaMS (Classifier for Metagenome Sequences; http://clams.jgi-psf.org/). Binning of the larger contigs (at least 1 kb) using trinucleotide distributions resulted in the formation of six well defined clusters thought to represent relatives of Magnetospirillum, Bacteroides, two distinct Clostridiales, Cyanobacteria and Methanomicrobiales (Table 2). For 36% of the contigs greater than 1 kb in length, no match to any of the major clusters was obtained. Dominance of the biomass degrading community by species belonging to the classes Clostridia, Bacteroidetes and Methanomicrobiales was also described for the metagenome of a microbial community of a production-scale biogas plant fermenter [20]. In the case of the poplar biomass degrading community, the dominance of Magnetospirillum (present at ~10× to 20× coverage in the metagenome assembly), putatively involved in biomass decomposition, is unique, as is the absence of fungi. This observation opens the possibility for the existence of new mechanisms for biomass decomposition, including lignin depolymerization, under anaerobic conditions. These data largely confirmed the results of community profiling by pyrotag sequencing, with the exception of Methanomicrobiales, an archaeal group which was poorly targeted by the pyrotag primers used.
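To illustrate what a trinucleotide distribution is, the Python sketch below computes a 64-dimensional frequency profile for a sequence and a simple distance between two profiles. This is not the ClaMS implementation: the function names, the Euclidean distance, and the decision to ignore reverse-complement symmetry are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
import math

# All 64 trinucleotides in a fixed order, so profiles are comparable.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_profile(seq: str) -> list:
    """Relative frequency of each trinucleotide; windows with other symbols are skipped."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[k] for k in TRINUCLEOTIDES)
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[k] / total for k in TRINUCLEOTIDES]


def profile_distance(p, q) -> float:
    """Euclidean distance between two profiles (an assumed, illustrative metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical sequences: one GC-rich, one AT-rich.
gc_rich = "GCGCGGCCGCGGCGCCGGCG"
at_rich = "ATATTAATTATAAATTTATA"
print(profile_distance(trinucleotide_profile(gc_rich), trinucleotide_profile(at_rich)))
```

In a composition-based binning scheme, contigs whose profiles lie close to the profile of a training set (here, contigs carrying phylogenetic marker genes) would be assigned to the same cluster.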

Bioprospecting for GHases and other enzymes involved in biomass breakdown
Genes encoding enzymes putatively involved in the decomposition of recalcitrant plant biomass or in modifying carbohydrates were identified based on their similarity to carbohydrate active enzymes (glycoside hydrolases, glycosyl transferases) and fungal oxidoreductases as described in the CAZy (http://www.cazy.org/) [22] and FOLy (http://foly.esil.univ-mrs.fr) [21] databases, respectively. Of the 888,455 protein-coding genes identified by IMG/M, 28,793 candidate genes were identified based on BlastP homology with CAZy or FOLy genes (e-value < 1e-10), representing a total of 230 gene families. The 22 most dominant gene families contained 19,510 candidate genes, representing 67.8% of the candidate genes. Candidate genes belonging to the glycosyl transferase families GT2 (4,354) and GT4 (4,178) were the most abundant, followed by members of the glycoside hydrolase families GH13 (1,381), GH3 (832) and GH2 (814) (see Fig. 5). The glycosyl transferases are ubiquitous enzymes that catalyze the attachment of sugars to an aglycone [23,24] and are not thought to be involved in cellulose and hemicellulose hydrolysis. The dominance of candidate members of the glycoside hydrolase family 13 is not surprising, as it is the largest sequence-based family of glycoside hydrolases and groups together a number of different enzyme activities and substrate specificities acting on α-glycosidic bonds, including hydrolases, transglycosidases and isomerases [25]. The Family 3 enzymes, also quite abundant in this community, have been classified as β-D-glucosidases, α-L-arabinofuranosidases, β-D-xylopyranosidases and N-acetyl-β-D-glucosaminidases [26]. In many cases, the enzymes have dual or broad substrate specificities with respect to monosaccharide residues, linkage position and chain length of the substrate, such as both α-L-arabinofuranosidase and β-D-xylopyranosidase activity [27].
The most common activities for bacterial Family 2 glycoside hydrolases include β-D-galactosidases, β-glucuronidases, β-D-mannosidases, and exo-β-glucosaminidases. Members of these GHase families seem to play a key role in the breakdown of hemicellulose and disaccharides. Members of cellulose degrading gene families GH5, 9, 44, 46, 51, and 74 (which also degrades xyloglucan) are represented in the consortium, although not at high levels. The GH family 10 enzyme, known to hydrolyze the β-(1-4) glycosidic linkages in xylan, is also observed. Recently, functional verification of a small subset of the predicted biomass-degrading genes, which were selected based on homology to plant biomass-degrading GHase families/activities of interest and the quality of their sequences, was performed, and resulted in the cloning of four novel GHases that could be successfully expressed in E. coli [28]. Many candidate genes displayed some degree of homology to fungal oxidase families catalogued in the FOLy database including the Families LO3 (including cellobiose dehydrogenases and dihydrolipoamide dehydrogenases; 335 genes), LO2 (peroxidases, 231 genes) and LO1 (multi-copper oxidases, 35 genes) [22]. However, sequence composition based binning, as well as closest blast hits, indicated that most, if not all, of these genes were derived from bacterial members of the community. Indeed, very few putative fungal genes involved in biomass decomposition were found, which is consistent with the apparent lack of candidate fungal GHase Family 7 members in this community or fungal 18S rRNA genes as determined by pyrotag sequencing.
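The candidate selection described above (BlastP hits against CAZy or FOLy with an e-value below 1e-10, then tallied per family) can be sketched as follows in Python. The sketch assumes BLAST tabular output with the standard 12 columns, where the e-value is the 11th field. The subject-to-family lookup table, the rule of keeping only the best hit per gene, and all identifiers in the example are hypothetical and are not taken from the study.

```python
import csv
import io
from collections import Counter

EVALUE_CUTOFF = 1e-10  # threshold stated in the text


def count_family_hits(tabular_text, subject_to_family, cutoff=EVALUE_CUTOFF):
    """Count query genes per enzyme family, keeping the best qualifying hit per gene."""
    best = {}  # query id -> (e-value, family)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip malformed or empty lines
        query, subject, evalue = row[0], row[1], float(row[10])
        family = subject_to_family.get(subject)
        if family is None or evalue >= cutoff:
            continue
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, family)
    return Counter(family for _, family in best.values())


# Hypothetical hits: gene2 fails the e-value cutoff.
rows = [
    ["gene1", "subjA", "45.0", "300", "0", "0", "1", "300", "1", "300", "1e-50", "200"],
    ["gene1", "subjB", "40.0", "300", "0", "0", "1", "300", "1", "300", "1e-20", "120"],
    ["gene2", "subjB", "35.0", "250", "0", "0", "1", "250", "1", "250", "1e-5", "40"],
    ["gene3", "subjC", "50.0", "280", "0", "0", "1", "280", "1", "280", "1e-30", "150"],
]
example = "\n".join("\t".join(r) for r in rows)
families = {"subjA": "GH13", "subjB": "GH3", "subjC": "GT2"}
print(count_family_hits(example, families))

# Share of candidates in the 22 most dominant families, from the reported totals.
print(f"{100 * 19_510 / 28_793:.1f}%")  # 67.8%
```

Counting each gene once under its best hit is one reasonable convention; a gene with several domains could instead be counted in more than one family, and the text does not state which rule was used.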

Comparison to other metagenomes
To put the biomass degrading potential of the bioreactor community decaying poplar woodchips in perspective, we compared it to metagenome data from several other communities expected to have some biomass degrading activity, including the microbiomes of compost [29], rhizosphere soils from maize, miscanthus and switchgrass (unpublished, see GOLD CARD Gm00349), the top and bottom part of the fungal garden of the leaf-cutter ant (Atta colombica) [30], cow rumen [18], and the gut communities from termite [31], wallaby [32], canine [33], human [34] and mouse [35].
Rather than comparing various metagenomes on the species level, we used the distribution and relative abundance of glycoside hydrolase families as variables for metagenome comparison on the level of functionality related to biomass breakdown. This information was obtained for the various metagenome sequence sets by performing Blastx searches against the CAZy database (to avoid biases from different gene prediction algorithms used), and was subsequently used to calculate the correlation distances between the various metagenomes (Figure 6). The use of datasets that differ significantly in the amount of sequence data was accounted for by (i) performing CAZy blasts with overlapping 300 bp subsequences from the assembled contigs and unassembled reads, to avoid a bias toward gene discovery in longer sequences; (ii) multiplying by read depth when hits occurred on assembled contigs to account for abundance in raw data; (iii) normalizing all counts to the total amount of sequence data weighted by contig depth (a sum of total bases multiplied by depth for each contig). Our analysis indicated that total frequency of CAZy database hits per million nucleotides was relatively consistent across datasets, suggesting that total dataset size was not having a dramatic effect. Moreover, we did not see a tendency of similar size datasets to cluster together.
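The three normalization steps and the distance calculation can be sketched in Python as below. Several details are assumptions, since the text does not specify them: the 150 bp step between overlapping windows, the use of one minus the Pearson correlation coefficient as the correlation distance, and all function names and example values.

```python
import math


def windows(seq, size=300, step=150):
    """Step (i): cut a sequence into overlapping windows (the step size is assumed)."""
    if len(seq) <= size:
        return [seq]
    starts = list(range(0, len(seq) - size + 1, step))
    if starts[-1] != len(seq) - size:
        starts.append(len(seq) - size)  # make sure the 3' end is covered
    return [seq[s:s + size] for s in starts]


def hits_per_million(hits, depth_weighted_bases):
    """Steps (ii) and (iii): weight each hit by depth, then scale per million weighted bases.

    `hits` is an iterable of (family, depth) pairs; depth is 1 for unassembled reads.
    """
    totals = {}
    for family, depth in hits:
        totals[family] = totals.get(family, 0.0) + depth
    return {fam: w * 1e6 / depth_weighted_bases for fam, w in totals.items()}


def correlation_distance(x, y):
    """One minus the Pearson correlation between two family-abundance vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return 1.0 - cov / math.sqrt(vx * vy)


# Hypothetical abundance vectors over the same ordered list of GH families.
community_a = [21.0, 2.0, 7.5, 0.0]
community_b = [18.0, 3.0, 9.0, 0.5]
print(correlation_distance(community_a, community_b))
```

With this definition, two communities with proportional family profiles have a distance near 0 regardless of dataset size, which is the property the normalization is meant to protect.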
Clearly, the gut communities form a distinct group from the free-living biomass decay communities, even though genes involved in host-symbiont interactions were not part of the analysis. Notably, this is the case despite the very different sequencing technologies and read lengths used for these projects (e.g. Sanger for termite and wallaby; 454 for human, dog and mouse; and Illumina for cow rumen). In addition, the microbial communities found in compost and decaying poplar woodchips show, at least on the level of glycoside hydrolase abundance, a close affiliation. A close correlation is also observed for the glycoside hydrolase distributions among the maize, miscanthus and switchgrass associated rhizosphere communities, as well as the microbiomes of the top and bottom part of the fungal garden of the leaf-cutter ant (Atta colombica) [30]. Within the gut communities, the distribution of GHases in the termite hindgut microbial community is distinct from that seen in mammals.
To reveal the basis for these differences on a gene level, the total coverage per family of biomass modifying enzymes, which included cellulases, hemicellulases, debranching enzymes [21] and enzymes homologous to lignolytic enzymes [22], was compared for the various metagenomes. The comparisons for the poplar woodchip bioreactor community with those in compost, cow rumen, termite hindgut, and fungal garden bottom are presented in Figures 7A, 7B, 7C and 7D, respectively. Strikingly, a near-complete lack of putative lignin degrading enzymes was shared by all the gut communities examined (wallaby, dog, human, mouse and even cow rumen).

GHase distribution over phylogenetic groups
Cultivation-based approaches did not result in the isolation of members of the six major phylogenetic groups as identified based on binning of the major metagenome contigs. Although we successfully isolated various members of the genus Clostridium, sequencing of their 16S rRNA genes placed them outside the two dominant Clostridiales clusters. In order to determine the putative roles of members of the six dominant microbial clusters in poplar biomass decomposition, the distribution of the biomass modifying enzyme families, which included cellulases, hemicellulases, debranching enzymes and enzymes homologous to lignolytic enzymes, was compared for the various clusters. The data are presented in Table 3.
The largest number of GHases was found on the contigs assigned to the Bacteroidetes, indicating that members of this phylum are key biomass degraders in the community. This is consistent with their predominant role as biomass decomposers in other biotopes, such as the cow rumen. Also, some key functionalities for the breakdown of hardwoods (summarized in Table 4), such as the hydrolysis of (1-2)-α-D-(4-O-methyl)glucuronosyl links in the main chain of hardwood xylans by enzymes of the families GH67 and GH115, are only found in contigs assigned to the Bacteroidetes. As for lignin breakdown, genes showing homology to LO3 are widespread among the dominant members of the community. However, the distribution of other putative lignin degrading genes is very patchy: homologs to LO1 multi-copper oxidases were only found among the contigs assigned to the cyanobacteria, whereas the LO2 peroxidase homologs were found on contigs assigned to the cyanobacteria and one cluster of Clostridiales.
Figure 6. Calculated correlation distances between the metagenomes of the microbiomes associated with the poplar biomass decay community, compost, rhizosphere soils from maize, miscanthus and switchgrass, the top and bottom part of the fungal garden of the leaf-cutter ant (Atta colombica), cow rumen, and the gut communities from termite, wallaby, canine, human and mouse. The distribution and relative abundance of glycoside hydrolase families were used as variables for metagenome comparison on the level of functionality related to biomass breakdown. This information was obtained for the various metagenome sequence sets by performing Blastx against the CAZy database, and was subsequently used to calculate the correlation distances between the various metagenomes. doi:10.1371/journal.pone.0036740.g006
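The question of which enzyme families occur in only one phylogenetic bin is a set-difference problem. The Python sketch below shows one way to compute it. The bin names and family assignments are a toy arrangement that merely mirrors the pattern described in the text (GH67 and GH115 only in the Bacteroidetes bin, LO1 only in the cyanobacterial bin, LO2 in two bins, LO3 widespread); it is not the content of Table 3, and the function name is hypothetical.

```python
def exclusive_families(bin_to_families):
    """Map each bin to the families found in that bin and in no other bin."""
    result = {}
    for name, families in bin_to_families.items():
        others = set()
        for other_name, other_families in bin_to_families.items():
            if other_name != name:
                others |= set(other_families)
        result[name] = set(families) - others
    return result


# Toy assignments for illustration only.
bins = {
    "Bacteroidetes": {"GH67", "GH115", "GH3", "LO3"},
    "Clostridiales_1": {"GH3", "GH5", "LO2", "LO3"},
    "Clostridiales_2": {"GH5", "LO3"},
    "Cyanobacteria": {"LO1", "LO2", "LO3"},
}

for name, families in exclusive_families(bins).items():
    print(name, sorted(families))
```

Families that appear in every bin, such as LO3 in this toy example, never show up in the exclusive sets, which matches the contrast drawn in the text between widespread and patchy functions.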

Discussion
Natural enrichment resulted in a microbial community that degraded poplar under anaerobic conditions, as was evident from strong visual signs of decomposition of the bulk biomass, enzymatic attack of the cell walls, and the disappearance of cellulose and hemicellulose. Therefore, this community must contain the necessary functionality to break down the various polymers found in poplar hardwood, including the major polysaccharides xylan and glucomannan as part of the hemicellulose, and cellulose, in addition to less abundant polysaccharides. The complex network of these polysaccharides and their side chains, and the presence of lignin, all contribute to the recalcitrance of the poplar biomass. An overview of the poplar hardwood polysaccharides and the key enzymes that are theoretically required for the efficient decomposition of poplar hardwood is presented in Table 4. The enzymatic requirements for the breakdown of poplar hardwood were compared with the presence of glycoside hydrolases among the dominant members of the biomass decomposing community (Table 3). In general, the contigs assigned to Bacteroides contained the largest number of unique putative glycoside hydrolases. We therefore hypothesize that the Bacteroides-like community members play a major role in the breakdown of cellulose and hemicelluloses under these conditions. Members of this phylum also play an important role in other anaerobic microbiomes with high biomass turnover rates, including the cow rumen [36]. Also, the two Clostridiales appear, based on the presence of various putative GHases, to play a key role in the breakdown of these two recalcitrant plant cell wall polysaccharides. This is not unexpected, as many members of the Clostridiales have been found in environments with high plant biomass turnover rates.
Table 4. Overview of the critical polysaccharide bonds found in poplar hardwood as well as the glycoside hydrolases that act on these bonds.

Because the cellulose and hemicellulose polymers are complexed with lignin, the anaerobic microbes must possess mechanisms to access these polysaccharides, most likely via mechanisms that allow for local depolymerization of lignin, as no major removal of lignin from the cell walls was observed. Despite the lack of enzymes involved in lignin decomposition, the presence of bacterial genes that show homology to fungal lignin oxidases might provide some insights into this process. Homologs to the LO3 family (FOLy database) were common among the six dominant members of the community studied. Most of the genes with similarity to this family are annotated as dihydrolipoamide dehydrogenase or cellobiose dehydrogenase, or have more general annotations, as this is a large family of proteins only some of which are involved in lignin breakdown. However, putative cellobiose dehydrogenases can very well be involved in lignin breakdown, as they were found to be able to interact with lignin in three important ways: (1) to break beta-ethers; (2) to demethoxylate aromatic structures in lignins; and (3) to introduce hydroxyl groups in non-phenolic lignins [37,38]. The presence of LO2 type peroxidases in the Clostridiales genome bins might hint towards an anaerobic mechanism for lignin depolymerization, since members of the Clostridiales require strict anaerobic conditions for their growth.
Note to Table 4. This database was assembled by first cataloging the known glycosidic bonds in hardwoods (cellulose, hemicelluloses, and pectins) and then correlating reported enzyme activities from the Enzyme Commission (EC) database known or suspected to hydrolyze these bonds [67][68][69].
In addition, genes with homology to the two Dyp-like peroxidase genes, involved in lignin degradation [39] by Rhodococcus jostii RHA1, were identified on low-coverage contigs and individual reads, suggesting that minor community members may also be important in local depolymerization of lignin, rendering the hemicellulose and cellulose available to depolymerization by the glycoside hydrolases that are synthesized by many members of the community. In fungi, the general role of lignin degrading enzymes is perhaps becoming clearer. Vanden-Wymelenberg et al. reported upregulated Phanerochaete chrysosporium genes in piles of decaying ball milled pine and the hardwood, aspen [40]. The piles of hardwood (aspen) displayed high transcript levels for a glucose oxidase-like oxidoreductase, a catalase, and an alcohol oxidase [40]. P. chrysosporium oxidoreductase transcripts, found when grown on pine, included the copper radical oxidase CRO2 and cellobiose dehydrogenase. In the study presented here, the anaerobic bacterial consortium grown on poplar hardwood appears to produce oxidoreductases more consistent with those produced by P. chrysosporium grown on pine, suggesting the bacterial homologs may have different specificity.
Although very little information is available on the anaerobic breakdown of lignin, some genetic clues on the evolution of anaerobic catabolism of aromatic compounds [41,42] have been described, and anaerobic degradation of aromatic compounds has been reported for various bacteria, including Azoarcus sp. CIB [43] and Geobacter metallireducens BamVW [44]. Also, members of the genus Magnetospirillum that anaerobically degrade aromatic compounds, including toluene, phenol and benzoate, were previously isolated from denitrifying enrichment cultures [45,46]. Although members of this genus have not previously been identified as dominant members of biomass decomposing communities, nor have they been known to harbor metabolic potential to break down lignocellulosic biomass, their unexpected dominant presence in the microbial community decaying poplar biomass seems to indicate that they might play a role in lignin depolymerization or the breakdown of aromatic compounds released from the wood. The toluene-degrading strain Magnetospirillum TS-6 was found to contain genes that are homologous to those encoding benzylsuccinate synthase (Bss) and benzoyl-CoA reductase (Bcr), two key enzymes of anaerobic toluene and benzoate degradation, respectively, in known denitrifying bacteria [45]. These two genes were also found on a large contig that, based on sequence composition, was assigned to Magnetospirillum: a putative bssD gene, encoding a benzylsuccinate synthase activating enzyme I, was found immediately downstream of a bssA gene coding for a putative alpha subunit of benzylsuccinate synthase (PBDCA2_4994280). Since Magnetospirillum does not seem to contain any key functions for lignin depolymerization, such as LO1 and LO2, our hypothesis is that the presence of Magnetospirillum is key for the efficient degradation of toxic aromatic compounds that are released from the poplar wood during its decay.
As such, Magnetospirillum acts as the "liver" of the community, making sure that the community does not collapse due to the accumulation of toxic lignin-derived aromatic compounds.
Members of the genus Magnetospirillum have been reported to play a significant role in oxic-anoxic transition zones of freshwater ecosystems, where opposing gradients exist of reduced iron and sulfide with oxygen, creating a suitable environment for microorganisms that derive energy from the oxidation of iron or sulfide [47]. It cannot be totally excluded that opposing redox gradients also existed in our reactor system, and that this resulted in the enrichment of Magnetospirillum.
The distribution and relative abundance of cellulases, hemicellulases, debranching enzymes and enzymes homologous to lignolytic enzymes revealed compelling patterns of enzyme frequencies among different types of communities. In this respect, the strong conservation in the relative distribution of candidate GHases between the top and bottom layers of the fungal garden, as shown in Figure S3, is striking, as the two layers are clearly distinct: the top layer retains the green, harvested state of plant leaves, whereas the bottom layer contains mature fungus and partially degraded plant material. Furthermore, comparison of the phylogenetic diversity between top and bottom layer indicated distinct differences, although both layers are dominated by phylotypes in the α-Proteobacteria, β-Proteobacteria, γ-Proteobacteria, Actinobacteria and Bacteroidetes [30]. The common presence of these dominant phylotypes might explain the conservation in functionality. It was also noted that no measurable lignin degradation occurred in the fungal garden. However, in order to explain the observed breakdown of hemicellulose and cellulose, some local degradation of lignin is required in order to access these polymers. The presence of bacterial genes with similarity to fungal peroxidases might provide a possibility for lignin breakdown.
In comparison, more light has been shed recently regarding the complete picture of fungal cellulose degradation by the discovery of the probable role of GH Family 61 oxidative enzymes [48]. For example, Vanden-Wymelenberg et al. found high levels of GH61 transcripts in decaying softwood piles inoculated with P. chrysosporium [40]. In the study presented here, the anaerobic bacterial consortium grown on the hardwood, poplar, appears to lack the ability to produce GH61 enzymes. Recently, the cellulose binding module (CBM) from Family 33, which is a structural analogue to GH61, was implicated in bacterial oxidative cellulose deconstruction [49]. In this consortium, four homologs of CBM33 were found, all located on relatively small unbinned sequences: two on individual 454 reads, and two on contigs of 2-3× depth. All putative CBM33 genes showed high GC content (64-69%), so on that basis could potentially belong to the Magnetospirillum, whose contigs varied considerably in coverage due to apparent sequencing bias in this high-GC genome. Furthermore, the bacterial consortium studied here encodes putative cellobiose dehydrogenases. This is significant, because it was suggested nearly 15 years ago that cellobiose dehydrogenase could be involved in cellulose depolymerization [50] and today it appears that GH61 (or CBM33) and cellobiose dehydrogenase could work cooperatively to effect oxidative cellulose depolymerization.
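GC content, used above as a rough indicator of which genome a small unbinned sequence might belong to, is simply the fraction of G and C among the unambiguous bases. A minimal Python sketch follows; the function name and the example sequence are hypothetical, and GC content alone is weak evidence for assigning a sequence to a genome.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the A, C, G and T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


# Hypothetical gene fragment, not a sequence from the study.
fragment = "GCCGGCATCGGCGACCTGGCGTTCGAGGCCAAGCGC"
print(f"GC content: {gc_content(fragment):.1f}%")
```

Ambiguous symbols such as N are excluded from the denominator so that low-quality stretches in individual reads do not bias the estimate downward.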
From a community perspective, a striking observation is the near-complete lack of putative lignin degrading enzymes shared by all the gut communities examined (termite hindgut, cow rumen, wallaby, dog, human and mouse), whereas compost, soil, and fungal garden microbes all harbored some of these genes, as did the members of the poplar biomass decomposing community. For example, the termite hindgut community has very few candidate functions involved in the decomposition of lignin; only one candidate LDA6 with homology to a glucose oxidase and 20 putative LO3 candidates, all displaying homology to dihydrolipoyl and dihydrolipoamide dehydrogenases, could be identified (Fig. 7C). The lack of enzymatic functions involved in the decomposition of lignin in the gut communities suggests that the hosts may provide adequate pretreatment of the biomass to allow for microbial decomposition, through chewing in all animals in addition to an alkaline pretreatment in the higher termite gut (pH 8.5-12, depending on species) [51,52], which might provide suitable solubilization of some lignins to permit the needed GHase accessibility to wall polysaccharides. While the relative availability of oxygen may also have an impact on the prevalence of lignin degradation pathways, it is worth noting that the termite gut has been found to have relatively high oxygen concentrations in some areas [53]. Significant lignin degradation has been proposed to occur in the Asian longhorned beetle, which could prove an exception to the pattern seen here [54]. This near-complete lack of putative lignin degrading enzymes was shared by all other gut communities examined (including bovine rumen, wallaby, dog, human and mouse). In the case of the termite gut, the relatively high pH (10-12) of the higher gut may provide suitable solubilization of some lignins to permit the needed GHase accessibility to wall polysaccharides.
In conclusion, natural communities subsisting on untreated plant biomass provide an ideal environment for the bioprospecting of enzymes involved in the depolymerization of plant cell-wall polysaccharides and lignin. Our results seem to indicate that these communities have a much broader metabolic potential than host-associated communities and thus could provide a richer resource for finding new catalytic functions involved in biomass decomposition. This is exemplified by the microbial community decaying poplar wood chips under anaerobic conditions, in which a very broad representation of glycoside hydrolases and putative lignin decomposing enzymes was found. Furthermore, we unexpectedly identified bacteria among the dominant community members, similar to Magnetospirillum, that seem to play a key role in the anaerobic breakdown of aromatic compounds. We hypothesize that these compounds are released from the lignin fraction in the poplar hardwood during the decay process, which would point to lignin depolymerization under anaerobic conditions.

Preparation of biomass and DNA samples
This work concentrated on the microbial community decaying poplar biomass under anaerobic conditions. A 1.8 kg sample of non-sterile yellow poplar sawdust, with particle sizes ranging from 1 mm³ to 0.3 cm³, was taken from the inside of a 1 m³ pile and placed in a white, plastic 10 L bucket. The biomass was humidified with 5 L of 10 mM MgSO₄ solution, and the bucket was closed with an air-tight plastic cover and incubated at 30°C. This temperature was chosen over thermophilic conditions, as we expected that the organisms naturally present on the biomass were unlikely to be adapted to thermophilic conditions. The closed incubation created a gradient ranging from microaerobic at the top to anaerobic at the bottom of the biomass. After 3 and 12 months of incubation in the dark at 30°C, 500 g of biomass and 500 mL of liquid were collected from the anaerobic zone at the bottom of the bucket and used for DNA isolation and imaging studies. In addition, a sample was taken from the aerobic zone for imaging studies.
The attached microorganisms were suspended in 10 mM MgSO₄ solution by vigorously shaking the biomass, and collected by centrifugation, after which total community DNA was isolated according to Bron and Venema [55]. To further purify the DNA, the material was loaded on a 0.8% agarose gel, and after migration DNA with a size of >23 kb was isolated using the QIAEX II Gel Extraction Kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. The DNA isolated with this method had a molecular weight of over 40 kb and was suitable for PCR amplification and restriction digestion.

Sample Preparation for Microscopy
Pieces of untreated and composted poplar tissues were fixed and embedded using microwave processing. Samples were fixed 2×6 min in 2.5% glutaraldehyde buffered in 0.1 M sodium cacodylate buffer (EMS, Hatfield, PA) under vacuum. The samples were dehydrated by treating with increasing concentrations of ethanol for 1 min at each dilution (30%, 60%, 90%, and 3×100% ethanol). The samples were infiltrated with Epon resin (EMS, Hatfield, PA) for 3 min, with one final step at room temperature (RT) overnight, in increasing concentrations of resin (7%, 15%, 30%, 60%, 90%, 3×100% resin, diluted in ethanol). Infiltrated samples were transferred to flat-bottomed TAAB capsules and polymerized at 60°C for 24 h. Epon-embedded samples were sectioned to 2 µm with glass knives on a Leica EM UTC ultramicrotome (Leica, Wetzlar, Germany) for confocal scanning laser microscopy (CSLM) and epifluorescent light microscopy (ELM) or sectioned to 60 nm with a Diatome diamond knife on a Leica EM UTC ultramicrotome (Leica, Wetzlar, Germany) for transmission electron microscopy and Coherent anti-Stokes Raman (CARS) microscopy.

Coherent anti-Stokes Raman (CARS) microscopy
CARS microscopy [56,57] was used to provide information on biomass composition. A mode-locked Nd:VAN laser (High Q Laser (US), Inc) was used to generate 7 ps, 76 MHz pulse trains of both a 1064 nm and 532 nm laser beam. The 1064 nm beam was used as the Stokes beam. The 532 nm beam was used to pump an optical parametric oscillator (OPO) (Levante Emerald, APE-Berlin) to generate the CARS pump beam. The pump beam was tuned to 910 nm to effectively detect the 1600 cm⁻¹ Raman band. The excitation power was kept at ~350 mW for the pump and ~150 mW for the Stokes beam. Samples were cut into ~20 µm thin sections by rotary microtome (Leica RM2255, Leica Microsystems Inc) embedding processes, after which slices were spread out between two cover slips. CARS imaging of the lignin was obtained at 1600 cm⁻¹. The focused beams were raster-scanned over the sample. The anti-Stokes light was collected from the epi-direction and filtered by a clean-up filter 800/40 (Thorlabs).
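The relationship between the pump and Stokes wavelengths and the probed Raman band can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, vacuum wavelengths are assumed, and the standard CARS relations (Raman shift = pump wavenumber minus Stokes wavenumber; anti-Stokes wavenumber = twice the pump wavenumber minus the Stokes wavenumber) are applied to the 910 nm and 1064 nm values given above.

```python
def wavenumber_cm1(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / wavelength_nm


def raman_shift_cm1(pump_nm: float, stokes_nm: float) -> float:
    """Raman shift probed by a pump/Stokes pair: pump minus Stokes wavenumber."""
    return wavenumber_cm1(pump_nm) - wavenumber_cm1(stokes_nm)


def anti_stokes_nm(pump_nm: float, stokes_nm: float) -> float:
    """Wavelength of the CARS signal, from omega_as = 2 * omega_pump - omega_stokes."""
    signal_cm1 = 2.0 * wavenumber_cm1(pump_nm) - wavenumber_cm1(stokes_nm)
    return 1e7 / signal_cm1


# Values taken from the text: 910 nm pump, 1064 nm Stokes.
shift = raman_shift_cm1(910.0, 1064.0)
signal = anti_stokes_nm(910.0, 1064.0)
print(f"Raman shift: {shift:.1f} cm^-1, anti-Stokes signal: {signal:.1f} nm")
```

With these inputs the shift comes out at roughly 1590 cm⁻¹, close to the 1600 cm⁻¹ band, and the anti-Stokes signal at roughly 795 nm. That would fall inside the passband of the clean-up filter if "800/40" denotes an 800 nm center wavelength with a 40 nm bandwidth, which is an assumption about the notation.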

Microbial cultivation
In order to isolate members of the genus Clostridium, dilutions of the microbial suspension obtained from the anaerobic zone were plated on CM3 medium [58] containing cellulose, hemicellulose, cellobiose or glucose and incubated under anaerobic conditions at 30°C.

DNA sequencing
To obtain a first look at the community composition after 12 months of incubation, DNA was extracted from the bioreactor material and was used for construction of a bacterial 16S clone library (see http://my.jgi.doe.gov/general/protocols/), from which 384 clones were sequenced. 16S rRNA gene sequences were submitted to GenBank and assigned accession numbers JQ624944 to JQ625053. Subsequently, the DNA was subjected to extensive 16S/18S profiling by V6-V8 pyrotag sequencing using the primers 926F and 1391R, reported to be capable of amplifying bacteria, fungi, and some archaea [59]. A total of 2,549,998 sequencing reads were generated and analyzed with the in-house pipeline PyroTagger [60], which filters out low-quality or short sequences, trims to a uniform length, clusters at 97% identity and blasts cluster representatives against a combined Greengenes [61]/Silva [62] database of aligned rRNA sequences from the Bacteria, Archaea and Eukarya domains. In total, 1,516,612 16S/18S reads were clustered and phylogenetically assigned. The pyrotag data were submitted to the Short Read Archive under SRA number SRA045915.
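The filter, trim and cluster steps can be illustrated with a deliberately simplified sketch. This is not the PyroTagger implementation: the function names, the 100 nt trim length, the position-by-position identity measure (no alignment) and the greedy first-match clustering are all assumptions chosen to keep the example short.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def trim_and_cluster(reads, length=100, min_identity=0.97):
    """Drop short reads, trim the rest to `length`, then cluster greedily.

    Each read joins the first cluster whose representative it matches at
    >= min_identity; otherwise it founds a new cluster. The trim length
    and the greedy strategy are illustrative assumptions.
    """
    trimmed = [r[:length] for r in reads if len(r) >= length]
    clusters = []  # list of (representative, members) pairs
    for read in trimmed:
        for representative, members in clusters:
            if identity(representative, read) >= min_identity:
                members.append(read)
                break
        else:
            clusters.append((read, [read]))
    return clusters


# Toy reads: a reference, a 98%-identical variant, a longer read that trims
# to the reference, a 95%-identical variant, and a read that is too short.
base = "ACGT" * 25
reads = [base, "TT" + base[2:], base + "GGGG", "N" * 5 + base[5:], "ACGT"]
clusters = trim_and_cluster(reads)
print([len(members) for _, members in clusters])
```

In a real pipeline the cluster representatives, not every read, would then be searched against the reference rRNA database, which is what makes clustering worthwhile at this read depth.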
The same bioreactor DNA used for community profiling was used to construct both a short-insert (3 kb) and a fosmid (40 kb) metagenome library for Sanger end sequencing. 11 Mb and 31 Mb of sequence were generated from these libraries, respectively. To obtain much deeper coverage of the constituent species, the Sanger data were supplemented with 454 pyrosequencing data, using both the FLX technology (115 Mb from a full run) and Titanium (564 Mb from a full run). Finally, to improve scaffolding, another half Titanium run (236 Mb) was sequenced from a 454 paired-end library with a 3 kb insert size. All the resulting data were trimmed with the program LUCY [63] to remove low quality, vector, and adapter sequence and de-replicated with Uclust (http://drive5.com/usearch/usearch3.0.html) to remove duplicate reads prior to assembly. Data were assembled using Newbler 2.4. The metagenome data, including all assembled gene sequences, can be publicly accessed via the IMG/M website at http://img.jgi.doe.gov/cgi-bin/m/main.cgi?section=TaxonDetail&taxon_oid=2010388001, and via GenBank accession number AGTN00000000.1. In addition, for each of the assembled glycoside hydrolase or lignin oxidase genes, the IMG/M and CAZy/FOLy identifiers are provided as part of the Supporting Information as File S1, 'GHases and LO overview.xls'.

Metagenome annotation, analysis and binning
All contigs and unassembled singlets were submitted to IMG/M-ER for annotation [64]. Phylogenetically informative genes in the metagenome were identified with the "Phylogenetic Marker COGs" function and used to build trees with homologs from isolate genomes and thereby identify contigs and scaffolds belonging to the dominant populations. Contigs and scaffolds from six abundant populations (Magnetospirillum, Bacteroides, Methanomicrobia, Cyanobacteria, and two Clostridiales) were used as training data for the metagenome classifier, ClaMS [65]. The classifier was then used to bin all remaining contigs >2 kb, of which ~63% were assigned to one of the populations. These bins were then added to the dataset in IMG/M-ER.
All metagenome scaffold sequences used here were obtained from IMG/M [64]. For each metagenome, including the poplar biomass bioreactor, scaffold sequences were divided into 300 nt fragments (tiles) overlapping by 150 nt. BlastX [66] searches of each fragment were then conducted against the complete set of protein sequences from the CAZy (http://www.cazy.org/) [21] and FOLy (http://foly.esil.univ-mrs.fr) [22] websites. Contiguous runs of fragments whose best hits had an e-value below a threshold of 1e-10 and belonged to the same CAZy or FOLy gene family were collected. These run lengths were then multiplied by mean contig depth and totaled by family. A depth of 1 was assumed for those metagenomes for which contig depth information was not available.
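The tiling and depth-weighted counting described above can be sketched as follows. This is a minimal illustration, not the authors' Perl/R code: the function names are hypothetical, run length is assumed to be counted in tiles, and the handling of a trailing partial tile is an assumption, since the text does not specify either.

```python
from collections import defaultdict


def tile(seq, size=300, step=150):
    """Yield (start, fragment) tiles of `size` nt spaced `step` nt apart.

    With size=300 and step=150, neighbouring tiles overlap by 150 nt.
    A shorter trailing tile is kept (an assumption).
    """
    if len(seq) <= size:
        yield 0, seq
        return
    for start in range(0, len(seq) - size + step, step):
        yield start, seq[start:start + size]


def family_totals(tile_hits, depth=1.0, max_evalue=1e-10):
    """Sum depth-weighted run lengths per gene family for one contig.

    tile_hits: per-tile best hits in tile order, as (family, evalue);
    family is None when a tile has no hit. A run is a stretch of
    consecutive tiles whose best hits pass the e-value threshold and
    share one family. Run length is counted in tiles (an assumption).
    """
    totals = defaultdict(float)
    run_family, run_length = None, 0
    # The trailing sentinel forces the final run to be flushed.
    for family, evalue in list(tile_hits) + [(None, 1.0)]:
        passes = family is not None and evalue < max_evalue
        current = family if passes else None
        if current == run_family:
            run_length += 1
            continue
        if run_family is not None:
            totals[run_family] += run_length * depth
        run_family, run_length = current, 1
    return dict(totals)


# Toy example: a 600 nt contig, and five hypothetical best hits at depth 3.
starts = [start for start, _ in tile("A" * 600)]
hits = [("GH5", 1e-30), ("GH5", 1e-20), (None, 1.0), ("GH9", 1e-5), ("GH9", 1e-50)]
totals = family_totals(hits, depth=3.0)
print(starts, totals)
```

Passing `depth=1.0` reproduces the fallback used for metagenomes without contig depth information.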
Glycoside hydrolase and ligninolytic metagenome comparison scatterplots were made by normalizing these family totals by metagenome total sequence, weighted by mean contig depth. Clustering and MDS plots comparing the metagenomes were created from the normalized GH values using Spearman rank correlation coefficients for distance/similarity. BlastX batch jobs were run in parallel on a 40 CPU Linux cluster using PBS/Torque. Post-processing was performed with a combination of in-house Perl and R scripts. Plots were created with R.
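The normalization and the rank-based comparison can be sketched in a few lines. The original analysis was done in Perl and R; the Python below is only an illustration, with hypothetical function names, and it assumes that distance is taken as one minus the Spearman coefficient and that families absent from a metagenome count as zero. Neither choice is stated in the text.

```python
from math import sqrt


def normalize(totals, weighted_total_sequence):
    """Divide each family total by the depth-weighted total sequence."""
    return {family: value / weighted_total_sequence for family, value in totals.items()}


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors.

    Undefined (division by zero) if either vector is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def spearman_distance(profile_a, profile_b):
    """Distance between two family profiles as 1 - rho (an assumption)."""
    families = sorted(set(profile_a) | set(profile_b))
    x = [profile_a.get(f, 0.0) for f in families]
    y = [profile_b.get(f, 0.0) for f in families]
    return 1.0 - spearman(x, y)


# Toy profiles: b preserves the family ordering of a, c reverses it.
a = normalize({"GH5": 30.0, "GH9": 20.0, "GH13": 10.0}, 100.0)
b = {"GH5": 0.9, "GH9": 0.5, "GH13": 0.4}
c = {"GH5": 0.1, "GH9": 0.2, "GH13": 0.3}
print(spearman_distance(a, b), spearman_distance(a, c))
```

Because only ranks enter the comparison, two metagenomes with the same ordering of family abundances are treated as identical even when their absolute normalized values differ, which makes the measure robust to differences in sequencing depth between datasets.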