Metagenomics reveals niche partitioning within the phototrophic zone of a microbial mat

Hypersaline photosynthetic microbial mats are stratified microbial communities known for their taxonomic and metabolic diversity and strong light-driven day-night environmental gradients. In this study of the upper photosynthetic zone of hypersaline microbial mats of Elkhorn Slough, California (USA), we show how metagenome sequencing can be used to meaningfully assess microbial ecology and genetic partitioning in these complex microbial systems. Mapping of metagenome reads to the dominant Cyanobacteria observed in the system, Coleofasciculus (Microcoleus) chthonoplastes, was used to examine strain variants within these metagenomes. Highly conserved gene subsystems indicated a core genome for the species, and a number of variant genes and subsystems suggested strain level differentiation, especially for nutrient utilization and stress response. Metagenome sequence coverage binning was used to assess ecosystem partitioning of remaining microbes to both reconstruct the model organisms in silico and identify their ecosystem functions as well as to identify novel clades and propose their role in the biogeochemical cycling of mats. Functional gene annotation of these bins (primarily of Proteobacteria, Bacteroidetes, and Cyanobacteria) recapitulated the known biogeochemical functions in microbial mats using a genetic basis, and revealed significant diversity in the Bacteroidetes, presumably in heterotrophic cycling. This analysis also revealed evidence of putative phototrophs within the Gemmatimonadetes and Gammaproteobacteria residing in microbial mats. This study shows that metagenomic analysis can produce insights into the systems biology of microbial ecosystems from a genetic perspective and to suggest further studies of novel microbes.


Introduction
Hypersaline microbial mats are diverse laminated assemblages of microorganisms thought to represent one of the earliest ecosystems on Earth, and are typically dominated by oxygenic phototrophic cyanobacteria (Awramik, 1984;Oren 2010). Compact and highly structured, these mats contain microbial communities that possess great diversity at both 5 the metabolic and phylogenetic level (Canfield & Des Marais 1993;Ley et al. 2006).
Microbial mats have been described as complete ecosystems in miniature, with relatively closed cycling of photosynthetically fixed carbon from the upper layers distributed to heterotrophic organisms in the lower layers for re-mineralization and subsequent reincorporation. Photosynthetic activity of oxygenic phototrophs during the daytime is 10 followed by a rapid transition into anoxic conditions following sunset. Features of an active nitrogen cycle include high rates of nitrogen fixation supported by daytime photosynthetic activity or sulfide redox reactions (Bebout et al. 1993).
Extensive biogeochemical, microbiological, and targeted molecular ecological studies have been completed on the hypersaline microbial mats of Elkhorn Slough (CA) and 15 have identified rates of biogeochemical processes and the identities of some organisms involved. As an example, previous studies have shown that within the upper 2 mm layers of these mats, net hydrogen production is a consequence of constitutive fermentation of photosynthate to acetate by Cyanobacteria, followed by consumption of fermentation byproducts by Desulfobacterales and Chloroflexi (Burow et al. , 2013(Burow et al. , 2014Lee et al. 20 2014). Major nitrogen fixers have also been identified, and include a novel group of cyanobacteria (ESFC-1), that have been isolated and whole genome sequenced Stuart et al. 2015;Everroad et al. 2016). In many cases, identity and metabolic role of microbes have not been linked to specific biogeochemical transformations, largely due to the high diversity and novelty of the microorganisms present in these mats.
In this study, high throughput shotgun metagenomic sequencing was combined with several genome-centric bioinformatics approaches (co-assembly, coverage binning, reference-mapping) to understand functional and microbial diversity in the phototrophic 5 zone of microbial mats of Elkhorn Slough, CA. Underpinning this work are previous binning studies have sought to analyse metagenomic results at the organism level rather than at the microbiome level, either to identify novel genomic diversity (Dick et al. 2009;Wrighton et al. 2012;Hanke et al. 2014;Sekiguchi et al. 2015), novel metabolism (Podell et al. 2013), novel genetics (Hess et al. 2011Nielsen et al. 2014), or ecological succession (Morowitz et 10 al. 2011;Brown et al. 2013;Sharon et al. 2013). Using reference-free binning of metagenomic scaffolds we show that partitioning of metagenomes derived from microbial mats resulted in recovery of numerous bins representing species, or groups of closely related organisms, in the phototrophic mat layer. Using reference-based variant analysis with a sequenced genome of the dominant Cyanobacteria Coleofasciculus chthonoplastes, 15 we identified genes representing metabolic pathways conserved within this species.
Analysis of the functional genetic diversity of these dominant bins recapitulated the known partitioning of ecological functions in these mats and also identified novel organisms that have yet to be isolated. This combined reference and reference-free based approach shows the value of binning metagenomes and provides a robust systems biology 20 analysis of novel microbial ecosystems from a genetic perspective.

Site sampling and incubation description
Samples were collected from the Elkhorn Slough estuary at 36°48'46.61" N and 121°47'4.89" W. The site consists of up to 1 cm thick mats dominated by Coleofasciculus sp.
(formerly Microcoleus sp.) and Lyngbya sp., that vary with seasonal water flows and 5 nutrient inputs. The conditions and the mats found at this site have been documented in previous reports (Burow et al. , 2014Woebken et al. 2012). A single contiguous mat piece approximately 60-80 cm in diameter was harvested in a total of thirty-two 10 cm diameter acrylic cores tubes on Nov. 8, 2011 at 6 AM. Cores were sealed with rubber stoppers on the bottom, covered with clear plastic wrap and transported to NASA Ames 10 Research Center (Moffett Field, CA) for incubation with Elkhorn Slough water in aquaria under natural light ( Figure S1). The cores were split into two treatments: controls, and a set treated with 30 mM molybdate to inhibit sulfate reduction. Cores were incubated in a temperature controlled water bath, monitored, and sacrificed in 4-hour intervals over 24 hours. Two control and two molybdate-treated samples, collected at 0130 and 1330, were 15 selected for metagenomic sequencing. Because of the short timeframe, no differences in community composition were expected amongst these samples.

Nucleic acid extraction
Nucleic acid extraction was conducted using a total RNA/DNA approach consisting of a phenol-chloroform method described by Woebken et al. (2012)  RNase-free H2O. The top 2 mm of the microbial mat was excised using a sterile razor blade and homogenized for 30 seconds on the lowest setting in 0.5 ml RLT buffer mix (10 ml RLT buffer (RNEasy Plus Mini kit, Qiagen) and 100 ul β-mercaptoethanol) in a 2 ml bead beating tube (0.5 mm zirconium beads). A FastPrep bead beater (MP Biomedicals, Santa Ana, CA, USA) was used for 40 seconds at setting "6.0". Samples were spun for 1 minute at 8,000 x g 5 (rcf) and the supernatant transferred to new 2 ml tubes. DNA was isolated by adding an equal volume of phenol-chloroform (basic) and vortexing for 10 seconds, incubating for 5 minutes at room temperature, and spinning for 5 minutes at 8,000 x g (rcf). The aqueous phase was transferred to a new tube on ice. An equal volume of 100% ethanol was added to eluate and vortexed for 10 seconds. 700 ul of supernatant/ethanol mix was added to 10 QIAmp spin column (QIAamp DNA mini prep kit, Qiagen) and processed according to manufacturer's instructions. RNA was also extracted from each sample using RNA Mini Prep kits (Qiagen) and reserved for future gene expression analysis. Whole genome shotgun metagenomic sequencing was completed at the Joint Genome Institute (JGI) on an Illumina HiSeq 2000 platform (Lee et al., in submission). 15 Read preprocessing, assembly, and annotation 150 bp paired end reads were quality trimmed using Trimmomatic (Bolger et al. 2014) with the following parameters: headcrop 11, trailing 20, sliding window 4:20, minimum length 75 bp. Quality filtered reads were assembly with Ray-Meta (Boisvert et al. 2012) on the NERSC Edison supercomputing cluster. Sequences from metagenomes of 20 microbial mat samples were pooled together and co-assembled with Ray-Meta (Boisvert et al. 2012) three times using different assembly word sizes (k=29, 45, 63). Assembly word sizes were assessed using KmerGenie ( Figure S2A) (Chikhi & Medvedev 2014). Each individual metagenome sample was then mapped using Bowtie2 (Langmead & Salzberg 2012) back to each assembled scaffolds to determine sample specific coverage. Prodigal (Hyatt et al. 2010) was then used to predict open reading frames (ORFs), and an HMM model (Eddy 2011) was used to find the essential single copy genes (Albertsen et al. 2013).
All ORFs were submitted to MG-RAST (Meyer et al. 2008) for annotation using a BLAT 90% 5 clustering protocol. The mapping process was repeated for open reading frames (ORFs) detected by Prodigal (Hyatt et al. 2010). MG-RAST custom md5 matches were quality filtered, de-replicated, and parsed to a tab-delimited ORF database of ORF name, protein annotation, ontology annotation, taxonomy, and coverage.

Metagenomic binning 10
Previous work examining binning has shown several successes using k-mer nucleotide frequency, %GC, read coverage, taxonomy, or a combination of these strategies as biosignatures of genomes within metagenomes. The pipeline used in this study combines supervised learning (Dick et al. 2009), dimensionality reduction (Woyke et al. 2006), and the coverage binning (Albertsen et al. 2013) using R and CRAN analysis 15 packages.
As was noted by Dick et al. (2009), preliminary results suggested that larger scaffolds harbored strong phylogenetic signal ( Figure S2B), so these scaffolds were used to recruit clusters representing bins from the metagenomes. Log normalization and principle component analysis (PCA) dimensionality reduction using scaled values of %GC and 20 differential sample coverage aided to resolve binning 'spears' seen in metagenomic data.
DBScan, noted for being sensitive to cluster density and used on noisy datasets (Ester et al. 1996), was selected to cluster large (>5 kbp) scaffolds in PCA space. The remaining scaffolds >1.5 kbp were recruited using a SVM machine-learning algorithm (Chang & Lin 2011) trained on the larger scaffolds (Supplemental Figure S3B) and was tuned by maximizing single copy essential gene membership and minimizing gene copy duplication.
This was repeated for the 3 assemblies (performed at different word sizes (k=29, 45, 63)); the best corresponding bin (maximum single copy genes, minimum duplication) from each 5 assembly was extracted and pooled with both background and unbinned contigs from the k=29 assembly. Binning procedure and quality analysis were based on analysis of ~100 essential single copy genes (Albertsen et al. 2013;Dupont et al. 2012). These data when charted together produce informationally-dense PCA graphs overlaying the clustering of contigs with contig information (taxonomy, contig size, etc.) which we refer to as "galaxy" 10 plots.

Annotation search and collation
Since a number of different ontology systems and gene annotation databases were used to annotate genes, querying single annotation sets (e.g. KEGG) produced partial results. To maximize search coverage, a customized Python regular expression search 15 algorithm was developed. This algorithm emphasized matching KEGG and EC ontologies (e.g. K02586 and 1.18.6.1, with subunit or chain designation), but can also search for multiple text patterns in protein annotations by keyword and subunit (e.g. nitrogenase alpha chain), and protein abbreviation (e.g. nifD). The algorithm also allows for nested searches (e.g. first cytochrome c oxidase, then cbb3 subtype), as well as exclusion terms 20 (e.g. 'precursor' proteins). Results were cross-referenced with bin annotation and written to tab-delimited files for heatmap generation in Excel. A list of genes associated with biogeochemical cycling (sulfur metabolism, nitrogen metabolism, phototrophy, autotrophy, and hetrotrophy) and with starch utilization were determined and used to query annotation records.

Read mapping and variant analysis
Recent studies have examined the possibility of using variant callers typically seen in human genomic variant analysis for the detection of strain variation across genomes (He 5 et al. 2010) and for differing populations of Bacteria in the human gut microbiome (Schloissnig et al. 2013). We used these insights and approach to call the coverage and density of variants in genes and in subsystems that differed from the mapping reference.
The complete metagenomic dataset from the selected four mat samples were pooled and mapped to the C. chthonoplastes PCC 7420 (Rippka et al, 1979, Garcia-Pichel et al. 1996 genome (GCA_000155555.1 ASM15555v1, JCVI) using Bowtie2 with default settings. Single nucleotide polymorphisms (SNPs) were called with FreeBayes using haploid continuous pooled variant calling settings (Garrison & Marth 2012). Variants were filtered for poorly called variants and selected for SNPs using Samtools and BCFtools (Li et al. 2009;Danecek et al. 2011). Bedtools (Quinlan & Hall 2010) was used to count the number of variants per 15 gene in the PCC7420 genome. Variants were summarized by gene and SEED subsystems using PATRIC annotations (Wattam et al. 2013) cross-referenced with NCBI annotations (with a cutoff >70% annotation overlap using Bedtools). SNPs were then filtered for coverage between 50-200 reads and the ratio of SNPs per all bases in a gene (SNP density) was calculated. These gene SNP density values were matched to subsystem ontology. Each 20 gene was binned by SNP density into 4 groups, (0-1%, 1-2%, 2-3%, 3%+) variants / gene length. A score was developed based on the aggregation of gene variant density in subsystems to estimate the level of genetic variation in each subsystem as compared to genome-wide variation levels (Equation 1) where: g is a subsystem gene set from all gene sets G, n is the variant density bin described above, and Vg,n is the number of genes in set g also in bin n, and Cn is a weighting coefficient 5 for each bin (here, Cn= 1, unweighted). A variation ratio of Vg,n to number of genes in n for G (VG,n) was calculated and offset by the variation ratio of all gene bins in subsystem g and then summed for each bin. The result was scaled by an arbitrary factor of 1,000 for convenience to generate a final score for each gene set g. Positive scores indicated more genes with variation in a subsystem than the genome-wide average, negative scores 10 indicated fewer genes with variation than the genome-wide average.

Data and repository archival
Data and code from this study were archived to several locations. JGI sequences were archived on the JGI IMG server under Project ID 1081546-1081548, 1000633. Raymeta assemblies and gene annotation tables were archived at 10.5281/zenodo.584152. All 15 codes used on this project are available at GitHub at (http://github.com/leejz/). Threedimensional interactive ShinyRGL visualizations of galaxy plots are available at https://leejz.shinyapps.io/plot3d2 (training dataset) and at https://leejz.shinyapps.io/plot3d3 (full dataset).

Metagenomic binning of the photosynthetic zone of microbial mats
Initial analysis was performed at the JGI on all four metagenomes. Coverage and diversity estimation indicated essentially equivalent metagenome composition at a phylum 5 level (Lee, et al., in submission) and suggested that metagenomes could be combined for differential coverage analysis. "Galaxy" charts of PCA components overlaid with taxonomic information from essential single copy genes found on contigs ( Figure 1A) showed that bins could be identified from longer contig fragments containing these genes. When longer contigs were clustered, bins were clearly delineated as density-dependent clusters by the 10 first 3 principal component axes (Supplemental Shiny visualization plots). This produced more than 70 bins ( Figure 1B) evaluated for completeness (fraction of essential single copy genes detected), duplication (fraction of duplicate essential single copy genes detected) of which the top 20 bins (equivalent to those with a completeness > 80%) were selected for downstream analysis. Table 1 shows the top 20 ordered by mean coverage and 15 Supplemental Dataset 1 shows the remainder. Two approaches to determine taxonomic affiliation were undertaken. In a more conservative approach, we selected the most common phylum annotated by Hidden Markov Model (HMM) profiles of essential single copy genes. Where taxonomy could be determined for these single copy genes, taxonomy at phylum level was concordant for each bin, with one exception (Table 1, bin 6). Our second 20 method for taxonomy determination was to examine the most common genome annotation matched by MG-RAST for all Open Reading Frames (ORFs) in a bin. This produced a nearest genome result and the fraction of genes that matched to this nearest genome (Table 1). Of the 20 bins selected for downstream analysis, using annotations of single copy essengial genes, 3 bins of Cyanobacteria (bins 1,2,9), 5 bins of Gammaproteobacteria (bins 3,4,7,8,20), 1 bin of Alphaproteobacteria (bin 10), 1 bin of Deltaproteobacteria (bin 16), 1 bin of Firmicutes (bin 6), and 9 bins of Bacteroidetes (bins 5,11,12,13,14,15,17,18,19) were annotated with near consensus at the phylum level (class level for Proteobacteria).
Consensus species taxonomic annotations from ORF annotation of the most abundant two 5 bins (bins 1, 2) suggested that the dominant Cyanobacteria known from these mats (C. chthonoplastes, and Lyngbya sp.) were captured. The L50 assembly metric for bin 1 (C. chthonoplastes) was much lower than expected (despite the fact that this taxa was the most abundant and therefore had the deepest read sampling). The taxonomy of the third most abundant bin (bin 3) suggested that this was a purple sulfur bacteria (Thiorhodovibrio sp.); 10 the remaining Gammaproteobacteria (bins 4,7,8,20) were poorly matched to reference genomes. The lone Firmicutes bin (bin 6) was unique in that there was no strong consensus of phylum identity among annotated essential-copy genes, and it had ORF annotations that belonged to the only sequenced member of the phylum Gemmatimonadetes. The single bin of Alphaproteobacteria (bin 10) in this subset was annotated as a possible relative of 15
A full listing of all bins identified in this study is included in Supplemental Dataset 1 with a cross-referenced table ( Figure S3A) for each assembly. We note that some of the minor bins, still largely unexamined, contained annotations for possible Chloroflexi, Planctomycetes, and Verrucomicrobia and may prove useful to future studies of the diversity of these organisms.

Bin pathway annotations comparing C. chthonoplastes to other Cyanobacteria
Binning and annotation analyses showed bins 1 and 2 closely matched to C. 5 chthonoplastes PCC 7420 and Lyngbya sp. PCC 8106 respectively. A third bin (bin 9) loosely matched a Fischerella genome (9% of ORFs). To understand the metabolic differences between Cyanobacteria within mats, annotations from these 3 bins were combined with annotations from the draft genome of ESFC-1, a nitrogen-fixing cyanobacterium previously isolated from the same ecosystem and sequenced Everroad et al. 10 2016). In the current study ESFC-1 did not pass binning thresholds, although it was detected in metagenomes ( Figure 1B). To draw comparisons between these, the major KEGG pathways were visualized. Notable differences were observed in the carbohydrate utilization pathways ( Figure S4). When annotations of ESFC-1 were included as a bin with the 3 other Cyanobacteria and explored in depth by searching for pathway gene terms in 15 annotations, the C. chthonoplastes bin had a unique set of glycoside hydrolases involved in many aspects of both carbohydrate production and breakdown (GH3, GH5, GH9, GH38, GH57, Figure 2A). All bins contained annotations for several types of cellulases, cellulose synthesis genes, and starch (glycogen) storage and had some involvement of maltose and sucrose synthesis and cycling ( Figure 2B). A full table of all search terms used and all 20 extracted search results are available in Supplemental Dataset 1.

Functional genetic diversity of biogeochemical cycling identified using metagenomic bin annotations
A catalog was compiled of the pathways involved with biogeochemical cycling (C, N, S) and of the organisms involved with those pathways. Briefly, annotations of indicator genes selected from these pathways were used to identify the partitioning of 5 biogeochemical roles across bins as well as the remaining ecosystem (minor bins, background recruitment bin, and remaining unbinned genes (Figure 3). A complete list of bins, genes used in this study, gene abbreviations used in this study, and gene selection criteria can be found in Supplemental Dataset 1. Clear delineations between bins involved in sulfur cycling, nitrogen cycling, and carbon cycling could be observed. For example, both 10 sulfur oxidation and sulfate reduction could be resolved. Capacity for phototrophic sulfur oxidation, represented by DSR, APR, and SOX genes for sulfur oxidation and PUF, CHL, BCH for bacterial phototrophy, was present in a bin (bin 3) representing the dominant clade of Thiorhodovibrio sp. A Deltaproteobacteria (bin 16) was annotated as sulfate-reducing, represented by DSR, APR, and methyl viologen-reducing hydrogenase (MVH). This bin also 15 contained a suite of annotations for oxygen tolerance such as reactive oxygen scavenging indicated by rubrerythrin, thioredoxin, catalase-peroxidase, and alkyl hydroperoxide reductase, as well as direct oxygen scavenging indicated by rubredoxin, molybdopterin oxidoreductase, NADH-quinone oxidoreductase (Supplemental Annotations). A bin associated with purple non-sulfur bacteria (PNS) (bin 10) was also identified as having SOX 20 genes.
Within the nitrogen-utilizing bins, annotations for nitrogen fixation (NIF) were detected in 5 bins (bins 1,2,3,10,16), and in minor, unbinned, and background bins, suggesting that this essential trait was widespread. However, nitrate reduction was observed in only one bin (PNS bacteria, bin 10) and not in minor or background bins.

Sequence variation analysis of C. chthonoplastes PCC 7420 by coverage, subsystems, and genes
The poor assembly statistics for the dominant bin identified in these mats (bin 1; most similar to C. chthonoplastes) suggested that genetic subpopulations of this organism confounded the assembler. We used the GenBank draft genome of the type strain PCC 7420, 5 isolated from hypersaline microbial mats, as a mapping scaffold for metagenome reads. All variants (SNPs, indels, repeats) were inferred, but only SNPs, which were the most common variant, were retained for downstream analysis. SNPs occurred in lower coverage regions compared to the coverage distribution of the sequenced genome ( Figure 4). These variants represent a rarer strain, or strains, with ~150x coverage in the sample while the 10 main coverage distribution at ~600x coverage had fewer SNPs. SNPs from these lower coverage regions (50-200x coverage) were filtered and summarized for each gene and each subsystem. The difference between the variant density of a subsystem to the genome average was tallied for all genes in a subsystem to form a score to evaluate the relative accumulation of mutations across gene subsystems. This score weighted for subsystems 15 with more genes. Subsystems with less variant accumulation had a negative score, and subsystems with more variant accumulation had a positive score (Supplemental Figure S5). Scores from subsystems that had the least and most accumulation are shown in Table 2.
Subsystems with low variant density scores were related to photosynthesis (e.g. photosystem I/II, phycobilisome, chlorophyll), carbon fixation (e.g. cAMP, carboxysome, 20 circadian clock), and basic cellular processes (e.g. cell division, DNA replication, ribosomal proteins, respiration, central metabolism). Examining individual genes from these subsystems with high variant density, specific genes that might indicate strain differentiation (Table 3) were regulatory proteins (P-II, kinases, cAMP proteins), membrane proteins (Co/Zn/Cd efflux, phosphate permease, O-antigen export permease, isoprenoid and carotenoid biosynthesis, fatty acid biosynthesis, and amino-sugar biosynthesis), nitrogen and amino acid cycling genes, and several transferases, and many genes related to carbohydrate modification. Additionally, environmental stress response 5 genes, represented by chemotaxis genes, cryptochrome, and Exodeoxyribonuclease, and DNA-cytosine methyltransferase, were observed. Supplemental Dataset 2 contains the full list of subsystem and gene variant density statistics.

Metagenomic binning can reproduce the systems biology of microbial mats by providing a genetic atlas
In this study, de novo metagenomic binning approaches were used to reconstruct the biogeochemical cycling divisions between different organisms of a microbial mat and to seek novel diversity previously unrevealed by reference-based annotation studies. Phylum genera Coleofasciculus and Lyngbya, respectively. Bin 3 was annotated as a sulfideoxidizing bacteria from a Thiorhodovibrio sp., which are commonly observed in mats (Overmann et al. 1992). Bin 10 supported observations of nitrogen fixation potential in purple nonsulfur bacteria in mats (Bebout et al. 1993;Zehr et al. 1995) but also suggests the multifaceted role (i.e. sulfate and nitrate metabolism) of many PNS bacteria in microbial mats (Yurkov et al. 1994). Observations of sulfate reduction in the phototrophic zone of mats (Canfield & Des Marais 1991;Fike et al. 2008) was supported by one Deltaproteobacteria bin (bin 16). This bin, with closest relative Desulfotalea psychrophila 5 LSv54, contained annotations for sulfate reduction pathways and a methyl viologenreducing hydrogenase suggested as markers for SRBs by previous work (Pereira et al. 2011;Burow et al. 2014). This bin also contained a number of annotations that indicate oxygen tolerance and support previous studies that identified sulfate reduction in the phototrophic zone of hypersaline microbial mats and suggested that oxygen-tolerant SRBs 10 were responsible (Canfield & Des Marais 1991;Visscher et al. 1992;Jørgensen 1994;Teske et al. 1998;Baumgartner et al. 2006;Fike et al. 2008;Burow et al. 2014). However, isolation of such organisms has been difficult and multiple mechanisms of oxygen tolerance have been proposed (Dolla et al. 2006). Similar to previous findings, the bin we identified does not appear to contain cytochrome c oxidase or superoxide dismutase (Burow et al. 15 2014) but does have rubrerythrin, suggesting that this organism may scavenge oxygen using this pathway or have a syntrophic oxygen coping strategy seen in mat-derived SRBs We noted that not all canonical members of microbial mats were observed, even when unbinned sequences were searched. Notably missing from this study were the chemolithotrophic sulfur bacteria (Beggiotoa sp.) and methanogenic Archaea, both of 10 which have been observed from Elkhorn Slough microbial mats (unpublished data). Deeper sequencing efforts may be required to detect enough genomic sequence to bin these rarer clades, as was also noted for ESFC-1; alternatively sampling from deeper mat depths may be required to capture these additional species.

C. chthonoplastes is distinguished from other Cyanobacteria in mats by extensive 15 polysaccharide production and breakdown capability
To better understand C. chthonoplastes ubiquity in microbial mats, we compared the three Cyanobacteria (bins 1,2,9) plus the genome of filamentous cyanobacterium ESFC-1 (isolated previously from this site but poorly assembled in this dataset) for major metabolic differences, specifically for carbon utilization genes ( Figure 2). Notably, the 20 dominant bin (C. chthonoplastes) appears to have an extensive capability to synthesize and process beta-glucoside polymers, making it distinct in this study and supporting recent work showing that fixed carbon polysaccharide production by Cyanobacteria (notably beta-glucose polysaccharides) constitutes a sizable fraction of extracellular polysaccharide (Stuart et al. 2015). The elevated mutation levels seen in these genes also support the view that polysaccharide regulation plays a role in strain differentiation in this species. The role of these beta-glucose polymers are not well understood, but may play a role in carbon storage for Cyanobacteria (Stuart et al., 2015), or may have adhesion and anchoring 5 functions (Ross et al. 1991;Römling 2002). These two details infer a phenotype of polysaccharide specialization that both distinguishes the dominant mat-building Cyanobacteria found in these mats and differentiates strains of this species from each other.

Metagenomic binning predicts novel functional roles of microbes from microbial 10 mats
Binning was used to identify novel organisms from unstudied clades to examine the novel diversity in microbial mats. These binning efforts revealed that Gammaproteobacteria consisted not only of the canonical sulfide-oxidizing bacteria in mats, but also of bins with bacteriochlorophyll-containing heterotrophs, possibly 15 containing mixotrophs similar to OM60 clades (Spring & Riedel 2013). As OM60 clades have been shown to rely on specific light, oxygen, and organic acids for mixtotrophic growth, this work suggests that the isolation of such bacteria would be highly dependent on mimicking specific conditions that develop during a diel cycle, such as the acetatereplete, low-oxygen, low-light initial early morning photosynthetic period of microbial 20 mats, where their growth would be distinctive from other heterotrophs. At present, given the lack of genomic knowledge about phototrophic Gammaproteobacteria, accurate taxonomic assignment of these bins below the class level remains challenging.

This work also identifies a potential novel phototroph within Firmicutes or
Gemmatimonadetes. Based on the taxonomic identity and binning completeness we measured, our research suggests this bin represents a possible salt-tolerant variant of the recently isolated phototrophic Gemmatimonadetes. Sequences of taxa from freshwater lakes belonging to possible clades of phototrophic Gemmatimonadetes (Zeng et al. 2014) 5 bear some resemblance to this bin. Specifically, the annotations of photosystem genes in this bin appear to be derived from Alphaproteobacteria, and contain annotations for aerobic respiration genes (cytochrome c oxidase (CCO), (2-oxoglutarate synthase) KORA, enolase (ENO), pyruvate kinase (PYK), and 6-phosphoglucanate dehydrogenase (PGD), Figure 3). However, given the possibility of misassembly, mis-annotation, and the difficulty 10 of inferring gene expression from genomic data, these findings must be paired with further microbial isolation, characterization, and genome sequencing.

Reference-based variant analysis reveals core and accessory genomes C. chthonoplastes
Our analyses indicate that genomic flexibility of subpopulations of C. chthonoplastes 15 can be detected by metagenomic variant analysis. Comparative genomic analysis has shown that species can contain recombinant and horizontally transferred regions that allow for both core and flexible genome elements (Fraser et al. 2007;Kashtan et al. 2014).
Differential coverage analysis suggested that the dominant strain resembled the sequenced PCC 7420, but that at least one more rare subtype with less similarity was also present 20 ( Figure 4). Focusing on these less abundant variants, each SEED subsystem was assessed for the density of variant sites in gene subsystems. In subsystems related to carbon metabolism, phototrophy, DNA replication, and cell division we observed reduced levels of variant accumulation that seems consistent for the essential functioning subsystems of a cyanobacterium (Table 2). This suggested that assessment of variability can predict conserved subsystems that represent the core genome of that organism. Conversely, subsystems associated with greater accumulation of mutations should indicate potential variation within niches. A metagenomic study of meltwater mats from Arctic and Antarctic 5 ice shelves (Varin et al. 2012) suggested that coping with environmental regulation, especially in variable salinity conditions, was a primary driver of genetic functional diversity. Similarly, in this study, subsystems displaying increased variation suggest that C, N, P nutrient cycling and environmental stress response to factors such as salinity, metals, light and infection were the drivers in C. chthonoplastes genetic differentiation in mats. 10 Specific genes that had increased variation (Table 3) included a nitrogen regulatory gene involved in modulating nitrogen scavenging, a transaldolase (part of the pentosephosphate pathway) and several glycotransferases and carbohydrate processing genes, an endonuclease involved in DNA repair, a DNA methyl-transferase related to phage immune response, a metal ion pump involved in toxicity resistance, and a number of fatty acid, 15 carotenoid, isoprenoid biosynthesis pathway genes. This work predicts that these factors can be used to differentiate C. chthonoplastes strains, both physiologically and genetically, and may explain some of the ubiquity of this species across microbial mats worldwide.

Conclusion 20
In this study coverage binning was used to resolve functional genes at a genome population level and thus create a genetic basis for biogeochemical partitioning that parallels results seen from physiological and biogeochemical cycling studies. The use of reference-free binning was crucial as the majority of bins identified had only a fraction of genes matching any nearest reference genome. We also show that where reference genomes were available, a key weakness of assembly-based analysis, strain level microheterogeneity, can be used to generate SNP analysis to differentiate strain level differences using metagenome reads. Though the ecosystem survey in this study included a 5 limited snapshot of a microbial mat, using only abundant organisms and key genes in metabolic pathways, we can see how the ecosystem roles of mat microbes partition the genetics within metagenomes. Our analysis also predicts a number of novel mat organisms that are as yet unidentified, including several phototrophs. We suggest that these combined reference and reference-free analysis approaches can be used to generate genetic atlases of 10 biogeochemical cycling of novel ecosystems and direct further microbial investigation. Lee JZ, Burow LC, Woebken D, Everroad RC, Kubo MD, Spormann AM, et al. (2014). Fermentation couples Chloroflexi and sulfate-reducing bacteria to Cyanobacteria in hypersaline microbial mats. Front. Microb. Physiol. Metab. 5:61. 35 Rogers TE, Pudlo NA, Koropatkin NM, Bell JSK, Moya Balasch M, Jasker K, et al. (2013). Dynamic responses of Bacteroides thetaiotaomicron during growth on glycan mixtures. Mol. Microbiol. 88:876-890. Römling U. (2002). Molecular biology of cellulose production in bacteria. Res. Microbiol. 5 153:205-212. Ross P, Mayer R, Benziman M. (1991).  Riedel T. (2013). Mixotrophic growth of bacteriochlorophyll a-containing members of the OM60/NOR5 clade of marine gammaproteobacteria is carbon-starvation independent and correlates with the type of carbon source and oxygen availability. BMC 25 Microbiol. 13:117. Stuart RK, Mayali X, Lee JZ, Craig Everroad R, Hwang M, Bebout BM, et al. (2015). Cyanobacterial reuse of extracellular organic carbon in microbial mats. ISME J.     Table 3. Variant density (expressed as variants per gene length) for individual genes from subsystems with high variance score (excluding unknown genes and genes from unknown categories).

5
Supplemental Figures: Figure S1: A. Photograph of location of cores collected in the field from the microbial mats in Elkhorn Slough, Moss Landing, California on November 7, 2011. Individual samples collected in core tubes were numbered and could be tracked throughout the diel 5 experiment. B. Experimental apparatus used to incubate microbial mats throughout the diel period. Incubation containers containing cores used for control and molybdate treatments are labeled.
10 Figure S2: Kmergenie plots indicating optimal word sizes for assembly (k=29,45,63 were chosen) (A) and coverage v. length for k=29 word assembled scaffolds indicating a phylogenetic signal relating to contig size and coverage (B).