Genetic Diversity Affects the Daily Transcriptional Oscillations of Marine Microbial Populations

Marine microbial communities are genetically diverse but have robust synchronized daily transcriptional patterns at the genus level that are similar across a wide variety of oceanic regions. We developed a microarray-inspired gene-centric approach to resolve transcription of closely-related but distinct strains/ecotypes in high-throughput sequence data. Applying this approach to the existing metatranscriptomics datasets collected from two different oceanic regions, we found unique and variable patterns of transcription by individual taxa within the abundant picocyanobacteria Prochlorococcus and Synechococcus, the alpha Proteobacterium Pelagibacter and the eukaryotic picophytoplankton Ostreococcus. The results demonstrate that marine microbial taxa respond differentially to variability in space and time in the ocean. These intra-genus individual transcriptional patterns underlie whole microbial community responses, and the approach developed here facilitates deeper insights into microbial population dynamics.


Introduction
Marine microbial communities play critical roles in the cycling of organic matter and nutrients and in food webs [1,2]. These complex communities are composed of diverse, poorly-characterized taxa, including many uncultivated lineages and strains [3][4][5][6]. Much has been learned about marine microbial communities using cultivation-independent methods, in particular, metagenomics and metatranscriptomics approaches enabled by high-throughput and nextgeneration nucleic acid sequencing [5,[6][7][8][9][10]. One recent discovery is that individual populations of phototrophic and heterotrophic plankton have complex but coordinated rhythmic daily transcription patterns that are highly conserved throughout oceanic microbial communities worldwide [11][12][13]. This was particularly surprising, since genome comparisons of closelyrelated cultivated and uncultivated taxa, such as the abundant phototrophs Prochlorococcus and Synechococcus, show that strains or ecotypes are diverse and variable in nucleotide sequence and gene content [10,[14][15][16][17][18]. Studies with cultured isolates showed that genotypic diversity among bacterial strains is reflected in gene regulation at the transcription and expression levels [19,20], and gene regulation has been recognized as an important factor driving evolution and adaption in microbes [21]. It is currently not clear how the potentially great physiological diversity implied by high genomic variability in marine microbial populations [5] could result in consistent, robust and synchronized daily gene transcription patterns in the environment.
The whole genome population binning (WGPB) approaches [11][12][13] for analysis of the high-throughput sequence data have the advantage of examining transcriptional responses across the whole genome, but without appropriate reference genomes, cannot usually resolve populations at the species or sub-species taxonomic levels. This high resolution is only possible when genome sequences are known and metagenomic/metatranscriptomic datasets have high coverage for these genomes [9]. At the current depth of sequencing, assigning short reads to individual taxa is only possible for taxonomically identifiable signature regions and for genes/ transcripts present in high abundance (for example, the <200 nucleotide long region of the proteorhodopsin gene has been examined in detail in order to evaluate Pelagibacter subtaxa contributions in [11]). Oligotyping [22] and minimum entropy decomposition [23] are powerful approaches that uncover finer community changes based on a marker gene, but both require that sequences cover the same region of the gene (for example, the V4-V5 hypervariable region of the 16S-rRNA gene). We developed a microarray-inspired gene centric (MAGC) approach that links sequences originated from different regions of a gene/transcript and resolves transcription of closely-related taxa. We applied MAGC to examine the contribution of individual microbial taxa to synchronized transcription patterns observed previously at the genus level [11,12] and examined transcription of specific functional genes by individual taxa within the cyanobacteria Prochlorococcus and Synechococcus, the proteorhodopsin-containing alpha proteobacterium Pelagibacter ubique (SAR11) and the eukaryotic picophytoplankton Ostreococcus. The results showed that closely related microbial taxa (defined as taxa that have greater than 95% nucleotide identity to the target gene sequence) have distinct and differential transcription patterns for ecologically-relevant functional genes, and these individual patterns may reflect the differences in microbial responses to environmental conditions.

Results and Discussion
The MAGC approach described in this study (Fig 1) is an in silico nucleotide sequence-query based method that uses 60-nucleotide (nt) long sequences (probes) previously designed for the MicroTOOLs environmental microarray [24]. The probes are specific for known marker genes for metabolic and cellular processes important for microbes living in pelagic oligotrophic environments. These processes include energy, carbon and nitrogen metabolisms, phosphonate and dimethylsulfoniopropionate utilization, nutrient transport, stress responses, DNA replication and cell division [25][26][27][28][29][30]. The ortholog sequences for each marker gene were collected from marine metagenomic and metatranscriptomic sequence datasets and marine microbial genome sequences. These sequences were clustered at 95% nucleotide identity, and a total of six probes were designed for each representative target sequence [24]. Thus, the~99,000 probes that are specific for 16,800 orthologs of 145 different functional genes target a diverse set of representative organisms and genes from pelagic microbial communities. In the MAGC approach, this MicroTOOLs probe set was used as a query set against metatranscriptomic sequence data to discriminate gene transcription at a high phylogenetic resolution (Fig 1). In this paper, we define a group of gene sequences or transcripts that shares 95% nt identity as a gene-Operational Taxonomic Unit (OTU) or transcript-OTU, respectively. Because six probes gene-OTU has six highly specific probes, and all of the 98,632 probe sequences are used as queries in BLASTN searches against the database of nucleotide sequences obtained from environmental samples (metagenomes or metatranscriptomes). The transcript-OTU abundances are estimated as the averages of reads with 95% nucleotide identity (nt) across the entire length of each probe. (B) Distribution of probes specific for the Synechococcus KORDI-49 rbcL gene shown with reads identified with these probes in a California Current System diel study. Three probes (indicated with shaded area) for Synechococcus KORDI-49 rbcL had 95% nt sequence identity to reads from this sample. The read ID and probe start position are shown above each read and probe, respectively. target the same individual transcript-OTU, and transcript abundance is estimated as the average of all probe hits (Fig 1), the MAGC approach links several reads over the length of the transcript to one transcript-OTU and shows OTU-specific patterns in gene transcription.
We used the MAGC approach to analyze previously published metatranscriptomic datasets that were obtained from a three-day duration, two-hour interval sampling in the North Pacific Subtropical Gyre (NPSG) [12] and a two-day duration, four-hour interval sampling in the California Current System (CCS) [11]. Both studies collected microbial biomass in the 0.2-5.0 μm size fraction from 23 m depth using the free-drifting robotic Environmental Sample Processor [11,12]. The CCS study consisted of 13 samples that were sequenced with a 454 GS FLX Titanium DNA sequencer (19 runs of 4.4 Gbp total, 450 bp long single reads), and the NPSG study consisted of 30 samples sequenced with an Illumina MiSeq sequencer (119 runs of 18.8 Gbp total, 150 bp long paired-end reads). The goal of this study was to investigate the transcriptional patterns of individual gene-OTUs within the cyanobacteria Prochlorococcus and Synechococcus, the alpha proteobacterium Pelagibacter and the eukaryote Ostreococcus during the diel cycle. The probes identified 0.68M (0.55% of total) and 0.38M (0.38% of total) sequence reads at 95% sequence identity in metatranscriptome data from the NPSG and CCS, respectively (Table A in S1 File). The number of identified reads per gene identified with MAGC were highly correlated (Pearson's r 0.92±0.07, n = 3) with read counts identified by the WGPB approach in [11,12]. A large fraction of the reads per gene were identified by MAGC, with the median of 53% of reads per gene per sample identified by MAGC to reads identified by WGPB (Tables 1 and 2). Interestingly, the probes used in MAGC were originally designed for sequences that were available publically before 2010 and obtained from samples collected from different regions of the oceans before 2008 [24]. A large portion of the sequences came from the metagenomic dataset from Station ALOHA in the NPSG collected in 2002 [7], the Global Ocean Sampling expedition collected during 2004-2006 [8,31], and metatranscriptomic datasets from the North Pacific, Southwest Pacific and Equatorial Atlantic Oceans collected during 2005-2007 [32][33][34]. The CCS and NPSG datasets were obtained in 2009 (CCS) and 2011, respectively [11,12]. The fact that the probes captured a significant fraction of the targeted transcripts from the CCS and NPSG datasets and the high correlation between the MAGC and WGPB methods imply that, perhaps, we have now largely characterized the diversity of sequences in the environment, at least at the current depth of sequencing, making it feasible to use more targeted approaches, such as MAGC, to address hypothesis-driven ecological questions.
The MAGC approach revealed that oscillating patterns of gene transcription in Prochlorococcus in the NPSG [12] were composed of distinct OTU-specific patterns (Fig 2). Seven hundred ninety-one unique Prochlorococcus transcript-OTUs were detected which represented 54 different genes (Table A in S1 File), and 55.6% of Prochlorococcus transcript-OTUs representing 42 genes had significant diel periodicity in their abundance. From the 42 periodically expressed genes, orthologs of 21 genes (63.5% of the periodic transcript-OTUs) had differential transcriptional patterns among Prochlorococcus taxa and included genes for DNA replication and repair and carbohydrate metabolism (Table 1). For example, the time of maximum transcription of the genes for the DNA replication initiation protein dnaA and for the key cell division protein ftsZ was distinct even among closely related high-light Prochlorococcus OTUs (Fig 2). This diversity in gene expression patterns was not resolved within binned Prochlorococcus populations using the WGPB approach, where transcription of both dnaA and ftsZ had periodic patterns with peaks at~17 h and 16 h, respectively [12]. In contrast to genes with different patterns of transcription among gene-OTUs (Table 1), 16 genes had the same transcriptional patterns for all detected Prochlorococcus gene-OTUs (Table B in S1 File and S1 Fig).
Prochlorococcus populations are comprised of genetically different strains that vary substantially in gene content [10,35,36]. Variations in the transcriptional patterns of genes such as DNA replication and cell division may reflect the differences in physiology, growth and cell division among these Prochlorococcus strains. Additionally, the fact that many genes with periodic transcription patters had differential transcription at the OTU level suggests differences in global regulatory mechanisms among the Prochlorococcus OTUs. Daily transcription patterns also varied among Synechococcus gene-OTUs in coastal California Current waters. Among 98 unique transcript-OTUs for 37 different genes in Synechococcus (Table A in S1 File), 63.6% of the transcript-OTUs had differential patterns over the diel cycle ( Table 2). Eighteen transcript-OTUs from ten different genes had significant periodic patterns,  Transcription for all OTUs shown was estimated based on at least three probes with the exception of the two OTUs indicated with *. Middle panels: Hierarchical clustering of transcription patterns (by Pearson and two genes had significantly different periodic transcriptional patterns among the Synechococcus OTUs. Interestingly, transcription of the cpc gene encoding the pigment phycocyanin and the cpaB1 gene encoding the type 1 phycoerythrin pigment had significant periodicity in the Synechococcus CC9902-like OTUs (e.g. cpaB1 sequence is within 95% nucleotide identity to cpaB1 sequence in cultured Synechococcus CC9902), but not in the co-existing Synechococcus CC9311-like OTU. The timing of maximum abundance of transcripts of the cpc and cpaB1 genes was distinct for each OTU (Fig 3A).  [12] shows that the results of the two approaches are consistent.
doi:10.1371/journal.pone.0146706.g002 both clades chromatically adapt by changing their pigment content to optimize light utilization [37]. The different daily transcription patterns for genes encoding pigments, RuBisCO and photosystem I indicate that the two Synechococcus clades were responding differentially to temporal and spatial variability of environmental factors such as light and nutrient availability [37][38][39]. In addition, distinctions in cpaB-1 and cpc transcription patterns among Synechococcus strains explains the lack of significant periodic patterns observed for these genes when clustered at the genus level [11]. The alpha proteobacteria SAR11 proteorhodopsin gene (bop) had a predominantly oscillating transcriptional pattern (with some variation) at the genus level [12], and the MAGC approach showed that these patterns varied substantially among different SAR11 OTUs in both the CCS and NPSG (Fig 3B, S3 Fig) (Fig 3B). The less abundant bop transcript-OTUs, comprising the remaining 40-60% of the transcripts, had oscillating peak transcript abundances. Thus, OTU-specific transcription demonstrates that one dominant OTU can mask detection of variability in individual patterns. The variable proteorhodopsin gene transcriptional patterns among OTUs suggest that genetic diversity among SAR11 strains [40] sometimes results in different daily gene transcription patterns. Proteorhodopsin is a light-driven proton motive pump, and while the effect of light on the transcription of the bop gene is debatable [41,42], proteorhodopsin is involved in SAR11 cell survival under carbon-limited conditions [43]. Thus, variation in bop transcript abundances among OTUs may reflect the differences in the metabolic status of the cells and reflect variability in environmental conditions (e.g. organic carbon concentration and chemical composition).
Transcripts recovered in the CCS were dominated by the eukaryotic phytoplankter Ostreococcus, which displayed significant transcriptional periodicity across many genes, including the rbcL gene encoding the large subunit of RuBisCo [11]. OTU-specific transcriptional patterns showed that there was a change in the relative composition of the picoeukaryote Ostreococcus populations (Fig 4). We detected two different Ostreococcus rbcL OTUs (Fig 4C and 4D), with 87% and 90% similarity to the chloroplast of O. tauri RCC1561, and both transcript-OTUs had similar patterns of abundances during the CCS study (Pearson 0.74). However, while transcripts from one Ostreococcus rbcL-OTU were very abundant over the two days sampled, transcripts from another OTU-rbcL increased by more than a factor of two in relative abundance by the end of sampling period. Transcription of rbcL by other eukaryotic phytoplankton also shifted, with the relative abundances of haptophyte and stramenopile transcripts decreasing significantly on the second day (Fig 4D). These transcript shifts coincided with a change in water masses by the end of the diel sampling in the CCS [11] (also in Fig 4B). Thus, the shift in rbcL transcript abundances indicate OTU-specific responses to changed conditions (for example, nutrient availability) [44] and the advection of new populations with different water masses.
Lagrangian sampling efforts, designed to sample the same water mass, are becoming more feasible to implement at sea using advanced robotic approaches, but deconvoluting the effects of spatial and temporal variability is still difficult. The metatranscriptomic samples [11,12] were collected with a robotic drifting sampler to approximate Lagrangian sampling, but the path of each drifter migrated between submesoscale features, with corresponding changes in salinity and other environmental conditions such as nutrient concentrations [11,45].
Differences in pstS (the gene encoding the high-affinity phosphate-binding protein that is a marker for phosphorus stress) transcript abundances along the NPSG sampling transect were particularly striking, with Prochlorococcus pstS-OTUs within 95% identity to sequences from MIT9215 and MIT9515 having highest transcript abundances in low-phosphate waters (Fig 5). Phosphorus is an important nutrient that can limit primary productivity in the ocean (Table D in S1 File, [45,46]), and these pstS OTU-specific patterns indicate that MIT9215 and MIT9515-like OTUs may be particularly sensitive to low phosphorus concentrations, relative to other high-light Prochlorococcus strains. Such OTU or strain-level differences are indicative of the specific niches occupied by different populations that lead to differences in growth rates as a function of phosphate concentration. Thus, the OTU-specific transcriptional patterns observed here reveal the differences in physiological statuses of co-existing members of the microbial community and their individual responses to environmental variability.  [11]. CTD cast stations that followed the drifting sampler are shown. (B) Chlorophyll a as a function of salinity during the period of the CCS transect showing that samples were taken from different water masses [11]. (C) Transcript composition and (D) transcriptional patterns of the rbcL gene by eukaryotic phytoplankton. Arrow indicates the direction of the drifting sampler.

Conclusions
The variability in transcription at the OTU or strain-level resolution helps explain how such genetically diverse, sympatric microbial communities [5,10] still retain robust daily patterns in gene transcription [13]. The OTU-specific patterns of gene transcription reported here show that closely-related but genetically distinct taxa indeed have unique transcription patterns, that reflect the temporal (Figs 2 and 3) and spatial (Figs 4 and 5) variability in the environment. The results also support the hypothesis that genetic diversity among microbial strains/ecotypes is reflected in gene expression patterns [19,20]. The MAGC method provides a powerful complementary approach to the WGPB approach to resolve individual gene transcription at the higher taxonomic level, and thereby reveal differential responses that would otherwise be undetectable. The dynamics at the species and sub-species levels underpins a vast array of microbial population genetic diversity. To better understand the fundamental factors that shape marine microbial community composition relative to environmental variability, more highly resolved taxon-specific information is extremely important. Higher resolution approaches may also provide deeper insights into microbiome variability and dynamics in a variety of different environmental contexts.

Methods
Metatranscriptome sequence data, analyzed in this study, was obtained from NCBI Short Read Archive database (http://www.ncbi.nlm.nih.gov/Traces/sra/) from projects under accession numbers SRA: SRP017469 (the two-day study in the California Current System, CCS [11] and SRA: SRP041215 (three-day study in the North Pacific Subtropical Gyre, NPSG [12]. The probes from the MicroTOOLs microarray [24] were obtained from NCBI Gene Expression Omnibus database (www.ncbi.nlm.nih.gov/geo/) under accession number GPL16706. The probes were further filtered to target sequences that are at least 330-nt long, and the final query database consisted of total 98,632 probes targeting 16,806 gene-OTUs of 145 unique functional genes, where each gene-OTU was targeted by six probes. The probes were designed by Roche NimbleGen (Madison, WI) and quality-checked as described previously [24]. Each set of six 60-nt long probes is specific to one gene-out; one probe could potentially show cross-specificity to another gene-OTU, but not all six probes. It is important to note that probes can be designed for any genes of interest using several approaches, e.g. using eArray (Agilent, earray.chem.agilent.com/earray/). In order to assign each read to a target probe, the probe sequences were used as queries in BLASTN search against the metatranscriptomic sequence databases on a local server with the following parameters: 20,000 hits per query and E value of at least 0.0001. The BLASTN results were sorted to leave only the hits with alignment length of 60 nucleotides and nucleotide identity at least 95% for all probes, except probes for Prochlorococcus psbA, for which at least 98% nucleotide identity was selected. Because paired-end Illumina MiSeq sequencing was done in SRP041215, and both pair reads target the same transcript region, only one Illumina read from a pair was selected for further processing. The abundance of a transcript-OTU was calculated as average hit count for all probes (example in Fig 1). Rarely (three instances total, all for the Prochlorococcus psaA gene), a read would hit with equal similarity to several probes from different gene-OTUs, and the read was assigned to all these probes. Reads with incorrect assignments showed high variability between the probes and were manually evaluated. Detected transcripts were determined if at least two probes for this transcript had hits in at least one sample, and the minimum total transcription values were at least 3 or 6 (which is equal to 6-36 reads per transcript) in the CCS and NPSG, respectively. Next, gene transcription within each genus was normalized to total read count for this genus in each sample and multiplied by 10 6 .
Transcriptional patterns were defined as the difference in transcript abundance in samples relative to the mean transcript abundance in all samples, and periodic (following a cosine curve over a 24-hour period) transcriptional patterns were identified using the Fourier score (R package 'cycle' [47]). The significance of Fourier score was assessed with the false discovery rate (FDR) using a total of 10,000 background models generated with the autoregressive processes of order (AR1) model. The AR1 model allows generating background models with the same distribution as the original dataset [48]. Due to the nature of environmental data, the chosen FDR threshold was higher than it would be for a dataset obtained from cultures, and the significantly periodic transcripts were defined as those that had FDR < 0.25.
Transcriptional patterns were clustered using hierarchical clustering (Pearson correlation for distance measure and complete agglomeration method for clustering), and the significance of cluster assignment was estimated using Approximately Unbiased (AU) P-values of >0.95 (AU P-values range from 0 to 1 and show how strong the cluster is supported by data), where the AU P-values were calculated from at least 1,000 mutliscale bootstrap resampling (pvclust R package [49]). The identified genes, periodicity and cluster assignment for both NPSG and CCS studies are presented in S1 Table. All reads found with the probes for Prochlorococcus, Synechococcus, SAR11 and Ostreococcus are listed in S2 Table. All data processing and analysis were done using RStudio and BioConductor [50] with the additional packages ggplot2 [51], seqinr [52] and ShortRead [53].
Supporting Information S1 File. Supporting Tables. Summary of Results Obtained with the MAGC Approach in Comparison to the WGPB Approach for Metatranscriptomes from the Two Oceanic Regions ( Table A). Genes with Similar Transcriptional Patterns among Prochlorococcus OTUs during the 72-h Study in the NPSG (Table B). Genes with Similar Transcriptional Patterns among Synechococcus OTUs during the 48-h Study in the CCS (Table C). Phosphate Concentrations and Primary Production during the NPSG Cruise (Table D). Top panels: Transcriptional composition detected by the MAGC approach, where transcription was normalized to the total Prochlorococcus hits in each sample, over time of day (X-axis in hours). OTUs are color-coded according to the heat map coloration below. Middle panels: Hierarchical clustering of transcriptional patterns (by Pearson correlation) for amt and coxA transcripts. Each row in the heatmap shows transcription of a unique OTU transcript, and each column is a time point within the 72 hour time-series. The OTU transcript ID are only shown, and affiliation for each transcript can be found in S1 Table. Bottom panels: Temporal patterns of total transcript abundances detected by MAGC (open circle) in this study and by WGPB (closed circle) [12] shows that the results of the two approaches are consistent.  [12] shows that the results of the two approaches are consistent. (TIF) S1