Biogeography of Photosynthetic Light-Harvesting Genes in Marine Phytoplankton

Background
Photosynthetic light-harvesting proteins are the mechanism by which energy enters the marine ecosystem. The dominant prokaryotic photoautotrophs are the cyanobacterial genera Prochlorococcus and Synechococcus that are defined by two distinct light-harvesting systems, chlorophyll-bound protein complexes or phycobilin-bound protein complexes, respectively. Here, we use the Global Ocean Sampling (GOS) Project as a unique and powerful tool to analyze the environmental diversity of photosynthetic light-harvesting genes in relation to available metadata including geographical location and physical and chemical environmental parameters.

Methods
All light-harvesting gene fragments and their metadata were obtained from the GOS database, aligned using ClustalX and classified phylogenetically. Each sequence has a name indicative of its geographic location; subsequent biogeographical analysis was performed by correlating light-harvesting gene budgets for each GOS station with surface chlorophyll concentration.

Conclusion/Significance
Using the GOS data, we have mapped the biogeography of light-harvesting genes in marine cyanobacteria on ocean-basin scales and show that an environmental gradient exists in which chlorophyll concentration is correlated to diversity of light-harvesting systems. Three functionally distinct types of light-harvesting genes are defined: (1) the phycobilisome (PBS) genes of Synechococcus; (2) the pcb genes of Prochlorococcus; and (3) the iron-stress-induced (isiA) genes present in some marine Synechococcus. At low chlorophyll concentrations, where nutrients are limited, the Pcb-type light-harvesting system shows greater genetic diversity; whereas at high chlorophyll concentrations, where nutrients are abundant, the PBS-type light-harvesting system shows higher genetic diversity. We interpret this as an environmental selection of specific photosynthetic strategy. Importantly, the unique light-harvesting system isiA is found in the iron-limited, high-nutrient low-chlorophyll region of the equatorial Pacific. This observation demonstrates the ecological importance of isiA genes in enabling marine Synechococcus to acclimate to iron limitation and suggests that the presence of this gene can be a natural biomarker for iron limitation in oceanic environments.

Synechococcus and Prochlorococcus are defined by two distinct light-harvesting (LH) systems that act as LH antennae for both types of photosynthetic reaction center, photosystem I (PSI) and photosystem II (PSII) [9,10]. The LH system in Synechococcus involves the phycobilisome (PBS), stacks of chromophorylated protein complexes located externally to the photosynthetic thylakoid membrane and encoded by the genes cpc (phycocyanin), cpe (phycoerythrin) and apc (allophycocyanin) [11]. Some Prochlorococcus strains have cpe genes; however, these are phylogenetically distinct from Synechococcus cpe [12] and no Prochlorococcus has been shown to synthesize a functional phycobilisome; indeed, the role of phycoerythrin is thought to be signal transduction rather than light harvesting [13]. The LH system in Prochlorococcus involves membrane-bound, chlorophyll-binding proteins (Pcbs) encoded by the pcb genes [14][15][16][17][18]. Some marine Synechococcus contain pcb-like genes that are induced under conditions of iron limitation and can be identified as a phylogenetically distinct group that includes the functionally characterized iron-stress-induced gene isiA; this gene is sometimes referred to as pcbD or pcbC/isiA and here is called isiA-like [6,19,20,21]. We can therefore define three functionally distinct types of LH genes in marine oxyphotobacteria: (1) the PBS genes of Synechococcus; (2) the pcb genes of Prochlorococcus; and (3) the iron-stress-induced genes (isiA-like) present in some marine Synechococcus.
The Global Ocean Sampling Project (GOS) is revolutionizing our understanding of the complexity of marine microbial communities that drive biogeochemical cycles [22][23][24][25][26]. It provides a unique and powerful tool with which the environmental diversity of a gene can be analyzed in relation to available metadata [27][28] such as geographical location and physical and chemical environmental parameters. Although the dataset is continuing to grow, this study analyzes the first one-third of the data: the <0.8 µm size fractions from 44 marine-surface stations of a transect of the Northern Atlantic through the Gulf of Mexico and into the equatorial Pacific. In this study, we analyze the environmental diversity and biogeography of the three functionally distinct groups of LH genes within the available GOS dataset and define the environmental parameters at which different photosynthetic strategies are successful.

Results
The environmental distribution of LH genes associated with different photosynthetic strategies was determined by phylogenetic analysis of all prokaryotic LH genes from marine stations of the GOS database [25] (Table 1). Of the 44 GOS stations, 19 had no hits for any prokaryotic LH peptides; these stations, representing only 14% of the sequenced GOS data, were either in coastal or temperate regions (N.E. Atlantic) and were dominated by eukaryotic cells >0.8 µm in size, or were from sites in the equatorial Pacific where the current sequenced metagenome size is very small [25]. At an e-value cut-off of 1E-10, 368 unique positive hits were recovered for peptides of the Pcb or IsiA-like LH-types and 221 for those of the Synechococcus PBS LH-type. This study therefore identified 589 prokaryotic LH genes within the GOS dataset. Figure 1a shows the results of phylogenetic analysis of the prokaryotic chlorophyll-binding LH peptides (Pcb and IsiA-like, also referred to as accessory chlorophyll-binding proteins, CBPs [20]) in the GOS and NCBI databases (see Methods). The overall distribution of Pcb and IsiA-like peptides can be categorized into three distinct groups that reflect the phylogenetic distribution of these genes from cultured representatives [6,19,20,29]. Group I comprises Pcbs from Prochlorococcus that, on the basis of laboratory culture experiments, are thought to act as LH antennae for the photosynthetic reaction center PSI [6,14,15,16,17]. Group II comprises Prochlorococcus Pcbs that, from laboratory studies, are thought to act as antennae for PSII [15]. Group III is phylogenetically similar to IsiA-like peptides of marine Synechococcus [6,29].
An important advantage of the GOS dataset is that genomic data can be analyzed in relation to the location at which the samples were obtained. Figure 1b shows that there is a greater diversity of unique genes in the pcb/isiA-like family at open-ocean stations compared with coastal stations. Phylogenetic studies on the PBS genes (Fig. 2 and Fig. S2) revealed similar phylogenetic relationships between environmental and cultured representatives of these genes, and showed that there is a greater diversity of PBS genes at coastal stations than at open-ocean stations. This relationship was used to identify the cpe genes that are phylogenetically related to cpe genes of Prochlorococcus strains (Fig. 2a and Fig. S2, group II); as these genes are not thought to be involved in light harvesting [13,16], they have been omitted from further analysis.
A prokaryotic LH gene budget has been calculated by determining the fraction of the total number of functional LH genes at each GOS station that represent pcb, PBS or isiA-like LH types. These budgets have been plotted against surface chlorophyll concentrations measured from satellite images taken at the time of sampling (Fig. 3). Chlorophyll concentration is used as a first-order indicator of phytoplankton gross biomass and can indicate that macronutrients were present in the environment; production of approximately 1 µg/L Chl a requires 1 µmol/L of available nitrate [30]. These plots of chlorophyll concentration and LH gene budget (Fig. 3) demonstrate that the environment selects for different photosynthetic strategies.
The pcb-type genes show greatest genetic diversity in low-macronutrient surface waters where chlorophyll concentrations are low (<0.35 µg/L Chl a) (Fig. 3a). The PBS genes show greatest genetic diversity in surface waters with higher chlorophyll concentrations and increased macronutrient availability (>0.35 µg/L Chl a); only genes of the PBS-type were present at GOS stations with >0.7 µg/L Chl a; these are omitted for clarity (Fig. 3b). The extent of genetic diversity within the isiA-like gene group shows no clear correlation with surface chlorophyll concentration (Fig. 3c); however, the distribution of this gene occupies the specific niche between environments that select for the Pcb- and PBS-type LH systems (0.26-0.51 µg/L Chl a). The biogeographic ranges of these LH strategies based on chlorophyll concentrations are shown in Fig. 4. The pcb-type LH strategy is dominant in the oligotrophic open ocean, where Prochlorococcus is the numerically dominant marine phytoplankton [2,3,8]. The PBS-type strategy is favored on the edge of the ocean gyres, in nutrient-upwelling zones and in some coastal environments, showing a good correlation with the known geographic dominance of Synechococcus [3,4,31]. Interestingly, the specific biogeography of the isiA-like strategy in the GOS database is in the vicinity of the Galapagos Islands, in close proximity to the sites of the classic iron-enrichment experiments IronExI and IronExII, which demonstrated that iron is the primary limiting trace element in this region [32]. The equatorial Pacific is the only prokaryotic-dominated high-nutrient low-chlorophyll (HNLC) marine ecosystem and is the only location in the GOS database at which the isiA-like gene is found with high genetic diversity.
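The chlorophyll ranges reported in this section can be summarised as a small lookup. The following is only an illustrative sketch: the function name `lh_regime` is hypothetical, the numeric thresholds are the ones quoted in the text (in the chlorophyll units used there), and treating them as sharp boundaries is a simplification that the underlying data do not claim.

```python
def lh_regime(chl_a: float) -> list[str]:
    """Return the LH gene types that the text associates with a surface Chl a value.

    Thresholds (0.35 for the pcb/PBS switch, 0.26-0.51 for the isiA-like niche)
    are taken from the Results; sharp cut-offs are an illustrative assumption.
    """
    if chl_a < 0:
        raise ValueError("chlorophyll concentration cannot be negative")
    regimes = []
    if chl_a < 0.35:            # low-chlorophyll waters: pcb-type most diverse
        regimes.append("pcb")
    if 0.26 <= chl_a <= 0.51:   # intermediate niche reported for isiA-like genes
        regimes.append("isiA-like")
    if chl_a > 0.35:            # higher-chlorophyll waters: PBS-type most diverse
        regimes.append("PBS")
    return regimes


for value in (0.10, 0.30, 0.40, 0.80):
    print(value, lh_regime(value))
```

The overlap between the ranges is deliberate: a station with an intermediate chlorophyll concentration can fall inside the isiA-like niche while also lying on either side of the pcb/PBS threshold.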

Discussion
The phylogenetic analysis of the pcb/isiA gene family from the GOS dataset (Fig. 1 and Fig. S1) resolves groups of functionally distinct genes similar to those recovered from analysis of genes in culture collections and environmental studies of the phylogeny of pcb genes [6]. This shows there is good coverage of the genetic capacity of the pcb/isiA gene family in current culture collections. The greater genetic diversity of the pcb/isiA gene family at open-ocean stations and of the PBS genes at coastal stations (Fig. 1b and Fig. 2b) reflects the known global environmental distribution of Prochlorococcus and Synechococcus cells [2][3][4][5], and suggests that a high genetic diversity of LH functional genes reflects positive selection in a marine environment [33,34].
Prochlorococcus species have been separated into two main ecotypes that are adapted to high-light (HL) or low-light (LL) conditions, with considerable further niche adaptation within these groupings [7,14,15,16]. The PSII-type pcb genes are the most diverse group at all GOS sampling stations (Fig. 1a). This probably reflects the low chlorophyll content of PSII core dimers compared with PSI core trimers [15], and the resulting need for PSII to be associated with an additional LH system to increase the functional cross-section of PSII. The PSI-type Pcbs are also consistently present in surface waters throughout the GOS sampling regions, although with lower diversity (Fig. 1b). This finding is consistent with those of Kettler et al. (2007), who demonstrated that many HL ecotypes of Prochlorococcus contain both PSI- and PSII-associated Pcbs [35]. This observation suggests that light intensity is not the main ecological selection pressure on Prochlorococcus photosynthetic strategy and that nutrient availability may be a more important factor in determining Prochlorococcus ecotype distribution [7]. Some extant representatives of Prochlorococcus have been shown to contain genes encoding the protein phycoerythrin (cpeB and cpeA) [13,35,36]. A total of 98 Prochlorococcus cpe genes (alpha and beta subunit incomplete sequences) were recovered from the GOS database (Fig. 2 and Fig. S2), the majority of which were cpeB; however, the functional relevance of PBS genes in surface populations of Prochlorococcus is unlikely to involve light harvesting [13,33], so these genes were omitted from further analysis in this study.
Table 1 legend: The total numbers of unique genes of each defined LH gene-type identified at each station are shown. Only samples in the size fraction <0.8 µm and from surface (5-m depth) marine stations were used in this analysis; non-marine (such as a hypersaline lagoon) stations were not used. Stations where no LH genes were found are either in the NE Atlantic, and so assumed to be dominated by large eukaryotic phytoplankton species >0.8 µm, or from stations in the equatorial and south Pacific where the size of the sequenced environmental genome is low. doi:10.1371/journal.pone.0004601.t001
The strong correlations between greater genetic diversity in a group of genes and surface chlorophyll concentration (Fig. 3) reflect the positive selection for LH gene-types in a particular environment [9,31,33,34,37]. In addition, these correlations are substantiated by the known energetic and functional characteristics of each LH-type. Pcb genes are dominant in low-macronutrient waters (<0.35 µg/L Chl a), which reflects the lower macronutrient input required for the cell to synthesize a functional pcb-type LH system compared with the PBS-type LH system [9], thereby making this photosynthetic strategy favoured in this environment [8,9,37]. At higher macronutrient concentrations (>0.35 µg/L Chl a), PBS production is energetically favoured and the PBS-type system has an advantage over Pcbs by preferentially absorbing in the range 550-650 nm, wavelengths that chlorophylls cannot absorb and that are predominant in waters sustaining a high biomass [35]. At Chl a concentrations >0.7 µg/L, nutrient concentrations are sufficiently high to sustain large (>0.8 µm) eukaryotic phytoplankton cells that use other LH complexes [21].
The isiA-like LH-type is found specifically at the interface of two geographically defined regions dominated by pcb-type or PBS-type LH systems. Here, the environment selects for a unique photosynthetic LH strategy in which a Synechococcus cell incorporates both Prochlorococcus (Pcb-type) and Synechococcus (PBS-type) LH antenna systems; the resulting "chimeric" cell can use each type of photosynthetic strategy and acclimate according to environmental conditions, thereby conferring a specific selective advantage and indicating that there is an environmental selection of photosynthetic strategy [31]. The observation that isiA-like genes are present in this region confirms that IsiA in the environment can alter the photosynthetic strategy of a cell and confer an advantage over cells with a PBS-only LH system [38][39][40]. The molecular function of the IsiA protein has been shown to be an antenna for PSI reaction centers [39,40], increasing the functional absorption cross-sectional area by 72% and enabling iron-limited cells to reduce the ratio of PSI:PSII such that the number of PSI centers is reduced [41]. As every functional PSII contains 3 iron atoms, compared with 12 in every functional PSI, this represents a significant reduction in iron quota per cell [39].
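The iron saving described above can be made concrete with a short calculation. This is a minimal sketch: the per-photosystem iron counts (3 for PSII, 12 for PSI) come from the text, whereas the PSI:PSII ratios of 4:1 and 1:1 are hypothetical values chosen only for illustration, and other iron-containing components of the electron transport chain are ignored.

```python
FE_PER_PSII = 3   # iron atoms per functional PSII (from the text)
FE_PER_PSI = 12   # iron atoms per functional PSI (from the text)


def iron_per_psii(psi_to_psii_ratio: float) -> float:
    """Photosystem-bound iron atoms per PSII for a given PSI:PSII ratio."""
    if psi_to_psii_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return FE_PER_PSII + FE_PER_PSI * psi_to_psii_ratio


# Hypothetical ratios: iron-replete cell (4:1) versus iron-limited cell (1:1).
replete = iron_per_psii(4.0)
limited = iron_per_psii(1.0)
saving = 1.0 - limited / replete

print(f"iron-replete: {replete:.0f} Fe per PSII")
print(f"iron-limited: {limited:.0f} Fe per PSII")
print(f"fractional saving: {saving:.2f}")
```

Because PSI carries four times as much iron as PSII, any reduction in the PSI:PSII ratio dominates the photosystem iron budget, which is why an antenna that enlarges the PSI absorption cross-section is an iron-sparing strategy.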
IsiA has been used to explain the specific photophysiology of phytoplankton communities in the equatorial iron-limited surface waters and recalculate global oceanic productivity [38]. Although the biogeography of isiA-like genes is consistent with this description, our current understanding of the function of the IsiA protein as a coupled antenna system for PSI is at odds with this interpretation.
Of the 11 sequenced marine Synechococcus species, 4 have been shown to contain the isiA-like gene [6,31]. Consistent with the biogeography of isiA-like genes outlined in this report, three of these extant marine Synechococcus species that contain isiA have been isolated from marine environments that are potentially iron-limited, including the Californian coastal upwelling zone [42] (Synechococcus sp CC9311, CC9605 and CC9902). isiA has also been reported in Synechococcus sp BL107, a strain isolated from ~100-m depth in the Mediterranean; it is unlikely that surface waters of the Mediterranean are iron-limited, but there is some evidence of sub-surface iron-limitation in stratified waters, although further study is required [43].
Another marine oxyphotobacterium in which isiA has been found is the diazotroph Trichodesmium [44,45]. Trichodesmium forms large colonies (>0.8 µm) and so would not be included in the current GOS dataset; however, it is widespread in many tropical and subtropical open-ocean gyres, where it has a key role in driving new production [44]. The distribution of isiA-containing Trichodesmium is at odds with the biogeography of the gene indicated from the GOS dataset. However, the iron requirements of Trichodesmium are considerably greater than those of marine Synechococcus because nitrogen fixation is a major sink for iron (photosynthetic electron transfer requires 23-24 iron atoms, nitrogen fixation requires an additional 19 iron atoms) [45]. Trichodesmium is therefore iron-limited at greater iron concentrations than non-diazotrophic oxyphotobacteria. The use of IsiA as an iron-efficient LH photosynthetic strategy may allow Trichodesmium to fix nitrogen and drive new production in many open-ocean environments.
Considering the known function of the IsiA protein, the highly restricted biogeography of isiA across the currently available GOS stations and the native location of sequenced marine Synechococcus species containing the isiA gene, we propose that the presence of isiA in the marine environment can be used as a natural biomarker of iron-limitation in prokaryotic communities [46]. This paper describes natural environmental gradients of different photosynthetic strategies in marine oxyphotobacteria on oceanic basin scales, and describes an evolutionary gradient of photosynthetic strategy from an ancestral LH system (PBS) [19,47] that required high nutrient inputs and available iron, to a strategy that evolved to exploit increasingly iron-limited ocean environments (IsiA). Cells that permanently use this latter strategy (Pcb) could exploit the vast macronutrient-limited open-ocean gyres. Having exploited these environmental niches, photosynthetic species (using Pcbs) have become the most abundant photosynthetic species on the planet, with a pivotal role in providing energy for the marine environment.
Figure 1. Phylogenetic analysis of the pcb/isiA light-harvesting gene family. A maximum-likelihood phylogenetic tree of the C-terminal region of Pcb/IsiA LH peptides (a). Pcb and IsiA proteins (for sequence details see Table 1) from the sequenced representatives of Prochlorococcus and Synechococcus in the NCBI database are included as references of phylogenetic classification. The tree was rooted from the middle point. Shading indicates the environmental location of recovered sequences (coastal, dark blue; open ocean, light blue). Three phylogenetic groups are resolved, see text for details (I, gray; II, yellow; III, pink). The bar corresponds to the average substitutions per site. Bootstrapping support numbers are shown. The pie chart (b) represents the metagenomic profile of LH genes identified at open-ocean or coastal locations. Referred sequences (unshaded): PcbA_ss120, PcbA of Prochlorococcus sp. CCMP1375 (SS120) (NP_875175); PcbB_ss120, NP_875561; PcbC_ss120, NP_875277; PcbD_ss120, NP_875559; PcbE_ss120, NP_875841; PcbF_ss120, NP_875679; PcbG_ss120, NP_875284; PcbH_ss120, NP_875566. PcbA_9211, PcbA of Prochlorococcus sp.

Searching for LH genes in GOS
All available GOS protein sequences studied were obtained from the CAMERA database [24,25] and were valid at the time of submission. The IsiA/Pcbs dataset was obtained by BLAST analysis of the metagenomic open-reading frame (ORF) peptide database in CAMERA (http://camera.calit2.net/) using seven selected IsiA/Pcb sequences, including Pcbs of Prochlorococcus sp CCMP1986 (NP_892745), Synechococcus sp CC9605 (YP_381894), Prochlorococcus sp CCMP1375 (NP_875175), Acaryochloris marina (AAS76629), Acaryochloris marina (AAS76628), and IsiAs of Synechocystis sp PCC6803 (NP_441268) and Synechococcus sp PCC7002 (P31157), with a lower cut-off (1E-10); any sequences that, by comparison with the sequenced NCBI data, were shown not to be isiA/pcbs were manually removed. The length of available GOS IsiA/Pcbs is between 50 and 352 amino acid residues. The PBS dataset was obtained by BLAST analysis of eleven sequences against the metagenomic ORF peptide database in CAMERA, including the R-phycocyanin alpha chain of Synechococcus sp WH8103 (P11394), C-phycoerythrin class II alpha and beta chains of Synechococcus sp WH8102 (NP_898100 and NP_898113), phycocyanin alpha and beta chains of Synechococcus sp RS9917 (ZP_01080760 and ZP_01079824), C-phycoerythrin class I alpha and beta chains of Synechococcus sp WH7803 (YP_001224209 and YP_001224208), phycoerythrin alpha chain of Prochlorococcus marinus str MIT9303 (YP_001018237), phycoerythrin beta chain of Prochlorococcus marinus str MIT9301 (YP_001090554), and allophycocyanin alpha and beta chains of Synechococcus elongatus PCC 6301 (YP_171896 and YP_171897). A total of 319 sequences related to the subunits of PBS were found in the available GOS database, of which 221 were shown by subsequent phylogenetic analysis to be similar to Synechococcus sequences. The length of those peptides ranged from 30 to 179 amino-acid residues.
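The hit-filtering step can be sketched as follows. This is not the CAMERA workflow itself: it assumes the search results are available as 12-column tab-separated BLAST output (query, subject, identity, length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), the function name `unique_hits` and the example identifiers are hypothetical, and the default cut-off of 1e-10 is an assumed reading of the threshold reported above.

```python
def unique_hits(blast_lines, max_evalue=1e-10):
    """Collect unique subject IDs whose e-value is at or below the cut-off.

    Expects 12-column tab-separated BLAST output; the e-value is column 11.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.add(subject_id)
    return hits


# Hypothetical records for illustration only (not real GOS identifiers).
example = [
    "\t".join(["query1", "GOS_hit_A", "62.1", "210", "70", "3",
               "1", "208", "5", "212", "3e-45", "160"]),
    "\t".join(["query1", "GOS_hit_B", "31.0", "90", "55", "4",
               "10", "98", "2", "91", "2e-05", "42"]),
    "\t".join(["query2", "GOS_hit_A", "58.4", "205", "80", "2",
               "3", "206", "5", "209", "1e-12", "75"]),
]
print(sorted(unique_hits(example)))
```

Collecting subject IDs in a set means that a metagenomic fragment matched by several query sequences is counted once, which mirrors the "unique positive hits" reported in the Results. The manual removal of non-isiA/pcb sequences described above is not reproduced here.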
The current GOS dataset consists of sequences in the <0.8-µm cell size fraction; this will encompass all known examples of Prochlorococcus (cell size range 0.5-0.7 µm, mean 0.6 µm) [8] and a sizeable fraction of the Synechococcus species (cell size range 0.6-1.6 µm, mean 0.9 µm). The lack of sampling of some Synechococcus species is a limitation of the current dataset. In this study, we assume that the environmental conditions experienced by the sampled Synechococcus species are indicative of the entire Synechococcus community at that location.

Sequence alignment
A preliminary sequence alignment was inferred by ClustalX Version 1.83 [48] with (i) gap opening penalty of 10.00, (ii) gap extension penalty of 0.2, (iii) the Gonnet series for protein weight matrix, and (iv) hydrophilic penalties for the following amino acids: G, P, S, N, D, Q, E, K and R. The alignments were refined manually based on structural information obtained from secondary structure analysis and also from crucial chlorophyll-binding amino-acid positions. The secondary structure predictions were obtained using the web-based program TMHMM Version 2.0 [49]. The average full length of IsiA/Pcbs is about 350 amino acids and the average length of alpha subunits of PBS is about 164 amino acids. To retain enough structural information, the shorter peptide sequences (shorter than 150 amino acids for Pcb/IsiA and shorter than 80 amino acids for PBS alpha subunits) were excluded from the phylogenetic tree estimation. The alignments were divided into two parts: one part including the sequences without the N-terminal region and the other part including the sequences without the C-terminal region. The known LH genes from genomes of Prochlorococcus and Synechococcus in the NCBI dataset were used as references for the phylogenetic classification.
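The length criterion used to exclude short fragments before tree estimation can be expressed as a simple filter. This is an illustrative sketch only: the minimum lengths are those stated above, while the function name, the family labels and the example sequences are hypothetical.

```python
# Minimum peptide lengths (amino acids) stated in the text.
MIN_LENGTH = {"pcb_isia": 150, "pbs_alpha": 80}


def filter_by_length(sequences, family):
    """Keep peptides that are not shorter than the family's minimum length.

    `sequences` maps a sequence name to its amino-acid string.
    """
    minimum = MIN_LENGTH[family]
    return {name: seq for name, seq in sequences.items() if len(seq) >= minimum}


# Hypothetical fragments built from a repeated residue, for illustration only.
fragments = {
    "frag_short": "M" * 149,
    "frag_ok": "M" * 150,
    "frag_long": "M" * 352,
}
kept = filter_by_length(fragments, "pcb_isia")
print(sorted(kept))
```

Fragments removed by this filter are excluded only from tree estimation; as described in the LH gene budget section, fragments of any size still contribute to the station budgets.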

Phylogenetic analysis
Phylogenetic trees were inferred by using maximum-likelihood (ML) and Bayesian methods. The ML tree was estimated with Phyml [50] under the WAG model [51], with 100 replicates to give the bootstrap values. Bayesian analysis was performed with MrBayes Version 3.1 [52,53]. Eight chains were run for the Metropolis-coupled Markov chain Monte Carlo model. Each chain ran for 1,000,000 generations and started with a flat prior for all trees. Trees were sampled from the chain every 100 generations. The 'burn-in' period covered the first 100,000 generations. A discrete-gamma model [53] with four categories was implemented to accommodate rate variation among sites for both ML and Bayesian analyses. Phylogenetic analysis is restricted by short-length metagenomic fragments; as a result, separate N-terminal region and C-terminal region phylogenetic trees were constructed using data from IsiA/Pcb (>150 amino acids) and PBS (>80 amino acids). Only 24 gene sequences were recovered for the C-terminal region of PBS genes, about 10% of total recovered PBS gene fragments, so no phylogenetic tree has been constructed for these data.
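The sampling settings above determine how many trees remain after burn-in. The short calculation below is a sketch under one simplifying assumption: the initial state (generation 0) is not counted as a sample. The function name is hypothetical.

```python
def retained_samples(generations: int, sample_every: int, burn_in: int) -> int:
    """Number of sampled trees left after discarding the burn-in period."""
    if sample_every <= 0 or burn_in > generations:
        raise ValueError("invalid sampling settings")
    total = generations // sample_every      # trees sampled over the whole run
    discarded = burn_in // sample_every      # trees sampled during burn-in
    return total - discarded


# Settings reported in the text: 1,000,000 generations, one sample per 100
# generations, first 100,000 generations treated as burn-in.
print(retained_samples(1_000_000, 100, 100_000))
```

With these settings, 10% of the sampled trees fall within the burn-in period and are discarded before the posterior is summarised.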

LH gene budget
LH budgets for each GOS station were determined by analysis of the proportion of the total number of unique protein fragments (of any size) recovered from each GOS station that, through sequence alignment and phylogenetic analysis, were classed as (1) Pcb, (2) IsiA-like and (3) PBS from Synechococcus. The statistical significance of the correlation of the proportion of each LH group at each GOS station with surface chlorophyll concentration (downloaded from the GOS dataset) was determined by Pearson's correlation on each dataset.
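The budget calculation and correlation test can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the study: the station counts and chlorophyll values below are invented for the example, the function names are hypothetical, and Pearson's r is computed directly from its definition without a significance test.

```python
from math import sqrt


def lh_budget(counts):
    """Convert per-type unique-gene counts at one station into fractions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("station has no LH genes")
    return {lh_type: n / total for lh_type, n in counts.items()}


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical stations: (surface Chl a, unique gene counts per LH type).
stations = [
    (0.1, {"pcb": 8, "isiA-like": 0, "PBS": 2}),
    (0.3, {"pcb": 5, "isiA-like": 1, "PBS": 4}),
    (0.6, {"pcb": 1, "isiA-like": 0, "PBS": 9}),
]
chl = [c for c, _ in stations]
pcb_fraction = [lh_budget(counts)["pcb"] for _, counts in stations]
r_pcb = pearson_r(chl, pcb_fraction)
print(f"r (pcb fraction vs Chl a) = {r_pcb:.3f}")
```

Stations without any LH genes have no defined budget, so the sketch raises an error for them; in practice such stations would be excluded before the correlation is computed.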

Distribution of isiA in sequenced marine Synechococcus
Eleven fully sequenced and annotated genomes of Synechococcus were analyzed using the IMG system (http://img.jgi.doe.gov/), which also provided information on the environmental location of each species. NCBI was used to probe for the presence or absence of isiA homologs within these genomes.
Figure S1. A maximum-likelihood phylogenetic tree of the N-terminal regions of Pcb/IsiA LH peptides. The Pcb and IsiA proteins from the sequenced representatives of Prochlorococcus and Synechococcus from the NCBI database are included as references for phylogenetic classification. The tree was rooted from the middle point. Shading indicates the environmental location of recovered sequences (coastal, dark blue; open ocean, light blue). Three phylogenetic groups are resolved (for details see Fig. 1). The bar corresponds to the average substitution per site. Bootstrapping support numbers are shown. The details of reference sequences (unshaded) are given below. Referred sequences in Figure S1: PcbA_ss120, PcbA of Prochlorococcus sp. CCMP1375 (SS120)
Figure S2. Phylogenetic analysis of the PBS light-harvesting gene family. (a) A maximum-likelihood phylogenetic tree of the N-terminal amino-acid sequences of PBS beta subunit peptides greater than 80 amino acids in length (CpcB, CpeB and ApcB) obtained during the GOS expedition. Shading indicates the environmental location of recovered sequences as coastal (dark blue) or open ocean (light blue). Group I refers to PBS sequences phylogenetically similar to the reference sequences from Synechococcus spp, whereas group II refers to PBS sequences phylogenetically similar to Prochlorococcus CpeB sequences, which were omitted from further analysis. The tree was rooted from the middle point. The bar corresponds to the average substitution per site. The pie chart (b) represents the metagenomic profile of LH genes identified at open-ocean or coastal locations (excluding the metagenomic sequences similar to CpeB of Prochlorococcus spp). The details of reference sequences (unshaded) are given in the supplementary data. Referred sequences in Figure S2: Cpeb_9301, phycobilisome protein of Prochlorococcus marinus str. MIT 9301 (YP_001090554); Cpeb_8102, C-phycoerythrin class I beta chain of Synechococcus sp. WH 8102, NP_898108; cpeb_307, C-phycoerythrin class I beta chain of Synechococcus sp. RCC307, YP_001228314; cpeb_7803, C-phycoerythrin class I beta chain of Synechococcus sp. WH 7803, YP_001224208; cpeb_9902, C-phycoerythrin class I beta chain of