Diversity and Evolutionary History of Iron Metabolism Genes in Diatoms

Ferroproteins arose early in Earth’s history, prior to the emergence of oxygenic photosynthesis and the subsequent reduction of bioavailable iron. Today, iron availability limits primary productivity in about 30% of the world’s oceans. Diatoms, responsible for nearly half of oceanic primary production, have evolved molecular strategies for coping with variable iron concentrations. Our understanding of the evolutionary breadth of these strategies has been restricted by the limited number of species for which molecular sequence data are available. To uncover the diversity of strategies marine diatoms employ to meet cellular iron demands, we analyzed 367 newly released marine microbial eukaryotic transcriptomes, which include 47 diatom species. We focused on genes encoding proteins previously identified as having a role in iron management: iron uptake (high-affinity ferric reductase, multi-copper oxidase, and Fe(III) permease); iron storage (ferritin); iron-induced protein substitutions (flavodoxin/ferredoxin and plastocyanin/cytochrome c6); and defense against reactive oxygen species (superoxide dismutases). Homologs encoding the high-affinity iron uptake system components were detected across the four diatom Classes, suggesting an ancient origin for this pathway. Ferritin transcripts were also detected in all Classes, revealing a more widespread utilization of ferritin throughout diatoms than previously recognized. Flavodoxin and plastocyanin transcripts indicate possible alternative redox metal strategies. Predicted localization signals for ferredoxin identify multiple examples of gene transfer from the plastid to the nuclear genome. Transcripts encoding four superoxide dismutase metalloforms were detected, including a putative nickel-coordinating isozyme. Taken together, our results suggest that the majority of iron metabolism genes in diatoms are vertically inherited, with functional diversity achieved via possible neofunctionalization of paralogs.
This refined view of iron use strategies in diatoms elucidates the history of these adaptations, and provides potential molecular markers for determining the iron nutritional status of different diatom species in environmental samples.


Introduction
Earth's early oceans were rich in dissolved ferrous iron, which fostered the evolution of catalytic proteins that relied upon the redox potential of iron [1]. The onset of the Great Oxygenation Event approximately 2.3 Gya [2] caused iron(III) to precipitate out of seawater, transforming iron from a readily-available nutrient to a scarce commodity. Yet the legacy of ferroproteins persists with iron remaining an obligate cofactor of many essential metalloproteins. Photosynthetic organisms have particularly high iron requirements, with about half their total intracellular iron contained within photosynthetic proteins [3][4][5].
Approximately 20% of global photosynthesis is carried out by marine diatoms [6]. As members of the stramenopiles, diatoms appeared in the fossil record about 190 million years ago [7] and subsequently diverged into four Classes: the more ancient Coscinodiscophyceae and Mediophyceae diatoms (commonly referred to as centric diatoms), and the more recently diverged Fragilariophyceae and Bacillariophyceae diatoms (commonly known as pennate diatoms). Today, diatoms are one of the more species-rich groups of eukaryotic micro-organisms, able to bloom in both iron-rich coastal and iron-poor open ocean environments [8]. Diatoms rely on a diversity of strategies to meet cellular iron demands, including a high-affinity iron uptake system; iron storage capacity; substitutions of iron-requiring proteins with non-ferrous functional equivalents; and mechanisms to mitigate the risk of damage from reactive oxygen species produced in the presence of this redox-active metal [9][10][11].
The high-affinity reductive uptake system for iron was initially described in yeast [12][13][14] and consists of a high-affinity ferric reductase (FRE) that dissociates iron(III) from organic ligands; a multi-copper oxidase (MCO) that oxidizes the released iron(II) to iron(III); and an iron permease (FTR) that receives iron(III) from MCO for translocation across the cell membrane. Genes encoding putative ferric reductases [15], the putative multi-copper oxidase [16] and an iron(III) permease [17] have been detected in a limited number of examined diatoms.
Once inside the cell, intracellular concentrations of iron must be tightly regulated to avoid oxidative damage. The best understood system for storing iron is ferritin (FTN), which self-assembles into a multimeric nanocage that sequesters iron(III) within its spherical structure [18]. Initially, genes encoding ferritin were conspicuously absent from the stramenopiles with publicly available whole genome sequences. The discovery in 2009 of FTN in a subset of diatoms led to the hypothesis that acquisition of this gene may have facilitated expansion of diatoms into the low-iron environment of the open ocean [19].
Low iron availability in today's oceans appears to have driven the evolution of proteins that are functionally equivalent to ferroproteins but do not use iron as a cofactor. Common examples are found in the photosynthetic electron transfer chain. The iron-requiring ferredoxin (encoded by petF) can be replaced by flavodoxin (encoded by FLDA), which uses flavin, rather than iron, as the redox cofactor [10,20,21,22]. Two isoforms encoded by a clade I and a clade II FLDA differ in their response to iron availability, with the clade II FLDA transcript abundance apparently regulated by iron levels [21]. The gene encoding ferredoxin appears plastid-encoded in most diatoms [23,24], although an evolutionarily recent transfer of the gene to the nucleus was reported for Thalassiosira oceanica [25]. A second substitution example is replacement of the iron-requiring cytochrome c6 (CYTC6) with the copper-coordinating plastocyanin (PCYN). Thus far, this replacement has only been observed in T. oceanica [26], though detection of plastocyanin transcripts has been reported for Pseudo-nitzschia granii and Fragilariopsis cylindrus [9].
Superoxide dismutases (SODs) combat the formation of reactive oxygen species in the presence of redox-active metals like iron, catalyzing the transformation of superoxide (O2−) into molecular oxygen and hydrogen peroxide [27]. Four types of SODs are defined by the use of different metal cofactors: Fe, Mn, Cu-Zn, or Ni. The Fe- and Mn-binding SODs are structurally similar and likely diverged following an ancient gene duplication [27,28]. Cu-Zn- and Ni-utilizing SODs are evolutionarily distinct from each other and from the Fe/MnSODs and may represent convergent evolution of similar function [29]. NiSODs were recently recognized in eukaryotes, having now been identified in the diatom Phaeodactylum tricornutum as well as the prasinophytes Ostreococcus and Micromonas [30][31][32].
Limited availability of genetic data from diatoms has hindered a better understanding of the influence of various iron utilization strategies on the distribution and ecological success of diatom species. The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [33] has greatly expanded our knowledge of the breadth and depth of functional genetic diversity of marine microeukaryotes. At the time of this study, the MMETSP consisted of 367 transcriptomes derived from 151 genera and included 77 diatom transcriptomes from 47 species and 31 genera. With these data, in conjunction with existing whole genomes from other microeukaryotes, we examined the diversity and evolutionary history of genes required for different iron metabolic strategies in diatoms.

Components of a reductive uptake system appear in the four diatom Classes
Transcripts encoding at least one component of the high-affinity iron uptake system (ferric reductase, FRE; multi-copper oxidase, MCO; and/or iron(III) permease, FTR) were identified in all but 3 diatom species based on the conservation of motifs originally defined in Saccharomyces cerevisiae [12][13][14] (Figs 1 and 2). Cyclophora radiata, Proboscia inermis, and Pseudo-nitzschia granii lacked evidence of transcription of any of the three components, even though P. granii was grown under iron-limiting conditions. Many species transcribed more than one copy of FRE, which encodes the ferric reductase (Fig 1). FRE copies were identified based on the presence of key motifs, including an FAD binding site. The two FRE paralogs identified in T. pseudonana [15] fall within both clades, while the two paralogs originally identified in P. tricornutum [15] group within FRE-I; the two FRE paralogs (ScFRE1, 2) identified in S. cerevisiae [12] group within the FRE-II clade.
Multicopper oxidases (MCOs) are members of a multi-protein family defined by the presence of four conserved copper-binding motifs with Cu-coordinating histidine or cysteine residues. Within this family, ferroxidases (FET) are MCOs with iron(II) oxidizing capacity. The identification of a multicopper oxidase coding gene in T. pseudonana was based upon homology of copper-binding motifs to yeast FET homologs [15]. The additional diatom sequences identified here contain all four Cu-coordinating motifs, with the exceptions of members of genera Thalassiosira, Skeletonema, Odontella, Detonula, and Thalassionema, which display incomplete motifs at the second and fourth Cu-coordinating sites observed in MCO paralogs (S3 Fig). Ferroxidase activity of the multicopper oxidase Fet3p in S. cerevisiae is inferred to be due to three potential residues, E185, D283, and D409 [14]. Alignment of diatom sequences with yeast Fet3p does not reveal conservation of these residues, though other acidic residues are seen adjacent to these sites. Phylogenetic analysis clusters putative MCOs from marine microeukaryotes into multiple clades, likely reflecting functional variants. MCO sequences with known ferroxidase function from four fungi (Blastobotrys adeninivorans AFET3, Cryptococcus neoformans CNLAC1, Phanerochaete chrysosporium MCO-4) join a well-supported clade that includes FET from T. pseudonana (S4 Fig); an association observed in previous studies [16].
Iron(III) permeases possess dual [REXXE] motifs as defined for S. cerevisiae (ScFTR) [13]. These residues are maintained in the putative diatom FTR transcripts (S5 Fig). Transcripts for a single copy of FTR were detected in about half the diatoms across the four Classes, with two paralogs detected in six species (Fig 1). Diatom FTR forms a paraphyletic clade, owing to the inclusion of several non-diatoms, including the ciliate Tiarina fusus, the chlorophyte Ostreococcus mediterraneus, and the dinoflagellates Dinophysis acuminata and Kryptoperidinium foliaceum, the latter of which contains a diatom endosymbiont (S6 Fig).
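The dual-motif criterion for FTR can be illustrated with a short pattern search. This is a minimal sketch, not the authors' pipeline: it assumes that REXXE means the residues R and E, two arbitrary residues, then E, and the function names and toy sequence are hypothetical.

```python
import re

# R-E-x-x-E, where x is any residue. The lookahead makes the search
# report overlapping occurrences as well.
REXXE = re.compile(r"(?=RE..E)")

def find_rexxe(protein_seq):
    """Return 1-based start positions of every REXXE match in a protein sequence."""
    return [m.start() + 1 for m in REXXE.finditer(protein_seq.upper())]

def has_dual_rexxe(protein_seq):
    """True if at least two REXXE motifs are present (the 'dual motif' criterion)."""
    return len(find_rexxe(protein_seq)) >= 2

if __name__ == "__main__":
    toy = "MKTREGLEAAAAREAVEPL"  # invented sequence, not a real FTR
    print(find_rexxe(toy), has_dual_rexxe(toy))
```

A match on sequence pattern alone is only a screening step; candidate sequences would still need to be checked in an alignment against a characterized permease such as ScFTR.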

FTN is present in every diatom Class, with divergent paralogs found
Ferritin is defined by the presence of key residues at the ferroxidase centers, as confirmed in P. multiseries FTN by Marchetti et al. [19], and conserved residue pairs shown to be essential for iron release in Rana catesbeiana [34]. Putative FTN transcripts were detected across the four Classes of diatoms, with at least one copy identified in 33 out of 54 examined species from 21 different genera (Fig 1). The ferroxidase residues are conserved within all diatoms except P. tricornutum and Nanofrustulum sp., which both differ at one residue (S7 Fig). There is less conservation at sites corresponding to iron release in R. catesbeiana (S7 Fig). Inclusion of bacterial sequences in the ferritin phylogenetic tree resulted in a strongly supported branch composed primarily of diatoms and cyanobacteria, separate from other eukaryotes and heterotrophic bacteria (Fig 3). The diatom clade contains sequences from all four diatom Classes (Figs 1 and 4). A subset of diatom ferritin, henceforth referred to as Group II ferritin (FTN-II), has high bootstrap support (Fig 4) and shows distinct residue differences at the C-terminus from Group I ferritin (FTN-I) (S7 Fig). Transcripts from two dinoflagellates and from an unclassified pedinellid silicoflagellate grouped within the diatom clade (Fig 4). The dinoflagellates Glenodinium foliaceum and Kryptoperidinium foliaceum both transcribed sequences that grouped closely with sequences from Nitzschia and Cylindrotheca (Bacillariophyceae) (Fig 4). The putative ferritins from two diatoms, one identified in the P. tricornutum genome and one of the four putative ferritins from Nanofrustulum sp., were most similar to each other and grouped with heterotrophic bacteria rather than within the diatom clade (Fig 3).
Fig 1. Presence or absence of detected transcripts in diatoms per species or subspecies. Sampled diatoms are shown with three select outgroups. Numbers indicate total unique copy variants as defined by the number of independently clustering paralogs. Abbreviations: ferritin, FTN; flavodoxin, FLAV; ferredoxin, FER; plastocyanin, PCYN; cytochrome c6, CytC6; superoxide dismutase, SOD. Eukaryote-only 18S sequences curated and aligned by SILVA (www.arb-silva.de) were re-aligned with the 18S sequences available for all MMETSP samples and for genomes and non-MMETSP transcriptomes included in our analyses (see Methods). Tree drawn with FastTree. Diatoms all fell within a single clade; all other eukaryote branches except for select outgroups are not shown. doi:10.1371/journal.pone.0129081.g001
Putative ferritins derived from 5 non-diatom stramenopile genera were distributed across the ferritin tree in a manner that did not match their 18S rDNA phylogeny (Fig 3), including the presence of the silicoflagellate sequence within the diatom cluster (Fig 4). Ferritin from the pelagophyte Chrysoreinhardia sp. and the dictyophyte Rhizochromulina marina clustered within a clade otherwise composed primarily of haptophytes and chlorophytes (Fig 3). Similarly, a sequence from the synurophyte Mallomonas sp. clustered with a clade of red algae and a sequence from the chrysophyte Ochromonas sp.

Redox protein pairs Ferredoxin/Flavodoxin
The gene encoding ferredoxin (petF) has been previously observed to be plastid-encoded in diatoms, although a nuclear-encoded version of petF was recently detected in T. oceanica [25]. In our study, transcripts encoding ferredoxin were identified in about a third (17) of the examined species of diatoms (Fig 1). The ferredoxin sequences all contain the four cysteine residues implicated in 2Fe-2S binding in ferredoxin from the green alga Chlorella fusca [35] at Cys37, Cys42, Cys45 and Cys75 (Fig 5). The petF transcripts from nine species display a putative plastid localization signal with accepted variations in the "ASAFAP" motif [25,36], suggesting a migration of the plastid genome-encoded petF to the nuclear genome (Fig 5). The majority of putative petF transcripts that encode a plastid localization signal cluster on a phylogenetic tree with the known diatom plastid-encoded petF (S8 Fig).
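The four-cysteine check can be sketched as a lookup of reference-numbered positions in a pairwise alignment. This is an illustrative sketch under stated assumptions, not the method used in the study: it assumes the query has already been aligned to the Chlorella fusca reference and that both are supplied as equal-length gapped strings; the function names are hypothetical.

```python
# Reference positions (1-based, Chlorella fusca numbering) quoted in the text.
FES_CYS_POSITIONS = (37, 42, 45, 75)

def residues_at_reference_positions(ref_aln, query_aln, positions):
    """Map 1-based reference positions to the query residue in the same column."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(positions)
    found = {}
    ref_pos = 0
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char == "-":
            continue  # gap in the reference: no reference position is consumed
        ref_pos += 1
        if ref_pos in wanted:
            found[ref_pos] = query_char
    return found

def has_fes_cysteines(ref_aln, query_aln):
    """True if the query carries Cys at all four 2Fe-2S ligand positions."""
    residues = residues_at_reference_positions(ref_aln, query_aln, FES_CYS_POSITIONS)
    return all(residues.get(p) == "C" for p in FES_CYS_POSITIONS)
```

Counting positions along the reference rather than along alignment columns keeps the numbering stable when the query contains insertions.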
Diatoms possess two flavodoxin isoforms, 'long' flavodoxin (FLDAl) and 'short' flavodoxin (FLDAs), that cluster into phylogenetic clades I and II, respectively [21] (S9 Fig). Transcription of clade II FLDAs from Thalassiosira weissflogii and Thalassiosira oceanica is responsive to iron levels and results in production of a flavodoxin that putatively replaces ferredoxin under iron stress [10,21]. The function of clade I flavodoxin is currently unknown. Transcripts for the clade I FLDAl were detected in all four diatom Classes, but with more frequent detection in the Coscinodiscophyceae and Mediophyceae (Fig 1). Transcripts for the clade II FLDAs were detected in over half the examined species from all four major diatom Classes regardless of experimental conditions, most of which included iron-replete media. Three species (Proboscia inermis, Synedropsis recta, and Fragilariopsis kerguelensis) transcribed 2 or more distinct paralogs of clade II FLDAs (S9 Fig).
Fig 3. Diatom ferritin is evolutionarily distinct from eukaryotic ferritin. Midpoint-rooted maximum likelihood (ML) tree generated with RAxML using PROTGAMMAWAG model. Numbers at major nodes indicate ratio of bootstrap support values from 1,000 trees; values less than 0.5 removed for clarity. At major nodes, where the branching structure is in agreement with Bayesian inference, posterior probabilities are listed to the right of bootstrap values following a forward slash. Consensus trees were generated with MrBayes v3.1.2 from 1,000,000 generations, with trees sampled every 500 generations. Clades colored by dominant organismal phylogeny: diatoms, orange; mixed clades, grey; cyanobacteria, blue-green; haptophytes and cryptophytes, purple; rhodophytes, red; chlorophytes, green; opisthokonts, brown; alveolates, blue; and heterotrophic bacteria, yellow. Ratio of labeled phylogeny over total taxa in group given beside wedge label. Nodes with multiple nodes from the same taxonomic unit were collapsed.

Redox protein pairs Cytochrome c6/Plastocyanin
Transcripts encoding cytochrome c6 (CYTC6) were detected in at least one representative transcriptome from each of the examined diatom species, with the exception of Thalassionema frauenfeldii (Fig 1). The copper-containing redox protein plastocyanin can substitute for cytochrome c6 in the green alga Chlamydomonas [37]. Transcripts encoding putative plastocyanin (PCYN) were detected in at least one species from each of the major diatom Classes (Fig 1). Pseudo-nitzschia heimii and Fragilariopsis kerguelensis, both open-ocean species, possess three distinct copies of putative PCYN in contrast to other species in which only one variant was detected. All identified sequences maintain conservation of the canonical Cu-coordinating residues demonstrated in Populus nigra [38], with the notable exception of one copy from P. heimii, which does not conserve 2 of the 4 positions (S10 Fig). The diatom sequences cluster as a monophyletic clade that appears to share a common ancestor with other chromalveolates. Both F. kerguelensis and P. heimii exhibit one copy that does not cluster with other diatom sequences (Fig 6).

Preferential transcription of FeSOD in basal diatoms; Cu-ZnSOD in derived diatoms
Transcripts encoding iron and manganese superoxide dismutase (FeSOD and MnSOD) were distinguished by the presence of key metal-coordinating residues (FeSOD: Q-77 and A-146, MnSOD: G-77 and Q-146) (S11 Fig) [39] and were identified in a wide variety of eukaryotic groups (Fig 7). MnSOD transcripts were detected for all examined diatoms, with most diatoms transcribing two paralogs (Fig 1). Transcripts for the closely related FeSOD were more taxonomically restricted. About half the Coscinodiscophyceae transcribed one or more FeSOD paralogs and all but one species of Mediophyceae transcribed a single copy. Less than a quarter of the more recently diverged Fragilariophyceae and none of the Bacillariophyceae transcribed FeSOD homologs. The two MnSOD paralogs are separated from each other and from FeSOD by well-supported branches (Fig 8).
Fig 4. Ferritin phylogenetic tree of diatom clade. Maximum likelihood (ML) tree generated with RAxML using PROTGAMMAWAG model. Numbers beside branches indicate bootstrap support values from 1,000 trees; values under 50 removed for clarity. At major nodes, where the branching structure is in agreement with Bayesian inference, posterior probabilities are listed to the right of bootstrap values following a forward slash. Consensus trees were generated with MrBayes v3.1.2 from 1,000,000 generations, with trees sampled every 500 generations. Branches colored by organismal phylogeny: diatoms, orange; unclassified pedinellid, light green; dinoflagellates, blue. Genus and species are given followed by sequence source and ID. Multiple adjacent tips from the same taxonomic unit were collapsed, with number of members given in brackets.
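The residue rule quoted for separating FeSOD from MnSOD (Q-77 and A-146 versus G-77 and Q-146) lends itself to a small lookup. The sketch below is illustrative only and is not the authors' code: it assumes the caller already knows which alignment columns correspond to positions 77 and 146 in the reference numbering, and the function and argument names are hypothetical.

```python
# Diagnostic residue pairs at positions 77 and 146, as quoted in the text.
SOD_SIGNATURES = {
    ("Q", "A"): "FeSOD",
    ("G", "Q"): "MnSOD",
}

def classify_sod(aligned_seq, col77, col146):
    """Classify one aligned sequence as 'FeSOD', 'MnSOD', or 'undetermined'.

    col77 and col146 are 0-based alignment columns that correspond to
    reference positions 77 and 146 (supplied by the caller; an assumption).
    """
    pair = (aligned_seq[col77].upper(), aligned_seq[col146].upper())
    return SOD_SIGNATURES.get(pair, "undetermined")

if __name__ == "__main__":
    # Invented ungapped toy sequence: column 76 holds Q, column 145 holds A.
    toy = "M" * 76 + "Q" + "L" * 68 + "A" + "K" * 4
    print(classify_sod(toy, 76, 145))
```

Returning "undetermined" for any other residue pair avoids forcing a call on sequences that match neither signature, including gapped or divergent homologs.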
The distribution of putative Cu-Zn versus FeSODs in the examined diatoms displayed opposing trends. The basal Coscinodiscophyceae and Mediophyceae lineages more frequently displayed transcripts for the FeSODs, although transcripts corresponding to Cu-ZnSODs were also detected throughout these two Classes (Fig 1). In contrast, the more recently diverged Fragilariophyceae showed a bias towards Cu-ZnSOD rather than FeSOD transcripts; only Asterionellopsis glacialis did not reveal a Cu-ZnSOD, instead transcribing an FeSOD, while Thalassionema frauenfeldii displayed both forms. No FeSOD transcripts were detected within the 18 examined species of Bacillariophyceae; all instead transcribed the Cu-ZnSOD.
Fig 6. Relationship of diatom PCYN to other phytoplankton. Midpoint-rooted approximately-maximum-likelihood tree of putative and known PCYN. Support values shown for deep nodes, with values under 0.5 removed for clarity. Branches outside of diatom clade are collapsed, with dominant phylogenetic composition, ratio of dominant taxa to total taxa, number of species in the collapsed clade, and number of sequences. At major nodes, where the branching structure is in agreement with Bayesian inference, posterior probabilities are listed to the right of bootstrap values following a forward slash. Consensus trees were generated with MrBayes v3.1.2 from 2,400,000 generations, with trees sampled every 500 generations. One representative is shown from groups sharing greater than 95% similarity in aligned sequence identity. Branches colored by organismal phylogeny: diatoms, orange; chlorophytes, green; haptophytes and cryptophytes, lavender; non-diatom stramenopiles, magenta; alveolates, blue; opisthokonts and amoebozoa, brown; mixed clades, gray. Previously identified PCYN from Thalassiosira oceanica noted in bold. Distinct P. heimii and F. kerguelensis paralogs shown with red or blue dots, respectively.
Outer border wedges indicate putative function as FeSOD, MnSOD I, or MnSOD II. Inner border wedges indicate dominant phylogenetic composition: diatoms, orange; mixed clades, grey; haptophytes and cryptophytes, purple; chlorophytes, green; alveolates, blue; and non-diatom stramenopiles, magenta. Decimals at edges of inside wedges give ratio of labeled phylogeny over total taxa in group. Bootstraps omitted for clarity. One representative is shown from groups sharing greater than 95% similarity in aligned sequence identity.

NiSOD transcription is common in diatoms
A putative NiSOD-encoding transcript (NiSOD) was detected in forty-five (85%) of the queried diatom species (Fig 1), with the majority of detected NiSODs encoded immediately downstream of ubiquitin by an ancient UBQ-NiSOD fusion gene (Fig 9, S13 Fig). These putative NiSODs conserve the Ni-hook motif required for superoxide dismutase activity [42]. Nine species transcribed a second NiSOD paralog without an apparent fusion to UBQ. A majority of UBQ-lacking NiSOD transcripts from across the marine microeukaryotes branch more deeply than those NiSODs that possess the UBQ presequence, suggesting that the presequence is a derived state (S13 Fig).

Discussion
We used the sequence data generated for the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [33] to investigate the prevalence of select iron metabolism genes across diatoms. Previous research on the metabolic capabilities of these organisms had necessarily been limited by sequence data derived from a relatively low number of samples. Here, we re-evaluate previous hypotheses in light of the expanded resolution and breadth of organismal diversity enabled by the MMETSP. Of the 77 examined diatom transcriptomes, nine were derived from four diatom species grown under potentially iron-limiting conditions (media with less than 60 nmol Fe L⁻¹): Thalassiosira weissflogii (MMETSP0878-0881), Chaetoceros sp. (MMETSP0200), Fragilariopsis kerguelensis (MMETSP0735, 0736), and Skeletonema marinoi (MMETSP0319, 0320). A publicly available transcriptome of Pseudo-nitzschia granii grown under iron-limiting conditions was also included. Despite the majority of cultures having been grown in iron-replete conditions, we detected transcripts for genes more commonly associated with iron limitation. Given this bias in growth conditions, it is important to note that the absence of detected transcripts in any given sample may mean that the culture conditions tested did not result in significant transcription of a gene. Due to differences in culturing conditions inherent to a distributed collaborative effort, we cannot confidently compare quantitative read counts of genes between samples. Therefore, these data provide information about the presence, but not absence or differential transcription, of particular sequences. Taken together, the data reveal a more complete picture of the distribution of iron metabolism genes across taxa, challenging assumptions and providing insight into gene origins, copy number, and divergent functions and affinities.

Revisiting hypotheses of lateral gene transfer
Lateral gene transfer (LGT) in eukaryotes is a recognized mechanism for the gain of new biochemical functions, increasing the potential for expansion into new ecological niches. Accurate detection of LGT, however, requires significant representation of sequences across a comprehensive taxonomic range [43]. The MMETSP data reveal one case that supports, with modifications, a hypothesis of LGT origin (FTN), a second case that expands upon a hypothesis of endosymbiotic gene transfer (petF), and two cases where previously hypothesized LGT instead likely reflects vertical inheritance within diatoms (NiSOD and PCYN).
Prior to the MMETSP, the handful of available ferritin sequences from diatoms came from pennate lineages. FTN sequences from pennate diatoms were separated from the other eukaryotes by long branches on a phylogenetic tree. The dissimilarity of the pennate diatom FTN from other eukaryotes, together with its apparent absence from other stramenopiles supported the original hypothesis of LGT acquisition of FTN in pennates [19]. The MMETSP data serve to extend and modify the story by showing FTN transcription in all four extant diatom Classes (Fig 1), while remaining largely absent from other stramenopiles. The additional diatom sequences identified here confirm that diatom ferritin does not branch with other eukaryotic ferritins and instead branches with cyanobacterial true ferritin, which is believed to also possess iron storage functionality, suggesting an ancient acquisition of FTN from this group (Fig 3). Two putative diatom FTN sequences appear more closely related to those from heterotrophic bacteria rather than photosynthetic cyanobacteria. One of the "bacteria-like" sequences is found in the Phaeodactylum tricornutum whole genome sequence [44] and groups closely with a sequence from a different diatom species, Nanofrustulum sp., suggesting that these sequences may have been acquired as separate LGT events from closely related heterotrophic bacteria (Fig 3).
Embedded within the clade of diatom ferritins are sequences derived from two dinoflagellates and one silicoflagellate (Fig 4). The two dinoflagellates (G. foliaceum and K. foliaceum) are closely related 'dinotoms' whose plastids are derived from a diatom endosymbiont [45], likely a member of the Bacillariophyceae [46]. The close affiliation of the dinotom FTN sequences with Nitzschia, Cylindrotheca and Nanofrustulum suggests that they are derived from their diatom endosymbiont rather than the dinoflagellate host. In contrast, the unrelated silicoflagellate may have acquired ferritin through the phagotrophy of a diatom, specifically a member of the Mediophyceae.
The absence of FTN in some diatoms, such as certain members of Thalassiosira, may be tolerated due to the presence of an alternative, non-ferritin based iron storage system. Such a system has been proposed for both the brown alga Ectocarpus siliculosus [47] and the Thalassiosiroid diatoms [48], both of which lack a known genomic copy of FTN and appear to store iron in mineralized clusters.
Similar to LGT, endosymbiotic gene transfer (EGT) is the acquisition of genetic material from outside the host genome, in this case from an endosymbiont or endosymbiotically-derived organelle. We identified evidence for transfer of petF, which encodes ferredoxin, from the plastid to the nuclear genome. petF transcripts from multiple species across classes were detected despite the use of mRNA isolation protocols that bias against plastid-encoded transcripts. A majority of these transcripts appear to encode a plastid-targeting peptide (Fig 5) similar to that detected for the T. oceanica variant [25], including conserved motifs recognized for plastid localization [36]. Transfer of petF from the plastid to the nucleus may have occurred several times, or alternatively, this transfer may have occurred once in an ancestral diatom with subsequent loss in some descendant lineages (S8 Fig). Nuclear regulatory control of PETF has been suggested to provide a more nuanced response to iron availability in T. oceanica [25], and this may reflect a more ancestral method of acclimation to trace metal availability in diatoms.
In contrast to the incorporation of FTN and petF into the diatom nuclear genome by LGT and EGT, respectively, two genes with hypothesized recent LGT origins appear to instead have roots deeper in the chromalveolate lineage. NiSOD and PCYN homologs are broadly distributed throughout diatoms, but unlike FTN they follow organismal phylogeny, clustering with other chromalveolates (S13 Fig, Fig 6). The gene encoding nickel superoxide dismutase (NiSOD) was originally hypothesized to have arisen relatively recently in eukaryotes via a lateral gene transfer from bacteria to prasinophytes (Ostreococcus) based on available molecular data at the time [31,32]. More recently, a NiSOD homolog was found in the diatom Phaeodactylum tricornutum [30]. Similarly, the apparent scarcity of PCYN, only previously identified in T. oceanica [26] and Fragilariopsis cylindrus [49] was hypothesized to reflect LGT events in select diatom species [50]. Our analyses demonstrate that homologs of both genes are present in every class of diatoms and many of the major branches of the chromalveolates, which supports vertical, rather than lateral, inheritance of these genes.

The multi-copy nature of iron genes contributes to interspecies variability
Iron metabolism proteins in diatoms appear to be encoded primarily by multi-gene families, presumably resulting in proteins with divergent functions. For example, in many diatom species, multi-copy gene families were detected for two of the iron uptake system proteins: multicopper oxidase and ferric reductase. Multi-copper oxidases include members with ferroxidase and cuprous oxidase activity, and share sequence similarities with laccases [14]. The residues implicated in ferroxidase activity in yeast [14] are not maintained in diatoms, although ferroxidase activity has been demonstrated in vivo with T. oceanica [16]. Similarly, the vast majority of diatoms harbor multiple copies of putative diatom ferric reductase genes (FRE). This variability in copy number could allow for neofunctionalization of paralogs, perhaps resulting in separate metabolic functions.
An ancient duplication event appears to have led to at least two distinct paralogs of diatom ferritin, with a well-supported subset, FTN-II, showing distinct divergence of residues on the C-terminus of the predicted translation (Fig 4, S7 Fig). Mammals possess a light chain and heavy chain of ferritin: the heavy chain oxidizes iron(II), while the light chain fosters ferrihydrite nucleation, having lost the capacity to oxidize iron [51]. Whether or not the two paralogs of diatom ferritin form a complex or demonstrate functional differences is unknown, as only one paralog from P. multiseries (PID 237986) from FTN-I has been characterized [19,52].
Three putative plastocyanin paralogs were detected in Fragilariopsis kerguelensis and Pseudo-nitzschia heimii, both of which are found in low-iron open ocean regions. These alternate paralogs possess the canonical Cu-coordinating residues for Populus nigra plastocyanin [38], with one notable exception in P. heimii (PCYN1), which has His37Gln and Cys84Ser substitutions at the Cu-coordinating site relative to P. nigra PCYN, conceivably leading to altered binding capacity or neofunctionalization (S10 Fig). Characterization studies would be useful to determine the role of these additional copies of PCYN in P. heimii and F. kerguelensis, and to assess whether they confer an advantage in low-iron regimes.

Multiple superoxide dismutase metalloforms illustrate adaptive preferences for different transition metals
Diversification of function through gene duplications is exemplified by the SODs. The biochemical importance of SOD is underscored by the presence of multiple isozymes. At least two different metalloforms of SODs were detected in every species of diatom (Fig 1). The gene encoding manganese superoxide dismutase (MnSOD), a metalloform that has been suggested to substitute for FeSODs under iron-limiting conditions [26], appears to have duplicated early in eukaryotic history, with most diatoms transcribing two or more paralogs (Fig 1, Fig 8). The common detection of NiSOD transcripts implies an important, constitutive role for NiSOD as well. Most NiSOD transcripts identified here are fused to a ubiquitin-coding sequence (S13 Fig). In yeast, post-translational cleavage of ubiquitin-fusion proteins is performed by at least four ubiquitin-specific proteases [53]. Similar cleavage of the NiSOD fusion protein by ubiquitin-specific proteases would provide a mechanism for the immediate activation of the Ni-hook motif and suggests the possibility that UBQ regulatory pathways may control the abundance of activated NiSOD protein. Unlike NiSOD from bacteria, the complete putative NiSOD homolog from these eukaryotes has yet to be fully functionally characterized, although SOD functionality has been demonstrated in oligopeptide maquettes [31,54].
In contrast to the near ubiquity of Mn and NiSOD transcripts, Fe and Cu-Zn SODs displayed distinctive patterns. FeSOD was not detected in any members of the Bacillariophyceae, the most derived class of diatoms. Instead, FeSOD transcripts were more commonly detected as two distinct copies in members of the most ancient class of diatoms, the Coscinodiscophyceae (Fig 1). The Cu-ZnSODs displayed an opposite trend to the FeSODs, with transcripts more frequently detected in the more derived diatoms (Fig 1). The apparent preference of pennate diatoms for Cu-ZnSOD in Fe-replete media, and their parallel lack of FeSOD transcripts, suggests a permanent shift in metal-use priorities for this group of diatoms. While previous studies have emphasized the role of Fe and Cu-Zn SODs in diatoms, our analyses suggest that they may play an accessory role to the dominant Mn and Ni metalloforms.

Conclusions
The data presented in this study provide a revised perspective on the distribution and prevalence of key genes involved in iron metabolism in marine diatoms. The presence of transcripts encoding the three elements of the reductive uptake system (FRE, MCO, FTR) throughout the diatom lineage is evidence that this system has been evolutionarily conserved. We report that ferritin (FTN) coding genes are present in ancient diatoms, comprising a lineage distinct from canonical eukaryotic FTN. Additionally, two FTN paralogs are present in many diatoms, with one divergent clade displaying distinct differences in the in silico translated C-terminal residues. The distribution of transcripts encoding the non-ferrous electron carriers plastocyanin (PCYN) and flavodoxin (FLDA) suggests the potential use of alternative redox metal strategies in a greater range of species than previously observed. Homologs of all four superoxide dismutase (SOD) metalloforms were found, illustrating the potential for adaptive use of different isozymes to ensure protection against oxidative damage in the face of metal scarcity.
Based on our analyses, much of the physiological diversity found in diatoms appears to come from gene duplications and subsequent divergence. In the majority of cases, we found that species harbor multiple paralogs, which is suggestive of functional diversification and lends insight into the adaptable nature of diatoms that may have contributed to their expansion into so many habitats.

Datasets
Processing and sequencing of samples were performed at the National Center for Genome Resources (NCGR) as part of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [33], and all sequence data and metadata were retrieved through the CAMERA portal [55]. The assembly pipeline for the MMETSP transcriptomes is described in more detail at the project website.

Identification of transcriptome homologs
Translated genes encoding twelve proteins were targeted for analysis: high-affinity ferric reductase (FRE); multi-copper oxidase (MCO); iron(III) permease (FTR); ferritin (FTN); flavodoxin (FLDA); ferredoxin (petF); plastocyanin (PCYN); cytochrome c6 (CYTC6); and the Ni-, Fe-, Mn-, and Cu-Zn-superoxide dismutases (Fig 1). For each target gene, we identified reference sequences with experimentally verified function and/or a solved X-ray crystallographic protein structure (S1 Table). The reference sequences were aligned with MAFFT L-INS-i [56] and trimmed on the carboxy and amino termini in Jalview 2.0 [57] to match the mature protein coding regions given by the reference sequence(s). Hidden Markov Model (HMM) profiles of the reference sequence alignments were constructed with HMMER and searched with hmmsearch [58] to identify homologous sequences (e-value cutoff of 1e-05) in translated transcriptomic and genomic data sets. To ensure alignment quality, short fragments spanning less than 25% of the mature protein coding regions and extended contigs spanning more than 500% of those regions were removed in Jalview 2.0 [57]. Homologous sequences from Thalassiosira oceanica and Chondrus crispus were identified in NCBI based on annotations for FLDA, petF, PCYN, and CYTC6, and based on BLASTP searches (e-value cutoff <1e-05) of translated amino acid sequences for Cu-ZnSOD, FeSOD, MnSOD, NiSOD, FRE, MCO, and FTR. Potential homologs were aligned to the reference sequences using MAFFT FFT-NS-i [56] and trimmed on the amino and carboxy termini to match the length of the reference sequences in Jalview 2.0 [57]. Sequences with unique insertions of over 50 amino acids were removed to minimize inclusion of mis-assemblies and to improve alignment quality. Sequences with 100% identity within individual strains were removed from the alignment.
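The length and redundancy filters described above can be illustrated with a minimal sketch. It is not the authors' pipeline, which was carried out interactively in Jalview. The function name, the tuple layout of the hits, and the use of ungapped sequence length as a proxy for the span of the mature coding region are all assumptions made for illustration, as is the treatment of "100% identity" as exact string equality.

```python
def filter_homologs(hits, mature_len, min_frac=0.25, max_frac=5.0):
    """Apply the 25%-500% length window, then drop exact duplicates per strain.

    hits: iterable of (strain, seq_id, sequence) tuples (hypothetical layout).
    mature_len: length of the mature reference protein, in residues.
    """
    kept = []
    seen = set()  # (strain, sequence) pairs already retained
    for strain, seq_id, seq in hits:
        frac = len(seq) / mature_len
        if frac < min_frac or frac > max_frac:
            continue  # short fragment or over-extended contig
        if (strain, seq) in seen:
            continue  # identical to a sequence already kept for this strain
        seen.add((strain, seq))
        kept.append((strain, seq_id, seq))
    return kept


example_hits = [
    ("strainA", "s1", "M" * 100),
    ("strainA", "s2", "M" * 100),  # duplicate within strainA
    ("strainA", "s3", "M" * 20),   # below 25% of the mature length
    ("strainB", "s4", "M" * 100),  # same sequence, different strain
    ("strainA", "s5", "M" * 501),  # above 500% of the mature length
]
kept_ids = [seq_id for _, seq_id, _ in filter_homologs(example_hits, 100)]
```

Identical sequences are removed only within a strain, so the same sequence found in two strains is retained for both.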
To resolve the phylogeny of the FTN tree, close homologs of FTN from prokaryotes were identified with a BLASTP search against GenBank nr on November 8th, 2013 against all Bacteria, using six representative sequences as queries (

Identification of specific functional activity by selection for known motifs
Additional identification steps were taken with the components of the reductive uptake system due to the notable divergence in sequence homology within these large gene families. Putative functions of the three elements of the reductive uptake system were assumed after filtering the aligned, translated sequences for conserved motifs of published biochemical function. For FRE, this required conservation of the FAD binding site [H-P-F-(S/T)-(V/L/I)] and the NADPH-adenine binding motif (C-G-P) that were observed in the four FRE sequences from T. pseudonana and P. tricornutum [15] and initially described in yeast [12]. Valine and alanine were permitted as replacements for glycine in the NADPH-adenine binding motif. For MCO, copper-coordinating activity was assumed after filtering for conserved histidine residues within the canonical Cu-coordinating motifs described in yeast Fet3p [14]. Putative FTR function was assigned by filtering sequences for the "REXXE" motif implicated in iron permeability [13].
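As a rough illustration of the motif filters for FRE and FTR, the sketch below expresses the motifs named above as regular expressions. It is an assumption-laden simplification: the function names are hypothetical, and the patterns are searched anywhere in the ungapped sequence, whereas the filtering described here was applied to aligned sequences, where motif position also matters. The MCO histidine filter is omitted because the relevant alignment positions are not given in this section.

```python
import re

# Motifs as written in the text; "." stands for any residue (X).
FAD_SITE = re.compile(r"HPF[ST][VLI]")  # H-P-F-(S/T)-(V/L/I)
NADPH_SITE = re.compile(r"C[GVA]P")     # C-G-P, with V or A accepted in place of G
REXXE = re.compile(r"RE..E")            # "REXXE" motif used for FTR


def ungap(aligned_seq):
    """Remove alignment gap characters and normalize case."""
    return aligned_seq.replace("-", "").upper()


def is_putative_fre(aligned_seq):
    """True if both the FAD and NADPH-adenine binding motifs are present."""
    seq = ungap(aligned_seq)
    return bool(FAD_SITE.search(seq)) and bool(NADPH_SITE.search(seq))


def is_putative_ftr(aligned_seq):
    """True if a REXXE motif is present."""
    return bool(REXXE.search(ungap(aligned_seq)))
```

Because short patterns such as C-(G/V/A)-P can occur by chance, a position-unaware search like this one would be expected to admit more false positives than a filter applied at fixed alignment columns.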
Additionally, NiSOD was identified by the presence of the Ni-hook motif (H-C-X-X-P-C-G-X-Y) on the N-terminal end of the translated NiSOD gene, necessary for metal binding and catalysis [42]. Fe and Mn SOD are distinguished by conserved metal binding residues at sites 77 and 146 [39]. FeSOD conserves a Q at 77 and A (though G was accepted) at 146, with MnSOD showing a G and Q at 77 and 146, respectively.
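The SOD assignment rules can be sketched in the same way. The code below is illustrative only: the function names are hypothetical, the reference sequence whose numbering defines sites 77 and 146 must be supplied by the user (it is given in [39], not here), and the Ni-hook search is not restricted to the N-terminus because many transcripts carry an upstream ubiquitin fusion.

```python
import re

NI_HOOK = re.compile(r"HC..PCG.Y")  # H-C-X-X-P-C-G-X-Y


def has_ni_hook(seq):
    """True if the Ni-hook motif occurs in the ungapped sequence."""
    return bool(NI_HOOK.search(seq.replace("-", "").upper()))


def column_of(ref_aligned, ref_pos):
    """0-based alignment column holding the ref_pos-th (1-based) reference residue."""
    count = 0
    for col, ch in enumerate(ref_aligned):
        if ch != "-":
            count += 1
            if count == ref_pos:
                return col
    raise ValueError("reference is shorter than the requested position")


def classify_fe_mn(ref_aligned, query_aligned, site_a=77, site_b=146):
    """Assign FeSOD or MnSOD from the residues aligned to two reference sites."""
    res_a = query_aligned[column_of(ref_aligned, site_a)].upper()
    res_b = query_aligned[column_of(ref_aligned, site_b)].upper()
    if res_a == "Q" and res_b in ("A", "G"):
        return "FeSOD"  # Q at site 77; A (or G) at site 146
    if res_a == "G" and res_b == "Q":
        return "MnSOD"  # G at site 77; Q at site 146
    return "unresolved"
```

With a toy reference `"AB-CDE"` and sites 1 and 4, the query `"QX-YAZ"` is classified as FeSOD and `"GX-YQZ"` as MnSOD; any other residue combination is left unresolved rather than forced into a class.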

Phylogenetic analysis
Phylogeny was inferred with maximum-likelihood phylogenetic trees generated using RAxML version 7.6.4 (translated FTN) [59] or FastTree version 2.1.7 (all other translated genes) [60]. Trees were visualized in Archaeopteryx [61]. Bayesian inference was performed with MrBayes version 3.1.2 [62] using mixed amino acid rate matrices. Unless otherwise noted, consensus trees were constructed with a relative burn-in of 25% following 1,000,000 generation runs, with trees collected every 500 generations.
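For orientation, the number of trees contributing to each consensus follows from the run settings above. The short calculation below is a sketch under two assumptions that are not stated in the text: that the starting tree at generation 0 is counted as a sample, and that the burn-in count is rounded down. The function name is hypothetical, and the figures are per run.

```python
def retained_samples(generations, sample_every, burnin_frac):
    """Return (total, discarded, kept) tree samples for one MCMC run."""
    total = generations // sample_every + 1  # +1 assumes generation 0 is sampled
    discarded = int(total * burnin_frac)     # assumes burn-in is rounded down
    return total, discarded, total - discarded


total, discarded, kept = retained_samples(1_000_000, 500, 0.25)
```

Under these assumptions a 1,000,000-generation run sampled every 500 generations yields 2,001 trees, of which 500 are discarded as burn-in and 1,501 are retained.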

Determining copy numbers of detected transcripts
Presence or absence and copy number of each gene were determined qualitatively from the phylogenetic tree for each set of translated amino acid sequences. For every strain, each translated sequence with a distinct tree placement was counted as a single copy, with multiple copies given by sequences occupying unique placements on the phylogenetic tree. Sequences within the same species clustering immediately adjacent to these sequences were considered to be the same gene, on the assumption that closely clustered sequences are either allelic variants or sequencing artifacts; no attempt was made to distinguish between the two.

Evolutionary relationship inferred through 18S sequence homology
A eukaryote-only SSU rRNA database (version last modified June 29, 2010) of high-quality 18S sequences curated and aligned by SILVA (www.arb-silva.de) was downloaded from the Mothur wiki (www.mothur.org/wiki/Silva_reference_files) on April 26, 2013 and added to a file with the 18S sequences available for all MMETSP samples and for genomes and non-MMETSP transcriptomes included in our analyses. Sequences were then re-aligned using the L-INS-i option in MAFFT, empty columns were removed in Jalview, and the tree was generated with FastTree using default arguments.
Diatoms all fell within a single clade; all other eukaryote branches except for select outgroups are not shown. Node support values are calculated from 1,000 resamples; only values over 0.5 are shown. One representative is shown from groups sharing greater than 95% similarity in unaligned sequence identity. Branches colored by organismal phylogeny: diatoms, orange; chlorophytes, green; rhodophytes, red; haptophytes and cryptophytes, purple; non-diatom stramenopiles, magenta; alveolates, blue; opisthokonts and amoebozoa, brown; excavates, pale blue; rhizaria, grey.
Legend, species, and PID, from (Taylor et al., 2005) marked with red arrows. Deviations from S. cerevisiae residues are boxed. Highlighted top rows, fungal MCOs with ferroxidase activity. Roman numerals above columns represent regions in S. cerevisiae Fet3p: I, T78 to L85; II, G121 to H128; III, H413 to H420; IV, G478 to H489. (TIF)
Node support values are calculated from 1,000 resamples; only values over 0.5 are shown. Green highlighted box indicates homologs with proposed ferroxidase activity. One representative is shown from groups sharing greater than 95% similarity in unaligned sequence identity. Branches colored by organismal phylogeny: diatoms, orange; chlorophytes, green; haptophytes and cryptophytes, purple; non-diatom stramenopiles, magenta; alveolates, blue; opisthokonts and amoebozoa, brown; excavates, pale blue; rhizaria, grey.
Confidence values shown for deep branches; others omitted for clarity. Accession IDs:
Node support values are calculated from 1,000 resamples; only values over 0.5 are shown. One representative is shown from groups sharing greater than 95% similarity in unaligned sequence identity. Branches colored by organismal phylogeny: diatoms, orange; chlorophytes, green; haptophytes and cryptophytes, purple; non-diatom stramenopiles, magenta; alveolates, blue; opisthokonts and amoebozoa, brown; rhodophytes, red; rhizaria, grey. Legend, species, and accession numbers: TpFTR1, Thalassiosira pseudonana,
Fig. Diatom FTN conserves ferroxidase residues, with differences at the C terminus between two primary groups. Red arrows show ferroxidase residue sites in Pseudo-nitzschia multiseries. Red arrowhead marks the ambiguous positions at Glu130 and Glu131 of PmFTN, where either residue may function in ferroxidase activity. Blue arrows show conserved sites for iron release in Rana catesbeiana. (TIF)