Plastid genome analysis of three Nemaliophycidae red algal species suggests environmental adaptation for iron limited habitats

The red algal subclass Nemaliophycidae includes both marine and freshwater taxa that contribute to more than half of the freshwater species in Rhodophyta. Given that these taxa inhabit diverse habitats, the Nemaliophycidae is a suitable model for studying environmental adaptation. For this purpose, we characterized plastid genomes of two freshwater species, Kumanoa americana (Batrachospermales) and Thorea hispida (Thoreales), and one marine species Palmaria palmata (Palmariales). Comparative genome analysis identified seven genes (ycf34, ycf35, ycf37, ycf46, ycf91, grx, and pbsA) that were different among marine and freshwater species. Among currently available red algal plastid genomes (127), four genes (pbsA, ycf34, ycf35, ycf37) were retained in most of the marine species. Among these, the pbsA gene, known for encoding heme oxygenase, had two additional copies (HMOX1 and HMOX2) that were newly discovered and were reported from previously red algal nuclear genomes. Each type of heme oxygenase had a different evolutionary history and special modifications (e.g., plastid targeting signal peptide). Based on this observation, we suggest that the plastid-encoded pbsA contributes to the iron controlling system in iron-deprived conditions. Thus, we highlight that this functional requirement may have prevented gene loss during the long evolutionary history of red algal plastid genomes.


Introduction
The red algal class Florideophyceae comprises 95% (6,748 spp. out of 7,100 spp.) of the Rhodophyta and encompass a biologically diversified group of taxa [1,2]. Most of the red algal species (>95%) inhabit marine habitats, however about 5% are found in freshwater environments [3]. The Nemaliophycidae, one of five subclasses within Florideophyceae, contains more than half of these freshwater species. This subclass includes both marine and freshwater taxa with three exclusively freshwater orders (Balbianiales, Batrachospermales, Thoreales), six exclusively marine orders (Rhodachlyales, Balliales, Nemaliales, Entwisleiales, Colaconematales, Palmariales), and one order (Acrochaetiales) with both freshwater and marine species [ . Among the freshwater orders, the Batrachospermales and Thoreales have more than half of the freshwater species in all of the Rhodophyta [6]. Thus, a comparison of Nemaliophycidae plastid genomes of taxa from freshwater and marine habitats may provide insights into environmental adaptation [1]. A previous study demonstrated that the freshwater angiosperm Najas flexilis (water nymph) adapted to the aquatic environment from its terrestrial ancestor by the complete loss of the ndh gene family in plastid genome [7]. The NDH gene complexes encode for the NAD (P)H dehydrogenase complex that increases photosynthetic efficiency at variable light intensities in terrestrial habitats. In its transition to the aquatic environment, N. flexilis did not require resistance to high light stress (due to the refractive properties of water) and therefore the ndh gene family had been lost. Likewise, similar gene loss or retention events may be present in the evolution of red algal plastid genomes during their transitions from marine habitats to freshwater systems.
To date, 99 florideophycean plastid genomes (cf. 127 red algal plastid genomes including three new genomes) are available in the NCBI organelle database, including 23 Nemaliophycidae that have been detailed in three recent papers [8][9][10]. To extend our understanding of red algal plastid evolution as it relates to the habitat adaptation, we completely sequenced and annotated three new plastid genomes for Nemaliophycidae, including one marine (Palmaria palmata) and two freshwater species (Kumanoa americana, Thorea hispida). From a comparative analysis of plastid genomes, we seek to identify plastid genes involved in the transition between marine and freshwater red algae and their physiological implications.

Whole genome sequencing and plastid genome construction
Culture strains of two filamentous freshwater species of Kumanoa americana (hsy120, isolated by Franklyn D. Ott from a stream in Mississippi, USA) and Thorea hispida (hsy077, isolated by F. Ott from the Kaw river in Kansas, USA) were harvested with gentle centrifugations from the culture flask. Thalli of Palmaria palmata (commercially sold as dulse) were collected from Reid State Park in Maine, USA on 27 Aug. 2010 by HSY. Genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) and purified by LaboPass™ DNA Isolation Kit (Cosmo Genetech, Seoul, Korea). Genome sequence data were generated using the Ion Torrent PGM (Thermo Fisher Scientific, San Francisco, California, USA) Next-Generation Sequencing (NGS) platform. The sequencing libraries were prepared using the Ion Xpress Plus gDNA Fragment Library Preparation kit for 200 bp or 400 bp libraries. The library amplification and DNA sequencing were conducted by either Ion PGM Template OT2 200 or 400 Kits and Ion PGM Sequencing OT2 200 or 400 Kit for the Ion Torrent PGM platform.
From NGS genome sequencing data, short raw reads (< 50 bp) were removed completely from the analysis and the remainder of raw reads were de novo assembled into contigs using CLC Genomics Workbench 5.5.1 (CLC Bio., Aarhus, Denmark) and MIRA3 Assembler [11]. To obtain a plastid consensus sequence, contigs were sorted by tBLASTn (e-value: 1e-10) using the protein sequence of red algal plastid genes as a reference (i.e. Chondrus crispus, Calliarthron tuberculosum) [12,13]. After the reassembly process, we obtained the circular plastid genomes and those circular genomes were confirmed by MUMmerplot [14] and by comparing with reference genomes to check for completeness. The sequences were verified with readmapping tools implemented in CLC Genomics Workbench to correct for any sequencing errors or gaps.
Protein coding genes were manually annotated following the procedure described in Song et al., 2016 [15] using 'Bacteria and Archaea (11)' genetic code and the NCBI database for non-redundant (nr) protein sequences. To search the ribosomal RNA sequences, we applied RNAmmer v1.2 Server [16] using the option of Bacteria, and then re-confirmed by BLASTn. Introns, tRNAs, and other small RNA were searched by ARAGORN [17] and Rfam cmscan v1.1 [18]. Several important genes and ambiguous sequences were re-confirmed with PCR amplification followed by Sanger sequencing. The annotated plastid genomes were visualized using OrganellarGenomeDraw v1.2 [19]. Comparative analysis of the genome structure was accomplished using UniMoG v1.0 [20] and Mauve Genome Alignment v2.2.0 [21,22] through the Geneious plug-in using the default setting [23].

A statistical test for the habitat-gene correlation
Habitat information of each species was referred from previous studies. Ott [5] summarized detail about the habitats and isolation history for most nemaliophycidaean species. AlgaeBase [2] provided information about authentic references with the type locality and its distribution. Based on the habitat and gene presence/absence information in the plastid genome, we performed a chi-square analysis that assessed the correlation between habitat and putative habitat-specific genes. We used the chi-square test incorporated into R package [24]. The p-value cut-off (<0.01) was adjusted to reject the null hypothesis.

Phylogenetic analysis
The red algal protein sequences of heme oxygenase were collected from NCBI public databases, which include transcriptome and genome data. The heme oxygenase homologs were obtained by BLASTp against the NCBI database with an adjusted threshold-cut to 500 top matches of red algal heme oxygenase. To discover the nuclear heme oxygenase, we also surveyed heme oxygenase genes from the published [25][26][27][28] or unpublished (H.S. Yoon et al.) whole genome data as described in S5 Table. Multiple sequence alignments were performed using MAFFT version 7 [29] with the default options. These alignments were refined manually based on conserved domains. Maximum likelihood-based phylogenetic analysis and bootstrap methods (MLB) were conducted using IQ-TREE [30] with 1,000 ultrafast bootstrap replications. The evolutionary model was 'LG + I + G4' [31], which was automatically selected by the model test option incorporated in IQ-TREE. Finally, highly divergent or contaminant sequences, which showed a long-branch or taxonomical mismatch to sister taxa, were removed from the following analysis. e-value, identities at 20%, hits with at least 20% of the shortest sequence, and query coverage of both sequences at 70%. The resulting network was visualized with the aid of Cytoscape program [38].

General features of three plastid genomes
A total of 1.73 Gbp, 1.94 Gbp, and 1.10 Gbp of raw sequence data were produced for Palmaria palmata, Kumanoa americana, and Thorea hispida, respectively (see details in S1 Table). The average coverage for the plastid genomes was 1,003x in T. hispida, 215x in K. americana, and 343x in P. palmata.
Three complete plastid genomes ( Fig 1A) were manually annotated based on published red algal plastid references [39]. Table 1 summarizes the characteristics of plastid genomes of the Florideophyceae [8-10, 12, 13, 39-53]. The total size of the plastid genome was 184,026 bp in K. americana (GenBank accession number NC_031178), containing 194 protein-coding genes (CDS), while T. hispida (GenBank accession number NC_031171) was 175,193 bp in size including 192 CDSs. The P. palmata plastid genome (GenBank accession number NC_031147) was 192,961 bp in size with 203 CDSs. The GC content of K. americana was 29.3%, which was similar to T. hispida (28.3%), but lower than that of P. palmata (33.9%). The high GC content in P. palmata was more similar to that of the Bangiophyceae (average of 11 spp.: 33.1%) than other Florideophyceae species (average of 102 spp.: 29.3%).
Comparing the genome architecture, two major differences were evident among the three Nemaliophycidae species (Fig 1B). First, two copies of ribosomal RNA (rRNA) operon were present in P. palmata, whereas K. americana and T. hispida have only a single rRNA operon (5S, 23S, 16S rRNA) like as most of florideophycean species (Table 1). It has been reported that the plastid genome structures are highly conserved among four florideophycean subclasses (i.e., Nemaliophycidae, Corallinophycidae, Ahnfeltiophycidae, Rhodymeniophycidae) [39]. Second, K. americana had a large inversion between chlL and rRNA operon region that differs from other two species.

Specific gene loss in freshwater Nemaliophycidae species
Gene contents were generally conserved among the three Nemaliophycidae plastid genomes, however, there were some differences (S1 Fig, S2 Table). For example, the ycf91 gene was present only in two freshwater species of K. americana and T. hispida, but absent in marine P. palmata. In addition, six genes (ycf34, ycf35, ycf37, ycf46, grx, and pbsA) were preserved only in the marine P. palmata, which were absent in the freshwater K. americana and T. hispida. Therefore, these seven genes were candidates of habitat-specific plastid gene (i.e., marine vs. freshwater specific).
In order to broaden the investigation of these putative habitat-specific genes, we extended the survey to include all currently available 127 red algal plastid genomes, which include 16 freshwater species, 109 marine species, and two brackish species (S3 Table). From the data in S3 Table, we calculated both the habitat-specific gene concordance rate and the p-value from the chi-square test for seven genes for whether these genes were correlated to the habitat type. Interestingly, we discovered that four of these genes (pbsA, ycf34, ycf35, and ycf46) have higher than 80% concordance rate with statistically significant support (p-values = < 0.01). These results suggest that the presence of these four genes is significantly different between marine and freshwater habitats. For example, the pbsA gene showed 84.1% of habitat-specific gene concordance rate (p-value = 3.17E-07). Except for four species (Bangia atropurpurea, Sheathia arcuata, Paralemanea sp., Sirodotia delicatula), 12 of 16 freshwater species lacked the pbsA gene in the plastid genomes. Likewise, the habitat-specific gene concordance rates of ycf34, ycf35, ycf46 and their p-value were 85.6%, 81.8%, 87.9% and 4.83E-10, 2.42E-03, 5.43E-12, respectively.
To inspect the association between the phylogenetic relationship and the habitat-gene pattern, we mapped the presence or absence of these four genes on the ML phylogeny with habitat information (Fig 2). According to the result, the losses (i.e., absence) of four habitat-specific genes were more likely due to a phylogenetic pattern as compared to random events. For instance, all four genes were absent in four Cyanidiales species, which are all freshwater species. Two exclusive freshwater orders of Thoreales (T. hispida) and Batrachospermales (eight species) were mostly absent of these genes that were clearly different from marine orders that contained these genes (i.e., one species of Palmariales and 16 spp. of Nemaliales). It is interesting that some mangrove species (i.e., Bulboplastis apyrenoidosa, Caloglossa spp. Bostrychia spp.) [54][55][56] and two parasitic species (i.e., Polysiphonia infestans, Choreocolax polysiphoniae) [57,58], which have significantly reduced plastid genomes, lost these genes. However, there were some exceptional cases: two marine species of Hildenbrandiales (Apophlaea sinclairii, Hildenbrandia rubra) together with one freshwater species (H. rivularis) were absent of all these genes, same as in two marine Corallinophycidae species (Sporolithon durum and Calliarthron tuberculosum). Based on this observation (Fig 2), we postulate that four habitat-specific genes (pbsA, ycf34, ycf35, and ycf46) were adapted to freshwater habitat during their evolutionary history.
To find a functional relevance of these genes in environmental adaptation, we focused on the in silico functional analysis. Because any functions were reported for ycf genes, a conserved hypothetical protein family, we selected only on the pbsA (heme oxygenase) gene for downstream analyses.

Heme oxygenase
The function of heme oxygenase is generally known for the degradation of a heme to a biliverdin and is involved in the production of phycobilins [59]. For instance, the heme oxygenase degrades heme that is an essential step in the phycobilin biosynthesis in Cyanidium caldarium (Cyanidiophyceae) [60]. During a heme degradation, iron ions are released and those ions play an essential role in the iron recycling pathway [61,62].
While searching for pbsA genes in all available red algal genome data (S5 Table), two additional heme oxygenase genes were newly discovered from nuclear genome data. Both the newly discovered two heme oxygenases and pbsA had a conserved heme oxygenase domain (CDD name: HemeO superfamily), with conserved heme binding pockets (blue asterisk in Fig  3). According to recent studies about heme oxygenase in Chlamydomonas reinhardtii (Chlorophyta), two distinct types of nuclear-encoded heme oxygenase have been called as HMOX1 (plant type) and HMOX2 (animal type) [63]. However, there was no plastidal heme oxygenase (pbsA) in green algae and we could not find it from a currently available nuclear genome. Compared to the green algal heme oxygenase, we named three red algal heme oxygenase genes as HMOX1 and HMOX2 for nuclear copies and pbsA for the plastidal copy.
To identify the evolutionary history of three heme oxygenase isotypes, homologs of heme oxygenase were collected from the NCBI database (see details in Materials and Methods). These three distinct types of red algal heme oxygenase were grouped in three clades in the phylogenetic tree (Fig 4).  pbsA gene. The plastid encoded pbsA gene was present in most marine red algal species. Red algal pbsA genes were grouped in a highly supported clade with diverse cyanobacteria (94% MLB) (Fig 4), suggesting a cyanobacterial origin. Additionally, pbsA was present in the red algal derived plastids of cryptophytes. The pbsA homolog was also found in the nuclear genome of the Cyanophora paradoxa (Glaucophyta) with an additional extension of the N-terminal transit peptide (e-value = 1e-61; compare to pbsA gene of Porphyridium purpureum). The alignment for pbsA and its homologous proteins were identified to have a functional domain of heme oxygenase (see Fig 3A). Nonetheless, each type of heme oxygenase proteins contained a few differences, such as putative transmembrane domain (red box) in C-terminal extension or targeting domain (green box) in N-terminal extension (Fig 3A). For pbsA protein sequences, a transmembrane region was predicted as shown in Fig 3B, where most of the red algal pbsA proteins had putative transmembrane domains in the C-terminal. This C-terminal transmembrane domain was also present in HMOX2 genes, but HMOX2 genes contained additional putative transmembrane domains inside the functional heme oxygenase domain. None of HMOX1 genes in red algae were predicted to have a transmembrane domain region in their protein sequences.
One noteworthy discovery was that the pbsA gene is generally absent in freshwater species (75%; 12 of 16 spp.), but present in most of the marine red algal species (86%; 100 of 116) (see S3 Table). Heme oxygenase is well known for acting as an iron-controlling factor [61,64]. It has been demonstrated that transcription of the pbsA gene in a unicellular red alga, Rhodella violacea (Rhodellophyceae), was up-regulated in iron deprivation conditions [65]. Given these observations, it is highly likely that the pbsA gene in red algae assists to uptake of iron in iron deprived marine environment. On the other hand, most freshwater red algae have lost the pbsA gene, likely due to this gene being unnecessary or redundant in freshwaters that are typically not as iron limited as marine environments (iron composition in freshwater is~1,400 times higher than seawater [66]). Although gene content in red algal plastid genomes was highly conserved [39], pbsA gene may be a good example for the genomic response to environmental adaption.

L N A V RN L Q I S K --N S S ---------F L K I M M T M M L S L N A I KN L RD K I L N EGN ---------W W I I T T A R L L F N S V F P K L G K K T T EM M S ---------Y I R I F T K I LM MQ F L I N S KD K L R V L P P P P P P P P P P P P LM M I I M M L L RG T PQ T H S Y A SC L S KA SA E Y T R T I T -W WA V T RG L L GV T S L A V I AW W T W WA E R K T A T Y N R R V I S V V V K Y A L L F V V L L G T F R F L F D A RN G P Y Y KA I L W WR F FW WS V V F V V V V V I F F L K K RN YW W P L PA T A I I AG A -I R V T P R P V L L V L V A L L GW W C L A F R P S R KG R R K L V V L GC L L L V A V I L V V T M M V GG AM M SC A R T V T W WRNW WL A S L RN PW WV QA V V V A A I GA S V A L Y L A T N L A E -Q L R QG T T K SH SM MA EN S T N L AQ -Q L R EG T T K SH SM MA EN T I N L AD -E L R QG T T K SH SM MA EN -N N L AQ -E L R EG T T K SH SM MA EN D I G L A K -K L K V G T A K SH NM MA EN SN N L A I -Q L R EG T S K SH SM MA EN H KG F V E -EM M R F V AM M R L H T KD QA H KG F V E -EM M R F V AM M R L H T RD QA H KG F V E -EM M R F V AM M R L H T KD QA R KG F V Q -EM M R V V AM M R L H T KD QA S KG F V E -EM M R F V AM M R L H T KD QA D KG F I A -EM M R K V AM M K L H T KD QA EGG F V Q -EM M R A V AM M R L H T K E QA KN S F V QW WDM M R RA AM M K L H T RD Q S F P P I L T -E L R D V AM M K L H T R E QA S V P F T Q -EM M R Q KAM M S L H T F S QA V L T F T K -L L R K R T NQ K R T A S A A ED S F S E -E L Q T R T R RQH D L S N T -D D FW W I -R L R R E T R R EH N V S N T A E P L T K -R L S KA S R K I H N V S N S E V P L S I -AM M R KA T R K I H R E S D A SG R L T E -RM M R K S T R R I H GK T D R
Because of their low similarities (protein identities: 22.41~27.68%: 1e-05~1e-10) to other homologous proteins, including the cyanobacterial heme oxygenase, the origin of HMOX1 were still ambiguous. However, HMOX1 was only present in plastid-bearing eukaryotes and this was the only gene that possesses the plastid-targeting transit peptide among the heme oxygenase families. Therefore, we concluded that HMOX1 was involved in plastidal function. The phylogenetic position of HMOX1 in red algal-derived secondary endosymbionts suggested that HMOX1 was likely derived from secondary endosymbiosis events followed by the gene transfer to the nuclear genome of the host (Fig 4). Indeed, a trafficking experiment in Chlamydomonas reinhardtii showed that Chlamydomonas nuclear-encoded HMOX1 gene targets to the plastid [63]. Although none of these studies showed the trafficking of red algal HMOX1, it is highly likely that red algal HMOX1 targets the plastid because of the monophyly of red algae with the Viridiplantae. In Viridiplantae, Chlamydomonas reinhardtii, Arabidopsis thaliana, and Ceratodon purpureus provided experimental evidence for plastid trafficking with the N-terminal extension HMOX1 [70,73,74]. Nevertheless, function of HMOX1 genes in red algae needs further experimental investigation to elucidate its metabolic pathway.
HMOX2 gene. The red algal HMOX2, another nuclear encoded heme oxygenase gene, had one or more putative transmembrane domains in C-terminus (Fig 3). Transmembrane domain structures of red algae were homologous to those of metazoan (e.g., mammalian, amphibian) heme oxygenase [63,75]. Within the HMOX2 clade in the phylogenetic trees (Fig  4), red algae were grouped with diverse eukaryotes including chlorophytes, cryptophytes, and stramenopiles as well as non-photosynthetic fungi and Metazoa. Therefore, we would suggest that HMOX2 was derived from an ancient eukaryotic common ancestor. Gene network analysis and the sequence conservation between the HMOX2 and pbsA in protein alignment supported that the HMOX2 is related to the iron uptake function. Indeed, it has been reported that C. reinhardtii captured extracellular heme as an iron source with an association between the HMOX2 protein and the cytosolic membrane [63].

Conclusion
Three Nemaliophycidae plastid genomes were completely sequenced and annotated. These included two exclusively freshwater species Kumanoa americana and Thorea hispida and the marine species Palmaria palmata. Until recently, plastid genome data were used mainly for phylogenomic analysis (e.g., [47]), divergence time estimation (e.g., [10]), comparative structural analysis (e.g., [39]), and the development of red algal molecular markers for DNA barcoding studies (e.g., [12]). In this study, we focused on finding genomic clues for an environmental adaptation between marine and freshwater red algal species.
Based on the environment-specific genes of the heme oxygenase family in red algae, we postulate that red algae have adapted efficiently in differing iron concentration conditions through the retention or loss of the heme oxygenase genes. Although this study included only a few freshwater red algal species and was not able to present the direct evidence of correlation between habitat and gene retention, we demonstrated the general trend of red algal plastid gene loss (ycf34, ycf35, ycf46, and pbsA) in iron-limited marine habitats.
We also demonstrated different evolutionary strategies of three types of heme oxygenase genes in different lineages (e.g., presence of HMOX1 with gene duplications [HY1, HO2, HO3 and HO4), but the absence of HMOX2 in the streptophytes, which include charophytes and land plants) and habitat conditions (e.g., pbsA genes in the marine and freshwater red algal species). It is generally known that plastid genes are in the process of reduction (either complete loss or gene transfer to the host nucleus) after its endosymbiotic origin [76]. However, gene loss from the plastid genome appears functionally constrained as demonstrated in pbsA and its gene homologs. Through selective gene retention, red algae successfully adapted to different aquatic environments over billions of years of evolutionary history.