Expression analysis of microbial rhodopsin-like genes in Guillardia theta

The cryptomonad Guillardia theta has 42 genes encoding microbial rhodopsin-like proteins in its genome. Light-driven ion-pump activity has been reported for some of these rhodopsins based on heterologous expression in E. coli or mammalian cells. However, neither their physiological roles nor the expression of these genes in native cells is known. To reveal their physiological roles, we investigated the expression patterns of these genes under various growth conditions. Nitrogen (N) deficiency induced a color change in exponentially growing G. theta cells from brown to green. Twenty-nine of the rhodopsin-like genes were expressed in native cells. The expression of 6 genes was induced under N depletion, while that of another 6 genes was reduced.


Introduction
Microbial rhodopsins are light-receiving membrane proteins that act as light-driven ion pumps, light-driven ion channels, light-driven enzymes, and photosensors [1]. The rhodopsin protein consists of seven transmembrane helices and binds an all-trans-retinal chromophore. The chromophore binds to a lysine residue conserved in the seventh transmembrane (TM) helix of all microbial rhodopsins through a protonated retinal Schiff base (SB) linkage. For decades after the light-driven H+ pump bacteriorhodopsin (BR) was discovered in Halobacterium salinarum (formerly H. halobium) [2], microbial rhodopsin was considered a unique protein possessed by a limited number of species, such as halophilic archaea. However, metagenomic analysis in the 2000s revealed that many marine prokaryotes have ion-pumping rhodopsins [3]. In 2002, a rhodopsin in the green alga Chlamydomonas reinhardtii was found to act as a light-gated ion channel [4]. Histidine kinase rhodopsin (HKR), which contains a histidine kinase domain and a response regulator domain connected to the C-terminal side of the rhodopsin domain, was also found in C. reinhardtii as the first enzymatic rhodopsin [5]. Other enzymatic rhodopsins are found in eukaryotes; examples include rhodopsin-guanylate cyclase (Rh-GC) [6] and rhodopsin-phosphodiesterase (Rh-PDE) [7]. Moreover, a new group of rhodopsins named heliorhodopsin (HeR) was also found in nature [8]. Heliorhodopsins display less than 15% sequence identity with microbial and animal rhodopsins. In the membrane, HeR is oriented in the opposite direction to the other rhodopsins. It is now known that microbial rhodopsins are widely distributed not only in bacteria but also in cyanobacteria, algae, and giant viruses [9][10][11]. Their physiological roles are related to energy production, phototaxis, regulation of gene expression, and photoautotrophy.
Measurements of ATP synthesis in haloarchaeal mutants indicate that BR and the light-driven Cl− pump halorhodopsin (HR) generate a proton motive force (PMF) in a light-dependent manner [12,13]. Sensory rhodopsins (SR) and channelrhodopsins act as photosensors for photomotility in H. salinarum [14] and C. reinhardtii [15], respectively. Anabaena sensory rhodopsin (ASR) activates a soluble transducer protein (ASRT) and regulates gene transcription [16]. Proteorhodopsin (PR) contributes to phototrophy in some species of flavobacteria [17] and proteobacteria [18] in the marine environment. Recent advances in genome research have led to the discovery of many proteins that are similar to microbial rhodopsin but lack the conserved retinal-binding lysine residue (Rh-noK). There are currently 5,558 known microbial rhodopsin genes, including HeRs, of which approximately 600 are Rh-noK [19]. Some Rh-noK genes are tandemly arranged with PR and retinal biosynthesis genes, forming a putative operon [19].
Although the molecular properties of microbial rhodopsins have been studied by many approaches, studies on the physiological functions of these molecules in nature are still in progress. Metagenomic analyses revealed that the abundance of the PR gene is negatively correlated with nitrate concentration; however, there was no significant correlation with light intensity in the north of the Sargasso Sea [20]. Nitrogen is an important source of metabolites including amino acids. Gene expression analysis of the proteorhodopsin-containing flavobacterium Dokdonia sp. MED134 revealed that the carbon fixation pathway shifted toward anaplerotic CO2 fixation under light conditions [21]. The effect of light was more significant in a nutrient-poor environment [21]. These results suggest that microbial rhodopsins are related to primary metabolic processes in Dokdonia sp. MED134, such as carbon (C) and nitrogen (N) assimilation. N availability is limited in the marine environment; therefore, N availability could be the rate-limiting condition for primary metabolic processes, such as CO2 fixation [22]. Although nitrate ions (NO3−) are a major source of N in the marine environment, anthropogenic activities load high concentrations of chemically reduced forms such as ammonium ions (NH4+) [23]. To be assimilated into amino acids, NO3− first needs to be reduced to NH4+. Cryptophytes are unicellular algae ubiquitously found in marine and freshwater habitats [24]. They are considered a model taxon for studying the evolution of plastids because they have a secondary plastid acquired from another eukaryote with a primary plastid [25]. Guillardia theta is a cryptophyte isolated from coastal seawater [26]. Owing to the importance of plastid evolution, G. theta was the first cryptophyte whose nuclear genome was sequenced [27]. The nuclear genome of G. theta encodes many genes similar to microbial rhodopsins.
While the molecular functions of some of these genes have been characterized in heterologous expression systems [28], the molecular properties and physiological functions of many G. theta rhodopsins are currently not known. Since the genomic sequence of G. theta was reported, however, new functional molecules have been described, such as natural anion channels [29] and DTD-cation channels [30,31]. These studies have raised interest in how these rhodopsin-like proteins are used in native cells.
In this study, we investigated the expression pattern of rhodopsin-like genes in G. theta under various growth conditions. N deficiency induced a color change in exponentially growing G. theta cells from brown to green. The expression of 29 rhodopsin-like genes was observed in native cells. The expression of 6 genes was induced under N depletion, while that of another 6 genes was reduced. Our results suggest that some of the rhodopsin-like genes are related to the regulation of N assimilation through energy production in this organism.

Phylogenetic analysis of microbial rhodopsin-like genes
Using the gene-specific NCBI Reference Sequence (RefSeq) database, we found 44 rhodopsin-like genes in the G. theta genome, although over 50 genes were suggested in a previous study [29]. The difference in the number of rhodopsin-like genes derives from improved annotation in the database. Two of the 44 genes were predicted to encode hypothetical 7-transmembrane receptors (Gt_161042 and Gt_162503), but our phylogenetic analysis showed that their sequences could not be aligned with the other 42 G. theta rhodopsins or with the representative rhodopsin sequences. We therefore consider that these two genes do not encode microbial rhodopsins. The other 42 genes were predicted to encode microbial rhodopsin proteins. Phylogenetic analysis indicated that most G. theta rhodopsins showed low similarity to representative rhodopsins from other species (Fig 1). The G. theta rhodopsins form five clades. Clades D and E contain four cation channels and two anion channels, respectively. The lysine residue corresponding to K216 of BR is one of the important residues of microbial rhodopsins because it forms a Schiff base linkage with the retinal chromophore. This lysine residue is conserved in most G. theta rhodopsins, although nine of the 42 rhodopsins lack it (Rh-noK; S1 Fig). Clades A, B, and D contain two, five, and two Rh-noKs, respectively.
Using cDNAs derived from the extracted mRNAs, we re-analyzed the amino acid coding regions of Gt_164280 and Gt_120390. Gt_164280 was compared to the model transcript in the RefSeq database (S2 Fig). The nucleotides corresponding to the 451st to 462nd bases were deleted in the model transcript, and the 710th, 816th, and 869th bases of the re-analyzed sequence were changed from T to C, T to C, and G to A, respectively (S2A Fig). As a result, the protein predicted from the re-sequenced gene carries a four-amino-acid insertion in the region corresponding to TM4 compared with that of the model transcript (S2B Fig). In the re-analyzed sequence, Met234 and Ser287 of the model transcript were replaced by Thr and Gly, respectively. Gt_120390 was compared to both the model transcript and the genomic sequence (S3 Fig). The seventh exon predicted from the genomic sequence exactly matched the mRNA sequence we determined, but the model transcript had a large deletion of 144 bases (S3A Fig). On the other hand, the mRNA we determined had a 44-base extension after the 10th exon into the intron region, followed by a stop codon. The 11th exon was absent from our sequence. As a result, the transmembrane region of the protein predicted from the re-analyzed gene was consistent with that of the model transcript, but a 48-residue insertion and a 14-residue amino acid substitution occurred in the C-terminal extension (S3B Fig). The C-terminal six residues were truncated in the protein predicted from the re-analyzed transcript.
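The correspondence between the nucleotide-level differences and the residue-level differences reported above can be checked with simple codon arithmetic. The following is a minimal sketch, not part of the original analysis; the function name indel_effect is hypothetical, and it only tests whether an insertion or deletion spans a whole number of codons.

```python
def indel_effect(n_bases: int) -> tuple[int, bool]:
    """Return (number of whole codons, reading frame preserved?) for an indel.

    An indel whose length is a multiple of three adds or removes whole
    codons and leaves the downstream reading frame unchanged.
    """
    return n_bases // 3, n_bases % 3 == 0


# Gt_164280: bases 451 to 462 (inclusive) are missing from the model transcript.
gap_164280 = 462 - 451 + 1          # 12 bases
print(indel_effect(gap_164280))     # (4, True): four amino acids, in frame

# Gt_120390: the model transcript lacks 144 bases of the seventh exon.
print(indel_effect(144))            # (48, True): 48 residues, in frame
```

Both differences are multiples of three, which is consistent with the four-residue and 48-residue insertions described in the text.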
Guillardia theta also carries two heliorhodopsin-like genes (XM_005821825 and XM_005823076); however, we did not analyze the expression of these heliorhodopsins in this study.

Effects of nitrogen availability on the growth of G. theta
Guillardia theta could grow in an artificial seawater-based medium. The color of the cells turned from reddish-brown to green in the late culture period (Fig 2). The color change reflected a difference in carbon (C) or nitrogen (N) availability. Cells cultured in C-deficient medium remained reddish-brown at the late stage of culture, whereas cells cultured in N-deficient medium turned green earlier than those cultured in N-sufficient medium. The growth of G. theta cells in aerated culture (12 h/12 h day/night) was monitored (Fig 3). Cellular growth with NO3− as the sole N source stopped by day 5 (Fig 3A, top

PLOS ONE
The rhodopsin-like gene expression of Guillardia theta

To investigate the effect of N nutritional status on the pigment content, pigments were extracted from the cells with acetone and the change in chlorophyll content was examined (Fig 3B). The Chl a content decreased under the culture conditions in which the cells turned green. In addition, the time at which the Chl a content fell below 2 μg/10^5 cells roughly coincided with the time of the color change. These results suggest that chlorophyll content can be used as an indicator of N depletion.

Gene expression patterns of G. theta under different N availability
The expression pattern of the 42 microbial rhodopsin-like genes was investigated under different nitrogen conditions. The full length of each rhodopsin-like gene was amplified by reverse transcription polymerase chain reaction (RT-PCR). Twenty-nine genes could be amplified from mRNA derived from native cells (S4 Fig), indicating that these 29 genes were expressed in native cells. Among them, 25 genes could be detected by quantitative RT-PCR. Guillardia theta also has a predicted beta-carotene 15,15'-monooxygenase gene (blh; Gt_105242), which encodes an enzyme that produces all-trans-retinal from beta-carotene. We therefore quantified the expression of the 25 microbial rhodopsin genes and the blh gene under N-deficient conditions relative to the N-sufficient condition (Fig 4). The expression levels of nine of the 25 genes and of the blh gene increased under N depletion (Fig 4A). Among these, six microbial rhodopsin genes showed a more than 2-fold increase. In particular, the expression of two genes increased markedly: Gt_120390 (GtCCR1) showed a 10-fold increase and Gt_111593 (GtACR1) showed a 136-fold increase under N depletion (Fig 4A, middle and right panels, respectively). The expression levels of one anion channel and two cation channels significantly increased under N-depleted conditions. The expression levels of 15 genes decreased upon N depletion (Fig 4B). Among these, six genes decreased to less than 0.5-fold of the N-sufficient level. The expression of the genes encoding the putative sensory rhodopsins Gt_092481 (GtR1) and Gt_085745 (GtR2) and the proton pump Gt_139416 (GtR3) was significantly suppressed under N deficiency. The expression of Gt_122016 (putative HKR) also tended to decrease under N-deficient conditions. Six Rh-noKs were expressed in native cells. The expression of two of these six Rh-noKs, Gt_150025 and Gt_150790, was significantly suppressed under N deficiency (to less than 0.5-fold).
The expression of Gt_159333 did not differ between nitrogen conditions.
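The grouping used in this section (more than 2-fold for induced genes, less than 0.5-fold for reduced genes) can be written as a small rule. The sketch below is illustrative only: the function name classify and the entry hypothetical_gene are assumptions, and only the two fold-change values for GtCCR1 and GtACR1 are taken from the text.

```python
def classify(fold_change: float, up: float = 2.0, down: float = 0.5) -> str:
    """Label a gene by its N-deficient / N-sufficient expression ratio."""
    if fold_change > up:
        return "induced"
    if fold_change < down:
        return "reduced"
    return "unchanged"


# Fold changes for GtCCR1 and GtACR1 are those reported in the text;
# the third entry is a made-up value added to show the "reduced" branch.
examples = {
    "Gt_120390 (GtCCR1)": 10.0,
    "Gt_111593 (GtACR1)": 136.0,
    "hypothetical_gene": 0.4,
}
for name, ratio in examples.items():
    print(f"{name}: {classify(ratio)}")
```

Note that this rule does not replace a statistical test; it only reproduces the fold-change cutoffs named in the text.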

N availability of Guillardia theta
While NO3− assimilation requires more reducing power than NH4+ assimilation, the preferential utilization of NH4+ had inhibitory effects on growth in aqueous culture conditions resulting from proton imbalance [32]. Therefore, the preference for nitrogen sources varies among organisms. The growth pattern of G. theta cells indicated that this alga prefers NH4+ to NO3− as an N source (Fig 3). The color of G. theta cells turned from reddish-brown to green depending on the nitrogen nutritional condition (Fig 2). Rhodomonas sp., another cryptophyte, also changed cell color under N depletion, similar to G. theta [33]. In that case, phycoerythrin was preferentially degraded compared with Chl a and c under N-limiting conditions [33]. Based on this knowledge, the cell color change under N depletion (Fig 2) is predicted to result from a decrease in pigments such as phycobilin, carotenoid, and chlorophyll. Quantitative analysis of phycobilin and phycoerythrin is a topic for future research.

Putative physiological functions of rhodopsin-like genes in G. theta
In our results, N depletion induced a decrease in chlorophyll a content (Fig 3B) and an increase in the expression of some rhodopsin-like genes (Fig 4). In a previous metagenomic study, there was an overall negative correlation between PR gene abundance and chlorophyll a concentration (but not light) in the surface samples and the depth profiles [20]. In the latter, there was also a negative correlation between PR genes and inorganic nutrients. The decrease in chlorophyll a content during the greening period suggests that solar energy conversion by photosystems I and II decreased during this period. These results suggest that some rhodopsin-like genes are expressed under N-deficient conditions to compensate for the decrease in the utilization of solar energy. One of the predicted physiological functions is the generation of PMF for ATP synthesis [1]. Gt_139416 (GtR3) induced hyperpolarization of hippocampal neurons in response to blue light illumination and inhibited neuronal spikes [34]. Although the ion-transporting activity was not characterized in detail, that study suggested that GtR3 could function as a light-dependent H+ pump. The expression of GtR3 decreased with N depletion (Fig 4B), suggesting that GtR3 might play a role other than the generation of PMF in N-depleted cells. The phylogenetic analysis showed that G. theta rhodopsin genes form five clades distinct from other representative microbial rhodopsins (Fig 1 and S1 Fig). Clade A includes GtR1 and GtR2, which have been suggested to be sensory rhodopsins related to phototaxis [28], but the functions of the other molecules are not known. The function of the molecules in Clade B is still unknown. This clade contains five Rh-noKs, the largest number of all clades. The expression of all Rh-noK genes in Clade B tended to decrease under N depletion (Fig 4). Although the function of these Rh-noK genes remains unknown, they may have some function in modulating nitrogen homeostasis.
Two of the three molecules in Clade C contain a histidine kinase domain, so they are predicted to be histidine kinase rhodopsins (HKRs). The photoreaction of the rhodopsin domain has been reported for HKRs from C. reinhardtii [5]. However, neither the histidine kinase activity of HKR nor its physiological functions have been revealed. Clade D contains the four cation channels reported to date [30,31], and other unexplored molecules in this clade may also have cation channel activity. The clade also contains two Rh-noKs. Clade E contains two anion channels, the GtACRs [29]. The channel activity of Gt_161302 was also investigated in that study; however, no activity was observed [29]. GtACR1 transports NO3− preferentially over other anions [29], so the increase in GtACR1 expression could enhance the transport of NO3− in native cells to compensate for N depletion. Gt_120390 (GtCCR1) and Gt_111593 (GtACR1) showed relatively red-shifted absorption maxima within the CCR and ACR families, respectively [29,30]. The expression of these genes increased drastically under N depletion (Fig 4A). The cell color change under N depletion (Fig 2) should alter the light absorbance spectrum of G. theta cells relative to N sufficiency. This change in the absorbance spectrum might be the cause of the increased expression of GtCCR1 and GtACR1 under N depletion.

Rh-noK expression
This study is the first to detect the expression of Rh-noK genes in native G. theta cells. Unexpectedly, six Rh-noK genes were expressed in G. theta cells (Fig 4 and S4 Fig), suggesting that these Rh-noK genes encode functional proteins. Rh-noK does not have the lysine residue that binds the chromophore, all-trans-retinal, so it is likely to function without binding all-trans-retinal. Photoreceptor proteins such as phytochrome and cyanobacteriochrome have GAF domains that bind the chromophore phytochromobilin [35]. The chromophore binds to conserved cysteine residues in the GAF domain. However, just as Rh-noK lacks the conserved lysine residue, there are also proteins that have a GAF domain but no conserved cysteine residues [36]. Although these proteins cannot bind any chromophore, they have diverse functions, such as sensing sodium ion [37] and chloride ion concentrations [36]. Many Rh-noK genes have been found in organisms such as fungi and algae [38], suggesting that they play important roles in these organisms. In yeast, Rh-noKs (called ORPs) are involved in the regulation of the plasma membrane H+-ATPase [39] and the maintenance of pH homeostasis [40]. They have also been suggested to function as chaperones [41]. The study of Rh-noK will be an interesting field for future investigation. This study focused on the expression pattern of microbial rhodopsin-like genes in G. theta and revealed that nitrogen nutrient conditions affect these patterns. The next step should address gene and/or protein expression under other circumstances, such as different light conditions. The temporal and positional patterns of expression will also help to reveal the functional differentiation among these rhodopsins in native cells.

Cell culturing
Guillardia theta CCMP2712 was obtained from the Provasoli-Guillard National Center for Marine Algae and Microbiota. Cultures were grown in polyethylene culture flasks in aerated h/2 medium at 25°C under white light (30 mmol photons m−2 s−1) with a light-dark cycle (12 h:12 h). Growth was monitored by the optical density at 730 nm (OD730). The OD730 value was correlated with the cell number, which was counted with a cell counting plate (Fukaekasei, Japan). The cell number was then calculated according to the equation below.
$$\text{Cell number (cells/mL)} = 4.1 \times 10^{6} \times \mathrm{OD}_{730} \qquad (1)$$

Measuring chlorophyll content
Cells were collected from 1 mL of culture by centrifugation. The collected cells were finally suspended in 90% acetone to obtain an extract containing all the pigments. The concentrations of chlorophyll a in the cells were determined by absorption using a UV/VIS spectrophotometer (Unicam UV 550, Thermo Spectronic, UK) and calculated according to the equations described below [42].
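As an illustration of how Eq 1 feeds into the per-cell chlorophyll values reported in the Results (μg Chl a per 10^5 cells), the following minimal sketch converts OD730 to a cell density and normalizes a chlorophyll concentration by it. The chlorophyll equations from [42] are not reproduced here; the chlorophyll concentration is taken as an input, and the function names and the example values are hypothetical.

```python
CELLS_PER_ML_PER_OD730 = 4.1e6  # coefficient of Eq 1


def cells_per_ml(od730: float) -> float:
    """Eq 1: cell density (cells/mL) estimated from OD at 730 nm."""
    return CELLS_PER_ML_PER_OD730 * od730


def chl_per_1e5_cells(chl_ug_per_ml: float, od730: float) -> float:
    """Chl a content in ug per 10^5 cells.

    chl_ug_per_ml is the Chl a concentration of the culture, obtained
    separately (e.g. from the spectrophotometric equations in [42]).
    """
    return chl_ug_per_ml / cells_per_ml(od730) * 1e5


# Hypothetical reading: OD730 = 0.5 and 41 ug Chl a per mL of culture.
print(cells_per_ml(0.5))             # cell density in cells/mL
print(chl_per_1e5_cells(41.0, 0.5))  # Chl a in ug per 10^5 cells
```

Keeping the conversion factor in a single named constant makes it easy to substitute a different calibration if the OD-to-cell-number relationship is re-measured.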

RNA extraction
Cells were grown under different nitrogen conditions (Table 1) based on artificial seawater medium (h/2 medium). The cells were harvested by centrifugation at 3,000 × g for 15 min at 4°C. Total cellular RNA was isolated using the RNeasy Mini Kit (QIAGEN, Germany) according to the manufacturer's protocol.

RT-PCR to amplify full length of microbial rhodopsin genes
The RNA was reverse-transcribed using the SMARTer™ RACE cDNA Amplification Kit (Takara Bio, USA). Oligo-dT primers and random primers (N-15) were used for first-strand synthesis. The full-length microbial rhodopsin-like genes were amplified using Q5 High-Fidelity DNA Polymerase (New England Biolabs, USA) and gene-specific primers based on the mRNA sequences in the NCBI database (Table 2). The gene-specific primers contained restriction enzyme recognition sites for cloning each gene into the pET21a vector.

Reverse-transcription quantitative PCR (RT-qPCR)
The RNA was reverse-transcribed using ReverTra Ace® qPCR RT Master Mix with gDNA Remover (TOYOBO, Japan). The expression level of each microbial rhodopsin-like gene was determined by real-time PCR. Real-time PCR was performed on an Eco™ Real-Time PCR System (Illumina, USA) using cDNA equivalent to 1 ng of total RNA for each sample. THUNDERBIRD™ SYBR® qPCR Mix (TOYOBO, Japan) was used to detect products, and 10 μM primers were used. The relative amount of cDNA in each sample was normalized to Gt_95624 (encoding actin), and the melting curve was used to verify specificity. PCR was initiated at 95°C for 60 s, followed by 42 cycles of 95°C for 15 s and 60°C for 60 s. The melting curve was set at 95°C for 15 s, 55°C for 15 s, and 95°C for 15 s. Each gene-specific primer was based on the mRNA sequences in the NCBI database (Table 3).
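The text states that expression was normalized to actin (Gt_95624) and expressed relative to the N-sufficient condition, but it does not name the calculation used. One common choice for this kind of comparison is the 2^(−ΔΔCt) method, which assumes that the target and reference amplicons amplify with roughly 100% efficiency. The sketch below illustrates that method under this assumption; it is not necessarily the calculation used in this study, and the function name and Ct values are hypothetical.

```python
def relative_expression(
    ct_target_test: float,
    ct_ref_test: float,
    ct_target_ctrl: float,
    ct_ref_ctrl: float,
) -> float:
    """Fold change of a target gene by the 2^(-ddCt) method.

    test = N-deficient sample, ctrl = N-sufficient sample,
    ref  = reference gene (actin, Gt_95624, in this study).
    Assumes about 100% amplification efficiency for both amplicons.
    """
    delta_test = ct_target_test - ct_ref_test   # dCt in the test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # dCt in the control sample
    return 2.0 ** -(delta_test - delta_ctrl)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# under N deficiency while the reference gene is unchanged.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
```

A ratio above 1 indicates higher expression under N deficiency, and a ratio below 1 indicates lower expression.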

Phylogenetic analysis
The amino acid sequences used for phylogenetic analysis were those registered in the NCBI Reference Sequence (RefSeq) database, except for Gt_120390 and Gt_164280. The amino acid-coding sequences of Gt_120390 and Gt_164280 were re-analyzed and deposited in the public database (accession numbers MF039475 and LC591948, respectively). The evolutionary history was inferred using the Neighbor-Joining method [43]. The optimal tree with the sum of branch length = 33.09495591 is shown. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Poisson correction method [44] and are in units of the number of amino acid substitutions per site. The analysis involved 85 amino acid sequences. All ambiguous positions were removed for each sequence pair. There were a total of 409 positions in the final dataset (S1 Dataset). Evolutionary analyses were conducted in MEGA6 [45].
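For readers unfamiliar with the distance measure, the Poisson correction converts the observed proportion of differing sites p between two aligned sequences into an estimated number of substitutions per site, d = −ln(1 − p). The sketch below illustrates this together with the removal of ambiguous positions for each sequence pair. It is an independent illustration, not MEGA6 code; the function name, the set of characters treated as ambiguous, and the toy sequences are assumptions.

```python
import math


def poisson_distance(seq_a: str, seq_b: str, skip: str = "-?X") -> float:
    """Poisson-corrected distance between two aligned protein sequences.

    Positions where either sequence has a character in `skip` (gap or
    ambiguous residue; this set is an assumption) are ignored for this
    pair only, i.e. pairwise deletion.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = differing / compared            # observed proportion of differences
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)           # substitutions per site


# Toy alignment: 5 comparable sites, 1 difference, so p = 0.2.
print(poisson_distance("MKTL-A", "MRTLGA"))
```

A matrix of such pairwise distances is the input that the Neighbor-Joining method uses to build the tree.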