Identification and evolutionary characterization of salt-responsive transcription factors in the succulent halophyte Suaeda fruticosa

Transcription factors are key regulatory elements that affect gene expression in response to specific signals, including environmental stresses such as salinity. Halophytes are specialized plants that have the ability to complete their life cycle in saline environments. In this study we have identified and characterized the evolutionary relationships of putative transcription factors (TF) in an obligate succulent halophyte, Suaeda fruticosa, that are involved in conferring salt tolerance. Using RNA-seq data we have analyzed the expression patterns of certain TF families, predicted protein-protein interactions, and analyzed evolutionary trajectories to elucidate their possible roles in salt tolerance. We have detected the top differentially expressed (DE) transcription factor families (MYB, CAMTA, MADS-box and bZIP) that show the most pronounced response to salinity. The majority of DE genes in the four aforementioned TF families cluster together on TF phylogenetic trees, which suggests common evolutionary origins and trajectories. This research represents the first comprehensive TF study of a leaf succulent halophyte including their evolutionary relationships with TFs in other halophyte and salt-senstive plants. These findings provide a foundation for understanding the function of salt-responsive transcription factors in salt tolerance and associated gene regulation in plants.


Introduction
Salinity causes significant losses in agricultural production due to the limited capacity of crops to regulate homeostasis [1]. Halophytes are specialized plants that have adapted to tolerate high salt concentrations through complex mechanisms of gene expression and protein pathway adaptation [2]. In adverse environments, halophytes utilize a variety of physiological and metabolic responses to regulate stress-responsive genes and synthesize functional proteins through a complex signal transduction network to confer salinity tolerance [1]. Moreover, a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 acceptance. Differentially expressed (DE) genes and the entire assembled transcriptome were translated using Transdecoder software and the protein sequences clustered using CD-HIT [13].
Transcription factors were identified and used to search against the Plant Transcription Factor Database 3.0. HMM profiles of the 57 families were obtained and used to search against the S. fruticosa proteome using profile hidden Markov search in HMMER with an E-value cutoff of 10 −10 . Codes for TF prediction, DE TF identification and phylogenetic tree construction are available (S5 Fig). To identify potential TFs in the transcriptome and classify to which family each belongs to, we utilized HMM-based TF domain identification and protein homology search on the available transcriptome sequences of S. fruticosa (Fig 1).
Analysis of differential expression between treatments of 0 mM and 300 mM NaCl from S. fruticosa was performed using the EdgeR package from R [14]. We used the generalized linear models for data analysis for different salt concentrations of treatment and biological replicates. This differentiates the number of expressed transcripts across experimental conditions. We then searched and identified TF from the differentially expressed list using a profile hidden Markov search in HMMER [15] using an E-value of 10 −10 against the database from PlantTFDBv3.0. These TF were annotated based on gene ontology, and their functional domains and structures using BLAST2GO against NCBI non-redundant (nr) and SwissProt protein databases with a similar E-value cutoff of 10 −10 . Enrichment analysis for specific gene ontology for biological process, molecular function and cellular components were determined using default parameters. Functional interactions between DE TFs were performed using STRING software version 10. STRING is a widely used database and web interface to explore protein-protein interactions, including physical and functional interactions [16].
We performed protein sequence comparison of S. fruticosa TFs to known TFs of other green plant species using NCBI BLAST (Table 1). S. fruticosa protein sequences were generated in Fasta, and multiple sequence alignment was performed using EMBL-EBI Clustal Omega version using default parameters with input order.

Molecular and evolutionary analysis of gene structure and motif composition of selected TF families
In order to generate multiple sequence alignment of an entire TF family and construct a corresponding Maximum-Likelihood (ML) gene tree we used an alignment-tree co-estimation algorithm implemented in PASTA [17]. PASTA has been shown to produce accurate alignments and generate trees on large datasets. First, we ran PASTA for two iterations to generate TF family alignments and masked sites with <5% data. Second, we used that masked alignment to extract homologous genes from the S. fruticosa transcriptome using profile hidden Markov search in HMMER [15] with the E-value cutoff of 10 −10 . These gene hits were then combined with the original TF family sequences and the alignment and tree was co-estimated again in PASTA (Fig 1). Constructed trees from all plant TF families are uploaded and can be viewed using FigTree from this source: (will be deposited in Dryad repository upon acceptance).

Validation of differentially expressed transcription factors
Suaeda fruticosa seedlings were grown at Brigham Young University, Provo, Utah, USA according to the optimized protocol [10] under long term salinity treatment. After 8 weeks of growth, NaCl (0, 300 and 900 mM) was gradually introduced at the rate of 150 mM NaCl after 48 h intervals to avoid osmotic shock in such a way that all final salinity concentrations were achieved on the same day [10]. Plant samples of three biological replicates from roots and shoots were treated at low (0 mM NaCl) and optimal (300 mM NaCl) salt conditions and used  Evolutionary characterization of salt-responsive transcription factors in Suaeda fruticosa for transcriptome sequencing. For qRTPCR analysis RNA was isolated from 0 mM, 300 mM, and 900 mM NaCl (high inhibitory) grown plants. Transcription factors identified were selected for validation of differential expression using qRTPCR. For each qRTPCR reaction, 1 μg of RNA of 0 mM, 300 mM and 900 mM NaCl treated samples were reverse transcribed into cDNA using oligodT primers and Superscript IV (Life Technologies), and the cDNA libraries produced were used for qRTPCR as described [18]. The 900 mM samples were included as this high concentration is very inhibitory to plant growth, providing another comparison point. Primer sequences are available as supplementary information (S1 File). We ran second strand synthesis using an ABI Plus One thermocycler with annealing temperature of 58˚C. To assess validation for each gene, qRTPCR data were analyzed based on ΔΔCT and 2 -ΔΔCT method. The ΔCT value of each gene was calculated by subtracting the CT value of the endogenous control from the CT value of the target gene.
We selected the alpha tubulin gene as an endogenous control. Primers were designed from the top DE TF from S. fruticosa transcriptome sequences and optimized for RTPCR. We chose to sample 3 gene targets per family. Expression analysis using ΔΔCT, 2 -ΔΔCT and standard error of the mean were calculated using the data analysis package in Microsoft Excel. Data were plotted as mean fold change (2 -ΔΔCT ). Statistically significant differences (p < 0.05) were determined using a one-tailed two-sample t-test assuming equal variances for comparison of the fold change values between biological replicates using GraphPad Prism software.

Molecular characterization of abundant transcription factor families
We previously reviewed transcription factors identified in various halophytes that activate genes involved in cell maintenance, modifications and stress responses [8]. In this current work we performed protein sequence comparisons of S. fruticosa TF to known TF of other green plant species using NCBI BLAST ( Table 1). Analysis of the amino acid sequence alignment of the S. fruticosa TFs against known Spinacia oleracea and Chenopodium quinoa TFs shows conserved sequences throughout the four families tested (S1, S2 and S3 Figs). We found that S. fruticosa CAMTA TF are related to Calmodulin-binding transcription activator 2-like protein in Spinacia oleracea and Calmodulin-binding transcription activator 3-like protein in Chenopodium quinoa (Fig 2). We found that S. fruticosa BZIP TF are related to BZIP TF 16-like isoform X2 in S. oleracea and BZIP TF 16-like protein in C. quinoa (S1 Fig Table 2).
The results show that the most abundant TF family in S. fruticosa belongs to FAR1 with 177 identified TFs (8.18%). TF family bHLH is the next highest with 142 members (6.56%), followed by MYB with 134 TF (6.19%) and RAV as the fourth most abundant with 117 TF (5.41%). The smallest family belongs to HRT-like with only one hit. No TFs from the LFY gene family were found. These abundant TF are likely involved in other functional and structural mechanisms in the plant rather than salinity stress responses.

Evolutionary characterization of salt-responsive transcription factors in Suaeda fruticosa
Although the FAR1 family has the highest number of identified TFs in Suaeda, a different pattern was observed when differentially expressed (DE) genes were quantified. No FAR1 TFs were differentially expressed between the tested salt treatments. This suggests that the FAR1 TF family might have a different function rather than long-term salinity stress regulation. The bHLH family is the second highest in abundance with two DE bHLH TF between long-term no salt and optimum salt treatment in S. fruticosa. The MYB TF family was the third highest in abundance in our analysis. RAV is the fourth most abundant TF family identified in this study with two DE genes.

Evolutionary analysis of transcription factor encoding genes in Suaeda fruticosa
We reconstructed 57 ML TF family trees using the iterative alignment-tree searching algorithm in PASTA (Fig 3). The CAMTA TF family tree shows that the majority of DE and non-DE TF genes formed single monophyletic clades (Fig 2) [19,20]. In the bZIP TF family, most

Identification and annotation of differentially expressed transcription factor genes
We have focused on salt-responsive transcription factors that are differentially expressed (DE) between long-term contrasting conditions (no salt 0 mM NaCl versus optimum salt 300 mM NaCl concentration) with plants grown in a growth chamber. We performed differential  (Fig 3). We chose the top 4 DE TF families (MYB, CAMTA, MADS-box and bZIP) for expression profiling, phylogenetic tree construction and gene ontology annotation by analysis of available information from the databases (Fig 4). The MYB superfamily contains the highest number of DE TF between treatments and is the third most abundant TF family ( Table 2 and Fig 3) found in S. fruticosa.

Protein interaction network of differentially expressed transcription factors
The summary network of predicted physical and functional interactions among the identified DE TFs suggests involvement in flowering, stomatal development and stress regulation (S5 Fig). Importantly, the protein relationships predicted in S. fruticosa using Arabidopsis homologs MYB (FLP1, MYB13, LHY), ARR-B PCL1, and MADS-box AGL24 are involved in the same interaction network.
Genes that belong to the top DE TF families were also examined for their predicted interactions and potential functions with other genes (Fig 4). From the identified interactions between DE TFs, two S. fruticosa genes (Locus_17372_Transcripts_9,12), encoding similar identity with FLP (88% identity) and MYB88 (79% identity), contain a putative MYB transcription factor involved in stomata development (Fig 4A). The loss of FLP activity results in failure of guard mother cells to adopt the guard cell fate [22]. FLP and MYB88 negatively control the expression of genes associated with stomatal development but positively regulate gene expression related to stress conditions. Double mutants of FLP and MYB88 are more susceptible to drought and salt stress and lose water significantly faster than wild-type [23]. This suggests that these individual TFs may play important roles in salt regulation.
Four DE genes (Locus_36812_Transcript_1,2,5,6) related to LHY or CCA1 are predicted to interact with other MYB TFs (Fig 4B). CCA1 regulates ELF4 and ELF3 that are involved in circadian control and phytochrome regulation in C3 and CAM leaves [24]. The DE MADSbox AGL24 (Locus_82944_Transcripts_1,3,4,6) homologue is also predicted to interact with these MYB homologs (Fig 4C). These clock-associated genes in Mesembryanthemum are unaffected by salt stress, suggesting compensation of the central circadian clock against development and abiotic stress in specialized plants [25].
Sequences identified from the BLAST and SwissProt databases were mapped with GO terms and assigned functional terms based on the gene ontology vocabulary (S6 Fig). The TF are assigned into three main categories: Biological process refers to the biological objective of the genes or gene products, molecular function as the biochemical activity of the genes, and cellular components as the place where the interaction of the gene product actively functions. Dominant categories include metabolic, developmental and single organism process and stimulus response (each comprising 9%) for biological processes (S6A Fig). There are 59 hits (31%)

Validation of DE by quantitative reverse-transcriptase PCR (qRT-PCR) analysis
To validate the results from the transcriptome analysis, we selected DE genes in each of the top four families (bZIP, CAMTA, MYB and M-Type) for quantitative reverse transcriptase PCR (qRT-PCR) analysis to measure gene expression among different treatments and tissue types (roots and shoots). Specific primers were optimized for the selected TF genes using alpha Evolutionary characterization of salt-responsive transcription factors in Suaeda fruticosa tubulin as an endogenous control (S1 File). We amplified cDNA libraries from three biological replicates of roots and shoots for 0 mM and 300 mM treated plants. Several of the tested gene targets showed similar changes in gene expression (Fig 5) to those observed in the transcriptome analysis [11]; e.g. M-Type 26 and 28 (MADSbox AGL24) genes, which are downregulated at 300mM NaCl concentration. In addition, in this current analysis we included analysis of differential gene expression in plants grown in 900 mM NaCl, which significantly inhibits growth of S. fruticosa. Statistically significant decreases in expression of bZIP57 are observed in the 300 mM treated shoots, which correspond closely with the RNA-sequencing results. Similarly, there is a decrease of CAMTA12 expression in the 300 mM shoots. MADSbox29 shows a significant decrease of expression in shoots in optimal growth conditions of 300 mM NaCl, while MYB72 shows upregulation on the same tissue type and treatment.

Discussion
To identify and characterize putative transcription factors (TFs) in an obligate halophyte, Suaeda fruticosa, we utilized the RNA-seq data published earlier [11] to identify and characterize putative TFs that are potentially involved in salt tolerance. We analyzed the expression patterns of specific TF families, protein-protein interactions and evolutionary trajectories to predict the roles of differentially expressed TFs in salt tolerance. TF families with the most differentially expressed (DE) genes in response to salinity were identified as members of the MYB, CAMTA, MADS-box and bZIP families.
The FAR1 family has the highest number of identified TFs in S. fruiticosa, but none were observed to be differentially expressed between the tested salt treatments. This suggests that the FAR1 TF family is likely not involved in long-term salinity stress regulation but rather has other functions in the plant. As one possible example, Arabidopsis FAR1 TFs have been reported to bind to promoters of abscisic acid (ABA) genes to activate expression. In particular, under salt and osmotic stress, FAR1 has been shown to trigger the accumulation of ABA [26]. When FAR1 genes lose their functionality (e.g. by deletion), sensitivity to ABA-mediated inhibition of seed germination is reduced. Also, FAR1 member fhy3 and far1 mutants exhibit wider stomata, lose water faster, and are more sensitive to drought [27].
The second highest number of TFs identified were of the bHLH family. BHLH TFs are involved in salt stress tolerance and developmental processes in tobacco [28] and rice [29,30]. However, there are limited halophyte studies focusing on the involvement of bHLH TFs in drought and salinity stress in halophytes [31,32]. Overexpression of some bHLH genes were found to confer increased tolerance to salt and osmotic stress in Arabidopsis. This TF family has been observed to positively regulate salt-stress signals independent of ABA, and have been targets to improve salt tolerance in crops [33].
MYB TFs were the third highest in abundance, and are known to operate through ABAdependent or independent pathways. Among genome-wide identification and expression analyses related to plant abiotic stress, MYB is one of the most studied TF families in halophytes [31,34]. It has been suggested that following duplication events, MYB TFs often undergo subfunctionalization [35]. The MYB TF family is involved in controlling various cellular processes, including several abiotic and epigenetic control of stress responses [36]. This is consistent with our findings that S. fruticosa roots at 300 mM salt conditions show downregulated expression of MYB 07 and MYB 37, while expression is upregulated under stress conditions (no salt 000 mM and 900 mM NaCl treatments.) MYB plays diverse physiological and developmental roles that are either induced or repressed under different stress conditions [6]. In Arabidopsis, MYB2 is induced by salt and drought stress. Rice OsMYB2 encodes a stress-responsive MYB that plays a regulatory role in Evolutionary characterization of salt-responsive transcription factors in Suaeda fruticosa Fig 5. qRT-PCR validation of differentially expressed transcription factor genes in S. fruticosa. Fold changes in expression were calculated using alpha tubulin as the endogenous control. Standard error of the mean was calculated using the Prism Graph Pad data analysis package. R000 (roots at 0 mM NaCl), R300 (roots at 300 mM NaCl), R900 (roots at 900 mM NaCl), S000 (shoots at 0 mM NaCl), S300 (shoots at 300 mM NaCl), S900 (shoots at 900 mM NaCl).
salt, cold and dehydration [37]. In the halophyte Avicennia marina, the AmMYB1 gene confers increased salt tolerance with reduced chlorosis and other salt stress symptoms when introduced to tobacco plants [38]. The DE FLP and MYB88 putative MYB transcription factors may be involved in stomata development. The loss of FLP activity results in failure of guard mother cells to adopt the guard cell fate [22]. FLP and MYB88 negatively control the expression of genes associated with stomatal development but positively regulate gene expression related to stress conditions. Double mutants of FLP and MYB88 are more susceptible to drought and salt stress and lose water significantly faster than wild-type [23]. This suggests that these individual TFs may play important roles in salt regulation. These findings suggest that the MYB TF family in S. fruticosa is the most likely key transcription regulator for salt tolerance regulation.
Several differentially expressed M-type MADS-box genes, the third most abundant group, were identified in our study. The ancestral functions of MADS-box genes are currently unknown. Comparison of some MADS-box genes in Arabidopsis showed that they are polyphyletic with significantly longer branch lengths than for other genes, suggesting that they could be pseudogenized as a result of neutral evolution [39]. Most likely these copies appeared via whole genome duplications and intraspecific gene duplications [39,40].
The RAV TF family was found to have two DE genes in S. fruticosa. Some members of the RAV family have been found to modulate drought and salt-stress responses in Arabidopsis and are involved in ethylene and brassinosteroid responses [41].
Some genes that were selected from each TF family for qRT-PCR analysis showed statistically significant differential gene expression (Fig 5), while others did not (S7 Fig). This suggests that not all of the TFs tested are strongly linked to salt stress, and some may be involved in other pathways such as maintenance of the homeostatic balance in the plant. Four MADSbox DE genes were identified in Suaeda upon salt treatment. MIKC and M-type TFs show similar gene hits since both belong to the same MADS-box TF family. MIKC type TFs contain a keratin-like coiled-coil (K) domain while M-type lacks this domain. MADS-box family TF genes are involved in fruit development, seed pigmentation, floral organ identity determination, and stress response in several species [42]. MADS-box family TFs are potential candidates for salt regulation in S. fruticosa. In Brassica rapa, several MADS-box family TFs were shown to be induced by cold, drought and salt stresses [43]. In rice, three genes (OsMADS2, 30 and 55) showed more than 2-fold downregulation in response to dehydration and salt stress [44]. We also investigated another part of the MADS-box family, the M-type TFs, involved in flowering and reproduction organ development. This likely explains that we cannot adequately compare expression levels in the root tissue because these TFs are most likely not expressed in the roots [45].
Whole genome/large-scale chromosomal duplications play a crucial role in increasing copy number of CAMTA TF genes [20]. The close-relatedness of DE TF paralogs found most likely indicates that these genes were duplicated separately from other non-DE TFs, and subsequently their expression patterns and regulatory mutations were preserved by species-specific environmental constraints related to increased salt concentration [19]. Based on observed patterns, we hypothesized that such large CAMTA family expansions can be explained by smallscale gene duplication events (e.g. via unequal crossing over).
There were three differentially expressed bZIP TFs identified upon salt treatment in S. fruticosa. The BZIP TF family regulates light responsive genes and abscisic acid (ABA) mediated abiotic stress signaling pathways [46], and involves binding to G-box motifs [47]. The group F bZIP family from Arabidopsis and its related halophyte species was identified to be a key regulator of salt stress adaptation [48]. Arabidopsis AREB1, AREB2 and ABF3 are also important genes for signaling under drought stress. Group A bZIP in rice and tomato confers increased tolerance to water deficit and salt stress [49]. The S. fruticosa bZIP family is related to bZIP16, which acts as a repressor of LHCB2.4 [50]. In the shoots of S. fruticosa, we found that BZIP 59 is downregulated in 0 mM salt conditions compared to 300 mM. Similarly, 900 mM salt treated roots also have downregulated BZIP59 compared to 300 mM salt condition. The results with BZIP59 suggest that this TF may play a role in the optimal growth of S. fruticosa at 300 mM NaCl, as it is downregulated in both the no salt and elevated salt plants.
One of the major bZIP family expansions was observed on the branch that leads to seed plants [51]. Moreover, its evolution-by-gene duplication patterns fit to a random birth-deathmodel, suggesting that new gene copies occurred as a result of small-scale duplication events rather than whole genome/chromosome duplications [51].
Protein interaction predictions identified a number of potential important interactions involved in salinity tolerance. The AGL24 transcriptional activator mediates effects of gibberellins on flowering and regulates the expression of LFY genes for floral induction and development. A homologue of MYB13 (Locus_37251_Transcript_2) is involved in response to salt stress, jasmonic acid and gibberellin [25] and interacts with homologue PCL1 (Locus_119717_Transcript_1,2). PCL1 works as a transcriptional activator involved in circadian rhythm and regulation of flower development in Arabidopsis [52]. CAMTA3 was predicted to interact with DREB dehydration response elements. Studies of CAMTA3 in other plants reveal that it negatively regulates plant defense and suppresses salicylic acid accumulation and disease resistance. Calcium ion/calmodulin binding through CAMTA3 is critical for wound response. Overexpression of AtSR1/CAMTA3 effectively confers plant resistance to herbivore attack through salicylic acid/jasmonic acid crosstalk regulation [53,54].

Conclusions
In conclusion, we have identified several differentially expressed transcription factor genes in S. fruticosa, conducted phylogenetic analysis for top DE TFs, performed expression pattern analysis, and annotated predicted individual TFs involved in interaction networks. Phylogenetic analysis showed that the observed DE TFs are strongly conserved across plant species. This builds upon the very limited available information on TFs in succulent halophytes. Only minimal information on succulent halophytes is available [55], although considerable work has been done to identify transcription factors in non-succulent halophytes [56]. The results presented here provide basic information on key regulator TFs of S. fruticosa and contribute to an increased understanding of salt tolerance mechanisms of a succulent halophyte that may be utilized for the improvement of halophytes as non-conventional crops. Future analyses should include individual examination of the transcription factors identified in relation to salt tolerance between halophytes and salt-sensitive glycophytes. Each graph shows the qRTPCR results for test genes. The annotated putative genes are titled and the mean fold change represented by the 2-ΔΔCT method relative to 0 mM treated samples are shown on the y axis. Error bars depict the standard error of the mean for 3 biological replicates. R000 (roots at 0 mM NaCl), R300 (roots at 300 mM NaCl), R900 (roots at 900 mM NaCl), S000 (shoots at 0 mM NaCl), S300 (shoots at 300 mM NaCl), S900 (shoots at 900 mM NaCl).