Spt-Ada-Gcn5-Acetyltransferase (SAGA) Complex in Plants: Genome Wide Identification, Evolutionary Conservation and Functional Determination

In eukaryotes, the recruitment of RNA polymerase II to a promoter is assisted by the assembly of the basal transcriptional machinery. The Spt-Ada-Gcn5-Acetyltransferase (SAGA) complex plays an important role in transcription regulation in eukaryotes. However, despite the genome sequencing of various plants, the components of the SAGA complex and their roles in plant development and physiology remain poorly defined. Computational analysis of the Arabidopsis thaliana and Oryza sativa genomes identified 17 to 18 potential candidates for SAGA subunits. We further classified the SAGA complex subunits based on their conserved domains. Phylogenetic analysis revealed that the SAGA complex proteins are evolutionarily conserved between plants, yeast and mammals. Functional annotation showed that they participate not only in chromatin remodeling and gene regulation, but also in other biological processes; this participation could be indirect and possibly mediated via the regulation of gene expression. The in silico expression analysis of the SAGA components in Arabidopsis and O. sativa indicates that the components have distinct expression profiles at different developmental stages. The co-expression analysis suggests that many of these subunits are co-expressed at different developmental stages, during hormonal interaction and in response to stress conditions. Quantitative real-time PCR analysis of SAGA component genes further confirmed their expression in different plant tissues and under stress. The expression of representative salt, heat and light inducible genes was affected in mutant lines of SAGA subunits in Arabidopsis. Altogether, the present study provides evidence for the involvement of the SAGA complex in plant gene regulation and stress responses.


Introduction
The regulation of gene expression is accomplished by the coordinated action of multiple events, from chromatin modification to mRNA formation, which together ensure the synchrony of cellular activities [1][2][3][4]. Gene regulation in eukaryotes requires the association of the pre-initiation complex (PIC), transcription factors and activators at promoters [1,5,6]. One well-known mechanism for transcriptional activation suggests that activator proteins interact with the promoter to recruit the components of the transcriptional machinery and co-activators such as the Transcription Factor II D (TFIID), SAGA and mediator complexes [7][8][9]. The SAGA complex, a multi-protein complex, is important for inducing the transcription of a subset of RNA polymerase II-dependent genes [10][11][12]. Indeed, the SAGA complex is an archetype of the multi-subunit histone modifying complexes and co-activators that regulate transcription by RNA polymerase II [13][14][15]. The first member of the SAGA complex family was isolated from the budding yeast Saccharomyces cerevisiae [16]. The 1.8 megadalton S. cerevisiae SAGA complex is composed of 20 conserved proteins and contains different classes of transcriptional co-activator proteins such as SPT (Suppressor of Ty insertions), ADA (alteration/deficiency in activation), GCN5 (general control non-derepressible), TAF (TBP-associated factors) proteins and DUBm (deubiquitylation module) [17]. These proteins are organized into different functional and structural sub-modules and thereby execute several cellular functions: nucleosomal histone acetyltransferase (HAT) activity, histone deubiquitinylation, TATA-binding protein (TBP) binding and activator binding [10,13,18].

Plant genome database search for identification of the SAGA complex
The National Center for Biotechnology Information (NCBI), TAIR (The Arabidopsis Information Resource) and RAP (Rice Genome Annotation Project) databases were used to screen for the SAGA complex in Arabidopsis, O. sativa and other plant genomes. Protein sequences of S. cerevisiae and human SAGA complex components (Table 1, S1 and S2 Tables) were used as queries in BLASTP searches against the protein sequences of Arabidopsis and O. sativa.

Alignment and phylogenetic analysis
Clustal-X version 1.83 was used for multiple sequence alignment of the protein sequences [33]. The aligned sequences were then used as input to construct phylogenetic trees with the Neighbor-Joining method under the Jones-Taylor-Thornton (JTT) model. Bootstrapping with 1000 replicates was performed to assess the support for the inferred evolutionary history of the group analyzed. The evolutionary distances were computed in MEGA version 6.06 [34].

Domain analysis and chromosomal localization
The domain analysis was performed with CDD (Conserved Domain Database) and Pfam (protein families database) using an e-value threshold of 1.0. The Chromosome Map Tool was used to define the positions of the SAGA complex genes on Arabidopsis chromosomes [35]. "Paralogons in Arabidopsis" was used to determine gene duplications and the presence of the genes in duplicated chromosomal segments, with the threshold set to more than 6 paired proteins per block [36].

Conserved motif analysis
The cis-regulatory elements/motifs were analyzed in the 1000 bp upstream of the transcription start site (TSS) using the web-based Plant cis-acting regulatory DNA elements (PLACE) and Plant Cis-Acting Regulatory Elements (PlantCARE) databases [37,38].
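As an illustration of how the 1000 bp upstream window can be derived before submission to a motif database, the following minimal Python sketch computes the coordinates and sequence of the region upstream of a TSS. The function names, the 1-based inclusive coordinate convention and the strand handling are assumptions made for this example; they are not taken from the PLACE or PlantCARE portals or from the procedure used in this study.

```python
# Complement table used for genes on the minus strand.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def upstream_window(tss, strand, length=1000):
    """Return 1-based inclusive (start, end) of the region upstream of the TSS.

    Assumption: `tss` is the 1-based position of the first transcribed base.
    """
    if strand == "+":
        # Upstream lies at smaller coordinates; clip at the chromosome start.
        return max(1, tss - length), tss - 1
    if strand == "-":
        # Upstream of a minus-strand gene lies at larger coordinates.
        return tss + 1, tss + length
    raise ValueError("strand must be '+' or '-'")


def promoter_sequence(chrom_seq, tss, strand, length=1000):
    """Extract the upstream sequence, oriented 5'->3' relative to the gene."""
    start, end = upstream_window(tss, strand, length)
    end = min(end, len(chrom_seq))  # clip at the chromosome end
    seq = chrom_seq[start - 1:end]
    if strand == "-":
        # Reverse-complement so the sequence reads in the gene's direction.
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

The window is clipped at the chromosome boundaries, so genes close to a chromosome end return a sequence shorter than the requested length.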

In silico microarray expression and protein interactome analysis
Microarray data from the Genevestigator database and analysis toolbox were used to determine the gene expression profiles of Arabidopsis and O. sativa SAGA complex genes in different tissues [39]. The cDNA signatures from Massively Parallel Signature Sequencing (MPSS) were used to count the number of corresponding mRNA molecules produced by each gene of the Arabidopsis and O. sativa SAGA complex [40]. A protein-protein interaction network, for the prediction of functional associations among SAGA complex proteins, was prepared using the STRING database with a confidence threshold score of 0.6 [41]. The network was displayed in the 'evidence' view, in which the lines linking proteins indicate the category of evidence used to predict the association or interaction.
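The confidence threshold described above amounts to a simple filter on an interaction list. The sketch below shows one way to apply it in Python, assuming the interactions are available as (protein A, protein B, combined score) tuples with scores on a 0 to 1 scale. The function names and the example edges are hypothetical and purely illustrative; they are not STRING output. If scores are exported on a 0 to 1000 scale, they would need to be rescaled first.

```python
def filter_interactions(edges, query_proteins, min_score=0.6):
    """Keep edges that involve a query protein and meet the score threshold."""
    query = set(query_proteins)
    return [
        (a, b, score)
        for a, b, score in edges
        if score >= min_score and (a in query or b in query)
    ]


def partners(edges):
    """Map each protein to the set of proteins it is linked to."""
    network = {}
    for a, b, _ in edges:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


# Hypothetical edge list; X1 and X2 are placeholder proteins, and the
# scores are invented for illustration only.
example_edges = [
    ("GCN5", "ADA2b", 0.95),
    ("GCN5", "X1", 0.45),
    ("ADA2b", "X2", 0.60),
    ("X1", "X2", 0.90),
]
kept = filter_interactions(example_edges, ["GCN5", "ADA2b"])
network = partners(kept)
```

In this toy example the GCN5-X1 edge is dropped for falling below the threshold, and the X1-X2 edge is dropped because it involves no query protein.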

Functional annotation and co-expression analysis
Functional annotation and Gene Ontology analysis were performed using TAIR and agriGO [35,42]. Co-expression data for gene pairs and co-expressed gene networks for each SAGA gene were obtained from ATTED-II (The Arabidopsis trans-factor and cis-element prediction database) version c4.1 [43].

RNA extraction and Real-time PCR analysis
Total RNA was extracted from flowers, leaves, roots, seedlings, stems and siliques, as well as from treated leaves, using the Spectrum plant total RNA isolation kit (Sigma). The integrity of the RNA, after DNase I treatment, was confirmed by agarose gel electrophoresis. Two micrograms of total RNA were used as a template for first-strand cDNA synthesis using the Superscript-II RT kit (Invitrogen). Quantitative real-time PCR (qRT-PCR) gene expression analysis was performed on an ABI 7500 Fast Real-time PCR machine [44]. Gene specific forward and reverse primers were designed using ABI Primer Express v2.0 software (S3 Table). Transcript levels were normalized to those of Ubiquitin-10 (Ubq10, At4g05320), which served as the internal control. The relative expression level of target genes was analysed by the ΔΔCt method.
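For readers unfamiliar with the ΔΔCt calculation, the following minimal Python sketch shows the arithmetic. It assumes the common 2^(−ΔΔCt) form, which presumes near-100% amplification efficiency for both the target and the reference gene; the text above does not state which variant was applied. The function names and the Ct values in the usage note are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """ΔCt: mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(treated_target, treated_ref, control_target, control_ref):
    """Relative expression of the target in treated versus control samples.

    ΔΔCt = ΔCt(treated) - ΔCt(control); fold change = 2 ** (-ΔΔCt).
    Assumes the amount of product doubles in every cycle.
    """
    ddct = delta_ct(treated_target, treated_ref) - delta_ct(control_target, control_ref)
    return 2.0 ** (-ddct)
```

For example, with hypothetical replicate Ct values, `fold_change([24.0, 24.0], [18.0, 18.0], [26.0, 26.0], [18.0, 18.0])` gives ΔΔCt = 6 − 8 = −2 and therefore a four-fold induction relative to the control.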

Identification and classification of SAGA complex subunits in plants
The SAGA complex is a multi-subunit protein complex and is highly conserved among human, S. cerevisiae and Drosophila [13,17]. The putative SAGA genes were identified in the Arabidopsis and O. sativa genomes using protein sequences of S. cerevisiae and human SAGA genes as queries against the protein databases of Arabidopsis and O. sativa (NCBI, TAIR and RAP) (S1 and S2 Tables). We identified four protein subunits in the ADA group of the SAGA complex, viz. ADA1, ADA2b, ADA3 and GCN5 (ADA4) (Table 1 and S2 Table). ADA2b (At4g16420) and GCN5 (At3g54610) have been previously studied in plants [7,28,30,45,46]; however, the ADA1 and ADA3 proteins are yet to be characterized in plants. Two ADA1 proteins each were identified in the Arabidopsis (At2g14850 and At5g67410) and O. sativa (Os12g39090 and Os03g55450) genomes as homologs of S. cerevisiae and human ADA1 (Table 1 and S2 Table). Similarly, Arabidopsis (At4g29790) and O. sativa (Os01g73620) ADA3 were identified as homologs of S. cerevisiae and human ADA3 (Table 1 and S2 Table).
In S. cerevisiae and human, three to four proteins (SPT3, SPT7, SPT8 and SPT20; SPT8 is not present in humans) have been reported in the SPT group of the SAGA complex (Table 1). Our study identified SPT3 and SPT20 proteins in Arabidopsis and O. sativa. Interestingly, human SPT3 displays extensive sequence similarity to the histone fold motifs of TAF13 in its N-terminal region [49,50]. We found the conserved TAF13 domain in Arabidopsis At1g02680 and O. sativa Os01g23630 (Table 1 and S2 Table). The SPT20 domain was found to be conserved in the Arabidopsis At1g72390 and O. sativa Os01g02860 proteins (Table 1 and S2 Table). In earlier studies, a low level of similarity was reported between the S. cerevisiae and human homologs of SPT3 (30%) and SPT20 (32.5%) (S2 Table) [51,52]. The SPT7 protein contains a Bromo-domain, a motif found in several transcription factors and co-activators, which is associated with histone acetylation and transcriptional activation [53][54][55]. In Arabidopsis, 29 Bromo-domain-containing proteins have been reported [56]. The BLAST analysis suggested that Arabidopsis At1g32750 (e-value 3e-07) and O. sativa Os06g43790/Os02g38980 (e-value 2e-07/1e-07 and protein similarity 29/25%, respectively) have the highest protein sequence similarity to S. cerevisiae SPT7, particularly to its Bromo-domain region. However, BLAST searches with human STAF65/SUPT7L (the homolog of yeast Spt7) returned only hits with extremely low protein similarity and insignificant e-values in the Arabidopsis and rice genomes. The SPT8 protein of S. cerevisiae contains WD40 domain repeats and facilitates TBP interaction [8]. Arabidopsis and other plants encode more than 200 putative WD40 domain containing proteins [57]. Arabidopsis At5g08390 and At5g23430 displayed protein similarity with S. cerevisiae SPT8.
However, a large number of plant proteins containing either a Bromo-domain or a WD40 domain exhibited a substantial level of similarity with the Bromo-domain of SPT7 or the WD40 domain of SPT8. Hence, further biochemical evidence is required to validate these subunits of the SAGA complex in the two plant species, Arabidopsis and O. sativa.
Interestingly, several TAF subunits are shared by multiple complexes such as TFIID, SAGA, SLIK (SAGA-like complex) and STAGA (SAGA altered, SPT8 absent), as reported earlier in S. cerevisiae and human [58]. Lago et al. (2004) described the different TAFs and their conserved domain structures in Arabidopsis [59]. The TAF proteins in the SAGA complex include TAF5, TAF6, TAF9, TAF10 and TAF12. Our genome-wide similarity search identified two candidate proteins representing TAF12 in O. sativa (Table 1), in contrast to the single protein reported previously [59].
Apart from these four groups, some other components are also present in the SAGA complex, such as CHD1 (chromo-domain helicase DNA binding protein 1), TRA1 (Transcription-associated protein 1) and SGF29 (SAGA-associated factor 29) (Table 1). The CHD subfamily-I chromatin remodeling proteins, S. cerevisiae CHD1 and human CHD2, share 45% protein similarity (S2 Table) [60]. BLAST searches identified Arabidopsis CHR5 (At2g13370) and O. sativa CHD (OsJ_25446) as homologs of S. cerevisiae CHD1 and human CHD2 (Table 1 and S2 Table). Further, we identified two proteins encoding SGF29 in Arabidopsis (At3g27460 and At5g40550), as reported recently [61], and one protein in O. sativa (Os12g19350) (Table 1 and S2 Table). TRA1 is a representative of a group of proteins that includes the DNA-dependent protein kinase catalytic subunit, ATM (Ataxia telangiectasia mutated) and TRRAP (transformation/transcription domain-associated protein), with carboxyl-terminal regions related to phosphatidylinositol 3-kinases [62]. We identified two TRA1 orthologs in Arabidopsis (At2g17930 and At4g36080) and one in O. sativa (Os07g45064) corresponding to S. cerevisiae TRA1 and human TRRAP (Table 1 and S2 Table). In some reports, the RTG2 protein has been considered a subunit of the SAGA complex [47], whereas in others it has been assigned to SLIK, a variant of the SAGA complex [1,17,63]. Further biochemical evidence is required to validate the presence of RTG2 in the Arabidopsis and O. sativa SAGA/SLIK complex.

Conserved domains in plant SAGA complex
The protein domains of Arabidopsis and O. sativa SAGA subunits, identified with the corresponding domains of S. cerevisiae and human SAGA subunits, are presented in Fig 1 (Table 2). The domains of plant SAGA components share moderate (30-50%) to high (50% and above) similarity with their counterparts in S. cerevisiae and human, excluding the FAT domain and chromo-domain (Table 2 [...] and Ostreococcus lucimarinus). In order to evaluate the molecular evolutionary relationship and conservation among SAGA protein components in different organisms, we aligned the different SAGA subunits and constructed a phylogenetic tree for each group. The phylogenetic tree analysis indicated strong conservation among the SAGA protein domains in S. cerevisiae, mammals, Arabidopsis, O. sativa, algae, bryophyte and Drosophila (Figs 2 and 3; Table 3 and S2 Fig). The ADA group resolved into three clades. The first and second clades comprised GCN5 (ADA4) and ADA1 proteins, respectively, while the third clade was further divided into two sub-groups, ADA3 and ADA2b (Fig 2). In the phylogenetic analysis, the ADA proteins from various organisms fell into similar clades, excluding SpADA3 and DmADA3, which were close to the ADA1 clade (Fig 2). Similar to the ADA group, the other groups formed several clades in their protein domain specific phylogenetic trees (Fig 3). In the phylogenetic tree of the TAF group, the CrTAF5 and OlTAF12 proteins were present in different clades (S2 Fig). The phylogenetic trees constructed from plant SAGA proteins revealed that these proteins diverge into monocot and dicot groups (Figs 2 and 3). Based on the phylogenetic tree analysis, most protein domains in the SAGA subunits have remained highly conserved in S. cerevisiae, mammals, Arabidopsis, O. sativa, algae, lycopsida, bryophyte and Drosophila during the course of evolution (Table 3).
The phylogenetic trees were also constructed using the representative domain sequences of each protein of the SAGA complex of Arabidopsis, D. melanogaster, mammals (H. sapiens and [...] The phylogenetic tree of the different domains of the ADA protein group of the SAGA complex showed two clades. The first clade comprised ADA3, GCN5, SWIRM-ADA2b and SANT-ADA2b, which represent the HAT module of the SAGA complex. However, the ZZ-ADA2b domain, which is involved in the interaction with GCN5, grouped with the ADA1 domains in the second clade (S3A Fig). The phylogenetic tree analysis of full length protein sequences indicated that ADA1 forms a group distinct from the other ADA proteins (Fig 2). The apparent reason for the presence of the ZZ-ADA2b and ADA1 domains in one group might be that both protein domains are involved in protein-protein interactions. The phylogenetic tree constructed from the domains of the DUBm subunits of the SAGA complex suggested that the SGF11 and Peptidase C19D-UBP domains were present in the same clade (S3C Fig), which contrasts with the result obtained in the phylogenetic tree of the full length proteins (Fig 3A). Notably, several paralogs were found for SAGA complex components in selected plants, mainly in G. max and P. trichocarpa (Table 3). Moreover, the total number of SAGA complex components varied among dicots, as compared to monocots (Table 3). This variation in the number of SAGA complex subunits suggests that these components may have been recruited to perform distinct and specialized roles in plants.

Chromosomal distribution and functional annotation of plant SAGA complex
The Arabidopsis Genome Initiative provides the opportunity to identify instances of chromosomal block duplication in the genome [64]. We investigated whether the genes encoding the SAGA complex are associated with chromosomal block duplication in Arabidopsis. We used the TAIR chromosome map viewer and Paralogons in Arabidopsis to localize the SAGA components across the five chromosomes (S4 Fig). Most of the SAGA complex genes were located in the duplicated segmental regions of the Arabidopsis chromosomes [36]. Moreover, some of the Arabidopsis and O. sativa SAGA subunits, such as ADA1, TAF6, TAF9, SGF29 and TRA1, were found in more than one copy. Thus, it seems that some of the SAGA genes were duplicated during evolution.
The functional characterization analysis showed that Arabidopsis and O. sativa SAGA complex components play a key role in gene expression, transcription initiation, complex assembly and several metabolic and cellular processes (Fig 4). Gene Ontology analysis predicted that plant SAGA complex components also participate in transcription regulator activity, binding and catalytic activity, as well as in the development of cell and organelle parts (S4 Table). Recent studies suggest the participation of some plant SAGA complex subunits, for example Arabidopsis ADA2B, SGF29 and GCN5, in light- [29], cold- [28,65] and salt-induced [61] gene expression, flower development [66] and histone acetylation [30,45]. The functional characterization analysis also indicated their involvement in auxin, cytokinin, ethylene and jasmonic acid mediated signaling pathways (S4 Table). In sum, the functional and GO analyses predicted the involvement of the plant SAGA complex not only in chromatin remodeling, but also in abiotic and biotic processes.

Protein-protein interactome analysis of Arabidopsis SAGA complex
To examine interactions among Arabidopsis SAGA complex components, we mapped the SAGA proteins onto the STRING interactome, a database of known and predicted protein interactions [41]. The analysis of Arabidopsis SAGA component proteins revealed an interconnected sub-network of 131 hub proteins (confidence score 0.6, Fig 5 and S5 Table). These analyses suggested that many hub proteins form a network that behaves as a functional module within the complex. Moreover, the protein-protein interaction analysis of S. cerevisiae SAGA proteins using the STRING database displayed 190 protein interactions with a confidence score of 0.6 (S6 Table). Interestingly, most of these protein-protein interactions were similar between the SAGA proteins of Arabidopsis and S. cerevisiae (S5 and S6 Tables). Mutation and biochemical characterization studies in S. cerevisiae and mammals have established that these interactions are essential for the structure and stability of SAGA. For instance, any alteration in SPT7, SPT20, TAF5, TAF10 or TAF12 affects SAGA composition and integrity [67][68][69].

In silico expression analysis of the SAGA complex encoding genes
Gene expression profiles of the SAGA complex components can provide significant evidence for their potential functional roles. The functional annotation of SAGA components in Arabidopsis and O. sativa revealed their diverse roles in plant development (Fig 4). To validate this further, we examined the expression profiles of the SAGA complex components in different tissues using the Genevestigator microarray database and its expression meta-analysis tool [54], and the MPSS database [55]. The expression profile of the SAGA complex encoding genes was examined in 9 different plant organs of Arabidopsis and O. sativa (Fig 6). AtTaf10, AtGcn5 and AtChr5 were expressed at low levels in all the examined developmental stages (Fig 6A). However, the expression of AtAda1a, AtTaf12b, AtTaf6b and AtTra1a was higher in these developmental stages.
In the case of O. sativa, SAGA subunit genes were highly expressed in the booting, seedling, milk, flower and stem elongation stages (Fig 6B). During germination, transcripts accumulated at higher levels for the AtTaf6/6b, AtTra1a, AtAda2b, AtTaf1a, AtTaf12b and AtUbp22 genes in Arabidopsis, and for the OsAda2b, OsAda3, OsTaf5, OsTaf12/12b, OsTaf1a, OsUbp22, OsSus1, OsSgf11 and OsTaf9 genes in O. sativa. In the booting stage of O. sativa, OsTaf5, OsTaf13, OsTaf12/12b, OsGcn5, OsAda2b, OsAda1a, OsSgf29, OsAda3 and OsSgf11 were among the highly expressed genes, whereas OsSus1, OsSgf29, OsUbp22, OsTaf6 and OsSgf11 were highly expressed during the dough developmental stages. The meta-analysis displayed enhanced expression of SAGA component genes in the endosperm (micropylar, peripheral and chalazal), seed coat, suspensor callus and primary cells of Arabidopsis (Fig 7A), and in the callus, sperm cells, panicle, leaf, pistil, stigma, ovary and root tip of O. sativa (Fig 7B). The results suggest diverse roles for SAGA component genes, which are expressed throughout different developmental phases in distinct plant organs and tissues. Data were extracted from the MPSS database libraries (17 and 20 bases), representing 12 and 13 different anatomical parts of Arabidopsis and O. sativa, respectively. These signatures uniquely identify a specific gene through a perfect match (100% identity over the tag length) and provide a quantitative estimate of the expression of that gene. The MPSS tags further confirmed the transcript abundance of SAGA protein encoding genes in different plant parts (S7 and S8 Tables). Transcript abundance is generally expressed as the total number of tags (TPM, transcripts per million): low expression if smaller than 25 TPM, moderate expression if 26 to 250 TPM, and high expression if >250 TPM.
Based on these signatures/tags, five Arabidopsis genes, viz. AtAda3, AtAda1b, AtTaf12, AtTra1a and AtSgf29, were expressed at low levels, whereas AtTaf10 was expressed at a higher level in leaf, root, siliques and callus (S7 Table). The other Arabidopsis SAGA genes exhibited a moderate level of transcript accumulation. MPSS analysis in O. sativa showed that OsAda2b, OsTaf10 and OsTra1 were expressed at higher levels (>250 TPM). The maximum transcript abundance was observed for OsAda2b in mature leaves and for OsTaf10 in young leaves, ovary, mature stigma and callus, whereas OsTra1 was significantly expressed in most of the plant parts, except germinating seed and stem. The SAGA genes OsGcn5, OsAda1a, OsSpt20, OsSgf11, OsTaf9b and OsTaf12 were expressed at low levels, whereas the others were expressed at moderate levels (S8 Table).
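The TPM categories used above can be written as a small classification rule. The Python sketch below is an illustration only: the function names are hypothetical, a value of exactly 25 TPM (which the stated ranges leave unassigned) is treated as low by assumption, and summarizing a gene by its highest TPM across tissues is likewise an assumption rather than the procedure used in this study.

```python
LOW_MAX_TPM = 25        # assumption: exactly 25 TPM is counted as low
MODERATE_MAX_TPM = 250  # values above this are counted as high


def classify_tpm(tpm):
    """Assign an MPSS tag count (TPM) to the low/moderate/high category."""
    if tpm < 0:
        raise ValueError("TPM cannot be negative")
    if tpm > MODERATE_MAX_TPM:
        return "high"
    if tpm > LOW_MAX_TPM:
        return "moderate"
    return "low"


def classify_gene(tpm_by_tissue):
    """Classify a gene by its highest TPM across tissues (assumed summary rule)."""
    return classify_tpm(max(tpm_by_tissue.values()))
```

With hypothetical counts, `classify_gene({"leaf": 12, "root": 300})` returns `"high"` because one tissue exceeds 250 TPM.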

Co-expression analysis for gene pairs and gene network analysis of the SAGA complex
The expression profiles of SAGA components in Arabidopsis and O. sativa obtained with Genevestigator and MPSS revealed that many of the components have distinct tissue-specific expression. We further examined whether these genes are co-expressed during plant development or under any other physiological condition. The co-expression data for each SAGA component gene pair were generated from the ATTED-II database, which includes data from 1388 microarray experiments [43]. The strength of co-expression for the interconnecting gene pairs was determined by the Mutual Rank (MR) method using these microarray data. Forty-two significant co-expression patterns (Table 4) were obtained between SAGA components from 171 co-expressing gene pairs (S9 Table). These co-expression patterns were identified under different biotic, abiotic, hormonal and tissue conditions. For example, the gene pair data showed that Ada2b was strongly co-expressed with the Taf6, Taf13 and Spt20 genes in all the developmental and environmental stress conditions. Likewise, Ubp22 was co-expressed with Sgf29b, and Taf1b with Tra6 and Taf13, at high MR values. The significant MR values for Taf1b, Spt20, Tra1, Taf9, Gcn5, Ada2b and Ada3 suggest their co-expression at the tissue level. The genes Spt20, Ada2b, Taf12b, Chr5, Taf9 and Taf10 showed co-expression under abiotic stress conditions (Table 4). Under hormonal conditions, Ada2b, Ada3, Chr5, Taf6, Taf10, Taf12b, Taf13 and Tra1a exhibited a substantial level of co-expression strength, whereas Spt20 was found to be co-expressed with Chr5 under biotic stress conditions (Table 4). The co-expressed gene network analysis was performed to identify the genes that are co-regulated with the SAGA complex (S5 Fig). A co-expressed gene network provides evidence of highly interconnected expression modules of a subset of genes, which reveals an additional layer of regulation and thus offers complementary evidence for understanding gene function networks.
A total of 181 proteins were found to be co-regulated with SAGA complex components (S10 Table). Approximately 36% of these 181 proteins are known to be involved in responses to abiotic or biotic stimuli or stress, developmental processes, transcription regulation, signal transduction and other biological processes (S6 Fig). This analysis further indicates a potential role of the SAGA complex in regulating plant development and responses to various physiological stresses.
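To make the Mutual Rank measure concrete, the following Python sketch computes it for a pair of genes from a small expression table. It follows the general definition of MR as the geometric mean of the two directional correlation ranks (the rank of gene B among the co-expression partners of gene A, and vice versa); under this definition, two genes that rank each other first have MR = 1. The function names and the toy expression values are hypothetical, and the sketch is not the ATTED-II implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def directional_rank(expr, query, target):
    """Rank of `target` among all other genes, ordered by correlation with `query`."""
    r_target = pearson(expr[query], expr[target])
    # Rank 1 means no other gene correlates more strongly with the query.
    stronger = sum(
        1
        for gene in expr
        if gene not in (query, target) and pearson(expr[query], expr[gene]) > r_target
    )
    return 1 + stronger


def mutual_rank(expr, gene_a, gene_b):
    """Geometric mean of the two directional ranks."""
    rank_ab = directional_rank(expr, gene_a, gene_b)
    rank_ba = directional_rank(expr, gene_b, gene_a)
    return math.sqrt(rank_ab * rank_ba)


# Hypothetical expression profiles across four samples.
toy_expr = {
    "g1": [1, 2, 3, 4],
    "g2": [1, 2, 3, 5],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
```

In this toy table, g1 and g2 are each other's best-correlated partner, so `mutual_rank(toy_expr, "g1", "g2")` is 1.0, whereas g1 and g4 each rank the other second, giving an MR of 2.0.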

Expression analysis of the SAGA complex subunits during developmental stages and stress conditions
We performed a quantitative gene expression analysis of ten representative SAGA components by qRT-PCR in different Arabidopsis tissues: flowers, mature leaves, siliques, six-day-old seedlings, stems and roots (Fig 8A). The gene expression profile of the SAGA complex components was plotted with reference to the expression of ubiquitin. The genes of the SAGA complex, although expressed at a lower level than ubiquitin, showed consistent expression in almost all the examined plant parts (Fig 8A). This indicates the involvement of the SAGA complex in gene regulation throughout the plant body. However, certain components showed spatial preference; for example, the expression of Spt20 was relatively higher in roots and leaves, and that of Sgf11 in leaves and seedlings, than in the other examined tissues.
The effect of high temperature and high salt concentration on the expression pattern of the SAGA components in Arabidopsis was also examined. Excised Arabidopsis leaves were exposed either to a high temperature of 37°C for 2 hr or, for the high salt stress condition, to 150 mM NaCl for 24 hr, and the gene expression of the SAGA components was compared with that of the respective controls. The expression of most of the selected components of the SAGA complex was induced under elevated salt concentration (Fig 8B) and high temperature (Fig 8C); however, the fold induction varied between components. Interestingly, Sgf29b expression was suppressed under salt treatment (Fig 8B). Thus, the qRT-PCR results suggest that the expression of SAGA component genes is important in plants during abiotic stress.
As discussed above, the in silico co-expression analysis suggested that the SAGA complex subunits are co-expressed across tissues, hormone treatments and stress conditions. Notably, the quantitative gene expression analysis of selected SAGA components supported the co-expression analysis: Spt20 and Chr5, Taf13 and Tra1, Spt20 and Chr5 showed high co-expression with significant MR values in tissues, while Spt20 and Ubp22, Sgf11 and Tra1, Spt20 and Chr5, Gcn5 and Taf6 were co-expressed with considerable MR values under abiotic stress. The qRT-PCR analysis is thus in agreement with the in silico co-expression profile (Table 4, Fig 8 and S9 Table).

SAGA complex regulates expression of heat, salt and light-induced genes
The SAGA complex facilitates PIC assembly in the core promoter region of yeast and human genes [70][71][72][73][74]. Little is known about how the SAGA complex facilitates gene regulation in plants.
To address this, RNA was isolated from seven homozygous Arabidopsis T-DNA insertion mutants of SAGA subunits and from wild type plants grown under different conditions such as light/dark, high salt or heat stress (Fig 9 and S7 Fig). The expression of light induced (At1g67090 and At4g02770) [2,75], salt induced (At2g40140 and At1g56600) [76,77] and heat induced (At1g71000 and At5g12030) [78] genes was examined in these mutants in comparison to the wild type plants by qRT-PCR (Fig 9A). The expression of both light activated genes was considerably reduced in all the mutants, except in sgf11− for both genes and in gcn5− for At4g02770, in which the differences in relative expression were not statistically significant (Fig 9B). Under salt stress, the expression level of both salt induced genes declined in the mutants as compared to the wild type, except for At1g56600 in taf13−, where the difference was not statistically significant (Fig 9C). Under heat stress, the expression of the heat activated genes was decreased in the mutants, except for At5g12030 in gcn5− and taf13− and At1g71000 in sgf11−, where the differences were not statistically significant (Fig 9D). These results suggest that the SAGA complex plays significant roles in the transcription regulation of stress inducible genes.

Discussion
The SAGA complex has previously been shown to be associated with the transcriptional regulation of approximately 10% of RNA polymerase II-dependent S. cerevisiae genes, which contribute to the response to DNA damage and other stress conditions such as heat, oxidation and metabolic starvation [71,72,79]. A recent report indicates that the SAGA complex regulates all active genes and is present at their promoters and transcribed regions [80]. Using a computational approach, we identified 18 putative SAGA complex subunits in Arabidopsis and O. sativa. The protein similarities between Arabidopsis and S. cerevisiae SAGA complex subunits are low (17%) to medium (51%), comparable to those observed between the S. cerevisiae and human SAGA complexes (15% to 56%; S2 Table). Since the SAGA complex is involved in the fine-tuning of gene expression, this could be one of the reasons for the poor protein similarities. Our results from in silico expression analysis, GO analysis and qRT-PCR of representative plant SAGA complex genes suggest their role in various cellular, physiological and molecular processes. The previous reports on the functions of ADA2b, GCN5, TAF10, TAF6 and SGF29 in plants are in accordance with our study, suggesting conservation of the SAGA complex throughout evolution [28,46,61][81][82][83]. Thus, the presence of conserved domains is helpful in identifying most of the putative members of plant SAGA complexes in the databases of different plant organisms. Despite the low level of similarity in full protein sequences (S2 Table), most of the domains present in plant SAGA complex proteins were found to be conserved among different organisms (Figs 2 and 3; Table 2). The similarity between the amino acid sequences of the conserved domains of Arabidopsis SAGA subunits was higher, ranging from 30% to 97% (Table 2 and S1 Fig). Notably, a similar range of similarities was found between the key domains of the SAGA complex in S. cerevisiae and human (Table 2 and S1 Fig).
On the basis of protein and conserved domain similarity and phylogenetic analysis, our results altogether suggest that the plant SAGA complex is closer to the human SAGA complex than to the yeast SAGA complex (Figs 2 and 3; Table 2 and S2 Table).
Our analysis of protein alignments, phylogenetic trees and chromosomal distribution suggested that many representative plant SAGA complex genes might have been duplicated during evolution (Figs 2 and 3; S2 Fig). For example, Taf6, Taf9, Taf12, Ada1, Tra1 and Sgf29 were found to be duplicated in either O. sativa or Arabidopsis. Besides these genes, other SAGA subunit genes are also duplicated in other lower and higher plant groups (Table 3). Such duplication events may also lead to variants of the SAGA complex in plants, like the Ada2a-containing (ATAC), SLIK/SALSA or STAGA complexes [74], or to subunits shared with other complexes like TFIID [68]. The protein interactome analysis suggested that Arabidopsis SAGA complex proteins interact with each other, which further supports their conservation in plants (Fig 5). The structural integrity of the SAGA complex depends on protein-protein interactions, as is evident in our study and as discussed in previous reports: TAF10 and TAF12 associate directly via their histone fold domains with SPT7 and ADA1, forming SPT7-TAF10 and ADA1-TAF12 heterodimers, respectively [84,85], whereas TAF5 interacts with ADA1, ADA3 and SPT7 [69].
Our results suggested that the genes encoding SAGA complex subunits are expressed in most plant parts and play an essential role in plant development. Previous reports show that Arabidopsis gcn5 mutants exhibit pleiotropic developmental abnormalities, such as abnormal meristem function, dwarfism, loss of apical dominance and defects in floral organ identity [28,29,31,[86][87][88]]. An insertion of T-DNA elements in Arabidopsis Ada2b produces a dwarf phenotype with defects in root and shoot development [28,87,89]. The Arabidopsis sgf29a mutant shows a slight delay in leaf and flower development [61]. Importantly, some reports on plant TAFs (TAF5, 6 and 10) indicated their indispensable role in plant development [81,83,90]. Notably, the SAGA complex is also critically involved in development and is indispensable for viability in metazoans [11]. Recently, the ubiquitin protease activity of the SAGA complex was shown to significantly regulate the expression of tissue-specific genes and developmental processes in Drosophila [91]. In Drosophila, loss of function of SAGA subunits such as ADA2b, SGF11 and Nonstop protein (homolog of ENY2) causes photoreceptor axon targeting defects, whereas GCN5 has an essential role in the development of the eye and wing disc [92,93]. In mice, TAF9b and GCN5 are required for the regulation of genes during neuronal and mesoderm development [94,95]. This accumulating evidence indicates that, during development, the functions of the SAGA complex in higher organisms involve more sophisticated mechanisms of gene regulation than in unicellular counterparts such as S. cerevisiae.
The SAGA complex facilitates gene expression in response to various environmental cues such as DNA damage and abiotic stress conditions [12]. Many reports, as discussed above, reveal that the SAGA complex contributes directly or indirectly to various developmental and stress-regulated processes, for example under arsenite stress [52], osmotic stress [96] and ultraviolet-induced stress [97]. The yeast SAGA complex also takes part in the up-regulation of several genes during environmental stress, including carbon starvation [71]. Our results support the stress-inducible expression of several SAGA components in Arabidopsis. Interestingly, promoter sequence analysis of the SAGA components revealed several stress-responsive cis-motifs (S11 Table), indicating their involvement in transcriptional regulation in response to stress. Nevertheless, further experimentation is needed to validate the involvement of these motifs in the regulation of the SAGA component genes. The expression analysis of the SAGA subunits supports their potential roles in response to environmental cues (Figs 6 and 7). These results are in accordance with earlier published reports on plant ADA2b, GCN5 and SGF29a [26,28,46,61]. The Arabidopsis ada2b-1 mutant displays hypersensitivity to salt and abscisic acid relative to wild-type plants [26,61]. Although loss of SGF29a function confers salt stress tolerance, the expression levels of stress-related marker genes such as COR78 (cold regulated 78), RAB18 (responsive to aba 18), and RD29b (responsive to desiccation 29b) are lower in the sgf29a mutant after 3 hr of NaCl treatment [61]. The Arabidopsis HAT protein GCN5 and the co-activator ADA2b play significant roles in cold responses, and loss of function of these proteins leads to a decline in the expression of several cold-regulated genes [27,28,98].
Altogether, the role of the SAGA complex in the regulation of stress genes is not only well maintained within plants, but also comparable to that in S. cerevisiae and human [71,74].
In conclusion, we identified 18 subunits of the SAGA complex in Arabidopsis and O. sativa. The protein similarities at the level of conserved domains indicate that the SAGA complex is conserved in eukaryotes such as S. cerevisiae, plants and mammals. The expression analysis of the SAGA components indicates that the networks of the SAGA complex are involved in various biological processes in plants, including development, physiology and response to environmental stresses, via gene regulation. This study advances our understanding of SAGA components and their different functions in plants.

The co-expressed gene networks are drawn based on their rank of correlation from the ATTED-II database. Orange lines display conserved co-expression, which is inferred from comparison with mammalian co-expression data provided by COXPRESdb; red dotted lines display protein-protein interaction information provided by TAIR and IntAct. Octagons indicate transcription factor genes. White circles indicate SAGA complex genes that were used as input for generating the gene network. Gray circles indicate other genes in the co-expressed gene networks.

Table. List of co-expression genes in Arabidopsis. (PDF)
S10 Table. List of genes obtained from ATTED-II for Arabidopsis SAGA complex co-expressed gene network analysis. (PDF)
S11 Table. Analysis of cis-regulatory elements in the 1000 bp promoter sequences upstream of the TSS in SAGA complex subunit genes using the PlantCARE and PLACE databases. (PDF)