Identification and expression of GRAS family genes in maize (Zea mays L.)

GRAS transcriptional factors have diverse functions in plant growth and development, and are named after the first three transcription factors, namely, GAI (GIBBERELLIC ACID INSENSITIVE), RGA (REPRESSOR OF GAI) and SCR (SCARECROW) identified in this family. Knowledge of the GRAS gene family in maize remains was largely unknown, and their characterization is necessary to understand their importance in the maize life cycle. This study identified 86 GRAS genes in maize, and further characterized with phylogenetics, gene structural analysis, genomic loci, and expression patterns. The 86 GRAS genes were divided into 8 groups (SCL3, HAM, LS, SCR, DELLA, SHR, PAT1 and LISCL) by phylogenetic analysis. Most of the maize GRAS genes contain one exon (80.23%) and closely related members in the phylogenetic tree had similar structure and motif composition. Different motifs especially in the N-terminus might be the sources of their functional divergence. Segmental- and tandem-duplication occurred in this family leading to expansion of maize GRAS genes and the expression patterns of the duplicated genes in the heat map according to the published microarray data were very similar. Quantitative RT-PCR (qRT-PCR) results demonstrated that the expression level of genes in different tissues were different, suggesting their differential roles in plant growth and development. The data set expands our knowledge to understanding the function of GRAS genes in maize, an important crop plant in the world.


Introduction
The GRAS gene family is an important plant-specific transcription factor family whose name is an acronym of the first three identified members: GIBBERELLIC ACID INSENSITIVE (GAI), REPRESSOR OF GA1 (RGA), and SCARECROW (SCR) [1]. Typically, GRAS proteins are 400-700 amino acids and exhibit some C-terminal homology. A common characteristic of all GRAS proteins is the presence of 5 carboxy-terminal motifs in the order of: leucine heptad repeat Ⅰ(LHRI), VHIID motif, leucine heptad repeat Ⅱ(LHRⅡ), PFYRE motif, and SAW motif [2][3]. Leucine heptad repeats are frequently found in bZIP transcription factors, which are PLOS ONE | https://doi.org/10.1371/journal.pone.0185418 September 28, 2017 1 / 17 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 important for protein-protein interactions [4]. The VHIID and PFYRE motifs are consistently found and highly conserved in GRAS proteins. The SAW motif is less conserved, but has 3 conserved pairs of amino acids found with consistent spacing: R-E, W-G, and W-W. While the function of these motifs is unknown, their highly conserved nature suggests that they are critical to GRAS protein function [5]. These conserved motifs can directly affect the function of GRAS proteins; in fact, mutations in the SAW and PFYRE motifs of SLR1 and RGA proteins result in huge phenotypic variation in Arabidopsis [6][7]. The order of these conserved motifs is similar across most GRAS proteins. In contrast, the N-terminus is variable, except for DELLA subfamily that contains 2 conserved motifs (named DELLA and TVHYNP) and the variable length and sequence of N-terminus seems like the major contributor for their genespecific functions [8,9,10]. The GRAS family is divided into 7 subfamilies at first, DELLA, SCARECROW (SCR), LATERAL SUPPRESSOR (LS), HAIRY MERISTEM (HAM), Phytochrome A signal transduction 1 (PAT1), SHORT-ROOT (SHR) and SCARECROW-LIKE9 (SCL9) [1,4]. Then the SCL9 was renamed as LISCL, and a new subfamily SCL3 was established and the GRAS gene family was divided into eight distinct branches, namely LISCL, PAT1, SCL3, DELLA, SCR, SHR, LS and HAM, based on studies on the model plants Arabidopsis and rice [7]. Later, the GRAS family was divided into ten subfamilies, DELLA, AtSCR, AtLAS, HAM, AtPAT1, AtSHR, AtSCL3 and LISCL, DLT, AtSCL4/7 [9]. In other studies, the GRAS genes can be divided into at least 13 branches [11] in Populus, Arabidopsis and rice, and 16 branches [12] in Medicago truncatula. So far, the GRAS gene family has been genome-wide explored in several plant species, including Populus, Arabidopsis, rice, Chinese cabbage, Prunus mume, tomato, sacred lotus, grapevine, Isatis indigotica, Medicago truncatula, Castor Beans, and pine [11,12,13,14,15,16,17,18,19,20]. The GRAS gene family plays a crucial role in diverse plant growth and development processes, including gibberellin signal transduction [21,22], phytochrome A signal transduction [23,24], axillary meristem initiation [25], shoot meristem maintenance [26], root radial patterning [27,28]. For example, PAT1 and SCL21 are mainly involved in phytochrome A (phyA) signal transduction pathways in Arabidopsis thaliana. Light signaling via the phyA photoreceptor controls basic plant developmental processes, such as de-etiolation and hypocotyl elongation [24]. SCARECROW-LIKE 13 (SCL13) mainly involves in phyB signal transduction pathways to control basic plant developmental processes like PAT1 and SCL21 in Arabidopsis thaliana [29]. SCARECROW-LIKE 3 (SCL3) is involved in signal transduction pathways through gibberellins (GAs) and acts as a positive regulator to integrate and maintain a functional GA pathway by attenuating the DELLA repressors in the root endodermis [30]. DELLA subfamily is a representative subfamily of GA signaling that has been analyzed in detail. Gain-or loss-of-function mutants of the DELLA genes in Arabidopsis, maize, wheat, rice (Oryza sativa) and barley show GA-insensitive dwarf or GA constitutive response phenotypes [21, 31,32,33]. Studies of the Arabidopsis ga1-3 (RGA) and the rice SLENDER RICE 1 (SLR1) have demonstrated that these DELLA proteins function in the nucleus and are degraded rapidly when the plants are treated with GA [6,34]. The degradation of the DELLA proteins is thought to be an essential event in GA signal transduction. Mutations which lack the DELLA motif or the surrounding regions cannot be degraded by treated with GA, and show a GA-insensitive dwarf phenotype [6,35,36]. Previous studies showed that miR171 can regulate some GRAS genes. The miR171-targeted SCL transcription factors SCL6/SCL6-IV, SCLL22/SCL6-III, SCL27/SCL6-II (also known as hairy meristems [HAM] and lost meristems [LOM]) have been demonstrated to play an important role in the proliferation of meristematic cells, polar organization and chlorophyll synthesis [3,[37][38][39][40][41]. DELLA-regulated POR expression is, at least in part, mediated by miR171-targeted SCLs in light [42]. MOC1, a member of LS subfamily, is a key gene for controlling rice tillers, which may improve the production of crops [43].
Compared with other families of transcription factors, very few researches have explored the whole genome of the GRAS families. The identification of GRAS members in different species was slightly different among studies. There were 32 to 34 genes identified from Arabidopsis [10, 11, 44], 57 and 60 GRAS genes were identified in rice in two reports [10,11]. In addition, 68 GRAS transcription factors were identified in Medicago truncatula [12]. As more species have their genome sequenced available, more GRAS proteins could be identified among them. Furthermore, the genome-wide comparisons of GRAS family members may also be performed among several important species for evolutionary analysis.
Maize is one of the most crops in the world and it has tremendous value for providing food, forage, pharmaceuticals, and other industrial products. To improve root growth, plant height and seed size, it is necessary to explore the GRAS family in maize. With the availability of maize genome sequences [45], it is possible for us to identify all the GRAS family genes in maize and find the right gene which is very necessary for production or growth.
In this study, we conducted a genome-wide analysis for all the members of GRAS family in maize. The GRAS genes were identified with database on the website, conducted phylogenetic relationships and analyzed their protein structures and gene structures. We discerned their locations on the chromosomes and their expression patterns as well. Then, qRT-PCR was performed to confirm the expression patterns getting from the database. The data presented here is necessary to explore systematically the gene function of the maize GRAS family genes.

Genome-wide identification of GRAS family members in maize
It is possible to identify all GRAS gene family members in maize because the maize genome has been sequenced [45]. Here we identified 86 GRAS transcription factors (ZmGRAS1-Zm-GRAS86) from the maize genome and the gene location, the number of amino acids, molecular weight, theoretical pI, were analyzed and summarized in Table 1. The length of amino acid sequences encoded by ZmGRAS varied from 111 amino acids (aa) to 734 aa, and molecular weight ranged from 12308.9 to 72083.7 kDa and the pI varied from 4.4973 to 7.7965. The average value of pI was 6.44532, suggesting that the maize GRAS proteins tended to acidic. In addition, alternative splicing was found in 18 ZmGRAS genes, with 2 to 4 alternative splice forms (S1 Table).
However, inconsistent with our results, there were 104 and 112 GRAS genes of maize in the PlantTFDB website (http://planttfdb.cbi.pku.edu.cn/family.php?sp=Zma&fam=GRAS) and PlnTFDB website (http://plntfdb.bio.uni-potsdam.de/v3.0/fam_mem.php?family_id=GRAS&sp_ id=ZMA), respectively (S2 Table). The number of the GRAS genes in the two websites was greater than our results. After analysis carefully, we found that the 104 GRAS genes of maize in the PlantTFDB website were very different from the 112 GRAS genes in the PlnTFDB website. Genes written in the red words were the different genes between the two websites in S2 Table. The 104 GRAS genes in the PlantTFDB website included alternative splicing genes. If the "genes" containing splice variants were considered one gene, there remained only 86 GRAS genes which was consistent with our result above. For example, the gene "GRMZM2G015080"in the PlantTFDB website existed two transcripts which were considered to be two genes, but only one gene, actually. In addition to alternative splicing, most of the GRAS genes in the PlnTFDB website could not be found in the MaizeGDB website (http://www.maizegdb.org/gene_center/gene) probably owning to the low version. We downloaded the GRAS protein sequences of Arabidopsis thaliana, Medicago truncatula, Oryza sativa and Sorghum bicolor from the PlantTFDB website. Very few reports have been published on Zea mays GRAS proteins, to our knowledge, three of these genes were previously described, such as, D9 (ZmGRAS12) [45,46], D8 [ZmGRAS54) [45,47,48] and SCR (ZmGRAS48) [49,50] (Table 1). The relatively high number of GRAS genes in maize may be due to the maize genome experiencing tandem and large-scale segmental duplications [51].

Phylogenetic analysis of GRAS genes
To study evolutionary relationships between GRAS transcription factors, the sequences of Arabidopsis, Medicago truncatula, Oryza sativa and Sorghum bicolor were downloaded for alignment and were used to conduct phylogenetic tree by PHYLIP (Version 3.695) using Neighbor-Joining method [52,53]. The 86 maize GRAS proteins comprise 8 subfamilies (SCL3, HAM, LS, SCR, DELLA, SHR, PAT1 and LISCL) by clade support values, tree topology and Arabidopsis classification (Fig 1) [7]. Each of the SCR and LS subfamily contained only four maize GRAS genes and was the relatively small subfamily of the whole subfamilies. The number of maize GRAS genes in SHR, PAT1, DELLA, HAM, SCL3 was very similar about ten, while the LISCL subfamily was very different from the above subfamilies that contained the largest number of maize GRAS genes and this branch was also the largest branch in Fig 1,

ZmGRAS protein sequence alignments and conserved motifs
In Arabidopsis, GRAS proteins have 5 conserved domains in the C terminus, named as LHR I, VHIID, LHR II, PFYRE and SAW [2][3]. To identify conserved domains, we performed an alignment within ZmGRAS protein sequences using Clustal X, (Version 2.0) [54]. Multiple sequence alignments of the 86 predicted ZmGRAS proteins led to the discovery of five conserved C-terminal GRAS domains mentioned above, which were similar to Arabidopsis GRAS proteins (S1 Fig). The conserved motifs for each GRAS protein were also identified using MEME (http://meme.sdsc.edu/meme/intron.html) (Fig 2B; S4 Table). A total of 20 conserved motifs were identified (named Motif1-20) and detailed information for each motif was listed in S4 Table. Motifs (Motif1, 2, 3, 4, 5, 6, 9 and 11) were widely distributed in most maize C-terminus of GRAS proteins, while the N-terminus contained various motifs, such as motif15, 16, 17, which was consistent with the previous conclusions that C-terminal region of the GRAS proteins was more conserved than the N-terminal region [2]. Although the motifs of the N-terminal regions of the GRAS genes were variable, it increased functional diversity and the complexity of biological networks of the GRAS genes [9, 10]. Genes among different subfamilies had divergent motifs in the N-terminal regions, but most of the GRAS proteins in the same subfamily had similar motifs. For example, In LISCL subfamily, there are three specific motifs (motif15, motif16, and motif17) in its C-terminus ( Fig 2B; S2 Fig). The results were consistent with previous study that the pattern of protein disorder could be more conserved through evolution than the amino acid sequence in the N-terminus [55].

Structural organization of ZmGRAS genes and chromosomal localization
The overall pattern of intron positions can affect phylogenetic relationships when analyzing gene family evolution [56]. To evaluate the diversity of the GRAS genes, we analyzed the structure of each maize GRAS gene. The result showed that 69 (80.23%) ZmGRAS genes with nonintron and only 17 (19.77%) genes with 1-5 intron (Fig 2C). There were nine genes with one intron, five genes with two introns, two genes with three introns, and only one gene with five introns. In addition, most GRAS gene members of the same branch generally showed similar exon-intron structures.
To investigate the chromosomal distribution of the GRAS family in maize, the physical location information of the maize GRAS genes on chromosomes according to the phytozome database (http://phytozome.jgi.doe.gov/, v3) was used to draw the map. The 86 GRAS genes demonstrated a nonrandom distribution. More than one third of the GRAS transcription  Table).
https://doi.org/10.1371/journal.pone.0185418.g001 factors were found on two chromosomes: chromosome 4 (n = 17, 19.77%) and chromosome 1 (n = 13, 15.12%), and only 3 (3.49%) on chromosome 8. ZmGRAS genes were not found on Duplication events are of interest in across many taxa, and maize originates from an ancient allotetraploid event and has undergone several rounds of polyploidy [45,57]. We identified eleven duplicated genes with highly amino acid sequence and structure similarities, and all of them contain only one exon. These duplicated genes belonged to six groups with six, six, four, two, two, two genes in SHR, PAT1, LISCL, DELLA, SCL3, LS subfamily, respectively, each of the duplicated genes contained two genes with very close genetic relationship (Fig 2; Fig 3). Five of these duplicated genes were distributed on chromosome 1 and none of them on chromosome 4, and 6. In addition, two pairs duplicated genes on chromosome 2 and 7 (ZmGRAS10/ZmGRAS61, ZmGRAS32/ZmGRAS46) belonging to the same SHR subfamily. These results suggested that segmental duplication has played a role in subfamily's origin [40].

Expression of ZmGRAS genes in different tissues and developmental stages
GRAS transcription factors have important roles in plant growth and development, such as cell maintenance and proliferation, axillary shoot meristem formation, root radial pattering, and male gametogenesis. Genes expressed high in particular tissues may play essential roles in the development of the tissues. The expression of GRAS genes in different tissues and different developmental stages were analyzed using published microarray data (Maize eFP Browser, http://bar.utoronto.ca/efp_maize/cgi-bin/efpWeb.cgi). The expression data included 13 different tissues, including germinating seed 24 H (seedling), coleoptile, radicle, stem and SAM (V1), first internode, immature tassel, meiotic tassel, anthers, primary root, pooled leaves, silks, base of stage 2 leaf (adult leaf), and embryo 24 DAP. The expression patterns of 75 genes were found in this database (Fig 4).
In Fig 4, partial genes in a branch with similar expression patterns in different tissues were laid in the same branch in Fig 2, (Fig 2; Fig 4), but the duplicated gene pair ZmGRAS40/62 belonging to the PAT1 subfamily had different expression pattern (Fig 4). We analyzed the regulator elements of the 2000bp promoter region of the gene ZmGRAS40 and ZmGRAS62 using plantcare (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) and New PLACE (https://sogo. dna.affrc.go.jp/cgi-bin/sogo.cgi?lang=en&pj=640&ac-tion=page&page=newplace) online websites and found that the regulator elements in front of the two promoters were not the same (S5 Table). For example, in the 2000bp promoter region of gene ZmGRAS62 there were four "MYBPLANT motif" which couldn't be found in the 2000bp promoter of gene ZmGRAS40, the "MYBPLANT motif" was distributed in four positions upstream of the start codon, respectively, "-387bp", "-495bp", "-1489bp", "-1662bp"which was plant MYB binding site according to the New PLACE website. The AmMYB308 and AmMYB330 transcription factors from Antirrhinum regulated phenylpropanoid and lignin biosynthesis in tobacco [58]. The Lignin was one of the components of the cell wall and filled with the cellulose framework to enhance the mechanical strength of the plant, which is conducive to the organization of water transport, and affect the growth and development of organs including leaf and stem. Additionally, in the promoter of the gene ZmGRAS62, there existed GA-responsive elements (GARE2OS-REP1 and GAREAT) which didn't existed in the promoter of the gene ZmGRAS40 according to the New PLACE website [59,60]. Gibberellins (GAs) were phytohormones that regulate various aspects of plant development, including germination dormancy, leaf morphogenesis and shoot and root growth, etc [61]. We speculated that different expression level between the gene ZmGRAS62 and ZmGRAS40 at the stage of first internode (V5), base of stage 2 leaf (V5) and pooled leave possibly owning to the above reason. But further experiments were needed to verify this hypothesis. Additionally, partial genes with similar expression patterns belonged to different branch. For example, ZmGRAS60 and ZmGRAS34 had similar expression patterns, but they were divided into SHR subfamily and LISCL subfamily, respectively. In the heat map, columns represent genes, while rows represent different tissues, including germinating seed 24 H (seedling), coleoptile, radicle, stem and SAM (V1), first internode, immature tassel, meiotic tassel, anthers, primary root, pooled leaves, silks, base of stage 2 leaf (adult leaf), and embryo 24 DAP. The color changes from green to red represent the relative low or high expression in leaves respectively. The red lines represent eleven duplicated genes pairs. https://doi.org/10.1371/journal.pone.0185418.g004 Real-time quantitative RT-PCR (qPCR) was used to validate the microarray data. Eleven genes were selected to confirm expression patterns in primary root, pooled leaf, coleoptile, SAM, adult leaf, silk, seedling, meiotic tassel and immature tassel. As shown in Fig 5, 6 genes (except ZmGRAS69, ZmGRAS40, ZmGRAS19, ZmGRAS12 and ZmGRAS62) had similar expression patterns between the qPCR data and microarray data. For example, ZmGRAS67 had higher expression in seedling, and ZmGRAS10, ZmGRAS61 were mainly expressed in SAM. However, ZmGRAS69 had high expression in seedlings in the qPCR, in contrast to the pooled leaf microarray data. These conflicting results may result from the different plant materials, different growth conditions, and different experimental conditions. These results suggest that some GRAS genes with different expression levels in different organs might play key roles in plant development. Several GRAS genes may also have unique functions during specific developmental stages.

Conclusion
86 putative GRAS family genes were identified from maize via sequence comparison between maize, Arabidopsis, rice, Medicago truncatula and Sorghum bicolor. Only a few genes from this transcription factor family have been previously characterized in detail in maize. Thus, this work is the first comprehensive and systematic analysis of GRAS transcription factors in maize. The number of the maize GRAS family genes is larger than other plants, suggesting that they might encounter gene segmental-duplication and tandem-duplication during the evolution. Due to the fact that all of the maize GRAS genes expressed differentially, the genes possibly encountered sub-functionalization or neo-functionalization. So, it's reasonable to explore each gene for its specific role in maize growth and development, to support the current work on maize molecular breeding.

Database search for GRAS genes in maize
The whole protein sequences of Zea mays were downloaded from the MaizeGDB (http://www. maizegdb.org/, v3). The protein sequences for 34 Arabidopsis GRAS genes were downloaded from plant TFDB (http://planttfdb.cbi.pku.edu.cn/). HMMER 3.0 software was obtained from the HMMER website (http://hmmer.janelia.org/) and was employed in searching for GRAS proteins in the entire protein dataset of Zea mays with a cut-off E-value of 1e -5 using PF03514.11 which was the newest HMM model for the GRAS transcription factor family downloaded from the Pfam database (http://pfam.xfam.org/) [63,64] as a query. Genes were classified according to the distance homology with Arabidopsis and rice genes [7].

Phylogenetic analysis of maize GRAS proteins
All GRAS protein sequences of Arabidopsis, rice, Medicago truncatula, and Sorghum bicolor were downloaded from plant TFDB (http://planttfdb.cbi.pku.edu.cn/). Then, together with the 86 maize GRAS proteins, the multiple sequence alignment was performed using Clustal X, (Version 2.0) [54]. The aligned sequences were then subjected to phylogenetic analysis by Neighbor Joining (NJ) method using PHYLIP (Version 3.695) with 1000 bootstrap replicates [52,53].

Chromosomal location of maize GRAS genes
All GRAS genes were mapped to maize chromosomes based on information available at the Phytozome website (http://phytozome.jgi.doe.gov/, v3). The map was drafted using photoshop CS3 based on chromosome size.

Analysis and distribution of conserved motifs and exon-intron structures
The exon-intron organization of GRAS genes was determined by the online GSDS 2.0 tools (Gene Structure Display Server) [65] based on the CDS sequence and corresponding genomic sequences which were obtained from the website (https://phytozome.jgi.doe.gov/pz/portal. html). Multiple EM for Motif Elicitation (MEME, http://meme-suite.org/) [66] was used to search for possible conserved motifs in the complete amino acid sequences of maize GRAS proteins using the default settings.

Gene duplication analysis of maize GRAS genes
Gene duplication events of GRAS genes in maize B73 were investigated. We defined the gene duplication using the following criteria: 1) the alignment of whole protein length covered >80% of the longest gene, 2) the aligned region had an identity >80% and 3) only one duplication event was counted for tightly linked genes. The duplicated gene pairs were connected by red lines using photoshop CS3.

Expression patterns of GRAS family in maize
The data of expression patterns of GRAS family genes in maize was found in Maize eFP Browser (http://bar.utoronto.ca/efp_maize/cgi-bin/efpWeb.cgi). The expression patterns of different genes were searched by primary gene ID. The expression level of different tissues was put in a table, then analyzed using Cluster (v3.0) [67] and Java Treeview (v1.1.6).

Plant materials and growth conditions
Maize seeds (Zea mays L. cvB73) were grown in soil under greenhouse conditions at 25˚C/22˚C (day/night) with a photoperiod of 16/8 h (day/night) for 2 weeks.

RNA isolation and real-time quantitative RT-PCR expression analysis
Primary root and pooled leaf were sampled when the first leaf is fully extended (V1), coleoptile was sampled 6 days after sowing (6 DAS), SAM was sampled when the three leaves were fully extended(V3), adult leaf was sampled when the seven leaves were extended(V5), silk was sampled when the silks emerge from the husk(R1), seedling was sampled from 24 h after imbibition, meiotic tassel was sampled when the eighteen leave were extended(V18), immature tassel was sampled when the thirteen leave were extended(V13) [68,69], these nine different tissue materials were collected and stored for RNA isolation. Total RNA was extracted using Trizol (Invitrogen, Carlsbad, CA, USA). All the primers for qPCR were designed using QuantPrimer (http://quantprime.mpimp-golm.mpg.de/) (S6 Table). We selected the best pair of primers for qRT-PCR, the specificity of primers was tested by melting curve. A single peak indicates that the amplification product is specific and the corresponding PCR results were used for data analysis. Reverse transcription was performed with 5 μg total RNA as the template by using the TransScript1 II One-Step gDNA Removal and cDNA Synthesis SuperMix (TRANSGEN BIOTECH, AH311). Quantitative RT-PCR (qRT-PCR) was carried out on the Bio-RAD CFX96 using the Real SYBR Mixture (CWBIO, CW0760). The results were analyzed with the Bio-RAD CFX Manager software. Three biological replicates were performed.
Supporting information S1