Genome-scale identification, classification, and tissue specific expression analysis of late embryogenesis abundant (LEA) genes under abiotic stress conditions in Sorghum bicolor L.

Late embryogenesis abundant (LEA) proteins, described as space fillers or molecular shields, are hydrophilic protective proteins that play an important role during plant development and abiotic stress. A systematic survey and characterization revealed a total of 68 LEA genes belonging to 8 families in Sorghum bicolor. LEA-2, an atypical hydrophobic family, is the most abundant. The genes are distributed on all 10 chromosomes, and chromosomes 1, 2, and 3 appear to be the hot spots. The majority of the S. bicolor LEA (SbLEA) genes are intronless or have few introns. A total of 22 paralogous events were observed, and the majority of them appear to be segmental duplications. Segmental duplication played an important role in the expansion of the SbLEA-2 family. A total of 12 orthologs were observed with Arabidopsis and 13 with Oryza sativa. The majority of the proteins are basic in nature and are targeted to the chloroplast. Fifteen miRNAs targeting 25 SbLEAs appear to participate in development as well as in abiotic stress tolerance. Promoter analysis revealed the presence of abiotic stress-responsive (DRE, MYB, MYC, and GT1), biotic stress-responsive (W-Box), hormone-responsive (ABA, ERE, and TGA), and development-responsive (SKn) cis-elements. This suggests that LEA proteins play a vital role in stress tolerance and developmental processes. Using microarray data, 65 SbLEA genes were analyzed in different tissues (roots, pith, rind, internode, shoot, and leaf) and showed clear tissue-specific expression. qRT-PCR analysis of 23 SbLEA genes revealed their abundant expression in roots, stems, and leaves, with higher expression in stems than in roots and leaves. The majority of the SbLEA family members were up-regulated in at least one tissue under different stress conditions. SbLEA3-2 appears to act as a regulator and showed abundant expression under diverse stress conditions. The present study provides new insights into the formation of LEAs in S. bicolor and helps to understand their role in developmental processes under stress conditions, which may be a valuable resource for future research.

Introduction
[...] 31]. Heterologous expression of BnLEA4-1 in E. coli confers tolerance to heat and salt stress [32]. The citrus dehydrin acts as a radical scavenger and reduces metal toxicity [18]. Likewise, two soybean LEA4 proteins bind to Fe and are strongly associated with reducing oxidative damage induced by abiotic stress [33]. Further, it was shown that loss of LEA4 proteins results in drought susceptibility in Arabidopsis [34]. The Arabidopsis LEA2 protein alters pathogenesis-related protein expression and confers a defense response [35]. Similarly, the group 3 LEA proteins in maize confer tolerance to bacterial infection, and their heterologous expression in tobacco confers tolerance to Pseudomonas syringae [36], while wheat TaLEA2 and TaLEA3 expressed in yeast enhance salt and freezing stress tolerance [37]. Lin et al. [38] found that VrDhn1 stabilizes DNA during seed desiccation. Thus, it appears that overexpression of diverse LEA proteins offers tailored protection against abiotic stress in a wide range of plants [15].
Sorghum bicolor is the fifth most important cereal crop, used as food, feed, fuel, fibre, and fertilizer. It is moderately tolerant to drought, salinity, and waterlogging as well as high temperature [39][40][41][42]. Knowledge of the number of LEA proteins and their families, structural characterization, tissue-specific expression, and chromosomal location is meagre in S. bicolor. Hence, in the present investigation, comprehensive genome-scale identification of LEAs, their structural characterization, chromosomal localization, and promoter analysis were carried out, alongside tissue-specific gene expression analysis under varied abiotic stress conditions.

Identification, chromosomal localization, and gene structure analysis of LEA in S. bicolor
In the present study, 34 Oryza [23] and 51 Arabidopsis [7] LEA gene sequences were retrieved from the NCBI database and searched (using TBLASTN) against the Sorghum bicolor genome in the Gramene database (http://www.gramene.org/) to find their homologs. The Genscan program (http://genes.mit.edu/GENSCAN.html) was used to retrieve the coding and protein sequences. The Sorghum LEA sequences identified by homology were analyzed with the SMART program (http://smart.embl-heidelberg.de/) [43] for the presence of conserved domains. The MOTIF search tool (http://www.genome.jp/tools/motif/) was used to check the reliability of the conserved domains. Chromosomal locations of LEAs were determined from the information obtained from the Gramene database, and the physical map was drawn based on their positions. Gene structure was studied using the Gene Structure Display Server (http://gsds.cbi.pku.edu.cn) [44].

Promoter analysis of SbLEA family, phylogenetic analysis, and estimation of synonymous and non-synonymous substitution rates
The 1000 bp genomic sequence upstream of the start codon of each SbLEA gene was examined using PLACE [51] software to check for the presence of cis-elements responsive to development, biotic, and abiotic stresses. The NJ phylogenetic trees for the LEA protein families of S. bicolor, O. sativa, and A. thaliana were generated using MEGA 6.2 software [52] with default parameters such as Poisson correction, pairwise deletion, and bootstrap analysis (1,000 replicates). Paralogues and orthologues were identified using phylogeny and InParanoid 8 (the orthology analysis software) [53] with default parameters such as an E-value cut-off of 0.01 and a score cut-off of 50 or higher. Synonymous and non-synonymous sites and substitution rates of paralogous and orthologous gene pairs were calculated using PAL2NAL software (http://www.bork.embl.de/pal2nal/) [54].
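As an illustration of the promoter scan described above, the following Python sketch counts a few cis-element core motifs on both strands of an upstream sequence. It is not the PLACE software itself: the four motif cores are a small, hand-picked subset chosen for illustration, and the function names and the example sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes used by the motifs below (subset only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}

# Illustrative core motifs (assumption: a tiny subset, not the PLACE database).
MOTIFS = {
    "DRE": "RCCGAC",    # dehydration-responsive element core
    "ABRE": "ACGTG",    # ABA-responsive element core
    "W-box": "TTGAC",   # WRKY-binding element core
    "GT1": "GAAAAA",    # GT1GMSCAM4 core
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a motif into a regex; the lookahead lets overlapping hits count."""
    return re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")


def count_cis_elements(promoter):
    """Count each motif on the forward and reverse strands of a promoter.

    Note: a palindromic motif would be counted once per strand; none of the
    four motifs above is palindromic.
    """
    forward = promoter.upper()
    reverse = reverse_complement(forward)
    counts = {}
    for name, motif in MOTIFS.items():
        pattern = motif_regex(motif)
        counts[name] = len(pattern.findall(forward)) + len(pattern.findall(reverse))
    return counts


# Hypothetical 24 bp fragment standing in for a 1000 bp upstream region.
example = "TTGACCGACATGAAAAATACGTGA"
print(count_cis_elements(example))
```

In practice the input would be the 1000 bp region upstream of each start codon, and the motif table would come from a curated database rather than a hard-coded dictionary.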

In-silico expression profiling of SbLEAs
Expression analysis for the identified SbLEA genes was performed using Affymetrix whole-transcriptome Sorghum array data accessible from SorghumFDB [55]. The Genevestigator platform [56] was used to perform the microarray analysis of SbLEA genes under several environmental stresses (drought, salt, heat, and cold) with the different samples embedded in the platform. The expression profiles of SbLEA genes identified from the Sorghum array were used for cluster analysis. A heat map of the expression profiles was generated using the hierarchical clustering tool embedded in the Genevestigator platform [57].

Plant material and stress conditions
The seeds of S. bicolor variety BTx623 were sown in pots containing 4.5 kg of black clay soil under glasshouse conditions at 28/20 °C day/night temperatures. After 40 days, the plants were subjected to drought and salt stresses by treating them individually with 1 liter each of 150 mM mannitol and NaCl for 4 h. Cold stress was applied by keeping the plants at 4 °C for 4 h, and heat stress by exposing the plants to 40 °C for 4 h in a growth chamber. The respective controls were maintained under identical conditions. Roots, stems, and leaves were collected, snap frozen immediately in liquid nitrogen, and stored at -80 °C until further use.

RNA extraction and qRT-PCR analysis for transcriptional profiling of SbLEA genes
Total RNA was isolated from roots, stems, and leaves using the MACHEREY-NAGEL kit, following the manufacturer's instructions. First-strand cDNA was synthesized from 3 μg of total RNA using a first-strand synthesis kit (Thermo Scientific). Gene-specific primers were designed using NCBI Primer-BLAST (www.ncbi.nlm.nih.gov/tools/primer-blast/) [58] and Primer3 software (http://bioinfo.ut.ee/primer3-0.4.0/) [59] with the default parameters: 57-60 °C annealing temperature, 18-22 bp primer length, 50-55% GC content, and 80-140 bp amplicon length (S1 Table). The SYBR Green Master Mix (2X) (Takara) was used according to the manufacturer's recommendations. Two biological replicates with three technical replicates each were used for qRT-PCR analysis in an Mx3000p (Agilent Technologies) with the following thermal profile: 1 cycle at 95 °C for 10 min, followed by 40 cycles alternating between 95 °C for 15 sec and 60 °C for 1 min. The amplicon dissociation curves were recorded with the fluorescence lamp after the 40th cycle by heating from 58 to 95 °C within 20 min. Transcript levels of the SbAcp and SbEP-F genes were used as internal controls [60]. Relative gene expression was calculated using REST software [61], and average values are presented. Statistical significance of the expression values was determined using a t-test.
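To make the normalization step concrete, the following Python sketch computes a relative expression value from Ct values. It is a simplified stand-in, not the REST procedure used in this study: it applies the 2^(-ΔΔCt) calculation, which assumes 100% amplification efficiency for every primer pair, and it normalizes against the mean Ct of two reference genes (standing in for SbAcp and SbEP-F). All Ct values and function names are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Normalize a target Ct against the mean Ct of the reference genes."""
    return target_ct - mean(reference_cts)


def relative_expression(treated_target, treated_refs, control_target, control_refs):
    """Fold change of treated versus control by the 2^(-ddCt) calculation.

    Assumption: every primer pair amplifies with 100% efficiency.
    """
    ddct = (delta_ct(treated_target, treated_refs)
            - delta_ct(control_target, control_refs))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target gene crosses threshold three cycles
# earlier under stress, while both reference genes are unchanged.
fold = relative_expression(
    treated_target=22.0, treated_refs=[18.0, 19.0],
    control_target=25.0, control_refs=[18.0, 19.0],
)
print(fold)  # 8.0
```

A difference of three cycles corresponds to an eight-fold change under the perfect-efficiency assumption; an efficiency-corrected method would replace the base 2 with the measured efficiency of each primer pair.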

Identification, chromosomal localization and gene structure analysis of SbLEA genes
A total of 68 LEA genes were identified in the genome of S. bicolor based on rice and Arabidopsis LEA homologs. Their reliability was checked by confirming the presence of conserved domains using the SMART and MOTIF tools. The genes are grouped into 8 sub-families (LEA1 to LEA6, dehydrins, and SMP) based on their conserved domains and Pfam nomenclature. Among all the families, SbLEA2 was the largest with 40 genes (SbLEA2-1 to SbLEA2-40), followed by SbLEA3 with 7 genes (SbLEA3-1 to SbLEA3-7) and SbDHN with 6 genes (SbDHN1 to SbDHN6). The SbLEA1 and SbLEA4 families contain 5 genes each, while SMP has only 3 members. The smallest families are SbLEA5 and SbLEA6 with one member each (Table 1). SbLEA genes are distributed on all the chromosomes. Out of 68 genes, 13 are localized on chromosome 1, 11 on 2, 10 on 3, 6 on 4, 3 on 5, 7 on 6, 4 on 7, 3 on 8, 8 on 9, and 3 on 10 (Table 1 and Fig 1). All the SMP members have 1 intron and 2 exons. SbLEA2-9 showed the maximum of 8 exons. Out of 68 SbLEA genes, one exon was observed in 31 genes, 2 exons in 19, 3 in 6, 4 in 5, 5 in 5, 6 in 1, and 8 in 1. A total of 22 genes out of 40 in the SbLEA2 group lack introns (Table 1 and Fig 2).
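The counts in this paragraph can be cross-checked with simple arithmetic. The following Python sketch uses only the numbers stated in the text (the variable names are ours) to confirm that the family sizes, the per-chromosome counts, and the exon-number classes each add up to 68 genes, and to derive the corresponding proportions.

```python
# Counts as stated in the text (Table 1 itself is not reproduced here).
family_sizes = {"LEA1": 5, "LEA2": 40, "LEA3": 7, "LEA4": 5,
                "LEA5": 1, "LEA6": 1, "DHN": 6, "SMP": 3}
genes_per_chromosome = {1: 13, 2: 11, 3: 10, 4: 6, 5: 3,
                        6: 7, 7: 4, 8: 3, 9: 8, 10: 3}
genes_by_exon_count = {1: 31, 2: 19, 3: 6, 4: 5, 5: 5, 6: 1, 8: 1}

TOTAL_GENES = 68


def percent(part, whole):
    """Percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


# Each breakdown must account for every gene exactly once.
for label, table in (("families", family_sizes),
                     ("chromosomes", genes_per_chromosome),
                     ("exon classes", genes_by_exon_count)):
    assert sum(table.values()) == TOTAL_GENES, label

intronless = genes_by_exon_count[1]   # single-exon genes have no introns
one_intron = genes_by_exon_count[2]   # two exons imply one intron
hot_spot_genes = sum(genes_per_chromosome[c] for c in (1, 2, 3))

print("LEA2 share (%):", percent(family_sizes["LEA2"], TOTAL_GENES))
print("Intronless genes (%):", percent(intronless, TOTAL_GENES))
print("One-intron genes (%):", percent(one_intron, TOTAL_GENES))
print("Genes on chromosomes 1-3 (%):", percent(hot_spot_genes, TOTAL_GENES))
```

Chromosomes 1, 2, and 3 together carry 34 of the 68 genes, which is the basis for describing them as hot spots.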
The majority of the SbLEA proteins carry phosphorylation sites at serine and threonine residues, and very few at tyrosine residues. In SMP group members, phosphorylation occurs at threonine. Protein kinase C (PKC) and unsp sites are the most dominant types and are present in all the SbLEA proteins. Next to PKC, cdc2, PKA, DNAPK, P38MAPK, and PKG are the most common kinases associated with phosphorylation. The highest number of cdc2 sites was found in the LEA-2 family (S2 Table).

Conserved motif analysis
The 68 SbLEAs did not share high similarity, and each family was submitted to MEME separately and in combination for domain or motif structure analysis. Ten conserved motifs were identified for each family except SbLEA-6, which contains only 7 (Fig 3 and S1 Fig). Paralogs and closely related genes exhibit similar motif compositions. The composition of the motifs is similar within each family but varies among families. Motif 3 in LEA-1, motif 5 in LEA-2, and motifs 5 and 6 in DHNs are the largest motifs. Fifty-four SbLEA proteins exhibit common motifs, and motif 1 is the most common and conserved structural motif, present in the majority of the proteins. Motifs 9 and 10 are the key features of DHN sequences. For recognition of SbDHN proteins, the K-segment in motifs 1 and 3, the S-segment in motif 2, and the Y-segment in motif 4 were used (Fig 3 and S1 Fig). Conserved motifs were not observed in the LEA-1, 4, 5, 6, and SMP families. Next to motif 1, motif 3 is the most conserved and is located at the C terminus. While motif 5 is the most conserved in the LEA-3 group, motif 7 is the structural motif conserved at the N terminus in the LEA-2 family (S2 and S3 Figs).
[...] the targets for 15 different miRNAs. It appears that miRNAs target 18 genes in the SbLEA-2 group and 5 in SbLEA-3. While six miRNAs target LEA3-5, 3 of them target SbLEA2-35. sbi-miR6225, sbi-miR437x, sbi-miR5568, and sbi-miR6220 appear to be the most common miRNAs that target SbLEA genes and participate in cleavage and translation (S3 Table). [...] Table).

Phylogenetic analysis of LEA family proteins
Phylogenetic analysis was carried out for the 68 SbLEA proteins to analyse the evolutionary relationships within and between the groups (Fig 4). Different families of SbLEAs exhibit high similarity and cluster into 2 major clades (Fig 4). [...] Table 2). To determine the evolutionary relationships and find ortholog pairs, another phylogenetic tree was constructed with Arabidopsis and Oryza (Fig 5). In this tree, LEA proteins are grouped into 2 clades: the LEA-2 family of Sorghum, Oryza, and Arabidopsis falls into clade 2, and the others into clade 1. Thirty-eight out of 68 from Sorghum, 9 out of 34 from rice, and 7 out of 51 from Arabidopsis fall into clade 2, with the SbLEA-2 family as the most dominant group. A total of 11 paralogs each are observed in Sorghum and Arabidopsis, but only 7 in Oryza. The SbLEAs show 12 orthologs with Arabidopsis and 13 with Oryza. Oryza and Arabidopsis share only six orthologs between them (Fig 5 and S5 Table). In the InParanoid orthology analysis, SbLEAs exhibit ortholog relationships with Setaria, Oryza, Hordeum, and Brachypodium (S6 Table).

Estimation of non-synonymous and synonymous substitution rates of LEA
The ratio of non-synonymous (dN) to synonymous (dS) substitutions (dN/dS) was estimated for SbLEA genes that show duplication events within Sorghum as paralogs (Table 2). The [...] Table 2). The dN/dS ratios of most of the paralogs were found to be below 1 (Table 2). The synonymous and non-synonymous substitution calculations were extended to orthologous LEA gene pairs between Arabidopsis, Oryza, and S. bicolor. Out of 25 orthologs, Sorghum shows 12 events with Arabidopsis, of which 4 duplications share the same chromosomes (Sb01g046000/At1g72100 on [...] Table). The orthology analysis of SbLEAs with Oryza, Setaria, Brachypodium, and Hordeum shows that the majority of them exhibit Darwinian selection, with a dN/dS ratio greater than 1 (S6 Table).

Microarray-based gene expression profiling in different tissues and different developmental stages under abiotic stress conditions
Of the 68 SbLEAs, microarray data for 65 genes were available on the Genevestigator platform, and these were used for expression analysis. Expression of these 65 SbLEA genes in six tissues (roots, pith, rind, internode, shoot, and leaf) was analyzed under normal and abiotic stress conditions using microarray data (Fig 6A). The expression level was higher in root, pith, and leaf tissues. The expression profiles of SbLEA genes were analyzed at five developmental stages: seedling, stem elongation, booting, flowering, and dough. SbLEA genes were expressed in all developmental stages (either up-regulated or down-regulated) (Fig 6B). However, the expression of SbLEA genes in the booting and flowering stages showed a slightly different pattern; in particular, SbLEA-2 members displayed a dominant expression profile compared to other developmental stages. High expression of SbLEA genes during the booting and flowering stages might be caused by booting-related cellular deterioration, leading to substantial metabolic or physiological changes that affect overall regulation under abiotic stresses.
Hierarchical clustering based on the above expression profiles of individual SbLEA genes under various abiotic stress conditions grouped the 65 SbLEA genes into two major clusters. One of these clusters contained only the SbLEA2-22 gene, which shows very high up-regulation under different stress conditions. The remaining SbLEA genes were distributed among sub-clusters of the second major cluster (Fig 6C). The heat map of SbLEA genes following abiotic stresses showed significantly altered expression (either up-regulation or down-regulation) of up to 2.5-fold (Fig 6C). Members of the SbLEA-2 family (SbLEA2-22, SbLEA2-24, SbLEA2-32, SbLEA2-33, and SbLEA2-37) were up-regulated under stress conditions. Similarly, SbLEA3-1 and SbLEA3-2 were up-regulated under salt, cold, and drought stresses.
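The way a single strongly induced gene can end up as its own major cluster is easy to reproduce on toy data. The following Python sketch implements a minimal agglomerative clustering with Euclidean distance and average linkage; both choices are assumptions, since the settings of the Genevestigator clustering tool are not stated here. The gene names and log2 fold-change values are invented for illustration and are not data from this study.

```python
from itertools import combinations
from math import dist


def average_linkage(cluster_a, cluster_b, profiles):
    """Mean Euclidean distance over all gene pairs drawn from two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def cluster_genes(profiles, n_clusters=2):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical log2 fold changes under drought, salt, heat, and cold.
toy_profiles = {
    "geneA": [0.2, 0.1, -0.3, 0.0],
    "geneB": [0.4, 0.3, -0.1, 0.2],
    "geneC": [-0.5, -0.2, 0.1, -0.4],
    "geneD": [3.1, 2.8, 3.4, 2.9],   # strongly induced under every stress
}
print(cluster_genes(toy_profiles, n_clusters=2))
```

Cutting the tree at two clusters isolates the strongly induced gene, while the modestly responding genes fall together and would separate into sub-clusters only at a finer cut.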

Quantitative expression analysis of SbLEAs
To investigate differential gene expression in vegetative tissues of Sorghum, a systematic quantitative real-time PCR (qRT-PCR) analysis was carried out for a group of 23 SbLEA genes. Expression analysis of these genes in different tissues under drought, salt, heat, and cold stresses points to their comprehensive roles in stress tolerance mechanisms as well as in growth and development. The differential expression patterns in roots, stems, and leaves are shown in Figs 7 and 8A. Most of the LEA genes exhibit the highest expression levels in stem tissues (SbLEA1-5, 2-9, 2-13, 2-18, 2-37, 3-7, and 4-1) (Fig 8A). [...] Table).

Discussion
Genome-wide analysis of Sorghum bicolor for LEA genes reveals 68 SbLEAs that belong to 8 families. Similar studies in other plant species showed different numbers of LEAs: 23 in Phyllostachys [62], 27 in tomato [26], 29 in potato [25], 30 in Prunus [24], 32 in maize [63], 34 in rice [23], 36 in soybean [64], 51 in Arabidopsis [7], 53 in poplar [21], 61 in Cucumis melo (melon) and 73 in Citrullus lanatus (watermelon) [65], 72 in sweet orange [66], 79 in cucumber [67], 108 in Brassica [20], 136 in Gossypium arboreum, 142 in G. raimondii, and 242 in G. hirsutum [68]. It is notable that LEA genes are numerous and diversely distributed across different taxa. This abundance perhaps indicates their conserved role under abiotic stress conditions as well as during growth and development. It is interesting to observe that aquatic plants have fewer LEAs, presumably because they do not suffer from drought stress. Thus, the present and previous research findings are consistent with the results of Kamisugi and Cuming [69] regarding the wider distribution and function of LEA proteins in terrestrial plants. Generally, the LEA families of closely related taxa exhibit a similar number and distribution of genes. However, the number of LEA genes varies among Sorghum, maize, and rice. This may be due to evolutionary variation of the whole genomes and wide changes in the environment. Comparison of SbLEAs with rice and maize shows divergence signals that are associated with selected traits and are functionally stress-responsive. This indicates that stress adaptation in maize is possible through evolution of protein-coding sequences [70]. While the divergence of LEA families in Zea and Oryza occurred due to evolutionary changes, the large number of LEA genes and the evolution of LEA-2 family members in Sorghum may reflect its adaptation to stress conditions.
The LEA family proteins are further classified into 8 subfamilies in all the crops based on their conserved domains and phylogenetic tree analysis, but Arabidopsis holds an extra subgroup named AtM [7]. In Sorghum, the dominant LEA-2 family has the highest number of genes (58.8%), whereas dicots such as Arabidopsis (35%), Populus (49%), and Brassica (23%) are rich in LEA-4 members [7,20,21]. Similarly, the DHN and SMP groups also vary among monocots and dicots. Arabidopsis has 10 DHNs and 6 SMPs [7], Oryza 8 DHNs and 5 SMPs [23], Brassica 23 DHNs and 16 SMPs [20], and Sorghum 6 DHNs [71] and 3 SMPs. The expansion of a gene family depends on segmental duplications, tandem duplications, and transposition events [72]. In the present study, 22 paralogs were observed, including 4 regional duplications and 13 paralogous pairs (SbLEA-2 family) with segmental duplication events. This indicates that segmental and tandem duplications are responsible for SbLEA gene family expansion [20,26]. Lan et al. [21] pointed out that stress-responsive genes generally contain very few introns. In the present study, 45.58% of LEA genes lack introns (including 55% of genes in the major SbLEA-2 group) and 27.94% hold one intron. Similar results were recorded in Brassica [20]. This supports the earlier view that introns delay gene expression and extend transcript length, which places an additional burden on the process of transcription [73].
Filiz et al. [22] and Altunoglu et al. [62] pointed out that LEA4, LEA5, and LEA6 group proteins are acidic, while most LEA proteins are basic in nature. The present study shows that 73.52% of SbLEA proteins are basic, including 85% of the proteins from SbLEA-2, thus corroborating the earlier findings. In contrast, SMPs are acidic in nature, which is in agreement with the findings of Liang et al. [20] in Brassica. The grand average of hydropathy (GRAVY) values indicate that SbLEA proteins are highly hydrophilic, except for the SbLEA-2 family. Previous studies report only one or two hydrophobic proteins [7,20], while 85% of SbLEA-2 group proteins are hydrophobic, similar to cotton LEA2 members [68]. A hydrophilic nature and high net charge are characteristic features of LEAs [74], which make them disordered and allow them to act like molecular chaperones under stress in plants [75].
The instability index shows that the majority of the SbLEA proteins are stable, like the SlLEAs reported by Cao and Li [26]. LEA proteins are generally not transmembrane proteins [76] and are located in mitochondria, chloroplasts, the nucleus, and the cytoplasm. In contrast, SbLEA-2 family members exhibit transmembrane helices, which are hydrophobic in nature. Detection of transmembrane helices in proteins indicates their expression in subcellular compartments. SbLEA-2 shows a high aliphatic index, which reflects the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine) that enhance the thermostability of proteins [77]. The majority of the SbLEA-2 family members are localized in chloroplasts, as in cotton [68]. The wide distribution within subcellular compartments leads to interaction with cellular membranes under stress and establishes a protective mechanism for stress tolerance [15].
Generally, diversity in structure and conserved motifs drives the evolution of multigene families [78]. It is the amino acid composition that causes the disordered structure of LEAs [79]. Our analysis revealed that SbLEA proteins show group-specific conserved motifs. Similar results were reported earlier for LEA proteins in Arabidopsis [7], Prunus [24], poplar [21], Solanum [26], maize [63], Brassica [20], and cotton [68]. The specific conserved motifs and their number indicate that they evolved through gene expansion within their specific families, and the motif compositions vary from one family to another. While glycine-rich regions are noticed in AtLEA-2, other LEA members are rich in lysine [7]. However, conserved motifs in the SbLEA-2 family are rich in cysteine and lysine, in contrast to hydrophilins, which lack tryptophan and cysteine [80]. Intrinsically disordered proteins, which are small in size, play several important roles in cells: they provide structural flexibility and bind DNA, RNA, proteins, macromolecules, and membrane proteins to protect and maintain cellular stability under stress [75,81,34]. Phosphorylation helps LEA and dehydrin proteins bind calcium, iron, and other divalent cations [82,83]. Phosphorylation of YnSKn-type DHNs by PKCs, and of SKn DHNs by CK2s, maintains the activity of DHNs, conferring tolerance to stress. Eriksson and Harryson [84] and Nagaraju et al. [71] pointed out that such phosphorylation enhances the membrane-binding activity of DHNs.
MicroRNAs (miRNAs) are a large group of small, non-coding regulatory elements that play pivotal roles in gene regulation by disturbing the transcripts of genes and that mediate plant adaptation under abiotic stress [85][86][87]. For example, expression of rice miR319a in creeping bentgrass confers tolerance against salt and drought stresses [88]. Also, salt stress alters the expression of miR396c and miR394 [89]. Sb-miR437, whose target sites are found in the majority of SbLEA genes, has also been identified earlier in Oryza, maize, and sugarcane but is absent in Arabidopsis and Populus. This suggests that miR437 is monocot-specific [90]. Sorghum miRNAs may target transcription factors of the SBP, zinc finger, WRKY, WD-40, NAC, MYB, HSF, GRAS, ARF, and bHLH families [91], which play important roles in growth, development, metabolism, and biotic and abiotic stress responses [92][93][94].
The present study identifies several abiotic stress-responsive, hormone-specific, development-specific, and biotic stress-responsive elements, as also noticed in other crop plants [26,68]. The cis-elements responsive to phytohormones increase the plant's potential to survive under environmental changes. It is known that ABRE plays an important role in ABA signalling and abiotic stress tolerance. Similarly, DRE/CRT/LTRE (drought-responsive/C-repeat/low temperature-responsive) elements enhance drought-, cold-, and salt-responsive gene expression through transcription factors like CBF/DREB1 [95,96]. Multiple CGCG cis-elements present in all the SbLEAs bind to calmodulin/Ca2+ and are responsible for eliciting multiple signaling pathways [97]. SbLEAs also contain the biotic stress-responsive cis-elements WBOXNTERF3, WBOXATNPR1, and CGTCA, which respond to wounding, pathogens, and salicylic acid [98,99]. GT1GMSCAM4 cis-elements, rich in GAAAAA, were detected; these play a crucial role in salt- and pathogen-induced gene expression and tolerance [100]. The MYB cis-acting promoter elements identified in the present study play a key role in the abscisic acid-dependent signaling pathway in response to drought, salt, and cold, as pointed out by Li et al. [101]. Identification of a wide range of cis-elements in the promoter regions of Sorghum paralogous genes perhaps indicates variation in expression between duplicated paralogs through neo-functionalization or sub-functionalization, which is an important evolutionary mechanism [102]. The presence of these cis-elements in SbLEA genes suggests that they play important roles in different stresses.
Based on the phylogenetic analysis, SbLEA genes were classified into 8 groups, similar to other plants [7,20,68]. While SbLEA2 is the largest group, SbLEA5 and 6 contain the fewest genes, consistent with Arabidopsis [7]. Interestingly, the LEA6 group is absent in rice [23]. The present study revealed 25 ortholog gene relationships with Arabidopsis and Oryza. Generally, Sorghum is related to Oryza, as both share a common monocot ancestor, but the present study reveals that S. bicolor LEA proteins are also phylogenetically close to those of Arabidopsis. The phylogenetic tree depicts a common evolutionary origin of LEA-1, 3, 4, 5, 6, and SMP [6], which is consistent with potato and cotton [25,68]. Genome-wide analysis in a few plants reveals differences among LEAs in monocots and dicots. In dicots, LEA4 and DHNs are the most abundant [7,20,26], but analysis of Sorghum reveals that LEA2 is a big, atypical, hydrophobic group. A recent study in rice and poplar reports a higher number [66]. The phylogenetic analysis suggests that whole genome duplication contributes to expansion of the SbLEA family. Indeed, the rice (monocot ancestor) genome contains 34 LEA genes [23], and a whole genome duplication event would be expected to generate 68 genes, as seen in Sorghum. Similar results were observed in Arabidopsis, Brassica, and cotton. Of the 22 paralogous duplication events observed in Sorghum, the majority are segmental duplications and 4 are tandem duplications. As pointed out by Salih et al. [103], the abundance of LEA proteins arises mainly through segmental duplication events during evolution, similar to Arabidopsis, Brassica, and cotton. The synonymous (dS) and non-synonymous (dN) values reveal the selective pressure on duplicated SbLEA genes: a dN/dS value greater than 1 indicates positive selection, a value less than 1 indicates functional constraint, and a value equal to 1 indicates neutral selection [104]. The dN/dS analysis of the 22 SbLEA paralogous pairs reveals that only 11 events had ratios, of which one is greater than 1 and the remainder are very low, similar to Brassica [20], melon [65], and cotton [68]. This suggests that, during evolution, purifying selection has acted on the SbLEA genes, and specifically that LEA2 shows conserved structures and functions under selective pressure [105].
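The interpretation rule for the dN/dS ratio stated above can be written as a small function. The following Python sketch is only an illustration of that rule: the function name, the tolerance used to call a ratio equal to 1, and the example dN and dS values are hypothetical and are not taken from Table 2.

```python
from math import isclose


def selection_regime(dn, ds, rel_tol=1e-9):
    """Classify a gene pair by its dN/dS ratio.

    Returns (ratio, label), where label is 'positive' for a ratio above 1,
    'purifying' for a ratio below 1, and 'neutral' for a ratio equal to 1.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    ratio = dn / ds
    if isclose(ratio, 1.0, rel_tol=rel_tol):
        return ratio, "neutral"
    return ratio, "positive" if ratio > 1.0 else "purifying"


# Hypothetical (dN, dS) values for three gene pairs.
for dn, ds in [(0.05, 0.5), (0.6, 0.3), (0.2, 0.2)]:
    print(dn, ds, selection_regime(dn, ds))
```

Pairs with no synonymous substitutions have no defined ratio, which is one reason a ratio may be reported for only some of the duplicated pairs.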
Gene expression analysis provides new insights into gene function [106,107]. Microarray data from the databases show high expression of SbLEA genes in different tissues. This indicates that abiotic stresses and/or high metabolic activity generally lead to up-regulation of SbLEA genes in a tissue-specific manner. These results agree with the results of the quantitative real-time expression analysis carried out for a set of SbLEA genes in the present study. SbLEA gene expression varies among tissues, which points to a role during growth and development. Both SlLEA9 and SlLEA23 show high expression levels in tomato flower buds, suggesting roles in reproductive development [26]. At5g27980 is abundantly expressed in mature pollen and regulates pollen germination and tube growth [108,109]. Expression of the ZmLEA3 group in root, stem, and leaf tissues also suggests a role in growth and development [63]. The present study shows abundant expression of LEA2 group genes in vegetative tissues, akin to cotton LEAs [68]. The majority of the SbLEAs are expressed in leaf tissues, consistent with the observations of Liang et al. [20] in Brassica. Native expression of paralogous genes in different tissues implies distinct divergence and evolution of duplicated genes for different functions during plant growth and development. SbLEA gene expression was further assessed under drought, salt, heat, and cold in different tissues, which gives new insights into their critical roles under abiotic stress conditions. These results show significant changes in expression levels under diverse stresses, implying an association with stress tolerance. LEA proteins act as molecular chaperones that protect and stabilize proteins and prevent their aggregation and denaturation under stress conditions [110]. Among different tissues, roots are affected first under many abiotic stresses [111], followed by leaves. Leaves wilt or become chlorotic, leading to disruption of photosynthesis and yield losses [112]. The paralogs also show expression variations, similar to the previous study by Du et al. [24]. Expression of ZmLEA3 at the transcriptional level was reported under biotic and abiotic stresses, and its over-expression in tobacco confers tolerance against osmotic and oxidative stresses by participating in protein protection and by binding metal ions [36]. Similarly, SbLEA3-2 is up-regulated in leaf tissues under all stresses and may act as a regulatory gene that participates in the stress tolerance mechanism. SbLEA1-5, SMP-1, SMP-2, LEA3-2, LEA4-3, and many members of the SbLEA-2 group are up-regulated in stem under heat, drought, and salt stresses. Overexpression of SiLEA14 enhances abiotic stress tolerance in foxtail millet [113]. While overexpression of tomato LEA25 enhances salt and chilling stress tolerance in yeast [29], NtLEA7-3 confers tolerance against cold, drought, and salt stresses in Arabidopsis [28]. Brassica BnLEA4-1 expressed in E. coli confers tolerance to temperature and salt stresses [32]. The SbLEA-2 family members, which are atypical hydrophobic proteins, are up-regulated under different stresses, and the results are consistent with those of cotton, which show high expression under drought stress [68]. Medicago MtPM25, a hydrophobic protein, participates in disaggregation of proteins under stress but is unable to protect membranes [114]. Thus, the abundant presence of LEA-2 genes under stress conditions indicates that they act as key factors in plant adaptation to diverse environmental stresses.

Conclusion
A systematic genome-wide analysis resulted in the identification of a total of 68 LEA genes in Sorghum, which are classified into 8 groups and distributed on all the chromosomes. For the first time in monocots, an atypical hydrophobic group, SbLEA2, is identified with a large number of genes, like that of dicots. The present study helps in understanding the evolution and functions of the major family SbLEA2. It appears that segmental and whole genome duplications play an important role in the expansion of the family. The gene organization and motif compositions of the LEAs are highly conserved, which indicates conserved functional roles. Alongside the abiotic stress-responsive elements, hormone-specific, developmental, biotic, and other cis-elements were identified, indicating a complex regulatory mechanism. The diversified and tissue-specific expression profiles provide further insight into the possible functional divergence within the SbLEA gene family. The transcriptional profiling under abiotic stress indicates that these genes might play an essential role in stress tolerance. Taken together, the present study lays the foundation for further investigation of the specific functions of these Sorghum LEA genes, especially the LEA2 family, in other monocots with reference to abiotic stress tolerance.