Genome-wide identification and expression profiling of glutathione S-transferase family under multiple abiotic and biotic stresses in Medicago truncatula L.

Glutathione transferases (GSTs) constitute an ancient, ubiquitous, multi-functional antioxidant enzyme superfamily that has great importance on cellular detoxification against abiotic and biotic stresses as well as plant development and growth. The present study aimed to a comprehensive genome-wide identification and functional characterization of GST family in one of the economically important legume plants—Medicago truncatula. Here, we have identified a total of ninety-two putative MtGST genes that code for 120 proteins. All these members were classified into twelve classes based on their phylogenetic relationship and the presence of structural conserved domain/motif. Among them, 7 MtGST gene pairs were identified to have segmental duplication. Expression profiling of MtGST transcripts revealed their high level of organ/tissue-specific expression in most of the developmental stages and anatomical tissues. The transcripts of MtGSTU5, MtGSTU8, MtGSTU17, MtGSTU46, and MtGSTU47 showed significant up-regulation in response to various abiotic and biotic stresses. Moreover, transcripts of MtGSTU8, MtGSTU14, MtGSTU28, MtGSTU30, MtGSTU34, MtGSTU46 and MtGSTF8 were found to be highly upregulated in response to drought treatment for 24h and 48h. Among the highly stress-responsive MtGST members, MtGSTU17 showed strong affinity towards its conventional substrates reduced glutathione (GSH) and 1‐chloro‐2,4‐dinitrobenzene (CDNB) with the lowest binding energy of—5.7 kcal/mol and -6.5 kcal/mol, respectively. Furthermore, the substrate-binding site residues of MtGSTU17 were found to be highly conserved. These findings will facilitate the further functional and evolutionary characterization of GST genes in Medicago.

Introduction provided tolerance not only to a range of abiotic stresses, including cold, dehydration, UV, oxidative stress, salt, and heavy metals [28] but also against herbicides [29]. A few members of tau, phi and theta class GST has also possessed glutathione peroxidase activity [25]. Theta class of GSTs play a prominent role to detoxify oxidized lipids [30] whereas zeta GSTs are involved in the tyrosine catabolism [30], tolerance against chilling and salt stresses in Euphorbia esula [31]. The DHAR class GSTs are particularly up-regulated during light and drought stresses [32] in comparison to the GHR and mPGES2 members which are differentially regulated under various abiotic stresses in Arabidopsis [23].
The functional characterization of GST gene family had been performed in various plant species but limited to legumes family. Preliminary genome-wide identification GST members had been performed in Medicago [33] with limited genomic information and expression profiling data. In the present study, systematic identification and characterization of GST gene family have been conducted in Medicago for the understanding of their role in different physiological conditions. The study identified a total of ninety-two GST members which add 29 new members and 4 new class of GST as compared with the previous report [33]. Each of these members was analyzed further to identify their physiochemical properties, chromosomal location, presence of conserved motifs, structural organization, sub-cellular localization, phylogenetic relationship, and protein structure. Further, transcript profiling of all Medicago GST members was analyzed in different developmental stages, anatomical tissues and response to various abiotic and biotic stress conditions using microarray data and few selected of them were confirmed by quantitative RT-PCR. Additionally, molecular docking study of ten highly stress-responsive members was performed with two well-known substrates of GST-GSH and 1-chloro-2,4-dinitrobenzene (CDNB) to elucidate the binding affinity and residues. This genome-wide analysis and expression profiling will provide a critical platform for the identification of stress and pathogen-resistant genes and development of resistant plants.

Identification, annotation and sequence analysis GST genes in Medicago
To identify the entire GST family of Medicago truncatula, we have performed the systemic BLASTp search of known GST protein sequences of Arabidopsis as the query in JGI Phytozome 12 the plant genomic resources Medicago truncatula Mt4.0v1 (https://phytozome.jgi. doe.gov/pz/portal.html) with default parameters. All the found hits were further analyzed through the NCBI conserved domain database (https://www.ncbi.nlm.nih.gov/Structure/cdd/ wrpsb.cgi) to confirm the presence of conserved GST domains and to classify them in different classes. All the significant hits were analyzed for their annotation, chromosomal location, CDS coordinate (5' to 3'), length of the respective gene, CDS and protein from Phytozome 12 plant genomic resources. The ExPasy site (http://web.expasy.org/protparam/) was used to calculate their respective molecular weight and isoelectric point (pI) [34]. The subcellular localization of each protein was predicted using CELLO v.2.5: sub-cellular localization predictor (http://cello. life.nctu.edu.tw/) [35], the pSORT prediction software (http://www.genscript.com/wolf-psort. html) [36] and for chloroplast localization ChloroP (http://www.cbs.dtu.dk/services/ChloroP/) [37].

Chromosomal distribution, gene duplication and Ka/Ks calculation
The chromosomal distribution diagram was plotted using CIRCOS software (http://circos.ca/) [38] according to the chromosome number and CDS coordination information from the Phytozome 12 the plant genomic resources (https://phytozome.jgi.doe.gov/pz/portal). For the duplication study within the Medicago genome, data including the synonymous rate (Ks) and non-synonymous rate (Ka) values were retrieved from the plant genome duplication database (http://chibba.agtec.uga.edu/duplication/index/downloads) [39]. Homologous genes were further analyzed by BLASTp search whereas sequence similarities more than 90% among the proteins were considered as segmental duplication [40], and tandem duplicated genes were identified by five or fewer genes in a 100kb region. The selection pressure of duplicated genes was calculated using the Ka/Ks ratio where Ka/Ks ratio >1, = 1, or <1 imply the positive, neutral, and negative selection, respectively. The estimated date (Mya, million years ago) of each duplication event was calculated by using T = Ks/2λ where T is divergence time, Ks is the number of synonymous substitutions per site, and λ is the fixed rate of 1.5×10 −8 synonymous substitutions per site per year expected for dicotyledonous plants [41].

Exon-intron structure and molecular evolutions analysis
Gene structure (Exon-intron) map was generated using the gene structure display server, GSDS 2.0 (http://gsds.cbi.pku.edu.cn/) [42]. The individual map was generated by aligning the CDS sequence with their respective genomic DNA sequence. To analyze the evolutionary relationships, protein sequences were aligned using ClustalW to construct an unrooted phylogenetic tree using the default parameters of the maximum likelihood method of MEGA7 software with 1000 bootstraps.

Identification of conserved motif, SSR markers and glycosylation sites
The online MEME tool (http://meme-suite.org/tools/meme) [43] was used to identify all the conserved motifs among all MtGSTs using default parameters with a maximum number of different motifs to find = 10, minimum width = 10, maximum width = 50. To identify the SSR marker in MtGSTs, microsatellite identification tool (MISA, http://pgrc.ipk-gatersleben.de/ misa/misa.html) [44] was used with at least ten units for mononucleotide repeats and a minimum five units for dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide repeats. The maximum distance between two markers was set 100 units. The number of glycosylation sites in MtGST proteins was identified using NetNGlyc 1.0 server (http:// www.cbs.dtu.dk/services/NetNGlyc/) [45].

Expression profiling of MtGST using microarray data
To explore the different temporal and spatial gene expression patterns of the MtGST gene family, transcript abundance data of 68 genes with the specific probe was retrieved from genevestigator (https://genevestigator.com/gv/) [46] at various anatomical tissues, developmental stages and in response to different biotic and abiotic stress conditions. Data for seven distinct developmental stages (germination, seedlings, main roots, axillary shoot, developed flower, flower and pods and mature pods) and 24 specific anatomical tissues from primary cells, seedlings, inflorescence, shoot and roots were obtained. Expression data against two devasting abiotic stress (salinity and drought) and in response to six pathogen infection, were also obtained from the same database and analyzed. For salinity stress, 3 days old Jemalong A17 seedlings were treated with 180mM NaCl and radicle samples were collected after 6h, 24h and 48h (Experiment ID: MT-00011). Similarly, 24 days old Jemalong A17 seedlings were exposed to water withholding conditions, and samples were collected after 2d, 3d, 4d, 7d, 11d, 14d and 14d+1d rewatering (Experiment ID: MT-00013). In the case of perturbation, fold change in expression as compared to the respective untreated/control sample was retrieved for each stress conditions. Heat maps with hierarchical clustering were generated by default parameters of Multiple Experiment Viewer (MEV) 4.9 software package with Manhattan correlation [47].

Analysis of putative promoter of MtGST genes
The putative promoter sequences (1kb upstream of the translation start site) of all MtGSTs genes were downloaded from the Phytozome version 12.1 database and analyze through Plant-CARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) [48] to identify the presence and position of important cis-regulatory elements.

Plant material and stress treatments
M. truncatula line Jemalong A-17 (PI670016) seeds obtained from the USDA-GRIN database were used for the analysis. Plants were grown in quart gallon pots filled with a mix of peat moss and perlite (1.5:1) in a growth chamber under controlled conditions with 16 hours of light and 8 hours of dark period. Plants were irrigated with Peter's solution on alternate days to create optimum growing conditions. The experiment designed for control, drought, and salinity stress with nine replicates each treatment. For salinity treatment, three weeks old plants irrigated with 180mM NaCl in Peter's solution. To avoid osmotic shock/stress, the salinity level was increased in two steps. A solution of 300mM Mannitol and 20% Poly-Ethylene Glycol (PEG6000) in Peter's solution was used to create drought conditions. The control plants were irrigated with Peter's solution. The leaf and root samples were collected after 24 and 48 hours of the treatment from three individual plants in each treatment group for RNA extraction and gene expression analysis.

Quantitative reverse transcription-PCR (qRT-PCR)
The expression analyses of selected 10 MtGST genes (S1 Table in S2 File) were performed using qRT-PCR. RNA extraction was done with TRIzol reagent (Invitrogen, Carlsbad, CA, USA). DNA contamination was removed by treating RNA with TURBO DNase enzyme (Invitrogen, Carlsbad, CA, USA). The gene expression was done by qRT-PCR in the BIO-RAD CFX Connect Real-Time System (BIO-RAD, Hercules, CA, USA). The iTaq Universal SYBR Green One-Step Kit (BIO-RAD, Hercules, CA, USA) was used in the study. The reaction was carried out in 10μl reaction volume containing 25 ng of RNA, 0.25 μM of each primer, 0.125 μl iScript Reverse Transcriptase enzyme (BIO-RAD, Hercules, CA, USA), and 5 μl of 2x one-step SYBR Green Reaction mix (BIO-RAD, Hercules, CA, USA). The PCR program was as follows: 50˚C for 10 min, 95˚C for 1 min, then 40 cycles of 95˚C denaturation for 10 s, 60˚C annealing and extension for 20 s. The Medicago PDF2, PPRrep, and Ubiquitin genes were used as reference for the study (S1 Table in S2 File). The amplification specificity was tested with melt curve analysis as follows: 65-95˚C, 0.5˚C increments 2-5 second per step. Two technical replicates were used for each biological replicate, so a total of six replicates for each treatment were used for analysis. The gene expression data analysis was done using CFX manager software [49].

Homology based modelling of few selected MtGST proteins
Three-dimensional structure of 10 highly stress up-regulated members were created using the automated homology-based modelling tool of SWISS-MODEL (http://swissmodel.expasy.org/ ) [51]. The structure of MtGSTU8 and MtGSTU17 were built using the template of Glycine max GSTU (PDB: 4TOP), MtGSTU28 and MtGSTU29 using a synthetic GST tau protein (PDB: 6GHF), MtGSTU46 and MtGSTU47 using a glutathione transferase family member from Ricinus communis (PDB: 4J2F), MtGSTF8 using Arabidopsis thaliana GSTF7 (PDB: 6EZY), MtGSZ2 and MtGSTZ3 using a Zeta class glutathione S-transferase from Arabidopsis thaliana (PDB: 1E6B), and finally structure of MtGSTT2 was built based on the human GSTT1-1 (PDB: 2C3N). All these structures and putative active site residues were visualized using BIOVIA Discovery Studio Visualizer v.4.5.

Molecular Docking study
The 3D structure of the above MtGST proteins was used as a receptor to check the binding potential with two well-characterized substrates-reduced glutathione (GSH) and 1-Chloro-2,4-dinitrobenzene (CDNB). Three-dimensional chemical structures of these two ligands were retrieved from the PUBCHEM compound database (http://www.pubchem.ncbi.nlm.nih.gov) as SDF file. The ligand file conversions required for the docking study were performed using the open-source chemical toolbox Open Babel v. 2.3.2. [52]. Grid box parameters were set to accommodate each compound within the binding site of each protein and determined using AutoDock Tools v. 1.5.6rc3 [53]. Molecular docking calculations for two ligands with each of the proteins were performed using AutoDock Vina v. 1.1.2. [54] and the PDBQT files were generated using the MGL tools.

Statistical analysis
Statistical data analysis was performed for the relative normalized expression from three biological replicates under each treatment and time-point (n = 6). The Student's t-test were performed for each treatment against the respective controls to determine the significant alteration (P value< 0.05) that were marked with different letters.

In silico identification, nomenclature, and characterization of M. truncatula GST transcripts
After a systematic BLASTp search against the Medicago truncatula Mt4.0v1 genome database with the query sequence of Arabidopsis GST proteins [55], a total of 120 non-redundant proteins encoded by 92 genes were obtained containing either typical GST N-and/or C-terminal domains. All these full-length putative GST proteins were classified into twelve families based on their conserved domain analysis. The GST members of Medicago were named according to the previously reported Dixon and Edwards, 2010 [25] by adding prefix "Mt" (Medicago truncatula) to the subclass identifiers (e.g., MtGSTU, MtGSTF, MtGSTT, MtGSTZ, MtGSTL, MtTCHQD, MtDHAR, MtEF1Bγ, MtMGST, MtGHR, MtGSTM, MtGSTH represents tau, phi, theta, zeta, lambda, TCHQD, DHAR, EF1Bγ, mPGES, GHR, metaxin, hemerythrin class, respectively) followed by gene number. Previously identified MtGST members were assigned the same name as given by Han et al. (2018), and the newly identified members were added after them. Previously identified MtGSTU48 is excluded in our study because of the absence of conserved GST domains. The tau and phi classes are found to be the most abundant with 51 and 11 members, respectively. The length of the MtGST genes ranged from 210 bp (MtGSTF10) to 15125 bp (MtGSTL5), and the deduced complementary DNA sequence (CDS) were 210 bp (MtGSTF10) to 3816 bp (MtGSTH3) long, whereas the polypeptide length varied from 69 aa (MtGSTF10) to 1271 aa (MtGSTH3). The molecular weight (MW) of the MtGST proteins varied from the lowest 7.90 kDa (MtGSTF10) to the highest of 146.59 kDa (MtGSTH3) and the predicted pI values ranged from 4.86 to 9.48. However, the average gene length, CDS, protein, MW and PI were 2892.15 bp, 845.67 bp, 280.94 aa, 32.150 kDa and 6.473, respectively ( Table 1). Most of the MtGST proteins were found to be 200 to 400 aa long with few exceptions of longer proteins (MtGSTL5 and MtGSTH3-5) and shorter proteins (MtGSTU50-52 and MtGSTF10) due to the presence of other domains apart from GST and truncated/deleted genes, respectively. Most of the MtGST proteins were predicted to be localized in the cytoplasm, followed by chloroplast, mitochondria, nucleus, plasma membrane and extracellular space (Table 1).

Chromosomal localization and gene duplication
The 92 non-redundant MtGST genes were mapped on the 8 different chromosomes and scaffold regions of M. truncatula (Fig 1). The number of MtGST genes on each chromosome varied widely. Chromosome 1 contained the highest number of twenty-one GST members, followed by chromosome 7 and 2 with seventeen and thirteen members, respectively. In chromosome 3 and 4 has eleven genes each, while chromosome 5 and 8 have seven genes each. The lowest number of three members were located in chromosome 6. To deduce the intensified  which showed that all of the duplicated genes were negatively selected. All the duplicated MtGST gene pairs have a Ka/Ks values less than 1 (

MtGST family members are evolutionary conserved
The online MEME motif search program identified 10 putative conserved motifs (more than 10 amino acids long) that were found to be present in at least three classes of the MtGSTs (S2 Table in S2 File). The motif 1-8 were specific for tau class while motif 9 and 10 were specific for phi and lambda classes. To explore the expansion of GST family members in Medicago, an unrooted phylogenetic tree was generated (Fig 2). The phylogenetic tree showed that each class of MtGST members clustered together to form a separate clade except MtMGST1, MtEF1Bγ4, and MtMGST2 (Fig 2). This result indicated the separation of GST classes took place before the individual family member expansion. The gene structures showed that the presence of 1-3 exons in tau, phi and TCHQD members; similar to the gene structures of these classes of GST in wheat [56]. The DHAR and metaxin classes contained 6 exons while theta, GHR, zeta and EF1Bγ classes contain 3-7 exons except for MtEF1Bγ4 with one exon. The exon number of lambda and hemerythrin classes contained more exons than other classes varied from 3 to 25. Maximum numbers of exon found in MtGSTL5 (25 exons) followed by the MtGSTH4 and MtGSTH5 with 14 exons, and MtGSTH3 (12 exons).

Analysis of microsatellite markers and glycosylation sites in MtGSTs
A major focus of genetic mapping efforts is to explore molecular markers related to particular traits. A total of 101 SSR markers were distributed among 51 out of 92 MtGST genes. The most abundant were mononucleotide repeats (63 occurrences), followed by dinucleotide repeats (28 occurrences), trinucleotide (6 occurrences), tetranucleotide repeats (2 occurrences), and pentanucleotide (2 occurrences) (S3 Table in S2 File). Twenty-three MtGST genes possessed more than one SSR marker. Furthermore, glycosylation is a crucial protein secondary structure modification which plays a critical role in determining the protein 3D conformation, function and stability [57,58]. In the present study, 48 out of 92 MtGST proteins had the presence of predicted glycosylation sites. The maximum number of 11 glycosylation sites were found in MtGSTH3 and MtGSTH4 followed by the MtGSTL5 (9 sites), MtMGST3 and MtHSTH5 with 6 sites each (S4 Table in S2 File). Prediction score greater than 0.75 indicated the maximum possibility of glycosylation. Among the total of 117 predicted sites, only 15 sites have the prediction score equal or greater than 0.75, and thus have maximum possibility glycosylation mediated secondary structure modification (Table 3).

MtGST transcripts showed dynamics variation in various plant developmental stages and organ differentiation
To elucidate the role of MtGST genes, their expression was analyzed at the seven distinct developmental stages and twenty-five anatomical tissues. Expression data of 68 MtGST genes were analyzed and found to be expressive in all the developmental stages and anatomical tissues in a differential pattern. Based on these patterns, all these genes could be classified into three groups: i) extremely low levels of expression in almost all the tissues and organs, ii) some MtGSTs exhibited low to medium levels of expression among different organs/tissues, and iii) some were highly expressive across all the tissues and developmental stages of its entire life cycle. Among them, MtGSTU13 and MtDHAR1 showed the maximum levels of expression in all developmental stages while MtGSTU8, MtEF1Bγ1, MtEF1Bγ3, MtGSTU20, MtGSTU24, MtGSTL4, MtGSTU47, MtEF1Bγ2, MtGSTZ2, MtGSTL2, MtGSTF5, MtMGST1 showed high levels of expression ( Fig 3A). Notably, one cluster MtGSTU42 to MTGSTH1 had extremely low levels of expression in almost all the developmental stages ( Fig 3A). Similarly, one cluster of MtGST genes (MtGSTU13 to MtGSTT1) showed a high level of expression in almost all the analyzed tissues while another cluster MtGSTU8 to MtGSTH5 showed low to medium level of expression ( Fig 3B). A cluster of four MtGST genes-MtGSTU14, MtGSTU26, MtGSTU44 and MtGSTZ3 possessed a very low level of expression in all the analyzed tissues. Interestingly, another cluster of MtGST genes-MtGSTU2 -MtGSTF9 possessed high to medium level of expression in all the analyzed tissues, except for the inflorescence and shoot specific tissues ( Fig 3B). Two transcripts of MtGSTU13 and MtDHAR1 maintained a high level of expression in all the analyzed anatomical tissues similar to all the developmental stages, indicating their important roles in the plant development and tissue differentiation. Some of the MtGST members showed tissue-specific pattern too e.g. expression of MtGSTU42 is primary cell-specific while that of MtGSTU7 is root-specific.

Stress-induced transcript alteration of MtGST genes
To investigate the abiotic stress-responsiveness, expression profiling of 68 MtGST transcripts were further analyzed in response to two different abiotic stresses viz. drought and salinity. Fold change in expression was analyzed at 6h, 24h, 48h of salt stress, while data were analyzed for 2d, 3d, 4d, 7d, 10d, 14d of drought stress and 14d drought +1d re-watering as compared to their respective 2d control expression level (S5 Table in S2 File). Most of the MtGST members were differentially upregulated whereas some of them were downregulated too (Fig 4A). Two clusters-MtGSTU46 to MtGSTF8 and MtGSTL4 to MtGSTH5 genes were highly up-regulated in both drought and salinity stresses in almost all the time points while one cluster MtGSTU37 to MtGSTH3 genes showed down-regulation in almost all cases with few exceptions (Fig 4A). Among them, MtGSTU8, MtGSTU17, MtGSTF8, MtGSTT2 and MtGSTZ1 members were highly upregulated in all cases of these two abiotic stresses. While some members showed stress-specific pattern such as MtGSTU44 and MtGSTT1 were drought specific, and MtGSTU2 and MtGST20 were salinity stress-specific (Fig 4A). To understand the role of MtGST genes under biotic stresses, their expression profile was analyzed in response to two common fungal infections-Phymatotrichopsis omnivore and MtGST genes was analyzed at five major anatomical tissues, such as primary cell, seedlings, inflorescence, shoot and roots. (B) Expression of the same 68 MtGST genes was analyzed at seven distinct developmental stages, such as germination, seedlings, main roots, axillary shoot, developed flower, flower and pods, and mature pods. Expression data was retrieved from genevestigator (https://genevestigator.com/gv/) and the heatmap was created with hierarchical clustering of Manhattan distance correlation using MeV software package. A colour scale is provided along with the heat map to recognize the differential pattern of expression.  Results suggested that one clade of MtGST genes-MtGSTU18 to MtGSTL2 were highly upregulated in response to all six infectious agents' treatment except few early days after infection (dpi) time points (Fig 4B). Among the six pathogens, the maximum 43 members were upregulated in response to E. coli infection followed by 41 and 40 MtGST members were upregulated in response to P. omnivore and M. phaseolina infection, respectively. Although there is no specific cluster of downregulated genes, a cluster of MtGSTU7 to MtGSTU25 showed downregulation in response to almost all infections and time points. Interestingly, only MtGSTF4 was sharply upregulated against the infection of all pathogens at all the time points (Fig 4B).

The putative MtGST promoters contained various cis-acting elements
To investigate the presence and position of possible cis-elements involved in the activation of stress-related genes, the putative promoter region of 68 MtGST genes were scanned through Plant CARE database. The analysis revealed the presence of several cis-acting elements conferring plant hormone and stress responsiveness in the promoter of MtGST. We have identified the presence of seven hormone-related elements such as abscisic acid-responsive (ABRE), auxin-responsive (AUXRR-core), ethylene-responsive (ERE), gibberellin-responsive (GARE and P-box), salicylic acid-responsive elements (TCA-element) and methyl jasmonate-responsive element (TGACG-motif); seven defence and stress-responsive elements such as fungal elicitor-responsive (Box-W1, W-Box), heat stress-responsive(HSE), low-temperature-responsive (LTR) elements, MYB binding site involved in drought-inducibility (MBS), stress responsiveness (TC-rich repeats), and wound-responsive element (WUN-motif) (Fig 5). The most abundant cis element found in MtGST promoters were stress responsiveness TC repeats (91) followed by the HSE (81), ABRE (77), and MBS (72). The promoter of MtTCHQD1 and MtGSTH5 contains the highest number of 15 cis elements followed by the promoter of MtGSTU8, MtDHAR2 and MtEF1Bγ with 13 members each (Fig 5 and S7 Table in S2 File). Presence of these diverse types of hormones and stress-related cis-elements in the promoter region could be directly correlated with the stress-responsive transcript alteration of MtGSTs.

Experimental validation of gene expression profile of ten selected MtGSTs genes
To validate the expression pattern of MtGST genes in response to two abiotic stresses (drought and salinity) and minimize the variation of control, time points and tissue selection; we have analyzed the expression of ten selected stress responsive MtGST genes in both leaf and root tissues using the same control sample through qRT-PCR. The real-time PCR expression profile of the selected genes reveal good correlation with the microarray data. Under drought stress, most of the MtGST genes were highly upregulated in both shoot and root tissue samples. The expression of MtGSTU8, MtGSTU28, MtGSTU30, MtGSTU34, MtGSTU46 genes were upregulated both in leaf and root tissues after 24 and 48 hours of the drought treatment (Fig 6). The expression of MtGSTU14 was highly upregulated in the root tissues on drought stress at both time points (Fig 6C and 6D), while the expression of the MtGSTU46 gene is highly upregulated in the leaf tissues after 48 hours of drought stress ( Fig  6L). This pattern of expression suggesting the root-or shoot-specific expression pattern of MtGSTU14 and MtGSTU46 genes to drought, respectively. The expression of MtGSTF8 showed drastic upregulation of more than 10 folds in shoot tissue at both the time points, while the expression in root is unaltered (Fig 6, O-P). Moreover, the expression of MtGSTF1 gene downregulated in leaf tissues after both 24 and 48 hours of treatment while its expression in root tissues remained unchanged under drought stress. The MtGSTT2 transcript showed higher expression in leaf tissues only after 48 hours of treatment ( Fig 6T) and in root tissues only after 24 hours of drought stress (Fig 6S). The expression of the MtGSTF9 gene upregulated after 24 hours for both treatments in leaf tissues but downregulated in root tissues. MtGSTs are more expressive towards drought stress as compared with the salinity stress. Only few of the selected genes showed sharp enhancement/deregulation in response to salinity at both 24h and 48h time points. MtGSTU46 showed more than 5 folds upregulation in response to salinity at 24h stress treatment (Fig 6K). Similarly, the expression of MtGSTF9 upregulated in the leaf tissues at 24h and in the root tissues after 48h of salt stress. Overall, drought treatment influences upregulation for most of the analyzed genes, while salt stress does not affect the expression of most of the targeted genes, that is clearly visible from the heatmap generated based on fold change in expression (Fig 6U).

MtGSTU17 showed the lowest binding affinity towards GSH and CDNB
Among the highly stress-responsive MtGST members, ten proteins-six from tau class, one from phi class, two from zeta class and one from theta class were used in the molecular docking study to determine their affinity towards the well-known substrates of GST-GSH and CDNB (S1 Fig  in S1 File). Docking scores showed that MtGSTU17 has the lowest binding energy with GSH (-5.7 kcal/mol) and CDNB (-6.5 kcal/mol) followed by MtGSTU8 with a binding energy -5.7 kcal/mol for GSH and -6.0 kcal/mol for CDNB (Table 4). Similarly, MtGSTZ2 showed a higher binding affinity with GSH (-5.2 kcal/mol) and CDNB (-5.6 kcal/mol) followed by MtGSTT2 and MtGSTF8 with GSH (-5.4 kcal/mol, -4.8 kcal/mol) and with CDNB (-5.1 kcal/mol, -5.2 kcal/mol), respectively in case of all classes ( Table 4). The lower binding energy with its substrates of MtGSTU8 (S2 Fig in S1 File) could be directly correlated with its higher specific activities of 5.93 ±0.17 μmol/min/mg. Specific hydrogen bonds and hydrophobic interactions between four lowest binding affinity providing proteins with those ligands were analyzed using Discovery studio program. The best-scored protein MtGSTU17 stabilized with GSH within the binding pocket by forming six hydrogen bonds with Ser88, Leu89, Phe150, Gly151, Tyr157, Val158 and one hydrophobic bond with Ser88 and their bond length were 4.29, 3.65, 6.01, 3.41, 5.03, 3.59 and 5.48 A˚, respectively. Besides, MtGSTU17-CDNB docked complex formed four hydrogen bonds with Trp103, Tyr157 and Val158 while two alkyl bond and one pi-sigma bond with Pro91 and Ala100, respectively (Fig 7 and S8   to understand their structural details, conformational behavior, stability, and flexibility of the protein-ligand docked complex. In the elastic graph, each dot represents one spring between the corresponding pair of atoms, where the darker greys indicate stiffer springs and vice versa The first column represents the predicted 3D structure, the second and third column represents the 2D interaction of each protein with GSH and CDNB, respectively. The green, light green, pink, purple Orange and red spheres represent residues involved in the hydrophobic interactions, carbon-hydrogen bond interactions, Pi-alkyl interactions, Pi-cation interactions and unfavorable acceptor-acceptor interactions, respectively. https://doi.org/10.1371/journal.pone.0247170.g007 (S3 Fig in S1 File). The analyzed protein atoms involved in springs were less in both docked complex, which mainly depends on the protein atoms in amino acid residues.

Discussion
Adverse environmental conditions including abiotic and biotic stresses induce the production of ROS [11] which damage cellular macromolecules such as proteins, lipids, nucleic acids and cell membranes. To adapt this condition, plants have developed various physiological, chemical, and enzymatic defence mechanisms which help in their avoidance and/or tolerance of stresses [59]. In such events, antioxidant enzymes involved in the major defence mechanisms. Glutathione S-transferases (GSTs) are ubiquitous, multi-functional and antioxidants protein superfamily, that have great importance for mediating the removal of stress-inducing toxic compounds from plants. GSTs were reported from plants in the 1970s, for their potential roles in protecting maize crops from a herbicide chloro-S-triazine atrazine [60,61].
In this report, we have identified a total of ninety-two GST family members, each of which members contain at least one conserved domain related to GST protein ( Table 1). The number of identified GST genes are higher in Medicago as compared to 79 GST genes in rice [62], 55 in Arabidopsis [63], 84 in barley [64], 42 in maize [65], 49 in C. rubella [66], 82 in radish [67], 90 in potato [68], 90 in tomato [69], 85 in pepper [70]; but fewer than wheat with 330 GSTs [56] and soybean with 126 GSTs [71]. There is no direct correlation with the number of GST family members and their respective plant genome size (Table 5). Medicago possessed 1.12 and 1.16 times more GST members as compared with the similar-sized genome of radish and rice, respectively. Despite the higher genome size of Z. mays, C. annuum, H. annuus, H. vulgare; they possessed a smaller number of GST genes as compared with Medicago. The possible reason behind this observation could be the identification strategies, quality of genomic sequences and identification of new classes over time (Table 5). Data for MGST, Metaxin, Hemerythrin, GST2_N is missing for most of the previously reported plant species. However, this is one of the first reports for the presence of metaxin and hemerythrin class of GST members in Medicago. Another thing is clear from the GST number distribution among 20 plant species in thirteen classes (Table 5), the tau and phi classes are by far the most abundant classes in plant. The lambda, DHAR, zeta and EF1Bγ classes were next in number with a variable quantity. There is only one report for the presence of 2 GST2_N (thioredoxin-like) protein in B. oleracea [72]. Subcellular localization of 21 GST proteins from Physcomitrella patens, representing 10 classes were tested using C-terminal GFP fusions followed by visualization using confocal microscopy in Nicotiana benthamiana [73]. Sixteen proteins (Four phi-PpGSTF1, PpGSTF4, PpGSTF9, and PpGSTF10; four hemerythrin-PpGSTH1, PpGSTH2, PpGSTH3, and PpGSTH7; three EF1Bγ-PpEF1Bγ1, PpEF1Bγ2, and PpEF1Bγ4; two theta-PpGSTT1 and PpGSTT3; one TCHQD-PpTCHQD2, one Ure2p-PpUre2p1; and one zeta-PpGSTZ1) showed typical cytosolic and nuclear localizations. However, PpTCHQD3-GFP showed only cytosolic localization. Similarly, PpTCHQD5 and PpGSTL1 localized in chloroplasts; and PpDHAR1 and PpGSTI1 were shown to localize in both the cytosol and chloroplasts. Thus, the localization of GSTs has been reported in various subcellular compartments including cytosol, mitochondria, endoplasmic reticulum, nucleus and plasma membrane [74]. Subcellular localization of MtGST proteins has also been predicted to cytoplasm, chloroplast, mitochondria, nucleus, plasma membrane and extracellular space using three independent tools (Table 1).
Species-specific gene family expansion in plants often arises as a result of tandem duplications, segmental duplications, whole-genome duplications, and interspecific hybridizations that can facilitate the evolution of functional diversity [79]. Among the functional diversities, many of the plants tend to duplicate their genes to adapt with different adverse conditions and  developmental processes [80]. We have identified 7 pairs of duplicated genes in MtGST family, and all of them were found to be duplicated segmentally ( Table 2). Most of the MtGSTs member exhibited similar gene structure within the same phylogenetic class, and the significant differences among the twelve classes indicated their evolutionary relationship (Fig 2). The position and numbers of intron/exon were found to be conserved in the different classes of GST (Fig 2). Usually plant GST of phi class contain three exons, tau class has two exons, zeta GSTs have ten exons, and theta group has six exons [81]. P. patens TCHQD GSTs has one to three introns, and PpDHAR contained four to seven introns [73]. However, the position and numbers of intron were found to be conserved in the N-terminal domain of GSTs [73]. The intensity of gene divergence could be corelated with the extensive rates of intron gain/loss in the gene structure. In the eukaryotic genome, microsatellites have widely distributed that exhibit taxon-specific variations in motif structure, genomic location, and frequency [82]. Identification of mutant SSR markers is effective in investigating the genetic variation and mapping. The identified mononucleotide repeats (62.37%), dinucleotide repeats (27.72%) and trinucleotide repeats (5.9%) could be used for the identification genotypes and presence of specific GST member.
The expression patterns of GST genes in different tissues had been described in many species, such as rice [62], pepper [70], and tomato [69]. Expression pattern of MtGSTs is found to be tissue and developmental stage specific. Similarly, four GST genes (SlDHAR2, SlGSTF2, SlEF1Bγ1, and SlEF1Bγ3) showed the maximum level of expression in seedlings, seed, pericarp, placenta, petal, ovary, flowers, and fruits of tomato [69]. In sunflower, six GST genes were mainly expressed in leaves, four in seeds, and two each in flowers and roots among the 14 identified members [68]. Expression of GST genes had also been found to be highly specific towards tissue and developmental stages in Arabidopsis [25] and rice [62]. GST transcripts were found to be upregulated under different abiotic and biotic stresses revealed that the function of GST transcript and its protein to mitigate the stress response. Among MtGST members, MtGSTU8, MtGSTU17, MtGSTU28, MtGSTU29, MtGSTU46, MtGSTU47, MtGSTF8, MtGSTZ2, MtGSTZ3 and MtGSTT2 were highly upregulated in two abiotic conditions. Drought treatment induces strong transcript enhancement in both root and shoot tissues within 24h as compared with salinity stress (Fig 6). Most of the abiotic stresses induce the level of reactive oxygen species and consequent generate oxidative stress. Here, salinity and drought stress treatment at Medicago leaves enhance the endogenous levels of H 2 O 2 significantly as compared with the control leaves for 24h and 48h (S4 Fig in S1 File). Thus, the upregulation of MtGST transcripts (Fig 6) could be directly corelated with the H 2 O 2 generation of the respective stress treatment (S4 Fig in S1 File). One of the highly drought responsive members, MtGSTU8, possessed a significantly higher specific activity (μmol/min/mg) against CNDB and 4-Chloro-7-nitrobenzofurazan (NBD-CI) as compared with other members [33].
To improve the abiotic stress resistance crops, we need a deep understanding of the structural and enzymological properties of the predicted residues involved in GSH and toxic substance binding. After analyzing, the ten maximum abiotic stress-responsive members, MtGSTU8 and MtGSTU17 were found to have the lowest binding energy with GSH and CDNB (Table 4). A study on substrate binding residues of OsGSTU17 showed that Lys42, Val56, Glu68, and Ser69 are the critical components of G-site, while Pro16, Met17, Asn109, Leu112, Tyr113, Phe116, Trp161, Phe162, and Trp165 are the critical residues of H-site [83]. Intermolecular interaction of MtGSTU17-GSH happened at the residues of Ser88, Leu89, Phe150, Gly151 Tyr157, Val158, while CDNB binds with Pro91, Ala100, Trp103, Tyr157, Val158 residues. The type and position of amino acid interacting at G-sites and H-sites were found to be conserved among OsGSTU17 and MtGSTU17. The role and importance of specific interacting residues with ligands in the GST active sites could be further analyzed using site-directed mutagenesis.

Conclusion
To reiterate, a comparative genome-wide analysis of GST gene family was performed in a model legume plant, Medicago and identified ninety-two members with four new families. All the identified members were further investigated for their classification, transcript structure, evolution, conservation, stress responsiveness and putative function. The expression profiles of MtGST genes from microarray data revealed that most of the MtGST transcripts were highly expressed in developmental stages and anatomical tissues. Few of the selected MtGST transcripts were found to be extremely upregulated in response to drought stress. Finally, molecular docking study confirmed the conservance of substrates (GSH and CDNB) binding sites to GST during the detoxification reaction.