In-silico analysis of cis-acting regulatory elements of pathogenesis-related proteins of Arabidopsis thaliana and Oryza sativa

Pathogenesis-related (PR) proteins are a family of low molecular weight proteins induced in plants under various biotic and abiotic stresses. They play an important role in plant defense mechanisms. PRs have a wide range of functions, acting as hydrolases, peroxidases, chitinases, antifungal agents, protease inhibitors, etc. In the present study, an attempt has been made to analyze the promoter regions of PR1, PR2, PR5, PR9, PR10 and PR12 of Arabidopsis thaliana and Oryza sativa. Analysis of cis-element distribution revealed the functional multiplicity of PRs and provides insight into their gene regulation. CpG islands were observed only in rice PRs, which indicates that monocot genomes contain more GC-rich motifs than dicot genomes. Tandem repeats were also observed in the 5' UTR of PR genes. Thus, the present study provides an understanding of the regulation of PR genes and their versatile roles in plants.


Introduction
Plants are persistently under threat from pathogens such as bacteria, viruses, fungi and nematodes. However, plant-pathogen interactions are extremely intricate, and most plants are resistant to the vast majority of pathogens. These interactions elicit specific responses that permit only a few pathogens to colonize and spread disease [1][2]. Early recognition of a pathogen is an indispensable step for disease resistance in plants and is followed by activation of a series of defense responses during the interaction. During incompatible interactions, avirulence pathogen proteins (Avr) interact with host resistance (R) genes, bringing about a series of defense responses such as: accumulation of reactive oxygen species (ROS); enhancement in abscisic acid (ABA), salicylic acid (SA), jasmonic acid (JA), auxins and gibberellins; synthesis of pathogenesis-related (PR) proteins; phytoalexin accumulation; and induction of the hypersensitive response (HR). Consequently, the plants do not develop disease symptoms. In compatible interactions, the virulent pathogen invades the host machinery and causes systemic infection, which prompts the development of disease symptoms [1][2][3].
PR proteins are a collective set of low molecular weight proteins which accumulate under various biotic and abiotic stresses and under specific physiological conditions such as pollen development, leaf senescence, fruit development and ripening [4][5][6][7]. Such proteins are considered to perform a number of functions, acting as transcription factors, protease inhibitors and hydrolytic enzymes, and many are associated with various metabolic pathways [4,8]. PRs were first isolated from tobacco (Nicotiana tabacum) leaves infected with tobacco mosaic virus [9] and were subsequently reported from many other plant species, including A. thaliana, alfalfa, barley, bean, carrot, chickpea, grape vine, maize, pepper, pearl millet, rice, rubber, soybean, sunflower, sorghum, tomato and wheat [10]. PR proteins have been characterized and classified into 17 families based on shared amino acid sequences, serological relationships, and enzymatic or biological activity [11].
PR1 was the first PR protein to be discovered; it has a molecular weight of 14 to 17 kDa and acts as a molecular marker for the systemic acquired resistance response. It has antifungal activity. PR2 proteins are β-1,3-glucanases, and their molecular mass ranges from 33 to 44 kDa. They comprise large and highly complex gene families involved in pathogen defense as well as a wide range of normal developmental processes. They are induced in response to wounding or infection by viruses, bacteria and fungi. β-1,3-glucanases degrade pathogen cell walls by cleaving β-1,3-glucosidic bonds in β-1,3-glucan, a major component of the fungal cell wall. PR3 proteins (chitinases) have a molecular mass in the range of 15-43 kDa. They cleave the chitin polymers in the fungal cell wall, weakening the wall and making fungal cells osmotically sensitive. PR4 proteins are chitin-binding proteins with a molecular mass between 9 and 30 kDa. These proteins bind to chitin and play an important role in enhancing chitinase activity [6,12]. Thaumatin-like proteins (PR5) have a molecular mass between 18 and 25 kDa. They can act as antifungal agents, glucanase and xylanase inhibitors, and α-amylase and trypsin inhibitors. They are also known to be induced by wounding and by insect feeding, especially by phloem-feeding insects [13]. Proteinase inhibitors (PR6) and endoproteinases (PR7) are highly stable defensive proteins of plant tissues that are both developmentally regulated and induced in response to insect and pathogen attacks. PR9 (peroxidase) catalyzes cross-linking of macromolecules in the plant cell wall. It also produces reactive oxygen species such as H2O2 against a wide range of pathogens [6]. PR10 are ribosome inactivating proteins, known to inhibit translation in fungi. These proteins protect plant proteins and other cellular structures during dormancy, salinity or cold stress [14]. PR12 proteins (defensins) are small cysteine-rich peptides providing protection against a broad range of organisms. They are known to inhibit protein synthesis, enzyme activity and ion channel function [15]. PR15 and PR16 catalyze the oxidation of oxalates by molecular oxygen, yielding CO2 and H2O2. They have roles in plant development, defense, signaling, differentiation and apoptosis [16].
For a number of PR proteins, activities are known or can be deduced. The majority of PRs (e.g. PR1, PR2, PR3, PR4, PR5, PR7, PR12, PR13 and PR14) possess antifungal activity, whereas PR8 and PR11 are classified as endochitinases. PR15 and PR16 are oxalate oxidase and oxalate oxidase-like proteins, respectively [4,16]. However, very little is known about the molecular mechanisms governing the expression of PR genes. In one study, Lodhi et al. (2008) [17] deduced a relationship among promoter sequence architecture, nucleosome positioning and expression of PR-1a in tobacco. Therefore, the study of gene expression regulation of PR proteins is a crucial step in understanding the molecular mechanisms of the plant defense response. Transcriptional regulation involves association between transcription factors and particular cis-acting regulatory elements (CAREs) of a specific gene involved in the plant defense response [18]. CAREs are short regulatory motifs (5-20 bp) present in the promoter regions of target genes (typically, non-coding DNA). Promoters play an important role in controlling gene expression. Multiple CAREs such as the TATA box, GC box and CAAT box contain binding sites for transcription factors, enhancers and repressor elements required for proper spatiotemporal expression of genes [19]. Cis-acting regulatory elements are essential transcriptional gene regulatory units as they control various stress responses. Recent advances in experimental techniques such as RNA interference, microarrays and RNA-seq have allowed identification and investigation of promoter regions of target genes, but these techniques are expensive and technically challenging. Therefore, computational methods are being used to search promoter regions for the different cis-elements responsible for the regulation of genes [18]. Different computer programs can also be used to look for known cis-elements and to study their organization. 
Such web-based tools as PLACE [20], PlantCARE [21], AGRIS [22], TRANSFAC [23] and PlantPAN [24] have been developed for the analysis of cis regulatory elements in plant genes.
Examination of CAREs within the promoter sequences of PR genes, as well as their combinatorial effects, will lead to a better comprehension of the regulation of their expression. Understanding cis-elements can also allow the expression pattern of a gene to be changed in a desired way, which can in turn provide new approaches for plant genetic engineering to protect crops against biotic and abiotic stresses. To the best of our knowledge, no work has been reported on cis-elements of Arabidopsis thaliana and Oryza sativa PR genes. Therefore, the present study was planned to characterize cis-acting regulatory elements (CAREs) of PR classes 1, 2, 5, 9, 10 and 12 with respect to their occurrence and putative role in the model plants Arabidopsis thaliana and Oryza sativa. We also tried to validate our in-silico work with wet lab studies wherever available.

Results and discussion
Search for PR genes of A. thaliana and O. sativa and their structural analysis
Tables 1 and 2 include locus, chromosome number, strand, transcript count, transcript id, gene length, CDS length, number of exons, and protein length. The MatGAT tool was used to compare PR gene sequences of A. thaliana and O. sativa, which revealed that percentage similarity (evolutionary distance) among the different PRs ranged from 39.4% for PR10 to 67.3% for PR2 (S1 Table). PR10 proteins are coded by multigene families and show higher interspecific variation. However, all of them possess a conserved glycine-rich loop in their sequence [29].

Retrieval of promoter regions and analysis of cis-regulatory elements
Promoter sequences up to 1.5 kbp upstream from the translation start site of each PR gene of A. thaliana and O. sativa were scanned using the PlantCARE program for the identification of cis-acting regulatory elements (CAREs). The study revealed a total of 55 CAREs in A. thaliana     [31,32]. Microarray expression analysis of A. thaliana revealed a range of cis-elements responsive to different types of ROS. Such elements have been grouped into two categories: common ROS-related elements, e.g. TATCCAT/C-motif, GCN4_motif and G-box, and ROS-specific elements such as the W-box [33]. Other elements, such as the H-box, ethylene-responsive GCC elements, and elements responsive to salicylic acid, ethylene, abscisic acid and calcium, are known to contribute to the response to oxidative damage in A. thaliana [34].
TATCCAT/C-motif is an amylase element representing sugar repression responsiveness. It plays an important role in GA-regulated expression. AtPR9 (peroxidase) shows the presence of this motif. In A. thaliana, peroxidases play an important role in generating H2O2 during the defense response and also provide resistance against a wide range of pathogens. Peroxidases also play a vital role in leaf expansion [35].
GCN4_motif (TGTGTCA) is an essential cis-element required for an endosperm specific gene expression. AtPR12 shows the presence of GCN4_motif. AtPR12 has a role in protecting germinating seeds and developing seeds [36].
G-box (CACGTG) element is involved in response to light, abscisic acid, methyl-jasmonate and anaerobiosis and has a role in ethylene induction as well as in seed specific expression. It is also known as ABRE (ABA-responsive element) [37,38]. It has been shown to be present in all A. thaliana PR genes except in AtPR2 and in all O. sativa PR genes (Fig 5a and 5b).
The W-box (TTGACC) is an elicitor-responsive cis-element, present in AtPR9 and AtPR10. W-boxes interact with transcription factors belonging to the WRKY family. The W-box regulates the expression of defense-related (PR10) genes and has a role in biotic and abiotic stresses, seed dormancy, senescence, etc. [39][40]. During the stress response, AtPR10 is induced by ABA, ethylene, jasmonic acid and salicylic acid. This gene may be induced by ROS and may act as a proteinase against cellulases and pectate lyases of the pathogen [29]. An increase in ROS levels, especially H2O2, has been shown to increase PR10 in plants [41]. The presence of the W-box in AtPR9 indicates its role in senescence [40].
In addition to the above-mentioned responsive elements, other oxidative stress responsive elements such as AREs, ethylene-responsive GCC elements, ERE and H-box are also present in the promoters of PR gene sequences (Fig 5a and 5b). AREs (anaerobic responsive elements) are essential for anaerobic induction and are present in AtPR5, AtPR9 and AtPR10. AREs are bipartite elements consisting of GC and GT motifs. The GT motif resembles the AtMYB2 transcription factor binding site, which is a drought- and low-oxygen-induced element [42]. The GCC-box, an ethylene-responsive element, is necessary for high-level jasmonate-mediated regulation of PR12 expression during the plant defense response [43][44]. The GCC element is also associated with the expression of many genes involved in different kinds of abiotic and biotic stresses. The H-box is a root-specific regulatory element present in AtPR9 and OsPR10. It regulates defense genes in response to elicitors and other stress stimuli [45]. The TGACG motif, also known as the 'as1 element', is another well-characterized cis-element present in plants. It is a methyl jasmonate responsive element present among A. thaliana and O. sativa PR gene sequences. The transcription of TGACG-mediated PR sequences is regulated by binding of the bZIP TGA factor to the TGACG element [46]. TC-rich repeats are seen in AtPR1, AtPR2, OsPR2 and OsPR9 and have a role in stress and defense responsiveness. The G-box [47] and TATCCAT/C-motif are also involved in regulating defense responses.
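The kind of motif lookup described above can be illustrated with a minimal sketch. It is not the PlantCARE algorithm; it only performs exact string matching on both strands for the consensus sequences quoted in the text (G-box, W-box, GCN4_motif, TGACG motif). The function names `reverse_complement` and `scan_motifs` and the example promoter string are hypothetical and introduced here for illustration only.

```python
# Minimal sketch, assuming exact matches to the consensus sequences quoted in
# the text. Function names and the example sequence are hypothetical.
MOTIFS = {
    "G-box": "CACGTG",
    "W-box": "TTGACC",
    "GCN4_motif": "TGTGTCA",
    "TGACG-motif": "TGACG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_motifs(promoter, motifs=MOTIFS):
    """Map each motif name to a sorted list of (0-based position, strand)."""
    promoter = promoter.upper()
    hits = {}
    for name, motif in motifs.items():
        patterns = [("+", motif)]
        rc = reverse_complement(motif)
        if rc != motif:  # palindromic motifs (e.g. CACGTG) are counted once
            patterns.append(("-", rc))
        found = []
        for strand, pattern in patterns:
            start = promoter.find(pattern)
            while start != -1:
                found.append((start, strand))
                start = promoter.find(pattern, start + 1)
        hits[name] = sorted(found)
    return hits


if __name__ == "__main__":
    # Made-up promoter fragment, not a real PR gene sequence.
    for name, positions in scan_motifs("AACACGTGTTGGTCAATGACGTT").items():
        print(name, positions)
```

Reported positions refer to the forward strand; a "-" hit means the reverse complement of the motif was found at that position. Note that the CGTCA motif mentioned elsewhere in the text is the reverse complement of TGACG, so a both-strand scan for one also finds the other.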
A number of cis-elements associated with light stress include ACE, AE-box, ATC-motif, ATCT motif, Box I, Box II, BoxW1, Box4, Box S, CATT motif, CG motif, Chs-CMA1a, Chs-unit 1 ml, G box, GA motif, GAG motif, GATA-motif, Gap box, GT1 motif, I-box, MNF1, Sp1, TCCC motif and TCT motif. Among these, some elements such as ACE, AE-box, Box4, Box S, CATT motif, G box, I-box, Sp1, TCCC motif and TCT motif are present in both A. thaliana and O. sativa PRs, whereas Chs-unit 1 ml and Gap box (OsPR10); Box II (OsPR5);
Cis-elements in hormonal regulation. Motifs involved in hormonal regulation were the second largest group after the stress-responsive motifs present in PRs. A few motifs, such as ABRE (abscisic acid), CGTCA and TGACG (methyl jasmonate), GCC-box (ethylene) and TCA element (salicylic acid) (Fig 5c), were present in both A. thaliana and O. sativa PRs. Some motifs were limited to O. sativa PR genes: the TATC and GARE motifs are gibberellin-responsive elements present in OsPR2 and OsPR12, respectively. Abscisic acid responsive elements such as motif II b (OsPR2 and OsPR9), motif lib (OsPR5) and CE1 (OsPR12) were also observed only in O. sativa PRs. The PLACE tool shows the presence of E-box and DPBF CORE DCDC3 motifs in OsPR5. The auxin-responsive elements AuxRR-core and TGA-element were found in AtPR9 and AtPR10, respectively. A MeJA-responsive element, the JERE motif, was found in AtPR10. The AGRIS tool revealed the presence of the CACATG motif in all A. thaliana PRs; this motif is the binding site for the MYC2 transcription factor and has a role in jasmonic acid and abscisic acid signaling [64].
Calcium responsive cis-elements. Calcium (Ca2+) is an intracellular regulator that is important for many plant biological functions. Ca2+ signaling is a key mechanism that has evolved in plants to defend themselves against pathogens. It is required for inducing defense-related genes and hypersensitive cell death. Calmodulins (CaM) interact with specific transcription factor families (WRKY, MYB and NAC) and regulate the expression of defense genes, but the direct role of CaM in regulating plant defense genes has not been studied so far [65,66]. A few specific promoter motifs are regulated by calcium. These include ABRE or ABRERATCAL (abscisic acid responsive element), C-repeat/DRE (drought-responsive element), Site II, CAM box, CRT and W-box [67,68]. ABRERATCAL (MACGYGB, where M = C/A, Y = T/C, B = T/C/G) is the binding site harbored by ABA-induced gene promoters [69]. In the present study, calcium responsive cis-elements were identified using PlantPAN. We observed ABRE-related elements in all the A. thaliana PR sequences (Fig 7), whereas no ABRE-like elements were observed in O. sativa PR sequences.
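A degenerate consensus such as ABRERATCAL (MACGYGB) can be searched for by expanding each ambiguity code into a character class. The sketch below is only an illustration of that idea, not the PlantPAN implementation; it covers just the codes defined in the text (M, Y, B), scans the forward strand only, and uses hypothetical function names (`iupac_to_regex`, `find_degenerate`).

```python
import re

# Ambiguity codes as defined in the text: M = C/A, Y = T/C, B = T/C/G.
# Only the codes needed for MACGYGB are included in this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]", "Y": "[CT]", "B": "[CGT]",
}


def iupac_to_regex(motif):
    """Expand a degenerate motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_degenerate(seq, motif):
    """Return (0-based position, matched sequence) for every occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    Only the forward strand is scanned.
    """
    pattern = re.compile("(?=(%s))" % iupac_to_regex(motif))
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up sequence for illustration, not taken from a PR promoter.
    print(iupac_to_regex("MACGYGB"))
    print(find_degenerate("TTCACGTGTAA", "MACGYGB"))
```

Because the G-box core CACGTG fits the MACGYG part of the consensus, a G-box followed by T, C or G is also reported as an ABRERATCAL-like site, which is in line with the text's remark that the G-box is also known as ABRE.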
Analysis of conservation of cis-elements in promoter regions of PR genes. Analysis of phylogenetic conservation of sequences involves the identification of conserved motifs across genes. The goal of this work was the identification of cis-regulatory sequences conserved in promoters of PR genes of A. thaliana and O. sativa. PlantCARE data were analyzed to reveal the conserved sequence motifs in promoters of A. thaliana and O. sativa PR genes (S2 Table). The CAAT and TATA boxes act as binding sites for transcription factors. The CAAT box, important in core promoter activity, is conserved in almost all PRs of A. thaliana and O. sativa, whereas the TATA box is conserved in most of the PRs except PR5 and PR9. Zuo and Li (2011) also showed the presence of TATA-less promoters in plant genomes [70], which indicates that the TATA box is not conserved in all plant genomes. The G-box is conserved in all PR promoter regions of A. thaliana and O. sativa except PR2. Ishige et al. [71] examined 11 different G-box tetramers in the regulation of GUS gene expression and found that each G-box sequence influenced gene expression in a different way. One of the G-box sequences, G-box 10, was shown to confer high-level constitutive expression in roots, leaves and seeds. The MBS cis-element is present in PR2, PR5, PR9 and PR12. The MYB transcription factor requires MBS for the expression of drought-inducible genes. The CGTCA motif, involved in methyl jasmonate (MeJA) responsiveness, is also conserved in PR5 and PR9 of A. thaliana and O. sativa. It activates a series of defense mechanisms in response to different abiotic stresses such as drought, salinity and low temperature. The presence of a MeJA motif in the 5' UTR of PR genes suggests a possible role in pathogen stress or wound responses. PR1 and PR5 show the presence of the ABRE motif, a positive regulator of abscisic acid signaling under drought stress and high salt conditions in the vegetative tissues of plants.
There are a few CAREs which are unique to A. thaliana PR proteins (S2 Table). CAREs like A box and CCGTCC box are development related motifs involved in activation of meristem  Table). The F box is a cis-element conserved in PR9; it is involved in regulating plant defense responses to biotic and abiotic stresses. The GCN4 motif present in PR12 has a role in endosperm-specific gene expression.

Tandem repeats and CpG/CpNpG analysis by PlantPAN
The eukaryotic genome has a large number of DNA repeats, and these repeats have a role in genome evolution [72]. Repetitive DNA may be interspersed in a tandem configuration throughout the genome or may be restricted to some specific location. According to the length of their repeated unit, DNA tandem repeats can be classified into three groups: (i) microsatellites, with a repeat unit less than 9 nucleotides in length; (ii) minisatellites, with 6-100 bp (usually around 15 bp) repeats; (iii) megasatellites, tandem repeats of longer units, with a length of more than 135 nucleotides [18,73]. Microsatellites are codominant, abundant and multi-allelic, so they can be used as molecular markers in linkage mapping and gene tagging [74]. The 1.5 kbp upstream promoter region of PR genes revealed the presence of DNA tandem repeats. We found tandem repeat units in three AtPRs (AtPR1, AtPR9 and AtPR12) and one OsPR (OsPR1) (Table 5). AtPR1 and AtPR9 contain mononucleotide repeats with a repeat size of 1 nucleotide. AtPR12 contains a minisatellite. OsPR1 has a tetranucleotide repeat with a repeat size of 4 nucleotides. Variation in the length of tandem repeats in the promoter region could be due to numerical changes such as addition and deletion of transcription factor binding sites [75].
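To make the notion of mononucleotide and tetranucleotide tandem repeats concrete, the following sketch finds perfect tandem repeats of a given unit length with a regular expression. It is an illustration only and not the repeat finder used by PlantPAN; the function name `find_tandem_repeats` and the `min_copies` thresholds are assumptions, since the text does not state the minimum copy number used.

```python
import re


def find_tandem_repeats(seq, unit_len, min_copies=3):
    """Find perfect tandem repeats with a repeat unit of `unit_len` bases.

    Returns a list of (0-based start, repeat unit, copy number).
    `min_copies` is an assumed threshold, not a value from the study.
    """
    seq = seq.upper()
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    repeats = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip units that are themselves repeats of a shorter unit,
        # e.g. "AAAA" is a mononucleotide run, not a tetranucleotide repeat.
        if unit in (unit + unit)[1:-1]:
            continue
        repeats.append((match.start(), unit, len(match.group(0)) // unit_len))
    return repeats


if __name__ == "__main__":
    # Made-up sequences for illustration.
    print(find_tandem_repeats("GGATCTATCTATCTAGG", 4))        # tetranucleotide
    print(find_tandem_repeats("CCAAAAAAGT", 1, min_copies=5))  # mononucleotide
```

Only perfect repeats are detected; dedicated tools usually also tolerate mismatches and indels within the repeat array.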
Epigenetic modifications such as DNA methylation, chromatin remodeling and histone modification are heritable changes in gene expression which influence the phenotype [76]. Among these, DNA methylation is important and affects gene expression in plants and animals [77]. DNA methylation occurs at cytosine bases, within CpG dinucleotides or at CpNpG (N = A, C or T) sites [78]. CpG-rich regions are termed CpG islands. To classify a genomic region as a CpG island, three conditions must be fulfilled: (i) the GC content should be above 50%; (ii) the length of the CpG/CpNpG region should be greater than 200 bp; (iii) the ratio of observed to expected CpG dinucleotide number should be above 0.6 [18]. CpG islands are present at or near the transcription start site of a gene and may regulate tissue-specific gene expression [79]. DNA of plant species has been shown to contain more CpG dinucleotides than human DNA [80]. Methylation of cytosine at CpG islands has been shown to restrict the access of transcription factors to the promoter region of genes, hence preventing their expression [81]. Cytosine methylation patterns are not static; they change substantially with developmental state or with environmental conditions across the plant genome [82]. DNA methylation has also been shown to play an important role in plant embryogenesis, seed development, regulation of the immune response to infection by pathogens, environmental adaptation and stress resistance. Defects in methylation can cause defects in embryogenesis such as abnormal cell division and reduced seed viability, as well as developmental retardation, reduced plant size and partial sterility [77,83,84]. CpG/CpNpG analysis revealed the occurrence of CpG/CpNpG islands in the second half of the promoter region (towards the 3' end) of all the OsPRs except OsPR2, but none were identified in AtPRs (Table 6). The absence of CpG/CpNpG islands in OsPR2 might be due to spontaneous deamination in the germline during evolution [85]. The study performed by Ferguson and Jiang [86] also showed that monocot genomes have a higher GC content than dicot genomes. The absence of CpG islands in A. thaliana as compared to the O. sativa genome may be due to differences in codon usage, GC-biased gene conversion and mutational biases prevalent in the two species [87].
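The three CpG island conditions listed above can be checked for a single candidate region as in the sketch below. This is an illustration under stated assumptions, not the PlantPAN procedure: the observed-to-expected ratio is computed with the commonly used definition (number of CpG × region length) / (number of C × number of G), which the text does not spell out, the function names `cpg_stats` and `is_cpg_island` are hypothetical, and the whole input is treated as one region rather than scanned with a sliding window.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string.

    Expected CpG count is taken as (count of C * count of G) / length,
    an assumed, commonly used definition.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / n if n else 0.0
    expected = (c_count * g_count) / n if n else 0.0
    obs_exp = cpg_count / expected if expected else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three criteria from the text to one candidate region."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) > min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    # Synthetic sequences for illustration only.
    print(is_cpg_island("CG" * 150))  # GC-rich, CpG-dense, 300 bp
    print(is_cpg_island("AT" * 150))  # no G or C at all
```

In practice the criteria are usually evaluated in windows sliding along the promoter, and overlapping windows that pass are merged into one island.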

In silico analysis of PR genes expression
Meta-analysis of Genevestigator microarray datasets was performed on A. thaliana and O. sativa PR genes (Fig 8a and 8b). AtPR10 was highly expressed in almost all stages of development except senescence, where AtPR12 showed the highest expression. AtPR1 and AtPR2 expression was maximal during the developed rosette stage and minimal during the seedling stage. During the germinating seed stage, AtPR5, AtPR9, AtPR10 and AtPR12 showed the same level of expression. Analysis of expression patterns of OsPRs revealed that OsPR10 showed the highest expression at the dough stage, whereas OsPR12 appeared to be highly expressed at the germination stage. OsPR9 and OsPR10 shared a similar expression pattern at many developmental stages except the heading, milk and dough stages. OsPR2 and OsPR12 also shared low expression during the seedling, tillering and stem elongation stages. Expression of A. thaliana and O. sativa PR genes in response to hormonal treatments and abiotic stresses was also investigated using Genevestigator (Fig 9a and 9b). Based on the available microarray data, AtPR1, AtPR2 and AtPR5 were highly up-regulated by external application of salicylic acid, whereas AtPR9 and AtPR10 were down-regulated. AtPR1 and AtPR2 were also up-regulated by IAA and ABA, respectively. Among O. sativa PRs, OsPR9 and OsPR12 were minimally up-regulated by the application of jasmonic acid.
Furthermore, the expression profiles of AtPR genes in response to different abiotic stresses (cold, drought, heat, osmotic, salt and wounding) were also analyzed. Up-regulation was observed in AtPR1 and AtPR2 under drought; AtPR1, AtPR2, AtPR5 under osmotic stress and in wounding; and in AtPR2 under salt stress. Down-regulation was observed in AtPR1 and AtPR2 under heat stress; AtPR9 under salt stress and AtPR10 under cold stress. Expression profiles of OsPRs for cold, drought and salt were also retrieved. OsPR5 was highly and OsPR2 was minimally up-regulated, whereas OsPR1, OsPR9 and OsPR10 were down-regulated under drought stress. Under cold stress, OsPR10 was slightly up-regulated and OsPR1 was minimally down-regulated.
The present study involved identification of different cis-elements associated with biotic and abiotic stresses present in 5' UTR sequences of PR genes of A. thaliana and O. sativa. An effort has been made to validate the functions of PR cis-elements with the microarray data and with the literature, wherever available. Microarray data indicated stage-specific expression of many AtPRs and OsPRs during different developmental stages. Among O. sativa PRs, OsPR5 is induced under drought stress and at the germinating stage; this may be due to the presence of the AtMYC2 and RY-element cis motifs, respectively. Microarray data also revealed slight up-regulation of the OsPR12 gene during the flowering stage, which can be linked to the presence of the A-box, a cis-regulatory element involved in day-length-specific control of flowering in O. sativa [88]. An oxidative stress responsive cis-element, ABRE, has been shown to up-regulate OsPR2, OsPR5 and OsPR10 under stresses such as drought and cold, providing a clue to its diverse roles [89]. Various elicitor-responsive elements such as the GCC box and W-box have been observed in AtPR5 and AtPR12. These genes were observed to be highly expressed during seed germination. AtPR1, AtPR2 and AtPR5 were up-regulated after wounding; this may be due to the presence of a wound-responsive element (W-box) in their promoter regions. AtPR1 and AtPR2 are highly expressed under drought stress, which may be because of the AtMYC2 element, which acts as a drought-inducible element. The TGACG motif present in AtPR5, AtPR9, AtPR10, OsPR2, OsPR5, OsPR9 and OsPR12 increases the production of secondary metabolites by arresting or delaying the cell cycle at the G1/S checkpoint. This motif tends to be present in the promoter region of Kip-related proteins (cell cycle inhibitors) and inhibits the active cyclin-dependent kinase/cyclin complex [90]. 
The combined data of cis-element and the meta-analysis of PRs provided insights into the role of many PRs, and hence this information can be employed in future studies of PRs from other plant species.

Conclusion
PR proteins play an important role in providing resistance to plants. Until now, no work has been reported on the cis-elements present in A. thaliana and O. sativa PRs. Therefore, in the present work, an in-silico approach was followed to study the presence of cis-elements in PR genes. We also tried to validate our in-silico work with wet lab studies wherever available. This work sheds light on the promoter regions of PRs, which can in turn provide new approaches for plant genetic engineering to protect crops against diseases.