Physico-chemical characterization and topological analysis of pathogenesis-related proteins from Arabidopsis thaliana and Oryza sativa using in-silico approaches

Plants are constantly threatened by biotic and abiotic stresses, and to overcome them they have evolved multiple mechanisms, including systematic accumulation of different phytohormones, phytoalexins and pathogenesis-related (PR) proteins. PR proteins are a group of low molecular weight proteins that are induced in plants under different stresses. In this paper, in-silico approaches are used to compare the physico-chemical properties of 6 PR proteins (PR1, PR2, PR5, PR9, PR10, PR12) of Arabidopsis thaliana and Oryza sativa. Topological analysis revealed transmembrane localization of PR2 and the absence of a transmembrane domain in PR10 in both model plants. Amino acid composition shows the dominance of small aliphatic amino acids, i.e. alanine, glycine and serine, in both plants. These results highlight the similarities and differences between the PRs of the two model plants, which provide clues to their diversified roles in plants.


Introduction
The ever-increasing human population and the drastic climate change observed in recent decades continue to pose a serious threat to the growth and productivity of agricultural crops. Crops encounter different types of environmental stresses, mainly categorized as biotic or abiotic. Abiotic stresses include salinity, cold, heat, drought, floods, heavy metals etc., whereas biotic stresses include attack by pathogens (bacterial, fungal, viral). Both biotic and abiotic stresses are detrimental to plant growth and development because they cause several metabolic dysfunctions and, in extreme cases, the death of the plant [1,2]. During the course of evolution, plants have evolved a broad range of defense mechanisms for survival under various stressful conditions. These mechanisms involve responses like activation of Reactive Oxygen Species (ROS); accumulation of different phytohormones like abscisic acid (ABA), ethylene (ET), jasmonic acid (JA), methyl jasmonate (MeJA) and salicylic proteins that is useful in planning laboratory experiments. Different physico-chemical properties like protein length, amino acid composition, molecular weight, aliphatic index, extinction coefficient, isoelectric point, half-life, instability index and grand average of hydropathicity can be analyzed. Several studies have shown the presence and expression of PR genes in various plants including Arabidopsis thaliana and Oryza sativa. In our earlier study, we compared cis-elements in the promoter regions of PR proteins using in-silico tools [26]. To the best of our knowledge, there is no report on the physicochemical properties and topology of the various PRs of A. thaliana and O. sativa. Hence, the present study was planned to analyze and compare the physicochemical properties and topology of 6 PR proteins (PR1, PR2, PR5, PR9, PR10 and PR12) of each of the two model plants.
This study will aid in understanding the diversification of the different PR proteins of A. thaliana and O. sativa. Further, it will shed light on the similarities and differences between A. thaliana and O. sativa PR protein sequences.

Results
The results on different PR proteins of Arabidopsis thaliana and Oryza sativa viz. PR1, PR2, PR5, PR9, PR10 and PR12 with respect to their accession numbers, physico-chemical properties, cellular localization, topology and signal peptides are presented in Tables 1-3.

Physico-chemical properties
Physico-chemical properties like protein length, molecular weight, isoelectric point (pI), total number of negatively and positively charged residues, extinction coefficient, instability index (II), aliphatic index (AI) and grand average of hydropathicity (GRAVY) for PRs of A. thaliana and O. sativa were computed using ExPASy ProtParam tool (

Topological analysis
Comparative topological analysis of different PR proteins of A. thaliana and O. sativa using various online tools indicated the presence of transmembrane domains in all PR proteins

Secondary structure prediction
The SOPMA tool was used to predict the percentage occurrence of secondary structure features (alpha helices, extended strands, beta turns and random coils) in the PR proteins of A. thaliana and O. sativa (Figs 3 and 4). The analysis revealed that random coils were the most frequent feature in PR1 and PR5 of both A. thaliana and O. sativa, whereas alpha helices were the most frequent in PR2, PR9 and PR12 of both plants. For PR10, extended strands were the most frequent feature in A. thaliana, whereas alpha helices were the most frequent in O. sativa.

PLOS ONE

Discussion
The present study focused on in-silico analysis of the physico-chemical properties of 6 PR proteins (PR1, PR2, PR5, PR9, PR10, PR12), together with their subcellular localization, topology and signal peptides. For each type of PR, protein length and molecular weight showed little variation between the two species (A. thaliana and O. sativa). However, protein lengths and molecular weights of the different PRs (PR1, PR2, PR5, PR9, PR10 and PR12) within each plant differed substantially. Subcellular localization, interactions and solubility depend upon the isoelectric point and the number of positively and negatively charged residues. The pI is the pH at which a protein carries no net charge, i.e. the sum of its negative charges equals the sum of its positive charges. A pI value above 7 was observed for PR12 of both species, PR1 and PR9 of A. thaliana, and PR2 and PR5 of O. sativa. For PR10 of both plants, the pI value was below 7. These findings are in line with earlier studies showing that PR1 and PR2 can be either acidic or basic in nature [6,44]. The acidic nature of PR10 observed for both species agrees with an earlier study which also showed PR10 to be acidic [45]. The extinction coefficient (EC) of a protein is an important parameter describing the amount of light absorbed per mole of protein at a given wavelength, most commonly 280 nm. The EC of a protein is calculated from the number of tryptophan, tyrosine and cystine residues per molecule, because these residues contribute significantly to the measured optical density of a denatured protein in the 276-282 nm range [28,46,47]. In the present study, the minimum EC value (500 M⁻¹ cm⁻¹) was observed for OsPR12, and it arises solely from four cystines (a cystine is two cysteines joined by a disulphide bond).
For AtPR12, a relatively higher EC value (8980 M⁻¹ cm⁻¹) was observed, which is due to the presence of 1 tryptophan and 2 tyrosine residues in addition to four cystines (8 cysteines). Among the different PR proteins analysed, PR2 of both species was found to be tyrosine rich and PR5 was cystine rich.
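The two EC values quoted above can be reproduced from the residue counts alone. The following is a minimal sketch, assuming the per-residue molar absorption values used by ProtParam at 280 nm (tyrosine 1490, tryptophan 5500, cystine 125 M⁻¹ cm⁻¹); these constants and the function names are not given in the text and are introduced here for illustration only.

```python
# Assumed per-residue molar absorption at 280 nm (M^-1 cm^-1),
# as used by ProtParam; these constants are not stated in the text.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125  # one cystine = two disulphide-bonded cysteines


def extinction_coefficient(n_trp, n_tyr, n_cystine):
    """Estimate the molar extinction coefficient from residue counts."""
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def extinction_from_sequence(seq, all_cys_paired=True):
    """Same estimate from a one-letter sequence.

    If all_cys_paired is True, every pair of cysteines is assumed to
    form a cystine; otherwise cysteines contribute nothing.
    """
    seq = seq.upper()
    n_cystine = seq.count("C") // 2 if all_cys_paired else 0
    return extinction_coefficient(seq.count("W"), seq.count("Y"), n_cystine)


# Residue counts taken from the text: OsPR12 has four cystines only;
# AtPR12 has 1 Trp, 2 Tyr and four cystines.
print(extinction_coefficient(0, 0, 4))  # OsPR12
print(extinction_coefficient(1, 2, 4))  # AtPR12
```

With the counts given in the text, the sketch returns 500 for OsPR12 and 8980 for AtPR12, matching the reported values.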

The instability index (II) indicates protein stability under both in-vivo and in-vitro conditions. Proteins with II <40 are considered stable and those with II >40 are considered unstable [48]. The II of PR9 and PR10 of both species, and of PR1 and PR12 of A. thaliana only, was found to be less than 40, indicating their stable nature. PR5 of both plants, PR2 of A. thaliana only, and PR1 and PR12 of O. sativa only had an instability index of more than 40, indicating that they are unstable proteins. Apart from the instability index, the aliphatic index (AI) is another parameter used to assess protein stability. The AI of a protein is defined as the relative volume occupied by the aliphatic side chains of A (alanine), V (valine), L (leucine) and I (isoleucine). A good correlation between AI and protein thermostability was established earlier by Ikai [49], and a high AI value indicates that the protein is stable over a wide range of temperatures. Among the PRs of A. thaliana and O. sativa, AtPR5 (59.63) and OsPR1 (59.94) have lower AI values than the other PRs, indicating that they are less thermostable and have a more flexible structure. Besides stability, the hydrophobic or hydrophilic character of a protein is analyzed with the GRAVY score. The GRAVY score of a protein is calculated as the sum of the hydropathy values of all amino acids present in the protein, divided by the number of residues in that protein. Its value typically lies between -2 and +2, where a negative score indicates hydrophilicity and a positive score indicates hydrophobicity [50]. Proteins with a more negative GRAVY score are considered hydrophilic with good solubility, and vice versa. A GRAVY score of more than 0.4 suggests that a protein is hydrophobic and difficult to detect on 2-D gels [51].
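The definitions of AI and GRAVY given above can be sketched in a few lines. This is an illustrative sketch only, assuming the Ikai weighting as implemented in ProtParam (AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent) and the Kyte-Doolittle hydropathy scale; neither the coefficients nor the scale values are listed in the text, and the example peptide is hypothetical, not one of the PR sequences studied.

```python
# Kyte-Doolittle hydropathy values (assumed scale for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aliphatic_index(seq):
    """Aliphatic index: weighted mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def gravy(seq):
    """GRAVY: sum of hydropathy values divided by the number of residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical peptide used only to exercise the functions.
peptide = "MKALVGSSLIAG"
print(round(aliphatic_index(peptide), 2), round(gravy(peptide), 3))
```

Both functions expect a sequence made up of the 20 standard one-letter codes; a residue outside the scale raises a `KeyError` in `gravy`, which is a deliberate choice so that ambiguous residues are not silently scored.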
In-silico approaches have been used to determine the subcellular localization of proteins, which plays an important role in inferring their function. Three different tools, viz. Cello, Euloc and Wolfpsort, were used to determine the subcellular localization of the PR proteins analysed in this study. Cello is a two-level support vector machine (SVM) classifier system, and its prediction of subcellular localization for a particular protein is considered acceptable if its reliability (confidence) value is at least 1 [52]. In our study, the minimum confidence scores were 1.5 for AtPR9 and 2.9 for OsPR10, whereas the maximum score of 4 was obtained for AtPR1, OsPR1, OsPR5, OsPR9 and OsPR12. Euloc is a hybrid tool that integrates three different approaches, namely homology search, Hidden Markov Model and SVM, for the detection of subcellular localization of proteins [31]. Wolfpsort predicts multi-site (nine sites) localization of a protein. It is a sequence-based prediction method that, by combining homology, functional motifs and sorting signals, has greatly improved the accuracy of subcellular localization prediction [53].
Subcellular localization predicted by Cello and Euloc was the same for all PR proteins of A. thaliana and O. sativa except PR1 of O. sativa and PR5 of A. thaliana. The Wolfpsort results for PR1, PR2, PR5 and PR9 differed from those predicted by Euloc, Cello or both. PR10 and PR12 were predicted to be cytoplasmic and extracellular, respectively, by all three tools. PR1 is an important antifungal protein and is known to be localized in the extracellular space. Pecenkov et al. [54] previously detected PR1 in vacuoles, vesicles of the cortical cytoplasm, endoplasmic reticulum bodies etc., using prolonged dark incubation in combination with salicylic acid treatment of A. thaliana seedlings. In our study also, PR1 was predicted to be localized in vacuoles (Euloc in O. sativa and Wolfpsort in A. thaliana). PR2 proteins are involved in a number of developmental processes as well as in defense against biotic stress. Differences in the localization of PR2 proteins have been reported between potato cultivars susceptible and resistant to PVY infection; the level of PR2 was higher in the cell walls, chloroplasts and vacuoles of the susceptible cultivar [55]. In a number of in vivo studies, PR5 proteins have shown anti-microbial activities [56,57]. PR5 shows sequence similarity with the sweet-tasting protein thaumatin, which is derived from Thaumatococcus daniellii, a shrub from West Africa. Hence, PR5 proteins are also called thaumatin-like proteins (TLPs) [58], although none of them tastes sweet like thaumatin. Based on their molecular weight, TLPs fall into two categories: a high molecular weight group in the range of 22-26 kDa and a low molecular weight group below 18 kDa.
High molecular weight TLPs have been shown to accumulate in cell vacuoles, whereas low molecular weight TLPs are extracellular [59]. PR5 of Gossypium hirsutum (GhPR5) has been shown to contain 242 amino acids, with a signal peptide at the N-terminal end that facilitates its secretion into the extracellular space and a C-terminal signal that directs it to vacuoles [60]. In our study, PR5 proteins were found to be of high molecular weight: 26 kDa for A. thaliana and 29 kDa for O. sativa. Despite its high molecular weight, PR5 of O. sativa was predicted to be extracellular by all three tools used in the study. However, one of the tools, Euloc, indicated the localization of PR5 in the vacuoles of A. thaliana. The extracellular location of PR5 predicted by Cello is indicative of the presence of a signal peptide at the N-terminal end in O. sativa as well as in A. thaliana. The vacuolar location of PR5 in A. thaliana predicted by Euloc indicates the presence of a signal peptide at the C-terminal end of the protein. PR9 proteins are peroxidases and provide resistance against pathogens. They are extracellular or transmembrane proteins that play an important role in plant cell wall construction by catalyzing lignification [61]. An earlier fluorescence microscopy study of the subcellular localization of a sweet potato peroxidase (swpa4) observed its expression in the extracellular space of the cell [62]. PR10 proteins are localized in the cytoplasm and are non-transmembrane proteins induced during several biotic and abiotic stresses [63]. In our study, we also found PR10 to be localized in the cytoplasm of A. thaliana and O. sativa. Defensins are small, globular proteins that belong to the PR12 type of PR proteins and are present in the extracellular spaces of plant cells. They are known to provide the first line of immunity against pathogen attack [64,65].
All the three tools used in the present study revealed the extracellular localization of PR12 of both A. thaliana and O. sativa.
For the topological analysis, 8 different tools were used (Table 3). Of these, 3 tools, viz. TMpred, PHDhtm and TMAP, gave similar results regarding the presence or absence of transmembrane domains for all 6 PR proteins in both species, whereas the other tools showed variable results for one or another PR protein. TMHMM indicated the presence of a transmembrane domain in AtPR5 and AtPR12 but not in OsPR5 and OsPR12. This might be due to a prediction error of the tool itself [66], or the transmembrane domains of OsPR5 and OsPR12 may not have met the cutoff of the tool [67].
Analysis of the amino acid composition of all PR proteins of A. thaliana and O. sativa revealed the dominance of amino acids with small aliphatic side chains (alanine, serine, glycine and threonine). Glycine is the smallest amino acid, with only a hydrogen atom as its side chain, and is often found in loop regions. The frequent presence of glycine has been reported in membrane proteins, mainly in the transmembrane helices, suggesting a structural role [68,69]. In the present study, the maximum percentage of glycine among the different PR proteins was observed for PR1, which was predicted to be a transmembrane protein by 6 of 8 tools in A. thaliana and 4 of 8 tools in O. sativa. PR10 was localized to the cytoplasm by the three tools used in the study; however, it also shows a high percentage of glycine in both plants. Alignment of PR10 of A. thaliana and O. sativa revealed the presence of a glycine motif (GXGGXG), which is also known as an RNA binding site. The glycine motif has been shown to be involved in different enzymatic processes, membrane binding and transport, biosynthesis of secondary metabolites, binding to phytohormones etc. [70]. The side chain of alanine, being non-reactive, is not directly involved in protein function but has a significant role in substrate recognition. Serine and threonine have a fairly reactive hydroxyl group which forms hydrogen bonds with a number of polar substrates. Serine forms a catalytic triad along with histidine and aspartic acid (Asp-His-Ser) in many hydrolases. In rare cases, serine is replaced by cysteine in the catalytic triad to fulfil the same role. Proline has been shown to act like a molecular chaperone which provides protection against abiotic and biotic stresses by enhancing the activities of some enzymes as well as maintaining the integrity of proteins [71].
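The composition analysis and the search for the GXGGXG glycine motif described above can be illustrated with a short sketch. It assumes a plain one-letter protein sequence as input; the function names and the example sequence are hypothetical and are not taken from the PR proteins studied here.

```python
import re
from collections import Counter


def composition_percent(seq):
    """Percentage of each amino acid, most abundant first."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.most_common()}


# GXGGXG, where X is any residue; the lookahead also reports
# overlapping occurrences of the motif.
GLYCINE_MOTIF = re.compile(r"(?=(G.GG.G))")


def find_glycine_motif(seq):
    """Return (1-based start position, matched segment) for each hit."""
    return [(m.start() + 1, m.group(1))
            for m in GLYCINE_MOTIF.finditer(seq.upper())]


# Hypothetical sequence fragment used only for illustration.
fragment = "MSAAGDGGNGKKTLSA"
print(composition_percent(fragment))
print(find_glycine_motif(fragment))
```

Reporting 1-based positions keeps the output comparable with residue numbering in alignment viewers; the pattern does not restrict the X positions, so hits should still be checked against the alignment.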

Conclusion
PR proteins are defense-related inducible proteins associated with resistance to various kinds of biotic and abiotic stresses in plants. In the present study, several bioinformatics tools were used to study the variations in the physicochemical properties and topology of 6 PRs each of A. thaliana and O. sativa. The results demonstrated that the PR2 proteins of A. thaliana and O. sativa are large proteins, with molecular weights of 37 kDa and 35 kDa, respectively, as are the PR9 proteins of A. thaliana (35 kDa) and O. sativa (42 kDa). Among the A. thaliana and O. sativa PRs, the maximum AI was observed for AtPR9 (97.17) and OsPR2 (85.96), respectively. For subcellular localization prediction, we used 3 tools, viz. Cello, Euloc and Wolfpsort. These tools gave similar results for almost all of the PRs except AtPR5, OsPR1 and OsPR2. This study sheds light on the similarities and differences among the physico-chemical properties, topology, amino acid composition and secondary structure features of 6 PRs (PR1, PR2, PR5, PR9, PR10, PR12) of A. thaliana and O. sativa, and will help in understanding the diversification and functional multiplicity of the various PR proteins of the two species.