Detection, Characterization and Evolution of Internal Repeats in Chitinases of Known 3-D Structure

Chitinase proteins have evolved and diversified in almost all organisms, from prokaryotes to eukaryotes. During evolution, internal repeats may appear in the amino acid sequences of proteins and alter their structural and functional features. Here we detected the internal repeats in Chitinases and characterized the structural similarities between them. Of the 24 diverse Chitinase sequences selected, six (2CJL, 2DSK, 2XVP, 2Z37, 3EBV and 3HBE) did not contain any internal repeats. Ten sequences contained repeats shorter than 50 residues, and the remaining 8 sequences contained repeats between 50 and 100 residues long. In two Chitinases, 1ITX and 3SIM, the internal repeats were found to be structurally similar when their secondary structures, taken from the secondary structure and 3-D structure data of the Protein Data Bank, were compared. Residues from the internal repeats of 3N17 and 1O6I also form part of the ligand-binding sites of these two proteins. Our analyses improve the understanding of the structural characteristics of internal repeats in Chitinase proteins.


Introduction
Chitin is one of the most abundant biopolymers in nature; it is an insoluble homopolymer of β-1,4-linked N-acetylglucosamine (GlcNAc) units [1]. Chitin serves a structural role in arthropods, including crustaceans and insects, as well as in mollusks, nematodes and worms. It is also found in fungi, where it makes up from less than 1% to more than 40% of the cell wall, depending on the species [2]. Chitinases are hydrolytic enzymes that break the glycosidic bonds in chitin. They occur in organisms that need either to reshape their own chitin or to dissolve and digest the chitin of invading fungi and animals.
Chitin has not been found in mammals. Nevertheless, several mammalian proteins with homology to fungal, bacterial or plant Chitinases have been identified [3]. Chitinases have been recognized to play important roles in self-defense against pathogens [4]. More recently, however, some Chitinases have been found to be expressed in response to environmental stresses such as cold, drought and high salt concentration [4]. Other Chitinases are reported to participate in important physiological processes of plants, such as embryogenesis and ethylene synthesis [4]. The variable effectiveness of specific Chitinases against different pathogens, together with the existence of microbial Chitinase inhibitors, led to the hypothesis that Chitinases may co-evolve with fungi in response to variation in pathogen defenses against chitinolytic activity [5].
The majority of protein sequences are aperiodic and usually fold into globular 3-D structures that carry out a wide range of functions. Researchers' efforts have focused mainly on these proteins, and as a result significant progress has been made in developing bioinformatics tools for their analysis [6,7]. However, a large fraction of protein sequences is periodic, consisting of arrays of repeats that lie directly adjacent to each other [8].
Intragenic duplications of genetic material have important biological roles because of their consequences for protein sequence and structure [9]. Bioinformatics tools are important for the analysis of protein repeats, with emphasis on sequences, 3-D structures and sequence-structure relationships, and for highlighting successful strategies for protein structure prediction [10]. Such tandem repeats are highly diverse, ranging from the repetition of a single amino acid to repeated domains of 100 or more residues. They are ubiquitous in genomes and occur in at least 14% of all proteins [11]. Detecting repeats first requires scoring the similarity between segments of a protein sequence, as in sequence alignment. Common methods for detecting such similarity, such as the dot-matrix method, rely on pairwise alignment of sequences [12]. The abundance of natural structured proteins with tandem repeats is inversely correlated with repeat perfection: the chance of finding a natural structured protein in the Protein Data Bank (PDB) (http://www.rcsb.org/pdb) increases as the level of repeat perfection decreases [10].
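The dot-matrix idea can be illustrated by comparing a sequence with itself: internal repeats show up as runs of dots parallel to the main diagonal, and the offset of such a run approximates the spacing between repeat copies. The minimal Python sketch below is an illustration only; the window size, the `min_matches` threshold and the toy sequence are arbitrary assumptions, not parameters used in this study.

```python
def self_dot_matrix(seq, window=3, min_matches=3):
    """Return (i, j) pairs, i < j, where the windows starting at i and j
    share at least `min_matches` identical residues (0-based positions)."""
    n = len(seq)
    dots = []
    for i in range(n - window + 1):
        for j in range(i + 1, n - window + 1):
            matches = sum(seq[i + k] == seq[j + k] for k in range(window))
            if matches >= min_matches:
                dots.append((i, j))
    return dots


def diagonal_counts(dots):
    """Count dots on each off-diagonal; the offset j - i is a candidate
    spacing between repeat copies."""
    counts = {}
    for i, j in dots:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts


seq = "WDKNAGYEWD" * 2  # toy sequence: a 10-residue unit repeated twice
counts = diagonal_counts(self_dot_matrix(seq))
print(counts)  # {10: 8}
```

All dots fall on the diagonal offset by 10, the length of the invented repeat unit; with real, diverged repeats the run would be broken and a lower `min_matches` would be needed.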
When the conservation of residues in a repeat exceeds a certain threshold, the repetitive regions of proteins are predominantly disordered, and the main reason for residue conservation in such tandem repeats may shift from a structural one to an evolutionary one [13]. Internal repeats in Chitinases may therefore be involved in the diversification of Chitinases with different structural and functional properties, and may also contribute to the rapid evolution of Chitinases across organisms. Repetitive sequences appear to have formed after the prokaryotic-eukaryotic divergence, by a mechanism with weak length-dependence such as recombination. Repetitive proteins evolve faster than non-repetitive proteins [11]. Studies of protein repeats have highlighted the multi-functionality of repeat types, their structural differences and their proliferation in different evolutionary lineages. One likely reason for their evolutionary success is that repeat-containing proteins are relatively "cheap" to evolve: large and thermodynamically stable proteins can arise by the simple expedient of intragenic duplication, rather than by the more complex process of de novo α-helix and β-sheet creation [14].

Selected sequences of Chitinase
Chitinase sequences were obtained from the PDB [15]. Of the 147 Chitinase sequences of known structure retrieved from the PDB, 34 were selected using a 50% sequence identity cutoff; these include both eukaryotic and prokaryotic Chitinases. Ten of the 34 sequences lacked the Chitinase domain and were excluded from further analysis. The remaining 24 Chitinase sequences were used for the detection of internal repeats and for secondary structure analysis (Table 1).
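The text does not state how the 50% identity cutoff was applied (the PDB also distributes precomputed sequence clusters). One common approach is a greedy filter, sketched below under that assumption: walk through the entries and keep one only if it is less than 50% identical to every entry already kept. The function name and the identity values in the example are hypothetical.

```python
def greedy_nonredundant(ids, identity, cutoff=0.5):
    """Return the entries kept by a greedy redundancy filter.

    An entry is kept only if its identity to every entry already kept is
    below `cutoff`; `identity(a, b)` must return a fraction between 0 and 1.
    The input order decides which member of a redundant group survives.
    """
    kept = []
    for pid in ids:
        if all(identity(pid, k) < cutoff for k in kept):
            kept.append(pid)
    return kept


# Hypothetical identities between three invented entries (not real PDB data).
toy = {frozenset("AB"): 0.80, frozenset("AC"): 0.30, frozenset("BC"): 0.35}


def toy_identity(a, b):
    return toy[frozenset((a, b))]


print(greedy_nonredundant(["A", "B", "C"], toy_identity))  # ['A', 'C']
```

Whether the boundary is "below 50%" or "at most 50%" is another unstated detail; the `<` comparison is an assumption.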

Detection of internal repeats using RADAR
We used RADAR (Rapid Automatic Detection and Alignment of Repeats) (http://www.ebi.ac.uk/Tools/pfa/radar/) to identify internal repeats in the protein sequences. Many large proteins have evolved by internal duplication, and many internal sequence repeats correspond to functional and structural units. RADAR automatically segments the query sequence into repeats and identifies short composition-biased repeats and gapped approximate repeats, as well as complex repeat architectures involving many different types of repeats in the query sequence [16]. The segmentation procedure has three steps: (i) the repeat length is determined from the spacing between suboptimal self-alignment traces; (ii) the repeat borders are optimized to yield the maximal integer number of repeats; and (iii) distant repeats are validated by iterative profile alignment.
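The idea behind steps (i) and (ii) can be sketched in a few lines, with the caveat that RADAR's actual scoring, gapped traces and border optimization are considerably more elaborate. The sketch below rests on several simplifying assumptions: traces are ungapped, scored with +2/-1 for identity/mismatch instead of a substitution matrix, and the repeat length is taken as the offset of the best-scoring self-alignment trace. The function names are hypothetical and not part of RADAR.

```python
def best_trace(seq, d, match=2, mismatch=-1):
    """Best ungapped local alignment of seq against itself shifted by d,
    found with a Kadane (maximum-subarray) scan along that diagonal.
    Returns (score, start, end), 0-based inclusive positions on the first copy."""
    best = (0, 0, -1)
    run, run_start = 0, 0
    for i in range(len(seq) - d):
        step = match if seq[i] == seq[i + d] else mismatch
        if run + step <= 0:
            run, run_start = 0, i + 1  # restart the trace after this position
        else:
            run += step
            if run > best[0]:
                best = (run, run_start, i)
    return best


def estimate_repeats(seq, min_offset=5):
    """Take the offset of the highest-scoring trace as the repeat length L,
    then count how many whole units of length L fit in the region it covers."""
    traces = {d: best_trace(seq, d) for d in range(min_offset, len(seq))}
    L = max(traces, key=lambda d: traces[d][0])
    _, start, end = traces[L]
    region_len = end + L - start + 1
    return L, start, region_len // L


toy = "MS" + "QWERTYKL" * 3 + "PG"  # invented: three copies of an 8-residue unit
print(estimate_repeats(toy))  # (8, 2, 3)
```

In a tandem array a second, weaker trace appears at twice the unit length (offset 16 here), which is why the spacing between traces, rather than any single trace, carries the repeat length in RADAR's description.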

Computing the % identity between the repeat sequences detected by RADAR
As RADAR reports only a Z-score between the repeats, we computed the % identity between each pair of repeats, or between all repeats of a tandem set (more than two repeats), in a protein using the Smith-Waterman server (EMBOSS Water) at the European Bioinformatics Institute (http://www.ebi.ac.uk/Tools/psa/emboss_water/) [17,18]. The structural relatedness of the proteins was assessed from the average root-mean-square deviation (RMSD) of Cα atoms and the Z-score between structures. The structural similarity of the 24 Chitinase structures was assessed using the PDBeFold server [19]. The PDB structures were downloaded from the RCSB website (http://www.rcsb.org/pdb) and their coordinates were uploaded to the server. PDBeFold structural similarity searches were conducted through the web interface at http://www.ebi.ac.uk/msd-srv/ssm/.
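To make the % identity figure concrete, the sketch below implements a plain Smith-Waterman local alignment with traceback and reports identity as identical aligned pairs divided by alignment length, gap columns included. This is a simplified stand-in, not the EMBOSS Water program: Water scores with a substitution matrix (BLOSUM62 by default for proteins) and affine gap penalties, whereas this sketch assumes a +2/-1 match/mismatch score and a linear gap penalty of -2, so its numbers will not reproduce the server's output.

```python
def water_identity(a, b, match=2, mismatch=-1, gap=-2):
    """Percent identity of the best local (Smith-Waterman) alignment of a and b.

    Identity = identical aligned pairs / alignment length, gap columns included.
    Uses a simple match/mismatch score and a linear gap penalty.
    """
    def s(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0, H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j, length, identical = bi, bj, 0, 0
    while H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return 100.0 * identical / length if length else 0.0


# Two invented repeat copies differing at one position.
print(round(water_identity("ACDEFGHIK", "ACDWFGHIK"), 1))  # 88.9
```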

Visualization using RasMol
RasMol is a molecular graphics tool used for the depiction and exploration of biological macromolecular structures, such as those found in the PDB [20]. The secondary structure regions corresponding to the internal repeat sequences were used for structural analysis. The secondary structure of each Chitinase was retrieved from the PDB, and the region corresponding to each repeat was located within it. The PDB files of all Chitinases were downloaded and edited so that the coordinates of each repeated segment were extracted into a separate file; the repeated regions were then visualized in 3-D using RasMol, both within the whole structure and as separated segments for comparison.
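The file-editing step, extracting the coordinates of one repeat into its own PDB file, can be automated. The sketch below is a minimal version that relies only on the fixed-width PDB format (chain identifier in column 22, residue number in columns 23-26). It assumes a single chain of interest and ignores insertion codes and alternate locations; the function name, file names and chain "A" in the usage comment are hypothetical.

```python
def extract_segment(pdb_lines, chain, start, end):
    """Return the ATOM records of `chain` whose residue numbers fall in
    start..end (inclusive), followed by an END record.

    Fixed-width PDB columns: chain ID in column 22 and residue sequence
    number in columns 23-26 (0-based slices [21] and [22:26]).
    """
    kept = [line for line in pdb_lines
            if line.startswith("ATOM")
            and line[21] == chain
            and start <= int(line[22:26]) <= end]
    return kept + ["END"]


# Hypothetical usage (file names and chain A are assumptions):
# with open("1ITX.pdb") as fh:
#     segment = extract_segment(fh.read().splitlines(), "A", 33, 114)
# with open("1ITX_repeat_33_114.pdb", "w") as out:
#     out.write("\n".join(segment) + "\n")
```

The resulting file can then be opened in RasMol like any other coordinate file.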

Multiple sequence alignment and phylogenetic tree
Multiple sequence alignment was carried out using ClustalW [21] and MUSCLE [22]. The phylogenetic tree was constructed using the Neighbor-Joining method implemented in MEGA [23]. Bootstrap analysis with 10,000 replicates was used to assess the robustness of the branches.
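For readers unfamiliar with Neighbor-Joining, the sketch below shows the core loop on a precomputed distance matrix: repeatedly join the pair minimizing the Q criterion, compute their branch lengths, and replace them by a new internal node. It is an illustration of the textbook algorithm, not MEGA's implementation. MEGA first derives distances from the alignment under a chosen substitution model, which the text does not specify, and bootstrapping repeats the whole procedure on alignments whose columns are resampled with replacement. The matrix in the example is a standard five-taxon textbook example, not Chitinase data.

```python
def neighbor_joining(names, D):
    """Neighbor-Joining on a symmetric distance matrix D (list of lists).

    Returns an unrooted tree in Newick format. Needs at least three taxa.
    """
    nodes = list(names)
    D = [list(map(float, row)) for row in D]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in D]  # net divergence of each node
        # Join the pair minimising Q(i, j) = (n - 2) * D[i][j] - r[i] - r[j].
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        new = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distance from u to every other node k.
        du = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        D = ([[D[a][b] for b in keep] + [du[a]] for a in keep]
             + [[du[b] for b in keep] + [0.0]])
        nodes = [nodes[k] for k in keep] + [new]
    # Three nodes left: join them at a single central node.
    d01, d02, d12 = D[0][1], D[0][2], D[1][2]
    lengths = [(d01 + d02 - d12) / 2, (d01 + d12 - d02) / 2, (d02 + d12 - d01) / 2]
    return "(" + ",".join(f"{x}:{l:g}" for x, l in zip(nodes, lengths)) + ");"


names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(names, D))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```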

Internal repeats analysis
RADAR was run on the 24 selected Chitinase sequences from various organisms to detect internal repeats. Six of the 24 sequences (2CJL, 2DSK, 2XVP, 2Z37, 3EBV and 3HBE) did not contain any internal repeats. Each of the remaining sequences contains at least two repeats, and some Chitinases contain more than two. For example, 3IAN, 3N17, 3ARX, 2DKV, 1ITX and 1WB0 contain two repeated regions; 3ARX and 3ALF contain two tandem repeats, and 3QOK contains four tandem repeats. The number of residues in the repeat regions also varies: ten sequences contained repeats shorter than 50 residues, and the remaining 8 sequences contained repeats between 50 and 100 residues long. Table 2 shows the % identity between pairs of repeats, or among tandem repeats, in each Chitinase. The extent of sequence identity between the internal repeats shows that, in general, shorter repeats have higher % identity while longer repeats have lower % identity. This suggests that the repeats have diverged considerably since the duplication event.
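The trend that shorter repeats have higher % identity could be quantified with a rank correlation between repeat length and inter-repeat identity. No such statistic is reported here; the sketch below only shows how Spearman's rho could be computed from two lists (e.g., lengths and identities read from Table 2). The example values are invented and are not data from this study.

```python
def ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented example: repeat lengths (residues) and inter-repeat % identity.
lengths = [20, 40, 60, 90]
identity = [70, 55, 35, 30]
print(round(spearman(lengths, identity), 3))  # -1.0
```

A negative rho would support the inverse relationship; with only 18 repeat-containing proteins, its significance would need to be checked before drawing conclusions.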

Fold distribution of Chitinases
The Chitinases are very diverse in sequence, yet they adopt only a limited number of folds. Analysis of the folds using the CATH database (http://www.cathdb.info) reveals that they belong to two major folds, namely (i) the triosephosphate isomerase (TIM) barrel fold and (ii) the Endochitinase fold. The TIM barrel is a conserved protein fold consisting of eight α-helices and eight parallel β-strands that alternate along the peptide backbone [24]. Of the 24 Chitinases considered, 18 belong to the TIM barrel fold and 6 to the Endochitinase fold.
Inter-repeat % sequence identity among TIM barrel fold sequences
As a number of TIM barrel fold Chitinases contain long repeats, we assessed the % sequence identity across the various repeats in this fold using the EMBOSS Smith-Waterman local alignment algorithm. Interestingly, the repeat regions of Chitinases including 1ITX, 3ARX and 1KFW all share >40% sequence identity with one another (Table 3). Analysis of the DXDXE functional motif shows that it is conserved in all sequences of the TIM barrel fold, whereas none of the sequences belonging to the Endochitinase fold contain it. This motif is also present in the RADAR-detected internal repeat regions of 1ITX, 3ARX, 3G6M and 1KFW. Inter-sequence repeat comparisons between Chitinases containing internal repeats and those without internal repeats gave identities below 25%.
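Scanning for the DXDXE motif, where X stands for any residue, and checking whether a hit falls inside a RADAR repeat are simple string operations. The sketch below uses a regular expression with a lookahead so that overlapping hits are also reported. It follows the motif exactly as written in the text; the function names, the toy sequence and the repeat boundaries are hypothetical.

```python
import re

# D-X-D-X-E, X = any residue; the zero-width lookahead reports overlapping hits.
MOTIF = re.compile(r"(?=D.D.E)")


def motif_positions(seq):
    """1-based start positions of every D-X-D-X-E occurrence in seq."""
    return [m.start() + 1 for m in MOTIF.finditer(seq)]


def hits_in_repeats(positions, repeats, motif_len=5):
    """Motif starts whose whole 5-residue span lies inside one of the
    repeat regions, given as 1-based inclusive (start, end) pairs."""
    return [p for p in positions
            if any(s <= p and p + motif_len - 1 <= e for s, e in repeats)]


toy_seq = "MKDGDAEQQDFDLE"  # invented sequence with two motif copies
hits = motif_positions(toy_seq)
print(hits)                              # [3, 10]
print(hits_in_repeats(hits, [(1, 8)]))   # [3]
```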

3-D structural similarity between the Chitinases
The RMSD and Z-scores obtained from pairwise structural alignments of the Chitinases belonging to the TIM barrel fold and the Endochitinase fold are given in Table S1 and Table S2, respectively. In general, all the structures retain similar three-dimensional structures, as revealed by the low RMSD values and high Z-scores. Among the Chitinases belonging to the TIM barrel fold, the structures of 3G6M, 1O6I, 3ARX, 3OA5, 1ITX, 1KFW, 1WBO, 3QOK and 3ALF share an RMSD <2.0 Å. Interestingly, within this set, 3G6M, 3ARX, 1ITX and 1KFW also share reasonable inter-repeat % identity (Table 3). The other proteins belonging to the TIM barrel fold share RMSD >2.0 Å (Table S1).
Among the Chitinases belonging to the Endochitinase fold, most share RMSD <2.0 Å, and the pairs 3HBE vs 2CJL and 3HBE vs 1WVV show particularly low RMSD. Notably, of these three proteins, neither 2CJL nor 3HBE contains any repeats (Table 2).
3-D structural similarity within repeats (intra-repeat) in Chitinases
In many cases the repeats are too divergent for their structural similarity to be recognized by visual inspection. Structural alignment of these repeats may uncover more similar members and provides an objective way to identify truly dissimilar structural repeats. We therefore carried out structural superposition of the repeats of Chitinases belonging to the TIM barrel fold. The RMSD between superposed repeats ranges from 0.70 Å to 3.8 Å (Table 2). Excluding short repeats, the variation of RMSD with the % sequence identity of intra-protein repeats in 8 Chitinases belonging to the TIM barrel fold is plotted in Figure 1. Repeats in 3ARX, 1ITX, 3BXW and 3G6M show larger structural deviation (RMSD >2.5 Å), whereas repeats in 3N17, 3ALF, 1KFW and 3QOK show lower structural divergence (RMSD <2.5 Å).
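PDBeFold performs the superposition internally. For readers who want to reproduce an RMSD between two repeats themselves, the Kabsch algorithm computes the minimal RMSD after optimal rigid-body superposition. The NumPy sketch below assumes the two coordinate arrays are already paired residue by residue (for example, Cα atoms matched through a sequence or structural alignment of the repeats); establishing that pairing is the harder part in practice and is not shown. The test coordinates are random, not Chitinase data.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Rows of P and Q must be equivalent atoms (e.g. paired C-alpha atoms).
    Only proper rotations are allowed; reflections are corrected for.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)          # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                     # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking P onto Q
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a structure should give an RMSD of essentially zero, which is a convenient sanity check before comparing real repeats.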

Structural visualization of internal repeats in Chitinase
The internal repeats identified using RADAR were used to extract the secondary structure of the repeat regions from the complete secondary structure of each Chitinase. When comparing the identified internal repeat amino … structural arrangements (Figures 2 and 3). In other cases, although repeats could be identified on the basis of sequence similarity, no structural similarity could be observed.
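The comparison of repeat secondary structures was done visually. As a rough numerical complement, which is an assumption and not part of this study, the DSSP strings of two repeats could be reduced to three states and their helix/strand/coil make-up compared. The sketch below uses one common reduction (H, G, I to helix; E, B to strand; everything else to coil); other conventions treat B or G differently. The function names and example strings are hypothetical, and the DSSP strings for a repeat (for 1ITX, residues 33-114 and 235-317 per Figure 2) would first have to be sliced from the full assignment.

```python
# One common 8-to-3 state reduction; other conventions assign B or G differently.
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(ss):
    """Map DSSP codes to H (helix), E (strand) or C (everything else)."""
    return "".join(THREE_STATE.get(c, "C") for c in ss)


def composition(ss):
    """Fractions of H, E and C in the reduced secondary-structure string."""
    r = reduce_dssp(ss)
    return {s: r.count(s) / len(r) for s in "HEC"}


def composition_distance(ss1, ss2):
    """Sum of absolute differences in H/E/C fractions (0 = identical make-up).
    Ignores the order of elements, so it is only a coarse similarity measure."""
    c1, c2 = composition(ss1), composition(ss2)
    return sum(abs(c1[s] - c2[s]) for s in "HEC")


print(reduce_dssp("HHGETTS "))  # HHHECCCC
```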

Analysis of amino acid residues of repeat segments present in ligand binding site
We further analyzed whether residues in the repeat segments are involved in ligand binding. Excluding very small ligands such as sulphate, phosphate and glycerol, we observed binding of N-acetyl-D-glucosamine (NAG) in 3N17 and of a cyclic dipeptide, C14, in 1O6I. In 3N17 Chi A, apart from residues Gln 109 and Ala 287, Gln 145 from repeat 1 and Asn 228 from repeat 2 are involved in binding NAG. Likewise, in 1O6I, residues Met 212 and Tyr 214 from repeat 1 are involved in binding the cyclic dipeptide C14, whereas the other binding-site residues, Trp 97, Glu 144 and Trp 403, are not part of the repeated segments (Figure 4).
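The text does not say how the binding-site residues were identified (for example, from PDB site annotations or from atomic contacts). The sketch below assumes a simple contact definition, where any protein atom within 4.0 Å of a ligand atom counts as binding, and then labels each binding residue with the repeat that contains it. The cutoff, function names, coordinates and repeat ranges are all illustrative assumptions.

```python
import math


def binding_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Residue numbers having any atom within `cutoff` angstroms of any ligand atom.

    protein_atoms: iterable of (residue_number, (x, y, z)) pairs.
    ligand_atoms: iterable of (x, y, z) coordinates.
    """
    ligand = list(ligand_atoms)
    return sorted({res for res, xyz in protein_atoms
                   if any(math.dist(xyz, lig) <= cutoff for lig in ligand)})


def repeat_of(residues, repeats):
    """Map each residue number to the 1-based index of the repeat containing it,
    or to None when it lies outside every repeat."""
    return {r: next((k for k, (s, e) in enumerate(repeats, 1) if s <= r <= e), None)
            for r in residues}


# Toy coordinates and repeat ranges, invented for illustration.
atoms = [(10, (0.0, 0.0, 0.0)), (20, (3.0, 0.0, 0.0)), (30, (10.0, 0.0, 0.0))]
site = binding_residues(atoms, [(1.5, 0.0, 0.0)])
print(site)                                  # [10, 20]
print(repeat_of(site, [(15, 25), (40, 50)]))  # {10: None, 20: 1}
```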

Multiple sequence alignment and phylogenetic analysis of Chitinases
The multiple sequence alignments of the 18 TIM barrel fold Chitinases and the 6 Endochitinase fold Chitinases considered in this study are shown in Figure S1 and Figure S2, respectively. Wherever present, the repeat segments are marked in the sequences. Because the Chitinases considered form a diverse set of sequences, no uniformity in the location of the repeats could be observed. The phylogenetic tree revealed two major clusters with 100% bootstrap support, one containing all Chitinases of the TIM barrel fold and the other those of the Endochitinase fold (Figure 5). We also performed phylogenetic analyses for the Chitinases of each fold separately. The phylogenetic relationships of the Endochitinase fold Chitinases are similar to those in the combined analysis (Figure S3), but the relationships of the TIM barrel fold Chitinases show some discrepancies with the combined analysis (Figure S4), suggesting that sequence divergence is higher among the TIM barrel Chitinases.
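Bootstrap support values such as the 100% reported for the two fold clusters come from pseudo-replicate alignments: columns of the original alignment are drawn with replacement, a tree is rebuilt from each replicate with the same method, and a cluster's support is the percentage of replicate trees that contain it. The sketch below shows only the resampling step plus an uncorrected p-distance that could feed a distance-based tree builder. It is an illustration under stated assumptions (gap columns skipped pairwise, no substitution-model correction), not MEGA's procedure, and the function names and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One bootstrap pseudo-replicate: alignment columns drawn with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so replicates are reproducible.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


toy_aln = {"x": "ACDE-", "y": "ACDF-"}  # invented two-sequence alignment
replicate = bootstrap_alignment(toy_aln, random.Random(1))
print(p_distance(toy_aln["x"], toy_aln["y"]))  # 0.25
```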

Conclusions
Sequence comparison of Chitinases from different eukaryotic and prokaryotic organisms reveals the occurrence of internal repeats in most cases. The Chitinases considered here adopt two major folds, namely, the TIM barrel fold and the Endochitinase fold. There are large differences both in the number of internal repeats and in the number of residues in each repeat. The present study reveals that, in general, intra-protein repeats longer than 50 residues show low % identity, reflecting the considerable divergence that has taken place since the duplication event. Repeats in some Chitinases belonging to the TIM barrel fold also show considerable structural divergence, as revealed by higher RMSD values. The sequence location of the repeats is also not uniform. Interestingly, in spite of divergence at the sequence level, almost all the structures considered in the present study retain similar three-dimensional folds, as revealed by the low RMSD values. Many large proteins have evolved by internal duplication, and many internal sequence repeats correspond to functional and structural units [16]. The present study suggests that the internal repeats in Chitinases do not disturb their stability or alter their overall structure or function.
Table S3. RMSD and Z-scores of structural superposition of proteins belonging to the Endochitinase fold. (PDF)

Figure 2. Internal repeats with their corresponding secondary structure. The internal repeats identified using RADAR were compared with their secondary structure using the secondary structure database of the PDB. Secondary structure codes: T, turn; E, β-strand; G, 3₁₀-helix; B, β-bridge; S, bend; H, α-helix. These five repeats showed similar secondary structures between the internal repeats of the corresponding Chitinase sequences. A: 1ITX (Bacillus circulans), repeat regions 33-114 and 235-317, and 159-213 and 360-428, with their DSSP secondary structure assigned from the PDB; B: 3SIM (Crocus vernus), internal repeat regions 156-178 and 187-212 with their secondary structure assignments. doi:10.1371/journal.pone.0091915.g002

Table 1. List of Chitinase amino acid sequences used in the present study.

Table 2. List of internal repeats identified in different Chitinase sequences available in the Protein Data Bank, with % identity and RMSD between the repeats. doi:10.1371/journal.pone.0091915.t002