Identification of a Novel Calcium Binding Motif Based on the Detection of Sequence Insertions in the Animal Peroxidase Domain of Bacterial Proteins

Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440 that we have called PepA encodes a two-domain ANP. The alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca2+ coordination although a sequence analysis did not reveal the presence of any known calcium binding motifs in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif bound specifically calcium ions with affinities ranging between 33–79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca2+ binding with a KD of 12 µM and stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after its reconstitution with heme. These data led to the definition of a novel calcium binding motif that we have termed PERCAL and which was abundantly present in animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium binding motifs, namely PERCAL and the related hemolysin type calcium binding motif, with the latter being located outside the catalytic domains and in their C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single domain peroxidases, was divided into two major clusters, representing domains with and without PERCAL motif containing insertions. We have verified that the recently reported classification of bacterial heme peroxidases in two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.


Introduction
Bacterial as well as eukaryotic proteins have evolved to recognize calcium ions through a number of structural motifs that can be identified by specific consensus sequences. In eukaryotic cells Ca2+ is a common intracellular second messenger and affects nearly every aspect of cellular life [1]. In bacteria, calcium is known to have an important structural role in maintaining the integrity of the outer lipopolysaccharide layer and the cell wall [2]. The increasing number of proteins known to contain Ca2+ binding motifs supports the importance of calcium in protein stability, enzymatic activity or signal transduction [3]. Several types of Ca2+ binding motifs have been identified in bacterial proteins. These include the hemolysin-type calcium-binding (HTCaB) region [4], the EF-hand [3] and EF-hand-like domains [5], [6], [7].
HTCaB-containing proteins are extracellular, are exported by type I secretion systems [8] and are often determinants of pathogenesis, like the hemolysins of several Gram-negative pathogenic bacteria, or of symbiosis. Examples include the alkaline protease of Pseudomonas aeruginosa and the nodulation protein NodO. The alkaline protease adopts a beta-roll structure and calcium ions are bound in the loop regions connecting the strands of the roll. Calcium coordination is primarily achieved by interaction with aspartate side chains and glycine oxo-groups [4]. NodO from nitrogen-fixing bacteria presents at its C-terminal end an HTCaB-related calcium binding signature, which consists of multiple tandem repeats of a nonapeptide [9].
Multicellular behavior of the beneficial bacterium Pseudomonas putida KT2440 is sustained by two large extracellular bacterial adhesins, termed LapA and LapF [10]. LapA presents four tandem repeats of HTCaB and LapF three tandem repeats of the NodO calcium binding signature in their C-terminal ends [11]. In the same bacterium there is a third large extracellular protein, encoded by PP_2561, which also has a C-terminal fragment rich in HTCaB repeats. This protein is an important bacterial determinant of plant root colonization and induced systemic resistance against phytopathogens [12]. Remarkably this protein also presents HTCaB sites located on C-terminal extensions of two animal peroxidase-like (ANP-like) domains.
In spite of its designation, proteins of the animal peroxidase_like superfamily are not limited to metazoans and are also found in fungi or plants. This superfamily includes animal heme peroxidases (ANHEMP) and related proteins. Goat lactoperoxidase [13] and human myeloperoxidase [14], both mammalian enzymes, are among the best characterized members of this superfamily. More recently, two novel and so far uncharacterized families (cd09819 and cd09821) of bacterial heme peroxidases (BACHEMP) have been defined within this superfamily [15]. Members of these families have been found in Proteobacteria, Cyanobacteria, Actinobacteria and Chlamydiae; however, their distribution is not ubiquitous within these taxa and often only a small number of strains encode homologues in their genomes. Enzymes of this class had previously been shown to oxidize Mn(II) in fungi [16] and α-proteobacteria [17]. However, an in-frame deletion of a PP_2561 homologue in Pseudomonas putida GB-1 did not compromise Mn(II) oxidation in this strain [18], which indicates that this activity may not be mediated by BACHEMP in P. putida. Even though the biological relevance of the HTCaB sites present in PP_2561 is unknown, it has been suggested that it may be related to the Ca2+-mediated modulation of the Mn(II) oxidation activity, as observed for a related ANHEMP in Aurantimonas [17].
In this work we have examined in detail the sequence of the ANP-like domains of the protein encoded by PP_2561, renamed here as PepA (Pseudomonas extracellular peroxidase), and those of its homologues in other species. We were able to identify multiple short inserts of conserved sequence within these catalytic domains. Microcalorimetric titration of a peptide containing the consensus sequence of these inserts, and of the purified N-terminal ANP-like domain of PepA, revealed that these inserts specifically bind Ca2+ ions. Since this consensus did not match any of the previously defined sequences of calcium binding motifs (Table 1), we were able to define a novel motif, which was termed PERCAL (peroxidase calcium binding site). Sequence database searches strongly suggested that the PERCAL motif is abundantly present in pro- and eukaryotes.

Results
Identification of Sequence Inserts that are Rich in Glycine and Aspartic Acid within Animal Peroxidase-like Domains of Bacterial Proteins
A scan of the protein encoded by PP_2561 (PepA) at MyHits (http://myhits.isb-sib.ch/), which includes the PeroxiBase, PROSITE and Pfam databases, categorized this protein as a member of the animal heme peroxidase superfamily. To retrieve sequences of bacterial members of this superfamily, UniProtKB (Swiss-Prot and TrEMBL, release of July 13th 2010) was scanned using the PROSITE profile PS50292, which defines the animal heme peroxidase superfamily, and by applying a taxonomic filter for bacteria. One hundred and twelve sequences containing 125 hits in total were retrieved. Eight of these hits in seven sequences were excluded from further analysis due to an unusually short size of less than 200 amino acids. Interestingly, twelve sequences, including three from P. putida (strains KT2440, F1 and GB-1) and nine from different α-proteobacteria, presented two animal peroxidase-like (ANP-like) domains within the same protein.
Since these 12 protein sequences differed widely in length (between 2650 and 3619 amino acids), we decided to compare their 24 catalytic domains (matching profile PS50292) instead of the complete protein sequences. A sequence alignment of these ANP-like domains showed multiple insertions in some sequences (Fig. S1). In the case of the three P. putida homologues, 3 insertions were initially detected in each of the 6 domains. Most interestingly, an alignment of these 18 insert sequences revealed that they share significant sequence similarities, which are defined by the consensus G-x-D-G-x(2)-G-T/N-A-D-D (Text S1). An example of these insertions is shown in figure 1. Visual inspection of the sequences indicated that there are further sequence segments that show a high degree of similarity to the above consensus. These additional fragments were recognized when a less stringent motif, G-x-D-x(6)-D-D, was used for the sequence scan. Analysis of the three P. putida proteins with this less stringent motif resulted in the detection of 10 inserts per sequence. The consensus obtained from these 30 inserts identified in the P. putida sequences differed only slightly from the initial one, as Thr appeared besides Ala in the ninth position of the consensus (Table 2; Text S1). Interestingly, all these insertions were located within the two ANP-like domains of the proteins (Table S1). A scheme of the architecture of PP_2561 containing these insertions is shown in figure 2. A total of 119 hits within 27 protein sequences presenting the motif G-x-D-x(6)-D/E-D/E were retrieved from all 105 bacterial heme-dependent peroxidase sequences (Table 2; Table S1). Among these were the 12 sequences that contain two ANP-like domains and, in addition, 15 sequences that harbor a single ANP-like domain. Interestingly, 71 of these hits presented a Gly in the fourth position and, with the exception of one hit, all were located within the peroxidase domain (Table 2). In these 70 hits in 23 sequences the tenth residue was always Asp. The negative charge of the side chain was also fully conserved at position 11 of the motif, since aspartate and, in two cases, glutamate residues were found at this position. In contrast, of the remaining 48 inserts, which do not present Gly at the fourth position, only 4 are found within the ANP-like domain and the rest are in other parts of the protein. Thus, in total 74 inserts within the ANP-like domains of 24 protein sequences match G-x-D-x(6)-D/E-D/E (Table 2; Table S1). A subsequent alignment of these inserts (Text S2) resulted in the web logo of figure 3. Ninety-one percent of these insertions match G-x-D-G-x(5)-D-D, of which all except one present G-x-D-G-x-x-G/N-T/N-x-D-D and are contained in 23 protein sequences (Table 2; Table S1). Of these, 12 possess two ANP-like domains and the other 11 only one domain (Table 3; Table S1).
Interestingly, in all bacterial proteins with two ANP-like domains, this motif was present in the N-terminal domain, with the exception of P. putida sequences, which presented insertions also in the C-terminal domain (Table 3; Table S1).
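The pattern scans described above can be illustrated with a minimal Python sketch. It is not the PROSITE scanning tool used in this work: the helper names `prosite_to_regex` and `find_motif` are hypothetical, and the translation covers only the pattern syntax that appears in this article (literal residues, `x` wildcards, repeats such as `x(6)` and bracketed alternatives). The two test sequences are the pseudopilin fragment (amino acids 114 to 125) and the BACHEMP-Cons peptide quoted elsewhere in the text.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supports only literal residues, 'x' wildcards, repeats like x(6)
    and [..] alternatives; anything else raises ValueError.
    """
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, repeat = match.groups()
        token = "." if residue == "x" else residue
        parts.append(token + (f"{{{repeat}}}" if repeat else ""))
    return "".join(parts)


def find_motif(pattern: str, sequence: str):
    """Return (1-based start, matched segment) for every hit, overlaps included."""
    # A lookahead with a capture group reports overlapping matches as well.
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


RELAXED = "G-x-D-x(6)-D-D"              # less stringent motif from the text
PERCAL = "G-x-D-G-x-x-[GN]-[TN]-x-D-D"  # consensus proposed as PERCAL

BACHEMP_CONS = "DIDIVLPGPDGILGTADDIGN"   # 21-mer peptide from the text
PSEUDOPILIN_114_125 = "PGPDGVPNTEDD"     # fragment of pdb 3G20 quoted in the text

if __name__ == "__main__":
    for name, seq in [("BACHEMP-Cons", BACHEMP_CONS),
                      ("pseudopilin 114-125", PSEUDOPILIN_114_125)]:
        print(name, find_motif(RELAXED, seq), find_motif(PERCAL, seq))
```

Start positions are relative to the fragment passed in, so they have to be offset to obtain residue numbers in a full-length protein.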
Phylogenetic Analysis of ANP-like Domains of Bacterial Proteins with at Least One Intradomain Hit with the Motif G-x-D-G-x(5)-D-D
The 23 full-length protein sequences that contain at least one G-x-D-G-x(5)-D-D motif harbor in total 35 ANP-like domains. A significant number of ANP-like domains contained one of these motifs inserted close to the C-terminal end (Table S1), and the sequences flanking the C-terminal extension of the domains were included to optimize the alignment of this region (Text S3). The phylogenetic tree derived from this alignment (Fig. S2) shows two major branches (Fig. 4). One of them groups the ANP-like domains contained in single-domain ANHEMPs with the N-terminal domain of two-domain ANHEMPs, and the other branch groups all the C-terminal ANP-like domains that are free of inserts. N- and C-terminal ANP-like domains clustered in the same branch only in the three P. putida strains (Fig. 4). This suggests that there may have been two evolutionary mechanisms leading to the formation of two-domain ANHEMPs: the duplication of the same domain (which applies to P. putida) and the recruitment of two unrelated ANP-like domains into a single protein (the rest).

Search of the Protein Data Bank for Structures which Contain the Consensus Motif Identified
The sequences of the protein data bank (pdb) were searched with the motif G-x-D-G-x(5)-D-D. At the time of the search there were 71138 entries in the pdb. In total 6 protein structures were identified which contain this sequence motif. These structures were a lignin peroxidase (pdb 1B80), an acetate kinase (pdb 1G99), an amine dehydrogenase (pdb 1JJU), a domain of unknown function (pdb 2I6E) and two different pseudopilin structures (pdb 1T92 and 3G20). Figure S3 highlights the segments corresponding to the sequence motifs in these 6 three dimensional structures. In 5 of these structures the motif is surface exposed and forms a loop structure. In 4 of these structures the motif is not involved in the recognition of any ligands. However, in the structures of the lignin peroxidase and pseudopilin (pdb 3G20), this motif was involved in the coordination of a calcium ion and the molecular details of these interactions in the case of pseudopilin is shown in figure 5.
In the pseudopilin structure [19] the sequence motif is present on the fragment comprising amino acids 114 to 125 with the sequence PGPDGVPNTEDD (amino acids of the motif are in bold). As shown in figure 5 the bound calcium ion is pentacoordinated by two main chain interactions (P114 and V119) as well as by three side chain interactions (D117, T122 and D125). The aspartate residues 117 and 125 are part of the sequence motif. No calcium ions were present in buffers used for the purification and crystallization of the protein, which suggests that bound calcium has been co-purified with the protein indicative of a high binding affinity. In the case of the lignin peroxidase structure [20] the bound calcium is coordinated by two interactions with main chain oxy groups, three side chain interactions and two water molecules. From the motif identified only the first glycine (main chain interaction) and the first aspartate residue (side chain) are involved in the coordination of the calcium ion. The C-terminal pairs of aspartate residues are not involved in calcium coordination.
Structural data strongly suggest that calcium is bound at the consensus motifs present in the lignin peroxidase (pdb ID 1B80) and the pseudopilin (pdb ID 3G20) structures. Both structures [19], [20] have been solved at resolutions below 1.8 Å and were refined to R values of 0.17 and R free values of 0.21–0.22. These parameters minimize the probability of a misinterpretation of the final electron density maps. In addition there is functional evidence for a role of calcium in the activity of both proteins. Korotkov and colleagues [19] demonstrated that the activity of pseudopilins, key features of type II secretion systems, depends on the presence of bound calcium. Nie and Aust [21] have shown that calcium ions are released from lignin peroxidase following its thermal denaturation. The same authors were able to correlate the loss of calcium with a loss of enzymatic activity [22].

Table 1 (partial). Hemolysin-type calcium binding region (HTCaB): PROSITE. NodO calcium binding motif [9]: PR00313 (signature), tandem repeat of a nonapeptide. S-100/ICaBP type calcium binding protein: PROSITE.

Design of the BACHEMP-Cons Peptide Which Harbors the Consensus Sequence of the Motif Identified
Based on the above observation, we hypothesized that the sequence motif identified might correspond to another, as yet unidentified, binding motif for calcium ions. To verify this hypothesis experimentally and to determine precisely the ligand specificity of this motif, we designed a peptide for subsequent microcalorimetric titrations with different cations. Due to the flexible nature of the peptide main chain, peptides can adopt many different conformations. However, inspection of the pseudopilin structure reveals that the amino acids of the motif need to be present in a defined loop structure in order to establish the contacts detailed in figure 5. This structure also shows that the segments before and after the motif, two beta strands, interact with each other in an antiparallel manner. For that reason we hypothesized that this antiparallel interaction between the strands flanking the motif is important for the adoption of the correct conformation of the motif.
We therefore designed a 21-mer peptide, termed BACHEMP-Cons, in which the first 4 (DIDI) and the last 3 amino acids (IGN) correspond to the flanking beta strands in the pseudopilin structure. These sections are colored in blue in figure 6. The sequence of the fragment of the pseudopilin structure that is involved in Ca 2+ binding was replaced by the consensus sequence derived from the inserts within the ANP-like domains of 24 bacterial proteins (Fig. 3). This consensus sequence is shown in gold in figure 6. The final sequence of the BACHEMP-Cons peptide is thus DIDIVLPGPDGILGTADDIGN. This peptide was synthesized and submitted to microcalorimetric titrations.

Experimental Proof for the Binding of Calcium Ions to the Motif Identified
Isothermal titration calorimetry (ITC) [23] was used to study the interaction between the peptide BACHEMP-Cons and various ligands. Based on the observation that there are various protein structures in which calcium ions are bound to the sequence motif identified, initial experiments were aimed at investigating whether the designed peptide was able to bind Ca2+. Initially polybuffer (pH 6) was titrated with 5 mM CaCl2 to assess the dilution heat effects. Peaks resulting from this titration were small and uniform (not shown), indicative of weak dilution heats. Subsequently 50 µM peptide was titrated with CaCl2. Significant exothermic heat changes were observed, which diminished as the titration proceeded (Fig. 7A, upper trace). These heat changes are due to binding events at the peptide. Data analysis revealed that binding was driven by favorable enthalpy changes (ΔH = −16.3 ± 8.3 kcal/mol) and counterbalanced by unfavorable entropy changes (TΔS = −10.2 ± 8.4 kcal/mol). An association constant of 30600 ± 10000 M⁻¹ was determined, which corresponds to a dissociation constant KD of 33 ± 11 µM.
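The relations between the binding parameters reported for this titration can be checked with a short Python sketch. It uses the standard identities KD = 1/KA, ΔG = ΔH − TΔS and ΔG = −RT ln KA. The temperature of 298.15 K is an assumption made only for this illustration, since the titration temperature is not stated in this section, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3    # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15  # ASSUMED 25 °C; not stated in this section


def dissociation_constant_uM(ka_per_molar: float) -> float:
    """Convert an association constant (1/M) into a dissociation constant (µM)."""
    return 1e6 / ka_per_molar


def free_energy_from_ka(ka_per_molar: float,
                        temperature_k: float = TEMPERATURE_K) -> float:
    """Binding free energy in kcal/mol from deltaG = -R*T*ln(KA)."""
    return -R_KCAL * temperature_k * math.log(ka_per_molar)


def free_energy_from_components(delta_h: float, t_delta_s: float) -> float:
    """Binding free energy in kcal/mol from deltaG = deltaH - T*deltaS."""
    return delta_h - t_delta_s


if __name__ == "__main__":
    ka = 30600.0        # 1/M, central value reported for the peptide at pH 6
    delta_h = -16.3     # kcal/mol
    t_delta_s = -10.2   # kcal/mol
    print(f"KD = {dissociation_constant_uM(ka):.1f} µM")
    print(f"dG from KA         = {free_energy_from_ka(ka):.2f} kcal/mol")
    print(f"dG from dH and TdS = "
          f"{free_energy_from_components(delta_h, t_delta_s):.2f} kcal/mol")
```

Under the assumed temperature both routes give a free energy close to −6.1 kcal/mol, so the central values of KA, ΔH and TΔS are mutually consistent. The sketch ignores the reported standard deviations.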
This binding event could have been caused by the interaction of either the Ca2+ cation or the Cl− anion with the peptide. To resolve this issue the titration was repeated using 5 mM NaCl as ligand. No binding heats were measured (not shown), which indicates, firstly, that the above binding parameters represent the interaction of the Ca2+ cation with the peptide and, secondly, that the Na+ cation does not bind to the peptide. Subsequently the interaction of the peptide with Ca2+ was studied at other pH values (Fig. 7, Table 4). Binding was observed at all pH values, and a plot of the KD values determined as a function of pH is shown in figure 7B. Since the pH optimum for binding was found to be at pH 6.0, all subsequent studies were conducted at this pH.
The next experiment was aimed at determining whether, and to what degree, the presence of monovalent cations interferes with Ca2+ recognition by the BACHEMP-Cons peptide. To this end the titration of the peptide with CaCl2 was repeated in polybuffer supplemented with 100 mM NaCl (Fig. 8, Table 4). Data analysis revealed a reduction in affinity by a factor of around 3 (KD = 102 ± 11 µM) and a change in the mode of binding, since binding was now driven by favorable enthalpy and entropy changes. However, the data illustrate that this interaction also occurs in the presence of physiological concentrations of monovalent cations like Na+.

Table 2. Sequence variation of the eleven-residue inserts found in the animal heme-dependent peroxidases of bacteria.

Experimental Evidence that the Motif Identified is Specific for Calcium Ions
A central question in the study of the motif described concerns the specificity of ligand recognition. To evaluate the specificity of binding, microcalorimetric titrations of the peptide in polybuffer, pH 6.0, supplemented with 100 mM NaCl were conducted using chloride salts of Mg2+, Cu2+, Fe2+, Cr2+ and Mn2+. For none of these metal ions was a binding heat detected. This finding is exemplified by the titration of the BACHEMP-Cons peptide with MgCl2 as shown in figure 8B. The observed heats were small and uniform, matching those observed for the titration of buffer with this ligand.
In summary, the microcalorimetric titrations reveal, firstly, that Ca2+ binds with a physiologically relevant affinity to BACHEMP-Cons, secondly, that this binding is specific and, thirdly, that binding also occurs in the presence of physiological concentrations of monovalent anions and cations.

To study the interaction of Ca2+ with the protein encoded by PP_2561, the DNA fragment coding for the N-terminal ANP-like domain was cloned in the expression vector pET28 and the resulting His-tagged fusion protein was expressed in E. coli. No hemolysin-type calcium binding sites are present in this domain (Fig. 2). Spectral analysis of the purified protein revealed the absence of bound heme, the co-factor essential for catalytic activity. Therefore the protein was reconstituted with heme (see Materials and Methods for details) and spectra of the reconstituted protein were recorded (Fig. S4). From the absorbance at 280 nm and 410 nm a heme/protein molar ratio of 1:0.9 was determined. Subsequently, the protein was submitted to spectrophotometric assays to measure its peroxidase activity. Using the conditions described under Materials and Methods, a kcat of 29 ± 9 min⁻¹ and a KM of 13 ± 1 mM for peroxide were calculated. These results confirm the annotation of this domain as a peroxidase. Since we were able to detect the protein on the bacterial surface (unpublished), the protein encoded by PP_2561 was renamed as PepA (Pseudomonas extracellular peroxidase).
Subsequently, the binding of Ca2+ to the purified domain of PepA was analyzed using ITC. To ensure that the protein sample was free of bound Ca2+, the protein was purified under denaturing conditions and then refolded (see Materials and Methods for details). The microcalorimetric titration of the resulting protein with CaCl2 is shown in figure 9. The titration of the protein resulted in large exothermic signals, indicative of enthalpy-driven binding. Data analysis gave a dissociation constant of 12 ± 1 µM and an enthalpy change of −24.5 ± 2 kcal/mol. These data show that calcium also binds to the catalytic domain that contains PERCAL motifs. To assess a potential influence of calcium binding on the catalytic activity, protein at the same concentration as used for the ITC experiments (11 µM) was exposed to calcium concentrations in the range of 3.6–30 µM. These samples should thus correspond to protein differentially saturated with calcium. However, the catalytic activities of these samples were almost indistinguishable, indicating that calcium binding has no modulatory effect on the catalytic activity.
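The expectation that these samples are differentially saturated can be illustrated with a minimal Python sketch. It assumes a single class of independent 1:1 binding sites, which is a simplification given the reported stoichiometry above 1, and it takes protein, calcium and KD in micromolar, consistent with the KD of 12 µM stated in the abstract. The function name `fraction_saturated` is hypothetical.

```python
import math


def fraction_saturated(protein_total: float, calcium_total: float, kd: float) -> float:
    """Fraction of protein with bound Ca2+ for simple 1:1 binding.

    All three arguments must share the same concentration unit.
    The complex concentration is the smaller root of
    [PL]^2 - (P + L + KD)*[PL] + P*L = 0.
    """
    total = protein_total + calcium_total + kd
    complex_conc = (total - math.sqrt(total * total
                                      - 4.0 * protein_total * calcium_total)) / 2.0
    return complex_conc / protein_total


if __name__ == "__main__":
    PROTEIN_UM = 11.0  # protein concentration, taken as µM (assumption on units)
    KD_UM = 12.0       # dissociation constant of the N-terminal domain
    for calcium_um in (3.6, 10.0, 30.0):  # 10.0 is an illustrative midpoint
        saturation = fraction_saturated(PROTEIN_UM, calcium_um, KD_UM)
        print(f"{calcium_um:5.1f} µM Ca2+ -> {saturation:.0%} saturated")
```

Under these assumptions the lowest and highest calcium concentrations give clearly different degrees of saturation, which is the premise behind comparing the catalytic activities of these samples.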

Frequency of the Consensus Sequence G-x-D-G-x(5)-D-D and Four of its Variants in the Databases
All sequences present in the UniProtKB/TrEMBL database were screened using PROSITE for the presence of the identified Ca2+ binding motif and derivatives thereof (Table 5). To this end the sequences from the individual kingdoms of life were analyzed separately. The observed number of hits was then compared to the estimated number of expected random matches for the corresponding motif, which was calculated using the algorithm described by Nicodème [24]. The ratio of the observed number of hits over the calculated random occurrence can provide initial information on the phylogenetic distribution of these motifs (Table 5, Table S2). Motifs 1 and 2 (Table 5) covered more than 90% of the insertions within the ANP-like domains, although the ratio to the random occurrence for motif 1 was only slightly above 1. Together with the results shown in table 2, this suggests that this particular motif has low specificity. The ratio rose significantly for motif 2, in which the stringency of positions seven and eight was increased. This was most noticeable in the bacterial kingdom, for which the ratio increased to around ten. It should be mentioned that the core of the BACHEMP-Cons peptide matches motif 2.
Thus we propose G-x-D-G-x-x-[GN]-[TN]-x-D-D as a new Ca2+ binding motif named PERCAL (from peroxidase and Ca binding). The database was also scanned against motif 3, which presents glutamic acid in the ninth position and covers 82% of the ANP-like intradomain insertions plus the sequence found in 122 pseudopilin sequences, including that of the structure with pdb 3G20. In this case the ratio to the random occurrence increased to 26, as a consequence of the elevated value obtained for bacteria (41), although the ratio in eukaryotes was lower than 1. The same search was performed with motif 4, which presents aspartic acid in this ninth position, since it was observed in 15 pseudopilin sequences. Interestingly, the ratio of hits observed with motif 4 over the estimated random occurrence increased to 34 in eukaryotes. Motif 5 is the result of restricting the ninth residue of motif 2 to A, T, D or E. The ratio calculated for this motif, which includes more than 80% of the insertions in peroxidases and the sequences found in pseudopilins, was significantly greater than 1 in all kingdoms of life (Table 5, Table S2). The core of the BACHEMP-Cons peptide also matches this motif. The fact that the number of hits obtained with the motif and its derivatives is significantly above the number of random hits expected suggests that a significant number of proteins in bacteria as well as in eukaryotes have evolved to possess PERCAL motifs. The different ratios of specific over random hits in bacteria and eukaryotes might suggest that the motif may further be present in different sub-families. However, experimental data are essential to verify the functional role of PERCAL motifs in other proteins.
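The comparison of observed hits with the expected number of random matches can be sketched in Python as follows. This is not the algorithm of Nicodème [24] used in this work. It is a simplified estimate that treats every position as independent, draws residues from fixed background frequencies, and ignores overlap effects between adjacent windows. The uniform background (each amino acid at 5%), the function names and the example sequence lengths are assumptions made only for illustration.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# ASSUMPTION: uniform background; real databases have unequal frequencies.
UNIFORM = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def expand_pattern(pattern: str):
    """Expand a simple PROSITE-style pattern into one element per position."""
    elements = []
    for item in pattern.split("-"):
        match = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", item)
        if match is None:
            raise ValueError(f"unsupported pattern element: {item!r}")
        residue, repeat = match.groups()
        elements.extend([residue] * int(repeat or 1))
    return elements


def match_probability(pattern: str, frequencies=UNIFORM) -> float:
    """Probability that one random window matches, positions independent."""
    probability = 1.0
    for element in expand_pattern(pattern):
        if element != "x":  # wildcards match any residue
            probability *= sum(frequencies[aa] for aa in element.strip("[]"))
    return probability


def expected_random_hits(pattern: str, sequence_lengths, frequencies=UNIFORM) -> float:
    """Expected number of matching windows summed over all sequences."""
    width = len(expand_pattern(pattern))
    windows = sum(max(0, length - width + 1) for length in sequence_lengths)
    return windows * match_probability(pattern, frequencies)


def enrichment_ratio(observed_hits: int, expected_hits: float) -> float:
    """Ratio of observed hits over the estimated random occurrence."""
    return observed_hits / expected_hits


if __name__ == "__main__":
    lengths = [3000] * 1000  # hypothetical database of 1000 proteins
    for motif in ("G-x-D-G-x(5)-D-D", "G-x-D-G-x-x-[GN]-[TN]-x-D-D"):
        print(motif, expected_random_hits(motif, lengths))
```

Restricting positions seven and eight lowers the random match probability, which is why a stricter motif can show a higher observed-over-expected ratio even when it retrieves fewer hits.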

Discussion
Previous work in our laboratory revealed the importance of the bacterial determinant PP_2561 in the beneficial interaction that the bacterium P. putida KT2440 establishes with the model plant Arabidopsis [12]. The protein encoded by PP_2561 was initially annotated as a putative secreted hemolysin-type calcium-binding bacteriocin (EMBL AAN68170.1). In the course of our investigation it was reclassified as a heme peroxidase, which we have called PepA. Inspection of the PepA entry at the Conserved Domains Database [25] revealed that it consists of two ANP-like domains (as defined by profile PS50292 of PROSITE) with an internal region of low homology. In addition PepA contains at its C-terminal end a region that shares sequence similarities with the peptidase M10 serralysin. This fragment contains multiple calcium binding motifs defined by Pfam00353. PepA presents Ca-binding sites not only concentrated at its C-terminal end, as part of the serralysin-like C-terminal domain, but also as hemolysin-type calcium-binding (HTCaB) motifs (signature PS00330) on the segments flanking the C-terminus of both ANP-like domains (Fig. 2). Given that the sequence fragments recognized by profiles PS50292 and PS00330 did not overlap the region of low homology observed within the ANP-like domains, it can be concluded that they were not due to these well characterized HTCaB signatures. In addition, a distinctive feature of the ANP-like domains of PepA was the presence of short insertions of eleven residues, which were conserved in the domains of their P. putida homologues (strains F1 and GB-1) and in other homologous proteins from diverse bacterial taxa (Table 3). The sequence analysis of these inserts in the ANP-like domains resulted in the definition of an initial motif, G-x-D-G-x(5)-D-D, which was used for searches in the pdb database.
Interestingly, one of the 6 entries identified (pseudopilin) contained a hit with the sequence G-p-D-G-v-p-N-T-e-D-D involved in the coordination of a calcium ion (Fig. 5), which also matches the more specific consensus sequence G-x-D-G-x-x-[GN]-[TN]-x-D-D. Since pseudopilin is not predicted by PROSITE, Pfam and SPRINT to possess a calcium binding motif, experiments were conducted to determine whether the consensus defined in figure 3 binds calcium ions. Microcalorimetry was employed to study ligand binding to the BACHEMP-Cons peptide, which contains a sequence matching this consensus. Binding occurred over the pH range tested (pH 5-9) with dissociation constants ranging from 33 to 79 µM. These values are well above the intracellular calcium concentration, which in E. coli is approximately 90 nM [26]. However, the affinity of calcium binding proteins for Ca2+ ranges from submicromolar to millimolar dissociation constants [6], [27], [28]. These differences in affinity may reflect differences in the calcium levels present in the environment of each protein. Due to the relatively low cellular Ca2+ concentration, cytosolic proteins may have evolved to recognize calcium more tightly than extracellular proteins. It should be mentioned that the niche of P. putida is garden soil and the plant rhizosphere [29]. The exchangeable Ca2+ concentration (ECC) in soil varies considerably, but even for a soil with a low ECC of 20 mmol/L [30] this value is significantly higher than the above mentioned cytosolic concentration. In this context a dissociation constant of 12 µM, as obtained for the N-terminal peroxidase domain of PepA, would result in calcium saturation of the protein.
Ca2+ binding to the BACHEMP-Cons peptide also occurred, although with reduced affinity, in the presence of physiological concentrations of monovalent ions. Most importantly, no binding of other bivalent cations was observed, which demonstrates that the PERCAL motif is specific for calcium. Calcium also bound to the recombinant purified N-terminal peroxidase domain of PepA, with an affinity higher than that observed for the peptide. Similar to our results, an increase in Ca2+ binding affinity of a protein with respect to different peptides has also been observed [31]. It should be noted that the stoichiometry of calcium binding was found to vary considerably between the individual protein lots. This may be due to the technical difficulty of generating protein entirely devoid of calcium, in spite of having used reagents with the lowest calcium traces to prepare the solutions for protein refolding. We therefore have to consider the protein used for the ITC binding studies in figure 9 as partially Ca2+ saturated. It is tempting to speculate that high affinity PERCAL sites are already calcium saturated in the samples used, which would imply that ITC permitted only the observation of the lower affinity calcium binding events. The consensus proposed for PERCAL corresponds to motif 2 in table 5. This motif recognizes 90% of the inserts in the ANP-like domains. In addition, its observed frequency is above the value expected for a random distribution in bacteria as well as in eukaryotes. The ratio of the observed hits over the estimated random number was particularly elevated in prokaryotes, which suggests that the Ca2+ binding motif identified here is particularly frequent in this kingdom (Table 5; Table S2).
A similar result was obtained for sequences covered by variants of the PERCAL motif, i.e. motif 5 of Table 5, in which the ninth position of the consensus is restricted to the four amino acids Ala, Asp, Glu and Thr. In this case the ratios increased up to about 8 in eukaryotes and 35 in prokaryotes, which is consistent with the proposition that PERCAL is an omnipresent new Ca2+ binding motif. The ninth position of PERCAL deserves additional attention, since the residues at this position appear to be specific for individual protein families: 1) in ANP-like domains of bacterial proteins PERCAL habitually presents A/T in this position (86%); 2) in the general secretion pathway protein G (or pseudopilin) [32], this position is in all cases occupied by D/E; and 3) when this position is occupied by aspartate, as in motif 4 of Table 5, the resulting motif, identified in uncharacterized proteins and in proteins containing a Zn finger domain elsewhere, is 34 times more frequent in eukaryotes than randomly expected. However, we have maintained an ambiguity in the ninth position of this consensus sequence, although motif 5 (G-x-D-G-x-x-[GN]-[TN]-[ATDE]-D-D) was more strongly overrepresented in the databank than PERCAL. This decision was taken since the introduction of a restriction at this position would have prevented the detection of more than half of the eukaryotic hits identified with PERCAL (Table S2). In addition, a restriction of position 9 would also have resulted in the detection of fewer insertions in the ANP-like domains. We conclude that each of these insertions, five in every ANP-like domain of PepA, would be able to bind calcium. Further knowledge of the structures of proteins containing PERCAL, especially eukaryotic proteins, is needed in order to identify the residues that are critical for binding and that consequently should play a central role in the precise definition of the PERCAL motif.

Table 5.
Presence of the different versions of the PERCAL calcium binding motif in bacteria and eukaryotes.

Motif Number
Motif sequence

Coverage of insertions (a)
Ratio of observed sequence motifs over estimated random occurrence

The analysis of the pseudopilin structure provides insight into the molecular basis of the recognition of Ca2+ by the PERCAL motif. There are five direct interactions, of which three involve main chain oxygen atoms (amino acids 1, 5 and 8). Of these, only the amino acid at position 8 shows a high degree of conservation. Most interestingly, there are only two side-chain interactions (aspartates at positions 3 and 11). This may suggest that the remaining conserved amino acids of the motif have a structural role and may contribute to the formation of the loop structure. Frequently, ligand recognition is mediated by an indirect interaction with protein-bound water. The pseudopilin structure has been solved at a resolution of 1.78 Å and contains 265 water molecules [19]. However, inspection of the calcium binding site reveals the absence of bound water, suggesting that water may not be involved in Ca2+ recognition. Striking parallels exist to other calcium binding motifs, as exemplified by the HTCaB region of the alkaline protease [4]. As for the PERCAL motif, Ca2+ in this region establishes only two interactions with amino acid side chains. As in the case of PERCAL, these side chains are aspartic acid residues. In the structure of the alkaline protease the remaining 4 interactions involve main chain oxygens, of which three are from conserved glycine residues.
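The positional description above can be mapped onto a matched sequence with a few lines of code. The following Python sketch finds PERCAL matches in a sequence and reports which residues occupy the main chain (positions 1, 5 and 8) and side chain (positions 3 and 11) ligand positions described for pseudopilin. The function name and the example sequence are hypothetical, and the sketch assumes that the ligand positions seen in the pseudopilin structure transfer to any sequence matching the consensus, which has not been shown.

```python
import re

# Regular expression equivalent of the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D.
PERCAL_REGEX = re.compile(r"G.DG..[GN][TN].DD")

# 1-based motif positions reported to contact Ca2+ in the pseudopilin structure.
MAIN_CHAIN_LIGANDS = (1, 5, 8)
SIDE_CHAIN_LIGANDS = (3, 11)


def annotate_ligands(sequence):
    """Return one record per non-overlapping PERCAL match in `sequence`."""
    records = []
    for match in PERCAL_REGEX.finditer(sequence):
        site = match.group(0)
        records.append({
            "start": match.start() + 1,  # 1-based position in the sequence
            "site": site,
            "main_chain": [(pos, site[pos - 1]) for pos in MAIN_CHAIN_LIGANDS],
            "side_chain": [(pos, site[pos - 1]) for pos in SIDE_CHAIN_LIGANDS],
        })
    return records


if __name__ == "__main__":
    # Hypothetical sequence used only to exercise the function.
    for record in annotate_ligands("AAGADGAAGTADDAA"):
        print(record)
```

Because the side chain ligand positions 3 and 11 are fixed as aspartate in the consensus, the `side_chain` entries always report D, whereas the residues at the main chain positions 5 and 8 can vary between sites.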
We have investigated the catalytic activity of the N-terminal ANP-like domain of PepA, which contains 5 PERCAL insertions. The KM for H2O2 of the His-tagged N-terminal domain (PepA-Nter) after its reconstitution with heme was estimated to be 13 ± 1 mM. KM values higher than 10 mM have been reported previously for the oxidation of 4-aminoantipyrine (the same electron donor as the one used in this study) by H2O2 in the presence of cytochrome c [33]. There is a wide range in the kinetic constants of peroxidases. As an example, kcat values for myoglobin and horseradish peroxidase differ by as much as three orders of magnitude (21 and 60000 min⁻¹, respectively) [34]. The work performed here with PepA constitutes, to our knowledge, the first report in the literature of kinetic parameters for bacterial heme peroxidases that are included in the animal peroxidase_like superfamily. It should be mentioned that no peroxidase activity was observed for PepA using an alternative assay that employs a different electron donor [35]. This suggests that the nature of the electron donor is decisive for the measurement of its catalytic activity. However, the physiologically relevant electron donor of PepA remains to be identified. Another hypothesis to explain the low peroxidase activity of PepA-Nter is a reduction of enzymatic activity caused by the insertion of PERCAL motifs, although further work with homologous proteins is required to assess whether these catalytic properties are conserved.
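For readers less familiar with these kinetic constants, KM is the substrate concentration at which an enzyme obeying Michaelis-Menten kinetics reaches half of its maximal rate. The Python sketch below shows the rate equation and one simple way to estimate KM and Vmax from initial rates, using a Hanes-Woolf linearization. The function names and the synthetic data are assumptions for illustration; the fitting procedure used for PepA-Nter is not described in this section, and a nonlinear fit would normally be preferred for noisy data.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v = vmax * s / (km + s); `s` and `km` share the same units."""
    if s < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, rates):
    """Estimate (vmax, km) by least squares on s/v = s/vmax + km/vmax."""
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two matching (substrate, rate) points")
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, rates)]  # requires v > 0
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # equals 1 / vmax
    intercept = mean_y - slope * mean_x   # equals km / vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


if __name__ == "__main__":
    # Synthetic, noise-free data (assumed values, not measurements from the paper).
    substrate_mm = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
    rates = [michaelis_menten(s, vmax=2.0, km=10.0) for s in substrate_mm]
    print(fit_hanes_woolf(substrate_mm, rates))
```

The linearization is used here only because it needs no external libraries; it weights the data points unevenly and is therefore less robust than a direct nonlinear fit.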
The binding of calcium to pseudopilin was found to be essential for protein activity [19]. To verify whether Ca2+ binding modulates PepA activity, peroxidase assays were conducted using heme-reconstituted PepA-Nter differentially saturated with Ca2+. The catalytic properties of these differentially Ca2+ saturated proteins did not reveal any major variations, but considering the low magnitude of the response, a more subtle Ca-mediated modulation of peroxidase activity cannot be excluded. It is also plausible that the protein was partially saturated with its ligand prior to the incubation with Ca2+, given the difficulties in generating protein entirely devoid of calcium.
The analysis of the consensus sequences of known calcium binding motifs (Table 1) shows that the HTCaB motif is the closest relative of the PERCAL motif. In total, PepA is predicted to contain 13 Ca-binding sites of the HTCaB type and 10 of the PERCAL type. As stated above, the PERCAL sites are exclusively present in the ANP-like domains whereas the HTCaB sites are solely present in the segment located C-terminal to both ANP-like domains. PERCAL-mediated binding of PepA to the plant root surface and a detoxification of reactive oxygen species by the peroxidase activity of PepA may be key mechanisms for the apparent role of PepA in rhizosphere colonization and the triggering of induced resistance against phytopathogens [12]. Current work in our laboratory aims to elucidate whether this protein is involved in Ca2+-mediated cell-cell adhesion and adhesion to biotic surfaces, and whether its mechanism of action in interkingdom communication is related to oxidative stress resistance.
The two ANP-like domains of PepA share around 70% sequence identity. Thus certain similarities in their catalytic properties are expected. Due to this elevated degree of sequence identity, these domains clustered together in a phylogenetic tree of ANP-like domains of bacterial heme peroxidases containing PERCAL (Fig. 4). This was unique to PepA and its P. putida homologues and suggests that PepA is the result of a gene duplication event followed by divergent evolution. However, for the PepA homologues with two ANP-like domains from other species, the N- and C-terminal domains did not cluster together and, remarkably, in all cases only the N-terminal domains presented PERCAL motifs. Therefore, two types of ANP-like domains can be distinguished: those with and those without PERCAL motifs. This raises the question as to the functional differences between both types of domain. In this context the study of a PepA homologue from Aurantimonas [17] can offer a working hypothesis. This protein, which has one ANP-like domain free of PERCAL inserts, was found to oxidize Mn(II). Since P. putida GB1, which has a PepA homologue with PERCAL insertions in both ANP-like domains, lacks the Mn(II) oxidizing activity [18], it is tempting to speculate that the PERCAL-free ANP-like domain of the Aurantimonas PepA homologue may be the cause of this catalytic activity. However, it should be pointed out that in Erythrobacter sp. strain SD-21 the Mn-oxidizing protein contains only one ANP-like domain [17] with three PERCAL insertions (Fig. 4), which indicates that at present it is not possible to predict the catalytic properties of the animal heme peroxidase domains of bacterial proteins. In addition to the HTCaB sites proposed previously [17], the PERCAL motif could account for the Ca2+-mediated modulation of the Mn(II) oxidation activity observed for the PepA homologue in Aurantimonas.
The recent work of Marchler-Bauer and coworkers [15] has led to the differentiation of two subtypes of bacterial ANP-like domains, namely An_peroxidase_bacterial_1 (cd09819) and An_peroxidase_bacterial_2 (cd09821). We could confirm that the presence or absence of the PERCAL motif did not determine whether a given domain belongs to one or the other family. This is exemplified by the fact that the C-terminal domains of two heme peroxidases, Q1YMS2 and A3XF15 (both free of PERCAL), were included in the same cd09821 sequence cluster together with the N- and C-terminal domains of PepA, whereas in the tree generated here (Fig. 4) they were grouped in a different branch from the PepA domains. We thus propose the presence of PERCAL as a significant phylogenetic factor in the so far uncharacterized family of bacterial heme peroxidases.

Database Searches
Initial searches were performed against UniProtKB/TrEMBL (release of July 13th 2010, 11397958 entries) using the PROSITE profile PS50292 to identify proteins containing the animal peroxidase-like (ANP-like) domain. A taxonomy filter for bacteria was used. Segments recognized by profile PS50292, corresponding
Text S3 Sequences of the ANP-like domains of bacterial proteins used in this work according to PROSITE profile PS50292. (DOCX)