A Computational Module Assembled from Different Protease Family Motifs Identifies PI PLC from Bacillus cereus as a Putative Prolyl Peptidase with a Serine Protease Scaffold

Proteolytic enzymes have evolved several mechanisms to cleave peptide bonds. These distinct types have been systematically categorized in the MEROPS database. While a BLAST search on these proteases identifies homologous proteins, sequence alignment methods often fail to identify relationships arising from convergent evolution, exon shuffling, and modular reuse of catalytic units. We have previously established a computational method to detect functions in proteins based on the spatial and electrostatic properties of the catalytic residues (CLASP). CLASP identified a promiscuous serine protease scaffold in alkaline phosphatases (AP) and a scaffold recognizing a β-lactam (imipenem) in a cold-active Vibrio AP. Subsequently, we defined a methodology to quantify promiscuous activities in a wide range of proteins. Here, we assemble a module which encapsulates the multifarious motifs used by protease families listed in the MEROPS database. Since APs and proteases are an integral component of outer membrane vesicles (OMV), we sought to query other OMV proteins, like phospholipase C (PLC), using this search module. Our analysis indicated that phosphoinositide-specific PLC from Bacillus cereus is a serine protease. This was validated by protease assays, mass spectrometry and by inhibition of the native phospholipase activity of PI-PLC by the well-known serine protease inhibitor AEBSF (IC50 = 0.018 mM). Edman degradation analysis linked the specificity of the protease activity to a proline in the amino terminal, suggesting that the PI-PLC is a prolyl peptidase. Thus, we propose a computational method of extending protein families based on the spatial and electrostatic congruence of active site residues.


Introduction
Proteolytic enzymes catalyze the cleavage of peptide bonds in proteins and are divided into several major classes based on their mechanism of catalysis [1,2]. The MEROPS database systematically categorizes these protein families and clans to provide an integrated information source [3]. The abundance of proteolytic enzymes in biological systems results from the varied physiological conditions under which these enzymes have evolved to be effective [4].
We selected proteases with known active sites and 3D structures from each family listed in MEROPS and encapsulated their active site motifs into a single protease search module. We previously presented a bottom-up method for active site prediction (CLASP) using active site residues [5]. Subsequently, we used CLASP to quantify promiscuous activities in a wide range of proteins [6].
Here, we used CLASP to query proteins of interest for proteolytic function using this search module. Using such a search module is analogous to running a BLAST search from the MEROPS database site [7,8].
While BLAST looks for sequence homology, CLASP detects spatial and electrostatic congruence between residues to predict similar catalytic properties in proteins. Sequence alignment techniques are known to fail to detect distant relationships since considerable divergence often resembles noise [8]. More importantly, proteins redesigned from chiseled scaffolds through exon shuffling and those resulting from convergent evolution remain beyond the scope of such methods [9]. The phenomenon of convergent evolution, first proposed in serine proteases [10], is no longer considered to be a rare event [11,12]. Structural alignment methods have addressed some of these deficiencies, but can be misled by non-catalytic parts of the protein [13]. A recent method employs learning techniques to predict whether proteins have proteolytic activities, but has not identified any novel proteases undetected by other methods [14,15]. CLASP unraveled a promiscuous serine protease scaffold in alkaline phosphatases (AP) [5], one of the widely studied promiscuous enzyme families [16,17], and also a scaffold recognizing a β-lactam (imipenem) in a cold-active Vibrio AP [18,19].
Several conserved proteases have been implicated in bacterial pathogenesis [20]. Proteases are integral components of outer membrane vesicles (OMVs), which all Gram-negative bacteria shed as blebs from the cell surface [21]. We queried other proteins present in OMVs using the CLASP protease search module and found that phosphoinositide-specific phospholipase C (PI-PLC) is a Pro-X specific protease. PI-PLCs are part of the signal transduction pathways of higher organisms [22-24]. Prokaryotic PI-PLCs are important virulence factors that alter the signaling pathways of higher organisms [25-27]. We demonstrated a serine protease domain in PI-PLC from Bacillus cereus through its proteolytic activity and the inhibition of its native activity on phospholipids by serine protease inhibitors (IC50 = 0.018 mM). Edman degradation analysis demonstrated that the specificity of the protease activity was for a proline near the amino terminus, suggesting that PI-PLC is a prolyl peptidase [28].
To summarize, the distinct types of proteases categorized in the MEROPS database were used to generate a search module that could be used to query any protein with known 3D structure for the presence of a promiscuous proteolytic activity. This search module identified a serine protease scaffold in PI-PLC from Bacillus cereus, which was validated by in vitro experiments. A similar computational approach can be adopted for other enzymatic functions to extend protein families based on the spatial and electrostatic congruence of active site residues: relationships that often escape detection by sequence alignment or global structure alignment methods.

Results
We chose a set of proteases with known 3D structures and active site residues from each of the seven major classes in the MEROPS database (Table 1) [3]. We then created signatures encompassing the spatial and electrostatic properties of the catalytic residues in these proteins [5]. To maintain uniformity, we chose three residues from the active site neighborhood, including the catalytic residues (Table 2). These signatures were then used to query other proteins of interest using CLASP. Matches with low scores (less than an empirical threshold of 0.1) indicate a good spatial and electrostatic congruence, and a significant likelihood that these proteins possess proteolytic functions.
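The idea of a three-residue signature can be illustrated with a minimal sketch. It assumes that each motif atom is supplied with 3D coordinates and a precomputed electrostatic potential; the `MotifAtom` type and `build_signature` function are hypothetical names introduced here for illustration and are not taken from the published CLASP implementation.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist


@dataclass(frozen=True)
class MotifAtom:
    """One reactive atom of a motif residue (hypothetical helper type)."""
    label: str        # e.g. "His57/NE2"
    xyz: tuple        # Cartesian coordinates in angstroms
    potential: float  # electrostatic potential in kT/e


def build_signature(atoms):
    """Return pairwise distances and potential differences for a motif.

    Both results are dicts keyed by (label_i, label_j) in input order.
    """
    distances = {}
    potential_diffs = {}
    for a, b in combinations(atoms, 2):
        key = (a.label, b.label)
        distances[key] = dist(a.xyz, b.xyz)
        potential_diffs[key] = a.potential - b.potential
    return distances, potential_diffs
```

For a three-residue motif this yields three distances and three potential differences, which is the kind of information tabulated per residue pair in Table 2.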
The predicted residues, deviations in distances, potential difference in cognate pairs, and scores were determined for a phosphoinositide-specific PLC (PI-PLC) (PDB id: 1PTD) from Bacillus cereus (Table 3). PI-PLC was indicated to be a serine protease because the best match was with a trypsin protein, PDBid:1A0J [29]. The residues predicted by CLASP as responsible for its protease activity coincide with the active site responsible for its native phospholipase activity (His32, Asp67, His82, and Asp274) (Fig. 1) [30]. However, there was little sequence similarity within the set of querying and queried proteins, suggesting that established sequence alignment methods would fail to detect this relationship (Table S1).
We tested this prediction by performing an in vitro protease assay on commercially available PI-PLC from Bacillus cereus. The protease activity of PI-PLC on the substrate protein UVI31+ [31,32] was inhibited by the protease inhibitor leupeptin, while other inhibitors like AEBSF were unstable during a long incubation (Fig. 2A). A MALDI-TOF analysis showed a clean 13.4 kDa peak for purified UVI31+ protein (Fig. 2B), which was split into two fragments of 2.0 kDa (Fig. 2C) and 11.4 kDa (Fig. 2D) on incubation with PI-PLC. Edman degradation analysis demonstrated that the protease activity was specific for a proline following the first seven residues of the UVI31+ protein (cleavage site marked by an asterisk: MAEHQLGP*IAG). This suggested that the PI-PLC is a putative prolyl peptidase. The predicted protease scaffold was tested by assaying inhibition of its phospholipase activity by the trypsin inhibitor AEBSF (IC50 = 0.018 mM). Assays were performed with the substrate in the form of large, unilamellar vesicles. The vesicles consisted of either pure phosphatidylinositol (PI) (Fig. 2E) or an equimolar mixture of PI, phosphatidylcholine (PC), phosphatidylethanolamine (PE), and cholesterol (CH) (Fig. 2F). In both cases, the maximum reaction rates decreased in a dose-dependent way in the presence of AEBSF (Fig. S1).
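As an illustration of how an IC50 can be read off a linear regression of reaction rate against inhibitor concentration, the following sketch fits a straight line by least squares and returns the concentration at which the fitted rate drops to half of the fitted uninhibited rate. The linear dose-response model and the function name `ic50_from_linear_fit` are assumptions made for this example; the exact fitting procedure behind the reported value is not described in the text.

```python
def ic50_from_linear_fit(concentrations, rates):
    """Estimate IC50 from a least-squares line through (concentration, rate).

    Fits rate = intercept + slope * concentration and returns the
    concentration at which the fitted rate equals half the intercept
    (the fitted rate in the absence of inhibitor).
    """
    n = len(concentrations)
    if n < 2 or n != len(rates):
        raise ValueError("need at least two paired measurements")
    mean_c = sum(concentrations) / n
    mean_r = sum(rates) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum((c - mean_c) * (r - mean_r)
              for c, r in zip(concentrations, rates))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_c
    if slope >= 0:
        raise ValueError("rates do not decrease with inhibitor concentration")
    # Solve intercept + slope * c = intercept / 2 for c.
    return -intercept / (2.0 * slope)
```

The result is in the same units as the supplied concentrations, so rates measured against AEBSF concentrations in mM give an IC50 in mM.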
We tested the proteolytic functions, and their inhibition by protease inhibitors, of the non-toxic Bacillus cereus phosphatidylcholine-specific phospholipase C (PC-PLC) and the closely related, highly toxic C. perfringens α-toxin (CPA), which possesses an additional C-terminal domain responsible for the sphingomyelinase, hemolytic, and lethal activities [33]. CPA and PC-PLC activity on phospholipids was unaffected by trypsin inhibitors, consistent with the CLASP analysis, which fails to detect a serine protease scaffold in these proteins (Tables 4 and 5). CPA does have a metallo-protease motif from thermolysin PDBid:1FJO (Table 4). Remnants of a metallo-protease in the CPA protein preparation prevented direct confirmation of its proteolytic function. A metallo-protease inhibitor did not inhibit CPA activity. This lack of inhibition by a single compound is insufficient to rule out the existence of a metallo-protease scaffold.
Table 1 (note). Motifs extracted from each of these proteases consist of three residues. Types: aspartic (A), cysteine (C), glutamic (G), metallo (M), asparagine (N), serine (S), threonine (T). doi:10.1371/journal.pone.0070923.t001
The PC-PLC proteolytic activity could also be an artifact of metallo-protease contamination, which is difficult to remove. CLASP detects in this protein a glutamic protease motif from the Eqolisin family of peptidases, PDBid:1S2B (Table 5), which does not coincide with its native active site (Fig. 3). While this protein's lack of inhibition by serine and metallo-protease inhibitors is consistent with CLASP analysis, mutational studies would be required to confirm the moonlighting glutamic protease scaffold [34]. Thus, the protease activities of CPA and PC-PLC remain open to debate.

Discussion
Proteases have evolved to use different mechanisms for proteolysis [2,3,35-37]. Although most peptidases cleave peptide bonds by hydrolysis, recently a novel protease was shown to be a lyase [38,39]. There is considerable interest in developing computational methods to identify new proteolytic enzymes and their substrates. MEROPS provides a BLAST search for any query protein [3]. Another recent method employed learning techniques to predict proteolytic activities, but found no novel proteases undetected by other methods [14,15]. Computational methods are also used for predicting protease substrates [40]. Here, we selected proteases with known active sites and structures from each family listed in MEROPS, and encapsulated their active site motifs into a single protease search module. Using our previously described method [5], we exploited this search module to unravel proteolytic activities in phosphoinositide-specific PLC (PI-PLC) [23,24].
The importance of proteases in organisms from all kingdoms is well established. In humans, abnormal proteolysis is linked to pathologies like cancer, stroke, heart attack, and parasite infection [41-43]. The complete set of known proteases present in human, chimpanzee, mouse, and rat has been incorporated into the Degradome database [44]. In plants, papain-like cysteine proteases are critical enhancers of immunity [45]. The bactericidal properties of human neutrophil elastase, a serine protease, have been exploited to design a therapeutic chimeric antimicrobial protein that targets the outer membrane of bacteria and bolsters the innate immune defense system of grapevines against the Pierce's disease-causing Gram-negative Xylella fastidiosa [46]. Several conserved proteases have been implicated in bacterial pathogenesis and are intricately involved in the Type III secretion system [47], quorum sensing [48], motility [49], chaperones for OMV proteins [50], and the protein quality control mechanism essential for degrading unfolded proteins [51].
Proteases are also an integral component of outer membrane vesicles (OMVs), which are shed by all Gram-negative bacteria as blebs from the cell surface [21]. OMVs from pathogenic bacteria are transported through the host plasma membrane by endocytosis [52,53], and deliver several virulence factors that modulate the host immune system, alter host cell signaling pathways, and aid the colonization of host tissues [54,55]. OMVs contain other proteins like alkaline phosphatase (AP), phospholipase C (PLC), and β-lactamases [56].
Previously, we detected a promiscuous serine protease scaffold in APs using CLASP [5], and a scaffold recognizing a β-lactam (imipenem) in a cold-active Vibrio AP [18,19]. The theoretical foundation of CLASP is that the electrostatic potential difference (EPD) in cognate pairs of active site residues is conserved in proteins with the same functionality. The significance of EPD was extended to enumerate possible pathways for proton abstraction in the active site [57], to compute electrostatic perturbations induced by ligand binding [58], and to propose a rational design flow for directed evolution [59,60]. Recently, we proposed a methodology for the multiple sequence alignment of related proteins with known structures using electrostatic properties as an additional discriminator and identified mutations that might be the source of functional divergence in a protein family. The active site and its close surroundings contained enough information to infer the correct phylogeny for related proteins [61]. Here, we confirmed the presence of this proteolytic scaffold in a cold-active Vibrio AP (VAP) (IC50 of 0.35 ± 0.05 mM (n = 6) for AEBSF at pH 7.0). Since APs are present in OMVs, we queried other proteins present in OMVs using motifs from different proteases listed in MEROPS. CLASP analysis using the search module (Tables 1 and 2) indicated that PI-PLC is a protease with Pro-X specificity (Table 3). This was validated by protease assays, mass spectrometry and by inhibition of the native phospholipase activity by the serine protease inhibitor AEBSF (IC50 = 0.018 mM). Edman degradation analysis demonstrated that the protease activity was specific for a proline near the amino terminus, suggesting that the PI-PLC is a prolyl peptidase [28].
Other endogenous proteolytic substrates of PI-PLC might be discovered by liquid chromatography-mass spectrometry-based peptidomics [62].
Enzymes that cleave phospholipids are defined by the site of cleavage as PLA (releasing the fatty acids) or PLC/PLD (releasing the polar head group) [28,63]. In higher eukaryotes, phosphoinositide-specific PLC (PI-PLC) produces critical secondary messengers for signal transduction pathways [22,23]. Prokaryotic PI-PLCs are important virulence factors, possibly by altering this signaling pathway [25,26]. We experimentally demonstrated the serine protease scaffold in PI-PLC from Bacillus cereus (Fig. 2). The hypothesis concerning the origin of the diverse peptidase families and the evolutionary pressures that molded each may be reinforced by these new families of proteolytic enzymes [64].
The genus Clostridium consists of spore-forming, rod-shaped, Gram-positive bacteria, of which Clostridium perfringens is one of the most pathogenic, with hemolytic, dermonecrotic, vascular permeabilization, and platelet-aggregating properties [65]. C. perfringens strains are classified into five toxinotypes based on four typing toxins [66]. The C. perfringens α-toxin (CPA), present in all five toxinotypes, is a zinc-dependent enzyme with both phospholipase C (PLC) and sphingomyelinase (SMase) activity [67]. The N-terminal domain (~250 residues) is similar to the Bacillus cereus phosphatidylcholine-specific phospholipase C (PC-PLC) [33,68]. The C-terminal domain has an eight-stranded antiparallel β-sandwich motif similar to eukaryotic calcium-binding C2 domains and confers toxicity on the enzyme [69,70]. The observed protease activities of CPA and PC-PLC remain unconfirmed due to suspected metallo-protease contamination. However, CPA and PC-PLC activity on phospholipids was unaffected in the presence of trypsin inhibitors, consistent with the failure of the CLASP analysis to detect a serine protease scaffold in these proteins.
Another aspect of catalysis that should be modeled is the flexibility and diversity in the active site scaffold of related enzymes. For example, there are many unconventional serine proteases [36]. The group of residues that can match a particular residue from the input motif can be varied in CLASP, allowing it to model unconventional motifs. While stereochemical equivalence can be hardwired for amino acids with similar properties, there are instances where residues with different properties occupy the same sequence and spatial location and perform the same function.
Table 3. The deviation in distances (dD), potential difference in cognate pairs (dPD), predicted residues (PR), and final scores of a PI-PLC (PDB id: 1PTD) from Bacillus cereus.
The lack of PI-PLC proteolytic activity on the many tested synthetic substrates, and its specificity for UVI31+ protein, indicates that one should exert caution before ruling out protease activity in an enzyme. This is particularly true when a serine protease inhibitor inhibits the native activity, confirming a serine protease-like scaffold (with the classical catalytic triad) in the active site. Serine protease inhibitors are not active on other serine-centric enzymes like serine β-lactamases, or on metallo-enzymes like CPA and PC-PLC. This establishes their specificity for the serine protease scaffold. Proteases are a unique class of enzymes with many possible substrates due to the theoretically infinite number of DNA sequences that could encode proteins with correspondingly infinite folds. Fluorogenic substrate microarrays determine protease substrate specificity using a wide range of fluorogenic protease substrates [72,73]. Directed evolution strategies can modify the specificities [59,74,75]. The "poor specificity conversion" obtained when converting chymotrypsin to trypsin is an example of the difficulty of such an endeavor [76].
We propose a computational methodology to extend protein families based on the spatial and electrostatic properties of the catalytic residues in proteases. The distinct protease types categorized in the MEROPS database were selected to generate a search module that can query any protein with known structure for the presence of a promiscuous proteolytic activity.

CLASP Algorithm
The CLASP algorithm was described previously [5]. Given the active site residues from a protein with known structure, a signature encapsulating the spatial and electrostatic properties of the catalytic site is used to search for congruent matches in a query protein, generating a score which reflects the likelihood that the activity in the reference protein exists in the query protein.
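The search step can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: atoms are plain tuples of residue type, coordinates, and potential; the search is a brute-force enumeration; and the score (a weighted mean absolute deviation with arbitrary weights `w_dist` and `w_pot`) is a hypothetical stand-in for the CLASP scoring function described in [5].

```python
from itertools import combinations, permutations
from math import dist


def pair_features(atoms):
    """Pairwise (distance, potential difference) for an ordered atom list.

    Each atom is a tuple: (residue_type, (x, y, z), potential_in_kT_per_e).
    """
    return [(dist(a[1], b[1]), a[2] - b[2]) for a, b in combinations(atoms, 2)]


def best_match(reference, candidates, w_dist=1.0, w_pot=0.01):
    """Return (score, atoms) for the candidate set most congruent with the
    reference motif, or None if no residue-type-compatible set exists.

    The score is a hypothetical stand-in, NOT the published CLASP score:
    a weighted mean absolute deviation of pairwise distances and
    potential differences. Lower means more congruent.
    """
    ref_types = [a[0] for a in reference]
    ref_feats = pair_features(reference)
    best = None
    # Brute force over ordered selections; fine for small candidate lists.
    for combo in permutations(candidates, len(reference)):
        if [a[0] for a in combo] != ref_types:
            continue
        feats = pair_features(combo)
        d_dev = sum(abs(f[0] - r[0]) for f, r in zip(feats, ref_feats)) / len(feats)
        p_dev = sum(abs(f[1] - r[1]) for f, r in zip(feats, ref_feats)) / len(feats)
        score = w_dist * d_dev + w_pot * p_dev
        if best is None or score < best[0]:
            best = (score, combo)
    return best
```

Because the weights here are arbitrary, scores from this sketch are not comparable to the empirical 0.1 threshold used with CLASP; the example only shows how spatial and electrostatic deviations can be combined into a single ranking.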
Adaptive Poisson-Boltzmann Solver [77] (APBS) and the PDB2PQR package [78] were used to calculate the potential difference between the reactive atoms of the corresponding proteins. The APBS parameters are set as follows: solute dielectric, 2; solvent dielectric, 78; solvent probe radius, 1.4 Å; temperature, 298 K; and ionic strength, 0. APBS writes out the electrostatic potential in dimensionless units of kT/e where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron. All protein structures were rendered by PyMol (http://www.pymol.org/).
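To relate the dimensionless kT/e units to a physical scale, the short sketch below converts a potential to millivolts at a given temperature. The function name is introduced here for illustration; the constants are the exact SI values of Boltzmann's constant and the elementary charge.

```python
BOLTZMANN_J_PER_K = 1.380649e-23       # exact SI value
ELEMENTARY_CHARGE_C = 1.602176634e-19  # exact SI value


def kt_per_e_to_millivolts(value, temperature_k=298.0):
    """Convert a potential from units of kT/e to millivolts."""
    volts_per_unit = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return value * volts_per_unit * 1000.0
```

At the 298 K used here, one unit of kT/e corresponds to roughly 25.7 mV, so a potential difference of 100 kT/e between two atoms is about 2.57 V.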

Protein, Substrate, and Reagents
PI-PLC was purchased from Sigma. Trypsin inhibitor from chicken egg white and PMSF (phenylmethylsulfonyl fluoride) were obtained from Roche.

PI-PLC Assay and Inhibition Using Trypsin Inhibitors
Vesicle preparation and characterization. The appropriate lipids (phosphatidylinositol/phosphatidylethanolamine/phosphatidylcholine/cholesterol, 40:30:15:15 ratio) were mixed in organic solution and the solvent (a chloroform/methanol/hydrochloric acid mixture, 200/100/1 by volume) was evaporated to dryness under N2. Solvent traces were removed by evacuating the lipids for at least 2 h. The lipids were then rehydrated in 10 mM Hepes buffer with 150 mM NaCl, pH 7.5. Large unilamellar vesicles (LUV) were prepared from the swollen lipids by extrusion and sized using 0.1 μm Nuclepore filters, as described by Ahyayauch et al. [79]. The average size of LUV was measured by quasi-elastic light scattering using a Malvern Zeta-sizer. Lipid concentration, determined by phosphate analysis, was 0.3 mM in all experiments.
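The per-lipid concentrations implied by a mixing ratio can be worked out with a short calculation. The sketch below assumes that the 40:30:15:15 ratio is a molar ratio and that the 0.3 mM figure refers to the total of all four components; neither assumption is stated explicitly in the text (phosphate analysis, for instance, would not detect cholesterol), so the numbers are indicative only. The function name is hypothetical.

```python
def component_concentrations(ratios, total_mm):
    """Split a total lipid concentration (mM) according to molar ratios."""
    total_parts = sum(ratios.values())
    if total_parts <= 0:
        raise ValueError("ratios must sum to a positive number")
    return {name: total_mm * parts / total_parts
            for name, parts in ratios.items()}


# Ratio and total concentration as given in the vesicle preparation text;
# treating 0.3 mM as the total over all four components is an assumption.
mixture = {"PI": 40, "PE": 30, "PC": 15, "CH": 15}
per_lipid_mm = component_concentrations(mixture, total_mm=0.3)
```

Under these assumptions the PI substrate would be present at about 0.12 mM in the mixed vesicles, compared with 0.3 mM in pure PI vesicles.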
Aggregation assay. All assays were carried out at 39°C with continuous stirring in 10 mM Hepes buffer (pH 7.5) with 150 mM NaCl and 0.1% BSA for optimum catalytic activity. The enzyme concentration was 0.16 U/mL. Lipid aggregation was monitored in a Cary Varian UV-visible spectrometer as an increase in turbidity (absorbance at 450 nm), as described by Villar et al. [80].

MALDI-TOF Analysis and Edman Degradation
MALDI-TOF mass spectrometric analysis was performed using an UltraFlextreme MALDI-TOF (Bruker Daltonics, Germany). Positive ionization and linear mode were used. The experimental parameters were: laser power, 60%; voltage, 25 kV; and mass difference in linear mode with external calibration, <±100 ppm (<±0.01%). The matrix was sinapinic acid. The external calibration standard consisted of insulin, ubiquitin, cytochrome C, and myoglobin. Edman degradation was performed by Intas Pharma (http://intaspharma.com/).
Table S1. Percentage identity/similarity among all proteases chosen for the search module and the PI-PLC and PC-PLC from Bacillus cereus. (PDF)

Figure 1. Superimposed active sites of trypsin and PI-PLC based on the active site match: His/57/NE2, Asp/102/OD1, and Ser/195/OG from PDBid:1A0J and His/32/NE2, Asp/67/OD1, and Ser/234/OG from PDBid:1PTD, respectively. (a) Superimposed proteins. Trypsin (PDBid:1A0J) is in blue and PI-PLC (PDBid:1PTD) is in grey. After superimposition, all three atoms in both proteins lie on the same plane (Z = 0), such that His57 and His32 (colored in black) lie on the coordinate center and Asp102 and Asp67 lie on the X-Y plane (Y = 0). The active site residues of trypsin are red and those of PI-PLC, yellow. His32, Asp67, His82, and Asp274 are all part of the active site scaffold in PI-PLC [30]. (b) Distances between pairs of residues in the matches in Å. (c) Potential differences between pairs of residues in the matches. Electrostatic potential in dimensionless units of kT/e where k is Boltzmann's constant, T is the temperature in K and e is the charge of an electron. doi:10.1371/journal.pone.0070923.g001
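The superimposition convention in the Figure 1 caption (first atom at the origin, second atom at Y = 0 within the Z = 0 plane, third atom in the Z = 0 plane) can be reproduced with a small coordinate transform. The sketch below is one straightforward way to build such a frame and is not taken from the authors' code; all function names are hypothetical.

```python
from math import sqrt


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = sqrt(_dot(a, a))
    if norm == 0:
        raise ValueError("degenerate (coincident or collinear) atoms")
    return (a[0] / norm, a[1] / norm, a[2] / norm)


def canonical_frame(p1, p2, p3):
    """Return a function mapping any point into the frame where p1 is the
    origin, p2 lies on the X axis (Y = 0, Z = 0), and p3 lies in the
    Z = 0 plane."""
    x_axis = _unit(_sub(p2, p1))
    z_axis = _unit(_cross(x_axis, _sub(p3, p1)))
    y_axis = _cross(z_axis, x_axis)

    def transform(point):
        v = _sub(point, p1)
        return (_dot(v, x_axis), _dot(v, y_axis), _dot(v, z_axis))

    return transform
```

Building one frame from the three matched atoms of each protein and transforming every atom of that protein with it places both motifs in the same plane, which makes differences in the pairwise distances directly visible.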

Figure S1. Linear regression for the inhibition of PI-PLC activity. (a) Inhibition of PI-PLC activity on phosphatidylinositol (PI) by trypsin inhibitor AEBSF. (b) Inhibition of PI-PLC activity on PI and phosphatidylcholine (PC), cholesterol (CH), and phosphatidylethanolamine (PE) by trypsin inhibitor AEBSF. (PDF)

Table 1. Proteases from different families.

Table 2. Active site residues, distances (D), and potential difference (PD) of residue pairs for proteins from each major class in the MEROPS database.