GPS-CCD: A Novel Computational Program for the Prediction of Calpain Cleavage Sites

  • Zexian Liu ,

    Contributed equally to this work with: Zexian Liu, Jun Cao

    Affiliations Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Department of Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China

  • Jun Cao ,

    Contributed equally to this work with: Zexian Liu, Jun Cao

    Affiliation Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China

  • Xinjiao Gao,

    Affiliation Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China

  • Qian Ma,

    Affiliation Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, China

  • Jian Ren ,

    xueyu@mail.hust.edu.cn (YX); renjian.sysu@gmail.com (JR)

    Affiliation State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University (SYSU), Guangzhou, Guangdong, China

  • Yu Xue

    xueyu@mail.hust.edu.cn (YX); renjian.sysu@gmail.com (JR)

    Affiliation Hubei Bioinformatics and Molecular Imaging Key Laboratory, Department of Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China


Abstract

As one of the most essential post-translational modifications (PTMs) of proteins, proteolysis, especially calpain-mediated cleavage, plays an important role in many biological processes, including cell death/apoptosis, cytoskeletal remodeling, and the cell cycle. Experimental identification of calpain targets with bona fide cleavage sites is fundamental for dissecting the molecular mechanisms and biological roles of calpain cleavage. In contrast to time-consuming and labor-intensive experimental approaches, computational prediction of calpain cleavage sites might more cheaply and readily provide useful information for further experimental investigation. In this work, we constructed a novel software package of GPS-CCD (Calpain Cleavage Detector) for the prediction of calpain cleavage sites, with an accuracy of 89.98%, sensitivity of 60.87% and specificity of 90.07%. With this software, we annotated potential calpain cleavage sites for hundreds of calpain substrates, for which the exact cleavage sites had not been previously determined. In this regard, GPS-CCD 1.0 is considered to be a useful tool for experimentalists. The online service and local packages of GPS-CCD 1.0 were implemented in JAVA and are freely available at: http://ccd.biocuckoo.org/.

Introduction

Calpains constitute an important family of Ca2+-dependent cysteine proteases, which contain a nucleophilic cysteine in the catalytically active site [1]–[7]. Calpains are widely expressed in mammals and conserved across eukaryotes [1]–[5], [8], [9]. For instance, in budding yeast, at least one calpain-like protease, Rim13/Cpl1, has been identified, although its functions are still elusive [8], [9]. In humans, there are over 14 distinct members of the calpain superfamily, some of which are tissue specific. Calpain 1 (µ-calpain, micromolar Ca2+-requiring) and Calpain 2 (m-calpain, millimolar Ca2+-requiring) are ubiquitously expressed and well-characterized isoforms [1], [2], [4], [5]. Through spatial and temporal cleavage of a variety of substrates to change their conformation, function and stability [1]–[4], Ca2+-activated calpains play an important role in numerous biological processes, including the regulation of gene expression, signal transduction, cell death/apoptosis, remodeling of cytoskeletal attachments during cell fusion/motility and cell cycle progression [1]–[4], [6], [10]–[12]. Moreover, calpain aberrancies are frequently implicated in a variety of diseases and cancers [5]–[7], [13], [14]. Although many studies have tried to dissect the regulatory roles and molecular mechanisms of calpain-dependent cleavage, our understanding of calpain is still fragmentary.

Identification of the site-specific calpain substrates is fundamental for dissecting the roles of calpain cleavage in numerous biological pathways. Besides the conventional experimental approaches with Edman N-terminal sequencing or mass spectrometry (MS) [12], [15], a peptide library approach was also designed to investigate the sequence/structural specificities of calpains [16]–[18]. Thus far, hundreds of calpain-cleaved proteins have been experimentally identified, including structural proteins, membrane receptors, and transcription factors [12], [15]–[18]. However, high-throughput techniques for the identification of calpain substrates are still limited. Recently, as a complement to time-consuming and labor-intensive experimental methods, the development of computational approaches has also promoted the discovery of proteolytic cleavage sites [16], [19]–[22]. In a previous study [16], Tompa et al. collected 106 calpain cleavage sites in 49 substrates from the scientific literature, and determined the amino acid preferences around the cleavage bond, from P4 (upstream) to P7' (downstream). They constructed a position-specific scoring matrix (PSSM), and observed that the preferred residues for µ-calpain and m-calpain recognition are Leu, Thr and Val in the P2 position, and Lys, Tyr and Arg in the P1 position [16]. Based on this rationale, they synthesized a short peptide of TPLK|SPPPSPR (“|” is the potentially cleaved position), which was experimentally verified to be a superior substrate of calpain [16]. With a similar algorithm, Boyd et al. developed an online tool, PoPS (Prediction of Protease Specificity), which allows researchers to use their own training data for building computational models and predicting protease specificity [19], [20]. Based on a frequency and substitution matrix scoring strategy, SitePrediction was designed for predicting Calpain 1 and Calpain 2 specific cleavage sites, respectively [21]. Recently, duVerle et al. also constructed a web service for the prediction of calpain cleavage sites [22]. Although a number of predictors have been implemented, further effort is needed to improve the prediction accuracy.

In this work, we collected 368 experimentally verified calpain cleavage sites in 130 proteins (Supplementary Table S1). With a previously released algorithm, GPS (Group-based Prediction System) [23], we developed a novel software package, GPS-CCD (Calpain Cleavage Detector), for the prediction of calpain cleavage sites. The leave-one-out validation and 4-, 6-, 8-, 10-fold cross-validations were performed to evaluate the performance of the prediction system. By comparison, the GPS 2.0 algorithm was employed for its outstanding prediction performance, with an accuracy of 89.98%, a sensitivity of 60.87% and a specificity of 90.07%. Furthermore, there are many proteins experimentally identified as calpain substrates for which the exact cleavage sites have not been verified, and we collected 196 such proteins from PubMed (Supplementary Table S2). As an application, we predicted potential calpain cleavage sites for these targets (Supplementary Table S2). These prediction results might be a useful resource for further experimental investigation. Finally, the online service and local packages of GPS-CCD 1.0 were implemented in JAVA 1.5 (J2SE 5.0) and are freely available for academic researchers at: http://ccd.biocuckoo.org/.

Methods

Data preparation

We searched the scientific literature from PubMed with the keyword of “calpain” to obtain the experimentally verified calpain substrates with cleavage sites (before June 30th, 2010). The data collected by Tompa et al. and duVerle et al. were also integrated [16], [22], while the protein sequences were retrieved from the UniProt database.

We defined a calpain cleavage peptide CCP(m, n) as a cleavage bond flanked by m residues upstream and n residues downstream. As previously described [23], [24], we regarded all experimentally verified cleavage sites as positive data (+), while all other non-cleavage sites in the same substrates were taken as negative data (−). If a cleavage site is located near the N- or C-terminus of the protein and the length of the peptide is smaller than m+n, we added one or multiple “*” characters as pseudo amino acids to complement the CCP(m, n). The positive data (+) set for training might contain several homologous sites from homologous proteins. If the training data were highly redundant with too many homologous sites, the prediction accuracy would be overestimated. To avoid such overestimation, we clustered the protein sequences with a threshold of 40% identity by CD-HIT [25]. If two proteins were similar with ≥40% identity, we re-aligned the proteins with BL2SEQ, a program in the BLAST package [26], and checked the results manually. If two calpain cleavage sites from two homologous proteins were at the same position after sequence alignment, only one item was preserved and the other was discarded. Finally, the non-redundant benchmark data set for training contained 368 positive sites from 130 unique substrates (Supplementary Table S1).
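The CCP(m, n) construction can be illustrated with a minimal Python sketch. The function names `extract_ccp` and `label_sites` are hypothetical and are not part of GPS-CCD. The sketch assumes that a cleavage position is the 1-based index of the residue preceding the cleaved bond, and that every other peptide bond in the same substrate counts as a negative site.

```python
def extract_ccp(sequence, position, m, n, pad="*"):
    """Return the CCP(m, n) peptide for the bond after 1-based residue `position`.

    The peptide holds m residues upstream (ending at `position`) and n residues
    downstream; residues missing at either terminus are replaced by `pad`.
    """
    upstream = sequence[max(0, position - m):position]
    downstream = sequence[position:position + n]
    return upstream.rjust(m, pad) + downstream.ljust(n, pad)


def label_sites(sequence, cleavage_positions, m, n):
    """Yield (peptide, is_positive) for every candidate bond in one substrate."""
    known = set(cleavage_positions)
    # Assumption: candidate bonds follow residues 1 .. len-1, because the
    # last residue has no following bond.
    for position in range(1, len(sequence)):
        yield extract_ccp(sequence, position, m, n), position in known
```

For example, `extract_ccp("ACDEFGHIK", 2, 4, 3)` returns `**ACDEF`, with two pseudo residues padding the N-terminal side.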

The algorithms

To predict the calpain cleavage sites, our previously developed GPS 2.0 algorithm was employed and improved [23]. Based on the hypothesis that similar short peptides exhibit similar biological functions, we can use an amino acid substitution matrix, e.g., BLOSUM62, to evaluate the similarity between two CCP(m, n). As previously described [23], the substitution score between two amino acids a and b can be denoted as Score(a, b). Then the similarity between two CCP(m, n) of A and B, where A[i] and B[i] are the residues at the i-th position of the two peptides, is defined as:

$$S(A, B) = \sum_{i=1}^{m+n} \mathrm{Score}(A[i], B[i])$$

If S(A, B) < 0, we simply redefined it as S(A, B) = 0. A putative CCP(m, n) is compared with each of the experimentally verified cleavage peptides in a pairwise manner to calculate the similarity score. The average value of the substitution scores is regarded as the final score. Then we designed a motif length selection (MLS) approach to exhaustively test the combinations of CCP(m, n) (m = 1, …, 30; n = 1, …, 30). The optimal CCP(m, n) was selected for its highest leave-one-out performance. The Sp value was fixed at 90%.
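The scoring step can be sketched as follows. This is an illustration under stated assumptions, not the GPS-CCD implementation: `toy_score` is a hypothetical stand-in for a BLOSUM62 lookup, and the function names are invented for this example.

```python
def toy_score(a, b):
    """Hypothetical stand-in for a substitution matrix (not the real BLOSUM62)."""
    if a == "*" or b == "*":
        return 0  # pseudo residues contribute nothing in this sketch
    return 4 if a == b else -1


def similarity(pep_a, pep_b, score=toy_score):
    """Sum position-wise substitution scores; negative totals are reset to 0."""
    if len(pep_a) != len(pep_b):
        raise ValueError("peptides must share the same CCP(m, n) length")
    total = sum(score(a, b) for a, b in zip(pep_a, pep_b))
    return max(total, 0)


def gps_score(candidate, known_peptides, score=toy_score):
    """Average similarity of a candidate peptide to all known cleavage peptides."""
    total = sum(similarity(candidate, known, score) for known in known_peptides)
    return total / len(known_peptides)
```

A motif length selection run would then repeat this scoring for each (m, n) pair from 1 to 30 and keep the pair with the best leave-one-out result.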

Previously, we observed that different amino acid substitution matrices yielded different prediction results [23]. To improve the robustness and performance of the prediction system, we developed the novel approach of “Matrix Mutation” (MaM) to generate an optimal or near-optimal matrix [23]. This method was also used in this work. First, BLOSUM62 was chosen as the initial matrix, and the leave-one-out validation was calculated. In BLOSUM62, the substitution score between “*” and other residues is −4, which we redefined as 0. Then we fixed the specificity (Sp) at 90% and tried to improve the sensitivity (Sn) by randomly picking out one value from the BLOSUM62 matrix for mutation (+1 or −1). If the Sn value increased, the mutation was adopted. This process was terminated when the Sn value did not increase any further. The training order of MLS followed by MaM cannot be reversed.
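The MaM procedure amounts to a greedy random search over matrix entries. The sketch below is a simplified illustration: the names `sn_at_fixed_sp` and `matrix_mutation`, the `patience` stopping rule and the `evaluate` callback are all assumptions, since the text does not specify how the end of the improvement is detected.

```python
import random


def sn_at_fixed_sp(pos_scores, neg_scores, sp=0.90):
    """Sensitivity at the lowest cutoff whose specificity is at least `sp`.

    A site is predicted positive when its score is >= the cutoff.
    """
    for cutoff in sorted(set(pos_scores) | set(neg_scores)):
        true_neg = sum(s < cutoff for s in neg_scores)
        if true_neg / len(neg_scores) >= sp:
            return sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    return 0.0  # no finite cutoff reaches the requested specificity


def matrix_mutation(matrix, evaluate, patience=200, seed=0):
    """Greedy hill climbing over a dict {(a, b): score}.

    `evaluate(matrix)` should return the Sn obtained at the fixed Sp.
    Assumption: the search stops after `patience` consecutive rejected mutations.
    """
    rng = random.Random(seed)
    best = dict(matrix)
    best_sn = evaluate(best)
    keys = sorted(best)
    failures = 0
    while failures < patience:
        trial = dict(best)
        trial[rng.choice(keys)] += rng.choice((1, -1))  # mutate one entry by +1 or -1
        trial_sn = evaluate(trial)
        if trial_sn > best_sn:  # adopt only strict improvements
            best, best_sn, failures = trial, trial_sn, 0
        else:
            failures += 1
    return best, best_sn
```

In practice `evaluate` would rescore the benchmark peptides with the trial matrix under leave-one-out and pass the resulting scores to `sn_at_fixed_sp`.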

Performance evaluation

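As previously described [23], [24], four standard measurements, namely accuracy (Ac), sensitivity (Sn), specificity (Sp) and the Matthews correlation coefficient (MCC), were defined as shown below, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively:

$$Ac = \frac{TP + TN}{TP + FP + TN + FN}, \quad Sn = \frac{TP}{TP + FN}, \quad Sp = \frac{TN}{TN + FP}$$

$$MCC = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP + FN)(TN + FP)(TP + FP)(TN + FN)}}$$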
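These four measurements follow the standard confusion-matrix definitions. A minimal sketch is given here; the function name `performance` is hypothetical, and returning 0 for the MCC when its denominator vanishes is a convention chosen for this example.

```python
import math


def performance(tp, fp, tn, fn):
    """Return Ac, Sn, Sp and MCC from confusion-matrix counts."""
    ac = (tp + tn) / (tp + fp + tn + fn)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # Convention for this sketch: report MCC = 0 when the denominator is zero.
    mcc = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"Ac": ac, "Sn": sn, "Sp": sp, "MCC": mcc}
```

As a consistency check, 224 true positives out of 368 positive sites give Sn ≈ 60.87%, which agrees with the reported sensitivity; any negative counts used alongside it would be illustrative only.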

The self-consistency validation was calculated to evaluate the prediction performance on the benchmark data set. To further estimate the robustness of the prediction system, the leave-one-out validation and 4-, 6-, 8-, 10-fold cross-validations were also carried out. Receiver Operating Characteristic (ROC) curves were drawn, and the AROC (area under the ROC curve) values were calculated.
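An AROC value can be computed directly from the scores of positive and negative sites, since it equals the probability that a randomly chosen positive site scores higher than a randomly chosen negative site. The sketch below uses this pairwise formulation, which is a standard equivalent of integrating the ROC curve and is not necessarily how GPS-CCD computes it; the function name `aroc` is hypothetical.

```python
def aroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The double loop is quadratic in the number of sites; for a large benchmark a rank-based computation would be faster, but the result is the same.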

Implementation of the online service and local packages

The online service and local packages of GPS-CCD 1.0 were implemented in JAVA and are freely available at http://ccd.biocuckoo.org/. For the online service, we tested the GPS-CCD 1.0 on a variety of internet browsers, including Internet Explorer 6.0, Netscape Browser 8.1.3 and Firefox 2 under the Windows XP Operating System (OS), Mozilla Firefox 1.5 of Fedora Core 6 OS (Linux), and Safari 3.0 of Apple Mac OS X 10.4 (Tiger) and 10.5 (Leopard). For the Windows and Linux systems, the latest version of the Java Runtime Environment (JRE) package (JAVA 1.5 or later versions) of Sun Microsystems should be pre-installed. However, for Mac OS, GPS-CCD 1.0 can be directly used without any additional packages. For convenience, we also developed local packages of GPS-CCD 1.0, which worked with the three major Operating Systems, Windows, Linux and Mac.

Results

Development of GPS-CCD with the GPS 2.0 algorithm

In this work, we collected experimentally identified calpain cleavage sites from the scientific literature (Supplementary Table S1). By means of integration with previous studies and a simplification of redundancies, a dataset of 368 experimentally verified calpain cleavage sites in 130 proteins was constructed. Previously, we developed the GPS (Group-based Prediction System) algorithm for the prediction of phosphorylation sites [23], [24]. In contrast to the arbitrarily determined flanking peptides in our previous work [23], [24], here we exhaustively tested the combinations of CCP(m, n). The optimal CCP(10, 4) was selected for its highest leave-one-out performance. Then the scoring matrix BLOSUM62 was also optimized by MaM. After the training to improve performance, the self-consistency validation, the leave-one-out validation and 4-, 6-, 8-, 10-fold cross-validations were thoroughly carried out. ROC curves were drawn, and the AROC values were calculated as 0.946 (self-consistency), 0.838 (leave-one-out), 0.837 (4-fold), 0.853 (6-fold), 0.855 (8-fold) and 0.851 (10-fold), respectively (Figure 1). The self-consistency validation evaluates the prediction accuracy merely on the benchmark data, while the leave-one-out validation and 4-, 6-, 8-, 10-fold cross-validations assess the performance and robustness on an independent data set. Since the results of 4-, 6-, 8-, 10-fold cross-validations were close to the leave-one-out validation, we used the leave-one-out validation as the major performance indicator for further analysis.

Figure 1. The prediction performance of GPS-CCD 1.0.

The self-consistency validation, leave-one-out validation and 4-, 6-, 8-, 10-fold cross-validations were calculated. The Receiver Operating Characteristic (ROC) curves were drawn, and the AROC values were calculated.

https://doi.org/10.1371/journal.pone.0019001.g001

With this performance taken into consideration, we developed a novel predictor, GPS-CCD (Calpain Cleavage Detector). The Ac, Sn and Sp values of GPS-CCD at different cutoff values are presented in Table 1. To avoid too many false positive hits, a high threshold was chosen as the default threshold. As an example, the protein sequence of the human G1 cyclin-dependent kinase 4 inhibitor p19/CDKN2D/INK4d (UniProt ID: P55273) is presented (Figure 2). It was proposed that µ-calpain cleaves CDKN2D after the R25, H29, Q47, G64, L113 and A127 residues, and plays an important role in modulating cell cycle regulatory protein turnover [27]. With the default parameter (high threshold), we successfully predicted four of the six known bonds, after R25, Q47, G64 and A127, with three additional potential cleavage bonds after the S73, G74, and D80 residues (Figure 2).

Figure 2. The screen snapshot of GPS-CCD software.

A high threshold was chosen as the default cut-off. The human cyclin-dependent kinase 4 inhibitor D/CDKN2D (P55273) is presented as an example.

https://doi.org/10.1371/journal.pone.0019001.g002

Comparison of different computational approaches

For comparison, we also investigated the performances of several other approaches or predictors, including the GPS 1.1 algorithm [24], PoPS [19], [20], SitePrediction [21] and CaMPDB [22]. The only difference between GPS 2.0 and GPS 1.1 is that the MaM process is not carried out in GPS 1.1. To avoid any bias, the same training data (368 sites) were used for GPS 1.1, while the CCP(10, 4) was determined with the highest leave-one-out result. Since the PoPS software package allows user-defined computational models [19], [20], we used our training data set to construct a PSSM model in PoPS. Again, the CCP(8, 3) was selected based on the highest leave-one-out result. The leave-one-out results of GPS 1.1 and PoPS were calculated for comparison. Besides a frequency scoring algorithm, SitePrediction also adopted an additional substitution matrix scoring strategy by comparing potential cleavage sites to the known sites, and this method is quite similar to the GPS 1.1 algorithm [21]. Since user-defined models cannot be constructed in SitePrediction, we directly submitted the benchmark data set to calculate the performances of Calpain 1 (SitePrediction 1) and Calpain 2 (SitePrediction 2), respectively.

In Table 1, we fixed the Sp values of GPS 1.1, PoPS and SitePrediction to be similar to those of GPS 2.0 and compared the Sn values. When the Sp value was ∼85%, the Sn values of GPS 2.0, GPS 1.1, PoPS, SitePrediction 1 and SitePrediction 2 were 66.58%, 60.33%, 60.32%, 50.82% and 48.37%, respectively (Table 1). Moreover, when the Sp value was ∼90%, the Sn values of GPS 2.0, GPS 1.1, PoPS, SitePrediction 1 and SitePrediction 2 were 60.87%, 50.00%, 52.45%, 41.30% and 39.67%, respectively (Table 1). In addition, when the Sp value was ∼95%, the Sn of GPS 2.0 (45.92%) was still much better than that of GPS 1.1 (34.51%), PoPS (36.14%), SitePrediction 1 (31.52%) and SitePrediction 2 (28.26%) (Table 1). Previously, it was observed that the accuracy of SitePrediction can be comparable to that of PoPS when the same training and testing data sets are provided [21]. Our analysis confirmed this conclusion: the performance of the SitePrediction-like GPS 1.1 algorithm is quite similar to that of PoPS (Table 1). SitePrediction itself did not exhibit superior performance because of its limited training data. Taken together, the prediction performance of the GPS 2.0 algorithm was much better than that of the other methods. In addition, ROC curves were drawn, and the AROC value of the GPS 2.0 algorithm was generally better than those of the other approaches (Figure 3A).

Figure 3. Comparison of GPS 2.0, GPS 1.1 [24], PoPS [19], [20], SitePrediction [21] and CaMPDB [22].

The leave-one-out performances were calculated for GPS 2.0, GPS 1.1 and PoPS. We calculated the accuracy of SitePrediction by directly submitting the benchmark data set for the prediction. (A) The data set contains 368 cleavage sites in 130 unique substrates; (B) For CaMPDB, we took 267 cleavage sites in 104 proteins from its website [22]. The highest AROC value in CaMPDB was 0.801.

https://doi.org/10.1371/journal.pone.0019001.g003

In CaMPDB, duVerle et al. developed a calpain cleavage site predictor with a training data set containing 267 cleavage sites in 104 proteins (http://www.calpain.org/prediction_view.rb) [22]. The tool always predicts 10 potential cleavage sites for any given protein sequence. If we divide one sequence into two fragments as inputs, the prediction results are different from those for the original sequence. Also, if we input a putative sequence such as ‘AAAAAAAAAAA’, the program still provides 10 positive hits. For this reason, the Ac, Sn, Sp and MCC values cannot be estimated. However, the authors calculated the AROCs of different methods, and the highest AROC was 0.801 for the Support Vector Machine (SVM) algorithm with a Radial Basis Function (RBF) kernel [22]. To avoid any bias, we used the same data set (267 sites) for comparison. After training, the optimal CCP(8, 12) was determined for GPS 2.0 and GPS 1.1, while the CCP(6, 3) was selected for PoPS. Again, leave-one-out ROC curves were drawn, and the AROC results were 0.846, 0.806, and 0.809 for GPS 2.0, GPS 1.1 and PoPS, respectively (Figure 3B). Thus the performances of GPS 1.1 and PoPS are similar to that reported in the previous study, while GPS 2.0 performs considerably better.

Large-scale prediction of calpain cleavage sites in proteins

While a large number of proteins have been experimentally verified to be cleaved by calpains, the bona fide cleavage sites still need to be elucidated. As an application of GPS-CCD 1.0, we first collected 196 calpain cleavage substrates from the scientific literature (Supplementary Table S2). With the default threshold (high), we predicted potential calpain cleavage sites for these proteins (Supplementary Table S2). The prediction results should be useful for further experimental verification. Several examples were randomly picked out, and their prediction results are presented in Figure 4 with the help of DOG 1.0 [28].

Figure 4. Applications of GPS-CCD 1.0.

Here we predicted the potential calpain cleavage sites in the experimentally identified calpain substrates with the default threshold. (A) The human NCS1 (P62166); (B) The human PEBP (P30086); (C) The rat Ptpn5 (P35234); (D) The Aplysia atypical PKC (C3VIX7).

https://doi.org/10.1371/journal.pone.0019001.g004

It was proposed that chronic exposure to paclitaxel (Taxol) activates µ-calpain and diminishes inositol trisphosphate (InsP3)-mediated Ca2+ signaling, through cleaving and degrading neuronal calcium sensor-1/NCS1 (P62166) [29]. However, the precise cleavage sites have not been experimentally identified. Here, we predicted that the human NCS1 protein might be cleaved after G2, N5, K7, T17, and K174 (Figure 4A). Interestingly, most of these potential sites are located in the N-terminus of the protein, whereas the K174 site is at the boundary between the EF-hand 4 domain and the IL1RAPL1 interaction domain. None of them is located within the EF-hand domains. As a serine protease inhibitor, human phosphatidylethanolamine-binding protein 1/PEBP (P30086) was identified as an in vitro and in situ calpain substrate, with the bona fide cleavage sites again not yet determined [30]. In a model of brain injury, activated calpain leads to PEBP degradation and enhances the chymotrypsin-like activity of the proteasome [30]. We predicted that PEBP might be cleaved after G108 and R161 (Figure 4B). Since both sites are located in the phosphatidylethanolamine-binding domain, PEBP proteolysis by calpain might disrupt its original roles to alleviate impaired proteasome function in Alzheimer's disease (AD) [30]. Recent work by Xu et al. suggested that extrasynaptic NMDA receptors have an important role in excitotoxicity via the calpain-mediated cleavage of striatum-enriched protein-tyrosine phosphatase STEP/Ptpn5 (P35234) [31]. We predicted that STEP might be cleaved after the S52, S84, T146, Q366 and S367 residues (Figure 4C). In addition, an atypical protein kinase C (C3VIX7) isolated from Aplysia californica was demonstrated to be a calpain substrate [32]. Here we predicted the cleavage bonds to be after G187, N192, G471, G520, Q531 and Y536 (Figure 4D).

Discussion

Calpain-mediated cleavage is an important PTM of proteins [1]–[9]. The identification of new calpain substrates with cleavage sites is the key step to establishing a foundation for understanding the regulatory roles of the calpain cleavage processes. Although many studies have investigated the functions and biological roles of calpain cleavage in various cellular processes, an unambiguous consensus motif has still not been detected for either µ-calpain or m-calpain [16]–[18]. In contrast to labor-intensive and expensive experimental approaches, the computational prediction of calpain cleavage sites is comparatively simple, and might therefore be of great help in providing information for further experimental verification.

To date, hundreds of calpain cleavage sites have been experimentally identified, and a large number of these known sites have been collected in a variety of public databases [22], [33]–[35]. For example, the proteolytic event database CutDB contains 63 known calpain substrates with 165 cleavage sites [33], whereas the peptidase database MEROPS has collected 101 Calpain 1 and 147 Calpain 2 sites, respectively [34], [35]. Also, the recently constructed database CaMPDB collected 104 experimentally identified calpain targets with 267 sites by literature curation [22]. Based on these experimental data, several computational tools have been developed for the prediction of calpain cleavage sites. For example, SitePrediction can distinguish between calpains, with a training data set containing 79 Calpain 1 and 103 Calpain 2 sites from the MEROPS database [21]. Moreover, with 47 Calpain 1 and 57 Calpain 2 sites in Homo sapiens, SitePrediction also provides organism-specific predictions [21]. In addition, several extra features for calpain cleavage site prediction, such as PEST sequences (short peptides rich in Pro/P, Glu/E, Ser/S and Thr/T), solvent accessibility and secondary structure, were considered and analyzed in PoPS and SitePrediction [19]–[21].

In this study, we presented a novel predictor, GPS-CCD, with an improved GPS 2.0 algorithm [23]. In our benchmark data set, the number of experimentally identified calpain cleavage sites is still limited, while the specific calpain information for a considerable proportion of known sites is ambiguous. For this reason, GPS-CCD predictions did not distinguish among different calpain isoforms, as in a previous study [22]. In our comparisons, the GPS 2.0 algorithm achieved higher Sn values at matched Sp levels and higher AROC values than GPS 1.1, PoPS and SitePrediction. As an application, potential cleavage sites were predicted for substrates identified in previous studies (Supplementary Table S2). We therefore conclude that GPS-CCD 1.0 is a useful tool for pinpointing potential calpain cleavage sites, while computational predictions followed by experimental verification should lead to an improved identification of calpain substrates in the near future.

Continuous efforts and a spate of reports have assigned many functions to calpains, which target a broad range of substrates in a variety of biological processes. The collection of calpain substrates from the literature provided the opportunity to analyze the functional abundance and diversity of calpain cleavage processes. With a hypergeometric distribution [36], we statistically analyzed the enriched biological processes, molecular functions and cellular components with gene ontology (GO) annotations for the human calpain substrates (Supplementary Table S3). The GO association files were downloaded from the GOA database (EBI, on June 29th, 2010) [37]. For biological processes, our analysis suggests that calpain substrates are enriched in responses to a variety of stimuli, such as drug (GO:0042493), corticosterone stimulus (GO:0051412), organic nitrogen (GO:0010243) and so on (Supplementary Table S3). Calpain cleavage is also highly implicated in the regulation of the mitochondrial membrane (GO:0046902, GO:0051881) and apoptosis (GO:0043066, GO:0042981, GO:0006916) (Supplementary Table S3). Also, the significantly over-represented molecular functions of human calpain substrates are protein activity and various types of molecular binding, which can be dynamically regulated by cleavage (Supplementary Table S3). Moreover, calpain cleavage targets are distributed in a variety of subcellular localizations, such as cytoplasm (GO:0005737), cytosol (GO:0005829), axon (GO:0030424), actin cytoskeleton (GO:0015629), and nucleoplasm (GO:0005654) (Supplementary Table S3). Taken together, our analysis can be a good start for further investigating the molecular mechanisms of calpain cleavage.
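A hypergeometric enrichment test of this kind can be sketched as follows. The function name `enrichment` and the one-sided (over-representation) tail are assumptions, as is the reading of the E-ratio as the proportion of annotated substrates divided by the proportion of annotated proteins in the proteome. Here k is the number of substrates carrying a GO term, n the number of substrates, K the number of proteome proteins carrying the term and N the proteome size.

```python
from math import comb


def enrichment(k, n, K, N):
    """Return (E-ratio, p-value) for observing at least k annotated substrates.

    The p-value is the upper tail P(X >= k) of a hypergeometric distribution.
    """
    e_ratio = (k / n) / (K / N)
    # Exact integer arithmetic avoids overflow for proteome-sized inputs.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return e_ratio, tail / comb(N, n)
```

With the sizes stated for Table S3, a call would look like `enrichment(k, 176, K, 18262)`, where k and K are taken from the GO annotation counts of the term being tested.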

Supporting Information

Table S1.

We collected 368 experimentally identified calpain cleavage sites in 130 unique proteins from the scientific literature (PubMed). a. UniProt, the UniProt accession number; b. Position, the position of a calpain cleavage site, while its following bond can be disrupted by calpain; c. PMID, the primary references.

https://doi.org/10.1371/journal.pone.0019001.s001

(XLS)

Table S2.

From previous experimental studies, we also collected 196 calpain cleavage substrates. The exact calpain cleavage sites had not yet been experimentally determined. The default threshold (high) was adopted for GPS-CCD 1.0.

https://doi.org/10.1371/journal.pone.0019001.s002

(XLS)

Table S3.

The top 15 most enriched processes, functions and localizations of human calpain cleavage substrates. From Table S1 and Table S2, we collected 176 human calpain targets. The human proteome contains 18,262 proteins which have at least one GO term. a. the number of proteins annotated; b. the proportion of proteins annotated; c. E-ratio, enrichment ratio.

https://doi.org/10.1371/journal.pone.0019001.s003

(XLS)

Acknowledgments

The authors are thankful to Dr. Christian Schönbach and two anonymous reviewers, whose suggestions have greatly improved the presentation of this manuscript. Pacific Edit reviewed the manuscript prior to submission.

Author Contributions

Conceived and designed the experiments: YX JR. Performed the experiments: ZL JC JR YX. Analyzed the data: ZL JC JR YX. Contributed reagents/materials/analysis tools: XG QM. Wrote the paper: ZL JC JR YX.

References

  1. 1. Croall DE, Ersfeld K (2007) The calpains: modular designs and functional diversity. Genome Biol 8: 218.DE CroallK. Ersfeld2007The calpains: modular designs and functional diversity.Genome Biol8218
  2. 2. Franco SJ, Huttenlocher A (2005) Regulating cell migration: calpains make the cut. J Cell Sci 118: 3829–3838.SJ FrancoA. Huttenlocher2005Regulating cell migration: calpains make the cut.J Cell Sci11838293838
  3. 3. Glading A, Lauffenburger DA, Wells A (2002) Cutting to the chase: calpain proteases in cell motility. Trends Cell Biol 12: 46–54.A. GladingDA LauffenburgerA. Wells2002Cutting to the chase: calpain proteases in cell motility.Trends Cell Biol124654
  4. 4. Goll DE, Thompson VF, Li H, Wei W, Cong J (2003) The calpain system. Physiol Rev 83: 731–801.DE GollVF ThompsonH. LiW. WeiJ. Cong2003The calpain system.Physiol Rev83731801
  5. 5. Huang Y, Wang KK (2001) The calpain family and human disease. Trends Mol Med 7: 355–362.Y. HuangKK Wang2001The calpain family and human disease.Trends Mol Med7355362
  6. 6. Liu J, Liu MC, Wang KK (2008) Calpain in the CNS: from synaptic function to neurotoxicity. Sci Signal 1: re1.J. LiuMC LiuKK Wang2008Calpain in the CNS: from synaptic function to neurotoxicity.Sci Signal1re1
  7. 7. Zatz M, Starling A (2005) Calpains and disease. N Engl J Med 352: 2413–2423.M. ZatzA. Starling2005Calpains and disease.N Engl J Med35224132423
  8. Futai E, Maeda T, Sorimachi H, Kitamoto K, Ishiura S, et al. (1999) The protease activity of a calpain-like cysteine protease in Saccharomyces cerevisiae is required for alkaline adaptation and sporulation. Mol Gen Genet 260: 559–568.
  9. Hayashi M, Fukuzawa T, Sorimachi H, Maeda T (2005) Constitutive activation of the pH-responsive Rim101 pathway in yeast mutants defective in late steps of the MVB/ESCRT pathway. Mol Cell Biol 25: 9478–9490.
  10. Squier MK, Sehnert AJ, Sellins KS, Malkinson AM, Takano E, et al. (1999) Calpain and calpastatin regulate neutrophil apoptosis. J Cell Physiol 178: 311–319.
  11. Tan Y, Wu C, De Veyra T, Greer PA (2006) Ubiquitous calpains promote both apoptosis and survival signals in response to different cell death stimuli. J Biol Chem 281: 17689–17698.
  12. Yousefi S, Perozzo R, Schmid I, Ziemiecki A, Schaffner T, et al. (2006) Calpain-mediated cleavage of Atg5 switches autophagy to apoptosis. Nat Cell Biol 8: 1124–1132.
  13. Arrington DD, Van Vleet TR, Schnellmann RG (2006) Calpain 10: a mitochondrial calpain and its role in calcium-induced mitochondrial dysfunction. Am J Physiol Cell Physiol 291: C1159–1171.
  14. Williams A, Sarkar S, Cuddon P, Ttofi EK, Saiki S, et al. (2008) Novel targets for Huntington's disease in an mTOR-independent autophagy pathway. Nat Chem Biol 4: 295–305.
  15. Ottens AK, Golden EC, Bustamante L, Hayes RL, Denslow ND, et al. (2008) Proteolysis of multiple myelin basic protein isoforms after neurotrauma: characterization by mass spectrometry. J Neurochem 104: 1404–1414.
  16. Tompa P, Buzder-Lantos P, Tantos A, Farkas A, Szilagyi A, et al. (2004) On the sequential determinants of calpain cleavage. J Biol Chem 279: 20775–20785.
  17. Cuerrier D, Moldoveanu T, Davies PL (2005) Determination of peptide substrate specificity for mu-calpain by a peptide library-based approach: the importance of primed side interactions. J Biol Chem 280: 40632–40641.
  18. Thomas DA, Francis P, Smith C, Ratcliffe S, Ede NJ, et al. (2006) A broad-spectrum fluorescence-based peptide library for the rapid identification of protease substrates. Proteomics 6: 2112–2120.
  19. Boyd SE, Garcia de la Banda M, Pike RN, Whisstock JC, Rudy GB (2004) PoPS: a computational tool for modeling and predicting protease specificity. Proc IEEE Comput Syst Bioinform Conf 372–381.
  20. Boyd SE, Pike RN, Rudy GB, Whisstock JC, Garcia de la Banda M (2005) PoPS: a computational tool for modeling and predicting protease specificity. J Bioinform Comput Biol 3: 551–585.
  21. Verspurten J, Gevaert K, Declercq W, Vandenabeele P (2009) SitePredicting the cleavage of proteinase substrates. Trends Biochem Sci 34: 319–323.
  22. duVerle D, Takigawa I, Ono Y, Sorimachi H, Mamitsuka H (2010) CaMPDB: a resource for calpain and modulatory proteolysis. Genome Inform 22: 202–213.
  23. Xue Y, Ren J, Gao X, Jin C, Wen L, et al. (2008) GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteomics 7: 1598–1608.
  24. Xue Y, Zhou F, Zhu M, Ahmed K, Chen G, et al. (2005) GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res 33: W184–187.
  25. Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658–1659.
  26. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.
  27. Joy J, Nalabothula N, Ghosh M, Popp O, Jochum M, et al. (2006) Identification of calpain cleavage sites in the G1 cyclin-dependent kinase inhibitor p19(INK4d). Biol Chem 387: 329–335.
  28. Ren J, Wen L, Gao X, Jin C, Xue Y, et al. (2009) DOG 1.0: illustrator of protein domain structures. Cell Res 19: 271–273.
  29. Boehmerle W, Zhang K, Sivula M, Heidrich FM, Lee Y, et al. (2007) Chronic exposure to paclitaxel diminishes phosphoinositide signaling by calpain-mediated neuronal calcium sensor-1 degradation. Proc Natl Acad Sci U S A 104: 11103–11108.
  30. Chen Q, Wang S, Thompson SN, Hall ED, Guttmann RP (2006) Identification and characterization of PEBP as a calpain substrate. J Neurochem 99: 1133–1141.
  31. Xu J, Kurup P, Zhang Y, Goebel-Goody SM, Wu PH, et al. (2009) Extrasynaptic NMDA receptors couple preferentially to excitotoxicity via calpain-mediated cleavage of STEP. J Neurosci 29: 9330–9343.
  32. Bougie JK, Lim T, Farah CA, Manjunath V, Nagakura I, et al. (2009) The atypical protein kinase C in Aplysia can form a protein kinase M by cleavage. J Neurochem 109: 1129–1143.
  33. Igarashi Y, Eroshkin A, Gramatikova S, Gramatikoff K, Zhang Y, et al. (2007) CutDB: a proteolytic event database. Nucleic Acids Res 35: D546–549.
  34. Rawlings ND (2009) A large and accurate collection of peptidase cleavages in the MEROPS database. Database (Oxford) 2009: bap015.
  35. Rawlings ND, Morton FR, Kok CY, Kong J, Barrett AJ (2008) MEROPS: the peptidase database. Nucleic Acids Res 36: D320–325.
  36. Zhou F, Xue Y, Lu H, Chen G, Yao X (2005) A genome-wide analysis of sumoylation-related biological processes and functions in human nucleus. FEBS Lett 579: 3369–3375.
  37. Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, et al. (2009) The GOA database in 2009—an integrated Gene Ontology Annotation resource. Nucleic Acids Res 37: D396–403.