This paper describes a web server developed for designing therapeutic peptides with a desired half-life in blood. In this study, we used 163 natural and 98 modified peptides whose half-life has been determined experimentally in mammalian blood to develop in silico models. Firstly, models were developed on 261 peptides containing natural and modified residues, using different chemical descriptors. The best model, using 43 PaDEL descriptors, achieved a maximum correlation of 0.692 between the predicted and the actual half-life of peptides. Secondly, models were developed on 163 natural peptides using the amino acid composition of peptides and achieved a maximum correlation of 0.643. Thirdly, models were developed on 163 natural peptides using chemical descriptors and attained a maximum correlation of 0.743 using 45 selected PaDEL descriptors. In order to assist researchers in predicting and designing the half-life of peptides, the models developed have been integrated into the PlifePred web server (http://webs.iiitd.edu.in/raghava/plifepred/).
Citation: Mathur D, Singh S, Mehta A, Agrawal P, Raghava GPS (2018) In silico approaches for predicting the half-life of natural and modified peptides in blood. PLoS ONE 13(6): e0196829. https://doi.org/10.1371/journal.pone.0196829
Editor: Hendrik W. van Veen, University of Cambridge, UNITED KINGDOM
Received: June 11, 2017; Accepted: April 22, 2018; Published: June 1, 2018
Copyright: © 2018 Mathur et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: Authors are thankful to funding agencies Council of Scientific and Industrial Research (project Open Source Drug Discovery and GENESIS BSC0121), Department of Science and Technology (DST INSPIRE) Govt. of India for financial support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: One of the co-authors, Gajendra P. S. Raghava, is an academic editor of PLOS ONE. This does not alter the authors’ adherence to all the PLOS ONE policies on sharing data and materials.
Technological advances have revived the pharmaceutical industry’s interest in peptide-based therapeutics [1]. Peptides show diverse therapeutic properties [2,3], such as anticancer [4], antimicrobial [5,6], antiparasitic [7], cell-penetrating [8,9], antihypertensive [10] and tumor-homing [11] activities. Peptides have a number of advantages over small molecule-based drugs, including high specificity and low side effects [1,12]. Despite these advantages, therapeutic peptides still face many roadblocks on the way to the pharmaceutical market. The major hurdle in the development of therapeutic peptides is their short half-life, caused by their susceptibility to enzymatic degradation, which reduces their bioavailability. Different routes of peptide delivery have been explored, including intranasal [13], transdermal [14], oral [15], pulmonary [16] and rectal [17]. The parenteral route is preferred over other routes of administration for efficient systemic delivery, as it prevents cleavage of peptides by gastrointestinal enzymes.
In the past, numerous attempts have been made to increase the half-life of peptides in blood, including cyclization of peptides, incorporation of modified residues and terminal modifications [18]. These methods not only enhance the in vivo half-life but also increase bioavailability. Determining the half-life of novel peptides in blood is one of the major challenges in understanding their stability. The experimental techniques to determine the half-life of peptides are well established and highly accurate. Unfortunately, these experimental techniques are costly, cumbersome and time-consuming. Therefore, alternative methods are required for estimating the half-life of peptides. An in silico method to predict and design the half-life of peptides in blood would be an invaluable tool for researchers working in the field of therapeutic peptides. Previously, computational tools have been developed for predicting the half-life of proteins. ProtLifePred [http://protein-n-end-rule.leadhoster.com/] and ProtParam [19] are based on the N-end rule and predict the half-life of proteins in E. coli, S. cerevisiae and mammalian cells. The stability of HIV-derived peptides in the cytosol of human peripheral blood mononuclear cells can be judged using the Stability Prediction tool [20]. The SProtP server [21] identifies proteins with a half-life of less than 30 minutes in human embryonic kidney 293T cells. Recently, our group developed a web server, HLP [22], for predicting the half-life of peptides in an intestine-like environment.
To the best of the authors’ knowledge, no in silico method has been developed to predict the half-life of peptides in mammalian blood. Thus, we made a systematic attempt to understand the nature of peptides having long and short half-lives in mammalian blood. In the present study, we have developed in silico models to predict the half-life of peptides in blood using various machine learning techniques and features, namely amino acid composition, dipeptide composition, binary profile, atom composition and chemical descriptors.
We extracted sequences and structures of peptides with experimentally determined half-lives from PEPlife [23], a database of the half-life of 2230 peptides in various environments such as blood, urine, intestinal, kidney and brain homogenates, various cell lines and media like PBS. We used the following procedure to derive our dataset. Firstly, we extracted from PEPlife the peptides whose half-life had been experimentally validated in mammalian blood and obtained 1392 entries. Secondly, we removed all peptides with more than 50 or fewer than 5 residues. Thirdly, we removed all peptides with a half-life of more than 24 hours or less than 20 seconds. After these filters, we obtained 1119 peptides with lengths from 5 to 50 residues and half-lives from 20 seconds to 24 hours. Fourthly, from these 1119 peptides, we removed peptide sequences that had complex terminal modifications such as PEGylation and biotinylation, or complex non-terminal modifications such as sarcosine and β-alanine. The peptides whose structures were not available in the PDB or PEPlife databases were also removed; we obtained 682 peptides after this step. Finally, we obtained 261 unique peptides (see Supporting Information file pdb_files.zip) after removing redundancy, so that no two peptides are identical. We called this dataset of 261 peptides the modified dataset, as it contains natural and modified peptides. We also created a natural dataset that contains only the 163 natural peptides. The datasets consist of unique, non-identical sequences, though a few sequences may share up to 90% sequence similarity owing to the limited data available. Detailed information on the peptides containing natural as well as modified residues is given in Supporting Information file 261_natural+modified.xlsx, whereas information on the peptides containing only natural residues is provided in Supporting Information file 163_natural.xlsx.
The literature shows that even a single residue mutation or chemical modification in a peptide can change its half-life considerably [24,25], so such peptides were retained in the dataset. Because the half-lives span a wide range (20 seconds to 24 hours), we used the log2 of the half-life of peptides in seconds as the value to be predicted. The construction of the datasets and the prediction approach followed are shown in Fig 1.
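The length filter, the half-life filter, the redundancy removal and the log2 transform described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `(sequence, half_life_seconds)` record layout and the function name `filter_and_transform` are hypothetical, the filters for complex modifications and missing structures are omitted, and keeping the first occurrence of a duplicated sequence is an assumption, since the text does not say how duplicates were resolved.

```python
import math

# Thresholds taken from the text; the record layout is hypothetical.
MIN_LEN, MAX_LEN = 5, 50
MIN_HALF_LIFE_S, MAX_HALF_LIFE_S = 20, 24 * 3600


def filter_and_transform(records):
    """Keep peptides within the length and half-life limits, drop exact
    duplicate sequences, and attach log2(half-life in seconds).

    `records` is an iterable of (sequence, half_life_seconds) pairs.
    Assumption: the first occurrence of a duplicated sequence is kept.
    """
    seen = set()
    kept = []
    for seq, half_life_s in records:
        if not (MIN_LEN <= len(seq) <= MAX_LEN):
            continue  # outside the 5 to 50 residue range
        if not (MIN_HALF_LIFE_S <= half_life_s <= MAX_HALF_LIFE_S):
            continue  # outside the 20 s to 24 h range
        if seq in seen:
            continue  # identical sequence already present
        seen.add(seq)
        kept.append((seq, half_life_s, math.log2(half_life_s)))
    return kept
```

Calling `filter_and_transform` on a list of toy pairs returns only the entries that pass both range checks, each extended with its log2-transformed half-life, which is the quantity the regression models are trained on.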
Development of models
We used various machine learning techniques in this study for developing regression models. We implemented support vector machine (SVM) models using the SVMlight software available at http://www.cs.cornell.edu/People/tj/svm_light/. SMOreg, Linear Regression, Gaussian Processes and IBk were implemented using Weka [26], a Java-based software package. To evaluate the performance of the models, we used the leave-one-out cross-validation (LOOCV) technique. In LOOCV, for N peptides in the dataset, N-1 peptides are used for training while the remaining one is used for testing. This process is repeated N times so that each peptide is tested once. We also evaluated the performance of the model on an independent dataset consisting of 10% of the natural peptides, selected randomly. The experiment was performed 5 times and the average values were reported. In the present study, we used different types of features for developing models; the following is a brief description of the features.
- Residue composition: In this study, we used the amino acid composition of peptides for developing models, where each peptide is represented by a vector of dimension 20. Similarly, models were also developed using the dipeptide composition of peptides, where each peptide is represented by a vector of dimension 400 [27,28].
- Binary pattern: The order and frequency of residues can be studied using the binary pattern profile of peptides. To analyze the role of the terminal residues we took five residues from both the N and C terminus and calculated their binary profiles.
- Atom composition: It represents the frequency of 8 types of atoms (C, H, O, N, S, F, Cl, Br) present in the peptide sequence. The atom composition was calculated from the SMILES of the peptide sequences, which includes the information of the chemical modifications as well as the amino acid.
- Chemical descriptors: Chemical descriptors are useful for developing QSAR models of peptides. We used PaDEL [30], an open-source software package for calculating more than 15,400 descriptors, consisting of 2D descriptors, 3D descriptors and fingerprints. To select the minimum number of descriptors that correlate with the half-life of peptides, we employed the CfsSubsetEval attribute evaluator along with the BestFirst search module of Weka.
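The sequence-based features listed above (amino acid composition, dipeptide composition and the binary profile of the terminal residues) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in the study: all function names are hypothetical, compositions are expressed as percentages (the text does not state whether percentages or fractions were used), and sequences are assumed to contain only the 20 natural one-letter residue codes.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Percentage of each natural residue: a vector of dimension 20."""
    n = len(seq)
    return [100.0 * seq.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Percentage of each adjacent residue pair: a vector of dimension 400."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return [100.0 * pairs.count(dp) / total for dp in DIPEPTIDES]


def binary_profile(fragment):
    """One-hot encoding: 20 binary values per residue position."""
    profile = []
    for residue in fragment:
        profile.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return profile


def terminal_binary_profiles(seq, k=5):
    """Binary profiles of the first k (N-terminal) and last k (C-terminal)
    residues; k=5 follows the text."""
    return binary_profile(seq[:k]), binary_profile(seq[-k:])
```

With k=5, each terminal binary profile has 5 × 20 = 100 values. Since the dataset only contains peptides of at least five residues, both terminal fragments always have the full length.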
Analysis of peptides
We examined the physicochemical properties and amino acid composition of 20 peptides with the highest and the lowest half-lives (Fig 2). It was observed that peptides with long half-lives showed a high frequency of negatively charged (Glu) and small sized residues (Ala, Glu, Ile and Leu). These amino acids might be involved in stabilizing the peptides. The peptides with a short half-life are enriched in aromatic (Tyr and Phe) and neutral amino acids (Gly, His, Ser and Tyr). Previously, Morozumi et al. (2011) showed that substitution of Glu with neutral amino acids lowered the half-life of analogs of motilin-ghrelin chimeric peptides. We studied the distribution of half-life of the peptides with different sequence similarity present in our natural peptide dataset (S1 Fig) and observed that even the substitution of one or two residues changes the half-life of the peptide significantly (S1 Table). The correlations of the half-lives of all 163 natural peptides with amino acid composition and physicochemical properties also show similar patterns (S2 and S3 Tables). It was observed that the composition of Ala (a hydrophobic, non-polar residue) shows the highest correlation, followed by Glu (a negatively charged residue) (S2 Table). In contrast, the composition of Phe (an aromatic residue) shows the highest negative correlation with the half-life of peptides. Previous studies have also reported a lower half-life in peptides enriched in aromatic amino acids [31–33].
Prediction of half-life on the natural dataset
In silico models have been developed on 163 natural peptides to predict the half-life of peptides using different types of sequence-based features (Table 1). The amino acid composition based regression model achieved a maximum Pearson’s correlation coefficient (R) of 0.643 with a mean absolute error (MAE) of 1.531. The dipeptide composition based model attained R of 0.640 with MAE of 1.539. The atom composition achieved R of 0.532. To analyze the role of the amino acids present at the termini of the sequence, the first five residues from the N-terminus (N5) and the last five residues from the C-terminus (C5) were used to develop models. Amino acid composition of N5 reached R of 0.251 with MAE of 2.723 while C5 achieved R of 0.245 with MAE of 2.317. Dipeptide composition of N5 reached R of 0.163 while R of C5 was 0.230. The binary composition of N5 showed R = 0.174 with MAE of 2.515 while C5 reached R = 0.271 with MAE of 2.304. To develop structure-based regression models we used 45 selected PaDEL descriptors (S4 Table) and applied various machine learning techniques. The maximum R of 0.743 with MAE = 1.369 was achieved with SMOreg (Table 2). The performance of the model trained on PaDEL descriptors was also evaluated on the 10% independent dataset. We achieved R of 0.2 with MAE of 1.646 and RMSE of 2.11. A detailed description of the features can be obtained from http://www.yapcwsoft.com/dd/padeldescriptor/.
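The evaluation scheme behind the R and MAE values reported above can be illustrated with a short sketch of leave-one-out cross-validation. To keep it self-contained, a single-feature ordinary least squares line stands in for the learners used in the study (SVM, SMOreg and the others); this substitution and all function names are assumptions made purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept
    (a simple stand-in for the regression learners named in the text)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def loocv_predictions(xs, ys):
    """For each of the N samples, train on the other N-1 and predict it."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions
```

Here `ys` would hold the log2-transformed half-lives and `xs` a single descriptor value per peptide; `pearson_r(ys, loocv_predictions(xs, ys))` and `mean_absolute_error(ys, loocv_predictions(xs, ys))` then give the two metrics. The sketch assumes the feature is not constant within any training fold, otherwise the slope is undefined.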
Prediction of half-life on the modified dataset
On the dataset with 261 sequences containing both modified and natural sequences, we used atom composition and PaDEL descriptors as input features. Atom composition attained R of 0.586 with MAE of 1.756. The 43 selected PaDEL features (S5 Table) achieved a maximum Pearson’s correlation coefficient of 0.692 with MAE = 1.564. The performance of the models built with other machine learning techniques is given in Table 3.
Implementation and description of web-server
In order to contribute to the community, we have implemented the models developed in the form of a freely accessible web server called ‘PlifePred’. Researchers can use this platform to predict and design the half-life of peptides. The web server has two main modules: Natural and Modified. The Natural module has two sub-modules: Sequence Based and Structure Based. The Sequence Based module has three modules: Analog Generation, Batch Submission and Protein Scan. These modules serve users whose peptides are composed entirely of natural residues. The Analog Generation module allows users to predict the half-life of a natural peptide and generates all possible single point mutation analogs along with their physicochemical properties, helping the scientific community design peptides with a desired half-life and physicochemical properties. The Batch Submission module allows users to screen peptide sequences in bulk and predicts the half-life as well as the physicochemical properties of the sequences. The Protein Scan tool allows users to submit a protein sequence and predicts the half-life of overlapping peptides of a user-chosen length along with their physicochemical properties; it also allows generation of mutants of a peptide fragment selected by the user. The Structure Based module has two modules: Draw and File. In the Draw module, the Marvin Draw applet has been integrated so that users can draw and submit the desired peptide structures as queries. Users with PDB files of their query peptide can use the File module to predict the half-life of peptides in blood. The Modified module also has Draw and File sub-modules, which are useful in the rational design of the half-life of peptides with chemical modifications and non-natural amino acids. This module lets users study the effect on half-life when different modifications are introduced within a peptide sequence.
The PlifePred web-server was implemented using HTML, PHP and Perl languages and is available at URL http://webs.iiitd.edu.in/raghava/plifepred/.
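The sequence manipulations behind the Analog Generation and Protein Scan modules can be sketched as follows. This is not the server's code (which is written in PHP and Perl); the function names are hypothetical, and a window step of one residue is an assumption, since the text only states that the scanned peptides overlap.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_analogs(seq):
    """All sequences that differ from `seq` at exactly one position,
    using the 20 natural residues (19 substitutions per position)."""
    analogs = []
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                analogs.append(seq[:i] + aa + seq[i + 1:])
    return analogs


def overlapping_fragments(protein, length):
    """Sliding-window peptides of the given length, moving one residue
    at a time (assumed step size)."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]
```

A peptide of L natural residues yields 19 × L analogs, so a 10-residue peptide produces 190 candidate sequences, each of which would then be passed to the prediction model. A protein shorter than the requested window length yields no fragments.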
Despite the advantages of peptides over small molecule-based drugs, many of them fail to reach the market because of their low stability in vivo as a result of degradation by proteases. The half-life of therapeutic peptides governs their bioavailability, biodistribution and dosing regimen. In the wet lab, it is costly and time-consuming to synthesize peptides and examine the effect of mutating different residues and the role of different chemical modifications on the desired peptide. To help researchers and expedite their research, in the present study we have developed in silico models to predict the half-life of modified as well as natural peptide sequences. The models have been developed on the largest available dataset of experimentally validated half-lives of peptides in blood. The structure-based models using chemical descriptors as input features gave the best results for both the modified and the natural dataset, followed closely by the amino acid composition-based model on the natural dataset. We also benchmarked our results against the tools already available in the literature (ProtLifePred, ProtParam and HLP) and observed that none of them was able to outperform our method. ProtLifePred and ProtParam showed R of 0.051 with MAE of 35298, whereas HLP showed R of 0.08 with MAE of 2821. One possible reason could be that these tools are not specifically designed for predicting the half-life of peptides in blood. The compositional analysis revealed that the charge and size of residues are important parameters governing peptide stability. A shorter half-life is observed in peptides rich in large and aromatic amino acids, whereas peptides rich in negatively charged and small amino acids have a longer half-life. These results are concordant with the study performed by Sharma et al. analyzing the half-life of peptides in an intestine-like environment [22].
The models obtained in the present study have been integrated in a freely available web server ‘PlifePred’ to aid the scientific community in the rational designing of peptide half-life. PlifePred will be a useful resource to predict and study the effects of various mutations and modifications on the half-life of peptides in blood.
S1 Table. Half-life of highly similar peptides that differ by one or two mutations.
S2 Table. Correlation between half-life of natural peptide dataset and amino acid composition.
S3 Table. Correlation between half-life of natural peptide dataset and physicochemical properties.
S4 Table. Descriptors used for the development of structure based models on natural dataset.
S5 Table. Descriptors used for the development of structure based models on modified dataset.
S1 Fig. Box plots showing the variation in half-life of peptides for clusters with sequence similarity in different ranges.
S1 File. 261_natural+modified.xlsx: Detailed information of the peptides containing natural as well as modified residues.
S2 File. 163_natural.xlsx: Detailed information of the peptides containing only natural residues.
S3 File. pdb_files.zip: Structures of the peptides used in the study.
Authors are thankful to funding agencies Council of Scientific and Industrial Research (project Open Source Drug Discovery and GENESIS BSC0121), Department of Science and Technology (DST INSPIRE), Govt. of India for financial support.
- 1. Zompra AA, Galanis AS, Werbitzky O, Albericio F. Manufacturing peptides as active pharmaceutical ingredients. Future Med Chem. 2009;1: 361–77. pmid:21425973
- 2. Usmani SS, Bedi G, Samuel JS, Singh S, Kalra S, Kumar P, et al. THPdb: Database of FDA-approved peptide and protein therapeutics. Sang Q-XA, editor. PLoS One. 2017;12: e0181748. pmid:28759605
- 3. Singh S, Chaudhary K, Dhanda SK, Bhalla S, Usmani SS, Gautam A, et al. SATPdb: a database of structurally annotated therapeutic peptides. Nucleic Acids Res. 2016;44: D1119–26. pmid:26527728
- 4. Tyagi A, Tuknait A, Anand P, Gupta S, Sharma M, Mathur D, et al. CancerPPD: a database of anticancer peptides and proteins. Nucleic Acids Res. 2015;43: D837–43. pmid:25270878
- 5. Waghu FH, Gopi L, Barai RS, Ramteke P, Nizami B, Idicula-Thomas S. CAMP: Collection of sequences and structures of antimicrobial peptides. Nucleic Acids Res. 2014;42: D1154–8. pmid:24265220
- 6. Seshadri Sundararajan V, Gabere MN, Pretorius A, Adam S, Christoffels A, Lehväslaiho M, et al. DAMPD: a manually curated antimicrobial peptide database. Nucleic Acids Res. 2012;40: D1108–12. pmid:22110032
- 7. Mehta D, Anand P, Kumar V, Joshi A, Mathur D, Singh S, et al. ParaPep: a web resource for experimentally validated antiparasitic peptide sequences and their structures. Database (Oxford). 2014;2014: bau051–bau051. pmid:24923818
- 8. Agrawal P, Bhalla S, Usmani SS, Singh S, Chaudhary K, Raghava GPS, et al. CPPsite 2.0: a repository of experimentally validated cell-penetrating peptides. Nucleic Acids Res. 2016;44: D1098–103. pmid:26586798
- 9. Mathur D, Mehta A, Firmal P, Bedi G, Sood C, Gautam A, et al. TopicalPdb: A database of topically delivered peptides. Gupta D, editor. PLoS One. 2018;13: e0190134. pmid:29432422
- 10. Kumar R, Chaudhary K, Sharma M, Nagpal G, Chauhan JS, Singh S, et al. AHTPDB: a comprehensive platform for analysis and presentation of antihypertensive peptides. Nucleic Acids Res. 2015;43: D956–62. pmid:25392419
- 11. Kapoor P, Singh H, Gautam A, Chaudhary K, Kumar R, Raghava GPS. TumorHoPe: a database of tumor homing peptides. Xue B, editor. PLoS One. 2012;7: e35187. pmid:22523575
- 12. Vlieghe P, Lisowski V, Martinez J, Khrestchatisky M. Synthetic therapeutic peptides: science and market. Drug Discov Today. 2010;15: 40–56. pmid:19879957
- 13. Campbell C, Morimoto BH, Nenciu D, Fox AW. Drug development of intranasally delivered peptides. Ther Deliv. 2012;3: 557–68. pmid:22834082
- 14. Gautam A, Nanda JS, Samuel JS, Kumari M, Priyanka P, Bedi G, et al. Topical Delivery of Protein and Peptide Using Novel Cell Penetrating Peptide IMT-P8. Sci Rep. 2016;6: 26278. pmid:27189051
- 15. Philippart M, Schmidt J, Bittner B. Oral Delivery of Therapeutic Proteins and Peptides: An Overview of Current Technologies and Recommendations for Bridging from Approved Intravenous or Subcutaneous Administration to Novel Oral Regimens. Drug Res (Stuttg). 2016;66: 113–20. pmid:26536331
- 16. Agu RU, Ugwoke MI, Armand M, Kinget R, Verbeke N. The lung as a route for systemic delivery of therapeutic proteins and peptides. Respir Res. 2001;2: 198–209. pmid:11686885
- 17. Jitendra, Sharma PK, Bansal S, Banik A. Noninvasive routes of proteins and peptides drug delivery. Indian J Pharm Sci. 2011;73: 367–75. pmid:22707818
- 18. Jenssen H, Aspmo SI. Serum stability of peptides. Methods Mol Biol. 2008;494: 177–86. pmid:18726574
- 19. Wilkins MR, Gasteiger E, Bairoch A, Sanchez JC, Williams KL, Appel RD, et al. Protein identification and analysis tools in the ExPASy server. Methods Mol Biol. 1999;112: 531–52. pmid:10027275
- 20. Lazaro E, Kadie C, Stamegna P, Zhang SC, Gourdain P, Lai NY, et al. Variable HIV peptide stability in human cytosol is critical to epitope presentation and immune escape. J Clin Invest. 2011;121: 2480–92. pmid:21555856
- 21. Song X, Zhou T, Jia H, Guo X, Zhang X, Han P, et al. SProtP: a web server to recognize those short-lived proteins based on sequence-derived features in human cells. Buratti E, editor. PLoS One. 2011;6: e27836. pmid:22114707
- 22. Sharma A, Singla D, Rashid M, Raghava GPS. Designing of peptides with desired half-life in intestine-like environment. BMC Bioinformatics. 2014;15: 282. pmid:25141912
- 23. Mathur D, Prakash S, Anand P, Kaur H, Agrawal P, Mehta A, et al. PEPlife: A Repository of the Half-life of Peptides. Sci Rep. 2016;6: 36617. pmid:27819351
- 24. Dwyer JJ, Wilson KL, Davison DK, Freel SA, Seedorff JE, Wring SA, et al. Design of helical, oligomeric HIV-1 fusion inhibitor peptides with potent activity against enfuvirtide-resistant virus. Proc Natl Acad Sci U S A. 2007;104: 12772–7. pmid:17640899
- 25. Nguyen LT, Chau JK, Perry NA, de Boer L, Zaat SAJ, Vogel HJ. Serum stabilities of short tryptophan- and arginine-rich antimicrobial peptide analogs. Vij N, editor. PLoS One. 2010;5: e12684. pmid:20844765
- 26. Frank E, Hall M, Trigg L, Holmes G, Witten IH. Data mining in bioinformatics using Weka. Bioinformatics. 2004;20: 2479–81. pmid:15073010
- 27. Agrawal P, Bhalla S, Chaudhary K, Kumar R, Sharma M, Raghava GPS. In Silico Approach for Prediction of Antifungal Peptides. Front Microbiol. 2018;9: 323. pmid:29535692
- 28. Bhasin M, Raghava GPS. Classification of nuclear receptors based on amino acid composition and dipeptide composition. J Biol Chem. 2004;279: 23262–6. pmid:15039428
- 29. Kumar R, Chaudhary K, Singh Chauhan J, Nagpal G, Kumar R, Sharma M, et al. An in silico platform for predicting, screening and designing of antihypertensive peptides. Sci Rep. 2015;5: 12512. pmid:26213115
- 30. Yap CW. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem. 2011;32: 1466–74. pmid:21425294
- 31. Gorris HH, Bade S, Röckendorf N, Albers E, Schmidt MA, Fránek M, et al. Rapid profiling of peptide stability in proteolytic environments. Anal Chem. 2009;81: 1580–6. pmid:19159331
- 32. Pang SC, Sarda IR, Wigle DA, Zhou ZJ, Jennings DB, Flynn TG. Plasma clearance and tissue binding of rANP[99–126] and iso-rANP[1–45] in the rat. Regul Pept. 1991;33: 1–9. pmid:1646465
- 33. Ghiso J, Shayo M, Calero M, Ng D, Tomidokoro Y, Gandy S, et al. Systemic catabolism of Alzheimer’s Abeta40 and Abeta42. J Biol Chem. 2004;279: 45897–908. pmid:15322125