Peptide-protein interactions contribute a significant fraction of the protein-protein interactome. Accurate modeling of these interactions is challenging due to the vast conformational space associated with interactions of highly flexible peptides with large receptor surfaces. To address this challenge we developed a fragment-based high-resolution peptide-protein docking protocol. By combining the Rosetta fragment picker for accurate peptide fragment ensemble generation, the PIPER docking algorithm for exhaustive fragment-receptor rigid-body docking, and Rosetta FlexPepDock for flexible full-atom refinement of PIPER docked models, we achieved accurate and efficient global peptide-protein docking at high resolution, as validated on a small but representative set of peptide-protein complex structures well resolved by X-ray crystallography. Our approach opens up the way to high-resolution modeling of many more peptide-protein interactions and to the detailed study of peptide-protein association in general. PIPER-FlexPepDock is freely available to the academic community as a server at http://piperfpd.furmanlab.cs.huji.ac.il.
Peptide-protein interactions are crucial components of various important biological processes in living cells. High-resolution structural information of such interactions provides insight about the underlying biophysical principles governing the interactions, and a starting point for their targeted manipulations. Accurate docking algorithms can help fill the gap between the vast number of these interactions and the small number of experimentally solved structures. However, the accuracies of the existing protocols have been limited, in particular for ab initio docking when no information about the peptide beyond its sequence is available. Here we introduce PIPER-FlexPepDock, a fragment-based global docking protocol for high-resolution modeling of peptide-protein interactions. Integration of accurate and efficient representation of the peptide using fragment ensembles, their fast and exhaustive rigid-body docking, and their subsequent accurate flexible refinement, enables peptide-protein docking of remarkable accuracy. The validation on a representative benchmark set of crystallographically solved high-resolution peptide-protein complexes demonstrates significantly improved performance over all existing docking protocols. This opens up the way to the modeling of many more peptide-protein interactions, and to a more detailed study of peptide-protein association in general.
Citation: Alam N, Goldstein O, Xia B, Porter KA, Kozakov D, Schueler-Furman O (2017) High-resolution global peptide-protein docking using fragments-based PIPER-FlexPepDock. PLoS Comput Biol 13(12): e1005905. https://doi.org/10.1371/journal.pcbi.1005905
Editor: Roland L. Dunbrack Jr., Fox Chase Cancer Center, UNITED STATES
Received: August 21, 2017; Accepted: November 29, 2017; Published: December 27, 2017
Copyright: © 2017 Alam et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This work was supported by the European Research Council under the ERC Grant Agreement  (to OSF), the USA-Israel Binational Science Foundation  (to OSF & DK), and NSF grants DBI 1458509 and AF 1527292 (to DK). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Proteins are the workhorses inside living cells, and interactions among them are critical for various important biological processes. A significant fraction of these interactions (15–40%) are peptide mediated, where a short stretch of residues from one partner contributes most to its binding to the other. Such short peptidic regions, also termed short linear interacting motifs (SLIMs), are often found embedded inside disordered regions of intrinsically disordered proteins (IDPs) [2, 3], or appear as flexible linkers connecting domains and as flexible loops tethered to rigid segments.
The development of accurate structure based modeling tools is critical for atomic level understanding of peptide-protein interactions, to allow the manipulation of known interactions, to discover yet unknown peptide-protein interactions and networks, and to provide starting points for the design of novel peptides and related molecules to target specific systems of pharmacological interest . A number of computational tools have been developed to assist the characterization of peptide-protein interactions, including the prediction of peptide binding sites [7–9], refinement of coarse peptide-protein models , folding and docking on a known binding site  and most challenging of all, global peptide-protein docking with no prior information about the peptide structure and the binding site [12–17]. The challenges associated with the global docking of flexible peptides have been addressed in different ways, by reducing the conformational space to be sampled both for the internal degrees of freedom of the peptide as well as its rigid-body orientations on the receptor surface. For peptide docking within the HADDOCK docking framework , the peptide backbone is represented by idealized conformation(s), such as alpha helix, beta strand and polyproline-II, followed by rigid-body, semi-flexible and fully-flexible docking with explicit solvation . The pepATTRACT protocol [13, 19] uses the same approach to represent the peptide, followed by coarse-grained rigid-body docking and flexible full-atom refinement. The AnchorDock protocol uses molecular dynamics simulations to generate a set of plausible peptide conformations, which are then docked using anchor-driven simulated annealing molecular dynamics around predicted anchoring spots on the receptor . 
The CABS-dock protocol uses randomly generated peptide conformations based on either predicted or known secondary structure, randomly orients these peptides over the receptor surface, and refines them using replica exchange Monte Carlo dynamics . The MDockPep protocol  uses peptide sequence similarity to extract fragments from high resolution protein structures, which are further refined using MODELLER  to generate plausible peptide conformations, and then docked onto the receptor using rigid-body docking and flexible docking with AutoDock Vina . The recently published IDP-LZerD protocol models the binding of long disordered segments to structured proteins using the Rosetta fragment picker protocol  to generate fragments of 9-residue overlapping windows followed by LZerD  rigid-body docking and molecular dynamics refinement . Finally, we have recently advanced a novel, global motif-based peptide fragment docking approach, PeptiDock , in which peptide binding motif information rather than secondary structure propensity is used to extract fragments from the Protein Data Bank (PDB ), which are then docked to the receptor using PIPER rigid body docking , followed by minimization using CHARMM .
Notwithstanding these significant recent advances in global peptide docking, present approaches are still limited in their modeling quality and general applicability, and there is ample room for improvements that would enable the detailed high-resolution study of more peptide-protein interactions with higher accuracy. Here we describe PIPER-FlexPepDock, a successful effort toward the development of such a robust, highly accurate, global peptide-protein docking protocol. By integrating accurate peptide fragment ensemble generation using the Rosetta fragment picker, fast and exhaustive fragment-receptor rigid-body docking using PIPER docking, and flexible full-atom refinement of coarse PIPER models using Rosetta FlexPepDock, we were able to sample both the peptide backbone conformational states and the landscape of the peptide-receptor interactions efficiently and with much higher accuracy than current protocols: on a representative non-redundant dataset of peptide-protein complexes well resolved by X-ray crystallography (Table 1 below), PIPER-FlexPepDock generates, for about half of the complexes, models within 2.5 Å ligand RMSD (2.0 Å when restricted to motif regions where available) among the 10 top-ranked predictions, more than twice as many as existing peptide docking protocols such as pepATTRACT (Table 2 below).
PDB ids of the initial calibration set are highlighted in bold.
Results are shown for PIPER-FlexPepDock runs on unbound receptor structures, including receptor minimization.
Our results highlight the relevance of representing the peptide as a set of fragments that can be exhaustively docked as rigid bodies onto the receptor structure and subsequently refined using an accurate refinement protocol. They reinforce the underlying biophysical model of a conformer ensemble of the free peptide that already samples the bound conformation (at least in the encounter-complex, protein-like environment) and involves only limited induced fit, not unlike the classical association between preformed protein domains. As a result, PIPER-FlexPepDock brings into reach the study and targeted manipulation of a range of additional peptide-mediated interactions not accessible before due to limitations in sampling and/or accuracy.
Overview of the PIPER-FlexPepDock protocol (Fig 1)
Example shown: PDZ domain-peptide interaction [PDB IDs of receptor structure 1MFG (bound) and 2H3L (free)]. For a given receptor structure and peptide sequence, the divide and conquer strategy involves first the description of the peptide as an ensemble of fragments (A), their fast and exhaustive rigid body docking (using PIPER) onto the whole receptor (binding site region is shaded salmon) (B), and subsequent high-resolution refinement (using Rosetta FlexPepDock; the top 5000 models are included in the plot) (C), followed by clustering and selection of top ranking representatives. Fragments are colored according to their similarity to the native bound peptide conformation. L-RMSD: Ligand root mean square deviation from crystal structure; see text for more details.
Step A | Generation of fragment set to represent the peptide conformer ensemble.
In a previous study we have shown that the bound peptide conformation can be well represented by extraction of short fragments from the PDB based on information on known binding sequence motifs. Here we have generalized this approach beyond motifs, using fragment libraries selected by the Rosetta fragment picker protocol based on sequence and secondary structure similarity (see Methods). The coordinates of the top 50 mapped fragments are extracted from the PDB, including both backbone and side-chain atoms, and non-identical residues in the extracted fragments are mutated to the desired sequence. This set of fragments adequately represents the peptide conformational ensemble, sampling also its receptor-bound conformation (see below). The peptide may be trimmed in cases where information is available about the range of the binding segment (from motif databases such as the Eukaryotic Linear Motif (ELM) resource [29, 30], literature, or experiments such as alanine scanning), since fragments generated for shorter peptide sequences are usually more representative than longer fragments (as, e.g., for loop modeling), and fraying ends beyond the motif may contribute less to determining critical binding details.
Step B | Fragment rigid-body docking using PIPER.
Each of the fragments is docked onto the receptor structure using PIPER, an exhaustive Fast Fourier Transform (FFT)-based rigid-body docking algorithm, as implemented previously for PeptiDock (see Methods), and the top-ranking fragment orientations from each docking run are collected and combined. These models are of low resolution as no flexibility is included in the PIPER algorithm, and are therefore ranked using a soft potential that tolerates a certain degree of steric clashes to overcome the limitations of rigid-body-only docking.
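The idea behind FFT-based rigid-body docking can be illustrated with a toy one-dimensional example. The sketch below is not PIPER and does not use its scoring function; it only shows, under simplifying assumptions (a cyclic 1D grid, a single overlap term, a naive discrete Fourier transform standing in for an FFT), that the scores of all translations of a ligand grid against a receptor grid can be obtained from one product in Fourier space. All function names are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform; an FFT yields the same result faster."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * x / n)
                    for x, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def shift_scores_direct(receptor, ligand):
    """Score every cyclic shift t of the ligand grid by explicit summation."""
    n = len(receptor)
    return [sum(receptor[x] * ligand[(x - t) % n] for x in range(n))
            for t in range(n)]


def shift_scores_fourier(receptor, ligand):
    """Same scores via the correlation theorem: IDFT(DFT(R) * conj(DFT(L)))."""
    r_hat = dft(receptor)
    l_hat = dft(ligand)
    product = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    return [value.real for value in dft(product, inverse=True)]


# Toy grids (illustrative values only): a two-cell pocket and a two-cell ligand.
receptor_grid = [0, 0, 1, 1, 0, 0, 0, 0]
ligand_grid = [1, 1, 0, 0, 0, 0, 0, 0]
direct = shift_scores_direct(receptor_grid, ligand_grid)
fourier = shift_scores_fourier(receptor_grid, ligand_grid)
best_shift = direct.index(max(direct))  # shift that places the ligand in the pocket
```

In a real docking run the grids are three-dimensional, several energy terms are correlated, and the translational scan is repeated for many rotations of the ligand.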
Step C | FlexPepDock refinement of PIPER models and selection of final models.
Each of the PIPER models is refined by a single fully flexible refinement run using the Rosetta FlexPepDock Refinement algorithm  (see Methods). The top ranking refined models are clustered (as in Gray et al. ), clusters are ranked based on the reweighted score of the best scoring model in each cluster (as in Raveh et al. ), and the top 10 ranked cluster representatives are selected as prediction (following the CAPRI scheme that accepts 10 models ).
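The selection step can be sketched as follows. This is a minimal illustration and not the clustering implementation used in the protocol: it assumes a simple greedy scheme, a hypothetical clustering radius of 2.0 Å, scores where lower is better, and peptide coordinates that are already expressed in a common receptor frame so that no superposition is needed. The names `Model`, `rmsd` and `top_cluster_representatives` are invented for this example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Model:
    name: str
    score: float   # reweighted score; lower is assumed to be better
    coords: list   # peptide atom coordinates [(x, y, z), ...] in the receptor frame


def rmsd(a, b):
    """RMSD between two equally sized coordinate lists, without superposition."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return sqrt(total / len(a))


def top_cluster_representatives(models, radius=2.0, n_top=10):
    """Greedy clustering: the best-scoring unassigned model seeds each cluster.

    Because models are visited in order of increasing score, every cluster is
    represented by its best-scoring member and clusters come out ranked by it.
    """
    representatives = []
    for model in sorted(models, key=lambda m: m.score):
        if all(rmsd(model.coords, rep.coords) > radius for rep in representatives):
            representatives.append(model)
            if len(representatives) == n_top:
                break
    return representatives


# Hypothetical single-atom "models", for illustration only.
example = [
    Model("a", -10.0, [(0.0, 0.0, 0.0)]),
    Model("b", -9.0, [(0.5, 0.0, 0.0)]),
    Model("c", -8.0, [(5.0, 0.0, 0.0)]),
]
selected = [m.name for m in top_cluster_representatives(example)]
```

Here model "b" falls within the radius of "a" and is absorbed into its cluster, so the ranked representatives are "a" and "c".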
Initial calibration of the PIPER-FlexPepDock on a small set of protein-peptide complexes
Motivated by our recent advance in global peptide docking using a motif-focused approach, we ventured into the development of a more generalized protocol. We initially calibrated our docking approach on a small but representative set of nine peptide–protein complexes (highlighted in bold in Table 1; see also S1A Table). We trimmed the peptide based on the motif defined in ELM, where available. For all complexes, this new global docking approach achieved high modeling accuracy (models within ≤2.5 Å ligand RMSD among the top 10 ranking clusters; Table 1). When the full-length peptides were modeled, near-native models were obtained for 5/9 cases, highlighting the benefit of modeling focused on the motif (or on a shorter peptide sequence), due to better fragment quality compared to the corresponding full-length peptides (Table 1). Encouraged by these initial results, we proceeded to the validation of our protocol on a larger representative set of peptide-protein complexes (Table 1 and S1B Table).
Assessment of peptide docking performance
We assessed the performance of PIPER-FlexPepDock on a larger, non-redundant set consisting of 27 complexes (compiled from the 42 complexes used in previous studies, but non-redundant at the domain level, as defined by CATH ; see Methods), among them 12 with reported binding motif. The benchmark is summarized in Table 1 (S1C Table provides results for the redundant set of 42 complexes used in previous studies, as well as additional details, including performance of other approaches for comparison).
Representing the peptide conformational states using fragments.
Fragments derived from solved protein structures contain valuable information about the local structural context that can be used to efficiently reduce sampling space for various modeling applications, including e.g. ab initio protein folding  and loop building [31, 38, 39]. In our protocol we use the Rosetta fragment picker protocol  to generate fragments consistent with both the peptide sequence and the (predicted) secondary structure (See Methods).
How accurately do the fragments represent the peptide conformational states? Most importantly, how similar are the fragment conformations to the one adopted by the peptide when bound to its receptor? A significant representation of similar fragments could guarantee that, when docked with high density in the binding site using an exhaustive but accurate rigid-body docking algorithm, they could efficiently be refined to high resolution using an accurate refinement algorithm such as Rosetta FlexPepDock. To assess the quality of the fragments (i.e., their coverage of the bound conformation) we analyzed the distribution of backbone RMSDs of the fragments relative to the bound peptide conformation. Reassuringly, the fragment pool generated using the Rosetta fragment picker protocol represents in most cases the bound-like peptide conformation with high accuracy in our benchmark of 27 peptide-protein complexes (Fig 2A: median backbone RMSD within 2.0 Å for 15 out of the 27 cases, with average backbone RMSD of the best ten fragments within 1.0 and 1.5 Å for 14 and 21 cases, respectively). The best accuracy is achieved for helical peptide motifs (e.g., the helical nuclear receptor box motif in 2A3I; for helical peptides with coiled terminus segments such as 2FMF and 1NX1 the median backbone RMSD is slightly higher). Even for the remaining peptides the fragment ensemble is often composed of a significant portion of bound-like representatives. The worst representation of bound-like peptides is obtained for a few longer coil peptides, such as 2B9H, which defines the limitation of the fragment picker protocol for longer sequences. In such cases, trimming the peptide might improve the quality significantly.
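The two summary statistics quoted above (the median backbone RMSD of a fragment set, and the average backbone RMSD of its best ten fragments) can be computed as in the following sketch. It assumes that the backbone RMSD of each fragment to the bound peptide has already been calculated; the function name and the example values are hypothetical and are not taken from the benchmark.

```python
from statistics import mean, median


def summarize_fragment_quality(fragment_rmsds, n_best=10):
    """Summarize backbone RMSDs (in Å) of a fragment set to the bound peptide."""
    ordered = sorted(fragment_rmsds)
    return {
        "median": median(ordered),
        "mean_of_best": mean(ordered[:n_best]),
    }


# Made-up RMSD values for a small fragment set, for illustration only.
summary = summarize_fragment_quality([0.5, 1.0, 1.5, 2.0, 2.5, 3.0], n_best=2)
```

With these made-up values the median is 1.75 Å and the mean of the two best fragments is 0.75 Å.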
(A) Fragment quality: distribution of fragment backbone RMSDs relative to the native bound peptide conformation (defined as fragment quality). PDBs with and without motif information are grouped separately. The initial calibration set is marked with asterisks (*). (B) PIPER rigid body docking: distribution of the number of models within 5Å ligand (L)-RMSD from the native, colored according to fragment quality. (C) Improvement after FlexPepDock refinement: distribution of the L-RMSDs of the top 1% FlexPepDock refined models (in orange) and corresponding PIPER models (in gray). Shown are the results of runs starting from the unbound receptor structure and including receptor minimizations (see also Fig 3). The circles represent the L-RMSD values of the best model among the top 10 ranking clusters. The Y-axis has been trimmed to 7Å. Note that for the PIPER runs, circles represent the top-ranked model of a PIPER run (including density clustering, as described in Methods and Porter et al. ), while the distributions represent the subset of models that served as starting structures for the models selected after FlexPepDock refinement. The former allows the comparison of the final results from a PIPER run to a corresponding PIPER-FlexPepDock run, while the latter shows improvement due to FlexPepDock refinement for the finally selected models.
We previously showed that extracting fragments based on sequence motif information allows identification of bound peptide conformations that reflect the structural pattern of these motifs . We demonstrate here that representative fragments are not restricted to peptides with known motifs. In fact, a comparison to the fragments extracted based on sequence motif (for the dataset analyzed in the PeptiDock study, using the motif definition therein ) shows that the fragment picker approach produces overall ensembles that contain structures more similar to the bound peptide conformation (see S2 Table).
Rigid-body docking: Fragment quality and PIPER performance.
The fact that the fragment ensembles include bound-like conformations justifies proceeding to the next step, namely their docking onto the receptor. The PIPER rigid-body docking protocol allows fast and exhaustive sampling to provide coarse models of fragment-receptor interactions, of which the top-scoring ones can be followed up by subsequent refinement to allow for conformational changes upon binding. The effective range for successful refinement using the FlexPepDock protocol was previously found to be up to 5 Å in terms of Cartesian RMSD, and up to 50 degrees in terms of Φ–Ψ RMSD (distance of the fragment from the bound peptide conformation in Φ–Ψ dihedral space). It is thus important for the PIPER docking stage to identify a large pool of fragments that are densely docked in close proximity to the native peptide binding mode (within the effective Cartesian RMSD range), involving docked fragments that are similar to the native bound peptide conformation (within the effective Φ–Ψ RMSD range). Indeed, analysis of the top-ranking PIPER models shows the presence of good-quality fragments at the binding site (in fact, most complexes include fragments with <1.0 Å backbone RMSD; Fig 2B).
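The two criteria of the effective refinement range can be checked as in the sketch below. The Φ–Ψ RMSD is computed here as the root mean square of the wrapped angular differences over all Φ and Ψ angles; this particular averaging is an assumption made for illustration, since the exact formula is not given in the text. The thresholds of 5 Å and 50 degrees are those quoted above, and all function names are hypothetical.

```python
from math import sqrt


def angle_difference(a, b):
    """Smallest signed difference between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def phi_psi_rmsd(dihedrals_a, dihedrals_b):
    """RMSD over all phi and psi angles of two equally long backbones.

    Each input is a list of (phi, psi) tuples in degrees, one per residue.
    """
    squared = [
        angle_difference(x, y) ** 2
        for res_a, res_b in zip(dihedrals_a, dihedrals_b)
        for x, y in zip(res_a, res_b)
    ]
    return sqrt(sum(squared) / len(squared))


def within_refinement_range(cartesian_rmsd, dihedral_rmsd,
                            max_cartesian=5.0, max_dihedral=50.0):
    """True if a starting model lies inside both quoted refinement limits."""
    return cartesian_rmsd <= max_cartesian and dihedral_rmsd <= max_dihedral


# Angles near +/-180 degrees differ by only 20 degrees once periodicity is handled.
example_rmsd = phi_psi_rmsd([(170.0, -170.0)], [(-170.0, 170.0)])
```

Wrapping the differences matters for extended conformations, where Φ or Ψ values close to +180 and -180 degrees describe nearly the same backbone geometry.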
Improvement of PIPER models by FlexPepDock refinement.
The FFT algorithmic implementation of rigid-body sampling in PIPER makes an exhaustive orientation search possible with significant computational efficiency, but is defined on a grid. Consequently, the scoring function can successfully isolate the best few hundred from the vast pool of billions of positions of the peptide fragment relative to the receptor, but cannot discriminate further among the top rigid-body docked models (Fig 3A & S1 Fig). In turn, the Rosetta scoring function used in the FlexPepDock Refinement protocol (currently Talaris 2014) is highly accurate, but this flexible docking protocol lacks the ability for fast and exhaustive sampling. Thus, to address the problem of exhaustive sampling with high accuracy, we combine the fast and exhaustive rigid-body sampling of PIPER with accurate flexible refinement by FlexPepDock of the few hundred top-ranking models. Indeed, the FlexPepDock refinement stage significantly improves both model quality and model ranking (see Figs 2C and 3C and S1 Fig). This includes the identification of a near-native funnel missed before (e.g. 1CZY in Fig 3; compare A to C), or significant enhancement of a near-native funnel (e.g. 1JD5 and 2A3I). More examples can be found in S1 Fig.
Left: PDB id 1CZY (coiled peptide); Center: 1JD5 (extended peptide); Right: 2A3I (helical peptide). (A) Energy landscape as sampled in the first docking step of the protocol by PIPER rigid body docking of peptide fragments onto the unbound receptor structure. (B-D) Energy landscapes for the PIPER-FPD scheme, starting from the unbound receptor structure (B), the unbound receptor structure including receptor flexibility (C), and the corresponding bound receptor for comparison (D). Models are colored according to fragment quality, as in previous Figures. (E) Comparison of the modeled to the native structure (shown in blue and green, respectively).
We performed three runs to assess protocol performance (summarized in Fig 4A and S1B Table; specific examples are shown in Fig 3). First, we applied the protocol to bound receptor structures. For these runs a near-native peptide conformation (L-RMSD ≤ 2.0 Å, see Methods section) was found among the top 10 ranked clusters for 19 out of 27 complexes (success rate = 70%, Fig 3D). We then proceeded to the real-world scenario, in which the free receptor structure was provided as the starting point (unbound run), leading to worse performance, as expected (10 complexes successfully modeled; success rate = 37%, Fig 3B). Importantly, however, when receptor flexibility was also included during the refinement stage (unbound-min run), these results improved, in particular if the 10 best models are considered (14/27 complexes successfully modeled; success rate = 52%, Fig 3C).
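The success rates follow directly from the reported counts, as the short calculation below shows. The helper `success_rate` is a hypothetical illustration of how such a rate would be obtained from per-complex best L-RMSD values; only the counts 19, 10 and 14 out of 27 are taken from the text.

```python
def success_rate(best_lrmsds, cutoff=2.0):
    """Fraction of complexes whose best top-10 model is within the cutoff (Å)."""
    hits = sum(1 for value in best_lrmsds if value <= cutoff)
    return hits / len(best_lrmsds)


# Reported numbers of successfully modeled complexes out of 27 (from the text).
reported = {"bound": 19, "unbound": 10, "unbound-min": 14}
percentages = {run: round(100 * hits / 27) for run, hits in reported.items()}
```

This reproduces the quoted values of 70%, 37% and 52% for the bound, unbound and unbound-min runs, respectively.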
(A) Overall performance on a non-redundant set of 27 peptide-protein complexes. Top: Distribution of best model L-RMSDs (among top 10 ranking clusters) for runs using the bound (BOUND) and free (UNBOUND & UNBOUND-MIN) receptor structures, the latter including also receptor flexibility in the final refinement step (only the motif region was modeled for the 12 complexes with known motif). Shown are both the L-RMSD values for each protein-peptide complex (grey circles, rounded values for improved visibility are provided), as well as the distribution (quartiles and medians, with median values printed alongside). Bottom: Distribution of the ranks of the first near-native cluster (defined as L-RMSD ≤ 2.0 Å), shown using different shades (for corresponding results among the top1, top3 and top10 ranked predictions). (B) Comparison to performance by other algorithms. Top: Box plots of best L-RMSDs among top 10 ranking clusters, including results for the motif part where the motif is known (as in A), as well as for the full peptide, for comparison. Bottom: Performance is shown for different cutoffs (3.0 Å and 2.0 Å L-RMSD in left and right boxes, respectively) (see S1B Table for more details).
Comparison with other global docking protocols
We compared the results of PIPER-FlexPepDock (unbound-min run) with other existing global peptide-protein docking protocols such as HADDOCK, pepATTRACT, CABS-dock, and MDockPep on our non-redundant set of 27 complexes, as well as on the set of 42 complexes used by these protocols in previous studies (34 complexes were compared with HADDOCK, as the other 8 cases were not included in their unbound run set). Since full-length peptides were modeled using the other protocols, we also modeled full-length peptides for the motif set cases for a valid comparison. The success rate for generating near-native models (i.e., L-RMSD within 2.0 Å or 3.0 Å) was significantly better for PIPER-FlexPepDock than for any other protocol, even for models of the full peptides (see Fig 4B and Table 2).
The PIPER-FlexPepDock server for the high-resolution modeling of peptide-protein interactions
In order to maximize the impact of our new protocol for global peptide-protein docking and to make it accessible to the modeling of many new peptide-protein complexes, we have set up a user-friendly server open to the scientific community (Fig 5). All that is needed is a structure of the receptor and a sequence of the peptide, but additional information about peptide secondary structure can also be included to narrow the search. The top-ranking resulting models can be downloaded, or inspected by an interactive viewer using the 3Dmol.js libraries .
(A) Job submission page: the required input includes the structure of the receptor and the sequence of the peptide; advanced options are accessible via a button. The tabs at the top provide links to detailed descriptions of the server, as well as to the Queue (upper right). (B) Results of an example peptide docking run: The liprin C-terminal peptide sequence VRTYSC docked onto the PDZ domain of GRIP1 (free receptor PDB id 1N7E). The top10 ranking models can be downloaded, and links to the individual models are provided to the left for inspection using an interactive viewer. In this case, Model 1 is an accurate prediction (L-RMSD = 1.0Å from solved structure PDB id 1N7F). On the right side a scatter plot shows the sampled energy landscape (relative to the lowest energy structure of the simulation).
A new approach for global peptide docking with excellent performance
With the presentation of our new PIPER-FlexPepDock algorithm, we have demonstrated that combining fast and exhaustive rigid-body docking (using the FFT-based PIPER docking algorithm) of a representative peptide conformer ensemble (approximated by fragments extracted from solved structures, based on local similarity of sequence and secondary structure) with high-resolution refinement (using Rosetta FlexPepDock) is a successful approach for generating models of peptide-receptor structures of remarkable accuracy, significantly better than any other current protocol, starting from the sequence of the peptide and the structure of the receptor. The performance on a representative benchmark of solved peptide-protein complex structures demonstrates both the accuracy and the robustness of our modeling approach, and opens up the way to modeling many more peptide-protein interactions at much higher resolution and accuracy than any existing global peptide-protein docking protocol.
Receptor-bound peptide conformations are adequately represented by fragments extracted from protein monomer structures
This study demonstrates that fragments derived from solved protein structures, based on secondary structure and sequence similarity (rather than on sequence binding motifs, which are not always available), represent the peptide conformational states with high accuracy, in particular the bound state. Interestingly, it is this same observation regarding the representation of local conformational preference that originally provided the platform for the breakthrough of Rosetta ab initio protein structure prediction. This indicates that while isolated peptides in solution rarely show significant conformational preferences, in the encounter complex regime in the vicinity of other proteins, their conformational freedom seems to be restricted significantly (similar to local peptide regions within a full protein) and can be represented by fragment libraries, in concordance with previous reports that show similar arrangements of fragments within monomers and peptide-protein interactions.
Effective sampling of the energy landscape
The simplified scoring function and exhaustive sampling with PIPER allow uniform sampling of the fragments over the receptor on a smoothed energy landscape. The top-scoring PIPER models represent dense sampling of the wider energy basins. Though the ranking of models might lack accuracy at this stage, the following refinement stage performs local sampling to efficiently locate the minimum. Interestingly, this approach is much more effective than local refinement starting from one representative model: only one FlexPepDock optimization run is necessary starting from each PIPER model, compared to several hundred to thousand runs starting from a representative (defined, e.g., from a PIPER run as implemented in the PeptiDock peptide motif docking algorithm). This is most probably due to the fact that these starting coarse models are trapped in many distinct states, each near a distinct local minimum, simplifying sampling during optimization.
Mapping encounter complexes and more
The peptide-receptor binding energy landscape can provide a broader understanding of the binding mechanism itself. Exhaustive sampling with accurate refinement provides a high-resolution map of the energy landscape and helps us understand the energetics of the encounter between the peptide and the receptor. In a previous study, we were able to demonstrate that experimentally observed encounter complexes are well reproduced from a global protein docking energy landscape, and we anticipate that the corresponding peptide-protein docking energy landscape will provide similar information.
The approach described in this study improves significantly both accuracy as well as scope of peptide docking, at least as suggested by its performance on the widely accepted PeptiDB peptide docking benchmark [12, 13, 15, 16]. At the same time, it also highlights the bottlenecks to be overcome for its broader generalization: (1) Accurate modeling of peptide conformational ensemble: Even though the fragments generated using the Rosetta fragment picker protocol sample in general the bound peptide conformation well, challenges remain in the modeling of longer peptides, as well of as peptides with unusual conformations (Fig 2A). This is attributed to the lack of a large pool of longer representative fragments with similar sequences in solved structures. The rigid body PIPER docking step does not include any flexibility, and therefore accurate fragments are very important for efficient further refinement by FlexPepDock to near-native model quality. This challenge could be overcome by incorporating a peptide-folding algorithm as first step for fragment generation, assuming that bound-like conformations would indeed be sampled. (2) Modeling significant receptor backbone flexibility: While for many peptide-protein interactions the receptor is already pre-organized and the binding of the peptide does not induce considerable movement , binding may involve significant structural rearrangement of the receptor (e.g., in the binding of Slam tail peptide to the SH2 domain of the XLP protein SAP, PDB id 1D4T). To model such challenging cases, improved modeling of receptor flexibility is mandatory (using e.g. backrub moves  and other advanced comparative modeling approaches ). (3) Improved ranking of alternative models: Inspection of failures highlights that despite low quality, many of the failed simulations model the peptide into the correct binding pocket, and identify the binding hotspot regions, similar to our observation in CAPRI community-wide performance . 
However, the details are not correct, with the wrong peptide residue side chain often pointing into a given binding pocket. Such ranking problems might be resolved as scoring functions improve. (4) Extension to flexible interactions: Last but not least, this approach might be restricted to peptide-mediated interactions in which the bound peptide adopts a single, well-defined conformation, since it has been calibrated on well-resolved crystal structures of peptide-protein complexes. Many biologically relevant interactions remain more flexible, and are therefore studied using e.g. NMR experiments. The next challenge will be to extend this approach to the study of such interactions.
To summarize, the novel global peptide-docking pipeline presented here allows modeling of peptide-protein interactions with much improved accuracy and scope. With further improvements in the modeling of increased receptor flexibility and in peptide conformational ensemble generation, as described above, we should be able to accurately model any interaction that adopts a stable conformation that can be crystallized, as well as explore common features of interactions beyond.
Materials and methods
Docking performance and analysis were calibrated and assessed on a benchmark of peptide-protein complexes derived from the PeptiDB database, filtered according to the following criteria:
- Availability of both the complex and the free receptor structure, solved by X-ray crystallography (resolution of the complex ≤2.0Å).
- Absence of crystal contacts that could influence the peptide conformation. In certain cases such a further interaction is of biological relevance, leading to receptor multimerization and clustering (e.g. PeptiDB entries involving some of the SH3 domain-peptide interactions, 2AK5 and 2J6F). Since obtaining high-resolution models for these cases might be challenging without including the symmetry mate, such examples were removed from the dataset.
- Absence of large receptor rearrangement upon peptide binding. Even though the present implementation of PIPER-FlexPepDock does allow for local conformational changes in the receptor (backbone as well as side chains), accurate modeling of more significant movement of the receptor upon peptide binding (e.g. significant loop movement at the binding interface in PeptiDB entry 1D4T) requires the development of algorithms for efficient modeling of more significant receptor flexibility, which is beyond the scope of the present study.
- Non-redundant dataset. The criteria above result in a dataset of 42 complexes (S1C Table) that is very similar to the one used in previous studies by different groups [12, 13, 15, 16]. To ensure that no bias towards a particular type of peptide-receptor interaction would be introduced, we extracted a domain non-redundant set (defined by CATH classification), resulting in the 27 complexes described in this study in detail (Table 1 and S1B Table).
The dataset was further divided into two subsets, based on available information about a peptide binding motif (defined in this study based on ELM, http://elm.eu.org): For the motif set (12 complexes) we modeled only the motif part, since it contributes most to binding, and shorter peptides are easier to model. To enable comparison to performance of other protocols, we subsequently also docked the full peptide. For the non-motif set (15 complexes), the full peptide was docked.
Initial calibration set: For initial calibration, we selected a smaller subset of 9 complexes (S1A Table). The established protocol was then validated on the remaining complexes, to ensure similar performance and thereby prevent overfitting of the modeling protocol.
The steps of the PIPER-FlexPepDock protocol
In the following we provide specific details of the different steps of the PIPER-FlexPepDock protocol. For runline commands, see the Supplementary S1 Text.
(1) Generation of peptide conformations using Rosetta fragment picker and Rosetta fixbb design.
The Rosetta fragment picker uses a scoring measure composed of a weighted combination of secondary structure propensity, sequence profile similarity and residue propensities for local regions in the Ramachandran plot to map fragments to vall, a database of solved high-resolution monomer protein structures (e.g., vall.jul19.2011, available as part of the Rosetta release). Consequently, the mapped fragments are consistent with the peptide sequence (as defined by a sequence similarity profile generated with PSI-BLAST) and with its secondary structure (as predicted using PSIPRED). Even though PSIPRED was shown to perform quite well for shorter sequences, we use the full protein sequence from which the peptide was derived for the PSIPRED and PSI-BLAST runs, where available. If the preferred secondary structure is already known (e.g. the alpha helical nuclear receptor box motif) it can be provided instead of PSIPRED predictions. Secondary structural information can also be obtained from experimental techniques such as circular dichroism (CD) spectroscopy, or approximated by residue Ramachandran local region propensities (derived from statistical analysis of high-resolution protein structures). The coordinates of the top fifty assigned fragments are extracted from the PDB, and side chains of residues not identical to those of the query peptide are modeled using the Rosetta fixbb design algorithm. The whole process results in an ensemble of 50 fragments for the query peptide sequence.
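The ranking idea behind this step can be illustrated with a minimal Python sketch. It is not the Rosetta fragment picker: the `Candidate` type, the three score terms, their weights, and the convention that lower totals are better are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate fragment with three hypothetical score terms (lower is better)."""
    name: str
    secondary_structure: float
    profile: float
    ramachandran: float


def total_score(c, w_ss=1.0, w_profile=1.0, w_rama=1.0):
    # Weighted combination of the individual terms; the weights are placeholders.
    return (w_ss * c.secondary_structure
            + w_profile * c.profile
            + w_rama * c.ramachandran)


def pick_top(candidates, n=50, **weights):
    # Sort by the combined score and keep the n best candidates
    # (50 mirrors the ensemble size described in the text).
    return sorted(candidates, key=lambda c: total_score(c, **weights))[:n]
```

Calling `pick_top(candidates)` on a list of scored candidates returns at most 50 of them, best first. Changing a weight changes the ranking without touching the individual terms.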
(2) Rigid body docking using PIPER.
Each of the fifty fragments is globally docked onto the receptor using the PIPER Fast Fourier transform (FFT) docking algorithm, as detailed before, with the free receptor decomposed into independent binding units (either a single domain or repeated, non-decomposable domains; as in Lavi et al.). The calculations are performed for each of 70,000 rotations, and one lowest-energy translation for each rotation is retained. For each fragment docking run the top-ranked 250 solutions (in total 50 × 250 = 12,500 models) are collected for refinement in the next step (see S3 Fig for a comparison of performance using different numbers of top-ranked solutions).
Selection of final model from a PIPER simulation: To assess the performance of a protocol involving only the first PIPER rigid-body docking step (Table 1), we selected the final models as reported previously (similar to the PeptiDock implementation, but without minimization). In short, the collected models are clustered (with a radius of 3.5Å Cα RMSD), and cluster density is used for ranking and selection of representatives.
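The bookkeeping of this step (one translation kept per rotation, then the 250 best poses kept per fragment) can be sketched in Python as follows. This is an illustration only and does not call PIPER. The function names and input layouts are hypothetical, and ranking by lowest energy is an assumption.

```python
import heapq


def best_translation_per_rotation(samples):
    """samples: iterable of (rotation_id, translation_id, energy) tuples.

    Returns {rotation_id: (translation_id, energy)} with the lowest-energy
    translation retained for each rotation.
    """
    best = {}
    for rotation_id, translation_id, energy in samples:
        if rotation_id not in best or energy < best[rotation_id][1]:
            best[rotation_id] = (translation_id, energy)
    return best


def collect_for_refinement(poses_by_fragment, per_fragment=250):
    """poses_by_fragment: {fragment_id: [(pose_id, energy), ...]}.

    Keeps the per_fragment lowest-energy poses of every fragment and
    returns them as (fragment_id, pose_id, energy) tuples.
    """
    selected = []
    for fragment_id, poses in poses_by_fragment.items():
        for pose_id, energy in heapq.nsmallest(per_fragment, poses, key=lambda p: p[1]):
            selected.append((fragment_id, pose_id, energy))
    return selected
```

With 50 fragments and at least 250 poses each, `collect_for_refinement` returns 50 × 250 = 12,500 poses, matching the number of models passed to refinement.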
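One common way to rank by cluster density is a greedy neighbor-count scheme, sketched below in Python. The exact procedure used here is the one described in the cited PeptiDock work. This sketch assumes a precomputed symmetric pairwise Cα RMSD matrix, and the function name is hypothetical.

```python
def density_clusters(rmsd, radius=3.5):
    """Greedy density-based clustering.

    rmsd: square, symmetric list of lists with pairwise RMSD values (in Å).
    Returns a list of (center_index, member_indices), largest cluster first.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        def neighbours(i):
            # Models still unassigned that lie within the radius of model i
            # (model i itself is included because rmsd[i][i] == 0).
            return [j for j in remaining if rmsd[i][j] <= radius]

        # The model with the most neighbours becomes the next cluster center;
        # ties are broken in favour of the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neighbours(i)))
        members = sorted(neighbours(center))
        clusters.append((center, members))
        remaining -= set(members)
    return clusters
```

Because every model is removed once it is assigned, each model belongs to exactly one cluster, and cluster sizes never increase from one iteration to the next.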
(3) The Rosetta FlexPepDock refinement algorithm.
The FlexPepDock Refinement protocol refines all of the peptide’s degrees of freedom (i.e. its rigid body orientation as well as backbone dihedral angles), as well as the receptor side chain conformations. Rosetta FlexPepDock refinement was performed as described previously, with slight changes: (1) Sampling: In our present implementation, we also allowed the receptor backbone to move during minimization steps, to allow for slight readjustment upon binding (compare e.g. Fig 3B and 3C). (2) Scoring: Rosetta energy function Talaris2014 was used. Clustering of models was performed as previously described, using a threshold of 2.0Å. The top-scoring member of each cluster (according to reweighted score) was selected as the representative member, and clusters were ranked based on the reweighted score of the representative members (as in Raveh et al.).
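The selection of cluster representatives and the ranking of clusters can be sketched in Python as follows. The sketch assumes that cluster assignments and reweighted scores are already available and that lower (more negative) scores are better. The function name and the tuple layout are hypothetical.

```python
def rank_clusters(models, top=10):
    """models: iterable of (model_id, cluster_id, reweighted_score) tuples.

    Picks the best-scoring member of each cluster as its representative and
    returns up to `top` clusters as (cluster_id, model_id, score), best first.
    """
    best = {}
    for model_id, cluster_id, score in models:
        if cluster_id not in best or score < best[cluster_id][1]:
            best[cluster_id] = (model_id, score)
    # Clusters are ordered by the score of their representative member.
    ranked = sorted(best.items(), key=lambda item: item[1][1])
    return [(cluster_id, model_id, score)
            for cluster_id, (model_id, score) in ranked[:top]]
```

Ranking clusters by their representative rather than by their mean score means a single well-scoring model is enough to promote its cluster.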
Model evaluation criteria
For each global docking run, the 10 top-ranking clusters were selected as predictions and evaluated for quality based on ligand RMSD (L-RMSD), calculated between the native and model peptide backbone atoms after optimal superimposition of the receptor, as done in the CAPRI assessment [34, 35]. L-RMSD and other measures, such as Fnat and I-RMSD, were calculated using DockQ.
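The L-RMSD measure itself reduces to a root-mean-square deviation over matched peptide backbone atoms, as in the Python sketch below. It is not DockQ, and it assumes that the model has already been superimposed onto the native structure via the receptor, so only the final distance calculation is shown. The function names are hypothetical.

```python
import math


def ligand_rmsd(native_peptide, model_peptide):
    """RMSD between matched peptide backbone atoms, given as (x, y, z) tuples.

    Assumes both coordinate sets are in the same frame, i.e. the receptors
    were superimposed beforehand.
    """
    if not native_peptide or len(native_peptide) != len(model_peptide):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(native_peptide, model_peptide)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(native_peptide))


def best_of_top_clusters(l_rmsds, top=10):
    """Lowest L-RMSD among the first `top` ranked cluster representatives."""
    return min(l_rmsds[:top])
```

`best_of_top_clusters` expects the L-RMSD values in cluster rank order, so a near-native model ranked below the cutoff does not count as a successful prediction.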
Rosetta release version
The protocol and tests described in this manuscript follow the FlexPepDock protocol, as implemented within the Rosetta weekly release version 2016.20.58704.
Simulation running time
The processing time for the different stages of the protocol depends on the lengths of both the receptor and the peptide sequence. For example, for the global docking of the carboxy-terminal tail of the ErbB2 receptor (GLDVPV) onto the free ERBIN PDZ domain (103 residues), the generation of 50 fragments takes ~8 CPU minutes on an AMD Sun cluster with 300 cores. For the same complex, a single PIPER fragment docking simulation takes ~2 minutes and a single refinement run of a PIPER docked model takes ~1 minute on the same system architecture (~1.5 hours to refine all models).
The runline commands are provided in the Supplementary S1 Text. The Rosetta software is available for free to the academic community. Details regarding downloading and installation are available at https://www.rosettacommons.org. PIPER FFT rigid body docking is available as part of the protein-protein docking server ClusPro (PeptiDock at https://peptidock.cluspro.org).
Top line: PIPER rigid body docking of peptide fragments onto the unbound receptor structure; Middle lines: FlexPepDock refinement of the PIPER docked fragments on the unbound rigid (second line) and flexible (third line) receptor structure; Bottom line: PIPER-FlexPepDock results starting from a bound receptor structure.
S2 Fig. Fragment quality is significantly better for shorter, motif-defined peptide segments (accompanies Fig 2A).
Distributions of fragment backbone RMSD values relative to the bound peptide conformations for the motif segments and the corresponding full-length peptides. The motif set complexes 1JWG and 1TP5 are not included, as in these cases the motif covers the whole peptide.
S3 Fig. Performance of PIPER-FPD with different numbers of top PIPER models selected for the refinement stage.
Distributions of L-RMSDs of the best models among top 10 ranking clusters for runs using the bound receptor structure (BOUND) and the free receptor structure (UNBOUND & UNBOUND-MIN), the latter including also receptor flexibility in the final refinement step (only the motif region was modeled for the 12 complexes with known motif). The number of PIPER models taken for the FlexPepDock refinement step is shown below each boxplot. Based on these results, we determined a cutoff of 250 models for optimal tradeoff between performance and running time.
S1 Table. Details of the datasets of peptide-protein complexes, including modeling results for PIPER-FlexPepDock and other peptide docking protocols (accompanies Table 1).
(A) Calibration set (n = 9 complexes); (B) Non-redundant set (n = 27 complexes); (C) Redundant set (n = 42 complexes).
S2 Table. Median fragment-native Backbone-RMSD values for the PeptiDock set complexes obtained using Rosetta fragment picker and the motif-based fragment generation approach used in PeptiDock.
We thank Dr. Barak Raveh for insightful discussions. We also thank Dr. Christina Schindler for providing the pepATTRACT models as reported in Schindler et al., and Dr. Mikael Trellet and Prof. Alexandre Bonvin for providing the link for the SBGrid deposited HADDOCK models (https://data.sbgrid.org/dataset/131/) as reported in Trellet et al., for the comparison of performance.
- 1. Pawson T, Nash P. Assembly of cell regulatory systems through protein interaction domains. Science. 2003;300(5618):445–52. pmid:12702867
- 2. Petsalaki E, Russell RB. Peptide-mediated interactions in biological systems: new discoveries and applications. Curr Opin Biotechnol. 2008;19(4):344–50. pmid:18602004
- 3. Neduva V, Linding R, Su-Angrand I, Stark A, de Masi F, Gibson TJ, et al. Systematic discovery of new recognition peptides mediating protein interaction networks. PLoS Biol. 2005;3(12):e405. pmid:16279839
- 4. Vacic V, Oldfield CJ, Mohan A, Radivojac P, Cortese MS, Uversky VN, et al. Characterization of molecular recognition features, MoRFs, and their binding partners. J Proteome Res. 2007;6(6):2351–66. pmid:17488107
- 5. Gamble TR, Vajdos FF, Yoo S, Worthylake DK, Houseweart M, Sundquist WI, et al. Crystal structure of human cyclophilin A bound to the amino-terminal domain of HIV-1 capsid. Cell. 1996;87(7):1285–94. pmid:8980234
- 6. London N, Raveh B, Schueler-Furman O. Druggable protein-protein interactions—from hot spots to hot segments. Curr Opin Chem Biol. 2013;17(6):952–9. pmid:24183815
- 7. Trabuco LG, Lise S, Petsalaki E, Russell RB. PepSite: prediction of peptide-binding sites from protein surfaces. Nucleic Acids Res. 2012;40(Web Server issue):W423–7. pmid:22600738
- 8. Saladin A, Rey J, Thevenet P, Zacharias M, Moroy G, Tuffery P. PEP-SiteFinder: a tool for the blind identification of peptide binding sites on protein surfaces. Nucleic Acids Res. 2014;42(Web Server issue):W221–6. pmid:24803671
- 9. Lavi A, Ngan CH, Movshovitz-Attias D, Bohnuud T, Yueh C, Beglov D, et al. Detection of peptide-binding sites on protein surfaces: the first step toward the modeling and targeting of peptide-mediated interactions. Proteins. 2013;81(12):2096–105. pmid:24123488
- 10. Raveh B, London N, Schueler-Furman O. Sub-angstrom modeling of complexes between flexible peptides and globular proteins. Proteins. 2010;78(9):2029–40. pmid:20455260
- 11. Raveh B, London N, Zimmerman L, Schueler-Furman O. Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors. PLoS One. 2011;6(4):e18934. pmid:21572516
- 12. Trellet M, Melquiond AS, Bonvin AM. A unified conformational selection and induced fit approach to protein-peptide docking. PLoS One. 2013;8(3):e58769. pmid:23516555
- 13. Schindler CE, de Vries SJ, Zacharias M. Fully Blind Peptide-Protein Docking with pepATTRACT. Structure. 2015;23(8):1507–15. pmid:26146186
- 14. Ben-Shimon A, Niv MY. AnchorDock: Blind and Flexible Anchor-Driven Peptide Docking. Structure. 2015;23(5):929–40. pmid:25914054
- 15. Kurcinski M, Jamroz M, Blaszczyk M, Kolinski A, Kmiecik S. CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site. Nucleic Acids Res. 2015;43(W1):W419–24. pmid:25943545
- 16. Yan C, Xu X, Zou X. Fully Blind Docking at the Atomic Level for Protein-Peptide Complex Structure Prediction. Structure. 2016;24(10):1842–53. pmid:27642160
- 17. Peterson LX, Roy A, Christoffer C, Terashi G, Kihara D. Modeling disordered protein interactions from biophysical principles. PLoS Comput Biol. 2017;13(4):e1005485. pmid:28394890
- 18. Dominguez C, Boelens R, Bonvin AM. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc. 2003;125(7):1731–7. pmid:12580598
- 19. de Vries SJ, Rey J, Schindler CEM, Zacharias M, Tuffery P. The pepATTRACT web server for blind, large-scale peptide-protein docking. Nucleic Acids Res. 2017.
- 20. Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Curr Protoc Bioinformatics. 2014;47:5.6.1–32.
- 21. Trott O, Olson AJ. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. 2010;31(2):455–61. pmid:19499576
- 22. Gront D, Kulp DW, Vernon RM, Strauss CE, Baker D. Generalized fragment picking in Rosetta: design, protocols and applications. PLoS One. 2011;6(8):e23294. pmid:21887241
- 23. Venkatraman V, Yang YD, Sael L, Kihara D. Protein-protein docking using region-based 3D Zernike descriptors. BMC Bioinformatics. 2009;10:407. pmid:20003235
- 24. Porter KA, Bing X, Beglov D, Bohnuud T, Alam B, Schueler-Furman O, et al. ClusPro PeptiDock: Efficient global docking of peptide recognition motifs using FFT. Bioinformatics. 2017; pmid:28430871
- 25. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235–42. pmid:10592235
- 26. Kozakov D, Beglov D, Bohnuud T, Mottarella SE, Xia B, Hall DR, et al. How good is automated protein docking? Proteins. 2013;81(12):2159–66. pmid:23996272
- 27. Brooks BR, Brooks CL 3rd, Mackerell AD Jr., Nilsson L, Petrella RJ, Roux B, et al. CHARMM: the biomolecular simulation program. J Comput Chem. 2009;30(10):1545–614. pmid:19444816
- 28. Kozakov D, Brenke R, Comeau SR, Vajda S. PIPER: an FFT-based protein docking program with pairwise potentials. Proteins. 2006;65(2):392–406. pmid:16933295
- 29. Dinkel H, Van Roey K, Michael S, Kumar M, Uyar B, Altenberg B, et al. ELM 2016-data update and new functionality of the eukaryotic linear motif resource. Nucleic Acids Res. 2016;44(D1):D294–300. pmid:26615199
- 30. Puntervoll P, Linding R, Gemund C, Chabanis-Davidson S, Mattingsdal M, Cameron S, et al. ELM server: A new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 2003;31(13):3625–30. pmid:12824381
- 31. Messih MA, Lepore R, Tramontano A. LoopIng: a template-based tool for predicting the structure of protein loops. Bioinformatics. 2015;31(23):3767–72. pmid:26249814
- 32. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, et al. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J Mol Biol. 2003;331(1):281–99. pmid:12875852
- 33. Lensink MF, Velankar S, Wodak SJ. Modeling protein-protein and protein-peptide complexes: CAPRI 6th edition. Proteins. 2017;85(3):359–77. pmid:27865038
- 34. Mendez R, Leplae R, De Maria L, Wodak SJ. Assessment of blind predictions of protein-protein interactions: current status of docking methods. Proteins. 2003;52(1):51–67. pmid:12784368
- 35. Mendez R, Leplae R, Lensink MF, Wodak SJ. Assessment of CAPRI predictions in rounds 3–5 shows progress in docking procedures. Proteins. 2005;60(2):150–69. pmid:15981261
- 36. Pearl FM, Bennett CF, Bray JE, Harrison AP, Martin N, Shepherd A, et al. The CATH database: an extended protein family resource for structural and functional genomics. Nucleic Acids Res. 2003;31(1):452–5. pmid:12520050
- 37. Rohl CA, Strauss CE, Misura KM, Baker D. Protein structure prediction using Rosetta. Methods Enzymol. 2004;383:66–93. pmid:15063647
- 38. Park H, Lee GR, Heo L, Seok C. Protein loop modeling using a new hybrid energy function and its application to modeling in inaccurate structural environments. PLoS One. 2014;9(11):e113811. pmid:25419655
- 39. Vanhee P, Verschueren E, Baeten L, Stricher F, Serrano L, Rousseau F, et al. BriX: a database of protein building blocks for structural analysis, modeling and design. Nucleic Acids Res. 2011;39(Database issue):D435–42. pmid:20972210
- 40. Li Y, Suino K, Daugherty J, Xu HE. Structural and biochemical mechanisms for the specificity of hormone binding and coactivator assembly by mineralocorticoid receptor. Mol Cell. 2005;19(3):367–80. pmid:16061183
- 41. Guhaniyogi J, Robinson VL, Stock AM. Crystal structures of beryllium fluoride-free and beryllium fluoride-bound CheY in complex with the conserved C-terminal peptide of CheZ reveal dual binding modes specific to CheY conformation. J Mol Biol. 2006;359(3):624–45. pmid:16674976
- 42. Todd B, Moore D, Deivanayagam CC, Lin GD, Chattopadhyay D, Maki M, et al. A structural model for the inhibition of calpain by calpastatin: crystal structures of the native domain VI of calpain and its complexes with calpastatin peptide and a small molecule inhibitor. J Mol Biol. 2003;328(1):131–46. pmid:12684003
- 43. Remenyi A, Good MC, Bhattacharyya RP, Lim WA. The role of docking interactions in mediating signaling input, output, and discrimination in the yeast MAPK network. Mol Cell. 2005;20(6):951–62. pmid:16364919
- 44. Leaver-Fay A, O'Meara MJ, Tyka M, Jacak R, Song Y, Kellogg EH, et al. Scientific benchmarks for guiding macromolecular energy function improvement. Methods Enzymol. 2013;523:109–43. pmid:23422428
- 45. Rego N, Koes D. 3Dmol.js: molecular visualization with WebGL. Bioinformatics. 2015;31(8):1322–4. pmid:25505090
- 46. Simons KT, Kooperberg C, Huang E, Baker D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol. 1997;268(1):209–25. pmid:9149153
- 47. Ho BK, Dill KA. Folding very short peptides using molecular dynamics. PLoS Comput Biol. 2006;2(4):e27. pmid:16617376
- 48. Vanhee P, Stricher F, Baeten L, Verschueren E, Lenaerts T, Serrano L, et al. Protein-peptide interactions adopt the same structural motifs as monomeric protein folds. Structure. 2009;17(8):1128–36. pmid:19679090
- 49. Kozakov D, Li K, Hall DR, Beglov D, Zheng J, Vakili P, et al. Encounter complexes and dimensionality reduction in protein-protein association. Elife. 2014;3:e01370. pmid:24714491
- 50. London N, Movshovitz-Attias D, Schueler-Furman O. The structural basis of peptide-protein binding strategies. Structure. 2010;18(2):188–99. pmid:20159464
- 51. Poy F, Yaffe MB, Sayos J, Saxena K, Morra M, Sumegi J, et al. Crystal structures of the XLP protein SAP reveal a class of SH2 domains with extended, phosphotyrosine-independent sequence recognition. Mol Cell. 1999;4(4):555–61. pmid:10549287
- 52. Davis IW, Arendall WB 3rd, Richardson DC, Richardson JS. The backrub motion: how protein backbone shrugs when a sidechain dances. Structure. 2006;14(2):265–74. pmid:16472746
- 53. Song Y, DiMaio F, Wang RY, Kim D, Miles C, Brunette T, et al. High-resolution comparative modeling with RosettaCM. Structure. 2013;21(10):1735–42. pmid:24035711
- 54. Marcu O, Dodson EJ, Alam N, Sperber M, Kozakov D, Lensink MF, et al. FlexPepDock lessons from CAPRI peptide-protein rounds and suggested new criteria for assessment of model quality and utility. Proteins. 2017;85(3):445–62. pmid:28002624
- 55. Jozic D, Cardenes N, Deribe YL, Moncalian G, Hoeller D, Groemping Y, et al. Cbl promotes clustering of endocytic adaptor proteins. Nat Struct Mol Biol. 2005;12(11):972–9. pmid:16228008
- 56. Moncalian G, Cardenes N, Deribe YL, Spinola-Amilibia M, Dikic I, Bravo J. Atypical polyproline recognition by the CMS N-terminal Src homology 3 domain. J Biol Chem. 2006;281(50):38845–53. pmid:17020880
- 57. Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol. 1963;7:95–9. pmid:13990617
- 58. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402. pmid:9254694
- 59. Ward JJ, McGuffin LJ, Buxton BF, Jones DT. Secondary structure prediction with support vector machines. Bioinformatics. 2003;19(13):1650–5. pmid:12967961
- 60. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, et al. The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 2002;58(Pt 6 No 1):899–907. pmid:12037327
- 61. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003;302(5649):1364–8. pmid:14631033
- 62. Basu S, Wallner B. DockQ: A Quality Measure for Protein-Protein Docking Models. PLoS One. 2016;11(8):e0161879. pmid:27560519