In Silico Generation of Peptides by Replica Exchange Monte Carlo: Docking-Based Optimization of Maltose-Binding-Protein Ligands

Short peptides can be designed in silico and synthesized through automated techniques, making them advantageous and versatile protein binders. A number of docking-based algorithms allow for a computational screening of peptides as binders. Here we developed ex-novo peptides targeting the maltose site of the Maltose Binding Protein, the prototypical system for the study of protein ligand recognition. We used a Monte Carlo based protocol, to computationally evolve a set of octapeptides starting from a polialanine sequence. We screened in silico the candidate peptides and characterized their binding abilities by surface plasmon resonance, fluorescence and electrospray ionization mass spectrometry assays. These experiments showed the designed binders to recognize their target with micromolar affinity. We finally discuss the obtained results in the light of further improvement in the ex-novo optimization of peptide based binders.


Introduction
The design of new ligands and receptors for protein recognition is a key step towards the development of new diagnostic tools and selective drugs. So far antibodies represent the biomolecules with the highest affinity and selectivity toward proteins [1,2,3]. Notably, they are usually employed in a number of diagnostic applications such as ELISA and immunohistochemical assays. However both their production and optimization are expensive and time consuming requiring either in vitro (cell line culture and monoclonal screening) or in vivo procedures (animal immunization) [4]. Peptides are an emerging alternative to antibodies as drug candidates [5], anti-aggregation agents [6], and probes for molecular recognition [7]. They can be automatically synthesized offering a wide variety of chemical targeted modifications, such as fluorescent or affinity tags. They are highly versatile while composed by a limited number of building blocks and their sequences are typically extracted from protein-binding peptides or protein domains [8,9] or optimized by phage-display libraries [10,11,12] sometimes reaching picomolar affinity towards their targets [12].
Given a particular peptide sequence targeting a protein, and when its binding site is known, a number of docking algorithms can provide an accurate structural model of the complex [13]: FlexPepDock [14,15], PepCrawler [16], HADDOCK [17], AutoDock Vina [18], or GAsDock [19]. These are generally capable of suggesting possible binders out of a number of possibilities. However, while performing accurately in their structural determination task, these algorithms are generally not capable of accurately estimating the dissociation constants between peptides and proteins. While successful attempts in this direction have been made [20,21,22], plain docking disregarding this issue has been successfully embedded into a number of algorithms for the ex-novo optimization of short peptides allowing random sequences to evolve towards a high affinity to their target.
Typical molecular optimization algorithms embedded in evolutionary codes are found in the form of genetic algorithms [23,24] (allowing a population of molecules to be scored, selected, mixed, and mutated with the goal of maximizing their score), or Monte Carlo (MC) based algorithms such as simulated annealing [25,26,27]], and replica exchange Monte Carlo (REMC) where the system is allowed to explore high temperature unfavorable configurations to escape local minima [28], or agent-based algorithms where molecules can follow arbitrary sets of rules to reach a low score [29,30]. These algorithms have been successfully used to optimize clusters [23,27,30], molecular conformations [24,25], protein conformations [28], supramolecular structures [29]. In all cases the protocols are designed to minimize the system "score" generally corresponding to the system or the interaction energy between its components.
When optimizing peptides as binders the score is generally the output of an existing docking code [31] and these have been embedded in protocols capable of evolving random sequences. For instance MOE Dock, a docking algorithm based on Monte Carlo (MC) simulated annealing, has been incorporated in a genetic algorithm to generate tetrapeptides with dissociation constant K D = 60μM with quino-protein glucose dehydrogenase [32] and also to generate α-synuclein aggregation modulators with K D = 19μM [33]. AutoDock Vina has been implemented in ENDPA, another genetic-algorithm-based code, and used for generating propyl oligopeptidase, p53 and DNA gyrase ligands [34]. The Leap-Frog genetic algorithm based search engine together with the commercial FlexiDock has been used to generate octamers and 13-mers with micromolar affinity for Ochratoxin A [35]. Higher affinity has been reached by an algorithm based on a combination of molecular dynamics, semiflexible docking using Autodock Vina, and replica exchange Monte Carlo (REMC [36]). Using this protocol a deca-alanine was shown to evolve to a final sequence showing nanomolar affinity towards the antiretroviral drug efavirenz [37].
Here we aim to evolve in silico protein-binding octapeptides starting from a random sequence. We use the Vina-based approach of Ref. [37] (descibed in Fig 1A), that successfully generated drug recognizing peptides of nanomolar affinity, to generate a number of protein binding peptides. We chose the maltose binding protein (MBP, Fig 1B) as a test case and a set of peptides computationally designed to bind to its maltose binding site. MBP is the prototypical member [38] of the periplasmic binding proteins also called "gold mine for the study of protein-ligand recognition" [39]. MBP, with a molecular mass of 42 kDa, is an ellipsoidal monomeric protein consisting of two globular domains linked by a flexible multistranded region. The MBP is known to exist in two conformations: a populated open state (95%) and a closed state with the two globular domains approaching each other [40]. The flexible region defines a binding pocket for maltose, maltotriose, and other maltodextrins [41,42]. Upon binding to its native ligands, the MBP conformational equilibrium shifts towards the closed form, a behavior shared by all periplasmic binding proteins [43].
The paper is organized as follows: (i) in Sec.2.1 the peptide:MBP complexes are computationally generated following the protocol schematized in Fig 1A and screened by means of an empirical docking-based method to identify the binder with higher affinity towards a defined protein binding site; (ii) the peptide:MBP complex stoichiometry is then determined in Sec.2.2 through Electrospray Ionization Mass Spectrometry (ESI-MS) and in Sec.2.3 the binding affinity is experimentally determined by means of Surface Plasmon Resonance (SPR) and Fluorescence Spectroscopy; (iii) in Sec.2.4, by comparison of experimental affinities and theoretical predictions, we demonstrate the computationally generated peptides to converge towards a consensus sequence; in Sec.3, we discuss the results with the goal of pinpointing possible strategies for the development of accurate computational protocols for peptide design.

Computational generation and characterization of peptide ligands
The algorithm, schematized in Fig 1A, was designed to optimize sequence (SEQ) and conformation (R) of a random starting peptide of fixed length. Its kernel iteratively (i) mutates the primary structure of the peptide, (ii) relaxes the newly mutated structure, and (iii) docks the new structure to the target assigning it a docking score. All the procedure is carried out in vacuum. In the last step (iv) the mutation is accepted or rejected following a MC acceptance probability: where S old is the docking score of the starting peptide, S new is the docking score of the mutated peptide, k B T a tunable parameter which defines the acceptance probability of unfavorable mutations. The algorithm runs in parallel at three different k B T. At every step a REMC swap is attempted between two configurations belonging to two randomly selected k B T. The three different temperatures guaranteed appropriate mixing of the system allowing the exploration of high energy configurations while pushing the system towards lower energy configurations.
We performed the molecular design on the crystal structure of the MBP in its open configuration (in white in Fig 1B) and we selected the region surrounding Met330 (in yellow in Fig  1B) as the binding site. We optimized a set of octapeptides by running the optimization algorithm for 100 steps at k B T = 0.2, 0.4, 0.6 using in all cases a linear octa-alanine as the starting sequence, following the protocol of Ref. [37]. We ran the optimization code 9 times, selecting the lowest scoring end-simulation peptide of each run for further analysis. Scores are Autodock Vina estimated binding affinities in kcal/mol. The binding affinity evolution observed in a typical run is shown in Fig 2. From the run of Fig 2, we further selected two peptides along the minimization path and the starting poly-alanine. Moreover, due to the low aqueous solubility of the poly-alanine, we also considered the sequence AAARRAAA as a negative control. Overall, for further screening, we chose the 13 peptides reported in Table 1.
To characterize the chosen peptides we performed multiple dockings by switching off the mutation step of the optimization code (highlighted in grey in Fig 1A) and redocked all the 13 peptides at k B T = 0.6 (MC+Vina scores in Table 1). For comparison, we docked the same peptides also with the original Autodock (Vina scores in Table 1). Due to the stochastic character of the methods the end configurations widely varied but all were correctly located inside the pocket (as shown in Fig 3A and Figure A in S1 File). Accordingly the standard deviation on their docking scores was of 5% for MBP-9 and up to the 15% for MBP-11. In general, the inclusion of MC steps in Vina greatly improved the chances of finding a low energy configuration: the Vina scores were between -7.5 and -10.2kcal/mol, while MC+Vina found configurations with binding affinities between -12.3 and -20.2 kcal/mol, almost twice the binding affinities of the configurations found by the original Vina. By redocking the peptides both with Vina and MC+Vina we did not obtain the same binding energies we obtained for their generation (indicated by arrows Fig 2). Further, MC+Vina identified the poly-ALA as the peptide with the least favorable binding energy. Similar results were obtained by docking all the peptide with the same REMC protocol used for their generation. We further validated our results by refining the MC+Vina lowest energy configurations with FlexPepDock (see Table A and Figure A in S1 File).
The preliminary experimental results for the first eight peptides of Table 1 pointed toward a high micromolar affinity (3.28-5.11 kcal/mol at room temperature according to the relation ΔG = RTln(k D ) where ΔG is the binding affinity R the ideal gas constant, T the system temperature, and the k D measures are those collected in Table 2 and discussed in Sec.2.4). Since the result is well below the theoretically estimated binding affinity, we assumed that the abnormally high affinity values were due to the docking algorithm missing a number of sequence dependent entropic contributions. In order to correct these errors, we observed that the calculated energy was nonzero when docking a peptide to a non-binding site (as in the fifth column of Table 1). Thus we assumed this latter energy should be balanced by missing sequence dependent terms and applied a correction to the binding energy, defining the corrected binding energy as: where <E binding site > is the average interaction energy of a peptide docked to its binding site, while <E no binding site > is the average interaction energy of the same peptide docked to a nonbinding site. We employed the MC+Vina scoring protocol able to find the lowest energy configurations for the peptides of Table 1. We ran multiple dockings generating a number of possible MBP:peptide complexes targeting the maltose binding site in the open protein configuration ( Fig 3A) and in closed maltose containing configuration (Fig 3B) at k B T = 0.6. While it is not possible to exclude a priori the aspecific binding of the peptides to either of the globular domains, we assumed no binding was possible in the maltose containing closed configuration, an assumption confirmed by ESI-MS experiments (see Sec.2.2 and Sec.2.4). We averaged the energies over the last 10 configurations of 10 runs of the docking protocols (last two columns of Table 1). The difference between the two energies for each peptide is plotted in Fig 3C showing that negative controls have the lowest BE among the peptides, and the other binding energies are in the micromolar range. Now MBP-9 emerges as a clear outlier and the best candidate for binding to the MBP.

Stoichiometry of peptide:protein complex through ESI-MS
The binding between the sequence endowed with the highest putative affinity toward MBP, namely MBP-9, was analyzed using the ESI-MS technique (Fig 4). The protein alone gave a narrow charge-states distribution centered on the peaks corresponding to the 14+/15+ charge states (Fig 4A), suggesting a compact and structured conformation of the protein.
In the presence of an equimolar amount of MBP-9 new peaks arose in the spectrum ( Fig  4B), corresponding to the MBP:MBP-9 complexes with a 1:1 and 1:2 protein:peptide stoichiometries. The protein in the 1:1 and 1:2 bound states were~33% and~5% of the total amount, respectively. Protein:peptide complexes can be almost completely dissociated in the gas-phase by increasing the declustering potential (DP) of the spectrometer from 60 to 180 V (Fig 4C). As a negative control, we choose two standard proteins with a molecular mass similar to MBP: Ovalbumin (42.7 kDa) and Neuroserpin (46.3 kDa). The mass spectra of these proteins in the presence of MBP-9 were acquired under the same experimental conditions, and no peaks corresponding to protein:peptide complexes were detected (data not shown), indicating a specific recognition of MBP-9 toward the MBP. Competitive experiments between MBP-9 and maltose were also performed through ESI-MS analyses, employing an equimolar mixture of MBP, MBP-9, and maltose ( Fig 4D). Under these conditions, the great majority of the protein was bound exclusively to maltose, in agreement with its higher affinity (*1μM) [44]. Only a small amount (less than 5%) can be referred to MBP bound to both ligands (MBP-9 and maltose) ( Fig 4B). Therefore, the occupation of MBP pocket by maltose almost completely inhibits the binding of MBP-9, confirming that confirming that designed peptide binds in the same site of the sugar.

Estimation of dissociation constants of the peptide:protein complexes
SPR was employed to evaluate the affinities of designed peptides towards the MBP protein.
SPR experiments were carried out by immobilizing MBP on a CM5 chip with an immobilization level of 4900 RU, while peptides were employed as analytes. The analysis of MBP-9 experiments provided dissociation constants in the micromolar range K D = 72±3 μM, by applying a 1:1 Langmuir model (Fig 5A).
To assess the specificity of the recognition mechanism of the designed peptides we carried out competitive binding experiments through co-injection of peptides and maltose. MBP-9 at 200μM was co-injected with maltose in a 1:100 ratio. The recorded sensogram ( Fig 5B) showed a dramatically decrease of the RU signal upon the presence of maltose. In this experiment the peptide concentration was twofold greater than K D , thus MBP protein was nearly saturated and consequently the signal decrease resulted from peptide displacement. Further, to corroborate the affinity values evaluated through SPR using an in-solution binding assay, we performed fluorescence experiments on the MBP-9:protein complex. Tryptophan fluorescence emission at 333 nm showed a dose-response quenching upon MBP-9 addition and -Δ fluorescence intensity was plotted against the concentration values of MPB-9 (Fig 6). Data were fitted with a 1:1 model of interaction, providing a K D = 65 ±10μM, in agreement with the SPR measurement.

Comparison between theoretical and experimental binding affinities
The SPR technique was employed to evaluate the affinity of remaining peptides towards the MBP, and the results are collected in Table 2. Dose response assays clearly revealed the recognition of 9 out of 11 designed peptides towards the protein. Exceptions are MBP-10 and MBP-11  Figure C in S1 File), due to their tendency to aggregate. Further, MBP-8 at 200μM was also co-injected with maltose in a 1:100 dramatically decrease of the RU signal upon the presence of maltose, as formerly observed for MBP-9, while MBP-1 and MBP-6 were tested for their specificity by MS-ESI with Ovalbumin and Neuroserpin: also in these cases quite small amounts of the complexes were detected at 10-fold molar excess of the ligand confirming a strong specificity in the sequence able to recognize MBP (data not shown).
By converting the binding affinities in units of energy, using ΔG = RTln(k D ) with RT = 0.593kcal/mol with error defined as σ ΔG = 0.434(σ kD /k D ) where σ kD is the experimental, it is possible to compare the measured binding affinities with the BE calculated in Sec.2.2 (Fig 7). In the limit of a very small dataset, the comparison shows a good correlation.
The inspection of the protein maltose binding site revealed the presence of several aromatic and hydrogen bonding residues besides of Met 330 (see the insets of Fig 7). The formers allow for multiple π-π interactions between the protein and the peptides aromatic rings. This is particularly evident for MBP-9 which is endowed with a pronounced aromatic character. But the aromaticity itself is not sufficient to guarantee binding, since it is pronounced for binding as well as for "non-binding" sequences. Indeed the comparison between the amino acidic composition of the most potent binders (MBP-8, -9) and the non-binding sequences (MBP-10, -11), showed that even if they share common chemical features only the binding peptides bear Proline residues. This iminoacid, in the middle of the sequences, seems to ensure a conformational turn that could better accommodate peptides in the maltose site, probably aiding π-stacking interactions among protein and aromatic side-chains of peptide sequences.

Discussion and Conclusions
We have computationally generated a set of octa-peptides targeting the maltose site of MBP with an algorithm previously employed successfully in the framework of drug recognition. Peptides were screened with a docking based algorithm, allowing the selection of the best candidate binder out of a number of possibilities. SPR, ESI-MS and fluorescence based assays confirmed its 1:1 stochiometry, its selectivity towards the binding site they have been designed for and leading to a binding affinity in the micromolar range. In Silico Generation of Peptides However, current docking algorithms while pinpointing possible binders and predicting their bound conformation turn out to be unable to rank and estimate the binding energy between the peptides and the MBP and to discern between binders and non-binders. For instance, the original Vina binding energy is reasonably accurate for small molecules: the maltotriose with MBP open configuration scores -7.1kcal/mol and -8.0kcal/mol for the closed configuration to be compared to the experimental values of -8.4kcal/mol [45] and -9.3kcal/mol [46]. Vina+MC is accurate for very short peptides, for instance KAK+OPPA with the MC +Vina code scores -9.5±0.6 kcal/mol (the experimentally determined value is -9.8kcal/mol [47]). It is also possible to reproduce the measured binding affinity of -9.7kcal/mol obtained for the octapeptide of Ref. [37] with calculated average value of 7.1±1.1kcal/mol (with a minimum at -10.2 kcal/mol). However Vina, as well as the others Vina-based protocols, is orders of magnitude far from the experimentally evaluated binding energies for longer peptides.
While this is an important obstacle for the progress of the in silico optimization of peptidebased binders, we have delineated a new route. In order to match the experimental binding affinities and computational values, we have demonstrated that it is crucial to consider a negative control in the estimation of the entropic contribution of the global binding energy. Indeed we have redefined this contribution as the "binding" energy of the peptide towards an experimentally determined non-binding configuration. ESI-MS and SPR experiments, showed that the peptides do not form a 1:1 complex with MBP when an excess of maltose is present, indicating that they are specific for MBP in its maltose-free conformation and that they are unable to bind to the MBP maltose-containing-closed conformation, making the maltose-containingclosed conformation a reliable negative control. Further, since the structure of the two MBP globular domains is unaffected by the presence of maltose, we chose the multistranded region connecting them as control for peptide aspecific binding even if we did not exclude a priori that aspecific interactions could occur in the globular lobes.
While in the proposed work we assess the limitations of the docking based automated evolution of protein binding peptides, the redefinition of the ligand binding energy provides an intriguing direction for the computational evolution and screening of peptides with reliable predicted binding affinities for proteins.

Methods Computational
For molecular modelling and docking we used the ProteinDataBank structures 1OMP [48] and 3MBP [46]. The docking protocol, implemented in a bash script, is based on a combination of Modeler 9.11 [49], AutoDock Tools 1.5.4, and AutoDock Vina 1.1.0 [18]. All dockings were performed in a box of size 25x25x25Å, to constrain the peptides in the MBP binding pocket, with exhaustiveness 10, and energy_range 4. The other parameters have been set to their default values. For MBP the box was centered on the sulfur atom of Met330. The starting configuration for all the Vina-based docking runs was a linear poly-alanine. AutoDock Vina was run 10 times obtaining 90 configurations, MC+Vina 10 times for 100 steps obtaining 100 endsimulation configurations. All the figures throughout the text were generated with VMD.

Protein Preparation and characterization of His MBP
This protein was expressed, purified, and characterized as reported in Figure B in S1 File [50] Peptide Synthesis Reagents (Fmoc-protected amino acids and resins, activation and deprotection reagents) from Novabiochem (Laufelfingen, Switzerland) and InBios (Napoli, Italy); Solvents and HPLC analyses from Romil (Dublin, Ireland); reversed phase columns for peptide analysis and LC-MS system from ThermoFisher (Milan, Italy). Solid phase peptide syntheses were performed on a fully automated multichannel peptide synthesizer Syro I (Multisynthech, Germany). Preparative RP-HPLC was carried out on a Shimadzu LC-8A, equipped with a SPD-M10 AV detector and with a Phenomenex C18 Jupiter column (50x22 mm ID; 10 μm). LC-MS analyses were carried out on a LCQ DECA XP Ion Trap mass spectrometer equipped with a OPTON ESI source, operating at 4.2 kV needle voltage and 320°C with a complete Surveyor HPLC system, comprised of MS pump, an autosampler and a photo diode array (PDA), and Narrow bore 50x2 mm C18 BioBasic LC-MS columns. Peptides were synthesized employing the solid phase method on a 50 μmol scale following standard Fmoc strategies [51] with Rink-amide resin (substitution 0.5 mmol/g) solid support. The amino acids were activated with HBTU/ Oxyme /DIEA (1:1:2), and the Fmoc deprotection was achieved by using a 20% (v/v) piperidine solution in DMF. All couplings were performed for 15 min and deprotections for 10 min. Finally the peptides were removed from the resin by treatment with a TFA:TIS:H2O (90:5:5, v/v/v) mixture for 90 min at room temperature. Crude peptides were then precipitated in cold ether, dissolved in a water/acetonitrile (1:1, v/v) and lyophilized. Products were purified by RP-HPLC applying a linear gradient of 0.1% TFA CH 3 CN in 0.1% TFA water from 1% to 30% or 5% to 70% depending on the hydrophobicity over 13 min using a semi-preparative 2.2x5 cm C18 column at a flow rate of 15 mL/min. Peptides purity and identity were confirmed by LC-MS. Purified peptides were lyophilized and stored at -20°C until use.

ESI-MS analysis of protein-peptides complexes
ESI-MS analysis were carried out on a hybrid quadrupole time-of-flight mass spectrometer (QSTAR Elite, AB-Sciex, ForsterCity, CA, USA), which is equipped with nano-ESI source [52]. The samples were infused by metal-coated borosilicate capillaries, with emitter tips of 1μm internal diameter (Proxeon, Odense, Denmark), and the instrumental setting was: declustering potential (DP) +60 V, ion spray voltage +1.1/+1.2 kV and curtain-gas pressure 20psi. Both the sample source and the instrument interface were kept at room temperature. The spectra were recorded under nondenaturing conditions (50 mM ammonium acetate pH 7). The protein complexes with peptides were prepared by mixing 20μM protein solution with an equimolar amount of ligand.

Surface Plasmon Resonance
The interactions between the protein and computationally optimized peptides were measured using the BIAcore 3000 (GE Healthcare Milano, Italy). MBP was immobilized at a concentration of 100 μg/mL in 10 mM acetate buffer pH 5 (flow rate 5 μL/min, time injection 7 min) on a CM5 Biacore sensor chip, using EDC/NHS chemistry following the manufacturer's instructions. Residual reactive groups were deactivated by treatment with 1 M ethanolamine hydrochloride, pH 8.5. The reference channel was activated with EDC/NHS and deactivated with ethanolamine. The binding assays were carried out at 20 μL /min at 25°C, with 4.5 min contact-time. Peptides were diluted in the HBS running buffer (10 mM Hepes, 150 mM NaCl, 3 mM EDTA, pH 7.4). Analyte injections were performed at the indicated concentrations. The sensor surface was regenerated by using 1-3 washes of 10mM NaOH for 1 minute. The association phase (kon) was followed for 250s, whereas the dissociation phase (koff) was followed for 300 s. The instrument BIAevaluation analysis package (version 4.1, GE Healthcare, Milano, Italy) was used to subtract the signal of the reference channel. Conversely, an affinity steady state model was applied to fit the RU max data versus peptides concentrations and fitting was performed with GraphPad Prism v4.00 using the one-site binding equation [53,54].

Fluorescence binding analysis
The data were acquired at 25.0°C, using an excitation wavelength of 298.0 nm and a fluorescence emission wavelength ranging from 300 to 400 nm. The acquisition parameters were set as follows: excitation and emission slits at 5 nm; 120 nm/min scan rate; 1.00 nm data interval averaging time at 0.500 s, PMT voltage at "high". The fluorescence values recorded at 333 nm were extracted, and transformed to -Δfluorescence which was obtained by subtracting them to the emission fluorescence intensity of the ligand-free protein, and, then, plotted against the peptide concentration [55]. MBP was used at the concentration of 14 μM and incubated in the presence of increasing concentrations of MBP-8 ranging from 0 to 350 μM. Experiments were carried out in duplicates. A control assay was carried out employing as titrant the buffer to assess that the dilution effect was under 3%, not affecting the results.