Computational Analysis and Prediction of the Binding Motif and Protein Interacting Partners of the Abl SH3 Domain

Protein-protein interactions, particularly weak and transient ones, are often mediated by peptide recognition domains, such as Src Homology 2 and 3 (SH2 and SH3) domains, which bind to specific sequence and structural motifs. It is important but challenging to determine the binding specificity of these domains accurately and to predict their physiological interacting partners. In this study, the interactions between 35 peptide ligands (15 binders and 20 non-binders) and the Abl SH3 domain were analyzed using molecular dynamics simulation and the Molecular Mechanics/Poisson-Boltzmann Solvent Area method. The calculated binding free energies correlated well with the rank order of the binding peptides and clearly distinguished binders from non-binders. Free energy component analysis revealed that the van der Waals interactions dictate the binding strength of peptides, whereas the binding specificity is determined by the electrostatic interaction and the polar contribution of desolvation. The binding motif of the Abl SH3 domain was then determined by a virtual mutagenesis method, which mutates the residue at each position of the template peptide relative to all other 19 amino acids and calculates the binding free energy difference between the template and the mutated peptides using the Molecular Mechanics/Poisson-Boltzmann Solvent Area method. A single position mutation free energy profile was thus established and used as a scoring matrix to search peptides recognized by the Abl SH3 domain in the human genome. Our approach successfully picked ten out of 13 experimentally determined binding partners of the Abl SH3 domain among the top 600 candidates from the 218,540 decapeptides with the PXXP motif in the SWISS-PROT database. We expect that this physical-principle based method can be applied to other protein domains as well.


Introduction
The interactions between protein domains and their peptide ligands play critical roles in signal transduction and many other key biological processes. Because domain-peptide interactions are usually weak and transient, and often depend upon post-translational modification, they tend to be underrepresented in high-throughput and computational studies [1], thus highlighting the need to develop new methods to identify these interactions. The Src Homology 3 (SH3) domain is the most abundant modular domain in the human proteome and presents in a wide variety of proteins, such as kinases, lipases, GTPases, and adaptor proteins, to orchestrate diverse cellular processes [2][3][4][5][6]. SH3 domains are 50-70 amino acids long and consist of five b-strands arranged into two sheets packed at right angles. They recognize the prolinerich peptides with the consensus motif PXXP (where P is proline and X is any amino acid) [7,8] that forms a lefthanded poly-proline type II (PPII) helix [9]. Depending on the position of the positive residue in the peptide sequence, the majority of SH3 ligands fall into two classes that bind to the protein in opposite orientations [10]: N-terminal to Cterminal (class I) or C-terminal to N-terminal (class II). Class I peptides typically consist of a core motif of RXLPX#P (where # is usually a hydrophobic residue), whereas the class II peptides contain a core motif of PX#PXR. In class I peptides, the proline residues in bold occupy the sites in the hydro-have been developed to tackle this problem. Peptide library screening is often used to determine the binding motif of a SH3 domain, in which the binding peptides are sequenced and aligned to generate a frequency matrix representing the amino acid preference at each position [11,12]. Bias may be introduced by not completely sampling all possible peptides, not quantitatively weighting the contribution of peptides to the matrix based on their binding strength and/or not distinguishing peptides bound to the SH3 domain in different binding modes. Computational methods such as Scansite [13], SPOT [14], and VIP [15] methods have been developed to predict interacting proteins of a domain. The performance of Scansite totally depends on the accuracy of the frequency matrix determined by the peptide library experiments. SPOT is limited by the relatively small number of residue contact pairs between SH3 domains and peptides. The performance of the VIP method can be improved if more conformational sampling is done and more rigorous binding energy prediction method, including the conformational energy change and the desolvation contribution upon peptide binding, is applied.
In this study, we analyzed the binding specificity of the SH3 domain of the human protein Abl. The Molecular Mechanics/ Poisson-Boltzmann Solvent Area (MM/PBSA) method [16] was first applied to calculate the binding free energies between the Abl SH3 domains and 35 ten-residue-long peptides (15 binders and 20 non-binders). As a validation of the MM/PBSA method on the domain-peptide system, the calculated binding free energies of the 15 known binders correlated well with the experimental values [17] and were distinct from those of the non-binders. Analysis of the molecular dynamics (MD) trajectories and binding free energy components shed light into understanding the mechanism of the binding specificity of the Abl SH3 domain. The residue preference at each position of the peptide ligand was then studied systematically by single position mutation and MM/PBSA calculations, which we call the virtual mutagenesis (VM) method [18,19]. A single position mutation free energy profile (SPMFEP) was established from such analysis to quantitatively represent the binding motif and was in good agreement with the experimental measurements. We used SPMFEP as a scoring matrix to search the SWISS-PROT database for potential binding partners of the Abl SH3 domain. Most experimentally determined binding proteins of the Abl SH3 domain were ranked in the top 600 candidates among about 6.2 3 10 7 decapeptides in the database and many promising candidates were also suggested.

Molecular Basis of the Binding Specificity of the Abl SH3 Domain
The calculated binding free energies correlate well with the experimental values. We first evaluated the performance of the MM/PBSA method on calculating the binding free energies of the Abl SH3 domain and its peptide ligands (15 binders and 20 non-binders). As shown in Table 1 and Figure  1, the calculated relative binding free energies of the 15 known binders show good correlation with the experimental values (the correlation coefficient r and standard deviation [SD] are 0.82 and 1.7, respectively). We then analyzed the free energy components to search for the dominant factor that dictates the binding specificity ( Table 1). As the favorable electrostatic interaction between the peptide and the SH3 domain DE ele is canceled by the unfavorable electrostatic contribution to desolvation DG PB , the van der Waals interaction DE vdw is the most favorable component of the binding free energy. The favorable DE vdw is mainly from the interactions between the conserved proline residues of the PXXP motif and the hydrophobic surface that is formed by Tyr7, Phe9, Trp36, Tyr52, and Pro49 of the Abl SH3 domain and conserved in almost all SH3 domains. To investigate which energetic factor determines the relative binding affinities of these 15 binders, we compared the correlations between the measured binding free energies and each of the four free energy components, DE ele , DE vdw , DG PB , and DG SA . None of these components shows good correlation with the experimental values and the largest correlation coefficient is only 0.43 for DE ele , which suggests that no individual free energy component dominates the binding specificity. We then analyzed the non-polar (DE vdw þ DG SA ) and the electrostatic contribution (DE ele þ DG PB ) to the binding free energy. The electrostatic contribution correlates well with the binding free energies (r ¼ 0.73, SD ¼ 1.6) while the non-polar contribution does not show any correlation (r ¼ À0.32, SD ¼ 2.0). It suggests that the binding preference of these peptides is mainly determined by the electrostatic contribution to binding.
We also found that the conformational change of the peptide upon binding DE conf_pep was important (r ¼ 0.54, SD ¼1.8). The sum of DE ele , DE vdw , DG PB , DG SA , and DE conf_pep correlates well with the binding free energies (r ¼ 0.79, SD ¼1.8) whereas the sum of the first four terms does not (r ¼ 0.52). Our calculations highlight the crucial effect of the change of the conformational energy of peptide to the affinities, which is not always appropriately considered [20,21]. Inclusion of the conformational entropy ÀTDS in the free energy calculation only slightly improves the correlation coefficient from 0.79-0.82, which suggests that the conforma-

Synopsis
One of the central questions of molecular biology is to understand how signals are transduced in the cell. Intracellular signal transduction is mainly achieved through cascades of protein-protein interactions, which are often mediated by peptide-binding modular domains, such as Src Homology 2 and 3 (SH2 and SH3). Each family of these domains binds to peptides with specific sequence and structural characteristics. To reconstruct the protein-protein interaction networks mediated by modular domains, one must identify the peptide motifs recognized by these domains and understand the mechanism of binding specificity. These questions are challenging because the domain-peptide interactions are usually weak and transient. Here, the authors took a physical-principles approach to address these difficult questions for the SH3 domain of human protein Abl, which binds to peptides containing the PXXP motif (where P is proline and X is any amino acid). They generated a position-specific scoring matrix to represent the binding motif of the Abl SH3 domain. Analysis on the binding free energy components suggested insights into how the binding specificity is achieved. Most known protein interacting partners of the Abl SH3 domain were correctly identified using the position-specific scoring matrix, and other potential interacting partners were also suggested.
tional entropy is not the determinant factor of the binding specificity of the binders.
The binding free energies and the free energy components for the 20 non-binders of the Abl SH3 domain were also calculated (Table S1). Two distributions of the binding free energies for binders and non-binders are distinct (Figure 2), which indicates that the MM/PBSA method can distinguish binders from non-binders. We also found that most nonbinders preferred unbound conformations. First, the average conformational entropy change upon binding ÀTDS for binders and non-binders are 32.1 kcal/mol and 35.9 kcal/ mol, respectively, indicating that most non-binders lost more entropy upon binding than did binders. Second, the change of conformational energy for binders and non-binders are also significantly different: the average value for binders and non-binders are 2.0 kcal/mol and 4.3 kcal/mol, respectively.
The binding motif can be revealed by the VM method. To understand the mechanism of the binding specificity, we need to determine the binding motif of the domain. We analyzed the amino acid preference at each position of the peptide using the VM method (see Materials and Methods). We compared our results with the available experimental measurements at positions P 3 , P 0 , P À3, and P À5 (Tables S3-S6) [17,22]. These four positions are particularly important for the peptide binding: the two usually conserved Pro residues at P 3 and P 0 ensure strong binding affinity and    residues at P À3 , and P À5 are essential to the binding specificity [9,10]. To determine the residues of the Abl SH3 domain that are important for peptide binding, the contribution of each SH3 domain residue to binding with the template peptide APSYSPPPPP (the two conserved Pro residues are in bold) was analyzed (the polar contribution to desolvation was calculated using the GB/SA method implemented in the mm_pbsa module of AMBER 8 for the sake of efficiency.) (Table S7) [19].
In the crystal structure of APSYSPPPPP complexed with the Abl SH3 domain, the residue at position P À5 in the peptide occupies a hydrophobic pocket of the SH3 domain. Proline at this position has relatively strong van der Waals interactions with Trp36 (À1.3 kcal/mol) and Trp47 (À0.9 kcal/ mol) in the Abl SH3 domain (Table S7). Several other residues, Asn, Leu, Met, Phe, Tyr, and Val, are also strongly favored at this position ( Figure 3A). It is worth pointing out that the mutation of Pro to Tyr or Phe, does not significantly impair the van der Waals interaction between the peptide and the SH3 domain due to the conformational change of the peptide backbone. Our predicted preference based on free energy calculations is consistent with the experimental results: Pro is the most preferred, whereas other residues especially hydrophobic ones (Phe, Leu, Met, Val, and Trp) are also favored [22].
It should be noted that the preference of residue at P À5 are closely related to the residues at the adjacent positions. For example, the known binder FGTYPPPLPP (A4 in Table 1) has a Gly at P À5 , which is not favored at this position based on the VM result. By analyzing the MD trajectory on A4-SH3 complex, we found that Phe at P À6 in A4 can occupy the binding pocket that is occupied by Pro at P À5 in the template peptide APSYSPPPPP to form favorable van der Waals interactions with Trp36 and Trp47. Moreover, the benzyl ring of Phe at P À6 is parallel to the aromatic ring of Trp47 to form strong p-p stacking interactions. Therefore, if there is a small residue (Ala, Gly, or Ser) at P À5 , an aromatic residue (Tyr, Phe, or Trp) may be preferred at P À6 . This suggests that the repertoire of SH3 domain-binding peptides may be much larger than previously thought.
Our analysis showed that Trp, Phe, Tyr, Met, and Pro are favored at P À3 ( Figure 3B), which is in good agreement with the study of Villanueva's et al. that the most favorable residues are Trp, Tyr, Phe, and Met (ordered based on the binding free energies) [22]. This observation is consistent with findings that an aromatic residue is favored at P À3 of mouse protein 3BP1, a known binder of the Abl SH3 domain [12,23,24]. The energy component analysis (Table S4) suggests that the strong preference of these four residues at this position is mainly due to the favorable non-polar contribution (DE vdw þ DG SA ) upon peptide binding: DE vdw þ DG SA for Trp, Tyr, Met, and Phe are À55.9, À55.4, À54.4, and À53.8 kcal/mol, respectively, which are stronger than the other 16 residues at this position. Interestingly, positive charged residues, Arg and Lys, are not favored at this position, which is in contrast to most SH3 domain-binding peptides. To investigate the reason, we have compared the electrostatic surfaces of four SH3 domains, Abl tyrosine kinase SH3 domain (PDB entry 1bbz) [25], c-Crk N-terminal SH3 domain (PDB entry 1cka) [26], Grb2 N-terminal SH3 domain (PDB entry 1gbq) [27], and rat amphiphysin-2 SH3 domain (PDB entry 1bb9) [28] ( Figure  4). Electrostatic potentials were calculated by solving the Poisson-Boltzmann equation using the Delphi program [29] in Insight II [30]. We find that the rat amphiphysin-2 SH3 domain has the largest areas of negative electrostatic potentials, which is mainly due to the acidic residues in Arg-Thr and extended n-Src loops of the domain. The large patch of negative electrostatic potential explains why the amphiphysin SH3 domain specifically recognizes the PXRPXR motif with two positively charged Arg residues. The c-Crk N-terminal and the Grb2 N-terminal SH3 domain have distinct but relatively small negative potential near the Arg-Thr and n-Src loop, which may explain why these two SH3 domains bind to peptides only possessing a single positively charged residue. Compared with the other three SH3 domains, the Abl SH3 domain does not possess remarkable and continuous distribution of the negative electrostatic potentials and therefore positively charged residues are not strongly preferred.
Proline is highly preferred at positions P 0 and P 3 . We find that Pro at P 0 has favorable interactions with Phe9 and Tyr52 (À1.0 kcal/mol and À1.0 kcal/mol) (Table S7), as well as favorable van der Waals interaction with Pro49 (À0.8 kcal/ mol). Experimental and theoretical studies have focused on understanding how SH3 domains recognize the core motif PXXP [31,32]. Proline is the most favorable residue at this position based on our free energy calculation ( Figure 3C). When only the non-polar contribution (DE vdw þ DG SA ) is considered, the peptide with Pro has the strongest interaction with SH3 (À54.8 kcal/mol) (Table S5), which agrees with the study of Wang et al. [31]. In addition to the non-polar contribution, we found the less unfavorable desolvation-free energy is also an important factor for the preference of proline. For example, the residues at P 0 in peptides 16, 19, and 20 (Table S5) all have relatively small side chains (Thr, Val, and Pro, respectively) and have similar protein-ligand electrostatic interactions DE ele . When the polar contributions of desolvation-free energies DG PB were considered, the peptide 20 has the least unfavorable contribution (DE ele þ DG PB ¼ þ 20.5 kcal/mol). We believe that upon peptide binding the desolvation cost for the nitrogen-substituted atom in Pro may be less than that of the non-substituted nitrogen atom in the other amino acids, because the nonsubstituted nitrogen atom can be easily polarized by the solvent [32].
Proline is also strongly selected at P 3 in almost all SH3 domain-binding peptides. From the calculated binding free energies ( Figure 3D), it is interesting to find that, although Pro is preferred at this position by the Abl SH3 domain, the preference is not very strong. Proline at this position can be mutated to several other residues including Trp, Phe, Met, His, Leu, Gln, and Tyr, and the mutated peptides still have relatively strong binding free energies. Our calculation seems to contrast with what has been suggested about the critical role of Pro at P 3 in SH3 ligands [10]. However, in the mutation experiments reported by Pisabarro et al. [17], Pro at P 3 in peptide APTYPPPLPP was mutated to His, Leu, and Tyr, and the binding affinities of the mutated peptides were only slightly decreased from À7.1 kcal/mol to À6.2, À6.2, and À6.2 kcal/mol, respectively. These three favorable residues reported by Pisabarro et al. were also relatively favored in our predictions.

Identifying Physiological Interacting Partners of the Abl SH3 Domain
Based on the comparison between the experimental and calculated results, we have shown that the VM method can determine the binding motif of the Abl SH3 domain. The difference between the binding free energies of the mutated peptide at each position and the template peptide AP-SYSPPPPP, called SPMFEP, can be used as a position specific scoring matrix to predict the binding affinities of peptides (Table 2). To evaluate the performance of SPMFEP, the 15 binders and 20 non-binders were first scored using SPMFEP. Two obvious distributions for binders and non-binders can be observed ( Figure 5), indicating that binders and nonbinders can be successfully distinguished by SPMFEP. The binding affinity of peptide A4 was under-estimated. From our analysis (see above), we know that Phe at P À6 in the peptide A4 is favorable in the hydrophobic binding pocket that is originally occupied by Pro at P À5 in the template peptide. Consequently, the residues at P À6 and P À5 may interact with each other. In SPMFEP, the inter-dependence between positions is not considered. In fact, all methods using a position specific scoring matrix, such as Scansite [13], have the same limitations. Overall, SPMFEP performs well on the selected 35 peptides.
We next scanned the SWISS-PROT using SPMFEP to predict interacting partners of the Abl SH3 domain. There are about 6.2 3 10 7 ten-residue-long peptides in the current SWISS-PROT database (May 2005), in which about 218,540 ten-residue-long peptides have the PXXP motif. Only about 2,600 peptides have scores smaller than two, which are in the top 0.005% (the top 600 peptide sequences in 353 unique human proteins are listed in Table S9). We first carefully examined the top ten candidates in the human proteome (Table 3), among which WASF1 and EVL are known interacting partners of the Abl SH3 domain [33,34]. WASF4 is a homology of WASF1 and is in the same protein family. SEM6A is a homology of the mouse protein SEM6D, a known Abl SH3 domain-binding protein, and is therefore likely to be a true binder [35]. In total, we have identified two known binders and two candidates highly supported by experimental evidence, among the top ten peptides, which is a surprisingly good result. As a comparison, the top ten human peptides in the Scansite search are [13] 3BP2, RX, RBMG, TACT, PRL3, SCA3, AT19, AD08, DYN2, and SEP4, among which only 3BP2, the homology of a known binder (mouse protein 3BP2), is likely to be a true binder [36] but no binding information is found for all other candidates in BIND [36] and MINT [37] databases to interact with Abl protein or the Abl SH3 domain. If only considering the top ten candidates, the SPMFEP method based on VM performs better than Scansite on identifying the interacting partners of the Abl SH3 domain.
In MINT [37] and BIND [36], 44 non-redundant proteins have been identified to directly interact with the protein Abl, and 13 of them, including five mouse proteins and eight human proteins, bind to the Abl SH3 domain. We compared the performance of SPMFEP, Scansite [13], and iSPOT [14], to identify these 13 proteins ( Table 4). The top 600 candidates found in human proteins and the top 2,000 candidates found in all proteins were saved for further analyses.
SPMFEP can successfully identify ten known binders: seven of the eight human proteins and three of the five mouse proteins (Table 4 and Table S8). 3BP1_mouse and 3BP2_mouse are not ranked highly, considering all proteins (1,393 and 1,895) but they are in the top 500 if only mouse proteins are considered. The human homologies of 3BP1_mouse and 3BP2_mouse are in the top 600 candidates (249 for 3BP1_human and 502 for 3BP2_human) (Table S9) and it is reasonable to believe that they are true binders of the Abl SH3 domain. Overall, we have successfully identified most of the known binders of the Abl SH3 domain.
Scansite can identify eight known binders of the Abl SH3 domain (Table 4). P73_human, not identified by SPMFEP, is ranked 321 in the Scansite result. Agami et al. [38] reported that a P73 mutant P338A could not form stable P73-Abl complexes. If P73 interacts with Abl by ten-residue-long peptide segment, this peptide segment should be AFKQSP-PAVP, which is the same as the peptide identified by Scansite. Based on our VM analysis and mutation experiments by Pisabarro and Serrano [17], Phe at P À5 and Lys at P À4 are not favored. Because Scansite considers 15-residue-long peptides rather than ten-residue-long peptides in SPMFEP, it is likely that the five additional residues may contribute favorably to binding. It is not surprising that the longer the peptide, the more specific, but less sensitive, are the predictions.
Using iSPOT, we can only correctly identify five binders (Table 4). In iSPOT, the scoring matrix was derived from position-specific contacts based on six SH3-peptide or SH3protein complex structures [14]. The accuracy of the matrix is limited by the relatively small number of residue-resident contacts found between SH3 domains and their binding peptides to fill the 27 3 10 3 10 3 10 position-specific contact matrix.
CABL2_human and SEM6D_mouse (Table 4) cannot be identified by all the three methods. Experiments [35,39] have shown that their interactions with Abl are mediated by the interaction between the proline-rich region and the Abl SH3 domain. Since the scoring matrix used by all the three methods does not consider dependence between positions, we suspect that synergistic interactions may exist between positions within or beyond the proline-rich regions of the two proteins.

Discussion
In this study, we have demonstrated that the MM/PBSA method can accurately calculate the binding free energies between the Abl SH3 domain and its peptide ligands.   Examination of each component of the binding free energy shows that, besides the non-bonded interactions and desolvation effect, the change of the conformational energies of the peptides upon binding is also crucial to determine the binding specificity of the domain. These results are encouraging to apply MD simulation and free energy calculation to understand the molecular mechanism of other domainpeptide and protein-protein interactions.
We have also shown that the VM method can precisely determine the sequence motif recognized by the Abl SH3 domain. The experimental scheme of the VM method is totally different from those of the current peptide library experiments in the following ways: 1) produce all possible peptides that are one amino acid different from the template peptide; (2) measure the ''binding'' affinities of all these peptides; and, (3) generate a scoring matrix based on the binding affinity differences between the mutated and the template peptides to determine the binding motif.
There are advantages of this scheme. First, the preference of an amino acid is quantitatively measured, based on the binding affinity of the peptide, which at least partially overcomes the sampling difficulty in the current peptide library experiment. Ideally, to determine the binding motif of a domain, one should examine the binding between the domain and all possible peptides of a given length, and align all binding peptides to calculate the frequency of each amino acid occurring at each position. In reality, there are usually only 10 7 -10 10 peptides in the library, due to the limit of time and cost. If the length of the binding peptide is ten, there are 10 10 -10 13 possible peptides and the coverage of the peptide sequence space by the peptide library is about 10 À6 -10 À3 . If 15-residue-long peptides are considered as in Scansite, the coverage drops dramatically to 10 À12 -10 À9 . By measuring the relative preference of every amino acid based on the peptidebinding affinities, one overcomes this insufficient sampling issue and mimics the ideal procedure given the positionindependence assumption. Second, the VM method can evaluate the penalties of unfavorable amino acids even the peptide do not really bind to the protein, which is very hard if not impossible in the experimental approaches. Third, since the scoring matrix is obtained by taking the difference between the template and the mutated peptides that are only one amino acid different from the template peptide, some  errors due to insufficient sampling of conformational space and/or inaccurate free energy calculation can be cancelled. There are two major hurdles of applying the VM method in a high-throughput manner. First, the MD simulation and free energy calculations are time-consuming. Second, a domainpeptide complex structure is required. Given the fast pace of advancement of computer power and structural genomics/ homology modeling, we believe that the VM method will become more and more useful.

Materials and Methods
MD simulations. MD simulations were performed on the 15 binders and 20 non-binders of the Abl SH3 domain using the AMBER 8 simulation package [19] and AMBER03 force field [40]. The amino acid sequences and the experimentally determined binding affinities of the 15 binders are shown in Table 1 [17]. Ten peptides, B1 to B10, were randomly selected from the human proteome and are considered as non-binders (Table S1). Ten peptides, C1 to C10, do not bind to the Abl SH3 domain but are Class I binders of other SH3 domains [12]: C1 and C2 bind to the Src SH3 domains, C3 and C4 bind to the Yes SH3 domain, and C5 to C10 bind to the Grb2 Nterminal SH3 domain (Table S1). We chose the crystal structure of the peptide APSYSPPPPP complexed with the Abl SH3 domain (Class I binder and the PDB entry is 1bbz) [25] as the template and mutated it to other peptides using the scap program [41]. The complex was solvated in a rectangular box of about 3,000 TIP3P water molecules so that the boundary of the box is at least 9 Å away from any solute atom. Counter-ions of Na þ were placed based on the Columbic potential to keep the whole system neutral. Particle Mesh Ewald (PME) was employed to consider the long-range electrostatic interactions [42]. Following 2,000 steps of minimization, a 1.2 ns (30 ps temperature increase from 10 8K to 300 8K and 1.17 ns equilibration and data collection) MD simulation with a 2.0 fs time step was performed on each complex. The SHAKE procedure was employed to constrain hydrogen atoms [43] during MD and all heavy atoms of SH3 were restrained using a 5 kcalÁmol À1 ÁÅ À2 harmonic force (see discussion in Supporting Information).
To determine the conformational energy of unbound peptide in solvent, 2.0 ns MD simulation was conducted on each peptide. Each peptide was solvated in a water box of about 1,600 water molecules, which extended 10 Å away from any peptide atom. 1,000 steps of minimization were followed by a 2.0 ns MD simulation for equilibration and data collection using the same set-up as described above.
Free energy calculations using the MM/PBSA method. The binding free energy is calculated as: where DE MM is the molecular mechanics interaction energy between the SH3 domain and the peptide, DG PB and DG SA are the electrostatic and non-polar contributions to desolvation upon peptide binding, respectively, and ÀTDS is the conformational entropy change. To consider the conformational flexibility of the peptide, we ran two separate MD simulations on the complex and the free peptide to calculate the binding free energy [16]. DE MM was calculated using the sander program in AMBER 8 [19]. DG PB was calculated using the pbsa program in AMBER 8. The grid size used to solve the Poisson-Boltzmann equation was 0.5 Å , and the values of interior dielectric constant and exterior dielectric constant were set to 1 and 80, respectively (the influence of the interior dielectric constant value to the free energy calculation is discussed in Supporting Information). DG SA was estimated from the surface area [16,44]. The peptide-SH3 interaction energies were calculated from 150 snapshots taken from 300 ps to 1.2 ns MD simulation trajectories of the complex. 160 snapshots taken from 400 ps to 2.0 ns MD simulations on the unbound peptides were used to calculate the conformational energy change for the peptides.
The normal mode analysis was performed to estimate the vibrational component of the entropy using the nmode program in AMBER 8 [19]. In the absence of solvent, the structures (complex, SH3, and peptide) were minimized with no cutoff for non-bonded interactions, by using conjugate gradient and then Newton-Raphson minimizations until the root mean square of the elements of gradient vector was less than 5 3 10 À5 kcal/mol. Then, normal mode calculations were carried out with no cutoff for non-bonded interactions. A distance-dependent dielectric constant (e ¼ 4R ij ) was used to mimic solvent screening. Frequencies of the vibrational modes were computed at 300K for these minimized structures and using a harmonic approximation of the energies. Due to the high computational demand, only 25 snapshots taken from MD were used to estimate ÀTDS.
The VM method. To investigate the preference of residues at each position, systematic single point mutation was performed on the peptide. The peptide APSYSPPPPP in the crystal structure 1bbz was used as the template. Each residue of the peptide was mutated to the other 19 residues using the scap program [41]. Minimization, MD simulations, and MM/PBSA calculations were performed on all 190 mutated complexes, as well as on the free peptides using the same setup described above. Assuming mutating a single residue of the peptide did not significantly change the peptide conformation, we did not include the conformational entropy in the comparison of 20 residues at each position.
Single point mutation free energy profile and database scan. The SPMFEP is a 10 3 20 matrix, which represents the difference between the binding free energies of the mutated peptides and the template peptide APSYSPPPPP (Table 2). SPMFEP can be used as a position specific scoring matrix. The score of each peptide is calculated as: P 10 i¼1 M S;i dðS; S i Þ, where M S,i is the score of the amino acid S at i th position in the SPMFEP and S i is the amino acid at the i th position of the peptide. All ten-residue-long peptides in the SWISS-PROT database (release 46.4) were scored using the SPMFEP. The Perl script used for the database scan is available upon request.  The structure of SH3 shown here was extracted from the snapshot at 0.1 ns. The residues at P 3 , P À3 , P À4 , P À5, and P À6 of the peptide are colored in red and other residues are colored according to residue type defined in Insight II. Found at DOI: 10.1371/journal.pcbi.0020001.sg002 (2.9 MB TIF).