A disordered encounter complex is central to the yeast Abp1p SH3 domain binding pathway

Protein-protein interactions are involved in a wide range of cellular processes. These interactions often involve intrinsically disordered proteins (IDPs) and protein binding domains. However, the details of IDP binding pathways are hard to characterize using experimental approaches, which can rarely capture intermediate states present at low populations. SH3 domains are common protein interaction domains that typically bind proline-rich disordered segments and are involved in cell signaling, regulation, and assembly. We hypothesized, given the flexibility of SH3 binding peptides, that their binding pathways include multiple steps important for function. Molecular dynamics simulations were used to characterize the steps of binding between the yeast Abp1p SH3 domain (AbpSH3) and a proline-rich IDP, ArkA. Before binding, the N-terminal segment 1 of ArkA is pre-structured and adopts a polyproline II helix, while segment 2 of ArkA (C-terminal) adopts a 3₁₀ helix, but is far less structured than segment 1. As segment 2 interacts with AbpSH3, it becomes more structured, but retains flexibility even in the fully engaged state. Binding simulations reveal that ArkA enters a flexible encounter complex before forming the fully engaged bound complex. In the encounter complex, transient nonspecific hydrophobic and long-range electrostatic contacts form between ArkA and the binding surface of SH3. The encounter complex ensemble includes conformations with segment 1 in both the forward and reverse orientation, suggesting that segment 2 may play a role in stabilizing the correct binding orientation. While the encounter complex forms quickly, the slow step of binding is the transition from the disordered encounter ensemble to the fully engaged state. In this transition, ArkA makes specific contacts with AbpSH3 and buries more hydrophobic surface.
Simulating the binding between AbpSH3 and ArkA provides insight into the role of encounter complex intermediates and nonnative hydrophobic interactions for other SH3 domains and IDPs in general.


Introduction
Protein-protein interactions are involved in most cellular processes, especially cellular signaling. These interactions often involve binding of small protein domains to intrinsically disordered proteins (IDPs), but unlike long-lived complexes typically involving larger and stronger interfaces, the binding pathways for these interactions are not always well understood [1-3].
Regions of disorder are now known to be present in between 25% and 41% of eukaryotic proteins, and can exhibit functional diversity by having multiple interaction partners [4]. IDPs also tend to bind with lower affinity to their partners than folded proteins, with fast association and dissociation [3,5-7]. This fast binding and unbinding, along with the fast turnover rates of IDPs within cells, allows for regulation of processes that require rapid responses [8]. Despite fast on and off rates, IDP binding interactions must still be very specific in order to relay signals accurately, which may require more complex binding landscapes [3,5,6,9,10]. To fully understand how IDPs bind their partners, how their binding is modulated by different cellular contexts, and how changes to the binding process can be used to regulate function, it is essential to go beyond analyzing the final bound state and instead characterize the complete binding pathway and associated kinetics.
IDPs often bind to folded proteins through a pathway that takes place in at least two steps [11-15]. Binding typically begins with the creation of an encounter complex ensemble when the IDP "dances" on top of its partner domain before transitioning to a more structured fully engaged bound state through a process of induced-fit folding [13,15-19]. IDPs are well suited to quickly form this initial encounter complex because they generally adopt a more extended conformational ensemble and therefore have a larger capture radius than folded proteins of the same length [20].
Electrostatic interactions have been shown to often drive the formation of encounter complex ensembles and can even accelerate association beyond the diffusion-limited rate, predominantly by electrostatic orientational steering [5,11,13,21-23]. Additionally, if one segment of the IDP possesses more intrinsic pre-folded structure, binding may proceed starting with this segment and then extending to the rest of the sequence which folds upon interaction with the partner protein [11,24]. Thus, pre-formed secondary structure can improve affinity and influence the binding pathway, including the nature of intermediate states [25,26], but too much structure can slow binding without improving affinity [11,27]. Additionally, the ability of IDPs to form nonnative contacts during binding, and the presence of significant disorder even after binding can also be important for IDP function [20,27,28]. This underscores the importance of understanding the interactions at play during the binding process as well as in the fully engaged final complex.
In addition to the nature of the intermediate states in the binding pathway, the location of the transition state in the pathway dictates the binding kinetics and is therefore critical to function.
The transition state for binding can either precede or follow the encounter complex intermediate [5]. Fast-binding proteins are canonically thought to bind in a diffusion-limited manner, and therefore experience a rate determining transition state that precedes the encounter complex [5].
In other cases, when electrostatic attraction enhances binding, binding can proceed completely downhill, without a free energy barrier [29,30]. However, weaker IDP complexes with short lifetimes may exhibit binding kinetics that are different in nature from higher affinity complexes [31], and a few IDPs have been shown to associate quickly to form an encounter complex followed by a slower transition to the fully engaged complex [14,17,19,32-37]. Because of their fast binding and dissociation rates and short-lived intermediate states, IDP binding often appears as two-state in NMR [38-40] and stopped-flow experiments [41-43]. Due to experimental challenges, for many IDPs the specific binding pathway, including the nature of binding intermediates and the timescale of their formation, is still unknown.
Computer simulations have been a valuable tool for examining the binding pathway at temporal and spatial resolutions that cannot be obtained through experiments. Initially, coarse-grained molecular dynamics (MD) simulations based on the topology of the fully engaged complex indicated that the initial step in the binding process for IDPs might often be the formation of a flexible encounter complex [13,37,44-50]. Another strategy to simulate IDP binding and characterize the encounter complex with limited computational power has been to conduct atomistic MD using an advanced sampling algorithm [51], such as multicanonical MD [52]. More recently, advancements in both hardware and the accuracy of force fields have enabled unbiased atomistic MD simulations of IDP binding on the microsecond timescale [53]. Those unbiased MD simulations that have explicitly examined the IDP binding pathway generally reveal a fast initial association between the IDP and its partner, followed by a slower evolution into the fully engaged complex [23,54-58]. However, the details of the binding pathway, including the nature of intermediates in the binding process, have yet to be determined for most IDPs and their binding partners.
One common IDP binding domain is the SH3 domain. It is conserved through more than one billion years of evolution from yeast to humans, and frequently occurs in protein-protein interaction modules, often involving cellular signaling, assembly, or regulation [59]. SH3 domains bind disordered proline-rich target peptides that usually contain a PxxP motif, where x can be any residue [60,61]. This PxxP motif forms a polyproline type II (PPII) helix in the bound complex and is flanked by specificity elements, which often include positively charged lysine or arginine residues [59,62]. The PxxP motif, which is pseudo-palindromic, has been observed to bind the SH3 domain surface in two different orientations (class I and class II) depending on the location of positively charged residues either N or C terminal to the PxxP motif (+xxPxxP or PxxPx+) [63,64]. Despite very similar binding motifs and bound structures, SH3 domains perform a wide variety of different functions in different contexts that require specific binding interactions and biophysical properties [65][66][67]. Understanding the SH3 domain-peptide binding process may help to reveal the mechanism for the functional diversity of SH3 domains and serve as a model for understanding binding properties of extended IDPs.
Previous studies of proline-rich peptides binding to SH3 domains indicated that fully engaged bound complexes often exhibit conformational exchange between different bound states [39,40,68]. However, information about the binding pathway is more limited, as NMR experiments on SH3 domains often indicate two-state binding, possibly due to fast exchange of an encounter complex with either the fully engaged bound state or the unbound state [23,38-40]. One study of SH3-peptide binding found that the transition state for binding is stabilized by long-range electrostatic interactions; however, there is less electrostatic enhancement to the binding rate than for folded proteins, which form more short-range electrostatic interactions in the transition state [31]. Simulation studies of the C-CRK N-terminal SH3 domain binding to a proline-rich peptide have also indicated that electrostatic interactions are important for the formation of the highly dynamic encounter complex, which transitions to the fully engaged complex when the PPII helix locks into the hydrophobic grooves of the binding site [21,23,56]. However, these results are somewhat in contrast to the picture that hydrophobic interactions are most important for stabilizing the encounter complex for IDPs [17], and the authors of these studies did not try to quantitatively show that an encounter complex is an intermediate to binding or assess the different types of intermolecular interactions in the encounter complex across many independent binding simulations [23,56,69]. Therefore, it is still not clear whether SH3 domains form a metastable electrostatic encounter complex, or whether the transition state for SH3 binding occurs before or after the formation of such an encounter complex.
We have used all-atom molecular dynamics (MD) simulations to characterize the initial binding interaction between an SH3 domain of yeast Actin Binding Protein 1 (Abp1p) and ArkA, a disordered region of the yeast actin patch kinase, Ark1p. Abp1p is involved in assembly of the actin cytoskeleton through localization of cortical actin patches, actin organization, and endocytosis [70,71]. While several other sequences are known to bind the Abp1p SH3 domain (AbpSH3), ArkA is the partner with the highest affinity for the domain [72]. The structures of AbpSH3 alone and bound to ArkA (Fig 1A) have been solved by x-ray crystallography and NMR, respectively [70,72]. We focus on the binding of a 12-residue truncation of ArkA (residues Lys(3) to Lys(-8)) that binds AbpSH3 with a Kd of 1.7 µM and is comprised of an N-terminal segment containing the PxxP motif and an adjacent C-terminal segment containing key specificity elements (we use a standard numbering system for peptide positions based on [63], as shown in Fig 1B) [72].
Fig 1. A) NMR structure of ArkA bound to AbpSH3 [72], showing the two binding surfaces (SI (red) and SII (blue)) with bound ArkA in gray (seg1), magenta (Lys(-3)) and green (seg2). The NMR structure was determined with a longer 17-residue ArkA sequence (residues 6 through -10) [73], but only the shorter ArkA sequence is displayed. The N- and C-termini are labeled. B) Sequence of ArkA used in all simulations with seg1 shown in black, the central lysine in magenta, and seg2 in green. The capping groups on the C- and N-terminal ends are also shown.
AbpSH3 has the typical SH3 fold with a five-stranded β-sandwich and long RT-loop, which is involved in ArkA binding [72]. The 12-residue ArkA peptide contains three Lys residues (Fig 1B), giving it a net positive charge, while the AbpSH3 domain has a net negative charge of -12. Thus, electrostatic attraction contributes to the affinity between the peptide and domain. ArkA can be described as two segments (seg1 and seg2) where seg1 is the N-terminal proline-rich end and seg2 is the C-terminal segment (Fig 1B). The proline-rich seg1 interacts with AbpSH3 in the typical class II orientation, with a PxxPx+ sequence that forms a PPII helix with each Px well packed into a groove [72]. The region of AbpSH3 which binds to the PxxP motif is referred to as surface I (SI) (Fig 1A). The C-terminal seg2 forms a 3₁₀ helix in the NMR structure and makes contacts on a region of AbpSH3 distinct from SI, referred to as surface II (SII) (Fig 1A). The conserved Lys(-3) serves as the dividing residue between seg1 and seg2, and binds between SI and SII in a negatively charged 'specificity pocket', packing against a Trp side chain [72]. Previous NMR experiments have shown that seg1, containing the PxxP motif, can bind to AbpSH3 without seg2, but it does not fully engage the binding surface [40]. Seg2 alone, on the other hand, shows no detectable binding by NMR titration [40]. The role of each segment in the full binding pathway has not previously been investigated.
Using MD simulations, we found that ArkA initially forms a heterogeneous encounter ensemble, followed by the tight binding of seg1 and seg2 in the correct orientation with the formation of specific contacts (Fig 2). Significantly, ArkA forms many nonnative contacts in this encounter ensemble, but they are restricted to the canonical highly acidic binding surface of the AbpSH3 domain. Seg1 is largely pre-structured in a PPII helix and only needs to lock into the grooves of SI to bind, while seg2 is more conformationally flexible. The PPII helix in seg1 can bind in a reverse orientation in the encounter complex, and seg2 may be important for stabilizing the correct orientation of ArkA on the binding surface. Nonspecific hydrophobic and long-range electrostatic interactions stabilize the encounter complex, while specific hydrophobic interactions form only on transition from the encounter complex to the fully engaged state. Our simulations show that step 1 of binding is more than an order of magnitude faster than the overall association rate determined by NMR relaxation dispersion experiments. Our binding model also explains the greater influence of hydrophobic interactions on binding compared to long-range electrostatic interactions, which likely only affect step 1 of the binding pathway. Overall, we have gained an understanding of the different interactions of the two ArkA peptide segments with AbpSH3 along the binding pathway. This provides insights for how binding of this common interaction domain in other proteins may be tailored to meet their specific functional needs.

MD Simulations
MD simulations were run on four constructs: ArkA bound to AbpSH3 (bound simulations), ArkA alone (unbound simulations), ArkA binding to AbpSH3 (ArkA binding simulations), and ArkA seg1 binding to AbpSH3 (seg1 binding simulations). The bound simulations were all started from the lowest energy NMR structure of ArkA bound to AbpSH3 (PDB: 2RPN) [72]. Before running the simulations, the ArkA sequence was truncated to the 12-residue (Fig 1B) construct and a capping acetyl group was added to the ArkA N-terminus. Two different starting structures were used to initiate the unbound simulations. One starting structure was from the NMR structure of ArkA bound to AbpSH3 and the other was a fully extended peptide. For the ArkA and seg1 binding simulations, the peptide construct was placed at least 10 Å from AbpSH3 to ensure that the peptide and domain were not interacting at the beginning of the binding simulations (simulations were run with a non-bonded cutoff distance of 9 Å for the direct space sum). For the binding simulations, the starting structure of both ArkA and seg1 came from the ArkA unbound simulations, and the AbpSH3 structure came from the unbound crystal structure (PDB: 1JO8) [70].
The effective concentration of the protein in our simulations was around 4 mM (S2 Table), which is close to the experimental concentration of 1 mM. In all constructs, ArkA or seg1 was edited to have a capping acetyl group on the N-terminus (ACE) and a capping amide group on the C-terminus (NHE), except in the bound simulations, which only have the acetyl group.
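The effective concentration follows from the simulation box volume, since each box contains a single AbpSH3 domain. A minimal Python sketch of that conversion is given here; the 75 Å cubic box is a hypothetical value chosen for illustration and is not a dimension taken from S2 Table or S3 Table.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def effective_concentration_mM(box_volume_A3, n_molecules=1):
    """Concentration (mM) of n_molecules in a periodic box of the given volume (cubic angstroms)."""
    liters = box_volume_A3 * 1e-27  # 1 cubic angstrom = 1e-30 m^3 = 1e-27 L
    molar = n_molecules / (AVOGADRO * liters)
    return molar * 1e3


# Hypothetical cubic box edge in angstroms (illustrative only, not from the paper)
edge = 75.0
print(round(effective_concentration_mM(edge ** 3), 2), "mM")
```

A box of roughly this size gives a concentration of a few millimolar, which is the order of magnitude quoted above.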
All simulations were run on Amber 16 using the Amber ff14SB forcefield [74], and the binding simulations were run with dihedral angle modifications that improve accuracy for the energy barrier between cis and trans states of the peptide bond [75]. The CUDA version of pmemd in Amber 16 was used to run the simulations on GPUs [76]. All simulations were solvated with TIP3P-FB water [77]. The bound simulations were solvated such that the edge of the box was at least 9 Å from any peptide or protein atom. Binding simulations were solvated with the edge at least 12 Å from any peptide or protein atom and adjusted to have an equal volume. The unbound simulations were solvated with water 15 Å from the edge of the peptide. The box dimensions are summarized in S3 Table. Salt ions were added to neutralize each system: 10 sodium ions for the seg1 binding and bound simulations, 9 sodium ions for the ArkA binding simulations and 3 chloride ions for the unbound simulations.
All systems were subject to two rounds of energy minimization of 1000 steps, where the first 500 steps were steepest descent and the second 500 steps conjugate gradient. The systems were then subject to heating from 100 to 300 K (40 ps with harmonic restraints with a force constant of 10 kcal/mol), and equilibration (50 ps with harmonic restraints with a force constant of 10 kcal/mol). All constructs, except the unbound simulations, were equilibrated again for 200 ps without restraints. Independent simulations were started with new random velocities. Bonds to hydrogen were constrained using the SHAKE algorithm during all simulations. The particle-mesh Ewald procedure was used to handle long-range electrostatic interactions with a non-bonded cutoff of 9 Å for the direct space sum.
The unbound simulations were run using temperature Replica Exchange MD [78]. Forty-eight replicas were simulated using the NVT ensemble with temperatures from 290.00 to 425.00 K with geometric spacing to achieve similar exchange probabilities for all replicas (S1 Table) [78]. Each replica was equilibrated without restraints for 500 ps. The simulations were run with an integration step of 2 fs and coordinates stored every 25 ps. Three independent simulations from each of the two starting structures (an extended peptide and the conformation in the NMR structure) were run for at least 125 ns each, resulting in a total of 1.15 µs of simulation at 300 K. The first 50 ns of each simulation was removed before analysis, resulting in 0.850 µs of simulation data used in the ArkA unbound ensemble.
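Geometric spacing means that neighboring replica temperatures differ by a constant ratio. The following Python sketch generates such a ladder for 48 replicas between 290 and 425 K; it is an illustration of the spacing rule only, and the temperatures actually used are those listed in S1 Table, which may differ from this idealized ladder.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures from t_min to t_max with a constant ratio between neighbors."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


temps = geometric_ladder(290.0, 425.0, 48)
for i, t in enumerate(temps[:4]):
    print(i, round(t, 2))
```

A constant temperature ratio is a common starting point because, for a system with roughly constant heat capacity, it gives similar exchange acceptance between all neighboring pairs.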
The bound and binding simulations were run using the NPT ensemble at 300 K with a Monte Carlo barostat, new system volumes attempted every 100 steps, an integration step of 2 fs, and coordinates stored every 10 ps. The number and length of all simulations are summarized below (Table 1). The replica exchange simulations were run on the XSEDE resource Xstream [79], as well as a local cluster, and all other simulations were run on a local cluster.

Simulation Analysis
To analyze the trajectories, the AmberTools 16 package was used to measure dihedral angles, distances, secondary structure, solvent accessible surface area, hydrogen bonds, and salt bridges [76]. In-house Python scripts were used for additional analysis. [...] ArkA seg1 lacks three of these pairs, so the remaining four were used; in both cases this is called the binding surface distance. The dihedral angles were used to calculate the polyproline II helix (PPII) content as described by Masiaux et al. [82]. Residue distances were calculated based on the center of mass for each residue in ArkA and AbpSH3, and 8 Å was used as the cutoff distance to define a contact. Contact maps were created based on the percentage of the simulation during which residue contacts were made. Contact maps that describe a subset of the simulated ensemble (encounter, forward, reverse, seg2 only, encounter other, or unbound) were created based on the percentage of that subset that is making a contact.
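The contact map calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the in-house scripts themselves: the function name and input layout are hypothetical, a strict less-than comparison at the 8 Å cutoff is assumed, and periodic images are ignored.

```python
import math

CONTACT_CUTOFF = 8.0  # angstroms, cutoff between residue centers of mass


def contact_map(frames_a, frames_b, cutoff=CONTACT_CUTOFF):
    """Percent of frames in which each residue pair is in contact.

    frames_a[f][i] is the (x, y, z) center of mass of residue i of molecule A
    in frame f; frames_b is laid out the same way for molecule B.
    """
    n_frames = len(frames_a)
    n_a, n_b = len(frames_a[0]), len(frames_b[0])
    counts = [[0] * n_b for _ in range(n_a)]
    for coords_a, coords_b in zip(frames_a, frames_b):
        for i, ra in enumerate(coords_a):
            for j, rb in enumerate(coords_b):
                if math.dist(ra, rb) < cutoff:
                    counts[i][j] += 1
    return [[100.0 * c / n_frames for c in row] for row in counts]


# Toy data: one peptide residue, two domain residues, two frames
peptide = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
domain = [[(5.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
          [(9.0, 0.0, 0.0), (20.0, 0.0, 0.0)]]
print(contact_map(peptide, domain))
```

A contact map for a subset of the ensemble (for example, only encounter complex frames) is obtained by passing only the frames that belong to that subset.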
The data from the binding simulations were divided into unbound, encounter complex, and fully engaged states based on the binding surface distance (Table 2). The fully engaged complex was defined by a binding surface distance less than 11.5 Å because the bound simulations had a binding surface distance less than 11.5 Å in 98% of the simulated ensemble. There is a clear free energy barrier between the fully engaged state and the encounter complex based on a population histogram from our binding simulations (S17 Fig). However, in our simulations, there appears to be no free energy barrier between the unbound state and the encounter complex, indicating that formation of the initial encounter complex is downhill in free energy. We still wanted to define the encounter complex separately from the unbound state in order to characterize this intermediate state to binding, so we defined the encounter complex as having a binding surface distance between 11.5 and 23 Å. This definition of the encounter complex captures the states that are most populated along the binding surface distance reaction coordinate (S17 Fig), and also excludes states where ArkA has no contacts with the SH3 domain (S18 Fig), which exist at binding surface distances greater than 23 Å, defined as unbound. Within the encounter complex ensemble, four categories were defined based on contacts between ArkA and AbpSH3 (Table 2).
Hydrophobic contacts were selected from those hydrocarbon groups that are closest together in the NMR structural ensemble (2RPN) [72]. Contacts were defined based on a 6 Å cutoff distance between hydrocarbon groups. Hydrogen bonds were counted when the distance between the acceptor atom and donor heavy atom was less than 3 Å and the angle between the acceptor atom, donor hydrogen, and donor heavy atom was greater than 135°. Similarly, salt bridges were counted when the distance between the heavy atoms of the charged groups was less than 3 Å and the angle between the oxygen atom, hydrogen, and nitrogen was greater than 135°.
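The geometric criterion for hydrogen bonds can be written as a short test on three atomic positions. The sketch below is an illustration of the distance and angle cutoffs stated above, with hypothetical function names; the analysis itself was done with AmberTools, whose internal implementation is not reproduced here.

```python
import math


def angle_deg(a, b, c):
    """Angle in degrees at vertex b formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def is_hydrogen_bond(acceptor, hydrogen, donor, max_dist=3.0, min_angle=135.0):
    """True if acceptor-donor distance < max_dist (angstroms) and the
    acceptor-hydrogen-donor angle > min_angle (degrees)."""
    close = math.dist(acceptor, donor) < max_dist
    aligned = angle_deg(acceptor, hydrogen, donor) > min_angle
    return close and aligned


# Toy geometry: donor at the origin, hydrogen 1 angstrom away, acceptor in line
print(is_hydrogen_bond((2.8, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

The same function applies to the salt bridge criterion if the oxygen, hydrogen, and nitrogen positions are passed as acceptor, hydrogen, and donor.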
SH3 domain dipole moment. The net dipole of the AbpSH3 domain was calculated using the Protein Dipole Moments Server administered by the Weizmann Institute Department of Structural Biology [83] by uploading the crystal structure of the domain (PDB: 1JO8) [70].
Calculation of kon and k1. The time constant, τ, for binding of ArkA and seg1 to AbpSH3 is related to the rate constant, kon or k1, and the concentration of protein and peptide by the equation
$$\tau = \frac{1}{k\,[\mathrm{AbpSH3}]}$$
where k is either kon or k1. kon is the rate constant for complete binding to the fully engaged complex, while k1 is the rate constant for formation of the encounter complex. The volume of the boxes varied slightly between the two ArkA constructs, so the concentrations and rates were slightly different (S2 Table).
If the AbpSH3 concentration is held constant, then the transition from unbound to either the encounter complex or the fully engaged complex can be treated as a first-order reaction, so the binding time follows a Poisson distribution [84,85]. The binding time constant, τ, was calculated from a fit of the empirical cumulative distribution function to the theoretical cumulative distribution function (TCDF) [85],
$$\mathrm{TCDF} = 1 - \exp\left(-\frac{t}{\tau}\right)$$
where t is the time of the simulation when binding occurs. Since we observed some overlap in the distribution of binding surface distances for fully engaged and encounter complexes, we used a more stringent definition of binding for identifying transitions between states. To go from the encounter complex to the fully engaged complex we required that the binding surface distance be below 10.5 Å for at least 1 ns, and to go from unbound to encounter we required it to be below 21 Å for 1 ns. τ was then used to calculate the binding rate constant,
$$k = \frac{1}{\tau\,[\mathrm{AbpSH3}]}$$
The standard deviation in the calculation of k1 was determined using the bootstrap method, but we could not determine a standard deviation for kon because not all simulations reached the fully engaged state.
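The rate constant calculation can be sketched as follows. This Python example assumes that the TCDF has the exponential form 1 - exp(-t/τ) and that k = 1/(τc) for a protein concentration c; the simple grid search, the function names, and the binding times are all hypothetical choices made for illustration, and simulations that never bind are not handled.

```python
import math
import random
import statistics


def fit_tau(times, n_grid=400):
    """Least-squares fit of tau in TCDF(t) = 1 - exp(-t / tau) to the ECDF of times."""
    ts = sorted(times)
    n = len(ts)
    ecdf = [(i + 1) / n for i in range(n)]
    lo, hi = ts[0] / 10.0, ts[-1] * 10.0  # log-spaced search range (arbitrary choice)
    best_tau, best_err = lo, float("inf")
    for g in range(n_grid):
        tau = lo * (hi / lo) ** (g / (n_grid - 1))
        err = sum((p - (1.0 - math.exp(-t / tau))) ** 2 for t, p in zip(ts, ecdf))
        if err < best_err:
            best_tau, best_err = tau, err
    return best_tau


def rate_constant(tau_ns, conc_mM):
    """Second-order rate constant in M^-1 s^-1 from tau (ns) and concentration (mM)."""
    return 1.0 / ((tau_ns * 1e-9) * (conc_mM * 1e-3))


def bootstrap_std(times, n_boot=100, seed=0):
    """Standard deviation of tau over bootstrap resamples of the binding times."""
    rng = random.Random(seed)
    taus = [fit_tau(rng.choices(times, k=len(times))) for _ in range(n_boot)]
    return statistics.stdev(taus)


# Hypothetical binding times in ns (not data from the paper)
binding_times = [12.0, 35.0, 48.0, 80.0, 110.0, 150.0, 210.0, 300.0]
tau = fit_tau(binding_times)
print("tau (ns):", round(tau, 1))
print("k (M^-1 s^-1):", f"{rate_constant(tau, 4.0):.2e}")
print("bootstrap std of tau (ns):", round(bootstrap_std(binding_times), 1))
```

Resampling the binding times with replacement and refitting gives a spread in τ, which is the idea behind the bootstrap estimate of the uncertainty in k1.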
Our simulations were performed without salt present (aside from neutralizing ions), while the experimental rate constants were measured with 100 mM NaCl and 50 mM phosphate, which could affect the binding rate, particularly for the formation of the encounter complex, which is partially driven by electrostatic interactions.
To count transitions out of the encounter complex, we used a similar method: a transition from the encounter complex to the unbound state was counted only if the binding surface distance remained above 25 Å for at least 1 ns, ensuring that we only counted true transitions out of the encounter complex free energy minimum (S17 Fig).
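The state definitions and the dwell-time requirement for transitions can be illustrated with a short Python sketch. The function names are hypothetical, the treatment of the exact boundary values is an assumption, and the transition time is taken here as the first frame of the sustained run, which is one possible convention. With coordinates stored every 10 ps, 1 ns corresponds to 100 frames.

```python
def classify_state(binding_surface_distance):
    """Assign a frame to a state from its binding surface distance (angstroms)."""
    if binding_surface_distance < 11.5:
        return "fully engaged"
    if binding_surface_distance <= 23.0:  # boundary handling is an assumption
        return "encounter"
    return "unbound"


def first_sustained_crossing(distances, threshold, min_frames, below=True):
    """Index of the first frame that starts a run of at least min_frames
    consecutive frames below (or above) threshold; None if there is no such run."""
    run = 0
    for i, d in enumerate(distances):
        satisfied = d < threshold if below else d > threshold
        run = run + 1 if satisfied else 0
        if run >= min_frames:
            return i - min_frames + 1
    return None


# Toy trace: a brief dip below 21 angstroms that does not last, then a sustained one
trace = [30.0] * 5 + [20.0] * 2 + [30.0] * 3 + [20.0] * 4
print(first_sustained_crossing(trace, threshold=21.0, min_frames=3))
print([classify_state(d) for d in (10.0, 15.0, 30.0)])
```

For the criteria described above, the thresholds would be 21 Å (unbound to encounter) and 10.5 Å (encounter to fully engaged) with below=True, and 25 Å with below=False for transitions back to the unbound state.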

Experimental
The AbpSH3 protein and ArkA peptide were produced as previously described [40]. The spectra were processed using standard approaches and the program chemex [86,87] assuming a 2-state reaction [40,88], which has been applied to a few other domains [31,38]. Global fitted values of kex and pbound (224 s⁻¹ and 0.08, respectively) were extracted from these data. A value of koff (206.5 s⁻¹) was subsequently calculated using the equation
$$k_{\mathrm{off}} = k_{\mathrm{ex}}\,(1 - p_{\mathrm{bound}})$$
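As a check on this arithmetic, the short Python sketch below assumes the standard two-state exchange relations, in which kex is the sum of the forward and reverse rates and pbound is the bound population, so that koff = kex(1 - pbound). The function name is hypothetical.

```python
def k_off_from_exchange(k_ex, p_bound):
    """Dissociation rate (s^-1) for two-state exchange: k_off = k_ex * (1 - p_bound)."""
    return k_ex * (1.0 - p_bound)


# Rounded fitted values quoted in the text
print(round(k_off_from_exchange(224.0, 0.08), 2))
```

With the rounded values quoted above this gives about 206.1 s⁻¹, close to the reported 206.5 s⁻¹; the small difference is presumably due to rounding of the fitted parameters.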

ArkA disorder when unbound & bound
We first sought to characterize the structural ensemble of the unbound ArkA peptide to help determine how the intrinsic structural propensities of ArkA contribute to binding. In order to determine the structural ensemble of the unbound ArkA peptide, we simulated the 12-residue ArkA alone using REMD [78] (unbound simulations). The completeness of sampling was examined using the running averages of 3₁₀ helix, bend, and turn structure, as well as end-to-end [...] behaving as a disordered peptide with both the end-to-end distance and dihedral angle RMSD sampling multiple states that are different from the NMR reference structure (Fig 3A). We also ran simulations starting from the NMR bound structure of ArkA to compare its structure when fully engaged with AbpSH3 (bound simulations) (Fig 3C). NMR experiments have shown that in the fully engaged state ArkA seg1 adopts a PPII helix and seg2 is often in a 3₁₀ helix [72]. In both the unbound and bound simulations, seg1 of ArkA is largely structured with the majority of time spent in a PPII helix, while seg2 is less structured (Fig 4).
[...] Table 1. This ensured that the binding simulations would not be biased by a single starting ArkA conformation. We also ran binding simulations with the shorter seg1 peptide (seg1 binding simulations), which contains the PxxP motif that is relatively structured, to examine the different roles of seg1 and seg2 in binding (S1 Text).
In the ArkA binding simulations, we found that an initial encounter complex forms quickly before ArkA transitions more slowly to a fully engaged state (S1 Movie and S2 Movie). As described in the methods, the fully engaged state was defined as a structure where the binding surface distance is below 11.5 Å, while we defined the encounter complex as having a binding surface distance between 11.5 Å and 23 Å. In the binding simulations, ArkA passes through the encounter complex (Fig 5A-B) before reaching the fully engaged state. Interestingly, in some of the independent simulations ArkA forms an encounter complex, then dissociates before rebinding (Fig 5A), while in others it quickly reaches the same fully engaged state observed in the bound simulations (Fig 5B). In the seg1 binding simulations we similarly observed the formation of an initial encounter complex, followed by either unbinding and rebinding, or transition to a stable fully engaged bound state (S11 Fig), while in the bound simulations, the complex remains in the fully engaged state 98% of the time (Fig 5C).
The maximum AbpSH3 domain diameter is 33 Å. Center of mass distances between ArkA and the SH3 domain range from 11 to 64 Å in the binding simulations, while binding surface distances range from 9 to 67 Å. In the bound simulations, the binding surface distances range from 8 to 16 Å.

The ArkA encounter complex is a heterogeneous ensemble that includes nonnative interactions
To further examine the nature of the encounter complex, we projected the data onto coordinates corresponding to native backbone folding (dihedral RMSD) and binding (pairwise binding surface distance) (Fig 6). In the binding simulations, ArkA samples many states with different degrees of native folding and binding before reaching the fully engaged and native folded state found in the lower left of the plot (Fig 6A, blue rectangle). In particular, the encounter complex ensemble is a highly heterogeneous state, as shown in Fig 3D, and 57% of the ArkA binding ensemble occupies the encounter complex without forming the native ArkA fold (Fig 6A), indicating that ArkA does not need to be already preformed in the native conformation before interacting with AbpSH3, consistent with an induced-fit binding mechanism. However, in 20% of the binding ensemble ArkA has a native fold but is still in the encounter complex. This indicates that, at times, ArkA may first adopt a native fold and then reorient and dock into the fully engaged state in a conformational selection mechanism. Multiple steps and potential pathways to binding may exist within the encounter complex ensemble. Additionally, Fig 6B confirms that the bound simulations stay fully engaged but do rarely (4% of the ensemble) sample nonnative conformations that are different from the NMR structure (unfolded and fully engaged in Fig 6B). We only observe two brief instances, totaling less than 2% of the ensemble, where the complex transitions to the encounter complex and back to the fully engaged state in one of the five bound simulations.
[...] as has been observed for other SH3 domains [89]. In the encounter complex, seg1 forms more contacts (9 on average) with the binding surface than seg2 (6 on average). The nonnative contacts on the binding surface that are formed in the encounter complex are consistent with ArkA binding in reverse in part of the encounter ensemble (Fig 8).
In general, SH3 domains depend on the PxxP motif to bind, and as this is a pseudo-palindromic motif, reverse binding for seg1 on SI is not surprising. To further examine the conformational states in the encounter complex, we broke the encounter complex into four categories (defined in the methods, Table 2): forward, reverse, seg2 only, and other ( Table 3). The reverse structures are found in the part of the encounter ensemble that has a binding surface distance higher than 15 Å (Fig 6A), while the forward structures are found at binding surface distances less than 15 Å, as expected. The encounter complex contact map (Fig 7) shows that seg1 interacts more with the binding surface overall than seg2. Furthermore, Table 3 shows that the encounter complex is twice as likely to sample a state with seg1 engaged in the correct orientation on SI than a state with only seg2 engaged in the correct place on SII. Together, this indicates that seg1 likely binds before seg2.
Seg2 may be needed to ensure specific forward binding since seg2 does not interact with the domain when ArkA binds in reverse (Fig 8). Both the ArkA and seg1 binding simulations exhibit forward and reverse binding, showing that in the encounter complex the two segments behave somewhat independently. However, the ArkA encounter complex ensemble is much more complex and heterogeneous than that of the short seg1 peptide (S12 Fig), and ArkA samples the forward state less often than the seg1 peptide ( Table 3), indicating that this short peptide may not give an accurate representation of how that segment behaves as part of the longer sequence.  Although the percent of the encounter complex ensemble that is in the forward and reverse encounter is about the same ( Table 3), 33 of the 50 individual binding simulations sampled forward encounter at some point in the simulation, compared to only 15 that sampled the reverse encounter.
We found that generally, when the encounter complex enters the forward encounter state it does not change orientation; however, sometimes ArkA spins around and shifts over to go from forward to reverse (2 times out of 31) or reverse to forward (4 times out of 15) without entering the unbound state in between (S3 Movie). The encounter complex is in dynamic exchange between different predominantly nonnative conformations and contacts, including the forward and reverse orientation of the peptide (Fig 3D). This dynamic exchange may help to prevent the encounter complex from being trapped in off-pathway states for binding, such as a reverse encounter complex.
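The bookkeeping behind counting such orientation switches can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: it assumes each simulation frame has already been assigned a state label, and the label strings and the function name `count_orientation_switches` are hypothetical. A switch is counted only when the orientation changes without an intervening unbound frame; how visits to the fully engaged state should be treated is left open here.

```python
from collections import Counter

def count_orientation_switches(states):
    """Count forward<->reverse switches that occur without unbinding.

    `states` is a per-frame sequence of labels (hypothetical names):
    "unbound", "forward", "reverse", or anything else (e.g. "other").
    Frames with other labels are ignored; an "unbound" frame resets
    the memory of the last orientation.
    """
    switches = Counter()
    last_orientation = None  # orientation seen since the last unbound frame
    for state in states:
        if state == "unbound":
            last_orientation = None
        elif state in ("forward", "reverse"):
            if last_orientation is not None and state != last_orientation:
                switches[(last_orientation, state)] += 1
            last_orientation = state
    return switches

# Toy trajectory (made-up labels, for illustration only)
toy_trajectory = [
    "unbound", "forward", "forward", "other", "reverse",  # one forward -> reverse
    "unbound", "reverse", "forward",                      # one reverse -> forward
    "unbound", "forward",                                 # rebinding, not a switch
]
toy_switches = count_orientation_switches(toy_trajectory)
```

Applied to every binding trajectory, the two counters would give tallies of the kind quoted above for forward-to-reverse and reverse-to-forward events.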
The nonspecific, disordered encounter complex that we characterize from our simulations is similar to encounter complexes seen previously in MD simulation studies of the proline-rich Sos (designed to imitate the encounter complex) [56]. With our comprehensive analysis, it now seems likely that a diverse encounter complex ensemble that includes nonnative interactions may be characteristic of the binding between proline-rich peptides and SH3 domains.

Long-range electrostatic interactions stabilize the encounter complex
Because of the complementary charges of ArkA and AbpSH3 and previous studies that focused on electrostatic interactions, we chose to particularly examine the role that long-range electrostatic interactions play in the encounter complex. We measured the intermolecular electrostatic contacts present in the encounter complex ensemble and in the fully engaged ArkA-AbpSH3 complex. While the long-range electrostatic contacts present in the encounter complex ensemble are more diverse (nonspecific) than those in the bound simulations (Fig 9), the average number of electrostatic contacts in the encounter complex ensemble at any given time is very similar to the average number in the bound simulations (Table 4). Thus, the main favorable contribution of the positively charged ArkA peptide interacting with the negatively charged AbpSH3 binding surface is gained upon formation of the encounter complex rather than upon transitioning from the encounter to the fully engaged state.
Previous studies of SH3 binding have also found that electrostatic interactions are important for the formation of the complex [89]. MD simulations of the Sos peptide binding to c-Crk N-SH3 were able to specifically identify electrostatic interactions that occur in the encounter complex, including nonnative contacts, although they did not quantify these interactions [21,23].
Experimental studies of the viral NS1 peptide binding to the CrkII N-SH3 domain indicate that electrostatic contacts are important for specific binding, and that flexibility in the fully engaged state allows increased electrostatic stabilization as multiple interactions form as part of the ensemble of bound states [68,89]. Our simulations indicate that a diversity of different electrostatic contacts, each present in only part of the ensemble, is even more characteristic of the ArkA-AbpSH3 encounter complex than the fully engaged complex. The heterogeneity, or 'fuzziness', of the encounter complex ensemble is important, as there can be multiple pathways from this fuzzy encounter state to the fully engaged complex [37]. Electrostatic interactions can enhance this effect, not only stabilizing the encounter complex, but also lowering the free energy barrier to transition between basins and transition to the fully engaged state [13,37]. As many of the ArkA-AbpSH3 encounter complex electrostatic contacts do not form at all in the fully engaged ensemble, it is also important that none are strong enough interactions to trap the complex in a conformation incompatible with transitioning to the fully engaged state.

Hydrophobic and short-range interactions are nonspecific in the encounter and specific in the fully engaged complex
Since simulations indicate that long-range electrostatic interactions are already formed in the ArkA-AbpSH3 encounter complex, we sought to identify what energetically favorable changes occur upon transitioning from the encounter complex to the fully engaged state. By measuring the solvent accessible surface area of the complex, we found that in the encounter complex part of the SH3 domain binding surface is buried (Fig 10A) because ArkA forms transient nonspecific hydrophobic interactions with the binding surface. However, in transitioning to the fully engaged complex, the ArkA PPII helix packs into the grooves in SI, and native contacts that are largely absent in the encounter ensemble form between hydrophobic sidechains at the interface (Fig 10C).
This buries more of the binding surface, which transitions from ~50 to ~45 to ~40 nm² solvent exposed surface area as the complex transitions from unbound to the encounter complex to the fully engaged complex (Fig 10A). Additionally, in transitioning from the encounter complex to the fully engaged complex, one to two specific short-range hydrogen bond or salt bridge interactions appear (S14 Fig). In particular, in the bound simulations, there is one hydrogen bond, from the ArkA P(2) carbonyl oxygen to the AbpSH3 Y54 side chain hydroxyl group, that is present more than any others, in 88% of the simulated ensemble (Fig 10B). We also found that a short-range electrostatic salt bridge forms between ArkA K(-3) and AbpSH3 E17 in 82% of the bound simulations (Fig 10D). These specific, short-range interactions are rarely formed in the encounter ensemble, indicating that they may also help to stabilize the fully engaged state and prevent unbinding. Previous mutation studies have found that mutating K(-3) or P(2) causes a large reduction in binding affinity of ArkA [72], possibly in part due to disruption of the salt bridge or hydrogen bond that these residues form.
The K(-3) mutation had the largest effect on binding affinity [72], which may also be in part due to its specific hydrophobic interactions in the fully engaged complex (Fig 10C). The other mutation that caused a significant reduction in binding affinity was L(-7) [72], which is a hydrophobic residue that is also buried when the fully engaged complex forms in our simulations (Fig 10C), indicating the importance of these specific hydrophobic interactions. While the encounter complex is characterized by nonspecific electrostatic and hydrophobic interactions, the fully engaged complex requires more specific and complete hydrophobic contacts between ArkA and the AbpSH3 binding surface and is additionally geometrically constrained by the formation of a specific hydrogen bond and salt bridge.

ArkA-AbpSH3 two-step binding model
Putting together all of our data from the binding simulations, we can form a picture of how ArkA binding to AbpSH3 proceeds (Fig 2). Initially the ArkA peptide is (orientationally) steered by long-range electrostatic attraction to the AbpSH3 binding surface and forms a metastable encounter complex (step 1). This encounter complex is stabilized by transient and nonnative interactions, including long-range electrostatic interactions and partially engaged hydrophobic contacts, but the binding surface is still partially solvated, especially SII. Even before formation of the encounter complex, seg1 of ArkA is pre-folded into a PPII helix, and in the encounter complex it often forms nonspecific hydrophobic interactions with SI and nonnative hydrogen bonds, although part of the peptide is still solvated in any given conformation, and the P(2) to Y54 hydrogen bond and K(-3) to E17 salt bridge are essentially absent. The seg1 PPII helix can interact with SI of AbpSH3 in either the forward or reverse orientation in the encounter complex, but the reverse orientation requires that seg2 interact with solvent rather than SII of AbpSH3. From the forward state of the encounter complex, ArkA-AbpSH3 can transition to the fully engaged state through a zippering process [5], burying hydrophobic sidechains and displacing more solvent, particularly on seg2 and SII, and forming the P(2) to Y54 hydrogen bond and K(-3) to E17 salt bridge (step 2). This transition also coincides with seg2 of ArkA becoming somewhat more structured, although it is clear that the fully engaged state is still in dynamic exchange, consistent with previous co-linear chemical shift perturbation measurements [40].
The AbpSH3 binding pathway that we have characterized (Fig 2) is similar to that proposed for the c-Crk N-SH3 domain [21,23], although our simulations provide more sampling of individual binding trajectories, allowing us to quantitatively characterize the presence of different conformational states and long- and short-range interactions that are present in the encounter complex ensemble. In combination with previous studies, our results indicate that this pathway may be a common binding progression for proline-rich peptides binding to SH3 domains.

ArkA intrinsic structure affects the binding pathway
Even in the fuzzy encounter complex ensemble, seg1 of ArkA is largely folded into a PPII helix, exhibiting a binding strategy also employed by other IDPs, where one segment with preformed structure can dock into place first, followed by the coupled folding and binding of more flexible segments [5,7,11,25,26,48]. Polyproline sequences are especially well adapted to this strategy as PPII helices are rigid, allowing them to project from folded parts of a larger full length protein, and hydrophobic yet also highly soluble in water [5]. This involvement of pre-formed structure in the binding pathway is useful for modulating the entropy change on binding by tuning the degree of structure present in the unbound state [5,24]. Sequence changes that change the PPII propensity but maintain the same fully-engaged SH3 complex could be a mechanism to tune the association rate, affinity, and specificity of the interaction for different cellular functions. Seg2 of ArkA is more flexible and therefore less likely to form the first tight interactions with the AbpSH3 binding surface. Modulating the amount of intrinsic 3₁₀ helix structure in seg2 would be unlikely to affect the peptide binding affinity [25], since this segment also remains quite flexible in the fully engaged complex.

Binding rates probed by NMR and MD simulations
Using NMR CPMG experiments, we determined that the ArkA peptide binds quickly, on the μs timescale, at our experimental concentrations (Table 5) [40]. In the binding simulations, ArkA generally reached a stable state in the 1 μs of simulation time, but often this state was part of the metastable encounter complex rather than the fully engaged state. Based on the 9 simulations (out of 50) that did reach the fully engaged state, we calculated a binding rate constant, kon, which we compare to the experimental binding rate (Table 5). Our simulations show that binding happens on a similar timescale to the binding rates measured by NMR. However, the kon value from our simulations is imprecise because most simulations remained in the encounter complex (panel A in S16 Fig), and we are only able to definitively state that our simulations are not inconsistent with the rate constants determined by NMR. We can more precisely calculate a rate constant, k1, for step 1 of binding (Fig 2), since all simulations reached the encounter complex (panel B in S16 Fig). Step 1 occurs more than an order of magnitude more rapidly than the complete binding process. This extremely rapid k1 indicates that k2 could be quite low and still result in the fast kon observed experimentally. For example, using this value of k1, a rough approximation of k-1 from our simulations (2.6 × 10⁷ s⁻¹), and the experimental value of kon, we can calculate k2 based on the steady-state approximation for a two-step reaction. If we approximate k-2 = 0, kon is given by

\[ k_{on} = \frac{k_1 k_2}{k_{-1} + k_2} \]

and we can solve for k2 in terms of kon:

\[ k_2 = \frac{k_{on} k_{-1}}{k_1 - k_{on}} \]

Based on this calculation, we find that k2 is 6.8 × 10⁵ s⁻¹. This corresponds to a timescale for step 2 of about 1 μs, which is similar to the timescale of binding for a single ArkA molecule at our experimental SH3 domain concentration (~8 μs). IDPs seem to exceed the upper limit for binding [43,90].
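As a numerical cross-check of this two-step estimate, the following Python sketch assumes the standard steady-state expression for A + B ⇌ encounter complex → fully engaged complex with k-2 = 0, namely kon = k1·k2/(k-1 + k2). The function names are hypothetical, and the only numbers taken from the text are k-1 and k2; the k1 value used in the round-trip check is an arbitrary placeholder, not the value reported in Table 5.

```python
def kon_two_step(k1, k_minus1, k2):
    """Overall association rate constant under the steady-state
    approximation for the encounter complex, with k-2 set to zero."""
    return k1 * k2 / (k_minus1 + k2)

def k2_from_kon(kon, k1, k_minus1):
    """Invert the steady-state expression to obtain k2 from kon."""
    if k1 <= kon:
        raise ValueError("k1 must exceed kon for a finite, positive k2")
    return kon * k_minus1 / (k1 - kon)

# Values quoted in the text
K_MINUS1 = 2.6e7  # s^-1, rough estimate of encounter complex dissociation
K2 = 6.8e5        # s^-1, encounter complex -> fully engaged state

# Probability that an encounter complex proceeds to the fully engaged
# state instead of dissociating, and the implied k1/kon ratio
commitment = K2 / (K_MINUS1 + K2)
k1_over_kon = 1.0 / commitment

# Characteristic time of step 2 in microseconds
step2_timescale_us = 1e6 / K2

# Round-trip check with an arbitrary placeholder k1 (NOT from Table 5)
K1_PLACEHOLDER = 1.0e9  # M^-1 s^-1, illustrative only
kon_check = kon_two_step(K1_PLACEHOLDER, K_MINUS1, K2)
k2_check = k2_from_kon(kon_check, K1_PLACEHOLDER, K_MINUS1)
```

With the quoted k-1 and k2, only a few percent of encounter complexes are committed to full binding in this model, so k1 exceeds kon by more than an order of magnitude, in line with the comparison of step 1 and overall binding rates in the text.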
One study of a disordered region of PUMA binding to Mcl-1 found that an association rate that at first seemed to be diffusion limited in fact showed a temperature dependence for kon, indicating an energy barrier in the association process, and therefore two-step binding [43]. In our simulations, the diffusion limited association rate with an electrostatic enhancement is captured by k1; however, the overall association rate, kon, also depends on step 2 in our binding model (Fig 2).
In our simulations, step 1 (formation of the encounter complex) happens about two orders of magnitude more rapidly than the overall binding process, indicating that the transition state for binding occurs after the formation of the encounter complex. In fact, initial formation of the encounter complex appears to be a downhill process with respect to free energy, as measured along the binding distance reaction coordinate (S17 Fig). Our simulations indicate that when ArkA has formed an encounter complex with AbpSH3, it is still more likely for it to unbind and begin interacting with something else than to proceed to the fully engaged state. Our simulation result showing that the rate limiting step for ArkA binding to AbpSH3 occurs after the encounter complex formation contrasts with previous experimental data that indicate a two-state binding process [40]. Typically, when a single association rate is observed for a two-step binding reaction, it indicates that step 2 is very fast compared to step 1 [7], but this does not appear to be the case for SH3 domain binding based on this and other simulation studies where the encounter complex forms more quickly than the fully engaged state [23,69]. In fact, in our simulations, step 1 of binding actually proceeds downhill, consistent with other studies of electrostatically enhanced binding [29,30]. Other MD simulation studies of IDP binding have also revealed encounter complexes that form quickly, followed by a slower transition to the fully engaged complex [50,54,58]. One alternative explanation for the apparent two-state binding is that the encounter complex is only present at a very low population (< 0.5%), and therefore not detectable by NMR [38].
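A free energy profile along a distance coordinate of this kind is commonly estimated by Boltzmann inversion of a histogram, F(d) = -kT ln P(d). The sketch below illustrates that generic approach and is not necessarily the procedure used for the supplementary figure: the function name, the 1 Å bin width, and the 300 K temperature are assumptions, and no Jacobian or reweighting correction is applied.

```python
import math

BOLTZMANN_KCAL = 0.0019872041  # kcal mol^-1 K^-1

def free_energy_profile(distances, bin_width=1.0, temperature=300.0):
    """Boltzmann-invert a histogram of a reaction coordinate.

    Returns {bin_center: free_energy_in_kcal_per_mol}, shifted so the
    most populated bin is at zero. Unvisited bins are simply absent.
    """
    if not distances:
        raise ValueError("need at least one sample")
    kT = BOLTZMANN_KCAL * temperature
    counts = {}
    for d in distances:
        index = int(d // bin_width)  # floor-based bin index
        counts[index] = counts.get(index, 0) + 1
    total = len(distances)
    raw = {
        index * bin_width + bin_width / 2: -kT * math.log(n / total)
        for index, n in counts.items()
    }
    lowest = min(raw.values())
    return {center: f - lowest for center, f in sorted(raw.items())}

# Toy data: the first bin is ten times more populated than the second
toy_distances = [0.5] * 100 + [1.5] * 10
toy_profile = free_energy_profile(toy_distances)
```

In such a profile, a downhill step appears as a monotonic decrease in F from large to small distances with no intervening maximum.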

The role of the encounter complex and hydrophobic interactions in binding kinetics and function
Our two-step binding model (Fig 2) that includes a fuzzy encounter complex stabilized by nonspecific hydrophobic and electrostatic interactions followed by formation of native contacts in the fully-engaged complex is observed in simulations of other IDP binding proteins, including other SH3 domains [21,23], the PDZ domain [54], self-binding proteins [57], and the TAZ1 domain [50]. IDP complexes that lack strong charge complementarity, such as pKID and KIX, are similar, but rely mainly on nonspecific hydrophobic interactions to stabilize the encounter complex [48]. Hydrophobic interactions are critical to steps 1 and 2, forming nonspecifically in the transition to the encounter complex and specifically, to bury more surface area, when transitioning to the fully engaged complex. If binding only occurred in one step, mutations that affect hydrophobic interactions would only affect the dissociation rate, and not the association rate of the peptide. However, with our two-step binding model, we predict that hydrophobic interactions affect the stability of both the encounter complex and fully engaged state, and therefore play a role in determining the overall association rate.
The central role of hydrophobic interactions in SH3 binding was also observed by Meneses and Mittermaier [31]. They find that electrostatic rate enhancement of binding to the Fyn SH3 domain is minimal since long-range electrostatic interactions do not significantly increase the association rate compared to hydrophobic interactions. In our model (Fig 2), hydrophobic interactions form during both reaction steps and could have large effects on the association rate as well as the dissociation rate. This is consistent with the differences in CrkII N-terminal SH3 binding by the virus protein NS1 and the endogenous binding partner JNK1 observed by Shen et al. [89]. While the increased binding affinity and higher association rate of NS1 has been attributed to its higher positive charge [89], NS1 also contains more hydrophobic residues than JNK1, particularly within the PxxPx+ motif, which likely also has an effect on association since hydrophobic interactions enhance the formation of the encounter complex and fully engaged complex.
The encounter complex would likely play an important functional role in SH3 binding in the cellular context. Competition between binding partners may need to be tuned by modulating the encounter complex to determine which interaction will be dominant, as in the case of CITED2 competing with HIF-1α to bind TAZ1 [91]. CITED2 forms an encounter complex with TAZ1 while HIF-1α is bound, which allows it to completely displace HIF-1α even though both partners have similar affinities for TAZ1. There is evidence that AbpSH3 can form an intramolecular interaction with a proline-rich sequence of the Abp1p protein, which may inhibit binding of other partners [92].
from those hydrocarbon groups that are closest together in the NMR structural ensemble (2RPN) [72]. Contacts were defined based on a 6 Å cutoff distance between hydrocarbon groups. Error bars represent the standard deviation between independent simulations.
There is no clear barrier between the unbound state and encounter complex, indicating that formation of the encounter complex from the unbound state is downhill in free energy. Within the encounter complex, the binding surface distance reaction coordinate reveals two populations.
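The 6 Å cutoff criterion for contacts between hydrocarbon groups can be illustrated with a short Python sketch. It assumes each hydrocarbon group is represented by a single 3D point in Å (for example a group center), which is a simplification the text does not specify; the function name and the toy coordinates are hypothetical.

```python
import math

CONTACT_CUTOFF = 6.0  # Å, cutoff distance between hydrocarbon groups

def count_contacts(peptide_groups, domain_groups, cutoff=CONTACT_CUTOFF):
    """Count peptide-domain group pairs separated by at most `cutoff`.

    Each group is an (x, y, z) tuple in Å; representing a group by a
    single point is an assumption made for this illustration.
    """
    return sum(
        1
        for p in peptide_groups
        for q in domain_groups
        if math.dist(p, q) <= cutoff
    )

# Toy coordinates for a single frame (made up for illustration)
toy_peptide = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_domain = [(3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_contact_count = count_contacts(toy_peptide, toy_domain)
```

Averaging such per-frame counts within each independent simulation, then taking the standard deviation across simulations, would give error bars of the kind described in the caption.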