Global Conformational Selection and Local Induced Fit for the Recognition between Intrinsic Disordered p53 and CBP

The transactivation domain (TAD) of tumor suppressor p53 can bind with the nuclear coactivator binding domain (NCBD) of cyclic-AMP response element binding protein (CBP) and activate transcription. NMR experiments demonstrate that both apo-NCBD and TAD are intrinsic disordered and bound NCBD/TAD undergoes a transition to well folded. The recognition mechanism between intrinsic disordered proteins is still hotly debated. Molecular dynamics (MD) simulations in explicit solvent are used to study the recognition mechanism between intrinsic disordered TAD and NCBD. The average RMSD values between bound and corresponding apo states and Kolmogorov-Smirnov P test analysis indicate that TAD and NCBD may follow an induced fit mechanism. Quantitative analysis indicates there is also a global conformational selection. In summary, the recognition of TAD and NCBD might obey a local induced fit and global conformational selection. These conclusions are further supported by high-temperature unbinding kinetics and room temperature landscape analysis. These methods can be used to study the recognition mechanism of other intrinsic disordered proteins.

A large number of proteins (between 25% and 41%) are intrinsically disordered, however, these proteins also play important function in cell signaling and cancer upon binding with multiple interaction partners. [11] In this study, NMR experiments indicate that apo-TAD is intrinsic disordered protein and apo-NCBD is not entirely unstructured with a helical molten globule [3] [12]. Upon binding each other, both NCBD and TAD undergo a transition from disordered to well folded. [10] This suggests that both NCBD and TAD have significant conformational adjustment in complex. These experimental observations raise an interesting question if these intrinsic disordered NCBD and TAD obey an induced fit upon binding. To reveal this question, we utilize all atom molecular dynamics (MD) simulations in explicit solvent to analyze the coupling between binding and folding in the NCBD-TAD complex [13].
However, so far the folding time scales of all atomic MD simulations are restricted to microsecond magnitude at room temperature (298K), which is significant shorter than the folding half times of most proteins [14,15]. In order to reveal the conformational changes within reasonable time, all MD simulations in explicit solvent at high temperature have been widely used to monitor the unfolding pathways of proteins. The unfolding timescales could be nanosecond at 498K [14,16]. Moreover, according to the principle of microscopic reversibility, experiments have demonstrated that the transition state for folding and unfolding is supposed to be same [14]. Therefore, MD simulations high temperatures are used in this study. Although it is impossible to accumulate long enough trajectories at room temperature to draw any meaningful conclusions, multiple trajectories of room temperature simulation are also used in this research to compare with experimental observations and other simulations.

Molecular Dynamic Simulations
The atomic coordinates of NCBD and TAD were obtained from NMR structure (pdb code: 2L14) [10]. Point mutants of L10A/L13A, L69Q/W70S, W100Q/F101S and L69Q/W70S/ W100Q/F101S were modeled with SCWRL3. [17] All hydrogen atoms were added using the LEAP module of AMBER 11 [18]. Counter-ions were used to maintain system neutrality. All systems were solvated in a truncated octahedron box of TIP3P waters with a buffer of 10 Å [19]. Particle Mesh Ewald (PME) [20] was applied to handle long-range electrostatic interactions with default setting in AMBER11 [18]. The parm99 force filed was used to compute the interactions within protein [21]. The SHAKE algorithm [22] was employed to constrain bonds including hydrogen atoms. All solvated systems were first minimized by 1000-step steepest descent to remove any structural clash, followed by 20 ps heating up and brief equilibration in the NPT ensembles at 298K. The time step was 2 fs with a friction constant of 1 ps 21 using in Langevin dynamics. To study the folded state of each solvated system, ten independent trajectories of 10.0 ns each in the NPT ensemble at 298K were simulated with PMEMD of AMBER11. Then ten independent unfolding trajectories of 10 ns each were performed to investigate unfolding pathways for each solvated system in the NVT ensemble. Four mutant systems were simulated for five trajectories of 10.0 ns each at 298K. A total of 800 ns trajectories were collected for the wild type and mutant at 298K and 498K. It took about 55,000 CPU hours in the in-house Xeon (3.0 GHz) cluster.
Native contacts of the bound and apo states for NCBD and TAD were monitored to detect the beginning of the unfolded state. It was found that 8 ns were sufficient to reach the equilibrium state for both apo and bound states at 498K. Therefore, the first 8 ns (a total of 80.0 ns for each system) were used to monitor the unbinding kinetics. The remaining 2 ns (a total of 20.0 ns for each system) were used to study the unfolded equilibrium state.

Transition State Analysis
The Ca root mean square deviations (RMSD) and conformational cluster at 498K were used to determine transition state [23]. The RMSD between any two conformers along the 498K trajectory was defined byD ij . Suppose Nstructures were extracted from a trajectory. These structures were mapped as N points (x i ,y i ) into two dimensions. The Euclidean distance of any two points was calculated with d ij~½ (x i {x j ) 2 z(y i {y j ) 2 1=2 . Suppose the N structures and the N points in two-dimension are linear mapping, the error E between them can be defined with equation (1). Nonlinear mapping algorithm (NLM) [24] was applied to optimize the error E between N points and N structures.

Data Analysis
The tertiary assignments were performed with in-house software [25,26,27,28,29,30,31]. The side chains of two residues that were not adjacent were supposed to be in contact when the distance was less than 6.5 Å . Electrostatic interactions are assigned when the distance between the center mass of positive charge and negative charge residues for NCBD and TAD is less than 11 Å [32]. The fraction of native tertiary contact (Qf) and the fraction of native binding contact (Qb) are used to reveal the unbinding kinetics. All fitted kinetic curves are handled with Origin 8.5.
According to the definition of protein W-values, the W-values were calculated with a similar method used in the previous works [33,34,35] W caled whereN TS i represents the number of native contacts of residue i at the transition state, N F i and N U i represents the number of native contacts of residue i at the folded and unfolded state, respectively. . Helices a1, a2 and a3 of NCBD are colored with blue, cyan and green, respectively. Helices a4 and a5 of TAD are colored with yellow and red, respectively. N and C-terminal domains are labeled.large number of proteins (between 25% and 41%) are intrinsically disordered, however, these proteins also play important function in cell signaling and cancer upon binding with multiple interaction partners. [11] In this study, NMR experiments indicate that apo-TAD is intrinsic disordered protein and apo-NCBD is not entirely unstructured with a helical molten globule [3] [12]. Upon binding each other, both NCBD and TAD undergo a transition from disordered to well folded. [10] This suggests that both NCBD and TAD have significant conformational adjustment in complex. These experimental observations raise an interesting question if these intrinsic disordered NCBD and TAD obey an induced fit upon binding. To reveal this question, we utilize all atom molecular dynamics (MD) simulations in explicit solvent to analyze the coupling between binding and folding in the NCBD-TAD complex. [13]. doi:10.1371/journal.pone.0059627.g001 Contacts were assigned if the side chain heavy atoms of two residues that are nonadjacent were less than 6.5 Å .

Conformational Selection and Induced Fit Mechanism
For each simulated bound conformation in 298K trajectory, the apo conformation with the minimum RMSD was selected as the global most similar structure [36]. 10 pairs of bound conformations and apo conformations were used to evaluate the average RMSD as a function of distance from the centroid of binding partner.
To obtain the relationship between structural deviations and the distance from the centroid of binding partner, all atoms are assigned to different groups range from 0 to 50 Å at intervals of 0.5 Å [36] according to the distance from the centroid. The bound and apo conformations were superposed by all Ca atoms. The centroid of binding partner was defined as the center of mass in bound partner conformation.
For each pair of bound and apo conformation, the RMSD deviations were used to test the R value. The two-sample Kolmogorov-Smirnov (KS) [37] test was used to reveal the distribution of RMSD values for bound conformation and apo conformation in each distance group. Both the median P value and the conformation fraction with P,0.1 were calculated for total 100 pairs of conformations in each distance group. Because the distributions of magnitudes of structural deviations do not fit any known distribution, therefore, the nonparametric KS test [38] was used in our study.
Histograms of average RMSD values between bound conformation and apo conformations were quantified as the magnitude of conformational selection and induced fit for total 100 pairs of conformations. The probabilistic weighting differences between distributions of conformational selection and induced fit [36] were used to determine the relative magnitude where D c and D I represent the distributions of deviation values for conformational selection and induced fit, respectively, rrepresent the RMSD values, f represent the frequencies of RMSD values, N C and N I represent the number of points in distributions of RMSD values for conformational selection and induced fit, respectively.

Binding Mode between Intrinsic Disordered NCBD and TAD
A limited number of trajectories for MD simulations (5-10) are sufficient to capture the average properties of proteins [39]. To study the recognition for intrinsic disordered NCBD and TAD, 10 independent trajectories of 10.0 ns each for apo-NCBD, apo-TAD, and their complex were simulated at room temperature (298K), respectively. Ca and W/y fluctuations for apo and bound states are illustrated in Figure 2. The Ca variations of bound NCBD are significant smaller than those of apo-NCBD, especially in the region of C-terminal (residues Ile42 to Gly58) at the TAD binding site. This indicates that bound NCBD becomes more stable upon TAD binding. The Ca variations of bound TAD are also smaller than those of apo-TAD, especially in helix a5 (residue Pro94 to Trp100) within AD2 subdomain. It suggests that helix a5 becomes less flexible and more rigid upon NCBD binding. The W/ y variation of bound NCBD is similar to that of apo-NCBD. This suggests that the secondary structure has not significant change upon TAD-binding. The W/y variation of bound TAD is similar to that of apo-TAD. These results are in good agreement with NMR experiments [10].
To clearly illustrate the conformational difference, the landscapes of distance difference between the average pairwise intra-molecular distance of bound states and corresponding average pairwise intra-molecular distance of apo states for NCBD and TAD are shown in Figure 2C-2D. The landscapes can reflect the relative conformational change of NCBD and TAD backbone. The deep red areas show that the distance differences between residues 5-30 and 40-59 for NCBD are positive values. This indicates that both helices a1 and a2 are stretched away from helix a3. The deep blue areas represent that the distance differences are negative values. The most regions of distance differences between bound and apo-NCBD residues are negative, indicating that the NCBD backbone become more rigid and structured upon p53 TAD binding. The landscape of distance differences between bound and apo-TAD residues is also shown in Figure 3. It is found that most distance differences are positive, especially the region of residues Glu75 to Leu90. This reflects that bound TAD is extended upon NCBD binding because the residues from Pro74 to Ala86 form a long distorted loop around NCBD helix a3 [10].
To explore the conformational difference between bound and apo states for NCBD and TAD, the energy landscapes with the reaction coordinates of RMSD and the radius of gyration (Rg) were shown in Figure 3. From this figure, we can find that apo-NCBD has the propensity of extension with the maximum Rg of 18 Å and RMSD of 12 Å for apo state (16.5 Å for bound NCBD). The distribution of conformer for apo-TAD is also looser than that of bound TAD. There are two basins of energy-minimum for bound NCBD, one basin (Bound1) with Rg values between 13 Å and 14 Å and Ca RMSD ranking between 4 Å and 8 Å , the other (Bound2) with Rg between 14.5 Å and 15 Å and Ca atom RMSD between 10 Å and 11 Å . There is just one energy-minimum basin for apo NCBD with Rg between 11 Å and 13.5 Å and Ca atom RMSD between 5 Å and 10 Å . For TAD, there is one energyminimum basin for apo TAD with Rg between 11 Å and 13 Å and Ca atom RMSD between 5 Å and 8 Å , and one basin for bound TAD with Rg between 13 Å and 14 Å and Ca atom RMSD between 3 Å and 5 Å . The average structures corresponding to energy-minimum for NCBD and TAD are also shown in Figure 3. The helical content of apo NCBD is about 44%, which is lower than that of bound1 (47%) or bound2(54%). As for the conformers of two basins of bound1 and bound2 for bound NCBD, helix a3 of bound1 separates from helices a1 and a2. The helical content of apo TAD (28%) is similar to that of bound TAD (27%).
To study the driving force for these conformational adjustments, the electrostatic, hydrophobic, and hydrogen-binding interactions between NCBD and TAD were analyzed. Figure 4A illustrates the stable hydrophobic contacts in ten independent simulations. Eight stable hydrophobic contacts were found with population higher than 60%, such as Ala42/Leu73, Ile44/Met87, Phe43/Trp100, Ala42/Pro74, Leu17/Phe101, Met40/Met87, Leu17/Ile97, and Leu14/Phe101. It was found that the hydrophobic residues of Met87, Ile97, Trp100 and Phe101 are located at helix a5. This suggests that AD2 subdomain plays a key role in hydrophobic interactions between NCBD and TAD, consistent with the experimental observation [7,10]. The electrostatic interactions are illustrated in Figure 4B. Five stable electrostatic interactions are found between NCBD and TAD. The positively charged and negatively charged residues are focused on Arg47/Asp96, Lys45/ Glu75, Lys50/Asp96, Arg3/Glu75 and Lys18/Asp104, indicating that electrostatic contacts also have contributions to the stability of complex. Besides the hydrophobic and electrostatic interactions, there are also two stable hydrogen bonds and shown in Figure 4C. They are focused on Arg47 and Asp96. In summary, these interactions are located at helices of NCBD and TAD. This suggests that helical regions are critical for the binding of NCBD and TAD. The contribution of binding free energy between NCBD and TAD for hydrophobic residues is about 80% with the MMPBSA method (shown in Figure S1A). Therefore, hydrophobic contacts play key roles in the recognition between intrinsic disordered NCBD and TAD. This is in accord with most proteinprotein interactions conclusion that the interface is predominant hydrophobicity [40].

Binding Kinetics
The fraction of native tertiary contact (Qf) and the fraction of native binding contact (Qb) in log scale are applied to reveal unfolding and unbinding kinetics. Time evolutions of logQb, logQf for apo and bound of NCBD and TAD, and logQf for the complex are shown in Figure 5. This figure suggests that the tertiary unfolding and unbinding can be fitted well by a single exponential function, indicating first order kinetics in the NVT ensemble at high temperature (498K). The fitted kinetics data were listed in Table 1. The kinetics analysis shows that the unbinding half-time is 2.91860.028 ns, the tertiary unfolding half-time is 1.26060.016 ns for bound NCBD, 1.73260.010 ns for the complex, and 1.53960.014 ns for bound TAD. This indicates that the tertiary unfolding of NCBD and TAD is slight faster than the unbinding between NCBD and TAD, respectively. The half-time of tertiary unfolding for apo-NCBD and apo-TAD was 0.72460.010 ns and 1.47360. 020 ns. This suggests that the tertiary unfolding of bound NCBD and TAD is slightly postponed upon the binding of each other.

Transition State Analysis
Based on the unfolding kinetics, the complex unfolds via a twostate process [41]. Therefore, a transition state corresponds to the energy-maximum along the reaction coordinate at high-temperature. Experiments have supported that the transition state gets close to the native state with kinetically and thermodynamically distorted [42,43,44]. Therefore, the Ca root mean square deviations (RMSD) between the high-temperature simulated complex and the NMR structure were first analyzed and shown in Figure 6. There are three plateaus in this figure. The first plateau between 500 and 800 ps represents a distorted structure from native state. The second plateau between 2000 and 6000 ps is a state with the structure sharply unfolding. The last plateau from 7000 to 10,000 ps corresponds a unfolded state. Between the first and the second plateau, the RMSDs have a significant improvement with 4 Å deviations corresponding to the loss of native tertiary contacts and the broken of hydrophobic core.
According to the analysis of RMSD, the structures corresponding to the energy-maximum comprise the transition sate. It is found that transition state occurs early to form a distorted structure of the native state. [45] The RMSD analysis is shown that the structures near 960 ps significantly change after temperature equilibration. This indicates that the structures around 960 ps might comprise the transition state ensemble.
In order to confirm the transition state, the conformer cluster [23] was analyzed to identify different states along trajectory at high temperature. Figure 6 illustrates the cluster analysis for the complex. Two dimensional projection of RMSD [24,45] approximates the deviations of conformational space along the whole trajectory. All points are sequentially connected at intervals of 20 ps to form three distinct clusters. The first cluster ranges between 1 and 960 ps, the second one between 2340 and 6180 ps, and the last one between 6880 and 10,000 ps. These time spans of clusters are consistent with those of plateau regions. The first cluster includes the initial and the rapid structural deviations. This suggests that weak activation energy is sufficient for the initial structure rearrangements. Therefore, the activation free energy for the structural changes around 960 ps is the major barrier. After transformation from transition state, the structures enter into the second cluster. The last cluster is comprised of the structures that have lost most native tertiary contacts and represent the The same procedures were applied to both apo-NCBD and apo-TAD, and yielded their transition states. Figure 6 also illustrates the average structures of transition states for apo-NCBD and bound NCBD. It is found that the transition state structure of bound NCBD is more native-like than that of apo-NCBD. In fact, the native hydrophobic contacts within bound NCBD are 32.76% and apo-NCBD with 27.59%. The average structures of transition states for both apo-TAD and bound  TAD are also shown in Figure 6. The long loop between helices a4 and a5 is extended in bound TAD, mainly because the helices a4 and a5 surround the broad hydrophobic cleft formed by three helices of NCBD. Similar to NCBD, bound and apo-TAD are also denature-like. This is consistent with the NMR experiments [10].

F-value Prediction
F values have been widely used by theoretical and experimental works to identify the key residues for protein folding [46,47,48]. The structures of transition state ensemble are used to predict the F values and shown in Figure 7. This figure illustrates that the F values of helix a1, a2, a3, a4, and a5 between bound and apo states have not significant different. To our surprise, the main difference is focused on the loop regions (residues Pro74 to Ala86) between helices a4 and a5 and relative larger F values in bound TAD. In addition, residues Met87, Asp88, Leu90 and Leu92 have larger F values. This is consistent with the experimental observation that these residues tend to form a distorted helix for the complex [10]. All predicted F values can be confirmed by experiments.

Recognition Mechanism
Conformational selection and induced fit are two widely used models to interpret the recognition between intrinsic disordered proteins [52]. According to the conformational selection paradigm, various conformational ensembles explore the free energy landscapes corresponding to diverse stable unbound states in equilibrium. During the binding process, the favorable conformation compatible with binding selectively stabilize, and the populations of conformational ensembles shift towards stabilizing state [53,54,55,56]. However, the induced fit scenario proposes that the favorable conformation results from significant changes of unbound ensembles upon allosteric binding [57,58,59,60]. It is worthy to point out that conformational selection and induced fit models cannot be distinguished absolutely [61]. Indeed, some systems involve kinetic elements of both mechanisms [62,63].
For this system, we analyze the structural deviations of NCBD and TAD upon binding. The possible magnitudes of conformational selection and induced fit [36] are calculated to reveal the recognition mechanism. To explore the recognition mechanism, the average RMSD deviations of bound conformation and apo conformations are analyzed as a function of distance from the centroid of binding partner and shown in Figure 8. This figure illustrates that the RMSD variation gradually increases until to the global level. This suggests that there is an induced fit far away for the binding site. Similar results are found for TAD (shown in Figure 8B).
To address the statistical significance for differences of deviations between these two systems, two sample Kolmogorov-Smirnov test [37] is used to calculate the P value for each distance group. Figure 8C illustrates the median of P values and the fraction with P,0.1 for all 100 pairs of NCBD conformations in each distance group. It is found that the median P values are typically smaller than 0.1 in most distance group, especially in some larger distance group with median P values approximates 0. The conformations with P,0.1 exceed 50% in most distance groups. This suggests that the bound NCBD is significant different from the apo conformation far away from the binding site and the differences are statistically significant. Similar results are found for 100 pairs of p53 TAD conformations. Based on the RMSD and pvalue analysis, the recognition between intrinsic disordered NCBD and TAD might obey an induced fit.
Average RMSD and KS test analysis suggest the possibility of induced fit in the recognition between intrinsic disordered NCBD and TAD. Next natural question to ask is if there is any global conformational selection and the relative magnitudes of induced fit and conformational selection in the NCBD-TAD recognition. The histograms of conformational frequency for induced fit and conformational selection are used to evaluate the relative magnitudes. Parameter D represents the probability-weighted difference between conformational selection and induced fit at global and local regions and is shown in Figure S1B. This figure suggests that the magnitude of conformational selection is quantitative larger than that of induced fit at global level, and smaller than that of induced fit at local level. This indicates that NCBD might adopt global conformational selection and local induced fit upon TAD binding. Furthermore, the global magnitude of conformational selections is also larger than that of induced fit and local magnitude of conformational selection is also smaller than that of induced fit for TAD. It reveals that TAD might also obey global conformational selection and local induced fit mechanism. These findings are consistent with the previous works that the intrinsic disorder protein [64] obeys conformational selection and induced fit mechanism [11,61].
Conformational selection overall offers little benefit to enhance protein stability upon binding. Interestingly, this is also observed in comparing unfolding half times of corresponding unfolding simulations. The unfolding half time of bound TAD (1.539 ns) is comparable to that of apo-TAD(1.473 ns). The unfolding half time of bound NCBD is also similar to that of apo-NCBD. The transition states between bound and apo states for NCBD and TAD are similar and denature-like. The W-values of helices a1, a2, a3, a4, and a5 between bound and apo states have not significant different. Therefore, high-temperature unbinding kinetics data further support a global conformational selection mechanism in the NCBD-TAD interactions. The folded state confirms the local conformational change (shown in Figure 3). The results suggest that the significant differences for NCBD and TAD are mostly focused on C terminal region. The free energy landscape for bound and apo states also confirms local conformational difference. Overall, these results support the existence of local induced fit for the recognition between NCBD and TAD.

Conclusions
Molecular dynamics (MD) simulations are used to study the recognition mechanism between intrinsic disordered proteins. The average RMSD values between bound and corresponding apo states and Kolmogorov-Smirnov P test analysis indicate that TAD and NCBD may follow a global conformational selection mechanism. Quantitative analysis indicates that the magnitude of conformational selection is more pronounced than that of induced fit interaction at global level. The local magnitude of conformational selection is smaller than that of induced fit for TAD and NCBD. This suggests that both TAD and NCBD have local conformational optimization. Therefore, the recognition between TAD and NCBD might follow global conformational selection and local induced fit. These conclusions are further supported by high-temperature unbinding kinetics and room temperature landscape analysis. These methods can be used to study the recognition mechanism of other intrinsic disordered proteins.