Decipher the Mechanisms of Protein Conformational Changes Induced by Nucleotide Binding through Free-Energy Landscape Analysis: ATP Binding to Hsp70

ATP regulates the function of many proteins in the cell by transducing its binding and hydrolysis energies into protein conformational changes by mechanisms which are challenging to identify at the atomic scale. Based on molecular dynamics (MD) simulations, a method is proposed to analyze the structural changes induced by ATP binding to a protein by computing the effective free-energy landscape (FEL) of a subset of its coordinates along its amino-acid sequence. The method is applied to characterize the mechanism by which the binding of ATP to the nucleotide-binding domain (NBD) of Hsp70 propagates a signal to its substrate-binding domain (SBD). Unbiased MD simulations were performed for Hsp70-DnaK chaperone in nucleotide-free, ADP-bound and ATP-bound states. The simulations revealed that the SBD does not interact with the NBD for DnaK in its nucleotide-free and ADP-bound states whereas the docking of the SBD was found in the ATP-bound state. The docked state induced by ATP binding found in MD is an intermediate state between the initial nucleotide-free and final ATP-bound states of Hsp70. The analysis of the FEL projected along the amino-acid sequence permitted to identify a subset of 27 protein internal coordinates corresponding to a network of 91 key residues involved in the conformational change induced by ATP binding. Among the 91 residues, 26 are identified for the first time, whereas the others were shown relevant for the allosteric communication of Hsp70 s in several experiments and bioinformatics analysis. The FEL analysis revealed also the origin of the ATP-induced structural modifications of the SBD recently measured by Electron Paramagnetic Resonance. The pathway between the nucleotide-free and the intermediate state of DnaK was extracted by applying principal component analysis to the subset of internal coordinates describing the transition. The methodology proposed is general and could be applied to analyze allosteric communication in other proteins.


Introduction
Biomolecular machines, as motor proteins [1] and ATPdependent molecular chaperones [2], undergo a major conformational change upon ATP binding and hydrolysis. In many cases, the ATP molecule regulates the protein function by an allosteric mechanism [3][4][5][6][7]: binding of ATP in the ATPase domain of these proteins transmits a conformational change to a distant binding site where the substrate binding propagates an analogous conformational change towards the nucleotide-binding site. The exact biophysical characterization of the mechanisms of propagation of a conformational change induced by ligand binding through a protein remains a major challenge in structural biology [8,9]. In biomolecular machines, as the ATP-dependent molecular chaperones [10], crystallographic data of the different nucleotidebound states do not provide generally a complete description of the mechanism by which the protein undergoes a conformational change, because the pathway between the different conformational states and the dynamical information are missing [5].
In recent years, NMR, among other techniques, contributed to the characterization of allostery and protein dynamics [8,9,11] including nucleotide-dependent conformational changes in molecular chaperones [12,13]. Conformational changes induced by a ligand can be mediated by perturbations of the mean protein conformation (enthalpic effect) as well as by changes of its fluctuations and dynamics (entropic effect) [5,9,[14][15][16]. Both enthalpic and entropic effects are encompassed in the free-energy landscape (FEL) of a protein [17]. In solution, the FEL is best regarded as a multi-dimensional surface with multiple local minima separated by barriers [17]. Different protein-bound states correspond to different FELs, with different shapes and distributions of populations of conformers [4,5]. Hence, the nucleotide binding and the nucleotide release in a biomolecular machine redistribute its conformational substates and modify its FEL [18].
In many cases, how the dynamics contributes to the conformational transition at the different time-scales of the protein motions and how the FEL is modified upon nucleotide binding remain elusive. Nowadays, standard all-atom molecular dynamics (MD) simulations [19][20][21] represent a powerful tool, complementary to experiments, for investigating the conformational sampling and nucleotide-dependent conformations of biomolecular machines [22][23][24][25]. The fully unbiased all-atom MD simulations permit to monitor the sequence of spontaneous conformational changes at the atomic level [26] and to translate the conformational fluctuations in terms of modified protein FEL. Although unbiased MD simulations of a fully solvated protein may extend to the millisecond range [27] with specific hardware [28], the current state-of-art of unbiased MD simulations correspond typically to 100 ns up to 1 ms time-scale for a system of several thousands of atoms in explicit solvent with the standard computational capabilities available to most of the research groups [29][30][31]. However, transition between active and inactive states in proteins occurs typically on the microsecond to the millisecond time-scales [32]. The complete conformational transition between different nucleotide-binding states of a molecular chaperone in fully unconstrained all-atom MD is thus still inaccessible [31]. To overcome this time-scale restriction, different computational strategies were developed to shorten artificially the time-scale of the conformational transitions. Targeted MD [33], accelerated MD [34], metadynamics [35] to cite a few allow the acceleration of the conformational sampling of all-atom MD by adding a bias potential. An alternative strategy consists to eliminate the fast degrees of freedom of the protein from the MD calculation by coarse-graining the protein structure. A recent application to the 70 kDa heat-shock protein (Hsp70), a key ubiquitous molecular chaperone [36], allow us to observe a short-life time full transition between an ATP-like state and an ADP-like state using a coarsegrained model where the solvent and the nucleotides were implicit [37]. However, current coarse-grained models of proteins lack the description of the crucial explicit interactions between the nucleotide and the protein which can be only described accurately at the atomic level.
In spite the fact that all-atom unbiased MD simulations are not able to follow the full conformational change of a large ATP-dependent protein, as Hsp70 analyzed in the present work, MD simulations probing the structural fluctuations around the end points of a conformational transition are useful to understand the conformational changes. Normal mode analysis (NMA) has been also extensively applied to study the conformational transitions in proteins because the directions of the low-frequency vibrational modes provide the direction of the largest deformation at thermal equilibrium and can serve as collective coordinates to describe the conformational changes [38,39]. Indeed, intrinsic low-frequency vibrational dynamics of proteins in different conformational states correlates with the structural changes between the different conformations induced by ligand binding or by protein-protein interactions [16,[40][41][42][43][44].
Aside the limited sampling issue of standard all-atom MD simulations of large conformational changes upon ligand binding, another issue is to extract from these data relevant information about the mechanism of propagation of a small perturbation at one ligand binding site to a different remote binding site. To get insight into the mechanisms, several analytical tools were applied to ATP-dependent conformational transitions such as the analysis of contact maps [45] and of the correlations between the residue motions [22]. In NMA, relevant information is gained by projecting the direction of the low-frequency modes of a given conformation on the shortest path between the two end structures of a conformational transition [1,23,[46][47][48][49]. None of these approaches is easily related simultaneously on the FEL of the protein and to mutagenesis experimental data.
In the present work, we addressed these two issues (MD sampling and relation between FEL and biochemical data) for the conformational change induced upon nucleotide binding to a molecular chaperone Hsp70 by using rather extensive unbiased MD simulations of the protein in three different nucleotidebinding states; nucleotide-free, ADP-bound and ATP-bound. Hsp70 is a key molecular chaperone which assists in the correct folding and refolding of proteins in all prokaryotic and eukaryotic organisms [2,36,[50][51][52][53]. Hsp70 s contribute to other crucial cellular processes including the protein degradation, the translocation of peptides across the cell membrane, the formation of protein complexes, and the inhibition of the programmed cell death (apoptosis) [54][55][56][57]. Among the Hsp70 s, human Hsp70 (hHsp70) has attracted great interest because of its demonstrated implications in numerous misfolding diseases [58] (Parkinson, Alzheimer…) and in cancer [59]. Hsp70 s of all species share common structural features and it is hypothesized that they perform their biological functions in a similar manner. Most mechanistic information about hHsp70 was derived from comparisons with the highly homologous bacterial E. coli Hsp70 (DnaK) (see Table S1). It should be mentioned however that a high homology of sequence between two proteins does not guarantee that their allosteric mechanisms and residues involved are similar [60]. In the present work, the calculations and analysis were performed for DnaK because a large set of structural and biochemical data are available (see Results and Discussion sections for a comparison with the present results and their implications for hHsp70). In particular, the allosteric communication in Hsp70 s, i.e. how the binding of a ligand in the nucleotide-binding domain (NBD) modifies the conformation of the remote substrate-binding domain (SBD) of the protein, was extensively studied for DnaK [12,[61][62][63][64][65][66][67][68][69]. The structural model used in the present work is the two-domain structure of ADPbound DnaK (residues 4-603, missing the C-terminal part) solved by NMR (PDB ID: 2KHO) [70]. Our first aim was to probe the conformational dynamics and the structural perturbations induced upon exchange of ADP by ATP and by the removal of

Author Summary
The precise biophysical characterization of the mechanisms of the protein conformational changes controlled by a nucleotide remains a challenge in biology. Molecular dynamics simulations of proteins in different nucleotidebinding states contain information on the nucleotidedependent conformational dynamics. However, it is difficult to extract relevant information about the conformation-induced mechanism from the raw molecular dynamics data. Herein, we addressed this issue for the major ATP-dependent molecular chaperones Hsp70 s, which contribute to crucial cellular processes and are involved in several neurodegenerative diseases and in cancer. To function, Hsp70 undergoes several conformational changes controlled by the state of its nucleotidebinding domain. We demonstrated that the analysis of the effective free-energy landscape of the protein projected along the amino-acid sequence and computed from the molecular dynamics simulations of Hsp70 in different nucleotide-binding states, holds the key to identify the key residues of the conformational induced pathway. Identification of the key residues involved in the propagation of the structural changes induced by ATP binding offer alternative druggable specific sites other than the ligand binding clefts. The methodology developed for Hsp70 is general and can be adapted to any ligand induced conformational change in proteins.
ADP from this NMR model of E. coli DnaK on the micro-second time-scale at the atomic scale.
The sampling issue of the unbiased MD simulations of E. coli DnaK was addressed by considering two trajectories with different initial conditions of the protein in each of its nucleotide-binding states and more important by restricting the data to a subset of the conformational degrees of freedom for which the convergence between the two trajectories was reached. This was achieved by projecting the FEL of the protein along its amino-acid sequence by computing the free-energy profiles (FEPs) along the coarse-grained dihedral angles (CGDAs) c defined from four consecutive C a atoms of the protein (Fig. S1) and describing the backbone motions [71,72]. Only the FEPs which were similar in the two trajectories of the protein in the same nucleotide-binding state were kept for the analysis of the allosteric communication, we named these FEPs the DATA. No information can be exploited from the other FEPs: they may or not be relevant to the conformational fluctuations of the protein (NO DATA set). With this approach, we were able to quantify how each FEP varies along the amino-acid sequence depending on the nucleotide-binding state of the protein and to relate the FEP modifications to biochemical data of wild-type or mutant E. coli DnaK chaperones. Indeed, the comparison between the DATA sets of FEPs for E. coli DnaK in different nucleotidebinding states lead finally to a subset of only 27 key dihedral angles involved in the conformational nucleotide-dependent transitions. The fluctuations of these 27 CGDAs c on their FEPs are subdiffusive [71][72][73]. The coupling between the CGDAs c can be revealed by analyzing the correlation of their motions using the dihedral principal component analysis (dPCA) [74] allowing the definition of collective motions to which these degrees of freedom contribute the most, in a similar manner to principal component analysis in Cartesian coordinates [75,76] or to the full-correlation analysis based on the information theory [77]. The study of the FELs was complemented by an analysis of the long-life time contacts between residues present in ATP-DnaK trajectories and not present in the trajectories of the protein in the other nucleotide-binding states.
The paper is organized as follows. Results are presented in the next section and are compared to available experimental data. The structure of Hsp70 s and the results of the unbiased all-atom MD simulations for E. coli DnaK are shown. There, we described and compared to available experimental data an intermediate state found by MD upon ATP-binding, named ATP*-Hsp70, where the two domains of the protein are docked on a submicrosecond time-scale. The construction of a map of persistent residue contacts in ATP-bound DnaK conformational dynamics is described and compared to available experimental data. Next, the methodology to cope with the limited sampling of the present unbiased MD simulations and to extract the key residues from the analysis of the FEPs along the amino-acid sequence is presented, as well as its application to the MD simulations of ATP-bound DnaK. Then, a comparison between the key residues extracted from this new methodology with biochemical and structural data on the allostery in Hsp70 s is discussed in detail. The paper ends with a short conclusion where the major results are summarized and with the Methods section.

Conformational changes of E. coli Hsp70 (DnaK)
Conformations of Hsp70 s from experiments. All homologous Hsp70 s share a common architecture composed of a 45 kDa N-terminal NBD [78] which catalyzes ATP hydrolysis, and of a 25 kDa C-terminal domain [79] which is divided into SBD, which binds to non-native or unfolded substrate proteins, and a C-unstructured terminal domain ( Fig. 1 A). The N-terminal 45-kDa NBD is divided into two rather symmetrical lobes, I and II ( Fig. 1 A), each divided into two subdomains A and B [78]. The SBD is composed of a b-sandwich subdomain (SBD-b), which has a hydrophobic peptide binding pocket, and of a a-helical ''lid'' subdomain (SBD-a), which regulates the access of the substrate to the SBD-b [79]. The NBD and the SBD are connected by a conserved hydrophobic linker [80].
Hsp70 s occur in two main conformations to perform their chaperone function, we named closed conformation (with ADP bound to the NBD or without nucleotide = APO, Fig. 1 A) and open conformation (with ATP bound to the NBD, Fig. 1 B). In the nucleotide-free Hsp70 and in ADP-bound Hsp70, the SBD is assumed to be closed with low binding and release rate of the protein substrate. However, recent experiments shown that the complete closure of the SBD is not strictly required in the chaperone protein interactions [81]. In absence of ATP, the SBD and the NBD are undocked and the inter-domain linker is exposed to solvent, as shown by low-resolution Small Angle X-ray Scattering (SAXS) data [82,83], Förster resonance energy transfer (FRET) data [84,85] and by the two-domain Hsp70 NMR derived-structure (PDB ID: 2KHO) [70] (Fig. 1 A). In ATP-bound Hsp70, the SBD is open with fast binding and release of the protein substrate, and the SBD and NBD are docked, as shown by SAXS data [82,83] and FRET data [84,85] and suggested by the X-ray structure of an ATP-Hsp110 homolog [86,87] and by a new open conformation of an ATP-bound DnaK mutant (PDB ID: 4B9Q) [69] stabilized by introducing disulphide bridges at specific positions deduced from the structure of the ATP-Hsp110 homolog ( Fig. 1 B).
The transition between the open and the closed structural states of Hsp70 is triggered by the ATP binding to the NBD which induces a conformational change of the inter-domain linker and the opening of the SBD [61,62], through an allosteric mechanism which is not fully elucidated [12,[61][62][63][64][65][66][67][68]. The ATP hydrolysis restores the closed conformation of the SBD [59,60]. In vivo, the ATP hydrolysis and the nucleotide exchange rates are increased by co-chaperones [88].
Unbiased MD simulations of the APO and ADP-bound DnaK reveal that the SBD is moving freely around the NBD whereas MD simulations of the ATP-bound DnaK reveal the docking of the SBD onto the NBD. To get insights into the mechanism of the conformational dynamics of E. coli DnaK, we focused here on the spontaneous conformational transition, i.e. from the closed to the open conformation ( Fig. 1 C), induced by the binding of ATP to the NBD of DnaK in absence of protein substrate [61,62]. We performed 2 unbiased all-atom MD simulations in explicit solvent with different initial conditions in each nucleotide-binding state of the NBD, i.e. nucleotide-free (APO state), ADP-bound and ATP-bound for a total simulation time of 1.9 ms. The six MD trajectories (APO1, APO2, ADP1, ADP2, ATP1, and ATP2) are described in the Methods section.
First, MD simulations of the nucleotide-free and the ADPbound DnaK revealed that the SBD is moving freely around the NBD. Indeed, Fig. 2 clearly shows that the different conformational sub-states of APO (Fig. 2 A) and ADP-DnaK (Fig. 2 B) correspond to different orientations of the SBD relative to the NBD. Structures of E. coli DnaK upon ADP-binding and without nucleotide are elongated with no strong interactions between the two main domains, as expected by comparison with experimental data and particularly SAXS experiments [83]. In addition, Figs. 2 A and 2 B agree with the hypothesis of a SBD diffusing around a cone, as suggested from residual dipolar coupling NMR data of E. coli DnaK [70]. By contrast, MD simulations of ATP-bound DnaK reveal conformational changes. Indeed, significant structural changes and particularly the docking of the SBD onto the NBD (more precisely on the lobe I of the NBD) were found upon the binding of ATP (Fig. 2 C) in the two MD runs ATP1 and ATP2. In addition, differences between the MD runs ATP1 and ATP2 were observed. Actually, it seems that there are (at least) two different binding positions of the SBD onto the lobe I of the NBD, more precisely there are two orientations of the SBD, bound to the NBD (Fig. 2 C). The time-scale of the present MD simulations is too short to observe the full conformational change between the ADP-bound and the ATP-bound states of Hsp70: in other words, the opening of the substrate binding domain was not observed (Fig. 2 C). However, we found that the ATPbinding to the NBD of E. coli DnaK induces the docking of a closed SBD onto the NBD (we named this intermediate state ATP*) whereas this docking was not observed for ADP-bound DnaK and for APO-DnaK.
Comparison of ATP*-DnaK structure with experimental and MD data. In spite of the fact that the SBD of ATP*-DnaK remains closed on the time-scale of the present MD simulations, its structural properties are compatible with SAXS [82,83], as shown in Fig. 3. In all SAXS experiments [82,83,89], the structure of ATP-Hsp70 was more compact than the structure of ADP-Hsp70: the difference between their radius of gyration, DR g = R g (ADP)2Rg (ATP), was about 5 Å for E. coli DnaK [83] and between 2 and 7 Å for bovine Hsc70 (bHsc70) [82,89]. In our MD simulations of E. coli DnaK, DR g varies between 3.6 Å and 6.8 Å in good agreement with SAXS data. The distribution of the distance r between two heavy atoms, P(r), in ATP-bound DnaK and ADPbound DnaK computed from MD simulations are also in good agreement with the distribution P(r) of Hsp70 homologs measured by SAXS and with the distribution P(r) of E. coli DnaK computed from the experimental X-ray [69] and NMR [70] resolved 3D structures (Fig. 3). In MD, P(r) has only one peak at about 30 Å for ATP-DnaK ( Fig. 3 A), in agreement with the distribution PLOS Computational Biology | www.ploscompbiol.org extracted from the SAXS data of ATP-bHsc70 [82] (Fig. 3 B) and from the X-ray data of ATP-DnaK [69] (Fig. 3 A). On the contrary, the functions P(r) extracted from SAXS data of ADP-bHsc70 [82] (Fig. 3 B) and from NMR data of ADP-DnaK [70] (Fig. 3 A) and the one calculated from the MD simulations of ADP-DnaK (Fig. 3 A) show two peaks: one at 30 Å and a distinct shoulder at 60 Å , reflecting the bilobal shape of the protein in this nucleotide-binding state.
The intermediate structure of ATP*-DnaK is different from the very recent crystallized triple mutant ATP-bound DnaK (E47C, T199A, F529C) in an open conformational state with a SBD-a covalently bound to the NBD through a disulphide bridge 47C-529C (PDB ID: 4B9Q) [69]. In ATP*, as in ATPbound DnaK (Fig. 1 B), the linker forms a b strand interacting with the loop between the strands b2 and b2' of the SBD-b. However, the position of the linker in ATP-bound DnaK is shifted relative to the one in ATP*: a contact is formed between D393 and I418 in ATP-bound structure whereas a contact is formed between D393 and N415 in ATP* (see next subsection) [69]. Recent NMR studies suggest that DnaK occur in at least three states: a closed conformation (ADP-bound with a flexible linker as the NMR-derived structure [70]), an open conformation (ATPbound with the linker bound between the subdomains IA/IIA of the NBD and the NBD/SBD docked as in the crystallized structure [69]), and an intermediate allosteric active state (where the closed SBD bound to a substrate and the NBD domains interact with the linker bound to the NBD). This intermediate state is however different from ATP*-DnaK found in MD as it occurred when both ATP and the protein substrate are bound to the NBD and to the SBD, respectively.
The intermediate state ATP*-DnaK was the most frequent SBD/NBD transient docked state found in our recent intensive coarse-grained MD simulations of E. coli DnaK using a completely unrelated force-field [37]. On the contrary, in the recent all-atom MD simulations of DnaK by Chiappori et al. [25] using the same force-field used in the present work, the authors did not observe the docking of the SBD onto the NBD. They did find however a motion of the SBD towards the NBD (Fig. 2 in Ref. 25) which resembles to the one leading to the ATP*-DnaK intermediate state found here. It is worth noting however that the number and the duration of the MD simulations in Ref. 25 were much smaller than the present ones: the trajectories were typically of about 50-100 ns of duration with the longest duration being 225 ns for an ATPbound DnaK trajectory [25]. The absence of ATP* in the calculations of Ref. 25 may arise simply from the too short duration used in this earlier work [25] for the random initial condition chosen.

Specific contacts in ATP-bound DnaK
In order to characterize the interdomain interactions leading to the ATP*-DnaK intermediate state ( Fig. 2 C), as well as the interactions between the nucleotide and the NBD of E. coli DnaK, the contact maps (CMs) of ADP and ATP-bound DnaK and the contact map difference (CMD) ATP/ADP were computed for the MD trajectories of E. coli DnaK in these two different nucleotidebinding states (see Methods section for the calculations of CM and CMD).
The CM of ATP-bound DnaK ( Fig. S2 A), which is composed of 1994 contacts, contains only 6 interdomain contacts (contacts between NBD/linker, linker/SBD or NBD/SBD): R167-L390, R167-L391, L391-T417, L391-P419, L392-T416 and D393-N415 ( Fig. 4 A). The CM of ADP-bound DnaK has no interdomain contacts ( Fig. S2 B), which means that the interdomain contacts are unique to the ATP* intermediate state, as revealed by the CMD ATP/ADP (Fig. S2 C). In MD, this major difference about the interdomain contacts in ADP-bound and in ATP-bound DnaK arises from a different interaction between the NBD and the linker in these two nucleotide-binding states, as shown by the different distances R167-L390 and R167-L391 in these two states ( Table 1).
The comparison between MD results for ADP-bound and ATPbound DnaK shows that the communication between the NBD, the linker and the SBD involves the residue R167 of the subdomain IA of the NBD, the residues L390 and L391 of the linker and the residues 415-419 of the subdomain b of the SBD. These residues were identified to be crucial for the interdomain communication in DnaK. Indeed, the mutation R167A induces a loss of allosteric coupling between the NBD and the SBD [64] and the mutations K414I [90] and N415G [91] eliminate and attenuate allostery, respectively. The residues L390, L391 and L392, which belong to the VLLL motif of the linker (highly conserved between the different species of Hsp70 [67]), was shown essential for E. coli DnaK's in vivo functions [67]. In addition, it has been proposed that ATP binding to the ATPase domain causes a bending of the C-terminal part of the linker of E. coli DnaK at D393, in order to facilitate an insertion of the linker between the NBD and the SBD [64]. In the present MD simulations, the coupling between the NBD and the SBD confirms the insert of the linker upon ATP binding between the two domains through the interactions R167/L390 and D393/N415.
The contacts between ATP and the NBD were identified (  The analysis of the contact maps provides only a static representation of the MD data and of the influence of ATP binding on the conformations of E. coli DnaK. The modifications of the protein dynamics induced by the nucleotide and the correlations between the different structural modifications along the DnaK amino-acid sequence are not easily interpreted from this sole analysis. As stated in the introduction, the FEL holds the key to understand how a nucleotide modifies the properties and populations of the protein conformers. To provide the link between the FEL and the global conformational change observed in ATP-bound MD trajectories, we analyze next the free-energy profiles (FEPs) of the CGDAs c (see Fig. S1 and Eq. 1) along the amino-acid sequence of E. coli DnaK.
Analysis of the main-chain FEPs along the amino-acid sequence to detect key residues involved in the conformational changes In order to identify which residues are involved in the nucleotide signal propagation along the backbone of E. coli DnaK, we examined which of the 597 CGDAs c have a FEP depending on the nucleotide-binding state of the NBD. However, the FEPs of DnaK in a given nucleotide-binding state computed from the present unbiased MD trajectories are effective FEPs. The effective FEP differs from the actual FEP, which is an equilibrium thermodynamic property and should be computed from a large ensemble of trajectories of the protein in the same nucleotide state. However, the size of the system simulated (about 500000 atoms) and our computing capabilities limited the number of trajectories to only two trajectories of several hundred of nanoseconds per nucleotide-binding state of the NBD. Because of this limited sampling issue, only the FEPs which are similar in different trajectories in a given nucleotide-binding state were considered.
Convergence of the FEPs.     Table 2). The combination of the three different comparisons reveals common CGDAs c for which the FEP is significantly influenced by the nucleotide-binding state of the NBD (  (Fig. S3). It means that c 387 has a specific FEP in ADP-bound DnaK or in nucleotide-free DnaK but another FEP in ATP-bound DnaK (Fig. 6). We named hereafter c 387 an ATP-specific angle. Another example is c 98 which has a FEP weakly influenced by the nucleotide in the comparison APO/ATP  Fig. S3. It means that c 98 has a specific FEP in ATP-bound DnaK or in nucleotide-free DnaK but another specific FEP in ADP-bound DnaK (Fig. S4). We named hereafter c 98 an ADP-specific angle.
Finally, 7 CGDAs c are APO specific (for example c 213 , Fig. S4), 6 CGDAs c are ADP specific, 9 CGDAs c are ATP specific and 5 CGDAs c are specific to each different nucleotide-bound state (for example c 465 , Fig. S4). These 27 CGDAs c ( Fig. 7 and Table 3 (Fig. 7). In other words, the modification of the (unknown) FEL of DnaK upon ligand binding [4,7] can be represented here quantitatively by its projection over 27 dimensions. Obviously, these 27 coordinates are not independent and the local motions of the CGDAs c upon nucleotide-binding are coupled to each other. In order to understand how the binding of ATP to the NBD of DnaK induces the docking of its SBD onto its NBD, we established the relation between these 27 local motions c and the global motions (domains) of ATP-DnaK by applying dPCA [74].
The dPCA was carried on the subset of 27 CGDAs c using the MD trajectories of ATP-DnaK (see Methods section for more details). Results for the MD trajectory ATP1 are shown (the results for the MD trajectory ATP2 lead to the same conclusions [92]). Only modes 1 and 2 contribute significantly to the MSF of the 27 CGDAs. Indeed, for the MD run ATP1, l 1 contributes 74% of the total covariance of the system and l 2 8% whereas for the MD run ATP2, l 1 contributes 85% of the total covariance of the system and l 2 3%; justifying to use only the two collective modes 1 and 2 as the minimal representation of the free-energy landscape of ATP-DnaK.
The projections of the vectors u n (t) (Eq. 4) extracted from the MD trajectory along the eigenvectors of modes 1 and 2 defined the collective coordinates dPC 1 (t) and dPC 2 (t), respectively (Eq. 7). From the two-dimensional PDF of dPC 1 and dPC 2 computed from the MD trajectory, i.e. P(dPC 1 , dPC 2 ), the effective FES V 1,2 = -k B T ln[P(dPC 1 , dPC 2 )], shown in Fig. 8, was calculated. The largest motions defined by the first and second dPCs can be interpreted in terms of specific features of the two-dimensional FES V 1,2 because the dPCA method is converged in the subspace of the 27 dihedral angles considered (see Methods section and Refs. [93] and [94]). The FES of the MD run ATP1 shows 5 global minima A i (Fig. 8) defined from the isolines of free-energy (#3 k B T). The minima A i are ordered according to the chronology of the MD trajectory,   from i = 1 to 5. The most probable structure within each minimum A i of the MD run ATP1 is shown in Fig. S5: the structure corresponding to the minimum A 1 is an elongated structure, without interactions between the NBD and the SBD and closely similar to the initial one (PDB ID: 2KHO) [70]. In the basin corresponding to the minimum A 2 , the SBD rotates around the linker and get close to the NBD but without strong interactions between the two main domains. In the structure of the global minimum A 3 of the MD run ATP1, the SBD is docked to the lobe I of the NBD (Fig. S5). Note that the lid of the SBD (SBD-a) is docked in the subdomain B of the lobe I of the NBD. Structures corresponding to the minima A 4 and A 5 , are characterized by the docking of the two domains, as observed in the minimum A 3 , but with local conformational changes in the SBD (for example of the CGDA c 504 , Fig. S6).
In the representative structure of the global minimum of the MD run ATP2, as shown in Fig. 2, the SBD is docked to the lobe I of the NBD but in a different position, compared to the global minimum of the MD run ATP1. Although the two end points of the ATP-bound DnaK trajectories are different, the ground-state (on the time-scale of our simulations) is identical in the space of the subset of CGDAs with a similar FEP. Indeed, each of the 27 CGDAs c is in the same local conformation in the structures corresponding to the global minimum of the FESs computed from the MD runs ATP1 and ATP2, as shown in Fig. S6.
Because the dPCA catches the most important fluctuations occurring in a MD trajectory, the application of the dPCA analysis to the APO and ADP-bound DnaK trajectories produce different behaviors. The FES of the ADP-bound DnaK trajectories (computed from the dPC 1 and dPC 2 on the 27 dihedral angles) shown no transition state between the initial and final conformations of the protein [92]. The FES of the nucleotide-free (APO) DnaK trajectories (computed from the dPC 1 and dPC 1 on the 27 dihedral angles) shown three basins: a small basin corresponding to the initial state, a transition state, and a final basin corresponding to the final state [92]. The transition state corresponds to an intermediate state where the SBD starts to rotate on a cone due to a change of the linker conformation [92]. Free-minimum energy path on the two-dimensional FES of ATP-bound DnaK. The initial A 1 and final A 5 conformations are located at the bottom of two ''basins'' of the FES (Fig. 8) which can be connected by a pathway of minimum energy (PME) (Fig. 9 A). The PME for the transitions A 1 RA 2 RA 3 RA 4 RA 5 on the FES of the MD run ATP1 was computed as described in the Methods section. The PME was divided into 74 steps (from s 0 to s 73 ) with minima A 1 , A 2 , A 3 , A 4 and A 5 at respectively s 0 , s 14 , s 36 , s 63 and s 73 ( Fig. 9 A and B). The PME shows 4 saddle points for each transition: at s 9 for the transition A 1 RA 2 ; at s 18 for the transition A 2 RA 3 ; at s 54 for the transition A 3 RA 4 and at s 68 for the transition A 4 RA 5 . The largest free-energy barrier along this pathway is observed for the transition A 3 RA 4 (Fig. 9 B) and is larger than 4.0 k B T (<2.5 kcal/mol). From the FES ( Fig. 9 A), we extracted the coordinates of the bin (dPC 1 ; dPC 2 ) of each step s j along the PME. Each step s j of the PME, which is a bin of the FES (see Methods section), represents a number N(s j ) of frames of the corresponding MD trajectory. For example, the step s 1 corresponds to the bin (dPC 1 = 21.53; dPC 2 = 0.64) and to N(s 1 ) = 87 frames of the MD run ATP1. From the N(s j ) frames, we computed the 1-D FEPs V j (c i ) of the 27 CGDAs c used in the dPCA, and we defined the value c i (s j ) as the minimum of the FEP V j (c i ) (the most probable value of the CGDAs c i at each step s j ). Finally, for each of the 27 CGDAs c i , we projected the trajectory c i (s j ) along the PME and along its corresponding one-dimensional FEP V(c i ) (shown in Fig. S6) to visualize the coupling between the local and the global collective motions described by dPC 1 and dPC 2 (as shown in Fig. S7). The CGDAs c with the largest changes along the PME are the CGDAs having multiple-minima FEPs (except for c 387 ) (Fig. S7), corresponding also to those having a significant influence (see Eq. 5 and Fig. S8) in the dPC modes [D i 1 (mode 1) and D i 2 (mode 2).10%], i.e. c 77 , c 98 , c 213 , c 387 , c 407 , c 464 , c 504 and c 519 . In addition, others CGDAs c i show smaller changes along the PME, for example c 250 , c 432 , c 444 , c 465 or c 492 . In total, these 13 c i CGDAs correspond to 49 residues.
At s 0 , corresponding to the minimum A 1 on the FES (Fig. 8), the structure is still elongated and without interactions between the NBD and the SBD and is closely similar to the initial structure (Fig. S9). From s 0 to s 9 (corresponding to the saddle point of the transition A 1 RA 2 ), we observed the rotation of the SBD around the linker (Fig. S9), due to the relaxation of c 98 , c 465 and c 492 towards the minimum of their FEPs (Fig. S7), involving a reorganization of subdomains IB of the NBD and of the SBD-b (Fig. S9). The system goes through this saddle point from s 9 to s 14  shown with gray (green for the most probable one) and red diamonds, respectively. The PME is represented by the red line. B: Free-energy V PME along the PME for the transition A 1 RA 2 RA 3 RA 4 RA 5 . C: Results of the clustering along the PME. Each frame is represented at the corresponding time t observed in the MD trajectory. doi:10.1371/journal.pcbi.1003379.g009 as a consequence of jumps of c 213 , c 387 and c 432 to local minima of their FEPs and last but not least c 504 jumps to its most probable conformation (Fig. S7). These changes involve a large rotation of the SBD of ATP-DnaK around the linker, which becomes close to the NBD (Fig. S9) at s 14 , corresponding to the minimum A 2 . The rotation of the SBD is thus related to the relaxation of dihedral angles c 98 , c 465 and c 492 involving a reorganization of subdomains IB of the NBD and of the SBD-b (Fig. S9). The angle c 98 is located within the NBD and coupled to the 27 other dihedral angles, in particular to c 12 built on the residues T11, T12, N13 which are in contact with ATP (Fig. 4) providing a physical link between the SBD rotation and the ATP/NBD interactions. For the transition A 2 RA 3 (from s 14 to s 36 ), c 387 , c 444 and c 519 change their conformation (Fig. S7), allowing the system to go through the saddle point at s 18 , which involves a rotation of the subdomain SBD-a of ATP-DnaK, which becomes close to subdomain IB of the NBD (Fig. S9). Then, the system relaxed to a state where the SBD is finally docked to the lobe I of the NBD (Fig. S9) at s 36 with changes of c 77 , c 213 , c 250 , c 387 and c 432 (Fig. S7), and adopted its most stable structure in the MD trajectory ATP1, corresponding to the global minimum A 3 of the FES (Fig. 8).
In the MD trajectory ATP1 (on the contrary to trajectory ATP2, data not shown), the system did not end in its most stable state (A 3 ) but escaped from this minimum because of changes of c 213 , c 464 and c 465 which jumps to local metastable states of their FEPs, which allows the system to climb the highest energy barrier (4 k B T) along the PME at s 54 (Fig. 9 B). The system goes through this saddle point of the transition A 3 RA 4 at s 54 with changes of c 98 and c 407 (Fig. S7) which jump to local minima of their FEPs and the system relaxed to the minimum A 4 . The structure corresponding to the minimum A 4 is still characterized by a docking of the two domains (Fig. S9) but with local changes in the SBD, particularly in the SBD-b. Finally, the last transition which occurred is the transition A 4 RA 5 , with a saddle point at s 68 (Fig. 9  B). The system goes through the saddle point by a change of c 407 coupled with a change of c 504 up to the second most probable state along the PME, namely A 5 (Fig. S7). The structure in A 5 is characterized by the docking of the SBD on the NBD but with structural changes within the SBD and particularly within the SBD-b compared with the representative structure of the minimum A 3 (Fig. S9). Combination of the global (dPC) and local (c) coordinates allow us to decipher the mechanistic process by which the protein undergoes a conformational change upon ATP binding along the PME.
Another point to figure out is the relation between the PME on the two-dimensional FES and the MD trajectory or in other terms, the relation between the step s j of the PME and the time t of the different events observed in the MD trajectory. As explained in Methods section, each step s j of the PME is characterized by a number of frames N(s j ) of the MD trajectory. So we extracted and plotted the time t for each frame i, i = 1 to N(s j ) (Fig. 9 C). Starting from the minimum A 1 , the MD trajectory reaches the docked state, which corresponds to the minimum A 3 after 150 ns. The basin corresponding to this minimum is explored during the major time period of the MD trajectory (up to 400 ns). The ''X'' shaped of the graph during this period (between 150 and 400 ns) shows us that it exists different attempts to escape this minimum A 3 (there are in fact three entrances/exits into/from this state) but this only happens when all the conditions given previously in the text about local conformational changes are reached (at t = 400 ns). As shown in Fig. 9 A, it is due to the existence of the highest barrier between the minima A 3 and A 4 . Finally, the system reaches the minimum A 5 at t = 500 ns and stay in the corresponding basin up to the end of the trajectory.

Discussion
The analysis of the FEPs, as shown in the previous section, revealed the 27 CGDAs c of E. coli DnaK influenced by the nucleotide-binding state of the NBD (Table 3). The 91 residues associated to these 27 CGDAs c (Table 3) are involved in the longrange (on about 5-10 nm) propagation of the conformational fluctuations between the NBD and the SBD of E. coli DnaK. Starting from c 12 in the NBD, which corresponds to residues in direct contacts with the ATP nucleotide, the signal induced by the nucleotide binding is propagated up to c 524 , in the SBD, through c 387 , located in the inter-domain linker (Fig. 7). Experimentally, Xray crystallography, NMR spectroscopy, biophysical and biochemical techniques have contributed to understand such an allosteric communication in Hsp70 s (Table 3). A large comparison between the residues elucidated from the analysis of the FEPs similarity computed from the MD simulations (Table 3) and the experimental database of key residues relevant for the communication in DnaK available in the literature, shows that most of the residues identified by our methodology have been found to be experimentally relevant for the allosteric communication in Hsp70 s and that new residues can be proposed to play a role according to the present MD calculations.
In more detail, the comparison in Table 3 can be summarized as follows. In the NBD, mutation of residue T13 in bovine Hsc70 (T11 in E. coli DnaK) to S13 completely abrogated the communication between the NBD and the SBD [95]. Residue T13 was indeed found from the analysis of the FEP of c 12 , which belongs to the nucleotide-binding pocket (Fig. 7). Mutation in bHsc70 of residue K71 (K70 in E. coli DnaK), which is the only residue of subdomain IB in contact with the ATP molecule, affects the ATP hydrolysis [96] and also abrogated the ATP-induced conformational changes determined by SAXS experiments [89]. In the network extracted from our analysis, several CGDAs c, having a significant role for signal propagation, belong to subdomain IB of the NBD and particularly c 65 and c 77 , which enclosed residue K70 (Fig. 7). In addition, fluorescence of W102, the only tryptophan residue of E. coli DnaK [97] is directly dependent on the ADP/ATP conformational changes in the NBD and was identified from the analysis of the FEP of c 102 in the present MD simulations. In the NBD, the interface between the subdomains IA and IIA plays a key role in Hsp70, acting as a cleft for the binding of the interdomain linker [12,65,86] in the ATPbound state (open structural state) of Hsp70 [65]. The analysis of the FEPs of DnaK reveals the important role of CGDA c 213 , which is precisely localized in subdomain IIA, and of the CGDA c 387 , which is part of the linker (Fig. 7).
The linker is the physical link between the NBD and the SBD and was identified early as a key region for the allosteric communication in Hsp70 s [64,65,67,98]. The linker is flexible and solvent exposed in the nucleotide-free and in the ADP-bound DnaK and is docked in the IA/IIA cleft of the NBD in the ATPbound state. In the present MD simulations, the CGDA c 387 belonging to the linker occurs in a specific conformation in ATP-DnaK (Fig. 6), confirming that ATP binding lead to a conformational change in the linker region (even if the SBD stays closed in the present ATP-bound DnaK trajectories).
Finally, the dynamics of the subdomain IIB of the NBD, and particularly its rotation, is known to be allosterically relevant [12,66,69,99]. The rotation of the subdomain IIB occurs upon nucleotide exchange from ADP to ATP and the binding of ATP involves the opening of the nucleotide-binding cleft relative to an ADP-bound NBD. These findings are confirmed by our analysis of the FEPs which revealed that c 249 , c 250 and c 253 are influenced by the nucleotide; these CGDAs c are localized in the loop between two a-helices of the subdomain IIB of the NBD for which large changes in the NMR chemical-shifts between the ADP-bound and the ATP-bound NBD of E. coli DnaK were observed [12] Most of the CGDAs c located in the SBD of DnaK influenced by the nucleotide-binding state of the NBD are located in the SBD-b subdomain (Fig. 7). These CGDAs are related to residues previously identified experimentally. Indeed, c 408 , c 432 and c 444 involve residues M408, D431 and E444 which modify the substrate binding and affinity to E. coli DnaK [100,101]. In addition, residue I462, belonging to c 463 , has been shown to be allosterically relevant due to the fact that the dual DnaJ and peptide-binding defects were observed in I462T [102]. In addition, amide hydrogen exchange experiments [98] revealed that the binding of ATP in the NBD of Hsp70 involves local structural changes in the segments 413-437 (which encloses the CGDA c 432 found in the analysis of the FEPs) and 439-457 (which encloses the CGDA c 448 found in the analysis of the FEPs) of E. coli DnaK (Fig. 7).
Other CGDAs c revealed by our analysis (see Table 3 and Fig. 7) such as c 98 , c 339 , located in the NBD, c 490 and c 492 , located in the SBD-b, or c 504 , c 508 and c 524 located in the SBD-a provide new hypothesis for the residues relevant for the allosteric communication in Hsp70. Interestingly, the residue N98 contacts G506 in the crystal form of ATP-bound DnaK [69]. The mutation V337F, a residue preceding the residues defining c 339 strongly reduces the capacity of DnaK to refold luciferase [103]. Finally, c 524 includes the residue D526 forming a highly conserved ionic pair with R445 which stabilizes the SBD-a/SBD-b interactions through an ionic network including also K446 (precedes c 448 , Table 3), D518 (belonging to c 519 , Table 3) and D450 (belonging to c 448 , Table 3) [104]. Two GCDAs c are located at the interface between the SBD-b and the SBD-a, namely c 504 and c 508 and one is located in the kink between the helices A and B of the SBD-a, namely c 524 (Fig. 7). This demonstrates that even if the SBD remains closed in ATP*-DnaK, local conformational changes within the SBD were observed in MD.
The conformational changes of the SBD induced by ATP binding in MD can be related to recent experimental data. Indeed, EPR spectroscopy [81] has been recently used to investigate the conformations of the SBD-a relative to those of the SBD-b of E. coli DnaK. Distance distributions between the residues E430 and R547 of the nucleotide-free DnaK and of ATP-bound DnaK were extracted from the EPR spectra (Fig. 10). The E430/R547 distance distribution measured in nucleotide-free DnaK shows two interacting spin populations, one around 12 Å and one with a broader population between 16 Å and 20 Å (Fig. 10 A, black  arrows). Only a small fraction of the labeled residues was not interacting (fraction of non-interacting spins, f NI = 0.09), corresponding to residues distant by more than 20 Å . Upon ATP binding, the short-distance population disappeared and almost 80% of the spin labeled residues was separated by more than 20 Å , whereas in 20% of the molecules, the distance was in a range similar to the nucleotide-free state (Fig. 10 C and D, black arrows).
To compare with EPR data [81], we computed the probability distribution of the distance between the centers of mass of the side chains of E430 and R547 from the MD trajectories of APO-DnaK and ATP-DnaK. Up to 20 Å , the distance distribution of nucleotide-free DnaK shows two populations; a broad distribution centered at 14.6 Å and a narrower distribution centered at 18.7 Å (Fig. 10 A). Note that the MD simulations provide information on the E430/R547 distance distribution for distances inaccessible to EPR (.20 Å ): in the case of the nucleotide-free DnaK, a third population is observed and is characterized by a broad distribution centered at 24.7 Å (Fig. 10 C). The fraction of distances larger than 20 Å in MD of the nucleotide-free DnaK (f NI ) is about 45%. Upon ATP binding, the population of the shortest distance observed in the nucleotide-free DnaK (centered at 14 Å ) disappears (Fig. 10 B), as observed in EPR experiments, and the distance distribution shows two peaks centered at 20.5 Å and 23.1 Å , respectively (Fig. 10 D). Consequently, the fraction of distances larger than 20 Å increases up to 73%, as observed in EPR (Fig. 10 C and D). These MD results show that the addition of ATP increases the distances between the SBD-b and the SBD-a even if the SBD remains closed. In addition, the MD simulations provide a possible structural origin for the disappearance of the population of molecules with the shortest E430/R547 distance upon ATP-binding in the experiment. Indeed, we performed a clustering of the E. coli DnaK structures within each subpopulation of the distance distributions shown in Fig. 10 B and D and computed the FEPs of each of the CGDA c selected previously and which belong to the SBD (see Table 3). From these calculations, we found that only the FEP of c 504 was different in each subpopulation of molecules having a different E430/R547 distance ( Fig. 10 E and F). In the APO state, the structures belonging to the shortest distance population have a most probable value of c 504 equal to 71.5u whereas it is 55.5u for the structures in the 2 largest subpopulations (Fig. 10 E). In the ATP state, c 504 moves more freely compared with the APO state ( Fig. 10 F), meaning that the lid is more mobile in this nucleotide-binding state compared with the nucleotide-free state, which could explain that the shortest distance population distribution observed for the nucleotide-free DnaK disappears for the ATP-bound DnaK.
A statistical analysis of 926 Hsp70 sequences pointed to interdomain sectors that might mediate the allosteric communication between the NBD and the SBD of Hsp70 [91]. Although it is unclear that the allosteric mechanism and residues can be extracted from the sole comparison of sequences [60], we found that these sectors correspond to key CGDAs found from the present analysis of the FEL of DnaK. The sectors comprise the  Table 3). In total, 19 residues (involved in 10 CGDAs) out of 91 residues (involved in 27 CGDAs) revealed by the present MD simulations were also found by a statistical analysis of Hsp70 sequences [91]. Due to the importance and relevance of hHsp70 in diseases, it is instructive to examine to which extent the residues identified in the allosteric mechanism of DnaK by MD are conserved in hHsp70. In order to answer this question, we performed the alignment of the sequence of E. coli Hsp70 (Uniprot ID: P0A6Y8) on the sequence of the human inducible Hsp70 (Uniprot ID: P08107) ( . In addition, at least one residue of each CGDA found to be dependent on the nucleotide-binding state of the NBD of DnaK is common between human and the E. coli Hsp70 s. However, one emphasizes that the conservation of the residues involved in the conformational changes of DnaK do not necessarily guarantee that the allosteric mechanisms of DnaK and hHsp70 are similar [60]. Residues involved in the dynamics of the conformational change of DnaK are also largely conserved in hHsp70. In DnaK, 49 residues are involved in the collective modes 1 and 2 computed from the dPCA of the MD trajectories as identified by their influence (Eq. 5) D i 1 (mode 1) and D i 2 (mode 2) (Fig. S8) and by their dynamics along the PME (Fig. 9). To identify these residues in hHsp70, we aligned the sequences of DnaK and hHsp70 (Table  S1) and reported the results of the alignment in Table 4. On 49 residues identified in this manner, 2 are gaps and 33 are positives ( Table 4). As stated in the introduction, the low-frequency vibrational modes of proteins are generally correlated with the reaction coordinates of their conformational changes [49]. Based on all-atom normal modes of hHsp70 [23,49], we identified 64 residues having the largest influences in the low-frequency vibrational modes [92] of hHsp70 which are compared to those deduced from dPCA in Table 4. Comparison between the NMA and dPCA shows that 12 residues are identical in both methods, namely F78, K100, K250, K251, G408, V409, A467, P468, R469, K493, S494 and T495 ( Table 4). The most important differences between the two approaches were observed in the subdomain IIA of the NBD, in the linker and in the SBD-a where we did not find residues common to the two analyses. The differences observed are expected because NMA describes the structural fluctuations of the protein in the vicinity of the minimum of the free-energy basin characterizing its native state (harmonic approximation) whereas the dPCA analysis include jumps between minima of the protein FEL.

Conclusion
In summary, all-atom MD simulations of Hsp70-DnaK in different nucleotide-binding states in explicit solvent were performed. We observed the docking of the SBD of DnaK onto the lobe I of its NBD upon ATP-binding: the structure found is an intermediate plausible ATP-bound state, named ATP*, which is compared to several experimental data. A strategy was presented in which the FEL computed from unbiased MD simulations is represented by FEPs of CGDAs of the main chain along the amino-acid sequence of the protein. The FEPs can be quantitatively compared in different nucleotide-binding states by using a similarity index. The analysis of the FEPs allowed us to identify a small network of 27 CGDAs c. The coupling between these 27 CGDAs was deciphered by using dihedral principal component analysis. The conformational change induced by ATP binding was represented by a pathway of minimum energy on the FES built from the two lowest dihedral principal components of the ATPbound DnaK trajectories. The 27 coordinates correspond to 91 residues which are involved in the interdomain communication upon nucleotide binding to DnaK. Most of these 91 residues revealed by MD were shown experimentally to be relevant for the communication between the NBD and the SBD in the Hsp70 chaperone cycle. The present work also suggests 26 new key residues that could be tested experimentally. For example, the residues A503 to G506, defining c 504 , belonging to the kink between SBD-a and SBD-b, seem to play a key role in the dynamics of the docking as well as for the intrinsic dynamic of the SBD. We found that the conformation of these residues influences the distribution of the distance E430/R547 explaining the structural origin of the variations of this distance observed in EPR upon ATP binding to E. coli DnaK [81]. The strategy developed to analyze the long-range effect of ATP binding on the structure and dynamics of E. coli DnaK is general and could be applied to decipher allosteric communication in other proteins.

MD simulations
All unbiased all-atom MD simulations in explicit water of E. coli DnaK using the NMR-derived structure (PDB ID: 2KHO in which the missing atoms of ARG and TYR were added) [70] were carried out with the GROMACS software package [105] and the GROMOS96 ffG43a1 force field [106,107]. The time step used in all simulations was 0.001 ps and the list of neighbors was updated every 0.005 ps with the 'grid' method and a cut-off radius of 1 nm. The coordinates of all the atoms in the simulation box were saved every 2 ps. The initial velocities were chosen randomly. We used the NPT ensemble with a cubic box. The temperature and pressure were kept to the desired value by using the Berendsen method [108] and an isotropic coupling for the pressure (T = 300 K, t T = 0.1 ps; P0 = 1 bar, coupling time t P = 0.5 ps). The electrostatic term was computed by using the particle mesh Ewald algorithm [109] (with a radius of 1 nm) with the Fast Fourier Transform optimization (on 140 points for DnaK, with an order equal to 4 for the interpolation). The ''cut off'' algorithm was applied for the non-coulomb potentials with a radius of 1 nm. The system was warmed up for 40 ps and equilibrated for 600 ps with lower restraints, finishing with no restraints at 300 K. We performed 6 MD runs, 2 in each nucleotide-binding state (APO1, APO2, ADP1, ADP2 and ATP1, ATP2), as described in the following subsections. The stability of the protein structure in each MD trajectory was monitored by computing the Root-Mean-Square-Deviation (RMSD) between the molecular structure every 2 ps and the initial structure (as done before [23], Fig. S10 A). To reduce the computational cost, the APO and ADP-bound DnaK MD runs were stopped when the RMSD was converged to a constant value with a standard deviation lower than 0.5 Å (Fig.  S10 B) during at least 50 ns, resulting in six MD runs of different durations. The 2 MD runs of ATP-DnaK were continued after convergence of the RMSD up to 0.5 ms in order to explore a possible opening of the SBD which did not occur however on that time-scale.
MD runs of APO-Hsp70. The initial structure of the nucleotide-free DnaK (PDB ID: 2KHO) [70] was solvated in a cubic box with 136508 SPC water molecules keeping a minimum distance of 0.9 nm between the solute and each face of the box. We used periodic boundary conditions with an initial value of the box side of 16.112 nm. The charge of the system was neutralized by adding 25 Na + counter-ions. The energy of the system was first optimized with the ''steepest descent minimization'' algorithm and then by using a ''conjugate gradient'' algorithm. We performed two runs with different initial conditions. The total simulation time was 151 ns for the MD run APO1 and 360 ns for the MD run APO2.
MD runs of ADP-Hsp70. The topology files of the ADP and ATP molecules were constructed with the PRODRG program (version 2.5 BETA) [110]. We added one ADP molecule and one divalent ion (Ca 2+ ) to the nucleotide-free DnaK (PDB ID: 2KHO) [70]. The initial coordinates of the ADP molecule were extracted from those of the ADP molecule in the X-ray structure of the isolated NBD of ADP-bound hHsp70 (PDB ID: 1HJO [111]) after Table 4. List of residues found to be relevant for conformational dynamics of human Hsp70 deduced from NMA of human Hsp70 [49,92] and from dPCA of E. coli MD trajectories (present work) after sequence alignment of E. coli DnaK on human Hsp70 (Table  S1).
Residues common to the two analyses are in bold. alignment of the backbone atoms of the residues 5 to 377 of the hHsp70 NBD on those of the DnaK NBD (PDB ID: 2KHO) by using the VMD software [112]. The protein was solvated with 136488 SPC water molecules [113] in a cubic box of the same dimensions as in the MD runs APO (see above), and 26 Na + counterions were added to neutralize the system. The ADP-bound DnaK initial structures were relaxed during 600 ps with the protein structure restrained to the experimental structure (PDB ID: 2KHO) and with no constraints on the solvent and on the ligands (Dataset S1 in supporting information). We performed two MD runs ADP1 and ADP2 with different initial conditions. The total simulation time was 164 ns for the MD run ADP1 and 151 ns for the MD run ADP2. MD runs of ATP-Hsp70. The ATP-Hsp70 structure was built using the NMR-derived structure of E. coli DnaK (PDB ID: 2KHO) [70] with ADP and one divalent ion inserted as described above. The initial coordinates of the ATP molecule was built by adding a phosphate group to the ADP molecule in the free-space available within the NBD of the ADP-bound DnaK structure. The precise initial location of the third phosphate group was not crucial because the ATP-bound DnaK structure was relaxed during 600 ps with the protein structure restrained to the experimental structure (PDB ID: 2KHO) and with no constraints on the solvent and on the ligands (Dataset S2 in supporting information). We compared the ATP-bound construct relaxed after 600 ps with available X-ray structures of isolated NBD of hHsp70 bound to ADP and PO 4 (PDB IDs: 1S3X [114], 3AY9 [115], 3JXU [116]). The relaxation of the ATP molecule using the present force-field displaced the added phosphate group to a position close to the experimental position of the PO 4 group in the isolated NBD of hHsp70. Comparison with the new published X-ray full-length ATP-bound open structure (PDB ID: 4B9Q) [69], confirmed that the initial relaxed position of the ATP molecule (which mimic the ATP binding position in the full-length E. coli Hsp70 in a closed state) is realistic. The protein was solvated with 136485 SPC water molecules [113] in a cubic box of the same dimensions as in the MD runs APO and ADP (see above), and 27 Na + counterions were added to neutralize the system. We performed two MD runs ATP1 and ATP2 with different initial conditions. The total simulation time was 553 ns for the MD run ATP1 and 560 ns for the MD run ATP2. The MD runs confirmed that the very precise initial locations of the nucleotides are not so crucial. The X-ray structures provide indeed only a static picture of the binding mode of ADP and ATP. The NBD is dynamics and the nucleotides are fluctuating within the NBD pocket. In the ATP-bound Hsp70 MD simulations, we found that the terminal phosphate groups of ATP and ADP are fluctuating relative to the nucleic base with a RMSD varying between 2 Å and 9 Å for ATP and between 2 Å and 6 Å for ADP. The position of the nucleotide fluctuates relative to the NBD (which is itself dynamics) with a RMSD varying between 3 Å and 4 Å both for ADP and ATP MD trajectories.
Definition of the coarse-grained dihedral angles (CGDAs) c and of their free-energy profiles (FEPs) The conformational fluctuations of the main chain of E. coli DnaK were monitored by recording the fluctuations of the CGDAs c [71]. The CGDA c for a residue i is the dihedral angle formed by the vectors (virtual bonds) joining 4 successive C a atoms (i21, i, i+1 and i+2) along the amino-acid sequence (Fig. S1 A) [117]. The first dihedral angle along the sequence is c 2 and the last is c N-2 , where N is the total number of residues. Each CGDA c i is a local probe of the FEL along the amino-acid sequence of the protein [71].
The FEP of each c i is computed from the usual Boltzmann formula: where k B is the Boltzmann constant, T the temperature (300 K) and p(c i ) is the probability distribution function (PDF) of the CGDA c i computed from the MD trajectory (Fig. S1 B) [71]. Note that the experimental NMR structure of E. coli DnaK is comprised of 600 residues, starting from residue 4 to residue 603 [70], and is described by 597 angles c i , from index i = 5 to index i = 601.
To quantify the convergence between two FEPs of a same CGDA c i in two MD trajectories with the same nucleotide-binding state but with different initial conditions, we computed the similarity index H between their associated PDFs [72]: where p 1 (c i ) and p 2 (c i ) are two PDFs of the same CGDA c i computed from two MD trajectories 1 and 2.
The similarity index H varies between 0 (dissimilar) and 1 (identical). Two FEPs, v 1 (c i ) from the MD trajectory 1 and v 2 (c i ) from the MD trajectory 2, are considered similar (''converged'') if the index H is larger than 0.5.

Contact map (CM) and contact map difference (CMD)
Distances between all pairs of atoms of the 600 residues were calculated every 2 ps for each MD run (APO1, APO2, ADP1, ADP2, ATP1 and ATP2). The distance matrix between the residues was built by retaining only the smallest atomic distance between the atoms of each residue pair. If this smallest distance was smaller than 4 Å , the residues were said in contact (contact matrix element is 1) and not in contact otherwise (contact matrix element is 0). For each nucleotide-binding state (APO, ADP and ATP), only contacts with at least 50% of occupancy in the two MD runs of E. coli DnaK in the same nucleotide-binding state (for example ATP-bound DnaK) and for which the average distance was similar in the two MD runs (for example MD runs ATP1 and ATP2) within 0.5 Å were finally considered, defining the contact map (CM, Fig. S2 A and B) of a nucleotide-binding state (for example ATP). In addition, the difference between the contact maps, the contact map difference (CMD, Fig. S2 C) was computed. For example, the CMD ATP/ ADP is the difference between the contact matrices ATP and ADP defined above, and thus reveals the atomic contacts present in the ATP nucleotide-binding state of DnaK, but not in its ADP nucleotide-binding state.

Dihedral Principal Component Analysis (dPCA)
In dPCA [74], the correlated internal motions of a protein are quantified by a covariance matrix C ij [Eq. (3)] of the Cartesian components of the two-dimensional vectors u(t) [Eq. (4)] representing the CGDAs: where ,…. denotes the average over all sampled conformations and The diagonalization of the covariance matrix C ij gives 2N (N = number of CGDAs c eigenvalues l k , ordered by decreasing value: l 1 .l 2 .….l 2N and eigenvectors e k = {e 1 k ,e 2 k ,…,e 27 k } T , with e i k = {e i k (X);e i k (Y )} where e i k (X) is the Cartesian component x of the projection of the eigenvector e k on the i th CGDA (similarly for e i k (Y )). The contribution of each CGDA c i to a mode k is the so-called influence [74]: The mean-square fluctuation (MSF i ) of each vector u i can be decomposed into collective modes: The eigenmodes with the largest eigenvalues l k (named slow modes) correspond to the collective modes contributing the most to the MSF of the protein [see Eq. (6)]. Therefore, l 1 is the eigenvalue of the collective mode which contributes the most to the structural fluctuations of the CGDAs c i in an MD run [72,74]. Finally, the projection of the MD trajectory on the k th eigenvector e k is named the dihedral principal component dPC k , More details can be found in Refs. [72,74,118,119].

Free-energy surface (FES) from dPCA
The effective two-dimensional FES shown in Fig. 8 was computed from the two-dimensional probability density of dPC 1 and dPC 2 computed from the covariance matrix of the subset of the 27 dihedral angles involved in the conformational change of DnaK (all shown in Fig. 7). The two-dimensional FES computed from the covariance matrix of all dihedral angles is indeed irrelevant due to a lack of convergence sampling [94]. Indeed, some dihedral angles are in a diffusive regime. As shown in Refs. [93,94], the diffusive regime of the internal degrees of freedom of a protein can be measured by the cosine content (CC) of the principal components. A large value (close to 1.0) of the CC means that some of the degrees of freedom are fluctuating as the variables of free random walks on the time-scale of the present MD simulations. The FES built on principal components with CC close to 1.0 shows artifacts, i.e. basins which are signatures of the cosine shape of the principal components and not of an actual minimum of the FES [93,94,119]. We computed the cosine content of each principal component [92]. The values of the cosine content computed for dPC 1 and dPC 2 calculated from the covariance matrix in the space of the 27 dihedral angles having a converged FEP are very small (CC 1 = 0.11 and CC 2 = 0.16 for ATP1 trajectory for example). On the contrary, the cosine content of the principal component dPC 1 computed in the space of all dihedral angles is very large (CC 1 = 0.69 for ATP1 trajectory for example).

Pathway of minimum energy
The Fig. S11 A shows an example of a free-energy surface (FES) generated by projecting a MD trajectory on the essential plane defined by two principal components [Eq. (7)]. In this example, the FES has two minima A 1 and A 2 . The pathway that connects the two minima A 1 and A 2 by traveling along the ''valleys'' of the FES and going over the ''passes'' (which are saddle-points on the surface and correspond to the transition states) in order to minimize the free-energy difference while moving, is commonly referred to the Pathway of Minimum Energy (PME) [120]. To find the PME for the transition A 1 RA 2 , we applied the following methodology. First, an initial guess of the path is made, here the straight interpolation line d 1 from A 1 to A 2 . Secondly, we move from A 1 along the direction d 1 by 0.1 (which is the bin size defined for the calculation of FES) and draw a segment p 1 perpendicular to the direction d 1 . Note that the size of the segment p 1 is chosen to belong to a circle of diameter A 1 A 2 . Finally, we search along the segment p 1 the bin with the minimum of free-energy, named s 1 . This bin is also considered as the first step of the PME. We repeat the algorithm by constructing a straight interpolation line d 2 from s 1 to A 2 . The algorithm ends after N steps when the distance between s N and A 2 is smaller than the size of the bin defined for the calculation of FES. The PME corresponds to the ensemble {s j } j = 1,..,N and to the corresponding free-energy profile V PME (s j ) (Fig. S11 B).

Supporting Information
Dataset S1 Cartesian coordinates of the nucleotide and of the protein for the initial structures of the ADP1 and ADP2 trajectories (Protein Data Bank format is used). Figure S10 Convergence analysis of MD trajectories. A: C a RMSD with respect to the initial structure computed for the MD runs APO1 (red), ADP1 (blue) and ATP1 (green) up to 250 ns. B: Variance s of the C a RMSD with respect to the initial structure computed for the MD runs APO1 (red), ADP1 (blue) and ATP1 (green) on time windows of 10 ns and up to 250 ns. (EPS) Figure S11 Pathway of minimum energy (PME) for the transition A 1 RA 2 . A: FES generated for the MD run ATP1 from dPCA. Minima are shown with grey diamonds. The PME is shown with black cross and a red line. Interpolated lines d (dashed blue lines) and perpendicular lines p (dot-dashed green lines) are shown for the first three steps only. Empty black squares show the intersection between d and p. B: Free-energy profile V PME along the PME shown in panel A. (EPS)