• Loading metrics

On identifying collective displacements in apo-proteins that reveal eventual binding pathways

On identifying collective displacements in apo-proteins that reveal eventual binding pathways

  • Dheeraj Dube, 
  • Navjeet Ahalawat, 
  • Himanshu Khandelia, 
  • Jagannath Mondal, 
  • Surajit Sengupta


Binding of small molecules to proteins often involves large conformational changes in the latter, which open up pathways to the binding site. Observing and pinpointing these rare events in large scale, all-atom, computations of specific protein-ligand complexes, is expensive and to a great extent serendipitous. Further, relevant collective variables which characterise specific binding or un-binding scenarios are still difficult to identify despite the large body of work on the subject. Here, we show that possible primary and secondary binding pathways can be discovered from short simulations of the apo-protein without waiting for an actual binding event to occur. We use a projection formalism, introduced earlier to study deformation in solids, to analyse local atomic displacements into two mutually orthogonal subspaces—those which are “affine” i.e. expressible as a homogeneous deformation of the native structure, and those which are not. The susceptibility to non-affine displacements among the various residues in the apo- protein is then shown to correlate with typical binding pathways and sites crucial for allosteric modifications. We validate our observation with all-atom computations of three proteins, T4-Lysozyme, Src kinase and Cytochrome P450.

Author summary

Designing drugs which target specific proteins involved in diseases consumes a lot of time and effort in the pharmaceutical industry. In recent times, in silico design of drugs using all-atom molecular modelling has started to provide crucial inputs. Even so, discovery of binding pathways of small molecules both at the primary binding site, as well as sites for allosteric control, is time consuming and often fortuitous. We provide here a framework within which critical conformational changes likely to occur during binding are quantified from statistical analysis of configurations of proteins in their apo, or inactive form, greatly simplifying identification of target residues. We illustrate this idea by analysing ligand binding pathways for three proteins T4- Lysozyme, P450 and Src kinase, which are active respectively in the immune system, metabolism and cancer.


Modern drug discovery operates around the principle of molecular recognition in the context of protein-ligand binding. In the preliminary stages, drug discovery programs often invest in quantifying the binding affinity of suitable drug candidates towards a well-established protein target. However, such efforts often encounter scenarios where identification of the ligand-binding cavity on the protein target becomes challenging. In addition, screening across a large pool of protein receptor candidates as potential drug targets for a specific drug proves tedious and time consuming.

A popular and inexpensive approach for zeroing on a suitable combination of protein receptor and ligand has been drug design based on in silico techniques. Towards this end, macromolecular docking [15] based techniques are routinely employed for screening protein-ligand recognition partners. One of the major drawbacks of docking based approach has been the heuristic choice of the scoring function implemented in the docking programs to quantify and rank-order the protein-ligand affinity. Further, the other major criticism has been that in most of the cases the receptor cavity’s inherent flexibility is not taken into account for docking purposes. Rather, docking softwares [617] consider the protein cavity to be rigid and perform the search for a ligand’s ability to dock only in a limited area of protein, often pre-determined by the user’s intuition-based inputs. As a result, docking based drug discovery protocols are fraught often with failures in identifying the correct set of protein ligand combinations.

Molecular Dynamics simulations, with the recent emergence of Graphics-Processing-Unit (GPU) based computation and special purpose machine like Anton, has slowly emerged as an accurate approach to simulate the complete process of protein ligand binding in its atomistic details. The approach, majorly pioneered by Shaw and coworkers [1821] has been able to decipher the kinetic pathways leading to spontaneous recognition of the ligand by the receptor. This process is frequently being supplemented by judicious application of Markov State Model [2224] based techniques to infer long time recognition processes by shorter time simulations. These techniques being unbiased and devoid of user intervention, currently remain one of the best approaches to explore the protein ligand recognition at their highest resolution. But the process comes at the expense of high computational cost and extended wall clock time. As a result, a plan for practical usage of the Molecular dynamics simulation for the purpose of serious drug discovery is still in its infancy.

The current work offers a practical middle ground between expensive Molecular Dynamics simulation of protein-ligand recognition processes and inaccurate docking based approach by employing an idea from materials physics, which correctly accounts for deformations in protein structure relevant for binding. We use a formalism, introduced earlier to analyse atomic displacements in solids [2527], to predict the protein locations which are potentially susceptible towards ligand recognition. We build on our work under the basic premises that all external and internal stresses result in two types of mutually orthogonal deformations namely those which are “affine” i.e. expressible as a homogeneous deformation of the native structure, and those which are not i.e. “non-affine”. Further details are provided in figure A and section 1 of S1 text. As further elaborated in a later section, the current study characterizes the local arrangements of atoms inside protein by probing an important metric termed as “Non Affine parameter” (NAP) [25]. The approach is first convincingly validated by reproducing key conformational changes en route to ligand entry to a mutant T4 Lysozyme. This formalism is then used to propose potential regions which are implicated in “gate-opening” and capture the conformational dynamics during the binding process in the protein receptor Src kinase and Cytochrome P450. Finally, we show how spatial correlations of NAP can elucidate sites for allosteric control in these proteins.

For discussion on the effect of different coarse-graining volume on the eigen-spectrum refer section 6 of SI—Notes on the effect of choice of neighbourhood on NAP eigen-spectrum. c. Local structure of the T4Lys protein showing the neighbourhood analysed in b. The helices and β−sheets are in purple while the rest of the protein is in cyan. The dominant eigenvector is also shown in transparent purple (see also SI movies S1 & S2, description provided in section 4 of SI—Description of Videos).


We explore three independent protein-ligand systems. Most of the validations of our approach has been done on binding of benzene to the solvent-inaccessible cavity of the L99A mutant of T4 Lysozyme(T4Lys), a system which has served long as the prototypical protein ligand system for biophysical studies [2837]. Apart from the binding process of simulation, in the current work, we have also simulated the apo or ligand-unbound form of T4Lys (PDB id: 3DMV) and Src kinase (pdb id: 1Y57). Some data for Cytochrome P450 (pdb id: 2CPP) has also been shown.

Non affine parameter is a novel collective coordinate for protein structure

Atoms of an isolated protein molecule in aqueous solution and at non-zero temperatures undergo continuous random motion. For most proteins, these atomic motions may be analysed in terms of displacement fluctuations about some fixed native structure. Some of these fluctuations represent local affine distortions of the native state while others correspond to non-trivial conformation changes. It is the latter which are associated with functional aspects of the protein. Determining functionally relevant conformational changes is a challenge, especially if such distortions are subtle and have to be extracted from the random background noise. We show now how a collective variable may be defined, which does exactly that. This collective variable, which we call the non-affine parameter, was first defined for crystalline solids. In solids, non-affine displacements are obtained by systematically projecting out [25] trivial homogeneous deformations of atoms, capturing only those displacements responsible for irreversible plastic events [26, 27]. The NAP is the squared sum of the mean amplitudes of these displacements. In proteins NAP behaves as a collective coordinate responsible for revealing binding pathways and also for determining possible allosteric couplings between spatially distant residues.

Consider for example (see Fig 1a) an atom i and its neighbourhood Ωi which contains j = 1, nΩ neighbours. The size of Ωi is fixed. Too small a Ωi results in large fluctuations while making Ωi too large may average out the relevant signal and make it disappear. Typically, we use a neighbourhood which contains 50–100 atoms excluding H-atoms. The instantaneous positions of the atoms ri at any particular time step of MD simulation are compared with a set of fixed reference positions Ri taken, for example, from the (ligand-free) native state of the protein. NAP is then defined as the value of the quantity where the minimisation is over choices of the deformation matrix . NAP is therefore the least square error incurred in trying to represent an arbitrary displacement as an affine or homogeneous deformation of the reference configuration.

Fig 1.

a. Schematic diagram used for defining the NAP, χi for particle i (see text). The neighbourhood Ωi centred around i is shown as a sphere. The reference positions of the neighbours Rj are shown in light grey while the instantaneous positions rj are shown in dark grey. We have subtracted off the displacement of particle i (Ri = ri) for simplicity. b. The distribution of the eigenvalues λμ of PCP for three locations in the protein T4Lys, location I corresponds to a carboxyl C atom in helix 7, II corresponds to a carboxyl O atom in helix 5 and III corresponds to a Cα atom in a relatively inflexible portion of a β sheet. In each of the locations I and II, which are active during ligand binding, the largest eigenvalue is separated from the rest by a large gap Γi. The Γi is markedly smaller in location III which is relatively inert. Here, we consider the mode corresponding to highest eigen value to be the key mode. Here each set of eigenvalues are normalized by the maximum eigen-value to make it 1. That way Γi (the difference between the eigen-values) can be compared across the three different locations of the protein on equal footing.

The minimisation process is actually equivalent [25] to a projection. The instantaneous displacement of atom i is ui = riRi. Within the region Ωi, displacements of neighbouring particles relative to that of i are given by Δij = ujui. Next we define a projection of Δij onto the non-affine subspace using a projection operator P where P = I − R(RTR)−1RT, and the nΩ d × d2 elements of R,γγ = δαγRjγ′, taking i to be at the origin. Finally, one may show χ(Ri) = ΔT P Δ where we have used the definition Δ, as an nΩ d dimensional column vector constructed by rearranging Δij. This projection formalism can be carried out for any given set of {Ri}. This identification also allows us to compute statistical averages, probability distributions and spatio-temporal correlation functions using standard methods of statistical mechanics. For example, one derives [25] the equilibrium ensemble average 〈χ〉 = ∑μ λμ where λμ are the eigenvalues of the matrix PCP with C = 〈ΔT Δ〉 the local displacement correlator, and we have suppressed the particle index i for simplicity. Associated with NAP, we can also define the local NAP susceptibility 〈(χi − 〈χi〉)2〉 and the spatial correlation function 〈(χi − 〈χi〉)(χj − 〈χj〉)〉, in the same way as it is done for a crystalline solid [26]. Section 1 of SI, along with figure A, provides a summary of useful terms related to our analysis for the readers.

One of our important results concerns the distribution of the eigenvalues λμ. For crystalline solids we had observed [26] that (1) this distribution is non uniform and the largest eigenvalue is separated from the rest by a large gap and (2) the eigenvector corresponding to the largest eigenvalue represented the dominant plastic mode eg. the creation of a defect dipole. Quite interestingly, the situation is similar here. For details of NAP formalism refer section 1 of SI.

In Fig 1b we have plotted the relative magnitudes of the eigenvalues of the local PCP matrix for three different locations on the protein T4Lys. The first one (I) corresponds to a Carbon atom on a Carboxyl group on residue 132 (amino acid Asn) in a region of helix 7, the second one (II) on an Oxygen atom of a Carboxyl group on helix 5 corresponding to residue 105 (amino acid Gln), and the third one (III) for an α− Carbon atom within β sheet 1, belonging to residue 13, (amino acid Leu). We shall see later that regions I and II are both quite active during binding events while region III is relatively inert. The eigenvalue spectrum surprisingly bears an imprint of this behaviour. The active regions appear to be dominated by a single non-affine mode which is particularly soft. The eigenvalue for this mode is larger than the rest by a substantial amount producing a gap Γi in the non-affine spectrum. The inert regions of the proteins have more uniformly distributed eigenvalues. By looking at the structure alone, (see Fig 1C) it would have been impossible to guess this fact.

In Fig 2a & 2b we show that the distribution of the eigenvalues of the PCP matrix are not random. The soft modes that prove out to be useful in our study correspond to a set of eigenvalues which are quite well separated from the spectrum of the rest of the eigenvalues. In order to demonstrate this, in Fig 2a, we plot the distribution of the various eigenvalues of the PCP matrix for region (I) of T4Lys (Fig 1c), which is involved in the gate opening resulting from the movements of helices 7 and 9. This is fitted to the Marchenko Pastur (MP) distribution from the prediction from Random Matrix Theory [3840]. It is clear that the fit is not perfect and there are several eigenvalues which are separated from the rest by gaps. On the other hand, if the matrix PCP was constructed out of Δ which were completely random then we should have obtained the distribution enclosed under the red curve representing the MP distribution for our data. This is shown explicitly for a random correlation matrix in Fig 2b. The fact that we have a sparsely distributed tail of few relatively very large eigenvalues widely separated from the rest of the spectrum and these large eigenmodes contribute most towards the major conformational changes, clearly demonstrates that our analysis is not a straightforward strain-analysis [41]. Instead, indeed, there is an underlying physics that can be revealed by focusing on the non-affine subspace PΔ of the total displacement Δ. Non-affinity may play an important role in guiding a majority of the routine functions performed by the proteins.

Fig 2.

a. Distribution of the eigenvalues of the PCP matrix at a specific location in the protein T4Lys (blue histogram) compared with the fitted Marchenko-Pastur distribution (red curve). b. A similar fit for a truly random correlation matrix constructed out of vectors with the same number of components as in the first case but now taken from a Gaussian random distribution of zero mean and unit standard deviation.c. Three dimensional plot of the local NAP, χi, the gap Γi and the NAP susceptibility for all locations analysed for three different proteins of widely diverse structure and function viz. T4Lys (purple symbols), Src kinase (cyan symbols) and P450 (green symbols). The relation between all three variables for the three proteins is same within the scatter of the data. All the neighbourhoods analysed contained 50–100 atoms. Refer to section 5 of SI and figure C for detailed discussion on sensitivity of choice of neighbourhood-volume for detailed description on neighbourhood.

The relation between activity i.e. a large NAP susceptibility and gaps, Γi, appears to be an universal feature of all the three proteins, with very diverse structures and functions, studied by us. In Fig 2c we have plotted the NAP susceptibility Γi and NAP for several regions within three proteins T4Lys, P450 and Src kinase. In all of these proteins the three quantities are positively correlated with each other and the nature of the correlation (the slope of the curves) appears to be the same within the error bars for all the proteins.

We demonstrate next that regions of the protein which are involved in a binding event and show large NAP values during the process of ligand binding also have large NAP susceptibility in the apo form of the proteins without the ligand. The motions of the protein during binding events are also captured by the dominant non-affine eigenmode. Finally we show that large NAP correlations between spatially distant parts of the proteins point out domains of possible allosteric control.

Non affine parameter successfully tracks the ligand binding event in T4 Lysozyme

We first validate the ability of NAP to successfully trace the ligand binding pathway for a recently studied protein-ligand system by Mondal et al. [28], namely benzene binding to T4Lys. Briefly, Mondal et al. has recently captured the full details of the benzene binding pathways to the buried cavity of T4Lys using multi-microsecond long atomistic MD simulations. The resulting binding trajectories had identified more than one binding pathways. A key observation from those simulated trajectories was that the benzene binding to the solvent-inaccessible cavity of T4Lys did not require large-scale protein conformational change. Rather, the simulations discovered that it is the subtle displacement of helices of the protein which opens up productive pathways leading to ligand binding in the buried cavity.

Specifically, as plotted in the right hand axis of Fig 3a, a subset of the trajectories pinpointed that prior to benzene binding, the distance between helix-4 and helix-6 increases transiently by around 2–3 Å from its equilibrium value, facilitating benzene-gating in the cavity, before it reverts back to its original equilibrium distance after the binding event is complete. On the other hand, another subset of trajectories, spawned from same starting configuration, revealed an alternate pathway of ligand binding, where the transient increase in the distance between a different pair of helices namely, helix-7 and helix-9 triggers the ligand entry to the cavity, as depicted in Fig 3b (right hand axis). The snapshots shown at the bottom panel of the Fig 3a and 3b captures the location of the benzene relative to the helix gateway during the period of binding event. The three different snapshots labelled as 1, 2 and 3 respectively represent the helix conformations before, during and after the ligand binding event. The creation of a helix gateway prior to ligand binding (snapshot 2 in each case) is very evident from the increased inter-helix distance.

Fig 3. NAP as a good descriptor of ligand binding event: Correlation between time profile of inter-helix distance (blue curves right-hand axis) and corresponding NAP of the helices (red curve, left hand axis) during the process of ligand (benzene) recognition to the buried cavity of L99A T4 Lysozyme.

a. The trajectory involving helix4-helix6 gateway leading to ligand binding, b. the trajectory involving helix7-helix9 gateway leading to ligand recognition. Below each of these curves we also show the ligand (pink) and the protein (cyan) at three time slices with positions corresponding to (1) outside the gate, (2) within the gate and (3) after the binding event. The three time slices are also labelled (black dots) in the NAP vs time plots a. and b. Here, all the possible distances between helix 7 particles (excluding hydrogen) and helix 9 particles (excluding hydrogen) are calculated and then an average of all these values are taken to arrive at a singe point on the blue curve in figure 3b. Same goes for the blue curve in figure 3a. Also refer section 7 of SI and figure E for detailed discussion on sensitivity of main and side chains atoms on NAP Analysis.

We calculate the time profile of NAP, averaged over all heavy atoms involving these helices and compare its correspondence with corresponding inter-helix distance. As presented in Fig 3a, we plot the profile of average NAP involving helix-4 and helix-6 on the left hand axis and that of average distance between these two helices on the right hand axis, for the part of time period spanning the benzene entry to the cavity. We find that, both NAP profile and distance profile involving helix-4 and helix-6 follow almost identical trend during the course of ligand entry. Similarly, as shown in Fig 3b, for the other trajectory where ligand binding occurred via helix-7 and helix-9 gateway, time profile of NAP involving helix-7 and helix-9 reproduces the trend of distance fluctuation involving helix-7 and helix-9 quite accurately. The remarkable consistency between the NAP profile and displacement profile of key-profiles responsible for ligand binding suggests that these subtle helix fluctuations are results of non-affine displacements of local environment for each particle. Overall, these results validate the potential of NAP as a suitable collective variable or descriptor for tracking ligand binding kinetics.

NAP susceptibility of apo protein highlights potential ligand hotspots

The results discussed in previous section have validated the role of NAP as a good descriptor of a ligand binding event and its ability to track the ligand binding pathways. But the validation needed the extensive simulation of actual ligand binding events. As an important finding of the current work, in this section we show that by analysing the NAP susceptibility on a ligand-free (apo) protein itself, one can predict potential protein sites for facile ligand approach, otherwise known as “ligand hotspots”. Since MD simulations of the apo protein do not need to track rare binding events, these cost a small fraction of the time needed for simulating the full protein-ligand systems. Fig 4a and 4c illustrate the spatial distribution of computed NAP susceptibility of two protein systems in their ligand-free (apo) form, namely, T4Lys and Src kinase. For the purpose of comparison, we have also plotted the locations on protein surface which ligand frequently visits in previously carried out multi-microsecond long unbiased ligand-binding simulations (Fig 4b and 4d).

Fig 4. NAP susceptibility of apo proteins (a,c) and ligand binding hotspots (b,d).

a The spatial distribution of the NAP susceptibility shown as a colour map red (scaled to 100) to blue (0) for L99A T4 Lysozyme. b. In the adjacent panel, we show the same protein now together with snapshots of the ligand. Ligand snapshots in red represents configurations obtained from 4–5 μs and blue from 5–6 μs of a long MD trajectory. After 6 μs the ligand stays close to the binding gateway. Panels c and d show the corresponding plots for Src kinase. In c the colours have meanings same as in a. Panel d is reprinted with permission from Ref. [18], (copyright (2011) American Chemical Society) red represents configurations earlier than 15 μs while blue from 15-20 μs. Locations within the protein which are important for ligand binding are numbered and are described in detail in the text. Note that in both cases the regions which have large NAP susceptibility are also those where ligands spend most of their time during binding events.

Fig 4a represents the three-dimensional map of NAP susceptibility of T4Lys. Location 2 of T4Lys includes residues with very high NAP susceptibility (shown in red spheres). A comparison with corresponding location 2 of the time-trace of the ligand in Fig 4b, as obtained from unbiased ligand-binding trajectories, identifies this location as the native ligand binding site at the buried cavity near the 102nd residue (a part of helix 5). Further, apart from the final ligand binding location at the 102nd residue, the computed NAP susceptibilities around ligand-free T4Lys were also found to be quite high near locations 1 and 3, which are respectively the 105th (a part of helix 5) and 137th residues (gateway between helix-7 and helix-9). The residues 102 and 105 (which are parts of helix-5) act as hinge-centers or pivots about which the neighboring residues undergo relative re-arrangement opening up the gateway between helices 4 and 6. Similarly, the residue 137 (which is a part of helix-8) acts as a pivot for the gateway between helix 7 and 9. The residue 164 corresponds to location 4 in T4Lys which also has a very high NAP-susceptibility value as it is a terminal residue-part of random coil, mildly contributing towards re-arrangement in helix-9. Interestingly, as shown in Fig 4b, previously simulated independent binding trajectories have also revealed that the pathways leading to ligand entry to the buried cavity of T4Lys involve the helices near locations 1, 2 and 3. More over, the NAP susceptibility analysis recognises multiple locations with moderate NAP susceptibility (shown by green sphered residues), which are also regularly visited by ligands. Although away from major binding sites, these sites serve as possible precursors to intermediates leading to final molecular recognition. Taken together, the ability to correctly predict eventual ligand binding site and other accessory potential ligand hotspots at T4Lys in its apo form by measuring NAP susceptibility, without the need for prior knowledge of protein-ligand binding interactions, rates NAP susceptibility as a very promising metric.

The display of the hinge action which opens the gateway to the hydrophobic binding cavity is demonstrated by the supporting movie S1 (opening-up of helix-7 and helix-9) and S2 (opening-up of helix-4 and helix-6), see (S1 Text for further details). Note that these are not MD simulation snapshots but represent the dominant eigen-displacements arising from a single non-affine mode with the largest eigenvalue. The collective motion involves a “twisting” of the helices and is a very special linear combination of hundreds of local normal modes from which all affine (strain-like) distortions have been systematically projected out. We return to this issue later in the paper.

The ability of NAP susceptibility to predict the potential ligand hotspots on a protein surface is also validated in another popular receptor protein, namely the Src kinase. Fig 4c shows the NAP susceptibility map around Src kinase in its ligand-free form while the corresponding simulated ligand binding trajectories, as previously simulated by Shaw and coworkers, is shown in Fig 4d. In this case as well, the ATP-binding site (location 2) is correctly predicted to have one of the highest NAP susceptibilities in the apo-form and independent simulated ligand-binding trajectories also confirm this. Moreover, other locations of kinase rated as high to intermediate NAP susceptibility for ligand approach are also independently found to be locations regularly traced by ligand in the unbiased ligand binding trajectories by Shaw and coworkers [18]. Specifically, so-called PIF site, MYR site and G-loop site of kinase (PIF-site:region1, MYR-site:region4,), which were found to attract regular ligand visit in actual ligand binding trajectories, are also predicted to have high to moderate NAP susceptibility for the ligand in apo form of the kinase itself. However, interestingly, our NAP susceptibility analysis also predicts a new location 5 deemed highly susceptible to ligand, which is otherwise not known as a ligand hotspot. The discovery of this new site prompted further analysis to justify the observed high susceptibility for the ligand. We have also checked the effect of simulation lengths on the NAP susceptibility. Figure B plots the NAP susceptibility of all particles of src kinase computed over two different simulation lengths. We find that the results remain invariant over simulation time. We also note that these simulations on ligand-free protein, which are used for NAP analysis, are orders of magnitude shorter than the simulation time required for seeing the direct ligand binding events. In the next section, we show that the newly obtained regions of Src kinase which have high NAP-susceptibility also play an important role in establishing an allosteric communication across the protein structure. All the important sites with moderate and high NAP susceptibility values are reported in section 3 of SI.

NAP correlations and allostery

In the previous section, we have explored the NAP susceptibilities for different locations of the protein and using this as a metric we were able to extract the important ligand hotspots, in accordance with the simulated ligand binding trajectories. Similarly spatial NAP correlation function or covariance, calculated at the same time, as defined previously, measures the correlation between non-affine displacement fluctuations at two different, possibly distant, locations. We link this quantity to a measure of allostery in the protein systems i.e. the ability of spatially distal sites in proteins to influence catalytic or binding activity.

In Fig 5, we plot residue-residue NAP covariance matrix representing spatial correlation functions for our two protein systems, namely T4Lys (Fig 5a) and Src kinase (Fig 5b) respectively. On the two axes we have the residue ids and the value of the spatial correlation function between two residues is given by the colour of the point. High values above a suitable cut-off are in yellow and the rest are in black. We are interested in those off-diagonal residues that are spatially far away from each other, since spatially adjacent residues will be trivially correlated due to direct interactions. As shown in Fig 5a, we could get five prominent locations in the matrix for T4Lys which have high NAP correlation at a spatially far-off distance (see S1 Text for further details of these regions). For example, we found that a very high NAP correlation exists between residue 137 in region 3 and residue 18 in region 5 (labelled (3,5)) of T4Lys. This allosteric connection can be seen to be contributing towards regulation of the hydrogen bond between the residue 22 Glu and residue 137 Arg. Using a newly introduced method ExProSE, It was confirmed by Greener et al. [42] that, the breaking of a hydrogen bond between Arg137 and Glu22 causes the opening motion taking the protein from a closed active site cleft conformation to an open active site cleft conformation. Also, earlier, Mchaourab et al. [43] have confirmed that in solution there is a conformeric equilibrium between substrate-bound and substrate-unbound T4 Lysozyme which is in accord with the active site cleft opening upon substrate removal due to Hinge-Bending motion. Mchaourab et al. used the site-directed Spin Labeling to detect these motions. From our NAP formalisms too, [26] we can see that the softest non-affine modes corresponding to the residues 18 and 137 act as the nucleating-precursors to the opening of the active site cleft. These modes contribute by initially breaking the hydrogen bond between residues 22 and 137 and therefore we can directly see the increasing distance between residue 22 and 137 in S3 Video (description provided in section 4 of SI—Description of Videos) in the supplementary info.

Fig 5. NAP correlation function plotted as a colour map spanning residue ids where correlation values larger than a threshold (= 4 percent of the maximum value) are shown in yellow while the rest is kept black; a T4Lys, b Src kinase.

Highly correlated, spatially distant regions are sites for possible allosteric modification. Such pairs of sites are numbered in both cases with the font sizes corresponding to the relative value of the correlation (see section 3 of S1 Text). The numbers correspond to the same regions as in Fig 4. It is observed for T4Lys that the residue 18 from region 5 (yellow) has the highest correlation with the residue 137 in region 3 (black) as demonstrated by the largest font. In case of Src kinase, the residues 416 and 328, from regions 5(yellow) and 1(blue) respectively, have the highest correlation.

For Src kinase too, as shown in Fig 5b, we could identify five regions of high NAP correlations at distant locations. For example, residue 416 (Tyr) of region 5 of src kinase has a very high NAP correlation with residue 328 (Val) (an element of Helix-αc) of region 1 (labelled (1,5). This is supported by a previous report [44] of coupling between the activation loop and Helix-αc. A hallmark of active conformation of kinase [45] is the autophosphorylation of the activation loop (404–432) at the residue 416 (Tyr) and “αC-in” conformation of Helix-αc. Analysis of non-affinity shows that the softest non-affine modes (S4 Video in supporting info, description provided in section 4 of SI—Description of Videos) at residue 416 and residue 328 in our simulations act as incipients for residue 416 (Tyr) protruding outward from the activation loop region, thereby becoming more exposed for the phosphorylation at this site (this site being tyrosine which is favourable for phosphorylation), and at the same time inward rotation of Helix-αc. In summary, our analysis based upon NAP-susceptibility and NAP spatial correlation function points towards allosteric contact between Helix-αc (helix undergoing inward rotation) and activation loop of kinase (phosphorylation site Tyr416 getting exposed). Apart from this, we also note that there is a slight distortion of the salt-bridge between 295LYS and 310GLU. The two residues 295LYS and 310GLU are shown in silver small spheres and stick form. This is also supported by the previous works [4547]. The details of regions 1-5 are mentioned in section 3 of SI.

Multiple active sites in Cytochrome P450

The family of Cytochrome P450 proteins are quite versatile and are present in many different human tissues performing many tasks such as oxygen metabolism by binding to Heme, breakdown of toxins and hormones, synthesis of cholesterol [48] etc. They are often associated with intracellular membranes such as in mitochondria and the endoplasmic reticulum. Extensive studies using sequence-based co-evolutionary analysis and anisotropic thermal diffusion MD simulations have identified four principal active regions in P450 related to membrane association, catalytic activity, Heme binding and dimerization [49].

Our results for the NAP susceptibility are shown in Fig 6. Remarkably, we can also identify the same regions as obtained in the earlier study [49], although the variant of P450 used by us is somewhat different. A complete analysis of all the different binding and allosteric processes of P450 analysed using the NAP values, eigenvectors and correlation functions is much beyond the scope of the present work and will be published elsewhere.

Fig 6. Relative NAP susceptibility plotted with the same color scale as in Fig 4 for Cytochrome P450.

We have identified four regions of high activity. Region 1 and 2 are associated with membrane binding and catalytic activity respectively while 3 and 4 may correspond to Heme binding and dimerisation.


Pinpointing sites of catalytic activity, binding of ligands or other functionally important conformational changes in proteins is a challenge when such displacements in protein residues are subtle and easily masked by background noise. Accurate “order parameters” or collective variables need to be designed to study such activity. It is often unclear how such collective variable may be defined. Drawing from ideas which were first demonstrated in crystalline solids [2527] we have defined a new collective variable, NAP, which quantifies non-affine thermally excited displacements of proteins. Firstly, we show that the susceptibility of different locations of a protein to non-affine displacements can be used as an indicator to determine regions which are important for binding events. Secondly, from the analysis of the NAP susceptibility values we conclude that the regions with higher values of NAP susceptibilities are also those which contain a dominant non-affine mode whose eigenvalue is separated from those of the the rest of the modes by a large gap. The eigenvalue spectrum of the non-affine modes is also highly non-random. The eigenvector corresponding to this dominant mode involves atomic displacements that are the ones prone to act as binding gate-ways and the time spent by a ligand during a binding event at a particular residue is determined by the NAP susceptibility of this region. We also show that a positive correlation exists between the NAP value, its susceptibility and the magnitude of the gap in all the proteins studied by us.

Recent times have seen the emergence of many novel techniques to tackle the challenging issue of coming up with optimum sets of collective variable. Tiwary and Berne’s techniques of spectral gap optimisation of order parameters (SGOOP), [50] Time-structure based independent component analysis (TICA)-based approach for optimising collective variable [5153] for protein conformational analysis and molecular recognitions certainly have contributed significantly in this line. The approach described in this work is quite relevant, in light of these recent efforts in optimising collective variable for describing a biological processes. It is quite remarkable that our projection formalism yields useful data even from short simulation runs and results are robust over different simulation lengths (refer section 2 of SI and figure B). Our analysis is fundamentally different from principal component analysis (PCA) [5456] or other PCA based methods [57] which are used extensively for analysing protein configurations obtained from simulations. The restriction to the coarse graining volumes Ωi makes the analysis local and the projection to the non-affine sub space guarantees that all trivial motions are filtered out. Nonetheless, we have analysed the sensitivity of our results over choice of coarse-graining volumes in Section 5 of SI and figure C provides justification of our choice of Ωi. The NAP eigen spectrums are also analysed over varying coarse-grains radius in figure D and section 6 of SI. Our NAP analysis are also robust against choice of main-chain or side-chain atoms of protein for the computation (see section 7 of SI and figure E.) Finally, the large gap in the non-affine excitation spectrum between the dominant eigenmode and the rest ensures that there is very little mixing of the modes. As an example, recall the (un-)twisting motion of the helices observed in T4Lys as shown in the supplementary S1 Video (see S1 Text). Such a motion requires a special linear superposition of a large number of normal modes. The dominant PCA mode on the other hand cannot capture these motions (refer to figure F and section 8 of SI for detailed discussion on comparison with PCA) in detail although atomic displacements in the relevant regions are large. Unfortunately, these displacements are also contaminated by unimportant, affine motons which mask the gate opening feature.

The ability to compute NAP susceptibility by our approach helps us to unify two different proposed hypotheses of protein-ligand recognition, namely “conformational selection” [5863] (in which protein’s intrinsic ability to shift to a ligand-preferred conformation guides the molecular recognition) and “induced fit” (in which ligand induces the protein to change its conformation). Our calculations show that these ideas are not in conflict but follows from standard fluctuation-response behaviour connecting local fluctuations of non-affine part of atomic displacements to the response of the protein to the conjugate local non-affine field [26]. This is a simple consequence of a fluctuation response relation viz. 〈χ〉 = 〈χ0 + hχ〈(χ − 〈χ〉)20 if we identify ligands with local fields hχ which generate NAP in proportion to their susceptibilities at hχ = 0. The ligand, in this case acts as such an “external” field and the response of the protein to the field is proportional to the fluctuations of local NAP in its apo state without the field (ligand).

Additionally, we see that the spatial correlation of NAP among various residues can be used to discover sites of possible allosteric control. NAP susceptibilities and correlations may be obtained readily from simulations of the apo protein without the ligand and come at a fraction of the cost needed for detailed simulations of ligand binding events. This implies that our proposed method is extremely well suited to be adapted for high throughput searches of possible protein ligand pairs and allosteric control.

It is to be noted that in our work we have focussed on the projection of the displacements to the non-affine subspace which is orthogonal to local strain [41]. Since binding hotspots feature large displacements, both strain and strain fluctuations may be large together with large NAP values. NAP however, quantifies the error made in trying to fit the local displacements to an affine model and therefore large NAP values also signify that a description in terms of strain becomes invalid in those regions.

Materials and methods

Protein models and simulation details

We have worked on three systems including binding of benzene to the solvent-inaccessible cavity of the L99A mutant of T4 Lysozyme (T4Lys). Also we had previously performed multi-microsecond molecular dynamics simulation where we have simulated the complete kinetic process of ligand recognition [28]. The details of atomistic simulation process and models for this system have been discussed therein. The apo or ligand-unbound form of T4Lys (PDB id: 3DMV) and and Src kinase (pdb id: 1Y57) have also been explored in our work along with with some data for Cytochrome P450 (pdb id: 2CPP). All the apo forms have been modelled using same simulation protocols. We have used the CHARMM36 force field [64]. The protein is solvated by 11613 TIP3P water molecules in a cubical box of dimension 7.18 nm. Sufficient number of ions were added to maintain 150 mM salt concentration. All MD simulations were performed with the Gromacs 5.0.6 simulation package [65]. During the simulation, the average temperature was maintained at 303K using the Nose-Hoover thermostat [66, 67] with a relaxation time of 1.0 ps and an average isotropic pressure of 1 bar was maintained with the Parrinello-Rahman barostat [68]. The Verlet cutoff scheme was employed throughout the simulation with the truncated and shifted Lenard Jones interaction extending to 1.2 nm and long-range electrostatic interactions [69] treated by Particle Mesh Ewald (PME) summation [70]. All bond lengths involving hydrogen atoms of the protein and the ligand benzene were constrained using the LINCS algorithm [71] and water hydrogen bonds were fixed using the SETTLE [72] approach. Simulations were performed using the leapfrog integrator with a time step of 2 fs and initiated by randomly assigning the velocities of all particles. The total time for our simulations of the apo-proteins amounted to (T4Lys = 493.54ns, Src kinase = 460.00ns, P450 = 500ns) 1453.54 ns which was sufficient to obtain equilibrated data.

Supporting information

S1 Text. Supporting information including text and figures.


S1 Video. Supporting video demonstrating the opening of the helices 4 and 6.


S2 Video. Supporting video demonstrating the opening of the helices 7 and 9.


S3 Video. Supporting video demonstrating the non-affine modes for the residues having high spatial NAP correlation shown in Fig 4(a).


S4 Video. Supporting video demonstrating the non-affine modes for the residues having high spatial NAP correlation shown in Fig 4(b).



We thank Madan Rao, Satyavani Vemparala and Pratyush Tiwari for discussions.


  1. 1. Wu G, Robertson DH, Brooks CL, Vieth M. Detailed analysis of grid-based molecular docking: A case study of CDOCKER—A CHARMm—based MD docking algorithm. J. Comput. Chem.;24(13):1549–1562. pmid:12925999
  2. 2. Wang R, Lu Y, Wang S. Comparative Evaluation of 11 Scoring Functions for Molecular Docking. J. Med. Chem. 2003;46(12):2287–2303. pmid:12773034
  3. 3. Sledz P, Caflisch A. Protein structure-based drug design: from docking to molecular dynamics. Curr. Opin. Struct. Biol. 2018;48:93–102. pmid:29149726
  4. 4. Rachman MM, Barril X, Hubbard RE. Predicting how drug molecules bind to their protein targets. Curr. Opin. Pharm. 2018;42:34–39.
  5. 5. Yuriev E, Ramsland PA. Latest developments in molecular docking: 2010-2011 in review. J. Mol. Recog.; 26(5):215–239.
  6. 6. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. PatchDock and SymmDock: servers for rigid and symmetric docking. Nucl. Acids Res. 2005;33(Web Server):W363–W367. pmid:15980490
  7. 7. Grosdidier A, Zoete V, Michielin O. SwissDock, a protein-small molecule docking web service based on EADock DSS. Nucl. Acids Res. 2011;39(suppl):W270–W277. pmid:21624888
  8. 8. Kurcinski M, Jamroz M, Blaszczyk M, Kolinski A, Kmiecik S. CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site. Nucl. Acids Res. 2015;43(W1):W419–W424. pmid:25943545
  9. 9. Blaszczyk M, Kurcinski M, Kouza M, Wieteska L, Debinski A, Kolinski A, et al. Modeling of protein-peptide interactions using the CABS-dock web server for binding site search and flexible docking. Methods. 2016;93:72–83. pmid:26165956
  10. 10. Ciemny MP, Debinski A, Paczkowska M, Kolinski A, Kurcinski M, Kmiecik S. Protein-peptide molecular docking with large-scale conformational changes: the p53-MDM2 interaction. Scientific Reports. 2016;6(1). pmid:27905468
  11. 11. Bikadi Z, Hazai E. Application of the PM6 semi-empirical method to modeling proteins enhances docking accuracy of AutoDock. J. Cheminform. 2009;1(1):15. pmid:20150996
  12. 12. Huey R, Morris GM, Olson AJ, Goodsell DS. A semiempirical free energy force field with charge-based desolvation. J. Comput. Chem. 2007;28(6):1145–1152. pmid:17274016
  13. 13. Pierce BG, Wiehe K, Hwang H, Kim BH, Vreven T, Weng Z. ZDOCK server: interactive docking prediction of protein-protein complexes and symmetric multimers. Bioinformatics. 2014;30(12):1771–1773. pmid:24532726
  14. 14. Chen R, Li L, Weng Z. ZDOCK: An initial-stage protein-docking algorithm. Proteins: Structure, Function, and Genetics. 2003;52(1):80–87.
  15. 15. Mintseris J, Pierce B, Wiehe K, Anderson R, Chen R, Weng Z. Integrating statistical pair potentials into protein complex prediction. Proteins: Structure, Function, and Bioinformatics. 2007;69(3):511–520.
  16. 16. Pierce B, Tong W, Weng Z. M-ZDOCK: a grid-based approach for Cn symmetric multimer docking. Bioinformatics. 2004;21(8):1472–1478. pmid:15613396
  17. 17. Pierce BG, Hourai Y, Weng Z. Accelerating Protein Docking in ZDOCK Using an Advanced 3D Convolution Library. PLoS ONE. 2011;6(9):e24657. pmid:21949741
  18. 18. Shan Y, Kim ET, Eastwood MP, Dror RO, Seeliger MA, Shaw DE. How Does a Drug Molecule Find Its Target Binding Site? J. Am. Chem. Soc. 2011;133(24):9181–9183. pmid:21545110
  19. 19. Dror RO, Pan AC, Arlow DH, Borhani DW, Maragakis P, Shan Y, et al. Pathway and mechanism of drug binding to G-protein-coupled receptors. Proc. Natl. Acad. Sci. USA 2011;108(32):13118–13123. pmid:21778406
  20. 20. Buch I, Giorgino T, De Fabritiis G. Complete reconstruction of an enzyme-inhibitor binding process by molecular dynamics simulations. Proc. Natl. Acad. Sci. USA 2011;108(25):10184–10189. pmid:21646537
  21. 21. Plattner N, Noé F. Protein conformational plasticity and complex ligand-binding kinetics explored by atomistic simulations and Markov models. Nature Comm. 2015;6(1).
  22. 22. Bowman GR, Pande VS, Noé F. An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation. Springer Netherlands; 2014. Available from:
  23. 23. Chodera JD, Noé F. Markov state models of biomolecular conformational dynamics. Curr. Opin. Struct. Biol. 2014;25:135–144. pmid:24836551
  24. 24. Pande VS, Beauchamp K, Bowman GR. Everything you wanted to know about Markov State Models but were afraid to ask. Methods. 2010;52(1):99–105. pmid:20570730
  25. 25. Ganguly S, Sengupta S, Sollich P, Rao M. Nonaffine displacements in crystalline solids in the harmonic limit. Phys Rev E. 2013;87:042801.
  26. 26. Ganguly S, Sengupta S, Sollich P. Statistics of non-affine defect precursors: tailoring defect densities in colloidal crystals using external fields. Soft Matter. 2015;11(22):4517–4526. pmid:25953064
  27. 27. Nath P, Ganguly S, Horbach J, Sollich P, Karmakar S, Sengupta S. On the existence of thermodynamically stable rigid solids. Proc. Natl. Acad. Sci. USA 2018;
  28. 28. Mondal J, Ahalawat N, Pandit S, Kay LE, Vallurupalli P. Atomic resolution mechanism of ligand binding to a solvent inaccessible cavity in T4 lysozyme. PLOS Comput. Biol. 2018;14(5):e1006180. pmid:29775455
  29. 29. Eriksson AE, Baase WA, Wozniak JA, Matthews BW. A cavity-containing mutant of T4 lysozyme is stabilized by buried benzene. Nature. 1992;355(6358):371–373. pmid:1731252
  30. 30. Eriksson A, Baase W, Zhang X, Heinz D, Blaber M, Baldwin E, et al. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science. 1992;255(5041):178–183. pmid:1553543
  31. 31. Morton A, Matthews BW. Specificity of ligand binding in a buried nonpolar cavity of T4 lysozyme: Linkage of dynamics and structural plasticity. Biochemistry. 1995;34(27):8576–8588. pmid:7612599
  32. 32. Vallurupalli P, Chakrabarti N, Pomès R, Kay LE. Atomistic picture of conformational exchange in a T4 lysozyme cavity mutant: an experiment-guided molecular dynamics study. Chem. Sci. 2016;7(6):3602–3613. pmid:30008994
  33. 33. Mulder FAA, Hon B, Muhandiram DR, Dahlquist FW, Kay LE. Flexibility and Ligand Exchange in a Buried Cavity Mutant of T4 Lysozyme Studied by Multinuclear NMR†. Biochemistry. 2000;39(41):12614–12622. pmid:11027141
  34. 34. Kitahara R, Yoshimura Y, Xue M, Kameda T, Mulder FAA. Detecting O2 binding sites in protein cavities. Scientific Reports. 2016;6(1).
  35. 35. Quillin ML, Breyer WA, Griswold IJ, Matthews BW. Size versus polarizability in protein-ligand interactions: binding of noble gases within engineered cavities in phage T4 lysozyme. J. Mol. Biol. 2000;302(4):955–977. pmid:10993735
  36. 36. Deng Y, Roux B. Computations of Standard Binding Free Energies with Molecular Dynamics Simulations. J. Phys. Chem. B. 2009;113(8):2234–2246. pmid:19146384
  37. 37. Bouvignies G, Vallurupalli P, Hansen DF, Correia BE, Lange O, Bah A, et al. Solution structure of a minor and transiently formed state of a T4 lysozyme mutant. Nature. 2011;477(7362):111–114. pmid:21857680
  38. 38. Lee AA, Brenner MP, Colwell LJ. Predicting protein–ligand affinity with a random matrix framework. Proc. Natl. Acad. Sci. USA 2016;113(48):13564–13569. pmid:27856761
  39. 39. Götze F, Tikhomirov A. Rate of convergence in probability to the Marchenko-Pastur law. Bernoulli. 2004;10(3):503–548.
  40. 40. Marčenko VA, Pastur LA. DISTRIBUTION OF EIGENVALUES FOR SOME SETS OF RANDOM MATRICES. Mathematics of the USSR-Sbornik. 1967;1(4):457–483.
  41. 41. Mitchell MR, Tlusty T, Leibler S. Strain analysis of protein structures and low dimensionality of mechanical allosteric couplings. Proc. Natl. Acad. Sci. USA 2016;113(40):E5847–E5855. pmid:27655887
  42. 42. Greener JG, Fliippis I, Sternberg MJE. Predicting Protein Dynamics and Allostery Using Multi-Protein Atomic Distance Constraints. Structure. 2017;25:546–558. pmid:28190781
  43. 43. Mchaourab HS, Oh KJ, Fang CJ, Hubbell WL. Conformation of T4 Lysozyme in Solution. Hinge-Bending Motion and the Substrate-Induced Conformational Transition Studied by Site-Directed Spin Labeling. Biochemistry. 1997;36(2):307–316. pmid:9003182
  44. 44. Huse M, Kuriyan J. The Conformational Plasticity of Protein Kinases. Cell. 2002;109(3):275–282. pmid:12015977
  45. 45. Tong M, Pelton JG, Gill ML, Zhang W, Picart F, Seeliger MA. Survey of solution dynamics in Src kinase reveals allosteric cross talk between the ligand binding and regulatory sites. Nature Comm. 2017;8(1).
  46. 46. Tiwary P, Mondal J, Berne BJ. How and when does an anticancer drug leave its binding site? Science Adv. 2017;3(5).
  47. 47. Taylor SS, Kornev AP. Protein kinases: evolution of dynamic regulatory proteins. Trends Biochem. Sci. 2011;36(2):65–77. pmid:20971646
  48. 48. Guengerich FP. Cytochrome P450 and Chemical Toxicology. Chem. Res. Tox. 2008;21(1):70–83.
  49. 49. Liu J, Tawa GJ, Wallqvist A. Identifying Cytochrome P450 Functional Networks and Their Allosteric Regulatory Elements. PLoS ONE. 2013;8(12):e81980. pmid:24312617
  50. 50. Tiwary P, Berne BJ. Spectral gap optimization of order parameters for sampling complex molecular systems. Proc Natl Acad Sci USA. 2016;113:2839–2844. pmid:26929365
  51. 51. M Sultan M, Pande VS. tICA-Metadynamics: Accelerating Metadynamics by Using Kinetically Selected Collective Variables. J. Chem. Theory Comput. 2017;13(6):2440–2447. pmid:28383914
  52. 52. McCarty J, Parrinello M. A variational conformational dynamics approach to the selection of collective variables in metadynamics. J. Chem. Phys. 2017;147(20):204109. pmid:29195289
  53. 53. Ahalawat N, Mondal J. Assessment and optimization of collective variables for protein conformational landscape: GB1 β-hairpin as a case study. J. Chem. Phys. 2018;149(9):094101. pmid:30195312
  54. 54. Amadei A, Linssen ABM, Berendsen HJC. Essential dynamics of proteins. Proteins: Structure, Function, and Genetics. 1993;17(4):412–425.
  55. 55. Kitao A, Go N. Investigating protein dynamics in collective coordinate space. Curr. Opin. Struct. Biol. 1999;9(2):164–169. pmid:10322205
  56. 56. R BB, Dusanka J, Martin K. Harmonic analysis of large systems. I. Methodology. J. Comput. Chem.; 16(12):1522–1542.
  57. 57. Zhang Z, Wriggers W. Local feature analysis: A statistical theory for reproducible essential dynamics of large macromolecules. Proteins: Structure, Function, and Bioinformatics. 2006;64(2):391–403.
  58. 58. Hammes GG, Chang YC, Oas TG. Conformational selection or induced fit: A flux description of reaction mechanism. Proc. Natl. Acad. Sci. USA 2009;106(33):13737–13741. pmid:19666553
  59. 59. Tsai C, Ma B, Sham YY, Kumar S, Nussinov R. Structured disorder and conformational selection. Proteins: Structure, Function, and Bioinformatics;44(4):418–427.
  60. 60. Vogt AD, Di Cera E. Conformational Selection or Induced Fit? A Critical Appraisal of the Kinetic Mechanism. Biochemistry. 2012;51(30):5894–5902. pmid:22775458
  61. 61. Changeux JP, Edelstein SJ. Conformational selection or induced fit? 50 years of debate resolved. In: F1000 biology reports; 2011.
  62. 62. Silva DA, Bowman GR, Sosa-Peinado A, Huang X. A Role for Both Conformational Selection and Induced Fit in Ligand Binding by the LAO Protein. PLoS Comput. Biol. 2011;7(5):e1002054. pmid:21637799
  63. 63. Johnson KA. Role of Induced Fit in Enzyme Specificity: A Molecular Forward/Reverse Switch. J. Biol. Chem. 2008;283(39):26297–26301. pmid:18544537
  64. 64. Best RB, Zhu X, Shim J, Lopes PEM, Mittal J, Feig M, et al. Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone Phi, Psi and Side-Chain Xi1 and Xi2 Dihedral Angles. J. Chem. Theory Comput. 2012;8(9):3257–3273. pmid:23341755
  65. 65. Hess B, Kutzner C, van der Spoel D, Lindahl E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008;4(3):435–447. pmid:26620784
  66. 66. Nose S. A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys. 1984;52(2):255–268.
  67. 67. Hoover WG. Canonical dynamics: Equilibrium phase-space distributions. Phys Rev A. 1985;31:1695–1697.
  68. 68. Parrinello M, Rahman A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 1981;52(12):7182–7190.
  69. 69. Páll S, Hess B. A flexible algorithm for calculating pair interactions on SIMD architectures. Comput. Phys. Commun. 2013;184(12):2641–2650.
  70. 70. Darden T, York D, Pedersen L. Particle mesh Ewald: An Nlog(N) method for Ewald sums in large systems. J. Chem. Phys. 1993;98(12):10089–10092.
  71. 71. Hess B, Bekker H, Berendsen HJC, Fraaije JGEM. LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem.;18(12):1463–1472.
  72. 72. Miyamoto S, Kollman PA. Settle: An analytical version of the SHAKE and RATTLE algorithm for rigid water models. J. Comput. Chem. 1992;13(8):952–962.