Conceived and designed the experiments: DPG JGG PE FF. Performed the experiments: DPG JGG. Analyzed the data: DPG JGG. Contributed reagents/materials/analysis tools: DPG JGG. Wrote the paper: DPG JGG PE FF.

The authors have declared that no competing interests exist.

Knowledge of the Free Energy Landscape topology is key to understanding many biochemical processes. The determination of the conformers of a protein and their basins of attraction plays a central role in the study of molecular isomerization reactions. In this work, we present a novel framework to unveil the features of a Free Energy Landscape, answering questions such as how many meta-stable conformers there are, what the hierarchical relationship among them is, or what the structure and kinetics of the transition paths are. The landscape is explored by molecular dynamics simulations, and the microscopic data of the trajectory are encoded into a Conformational Markov Network. The structure of this graph reveals the regions of the conformational space corresponding to the basins of attraction. In addition, the Conformational Markov Network yields relevant kinetic magnitudes, such as dwell times and rate constants, and the hierarchical relationships among basins, which complete the global picture of the landscape. We show the power of the analysis by studying a toy model of a funnel-like potential and by efficiently computing the conformers of a short peptide, dialanine, paving the way to a systematic study of the Free Energy Landscape in large peptides.

A complete description of complex polymers, such as proteins, includes information about their structure and their dynamics. In particular, it is of utmost importance to answer the following questions: What are the possible structural conformations? Is there any relevant hierarchy among these conformers? What are the transition paths between them? These and other questions can be addressed by efficiently analyzing the Free Energy Landscape of the system. With this knowledge, several problems about biomolecular reactions (such as enzymatic activity, protein folding, protein deposition diseases, etc.) can be tackled. In this article we show how to efficiently describe the Free Energy Landscape for small and large peptides. By mapping the trajectories of molecular dynamics simulations into a graph (the Conformational Markov Network) and unveiling its structural organization, we obtain a coarse-grained description of the protein dynamics across the Free Energy Landscape in terms of the relevant kinetic magnitudes of the system. In this way, we bridge the gap between the microscopic dynamics and the macroscopic kinetics by means of a mesoscopic description of the associated Conformational Markov Network. Along this path, the correspondence between the physical nature of the process and the magnitudes that characterize the network is carefully preserved to ensure the reliability of the results shown.

Polymers and, more specifically, proteins, show complex behavior at the cellular system level,

Complex network theory

A similar approach, commonly used to analyze the complex dynamics, is the construction of Markovian models. Markovian state models let us treat the information of one or several trajectories of molecular dynamics (MD) as a set of conformations with certain transition probabilities among them

Finally, other strategies to characterize the FEL that have successfully helped to understand the physics of biopolymers, are based on the study of the Potential Energy Surface (PES)

In this article we make a novel study of the FEL capturing its mesoscopic structure and hence characterizing conformational states and the transitions between them. Inspired by the approaches presented in

In this section we present the full cycle of the FEL analysis: the mapping of the microscopic data of an MD simulation into a Conformational Markov Network (CMN) and, by unveiling its mesoscopic structure, the description of the FEL structure in terms of macroscopic observables.

First, we encode a trajectory of a stochastic MD simulation into a network: the CMN. This mapping allows us to apply the tools introduced below to analyze the dynamics of complex systems such as biopolymers.

The CMN has been proven to be a useful representation of large stochastic trajectories

Our CMN is constructed as follows. The conformational space is divided into

The CMN constructed in this way is described by a single matrix

Provided the MD trajectory is long enough to consider the sample in equilibrium, the weight distribution of the nodes in the CMN will be the stationary solution of Eq. (1) and the detailed balance condition, Eq. (2), will be fulfilled
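The construction above can be sketched in a few lines. This is only an illustrative sketch, assuming the trajectory has already been discretized into integer node labels; the function names `build_cmn` and `detailed_balance_gap`, the toy label sequence and the column-stochastic convention (entry `S[j, i]` is the probability of jumping from node `i` to node `j`) are assumptions made here, not taken from the article.

```python
import numpy as np

def build_cmn(labels, n_nodes):
    """Return (S, pi): column-stochastic transition matrix and node weights."""
    labels = np.asarray(labels)
    counts = np.zeros((n_nodes, n_nodes))
    # counts[j, i] = number of observed jumps i -> j between consecutive frames
    np.add.at(counts, (labels[1:], labels[:-1]), 1.0)
    leaving = counts.sum(axis=0)  # total number of jumps leaving each node
    S = np.divide(counts, leaving, out=np.zeros_like(counts), where=leaving > 0)
    pi = np.bincount(labels, minlength=n_nodes) / labels.size  # occupation fractions
    return S, pi

def detailed_balance_gap(S, pi):
    """Largest |S[j,i]*pi[i] - S[i,j]*pi[j]| over all pairs of nodes."""
    flux = S * pi  # flux[j, i] = S[j, i] * pi[i]
    return np.abs(flux - flux.T).max()

# Toy discretized trajectory (hypothetical data, far too short to be in equilibrium)
traj = [0, 0, 1, 0, 1, 1, 2, 1, 0, 0]
S, pi = build_cmn(traj, 3)
gap = detailed_balance_gap(S, pi)
```

For a trajectory that samples equilibrium well, `gap` should approach zero; for the short toy sequence used here it remains finite.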

A detailed check and discussion about the Markovian character of the networks shown in this article can be found in the

To illustrate the CMN approach and the methods presented below, we introduce here a synthetic potential energy function that serves as a toy model where results can be easily interpreted. This potential energy is reminiscent of the funnel-like surfaces recurrently found when the FEL of proteins is studied

(A) 2D funnel-like potential. (B) A stochastic trajectory is translated into a CMN where 6 sets of nodes (corresponding to different color) are the result of the SSD algorithm. (C) Recovering the spatial coordinates, the stationary probabilities of each node are shown in color code. The 6 basins detected are represented as color striped regions. (D) A coarse-grained CMN is built where new nodes take the role of the basins.

A stochastic trajectory has been simulated using overdamped Langevin dynamics and the equations of motion have been integrated with a fourth-order stochastic Runge-Kutta method
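To make the simulation step concrete, the following sketch integrates overdamped Langevin dynamics on a two-dimensional surface. It is not the setup used in the article: a simple Euler-Maruyama step replaces the fourth-order stochastic Runge-Kutta integrator, and the potential, time step, temperature and friction are hypothetical values chosen only for illustration.

```python
import numpy as np

def grad_V(x):
    """Gradient of a hypothetical potential V = (x0^2 - 1)^2 + x1^2 (not the funnel)."""
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def overdamped_langevin(x0, n_steps, dt=1e-3, kT=0.5, gamma=1.0, seed=0):
    """Euler-Maruyama integration of gamma * dx/dt = -grad V + thermal noise."""
    rng = np.random.default_rng(seed)
    traj = np.empty((n_steps + 1, 2))
    traj[0] = x0
    noise_amp = np.sqrt(2.0 * kT * dt / gamma)  # amplitude of the random kick
    for n in range(n_steps):
        x = traj[n]
        kick = noise_amp * rng.standard_normal(2)
        traj[n + 1] = x - grad_V(x) * dt / gamma + kick
    return traj

traj = overdamped_langevin([1.0, 0.0], 5000)
```

The resulting array of positions is the kind of microscopic data that would then be discretized into cells and encoded into a CMN.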

Up to now, we have illustrated the conversion of molecular dynamics data into a graph (the CMN). Now, we show how to efficiently obtain the thermo-statistical data from the mesoscopic description of the CMN.

Inspired by the deterministic steepest descent algorithm to locate minima in a potential energy surface, we propose a

Picking at random one node of the CMN, say

We establish formally the above procedure assisted by a vector

We start by assigning

Select at random a node

Search, within the neighbors of the node _{j,l}

A If

B If

C If

The whole procedure ends when no unlabeled nodes remain in the CMN,

To illustrate the basin decomposition of a CMN, the SSD algorithm has been applied to the funnel-like potential. The result is the detection of six basins in agreement with the number of local minima in its FEL (
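A simplified variant of this kind of basin decomposition can be sketched as follows. The rules used here are assumptions for illustration and may differ from the SSD steps of the article: from each node the walk follows the outgoing link of highest transition probability (self-loop excluded) as long as it leads to a node of larger weight, a node where this is no longer possible is taken as a local free energy minimum, and every node inherits the basin label found at the end of its path. The helper `chain_cmn` builds a hypothetical one-dimensional test network that obeys detailed balance.

```python
import numpy as np

def ssd_basins(S, pi):
    """Label each node with a basin index (simplified steepest-descent variant)."""
    n = len(pi)
    basin = np.full(n, -1)  # -1 marks an unlabeled node
    n_basins = 0
    for start in range(n):
        path, node = [], start
        while basin[node] == -1:
            path.append(node)
            probs = S[:, node].copy()
            probs[node] = 0.0  # ignore the self-loop
            nxt = int(np.argmax(probs))
            if probs[nxt] == 0.0 or pi[nxt] <= pi[node]:
                basin[node] = n_basins  # local minimum: open a new basin
                n_basins += 1
                break
            node = nxt
        for p in path:  # nodes on the path inherit the final label
            basin[p] = basin[node]
    return basin, n_basins

def chain_cmn(pi):
    """Hypothetical nearest-neighbor chain with S[j,i] * pi[i] = S[i,j] * pi[j]."""
    pi = np.asarray(pi, dtype=float)
    n = len(pi)
    S = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                S[j, i] = 0.5 * pi[j] / (pi[i] + pi[j])
        S[i, i] = 1.0 - S[:, i].sum()  # remaining probability stays in place
    return S

pi = np.array([0.05, 0.30, 0.10, 0.05, 0.35, 0.15])  # two local maxima of weight
S = chain_cmn(pi)
basin, n_basins = ssd_basins(S, pi)
```

For this toy chain the sketch finds two basins, one around node 1 and one around node 4. Since every node is visited once and each walk stops at the first labeled node, the cost grows roughly linearly with the number of nodes when the network is stored sparsely.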

With the aim of studying biomolecules and high-dimensional systems, the way to detect these FEL basins must be computationally efficient. The method described above takes a computational time

To get a more comprehensible representation of the FEL studied, a new CMN network can be built by taking the basins as nodes. The occupation probabilities of these nodes as well as the transition probabilities among them can be obtained from those of the original CMN as

The weighted nodes and links have a clear physical meaning
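One way to carry out this coarse-graining is sketched below, under the assumption that the weight of a basin is the sum of the weights of its nodes and that the transition probability from basin A to basin B is the probability flux from the nodes of A to the nodes of B divided by the weight of A. The function name `coarse_grain` and the three-node example are hypothetical.

```python
import numpy as np

def coarse_grain(S, pi, basin, n_basins):
    """Return (T, Pi): basin-level transition matrix and basin weights."""
    n = len(pi)
    M = np.zeros((n_basins, n))
    M[basin, np.arange(n)] = 1.0  # M[A, i] = 1 if node i belongs to basin A
    Pi = M @ pi                   # basin weight = sum of its node weights
    flux = M @ (S * pi) @ M.T     # flux[B, A] = sum over i in A, j in B of S[j,i]*pi[i]
    T = flux / Pi                 # T[B, A] = flux[B, A] / Pi[A]
    return T, Pi

# Hypothetical 3-node CMN in detailed balance, with nodes 0 and 1 in the same basin
S = np.array([[0.8, 0.3, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.1, 0.8]])
pi = np.array([1 / 2, 1 / 3, 1 / 6])
T, Pi = coarse_grain(S, pi, np.array([0, 0, 1]), 2)
```

With these definitions the columns of `T` sum to one and `Pi` is stationary under `T`, so the coarse-grained network is again a Markov network.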

The ability to define the proper regions of the conformational space in an efficient way lets us compute physical magnitudes of relevance. For instance, the coarse-grained CMN is nothing but a graphical representation of a kinetic model with
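As an illustration of the kind of magnitudes that can be read from a coarse-grained network, the sketch below estimates mean escape times and first-order rate constants from a basin-level transition matrix. It assumes a column-stochastic matrix `T` sampled at a fixed time step `dt` and Markovian behavior at the basin level, so that the dwell time in a basin follows a geometric distribution; these assumptions, the function names and the numbers are not taken from the article.

```python
import numpy as np

def escape_times(T, dt):
    """Mean dwell time in each basin: dt / (1 - probability of staying)."""
    stay = np.diag(T)
    return dt / (1.0 - stay)

def rate_constants(T, dt):
    """Rate matrix K for dP/dt = K P, with K[B, A] the rate from basin A to B."""
    K = T / dt
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=0))  # each column sums to zero
    return K

# Hypothetical two-basin transition matrix observed every dt = 1 (arbitrary units)
T = np.array([[0.96, 0.20],
              [0.04, 0.80]])
tau = escape_times(T, dt=1.0)
K = rate_constants(T, dt=1.0)
```

In this example the first basin is left after 25 time units on average and the second after 5.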

The first hierarchy aims to answer the following question: What is the structure of the CMN when nodes with lower weight than a certain threshold are removed together with their links? Let us take the control parameter

With the above definitions we start a CMN reconstruction by smoothly increasing the threshold from its zero value. At each step of this process, we obtain a network composed of those nodes with free energy lower than the current threshold value. As the free-energy cut-off increases, new nodes emerge together with their links. These new nodes may be attached to any of the nodes already present in the network or they can emerge as a disconnected component. At a certain value of

This bottom-up network reconstruction provides us with a hierarchical emergence of nodes along with the way they join together. This picture can be better described by a process of basin emergence and linking that is easily represented by means of a basin dendrogram. This representation lets us see at a glance the hierarchical relationship of the conformational macro-states and the height of the barriers between them. Let us remark that the transition times cannot be deduced from these qualitative barriers since the entropic contribution or the volume of the basin are not reflected in this diagram. The basins family-tree obtained for the funnel-like (see
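The bottom-up reconstruction can be sketched with a union-find structure. This is an illustrative sketch under stated assumptions: the free energy of a node is taken as F_i = -kT ln(pi_i / pi_max), nodes are added in order of increasing free energy, and a merge is recorded whenever a new node links two previously disconnected components. The function name and the six-node chain are hypothetical.

```python
import numpy as np

def free_energy_hierarchy(S, pi, kT=1.0):
    """Return (births, merges) of components as the free-energy cutoff grows."""
    pi = np.asarray(pi, dtype=float)
    F = -kT * np.log(pi / pi.max())  # relative free energy of each node
    adj = (S + S.T) > 0              # undirected link pattern
    parent = {}                      # union-find forest over the nodes added so far

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    births, merges = [], []
    for i in map(int, np.argsort(F, kind="stable")):
        parent[i] = i
        roots = {find(j) for j in map(int, np.flatnonzero(adj[i]))
                 if j != i and j in parent}
        roots = sorted(roots, key=lambda r: F[r])
        if not roots:
            births.append((i, float(F[i])))  # a new component (basin) emerges
            continue
        parent[i] = roots[0]                 # attach to the deepest component
        for r in roots[1:]:
            merges.append((float(F[i]), roots[0], r))  # barrier joining two basins
            parent[r] = roots[0]
    return births, merges

pi = np.array([0.05, 0.30, 0.10, 0.05, 0.35, 0.15])
# Only the link pattern matters here; the matrix values are placeholders
S = np.eye(6) + np.eye(6, k=1) + np.eye(6, k=-1)
births, merges = free_energy_hierarchy(S, pi)
```

Each entry of `births` gives the node at the bottom of a basin and the free energy at which it appears, and each entry of `merges` gives the cutoff at which two basins become connected, which is the information needed to draw the dendrogram.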

(A) Free energy hierarchy, based on the relative free energy of the nodes. (B) Temporal hierarchy: number of basins detected by SSD in the different networks built with Eq. (7). The original basins merge as a function of time. Both hierarchies reveal a coarse-grained behavior of two macro-states:

The CMN representation of an MD simulation provides another hierarchical relationship that helps to understand the behavior of biological systems. The links of the original CMN have been weighted according to the stochastic matrix

For each value of

From the original

The result of this procedure performed for the funnel-like potential is shown in

The alanine dipeptide, or terminally blocked alanine peptide (Ace-Ala-Nme,

(A) The dialanine dipeptide with the angles

The alanine dipeptide has two slow degrees of freedom, the rotatable bonds
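For a system described by two dihedral angles, the discretization of the conformational space into CMN nodes can be sketched as a regular periodic grid. The grid size `n_bins`, the function name and the use of degrees are assumptions made here for illustration and do not reproduce the partition used in the article.

```python
import numpy as np

def dihedrals_to_nodes(phi, psi, n_bins=40):
    """Map pairs of angles (degrees) to cell indices of an n_bins x n_bins grid."""
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    width = 360.0 / n_bins
    # np.mod wraps any angle into [0, 360); the final % n_bins guards against
    # a rounding case where the wrapped angle comes out as exactly 360.0
    i = (np.mod(phi + 180.0, 360.0) // width).astype(int) % n_bins
    j = (np.mod(psi + 180.0, 360.0) // width).astype(int) % n_bins
    return i * n_bins + j

# Hypothetical frames: each (phi, psi) pair becomes one node label
labels = dihedrals_to_nodes([-180.0, 0.0, 180.0], [-180.0, -90.0, -180.0], n_bins=4)
```

The sequence of labels obtained in this way, one per frame, is the input needed to count transitions between cells and build the CMN.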

We have used

The

We now look at the coarse-grained picture of the FEL by describing the properties of the 6 basins detected. The different weights of the basins are related to the free energy of the corresponding conformational macro-states. In

Basin | Mean Escape Time (ps) | |
0.00 | 0.52 | |
0.45 | 0.42 | |
2.42 | 4.20 | |
3.84 | 0.71 | |
0.55 | 0.28 | |
0.90 | 0.23 |

The FEL can be represented as a dendrogram, see

Two sets of basins are clearly distinguished with a high free energy barrier in between: (

The alanine dipeptide has also been studied because of its “fast” isomerization

_{ba} (ps) |
|||

1968.34 | 88.24 | ||

58011.74 | 815.87 | ||

393.75 | 6.63 | ||

400.57 | 58.47 | ||

3.32 | 1.88 | ||

4.80 | 0.78 |

To round off the description of the FEL, the dendrogram corresponding to the temporal hierarchy is shown in

In around 100 ps the peptide finds the way to reach the global minimum, conformer

Finally, the magnitudes computed here for the alanine dipeptide would allow one to construct a first-order kinetic model of 6 coupled differential equations of the form of Eq. (6) (assuming intra-basin equilibrium). This model contains the same information as the kinetic model by Chekmarev et al. for the irreversible transfer of population from
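A first-order kinetic model of this type can be propagated in time with a few lines of linear algebra. The sketch below assumes the usual master-equation form dP/dt = K P, with a rate matrix whose columns sum to zero and which is diagonalizable; the three-state rate matrix is hypothetical and is not built from the values measured for the alanine dipeptide.

```python
import numpy as np

def propagate(K, p0, t):
    """Populations at time t for dP/dt = K P, via eigendecomposition of K."""
    w, V = np.linalg.eig(K)
    coeffs = np.linalg.solve(V, np.asarray(p0, dtype=complex))
    return (V @ (np.exp(w * t) * coeffs)).real

# Hypothetical rate matrix: K[b, a] is the rate constant from state a to state b
K = np.array([[-0.30,  0.10,  0.00],
              [ 0.30, -0.15,  0.20],
              [ 0.00,  0.05, -0.20]])
p_short = propagate(K, [1.0, 0.0, 0.0], 3.0)
p_long = propagate(K, [1.0, 0.0, 0.0], 1.0e4)
```

The total population is conserved at all times, and at long times the populations relax to the stationary distribution fixed by the rate constants.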

Hierarchical landscapes characterize the dynamical behavior of proteins, which in turn depends on the relation between the topology of the basins, their transition paths and the kinetics over energy barriers. The CMN analysis of trajectories generated by MD simulations is a powerful tool to explore complex FELs.

In this article, we have proposed how to deal with a CMN to unveil the structure of the FEL in a straightforward way and with remarkable efficiency. The analysis presented here is based on the physical concept of basin of attraction, making possible the study of the conformational structure of peptides and the complete characterization of their kinetics. Note that this has been done without estimating the volume of each conformational macro-state in the coordinate space and without a priori knowledge of the saddle points or the transition paths from one local minimum to another.

On the other hand, the framework introduced in this article provides a quantitative description of the dialanine FEL, obtained directly from an MD simulation at a given temperature. The peptide explores its landscape, building the corresponding CMN, and the success in extracting the relevant information depends on the ability to deal with this network. Neither the FE basins were defined by the unique criterion of clustering conformations with a geometrical distance

Although we have applied the method to low-dimensional landscapes, we expect that high-dimensional systems could also be studied by combining this technique with the usual methods to reduce the effective degrees of freedom (like principal component analysis or essential dynamics). In conclusion, the large amount of information obtained by working with the CMN, its potential application to any peptide with a large number of monomers, and the possibility of performing the analysis on top of a CMN constructed via several short MD simulations

Checking Markovity.


Comparing with community algorithms.


More on Alanine dipeptide.


A critical reading of the manuscript by Y. Moreno and L.M. Floría, and the helpful comments and suggestions from the anonymous referees are gratefully acknowledged.