Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Systematically Constructing Kinetic Transition Network in Polypeptide from Top to Down: Trajectory Mapping

  • Linchen Gong,

    Affiliation Institute for Advanced Study, Tsinghua University, Beijing 100080, China

  • Xin Zhou ,

    xzhou@ucas.ac.cn

    Affiliation School of Physics, University of Chinese Academy of Sciences, Beijing 100049, China

  • Zhongcan Ouyang

    Affiliations Institute for Advanced Study, Tsinghua University, Beijing 100080, China, Institute for Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China

Systematically Constructing Kinetic Transition Network in Polypeptide from Top to Down: Trajectory Mapping

  • Linchen Gong, 
  • Xin Zhou, 
  • Zhongcan Ouyang
PLOS
x

Abstract

Molecular dynamics (MD) simulation is an important tool for understanding bio-molecules in microscopic temporal/spatial scales. Besides the demand in improving simulation techniques to approach experimental scales, it becomes more and more crucial to develop robust methodology for precisely and objectively interpreting massive MD simulation data. In our previous work [J Phys Chem B 114, 10266 (2010)], the trajectory mapping (TM) method was presented to analyze simulation trajectories then to construct a kinetic transition network of metastable states. In this work, we further present a top-down implementation of TM to systematically detect complicate features of conformational space. We first look at longer MD trajectory pieces to get a coarse picture of transition network at larger time scale, and then we gradually cut the trajectory pieces in shorter for more details. A robust clustering algorithm is designed to more effectively identify the metastable states and transition events. We applied this TM method to detect the hierarchical structure in the conformational space of alanine-dodeca-peptide from microsecond to nanosecond time scales. The results show a downhill folding process of the peptide through multiple pathways. Even in this simple system, we found that single common-used order parameter is not sufficient either in distinguishing the metastable states or predicting the transition kinetics among these states.

Introduction

Protein folding problem has been intensively studied for decades. Although in-depth understanding of proteins has been established by the pioneer works, see reference [1] for brief review, due to the tremendous complexity of these molecules, there is still a long way to get a clear and definitive description of conformational motions in proteins.

The current progress of experimental and simulation methods has made the protein structural ensemble accessible to researchers. In experiment, it is possible to directly observe the protein conformational dynamics by single molecular fluorescence method (SMF) [2]. Meanwhile, the full details of protein dynamics can be obtained by molecular dynamics (MD) simulations. The rapidly increasing computational power has enabled people to thoroughly study some small proteins with lots of parallel generated MD trajectories [37], or with a single long MD trajectory [8], up to milliseconds.

One intriguing point of proteins is that various folding intermediates generally exist [9, 10]. Besides, the unfolded ensemble of a protein also shows heterogeneity. In unfolded phase, proteins may have specific residue structures [11, 12] while statistically behaving like random coils [13]. The versatile metastable states of a polypeptide reflect the complexity of this molecule’s conformational space. A complete picture of protein dynamics can be established by explicitly identifying the metastable states and transitions between them. Such a picture may facilitate the study of some important protein molecules like intrinsic disorder proteins [14] and amyloid forming proteins [15].

So far, we have not provided a definition of a metastable state. A metastable state corresponds to a region in conformational space which is separated from other regions by high free energy barriers [16]. Consequently, a dynamics simulation trajectory entering into a metastable state will be trapped there in a characteristic timescale τlife, (i.e., the lifetime of the state), which should be longer than the local equilibration timescale τeq in the state, (i.e., the time that system loses its memory inside the state). Thus the kinetic transition between metastable states can be approximated as Markovian process [17]. Taking the states as nodes and transitions between them as edges, we can establish a transition network as a simplified picture about conformational motions of system [1820].

To identify the states, the traditional way is to project simulation data to a low-dimensional space spanned by one or two manually selected order parameters, reconstruct the free energy contour map and then visually pick out the free energy minima (or basins) as states [21]. Some advanced techniques are also invented or applied to better select the order parameters [2226]. However, it has been realized that the low-dimensional projection is usually not sufficient in complex systems, some metastable states may overlap each other after the projection, leading to artificial and distorted understanding in kinetics [27, 28]. In view of this, some methods have been designed to construct the transition network without the low-dimensional projection. Earlier attempts used clustering algorithms to directly group geometric similar conformations as metastable states [2933]. However, metastable states should be defined on similarity of conformations in kinetics (or dynamics) rather than in geometry. In bio-molecules, structurally (geometrically) similar conformations may not be kinetically close to each other, and the structural difference of conformations inside a kinetic state may look not smaller than that of conformations in different states, (i.e., the intra-state conformational fluctuation could be not smaller than the inter-state fluctuation).

Recently, a popular approach in classifying kinetic metastable states and transitions is the Markov state model (MSM) [3442]. In the MSM, the sampled conformations are first classified into lots of small groups called microstates wherein the conformations are similar in geometry. As long as the partition of simulation samples is fine enough, the kinetics between microstates would be supposed as a discrete-time-discrete-state Markov process, and a transition rate matrix between microstates could be established by directly counting transition events along simulation trajectories. Then these microstates are further grouped into metastable states based on the standard spectral clustering method, and the transition network is constructed accordingly [3442]. In the MSM, to ensure the correctness of the results, the number of microstates is often very large [43], but can not too large for getting sufficient number of transition events between them to estimate the transition rates.

In the previous works [19, 20], we proposed a trajectory mapping (TM) method to identify metastable states without a complete breakdown of simulation data. In the TM, we cluster simulation trajectory pieces rather than individual conformations, by mapping each trajectory piece as a high-dimensional vector with the average values of a set of (analytical) basis functions in the piece as components. The similar trajectory-mapped vectors are then grouped as metastable states, transition events in simulation trajectories are further identified. Recently, the idea describing conformational motions by analytical basis functions is also applied to improve and generalize the original MSM where a sample-based discrete functions are applied to describe the conformational motions. The benefits about the application of analytical basis functions are widely discussed [41, 42, 4447]. For example, Nüske et al. [46] use the variational approach to relate the cross time-correlation matrix of analytical basis functions to a finite-dimensional approximation of dynamics propagator of systems, then the first eigenvectors of the correlation matrix provides slow dynamics modes. Another similar approach is called Time-Structure Based Independent Component Analysis (tICA) [44] which generalizes the usual principle component analysis (PCA) to relate the eigenvectors of the time correlation matrix to independent modes.

In this paper, we further improve our previous TM by presenting a hierarchical analysis strategy and a robust clustering algorithm to identify metastable states and transition network from the trajectory-mapped vectors in general. The state-searching process is now fully automated, and the complex transition network can be easily constructed accordingly in polypeptide. We also briefly discuss the relation between TM and the MSM-like methods.

Materials and Methods

Overview of trajectory mapping

Trajectory mapping (TM) is an analysis framework to identify metastable states from simulation data and to construct the transition network between the states. The TM maps molecular dynamics (MD) trajectories or trajectory pieces with approximately equal length τ to high-dimensional vectors, (1) where the components of the mapped vectors are the average values of conformational functions {A^μ(q)} (named as basis functions) in the trajectory pieces. Âμ(q)i=1τ0τÂμ(qi(t)dt. q denotes the conformational coordinates of the simulated system, such as the spatial positions of all atoms. qi(t) means the ith trajectory piece within the time interval t ∈ [0, τ]. Here the first basis function A^0(q)1 was explicitly written, all the other basis functions {A^μ(q)},μ=1,,n are applied to describe (interested) conformational motions.

In the TM, the basis functions are orthonormalized each other under a reference distribution Pref(q), (2) Here ⟨⋯⟩ref represents the average over Pref(q), which is estimated in the corresponding finite-size sample. We could choose all of the sampled conformations in these trajectory pieces, or a relevant part of these conformations as the reference sample. It is easy to linearly combine the preselected basis functions to form a set of orthonormalized basis functions satisfied Eq (2) by standard methods such as the Gram-Schmidt process, or PCA.

Aggregations of the trajectory-mapped vectors are found to correspond to metastable states in the previous work [20]. In simpler cases where only a few (ns) metastable states exist, we reduce the mapped vectors into a low (nd = ns−1) dimensional space by PCA, then directly identify the aggregated clusters as metastable states. However, in peptide or protein systems, there are usually lots of metastable states in various (and not well separated) time scales, more systematical implementation of the TM and robust clustering algorithms are needed.

Mathematical and physical meaning of TM

Before introducing details of the improved TM, we discuss the mathematical and physical meanings behind the TM, on such as basis functions, reference distribution, the PCA reduction, clustering of mapped vectors, and the identification of transition events.

Basis functions.

Basis functions should be chosen to identify typical conformational motions of systems. Some physical quantities, such as, in protein, the torsion angles of backbone, distances of residue pairs, number of native contacts, root mean square deviation from some particular conformations, hydrogen bonded energy, solvated energy, etc., are good candidates of basis functions. In addition, since we usually focus on large-scale conformational motions, some fast degrees of freedom, such as hydrogen atoms, bond oscillation, etc., are usually excluded as basis functions. It is more efficient to select functions in coarse-grained conformational space as basis functions. More discussions about basis functions can be found in our previous works [19, 20, 48], or in some current approaches of MSM, such as tICA and the variational approach [4447] where basis functions are similarly selected to expand the dynamics propagator.

Similarity of trajectory pieces.

We define the overlapping integral of two probability density functions Pi(q) and Pj(q), (3) where Pref(q) is a reference probability density function. Although the overlapping integral may be sensitive to Pref(q), some qualitative results, such as the zero value of the overlapping integral means no overlapping, is not dependent on the selection of Pref(q). We usually choose Pref(q) including both Pi(q) and Pj(q) to make the definition be reasonable. In practical application, the integral is usually estimated by finite-size samples of these probability distributions rather than their analytical formulas. Since (4) the inner product of trajectory-mapped vectors, which defined as (5) is a good estimate of the overlapping integral. Here we include the first trivial basis function A^0(q)1 and require {A^μ(q)} satisfies Eq (2).

We further define the scaled inner product (SIP), (6) Here v^ is the unit vector of v. Therefore, while i and j correspond to two trajectory pieces which visit in the same metastable state and reach local equilibrium inside the state, their SIP is almost unity. Conversely, if the trajectory pieces i and j visit two complete different conformational regions without any overlapping, their SIP is almost zero. The value of SIP between zero and unity corresponds to the fact that the trajectories partially overlap in conformational space. In practice, although it is possible the SIP is slightly smaller than zero due to the finite sizes of samples and finite basis functions, it provides a good measure about similarity of trajectories. In the paper, we use the SIP (or it corresponding distance such as d=2(1SIP)) rather than the usual Euclidean distance de=|vi-vj| to measure the similarity of trajectory pieces. It is one of key points in the improvement of the TM.

Reduction of trajectories.

There are closely relation between the TM and the variational approach [46] and the tICA [44]. The variance-covariance matrix element of {vi} is (7) Here Cμν(t)=1τt0τtdt11miA^μ(qi(t1))A^ν(qi(t1+t)) is nothing else but the time correlation in the variational approach [46], where the first (left or right) eigenvectors of the cross time-correlation matrix of basis functions correspond to slow dynamics modes (i.e., the transitions between metastable states). The eigenvalues are expected to be single-exponential decay functions of time, while the basis functions are orthonormalized under the equilibrium distribution Peq(q). In the TM, we do not require to apply Peq(q) as the reference, and the variance-covariance matrix of trajectory-mapped vectors is a kind of average of the time correlation matrix. Although the principle components may not directly give slow modes, they well distinguish metastable states then provides the slow modes of system.

Clustering trajectories to states.

As we already mentioned, a conformational region is a metastable state if the local equilibrium time inside the region, τeq, is smaller than the life time of trajectory inside the region, τlife. We might measure the metastability of a state by the two times, such as κ=τlifeτeq. In the TM, we map trajectory pieces with the length τ then cluster them as metastable states, thus some states can be found if they satisfied the condition, (8) On the one hand, if ττlife, the τ−length trajectory pieces have significant possibility stay inside this state for identifying. Otherwise, trajectory pieces could only partially stay inside the state, thus no such a cluster corresponding to the state could be found. For these states, we can cut trajectories into shorter pieces (smaller τ) to make them be visible. On the other hand, τeqτ is a more basic condition in the TM, which ensures that all τ-length trajectories inside the state are mapped in the same cluster.

In realistic systems, there are usually lots metastable states with wide-distributed τeq and τlife. It is not easy to find all of them in a single τ. In this work, we first find large-size clusters at large τ, which obviously correspond to metastable states, then we cut the remaining trajectory pieces shorter and repeat to find large-size cluster as states, until most of data are identified or the remained trajectory pieces are too short.

Identify transition events.

After finding metastable states, we can further translate simulation trajectories to state-indicator curves. These curves give the states that individual conformations (or a few successive conformations) located in. Concretely speaking, given ns identified metastable states whose mapped vectors based on Eq (1) are denoted {vαs},α=1,,ns, a simulation trajectory i can be transformed into ns state-indicator curves, {f(t)}, (9) Here, v^αs is the unit vector of vαs, and v^i[tΔt,t+Δt] denotes the unit vector mapped from the conformations of the ith trajectory within the time interval [t−Δt, tt]. If Δt → 0, only the individual conformation qi(t) (the conformation of the ith trajectory at time t) is considered. Using finite Δt, the statistical noise in the state-indicator curves could be depressed. Ideally, f(t) should be either zero or unity, i.e., f(t) ≈ 1, if qi(t) ∈ Sα, otherwise f(t) ≈ 0. Here Sα represents the state α. Therefore, the transition events between metastable states can be identified from the state-indicator curves.

The systematical implementation of TM

The TM algorithm.

We summarize algorithm of the TM as,

  1. Choose a set of conformational functions and a reference sample, then form the othonormalized basis functions {A^μ},μ=1,,m.
  2. Map trajectory pieces with length τ to vectors {vi}, and reduce the mapped vectors by PCA.
  3. Group the mapped and reduced vectors by a clustering algorithm and identify larger cluster as metastable states.
  4. Cut trajectory pieces which are not identified yet to shorter pieces, repeat the step 1 to 3 until the remained trajectory pieces are sufficient short.
Here, it is allowed to reset basis functions and the reference sample while varying time scale τ to better focus on the remaining simulation data.

The clustering algorithm.

We briefly summarize the clustering algorithm as,

  1. The trajectory-mapped vectors are grouped into clusters if their SIPs are larger than rl.
  2. A cluster is identified as a metastable state if its size (number of vectors inside) is larger than Nnestd.
Here we use the SIP defined in Eq (6) to measure the similarity of trajectory pieces. The SIP is almost within [0, 1] while sufficient basis functions are applied. It closes to unity if trajectories visit same conformational region, but zero while visiting completely different regions. Thus it is easy to set criterions to judge if two trajectories are similar. In this paper, rl is set as 0.95, Nnestd=5. The threshold of cluster size is used to exclude occasional concentrations of trajectory-mapped vectors. For example, we might generate two trajectory pieces which visit two metastable states occasionally with similar fractions in the two states, their conformational distributions are similar then the two pieces are mapped in one cluster. However, the probability to generate many trajectories with similar distributions but not in a single metastable state is small. The application of a larger threshold can depress the misjudgement while it might miss some metastable states. Since we will cut the non-identified trajectories into short pieces and repeat the clustering and state-identification process, the missed states will be found in the shorter time scales. Therefore, although the found states in each special τ may be dependent on the parameter Nnestd, the final results of TM is not sensitive to that. In practical application, some additive judgements and tricks are also used to refine results. These details are listed in S1 Text of Supporting Information. We also illustrate the clustering algorithm in an imaginary models in Supporting Information as S1 Fig.

The hierarchical analysis strategy.

Polypeptides are quite heterogeneous systems. There are many metastable states with various τlife and τeq, and there there could also be some sub-states inside states. The versatile stability and complex interrelation between metastable states reflect the hierarchical structure of a protein’s conformational space. In view of this, we designed a hierarchical analysis strategy and illustrate it with an imaginary example in Supporting Information, see S2 Fig.

Simulation and analysis details

In the paper, we apply the TM in alanine-dodeca-peptide [Ala12], a polypeptide composed of 12 alanine residues. The simulation is performed with TINKER4.2 package using OPLSUA force field and GB/SA implicit solvent model [49]. Charged termini in Ala12 are used, which leads to versatile metastable structures [35]. The conformations are recorded every 0.5 ps. In the previous work [20], we studied this molecule with 1000 20-ns length simulation trajectories. We found that most of the identified metastable states correspond to β-hairpin/coil conformations, and α-helix conformation is less stable then β-hairpin/coil conformations using current force field, which is consistent with previous experimental and theoretical results [50, 51]. Owing to the limited simulation length of each trajectory and the tentative clustering algorithm, previously we did not globally analyze the system, but only focused on some local structures instead. In this work, five 4μs-length simulation trajectories were generated, one of the five trajectories is spawned from α-helix conformation, and all the others were initiated from β-hairpin/coil conformations to reflect the relative importance of these conformations.

We select the functions of backbone ϕ and ψ angles as basis functions. Here ϕ is defined as the backbone dihedral angle around the bond connecting Cα and N atoms, ψ is defined as the backbone dihedral angle around the bond connecting Cα and carbonyl carbon atoms. There are 22 ϕ or ψ angles in Ala12. These angles fully account for the backbone flexibility of this molecule. They are transformed into basis functions using the following two-dimensional trigonometrical functions. (10) Here x and y are two angles measured in radius. m and n are non-negative integers. We define the summation of m and n in Eq (10) as the order of these functions, and use the one-to-two order functions in analysis. Only the correlation between sequentially neighboring dihedral angles are modeled by the basis functions. Therefore, 172 basis functions are finally included in analysis. 88 of them are functions of single dihedral angles and the remaining 84 ones are functions of neighboring dihedral angles. For peptide system, this set of basis functions is already enough for a reasonable estimation in the TM [19, 20]. It should be noted that it is possible to select functions of other degrees of freedom (such as inter-atomic distances) or of carefully chosen collective variables in analysis, similar results could be obtained. The backbone dihedral angle is a simple and natural choice for describing the global conformational motions of peptides [52].

We performed the hierarchical analysis at three timescales. The trajectories are first truncated to 100 200ns-length trajectory pieces. After clustering, the trajectory pieces that are not allocated to any metastable state are truncated to 20ns-length, the shorter pieces that are not overlapping to existing metastable states are picked out for next round of clustering. The non-allocated ones in this round of clustering are truncated to 2ns-length. Then, the 2ns-length trajectory pieces that are not overlapping to existing metastable states are kept for the final round of clustering.

Results

The metastable states of Ala12

We first examined the convergence of the five 4μs-length simulation trajectories. We calculated the similarity between the conformational distributions of these long trajectories, i. e. the scaled inner product (SIP) defined in Eq (6). The results are shown in Supporting Information as S3 Fig. Although every simulation trajectory partially overlaps with some others, there do not exist two simulation trajectories very similar to each other such that their SIP is close to one. Therefore, the SIP measure clearly shows that none of the long simulation trajectories has reached the global equilibrium. We need to combine the information in these trajectories to get a synthesized picture of the system by the TM.

Metastable states.

Through hierarchical analysis at three levels, 28 states were automatically found by the TM. We first found 2 states at 200ns timescale, then 11 states at 20ns timescale from the remained parts of trajectories, finally, 15 states at 2ns timescale. These states are further refined to ensure that the SIP values between different states are almost zero (smaller than 0.01 in the current analyses). The identified metastable states are orthogonal to each other, i.e. different states are not overlapping in conformational space, indicates our basis functions is sufficient to completely distinguish these states. The SIPs among states are shown in Supporting Information as S4 Fig.

Transitions between states.

We can get the state-indicator curves by projecting the simulation trajectories to the 28 states. A set of representative state-indicator curves are plotted in Fig 1. The others are shown in Supporting Information as S5, S6, S7 and S8 Figs. There are totally 28 curves, each addresses the occupation timing and fraction of the third 4μs trajectory in a metastable state. As we mentioned, if the simulation trajectory stays in certain state around certain time, the state-indicator curve of this state should take a value close to 1.0 at this moment. Thus, it could be deciphered from Fig 1 that the 4μs trajectory started off from state S8, after traveling around other 10 states, it finally entered state S2 and stayed there for the last 2μs, which indicates the lifetime of state S2 is at least in μs-scale. We also provide the enlarged view of the parts of trajectory from 0ns to 350ns and from 750ns to 950ns in Fig 2. During the first period, the trajectory quickly traveled among state S8, S21 and S23. During the second period, the trajectory jumped between state S9 and S15. It can be seen that the state indicator curves usually show step-like behavior, jumping between value 0 and 1, which shows that the transitions between states are quite fast compared to τeq and τlife. There also exist some regions in which the state-indicator curve takes value between zero and one, for example, see the curve of S9 in Fig 2(b). In that case, the trajectory may enter a conformational region which does not have a good metastability then is called as a diffusive-like region. Sometimes all the state-indicator curves take almost zero value shortly, which indicates that there are local unidentified regions which could either be metastable or diffusive. The above-mentioned abnormality of state-indicator curves is understandable considering the complexity of bio-molecules’ conformational space.

thumbnail
Fig 1. The 28 state-indicator curves of the 3rd 4μs-length trajectory.

The curves are divided into three groups according to the identified timescale of corresponding states. The states are numbered according to the sequence of finding.

https://doi.org/10.1371/journal.pone.0125932.g001

thumbnail
Fig 2. The detailed view for some of the state-indicator curves in the trajectory.

https://doi.org/10.1371/journal.pone.0125932.g002

We list the number of trajectory pieces used for defining the metastable states in Supporting Information, see S1 Table. For each state, we also list the average SIP value between the defining trajectory pieces and the representative vector of the state. An average SIP value close to 1.0 indicates that the trajectory pieces in a state resemble each other quite well. To calculate the average SIPs, Pref(q) is selected as the equal weight linear combination of the identified metastable states. In S2 Table of Supporting Information, we show the proportion of conformations which are identified as metastable states in the five 4μs simulation trajectories. More than 90 percent of the simulation data is found to stay in the identified metastable states, which suggests the remarkable metastability of Ala12. Among the five simulation trajectories, the fifth trajectory is least accounted by metastable states, this is because the final part of this trajectory entered into a large region with a few metastable states and some other small regions with prominent diffusive behavior inside (i.e., κ = τlife/τeq is not obviously larger than unity).

Metastability of states.

We also tested whether the 28 metastable states satisfy the assumption τeq < τlife. To estimate the τlife of a metastable state, we picked out all the trajectory pieces that continuously stay in this state, and took their average length as an estimation. To estimate the τeq of a metastable state, we calculated the relaxation behavior of the τ-length trajectory pieces defining the state. Concretely speaking, for each trajectory piece i defining the state Sα, the SIP between the conformations in Sα and the conformations of the beginning u-length part of trajectory i, are calculated for u ∈ [0, τ]. The SIP should be small when u is close to zero. Meanwhile, as increasing u to approach to τeq, it approaches to 1 within statistical error. Such this kind of SIP curve illustrates the relaxation of a trajectory to the local equilibrium inside a state. We plot some SIP curves for the states S1, S2, S3 and S14 in Fig 3. The SIP curves are fitted with the stretched exponential model [53]. (11) For each trajectory piece, an estimation of τeq can be obtained by (12) where Γ(x) is the Gamma function. Averaging the τeqest values of the trajectory pieces defining a state leads to the estimation of τeq of that state. The final results of τeq and τlife are shown in Fig 4. The relation τeq < τlife is indeed satisfied within statistical error.

thumbnail
Fig 3. The equilibration process in metastable states.

In each panel, different color represents different trajectory pieces used to estimate the equilibration process. The solid lines are calculated with simulation data, the dotted lines are the fitted stretched exponential curves.

https://doi.org/10.1371/journal.pone.0125932.g003

thumbnail
Fig 4. Comparison between τlife, τ and τeq.

The blue symbols (squares for the ones with error bar, stars for the ones without error bar) denote the estimated τlife of the metastable states. The green crosses denote the identified timescale of the states. The red diamonds denote the estimated τeq. The error bars are estimated where possible. The dotted lines are just for aiding the inspection.

https://doi.org/10.1371/journal.pone.0125932.g004

Test of the inner product estimate.

We further demonstrate the validity of Eq (5). The evolution of probability distribution are often thought to follow the multiple-dimensional Fokker-Planck Equation, P(q,t)t=LP(q,t), where L is the Fokker-Planck operator. We have (13) where {ϕn(q)} is orthonormalized, i.e., ∫ϕn(q)ϕm(q)dq = δn, m, and ∣ϕ0(q)∣2 = Peq(q), the equilibrium distribution. The non-negative {λn} is sorted from small to large, λ0 = 0. The expansion coefficient {Cn} is determined by the initial distribution P(q, t = 0). We define the average distribution (14) If choosing Pref(q) = Peq(q), it is straight forward to show the overlapping integral defined in Eq (3) (15) As t increases, the exponential terms in Eq (15) decay to zero quickly, and ⟨Pavr(q, t)∣Pavr(q, t)⟩−1 will be proportional to 1/t2.

Due to the lack of global equilibrium sample, we focus on the local equilibrium inside the state S1, and use the local equilibrium sample as the reference. Since Eq (15) is related to the time relaxation of Pavr(q, t), we truncate the trajectory pieces defining S1 to even shorter pieces and estimate the relaxation of the ensemble of the short trajectories. Short trajectories of lengths 100ns, 50ns, 30ns, 20ns, 10ns and 5ns are analyzed. In these cases, we have 52, 104, 156, 260, 520 and 1040 pieces in the trajectory ensemble respectively. The shorter the truncated trajectories, the initial distribution P(q,0) of the ensemble of trajectories is more similar to the local equilibrium distribution of S1. The relaxation behavior of ⟨Pavr(q, t)∣Pavr(q, t)⟩−1 estimated by Eq (5) is plotted in Fig 5. Apparently, all the curves shown in Fig 5 become proportional to 1/t2, consistent with the theoretical result. Besides, the timescale at which the crossover to 1/t2 behavior happens is also consistent with the estimated τeq of S1, see Fig 4. Therefore, the sample estimation of the inner product between two conformational functions, see Eq (5), is promising.

thumbnail
Fig 5. The scaling behavior of Pavr(q,t)|Pavr(q,t)-1 versus t.

https://doi.org/10.1371/journal.pone.0125932.g005

The transition network and polypeptide folding

Based on the identified metastable states and the state-indicator curves, we derived out a transition network and plotted it in Fig 6. The metastable states are shown with circles of various size and color, they are connected to each other according to their transition relation. It should be noted that the transition relation illustrated here is inferred from the simulation data. If the transition between two states only happens in one direction in our finite-time simulation, the two states will only be connected by single-directional arrow. Thus the transition network may lack detailed-balance property due to finite sampling. Still, to get a qualitative picture, we estimated the rates of the observed transitions as follows. For each state Si, we estimated its lifetime τlifei. Suppose the jump from Si to another state Sj happened for Nij times, the kinetic transition rate in this direction could be estimated by (16) The different line styles for the transition arrows in Fig 6 illustrate the magnitude of non-zero transition rates.

thumbnail
Fig 6. The transition network. Each node represents a metastable state.

The states are colored according to their identified timescales, and their sizes are determined by the estimated τlife. The transitions between states are plotted with different line styles according to the estimated transition rates. The classification of the 28 states is also shown.

https://doi.org/10.1371/journal.pone.0125932.g006

Simplified picture in μs scale.

According to the transition relation, we roughly partition the 28 states into 4 groups, G1, G2, G3 and G4. The composition of the groups can also be found in Fig 6. In simulation, all the transitions between the states in different groups are found to be single-directional. Concretely speaking, we can only find the transitions from the states in G1 to the states in G3, from the states in G2 to the states in G3 and from the states in G3 to the states in G4, while all the reverse transitions didn’t show up. However, the states in the same group are kinetically closely related to each other. Except for S12 in G3 and S28 in G4, we can find transition routes of reverse directions between any two states in the same group. Meanwhile, S12 and S28 are actually small intermediate states for the transitions from G1 to G3 and from G3 to G4, respectively. Thus, the whole picture of the 28 states looks quite like the downhill folding of protein. The states in G1 and G2 represent two different kinds of denatured states, with a few μs occupying time. The states in G3 represent the intermediate states of folding. G4 seems the end of simulation dynamics. System enters the region and stays there more than 2.8μs without leaving. Although it is not clear if G4 is still only an intermediate region in much longer simulation, in the paper, we name the states in G4 might constitute the folded-state-like ensemble, partially because the conformational structure looks like the folded one.

The inner structure of two long lifetime states.

The sub-states of the 200ns-order metastable states S1 and S2 in Ala12 can be found at nanosecond or sub-nanosecond scales. With τ = 2ns, we found four sub-states for both S1 and S2. The sample state-indicator curves of the sub-states of S1 and S2 are shown in Supporting Information, S9 and S10 Figs, respectively. (In the pictures, we use Sa(b) to denote the bth sub-state of state Sa.) Compared with S6 Fig, the state-indicator curves of the sub-states indeed reflect the detailed inner-state dynamics of S1 and S2. Meanwhile, the state-indicator curves of the sub-states of S1 show much more prominent roughness than the ones of S2, which reflects the more diffusive nature of S1. The fast and sharp transition between the sub-states of S2 is consistent with our previous results [20]. The representative structures of the sub-states and their inner-relation are shown in S11 and S12 Figs of Supporting Information.

The persuit of reaction coordinate.

Since we have made an analogy of the 28-state transition network to the downhill folding process, it is natural to ask whether there exists certain collective variable that can be used as the reaction coordinate of the system. In Fig 7, we plotted the probability distributions of the 28 metastable states along six collective variables. It can be seen that, while all the states have similar total energy distributions [see Fig 7(a)], their solvation energy distributions show conspicuous heterogeneity [see Fig 7(b)]. The folded-like states (states in G4) have relatively low solvation energy. On the contrary, the intermediate states (states in G3) have relatively high solvation energy. The unfolded states (states in G1 and G2, or might be denoted as partially folded states more exactly) have their solvation energy distributed in between. Therefore, in the “folding” process, the unfolded states are first transformed into more compact form such that the contact between the backbone polar residues and water is reduced. After that, the polar residues are released to solvent again, and the folded-like ensemble is stabilized by solvation energy. Although the solvation energy provides a qualitative standard to differentiate the states of different identities, it is not qualified to be a reaction coordinate. For one thing, the states in different groups have overlapping distributions of solvation energy. For another, the transition from unfolded states to folded states is not monotone along the solvation energy axis.

thumbnail
Fig 7. The distributions of the 28 states along various collective variables.

The selected collective variable include total energy of the system (a), the solvation energy (b), the distance between the two ends of the peptide (c), the RMSD relative to a representative conformation in S4 (d), the first (e) and the second (f) principle component of dihedral angle principle component analysis [22].

https://doi.org/10.1371/journal.pone.0125932.g007

We also tested two commonly used reaction coordinates, the end-end distance between the two ends of Ala12 and the root mean square deviation (RMSD) to S4, where S4 is one of the states in the folded-like ensemble G4. As can be seen from Fig 7(c), the folded states have prominently larger end-end distance than most of the intermediate and unfolded states. Their distributions along the end-end distance axis also show much more variety than the others. On the contrary, the overlapping end-end distance distributions for most of the intermediate and unfolded states suggest that the two terminals of the molecule are closely restrained together in these states. The charged terminal adopted in current simulation help to stabilize such a close end-end contact. Although the end-end distance provides a clear separation of the folded state ensemble and the other states, it still can not provide the correct picture of the reaction process. Besides, the separation is probably owing to the nature of this system and can not be generalized. As shown in Fig 7(d), the RMSD to S4 seems more promising. The states in different groups are partially separated along the RMSD axis. Besides, the folding process happens with decreasing RMSD value. However, there still exist overlapping states from different groups. Thus, projecting the simulation samples to the RMSD axis may also lead to mis-interpreted kinetics.

Finally, we tested the principle components from dihedral angle PCA [22]. The distributions of the 28 states along the first and the second principle components are shown in Fig 7(e) and 7(f) respectively. Due to the strong overlap between states, the principle components can not help to clearly dissect the system into folded, intermediate and unfolded conformational ensembles.

In summary, we found that even for the simple system of Ala12, the state structure in conformational space is very complicated. It is very hard to select a single reaction coordinate to precisely reflect the complexity of the system. Multiple reaction coordinates are usually necessary. In the simple system, we may use two or three well-chosen reaction coordinates to distinguish all the metastable states, or one reaction coordinate may be sufficient in describing the transitions among a part of the metastable states, but generally, the network model shows its superiority for describing the complexity of bio-molecular systems.

The folded-like ensemble.

Now we focus on the folded-state ensemble, i.e. the states in G4 group. All the seven states in G4 lie in the end portion of the fifth 4μs trajectory. The trajectory first entered S28. After a short stay, it moved to a free energy basin containing S4, S7, S16, S17, S20 and S25, and jumped fast between these six states until the end of the simulation. The inter-state transitions between the six states are shown in Fig 8(a). This period lasts for 2.8 μ s. We found there is about 20 percent conformations of the 2.8 μ s trajectory unaccountable by the six metastable states, where none of the state-indicator curves of S4, S7, S16, S17, S20 and S25 is considerably larger than zero [see Fig 8(b)]. Of course, it is possible that we failed to find all the metastable states in this region. However we have tried to use different length τ of trajectory pieces to look for more detailed state structures, and there is no qualitatively change of the results. So it seems that the conformational region shows obvious diffusive behavior out of the six well-defined states. The representative structures of the six states also support the diffusive behavior inferred from the state-indicator curves. As shown in Fig 9, the six representative structures from these states have similar C-terminal structures and versatile N-terminal structures. In all the states, the C-terminal of Ala12 form stable hydrogen bonds with the amine bases in the middle of the chain. Meanwhile, the N-terminal chain is not confined by strong intra-molecular interactions. Such a flexible N-terminal leads to the diffusive-like property. Considering the fast inter-state transition between S4, S7, S16, S17, S20 and S25, it seems that these states as well as the outside diffusive regions connecting them constitute a large metastable state. Actually, the SIP between the first half and the second half of the 2.8 μ s-length trajectory has reached 0.88, which suggests the similarity between the two halves and consequently the local equilibration in this part of simulation trajectory. If the above guess is true, the τeq of the large metastable state should be several micron-second long, which is three orders of magnitude larger than the τeq of its sub-states. The separation of the equilibration timescales between a state and its sub-states generally exists. We also tried to find the sub-states of S1 and S2 with the TM method. The sub-states can only be found at nano-second or sub-nano-second scales (see the supplementary material). Actually, only when there is timescale separation, the τlifes of the sub-states could be prominently smaller than the τeq of the host state, which ensures the consistency of the identified metastable states, see Eq (8).

thumbnail
Fig 8. The state-indicator curves of S4, S7, S16, S17, S20 and S25 along the 5th trajectory.

(a) shows the last 2.8 μ s. (b) provides an enlarged view from 3.1 μ s to 3.4 μ s.

https://doi.org/10.1371/journal.pone.0125932.g008

The characterization of conformational dynamics.

For the transition network established by the TM, the local equilibrium sample of the states are obtained. Therefore, we can calculate the average value of any physical quantity within the states. This information could help to characterize the conformational dynamics of inter-state transitions. We considered the 22 flexible backbone dihedral angles used to define the basis functions in the TM. For each state, we calculated the averaged sine and cosine functions of these dihedral angles, and aligned the 44 values sequentially to form a vector. This vector characterizes the conformations of corresponding state. When comparing the difference between vectors in different states, some clues of transition dynamics could be obtained.

For example, we analyze the three states S3, S9 and S15 as a transition cycle in the transition network (see Fig 6). As shown in Fig 10, upper left panel, the transition between S3 and S9 is mainly induced by the twisting of the N-terminal backbone dihedral angles 1N−1Cα−1C′−2N, 1C′−2N−2Cα−2C′ and 2N−2Cα−2C′−3N, as well as the minor adjustment in the middle of the chain. (Here the name of an atom is composed of two parts, the integer number indicates the residue number, and the letters concretely provide the identity of the atom in a residue. Here N corresponds to the backbone nitrogen atom, Cα corresponds to the α carbon atom, C′ corresponds to the carboxyl carbon atom.) When comparing S3 and S15 (see Fig 10, lower left panel), we found that the difference between S3 and S9 is still preserved, and there is additional major difference at dihedral angles 5N−5Cα−5C′−6N and 5C′−6N−6Cα−6C′. This result suggests that the transition between S9 and S15 is only induced by the local adjustment of these two dihedral angles, which explains the fast transition between the two states shown in Fig 2(b). Meanwhile, the transition between S3 and S9 as well as that between S3 and S15 are more likely to be induced by the collective motion of the whole system. Usually it is hard to figure out the dynamic modes of a complex system by direct visual inspection. The difference graph shown in Fig 10 makes the dynamic modes directly observable, no matter these modes are localized or collective.

thumbnail
Fig 10. The difference graph for characterizing transition dynamics.

The left panel shows the representative structures of states S3, S9 and S15 and the transition relation between these states. The difference graphs in the right panel illustrate the conformational transition between S3 and S9 (upper), as well as between S3 and S15 (lower). As introduced in the main text, the conformations in a metastable state can be characterized by a vector. The elements of the vector are the sine and cosine values of backbone dihedral angles averaged among the conformations in that state. In each difference graph, the horizontal axis marks the 22 backbone dihedral angles of Ala12. The S3S9 graph shows the vector of S3 minus the vector of S9. The S3S15 graph shows the vector of S3 minus the vector of S15.

https://doi.org/10.1371/journal.pone.0125932.g010

Discussion

The trajectory mapping (TM) method and its systematical implementation developed here has wide applicability to data mining of all varieties. The metastable states of bio-molecules as well as their hierarchical organization can be systematically extracted from simulation data. Applying the TM in the long simulation data of alanine-dodeca-peptide, 28 metastable states with various life time and equilibration time were identified. These heterogeneous states could account for more than 90 percent of simulation data, which illustrates the impressive metastability of the model system.

A transition network was established and compared to the downhill folding process of protein. We found that even for this simple model system, there is considerable overlap between metastable states along the commonly used reaction coordinates. Therefore, simply projecting the simulation data to low-dimensional space might unavoidably introduce some artifact in kinetics. Such a finding testifies again the superiority of the transition network representation of bio-molecules. Since the TM also provides the local equilibrium sample of states, it is also possible to figure out the dynamic modes of inter-state transitions.

Benefiting from the rapidly increasing computational power, people are collecting massive detailed simulation data of bio-molecules. Careful analysis of these data can provide a lot of insightful information about the organization style of bio-molecules, which may greatly facilitate the rational engineering of life materials. The transition network representation has been designed to coarse-grain the dynamics of complex bio-molecules, and some methods have been subsequently invented to establish the network from high-dimensional simulation data. Since the temporal information of analytical basis functions is incorporated in the TM, we can directly find the local equilibrium sample of metastable states, which on the one hand facilitates further usage, and on the other hand ensures that the identified states are physically meaningful. The implementation of TM is flexible. It allows researchers to focus on part of the system by only selecting basis functions related to the interesting region.

Applying the TM to existing massive simulation data of proteins is currently ongoing. We would like to mention that although we only focus on the simulation data in this paper, theoretically the SMF data can also be analyzed by the TM. Recently, there have been attempts to simultaneously measure multiple intra-molecular distances in SMF experiment. TM is especially appropriate for handling such kind of multiple-dimensional data.

Supporting Information

S1 Fig. The illustration of the current clustering algorithm in TM.

The trajectory-mapped vectors of an imaginary three-state system are projected to a two-dimensional space. In clustering process, the points in white region will be considered for further clustering, and the ones in shaded region have already been analyzed and will not be considered further.

https://doi.org/10.1371/journal.pone.0125932.s001

(TIF)

S2 Fig. The illustration of the hierarchical analysis scheme.

The left panel of (a), (b) and (c) show the state structure in conformational space at three different levels. The size of a state is determined by its τlife. The transition relation between states is plotted with dotted arrows. The right panel of (a), (b) and (c) show the inter-state transition curve at three different levels. ‘O’ denotes the non-identified regions in simulation trajectory. (d) shows the final picture of the conformational space after identifying the sub-states of S1 and S2.

https://doi.org/10.1371/journal.pone.0125932.s002

(TIF)

S3 Fig. A representative conformation of Ala12 (a) and the orthogonality (SIP) between 4 μ s-length simulation trajectories (b).

Shown are the SIP values without absolute-value manipulation.

https://doi.org/10.1371/journal.pone.0125932.s003

(TIF)

S4 Fig. The orthogonality (SIP) between identified metastable states in Ala12.

The states are found respectively at three levels, 200ns, 20ns and 2ns. (a), (c) and (e) show the SIP values between states found in the same level. (b), (d), (f) show the SIP values between states found in different levels. Shown are the SIP values without absolute-value manipulation.

https://doi.org/10.1371/journal.pone.0125932.s004

(TIF)

S5 Fig. The 28 state-indicator curves along the 1st 4 μ s-length trajectory.

https://doi.org/10.1371/journal.pone.0125932.s005

(TIF)

S6 Fig. The 28 state-indicator curves along the 2nd 4 μ s-length trajectory.

https://doi.org/10.1371/journal.pone.0125932.s006

(TIF)

S7 Fig. The 28 state-indicator curves along the 4th 4 μ s-length trajectory.

https://doi.org/10.1371/journal.pone.0125932.s007

(TIF)

S8 Fig. The 28 state-indicator curves along the 5th 4 μ s-length trajectory.

https://doi.org/10.1371/journal.pone.0125932.s008

(TIF)

S9 Fig. The state-indicator curves of the sub-states of S1 in Ala12 along the 2nd 4 μ s-length trajectory.

The upper panel shows the full curves. The lower panel shows the enlarged view of the region from 3.1 μ s to 3.3 μ s.

https://doi.org/10.1371/journal.pone.0125932.s009

(TIF)

S10 Fig. The state-indicator curves of the sub-states of S2 in Ala12 along the 2nd 4 μ s-length trajectory.

The upper panel shows the full curves. The lower panel shows the enlarged view of the region from 0.0 μ s to 0.2 μ s.

https://doi.org/10.1371/journal.pone.0125932.s010

(TIF)

S11 Fig. The representative structures of the sub-states of S1 and their inter-relation.

The left panel shows the representative structures, the right panel shows the difference graphs between sub-states. The shown graphs are selected to reflect the most localized differences between the sub-states.

https://doi.org/10.1371/journal.pone.0125932.s011

(TIF)

S12 Fig. The representative structures of the sub-states of S2 and their inter-relation.

The left panel shows the representative structures, the right panel shows the difference graphs between sub-states. The shown graphs are selected to reflect the most localized differences between the sub-states.

https://doi.org/10.1371/journal.pone.0125932.s012

(TIF)

S1 Table. The number of trajectory pieces defining the metastable states and the average SIP values.

https://doi.org/10.1371/journal.pone.0125932.s013

(PDF)

S2 Table. The proportion of data accountable by the identified metastable states.

At certain time point, the simulation trajectory is considered as accountable by the identified metastable state only if the summation of state-indicator curves at this time point is larger than 0.9.

https://doi.org/10.1371/journal.pone.0125932.s014

(PDF)

Acknowledgments

The authors thank Yu-Kun Lu, Neng-Ji Zhou, and Prof. Bo Zheng for useful discussions.

Author Contributions

Conceived and designed the experiments: LG XZ ZO. Performed the experiments: LG XZ. Analyzed the data: LG XZ. Contributed reagents/materials/analysis tools: LG XZ. Wrote the paper: LG XZ ZO.

References

  1. 1. Fersht AR. From the first protein structure to our current knowledge of protein folding. Nat Rev Mol Cell Biol. 2008; 9: 650–654. pmid:18578032
  2. 2. Schuler B, Eaton WA. Protein folding studied by single-molecule fret. Curr Opin Struct Biol. 2008; 18: 16–26. pmid:18221865
  3. 3. Jayachandran G, Vishal V, Pande VS. Using massively parallel simulation and markovian models to study protein folding: Examining the dynamics of the villin headpiece. J Chem Phys. 2006; 124: 164902.1–12.
  4. 4. Noe F, Schutte C, Vanden-Eijnden E, Reich L, Weikl TR. Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations. Proc Natl Acad Sci USA. 2009; 106: 19011–19016. pmid:19887634
  5. 5. Huang X, Bowman GR, Bacallado S, Pande VS. Molecular simulation of ab initio protein folding for a millisecond folder ntl9(1–39). J Mol Biol. 2010; 132: 1526–1528.
  6. 6. Best RB, Hummer G, Eaton WA. Native contacts determine protein folding mechanisms in atomistic simulations. Proc Natl Acad Sci USA. 2013; 110: 17874–17879. pmid:24128758
  7. 7. Lindorff-Larsen K, Trbovic N, Piana PMS, Shaw DE. Structure and dynamics of an unfolded protein examined by molecular dynamics simulation. J Am Chem Soc. 2012; 134: 3787–3791. pmid:22339051
  8. 8. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, et al. Atomic-level characterization of the structural dynamics of proteins. Science. 2010; 330: 341–346. pmid:20947758
  9. 9. Hartl FU, Hartl MH. Converging concepts of protein folding in vitro and in vivo. Nat Struct Mol Biol. 2009; 16: 574–581. pmid:19491934
  10. 10. Brockwell DJ, Radford SE. Intermediates: ubiquitous species on folding energy landscapes? Curr Opin Struct Biol. 2007; 17: 30–37.
  11. 11. Bartlett AI, Radford SE. An expanding arsenal of experimental methods yields an explosion of insights into protein folding mechanisms. Nat Struct Mol Biol. 2009; 16: 582–588. pmid:19491935
  12. 12. Marsh JA, Neale C, Jack FE, Choy WY, Lee AY, Crowhurst KA, Forman-Kay JD. Improved structural characterizations of the drkN SH3 domain unfolded state suggest a compact ensemble with native-like and non-native structure. J Mol Biol. 2007; 367: 1494–1510. pmid:17320108
  13. 13. Kohn JE, Millett IS, Jacob J, Zagrovic B, Dillon TM, Cingel N, et al. Random-coil behavior and the dimensions of chemically unfolded proteins. Proc Natl Acad Sci USA. 2004; 101: 12491–12496. pmid:15314214
  14. 14. Dyson HJ, Wright PE. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol. 2005; 6: 197–208. pmid:15738986
  15. 15. Chiti F, Dobson CM. Amyloid formation by globular proteins under native conditions. Nat Chem Biol. 2009; 5: 15–22. pmid:19088715
  16. 16. Noe F, Fischer S. Transition networks for modeling the kinetics of conformational change in macromolecules. Cur Opin Struc Biol. 2008; 18: 154–162.
  17. 17. Buchete NV, Hummer G. Coarse master equations for peptide folding dynamics. J Phys Chem B. 2008; 112: 6057–6069. pmid:18232681
  18. 18. Rao F, Caflisch A. The protein folding network. J Mol Biol. 2004; 342: 299–306. pmid:15313625
  19. 19. Gong L, Zhou X. Structuring and sampling complex conformation space: weighted ensemble dynamics simulations. Phys Rev E. 2009; 80: 026707.1–9.
  20. 20. Gong L, Zhou X. Kinetic transition network based on trajectory mapping. J Phys Chem B. 2010; 114: 10266–10276. pmid:20701363
  21. 21. Shea JE, Brooks III CL. From folding theory to folding proteins: a review and assessment of simulation studies of protein folding and unfolding. Annu Rev Phys Chem. 2001; 52: 499–535. pmid:11326073
  22. 22. Mu Y, Nguyen PH, Stock G. Energy landscape of a small peptide revealed by dihedral angle principal component analysis. Protens. 2005; 58: 45–52.
  23. 23. Sims GE, Choi IG, Kim SH. Protein conformational space in higher order ϕ-ψ maps. Proc Natl Acad Sci USA. 2005; 102: 618–621. pmid:15640351
  24. 24. Das P, Moll M, Stamati H, Kavraki LE, Clementi C. Low-dimensional free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction. Proc Natl Acad Sci USA. 2006; 103: 9885–9890. pmid:16785435
  25. 25. Coifman RR, Lafon S. Diffusion maps. Appl Comput Harmon Anal. 2006; 21: 5–30.
  26. 26. Nadler B, Lafon S, Coifman RR, Kevrekidis IG. Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Appl Comput Harmon Anal. 2006; 21: 113–127.
  27. 27. Krivov SV, Karplus M. Hidden complexity of free energy surfaces for peptide (protein) folding. Proc Natl Acad Sci USA. 2004; 101: 14766–14770. pmid:15466711
  28. 28. Maisuradze GG, Liwo A, Scheraga HA. How adequate are one- and two-dimensional free energy landscapes for protein folding dynamics. Phys Rev Lett. 2009; 102: 238101.1–4.
  29. 29. Torda AE, van Gunsteren WF. Algorithms for clustering molecular dynamics configurations. J Comp Chem. 1994; 15: 1331–1340.
  30. 30. Shao JY, Tanner SW, Thompson N, Cheatham TE III. Clustering molecular dynamics trajectories: 1. characterizing the performance of different clustering algorithms. J Chem Theory Comput. 2007; 3: 2312–2334.
  31. 31. Deuflhard P, Huisinga W, Fischer A, Schutte C. Identification of almost invariant aggregates in reversible nearly uncoupled markov chains. Lin Alg Appl. 2000; 315: 39–55.
  32. 32. Weber M. Improved perron cluster analysis. ZIB Report. 2003; 03–04: 1–11.
  33. 33. Prada-Gracia D, Gomez-Gardenes J, Echenique P, Falo F. Exploring the free energy landscape: from dynamics to networks and back. PLoS Comput Biol. 2009; 5: e1000415.1–9.
  34. 34. Gfeller D, DeLosRios P, Caflisch A, Rao F. Complex network analysis of free-energy landscapes. Proc Natl Acad Sci USA. 2007; 104: 1817–1822. pmid:17267610
  35. 35. Noe F, Horenko I, Schutte C, Smith JC. Hierachical analysis of conformational dynamics in biomolecules: transition networks of metastable states. J Chem Phys. 2007; 126: 155102.1–17.
  36. 36. Chodera JD, Singhal N, Pande VS, Dill KA, Swope WC. Automatic discovery of metastable states for the construction of markov models of macromolecular conformational dynamics. J Chem Phys. 2007; 126: 155101.1–17.
  37. 37. Rao F, Karplus M. Protein dynamics investigated by inherent structure analysis. Proc Natl Acad Sci USA. 2010; 107: 9152–9157. pmid:20435910
  38. 38. Bowman GR, Pande VS. Protein folded states are kinetic hubs. Proc Natl Acad Sci USA. 2010; 107: 10890–10895. pmid:20534497
  39. 39. Bowman GR, Meng L, Huang XH. Quantitative comparison of alternative methods for coarse-graining biological networks. J Chem Phys. 2013; 139: 121905.1–9.
  40. 40. Weber JK, Jack RL, Pande VS. Emergence of glass-like behavior in markov state models of protein folding dynamics. J Am Chem Soc. 2013; 135: 5501–5504. pmid:23540906
  41. 41. Pande VS, Beauchamp K, Bowman GR. Everything you wanted to know about markov state models but were afraid to ask. Methods. 2010; 52: 99–105. pmid:20570730
  42. 42. jie Deng N, Dai W, Levy RM. How kinetics within the unfolded state affects protein folding: an analysis based on markov state models and an ultra-long md trajectory. J Phys Chem B. 2013; 117: 12787–12799.
  43. 43. Bowman GR, Beauchamp KA, Boxer G, Pande VS. Progress and challenges in the automated construction of markove state models for full protein systems. J Chem Phys. 2009; 131: 124101.1–11.
  44. 44. Naritomi Y, Fuchigami S. Slow dynamics in protein fluctuations revealed by time-structure based independent component analysis: The case of domain motions. J Chem Phys. 2011; 134: 065101.1–8.
  45. 45. Schwantes CR, Pande VS. Improvements in Markov State Model Construction Reveal Many Non-Native Interactions in the Folding of NTL9. J Chem Theory Comput. 2013; 9: 2000–2009. pmid:23750122
  46. 46. Nüske F, Keller BG, Pérez-Hernández G, Mey ASJS, Noé F. Variational Approach to Molecular Kinetics. J Chem Theory Comput. 2014; 10: 1739–1752.
  47. 47. McGibbon RT, Pande VS. Variational cross-validation of slow dynamical modes in molecular kinetics. J Chem Phys. 2015; 142: 124105.1–12.
  48. 48. Lu SJ, Zhou X. Construction of Coarse-Grained Models by Reproducing Equilibrium Probability Density Function. Commun. Theor. Phys. 2015; 63: 10–18.
  49. 49. Qiu D, Shenkin PS, Hollinger FP, Still WC. The gb/sa continuum model for solvation: a fast analytical method for the calculation of approximate born radii. J Phys Chem A. 1997; 101: 3005–3014.
  50. 50. Levy Y, Jortner J, Becker OM. Solvent effects on the energy landscaps and folding kinetics of polyalanine. Proc Natl Acad Sci USA. 2001; 98: 2188–2193. pmid:11226214
  51. 51. Wales DJ. Energy landscapes and properties of biomolecules. Phys Biol. 2005; 2: S86–S93. pmid:16280625
  52. 52. Hovmoller S, Zhou T, Ohlson T. Conformations of amino acids in proteins. Acta Cryst D. 2002; 58: 768–776.
  53. 53. Lee KC, Siegel J, Webb SE, Leveque-Fort S, Cole MJ, Jones R, et al. Application of the stretched exponential function to fluorescence lifetime imaging. Biophys J. 2001; 81: 1265–1274. pmid:11509343