Protein Folding as a Complex Reaction: A Two-Component Potential for the Driving Force of Folding and Its Variation with Folding Scenario

The Helmholtz decomposition of the vector field of probability fluxes in a two-dimensional space of collective variables makes it possible to introduce a potential for the driving force of protein folding [Chekmarev, J. Chem. Phys. 139 (2013) 145103]. The potential has two components: one component (Φ) is responsible for the source and sink of the folding flow, which represent, respectively, the unfolded and native state of the protein, and the other (Ψ) accounts for the flow vorticity inherently generated at the periphery of the flow field and provides the canalization of the flow between the source and sink. Both components obey Poisson’s equations with the corresponding source/sink terms. In the present paper, we consider how the shape of the potential changes depending on the scenario of protein folding. To mimic protein folding dynamics projected onto a two-dimensional space of collective variables, the two-dimensional Müller and Brown potential is employed. Three characteristic scenarios are considered: a single pathway from the unfolded to the native state without intermediates, two parallel pathways without intermediates, and a single pathway with an off-pathway intermediate. To determine the probability fluxes, the hydrodynamic description of the folding reaction is used, in which the first-passage folding is viewed as a steady flow of the representative points of the protein from the unfolded to the native state. We show that despite the possible complexity of the folding process, the Φ-component is simple and universal in shape. The Ψ-component is more complex and reveals characteristic features of the process of folding. The present approach is potentially applicable to other complex reactions, for which the transition from the reactant to the product can be described in a space of two (collective) variables.

Introduction divergence-free field, accounts for the flow vorticity generated at the periphery of the flow field and provides the canalization of the flow between the source and sink within the reaction channel.
As such, both the hydrodynamic description of the folding dynamics and the decomposition of the vector field of probability fluxes into divergence-free and curl-free fields involve nothing that is specific of protein folding. Therefore the present approach is potentially applicable to other complex reactions that can be described by two (collective) variables, e.g., unimolecular chemical reactions [21] and conformational transitions in nanoclusters [22]. One recent example of a related approach is the calculation of a reaction path in the framework of the transition-path theory (E and Vanden-Eijnden [23]), in which the flow lines of the probability current of the reactive trajectories were introduced as the lines orthogonal to the isocommitor probability lines. The resulting pictures of the lines obtained in [23] for the model Müller and Brown potential [24] and in our work for folding of a β-hairpin protein [17] are very similar: the C-isolines can be likened to the flow lines, and F-isolines to the isocommitor probability lines, although the former, in contrast to the latter, are not mutually orthogonal [17].
The β-hairpin protein we studied in [17] had a very simple folding scenario: a single folding pathway, no intermediates and, correspondingly, two-state kinetics. It is therefore of interest to see how the potential for the driving force changes when the folding scenario becomes more complex, i.e., involves intermediates and additional pathways. A natural approach to this problem would be to consider proteins that have the corresponding complicated scenarios and compare the results to those for the β-hairpin protein [17]. However, such an approach does not seem to be optimal. First, it is difficult to choose the proteins that would have the desired folding scenarios without secondary effects complicating the comparison, e.g., a protein that had parallel folding pathways and no intermediates. Secondly, the chosen proteins should allow the simulations at a reasonable computational cost, because at least the order of thousand of folding trajectories is required to have statistically significant probability fluxes [17]. This restricts the class of possible protein models to simplified, coarse-grained models, such as a C αmodel used in [17]. At the same time, to describe, e.g., off-pathway intermediates, an all-atom approach is generally required, because such intermediates are typically associated with either disulfide bridges (BPTI [25]), or salt-induced states (S6 [26]), or cis prolines (CheY [27]).
Another possible approach is to use model potential energy surfaces that make it possible to generate molecular dynamics trajectories mimicking the protein folding trajectories projected onto a two-dimensional space of collective variables. These trajectories should allow both the calculation of the probability fluxes, which are necessary to determine the potential for the driving force, and the characterization of the process in terms accepted in the protein folding studies, in order to relate the potential to a specific folding scenario. For this purpose, we, similar to [23], employ the two-dimensional Müller and Brown potential [24], which is widely used to test numerical methods for calculation of reaction paths in many-body systems, e.g., [28,29]. In addition to a simple system (one "folding" pathway without intermediates), we will consider two more complex systems: one has two folding pathways, and the other has an off-pathway intermediate. We will show that in all cases the F-component of the potential is simple and universal in shape, while the C-component is more complex and reveals characteristic features of the folding process.
The paper is organized as follows. The next section describes methods: the generation of the PESs and their relation to protein folding, the simulation method, calculating the FESs, a short description of the hydrodynamic approach, and determining the two-component potential with the Helmholtz decomposition of the vector flow field. The subsequent section contains the obtained results and their discussion: for the basic system, a system with two folding pathways, and a system with an off-pathway intermediate. The last section summarizes the results of the work and gives concluding remarks.

Model potential energy surfaces and their relation to protein folding
The Müller and Brown function [24], which we use to determine a two-dimensional (2D) potential energy surface (PES), is written as where x 0i and y 0i are the coordinates of the center of i potential well, a i , b i and c i characterize the steepness of the well sides, and A i determines the depth of the well. The summation in this equation is taken from 2 to 4 depending on the specific form of the PES (Tables 1-3 below). Each point in the (x, y) space in Equation (1) is assumed to represent a conformation of some (imaginary) protein in a 2D space of collective variables; the coordinate x is to characterize the proximity of the protein conformation to the native state, and the coordinate y-the compactness of the protein. The deepest well on the PES is to account for the native-like states, and the shallowest well-for the unfolded states, which include both the completely unfolded (extended) and partly unfolded (semi-compact) conformations. Having this in mind, we will refer to the states of the present model system in the (x, y) space to as protein states, omitting quotation marks. Two-dimensional PESs are evidently too simple to reproduce the protein folding dynamics in multidimensional conformation space. A principal shortcoming of such surfaces is that the  contribution of conformational entropy is unreasonably reduced. In protein folding, when the reaction is projected onto a two-dimensional space of variables, the interplay between the potential energy and entropy leads to a barrier separating the unfolded and native-like states. On the barrier, the energy directs the protein to the native state, because its energy is minimal, but entropy pulls it back, because the number of the unfolded states is much higher that the number of native-like states (in some cases, the entropy of the native state can contribute to its stability [30], however, the overall picture of the process remains unchanged). In a 2D space, the appropriate difference in the number of the unfolded and native-like states is hardly possible to reproduce. Therefore, to mimic folding dynamics, a 2D PES should have a barrier that separates the unfolded and native-like states, although such (energy) barriers are not characteristic of protein folding. In other words, the PES should consists of, at least, two wells, one for the unfolded states and the other for the native-like states. However, if we are interested in the resulting MD trajectories rather than in the mechanism of protein folding, as in the present study, it is, in principle, insignificant what is the nature of the barrier that the system has to overcomeentropic or energetic. What is important here is that the obtained MD trajectories should allow consistent reproduction of the characteristic properties of the folding reaction in a 2D space of collective variables that are essential for the goal of the present study; they are the vector field of the local flows of conformational transitions (probability fluxes)-to calculate the potential for the driving force, and the distribution of the first-passage times and the FES landscape-to relate the results obtained with the given 2D PES to a specific folding scenario.

Simulation method and protocols
The simulations were performed using the constant-temperature MD based on the Langevin equation where r is the radius-vector of the (x, y) point, U is the potential energy of the system given by Equation (1), F is a random force generated by the "surroundings" at temperature T, γ is the friction coefficient that introduces viscosity of the surroundings to balance the random force and dissipation, and the system's mass is set to unity. The random forces have the Gaussian distribution with zero mean and variance hF i (t)F i 0 (t + τ)i = 2γTδ ii 0 δ(τ), where the angular brackets denote an ensemble average, the index at F stands for the vector component, and δ ii 0 and δ(τ) are the Kronecker and Dirac delta functions, respectively. The temperature is measured in the energy units (the Boltzmann constant k B = 1). The equation was numerically integrated using the algorithm of Biswas and Hamann [31] with the time step Δt = 0.001τ and γ = 3/τ (τ is the time scale equal to 1). In each case, 1 × 10 4 folding trajectories were simulated at a temperature at which the folding kinetics were as desired (two-states or three states).

Free energy surface
To construct the FES, the probability for the system to be at a current point of the (x, y) space, p(x, y), was calculated and then, using the Boltzmann hypothesis, was converted into the free energy Gðx; yÞ ¼ ÀT ln pðx; yÞ ð 3Þ The temperature is measured in the Boltzmann constant units.

Probability fluxes
The probability fluxes, determining the local rates of transitions between the points of the (x, y) space, were calculated according to the hydrodynamic description of protein folding [12]. The x-component of the flux j(r) was calculated as where M is the total number of simulated trajectories, t f is the mean first-passage time (MFPT), n(r@, r 0 ) is the number of transitions from state r 0 to r@, and r & r Ã is a symbolic designation of the condition that the transitions included in the sum have the straight line connecting points r 0 to r@, which crosses the line x = const within the segment of the length of Δy centered at the point r. The y-component of j(r) is determined in a similar way, except that one selects the transitions crossing the line y = const within the current segment Δx. The calculations were performed on a grid with discretization Δx = Δy = 0.01.

Potential for the driving force
To determine the potential for the driving force, the Helmholtz decomposition theorem [20] was used, similar to [17]. According to this theorem, any smooth vector field j(r) can be uniquely represented as where j cf is the curl-free field, and j df is the divergence-free field, i.e., r × j cf = 0 and r Á j df = 0. Due to the conditions the vectors j cf and j df satisfy, they can be written as where F = F(r) is the potential for the curl-free component, C = C(r) is the potential for the divergence-free component, and k x and k y are the unit vectors for x and y, respectively. Substituting the expressions for j cf and j df from Equations (6) and (7) into Equation (5) and regrouping the terms, we have The function F(r) and C(r) obey the corresponding Poisson's equations. Taking into account Equation (6) and the condition r Á j df = 0, the dot product of the gradient vector and the probability flux j determined by Equation (5) gives where q(r) is the distribution of sources and sinks that is calculated from the simulated probability fluxes j(r) as The function F(r) should satisfy Equation (9) within a region of the r space in which q(r) 6 ¼ 0 and have F(r) = 0 at its boundary (the Dirichlet problem; see, e.g., Sobolev [32]). Similarly, the cross product of the gradient vector and j, with taking into account Equation (7) and the condition r × j cf = 0, gives where is the simulated distribution of vorticity. As previously, the corresponding boundary condition should be satisfied, i.e., it should be C(r) = 0 at the boundary of the region where C(r) is determined.
To find the F(r) and C(r), it is convenient to replace the Dirichlet problem with the initialboundary-value problem (for the other methods of numerical solution of the former, see, e.g., Dorr [33]). In this approach, Equations (9) and (11)  where t plays a role of time. In each case, starting with some initial distribution, the process is continued until a stationary distribution is achieved, which approximates the solution of the corresponding the Dirichlet problem. In the present paper, we integrated Equations (13) and (14) numerically, using a finite-difference scheme of the first-order approximation for the time derivative and the second-order approximation for the space derivatives. To avoid boundary effects, the region of integration, which was the same for F(r, t) and C(r, t), was enlarged on each side by 0.2 in comparison with the region where the simulations were performed. The level of disretization of the r space was as in Equation (4). At the boundaries of this, enlarged region there was F(r, t) = 0 and C(r, t) = 0 for Equations (13) and (14), respectively, and the initial distributions were taken to be F(r, 0) = 0 and C(r, 0) = 0. The stationary solutions of Equations (13) and (14) were considered to be achieved if the conditions { R [4F(r, t)/q(r) + 1] 2 dxdy} 1/2 1 × 10 −10 and { R [4C(r, t)/ω(r) + 1] 2 dxdy} 1/2 1 × 10 −10 , respectively, were satisfied for the last one third of the total integration time. We note that Equation (8) allows determining the functions F(r) and C(r) directly from the simulated probability fluxes j(r). For this purpose, the least-squares functional Q = R [(j x + @F/@x − @C/@y) 2 + (j y + @F/@y + @C/ @x) 2 ]dxdy, where j x and j y are the simulated probability fluxes, can be minimized with respect to F(r) and C(r) [17]. Both these approaches lead to essentially the same results.
According to Equations (6) and (7), the probability fluxes are proportional to the first-order space derivatives of the potential functions F(r) and C(r), i.e., the fluxes are considered to be drift fluxes. In other words, although the inertia term was present in the Langevin Equation (2), the functions F(r) and C(r) determined from the resulting simulated fluxes j(r) via Equations (5)-(7) account for an overdamped motion. The corresponding kinetic equation is the Smoluchowski equation @p/@t + r Á J = 0, where p(r, t) is the probability density, and J is the probability current, which can be written as is the diffusion coefficient, and F(r) is the driving force (the Boltzmann constant is equal to unity). The zero probability current J = 0 assumes detailed balance, and, correspondingly, a curl-free flow. In this case, the equality J = 0 leads to the stationary equilibrium (Boltzmann) distribution p(r) * exp [−G(r)/T], where G(r) is the free energy that exerts the driving force F(r) = −rG(r). However, if the probability current is nonzero, i.e., as in the present case, when the steady flow from the unfolded states to the native state exists, the stationary solution is determined by the condition r Á J = 0, so that J can have a curl component (van Kampen [34]). Such flow is non-equilibrium and is characterized by "irreversible circulation" or "cyclic balance", which can be considered as a measure of deviation from detailed balance [35][36][37]. In the present case, the circulating flow is represented by the flux vector j df and, according to Equation (7), is generated by the C-component of the potential.
We note that the functions F(r) and C(r) in Equations (6) and (7) differ from the conventional potentials in that the driving forces determined by these functions are density weighted. Conventionally, as, e.g., in the Smoluchowski equation above, the probability flux is determined as j(r) * F(r)p(r), so that the driving force F(r), which is equated with the negative gradient of the potential, is proportional to the velocity of motion v(r) = j(r)/p(r). In contrast, Equations (6) and (7) assume that the gradients of the potentials are immediately related to the probability fluxes. The reason why the conventional definition is unsuitable in the present case is that the probability density p(r) varies across the flow field very strongly, i.e., the folding fluid is highly "compressible". Because of this, as it follows from the equality r Á v(r) = r Á j(r)/ p(r) + j(r) Á r[1/p(r)], the velocity field is not divergence-free in a region which is free of flow sources and sinks, in contrast to the field of the probability fluxes, for which r Á j(r) = 0. The last term in the right-hand side of the above equality presents the sources/sinks that arise due to the compressibility of the folding fluid. The appearance of such artificial sources and sinks destroys the natural separation of the folding flow into the divergence-and curl-free components and complicates the interpretation of the F(r) and C(r) functions (see below). Therefore, we determine the potential through the fluxes, according to Equations (6) and (7), but consider the driving force to be density weighted, i.e., FðrÞ ¼ pðrÞFðrÞ, whereFðrÞ is the actual driving force.

Two-well landscape with a single pathway
This case corresponds to the simplest, two-state kinetics of protein folding, which are characteristic of small proteins [38]. Table 1 gives the values of the parameters determining the potential energy function, Equation (1), and Fig. 1a shows the corresponding PES. The simulations were performed at T = 3.0. The native state was associated with the point x n = 0.75, y n = 0.2. The MD trajectories were initiated in the vicinity of the point x u = 0.25, y u = 0.78 with a uniform random scattering of the points within jΔ u xj = jΔ u yj = 0.1; these points were intended to mimic the completely unfolded (extended) protein states. The native state was considered to be reached if the deviation from the native state was not larger than jΔ n xj = jΔ n yj = 0.01. The same conditions to choose the initial points (the scattering of the points) and to terminate the trajectories in the native state were used for the other systems. Fig. 1b shows the FES. It reveals two basins of attraction that are connected by a free energy valley, which plays a role of the reaction channel. One basin, which is located at lower values of x, can be associated with unfolded conformations. For the most part, these conformations should be considered as partly unfolded (semi-compact), because extended conformations rapidly transform into semi-compact ones [11]. The other basin corresponds to native-like conformations. Since the trajectories were terminated upon reaching the native state, this basin is not so well formed as it would be under the equilibrium folding conditions, when the protein resides in the native basin until it unfolds (cf., e.g., the FESs for the first-passage and equilibrium folding of the three-stranded β-sheet protein [14]). At the same time, the free energy barrier between the unfolded and native-like states is rather well formed (x % 0.57).
The distribution of the first passage times is depicted in Fig. 1c. The distribution is close to single-exponential, evidencing that the kinetics are two-state. The MFPT t f is equal to % 1 × 10 4 . Fig. 1d shows the vector field of the probability fluxes. For the illustration purpose, the lengths of the vectors are rescaled as j scl = q(j/j)j 1/4 , where j is the original vector determined by Equation (4), j is its length, and q is the factor equal to 0.125. As is seen, the complex motion of the system that exists in the basin associated with unfolded states changes to a regular motion in the transition region between this and native basins, i.e., in the region which contains the free energy barrier. In this respect, the present vector flow field is similar to what was observed for an α-helical hairpin, which also had a single folding pathway and two-state kinetics [12]. A vortex observed in the native basin is mainly due to the strong condition we used to terminate the trajectories (jΔ n xj = jΔ n yj = 0.01). Because of it, the system had to explore a large portion of the native-like states until the native state was found with the required accuracy. For actual protein folding, when the multidimensional conformation space is projected on a two-dimensional space of collective variables, this effect is not present [12,17]. If the values of Δ n x and Δ n y were taken to be comparable to the size of the native basin (jΔ n xj = jΔ n yj * 0.1), the vortex was not formed. Instead, a region around the native state was created, into which the trajectories did not penetrate, i.e., the trajectories ended at the boundary of this region. We also note that since every trajectory initiated in an unfolded state reached the native state, and was terminated in it, the total flow from the unfolded states to the native state G(x) = R j x (x, y)dy is constant; it is equal to the reciprocal MFPT 1= t f [see Equation (4)]. Fig. 1e and 1f show, respectively, the distributions of the flow divergence q(r) and its vorticity ω(r), which are determined by Equations (10) and (12). Similar to the probability fluxes (Fig. 1d), for the illustration purpose, the values of ω(r) are rescaled as ω scl = (ω/jωj)jωj 1/4 . Fig. 1e shows that except for the local regions where the trajectories were initiated and terminated, i.e., the source and sink regions, the flow is divergence-free. The vorticity, in contrast, is nonzero across the entire flow field (Fig. 1f). First of all, it is seen that the motion of the system in the basin for the unfolded states is not completely disordered-a set of the vortices of opposite sign exists there (the negative vorticity corresponds to the clock-wise motion, and the positive vorticity to the counterclockwise motion). Secondly, the vorticity has different signs on different sides of the reaction channel, because the intensity of the flow decreases toward both sides of the channel (Fig. 1d). As has been mentioned in [17], this decrease of the flow intensity toward the channel sides is a natural phenomenon, because the lower the probability to visit some region of the conformation space (in the present case, a side of the channel), the smaller the flows in this region; similar nonuniform distributions of the flows were previously observed for an α-helical hairpin [12], SH3 domain [15], and β-hairpin [17]. Therefore, the vorticity of this type is an intrinsic property of the folding dynamics.
The calculated functions F(r) and C(r) are shown in Fig. 2a and 2b, respectively. In agreement with the Helmholtz decomposition [20], F(r) accounts for the source and sink of the folding flow, and C(r) for the vorticity effects. The surface for the F-component is simple; it is characterized by two peaks of different sign-one (positive) peak corresponds to the source of the folding flow, and the other (negative) peak represents the sink of the flow. Poisson's Equation (9), in which q(r) > 0 in the source region, q(r) < 0 in the sink region, and q(r) = 0 in the rest of the r space, can be viewed as an equation that describes the stationary diffusion of some substance of density F(r) between the source and sink. It is evident that the substance ejected from the source will spread over the surface in all directions. Therefore, to canalize the flow between the source to sink, an addition force is required. This force is produced by the C-component of the potential. Two properties of the C(r) surface are noteworthy. The first one is that in the region between the source and sink, where the F(r) surface is relatively flat, two parallel ridges of different sign are formed that connect the source and sink. More clearly, it is seen from Fig. 2c, where characteristic values of the C-component in the regions corresponding to the ridges are indicated: the C-component decreases from the periphery of the flow field, where it is zero (Fig. 2b), to approximately −2 × 10 −5 in region 1 (the first ridge) and then Two-Component Potential for the Driving Force of Folding increases to 5 × 10 −5 in region 2 (the second ridge), with a subsequent decrease to zero at the opposite side of the flow field. The second property of the C(r) surface is the presence of (two) peaks of different sign in the region for the unfolded states, which are due to the local vortices formed in this region (Fig. 1f). Such vortices were observed for the α-helical hairpin [12], SH3 domain [15] and three-stranded β-sheet protein [14], as a result of partial folding and unfolding of the protein in the attempts to overcome the barrier to the native state. In should be noted that in contrast to the presence of parallel ridges of different sign in C(r), which is typical of the folding process [17], the appearance of the peaks in C(r) depends on the value of the free energy barrier between the unfolded and the native states. If the barrier is large enough, the system dwells in the basin for the unfolded states performing a circulating motion, as in the present case; then the peaks are formed. However, if the barrier is low, so that the system easily overcomes it, the peaks are absent [17]. Fig. 2c depicts the equipotential lines F(r) = const and C(r) = const. In each family, the lines are shown with the interval of 0:1= t f . In the intermediate region between the source and sink, where the F(r) surface is relatively flat, the C(r) isolines are directed along the probability fluxes [17], i.e., each of them presents a streamline of the flow. Every two adjacent streamlines form a stream tube, so that the (ten) stream tubes between the C(r) isolines in this region contain the total flow from the source to the sink [see Equation (4)]. As has been mentioned in the Introduction, in contrast to [23], in which the probability current lines (streamlines) were introduced as the lines orthogonal to the isocommitor probability lines, the sets of the F(r) = const and C(r) = const lines are generally not mutually orthogonal because the folding flow is not curl-free; the orthogonality of these lines would require the flow to be both divergence-and curl-free [13] (see the discussion of this issue in [17]).
As has been previously indicated, the velocity field v(r) = j(r)/p(r) for a compressible folding fluid, i.e., for the probability density p(r) 6 ¼ const, is not divergence-free. The calculations show that if the velocities are used instead of the fluxes in Equations (6) and (7), the F and C components of the potential presented in Fig. 2 change very considerably. The main reason is that the probability density p(r), which has a large value in the basin for the unfolded states, drastically decreases toward the native state. As a result, the velocity of the flow v(r) increases, and the effective source of the flow, determined by the velocity, shifts toward the flow sink. Correspondingly, the positive peak in F(r) shifts toward the negative peak, and the ridges between the source and sink regions in C(r) become considerably shorter. In other words, both F(r) and C (r) surfaces lose their direct connection with the physical picture of the process.

Two-well landscape with two pathways
One complication of the folding scenario described in the previous section is the presence of multiple folding pathways [39,40]. We will consider the simplest case when there are two independent pathways, as, e.g., in the B domain of protein A due to the symmetrical backbone structure of the protein (a three-helix bundle) [41,42], or in the fyn SH3 domain, where the fast folding trajectories are organized in two essentially independent routes due to the reverse order of formation of the contacts between the β1 and β5 strands and the RT-loop and the β4 strand [15]. The values of the parameters determining the potential energy function in the present case are given in Table 2, and the corresponding PES is shown in Fig. 3a. The temperature at which the simulations were performed was T = 1.5. The native state was associated with the point x n = 0.65, y n = 0.2, and the MD trajectories were initiated in the vicinity of the point x u = 0.35, y u = 0.8. The MFPT is t f % 2:2 Â 10 3 . Figs. 3b-3f and 4a-4c show the simulation results, akin to the corresponding panels of Figs. 1 and 2.

Three-well landscape with an off-pathway intermediate
Another typical complication of the basic folding scenario described previously is the presence of off-pathway intermediates, which lead to a deviation from two-state kinetics [11]. When comparable with the native state in the life-time, such intermediates can play a role of "latent" states [44]. Table 3 gives the parameters of the potential energy function we used to construct the model PES in this case, and Fig. 5a depicts the PES. The simulations were performed at T = 1.25. The native state was associated with the point x n = 0.75, y n = 0.15, and the MD trajectories were initiated in the vicinity of the point x u = 0.25, y u = 0.8. The off-pathway intermediate is presented by a basin centered at x = 0.75, y = 0.65. The MFPT is t f % 2:5 Â 10 4 . The simulation results are shown in Figs. 5b-5f and 6a-6c.
In accord to the FES (Fig. 5b), Fig. 5c indicates that the kinetics are double-exponential; the MFPTs for the fast and slow trajectories are equal to approximately 1.5 × 10 4 and 1.5 × 10 5 , respectively. In the fast trajectories, the system goes from the unfolded states to the native state directly, while in the slow trajectories, it visits the intermediate (possibly, several times) before proceeding to the native state. The directed motion toward the native state is accompanied by a gradual change of the flow vorticity across the reaction channel (Fig. 5f), similar to what was observed for the systems considered in the previous sections. Accordingly, a pair of parallel ridges is formed that connect the source (unfolded states) and the sink (native state) of the flow (cf. Figs. 6b and 5d). At the same time, the reaction channel between the unfolded states and the intermediate does not possess these properties, because the flow toward the intermediate is compensated by the backward flow. Correspondingly, the intermediate presents neither a source nor sink of the flow, which is reflected in the distribution of divergence (Fig. 5e) and the F-component (Fig. 6a), and the C(r) surface does not have ridges connecting the unfolded states and the intermediate (Fig. 6b). Also note that a set of vortices is formed in the basin associated with the intermediate (Fig. 5f), similar to the basins for the unfolded states in this (Fig. 5f) and previous systems (Figs. 1f and 3f). These vortices lead to formation of a set of peaks of positive and negative sign that the C-component has in the intermediate basin.

Conclusions
Using model potential energy surfaces generated with the 2D Müller and Brown potential [24], three characteristic scenarios of protein folding have been considered: a single pathway from the unfolded to the native state without intermediates (two-state kinetics, a basic scenario), two parallel pathways without intermediates (two-state kinetics), and a single pathway with an offpathway intermediate (three-state kinetics). It has been shown that despite a reduced contribution of conformation entropy, the 2D surfaces enable consistent reproduction of the characteristic properties of the folding reaction in a 2D space of collective variables that are essential for the goal of the present study, i.e., the vector field of the probability fluxes, which is used to calculate the potential for the driving force, and the distribution of the first-passage times and the FES landscape, which are necessary to relate the results obtained with the 2D PES to a specific folding scenario. To calculate the probability fluxes, the hydrodynamic description of the folding reaction [12] was employed. In it, the process of the first-passage folding is viewed as a steady flow of the representative points of the system from the unfolded to the native state. One essential property of the hydrodynamic approach is that the Helmholtz decomposition [20] of the vector field of the probability fluxes makes it possible to introduce the potential for the driving forces of folding [17]. This potential has two components, F(r) and C(r); the Fcomponent accounts for the sources and sinks of the folding flows, and the C-component for the flow vorticity effects. Both components obey Poisson's equations; in the equation for F(r), the source/sink term represents the distribution of the flow sources and sinks, and in the equation for C(r), the distribution of the flow vorticity.
The consideration of the above mentioned scenarios has led us to the following conclusions: Despite all possible complexity of the folding process, the F-component of the potential is simple and universal in shape. It has two peaks of different sign; one (positive) peak corresponds to the source of the folding flow, representing the unfolded states in which the folding trajectories are initiated, and the other (negative) peak corresponds to the sink of the flow, representing the native state, where the trajectories are terminated. The C-component is more complex and depends on the specificity of the folding process. The most notable property of the C(r) surface is the presence of parallel ridges of different sign that connect the source and sink of the flow; if there are several independent pathways, the corresponding pair of such ridges is formed for each of them. These ridges are the result of the flow vorticity, which gradually changes across the free energy valley connecting the unfolded and native states (the reaction channel) and has different signs at the sides of the valley, because the folding flow weakens toward both sides. The components F and C act in concord, i.e., in the region between the source and sink they are such that the fluxes generated by these components contribute to the directed flow from the source to the sink, while in the regions behind the source and sink they compensate each other. In the region between the source and sink, where the F(r) surface is relatively flat, the C-component plays a primary role, providing the canalization of the flow within the reaction channel. Another, although not so characteristic [17], property of the C(r) surface is the possible presence of peaks of different sign in the regions where the system dwells performing a circulating motion, as it happens in the basins for the unfolded states and intermediates. The formation of these peaks depends on the heights of the barriers to leave these basins, i.e., if the system leaves the basin easily, the peaks are not formed.
For a system with many degrees of freedom, the present potential reflects the complexity of the reaction only in the part that is captured by two collective variables. For protein folding, the practice of application of the principal component analysis and other methods of dimensionality reduction (for a review, see, e.g., [18]) suggests that the general picture of the reaction, i.e., the dominant folding pathways and long-living intermediates, will likely be captured in a twovariable description. At the same time, some specific features of the reaction, which require a more detailed characterization of the process, may remain hidden, e.g., the conformational heterogeneity of the denatured state [45,46] and the transitions between intermediate conformations [47]. In these cases, another pair of collective variables, which reveal the corresponding elements of the reaction, can be used to see how the potential changes, similar as the FESs for different pairs of the variables are constructed to make the folding picture clearer [47,48].
The basic components of the present approach are the calculation of the vector field of the probability fluxes in a two-dimensional space of orthogonal variables and its subsequent decomposition into divergence-free and curl-free fields. None of them is specific of protein folding. Therefore, the approach is potentially applicable to any complex reaction, for which the transition from the reactant to the product can be described by two (collective) variables.