Analytic Markovian Rates for Generalized Protein Structure Evolution

A general understanding of the complex phenomenon of protein evolution requires an accurate description of the constraints that define the sub-space of proteins with mutations that do not appreciably reduce the fitness of the organism. Such constraints can have multiple origins. In this work we present a model for constrained evolutionary trajectories, represented by a Markovian process through a set of protein-like structures artificially constructed to be topological intermediates between the structures of two naturally occurring proteins. The number and type of intermediate steps define how constrained the total evolutionary process is. By using a coarse-grained representation of the protein structures, we derive an analytic formulation of the transition rates between the intermediate structures. The results indicate that compact structures with a high number of hydrogen bonds are more probable and are more likely to arise during evolution. Knowledge of the transition rates allows for the study of complex evolutionary pathways represented by trajectories through a set of intermediate structures.


Detailed Balance
At this stage we can check that we obey detailed balance. To do that, we use an approximate form for the Heaviside function that is particularly good for large b or small temperatures. Substituting Eq. (S5) into Eq. (S3) and rearranging the exponentials, it is easy to see that Eq. (S8) and Eq. (S9) are identical, and hence detailed balance is respected.

Metropolis acceptance
If we want to sample the distribution P(A)P(B), we can use the Metropolis method. We start from the detailed balance equation. Recalling that P(A) and P(B) are Boltzmann distributed, we can calculate the transition probabilities and hence the acceptance rule. Grouping the exponents together, we find that the final acceptance rule depends on the variation of the sum of the Hamiltonians at every step.
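The acceptance rule described above can be illustrated with a minimal sketch. It assumes that both systems are sampled at the same inverse temperature `beta` and that the trial moves are proposed symmetrically; the function names and arguments are hypothetical and are not taken from the original work.

```python
import math
import random


def acceptance_probability(delta_h_a, delta_h_b, beta):
    """Metropolis acceptance probability for a joint move of systems A and B.

    Assumes P(A) ~ exp(-beta * H_A) and P(B) ~ exp(-beta * H_B), so the
    product P(A)P(B) depends only on the sum H_A + H_B.
    """
    delta_total = delta_h_a + delta_h_b
    if delta_total <= 0.0:
        # Moves that do not increase the summed Hamiltonian are always accepted.
        return 1.0
    return math.exp(-beta * delta_total)


def metropolis_accept(delta_h_a, delta_h_b, beta, rng=random):
    """Draw a uniform random number and decide whether to accept the move."""
    return rng.random() < acceptance_probability(delta_h_a, delta_h_b, beta)
```

Because only the sum of the two energy changes enters the exponent, a move that raises the energy of one system can still be accepted with certainty if it lowers the energy of the other by at least the same amount.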
where $r_{ij}$ is the distance between the $C_\alpha$ atoms at the centers of spheres $i$ and $j$, $r_{max}$ ($r_{max} = 12$ Å) is the distance at which $E_{ij} = \epsilon_{ij}/2$, and $\alpha = 1/4$ is a scale factor. This expression provides a continuous square-well form for the sphere-sphere interaction energy. To determine the parameter $\epsilon_{ij}$ we made use of the model of Betancourt and Thirumalai (BT) [2], in which the interaction energies were derived from a calculation of the contact frequency in the PDB. This potential had been used primarily for lattice proteins, but it is also appropriate for the caterpillar model, which employs a square-well-like potential. Backbone hydrogen bonds were modeled with a 10-12 Lennard-Jones type potential using the expression [3]

$$E_H = \epsilon_H \left(\cos\theta_1 \cos\theta_2\right)^{\nu} \left[ 5\left(\frac{\sigma}{r_{OH}}\right)^{12} - 6\left(\frac{\sigma}{r_{OH}}\right)^{10} \right]$$

where $r_{OH}$ is the distance between the hydrogen atom of the amide group (NH) and the oxygen atom of the carbonyl group (CO) of the main chain. We set $\sigma = 2.0$ Å, $\epsilon_H = 3.1\, k_B T$, and $\nu = 2$; the values are given in [3].
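As an illustration, the hydrogen-bond term can be evaluated with a short function. This is a sketch under stated assumptions: it uses the standard 10-12 radial form 5(σ/r)^12 − 6(σ/r)^10, treats the two orientation angles as inputs in radians without any cutoff on their range, and fixes the sign so that the minimum at r = σ is attractive. The function and constant names are hypothetical.

```python
import math

SIGMA = 2.0   # Angstrom, value quoted in the text
EPS_H = 3.1   # in units of k_B T, value quoted in the text
NU = 2        # exponent of the angular factor, value quoted in the text


def hbond_energy(r_oh, theta1, theta2, eps_h=EPS_H, sigma=SIGMA, nu=NU):
    """Hydrogen-bond energy in units of k_B T (assumed 10-12 form).

    r_oh   : O...H distance in Angstrom
    theta1 : first orientation angle, in radians (assumed convention)
    theta2 : second orientation angle, in radians (assumed convention)
    """
    if r_oh <= 0.0:
        raise ValueError("r_oh must be positive")
    ratio = sigma / r_oh
    # Radial part: equals -1 at r_oh = sigma, which is also its minimum.
    radial = 5.0 * ratio**12 - 6.0 * ratio**10
    # Angular part: equals 1 when both cosines have unit magnitude.
    angular = (math.cos(theta1) * math.cos(theta2)) ** nu
    return eps_h * angular * radial
```

With these assumptions the well depth at optimal geometry is ε_H, the energy is repulsive for distances well below σ, and the interaction vanishes when either angle is 90 degrees.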

VIRTUAL-MOVE PARALLEL TEMPERING
The VMPT scheme is a combination of the adaptive parallel tempering algorithm [4] where W is a bias potential. We are not limited to a single trial swap of state i with a given state j. Rather, we can include all possible trial swaps between the temperature state i and all N − 1 remaining temperatures.
Our estimate for the contribution to the probability distribution P i corresponding to temperature i is then given by the following sum where the delta functions select the configurations with order parameter Q. The combination with parallel tempering is particularly efficient because the information about the sampled states is shared between all the simulations running at different temperatures, which naturally tend to explore different regions of phase space. In the case of proteins, this means that at low temperature it is possible to know what bias is needed to reach the unfolded state, because that state is sampled more often at higher temperatures.
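The virtual-move bookkeeping can be sketched as follows. The sketch assumes the standard parallel-tempering swap acceptance, min(1, exp[(β_i − β_j)(E_i − E_j)]), a discretized order parameter, and no bias potential; with a bias W, the biased energies would be used and the histograms unbiased afterwards. All names are hypothetical and the code is not the authors' implementation.

```python
import math
from collections import defaultdict


def swap_acceptance(beta_i, beta_j, e_i, e_j):
    """Acceptance probability for swapping the states of replicas i and j."""
    arg = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if arg >= 0.0 else math.exp(arg)


def virtual_move_update(histograms, betas, energies, order_params):
    """Add one sampling step to the histograms of all temperatures.

    For every temperature i, all N - 1 trial swaps are considered. Each
    swap contributes the current state of i with weight (1 - p_acc) and
    the state of j with weight p_acc, whether or not a swap is performed.
    """
    n = len(betas)
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            p_acc = swap_acceptance(betas[i], betas[j], energies[i], energies[j])
            histograms[i][order_params[i]] += 1.0 - p_acc
            histograms[i][order_params[j]] += p_acc


# Usage: two replicas with hypothetical energies and order-parameter bins.
betas = [1.0, 0.5]
energies = [1.0, 2.0]
order_params = [3, 7]
histograms = [defaultdict(float) for _ in betas]
virtual_move_update(histograms, betas, energies, order_params)
```

Each call adds a total weight of N − 1 to every histogram, so states visited mainly at high temperature still contribute, with small weight, to the estimate at low temperature.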