
Conceived and designed the experiments: AEL YIW EVK. Performed the experiments: AEL. Analyzed the data: AEL YIW EVK. Contributed reagents/materials/analysis tools: AEL. Wrote the paper: AEL EVK.

The authors have declared that no competing interests exist.

Experimental studies on enzyme evolution show that only a small fraction of all possible mutation trajectories are accessible to evolution. However, these experiments deal with individual enzymes and explore a tiny part of the fitness landscape. We report an exhaustive analysis of fitness landscapes constructed with an off-lattice model of protein folding where fitness is equated with robustness to misfolding. This model mimics the essential features of the interactions between amino acids, is consistent with the key paradigms of protein folding and reproduces the universal distribution of evolutionary rates among orthologous proteins. We introduce mean path divergence as a quantitative measure of the degree to which the starting and ending points determine the path of evolution in fitness landscapes. Global measures of landscape roughness are good predictors of path divergence in all studied landscapes: the mean path divergence is greater in smooth landscapes than in rough ones. The model-derived and experimental landscapes are significantly smoother than random landscapes and resemble additive landscapes perturbed with moderate amounts of noise; thus, these landscapes are substantially robust to mutation. The model landscapes show a deficit of suboptimal peaks even compared with noisy additive landscapes with similar overall roughness. We suggest that smoothness and the substantial deficit of peaks in the fitness landscapes of protein evolution are fundamental consequences of the physics of protein folding.

Is evolution deterministic, hence predictable, or stochastic, that is, unpredictable? What would happen if one could “replay the tape of evolution”: would the outcomes of evolution be completely different, or is evolution so constrained that history would repeat itself? Arguably, these questions are among the most intriguing and most difficult in evolutionary biology. The predictability of evolution depends on the fraction of the trajectories on fitness landscapes that are accessible for evolutionary exploration. Because direct experimental investigation of fitness landscapes is technically challenging, the available studies explore only a minuscule portion of the landscape for individual enzymes. We therefore sought to investigate the topography of fitness landscapes within the framework of a previously developed model of protein folding and evolution where fitness is equated with robustness to misfolding. We show that model-derived and experimental landscapes are significantly smoother than random landscapes and resemble moderately perturbed additive landscapes; thus, these landscapes are substantially robust to mutation. The model landscapes show a deficit of suboptimal peaks even compared with noisy additive landscapes with similar overall roughness. Thus, the smoothness and substantial deficit of peaks in fitness landscapes of protein evolution could be fundamental consequences of the physics of protein folding.

One of the most intriguing questions in evolutionary biology is to what extent evolution is deterministic and to what extent it is stochastic and hence unpredictable. In other words, what happens if “the tape of evolution is replayed”: will we see completely different outcomes, or are the constraints so strong that history will repeat itself?

The most thoroughly characterized feature of empirical fitness landscapes is the structure near a peak. In experiments that examine the peak structure, a high fitness sequence is typically subjected to either random mutations or an exhaustive set of mutations at a small number of important sites. The resulting library of mutants is then assayed to measure a proxy of fitness

Another broad class of experiments probes the evolutionary trajectories from low to high fitness. Usually, in such experiments, a random peptide is subjected to repeated rounds of random mutagenesis and purifying selection

A different type of landscape has been explored in various microarray experiments in which protein-DNA (or protein-RNA) binding affinity serves as the proxy for fitness

Empirical studies that exhaustively sample a region of the fitness landscape allow one to assess the accessibility of the entire set of theoretically possible evolutionary trajectories in a particular (small) area of the fitness landscape. For example, all mutational paths between two states of an enzyme, such as the transition from an antibiotic-sensitive to an antibiotic-resistant form of

Recent analyses of fitness data have revealed dense networks of genetic and molecular interactions responsible for the substantial ruggedness and sign epistasis of empirical fitness landscapes

Here we focus on the question of the predictability of mutational paths, which is intimately tied to the ruggedness (or smoothness) of fitness landscapes. The study of random landscapes of low dimensionality revealed an intuitively plausible negative correlation between the roughness of a landscape and the availability of pathways of monotonically increasing fitness

To gain insight into the structure of the fitness landscapes of protein evolution, and in particular into the accessibility of mutational paths, we used a previously developed simple model of protein folding and evolution

We build on the efforts of Carneiro and Hartl

Carneiro and Hartl compared small random landscapes to several empirical fitness landscapes using deviation from additivity as a measure of roughness
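To make the deviation-from-additivity measure concrete, the sketch below assumes a complete binary landscape like the experimental ones (every combination of two alleles at `n_sites` sites), with genotypes encoded as integers whose bit i marks a substitution at site i. Under that assumption the least-squares additive fit reduces to the grand mean plus one main effect per site. The function name and encoding are hypothetical, and the normalization follows the definition given in the summary table (residual mean square divided by mean squared fitness); this is an illustration, not the authors' code.

```python
def deviation_from_additivity(fitness, n_sites):
    """Deviation from additivity of a complete binary landscape.

    fitness[x] is the fitness of genotype x; bit i of the integer x marks a
    substitution at site i (a hypothetical encoding used for illustration).
    """
    n = 1 << n_sites
    grand_mean = sum(fitness) / n
    # In a complete 2^L design the site effects are orthogonal, so the
    # least-squares additive fit is the grand mean plus one effect per site.
    effects = []
    for i in range(n_sites):
        with_sub = sum(fitness[x] for x in range(n) if x >> i & 1)
        without_sub = sum(fitness[x] for x in range(n) if not x >> i & 1)
        effects.append((with_sub - without_sub) / n)
    residual_sq = 0.0
    for x in range(n):
        predicted = grand_mean + sum(
            e if x >> i & 1 else -e for i, e in enumerate(effects)
        )
        residual_sq += (fitness[x] - predicted) ** 2
    mean_sq_fitness = sum(f * f for f in fitness) / n
    # Residual mean square scaled by the mean squared fitness.
    return (residual_sq / n) / mean_sq_fitness
```

For a purely epistatic two-site landscape such as `[0.0, 1.0, 1.0, 0.0]` the additive fit is flat and the function returns 0.5, whereas any strictly additive landscape returns (numerically) zero.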

Because the roughness of a multidimensional landscape with variable node connectivity is not an intuitive concept, we introduce three additional quantitative measures that probe alternative facets of roughness. First, local roughness is the root mean squared difference between the fitness of a point and its neighbors, averaged over the entire landscape. As defined, local roughness conflates roughness and “steepness.” For example, a globally smooth landscape, in which fitness depends only on the distance from the peak, will have a non-zero local roughness. However, because a large number of directions change the distance from the peak by one, the local roughness of a globally smooth landscape will be vanishingly small. In addition, everywhere except in a small region around the main peak, our landscapes tend to be globally flat, so that the average decrease in fitness due to a single mutational step away from the main peak is much smaller than the local fitness variability (see
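A minimal sketch of local roughness, again assuming a complete binary landscape with integer-encoded genotypes (a hypothetical encoding). For each genotype, the root mean squared fitness difference to its one-mutant neighbors is computed and then averaged over the landscape. The order of the square root and the averaging is an assumption: the summary table of characteristics lists a mean squared difference. Treat this as illustrative.

```python
import math

def local_roughness(fitness, n_sites):
    """Landscape average of each genotype's RMS fitness difference to its neighbors."""
    n = 1 << n_sites
    total = 0.0
    for x in range(n):
        # Neighbors differ from x by a single substitution (one flipped bit).
        squared = [(fitness[x] - fitness[x ^ (1 << i)]) ** 2 for i in range(n_sites)]
        total += math.sqrt(sum(squared) / n_sites)
    return total / n
```

A flat landscape has zero local roughness. An additive landscape in which every substitution changes fitness by exactly 1 has a local roughness of 1, which illustrates how the measure also picks up steepness.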

Second, the fraction of peaks is the number of points with no fitter neighbors divided by the total number of points in the landscape. A strictly additive landscape has a single peak
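Peak fraction can be sketched in a few lines, assuming the same hypothetical binary encoding (bit i of the integer genotype marks a substitution at site i). A point counts as a peak when no neighbor is strictly fitter, so equal-fitness plateaus are counted too. That handling of ties is an assumption.

```python
def peak_fraction(fitness, n_sites):
    """Fraction of genotypes that have no strictly fitter one-mutant neighbor."""
    n = 1 << n_sites
    peaks = sum(
        1
        for x in range(n)
        if all(fitness[x ^ (1 << i)] <= fitness[x] for i in range(n_sites))
    )
    return peaks / n
```

For an additive landscape on L binary sites this returns 1/2^L, the single peak. The two-site sign-epistatic landscape `[0.0, 1.0, 1.0, 0.0]` has two peaks out of four points.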

Third, the roughness of a landscape can be assessed by identifying its tree component. The tree component is the set of all nodes with no more than one neighbor of higher fitness. Thus, the tree component includes peaks and plateaus. Monotonic fitness paths along the tree component form a single or several disjoint tree structures without loops. In the limit of high selection pressure, a mutational trajectory that finds itself on the tree component has a single path to the nearest peak or plateau, i.e. evolution on the tree component is completely deterministic. We use the mean distance to the tree component, i.e. the distance to the tree component averaged over the landscape, as a measure of roughness. In a fully additive landscape, only the peak sequence and its immediate neighbors belong to the tree component and therefore the mean distance to the tree component is a measure of the diameter of an additive landscape (which, for example, could be defined as the maximum pairwise distance between points on the landscape). Kauffman and Levin have shown that in a large class of correlated random landscapes, the mean distance to the tree component grows only logarithmically with the number of points in the landscape
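The mean distance to the tree component can be computed with a multi-source breadth-first search that starts from every point with at most one fitter neighbor. The sketch assumes the hypothetical binary encoding used above, so the graph distance equals the number of single substitutions separating two genotypes. The tree component is never empty, because the global maximum always belongs to it.

```python
from collections import deque

def mean_distance_to_tree(fitness, n_sites):
    """Average shortest distance from each genotype to the tree component."""
    n = 1 << n_sites

    def fitter_neighbors(x):
        return sum(1 for i in range(n_sites) if fitness[x ^ (1 << i)] > fitness[x])

    # Seed the search with the tree component: at most one fitter neighbor.
    dist = [-1] * n
    queue = deque()
    for x in range(n):
        if fitter_neighbors(x) <= 1:
            dist[x] = 0
            queue.append(x)
    # Multi-source BFS: each genotype is labeled by its distance to the nearest seed.
    while queue:
        x = queue.popleft()
        for i in range(n_sites):
            y = x ^ (1 << i)
            if dist[y] < 0:
                dist[y] = dist[x] + 1
                queue.append(y)
    return sum(dist) / n
```

In a three-site additive landscape only the peak and its three neighbors form the tree component. The remaining points lie one or two steps away, which gives a mean distance of 5/8.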

We use two quantitative measures of the predictability of evolutionary trajectories. The first is the fraction of monotonic paths to the main peak
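For a starting point and the main peak that differ at d sites, there are d! shortest paths, one for each order of substitutions. The sketch counts the fitness-increasing paths by dynamic programming over subsets of the differing sites. It assumes the hypothetical binary encoding, takes the main peak to be the global maximum, and averages uniformly over all non-peak starting points, which is our reading of "averaged over the landscape".

```python
from math import factorial

def monotonic_path_fraction(fitness, start, peak):
    """Fraction of shortest start->peak paths along which fitness strictly increases."""
    diff = start ^ peak
    bits = [1 << i for i in range(diff.bit_length()) if diff >> i & 1]
    # count[s] = number of monotonic paths from start to start ^ s.
    # Removing a bit lowers an integer, so increasing order visits subsets before supersets.
    count = {}
    for s in range(diff + 1):
        if s & ~diff:
            continue  # s is not a subset of the differing sites
        if s == 0:
            count[s] = 1
            continue
        here = fitness[start ^ s]
        count[s] = sum(
            count[s ^ b] for b in bits if s & b and fitness[start ^ s ^ b] < here
        )
    return count[diff] / factorial(len(bits))

def mean_monotonic_fraction(fitness):
    """Average of the monotonic path fraction over all non-peak starting points."""
    peak = max(range(len(fitness)), key=fitness.__getitem__)
    others = [x for x in range(len(fitness)) if x != peak]
    return sum(monotonic_path_fraction(fitness, x, peak) for x in others) / len(others)
```

In a perfectly additive landscape every shortest path is monotonic, so the fraction is 1. Sign epistasis removes paths: in `[0.0, -1.0, 0.5, 1.0]`, only one of the two paths from genotype 0 to genotype 3 is monotonic.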

Second, the mean path divergence is a fine-grained measure of evolutionary (un)predictability. We first define the divergence
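Equation (2) is not reproduced in this section, so the sketch below uses an assumed stand-in definition. The divergence of two paths is taken as the mean Hamming distance from each point of one path to the nearest point of the other, symmetrized. The bundle divergence then weights every pair of paths by their empirical frequencies. Identical paths have zero divergence and widely separated paths have a large one, which is the behavior the text asks of the measure. The exact formula in the paper may differ.

```python
from collections import Counter
from itertools import product

def hamming(a, b):
    """Number of sites at which integer-encoded genotypes a and b differ."""
    return bin(a ^ b).count("1")

def path_divergence(p, q):
    """ASSUMED definition: symmetrized mean distance to the nearest point of the other path."""
    d_pq = sum(min(hamming(x, y) for y in q) for x in p) / len(p)
    d_qp = sum(min(hamming(y, x) for x in p) for y in q) / len(q)
    return 0.5 * (d_pq + d_qp)

def mean_path_divergence(paths):
    """Frequency-weighted divergence over all pairs in a bundle.

    paths: sampled trajectories (tuples of genotypes) sharing start and end;
    repeated paths encode their empirical probability.
    """
    freq = Counter(tuple(p) for p in paths)
    total = len(paths)
    return sum(
        (na / total) * (nb / total) * path_divergence(a, b)
        for (a, na), (b, nb) in product(freq.items(), repeat=2)
    )
```

A bundle that contains a single repeated path has zero divergence. The two equally likely paths 0→1→3 and 0→2→3 each have a pairwise divergence of 1/3, so the bundle divergence is 1/6.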

The six quantitative characteristics of fitness landscapes are summarized in

| Name of characteristic | Characterized property | Definition |
|---|---|---|
| Peak fraction | Roughness | Number of points with no fitter neighbors divided by the total number of points in the landscape |
| Deviation from additivity | Roughness | Mean squared difference between the actual fitness and the fitness predicted by the best-fit additive model, scaled by the mean squared fitness in the landscape |
| Local roughness | Roughness | Mean squared difference between the fitness of a point and its immediate neighbors, averaged over the landscape |
| Distance to tree component | Roughness | Shortest distance to the tree component (points with at most one uphill neighbor), averaged over the landscape |
| Monotonic path fraction | Path predictability | Fraction of the shortest paths (without multiple or reverse substitutions) to the main peak that are monotonic in fitness, averaged over the landscape |
| Mean path divergence | Path predictability | Measure of dissimilarity (divergence) of the monotonic paths to the main peak, averaged over the landscape |

In an additive landscape, the mutational trajectory is maximally ambiguous. Because every substitution that brings the sequence closer to the peak increases fitness, substitutions can occur in any order, and all shortest mutational trajectories to the peak (those without reverse substitutions or multiple substitutions at the same site) are monotonic in fitness. In the strong selection limit of our model (defined below), all monotonic trajectories have roughly the same probability of occurrence, so the mutational path cannot be predicted.

The mean path divergence is a better measure of the predictability of evolutionary trajectories than the number or fraction of accessible paths. Even when only a small fraction of paths are monotonic in fitness, these paths could be quite different from one another, perhaps randomly scattered over the landscape. In such a case, the evolutionary trajectory could not be predicted accurately despite the scarcity of accessible paths, and this unpredictability is reflected in a high value of path divergence.

Equation (2) introduces the mean path divergence of a bundle of paths with the same starting and ending points. The landscape-wide mean path divergence is measured by constructing representative path bundles with all possible [start, peak] pairs including suboptimal peaks as trajectory termination points. Path divergence is averaged over all bundles with the starting and ending points separated by the same Hamming distance. To construct the path bundles, we employed a low mutation rate model in which the attempted substitutions are either eliminated or fixed in the population before the next mutation attempt occurs.
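Given per-bundle divergences, the landscape-wide curve is an average over bundles grouped by the Hamming distance between their endpoints. The sketch groups sampled trajectories by their (start, end) pair and takes `bundle_divergence` as a parameter, so any implementation of equation (2) can be plugged in. The integer genotype encoding and the function names are assumptions.

```python
from collections import defaultdict

def divergence_by_distance(trajectories, bundle_divergence):
    """Average bundle divergence over bundles whose endpoints are equally far apart.

    trajectories: tuples of integer-encoded genotypes, from start to end point.
    bundle_divergence: function mapping a list of paths to a number.
    """
    bundles = defaultdict(list)
    for path in trajectories:
        bundles[(path[0], path[-1])].append(path)
    by_distance = defaultdict(list)
    for (start, end), paths in bundles.items():
        distance = bin(start ^ end).count("1")  # Hamming distance between endpoints
        by_distance[distance].append(bundle_divergence(paths))
    return {d: sum(v) / len(v) for d, v in sorted(by_distance.items())}
```

Suboptimal peaks enter naturally as end points, because the bundles are keyed by wherever each trajectory terminated.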

We invoke the misfolding-cost hypothesis to assign a fitness to a sequence that folds with probability

In the analysis that follows, we study the association between landscape roughness and path predictability for the folding landscapes and their randomized (also referred to as permuted or scrambled) versions. In the scrambled landscapes, the topology (i.e. connectivity) of the landscape is preserved but the fitness values are randomly shuffled. We also compare the roughness and path predictability characteristics of the model and the experimental landscapes for

We first establish that the folding and the experimental landscapes are significantly different from their randomly permuted counterparts. The deviation from additivity of the folding landscapes is typically several standard deviations below the mean of their scrambled counterparts. Although the additivity hypothesis accounts for less than 40% of the fitness variability (computed by comparing the sum of the squares of the fitnesses in the landscape to the sum of the squares of the residuals of the additive fitness model fit) in all but one of the folding landscapes, the deviation from additivity of the permuted landscapes is substantially greater (

(A) Deviation from additivity for the folding landscapes (larger symbols), their scrambled versions (smaller symbols) and the two experimental landscapes. Error bars show one standard deviation within the ensemble of permuted landscapes. (B) Fraction of monotonic paths to the main peak in folding, scrambled and experimental landscapes. (C) The number of peaks is vastly greater in scrambled landscapes than in folding or experimental landscapes (with the exception of the sesquiterpene synthase landscape).

To further characterize the deviation of the folding and experimental landscapes from their permuted counterparts, we measured each landscape metric and computed its mean and standard deviation across 100 randomly permuted landscapes. We then computed the Z-score (deviation from the mean measured in units of the standard deviation) of the original non-permuted landscape relative to the ensemble of permuted landscapes. This Z-score shows how much more correlated the original landscape is, as measured by the chosen characteristic, compared to its scrambled counterparts (
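The permutation test can be sketched as follows. The fitness values are shuffled over the fixed set of genotypes, so the topology is unchanged; the metric is recomputed 100 times, and the original value is expressed as a Z-score. Peak fraction serves as the example metric. The binary encoding and the seeded `random.Random` generator are illustrative choices. The sign of the Z-score depends on the metric: a landscape with fewer peaks than its scrambled versions gives a negative value.

```python
import random
import statistics

def peak_fraction(fitness, n_sites):
    """Fraction of genotypes with no strictly fitter one-mutant neighbor."""
    n = 1 << n_sites
    peaks = sum(
        1
        for x in range(n)
        if all(fitness[x ^ (1 << i)] <= fitness[x] for i in range(n_sites))
    )
    return peaks / n

def permutation_z_score(fitness, n_sites, metric, n_perm=100, seed=0):
    """Z-score of metric(fitness) relative to n_perm scrambled landscapes."""
    rng = random.Random(seed)
    observed = metric(fitness, n_sites)
    scrambled = []
    for _ in range(n_perm):
        shuffled = list(fitness)
        # Same genotypes and neighbor relations; only the fitness values are reassigned.
        rng.shuffle(shuffled)
        scrambled.append(metric(shuffled, n_sites))
    mu = statistics.mean(scrambled)
    sigma = statistics.stdev(scrambled)
    return (observed - mu) / sigma
```

For an additive four-site landscape such as `[float(x) for x in range(16)]`, which has a single peak, the Z-score for peak fraction comes out negative: scrambling creates many more peaks.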

Aside from the significant correlation (Spearman

Each panel quotes the Spearman rank correlation coefficient between the particular pair of characteristics.

Starting from a random non-peak sequence in the landscape, we introduced random mutations and accepted or rejected them according to equation (3) until the trajectory arrived at a fitness peak. This procedure was repeated a large number of times, and path bundles were constructed for all pairs of starting and ending sequences. Then the mean path divergence was computed for each path bundle using equation (2) and averaged over all bundles for which starting and ending points were separated by the same Hamming distance. When selection is weak, all mutations which do not result in a sequence with zero folding probability are accepted. Thus, evolution is a random walk on the landscape and the statistical properties of evolutionary trajectories are fully determined by the topology of the landscape (i.e. the connectivity of each node). Conversely, in the strong selection limit, only mutations that increase fitness are fixed. The mean path divergence varies smoothly between the two limits (

Solid lines are labeled by the Hamming distance between the pairs of starting and ending points of the trajectory bundles over which the path divergence is averaged.
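Equation (3) is not reproduced here, so the walk below implements only the strong selection limit described in the text. Mutations are attempted one at a time, and an attempted substitution is fixed only if it increases fitness and is eliminated otherwise. The sketch assumes the hypothetical binary encoding, stops at the first point with no fitter neighbor, and collects trajectories into bundles keyed by their start and end points.

```python
import random

def is_peak(fitness, x, n_sites):
    """True if no one-mutant neighbor of x is strictly fitter."""
    return all(fitness[x ^ (1 << i)] <= fitness[x] for i in range(n_sites))

def strong_selection_walk(fitness, start, n_sites, rng):
    """One trajectory in the strong selection, low mutation rate limit."""
    path = [start]
    x = start
    while not is_peak(fitness, x, n_sites):
        y = x ^ (1 << rng.randrange(n_sites))  # attempted single substitution
        if fitness[y] > fitness[x]:  # only fitter mutants are fixed
            x = y
            path.append(x)
    return tuple(path)

def sample_bundles(fitness, n_sites, walks_per_start=100, seed=0):
    """Map (start, end) -> list of sampled trajectories from every non-peak start."""
    rng = random.Random(seed)
    bundles = {}
    for start in range(1 << n_sites):
        if is_peak(fitness, start, n_sites):
            continue
        for _ in range(walks_per_start):
            path = strong_selection_walk(fitness, start, n_sites, rng)
            bundles.setdefault((start, path[-1]), []).append(path)
    return bundles
```

Each walk terminates, because fitness strictly increases at every fixed step and a non-peak always has a fitter neighbor that is eventually sampled. The weak selection limit would instead accept every attempted substitution that does not lead to a sequence with zero folding probability.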

All four measures of landscape roughness can serve as predictors of path divergence and monotonic path fraction to some degree (

The dots of different colors correspond to noisy additive landscapes with differing amounts of multiplicative noise: low (red), two intermediate levels (green lower than blue), and high (magenta). Yellow circles represent the folding landscapes, the cyan squares–the

In contrast to deviation from additivity, the mean distance to the tree component is positively correlated with path divergence. When the tree component comprises a large fraction of the landscape, the mean distance to the nearest tree branch is small. Consequently, the path divergence is reduced as the paths that reach the tree component do not deviate from each other from that point onward. By the same token, when the tree component is large, there are fewer monotonic paths.

The origin of the positive correlation between the local roughness and path divergence (

Here we examined the fraction of monotonic paths and introduced mean path divergence as quantitative measures of the degree to which the starting and ending points determine the path of evolution on fitness landscapes. The lower the mean path divergence value, the more deterministic (and predictable) evolution is. Global measures of landscape roughness correlate with path divergence in the three analyzed classes of fitness landscapes: additive landscapes perturbed by noise, landscapes derived from our protein folding model, and two small empirical landscapes. The folding landscapes are substantially smoother than their permuted counterparts. As a result, although in all analyzed landscapes only a small fraction of the theoretically possible evolutionary trajectories is accessible, this fraction is much greater in the folding and experimental landscapes than in randomized landscapes. In addition, the mean path divergence in the randomized landscapes is significantly smaller than in the original landscapes. Thus, the model and empirical landscapes possess similar global architectures, with many more diverged monotonic paths to the high peaks than uncorrelated landscapes with the same distribution of fitness values. Consequently, evolution in fitness landscapes is substantially more robust to random mutations and less deterministic (less predictable) than expected by chance. These findings are compatible with the concept, which might appear counter-intuitive but is buttressed by the results of population genetic modeling, that the robustness of evolving biological systems promotes their evolvability

When it comes to the interpretation of the properties of fitness landscapes described here, an inevitable and important question is whether the folding model employed here is sufficiently complex and realistic to yield biologically relevant information. In selecting the complexity of our folding model, we attempted to construct the simplest model that exhibits 1) a rich spectrum of low energy conformations across the sequence space, and 2) a non-trivial distribution of substitution effects on the low energy conformations. An important choice is whether the location of monomers is confined to a lattice or can vary continuously. When the configuration space is continuous, the distribution of energy barriers between energetically optimal conformations can extend to zero. Therefore, subtle distinctions between conformations can lead to a richer structure of the fitness landscape. We chose not to increase the complexity of the model further and treated monomers as point-like particles in a chain where the distance between nearest neighbors is fixed but the angle between successive links in the chain is unrestricted. Our level of abstraction is therefore somewhere between lattice models and all-atom descriptions of proteins

Another important choice is the number of the model monomer types. Again, we opted for an intermediate level of abstraction and chose four types of monomers: hydrophobic, hydrophilic, and positively and negatively charged. This choice drastically reduces the size of the sequence space while retaining some of the substitution complexity whereby hydrophilic and charged monomers can be swapped under some conditions without radically altering the native state. The intermediate level of abstraction in our approach has its pros and cons. Although the model reproduces key features of protein folding such as the existence of the hydrophobic folding nucleus and two-stage folding kinetics

Most importantly, our folding model has been shown to reproduce the observed universal distribution of the evolutionary rates of protein-coding genes as well as the dependencies of the evolutionary rate on protein abundance and effective population sizes

The experimental landscapes considered here are decidedly incomplete. Because of experimental limitations, only binary substitutions at a handful of sites can be analyzed at this time. The incompleteness of the empirical landscapes analyzed in this work could be the cause of the observed lack of peak suppression. This proposition can be tested by studying the larger parts of experimental landscapes that are becoming increasingly available.

The goal of this study is to explore the relationship between roughness and path divergence in realistic fitness landscapes. Our polymer folding model provides a simple way of constructing such landscapes. The model has been described in detail previously

In brief, the model polymer is a flexible chain of monomers in which the nearest neighbors interact via a stiff harmonic spring potential with rest length

The energy of the chain is

Folding dynamics are simulated via overdamped Brownian dynamics, which is appropriate when inertial and hydrodynamic effects are not important. Units are chosen so that each component
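As an illustration of overdamped Brownian dynamics, the sketch integrates the overdamped Langevin equation with the Euler-Maruyama scheme at unit mobility. Each step adds force times dt plus Gaussian noise of variance 2 T dt per coordinate. Only a harmonic bond between consecutive monomers is included, because the other energy terms of the model are not reproduced in this section. The spring constant, rest length, temperature, and time step are hypothetical values.

```python
import math
import random

def bond_forces(positions, k, r0):
    """Forces from harmonic springs U = (k/2)(|r_{i+1} - r_i| - r0)^2 between consecutive beads."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for i in range(len(positions) - 1):
        a, b = positions[i], positions[i + 1]
        d = [b[j] - a[j] for j in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        # Force on bead i+1 is -k (r - r0) d / r; bead i feels the opposite force.
        scale = -k * (r - r0) / r
        for j in range(3):
            forces[i + 1][j] += scale * d[j]
            forces[i][j] -= scale * d[j]
    return forces

def brownian_step(positions, dt, temperature, rng, k=100.0, r0=1.0):
    """One Euler-Maruyama step of overdamped dynamics with unit mobility."""
    forces = bond_forces(positions, k, r0)
    noise = math.sqrt(2.0 * temperature * dt)
    return [
        [positions[i][j] + forces[i][j] * dt + noise * rng.gauss(0.0, 1.0) for j in range(3)]
        for i in range(len(positions))
    ]
```

At zero temperature a stretched two-bead chain relaxes to the rest length. With a stiff spring, the time step must stay well below 1/k for the explicit scheme to remain stable.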

The “native structure” of a particular sequence is represented by an equilibrium ensemble of conformations. The ensemble is constructed by identifying the typical folded conformation and measuring the characteristic RMSD

The concept of the native structure ensemble allows us to compute the probability that a sequence folds to a particular structure in a natural, physically plausible fashion. Given a native structure ensemble we assess its conformation space density by computing the distance

Given a native structure ensemble of some sequence

Robust folders (sequences with a high probability of correct folding) tend to have long linear regions stretched by repulsive Coulomb interactions. Because the linear regions have no contacts with other monomers, we focused our attention on compact conformations with a high monomer contact density. Substitutions in these higher-complexity conformations were more likely to have non-trivial effects. To find compact robust folders in the vast available sequence space of

We examined each single substitution mutant of a robustly folding sequence and computed the folding probability

From our study of complete landscapes we estimate that on average for each sequence with

At the time of submission, 39 complete landscapes have been constructed, the largest comprising 12969 sequences.

The organization of the folding fitness landscapes and the experimental landscapes was compared with that of perfectly additive landscapes perturbed by noise, constructed as follows. Each substitution to the peak fitness sequence was assigned a negative fitness differential drawn at random from an exponential distribution with parameter
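A sketch of the noisy additive construction, under stated assumptions. The peak is genotype 0 in the hypothetical binary encoding, with an assumed peak fitness of 1. Each substitution lowers fitness by an exponentially distributed amount with rate `rate`, standing in for the elided parameter. The multiplicative noise is modeled, as an assumption, as a log-normal factor exp(noise * N(0, 1)) applied to each genotype independently. With `noise=0` the landscape is exactly additive.

```python
import math
import random

def noisy_additive_landscape(n_sites, rate, noise, peak_fitness=1.0, seed=0):
    """Additive landscape with exponential substitution costs and ASSUMED log-normal noise."""
    rng = random.Random(seed)
    # Fitness cost of each substitution away from the peak genotype 0.
    cost = [rng.expovariate(rate) for _ in range(n_sites)]
    fitness = []
    for x in range(1 << n_sites):
        # Additive part; it may become negative when many costly substitutions
        # accumulate, and this sketch does not clip it.
        additive = peak_fitness - sum(cost[i] for i in range(n_sites) if x >> i & 1)
        fitness.append(additive * math.exp(noise * rng.gauss(0.0, 1.0)))
    return fitness
```

Increasing `noise` from zero interpolates from a perfectly additive landscape toward a rougher one, mirroring the low, intermediate, and high noise levels used for comparison.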

The studies on experimental fitness landscapes typically involve constructing a library of all possible combinations of binary mutations at a small number of sites. The first study included in the present analysis measured the minimum inhibitory concentrations (MIC) of an antibiotic for a complete spectrum of mutants with modified TEM