## Figures

## Abstract

Evolutionary pathways describe trajectories of biological evolution in the space of different variants of organisms (genotypes). The probability of existence and the number of evolutionary pathways that lead from a given genotype to a better-adapted genotype are important measures of accessibility of local fitness optima and the reproducibility of evolution. Both quantities have been studied in simple mathematical models where genotypes are represented as binary sequences of two types of basic units, and the network of permitted mutations between the genotypes is a hypercube graph. However, it is unclear how these results translate to the biologically relevant case in which genotypes are represented by sequences of more than two units, for example four nucleotides (DNA) or 20 amino acids (proteins), and the mutational graph is not the hypercube. Here we investigate accessibility of the best-adapted genotype in the general case of *K* > 2 units. Using computer generated and experimental fitness landscapes we show that accessibility of the global fitness maximum increases with *K* and can be much higher than for binary sequences. The increase in accessibility comes from the increase in the number of indirect trajectories exploited by evolution for higher *K*. As one of the consequences, the fraction of genotypes that are accessible increases by three orders of magnitude when the number of units *K* increases from 2 to 16 for landscapes of size *N* ∼ 10^{6} genotypes. This suggests that evolution can follow many different trajectories on such landscapes and the reconstruction of evolutionary pathways from experimental data might be an extremely difficult task.

## Author Summary

Biological evolution is driven by heritable, genetic alterations that affect the fitness of organisms. However, the pool of “fitter” variants (genotypes) is often restricted and it is not at all obvious how evolution finds its way from low-fitness to high-fitness genotypes in a complex, multidimensional “fitness landscapes” with many peaks (fit organisms) and valleys (unfit ones). To address this question we investigate how likely it is for biological evolution to find a way “uphill” from a lower-fitness organism to the best adapted organism. We discover that the accessibility of the fittest organism depends on the number of types of basic “units” used to encode genotypes. These units can be, for example, the four DNA nucleotides A,T,C,G, or the ∼20 amino acids used for synthesizing proteins, and the choice of the most appropriate unit is dictated by how the genotypes and the fitnesses are related—a relationship that researchers have begun to unveil only recently. We find that increasing the number of units strongly increases the probability that there will be at least one uphill path to the best-adapted genotype, and the number of evolutionary pathways leading to it. Our findings suggest that biological evolution can follow many more pathways than previously thought.

**Citation: **Zagorski M, Burda Z, Waclaw B (2016) Beyond the Hypercube: Evolutionary Accessibility of Fitness Landscapes with Realistic Mutational Networks. PLoS Comput Biol12(12):
e1005218.
https://doi.org/10.1371/journal.pcbi.1005218

**Editor: **Roger Dimitri Kouyos,
University of Zurich, SWITZERLAND

**Received: **June 10, 2016; **Accepted: **October 23, 2016; **Published: ** December 9, 2016

**Copyright: ** © 2016 Zagorski et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All data and computer programs are available in Supporting Information (S1 Data and Code).

**Funding: **MZ acknowledges the Polish National Science Centre grant no. DEC-2012/07/N/NZ2/00107. BW was supported by the Scottish Government/Royal Society of Edinburgh Personal Research Fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Biological evolution can be visualised as a dynamical process in which reproduction and mutations cause organisms to move in the “fitness landscape” [1]—a complex, multidimensional landscape full of ridges and local peaks in which space represents genetic variants (genotypes) of organisms, and “height” corresponds to organism’s fitness—a measure of the reproductive success.

The structure of the fitness landscape (FL) affects the speed and predictability of biological evolution and is thus very important for practical reasons such as forecasting the evolution of resistance to antibiotics [2–7]. However, fitness landscapes are astronomically large even for the simplest organisms: the number of possible genotypes with the same genome size as the smallest known bacterial genome to-date (*Nasuia deltocephalinicola* [8], *L* = 112kbp) is *N* = 4^{112000} ≈ 10^{67430} genotypes. For this reason only small fragments of fitness landscapes have been determined experimentally [3, 9–13] and progress has been made mostly by studying adaptation in computer-generated landscapes (see e.g. [14–16]) or in simple theoretical models (see e.g. House-of-Cards [17], NK [18], Mount Fuji [19], holey landscapes [20]). In these models genotypes are typically represented as binary sequences, where 0 and 1 correspond to two different alleles of a gene or a particular point mutation being absent/present. This choice has been dictated by three factors: (i) availability of empirical data in which genotypes usually differ by specific point mutations from a certain “wild-type” genotype (cf. Refs. [10, 11]), (ii) most studies focused on the shortest evolutionary pathways for which (as we will see later) the number of alleles is not important, (iii) under the assumption that offspring differ from their parent by only one character, the resulting graph of permitted mutations is the hypercube graph and this greatly facilitates mathematical analysis [17, 21].

However, the space of real genotypes and their mutational network differ significantly from the space of binary sequences and the hypercube graph. A DNA sequence is made of four different nucleotides (A,T,G,C), hence at the most fundamental level genotypes must be represented as sequences of *K* = 4 different characters. Another example with *K* = 4 is RNA folding in which relationship between the coding sequence and the corresponding secondary-structure defines a landscape of energetically favorable RNA conformations [22–24]. On a more coarse-grained level, assuming that fitness is determined primarily by protein structure, a genotype can be represented as a sequence of characters from a set of *K* > 20 symbols which correspond to 20 standard amino acids [25–27] and their post-translational modifications. Finally, if we consider genotypes as sequences of genes, *K* is equal to the number of different variants (alleles) of genes and is practically infinite. All these examples are valid descriptions of the genotype space but they lead to different structures of the corresponding fitness landscape: different genotype-to-fitness mappings and different structures of the graphs of possible transitions (mutations) between the genotypes.

Here we investigate how the global structure of the fitness landscape is affected by the number *K* of different “units”, or characters, of which sequences of length *L* representing genotypes are made, and the structure of the corresponding mutational graph (Fig 1). We focus on evolutionary accessibility of the global fitness maximum. An accessible pathway is a trajectory in the genotype space along which fitness increases monotonically. Accessible pathways are important because in the low mutation-strong selection limit [28–30] such pathways are good approximations to evolutionary trajectories obtained in computer models of biological evolution, regardless of the details of the model. Accessible pathways are thus expected to represent real evolutionary pathways, at least in some experiments [3].

Examples of two fitness landscapes with *K* = 2 and *K* = 4 and the same number of genotypes *N* = 16. Grey arrows show the direction of increasing fitness. Two accessible pathways (black = with indirect mutations, green = without indirect mutations) have been highlighted. For *K* = 4, indirect mutations can be either backward (increasing the Hamming distance from the target genotype) or distance-neutral (the Hamming distance remains unchanged).

While we intuitively expect that increasing the number of coding units and hence the connectivity of the mutational graph should increase accessibility if the total number of genotypes *N* = *K*^{L} is kept fixed, it is not obvious how increasing *L* affects accessibility for different *K*. In the case of well-studied binary landscapes (*K* = 2) the probability that a maximally random (House-of-Cards) fitness landscape has at least one accessible pathway approaches zero as *L* → ∞ [30–32]. While many evolutionary models have been studied in the case *K* > 2 [33–36], these works have not addressed accessibility of fitness landscapes. It is also unclear what happens for *K* > 2 in the presence of correlations between the fitnesses that exist in many real fitness landscapes [10].

To address these questions, we analyse computer-generated and experimental fitness landscapes with various *K* and *L*. First, we show that if maximally random landscapes with the same number of genotypes *N* = *K*^{L} but different *K* are compared, larger *K* leads to higher accessibility. In contrast to the binary case *K* = 2, in the large-*L* limit the landscape with *K* > 2 contains on average at least one pathway. We demonstrate that as *K* increases, accessible pathways become longer and contain an increasing number of indirect mutations which do not decrease the distance to the fittest genotype. We also show that accessibility depends strongly on the fitness of the initial genotype, *f*_{0}, and that a transition occurs at some critical *f*_{0} above which there are no accessible pathways. Finally, we confirm our predictions for two experimentally determined fitness landscapes.

### Modelling framework

We consider the space of all possible sequences of length *L* made of symbols taken from a *K*-letter alphabet. Each sequence *i* = 0, …, *K*^{L} − 1 represents a genotype in the discrete genotype space, with *i* = 0 corresponding to sequence {0, 0, 0, …, 0} and *i* = *K*^{L} − 1 to {*K* − 1, *K* − 1, *K* − 1, …, *K* − 1}. The genotypes are connected by a network of mutations that change one genotype into another one. We assume that any position in the sequence can mutate with equal probability and that a mutation at a given position changes the symbol at this position to another randomly selected symbol. Therefore, two genotypes are connected if they differ at exactly one position.

To create a fitness landscape, we assign random fitnesses {*f*_{i}} to all genotypes. Each fitness *f*_{i} is a random number drawn independently from the probability distribution uniform on [0…1). Next, we re-index all genotypes such that genotype {*K* − 1, *K* − 1, *K* − 1, …, *K* − 1} has the maximum fitness (Methods). This procedure creates an ensemble of maximally-random or “rugged” fitness landscapes with no correlations between the fitnesses of adjacent genotypes. This is arguably the simplest non-trivial model of the fitness landscape with many local maxima and minima and thus it presents an ideal test ground to investigate the role of connectivity on the accessibility of FLs. Although real landscapes are known to be moderately correlated [10, 37–39], we shall show later that our conclusions are robust in the presence of such correlations.

We define a pathway in the genotype space as a trajectory that connects two given genotypes following the links of the mutational graph. We call a pathway accessible if fitness increases monotonically from the start to the end point along the pathway. We consider only such evolutionary trajectories that begin at genotype {0, 0, 0, …, 0} and end at the best-fit, antipodal genotype {*K* − 1, *K* − 1, *K* − 1, …, *K* − 1}. Following Ref. [10], we call a FL accessible if there is at least one accessible path between the target and the initial genotype (Fig 1). We also define accessibility *A* as the probability that a FL randomly selected from our statistical ensemble is accessible.

## Results

### Computational model

#### Accessibility increases with *K*.

We investigated how accessibility varied with the number of genotypes *N* = *K*^{L} for different *K*. We first considered only pathways without “backward mutations”, that is pathways along which the Hamming distance from the initial genotype increased (or remained the same for *K* > 2) with each step. Fig 2a shows that accessibility *A* slowly decreases with increasing *N*, for all *K*. When *K* = 2, we recover the mathematical result of Hegarty et al. [31] that *A* tends to zero for *N* → ∞. However, when *K* > 2, accessibility approaches a finite, non-zero value in the large-*N* limit. The estimated asymptotic accessibility *A*_{∞} = *A*(*N* → ∞) increases with *K* and equals 47%, 69% and 81% for *K* = 4, 8, and 16 (S1 Fig) (standard errors <1%). Therefore, for sufficiently long sequences accessibility is larger for a larger number of basic unit types.

**(a)** Plots of accessibility *A* versus the number of genotypes *N* for different *K* (black, red, blue, and green triangles) and for pathways without backward mutations. *A* tends to zero as *N* → ∞ for *K* = 2. **(b)** Accessibility versus *N* for all pathways, including indirect ones (circles). In both panels the dashed lines correspond to asymptotic estimates of accessibility *A*_{∞} (cf. S1 Fig).

Fig 2b shows that the same trend holds when all pathways and not just the ones without backward mutations are considered. However, in the latter case *A*_{∞} remains finite also for *K* = 2. A comparison of panels (a), (b) in Fig 2 shows that, except for the case *K* = 2, accessibility is almost identical regardless of whether pathways with backward mutations are included or not. We shall see later that *A*_{∞} does not change much for *K* > 2 because the fraction of backward mutations is small compared to other (distance-neutral) mutations, see Fig 3. When *K* = 2 we recover the analytical prediction *A* = 0.1186… from Berestycki et al. [32]. We also note that if only shortest pathways are included for *K* > 2, accessibility tends to zero as *N* → ∞ because in this case only mutations that change the symbol “0” to *K* − 1 are permitted and thus the space of permitted genotypes is restricted to that of binary sequences (*K* = 2). Since pathways without backward mutations form only a small subset of all pathways for *K* > 2 (S2 Fig) in the rest of the paper we shall consider all pathways when discussing accessibility.

**(a)** Average number of mutational steps (pathway length) along accessible pathway for a fixed length of genotype *L*. Fractions of forward, distance-neutral, and backward mutations are indicated by coloured areas between the curves, for example the fraction of forward mutations corresponds to the distance between the red and the gray curve. **(b)** Average length of accessible pathway for a fixed number of genotypes *N*. **(c)** Average length of accessible pathway for a fixed number of coding units *K* (circles) with straight lines fitted to the data (lines). Note that the fraction of shortest pathways (with only forward mutations) among all pathways quickly decreases (cf. S3 Fig).

#### Larger *K* increases the length of accessible pathways and the fraction of indirect mutations in such pathways.

We next investigated the attributes of individual accessible pathways: their length and the fraction of backward mutations. Increasing *K* while keeping *L* fixed makes accessible pathways longer (Fig 3a). Interestingly, the average length scales as ∼0.5*LK* and only weakly depends on *N* (S4 Fig). Notice that by fixing *L* we keep the same mutational (Hamming) distance between the initial and target genotype, thus any increase in pathway length for *K* > 2 must be caused by mutations that either do not change the Hamming distance from the initial genotype (distance-neutral mutations) or decrease it (backward mutations), which we jointly refer to as indirect mutations. This is indeed what happens; S2 Fig shows that the probability that an accessible path has backward or distance-neutral mutations increases with *K* if we compare fitness landscapes with the same *L*. Moreover, the higher *K* the more dominating are distance-neutral mutations (Fig 3a). For example, for *K* = 16 and *L* = 5 accessible pathways have on average 5% of backward, 15% of forward and 80% of distance-neutral mutations. This is in contrast to the binary case *K* = 2 and *L* = 5 in which forward mutations account for 98% of all mutations, and the remaining 2% are backward mutations. Keeping *N* or *K* fixed while changing *K* or *L* (Fig 3b and 3c) does not qualitatively change this picture. Hence, higher accessibility of FLs with multiple coding units comes with a price: accessible pathways become longer, and more mutational steps are required to reach the best-fit genotype than in the binary case (*K* = 2).

#### Initial fitness affects the number of pathways and accessibility.

So far we have been assigning a random fitness to the initial genotype. We may however expect that the fitness *f*_{0} of the initial genotype strongly affects the number of accessible pathways from that genotype to the antipodal, best-fit genotype. If *f*_{0} was close to the maximal fitness we would intuitively expect few (or no) pathways, whereas for small *f*_{0} such pathways would be more likely. This phenomenon was first noted in Ref. [40] under notion of “accessibility percolation” and further studied for binary sequences (*K* = 2) [31, 32, 41]. Fig 4a shows that this is also the case for *K* > 2: the number of accessible pathways decreases with increasing *f*_{0}. Moreover, accessibility sharply decreases from ≈ 1 to zero as *f*_{0} crosses a critical *f*_{crit}, different for each *K* (Fig 4b). This transition between accessible and inaccessible FLs becomes sharper with increasing *L*, and for *L* → ∞ the numerical value of *f*_{crit} approaches *A*_{∞}. Thus, for *L* large enough, while accessible pathways exist for *K* = 2 only if the initial fitness is very low (*f*_{0} < 0.1), the number of such pathways is non-zero even for relatively large *f*_{0} when *K* > 2.

**(a)** Average number of accessible pathways as a function of initial fitness *f*_{0} for different *K* and for *N* = 2^{20}. Shaded areas represent standard errors. **(b)** Accessibility *A* as a function of initial fitness *f*_{0} for different *K* and for *N* = 2^{28}. A steep decrease in accessibility at some critical *f*_{0} (dashed line) indicates that the probability of reaching the best-fit genotype can be very sensitive to the initial fitness.

#### Accessible pathways can cover most of the FL.

Fig 4a shows that the number of accessible pathways increases by several orders of magnitude when *K* increases from 2 to 16. This is associated with a rapid increase in the number of genotypes belonging to accessible pathways, i.e., the coverage of the FL (Fig 5). In particular, for *K* = 2 the average coverage of the FL equals 0.02%, whereas for *K* = 16 around 25% of genotypes belong to accessible pathways. Coverage can exceed 50% when the initial fitness *f*_{0} is small (S5 Fig). The increase in the number of accessible pathways and accessible genotypes has important consequences for the *a posteriori* predictability of biological evolution: given the initial and final genotypes, we cannot reliably predict the pathway biological evolution might have followed if *K* > 2.

The total number of genotypes (grey) and the genotypes belonging to accessible pathways (colours black, red, and green) as a function of Hamming distance from the target genotype, for fitness landscapes with different number of coding units *K* = 2, 4, 16 (panels a, b, c) and the same number of genotypes *N* = 2^{20}. The shaded area under the curve corresponds to the total number of genotypes (for any distance) that belong to accessible pathways. Coverage (the fraction of genotypes in accessible pathways) is also shown.

### Experimental fitness landscapes

We next verified the predictions of our computer model using two experimentally determined fitness landscapes. We first analysed the landscape of DNA-protein affinities [39], with binding affinity to the fluorescent protein allophycocyanin as a proxy for fitness. By selecting subsets of genotypes (Methods) we generated two ensembles of fitness landscapes with *L* = 5, *K* = 4 and *L* = 10, *K* = 2 (*N* = 1024 genotypes in each case).

We obtained accessibility *A* = 0.669(5) for *K* = 2 and *A* = 0.864(4) for *K* = 4, thus confirming our prediction that *A* increases with *K* (Fig 6a–6c). The values of *A* are higher than for the maximally random landscape (Fig 2) because the fitness values are negatively correlated with the distance to the target genotype, i.e., the closer to the target genotype, the higher is the fitness (Fig 6b). If we randomize the fitness landscapes by randomly swapping the fitnesses, the expected *A* decreases and it matches the value obtained for the maximally random landscape. Correlations between the fitness and the distance of a genotype to the target genotype can therefore significantly increase accessibility. We note that this is not caused by the difference between the distributions of *f*_{0} for randomized and original FLs because these distributions strongly overlap (Fig 6c). This observation indicates that accessibility can be enhanced by fitness-to-distance correlations in FLs, in accordance with [30].

Panels (a-c): data from Rowe et al. [39], panels (d-f): data from Guenther et al. [42]. **(a,d)** Accessibility of the sub-FLs for *K* = 2, 4 (red), and for their randomized counterparts with the same fitness distribution (grey). Randomization decreases accessibility to that of a maximally-random FL. **(b,e)** Average genotype fitness in sub-landscapes with *K* = 2 as a function of the distance from the target genotype. Randomization removes correlations present in the original FL. **(c,f)** Histogram of the fitness of the initial genotype; the average fitness for each histograms is the same as the average fitness value of the antipodal genotype (the right-most points in Panels b, e).

We further investigated how the fitness of the initial genotype affects accessibility, i.e., whether there is some critical value of initial fitness separating regions of FLs with high and low accessibility. To account for the non-uniform distribution of the experimental fitnesses we transformed the fitnesses such that the new fitness distribution was uniform, while preserving the order of the fitnesses and local correlations (epistasis), i.e., if *f*_{0} < *f*_{1} < *f*_{2} < … < *f*_{N − 1} then the transformed fitnesses . The estimated critical values of are 0.69(1) and 0.90(1) for *K* = 2 and *K* = 4 respectively (Fig 7a). These values are close to the estimated values of *A* from Fig 6a; the critical *f*_{crit} increases with *K* similarly as in the maximally random model.

Panels (a-c): data from Rowe et al. [39], panels (d-f): data from Guenther et al. [42]. **(a,d)** Accessibility of the sub-FLs for *K* = 2, 4 (red circles, red triangles), and for their randomized counterparts (grey, black) as a function of rescaled initial fitness . In the presence of fitness-to-distance correlations landscapes with high levels of are mostly accessible compared to the case with no correlations. **(b,e)** Number of paths exhibits as a function of rescaled initial fitness . Presence of fitness-to-distance correlations increased number of pathways by a few orders of magnitude compared with the randomized ensemble. **(c,f)** Coverage of experimental sub-FLs and their randomized counterparts as a function of . The insets show the same data in semi-log scales. In the case of *K* = 2 correlations increase the fraction of accessible genotypes to levels observed for FLs with *K* = 4.

The presence of fitness-to-distance correlations affects also other properties of FLs: there are more accessible pathways leading to the best-fit genotype (Fig 7b), the pathways are longer and the fraction of accessible genotypes is higher than in the maximally random model (S1 Table). Moreover, the fractions of accessible genotypes for *K* = 2 and *K* = 4 are large and behave similarly when plotted against (Fig 7c), but when fitness-to-distance correlations are removed the two curves separate. Correlations thus enhance accessibility for both *K* = 2 and *K* = 4, making *a posteriori* predictability of evolutionary trajectories even more difficult than in the case of the maximally-random FL.

In order to confirm that the observed behaviour is not unique to the specific experimental data set of Ref. [39], we analysed a second landscape of RNA-protein binding affinity [42]. We assumed fitness to be proportional to the rate constant for the reaction of the RNA-binding subunit C5 of RNase P with all variants of the 6-nucleotide recognition site (Methods). Proceeding similarly as before, we generated sub-FLs with *L* = 3, *K* = 4 and *L* = 6, *K* = 2 (*N* = 64 genotypes). The results for this second landscape (Fig 6d–6f, Fig 7d–7f, S1 Table) turned out to be consistent with the results reported above.

Note that although our approach in which we identify fitness with binding affinity of a protein to an RNA or a DNA sequence is commonly used in the literature, the fitness defined in this way is not “true fitness”, i.e., it does not have to directly correspond to reproductive success of an organism in which the described interactions take place.

## Discussion

Although the concept of the fitness landscape was proposed almost a century ago [1], many open problems remain. Setting aside the appropriateness of the FL to model evolution in the presence of interactions between different organisms, one of the important questions is the accessibility of local fitness peaks: are such peaks sufficiently well connected, or are they separated by “fitness valleys” which are so deep that evolution cannot reach most of them? Another question concerns the predictability of evolution: how many evolutionary trajectories lead from one genotype to another, better adapted genotype?

Our results show that the answer to these questions depends on how genotypes are connected in the genotype space, i.e., which transitions between the genotypes are permitted and which are not. Since genetic information is stored in a linear molecule (DNA), genotypes are most naturally represented as linear sequences of some “units” of information, or building blocks, and this already imposes restrictions on the structure of the mutational graph. Moreover, the nature of the smallest unit of information and how mutations convert one unit into another further affects how genotypes are connected and how they are mapped to fitnesses. The choice of the appropriate “unit” depends on the subset of the fitness landscape considered in a particular research problem. While the presence or absence (*K* = 2) of a certain point mutation leads to a simple mathematical model, this approach is limited to small fragments of fitness landscapes and it requires the knowledge of which mutations are important and which can be ignored [11]. How many unit types are necessary to model the most general situation? Since the DNA code uses four nucleotides, *K* = 4 building blocks should in theory suffice to represent all possible genotypes. On the other hand, the redundancy of the genetic code might suggest that the most appropriate unit should rather be an amino acid, in which case *K* ≈ 20. However, this could be disputed because there is evidence that synonymous codons can be translated at different rates [43–45] which may affect fitness.

We have seen that increasing the number of basic units or “alleles” *K* generally increases accessibility. This is not just a trivial consequence of increased connectivity of the mutational graph (number of permitted mutations for each genotype). Connectivity also increases with increasing sequence length *L*, but this actually slightly decreases accessibility (cf. Fig 2). Increasing *K* changes not only how many neighbours a genotype has in the genotype space, but it affects the global topology of the network of genotypes. Since accessibility of the global fitness maximum is a global property of the fitness landscape, it depends on *K*. This is in contrast to local properties such as the number of local fitness maxima, which depend only on the number of nearest neighbours and is thus the same for FLs with the same value of the product (*K* − 1)*L* [17].

The presence of multiple alleles has also one more important effect—it enables pathways to explore a much larger part of the fitness landscape through mutations that either do not decrease the Hamming distance to the target genotype or even increase it (backward mutations). This greatly enhances accessibility by exploring genotypes through indirect mutations, effectively circumventing regions of sign epistasis [46–48]. This observation has been recently supported by experimental study of the four-site IgG-binding domain of protein G (*L* = 4, *K* = 20) [49] which shows that direct paths are often blocked by reciprocal sign epistasis, and only indirect pathways can access high-fitness regions of the landscape.

By re-sampling two experimental fitness landscapes [39, 42] we have found that the presence of correlations between the fitnesses facilitates evolution and, as a result, accessibility and the fraction of accessible genotypes for *K* = 2 increases to a level comparable with that of the maximally-random FL with *K* > 2. For larger *K* the increase of accessibility is not so pronounced (accessibility is already high in the maximally random FL), but the number of accessible pathways increases by many orders of magnitudes: there are ∼10^{10} pathways in the presence of correlations as compared to ∼10^{3} pathways in the randomized FL with 1024 genotypes (S1 Table). It would be interesting to see if similar effects could be observed in other large FLs [50, 51]. However, these data sets, despite a large number of genotypes included, cover only a small fraction of the genotype space due to random mutagenesis employed to generate mutant genotypes. The problem of missing genotypes with undefined fitness is why in this work we decided to use only the (almost complete) FLs of Refs. [39, 42].

Our analysis assumes that all evolutionary pathways are equally likely. This does not have to be true in many biologically relevant scenarios [3, 52, 53]. Even in the strong selection/weak mutation (SSWM) limit to which the notion of accessible pathways is most relevant, pathways of different lengths have different probabilities of realisation. In particular, for *K* = 2 shortest pathways (if they exist) may have higher probability of realisation than more abundant but longer indirect pathways [54]. Our work shows however that for *K* > 2 indirect pathways completely dominate the set of accessible pathways (cf. S3 Fig), which suggests that indirect accessible pathways with moderate lengths may be relevant in the SSWM limit. It would be interesting to see how these predictions change once the SSWM assumption is relaxed, effectively decreasing the relevance of pathways with increasing fitness [55].

Another factor that we neglected was that genotype-to-phenotype mapping is not unique due to phenotypic plasticity [56–58] in which many phenotypes may correspond to the same genotype. Moreover, fitness is not an absolute characteristic of an organism in the same way as the genotype is because it depends on the organism’s environment: physical conditions such as temperature, available nutrients, and the presence/absence of other organisms. This environmental plasticity has been shown to speed up evolution in experiments [59–61] and *in silico* [62–64]. Fitness can also be different at different locations even in a relatively small habitat, and chemical gradients can significantly affect the rate of biological evolution [65, 66]. The reality is thus much more complicated than our idealised model, but the picture that emerges from our work is that the mutational structure of the genotype space is an important factor affecting biological evolution, and that the use of binary sequences to represent real genotypes has limited applicability.

## Methods

### Computer model

For each generated fitness landscape (FL) we find the number of accessible pathways that start from the antipodal genotype {0, 0, …0} and end at the fittest (target) genotype {*K* − 1, *K* − 1, …, *K* − 1} using a depth-first search (DFS) algorithm with backtracking. The fitness landscape is represented as a graph with nodes corresponding to genotypes and edges to mutations between the pairs of genotypes. Each node is assigned a counter variable which specifies how many accessible pathways connect this node to the fittest genotype, and an auxiliary variable *v* which assumes one of the three possible node states: *v* = 0, *v* = 1 or *v* = 2, representing nodes that have not been visited by DFS yet, the visited ones, and the ones with their whole mutational neighbourhood explored. All counters and variables *v* are initially set to zero. The DFS algorithm starts from the antipodal genotype and follows only those edges along which fitness increases. When the final node (genotype with maximal fitness) has been reached the algorithm traces its steps back to the initial node while increasing the counters of all nodes it visits on its way. Particularly, if all edges from a given node have been explored, the node is labelled as explored (*v* = 2) and its counter variable is no longer updated. Next, the counter of this node is added to the counter of the previous node in the DFS and the search continues from the later node. If at some point the DFS algorithm attempts to visit an already explored node, only the explored node counter is added to the counter of the current node. The backtracking procedure thus exhaustively accumulates the number of all paths going from a given node to the target genotype. When the search is completed, the value of the counter at the initial node gives the total number of accessible paths starting at this node.

We tested this algorithm by comparing the number of pathways found for small (*N* < 2^{10}) FLs with the number obtained by a naive, recursive breath-first search algorithm. Both algorithms produced identical results but the first algorithm was substantially faster for larger graphs than the naive algorithm. For example, the described algorithm enabled us to enumerate all pathways (∼10^{34}) on a computer-generated FL with *K* = 16, *L* = 6 which would clearly be impossible for the naive algorithm.

To obtain the accessibility of a given FL we used a simple DFS algorithm without backtracking. This reduced memory requirements as compared to the full algorithm described above, enabling FLs with higher *K* and/or *L* to be analysed. To obtain the number of forward, backward and distance-neutral mutations in pathways of a given length, we modified the algorithm so that it stored a two-dimensional histogram—a generalized counter variable—for each node. During backtracking the histograms were accumulated and stored as the final histogram in the antipodal genotype (in S6 Fig example of such histogram is plotted for the experimental FL with *K* = 4 and *L* = 10).

### Experimental fitness landscape

The first experimental FL we used was the protein-binding landscape of DNA oligomers of length *L* = 10 [39]. This landscape has been previously shown to be rugged, with many minima and maxima, and is thus not too dissimilar from our computer-modelled landscapes. However, the landscape shows strong correlations (Fig. 2 in Ref. [39]) and thus it is not evident how our results from the computer model of maximally-random, uncorrelated FLs will apply to this landscape.

The natural number of coding units, or “alleles”, for this landscape is *K* = 4 since each site of the sequence can be in one of four possible states (4 nucleotides A,T,G,C). The total number of genotypes is 4^{10} ≈ 1M. In order to obtain quasi-independent fitness landscapes necessary for statistical analysis from just one, large data set and FLs for other values of *K* (in particular for *K* = 2), we followed the procedure described below.

To obtain a sample landscape with *K* = 4, *L* = 5, we randomly selected 5 different positions from the 10-nucleotide sequence, and a random sequence where *c*_{i} was one of A,T,G, or C. We then selected all 10-nucleotide sequences with nucleotides at positions , and any nucleotides at the remaining *L* = 5 positions. This generated a sub-landscape of the full fitness landscape with *N* = 4^{5} = 1024 genotypes with the same “genetic background” at positions . By repeating the above procedure we created 10k (of the total ≈ 258k different) FLs with different genetic backgrounds (different ).

To generate a sample landscape with *K* = 2, *L* = 10, we selected a random sequence from all 10-nucleotide sequences, and another sequence such that it differed from at all positions. We associated sequences with two antipodal binary sequences 00…0 and 11…1 of length *L* = 10, and constructed the fitness landscape from all *N* = 2^{L} = 1024 sequences such that the *i*-th nucleotide was either *x*_{i} or *y*_{i}. We repeated this procedure to sample 10k landscapes of the total ≈ 60M different possible FLs.

After generating the sets of fitness landscapes with *K* = 4 and *K* = 2 with the same number of genotypes *N* = 1024, the corresponding fitness values were normalized to be between 0 and 1. Accessibility was determined for mutational pathways starting at the antipodal genotype and ending at the genotype with maximum fitness. For *K* = 4 we used the same convention of selecting the antipodal genotype as in the computer-generated FLs (see below).

For our second FL we used the rate constants for the reaction between the RNA-binding subunit C5 of RNase P recognition and its RNA substrate, with all variants of its recognition site of length *L* = 6 [42]. Using the same approach as described in the Methods of Ref. [42] we calculated the rates relative to the rate constant of the wild-type RNA sequence. In our analysis we were less stringent than Ref. [42] on removing sequence variants that exhibited biased amplification, i.e., we kept sequences that had reads above quality threshold for at least three time points. We took as fitness the relative rate constant averaged over different time points; we also checked that other methods, e.g. selecting with the largest difference in the corresponding raw read count, did not significantly affect our results. The analysis yielded fitness values for 3940 out of the total 4^{6} = 4096 possible sequences. The remaining 4% sequences for which fitness could not be determined were assigned the fitness of the least fit genotype.

Adapting the procedure described for the first experimental landscape we obtained sub-landscapes of the full experimental landscape with *K* = 4, *L* = 3 and *K* = 2, *L* = 6, with *N* = 64 genotypes each. Since these sub-FLs were sampled from the landscape smaller than the first one and the total number of possible sub-FLs was much smaller than before (e.g., 1280 for *K* = 4), we generated only 1k sub-FLs for each *K*. We also normalized the fitness to be between 0 and 1, and performed the same analysis as for the first experimental data set.

### Re-indexing fitness landscapes

In both the computer model and the analysis of the real FLs we re-index all genotypes such that the best-fit (target) genotype is assigned the sequence {*K* − 1, *K* − 1, …, *K* − 1}. The antipodal genotype is then {0, 0, …0}. When fitness values are drawn at random no actual re-indexing is necessary but the series of *N* = *K*^{L} fitnesses is generated and the highest value is assigned to the target genotype while the remaining fitness values are assigned to the remaining genotypes. For sub-landscapes of the real fitness landscapes we define a map that assigns the target genotype sequence to the fittest genotype while preserving all correlations and the structure of the FL; in particular, each genotype has the same nearest neighbours in the re-indexed FL. For *K* = 2 we proceed as follows: if the fittest genotype sequence has 0s at *k* positions and 1s at the remaining positions, the map changes 0 → 1 and 1 → 0 at the positions and for the remaining positions there are no changes, see S7 Fig. Applying this transformation to all genotypes results in a FL with the desired indexing (the fittest genotype sequence contains only 1s and the antipodal genotype sequence only 0s). This mapping is unique for *K* = 2, but in the case of *K* > 2 the antipodal genotype can be selected in many ways. We thus modify the algorithm as follows for *K* > 2. Assume that the fittest genotype sequence has characters at positions such that 0 ≤ *c*_{1}, *c*_{2}, …, *c*_{k} < *K* − 1. We then define the following reassignment: *c*_{1} → *K* − 1, …, *c*_{k} → *K* − 1, which maps the fittest genotype to the sequence {*K* − 1, *K* − 1, …, *K* − 1}. There are now (*K* − 1)^{L} possible choices for the antipodal genotype and we remove this ambiguity by randomly selecting one of them and extending the mapping (character reassignment) in such a way that the selected genotype is mapped onto the sequence {0, 0, …0}. The character reassignment rules described above are then applied to all remaining genotypes. We note that these rules apply only to specific positions in the sequence; for example the mapping 1 → *K* − 1 at position 3 will be applied only if a sequence has character 1 at position 3, but not if there is a 1 at another position, unless there is a rule 1 → *K* − 1 specifically for that position. We also note that the algorithm does not change characters which are neither assigned to (*K* − 1) (target) nor to 0 (antipodal).

Equivalently to re-indexing sub-landscapes of the real fitness landscape one could adapt the DFS algorithm and related structures storing accessible pathways to have arbitrary target and initial genotypes, i.e., not encoded by sequences {*K* − 1, *K* − 1, …, *K* − 1} and {0, 0, …0}. The outcome of both approaches would be identical, as the fitness landscapes before and after re-indexing are isomorphic (S7 Fig).

## Supporting Information

### S1 Fig. Asymptotic accessibility estimates for different *K*.

**(a)** Asymptotic estimate of accessibility *A* versus the log(log_{2} *N*)/log_{2} *N* for *K* = 2 when only shortest pathways are considered. The logarithmic correction to the inverse of log_{2} *N* comes from mathematical arguments [31]. **(b,c)** Plots of accessibility *A* versus the inverse of log_{2} *N* for different *K* (2 = black, 4 = red, 8 = blue, and 16 = green) and for pathways without backward mutations (panel b) and for all pathways (panel c). The dashed lines are linear fits to the data subset. Only landscapes with *N* > 2^{10} have been used to estimate *A*_{∞}. Statistical errors (s.e.m.) are below 0.01 for all *A*_{∞}.

https://doi.org/10.1371/journal.pcbi.1005218.s001

(TIF)

### S2 Fig. Fraction of pathways with backward and distance-neutral mutations increases with *K*.

**(a)** Average fraction of pathways that contain at least one backward mutation as a function of genotype length *L*, and for different *K*. **(b)** Analogous to panel a, but for distance-neutral mutations. By definition, distance-neutral mutations do not exist for *K* = 2 and hence there is no corresponding line in the plot.

https://doi.org/10.1371/journal.pcbi.1005218.s002

(TIF)

### S3 Fig. Shortest pathways are quickly outnumbered by indirect pathways.

The average fraction of shortest pathways as a function of number of genotypes *N*, and for different *K*. The median of the fraction of shortest pathways appears to tend to zero for *K* > 2 in the large-*N* limit.

https://doi.org/10.1371/journal.pcbi.1005218.s003

(TIF)

### S4 Fig. Increasing the number of coding units lengthens accessible pathways.

**(a)** The normalized pathway length (number of mutational steps divided by *L*) as a function of *K*, for different values of *L*. **(b)** The normalized length as a function of log_{2} *N* for different values of *K*. Except from *K* = 16 the normalized length reaches a plateau within the investigated range of *N*.

https://doi.org/10.1371/journal.pcbi.1005218.s004

(TIF)

### S5 Fig. Coverage of FL with genotypes belonging to accessible pathways depends on the fitness of the initial genotype.

Average coverage of fitness landscapes with different number of coding units *K* (black, red, blue, green) and a fixed number of genotypes (*N* = 2^{24} for *K* = 2, 4, 8 and *N* = 2^{20} for *K* = 16) as a function of initial fitness *f*_{0}. Inset: the same data in the log-linear scale. Shaded area corresponds to one standard deviation (accessible FLs only).

https://doi.org/10.1371/journal.pcbi.1005218.s005

(TIF)

### S6 Fig. Number of accessible pathways in the experimental FL has a broad distribution in the space of indirect mutations.

Number of accessible pathways from the antipodal to the best-fit genotype for the full experimental FL as a function of the number of backward and distance-neutral mutations. The maximal number of pathways (black dot) is approximately 6.45 × 10^{21}. The histogram was obtained by exhaustive enumeration of all pathways using the algorithm described in Methods.

https://doi.org/10.1371/journal.pcbi.1005218.s006

(TIF)

### S7 Fig. Re-indexing preserves mutational neighbours of each genotype.

Example of original (left) and re-indexed (right) fitness landscapes with target (best-fit) and antipodal genotypes highlighted. The aim of re-indexing is to reassign genotypes in isomorphic manner (the structure of the fitness landscape is unchanged), such that the target genotype is assigned sequence 11…1 and the antipodal genotype 00…0. For each genotype the original sequence and the re-indexed sequence are highlighted with the same colour and linked with an arrow.

https://doi.org/10.1371/journal.pcbi.1005218.s007

(TIF)

### S1 Table. Fitness-to-distance correlations facilitate accessibility of the best-fit genotype.

Properties of the two fitness landscapes generated from the experimental data sets discussed in the main text [39, 42]. Randomized ensembles of FLs preserve fitness values, but not fitness-to-distance correlations.

https://doi.org/10.1371/journal.pcbi.1005218.s008

(PDF)

### S1 Data and Code. ZIP-archived directory containing all data and computer programs.

https://doi.org/10.1371/journal.pcbi.1005218.s009

(ZIP)

## Author Contributions

**Conceptualization:**MZ BW.**Data curation:**MZ.**Formal analysis:**MZ.**Funding acquisition:**MZ.**Investigation:**MZ.**Methodology:**MZ ZB BW.**Software:**MZ BW.**Supervision:**BW.**Validation:**MZ ZB BW.**Visualization:**MZ.**Writing – original draft:**MZ ZB BW.**Writing – review & editing:**MZ ZB BW.

## References

- 1. Wright S. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proceedings of the Sixth International Congress of Genetics. 1932;1:356–366.
- 2. Orencia MC, Yoon JS, Ness JE, Stemmer WPC, Stevens RC. Predicting the emergence of antibiotic resistance by directed evolution and structural analysis. Nature Structural & Molecular Biology. 2001;8(3):238–242.
- 3. Weinreich DM, Delaney NF, DePristo MA, Hartl DL. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science. 2006;312(5770):111–114. pmid:16601193
- 4. Martinez JL, Baquero F, Andersson DI. Predicting antibiotic resistance. Nature Reviews Microbiology. 2007;5(12):958–965. pmid:18007678
- 5.
Novais Â, Comas I, Baquero F, Cantón R, Coque TM, Moya A, et al. Evolutionary trajectories of
*β*-lactamase CTX-M-1 cluster enzymes: Predicting antibiotic resistance. PLoS Pathogens. 2010;6(1):e1000735. pmid:20107608 - 6. Toprak E, Veres A, Michel JB, Chait R, Hartl DL, Kishony R. Evolutionary paths to antibiotic resistance under dynamically sustained drug selection. Nature Genetics. 2012;44(1):101–105.
- 7. Rodrigues JV, Bershtein S, Li A, Lozovsky ER, Hartl DL, Shakhnovich EI. Biophysical principles predict fitness landscapes of drug resistance. Proceedings of the National Academy of Sciences. 2016;113(11):E1470–E1478.
- 8. Bennett GM, Moran NA. Small, smaller, smallest: The origins and evolution of ancient dual symbioses in a phloem-feeding insect. Genome Biology and Evolution. 2013;5(9):1675–1688. pmid:23918810
- 9. Carneiro M, Hartl DL. Adaptive landscapes and protein evolution. Proceedings of the National Academy of Sciences. 2010;107(suppl 1):1747–1751.
- 10. Szendro IG, Schenk MF, Franke J, Krug J, de Visser JAGM. Quantitative analyses of empirical fitness landscapes. Journal of Statistical Mechanics: Theory and Experiment. 2013;2013(01):P01005.
- 11. de Visser JAGM, Krug J. Empirical fitness landscapes and the predictability of evolution. Nature Reviews Genetics. 2014;15(7):480–490. pmid:24913663
- 12. Palmer AC, Toprak E, Baym M, Kim S, Veres A, Bershtein S, et al. Delayed commitment to evolutionary fate in antibiotic resistance fitness landscapes. Nature Communications. 2015;6.
- 13. Marcusson LL, Frimodt-Møller N, Hughes D. Interplay in the selection of fluoroquinolone resistance and bacterial fitness. PLoS Pathogens. 2009;5(8):e1000541. pmid:19662169
- 14. Fontana W, Schuster P. A computer model of evolutionary optimization. Biophysical Chemistry. 1987;26(2):123–147. pmid:3607225
- 15. Court SJ, Waclaw B, Allen RJ. Lower glycolysis carries a higher flux than any biochemically possible alternative. Nature Communications. 2015;6:8427. pmid:26416228
- 16. Ullrich A, Rohrschneider M, Scheuermann G, Stadler P, Flamm C. In silico evolution of early metabolism. Artificial life. 2011;17(2):87–108. pmid:21370961
- 17. Kauffman S, Levin S. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology. 1987;128(1):11–45. pmid:3431131
- 18. Kauffman SA, Weinberger ED. The NK model of rugged fitness landscapes and its application to maturation of the immune response. Journal of Theoretical Biology. 1989;141(2):211–245. pmid:2632988
- 19. Aita T, Uchiyama H, Inaoka T, Nakajima M, Kokubo T, Husimi Y. Analysis of a local fitness landscape with a model of the rough Mt. Fuji-type landscape: Application to prolyl endopeptidase and thermolysin. Biopolymers. 2000;54(1):64–79. pmid:10799982
- 20. Gavrilets S. A dynamical theory of speciation on holey adaptive landscapes. The American Naturalist. 1999;154(1):1–22.
- 21. Gavrilets S, Gravner J. Percolation on the fitness hypercube and the evolution of reproductive isolation. Journal of Theoretical Biology. 1997;184(1):51–64. pmid:9039400
- 22. Fontana W, Stadler PF, Bornberg-Bauer EG, Griesmacher T, Hofacker IL, Tacker M, et al. RNA folding and combinatory landscapes. Physical Review E. 1993;47:2083–2099.
- 23. Ancel LW, Fontana W. Plasticity, evolvability, and modularity in RNA. Journal of Experimental Zoology. 2000;288(3):242–283. pmid:11069142
- 24. Cowperthwaite MC, Meyers LA. How mutational networks shape evolution: lessons from RNA models. Annual review of ecology, evolution, and systematics. 2007; p. 203–230.
- 25. Maynard Smith J. Natural selection and the concept of a protein space. Nature. 1970;225(5232):563–564.
- 26. Babajide A, Hofacker IL, Sippl MJ, Stadler PF. Neutral networks in protein space: A computational study based on knowledge-based potentials of mean force. Folding and Design. 1997;2(5):261–269. pmid:9261065
- 27. Povolotskaya IS, Kondrashov FA. Sequence space and the ongoing expansion of the protein universe. Nature. 2010;465(7300):922–926. pmid:20485343
- 28. Gillespie JH. Some properties of finite populations experiencing strong selection and weak mutation. The American Naturalist. 1983;121(5):691–708.
- 29. Gillespie JH. Molecular evolution over the mutational landscape. Evolution. 1984;38(5):1116–1129.
- 30. Franke J, Klözer A, de Visser JAGM, Krug J. Evolutionary accessibility of mutational pathways. PLoS computational biology. 2011;7(8):e1002134. pmid:21876664
- 31. Hegarty P, Martinsson A. On the existence of accessible paths in various models of fitness landscapes. The Annals of Applied Probability. 2014;24(4):1375–1395.
- 32. Berestycki J, Brunet É, Shi Z. Accessibility percolation with backsteps. ArXiv e-prints. 2014.
- 33. Huynen MA, Stadler PF, Fontana W. Smoothness within ruggedness: The role of neutrality in adaptation. Proceedings of the National Academy of Sciences. 1996;93(1):397.
- 34. Greenbury SF, Schaper S, Ahnert SE, Louis AA. Genetic correlations greatly increase mutational robustness and can both reduce and enhance evolvability. PLOS Comput Biol. 2016;12(3):e1004773. pmid:26937652
- 35. Fontana W. Continuity in evolution: on the nature of transitions. Science. 1998;280(5368):1451–1455. pmid:9603737
- 36. Graner Fmc, Glazier JA. Simulation of biological cell sorting using a two-dimensional extended Potts model. Physical Review Letters. 1992;69:2013–2016. pmid:10046374
- 37. Otwinowski J, Nemenman I. Genotype to phenotype mapping and the fitness landscape of the E. coli lac promoter. PLOS ONE. 2013;8(5):e61570. pmid:23650500
- 38. Kouyos RD, Leventhal GE, Hinkley T, Haddad M, Whitcomb JM, Petropoulos CJ, et al. Exploring the complexity of the HIV-1 fitness landscape. PLOS Genetics. 2012;8(3):e1002551. pmid:22412384
- 39. Rowe W, Platt M, Wedge DC, Day PJ, Kell DB, Knowles J. Analysis of a complete DNA-protein affinity landscape. Journal of The Royal Society Interface. 2010;7(44):397–408.
- 40. Nowak S, Krug J. Accessibility percolation on n-trees. EPL (Europhysics Letters). 2013;101(6):66004.
- 41. Berestycki J, Brunet E, Shi Z. The number of accessible paths in the hypercube. Bernoulli. 2016;22(2):653–680.
- 42. Guenther UP, Yandek LE, Niland CN, Campbell FE, Anderson D, Anderson VE, et al. Hidden specificity in an apparently nonspecific RNA-binding protein. Nature. 2013;502(7471):385–388. pmid:24056935
- 43. Saunders R, Deane CM. Synonymous codon usage influences the local protein structure observed. Nucleic Acids Research. 2010;38(19):6719–6728. pmid:20530529
- 44. Novoa EM, de Pouplana LR. Speeding with control: codon usage, tRNAs, and ribosomes. Trends in Genetics. 2012;28(11):574–581. pmid:22921354
- 45. O’Brien EP, Vendruscolo M, Dobson CM. Kinetic modelling indicates that fast-translating codons can coordinate cotranslational protein folding by avoiding misfolded intermediates. Nature Communications. 2014; 5.
- 46. Weinreich DM, Watson RA, Chao L. Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution. 2005;59(6):1165–1174. pmid:16050094
- 47. Poelwijk FJ, Tănase-Nicola S, Kiviet DJ, Tans SJ. Reciprocal sign epistasis is a necessary condition for multi-peaked fitness landscapes. Journal of Theoretical Biology. 2011;272(1):141–144. pmid:21167837
- 48. Dawid A, Kiviet DJ, Kogenaru M, de Vos M, Tans SJ. Multiple peaks and reciprocal sign epistasis in an empirically determined genotype-phenotype landscape. Chaos. 2010;20(2). pmid:20590334
- 49. Wu NC, Dai L, Olson CA, Lloyd-Smith JO, Sun R. Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife. 2016;5:e16965. pmid:27391790
- 50. Puchta O, Cseke B, Czaja H, Tollervey D, Sanguinetti G, Kudla G. Network of epistatic interactions within a yeast snoRNA. Science. 2016 pmid:27080103
- 51. Sarkisyan KS, Bolotin DA, Meer MV, Usmanova DR, Mishin AS, Sharonov GV, et al. Local fitness landscape of the green fluorescent protein. Nature. 2016;533(7603):397–401. pmid:27193686
- 52. Orr HA. The genetic theory of adaptation: a brief history. Nature Reviews Genetics. 2005;6(2):119–127. pmid:15716908
- 53. Lozovsky ER, Chookajorn T, Brown KM, Imwong M, Shaw PJ, Kamchonwongpaisan S, et al. Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proceedings of the National Academy of Sciences. 2009;106(29):12025–12030.
- 54. Roy SW. Probing evolutionary repeatability: Neutral and double changes and the predictability of evolutionary adaptation. PLoS ONE. 2009;4(2):1–6.
- 55. Szendro IG, Franke J, de Visser JAGM, Krug J. Predictability of evolution depends nonmonotonically on population size. Proceedings of the National Academy of Sciences. 2013;110(2):571–576.
- 56. Price TD, Qvarnström A, Irwin DE. The role of phenotypic plasticity in driving genetic evolution. Proceedings of the Royal Society of London B: Biological Sciences. 2003;270(1523):1433–1440.
- 57. Feinberg AP. Phenotypic plasticity and the epigenetics of human disease. Nature. 2007;447(7143):433–440. pmid:17522677
- 58. Koonin EV, Wolf YI. Constraints and plasticity in genome and molecular-phenome evolution. Nature Reviews Genetics. 2010;11(7):487–498. pmid:20548290
- 59. de Vos MGJ, Dawid A, Sunderlikova V, Tans SJ. Breaking evolutionary constraint with a tradeoff ratchet. Proceedings of the National Academy of Sciences. 2015;112(48):14906–14911.
- 60. de Vos MGJ, Poelwijk FJ, Battich N, Ndika JDT, Tans SJ. Environmental dependence of genetic constraint. PLoS Genetics. 2013;9(6):1–8.
- 61.
Stiffler MA, Hekstra DR, Ranganathan R. Evolvability as a function of purifying selection in TEM-1
*β*-lactamase. Cell. 2015;160(5):882–892. pmid:25723163 - 62. Kashtan N, Noor E, Alon U. Varying environments can speed up evolution. Proceedings of the National Academy of Sciences. 2007;104(34):13711–13716.
- 63. Mustonen V, Lässig M. Molecular evolution under fitness fluctuations. Physical Review Letters. 2008;100:108101. pmid:18352233
- 64. Bitbol AF, Schwab DJ. Quantifying the role of population subdivision in evolution on rugged fitness landscapes. PLoS Computational Biology. 2014;10(8):1–15.
- 65. Zhang Q, Lambert G, Liao D, Kim H, Robin K, Tung Ck, et al. Acceleration of emergence of bacterial antibiotic resistance in connected microenvironments. Science. 2011;333(6050):1764–1767. pmid:21940899
- 66. Greulich P, Waclaw B, Allen RJ. Mutational pathway determines whether drug gradients accelerate evolution of drug-resistant cells. Physical Review Letters. 2012;109(8). pmid:23002776