The Time Scale of Evolutionary Innovation

A fundamental question in biology is the following: what is the time scale that is needed for evolutionary innovations? There are many results that characterize single steps in terms of the fixation time of new mutants arising in populations of certain size and structure. But here we ask a different question, which is concerned with the much longer time scale of evolutionary trajectories: how long does it take for a population exploring a fitness landscape to find target sequences that encode new biological functions? Our key variable is the length, $L$, of the genetic sequence that undergoes adaptation. In computer science there is a crucial distinction between problems that require algorithms which take polynomial or exponential time. The latter are considered to be intractable. Here we develop a theoretical approach that allows us to estimate the time of evolution as a function of $L$. We show that adaptation on many fitness landscapes takes time that is exponential in $L$, even if there are broad selection gradients and many targets uniformly distributed in sequence space. These negative results lead us to search for specific mechanisms that allow evolution to work on polynomial time scales. We study a regeneration process and show that it enables evolution to work in polynomial time.


Bounds on hitting times of Markov chains on a line
In this section we will present our basic lower and upper bounds on hitting times of Markov chains on a line. The results of this section will be used repeatedly in the later sections to provide lower and upper bounds on the discovery time for several evolutionary processes. We start with the definition of Markov chains, and then define the special case of Markov chains on a line.
Definition S1 (Markov chains). A finite-state Markov chain $MC_L = (S, \delta)$ consists of a finite set $S$ of states, with $S = \{0, 1, \ldots, L\}$ (i.e., the set of states is a finite subset of the natural numbers starting from 0), and a stochastic transition matrix $\delta$ that specifies the transition probabilities, i.e., $\delta(i, j)$ denotes the probability of transition from state $i$ to state $j$ (in other words, for all $0 \le i, j \le L$ we have $0 \le \delta(i, j) \le 1$, and for all $0 \le i \le L$ we have $\sum_{j=0}^{L} \delta(i, j) = 1$).
We now introduce Markov chains on a line. Intuitively, a Markov chain on a line is a special case of a Markov chain in which, in every state, the only allowed transitions are self-loops, transitions to the left, and transitions to the right. The formal definition is as follows.
Definition S2 (Markov chains on a line). A Markov chain on a line, denoted $M_L$, is a finite-state Markov chain $(S, \delta)$ with $S = \{0, 1, \ldots, L\}$ such that for all $0 \le i, j \le L$, if $\delta(i, j) > 0$ then $|i - j| \le 1$.
We now define the notion of hitting times for Markov chains on a line.
Definition S3 (Hitting time). Given a Markov chain on a line $M_L$, and two states $n_1$ and $n_2$ (i.e., $0 \le n_1, n_2 \le L$), we denote by $H(n_1, n_2)$ the expected hitting time from the starting state $n_2$ to the target state $n_1$, i.e., the expected number of transitions required to reach the target state $n_1$ starting from the state $n_2$.
The recurrence relation for hitting time. Given a Markov chain on a line $M_L = (S, \delta)$, and a state $n_1$ (i.e., $0 \le n_1 \le L$), the following recurrence relation holds:
1. $H(n_1, n_1) = 0$,
2. $H(n_1, i) = 1 + \delta(i, i+1) \cdot H(n_1, i+1) + \delta(i, i-1) \cdot H(n_1, i-1) + \delta(i, i) \cdot H(n_1, i)$, for all $n_1 < i < L$,
3. $H(n_1, L) = 1 + \delta(L, L-1) \cdot H(n_1, L-1) + \delta(L, L) \cdot H(n_1, L)$.
The argument is as follows:
(a) Case 1 is trivial.
(b) For case 2, since $i \ne n_1$, at least one transition needs to be taken to a neighbor state $j$ of $i$, from which the hitting time is $H(n_1, j)$. With probability $\delta(i, i+1)$ the neighbor $j$ is state $i+1$, while with probability $\delta(i, i-1)$ the neighbor $j$ is state $i-1$. On the other hand, with probability $\delta(i, i)$ the self-loop transition is taken, and the expected hitting time remains the same.
(c) Case 3 is a degenerate version of case 2, where the only possible transitions from the state $L$ are either to the state $L-1$, which is taken with probability $\delta(L, L-1)$, or the self-loop, which is taken with probability $\delta(L, L)$. Also note that in case 3 we have $\delta(L, L-1) = 1 - \delta(L, L)$.
In the following lemma we show that, using the recurrence relation, the hitting time can be expressed as the sum of a sequence of numbers.
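Lemma S1. Consider a Markov chain on a line $M_L$, with a target state $n_1$, such that for all $n_1 < i \le L$ we have $\delta(i, i-1) > 0$. For all $n_1 < i \le L$ we have that $H(n_1, i) = \sum_{j=L-i}^{L-n_1-1} b_j$, where $b_j$ is the sequence defined as:
$$b_0 = \frac{1}{\delta(L, L-1)}, \qquad b_j = \frac{1 + \delta(L-j, L-j+1) \cdot b_{j-1}}{\delta(L-j, L-j-1)} \quad \text{for } j > 0.$$
Proof. We consider the recurrence relation for the hitting time and first show that for all $0 \le i < L - n_1$ we can write $H(n_1, L-i) = b_i + H(n_1, L-i-1)$. Summing these equalities from state $n_1 + 1$ up to state $i$, together with $H(n_1, n_1) = 0$, yields the claimed expression.
Exponential lower bound. We will use the following standard convention in this paper: a function $t(L)$ is lower bounded by an exponential function if there exist constants $c > 1$, $\ell > 0$ and $L_0 \in \mathbb{N}$ such that for all $L \ge L_0$ we have $t(L) \ge c^{\ell \cdot L} = 2^{c^* \cdot \ell \cdot L}$, where $c^* = \log_2 c > 0$, i.e., it is lower bounded by a linear function in the exponent.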
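The recurrence and Lemma S1 translate directly into a short computation. The following Python sketch is an illustration and not part of the original analysis. It assumes the sequence $b_0 = 1/\delta(L, L-1)$, $b_j = (1 + \delta(L-j, L-j+1) \cdot b_{j-1})/\delta(L-j, L-j-1)$, which is the sequence forced by the recurrence relation; the names `hitting_times`, `left` and `right` are hypothetical.

```python
from fractions import Fraction


def hitting_times(left, right, n1):
    """Return {i: H(n1, i)} for n1 <= i <= L on a Markov chain on a line.

    left[i]  = delta(i, i-1) and right[i] = delta(i, i+1) for i = 0..L.
    Self-loop probabilities are implicit: 1 - left[i] - right[i].
    """
    L = len(left) - 1
    # Assumed sequence: b_0 = 1/delta(L, L-1),
    # b_j = (1 + delta(L-j, L-j+1) * b_{j-1}) / delta(L-j, L-j-1).
    b = [1 / left[L]]
    for j in range(1, L - n1):
        i = L - j
        b.append((1 + right[i] * b[j - 1]) / left[i])
    # H(n1, i) is the sum of b_j for j = L-i .. L-n1-1.
    H = {n1: Fraction(0)}
    for i in range(n1 + 1, L + 1):
        H[i] = sum(b[L - i:L - n1], Fraction(0))
    return H


# Example: symmetric walk without self-loops, reflecting at state L.
L = 6
half = Fraction(1, 2)
left = [Fraction(0)] + [half] * (L - 1) + [Fraction(1)]
right = [Fraction(1)] + [half] * (L - 1) + [Fraction(0)]
H = hitting_times(left, right, 0)
```

For this symmetric example the computed values agree with the closed form $H(0, n) = n \cdot (2L - n)$, so the hitting time from state $L$ is $L^2$.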
Exponential lower bound on hitting times for Markov chains on a line. In the following lemma we will show an exponential lower bound on the hitting time. We consider a Markov chain on a line $M_L$, such that there exist two states $x$ and $y = x + k$, for $k > 0$, such that in the whole contiguous segment between $x$ and $y$ the ratio of the probability to drift towards the right as compared to the left is at least $1 + A$, for a constant $A > 0$ (i.e., the ratio is strictly bounded away from 1). Then the expected hitting time from any starting point right of $x$ to a target to the left of $x$ is at least $(1 + A)^{k-1}$.
Lemma S3 (Lower bound). Consider a Markov chain on a line $M_L$. If there exist two states $x, y \le L$ with $y = x + k$, for $k > 0$, and a constant $A > 0$ such that for all $x < i < y$ we have $\frac{\delta(i, i+1)}{\delta(i, i-1)} \ge 1 + A$, then for all $n_1, n_2 \le L$ such that $n_1 \le x < n_2$ we have $H(n_1, n_2) \ge (1 + A)^{k-1}$.
Proof. From Lemma S1 we have that $H(n_1, n_2) = \sum_{j=L-n_2}^{L-n_1-1} b_j$. We have $\frac{\delta(i, i+1)}{\delta(i, i-1)} \ge 1 + A$ for all $x < i < y$ by the given condition of the lemma. We show by induction that for all $j$ between $L - y$ and $L - x - 1$ (i.e., $L - y \le j \le L - x - 1$) we have $b_j \ge a_{j-L+y}(1 + A, 1)$.
2. (Inductive case). By inductive hypothesis on $j - 1$ we have $b_{j-1} \ge a_{j-1-L+y}(1 + A, 1)$, and then we have
Lemma S4 (Upper bound). Given a Markov chain on a line $M_L$ and $0 \le n_1 < n_2 \le L$, if for all
Proof. From Lemma S1 we have that $H(n_1, n_2) = \sum_{j=L-n_2}^{L-n_1-1} b_j$.

It follows that for all
Markov chains on a line without self-loops. A special case of the above lemma is obtained for Markov chains on a line with no self-loops in states other than state 0, i.e., for all $0 < i \le L$ we have $1 - \delta(i, i) = 1 = B^*$. We consider a Markov chain on a line without self-loops $M_L$, such that there exist two states $x$ and $y = x + k$, for $k > 0$, such that in the whole contiguous segment between $x$ and $y$ the probability to drift towards the right is at least a constant $A > \frac{1}{2}$ (strictly bounded away from $\frac{1}{2}$). We also assume $A < 1$, since otherwise transitions to the left are never taken. Then the expected hitting time from any starting point right of $x$ to a target to the left of $x$ is at least $c_A^{k-1}$, where $c_A = \frac{A}{1-A} > 1$ (see Supplementary Figure 2).
Corollary S1. Given a Markov chain on a line $M_L$ such that for all $0 < i \le L$ we have $\delta(i, i) = 0$, the following assertions hold:
1. Lower bound: If there exist two states $x, y \le L$ with $y = x + k$, for $k > 0$, and a constant $A > \frac{1}{2}$ such that for all $x \le i < y$ we have $\delta(i, i+1) \ge A > \frac{1}{2}$, then for all $n_1, n_2 \le L$ such that $n_1 \le x < n_2$ we have $H(n_1, n_2) \ge c_A^{k-1}$ for $c_A = \frac{A}{1-A} > 1$.
2. Upper bound: For $0 \le n_1 < n_2 \le L$, if for all $n_1 < i < L$ we have $\delta(i, i-1) \ge \frac{1}{2}$, then $H(n_1, n_2) = O(L^2)$.
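The two regimes of Corollary S1 can be checked numerically on small chains. The sketch below is illustrative only: the helper names `hitting_time` and `chain`, the concrete parameters ($L = 30$, $x = 5$, $y = 15$, $A = 0.6$) and the leftward drift chosen outside the segment are assumptions made for the example. The hitting time is computed with the sequence $b_j$ implied by the recurrence relation.

```python
def hitting_time(left, right, n1, n2):
    """Expected hitting time H(n1, n2) on a line, via the sequence b_j."""
    L = len(left) - 1
    b = [1.0 / left[L]]
    for j in range(1, L - n1):
        i = L - j
        b.append((1.0 + right[i] * b[-1]) / left[i])
    return sum(b[L - n2:L - n1])


def chain(L, p_right):
    """Chain without self-loops; p_right(i) is delta(i, i+1) for 0 < i < L."""
    right = [1.0] + [p_right(i) for i in range(1, L)] + [0.0]
    left = [1.0 - r for r in right]
    return left, right


# Lower bound: rightward drift of at least A > 1/2 on the segment x <= i < y.
L, x, y, A = 30, 5, 15, 0.6
left, right = chain(L, lambda i: A if x <= i < y else 0.3)
lower = (A / (1 - A)) ** (y - x - 1)  # c_A^(k-1) with k = y - x
best_case = min(hitting_time(left, right, x, n2) for n2 in range(x + 1, L + 1))

# Upper bound: leftward probability 1/2 everywhere gives quadratic time.
sym_left, sym_right = chain(L, lambda i: 0.5)
sym_time = hitting_time(sym_left, sym_right, 0, L)
```

Even the most favorable starting state in the first chain needs at least $c_A^{k-1}$ expected steps, while the symmetric chain reaches state 0 from state $L$ in $L^2$ expected steps.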
Unloop variant of Markov chains on a line. We will now show how given a Markov chain on a line with self-loops we can create a variant without self-loops and establish a relation on the hitting time of the original Markov chain and its variant without self-loops.
Definition S5 (Unloop variant of Markov chain on a line). Given a Markov chain on a line $M_L = (S, \delta)$, we call its unloop variant a Markov chain on a line $\overline{M}_L = (S, \overline{\delta})$, with the following properties:
Supplementary Figure 2: Lower and upper bound on hitting times for Markov chain on a line. Figure (a) shows a Markov chain on a line without self-loops, where for a length $k$ between $x$ and $y$ the transition probabilities to the right are at least a constant $A > \frac{1}{2}$, and then the hitting time from any starting point $n_2$ to the right of $x$ to a target $n_1$ to the left of $x$ is at least exponential in the length $k$; figure (b) shows a Markov chain on a line without self-loops where all the transition probabilities to the left up to the target $n_1$ are at least $\frac{1}{2}$, and then the hitting time for any start point to the right of the target $n_1$ to the target is at most $O(L^2)$; the graph (c) shows the exponential lower bound (red) and polynomial upper bound (green) on the hitting times $H(n_1, n_2)$ in the log-scale.
We now show the following: (1) the hitting time of the original Markov chain on a line $M_L$ is always at least the hitting time of the unloop variant; and (2) the hitting time of the original Markov chain is at most $z^*$ times the hitting time of the unloop variant, where $z^*$ is the maximum over the states $i$ of $\frac{1}{1 - \delta(i, i)}$, i.e., of the inverse of one minus the self-loop transition probability.
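Lemma S5. Consider a Markov chain on a line $M_L = (S, \delta)$ and its unloop variant $\overline{M}_L = (S, \overline{\delta})$. Let $0 < n_1, n_2 \le L$ and $n_1 < n_2$, and let $H(n_1, n_2)$ denote the hitting time to state $n_1$ from state $n_2$ in $M_L$, and $\overline{H}(n_1, n_2)$ denote the corresponding hitting time in $\overline{M}_L$. The following assertions hold:
(i) $\overline{H}(n_1, n_2) \le H(n_1, n_2)$.
(ii) $H(n_1, n_2) \le z^* \cdot \overline{H}(n_1, n_2)$, where $z^* = \max_{0 < i \le L} \frac{1}{1 - \delta(i, i)}$.
Proof. From Lemma S1 we have that for all $0 < i \le L$, we can write $H(n_1, i) = \sum_{j=L-i}^{L-n_1-1} b_j$ and $\overline{H}(n_1, i) = \sum_{j=L-i}^{L-n_1-1} \overline{b}_j$, where $\overline{b}_j$ is the sequence of Lemma S1 for $\overline{M}_L$.
(i) We prove inductively that for all $0 < j < L - 1$, we have $\overline{b}_j \le b_j$.
1. (Base case). $\overline{b}_0 = 1 \le \frac{1}{\delta(L, L-1)} = b_0$.
Thus for all such $j$, we have $\overline{b}_j \le b_j$, and $\overline{H}(n_1, n_2) = \sum_{j=L-n_2}^{L-n_1-1} \overline{b}_j \le \sum_{j=L-n_2}^{L-n_1-1} b_j = H(n_1, n_2)$.
(ii) We prove inductively that for all $0 < j < L - 1$, we have $b_j \le z^* \cdot \overline{b}_j$.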
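The two inequalities of Lemma S5 can be illustrated numerically. The sketch below rests on an assumption that should be read with care: the properties of the unloop variant are not spelled out above, so the code assumes that the unloop variant removes each self-loop and renormalizes the remaining left and right probabilities by $1 - \delta(i, i)$. The names `hitting_time` and `unloop`, and the example chain, are hypothetical.

```python
def hitting_time(left, right, n1, n2):
    """Expected hitting time H(n1, n2) on a line, via the sequence b_j."""
    L = len(left) - 1
    b = [1.0 / left[L]]
    for j in range(1, L - n1):
        i = L - j
        b.append((1.0 + right[i] * b[-1]) / left[i])
    return sum(b[L - n2:L - n1])


def unloop(left, right):
    """Assumed construction: drop self-loops and renormalize the other moves."""
    new_left, new_right = [], []
    for l, r in zip(left, right):
        total = l + r  # equals 1 - delta(i, i)
        new_left.append(l / total)
        new_right.append(r / total)
    return new_left, new_right


# Example chain with state-dependent self-loops (probabilities are arbitrary).
L, n1 = 12, 2
left = [0.0] + [0.2 + 0.02 * i for i in range(1, L)] + [0.5]
right = [0.5] + [0.3] * (L - 1) + [0.0]

u_left, u_right = unloop(left, right)
H = hitting_time(left, right, n1, L)
H_unloop = hitting_time(u_left, u_right, n1, L)
# z* is the largest value of 1 / (1 - delta(i, i)) over the states 0 < i <= L.
z_star = max(1.0 / (left[i] + right[i]) for i in range(1, L + 1))
```

Under this assumed construction the original chain makes the same jumps as the unloop variant but waits longer in each state, which is why `H_unloop <= H <= z_star * H_unloop` holds in the example.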

Evolutionary Process
In this section we consider a simple model of an evolutionary process, where organisms/genotypes are represented as strings of length $L$, and view evolution as a discrete-time process. For simplicity, we first consider the case of bit strings and present all our results with bit strings, because all the key proof ideas are illustrated there. We then generalize our results to strings over an alphabet of any size in Section 7. For a bit string $s$, at any time point a random mutation can appear with probability $u$, which inverts a single bit of the string $s$. Such mutations can be viewed as transitions between genotypes, which form a random walk in the $L$-dimensional genotypic space of all $2^L$ strings.
Notations. For $L \in \mathbb{N}$, we denote by $B(L)$ the set of all $L$-bit strings. Given a string $s \in B(L)$, the neighborhood $Nh(s)$ of $s$ is the set of strings that differ from $s$ in exactly one bit, i.e., $Nh(s) = \{s' \in B(L) : s, s' \text{ differ in exactly one position}\}$. In order to model natural selection, we consider a constant selection intensity $\beta \in \mathbb{R}$, and each string $s$ is associated with a fitness according to a fitness function $f(s) \in \mathbb{R}$. The selection intensity and the fitness function determine the transition probabilities between $s$ and its neighbors.
Transition probability between strings. Given a string $s$ and $s' \in Nh(s)$, the transition probability $\Delta(s, s')$ from $s$ to $s'$ depends (i) on the fitness of $s$ and the fitness of the neighbors in $Nh(s)$, and (ii) on the selection intensity. For all $s' \in Nh(s)$, let $df(s, s') = f(s') - f(s)$ denote the difference in fitness of $s'$ and $s$, and let $g(s, s') = \frac{1}{1 + e^{-\beta \cdot df(s, s')}}$. Then the transition probability is defined as follows:
$$\Delta(s, s') = u \cdot \frac{g(s, s')}{\sum_{s'' \in Nh(s)} g(s, s'')} \qquad (1)$$
The intuitive description of the transition probability (which is referred to as the Fermi process) is as follows: the term $u$ represents the probability of a mutation occurring in $s$, while the choice of the neighbor $s'$ is based on a normalized weighted sum, with each sigmoid term $\frac{1}{1 + e^{-\beta \cdot df(s, s')}}$ being determined by the fitness difference between $s$ and $s'$ and by the selection intensity. The selection intensity acts like a temperature parameter. High values of the selection intensity favor transitions to neighbors that have higher fitness, while setting $\beta = 0$ makes all possible transitions equally probable and independent of the fitness landscape (we refer to this case as neutral selection).
Discovery time. Given a string space $B(L)$, a fitness function $f$ and a selection intensity $\beta$, for two strings $s_1, s_2 \in B(L)$, we denote by $T(s_1, s_2, f, \beta)$ the expected discovery time of the target string $s_1$ from the starting string $s_2$, i.e., the average number of steps necessary to transform $s_2$ to $s_1$ under the fitness landscape $f$ and selection intensity $\beta$. Given a start string $s_2$ and a target set $U$ of strings, we denote by $T(U, s_2, f, \beta)$ the expected discovery time of the target set $U$ starting from the string $s_2$, i.e., the average number of steps necessary to transform $s_2$ to some string in $U$. In the following sections we present several lower and upper bounds on the discovery times depending on the fitness function and selection intensity.
Moran evolutionary process. The evolutionary process we described is the Fermi process, where the transition probabilities are chosen according to the Fermi function and the fitness difference. We first present lower and upper bounds for the Fermi evolutionary process, because the proofs are mathematically more elegant, and then argue how the bounds are easily transferred to the Moran evolutionary process.
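The transition rule of the Fermi process can be made concrete for bit strings. The following Python sketch is an illustration under stated assumptions: it takes the transition probability to be $u$ times the sigmoid weight $g(s, s')$ normalized over the neighborhood, as described in the text; the function name `fermi_transitions` and the example fitness function (minus the number of ones) are hypothetical.

```python
import math


def fermi_transitions(s, fitness, beta, u):
    """Return {s': Delta(s, s')} for all one-bit neighbors s' of the tuple s."""
    L = len(s)
    neighbors = [s[:k] + (1 - s[k],) + s[k + 1:] for k in range(L)]
    # Sigmoid weight g(s, s') = 1 / (1 + exp(-beta * (f(s') - f(s)))).
    weights = [
        1.0 / (1.0 + math.exp(-beta * (fitness(t) - fitness(s))))
        for t in neighbors
    ]
    total = sum(weights)
    # Assumed form of Eqn 1: mutation probability times the normalized weight.
    return {t: u * w / total for t, w in zip(neighbors, weights)}


def fewer_ones_is_fitter(t):
    """Example fitness function: minus the number of ones."""
    return -sum(t)


s = (1, 0, 1, 0)
neutral = fermi_transitions(s, fewer_ones_is_fitter, beta=0.0, u=0.1)
selected = fermi_transitions(s, fewer_ones_is_fitter, beta=1.0, u=0.1)
```

With $\beta = 0$ every neighbor receives probability $u/L$. With $\beta = 1$ and this example fitness function, a neighbor with one fewer one is favored over a neighbor with one more one by a factor of $e^{\beta}$.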

Neutral Selection
In this section we consider the case of neutral selection, and hence the transition probabilities are independent of the fitness function. Since $\beta = 0$, for all strings $s$ the transition probability equation (Eqn 1) simplifies to $\Delta(s, s') = \frac{u}{L}$ for all $s' \in Nh(s)$. We will present an exponential lower bound on the discovery time of a set of targets concentrated around the all-zeros sequence $\mathbf{0}$, and we will refer to this case as broad peak. For a constant $0 < c < 1$, let $U_c^L$ denote the set of all strings such that at most $cL$ bits are ones (i.e., at least $(1 - c) \cdot L$ bits are zeros). In other words, $U_c^L$ is the set of strings that have Hamming distance at most $cL$ to $\mathbf{0}$. We consider the set $U_c^L$ as the target set. Because there is neutral selection the fitness landscape is immaterial, and for the rest of this section we will drop the last two arguments of $T(\cdot, \cdot, f, \beta)$, since $\beta = 0$ and the discovery time is independent of $f$.
We construct an equivalent Markov chain on a line, denoted $M_{L,0} = (S, \delta_0)$, in which state $i$ represents the strings with exactly $i$ ones. Each state has a self-loop with probability $(1 - u)$, and we ignore the self-loop probabilities (i.e., set $u = 1$) because by Lemma S5 all lower bounds on the hitting time for the unloop variant are valid for the original Markov chain; and all upper bounds on the hitting time for the unloop variant need to be multiplied by $\frac{1}{u}$ to obtain the upper bounds on the hitting time for the original Markov chain. In other words, we will consider the following transition probabilities: $\delta_0(i, i+1) = \frac{L-i}{L}$ and $\delta_0(i, i-1) = \frac{i}{L}$ for all $0 \le i \le L$.
Theorem S1. For all constants $c < \frac{1}{2}$, for all string spaces $B(L)$ with $L \ge \frac{4}{1 - 2c}$, and for all $s \in B(L) \setminus U_c^L$, we have $T(U_c^L, s) \ge c_A^{\ell \cdot L - 1}$, where $A = \frac{3 - 2c}{4}$, $c_A = \frac{A}{1-A} > 1$ and $\ell = \frac{1 - 2c}{4}$.
Proof. We consider the Markov chain $M_{L,0}$ for $L \ge \frac{4}{1 - 2c}$ and let us consider the midpoint $i$ between $cL$ and $\frac{1}{2} \cdot L$, i.e., $i = \frac{1 + 2c}{4} \cdot L$. Such a midpoint exists since $L \ge \frac{4}{1 - 2c}$. Then for all $j$ such that $cL \le j \le i$ we have
$$\delta_0(j, j+1) = \frac{L - j}{L} \ge \frac{L - i}{L} = \frac{3 - 2c}{4} = A > \frac{1}{2}.$$
The first inequality holds since $j \le i$, while the second inequality is due to $c < \frac{1}{2}$. We now use Corollary S1 (item 1) for $M_{L,0}$ with $n_1 = x = cL$, $y = i$, and $k = \left(\frac{1 + 2c}{4} - c\right) \cdot L = \ell \cdot L$ and vary $n_2$ from $x + 1$ to $L$ to obtain that $H(n_1, n_2) \ge c_A^{\ell \cdot L - 1}$, and hence for all $s \in B(L) \setminus U_c^L$ we have $T(U_c^L, s) \ge c_A^{\ell \cdot L - 1}$.
Four-letter alphabet. As in typical cases in biology the alphabet size is four (e.g., DNA, RNA), we state the analog of Theorem S1 for a four-letter alphabet. Later in this document, Theorem S4 states the general case for an arbitrary alphabet. Consider the alphabet $\{0, 1, 2, 3\}$. We can again consider a Markov chain on a line $M_{L,0}$, where its $i$-th position encodes all the strings in $B(L)$ which differ from the target string $t$ in exactly $i$ positions. We consider a string $s$ that corresponds to the $i$-th state of $M_{L,0}$, for $0 < i < L$. Then we have the following cases:
• There are exactly i neighbors of s in state i − 1, since in each position among the i positions that s does not agree with t, there is exactly one mutation that will make s and t match in that position.
• There are exactly 3 · (L − i) neighbors of s in state i + 1, since in each position among the L − i positions in which s agrees with t, there are 3 mutations that will make s not agree with t in that position.
• There are exactly 2 · i neighbors of s in state i, since in each position j among the i positions that s does not agree with t, there are 2 mutations that will preserve this disagreement.
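The three neighbor counts above can be confirmed by brute force on a short string. The sketch below is only an illustration; the function name `neighbor_counts` and the example strings are hypothetical.

```python
def neighbor_counts(s, t, alphabet):
    """Count one-letter neighbors of s by their change in distance to t.

    Returns a dict with keys -1 (closer to t), 0 (same distance), +1 (farther).
    """
    def distance(a):
        return sum(x != y for x, y in zip(a, t))

    i = distance(s)
    counts = {-1: 0, 0: 0, 1: 0}
    for pos in range(len(s)):
        for letter in alphabet:
            if letter != s[pos]:
                neighbor = s[:pos] + (letter,) + s[pos + 1:]
                counts[distance(neighbor) - i] += 1
    return counts


# Example: L = 5, target t = 00000, and s differs from t in i = 2 positions.
t = (0, 0, 0, 0, 0)
s = (1, 2, 0, 0, 0)
counts = neighbor_counts(s, t, (0, 1, 2, 3))
```

For this example the counts are $i = 2$ neighbors closer to $t$, $2 \cdot i = 4$ neighbors at the same distance, and $3 \cdot (L - i) = 9$ neighbors farther away, for a total of $3L = 15$ neighbors.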
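Based on the above analysis and Equation 1, the following holds for the transition probabilities of $M_{L,0}$ (again ignoring the self-loops of probability $1 - u$): $\delta_0(i, i+1) = \frac{3 \cdot (L - i)}{3L} = \frac{L - i}{L}$, $\delta_0(i, i-1) = \frac{i}{3L}$, and $\delta_0(i, i) = \frac{2i}{3L}$.
Proof. We prove each step separately.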
1. We consider the Markov chain $M_{L,0}$ for $L \ge L_0 = \frac{8}{3 - 4c}$. Consider the midpoint $i$ between $cL$ and $\frac{3L}{4}$, i.e., $i = \frac{3 + 4c}{8} \cdot L$ (such a midpoint exists because $L \ge L_0$ and the choice of $c$). For all $cL < j \le i$ we have:
$$\frac{\delta_0(j, j+1)}{\delta_0(j, j-1)} = \frac{3 \cdot (L - j)}{j} \ge \frac{3 \cdot (L - i)}{i} = \frac{3 \cdot (5 - 4c)}{3 + 4c} = A > 1,$$
where the final inequality holds since $c < \frac{3}{4}$. We now use Lemma S3 for $M_{L,0}$ with $n_1 = x = cL$, $y = i$, and $k = \frac{3 - 4c}{8} \cdot L = \ell \cdot L$ and vary $n_2$ from $x + 1$ to $L$ to obtain that $H(n_1, n_2) \ge A^{\ell \cdot L - 1}$, and hence for all $s \in B(L) \setminus U_c^L$ the discovery time is at least $A^{\ell \cdot L - 1}$.
2. We consider the Markov chain $M_{L,0}$. For every $cL < j < L$ we have:
$$\frac{\delta_0(j, j-1)}{\delta_0(j, j+1)} = \frac{j}{3 \cdot (L - j)} \ge \frac{c}{3 \cdot (1 - c)} \ge 1,$$
where the final inequality holds when $c \ge \frac{3}{4}$. Thus for all $cL < j < L$ we have $\delta_0(j, j-1) \ge \delta_0(j, j+1)$, and $\delta_0(j, j) \le \frac{2}{3}$. Then, by Lemma S4 we have that $H(cL, n_2) = O(L^2)$ for all $n_2 > cL$.
The figure shows that when the target set is $U_c^L$ of strings that have at most $c \cdot L$ ones (blue in (a)), for $c < \frac{1}{2}$, for a region of length $\ell \cdot L - 1$, which is from $c \cdot L$ to the mid-point between $cL$ and $\frac{L}{2}$, the transition probability to the right is at least a constant $A > \frac{1}{2}$, and this contributes to the exponential hitting time to the target set. Figure (b) shows the comparison of the exponential time for multiple targets and single target under neutral selection.
Figure 4: Constant selection with broad peaks. The figure shows the illustration of the dichotomy theorem. The blue region represents the states that correspond to targets, while the green region depicts the states where the transition probability to the left is greater than $\frac{1}{2}$. Intuitively, a given selection intensity $\beta$ allows the process to reach the region $\frac{1}{1 + e^{\beta}} \cdot L$ in polynomial time. In figure (a), there exists a region between the blue and green, of length $\ell \cdot L$, where the probability of transitioning to the right is a constant, greater than $\frac{1}{2}$. In other words, when the blue and green regions do not overlap, in the mid-region between the blue and green regions the transition probability to the right is at least $A > \frac{1}{2}$, and hence the hitting time is exponential. When $\beta$ and $c$ are large enough so that the two regions overlap (figure (b)), then all transition probabilities to the left up to the target set are at least $\frac{1}{2}$, and hence the hitting time is polynomial.

Constant Fitness Difference Function
In this section we consider the case where the selection intensity $\beta > 0$ is positive, and the fitness function is linear. For a string $s$, let $h(s)$ denote the number of ones in $s$, i.e., the Hamming distance from the all-zeros string $\mathbf{0}$. We consider a linear fitness function $f$ such that for two strings $s$ and $s' \in Nh(s)$ we have $df(s, s') = f(s') - f(s) = -(h(s') - h(s))$, i.e., the difference in the fitness is constant and depends negatively on the Hamming distance. In other words, strings closer to $\mathbf{0}$ have greater fitness and the fitness change is linear with coefficient $-1$. We call the fitness function with constant difference the linear fitness function. Again we consider a broad peak of targets $U_c^L$, for some constant $0 < c < \frac{1}{2}$. Since we consider all strings in $U_c^L$ as the target set, it follows that for all strings $s \in B(L) \setminus U_c^L$ the difference in the Hamming distance between $s$ and $s' \in Nh(s)$ from $\mathbf{0}$ and from the target set $U_c^L$ is the same. As in the neutral case, due to the symmetry of the linear fitness function $f$, we construct an equivalent Markov chain on a line, denoted $M_{L,\beta} = (S, \delta_\beta)$, as follows: state $i$ of the Markov chain represents strings with exactly $i$ ones, and we have the following transition function:
$$\delta_\beta(i, i+1) = \frac{L - i}{(L - i) + i \cdot e^{\beta}}, \qquad \delta_\beta(i, i-1) = \frac{i \cdot e^{\beta}}{(L - i) + i \cdot e^{\beta}},$$
so that for $0 < i < L$ we have $\delta_\beta(i, i+1) = \frac{1}{1 + e^{\beta} \cdot \frac{i}{L-i}}$ and $\delta_\beta(i, i-1) = \frac{1}{1 + e^{-\beta} \cdot \frac{L-i}{i}}$ (also see the technical appendix for the derivation of the above probabilities).
Again the discovery time corresponds to the hitting time in the Markov chain $M_{L,\beta}$. Note that again we have ignored the self-loops of probability $(1 - u)$, and by Lemma S5 all lower bounds on the hitting time for the unloop variant are valid for the original Markov chain; and all upper bounds on the hitting time for the unloop variant need to be multiplied by $\frac{1}{u}$ to obtain upper bounds on the hitting time for the original Markov chain.
We will present a dichotomy result: the first result shows that if $c \cdot (1 + e^{\beta}) < 1$, for selection intensity $\beta > 0$, then the discovery time is exponential, while the second result shows that if $c \cdot (1 + e^{\beta}) \ge 1$, then the discovery time is polynomial. We first present the two lemmas.
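Proof. We consider the Markov chain $M_{L,\beta}$ for $L \ge L_0 = \frac{2v}{1 - c \cdot v}$, where $v = 1 + e^{\beta}$. Consider the midpoint $i$ between $cL$ and $\frac{L}{v}$, i.e., $i = \frac{1 + c \cdot v}{2v} \cdot L$ (such a midpoint exists because $L \ge L_0$ and the choice of $c$). For all $cL < j \le i$ we have:
$$\delta_\beta(j, j+1) = \frac{1}{1 + e^{\beta} \cdot \frac{j}{L-j}} \ge \frac{1}{1 + e^{\beta} \cdot \frac{i}{L-i}} = \frac{1}{1 + (v - 1) \cdot \frac{1 + c v}{2v - 1 - c v}} = \frac{1}{2} + \frac{v \cdot (1 - c v)}{2 \cdot (c v^2 - 2 c v + 3v - 2)} = A > \frac{1}{2}.$$
The first inequality holds as $\frac{j}{L-j} \le \frac{i}{L-i}$ since $j \le i$; the second equality is obtained since $(v - 1) = e^{\beta}$ and by substituting $i$ with its value $\frac{1 + c \cdot v}{2v} \cdot L$; the remaining equalities are simple calculation; and the final inequality holds because $c \cdot v < 1$ and $v > 2$, which establishes that the term added to $\frac{1}{2}$ in $A$ is strictly positive. We now use Corollary S1 (item 1) for $M_{L,\beta}$ with $n_1 = x = cL$, $y = i$, and $k = \frac{1 - c \cdot v}{2v} \cdot L = \ell \cdot L$ and vary $n_2$ from $x + 1$ to $L$ to obtain that $H(n_1, n_2) \ge c_A^{\ell \cdot L - 1}$, and hence for all $s \in B(L) \setminus U_c^L$ the discovery time is at least $c_A^{\ell \cdot L - 1}$.
Lemma S7. For all string spaces $B(L)$, for all $c < \frac{1}{2}$ and the linear fitness function, for all selection intensities $\beta > 0$ with $c \cdot (1 + e^{\beta}) \ge 1$, the discovery time of the target set $U_c^L$ from any string $s \in B(L)$ is polynomial in $L$.
Proof. We consider the Markov chain $M_{L,\beta}$, where $\beta$ is such that we have $c \ge \frac{1}{1 + e^{\beta}}$. For every $cL < j < L$ we have:
$$\delta_\beta(j, j-1) = \frac{1}{1 + e^{-\beta} \cdot \frac{L-j}{j}} \ge \frac{1}{1 + e^{-\beta} \cdot \left(\frac{1}{c} - 1\right)} \ge \frac{1}{2}.$$
The first inequality holds because $\frac{L-j}{j} \le \frac{L - cL}{cL}$ since $cL < j$; the second inequality holds since $c \cdot (1 + e^{\beta}) \ge 1$, which implies that $1 \ge \frac{1}{e^{\beta}} \cdot \left(\frac{1}{c} - 1\right)$, and hence $1 + e^{-\beta} \cdot \left(\frac{1}{c} - 1\right) \le 2$. Thus for all $cL < j < L$ we have $\delta_\beta(j, j-1) \ge \frac{1}{2}$, and by Corollary S1 (item 2) we have that $H(cL, n_2) = O(L^2)$ for all $n_2 > cL$. The desired result follows.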
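Theorem S3. For the linear fitness function $f$, selection intensity $\beta > 0$, and constant $c \le \frac{1}{2}$, the following assertions hold:
1. If $c \cdot (1 + e^{\beta}) < 1$, then the discovery time of the target set $U_c^L$ from any string outside $U_c^L$ is exponential in $L$.
2. If $c \cdot (1 + e^{\beta}) \ge 1$, then the discovery time of the target set $U_c^L$ from any string is polynomial in $L$.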
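The dichotomy can be observed numerically by doubling $L$ on either side of the threshold $c \cdot (1 + e^{\beta}) = 1$. The following Python sketch is an illustration, not the authors' computation: it assumes the unloop transition probabilities $\delta_\beta(i, i-1) = \frac{i e^{\beta}}{(L-i) + i e^{\beta}}$ and $\delta_\beta(i, i+1) = \frac{L-i}{(L-i) + i e^{\beta}}$ implied by the Fermi rule for the linear fitness function, and the function name `max_hitting_time` and the parameter values are hypothetical.

```python
import math


def max_hitting_time(L, c, beta):
    """Hitting time H(cL, L) in the unloop chain M_{L,beta}, started at state L."""
    target = round(c * L)
    w = math.exp(beta)
    left = [i * w / (i * w + (L - i)) for i in range(L + 1)]
    right = [(L - i) / (i * w + (L - i)) for i in range(L + 1)]
    # Sequence b_j implied by the recurrence relation for hitting times.
    b = [1.0 / left[L]]
    for j in range(1, L - target):
        i = L - j
        b.append((1.0 + right[i] * b[-1]) / left[i])
    return sum(b)


c = 0.1
weak, strong = 0.1, 3.0  # c*(1+e^0.1) is about 0.21 < 1; c*(1+e^3) is about 2.1 >= 1

growth_weak = max_hitting_time(80, c, weak) / max_hitting_time(40, c, weak)
growth_strong = max_hitting_time(80, c, strong) / max_hitting_time(40, c, strong)
```

Doubling $L$ multiplies the hitting time by several orders of magnitude below the threshold, but only by a small constant factor above it, in line with the exponential and polynomial regimes.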

Moran Process Model
In the previous section we considered the constant selection intensity with the Fermi process. We now discuss how, from the results of the previous section, we can obtain similar results if we consider the Moran process for evolution.
Basic Moran process description. In a population of $N$ individuals, each individual mutates with probability $u$ in each round, so mutants arise at rate $N \cdot u$. Consider that the population is currently in state $i$ (which represents all bit strings with exactly $i$ ones): the probability that the next state is $i - 1$ is the rate at which an $(i-1)$-mutant is introduced, times the fixation probability of the mutant in the population. Formally, the transition probability matrix $\delta_M$ ($M$ for Moran process) for the Markov chain on a line under the Moran process is as follows:
$$\delta_M(i, i-1) = N \cdot u \cdot \frac{i}{L} \cdot \rho_{i,i-1}, \qquad \delta_M(i, i+1) = N \cdot u \cdot \frac{L-i}{L} \cdot \rho_{i,i+1}, \qquad \delta_M(i, i) = 1 - \delta_M(i, i-1) - \delta_M(i, i+1).$$
We assume that $N \cdot u < 1$, and $\rho_{i,j}$ is the fixation probability of a $j$-mutant in a population of $N - 1$ individuals of type $i$. In particular, for $f_i \ne f_j$,
$$\rho_{i,j} = \frac{1 - \frac{f_i}{f_j}}{1 - \left(\frac{f_i}{f_j}\right)^{N}},$$
and $\rho_{i,j} \in (0, 1)$ for positive fitness $f_i$ and $f_j$, where $f_i$ (resp. $f_j$) denotes the fitness of strings with exactly $i$ (resp. $j$) ones. We first show a bound for the self-loop probabilities $\delta_M(i, i)$: since strings closer to the target have a greater fitness value we have $f_{i-1} \ge f_i$; and hence the probability of fixation of an $(i-1)$-mutant in a population of type $i$ is at least $\frac{1}{N}$. Thus we have $\delta_M(i, i) \le 1 - \delta_M(i, i-1) \le 1 - \frac{u \cdot i}{L}$. We now consider the case of a multiplicative fitness function.
Multiplicative fitness rates. We consider the case where we have a multiplicative fitness function with $\frac{f_{i-1}}{f_i} = r_i \ge 1$, as the fitness function increases as we move closer to the target. Then $\rho_{i,i-1} = \frac{1 - r_i^{-1}}{1 - r_i^{-N}}$ and $\rho_{i,i+1} = \frac{1 - r_{i+1}}{1 - r_{i+1}^{N}}$. For constant factor $r_i = r$ for all $i$, we obtain $\frac{\rho_{i,i-1}}{\rho_{i,i+1}} = r^{N-1}$, so that $r^{N-1}$ plays the role that $e^{\beta}$ plays in the Fermi process.
Summary of results for Moran process with multiplicative fitness landscape. From the results of Section 4 and Section 5, and the equivalence of the transition probabilities of the Markov chains in Section 4 and Section 5 with those in the Moran process, we obtain the following results for the Moran process of evolution under a constant multiplicative fitness landscape $r$:
1. (Single target). For a single target, for all constants $r$ and population size $N$, the discovery time from any non-target string to the target is exponential in the length of the bit strings.

2. (Broad peaks). For broad peaks with constant $c$ fraction of clustered targets with $c \le \frac{1}{2}$, if $c \cdot (1 + r^{N-1}) < 1$, then the discovery time from any non-target string to the target set is at least exponential in the length $L$ of the bit strings; and if $c \cdot (1 + r^{N-1}) \ge 1$, then the discovery time from any non-target string to the target set is at most $O\left(\frac{L^3}{u}\right)$ (i.e., polynomial).
The polynomial discovery time for a broad peak surrounded by a fitness slope requires the slope to extend to a Hamming distance greater than $3L/4$. What happens then, if the slope only extends to a certain maximum distance less than $3L/4$? Suppose the fitness gain only arises if the sequence differs from the specific sequence in not more than a fraction $s$ of positions. Formally, we can consider any fitness function, $f$, that assigns zero fitness to sequences that are at a Hamming distance of at least $sL$ from the specific sequence. Now our previous result for neutral drift with broad peak applies. Since we must rely on neutral drift until the fitness gain arises, the discovery time in this fitness landscape is at least as long as the discovery time for neutral drift with a broad peak of size $c = s$. If $s < 3/4$, then the expected discovery time starting from any sequence outside the fitness gain region is exponential in $L$.
Supplementary Figure 5: Broad peak with different fitness landscapes. For the broad peak there is a specific sequence, and all sequences that are within Hamming distance $cL$ are part of the target set. (A) If the fitness landscape is flat outside the broad peak and if $c < 3/4$, then the discovery time is exponential in sequence length, $L$. (B) If the broad peak is surrounded by a multiplicative fitness landscape whose slope extends over the whole sequence space, then the discovery time is either polynomial or exponential in $L$ depending on whether $c(1 + r^{N-1}/3) \ge 1$ or not. (C) If the fitness slope extends to a Hamming distance less than $3L/4$, then the discovery time is exponential in $L$. (D) Numerical calculations for broad peaks surrounded by flat fitness landscapes. We observe exponential discovery time for $c = 1/3$ and $c = 1/2$. (E) Numerical calculations for broad peaks surrounded by multiplicative fitness landscapes. The broad peak extends to $c = 1/6$ and the slope of the fitness landscape to $s = 1/2$. The discovery time is exponential, because $s < 3/4$. The fitness gain is $r = 1.01$ and the population size is as indicated. As the population size, $N$, increases the discovery time converges to that of a broad peak with $c = 1/2$ embedded in a flat fitness landscape.
Tables 1 and 2
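The correspondence between the Moran process and the Fermi process rests on the ratio of fixation probabilities towards and away from the target. The sketch below is an illustration under a stated assumption: it uses the standard Moran fixation probability of a single mutant with relative fitness $r$ in a population of total size $N$, namely $\frac{1 - 1/r}{1 - r^{-N}}$. The function name `fixation_probability` and the parameter values are hypothetical.

```python
def fixation_probability(r, N):
    """Fixation probability of one mutant with relative fitness r (Moran process)."""
    if r == 1.0:
        return 1.0 / N  # neutral mutant
    return (1.0 - 1.0 / r) / (1.0 - r ** (-N))


r, N = 1.05, 20
towards = fixation_probability(r, N)     # mutant one step closer to the target
away = fixation_probability(1.0 / r, N)  # mutant one step farther from the target
ratio = towards / away
```

The ratio equals $r^{N-1}$, which is the quantity that replaces $e^{\beta}$ in the threshold conditions stated for the Moran process. An advantageous mutant also fixes with probability at least $\frac{1}{N}$.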

General Alphabet
In previous sections we presented our results for $L$-bit strings. In this section, we consider the case of a general alphabet, where every sequence consists of letters from a finite alphabet $\Sigma$. Thus, $B(L)$ is the space of all $L$-tuple strings in $\Sigma^L$. We fix a letter $\sigma \in \Sigma$, and consider a target set $U_c^L$, consisting of all the $L$-tuple strings such that every $s \in U_c^L$ differs from the target string $t = \sigma^L$ (of all $\sigma$'s) in at most $cL$ positions (i.e., Hamming distance at most $c \cdot L$ from the target string $t$). We will prove a dichotomy result that generalizes Theorem S3.
We can again consider a Markov chain on a line $M_{L,\beta}$, where its $i$-th position encodes all the strings in $B(L)$ which differ from $t$ in exactly $i$ positions. We consider a string $s$ that corresponds to the $i$-th state of $M_{L,\beta}$, for $0 < i < L$. Then we have the following cases:
• There are exactly i neighbors of s in state i − 1, since in each position among the i positions that s does not agree with t, there is exactly one mutation that will make s and t match in that position.
• There are exactly (L − i) · (|Σ| − 1) neighbors of s in state i + 1, since in each position among the L − i positions in which s agrees with t, there are |Σ| − 1 mutations that will make s not agree with t in that position.
• There are exactly i · (|Σ| − 2) neighbors of s in state i, since in each position j among the i positions that s does not agree with t, there are |Σ| − 2 mutations that will preserve this disagreement.
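Lemma S8. For the linear fitness function $f$, for all selection intensities $\beta \ge 0$ and all constants $c \le \frac{\kappa}{\kappa + 1}$ (with $\kappa = |\Sigma| - 1$) such that $c \cdot v < 1$ for $v = 1 + \frac{e^{\beta}}{\kappa}$, there exists $L_0 \in \mathbb{N}$ such that for all string spaces $B(L)$ with $L \ge L_0$, for all $s \in B(L) \setminus U_c^L$ we have $T(U_c^L, s, f, \beta) \ge A^{\ell \cdot L - 1}$, where $A = \frac{2v - 1 - c \cdot v}{1 + c \cdot v} \cdot \kappa \cdot e^{-\beta} > 1$ and $\ell = \frac{1 - c \cdot v}{2v}$.
Proof. We consider the Markov chain $M_{L,\beta}$ for $L \ge L_0 = \frac{2v}{1 - c \cdot v}$. Consider the midpoint $i$ between $cL$ and $\frac{L}{v}$, i.e., $i = L \cdot \frac{1 + c \cdot v}{2v}$ (such a midpoint exists because $L \ge L_0$ and the choice of $c$, as $i > cL$). For all $cL < j \le i$ we have:
$$\frac{\delta_\beta(j, j+1)}{\delta_\beta(j, j-1)} = \frac{L - j}{j} \cdot \kappa \cdot e^{-\beta} \ge \frac{L - i}{i} \cdot \kappa \cdot e^{-\beta} = \frac{2v - 1 - c \cdot v}{1 + c \cdot v} \cdot \kappa \cdot e^{-\beta} = A.$$
The first inequality holds because $j \le i$ and thus $\frac{L-j}{j} \ge \frac{L-i}{i}$. The equalities follow as simple rewriting, while $A > \frac{2v - 2}{2} \cdot \kappa \cdot e^{-\beta} = (v - 1) \cdot \kappa \cdot e^{-\beta} = 1$, since $c \cdot v < 1$. We now use Lemma S3 for $M_{L,\beta}$ with $n_1 = x = cL$, $y = i$, and $k = L \cdot \frac{1 - c \cdot v}{2v} = \ell \cdot L$ and vary $n_2$ from $x + 1$ to $L$ to obtain that $H(n_1, n_2) \ge A^{\ell \cdot L - 1}$.
Proof. We consider the Markov chain $M_{L,\beta}$, where $\beta$ is such that we have $c \cdot v \ge 1$. For every $cL < j < L$ we have:
$$\frac{\delta_\beta(j, j-1)}{\delta_\beta(j, j+1)} = \frac{j}{L - j} \cdot \frac{e^{\beta}}{\kappa} \ge \frac{c}{1 - c} \cdot \frac{e^{\beta}}{\kappa} \ge 1.$$
The first inequality holds because $cL < j$; the second inequality holds because $c \cdot \left(1 + \frac{e^{\beta}}{\kappa}\right) \ge 1$ and thus $c \cdot \frac{e^{\beta}}{\kappa} \ge 1 - c$. Then, by Lemma S4 we have that $H(cL, n_2) = O\left(\frac{L^2}{M}\right)$ for all $n_2 > cL$. We conclude that the desired result follows.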
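Corollary S2. For alphabet size $|\Sigma| = 1 + \kappa$, consider the Moran process with multiplicative fitness landscape with constant $r$, population size $N$, and mutation rate $u$. Let $c \le \frac{\kappa}{\kappa + 1}$. The following assertions hold:
1. If $c \cdot \left(1 + \frac{r^{N-1}}{\kappa}\right) < 1$, then the discovery time from any non-target string to the target set is exponential in $L$.
2. If $c \cdot \left(1 + \frac{r^{N-1}}{\kappa}\right) \ge 1$, then the discovery time from any non-target string to the target set is polynomial in $L$.
Explicit bounds for four letter alphabet. We now present the explicit calculation for $L_0$ and $\ell$ of Corollary S2 for the four letter alphabet. For the four letter alphabet we have $\kappa = 3$, and for the exponential lower bound we have $cv < 1$. In this case we have $v = 1 + \frac{r^{N-1}}{3}$, $L_0 = \frac{2v}{1 - cv}$ and $\ell = \frac{1 - cv}{2v}$. Since $cv < 1$ we have $A = \frac{3 \cdot (2v - 1 - cv)}{(1 + cv) \cdot r^{N-1}} > 1$. By changing the exponential lower bound to base 2, we have that the discovery time is at least $2^{(\ell \cdot L - 1) \cdot \log_2 A}$. Thus we have the following two cases:
• Selection: With selection (i.e., $r > 1$) the exponential lower bound on the discovery time when $cv < 1$ is at least $2^{(\ell \cdot L - 1) \cdot \log_2 A}$, with $v$, $\ell$ and $A$ as above, for all $L \ge L_0$.
• Neutral case: Specializing the above result for the neutral case (i.e., $r = 1$) we have $v = \frac{4}{3}$, $\ell = \frac{3 - 4c}{8}$ and $A = \frac{3 \cdot (5 - 4c)}{4c + 3}$, and we obtain that the exponential lower bound on the discovery time when $cv < 1$ is at least $2^{\left(\frac{3 - 4c}{8} \cdot L - 1\right) \cdot \log_2 \frac{3 \cdot (5 - 4c)}{4c + 3}}$ for all $L \ge L_0 = \frac{8}{3 - 4c}$. We ignore the term 1 as compared to $L$ and have that the bound is approximately $2^{\frac{3 - 4c}{8} \cdot L \cdot \log_2 \frac{3 \cdot (5 - 4c)}{4c + 3}}$.
Discussion about implications of results. We now discuss the implications of Corollary S2.
1. First, the corollary implies that for a single target (which intuitively corresponds to $c = 0$), even with a multiplicative fitness landscape (which is an exponentially increasing fitness landscape), the discovery time is exponential.
2. The discovery time is polynomial if $c \cdot \left(1 + \frac{r^{N-1}}{\kappa}\right) \ge 1$; however, this requires that the slope of the fitness gain extends over the whole sequence space (at least till Hamming distance $(\kappa/(\kappa + 1)) \cdot L$).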
3. Consider the case where the fitness gain arises only when the sequence differs from the target in not more than a fraction $s$ of positions, i.e., the slope of the fitness function only extends up to a Hamming distance of $s \cdot L$. Now our result for neutral drift with broad peak applies. Since we must rely on neutral drift until the fitness gain arises, the discovery time of this process is at least as long as the discovery time for neutral drift with a broad peak of size $c = s$. If $r = 1$ (neutral drift), then we have that the discovery time is polynomial if $c \cdot \left(1 + \frac{1}{\kappa}\right) \ge 1$, and otherwise it is exponential. Hence if the fitness gain arises from Hamming distance $s \cdot L$ and $s < \kappa/(\kappa + 1)$, then the expected discovery time starting from any sequence outside the fitness gain region is exponential in $L$. Moreover, there are two further implications of this exponential lower bound. First, note that if $r = 1$, then $r^{N-1}$ is 1 independent of $N$, and thus the exponential lower bound is independent of $N$. Second, note that if the fitness gain arises from Hamming distance $s \cdot L$, and the landscape is neutral until the fitness gain region is reached, then the exponential lower bound for $s < \kappa/(\kappa + 1)$ is also independent of the shape of the fitness landscape after the fitness gain arises. Formally, if we consider any fitness function $f$ that assigns zero fitness to strings that are at Hamming distance at least $s \cdot L$ from the ideal sequence, and any nonnegative fitness value to other strings, then the process is neutral until the fitness gain arises, and the exponential lower bound holds for the fitness landscape, and is independent of the population size. For a four letter alphabet (as in the case of RNA and DNA) the critical threshold is thus $s = 3/4$.
Remark S1. Note that we have shown that all results for bit strings easily extend to any finite alphabet by appropriately changing the constants. For simplicity, in the following sections we present our results for strings over a 4-letter alphabet; they also extend easily to any finite alphabet by appropriately changing the constants.
Remark S2. We have established several lower bounds on the expected discovery time. All the lower bounds are obtained from hitting times on Markov chains, and in Markov chains the hitting times are closely concentrated around the expectation. In other words, whenever we establish that the expected discovery time is exponential, it follows that the discovery time is exponential with high probability.

Multiple Independent Searches
In this section we consider multiple independent searches. For simplicity we will consider strings over a 4-letter alphabet, and as shown in Section 7 the results easily extend to strings over alphabets of any size.
8.1. Polynomially many independent searches. We will show that if there are polynomially many independent searches starting from a Hamming distance of at least 3L/4, then the probability to reach the target in polynomially many steps is negligibly small (smaller than the inverse of any polynomial function). We will present our results for the Markov chain on a line, and they imply the results for the evolutionary processes. In all the following lemmas we consider the Markov chain on a line for a four letter alphabet. Before the formal proof we present informal arguments and intuition for the proof.
The basic intuition and steps of the proof. The basic intuition and steps of our proof are as follows:
1. First we show that in the Markov chain on a line, from any point n_2 ≥ 3L/4 the probability to not reach 3L/4 in polynomially many steps is very (exponentially) small. The key reason is that we have shown that the expected hitting time from n_2 to 3L/4 is at most L^2; and hence the probability to reach 3L/4 from n_2 within L^5 steps is very high. Thus the probability to not reach 3L/4 within L^5 steps is very small (see Lemma S10).
2. Second, we show that the contribution to the expected hitting time of the steps beyond L^2 · 2^(L·log L) is at most a constant. The informal reasoning is that beyond the expected hitting time, the probability of not yet having reached the target drops exponentially. Hence we obtain a geometric series whose sum is bounded by a constant (see Lemma S11).
3. Then we show that if the expected hitting time is exponential, then for all polynomials p_1(·) and p_2(·), the probability to reach the target within p_1(L) steps is smaller than 1/p_2(L). The key argument is to combine the previous two results to show that if the probability to reach the target within p_1(L) steps is more than 1/p_2(L), then the expected hitting time would be a polynomial, contradicting that the expected hitting time is exponential.
The formal arguments of the above results yield Theorem S5. We present the formal proof below.
Lemma S10. From any point n_2 ≥ 3L/4 the probability that 3L/4 is not reached within L^5 steps is exponentially small in L (i.e., at most e^(−L)).
Proof. We have already established that the expected hitting time from n_2 to 3L/4 is at most L^2. Hence the probability to reach 3L/4 within L^3 steps must be at least 1/L (otherwise the expectation would have been greater than L^2). Since from all states n_2 ≥ 3L/4 the probability to reach 3L/4 within L^3 steps is at least 1/L, the probability that 3L/4 is not reached within k · L^3 steps is at most (1 − 1/L)^k. Hence, taking k = L^2, the probability that 3L/4 is not reached within L^5 steps is at most (1 − 1/L)^(L^2) ≤ e^(−L). The desired result follows.
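The amplification step in the proof of Lemma S10 uses k = L^2 blocks of L^3 steps each, so the failure probability is bounded by (1 − 1/L)^(L^2). As a quick numerical sanity check (a sketch, not part of the proof; the function name is hypothetical), one can compare the logarithm of this quantity with −L:

```python
import math

def log_miss_probability_bound(L):
    """Natural log of (1 - 1/L)**(L**2), the bound on not reaching 3L/4 in L**5 steps."""
    # log1p keeps precision when 1/L is small.
    return (L ** 2) * math.log1p(-1.0 / L)

if __name__ == "__main__":
    for L in (10, 100, 1000):
        # The first value should never exceed the second one, i.e. -L.
        print(L, log_miss_probability_bound(L), -L)
```

The inequality holds for every L ≥ 2 because ln(1 − x) ≤ −x for 0 ≤ x < 1.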
Lemma S11. The contribution of the steps after L^2 · 2^(L·log L) to the expected hitting time is at most a constant (i.e., O(1)).
Proof. From any starting point, the probability to reach the target within L steps is at least 1/L^L. Hence the probability of not reaching the target within k · L^L steps is at most e^(−k). Hence the probability to reach the target only after ℓ · L^2 · 2^(L·log L) steps is at most e^(−ℓ·L^2). Thus the contribution to the expectation of the steps beyond L^2 · 2^(L·log L) is bounded by a geometric series in ℓ whose sum is at most a constant. The desired result follows.
Lemma S12. In all cases where the lower bound on the expected hitting time is exponential, for all polynomials p_1(·) and p_2(·), the probability to reach the target set from any state n_2 such that n_2 ≥ 3L/4 within the first p_1(L) steps is at most 1/p_2(L).
Proof. We first observe that from any starting point n_2 ≥ 3L/4 the expected time to reach 3L/4 is at most L^2, and the probability that 3L/4 is not reached within L^5 steps is exponentially small (Lemma S10). Hence if the probability to reach the target set from 3L/4 within p_1(L) steps is at least 1/p_2(L), then from all states the probability to reach the target set within L^5 · p_1(L) steps is at least 1/(L · p_2(L)). In other words, from any state the probability that the target set is not reached within L^5 · p_1(L) steps is at most 1 − 1/(L · p_2(L)). Hence from any state the probability that the target set is not reached within k · L^5 · p_1(L) steps is at most (1 − 1/(L · p_2(L)))^k. Thus, taking k = L^3 · p_2(L), from any state the probability that the target set is not reached within L^3 · p_2(L) · L^5 · p_1(L) steps is at most (1 − 1/(L · p_2(L)))^(L^3·p_2(L)) ≤ e^(−L^2). Hence the probability to reach the target within L^8 · p_1(L) · p_2(L) steps is at least 1 − 1/e^(L^2). By Lemma S11 the contribution to the expectation of the steps beyond L^2 · 2^(L·log L) is constant (O(1)).
Hence we would obtain an upper bound of at most L^9 · p_1(L) · p_2(L) on the expected hitting time. Note that the above bound is obtained without assuming that p_1(·) and p_2(·) are polynomial functions. However, if p_1(·) and p_2(·) are polynomial, then we obtain a polynomial upper bound on the hitting time, which contradicts the exponential lower bound. The desired result follows.
Corollary S3. In all cases where the lower bound on the expected hitting time is exponential, let h denote the expected hitting time. Given numbers t_1 and t_2, the probability to reach the target set from any state n_2 such that n_2 ≥ 3L/4 within the first t_1 = h/(L^9 · t_2) steps is at most 1/t_2.
Proof. In the proof of Lemma S12 we first established that the hitting time is at most L^9 · p_1(L) · p_2(L) (without assuming that p_1 and p_2 are polynomial). By interpreting t_1 as p_1(L) and t_2 as p_2(L) we obtain that h ≤ L^9 · t_1 · t_2. The desired result follows.
Theorem S5. In all cases where the lower bound on the expected hitting time is exponential, for all polynomials p_1(·), p_2(·) and p_3(·), for p_3(L) independent searches, the probability to reach the target set from any state n_2 such that n_2 ≥ 3L/4 within the first p_1(L) steps for any of the searches is at most 1/p_2(L).
Proof. Consider the polynomial p'_2(L) = p_3(L) · p_2(L). Then by Lemma S12, for a single search the probability to reach the target within p_1(L) steps is at most 1/p'_2(L). Hence the probability that none of the searches reaches the target in p_1(L) steps is at least (1 − 1/p'_2(L))^(p_3(L)) ≥ 1 − p_3(L)/p'_2(L) = 1 − 1/p_2(L). The desired result follows.
Remark S3. Observe that in Theorem S5 the independent searches could start at different starting points, and the result still holds, because in all cases where we established an exponential lower bound, that lower bound holds for all starting points outside the target region.
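The last step of the proof of Theorem S5 is an instance of Bernoulli's inequality: if each of p_3 searches succeeds with probability at most 1/(p_2 · p_3), then all of them fail with probability at least 1 − 1/p_2. The following sketch checks this with exact rational arithmetic; the function name and the sample values are illustrative assumptions only.

```python
from fractions import Fraction

def none_succeed_lower_bound(p2, p3):
    """Exact value of (1 - 1/(p2*p3))**p3 for positive integers p2 and p3."""
    per_search_failure = 1 - Fraction(1, p2 * p3)
    return per_search_failure ** p3

if __name__ == "__main__":
    for p2, p3 in [(10, 5), (100, 50), (1000, 7)]:
        bound = none_succeed_lower_bound(p2, p3)
        # Bernoulli's inequality: (1 - x)**n >= 1 - n*x for 0 <= x <= 1.
        print(p2, p3, float(bound), bound >= 1 - Fraction(1, p2))
```

Using `Fraction` avoids rounding effects when the per-search success probability is very small.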
8.2. Probability of hitting in a given number of steps. We now present a simple (and informal) argument for the approximation of the probability that none of M independent searches succeeds in discovering the target within a given number b of steps, where the expected discovery time for a single search is d, for b ≪ d. The steps of the argument are as follows:
1. First we observe that the expected discovery time is the expected hitting time in a Markov chain, and the probability distribution of the hitting time in a Markov chain is largely concentrated around the expected hitting time d, when the expected hitting time is exponential and the starting state is far away from the target set. The result of sharp concentration around the expected hitting time is a generalization of the classical Chernoff bound for the sum of independent variables: the generalization for Markov chains is obtained by considering the Azuma-Hoeffding inequality for bounded martingales [1, Chapter 7], which shows exponential concentration around the expected hitting time. Note that for Markov chains, the martingale for the expected hitting time is bounded by 1 (as with every step the hitting time increases by 1).
2. Given that the probability distribution is concentrated around the mean, an approximation of the probability that a single search succeeds in b steps is at most b/d, for b ≪ d.
3. Given independent events E_1, E_2, . . . , E_M such that the success probabilities are a_1, a_2, . . . , a_M, respectively, by independence (which allows us to multiply probabilities) the probability that none of the events succeeds is (1 − a_1) · (1 − a_2) · · · (1 − a_M). Hence for M independent searches the probability that none of the searches succeeds in b steps, when the expected hitting time for a single search is d, is at least (1 − b/d)^M.
The above reasoning gives an informal argument to obtain an upper bound on the probability of success of M independent searches in b steps.
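The three steps above amount to evaluating (1 − b/d)^M, where each search succeeds with probability at most b/d. The sketch below evaluates this approximation in a numerically stable way; the function names and the sample values of M, b and d are hypothetical and only serve as an illustration.

```python
import math

def prob_no_search_succeeds(M, b, d):
    """Approximate lower bound (1 - b/d)**M, evaluated as exp(M * log1p(-b/d))."""
    if not 0 <= b < d:
        raise ValueError("the approximation assumes 0 <= b < d")
    return math.exp(M * math.log1p(-b / d))

def prob_some_search_succeeds(M, b, d):
    """Matching upper bound 1 - (1 - b/d)**M on at least one success."""
    if not 0 <= b < d:
        raise ValueError("the approximation assumes 0 <= b < d")
    # expm1 avoids catastrophic cancellation when the result is tiny.
    return -math.expm1(M * math.log1p(-b / d))

if __name__ == "__main__":
    # Hypothetical numbers: 10**6 searches, 10**3 steps each, d = 10**30.
    print(prob_some_search_succeeds(10**6, 1e3, 1e30))
```

When M · b/d is much smaller than 1, the success probability is close to M · b/d, which is the usual union bound.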

Distributed Targets
We now discuss several cases of distributed targets for which the exponential lower bounds can be obtained from our results. We discuss the results for the four letter alphabet.
1. Consider the example of distributed targets where the letters in a given number L_0 of positions are immaterial (e.g., the first four positions, the tenth position and the last four positions are immaterial, and hence L_0 = 9 in this case). Then we can simply ignore the positions which are immaterial, i.e., consider the string space of length L − L_0, and apply all results with effective length L − L_0.
2. Consider the example where the target set is as follows: instead of the single target of all σ's (i.e., t = σ^L), the target set has all sequences that have a segment of σ's of length at least α · L, for α > 1/2. Then all the targets have an overlapping segment of (2 · α − 1) · L σ's, from position (1 − α) · L to α · L. We can then obtain a lower bound on the discovery time of these targets by considering as target set the superset containing all sequences with σ's in that region. In other words, we can apply our results with a single target, but the effective length is (2 · α − 1) · L. A pictorial illustration of the above two cases is shown in Supplementary Figure 6.
3. We now consider the case of distributed targets that are chosen uniformly at random and independently, and let m ≪ 4^L be the number of distributed targets. Let the selection gradient extend up to a distance of cL from a target, for c < 3/4. Formally we consider any fitness landscape f that assigns zero fitness to a string whose Hamming distance exceeds cL from every target. We consider a starting sequence for the search and argue about the estimate on the expected discovery time.
• First we consider the Markov chain M defined on B(L) where every string s in B(L) is a state of the Markov chain. The transition probability from a string s to a neighboring string in Nh(s) of Hamming distance 1 is 1/|Nh(s)|. The Markov chain M has the following two properties: it is (i) irreducible, i.e., the whole Markov chain M is a recurrent class; and (ii) reversible, i.e., if there is a transition probability from s to s′, there is also a transition probability from s′ to s.
• Since M is irreducible and reversible, and due to its symmetric nature, it has a very fast mixing time (the number of steps required to converge to the stationary distribution). In particular, the chain converges to its stationary distribution, which is the uniform distribution over B(L), within O(L · log L) steps [2].
• Since c < 3/4, the expected time to reach a string from where the selection gradient to a specific target is felt is exponential (by Corollary S2). Thus given m ≪ 4^L and c < 3/4, a string from where the selection gradient to any target is felt is reached within the first O(L · log L) steps only with low probability.
• Since any string from where the selection gradient to a target is felt is reached within the first O(L · log L) steps only with low probability, and after O(L · log L) steps M converges to the uniform distribution, a lower bound on the expected discovery time can be obtained as follows: consider the probabilistic process that in every step chooses a string in B(L) uniformly at random, and the process succeeds if the chosen string has a Hamming distance of at most cL from any of the target sequences. The expected number of steps required for the success of the probabilistic process is a lower bound on the expected discovery time. Hence we first estimate the success probability of every step of the probabilistic process. Consider a target string and a string chosen uniformly at random. Since the string is chosen uniformly at random, we can equivalently think that the process draws the letter at every position of the string uniformly from the alphabet. The i-th position of a target differs from the chosen sequence with probability 3/4 (since we have a four letter alphabet). In other words, the indicators that the positions differ are Bernoulli random variables with mean 3/4. Let X denote the random variable for the number of positions of a target that differ from the chosen sequence (in other words, X denotes the Hamming distance), and hence X is distributed according to Binomial(L, 3/4). We now apply Hoeffding's inequality and obtain that the probability that the chosen string lies within the selection gradient of a specific target is at most P(X ≤ cL) ≤ exp(−2 · (3/4 − c)^2 · L). By the union bound, the probability of success in every step is at most m · exp(−2 · (3/4 − c)^2 · L), and thus the expected discovery time is at least (1/m) · exp(2 · (3/4 − c)^2 · L). Note that in the proof of the lower bound above any sequence with positive fitness is considered as a target, and hence the lower bound on the expected discovery time holds even if there is a broad peak of width cL around each of the m target sequences.
Theorem S6. Consider the four letter alphabet, and a starting sequence in B(L). Let the target set of m ≪ 4^L sequences be chosen uniformly at random, with selection extending up to a distance of cL from each target sequence, with c < 3/4. Then with high probability the expected discovery time of the target set is at least (1/m) · exp(2 · (3/4 − c)^2 · L). Hence, if m is polynomial, or even an exponential smaller than exp(2 · (3/4 − c)^2 · L), then the expected discovery time is exponential with high probability.
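To illustrate the Hoeffding step, the sketch below compares the bound exp(−2 · (3/4 − c)^2 · L) with the exact lower tail of Binomial(L, 3/4), and evaluates the resulting lower bound on the expected discovery time for m targets. The function names and the sample values of L, c and m are assumptions made for illustration.

```python
import math

def hoeffding_bound(L, c):
    """Hoeffding upper bound exp(-2 * (3/4 - c)**2 * L) on P(X <= c*L), for c < 3/4."""
    return math.exp(-2.0 * (0.75 - c) ** 2 * L)

def exact_tail(L, c):
    """Exact P(X <= c*L) for X ~ Binomial(L, 3/4), summed term by term."""
    k_max = math.floor(c * L)
    # P(X = k) = C(L, k) * (3/4)**k * (1/4)**(L - k) = C(L, k) * 3**k / 4**L
    total = sum(math.comb(L, k) * 3 ** k for k in range(k_max + 1))
    return total / 4 ** L

def discovery_time_lower_bound(L, c, m):
    """Lower bound (1/m) * exp(2 * (3/4 - c)**2 * L) on the expected discovery time."""
    return 1.0 / (m * hoeffding_bound(L, c))

if __name__ == "__main__":
    L, c = 200, 0.5                       # hypothetical length and peak width
    print(exact_tail(L, c), hoeffding_bound(L, c))
    print(discovery_time_lower_bound(L, c, m=10**3))
```

The exact tail is smaller than the Hoeffding bound, so the lower bound on the discovery time stated in Theorem S6 is conservative.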

A Mechanism for Polynomial Time
In the previous sections we have shown scenarios where the discovery time is not polynomial. We now discuss a mechanism that can ensure polynomial bounds. In the regeneration process, the process of evolution keeps on generating strings close to the target (say at distance k from the target). If the initial distance k is constant, then we show that with very high probability the target is reached within polynomially many regenerations.
Regeneration process. Formally, the regeneration process has two key aspects: (1) the process keeps on generating starting sequences; and (2) the starting sequences that are generated are only a constant number k of steps away from some sequence in the target set.
Polynomially many regenerations. First note that from every string s, if there is a transition from s to a neighbor in Nh(s), then with probability at least 1/(4 · L) the transition is to a neighbor that is closer to the target set. Hence in every step, if there is a mutation, then the probability to move closer to the target set is at least 1/(4 · L) (and in expectation a transition to a neighbor occurs every 1/u steps). Thus with probability at least α = (1/(4 · L))^k the target is reached in a single trial in O(k/u) steps, i.e., the probability is obtained as the product of the probabilities that each of the k mutation steps brings the sequence closer to the target. Thus the probability to not reach the target in L · (4 · L)^k = L · (1/α) independent trials (or regeneration steps) is at most (1 − α)^(L/α) ≤ e^(−L), i.e., exponentially small in L. In other words, with L · (4 · L)^k = L · (1/α) trials (regenerations) the target is discovered in time at most O((k/u) · L · (1/α)) with very high probability; i.e., if k is constant, then with polynomially many regenerations the target is discovered in polynomial time with very high probability.
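The counting argument for the regeneration process can be made concrete as follows. This is a minimal sketch under the assumptions of the text (per-trial success probability at least (1/(4L))^k and L · (4L)^k trials); the function name is hypothetical, the constant hidden in O(k/u) is taken to be 1, and the sample values of L, k and u are illustrative.

```python
import math

def regeneration_budget(L, k, u):
    """Return (alpha, trials, total_steps, log_failure_bound).

    alpha             = (1/(4L))**k, lower bound on the success probability of one trial
    trials            = L * (4L)**k, number of regenerations considered in the text
    total_steps       = (k/u) * trials, ignoring the constant hidden in O(k/u)
    log_failure_bound = trials * ln(1 - alpha), which is about -L
    """
    trials = L * (4 * L) ** k            # exact integer
    alpha = 1.0 / (4 * L) ** k
    total_steps = (k / u) * trials
    log_failure_bound = trials * math.log1p(-alpha)
    return alpha, trials, total_steps, log_failure_bound

if __name__ == "__main__":
    # Hypothetical values: L = 100, start k = 3 steps from the target, u = 1e-4.
    alpha, trials, steps, log_fail = regeneration_budget(100, 3, 1e-4)
    print(alpha, trials, steps, log_fail)
```

For constant k the number of trials grows like L^(k+1), which is polynomial in L, while the logarithm of the failure probability stays close to −L.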

Calculations and Details of Data of Article
We first present a calculation of the number of targets in a broad peak.
Calculation 1. For a four letter alphabet, the number of sequences that differ in at most cL positions from 0 is Σ_{i=0}^{cL} C(L, i) · 3^i, where C(L, i) denotes the binomial coefficient. By Stirling's approximation we have n! ≥ (n/e)^n · √(2πn) and thus we have where we first apply Stirling's approximation, and for the inequality use that since c < 3/4 we have (1/c)^(cL+0.5) ≥ 1. By converting the exponential to base 2 we obtain Hence the number of sequences at Hamming distance at most cL from 0 grows exponentially, as 3 cL ≥ 2 c 2 L since c < 1, and √(2πL) < 3L. Thus the probability of at least one search succeeding within 10^14 generations is at most 10 −26 0.
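The size of a broad peak can also be computed exactly for moderate L, which gives a way to see the exponential growth without Stirling's approximation. The sketch below counts the sequences over a four letter alphabet within Hamming distance cL of a fixed sequence; the function names are hypothetical.

```python
import math

def broad_peak_size(L, c):
    """Number of 4-letter sequences of length L within Hamming distance c*L of a fixed one."""
    # Choose the i differing positions, then one of 3 other letters for each.
    return sum(math.comb(L, i) * 3 ** i for i in range(math.floor(c * L) + 1))

def bits_per_position(L, c):
    """log2 of the peak size divided by L; stays bounded away from 0 as L grows."""
    return math.log2(broad_peak_size(L, c)) / L

if __name__ == "__main__":
    for L in (20, 100, 500):
        print(L, bits_per_position(L, 0.5))
```

Since the term with i = ⌊cL⌋ alone contributes at least 3^⌊cL⌋ sequences, the count is exponential in L for every fixed c > 0.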
More precise version of Table 1 of article. A more elaborate and precise version of Table 1 of article is given below.

Related Work
In this section we discuss and compare our results with relevant related works from population genetics.
Genetic adaptation on continuous and sequence space. The subject of genetic adaptation has been an active research area for several decades, and has been nicely summarized by Orr [3]. In a seminal work [4], Fisher introduced the geometric model of adaptation in order to capture the statistical properties of beneficial mutations and their effect in a continuous phenotypic space. He concluded that evolution proceeds via mutations of small effect, a view that was later reconsidered by Kimura [5]. Orr [6] extended this work of Kimura by studying the distribution of sizes of mutations for the whole evolutionary walk, and showed that it is an exponential distribution which retains its shape (but gradually shrinks) for the whole of the walk (also see [7] for a review and summary of this work). Kimura is also known for having introduced the neutral theory of molecular evolution [8]. To quote from Orr [3]: "Throughout the 1960s and 1970s, evolutionary geneticists grew increasingly convinced that much, if not most, molecular evolution reflects the substitution of neutral [8,9] or nearly neutral [5,10-12] mutations, not beneficial ones." In [13,14], Maynard Smith conceived the idea that organisms evolve in the discrete, high-dimensional space of DNA and protein sequences, and the adaptive walk proceeds via unit mutational steps to fitter sequences. The idea of exploring sequence space was expanded in [15], where evolution in rugged fitness landscapes was captured by the NK model. Recently, it has been demonstrated empirically that the ruggedness of the NK model is correlated, so that typically large peaks are clustered together [16]. Gillespie [17] described a simple stochastic substitution model under strong selection and weak mutation, and by means of extreme value theory concluded that the mean number of gene substitutions until fixation is small.
This view was further developed in [18,19], where the assumption that the starting sequence must be highly fit was necessary for efficient evolution. In a similar setting, Orr [20] showed that finding local optima in sequence spaces takes at least e − 1 steps, where e ≈ 2.718. This came as a conclusion from the observation that the mean distance to a local maximum is e − 1.
The speed of adaptation has also frequently been characterized in terms of fixation rates of beneficial mutations. Orr [21] studied the rate of adaptive substitutions in asexuals as a function of the mutation rate under the assumption that selection against the deleterious mutations is stronger than selection in favor of the beneficial one. It was shown that the mutation rate which maximizes the adaptation rate depends only on the strength of selection against deleterious mutations. This work was later extended in [22], where it was shown that beneficial alleles with a relatively small advantage also have a relatively small probability of fixation.

Role of recombination.
A key research question is what phenomenon contributes to the speed-up of the evolutionary search process. The classical work of Crow and Kimura shows that recombination leads to a speed-up in evolution. Crow and Kimura [23] studied the advantage that recombination confers to an adapting sexual population over its asexual counterpart, by eliminating the clonal interference between simultaneously emerging beneficial mutants. In [23] the length L of the genome sequence is not a parameter, and the results show that the speed-up due to recombination is proportional to the population size. The speed-up of recombination in various models with L also as a parameter was considered by Maynard Smith, and Table 1 in [24] summarizes the relative speed-up under various models. In the best case, the speed-up due to recombination is proportional to the product of the population size and the length L of the genome. Charlesworth [25] also examined the advantage in the population mean fitness that sexual populations have over asexuals, for various dynamic selection functions, and showed that this advantage is substantial for various breeding systems. The advantage of sexual reproduction continues to be an active topic of study. A recent study [26] used a theoretical argument to demonstrate that in infinite populations of two loci that can accumulate infinitely many mutations each, as long as the rate of recombination is r > 0, the speed-up advantage of the sexual populations over asexuals approaches 2. This is because in infinite populations, no matter how small the recombination rate is, as long as it is non-zero, a small fraction of highly fit individuals due to recombination will always exist. Because this portion of the population has an exponentially large fitness advantage, it will spread fast, and the rate of adaptation approaches that of extreme recombination (r = 1), which was proven to be 2.
In [27], several different modes of recombination were examined and it was shown that in all models, when the population size is large, the rate of adaptation has a logarithmic dependency on N · u (population size times beneficial mutation rate), rather than a linear one. However, the rate of adaptation has a quadratic dependency on the recombination rate, showing that recombination plays a more important role in this rate for large populations. There are, of course, other parameters that affect the rate of adaptation in sexuals. In [28], the authors examined sexual reproduction in chromosomes where short stretches are linked together, and hence, clonal interference exists locally. The population can be effectively seen as consisting of many asexual stretches linked together by a smaller recombination factor, and it was estimated that beneficial mutations fix at a rate proportional to the new rate of recombination, with only logarithmic dependency on the rate at which beneficial mutations are introduced.
Our results. In this work our contributions are as follows:
1. we present the mathematical foundations to estimate the expected number of steps for evolutionary processes as a function of L;
2. we characterize scenarios when the expected time is exponential in L;
3. we present strong dichotomy results between exponential vs polynomial time;
4. we suggest a mechanism that makes it possible to break the infeasible exponential barrier and allows evolution to work in polynomial time.
In part, our work systematically explores fitness landscapes with large neutral regions, where broad peaks are distributed at random. Such landscapes are widespread in biology. Studies have shown that the fitness landscapes of RNA are typically highly neutral, with very low peak density (around 10^(−13), according to [29]). Wagner's studies have also identified neutrality as an important ingredient for robustness [30,31].
Elaborate computer simulations have also outlined the importance of random drift in the emergence of novel functions [32].
Our results nicely combine and explain several existing results. The regeneration process that breaks the exponential barrier requires that (i) the starting sequence is only a constant number of steps away from the target and (ii) the starting sequence can be repeatedly generated. The first aspect is related to the results of Gillespie [18,19], which use extreme value theory to suggest that the starting sequence must be a highly fit sequence for efficient evolution (i.e., in our setting close to the target). The second aspect ties in with the long-standing ideas that gene (and genome) duplications are the major events leading to the emergence of new genes [33] and that evolution is a 'tinkerer' playing around with small modifications of existing sequences rather than creating entirely new sequences [34]. Recent studies [35] have shown that large numbers of transcription factor binding sites have evolved under divergent evolution, supporting the claim of local sequence duplication as a mechanism for the emergence of new functions. Our work shows that the combination of these two ideas breaks the exponential barrier. Our results also nicely combine with the existing results on recombination. Recombination that may lead to a linear factor speed-up does not change an exponential function to a polynomial one, but may contribute greatly to the efficiency of a polynomial process. The polynomial upper bound of L^(k+1) for the regeneration process holds without selection and recombination. But the polynomial bound of L^(k+1) can still be inefficient, and then selection and possibly recombination play the role of making the feasible polynomial bound much more efficient.

Additional Simulation Results
In this section we describe some additional computer simulation results. Our first simulation result is for the Moran process and per-bit mutation rate. The second simulation result is for another classical evolutionary process, namely, the Wright-Fisher process. The details are described in Supplementary Figure 7 and Supplementary Figure 8, respectively.

A. Technical Appendix: Linear Fitness Transition Probabilities
We derive the transition probabilities of the corresponding Markov chain on a line M_{L,β} for the case of the linear fitness landscape and any selection intensity. That is, given s and s′ ∈ Nh(s), we have df(s, s′) = (f(s′) − f(s)) = −(h(s′) − h(s)). Our goal is to show that for 0 < i < L we have: x = e^(−β) · (L − i)/i.
Supplementary Figure 7: Moran process with per-bit mutation rate. The figure shows the results of the average discovery time obtained from computer simulation of a Moran process with per-bit mutation rate. We consider the case of neutral drift with broad peak of c = 1/2. We consider a population of size N, and in each round an individual A is chosen at random to reproduce, and the offspring A′ of A is produced from the string of A with a per-bit mutation rate of 1%. Then an individual is chosen at random to die and the offspring A′ replaces the dead individual (thus the population size remains constant). The process stops as soon as one individual reaches a string with Hamming distance at most cL from the target (one individual hits the broad peak). The discovery time is the number of generations (reproductions) required by the individual who reaches the peak first. We ran the computer simulation for 1000 samples for each experiment and plot the average discovery time; the figure shows the result for N = 100, 500, and 1000, and shows the average discovery time as a function of the gene length L. We again observe that the discovery times grow exponentially in L in all cases.
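The simulation described in the caption of Supplementary Figure 7 can be sketched in a few lines. The code below is not the authors' simulation code; it is a minimal illustration under assumptions that the caption does not fix: strings are over a four letter alphabet {0, 1, 2, 3}, the target is the all-zeros string, every individual starts at distance L from the target, and a mutated position changes to one of the three other letters uniformly at random. The function name and default parameters are hypothetical.

```python
import random

def moran_discovery_time(L, N, c, mu=0.01, seed=0, max_steps=10**6):
    """Number of reproductions until some individual is within c*L of the target.

    Neutral Moran process: the parent and the individual that dies are both
    chosen uniformly at random. Returns None if the peak is not hit in max_steps.
    """
    rng = random.Random(seed)
    # Assumed start: every position differs from the all-zeros target.
    population = [[1] * L for _ in range(N)]
    for step in range(1, max_steps + 1):
        parent = rng.choice(population)
        # Copy the parent; each position mutates independently with probability mu.
        child = [(a + rng.randrange(1, 4)) % 4 if rng.random() < mu else a
                 for a in parent]
        # The offspring replaces a uniformly chosen individual.
        population[rng.randrange(N)] = child
        # Only the new offspring can be the first to enter the broad peak.
        distance = sum(1 for a in child if a != 0)
        if distance <= c * L:
            return step
    return None

if __name__ == "__main__":
    # Small hypothetical parameters so that the sketch finishes quickly.
    print(moran_discovery_time(L=8, N=5, c=0.5, mu=0.1, seed=1))
```

Averaging the returned value over many seeds for increasing L gives an estimate of the average discovery time as a function of the sequence length, in the spirit of the figure.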

Supplementary Figure 8: Wright-Fisher Process. The figure shows the evolution of populations in the Wright-Fisher model, for population size N = 10^4, for various values of L. We consider the multiplicative fitness landscape with r = 1.01, and the selection is felt from L/2 away from the ideal sequence 0. At every generation a new population replaces the old one, such that the expected number of offspring of an individual of the old population in the new one is proportional to its fitness. These offspring are mutated with a uniform mutation rate per bit (u = 10^(−4)). The first two figures depict the evolution of the mean fitness and normalized mean fitness of the population, while the last figure depicts the normalized average distance of the population from the target sequence 0. The results are obtained from a computer simulation where for each value of L the simulation was run for 50 cases, and the averages are shown.
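A Wright-Fisher simulation of the kind described in the caption of Supplementary Figure 8 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: individuals are bit strings, the target is the all-zeros string, the population starts at the all-ones string, and the multiplicative fitness is taken to be r^(L/2 − h) for Hamming distance h ≤ L/2 and 1 otherwise. The function name is hypothetical.

```python
import random

def wright_fisher_mean_distance(L, N, r, u, generations, seed=0):
    """Normalized mean Hamming distance to the all-zeros target after each generation."""
    rng = random.Random(seed)
    population = [[1] * L for _ in range(N)]   # assumed start: distance L
    history = []
    for _ in range(generations):
        distances = [sum(ind) for ind in population]
        # Assumed landscape: neutral beyond L/2, multiplicative gain r per step closer.
        fitness = [r ** max(0.0, L / 2 - h) for h in distances]
        # Each member of the new generation picks a parent proportionally to fitness.
        parents = rng.choices(population, weights=fitness, k=N)
        # Offspring are copies of the parents with independent per-bit flips.
        population = [[bit ^ (rng.random() < u) for bit in p] for p in parents]
        history.append(sum(map(sum, population)) / (N * L))
    return history

if __name__ == "__main__":
    # Small hypothetical parameters; the figure itself uses N = 10**4 and u = 10**-4.
    print(wright_fisher_mean_distance(L=20, N=50, r=1.01, u=0.01, generations=10))
```

Tracking the mean fitness in the same loop would give the other two quantities mentioned in the caption.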