Limits to the Rate of Adaptive Substitution in Sexual Populations

In large populations, many beneficial mutations may be simultaneously available and may compete with one another, slowing adaptation. By finding the probability of fixation of a favorable allele in a simple model of a haploid sexual population, we find limits to the rate of adaptive substitution, $\Lambda$, that depend on simple parameter combinations. When variance in fitness is low and linkage is loose, the baseline rate of substitution is $\Lambda_0 = 2NU\langle s\rangle$, where $N$ is the population size, $U$ is the rate of beneficial mutations per genome, and $\langle s\rangle$ is their mean selective advantage. Heritable variance $v$ in log fitness due to unlinked loci reduces $\Lambda$ by $e^{-4v}$ under polygamy and $e^{-8v}$ under monogamy. With a linear genetic map of length $R$ Morgans, interference is yet stronger. We use a scaling argument to show that the density of adaptive substitutions depends on $s$, $N$, $U$, and $R$ only through the baseline density: $\Lambda/R = F(\Lambda_0/R)$. Under the approximation that the interference due to different sweeps adds up, we show that $\Lambda/R \sim (\Lambda_0/R)/(1 + 2\Lambda_0/R)$, implying that interference prevents the rate of adaptive substitution from exceeding one per centimorgan per 200 generations. Simulations and numerical calculations confirm the scaling argument and confirm the additive approximation for $\Lambda_0/R \sim 1$; for higher $\Lambda_0/R$, the rate of adaptation grows above $R/2$, but only very slowly. We also consider the effect of sweeps on neutral diversity and show that, while even occasional sweeps can greatly reduce neutral diversity, this effect saturates as sweeps become more common: diversity can be maintained even in populations experiencing very strong interference. Our results indicate that for some organisms the rate of adaptive substitution may be primarily recombination-limited, depending only weakly on the mutation supply and the strength of selection.


Text S4: Additive effects of multiple sweeps

Effect of multiple catastrophes
We wish to find the probability of fixation $\bar{P}$ of a new allele providing advantage $s$ when there are ongoing sweeps at linked loci, averaged over possible patterns of sweeps and genetic backgrounds. To do so, we will first consider a related problem: suppose that a sequence of instantaneous "catastrophes" occurs at times $t_1 < t_2 < \dots < t_n$, and that the probability that a mutant individual survives the catastrophe at time $t_i$ is given by $w_i$. Then the probability that a mutant allele that arises at time $t$ will fix is [79]

$$P(t) = 2s \times \begin{cases} \left[1 + c_i\, e^{-s(t_i - t)}\right]^{-1} & \text{for } t_{i-1} < t < t_i \quad (i = 1, \dots, n;\ t_0 \equiv -\infty), \\ 1 & \text{for } t > t_n, \end{cases}$$

where the $c_i$ are fixed by the requirement that the fixation probability drops by a factor $w_i$ across the catastrophe at $t_i$: writing $y_i = e^{-s(t_i - t_{i-1})}$ (so that $y_1 = 0$), we have $c_n = 1/w_n - 1$ and $1 + c_{i-1} = (1 + y_i c_i)/w_{i-1}$. We can find the net reduction in fixation probability, integrated over randomly timed catastrophes, as follows:

$$\int_{-\infty}^{\infty} \left[2s - P(t)\right] dt = 2\sum_{i=1}^{n} \log\frac{1 + c_i}{1 + y_i c_i} = -2\log\left(w_1 w_2 \cdots w_n\right).$$

(Above we have used that $1 + y_i c_i = w_{i-1}(1 + c_{i-1})$, so that the sum telescopes.) This is just the sum of the effects of each individual catastrophe, $-2\log w_i$. Therefore, the expected net reduction in fixation probability, $E[\Omega] \equiv 2s - \bar{P}$, is equal to the rate of catastrophes, $\Lambda$, multiplied by the expected effect of a single catastrophe:

$$E[\Omega] = -2\Lambda\, E[\log w].$$

We can now return to the problem of interfering sweeps. If all the interfering sweeps have selective advantage $S_i \gg s$, then their effect is nearly the same as the series of catastrophes considered above, with a sweep at distance $r$ from the focal allele having effect $w = 1 - (s/S)^{r/S}$ [79]. Assuming that the sweeps with advantage $S$ occur at rate $\Lambda_S$ and are scattered uniformly over a linear map of length $R \gg S$ (i.e., $r$ is distributed approximately uniformly), the average amount of interference is approximately:

$$E[\Omega] \approx -\frac{4}{R} \sum_S \Lambda_S \int_0^\infty dr\, \log\left[1 - (s/S)^{r/S}\right]. \tag{26}$$

(Note that there is a factor of 2 in Eq. (26) arising from the fact that sweeps cause interference to both upstream and downstream loci on the chromosome.)
If all interfering sweeps have the same selective advantage $S$ and occur at rate $\Lambda$, this simplifies to

$$E[\Omega] \approx \frac{2\pi^2}{3}\, \frac{\Lambda S}{R \log(S/s)}. \tag{27}$$

Thus the expected fixation probability of the focal allele is

$$\bar{P} \approx 2s\left[1 - \frac{\pi^2}{3}\, \frac{\Lambda S}{R\, s \log(S/s)}\right]. \tag{28}$$

This is a proof of the linearity result in [79]: that paper only established the threshold at which alleles become effectively neutral and showed the linear relation between $\bar{P}$ and $\Lambda$ numerically.
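The additivity of catastrophes can be checked numerically. The sketch below is illustrative only and rests on two assumptions that are not spelled out in the text above: that between catastrophes the fixation probability obeys the standard branching-process equation $dP/dt = -sP + P^2/2$ (so that $P = 2s$ once all catastrophes have passed), and that $P$ is multiplied by $w_i$ across the catastrophe at $t_i$. The function names, parameter values, and step sizes are hypothetical. The second function checks the map-distance integral $\int_0^\infty -\log[1 - (s/S)^{u}]\,du = \pi^2/[6\log(S/s)]$ (with $u = r/S$) that arises when $w = 1 - (s/S)^{r/S}$ is averaged over uniformly placed sweeps.

```python
import math

def integrated_reduction(times, survivals, s, tail=40.0):
    """Midpoint-rule estimate of the integral of 2s - P(t) over arrival times t.

    Assumed model (not taken verbatim from the text): dP/dt = -s*P + P**2/2
    between catastrophes, P = 2s after the last one, and P(t_i^-) = w_i * P(t_i^+).
    In terms of y = 2s/P - 1 this means y decays as exp(-s * elapsed) going
    backward in time, and 1 + y is divided by w_i at each catastrophe.
    """
    dt = 0.001 / s                                    # step size (s * dt = 0.001)
    boundaries = [times[0] - tail / s] + list(times)  # truncate the far past
    total = 0.0
    y = 0.0                                           # P = 2s after the last catastrophe
    for i in range(len(times) - 1, -1, -1):           # walk backward in time
        y = (1.0 + y) / survivals[i] - 1.0            # jump across catastrophe i
        gap = boundaries[i + 1] - boundaries[i]
        n = max(1, int(round(gap / dt)))
        h = gap / n
        decay = math.exp(-s * h)
        y_mid = y * math.exp(-0.5 * s * h)            # y at the first midpoint
        for _ in range(n):
            total += h * 2.0 * s * y_mid / (1.0 + y_mid)   # 2s - P = 2s*y/(1+y)
            y_mid *= decay
        y *= math.exp(-s * gap)                       # y just after the previous catastrophe
    return total

def linked_interference_integral(S_over_s, h=1e-4, cutoff=40.0):
    """Midpoint-rule estimate of the integral of -log(1 - (s/S)**u) over u >= 0."""
    a = math.log(S_over_s)
    n = int(cutoff / (a * h))                         # integrand ~ exp(-a*u) beyond this
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total -= math.log1p(-math.exp(-a * u)) * h
    return total

if __name__ == "__main__":
    w = [0.5, 0.9, 0.2, 0.7]
    numeric = integrated_reduction([0.0, 30.0, 45.0, 200.0], w, s=0.05)
    print(numeric, -2.0 * sum(math.log(x) for x in w))
    print(linked_interference_integral(10.0), math.pi ** 2 / (6.0 * math.log(10.0)))
```

Under these assumptions the integrated reduction does not depend on how closely the catastrophes are spaced, which is the sense in which their effects add.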

Numerical analysis
In the main text, we instead focus on the case $S \to s$ in which all alleles have the same effect. In this case, the above proof of linearity no longer applies, because the interference caused by a sweep can no longer be approximated by an instantaneous catastrophe. Instead, the amount of interference grows and decays in time approximately exponentially with $st$, peaking at $\sim 1/s$ generations before the sweep reaches frequency 1/2 (Figure S3). However, the numerical calculations below show that for small numbers of interfering sweeps the effects nearly add. Thus, as long as no more than a few sweeps are typically occurring nearby on the chromosome ($\Lambda/R \lesssim 1$), we can continue to make the approximation that their effects add. Doing so, we can write the total expected interference experienced by a new allele as

$$E[\Omega] \approx \Lambda \int dt \int dr\, p(r)\, \Omega_1(r, t), \tag{29}$$

where $\Omega_1(r, t)$ is the interference caused by a single sweep at map distance $r$ and relative time $t$, and $p(r)$ is the probability density for the map length between the focal allele and a sweep; assuming that both are distributed uniformly, it is $p(r) = 2(1 - r/R)/R$ as long as $r \ll 1/2$ (the value at which the genetic distance saturates).
$\Omega_1(r, t)$ can be found by numerically solving Eq. (5) from [55], after which the integration in Eq. (29) can also be performed numerically. The integral over $r$ is dominated by $r \sim s$ (see Figure S4), so for $R \gg s$ far-away sweeps make little contribution (besides the net effect of unlinked sweeps discussed in the main text), and we can take $R$ to infinity in the integral. In this case, $p(r)$ reduces to a uniform distribution, and Eq. (29) evaluates to

$$E[\Omega] \approx 4Z\, \frac{s\Lambda}{R}, \tag{30}$$

where $Z \approx 1.05$. Note that $Z$ does not depend on any of the population parameters. Since $\bar{P} = 2s - E[\Omega]$, Eq. (6) immediately follows.
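To illustrate how a linear reduction in fixation probability caps the rate of substitution, the following sketch assumes that Eq. (6) has the form $\bar{P} \approx 2s(1 - 2Z\Lambda/R)$ and that $\Lambda = NU\bar{P}$, so that $\Lambda/R = (\Lambda_0/R)/(1 + 2Z\Lambda_0/R)$. This form is an assumption chosen to match the limit quoted in the abstract; the function names are hypothetical.

```python
def substitution_density(baseline_density, Z=1.0):
    """Lambda/R (sweeps per Morgan per generation) from Lambda_0/R.

    Assumes Lambda = Lambda_0 * (1 - 2*Z*Lambda/R), i.e. a fixation
    probability that falls linearly with the density of sweeps.
    """
    return baseline_density / (1.0 + 2.0 * Z * baseline_density)

def generations_per_sweep_per_centimorgan(density_per_morgan):
    """Convert sweeps per Morgan per generation to generations per sweep per cM."""
    return 100.0 / density_per_morgan      # 1 Morgan = 100 cM

if __name__ == "__main__":
    for x in (0.01, 0.1, 1.0, 10.0, 1e6):
        print(x, substitution_density(x))
    # The density approaches 1/(2Z); for Z = 1 this is one sweep per cM per 200 generations.
    print(generations_per_sweep_per_centimorgan(0.5))
```

For small $\Lambda_0/R$ the density tracks the baseline, while for large $\Lambda_0/R$ it saturates, which is the recombination-limited regime described in the abstract.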

Checking the additive approximation
We can check the assumption of additivity by numerically solving Eq. (2) for the fixation probability. To do this, we must know the frequencies of the selected genetic backgrounds, $g(X)$. (Note that the number of possible backgrounds grows exponentially with the number of polymorphic loci, so it is impractical to do this for more than a few interfering sweeps.) We assume that each interfering sweep begins with a single copy in a wild-type individual, and then follows the deterministic trajectory (given by Eq. (23)). (Note that after this initialization, we allow for recombinant genetic backgrounds to be present at arbitrarily low frequencies.) The deterministic approximation underestimates the typical initial rate of increase while the sweeping alleles are at frequencies $\lesssim 1/(Ns)$ [62], but the contribution of this very early part of the trajectory to the total interference is negligible. The sweeping alleles will appear at appreciable frequency after a time $t^* \sim \frac{1}{s}\log(Ns)$. The sweeps will initially be in complete negative linkage disequilibrium; the extent of linkage disequilibrium that remains when they reach high frequency, and hence the extent to which they interfere with each other, depends on $rt^* = \frac{r}{s}\log(Ns)$. Thus, population size does have an indirect influence on the rate of adaptation, by influencing the extent of linkage disequilibrium between competing sweeps. However, numerical calculations show that in large populations, linkage disequilibrium between sweeps will typically decay to a negligible value before they reach frequencies at which they appreciably interfere with new mutations. The total interference caused by two overlapping sweeps therefore depends only weakly on the initial linkage disequilibrium between them, and in fact is close to the interference caused by two isolated (non-overlapping) sweeps.
The numerical results in Figure S5 show that pairs of sweeps do typically have a stronger effect than expected from their individual effects, but that this interaction is generally small compared to the total reduction in fixation probability.
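The dependence on $rt^*$ can be made concrete with a small calculation. The sketch below evaluates $t^* \sim \frac{1}{s}\log(Ns)$ and $rt^*$, and uses $e^{-rt^*}$ as a rough heuristic for the fraction of the initial linkage disequilibrium that survives until the sweeps become common. That exponential decay is an assumption made here for illustration, not a result from the text, and the parameter values are hypothetical.

```python
import math

def establishment_time(N, s):
    """t* ~ log(N s) / s: generations for a sweep to reach appreciable frequency."""
    return math.log(N * s) / s

def ld_exponent(r, N, s):
    """r t* = (r / s) log(N s), the quantity that controls the remaining LD."""
    return r * establishment_time(N, s)

def remaining_ld_fraction(r, N, s):
    """Heuristic only: assumes LD decays as exp(-r t) over the time t*."""
    return math.exp(-ld_exponent(r, N, s))

if __name__ == "__main__":
    s, r = 0.01, 0.01                      # hypothetical values with r ~ s
    for N in (1e4, 1e6, 1e8):
        print(N, establishment_time(N, s), ld_exponent(r, N, s),
              remaining_ld_fraction(r, N, s))
```

Because $t^*$ grows only logarithmically with $N$, the population size enters weakly, consistent with the indirect influence described above.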

Distribution of selective coefficients
We can repeat the analysis leading to Eq. (30) for a distribution $\rho(S)$ of selective advantages of the interfering sweeps. We can use the generalization of Eqs. (26) and (29):

$$E[\Omega] \approx \frac{2\Lambda}{R} \int dS\, \rho(S) \int dt \int dr\, \Omega_1(r, t; S),$$

where we have assumed that nearly all sweeps have $S \ll R$, so that $p(r) \approx 2/R$. Eq. (4) and the scaling argument in the main text imply that the total interference caused by a sweep must have the form

$$\int dt \int dr\, \Omega_1(r, t; S) = 2s\, Z_{s/S},$$

where $Z_{s/S}$ depends only on the ratio of the selective coefficients; see [55] for a more detailed discussion.
$Z_{s/S}$ can be evaluated numerically by integrating Eq. (6) of [55]. Simply gluing together the approximations in Eqs. (27) and (30) suggests that we approximate $Z_{s/S}$ by Eq. (11), which does indeed fit the numerical analysis well for $S \not\ll s$ (see Figure S6). (The numerical analysis fails in any case for $S \ll s$, since [55] assumes that the interfering sweep is not affected by the focal sweep.) For the specific case considered in the main text of mutations with an exponential distribution of effects, $\rho(s) = e^{-s/\langle s\rangle}/\langle s\rangle$, most alleles that would fix in the absence of interference have selective advantages $s \sim \langle s\rangle$, as do most of the alleles that they experience interference from (see Figures 7 and 8).
Thus, since $S/s$ will rarely be greater than $\approx 3$ for the relevant pairs of alleles, we can make the simplifying approximation that the logarithmic factor in Eq. (11) is roughly constant, and therefore $Z_{s/S} \approx S/s$.
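With this simplification, Eq. (10) for the probability of fixation becomes:

$$\bar{P}(s) \approx 2s - \frac{4\Lambda \hat{s}}{R},$$

where $\hat{s}$ is the mean selective advantage of alleles that successfully sweep. Averaging over $s$ and rearranging terms (using $\Lambda = NU\langle\bar{P}\rangle$), the mean probability of fixation is

$$\langle\bar{P}\rangle \approx \frac{2\langle s\rangle}{1 + 4NU\hat{s}/R}.$$

$\hat{s} = 2\langle s\rangle$ when interference is weak, and does not increase much even for strong interference (Figure 7); substituting in this value, we have

$$\langle\bar{P}\rangle \approx \frac{2\langle s\rangle}{1 + 4\Lambda_0/R},$$

where $\Lambda_0 = 2NU\langle s\rangle$ is the rate of sweeps in the absence of interference. In Figure 8, we consider populations where interference from unlinked loci is substantial ($v > 1/4$), and consider values of $s$ large enough that the saturation at 1 must be taken into account; we therefore use a more detailed equation for $\bar{P}$ that accounts for both effects, written in terms of $s^* = 4\Lambda\langle s\rangle/R$.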
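The value $\hat{s} = 2\langle s\rangle$ in the weak-interference limit can be checked directly. The sketch below assumes an exponential distribution of effects and a fixation probability proportional to $s - s^*$ for $s > s^*$ (zero otherwise), with $s^* = 0$ corresponding to no interference. The truncated-linear form and the function name are assumptions made for illustration.

```python
import math

def mean_sweeping_advantage(mean_s, s_star=0.0, n=100_000, cutoff=50.0):
    """Mean advantage of alleles that fix, by midpoint-rule quadrature.

    Assumes rho(s) = exp(-s/mean_s)/mean_s and a fixation probability
    2*(s - s_star) for s > s_star (hypothetical truncated-linear form).
    """
    h = cutoff * mean_s / n
    numerator = 0.0
    denominator = 0.0
    for k in range(n):
        s = s_star + (k + 0.5) * h
        density = math.exp(-s / mean_s) / mean_s
        p_fix = 2.0 * (s - s_star)
        numerator += s * p_fix * density * h     # weight each s by its chance of fixing
        denominator += p_fix * density * h
    return numerator / denominator

if __name__ == "__main__":
    print(mean_sweeping_advantage(0.01))                  # weak interference
    print(mean_sweeping_advantage(0.01, s_star=0.005))    # with a threshold
```

Under this assumed form the mean advantage of sweeping alleles is $s^* + 2\langle s\rangle$, so it shifts upward only by the threshold itself as interference increases.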