Genetic Diversity in the Interference Selection Limit

Pervasive natural selection can strongly influence observed patterns of genetic variation, but these effects remain poorly understood when multiple selected variants segregate in nearby regions of the genome. Classical population genetics fails to account for interference between linked mutations, which grows increasingly severe as the density of selected polymorphisms increases. Here, we describe a simple limit that emerges when interference is common, in which the fitness effects of individual mutations play a relatively minor role. Instead, similar to models of quantitative genetics, molecular evolution is determined by the variance in fitness within the population, defined over an effectively asexual segment of the genome (a “linkage block”). We exploit this insensitivity in a new “coarse-grained” coalescent framework, which approximates the effects of many weakly selected mutations with a smaller number of strongly selected mutations that create the same variance in fitness. This approximation generates accurate and efficient predictions for silent site variability when interference is common. However, these results suggest that there is reduced power to resolve individual selection pressures when interference is sufficiently widespread, since a broad range of parameters possess nearly identical patterns of silent site variability.


Text S4 Recombining genomes
In contrast to the asexual populations in the previous sections, relatively few methods exist for predicting the evolution of recombining genomes. The primary difficulty stems from the fact that recombination requires explicit haplotype information: the fitness of a recombinant offspring depends not only on the fitness of each parent, but also on the precise location of the mutations within each parental genome. As a result, we no longer have a clean separation of scales between the mesoscopic dynamics of fitness evolution and the microscopic dynamics of sequence evolution that proved so useful in the analysis of asexual populations. Rather, both effects must be modeled simultaneously. For similar reasons, forward-time simulations of recombining genomes are significantly more time-consuming than their asexual counterparts, even when we are only interested in evolution at the fitness level.

Recombination in the background selection regime
Because of these difficulties, most earlier work on recombining genomes falls back on the independent-sites assumption described in the main text. This avoids the haplotype problem by assuming that (on evolutionary timescales) the frequencies of individual mutations are in linkage equilibrium with each other, where they evolve with some effective population size $N_e$. In the simple model studied here, the effective population size can be calculated in the background selection limit, which yields the well-known formula $N_e = N e^{-2U/(2s+R)}$ from Eq. (6). This prediction is valid in the limit that $N s e^{-2U/(2s+R)} \to \infty$ with $NU/Ns$ and $NR/Ns$ fixed, which is similar to what we found in the asexual case. In this limit, the linkage disequilibrium between two sites separated by a fixed map length $R$ scales as which self-consistently vanishes in agreement with the independent-sites assumption above. However, for finite $N_e s$ it is known that selection causes distortions in neutral allele frequencies similar to those observed in asexual populations, although the dependence on the underlying parameters is somewhat different. A structured coalescent description has only recently been derived in this limit [42,61], and its analytical implications are still being explored [38]. A cursory reading of Ref. [61] could give the impression that the recombining structured coalescent avoids the interference issues that plagued its asexual counterpart, but we have shown in Figure S1 that this is not the case. We see that while the structured coalescent captures much of the distortion when $N s e^{-2U/(2s+R)} \gg 1$, it also rapidly diverges from simulation results near its maximum predicted deviation from neutrality, similar to the asexual case above. This leaves a broad "interference selection regime" in recombining populations as well, even when the ratio of mutation and recombination rates is not too large.
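As a quick numerical illustration of the background selection formula quoted above, read here as $N_e = N e^{-2U/(2s+R)}$, the following Python sketch evaluates the effective population size and the product $N_e s$ that controls the validity of the limit. The function names and the parameter values are illustrative assumptions and are not taken from the original analysis.

```python
import math


def background_selection_ne(N, U, s, R):
    """Effective population size N_e = N * exp(-2U / (2s + R)).

    N: census population size, U: deleterious mutation rate,
    s: selection strength, R: recombination (map) length.
    """
    return N * math.exp(-2.0 * U / (2.0 * s + R))


def validity_parameter(N, U, s, R):
    """Return N_e * s; the background selection limit assumes this is large."""
    return background_selection_ne(N, U, s, R) * s


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the trend with R.
    N, U, s = 10_000, 0.01, 0.01
    for R in (0.0, 0.01, 0.1, 1.0):
        ne = background_selection_ne(N, U, s, R)
        nes = validity_parameter(N, U, s, R)
        print(f"R={R:<5} N_e/N={ne / N:.3f}  N_e*s={nes:.1f}")
```

Setting `R = 0` recovers the asexual expression $N e^{-U/s}$, and increasing `R` moves $N_e$ back toward $N$, which is the qualitative behavior the formula is meant to capture.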

Interference and the linkage block ansatz
In order to predict the diversity in this interference selection regime, the most direct approach would be to extend the coarse-graining in Text S3 to the recombining structured coalescent. However, this direct approach is more difficult than it appears because of the tight coupling between genetic diversity and fitness evolution in recombining genomes. Even if the interference selection limit still exists in recombining genomes (for fixed $NR$), we can no longer predict the variance in fitness within these populations without first characterizing the deleterious diversity along the chromosome. In asexual populations, this calculation was crucial for connecting the interference selection regime to the proper coarse-grained model. Moreover, even if the direct approach were successful, the recombining structured coalescent is sufficiently complicated that it would provide little insight into the influence of recombination rate in these populations. This is arguably the most important goal of any theoretical analysis.
For these reasons, we eschew the direct approach here in favor of a simple heuristic argument, which trades some mathematical rigor for enhanced qualitative insight and, ultimately, better quantitative predictions. Like the ordinary independent-sites assumption, our heuristic approach is based on the fact that distant parts of the genome are effectively independent of each other. Yet this intuition cannot apply all the way down to the single-site level. Rather, evolution on sufficiently short length scales will resemble an asexual genome, where interference builds up more rapidly than recombination can act to remove it. To the extent that this transition is sharp, the evolution of a recombining genome can be viewed as a set of freely recombining linkage blocks, within which evolution is effectively asexual.
We argued in the text that the length of these blocks, $L_b/L$, must scale as where $c$ is some $O(1)$ constant. The motivation for this scaling is simply that (up to logarithmic corrections) $T_2$ is the relevant timescale over which genetic and phenotypic diversity is accumulated, and that blocks of size $L_b$ experience $\sim O(1)$ recombination events over this time period. Previous work has also shown that $L_b/L$ corresponds to the extent of linkage disequilibrium in the background selection regime [42]. For concreteness, we choose the functional form which satisfies the scaling in Eq. (ST4.2) and seems to yield good results in practice. Given this definition, we partition the genome into asexual blocks of size $L_b$ which evolve independently of each other, and whose behavior can be predicted with the asexual methods described above. In the interference selection regime, this implies that:
1. The coalescent timescale $T_2$ is set not by the total variance in fitness within the population, but rather by the fraction $\sigma^2 \cdot (L_b/L)$ that accumulates within a single linkage block.
2. The functional relationship between $T_2$ and $\sigma^2 \cdot (L_b/L)$ is given by the asexual formula Eq. (ST4.2) derived in Text S3.
3. The fractional variance $\sigma^2 \cdot (L_b/L)$ can be predicted from the asexual formula in Eq. (ST4.19), but with an effective mutation rate $U_{\mathrm{eff}} = U \cdot (L_b/L)$.
We verify these predictions in Figures ST4.1 and ST4.2 using the same forward-time simulations from Figure 4 in the main text. We see that our simple approximation is surprisingly accurate: $U \cdot (L_b/L)$ determines $\sigma^2 \cdot (L_b/L)$, and $\sigma^2 \cdot (L_b/L)$ in turn determines $T_2$. Since $L_b/L$ is itself a function of $T_2$, we obtain a closed system of equations and a self-consistency condition for the linkage scale: where $T_2/N$ is given by Eq. (ST4.2) and $N\sigma(NU, Ns)$ is given by Eq. (ST4.19). In the limit that $T_2 R \gg 1$ and $\sigma^2 \ll Us$, this simplifies to where $N\sigma_{0,\mathrm{eff}} = \sqrt{NU (Ns)^2 (L_b/L)}$ is the effective control parameter for the interference selection regime on the linkage block. This implies that any two populations with the same value of $U/R \cdot \langle (Ns)^2 \rangle$ should possess the same patterns of synonymous diversity, on average. This quantity has a natural interpretation as the fitness variance within the typical LD scale that would be obtained if fitness were not a selected trait. When $N\sigma_{0,\mathrm{eff}} \gg 1$, we can employ the asymptotic formulae in Eqs. (ST4.17) and (S4.6) to simplify these expressions even further. Up to logarithmic corrections, we find that which are only weakly dependent on $N$. This gives some intuition for the scaling behavior, but many biologically relevant parameters lie outside this asymptotic regime and therefore require a numerical solution of Eq. (ST4.4) to calculate $L_b/L$ (see Methods). Once $L_b/L$ is determined, we can generate predictions for the site frequency spectrum by applying our asexual coarse-grained model for the parameters $NU \cdot (L_b/L)$ and $Ns$. As a side benefit, our calculation of $\sigma^2$ gives a novel prediction for the rate of Muller's ratchet in sexual populations, which has important implications for the evolution of sex and genome architecture [87].
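The self-consistent determination of $L_b/L$ described above can be sketched numerically in Python. This is only a sketch under stated assumptions, not the authors' implementation. The interpolating form `block_fraction` is a hypothetical choice that reduces to $c/(R T_2)$ when $R T_2 \gg 1$, which is one reading of the "$O(1)$ recombination events per block" argument. The function `toy_t2` is a stand-in that uses the asexual background selection value $N e^{-U_{\mathrm{eff}}/s}$ in place of the interference-regime formulas of Text S3, which are not reproduced here. All function names and parameter values are illustrative.

```python
import math


def block_fraction(R, T2, c=1.0):
    """Hypothetical form for L_b/L: ~ c/(R*T2) for R*T2 >> 1, and 1 as R -> 0."""
    return 1.0 / (1.0 + R * T2 / c)


def toy_t2(N, U_eff, s):
    """Stand-in for the asexual T2 formula (assumption): N * exp(-U_eff / s)."""
    return N * math.exp(-U_eff / s)


def solve_block_fraction(N, U, s, R, t2_func=toy_t2, c=1.0, tol=1e-12):
    """Find a root x = L_b/L of x = block_fraction(R, T2(U*x)) by bisection.

    The block sees the effective mutation rate U_eff = U * x. Bisection
    returns one root in [0, 1]; uniqueness depends on the supplied t2_func.
    """
    def g(x):
        return x - block_fraction(R, t2_func(N, U * x, s), c)

    lo, hi = 0.0, 1.0  # g(0) < 0 and g(1) >= 0 for any finite, positive T2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def n_sigma0_eff(N, U, s, x):
    """Effective control parameter sqrt(NU * (Ns)^2 * (L_b/L)) on one block."""
    return math.sqrt(N * U * (N * s) ** 2 * x)


if __name__ == "__main__":
    # Hypothetical parameters, chosen so that the toy T2 gives a single root.
    N, U, s, R = 10_000, 0.01, 0.01, 0.1
    x = solve_block_fraction(N, U, s, R)
    print("L_b/L        =", x)
    print("N*U*(L_b/L)  =", N * U * x)
    print("N*sigma0_eff =", n_sigma0_eff(N, U, s, x))
```

Passing a different callable as `t2_func` is the intended extension point: once the asexual relations for $T_2$ are available, they can replace the toy stand-in without changing the solver.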
While the accuracy of the linkage block approximation is encouraging, some small systematic errors remain. These are already apparent from Figure 4, where each value of $NR$ appears to collapse to a slightly different curve, despite the fact that the collapse within each value of $NR$ is quite good. These errors are likely caused by a crucial factor we neglected in our original analysis: distant regions of the genome may be independent, but they still influence each other's evolution through a reduction in the effective population size [46,67,71,72]. Fitness variation at a distant locus represents effectively non-heritable variance in offspring number when the time $T_r$ between successive recombination events satisfies $T_r \ll T_{\mathrm{MRCA}}$, which would occur if the loci were located on different linkage blocks. Existing studies have focused on these effects at the level of individual sites, but extending this intuition to the linkage blocks, we might expect corrections to $N_e$ which depend on products of the form where $n$ represents the distance measured in the number of linkage blocks. The power-law decay with $n$ suggests that even relatively distant blocks contribute to the reduction in $N_e$, similar to the background selection limit. This is consistent with the qualitative observation that longer genomes (i.e., larger values of $NR$) have a shallower "distortion vs diversity curve" in Figure 4, since they have more linkage blocks to contribute to the reduction in $N_e$, and therefore, the reduction in $\pi/\pi_0$. Unfortunately, quantitative predictions of $N_e$ are difficult, since the transition between the effectively asexual and effectively unlinked regimes is not sufficiently sharp to apply existing theory. A more detailed analysis of distant linkage blocks remains an important avenue for future work.