Spatial soft sweeps: Patterns of adaptation in populations with long-range dispersal

Adaptation in extended populations often occurs through multiple independent mutations responding in parallel to a common selection pressure. As the mutations spread concurrently through the population, they leave behind characteristic patterns of polymorphism near selected loci—so-called soft sweeps—which remain visible after adaptation is complete. These patterns are well-understood in two limits of the spreading dynamics of beneficial mutations: the panmictic case with complete absence of spatial structure, and spreading via short-ranged or diffusive dispersal events, which tessellates space into distinct compact regions each descended from a unique mutation. However, spreading behaviour in most natural populations is not exclusively panmictic or diffusive, but incorporates both short-range and long-range dispersal events. Here, we characterize the spatial patterns of soft sweeps driven by dispersal events whose jump distances are broadly distributed, using lattice-based simulations and scaling arguments. We find that mutant clones adopt a distinctive structure consisting of compact cores surrounded by fragmented “haloes” which mingle with haloes from other clones. As long-range dispersal becomes more prominent, the progression from diffusive to panmictic behaviour is marked by two transitions separating regimes with differing relative sizes of halo to core. We analyze the implications of the core-halo structure for the statistics of soft sweep detection in small genomic samples from the population, and find opposing effects of long-range dispersal on the expected diversity in global samples compared to local samples from geographic subregions of the range. We also discuss consequences of the standing genetic variation induced by the soft sweep on future adaptation and mixing.

Here we describe the analytical forms for (t) used to compute the predictions for the characteristic length scale χ in main text Fig. 4. Ref. 1 derived asymptotic growth forms for the long-time limit of the domain core (t) (i.e. the region within which the occupancy of the range by an isolated domain is of order 1) for dispersal kernels with tails that fall off as r −(µ+d) : A exp(Bt η ), 0 < µ < d, A exp log 2 (Bt) 4d log 2 , µ = d, Here, η = log 2 [2d/(d + µ)], and A and B are magnitude scales for and t that depend on µ and on details of the dispersal kernel. (In the wavelike growth regime, µ > d, A is the front velocity of the growing domain.) The logarithmic correction to linear growth for µ = d + 1 is a conjecture for d = 2, which is supported by simulation data.
To extract A and B for the specific kernels used here, we performed separate simulations in which domains were grown from a single seed at the origin at t = 0. The domains were grown up to final masses of order 10 8 for µ ≤ 1 and 10 5 for µ > 1 in 1D, and of order 10 7 in 2D, with the background mutation rate turned off. For each value of µ, 20 independent simulations were performed and the mass evolution over time, averaged over the independent runs, was equated to ω d d (t) following our definition of (t) in the main text. The (t) thus extracted was fit to the growth forms to obtain A and B. (Given that the growth of with t can be extremely fast for µ < d + 1, in practice we fit the functional dependence of log (t) against log t, with log A and log B as free parameters.) Using the total mass as a proxy for (t) leads to an overestimate of the true size of the core, because it also counts individuals in the inevitable "halo" that exists due to jumps from the core to regions outside it during the stochastic growth process. The halo contains a fraction of the individuals in the core, which falls as µ increases. This correction is expected to provide a multiplicative constant of order 1 to (t), which is inconsequential to the prediction of X ave which itself equals χ only up to an overall constant for each µ.
The asymptotic forms only agree with the measured single-allele growth profiles when (t) has grown beyond a certain size. However, this threshold size becomes extremely large (i.e. order of the simulation range or larger) for values of µ close to d [1], making the asymptotic forms of limited utility to predict χ. Ref. 1 also derives an analytical scaling form for the behaviour of log 2 (t) over a much broader range of times for µ close to d, which reads where δ = µ − d and As before, we used fits of log (t) against log t to obtain the parameter values log A and log B. From our fits to the single-allele growth simulations, we found that the scaling form is significantly more accurate than the asymptotic forms of Eq. A1 for µ ≤ 1.4 in 1D, and µ ≤ 2.6 in 2D (except fo the marginal value µ = d in each case). As a result, we use the scaling form for our predictions of χ for these values of µ. Table I summarizes the values of log A and log B extracted from fits to the theoretical forms in Eqs. A1 and A2 as appropriate. In all cases, the forms for log (t) with fitted values for A and B are accurate to within a few percent for (t) of order 20 and larger. The inaccuracy of (t) for smaller domains leads to discrepancies between the measured average clone size and the prediction based on χ d for large µ and high rescaled mutation rates, which drive down the average clone extent into the regime of inaccurate (t).
Once A and B are determined from the fit either to Eq. A1 or Eq. A2, the relation defining the characteristic length, Eq. 1 (main text), is solved to obtain t * (u) and χ µ (u) = µ (t * ). Table 1 in the main text reports the functional forms for χ derived upon assuming that (t) follows the asymptotic forms. When the more complex scaling form is used for (t), Eq. 1 in the main text can still be solved to obtain an analytical solution for χ(u) in terms of Lambert W -functions. For each dispersal kernel, the solution χ µ (u) is analytically determined taking only µ, and the values of A and B estimated from fits (as reported in Table I) as inputs.
The characteristic length scale χ quantifies the balance between domain growth and mutations that sets the average domain size via X ave ∝ χ d up to a multiplicative constant of order 1; the precise relationship between χ d and X ave is determined by the distribution of domain sizes about the characteristic size, which is in turn established by the complete growth dynamics.
We have an explicit form for the domain size distribution in the constant-velocity wavelike growth regime in 1D, µ > 2 (Eqs. A5 and A6), which allows us to derive X ave = 2 2/πχ ≈ 1.6χ in this regime. For the 1D results in Fig. 4, we find that multiplicative constants close to 1.6 also lead to agreement between X ave (u) and χ(u) for other values of µ, over many orders of magnitude of u. The agreement is weakest for high u which corresponds to small domains (average clone sizes of 100 or smaller); here the functional forms of (t) are least accurate and stochastic effects begin to dominate the deterministic growth implied by (t).

B. SIMULATION RESULTS IN 2D
Here, we describe preliminary results for average clone mass, clone extent, and frequency spectra as measured from 2D simulations. Simulating large ranges is a challenge in two dimensions: effectively simulating a system in which key jumps are of order l in length requires a range with over l 2 demes (in contrast to l demes in 1D). We have succeeded in simulating ranges of linear size L = 4096 (hence 4096 2 ≈ 1.6×10 7 demes), and restricted ourselves to a range of mutation rates for which the total range mass is many times the average clone mass, so that we are in the regime of multiple-origin sweeps. However, we still expect finite-size effects to be significant for measures that depend on the spatial extent of the halo, which The average clone mass measured from 2D simulations as a function of rescaled mutation rate, scaled by the expected dependence (∝ũ 2/3 ) for wavelike growth. Each point represents an average over 48 independent simulations and error bars denote measured standard deviations across repetitions. Dashed lines show the theoretical prediction πχ 2 , using χ = χµ(ũ) functions described in Appendix A. Each theory line is multiplied by a µ-dependent magnitude factor whose value is 0.8 for µ < 3, 0.75 for µ = 3, and 0.73 for µ > 3.
can stretch out to many times the mass-equivalent radius for small µ. Fig. A1 compares the average clone size to the theoretical expectation πχ 2 , where the functions χ µ (ũ) are described in Appendix A. As with the 1D results, we find quantitative agreement with the theory lines upon using a single additional parameter -an overall magnitude scale which varies between 0.75 and 0.8. Fig. A2 reports the spatial extent of the clones from the two largest mutation rates, for which finite size effects are smallest. In 2D, we define the extent in terms of the eighth central moment: , where i indexes the demes belonging to that clone, r i is the position vector of deme i (computed modulo L/2 for each component to account for periodic boundary conditions), and r cm ≡ ( X i=1 r i )/X is the clone center of mass. The use of a high moment in the definition of r max ensures that the farthest demes from the centre of mass contribute strongly to r max even if they are rare. The specific choice of the eighth moment balances the need to emphasize the farthest demes (which favours a high moment) with the necessity of preventing loss of floatingpoint precision in the computation (which requires that FIG. A2. Spatial extent of clones in 2D simulations.
Ensemble-averaged spatial extent of clones in 2D, normalized by the ensemble-averaged mass-equivalent radius. See text for definition of rmax in 2D. Dashed lines show theoretical expectations ζ/χ and ψ/χ forũ = 1e − 6, computed as described in Appendix A. The prefactor was chosen so that the lines coincide with the simulation data point at µ = 3. Finite size effects are more severe in 2D, and the measured values for µ < 2 underestimate the true values that would be measured in an infinitely large range.
the moment not be too high). Using the sixth moment leads to similar results. By contrast, using too low a moment (such as the second moment, which provides the radius of gyration of the clone) gives values of r max that are very close to r eq since the core provides the major contribution. We find that the dependence of the ensemble-averaged extent on the dispersal kernel is well captured by the length scale ψ in the regime of power-law growth in 2D, 2 < µ < 3, with a single additional parameter setting the overall magnitude scale. We note that the asymptotic ratio ψ as /χ as , which was successful in reproducing r max for the 1D data, does not agree with the 2D simulation data for the current parameter range. This is because the typical sizes of clones in the 2D simulations is too small for the asymptotic growth rule ( (t) ∼ t 1/(µ−d) ) to be accurate. Instead, the scaling solution from Appendix A, which accurately captures the growth of single clones at the relevant size scales, must be used.
As was seen with the 1D data, the extent starts to depart from ψ as µ → d, consistent with an increased prominence of rare jumps out of the core region that land beyond well-established satellite clusters. However, the measured extent remains far below the theoretical bound ζ/χ, which grows extremely fast as µ falls below 2. We hypothesize that the ensemble averages are severely limited by finite-size effects; to attempt to match the theoretical expectation for µ = 1, for instance, we would require range sizes over an order of magniture larger in linear size, beyond our current capabilities for 2D sim-ulations. Nevertheless, our limited simulations confirm that clones can attain a spatial extent many times larger than their mass-equivalent radius as the dispersal kernel is broadened.
To summarize, the results from preliminary 2D simulations show quantitative evidence for the relevance of the length scale χ, when combined with theoretical predictions for (t) from Ref. 1. The simulations also show that the halo can extend over much longer distances than expected for compact clone, with evidence for the relevance of the length scale ψ obtained from the core-halo picture in the power-law growth regime d < µ < d + 1. However, more extensive simulations with much larger range sizes are needed to quantitatively test the relevance of the second scale ζ.
Here, we provide an alternative estimate for the length scale ψ that sets the extent of the halo of a "typical" clone, which agrees with the estimate ψ = (2t * ) proposed in the main text. The iterative scaling picture of Ref. 1 argues that, for growth in the marginal regime near µ = d, key jumps that land at a distance (t) from the mutational origin typically occurred around time t/2 and spanned a distance of roughly (t) connecting source and target regions each of size ∼ d (t/2) (Fig. 2b). The core extent at a given time constrains the expected number of these key jumps that have contributed to the core boundary by that time: they can be neither too rare (in which case the core would not have reached the purported boundary) nor too common (which would imply that the region should have been filled much earlier). Since the number of key jumps is itself set by the extent of the core (the source for the jumps) together with the jump kernel, the above constraint equates to a self-consistency requirement on (t) [1]: where G(r) = J(r)r 1−d /ω d is the rate of jumps per unit area of source and target regions when both are separated by a distance r. In the soft-sweep model, key jumps compete with new mutations in the target region, which occur at a rate of orderũ d (t/2). The growth of the halo is obstructed by new clones when the rate of mutations arising in the target region becomes comparable to the rate of key jumps into it from the expanding core. This requires (A3) Up to factors of order unity, the above scaling relation is satisfied by t = 2t * , where t * was the solution to Eq. 1. Therefore, we arrive at the same expression, ψ ≡ (2t * ), for the characteristic halo extent as we had derived in the main text from considerations of the jump-driven growth of unobstructed clones. The panmictic limit in our lattice model would correspond to jumps being attempted from the source deme to a randomly chosen deme in the entire range. The allele frequency spectrum and related sampling probabilities can be computed exactly in this limit by mapping to an urn process. To see this, consider the evolution of allele frequencies in our lattice model when the fraction of wildtype sites is w and mutants occupy the lattice with individual fraction f i for mutant i. At the next time step, the probability weight associated with picking a wildtype site to introduce a new mutation isũ × N w = θw, where θ =ũN is the initial mutation rate for the empty lattice. By contrast, the probability weight associated with picking a site of mutant type i for an attempted dispersal event is N f i , but only a fraction w of these attempted dispersal events is successful since the mutant only fixes in the target deme if it contains the wildtype. Therefore the probability weight of a successful reproduction of mutant i is N wf i . The final statistics of clone sizes is determined by the relative rate of mutation to reproduction at each time step [2] (unlike the times for the appearance of new clones which depends on the absolute rates), which is θ versus n i = N f i at all times since the wildtype fraction drops out.
The genealogy of new mutants in this model is identical to that of a stochastic process called Hoppe's urn [3], which begins with an urn containing a single black ball with an assigned probability weight θ. At any time step, a ball is picked from the urn with probability proportional to its weight. If the black ball is chosen, it is returned along with a ball with a new colour and probability weight 1 (a new mutant). If a coloured ball is chosen, it is returned along with one copy of itself. The relative rate of mutation to the duplication of a ball with colour i is θ versus n i at each turn, thus establishing the equivalence to our lattice model. The distributions of mutant frequencies in this urn model are the same as those for the infinite allele model at equilibrium [4]. In particular, the allele frequency spectrum is Fig. 6 shows that panmictic simulations reproduce the theoretical limit, which also persists for µ ≈ 0.5 in two dimensions.
The average clone size in the panmictic limit can be obtained from the allele frequency spectrum by computing the expected number of distinct clones n c . The smallest possible clone frequency is 1/N . Therefore, the expected number of distinct clones, n c , is the sum of all allowed allele frequencies, i.e. n c = 1 1/N f (x) dx, which can be evaluated exactly using f (x) from Eq. A4. For large N , we have n c ≈ũ[−1 + θ + N log N − N (γ + ψ 0 (θ)], where γ is the Euler-Mascheroni constant and ψ 0 is the digamma function. A further simplification, valid for θ 1, is n c ≈ θ log(1/ũ) [5]. Once n c is computed, the average clone size is N/n c .
Note that a mapping of the parallel adaptation model to an urn process was also identified in preprint [6].

ii. Wavelike spreading limit in 1D
For µ > d + 1, domains are predicted to grow in radially expanding waves, whose speed depends on the details of the dispersal kernel. The statistics of soft sweeps in this limit was previously explored by Ralph and Coop [7], who observed the equivalence of the process in the wavelike limit to Kolmogorov-Johnson-Mehl-Avrami (KJMA) models of grain growth. KJMA models track the evolution of isotropic domains which nucleate at random positions in space at a constant rate. Nucleated domains grow isotropically at a constant front velocity until they run into other domains, leaving a boundary separating domains that nucleated at different origins. The final pattern of domains matches the spatial pattern of clones in the mutation-expansion model, where individual domains correspond to distinct mutants.
In one dimension, the final grain size distribution for a KJMA process in which each nucleation gives rise to a unique domain is known exactly [8]. Using this result, we obtain the allele frequency spectrum for wavelike growth in 1D (µ > 2) as where χ = v/2u is the characteristic length scale for domains growing with front speed v, and where erf is the error function. The result is valid as long as the domain sizes are not limited by the range size, i.e. L χ. The front velocity for arbitrary µ > d + 1 is not known analytically, but its limiting value for very large µ in the lattice model is known. In the limit µ d + 1, practically all attempted jumps land exactly one lattice site away from the source (this is the lower cutoff for allowed jump distances). Isolated domains grow via jumps from the demes situated at the edges, only half of which are successful in advancing the front (the other half land on the occupied side of the front and have no effect). Therefore, the front velocity is 1/2 a lattice site per generation in the large-µ limit. The frequency spectra for µ > d + 1 approach this limit as µ increases, see Fig. 6. We can also extract the µ-dependent front speed by a one-parameter fit of Eq. A5 to the observed frequency spectra, and obtain consistent results when performing fits at different values of the mutation rate for any given µ, as shown in Fig. A3.

E. DETERMINISTIC APPROXIMATION TO ALLELE FREQUENCY SPECTRA IN 1D
The analysis of the panmictic limit in the main text revealed that the distribution of alleles as µ → 0 was identical to that of Hoppe's urn process. The continuoustime analogue of Hoppe's urn is the Yule process with immigration, in which new alleles enter the population as a Poisson process with rate θ, and already-present individuals give birth to offspring at rate 1 without death. Yule's process generates the same distribution of allele sizes as Hoppe's urn, but the continuous-time description has the advantage that the dynamics of different alleles are independent: the population of allele i at time t is proportional to e t−ti where t i was the time at which it entered the population. Statistical properties of the allele frequencies, such as the frequency spectrum f ∞ (x), can be derived efficiently within this viewpoint.
In our simulations, the growth rate of alleles is not constant over time even if we assume panmictic migration; the success of each birth event is proportional to the wildtype fraction w which falls as the simulation progresses. However, as we saw in the main text, the mapping to Hoppe's urn/Yule process remains exact because the rate of generation of new alleles is also proportional to w and the relative rates of birth and migration remain constant throughout the duration of the simulation in the panmictic limit. This is no longer true for µ > 0 when domains grow somewhat contiguously, because the likely targets for migrants become correlated with the occupancy of the lattice and the reduction in growth rate may not simply be given by the fraction w. If we ignore these correlations, we arrive at the following approximate continuous-time model for the establishment and growth of mutant clones: new alleles enter the population at a constant rate θ, and grow according to the growth rule (t) for the particular dispersal kernel, without interference from other clones.
We can make analytical headway if we further assume that the arrival of new alleles is deterministic rather than Poisson: the kth allele enters the population at time t k = k/θ, and hence the size of the kth clone is n k = (t−k/θ). The total number of alleles, K, is fixed by the range size: N = K k=1 n k . In this deterministic model, the strict time ordering of alleles implies that there are k alleles with size greater than or equal to n k ; i.e. if we can invert the n k relation to get k(n k ), this is the survival function associated with the probability distribution of n k and hence x = n k /N . The probability distribution of x is precisely the allele frequency spectrum up to a normalization.
Below, we summarize the outcome of computing f (x) according to this deterministic approximation upon using the asymptotic functional forms for (t) in the different regimes in 1D, summarized in Table I.

i. Power-law growth
The deterministic approach can be used to compute an approximate frequency spectrum for the growth form (t) = At 1/(µ−1) , which is the asymptotic growth rule for 1 < µ < 2. In this case, we have a frequency spectrum that decays as a power law: f (x) ∼ x µ−2 , up to a hard cutoff at a maximal value determined by the value of K that fills the entire range. Furthermore, the form admits a rescaling that ought to collapse frequency spectra across different system sizes and mutation rates: f (x) = (L/X ave ) 2 F (Lx/X ave ), where F (y) = y µ−2 up to the cutoff y max = µ/(µ − 1), which is the same as Eq. 4 in the main text. Fig. A4 shows that the collapse works very well across different mutation rates and two system sizes. The predicted power law for f (x) is nearquantitative for all µ except µ = 1.2, which is too close to the marginal case µ = 1 for the asymptotic growth rule to be relevant. The predicted cutoff frequency captures the rough location of the dropoff in f (x), but the deterministic approximation fails to capture the "soft shoulder" or the clones at very large frequency, which may have an outsize influence on sampling statistics. Note that the deterministic approximation predicts a flat frequency spectrum f (x) = const. for linear growth (t) = vt, whereas the exact result for wavelike growth in 1D from the Axe and Yamada results, which we have seen to be quantitatively accurate for µ 2, predict a linear increase in the power spectrum f (x) ∝ x for small x. The difference is due to the fact that the deterministic approximation assumes that growth happens symmetrically toward both the left and the right at all times, whereas the wavelike growth limit is characterized by the left and right edges of the domain being inter-rupted independently as they run into other domains, so that one edge always advances for longer than the other. We can also explicitly include the log t correction to linear growth exactly at µ = 2, and we find that the low-x behaviour is unaffected (i.e. f (x) ∼ const. as x → 0) but there are contributions at higher x. These arise in the "shoulder" region of the spectrum, which is not captured by the deterministic analysis.

ii. Marginal growth
If we use the growth form for µ = 1 in the deterministic calculation, we no longer get a simple power law for f (x); the functional form is instead where a and b depend on the prefactors associated with (t) and on θ and K. This form is not a strict power law in x. However, when the various coefficients are computed using the full expression for (t) measured from the growth of single domains (Appendix A), we find that f (x) behaves similar to a power law over a wide range of n k , with an effective exponent between -0.65 and -0.85. Using the same rescaling as for the power-law growth for the measured f (x) gives reasonable collapse over a range of values of u and L (Fig. A5) with a power law decay f (x) ∼ x −0.72 . We note that f (x) measured from simulations appears closer to a power-law form for x → 0 than the deterministic approximation.

iii. Stretched exponential growth
In the stretched-exponential growth regime µ < d, the rescaling of the frequency spectra for a specific kernel proposed in Equation 4 is no longer exact. The rescaling assumed that χ set all length scales in the problem; this was true for power-law growth because the halo-dependent scales ψ and ζ were proportional to χ (with proportionality factors that depended only on µ and not on χ). By contrast, for stretched-exponential growth the additional length scales depend on the average clone sizes and hence onũ. However, Fig. 6 showed that the rescaling captured much of the variation in f (x) across two well-separated mutation rates, down to µ = 0.4.
Although we could compute approximate frequency spectra using the deterministic calculation outlined above, they are less revealing in this regime. Instead, we gauge the inaccuracy of the proposed scaling in the panmictic limit µ → 0 where we know the exact frequency spectrum f ∞ . When Nũ = θ 1, we have X ave ≈ −1/(ũ logũ) in the panmictic limit. Using this result and the form for f ∞ in Eq. 4, we find that 10 5 10 4 10 3 10 2 10 1 10 0 10 1 10 2 Lx/Xave  where y = N x/X ave and we have used θ 1 in the second step. We find that the function after rescaling has a residual dependence on logũ, both in the overall magnitude and in the value y c ∼ logũ of the dropoff in f . The gentle logarithmic correction implies that the proposed rescaling still captures much of the variation with mutation rates for a given kernel, even ifũ is varied by orders of magnitude, thus explaining the decent collapse of curves at different mutation rates in Fig. 6 even for µ < d.

F. ALLELE FREQUENCY SPECTRA WITH A HARD CUTOFF
The measured allele frequency spectra display a powerlaw behaviour f (x) ∼ x p , p > −1 for small values of x. For cores growing as contiguous domains, balancing growth and mutation rates gives rise to a characteristic linear domain size χ (and corresponding clone size χ d ) for domain growth before a cone encounters a new mutation. In a finite range of size L d , such growth would imply an upper bound on the allowed allele frequency at some value x c ∼ (χ/L) d . These observations suggest the ansatz for the allele frequency spectra introduced in the main text: where the prefactor is determined by the normalization condition 1 0 x f (x)dx = 1. This ansatz ignores contributions from higherfrequency clones, which are clearly significant especially for small values of µ. We can evaluate the significance of these contributions by comparing measured quantities to expectations from the hard-cutoff ansatz below.
The average clone size X ave ≡ N/n c = N/ f (x)dx can be evaluated for all p > −1 as The sampling probability of observing only one allele in a sample of size j evaluates to which deviates weakly from the exponential falloff P hard = x * j−1 expected if all clones are of the same size and hence the same frequency x * .
For 1D wavelike growth with constant front velocity, Ref. [8] provides the exact form for the allele frequency spectrum, Eqs. A5-A6. The probability of observing only one allele in a random sample of size j is then The latter integral cannot be evaluated in a closed form, even when we consider L/χ 1 so that the upper limit can be replaced by s = ∞. However, by tracking the position of the maximum value of the integrand which occurs at s ≈ √ j, and using Laplace's method to approximate the integral, we arrive at ∞ 0 s j p(s) ds ≈ 2j j/2 p( √ j), which provides a correction to the leading contribution ( √ 2χ/L) j−1 to P hard . The resulting approximate expression, P hard ≈ 2( √ 2χ/L) j−1 j j/2 p( j), is used in the dash-dotted line in Fig 7 of the main text. Note that the approximation is only valid when the maximum value of the integrand lies below the upper integration limit; i.e. for j < L 2 /(2χ 2 ). For larger values of j, P hard is dominated by the upper limit, and scales as ( √ 2χ/L) j−1 × (L/( √ 2χ)) j p(L/( √ 2χ)) which is independent of j; i.e. the probability of detecting a hard sweep ultimately levels off for sufficiently large j.