
The effect of long-range linkage disequilibrium on allele-frequency dynamics under stabilizing selection

  • Sherif Negm,

    Roles Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America

  • Carl Veller

    Roles Conceptualization, Formal analysis, Writing – original draft, Writing – review & editing

    cveller@uchicago.edu

    Affiliations Department of Ecology & Evolution, University of Chicago, Chicago, Illinois, United States of America, National Institute for Theory and Mathematics in Biology, Northwestern University and University of Chicago, Chicago, Illinois, United States of America

Abstract

Stabilizing selection on a polygenic trait reduces the trait’s genetic variance by (i) generating correlations (linkage disequilibria) between opposite-effect alleles throughout the genome, and (ii) selecting against rare alleles at loci that affect the trait, eroding heterozygosity at these loci. Here, we show that the linkage disequilibria, which stabilizing selection generates on a rapid timescale, slow down the subsequent allele-frequency dynamics at individual loci, which proceed on a much longer timescale. Exploiting this separation of timescales, we obtain expressions for the expected per-generation change in minor-allele frequency at individual loci, as functions of the effect sizes at these loci, the strength of selection on the trait, its variance and heritability, and the linkage relations among loci. Using whole-genome simulations, we show that our expressions predict allele-frequency dynamics under stabilizing selection more accurately than the formulae that have previously been used for this purpose. Our results have implications for understanding the genetic architecture of complex traits.

Author summary

Stabilizing selection—selection for optimal trait values—is likely pervasive across humans and other species. Its phenotypic effect is to reduce trait variance, and it achieves this genetically by favoring compensating combinations of trait-increasing and trait-decreasing variants throughout the genome, generating correlations between them, and by selecting against rare variants at individual loci. We show that the correlations generated by stabilizing selection slow the rate at which it purges rare variants. We characterize this effect mathematically, and show via simulations that the expressions we derive for the frequency dynamics at individual loci are accurate. Our results make possible more precise detection and quantification of stabilizing selection in genomic data.

1 Introduction

To understand the genetic architecture of polygenic traits, we need to connect population genetic models with genomic data such as those from genome-wide association studies (GWASs). For many traits, a particularly plausible model is stabilizing selection, which penalizes deviations from an optimal trait value. Theoretical argument and empirical evidence indicate that many complex traits are under stabilizing selection [1]. For example, in humans, Sanjak et al. [2] used lifetime reproductive success as a proxy for fitness and estimated significant nonlinear selection gradients consistent with stabilizing selection for a large fraction of the traits they analyzed. Furthermore, a number of studies have demonstrated for many human traits that the joint distribution of allele frequencies and effect sizes (inferred from GWAS) is consistent with stabilizing selection [3–5] (see also [6]).

The macroscopic consequence of stabilizing selection on a complex trait is to reduce the trait’s genetic variance over time. This is achieved in two ways. First, by selecting for compensating combinations of trait-increasing and trait-decreasing alleles, stabilizing selection rapidly generates negative correlations—linkage disequilibria (LD)—between alleles with the same directional effect on the trait [7] (this has come to be known as the ‘Bulmer effect’). Second, stabilizing selection generates weak selection against the rarer allele at each polymorphic locus affecting a complex trait, slowly eroding heterozygosity at these loci [8,9].

This second variance-reducing effect—which mimics fitness underdominance at trait-affecting loci—can be understood intuitively in the following way (see also Section 3.1 below). Consider a focal polymorphic locus affecting the trait, and suppose that the common allele at the locus is trait-increasing, so that the mean effect of the locus is to increase the trait. For the population’s average trait value to be at the optimum, the mean effect of the rest of the genome must compensate for the mean effect at the focal locus, and must therefore be trait-decreasing. But this leaves the rare allele at the focal locus—which is also trait-decreasing—maladapted to the rest of the genome: on average, it resides in individuals with trait values further from the optimum than the common allele at the locus does.

The speed of the resulting allele-frequency dynamics depends, as the intuition above suggests, on the mean phenotypes experienced by the trait-increasing and trait-decreasing alleles at the focal locus. These mean phenotypes are determined not only by the individual effects of the alleles themselves in concert with the mean effect of the rest of the genome, but also by the effects of alleles with which the alleles at the focal locus are associated via LD. Because of the Bulmer effect, alleles at the focal locus will tend to be associated with opposite-effect alleles elsewhere in the genome, partially masking the individual effects of the alleles at the focal locus. This brings the mean phenotypes experienced by the alleles at the focal locus closer to the optimum, weakening selection between them and therefore slowing down their frequency dynamics.

Here, we quantify the slowdown in allele-frequency dynamics at individual loci caused by the Bulmer effect. Exploiting a separation of the timescales over which stabilizing selection generates LD (rapidly) and changes allele frequencies (slowly; S1 Fig), we obtain simple expressions for the expected change in an allele’s frequency across a single generation under stabilizing selection, as a function of its individual effect, the strength of stabilizing selection on the trait, the trait’s variance and heritability, and the linkage relations among loci in the genome. As we show, the expressions that we derive predict allele-frequency change in simulations more accurately than other expressions that have commonly been used.

There is a deep theoretical literature on the population genetics of stabilizing selection, and some of the conclusions that we reach echo results from earlier work (see especially [10–14] and syntheses in [15–17]). For example, assuming a normal distribution of allelic effects on a trait under stabilizing selection, Lande [10] and Turelli & Barton [13] found that the linkage relations among loci do not affect the trait’s additive genetic variance at equilibrium. This can be understood as reflecting a balance between two consequences of the stronger negative LD that accumulates with tighter linkage under stabilizing selection. First, the stronger negative LD, as one component of the trait’s genetic variance, directly reduces the genetic variance by a greater amount. Second, the stronger negative LD more severely slows down the allele-frequency dynamics at individual loci, allowing the rarer allele at each locus to be maintained at a higher mutation–selection balance in expectation, and thus indirectly increasing the other component of the trait’s additive genetic variance: the variance contributed by polymorphism at individual loci (the ‘genic variance’).

However, as in the example above, previous treatments have usually focused on the consequences of the Bulmer effect for the equilibrium values of aggregate quantities such as the genetic and phenotypic variance of the trait under selection (but see [18]). For many applications, though, it is important to characterize the per-locus allele-frequency dynamics themselves. For example, the joint distribution of allele frequencies and effect sizes is an important summary of a trait’s genetic architecture and, under stabilizing selection, is determined by the allele-frequency dynamics at individual loci [3–5]. Additionally, because the allele-frequency dynamics under stabilizing selection are very slow, analyses of their equilibria implicitly assume long-term constancy of the strength of stabilizing selection, which may be inappropriate for many traits (e.g., [19]; see Discussion). It might therefore be useful to understand the impact of the Bulmer effect on allele-frequency dynamics outside equilibrium scenarios. Finally, the theoretical literature on stabilizing selection is often highly technical. While this has allowed very general results to be obtained, it has also perhaps limited the absorption of these results into the empirical literature, and into human genetics in particular. It is therefore an important challenge to derive simple, intuitive results for population genetic dynamics under stabilizing selection that are both accurate and portable into existing empirical frameworks.

2 Model

We consider an additive polygenic trait under stabilizing selection. Genetic variation in the trait is contributed by autosomal polymorphic loci, . At each locus , there is a trait-increasing allele, with frequency and haploid effect , and a trait-decreasing allele, with frequency and haploid effect (so that the difference in the phenotypes of a homozygote for the trait-increasing allele and a homozygote for the trait-decreasing allele is, all else equal, ). Since we are interested in characterizing the effect of selection on allele-frequency dynamics at these loci, we ignore mutation (although we later discuss the role of mutation and the implications of our results for the rate of turnover of the loci underlying genetic variation in a trait—see Discussion).

An individual’s trait value is given by

where

is the individual’s additive genetic value for the trait, with their genotype coded as , , or if they carry 0, 1, or 2 trait-increasing alleles at locus respectively. is an environmental disturbance that we assume to be independent of and to have mean zero.

We denote the phenotypic variance by , the additive genetic variance by , and the environmental variance by . The trait’s heritability is , and its genic variance—the additive genetic variance ignoring the contribution of linkage disequilibrium (LD) among loci—is .

The trait is under stabilizing selection around an optimal value that we arbitrarily code as 0: the relative fitness of an individual with trait value $z$ is specified by the Gaussian fitness function

$$W(z) = \exp\left(-\frac{z^2}{2V_S}\right), \tag{1}$$

where $V_S$ modulates the strength of stabilizing selection on the trait (with smaller values of $V_S$ corresponding to stronger selection). It will sometimes be useful to approximate this Gaussian fitness function by its quadratic Taylor approximation around the optimal value of zero,

$$W(z) \approx 1 - \frac{z^2}{2V_S}. \tag{2}$$
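As a quick numerical illustration of Eqs. (1) and (2), the sketch below compares a Gaussian fitness function with its quadratic approximation near the optimum. The parameterization exp(-z²/(2·V_S)) and the names `gaussian_fitness`, `quadratic_fitness`, and `v_s` are assumptions made for this example, not notation confirmed by the text.

```python
import math


def gaussian_fitness(z: float, v_s: float) -> float:
    """Relative fitness under Gaussian stabilizing selection around an optimum of 0.

    Assumed parameterization: W(z) = exp(-z^2 / (2 * v_s)); smaller v_s means
    stronger selection.
    """
    return math.exp(-z * z / (2.0 * v_s))


def quadratic_fitness(z: float, v_s: float) -> float:
    """Quadratic approximation of the same function around z = 0."""
    return 1.0 - z * z / (2.0 * v_s)
```

The two functions agree closely when z² is small relative to V_S, which is the regime relevant for the mean phenotypes of alleles at individual loci of a polygenic trait.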

We assume that mating is random and that the trait is not affected by any forms of selection other than the stabilizing selection specified above.

Under these assumptions, upon the onset of selection, the mean trait value in the population rapidly converges to the optimal value of zero (e.g., [20,21]). We are interested in the allele-frequency dynamics at causal loci after this directional phase of selection, once stabilizing selection has commenced, and therefore assume that the population starts with a mean trait value equal to zero—with the allele frequencies such that this is the case.

3 Results

Since we are interested in the allele-frequency dynamics at individual loci, it will be useful to consider the marginal fitnesses of the two alleles at a given locus, that is, the average fitness of an individual carrying a randomly chosen copy of the one allele versus the other. From these marginal fitnesses, we can calculate an ‘effective selection coefficient’ for one of the alleles—say, the minor allele. The expected change in frequency of this allele across a single generation is then , where is its frequency in the earlier generation [22].

We begin with a simple calculation of these marginal fitnesses under the model described above, ignoring phenotypic variation from the environment and from loci other than the focal locus, and also ignoring LD between the focal locus and other loci. We will then add in these two ingredients in turn.

3.1 A simple one-locus calculation

Consider a focal locus segregating for a trait-increasing allele , at frequency and with effect size , and a trait-decreasing allele , at frequency and with effect size (we have omitted the locus subscript since we are considering a single locus in isolation here).

A haploid instance of this locus contains allele with probability and allele with probability . The average genetic value at a haploid instance of the locus is therefore

Under stabilizing selection around an optimal trait value of zero, the average genome-wide genetic value must be zero. Therefore, considering the average genetic value of the focal haploid locus above, the genetic value of the rest of the genome (including the homologous instance of the locus) must be .

Now suppose that the allele at the focal haploid locus is . If we ignore any correlation between the allelic state at the locus and the genetic value of the rest of the genome—i.e., if we ignore LD—the average genetic value of the individual carrying this allele is

(3)

If the allele at the focal haploid locus is instead , the average genetic value of the individual is

(4)

Thus, the average phenotypes inhabited by the trait-increasing allele $A$ and the trait-decreasing allele $a$ differ in sign. If $p < 1/2$, where $p$ is the frequency of $A$, that is, if $A$ is the minor allele at the locus, the average phenotype inhabited by an $A$ allele will be further from zero than the average phenotype inhabited by an $a$ allele (and vice versa if $p > 1/2$). This explains why the minor allele is selected against under stabilizing selection: for the average value of the phenotype to be at its optimum, the rest of the genome must adapt to the more common allele at the locus, leaving the rarer allele maladapted to the rest of the genome.
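The one-locus argument above can be written out explicitly. The following is a reconstruction under assumed notation, not the original display equations: $A$ and $a$ denote the trait-increasing and trait-decreasing alleles, $p$ is the frequency of $A$, and the haploid effects are taken to be $+\alpha$ and $-\alpha$. A different scaling of the effect size would change the constants but not the signs or the comparison of magnitudes.

```latex
\begin{align*}
\text{mean haploid value at the locus:}\quad
  & p\,\alpha + (1-p)(-\alpha) = (2p-1)\,\alpha, \\
\text{mean value of the rest of the genome:}\quad
  & -(2p-1)\,\alpha = (1-2p)\,\alpha, \\
\text{mean phenotype of a bearer of } A:\quad
  & \alpha + (1-2p)\,\alpha = 2(1-p)\,\alpha, \\
\text{mean phenotype of a bearer of } a:\quad
  & -\alpha + (1-2p)\,\alpha = -2p\,\alpha.
\end{align*}
```

Under these assumptions the two means have opposite signs, and $|2(1-p)\alpha| > |{-2p\alpha}|$ exactly when $p < 1/2$, so the minor allele sits, on average, in phenotypes further from the optimum.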

(As an interesting aside, we may use the same method to calculate the average phenotypes experienced by the three possible diploid genotypes at the focal locus, and therefore to rank the fitnesses of the three genotypes. The mean phenotypic effect of the diploid locus is twice the mean haploid effect, i.e., , and so mean phenotypic effect of all other loci is . Therefore, again ignoring systematic signed LD among causal loci, the mean phenotype of the genotype is , that of the genotype is , and that of the genotype is . Assume that is the minor allele at the locus, i.e., that . Then, if , , i.e., the fitness ranking of the genotypes is . If, instead, , then , i.e., the fitness ranking is . In neither case is the heterozygous genotype the least fit, and in fact it is the fittest genotype when the minor-allele frequency is greater than [9]. Therefore, although the frequency dynamics at the locus (Eq. 5) resemble those expected under classical underdominance at the locus, the fitness values of the genotypes at the locus do not.)
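The genotype ranking described in this aside can be checked numerically. The sketch below assumes haploid effects of +alpha and -alpha for the trait-increasing allele `A` and trait-decreasing allele `a`, ignores LD, and ranks genotypes by the distance of their mean phenotype from the optimum of zero; the function names are hypothetical. Under these assumptions the switch in ranking occurs at a minor-allele frequency of 1/4.

```python
def genotype_mean_phenotypes(p: float, alpha: float = 1.0) -> dict:
    """Mean phenotypes of AA, Aa, aa when the population mean is at the optimum (0).

    Assumes haploid effects +alpha (allele A, frequency p) and -alpha (allele a),
    and no LD between the focal locus and the rest of the genome.
    """
    # The rest of the genome offsets the mean diploid effect 2 * (2p - 1) * alpha.
    rest_of_genome = -2.0 * (2.0 * p - 1.0) * alpha
    return {
        "AA": 2.0 * alpha + rest_of_genome,
        "Aa": rest_of_genome,
        "aa": -2.0 * alpha + rest_of_genome,
    }


def fitness_ranking(p: float, alpha: float = 1.0) -> list:
    """Genotypes ordered from fittest to least fit (closest to the optimum first)."""
    means = genotype_mean_phenotypes(p, alpha)
    return sorted(means, key=lambda genotype: abs(means[genotype]))
```

For example, `fitness_ranking(0.1)` places the `aa` homozygote first, while `fitness_ranking(0.4)` places the heterozygote first; in both cases `AA`, the homozygote for the minor allele, is least fit.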

If we further ignore the phenotypic variation contributed by the environment and from the rest of the genome, then we can substitute the average phenotypic values above into our fitness function to calculate the marginal fitnesses of and . Employing the quadratic approximation in Eq. (2), the mean fitness of the bearer of a randomly chosen allele is

while the mean fitness of the bearer of a randomly chosen allele is

Since the mean fitness in the population is close to 1, the effective selection coefficient of is

which is positive if $p > 1/2$ (when the trait-increasing allele $A$, at frequency $p$, is the common allele at the locus) and negative if $p < 1/2$ (when $A$ is the rare allele at the locus).

The expected change in frequency of across a single generation is then

(5)

which is a commonly used formula for allele-frequency change under stabilizing selection (e.g., [3,8,9]).

However, this formula overpredicts the rate of allele-frequency change in simulations (Fig 1). This is for two reasons. First, phenotypic variation from the environment and from other loci obscures the signal that selection sees of each allele’s average phenotype. Second, the alleles come into positive LD with opposite-effect alleles elsewhere in the genome, bringing the average phenotypes of their bearers closer to zero and thus weakening selection at the locus. We consider these two effects in turn.

Fig 1. Simulated versus predicted average minor-allele frequencies across the polymorphic loci affecting a trait under stabilizing selection, in the case of full symmetry across loci.

In the simulations, there are polymorphic loci, all unlinked and with equal effect sizes () and starting minor-allele frequencies (). The loci are initially in linkage equilibrium, and the strength of selection is chosen such that initially. Further simulation details can be found in the Methods. Eq. (25) (dashed blue line), which takes into account background phenotypic variance and the Bulmer effect, is seen to be a better prediction of the observed trajectory of average minor-allele frequency (solid black line) than either Eq. (5) (dashed pink line), which ignores both background phenotypic variance and the Bulmer effect, or Eq. (10) (dashed yellow line), which ignores the Bulmer effect. The simulation trajectory is averaged over replicate trials.

https://doi.org/10.1371/journal.pgen.1012035.g001

3.2 Background phenotypic variance

We assume that the trait value is normally distributed with mean zero and variance . Still ignoring LD for now, the distribution of trait values experienced by alleles is then normally distributed with mean (from Eq. 3) and variance (because, for a highly polygenic trait, the genetic variance at the individual haploid locus is small compared to the overall phenotypic variance). Similarly, the distribution of trait values experienced by alleles is normally distributed with mean (from Eq. 4) and variance .

In general, if the trait distribution among the bearers of a given allele is normal with mean and variance , then the average fitness of the bearer of a randomly chosen copy of the allele is

(6)

as shown in refs. [3,20] and S1 Text Section S1. The effect of taking into account background variance in the trait is simply to dilute the strength of selection at the focal locus by a factor , which is greater if the phenotypic variance is large relative to the width of the selection function.
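The averaging step behind Eq. (6) can be verified numerically. The sketch below assumes the Gaussian fitness function exp(-z²/(2·V_S)) and a normal trait distribution with mean `mu` and variance `var` among the bearers of an allele; for that case the standard Gaussian integral gives sqrt(V_S/(V_S + var)) · exp(-mu²/(2·(V_S + var))). The function names and the midpoint-rule integration are choices made for this illustration.

```python
import math


def mean_fitness_closed_form(mu: float, var: float, v_s: float) -> float:
    """Average Gaussian fitness over a normal trait distribution (closed form)."""
    return math.sqrt(v_s / (v_s + var)) * math.exp(-mu * mu / (2.0 * (v_s + var)))


def mean_fitness_numeric(mu: float, var: float, v_s: float,
                         n_steps: int = 20000, n_sd: float = 10.0) -> float:
    """Same average, computed by midpoint-rule integration over mu +/- n_sd SDs."""
    sd = math.sqrt(var)
    lower = mu - n_sd * sd
    width = 2.0 * n_sd * sd / n_steps
    total = 0.0
    for i in range(n_steps):
        z = lower + (i + 0.5) * width
        density = math.exp(-(z - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        total += density * math.exp(-z * z / (2.0 * v_s)) * width
    return total
```

The prefactor sqrt(V_S/(V_S + var)) is common to both alleles and cancels in relative fitness, so the background variance enters the comparison between alleles only by widening the effective selection function from V_S to V_S + var.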

Therefore, in our case, the mean relative fitness of the bearer of a randomly chosen allele is

(7)

while the mean relative fitness of the bearer of a randomly chosen allele is

(8)

where the approximations hold because for a polygenic trait. Since the overall mean fitness is close to 1, the effective selection coefficient of is

(9)

and so the expected change in frequency of across a single generation is

(10)

This prediction of allele-frequency change is smaller than the prediction of Eq. (5) by a factor , but, while more accurate than Eq. (5), Eq. (10) still overpredicts the rate of allele-frequency change in simulations (Fig 1). As we show next, this is because the mean phenotypes of bearers of and are in fact closer to zero than predicted by Eqs. (3) and (4), because and have each come into LD with opposite-effect alleles elsewhere in the genome.
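To make the comparison between Eqs. (5) and (10) concrete, the following sketch computes the expected one-generation change in the frequency of the trait-increasing allele from the marginal mean phenotypes, with and without background phenotypic variance. It assumes haploid effects of +alpha and -alpha, the quadratic fitness approximation, no LD, and a mean fitness close to 1; the function name and arguments are hypothetical, and a different effect-size convention would rescale the result by a constant.

```python
def expected_delta_p(p: float, alpha: float, v_s: float, v_p: float = 0.0) -> float:
    """Expected change in frequency p of the trait-increasing allele, ignoring LD.

    v_p = 0 corresponds to the simple one-locus calculation; v_p > 0 adds
    background phenotypic variance, widening the selection function to v_s + v_p.
    """
    mean_increasing = 2.0 * (1.0 - p) * alpha   # mean phenotype of A bearers
    mean_decreasing = -2.0 * p * alpha          # mean phenotype of a bearers
    width = v_s + v_p
    # Effective selection coefficient: difference in marginal relative fitness.
    s_eff = (mean_decreasing ** 2 - mean_increasing ** 2) / (2.0 * width)
    return p * (1.0 - p) * s_eff
```

Under these assumptions the result equals -(2·alpha²/(V_S + V_P))·p·(1-p)·(1-2p), so including background variance shrinks the predicted change by the ratio V_S/(V_S + V_P).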

3.3 The Bulmer effect

As we have discussed, stabilizing selection reduces a trait’s genetic variance in two ways. The first is that, by favoring compensating combinations of trait-increasing and trait-decreasing alleles, stabilizing selection generates negative LD between alleles with the same directional effect on the trait. This leads to a rapid decline in the trait’s genetic variance, to a quasi-equilibrium value that reflects a balance between the generation of LD by selection and its destruction by recombination (S1 Fig). The second is that stabilizing selection induces selection against rare alleles affecting the trait, eroding heterozygosity on average at their loci. For a highly polygenic trait, selection against rare alleles is weak, and so their average frequency decline is slow.

In the limit of high polygenicity and a large population size, allele frequencies do not change at all under selection, and so the reduction in a trait’s genetic variance due to stabilizing selection is entirely due to LD. In this limit, and under some other simplifying assumptions including that all loci are unlinked, Bulmer [7] calculated the equilibrium reduction in the genetic variance of a trait under stabilizing selection, as a function of the trait’s initial genetic variance in the absence of LD (its genic variance, ), the strength of stabilizing selection on the trait (), and the contribution of the environment to trait variance (). Bulmer [23] extended this calculation to allow for variable linkage among loci (see also [15]).

An allele’s LD with causal alleles elsewhere in the genome will affect the mean trait value experienced by the allele, and therefore also its frequency dynamics under selection. To calculate how the LD generated by stabilizing selection affects allele-frequency dynamics at individual loci, we first determine how LD of any kind affects the mean trait value experienced by an allele. Thereafter, we calculate the total degree of LD generated by stabilizing selection at equilibrium, and the expected apportionment of this overall amount of LD to specific pairs of loci. Exploiting the difference in the timescales over which LD is generated and rare alleles decline in frequency under stabilizing selection (S1 Fig), we then substitute the equilibrium degree of LD into our calculation for how LD affects the mean trait value experienced by an allele to find the expected effect of LD on allele-frequency dynamics under stabilizing selection. The close agreement between our analytical predictions and our simulations validates this separation-of-timescales approach for polygenic traits.

3.3.1 The effect of LD on the mean trait value experienced by an allele.

Recall that there are loci underlying genetic variation in the trait, and that the trait-increasing allele at locus is at frequency and increases the value of the trait by , while the trait-decreasing allele is at frequency and decreases the trait value by . We denote the coefficient of LD between the trait-increasing alleles at loci and by . Since the population mean value of the trait is, by assumption, at its optimum 0,

(11)

Define the random variable to take the value if the allele at locus in a randomly chosen haploid/gametic genome is trait-increasing, and 0 if the allele is instead trait-decreasing. From Bayes’ theorem, these indicator variables are related to the coefficients of LD via

(12)

Consider a copy of the trait-increasing allele at , and suppose, without loss of generality, that it is maternally inherited. Because mating is random, the mean genetic value of the paternally inherited genome is that of a randomly chosen haploid genome, i.e., 0. So the mean trait value experienced by the allele is the mean genetic value of the maternally inherited genome in which it lies. (Similarly, if it is paternally inherited, the mean trait value it experiences is the mean genetic value of the paternally inherited genome.) Therefore, the average trait value experienced by a trait-increasing allele at is

(13)

Sensibly, positive LD with trait-increasing alleles elsewhere in the genome () will tend to increase the trait value experienced by the trait-increasing allele at , while negative LD with other trait-increasing alleles () will tend to decrease the trait value it experiences. With no LD (), , recovering Eq. (3) above.

By a similar calculation, the average trait value experienced by a randomly chosen trait-decreasing allele at is

(14)
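The dependence of an allele's mean trait value on LD can be checked by brute force in a two-locus example. The sketch below assumes haploid effects of +alpha and -alpha at each locus and a population mean of zero, in which case the conditional means work out to 2(1-p₁)α₁ + (2/p₁)·α₂·D for the trait-increasing allele and -2p₁α₁ - (2/(1-p₁))·α₂·D for the trait-decreasing allele at locus 1. These closed forms are a reconstruction under the stated convention, and all names are hypothetical.

```python
def conditional_means_by_enumeration(p1, p2, d12, alpha1, alpha2):
    """Mean haploid genetic value given the allele carried at locus 1.

    Haplotype frequencies are built from allele frequencies p1, p2 of the
    trait-increasing alleles and their LD coefficient d12.
    """
    haplotypes = {
        (1, 1): p1 * p2 + d12,
        (1, 0): p1 * (1 - p2) - d12,
        (0, 1): (1 - p1) * p2 - d12,
        (0, 0): (1 - p1) * (1 - p2) + d12,
    }

    def value(x1, x2):
        # Trait-increasing allele contributes +alpha, trait-decreasing -alpha.
        return alpha1 * (2 * x1 - 1) + alpha2 * (2 * x2 - 1)

    mean_inc = sum(f * value(*h) for h, f in haplotypes.items() if h[0] == 1) / p1
    mean_dec = sum(f * value(*h) for h, f in haplotypes.items() if h[0] == 0) / (1 - p1)
    return mean_inc, mean_dec


def conditional_means_by_formula(p1, d12, alpha1, alpha2):
    """Closed forms; valid only when the population mean genetic value is zero."""
    mean_inc = 2 * (1 - p1) * alpha1 + 2 * alpha2 * d12 / p1
    mean_dec = -2 * p1 * alpha1 - 2 * alpha2 * d12 / (1 - p1)
    return mean_inc, mean_dec
```

With equal effects, a mean of zero requires p1 + p2 = 1, for example p1 = 0.3 and p2 = 0.7. Negative LD between the two trait-increasing alleles (d12 < 0) then pulls both conditional means toward zero, as the text describes.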

3.3.2 Symmetric loci.

The calculations above for the mean trait values experienced by the two alleles at a locus hold for any nature and source of the LD terms (see Discussion). Since our particular interest is in stabilizing selection as a source of LD, we now calculate these terms in expectation under stabilizing selection. We begin with the simplest possible case, where all loci are unlinked and have the same minor-allele frequency and effect size . In this case, the mean phenotype experienced by a randomly chosen trait-increasing allele at locus is, from Eq. (13),

(15)

while the mean phenotype experienced by a randomly chosen trait-decreasing allele at is, from Eq. (14),

(16)

that is, LD between the focal locus and other causal loci in the genome causes the mean phenotypes experienced by both alleles at the focal locus to be multiplied by a factor , as if the effect size at the locus were not but instead .

Under the same conditions of symmetry across loci, and in the limit of high polygenicity, the equilibrium reduction in the trait’s genetic variance due to the Bulmer effect satisfies

(17)

where ([7]; S1 Text Section S3.4 in [24]). (Note that, for neatness, we define to be the reduction in the genetic variance, so that it is a positive number; this is in contrast to Bulmer’s , which is the net change in the genetic variance and therefore negative.) When , [25].

While Eq. (17) gives the total reduction in the trait’s genetic variance explicitly in terms of the model parameters , , and (which do not themselves depend on ), in practice we will usually not have access to measurements of a trait’s genic variance (since we cannot characterize all of the loci that contribute variance to the trait). Fortunately, the equilibrium value of can also be expressed implicitly in terms of the heritability of the trait, , and its variance, (which both depend on ), as well as :

(18)

where and are at their quasi-equilibrium values (see S1 Text Section S3.4.3 in [24]). Eq. (18) obviates the need to estimate .
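For readers who want to compute the equilibrium reduction in genetic variance numerically, the sketch below iterates a textbook form of Bulmer's recursion for unlinked loci under the infinitesimal model: each generation, half of the existing LD is retained and selection contributes V_G²/(V_S + V_P), so that d' = (d + V_G²/(V_S + V_P))/2 with V_G = V_g - d and V_P = V_G + V_E. This form is an assumption of the example and may be written differently from Eqs. (17) and (18); the function and argument names are hypothetical.

```python
def bulmer_equilibrium(v_genic: float, v_env: float, v_s: float,
                       tol: float = 1e-12, max_iter: int = 10000) -> float:
    """Equilibrium reduction d in genetic variance due to LD (unlinked loci).

    v_genic: genic variance (additive genetic variance in the absence of LD)
    v_env:   environmental variance
    v_s:     width parameter of the Gaussian fitness function
    """
    d = 0.0
    for _ in range(max_iter):
        v_additive = v_genic - d          # genetic variance including LD
        v_pheno = v_additive + v_env      # phenotypic variance
        d_next = 0.5 * (d + v_additive ** 2 / (v_s + v_pheno))
        if abs(d_next - d) < tol:
            return d_next
        d = d_next
    raise RuntimeError("Bulmer recursion did not converge")
```

At the fixed point of this recursion, d = V_G²/(V_S + V_P), which can equivalently be written as h⁴·V_P²/(V_S + V_P) using the equilibrium heritability and phenotypic variance, so the genic variance need not be known.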

The reduction of the trait’s genetic variance is entirely due to LD among the polymorphic loci that underlie genetic variation in the trait:

(19)

Under the assumption of symmetry across loci in their effects and minor-allele frequencies, this total sum of LD is apportioned equally, in expectation, across the pairs of loci: for each pair of loci and ,

(20)

(Note that the summations in Eq. (19) count each pair of loci twice.) So, for a given causal locus , the sum of its LD coefficients with the other causal loci is, in expectation,

(21)

Substituting Eq. (21) into Eqs. (15) and (16), we find that the mean trait value of a randomly chosen trait-increasing allele at locus is

(22)

and, similarly, that the mean trait value of a randomly chosen trait-decreasing allele at locus is

(23)

Comparing Eqs. (22) and (23) with Eqs. (3) and (4), we see that the mean phenotypes of the two alleles at the locus are as they would be in the simple scenario considered before with no LD, but with the effect sizes of the alleles attenuated by a factor (with specified by Eq. 17). That is, the LD generated by stabilizing selection can be incorporated into the classical formulae for allele-frequency change at the locus (Eqs. 5 and 10), which ignore LD, by defining an ‘effective’ effect size,

(24)

Across a single generation, the expected change in frequency of the trait-increasing allele at the focal locus is then

(25)

(Note that the phenotypic variances experienced by the trait-increasing and trait-decreasing alleles at are still equal, and close to owing to the trait’s high polygenicity.)
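The 'effective' effect size can be turned into a small calculation. The sketch below assumes the symmetric, unlinked case, in which the attenuation factor is one minus the ratio of the LD-induced variance reduction `d` to the genic variance `v_genic`, and it assumes haploid effects of +alpha and -alpha so that the expected frequency change takes the form -(2·alpha_eff²/(V_S + V_P))·p·(1-p)·(1-2p). Both the constant and the names are assumptions of this example.

```python
def effective_effect_size(alpha: float, d: float, v_genic: float) -> float:
    """Effect size attenuated by LD with opposite-effect alleles elsewhere."""
    return alpha * (1.0 - d / v_genic)


def expected_delta_p(p: float, alpha: float, d: float, v_genic: float,
                     v_s: float, v_p: float) -> float:
    """Expected change in frequency of the trait-increasing allele, with LD.

    Uses the effective effect size in place of alpha; d = 0 recovers the
    prediction that ignores the Bulmer effect.
    """
    alpha_eff = effective_effect_size(alpha, d, v_genic)
    return -(2.0 * alpha_eff ** 2 / (v_s + v_p)) * p * (1.0 - p) * (1.0 - 2.0 * p)
```

Because the effect size enters squared, an attenuation of 10% in the effective effect size (d/V_g = 0.1) slows the expected allele-frequency change by about 19%.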

A reviewer has suggested an alternative approach to deriving Eqs. (13) and (14), and thereby (25), which we briefly outline here. The trait value is a linear regression on the genotype at any given locus , and the slope of this regression, , has denominator and a numerator that is a sum of effect-size-weighted LD terms ( being the covariance between the genotype at and the contribution of to the trait). Combining this observation with Eqs. (3) and (4), one can obtain Eqs. (13) and (14) as fitted values of the regression for and respectively. A similar approach is taken by Bulmer [15, Ch. 10], who obtains an expression (his Eq. 10.12) that can be translated to Eq. (25).

In simulations of stabilizing selection under the conditions of symmetry across loci assumed here, Eq. (25) is seen to be a better predictor of the allele-frequency dynamics at causal loci than either Eq. (5) or Eq. (10) (Fig 1).

Relaxing the assumptions that effect sizes and minor-allele frequencies are the same across loci, we show in S1 Text Section S3 that Eq. (25) is still a close approximation to the frequency trajectory at an individual locus when all loci are unlinked (S2 Fig).

The parameters in Fig 1 involve relatively strong stabilizing selection (). These parameters were chosen primarily because they generate strong LD, leading to larger disparities between the predictions that do and do not take this LD into account, and to ‘stress test’ the way in which Eq. (25) incorporates this LD. However, they are also consistent with the median quadratic selection gradient among those compiled by Kingsolver et al. [26] that are consistent with stabilizing selection ([27]; see also discussion in [17, Ch. 28]). Nonetheless, the prediction of Eq. (25) remains accurate in simulations of weak stabilizing selection as well, (S3 Fig), of the order of that estimated to act on human height [2,28].

Our expression for the expected change in allele frequencies at loci affecting the trait, Eq. (25), furthermore allows us to predict the change in the genic variance over time, via the relation

(26)

derived in S1 Text Section S2. In S4 Fig, we show that Eq. (25), when substituted into Eq. (26), provides a much better prediction of the trajectory of the genic variance observed in simulations than Eqs. (5) and (10).

3.3.3 Incorporating linkage.

The calculations above assume full symmetry across loci, and, in particular, that every locus is on a separate chromosome. This symmetry allowed us to apportion the total amount of LD generated by stabilizing selection evenly, in expectation, among the locus pairs.

We now consider the case of variable linkage relations among loci, with some locus pairs lying on the same chromosome and some lying on different chromosomes. Let the recombination fraction between loci and be . For now, we maintain the assumption of constant effect sizes and minor-allele frequencies across loci. Variable linkage relations among loci affect the total amount of LD generated by stabilizing selection: In the limit of high polygenicity, the equilibrium reduction in the trait’s genetic variance due to the Bulmer effect satisfies

(27)

where as before, and is the harmonic mean recombination rate across loci ([23]; S1 Text Section S3.4 in [24]). can be estimated from various kinds of data, including linkage maps, cytological data, and sequence data, similarly to the arithmetic mean recombination rate [29]. When , .

As in the case of no linkage, in practice we do not need to estimate , and in particular , to estimate the equilibrium value of . Instead, it can be expressed implicitly in terms of and (which depend on ), as well as and :

(28)

where, again, and are taken to be at their quasi-equilibrium values (see S1 Text Section S3.4.3 in [24]).

As before, the reduction of the trait’s genetic variance is entirely due to LD among loci, with . However, now, this overall LD is not apportioned equally among locus pairs in expectation. Instead, the expected LD between and is proportional to the inverse of the recombination fraction between the loci [23] (this relationship is true as long as is not too small), so that

(29)

For a given locus , let be the harmonic mean recombination fraction between and the other causal loci. Then, summing Eq. (29) across these other loci, we find

(30)

Because of our assumption of equal effect sizes and minor-allele frequencies across loci, Eqs. (15) and (16) still specify the mean trait values experienced by a randomly selected trait-increasing and trait-decreasing allele at respectively. Substituting Eq. (30) into (15) and (16), we find

(31)

and, similarly,

(32)

Therefore, comparing the equations above with those that ignore LD (Eqs. 3 and 4), we can again think of the impact of the LD generated by stabilizing selection on the mean phenotypes experienced by the alleles at a particular locus in terms of an attenuation of the ‘effective’ effect size at the locus, with this ‘effective’ effect size dependent on the recombination relations of the locus to other causal loci, according to

(33)

Intuitively, alleles at loci that have tighter recombination relations with other loci () will develop stronger average LD with the opposite-effect alleles at these other loci, and will therefore have their individual effects more greatly masked—and so their frequency dynamics more severely slowed—by the Bulmer effect than alleles at loci with looser recombination relations with other loci ().

The expected change in frequency of the minor allele at across a single generation is

(34)

Taking the average of Eq. (34) across loci provides a prediction of the average change in the minor alleles’ frequencies. This prediction agrees well with simulations that make use of realistic linkage maps (Fig 2). Particularly in the case of a low-recombination species (Drosophila melanogaster; ), in which the LD generated by the Bulmer effect is especially strong, Eq. (34) offers a much improved prediction of the allele-frequency dynamics over Eqs. (5) and (10) (Fig 2B).

Fig 2. Average change in the minor-allele frequency at polymorphic loci affecting a trait under stabilizing selection, when these loci are distributed across the human linkage map (A) and the linkage map of Drosophila melanogaster (B).

Eq. (34) is seen to predict simulated trajectories of the average minor-allele frequency better than Eqs. (5) and (10), especially for D. melanogaster, a low-recombination species. Other than the variable linkage relations among loci, simulation details are identical to Fig 1.

https://doi.org/10.1371/journal.pgen.1012035.g002

We show in S1 Text Section S3 that, if we relax the assumptions that effect sizes and minor-allele frequencies are the same across loci, Eq. (34) remains a close approximation to the frequency trajectory at locus , and we verify this by simulation under both the D. melanogaster linkage map (S5 Fig) and the human map (S6 Fig).

Although Fig 2 shows the correspondence of our predictions of allele-frequency changes and those observed in simulations under relatively strong selection (), our predictions remain accurate under weaker selection as well (), for both the human and D. melanogaster linkage maps (S3 Fig).

As in the case with no linkage, substituting Eq. (34) into Eq. (26) provides a much better prediction of the observed change in the genic variance in simulations with the human and D. melanogaster linkage maps than substituting Eqs. (5) and (10) into (26) does (S4 Fig).

We can also check the prediction of Eq. (34) for loci with lower and higher locus-specific harmonic mean recombination rates . Collecting loci in bins of gradually higher values of across the D. melanogaster linkage map (which, unlike in high-recombination species like humans, shows a sizeable range of values), Fig 3 plots the average minor-allele frequency in each bin after 250 generations of selection against the average value of within each bin. While the prediction of Eq. (34) is a substantial improvement over predictions that do not take into account LD (and therefore also ignore variable linkage relations among loci), Eq. (34) underpredicts the degree of allele-frequency change for the lowest-recombination bins.

Fig 3. Minor-allele frequencies after 250 generations of selection, for loci with different average recombination rates with the rest of the genome.

The simulations here are the same as for Fig 2B, with loci distributed along the D. melanogaster linkage map. For the simulated minor-allele frequencies, loci have been binned according to their locus-specific harmonic mean recombination fractions with other loci, . While the equilibrium-based prediction (Eq. 34) that takes into account the Bulmer effect (blue line) is a substantial improvement over predictions that do not take into account the Bulmer effect (Eqs. 5 and 10; pink and yellow lines), it underpredicts the degree of allele-frequency change, especially for loci with low values of , since for these values, the approach to equilibrium values of LD is slow. A prediction based on the full sequence of non-equilibrium values of LD (Eq. 36; details in Methods) performs better than the equilibrium-based prediction. Note that the fourth and fifth bins contain only 6 and 2 loci respectively, potentially explaining the discrepancy between the simulated minor-allele frequencies and the predictions based on Eq. (36). Simulation points are averages across 500 replicate trials.

https://doi.org/10.1371/journal.pgen.1012035.g003

The reason has to do with the separation-of-timescales assumption underlying Eq. (34), and in particular the assumption that the quasi-equilibrium degree of LD (Eq. 29) is attained instantaneously for every pair of loci at the onset of selection. In reality, the expected degree of LD between each pair of loci builds up over time, approaching its equilibrium value at a rate that depends on the recombination fraction between the pair of loci involved: very rapidly for loosely linked locus pairs, but more slowly for tightly linked locus pairs. Therefore, for the most tightly linked locus pairs, the Bulmer effect is, for an appreciable number of generations after the onset of selection, weaker than assumed by Eq. (34), and so the allele-frequency dynamics at these loci are faster in these early generations than predicted by our calculations. The result is that, after a given number of generations, allele frequencies have changed more than predicted by Eq. (34), with the disparity greater for more tightly linked loci, which is the pattern observed in Fig 3.

We can, in fact, for every pair of loci, calculate the expected LD in each generation after the onset of selection, and use these values to obtain a more accurate—albeit more complicated—prediction for the overall allele-frequency change at each locus after a given number of generations. Initially, the degree of LD between each pair of loci and is zero: . Within each generation , selection generates an increment to the degree of disequilibrium between and equal to (we will determine the value of shortly). This amount is the same in expectation for all pairs of loci, because we have assumed that effect sizes are constant, and is divided approximately equally in expectation between cis-LD (LD among pairs of alleles inherited from the same parent) and trans-LD (LD among alleles inherited from different parents), which selection does not distinguish [15,23]. In transmission to the next generation , recombination between loci and generates an amount of new cis-LD from the trans-LD that built up in generation , while an amount of the cis-LD that built up in generation is preserved in transmission to the generation , along with an amount of the cis-LD that was already present at the beginning of generation (before selection acted). Therefore,

(35)

Starting with zero LD (), Eq. (35) defines a sequence of expected LD values

(36)

which converges at rate to an asymptotic value of [15,23]. This is the equilibrium degree of LD between and ; comparison with Eq. (29) therefore reveals that .
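The bookkeeping described in words above can be sketched numerically. The following Python sketch is an illustration under stated assumptions, not the authors' code: `delta` is a hypothetical name for the total per-generation increment to LD produced by selection (negative under stabilizing selection), `r` is the recombination fraction between the two loci, and the tracked quantity is the expected cis-LD at the start of each generation. It iterates the verbal recursion (half of each increment is cis and half is trans; recombination converts a fraction r of the new trans-LD into cis-LD and preserves a fraction 1 − r of the cis-LD) and compares it with the geometric closed form that this bookkeeping implies.

```python
def ld_trajectory(delta, r, generations):
    """Expected cis-LD at the start of each generation, starting from zero.

    delta: hypothetical name for the total LD increment generated by
           selection within a generation (split equally into cis and trans).
    r:     recombination fraction between the two loci (0 < r <= 0.5).
    """
    d = 0.0
    trajectory = [d]
    for _ in range(generations):
        new_cis_from_trans = r * (delta / 2.0)    # trans-LD recombined into cis
        kept_new_cis = (1.0 - r) * (delta / 2.0)  # new cis increment that survives
        kept_old_cis = (1.0 - r) * d              # pre-existing cis-LD that survives
        d = new_cis_from_trans + kept_new_cis + kept_old_cis
        trajectory.append(d)
    return trajectory


def ld_closed_form(delta, r, t):
    """Closed form implied by the recursion: geometric approach to delta / (2 r)."""
    return (delta / (2.0 * r)) * (1.0 - (1.0 - r) ** t)


if __name__ == "__main__":
    delta = -1e-4  # illustrative value only, not taken from the paper
    for r in (0.5, 0.05, 0.005):
        traj = ld_trajectory(delta, r, 250)
        # LD after 10 and 250 generations, and the asymptote delta / (2 r)
        print(r, traj[10], traj[250], delta / (2.0 * r))
```

Under these assumptions, a freely recombining pair (r = 0.5) is essentially at its asymptote within a few generations, whereas a pair with r = 0.005 has covered only roughly 70% of the distance to its asymptote after 250 generations, in line with the slow approach to equilibrium for tightly linked pairs described in the text.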

For each polymorphic locus affecting the trait, Eq. (36) thus specifies the expected LD with every other polymorphic locus affecting the trait, and so, for each generation after the onset of selection, we can substitute these into Eqs. (13) and (14) to calculate the mean trait value experienced by the trait-increasing and trait-decreasing alleles at ; from these, we can calculate the expected change in frequency of the alleles at locus from generation to generation (see Methods). Carrying out this calculation, we find that it predicts the cumulative allele-frequency change observed across generations in our simulations more successfully than Eq. (34), particularly for loci with especially tight average linkage relations with the other loci in the genome (Fig 3).

The expressions in Eq. (36) for the individual pairwise linkage disequilibria each generation after the onset of selection further allow us to predict the change in the contribution of these linkage disequilibria to the genetic variance . Together with Eq. (26), which relates changes in allele frequencies to changes in the genic variance , this allows us to predict the expected change in the genetic variance over time (S1 Text Section S2). S4 Fig shows that employing Eqs. (25) and (34) in this procedure leads to accurate predictions of the trajectory of the genetic variance, both when all loci are unlinked and when they lie along the linkage maps of humans and D. melanogaster.

4 Discussion

Understanding the processes that govern the genetic architecture of complex traits will require the interpretation of genomic data in terms of population genetic models. The richest and finest-scale data come from genome-wide association studies, which offer estimates of allelic effect sizes at thousands of sites throughout the genome [30]. Because of the per-site nature of GWAS data, its interpretation in terms of population genetic models will require a detailed understanding of allele-frequency dynamics under these models.

Here, we have provided simple calculations that predict allele-frequency dynamics under stabilizing selection—a common mode of selection on complex traits [1,2]—more accurately than the formulae that have previously been used for this purpose. To do so, we have incorporated into these formulae the linkage disequilibrium that stabilizing selection rapidly generates between opposite-effect alleles throughout the genome [7,23].

The accuracy of our calculations in predicting simulated allele-frequency trajectories under stabilizing selection (Figs 1, 2, and 3) suggests that they may make possible more precise quantitative interpretation of GWAS and other genomic data in terms of population genetic models of stabilizing selection. Below, we discuss some of the implications of our results for the interpretation of such data.

4.1 Genetic architecture of complex traits

Several studies have shown that, for many human traits, the joint distribution of allele frequencies and effect sizes (as inferred from GWAS) is consistent with the allele-frequency dynamics expected under stabilizing selection [3–5]. In demonstrating this consistency, these studies made use of Eq. (5) as a description of the allele-frequency dynamics. As we have shown, Eq. (5) overpredicts the rate of allele-frequency change under stabilizing selection, because it ignores the systematically signed LD generated by stabilizing selection (and background phenotypic variance of the trait under selection). Importantly, however, the equations that we have derived for allele-frequency change that do take into account the LD generated by stabilizing selection are qualitatively of the same form as those that ignore this LD, with an ‘effective’ effect size of each allele substituted for the allele’s true effect size (Eqs. 24, 33). Therefore, the results of [3–5] are not qualitatively affected by the more accurate predictions of allele-frequency change under stabilizing selection that we have derived. However, their estimates for the strength of stabilizing selection on human traits must be revised upwards, since for a given strength of stabilizing selection on a trait, the slowdown of the allele-frequency dynamics induced by the Bulmer effect will result in higher minor-allele frequencies, on average, across loci.

4.2 Allelic turnover and the portability of polygenic scores across populations

Once a GWAS has been carried out in a particular sample, it is common to use the effect-size estimates obtained from the GWAS to generate polygenic scores (PGSs), which are sums of an individual’s genotypes across trait-associated loci, with each locus weighted in the sum by its estimated effect size, and to measure the accuracy of these PGSs as predictors of trait values both within the original GWAS sample and in other samples. Such exercises have revealed that PGSs are often much worse predictors of trait values in samples that are more distantly related to the original GWAS sample (e.g., [31,32]).

Several explanations have been offered for this ‘portability’ problem, including differences between populations in (i) the effect sizes of causal alleles, (ii) the frequencies of causal alleles, (iii) the degree to which genotyped SNPs ‘tag’ causal alleles (via close-range LD), and (iv) the environmental and genetic effects with which variation at genotyped SNPs is confounded (reviewed in [33]). Of interest here is the second explanation, that PGSs can suffer reduced portability because of differences between populations in allele frequencies at causal loci [34–37]. In some cases, an ancestral polymorphism might have been retained in a sample drawn from one population but lost in a sample drawn from another; the locus would then contribute trait variation in the one sample but not in the other.

Although turnover of the loci underlying variation in a trait is possible by neutral drift alone, the process is accelerated, and the portability of PGS prediction across populations degenerates more rapidly, if the alleles at these loci are under selection [36–40]. Yair & Coop [36] note that stabilizing selection on the trait itself will have such an effect, because stabilizing selection speeds up allele-frequency dynamics relative to neutral drift. Yair & Coop further recognized that the allele-frequency dynamics under stabilizing selection, and consequently the rate of allelic turnover between populations, would be slowed by the Bulmer effect (especially for ensembles of tightly linked loci; see their Figure S1). However, for simplicity, in quantifying the effect of stabilizing selection on the reduction of trait variance contributed by ancestral polymorphisms, and the consequent degeneration of PGS prediction accuracy across descendant populations, they ignored the contribution of LD generated by the Bulmer effect.

Since, as we have shown, the impact of the Bulmer effect on allele-frequency dynamics under stabilizing selection can be captured by defining ‘effective’ effect sizes of causal alleles, it can be incorporated into the calculations of Yair & Coop in a simple way. For example, their Eq. (2.3) calculates the fraction of trait variance in a generation- descendant population explained by polymorphisms of effect size that were present in the ancestral population: , where is the value of between the ancestral and descendant populations at neutral loci and . This calculation can be modified to take into account the Bulmer effect simply by replacing with the ‘effective’ effect size , as defined in Eqs. (24) and (33). The same is true of the analogous calculations developed by Yair & Coop based on a diffusion approximation.

4.3 Assumptions, and notions of equilibrium under stabilizing selection

Much previous work on the genetic dynamics of stabilizing selection, including that which considers the impact of the Bulmer effect, has been carried out under the assumption of equilibrium. The broadest definition of equilibrium under stabilizing selection is that the distribution of trait values is stationary, with the trait mean at its optimum and the trait variance held constant by a balance between selection, which reduces it, and mutation, which replenishes it. Although the mean will rapidly attain its optimal value if it is initially displaced from it [20,21], the time it takes for the variance to subsequently equilibrate is much longer, because the allele-frequency dynamics under stabilizing selection are very slow.

Therefore, studies that assume (or characterize) a constant distribution of trait values—and in particular a constant variance—under stabilizing selection implicitly assume long-term constancy of the strength of selection, the population’s demography, etc. The strictness of this assumption has allowed powerful results to be obtained, for example, in showing that the genetic variance at equilibrium under stabilizing selection is sometimes independent of the genetic map [10,13] and in characterizing the equilibrium allele-frequency spectrum as a function of effect size [3,5]. But the assumption that demography and the strength of selection have been constant on extremely long timescales might be problematic for many of the traits and populations to which such analyses might be relevant.

In contrast, our analysis of allele-frequency dynamics under stabilizing selection depends on relatively few assumptions. We have assumed that the trait mean is at its optimum, which, as noted above, it will very soon be if it starts away from the optimum (given sufficient genetic variance). We have also assumed that the negative LD that stabilizing selection generates is at its equilibrium value—this too occurs on a rapid timescale [23] (S1 Fig). Thus, our results hold for a wide range of non-equilibrium scenarios. For example, if the strength of selection changes, our expressions for allele-frequency change will become accurate as soon as the (now stronger) LD has re-equilibrated, while analyses based on trait equilibrium must wait much longer for the variance to re-equilibrate.

In our calculations of allele-frequency dynamics at individual loci, we have ignored the influence of mutation at these loci. This is because our interest is in the genetic dynamics of complex traits, which are now known to have extremely large mutational targets (e.g., for human height, estimated by Simons et al. [4]), only a small fraction of which will be sufficiently polymorphic at any given time to contribute meaningfully to the trait’s genetic variation (e.g., for human height [41,42]). For such traits, therefore, most new trait-affecting mutations occur at loci that are not currently polymorphic; consequently, the frequency dynamics at polymorphic loci are largely unaffected by mutation, and therefore closely follow the expressions we have derived. One way in which mutation could affect our results, however, is if new mutations tend to have a particular directional effect on the trait value, such as to decrease it on average, since this would lead to persistent one-sided departures of the trait mean from its optimum and thus directional selection on allele frequencies [43].

An interesting question concerns the robustness of our results to changes in demography, such as population size and structure. The equilibrium amount of LD under stabilizing selection, and the rate at which it is attained, do not depend on the population’s size or structure (being a product of selection). It is true that, in smaller populations, the mean value of the trait tends to deviate further from the optimum, but these deviations are negligible unless the population size is very small relative to the strength of selection (the trait mean explores a stationary distribution with variance [20]). Therefore, the two key assumptions underlying our calculations are largely unaffected by demography. As with the strength of selection, since populations’ demography will seldom be constant over the long timescales required for trait equilibrium to be reached under stabilizing selection, the robustness of our results to changes in demography is a major advantage.

Nonetheless, demography can affect the rate of allele-frequency dynamics at individual loci. This is because the rate of the frequency dynamics is mediated by the genic variance , via its contribution both to in the denominators of Eqs. (25) and (34) and to the slowdown factor in their numerators, and because itself can depend on population demography in complex ways (e.g., [44]).

4.4 Other processes that generate long-range signed LD

While stabilizing selection itself generates systematically signed long-range LD between alleles that affect the trait under selection (see [45] for recent evidence of this effect in humans), with these LDs impacting the frequency dynamics of the alleles in the way we have described, other processes can also generate systematically signed LD, and these LDs too will influence allele-frequency dynamics under stabilizing selection.

Most important among these other processes is assortative mating, which is known to be common for traits in humans [46,47] and other animals [48]. Assortative mating for a heritable trait generates positive LD between alleles with like effects on the trait [49–51]. If the trait is also under stabilizing selection, this positive LD will tend to counteract the negative LD among like-effect alleles generated by stabilizing selection, speeding up the rate at which the minor allele declines in frequency relative to the case where there is no assortment on the trait. Whether the frequency dynamics are ultimately faster or slower than expected in the absence of any LD depends on whether the negative LD generated by stabilizing selection is outweighed by, or outweighs, the positive LD generated by assortative mating. Our general calculation of the mean trait values experienced by the two alleles at a locus, as a function of their effects and their degrees of LD with other causal loci (Eqs. 13 and 14), allows this balance to be quantified.

As with the LD generated by stabilizing selection itself, the effect of LD generated by assortative mating on allele-frequency dynamics under stabilizing selection can be captured by ‘effective’ effect sizes of alleles at polymorphic loci affecting the trait. In contrast to the case of stabilizing selection, under assortative mating, the equilibrium degree of LD expected between a pair of loci does not depend on the recombination fraction between the loci [50]. It can be calculated as a simple function of the correlation of the trait value among mates, the trait’s heritability, and the effect sizes of the alleles involved [24,50,52].

Suppose, for simplicity, that effect sizes and minor-allele frequencies are equal across loci. If the heritability of the trait is and the phenotypic correlation among mates is , then the expected LD between the trait-increasing alleles at loci and due to assortative mating is (see SI Section S3.1.1 in [24]). Therefore, for a given locus , the expected sum of the LDs between its trait-increasing allele and the trait-increasing alleles at other loci, due to assortative mating, is

(37)

If, when the trait is under assortative mating and stabilizing selection jointly, the LD between a pair of loci is simply the sum of the LDs expected under the two processes separately (), then, assuming all loci to be unlinked, we combine Eqs. (37) and (21) to find

(38)

where is given by Eq. (17). Substitution into Eqs. (13) and (14) reveals the effective effect size of each allele in this case to be

(39)

If we allow for variable linkage relations among loci, a similar calculation combining Eqs. (37) and (30) yields

(40)

and an effective effect size

(41)

where is now given by Eq. (27). On average across loci, therefore, , as in the case with no linkage (though with a larger value of owing to linkage).

Therefore, if , the impact of stabilizing selection will outweigh that of assortative mating, the effective effect sizes of alleles will be attenuated from their true effects, and allele-frequency dynamics under stabilizing selection will be slowed. In contrast, if , the impact of assortative mating outweighs that of stabilizing selection, the effective effect sizes of alleles are amplified, and allele-frequency dynamics under stabilizing selection are accelerated.

As an example, human height is under strong assortative mating ( [53]) and moderately strong stabilizing selection ( [2,28]). Assuming its heritability to be , and making use of the value we have calculated and the estimate from Sanjak et al. [2], we find that , while . Therefore, the effective effects of loci that affect height are amplified, on average, because of the relatively strong assortative mating; their frequency dynamics under stabilizing selection are therefore expected to be accelerated, in spite of the Bulmer effect.

Note that patterns of trait-based migration can lead to long-range LD similar in nature to that under assortative mating [24].

4.5 Pleiotropy

Our calculations have ignored the possibility that the alleles underlying variation in the focal trait also affect other traits which might be under stabilizing selection (or, following the discussion above, might be subject to other processes that generate long-range signed LD). To see how such pleiotropy could affect our conclusions, consider a pair of alleles at distinct loci that both increase the focal trait. All else equal, we would expect stabilizing selection on the focal trait to generate negative LD between these two alleles, to a degree that we can calculate in expectation (e.g., Eqs. 20 and 29). However, suppose that there is another trait under stabilizing selection, and the one allele increases this trait but the other allele decreases it. All else equal, we would expect stabilizing selection on this other trait to generate positive LD between the alleles. Clearly, the sign and magnitude of the LD between the alleles will depend on the size of their effects on the two traits, as well as the relative strengths of selection on the two traits.

While ultimately the degree and nature of pleiotropy is an empirical question, we can suggest some qualitative predictions for how the Bulmer effect will influence the frequency dynamics of alleles that affect multiple traits under stabilizing selection. Following Simons et al. [3,4], we focus on a scenario where we have allelic effect-size estimates for a single trait, but where these alleles also affect other, unmeasured traits.

Consider first the ‘isotropic’ case, where the effects of an allele on the various traits are independent. This is the case considered by Simons et al. [3,4], who note that it will always apply in a suitably reconfigured coordinate system for the traits in question. Consider two alleles that increase the focal trait. In the absence of pleiotropy, we would expect these alleles to come into negative LD because of selection on the focal trait. And because, under isotropic pleiotropy, the signs and magnitudes of the alleles’ effects on other traits are independent of their effects on the focal trait, the LD between the alleles is unchanged, in expectation, by their effects on these other traits. To be clear, the realized value of LD between this particular pair of alleles will change because of pleiotropy, but across such locus pairs, there is no tendency for pleiotropy to systematically increase or decrease the degree of LD from the value expected under selection on the focal trait alone.

It is, however, perhaps more likely that alleles’ effects on traits will be correlated. That is, if a pair of alleles both increase trait A, and we learn that one of the alleles decreases trait B, then the other allele probably also decreases trait B (since, for example, the effects of the alleles on traits A and B might in part be mediated through shared pathways in a genetic network). In this case, alleles with like effects on the focal trait we have measured are disproportionately likely to have like effects on other, unmeasured traits. The magnitude of the LD between a pair of alleles would then, in expectation, exceed the value expected based on their effects on the focal trait alone, as calculated, for example, in Eqs. (20) and (29). Therefore, in this perhaps more realistic scenario than the isotropic model, the slowdown in allele-frequency dynamics induced by the Bulmer effect would be more severe than we have calculated.

5 Conclusion

Stabilizing selection is increasingly recognized as a key driver of the genetic architecture of complex traits [1,3–5,36]. The calculations that we have developed here make possible more precise predictions of its effects on genetic architectures and, conversely, more precise inference of the parameters of stabilizing selection from genomic data.

As we have noted, there is a rich theoretical literature on the population genetics of stabilizing selection, in which the Bulmer effect has been extensively treated (e.g., [7,10–15,23]). We see our work as complementary to this literature in its focus on the consequences of the Bulmer effect for single-locus allele-frequency dynamics rather than aggregate quantities such as genetic and phenotypic variance. In this, and in its relative simplicity, our work offers a potential bridge between this prior theoretical literature and practical application to the growing body of population genomic data.

Methods

All simulations were run in SLiM 4.0 [54]. The model organism is diploid, sexual, and undergoes random mating each generation. In all setups, the population size is and there are initially autosomal polymorphic loci, either all unlinked or placed along a linkage map as described below. Each individual’s trait value is specified by , where is the number of trait-increasing alleles that the individual carries at locus minus 1, and is the effect size at the locus, as described in the Model section of the Main Text. For simplicity, in our simulations there is no environmental component of trait variation; i.e., the trait is fully heritable. The fitness of an individual with trait value is given by . In each simulation, was chosen so that , where is the initial variance of the trait (equal to the initial genetic variance , since the trait is by assumption fully heritable). Code is available at https://doi.org/10.5061/dryad.np5hqc089 [55].

Methods for Fig 1.

The loci are all unlinked with respect to each other (specified by recombination rate between ‘adjacent’ loci in SLiM), and the effect size at each locus is . At 500 of these loci, the initial frequency of the trait-increasing allele is , while at the other 500 loci, it is . The designation of the loci at which the trait-increasing allele is common versus rare is arbitrary, since all loci are unlinked with respect to each other. Trait-increasing and trait-decreasing alleles are initially assigned randomly to haploid genomes, independently across loci, so that there is no LD among loci in expectation. The initial additive genetic variance, , is therefore equal in expectation to the initial genic variance, , and so the inverse strength of selection is .

The simulation then proceeds for 500 generations, with fitness in each generation specified as described above. In each generation, we measure the frequency of the (initially) minor allele at each locus, and average the frequencies across loci. replicate simulations were run, with results averaged across simulations to produce the ‘simulation’ line in Fig 1.

The prediction lines in Fig 1 were constructed by iterating Eqs. (5), (10), and (25), updating only the allele frequency each generation (i.e., holding constant, even though it is slowly decreasing according to the equations themselves).

Methods for Fig 2.

The specification of the simulations in Fig 2 is the same as that in Fig 1, except that the loci now lie along sex-specific linkage maps. For humans, we used the male and female maps generated by Kong et al. [56], while for Drosophila melanogaster, we used the female map generated by Comeron et al. [57] (there is no crossing over in male Drosophila). In each case, we apportioned the loci to chromosomes proportional to their physical (bp) lengths (as reported in build 38 of the human reference genome, available at https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.26/, and in release 6 of the D. melanogaster reference genome, available at https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_029775095.1/). We ignored the sex chromosomes and the fourth ‘dot’ chromosome of D. melanogaster. For each chromosome, we spaced the loci apportioned to it uniformly along its sex-averaged genetic (cM) length. This was done to avoid the occurrence of adjacent loci with a zero recombination fraction, since in such cases the harmonic mean recombination fraction among all locus pairs would be undefined. As is usual in SLiM, there is no crossover interference; recombination fractions between adjacent loci were therefore calculated from the Morgan map distances between these loci via Haldane’s map function, .

Plotting the prediction line for Eq. (34) in Fig 2 requires calculating the harmonic mean sex-averaged recombination fraction for each locus, , and the overall harmonic mean sex-averaged recombination fraction, . For every pair of loci on the same chromosome, we calculated the recombination fraction from the map distance using Haldane’s map function, as specified above; for pairs of loci on different chromosomes, the recombination fraction is $1/2$. The relevant averages of these recombination fractions were then taken. We calculated values of for humans and D. melanogaster of and respectively. Notice that, in averaging Eq. (34) across the loci, we must first calculate for each locus separately, and then average this quantity across loci.
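The two harmonic means described here can be sketched as below. For brevity the sketch uses a single genetic map and omits the sex-averaging step; loci are represented as hypothetical `(chromosome, position_in_morgans)` tuples, and the function names are illustrative.

```python
import math

def pairwise_recombination_fraction(locus_a, locus_b):
    """1/2 for loci on different chromosomes, Haldane's function otherwise."""
    chrom_a, pos_a = locus_a
    chrom_b, pos_b = locus_b
    if chrom_a != chrom_b:
        return 0.5
    return 0.5 * (1.0 - math.exp(-2.0 * abs(pos_a - pos_b)))

def per_locus_harmonic_means(loci):
    """Harmonic mean recombination fraction of each locus with all others."""
    n = len(loci)
    means = []
    for i in range(n):
        inverse_sum = sum(1.0 / pairwise_recombination_fraction(loci[i], loci[j])
                          for j in range(n) if j != i)
        means.append((n - 1) / inverse_sum)
    return means

def overall_harmonic_mean(loci):
    """Harmonic mean recombination fraction across all distinct locus pairs."""
    n = len(loci)
    inverse_sum = sum(1.0 / pairwise_recombination_fraction(loci[i], loci[j])
                      for i in range(n) for j in range(i + 1, n))
    return (n * (n - 1) / 2) / inverse_sum

# Example with hypothetical positions: two linked loci and one unlinked locus.
loci = [("2L", 0.0), ("2L", 0.5), ("3R", 0.1)]
print(per_locus_harmonic_means(loci), overall_harmonic_mean(loci))
```

Any pair with a zero recombination fraction would raise a division-by-zero error here, mirroring the fact that the harmonic mean is undefined in that case.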

Methods for Fig 3.

The ‘simulation’ points in Fig 3 derive from the same simulations as in Fig 2 under the D. melanogaster linkage map. For each locus, the minor-allele frequency was measured after 250 generations, having started at frequency . We also calculated each locus ’s harmonic mean recombination fraction with all other loci, —among the loci, the minimum and maximum values were and .

Because the dynamics at individual loci are noisy owing to drift, we binned loci according to their locus-specific harmonic mean recombination rates, into bins (486 loci), (388 loci), (118 loci), (6 loci), and (2 loci). The simulation points in Fig 3 are the average minor-allele frequency in each bin after 250 generations, plotted against the (arithmetic) average value of in each bin.
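The binning step can be sketched as follows. The bin edges in the example are hypothetical placeholders (the actual bin boundaries are not reproduced here), bins are assumed to be half-open intervals, and the function name is illustrative.

```python
def bin_loci(harmonic_r, final_freq, bin_edges):
    """Group loci into half-open bins [edge_k, edge_{k+1}) by their
    locus-specific harmonic mean recombination fraction.

    Returns one entry per bin: (number of loci, arithmetic mean of the
    harmonic-mean values, mean minor-allele frequency), or None if empty.
    """
    n_bins = len(bin_edges) - 1
    members = [[] for _ in range(n_bins)]
    for r, p in zip(harmonic_r, final_freq):
        for k in range(n_bins):
            if bin_edges[k] <= r < bin_edges[k + 1]:
                members[k].append((r, p))
                break
    summary = []
    for group in members:
        if not group:
            summary.append(None)
            continue
        count = len(group)
        mean_r = sum(r for r, _ in group) / count
        mean_p = sum(p for _, p in group) / count
        summary.append((count, mean_r, mean_p))
    return summary

# Example with made-up values for three loci and two hypothetical bins.
print(bin_loci([0.10, 0.15, 0.30], [0.20, 0.10, 0.05], [0.0, 0.2, 0.4]))
```

Averaging within bins trades resolution along the recombination axis for a reduction in the drift-induced noise of individual loci.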

For the predictions in Fig 3 that take into account non-equilibrium values of LD after the onset of stabilizing selection (purple dots), note that Eq. (35) defines the sequence across generations of expected LD values between loci and , starting with zero LD in generation 0 ():

(42)

In each generation , for each locus , we calculate its expected LD values with every other locus according to Eq. (42); we then sum these values and substitute the sums into Eqs. (15) and (16) to find the mean trait values experienced by the alleles at locus . These define the effective effect size at locus in generation ,

(43)

which then defines the expected change in allele frequency at locus from generation to :

(44)

We calculated the expected allele frequency after 250 generations for each locus, and averaged these predictions within each bin to obtain the purple points plotted in Fig 3.

Methods for S1, S2, S5, and S6 Figs.

In S1, S2, S5, and S6 Figs, minor-allele frequencies and effect sizes vary across loci. Initial minor-allele frequencies were chosen, independently for each locus, from a uniform distribution on , and effect sizes were chosen, independently for each locus and independent of minor-allele frequencies, from a normal distribution with mean and variance . Since effect sizes can be negative or positive here, we assigned the chosen effect size to the minor allele at each locus. The configuration results in an expected initial genetic variance of , and so we chose . In each replicate simulation, for the choice of minor-allele effect sizes and initial frequencies in that replicate, we calculated the initial mean trait value (which on average across replicates is zero) and defined this to be the optimal trait value in the replicate simulation; this procedure ensures that the population begins each replicate centered on its optimal trait value, consistent with our calculations above.
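The per-replicate setup described above can be sketched as follows. Several details are assumptions made for illustration: the bounds of the uniform distribution are hypothetical, the effect-size distribution is taken to be standard normal (as stated in the supporting figure captions), each copy of the minor allele is assumed to add its effect to a diploid additive trait while the major allele adds nothing, and the mean is computed from allele frequencies rather than from realized genotypes.

```python
import random

def draw_architecture(n_loci, maf_low, maf_high, rng):
    """Draw initial minor-allele frequencies and minor-allele effect sizes
    independently for each locus."""
    freqs = [rng.uniform(maf_low, maf_high) for _ in range(n_loci)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_loci)]
    return freqs, effects

def mean_trait_value(freqs, effects):
    """Expected diploid trait mean, assuming each minor-allele copy adds
    its effect (assumed coding); used as the replicate's optimum."""
    return sum(2.0 * p * a for p, a in zip(freqs, effects))

def genic_variance(freqs, effects):
    """Diploid additive genic variance under the same assumed coding."""
    return sum(2.0 * a * a * p * (1.0 - p) for p, a in zip(freqs, effects))

rng = random.Random(0)
freqs, effects = draw_architecture(1000, 0.1, 0.5, rng)  # hypothetical bounds
optimum = mean_trait_value(freqs, effects)
print(optimum, genic_variance(freqs, effects))
```

Setting the optimum to the replicate's own initial mean removes any directional component of selection at the start of the simulation, so that only the stabilizing component acts.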

Methods for S3 and S4 Figs.

The simulations used in S3 and S4 Figs are the same as those in Figs 1 and 2.

Supporting information

S1 Fig. Separation of timescales over which stabilizing selection reduces genetic variance by generating linkage disequilibrium (Bulmer effect; difference between genic and genetic variance) and selecting against minor alleles at polymorphic loci affecting the trait (reduction in the genic variance).

Here, loci are distributed along the human linkage map. Initial minor-allele frequencies were chosen from a uniform distribution on and effect sizes were chosen, independently, from a normal distribution with mean 0 and variance 1. The strength of selection was chosen such that initially. Trajectories are averaged across replicate simulations.

https://doi.org/10.1371/journal.pgen.1012035.s002

(EPS)

S2 Fig. Predictions of average allele-frequency trajectories, versus simulated trajectories, when effect sizes and initial minor-allele frequencies are allowed to differ, and all loci are unlinked.

Initial minor-allele frequencies were chosen from a uniform distribution on and effect sizes were chosen, independently, from a normal distribution with mean 0 and variance 1. The strength of selection was chosen such that initially. In each replicate simulation, the optimal trait value was taken to be the initial mean trait value in the population, given the chosen set of minor-allele effect sizes and initial frequencies. Otherwise, simulation details are as for Main Text Fig 1.

https://doi.org/10.1371/journal.pgen.1012035.s003

(EPS)

S3 Fig. Average change in minor-allele frequency at polymorphic loci affecting a trait under weak stabilizing selection, of a strength of the order of that acting on human height.

Simulation details are as in Main Text Figs 1 and 2, but the strength of selection was chosen such that initially, and the simulation trajectories were averaged over replicate trials rather than owing to the larger effect of genetic drift relative to selection in this case. Again, Eqs. (25) and (34), which take into account the LD generated by stabilizing selection, are seen to provide a closer approximation to the observed trajectories than Eqs. (5) and (10), which do not.

https://doi.org/10.1371/journal.pgen.1012035.s004

(EPS)

S4 Fig. Predicted changes in the genic and genetic variance, versus those observed in simulations.

For the genic variance predictions, we substituted Main Text Eqs. (5), (10), (25), or (34) into S1 Text Eq. (S5). Under this procedure, Eq. (25) (all loci unlinked; left panel) and Eq. (34) (loci distributed along the human and D. melanogaster genetic maps; right two panels), which take into account the LD generated by stabilizing selection, are seen to predict the change in the genic variance substantially better than Eqs. (5) and (10), which do not. Predictions of the change in the genetic variance are based on the predictions of the change in the genic variance, together with predictions of the change of the aggregate LD over time from Eq. (36) (see S1 Text Section S2). Since the predictions of the genetic variance necessarily involve LD, we only show predictions based on Eqs. (25) and (34), which are seen to be highly accurate. Simulation details are as in Main Text Figs 1 and 2.

https://doi.org/10.1371/journal.pgen.1012035.s005

(EPS)

S5 Fig. Predictions of average allele-frequency trajectories, versus simulated trajectories, when effect sizes and initial minor-allele frequencies are allowed to differ, and loci are distributed along the autosomal linkage map of Drosophila melanogaster in the same way as in Main Text Fig 2B.

Initial minor-allele frequencies were chosen from a uniform distribution on and effect sizes were chosen, independently, from a normal distribution with mean 0 and variance 1. The strength of selection was chosen such that initially. In each replicate simulation, the optimal trait value was taken to be the initial mean trait value in the population, given the chosen set of minor-allele effect sizes and initial frequencies. Otherwise, simulation details are as for Main Text Fig 1.

https://doi.org/10.1371/journal.pgen.1012035.s006

(EPS)

S6 Fig. Predicted versus simulated minor-allele frequencies after 500 generations of selection when allelic effect sizes vary across loci, plotted as a function of the per-locus squared effect sizes.

Loci are distributed along the human linkage map in the same way as in Fig 3A. Initial minor-allele frequencies were chosen from a uniform distribution on and effect sizes were chosen, independently, from a normal distribution with mean 0 and variance 1. The strength of selection was chosen such that initially. In each replicate simulation, the optimal trait value was taken to be the initial mean trait value in the population, given the chosen set of minor-allele effect sizes and initial frequencies. Otherwise, simulation details are as for Main Text Fig 1.

https://doi.org/10.1371/journal.pgen.1012035.s007

(EPS)

Acknowledgments

We are grateful to Jeremy Berg, Graham Coop, Andy Dahl, Serena Debesai, Marida Ianni-Ravn, Xinyi Li, Pavitra Muralidhar, John Novembre, and Yuval Simons for helpful discussions.

References

  1. Sella G, Barton NH. Thinking About the Evolution of Complex Traits in the Era of Genome-Wide Association Studies. Annu Rev Genomics Hum Genet. 2019;20:461–93. pmid:31283361
  2. Sanjak JS, Sidorenko J, Robinson MR, Thornton KR, Visscher PM. Evidence of directional and stabilizing selection in contemporary humans. Proc Natl Acad Sci U S A. 2018;115(1):151–6. pmid:29255044
  3. Simons YB, Bullaughey K, Hudson RR, Sella G. A population genetic interpretation of GWAS findings for human quantitative traits. PLoS Biol. 2018;16(3):e2002985. pmid:29547617
  4. Simons YB, Mostafavi H, Zhu H, Smith CJ, Pritchard JK, Sella G. Simple scaling laws control the genetic architectures of human complex traits. PLoS Biol. 2025;23(10):e3003402. pmid:41082512
  5. Koch E, Connally NJ, Baya N, Reeve MP, Daly M, Neale B, et al. Genetic association data are broadly consistent with stabilizing selection shaping human common diseases and traits. openRxiv. 2024. https://doi.org/10.1101/2024.06.19.599789
  6. Patel RA, Weiß CL, Zhu H, Mostafavi H, Simons YB, Spence JP, et al. Characterizing selection on complex traits through conditional frequency spectra. Genetics. 2025;229(4):iyae210. pmid:39691067
  7. Bulmer MG. The Effect of Selection on Genetic Variability. The American Naturalist. 1971;105(943):201–11.
  8. Wright S. The analysis of variance and the correlations between relatives with respect to deviations from an optimum. J Genet. 1935;30(2):243–56.
  9. Robertson A. The effect of selection against extreme deviants based on deviation or on homozygosis. J Genet. 1956;54(2):236–48.
  10. Lande R. The maintenance of genetic variability by mutation in a polygenic character with linked loci. Genet Res. 1975;26(3):221–35. pmid:1225762
  11. Lande R. The influence of the mating system on the maintenance of genetic variability in polygenic characters. Genetics. 1977;86(2 Pt. 1):485–98. pmid:407132
  12. Turelli M. Heritable genetic variation via mutation-selection balance: Lerch’s zeta meets the abdominal bristle. Theor Popul Biol. 1984;25(2):138–93. pmid:6729751
  13. Turelli M, Barton NH. Dynamics of polygenic characters under selection. Theoretical Population Biology. 1990;38(1):1–57.
  14. Turelli M, Barton NH. Genetic and statistical analyses of strong selection on polygenic traits: what, me normal?. Genetics. 1994;138(3):913–41. pmid:7851785
  15. Bulmer MG. The Mathematical Theory of Quantitative Genetics. Oxford: Clarendon Press. 1980.
  16. Bürger R. The Mathematical Theory of Selection, Recombination, and Mutation. New York: John Wiley & Sons. 2000.
  17. Walsh B, Lynch M. Evolution and Selection of Quantitative Traits. New York and Oxford: Oxford University Press. 2018.
  18. Kirkpatrick M, Johnson T, Barton N. General models of multilocus evolution. Genetics. 2002;161(4):1727–50. pmid:12196414
  19. Ulizzi L, Terrenato L. Natural selection associated with birth weight. VI. Towards the end of the stabilizing component. Ann Hum Genet. 1992;56(2):113–8. pmid:1503392
  20. Lande R. NATURAL SELECTION AND RANDOM GENETIC DRIFT IN PHENOTYPIC EVOLUTION. Evolution. 1976;30(2):314–34. pmid:28563044
  21. Hayward LK, Sella G. Polygenic adaptation after a sudden change in environment. Elife. 2022;11:e66697. pmid:36155653
  22. Wright S. Evolution in Mendelian Populations. Genetics. 1931;16(2):97–159. pmid:17246615
  23. Bulmer MG. Linkage disequilibrium and genetic variability. Genet Res. 1974;23(3):281–9. pmid:4435356
  24. Veller C, Coop GM. Interpreting population- and family-based genome-wide association studies in the presence of confounding. PLoS Biol. 2024;22(4):e3002511. pmid:38603516
  25. Turelli M. PHENOTYPIC EVOLUTION, CONSTANT COVARIANCES, AND THE MAINTENANCE OF ADDITIVE VARIANCE. Evolution. 1988;42(6):1342–7. pmid:28581080
  26. Kingsolver JG, Hoekstra HE, Hoekstra JM, Berrigan D, Vignieri SN, Hill CE, et al. The strength of phenotypic selection in natural populations. Am Nat. 2001;157(3):245–61. pmid:18707288
  27. Johnson T, Barton N. Theoretical models of selection and mutation on quantitative traits. Philos Trans R Soc Lond B Biol Sci. 2005;360(1459):1411–25. pmid:16048784
  28. Stulp G, Pollet TV, Verhulst S, Buunk AP. A curvilinear effect of height on reproductive success in human males. Behav Ecol Sociobiol. 2012;66(3):375–84. pmid:22389549
  29. Veller C, Kleckner N, Nowak MA. A rigorous measure of genome-wide genetic shuffling that takes into account crossover positions and Mendel’s second law. Proc Natl Acad Sci U S A. 2019;116(5):1659–68. pmid:30635424
  30. Abdellaoui A, Yengo L, Verweij KJH, Visscher PM. 15 years of GWAS discovery: Realizing the promise. Am J Hum Genet. 2023;110(2):179–94. pmid:36634672
  31. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019;51(4):584–91. pmid:30926966
  32. Mostafavi H, Harpak A, Agarwal I, Conley D, Pritchard JK, Przeworski M. Variable prediction accuracy of polygenic scores within an ancestry group. Elife. 2020;9:e48376. pmid:31999256
  33. Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, et al. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet. 2024;25(1):8–25. pmid:37620596
  34. Wang Y, Guo J, Ni G, Yang J, Visscher PM, Yengo L. Theoretical and empirical quantification of the accuracy of polygenic scores in ancestry divergent populations. Nat Commun. 2020;11(1):3865. pmid:32737319
  35. Carlson MO, Rice DP, Berg JJ, Steinrücken M. Polygenic score accuracy in ancient samples: Quantifying the effects of allelic turnover. PLoS Genet. 2022;18(5):e1010170. pmid:35522704
  36. Yair S, Coop G. Population differentiation of polygenic score predictions under stabilizing selection. Philos Trans R Soc Lond B Biol Sci. 2022;377(1852):20200416. pmid:35430887
  37. Hu S, Ferreira LAF, Shi S, Hellenthal G, Marchini J, Lawson DJ, et al. Fine-scale population structure and widespread conservation of genetic effect sizes between human groups across traits. Nat Genet. 2025;57(2):379–89. pmid:39901012
  38. Uricchio LH, Zaitlen NA, Ye CJ, Witte JS, Hernandez RD. Selection and explosive growth alter genetic architecture and hamper the detection of causal rare variants. Genome Res. 2016;26(7):863–73. pmid:27197206
  39. Uricchio LH. Evolutionary perspectives on polygenic selection, missing heritability, and GWAS. Hum Genet. 2020;139(1):5–21. pmid:31201529
  40. Durvasula A, Lohmueller KE. Negative selection on complex traits limits phenotype prediction accuracy between populations. Am J Hum Genet. 2021;108(4):620–31. pmid:33691092
  41. Yengo L, Vedantam S, Marouli E, Sidorenko J, Bartell E, Sakaue S, et al. A saturated map of common genetic variants associated with human height. Nature. 2022;610(7933):704–12. pmid:36224396
  42. O’Connor LJ, Sella G. Principled measures and estimates of trait polygenicity. bioRxiv. 2025;:2025.07.10.664154. pmid:40791441
  43. Charlesworth B. Stabilizing selection, purifying selection, and mutational bias in finite populations. Genetics. 2013;194(4):955–71. pmid:23709636
  44. Ragsdale AP. Mean fitness is maximized in small populations under stabilizing selection on highly polygenic traits. bioRxiv. 2025;:2025.11.17.688329. pmid:41332579
  45. Zhang MJ, Durvasula A, Chiang C, Koch EM, Strober BJ, Shi H, et al. Pervasive correlations between causal disease effects of proximal SNPs vary with functional annotations and implicate stabilizing selection. medRxiv. 2023;:2023.12.04.23299391. pmid:38106023
  46. Horwitz TB, Balbona JV, Paulich KN, Keller MC. Evidence of correlations between human partners based on systematic reviews and meta-analyses of 22 traits and UK Biobank analysis of 133 traits. Nat Hum Behav. 2023;7(9):1568–83. pmid:37653148
  47. Border R, Athanasiadis G, Buil A, Schork AJ, Cai N, Young AI, et al. Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science. 2022;378(6621):754–61. pmid:36395242
  48. Jiang Y, Bolnick DI, Kirkpatrick M. Assortative mating in animals. Am Nat. 2013;181(6):E125-38. pmid:23669548
  49. Wright S. Systems of mating. III. Assortative mating based on somatic resemblance. Genetics. 1921;6(2):144–61.
  50. Crow JF, Felsenstein J. The effect of assortative mating on the genetic composition of a population. Eugen Q. 1968;15(2):85–97. pmid:5702332
  51. Yengo L, Robinson MR, Keller MC, Kemper KE, Yang Y, Trzaskowski M, et al. Imprint of assortative mating on the human genome. Nat Hum Behav. 2018;2(12):948–54. pmid:30988446
  52. Border R, O’Rourke S, de Candia T, Goddard ME, Visscher PM, Yengo L, et al. Assortative mating biases marker-based heritability estimators. Nat Commun. 2022;13(1):660. pmid:35115518
  53. Stulp G, Simons MJP, Grasman S, Pollet TV. Assortative mating for human height: A meta-analysis. Am J Hum Biol. 2017;29(1):e22917. pmid:27637175
  54. Haller BC, Messer PW. SLiM 4: Multispecies Eco-Evolutionary Modeling. Am Nat. 2023;201(5):E127–39. pmid:37130229
  55. Negm S, Veller C. The effect of long-range linkage disequilibrium on allele-frequency dynamics under stabilizing selection. Dryad. 2026.
  56. Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, Jonasdottir A, et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature. 2010;467(7319):1099–103. pmid:20981099
  57. Comeron JM, Ratnappan R, Bailin S. The many landscapes of recombination in Drosophila melanogaster. PLoS Genet. 2012;8(10):e1002905. pmid:23071443