A population genetic interpretation of GWAS findings for human quantitative traits

Human genome-wide association studies (GWASs) are revealing the genetic architecture of anthropomorphic and biomedical traits, i.e., the frequencies and effect sizes of variants that contribute to heritable variation in a trait. To interpret these findings, we need to understand how genetic architecture is shaped by basic population genetics processes—notably, by mutation, natural selection, and genetic drift. Because many quantitative traits are subject to stabilizing selection and because genetic variation that affects one trait often affects many others, we model the genetic architecture of a focal trait that arises under stabilizing selection in a multidimensional trait space. We solve the model for the phenotypic distribution and allelic dynamics at steady state and derive robust, closed-form solutions for summary statistics of the genetic architecture. Our results provide a simple interpretation for missing heritability and why it varies among traits. They predict that the distribution of variances contributed by loci identified in GWASs is well approximated by a simple functional form that depends on a single parameter: the expected contribution to genetic variance of a strongly selected site affecting the trait. We test this prediction against the results of GWASs for height and body mass index (BMI) and find that it fits the data well, allowing us to make inferences about the degree of pleiotropy and mutational target size for these traits. Our findings help to explain why the GWAS for height explains more of the heritable variance than the similarly sized GWAS for BMI and to predict the increase in explained heritability with study sample size. Considering the demographic history of European populations, in which these GWASs were performed, we further find that most of the associations they identified likely involve mutations that arose shortly before or during the Out-of-Africa bottleneck at sites with selection coefficients around s = 10−3.


1.
The model

Absorbing the environmental contribution into the fitness function
Here, we show that the additive environmental contribution to the phenotype can be absorbed into the fitness function, which justifies considering only the additive genetic contribution in our analysis. This result has been derived multiple times for the one dimensional case (e.g., 1). The argument in the multi-dimensional case is similar and included for completeness.
First, assume that the additive environmental contribution to the phenotype, ) , is Given that absolute fitness is defined up to a multiplicative constant, we can therefore absorb the additive environmental contribution by using the Gaussian fitness function where " = " + ) . Even when the environmental contribution is anisotropic, we can always choose a coordinate system in which the effective fitness function takes an isotropic form around the fitness peak (Eq. 1, which appears in the Model section of the main text).

The distribution of mutational effect sizes on a given trait
In the main text, we define the distribution of phenotypic effects of newly arising mutations in the n-dimensional trait space, . Here, we consider the projection of these effects on a given trait, / , taken without loss of generality to be on the 1 st dimension. The distribution of effect sizes on a focal trait will depend on the degree of pleiotropy, n, and the form of this dependency becomes important when we consider how pleiotropy affects genetic architecture.
We want to calculate the distribution of effect sizes on the focal trait, / , conditional on their overall effect, = . We assume that the distribution of effects of de novo mutations is isotropic in trait space. The effect of a mutation, , therefore has equal probability to occupy any point on an n-dimensional sphere with radius . Let S I denote the surface area of an m-dimensional sphere of radius and denote the angle between the vector and its projection / , i.e., / = cos . In these terms, the surface area element corresponding to angle is and by a change of variables, the surface area element corresponding to projection / on the focal trait is (for a similar derivation, see (2)).
Next, we consider the high pleiotropy limit form of this distribution. For any degree of pleiotropy, the symmetry of the mutational distribution implies that As we elaborate in the main text, important implications about quantitative genetic variation follow from this high pleiotropy limit. The limit also holds quite generally when the distribution of effect sizes is anisotropic (see Section 5.4). 5

Solving for summaries of genetic architecture
Here, we derive closed forms for summaries of genetic architecture under our model. We begin by deriving the first two moments of change in allele frequency in a single generation. With these moments at hand, we use the diffusion approximation to calculate the sojourn time for alleles that contribute to quantitative genetic variation (3). Together with the distribution of effect sizes derived in the previous section, the sojourn time allows us to obtain closed forms for summaries of genetic architecture. Specifically, we can obtain a closed form for any summary that can be described as a function of allele frequencies and effect sizes at sites contributing to quantitative genetic variation. We use these expressions to calculate the summaries used in the main text, for example the expected additive genetic variance and its distribution across sites.

The first two moments of change in allele frequency
We assume that: • The phenotypic distribution at steady state is well approximated by an isotropic multivariate normal distribution centered at the optimum, namely by the probability density = / "0s 4  • Both " and " ≪ " .
These assumptions are justified in Section 3.3.
We rely on these assumptions to calculate the first two moments of change in frequency in a single generation for an allele with phenotypic effect and frequency q. The fitnesses of the three genotypes at the site depend on its distribution of genetic backgrounds, i.e., on the total phenotypic contribution of sites other than the focal one, . Following Eq. A10 and assuming every allele contributes only a small proportion of the genetic variance, the distribution of is well approximated by . (A14) The first moment of change in allele frequency is then relying on our assumptions that " and " ≪ " . The functional form of the first moment is equivalent to that of the standard viability selection model with under-dominance and selection coefficient = S 4 A 4 or scaled selection coefficient Similarly, we find that which is the standard second moment with genetic drift.

Sojourn time
Based on the first two moments, we can use the diffusion approximation to calculate the sojourn time as a function of allele frequency, i.e., the density of the time that an allele spends at a given frequency before it fixes or is lost (3). For a mutant allele with initial frequency 1/2 and scaled selection coefficient S, the sojourn time is where erf is the error function and f ± , ≡ erf 2 ± erf 1 − 2 2 .
The sojourn time takes simple limiting forms when selection is effectively neutral ( ≪ 1) or strong ( ≫ 1). In the effectively neutral range, it is well approximated by τ = 2/q, and in the strongly selected range, it is well approximated by τ = 2 exp − / .

Calculating expectations of summaries of architecture
Many summaries of interest can be expressed as sums over segregating sites of some function ( , / ), where is the derived allele frequency and / is the effect size on the The total additive genetic variance over all sites is The closed form for E in Eq. A24 was integrated numerically to obtain where erfi is the imaginary error function (erfi ≡ erf / ).
In the effectively neutral and strong selection limits, we can use limit forms of the sojourn time to derive simple approximations for . In the effectively neutral limit, i.e., when ≪ 1, τ ≈ 2/ and therefore In practice, this expression provides a decent approximation when < 1 (Fig. A1a). In the strong selection limit, when ≫ 1, τ ≈ 2 exp(− ) / and therefore (A28) In practice, this expression provides a decent approximation when > 30 (Fig. A1a). The constant 2 " , which recurs in our derivations (e.g., Eq. A24), thus has a simple interpretation: it is the expected contribution of strongly selected sites to additive genetic variance (per unit mutational input). We therefore denote it by ¦ , and henceforth measure variance in these units.

Densities
Here, we consider how the additive genetic variance is distributed among sites. We begin by deriving a closed form for the density of segregating sites with a given contribution to variance . This density follows from substituting Dirac's delta function δ − 2 / " 1 − into Eq. A23: where q ± , / = / " 1 ± 1 − 2 / / " are the two frequencies for which = 2 / " 1 − .
This integral can be calculated numerically for any S and degree of pleiotropy n (by using the corresponding density η / ). Moreover, as we illustrate below, summary statistics of the distribution of variances among sites can be expressed and calculated in terms of integrals over the density ( | ).
We can greatly simplify the expression for ( | ) in the limits of effectively neutral and strong selection, and especially in the cases without pleiotropy or with extensive pleiotropy. When selection is effectively neutral ( ≪ 1), then τ ≅ 2/ and thus where IS' ≡ /8 is the maximal contribution to variance for a mutation with selection coefficient , which is obtained when both alleles have frequency ½. When the degree of pleiotropy is high ( ≫ 1), / is approximately normally distributed with mean 0 and variance /4 (Eq. 11, which appears in the Results section of the main text) and the expression for the density simplifies to When selection is strong, derived alleles are rare ( ≪ 1), implying that ≪ / " and ≈ /2 / " , and that the sojourn time is well approximated by τ = 2 exp(− )/ . The density ( | ) then simplifies to Without pleiotropy, this expression further simplifies to and when the degree of pleiotropy is high ( ≫ 1), then We are especially interested in the distribution of variances among sites that exceed some threshold contribution * . As we discuss in the main text and in Section 6, to a first approximation, the loci identified in a GWAS would be those with contributions to additive variance that exceed the study's threshold contribution * . In particular, our inferences based on GWAS data rely on fitting the probability density of the number of segregating sites with variance that exceed a given threshold contribution * (Section 7). This probability density is: where is the expected number of segregating sites with contributions to variance exceeding * per unit mutational input.
In our analysis, we focus on the expected proportion of additive genetic variance arising from sites that exceed a threshold contribution * , which approximates the heritable variance explained in GWAS. This proportion is given by Given a distribution of selection coefficients, f( ), the corresponding proportion is 12 The dependences of the proportion of variance G * and the number of sites K * on the strength of selection for cases without pleiotropy and with extensive pleiotropy are shown in Figs. 3 and A3, respectively. We rely on Eqs. A32, A33, A35, A36, A38 and A39 to derive simplified forms for these summaries in the effectively neutral and strongly selected limits (Table A1). While the expressions for the effectively neutral limit were derived for ≪ 1, in practice they provide a decent approximation when < 1 (Fig. A4a & b). In the strong selection limit ( ≫ 1), the expressions for the case without pleiotropy provide a decent approximation for > 30 (Fig. 3a), whereas with extensive pleiotropy they already work quite well for > 5 (Fig. 3b).

Selection
Effectively neutral ( ≪ 1) Strongly selected ( ≫ 1) is an exponential integral and artanh is the inverse hyperbolic tangent. Figure A3. The number of loci per Mb contributing more than * to the variance, as a function of * , in the case without pleiotropy, = 1 (a), and in the high pleiotropy limit, ≫ 1 (b). We assume a constant population size of 20,000, with a mutation rate of 1.2 ⋅ 10 8À (5). Threshold contribution to variance (! * ) (in units of ! # ) Number of loci > ! * Number of loci > ! * Figure A4. The proportion of additive genetic variance that arises from sites that contribute more than the value on the x-axis, for a single trait (a) and in the pleiotropic limit (b). We show the x-axis in units of IS' = /8 ⋅ ¦ in order to evaluate the approximations in the effectively neutral limit (in dashed black; Eqs. 14 & 16, which appear in the Results section of the main text); note that IS' is not the maximal variance in cases with pleiotropy.
Both the proportion of variance, G * | , and number of variants, K * | , appear to always increase with the degree of pleiotropy, n (Fig. A5). We do not have a proof for this property but can suggest an intuitive explanation. Without pleiotropy (n=1), the selection coefficient determines the effect size, such that any contribution * to genetic variance corresponds to a specific minor allele frequency * . The sites with contributions > * are therefore those with minor allele frequencies > * . Pleiotropy causes sites with a given selection coefficient to have a distribution of effect sizes on the trait under consideration.
As a result, some sites with frequencies above * end up with contributions to variance below * while others exceed * . To understand how this affects G * | , recall that for any selection coefficient, the density of variants always rapidly increases as * decreases. As long as the contribution * and the corresponding frequency without pleiotropy * are not close to 0, we may therefore expect that introducing pleiotropy would result in pushing more sites above * than below * , resulting in a net increase to the proportion G * | .
For the same reasons, the number of variants with > * shows a similar behavior and also grows with .  % variance from loci > ! * % variance from loci > ! * Figure A5. The effect of pleiotropy on the proportion of the variance explained by sites contributing more then * to the variance, G( * | ) (see Eq. A39). For all selection coefficients, the proportion of variance explained increases as the number of traits, i.e., the degree of pleiotropy, increases.

Comparing predictions against simulations
We tested our theoretical derivations for the total genetic variance and its distribution among sites against forward computer simulations. The code and documentation are available at https://github.com/sellalab/GenArchitecture. The simulation implements the model as specified in the main text, with the following additional details and one exception.
First, we assume the infinite sites model for mutation. Second, the distribution of scaled selection coefficients, or equivalently the distribution of mutation sizes (see Eq. 7, which appears in the Results section of the main text), is taken to be a Gamma distribution, with specified parameters (see below). For computational efficiency, we use fecundity rather than viability selection; however, we ran a smaller number of simulations to verify that this choice does not lead to a detectable difference in the results. Each simulation is run for a burn-in period of 10 generations, to ensure convergence to the steady state behavior, before the variances at segregating sites are measured.
We explore a range of parameter values chosen to balance biological plausibility (see  simplicity, we take " = 1, which is equivalent to choosing the units used to measure effect sizes. The simulation results for the total genetic variance and its distribution among sites are in close agreement with our theoretical predictions (Fig. A6). Specifically, within the parameter ranges that we assume, i.e., when 1 2 ≪ " / " ≪ 1 (see Section 4), the total genetic variances measured in simulations are indistinguishable from our predictions ( Fig. A6a). Moreover, simulations and prediction seem to agree even when " / " ≤ 1/2N, although we consider this range to be less relevant, given our focus on highly polygenic traits (see Section 4.4). We also compare simulated and predicted distributions of variances among sites, in terms of G , the proportion of the variance arising from sites that contribute more than (Eq. A40), and find them to be in close agreement (Fig. A6b). Figure A6. Testing theoretical predictions for genetic variance against simulation results. (a) Total genetic variance (in units of " ) as a function of the mutation rate. The biologically relevant range of mutation rates, 1/2 ≪ ≪ 1, and the range in which we expect our predictions to be valid, " /2 ≪ " ≪ " , are marked by a grey box. (b) The distribution of variances among sites, for = 0.01; G( * ) is the proportion of variance from sites contributing more than * (Eq. A40). Error bars represent one standard deviation. For each set of parameters, the number of simulations was chosen to obtain standard deviations below 10%. In practice, we often obtain much smaller standard deviations, which is why most error bars are too small to be visible.
We ran two additional variations on the basic simulation procedure (also available at https://github.com/sellalab/GenArchitecture): one to explore the effects of a shift in the 16 optimal phenotype (Section 5.1 and Fig. A7) and another to explore the effects of asymmetric mutational input (Section 5.2 and Fig. A8). To these ends, for simplicity, we consider the case without pleiotropy, i.e., with = 1. In the first, after the 10 generations burn-in period, we introduce a shift in the optimal phenotype, and trace the allelic behavior over an additional 4,000 generations (see Section 5.1). In the second, after the 10 generations burn-in period, we introduce asymmetry in the rates of trait increasing and decreasing mutations, and trace the allelic trajectories over an additional 10,000 generations. The parameters of these simulations are detailed in Sections 5.1 and 5.2, respectively.

Justification for assumptions
Here, we justify the assumptions that we relied upon in deriving the first two moments of change in allele frequency (see Section 2.1; modeling assumptions are motivated in the introduction of the main text). We rely in part on self-consistency arguments, which should not be mistaken for being circular: specifically, we make assumptions about the behavior of the system and show that the solution to which we arrive satisfies these assumptions.

Normal and isotropic phenotypic distribution around the optimum
The assumption that the phenotypic distribution is well approximated by a normal distribution stems from the additive model of quantitative traits. By assuming that the phenotype arises from many additive contributions and that these additive contributions arise from some underlying distribution, normality follows from the law of large numbers.
In terms of model parameters, we would expect normality to hold if the rate of mutations affecting the trait is sufficiently large, i.e., when 2 ≫ 1.
We further assume that the phenotypic distribution is isotropic and its mean is at the optimum. Isotropy of the phenotypic distribution follows from assuming isotropy in the mutational input. In Section 5.4, we explore the consequences of anisotropy in the mutational input. In Section 4.4, we further show that the fluctuations of the mean phenotype around the optimum over time have negligible effects on allelic dynamics; a similar argument applies to fluctuations in the variance.

The phenotypic variance satisfies ≪
With the mean phenotype centered at the optimum, requiring that " ≪ " is equivalent to assuming that moving a standard deviation away from the mean phenotype entails only a minor reduction in fitness. This seems plausible for many phenotypes: if, for example, this assumption did not hold for human height, then individuals whose height is a standard deviation or more away from the population mean would suffer a substantial reduction in fitness. Arguably, deviations from the mean height would then be recognized as a very common and severe disease.
Another line of argument that it is likely that " ≪ " is based on our results. If we assume that mutations are strongly selected, then our results suggest that " = 2 It follows that if the rate of mutations affecting the phenotype under consideration satisfies ≪ 1 then " ≪ " . The number of mutations per diploid human genome per generation is estimated to be ~60 (5), and less than 10% of the genome is assumed to be functional (6), suggesting that the number of de novo mutations with any effect on function is less than 3 per haploid per generation. It then seems plausible that the (haploid) mutation rate affecting a specific trait satisfies ≪ 1. Assuming that mutations are weakly selected increases the variance in Eq. A41 only moderately and assuming the mutations are effectively neutral would suggest it is much smaller, leaving the above argument intact.

Mutational effect sizes satisfy ≪
As we argued in the introduction of the main text, variants for which the stronger condition " ≪ " holds account for most or all of the heritability explained in GWAS for many traits (e.g., 7, 8-10). Moreover, evidence for many traits suggests that the same is true for the variants that underlie the heritability that remains to be explained (11)(12)(13)(14). Indeed, for this assumption to be violated, much of the genetic variance would have to arise from mutations that have a very large impact on fitness (i.e., with s on the order of 1). While this may be the case for some diseases (e.g., autism (15)), it does not appear to be the case for most quantitative traits that have been examined.

Deviations of the mean phenotype from the optimum can be neglected
In reality, the mean phenotype of the population fluctuates around the optimum. Here, we derive equations for the dynamic of the mean phenotype in order to estimate the magnitude and timescale of these fluctuations. We then show that these fluctuations have a negligible effect on the first two moments of change in allele frequency and thus on the results that follow from these moments.
We begin by deriving the first and second moment of change in mean phenotype. To this end, we assume the distribution of phenotypes is a multivariate normal centered around a mean phenotype, , i.e. that The expected change in mean phenotype due to selection in one generation is therefore E Δ = By the same token, the variance in Δ is simply the sampling variance where in both cases we relied on the assumption that " ≪ " .
These two moments define an Ornstein-Uhlenbeck process in , allowing us to rely on wellknown results (16). Notably, when the mean phenotype starts far from the optimum, it decays exponentially to the optimum with exponent " / " (see Section 5.1 below). At steady state, will fluctuate with mean zero and E " = " 2 over a time scale of " " generations. The typical displacement of in any given direction will be " 2 , reflecting a balance between drift and the pull of selection toward the optimum.
Next we show that these fluctuations of the mean have negligible effects on allelic trajectories. To this end, we derive the first two moments of change in allele frequency, but this time, we include the effect of the displacement of from the optimum. While the second moment remains the same, the first moment becomes where S is 's component in the direction of . However, our analysis establishes that is a scalar on the order of 1, which fluctuates around zero on a timescale of " " .
We can therefore compare the first term in the above equation, which represents directional selection, and the second term, which represents stabilizing selection. When stabilizing selection is strong, ≫ 1, the stabilizing selection term dominates over the directional selection term. In contrast, when selection is weak, i.e., ≈ 1 or smaller, then in any given generation, the directional term is not necessarily negligible. However, in this case, both terms affect substantial change in allele frequency only over a timescale of 2 generations; on this timescale, if 2 ≫ " " , the directional effect would average to zero.
For " to be that small, virtually all alleles must have ≪ 1, such that their trajectories will be determined by drift, not selection. In summary, regardless of the selection acting on an allele, fluctuations of the mean phenotype around the optimum will have a negligible effect on its trajectory. 20

Model robustness
In this section, we consider the sensitivity of our results to relaxing some of the simplifying modeling assumptions about selection and mutation. Specifically, we show our results to be robust to moderate changes to the optimal phenotype; small asymmetry in the mutational input; the presence of major loci maintained at high frequency by selection on traits that are not included in the model; as well as to most forms of anisotropic mutation.

Changes to the optimal phenotype
We first consider how changes to the optimal phenotype over time would affect our results.
It is easy to imagine how events such as migration from Africa to Europe or the onset of agriculture may have introduced rapid changes in optimal phenotypes. In order to evaluate the potential impact of such events, we consider how an instantaneous change to the optimal phenotype would affect the allelic dynamics. Similar models have recently been analyzed in the limit of infinite population size (17,18).
We begin by considering how such an instantaneous change to the optimum would affect the mean phenotype. If the shift to the optimum is small, on the order of the fluctuations in the mean phenotype at steady state or smaller, then the arguments provided in Section 4.4 will still hold and the shift would have a negligibly small effect on our results. We therefore assume that the shift in optimum, , is large compared to the scale of fluctuations ( " ≫ " /2 ). This assumption means that we can use a deterministic approximation (based on Eq. A43) and describe the change in mean phenotype in a single generation by (neglecting higher moments). Further assuming that the mean phenotype was at the optimum, 0, before the optimum shifted (at time = 0) and neglecting changes to the genetic variance , we find that Thus, the mean adapts to the new optimum on a timescale of " / " generations (see (19) for a similar derivation).
We can rely on this approximation to learn when a shift in optimum will have negligible effects on allele trajectories. Recalling Eq. A45, the first moment of change in allele frequency is given by where, based on our approximation (Eq. A47), the directional selection term introduced by the shift in optimum takes the time-dependent form The effect of this directional term over the entire adaptive trajectory can be quantified by comparing the expected allele frequency after adaptation to the shift, Ë , with initial frequency before the shift, ~, i.e., This result suggests that the relative change in allele frequency will be negligible so long as This condition suggests that mutations with smaller effects would be less affected by the shift in the optimum. It further suggests that alleles that satisfy " ≪ " , as appears to be the case for most loci discovered in GWAS (e.g., 7, 8-10), will be negligibly affected by shifts in optimum on the order of the total genetic variation (i.e., ≤ ). These analytic predictions are confirmed by simulations (Fig. A7). Figure A7. Distribution of the contributions of sites to variance after a shift in the optimum. The y-axis is the proportion of the variance explained by sites that contribute more than * to the variance. The theoretical prediction without adaptation is shown in dashed black, and simulation results for different shifts in the optimal phenotype are shown in color. When the root mean square of ( ⋅ )/ " becomes larger than 1, directional selection substantially affects allele frequencies and therefore the contributions of sites to variance, as predicted by Eq. A51. (Since mutation is symmetric the mean of ( ⋅ )/ " is zero and we quantify its characteristic value by its root mean square " / " .) Simulations were run with an exponential distribution of selection coefficients with E( ) = 25, = 1,000, = 1, = 0.01, and a burn-in time of 10,000 generations. Results were taken 50 generations after the shift in optimum, which, for these parameters, is just after the population mean has reached the new optimum.

Asymmetric mutational input
In this section, we consider the sensitivity of our results to asymmetries in the mutational input, i.e., to the case in which mutations in a given direction in trait space are more likely to arise than mutations in the opposite direction (see (20) for treatment of this problem in the limit of high per-site mutation rate).
An asymmetric mutational input introduces a shift in the mean phenotype, , every generation. With new mutations arising at frequency 1/2 , the expected shift is where E Î is the expectation over newly arising mutations. For each trait, effects have a characteristic size E " = ¦ E( /4). The characteristic effect size sets the scale for the maximal shift in any direction, that is Δ Ï is of the order of 2 E " or smaller.
We therefore parameterize the shift in mean phenotype due to new mutations by where the vector parameterizes the strength and direction of the bias and = is assumed to be << 1.
At steady state, the mutational shift must be offset by selection, such that where Δ Ñ and Δ X are the expected shifts due to directional and stabilizing selection, respectively. We previously found that the expected directional shift is where denotes the mean phenotype (see Eq. A43). As we show next, when mutations are strongly selected, stabilizing selection offsets the mutational shift to maintain the mean phenotype at the optimum, implying that directional selection is negligible. In contrast, when mutations are effectively neutral, stabilizing selection is negligible and a directional term might not be negligible by comparison. However, as long as asymmetry is small, ≪ 1, we show that this directional term is not large enough to change the allele dynamics, 23 both when all mutations are effectively neutral and when some mutations are strongly selected.
First, we consider the shift in mean phenotype due to stabilizing selection. This shift arises because, with asymmetric mutational input, the distribution of phenotypes becomes skewed. Therefore, stabilizing selection may change the mean phenotype even if it is at the optimum. We have already shown (Eq. A15) that the expected change in allele frequencies per generation due to stabilizing selection at any given site i is The expected change in mean phenotype can then be calculated by adding up the contributions over sites The right-hand side of this equation reflects the skewness of the phenotypic distribution.
Indeed, in one dimension, it can be shown that with µ Ö ( ) being the third central moment of the phenotypic distribution. In n-dimensions, for every direction , When sites are under strong selection, Δ X takes a simple form. Assuming the asymmetry is small, the shift due to stabilizing selection can be expanded in orders of . The leading term in the frequency distribution takes the same form as it does without the bias. For strongly selected alleles with no bias, ≪ 1 and therefore the frequency dependence in this term can be approximated by . Moreover, q scales with 1/a 2 , implying that the distribution of " is independent of and that E " = " / (see Section 3.1).
Therefore, when all sites are strongly selected, the leading term in the shift due to stabilizing selection is Thus, to a first order in , the shift of the mean phenotype due to stabilizing selection offsets the mutational shift, implying that there will be no directional term and that the allele dynamics will not be affected by asymmetry.
When alleles are instead effectively neutral, then " / " ≪ 1/2 (see Section 2.2) and allele frequencies are well approximated by the neutral sojourn time, τ ≈ 2/ . The shift due to stabilizing selection then satisfies implying that it makes a negligible contribution to offsetting the mutational shift. In this case, the mutational effect on the mean phenotype is therefore offset by directional selection, where indicating a displacement of the mean phenotype from the optimum This displacement introduces a directional selection term into the first moment of change in allele frequency that, if large enough, could alter allele dynamics (see Section 4.4).
However, when all alleles are effectively neutral, we have and therefore the scaled directional selection coefficient, for an allele with effect size and scaled stabilizing selection coefficient = 2 with S~/ being the projection of in the direction of . Since ≪ 1, for all alleles other than those with unusually large selection coefficients, the scaled directional selection coefficient will be much smaller than 1 and the trajectories will still be determined by drift and not selection. Even in this case, therefore, we do not expect asymmetry to affect allele dynamics.
Next, we consider the case where there is a mix of effectively neutral and strongly selected mutations. The existence of strongly selected mutations in addition to effectively neutral 25 ones reduces the deviation of the mean phenotype from the optimum. Denoting the proportion of strongly selected mutations by ¦ , we have where E ‹.ž. ≤ 1 is the mean scaled stabilizing selection coefficient for effectively neutral mutations. Since " > 2 ¦ ¦ , we can then obtain an upper bound to the magnitude of scaled directional selection coefficient for an allele with effect size and scaled stabilizing With a substantial proportion of strongly selected sites, (1 − ¦ )/2 ¦ is of the order of 1, and therefore This condition implies that for effectively neutral alleles (i.e., ≤ 1), the scaled directional selection coefficient is ≪ 1 and allele trajectories will be determined by genetic drift, whereas for strongly selected alleles (i.e., when ≫ 1), the scaled directional selection coefficient is ≪ and therefore negligible compared to the scaled stabilizing selection coefficient.
Weakly selected alleles (with 1 < < 30) behave largely like strongly selected alleles except that stabilizing selection on them only partially cancels out the mutational bias (for example, for = 10 only 85% of the bias is canceled). The rest of the bias is canceled by directional selection and therefore induces a small shift in the mean phenotype. It is straightforward to repeat the arguments given above and show that the shift in the mean phenotype for a trait with only weakly selected alleles or a mixture that includes weakly selected alleles negligibly affects allele trajectories.
Thus, we conclude that small asymmetry in mutation will not affect the allelic dynamic (see  The mean phenotype , in units of ¦ , as a function of mutational bias . Simulations were run with = 1,000, = 1 and with different mixtures of effectively neutral (exponentially distributed with ( ) = 0.1) and strong (exponentially distributed with ( ) = 50) selection coefficients. Asymmetry was simulated by having more trait increasing than trait decreasing mutations; if is the proportion of trait increasing mutations then the asymmetry coefficient is = 2 − 1. As expected, for small biases (when ≪ 1), there are no substantial changes in the distribution of the contribution of sites to variance. Simulations were run with a 10,000 generations burn-in period without asymmetry and then 10,000 generations with asymmetry and averaged over many runs (>300), with the number of runs varied across plots keep errors in (a) below 1%.

Major effect loci
In this section, we show that our results are insensitive to the presence of major loci, i.e., individual loci that contribute substantially to quantitative genetic variation. We have in mind, for example, loci whose alleles are maintained at high frequencies by balancing selection on a Mendelian trait but have pleiotropic effects on the quantitative traits under  consideration (e.g., HLA loci (21,22)). While such loci violate our assumptions, we show that they do not affect the dynamics at other loci that fulfill them.
To this end, we calculate the first two moments of change in allele frequency in the presence of a major locus. We denote the frequency and effect size of the focal allele by and , and the frequency and effect size of the major allele by Ï and Ï , respectively. As in our previous derivations (Section 2.1), the distribution of background phenotypic contribution from all other loci, , is well approximated by the normal distribution , where Ï " is the contribution to genetic variance from the major locus. The population mean remains close to the optimum because any shift caused by the major locus is quickly compensated for by the other loci (see Section 4.4). We then average over both this distribution and the three genotypes at the major locus to calculate the mean fitness associated with each genotype at the focal locus. Namely, and similarly for the other genotypes. In this way, we obtain the first moment of the change in allele frequency which is the same as we derived in the absence of a major locus (Eq. A15). Similarly, we find the second moment to be unaffected.

Anisotropic mutation
In this section, we consider how relaxing the assumption that the distribution of newly arising mutations is isotropic in trait space would affect our results. As noted, we can We focus on a family of anisotropic mutational distributions that can be described as a projection of a multivariate normal distribution on the unit sphere in trait space. Namely, we draw the size of a mutation = from some distribution and to obtain its direction, we draw a vector from a multi-variate normal distribution MVN(0, ) and normalize it, i.e., When selection acts mainly on our focal trait, i.e. when / ≈ 1, then | / | ≈ and therefore / ≈ ± . Such a relationship between the strength of selection and effect size is well approximated by an isotropic model with ) = 1. We therefore focus on cases in which there is a significant pleiotropic contribution to selection, i.e., / is substantially less than 1.
Anisotropy then has two effects: the first is to introduce heterogeneity in the strength of selection on different traits and the second is to introduce correlations in the effects of a mutation on different traits, notably between the focal trait and others.
We first consider the case in which the strength of selection differs among traits, but traits are uncorrelated, corresponding to a diagonal covariance matrix, . When many traits have a non-negligible contribution to selection, " = / " + " " + ⋯ O " would have a small coefficient of variation, i.e., C ê " " = V " /E " " ≪ 1, because of the law of large numbers. In this case, However, there is an extreme scenario in which an effective number of traits cannot describe the distribution of effect sizes. This happens when C ê " " ≥ 1, that is when selection acts mainly on a small number of traits but our focal trait contributes very little to selection ( / ≪ 1). In this case, we might be tempted to use ) = 1/ / ≫ 1 but, as Eq. A74 suggests, the high pleiotropy limit would be inadequate. In fact, the variance in selection on newly-arising mutations (due to the contribution of the selected traits) will result in a longtailed distribution of effect sizes on the focal trait, which is not well-approximated by any isotropic model. Excluding these extreme cases, isotropic models provide a good approximation for the relationship between selection and effect size, even when there is heterogeneity in the strength of selection on different traits.
To illustrate the effect of heterogeneity in the strength of selection among traits, we consider a simple example in which all non-focal traits make the same contribution to selection and therefore can be modeled by where " is the ratio between the expected fitness effects of non-focal and focal traits. In  Next, we consider the case in which the effect sizes on different traits are correlated, i.e., when the covariance matrix has off-diagonal terms. / " ∝ / " / " and therefore we parameterize the effect of these terms using the correlation between / " and " , " ≡ Effective isotropic model Anisotropic model would change from 1 when " → 1 to 1/ / when " → 0. Note that with a large number of traits, very strong correlations among many of the traits will be necessary in order to create a large enough " to have a significant effect on ) (see Fig. A10).
To illustrate the effect of correlations among traits, we consider the following simple example (Fig. A10). We assume the correlation matrix takes the form

The power to detect loci in GWAS
In this section, we summarize the results that we rely on in connecting our theoretical predictions with the observations from GWAS (see Discussion in main text). These results provide a first approximation of the power to detect loci in GWAS in re-sequencing and genotyping studies. In Section 6.3 we consider potential complications that arise when GWAS do not identify causal loci but rather SNPs in LD with them.

Re-sequencing studies
First, we consider how the power to identify a locus in a GWAS depends on its contribution to genetic variance. To this end, we follow Sham and Purcell (24) where / is the true effect size and x is the minor allele frequency at the locus (which, due to the large study sizes, we assume to be estimated without error), ó is the total phenotypic variance, and the study size (which in reality may be an effective size reflecting study design, e.g., when the sample is split into discovery and validation panels) (24).
Under the null hypothesis, the true effect size is 0, meaning that The power to identify a locus as significant with p-value * is the probability that the estimated contribution of the locus to variance, , is large enough that This condition can be translated into a threshold contribution to variance * for which loci with > * are considered significant, i.e. Pr žõöö > * = * , with * given by where h ± , * = / " 1 ± erf /2 ∓ erf 8/ 1 − * and the two terms correspond to the estimated and true effect sizes having the same or opposite sign.
The form of the power function carries important implications (Eq. A84 and Fig. A11).
Notably, it shows that (in this approximation) power depends only on the contribution of a locus to variance, and this contribution should be measured relative to, or in units of, VP/m.
This scale makes intuitive sense, because the total phenotypic variance generates the background noise for detecting an individual locus, and the background noise is inverse proportional to the study size. In particular, the threshold contribution to variance * , as defined above, is proportional to ó and is also the contribution to variance at which power is 50%, i.e., The power function can then be approximated by a step function (see Fig. A11) This will be a good approximation when the number of loci that fall at intermediate range   (Section 3). Notably, our results suggest that the first loci to be detected, those that contribute the most to variance, are intermediate and strongly selected, and that their contributions to variance are on the order of vs. We therefore expect GWAS to begin to identify loci (and account for substantial genetic variance) when * is on the order of ¦ , i.e., when the study size is on the order of ó ¦ . We would further expect the rate of increase in identifying new loci (and in accounting for genetic variance) to be similar for different traits when variance is measured in units of ¦ .

Genotyping
Most current GWAS rely on genotyping instead of re-sequencing, resulting in an additional loss of power (26). Specifically, these studies impute the alleles at loci that are not included in the genotyping platform (27), and the imputation becomes imprecise when the imputed alleles are rare (Fig. A12). If causal loci with rare minor alleles are included in GWAS, this imprecision leads to an under-estimation of their effect size, resulting in loss of power (26).
For loci with MAF x and effect size a, the expected estimate of the effect size would be reduced by a factor of ( ), where " ( ) is the mean correlation between the imputed and real alleles (28), and the distribution of estimates can be approximated by Employing the reasoning of the previous subsection, we can therefore approximate the power to detect a locus by H " , * , where H is the power function defined in Eq. A84. Figure A12. The precision of imputation decreases with MAF. Specifically we show the mean correlation between imputed and real genotypes as function of minor allele frequency, for a study using an Illumina 1M SNP array and the 1000 genomes phase III as an imputation panel (based on Extended Fig. 9A in (29)). We approximate the effect on power by excluding loci with MAF < 1% and assuming that loci with greater MAFs are imputed correctly.

Cutoff frequency
In practice, GWAS often include only loci with MAF above a threshold, which is chosen to ensure precise imputation. We therefore approximate the effect of genotyping on power by excluding loci below a threshold MAF and assume that loci that exceed this threshold are imputed correctly.

Tagging
Our inference is predicated on the assumption that the distribution of estimated variances among genome-wide significant (GWS) associations faithfully reflects the distribution among causal loci. We have no obvious alternative but to make this assumption, and arguably, the good fit of our theoretical predictions to the distribution of variances among associations provides some support for this assumption. While this assumption cannot be directly tested at present, existing arguments and evidence suggest that it is plausible, for reasons that we briefly review.
Most of the variants discovered by GWAS are common. Specifically, all but one of the GWS associations for height and BMI, which we rely upon in our inference, have MAF>1%, and the MAF of most associations is considerably greater. In considering the validity of our assumption, we therefore consider what could be tagged by such common associations.
One possibility is that a given common association is tagging a single common, causal variant. Given the accuracy of imputation for common variants (see Fig. A12), we would therefore expect that the tagging variant would be in almost perfect LD with the causal one (including the possibility that the association is actually with the causal variant). If that were the case, then we would expect the estimated frequency and effect size, and thus the estimated contribution to genetic variance, to be very similar to those of the causal variant.
A second possibility is that a given association tags several common causal variants within the same genomic region. The number of causal variants would likely be small, as otherwise the tagging allele is highly unlikely to be in LD with causal alleles that affect the trait in the same direction. If that were the case, given the accuracy of imputation of the causal alleles, we would expect conditional analysis (e.g., 30) to successfully distinguish between the different causal variants, thus returning us to the previous scenario.
A third possibility involves a common association tagging rare, causal variants (25). While a single rare, causal variant would have to have an unreasonably large effect size in order to result in a common GWS association (31), it has been argued that several rare, causal 37 variants in the same genomic region may be tagged by a single "synthetic association" (25).
In this case, the relatively low LD between the association and each of the causal variants would imply that the estimated contribution to variance of the association would have to be much smaller than the combined contribution of the causal variants (25,31). If this were the case for many associations identified in GWAS, it would violate the premise of our inference.
However, multiple lines of evidence suggest that it is not a common occurrence. One is that, where data is available, associations often replicate across populations. For example, there is considerable overlap between GWS associations for height in Europeans and East-Asians (32). While we would not expect perfect replication even if associations were tagging single, common, causal variant, we would expect practically none if they were synthetic, both because the underlying rare, causal alleles would be less likely to be shared among populations and because the particular LD configuration that allows for their tagging in one population would likely break down in others (33,34). A second is that simulation studies suggest that synthetic associations are expected to have much lower MAF than typically observed among associations in GWAS (31). Moreover, these simulations suggest that, because synthetic association should capture only a fraction of the variance contributed by the tagged loci, having many synthetic associations would imply there being much more heritable variance than is known to be present in the population. A third, and perhaps most direct line of evidence, is that, to the best of our knowledge, none of the studies that pursued fine-mapping around GWS associations have uncovered such synthetic associations (33,35,36). These arguments, together with other lines of evidence (e.g., 31) suggest that in practice synthetic associations are likely to be rare.
Perhaps a more plausible alternative is for an association to primarily tag one common, causal variant, with which it is in high LD, but also to pick up the effects of one or a few rare, causal variants, which are more poorly tagged. Under this scenario, we might expect the estimated contribution to variance to slightly overestimate the contribution of the dominant causal variant. To the best of our knowledge, this scenario has not been well characterized, making it difficult to assess how common it is or whether the overestimation would be substantial.
In summary, given what we now know, our assumption about the distribution of estimated variances among associations reflecting the distribution among causal loci seems sensible.

Inference
In this section, we describe how we used our model to make inferences based on GWAS results for height and body mass index (BMI). As we note in the Discussion, these inferences are meant as an illustration and do not incorporate the effects of demography and a few other factors (e.g., genotyping and errors in the estimation of effect sizes (24,26)), which lie beyond the scope of this study.

The composite likelihood
Our inferences are based on a composite-likelihood approach. We begin by describing the composite-likelihood function and its maximization, when the loci detected by GWAS are strongly selected and can be described by the high-pleiotropy limit. In this case, we have shown that the distribution of variances among loci is insensitive to the distribution of selection coefficients, depends on a single parameter ¦ , and is well approximated by the probability density where I ≡ exp(− )/ intervals of the estimate with the chosen thresholds ( Fig. A13c & d). This analysis further supports our choice to exclude the loci with the lowest contribution to variance. For BMI, we also dropped the locus with the largest contribution to variance (near FTO), which appears to be an outlier (Fig. A13b) and has been suggested to be under balancing selection (37). Figure A13. Determining * and removing outliers. The total variance from significant associations as a function of the threshold contribution to variance, for height (a) and BMI (b). The insets show a close up of the lower range of contributions to variance, highlighting the decline in the density of discovered loci. Our chosen thresholds are shown by the dashed vertical line (in all graphs). Our estimates of ¦ as a function of the chosen threshold, for height (c) and BMI (d). When we increase the threshold, the estimates remain within the 95% CI of the estimate with our chosen threshold.

Estimating target size and explained variance
We estimate the target size and the variance explained, both for varying study size and total, based on our estimates of ¦ . The population-scaled mutational input per generation from strongly selected loci, 2 ¦ , is estimated by (see Eq. A38) and the corresponding estimate for the target size is where the estimate for the population scaled mutation rate per site per generation 2 ≈ 0.5 ⋅ 10 8Ö is based on heterozygosity (29). The explained variance corresponding to GWAS with study size is estimated by where we approximate the threshold corresponding to study size based on the study size, ~, and threshold, * , in current GWAS, by * = * ⋅~. (A100) To estimate the total variance arising from strongly selected loci, we simply set the threshold in Eq. A99 to 0.

Estimating confidence intervals
We use a combination of non-parametric and parametric bootstrap to estimate confidence intervals (CI). We use non-parametric bootstrap to estimate the CI for the model parameters ¦ and ¦ : specifically, we perform 10,000 iterations, in which we resample the loci identified by GWAS and repeat the estimation of ¦ . We use parametric bootstrap to estimate the confidence intervals in Fig. 5a, describing the explained variance as a function of threshold based on our model. To that end, we rely on our model with the point estimates for ¦ and Š , to generate 10,000 samples from GWAS with the specified threshold, and then calculate the total variance explained by these samples. We use a combination of non-parametric and parametric bootstrap to calculate the CI for model predictions, including the total variance, ¦ " , and the explained variance, ¦ " , and number of loci as a function of study size (Fig. 5b & c). In this case, we generate 10,000 samples by: i) estimating ¦ based on a resampled set of GWAS loci (similar to the nonparametric procedure), and ii) using the estimated ¦ and corresponding ¦ to generate a GWAS hits above * based on our model (similar to the parametric procedure); we then calculate the appropriate summary based on the latter samples. This two stage procedure is intended to capture the uncertainty generated by both the errors in estimating our basic model parameters and the noise generated by the stochastic processes underlying the 42 number and variance at segregating loci that are yet to be discovered. The resulting estimates and CI are summarized in Table A2.  Table A2. Parameter estimates and their confidence intervals for height and BMI based on GWAS results; the heritability was assumed to be 0.8 for height and 0.5 for BMI (8, 10).

Testing goodness of fit
We use the Kolmogorov-Smirnov D statistic (38,39) to test the goodness of fit of our models without pleiotropy and in the high pleiotropy limit. Since our parameter estimates are inferred from the data that we are testing against, we cannot rely on the standard tables for the p-values. We therefore generate null distributions for the D statistic using parametric bootstrap based on our models. Specifically: i) we generate 10 . samples of K significant loci based on the model under consideration, with the corresponding estimate of ¦ , ii) we infer ¦ based on each sample, and iii) calculate the Kolmogorov-Smirnov D statistic between the distribution of variances for the K loci in each sample and the corresponding theoretical distribution based on the ¦ inferred from that sample. The 43 resulting distribution of D statistics corresponds to our null hypothesis, i.e., that the loci detected in GWAS arose according to our model, and specifically to the way we calculate the D statistic between the observed distribution of variances for the K detected loci and the theoretical distribution that we inferred based on these observations. We then calculate the D statistic, 6 , based on the real data and corresponding theoretical distribution, and estimate the one-sided p-value by %8Š = # 456õö7ç‹8 87ç74‹ç4 95ç: Ë±Ë 9 # 456õö7ç‹8 87ç74‹ç4 . (A101) Note that unlike the common case, here the inability to reject the null indicates that the data is consistent with our model. Figure A14. Q-Q plots comparing the distribution of variances among significant loci taken from the GWAS of height (10) and BMI (8) with the theoretical distributions inferred from these data, based on the models without pleiotropy (a) and in the high pleiotropy limit (b). These plots show that the model assuming high pleiotropy cannot be rejected for either trait and fits these data much better than the model without pleiotropy.

Consistency with other datasets and analyses
Here, we show that the results of our inference for height are consistent with findings of a recent GWAS based on exome genotyping; that our inferences for height and BMI are consistent with estimates of the heritability tagged by SNPs with MAF > 1% in the GWAS we used; and that our model is consistent with estimates about the relationship between effect size and MAF in these and other GWAS.

Exome association study of height
Marouli et al. (40) present an association study for height that was specifically designed to capture rare, exonic variants. They rely on the ExomeChip genotyping array (41), which includes the vast majority of protein-altering variants with MAF>0.1%, allowing them to directly (i.e., without imputation) test for associations among rare variants. Using a study size of more than 300,000 European individuals, they find over 400 genome-wide significant associations. Here we examine whether their findings are consistent with our inference based on the Wood et al. genome-wide, genotyping based GWAS for height (10).
In addition to protein altering variants, the ExomeChip includes some synonymous SNPs associations. In addition, we apply the procedure described in Section 7.2, resulting in the removal of associations with contributions to variance below * = 1.15 ⋅ 10 8• ó , for which power is substantially diminished (Fig. A15a) ; this step leaves us with 147 associations.
Next, we compare the distribution of variances among the remaining 147 associations with our theoretical prediction, with the Š inferred from the Wood et al. data (Table A2) above the threshold * (Fig. A15b). We do not consider the fit to the number of associations, because it depends on the mutational target size for protein-altering variants affecting height, which is unknown. To test whether the observed distribution is consistent with our prediction, we calculate the Kolmogorov-Smirnov D statistic (38,39) for this comparison, 6 , and ask whether we can reject our prediction based on the value of 6  Doing so, we find that %8Š = 0.99, and thus, we cannot reject our predictions based on the data from Marouli et al. (40). This result indicates a good fit to their findings.

The heritability arising from common SNPs
Yang et al. (42,43)  The lower bound in Eq. A103 is therefore negative, implying that requirement A103 is met regardless of the distribution of selection coefficients for > 5. Next, we consider the results of our analysis in Section 9, incorporating the effects of recent changes in European population size. Our results suggest that GWS associations arise from loci with selection coefficients of ≈ 10 8Ö . We therefore ask whether the Yang et al. (42,43) estimates are consistent with ours, when we attribute our equilibrium estimates of the

The relationship between SNP heterozygosity and effect size
More recent studies of the heritability tagged by SNPs in GWAS also make inferences about the relationship between effect sizes and MAF (44)(45)(46). Specifically, they assume that the relationship between the contribution of a site to variance, = 2 / " (1 − ), and its MAF, , takes the form or equivalently, that and they estimate the value of from the data.
Provided a distribution of selection coefficients, f( ), Eq. A20 implies that in our model The aforementioned studies assume the relationship in Eq. A105 (or A106), without providing any evidence that this somewhat arbitrary functional form fits the data better than others, and show that values of between -1 and 0 provide the best fit to data from GWAS of a variety of traits. To show that our model is in agreement with theirs, all we therefore need to do is to find distributions of selection coefficients, f( ), that approximate the relationship of Eq. A108 for values of between -1 and 0. In Fig. A17, we assume that selection coefficients follow a Gamma distribution, where we vary its expectation and variance. As expected, E monotonically decreases as increases. When E ≪ 1 or the coefficient of variation C ê " ≪ 1, E varies minimally with and can approximated by Eq. A108 with = 0. In other cases, varies more substantially with . When we approximate those cases using Eq. A108, we obtain a range of values between −1 and 0. Thus, our model appears to be consistent with the values of reported in (44)(45)(46). Our inferences for height and BMI are not very informative about the distribution of selection coefficients and are therefore not comparable with estimates of .

The effects of demographic history
While our theoretical results were derived on the assumption of a panmictic population of constant size, the evolutionary history of human populations sharply deviates from these simplifying assumptions. Notably, most large GWAS, including the studies of height (10) and BMI (8) that we use to test our predictions, have been performed in predominantly European populations, which are known to have experienced dramatic changes in their effective population size, including an Out-of-Africa bottleneck about ~100 KYA and explosive population growth over the past ~5 KY (47)(48)(49)(50). These changes in population size have dramatically impacted the frequencies of neutral and selected alleles (47)(48)(49)(51)(52)(53), and are therefore expected to have had a substantial impact on the architecture of quantitative traits (52,53). These considerations raise several questions about the interpretation of the fit between our predictions and GWAS data. Notably, how will these historical changes in population size affect our prediction, and specifically, why do our equilibrium predictions fit GWAS data despite the dramatic historical changes in population size? While a comprehensive treatment of these questions warrants a study in itself, we briefly address them here.
Even with changing population size, our results for the dynamics at segregating sites should still hold. Notably, we would expect the mean phenotype in the population to maintain the optimal phenotypic value, because any displacement from the optimum would be quickly adjusted by small changes to allele frequencies at numerous loci (see Section 4.4). As a result, the dynamics at individual sites would be decoupled, and well approximated by the first two moments of change in allele frequency described in Eqs. 5 and 6, which appear in the Results section of the main text. In particular, the first moment would correspond to under-dominant selection, and the selection coefficient would be proportional to the size of the allele in the n-dimensional trait space (as described in Eq. 7, which appears in the Results section of the main text). We can therefore study the effect of historical changes in population size on allele frequencies with simulations, using a fixed (not population-scaled) selection coefficient with under-dominance, and having the population size change over time.
To this end, we modify the simulation from Simons et al. (53) to incorporate underdominance, and the historical changes in the effective population size of European populations inferred by Schiffels and Durbin (50) (Fig. A18) Fig. A18). The derived allele frequency is recorded at the last generation corresponding to the present. The software and documentation can be found at https://github.com/sellalab/GenArchitecture. Figure A18. Changes in population size in the history of Europeans, as inferred by Schiffels and Durbin using MSMC (50). The cutoff between the two and four haplotype MSMC inferences is marked by the gray line.
We rely on such simulations to study how changes in populations size will affect the genetic architecture of a trait under the assumptions of our model. To this end, we consider a grid of selection coefficients: = 10 8Ò À , = 8, 9 … , 40, where for each selection coefficient, we run 15 ⋅ 10 Û simulations. In this way, we obtain numerical approximations for the expected site frequency spectrum corresponding to each selection coefficient, which replaces the term 2 • τ( | ) in our expressions for summaries of genetic architecture (Section 3). We further assume the high pleiotropy limit form for the distribution of effect sizes on the focal trait corresponding to a given selection coefficient (i.e., Eq. 11, which appears in the Results section of the main text). We first consider how demography affects the distribution of genetic variances among sites with different selection coefficients (Fig. A19a). The expected contribution per site (including both sites that are segregating and monomorphic) peaks around a selection coefficient of ≈ 10 8Ö and, as in the case with constant population size (Fig. 2a), when the strength of selection increases, it appears to approach a plateau (Fig. A19a). The distribution of variances among sites, however, is dramatically affected by changes in population size: for selection coefficients around ≈ 10 8Ö , a much greater proportion of variance comes from sites with large contributions than from those with both weaker and stronger selection coefficients (Fig. A19b). This behavior contrasts with the case of a constant population size, where for sufficiently strong selection ( > 5), the distribution of variances among sites is insensitive to the strength of selection (see Fig. 3b). Figure A19. The joint effects of selection and changes in populations size (as inferred for Europeans by Schiffels and Durbin (50)) on the distribution of genetic variances among sites. (a) The expected contribution to variance per site, both segregating and monomorphic, as a function of the (unscaled) selection coefficient. Variance is measured in units of 4 " / , the equilibrium expectation for a strongly selected site. (b) The cumulative variance arising from sites with contributions above a threshold (y-axis) as a function of the threshold (x-axis); cumulative variance is measured in units of 4 " / , while the threshold in units of 10 8Ö " / .
As we establish below, these findings can be understood as follows. The segregating sites with the largest contribution to current genetic variance are due to mutations with ≈ 10 8Ö that arose shortly before or during the Out-of-Africa bottleneck. Such mutations were under strong selection (i.e., with 2 ) ≈ 50) before the bottleneck, but with the drop to an effective population size of ) ≈ 4000 during the bottleneck, they experienced more relaxed selection (with 2 ) ≈ 10), allowing some of them to ascend to higher frequencies.
The durations of subsequent increases in population size, and of explosive growth in particular, were too short to allow for a substantial reduction in their frequencies (e.g., a mutation with = 10 8Ö that reached 20% frequency by the end of the bottleneck, 15 Kya, would have an expected frequency of 18% at present). As a result, these mutations would have large contributions to variance at present. Moreover, their site frequency spectrum and distribution of contributions to variance are well approximated by assuming a population size of ) ≈ 5000 -roughly the geometric mean of populations sizes from the beginning of the bottleneck to the present -and thus to scaled selection coefficients of 2 ) ≈ 10.
Extant segregating mutations under substantially stronger selection are expected to be much younger. They therefore tend to have arisen after the bottleneck, when the population size was considerably larger. As a result, they have much lower frequencies and per segregating site contributions to variance at present. The larger population size, however, will also increase the mutational input and thus the number of extant segregating sites; so long as selection is sufficiently strong, these effects balance each other such that the per site contribution to variance, counting both segregating and monomorphic sites, remains insensitive to changes in population size (53). In turn, extant segregating mutations under substantially weaker selection are expected to contribute much less variance per site (considering either segregating sites alone or all sites) primarily because of their smaller effect sizes, which is the same reason that applied in the case with a constant population size (see Fig. 2a).
We find support for this verbal argument when we relate the results of our simulations with the findings from GWAS. To do so, we follow the same reasoning that we applied to the case with constant population size (see Discussion). Namely, based on the distribution of variances (Fig. A19a), we would expect sites with selection coefficients around ≈ 10 8Ö to be the first to be discovered in GWAS. Further assuming that such sites account for most associations discovered in GWAS and that their distribution of variances corresponds to ) = 5000, we can use our estimates of ¦ for height and BMI to calculate the parameter " (= ½ ) ¦ ) for these traits. This approach allows us to plot the putative distribution of variances among sites that exceed the study thresholds, * , for different selection 54 coefficients ( Fig. A20a & b). Doing so, we find that the observed and fitted distributions are well approximated by the distributions for sites with ≈ 10 8Ö , thus supporting our premise that most of the explained variance arises from such sites, and that their distribution of variances is well approximated by assuming a constant population size of ) ≈ 5000. Our simulations also suggest that the proportion of variance explained for sites with ≈ 10 8Ö is much greater than the proportion for sites under weaker or stronger selection (Fig. A20c & d), and should therefore also be greater than the total proportion of variance explained by these GWAS. This expectation accords with our findings as well, with our simulations suggesting that the proportion of variance explained for sites with ≈ 10 8Ö is ~40% for height and ~30% for BMI (Fig. A20c & d) compared to a total proportion of ~25% for height and ~5% for BMI in these GWAS (8,10).
Examining the expected MAF and allelic ages at sites that we predict to have been identified by these GWAS lends further support to our interpretation ( Fig. A20e-h). Notably, we find that the MAF for sites with ≈ 10 8Ö that are predicted to have been identified by these studies are similar to those that are observed (Fig. A20e & f). Moreover, when we examine the ages of mutations at detected sites, we find that mutations at sites with ≈ 10 8Ö are predicted to have originated during or shortly before the OoA bottleneck ( Fig. A20g & h).
In summary, our analyses suggest that the bulk of associations identified in the GWAS for height and BMI tag segregating mutations with ≈ 10 8Ö , which originated shortly before or during the OoA bottleneck. As a result, we would expect the distribution of variances among these sites to be well approximated by our equilibrium predictions corresponding to an effective population of ) ≈ 5000. This finding provides an explanation for why our equilibrium predictions fit the findings of GWAS in Europeans, despite our ignoring the dramatic changes in population size during their recent evolutionary past. The age of mutations at discovered sites as a function of selection coefficient based on simulations. In (e-h), points correspond to the mean and whiskers span the 1 st to 3 rd quartiles of the distribution.

The effects of genotyping
Another implication of the demographic effects that we discussed in the last section (Section 9) pertains to the reliance on genotyping rather than resequencing in GWAS. As we reviewed in Section 6, current genotyping-based GWAS typically consider only loci with MAF > 1%, for which imputation is currently quite accurate, at least in Europeans (24).
Even if loci below that frequency were imputed with perfect accuracy, however, they would only be detected in a GWAS if they exceed the threshold contribution to variance for that study. Thus, loci at which the minor allele is rare would only be detected if they had very large effect sizes, which in our model implies very strong selection. For example, assuming a constant population size, if a re-sequencing study captured 25% of the heritable variance, a genotyping study with the same sample size would suffer a ≥ 50% decrease in explained heritability only if ≥ 200 ( Fig. A21a & b)For an effective population size of 2 ⋅ 10 • for humans (50), that implies an enormous fitness cost of ≥ 0.5% (in heterozygotes) for the minor allele.
Our results for European demographic history suggest that only a small proportion of genetic variance can arise from loci that fall below the current MAF imputation threshold but have sufficiently large effect sizes to exceed the variance discovery thresholds of current GWAS. To illustrate that, we relied on our simulation results and estimates of ¦ for height and BMI, to calculate the proportion of variance arising from sites with MAF < 1% and contribution to variance > * . We find this proportion to be greater than 0 only for selection coefficients between = 0.3 ⋅ 10 8" and = 2 ⋅ 10 8" , but even within this range, such loci account for less than ~6% of the expected variance for height and 2% of the variance for BMI (Fig. A21c & d). This suggests that the common reliance on genotyping in current GWAS for quantitative traits entails minimal loss in the discovery of associations relative to resequencing. Moreover, this is likely to remain the case even when GWAS sizes substantially increase, as long as such increases are accompanied by reasonable increases in the size of imputation panels. Figure A21. The heritability explained in resequencing and genotyping studies as a function of selection coefficient. (a-b) The case with a constant population size, in the high pleiotropy limit (a) and without pleiotropy (b). The study size was chosen such that a resequencing study would capture 25% of the strongly selected variance: implying a study size of ~16 ó / ¦ in the highly pleiotropic limit (a), and a study size of ~43 ó / ¦ without pleiotropy (b). (c-d) The case with European demographic history and high pleiotropy, using our estimates of ¦ for height (c) and BMI (d) (see Section 9 for details).