A model for the genetic architecture of quantitative traits under stabilizing selection

Genome-wide association studies (GWAS) in humans are revealing the genetic architecture of biomedical, life history and anthropometric traits, i.e., the frequencies and effect sizes of variants contributing to heritable variation in a trait. To interpret these findings, we need to understand how genetic architecture is shaped by basic population genetics processes, notably by mutation, natural selection and genetic drift. Because many quantitative traits are subject to stabilizing selection and genetic variation that affects one trait often affects many others, we model the genetic architecture of a focal trait that arises under stabilizing selection in a multi-dimensional trait space. We solve the model for the phenotypic distribution and allelic dynamics at steady state and derive robust, closed-form solutions for summaries of genetic architecture. Our results suggest that the distribution of genetic variance among the loci discovered in GWAS takes a simple form that depends on one evolutionary parameter, and they provide a simple interpretation for missing heritability and why it varies among traits. We test our predictions against the results of GWAS for height and body mass index (BMI) and find that they fit the data well, allowing us to make inferences about the degree of pleiotropy and the mutational target size. Our findings help to understand why GWAS for height explain more of the heritable variance than similarly-sized GWAS for BMI, and to predict how future increases in sample size will translate into explained heritability.

Much of the phenotypic variation in human populations, including variation in morphological, life history and biomedical traits, is "quantitative", meaning that heritable variation in the trait is largely due to small contributions from many genetic variants segregating in the population (1,2). Quantitative traits have been studied since the birth of biometrics over a century ago (1-3), but only in the past decades have technological advances made it possible to systematically dissect their genetic basis (4-6). Notably, since 2007, genome-wide association studies (GWAS) in humans have led to the identification of thousands of variants reproducibly associated with hundreds of quantitative traits, including susceptibility to a wide variety of diseases (4). While still ongoing, these studies already provide important insights into the genetic architecture of quantitative traits, i.e., the number of variants that contribute to heritable variation and their frequencies and effect sizes.
Perhaps the most striking observation to emerge from these studies is that, despite the large sample size of many GWAS, all variants significantly associated with any given trait typically account for less (often much less) than 25% of the narrow sense heritability (4,7,8) (but see (9)). (Henceforth, we use "heritability" to refer to narrow sense heritability.) While many factors have been hypothesized to contribute to the "missing heritability" (7,8,10-14), the most straightforward explanation and the emerging consensus is that much of the heritable variation derives from variants with frequencies that are too low or effect sizes that are too small for current studies to detect. Comparisons among traits also suggest that there are substantial differences in architectures. For example, recent GWAS meta-analyses uncovered seven times as many variants for height (697) as for body mass index (97), and together the variants for height account for more than four times as much of the heritable variance as those for body mass index (~20% vs. ~3-5%, respectively), despite the smaller sample size of the height GWAS (250k compared to 340k, respectively) (15,16).
These first glimpses underscore the need for theory that explains how genetic architectures are shaped in the course of evolution and why they might differ among traits.
Such theory should allow us to use GWAS findings to make inferences about underlying evolutionary parameters, helping to answer enduring questions about the processes that maintain phenotypic variation in quantitative traits (5,17). From a practical perspective, it will help to interpret GWAS findings; specifically, it may inform solutions to the "missing heritability" problem (18-21). In so doing, it will inform the design of future mapping studies and phenotypic predictions (18,20-22).
Development of such theory can be guided by empirical observations and first-principles considerations. First, we know that the architecture of a trait arises from genetic and population genetic processes. New mutations affecting a trait arise at a rate that depends on its "mutational target size" (i.e., the number of sites at which a mutation would affect the trait). Once they arise, the trajectories of variants through the population are determined by the interplay between genetic drift, demographic processes, and natural selection acting on them. These processes determine the number and frequencies of segregating variants underlying variation in the trait. The genetic architecture further depends on the relationship between the selection on variants and their effects on the trait. Notably, selection on variants depends not only on their effect on the focal trait but also on their pleiotropic effects on other traits. We would therefore expect both direct and pleiotropic selection to shape the joint distribution of allele frequencies and effect sizes.
Multiple lines of evidence suggest that many quantitative traits are subject to stabilizing selection, i.e., selection favoring an intermediate trait value (5,23-27). For instance, a decline in fitness components (e.g., viability and fecundity) is observed with displacement from mean values for a variety of traits in human populations (28-30), in other species in the wild (31, 32) and in experimental manipulations (31, 33). While less is known about the selection acting on complex diseases, some likely reflect large deviations from optimal values of underlying traits under stabilizing selection. What remains unclear is the extent to which stabilizing selection is acting directly on variation in a given trait or results from pleiotropic effects of this variation on other traits.
Yet other lines of evidence suggest that pleiotropy is pervasive. For one, theoretical considerations about the variance in fitness in natural populations and its accompanying genetic load suggest that only a moderate number of independent traits can be effectively selected on at the same time (34). Thus, the aforementioned relationships between the value of a focal trait and fitness are likely heavily affected by the pleiotropic effects of genetic variation on other traits (26,34-36). Second, many of the variants detected in human GWAS have been found to be associated with more than one trait (37-41). For example, a recent analysis of GWAS revealed that variants that delay the age of menarche in women tend to delay the age of voice drop in men, decrease body mass index, increase adult height, and decrease risk of male pattern baldness (37). More generally, the extent of pleiotropy revealed by GWAS appears to be increasing rapidly with improvements in the power and methodology (37,42-44). Such considerations point to the potential importance of pleiotropic selection on quantitative genetic variation for many traits.
Moreover, the discoveries emerging from human GWAS suggest that genetic variance is dominated by additive contributions from numerous variants with small effect sizes.
Specifically, most or all of the heritability explained in GWAS of many traits derives from variants with squared effect sizes that are substantially smaller than the total genetic variance (e.g. (15,16,45,46)). In addition, statistical quantifications of the genetic variance tagged by genotyping suggest that such variants may account for the bulk of heritable variance in many traits (9,47,48). Other analyses of GWAS results suggest that nonadditive interactions, e.g., dominance and epistasis, make only minor contributions to heritable variance, consistent with theory and other lines of evidence (49-55). Notably, considerable efforts to detect epistatic interactions in human GWAS have, by and large, come up empty-handed (9,56,57), with few counter-examples, mostly involving variants in the MHC region (52, 56, 58, 59). Motivated by these considerations, we model how direct and pleiotropic stabilizing selection shape the genetic architecture of continuous, quantitative traits by considering additive variants of small effects and assuming that together they account for most of the heritable variance.
There has been relatively little theoretical work aimed at relating population genetics processes to the results emerging from GWAS. Moreover, the few existing models have reached divergent predictions about genetic architecture, largely because they make different assumptions about the effects of pleiotropy. Pritchard (20) considered the "purely pleiotropic" extreme, in which selection on variants is independent of their effect on the trait being considered (he focused on disease susceptibility). In this case, we would expect the largest contribution to genetic variance in a trait to come from mutations that have large effect sizes but are also weakly selected or neutral, allowing them to ascend to higher frequencies than other mutations. Other studies have considered the opposite extreme, in which selection on variants stems entirely from their effect on the trait under consideration (27, 60-64), and have shown that the greatest contribution to genetic variance arises from strongly selected mutations (61, 62) (we return to this case below).
In practice, we would expect most traits to fall somewhere in between these two extremes.
While there are compelling reasons to believe that quantitative genetic variation is highly pleiotropic, the effects of variants on different traits are likely to be correlated. If so, even if a given trait is not subject to selection, variants that have a large effect on it will also tend to have larger effects on traits that are under selection (e.g., by causing a large perturbation to pathways that affect multiple traits) ( conclude the greatest contribution should arise from weakly selected ones (19). Their conclusions differ because of how they chose to model the relationship between selection and effect size, a choice based largely on mathematical convenience. We approach this problem by explicitly modeling stabilizing selection on multiple traits, thereby learning, rather than assuming, the relationship between selection and effect sizes.

The Model
We model an individual's phenotype as a vector in an n-dimensional Euclidean space, in which each dimension corresponds to an additive, continuous, quantitative trait. We focus on the architecture of one of these traits (say, the 1st dimension), and the total number of traits, n, parameterizes pleiotropy. Fitness is assumed to decline with distance from the optimal phenotype positioned at the origin, thereby introducing stabilizing selection into the model. Specifically, we assume that absolute fitness takes the form $W(\vec{r}) = \exp\left(-r^2/2w^2\right)$ (Eq. 1), where $\vec{r}$ is the (n-dimensional) phenotype, $r = \|\vec{r}\|$ is the distance from the origin and w parameterizes the strength of stabilizing selection. However, we later show that the specific form of the fitness function does not matter. Moreover, the additive environmental contribution to the phenotype can be absorbed into w (see (66) and SI Section 1.1); we therefore consider only the genetic contribution.
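Why an additive environmental contribution can be absorbed into w is easy to check in one dimension. The sketch below assumes a Gaussian fitness function, W(r) = exp(−r²/2w²), and environmental noise e ~ N(0, v_e), where v_e is our own (hypothetical) name for the environmental variance. Averaging fitness over e should give a constant factor times a Gaussian with w² replaced by w² + v_e. The constant does not affect relative fitness, so the genetic model only "sees" a wider w.

```python
import math


def fitness(r, w):
    """Gaussian stabilizing-selection fitness W(r) = exp(-r^2 / (2 w^2)), 1-D case."""
    return math.exp(-r * r / (2.0 * w * w))


def fitness_with_noise(r, w, v_e, steps=4000, width=10.0):
    """Average W(r + e) over environmental noise e ~ N(0, v_e).

    Uses the trapezoid rule on [-width*sd, +width*sd]; this is accurate here because
    the integrand is smooth and decays like a Gaussian.
    """
    sd = math.sqrt(v_e)
    h = 2.0 * width * sd / steps
    total = 0.0
    for i in range(steps + 1):
        e = -width * sd + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-e * e / (2.0 * v_e)) / math.sqrt(2.0 * math.pi * v_e)
        total += weight * fitness(r + e, w) * density
    return total * h


def absorbed_fitness(r, w, v_e):
    """Closed form: a constant factor times a Gaussian of width w^2 + v_e."""
    w2 = w * w + v_e
    return math.sqrt(w * w / w2) * math.exp(-r * r / (2.0 * w2))


for r in (0.0, 0.5, 1.0, 2.0):
    print(r, fitness_with_noise(r, 1.0, 0.5), absorbed_fitness(r, 1.0, 0.5))
```

The numerical average and the closed form agree, and the ratio of fitnesses at two phenotypes depends only on w² + v_e.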
We further assume the standard population dynamics for a diploid, panmictic population of constant size N, including Wright-Fisher sampling with viability selection (as described by where L is the number of sites with fixed or segregating mutations, $\vec{a}_l$ is the phenotypic difference between homozygotes for the ancestral and derived alleles at site l (because of the infinite sites assumption, there are at most two alleles) and $g_l$ = 0, 1 or 2 is the number of derived alleles at site l.

Results
The phenotypic distribution. In the first three sections, we develop the tools that we later use to study genetic architecture. We start by considering the equilibrium distribution of phenotypes in the population and generalize previous results for the case with a single trait (27, 60, 61, 64). Under biologically sensible conditions on the rate and size of mutations (SI Section 4), this distribution is well approximated by a tight multivariate normal centered at the optimum, namely by the probability density $\phi(\vec{r}) = (2\pi\sigma^2)^{-n/2}\exp\left(-r^2/2\sigma^2\right)$ (Eq. 3), where the variance of the distribution along each trait, $\sigma^2$, is small compared with $w^2$ (see SI Section 4). Intuitively, the phenotypic distribution is normal because it derives from additive and (approximately) i.i.d. contributions from many segregating sites. The distribution stays tightly concentrated around the optimum because stabilizing selection becomes stronger with increasing displacement from it, and because the population responds to such selection by minor changes to allele frequencies at many segregating sites, rapidly offsetting the displacement.
With phenotypes close to the optimum, only the curvature of the fitness function at the optimum (i.e., the multi-dimensional second derivative) affects the selection acting on individuals. In addition, it is always possible to choose an orthonormal coordinate system centered at the optimum, in which the trait under consideration varies along the first coordinate and a unit change in any other trait (i.e., along any other coordinate) near the optimum has the same effect on fitness. These considerations suggest that the equilibrium behavior is insensitive to our choice of fitness function. Moreover, in the SI (Section 5), we show that perturbations of the population mean from the optimum are rapidly offset by minor changes to allele frequencies at numerous sites, and that this behavior lends robustness to the equilibrium dynamics with respect to the presence of major loci, changes in the optimal phenotype over time, and moderate asymmetries in the mutational distribution.
Allelic dynamics. Next, we consider the dynamics at a segregating site, and generalize previous results for the case with a single trait (62-64). The dynamics can be described in terms of the first two moments of change in allele frequency in a single generation (cf. (67)). For an allele with phenotypic effect $\vec{a}$ (of magnitude $a = \|\vec{a}\|$) and frequency q, the distribution of contributions to the phenotype from the genetic background, $\vec{r}_b$, follows from the equilibrium phenotypic distribution (Eq. 3) and is well approximated by a multivariate normal probability density. By averaging the fitness of the three genotypes over the distribution of genetic backgrounds, we find that the first moment is well approximated by $E(\Delta q) \approx -\frac{a^2}{4w^2}\,q(1-q)\left(\frac{1}{2}-q\right)$ (Eq. 5), under the assumptions detailed in SI Section 4. By the same token, we find that $V(\Delta q) \approx q(1-q)/2N$ (Eq. 6), which is the standard second moment with genetic drift.
The functional form of the first moment is equivalent to that of the standard viability selection model with under-dominance. This result is a hallmark of stabilizing selection on (additive) quantitative traits: with the population mean at the optimum, the dynamics at different sites are decoupled, and selection at a given site acts to reduce its contribution to the phenotypic variance (i.e., $\frac{1}{2}a^2pq$, with $p = 1-q$), thereby pushing rare alleles to loss. Comparison with the standard viability selection model shows that the selection coefficient in our model is $s = a^2/4w^2$, or $S = 2Ns = Na^2/2w^2$ in scaled units. In other words, the selection acting on an allele is proportional to its size-squared in the n-dimensional trait space (where w translates effect size into units of fitness).
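The under-dominance form of the first moment can be checked with a few lines of arithmetic. The 1-D sketch below assumes the quadratic approximation W ≈ 1 − r²/2w² near the optimum, a population mean at the optimum, and a genetic-background variance (`var_background`, a hypothetical value) that does not depend on the genotype at the focal site. It computes one generation of viability selection from the genotypes' marginal fitnesses and compares the change in frequency with −s q(1−q)(½−q), s = a²/4w²; the two should agree up to the factor 1/W̄, which is close to 1. The function names are ours.

```python
def delta_q(q, a, w, var_background=0.01):
    """One generation of viability selection at a single site (1-D sketch).

    Genotype g (number of derived alleles) shifts the phenotype by g*a/2. With the
    population mean at the optimum, the background mean is -q*a, so genotype g sits
    at a mean deviation d_g = a*(g/2 - q). Averaging W ~ 1 - r^2/(2 w^2) over a
    background of variance var_background gives 1 - (var_background + d_g^2)/(2 w^2).
    """
    p = 1.0 - q
    fit = {}
    for g in (0, 1, 2):
        d = a * (g / 2.0 - q)
        fit[g] = 1.0 - (var_background + d * d) / (2.0 * w * w)
    mean_fit = p * p * fit[0] + 2.0 * p * q * fit[1] + q * q * fit[2]
    # Derived-allele frequency after selection, starting from Hardy-Weinberg proportions.
    q_next = (q * q * fit[2] + p * q * fit[1]) / mean_fit
    return q_next - q


def underdominance_moment(q, a, w):
    """E(dq) = -s q (1-q) (1/2 - q) with s = a^2 / (4 w^2)."""
    s = a * a / (4.0 * w * w)
    return -s * q * (1.0 - q) * (0.5 - q)


for q in (0.05, 0.2, 0.4, 0.7):
    print(q, delta_q(q, 0.05, 1.0), underdominance_moment(q, 0.05, 1.0))
```

Rare alleles (q < ½) are pushed toward loss and common ones toward fixation, which is the signature of under-dominance.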
The relationship between selection and effect size. The statistical relationship between the strength of selection acting on mutations and their effect on a given trait follows from the aforementioned geometric interpretation of selection. Specifically, all mutations with a given selection coefficient, s, lie on an (n − 1)-dimensional hypersphere with radius $a = 2w\sqrt{s}$, and any given mutation satisfies $a^2 = \sum_{i=1}^{n} a_i^2$ (Eq. 7), where $a_i$ is the allele's effect on the i-th trait (Fig. 1A). Our assumption that mutation is isotropic then implies that the probability density of mutations on the hypersphere is uniform.
Fig. 1. The distribution of effect sizes corresponding to a given selection coefficient. (a) Mutations with selection coefficient, s, lie on an (n − 1)-dimensional hypersphere with radius $a = 2w\sqrt{s}$, and the probability of such mutations with effect size $a_1$ is proportional to the volume of the (n − 2)-dimensional cross section of that hypersphere with projection $a_1$. (b) The distribution of effect sizes corresponding to a given selection coefficient, measured in units of the distribution's standard deviation.
The distribution of effect sizes on a focal trait, $a_1$, corresponding to a given selection coefficient, s, follows. Given that mutation is symmetric in any given trait, $E(a_1) = 0$, and given that it is symmetric among traits, $E(a_1^2) = a^2/n$ (Eq. 8). More generally, the probability density corresponding to an effect size $a_1$ is proportional to the volume of the (n − 2)-dimensional cross section of that hypersphere with projection $a_1$ (Fig. 1A). For a single trait, this implies that $a_1 = \pm a$ with probability ½, and for n > 1, it implies the probability density $f(a_1 \mid a) = \frac{\Gamma(n/2)}{\sqrt{\pi}\,\Gamma\left((n-1)/2\right)\,a}\left(1-\frac{a_1^2}{a^2}\right)^{(n-3)/2}$ for $|a_1| < a$ (Eq. 9; cf. SI Section 1.2). Intriguingly, as the number of traits n increases, this density approaches a normal distribution, i.e., $f(a_1 \mid a) \approx \sqrt{\frac{n}{2\pi a^2}}\,\exp\left(-\frac{n a_1^2}{2a^2}\right)$ (Eq. 10). This limit is already well approximated for a moderate number of traits (e.g., n=10; Fig. 1B).
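The geometric argument can be illustrated by sampling. The sketch below assumes isotropic mutation: rescaling an n-dimensional standard normal vector to length a gives a uniform point on the hypersphere of radius a = 2w√s (a standard sampling technique, not necessarily the authors' procedure). The focal component a₁ should then have mean 0 and variance a²/n, and for n > 1 its exact density is proportional to (1 − a₁²/a²)^((n−3)/2), the standard marginal of a uniform point on a sphere, which approaches the normal N(0, a²/n) as n grows. Function names are our own.

```python
import math
import random


def sample_focal_effects(n, a, draws, rng):
    """Focal-trait effects a_1 of mutations drawn uniformly on the (n-1)-sphere of radius a.

    A standard normal vector rescaled to length a is uniform on that sphere.
    """
    effects = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in z))
        effects.append(a * z[0] / norm)
    return effects


def focal_density(a1, n, a):
    """Exact density of a_1 for n > 1: proportional to (1 - a1^2/a^2)^((n-3)/2) on |a1| < a."""
    if abs(a1) >= a:
        return 0.0
    const = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2) * a)
    return const * (1.0 - (a1 / a) ** 2) ** ((n - 3) / 2)


def normal_limit(a1, n, a):
    """Large-n limit: a_1 ~ N(0, a^2 / n)."""
    var = a * a / n
    return math.exp(-a1 * a1 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


rng = random.Random(1)
effects = sample_focal_effects(n=10, a=1.0, draws=20000, rng=rng)
mean = sum(effects) / len(effects)
sample_var = sum(x * x for x in effects) / len(effects) - mean ** 2
print(round(mean, 3), round(sample_var, 3))  # variance should be close to a^2/n = 0.1
```

For n = 3 the exact density is uniform on (−a, a); by n = 100 it is within about 1% of the normal limit at its peak.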
The limit behavior also holds when we relax the assumption of isotropic mutation. This generalization is an important one, because having chosen a parameterization of traits in which the fitness function near the optimum is isotropic, we can no longer assume that the distribution of mutations is also isotropic (68). Specifically, mutations might tend to have larger effects on some traits than on others and their effects on different traits might be correlated. In SI Section 5, we show, however, that excluding pathological cases, the limit distribution (Eq. 10) also holds for anisotropic mutation. To this end, we introduce the concept of an effective number of traits, which can take any real value ≥1 and is defined as the number of traits, with expected fitness effect equal to that of the focal trait, required to satisfy Eq. 8. In the limit of high pleiotropy, the selection strength on a mutation and the magnitude of its effect on a focal trait are correlated, implying that the kind of "purely pleiotropic" extreme postulated in previous work cannot arise under our model (19-21).
Mounting evidence that genetic variation is highly pleiotropic (see Introduction), alongside the robustness of our model, raises the intriguing possibility that this limit form applies quite generally.
Genetic architecture. We can now derive closed forms for summaries of genetic architecture (cf. SI Section 2). For mutations with a given selection coefficient, the frequency distribution follows from the diffusion approximation based on the first two moments of change in allele frequency (Eq. 5 and 6; (67)), and the distribution of effect sizes follows from the geometric considerations of the previous section. Conditional on the selection coefficient, these distributions are independent and therefore the joint distribution of frequency and effect size equals their product. Summaries of architecture can be expressed as expectations over the joint distribution of frequencies and effect sizes for a given selection coefficient, and then weighted according to the distribution of selection coefficients. Given that we know little about the distribution of selection coefficients of mutations affecting quantitative traits, we examine how summaries depend on the strength of selection.
Expected variance per site. We focus on the distribution of additive genetic variance among sites, a central feature of architecture that is key to connecting our model with GWAS results. We start by considering how selection affects the expected contribution of a site to additive genetic variance in a focal trait. We include monomorphic sites in the expectation, such that the expected total variance is given by the product of the expectation per site and the population mutation rate (i.e., 2NU). Under the infinite sites assumption, sites can be monomorphic or bi-allelic, and their expected contribution, expressed in terms of the scaled selection coefficient S, is $E(v) = \frac{w^2}{nN}\,S\int_0^1 e^{-Sq(1-q)}\,dq$ (Eq. 11). Thus, the degree of pleiotropy only affects the expectation through a multiplicative constant. This effect of pleiotropy is not identifiable from data, because even if we could measure genetic variance in units of fitness (e.g., as opposed to units of the total phenotypic variance), we still would not be able to distinguish between w and n. We instead focus on the proportional effect of selection on the contribution to variance, which is insensitive to the degree of pleiotropy.
The proportional contribution to genetic variance as a function of the selection coefficient was described by Keightley and Hill (in the one-dimensional case; (62)) and is shown in Fig. 2A. When selection is strong (S ≫ 1), its effect on allele frequency (which scales with 1/S) is canceled out by its relationship with the effect size, yielding a constant contribution to genetic variance per site, $v_s = 2w^2/(nN)$, regardless of the selection coefficient (SI Section 3; Fig. 2A). Henceforth, we measure genetic variance relative to $v_s$. When selection is effectively neutral (S ≪ 1) and thus too weak to affect the allele frequency, the expected contribution of a site to genetic variance scales with the squared effect size and equals ½S, which is considerably lower than under strong selection (SI Section 3; Fig. 2A). In between these selection regimes, allele frequency is affected by under-dominance, resulting in a more complex dependency on the selection coefficient (SI Section 3). The maximal contribution to genetic variance per site occurs within this range (at S ≈ 10) and is slightly higher (by ~30%) than under strong selection (Fig. 2A). Henceforth, we refer to this selection regime as intermediate, to distinguish it from the considerably weaker selection in the nearly neutral regime. These results suggest that effectively neutral sites should contribute much less to genetic variance than intermediate and strongly selected ones (61, 62).
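The three regimes can be reproduced numerically. In our own derivation (not taken from the SI), combining the diffusion approximation with E(Δq) = −s q(1−q)(½−q), V(Δq) = q(1−q)/2N and the mean sojourn time of a new mutation, and using the symmetry of under-dominant selection about q = ½, the expected contribution of a site in units of v_s reduces to (S/2)∫₀¹ exp(−S q(1−q)) dq. The sketch below evaluates this with Simpson's rule; the function names are ours. It should recover ½S for S ≪ 1, values approaching 1 for S ≫ 1, and a maximum of roughly 1.3 near S ≈ 10.

```python
import math


def simpson(f, lo, hi, intervals=10000):
    """Composite Simpson's rule with an even number of intervals."""
    if intervals % 2:
        intervals += 1
    h = (hi - lo) / intervals
    total = f(lo) + f(hi)
    for i in range(1, intervals):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def relative_variance_per_site(S):
    """Expected contribution of a site to variance, in units of v_s = 2 w^2 / (n N).

    Sketch: (S/2) * integral_0^1 exp(-S q (1-q)) dq, valid under the diffusion
    approximation for a site under under-dominant selection with S = 2Ns.
    """
    # The integrand is symmetric about q = 1/2: integrate [0, 1/2] and double it,
    # so (S/2) * 2 * half = S * half.
    half = simpson(lambda q: math.exp(-S * q * (1.0 - q)), 0.0, 0.5)
    return S * half


for S in (0.1, 1.0, 10.0, 100.0):
    print(S, round(relative_variance_per_site(S), 3))
```

Equivalently, the same quantity can be written as √S·F(√S/2), where F is Dawson's integral (available as `scipy.special.dawsn`), which avoids numerical quadrature.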

Fig. 2.
The distribution of additive genetic variance among sites. In (a), we plot the expected contribution as a function of the scaled selection coefficient. In (b) & (c), we show the proportion of genetic variance stemming from sites that contribute more than the value on the x-axis, for a single trait (b) and in the pleiotropic limit (c).
Distribution of variance among sites. Next, we consider how genetic variance is distributed among sites with a given selection coefficient. We focus on the distribution among segregating sites (including monomorphic sites would just add a point mass at 0).
This distribution is especially relevant to interpreting the results of GWAS, because, to a first approximation, a study will detect only sites with contributions to variance exceeding a certain threshold (which decreases as the study size increases; see Discussion). We therefore represent the distribution in terms of the proportion of genetic variance, G(v), arising from sites whose contribution to genetic variance exceeds a threshold v.
We begin with the case of a single trait (n=1), in which selection on an allele determines its effect size (Fig. 2B). When selection is strong (S ≫ 1), the proportion of genetic variance exceeding a threshold v is also insensitive to the selection coefficient and takes a simple form, $G(v) = e^{-2v}$ (Eq. 12; SI Section 3). In contrast, in the effectively neutral range (S ≪ 1), $G(v) = \sqrt{1 - 8v/S}$ for $v \le S/8$ (Eq. 13). The effect of pleiotropy is to cause sites with a given selection coefficient to have a distribution of effect sizes on the focal trait, thereby increasing the contribution to genetic variance of some sites and decreasing it for others. In SI Section 3, we show that increasing the degree of pleiotropy, n, increases the proportion of genetic variance, G(v), for any threshold, v, regardless of the distribution of selection coefficients (Fig. S2). When variation in a trait is sufficiently pleiotropic for the distribution of effect sizes to attain the limit form (Eq. 10), $G(v) = (1 + 2\sqrt{v})\,e^{-2\sqrt{v}}$ for strongly selected sites (Eq. 14) and $G(v) = e^{-4v/S}$ for effectively neutral ones (Eq. 15; Fig. 2C). For intermediately selected sites, G(v) is similar to the case of strong selection, with measurable differences only for large thresholds v (see inset in Fig. 2C and SI Section 3). We would therefore expect that as the sample sizes of GWAS increase and the threshold contribution to variance decreases, intermediate and strongly selected sites will be discovered first, and effectively neutral sites will be discovered much later. In the SI (SI Section 3 and Fig. S13), we also derive corollaries of these results for the distribution of numbers of segregating sites that make a given contribution to genetic variance.
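The pleiotropic-limit forms can be checked by Monte Carlo under our own leading-order approximations (not the SI's exact expressions). Weighting sites by their contribution to variance: strongly selected sites have S·q exponentially distributed with mean 1, effectively neutral sites have derived-allele frequency density 2(1−q), and the scaled squared effect χ = n·a₁²/a², which is χ²₁ in the pleiotropic limit, becomes size-biased to a χ²₃ variable. A site then contributes v = χ·S·q(1−q)/2 in units of v_s (≈ χ·Sq/2 for rare alleles). Under these assumptions G(v) = (1 + 2√v)e^{−2√v} for strong selection and G(v) = exp(−4v/S) for effectively neutral sites. Function names are ours.

```python
import math
import random


def chi2_3(rng):
    """Size-biased chi-square(1) variable, which is a chi-square(3) variable."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))


def g_strong_mc(v, draws, rng):
    """Fraction of strongly selected variance from sites contributing more than v."""
    hits = 0
    for _ in range(draws):
        chi = chi2_3(rng)
        sq = rng.expovariate(1.0)  # S*q for variance-weighted rare alleles
        if chi * sq / 2.0 > v:
            hits += 1
    return hits / draws


def g_neutral_mc(v, S, draws, rng):
    """Same for effectively neutral sites: variance-weighted q has density 2(1 - q)."""
    hits = 0
    for _ in range(draws):
        chi = chi2_3(rng)
        q = 1.0 - math.sqrt(rng.random())
        if chi * (S / 2.0) * q * (1.0 - q) > v:
            hits += 1
    return hits / draws


def g_strong(v):
    return (1.0 + 2.0 * math.sqrt(v)) * math.exp(-2.0 * math.sqrt(v))


def g_neutral(v, S):
    return math.exp(-4.0 * v / S)


rng = random.Random(7)
print(g_strong_mc(0.5, 60000, rng), g_strong(0.5))
print(g_neutral_mc(0.01, 0.1, 60000, rng), g_neutral(0.01, 0.1))
```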

Discussion
In humans, GWAS for many traits display a similar behavior: when sample sizes were small, the studies discovered almost nothing, but once they exceeded a threshold sample size, both the number of associations discovered and the heritability explained began to increase rapidly (4,69). Intriguingly, though, both the threshold study size and the rate of increase vary among traits. These observations raise several questions, including: How is the threshold study size determined? How should the number of associations and explained heritability increase with study size once this threshold is exceeded? With an order of magnitude increase in study sizes into the millions imminent, how much more of the genetic variance in traits should we expect to explain? The theory that we developed offers tentative answers to these questions.
To relate our theory to GWAS, we must first account for the power to detect loci that contribute to quantitative genetic variation. In studies of continuous traits, the power can be approximated by a step function in the contribution of loci to additive genetic variance (ignoring the effects of genotyping, which are considered below, and some other potential complications, e.g. (70)). In other words, loci that contribute more than a threshold value $v^*$ should be detected and those that contribute less should not. The threshold is affected by the study size, m, and by the total phenotypic variance in the trait, $V_P$, where $v^*(m, V_P) \propto V_P/m$ (SI Section 6) (69). Given a trait and study size, the number of associations discovered and heritability explained should then follow from the tails of the distributions that we have considered, corresponding to the threshold $v^*(m, V_P)$.
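The step-function approximation can be made concrete under assumptions of our own (not the authors' stated procedure): a locus contributing variance v in a study of m individuals yields an association statistic with noncentrality ≈ m·v/V_P, and it is declared significant when this exceeds z², with z the two-sided normal quantile at the conventional genome-wide level α = 5×10⁻⁸. Then v* ≈ z²·V_P/m ≈ 30·V_P/m. Combining this with the pleiotropic-limit form for strongly selected sites, G(v) = (1 + 2√v)e^{−2√v} (our derivation; v in units of v_s), the sketch finds the study size, in units of V_P/v_s, at which a given fraction of strongly selected variance is mapped. For 25% it gives roughly 16 V_P/v_s, close to the study size quoted later in this section. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def detection_threshold(m, v_p, alpha=5e-8):
    """Threshold contribution to variance, v* ~ z^2 * V_P / m (step-function power)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z * v_p / m


def g_strong_pleiotropic(v):
    """Proportion of strongly selected variance from sites contributing > v (units of v_s)."""
    root = math.sqrt(v)
    return (1.0 + 2.0 * root) * math.exp(-2.0 * root)


def threshold_for_fraction(target, lo=0.0, hi=100.0, iterations=200):
    """Solve G(v) = target by bisection; G decreases from 1 at v = 0 toward 0."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g_strong_pleiotropic(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def study_size_for_fraction(target, alpha=5e-8):
    """Study size, in units of V_P / v_s, at which `target` of the variance is captured."""
    # v*/v_s = z^2 / (m v_s / V_P), so m v_s / V_P = z^2 / v at the required threshold v.
    return detection_threshold(1.0, 1.0, alpha) / threshold_for_fraction(target)


print(round(study_size_for_fraction(0.25), 1))
```

Because only the ratio V_P/v_s enters, traits with the same ratio should follow the same curve of explained variance against study size.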
Our results therefore suggest that when genetic variation in a trait is sufficiently pleiotropic, the first loci to be discovered in GWAS will be intermediate or strongly selected. Once the threshold study size is exceeded, the increase in the number of associations and explained heritability with study size follows directly from the functional forms that we derived for intermediate and strongly selected sites (Eq. 14 and S41 and Figs. 2C, S13B, 3A and S15A). The trait-specific behavior boils down to the same parameter, $V_P/v_s$, relating the study size to the threshold $v^*$. Some results are modified when variation in a trait is only weakly pleiotropic, which is probably less common: notably, intermediately selected sites would begin to be discovered only after strongly selected ones (cf. Figs. 2B, S13A, and S14) and the threshold study size for strongly selected sites would be higher (cf. Eq. 12 and Fig. S14A).
The heritability explained in re-sequencing and genotyping studies as a function of the scaled selection coefficient. The study size was chosen such that a re-sequencing study would capture 25% of the strongly selected variance (corresponding to a study size of ~16 $V_P/v_s$). For the case without pleiotropy, see Fig. S14A-B.
Our results further suggest that for missing heritability to be mainly due to the limited power of genotyping to detect rare variants (14,71), selection on the genetic variance in a trait would have to be very strong. GWAS impute genotypes at loci that are not included in the genotyping array, and the accuracy of this imputation declines rapidly when minor allele frequencies become too small to be well represented in the imputation panel (72, 73). This decline in power is dealt with in several ways, all of which are similar in effect to considering only alleles above a threshold frequency that allows them to be imputed accurately (74). For example, a study using an Illumina 1M SNP array and the 1000 (75). Even if loci below that frequency were imputed with perfect accuracy, however, they would only be detected if their contribution to variance exceeded the study's threshold, implying that they would also have to have extremely large effect sizes; and under our model, extremely large effect sizes imply very strong selection (Fig. 3B). As an illustration, if a re-sequencing study identified 25% of the strongly selected (S ≫ 1) variance, a genotyping study with the same sample size would suffer a 50% decrease in explained heritability only if S ≈ 200 (assuming the same genotyping platform and imputation panel as above). If we further assume a constant effective population size of $10^4$, this would imply a (heterozygote) selection disadvantage of 1%. We note, however, that recent population growth of the kind that has been inferred for European populations (76, 77) is expected to reduce the frequency of alleles under very strong selection (77, 78), leading to a greater loss of power due to genotyping (see below).
While some heritable variation may also arise from effectively neutral mutations, our results indicate that their expected contribution to genetic variance per site is much smaller than for strongly selected and intermediate ones (Fig. 2; (61, 62)). One implication is that for the contribution of effectively neutral variation to be comparable to that of intermediate and strongly selected variation, such mutations would need to have a much larger mutational target size. It is difficult to evaluate whether this is plausible, because current GWAS are likely severely underpowered to detect them (but see (79)). For example, sites with S=0.1 are expected to contribute only 1/20th of the genetic variance that arises from strongly selected ones; in the high pleiotropy limit, only 1% of that variance would be mapped when study sizes are large enough to map 95% of the strongly selected variance.
Importantly, these theoretical predictions can be used to make inferences based on GWAS results. As an illustration, we consider height and body mass index (BMI) in Europeans, two traits for which GWAS have discovered a sufficiently large number of genome-wide significant associations to allow for well powered tests and inferences (697 for height (16) and 97 for BMI (15)). We fit our theoretical predictions for the distribution of variance among loci to the distributions observed for the genome-wide significant associations reported for each of these traits (cf. Supplement Section 7 for details). In so doing, we assume that the mapped loci are under intermediate or strong selection, because our theory suggests that they should be the first to be mapped. We also make the plausible assumption that genetic variance for these traits is highly pleiotropic (see Introduction; 37, 42) and therefore assume the limit form for the distribution of effect sizes. Under these assumptions, we expect the distribution of variance among loci to be well approximated by a simple form (Eq. S89), which depends on a single parameter ($v_s$). When we estimate this parameter, we find that the theoretical distribution fits the data well (Fig. 4A). Specifically, we cannot reject our model based on the data: with only one parameter, we obtain good fits to the data for both traits (by a Kolmogorov-Smirnov test, P = 0.14 for height and P = 0.54 for BMI; cf. Supplement Section 7). By comparison, without pleiotropy (n=1), our predictions provide a poor fit to these data (by a Kolmogorov-Smirnov test, P < 10 cx for height and P = 0.05 for BMI; Fig. S11).

Fig. 4.
Model fit and predictions for height and BMI. In (a), we show the fit for associated loci. In (b), we present our predictions for how the heritability explained will increase with GWAS sample size. The predicted number of associations is shown in Fig. S16. 95% CIs are based on bootstrap; see Supplement Section S7 for details.

By fitting the model to GWAS results, we can also make inferences about the evolutionary parameters underlying quantitative genetic variation. When we estimate the degree of pleiotropy (n) as an additional parameter in the model, we find that for both height and BMI, n is large enough to be indistinguishable from the high pleiotropy limit. Based on the shape of the fitted distributions in this limit and the number of loci that fall above each study's threshold contribution to variance, we can also estimate the mutational target size and the proportion of the heritable variance arising from mutations within the range of selection coefficients visible in these GWAS. These estimates suggest that, within this range, height has a target size of ~5 Mb, which accounts for ~50% of the heritable variance, whereas BMI has a target size of ~1 Mb, which explains ~15% of the variance (see SI Table S2).
These parameter estimates can help to interpret GWAS results and make predictions about future studies. They suggest that the GWAS for height succeeded in mapping a substantially greater proportion of the heritable variance (~20% compared to ~3-5%) despite their smaller sample size (~250K compared to ~340K), because the proportion of variance arising from mutations within the range of detectable selection effects is much greater for height than for BMI (~50% compared to ~15%). Furthermore, the estimated target sizes, together with the relationship between sample size and the threshold contribution to variance, can be used to predict how the explained heritability and number of associations should increase with sample size (Figs. 4B and S16). These predictions are likely underestimates, because the range of detectable selection effects should itself also increase with sample size.
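The predictions in Fig. 4B rely on a relationship between GWAS sample size and the threshold contribution to variance, $v^*$. As a rough illustration only, the sketch below assumes (this is not taken from the text) that a variant explaining a fraction v of the phenotypic variance yields a 1-df association statistic with non-centrality of about n·v, so that detection at genome-wide significance ($\alpha = 5 \times 10^{-8}$) with roughly 50% power requires $v^* \approx z^2/n$, where z is the two-sided critical value. The function name `threshold_variance` is hypothetical.

```python
from statistics import NormalDist

def threshold_variance(n, alpha=5e-8):
    """Approximate fraction of phenotypic variance v* that a variant must explain
    to be detected at genome-wide significance with about 50% power.

    Assumption (not from the text): the 1-df association statistic is normal
    with mean sqrt(n * v), so 50% power is reached when n * v equals the squared
    two-sided critical value."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z / n

for n in (250_000, 340_000, 1_000_000):
    print(f"n = {n:>9,}: v* ~ {threshold_variance(n):.2e}")
```

Under this approximation, $v^*$ falls in inverse proportion to sample size, so quadrupling n lowers the threshold fourfold.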
To make these inferences and predictions more reliable, an important next step will be to account for the effects of recent human demography on the distributions of genetic variance (78, 80-82). Notably, the recent growth of European populations (76, 77) would have affected the distribution of genetic variance arising from very strongly selected mutations. Specifically, it would have reduced its expectation, because many of these mutations would have entered the population since the onset of growth and would thus be at lower frequencies (77, 78, 82). This consideration, alongside the effect of genotyping, suggests that the range of selection effects revealed by current GWAS is in fact bounded from above and includes only moderately selected loci. However, moving beyond qualitative considerations will require incorporating the effects of demography into the model (77, 78, 82). Doing so may allow us to predict the outcome of future GWAS as well as to learn about the relative importance of different evolutionary and genetic forces in shaping quantitative genetic variation for different traits.

1. The model

Absorbing the environmental contribution into the fitness function
Here, we show that the additive environmental contribution to the phenotype can be absorbed into the fitness function, which justifies our considering only the additive genetic contribution in our analysis. This result has been derived multiple times for the one-dimensional case (e.g., (1)). The argument in the multidimensional case is similar and included for completeness.
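First, assume that the additive environmental contribution to the phenotype, $\mathbf{e}$, is normally distributed around zero with variance $V_E$ in every dimension (S1). Averaging the Gaussian fitness function over this distribution yields another Gaussian in the genetic contribution, whose width is increased by $V_E$ and whose height is reduced by a constant factor. Given that absolute fitness is defined up to a multiplicative constant, we can therefore absorb the additive environmental contribution by using the Gaussian fitness function with $w^2$ replaced by $w_e^2 = w^2 + V_E$. Even when the environmental contribution is anisotropic, we can always choose a coordinate system in which the effective fitness function takes an isotropic form around the fitness peak (Eq. 1).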
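The absorption argument can be checked numerically in one dimension. Assuming a normally distributed environmental contribution with variance $V_E$, the sketch averages the Gaussian fitness over the environment and compares the result with a Gaussian of width $w^2 + V_E$ scaled by the constant $\sqrt{w^2/(w^2+V_E)}$; that constant is irrelevant because absolute fitness is defined only up to a multiplicative factor. Function names are illustrative.

```python
import math

def effective_fitness_numeric(z, w2, VE, n=4001, span=12.0):
    """Average the Gaussian fitness exp(-(z + e)^2 / (2 w2)) over e ~ N(0, VE)
    by the trapezoid rule on [-span*sd, span*sd] (one trait)."""
    sd = math.sqrt(VE)
    lo, hi = -span * sd, span * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        e = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        density = math.exp(-e * e / (2 * VE)) / math.sqrt(2 * math.pi * VE)
        total += weight * density * math.exp(-(z + e) ** 2 / (2 * w2))
    return total * h

def effective_fitness_closed(z, w2, VE):
    """Closed form: a Gaussian with width w2 + VE, times a constant factor."""
    return math.sqrt(w2 / (w2 + VE)) * math.exp(-z * z / (2 * (w2 + VE)))

for z in (0.0, 0.5, 2.0):
    print(z, effective_fitness_numeric(z, 1.0, 0.3), effective_fitness_closed(z, 1.0, 0.3))
```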

The distribution of mutational effect sizes on a given trait
In the main text, we define the distribution of phenotypic effects of newly arising mutations in the n-dimensional trait space, $\mathbf{a}$. Here, we consider the projection of these effects on a given trait, $a_1$, taken without loss of generality to be the first dimension. The distribution of effect sizes on a focal trait will depend on the degree of pleiotropy, n, and the form of this dependency becomes important when we consider how pleiotropy affects genetic architecture.
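We want to calculate the distribution of effect sizes on the focal trait, $a_1$, conditional on their overall effect, $a = |\mathbf{a}|$. We assume that the distribution of effects of de novo mutations is isotropic in trait space. The effect of a mutation, $\mathbf{a}$, therefore has equal probability to occupy any point on an n-dimensional sphere with radius a. Let $S_m(r)$ denote the surface area of an m-dimensional sphere of radius r, and let θ denote the angle between the vector $\mathbf{a}$ and its projection $a_1$, i.e., $a_1 = a\cos\theta$. In these terms, the surface area element corresponding to angle θ is $S_{n-1}(a\sin\theta)\,a\,d\theta$ and, by a change of variables, the surface area element corresponding to projection $a_1$ on the focal trait is $S_{n-1}\big(\sqrt{a^2-a_1^2}\big)\,\frac{a}{\sqrt{a^2-a_1^2}}\,da_1$. This result implies that the probability density of $a_1$ is

$f(a_1 \mid a) = \frac{S_{n-1}\big(\sqrt{a^2-a_1^2}\big)}{S_n(a)}\,\frac{a}{\sqrt{a^2-a_1^2}} = \frac{\Gamma(n/2)}{\sqrt{\pi}\,\Gamma\big((n-1)/2\big)}\,\frac{1}{a}\left(1-\frac{a_1^2}{a^2}\right)^{(n-3)/2}$

(for a similar derivation, see (2)).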
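The projection density can be sanity-checked numerically. The sketch below uses the form $f(a_1 \mid a) \propto (1 - a_1^2/a^2)^{(n-3)/2}$ implied by the surface-area argument, normalized with gamma functions, and checks that it integrates to 1 and that $E(a_1^2 \mid a) = a^2/n$, as isotropy requires. We restrict it to n ≥ 3 so the integrand stays bounded; the function names are illustrative.

```python
import math

def projection_density(a1, a, n):
    """Density of the focal-trait effect a1 given total size a, when the direction
    of the mutation is uniform on the sphere in n >= 3 dimensions."""
    if abs(a1) >= a:
        return 0.0
    c = math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)
    return c / a * (1 - (a1 / a) ** 2) ** ((n - 3) / 2)

def moment(k, a, n, m=20000):
    """k-th moment of a1 by the midpoint rule on (-a, a)."""
    h = 2 * a / m
    total = 0.0
    for i in range(m):
        x = -a + (i + 0.5) * h
        total += x ** k * projection_density(x, a, n)
    return total * h

a = 2.0
for n in (3, 5, 10):
    print(n, round(moment(0, a, n), 6), round(moment(2, a, n), 6), a * a / n)
```

For n = 3 the density is uniform on (−a, a), a quick special case to keep in mind when reading the general formula.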
Next, we consider the high pleiotropy limit form of this distribution. For any degree of pleiotropy, the symmetry of the mutational distribution implies that $E(a_1^2 \mid a) = a^2/n$; as n grows, the density above approaches a normal distribution with mean zero and variance $a^2/n$. As we elaborate in the main text, important implications about quantitative genetic variation follow from this high pleiotropy limit. The limit also holds quite generally when the distribution of effect sizes is anisotropic (see Section S5.4).
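To see the high pleiotropy limit numerically, the sketch below rescales the projection as $y = \sqrt{n}\,a_1/a$, using the isotropic density $f(a_1 \mid a) \propto (1 - a_1^2/a^2)^{(n-3)/2}$, and measures the largest gap between the density of y and the standard normal density on a grid. The gap shrinks roughly in proportion to 1/n.

```python
import math

def scaled_projection_density(y, n):
    """Density of y = sqrt(n) * a1 / a under isotropic mutation in n >= 3 dimensions."""
    x = y / math.sqrt(n)  # x = a1 / a = cos(theta)
    if abs(x) >= 1:
        return 0.0
    c = math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)
    return c * (1 - x * x) ** ((n - 3) / 2) / math.sqrt(n)

def max_gap_to_normal(n):
    """Largest absolute difference from the standard normal density on y in [-4, 4]."""
    ys = [i / 100 for i in range(-400, 401)]
    phi = lambda y: math.exp(-y * y / 2) / math.sqrt(2 * math.pi)
    return max(abs(scaled_projection_density(y, n) - phi(y)) for y in ys)

for n in (5, 20, 200):
    print(n, f"{max_gap_to_normal(n):.4f}")
```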

Solving for summaries of genetic architecture
Here, we derive closed forms for summaries of genetic architecture under our model. We begin by deriving the first two moments of change in allele frequency in a single generation. With these moments at hand, we use the diffusion approximation to calculate the sojourn time for alleles that contribute to quantitative genetic variation (3). Together with the distribution of effect sizes derived in the previous section, the sojourn time allows us to obtain closed forms for summaries of genetic architecture. Specifically, we can obtain a closed form for any summary that can be described as a function of allele frequencies and effect sizes at sites contributing to quantitative genetic variation. We use these expressions to calculate the summaries used in the main text, for example the expected additive genetic variance and its distribution across sites.

The first two moments of change in allele frequency
We assume that:
• The phenotypic distribution at steady state is well approximated by an isotropic multivariate normal distribution centered at the optimum, with the probability density given in Eq. S10.
• Both $V_A \ll w^2$ and $a^2 \ll w^2$.
These assumptions are justified in Section S4.
We rely on these assumptions to calculate the first two moments of change in frequency in a single generation for an allele with phenotypic effect $\mathbf{a}$ and frequency q. The fitnesses of the three genotypes at the site depend on its distribution of genetic backgrounds, i.e., on the total phenotypic contribution $\mathbf{B}$ of sites other than the focal one. Following Eq. S10 and assuming every allele contributes only a small proportion of the genetic variance, the distribution of $\mathbf{B}$ is well approximated by a multivariate normal distribution (S14). Averaging the fitness of each genotype over this distribution yields the first moment of change in allele frequency, relying on our assumptions that $V_A \ll w^2$ and $a^2 \ll w^2$. The functional form of the first moment is equivalent to that of the standard viability selection model with underdominance, with a selection coefficient s set by $a^2/w^2$, or a scaled selection coefficient $S = 2Ns$. Similarly, we find that the second moment is $q(1-q)/2N$, which is the standard second moment with genetic drift.
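A direct numerical check of the underdominance form is sketched below for a single trait. It assumes a hypothetical genotype coding in which the focal site's genotypes carry 0, 1 or 2 copies of an allele with per-copy effect β, a normal background of variance $V_B$ that is averaged into the fitness width ($w^2 + V_B$), and a population mean at the optimum. The exact one-generation change in frequency is compared with the leading-order form $-(\beta^2/(w^2+V_B))\,q(1-q)(1/2-q)$; the mapping between β and the text's a depends on the genotype coding and is left open here.

```python
import math

def delta_q_exact(q, beta, w2, VB):
    """One-generation change in frequency at a focal site (one trait) under Gaussian
    stabilizing selection. Genotypes carry 0, 1 or 2 copies of an allele with per-copy
    effect beta (hypothetical coding); the normal background of variance VB is averaged
    into the fitness width, and the population mean sits at the optimum (0)."""
    W2 = w2 + VB
    devs = [(g - 2 * q) * beta for g in (0, 1, 2)]       # deviations from the mean
    fit = [math.exp(-d * d / (2 * W2)) for d in devs]    # constant prefactor dropped
    p0, p1, p2 = (1 - q) ** 2, 2 * q * (1 - q), q ** 2   # Hardy-Weinberg frequencies
    wbar = p0 * fit[0] + p1 * fit[1] + p2 * fit[2]
    return (p2 * fit[2] + 0.5 * p1 * fit[1]) / wbar - q

def delta_q_underdominance(q, beta, w2, VB):
    """Leading-order form: -(beta^2 / (w2 + VB)) q (1 - q) (1/2 - q)."""
    return -(beta * beta / (w2 + VB)) * q * (1 - q) * (0.5 - q)

for q in (0.05, 0.2, 0.35, 0.5):
    print(q, delta_q_exact(q, 0.01, 1.0, 0.05), delta_q_underdominance(q, 0.01, 1.0, 0.05))
```

The rare allele is selected against (Δq < 0 for q < 1/2) and the change vanishes at q = 1/2, the signature of underdominance.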

Sojourn time
Based on the first two moments, we can use the diffusion approximation to calculate the sojourn time as a function of allele frequency, i.e., the density of the time that an allele spends at a given frequency before it fixes or is lost (Eq. S18). The closed form is expressed in terms of the error function, erf, and $f_\pm(S,x) \equiv \operatorname{erf}(\sqrt{S}/2) \pm \operatorname{erf}\big(\sqrt{S}(1-2x)/2\big)$. The sojourn time takes simple limiting forms when selection is effectively neutral ($S \ll 1$) or strong ($S \gg 1$). In the effectively neutral range, it is well approximated by $\tau(x) \approx 2/x$, and in the strongly selected range by $\tau(x) \approx \frac{2}{x(1-x)}\,e^{-Sx(1-x)}$ (for $x < 1/2$).
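The sketch below is our own derivation, offered as an illustration rather than a transcription of Eq. S18. Assuming the first moment has the underdominant form with drift $M(x) = -(S/2N)\,x(1-x)(1/2-x)$, and with $V(x) = x(1-x)/2N$, the diffusion Green's function for a new mutation (initial frequency 1/2N, taken to be small) gives $\tau(x) = \frac{2e^{-Sx(1-x)}}{x(1-x)}\,\frac{f_+(S,x)}{f_+(S,0)}$. The code evaluates this closed form, compares it with direct numerical quadrature of the Green's function, and checks the neutral and strong-selection limits.

```python
import math

def f_pm(S, x, sign=+1):
    """f_±(S, x) = erf(sqrt(S)/2) ± erf(sqrt(S)(1 - 2x)/2)."""
    r = math.sqrt(S)
    return math.erf(r / 2) + sign * math.erf(r * (1 - 2 * x) / 2)

def tau_closed(x, S):
    """Sojourn-time density at frequency x (0 < x < 1) for a new mutation, in the
    limit where its initial frequency 1/2N is small (a derivation-based assumption)."""
    return 2 * math.exp(-S * x * (1 - x)) / (x * (1 - x)) * f_pm(S, x) / f_pm(S, 0.0)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number of panels n."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def tau_green(x, S):
    """The same density from the diffusion Green's function, with drift
    M(x) = -(S/2N) x(1-x)(1/2 - x) and variance V(x) = x(1-x)/2N, so that the
    scale density is psi(y) = exp(S y(1-y))."""
    psi = lambda y: math.exp(S * y * (1 - y))
    return 2 * simpson(psi, x, 1.0) / (simpson(psi, 0.0, 1.0) * x * (1 - x) * psi(x))

for S in (0.5, 10.0):
    print(S, [round(tau_closed(x, S) / tau_green(x, S), 8) for x in (0.05, 0.3, 0.7)])
```

The neutral limit follows because $f_+(S,x)/f_+(S,0) \to 1 - x$ as $S \to 0$, and the strong limit because the same ratio tends to 1 for $x < 1/2$ (and to 0 above it).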

Calculating expectations of summaries of architecture
Many summaries of interest can be expressed as sums over segregating sites of some function $g(x, a_1)$, where x is the derived allele frequency and $a_1$ is the effect size on the trait. For example, the additive genetic variance in a trait is given by the sum of $g(x, a_1) = \tfrac{1}{2} a_1^2 x(1-x)$ over sites. The expectation of such a summary can be expressed as $E(G) = 2NU \iint g(x, a_1)\,\rho(x, a_1)\,dx\,da_1$, where G is the summary summed over all sites, 2NU is the population mutation rate per generation and $\rho(x, a_1)$ is the density of sites with the corresponding frequency and effect size per unit mutational input.
The density $\rho(x, a_1)$ can be broken down into contributions from sites with different selection coefficients, i.e., $\rho(x, a_1) = \int f(S)\,\tau(x \mid S)\,\eta(a_1 \mid S)\,dS$, where $f(S)$ is the distribution of selection coefficients and $\tau(x \mid S)$ is the sojourn time of a mutation with selection coefficient S (Eq. S18). The probability density $\eta(a_1 \mid S)$ of effect sizes given selection coefficient S follows from Eqs. S5 and S16. This allows us to break down our summaries into contributions from sites with different selection coefficients (Eq. S23). We use Eq. S23 to study how summaries of architecture depend on the strength of selection, and how these summaries will depend on different distributions of selection coefficients. This allows us to draw general implications about genetic architecture despite our limited knowledge about this distribution.

Additive genetic variance and number of segregating sites
The distributions of additive genetic variance and of the number of segregating sites are critical to understanding genetic architecture and specifically to interpreting results of GWAS. Here we derive closed forms for both distributions as well as simple approximations under strong and effectively neutral selection.

Expectations
We begin by considering the expected contribution of a site to additive genetic variance. Substituting the contribution to variance from a single site, $v(x, a_1)$, into Eq. S23 yields the expected contribution $E(v)$ (Eq. S24), and summing over the mutational input yields the total additive genetic variance (Eq. S25). The closed form for $E(v)$ (Eq. S24) was integrated numerically to obtain Fig. 2a in the main text. We can use the results of (4) to obtain an analytic approximation for $E(v)$ in terms of the imaginary error function, $\operatorname{erfi}(z) = \operatorname{erf}(iz)/i$.
In the effectively neutral and strong selection limits, we can use the limiting forms of the sojourn time to derive simple forms for $E(v)$; in the effectively neutral limit, for example, $\tau(x) \approx 2/x$.
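A compact expression for the per-site contribution to variance follows under two assumptions: S is proportional to $a^2$, and the sojourn time has the closed form $\tau(x) = \frac{2e^{-Sx(1-x)}}{x(1-x)}\frac{f_+(S,x)}{f_+(S,0)}$ (our derivation from the first two moments). The identity $f_+(S,x) + f_+(S,1-x) = f_+(S,0)$ then reduces the per-site variance, relative to its strong-selection limit, to $R(S) = \frac{S}{2}\int_0^1 e^{-Sx(1-x)}\,dx = \sqrt{S}\,D(\sqrt{S}/2)$, where D is the Dawson function, $D(z) = \frac{\sqrt{\pi}}{2}e^{-z^2}\operatorname{erfi}(z)$. This is one way an erfi-based expression can arise; we do not claim it is identical to Eq. S24. Because v is linear in $a_1^2$, pleiotropy rescales $E(a_1^2)$ by 1/n but leaves the shape of R(S) unchanged.

```python
import math

def relative_site_variance(S, n=20000):
    """R(S) = (S/2) * integral_0^1 exp(-S x(1-x)) dx (composite Simpson rule)."""
    h = 1.0 / n
    s = 2.0  # the integrand equals 1 at both x = 0 and x = 1
    for i in range(1, n):
        x = i * h
        s += (4 if i % 2 else 2) * math.exp(-S * x * (1 - x))
    return 0.5 * S * s * h / 3

def dawson(z, n=20000):
    """Dawson function D(z) = integral_0^z exp(t^2 - z^2) dt, written to avoid overflow."""
    if z == 0:
        return 0.0
    h = z / n
    s = math.exp(-z * z) + 1.0  # integrand at t = 0 and t = z
    for i in range(1, n):
        t = i * h
        s += (4 if i % 2 else 2) * math.exp(t * t - z * z)
    return s * h / 3

def relative_site_variance_dawson(S):
    """Equivalent closed form sqrt(S) * D(sqrt(S)/2) = (sqrt(pi S)/2) e^{-S/4} erfi(sqrt(S)/2)."""
    return math.sqrt(S) * dawson(math.sqrt(S) / 2)

for S in (0.01, 1.0, 10.0, 1000.0):
    print(S, round(relative_site_variance(S), 5), round(relative_site_variance_dawson(S), 5))
```

R(S) ≈ S/2 for S ≪ 1 and R(S) → 1 for S ≫ 1, matching the neutral and strong-selection limits.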

Densities
Here, we consider how the additive genetic variance is distributed among sites. The density of segregating sites with a given contribution to variance v can be calculated by substituting Dirac's delta function $\delta\big(v - \tfrac{1}{2} a_1^2 x(1-x)\big)$ into Eq. S23; the resulting integral is evaluated at $q_\pm(v, a_1) = \tfrac{1}{2}\big(1 \pm \sqrt{1 - 8v/a_1^2}\big)$, the two frequencies that yield $v = \tfrac{1}{2} a_1^2 x(1-x)$. This integral can be calculated numerically for any S and $\eta(a_1 \mid S)$ (e.g., for different degrees of pleiotropy).
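The change of variables behind the delta-function substitution can be made concrete. The sketch below treats the per-site contribution as $v = c\,a_1^2 x(1-x)$, with c = 1/2 reproducing the $\sqrt{1 - 8v/a_1^2}$ in $q_\pm$; it returns both roots and the Jacobian factor $1/|dv/dx| = 1/\big(c\,a_1^2\sqrt{1 - 4v/(c\,a_1^2)}\big)$ that weights the density of sites at each root. The function names are illustrative.

```python
import math

def frequencies_for_contribution(v, a1, c=0.5):
    """Roots x_- <= x_+ of v = c * a1^2 * x(1 - x); c = 1/2 gives sqrt(1 - 8v/a1^2)."""
    disc = 1 - 4 * v / (c * a1 * a1)
    if disc < 0:
        raise ValueError("v exceeds the maximal contribution c * a1^2 / 4")
    r = math.sqrt(disc)
    return 0.5 * (1 - r), 0.5 * (1 + r)

def delta_weight(v, a1, c=0.5):
    """1 / |dv/dx| at the roots (equal at both): the factor contributed by the delta function."""
    x_minus, _ = frequencies_for_contribution(v, a1, c)
    return 1.0 / (c * a1 * a1 * abs(1 - 2 * x_minus))

v, a1 = 0.01, 0.5
x_lo, x_hi = frequencies_for_contribution(v, a1)
print(x_lo, x_hi, delta_weight(v, a1))
```

The density of sites with contribution v is then the sum, over the two roots, of the site density at $q_\pm$ times this weight.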
As we discuss in the main text and in Section S6, the tails of this distribution are especially important for the interpretation of GWAS. Notably, to a first approximation, the loci captured in a given GWAS would be those whose contribution to additive variance exceeds some threshold contribution $v^*$.
We are particularly interested in the proportion of additive genetic variance captured in GWAS, which we approximate by the additive variance carried by sites in the $v > v^*$ tail. The expected variance in such a tail and its proportion of the total additive genetic variance, $G(v^*)$, follow from the density above. For cases without pleiotropy or with extensive pleiotropy, we can greatly simplify these expressions in the limits of effectively neutral and strong selection; we illustrate how these approximations can be obtained for $G(v^*)$, for example when selection is effectively neutral (Table S1). A noteworthy property of $G(v^* \mid S)$, the proportion of variance from sites with contributions $v > v^*$, is that it always appears to increase with the degree of pleiotropy, n (Figure S2). We do not have a proof for this property but can suggest an intuitive explanation. Without pleiotropy (n = 1), the selection coefficient determines the effect size, such that any contribution $v^*$ to genetic variance corresponds to a specific minor allele frequency $x^*$. The sites with contributions $v > v^*$ are therefore those with minor allele frequencies $x > x^*$. Pleiotropy causes sites with a given selection coefficient to have a distribution of effect sizes on the trait under consideration. As a result, some sites with frequencies above $x^*$ end up with contributions to variance below $v^*$, while some sites with frequencies below $x^*$ end up with contributions above it. To understand how this affects $G(v^* \mid S)$, recall that for any selection coefficient, the density of variants always increases rapidly as frequency decreases. As long as the contribution $v^*$ and the corresponding frequency without pleiotropy, $x^*$, are not close to 0, we may therefore expect that introducing pleiotropy pushes more sites above $v^*$ than below it, resulting in a net increase in $G(v^* \mid S)$. Lastly, we consider how the number of segregating sites varies as a function of their contribution to genetic variance (Fig. S13).
The number of segregating sites K with $v > v^*$ per unit mutational input, and the distribution of variance at these sites, follow in the same way. We use these results below, when we estimate model parameters from GWAS results (Section S7). They also allow us to translate the properties of the density of genetic variance into properties of the density of sites (Fig. S13). Notably, by applying the same reasoning, one can show that the density of sites is insensitive to the distribution of strong selection coefficients and that pleiotropy always increases the number of sites above $v^*$ (i.e., for any $v^* > 0$).
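For the simplest case without pleiotropy (n = 1) and a single selection coefficient, the tail quantities reduce to one-dimensional integrals between the two roots $x_\pm$. The sketch below writes the threshold as $u = v^*/(c\,a^2) < 1/4$, so the convention for the constant c does not matter, and assumes the sojourn time $\tau(x) = \frac{2e^{-Sx(1-x)}}{x(1-x)}\frac{f_+(S,x)}{f_+(S,0)}$ (our derivation from the first two moments). It computes the share of variance above threshold, $G(v^*)$, and the number of sites above threshold per unit mutational input, K. In the neutral limit these reduce to $G = \sqrt{1-4u}$ and $K = 2\ln(x_+/x_-)$, which the checks use.

```python
import math

def f_plus(S, x):
    """f_+(S, x) = erf(sqrt(S)/2) + erf(sqrt(S)(1 - 2x)/2)."""
    r = math.sqrt(S)
    return math.erf(r / 2) + math.erf(r * (1 - 2 * x) / 2)

def weighted_density(x, S):
    """x(1-x) * tau(x), assuming tau(x) = 2 e^{-Sx(1-x)} f_+(S,x) / (x(1-x) f_+(S,0))."""
    return 2 * math.exp(-S * x * (1 - x)) * f_plus(S, x) / f_plus(S, 0.0)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule with an even number of panels n."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def roots(u):
    """Frequencies x_-, x_+ at which the contribution equals the threshold (u < 1/4)."""
    r = math.sqrt(1 - 4 * u)
    return 0.5 * (1 - r), 0.5 * (1 + r)

def tail_share(u, S):
    """G(v*) for n = 1: share of the variance carried by sites with v > v*."""
    lo, hi = roots(u)
    f = lambda x: weighted_density(x, S)
    return simpson(f, lo, hi) / simpson(f, 0.0, 1.0)

def tail_sites(u, S):
    """K for n = 1: expected number of sites with v > v* per unit mutational input."""
    lo, hi = roots(u)
    return simpson(lambda x: weighted_density(x, S) / (x * (1 - x)), lo, hi)

for S in (1e-8, 20.0):
    print(S, [round(tail_share(u, S), 4) for u in (0.05, 0.1, 0.2)])
```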

Comparing predictions against simulations
Our theoretical derivations for both the total genetic variance and its distribution among sites agree well with results from simulations (Fig. S3). Notably, we found that as long as $1/2N \ll V_A/w^2 \ll 1$ (cf. Section S3.3), our theoretical predictions for the total genetic variance (Eq. S25) are indistinguishable from the simulation results (Fig. S3A). Theoretical and simulation results seem to agree even when $V_A/w^2 \le 1/2N$, although we consider this range to be of lesser biological relevance (cf. Section S4.4). We tested our predictions for the distribution of variance across sites in terms of $G(v^*)$ (Eq. S34), i.e., the proportion of the variance arising from sites that contribute more than $v^*$ (Fig. S3B), and also found them to be highly accurate.

Justification for assumptions
Here, we justify the assumptions that we relied upon in deriving the first two moments of change in allele frequency (cf. Section S2.1; modeling assumptions are motivated in the introduction to the main text). We rely in part on self-consistency arguments, which should not be mistaken for being circular: specifically, we make assumptions about the behavior of the system and show that the solution to which we arrive satisfies these assumptions.

Normal and isotropic phenotypic distribution around the optimum
The assumption that the phenotypic distribution is well approximated by a normal distribution stems from an additive model of quantitative traits. By assuming that the phenotype arises from many additive contributions and that these additive contributions arise from some underlying distribution, normality follows from the central limit theorem. In terms of model parameters, we would expect normality to hold if the rate of mutations affecting the trait is sufficiently large, i.e., when $2NU \gg 1$.
We further assume that the phenotypic distribution is isotropic and its mean is at the optimum. Isotropy of the phenotypic distribution follows from assuming isotropy in the mutational input. In Section S5.4, we explore the consequences of anisotropy in the mutational input. In Section S4.4, we further show that the fluctuations of the mean phenotype around the optimum over time have negligible effects on allelic dynamics; a similar argument applies to fluctuations in the variance.

The phenotypic variance satisfies $V_A \ll w^2$
With the mean phenotype centered at the optimum, requiring that $V_A \ll w^2$ is equivalent to assuming that moving a standard deviation away from the mean phenotype entails only a minor reduction in fitness. This seems plausible for many phenotypes: if, for example, this assumption did not hold for human height, then individuals whose height is a standard deviation or more away from the population mean would suffer a substantial reduction in fitness. Arguably, deviations from the mean height would then be recognized as a very common and severe disease.
Another line of argument that $V_A \ll w^2$ is likely is based on our results. If we assume that mutations are strongly selected, then our results suggest that $V_A$ is of the order of $Uw^2$ (Eq. S43). It follows that if the rate of mutations affecting the phenotype under consideration satisfies $U \ll 1$, then $V_A \ll w^2$. The number of mutations per diploid human genome per generation is estimated to be ~60 (5), and less than 10% of the genome is assumed to be functional (6), suggesting that the number of de novo mutations with any effect on function is less than 3 per haploid genome per generation. It then seems plausible that the (haploid) mutation rate affecting a specific trait satisfies $U \ll 1$.
Assuming that mutations are weakly selected increases the variance in Eq. S43 only moderately and assuming the mutations are effectively neutral would suggest it is much smaller, leaving the above argument intact.

Mutational effect sizes satisfy $a^2 \ll w^2$
As we argued in the introduction of the main text, variants for which the stronger condition $a^2 \ll V_A$ holds account for most or all of the heritability explained in GWAS for many traits (e.g., (7-10)). Moreover, evidence for many traits suggests that the same is true for the variants that underlie the heritability that remains to be explained (11)(12)(13)(14). Indeed, for this assumption to be violated, much of the genetic variance would have to arise from mutations that have a very large impact on fitness (i.e., with s on the order of 1). While this may be the case for some diseases (e.g., autism (15)), it does not appear to be the case for most phenotypes that have been examined.

Deviations of the mean phenotype from the optimum can be neglected
In reality, the mean phenotype of the population fluctuates around the optimum. Here, we derive equations for the dynamics of the mean phenotype in order to estimate the magnitude and timescale of these fluctuations. We then show that these fluctuations have a negligible effect on the first two moments of change in allele frequency and thus on the results that follow from these moments.
We begin by deriving the first and second moments of change in mean phenotype. To this end, we assume that the distribution of phenotypes is multivariate normal around a mean phenotype $\bar{\mathbf{r}}$, which yields $E(\Delta\bar{\mathbf{r}}) \approx -(V_A/w^2)\,\bar{\mathbf{r}}$ (Eq. S45). By the same token, the variance in $\Delta\bar{\mathbf{r}}$ is simply the sampling variance; in both cases, we relied on the assumption that $V_A \ll w^2$.
These two moments define an Ornstein-Uhlenbeck process in $\bar{\mathbf{r}}$, allowing us to rely on well-known results (16). Notably, when the mean phenotype starts far from the optimum, it decays exponentially to the optimum with exponent $V_A/w^2$ (see Section S5.1 below). At steady state, $\bar{\mathbf{r}}$ will fluctuate around zero over a timescale of $w^2/V_A$ generations. The typical displacement of $\bar{\mathbf{r}}$ in any given direction reflects a balance between drift, set by the sampling variance, and the pull of selection toward the optimum; because both are proportional to $V_A$, this displacement does not depend on $V_A$.
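The Ornstein-Uhlenbeck behavior can be illustrated with a discrete-time simulation along a single direction. The sketch assumes the deterministic pull $E(\Delta\bar r) \approx -(V_A/w^2)\,\bar r$ and adds Gaussian noise with per-generation variance σ², a free stand-in for the sampling variance (its exact value is not reproduced here). The stationary variance of this recursion is $\sigma^2/(\theta(2-\theta))$ with $\theta = V_A/w^2$, i.e., about $\sigma^2 w^2/(2V_A)$; if σ² is proportional to $V_A$, as a sampling variance is, the typical displacement does not depend on $V_A$.

```python
import math
import random

def stationary_mean_variance(VA, w2, sigma2, generations=300_000, seed=1):
    """Time-averaged squared displacement of the mean along one direction under
    rbar_{t+1} = rbar_t - (VA / w2) rbar_t + noise, noise ~ N(0, sigma2).
    sigma2 stands in for the per-generation sampling variance (a free parameter here)."""
    rng = random.Random(seed)
    theta, sd = VA / w2, math.sqrt(sigma2)
    rbar, acc = 0.0, 0.0
    for _ in range(generations):
        rbar += -theta * rbar + rng.gauss(0.0, sd)
        acc += rbar * rbar
    return acc / generations

VA, w2, sigma2 = 0.01, 1.0, 1e-3
theta = VA / w2
print(stationary_mean_variance(VA, w2, sigma2), sigma2 / (theta * (2 - theta)))
```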
Next, we show that these fluctuations of the mean have negligible effects on allelic trajectories. To this end, we derive the first two moments of change in allele frequency again, but this time we include the effect of the displacement of $\bar{\mathbf{r}}$ from the optimum. While the second moment remains the same, the first moment gains a directional selection term proportional to $\bar r_a$, the component of $\bar{\mathbf{r}}$ in the direction of $\mathbf{a}$, in addition to the stabilizing selection term. However, our analysis establishes that $\bar r_a$, scaled by its typical steady-state magnitude, is a scalar of the order of 1, which fluctuates around zero on a timescale of $w^2/V_A$ generations. We can therefore compare the directional selection term with the stabilizing selection term. When stabilizing selection is strong, $S \gg 1$, the stabilizing selection term dominates over the directional selection term. In contrast, when selection is weak, i.e., $S \approx 1$ or smaller, the directional term is not necessarily negligible in any given generation. However, in this case, both terms effect substantial change in allele frequency only over a timescale of 2N generations; on this timescale, if $2N \gg w^2/V_A$, the directional effect averages to zero. The directional term becomes important only when $2N \le w^2/V_A$, that is, $V_A \le w^2/2N$. For $V_A$ to be that small, virtually all alleles must have $S \ll 1$, such that their trajectories are determined by drift, not selection. In summary, regardless of the selection acting on an allele, fluctuations of the mean phenotype around the optimum have a negligible effect on its trajectory.

Model robustness
In this section, we consider the sensitivity of our results to relaxing some of the simplifying modeling assumptions about selection and mutation. Specifically, we show our results to be robust to moderate changes to the optimal phenotype; small asymmetry in the mutational input; the presence of major loci maintained at high frequency by selection on traits that are not included in the model; as well as to most forms of anisotropic mutation.

Changes to the optimal phenotype
We first consider how changes to the optimal phenotype over time would affect our results. It is easy to imagine how events such as migration from Africa to Europe or the onset of agriculture may have introduced rapid changes in beneficial phenotypes. In order to evaluate the potential impact of such events, we consider how an instantaneous change to the optimal phenotype would affect the allelic dynamics.
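We begin by considering how such an instantaneous change to the optimum would affect the mean phenotype. If the shift in the optimum is small, on the order of the fluctuations in the mean phenotype at steady state or smaller, then the arguments provided in Section S4.4 still hold and the shift would have a negligibly small effect on our results. We therefore assume that the shift in the optimum, $\boldsymbol{\Lambda}$, is large compared to the scale of these fluctuations. This assumption means that we can use a deterministic approximation (based on Eq. S45) and describe the change in mean phenotype in a single generation by $\Delta\bar{\mathbf{r}} \approx (V_A/w^2)(\boldsymbol{\Lambda} - \bar{\mathbf{r}})$ (neglecting higher moments). Further assuming that the mean phenotype was at the optimum, 0, before the optimum shifted (at time t = 0) and neglecting changes to the genetic variance $V_A$, we find that $\bar{\mathbf{r}}(t) \approx \boldsymbol{\Lambda}\,(1 - e^{-V_A t/w^2})$. An allele with effect $\mathbf{a}$ therefore experiences transient directional selection proportional to $\mathbf{a}\cdot(\boldsymbol{\Lambda} - \bar{\mathbf{r}}(t))/w^2$, whose cumulative effect over the approach to the new optimum is of order $\mathbf{a}\cdot\boldsymbol{\Lambda}/V_A$. This result suggests that the relative change in allele frequency will be negligible so long as

$|\mathbf{a}\cdot\boldsymbol{\Lambda}|/V_A \ll 1$. (S53)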
This condition suggests that mutations with smaller effects would be less affected by the shift in the optimum. It further suggests that alleles that satisfy $a^2 \ll V_A$, as appears to be the case for most loci discovered in GWAS (e.g., (7-10)), will be negligibly affected by shifts in optimum on the order of the total genetic variation (i.e., for a shift $\boldsymbol{\Lambda}$ with $|\boldsymbol{\Lambda}| \le \sqrt{V_A}$), because then $|\mathbf{a}\cdot\boldsymbol{\Lambda}|/V_A \le \sqrt{a^2/V_A} \ll 1$. These analytic predictions are confirmed by simulations (Fig. S4). Results were taken 50 generations after the shift in optimum, which, for these parameters, is just after the population mean has reached the new optimum.
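The condition $|a\Lambda|/V_A \ll 1$ can be illustrated in one dimension. Assuming the deterministic approach $\Delta\bar r \approx (V_A/w^2)(\Lambda - \bar r)$ and a per-generation directional selection coefficient $a(\Lambda - \bar r)/w^2$ on an allele with effect a (a simplifying assumption; constants depend on genotype coding), the cumulative selection is a geometric sum equal to $a\Lambda/V_A$. The names below are illustrative.

```python
def cumulative_directional_selection(a, shift, VA, w2, generations=100_000):
    """1-D sketch: the mean approaches a new optimum deterministically,
    rbar_{t+1} = rbar_t + (VA / w2) (shift - rbar_t), and we sum the directional
    selection a * (shift - rbar_t) / w2 felt by an allele of effect a."""
    rbar, total = 0.0, 0.0
    for _ in range(generations):
        gap = shift - rbar
        total += a * gap / w2
        rbar += (VA / w2) * gap
    return total

a, shift, VA, w2 = 0.01, 0.1, 0.01, 1.0
print(cumulative_directional_selection(a, shift, VA, w2), a * shift / VA)
```

With a shift of $\sqrt{V_A}$ and $a^2 \ll V_A$ (for example a = 0.001 here), the cumulative selection is about 0.01, so the allele's frequency barely moves.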

Asymmetric mutational input
In this section, we consider the sensitivity of our results to asymmetries in the mutational input, i.e., to the case in which mutations in a given direction in trait space, $\mathbf{e}$, are more likely to arise than mutations in the opposite direction, $-\mathbf{e}$ (see (18) for treatment of this problem in the limit of high per-site mutation rate).
An asymmetric mutational input introduces a shift in the mean phenotype every generation. With new mutations arising at frequency 1/2N, the expected shift, $\Delta\bar{\mathbf{r}}_\mu$, is proportional to $E_\mu(\mathbf{a})$, where $E_\mu$ denotes the expectation over newly arising mutations. For each trait, mutational effects have a characteristic size, which sets the scale for the maximal shift in any direction: $\Delta\bar{\mathbf{r}}_\mu$ is of the order of this characteristic size or smaller. We therefore parameterize the shift in mean phenotype due to new mutations in terms of a vector $\boldsymbol{\epsilon}$ that describes the strength and direction of the bias, with $\epsilon = |\boldsymbol{\epsilon}|$ assumed to be $\ll 1$.
At steady state, the mutational shift must be offset by selection, such that $\Delta\bar{\mathbf{r}}_\mu + \Delta\bar{\mathbf{r}}_D + \Delta\bar{\mathbf{r}}_S = 0$, where $\Delta\bar{\mathbf{r}}_D$ and $\Delta\bar{\mathbf{r}}_S$ are the expected shifts due to directional and stabilizing selection, respectively. We previously found that the expected directional shift is $\Delta\bar{\mathbf{r}}_D \approx -(V_A/w^2)\,\bar{\mathbf{r}}$, where $\bar{\mathbf{r}}$ denotes the mean phenotype (cf. Eq. S45). As we show next, when mutations are strongly selected, stabilizing selection offsets the mutational shift to maintain the mean phenotype at the optimum, implying that directional selection is negligible. In contrast, when mutations are effectively neutral, stabilizing selection is negligible and the directional term might not be negligible by comparison. However, as long as the asymmetry is small, $\epsilon \ll 1$, we show that this directional term is not large enough to change the allele dynamics, both when all mutations are effectively neutral and when some mutations are strongly selected.
First, we consider the shift in mean phenotype due to stabilizing selection. This shift arises because, with asymmetric mutational input, the distribution of phenotypes becomes skewed. Therefore, even if the mean phenotype is at the optimum, individuals with a given fitness may have an asymmetric distribution of phenotypes around the optimum, leading stabilizing selection to change the mean phenotype.
We have already shown (Eq. S15) that the expected change in allele frequency per generation due to stabilizing selection at any given site i is proportional to $-a_i^2\,q_i(1-q_i)(1/2-q_i)/w^2$. The expected change in mean phenotype, $\Delta\bar{\mathbf{r}}_S$, can then be calculated by adding up the contributions over sites. The resulting sum reflects the skewness of the phenotypic distribution. Indeed, in one dimension, it can be shown that $\Delta\bar r_S = -\mu_3(r)/(2w^2)$, with $\mu_3(r)$ being the third central moment of the phenotypic distribution; in n dimensions, a corresponding expression holds for the component of the shift along every direction $\mathbf{e}$. When sites are under strong selection, $\Delta\bar{\mathbf{r}}_S$ takes a simple form. Assuming the asymmetry is small, the shift due to stabilizing selection can be expanded in orders of $\epsilon$. The leading term in the frequency distribution takes the same form as it does without the bias. For strongly selected alleles with no bias, $q \ll 1$, and therefore the frequency dependence in this term can be approximated by $q(1-q)(1/2-q) \approx q/2$. Moreover, q scales with $1/a^2$, implying that the distribution of $a^2 q$ is independent of a (cf. Section S3.1). Therefore, when all sites are strongly selected, the leading term in the shift due to stabilizing selection is equal in magnitude and opposite in direction to the mutational shift. Thus, to first order in $\epsilon$, the shift of the mean phenotype due to stabilizing selection offsets the mutational shift, implying that there will be no directional term and that the allele dynamics will not be affected by asymmetry.
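The one-dimensional relation between skewness and the shift due to stabilizing selection can be checked directly. The sketch applies Gaussian fitness $e^{-z^2/2w^2}$ to a skewed, mean-zero phenotype distribution and compares the exact change in mean with $-\mu_3/(2w^2)$; the two agree when the phenotypic spread is small compared with w. The two-point distribution is an arbitrary example.

```python
import math

def mean_shift_exact(values, probs, w2):
    """Change in mean phenotype after Gaussian stabilizing selection
    W(z) = exp(-z^2 / 2 w2) with the optimum at 0 (1-D)."""
    fit = [math.exp(-z * z / (2 * w2)) for z in values]
    wbar = sum(p * f for p, f in zip(probs, fit))
    mean_before = sum(p * z for p, z in zip(probs, values))
    mean_after = sum(p * f * z for p, f, z in zip(probs, fit, values)) / wbar
    return mean_after - mean_before

def mean_shift_skew(values, probs, w2):
    """Leading-order prediction -mu_3 / (2 w2) for a distribution centered at the optimum."""
    mu = sum(p * z for p, z in zip(probs, values))
    mu3 = sum(p * (z - mu) ** 3 for p, z in zip(probs, values))
    return -mu3 / (2 * w2)

# A right-skewed two-point distribution with mean zero: z = -1 w.p. 2/3, z = 2 w.p. 1/3.
values, probs = [-1.0, 2.0], [2 / 3, 1 / 3]
for w2 in (1e2, 1e4):
    print(w2, mean_shift_exact(values, probs, w2), mean_shift_skew(values, probs, w2))
```

The same expansion, $W \approx 1 - z^2/2w^2$, gives $\mathrm{Cov}(z, W) = -\mu_3/(2w^2)$ when the mean is zero, which is why the skew term appears.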
When alleles are instead effectively neutral, then $S \ll 1$ (cf. Section S2.2) and allele frequencies are well approximated by the neutral sojourn time, implying that stabilizing selection makes a negligible contribution to offsetting the mutational shift. In this case, the mutational effect on the mean phenotype is therefore offset by directional selection, which requires a displacement of the mean phenotype from the optimum along the direction of the bias; the resulting directional selection on an allele depends on the projection of its effect $\mathbf{a}$ onto that direction. Since $\epsilon \ll 1$, for all alleles other than those with unusually large selection coefficients, the scaled directional selection coefficient will be much smaller than 1, and their trajectories will still be determined by drift, not selection. Even in this case, therefore, we do not expect asymmetry to affect allele dynamics.
Next, we consider the case where there is a mix of effectively neutral and strongly selected mutations. The existence of strongly selected mutations in addition to effectively neutral ones reduces the deviation of the mean phenotype from the optimum. Denoting the proportion of strongly selected mutations by $p_s$, the displacement can be expressed in terms of $p_s$ and $E_{n.s.}(S) \le 1$, the mean scaled stabilizing selection coefficient of effectively neutral mutations. This yields an upper bound on the magnitude of the scaled directional selection coefficient for an allele with effect size a and scaled stabilizing selection coefficient S. With a substantial proportion of strongly selected sites, the ratio appearing in this bound is of the order of 1, and therefore the bound is $\ll 1$. This condition implies that for effectively neutral alleles (i.e., $S \le 1$), the scaled directional selection coefficient is $\ll 1$ and allele trajectories will be determined by genetic drift, whereas for strongly selected alleles (i.e., when $S \gg 1$), the scaled directional selection coefficient is $\ll S$ and therefore negligible compared to the scaled stabilizing selection coefficient.
Weakly selected alleles (with $S \sim 10$) behave largely like strongly selected alleles, except that stabilizing selection on them only partially cancels out the mutational bias (for example, for S = 10, only 85% of the bias is canceled). The rest of the bias is canceled by directional selection and therefore induces a small shift in the mean phenotype. It is straightforward to repeat the arguments given above and show that the shift in the mean phenotype for a trait with only weakly selected alleles, or with a mixture that includes weakly selected alleles, negligibly affects allele trajectories.
Thus, we conclude that a small asymmetry in mutation will not affect allelic dynamics (see Fig. S5).

Fig. S5. Panels: 0% strong, 10% strong and 100% strong (axes: x, v, s). Results when trait-increasing mutations are more likely than trait-decreasing mutations; if p is the proportion of trait-increasing mutations, then the asymmetry coefficient is $\epsilon = 2p - 1$. As expected, only for large biases (when $\epsilon \sim 1$) are there substantial changes in the distribution of the contribution of sites to variance. Simulations were run with a 10,000-generation burn-in period without asymmetry followed by 10,000 generations with asymmetry, and were averaged over many runs (>300), with the number of runs varied across plots to keep errors in (b) below 1%.

Major effect loci
In this section, we show that our results are insensitive to the presence of major loci, i.e., individual loci that contribute substantially to quantitative genetic variation. We have in mind, for example, loci whose alleles are maintained at high frequencies by balancing selection on a Mendelian trait but have pleiotropic effects on the quantitative traits under consideration (e.g., MHC loci; (19,20)). While such loci violate our assumptions, we show that they do not affect the dynamics at other loci that fulfill them.
To this end, we calculate the first two moments of change in allele frequency in the presence of a major locus. We denote the frequency and effect size of the focal allele by q and $\mathbf{a}$, and the frequency and effect size of the major allele by $q_M$ and $\mathbf{a}_M$, respectively. As in our previous derivations (Section S2), the distribution of the background phenotypic contribution from all other loci, $\mathbf{B}$, is well approximated by a normal distribution whose variance is the total genetic variance minus $v_M$, the contribution to genetic variance from the major locus. The population mean remains close to the optimum because any shift caused by the major locus is quickly compensated for by the other loci (see Section S4.4). We then average over both this distribution and the three genotypes at the major locus to calculate the mean fitness associated with each genotype at the focal locus. In this way, we obtain the first moment of the change in allele frequency, which is the same as we derived in the absence of a major locus (Eq. S15). Similarly, we find the second moment to be unaffected.
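A numerical version of this argument is sketched below for a single trait. It assumes a hypothetical 0/1/2-copy genotype coding, a normal background whose variance is the total $V_A$ minus the contributions of the focal and major loci, and a population mean held at the optimum, and it compares the focal site's exact change in frequency with and without a major locus at fixed total genetic variance. The residual difference comes from the major locus's own skew and from higher-order terms, both small when $a_M^2 \ll w^2$.

```python
import math

def focal_delta_q(q, a, w2, VA, major=None):
    """One-trait sketch: change in frequency at a focal site (per-copy effect a,
    hypothetical 0/1/2-copy coding) under Gaussian stabilizing selection.
    `major` = (qM, aM) adds a major locus. The background from all other loci is
    normal with variance VA minus the focal and major contributions, so the total
    additive variance stays VA, and the population mean is held at the optimum."""
    hw = lambda p: [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    v_focal = 2 * a * a * q * (1 - q)
    if major is None:
        major_devs, major_freqs, v_major = [0.0], [1.0], 0.0
    else:
        qM, aM = major
        major_devs = [(m - 2 * qM) * aM for m in (0, 1, 2)]
        major_freqs = hw(qM)
        v_major = 2 * aM * aM * qM * (1 - qM)
    W2 = w2 + (VA - v_focal - v_major)     # background averaged into the fitness width
    fit = []
    for g in (0, 1, 2):
        d = (g - 2 * q) * a
        # average over the three genotypes at the major locus
        fit.append(sum(pm * math.exp(-(d + dm) ** 2 / (2 * W2))
                       for pm, dm in zip(major_freqs, major_devs)))
    p0, p1, p2 = hw(q)
    wbar = p0 * fit[0] + p1 * fit[1] + p2 * fit[2]
    return (p2 * fit[2] + 0.5 * p1 * fit[1]) / wbar - q

for q in (0.05, 0.2):
    base = focal_delta_q(q, 0.005, 1.0, 0.01)
    with_major = focal_delta_q(q, 0.005, 1.0, 0.01, major=(0.3, 0.05))
    print(q, base, with_major, with_major / base)
```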

Anisotropic mutation
In this section, we consider how relaxing the assumption that the distribution of newly arising mutations is isotropic in trait space would affect our results. As noted, we can always choose an orthonormal coordinate system centered at the optimum, in which the trait under consideration varies along the first coordinate and a unit change in other traits (i.e., in other coordinates) near the optimum has the same effect on fitness. There is, however, no obvious reason for the distribution of newly arising mutations to be isotropic in this coordinate system (see (21) for generalizations of Fisher's Geometric Model along similar lines).
Anisotropy in mutation does not affect the moments of change in allele frequency, as these depend only on the selection acting on an allele (or, equivalently, on its overall effect size) and not on its direction in trait space. Anisotropy could, however, affect the distribution of allelic effect sizes on the focal trait conditional on the selection acting on them. Here, we provide heuristic arguments suggesting that, barring extreme cases, we can define an effective number of traits, $n^*$, and an effective strength of selection, $w_e^2$, for which the relationship between selection and effect size in anisotropic models is well approximated by the relationship found for isotropic ones (Section S1.2).
We focus on a family of anisotropic mutational distributions that can be described as a projection of a multivariate normal distribution onto the unit sphere in trait space. Namely, we draw the size of a mutation from some distribution, and to obtain its direction, we draw a vector from a multivariate normal distribution MVN(0, Σ) and normalize it to unit length.
However, there is an extreme scenario in which an effective number of traits cannot describe the distribution of effect sizes. This happens when selection acts mainly on a small number of traits but our focal trait contributes very little to selection. In this case, we might be tempted to set $n^*$ to the inverse of the focal trait's proportional contribution to selection, which is ≫ 1, but, as Eq. S76 suggests, the high-pleiotropy limit would be inadequate. In fact, the variance in selection on newly arising mutations (due to the contribution of the selected traits) will result in a long-tailed distribution of effect sizes on the focal trait, which is not well approximated by any isotropic model. In summary, except for these extreme cases, isotropic models provide a good approximation for the relationship between selection and effect size, even when there is heterogeneity in the strength of selection on different traits.
To illustrate the effect of heterogeneity in the strength of selection among traits, we consider a simple example in which all non-focal traits make the same contribution to selection and therefore can be modeled by an effective isotropic model.
[Figure panel (c): effective isotropic model vs. anisotropic model, with $n_e \approx 2$.]
Next, we consider the case in which the effect sizes on different traits are correlated, i.e., when the covariance matrix Σ has off-diagonal terms. We parameterize the effect of these terms using the correlation, $\rho$, between the effects on the focal trait and on the selected traits. If the correlation is small, $\rho \ll 1$, then our previous reasoning holds. In the other extreme, when all selected traits are highly correlated with the focal trait, i.e., $\rho \approx 1$, the proportional contribution of the focal trait to selection is constant.
To illustrate the effect of correlations among traits, we consider the following simple example (Fig. S7). We assume that the correlation matrix has ones on the diagonal and $\rho$ in every off-diagonal entry, meaning that all traits contribute equally to fitness and every pair of traits has the same correlation coefficient $\rho$. When $\rho = 0$, this becomes an isotropic model. When $\rho = 1$, effect sizes are always identical for every trait; thus, this case is equivalent to having only one trait with selection that is increased $n$-fold.
Intermediate cases can be approximated by finding an effective number of traits, $n^* < n$, such that an isotropic model with $n^*$ traits and a correspondingly increased strength of selection (by a factor of $n/n^*$, interpolating between the two limits above) qualitatively describes the distribution of effect sizes. Numerical results for this model are shown in Fig. S7.
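The mutational model described above (a size drawn from some distribution, times a direction obtained by normalizing a draw from MVN(0, Σ)) can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the function names are hypothetical, Σ is taken to be the equicorrelation matrix of the Fig. S7 example, and $\rho$ must stay strictly below 1 because Σ becomes singular at $\rho = 1$. The helper `focal_share_variance` reports how much the focal trait's share of a mutation's squared effect, $u_1^2$, varies across mutations; this spread should shrink as $\rho$ approaches 1, in line with the limit in which the focal trait's contribution is constant.

```python
import math
import random
import statistics


def equicorrelation_matrix(n, rho):
    """Covariance with ones on the diagonal and rho elsewhere (the example of Fig. S7)."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def cholesky(cov):
    """Lower-triangular L with L L^T = cov; cov must be positive definite."""
    n = len(cov)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(cov[i][i] - s)
            else:
                chol[i][j] = (cov[i][j] - s) / chol[j][j]
    return chol


def sample_direction(chol, rng):
    """Draw z ~ MVN(0, chol chol^T) and project it onto the unit sphere."""
    n = len(chol)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [sum(chol[i][k] * g[k] for k in range(i + 1)) for i in range(n)]
    norm = math.sqrt(sum(c * c for c in z))
    return [c / norm for c in z]


def sample_mutation(chol, draw_size, rng):
    """Effect vector = size (from any user-supplied distribution) times a random direction."""
    size = draw_size(rng)
    return [size * u for u in sample_direction(chol, rng)]


def focal_share_variance(n, rho, draws, rng):
    """Variance across mutations of u_1^2, the focal trait's share of the squared effect."""
    chol = cholesky(equicorrelation_matrix(n, rho))
    shares = [sample_direction(chol, rng)[0] ** 2 for _ in range(draws)]
    return statistics.pvariance(shares)
```

For example, with $n = 10$ traits and $\rho = 0$, $u_1^2$ follows a Beta(1/2, 9/2) distribution, so `focal_share_variance(10, 0.0, 5000, random.Random(1))` is roughly 0.015; with $\rho = 0.999$ the value is much smaller.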

The power to detect loci in GWAS
In this section, we summarize the results that we rely on in connecting our theoretical results with the observations in GWAS (see Discussion in main text).
These results provide a first approximation to the power to detect loci in GWAS in re-sequencing and genotyping studies. They neglect some potential complications, which lie beyond the scope of this study (e.g., synthetic associations (22,23)).
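For a locus with minor allele frequency $x$ and true effect size $a$, the estimated effect size in a GWAS is approximately normally distributed, $\hat{a} \sim N\left(a, \frac{V_P}{2mx(1-x)}\right)$, where $x$ is assumed to be estimated without error (because of the large study sizes), $V_P$ is the total phenotypic variance, and $m$ is the study size (which in reality may be an effective size reflecting study design, e.g., when the sample was split into discovery and validation panels) (22).
Under the null hypothesis, the effect size is 0, meaning that $\hat{a} \sim N\left(0, \frac{V_P}{2mx(1-x)}\right)$. The power to identify a locus as significant with p-value $p^*$ is the probability that the estimated contribution of the locus to variance, $\hat{v} = 2\hat{a}^2x(1-x)$, is large enough to be significant under the null. This condition can be translated into a threshold contribution to variance $v^*$, such that loci with $\hat{v} > v^*$ are considered significant, i.e., $\Pr_{\text{null}}(\hat{v} > v^*) = p^*$, with $v^*$ given by
$$\frac{v^*}{2V_P/m} = \left[\operatorname{erf}^{-1}(1-p^*)\right]^2, \quad \text{(S84)}$$
where erf denotes the error function. The power to identify a locus with true contribution $v = 2a^2x(1-x)$ as significant is then $\Pr(\hat{v} > v^*)$, given by
$$H(v, p^*) = H_+(v, p^*) + H_-(v, p^*), \quad \text{(S86)}$$
with
$$H_\pm(v, p^*) = \frac{1}{2}\left[1 \pm \operatorname{erf}\left(\sqrt{\frac{v}{2V_P/m}} \mp \operatorname{erf}^{-1}(1-p^*)\right)\right],$$
where the two terms correspond to the estimated and true effect sizes having the same or opposite sign.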
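Eqs. S84 and S86 translate directly into code. The sketch below (Python standard library only; the function names are ours) expresses all contributions to variance in units of $V_P/m$, which is possible because power depends on $v$ only through $v/(V_P/m)$. The inverse error function is obtained from the standard normal quantile via $\operatorname{erf}^{-1}(y) = \Phi^{-1}\big((1+y)/2\big)/\sqrt{2}$.

```python
import math
from statistics import NormalDist


def erfinv(y):
    """Inverse error function via the standard normal quantile (valid for -1 < y < 1)."""
    return NormalDist().inv_cdf((1.0 + y) / 2.0) / math.sqrt(2.0)


def threshold_variance(p_star):
    """Eq. S84: v* / (2 V_P/m) = [erf^{-1}(1 - p*)]^2; result in units of V_P/m."""
    return 2.0 * erfinv(1.0 - p_star) ** 2


def power(v, p_star):
    """Eq. S86: H(v, p*) = H_+ + H_-, for a true contribution v in units of V_P/m."""
    b = erfinv(1.0 - p_star)
    s = math.sqrt(v / 2.0)                   # sqrt(v / (2 V_P/m)) in these units
    h_plus = 0.5 * (1.0 + math.erf(s - b))   # estimate has the same sign as the true effect
    h_minus = 0.5 * (1.0 - math.erf(s + b))  # estimate has the opposite sign
    return h_plus + h_minus
```

With the genome-wide significance level $p^* = 5 \cdot 10^{-8}$, `threshold_variance(5e-8)` is about 29.7, i.e., $v^* \approx 29.7\,V_P/m$, and `power(0.0, 5e-8)` recovers the false-positive rate $p^*$.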
The form of the power function carries important implications (Eq. S86 and Fig. S8).
Notably, it shows that (in this approximation) power depends only on the contribution of a locus to variance, and this contribution should be measured relative to, or in units of, $V_P/m$. This scale makes intuitive sense, because the total phenotypic variance generates the background noise for detecting any individual locus, and this noise decreases in inverse proportion to the study size.
In particular, the threshold contribution to variance $v^*$, as defined above, is proportional to $V_P/m$ and is also, to an excellent approximation, the contribution to variance at which power is 50%, i.e., $H(v^*, p^*) \approx 1/2$.
The power function can then be approximated by a step function (see Fig. S8),
$$H(v, p^*) \approx \begin{cases} 1, & v > v^* \\ 0, & v \le v^* \end{cases} \quad \text{(S88)}$$
This will be a good approximation when the number of loci that fall in the intermediate range (e.g., with power between 0.1 and 0.9) is negligible compared to the number that fall outside this range.
Figure S8. The power to detect loci as a function of their contribution to genetic variance (given in units of $V_P/m$). Shown are the exact power function (Eq. S86) and its step-function approximation (Eq. S88) for $p^* = 5 \cdot 10^{-8}$.
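Whether the step approximation is adequate depends on how many loci fall where power is intermediate. The sketch below (our own helper names; contributions again in units of $V_P/m$) locates the contributions at which power equals 0.1 and 0.9 by bisection, using the fact that $H$ increases with $v$, and counts the share of a user-supplied list of contributions that falls between them.

```python
import math
from statistics import NormalDist


def erfinv(y):
    """Inverse error function via the standard normal quantile (valid for -1 < y < 1)."""
    return NormalDist().inv_cdf((1.0 + y) / 2.0) / math.sqrt(2.0)


def power(v, p_star):
    """Exact power H(v, p*) of Eq. S86; v in units of V_P/m."""
    b = erfinv(1.0 - p_star)
    s = math.sqrt(v / 2.0)
    return 0.5 * (1.0 + math.erf(s - b)) + 0.5 * (1.0 - math.erf(s + b))


def step_power(v, p_star):
    """Step-function approximation (Eq. S88): power 1 above v*, 0 otherwise."""
    v_star = 2.0 * erfinv(1.0 - p_star) ** 2
    return 1.0 if v > v_star else 0.0


def variance_at_power(target, p_star, lo=0.0, hi=1e4, iters=200):
    """Bisection for the v at which H(v, p*) = target (H increases with v)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if power(mid, p_star) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def intermediate_fraction(contributions, p_star, low=0.1, high=0.9):
    """Share of loci whose power lies between low and high, where the step approximation is poorest."""
    a = variance_at_power(low, p_star)
    b = variance_at_power(high, p_star)
    return sum(a < v < b for v in contributions) / len(contributions)
```

At $p^* = 5 \cdot 10^{-8}$, power rises from 0.1 to 0.9 between roughly 17 and 45 $V_P/m$, so the approximation works well when few loci have contributions in that window.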
Further insights come from considering this power function in conjunction with our theoretical results (Section S3). Notably, our results suggest that the first loci to be detected, those that contribute the most to variance, are weakly and strongly selected, and that their contributions to variance are on the scale of $v_s$. We therefore expect GWAS to begin to identify loci (and account for genetic variance) when the study size is such that $v^* \propto V_P/m$ is on the order of $v_s$, i.e., when $m \sim V_P/v_s$. We would further expect the rate at which new loci are identified (and variance is accounted for) to be similar for different traits when the variance is measured in units of $v_s$.

Genotyping
Most current GWAS rely on genotyping instead of re-sequencing, resulting in an additional loss of power (24). Specifically, these studies impute the alleles at loci that are not included in the genotyping platform (25), and the imputation becomes imprecise when the imputed alleles are rare (Fig. S9). If causal loci with rare alleles are included in GWAS, this imprecision leads to an underestimation of their effect sizes, resulting in a loss of power (24). For loci with MAF $x$ and effect size $a$, the expected estimate of the effect size would be reduced by a factor of $r(x)$, where $r^2(x)$ is the mean squared correlation between the imputed and real alleles (26), and the distribution of estimates can be approximated by the normal distribution given above for re-sequencing, with $a$ replaced by $r(x)\,a$. Employing the reasoning of the previous subsection, we can therefore approximate the power to detect a locus by $H(r^2(x)\,v, p^*)$, where $H$ is the power function defined in Eq. S86.
Figure S9. The precision of imputation decreases with MAF. Specifically, we show the mean correlation between imputed and real genotypes as a function of minor allele frequency, for a study using an Illumina 1M SNP array and the 1000 Genomes phase III as an imputation panel (based on Extended Fig. 9A in (27)). We approximate the effect on power by excluding loci with MAF < 1% and assuming that loci with greater MAFs are imputed correctly.
In practice, GWAS often include only loci with MAF above a threshold, which is chosen to ensure precise imputation. We therefore approximate the effect of genotyping on power by excluding loci below a threshold MAF and assuming that loci that exceed this threshold are imputed correctly.
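Both approximations to the effect of genotyping can be written as thin wrappers around the power function of Eq. S86. In the sketch below, `r2_of_maf` stands for an imputation-quality curve that a user would supply (e.g., digitized from a figure like Fig. S9); `toy_r2` is a purely hypothetical curve included only so the example runs, and all function names are ours.

```python
import math
from statistics import NormalDist


def power(v, p_star):
    """Exact power H(v, p*) of Eq. S86, with v in units of V_P/m."""
    b = NormalDist().inv_cdf(1.0 - p_star / 2.0) / math.sqrt(2.0)  # erf^{-1}(1 - p*)
    s = math.sqrt(v / 2.0)
    return 0.5 * (1.0 + math.erf(s - b)) + 0.5 * (1.0 - math.erf(s + b))


def power_imputed(v, maf, p_star, r2_of_maf):
    """Power when imputation shrinks the effect estimate by r(x): H(r^2(x) v, p*)."""
    return power(r2_of_maf(maf) * v, p_star)


def power_with_maf_cutoff(v, maf, p_star, maf_cutoff=0.01):
    """Simpler approximation: loci below the MAF cutoff are excluded (power 0);
    loci above it are treated as perfectly imputed."""
    return power(v, p_star) if maf >= maf_cutoff else 0.0


def toy_r2(maf):
    """HYPOTHETICAL imputation-quality curve, for illustration only (not the curve in Fig. S9)."""
    return min(1.0, maf / 0.05)
```

The cutoff version trades the detailed $r^2(x)$ curve for a single parameter, which is why it is convenient when the GWAS itself applies a MAF filter.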

Inference
In this section, we describe how we used our model to make inferences based on GWAS results for height and body mass index (BMI). As we note in the Discussion, these inferences are meant as an illustration and do not incorporate the effects of demography and a few other factors (e.g., genotyping and errors in the estimation of effect sizes (22,24)), which lie beyond the scope of this study.

The composite likelihood
Our inferences are based on a composite-likelihood approach. We begin by describing the composite-likelihood function and its maximization when the loci detected by GWAS are strongly selected and can be described by the high-pleiotropy limit. In this case, we have shown that the distribution of variance among loci is insensitive to the distribution of selection coefficients, depends on a single parameter, $v_s$, and is well approximated by the probability density in Eq. S90 (Section S3.2). Further approximating the power in GWAS as a step function (cf. Section S6), we find that the probability density of sites that exceed a threshold $v^*$ can be approximated by this density truncated at $v^*$ and renormalized; we estimate $v_s$ by maximizing the resulting composite likelihood over the detected loci. We also consider the models without pleiotropy and in which the degree of pleiotropy is a parameter. In the case without pleiotropy, the density of variance is given by Eq. S94 (cf. Section S3.2). By following the same steps, we find that the composite likelihood is then maximized at a value of $v_s$ that depends on the data through $v^*$ and $\bar{v} \equiv \frac{1}{K}\sum_{i=1}^{K} v_i$, the mean contribution to variance of the $K$ loci above the threshold. When the degree of pleiotropy is a parameter of the model, the corresponding expression is given in Eq. S98. In the latter case, we used numerical maximization to show that the composite-likelihood estimates for height and BMI converge to the high-pleiotropy limit.
Specifically, we maximized the composite-likelihood specifying an interval of [1,1000] for n, where for both traits the estimates converged to the upper limit of 1000. While numerical optimization does not allow us to specify an infinite interval, the likelihood function and maximal value for n=1000 are indistinguishable from those in the high-pleiotropy limit.
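To make the maximization concrete, the sketch below fits a one-parameter truncated density by composite likelihood, treating GWAS power as a step function at $v^*$. The density $\rho(v) \propto e^{-v/(2v_s)}/v$ is our own stand-in, chosen because its normalizing constant above $v^*$ is the exponential integral $E_1\big(v^*/(2v_s)\big)$; it should not be read as the exact form of Eq. S90, S94 or S98. Under this stand-in, the negative mean composite log-likelihood is, up to terms that do not depend on $v_s$, $\bar{v}/(2v_s) + \log E_1\big(v^*/(2v_s)\big)$. Because the stand-in is an exponential family in $1/(2v_s)$, this objective has a single minimum, so a golden-section search suffices. All function names are hypothetical.

```python
import math
import random


def e1(x, steps=2000):
    """Exponential integral E1(x) = int_0^inf exp(-x e^u) du for x > 0, by Simpson's rule."""
    upper = math.log1p(40.0 / x)  # here the integrand has fallen to exp(-40) times its value at 0
    h = upper / steps             # steps must be even for Simpson's rule
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.exp(-x * math.exp(k * h))
    return total * h / 3.0


def neg_mean_loglik(v_s, v_bar, v_star):
    """Negative mean composite log-likelihood (terms free of v_s dropped) for the ASSUMED
    stand-in density rho(v) = exp(-v/(2 v_s)) / (v * E1(v*/(2 v_s))), v > v*."""
    c = 2.0 * v_s
    return v_bar / c + math.log(e1(v_star / c))


def estimate_v_s(contributions, v_star, iters=100):
    """Composite-likelihood estimate of v_s by golden-section search over log(v_s)."""
    kept = [v for v in contributions if v > v_star]  # step-function power: only v > v* observed
    v_bar = sum(kept) / len(kept)
    a, b = math.log(v_star / 100.0), math.log(100.0 * max(kept))

    def f(t):
        return neg_mean_loglik(math.exp(t), v_bar, v_star)

    g = (math.sqrt(5.0) - 1.0) / 2.0
    c1, c2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(c1), f(c2)
    for _ in range(iters):
        if f1 < f2:  # minimum lies in [a, c2]
            b, c2, f2 = c2, c1, f1
            c1 = b - g * (b - a)
            f1 = f(c1)
        else:        # minimum lies in [c1, b]
            a, c1, f1 = c1, c2, f2
            c2 = a + g * (b - a)
            f2 = f(c2)
    return math.exp(0.5 * (a + b))


def simulate_contributions(v_s, v_star, k, rng):
    """Rejection sampler for the stand-in density: propose v* + Exp(mean 2 v_s), accept w.p. v*/v."""
    out = []
    while len(out) < k:
        v = v_star + rng.expovariate(1.0 / (2.0 * v_s))
        if rng.random() < v_star / v:
            out.append(v)
    return out
```

Simulating a few thousand contributions with `simulate_contributions(2.0, 1.0, 4000, random.Random(7))` and passing them to `estimate_v_s(..., 1.0)` should return a value close to 2.0, which is a quick check that the truncation and normalization are handled consistently.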

Determining $v^*$ and removing outliers
Our likelihood maximization requires us to specify the value of the threshold $v^*$. We choose this threshold based on the empirical distributions of the contributions to variance among genome-wide significant associations (Fig. S10A and B). Specifically, as the contributions to variance approach the lower boundary for discovery, we observe a decline in the density of loci. This is likely due to a gradual reduction in power and suggests that our approximation for power (as a step function) breaks down for these values of variance. We therefore choose thresholds that appear to be above this decline ($v^* = 1.4 \cdot 10^{-4}\,V_P$ for height and $v^* = 1.35 \cdot 10^{-4}\,V_P$ for BMI; Fig. S10A and B), resulting in the removal of 53 loci for height and 11 for BMI. We also examine how our estimates of $v_s$ depend on the choice of $v^*$, and find that they are much more sensitive to reducing the threshold than to increasing it; in fact, the estimates we obtain by increasing the threshold are within the confidence intervals of the estimate with the chosen thresholds (Fig. S10C and D). This analysis further supports our choice to exclude the loci with the lowest contributions to variance. For BMI, we also dropped the locus with the largest contribution to variance (near FTO), which appears to be an outlier (Fig. S10B) and has been suggested to be under balancing selection (28).
Figure S10. Our chosen thresholds are shown by the dashed vertical line (in all graphs). (c, d) Our estimates of $v_s$ as a function of the chosen threshold, for height (c) and BMI (d). When we increase the threshold, the estimates remain within the 95% CI of the estimate with our chosen threshold.
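The selection rules just described (a lower threshold on contributions to variance, plus removal of a single outlier for BMI) and the threshold-sensitivity check amount to a few lines of bookkeeping. The sketch below is illustrative: the function names and the `estimator(kept, v_star)` interface are hypothetical, and any estimator of $v_s$, such as a composite-likelihood fit, could be plugged in.

```python
def select_loci(contributions, v_star, n_outliers=0):
    """Keep loci with contribution >= v*, then drop the n_outliers largest (e.g., the FTO locus for BMI).
    Returns the kept contributions (sorted) and the number removed below the threshold."""
    ordered = sorted(contributions)
    kept = [v for v in ordered if v >= v_star]
    removed_below = len(ordered) - len(kept)
    if n_outliers:
        kept = kept[:-n_outliers]
    return kept, removed_below


def threshold_scan(contributions, thresholds, estimator):
    """Re-estimate the model parameter at each candidate threshold (cf. Fig. S10C and D)."""
    return {t: estimator(select_loci(contributions, t)[0], t) for t in thresholds}
```

Scanning thresholds upward and downward from the chosen value, and comparing the resulting estimates with the bootstrap CI, reproduces the kind of sensitivity check described above.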

Estimating target size and explained variance
We estimate the target size and the variance explained, both as a function of study size and in total, based on our estimates of $v_s$. The population-scaled mutational input per generation from strongly selected loci is estimated as in Eq. S41, and the corresponding target size is estimated from this input and the population-scaled mutation rate per site per generation, whose estimate is based on heterozygosity (27). The explained variance corresponding to a GWAS with study size $m$ is estimated by Eq. S101, i.e., from the expected contribution of loci whose contributions to variance exceed the corresponding threshold $v^*(m)$, where we approximate this threshold based on the study size, $m_0$, and threshold, $v^*_0$, in current GWAS by
$$v^*(m) = v^*_0 \cdot \frac{m_0}{m}. \quad \text{(S102)}$$
To estimate the total variance arising from strongly selected loci, we simply set the threshold in Eq. S101 to 0.
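Eq. S102 and the logic behind Eq. S101 can be illustrated with a stand-in for the distribution of variance. Assuming, only for this sketch, the density $\rho(v) \propto e^{-v/(2v_s)}/v$, the fraction of strongly selected variance carried by loci above a threshold has the closed form $e^{-v^*(m)/(2v_s)}$, and setting the threshold to 0 (the total) gives 1. Function names are hypothetical, and the numbers this produces are not the estimates reported for height or BMI.

```python
import math


def threshold_for_study_size(m, m0, v_star0):
    """Eq. S102: the significance threshold scales as V_P/m, so v*(m) = v*_0 * m0 / m."""
    return v_star0 * m0 / m


def explained_fraction(m, m0, v_star0, v_s):
    """Fraction of strongly selected variance carried by loci above v*(m), under the ASSUMED
    stand-in density rho(v) ∝ exp(-v / (2 v_s)) / v, for which v * rho(v) ∝ exp(-v / (2 v_s))."""
    return math.exp(-threshold_for_study_size(m, m0, v_star0) / (2.0 * v_s))


def study_size_for_fraction(target, m0, v_star0, v_s):
    """Invert explained_fraction: the study size at which a target fraction is explained."""
    return m0 * v_star0 / (2.0 * v_s * math.log(1.0 / target))
```

Under this stand-in, quadrupling the study size divides $v^*$ by four, so the explained fraction rises from $e^{-x}$ to $e^{-x/4}$ with $x = v^*_0/(2v_s)$.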

Estimating confidence intervals
We use a combination of non-parametric and parametric bootstrap to estimate confidence intervals (CI). We use non-parametric bootstrap to estimate the CI for the model parameters: specifically, we perform 10,000 iterations, in which we resample the loci identified by GWAS and repeat the estimation of $v_s$. We use parametric bootstrap to estimate the confidence intervals in Fig. 5A, which describes the explained variance as a function of threshold based on our model. To that end, we rely on our model with the point estimates of its parameters to generate 10,000 samples from GWAS with the specified threshold, and then calculate the total variance explained by these samples. We use a combination of non-parametric and parametric bootstrap to calculate the CI for model predictions, including the total variance, the explained variance, and the number of loci as a function of study size (Figs. 5B and S16). In this case, we generate 10,000 samples by: i) estimating $v_s$ based on a resampled set of GWAS loci (similar to the non-parametric procedure), and ii) using the estimated $v_s$ and the corresponding parameter estimates to generate a GWAS sample based on our model (similar to the parametric procedure); we then calculate the appropriate summary based on the latter samples. This two-stage procedure is intended to capture the uncertainty generated both by errors in estimating our model parameters and by the randomness of the loci detected in a GWAS.
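The resampling schemes can be sketched generically. Below, `nonparametric_ci` resamples loci with replacement and re-runs an arbitrary estimator, and `two_stage_samples` chains a non-parametric re-estimate with a parametric simulation, mirroring steps i) and ii). Percentile intervals are our assumption (the text does not specify how the CI is read off the bootstrap distribution), and the estimator and simulator are user-supplied placeholders with hypothetical names.

```python
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def nonparametric_ci(loci, estimator, iterations=10_000, level=0.95, rng=None):
    """Resample the detected loci with replacement, re-estimate, and take a percentile interval."""
    rng = rng or random.Random()
    estimates = sorted(
        estimator(rng.choices(loci, k=len(loci))) for _ in range(iterations)
    )
    alpha = (1.0 - level) / 2.0
    return percentile(estimates, alpha), percentile(estimates, 1.0 - alpha)


def two_stage_samples(loci, estimator, simulate, iterations=10_000, rng=None):
    """i) re-estimate from resampled loci (non-parametric); ii) simulate a GWAS sample from the
    model at that estimate (parametric). Summaries of the returned samples give prediction CIs."""
    rng = rng or random.Random()
    return [
        simulate(estimator(rng.choices(loci, k=len(loci))), rng)
        for _ in range(iterations)
    ]
```

Keeping the estimator and simulator as arguments lets the same two functions serve the parameter CIs and the prediction CIs (total variance, explained variance, number of loci).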

Testing goodness of fit
We use the Kolmogorov-Smirnov D-statistic (29,30) to test the goodness of fit of our models without pleiotropy and in the high-pleiotropy limit. Since our parameter estimates are inferred from the data that we are testing against, we cannot rely on the standard tables for the p-values. We therefore generate null distributions for the D-statistic using parametric bootstrap based on our models. Specifically: i) we generate a large number of samples of $K$ significant loci based on the model under consideration, with the corresponding estimate of $v_s$; ii) we infer $v_s$ based on each sample; and iii) we calculate the D-statistic between the distribution of variance for the $K$ loci in each sample and the corresponding theoretical distribution based on the $v_s$ inferred from that sample. The resulting distribution of D-statistics corresponds to our null hypothesis, i.e., that the loci detected in GWAS arose according to our model, and specifically to the way we calculate the D-statistic between the observed distribution of variance for the $K$ detected loci and the theoretical distribution that we inferred from these observations. We then calculate the D-statistic, $D_{\text{obs}}$, based on the real data and the corresponding theoretical distribution, and estimate the one-sided p-value by
$$p = \frac{\#\{\text{bootstrap samples with } D \geq D_{\text{obs}}\}}{\#\{\text{bootstrap samples}\}}. \quad \text{(S103)}$$
Note that, unlike in the common case, here the inability to reject the null indicates that the data are consistent with our model.
Figure S11. Q-Q plots comparing the distribution of variance among significant loci taken from the GWAS of height (10) and BMI (8) with the theoretical distributions inferred from these data, based on the models without pleiotropy (a) and in the high-pleiotropy limit (b). These plots show that the model assuming high pleiotropy cannot be rejected for either trait and fits these data much better than the model assuming no pleiotropy.
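The bootstrap test can be sketched with a generic fit/CDF/simulate interface. The exponential model below is only a placeholder that makes the sketch runnable; in the analysis described here, these three functions would instead come from the model without pleiotropy or from the high-pleiotropy limit. The p-value follows Eq. S103, counting bootstrap statistics at least as large as the observed one. All names are hypothetical.

```python
import math
import random


def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov D between the empirical CDF of data and a model CDF."""
    xs = sorted(data)
    k = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / k - f, f - (i - 1) / k)
    return d


def bootstrap_ks_pvalue(data, fit, make_cdf, simulate, replicates, rng):
    """Parametric bootstrap with re-estimation in every replicate, then the p-value of Eq. S103."""
    theta = fit(data)
    d_obs = ks_statistic(data, make_cdf(theta))
    k = len(data)
    exceed = 0
    for _ in range(replicates):
        sample = simulate(theta, k, rng)               # i) K loci from the fitted model
        theta_b = fit(sample)                          # ii) re-infer the parameter
        d_b = ks_statistic(sample, make_cdf(theta_b))  # iii) D against the re-fitted model
        exceed += d_b >= d_obs
    return d_obs, exceed / replicates


# Placeholder model for illustration only: exponentially distributed contributions.
def fit_exponential(xs):
    return sum(xs) / len(xs)  # maximum-likelihood mean


def exponential_cdf(mean):
    return lambda x: 1.0 - math.exp(-x / mean)


def simulate_exponential(mean, k, rng):
    return [rng.expovariate(1.0 / mean) for _ in range(k)]
```

Re-fitting the parameter inside every replicate is what makes the null distribution match the way $D_{\text{obs}}$ is computed, since the observed statistic is also measured against a distribution fitted to the same data.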
Calculated for a study size of ~43 $V_P/v_s$, corresponding to having 25% of strongly selected ($S \gg 1$) variance explained with re-sequencing. Note that this study size is ~2.5 times larger than the one required to explain 25% of strongly selected variance with pleiotropy (see Fig. 3).