The Genetic Architecture of a Complex Trait is more Sensitive to Genetic Model than Population Growth

The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider three different genetic models, two different population growth models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the genetic model, relating genotype to phenotype, has a qualitative effect on the genetic architecture of a complex trait. In particular, the variance component partitioning across the allele frequency spectrum and the power of statistical tests is more affected by the assumed genetic model than by population growth. Models with incomplete recessivity most closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies. Such models show little dominance variance, which is consistent with recent empirical estimates of heritability explained by typed markers. We highlight a particular model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations at the same gene partially fail to complement one another. Interestingly this gene-based model predicts considerable levels of unexplained variance associated with within locus epistasis. Our results suggest a need for improvement of statistical tools for region based genetic association and heritability estimation.


Introduction
Risk for complex diseases in humans, such as diabetes and hypertension, is highly heritable yet the causal DNA sequence variants responsible for that risk remain largely unknown.Genome-wide association studies (GWAS) have found many genetic markers associated with disease risk [1].However, follow-up studies have shown that these markers explain only a small portion of the total heritability for most traits [2,3].
There are many hypotheses which attempt to explain the 'missing heritability' problem [2][3][4][5].Genetic variance due to epistatic or gene-by-environment interactions is difficult to identify statistically because of, among other reasons, increased multiple hypothesis testing burden [6,7], and could artificially inflate estimates of broad-sense heritability [8].Well-tagged intermediate frequency variants may not reach genome-wide significance in an association study if they have smaller effect sizes [9,10].One appealing verbal hypothesis for this 'missing heritability' is that rare alleles of large effect are the causal factors underlying these traits and are difficult to detect [4,11,12].These hypotheses are not mutually exclusive, and it is probable that a combination of models will be needed to explain all heritable disease risk [13].
Further, the performance of statistical association methods has been shown to be sensitive to population genetic parameters [14,15].Thus, it is necessary to use theoretical disease-trait models in experimental design and the development of methods of analysis for genetic association studies.However, the field has neither a comprehensive theoretical framework for the simulation of disease-traits in populations nor a firm understanding of the relationship between model structure and predictions.Typically, specific parameters such as effect-size and frequency of causal mutations or degree of linkage disequilibrium are explored under a single modeling framework.
Many attempts to simulate GWAS datasets have been based on sampling from arbitrary statistical models of the relationships between allele frequencies and mutational effect sizes [16][17][18][19][20][21][22].These approaches make large simulated samples feasible to obtain, but they are not the outcome of an evolutionary process, so they may or may not represent a situation that occurs with any frequency in actual disease samples.An empirical approach, sampling haplotypes from known sequences [23], might provide more realism.Yet, it was demonstrated in [24] that such an approach fails to capture the spectrum of variation in a population subjected to recent population growth.
Population genetics approaches to modeling the genetic architecture of diseases are either based in a coalescent or forward simulation framework.In simulating the genetics of a complex trait under a coalescent framework the phenotypic effects of mutations are typically chosen based on hypothetical relationships between allele frequencies and effect sizes [25][26][27][28].Although, recent work has developed rigorous formulations of quantitative trait distributions under a neutral coalescent framework [29], those approaches have not been applied in the context of GWAS yet.Forward-in-time simulations, as in [24,[30][31][32][33][34][35][36], explicitly model the allele frequencies and effect sizes of mutations in a genomic region with neutral sites, selected sites and recombination.
Within the forward simulation framework, methods differ in their approach to assigning phenotypes to particular genotypes.One approach, based on the work of [37], involves the assumption that a site has an effect on fitness which is pleiotropically related to its effect on phenotype.In that model, by specifying a parameter ⌧ , the user establishes a degree of correlation between fitness and phenotype [15,[33][34][35].A related approach that builds off of much earlier work by [38], models trait and fitness effects as coming from a bivariate gamma distribution with a specified correlation parameter ⇢ [36].In both approaches the disease-trait itself is not a component of fitness and thus standing variation may be dominated by occasional large trait effect mutations with small fitness effects that can reach intermediate frequency.The extent to which this occurs is dependent on the degree of correlation.Furthermore, both approaches indicate that an intermediate degree of correlation between complex disease traits and fitness is most plausible [33,36,39](although [39] is a non-forward simulation based implementation of [37], a similar conclusion was reached.) Another set of studies were mixed in applying versions of explicit quantitative genetic models [24,31], or more arbitrary models designed to obtain certain desirable behaviors [30,32].As pointed out in [14], allowing the disease-trait to be a component of an individuals phenotype ensures that the distributions of phenotypic values and the frequency of underlying causal variants are shaped by selection.In reality, a particular variant may impact fitness both through the disease-trait and pleiotropically through other traits.However, if those other traits are also under stabilizing selection then their distribution and the frequency of underlying causal variants will be shaped by that process.At this time we do not have a model where each variant contributes a different effect to each of a subset of traits that each contribute to phenotype at a different stage of life history.The approach in [14] is similar to classical models [40][41][42][43][44][45] of selection on quantitative traits where phenotype is the sum of genetic and environmental components and is subjected to Gaussian stabilizing selection.A key difference between [14] and the classical quantitative trait models is that causal mutations are unconditionally deleterious and the gene action model involved gene-based recessivity,i.e,allelic non-complementation.
While the work in [14] presented a new genetic model, it was limited because only that model was explored under a single demographic scenario (constant sized population).To our knowledge, there has not been a joint analysis of the effect of genetic model and demography on the predicted outcomes of GWAS.Thus, we extend the approach of [14] by including a model of recent population expansion and a set of genetic models.As in [14], we simulate a 100 kilobase region of human genome, contributing to a complex disease phenotype and fitness.The region evolves forward in time subject to neutral and deleterious mutation, recombination, selection, and drift.We allow deleterious mutations to contribute to phenotype in one of three ways: additivity (co-dominance), multiplicative recessivity, or gene-based partial recessivity.The two former models are classical models and the latter is that of [14].The critical conceptual difference between recessive models is whether dominance is a property of a locus (nucleotide/SNP) in a gene or the gene overall.Mathematically this amounts to whether one performs the first operation across alleles at a site or across sites on a haplotype.For completely co-dominant models, this distinction is irrelevant, however for a model with arbitrary dominance one needs to be more specific.As an example, imagine a compound heterozygote for two biallelic loci, i.e. genotype Ab/aB.In the case of traditional multiplicative recessivity the compound heterozygote is wild type for both loci and therefore wild-type over all; this implies that these loci are in different genes (or independent functional units of the same gene) because the mutations are complementary.However, In the case of gene-based recessivity, neither haplotype is wild-type and so the individual is not wild-type; the failure of mutant alleles to complement defines these loci as being in the same gene [46].Here we show that different models make different predictions about GWAS results and these predictions have strong implications for future research on the genetics of complex traits.
The degree to which demographic history of a population influences the expected architecture of a complex trait remains unclear as various authors have made only partially overlapping conclusions on the subject [34,47,48].One common thread is that recent population expansion, under a genetic model where the disease-trait significantly contributes to fitness, will increase the importance of rare large effect variants.This hypothesis is well supported by empirical studies which suggest that the great majority of variation(>70%) in the human exome is due to rare mutations (minor allele frequencies (MAF) <0.5%) and that the observed patterns of nucleotide diversity are most consistent with a model of recent population expansion with many sites under weak purifying selection [49][50][51][52][53][54][55].These studies also indicate that rare variants are enriched at putatively deleterious sites, implicating a role for rare variants in disease.
The case for the importance of rare variation in the genetics of complex disease is strong.Yet, the construction of a theoretical model which simultaneously predicts the observed patterns of nucleotide diversity and the observed results of GWAS has remained elusive.It was noted by [56] and summarized in [4], that the MAF distribution of SNP markers that reach genome-wide significance in GWAS is not consistent with existing models in which rare variants drive the heritability of disease.Specifically, we do not observe an enrichment of low frequency to rare (MAF< 5%) significant hits in GWAS.These observations were made in comparison to simulations under an additive co-dominant genetic model.This incongruence between theory and empirical data is one major motivation for exploring other genetic models.Each model imposes a different strength of selection on causal sites of different magnitudes.The genetic model also influences the relationship between the distribution of phenotypic and genetic variation in the population.In addition, we were motivated to explore alternative approaches to simulating a gene region in which mutations are recessive.
We find that a model of gene-based recessivity in a recently expanded population reconciles the predicted MAF distribution of significant SNP's with the observed patterns from human GWAS.The gene-based recessive model under recent population expansion predicts that a substantial portion of heritability will be due to rare alleles of large effect; but, not as much as predicted by a co-dominant genetic model.In addition, this model is consistent with recent estimates of total heritability [57], and estimates of additive and dominance heritability [58].Lastly, we explore the power of several association tests to detect a causal gene region under each genetic and demographic model.We find significant heterogeneity in performance of burden tests [14,28,59] across models of the trait and demographic history.The behavior of these tests under our models provides us with insight as to the circumstances in which they are best suited and potentially the nature of the empirical associations which are revealed by each test.Overall, we demonstrate that there is a need to consider how predictions about the genetic architecture of complex traits are altered by changes in the structure of population genetic models.

Results and Discussion
To perform genetic association and heritability estimation studies in silico, we need to impose a trait onto simulated populations.By explicitly modeling a quantitative trait in a population genetic setting, we introduce assumptions about the molecular basis of a trait.Yet, we do not know how that molecular genetic basis influences population genetic signatures in the genome.To begin addressing this question, it was necessary to restrict ourselves to a small subset of molecular and evolutionary scenarios.We analyzed a set of approaches to modeling a single gene region experiencing recurrent unconditionally-deleterious mutation contributing to a quantitative trait subject to weak Gaussian stabilizing selection.Specifically, we studied three different genetic models and two different demographic models.The details of how the models are constructed are described in the Materials and Methods section.The genetic models are named the additive co-dominant (AC) model, multiplicative recessive (Mult.recessive; MR) model and the gene-based recessive (GBR) model.The MR model has a parameter, h, that controls the degree of recessivity; we call this model the complete MR (cMR) when h = 0 and the incomplete MR (iMR) when 0  h  1.The AC and MR models represent the traditional models used in simulations of population-genetic processes and the evolution of quantitative traits.Further, such models draw no distinction between a "mutation" and a "gene" (as discussed in [14]).The GBR is also a recessive model, but recessivity is at the level of a haplotype and is not an inherent property of individual mutations (see [14] for motivation of this model).Viewed in light of the traditional AC and MR models, the recessivity of a site in the GBR model is a function of the genetic background on which it is found.Based on several qualitative comparisons we find that the GBR model is approximated by iMR models with 0.1  h  0.25.However, no specific iMR model seems to match well in all aspects.The demographic models are that of a constant sized population (no growth) and rapid population expansion (growth).
The use of the MR model is inspired by Risch's work [60,61], linking a classic evolutionary model of multiple loci interacting multiplicatively [62,63] to the the genetic epidemiological parameter relative risk.
Risch's approach was pushed forward in an important study by Risch and Merikangas [64] which determined the power to detect casual risk variants as a function of allele frequency and relative risk under their model.
Pritchard extended Risch's model to consider a trait explicitly as a product of the evolutionary process [65].
Pritchard's work demonstrated that the equilibrium frequency distribution suggested an important role for rare deleterious mutations when a trait evolves in a constant sized, randomly mating population where mutations of constant effect sizes are recurrent at multiple loci.However, until now, an explicit exploration of the relationship between model structure and predicted outcomes of GWAS had not been done.

Heritability and genetic load under population growth
Before deeply exploring the predicted genetic architecture of a trait under each model, we looked at two key mean values: total genetic variance and load.The genetic variance underlying a trait is, in part, determined by the outcome of mutation-selection balance.Approximations for the expected genetic variance under models of stabilizing selection with Gaussian mutational effect sizes and additive gene action have been derived previously based on Kingman's "house-of-cards" model [43,45].According to those approximations, when the variance of mutational effect sizes is large compared to total genetic variance, the genetic variance will be dependent only on the mutation rate, µ, and the intensity of stabilizing selection, 1/V s .Here, µ refers to the total mutation rate in the "gene region" (per gamete, per generation), with mutations arising according to an infinitely-many sites scheme [66].In a diploid species, V G ⇡ 4µ d V s for an additive trait, and s for a recessive trait [45,67].By keeping V s and µ constant, we can modulate the broad sense ) by changing the environmental variance, V E = 2 E .These approximations are expected to hold for any well-behaved probability distribution of mutational effect sizes [68].Here, as in [14], we draw the effect sizes of causal mutation from an exponential distribution, modeling unconditionally deleterious mutations.As previously shown in [45], for a Gaussian distribution of effect sizes, and in [14] for an exponential distribution, heritability plateaus at the value expected under the HOC approximation when the mean effect size ( ) is large (Fig S1 ).
Previous work, under additive genetic models, on the impact of population growth on the genetic architecture of complex traits suggests that mean heritability is constant under growth [34,48].We confirm this in Fig S1 , showing that mean broad-sense heritability, Genetic burden (load), the average relative deviation from optimum fitness, of the population is also unaffected by recent population history under the AC model (Fig S3), as shown in [34,48].We find this same behavior under the GBR model, but not under the MR model.Under rapid population expansion, the load decreases slightly (at most 2%).As increases the load increases under the additive model in both demographic scenarios.Load is effectively constant over the range of under the GBR model.Increasing the heritability of the trait decreased the magnitude of the genetic load, but had no interaction with the effects of demography or increasing mean effect size.
From these results we can conclude that the AC and GBR models are fairly comparable with respect to total genetic variance and genetic load.Therefore, the remaining differences, which we highlight in later sections, between the AC and GBR model can be attributed to the fine scale composition of the genetic variance in the population.In other words, the AC and GBR model differ in how the genetic variance and load are accounted for, despite the total amounts being roughly equivalent.However, the MR model is qualitatively different in it's behavior under population growth.This makes comparison to the MR model somewhat difficult.However, there are important reasons to explore it further.Based on first principles, the application of the MR model in simulation of a single gene region is inappropriate.However, it would be appropriate for simulating each mutation as a variant of a distinct functional genomic unit.It is also the most analytically tractable model of recessivity in population genetics, and as such it is our best reference point for comparison to the GBR model.

Choice of genetic model effects key population genetic signatures
It is unlikely that there is a single correct model for all traits.Rather, we must determine how the structure of a model influences certain predetermined important measures.We find that modifying the function that determines how genotype contributes to phenotype, the genetic model, impacts the genetic load and the site frequency spectrum for risk variants.Mean genetic load decreases with degree of dominance (Fig S3).

Additive and dominance genetic variance in the population
The amount of narrow sense heritability, h 2 = (V A )/(V P ), explained by variants across the frequency spectrum is directly related to the effect sizes of those variants [71].Thus, this measure is an important predictor of statistical power of GWAS and should inform decisions about study design and analysis [72].
Empirically, there appears to be very little dominance variance underlying most quantitative traits [58].The results in [58] are relevant to our aims and any plausible model for quantitative trait variation should be consistent with their work.Thus, it is important that we explore additive and dominance variance across the allelic frequency spectrum in our models.We have a particular interest in the amount of additive variance, V A , that is due to rare alleles and how much of genetic variance, V G , is attributable to V A under different recessive models.Neither of these questions were addressed in [14].
We follow the approach of [48], by calculating the cumulative percent of V G explained by the additive effects of variants less than or equal to frequency x, (V A;qx )/(V G ).The product of this ratio and broad-sense heritability is an estimate of the narrow-sense heritability of the trait (h 2 = (V A )/(V P )).In addition we calculate the same distribution for dominance effects (V D;qx )/(V G ) using the orthogonal model of [58].Methods based on summing effect sizes [71] or the site frequency spectrum [48] would not apply to the GBR model, because the effect of a variant is not independent of other variants.Therefore, we resort to a regression based approach, where we regress the genotypes of the population onto the total genetic value as defined in our disease trait models (see Methods).For consistency, we applied the regression approach to all models.Overall, these distributions are substantially different across genetic models, demographic Here, increasing corresponds to stronger selection against causative mutations, due to their increased average effect size.As the strength of selection increases, we expect rare variants to make larger contributions to V G [48], which is exactly what we observe.Recent work by Zuk et al., takes a similar approach and relates the allele frequency distribution directly to design of studies for detecting the role of rare variants [47].However, our findings contrast with those of Zuk [47] and agree with those of Lohmueller [34], in that we predict that population expansion will substantially increase the heritability, or portion of genetic variance, that is due to rare variants.Our results under the AC model agree with those of genotypes (e.g., Ab/aB) [14].Therefore, the recessivity is at the level of the gene region while the typical approach to estimating V A and V D assigns effect sizes and dominance to individual mutations.Thus, compound heterozygosity, which is commonly observed for Mendelian diseases (see [14] and references therein) would be interpreted as variation due to interactions (epistasis) between risk variants.Importantly, the GBR model predicts that such interactions should be local, occurring amongst causal mutations in the same gene.These observations concerning the GBR model are consistent with the finding of [58] that there is little dominance variance underlying complex traits.These observations warrant future investigation, as the GBR model is reflective of the classic definition of a gene in which recessive mutations fail to complement.Because of allelic heterogeneity and the frequency of large effect mutations, it is unclear how to effectively reformulate approaches to estimating heritability to focus on haplotypes rather than sites.If we partitioned haplotypes into two classes, unaffected (A) and affected (a), we might see dominance between classes, but this exercise would not be particularly helpful because we have no way of doing so in practice.
The increase in the number of rare alleles due to population growth is a well established theoretical and empirical result [50,52,70,[75][76][77][78][79][80][81][82][83][84][85].The exact relationship between rare alleles [4,26,56,65,86], and the demographic and/or selective scenarios from which they arose [34,48,87], and the genetic architecture of common complex diseases in humans is an active area research.An important parameter dictating the relationships between demography, natural selection, and complex disease risk is the degree of correlation between a variant's effect on disease trait and it's effect on fitness [33,34,37,48].In our simulations, we do not impose an explicit degree of correlation between the phenotypic and fitness effects of a variant.Rather, this correlation is context dependent, varying according to the current genetic burden of the population, the genetic background in which the variant is present and random environmental noise.However, if we re-parameterized our model in terms of that of [37], then we would have 0 << ⌧ << 1, which is consistent with recent attempts at estimating that parameter [33,39].Our approach is reflective of weak selection acting directly on the complex disease phenotype, but the degree to which selection acts on genotype is an outcome

Estimating additive and dominance variance from population samples
Based on our results (Fig 1 ), we expected that approaches for estimating dominance variance that are implicitly based on a site-by-site model should not find significant dominance variance even when the underlying model is recessive at the level of the gene, in agreement with the results in [58].We hypothesized that the GBR model and intermediate iMR model would not necessarily show dominance variance when calculated using GREMLd.We also wished to explore, in a more general sense, the behavior of techniques for estimating heritability due to a genomic region on our simulated data.We tried three different approaches to estimating heritability from population samples.We tried GREML/GREMLd, minor allele frequency stratified(MS) GREML/GREMLd and MS Haseman-Elston (HE) regression.In all cases we used GCTA [88], to make genetic relatedness matrices (GRM) for both additive and dominance components.For MS estimates we stratified the additive and dominance GRM's into two bins M AF  0.01 and M AF > 0.01.HE-MS regression was carried out by regressing the off diagonal elements of each GRM onto the cross product of the scaled and centered phenotypes in a multiple regression setting using R [89].Unfortunately, due to the nature of our simulated data GREML/GREMLd-MS did not result in sufficiently reliable results.
Under GREML/GREMLd-MS, many replicates resulted in numerical errors in GCTA.These problems were present at a rate of less than 1/100 replicates using non-MS GREML/GREMLd, but were exacerbated by splitting the data into multiple GRMs.This is likely a result of some samples having one or more of the following problems: low numbers of SNPS, low total genetic variance, or extremely uneven distribution of SNP effect sizes.At large values of , especially under non-AC models, the realized heritability in sample often does not reflect the heritability in the population (Fig S17).Despite known bias induced in non-MS GREML/GREMLd due to differences in underlying allele frequency distribution between typed and causal SNPs, we present non-MS GREML/GREMLd and HE-MS results.
Earlier, we showed how genetic variance is distributed over allele frequency in our simulated populations.
Here we want to understand how methods for estimating total genetic variance using population samples perform on our simulated data.Recent analysis of a large pool of quantitative trait variation data suggests that dominance variance is rarely a significant contributor to total heritability [58].Our analysis of additive genetic variance explained as function of allele frequency suggested a significant presence of non-additive heritability in some recessive models (Fig 1 ); this observation called into question the relevance of our simulations to natural variation in quantitative traits.Other recent work has found that when genotyping is sufficiently thorough the typed markers account for almost all heritability for human height and body mass index [57].Importantly, [57] found that one must include nearly all variants to get accurate estimates and that significant portion of the genetic variance is due to rare alleles.We calculated GREMLd estimates of additive and dominance heritability in random population sample panels under the AC, MR (h = 0, 0.25) and GBR models.
In Fig S12 we show the GREMLd additive and dominance heritability estimates, as a percent of the true broad sense heritability, over .Models with high degree of dominance h = 0, 0.1, show a significant heritability due to dominance effects (Fig S12).Neither the GBR nor the iMR models with h 0.25 show significant dominance heritability.Filtering out rare alleles(M AF < 0.01), as was done in [58], can make the estimates more or less accurate depending on the model and parameters.When is low, filtering out low MAF variants makes the GREMLd estimates more accurate.If is high the, unfiltered estimates tend to be better.The difference between filtered and unfiltered estimates is exacerbated by population growth.In the cMR model(h = 0) filtering out rare alleles causes GREMLd to be much more accurate, because the unfiltered estimate has substantial upward bias.However, under the GBR and iMR models, filtering tends to create substantial downward bias.While filtering out rare alleles may be viewed as necessary given the noise present in many current genotype datasets, there may be a significant impact on the accuracy of the estimates.

As an alternative to GREML estimates, Fig S13-Fig S16 show that HE-MS estimates of additive and dominance heritability. For intermediate values of , HE-MS regression is an accurate estimate of broad-sense
heritability due to the analyzed genomic region in the population.When is small under the AC or GBR models there is a tendency to over-estimate total heritability.However, when is large there is a tendency to underestimate total heritability.Under the cMR model, the dominance component is massively overestimated and should be viewed as entirely unreliable.Although this issue is less pronounced under the iMR model, the dominance component tends to be negative and should also be viewed as unreliable.In general we find that intermediate iMR h = 0.25 and GBR models are not inconsistent with the findings of [58] and [57].However, we caution against the reliability of these results.Our results suggest that estimating heritability due to regions as small as 100Kb should be done with careful quality control of data and results.

The genetic model affects the outcomes of GWAS
Both demography and the model of gene action affect the degree to which rare variants contribute to the genetic architecture of a trait (Figure 1).However, the different mappings of genotype to phenotype from model to model make it difficult to predict a priori the outcomes of GWAS under each model.Further, in order for models based on explicit population-genetic principles to inform the design and/or the interpretation of GWAS studies, it is essential to understand if GWAS outcomes under different models are consistent with current GWAS results [4,14,56].Therefore, we sought to examine the performance of statistical methods for GWAS under each genetic and demographic model.Although rare alleles contribute relatively more to genetic variance under population growth (Fig 1 ), there is less power to detect associations under all models when populations have experienced recent, rapid growth (Fig 2A).The loss of power is attributable to a combination of rapid growth resulting in an excess of rare variants overall [50,52,70,[75][76][77][78][79][80][81][82][83][84][85], and the increasing strength of selection against causal variants in growing populations [48].While complete resequencing is more powerful than a gene-chip design, the relative power gained is modest under growth (Figure 2A).Region-based rare association tests behave similarly with respect to population growth (Fig 2B).
There are important differences in the behavior of the examined statistical methods across genetic models.
We focus first on the single marker tests (Fig 2A).For gene-chip strategies, power increases for "site-based" models as recessivity of risk variants increases (compare power for AC, iMR, and cMR models in Fig 2B).
This increase in power is due to the well-known fact that recessive risk mutations are shielded from selection when rare (due to being mostly present as heterozyogtes), thus reaching higher frequencies on average (Fig S5), and that the single-marker test is most powerful when risk variants are common [90].Further, for the complete multiplicative-recessive model (cMR), the majority of V G is due to common variants (Fig 1 ), explaining why resequencing does not increase power for this model (Fig 2A).
The GBR model predicts large gains in power under re-sequencing for intermediate (the mean trait-effect size of newly arising causal mutations), similar to the AC or iMR model.But, when is larger these gains are significantly diminished.The GBR model is effectively a model of complete recessivity when is high because the average number of mutations per gamete (Fig S6) is much less than 0.5.This means that an average diploid is likely to have at most one causal mutation.Under the GBR model one completely wild-type gamete results in a wild-type phenotype.However, the GBR does not show a pattern of genetic variance in the population consistent with either cMR or iMR models (1).While the behavior of the number of mutations per gamete is consistent across models (Fig S6 ), only the GBR model demonstrates this effect reflecting a qualitative difference between the GBR and other models.Surely, the overall shift toward rare alleles at larger values of also contributes to the decrease in power, as we see such a decrease under the AC model.However, if the drop in power were only as a result of frequency, then we would expect the decrease to be largest in the AC model.Yet, the largest drop in power as a function of is seen in the GBR model.
We can conclude that there are significant non-linearities in the observed power of logistic regression as a function of , model, and demography.
Region-based rare variant association tests show many of the same patterns across genetic model and effect size distribution as single marker tests, but there are some interesting differences.The ESM test is the most powerful method tested for the AC, iMR, and GBR models (Fig 2B ), with the c-Alpha test as a close second in some cases.For those models, the power of naive SKAT, linear kernel SKAT and SKAT-O, is always lower than the ESM and c-Alpha tests.This is peculiar since the c-Alpha test statistic is the same as the linear kernel SKAT test.The major difference between SKAT and ESM/c-Alpha is in the evaluation of statistical significance.SKAT uses an analytical approach to determining p-values while the ESM/c-Alpha tests use an explicit permutation approach.This implies that using permutation based p-values results in greater power.Yet, under the cMR model the linear kernel SKAT is the most powerful, followed by c-alpha.
The cMR model does not predict a significant burden of rare alleles and so the default beta weights of SKAT are not appropriate, and the linear kernel is superior.The ESM test does poorly on this model because there are not many marginally significant low-frequency markers.It is logical to think that these tests would all perform better if all variants were included.The massive heterogeneity in the performance of region-based rare variant tests across models strongly suggests that multiple methods should be used when prior knowledge of underlying parameters is not available.In agreement with [24,34], we predict that population growth reduces the power to associate variants in a causal gene region with disease status (Fig 2 ) when the disease also impacts evolutionary fitness.We also show that there is potential for more powerful burden tests, such as the ESM test, to be implemented in practice.

The distribution of the allele frequencies of GWAS hits
The distribution of the minor allele frequency of the most statistically significant single marker in a region is an interesting statistical signature of GWAS.This signature captures information regarding the underlying effect sizes and frequencies of typed variants, and statistical power.We investigate this by pooling model replicates across values of to reflect a uniform prior on this parameter (Fig 3).
It was noted by [4,56], that an excess of rare significant hits, relative to empirical data, is predicted by AC models where the effects of mutations contribute directly to fitness and the disease trait.We confirm that AC models are inconsistent with the empirical data (Fig 3).The empirical data represents (Fig 3) a pooled data set with the same diseases and quality filters as in [56], but updated to include more recent data.
Close to half of the data comes from GWAS studies uploaded to the NHGRI database after 2011, yet the same qualitative pattern is observed.This contradicts the hypothesis that the initial observation was simply due to the relatively small sample sizes and low marker density in early GWAS.This observation is a robust and important one to consider when evaluating the parsimony of a theoretical model with empirical results.
The GBR model agrees extremely well with empirical data, it predicts few rare significant hits and an approximately uniform distribution across the remainder of MAF domain ( In consideration of the rare allele of large effect hypothesis, [26] proposed a models where multiple rare alleles dominate disease risk and create synthetic associations with common SNPS.However, shortly after it was shown that this particular model was inconsistent with GWAS theoretically and empirically [4,56,91].
This inconsistency remained un-reconciled in the literature until now.We find that the MAF distribution of significant hits in a GWAS varies widely with choice of genetic model.In particular, we confirm the results of Wray et al., that AC evolutionary models predict an excess of low frequency significant hits.Also, the cMR model predicts an excess of intermediate and common significant hits.Utilizing a GBR model or an intermediate iMR model with h = 0.25 0.5, reconciles this inconsistency by simultaneously predicting the importance of rare alleles of large effect and the correct allele frequency distribution among statistically significant single markers.

Conclusion
The genetic model may be more important than the demographic model with respect to designing association studies.While changes in population size do affect the relationship between effect size and mutation frequency [50, 52,  From an empirical perspective, our findings suggest that re-sequencing approaches in large samples is likely the best way forward in the face of the allelic heterogeneity imposed by the presence of rare alleles of large effect.Re-sequencing of candidate genes [92][93][94][95] and exomes [54,55,[96][97][98][99][100]  Here, the total heritability due to the simulated locus is only either 8% or 4% with the remaining phenotypic variation due to the un-simulated portions of the genome, and/or the environment.Passing the trait value through the Gaussian function to determine fitness has the effect of decreasing the strength of selection on variants which occur on predominantly unaffected genetic backgrounds, and only creates strong selection against variants which are on genetic backgrounds with multiple mutations.Thus, selection is inherently frequency-dependent [101,102].This model may also apply to traits with much higher heritability due to multiple unlinked loci with effects similar to that of our simulated region.
While a complete exploration of the impact of arbitrary dominance on the predictions of a model with site-based recessivity remains to be done, we argue that it is unnecessary in this context.The use of multi-locus evolutionary models in simulating single nucleotide variants in a gene region is an incorrect application of the theory.Each locus in a multi-locus model is best suited to represent a gene, not a polymorphic site within a gene [103].This distinction is critical as the alleles in these classic evolutionary models represent a class of haplotypes which have similar (idealized as identical) contributions to fitness.
Like those models, our GBR model adheres to the classical definition of a gene in which mutant recessive alleles fail to complement [46].It is therefore compelling to observe that a simple change in perspective, a reversion to a classical perspective, in the modeling of a trait appears to reconcile an inconsistency between theory and experiment (compare Fig 3 to the results of [4,56]).

Forward simulation
Using the fwdpp template library v0.2.8 [104], we implemented a forward in time individual-based simulation of a Wright-Fisher population with mutation under the infinitely many sites model [66], recombination, and selection occurring each generation.We simulated populations of size N = 2e4 individuals for a time of 8N generations with a neutral mutation rate of µ = 0.00125 per gamete per generation and a per diploid per generation recombination rate of r = 0.00125.Deleterious mutations occurred at a rate of µ d = 0.1µ per gamete per generation.These parameters correspond to ✓ = 4Nµ = ⇢ = 4Nr = 100 and thus our simulation approximates a 100Kb region of the human genome.Population growth has been suggested to change the architecture of complex traits while keeping genetic load and genetic variance constant [34,48].Therefore, it was important for us to explore how the population growth impacted traits evolving under different genetic models.For simulations of population growth we extended the constant population simulations for 500 generations during which the population size grows exponentially from N i = 2e4 to N final = 1e6.This demographic model is much simpler than current models fit to empirical data [83].However, this simple model allows us to more easily get a sense of the impact of population expansion [34,48].

Models of disease and fitness
We implemented three disease-trait models with the phenotypic form P = G + E. G is the genetic 1, where c ij is as above and h ij is the dominance coefficient.This model is referred to as the SBR model and is classified as either complete or incomplete.For example, when an individual has k i = {0, 1, 2} risk mutations at the i-th locus the complete site-based recessive (cSBR) model has h ij = 0, 0, 1 and a incomplete site-based recessive (iSBR) model might be h ij = 0, 0.25, 1.The gene-based partially recessive (GBR) model of [14] is implemented as In all cases, the phenotype's are subject to Gaussian stabilizing selection with an optimum of zero and standard deviation of s = 1 such that the fitness, w, of a diploid is proportional to e P 2 2 2 s .The effect sizes of a causative mutations were sampled from an exponential distribution with means of = 0.01, 0.025, 0.05, 0.1, 0.125, 0.25, or 0.5.250 simulation trials were performed for each parameter/model combination.The parameters chosen here reflect a trait under weak "real" stabilizing selection [105].

Exploring the gene region's contribution to heritability
Broad-sense heritability can be calculated directly from our simulated data as H 2 = V ar G V ar P .We explored broad-sense heritability as a function of mean causative effect size under each model.We used the house-of-cards (HOC) approximations for total genetic variance under stabilizing selection to predict broad-sense heritability [45,67].Using these predictions we analyzed models where H 2 ⇠ 0.04 or 0.08.The total locus heritability under a co-dominant model is expected to follow the HOC approximation in certain parameter regimes.The HOC approximations to the expected genetic variance for an additive and recessive s respectively [45,67].In our simulations, V s = 1, because s = 1, and we varied V e = 2 e to adjust the total heritability.

Determining the genetic load of the population
Genetic load is defined as the relative deviation in a populations fitness from the fitness optimum, L = (w max w)/(w max ).We set the phenotypic optimum to be zero; P opt = 0.When determining fitness for the SBR models, we subtract one from all phenotypes.This implies that w max = e s .We also used the mean number of mutations per individual, and the mean frequency and effect sizes of segregating risk variants as proxies for the genetic load.

Additive and dominance genetic variance over allele frequency
We used an approach based sequential (type-1) regression sums of squares to estimate the contribution of the additive and dominance effects of variants to the total genetic variation due to a locus.Given a genotype matrix (rows are individuals and columns are risk variants) of (0,1, or 2) copies of a risk allele (e.g.all mutations affecting phenotype), we sort the columns by decreasing risk mutation frequency.Then, within frequency classes, columns were sorted by decreasing effect sizes.For each variant a dominance component was also coded as 0, 2p, or 4p-2 according to the orthogonal model of [58], where p is the frequency of the variant in the population.We then used the R package biglm [106] to regress the individual genetic values (G in the previous section) onto this matrix.The variance explained by the additive and dominance effects of the m markers with p  x is then approximately r . Averaging results across replicates, this procedure results in a Monte-Carlo estimate of the fraction of V G that is due to additive and dominance effects of variants with population frequency less than or equal to x is (V A;px + V D;px )/(V G;p1 ) [48].This fraction can be easily partitioned into strictly additive and dominance components.
In order to mimic ascertained SNP data, we sampled markers from our case/control panels according to their minor allele frequencies [69], as done in [14] .The minor allele frequency of the most significant marker with a single-marker score log 10 (p) 8, for all replicates where significant markers were present.models where H 2 ⇠ 0.08, although it was confirmed (data not shown) that these particular results are independent of total H 2 .The additive and dominance genetic variance is estimated by the adjusted r 2 of the regression of all markers (and their corresponding dominance encoding ) with q  x onto total genotypic value (see methods for details); data are displayed as the mean of 250 simulation replicates.The vertical dotted and dashed lines correspond to the q = 0.01 and q = 0.05, respectively.The curves under no growth appear to be truncated with respect to rapid growth because the range of the x-axis differs between growth and no growth (minimum q = 1/2N ).
initially increases as , the mean effect of a new deleterious mutation, increases and then plateaus to approximately the same level as models with constant population size.This general trend is observed under each genetic model, but the MR model is qualitatively different in its behavior under population growth.The MR model predicts a broad sense heritability under growth of about 90% of constant sized population levels when = 0.01, and 50% when = 0.5 (Fig S2).
Recessive deleterious mutations segregate to higher frequencies in the population without increasing genetic load (FigS5).The MR model demonstrates a slight decrease in load under growth (FigS3).Similarly, the additive model has greater skew and kurtosis for both the number of mutations and the genetic value of a gamete over the range of and demographic models(Fig S10 and Fig S11).The increase skew and kurtosis implies that total genetic load in the additive models is dominated by rare large excursions from the population mean.Population expansion has been shown to impact the site frequency spectrum[50,52,69,70].In general, we expect to find an increase in rare private mutations under a rapid population expansion scenario.We find all of our models showcase the expected pattern (Fig S7), but there are consistent and important differences between models.In Fig S7, the site frequency spectrum of risk variants from a population sample (n=100) shows a dependency on population growth, mean effect size , and genetic model.Population growth increases the proportion of singletons in for all genetic models and values of .Increasing the value increases the proportion of singletons in each genetic model and demographic scenario, but the increase is qualitatively dependent on genetic model and independent of demography.The recessive models show the strongest dependence on .When is small the recessive models show fewer singletons as compared to the additive model, but as increases the relative proportion of singletons between recessive and additive genetic model increases.When the value of is large, the recessive models show more singletons than the additive model.The GBR model shows more singletons than the MR model in all cases.Fig S8 shows the site frequency spectrum for non-risk variants, which demonstrates a dependence on population growth, shifting towards low-frequency sites, but shows no dependence on genetic model or .Since the neutral variant site frequency spectrum is consistent across models we determine that the difference in linked selection between models is not important.
scenarios and model parameters (Fig 1).Under the AC model, all of V G is explained by additive effects if all variants are included in the calculation; in Fig 1 the solid variance curves reach unity in the AC panel.Low frequency and rare variants (q < 0.01) explain a large portion of narrow sense heritability, 26% -95%, even in models without rapid population expansion.Fig S4 demonstrates that, for the AC model, the variance explained at any given frequency threshold increases asymptotically to unity as a function of increasing .While the total heritability of a trait in the population is generally insensitive to population size changes (Fig S1, see also [34, 48, 73]), rapid population growth increases the fraction of additive genetic variation due to rare alleles (Fig 1).
Simon's et al. in that we find that increasing strength of selection, increasing in our work, increases the contribution to heritability of rare variants[48].However, under the GBR model the distribution of variance over allele frequency as function is non-linear (Fig1and Fig S4).For all recessive models, V A < V G when including all mutations (Fig 1).For the MR models, all additional genetic variation is explained by the dominance component; in Fig 1 the dotted variance curves reach unity in the MR panels.As expected, genetic variation under the MR model with partial recessivity (h = 0.25) is primarily additive [71, 74], whereas V G under the cMR model (h = 0) is primarily due to dominance.The MR model differ from the third recessive model, the GBR model.This latter model shows little dominance variance and is the only model considered here for which the total V G explained by V A + V D is less than the true V G for all .This can be clearly seen in Fig 1 because the variance curves do not reach unity in the GBR panel.Under the GBR model, large trait values are usually due to compound heterozygote of the model.While the recent demographic history has little effect on key mean values such has broad-sense heritability of a trait or population genetic burden (Fig S1 and Fig S3), the structure of the individual components of the population which add up to those mean values varies considerably.The specific predictions with respect to the composition of the populations varies drastically across different modeling approaches.It is therefore necessary to carefully consider the meaning of a genetic model in a simulation study.
We assessed the power of a single marker logistic regression to detect the gene region by calculating the proportion of model replicates in which at least one variant reached genome wide significance at ↵  10 8 (Fig 2A).The basic logistic regression is equivalent to testing for association under the AC model and simulated both perfect "genotyping chip" (all markers with M AF 0.05) and complete re-sequencing including all markers (Fig 2B).

Fig 3 )
. The more uniform distribution of significant single markers seen under the GBR is consistent with the flatter distribution of genetic variance (Fig1).The cMR model shows an excess of intermediate frequency variants.If one considers trying to determine an approximate dominance coefficient in the GBR model, it would be found that there is a distribution of coefficients across sites.Yet, when simulating iMR model, we find that an intermediate degree of dominance, h = 0.25 0.5 results in distribution of significant hits which agrees well with empirical data as well(Fig 3 and Fig S9).In future studies, with more extensive simulation and careful pruning of empirical data, it might be possible to use the distribution of significant hits as part of a more rigorous model-based parameter estimation procedure.
(Figs 1 and Fig S5), different mappings of genotype to trait value do this in radically different ways for the same demographic history (Fig 1).Further, the fact that models with strong recessive components are more compatible with GWAS results (Fig 3) implies that the development of methods with explicit consideration of gene-based dominance or recessivity might be an important next step.
in case-control panels have observed an abundance of rare variants associated with case status.Here we show that under a model of mutation-selection balance at genes, neither current single-marker nor popular multi-marker tests are especially powerful at detecting large genomic regions harboring multiple risk variants (Fig2).The extent to which the conclusions of this work can be extended to exome resequencing studies is unclear.Exons likely have a large fraction of sites mutable to, and larger effect size at, causative alleles than a large genomic region, and physical clustering of causal mutations may be more block-like in exome resequencing datasets given exon/intron structure.It will be of value in future work to more explicitly explore the properties of exome resequencing case-control studies under a gene-based mutation-selection balance model.
component, and E = N (0, e ) is the environmental noise expressed as a Gaussian random variable with mean 0 and standard deviation e = 0.075.The additive co-dominant (AC) genetic model is G = is the effect of the allele at the i-th site on the j-th haplotype, and is effectively the sum of the effect sizes of all mutations in a diploid.The classic population genetic multiplicative site-based recessive viability model is implemented as G

P 2 opt 2 2 s
= 1 and that load is a simple function of the phenotypes of the population, L = 1 e

Fig 1 .
Fig 1. Variance explained over allele frequency.The cumulative additive and dominance genetic variance which can be explained by markers whose frequencies, q, are  x.Each color represents a different value of : the mean effects size of a new deleterious mutation.Shown here are the gene-based (GBR), additive co-dominant (AC), incomplete multiplicative recessive (Mult.recessive (h = 0.25); iMR) and complete multiplicative recessive (Mult.recessive (h = 0);cMR) models.Solid lines show the additive variance alone and dotted lines show the combined additive and dominance variance.All data shown are for

Fig 2 .Fig 3 .
Fig 2. Power of association tests.(A) The power of a single marker logistic regression, at significance threshold of ↵  10 8 , as a function of : the mean effect size of a new deleterious mutation.For single marker tests we define power as the number of simulation replicates in which any single marker reaches genome wide significance.Two study designs were emulated.For the gene chip design only markers with M AF > 0.05 were considered and all markers were considered for the resequencing design.Genetic models shown here are the additive co-dominant (AC), gene-based (GBR), complete multiplicative site-based recessive (Mult.recessive (h = 0); cMR) and incomplete multiplicative site-based recessive models (Mult.recessive(h = 0.25); iMR) (B) The power of region-based rare variant to detect association with the simulated causal gene region at significance threshold of ↵  10 6 .For region-based tests, we define power as the percent of simulation replicates in which the p-value of the test was less than ↵.The p-values for the ESM, c-Alpha were evaluated using 2 ⇥ 10 6 permutations.SKAT p-values were determined by the SKAT R package and represent numerical approximations to the presumed analytical p-value.