The authors have declared that no competing interests exist.

Conceived and designed the experiments: KRT ADL. Performed the experiments: KRT AJF. Analyzed the data: KRT AJF ADL. Contributed reagents/materials/analysis tools: KRT. Wrote the paper: KRT ADL.

Current genome-wide association studies (GWAS) have high power to detect intermediate frequency SNPs making modest contributions to complex disease, but they are underpowered to detect rare alleles of large effect (RALE). This has led to speculation that the bulk of variation for most complex diseases is due to RALE. One concern with existing models of RALE is that they do not make explicit assumptions about the evolution of a phenotype and its molecular basis. Rather, much of the existing literature relies on arbitrary mapping of phenotypes onto genotypes obtained either from standard population-genetic simulation tools or from non-genetic models. We introduce a novel simulation of a 100-kilobase gene region, based on the standard definition of a gene, in which mutations are unconditionally deleterious, are continuously arising, have partially recessive and non-complementing effects on phenotype (analogous to what is widely observed for most Mendelian disorders), and are interspersed with neutral markers that can be genotyped. Genes evolving according to this model exhibit a characteristic GWAS signature consisting of an excess of marginally significant markers. Existing tests for an excess burden of rare alleles in cases have low power while a simple new statistic has high power to identify disease genes evolving under our model. The structure of linkage disequilibrium between causative mutations and significantly associated markers under our model differs fundamentally from that seen when rare causative markers are assumed to be neutral. Rather than tagging single haplotypes bearing a large number of rare causative alleles, we find that significant SNPs in a GWAS tend to tag single causative mutations of small effect relative to other mutations in the same gene. Our results emphasize the importance of evaluating the power to detect associations under models that are genetically and evolutionarily motivated.

Current GWA studies typically explain only a small fraction of heritable variation in complex traits, resulting in speculation that a large fraction of variation in such traits may be due to rare alleles of large effect (RALE). The most parsimonious evolutionary mechanism producing an inverse relationship between the frequency and effect size of causative alleles is an equilibrium between newly arising deleterious mutations and selection eliminating those mutations. This assumption is not built into many current models of RALE and, as a result, power calculations may be misleading. We use forward population genetic simulations to explore the ability of GWAS to detect genes in which unconditionally deleterious, partially recessive mutations arise each generation. Our model is based on the standard definition of a gene as a region within which loss-of-function mutations fail to complement, consistent with the multi-allelic basis for Mendelian disorders. It predicts that it may not be uncommon for single genes evolving in this way to contribute upwards of 5% to variation in a complex trait, and that such genes could be routinely detected via modified GWAS approaches.

Genome-wide association studies (GWAS) genotype upwards of 500,000 common SNPs and test for allele frequency differences in case/control panels consisting of several thousand individuals. Such studies have identified highly significant and replicable associations, and as a result have uncovered entirely new pathways contributing to complex disease risk (

A weakness of the RALE model for complex disease variation is that it is not a population-genetic model, but rather an easy-to-understand verbal model. As a result, it does not make quantitative predictions concerning the nature of genetic variation at the genes underlying complex disease, either in terms of the number of causative alleles, their frequencies and effects, or in terms of the patterns of linkage disequilibrium (LD) between causative alleles and linked neutral markers. Ideally, the predictions of various RALE models would come from explicit population-genetic models of disease, with concrete assumptions about the fitness effects of causative mutations and the relationship between phenotype and fitness determining the frequency of causative mutations in the population. To date, Pritchard's

Pritchard's

Here, we propose an explicit model of a quantitative trait subject to natural selection, with case/control status treated as a liability trait. Our model is similar to that of Pritchard

Our model results in gene regions evolving under the “allelic heterogeneity” model, in which many non-complementing risk mutations segregate within a gene region. Since the 1990s, many human geneticists have believed that this model was likely to explain complex variation

Under our gene-based model, the effect size of a causative mutation is exponentially-distributed with mean λ (λ = 0 implies a mutation that does not contribute to a complex disease phenotype), and the effect of a maternal or paternal haplotype is additive over causative mutations. The phenotype of a diploid is the geometric mean effect of the maternal and paternal haplotype plus a random Gaussian environmental effect scaled so that the gene-region being modeled accounts for some fraction of the total disease burden.

(a) Phenotype depends only on the number of causative mutations present on each haplotype, and not on whether an individual is homo- or heterozygous for particular mutations. Thus, the two diploids shown are equivalent in their expected phenotype, as both diploids contain one haplotype with two causative mutations, and a second haplotype with three such mutations. (b) Phenotype is calculated as the geometric mean of the effects of each haplotype, and is therefore determined primarily by the haplotype closest to wild-type. The panel shows the expected phenotype for diploids with different combinations of mutations on each haplotype, assuming a constant effect size of 0.05 per mutation. (c) Quantile-quantile plots of phenotypes resulting from the simulation. The x-axis is the quantiles of a unit Gaussian, and the y-axis is the z-score normalized quantiles observed in a simulated population. For three different parameter values, the phenotypes of 20,000 diploids from a single simulated population are shown. At moderate average effect sizes (λ) (0.10 in the panel), there tends to be an excess of individuals with modestly-large phenotypes, whereas with large λ, a population typically contains proportionally more individuals with large phenotypic values. (d) Broad-sense heritability as a function of λ, the mean effect size of a causative disease mutation. Plotted are the mean values ±1 standard deviation, calculated from the simulation output.
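The genotype-to-phenotype mapping described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' simulation code: the function names are hypothetical, the environmental parameter `sigma_e` is assumed to be a standard deviation, and the mean effect size `lam = 0.1` is chosen only for the example.

```python
import math
import random

def haplotype_effect(effects):
    """Additive effect of one haplotype: the sum of its mutation effect sizes."""
    return sum(effects)

def expected_phenotype(maternal, paternal):
    """Geometric mean of the two haplotype effects (no environmental noise)."""
    return math.sqrt(haplotype_effect(maternal) * haplotype_effect(paternal))

def phenotype(maternal, paternal, sigma_e, rng):
    """Expected phenotype plus Gaussian noise (sigma_e assumed to be an s.d.)."""
    return expected_phenotype(maternal, paternal) + rng.gauss(0.0, sigma_e)

# Constant effect size of 0.05 per mutation, as in panel (b).
e = 0.05
wild_type_het = expected_phenotype([], [e] * 3)        # one wild-type haplotype
two_and_three = expected_phenotype([e] * 2, [e] * 3)   # 2 and 3 mutations
three_and_two = expected_phenotype([e] * 3, [e] * 2)   # same counts, swapped

# Exponentially distributed effect sizes with a hypothetical mean lam.
rng = random.Random(1)
lam = 0.1
mat = [rng.expovariate(1.0 / lam) for _ in range(2)]
pat = [rng.expovariate(1.0 / lam) for _ in range(3)]
noisy = phenotype(mat, pat, sigma_e=0.075, rng=rng)
```

In this sketch a diploid carrying one mutation-free haplotype has an expected phenotype of zero regardless of the other haplotype, and swapping the maternal and paternal haplotypes leaves the expected phenotype unchanged.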

In our simulated populations, liabilities are close to normally distributed, except in the extreme “diseased” tail, where there is a slight excess of extremely affected individuals (

Given the computational demands of forward simulation, we focus our attention on a set of parameters (see Methods) that results in the proportion of total phenotypic variation in the population attributable to the focal gene region reaching a plateau at ∼4% as λ increases to ∼0.075–0.10 (

The value of heritability at the plateau depends on the model parameters. Plateau height is approximately linear as a function of the deleterious mutation rate (μ_{d} (the deleterious mutation rate, or the product of the proportion of sites mutable to a causative allele and the size of a gene for a constant per site mutation rate) and λ (the average effect size of newly arising exponentially distributed deleterious mutations), our model is able to generate a single gene of small to large effect contributing to disease risk for plausible parameters (_{d} and λ) is quite large. This implies that even for parameter combinations that predict equilibrium heritability values of ∼2% (

We examined the frequencies of mutations in a sample of 100 diploids drawn from each of the simulated regions. On average, causative mutations are more rare than expected in the absence of natural selection (

The dashed horizontal line corresponds to a p-value of 10^{−8}. (a–f) The −log_{10} of the p-value of the logistic regression is shown for representative examples for different mean effect sizes of causative mutations (λ). The plots are separated into four classes of mutations: common neutral and causative variants, which could be typed in a GWAS, and rare neutral and causative variants which would only be directly typed by resequencing.

We also observe examples of significant, common, non-causative markers (e.g.

We estimated the power of the widely-used logistic regression approach to identify regions containing at least one significant marker. For the parameters simulated, power maximizes at 28% in a GWAS using common markers and at 38% in a resequencing study, when λ = 0.075 (^{−8}. Further, the cumulative distribution of p-values for λ = 0 is a line with a slope less than one, indicating that the logistic regression test is conservative when applied to our simulated data (data not shown). For small values of λ>0, broad-sense heritability is also lower (

(a) The power of the logistic regression in GWAS and resequencing studies at significance threshold α = 10^{−8}. (b) The power of Madsen and Browning's ^{−6}). (c) The power of Li and Leal's ^{−6}) using the SKAT software package, applied to data from non-recombining regions. (f) The power of the ESM test (at α = 10^{−6}, see Methods) in GWAS and resequencing studies.

We applied Madsen and Browning's rank-sum test ^{−6} (compared to the more conservative 10^{−8} for a SNP-by-SNP GWAS) for these gene-based tests, as they integrate over markers, and thus fewer tests are carried out when doing a genome-wide scan. The Madsen and Browning test results in an excess of small

The power of the SKAT software to detect associations in recombining regions is shown in

An interesting feature of the Manhattan plots (_{10} scale) of the M most significant markers in a genomic region (see Methods for details). Control simulations with no causative mutations show that a permutation procedure (see Methods) results in the correct distribution of ^{−6} (

Goldstein and colleagues

Wray

(a) From our simulated case-control studies, we randomly-sampled markers in order to mimic the ascertainment of common markers typical of current GWAS, which resulted in a uniform distribution of minor allele frequencies. The distribution shown here is summed across all replicate simulations of a gene region. (b–f) Monte Carlo estimates of the expected number of most-associated markers in different frequency intervals for different values of λ (the mean effect size of a causative mutation). The x-axis represents the frequency of the minor allele (defined in the general population) in the cases. In each panel,

Risch

Since Pritchard's

Several studies that have carried out resequencing of candidate gene exons in case/control samples have observed an excess of rare non-synonymous mutations in the cases

Our model is consistent with the hypothesis that many rare variants could exist at a relatively small number of genes, and as a class those variants are likely to make a measurable contribution to the variation in complex traits. It is not unreasonable to assume that those variants are partially recessive and partially fail to complement one another when located in the same gene. An important aspect of our model is that causative mutations may be located anywhere in a large gene region that includes regulatory and splicing control regions, and causative mutations are not limited to point mutations. We show that simple extensions to current marker-by-marker tests have considerable power to detect genes harboring such variants. GWAS employing common markers have harvested the “low-hanging fruit” associated with intermediate frequency causative variants. In light of mounting evidence that common variants only explain a small fraction of the genetic variation in complex disease phenotypes, it behooves us to design experiments that have reasonable power to uncover the genetic architecture of complex traits under specific population-genetic models purporting to explain the existence of variation in these traits. Forward simulations that can track entire gene regions under intuitively appealing models of gene action and fitness allow us to assess the power of different experimental designs.

We implemented a forward-time simulation of a Wright-Fisher population with mutation following the infinitely-many sites model _{d} = 0.1 µ per gamete per generation. In our model, causative mutations are treated as SNPs for simplicity, but should be viewed more generally as genetic events (including copy-number variants and transposable element insertions) that we assume to be detectable via a chip or resequencing assay.

We note that there are a variety of forward-time simulation programs in the literature. However, the majority of these either simulate non-gene-based models

An individual carries c_{1} and c_{2} causative mutations on each haplotype. The effect size of the i^{th} mutation on the j^{th} haplotype is _{e}, which we fix at 0.075 in the simulations. In words, the phenotypic effect of a single haplotype is additive over causal mutations, and the phenotype of an individual is the geometric mean of the effects of each haplotype plus Gaussian noise. Since phenotypes are continuous they represent the underlying liability of developing a disease _{s} = 1, and w, the fitness of a diploid, is proportional to

In our simulations, the effect sizes of causative mutations are exponentially distributed with means of λ = 0, 0.01, 0.025, 0.05, 0.075, 0.1, 0.125, 0.175, 0.25, 0.35, or 0.5. For each λ>0, we performed 250 independent simulations. For an effect size of 0, representing “control” simulations where there is no genetic contribution to risk, we simulated 1000 independent replicates. All simulations were run for 8N generations prior to sampling.

For the parameters μ = 0.00125/gamete, μ_{d} = 0.1μ, σ_{s} = 1, σ_{e} = 0.075, and r = 0.00125/diploid or 0, we simulated both neutral and causative markers, allowing us to examine the properties of GWAS in detail. In order to reduce computational time, for all other parameter values explored, we set μ = 0 (i.e., no neutral mutations were simulated) and only simulated the causative sites. By not simulating the neutral mutations, simulations run orders of magnitude faster, allowing us to look at heritability across a broader parameter space.

For each simulated population, 3000 cases and 3000 controls were sampled. A case was defined as being in the upper 15% of the phenotypic distribution, and controls were within 1 standard deviation of the population mean. For each case-control panel, we define a GWAS to include all markers present in the panel with a minor allele frequency ≥5%, and a resequencing study to include all markers. For both types of study, we performed a logistic regression of case/control status onto genotype under an additive model.
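The case/control definition above can be illustrated with a short sketch. The function name, the toy population of 20,000 Gaussian liabilities, and the rule that an individual eligible as a case is never drawn as a control are assumptions made for this example; they are not taken from the authors' code.

```python
import random
import statistics

def sample_case_control(phenotypes, n_cases, n_controls, rng, case_fraction=0.15):
    """Return (case_indices, control_indices), each sampled without replacement.

    Cases come from the upper `case_fraction` tail of the phenotype
    distribution; controls lie within one standard deviation of the mean.
    """
    n = len(phenotypes)
    mean = statistics.fmean(phenotypes)
    sd = statistics.pstdev(phenotypes)
    # Indices ordered from the largest to the smallest phenotype.
    ranked = sorted(range(n), key=lambda i: phenotypes[i], reverse=True)
    case_pool = ranked[: round(case_fraction * n)]
    case_set = set(case_pool)
    # Assumption: an individual eligible as a case is never used as a control.
    control_pool = [i for i in range(n)
                    if abs(phenotypes[i] - mean) <= sd and i not in case_set]
    return rng.sample(case_pool, n_cases), rng.sample(control_pool, n_controls)

rng = random.Random(42)
# Hypothetical population of Gaussian liabilities, for illustration only.
population = [rng.gauss(0.0, 1.0) for _ in range(20000)]
cases, controls = sample_case_control(population, 3000, 3000, rng)
```

With 20,000 individuals, the upper 15% tail contains exactly 3000 individuals, so every individual in the tail is used as a case in this toy example.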

The significance threshold used was 10^{−8}, representing a typical cutoff used in current GWAS
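A single-marker logistic regression under an additive model can be sketched with the standard library alone. The sketch below assumes a likelihood-ratio test with one degree of freedom for the genotype coefficient; which test the authors applied to that coefficient is not stated here, and the function name and toy data are hypothetical.

```python
import math

def logistic_lrt_p_value(genotypes, status, n_iter=25):
    """Likelihood-ratio p-value for an additive single-marker logistic model.

    genotypes[j] is the minor-allele count (0, 1 or 2) of individual j and
    status[j] is 1 for a case and 0 for a control (both classes must occur).
    """
    n = len(status)

    def log_lik(b0, b1):
        total = 0.0
        for x, y in zip(genotypes, status):
            eta = b0 + b1 * x
            # log p(y | x), avoiding overflow for large eta
            total += y * eta - (math.log1p(math.exp(eta)) if eta < 30 else eta)
        return total

    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):  # Newton-Raphson on intercept and slope
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:  # monomorphic marker or numerical degeneracy
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det

    p_case = sum(status) / n
    null = n * (p_case * math.log(p_case) + (1 - p_case) * math.log(1 - p_case))
    lrt = max(0.0, 2.0 * (log_lik(b0, b1) - null))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(lrt / 2.0))

# Toy data: 100 cases (80 carriers) followed by 100 controls (20 carriers).
genos = [1] * 80 + [0] * 20 + [1] * 20 + [0] * 80
labels = [1] * 100 + [0] * 100
p_assoc = logistic_lrt_p_value(genos, labels)
significant = p_assoc <= 1e-8
```

A marker whose genotype distribution is identical in cases and controls yields a p-value of essentially one under this sketch.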

The forward simulations required approximately six weeks on a cluster of 96 computing cores (AMD Opteron 6168, 1900 MHz). To facilitate the further development of tests for detecting associations in gene regions, we have made all source code, forward simulation output, and case/control files available online at

In addition to the single-marker test, we also applied several existing and one new test of an association of genotype with case/control status to our simulated data. For tests applied to a set of markers within a genomic region, the significance threshold should be less conservative than the 10^{−8} used for the single-marker test. Our simulated data are 100 kilobase regions, from a genome of approximately 3×10^{9} base pairs, giving 3×10^{9}/10^{5} = 3×10^{4} non-overlapping windows. A conservative significance threshold would thus be 0.05/(3×10^{4}) = 1.67×10^{−6}. Here, we take p ≤ 10^{−6} as the significance threshold for all region-based tests (following, for example,
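The window-count arithmetic behind the region-based threshold can be checked directly. The variable names below are illustrative; the numbers are those given in the text.

```python
genome_size = 3e9   # approximate genome size in base pairs
window_size = 1e5   # one 100-kilobase region
alpha = 0.05        # desired genome-wide false-positive rate

n_windows = genome_size / window_size   # number of non-overlapping windows
threshold = alpha / n_windows           # Bonferroni-style per-window threshold
print(f"{n_windows:.0f} windows, per-window threshold {threshold:.2e}")
```

The adopted cutoff of 10^{−6} is slightly stricter than this per-window value.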

We developed a statistical test that attempts to integrate significance over marginally significant variants in a single gene. Under the gene-based model, genes harboring causative mutations tend to display such a genetic signature, and the ESM statistic is larger when there are more marginally significant mutations in a genomic region. Given a vector of Fisher's exact test p-values (_{1}_{2}_{10} scale) of the M most significant markers in a region.
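A statistic of this kind can be sketched as a sum, over the M smallest p-values in a region, of observed minus expected values on the −log_{10} scale. The sketch assumes that the expected i-th smallest p-value among the M considered is i/M. That null expectation, the function name `esm`, and the toy p-values are assumptions made for illustration and may differ from the authors' definition.

```python
import math

def esm(p_values, m):
    """Sum of (observed - expected) -log10 p-values over the m smallest p-values.

    Assumption: the expected i-th smallest p-value is i / m (i = 1..m).
    All p-values must be strictly positive.
    """
    m = min(m, len(p_values))
    top = sorted(p_values)[:m]
    return sum(-math.log10(p) + math.log10((i + 1) / m)
               for i, p in enumerate(top))

# A region with several marginally significant markers scores higher than
# one whose smallest p-values match the assumed null expectation.
enriched = esm([1e-6, 1e-4, 0.01, 0.5, 0.9], m=3)
null_like = esm([1 / 3, 2 / 3, 1.0], m=3)
```

Because the null distribution of such a sum depends on linkage disequilibrium among markers, its significance would be assessed by permutation rather than from a closed-form distribution.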

The test statistic was calculated for two different conditions. First, for GWA studies where, as above, only minor allele frequencies (MAF) ≥0.05 were included. The second condition assumed complete resequencing of individuals and included all markers. For the latter case, and for GWAS assuming a recombining region, we considered values of _{M} was generally observed to plateau by this point (averaging the statistic over replicates as a function of M). For GWAS in non-recombining regions, we considered values of

We have also applied several other “region-based” tests designed to detect a contribution of rare alleles to disease risk within a defined genomic region. The first test is Madsen and Browning's

The second test is Li and Leal's

For Madsen and Browning's, and for Li and Leal's, test statistics, we did not first collapse redundant markers. The rationale for not collapsing is that if a “case” contains, for example, two singleton mutations (

Finally, we applied the SKAT software

For the ESM, Madsen-Browning, and Li and Leal tests, we assessed statistical significance following the permutation procedure outlined in
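A generic label-permutation scheme for region-based statistics is sketched below. It is a standard construction, not necessarily the exact procedure the authors followed; the function names and the toy statistic are hypothetical.

```python
import random

def permutation_p_value(statistic, genotypes, labels, n_perm, rng):
    """One-sided permutation p-value for a region-based statistic.

    `statistic(genotypes, labels)` returns a number that is larger when the
    association is stronger; `labels` holds 1 for cases and 0 for controls.
    """
    observed = statistic(genotypes, labels)
    shuffled = list(labels)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-status association
        if statistic(genotypes, shuffled) >= observed:
            n_extreme += 1
    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return (n_extreme + 1) / (n_perm + 1)

def carrier_excess(genotypes, labels):
    """Toy statistic: carriers among cases minus carriers among controls."""
    cases = sum(1 for g, y in zip(genotypes, labels) if y == 1 and any(g))
    controls = sum(1 for g, y in zip(genotypes, labels) if y == 0 and any(g))
    return cases - controls

rng = random.Random(7)
# Hypothetical toy data: 10 carrier cases followed by 10 non-carrier controls.
genotypes = [[1, 0]] * 10 + [[0, 0]] * 10
labels = [1] * 10 + [0] * 10
p = permutation_p_value(carrier_excess, genotypes, labels, n_perm=999, rng=rng)
```

The smallest p-value this estimator can return is 1/(n_perm + 1), so resolving a threshold of 10^{−6} requires on the order of a million permutations per region.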

In the analyses described above, we assume that a GWA study is conducted using perfect genotyping technology able to assay 100% of markers with minor allele frequencies ≥5%. However, the majority of GWAS to date have used genotyping chips that assay a subset of ascertained markers whose minor allele frequencies are uniform in the range of 0.05 to 0.5

We use these imperfect chips to look at the MAF distribution of the most significant marker (defined by a logistic regression test described above) in a gene region (following

Phenotypes under an explicit gene-based model. (a) The model of gene action results in partial recessivity of haplotypes. The panel shows the empirical cumulative distribution of phenotypes that result from our simulations with mean effect size λ = 0.10 per causative mutation (black line), based on 250 independent simulations. Using the output of each simulation, we calculated each individual's phenotype under the standard models used in quantitative genetics–the additive model (red line), recessive model (blue line) and dominant model (purple line) of gene action. These other models were not explicitly simulated. Rather, the haplotype effect sizes output from our gene-based simulation were used to generate phenotypes under these alternative genotype-to-phenotype models. The gene-based model results in a distribution of phenotypes in between that of the additive and recessive models. (b) Average fitness of individuals in the simulations. Red dots show the mean of the population mean fitness. Blue triangles are the average fitness of individuals in the upper 15% of the phenotypic distribution of the population, who were treated as cases in the case-control analyses. The black diamonds are the mean fitness of the least fit individual observed in each simulated population. (c) Mean ±1 standard deviation of broad-sense heritability, as a function of λ. The points with solid lines are the same parameters as in

Broad-sense heritability in different parts of the parameter space. (a) The deleterious mutation rate has an approximately linear effect on broad-sense heritability at large mean effect sizes of causative mutations (λ). All model parameters except the deleterious mutation rate (μ_{d}) are the same as in _{e}) is changed (open triangles and solid diamonds), heritability plateaus at different values. However, if _{d}, plateauing at approximately 0.02 when the deleterious mutation rate is halved. (c) Estimated broad-sense heritability as a function of predicted broad-sense heritability (^{−13},df = 13). Thus, the heritability under the gene-based model is roughly one-half of that predicted under house-of-cards, likely a result of our assumed weak selection

Population-genetic properties of a locus. (a–d) The mean, normalized site frequency spectrum (SFS) of derived mutations is shown for three different mean effect sizes (λ), calculated from a sample of 100 randomly-chosen diploids from each simulated population. Shown are the first ten entries of the SFS for neutral sites (red), causative variants (black), all polymorphisms (dashed blue), and the expected values for a Wright-Fisher population experiencing no natural selection (black circles). (e) Mean (± ¼ standard deviation) of the number of causative mutations per diploid in a case/control panel. For both cases and controls, the mean total number of causative mutations (open circles) and rare causative mutations (diamonds, derived allele frequency <0.05) are shown. (f) Summaries of the amount of variation in the entire population. Here, S_{2N} refers to the mean number of mutations present in the entire population, and _{2N} is plotted on a log_{10} scale. In the absence of selection, the theoretical expectation of S_{2N} is _{2N} for neutral markers for all λ shows that the total strength of selection against causative mutations does not result in a loss of variability in the region (because selection is weak on a per-marker basis). For all λ, there is at least a 1 order of magnitude difference in the number of causative and neutral mutations, and

Statistical properties of association studies. (a) Average proportion of rare variants which are causative in either the general population, in a case-control panel, or amongst significant markers in a GWAS where individuals are completely sequenced. (b) For every neutral, common marker in a GWAS that was significant in a logistic regression test at p≤10^{−8}, we measured LD using the r^{2} statistic between the significant marker and all causal markers in the case-control panel, and recorded the top two r^{2} values. The distribution of r^{2} for the top marker is summarized in white boxplots, and the distribution for r^{2} for the second-strongest association is summarized in red. (c) White boxes summarize the distribution of the number of significant, common, neutral markers, conditional on there being at least one such marker. The red boxes summarize the distribution of the number of ^{2} values for each significant marker. Taken together, panels a and b suggest that significant common markers tend to tag a single causative site. (d) For each of the most strongly-tagged causal mutations making up the red boxes in panel b, the frequency and effect size of each mutant was recorded. The frequencies are summarized in the white boxes, and effect sizes are in red.

Distributions of −log_{10} p-values for the Madsen and Browning ^{−6} is shown. (a) The empirical cumulative distribution function (ECDF) of p-values for control simulations with no deleterious alleles. As expected, the ECDF of p-values is a straight line with a slope of approximately 1. (b–k) ECDF of p-values for simulations with non-zero mean effect sizes of causative mutations (λ>0).

Distributions of −log_{10} p-values for the Hotelling T statistic ^{−6} is shown. (a) The empirical cumulative distribution function (ECDF) of p-values for control simulations with no deleterious alleles. The ECDF of p-values is a straight line with a slope of approximately one when the number of markers is ≥200. (b–k) ECDF of p-values for simulations with non-zero mean effect sizes of causative mutations (λ>0).

Distributions of −log_{10} p-values for the Madsen and Browning ^{−6} is shown. (a) The empirical cumulative distribution function (ECDF) of p-values for control simulations with no deleterious alleles. As expected, the ECDF of p-values is a straight line with a slope of approximately 1. (b–k) ECDF of p-values for simulations with non-zero mean effect sizes of causative mutations (λ>0).

Distributions of −log_{10} p-values for the Hotelling T statistic ^{−6} is shown. (a) The empirical cumulative distribution function (ECDF) of p-values for control simulations with no deleterious alleles. The ECDF of p-values is a straight line with a slope of approximately 1 when the number of markers is ≥200. (b–k) ECDF of p-values for simulations with non-zero mean effect sizes of causative mutations (λ>0).

Distributions of −log_{10} p-values for the ESM statistic (see Methods). For all panels, the significance threshold of 10^{−6} is shown. (a) The empirical cumulative distribution function (ECDF) of p-values for control simulations with no deleterious alleles. As expected, the ECDF of p-values is a straight line with a slope of approximately 1. (b–k) ECDF of p-values for simulations with non-zero mean effect sizes of causative mutations (λ>0).

Distributions of −log_{10} p-values for the ESM statistic (see Methods) with no recombination. For all panels, the significance threshold of 10^{−6} is shown. (a) The empirical cumulative distribution function (ECDF) of p-values for control simulations with no deleterious alleles. As expected, the ECDF of p-values is a straight line with a slope of approximately 1. (b–k) ECDF of p-values for simulations with non-zero mean effect sizes of causative mutations (λ>0).

A comparison of the burden of risk mutations between significant and non-significant markers in a GWAS. Shown are mean and standard errors of the number of causative singletons in individuals with genotypes defined as having either zero, one, or two copies of the derived allele at the marker most significantly associated (i.e., having the smallest p-value) with case/control status in a GWAS analyzed by a single-marker test (red). Pale blue lines show the mean and standard errors of the number of causative singletons associated with zero, one, or two copies of the derived mutation of a marker that is both not associated with case/control status in a logistic regression analysis (p>10^{−4}) and frequency-matched to the most-associated marker in the same replicate. Each panel of the figure is labeled by the mean effect size of a causative mutation (λ).

The authors thank Robin Bush, Jonathan Pritchard, Molly Przeworski, and two anonymous reviewers for valuable comments on the manuscript. We thank Marnee Reiley for copyediting.