Figure 1.
Phenotypes under the gene-based model.
(a) Phenotype depends only on the number of causative mutations present on each haplotype, and not on whether an individual is homo- or heterozygous for particular mutations. Thus, the two diploids shown are equivalent in their expected phenotype, as both contain one haplotype with two causative mutations and a second haplotype with three such mutations. (b) Phenotype is calculated as the geometric mean of the effects of the two haplotypes, and is therefore determined primarily by the haplotype closest to wild-type. The panel shows the expected phenotype for diploids with different combinations of mutations on each haplotype, assuming a constant effect size of 0.05 per mutation. (c) Quantile-quantile plots of phenotypes resulting from the simulation. The x-axis shows the quantiles of a unit Gaussian, and the y-axis shows the z-score-normalized quantiles observed in a simulated population. For three different parameter values, the phenotypes of 20,000 diploids from a single simulated population are shown. At moderate mean effect sizes (λ = 0.10 in the panel), there tends to be an excess of individuals with modestly large phenotypes, whereas with large λ, a population typically contains proportionally more individuals with large phenotypic values. (d) Broad-sense heritability as a function of λ, the mean effect size of a causative disease mutation. Plotted are the mean values ±1 standard deviation, calculated from the simulation output.
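The calculation described in panels (a) and (b) can be illustrated with a minimal sketch. It assumes that the effect of a haplotype is the sum of the effect sizes of its causative mutations (with the constant 0.05 per mutation used in panel b) and it omits any random, non-genetic component of the phenotype. The function names and mutation labels are hypothetical.

```python
import math

def haplotype_effect(mutations, effect_size=0.05):
    """Effect of one haplotype: assumed here to be additive over its
    causative mutations, each with the same constant effect size."""
    return len(mutations) * effect_size

def expected_phenotype(hap1, hap2, effect_size=0.05):
    """Geometric mean of the two haplotype effects."""
    e1 = haplotype_effect(hap1, effect_size)
    e2 = haplotype_effect(hap2, effect_size)
    return math.sqrt(e1 * e2)

# Panel (a): two diploids, each with a 2-mutation and a 3-mutation haplotype.
# Diploid A is homozygous for "m1"; diploid B shares no mutation between haplotypes.
diploid_a = ({"m1", "m2"}, {"m1", "m3", "m4"})
diploid_b = ({"m1", "m2"}, {"m3", "m4", "m5"})
pheno_a = expected_phenotype(*diploid_a)
pheno_b = expected_phenotype(*diploid_b)

# Panel (b): a single wild-type haplotype gives an expected phenotype of zero,
# however many mutations the other haplotype carries.
pheno_wild_type = expected_phenotype(set(), {"m1", "m2", "m3", "m4", "m5"})
```

Under these assumptions `pheno_a` and `pheno_b` are identical, and a diploid carrying 1 and 9 mutations has a smaller expected phenotype than one carrying 5 and 5, which is the sense in which the haplotype closest to wild-type dominates.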
Figure 2.
Representative Manhattan plots.
The dashed horizontal line corresponds to a p-value of 10^−8. (a–f) The −log10 of the p-value from the logistic regression is shown for representative examples at different mean effect sizes of causative mutations (λ). Each plot separates markers into four classes: common neutral and common causative variants, which could be typed in a GWAS, and rare neutral and rare causative variants, which would only be directly typed by resequencing.
Figure 3.
Power to identify regions containing causative mutations.
(a) The power of the logistic regression in GWAS and resequencing studies at significance threshold α = 10^−8. (b) The power of Madsen and Browning's [16] test (at α = 10^−6). (c) The power of Li and Leal's [15] multiple-marker test (at α = 10^−6). (d) Power (at α = 10^−6) using the SKAT software package, applied to data from recombining regions. (e) Power (at α = 10^−6) using the SKAT software package, applied to data from non-recombining regions. (f) The power of the ESM test (at α = 10^−6, see Methods) in GWAS and resequencing studies.
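As a rough illustration of how power values like these can be estimated from simulation replicates, the sketch below counts the fraction of replicates whose test p-value falls at or below α. It assumes one p-value per replicate (for a single-marker test, the smallest p-value in the region); the function name and the example p-values are hypothetical and are not taken from the study.

```python
def estimate_power(replicate_p_values, alpha):
    """Fraction of simulated replicates declared significant at level alpha.

    replicate_p_values: one p-value per replicate (assumed to be the
    region-level p-value, e.g. the minimum over markers in the region).
    """
    if not replicate_p_values:
        raise ValueError("need at least one replicate")
    significant = sum(1 for p in replicate_p_values if p <= alpha)
    return significant / len(replicate_p_values)

# Made-up p-values for four replicates, for illustration only.
example_p_values = [1e-9, 0.5, 1e-8, 1e-3]
power_at_1e8 = estimate_power(example_p_values, alpha=1e-8)
power_at_1e6 = estimate_power(example_p_values, alpha=1e-6)
```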
Figure 4.
Frequencies of most significant markers (based on the logistic regression test) in GWAS based on genotyping panels of previously ascertained SNPs.
(a) From our simulated case-control studies, we randomly sampled markers in order to mimic the ascertainment of common markers typical of current GWAS, which resulted in a uniform distribution of minor allele frequencies. The distribution shown here is summed across all replicate simulations of a gene region. (b–f) Monte Carlo estimates of the expected number of most-associated markers in different frequency intervals for different values of λ (the mean effect size of a causative mutation). The x-axis represents the frequency of the minor allele (defined in the general population) in the cases. Each panel also gives an estimate of the expected number of replicates (out of a total of 250) containing at least one significant marker using an imperfect SNP chip.
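One way to obtain an approximately uniform minor allele frequency (MAF) distribution, as in panel (a), is to bin markers by MAF and draw the same number from each bin. The sketch below is only an illustration of that idea and not necessarily the procedure used in the study. The 0.05 lower bound for a "common" marker, the number of bins, and the function name are all assumptions.

```python
import random
from collections import defaultdict

def sample_uniform_maf(mafs, n_bins=10, min_maf=0.05, seed=0):
    """Return indices of markers chosen so that every MAF bin between
    min_maf and 0.5 contributes the same number of markers.

    min_maf=0.05 is an assumed cutoff for 'common' markers."""
    rng = random.Random(seed)
    width = (0.5 - min_maf) / n_bins
    bins = defaultdict(list)
    for idx, maf in enumerate(mafs):
        if min_maf <= maf <= 0.5:
            # Clamp so that maf == 0.5 falls in the last bin.
            b = min(int((maf - min_maf) / width), n_bins - 1)
            bins[b].append(idx)
    # The sparsest bin limits how many markers each bin can contribute.
    per_bin = min(len(bins[b]) for b in range(n_bins))
    chosen = []
    for b in range(n_bins):
        chosen.extend(rng.sample(bins[b], per_bin))
    return sorted(chosen)
```

Because rare variants greatly outnumber common ones in a simulated sample, the sparsest (highest-frequency) bin determines how many markers each bin contributes.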