Conceived and designed the experiments: DEA JSB. Performed the experiments: HH PC. Analyzed the data: DEA JSB HH PC. Contributed reagents/materials/analysis tools: AA DEA HH PC. Wrote the paper: HH DEA JSB PC.
The authors have declared that no competing interests exist.
Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and the fraction of disease-associated genes housing multiple independent effects, observed at 35%–50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible with pathway-based meta-analysis.
Genome-wide association studies (GWAS) have successfully identified genetic variants associated with complex human phenotypes. Despite a proliferation of analysis methods, most studies rely on simple, robust SNP-by-SNP univariate tests with ever-larger population sizes. Here we introduce a new test motivated by the biological hypothesis that a single gene may contain multiple variants that contribute independently to a trait. Applied to simulated phenotypes with real genotypes, our new method, Gene-Wide Significance (GWiS), has better power to identify true associations than traditional univariate methods, previous Bayesian methods, popular L1-regularized (LASSO) multivariate regression, and other approaches. GWiS retains power for low-frequency alleles that are increasingly important for personal genetics, and it is the only method tested that accurately estimates the number of independent effects within a gene. When applied to human data for multiple ECG traits, GWiS identifies more genome-wide significant loci (verified by meta-analyses of much larger populations) than any other method. We estimate that 35%–50% of ECG trait loci are likely to have multiple independent effects, suggesting that our method will reveal previously unidentified associations when applied to existing data and will improve power for future association studies.
Traditional single-SNP GWAS methods have been remarkably successful in identifying genetic associations, including those for various ECG parameters in recent studies of PR interval (the beginning of the P wave to the beginning of the QRS interval)
One analytical approach, gene-based tests proposed during the initial development of GWAS
Despite these appealing properties, gene-based and related multi-marker association tests have generally underperformed single-locus tests when assessed with real data
Multi-locus tests often have the additional practical drawback of being highly CPU and memory intensive. Several methods use Bayesian statistics to drive a brute-force sum or Monte Carlo sample over models
The Gene-Wide Significance (GWiS) test addresses these problems by performing model selection simultaneously with parameter estimation and significance testing in a computational framework that is feasible for genome-wide SNP data (see Methods). Model selection, defined as identifying the best tagging SNP for each independent effect within a gene, uses the Bayesian model likelihood as the test statistic
The ECG parameters PR interval, QRS interval and QT interval are ideal test cases because recent large-scale GWAS studies have established known positive associations. These traits are all clinically relevant, with increased PR interval associated with increased risk of atrial fibrillation and stroke
Quantity  PR  QRS  QT
Individuals, published GWAS  28,517  47,797  15,842/13,685
Individuals, this study  7,076  7,250  7,771
Individuals, this study relative to published  25%  15%  49%/57%
Genes, total  25,251
Genes, at least one SNP assigned  24,337
SNPs, total  2,557,232
SNPs, assigned to at least one gene  1,392,262
SNPs, average per gene  72
SNPs, median per gene  43
Effective tests, average per gene  9.3
Effective tests, median per gene  7.3
The SNPs were assigned to genes based on the NCBI
The “gold standard” known positives rely on previously published meta-analyses of PR interval
The minSNP test uses the p-value for the best single SNP within a gene. The minSNPP test converts this SNP-based p-value to a gene-based p-value by performing permutation tests within each gene. BIMBAM averages the Bayes Factors for subsets of SNPs within a gene, with restriction to single-SNP models recommended for genome-wide applications
Power calculations used genotypes from the ARIC population to ensure realistic LD. Phenotypes were then simulated for genetic models with one or more causal variants within a gene. GWiS was the best-performing method, with an advantage that grows as more independent effects are present (
Power estimates for GWiS (black), minSNPP (blue), BIMBAM (dashed blue), VEGAS (green), and LASSO (red) are shown for 0.007 population variance explained by a gene. Genes were selected at random from Chr 1; genotypes were taken from ARIC; and phenotypes were simulated according to known models with up to 8 causal variants with independent effects. (a) Power decreases as total variance is diluted over an increasing number of causal variants. (b) Power estimates with 95% confidence intervals are shown as a function of minor allele frequency (MAF) for the simulations from panel (a) with a single independent effect. GWiS, minSNP, minSNPP, and BIMBAM are robust to low minor allele frequency, whereas VEGAS and LASSO lose power.
Of the other methods, minSNPP and BIMBAM had similar performance that degraded as the true model included more SNPs. The VEGAS test did not perform well, presumably because the sum over all SNPs creates a bias to find causal variants in LD blocks represented by many SNPs and to miss variants in LD blocks with few SNPs. In the absence of LD, with genotypes and phenotype simulated using PLINK
The advantage of GWiS arises in part from better power to detect associations with low-frequency alleles (
The model size selected by GWiS and LASSO was evaluated by simulation (
The ability to recover the known model size was evaluated for GWiS (a and b) and LASSO (c and d). The power to detect a single SNP was set to be 10% (a and c) and 80% (b and d). In separate tests, the causal SNPs were either retained in (black) or removed from (red) the genotype data.
GWiS provides a better estimate of the true model size than LASSO, assessed from the
Removing a causal SNP results in GWiS predicting a smaller model, with the ratio of estimated to true
These results demonstrate that the model size returned by GWiS is conservative for causal variants with small effects, and approaches the true model size for causal variants with large effects.
We then obtained p-values from GWiS, minSNP, minSNPP, BIMBAM, VEGAS, and LASSO for the ARIC data. Permutations of phenotype data holding genotypes fixed
GWiS outperformed all other methods in the comparison (
Of 38 known positives, GWiS identified 6 at genome-wide significance with no false positives. Univariate methods (minSNP and minSNPP) and VEGAS identified a subset of 4 entirely contained by GWiS, and LASSO identified a smaller subset of 2.
Known Locus  GWiS (2E6)  minSNP (7.4E8)  minSNPP (3E6)  BIMBAM (3E6)  VEGAS (1E6)  LASSO (2.1E11)  
Trait  Locus Name  Chr  Start  End  Genes  pvalue  Genes  SNPs  Tests 

pvalue  Rank  pvalue  Rank  pvalue  Rank  pvalue  Rank  pvalue  Rank  Genes 

SI  Rank 
PR  SCN5ASCN10A  3  38,363,244  39,055,168  8  2.1E74 


















PR  CAV2CAV1  7  115,444,532  116,032,391  4  3.7E28 














2  3  1.3E05  15 
PR  ARHGAP24  4  87,208,605  87,281,000  1  6.2E20  1  100  11.7  8.1E03  1589  1.5E01  3941  1.4E01  3605  4.3E02  581  
PR  SOX5C12orf67  12  23,576,498  24,993,499  4  3.3E13  1  185  15.8  8.9E03  1664  4.5E01  11033  2.8E01  6964  5.2E01  6092  
PR  ATP6VOE1C5orf41BNIP1NKX25  5  172,264,849  172,687,936  9  9.5E13  1  45  7.9  1  6.2E05  5  2.0E06  5  6.2E05  5  2.6E05  5  2.3E06  3  5  6  5.4E02  344 
PR  MEIS1  2  66,574,183  66,711,542  1  4.6E11  1  159  19.8  2.8E04  156  2.5E02  678  6.8E03  138  3.7E03  41  1  2  6.1E04  163  
PR  WNT11  11  75,203,923  75,783,791  5  3.2E08  1  196  18.4  1  1.8E03  46  3.1E05  22  1.8E03  48  2.5E03  52  2.6E01  3181  1  2  5.1E05  35 
QRS  ACVR2BEXOGSCN5ASCN10A  3  38,322,431  39,055,168  9  1.1E28 






1.0E06  1  4.4E05  1  4.0E05  1  3.8E06  1  4  11  1.8E10  1 
QRS  CDKN1A  6  36,518,522  37,000,310  11  3.0E27  2  69  12.3  1  1.2E04  4  5.0E06  5  1.0E04  3  8.7E05  3  1.4E03  14  2  3  8.2E02  321 
QRS  SLC35F1PLNBRD7P3  6  118,335,382  119,441,511  8  1.3E18  1  31  6.8  4.1E04  163  1.8E02  338  2.5E02  558  6.9E03  62  1  1  4.5E03  280  
QRS  NFIA  1  61,260,314  61,634,057  1  4.6E18  1  334  42.1  1  9.3E03  138  1.1E04  54  9.3E03  139  2.2E03  39  2.5E03  21  1  2  2.1E04  62 
QRS  HAND1SAP30L  5  153,550,488  153,845,903  5  7.4E14  1  39  9.6  1.7E03  515  1.3E01  3157  1.5E01  3770  1.6E01  1963  
QRS  TBX20  7  34,982,033  35,189,326  3  1.1E13  1  10  3.8  1.3E03  417  1.5E02  246  6.5E03  105  5.5E04  7  
QRS  SIPA1L1  14  70,443,875  71,275,875  5  1.0E10  1  101  13.4  1.9E03  558  1.6E02  279  2.6E02  567  8.2E02  982  
QRS  TBX5  12  113,254,456  113,312,808  2  1.3E10  1  133  19.1  1.2E03  395  2.6E02  536  1.8E02  358  1.8E02  198  
QRS  CDKN2CFAF1  1  50,618,956  51,966,162  11  3.3E10  1  56  11.2  6.6E03  1317  5.9E02  1433  3.7E02  848  1.0E01  1182  
QRS  GOSR2  17  41,805,904  42,621,664  8  4.8E10  1  63  7.5  1  5.0E03  84  3.9E04  154  5.0E03  85  1.7E02  353  5.8E02  703  2  2  3.7E04  91 
QRS  VTI1A  10  114,197,006  114,605,117  2  5.0E10  1  271  32.2  6.9E03  1342  1.1E01  2782  9.8E02  2411  2.0E01  2344  
QRS  SETBP1  18  40,438,804  40,898,771  2  6.2E10  1  98  11.5  1.1E03  360  1.2E01  2879  5.3E02  1292  1.0E02  93  
QRS  HEATR5BSTRN  2  36,835,565  37,370,391  8  1.9E09  1  102  10.4  1  6.7E03  103  4.5E04  175  6.7E03  102  1.5E02  283  4.6E02  548  2  2  4.0E03  276 
QRS  TKTCACNA1DPRKCD  3  53,099,055  53,356,653  5  6.3E09  1  63  6.4  2.6E02  5229  2.0E01  4877  2.0E01  4811  3.1E01  3659  
QRS  CRIM1  2  36,495,048  36,736,886  2  8.2E09  1  106  10.3  7.9E04  287  6.0E02  1464  5.2E02  1252  8.9E02  1072  
QRS  PRKCA  17  61,624,215  62,237,324  3  1.1E08  1  611  53.4  3.6E03  893  1.4E01  3571  1.2E01  2921  1.7E01  2053  
QRS  LRIG1SLC25A26  3  66,376,317  66,634,041  2  1.1E08  1  144  14.8  1.7E02  3353  2.1E01  5157  3.0E01  7296  3.4E01  4002  
QRS  KLF12  13  73,158,150  73,606,043  1  1.3E08  1  596  64.7  2.6E04  121  5.5E02  1338  5.6E02  1351  6.2E01  7218  1  3  1.7E03  228  
QRS  CASQ2  1  115,906,065  116,112,527  4  2.4E08  1  189  13.8  1  2.5E04  8  6.0E06  7  2.2E04  7  1.2E04  4  6.1E05  2  1  4  5.9E01  333 
QRS  DKK1  10  52,504,299  53,843,264  3  3.1E08  1  22  5.6  6.2E04  225  1.1E01  2708  1.0E01  2582  4.3E01  5041  
QT  NOS1AP  1  158,467,768  159,113,560  6  1.9E78 


















QT  GINS3CNOT1  16  56,983,924  57,325,641  7  3.0E25  1  101  12.6  1  4.2E05  6  1.0E06  7  4.2E05  6  8.0E06  4  9.7E05  8  3  4  4.5E05  42 
QT  c6orf204PLN  6  118,335,382  119,441,511  8  2.4E24 












4.2E06  3  2  3  5.7E02  326 
QT  KCNQ1  11  2,246,305  2,826,916  6  2.8E17  1  322  50.3  1  1.4E03  37  2.0E06  8  1.4E03  36  6.5E04  24  3.6E03  43  1  2  1.4E05  23 
QT  RNF207  1  6,020,646  6,460,521  13  1.0E16  5  20  6.7  1  5.0E06  4  2.9E07  4  5.0E06  3  8.0E06  4  1.8E05  4  5  3  8.6E06  19 
QT  KCNH2  7  149,820,442  150,340,230  20  5.0E16 






6.9E07  5  5.0E06  3 




5  6  5.3E06  16 
QT  ATP1B  1  165,633,513  166,331,065  8  1.2E15  2  332  19.2  2  4.6E05  7  1.2E07  3  5.4E05  7  2.2E05  6  3.2E03  36  4  6  5.2E10  2 
QT  LITAF  16  11,397,762  11,783,909  5  5.8E15  3  236  34.6  2  5.8E04  21  1.9E05  20  5.7E04  21  1.5E03  32  9.6E04  18  3  2  3.3E07  7 
QT  SCN5A  3  38,363,244  38,810,505  6  1.0E14  2  148  21.3  1  1.8E04  9  6.0E06  13  1.8E04  9  1.8E04  12  1.7E04  10  4  3  3.4E06  12 
QT  LIG3  17  30,279,055  30,618,866  11  6.0E12  3  47  11.5  1  1.2E05  5  1.0E06  6  1.1E05  5  2.6E05  7  2.9E05  5  5  5  1.7E05  24 
QT  KCNE1  21  34,658,193  34,909,252  5  2.0E08  1  163  13.9  2.0E02  4056  4.0E01  9626  3.1E01  7611  5.7E01  6628 
Due to the limited size of the ARIC cohort relative to the studies that generated the known positives, no method was expected to find all 38 known loci to be genome-wide significant. Nevertheless, known positives should still rank high among the top predictions of each method, assessed by the ranks of the known positives at 40% recall (
While our conclusions are based on cardiovascular phenotypes, the results suggest that GWiS will have an advantage when causal genes have multiple effects. When an association is sufficiently strong to be found by a univariate test, GWiS is generally able to identify it. Beyond these associations, GWiS is also able to detect genes that are genome-wide significant, but where no single effect is large enough to be significant by univariate tests. The association of QRS interval with SCN5ASCN10A is a striking example: 4 independent effects are found by GWiS (p-value =
GWiS correctly identifies the SCN5ASCN10A locus as genome-wide significant with four independent effects, even though the strongest single effect has a p-value 100
Of the 38 known positives, 20 have GWiS models with at least one SNP (regardless of genome-wide significance), and 7 of these are predicted to have multiple independent effects (
Of 38 known positive loci, GWiS identified 20 loci, and 7 of these contain multiple independent effects.
In summary, we describe a new method for gene-based tests of association. By gathering multiple independent effects into a single test, GWiS has greater power than conventional tests to identify genes with multiple causal variants. GWiS also retains power for low-frequency minor alleles that are increasingly important for personal genetics, a feature not shared by other multi-SNP tests.
Furthermore, GWiS provides an accurate, conservative estimate for the number of independent effects within a gene or region. Currently there are no standard criteria for establishing the genome-wide significance of a weak second association in a gene whose strongest effect is genome-wide significant. While the number of effects can be provided by existing Bayesian methods
The test we describe includes a prior on models designed to be unaffected by SNP density, in particular by the number of SNPs that are well-correlated with a causal variant. The priors on regression parameters are essentially uniform, with the benefit of eliminating any user-adjustable parameters. A theoretical drawback is that the priors are improper
Bayesian methods can be computationally expensive. GWiS minimizes computation by evaluating only the locally optimal models of increasing size in a greedy forward search. This appears to be an approximation compared to previous Bayesian methods that sum over all models. Previous Bayesian methods entail their own approximations, however, because the search space must either be truncated at 1 or 2 SNPs, heavily pruned, or lightly sampled using Monte Carlo. Our results demonstrate that the approximations used by GWiS provide greater computational efficiency than approximations used in previous Bayesian frameworks, with no loss of statistical power. GWiS currently calculates p-values, rather than the Bayesian evidence provided by other Bayesian methods. If Bayesian evidence is desired, an intriguing alternative to Bayesian post-processing of candidate loci might be to use the Bayes Factor from the most likely alternative model identified by GWiS as a proxy for the sum over all alternatives to the null model. This may be an accurate approximation because, in practice, the Bayes Factor for the most likely model from GWiS dominates all other Bayes Factors in the sum.
The GWiS framework, using gene annotations to structure Bayesian model selection, may be applied to case-control data by encoding phenotypes as 1 (case) versus 0 (control), a reasonable approach when effects are small. More fundamental extensions to logistic regression, Transmission Disequilibrium Tests (TDTs), and other tests and designs should be possible and may yield further improvements. Moreover, similar gene-based structured searches can be applied to genetic models to include explicit interaction terms
This research involves only the study of existing data with information recorded in such a manner that the subjects cannot be identified directly or through identifiers linked to the subjects.
Known positive associations are taken from published genome-wide significant SNP associations (p-value
The ARIC study includes 15,792 men and women from four communities in the US (Jackson, Mississippi; Forsyth County, North Carolina; Washington County, Maryland; suburbs of Minneapolis, Minnesota) enrolled in 1987–89 and prospectively followed
The phenotype vector
The maximum likelihood estimators (MLEs) are
A conventional multiple regression approach uses the
A model
The factor
The prior probability of the model,
The remaining factor in Eq. 4 is
The integration limits and prefactor
The log-likelihood is then
As in the BIC approximation, we retain only terms that depend on the model and are of order
The strategy of GWiS is therefore to find the model that maximizes the objective function
The terms involving
GWiS is designed to select a single model for each gene. An alternative related approach would be to test for the posterior probability of the null model,
The effective number of tests is an established concept in GWAS to provide a multiple-testing correction for correlated markers. While the exact correction can be established by permutation tests, faster approximate methods can perform well
The method we adopt is based on multiple linear regression of SNPs on SNPs. The genotype vector
This process continues until all weights are equal to zero. When SNPs with maximum weight are tied (as occurs for the first SNP processed), the SNP with the lowest genomic coordinate is selected to ensure reproducibility; we have verified that the results are robust to other tie-breaking methods, including random selection. For simplicity, the correlations are not updated (the update rule would be
The effective number of tests implies a trivial renormalization of the model prior, (Eq. 5), that does not affect the test statistic. Letting
We use two stages of permutation tests: the first stage converts the GWiS test statistic into a p-value that is uniform under the null; the second stage establishes the p-value threshold for genome-wide significance.
The first stage is conducted gene-by-gene. We permute the trait array using the Fisher-Yates shuffle algorithm
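As an illustration of the first-stage permutation, the following minimal sketch shuffles a trait array with the Fisher-Yates algorithm while holding genotypes fixed, and converts a gene-level statistic into an empirical p-value. The function names, the add-one estimator, and the toy statistic (the largest squared SNP-trait correlation within a gene) are assumptions made for this example; they are not the GWiS test statistic or the p-value formula used in the paper.

```python
import random


def fisher_yates_shuffle(values, rng):
    """Return a uniformly random permutation of `values` (input is not modified)."""
    shuffled = list(values)
    # Walk from the last position down, swapping each element with a
    # randomly chosen element at or before it.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def max_squared_correlation(genotypes, trait):
    """Toy gene-level statistic (an assumption): best r^2 over the gene's SNPs."""
    n = len(trait)
    mean_y = sum(trait) / n
    syy = sum((y - mean_y) ** 2 for y in trait)
    best = 0.0
    for snp in genotypes:
        mean_g = sum(snp) / n
        sgg = sum((g - mean_g) ** 2 for g in snp)
        sgy = sum((g - mean_g) * (y - mean_y) for g, y in zip(snp, trait))
        if sgg > 0 and syy > 0:
            best = max(best, sgy * sgy / (sgg * syy))
    return best


def permutation_p_value(statistic, genotypes, trait, n_perm, rng):
    """Empirical p-value with genotypes held fixed and the trait permuted.

    The add-one estimator (hits + 1) / (n_perm + 1) is an assumption of
    this sketch.
    """
    observed = statistic(genotypes, trait)
    hits = 0
    for _ in range(n_perm):
        permuted = fisher_yates_shuffle(trait, rng)
        if statistic(genotypes, permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because only the trait array is shuffled, the LD structure among the SNPs of the gene is preserved in every permutation.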
The first factor is the parametric p-value for the
While these p-values are uniform under the null, the threshold for genome-wide significance requires a second set of permutations. In the second stage, we permuted the ARIC phenotype for each trait 100 times, ran GWiS for the permuted phenotypes on the entire genome, and recorded the best genome-wide p-value from each of the 100 permutations. We then combined the results from the three traits to obtain an empirical distribution of 300 best genome-wide p-values under the null, and estimated the p = 0.05 genome-wide significance threshold as the 15th best p-value of the 300 (0.05 × 300 = 15). This procedure was performed for GWiS, minSNP, minSNPP, LASSO, and VEGAS to obtain genome-wide significance thresholds for each. Since minSNPP and BIMBAM are both uniform under the null, we used the genome-wide significance threshold calculated for minSNPP,
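The second-stage threshold amounts to taking an empirical quantile of the best p-values from the null genome scans. A minimal sketch, assuming the pooled best p-values are available as a plain list (the function name and the rounding rule for the rank are choices made for this example):

```python
def genomewide_threshold(best_p_values, alpha=0.05):
    """Return the k-th smallest best p-value, with k = alpha * (number of scans).

    For 300 null scans and alpha = 0.05 this is the 15th best p-value.
    """
    ranked = sorted(best_p_values)
    k = int(round(alpha * len(ranked)))
    if k < 1:
        raise ValueError("too few permutations for this alpha")
    return ranked[k - 1]  # k-th smallest, counting from 1
```

With 100 permutations for each of three traits, `best_p_values` would hold 300 entries and the function would return the 15th smallest.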
In a region with a strong association and LD, GWiS can generate significant p-values for multiple genes in a region. A hierarchical version of GWiS is used to distinguish between two possibilities. First, through LD, a strong association in one gene may cause a weaker association signal in a second gene. In this case, only the strong association should be reported. Second, the causal variant may not be localized in a single gene; for example, the best SNP tags are assigned to multiple genes. In this case, the individual genes should be merged into a single associated locus. The hierarchical procedure is as follows.
1. Identify all genes with GWiS
2. Run GWiS on the merged locus (including a recalculation of the number of effective tests within the locus) and identify the SNPs selected by the GWiS model. If genes at either end of the locus have no GWiS SNPs, trim these genes from the locus. Repeat this step until no more trimming is possible. If only a single gene remains, accept it with its original p-value as the only association in the region. Otherwise, proceed to step 3.
3. Use a permutation test to calculate the p-value for the merged locus from step 2. Assign it a p-value equal to the minimum of the p-values from the individual genes and the p-value from its own permutation. Regardless of the p-value used, retain the entire trimmed region as an associated locus.
The trimming in step 2 handles the first possibility, a strong association in one gene that causes a weaker association in a neighbor. The rationale for accepting the smallest p-value in step 3 is the case of a single SNP assigned to multiple genes. The merged region will have a less significant p-value than any single gene, and it does not seem reasonable to incur such a drastic penalty for gene overlap.
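The trimming and p-value assignment in steps 2 and 3 can be sketched as follows. Both callbacks are hypothetical stand-ins: `run_gwis(genes)` is assumed to return the set of gene names that hold at least one SNP selected by GWiS when it is run on the merged locus, and `permutation_p(genes)` is assumed to return the permutation p-value of the merged locus. As a simplification, all empty genes at both ends are trimmed in one pass before GWiS is rerun.

```python
def trim_locus(genes, run_gwis):
    """Step 2: drop end genes with no GWiS-selected SNP until nothing changes.

    `genes` is ordered by genomic position. Returns the trimmed list, which
    is empty if no gene in the locus holds a selected SNP.
    """
    genes = list(genes)
    while len(genes) > 1:
        selected = run_gwis(genes)  # hypothetical callback
        start, end = 0, len(genes)
        while start < end and genes[start] not in selected:
            start += 1
        while end > start and genes[end - 1] not in selected:
            end -= 1
        if start == 0 and end == len(genes):
            break  # no more trimming is possible
        genes = genes[start:end]
    return genes


def locus_p_value(genes, gene_p_values, run_gwis, permutation_p):
    """Steps 2 and 3: return (trimmed locus, assigned p-value)."""
    trimmed = trim_locus(genes, run_gwis)
    if not trimmed:
        raise ValueError("no gene in the locus holds a selected SNP")
    if len(trimmed) == 1:
        # A single remaining gene keeps its original p-value.
        return trimmed, gene_p_values[trimmed[0]]
    merged_p = permutation_p(trimmed)  # hypothetical callback
    best_gene_p = min(gene_p_values[g] for g in trimmed)
    return trimmed, min(merged_p, best_gene_p)
```

Taking the minimum in the last line reflects the rationale stated above: a SNP shared by overlapping genes should not make the merged locus less significant than its best single gene.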
For these tests, SNPs are assigned to gene regions as before. The p-value for each SNP is then calculated using the
The Bayesian Imputation-based Association Mapping (BIMBAM) is a Bayesian gene-based method
The design matrix
The sufficient statistics used by BIMBAM are identical to those of minSNP and minSNPP, yet we found that the public implementation was much slower, taking 270 sec for 1000 permutations of a gene with 135 SNPs across 8000 individuals. By improving memory management and optimizing computations, we reduced the run time to 14 sec per 1000 permutations, a 19-fold speedup. This implementation is included in our Supplementary Materials.
The Versatile Gene-Based Test for Genome-wide Association (VEGAS)
LASSO regression is a recent method for combined model selection and parameter estimation that maps
To reduce computational cost, univariate p-values are estimated from parametric tests, and gene-based SNPs with
As suggested previously, we used the Selection Index to rank genes and as the test statistic for a permutation p-value
For each true model size of
We attempted to distribute the total population variance explained,
The power was calculated as (number of genes that are genome-wide significant)/1000, and the error of the estimate was calculated using 95% exact binomial confidence intervals. The p-value thresholds were taken directly from genome-wide permutations (
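For concreteness, the power estimate and its exact binomial confidence interval can be computed as in the sketch below. It assumes the Clopper-Pearson construction for the "exact" interval and solves for the bounds by bisection on the binomial distribution function using only the Python standard library; the function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def solve_decreasing(f, target, tol=1e-10):
    """Bisection on [0, 1] for a function f that decreases in p: f(p) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # p is still too small
        else:
            hi = mid
    return (lo + hi) / 2


def power_with_exact_ci(n_significant, n_genes, conf=0.95):
    """Return (power, lower, upper) with a Clopper-Pearson interval."""
    k, n, alpha = n_significant, n_genes, 1.0 - conf
    # Lower bound: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return k / n, lower, upper
```

For example, `power_with_exact_ci(500, 1000)` returns a power of 0.5 together with an interval that extends roughly three percentage points on either side.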
Phenotypes that were used to estimate the model size were generated by assigning each “causal” SNP the same power of either 0.1 or 0.8. The population variance explained for each SNP was calculated as
Only GWiS and LASSO give model size estimates. GWiS directly reports the model size as the number of independent effects within a gene, and LASSO reports the model size as the number of selected SNPs within a gene. We ran both methods using the simulated data with LD, testing both the scenario in which the causal SNPs were kept in the gene and the scenario in which they were removed.
Gene associations were scored as true positives if the gene (or merged locus) overlapped with a known association, and as false positives if no overlap exists. Only the first hit to a known association spanning several genes was counted.
The primary evaluation criterion is the ability to identify known positive associations at genome-wide significance. The genome-wide significance threshold was determined separately for each method (see above), and no method gave any false positives at its appropriate threshold.
A secondary criterion was the ability to enrich highly ranked loci for known associations, regardless of genome-wide significance. This criterion was assessed through precision-recall curves, with precision = TP/(TP+FP), recall = TP/(TP+FN), and true positives (TP), false positives (FP), and false negatives (FN) defined as a function of the number of predictions considered.
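The precision and recall definitions above can be traced along a ranked list of predictions with a short sketch. The input representation (one boolean per ranked locus, true when it overlaps a known association) and the function name are assumptions for this example.

```python
def precision_recall_curve(ranked_is_known, n_known):
    """Return [(precision, recall), ...] after each successive prediction.

    ranked_is_known: booleans ordered from the best-ranked locus to the worst.
    n_known: total number of known positive associations (TP + FN).
    """
    curve = []
    tp = fp = 0
    for is_known in ranked_is_known:
        if is_known:
            tp += 1
        else:
            fp += 1
        fn = n_known - tp
        curve.append((tp / (tp + fp), tp / (tp + fn)))
    return curve
```

With 38 known positives, 40% recall is first reached when 16 of them have been recovered, since 15/38 is below 0.40 and 16/38 is above it.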
Small differences in precision and recall may not be statistically significant. To estimate statistical significance, we performed a Mann-Whitney rank sum test for the ranks of the known associations at 40% recall for GWiS, minSNP, minSNPP, and LASSO.
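A minimal sketch of a Mann-Whitney rank sum test for two lists of ranks is given below. It assumes a two-sided test with the normal approximation, mid-ranks for ties, a tie-corrected variance, and no continuity correction; the paper does not state which variant was used, so these are choices made for the example.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # pooled[i:j] is a block of tied values
        t = j - i
        midrank = (i + 1 + j) / 2  # average of the ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values tied: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    return u, 2 * (1 - NormalDist().cdf(abs(z)))
```

In the setting described above, `x` and `y` would hold the ranks that two methods assign to the same set of known associations; a small p-value indicates that one method tends to rank them higher.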
GWiS runs efficiently in memory and CPU time, roughly equivalent to other genome-wide tests that require permutations (
Method  Phenotype  Memory, Null (GB)  Memory, Real (GB)  CPU time, Null (hours)  CPU time, Real (hours)
GWiS  PR  1.2  1.2  9.4  43.1
GWiS  QRS  1.2  1.2  11.0  31.9
GWiS  QT  1.2  1.2  11.2  67.0
minSNP  PR  0.6  0.6  13.6  62.0
minSNP  QRS  0.6  0.6  15.8  45.9
minSNP  QT  0.6  0.6  16.1  96.4
minSNPP  PR  0.6  0.6  11.9  54.2
minSNPP  QRS  0.6  0.6  13.8  40.1
minSNPP  QT  0.6  0.6  14.0  84.3
BIMBAM  PR  0.6  0.6  14.1  42.3
BIMBAM  QRS  0.6  0.6  16.5  33.2
BIMBAM  QT  0.6  0.6  16.8  101.5
VEGAS  PR  32.5  8.2  26.0  34.0
VEGAS  QRS  26.0  11.9  23.9  29.8
VEGAS  QT  25.8  14.1  27.1  33.0
LASSO  PR  0.1  0.1  0.2  0.4
LASSO  QRS  0.1  0.1  0.3  0.3
LASSO  QT  0.1  0.1  0.2  0.4
Estimated power at genome-wide significance for genotypes simulated without LD. Simulation tests were performed for true models in which a single gene housed one to eight independent causal variants. Genotypes were simulated with 20 SNPs per gene, no LD between SNPs, and minor allele frequencies selected uniformly between 0.05 and 0.5. Power estimates are provided for VEGAS (green), GWiS (black), minSNPP (blue), BIMBAM (blue dashed), and LASSO (red). While VEGAS performs well in the absence of LD, its performance degrades under realistic LD (see main text,
(TIF)
Number of SNPs and effective number of tests per gene. The number of SNPs and effective tests per gene are displayed as a density plot for (a) chromosome 1 and (b) the autosomal genome. While on average genes have 70 SNPs and 9 tests, large genes can have over 1000 SNPs and 100 tests.
(TIF)
Precision-recall curves for recovery of known associations. Precision and recall for recovery of 38 known associations are shown for GWiS (black), minSNP (thin blue), minSNPP (thick blue), BIMBAM (dashed blue), LASSO (red), and VEGAS (green). Ranking is by p-value for GWiS, minSNP, minSNPP, and VEGAS, and by Selection Index for LASSO. The tails of the curves for GWiS and LASSO are truncated when remaining loci have no SNPs entered into models, which occurs close to 50% recall. Triangles indicate the last genome-wide significant finding from each method.
(TIF)
Number of identified genome-wide significant loci. Results are reported for 20 kb and 100 kb flanking transcription boundaries. G: GWiS, S: minSNP, SP: minSNPP, B: BIMBAM, V: VEGAS, L: LASSO. *BIMBAM was only tested for 20 kb. **VEGAS is hard-coded to use
(PDF)
Genome-wide significance thresholds calculated by permutation. Results are reported for 20 kb and 100 kb flanking transcription boundaries. Thresholds for GWiS, minSNP, minSNPP and VEGAS are for p-values. Thresholds for LASSO are for the selection index. The thresholds for minSNP and LASSO decrease because the larger threshold implies more tests. GWiS and minSNPP already include a correction for the number of tests within a gene, and thresholds are somewhat less stringent for longer gene boundaries. *BIMBAM uses the threshold from minSNPP because both tests provide gene-based p-values with identical uniform distributions under the null. **VEGAS is hard-coded to use
(PDF)
Top associations for PR interval. The top 100 associations are reported for GWiS, minSNP, minSNPP, BIMBAM, VEGAS, and LASSO. The locus name concatenates the named genes within the start and end positions indicated. Additional columns provide the number of SNPs, the effective number of tests, the number of independent associations within the region (
(XLS)
Top associations for QRS interval. The column information is the same as for
(XLS)
Top associations for QT interval. The column information is the same as for
(XLS)
The Atherosclerosis Risk in Communities Study is carried out as a collaborative study. The authors thank the staff and participants of the ARIC study for their important contributions.