Association Testing Strategy for Data from Dense Marker Panels

Genome wide association studies have been usually analyzed in a univariate manner. The commonly used univariate tests have one degree of freedom and assume an additive mode of inheritance. The experiment-wise significance of these univariate statistics is obtained by adjusting for multiple testing. Next generation sequencing studies, which assay 10-20 million variants, are beginning to come online. For these studies, the strategy of additive univariate testing and multiple testing adjustment is likely to result in a loss of power due to (1) the substantial multiple testing burden and (2) the possibility of a non-additive causal mode of inheritance. To reduce the power loss we propose: a new method (1) to summarize in a single statistic the strength of the association signals coming from all not-very-rare variants in a linkage disequilibrium block and (2) to incorporate, in any linkage disequilibrium block statistic, the strength of the association signals under multiple modes of inheritance. The proposed linkage disequilibrium block test consists of the sum of squares of nominally significant univariate statistics. We compare the performance of this method to the performance of existing linkage disequilibrium block/gene-based methods. Simulations show that (1) extending methods to combine testing for multiple modes of inheritance leads to substantial power gains, especially for a recessive mode of inheritance, and (2) the proposed method has a good overall performance. Based on simulation results, we provide practical advice on choosing suitable methods for applied analyses.


Introduction
Genome-wide association studies (GWASs) have been broadly used to test for association between genetic variants and various phenotypes. These studies have been quite successful in identifying numerous single nucleotide polymorphisms (SNPs) associated with a variety of human traits and diseases [1]. So far, GWASs have been commonly analyzed univariately using one degree of freedom (df) tests assuming an additive mode of inheritance [2][3][4]. The experiment-wise significance of the univariate statistics was assessed using a Bonferroni adjustment [5,6] or a permutation procedure [7][8][9]. While this approach was reasonably successful for GWAS, the field is moving away from this paradigm towards whole genome sequencing. When compared to GWAS, variant panels for sequencing studies (1) are substantially denser and (2) have different patterns of linkage disequilibrium (LD). Consequently, for these studies it is not clear if the most desirable approach is still a univariate testing for an additive mode of inheritance followed by the adjustment of the statistics for the large number of tests.
Intuitively, a test for association between phenotype and genotype achieves optimal power when the assumed mode of inheritance matches the underlying one. However, the underlying mode of inheritance is usually unknown in practice and an incorrect choice for it may cause a substantial power loss. Various strategies to minimize the effect of the possible model misspecification have been studied and developed [10][11][12][13][14][15][16][17]. Among these, one simple strategy is to test other modes of inheritance, e.g. dominant and recessive, in addition to the commonly used additive mode of inheritance [16,17]. This approach involves testing for three different modes of inheritance, i.e. additive (A), dominant (D) and recessive (R), and adjusting the lowest p-values for multiple testing. For brevity, the combination of testing for the three modes of inheritance will be henceforth denoted as ADR. Because the ADR paradigm is not in widespread use yet, it is of interest to estimate the performance improvement when applied to methods which were initially developed assuming an additive mode of inheritance.
To control the rate of false positives in GWAS analyses, the statistical significance of univariate p-values is adjusted for around a million univariate tests. With the advent of next generation sequencing, for univariate analyses, the number of tests will increase dramatically when compared to GWAS. Summary data from 1000 Genomes Project suggests that sequencing studies consisting of subjects from any main ethnic regions, i.e. Europe, East Asia, South Asia, Africa and the Americas, will result in the typing of at least 5 million SNPs having a minor allele frequency (MAF) above 5% [18]. For larger sequencing studies, if we assume that all SNPs with MAF > 0.5% are analyzed individually, the number of tests increases to more than 10 million for Caucasian and Asian cohorts and approaches 20 million for African cohorts. Consequently, univariate approaches entail an even more substantial multiple testing adjustment burden for sequencing studies. It is conceivable that using multilocus approaches, e.g. by summarizing the association of multiple SNPs simultaneously, opens the possibility of decreasing the multiple testing burden and, thus, increasing the power of detection for association signals.
To minimize the power penalty due to multiple testing adjustment, researchers proposed to analyze simultaneously all SNPs in a biological functional block of interest, e.g. a gene [19]. However, this approach might yield low power due to the large number of df. Subsequently, researchers proposed methods to decrease the number of df, and thus increase power, by (1) summarizing the LD information mainly from lowfrequency SNP variation in an LD block [20], (2) using dataadaptive sum of squared scores (aSUM) [21], (3) summarizing the LD information by the first few principal components (PC) [22,23], (4) combining the Simes p-value of all univariate pvalues in a gene with the p-value associated with the first few principal components of tests in the gene(S-PC) [24] and (5) combining individual SNP-based variance component score statistics (SKAT) [25].
Recently, two fast non-regression based multilocus methods were proposed for gene-based analysis. The first method, denoted as VErsatile Gene-based Association Study (VEGAS), summarizes the association signals in a gene using the sum of squares (SS) or the minimum p-value (minP) of univariate statistics in the gene [26]. Instead of performing permutations, VEGAS simulates multivariate normal variables for a rapid assessment of the asymptotic null distribution of summary statistics. The second method is an improved Simes procedure for association studies, denoted as Gene-based Association Test using Extended Simes (GATES) [27]. GATES is an extension to a technique proposed initially by Cheverud [28] to determine the effective number of independent tests in a region of interest. (For more background regarding this type of methods, see 28-32.) Because VEGAS-SS (V-SS), VEGAS-minP (V-minP) and GATES do not use permutations, they are faster than permutation based multilocus methods.
While the gene is commonly considered as the biological functional unit, it might not be the best unit for statistical analysis. First, since not all parts of a gene are equally important functionally, it would be more powerful to analyze separately SNPs in regions with important functions such as gene promoter regions, which are known to be involved in initiating and regulating the transcription process [33,34]. Second, the association signal from a causal SNP, e.g. from the promoter region, is diffused only among SNPs in the same LD block [20]. Thus, using as the unit of analysis a gene containing multiple LD blocks might increase the noise and reduce overall statistical power to detect an association signal. An alternative unit of statistical analysis might be a LD block (e.g. SNPs with D ′ near 1).
Unlike GWAS, which are based on a tag SNP approach, whole genome sequencing studies assay most genetic variants in the genome. Consequently, the typed variation in these studies will likely form a large number of LD blocks, each block having a large number of SNPs. Given the number and size of these blocks, it is conceivable that approaches analyzing simultaneously all SNPs in a LD block might be of great importance for a successful investigation of sequencing studies. Although the partition of the genome into LD block requires a substantial computing time, once they are computed for the main ethnic groups, these blocks can be reused for future analyses. While the LD block analysis approach was investigated before [17], these findings need to be updated for the present-day technological and methodological environment, i.e. increased variant density and the new developments in gene based methods.
In this paper, we attempt to develop/identify the most powerful methods to detect association signals in a LD block. To achieve this goal, we (1) propose a simple statistic consisting of the sum of the squares of significant univariate tests in a LD block, (2) use two simulation experiments to assess the performance of the proposed method and competing methods a) with or b) without an ADR extension and (3) provide practical recommendations based on simulation results.

Methods
Assume that we are interested in assessing the association between a binary phenotype, Y, and m SNPs in a LD block using a case-control cohort consisting of n cases and n controls. For the i th subject, Certain gene based approaches, such as V-SS, sum the squares of all univariate statistics in a LD block. However, this approach might lose power by including SNPs with weak association signals, which only add noise to the test statistic. V-minP also might suffer some statistical power loss by using only the most significant statistic. To avoid such a loss of power we propose a new statistical test consisting of the sum of squared statistics exceeding a threshold, i.e.∑ j=1 where t>0 is a reasonably high threshold. We denote this test statistic as the sum of square above a threshold (SS-T). The statistical significance of SS-T can be assessed in a manner similar to V-SS and V-minP, i.e. via multivariate normal simulations based on the LD pattern in the block. We compare the performance of the proposed test (SS-T), to the performance of various association tests developed mainly for an additive mode of inheritance and, where possible, their ADR extensions. As association tests, we include in our simulation studies Bonferroni, Simes [35], GATES [27], V-SS/V-minP [26], PC [22,23], S-PC [24], SKAT [25] and aSUM [21]. In our comparisons we also include the ADR extensions  (1) ,...,p A,(j) ) [27]. V-SS and V-minP compute the sum of squares, ∑ j=1 m Z A, j 2 , and minimum pvalue (which is equivalent to max j Z A, j 2 ) of the univariate statistics and assess their significance by simulating their null distributions based on the multivariate normal distribution [26]. The test statistic for the PC method, Q k , is defined as the sum of squares of the first k PC statistics, denoted as U 1 ,…,U k , of the genotype correlation matrix. The j th PC statistic is defined where v j and λ j are the j th eigenvector and eigenvalue of the correlation matrix of genotype data G A,⋅⋅ [22,23]. Under the null hypothesis, of no association between genotype and phenotype, Q k is distributed as a χ 2 with k df. In our comparison studies, we consider the first three PC statistics, i.e. k=3. The p-value for S-PC is obtained by performing a Simes correction on the p-values generated from (i) a Simes procedure applied on p A and (ii) the above PC method [24].
The ADR extension for methods in the previous paragraph can be achieved similarly by substituting: (i) 3m for m as the number of tests in the LD block, (ii) ADR p-values (p ADR =(p A ,p D ,p R )) for p-values assuming an additive mode of inheritance (p A ) and (iii) ADR univariate statistics (Z ADR =(Z A ,Z D ,Z R )) for univariate statistics assuming an additive mode of inheritance (Z A ). While the three tests (A, D and R) are not independent, both within and between SNPs, given that the power loss is small [17], we conservatively use 3m as the number of tests for the ADR extension for Bonferroni and related methods.
We implemented in R all methods tested in this paper, with the exception of SKAT and aSUM. SKAT statistics were obtained using SKAT 0.63 R package with a linear kernel and the default options. For aSUM, we used its implementation from AssotesteR 0.1-1 R package (http:// www.gastonsanchez.com/assotester) with the default options.

Simulations
We employ two extensive simulation experiments to generate genotype-phenotype data sets that we subsequently use to assess the performance of relevant methods and their ADR-extensions. The first simulation experiment generates artificial LD-patterns (see Tables S1-S3) and the second experiment is based on LD-patterns from a real data set (see Tables 1 and 2).
To efficiently simulate a large number of correlated SNPs in a LD block, the first simulation experiment (Experiment I) employs the polychoric correlation (PCC) as a measure of LD [17,36]. The advantage of PCC over other LD measures is that it is directly applicable to simulating a large number of markers in LD regardless of their reference allele frequencies (RAFs) [17]. The simulation of a study cohort is achieved using a fourstep process: i) simulate latent multivariate normal variables, ii) discretize the latent variables to obtain genotypes, iii) simulate the phenotype, i.e. case or control, for the simulated genotype vector and iv) accept cases and controls until achieving the required sample size. In this experiment, we investigate three settings for the number of causal variants (k): i) null hypothesis, i.e. no causal variant (k=0), ii) a single causal variant (k=1) and iii) two non-interacting causal varaints (k=2). We present the simulation process in more detail under Supporting Information (see Methods S1). The second simulation experiment (Experiment II) is based on 200 randomly chosen 250Kb genetic regions from UK10K (www.uk10k.org) reference data set. LD blocks are inferred by performing a hierarchical clustering analysis with an average link on SNP genotypes in each selected region. To match one of the settings from the first experiment, the LD blocks are defined using a PCC 2 threshold of 0.64. The causal LD block and the causal variant(s) within it are chosen randomly and, thus, their RAFs are not fixed. When compared to Experiment I, to increase robustness of the findings, we add to this simulation experiment a five non-interacting causal variant scenario (k=5). Otherwise, the simulations for this experiment follow the conceptual flow of the first experiment.
To examine the effect of the causal model on the power of various methods, we simulated case-control cohorts under three underlying modes of inheritance, i.e. additive, dominant and recessive (see Tables S2 and S3 for Experiment I and  Tables 1 and 2 for Experiment II). Under k causal variant scenario (k>1), the underlying genetic modes of inheritance of all k casual SNPs are assumed to be identical (for more details see Methods S1). Each case-control sample consists of 1,000 cases and 1,000 controls for most settings (see Tables S2 and  S3 for Experiment I and Tables 1 and 2 for Experiment II). To attain the 50% average power target, for Experiment I the genotype penetrances and the sample sizes vary with the mode of inheritance and the causal allele frequency (see Tables S2 and S3).
For both experiments, we assume a binary trait of prevalence K=0.05 and assess the empirical size of the test and power for tested methods at a type I error of0.05. We obtain the size and power of each method by simulating 500 replicates at each setting. For Monte-Carlo simulation/ permutation based methods, such as V-SS, V-minP, SS-T and aSUM, we performed 500 and 1,000 simulations under Experiment I and II, respectively.

Results
For Experiment I, under the null hypothesis (of no association between SNP genotypes and trait) some tests (V-SS, V-minP, SS-T and aSUM) using Monte-Carlo simulation/ permutation to assess the significance of the test statistic seem to have slightly (albeit not-statistically significant) inflated size of the test ( Figure S1). However, when we increase to 1000 the number of replications for Experiment II, all tested methods control the size of the test at or below the nominal type I error (Figures 1). As a general rule, ADR extensions tend to make most methods more conservative. Simpler adjustments for multiple comparisons, e.g. Bonferroni and Simes, their ADR extensions and SS-T are the most conservative.
To find a reasonable threshold value for SS-T we use the results of Experiment I and II, which are summarized in Figures  Figure 2, respectively. Under the first experiment, as the threshold value increases from 1 to 9, power tends to rise. This behavior is more apparent especially when 1) the mode of inheritance is recessive, 2) the tests are ADRadjusted, 3) the size of the LD block is small and 4) the polychoric correlation between the genotypes in the LD block is high. Under Experiment II, where we use realistic LD blocks from UK10K data set, power decreases with an increase in threshold values. Though under the dominant and recessive modes of inheritance power tends to slightly increase as the threshold value increases from 4 to 6, the improvement in power is marginal compared to that observed under Experiment I. On the basis of the results from both comparisons, we deem SS-6 to be close to optimal power-wise and, henceforth we present only its performance.
Under Experiment I conditions, the single causal variant simulations show differences in power between the ADR adjusted methods and the non-ADR-adjusted methods ( Figures  S8-S10). When the underlying modes of inheritance are additive or dominant, the ADR-adjustment causes a small power loss averaging 4.9% (see "Additive" and "Dominant" panels in the above mentioned figures). However, for a recessive mode of inheritance, the adjustment considerably improves the power to detect the association signal for most settings. The average ADR adjustment power gain under a recessive mode of inheritance is around 34.3%, ranging from 6.9% for PC to 51.8% for V-minP. Under a dual causal variant scenario, the differences in power between methods and their ADR extensions are similar to the single causal variant scenario (Figures S11-S13). ADR adjustment results in a 5.3% decrease in average power for non-recessive models and a 30% increase in average power for a recessive mode of inheritance. The average power gain of the methods under a recessive mode of inheritance ranges from 8.9% for V-SS to 40.5% for V-minP. The power gain for both (single and dual) causal variant scenarios under a recessive mode of inheritance appears to increase with i) an increase in the polychoric correlation between SNPs in the LD block, ii) a decrease in the size of the LD block and iii) a decrease in causal allele frequency (CAF).
Besides the difference in power between methods and their ADR extensions, it is of interest to establish which methods perform better under varying scenarios. For the first experiment, aSUM tends to have the highest power under additive/dominant modes of inheritance and it outperforms, sometimes considerably, the next best performing group (V-minP, GATES, SS-6 and their ADR extensions) (Figures S8-S13). Under additive/dominant modes of inheritance, as the CAF increases the difference in power between methods gradually lessens while the rank of each method tends to be maintained. For a recessive mode of inheritance, ADR V-minP and ADR GATES perform best overall and are followed by ADR Bonferroni, ADR Simes, ADR S-PC and ADR SS-6. These methods substantially outperform all non-ADR-adjusted methods, ADR V-SS and ADR PC. As CAF increases, under a recessive mode of inheritance, the performance of ADR SS-6 approaches that of the best performers (ADR V-minP and ADR GATES). For Experiment II, under the additive mode of inheritance SKAT, V-SS, V-minP and SS-6 performed best (Figure 3). Under the non-additive mode of inheritance, ADR SS-6, ADR V-SS and ADR V-minP tend to have highest power. While, as we observed from Experiment I, under the additive model ADR adjustment results in slight power loss, under the dominant model it leads to slight power gain. Under the recessive model, ADR adjustment improves power significantly. When compared to Experiment I, SKAT performs much better under Experiment II conditions. On the other hand, aSUM performs rather poorly when compared to its excellent performance observed in the first simulation experiment. Even more, under a recessive mode of inheritance, aSUM had the lowest power in the second experiment.

Discussion
When compared to GWAS, whole genome sequencing studies assay tens of millions of genetic variants in human genome. Due to their large number, a univariate analysis of these variants involves a substantial multiple testing burden. To avoid such a burden, we propose a new method to summarize in a single statistic the association between phenotype and the genotypes of not-very-rare SNPs, i.e. those which can be reasonably analyzed individually, in a LD block. We use two simulation experiments to compare the performances of i) the proposed method, ii) competing methods, e.g. methods used for gene-based analyses and iii) the extensions of the previously mentioned methods which combine information from multiple modes of inheritance. The results of these simulations are helpful in identifying methods delivering close to optimal power when used to analyze data coming from sequencing studies.
One important conclusion of this paper is that, even for denser marker panels, when the true mode of inheritance is unknown, combining additive, dominant and recessive modes of inheritance together (i.e. ADR adjustment) is a suitable strategy to minimize the power loss caused by the model misspecification. The power gain of ADR adjustments over their non-ADR adjusted counterparts is due to the considerable improvement in the overall performance of the methods when the underlying mode of inheritance is recessive. The power gain of ADR extensions increases with an increase in LD between SNPs and a decrease in causal allele frequency. This behavior is consistent with findings for GWAS panels [16,17].
Under the theoretical experiment (Experiment I) with additive and dominant modes of inheritance, for the rarer causal allele frequencies range we assumed, aSUM performs best across all configurations closely followed by V-minP, GATES, SS-6 and their ADR extensions. Under a recessive mode of inheritance, the most powerful methods are ADR V-minP and ADR GATES, which are closely followed by ADR Bonferroni, ADR Simes, ADR S-PC and ADR SS-6. Probably because it assumes additively coded genotypes, aSUM yields low power under a recessive mode of inheritance. However, the realistic experiment (Experiment II) shows that SKAT and V-SS perform well. We believe that the relative difference in both experiments is due to the fact that the first experiment is biased towards common tag SNP panels, whereas the second simulation experiment is geared towards denser, sequencing SNP panel.
Based on our simulation results, if the disease associated SNPs are highly likely to be acting additively (multiplicatively) or dominantly, for data coming from sequencing panels, we recommend the use of SKAT, ADR V-SS, ADR V-minP, and ADR SS-T. However, researchers rarely know the mode of inheritance for a variant. When no prior information regarding the underlying mode of inheritance is available, we recommend the use of methods with good performance across all modes of inheritance and all simulation experiments, i.e. ADR V-SS, ADR V-minP, ADR GATES and ADR SS-T. However, we