Development and Application of Genomic Control Methods for Genome-Wide Association Studies Using Non-Additive Models

Genome-wide association studies (GWAS) comprise a powerful tool for mapping genes of complex traits. However, an inflation of the test statistic can occur because of population substructure or cryptic relatedness, which could cause spurious associations. If information on a large number of genetic markers is available, adjusting the analysis results by using the method of genomic control (GC) is possible. GC was originally proposed to correct the Cochran-Armitage additive trend test. For non-additive models, correction has been shown to depend on allele frequencies. Therefore, usage of GC is limited to situations where allele frequencies of null markers and candidate markers are matched. In this work, we extended the capabilities of the GC method for non-additive models, which allows us to use null markers with arbitrary allele frequencies for GC. Analytical expressions for the inflation of a test statistic describing its dependency on allele frequency and several population parameters were obtained for recessive, dominant, and over-dominant models of inheritance. We proposed a method to estimate these required population parameters. Furthermore, we suggested a GC method based on approximation of the correction coefficient by a polynomial of allele frequency and described procedures to correct the genotypic (two degrees of freedom) test for cases when the model of inheritance is unknown. Statistical properties of the described methods were investigated using simulated and real data. We demonstrated that all considered methods were effective in controlling type 1 error in the presence of genetic substructure. The proposed GC methods can be applied to statistical tests for GWAS with various models of inheritance. All methods developed and tested in this work were implemented using R language as a part of the GenABEL package.


Introduction
Genome-wide association studies (GWAS) are a powerful tool for mapping genes of complex traits. Standard statistical methods used for GWAS, such as linear regression, assume that the correlation between a phenotype and a genotypic marker exists because of the marker itself or a strong linkage disequilibrium with the causative locus. This assumption holds when the sample consists of representatives of one panmictic population. However, other correlations caused by confounding factors that influence both phenotypes and genotypes of various loci are possible. In GWAS, the genetic substructure of the studied samples is among the most important confounders. If the analysis is not accounted for confounding by population substructure, the test statistic is inflated [1], which makes its statistical interpretation difficult and may lead to false-positive findings.
If information on a large number of genetic markers is available, the analysis results can be adjusted by accounting for the influence of non-specific effects by using the genomic control (GC) method. Several methods have been proposed for GC adjustment [1][2][3][4][5]. Devlin and Roeder [1] suggested the use of a correction coefficient, denoted as variance inflation factor (VIF), to correct the distribution of the test statistic. In general, the VIF has been demonstrated to be a function of marker allele frequencies and population parameters [1]. It has also been deduced that for an additive model, the VIF does not depend on allele frequency. Thus, for an additive model, the ''GC inflation factor'' constant, l, can be empirically estimated from null (not associated) loci. Note, however, that for smaller allele frequencies and smaller samples asymptotic assumptions will not hold, and, consequently, the inflation of the test statistic will depend on allele frequencies even for additive model.
Several estimators of the Genomic Control inflation constant l could be used. For example, the mean test statistic is an estimator of l, which, however, suffers from being strongly affected by outliers (e.g., from true association signals). The median estimator (l median ), which is defined as ratio of the median of the observed distribution of the test statistic and 0.455 (the median of the x 2 df~1 distribution) [1], is probably the one used most. Another estimator can be defined as regression coefficient of the observed test statistic onto the statistic expected for the null loci (regression estimator l regress ). This estimator arises from the simple observation that the covariance between two ordered random variables one of which is distributed as x 2 df~1 and another as l*x 2 df~1 is equal to 2*l, while the variance of the expected distribution is 2. All of these estimators are constants that we can use as indicators of statistical bias or as coefficients allowing correction of the observed test statistic.
The general formulation of the VIF [2], in principle, allows for extension of GC to dominant and recessive models. However, for the non-additive model, the VIF depends on allele frequency and a number of parameters that describe the genetic structure of the sample. Thereby, it is possible to estimate the VIF empirically (as for additive model) if the allele frequency of null loci is the same as for the test locus (specific VIF for each allele group), but in this case the number of available null markers is limited and thus limits the applicability of the GC method. An alternative way requires estimation of the population structure parameters. The methods, which infer population structure and assign individuals to populations [6] are computationally extensive.
Another method for empirical VIF estimation was suggested by Zheng et al. [3] for .a two degrees of freedom (2df) model, which does not constrain the relation of phenotypes and genotypes and does not impose severe restrictions on the weight of the heterozygous genotype. This ''robust GC'' method was based on combining the corrected test statistics from dominant and recessive models [3]. Yet another method of correction -delta decentralization (based on centralization of the non-central chi-square)was proposed by Gorroochurn et al [7], but was later invalidated by Dadd et. al. [8].
In this work, we aimed to develop and evaluate existing methods for GC correction of results of GWAS using non-additive (recessive, dominant, over-dominant, and 2df genotypic) models. Therefore, we concentrate on several points: formulation of VIF expressions for various models with one degree of freedom (1df) and development of VIF-based procedures for GC correction of the results of these models; estimation of model parameters describing the population substructure for VIF estimation; development of a new ''polynomial'' GC (PGC) method based on a polynomial approximation of the correction coefficient that can be applied for both one-and two-degree tests. All methods were tested using simulated and real data.

VIF for non-additive models
We derived the VIF as function of allele frequency (p), model of inheritance (x indicates the effect of the heterozygous genotype; for recessive, additive, and dominant model, x is equal to 0, 0.5, and 1, respectively), sample size (N), and population parameters. The over-dominant model (effect of genotype is equal to 0 for homozygotes and to 1 for heterozygotes) is described separately. Population parameters include the Wright's coefficient of inbreeding F (ranging from 0 to 1) [9] and a coefficient that describes the population substructure, K~P a k {b k ð Þ 2 (K §0), where a k and b k are numbers of representatives of each of the subpopulations in case and control samples, respectively. In reality, the mean inbreeding F takes a values of ,0.01 for most populations, but can reach values of 0.04 for highly consanguineous populations [10]. The value of K/N 2 approaches zero when the design is balanced (e.g. case:control ratio is 1:1 in each subpopulation) and approaches its maximum of 1/2 when either cases or controls only are sampled from each subpopulation.
The VIF is obtained as , where G i is the marker genotype of the i-th case (G i M {0,1,2}). Var(G j ) and cov(G i ,G j ) is defined as: respectively. The derivations and detailed formulas for the VIF are provided in the Supplementary Note S1. Figure 1 presents the VIF function for a set of population parameters (F = 0.05; N = 1,000; K = 11,000). This figure shows that the VIF is independent of allele frequency only for the additive model (x = K), which has been demonstrated previously [2]. The function is point symmetric at x = K -recessive model is mirror image of dominant. Also for x tending to infinity, it approaches -as expected -the function for the over-dominant model of inheritance.
Several conclusions could be drawn from the analysis of the VIF function ( Figure 1). First, the VIF of an additive model is always greater than the one of non-additive models, which we also observed in the analysis of simulated data for uncorrected tests (Table 1). Second, the application of a naive correction by a constant to the results obtained from the non-additive model GWAS can fix the ''average'' type 1 error to the nominal level; however, for several markers, the test will be conservative and for others, it will be liberal. For example, for the dominant model, such a ''correction'' will lead to a liberal test for low frequency SNPs (single nucleotide polymorphisms) and to a conservative test for common SNPs. These results are confirmed by our simulations for constant correction tests (Table 1). Although the correction by a constant generally keeps the type 1 error rate to a pre-defined threshold, this is not true for SNPs in a particular frequency group.

Estimation of VIF parameters
Methods for VIF estimation require knowledge about parameters that describe the population substructure. If these parameters, such as F and K, are not known, estimates can be utilized. Estimation is based on the idea that the distribution of the analysis test statistic should follow x 2 df~1 after correction. Thereby, estimating unknown function parameters is possible by minimizing a chosen error function that indicates the deviation of the observed distribution from the expected one. We used a sum of squared deviations of the ordered corrected statistics (Z 2 ) and the theoretically expected distribution as an error function: Note that only the population parameters F and K should be estimated, whereas N (sample size), M (number of SNPs), and p and x are defined by the data and the analysis model. This method was denoted as VIFGC.

Polynomial GC
We also propose a polynomial GC for non-additive models, which approximates the correction function via an l-degree polynomial of allele frequency p: For estimation of the coefficients a i , we used the same idea as with the estimation of parameters F and K in method VIFGC, that is that the corrected statistic Z 2Ã~Z 2 l(p) should be distributed as x 2 . We denote this method as PGC. Empirically, we decided to use third degree polynomials during the optimization. These two refined strategies, VIFGC and PGC, were compared with the standard GC method of dividing test statistics by a constant l. We used simulated and real data to evaluate the type 1 error and power of the methods. Type 1 error was characterized in three ways: l median , which is the ratio of the observed distribution's median and the expected median (0.455); l regress , which is the regression coefficient between the observed statistic's distribution and the theoretically expected x 2 statistic; and E, which is the proportion of tests with p-value # nominal level. We also characterized the type 1 error in five specific marker allele frequency groups:

Simulation results
Simulation details could be found in the ''Materials and methods section''. Simulation results for type 1 error for 1df tests are presented in Table 1. As discussed above, constant corrected tests have significant deviations from expected values of type 1 error for several allele frequency groups for non-additive models. Unlike the constant correction method, the PGC and VIFGC methods have a type 1 error close to expected values both for all SNPs together and for specific frequency groups.
Simulation results for type 1 error for 2df tests are presented in Table 2. This table shows that constant corrected tests have slightly liberal type 1 error with a behavior similar to the additive model, and the inflation does not depend on allele frequencies.
The method based on 1df VIFGC-corrected tests is strongly conservative (the type 1 error is lower than the nominal level). PGC corrected tests have type 1 error levels close to the nominal level.
The power of different methods is shown in Table 3. It showed that all methods for correction including VIFGC and PGC have optimal power when the correct model (the one used in the simulations) is also used for analysis. As expected, the 2df genotypic test has less power but is robust compared with the model used for simulation.

Application to real data
Real data application on two independent cohorts, namely, Cooperative Health Research in the region of Augsburg (KORA) and Erasmus Rucphen Family (ERF), provided the opportunity to test our methods in situations that are not reflected in our simulation study. In both studies, we analyzed imputed genotypes (expressed as estimated probabilities) and quantitative traits using linear regression methods. While our previous derivations and results concern binary traits, an important previous observation for the GC was that the method can be applied in the framework of quantitative trait regression analysis models as well [11]. It should be noted, that for genome-wide analyses in ERF, mixed model based methods are used [12]; here, however, we use ERF as an example of highly genetically structured population, and analyze it using fixed effects-only model. In KORA, we analyzed uric acid levels, and in ERF, levels of high-density lipoprotein (HDL) (see ''Materials and Methods'' section for more details). Table 4 shows the results of type 1 error analysis in ERF, a family-based study where the association is strongly confounded by the genetic structure if naïve analysis is applied. When an additive model is used without correcting for genetic structure via mixed models, l for HDL is 1.2. For non-additive models, we reproduced the same principal findings that we obtained from simulated data by using these real data. Correction by a constant factor results in a conservative test for some frequency groups and in a liberal test for other frequency groups, whereas VIFGC and PGC corrections yield accurate levels independent of the marker allele frequency.
Type 1 error results in KORA, a population-based study where stratification is minimal, are presented in Table 5. When an additive model is used, we observe that l is only 1.03 for the quantitative trait uric acid. For non-additive models, where the genetic structure is close to absent in this data set, we still reproduced the same principal findings as observed in our simulations and analysis of ERF data.

Discussion
We demonstrated by simulations and the analysis of real data that the proposed GC methods (VIFGC and PGC) could be used for the correction of non-additive test statistics in the context of GWAS assuming different models of inheritance. For additive models, widely used in GWAS, there are two applications of GC methods. Firstly, the inflation coefficient l can be used to correct the test statistic, thereby making an interpretation of p-values statistically valid. Secondly, l serves as an important indicator of goodness of the model used for association analysis. Although no specific threshold is available, as a rule, if the inflation of the test statistics is relatively large, this reflects the fact that the model chosen for analysis poorly accounts for the genetic structure present in the sample. In that case, the analysis model should be revised, e.g., instead of standard linear regression, the use of a such methods as structured association, EIGENSTRAT [13] or mixed models [14][15][16] should be considered. Note that even after the most advanced analysis model is used, some residual inflation may be expected. This residual inflation is usually corrected by the GC, because even minor inflation still can lead to much increased false positive rate in GWAS. For example, at l = 1.05, when the test statistic is not corrected, the x 2 threshold of 29.72 (p-value = 5*10 28 in case the statistic is not inflated) corresponds to pvalue = 1*10 27 , that is the false positive rate is increased by more than two times. This correction is also very important when metaanalysis of multiple GWAS is performed [17] because a small residual inflation, when not corrected, can lead to very large inflation in the final meta-analysis test statistic.
In our examples involving analysis of real phenotypes, the main use of GC in ERF is the use as indicator. Although nominal type I error can be achieved with the GC, GWAS should be performed with mixed models in this population, and GC should be used only to correct residual inflation. Analysis of KORA, which is a carefully designed population-based study with little stratification,  provides a more realistic example of a case when the GC method should be used ''directly''. Most GWAS performed up-to-date have used an additive model, and GC is an essential part of the analysis procedure. Methods for GC for non-additive models are much less developed. This obstructs correct analysis, meta-analysis, and interpretation of the results of such GWAS. Despite weaker development of the methodological base, some works reported interesting findings based on the analysis of non-additive models [18][19][20][21].
In this work, we proposed new and study existing methods of GC for non-additive models. We demonstrated that the VIFGC and the PGC method can be used to correct the results of GWAS obtained by using dominant, recessive, over-dominant (one degree of freedom), and genotypic (two degrees of freedom) models. We show that in general, for 1df models, both VIFGC and PGC perform equally well, whereas for the 2df model, the test based on a 1df VIFGC-corrected statistic results in a conservative test. Thus, for the genotypic model, the PGC correction may be preferred. These methods have a variance of the type 1 error in all frequency groups that is significantly less than that observed when constant correction GC is applied (p,10 220 , see Table S4). It should be noted that for the PGC method we could in principle use an exponential function instead of a polynomial, but using of exponential function restricts the available models to the recessive and dominant only. Using polynomial functions eliminates restrictions for using PGC for other models, such as overdominant and the 2df genotypic ones.
However, the quality of approximation comes at the costs of time whereas the correction of the additive model's results is computationally very simple, the VIFGC and PGC corrections include parameter optimization steps. Still, even PGC that requires optimization of four parameters and uses the data from 2.5 million tests finishes within minutes on a standard PC.
In our methods, we estimated the parameters by using the regression loss function. This loss function is sensitive to possible heavy tails of the distribution (these may reflect real strong association signals). Therefore, we decided to use the lower 95% of the distribution, which is similar to the method suggested in [22]. Other solutions are possible by using different loss functions, which are less sensitive to outliers. Examples include loss functions defined by the sum of absolute deviations or the square root thereof and the difference between obtained and expected medians.
For additive models, the GC inflation factor l is an important indicator of goodness of the model used for association analysis. For recessive, dominant, and over-dominant models, we have demonstrated that the test inflation is always smaller than the inflation for the additive model ( Figure 1). Therefore, we suggest that the use of analytical method that appropriately reflects the genetic structure of the underlying data should be decided based on l for an additive model or based on the maximal value of the VIF function from the non-additive model. For the GC method, an important previous observation was that the method is not specific for the Cochran-Armitage test. Bacanu et. al. [11] have demonstrated earlier that the GC method can be applied in the framework of quantitative trait regression analysis models as well, including models with gene-gene interaction. We confirmed this principle by analyzing real quantitative trait in two different populations.
All methods developed and tested in this work were implemented in R language in the GenABEL-package [23], part of the GenABEL project for statistical genomics (http://www.genabel.org).
In summary, we proposed and tested several methods for GC for various models of inheritance and compared these methods by using real and simulated data. We demonstrated that the VIFGC and the PGC method can be successfully used in adjusting the test statistic for different non-additive models in the framework of GWAS.

ERF study
Simulations were based on real genetic data collected in the framework of the Erasmus Rucphen Family (ERF) study. All study protocols were approved by the Medical Ethics Committee of Erasmus University, and all participants gave written informed consent in accordance with the Declaration of Helsinki. The ERF study is a cross-sectional study embedded in genetically isolated population located in southwest Netherlands. The study participants are members of a single large pedigree that can be traced in 23 generations and contains thousands of cycles [24]. The sample used for simulations included 3,235 people, for whom the genotypes of 54,000 SNP markers were available. All SNPs included had a coding allele frequency (CAF) 0.05#CAF (coded allele frequency) #0.95 and a call rate $0.95.
In the example where real phenotypic data was used, we analyzed levels of high-density lipoprotein (HDL) on imputed genotypic data. This data set included 2,699 people, who were genotyped and imputed (using HapMap2 as reference panel) at 2,093,818 SNP markers. All SNPs in the subset had 0.05#CAF#0.95 and a call rate $0.95. More detailed description of phenotyping and sample can be found in [25].

KORA study
As an example of a carefully designed population-based study, we used KORA (Cooperative Health Research in the region of Augsburg) F4, which is a study from the KORA cohorts [26]. KORA F4 is a follow-up (from 2006 to 2008) of the  [27].

Simulations
The phenotypes for analysis of type 1 error and the power were simulated based on real genetic data from the ERF study by using the scheme described below. We used binary traits for simulations because in our own derivation of VIF as well as in previous derivations, binary traits were used in the same way as demonstrated in [2]. Results can be generalized to quantitative traits as well [11].
Liability values were simulated as a sum of independent quantitative trait loci (QTLs) and polygenic effects. The heritability coefficient was set to be equal to a random number coming from a uniform distribution bounded by 0.5 and 0.8. To model the QTL effect, an SNP was randomly chosen. Based on its minor allele frequency (MAF), the effect was assigned in a way that the SNP was accounted for 0% of total liability variance for type 1 error and 0.35% for power simulations. To model the polygenic effect, 500 markers were randomly chosen (excluding the chromosome harboring the QTL), and based on their allele frequencies, effects were assigned in such way that each of the SNPs explained the same fraction of non-QTL heritability. The quantitative phenotype was transformed into a binary trait following a threshold model (the ''case'' phenotype was assigned if liability was below the threshold corresponding to 1/3 of the distribution). To study type 1 error, 1,000 simulation cycles were performed. To study the power, 100 simulation cycles were performed.

Association analysis
For the analysis of simulated and real data, we used standard tests implemented in the GWFGLS (genome-wide feasible generalized least squares) function of the MixABEL package, which is a part of the GenABEL suite of programs [23] for statistical genomics (option ''score,'' so the output from GWFGLS for binary traits was completely the same as for Cochran-Armitage trend test of chi-square for binary traits). GWAS were calculated for five different (additive, dominant, recessive, over-dominant, and genotypic) models of SNP effect.
For quantitative trait analysis, we used regression and score test as implemented in MixABEL. For the analysis of imputed data, the regression was performed onto probabilities. GWAS results were corrected using different methods for GC. The standard method, which corrects test statistic by dividing it by the estimated l, was applied as well as two refined methods. The qualities of the GC correction methods were compared with one another in terms of type 1 error with three characteristics: l median : ratio of distribution's median and expected median (0.455) l regress : regression coefficient between statistic's distribution and theoretically expected Chi-square statistic E: proportion of the tests with p-value less then declared level (0.05).
In addition to the comparison of the performance for all SNPs, five allele frequency groups were compared separately:

Supporting Information
Note S1 Derivation of VIF for non-additive models. (DOC)