
Conceived and designed the experiments: YG MS. Performed the experiments: YG. Analyzed the data: YG MS. Contributed reagents/materials/analysis tools: YG MS. Wrote the paper: YG MS.

The authors have declared that no competing interests exist.

Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption—specifically, that difficult-to-impute SNPs tend to have larger effects—and assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate—their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from

Genotype imputation is becoming a popular approach to comparing and combining results of multiple association studies that used different SNP genotyping platforms. The basic idea is to exploit the fact that, due to correlation among untyped and typed SNPs, genotypes of untyped SNPs in each study can be inferred (“imputed”) from the genotypes at typed SNPs, often with high accuracy. In this paper, we consider several issues that arise when applying these methods in practice, including factors affecting imputation accuracy, the importance of taking account of imputation uncertainty when testing for association between imputed SNPs and phenotype, how imputation accuracy affects power, and how to combine results across studies when only single-SNP summary data can be shared among research groups.

Ongoing large-scale genetic association studies, in an attempt to identify variants and genes affecting susceptibility to common diseases, are typing hundreds of thousands of SNPs in thousands of individuals, and testing these SNPs for association with phenotypes. Although this is a large number of SNPs, an even larger number of SNPs remain untyped. For example, the International HapMap Project contains genotype data on more than 3 million SNPs

The idea behind these imputation-based approaches is to exploit the fact that untyped SNPs are often correlated with typed SNPs, so genotype data on typed SNPs can be used to indirectly test untyped SNPs for association with phenotypes. Specifically, the approaches in

Testing untyped variants via imputation both increases power to detect associations

Here we focus on several important issues that arise when applying imputation-based methods in practice. These include i) factors affecting imputation accuracy, including choice of reference panel; ii) the effects of imputation accuracy on power to detect associations; iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs.

We find that imputation-based methods can be relatively robust to imputation accuracy: small changes in imputation accuracy produce small changes in power to detect associations, and even when average imputation accuracy is poor (as could occur if the panel is not well-matched to the study sample), imputation-based approaches can still improve power compared with no imputation. Comparing frequentist and Bayesian methods for testing imputed SNPs, we note that ranking SNPs by their

We now describe more formally the imputation-based approaches to association mapping that we take here, and introduce notation for use in later sections. Readers whose primary interest is in our practical findings may wish to skip the

The imputation-based approach we take here is based on using a “prospective” model relating phenotypes to genotypes, and so is appropriate for analyzing phenotypes and genotypes on individuals sampled randomly from a population. We consider its applicability to other designs, such as case-control studies, in the

Let y_{i} denote the phenotype of individual i, and g_{ij} the genotype of individual i at SNP j (coded as 0, 1, or 2 copies of the minor allele), with g_{·j} denoting the vector of genotypes at SNP j across all individuals. For an untyped SNP, g_{·j} will consist of all "missing" observations.

The aim of our imputation-based mapping approach is to assess whether a SNP j is associated with the phenotype y, based on the (possibly entirely unobserved) genotypes g_{·j} at SNP j.

From Expression (1) we see that, for an untyped SNP, the likelihood is a weighted average of the complete data likelihood (i.e. the likelihood that would be obtained if the genotypes at SNP j were observed to be g_{·j}) over all possible values of g_{·j}. The weights, Pr(g_{·j} = g | observed genotype data), are the imputation probabilities for each possible genotype configuration g.

We will compare two different approaches to using this likelihood to test the null hypothesis H_{0}, that SNP j is not associated with the phenotype, against the alternative H_{1}, that it is. The first is a standard likelihood ratio test based on the statistic Λ, the ratio of the likelihood maximised over the parameters of H_{1} to the likelihood maximised over the parameters of H_{0}. Under standard theory, 2log(Λ) has an asymptotic χ^{2} distribution under H_{0}.

The second approach we consider is Bayesian, in which the evidence for H_{1} vs H_{0} is given by the Bayes factor for H_{1} vs H_{0}, defined as the ratio of the likelihoods averaged over prior distributions p_{0}(·) and p_{1}(·) on the parameters of H_{0} and H_{1}. Note that this expression for the Bayes factor bears some resemblance to the likelihood ratio Λ, but whereas in Λ the numerator and denominator are maximised with respect to unknown parameters, in the Bayes factor they are averaged over the priors. Large values of the Bayes factor indicate evidence for H_{1} over H_{0}, whereas small values indicate evidence for H_{0} over H_{1}. Bayes factors have a number of general advantages over p values, some of which we discuss below.

For a typed SNP at which genotypes are observed to be g, the Bayes factor, BF(g), can sometimes be computed analytically (see below). For an untyped SNP the Bayes factor is the weighted average of BF(g) over all possible values for g. Indeed, substituting the likelihood (1) into the numerator of (4) gives

We now give explicit expressions for the likelihood and Bayes factor in the case of a continuous (quantitative) phenotype y. Given genotypes g_{·j} at a SNP j, we assume that phenotype depends on genotype through a linear model with an additive effect a and a dominance effect d, so that the mean of y_{i} is μ + a·g_{ij} + d·1(g_{ij} = 1), with normally distributed residuals; under H_{0}, a = d = 0.

Under this model, the likelihood (2) becomes a product of normal densities, with individual i contributing a term whose mean depends on its genotype: μ_{0} = μ for g_{ij} = 0, μ_{1} = μ + a + d for g_{ij} = 1, and μ_{2} = μ + 2a for g_{ij} = 2.

To compute Bayes factors, we use a prior based on prior D2 from Servin and Stephens, under which the effects a and d are assigned independent zero-mean normal priors with standard deviations σ_{a} and σ_{d}.

Under this prior, the Bayes factor for a SNP with genotypes g, BF(g), can be computed analytically: the expression involves the phenotype vector, the design matrix determined by g, and the prior precisions of the effects, which enter through a 3×3 diagonal matrix whose diagonal elements are the inverse prior variances of the regression coefficients.

To compute the likelihood ratio (3) and Bayes factor (5) we now need two things. First, we need expressions for the imputation probabilities Pr(g_{·j} = g | observed genotype data); second, we need a way to handle the sum over the very large number of possible genotype configurations g.

While there are many possible approaches to predicting unknown genotypes from patterns of LD

In brief, the model we use provides estimates of the conditional distribution Pr(g_{ij} | g_{i·}), where g_{i·} is the vector of observed genotypes in individual i, by modelling each haplotype as a mosaic of a small number of underlying haplotype clusters.

Our first, most striking, finding is that imputing genotypes using parameter estimates obtained by maximising the full likelihood

These results may appear initially counter-intuitive. However, we note that in the available data, only _{C}_{A}_{B}

Besides increasing imputation accuracy at untyped SNPs, fitting the model to

We found that accuracy of imputed genotypes improved with increased number of clusters K and increased number of EM runs E (see the table below).

Number of Clusters | Number of EM runs | Error Rate
K = 10 | E = 5 | 0.073
K = 10 | E = 10 | 0.072
K = 10 | E = 20 | 0.069
K = 20 | E = 5 | 0.068
K = 20 | E = 10 | 0.064
K = 20 | E = 20 | 0.064
K = 30 | E = 5 | 0.064
K = 30 | E = 10 | 0.063
K = 30 | E = 20 | 0.062

We use K to denote the number of clusters and E the number of EM runs.

We compared imputation accuracy with that of the PAC model, used by

One advantage of the cluster-based model over the PAC model is that it can easily deal with the situation where the panel of densely-genotyped individuals is unphased, whereas the PAC model is much harder to adapt to that setting (e.g. IMPUTE assumes that a phased panel is available). Since the HapMap provides many accurately phased individuals, this advantage will not always be important. However, it does make it easier to exploit any denser unphased genotype data that may be available in some cases (e.g. from resequencing of a candidate region). Motivated by this, we examined the effect of having an unphased panel, by performing imputation with the model fit to the

We examined the effect of having a panel that is not well matched to the sample, by using the African (YRI) and Asian (CHB+JPT) HapMap samples as a panel to impute genotypes in the European individuals. We found that using either of these panels individually substantially increased error rates (17% for CHB+JPT, 25% for YRI). However, using a combined panel of CEU+YRI+CHB+JPT gave an error rate only slightly higher than using CEU alone (7.8% for combined panel vs. 6.2–7.3% for CEU). This demonstrates that imputation accuracy can be relatively robust to mismatches between the panel and cohort samples, provided that the panel contains at least some individuals with genetic variation representative of the cohort. For intuition into why this happens, note that to impute an individual's genotypes, the cluster-based model (and, indeed, the PAC model) attempts to explain the individual's observed genotypes using a mosaic of panel haplotypes. This can work provided the panel contains suitable haplotypes, even if the panel also contains a large number of unsuitable haplotypes. Thus, although the issue of panel choice merits further study, using the combined panel should be a helpful strategy when imputing genotypes in a cohort that may not be well represented by a single HapMap analysis panel (e.g., admixed individuals).

The error rates above are average error rates across all SNPs, assuming

Numbers along the line indicate the thresholds that produce the corresponding call rate and error rate; small black points indicate results for intermediate thresholds in increments of 0.02. For example, if we call only those imputed genotypes assigned probability >0.9 of being correct, then approximately 74% of imputed genotypes are called, and of these called genotypes approximately 1% are incorrect.
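The call-rate/error-rate trade-off described above is straightforward to compute from the matrix of posterior genotype probabilities. The sketch below is illustrative (the function name and data layout are our own, not BIMBAM's): genotypes are called only when the most probable genotype exceeds a threshold, and the error rate is computed among the called genotypes.

```python
import numpy as np

def call_rate_error_rate(post, truth, threshold):
    """Call an imputed genotype only when its posterior probability
    exceeds `threshold`; return the fraction of genotypes called and,
    among called genotypes, the fraction that are incorrect."""
    post = np.asarray(post)           # shape (n, 3): P(g=0), P(g=1), P(g=2)
    best = post.argmax(axis=1)        # most probable genotype per individual
    conf = post.max(axis=1)           # posterior probability of that call
    called = conf > threshold
    call_rate = called.mean()
    if not called.any():
        return call_rate, float("nan")
    error_rate = (best[called] != np.asarray(truth)[called]).mean()
    return call_rate, error_rate
```

Sweeping `threshold` from 0 to 1 traces out a curve of the kind described in the caption: higher thresholds call fewer genotypes but make fewer errors among those called.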

The Bayes factor for an untyped SNP (Equation (5)) involves a sum over a very large number of terms, and it is computationally impractical to compute this sum directly. In practice then we must use methods to approximate this Bayes factor. Here we compare three different approaches to making this approximation.

The first approach is the "naive" Monte Carlo estimator, which averages BF(g^{(m)}) over samples g^{(1)},…,g^{(M)} drawn independently and identically from the imputation distribution Pr(g_{·j} | observed genotype data).
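A minimal sketch of this naive estimator, treating BF(·) as a black box and, as a simplification, sampling each individual's genotype independently from its posterior probabilities (the actual model samples whole genotype vectors jointly from the cluster model):

```python
import numpy as np

rng = np.random.default_rng(0)

def bf_naive(bf_of_g, post, M=1000):
    """Naive Monte Carlo estimate of the Bayes factor at an untyped SNP:
    the average of BF(g) over M genotype vectors g sampled from the
    imputation distribution.  `post` has shape (n, 3), one row of
    genotype probabilities per individual."""
    cum = np.asarray(post).cumsum(axis=1)   # per-row CDF over {0, 1, 2}
    total = 0.0
    for _ in range(M):
        u = rng.random(cum.shape[0])
        g = (u[:, None] > cum).sum(axis=1)  # inverse-CDF draw of 0/1/2
        total += bf_of_g(g)
    return total / M
```

Its drawback, as noted in the text, is that it requires M Bayes factor evaluations per SNP, and the resulting estimate can have a large standard error.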

The second approach is based on an importance sampling estimator, in which g^{(1)},…,g^{(M)} are independent samples from an "importance sampling distribution" chosen so that the resulting estimator has smaller variance than BF_{naive}. Our importance sampling function is described in Text S1 (Importance sampling).

The third approach we consider is motivated by the simple idea of replacing unobserved genotypes at SNP j with their posterior mean: each g_{ij} is replaced by its expectation E(g_{ij} | observed genotype data) = Pr(g_{ij} = 1 | ·) + 2Pr(g_{ij} = 2 | ·). This yields a single, analytically computable Bayes factor, BF_{mean}, based on this expected design matrix. BF_{mean} is very quick to compute, reducing the number of Bayes factor evaluations by a factor of M relative to BF_{naive} and BF_{IS}.
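The posterior-mean idea can be sketched as follows. For illustration we use a simplified Bayes factor (centered data, additive effect only, residual variance fixed at 1, and a ~ N(0, σ_{a}²)) rather than the paper's D2 prior, so the formula below is an assumption-laden stand-in for the analytic BF; the point is that a single evaluation on the expected genotypes replaces M sampled evaluations.

```python
import numpy as np

def posterior_mean_genotypes(post):
    """E(g | data) = P(g=1) + 2 P(g=2), computed row-wise."""
    post = np.asarray(post)
    return post[:, 1] + 2.0 * post[:, 2]

def log10_bf_mean(y, post, sigma_a=0.5):
    """log10 Bayes factor from one analytic evaluation on the posterior
    mean genotypes.  Simplified model (NOT the paper's D2 prior):
    y centered, additive effect a ~ N(0, sigma_a^2), residual variance 1.
    Derived from the marginal likelihood ratio
    N(y; 0, I + sigma_a^2 x x^T) / N(y; 0, I)."""
    x = posterior_mean_genotypes(post)
    x = x - x.mean()
    y = np.asarray(y) - np.mean(y)
    c = sigma_a ** 2
    xtx, xty = x @ x, x @ y
    log_bf = -0.5 * np.log1p(c * xtx) + 0.5 * c * xty ** 2 / (1.0 + c * xtx)
    return log_bf / np.log(10.0)
```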

We compared the three approximations by applying them to the genome-scan genotype data on chromosome 22, described above, with phenotypes simulated under both null and alternative hypotheses (see next section for details). Bayes factors were computed by averaging over several values of the prior standard deviations σ_{a} and σ_{d}; for BF_{naive} and BF_{IS} we used M sampled genotype configurations per SNP. Results for BF_{mean} were generally similar to both BF_{IS} and BF_{naive}, with the agreement with BF_{IS} being better (presumably because BF_{IS} has generally smaller standard error than BF_{naive}; see red vertical bars on the figure). Thus BF_{mean} appears to provide an adequate approximation to the Bayes factor, and is more accurate, for a given computational expense, than the sampling-based estimators.

In each case the diagonal blue line is y = x, and red vertical bars indicate log_{10}(BF_{IS} ± 2 standard errors).

To explain why, and under what circumstances, BF_{mean} provides an accurate approximation to the Bayes factor for H_{1} vs H_{0}, note that BF_{mean} is, in fact, the exact Bayes factor for a different alternative hypothesis, in which the mean of y_{i} is a linear function of the posterior mean genotype (Equation 7). This alternative closely resembles H_{1}. The differences between the two are that i) under it the distribution of y_{i} is normal, whereas under H_{1} it is a mixture of three normals; and ii) under it the residual variance of y_{i} is the same for all individuals, whereas under H_{1} the variance is effectively larger for individuals whose genotypes are imputed with less certainty. For the moderate effect sizes (values of σ_{a}) considered here these differences are small, which explains why BF_{mean} generally provides an accurate approximation to the full Bayes factor. For different priors, using substantially larger values of σ_{a}, the approximation may be less accurate.

We note that it would be relatively straightforward to develop an improved approximation to the Bayes factor for H_{1} by modifying BF_{mean} to account for these differences, but BF_{mean} appears adequate for our purposes.

In subsequent sections we use BF_{mean} to approximate the Bayes factor. Motivated by the fact that BF_{mean} is the Bayes factor for testing the modified alternative hypothesis against H_{0}, we will also consider an analogous likelihood ratio statistic, Λ_{mean}, defined to be the likelihood ratio statistic for testing the modified alternative against H_{0}. The statistic Λ_{mean} has several practical advantages over Λ: it can be computed analytically, and p values based on Λ_{mean} can be obtained from standard regression software, since testing via Λ_{mean} is equivalent to the standard F test in a linear regression of phenotype on posterior mean genotypes.
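As a concrete illustration of the regression-software route, here is a sketch of the F statistic from regressing phenotype on posterior mean genotypes, for an additive-only (one degree of freedom) model; the paper's general model also allows a dominance term, which would make this a two-degree-of-freedom F test.

```python
import numpy as np

def lambda_mean_f_test(y, x_mean):
    """F statistic for the simple linear regression of phenotype y on
    posterior mean genotypes x_mean -- the standard test that is
    equivalent to ranking by Lambda_mean in the additive-only case."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x_mean, dtype=float)
    n = len(y)
    x = x - x.mean()
    y = y - y.mean()
    r2 = (x @ y) ** 2 / ((x @ x) * (y @ y))   # squared sample correlation
    return (n - 2) * r2 / (1.0 - r2)
```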

In this section we use both theoretical arguments and simulation experiments to compare and contrast the use of Bayes factors vs likelihood ratio statistics (and their associated p values) for assessing evidence of association at imputed SNPs.

Several authors have pointed out advantages of Bayes factors over p values. One is interpretability: the probability that a SNP with a Bayes factor of 10^{5} is truly associated with the phenotype is the same whether the Bayes factor was computed from a sample of 100 or 10,000 individuals; whereas the probability that a SNP with a p value of 10^{−6} is truly associated with phenotype will be different in these two settings. In addition it is often desirable, and straightforward, to combine (by averaging) Bayes factors computed under different assumptions (e.g. averaging over additive and dominant models for effects, and thus allowing for dominance while maintaining some of the benefits of simpler additive models). However, in terms of the limited question of simply ranking SNPs, the two approaches can coincide: ranking SNPs by likelihood ratio is equivalent to ranking them by a Bayes factor computed under a prior in which the (prior) variance of effect sizes is proportional to the variance of the maximum likelihood estimate of the effect. We show this formally for BF_{mean} and Λ_{mean} for the normal linear model. However, we expect that the result will hold more generally; see below for further discussion.

We now provide a practical interpretation of these results for typed and untyped SNPs.

For typed SNPs, the variance of the MLE is approximately proportional to 1/(MAF(1−MAF)). Thus ranking typed SNPs by p values (or by Λ or Λ_{mean}) can be thought of as making the implicit assumption that effect sizes tend to be larger for SNPs with a small MAF, and specifically that the expected square of the effect size is proportional to 1/(MAF(1−MAF)).

For untyped SNPs, the variance of the MLE depends not only on the MAF, but also on the confidence with which the untyped SNP genotypes are imputed: the larger the uncertainty in imputed genotypes, the larger the variance of the MLE. Thus, in addition to the assumption for typed SNPs, ranking untyped (imputed) SNPs by p values (or by Λ or Λ_{mean}) can be thought of as making the implicit assumption that difficult-to-impute SNPs tend to have larger effect sizes. This assumption seems unnatural, and in the simulations below Λ and Λ_{mean} show slightly poorer performance than Bayes factors that do not make this assumption.

To complement the above theoretical arguments, we compared methods empirically, using the real chromosome 22 genotype data described above and simulated phenotype data. To make the comparison as focussed as possible we compare results from the likelihood ratio statistics, Λ and Λ_{mean}, with the Bayes factor BF_{mean} based on the implicit prior assumed by the likelihood ratio statistic. That is, we used a prior in which the expected square of the effect size is proportional to 1/(MAF(1−MAF)), with σ_{d} = 0 (no dominance).

The key point here is that the priors for the Bayesian approach were chosen so that, under the theory outlined above, the Bayesian and frequentist test statistics will provide the same rankings for typed SNPs. Thus any difference between the methods in ranking SNPs must be due to differences in the treatment of untyped imputed SNPs.

We simulated phenotype data under two scenarios. The first scenario makes the assumption (also made by the testing methods) that the additive effect at the causal SNP depends on MAF

We applied imputation-based association mapping to each dataset, using the HapMap CEU individuals as a panel, and computing test statistics (BF_{mean}, Λ and Λ_{mean}) for each typed and untyped SNP. To examine effects of lower average imputation accuracy (and confidence) we repeated this experiment using the YRI individuals as a panel. In each case the causal variants were assumed to be untyped (i.e., the genotypes at the causal variant were assumed to be unknown, and were imputed along with all other untyped SNPs). To compute Λ we used numerical maximisation (Nelder-Mead) to maximise the numerator, with initial parameter values obtained from maximising the alternative likelihood (12). The computation for one simulated phenotype (34083 test statistics) took 20 seconds for BF_{mean}, 229 seconds for Λ, and 10 seconds for Λ_{mean}, after imputed genotype probabilities had been computed.

The accompanying figure shows the relationship between BF_{mean} and Λ, Λ_{mean}, and how they depend on imputation confidence. (The relationships with Λ and Λ_{mean} are qualitatively similar, but the relationship with Λ_{mean} is cleaner because of the mathematical relationship between BF_{mean} and Λ_{mean} given in the supplementary text.) Under the null hypothesis, most Bayes factors are near 1 (log_{10}(BF_{mean}) close to 0), with the variance about 1 being smallest for those SNPs with the lowest confidence genotypes. This reflects the fact that these difficult-to-impute SNPs tend to be relatively uninformative regarding the null vs alternative hypotheses. In contrast, by design, the likelihood ratio statistics have approximately the same distribution (under the null) for both poorly-imputed and well-imputed SNPs. The practical effect of this is that, when testing difficult-to-impute SNPs, the Bayes factor is less likely to take a large value by chance than is the likelihood ratio statistic, and so difficult-to-impute SNPs are less likely to produce false positives by chance when using the Bayes factor than using the likelihood ratio statistic.

Each point represents one SNP-phenotype combination, colored according to average confidence in imputed genotypes. Specifically, the SNPs were colored according to the value of

To quantify the practical impact of this difference between the approaches, we compared the power of the BF and LR to distinguish true positive vs false positive signals, and compared both these approaches with not performing imputation. Since large test statistic values tend to cluster together (due to LD among nearby SNPs), and since we want to compare methods that are testing different sets of SNPs (imputation vs no imputation), we took the approach of defining regions, rather than individual SNPs, as the unit of “discovery”. Specifically, we divided the chromosome into adjoining 200 kb regions, chosen in such a way that one region is centered on the causal variant, and scored each region as a “discovery” if the largest Bayes factor (or the largest value of Λ) in that region exceeded some threshold (
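The region-based scoring described above can be sketched as follows (the function is our own illustration; the 200 kb width and the alignment of one region on the causal variant follow the text):

```python
import numpy as np

def score_regions(positions, stats, causal_pos, threshold, width=200_000):
    """Divide a chromosome into adjoining `width`-bp regions, aligned so
    that one region is centered on the causal variant, and score each
    region as a 'discovery' if its largest test statistic exceeds
    `threshold`.  Returns (true_discoveries, false_discoveries)."""
    positions = np.asarray(positions)
    stats = np.asarray(stats)
    # shift coordinates so the causal variant sits at a region center
    offset = (causal_pos - width // 2) % width
    region = (positions - offset) // width
    causal_region = (causal_pos - offset) // width
    true_disc = false_disc = 0
    for r in np.unique(region):
        if stats[region == r].max() > threshold:
            if r == causal_region:
                true_disc += 1
            else:
                false_disc += 1
    return true_disc, false_disc
```

Varying `threshold` then traces out the trade-off between true and false discoveries shown in the figures.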

The results show that the Bayes factor performs as well as, or better than, both likelihood ratio statistics, with Λ performing slightly better than Λ_{mean}, presumably because the approximate model on which Λ_{mean} is based is a worse approximation for harder-to-impute SNPs.

Each line shows the trade-off between true and false discoveries when using Bayes factors (black lines) or likelihood ratio test statistics Λ (red solid lines) and Λ_{mean} (red dashed lines), as the threshold for declaring an association is varied. In each setting the Bayes factor produces as good, or better, performance than the likelihood ratio test (black lines are above the corresponding red lines). Best performance is obtained using the CEU panel, which is well-matched to the sample and produces a low imputation error rate (left; imputation error rate 6.2%). Larger increases in imputation error rate, obtained when using the YRI panel that is not well-matched to the sample, produce a notable reduction in performance (right; imputation error rate 25%). However, even with a high imputation error rate, using the Bayes factor as a test statistic gives better results than no imputation (blue dotted lines in both panels).

In summary, when testing imputed variants for association with a normal phenotype, we found that using a Bayes factor to rank SNPs by strength of evidence for association gave better performance than using the likelihood ratio statistics. Theoretical arguments suggest that this gain in performance may be explained by viewing the likelihood ratio statistics as making an implicit, and unnatural, assumption: that difficult-to-impute variants tend to have larger effect sizes. Consistent with this, we found that the quantitative difference in performance depends on the fraction of difficult-to-impute SNPs, and is larger when many SNPs have low-confidence imputed genotypes. Thus, in practice, the difference between the methods might have greater importance in imputation applications involving harder-to-impute rare variants (e.g. where the panel data are resequencing data, rather than genotype data; a scenario that may become more common as sequencing technology improves).

It is natural to ask whether the advantage of the Bayes factor over the likelihood ratio statistics in this setting (of normal phenotype) also transfers to other settings, and to other frequentist test statistics. In fact we believe there are several reasons to expect it will apply quite generally. For example, regarding the transfer to settings other than normal phenotypes, the theoretical results from

Regarding the extension to other frequentist test statistics, we note that many test statistics in common use share with Λ and Λ_{mean} the property of having a fixed distribution under the null, whatever the imputation confidence. It is this property that prevents Λ and Λ_{mean} from responding to differences in informativeness among SNPs, and in particular it causes them occasionally to produce, by chance, high ranks for difficult-to-impute SNPs. We therefore expect other test statistics that have this feature of a fixed null distribution (which would include most conventional frequentist test statistics used for single-SNP analyses, including chi-square statistics for case-control studies and commonly used score tests) to behave similarly.

We emphasise that we do not view the gain in power discussed here as the only, or indeed the strongest, argument for applying Bayesian methods: we have focussed on it because it is an issue specific to imputation analyses, and one that has not previously been discussed. We also emphasise that the prior we used here, where effect size depends on minor allele frequency, and with no dominance, was adopted purely to facilitate comparisons with the likelihood ratio statistics. If effect sizes are independent of (or only weakly dependent on) MAF, or if some loci exhibit dominant effects, then increased power should be obtained from a Bayesian analysis that incorporates these factors. A strength of the Bayesian approach is its ability to incorporate this type of information where it is available, and to average over assumptions where good data are not available. For example, since there is little direct data on the relationship between MAF and effect size, there is an argument for averaging Bayes factors obtained under both the independence and dependence assumptions; similarly, there is an argument for averaging Bayes factors over different amounts of dominance, for example by averaging over several values of the prior standard deviation σ_{d}.

Besides comparing the Bayesian and frequentist approaches to inference, the results also quantify the power gained from imputation itself: comparing the curves for BF_{mean} and Λ for the CEU panel with the no-imputation curves shows the benefit of testing untyped SNPs.

While we have focussed here on testing quantitative phenotypes for association with genotypes, all of the key results should be expected to also apply to binary (0/1) phenotypes. Indeed, although the natural way to analyse a binary trait is via a logistic, rather than a linear, regression, for the small effect sizes that are typical in genetic studies the two approaches to analysis might be expected to produce similar results (e.g. see
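The claimed similarity between linear and logistic analyses for small effects is easy to demonstrate by simulation. The sketch below (all parameter values are illustrative assumptions, and the logistic fit is a plain Newton implementation rather than any particular package's) compares the Wald z statistics from the two regressions on the same simulated binary trait:

```python
import numpy as np

rng = np.random.default_rng(1)

def logistic_fit(X, y, iters=25):
    """Logistic regression by Newton's method; returns coefficient
    estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None])              # observed information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    H = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(H)))

# simulate a binary trait with a small additive genotype effect
n = 2000
g = rng.choice([0.0, 1.0, 2.0], size=n, p=[0.49, 0.42, 0.09])
eta = -0.5 + 0.15 * g                            # small effect size
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

# logistic-regression z statistic for the genotype effect
X = np.column_stack([np.ones(n), g])
beta, se = logistic_fit(X, y)
z_logistic = beta[1] / se[1]

# linear-regression z statistic for the same data
gx, yc = g - g.mean(), y - y.mean()
bhat = (gx @ yc) / (gx @ gx)
resid = yc - bhat * gx
z_linear = bhat / np.sqrt((resid @ resid) / (n - 2) / (gx @ gx))
```

With effects of this size the two z statistics agree closely, consistent with the claim in the text.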


In summary, we have addressed a number of practical issues that arise in implementing imputation-based association mapping for a quantitative trait. Key findings include: i) when using the model of

While our findings are based on use of a particular imputation method, in the particular context of association mapping of a quantitative phenotype, all of them (except perhaps i) above) are likely to apply more generally. For example, the PAC model used for imputation in

Although imputation-based analyses involve considerably more computation than simply testing typed SNPs, these analyses are nevertheless now practical with very modest computational resources. For example, using the methods we describe here (and particularly findings i) and iii) above), implemented in the software package BIMBAM, analysing the whole of chromosome 22, in 675 individuals, with

Besides the gain in computational convenience, the effectiveness of Bayes factors based on posterior mean genotypes also has important implications for sharing and combining data across studies. Specifically, we have in mind situations where, for political, ethical, or other reasons, sharing individual-level genotype and phenotype data among investigators working on similar studies is more difficult than sharing summary-level data on each SNP. In these cases, a simple approach is to share and compare Bayes factors (or p values) for each SNP. However, more can be done: BF_{mean} depends on the individual-level data only through the summary quantities X^{t}X, X^{t}Y, and Y^{t}Y (where X is the design matrix built from posterior mean genotypes and Y is the phenotype vector). If each study s shares these quantities for each SNP, then summing them across studies yields X^{t}X, X^{t}Y, and Y^{t}Y for the combined data, from which BF_{mean} for each SNP can be computed on the combined data, exactly as if the individual-level data had been pooled. (The same quantities also determine Λ_{mean}, allowing frequentist inference for the joint data also to be performed without sharing individual-level data.)
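A sketch of this summary-sharing scheme for the frequentist statistic (an additive-only illustration with our own function names; the same pooled quantities would feed an analytic BF_{mean} formula):

```python
import numpy as np

def study_summaries(x_mean, y):
    """Per-SNP quantities a study can share without releasing
    individual-level data: X^tX, X^tY, Y^tY and the sample size, where
    X is the design matrix (intercept plus posterior mean genotypes)."""
    X = np.column_stack([np.ones(len(y)), x_mean])
    return X.T @ X, X.T @ y, y @ y, len(y)

def combined_f_stat(summaries):
    """Pool per-study summaries by simple addition, then compute the
    joint F statistic exactly as if all individual-level data had been
    analysed together."""
    XtX = sum(s[0] for s in summaries)
    XtY = sum(s[1] for s in summaries)
    YtY = sum(s[2] for s in summaries)
    n = sum(s[3] for s in summaries)
    beta = np.linalg.solve(XtX, XtY)
    rss = YtY - beta @ XtY                    # residual SS, full model
    rss0 = YtY - XtY[0] ** 2 / XtX[0, 0]      # residual SS, intercept only
    return (rss0 - rss) / (rss / (n - 2))
```

The invariance to how individuals are split across studies is easy to check: pooling the summaries from two half-studies reproduces the full-data statistic exactly.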

Of course, combining results across studies can present many challenges, including differential inclusion criteria or phenotype definitions; differential genotyping biases, e.g. due to differential DNA quality

The methods described here, like those from

Importance sampling.

(0.06 MB PDF)

Relationship between BF_{mean} and Λ_{mean}.

(0.06 MB PDF)

Laplace method to approximate Bayes factors for Logistic Regression.

(0.06 MB PDF)

We thank Dr R. Krauss for permission to use the PRINCE genotypes, which were collected as part of the Pharmacogenetics and risk of Cardiovascular disease project, funded by U01HL069757 (PI R. Krauss); we thank Dr Paul Ridker, Principal Investigator of the PRINCE trial, for providing access to the patient population and DNA used to produce these genotypes. We thank J. Wakefield for helpful comments and sharing a pre-print containing his results relating Bayes factors and F statistics; X. Wen for results from IMPUTE; and the Computational Institute at the University of Chicago for computing resources. We thank two anonymous reviewers for detailed comments on earlier versions of this manuscript.