Interpreting Meta-Analyses of Genome-Wide Association Studies

Meta-analysis is an increasingly popular tool for combining multiple genome-wide association studies in a single analysis to identify associations with small effect sizes. The effect sizes between studies in a meta-analysis may differ, and these differences, or heterogeneity, can be caused by many factors. If heterogeneity is observed in the results of a meta-analysis, interpreting its cause is important because the correct interpretation can lead to a better understanding of the disease and a more effective design of a replication study. However, interpreting heterogeneous results is difficult. The standard approach of examining the association p-values of the studies does not effectively predict whether the effect exists in each study. In this paper, we propose a framework facilitating the interpretation of the results of a meta-analysis. Our framework is based on a new statistic representing the posterior probability that the effect exists in each study, which is estimated utilizing cross-study information. Simulations and application to real data show that our framework can effectively segregate the studies predicted to have an effect, the studies predicted not to have an effect, and the ambiguous studies that are underpowered. In addition to aiding interpretation, the new framework also allows us to develop a new association testing procedure that takes into account the existence of an effect in each study.

We use the fact that the product of two Gaussian probability density functions is itself proportional to a single Gaussian [1,2]:
$$\mathcal{N}(x; a, A)\,\mathcal{N}(x; b, B) = \mathcal{N}(a; b, A + B)\,\mathcal{N}(x; c, C),$$
where $C = (A^{-1} + B^{-1})^{-1}$ and $c = C(A^{-1}a + B^{-1}b)$. That is, the combined precision $C^{-1}$ is the sum of the individual precisions, where the precision is the inverse variance. This result can easily be generalized to more than two Gaussians. Also, since a normal distribution is symmetric, $\mathcal{N}(a; b, V) = \mathcal{N}(b; a, V)$.
Using these two facts, given $\mu \sim \mathcal{N}(0, \sigma^2)$, $\mu$ can be integrated out analytically:
$$\int \prod_{i} \mathcal{N}(x_i; \mu, V_i)\,\mathcal{N}(\mu; 0, \sigma^2)\,d\mu = K\,\mathcal{N}(\bar{x}; 0, \bar{V} + \sigma^2),$$
where $\bar{V} = \big(\sum_i V_i^{-1}\big)^{-1}$ and $\bar{x} = \bar{V}\sum_i V_i^{-1} x_i$ are the combined variance and the precision-weighted mean, and $K$ is the factor, independent of $\mu$, left over from collapsing the product of Gaussians into a single Gaussian in $\mu$. The summations are all with respect to $i \in t_1$.
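These Gaussian identities can be checked numerically. The sketch below (plain Python; the `norm_pdf` helper is ours, not from the paper) verifies that the product of two Gaussian densities in $x$ equals the scaling factor $\mathcal{N}(a; b, A+B)$ times a single Gaussian whose precision is the sum of the two precisions.

```python
import math

def norm_pdf(x, mean, var):
    """Density of N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Product of two Gaussian densities in x equals the scaling factor
# N(a; b, A + B) times a single Gaussian N(x; c, C), where the
# precisions (inverse variances) add: 1/C = 1/A + 1/B.
a, A = 1.0, 2.0
b, B = -0.5, 0.5
x = 0.3

C = 1.0 / (1.0 / A + 1.0 / B)   # combined variance (precisions add)
c = C * (a / A + b / B)         # precision-weighted mean
lhs = norm_pdf(x, a, A) * norm_pdf(x, b, B)
rhs = norm_pdf(a, b, A + B) * norm_pdf(x, c, C)
assert abs(lhs - rhs) < 1e-12
```

The same check also exercises the symmetry fact, since `norm_pdf(a, b, A + B)` equals `norm_pdf(b, a, A + B)`.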
Text S2 P-value estimation using importance sampling for binary effects model

We suggest an importance sampling procedure for estimating the p-value of the binary effects model.
Importance sampling is a statistical technique for reducing the variance of the estimate by sampling from a distribution different from the distribution of interest [3].
In an importance sampling procedure for obtaining a p-value, the sampling distribution is chosen so that the sampled statistic can easily exceed the threshold of the observed statistic S. For a one-dimensional distribution, this goal is usually achieved by choosing a sampling distribution centered at or around S. However, we have a complication: the sampling space is N-dimensional, where N is the number of studies in the meta-analysis. Different regions spread over this N-dimensional space can give the same statistic. Therefore, simply moving the center of the original sampling distribution from zero to S will not work; the importance sampling procedure should effectively traverse all such regions. The idea we suggest to achieve this goal is to first sample the number of studies having an effect and then sample from the corresponding region, as shown in the following.
First, we sample the number of studies having an effect, $N_E$. We assume a uniform distribution over $1, \ldots, N$. Next, we sample $N$ z-scores from the following mixture of normal distributions:
$$f_{N_E}(z) = \frac{N_E}{N}\,\mathcal{N}\!\left(z;\ \frac{S}{\sqrt{N_E}},\ 1\right) + \left(1 - \frac{N_E}{N}\right)\mathcal{N}(z; 0, 1),$$
where $S$ is the observed binary effects model statistic. A z-score vector whose $N_E$ non-zero elements all equal the mean value $S/\sqrt{N_E}$ gives exactly the statistic $S$. Therefore, the use of this mean value will lead us to a region that gives a similar statistic to $S$.
If the sample size is different between studies, we use the following sampling distribution instead.
Given $N_E$, there are $\binom{N}{N_E}$ possible combinations of $N_E$ studies. Consider a single combination among them and let $M$ be the set of indices of the $N_E$ studies chosen to have an effect. Given $M$, a z-score vector that will exactly produce the statistic $S$ has elements
$$\beta_i = \frac{S\sqrt{w_i}}{\sqrt{\sum_{j \in M} w_j}}\, I(i \in M),$$
where $w_i$ is the weight of study $i$ reflecting its sample size and $I$ is an indicator function. Note that in this vector only $N_E$ elements are non-zero. We iterate over all $\binom{N}{N_E}$ combinations and calculate the mean $\beta^*$ and variance $V^*$ of all $N_E \cdot \binom{N}{N_E}$ possible non-zero elements. The new sampling distribution is then
$$f_{N_E}(z) = \frac{N_E}{N}\,\mathcal{N}(z;\ \beta^*,\ 1 + V^*) + \left(1 - \frac{N_E}{N}\right)\mathcal{N}(z; 0, 1),$$
taking into account the variation caused by unequal sample sizes. If $\binom{N}{N_E}$ is too large for an exact calculation, we randomly sample 1,000 combinations and estimate $\beta^*$ and $V^*$.
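The enumeration over combinations can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and we assume the exact z-vector for a chosen set $M$ places $S\sqrt{w_i}/\sqrt{\sum_{j\in M} w_j}$ on each study $i \in M$, a weight form that reduces to $S/\sqrt{N_E}$ when all weights are equal.

```python
import itertools
import math
import random

def beta_star_var_star(S, weights, n_effect, max_combos=1000, rng=None):
    """Mean (beta*) and variance (V*) of the non-zero elements over all
    (or, if too many, randomly sampled) choices of the N_E effect studies.

    The non-zero element for study i in a chosen set M is taken to be
    S * sqrt(w_i) / sqrt(sum of w_j for j in M) -- an assumed weight form
    that reduces to S / sqrt(N_E) when all weights are equal.
    """
    rng = rng or random.Random(0)
    N = len(weights)
    if math.comb(N, n_effect) <= max_combos:
        combos = itertools.combinations(range(N), n_effect)  # exact enumeration
    else:
        # too many combinations: estimate from random subsets instead
        combos = (tuple(rng.sample(range(N), n_effect)) for _ in range(max_combos))
    elements = []
    for M in combos:
        denom = math.sqrt(sum(weights[j] for j in M))
        elements.extend(S * math.sqrt(weights[i]) / denom for i in M)
    beta_star = sum(elements) / len(elements)
    var_star = sum((e - beta_star) ** 2 for e in elements) / len(elements)
    return beta_star, var_star
```

With equal weights every non-zero element is identical, so the variance collapses to zero, matching the equal-sample-size case.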
Now that we have defined $f_{N_E}$ given a sampled value $N_E$, the overall sampling distribution of a vector of z-scores $Z = (z_1, \ldots, z_N)$ is
$$f(Z) = \frac{1}{N} \sum_{N_E = 1}^{N} \prod_{i=1}^{N} f_{N_E}(z_i).$$
Given $B$ sampled z-score vectors $Z^*_1, Z^*_2, \ldots, Z^*_B$ from this generative model, the p-value is estimated as
$$\hat{p} = \frac{2}{B} \sum_{b=1}^{B} I\big(S(Z^*_b) \geq S\big)\, \frac{f_0(Z^*_b)}{f(Z^*_b)},$$
where $f_0(Z) = \frac{1}{(2\pi)^{N/2}} e^{-Z^T Z/2}$ is the original null distribution of z-scores and $S(\cdot)$ denotes the binary effects model statistic given the z-scores. The multiplying factor 2 comes from the fact that the null distribution of $S$ is symmetric.
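The whole procedure can be sketched in Python for the equal-sample-size case. This is an illustrative reconstruction, not the authors' implementation: the function names are ours, the mixture mean $S/\sqrt{N_E}$ is an assumption, and the binary effects statistic is passed in as a function rather than implemented.

```python
import math
import random

def log_norm_pdf(x, mean, var):
    """Log-density of N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def mixture_pdf(z, n_effect, n_studies, s_obs):
    """f_{N_E}(z): with probability N_E/N the study has an effect and its
    z-score is drawn from N(S/sqrt(N_E), 1); otherwise from N(0, 1)."""
    p = n_effect / n_studies
    return (p * math.exp(log_norm_pdf(z, s_obs / math.sqrt(n_effect), 1.0))
            + (1 - p) * math.exp(log_norm_pdf(z, 0.0, 1.0)))

def importance_sampling_pvalue(statistic, s_obs, n_studies, n_samples=10000, seed=0):
    """Estimate the two-sided p-value of an observed statistic s_obs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        n_effect = rng.randint(1, n_studies)  # N_E uniform on 1..N
        z = [rng.gauss(s_obs / math.sqrt(n_effect), 1.0)
             if rng.random() < n_effect / n_studies else rng.gauss(0.0, 1.0)
             for _ in range(n_studies)]
        if statistic(z) >= s_obs:
            # overall sampling density f(Z): average over the N possible N_E values
            f_samp = sum(
                math.prod(mixture_pdf(zi, ne, n_studies, s_obs) for zi in z)
                for ne in range(1, n_studies + 1)
            ) / n_studies
            # original null density f_0(Z): independent standard normals
            f_null = math.exp(sum(log_norm_pdf(zi, 0.0, 1.0) for zi in z))
            total += f_null / f_samp
    # factor 2: the null distribution of the statistic is symmetric
    return 2.0 * total / n_samples
```

As a sanity check, passing the simple equal-weight statistic `lambda z: sum(z) / math.sqrt(len(z))` (a stand-in for the binary effects statistic, which additionally weights by m-values) should recover the analytic two-sided normal p-value, since that statistic is standard normal under the null.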
The fact that the null distribution is symmetric can be shown as follows. We informally describe the reasoning rather than giving a formal proof. Given a point Z giving a binary effects model statistic S, consider moving to the symmetric point about the origin, −Z, in the N-dimensional space. It can easily be shown that the m-values of the studies do not change, because formula (2) (or formula (S1) if we use the approximation) gives the same value. Thus, the statistic corresponding to −Z will simply be −S.
Therefore, if we let $K_S \subset \mathbb{R}^N$ be the set of all points giving a statistic $S$, the region giving the statistic $-S$ will be the symmetric counterpart, $K_{-S} = \{-Z \mid Z \in K_S\}$. Given the fact that the null distribution of z-scores ($f_0(Z)$) is symmetric about the origin, the probability of $S$, $f(S) = \int_{K_S} f_0(X)\,dX$, will be the same as the probability of $-S$, $f(-S) = \int_{K_{-S}} f_0(X)\,dX$. In other words, the probability density function of $S$ is symmetric under the null.

Text S3 Efficient m-value approximation for binary effects model
Since the binary effects model involves sampling, the standard procedure for estimating the m-value is inefficient. We propose an efficient approximation. We first assume a point prior $\pi$ for the prior probability that the effect exists. Then, if we knew the true value of $\mu$, the m-value of study $i$ would simply be
$$m_i = \frac{\pi\,\mathcal{N}(x_i; \mu, V_i)}{\pi\,\mathcal{N}(x_i; \mu, V_i) + (1 - \pi)\,\mathcal{N}(x_i; 0, V_i)}.$$
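With a known $\mu$ and a point prior, this m-value is a one-line posterior computation. A minimal sketch (the helper names are ours):

```python
import math

def norm_pdf(x, mean, var):
    """Density of N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def m_value_known_mu(x_i, v_i, mu, prior=0.5):
    """Posterior probability that study i has an effect, if the true
    effect size mu were known, under a point prior `prior`."""
    p1 = prior * norm_pdf(x_i, mu, v_i)          # effect exists
    p0 = (1 - prior) * norm_pdf(x_i, 0.0, v_i)   # no effect
    return p1 / (p1 + p0)
```

For example, an observed effect far from zero but close to $\mu$ yields an m-value near 1, while an observed effect near zero yields an m-value near 0.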
Instead of assuming a distribution prior on $\mu$, we use an empirical approach that estimates $\mu$ from the other $N - 1$ studies in the meta-analysis. The inverse-variance-weighted estimate is
$$\tilde{\mu}_{(i)} = \frac{\sum_{j \neq i} x_j / V_j}{\sum_{j \neq i} 1 / V_j},$$
and its variance is
$$\tilde{V}_{(i)} = \Big(\sum_{j \neq i} 1 / V_j\Big)^{-1}.$$
Then we set an empirically estimated prior on $\mu$, $\mu \sim \mathcal{N}(\tilde{\mu}_{(i)}, \tilde{V}_{(i)})$. Integrating $\mu$ out, the approximated m-value is
$$m^*_i = \frac{\pi\,\mathcal{N}(x_i; \tilde{\mu}_{(i)}, V_i + \tilde{V}_{(i)})}{\pi\,\mathcal{N}(x_i; \tilde{\mu}_{(i)}, V_i + \tilde{V}_{(i)}) + (1 - \pi)\,\mathcal{N}(x_i; 0, V_i)}.$$
The intuition behind this empirical approach is that by using the other $N - 1$ studies, we obtain a prior on $\mu$ that is independent of $X_i$. In this sense, our approach has similarities to the leave-one-out cross-validation approach commonly used in evaluating statistical prediction methods [3].
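A minimal sketch of this leave-one-out approximation: the prior on $\mu$ for study $i$ is estimated by inverse-variance weighting of the other studies, and $\mu$ is then integrated out via the Gaussian convolution identity. Function and variable names are ours.

```python
import math

def norm_pdf(x, mean, var):
    """Density of N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def m_value_empirical(x, v, i, prior=0.5):
    """Approximate m-value of study i: the prior on mu is estimated from
    the other N-1 studies by inverse-variance weighting."""
    prec = sum(1.0 / v[j] for j in range(len(x)) if j != i)
    mu_loo = sum(x[j] / v[j] for j in range(len(x)) if j != i) / prec
    var_loo = 1.0 / prec
    # integrating N(x_i; mu, v_i) against the N(mu_loo, var_loo) prior
    # gives N(x_i; mu_loo, v_i + var_loo) by the convolution identity
    p1 = prior * norm_pdf(x[i], mu_loo, v[i] + var_loo)     # effect exists
    p0 = (1 - prior) * norm_pdf(x[i], 0.0, v[i])            # no effect
    return p1 / (p1 + p0)
```

If the other studies agree with study $i$, the approximate m-value approaches 1; if study $i$ observes no effect while the others do, it approaches 0.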
The advantage of this empirical approach is that the computation is very efficient. The disadvantage is that the information about $\mu$ in study $i$ is not utilized, since we only use the other $N - 1$ studies. In the case that only study $i$ has an effect, as the sample sizes of the other studies increase, $m^*_i$ asymptotically converges to $\pi$ instead of 1, since the other studies provide no information about $\mu$. For this reason, we use this empirical approach only for the purpose of efficiently computing the binary effects model statistic and p-value. We use $\pi = 0.5$ for all experiments in this paper.