Meta-GWAS Accuracy and Power (MetaGAP) Calculator Shows that Hiding Heritability Is Partially Due to Imperfect Genetic Correlations across Studies

Large-scale genome-wide association results are typically obtained from a fixed-effects meta-analysis of GWAS summary statistics from multiple studies spanning different regions and/or time periods. This approach averages the estimated effects of genetic variants across studies. When genetic effects are heterogeneous across studies, the statistical power of a GWAS and the predictive accuracy of polygenic scores are attenuated, contributing to the so-called ‘missing heritability’. Here, we describe the online Meta-GWAS Accuracy and Power (MetaGAP) calculator (available at www.devlaming.eu), which quantifies this attenuation based on a novel multi-study framework. By means of simulation studies, we show that under a wide range of genetic architectures, the statistical power and predictive accuracy provided by this calculator are accurate. We compare the predictions from the MetaGAP calculator with actual results obtained in the GWAS literature. Specifically, we use genomic-relatedness-matrix restricted maximum likelihood to estimate the SNP heritability and cross-study genetic correlation of height, BMI, years of education, and self-rated health in three large samples. These estimates are used as input parameters for the MetaGAP calculator. Results from the calculator suggest that cross-study heterogeneity has led to attenuation of statistical power and predictive accuracy in recent large-scale GWAS efforts on these traits (e.g., for years of education, we estimate a relative loss of 51–62% in the number of genome-wide significant loci and a relative loss in polygenic score $R^2$ of 36–38%). Hence, cross-study heterogeneity contributes to the missing heritability.

S1 Simulations. Assessment of the accuracy of the MetaGAP calculator.
Using five simulation studies, we assess the accuracy of the MetaGAP calculator, which is based on the expressions for GWAS power and PGS $R^2$ derived in S1 Derivations and S2 Derivations. Since the calculator is based on specific assumptions regarding the data-generating process, an important question is whether the calculator still provides accurate predictions of power and $R^2$ when the underlying assumptions are violated.
Hence, each simulation study has a different underlying data-generating process. The first study, Simulation 1, assumes that rare variants have larger effects than common variants to such an extent that each causal SNP, regardless of allele frequency, is expected to have the same $R^2$ with respect to the phenotype (Assumption 5 in S1 Derivations). This simulation is entirely in line with the assumptions underlying the MetaGAP calculator. In the second study, Simulation 2, common variants have effects of the same magnitude as rare variants (leading a common causal variant to explain a larger proportion of the phenotypic variation than a rare causal variant). The third study, Simulation 3, also allows for differential $R^2$ between SNPs and, in addition, does not assume that SNP allele frequencies are uniformly distributed. Instead, the third study assumes that there are more variants in the lower minor allele frequency bins than in the higher minor allele frequency bins. In addition to the deviations from assumptions made in Simulations 2 and 3, Simulation 4 allows allele frequencies to be completely independent across studies. Finally, in Simulation 5, we go back to a data-generating process in line with the assumptions underlying the MetaGAP calculator, with one important difference: in Simulation 5, the genetic correlation as inferred at the genome-wide level is shaped not only by the correlation of SNP effects, but also by the degree of overlap of causal loci across studies. Thereby, Simulation 5 violates the assumption discussed in S1 Note, that the estimated CGR is shaped only by imperfect correlations of SNP effects across studies.
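The contrast between Simulations 1 and 2 can be illustrated with a minimal sketch. It assumes Hardy-Weinberg genotype variance $2f(1-f)$ and a phenotype with unit variance, so that the variance explained by a SNP equals its $R^2$; the function names and the example frequencies are hypothetical and are not taken from the original simulations.

```python
import numpy as np

def genotype_variance(freq):
    """Variance of a 0/1/2 genotype count under Hardy-Weinberg equilibrium."""
    return 2.0 * freq * (1.0 - freq)

def expected_r2_equal_r2(freq, per_snp_r2):
    """Simulation 1 style (assumed form): the variance of the per-allele
    effect scales with 1 / (2f(1-f)), so the expected R^2 is the same
    for every allele frequency."""
    raw_effect_var = per_snp_r2 / genotype_variance(freq)
    return genotype_variance(freq) * raw_effect_var

def expected_r2_equal_effects(freq, raw_effect_var):
    """Simulation 2 style (assumed form): per-allele effects have the same
    variance at every frequency, so the expected R^2 grows with 2f(1-f)."""
    return genotype_variance(freq) * raw_effect_var

# Hypothetical allele frequencies: rare, low-frequency, common.
freqs = np.array([0.01, 0.10, 0.50])
print(expected_r2_equal_r2(freqs, 1e-3))       # constant across frequencies
print(expected_r2_equal_effects(freqs, 1e-3))  # increases with frequency
```

Under the first scheme a rare variant needs a larger per-allele effect to reach the same $R^2$; under the second, a common variant explains more phenotypic variation than a rare one, as described above.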
For each simulation study there are 100 independent runs. In each run, data are simulated for C = 3 distinct discovery samples as well as a fourth sample used as hold-out sample for prediction. The sample sizes of the respective studies are $N_1$ = 20,000, $N_2$ = 15,000, $N_3$ = 10,000, and $N_4$ = 1,000, where $N_4$ denotes the sample size of the hold-out sample. For Simulations 1-4, an 11×11 grid of equispaced values of $h^2_{\mathrm{SNP}}$ ∈ [0, 1] and $\rho_\beta$ is considered; for Simulation 5, the grid is over $s$ and $\rho_\beta$ (the axes of Fig. A2). Here, $s$ denotes the fraction of causal SNPs that overlaps across studies and $\rho_\beta$ the cross-study correlation of the effects of SNPs that are overlapping. In Simulations 1-4 we have $s$ = 1, and in Simulation 5 we have $h^2_{\mathrm{SNP}}$ = 0.5. In all simulations there are S = 100,000 independent SNPs, of which M = 1,000 have a causal influence. Moreover, when computing theoretical power and predictive accuracy, in line with S1 Note, we use $\rho_G = s \cdot \rho_\beta$ as the value of the input parameter CGR. A detailed description of the data-generating process in each simulation study can be found in Table A1.
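As a minimal sketch of the parameter grid, the snippet below builds 11 equispaced values on [0, 1] and computes the CGR input as the product of the overlap fraction $s$ and the effect correlation $\rho_\beta$, following S1 Note. The variable and function names are hypothetical, and the choice of [0, 1] as the range for both $s$ and $\rho_\beta$ is an assumption.

```python
import numpy as np

# 11 equispaced values: 0.0, 0.1, ..., 1.0
grid = np.linspace(0.0, 1.0, 11)

def cgr_input(s, rho_beta):
    """CGR used as calculator input: overlap fraction times effect correlation."""
    return s * rho_beta

# Simulation 5 style grid (assumed ranges): rows index s, columns index rho_beta.
s_grid, rho_grid = np.meshgrid(grid, grid, indexing="ij")
cgr_sim5 = cgr_input(s_grid, rho_grid)

# Simulations 1-4 style: s = 1, so the CGR equals rho_beta.
cgr_sim1to4 = cgr_input(1.0, grid)
```

For example, half overlap combined with an effect correlation of 0.5 among overlapping loci yields a CGR input of 0.25.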
For every run, data are simulated in accordance with the underlying data-generating process. Next, a GWAS is carried out in each of the three discovery samples. GWAS results are then meta-analyzed using sample-size weighting. The fraction of causal SNPs reaching genome-wide significance in the meta-analysis is the estimate of statistical power per SNP. The squared correlation between the meta-analysis-based PGS for the hold-out sample and the corresponding phenotype is the estimate of the PGS $R^2$.
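The steps of a single run can be sketched as follows. This is a scaled-down illustration under stated assumptions, not the original simulation code: sample sizes, SNP counts, and the allele-frequency range are hypothetical and much smaller than in the text; all causal loci overlap ($s$ = 1); effects are equicorrelated across all studies, including the hold-out sample; and the PGS uses all SNPs weighted by the meta-analysis Z-statistics.

```python
import numpy as np
from statistics import NormalDist

def simulate_run(h2=0.5, rho_beta=0.8, n_disc=(2000, 1500, 1000), n_hold=500,
                 n_snps=1000, n_causal=100, alpha=5e-8, seed=0):
    """One run: simulate data, GWAS per study, meta-analyze, score hold-out."""
    rng = np.random.default_rng(seed)
    sizes = list(n_disc) + [n_hold]
    freq = rng.uniform(0.05, 0.95, size=n_snps)          # assumed frequency range
    causal = rng.choice(n_snps, size=n_causal, replace=False)

    # Shared plus study-specific components give pairwise effect correlation
    # rho_beta; scaling makes the causal SNPs jointly explain h2 in expectation.
    shared = rng.normal(size=(n_causal, 1))
    specific = rng.normal(size=(n_causal, len(sizes)))
    beta = np.sqrt(h2 / n_causal) * (
        np.sqrt(rho_beta) * shared + np.sqrt(1.0 - rho_beta) * specific)

    z_sum = np.zeros(n_snps)
    for c, n in enumerate(sizes):
        g = rng.binomial(2, freq, size=(n, n_snps)).astype(float)
        x = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
        y = x[:, causal] @ beta[:, c] + rng.normal(scale=np.sqrt(1.0 - h2), size=n)
        if c < len(n_disc):
            # Per-SNP GWAS: t-statistic from the sample correlation of SNP and phenotype.
            xs = (x - x.mean(axis=0)) / x.std(axis=0)
            ys = (y - y.mean()) / y.std()
            r = xs.T @ ys / n
            z = r * np.sqrt((n - 2) / (1.0 - r ** 2))
            z_sum += np.sqrt(n) * z                      # sample-size weighting
        else:
            x_hold, y_hold = x, y                        # last sample is the hold-out

    z_meta = z_sum / np.sqrt(sum(n_disc))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)     # two-sided threshold
    power = float(np.mean(np.abs(z_meta[causal]) > z_crit))
    pgs = x_hold @ z_meta
    r2 = float(np.corrcoef(pgs, y_hold)[0, 1] ** 2)
    return power, r2

power, r2 = simulate_run()
print(f"power per causal SNP: {power:.3f}, PGS R^2: {r2:.3f}")
```

Averaging `power` and `r2` over many seeds would mimic the averaging across runs; the numbers from this toy setup are not comparable to those reported for the full-size simulations.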
Final estimates of power per causal SNP and PGS $R^2$ are obtained by averaging the estimates across the runs. Figs. A1 and A2 report these estimates for Simulations 1-4 and Simulation 5, respectively. In addition, both figures report the power per causal SNP and $R^2$ predicted by the theoretical model, derived under the assumptions discussed in S1 Derivations. Inspection of Fig. A1 shows that there is no qualitative difference between the contour plots. Moreover, when computing the root-mean-square error (RMSE) between the theoretical predictions and the simulation-based estimates of power and $R^2$, even for the most extreme departures from our assumptions regarding allele frequencies and effect sizes (Simulations 3-4), the RMSE in power remains below 3% and the RMSE in $R^2$ of the PGS below 2%. Hence, the theoretical predictions of GWAS power and predictive accuracy, derived under assumptions of equal true $R^2$ of causal SNPs with uniformly distributed allele frequencies that are equal across studies, are robust to violations of these assumptions.
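The RMSE over a grid can be computed as in the sketch below. It assumes that the error is taken over all grid points with equal weight; the function name is hypothetical.

```python
import numpy as np

def rmse(predicted, estimated):
    """Root-mean-square error between two equally shaped grids of values,
    e.g. theoretical predictions and simulation-based estimates."""
    predicted = np.asarray(predicted, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    if predicted.shape != estimated.shape:
        raise ValueError("grids must have the same shape")
    return float(np.sqrt(np.mean((predicted - estimated) ** 2)))
```

With power and $R^2$ expressed as fractions, an RMSE of 0.03 corresponds to the 3% bound mentioned above.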
Inspection of Fig. A2 shows that when CGRs are being shaped by a combination of poor overlap and poorly correlated effects of overlapping loci, there are some qualitative differences between predicted power and predictive accuracy compared to simulation-based estimates. However, the RMSE of theoretical power is only 1.2% with respect to the power estimated from simulations. Similarly, the RMSE of theoretical predictive accuracy is only 1.3%. Hence, the quantitative differences are small.
Simulation 5 shows that when low CGRs are induced by poor overlap of causal loci across studies rather than by low correlations of the effects of overlapping loci, our theoretical predictions are biased slightly downward (i.e., our theory is conservative). Hence, we argue that if our calculator deems a study design well powered, the analyses will be well powered, potentially even more so than our theory predicts (e.g., if some of the imperfect [...])

Figure A1. Power and polygenic score $R^2$ contour plots with, in each plot, SNP heritability on the x-axis and cross-study genetic correlation on the y-axis. The first row shows predictions from the theoretical model. Subsequent rows show estimates based on the respective simulation studies. The first column shows power per causal SNP. The second column shows the $R^2$ of a polygenic score in a hold-out sample. Above each plot, the root-mean-square error (RMSE) is reported for the difference between predictions from the theoretical model and the simulation-based estimates.

Figure A2. Power and polygenic score $R^2$ contour plots with, in each plot, the fraction of causal loci that overlaps across studies on the x-axis and the cross-study correlation of the effects of overlapping loci on the y-axis. The first row shows predictions from the theoretical model. The second row shows estimates based on a simulation study. The first column shows power per causal SNP. The second column shows the $R^2$ of a polygenic score in a hold-out sample. Above each plot, the root-mean-square error (RMSE) is reported for the difference between predictions from the theoretical model and the simulation-based estimates.