Beyond SNP heritability: Polygenicity and discoverability of phenotypes estimated with a univariate Gaussian mixture model

Estimating the polygenicity (proportion of causally associated single nucleotide polymorphisms (SNPs)) and discoverability (effect size variance) of causal SNPs for human traits is currently of considerable interest. SNP-heritability is proportional to the product of these quantities. We present a basic model, using detailed linkage disequilibrium structure from a reference panel of 11 million SNPs, to estimate these quantities from genome-wide association studies (GWAS) summary statistics. We apply the model to diverse phenotypes and validate the implementation with simulations. We find model polygenicities (as a fraction of the reference panel) ranging from ≃ 2 × 10−5 to ≃ 4 × 10−3, with discoverabilities similarly ranging over two orders of magnitude. A power analysis allows us to estimate the proportions of phenotypic variance explained additively by causal SNPs reaching genome-wide significance at current sample sizes, and map out sample sizes required to explain larger portions of additive SNP heritability. The model also allows for estimating residual inflation (or deflation from over-correcting of z-scores), and assessing compatibility of replication and discovery GWAS summary statistics.


Relation to Other Work
There is similarity between our work (originally made available on bioRxiv in May 2017 [20]) and the reports of others. Zhu and Stephens [92] implemented a similar causal mixture distribution, but did not take the possibility of inflation into account. They use a reference panel to estimate LD between SNPs, but restrict it to the specific GWAS SNPs, i.e., the ones for which summary statistics are available. Thus, for height, for example, their causal β values perforce are restricted to the GWAS SNPs (≃1 million), whereas we use a much larger reference panel (≃11 million) and allow the z-scores in the GWAS of interest to arise from any of the reference panel SNPs. They take care to allow for correlated "noise" in the estimated GWAS effect sizes. Their maximization scheme for the likelihood function involves Markov chain Monte Carlo, and is therefore computationally intensive.
The "M2" model of Zhang et al [14] also uses a causal mixture Gaussian similar to ours. They employ a reference panel of ≃1 million SNPs, and allow the causal SNPs to be any of those (thus, in principle, causal SNPs are not necessarily restricted to being GWAS SNPs with zscores); however, for GWASs with summary statistics for SNPs not in the reference panel, they must restrict the z-scores analyzed to be from SNPs in the reference panel.
(In our case, the reference panel is sufficiently large that no such restriction is required.) Their analysis is based on an expression (Eq. (2) in the Supplementary Note of [14], hereinafter "SuppNoteZhang") for the effective β at a focal SNP arising through LD from underlying causal β values at neighboring SNPs that does not take into account the heterozygosity of those neighboring SNPs -H j in Eq. (15), H 1 , .., H w in Eq. (20), and H w in Eq. (26). Their likelihood function (see Eqns. (5) and (7) in SuppNoteZhang, with H = 1) ignores the detailed LD and heterozygosity structure of the focal SNP (each of the potential causal SNPs in LD with a typed SNP is approximated as having LD with the typed SNP given by the mean for all of the tagged SNPs -the LD score of the typed SNP divided by the number of the SNPs in LD -and their heterozygosities are implicitly treated in an average way, arising through the measured standard error of theβ effect size). We describe how to implement SNP heterozygosity and LD structure in a controlled and accurate way using, equivalently, either Fourier methods or a multinomial expansion (where we take w = 20 in our "Model PDF: Multinomial Expansion" subsection, Zhang et al take w = 1: compare Eq. (19) here with Eq. (5) in SuppNoteZhang with H = 1, and Eq. (21) here with Eq. (7) in SuppNoteZhang, again with H = 1; setting w = 1 reduces the correct multinomial expansion to an inaccurate binomial expansion). As a result, the M2 model solutions of Zhang et al do not accurately reflect the capability of the underlying mixture distribution for causal effects; their M3 model, which adds an extra Gaussian, is implemented in the same fashion. The inaccuracy in the M2 model can be seen in the QQ plots (even though they are fairly restricted, with the max value for -log 10 (p) set to 10) for, e.g., height, LDL cholesterol, total cholesterol, years of schooling, Crohn's disease, coronary artery disease, and ulcerative colitis, for which we obtain much better fits.
To test the numerical inaccuracy of using the binomial approximation, in simulations, we set w = 1 for all cases analyzed in Table A in the main paper and estimated the model M2 parameters from the reulting binomial expansion for the posterior distribution for a SNP's z-score, given the SNP's total LD and heterozygosity. That is, our modified cost function was based on Eq. (21) with w = 1. The results, directly comparable with Table A , are shown below in Table F . In all cases there is significant underestimation of the M2 heritability, arising from underestimation of the M2 polygenicity or the M2 discoverability, or both. This establishes numerically the mathematical fact mentioned above that the correct multinomial distribution is not equal to the merely approximate binomial distribution. Table 3 in SuppNoteZhang reports simulations based on "individual level data" (although it is not specified exactly how that was done; in Table 2 they report results from a simulation scheme to generate summary-level association statistics for GWAS without generating individual level data). The closest comparison with our implementation of their M2 model is for the 50k sample size reported in Table 3 SuppNoteZhang where M2 is both the true and fitted model. Although the number of causal SNPs is significantly under-estimated (3.92k vs. 5.35k) and the heritability explained per causal SNP -essentially the discoverability -is significantly over-estimated (7.6 × 10 −5 vs. 5.6 × 10 −5 ), the heritability turns out to be numerically correct (0.3). In Table F , for h 2 = 0.4 and n causal =11k, our estimated number of causal SNPs is only half the true value while the discoverability is a little underestimated (σ 2 β = 1.2 × 10 −4 vs. σ 2 β = 1.7 × 10 −4 ), andĥ 2 = 0.14 vs h 2 = 0.4. The discrepancies between the implementations of the inaccurate M2 model (in [14] and here) might be due to the factor of ten difference in reference panel size, how exactly individual level data simulations were handled in [14], and/or implementation details. We emphasize, however, that with our full correct implementation, our simulation results are in better accord with the simulated truth. For the example given above, we obtain n causal =11k,σ 2 β = 1.6 × 10 −4 andĥ 2 = 0.39, with a correspondingly acurate QQ plot -see the main paper Table  A and S1 Appendix Figure C .
We also applied the M2 model (i.e., setting w = 1 and using the binomial expansion as described above) to several of the real phenotypes, and compared with the results of Zhang et al. Below we use a vee (ˇ) to distinguish numerical quantities estimated from this procedure. We emphasize that we are using a reference panel with n snp = 11.02 million SNPs, whereas Zhang et al use a reference panel with only 1.07 million SNPs. For our larger panel, the mean heterozygosity is H = 0.22, while for the smaller panel it is H = 0.35; this difference itself might contribute to differences in estimated heritability, which is a derived quantity that in the model is proportional to the mean heterozygosity in the reference panel.
For height (2010), there were approximately 1 million z-scores, the same as for Zhang et al (essentially the HapMap 3 panel). We obtainedň causal = 4.6 × 10 −3 , slightly smaller than the value 4.8×10 −3 from our full correct implementation, main paper Table B , but in agreement with the value reported by Zhang et al (see S1 Appendix Table E ). Inflation (or rather deflation) appears to be minor in the case of height 2010:σ 2 0 = 0.94, the same as our value from the full correct implementation, σ 2 0 in main paper Table B . Nonetheless, if we do not correct the estimated heritability or discoverability for inflation -i.e., do not divide by σ 2 0 orσ 2 0 (see Eq. (10)), which Zhang et al do not do ("a" in Eq. (3) in [14] corresponds to our σ 2 0 , but values were not reported in [14]) -our M2 estimate isȟ 2 = 0.15, which is half the value reported by Zhang et al. These heritability estimates correspond to discoverabilities ofσ 2 β = 1.53 × 10 −4 for our implementation of M2 and 1.87 × 10 −4 for Zhang et al (using H = 0.35 in Eq. (38)). Note that our M2 estimate forσ 2 β is only slightly smaller that the value 1.56 × 10 −4 we obtain for the full correct implementation when not rescaling by σ 2 0 -see main paper Table B . It is possible that the larger value 1.87 × 10 −4 for Zhang et al is related to their reference panel being less than a tenth the size of ours.
For schizophrenia, our M2 implementation modeled zscores from 6.29 million SNPs; the implementation of Zhang et al considerably reduced that to 1.07 million z-scores. Our M2 implementation gave 1.04 × 10 5 causal SNPs (versus 3.1 × 10 4 for our correct implementation); Zhang et al report 1.9 × 10 4 causal SNPs. Our M2 inflation measure isσ 2 0 = 1.15, comparable to our value σ 2 0 = 1.14 reported in the main paper Table B . Our M2 liability-scale heritability is 0.24 (uncorrected for inflation, which happens to be the same as for our full model also when not correcting for inflation), while Zhang et al report a value of 0.29 -see S1 Appendix Table E . The corresponding discoverabilities areσ 2 β = 1.93 × 10 −5 for our implementation of M2, smaller than 6.28 × 10 −5 from our full model (again, both values not rescaled for inflation), and 4.36 × 10 −5 for Zhang et al.
In S1 Appendix Table G we provide results for all  phenotypes analyzed here for which M2 results were avail able, or implicit, in [14]. We also report height (2014), for which, as with BMI (2015), there were no M2 estimates in [14] though there were M3 estimates. Our M2 estimate of the number of causal SNPs for major depressive disorder, bipolar disorder, schizophrenia, and education all appear to be inflated. It would appear that the inaccurate M2 implementation is increasingly unreliable for higherpolygenicity/lover-discoverability traits. The disjuncture with our M2 simulation results, where we consistently find an underestimation of heritability arising from either an underestimation of polygenicity or discoverability, or of both (S1 Appendix Table F ), might be due to effects of model misspecification with respect to real phenotypes exacerbated by the inherent inaccuracy of the implementation, particularly when using a very large reference panel. Use of the smaller reference panel in [14] might be fortuitous in this regard.
Mathematically, we have shown that the binomial implementation of the M2 model used by Zhang et al is inaccurate, and we have demonstrated that numerically with simulations. In contrast, simulations with our correct implementation demonstrate much better parameter estimates -see main paper Table A . In both scenarios we used exactly the same simulated GWAS z-scores and the full reference panel of 11 million SNPs. When comparing with Zhang et al, there is the additional matter that they are using a reference panel one tenth the size of ours, and, for many real phenotypes (the situation with schizophrenia is representative) they exclude the vast majority of data (z-scores) from the analysis.
Exploring potential MAF-dependence on causal effect sizes, using only GWAS summary data, will be a focus of future work. Our concern here is to present the basic model, correctly implemented and analyzed. Zeng et al [15] use a related causal distribution for true effects β, but with MAF-dependence. Their approach is based on raw genotype data and requires extensive parallel computing. It is restricted to ≃400k SNPs; with summary statistics, we are working with ≃11 million reference panel SNPs. Below we explore possible model misspecification related to MAF-dependence -see S1 Appendix Table A .

Data Preparation
Total Linkage Disequilibrium Sequentially moving through each chromosome in contiguous blocks of 5,000 SNPs in the reference panel, for each SNP in the block we calculated its Pearson r 2 correlation coefficients (that arise from linkage disequilibrium, LD) with all SNPs in the central bock itself and with all SNPs in the pair of flanking blocks of size up to 25,000 each. For each SNP we calculated its total linkage disequilibrium (TLD), given by the sum of LD r 2 's thresholded such that if r 2 < r 2 min we set that r 2 to zero (r 2 min = 0.05). The fixed window size corresponds on average to a window of ±8 centimorgans. This is deliberatly larger than the 1centimorgan window used to define LD Score [12], because the latter appears to exclude a noticeable part of the LD structure.
For each SNP we also built a histogram giving the numbers of SNPs in w max equally-spaced r 2 -windows covering the range r 2 min ≤ r 2 ≤ 1. These steps were carried out independently for 1000 Genomes phase 3 and for HAPGEN2 (for the latter, we used 1000 simulated samples).
Employing a similar procedure, we also built binary (logical) LD matrices identifying all pairs of SNPs for which LD r 2 > 0.8, a liberal threshold for SNPs being "synonymous".
In applying the model to summary statistics, we calculated histograms of TLD and LD block size (using 100 bins in both cases) and ignoring SNPs whose TLD or block size was so large that their frequency was less than a hundredth of the respective histogram peak; typically this amounted to restricting to SNPs for which TLD ≤ 600 and LD block size ≤ 1, 500. We also ignored summary statistics of SNPs for which MAF ≤ 0.01. For schizophrenia, for example, there were 6,610,991 SNPs with finite z-scores out of the 11,015,833 SNPs from the 1000 Genomes reference panel that underlie the model; the genomic control factor for these SNPs was λ GC = 1.466. Of these, 314,857 were filtered out due to low MAF or very large LD block size. The genomic control factor for the remaining SNPs was λ GC = 1.468; for the pruned subsets, with ≃ 1.49 × 10 6 SNPs each, it was λ = 1.30. (Note that genomic control values for pruned data are always lower than for unpruned data.) Since effect sizes of individual variants on complex phenotypes are tiny, allelic composition of each variant in case and control groups is very similar, resulting in similar LD structure. There will likely be at most only an extremely slight change in the LD structure for different phenotypedefined subgroups within a population. Imprecision in the phenotype-specificity of the reference panel is currently not a limiting factor in GWAS analysis.
Summary of the multiple binning process (1) We bin typed SNPs (SNPs with z-scores) into an H×L grid. Then, for any grid element, (2) we bin the tagged SNPs into fairly fine LD-r 2 bins: the range 0.05-1 is divided up into w=20 or so bins.
In the extreme, we can consider each typed SNP individually, and inquire about how its z-score arises from causal β values distributed among the reference SNPs it is in LD with (the SNPs tagged by the typed SNP).
The two levels of binning ((1) and (2) above) can be finessed until we get converged results for our three model parameters -and of course it not necessary to go all the way to the extreme scenario. As mentioned in the main text, we ensured the binning in (1) and (2) was fine enough to give converged results.
From (1) above, we have, in a given grid element, selected typed SNPs in a narrow range of H and L. Conceptually, think of these SNPs as being just a single typed SNP. An important point is that we look at the reference SNPs our typed SNP is in LD with: we group those into very narrow LD bins (w=20 bins). Our model is that the typed SNP's z-score arises from noise and possibly some causal effects among the SNPs it tags through LD: a causal SNP might be in any one of those latter bins, there might be more than one causal SNP in any one of those bins, and many of the bins might have causal SNPs, and all of these distinct possible scenarios need to be accounted for. k i in Eq. 21 is our random integer variable for the number of causal SNPs in bin i. A second important point is that within bin i, regardless of i, the heterozygosities of the binned reference SNPs will be in a very narrow rangeeffectively, heterozygosity binning of the SNPs tagged by a given typed SNP comes along free with fine-grained LD r 2 binning of those tagged SNPs (again, we're conceptually just dealing here with a single typed SNP whose z-score we are trying to predict). But because all SNPs in bin i have very similar heterozygosities, the heterozygosity of any one of them is necessarily properly accounted for: it is given by H i in Eq. 21, which carries over entirely to the Fourier version of the z-score pdf -see Eq. 26.

t-Statistic for Pearson correlation coefficient
The sample Pearson correlation coefficient,r, and simple linear regression slope,β, for the response and explanatory variables in a data set, are proportional and so have the same t-statistic (and p-value), calculated below.
The true (or population) correlation, r, between two random variables X and Y is for any constants α, β, γ, δ. If one linearly models Y as Y = βX + α, i.e., y i = βx i + α + ǫ i for the i th instantiation where ǫ i is the residual (random error, assumed to follow a zero-centered normal distribution: ǫ i ∼ N (0, σ 2 ǫ )), then, by minimizing the mean squared error, the population (true) value β is estimated as the sample linear regression coefficientβ: For centered sample data, with response variable y and explanatory variable g (both vectors over n samples, with meansȳ =ḡ = 0), one therefore has the sample equation whereǫ is the sample estimated residual. The mean sample response (prediction) for a given value of the explanatory variable is thereforeŷ = gβ.
Now, the variance of the sample slope (as an estimate of the population or true slope) arises through the variation in y i , given g i (the g i are assumed to be correct and constant explanatory variables). Thus, from Eq. 41, where se denotes standard error (dropping the hat as with var). Hence, But from Eq. 42, var(y i |g i ) ≡ var(ǫ) =σ 2 ǫ , hence From Eq. 44, and noting that, as withβ in Eq. 48, var(r) ≡ var(r|g), it immediately follows thatβ/se(β) = r/se(r). So to construct a confidence interval forr, or equivalently forβ, form the t-statistic  Table B . Testing model misspecification: double Gaussian for large and small effects, for twelve single setups (three different heritabilities, each with four different polygenicities -a single instantiation in each case, i.e., not averaged over multiple instantiations). True causal SNPs are a randomly selected fraction π 1 of all reference SNPs, and their β values are randomly drawn from Gaussians with variances σ 2 b and σ 2 c = 10σ 2 b , 90% from the former, 10% from the latter. See Eq 56. π 1ef f and σ 2 βef f are weighted means of the polygenicities and variances for each Gaussian -see text.

Mean Relative Risk
For logistic linear regression coefficient β, the odds ratio for disease is OR = e β ; for a rare disease, this is approximately equal to the genotypic relative risk: GRR ≃ OR. Since E β 2 = σ 2 β , the mean relative risk E [GRR] ≃ 1 + σ 2 β /2. Thus, for schizophrenia for example, the mean relative risk is ≃ 1.00003.

Model Misspecification
Single Gaussian with selection parameter If the true β effects are distributed in a way that is different from the assumed one, how meaningful are the parameters and their estimated values? In principle, this is an openended issue, but an important assessment can nevertheless be made.
Our model assumed that the βs are drawn from a single Gaussian with constant variance σ 2 β , for a fraction π 1 of the reference SNPs: Evidence has been reported indicating that some causal effects are larger for SNPs with lower heterozygosity. This has lead to consideration of where H is the heterozygosity of a causal SNP being assigned the effect β, S is a new parameter, and σ 2 β0 is a constant. S < 0 suggests there is selection pressure. S = −1 is implicit in LD Score regression; our model has S = 0; intermediate values have been reported by others [15].
Here, in a series of additional simulations using HAP-GEN as before with a sample size of 100,000, we set S = −0.5, generated new β values prescribed by Eq. 53, and calculated new GWAS z-scores, for the twelve heritability × polygenicity scenarios we used previously -see the main paper Table A , but this time with just a single instantiation in each case. We then fit our model to the simulated z-score. The estimated model parametersπ 1 ,σ 2 β , andσ 2 0 , along withĥ 2 andn causal , are shown in S1 Appendix Table  A .
The proportion of variance explained by variant i, whose genotype vector (over N samples) is g i , is q 2 i = var(y; g i ) = β 2 i H i , where y is the phenotype vector over the samples. Then, the expected value of this contribution to heritability is h 2 arises from π 1 of the n SNPs. So, not knowing which of the SNPs are causal, h 2 will approximately be given by We expect that the contribution of H S to the variances will be absorbed in an average sense into our single variance estimate,σ 2 β . Let W denote the mean of H S over all causal SNPs. Then σ 2 β ef f ≡ W σ 2 β0 , which is approximately what the model will try to estimate inσ 2 β . As can be seen from S1 Appendix Table A , there is reasonable concordance between the model estimates and the true parameters, with greatest mismatch (factor of 2-3) for the very lowest polygenicity (π 1 = 10 −5 ) scenarios with higher heritabilities (estimating as a mean response from the full set h 2ĥ2 π 1π1 σ 2 βσ 2 βσ 2 0 n causalncausal 0.10 0.03 3.1E-3 4.3E-4 1.6E-5 3.2E-5 1.03 34540 4722 0.40 0.18 3.1E-3 6.5E-4 6.6E-5 1.2E-4 1.10 34718 7112 0.70 0.35 3.1E-3 9.8E-4 1.2E-4 1.5E-4 1.16 34370 10840 Table C . Testing model misspecification for three different true heritabilities, h 2 = 0.1, 0.4, 0.7, for a fixed overall polygenicity, π 1 = 3.1 × 10 −3 . True causal SNPs are a randomly selected with prior probability π 1L , which decreases from a maximum of 0.005 at L = 1 down to 0 at L ≥ 200. π 1 is the fraction of all reference SNPs that are causal; their β values are randomly drawn from a Gaussian with variances σ 2 β . See S1 Appendix Fig K . of variants the actual response arising from a small number of SNPs having large effects). In other words, with pronounced selection pressure (S = −1/2), our model estimates for polygenicity (number of causal SNPs), discoverability, and heritability give a reasonable indication of the underlying polygenic architecture.

Double Gaussian: Large and Small Effects
If the true non-null β effects are distributed with respect to two Gaussians, how well does our single-Gaussian based model perform? It may be the case that the majority of causal effects are distributed with respect to a Gaussian with a particular variance, σ 2 b , while a minority are distributed with respect to a Gaussian with a much larger variance, σ 2 c . This reflects situations where a minority of causal SNPs have much larger effects than the majority of causal SNPs. Accordingly, here we examine the case where For definiteness, we set p c = 0.1 and σ 2 c = 10σ 2 b . As with S1 Apendix Table A , we examined twelve scenarios (π 1 = {10 −5 , 10 −4 , 10 −3 , 10 −2 }, and h 2 ≃ {0.1, 0.4, 0.7}), using HAPGEN-generated simulated genotypes. In the instantiations generated, the true heritability based on the sample is calculated, which depends on the particular set of β values and the heterozygosity of the SNPs they are assigned to. In Table B we report the results of the estimated parameters,π 1 ,σ 2 β , andσ 2 0 , when the distribution of causal effects is modeled with Eq 3. We also report the estimated heritability,ĥ 2 and number of causal SNPs, n causal . Since the "b"and "c" Gaussians in Eq. 56 are being estimated with a single Gaussian (β ∼ N (0, σ 2 β )), we also report effective underlying parameter values: π 1ef f , the weighted mean of π 1 × (1 − p c ) and π 1 × p c , weighted by σ b and σ c used as proxies for relative importance of the two Gaussians; and σ 2 βef f , the overall variance of the true causal effects. It can be seen thatσ 2 β reasonably tracks withσ 2 βef f , while for the larger polygenicities,π 1 reasonably tracks with π 1ef f , and the estimated heritabilities, h 2 , are close to the true values, h 2 . Although this example is not an exhaustive search, the results demonstrate the general applicability and robustness of the model, giv-ing polygenicities, discoverabilities, and heritabilities than are good indicators of the underlying truth.

Total LD-dependent Prior Probability
We also examined a highly artificial scenario where the prior probability of a SNP being causal, π 1L , is not a constant but dependent on total LD, L. Because the model assumes there is no LD-dependence on whether or not a reference SNP is causal, we do not expect the single Gaussian model to provide a good fit. We assumed that the prior probability of being causal, π 1L , decreases linearly from a maximum of π 1L = 0.005 at L = 1, down to π 1L = 0.0 for L ≥ 200. The overall proportion of causal SNPs is still denoted π 1 . Results are shown in S1 Appendix Table  C . There is consistent under-estimation of the heritability. The model QQ plots gave only a very poor fit to the simulated p-values. That is, the obvious poorness of the fits (see S1 Appendix Fig K ) gave little credence in the estimated parameters.

GWAS Replication
An issue that commonly arises in GWAS is whether zscores in discovery GWAS and those in replication GWAS are consistent. In particular, if some variants reach genomewide significance on one study but are far from approaching that threshold in a replication study, is there some overlooked problem or are the values in fact statistically consistent? Our model allows for making principled assessments of consistency between discovery and replication GWAS, and we carried out such an assessments in a recent GWAS of bipolar disorder [35].
A discovery z-score from sample size N d is the sum of two random components A replication z-score from sample size N r similarly is where δ is the genetic fixed effect (causal for the SNP in question, or LD-mediated from one or more neighboring causal SNPs), and ǫ is the environmental contribution and noise, modeled as a normal distribution, N (0, σ 2 0 ), with mean 0 and variance σ 2 0d or σ 2 0r (both approximately equal to 1 for BIP data). Effect sizes are related by The posterior distribution pdf(z r |z d ) is the convolution of which, using Eq. 59, gives pdf(δ r |z d ); where φ(z; δ, σ 2 ) is the Gaussian for z with mean δ and variance σ 2 ; pdf(z d ) and pdf(δ d ) are calculated using the Gaussian mixture model -see Eq. 29 and Eq. 32. It should be noted that these probability densities are SNPspecific in that they depend on the SNP's heterozygosity and LD structure, i.e., the distributions of heterozygosity and LD r 2 of its neighbors. The convolution can be written as This can be re-expressed using fast Fourier transforms without the need to perform explicit integration. Now, For the purpose of testing compatibility of discovery and replication z-scores, σ 0d ≃ σ 0r ≃ 1 can always be achieved in practice by estimating the original values from the data using the model described here, rescaling the discovery and replication z-scores by the respective inflation estimate, and then analyzing the compatibility of the rescaled discovery and replication z-scores. Assuming σ 0d ≃ σ 0r and writing it as σ 0 , Eq. 64 is Hence, The integral is just a convolution. Hence, letting F denote the Fourier transform operator, with F −1 its inverse, Using Eqs. 60 and 62, or equivalently Eq. 69, for any z d , pdf(δ r |z d ) and pdf(z r |z d ) can be calculated for finelyspaced vectors with elements δ r and z r respectively ranging from −a lim to a lim , with, say, a lim = 12 (wide enough so that, for any z d , the pdfs start at 0, increase as δ r or z r increases, and then ultimately decrease to 0 again with further increases in the 1st argument). See S1 Appendix Figures A and S1 Appendix B which were calculated for discovery and replication bipolar disorder GWAS data [35], where the data (SNPs) are divided up into a 4 × 4 grid of heterozygosity (H) × total LD (TLD).
To assess whether observed replication z-scores, z ro , are statistically consistent with the observed discovery zscores, z do , for a given SNP (explicitly taking into account its H and TLD structure) one can calculate the probability of obtaining a replication z-score, z r , that is "more extreme" that the observed value as follows: if z do > 0, calculate p(z r < z ro ) by integrating the pdf thus and if z do < 0, calculate p(z r > z ro ) from (small values will indicate outliers). This probability was calculated for 623 rs# SNPs with discovery p-values < 9.97 × 10 −5 , as described in [35]. For example, in the discovery data set PGC2 (N ef f = 49, 367, with 20,352 cases), SNP rs9834970 (H=0.49976, TLD=97.42, shown as the left-most green disc in the bottom-left panel in S1 Appendix Fig. A ) had z-score z do = −7.52 and p-value p do = 5.54 × 10 −14 . In the independent replication dataset (N ef f = 35, 240, with 9,412 cases) it had z ro = −1.04 and p ro = 0.2959. The expected effect size in the replication dataset given the discovery z-score, E(δ r |z do ) = −4.03.
The probability of obtaining a replication z-score more extreme than the one observed was p(z r > z ro |z do ) = 0.012. If the threshold for replicating were set at p t = 0.05 (corresponding to z t = −1.6449), then the probability for the replication z-score for SNP rs9834970 to reach this threshold is 0.97. For a null hypothesis that replication z-scores are consistent with discovery z-scores, there are no rejections (α = 0.05) for Bonferroni or Benjamini-Hochberg FDR. I.e., the replication data are, based on our model, consistent with the discovery data.

Additional notes on HDL
The LDSR estimate of SNP heritability for HDL cholesterol is 15.8% [85]. Our estimate is 7%. The LDSR estimate appears to be based on the subset of samples genotyped with genome-wide association study arrays, reported as 94k in [48], the same subset we used. Note that [48] also include another 94k samples genotyped with the Metabochip array; in cases where Metabochip and GWAS array data were available for the same individuals, they used Metabochip data to ensure that key variants were directly genotyped rather than imputed.
Our procedure is geared at characterizing the causal variants that can best be described in a distributional sense. To that end, we exclude SNPs that are in very large LD blocks (and SNPs with very low MAF). Thus, it is possible that we miss some large effects. However, such large effects are the ones that are most likely to be found; more important, from our stand point, is making principled estimates of the characteristics of the remaining undiscovered causal SNPs in complex traits. Nevertheless, in forthcoming work, we will analyze all available phenotypes with a more complex model, incorporating multiple causal Gaussian, heterozygosity-dependent variance, and total LD dependence of prior probabilities.
The more direct comparison for our work is with [77], not [48] which has twice the sample size.
[77] report 95 lipid-associated loci, 59 of which were new at that time. Based on their Supplementary Table 2, of the 95 loci, 47 were associated with HDL: 17 of these were previously found, an additional 3 were new findings in the previously found loci, and the remaining 27 were findings in new loci. A subsequent "conditional association analysis" identified secondary signals at 26 loci, of which 10 loci resulted in 11 SNPs (two from one locus) being associated with HDL. Their text notes that when the additional SNPs for the 26 loci were combined with the lead SNPs from the 95 loci, 12.1% of total variance in HDL was explained in the Fram-ingham Heart Study, to be compared with our estimate of 3.3% of phenotypic variance explained by genome-wide significant SNPs. If all 95 loci lead SNPs, plus all secondary SNPs from 26 of the loci, the were used to estimate the proportion of total variance explained by genome-wide significant SNPs, that could easily lead to the discrepancy.
Additionally, HDL has very large signals from chromosomes 8, 15, and 16. Thus it might be more appropriate to analyze HDL in a manner similar to AD, separating out chromosome 19. We will address this in future work. Pdf for replication z-score z r in a sample with N ef f = 35, 240, given discovery z-score z d in a sample with N ef f = 49, 367, for bipolar disorder. pdf(z r |z d ) is given by Eq. 69. PGC2 Table 1 and PGC2 STable 2 refer to Table 1 and Supplementary STable 2 in [35].   Table D . 95% confidence intervals for the model parameters (π 1 , σ 2 b and σ 2 0 ), the number of causal SNPs, and the heritability. See the main paper Table B . Confidence intervals for parameters were estimated using the inverse of the observed Fisher information matrix (FIM). The full FIM was estimated for all three parameters used in the model. For the derived quantity h 2 , which depends on all parameters, the covariances among the parameters, given by the off-diagonal elements of the inverse of the FIM, were incorporated. To calculate the confidence intervals, the mode was run with pruning at r 2 = 0.1 to approximate independence of z-scores (for blocks of GWAS SNPs -i.e., SNPs with a z-score -for which their LD r 2 > 0.1, a single SNP in the block was randomly chosen to represent the block; blocks were thus approximately independent).  Table  B . It is important to note that our heritability estimates are corrected for inflation, by dividing by the inflation parameter σ 2 0 ; this is not done for M 2, M 3 or M S. * Disease prevalences are given as a percentage in parentheses. For binary phenotypes, let h 2 obs denote the heritability on the observed 0-1 scale (this is h 2 in the main paper Figure A ). Let P denote the proportion of cases in the study: P = N cases /(N cases + N controls ). Then the heritability on the log-odds-ratio scale reported in [14] is h 2 log = h 2 obs /(P (1 − P )). The transformation between the observed and liability scale is given by Eq. 39. † N is the total sample size for quantitative traits; for qualitative traits, N ef f = 4/(1/N cases + 1/N controls ) -see main text.  (i.e., restricting the number of LD r 2 windows for the range 0 ≤ r 2 ≤ 1 to 1 -see the subsection Relation to Other Work on page S2). Shown here is a comparison of mean (std) true and estimated (ˆ) model parameters and derived quantities. Results for each line, for specified heritability h 2 and fraction π 1 of causal SNPs, are from 10 independent instantiations with random selection of the n causal causal SNPs that are assigned a β-value from the standard normal distribution. Compare with the main paper  Empirical -log 1 0 (q) Quantile-quantile plots for simulations. True polygenicity is specified for each column, and true heritability is specified for each row. QQ-plots for simulated data in dark blue, with 95% confidence interval in light blue; model prediction in yellow. The dashed red curve is the QQ plot corresponding to the true parameters. λ is the overall nominal genomic control factor for the pruned simulated data. The three estimated model parameters are: polygenicity,π 1 ; discoverability,σ 2 β ; and SNP association χ 2 -statistic inflation factor,σ 2 0 .ĥ 2 is the estimated narrow-sense chip heritability. The dotted black line is the expected plot under null. Reading the plots: on the vertical axis, choose a p-value threshold (more extreme values are further from the origin), then the horizontal axis gives the proportion of SNPs exceeding that threshold (higher proportions are closer to the origin).