Results of a “GWAS Plus:” General Cognitive Ability Is Substantially Heritable and Massively Polygenic

We carried out a genome-wide association study (GWAS) for general cognitive ability (GCA) plus three other analyses of GWAS data that aggregate the effects of multiple single-nucleotide polymorphisms (SNPs) in various ways. Our multigenerational sample comprised 7,100 Caucasian participants, drawn from two longitudinal family studies, who had been assessed with an age-appropriate IQ test and had provided DNA samples passing quality screens. We conducted the GWAS across ∼2.5 million SNPs (both typed and imputed), using a generalized least-squares method appropriate for the different family structures present in our sample, and subsequently conducted gene-based association tests. We also conducted polygenic prediction analyses under five-fold cross-validation, using two different schemes of weighting SNPs. Using parametric bootstrapping, we assessed the performance of this prediction procedure under the null. Finally, we estimated the proportion of variance attributable to all genotyped SNPs as random effects with software GCTA. The study is limited chiefly by its power to detect realistic single-SNP or single-gene effects, none of which reached genome-wide significance, though some genomic inflation was evident from the GWAS. Unit SNP weights performed about as well as least-squares regression weights under cross-validation, but the performance of both increased as more SNPs were included in calculating the polygenic score. Estimates from GCTA were 35% of phenotypic variance at the recommended biological-relatedness ceiling. Taken together, our results concur with other recent studies: they support a substantial heritability of GCA, arising from a very large number of causal SNPs, each of very small effect. We place our study in the context of the literature–both contemporary and historical–and provide accessible explication of our statistical methods.

The GREML software GCTA allows the user to set a maximum allowable degree of genetic relatedness among participants entered into analysis (herein, the "genetic-relatedness ceiling"). To gain further insight into the quantity estimated by GCTA, $h^2_{SNP}$, we conducted an exploratory analysis in which we calculated $h^2_{SNP}$ at varying genetic-relatedness ceilings. Specifically, we ran GCTA 234 times, with FSIQ as the phenotype and with the same covariates used in the GWAS (sex, birth year, 10 EIGENSTRAT principal components) as fixed effects. Each of the 234 runs used a different genetic-relatedness ceiling, ranging from 0.005 to 1.17 in increments of 0.005. As the ceiling increased, both the number of participants and the degree to which participants in the analysis could be genetically related to one another increased. Figure A1 (below) graphs $h^2_{SNP}$ as a function of genetic-relatedness ceiling, with error bars representing ±1 standard error. As the ceiling increased, more participants were included in the analysis, and the statistical precision of $h^2_{SNP}$ increased. More importantly, the point estimate itself increased as well. Below a ceiling of 0.015, sample size was less than 300, and the software produced nonsensical negative point estimates. A noticeable spike in $h^2_{SNP}$ is evident around 0.5, where full siblings (including DZ twins) were introduced. Something similar happens around 1.0, where the MZ twins were introduced. Figure A2 (below) shows how the sample size of the GCTA analysis increases with relatedness ceiling, and Figure A3 (below) shows $h^2_{SNP}$ as a function of sample size rather than the ceiling. When we systematically incremented the ceiling by regular intervals, three sample-size plateaus were evident (Figure A2). For ceilings between 0.15 and 0.4, N and $h^2_{SNP}$ were steady around 3,600 and 0.43, respectively. For ceilings between 0.56 and 0.98, they were steady at about 6,050 and 0.66.
When all 7,100 GWAS participants were included at a ceiling of 1.17, GCTA yielded $h^2_{SNP}$ = 0.77 (SE = 0.01).
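The ceiling sweep described above can be sketched as a small script that builds one GREML command per ceiling. This is only an illustration under stated assumptions: the file names (`gca_grm`, `fsiq.phen`, `sex.covar`, `birthyear_pcs.qcovar`) and output prefixes are hypothetical placeholders, and the GCTA options shown (`--reml`, `--grm`, `--grm-cutoff`, `--pheno`, `--covar`, `--qcovar`, `--out`) should be checked against the documentation of the installed GCTA version. The script only constructs the commands; it does not run them.

```python
# Sketch of the sweep: 234 GREML runs, one per genetic-relatedness ceiling.
# All file names and output prefixes are hypothetical placeholders.

def ceiling_grid(step=0.005, n_runs=234):
    """Return ceilings 0.005, 0.010, ..., 1.170, rounded to avoid float drift."""
    return [round(step * k, 3) for k in range(1, n_runs + 1)]


def greml_command(ceiling, grm_prefix="gca_grm", pheno="fsiq.phen",
                  covar="sex.covar", qcovar="birthyear_pcs.qcovar"):
    """Build the argument list for one GCTA REML run at the given ceiling."""
    return [
        "gcta64", "--reml",
        "--grm", grm_prefix,                 # precomputed genetic relationship matrix
        "--grm-cutoff", f"{ceiling:.3f}",    # the genetic-relatedness ceiling
        "--pheno", pheno,                    # phenotype file (FSIQ)
        "--covar", covar,                    # discrete covariate (sex)
        "--qcovar", qcovar,                  # quantitative covariates (birth year, PCs)
        "--out", f"greml_ceiling_{ceiling:.3f}",
    ]


grid = ceiling_grid()
commands = [greml_command(c) for c in grid]
```

Each entry of `commands` could be passed to a job scheduler or to `subprocess.run`; the resulting estimates and standard errors would then be collected from the per-run output files to draw a plot like Figure A1.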
According to Yang et al. (Ref 28), if the purpose of GCTA is to estimate how much phenotypic variance is attributable to the common SNPs on a genome-wide array, then close relatives should be excluded from analysis. They suggest a genetic-relatedness ceiling of 0.025. The reason for excluding close relatives is that, if they are instead included, then the GCTA estimator may overestimate the actual variance attributable to the SNPs on the array. When close relatives are included, the GCTA estimator functions more like a pedigree-based estimator, which captures the influence of all trait-relevant polymorphisms that contribute to familial resemblance, no matter how rare, and not just the genotyped SNPs (and other polymorphisms the SNPs tag). In the extreme case, then, a GCTA variance component estimated from MZ twins could even reflect the influence of de novo mutations, which contribute to MZ-twin resemblance but are not tagged in the population by common SNPs. It would seem, then, that if one wants to estimate the overall heritability of a phenotype with GCTA, inclusion of close relatives is the way to go. It might be tempting to conclude that the $h^2_{SNP}$ values in Figure A1 produced when the relatedness ceiling is above 1.0 are molecular-genetic estimates of the true, broad-sense heritability of GCA. But that is not necessarily the case: as Yang et al. (Ref 28) also remind us, including close relatives confounds genetic resemblance with shared-environmental influence.
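To make the role of the ceiling concrete, the following toy simulation (not our data or pipeline) generates full-sibling pairs at independent SNPs, computes a SNP-based relationship matrix, and applies a ceiling. The simulation design, the function names, and the greedy pruning rule are all assumptions made for illustration; in particular, the pruning rule is a simple stand-in and is not claimed to match the algorithm GCTA uses internally. The relationship matrix uses the simulated allele frequencies rather than sample estimates.

```python
import numpy as np


def simulate_sib_pairs(rng, n_families=30, n_snps=3000):
    """Simulate genotypes (0/1/2) for two full siblings per family."""
    freqs = rng.uniform(0.1, 0.9, size=n_snps)

    def founders(n):
        # Two alleles per person per SNP, drawn at the population frequency.
        return rng.binomial(1, freqs, size=(n, 2, n_snps))

    mothers, fathers = founders(n_families), founders(n_families)

    def child():
        # Each parent transmits one of its two alleles at random, per SNP.
        pick_m = rng.integers(0, 2, size=(n_families, 1, n_snps))
        pick_f = rng.integers(0, 2, size=(n_families, 1, n_snps))
        from_m = np.take_along_axis(mothers, pick_m, axis=1)[:, 0, :]
        from_f = np.take_along_axis(fathers, pick_f, axis=1)[:, 0, :]
        return from_m + from_f

    # Rows 0..n_families-1 are first siblings; the rest are their co-siblings.
    return np.vstack([child(), child()]), freqs


def relationship_matrix(genotypes, freqs):
    """Average over SNPs of products of standardized genotypes."""
    z = (genotypes - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))
    return z @ z.T / genotypes.shape[1]


def apply_ceiling(A, ceiling):
    """Hypothetical greedy pruning: keep a person only if their relatedness
    to everyone already kept does not exceed the ceiling."""
    kept = []
    for i in range(A.shape[0]):
        if all(A[i, j] <= ceiling for j in kept):
            kept.append(i)
    return kept


rng = np.random.default_rng(0)
n_fam = 30
G, p = simulate_sib_pairs(rng, n_families=n_fam)
A = relationship_matrix(G, p)

sib_relatedness = np.array([A[i, i + n_fam] for i in range(n_fam)])
kept_strict = apply_ceiling(A, 0.025)  # excludes close relatives
kept_loose = apply_ceiling(A, 0.75)    # admits full siblings
```

In this toy, estimated sibling relatedness clusters near 0.5, so a ceiling of 0.025 can never retain both members of a sibling pair, whereas a ceiling of 0.75 retains everyone. This mirrors the jump in sample size seen when the ceiling passes 0.5.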
Although $h^2_{SNP}$ apparently increased with sample size (Figure A3), this was because both sample size and $h^2_{SNP}$ increased as the genetic-relatedness ceiling (the parameter of the GCTA analyses we directly manipulated) was relaxed. We would not expect $h^2_{SNP}$, which is unbiased, to systematically increase with sample size when additional unrelated individuals are added to the sample (Visscher et al., Ref 30). In contrast, we would anticipate greater $h^2_{SNP}$ values if we had genotyped additional SNPs (Visscher et al., Ref 30). This expectation is explainable in terms of substantive theory: for a polygenic trait, genotyping more SNPs is expected to bring more causal polymorphisms into the calculation of the genetic relationship matrix A, either directly or by proxy due to LD. It is supported by a study of several quantitative phenotypes by Yang et al. (2011, Nature Genetics 43, 519-525), in which they partitioned $h^2_{SNP}$ among the 22 autosomes and observed positive correlations between each chromosome's length and its own variance component (e.g., r = 0.83 for height). Further analyses suggested that these correlations reflected not chromosomal length per se, but the number of intragenic SNPs genotyped on each chromosome.
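For readers who want the model behind these statements written out, the following is a sketch of the standard GREML formulation as we understand it from Yang et al.; the notation ($m$ for the number of SNPs, $x_{ij}$ for the allele count of person $j$ at SNP $i$, $p_i$ for the allele frequency, $A_c$ for the relationship matrix built from chromosome $c$) is introduced here for illustration and is not taken from the text above.

```latex
% Mixed model with all SNPs treated as random effects
y = X\beta + g + e, \qquad
\operatorname{var}(y) = A\,\sigma^2_g + I\,\sigma^2_e, \qquad
h^2_{SNP} = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e}

% Genetic relationship matrix computed from m SNPs
A_{jk} = \frac{1}{m} \sum_{i=1}^{m}
         \frac{(x_{ij} - 2p_i)(x_{ik} - 2p_i)}{2p_i(1 - p_i)}

% Partitioning among the 22 autosomes: one variance component per chromosome
\operatorname{var}(y) = \sum_{c=1}^{22} A_c\,\sigma^2_c + I\,\sigma^2_e
```

Under this formulation, adding SNPs changes $A$ itself, which is why denser genotyping can raise $h^2_{SNP}$, whereas adding unrelated individuals only adds rows and columns to $A$ and improves precision.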
With MZ twins included in analysis, GCTA produced $h^2_{SNP}$ estimates of 0.77, right around the residual MZ-twin correlation from RFGLS (Table S2,
Figure A1. $h^2_{SNP}$ as calculated at different genetic-relatedness ceilings. Error bars are ±1 standard error. Genetic-relatedness ceiling is the maximum degree of genetic relationship allowed among participants entered into analysis.
Figure A2. GCTA sample size as a function of genetic-relatedness ceiling. Genetic-relatedness ceiling is the maximum degree of genetic relationship allowed among participants entered into analysis.