This is an uncorrected proof.
Phenotypic variance heterogeneity across genotypes at a single nucleotide polymorphism (SNP) may reflect underlying gene-environment (G×E) or gene-gene interactions. We modeled variance heterogeneity for blood lipids and BMI in up to 44,211 participants and investigated relationships between variance effects (Pv), G×E interaction effects (with smoking and physical activity), and marginal genetic effects (Pm). Correlations between Pv and Pm were stronger for SNPs with established marginal effects (Spearman’s ρ = 0.401 for triglycerides, and ρ = 0.236 for BMI) compared to all SNPs. When Pv and Pm were compared for all pruned SNPs, only BMI was statistically significant (Spearman’s ρ = 0.010). Overall, SNPs with established marginal effects were overrepresented in the nominally significant part of the Pv distribution (Pbinomial <0.05). SNPs from the top 1% of the Pm distribution for BMI had more significant Pv values (PMann–Whitney = 1.46×10−5), and the odds ratio of SNPs with nominally significant (<0.05) Pm and Pv was 1.33 (95% CI: 1.12, 1.57) for BMI. Moreover, BMI SNPs with nominally significant G×E interaction P-values (Pint<0.05) were enriched with nominally significant Pv values (Pbinomial = 8.63×10−9 and 8.52×10−7 for SNP × smoking and SNP × physical activity, respectively). We conclude that some loci with strong marginal effects may be good candidates for G×E, and variance-based prioritization can be used to identify them.
Most contemporary studies of gene-environment interactions focus on gene variants that are known to bear strong and reliable associations with the traits of interest. The strategy is intuitive because it helps limit the number of tests performed by focusing on a relatively small number of gene variants. However, this approach is predicated on an implicit assumption that these loci are strong candidates for interactions owing to their established relationships with the index traits. The counter-argument is that, because these loci have highly consistent signals within and between populations that vary by environmental characteristics, the probability that these variants interact with other factors is low. The current analysis tests whether variants with strong marginal effects signals (i.e., those prioritized through conventional genome-wide association analyses) are strong or weak candidates for gene-environment interactions. Here we describe analyses focused on lipids and BMI that test this hypothesis by comparing marginal effect signals with variance effect signals and those derived from explicit genome-wide, gene-environment interaction analyses. We conclude that for BMI, there are features of the top-ranking marginal effect loci that render them stronger candidates for interactions than is true of variants with weaker marginal effects signals. These findings are likely to help optimize the efficiency of future gene-environment interaction analyses by providing evidence-based rankings for strong candidate loci.
Citation: Shungin D, Deng WQ, Varga TV, Luan J, Mihailov E, Metspalu A, et al. (2017) Ranking and characterization of established BMI and lipid associated loci as candidates for gene-environment interactions. PLoS Genet 13(6): e1006812. https://doi.org/10.1371/journal.pgen.1006812
Editor: Joshua M. Akey, University of Washington, UNITED STATES
Received: March 31, 2016; Accepted: May 10, 2017; Published: June 14, 2017
Copyright: © 2017 Shungin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Pm values were obtained from the Genetic Investigation of ANthropometric Traits (GIANT) and the Global Lipids Genetics Consortium (GLGC). Association statistics from GIANT and GLGC are available here: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium and http://csg.sph.umich.edu/abecasis/public/lipids2013/. Pv values were calculated as explained in the Methods. Pv values are made publicly available on Dryad at doi:10.5061/dryad.q1m7t. Pi values are drawn from GIANT and are contained in the following articles "Genome-wide physical activity interactions in adiposity? A meta-analysis of 200,452 adults" (10.1371/journal.pgen.1006528) and "Genome-wide meta-analysis of 241,258 adults accounting for smoking behaviour identifies novel loci for obesity traits" (10.1038/ncomms14977).
Funding: This research was undertaken as part of a research program supported by the European Commission (CoG-2015_681742_NASCENT), Swedish Research Council (Distinguished Young Researchers Award in Medicine), Swedish Heart-Lung Foundation, and the Novo Nordisk Foundation, all grants to PWF. TVV is supported by the Novo Nordisk Foundation Postdoctoral Fellowship within Endocrinology/Metabolism at International Elite Research Environments via NNF16OC0020698. TWW was supported by the grants "Bundesministerium für Bildung und Forschung": BMBF-01ER1206, BMBF-01ER1507. APM is a Wellcome Trust Senior Fellow in Basic Biomedical Science (grant WT098017). LAC acknowledges funding for the Framingham Heart Study: This research was conducted in part using data and resources from the Framingham Heart Study of the National Heart Lung and Blood Institute of the National Institutes of Health and Boston University School of Medicine. The analyses reflect intellectual input and resource development from the Framingham Heart Study investigators participating in the SNP Health Association Resource (SHARe) project. This work was partially supported by the National Heart, Lung and Blood Institute's Framingham Heart Study (Contract No. N01-HC-25195 and Contract No. HHSN268201500001I) and its contract with Affymetrix, Inc for genotyping services (Contract No. N02-HL-6-4278). A portion of this research utilized the Linux Cluster for Genetic Analysis (LinGA-II) funded by the Robert Dawson Evans Endowment of the Department of Medicine at Boston University School of Medicine and Boston Medical Center. This research was partially supported by grant R01-DK089256 from the National Institute of Diabetes and Digestive and Kidney Diseases (MPIs: I.B. Borecki, LAC, K. North). TOK was supported by the Danish Council for Independent Research (DFF—1333-00124) and Sapere Aude program grant (DFF—1331-00730B). RM would like to acknowledge the High Performance Computing Center of University of Tartu. 
EGCUT was supported by EU H2020 grants 692145, 676550, 654248, 692065, Estonian Research Council Grant IUT20-60, and PerMed I, NIASC, EIT-Health and European Union through the European Regional Development Fund (Project No. 2014-2020.4.01.15-0012 GENTRANSMED). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal's policy and the authors of this manuscript have the following competing interests: PWF has been a paid consultant for Eli Lilly and Sanofi Aventis and has received research support from several pharmaceutical companies as part of a European Union Innovative Medicines Initiative (IMI) project. LAC declares funding received by Affymetrix for genotyping of Framingham Heart Study subjects on the 250K Nsp, 250K Sty and 50K gene centric platform.
Gene-environment (G×E) interactions may contribute to complex diseases, but their detection has proven challenging; hence, a variety of approaches have been developed to enhance power. Most G×E analyses focus on loci that are strong biological candidates or those with highly significant marginal effects. The latter approach is attractive because these loci are available in many large cohorts, and can be conveniently followed up with interaction analyses if environmental data are accessible. Moreover, selecting SNPs with strong and reproducible marginal effect signals is a pragmatic data-reduction step that may improve power, although this approach risks omitting other promising candidates.
In a linear regression setting, the presence of interaction effects drives phenotypic variance heterogeneity by genotype [3,5]. Exploiting variance heterogeneity as a signature of interactions is appealing because, unlike standard approaches for assessing G×E interactions, no explicit information about environmental exposures is needed and multiple exposures can be simultaneously considered.
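To make the link between interaction and variance explicit, consider a minimal linear model. This is an illustrative sketch only: the symbols $\beta_0$, $\beta_G$, $\beta_E$, $\beta_{GE}$, $E$ and $\varepsilon$ are introduced here for illustration, and the exposure $E$ is assumed to be independent of both the genotype $G$ and the error term $\varepsilon$.

```latex
\begin{align}
Y &= \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E + \varepsilon,
  \qquad \operatorname{Var}(\varepsilon) = \sigma^2, \\
\operatorname{Var}(Y \mid G = g) &= (\beta_E + \beta_{GE}\, g)^2 \operatorname{Var}(E) + \sigma^2 .
\end{align}
```

Under these assumptions the conditional variance is the same in every genotype group when $\beta_{GE} = 0$, and differs across the groups $g = 0, 1, 2$ when $\beta_{GE} \neq 0$ and $\operatorname{Var}(E) > 0$, without $E$ ever being measured.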
Here we explored whether loci identified in large-scale genome-wide association studies (GWAS) of blood lipids and body mass index (BMI) are strong candidates for G×E interactions by comparing genome-wide variance heterogeneity P-value distributions generated using Levene’s test against P-value distributions for marginal effects and explicit G×E interaction effects (for smoking and physical activity).
We assessed between-genotype variance heterogeneity for up to 1,927,671 directly genotyped or imputed SNPs (HapMap II CEU reference panel) that passed quality control (QC). Meta-analyses of Levene’s test summary statistics were performed for BMI (n≤44,211 participants), and blood concentrations of high-density lipoprotein cholesterol (HDL-C) (n≤34,315), low-density lipoprotein cholesterol (LDL-C) (n≤34,180), total cholesterol (TC) (n≤34,318) and triglycerides (TG) (n≤34,110). We then obtained marginal effects results for the same index traits and SNPs from publicly available GWAS summary data from the GIANT (Genetic Investigation of ANthropometric Traits) Consortium and GLGC (Global Lipids Genetics Consortium) [10,11].
We compared the genome-wide marginal effects with between-genotype variance heterogeneity results for each of the five cardiometabolic traits by calculating the association between marginal effects (Pm) and variance heterogeneity (Pv) P-values using the rank-based Spearman correlation (ρ). This was done using a set of 42,710 pruned SNPs produced using the --indep-pairwise command in PLINK (see Materials and Methods) to account for linkage disequilibrium (LD) among variants.
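The rank-based comparison of Pm and Pv can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study (which was run in Stata): the function names and the five P-value pairs are hypothetical, and ties are handled with average ranks.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical P-values for five SNPs (not study data).
pm = [1e-8, 0.03, 0.20, 0.50, 0.90]
pv = [0.001, 0.04, 0.60, 0.30, 0.80]
rho = spearman_rho(pm, pv)
```

Because only ranks enter the calculation, the result is unchanged by any monotone transformation of the P-values, such as taking −log10.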
As shown in Table 1 (see also Fig 1A and S1 Table), the Spearman’s ρ for the association between Pm and Pv for all pruned SNPs was of very small magnitude and only statistically significant for BMI. The exclusion of SNPs based on progressively more conservative Pm thresholds (Pm<0.05; Pm<10−4; previously established loci with Pm<5×10−8 in external datasets) saw corresponding improvements in the magnitude of these correlations, which were statistically significant for all traits except TC when focusing on previously established loci. The BMI correlation at the Pm<0.05 threshold, as well as the test of equality with ρ for all SNPs, was statistically significant, suggesting concordance between marginal and variance signals at a nominal level of significance. The odds ratio (OR) for a SNP to have both Pm<0.05 and Pv<0.05 as compared to Pv≥0.05 was 1.33 (95% CI: 1.12, 1.57) for BMI while the 95% CIs of ORs for other traits included 1. On the other hand, the P-value for a non-zero ρ for TG was statistically significant when focusing on the established loci and at Pm<10−4, suggesting concordance between marginal and variance signals at more conservative Pm thresholds.
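An odds ratio of this kind can be computed from a 2×2 table of SNP counts cross-classified by Pm<0.05 and Pv<0.05. The sketch below assumes a Wald (log-scale) confidence interval; the text does not state which interval method was used, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table of SNP counts.

    a: Pm<0.05 and Pv<0.05     b: Pm<0.05 and Pv>=0.05
    c: Pm>=0.05 and Pv<0.05    d: Pm>=0.05 and Pv>=0.05
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); assumes all four cells are non-zero.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not study data).
or_est, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
```

An interval that excludes 1 corresponds to nominal significance at the 5% level under this approximation.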
A. Percentile-scaled ranks of GWAS-derived SNPs for lipid traits on the genome-wide distribution of P-values from Levene’s meta-analysis. For each lipid trait (HDL, LDL, TG and TC on the vertical axis) we ranked Pv from Levene’s test for all SNPs from lowest to highest so that the lowest Pv for a given trait was assigned a rank equal to 1. We scaled ranks into percentiles such that the lowest Pv corresponded to the 100th percentile. We then plotted percentile-scaled ranks of GWAS-derived loci (black sticks on the blue axis) on the distribution of percentile-scaled ranks of genome-wide Pv (blue axis) for each trait and marked in red loci with Pv<0.05. Loci names are presented above the axis for Pv distribution of a given trait and are positioned in the same order as percentile-scaled ranks of GWAS-derived loci, but are equally spaced to facilitate cross-trait comparison (loci names with Levene’s test Pv<0.05 are highlighted in red). To the left of each axis we present counts of GWAS-derived loci with Pv<0.05 and total number of GWAS-derived loci in the analysis separated by a dash, as well as the P-value for the binomial test (Pbinomial). B. Percentile-scaled ranks of GWAS-derived SNPs for BMI on the genome-wide distribution of P-values obtained from Levene’s test (Pv) and between-strata difference test P-values (Pint) from the ‘SNP × Physical Activity’ and ‘SNP × Smoking’ interaction tests for BMI. For each analysis, we ranked P-values for all SNPs from lowest to highest so that the lowest P-value for a given trait was assigned a rank equal to 1. We scaled ranks into percentiles such that the lowest P-value corresponded to the 100th percentile. We then plotted percentile-scaled ranks of GWAS-derived loci (black sticks on the blue axis) on the distribution of percentile-scaled ranks of genome-wide P-values (blue axis) from all four approaches and marked in red loci with Pv<0.05 or Pint<0.05 (or 95th percentile for average rank between SNP × PA and SNP × Smoking). 
Loci names are presented above the axis for the P-value distribution of a given trait and are positioned in the same order as the percentile-scaled ranks of GWAS-derived loci, but are equally spaced to facilitate cross-trait comparisons (loci names with Pv<0.05 or Pint<0.05 are highlighted in red). To the left of each axis conveying each respective P-value distribution, we present counts of GWAS-derived BMI loci with Pv<0.05 or Pint<0.05 (or 95th percentile for the average rank of the SNP × PA and SNP × Smoking interaction tests) and the total number of GWAS-derived loci in the analysis separated by a dash, as well as the P-value for the binomial test (Pbinomial).
We further compared Pm with interaction P-values from exposure-specific (smoking and physical activity) genome-wide interaction tests for BMI (Pint); this was only done for BMI owing to the requirement for an adequately powered external dataset (such a dataset was accessible through the GIANT consortium) (Table 2). Marginal effects GWAS were performed by strata of smokers vs. non-smokers and physically active vs. inactive participants (n = 210,316 European-ancestry adults) respectively, and a heterogeneity test was used to generate exposure-specific Pint distributions. Spearman ρ for the pruned set of SNPs in the SNP × physical activity and the SNP × smoking analyses were low and not statistically significant (Table 2). We also compared Pint values and Pv values for BMI. Spearman’s ρ for the pruned set of SNPs were low and not statistically significant.
We next tested if the number of previously established marginal effect SNPs (Pm<5×10−8) that were also nominally significant (Pv<0.05) for variance heterogeneity was greater than expected by chance (Tables 3 and 4, Fig 1). For 4 out of the 5 index traits, we observed enrichment at the lower end of the Pv distribution (Pv<0.05) for the established GWAS-derived lead SNPs. Thus, the nominally significant regions of the Pv distributions were generally enriched for GWAS-derived loci.
We also performed enrichment analyses to test if previously established marginal effects SNPs (Pm<5×10−8) are enriched for nominally significant (Pint<0.05) interactions in the SNP × physical activity or SNP × Smoking analyses, but no enrichment was observed (Table 3; Fig 1B). By contrast, for the physical activity and smoking interaction tests (using all pruned SNPs), the lower end of the Pint distribution (Pint<0.05) was enriched with SNPs that were nominally significant in the Levene’s test analysis (Pv<0.05) (Table 4). This enrichment translated into an OR of 1.08 (95% CI: 1.01, 1.14) for a SNP to have Pint<0.05 given Pv<0.05 vs. Pv≥0.05 for SNP × physical activity interaction. The corresponding OR for the SNP × smoking interaction test was not significant (OR = 1.02; 95% CI: 0.96, 1.08).
Finally, in the pruned SNP-set we used the Mann–Whitney U test to probe for systematic differences in Pv and Pm ranks. P-values were ordered from least significant to most significant, and the 100th centile (i.e. the 1% most significantly associated SNPs) was compared to the remaining 99 centiles for each of the five traits. For BMI, SNPs in the 100th centile of the Pm distribution had markedly higher Pv ranks (i.e. more significant Pv) than the remaining SNPs (PMann–Whitney = 1.46×10−5; Table 5). Even when excluding previously established lead SNPs (Pm<5×10−8) for BMI (or SNPs within ±500 kb of them), SNPs from the 100th centile of the Pm rank-ordered distribution had higher Pv ranks than the remaining SNPs (PMann–Whitney = 4.30×10−4; Table 5). Conversely, no difference in Pv ranks was observed for SNPs from the 100th centile of the Pm rank-ordered distribution for the four blood lipid traits; this may reflect trait-specific G×E effects or differences in statistical power by trait. No differences in Pv ranks between SNPs from the 99th centile of the Pm rank-ordered distribution compared to SNPs from the 98th to 1st centiles of the distribution were observed for any trait (PMann–Whitney>0.05; Table 5). Similarly, no difference in Pm ranks was observed for SNPs from the 100th centile of the Pv rank-ordered distribution for any traits (PMann–Whitney>0.05; Table 6).
To assess whether a trait with a non-normal distribution (e.g. BMI) or strong marginal associations could cause spurious association between the marginal and variance signals, we recapitulated the analysis pipeline (correlation analysis, enrichment analysis, comparisons of rank Pm and Pv values) in simulations described in the Materials and Methods. Careful assessment of results emanating from these simulations did not reveal evidence of type I error rate inflation caused by the non-normal distribution of an outcome trait or by strong marginal effects. For instance, we extracted correlation P-values of Pm, Pv and Pint generated from 5,000 simulations. QQ-plots of the 5,000 correlation P-values, 2,500 binomial P-values, and 2,500 Mann-Whitney U test P-values revealed no inflation (S1A–S1C Fig, S2A and S2B Fig and S3A and S3B Fig, respectively). Repeating these analyses on subsets of SNPs with low Pm values did not materially change the results.
Collectively, our analyses highlight a few variants with genome-wide significant marginal effects that may be strong candidates for G×E interactions owing to their strong concurrent variance heterogeneity P-values. For BMI, such SNPs are also overrepresented in the nominally significant part of the Pv distribution. FTO is an excellent example, as it conveys strong marginal effects, exhibits high between-genotype heterogeneity here (Tables 2 and 3 and Fig 1B) and elsewhere, and reportedly interacts with physical activity, diet and other lifestyle exposures [2,14,15] and is associated with macronutrient intake [16,17].
Although variance heterogeneity tests are potentially powerful screening tools for G×E interactions, like most interaction tests, they may be prone to bias. For example, apparent differences in phenotypic variances across genotypes may be caused by scaling, particularly when the phenotypic means also differ substantially, such that the per-genotype means and variances for index traits are correlated. However, where necessary we transformed variables, and the correlations between Pm and Pv were generally weak, excluding this as a likely source of bias. Using simulated data, we investigated whether the non-normal distribution of a trait can cause a spurious association between marginal and variance signals, which we show is highly improbable. Through further simulations, we assessed whether SNPs with large marginal effects inflate Pv, but observed no inflation, indicating that large genetic marginal effects do not artificially inflate variance heterogeneity to a meaningful extent, and SNPs with low Pm and low Pv-values are thus likely to be strong candidates for G×E interactions, at least in the case of BMI. It might also be that combining populations from ancestral (e.g., hunter-gatherers) and contemporary environments increases variance heterogeneity owing to diversity in population substructure rather than G×E interactions per se. However, this seems unlikely here, as the cohorts examined are from Westernized European-ancestry populations.
There are several additional explanations for between-genotype variance heterogeneity, such as variance misclassification that can occur when the index variant is located within a haplotype containing rare functional variants that convey strong marginal effects. Hence, although variance heterogeneity tests represent a useful data-reduction step, before conclusions are drawn about the presence or absence of G×E interactions, index variants should be validated by testing their interactions with explicit environmental exposures, as we did here with smoking and physical activity. However, genome-wide G×E interactions datasets are not comprised of functionally validated G×E interactions, as no such resource is currently available for human complex traits. This limitation inhibits the extent to which causal effects can be attributed to the top-ranking loci and their interactions with smoking or physical activity.
We conclude that the common approach of prioritizing loci with established genome-wide significant association signals without further discrimination for G×E interaction analyses might be useful, but the efficiency of such analyses could be substantially improved by focusing on variants with low P-values for both variance heterogeneity and marginal effects. We provide these rankings here to facilitate this approach.
Materials and methods
A detailed project flow-chart is shown in Fig 2.
Three sources of genome-wide results were used: i) meta-analysis of Levene’s test results for between-genotype heterogeneity of phenotypic variances; ii) published results for marginal effects genome-wide association studies undertaken by the GIANT and GLGC consortia; iii) published results for SNP × physical activity and SNP × smoking in BMI (from the GIANT consortium).
We performed a genome-wide search for SNPs whose associations with the following traits are characterized by high between-genotype variance heterogeneity: BMI, TC, TG, HDL-C and LDL-C. The variance heterogeneity analyses were performed using Levene’s test in up to 44,211 participants of European descent from seven population-based cohorts. Descriptions of these cohorts are presented in S2 Table. To minimize bias that might result from unequal sample sizes between SNPs when calculating the correlations between the P-values from the marginal (Pm) and variance heterogeneity (Pv) meta-analyses, we restricted the sample size for analyses to 26,000 participants for BMI and to 24,000 participants for lipid traits (S4 Fig).
Genotyping and imputation
A detailed summary of sample sizes, genotyping platforms, genotype calling algorithms, sample and SNP quality control filters, and analysis software for all participating cohorts is provided in S2 and S3 Tables. For each individual, SNPs were imputed using the CEU reference panel of HapMap II (S2 Table). We excluded SNPs with low imputation quality (below 0.3 for MACH, 0.4 for IMPUTE, and 0.8 for PLINK imputed data), Hardy-Weinberg equilibrium P <10−6, directly genotyped SNP call rate < 95%, and minor allele frequency (MAF) < 1%.
Selection of SNPs identified through GWAS
We identified SNPs that have been robustly associated (P<5×10−8) with the five cardiometabolic traits in European ancestry populations: 77 SNPs associated with BMI discovered by GIANT; and 58 SNPs associated with LDL-C, 71 SNPs associated with HDL-C, 74 SNPs associated with TC, and 40 SNPs associated with TG [10,11] discovered by GLGC.
Variance heterogeneity analyses
We used Levene’s test to identify SNPs that show heterogeneity of phenotypic variances ($\sigma_i^2$) across the three genotype groups at each SNP locus ($i$ = 0, 1, or 2). We first log10 transformed all five traits followed by a z-score transformation by subtracting the sample mean and dividing by the sample standard deviation (SD), and further Winsorized the z-score values at 4 SD. The transformed phenotype Y was then used to calculate Z, defined by the absolute deviation of each participant’s phenotype from the sample mean of his or her respective genotype group at a given SNP locus. For each trait, participating cohorts provided the necessary summary statistics for each genotype at each marker. Specifically, the per genotype group counts ($n_{0s}, n_{1s}, n_{2s}$), per genotype means ($\bar{Z}_{0s}, \bar{Z}_{1s}, \bar{Z}_{2s}$), and per genotype group variances of Z ($\sigma_{0s}^2, \sigma_{1s}^2, \sigma_{2s}^2$) were centrally collected and meta-analyzed. The minimum number of observations per genotype group required was 30 participants per cohort.
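The per-cohort steps described above (transformation, calculation of Z, and per-genotype summary statistics) can be sketched as follows. This is a minimal illustration, not the cohorts' analysis code: the function names, the dictionary layout and the toy data are hypothetical, and the use of the sample (n−1) standard deviation and variance is an assumption.

```python
import math
from statistics import mean, stdev, variance


def transform(trait, cap=4.0):
    """log10-transform, z-score, then Winsorize the z-scores at +/- cap SD."""
    logged = [math.log10(v) for v in trait]
    m, s = mean(logged), stdev(logged)  # sample SD (n - 1), an assumption
    return [max(-cap, min(cap, (v - m) / s)) for v in logged]


def levene_summary(y, genotypes, min_n=30):
    """Per-genotype count, mean and variance of Z = |Y - genotype-group mean|.

    Genotype groups with fewer than min_n participants are skipped,
    mirroring the minimum of 30 observations per group described above.
    """
    summary = {}
    for g in (0, 1, 2):
        y_g = [v for v, gt in zip(y, genotypes) if gt == g]
        if len(y_g) < max(min_n, 2):
            continue
        group_mean = mean(y_g)
        z = [abs(v - group_mean) for v in y_g]
        summary[g] = {"n": len(y_g), "mean_z": mean(z), "var_z": variance(z)}
    return summary


# Toy example with two participants per genotype (min_n lowered for the demo).
toy_y = [0.0, 2.0, 1.0, 3.0, 5.0, 5.0]
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_summary = levene_summary(toy_y, toy_genotypes, min_n=2)
```

Only these three quantities per genotype group would need to leave each cohort, which is what makes a summary-level meta-analysis possible without sharing individual-level data.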
Meta-analyses were performed using the following formula, derived previously:
where $N$ is the combined sample size, and $\bar{Z}_{is}$ and $\sigma_{is}^2$ are the sample mean and variance of $Z$ in the $i$th genotype group of the $s$th study, respectively. When combining summary-level data to calculate the Levene’s test statistic $L$, the following natural weights $\omega_{is}$ and $\gamma_i$ were calculated: $\omega_{is} = n_{is}/n_i$ and $\gamma_i = n_i/N$, where $n_i$ is the sum of genotype counts in the $i$th genotype group across all participating cohorts. These weights are determined by the frequency of the marker amongst the cohorts, such that each set of weights sums to 1, i.e. $\sum_s \omega_{is} = 1$ and $\sum_i \gamma_i = 1$. The meta-analysis Levene’s test P-value is obtained by comparing $L$ to an F-distribution with df1 = 2 and df2 = N−3.
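For orientation, the standard three-group Levene statistic can be written in terms of per-cohort summary statistics. The sketch below assumes that $\bar{Z}_{is}$ and $\sigma_{is}^2$ denote the mean and the unbiased ($n_{is}-1$ denominator) variance of $Z$ in genotype group $i$ of study $s$, and that the weights are $\omega_{is} = n_{is}/n_i$ and $\gamma_i = n_i/N$. These symbols and the decomposition of the denominator are assumptions made for illustration, and the expression may not match the previously derived formula term by term.

```latex
\begin{align}
\bar{Z}_{i} &= \sum_{s} \omega_{is}\, \bar{Z}_{is}, \qquad
\bar{Z} = \sum_{i=0}^{2} \gamma_i\, \bar{Z}_{i}, \qquad
\omega_{is} = \frac{n_{is}}{n_i}, \quad \gamma_i = \frac{n_i}{N}, \\
L &= \frac{N-3}{2} \cdot
\frac{\displaystyle \sum_{i=0}^{2} n_i \left(\bar{Z}_{i} - \bar{Z}\right)^2}
     {\displaystyle \sum_{i=0}^{2} \sum_{s} \left[ (n_{is}-1)\,\sigma_{is}^2
       + n_{is} \left(\bar{Z}_{is} - \bar{Z}_{i}\right)^2 \right]} .
\end{align}
```

With these definitions, the numerator is the between-genotype sum of squares of $Z$ and the denominator is the within-genotype sum of squares of the pooled $Z$ values, so $L$ has the usual form of a one-way ANOVA F statistic with 2 and $N-3$ degrees of freedom.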
Comparison between marginal effects and variance heterogeneity P-values
Marginal effects P-values for BMI and the relevant lipid traits were obtained from publicly available GWAS summary data from the GIANT and GLGC [10,11] consortia, respectively (all cohorts included here in the Levene’s meta-analysis were also included in the GIANT and GLGC datasets).
To illustrate our findings, we rank-ordered the P-values (from lowest to highest) from both marginal effects and variance effects analyses for all 1,927,671 SNPs so that the lowest P-value for a given trait was assigned to the 100th centile. These rank-scaled distributions for Pm for all five traits are presented in Fig 1.
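One way to produce such percentile-scaled ranks is sketched below. The exact scaling used for Fig 1 is not given in the text, so the formula 100 × (n − rank + 1) / n, the function name and the four example P-values are assumptions chosen for illustration.

```python
def percentile_scaled_ranks(p_values):
    """Rank P-values so that rank 1 is the lowest P-value, then scale the
    ranks to percentiles with the lowest P-value at the 100th percentile."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    percentiles = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        percentiles[idx] = 100.0 * (n - rank + 1) / n
    return percentiles


# Hypothetical P-values for four SNPs (not study data).
example = percentile_scaled_ranks([0.5, 0.001, 0.2, 0.9])
```

In this scheme the most significant SNP always sits at 100 and the least significant at 100 / n, so loci from traits with different numbers of SNPs can be displayed on a common axis.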
We calculated Spearman’s correlations for each of the five cardiometabolic traits between Pm and Pv. This was done using a pruned set of SNPs. Pruning was performed in the TwinGene cohort using the --indep-pairwise 50 5 0.1 command in PLINK by calculating LD (r2) for each pair of SNPs within a window of 50 SNPs, removing one of a pair of SNPs if r2>0.1; we proceeded by shifting the window 5 SNPs forwards and repeating the procedure. Spearman’s correlations were computed for categories of SNPs: i) all pruned SNPs, ii) the subset of SNPs that was nominally significant (Pm<0.05) in the marginal effects analysis, iii) the subset of SNPs with Pm<10−4 in the marginal effects analysis, and iv) SNPs that were previously established in conventional marginal effects GWAS meta-analyses (Pm<5×10−8). We also compared Spearman’s correlations between these categories of SNPs using the test for equality of two correlations.
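A common way to test the equality of two correlations is Fisher's z-transformation. The sketch below assumes that approach and assumes the two correlations come from independent sets of SNPs; the text does not specify which test was used, and nested SNP subsets are not strictly independent, so this is an approximation. The function name and inputs are hypothetical.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 using Fisher's z-transformation.

    Assumes independent samples of sizes n1 and n2 (each > 3) and
    correlations strictly between -1 and 1.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical correlations and SNP counts, for illustration only.
z_stat, p_value = compare_correlations(0.30, 100, 0.05, 1000)
```

When the transformation is applied to Spearman rather than Pearson correlations, the standard error is only approximate, so borderline P-values from this sketch should be read with caution.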
Next, we performed enrichment analyses to test if there was a higher number of established SNPs in the nominally significant variance P-value (Pv<0.05) distribution than expected by chance under the binomial distribution.
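The enrichment test can be sketched as an upper-tail binomial probability: under the null hypothesis, each established SNP has a 5% chance of reaching Pv<0.05. The one-sided form is an assumption (the text says only "greater than expected by chance"), and the function name and counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.05):
    """P(X >= k) for X ~ Binomial(n, p): the probability of observing at
    least k nominally significant SNPs among n by chance alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 10 of 77 established loci with Pv < 0.05
# (about 3.85 expected under the null). Not study data.
p_binomial = binomial_upper_tail(10, 77)
```

This treats the established loci as independent, which is reasonable for lead SNPs from distinct loci but would not hold for SNPs in strong LD.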
We also tested if there is a difference in Pv ranks for SNPs from the lowest 100th centile of the Pm rank-ordered distribution for all five traits and the rest of SNPs in the pruned set of SNPs using the Mann–Whitney U test, including and excluding established SNPs (or SNPs that were +/-500kb from the reported lead SNP). This analysis was repeated for SNPs from the 99th centile vs SNPs from 1st to 98th centiles of the Pm rank-ordered distribution. The same Mann–Whitney U tests were used to study differences in Pm ranks for SNPs from the lowest 100th and 99th centiles of the Pv rank-ordered distribution and the rest of SNPs in the pruned set of SNPs.
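The rank comparison can be sketched with a Mann–Whitney U test using the large-sample normal approximation. This is an illustration and not the Stata routine used in the study: it assumes no tied values (reasonable for continuous P-values), omits the continuity correction, and uses a hypothetical function name and toy inputs.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U, z, p), where U counts the pairs (x_i, y_j) with x_i > y_j.
    Assumes no ties and samples large enough for the approximation.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the 1-based ranks of the observations that came from x.
    rank_sum_x = sum(r for r, (_, grp) in enumerate(pooled, start=1) if grp == 0)
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Toy example: Pv values of "top 1% Pm" SNPs vs. the rest (hypothetical).
u_stat, z_stat, p_value = mann_whitney_u([0.001, 0.01, 0.02], [0.3, 0.5, 0.9])
```

With real data, x would hold the Pv values of SNPs in the 100th centile of the Pm distribution and y the Pv values of all remaining pruned SNPs.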
All analyses were performed using Stata 12 (StataCorp LP, TX, USA), unless specified otherwise.
SNP × Physical activity and SNP × Smoking interaction analyses for the outcome of BMI
We used published data from 210,316 European-ancestry adults (from the GIANT consortium) pertaining to marginal effects meta-analyses for BMI that had been performed separately by strata of smoking (45,968 smokers vs. 164,355 non-smokers). The genetic marginal effect estimates, calculated separately within each of the two strata, were compared using a heterogeneity test to infer the presence or absence of SNP × smoking interaction effects. The same analyses were performed using physical activity as a binary stratifying variable in up to 180,287 European-ancestry adults (42,065 physically active vs. 138,222 physically inactive). We calculated Spearman correlations between the P-values derived from the marginal effects meta-analysis and the Pint from the interaction effects meta-analysis (i.e., the between-strata heterogeneity test for SNP × smoking and SNP × physical activity interactions from the GIANT consortium); these tests were undertaken for all SNPs and those SNPs that were nominally significant (Pm<0.05) in the marginal effects analysis. We then performed enrichment analyses to test if the numbers of nominally significant (Pint<0.05) GWAS-derived SNPs from both SNP × physical activity and SNP × smoking analyses were greater than expected by chance under the binomial distribution. We further calculated the OR of having Pint<0.05 given Pv<0.05 versus Pv≥0.05 for both SNP × physical activity and SNP × smoking interaction analyses in a pruned set of TwinGene SNPs produced using the --indep-pairwise 50 5 0.8 command in PLINK.
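A between-strata heterogeneity test of this kind is often computed as a z-test on the difference between the stratum-specific effect estimates. The sketch below assumes that form and assumes the two strata are independent (no covariance term); the exact test used for the GIANT analyses is not specified in the text, and the function name and inputs are hypothetical.

```python
import math


def strata_difference_test(beta1, se1, beta2, se2):
    """Two-sided z-test for a difference between two stratum-specific
    effect estimates (e.g. smokers vs. non-smokers), assuming the
    estimates come from independent samples."""
    z = (beta1 - beta2) / math.sqrt(se1**2 + se2**2)
    p_int = math.erfc(abs(z) / math.sqrt(2))
    return z, p_int


# Hypothetical per-stratum effect sizes and standard errors (not study data).
z_stat, p_int = strata_difference_test(0.10, 0.03, 0.00, 0.04)
```

Because the standard error of a difference is larger than either stratum's own standard error, such tests need considerably larger samples than marginal effect tests to reach the same significance, which is one reason large external datasets were required here.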
Thereafter, we calculated the average rank for each SNP’s ranking on the Pint rank-ordered distributions from the SNP × smoking and SNP × physical activity interaction analyses and performed enrichment analysis using these average ranks with >95th centile instead of Pint<0.05 as the cut-off.
We simulated genetic data for 44,000 individuals from a pruned set of 50,335 SNPs with allele frequencies, effect estimates and Pm values drawn from the GIANT consortium. We generated an outcome trait by summing the products of the simulated allele counts and effect estimates over all SNPs for each individual, and subsequently added a randomly generated non-normal error term such that the trait resembles the observed distribution of the transformed BMI trait used in the main (real data) analyses. We also simulated a fixed binary interacting factor with 30% prevalence. Using this simulated dataset, we calculated Pm, Pv and Pint values for each SNP and undertook i) pairwise Spearman correlation analyses between Pm, Pv and Pint values (5,000 simulations), ii) enrichment analysis using binomial tests (2,500 simulations) and iii) Mann-Whitney U tests to determine systematic differences in Pv and Pm ranks (2,500 simulations). Following the same pipeline, we created additional simulated datasets narrowing down SNPs to i) those with Pm values from the lowest percentile (n = 504; highest Pm = 5×10−3) and to ii) genome-wide significant SNPs (n = 71; Pm<5×10−8), and tested the pairwise Spearman correlation for Pm, Pv and Pint values (1,000 simulations for both sets). Simulations were run using the statistical software R (v. 3.3.2).
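The core idea of such a simulation can be sketched for a single SNP. This is a deliberately reduced illustration and not the pipeline described above (which summed effects over 50,335 SNPs and was run in R): the function name, effect sizes, allele frequency and the shifted-exponential error term are all assumptions chosen to give a non-normal trait and a binary exposure with 30% prevalence.

```python
import random
from statistics import variance


def simulate_variances(n=40_000, maf=0.3, beta_g=0.1, beta_e=0.2,
                       beta_ge=0.5, prevalence=0.3, seed=1):
    """Simulate one SNP, a binary exposure and a trait with a G x E term,
    then return the trait variance within each genotype group."""
    rng = random.Random(seed)
    by_genotype = {0: [], 1: [], 2: []}
    for _ in range(n):
        # Genotype = number of minor alleles (Hardy-Weinberg proportions).
        g = int(rng.random() < maf) + int(rng.random() < maf)
        e = 1 if rng.random() < prevalence else 0
        # Skewed, mean-zero, unit-variance error to mimic a non-normal trait.
        noise = rng.expovariate(1.0) - 1.0
        y = beta_g * g + beta_e * e + beta_ge * g * e + noise
        by_genotype[g].append(y)
    return {g: variance(ys) for g, ys in by_genotype.items()}


with_interaction = simulate_variances()
without_interaction = simulate_variances(beta_ge=0.0)
```

With the interaction term present, the expected variance rises with the number of minor alleles; with `beta_ge=0.0` the expected variance is the same in all three groups even though the marginal effect `beta_g` is unchanged, which is the behavior the type I error simulations are designed to check.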
A. Quantile-quantile plot of Spearman correlation test P-values for ranks of Pm and Pv. The figure illustrates 5,000 Spearman correlation P values testing for correlation between Pm and Pv values drawn from a simulated dataset of 44,000 individuals and 50,335 SNPs. In the figure, the distribution under the null hypothesis is represented as a black line while its 95% confidence interval is represented as dashed gray lines. The dashed red line represents the correlation P value obtained from the “real data” analysis presented in the main text. B. Quantile-quantile plot of Spearman correlation test P-values for ranks of Pm and Pint. The figure illustrates 5,000 Spearman correlation P values testing for correlation between Pm and Pint values drawn from a simulated dataset of 44,000 individuals and 50,335 SNPs. In the figure, the distribution under the null hypothesis is represented as a black line while its 95% confidence interval is represented as dashed gray lines. C. Quantile-quantile plot of Spearman correlation test P-values for ranks of Pint and Pv. The figure illustrates 5,000 Spearman correlation P values testing for correlation between Pint and Pv values drawn from a simulated dataset of 44,000 individuals and 50,335 SNPs. In the figure, the distribution under the null hypothesis is represented as a black line while its 95% confidence interval is represented as dashed gray lines.
A. Quantile-quantile plot of binomial test P-values for enrichment of variants with Pv<0.05 among variants with Pm<0.05. The figure illustrates 2,500 binomial P-values testing for enrichment of variants with Pv<0.05 among all variants with Pm<0.05. Pv and Pm values were drawn from a simulated dataset of 44,000 individuals and 50,335 SNPs. The distribution under the null hypothesis is represented as a black line and its 95% confidence interval as dashed gray lines. B. Quantile-quantile plot of binomial test P-values for enrichment of variants with Pv<0.05 among variants with Pint<0.05. The figure illustrates 2,500 binomial P-values testing for enrichment of variants with Pv<0.05 among all variants with Pint<0.05. Pv and Pint values were drawn from a simulated dataset of 44,000 individuals and 50,335 SNPs. The distribution under the null hypothesis is represented as a black line and its 95% confidence interval as dashed gray lines. The dashed red line represents the P-value obtained from the “real data” analysis presented in the main text.
A. Quantile-quantile plot of Mann-Whitney U test P-values for systematic differences in Pv ranks among variants with top ranking and lower ranking Pm values. The figure illustrates 2,500 Mann-Whitney U P-values testing for systematic differences in Pv ranks between the variants with the most significant Pm values (100th percentile of the Pm distribution) and the remaining variants (1st–99th percentiles of the Pm distribution). Pv and Pm values were drawn from a simulated dataset of 44,000 individuals and 50,335 SNPs. The distribution under the null hypothesis is represented as a black line and its 95% confidence interval as dashed gray lines. The dashed red line represents the P-value obtained from the “real data” analysis presented in the main text. B. Quantile-quantile plot of Mann-Whitney U test P-values for systematic differences in Pm ranks among variants with top ranking and lower ranking Pv values. The figure illustrates 2,500 Mann-Whitney U P-values testing for systematic differences in Pm ranks between the variants with the most significant Pv values (100th percentile of the Pv distribution) and the remaining variants (1st–99th percentiles of the Pv distribution). Pv and Pm values were drawn from a simulated dataset of 44,000 individuals and 50,335 SNPs. The distribution under the null hypothesis is represented as a black line and its 95% confidence interval as dashed gray lines. The dashed red line represents the P-value obtained from the “real data” analysis presented in the main text.
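The enrichment and rank-difference tests summarized in the captions above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation. It assumes a one-sided exact binomial test with a null proportion of 0.05 and a two-sided Mann-Whitney U test using the normal approximation without tie correction. The P-values fed in are uniform random numbers standing in for simulated Pm and Pv values, and all function names are hypothetical.

```python
import math
import random


def log_binom_pmf(i, n, p):
    """Log of the Binomial(n, p) probability mass at i (log-space avoids overflow)."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binomial_enrichment_p(k, n, p0=0.05):
    """One-sided exact P-value: P(X >= k) for X ~ Binomial(n, p0)."""
    tail = sum(math.exp(log_binom_pmf(i, n, p0)) for i in range(k, n + 1))
    return min(1.0, tail)


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U P-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts pairs in which the x value exceeds the y value; ties count one half.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


# Placeholder P-values: independent uniforms, i.e. data generated under the null.
rng = random.Random(7)
pm = [rng.random() for _ in range(2000)]
pv = [rng.random() for _ in range(2000)]

# Enrichment: among SNPs with Pm < 0.05, how many also have Pv < 0.05?
selected = [v for m, v in zip(pm, pv) if m < 0.05]
k = sum(v < 0.05 for v in selected)
p_enrich = binomial_enrichment_p(k, len(selected))

# Rank difference: Pv values of the top 1% of SNPs by Pm versus the rest.
cut = sorted(pm)[len(pm) // 100]
top = [v for m, v in zip(pm, pv) if m < cut]
rest = [v for m, v in zip(pm, pv) if m >= cut]
p_mw = mann_whitney_p(top, rest)

print(f"enrichment P = {p_enrich:.3f}, Mann-Whitney P = {p_mw:.3f}")
```

Collecting `p_enrich` and `p_mw` over many simulated datasets and plotting their sorted values against uniform quantiles would give quantile-quantile plots of the kind described in the captions.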
S4 Fig. Quantile-quantile plots of Levene’s test P-values for SNP associations with lipid traits and BMI.
Associations between SNPs and BMI (A), LDL (B), HDL (C), TG (D), and TC (E) are presented. Only SNPs with N ≥ 26,000 samples for BMI and N ≥ 24,000 samples for lipid traits are shown. In each sub-figure, the distribution under the null hypothesis is represented as a black line and its 95% confidence interval as dashed gray lines.
S1 Table. Detailed results for known BMI, LDL-C, HDL-C, TG and TC loci.
S2 Table. Study design, number of participants and sample quality control for genome-wide association study cohorts.
S3 Table. Information on genotyping methods, quality control of SNPs, imputation, and statistical analysis.
- Conceptualization: DS WQD RM TVV GP PWF.
- Data curation: DS WQD.
- Formal analysis: DS WQD RM TVV JL EM AM.
- Funding acquisition: GP PWF.
- Methodology: DS WQD TVV GP PWF.
- Project administration: GP PWF.
- Resources: APM NGF CLi PKEM NLP GH AYC AEJ MG TWW LMR CLa LAC PMR NJW KKO RJFL DIC EI TOK RAS GP PWF.
- Software: DS WQD TVV.
- Supervision: GP PWF.
- Visualization: DS TVV.
- Writing – original draft: DS GP PWF.
- Writing – review & editing: DS WQD RM TVV JL EM AM APM NGF CLi PKEM NLP GH AYC AEJ MG TWW LMR CLa LAC PMR NJW KKO RJFL DIC EI TOK RAS GP PWF.
- 1. Franks PW, Mesa JL, Harding AH, Wareham NJ (2007) Gene-lifestyle interaction on risk of type 2 diabetes. Nutr Metab Cardiovasc Dis 17: 104–124. pmid:17011759
- 2. Kilpelainen TO, Qi L, Brage S, Sharp SJ, Sonestedt E, et al. (2011) Physical activity attenuates the influence of FTO variants on obesity risk: a meta-analysis of 218,166 adults and 19,268 children. PLoS Med 8: e1001116. pmid:22069379
- 3. Deng WQ, Pare G (2011) A fast algorithm to optimize SNP prioritization for gene-gene and gene-environment interactions. Genet Epidemiol 35: 729–738. pmid:21922538
- 4. Scott RA, Chu AY, Grarup N, Manning AK, Hivert MF, et al. (2012) No interactions between previously associated 2-hour glucose gene variants and physical activity or BMI on 2-hour glucose levels. Diabetes 61: 1291–1296. pmid:22415877
- 5. Yang J, Loos RJ, Powell JE, Medland SE, Speliotes EK, et al. (2012) FTO genotype is associated with phenotypic variability of body mass index. Nature 490: 267–272. pmid:22982992
- 6. Pare G, Cook NR, Ridker PM, Chasman DI (2010) On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women's Genome Health Study. PLoS Genet 6: e1000981. pmid:20585554
- 7. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861. pmid:17943122
- 8. Deng WQ., Asma S, and Paré G. (2014) Meta-analysis of SNPs involved in variance heterogeneity using Levene’s test for equal variances. European Journal of Human Genetics 22.3: 427–430. pmid:23921533
- 9. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, et al. (2015) Genetic studies of body mass index yield new insights for obesity biology. Nature 518: 197–206. pmid:25673413
- 10. Willer CJ, Schmidt EM, Sengupta S, Peloso GM, et al. (2013) Discovery and refinement of loci associated with lipid levels. Nat Genet 45: 1274–1283. pmid:24097068
- 11. Teslovich TM, Musunuru K, Smith AV, Edmondson AC, Stylianou IM, et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466: 707–713. pmid:20686565
- 12. Randall JC, Winkler TW, Kutalik Z, Berndt SI, Jackson AU, et al. (2013) Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet 9: e1003500. pmid:23754948
- 13. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, et al. (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316: 889–894. pmid:17434869
- 14. Ahmad S, Rukh G, Varga TV, Ali A, Kurbasic A, et al. (2013) Gene x physical activity interactions in obesity: combined analysis of 111,421 individuals of European ancestry. PLoS Genet 9: e1003607. pmid:23935507
- 15. Young AI, Wauthier F, Donnelly P (2016) Multiple novel gene-by-environment interactions modify the effect of FTO variants on body mass index. Nat Commun 7: 12724. pmid:27596730
- 16. Qi Q, Kilpelainen TO, Downer MK, Tanaka T, Smith CE, et al. (2014) FTO genetic variants, dietary intake and body mass index: insights from 177,330 individuals. Hum Mol Genet 23: 6961–6972. pmid:25104851
- 17. Tanaka T, Ngwa JS, van Rooij FJ, Zillikens MC, Wojczynski MK, et al. (2013) Genome-wide meta-analysis of observational studies shows common genetic variants associated with macronutrient intake. Am J Clin Nutr 97: 1395–1402. pmid:23636237
- 18. Sun X, Elston R, Morris N, Zhu X (2013) What is the significance of difference in phenotypic variability across SNP genotypes? Am J Hum Genet 93: 390–397. pmid:23910463
- 19. Marigorta UM, Gibson G (2014) A simulation study of gene-by-environment interactions in GWAS implies ample hidden effects. Front Genet 5: 225. pmid:25101110
- 20. Levene H (1960) Robust tests for equality of variances. In: Olkin I, editor. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Stanford, CA: Stanford University Press. pp. 278–292.
- 21. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575. pmid:17701901
- 22. Kleinbaum DG, Kleinbaum DG (2007) Applied regression analysis and other multivariable methods. Australia; Belmont, CA: Brooks/Cole. xxi, 906 p. p.
- 23. Justice AE., et al. (2017) Genome-wide meta-analysis of 241,258 adults accounting for smoking behavior identifies novel loci for obesity traits." Nat Commun 8: 14977. pmid:28443625
- 24. Graff M, et al. (2017) Genome-wide physical activity interactions in adiposity―A meta-analysis of 200,452 adults. PLoS Genetics 13.4: e1006528. pmid:28448500
- 25. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.