Fig 1.
Statistical power of the omnibus and sumZg tests under several simulation scenarios.
Power of the omnibus and sumZg tests as a function of sample overlap. Line color indicates the test. Each panel corresponds to a simulation scenario. Point shape indicates whether the residual covariance was generated to be partially aligned with the genetic covariance or left unconstrained (random). A 10-trait Z-score vector was generated as the sum of a genetic effect and a residual effect: Z = Zg + Zres. Zres was sampled from a multivariate normal distribution with covariance matrix terms equal to $\rho\, n_s / \sqrt{n_1 n_2}$, where $n_1$ is the sample size of the first study, $n_2$ is the sample size of the second study, and $\rho$ is the phenotypic covariance among the $n_s$ overlapping samples. Zg varied depending on the simulation scenario: (eff ~ Random) Zg was sampled from a uniform distribution on [-6, 6]; (eff ~ corG) Zg was sampled from a multivariate normal distribution with a random covariance matrix; (eff ~ wG) Zg was sampled along a straight line blurred with normal noise; (eff ~ high H) Zg was sampled from a multivariate normal distribution simulating genetically uncorrelated traits with high heritability; (eff ~ het H) Zg was sampled from a multivariate normal distribution simulating genetically uncorrelated traits, with only the first having a high heritability.
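The simulation design described in this caption can be illustrated with a minimal sketch. It assumes the overlap-induced covariance between two studies takes the commonly used form ρ·ns/√(n1·n2), uses unit variances on the diagonal, and reduces the 10-trait setting to two traits for readability. The function names (`overlap_cov`, `cholesky`, `simulate_z`, `sumz_stat`) are hypothetical and are not taken from the authors' code; `sumz_stat` shows one common form of a weighted sum of Z-scores, standardized by its null standard deviation.

```python
import math
import random


def overlap_cov(n1, n2, ns, rho):
    """Assumed covariance between two studies' Z-scores due to ns shared samples."""
    return rho * ns / math.sqrt(n1 * n2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_z(zg, cov, rng):
    """Return Z = Zg + Zres, with Zres drawn from N(0, cov)."""
    low = cholesky(cov)
    e = [rng.gauss(0.0, 1.0) for _ in zg]
    zres = [sum(low[i][j] * e[j] for j in range(i + 1)) for i in range(len(zg))]
    return [g + r for g, r in zip(zg, zres)]


def sumz_stat(z, w, cov):
    """Weighted sum of Z-scores divided by its standard deviation under the null."""
    k = len(w)
    num = sum(wi * zi for wi, zi in zip(w, z))
    var = sum(w[i] * cov[i][j] * w[j] for i in range(k) for j in range(k))
    return num / math.sqrt(var)


# Illustrative values only (not from the paper): two studies sharing 5,000 samples.
c = overlap_cov(n1=10_000, n2=20_000, ns=5_000, rho=0.4)
cov = [[1.0, c], [c, 1.0]]
z = simulate_z(zg=[3.0, -1.0], cov=cov, rng=random.Random(0))
stat = sumz_stat(z, w=[1.0, 1.0], cov=cov)
```

Power in such a design would then be estimated as the fraction of simulated vectors whose test statistic exceeds the chosen significance threshold.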
Fig 2.
The diagram presents the overall analysis pipeline. A total of 36 GWAS were included, covering several common diseases and quantitative traits. All GWAS summary statistics went through extensive pre-processing and quality control filtering, and missing single-SNP statistics were imputed when possible. Multitrait approaches were then applied to all clean GWAS data and to each clinically based set (All, Immunity, Metabolism, Brain, Cardiovascular, Anthropometry, and Composite). After combining univariate and multivariate results and merging SNPs within each locus, a total of 6,767 associations were identified. After comparing results across approaches, a clustering analysis was performed for variants within each set. Finally, we performed an in-silico functional analysis of the clusters derived in the Metabolism set to assess their biological relevance.
Fig 3.
Multitrait approach comparison.
Panel (A) shows independent variants detected across the six approaches: univariate test (univ), omnibus test (omni), and weighted sum of Z-scores with uniform weights (sumZ1), with weights defined as the loadings of the first principal component of the phenotypic correlation (sumZr) or of the genetic correlation (sumZg), or with weights defined using the loadings of an independent component analysis (sumZica). Each row corresponds to a test and each column to a set of significant variants. For each set, the tests for which the variants are significant are marked with a black dot on the test row. The barplot on the left represents the total number of significant independent signals detected by each approach. The stacked bar at the top represents the cardinality of the sets. The next panels show the link between the strength of the univariate association signals and the relative performance (i.e. larger power) of four tests (univ, omni, sumZg, and sumZica) for each phenotype set: anthropometry (B), cardiovascular (C), immunity (D), metabolism (E), brain (F), composite (G), and all phenotypes (H). Within each phenotype set, we split the top associated SNPs per region based on the most significant test, and derived the median chi-squared for each test. The radar plots show the derived median per test and illustrate the strong heterogeneity in the patterns identified. For example, out of the 1605 SNPs from the anthropometry set, 1235 had a stronger signal with univ than with the other tests. The median chi-squared values in that group were 49.1, 1.1, 2.0, 1.0, and 0.7 for height, body mass index (BMI), hip circumference (Hip), waist circumference (WC), and waist to hip ratio (WHR), respectively. Comparatively, the 267 SNPs harboring a stronger signal with omnibus had medians of 6.8, 20.1, 15.9, 11.2, and 7.2 for the same phenotypes.
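The grouping step behind the radar plots can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes each top SNP carries one p-value per test and one univariate Z-score per phenotype, and that the per-phenotype chi-squared is the squared Z-score (1 degree of freedom). The input layout and the function name `median_chi2_by_best_test` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_chi2_by_best_test(snps):
    """Group SNPs by their most significant test, then take per-phenotype median Z^2.

    snps: list of dicts with keys
      "pvalues": {test_name: p_value}
      "z":       {phenotype: univariate Z-score}
    """
    groups = defaultdict(list)
    for snp in snps:
        # The "best" test is the one with the smallest p-value for this SNP.
        best = min(snp["pvalues"], key=snp["pvalues"].get)
        groups[best].append(snp["z"])

    result = {}
    for test, z_rows in groups.items():
        phenotypes = list(z_rows[0].keys())
        result[test] = {p: median(row[p] ** 2 for row in z_rows) for p in phenotypes}
    return result


# Toy input (made-up numbers, for illustration only).
toy_snps = [
    {"pvalues": {"univ": 1e-10, "omni": 1e-3}, "z": {"height": 7.0, "BMI": 1.0}},
    {"pvalues": {"univ": 1e-12, "omni": 1e-4}, "z": {"height": -6.0, "BMI": 2.0}},
    {"pvalues": {"univ": 1e-2, "omni": 1e-9}, "z": {"height": 2.0, "BMI": 5.0}},
]
medians = median_chi2_by_best_test(toy_snps)
```

Each inner dictionary corresponds to one polygon of a radar plot: one axis per phenotype, one polygon per group of SNPs sharing the same best-performing test.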
Fig 4.
Multitrait genetic association clusters for the Metabolism set.
The panels summarize the clustering of the 392 independent SNPs selected from the Metabolism set analysis. The set includes 10 phenotypes: triglyceride (TG), total cholesterol (TC), type 2 diabetes (T2D), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), glycated hemoglobin (HbA1c), homeostasis model assessment of β-cell function (HOMA-B), homeostasis model assessment of insulin resistance (HOMA-IR), fasting insulin, and fasting glucose. The alluvial plot in panel A) represents the decomposition of the univariate genetic associations and their rewiring to the six inferred clusters. The flow widths represent the proportion of each phenotype’s variance explained by the subset of SNPs assigned to each specific cluster, relative to the total genetic variance explained by all 392 SNPs. For example, SNPs from cluster 6 capture approximately 41.7% and 54.6% of that genetic variance for TC and LDL, respectively. For clarity, flows explaining less than 0.1% of the variance are not represented. Panel B) shows the heatmap of normalized beta coefficients per phenotype within each cluster. Each column is a SNP, with blue and red colors indicating negative and positive beta, respectively. Coded alleles have been defined according to the per-cluster first principal component. The boxplots in panel C) show the per-cluster distribution of SNP entropy, an indicator of the fitness of the SNP-cluster assignment. SNPs perfectly assigned are expected to have entropy close to zero.
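The entropy indicator in panel C) can be illustrated with a short sketch. The caption does not give the exact definition, so this assumes the Shannon entropy (natural logarithm) of each SNP's cluster membership probabilities; the function name `assignment_entropy` and the example probabilities are hypothetical.

```python
import math


def assignment_entropy(probs):
    """Shannon entropy of one SNP's cluster membership probabilities.

    probs: non-negative weights over clusters; they are renormalized to sum to 1.
    Returns 0 for a SNP assigned to a single cluster with certainty, and
    log(number of clusters) for a SNP spread evenly over all clusters.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("membership weights must have a positive sum")
    return sum(-(p / total) * math.log(p / total) for p in probs if p > 0)


# Six clusters, as in the Metabolism set; probabilities are made up.
confident = assignment_entropy([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
ambiguous = assignment_entropy([1 / 6] * 6)
```

Under this assumed definition, an entropy near zero flags a SNP that clearly belongs to one cluster, while values approaching log(6) flag SNPs that the clustering cannot place.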
Fig 5.
Heterogeneity of genetic correlation across clusters for the Metabolism and Immunity sets.
We derived the genome-wide genetic correlation between phenotypes using LD score regression and using Pearson correlation from all SNP Z-scores (top panels), and for SNPs within the identified clusters. Results for the Metabolism set are presented in panel (A) using only the four key traits: LDL, HDL, triglyceride (TG), and type 2 diabetes (T2D). Results for the Immunity set are presented in panel (B). For clarity, only significant correlations are represented. The thickness of each line is proportional to the strength of the genetic correlation. Positive correlations are represented in blue and negative correlations in red. The values of the genetic correlation are indicated by the number next to the trait. Solid lines represent correlations that are significant after Bonferroni correction, whereas dashed lines represent correlations that are significant only before Bonferroni correction. Note that because the clusters are inferred from the multivariate associations, the absolute value of the significance of the correlations is of limited interest. Nevertheless, it provides a useful descriptive statistic to identify the key structures within each cluster.
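The Pearson-based estimate mentioned in this caption can be sketched in a few lines. This is a minimal illustration, assuming two equal-length lists of Z-scores for the same SNPs; the Bonferroni helper simply divides the significance level by the number of tests, and the number of tests used here (six pairs among four traits) is an assumption, not a figure stated in the source.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of Z-scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Made-up Z-scores for SNPs of one cluster, for two traits.
z_ldl = [5.0, -4.0, 6.0, -5.5]
z_hdl = [-2.0, 1.5, -3.0, 2.5]
r = pearson(z_ldl, z_hdl)

# Assumed: 4 traits give 4 * 3 / 2 = 6 pairwise correlations to correct for.
threshold = bonferroni_threshold(0.05, 6)
```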
Table 1.
Top tissue associations and immune-related genes by cluster for the Immunity set.
Fig 6.
We projected each cluster’s genes from the Metabolism phenotype set onto KEGG pathways and reconstructed a synthetic metabolic map. Panel A) presents the results for the lipoprotein component and panel B) for the lipid component. Gene names are highlighted with the colors of their associated clusters. When a gene is associated with several SNPs belonging to different clusters, it is represented with several colors. To improve interpretation, we also present in panel C) a proxy for the relative contribution of each phenotype per cluster, defined as the loadings of the first principal component derived from the matrix of Z-scores for the subset of SNPs in that cluster. Finally, panel D) shows the distribution of standardized beta for the association between SNPs from each cluster and three diseases: any stroke (AS), coronary artery disease (CAD), and obesity (using body mass index as a proxy).
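The per-cluster proxy shown in panel C) can be sketched as the leading eigenvector of the covariance of a cluster's Z-score matrix. The sketch below uses plain power iteration to stay dependency-free; it assumes column centering and a sign convention (largest-magnitude loading positive) that the caption does not specify, and the function name `first_pc_loadings` is hypothetical.

```python
import math


def first_pc_loadings(z, n_iter=500):
    """Loadings of the first principal component of a SNP x phenotype Z-score matrix.

    z: list of rows (one per SNP), each a list of Z-scores (one per phenotype).
    Uses power iteration on the column-centered sample covariance matrix.
    """
    n, k = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(k)]
    x = [[row[j] - means[j] for j in range(k)] for row in z]
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(k)] for a in range(k)]

    # Non-uniform start vector: a uniform start would be exactly orthogonal to
    # the leading component for perfectly anti-correlated phenotypes.
    v = [1.0 / (j + 1) for j in range(k)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break  # degenerate case: no variance along the current direction
        v = [t / norm for t in w]

    # Assumed sign convention: the largest-magnitude loading is positive.
    if max(v, key=abs) < 0:
        v = [-t for t in v]
    return v


# Made-up Z-scores for three SNPs and two phenotypes with opposite effects.
loadings = first_pc_loadings([[1.0, -1.0], [2.0, -2.0], [-3.0, 3.0]])
```

In practice an eigendecomposition routine from a numerical library would be the usual choice; power iteration is used here only to keep the example self-contained.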
Table 2.
Drug target genes and associated SNPs in the Metabolism set.