Multitrait GWAS to connect disease variants and biological mechanisms

Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.


Introduction
Genome-wide association studies (GWASs) have identified thousands of significant genetic associations for multiple traits and diseases [1]. Publicly available summary statistics from these GWASs have proven to be invaluable in human genetic studies, enabling a range of secondary analyses without requiring individual-level genotype data and thus, averting major practical and ethical issues [2]. Among others, the estimation of phenotype heritability [3], the derivation of polygenic risk score [4], and the assessment of causal relations between phenotypes [5] are paragons of their critical utility. GWAS summary statistics have also been extremely useful to investigate pleiotropy and genetic correlation between human phenotypes. For example, recent works assessed whether significant loci for a given phenotype are also associated with other traits [6,7] while others estimated genome-wide [8,9] and regional [10] genetic correlations among phenotypes. The joint test of multiple traits can also be an efficient way to detect genetic variants missed by univariate screening [11][12][13][14][15][16][17][18][19][20][21][22][23], especially those with association patterns that deviate from the observed phenotypic correlation [24][25][26]. Nevertheless, while simulation studies and examples from real data applications in best case scenarios have confirmed the relevance of multitrait association tests, they have seldom been applied to large-scale datasets.
The application of multitrait association tests to a large heterogeneous set of traits requires overcoming several practical issues including careful preprocessing of individual GWAS summary statistics to avoid statistical artifacts, the estimation of multiple global parameters, and addressing widespread missing summary statistics. We addressed these issues in recent studies, developing the RAISS [27] approach for imputation, and JASS preprocessing and multitrait analysis pipeline software [28]. Nevertheless, the relative performances of existing multitrait tests in real data have not been fully addressed. In brief, two types of methods have been developed, weighted sum of univariate statistics, assuming a specific multivariate genetic effect distribution [12,14,15], and an omnibus approach, allowing for one degree-of-freedom per statistic [11,29], with some approaches using a combination of both [30]. An extensive and fair comparison of these methods is challenging as most face some of the aforementioned practical issues, no readily available implementation [28], and power in real data highly depends on the true (and unknown) multitrait genetic effect distribution. Finally, in addition to the potential ability to detect new variants, there is increasing interest in using GWAS multitrait association to decipher inter-and intra-phenotype genetic architecture [31,32]. Again, real data applications are scarce, and questions remain regarding the approach to be used, the detectability of the multitrait genetic structure behind genome-wide genetic correlation, their potential matching to biological mechanisms, and their potential clinical utility.
Here, we build on previous works to conduct a large-scale multitrait analysis using GWAS summary statistics from 36 phenotypes categorized into five clinically relevant sets (Immunity, Anthropometry, Metabolism, Cardiovascular and Brain). We implemented five tests, an omnibus K degree freedom test (for K GWAS analyzed jointly) similar to [11,29], and a weighted approach using four alternative weighting schemes, including some partly similar to previously proposed ones [12,14]. Comparing the relative performances of these models, we found substantial specificity of the signal identified by each approach, in terms of both association patterns and expressed tissue enrichment. We then used a Gaussian mixture model on the phenotypes with a variant association matrix to identify potential clusters of variants displaying similar genetic multitrait association profiles. In-depth functional analysis of the resulting clusters demonstrates a connection between those profiles and tissue specific expression. This breakdown of multitrait association signals highlighted how the overall genetic correlation between phenotypes can be decomposed into likely distinct genetic pathways. Finally, we used the phenotypes from the Immunity and Metabolism sets as case studies to demonstrate the matching between the identified profile and known biological pathways. Notably, mapping SNPs with unknown functions to pleiotropy profiles can indicate putative pathways. We conclude by investigating the potential clinical utility of the identified clusters for drug targeting.

Multitrait genetic association test
The first step of our study consisted of ensuring the validity of the proposed statistical tests by studying potential bias, and assessing their statistical power under several simulation scenarios. Let us denote the vectors of single nucleotide polymorphism (SNP) Z-scores z = (z 1 ,. . .,z K ,), where K is the number of phenotypes (i.e., the number of GWASs analyzed jointly). The first model we used, which we refer to as sumZ, assumes that genetic effects across the phenotype analyzed follow a direction specified by a vector of weights w to form a weighted sum of Zscores. Here we considered four weighting schemes: i) uniform weighting (sumZ 1 ); ii) weighting according to the first principal component of the phenotypic correlation matrix (sumZ r ); iii) weighting according to the first principal component of the genetic correlation matrix (sumZ g ); and iv) weighting according to the independent component analysis of the Z-scores matrix (sumZ ica ). The second approach, which we refer to as omnibus, does not rely on a prior specification on the direction or magnitude of the SNP effect across traits. In brief, it compares, for one SNP, the vector of genetic effects z with the expected multivariate normal distribution under the null. It is a standard omnibus test based on summary statistics that allows for one degree of freedom per outcome (here per phenotype). Both approaches (sumZ and omnibus) rely on a valid estimation of S r , the variance-covariance matrix between z 1 ,. . .,z K , under the null hypothesis of no association, and share similarities with previous approaches (see S1 Text).
We first performed an in-depth validation of each approach, starting with a series of simulations under an ideal situation, when there were no missing data and the true Z-score covariance matrix, S r is known (S1-S3 Figs). We further show using both simulated data (S4 and S5 Figs and S1 Text) and real data from the UK Biobank cohort that in the specific case of complete sample overlap between GWASs, the omnibus test is asymptotically similar to a multivariate analysis of variance (MANOVA) applied to individual level data (S6 Fig and S1 Text). The only major potential source of bias we identified is the misspecification of S r which can lead to severe type I error inflation (S7 Fig). Misspecification can affect all variants, if S r is estimated naively using the z-score data as proposed in previous studies (S8 Fig). Comparing various approaches, we found that S r can be accurately estimated using LDscore regression [9] (S9  Fig), and the approach was therefore used to estimate S r along the genome-wide genetic correlation (S g ) for the 36 phenotypes analyzed (S2 and S3 Tables). Nevertheless, misspecification can also be SNP-specific when sample size varies across the SNPs analyzed. Per-SNP sample size heterogeneity can induce different proportions of sample overlap and potentially invalidate S r for those variants. We illustrate this potential bias by applying omnibus tests for the joint analysis of the four GWASs from the GC consortium [33] (S10 Fig). To address this issue, we implemented additional tools to estimate sample size per SNP when missing and subsequently filtered the variants with small sample sizes (S11 andS12 Figs and S1 Text). Finally, out of 10 million variants reported for some GWAS, fewer than 1,000 had complete summary statistics for all 36 phenotypes analyzed. While methods exist to impute missing GWAS statistics, we found them to be inaccurate for multitrait analyses and we implemented the RAISS [27] approach we recently developed to ensure valid imputation for our context (S13 Fig). All preprocessing steps were recently incorporated into a publicly available toolset [28]. After applying our preprocessing pipeline to all 36 GWAS analyzed, there remained 6,978,319 SNPs with a missing data rate of 45% (59% before imputation).
Next, to illustrate the relative detection ability of each approach, we conducted a series of simulation studies across various scenarios. S14 Fig represents the null hypothesis rejection boundaries of each test using a simple example with two traits, when the genetic effect follows a single bivariate normal distribution and when it follows a mixture of two bivariate normal distributions (to reflect our working hypothesis). In this setting, the omnibus test displays the largest detection rate. Among the sumZ tests, sumZ g and sumZ ica show better performances than sumZ 1 and sumZ r , especially when the structure of the genetic signal is heterogeneous. We then conducted more extensive simulations focusing on the omnibus and SumZ g tests. Again, the omnibus test shows the best performances, especially when all traits have a high heritability and if the genetic signal is not structured along a specific direction. The statistical power of the omnibus test also tends to increase with the sample overlap especially when the environmental correlation was not aligned with the genetic correlation. Interestingly this good performance was also observed for genetically uncorrelated traits when they have a high heritability. In the case of a highly structured genetic effect sampled along a specific direction SumZ g and the omnibus test performed similarly. However, when only one of the 10 traits had a high heritability, the omnibus test underperformed compared to the SumZ g , reflecting the cost of additional degrees of freedom in the omnibus test. Power of the omnibus and sumZ g tests with respect to sample overlap. Color of the line represents the test. Each panel correspond to a simulation scenario. Point shape indicates if the residual covariance was generated to be partially aligned with the genetic covariance or to be unconstrained (random). A 10 trait Z-score vector was generated as the sum of a genetic effect and a residual effect: Z = Z g + Z res . Z res was sampled from a normal multivariate distribution with covariance matrix terms equal to s z ¼ rn s = ffi ffi ffi ffi ffi ffi ffi ffi ffi n 1 n 2 p where n 1 is the sample size of the first study, n 2 is the sample size of the second study and ρ is the phenotypic covariance among the n s overlapping samples. Z g varied depending on the simulation scenario: (eff~Random) Z g were sampled from a uniform distribution with boundary [-6; 6], (eff~corG) Z g was sampled from a normal multivariate distribution with a random covariance matrix, (eff~wG) Zg was sampled along a straight line blurred with a normal noise, (eff~high H) Z g was sampled from a normal multivariate distribution simulating genetically uncorrelated traits with high heritability, (eff~het H) Z g was samples from a normal multivariate distribution simulating genetically uncorrelated trait with only the first having a high heritability. https://doi.org/10.1371/journal.pgen.1009713.g001

Comparison of multitrait association in real data
We analyzed the 36 GWASs of European ancestry (S1-S3 Tables) using the aforementioned multitrait approaches applied to seven phenotype sets: five medical-based sets (Immunity, Anthropometry, Metabolism, Cardiovascular and Psychiatric), a BMI related set including anthropometry traits and lipids (referred further as the Composite set), and finally all 36 phenotypes jointly (Fig 2). Note that we included bone mineral density traits in the Immunity set because an enrichment of BMD genome wide significant loci in immune pathways and immune cell regulatory regions has been previously reported [34,35]. We derived the overlap of significant loci of the multitrait tests per phenotype set (S15-S21 Figs), and after merging all analyses (Fig 3A). We applied a Bonferroni correction to the joint tests and used a p-value threshold of 10 −8 instead of the standard 5x10 -8 . Univariate phenotype associations were included in the comparison using the minimum of univariate p-value across all outcomes (noted P univ ). Across all phenotype sets, 391 associations were identified by the multitrait tests only, 392 were identified by univariate association tests only, and 1557 were significant for both univariate and multitrait tests (see Fig 3A). The largest number of new associations was detected by the omnibus test. The performances of the sumZ tests varied substantially depending on the phenotype set. For example, the weighting scheme based on phenotypic correlation (SumZ r ), detects slightly more signals than other weights for the Immunity set (S19 Fig) but The diagram presents the overall analysis pipeline. A total of 36 GWAS were included covering several common diseases and quantitative traits. All GWAS summary statistics went through extensive preprocessing and quality control filtering, and missing single SNP statistics were imputed when possible. Multitrait approaches were then applied to all clean GWAS data and on each clinically based set (All, Immunity, Metabolism, Brain, Cardiovascular, Anthropometry, and Composite). After combining univariate and multivariate results, and merging SNPs within locus, a total of 6,767 associations were identified. After a comparison of results per approach, a clustering analysis was performed for variants within each set. Finally, we performed in-silico functional analysis of the clusters derived in the Metabolism set to assess their biological relevance.
https://doi.org/10.1371/journal.pgen.1009713.g002 shows independent variants detected across the six approaches: univariate test (univ), omnibus test (omni), weighted sum of Z-score with uniform weight (sumZ 1 ), weight defined as the loading of the first principal component of the phenotypic correlation (sumZ r ), the genetic correlation (sumZ g ), or defined using the loadings of an independent component analysis (sumZ ica ). Each line corresponds to a test and each column to a set of significant variants. For each set, the test for which variants are significant are represented with a black dot on the test line. The barplot at the left represents the total number of significant independent signals detected by each approach. The stack bar at the top represents the cardinality of the sets. The next panels show the link between strengths of univariate association signal and the relative performance (i.e. larger power) of the four most tests: univ, omni, sumZ g , and sumZ ica , for each phenotype set: anthropometry (B), cardiovascular (C), immunity (D), metabolism (E), brain (F), composite (G), and all phenotypes (H). Within each phenotype set, we split the top associated SNPs per region based on the most significant test, and derived the median chi-squared for each test. The radar plots show the derived median per test and illustrate the strong heterogeneity in patterns identified. For example, out of the 1605 SNPs from the anthropometry set, 1235 had stronger signal with univ as compared with other tests. The median chi-squares in that group were 49.1, 1.1, 2.0, 1.0, and 0.7 for height, body mass index (BMI), hip circumference (Hip), waist circumference (WC), and waist to hip ratio (WHR). Comparatively, the 267 SNPs harboring a stronger signal with omnibus, had median of 6.8, 20.1, 15.9, 11.2, and 7.2 for the same phenotypes.

PLOS GENETICS
fewer associations in other phenotype sets (Fig 3A). While the Omnibus detected the largest number of new associations, the substantial share of signals found by other models suggests that applying several multivariate tests, especially the combination omnibus, sumZ ica , and sumZ g , could be an interesting solution to maximize detection. Finally, we checked the 392 associations identified by the multitrait test only in these data against previously reported associations from the GWAS catalog [1] for the same phenotypes. Altogether, we report a total of 322 new associations (S4-S10 Tables).
To further understand the relative performances of those three tests (omnibus, sumZ ica , sumZ g ) along the univariate test, we explored which multitrait signal was associated with the largest increase in detection per test. To this end, we listed all loci found associated with at least one of the four approaches, and assigned each locus to a test based on the lowest p-value. We then derived the median squared z-score by phenotype across the loci assigned to each test. As shown in Fig 3B-3H, the median pattern varied substantially across tests and phenotype sets. Higher power for the univariate test was, as expected, observed for strong association signals for a single phenotype, and mostly reflected a very large sample size for that phenotype and/or a strong heritability (e.g. height in the anthropometry set, Fig 3B, or atrial fibrillation in the cardiovascular set, Fig 3C). A strong association signal for the omnibus test was linked to the inclusion of correlated phenotypes and sample overlap, resulting in a high residual covariance (S r , S2 Table). For example, the median squared z-score was elevated for any stroke (AS), any ischemic stroke (AIS) and cardioembolic stroke (CES) in the Cardiovascular set. The patterns preferentially detected by the sumZ g test are harder to interpret. However, we noticed that sumZ g displays a strong signal for SNPs associated with physiologically related traits (e.g., T2D and fasting glucose in the metabolism set, Fig 3E, or bone mineral density of neck and spine in the immunity set, Fig 3D).
To confirm the relevance of the associations detected by multivariate tests, we also conducted a tissue enrichment analysis to significant variants identified by the multitrait approaches and by the univariate analyses separately (S11 and S12 Tables). Overall, univariate variants and multitrait variants harbored a very similar functional enrichment landscape (S22 Fig). Most enriched tissues are already known to be involved in the phenotype in question, including liver, fat and pancreas for the Metabolism set, immune cell types and thymus for the Immunity set, and heart for the Cardiovascular set. Our enrichment study also confirmed less obvious observations, which have nevertheless been noted before: the involvement of immunity in brain-related traits (e.g. autisms and schizophrenia) [36,37] and the overrepresentation of brain tissues in the Metabolism set [38,39].

Distinct genetic association profiles correspond to distinct genetic correlations
Our comparison of approaches highlights that associated genetic variants display a broad range of multitrait association profiles. We investigated how these profiles can be broken down into groups of homogeneous multivariate genetic effects. This is directly related to the principle of genetic correlation, which quantifies the concordance of genetic effects across traits (e.g. [9]). The difference here, is that genetic correlation captures only the average over the whole genome, and as discussed in recent studies, more localized genetic structures likely exist for many pairs of traits [10]. To detect such a structure, we implemented a multivariate Gaussian mixture model (MGMM) [40] for the identification of clusters among SNPs found associated with at least one approach. We applied MGMM assuming between 2 and 10 clusters and used the BIC and silhouette criteria to determine the most relevant number of clusters. We further bootstrapped the computation of the clustering criteria to ensure the robustness of the selection (S1 Text). The best suited number of clusters is 6, 8, 8, 9,  We assessed the uncertainty in cluster assignation by deriving the entropy for each variant (see S1 Text). We observed some heterogeneity in the distribution of entropy values across phenotype sets and clusters (Fig 4C and S16 and S17 Tables). The difficulty to attribute a cluster for certain variants might be due to the lack of representative cluster (i.e. sub-structure not captured in our analysis) or to shared functionalities between clusters that are not modelled in the GMM framework. Because differentiating those two possibilities using the available data would be very challenging, we decided to remove outlier variants (N = 28 across all sets) with entropy above 0.75 from further analyses.
The resulting clustering is presented in  both the shared explained variance between phenotypes and the proportion of explained variance by clusters for each phenotype. The complete list of SNPs for the Metabolism set per cluster is presented in S16 Table. The multivariate effects vary substantially from one cluster to another. For instance, in Metabolism clusters, SNPs from the cluster 3 display increased HDL-C and decreased triglycerides, while SNPs from cluster 5 are more specific to triglycerides.
The alluvial figures and heatmaps provide an overview of the magnitude of genetic effects from one cluster to another. To further characterize the concordance or discordance of genetic contributions across phenotypes, we computed the pairwise SNP-based genetic correlations for each cluster (see S1 Text).  rheumatoid arthritis (RA), ulcerative colitis (UC) and Crohn's disease (CD) provide a striking illustration of how the genome-wide genetic correlation can be composed of smaller structures. The genome-wide genetic correlations between UC and CD is strong (0.41), but near 0 and not significant for RA (see S3 Table). In Fig 5B, we noticed a fairly large negative correlation in clusters 2 and 3 between RA both CD and UC whereas, the cluster 5 captures a group of variants displaying a strong positive correlation across the three traits. Similar negligible genome-wide correlations along with opposite genetic correlations across clusters were observed in the Metabolism set. For example, variants from cluster 1 display a strong concordant effect between LDL and T2D, but variants from cluster 6 harbor an equally strong negative correlation. Fig 5 also highlights that significant genome-wide genetic correlations across highly related phenotypes such as UC-CD and LDL-TG are not distributed evenly across variants.

Biological meaning of genetic clusters
We conducted series of in silico functional analyses with the objective of mapping clusters to candidate biological functions. For each phenotype set, we evaluated two types of enrichment: tissue-specific chromatin mark enrichment per cluster (S13 Table) and a pathway enrichment framework (S14 and S15 Tables) which integrates multiple databases such as Gene Ontology (GO) and KEGG. Here, we focused on the Immunity and Metabolism sets as a case study. Given a phenotype set, while using the same set of traits for all clusters, we observed large differences in pathways, tissues and cell type enrichments between clusters.
For the Immunity set, clusters 1 and 4 predominantly capture genetic effects of bone-mineral density; clusters 2, 3 and 5 affect inflammatory bowel disorder (IBD); and clusters 6, 7 and 8 capture variants with pleiotropic effects on rheumatoid arthritis and IBD (S27 Fig). Both enrichment analyses pointed toward an overrepresentation of the immune system with all clusters-even those affecting primarily bone-mineral density-being enriched for at least one immunologic pathway or one immunological tissue. We highlight the top enriched tissues and top pathways in Table 1. Concerning pathway enrichment, immune related pathways regulating the shape of the immune response, such as cytokines and the JAK-STAT signaling pathway were recurrent. Interestingly, variants from those clusters map to a distinct set of cytokines and cluster of differentiation genes (e.g., IL4, IL13, IL33 for cluster 1 and IL3, IL5, IL10, IL19, IL20, IL21, IL27 for cluster 5), which suggests that they may impact different components of the immune system. Concerning tissue-specific active chromatin mark enrichment, clusters 2 and 3 contained multiple SNPs enriched primarily in transcriptionally active regions of "primary natural killer cells from peripheral blood" whereas cluster 7 and 8 are enriched for "primary T helper cells." We also observed enrichment in the tissue where immune damage occurred for cluster 5 (colonic mucosa), which highlights the complex interaction between the immune system and the inflamed tissue.
The Metabolism set includes several molecular phenotypes, which we expect to be closer to biological mechanisms than some of the macrophenotypes from other sets. Overall, cluster 1 is mostly associated with an increase in fasting glucose and impaired β-cell function; cluster 2 is highly pleiotropic and notably increased the risk of T2D; and clusters 3 to 6 are mostly associated with lipids (Fig 4). Accounting for the direction of effects, we also noted that the genetic associations in cluster 5 match the known phenotypic correlation with the inverse relationship between circulating levels of HDL-C with those of LDL-C and, more especially, TGs observed in epidemiological studies [41]. At the tissue level, we observed modest enrichment for adipocytes in clusters 1 and 2 (FDR p-value 0.028 and 0.01 respectively, S13 Table) and cluster 3 SNPs are up-regulated in the liver (FDR p-value 0.005).
As shown in S14 Table, each cluster was significantly enriched for a large number of GO terms. We report some specific and illustrative examples: cluster 1 is enriched for the carbohydrate homeostasis set (q-value = 2.5 x 10 −3 ), cluster 3 is enriched for the reverse cholesterol transport set (q-value = 2.8 x 10 −13 ), cluster 4 is enriched for the plasma lipoprotein clearance set (q-value = 1.7 x 10 −5 ), cluster 5 is enriched for the protein lipid complex assembly set (qvalue = 1.08x10 -9 ) and cluster 6 is enriched for the low density lipoprotein particle remodeling set (q-value = 1.07x10 -2 ). Cluster 4 also exhibits active chromatin tissue enrichment in immune T cells (q-value = 2.3x10 -3 ), highlighting the link between cholesterol and immunity. Indeed, cholesterol and modified forms of cholesterol, such as oxidized cholesterol and cholesterol crystals, promote inflammatory and immune responses through multiple pathways including the activation of the Toll-like receptor (TLR) signaling, the NLRP3 inflammasome and myelopoiesis [42,43]. While the promotion of inflammation and immunity is carried by LDL particles, HDL particles were proposed to counteract this effect in part through reverse cholesterol transport [44]. However, cluster 3, which is enriched for reverse cholesterol transport did not exhibit such tissue enrichment in immune T cells, indicating that the link between HDL and immunity may be more complex, as recently pointed out by Madsen et al [45].

Metabolism pathways and diseases
To provide a perspective on the specificity of genetic variants across clusters and their potential contribution to human diseases, we investigated the lipids from the Metabolism set. We first projected each cluster gene onto KEGG pathways. Here, we used only maps corresponding to enriched GO gene sets identified or to tissue identified in the enrichment analysis at the previous stages (S14 and S15 Tables): fat digestion and absorption, cholesterol metabolism, and

PLOS GENETICS
PPAR signaling pathways. We constructed a synthesis of these observations on the metabolic map presented in Fig 6A and 6B. Genes associated with clusters (S16 Table) had functions in agreement with their effects on blood lipid levels: cluster 3 is enriched in genes involved in HDL-C biogenesis and metabolism (LCAT, ABCA1, SR-B1, CETP, PLTP, LIPG, APOAx and APOCx), clusters 4 and 6 with genes related to LDL-C clearance (SORT1, PCSK9, LDLR, LDLRAP1, APOB and APOE), and cluster 5 to genes related to triglycerides and chylomicron transport (LPL, APOAx and APOCx).
We then assessed the effect of variants from each cluster with three diseases known to be associated with serum lipids: coronary artery diseases (CAD), stroke, and obesity (defined as a BMI > 30) (S19 Table). Within each cluster, we aligned the SNP alleles with the main trend of the corresponding cluster so that all coded alleles fit the multitrait pattern defined in Fig 6C  (see S1 Text). For example, all SNPs from cluster 5 were recoded to be associated with an increase in TGs, TC and LDL-C and a decrease in HDL-C. We plotted in Figs 6D and S31 the genetic effect of each SNP on the three diseases (using the effect on BMI as a proxy for obesity) after the aforementioned alignment, and we performed a sign test to assess the significance of the observed trend (S19 Table). SNPs from several clusters displayed a significant increase in the risk of CAD: cluster 2 (P = 6.6x10 -5 ), cluster 4 (P = 2.9x10 -2 ), cluster 5 (P = 3.9x10 -3 ) and cluster 6 (P = 2.8x10 -4 ). SNPs from cluster 2 also displayed a nominally significant increase in the risk of stroke (P-value = 1.6x10 -2 ). Finally, a large fraction of SNPs from cluster 3 had a negative effect on BMI (P = 6.4x10 -4 ). Interestingly, several SNPs from this cluster show association with CAD, but with heterogeneous effects-some associated with an increased risk and other associated with a decreased risk-so the absence of a global trend. The associations of clusters 4 and 6 with CAD add to the evidence of a causal effect of LDL-C on CAD [46], which has been established by prospective epidemiological studies [47], Mendelian randomization [48] and randomized clinical trials evaluating the effect of LDL-C reducing therapies [49]. Moreover, the association of cluster 5 with CAD risks corroborates a potential causal role of TGs [5] and remnant cholesterol [50,51] in CAD. The role of TGs in CAD has also been confirmed by epidemiological studies [52], genome-wide association studies [5], Mendelian randomization studies [53] and randomized controlled trials aiming the lower of TGs [54]. Cluster 3, which is associated with increases in HDL-C, does not have a protective effect on CAD, which is again in agreement with Mendelian randomization analyses reporting no link between HDL and CAD [48,55]. Finally, the association of cluster 2 with CAD and strokes further supports the potential causal effect of type 2 diabetes on CAD and stroke [56].
As a final exploratory analysis, we reported the cluster and multitrait genetic effect of genes targeted to mitigate hyperlipemia to prevent CAD ( Table 2). Drug targets corresponding to cluster 3 (ABCA1, CETP, NR1H3) did not lead to successful clinical trials, whereas targets (PCSK9, NPC1L1, APOC3, HMGCR) in clusters 4, 5 and 6 were mostly successful or promising. The example of the CETP gene, that is classified in cluster 3, a cluster not associated with CAD, is of particular interest. CETP has been the target of failed clinical trials that attempted to prevent CAD by inhibiting CETP and consequently increasing circulating HDL-C [57][58][59]. Cholesteryl ester transfer protein (CETP) promotes the heteroexchange of cholesteryl esters and TGs between HDL-C and APOB-containing lipoproteins connecting HDL-C and TG metabolism [57]. Pharmacological inhibition of CETP was motivated by GWAS [60] and prospective cohorts [61] that indicated that CETP variants were associated with higher circulating HDL-C levels and lower LDL-C, TGs and CVD risk. However, although all CETP inhibitors achieved an effective increase in HDL-C, only anacetrapib led to a significantly lower incidence of major coronary events [62] in patients who were receiving statin therapy, an effect that might account for the reduction in ApoB (non-HDL-C) rather than the elevation of HDL, as suggested by Mendelian randomization analyses [63]. In addition to these well-known When a gene is associated to several SNPs belonging to different clusters it is represented with several colors. To improve interpretation, we also present in panel C) a proxy for the relative contribution of each phenotype per cluster, defined as the loadings of the first principal component derived from the matrix of Z-score for the subset of SNPs in that cluster. Finally, panel D) shows the distribution of standardized beta for association between SNPs from each cluster and three diseases: any stroke (AS), coronary artery disease (CAD), and obesity (using body mass index as a proxy). drugs, we provide a systematic listing of potential drug targets by cluster (S20 Table) based on the druggable genome database [64].
Altogether, these results suggest that drug development might be more effective by accounting for the gene context, i.e., by selecting candidate genes not from their individual features but based on the disease association trend of genes displaying similar multitrait association profiles. Under this working hypothesis, the proposed inference of genetic functional groups can provide a means to identify those genes and therefore to select potential candidates.

Discussion
In this study, we conducted multitrait analyses of GWAS summary statistics from 36 human phenotypes combining association tests and clustering to detect the shared and specific genetic substructures underlying those phenotypes and to explore the links between those substructures and biological pathways and diseases. The question of substructures underlying genomewide genetic correlation has been partially explored in other recent studies [8,10]. Our work is in agreement with these studies, confirming the presence of regional genetic correlation differences and offering a data-driven approach for identifying primary substructures across millions of possibilities. Nevertheless, an understanding of whether those genetic functional groups are only statistical constructs or correspond to meaningful biological mechanisms is critical. In the latter, it means that a data-driven approach, such as the one proposed in the present study, can be used to dissect the genetic contribution of complex human phenotypes. Here, we used two complementary functional enrichment analyses to map these multitrait association profiles to pathways, and we report a detailed view of these profiles for the Immunity and Metabolism phenotype sets. In these analyses, the most enriched tissues varied substantially across clusters within each phenotype set, highlighting the potential of such an approach for characterizing distinct genetic mechanisms.

PLOS GENETICS
The variability in pleiotropy profiles across SNPs identified in GWASs has been previously discussed. For example, earlier reports [33] on inflammatory diseases have highlighted such patterns, or proposed grouping of SNPs based on the direction of association [65]. However, those studies used only a handful of SNPs identified at the time of publication. Our analysis based on formal clustering and functional enrichment analyses and using the results of GWASs performed in much larger sample size offers a new and much more detailed qualitative perspective on these profiles. More recent publications have also discussed approaches focusing on the characterization of SNPs displaying pleiotropic effects [66], the inference of shared and distinct genetic pathways between related phenotypes [67], and the identification of genetic components linked to disease subtypes [31,68]. Our approach shares objectives with some of these methods but also has unique features and advantages. Studies based on component decomposition techniques alike principal component analysis [31,67] yields endotypes of interest from a biological standpoint, but do not provide the SNP-level genetic decomposition that we are addressing. Approaches that rely on individuals' genotypes are limited by the ethical and practical cumbersome aspects tied to this type of data [68], sometimes without increasing statistical power. For example, as demonstrated in this study, the MANOVA derived from individual-level data is equivalent to the proposed omnibus test derived using the summary statistics only.
Past studies have shown that sufficiently curated genetic information can enhance the chance of success of clinical trials [69,70]. We further argue that fine analysis of pleiotropic effects, as performed in the present study, is a very promising path forward to help identify drug targets with a minimal risk of serious side effects. In particular, the picture of the links between coronary artery disease risk and lipid pathways inferred from our analysis is coherent with the state-of-the-art, while providing critical new evidence. While the association of LDL-C and TGs with CAD is largely documented [46,71], the relation linking HDL-C with CAD is more complex, as both low and high HDL-C levels have been associated with a risk of cardiovascular disease and mortality [72,73]. Recent studies pointed out that the functionality of HDL rather than the static measure of its circulating cholesterol level accounts for the relationship between HLD-C and CVD and mortality [73,74], with a potential role of HDL in remnant cholesterol transport. Overall, evidence for the presence or absence of a causal effect between lipid cholesterol measures and CAD as reported by Mendelian randomization analyses should be considered with caution, as lipid traits result from a complex interconnexion of multiple biological pathways. Our analysis suggests that the genetic contribution to the established negative correlation between HDL-C and CAD might be driven only by a subset of genes within a few specific genetic pathways. Under this hypothesis, drugs targeting mechanisms outside these pathways would be ineffective in decreasing CAD risk. Note that such retrospective analysis can be only suggestive of the potential deleterious side effects of drug targets. Nevertheless, the identification of those candidates might limit cost of further analysis using both in silico analyses in independent data, and ex vivo study to examine the role of the identified genetic variants on additional intermediate molecular phenotypes (gene expression, protein, etc).
A number of further analyses can be conducted based on the results we obtained. First, we focused on a limited number of phenotype sets. Extending analyses to other sets of phenotypes might help refine potential genetic functional groups and better characterize their link to biological mechanisms. To our knowledge, there are no trivial solutions to solve the intrinsic combinatorial issue (i.e., one can build over 6x10 10 sets of phenotypes from 36 GWASs). Additionally, note that we worked with a data freeze dated from December 2018. Hence, at the date of the publication of this paper, newer summary statistics are available for a few traits. We accounted for these new publications when counting newly identified variants by filtering associations reported in the latest version of the GWAS catalog. Another critical component of our analysis is the methodological choices for clustering. Here, we considered a Gaussian mixture model, mainly to enable missing values, and used BIC and silhouette to decide the optimal number of clusters. Other methods and alternative criteria might result in slightly different clusters. Moreover, we assume that genetic variants belong to distinct clusters, but it is likely that some variants belong to multiple biological pathways. Note that GMM provides posterior probability of cluster assignment and has the potential to explore overlapping clusters, but better approaches might potentially exist to address that specific question. Additionally, our implementation does not automatically address the problem of allele coding (i.e., the choice of the coded allele) inducing, in some cases, symmetric clusters that we had to merge a posteriori. Again, alternative approaches might offer the possibility of solving this issue.
To summarize, we ensured the theoretical reliability of a panel of multitrait tests and demonstrated their capacity to detect new associations on diverse sets of traits. Considering independent significant associations, we stratified SNPs in multitrait profiles corresponding to biological pathways. We believe this stratification to be relevant for multiple applications ranging from functional annotation to drug targeting.

Multivariate association test
Consider a vector z of K Z-scores statistics for a single nucleotide polymorphism (SNP) obtained from standard univariate genome-wide association screenings of K phenotypes. Under the null hypothesis, z = (z 1 ,. . .,z K ,) follows a normal distribution N(0,S r ), where S r is the residual phenotypic covariance matrix (S1 Text), while under the alternative, z is expected to display additional covariance due to shared genetics (defined by a genetic correlation matrix S g ). We first considered an Omnibus test of the vector of Z-scores, which can be performed using the multivariate Wald statistics: where T omni follows a chi-square with K degree of freedom (df) under the null hypothesis of no phenotype-genotype association. We also considered a classic weight-based test defined as: where w is a vector of K weights applied to the Z-score. Under the null, T sumZ follows a chisquared distribution with 1 degree of freedom. Note that this approach shares similarities with both standard fixed effect meta-analysis [14] and with dimensionality reduction methods (e.g. principal component analysis [25]). One can also note that the Omnibus statistics can be expressed as a combination of the sumZ statistics over all eigenvectors of S r (S1 Text). We note v i the i th eigen vector of S r : We considered four weighting schemes for the sumZ tests: (i) in the SumZ 1, w is equal to the unit vector so all traits have the same weight; (ii) in the SumZ r, w is equal to the first eigen vector of S r so its direction represents phenotypic correlation between traits, (iii) in the SumZ g, w is equal to the first eigen vector to S g so its direction represents genetic correlation between traits, (iii) in the SumZ ica w is computed by applying an Independent component analysis (ICA) to the complete matrix of Z-score. To compute the weight vector w of the Sum-Z ica , for a given phenotype set, the genome wide Z-score matrix was extracted and an independent component analysis was performed with the scikit-learn python package. The component yielding the most novel association was selected as loadings. We verified that this selection procedure did not lead to an inflation under the null hypothesis by simulation (see S2 Fig).
Performing the omnibus test requires inverting the Z-score covariance matrix S r . When this matrix does not have a full rank, we use a pseudo inverse of the matrix based on the singular value decomposition (S1 Text). Briefly, as S r is a variance-covariance matrix, it can be written PDP t where D = diag((λ k ) k = 1. . .K ), (λ k ) k = 1. . .K ) are the eigenvalues of S r and P is the orthogonal matrix whose columns correspond to the eigenvectors of S r . If it is not invertible, only K' eigenvalues are different from 0 (where K' denotes the rank of S r ) and an inverse Σ À 1 r of the matrix can be computed as Σ À 1 ..K 0 Þ and P K 0 denotes the K×K 0 matrix whose columns are the K 0 eigenvectors corresponding to the eigenvalues different from 0. Note that the Omnibus statistics computed with Σ À 1 r follows a χ 2 distribution with K 0 degree of freedom.

Robust estimation of Z-score covariance
The validity of the proposed multivariate tests mostly relies on the accurate estimation of S r . In practice, the covariance between Z-scores from null SNPs from two GWAS will deviate from 0 when there is both sample overlap and correlation among the traits analyzed. When combining results from two independent studies, or when the trait analyzed has negligible correlation, S r will be a diagonal matrix, so that the Omnibus test can be performed by summing chi-squared statistics for each SNP to form a K degree of freedom chi-square, and the sumZ test becomes a standard weighted meta-analysis of fixed effect. Yet, in the large-scale GWAS era, this situation is unlikely as most of the large GWAS are conducted in the consortium setting, where samples likely overlap across multiple GWAS. It follows that S r can contain nonzero off-diagonal terms. Under the complete null model, the expected Z-score covariance for null SNPs between two traits equals s z ¼ rn s = ffi ffi ffi ffi ffi ffi ffi ffi ffi n 1 n 2 p where n 1 is the sample size of the first study, n 2 is the sample size of the second study and ρ is the phenotypic covariance among the n s overlapping samples (see S1 Text and e.g. [3,9]). In some specific cases, one can obtain these parameters directly from the data (e.g. when analyzing multivariate omics data). Conversely, obtaining all four parameters (ρ, n s , n 1 , n 2 ) from consortium GWAS based on dozen or even hundreds of cohorts can be a practically daunting and risky task. Moreover, accurate phenotypic covariance estimation would be particularly challenging when study-specific and trait-specific covariates adjustment has been performed. Recent studies proposed to estimate S r using available SNPs from the GWAS in question using all available single SNPs Z-score [75] or using a random subset of pruned variants [14], though some discussed removing GWAS hits [15], focusing on a subset of SNPs in regions less likely to contain causal variants [76], or using tetrachoric estimator [16]. The validity of these estimators mostly relies on the assumption that the vast majority of the SNP effects in the genome are distributed under the null hypothesis. While this is likely to be true in some cases, associated variants can potentially lead to either upward or downward pairwise covariance between Z-scores. Instead, we leverage recent work by Bulik-Sullivan et al [3,9] that allows for estimation of this covariance (and the diagonal variance terms) under a polygenic model and assuming multivariate normality of effect sizes across traits (see S1 Text). The estimation of S r was performed on Z-scores before the imputation step described in the next section. For a few traits the estimated variance is markedly inferior to 1. As indicated in the LDSC regression method, this phenomenon happens when the original GWAS was corrected with a genomic control factor.

Simulation studies on test statistical power
To assess the statistical power of the Omnibus test and the SumZ tests, we designed the following simulation scenarios. A 10 trait z-score vector was generated for each of the 5000 causal SNPs as the sum of a genetic effect and a residual effect: Z = Z g + Z res . In the following, random covariance matrices were generated using the randcorr R package [77]. Z res was sampled from a normal multivariate distribution with covariance matrix (S r ) terms equal to s z ¼ rn s = ffi ffi ffi ffi ffi ffi ffi ffi ffi n 1 n 2 p where n 1 is the sample size of the first study, n 2 is the sample size of the second study and ρ is the phenotypic correlation among the n s overlapping samples. As demonstrated in [9], ρ = ρ g +ρ e where ρ g is the genetic correlation and ρ e the environmental residual correlation. We generated the residual covariance matrix S r according to two scenarios: (random) ρ r was randomly sampled with the only constraint that S r remains a semi definite positive matrix, (aligned) ρ r was constructed as r r ¼ 0:4 � r g þ 0:6 � E with E sampled at random. Z g varied depending on the simulation scenario: (eff~Random) Z g were sampled from a uniform distribution with boundary [-6; 6], (eff~corG) Z g was sampled from a normal multivariate distribution with a random covariance matrix, (eff~wG) Zg was sampled along a straight line blurred with a normal noise, (eff~high H), Z g was sampled from a normal multivariate distribution simulating genetically uncorrelated traits with high heritability, (eff~het H), Z g was samples from a normal multivariate distribution simulating genetically uncorrelated trait with only the first having a high heritability.

Data preprocessing: an overview
The analysis of the 36 GWAS required substantial preprocessing, including the inference of several parameters. First, for many publicly available GWAS, sample size per SNP was not readily available and retrospectively collecting this information can be very challenging as it implies requesting this information from each individual cohort. For such a situation, we propose inferring a proxy for missing sample size as 1=ðŝ 2 b G s 2 G Þ, whereŝ 2 b G is the variance ofb G , the estimated SNP effect, and s 2 G the variance of the SNP, derived from the coded allele frequency which is either provided with the GWAS or extracted from a reference panel (see S1 Text). For linear regression this approximate Ns 2 e , where N is the true sample size and s 2 e is a residual variance of the outcome in the regression model. For logistic regression our estimator is a proxy for the term Np(1−p), where p is the in-sample proportion of cases, and it therefore assumes that the proportion of cases is relatively stable across SNPs with different sample size.
Another challenging issue was the merging of multiple GWAS with different set of assayed SNPs. Indeed, out of 10 million variants reported for some GWAS, fewer than 1,000 had complete summary statistics for all 36 phenotypes analyzed. We performed an imputation of missing Z-scores in each study using the RAISS [27] method we recently developed. The approach uses correlation between SNPs to predict Z-score at missing SNPs using available ones and achieves a level of imputation accuracy suitable for multitrait analysis (S1 Text). Here we used the European panels from the 1,000 Genomes project [78] as a reference for the estimation of the correlation between SNPs. Overall, imputation did not lead to any observable inflation of the omnibus statistic (S13 Fig). Nevertheless, as a supplementary quality control (QC), we excluded significant SNPs that were not surrounded by SNPs in linkage disequilibrium with significant or near significant p-values (P < 10 −6 ).
These two parameter inferences were integrated along other preprocessing operations into a pipeline that is fully described here [28]. Given a reference panel with no ambiguous strand, it consists in the following steps (i) Extract, the coded and alternative alleles, signed statistics (regression coefficient or odds ratio), standard error, p-value, and sample size; (ii) Remove all SNPs that are not in the reference panel; (iii) Derive Z-score for each SNP from signed statistics and p-value; (iv) Infer sample size when not available; (v) Remove all SNPs whose sample size is less than 70% of the maximum sample size; and (vi) Infer missing Z-scores statistics based on the 1K genome reference panel. After applying our preprocessing pipeline to all 36 GWAS analyzed, there remained 6,978,319 SNPs with a missing rate of 45% (59% before imputation).

Characterization of new loci
To determine new and existing trait-associated loci we used genome regions formed by linkage disequilibrium (LD) blocks as defined in Berisa et al [79] using a reference panel of individuals of European ancestry. It included a total of 1,704 independent regions ranging from 10 kb to 26 Mb in length, with an average size of 1.6 Mb. For each independent LD region, we extracted the minimum p-value over all SNPs contained in the region, and a single univariate analysis pvalue defined as the minimum across all single phenotype GWAS and all SNPs in the region. We consider that a region is newly detected by a multitrait test if the joint analysis p-value is genome-wide significant while its univariate p-value is not (joint analysis p-value < 1x10 -8 and univariate p-value > 1x 10 −8 ). We determined SNPs carrying the signal inside significant region with the plink "clump" function using the following parameters:-clump-p1, 10 −8 ;clump-r2, 0.2. We kept the lead SNP by clump for further analysis (gene mapping and clustering).
To report associations exclusively detected in the current report (S4-S10 Tables), we filtered out association present in the GWAS catalogue [1] at the date of the 14 th of September 2020 (univariate p-value > 5x 10 −8 ) for traits corresponding to our phenotype set.

FUN-LDA tissue enrichment
We computed enrichment for SNPs belonging to regions of open chromatin (more likely to contain expressed genes [80,81]) in specific tissues in three cases: i) when comparing results across phenotype sets, ii) when comparing univariate results, and iii) when comparing results across clusters. For all analyses we used functional annotations on 127 Roadmap tissues and cell lines defined by integrating activating histone marks (H3K4me1, H3K4me3, H3K9ac, and H3K27ac) with a latent Dirichlet allocation model as implemented in FUN-LDA [82]. The enrichment score for a tissue is based on the number significant SNPs compared with the total number of SNPs in open chromatin region (see S1 Text). Enrichment results are reported in S11-S13 Tables.

Multitrait genetic association clustering and selection of the optimal number of clusters
We performed a clustering of top associated SNPs for each phenotype set using a Gaussian Mixture model (GMM). Briefly, the assumed generative model is as follows. Consider n independent vectors Z i = (Z i1 ,. . .,Z ij ,. . .,Z id ) t in R d , each arising from one of k distinct clusters. Each vector represents one SNPs. The value Z ij is the Z-score of SNP i on trait j. Let I ij = 1 if the i th observation belongs to cluster j, and define the indicator vector I i = (I i1 ,. . .,I ij ,. . .,I ik ) t . Conditional on membership to the j th cluster, Z i follows a multivariate normal distribution, with cluster specific mean μ j and covariance S j . Let denote π j the marginal probability of membership to the j th cluster. The observations can be viewed as arising from the following hierarchical model: One major difficulty in applying the GMM was to deal with incomplete data. Indeed, even after imputation of some missing statistics, our datasets still contained some missing values. To solve the clustering, we implemented the statistical framework described by Ghahramani et al [83] which we recently implemented in a R package MGMM [40]. This method relies on EM optimization techniques enabling the inference of unobserved variables from observed variables and an assumed Gaussian mixture model. In classical GMM, the only variable inferred is the posterior probability cluster membership. In the MGMM algorithm, missing Zscores are also inferred taking into account the observed Z-scores and the inferred probability to belong to a cluster.
The model gives for each SNP the posterior probabilities to belong to each cluster, and was therefore assigned to its most likely cluster, as long as its entropy was larger than 0.75. For a given variant SNP i , the entropy was derived as follow: where k is the total number of clusters and P(SNP i 2cluster j ) is the posterior probability of SNP i to belong to cluster j. The higher the entropy the more the SNP attribution to one cluster is ambiguous. SNPs with an entropy higher than 0.75 were filtered out of the clustering results.
Clustering was performed on all independent significant SNPs. For each SNP, we defined three p-values on the phenotypic group traits: the minimum univariate p-value (P univ ), the Sum-Z ica p-value and the omnibus p-value. All SNPs with at least one of the three p-value under 10 −8 were selected for further analysis. For the Metabolism univariate clustering, we only considered the univariate p-value to perform the selection. We then applied the plink [84] clump function to retrieve practically independent associations using the 0.2 as clump-r2 parameter and 10 −8 as clump-p1 parameter. For each clump we selected a representative SNPs as the one with the smallest p-value across the three tests and having more than 60% of its values observed. Note that for a negligible number of occurrences, the representative SNPs has a p-value above 10 −8 (S15 and S16 Tables). We applied MGMM within each phenotype set and varied the pre-specified number of clusters between 2 and 10. To select the optimal number of clusters k, we performed the clustering 100 times on a random subset of 80% of the SNPs for each k. For each resulting clustering we computed the Bayesian Information Criteria and the Silhouette [85] (see S23 and S24 Figs). Except for the Metabolism set, the silhouette appears conservative and the BIC criterion anticonservative, i.e. the latter criteria tends to select a larger number of clusters. We decided to use the following ad hoc compounded criterion: 1. If the optimal number of clusters determined by the BIC criteria is higher than the one determined by the silhouette criteria, starting from the silhouette optimal, increase the number of clusters until one of these two conditions is met: 1) adding one cluster significantly decrease the silhouette criterion, 2) the BIC optimal number is reached.
2. In other cases, set the optimal number of clusters to the one determined by silhouette.

Cluster genetic correlation
We defined pairwise genetic covariance per cluster for as r g;cluster¼ b t 1 b 2 =M where β 1 and β 2 are the vector of genetic effects for the pair of phenotypes considered and M is the number of SNPs in the cluster. To estimate properly this quantity from the observedb, we accounted for the bias introduced by sample overlap and phenotypic correlation using the following estimator (see S1 Text): where ρ Y is the phenotypic covariance, and n s , n 1 and n 2 are respectively the sample size shared between the two traits, for the trait 1, and for the trait 2. To assess whether the estimated genetic covariances are significantly different from zero, we performed for each pair of phenotypes within each cluster, a t-test on the vector of random variables (X 1 , X 2 ,. . .,X M ), were X j 1 b j;1bj;2 À n s r n 1 n 2 is the contribution of SNP j to the covariance. Note that we used only independent SNPs selected using LD-clumping with squared-correlation parameter equals 0.2.

Functional enrichment of metabolism clusters
We used FUMA [86] SNP2GENE function to associate SNPs with genes based on two criteria, the physical position (in 30kb radius of a protein coding gene) and eQTLs (all significant cis-eQTL from GTEx up to a distance of 1Mb). Note that we restrained the eQTLs to the one that were found in relevant tissue for the Immunity and Metabolism set: immune cells for Immunity and adipose, intestine, liver and brain tissues for Metabolism (see S1 Data for complete parameters). After chaining genes to clusters based on SNPs, we performed a functional enrichment for pathways defined in KEGG [87] and GO [88] databases and derived report pvalues using FUMA GENE2FUNC function. Here, cluster's gene were compared against a background of protein coding genes. Finally, we used the R package pathview [89] to project genes onto KEGG pathways maps.

Disease-clusters association
For the metabolism phenotype set, to provide an indicator of the relative contribution of genetic variants to phenotypes in each cluster from the Metabolism set, we performed a principal component analysis (PCA) of the SNP-by-phenotype association matrix within each cluster. For this analysis, we used scaled beta coefficients, i.e. Z-scores divided by the square root of the phenotype GWAS sample size. To avoid bias due to the arbitrary choice of the coded allele, we randomly shuffled 20 times the coded allele, and repeated the PCA after each shuffling. We report in Fig 5, the average of the loadings of the first PC over all shuffling. Note that the first PC only provides the multidimensional direction explaining the largest variance and therefore do not fully capture the distribution of genetic effect within each cluster. Nevertheless, those first PCs explained a substantial amount of the total variance, equal to 75%, 38%, 53%, 64%, 80% and 93% of the variance in betas for cluster 1 to 6, respectively.
Then, we assessed the association between SNPs within the inferred cluster and three traits (none of which being included in the Metabolism set): cardiovascular diseases, any stokes and BMI. SNP alleles were aligned according to the first principal by clusters determined in the last section. We applied a sign test to assess the concordance of the sign of the projection on PC1 and the sign of Z-score for on the three tested additional traits. For this analysis we used more stringent criteria to ensure the SNPs independence. We selected the subset of Metabolism SNPs for which linkage disequilibrium does not exceed 0.2 (clump-r2 set to 0.05), which diminishes the number of SNPs considered from 391 to 285. Concerning the association of SNPs to drug target, we associated drug target to a representative SNPs by selecting the SNP with the lowest entropy and having a positive silhouette.

URL Resources
We developed the following R and python packages to perform our study: JASS_Preprocessing: https://gitlab.pasteur.fr/statistical-genetics/jass_preprocessing JASS: https://gitlab.pasteur.fr/statistical-genetics/jass RAISS: https://gitlab.pasteur.fr/statistical-genetics/raiss MGMM: https://cran.r-project.org/web/packages/MGMM/index.html Figs 3-5 have been generated using the following R packages: https://cran.r-project.org/web/packages/UpSetR/index.html https://cran.r-project.org/web/packages/radarchart/README.html https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html https://cran.r-project.org/web/packages/gplots/index.html https://cran.r-project.org/web/packages/igraph/index.html We simulated series of replicates, each including 20,000 individuals, 10,000 SNPs additively coded with frequencies ranging from 0.01 to 0.99, and 100 phenotypes. The phenotypes were drawn from a multivariate normal distribution with mean 0 and a variance-covariance S r of rank 50. For each SNP, we computed the zscore of association with each phenotype using linear regressions and then derived p-values from the Omnibus test using three strategies to derive the inverse of S r : i) considering only eigenvalues greater than a specified threshold � (left column), ii) replacing eigenvalues below a threshold � by � (middle column), and iii) adding a small value � to the diagonal terms of S r (right column). For each strategy we considered three different � values: 10 −3 (first line), We simulated replicates of 10,000 predictors for 500 (left column), 5,000 (middle column) and 10,000 individuals (right column). Predictors were drawn from a binomial with n = 2 and probability varying in [0.01-0.99] to mimic genotype data, and further normalized to have mean 0 and variance 1. We then simulated a) 5, b) 20 and c) 100 correlated outcomes independent from the genotypes. To assess the impact of non-normal outcomes on the relationship between the summary-statistics based test and the individual-level data test, the outcomes were first drawn from a multivariate normal distribution with pairwise correlation ranging in [-0.7; 0.7] and then transform to non-normal distributions based on their quantiles, so that on average, 33% of them follow a uniform distribution, 33% a Laplace distribution, and 33% an exponential distribution. For each SNP, we first applied a Multivariate ANOVA (MANOVA, blue line). We then conducted association screenings for each single phenotype separately, and applied the proposed Omnibus test (purple line) and the Wilk's approximation (orange line) on the resulting summary statistics. The plots show the observed -log10(p-value) of each test against the expected value under the null. (TIF)

S5 Fig. Summary statistics versus individual-level data tests under the alternative.
We simulated replicates of 10,000 predictors for 500 (left column), 5,000 (middle column) and 10,000 individuals (right column). Predictors were drawn from a binomial with n = 2 and probability varying in [0.01-0.99] to mimic genotype data, and further normalized to have mean 0 and variance 1. We then simulated a) 5, b) 20 and c) 100 correlated outcomes. We assumed that 10% of the predictors were causal, each causal predictor being randomly chosen and associated with up to 50% outcomes, randomly chosen with equal probability. Effect sizes for each causal predictor were drawn from a normal distribution with mean 0 and variance 0.0025 (i.e. the total phenotypic variance explained by the predictors). To assess the impact of non-normal outcomes on the relationship between the summary-statistics based test and the individuallevel data test, the outcomes were first drawn from a multivariate normal distribution with pairwise correlation ranging in [-0.7; 0.7] and then transform to non-normal distributions based on their quantiles, so that on average, 33% of them follow a uniform distribution, 33% a Laplace distribution, and 33% an exponential distribution. We simulated series of correlated z-scores for 100,000 SNPs from two outcomes Y 1 and Y 2 . For each simulation we generated a matrix of true genetic effect b = (β 1 β 2 ) of m standardized and independent genotypes for two phenotypes Y 1 and Y 2 from a multivariate normal with means of 0 variance h 2 1 =m, and h 2 2 =m, respectively, and covariance σ g , where h 2 1 ¼ 0:3 and h 2 2 ¼ 0:6 are the heritability of Y 1 and Y 2 . We then generatedb defined asb ¼ b þ ε where ε was also drawn from a multivariate normal with means 0, variance 1/N 1 and 1/N 2 , respectively, and covariance r e N s = ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi N 1 N 2 p , where N 1 , N 2 = N 1 /2 and N s = N 2 /2 are the sample sizes for Y 1 and Y 2 , and the number of shared samples, respectively, and r e = 0.4 is the correlation between Y 1 and Y 2 across overlapping samples. Finally, we derived the expected z-score for each genotype z ¼ ðβ 1 ffi ffi ffi ffi ffi ffi N 1 pβ 2 ffi ffi ffi ffi ffi ffi N 2 p Þ ¼ ðz 1 z 2 Þ, and σ z = cov(z 1 , z 2 ), the covariance between z 1 and z 2 . The left panel (a) show the covariance between z-score for null variants in red, and the observed covariance between all z-scores except those harboring a p-value below a given threshold (0 (i.e. no SNP removed), 5x10 -8 , 5x10 -5 , 5x10 -4 , 5x10 -3 , 1x10 -2 , and 5x10 -8   We compared LDSC estimates of covariance between summary statistics against its expected value, which equals ρN s = ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi N 1 N 2 p , where ρ is the phenotypic correlation among overlapping sample and N s , N 1 and N 2 are the sample overlap, the sample size for phenotype 1 and the sample size for phenotype 2, respectively. We used individual-level data for five anthropometric traits and 619,017 SNPS measured in 336,347 individuals from the UK Biobank cohort. We first considered scenarios with complete sample overlap (i.e. N s = N 1 = N 2 , panels a-d), so that LDSC estimate is expected to equal the phenotypic correlation. We compared estimates while using 100% (a), 50% (b), 10% (c) and 1% (ds) of the total sample size respectively. The lower matrix triangle in pale red are estimated phenotypic correlation from individual-level data, while the upper triangle in pale blue shades are estimated correlation derived using the LDSC. We then considered scenario where sample overlap is only partial by sub-sampling individuals only for BMI, using 100% (e),50% (f), 10% (g) and 1% (h) of the total sample for that phenotype. The first row, in red, is the expected GWAS covariance knowing ρ, N s , N 1 and N 2 . The second row, in blue, is the estimate derived using LDSC. (TIF) S10 Fig. Bias due to sample size heterogeneity in the GLG consortium GWAS data. We performed the Omnibus test to the four traits from the GLG consortium: high density lipoprotein (HDL), low density lipoprotein (LDL), total cholesterol (TC), and total triglyceride (TG) using all available SNP with complete summary statistics. We plotted the median chi-squared of the resulting test across bin of SNPs defined based the per-SNP standard deviation in sample size across the four traits (σ sample size ) for SNPs with a minor allele frequency (MAF) below 5% (a), and above 5% ( . For each replicate we tested for association between G and Y and we inferred w the weights, which equals the sample size times a constant, using the standard error of the effect estimate and either the in-sample allele frequency or the in-sample allele plus a noise term sample from a uniform with min and max in [0.01, 0.05, 0.10]. When allele frequency plus noise was larger or smaller than 0 or 1, the value was set to 0.001 and 0.999, respectively. Upper and lower panels show the inferred sample size as a function of the true allele frequency when using the identity and logit link functions for modelling and testing for association, respectively. Note that for the later, for each replicate, we simulated 50,000 individuals and considered a disease prevalence of 25%. We then randomly sampled 10,000 cases and 10,000 controls to form replicates of 20,000 individuals. Also, for the sake of comparison, we scaled w by a constant in the logit model so that the target is the true sample size. (TIF) S12 Fig. Impact of case/control ratio misspecification on sample size inference. We simulated 1,000 replicates including 50,000 individuals. For each individual we generated a disease status assuming a prevalence of 25% and an independent genotype G with frequency randomly sampled in [0.001, 0.999]. For each replicate we randomly sampled 5,000 cases and 20,000 controls and tested for association between G and Y using these samples and after sub-sampling cases (i.e. using a sub-sample of the 5,000 cases and the 20,000 controls, top panel) or controls (i.e. using the 5,000 cases and a sub-sample of the 20,000 controls, bottom panel), in order to mimic situation where either cases or controls would be missing for some SNPs. For each experiment we inferred W log = Np(1−p), where N is the sample size and p is the case-control ratio, using the standard error of the effect estimate and the in-sample allele frequency. In both situations (sub-sampling cases or sub-sampling controls), W log is decreasing with increasing percentage of missingness. (TIF)

S13 Fig. Effect of imputation of missing z-scores on the Omnibus test statistic.
Each line corresponds to a GWAS set. The left column is the empirical quantile of the Omnibus statistic versus the theoretical quantile before imputation. The middle column is the same after imputation. The right column is the empirical quantile of the Omnibus statistic before imputation versus the same quantity after imputation. (TIF) S14 Fig. Bivariate Significance boundary for each test. We simulated series of N = 100K individuals with two correlated outcomes Y 1 and Y 2 as a function of m = 1000 independent SNPs. The genetic effect of the m variants on Y 1 and Y 1 , denoted b = (β 1 β 2 ), were drawn from a bivariate normal with means 0, variance h 2 1 =m, and h 2 2 =m, and correlation σ g , where h 2 1 ¼ 0:4; h 2 2 ¼ 0:25, and σ g = 0.8. The vectors of residual ε = (ε 1 , ε 2 ) were drawn from a multivariate normal with means 0, variance 1 À h 2 1 and 1 À h 2 2 , respectively, and covariance r e = 0.5. For each simulation, we derived the z-score for each of the m genotypes z ¼ ðβ 1 ffi ffi ffi ffi N pβ 2 ffi ffi ffi ffi N p Þ ¼ ðz 1 z 2 Þ and plotted z 1 as a function of z 2 . We applied the five tests: Omnibus, sumZ r , sumZ g , sumZ ica and sumZ 1 and highlighted for each of them whether variants had significant p-value (red, p<5x10 -5 ) or not (blue, p>5x10 -5 ). In the results from the upper panels, b was drawn from a bivariate normal distribution with the σ g parameter was set to 0.24. For the lower panel, b was drawn from a mixture of two normal distributions both centered on zero and both with variance equal to h 2 1 =m, and h 2 2 =m. However, the covariance σ g was set to 0.8 for the first one and -0.65 for the second one. The upper panel shows independent variants detected across phenotype groups and across approaches represented as an UpSetR visualization. Matrix lines correspond to a test, each column to a set of significant variants. For each set, the test for which variants are significant are represented with a black dot on the test line. The barplot on the left of the matrix represents the number of significant independent signals detected by each approach. The barplot on the top of the matrix represents the cardinality of the sets. The sets are ordered by cardinality from the largest to the leftmost to the smallest to the rightmost. The bottom panels show quadrant plots, i.e. the -log10(p-value) for the most significant SNP per region for the Omnibus test as a function of the -log10(p-value) for the most significant SNP per region across all univariate GWAS. Complete results are presented in the left panel, and a zoom around the genome-wide significance threshold is presented on the right panel.
(TIF) S17 Fig. Signal comparison for cardiovascular phenotypes. The upper panel shows independent variants detected across phenotype groups and across approaches represented as an UpSetR visualization. Matrix lines correspond to a test, each column to a set of significant variants. For each set, the test for which variants are significant are represented with a black dot on the test line. The barplot on the left of the matrix represents the number of significant independent signals detected by each approach. The barplot on the top of the matrix represents the cardinality of the sets. The sets are ordered by cardinality from the largest to the leftmost to the smallest to the rightmost. The bottom panels show quadrant plots, i.e. the -log10 (p-value) for the most significant SNP per region for the Omnibus test as a function of the -log10(p-value) for the most significant SNP per region across all univariate GWAS. Complete results are presented in the left panel, and a zoom around the genome-wide significance threshold is presented on the right panel.
(TIF) S18 Fig. Signal comparison for composite phenotypes set. The upper panel shows independent variants detected across phenotype groups and across approaches represented as an UpSetR visualization. Matrix lines correspond to a test, each column to a set of significant variants. For each set, the test for which variants are significant are represented with a black dot on the test line. The barplot on the left of the matrix represents the number of significant independent signals detected by each approach. The barplot on the top of the matrix represents the cardinality of the sets. The sets are ordered by cardinality from the largest to the leftmost to the smallest to the rightmost. The bottom panels show quadrant plots, i.e. the -log10(p-value) for the most significant SNP per region for the Omnibus test as a function of the -log10(p-value) for the most significant SNP per region across all univariate GWAS. Complete results are presented in the left panel, and a zoom around the genome-wide significance threshold is presented on the right panel.
(TIF) S19 Fig. Signal comparison for immunity phenotypes. The upper panel shows independent variants detected across phenotype groups and across approaches represented as an UpSetR visualization. Matrix lines correspond to a test, each column to a set of significant variants. For each set, the test for which variants are significant are represented with a black dot on the test line. The barplot on the left of the matrix represents the number of significant independent signals detected by each approach. The barplot on the top of the matrix represents the cardinality of the sets. The sets are ordered by cardinality from the largest to the leftmost to the smallest to the rightmost. The bottom panels show quadrant plots, i.e. the -log10(p-value) for the most significant SNP per region for the Omnibus test as a function of the -log10(p-value) for the most significant SNP per region across all univariate GWAS. Complete results are presented in the left panel, and a zoom around the genome-wide significance threshold is presented on the right panel. (TIF) S20 Fig. Signal comparison for metabolism phenotypes. The upper panel shows independent variants detected across phenotype groups and across approaches represented as an UpSetR visualization. Matrix lines correspond to a test, each column to a set of significant variants. For each set, the test for which variants are significant are represented with a black dot on the test line. The barplot on the left of the matrix represents the number of significant independent signals detected by each approach. The barplot on the top of the matrix represents the cardinality of the sets. The sets are ordered by cardinality from the largest to the leftmost to the smallest to the rightmost. The bottom panels show quadrant plots, i.e. the -log10(p-value) for the most significant SNP per region for the Omnibus test as a function of the -log10(p-value) for the most significant SNP per region across all univariate GWAS. Complete results are presented in the left panel, and a zoom around the genome-wide significance threshold is presented on the right panel. (TIF) S21 Fig. Signal comparison for psychiatric phenotypes. The upper panel shows independent variants detected across phenotype groups and across approaches represented as an UpSetR visualization. Matrix lines correspond to a test, each column to a set of significant variants. For each set, the test for which variants are significant are represented with a black dot on the test line. The barplot on the left of the matrix represents the number of significant independent signals detected by each approach. The barplot on the top of the matrix represents the cardinality of the sets. The sets are ordered by cardinality from the largest to the leftmost to the smallest to the rightmost. The bottom panels show quadrant plots, i.e. the -log10(p-value) for the most significant SNP per region for the Omnibus test as a function of the -log10(p-value) for the most significant SNP per region across all univariate GWAS. Complete results are presented in the left panel, and a zoom around the genome-wide significance threshold is presented on the right panel. To simplify the visualization, anatomical categories that did not explain at least 5% of any of the phenotype group were regrouped under the "Other" category. In each group, anatomical category representing less than 1% were approximated to 0. a) results obtained with variants detected with univariate test and b) results with variants detected with multivariate tests. On the left panel, the alluvial plot represents the re-assignment of SNPs from univariate analysis to clusters. To emphasize the relative genetic contribution to phenotypes, SNPs from each phenotype block were weighted by their variance explained to that phenotype. On the right panel, the heatmap represents the multi-trait signatures. Each line is a SNPs, each column, a trait. The gradient of color represents the strength of the Z-scores. (TIF)

S26 Fig. Alluvial plot and heatmap for the Cardiovascular GWAS set.
On the left panel, the alluvial plot represents the re-assignment of SNPs from univariate analysis to clusters. To emphasize the relative genetic contribution to phenotypes, SNPs from each phenotype block were weighted by their variance explained to that phenotype. On the right panel, the heatmap represents the multi-trait signatures. Each line is a SNP, each column, a trait. The gradient of color represents the strength of the Z-scores. (TIF)

S27 Fig. Alluvial plot and heatmap for the Immunity GWAS set.
On the left panel, the alluvial plot represents the re-assignment of SNPs from univariate analysis to clusters. To emphasize the relative genetic contribution to phenotypes, SNPs from each phenotype block were weighted by their variance explained to that phenotype. On the right panel, the heatmap represents the multi-trait signatures. Each line is a SNP, each column, a trait. The gradient of color represents the strength of the Z-scores. (TIF) S28 Fig. Alluvial plot and heatmap for the Composite GWAS set. On the left panel, the alluvial plot represents the re-assignment of SNPs from univariate analysis to clusters. To emphasize the relative genetic contribution to phenotypes, SNPs from each phenotype block were weighted by their variance explained to that phenotype. On the right panel, the heatmap represents the multi-trait signatures. Each line is a SNP, each column, a trait. The gradient of color represents the strength of the Z-scores. (TIF)

S29 Fig. Alluvial plot and heatmap for the Psychiatric GWAS set.
On the left panel, the alluvial plot represents the re-assignment of SNPs from univariate analysis to clusters. To emphasize the relative genetic contribution to phenotypes, SNPs from each phenotype block were weighted by their variance explained to that phenotype. On the right panel, the heatmap represents the multi-trait signatures. Each line is a SNP, each column, a trait. The gradient of color represents the strength of the Z-scores. (TIF)

S30 Fig. Alluvial plot and heatmap for the Metabolism GWAS set.
On the left panel, the alluvial plot represents the re-assignment of SNPs from univariate analysis to clusters. To emphasize the relative genetic contribution to phenotypes, SNPs from each phenotype block were weighted by their variance explained to that phenotype. On the right panel, the heatmap represents the multi-trait signatures. Each line is a SNP, each column, a trait. The gradient of color represents the strength of the Z-scores. (TIF)

S31 Fig. Alluvial plot and heatmap for all GWAS combined.
On the left panel, the alluvial plot represents the re-assignment of SNPs from univariate analysis to clusters. To emphasize the relative genetic contribution to phenotypes, SNPs from each phenotype block were weighted by their variance explained to that phenotype. On the right panel, the heatmap represents the multi-trait signatures. Each line is a SNP, each column, a trait. The gradient of color represents the strength of the Z-scores.