Fig 1.
LOCUS model overview and study workflow.
(a) Inputs to LOCUS are an n × p design matrix X of p SNPs and an n × q outcome matrix y of q molecular traits (e.g., gene, protein, lipid, metabolite or methylation levels) for n individuals. The model accounts for all SNPs and molecular traits jointly. (b) Graphical model representation of LOCUS. The effect of SNP s on trait t is modelled by βst, and γst is a latent indicator taking value one if they are associated and zero otherwise. The parameter ωs controls the pleiotropic level of each SNP, i.e., the number of traits with which it is associated. The parameter σ represents the typical size of effects, and τt is a precision parameter that relates to the residual variability of each trait t. (c) Outputs of LOCUS are posterior probabilities of association, pr(γst = 1 ∣ y), for each SNP and each trait (p × q panel), and posterior means for the pleiotropy propensity of each SNP, E(ωs ∣ y) (Manhattan plot). (d) Workflow of the pQTL study. The MS and SomaLogic pQTL data are analyzed in parallel. LOCUS is applied to the Ottawa data for discovery, and 83% of the 18 and 118 pQTL associations discovered with the MS and SomaLogic data, respectively, replicate in the independent DiOGenes study. The possible relevance of the validated pQTLs for disease endpoints is explored via analyses of clinical parameters from the Ottawa and DiOGenes cohorts. Further support is obtained by evaluating the overlap with eQTLs, epigenomic marks and GWAS risk loci.
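The hierarchy in panel (b) can be made concrete with a small simulation. The sketch below is illustrative only: it follows the roles of βst, γst, ωs, σ and τt described in the legend, but the Beta and Gamma distributions, their hyperparameters, the allele frequency and the function name `simulate_locus_like` are assumptions made for this example and are not the prior specification used by LOCUS.

```python
import numpy as np

def simulate_locus_like(n=200, p=50, q=10, sigma=0.5, seed=0):
    """Simulate data from a simplified spike-and-slab hierarchy (illustrative)."""
    rng = np.random.default_rng(seed)
    # Genotypes as minor-allele counts 0/1/2 (assumed allele frequency 0.3, no LD).
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    # omega_s: propensity of SNP s to be associated with traits (assumed Beta draw).
    omega = rng.beta(0.2, 5.0, size=p)
    # gamma_st = 1 if SNP s is associated with trait t, and 0 otherwise.
    gamma = (rng.random((p, q)) < omega[:, None]).astype(int)
    # beta_st is exactly zero unless gamma_st = 1; sigma sets the typical effect size.
    beta = gamma * rng.normal(0.0, sigma, size=(p, q))
    # tau_t: residual precision of trait t (assumed Gamma draw).
    tau = rng.gamma(2.0, 1.0, size=q)
    # Each trait is a linear combination of all SNPs plus trait-specific noise.
    y = X @ beta + rng.normal(size=(n, q)) / np.sqrt(tau)
    return X, y, beta, gamma, omega, tau

X, y, beta, gamma, omega, tau = simulate_locus_like()
# Number of traits associated with each SNP, i.e. its simulated pleiotropy.
traits_per_snp = gamma.sum(axis=1)
```

In this toy setting, SNPs with larger ωs tend to have more non-zero entries in their row of γ, which is the quantity that the posterior mean E(ωs ∣ y) in panel (c) is meant to reflect.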
Fig 2.
Overview of LOCUS validated pQTL hits.
(a) Venn diagram for the locus-protein hits identified by the GEMMA and LOCUS two-stage analyses. The hits uncovered by GEMMA but not by LOCUS (left) and the hits uncovered by LOCUS but not by GEMMA (right) are listed; stars mark hits in the former group that were not tagged and were therefore not detectable by the LOCUS analyses. When multiple SNPs correspond to the same locus-protein hit, the SNP(s) with the top association(s) in the Ottawa cohort is/are shown. The novel hits (r² > 0.8 proxy search) are in bold, and the hits with dual replication in the alternative proteomic platform are marked with a cross (4 of the 4 quantified with both platforms). (b) Estimated effects (regression coefficients with standard errors) in DiOGenes for the validated pQTLs whose controlled protein is quantified by both technologies (S3 Table). (c) Circular plot for the trans-pQTL associations uncovered by LOCUS (FDR < 5% for discovery and validation). Each arrow starts from the pQTL SNP, labelled with its closest gene (grey), and points to the gene (black) coding for the controlled protein. The proteins whose trans pQTLs are missed by GEMMA are highlighted in green.
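The proxy criterion mentioned in panel (a) relies on pairwise linkage disequilibrium. As a minimal sketch, r² between two SNPs can be approximated by the squared Pearson correlation of their genotype counts; the function names `genotype_r2` and `proxies`, the SNP labels and the toy genotypes are hypothetical, and the reference panel and software used for the actual proxy search are not specified here.

```python
import numpy as np

def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype-count vectors."""
    r = np.corrcoef(np.asarray(g1, float), np.asarray(g2, float))[0, 1]
    return r * r

def proxies(index_genotypes, candidates, threshold=0.8):
    """Return names of candidate SNPs whose r2 with the index SNP exceeds threshold."""
    return [name for name, g in candidates.items()
            if genotype_r2(index_genotypes, g) > threshold]

# Toy genotypes (minor-allele counts) for six individuals.
index_snp = [0, 1, 2, 1, 0, 2]
candidates = {
    "snpB": [2, 1, 0, 1, 2, 0],  # same signal with the opposite allele coding
    "snpC": [1, 1, 1, 0, 2, 1],  # weakly correlated with the index SNP
}
proxy_names = proxies(index_snp, candidates)
```

Because r² is invariant to allele coding, a SNP that is perfectly anti-correlated with the index SNP still counts as a proxy.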
Fig 3.
Selection performance of LOCUS and GEMMA on simulated data.
(a) Truncated average ROC curves with 95% confidence intervals (50 replicates) for the detection of SNP-trait associations. (b) Average standardized partial AUC of LOCUS and GEMMA for a grid of effect sizes (x-axis) and signal sparsity levels (y-axis; 20 replicates for each scenario). (c) Simulated association pattern and patterns recovered by LOCUS and GEMMA, averaged over the 50 replicates. The plots display a window of 350 SNPs (x-axis) containing the first three SNPs with simulated associations (blue labels), along with their LD pattern. (d) Average standardized AUC with 95% confidence intervals for different numbers of proteins modelled jointly: the 133 simulated proteins are randomly partitioned into batches of size q0 (x-axis) and LOCUS is applied separately to each batch. Here q0 = 1 corresponds to modelling the proteins one by one and q0 = 133 to modelling all 133 proteins jointly, as in a standard application of LOCUS. (e) Posterior mean of the LOCUS parameter representing the propensity for SNPs to be pleiotropic. Its magnitude satisfactorily reflects the number of simulated associations per SNP (color-coded).
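For readers unfamiliar with the metric in panels (a), (b) and (d), the following sketch computes a partial AUC from a truncated ROC curve. It rests on assumptions: the truncation point `fpr_max = 0.1` is hypothetical, the standardization simply divides the area by `fpr_max` so that a perfect ranking scores 1 (other standardizations exist, and the one used for the figure is not stated in the legend), and tied scores are not grouped.

```python
import numpy as np

def standardized_partial_auc(scores, labels, fpr_max=0.1):
    """Area under the ROC curve for FPR in [0, fpr_max], divided by fpr_max."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Rank SNP-trait pairs from the strongest to the weakest evidence.
    hits = labels[np.argsort(-scores, kind="stable")]
    # ROC points obtained by lowering the threshold one pair at a time.
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~hits) / (~hits).sum()))
    # Truncate the curve at fpr_max, interpolating the TPR at the cut.
    keep = fpr <= fpr_max
    fpr_t = np.append(fpr[keep], fpr_max)
    tpr_t = np.append(tpr[keep], np.interp(fpr_max, fpr, tpr))
    # Trapezoidal rule over the truncated curve.
    area = np.sum(np.diff(fpr_t) * (tpr_t[1:] + tpr_t[:-1]) / 2.0)
    return area / fpr_max

# Toy example: posterior probabilities (or any score) against simulated truth.
scores = [0.9, 0.8, 0.2, 0.1]
truth = [1, 1, 0, 0]
value = standardized_partial_auc(scores, truth)
```

Setting `fpr_max=1.0` recovers the usual full AUC, which is a convenient sanity check on the implementation.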
Fig 4.
Two examples of pQTL signal estimation in the discovery set.
(a) The pleiotropic locus ABO: The panel “LOCUS pleiotropy parameter in ABO” shows the posterior mean E(ωs ∣ y) for a 1 Mb region around the gene ABO. This quantity attributes weight to two SNPs, rs8176741 and rs2519093, which LOCUS finds associated with the proteins CADH5, CD209 and TIE1, and with CD209, INSR and LYAM2, respectively. For each of these proteins, the colored panel displays the −log10 nominal p-values obtained when re-analyzing the Ottawa data with GEMMA [25, 26]; the dashed and dotted horizontal lines show the Bonferroni levels of α = 0.05 and α = 0.2, respectively. The pleiotropic SNP identified by LOCUS is marked with a green triangle (rs8176741 left column, rs2519093 right column), and its LD (r²) with the surrounding SNPs is indicated by the yellow to red colors. The top SNP found by GEMMA is shown by a black star; the univariate signal for INSR does not pass either p-value significance threshold, yet it is detected by the LOCUS multivariate analyses. (b) A locus trans-acting on XRCC6 (same labelling as in (a)).
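The horizontal lines in the p-value panels are Bonferroni levels expressed on the −log10 scale. A minimal sketch of that conversion follows; the function name `bonferroni_neglog10` and the number of tests (one million) are hypothetical and serve only to show the arithmetic, since the number of tests used for the figure is not given in the legend.

```python
import math

def bonferroni_neglog10(alpha, n_tests):
    """Return -log10 of the per-test p-value cutoff alpha / n_tests."""
    if not 0 < alpha <= 1 or n_tests < 1:
        raise ValueError("alpha must lie in (0, 1] and n_tests must be >= 1")
    return -math.log10(alpha / n_tests)

n_tests = 1_000_000  # hypothetical number of SNP-protein tests
dashed = bonferroni_neglog10(0.05, n_tests)  # level alpha = 0.05
dotted = bonferroni_neglog10(0.2, n_tests)   # more lenient level alpha = 0.2
```

A larger α gives a lower line on the −log10 scale, so a signal that falls below the dotted line misses both thresholds.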
Fig 5.
Forest plots for associations between proteins under genetic control and clinical parameters.
Analyses were adjusted for age, gender and BMI (Material and methods). The proteins shown cover the ABO and XRCC6 pQTLs discussed in the main text, as well as other examples discussed in S4 Appendix. All endpoints are measured in both the Ottawa and DiOGenes cohorts; they correspond to total lipid levels (first row: total cholesterol, HDL, LDL, triglycerides), glucose/insulin resistance (second row: fasting glucose, fasting insulin, HOMA-IR) and the visceral adiposity index (VAI). In each case, regression coefficients with 95% confidence intervals are shown for the Ottawa and DiOGenes analyses, and for the meta-analysis. The stars indicate associations with meta-analysis FDR < 5% (correction applied across all proteins under genetic control, not only those displayed). For proteins with measurements on both the MS and SomaLogic platforms, association results are displayed for both; trans-regulated proteins are in bold.
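To illustrate how the quantities in such a forest plot relate to one another, the sketch below combines two cohort estimates and adjusts a set of p-values. It is written under assumptions: the legend does not state the meta-analysis method or the FDR procedure, so an inverse-variance fixed-effect model and the Benjamini-Hochberg adjustment are used here as common choices, and all numbers are made up.

```python
import math
import numpy as np

def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis of per-cohort estimates."""
    betas = np.asarray(betas, dtype=float)
    weights = 1.0 / np.asarray(ses, dtype=float) ** 2
    beta = float(np.sum(weights * betas) / np.sum(weights))
    se = math.sqrt(1.0 / float(np.sum(weights)))
    ci = (beta - 1.96 * se, beta + 1.96 * se)  # normal-approximation 95% CI
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, ci, p

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Made-up coefficients and standard errors for one protein-endpoint pair.
beta, se, ci, p = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
# Made-up meta-analysis p-values across several protein-endpoint pairs.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
starred = adjusted < 0.05
```

The adjustment is applied to the full set of tested pairs, so whether a displayed association is starred depends on p-values that are not shown in the plot.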