A multivariate genome-wide association study of psycho-cardiometabolic multimorbidity

Coronary artery disease (CAD), type 2 diabetes (T2D) and depression are among the leading causes of chronic morbidity and mortality worldwide. Epidemiological studies indicate a substantial degree of multimorbidity, which may be explained by shared genetic influences. However, research exploring the presence of pleiotropic variants and genes common to CAD, T2D and depression is lacking. The present study aimed to identify genetic variants with effects on cross-trait liability to psycho-cardiometabolic diseases. We used genomic structural equation modelling to perform a multivariate genome-wide association study of multimorbidity (Neffective = 562,507), using summary statistics from univariate genome-wide association studies for CAD, T2D and major depression. CAD was moderately genetically correlated with T2D (rg = 0.39, P = 2e-34) and weakly correlated with depression (rg = 0.13, P = 3e-6). Depression was weakly correlated with T2D (rg = 0.15, P = 4e-15). The latent multimorbidity factor explained the largest proportion of variance in T2D (45%), followed by CAD (35%) and depression (5%). We identified 11 independent SNPs associated with multimorbidity and 18 putative multimorbidity-associated genes. We observed enrichment in immune and inflammatory pathways. A greater polygenic risk score for multimorbidity in the UK Biobank (N = 306,734) was associated with the co-occurrence of CAD, T2D and depression (OR per standard deviation = 1.91, 95% CI = 1.74–2.10, relative to the healthy group), validating this latent multimorbidity factor. Mendelian randomization analyses suggested potentially causal effects of BMI, body fat percentage, LDL cholesterol, total cholesterol, fasting insulin, income, insomnia, and childhood maltreatment. These findings advance our understanding of multimorbidity suggesting common genetic pathways.


GWAS selection
Data for major depression were obtained from a meta-analysis by Howard et al. [1] (n = 807,553), which was based on lifetime diagnoses of major depressive disorder (MDD) from clinical interviews, electronical medical records, self-report of symptoms or of help-seeking for depression and self-reported clinical diagnosis or treatment by a medical professional.

Genomic SEM
All default settings were kept, with the exception of 'genomic control', which was set to 'none' as the input univariate GWASs did not appear to suffer from genomic inflation related to population stratification. To help with model convergence, the residual variances were constrained to be positive for CAD and T2D. Once summary statistics for multimorbidity were obtained, the effective sample size (Neff) was estimated following the approach described by Mallard et al. [4]. All of the above analyses were conducted using R version 4.0.0 [5].

Gene mapping
We broadened the genomic loci for annotation to include all known variants that are available in the 1000G reference panel and are in LD (r 2 ≥ 0.6) with one of the non-heterogeneous, independent significant SNPs. Each genomic locus could therefore contain a mixture of independent significant SNPs and SNPs in LD with them (i.e., candidate SNPs). All candidate SNPs were functionally annotated by performing ANNOVAR [6] gene-based annotation using Ensembl genes.

Positional mapping and pathway enrichment analysis
Positional mapping was selected to map SNPs to genes based on their physical position on the genome using ANNOVAR [6] annotations with 10 kb windows. eQTL mapping was used to map SNPs to genes based on significant eQTL associations (i.e., where gene expression was associated with allelic variation at the SNP) [7] within 1 Mb distance using GTEx v8 [8], Blood eQTL browser [9], PsychENCODE eQTLs , and BRAINEAC [11] repositories.
Chromatin interaction mapping was selected to map SNPs to genes based on chromatin interactions between multimorbidity-associated regions and gene regions (without a distance boundary). For chromatin interactions Hi-C [12] data for 21 tissue/cell types were used.
For pathway enrichment analysis, we looked at the enrichment of the prioritised genes in biological pathways and functional categories, and their overrepresentation in differentially expressed gene (DEG) sets (i.e., sets of genes that are either up-regulated or down-regulated in a specific tissue compared to other tissue types).

Polygenic risk score calculation and constructing psychocardiometabolic multimorbidity phenotype in UK Biobank
Prior to calculating polygenic risk scores in the UKBB, we performed QC steps on base and target data, following the steps outlined by Choi et al. [13] The UKBB is a large prospective cohort with over 500,000 participants aged 40-69 years when recruited in 2006-2010 [14].
Out of the total sample, we selected 306,734 individuals of European ancestry that, in addition to individual-level genetic data, had phenotypic information available on depression, CAD and T2D.
To construct the psycho-cardiometabolic multimorbidity phenotype in UK Biobank, we first defined cases for single phenotypes: depression, coronary artery disease and type 2 diabetes based on information available at all time points (i.e., baseline, instance 1, 2 and 3).

Depression phenotype
Depression phenotype was derived using the UKBB field IDs specified in Glanville et al. [15] Briefly, cases were defined based on the following criteria: 1. Self-reported depression -Individuals who self-reported experiencing depression.

Lifetime depression -Mental Health Questionnaire (MHQ)
-Individuals who met criteria for a lifetime history of depression, as defined in Glanville et al. [15] and Davis, et al. [16]. Example items included: "Have you ever had a time in your life when you felt sad, blue, or depressed for two weeks or more in a row?" or "Have you ever had a time in your life lasting two weeks or more when you lost interest in most things like hobbies, work, or activities that usually give you pleasure?".

Coronary artery disease
Cases were defined based on self-reported heart attack/angina (Data-Field 20002; value 1075 / 1074), or vascular heart problems diagnosed by a doctor (Data-Field 6150), specifically heart attack and angina.

Type 2 diabetes
Cases were defined based on self-reported type 2 diabetes (Data-Field 20002; value 1223) and doctor's diagnosis of diabetes (Data-Field 2443).

Genetic correlations
To explore genetic correlations with multimorbidity, we selected 18 traits that are commonly considered risk factors for either one, two or all three diseases (depression, CAD and T2D).

Mendelian randomization
We conducted two-sample Mendelian randomization (MR) analysis [27] between 18 selected risk factors and multimorbidity using TwoSampleMR package [28] in R v4.0.0. The package harmonized exposure and outcome data and attempted to infer positive strand alleles using allele frequencies. Subsequently, independent SNPs with a GWAS P < 5e-8 were selected as instrumental variables for each exposure of interest. For childhood maltreatment, only 10 instruments were present using this threshold. As such, we decided to use a more lenient pvalue threshold for this trait (P < 5e-6), as done elsewhere [29]. Independent SNPs were defined using 10,000 kb distance and r 2 < 0.001.
For each risk factor, we performed inverse-variance weighted (IVW) regression using multiplicative random effects, which combines SNP-exposure and SNP-outcome estimates to provide an overall estimate of the causal effect [27,30].

Multivariate GWAS without UK Biobank
Overall, consistent findings were observed when repeating the analysis using the largest available GWASs for each trait, but this time excluding the UKBB cohort. Briefly, we were independent SNPs (S4 Table). Summary statistics for multimorbidity without UKBB correlated very strongly with summary statistics with UKBB (rg = 0.98, SE = 0.01, P < .001).
Eight of the 30 independent SNPs were a direct replication of the independent hits from the discovery multimorbidity GWAS, whereas the remaining three were suggestive of significance (all P < 5e-5). Using only non-heterogeneous SNPs as input to FUMA resulted While there was significant heterogeneity among the instrumental variables (S13 Table) for most of the risk factors, leave-one-out analyses indicated that no single variant was driving the results (S7- S24 Fig). Lastly, Steiger analysis suggested that the direction of the causal effect between our exposure and outcome variables were correct (S14 Table).
MRlap analysis which corrects for sample overlap, weak instrument bias and winner's curse revealed directionally consistent observed and bias-corrected effects. In many cases, the biascorrected estimates were significantly larger than the observed effects. None of the traits had smaller corrected effects than the observed ones (S13 Table). These results suggest that the average bias in our IVW estimates was towards the null.  Supplementary Figures   Fig A. A flowchart of the analysis.

Fig D. Enrichment in differentially expressed gene sets based on GTEx v8 30 general tissue types.
Significantly enriched differentially expressed gene (DEG) sets with a Bonferroni corrected P-value ≤ 0.05 and a log fold change ≥ 0.58 are highlighted in red.

Fig E. Enrichment in differentially expressed gene sets based on GTEx v8 54 tissue types.
Significantly enriched differentially expressed gene (DEG) sets with a Bonferroni corrected P-value ≤ 0.05 and a log fold change ≥ 0.58 are highlighted in red.

Fig F. MAGMA Tissue Expression Analysis in GTEx v8 53 tissue types.
Using the full distribution of genome-wide association SNP p-values. SNP, single nucleotide polymorphism.

Fig G. MAGMA Tissue Expression Analysis in GTEx v8 30 general tissue types.
Using the full distribution of genome-wide association SNP p-values. SNP, single nucleotide polymorphism.