Genome-Wide Association Study Identifies Novel Loci Associated with Circulating Phospho- and Sphingolipid Concentrations

Phospho- and sphingolipids are crucial cellular and intracellular compounds. These lipids are required for active transport, a number of enzymatic processes, membrane formation, and cell signalling. Disruption of their metabolism leads to several diseases, with diverse neurological, psychiatric, and metabolic consequences. A large number of phospholipid and sphingolipid species can be detected and measured in human plasma. We conducted a meta-analysis of five European family-based genome-wide association studies (N = 4034) on plasma levels of 24 sphingomyelins (SPM), 9 ceramides (CER), 57 phosphatidylcholines (PC), 20 lysophosphatidylcholines (LPC), 27 phosphatidylethanolamines (PE), and 16 PE-based plasmalogens (PLPE), as well as their proportions in each major class. This effort yielded 25 genome-wide significant loci for phospholipids (smallest P-value = 9.88×10−204) and 10 loci for sphingolipids (smallest P-value = 3.10×10−57). After a correction for multiple comparisons (P-value<2.2×10−9), we observed four novel loci significantly associated with phospholipids (PAQR9, AGPAT1, PKD2L1, PDXDC1) and two with sphingolipids (PLD2 and APOE) explaining up to 3.1% of the variance. Further analysis of the top findings with respect to within class molar proportions uncovered three additional loci for phospholipids (PNLIPRP2, PCDH20, and ABDH3) suggesting their involvement in either fatty acid elongation/saturation processes or fatty acid specific turnover mechanisms. Among those, 14 loci (KCNH7, AGPAT1, PNLIPRP2, SYT9, FADS1-2-3, DLG2, APOA1, ELOVL2, CDK17, LIPC, PDXDC1, PLD2, LASS4, and APOE) mapped into the glycerophospholipid and 12 loci (ILKAP, ITGA9, AGPAT1, FADS1-2-3, APOA1, PCDH20, LIPC, PDXDC1, SGPP1, APOE, LASS4, and PLD2) to the sphingolipid pathways. In large meta-analyses, associations between FADS1-2-3 and carotid intima media thickness, AGPAT1 and type 2 diabetes, and APOA1 and coronary artery disease were observed. In conclusion, our study identified nine novel phospho- and sphingolipid loci, substantially increasing our knowledge of the genetic basis for these traits.


Introduction
Phospho-and sphingolipids are present in all eukaryotic cell membranes and contribute to organelle structure and signalling events that influence cell behaviour and function [1][2][3]. Phosphatidylcholines (PC), phosphatidylethanolamines (PE), lysophosphatidylcholines (LPC) and PE-based plasmalogens (PLPE) are major classes of phospholipids that play an important role in several key processes such as cell survival and inflammation [4][5][6]. Sphingolipids are also essential components of plasma membranes and endosomes and are believed to play critical roles in cell surface protection, protein and lipid transport and sorting, and cellular signalling cascades [7]. In plasma, PC, PE and sphingomyelin (SPM) are included in the structure of lipoproteins; they constitute more than two-thirds of the total phospholipid content in HDL-C and LDL-C, as well as in platelets [8,9]. Remarkable differences in plasma lipoprotein acceptor affinities for the phospholipids exist (LDL-C is the major acceptor for SPM, whereas HDL-C is the predominant acceptor for PC) [9]. Altered concentrations of circulating phospholipids have been implicated in the pathology of type 2 diabetes, dyslipidemia and cardiovascular disease [10][11][12][13][14][15], as well as a wide range of other common diseases including dementia and depression [16].
Identifying genetic variants that influence phospho-and sphingolipid concentrations will be an important step towards understanding pathways contributing to common human disease. Earlier studies of these metabolites identified a number of genetic loci associated with their levels in blood [17][18][19]. We conducted a meta-analysis of genome-wide association studies (GWAS) on plasma levels of 24 SPMs, 9 ceramides (CER), 57 PCs, 20 LPCs, 27 PEs and 16 PLPEs in five European populations: (1) the Erasmus Rucphen Family (ERF) study, conducted in the Netherlands, (2) the MICROS study from the Tyrol region in Italy, (3) the Northern Swedish Population Health Survey (NSPHS) in Norrbotten, Sweden, (4) the Orkney Complex Disease Study (ORCADES) in Scotland, and (5) the CROAS (CROATIA_Vis) study conducted on Vis Island, Croatia.
The top findings were further analysed by adjusting for plasma HDL-C, LDL-C, TG and TC levels. The influences of these top hits on within class lipid ratios were also assessed, to help elucidate potential mechanisms. Finally, the variants that were associated with plasma phospho-and sphingolipid levels were tested for association with carotid intima media thickness (IMT), type 2 diabetes (T2DM), and coronary-artery disease (CAD) using large consortia meta-analysis results. Table 1 provides an overview of the study populations. The mean age, gender ratio and mean values of major classes of phospho-and sphingolipids were comparable among the 5 populations. Means for the individual species are presented in Table S1. Figure 1A and 1B shows the combined Manhattan plot for the meta-analyses of the absolute values and proportions of all phospholipid traits, respectively; Figure 2A and 2B provides the same for the sphingolipids. Out of 357 meta-analyses performed, 202 outcomes yielded genome-wide significant findings, most of which were located around two genes, FADS and LIPC, which were identified previously [17,19] as key lipid regulators and are associated with a large number of species (Table 2 and Table 3). Q-Q plots for the lipid GWAS that yielded significant associations are provided in Figure S1.
To determine if the associations of the phospho-and sphingolipid loci were mediated by these major classes of plasma

Author Summary
Phospho-and sphingolipids are integral to membrane formation and are involved in crucial cellular functions such as signalling, membrane fluidity, membrane protein trafficking, neurotransmission, and receptor trafficking. In addition to severe monogenic diseases resulting from defective phospho-and sphingolipid function and metabolism, the evidence suggests that variations in these lipid levels at the population level are involved in the determination of cardiovascular and neurologic traits and subsequent disease. We took advantage of modern laboratory methods, including microarray-based genotyping and electrospray ionization tandem mass spectrometry, to hunt for genetic variation influencing the levels of more than 350 phospho-and sphingolipid phenotypes. We identified nine novel loci, in addition to confirming a number of previously described loci. Several other genetic regions provided substantial evidence of their involvement in these traits. All of these loci are strong candidates for further research in the field of lipid biology and are likely to yield considerable insights into the complex metabolic pathways underlying circulating phospho-and sphingolipid levels. Understanding these mechanisms might help to illuminate factors leading to the development of common cardiovascular and neurological diseases and might provide molecular targets for the development of new therapies.  lipoproteins, conditional analyses were performed. Table S4 shows the effect size, standard error, and P-values for the genome-wide significant loci when adjusted for HDL-C, LDL-C, TG and TC.
Only the association of the APOE locus (rs7259004) with SPMs was greatly affected by the incorporation of LDL-C and TC. No other major differences were observed in effect size or P-value.

Association with IMT, T2DM, and CAD risk
The top 35 SNPs were assessed for association with IMT, T2DM, and CAD using the GWAS results from the CHARGE [23], DIAGRAM [24] and CARDIoGRAM [25] consortia, respectively. For IMT, we observed a significant association (Pvalue = 7610 24 ) with the FADS1-2-3 locus SNP rs102275 (Table  S7). rs1061808, located in the HLA region on chromosome 6, and two SNPs from the FADS1-2-3 region (rs174479 and rs102275) were associated with T2DM risk (nominal P-value,0.05) (Table  S8). rs964184 from the APOA1-5 region was previously reported to be associated with CAD risk (P-value = 8.02610 210 ) by the CARDIoGRAM meta-analysis study (Table S9). For all three outcomes, the observed P-value distribution differed significantly from that expected under the null hypothesis (Kolmogorov Smirnov P-value#3.3610 216 ; Figure S6).

Discussion
This genome-wide association study of more than 350 phosphoand sphingolipid measurements in five European populations yielded 25 loci associated with phospholipids and 10 loci associated with sphingolipids using a nominal P-value of 5610 28 . After correction for the number of independent phenotypes, the novel genome-wide significant loci included: PAQR9, AGPAT1, PKD2L1, PDXDC1, APOE and PLD2. In addition, further analysis of suggestive SNPs with lipid ratios showed significant association for an additional 3 loci (ABDH3, PNLIPRP2, and PCDH20).
The strongest association in the PAQR9 locus was observed between rs9832727 and the proportion of mono-unsaturated PEs, especially with the ratios PE 34:1/PE 34:2 and PE 36:1/PE 36:2. The protein coded byPAQR9 is an integral membrane receptor and functions as receptor for the hormone adiponectin, suggesting a molecular link with obesity and T2DM [26]. However, we did not observe an association between T2DM risk and this variant.
In the AGPAT1 locus, rs1061808 was associated with the proportion of PC 32:0, and, especially, with the ratio of PC 32:0/ PC 34:1. AGPAT1 is directly connected to phospholipid metabolism ( Figures S4 and S5), as the product of this gene converts lysophosphatidic acid (LPA) into phosphatidic acid (PA) [27]. The locus lies 400 kb distant from the HLA-DRB1 gene which was previously associated with insulin secretion [28]. A suggestive association between rs1061808 and increased T2DM risk was observed in the DIAGRAM consortium meta-analysis results.
We found two loci that strongly influence plasma LPC levels: PKD2L1 and PDXDC1. An intronic variant, rs603424 in the PKD2L1 gene, was strongly associated with LPC 16:1. Pathway analyses suggest that another gene in the same region, SCD (FADS-5), 25 kb away, may be a better candidate since it encodes the stearoyl-CoA desaturase (delta-9-desaturase) enzyme which is involved in fatty acid desaturation. Other members of the FADS family are the strongest genetic regulators of phospholipid metabolism identified to date. In the PDXDC1 locus, the strongest association was observed for intronic SNP rs4500751. This variant is 300 kb distant from PLA2G10, a gene that plays a major role in releasing arachidonic acid from cell membrane phospholipids [29] and the protein can be mapped to both the glycerophospholipid and the sphingolipid metabolism pathways by Ingenuity ( Figures  S4 and S5). In our study, the variant was strongly associated with the ratios of 20:3 fatty acid carrying LPCs, as well as PEs, and PCs, but not with the others, suggesting a fatty-acid specific mechanism for this enzyme.
Another index SNP (rs7259004), associated with SPMs, maps to the well known APOE locus, which also includes three other lipid genes (APOC1, APOC2 and APOC4). Results from the conditional analyses (Table S4)    A second locus associated with the SPMs is PLD2 (phospholipase D2). PLD2 catalyzes the hydrolysis of PC to produce phosphatidic acid and choline and the PLD2 signalling pathway is involved in the destabilization of ABCA1 and, therefore, plays a role in the generation of plasma HDL-C particles [30]. PLD2-related processes may be responsible, in part, for determining the SPM content of HDL-C. Unexpectedly, we did not observe an association between PC levels and the PLD2 locus.
The analysis of the ratios of the phospholipids uncovered three additional associations significant at the adjusted genome-wide threshold (P-value,2.2610 29 ): ABDH3, PNLIPRP2 and PCDH20. The exact function of the ABDH3 and PCDH20 proteins, and how they relate to phospholipid metabolism, has not been determined. PNLIPRP2 (pancreatic lipase-related protein 2) fulfils a key function in dietary fat absorption by hydrolyzing triglycerides into diglycerides and, subsequently, into monoglycerides and free fatty acids ( Figure S4) [31]. We found that a synonymous coding SNP (rs10885997) in PNLIPRP2 was associated with the ratios PC 36:1/PC 34:1 and PC 36:1/PC 34:3, suggesting a fatty-acid specific turnover between these lipids.
The significant hits from the current study were further studied for potential associations with IMT, T2DM, and CAD. For all three outcomes, the P-value distributions differed significantly from the expected null distribution even after exclusion of nominally significant SNPs, suggesting that some of these variants contribute to these outcomes even when they do not achieve statistical significance.
Among our top hits, rs102275 from the FADS cluster was associated with IMT in the CHARGE meta-analysis results [23]. This finding demonstrates the involvement of the FADS locus in the development of atherosclerosis.
In addition, the top SNP from the APOA1-5 locus was implicated in CAD risk in the CARDIoGRAM study [25]. This locus, previously associated with TG levels [20], influenced two ether-bound PCs and the PC/SPM ratio in our study. APOA1 and APOA2 are the predominant proteins in HDL-C particles, which also transport TG. The association between the phospholipids and rs964184 remained significant after adjustment for TG levels, suggesting that this signal is not due solely to TG mediated effects. APOA1 is also a cofactor for lecithin cholesterol acyltransferase (LCAT) which converts cholesterol and PC to cholesteryl esters and LPC on the surface of HDL-C [32] and it is possible that the association we observe here is due to LCAT mediated phospholipid cleavage.
Mapping the findings into the glycerophospholipid and sphingolipid metabolism pathways uncovered several enzymes, kinases, peptidases and G-protein coupled receptors that may also be relevant for phospho-and sphingolipid metabolism. Among those involved in sphingolipid metabolism ( Figure S5), HNF4A (hepatocyte nuclear factor-4) appears to be a common interacting factor for several genes (PCDH20, APOC1, AGPAT1, ITGA9, PLD2, C11ORF10, APOC2, GCKR, APOE, APOC3 and LIPC) from our GWAS. It is already known that the extinction of many hepatic functions and their expression are correlated with expression of  HNF4A which is a candidate transcription factor for further research on lipidomics [33].
In conclusion, we identified 15 previously undescribed loci that were suggestively associated (2.2610 29 ,P-value,5610 28 ) with phospho-and sphingolipid levels. These included interesting candidate genes such as LPAR2. These loci will require followup to definitively establish their relationship with these phenotypes. We also identified nine novel loci below the corrected genome-wide significance threshold (P-value,2.2610 29 ). These loci considerably expand our knowledge of genes/regions involved in the determination of phospho-and sphingolipid concentrations and provide interesting avenues for future research into this important topic.

Materials and Methods
All studies were approved by the local ethical committees. Detailed descriptions of the study populations that contributed to the meta-analysis, as well as detailed information on ethical statements, genotyping, lipid measurements and pathway analysis, are presented in Text S1. Briefly, lipid species were quantified by electrospray ionization tandem mass spectrometry (ESIMS/MS) using methods validated and described previously [34,35]. For each lipid molecule, we adopted the naming system where lipid side chain composition is abbreviated as Cx:y, where x denotes the number of carbons in the side chain and y the number of double bonds. For example, PC 34:4 denotes an acyl-acyl phosphatidylcholine with 34 carbons in the two fatty acid side chains and 4 double bonds in one of them. Lipid traits were analysed individually as well as aggregated into groups of species with similar characteristics (e.g. unsaturated ceramides). These were then analyzed as both absolute concentrations (mM) and as molar percentages within lipid sub-classes (mol%) (calculated as the proportion of each lipid molecule among its own class (e.g. PC, PE, PLPE, LPC). The additive value of the analyses of molar proportions is that it may bring to light genes involved in the transition of one species to another, such as through fatty acid chain elongation or (de)saturation. We also performed single SNP association analyses for each novel locus and the ratio of the index lipid (for example, PC 34:1) to the other lipids in the same class (in the example, PC 34:1/PC 36:1, PC 34:1/PC 38:1) so that we could determine whether the SNP might be involved in elongation or (de)saturation.
As all of the studies included related individuals, testing for association between lipid and allele dosage were performed using a mixed model approach as implemented with the 'mmscore' option in the GenABEL software [38]. Results from the five populations were combined using inverse variance weighted fixed-effects model meta-analyses using the METAL software [39]. To correct for multiple testing, we adopted a Bonferroni correction for the number of phenotypes studied. Since most of the lipid values are correlated with each other, we used the number of principal components (n = 23) that accounted for 79% of the phenotypic variance for this correction and applied it to the classical genomewide significance threshold (5610 28 ). Figure S1 Q-Q Plots from the GWAS of phospho-and sphingolipid traits with genome-wide significant findings. The x-axis shows the expected chi-square value, the y-axis shows the observed. Lambda (l): Genomic control inflation factor. (PDF) Figure S2 Regional plots of phospholipid related loci. Regional association plots covering a 1 Mb window around the top SNPs were created using Locus Zoom (https://statgen.sph.umich.edu/ locuszoom). Diamonds denote the index SNPs (with the smallest Pvalue). The color scale on the right refers to the linkage disequilibrium (R 2 ) between each SNP and the index SNP. Blue peaks show recombination rates. (PDF) Figure S3 Regional plots of sphingolipid related loci. Regional association plots covering a 1 Mb window around the top SNPs were created using Locus Zoom (https://statgen.sph.umich.edu/ locuszoom). Diamonds denote the index SNPs (with the smallest Pvalue). The color scale on the right refers to the linkage disequilibrium (R 2 ) between each SNP and the index SNP. Blue peaks show recombination rates. (PDF)   Figure S6 Q-Q plots for the association between the top loci and disease end points. P KS : P-value from a one sample Kolmogorov-Smirnov test comparing the observed P-value distribution to that expected under the null. (PDF)

Supporting Information
Table S1 Mean phospho-and sphingolipid concentrations of the study participants. SD: standard deviation; P-value: P-value for the test comparing means by gender. (PDF)