Genetic Determinants of Circulating Sphingolipid Concentrations in European Populations

Sphingolipids have essential roles as structural components of cell membranes and in cell signalling, and disruption of their metabolism causes several diseases, with diverse neurological, psychiatric, and metabolic consequences. Increasingly, variants within a few of the genes that encode enzymes involved in sphingolipid metabolism are being associated with complex disease phenotypes. Direct experimental evidence supports a role of specific sphingolipid species in several common complex chronic disease processes including atherosclerotic plaque formation, myocardial infarction (MI), cardiomyopathy, pancreatic β-cell failure, insulin resistance, and type 2 diabetes mellitus. Therefore, sphingolipids represent novel and important intermediate phenotypes for genetic analysis, yet little is known about the major genetic variants that influence their circulating levels in the general population. We performed a genome-wide association study (GWAS) between 318,237 single-nucleotide polymorphisms (SNPs) and levels of circulating sphingomyelin (SM), dihydrosphingomyelin (Dih-SM), ceramide (Cer), and glucosylceramide (GluCer) single lipid species (33 traits); and 43 matched metabolite ratios measured in 4,400 subjects from five diverse European populations. Associated variants (32) in five genomic regions were identified with genome-wide significant corrected p-values ranging down to 9.08×10−66. The strongest associations were observed in or near 7 genes functionally involved in ceramide biosynthesis and trafficking: SPTLC3, LASS4, SGPP1, ATP10D, and FADS1–3. Variants in 3 loci (ATP10D, FADS3, and SPTLC3) associate with MI in a series of three German MI studies. An additional 70 variants across 23 candidate genes involved in sphingolipid-metabolizing pathways also demonstrate association (p = 10−4 or less). 
Circulating concentrations of several key components in sphingolipid metabolism are thus under strong genetic control, and variants in these loci can be tested for a role in the development of common cardiovascular, metabolic, neurological, and psychiatric diseases.


Introduction
Sphingolipids are essential components of plasma membranes and endosomes and are believed to play critical roles in cell surface protection, protein and lipid transport and sorting, and cellular signalling cascades. They are known to have roles in both health and disease [1,2]. Several rare monogenic diseases associated with sphingolipid biosynthesis and turnover have been identified, such as metachromatic leukodystrophy and GM1- and GM2-gangliosidosis, Niemann-Pick, Gaucher, Krabbe, Fabry, Farber, Tay-Sachs and Sandhoff diseases [3]. Defective biosynthesis due to mutations in genes involved in sphingolipid metabolism (e.g. serine palmitoyl transferase (SPTLC1) [4]; ceroid-lipofuscinosis, neuronal 8 (CLN8) [5]; and ceramide synthase (LASS1) [6]) can also lead to disease. Moreover, natural fungal inhibitors of ceramide synthase can result in a broad spectrum of effects including equine leucoencephalomalacia, porcine pulmonary oedema syndrome and liver cancer in rats [7], demonstrating the wide range of processes, including cell proliferation, differentiation and apoptosis, that are underpinned by sphingolipid metabolism. Identifying common genetic variants that influence the balance between individual sphingolipid concentrations represents an important step towards understanding the contribution of sphingolipids to common human disease. To achieve this goal, we conducted a genome-wide association study (GWAS) on plasma levels of 33 major sphingolipid species (24 sphingomyelins and 9 ceramides) in five European populations, both within and across populations. The traits were analysed by individual species (sphingomyelins (SM), dihydrosphingomyelins (Dih-SM), ceramides (Cer) and glucosylceramides (GluCer)) or aggregated into groups of species with similar characteristics (e.g. unsaturated ceramides), and expressed as absolute concentrations or as molar percentages within sphingolipid classes (mol%).
In addition we examined 43 matched metabolite ratios between the traits as a surrogate for enzyme activity [8] in separate clusters designed to examine sphingolipid metabolism (11 ratios), desaturation (16 ratios) and elongation (16 ratios). All traits displayed substantial heritabilities in that much of the observed variation in sphingolipid levels could be attributed to genetic variation among individuals in each population.
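As an illustration of the two derived trait types described above, the following minimal Python sketch converts absolute concentrations into mol% within a lipid class and forms one matched metabolite ratio. The function names and the concentration values are hypothetical and are not taken from the study data; the actual trait definitions are those given in Table S3.

```python
def mol_percent(concentrations):
    """Express each species as a molar percentage of the class total."""
    total = sum(concentrations.values())
    if total <= 0:
        raise ValueError("class total must be positive")
    return {species: 100.0 * value / total
            for species, value in concentrations.items()}


def matched_ratio(concentrations, numerator, denominator):
    """Ratio between two matched species, e.g. SM16:0/SM16:1."""
    return concentrations[numerator] / concentrations[denominator]


# Hypothetical sphingomyelin concentrations (arbitrary molar units), one sample.
sm = {"SM14:0": 10.0, "SM16:0": 100.0, "SM16:1": 15.0, "SM18:0": 25.0}

sm_mol_pct = mol_percent(sm)                                 # sums to 100
desaturation_ratio = matched_ratio(sm, "SM16:0", "SM16:1")   # saturated / monounsaturated
```

A ratio of this kind is treated in the text as a surrogate for the activity of the enzyme linking the two species, which is why it can carry a stronger genetic signal than either concentration alone.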

Results
The GWAS for single species and matched metabolite ratios revealed a total of 32 SNPs in five distinct loci reaching genome-wide significance (p values ranging down to 9.08×10−66) (Table 1, Figure 1 and Figure 2, and Table S1 and Table S3). The direction and magnitude of the observed effect sizes for the 22 variants identified in the analysis of single species are summarized in Table 1 with full details in Table S1. For three of the regions (chromosomal regions 4p12, 14q23.2 and 19p13.2), p values reached genome-wide significance in the largest cohort (South Tyrol), and the effect was replicated in the other populations. For two additional loci (11q12.3 and 20p12.1), signals bordered on genome-wide significance in South Tyrol alone, were consistent between all 5 populations and reached genome-wide significance in the meta-analysis. In the single species analysis, the strongest associations for three of the loci (11q12.3, 14q23.2 and 19p13.2) were found with sphingomyelins and dihydrosphingomyelins. The 4p12 locus showed the strongest association with serum glucosylceramides and the 20p12.1 locus showed the strongest association with serum ceramide concentrations. Table S2 shows the p-values for the individual SNPs when included in a multiple regression model, and the fraction of single sphingolipid variance explained by sex, age and all SNPs in the model together. Taken together, the SNPs explain up to 10.1% of the population variation in each trait. Ratios of matched (substrate/product) pairs have been shown to reduce variation in the dataset and increase power of association by several orders of magnitude [8]. Analysis of 43 matched metabolite ratios (Table S3) indeed increased power of association by up to 10 orders of magnitude for some of the 22 variants above, and revealed an additional 10 SNPs over the same 7 genes reaching statistical significance (see Table S3).
Surprisingly, no signals from new genes reached genome-wide significance, highlighting the fact that, across the 5 cohorts analysed here, the 7 genes identified are the major genes associated with circulating sphingolipid concentrations. Among the 32 significant individual SNPs (Table S4), variants in LASS4 explain up to 7.5% of the variance in some ratios (i.e. in SM16:0/SM18:0), SGPP1 variants explain up to 12.7% of the variance (i.e. in SM14:0/SM16:0), FADS1-3 variants explain up to 3.5% of the variance (e.g. in SM16:0/SM16:1), SPTLC3 variants explain up to 4.9% of the variance (e.g. in SM14:0/SM16:0 and SM24:0/Cer24:0), and ATP10D variants up to 4.2% of GluCer/Cer variance. Combined effects of several genes (i.e. SPTLC3 and SGPP1) explain up to 14.2% of the variance in medium-chain SM ratios (SM14:0/SM16:0) and, in combination with LASS4, up to 11.2% of the variance in long-chain sphingomyelin ratios (SM22:0/SM24:0).
All SNPs within the associated chromosomal regions are located within or are in linkage disequilibrium (LD) with genes that encode enzymes involved in sphingolipid biosynthesis or intracellular transport (Figure 2). The ATPase, class IV, type 10D (ATP10D) gene, located at chromosome 4p12, encodes a putative serine-phospholipid (phosphatidylserine, ceramide) translocase [9]. Three SNPs at this locus showed genome-wide significant associations with glucosylceramides (C16:0, C24:1) (Table 1, Table S1), with an additional five variants revealed in the ratio analysis (Table S3). SNP rs10938494 gave the strongest association in the single species analysis (p-values of 1.68×10−9 in South Tyrol and 8.03×10−19 in the joint analysis), and was among the strongest associations in the ratio analysis (p = 3.04×10−16) along with rs2351791 (p = 6.58×10−17).
Three fatty-acid desaturase genes (FADS1, 2 and 3) are located adjacent to one another in a cluster at the 11q12.3 locus. The FADS1-3 genes encode enzymes that regulate the desaturation of fatty acids by the introduction of double bonds between defined carbons of the fatty acyl chain. Seven SNPs at this locus, distributed in and around the three genes, reached statistical significance in the single species analysis for sphingomyelin 16:1 levels in the joint analysis, with p-values ranging from 2.99×10−11 (rs174449, close to FADS3) to 6.60×10−13 (rs1000778, in FADS3) (Table 1). The ratio analysis revealed an additional SNP at this locus within the FADS3 gene (rs174450, Table S3), and improved association results for other SNPs by several orders of magnitude (e.g. rs1000778 p = 1.29×10−15). Fatty acids are built into ceramides by the ceramide synthases (e.g. LASS4). Unsaturated ceramides can be synthesized exclusively by the introduction of unsaturated fatty acids into the sphingosine/sphinganine chain. The pivotal role of FADS1-3 in the synthesis of unsaturated ceramides is confirmed by the strong associations of SNPs in this cluster to the monounsaturated sphingomyelins 16:1, 18:1 and 20:1, which are the end-products of the ceramide biosynthesis pathway (Table 1, Table S1), and the ratios between these and their respective unsaturated precursors (Table S3). Previous studies of sphingolipid metabolites and poly-unsaturated fatty acids (PUFA) have demonstrated associations to SNPs, including rs174537, over the FADS1 and FADS2 genes in several populations [8,10,11].
The sphingosine-1-phosphate phosphohydrolase 1 gene (SGPP1) at the 14q23.2 locus belongs to the super-family of lipid phosphatases that catalyze the generation of sphingosine and, together with irreversible cleavage by sphingosine-1-phosphate (S1P)-lyase, strongly influences the pathway of S1P to ceramide (Figure 3). Six SNPs in and around this gene demonstrate the most significant associations with circulating sphingomyelin C14-C16/C22-C24 and dihydrosphingomyelin concentrations (Table 1) in the single species analysis, with a further two SNPs revealed in the ratio analysis. SNP rs7157785 showed the strongest association with sphingomyelin 14:0 relative content (molar percentage: mol%) with genome-wide significant p-values in all five populations, particularly in the South Tyrol population (p = 2.53×10−28) and joint analysis (p = 9.08×10−66), and demonstrated the most significant association in the ratio analysis. Enhanced SGPP1 activity leads to elevated ceramide levels by shifting the stoichiometric balance of SGPP1/S1P-lyase towards sphingosine and ceramide production.
Five SNPs at the 19p13.2 locus showed some of the strongest associations with sphingolipids and all lie within LASS4, the gene encoding LAG1 longevity assurance homologue 4. In the single species analysis SNP rs7258249 showed the highest genome-wide significant association with sphingomyelin 18:0 mol% (South Tyrol p = 1.04×10−15 and joint analysis p = 2.28×10−27). Several LASS4 SNPs showed statistically significant association with the sphingomyelin species C18 to C20 and with ceramide C20:0 (Table 1 and Table S1). In the ratio analysis, however, associations at this locus were several orders of magnitude stronger (by p value) than those seen with these SNPs in the single species analysis, with rs1466448 demonstrating the most statistically significant association (p = 4.05×10−35). LASS family members, six of which have been identified in mammals (LASS1-6), are de novo ceramide synthases (CerS) that synthesize dihydroceramide from sphinganine and fatty acid (Figure 3). Moreover, LASS enzymes catalyze the re-synthesis of ceramide and phytoceramide from sphingosine and phytosphingosine respectively, which are cleavage products of alkaline ceramidase activity in endoplasmic reticulum (ER) membranes.
The 20p12.1 locus contains the serine palmitoyltransferase long chain base subunit 3 gene (SPTLC3) encoding a functional subunit of the SPT enzyme-complex that catalyzes the first and rate-limiting step of de novo sphingolipid synthesis. One SNP (rs680379) demonstrated association for unsaturated ceramide in the South Tyrol population alone (p = 1.77×10−7) and was genome-wide significant in the joint analysis (p = 8.24×10−15). Significant association was observed also with C16 to C24 ceramides and the sphingomyelins 16:1 and 17:0 (Table 1 and Table S1). The ratio analysis strengthened association at this variant (p = 3.3×10−20 for the metabolite ratio SM24:0/Cer24:0) and revealed two further significant variants at this locus (rs3848751 and rs6078866, Table S3).
As matched metabolite ratios can serve as a proxy for enzyme activity [8], in a complementary candidate gene approach, we investigated association signals in our combined single species and ratio datasets at 624 SNPs within or near 40 genes that encode enzymes involved in sphingolipid metabolism, in order to identify the most promising variants within these genes for further analysis. Of these, a total of 70 variants in or near 23 of the genes demonstrate association p values of 10−4 or less (Table S5).
Sex- and age-adjusted single sphingolipid species displayed strong phenotypic correlations with circulating plasma lipoproteins, especially with total cholesterol or LDL-cholesterol (Table S6, e.g. between the sum of saturated sphingomyelin species and total cholesterol: 0.788/0.717/0.794/0.733/0.773 in NSPHS/ERF/SOUTH TYROL/CROATIA/ORKNEY respectively; or SM16:1 and total cholesterol: 0.737/0.631/0.671/0.6/0.638). This is in agreement with recent lipid profiling of lipoprotein fractions, showing higher proportions of sphingomyelin and ceramides in the LDL fraction [12]. However, among the GWAS hits uncovered in this analysis, only the FADS1-3 cluster overlaps with those reported in a large meta-analysis of circulating serum lipoprotein levels (strongest with total and LDL-cholesterol levels) [13]. Several of the variants reported here display suggestive associations with classical lipids in the EUROSPAN cohorts (Table S7). All eight SNPs in the FADS1-3 cluster associate with HDL-cholesterol levels (age-sex adjusted p values between 0.06 and 0.0041), similar to previous observations [8]. Interestingly, the sex-specific age-adjusted results show that these associations seem driven by the association found in males (lowest p = 0.0037 at rs174546). Association with HDL-cholesterol in males is also seen with SNPs in ATP10D (rs2351791, p = 0.01) and SPTLC3 (rs3848751, p = 0.0047). SNPs at ATP10D also associate with LDL-cholesterol, albeit weakly in the total population (rs469463, p = 0.034). In males only, variants at LASS4 (rs28133, p = 0.043) and SPTLC3 (rs3848751, p = 0.022 and rs6078866, p = 0.02) also associate weakly with LDL-cholesterol levels. Five variants in FADS1-3 and two in ATP10D associate with triglyceride levels, with lower p values in males than in the whole group (p values from 0.017 to 0.009 in FADS1-3 and 0.0071 for [...]). Association of FADS variants with triglyceride levels has also been observed in other populations [8]. As previously highlighted [8], the p values for association with the sphingolipid species were orders of magnitude stronger than with these classical lipids.
Given the reported associations to classical lipids and cardiovascular disease with variants at the FADS1-3 locus [10,13,14], and the evidence from functional studies of a role for sphingolipids in atherosclerotic plaque formation and lipotoxic cardiomyopathy [15], we looked in silico in a series of three age- and sex-adjusted GWAS datasets of German myocardial infarction (MI) case-control studies (Ger MIFS I [16], Ger MIFS II [17] and Ger MIFS III (KORA), unpublished) for evidence of association with the major variants associating with sphingolipid concentrations. Variants within three of the genes (ATP10D, FADS3 and SPTLC3) associate with MI in one or more of the studies (Table 2). The protective odds ratios observed for variants in ATP10D and SPTLC3 are on alleles correlating positively with higher metabolite/lower ceramide ratios (i.e. GluCer/Cer and SM/Cer), in support of evidence that increased enzyme/transporter activity that lowers ceramide levels might alleviate the pro-apoptotic effects seen with higher ceramide levels in cardiomyocytes [18]. As previously hypothesised, carriers of FADS variants that are associated with higher desaturase activity may be prone to a proinflammatory response favoring atherosclerotic vascular damage [14].

Author Summary
Although several rare monogenic diseases are caused by defects in enzymes involved in sphingolipid biosynthesis and metabolism, little is known about the major variants that control the circulating levels of these important bioactive molecules. As well as being essential components of plasma membranes and endosomes, sphingolipids play critical roles in cell surface protection, protein and lipid transport and sorting, and cellular signalling cascades. Experimental evidence supports a role for sphingolipids in several common complex chronic metabolic, cardiovascular, or neurological disease processes. Therefore, sphingolipids represent novel and important intermediate phenotypes for genetic analysis, and discovering the genetic variants that influence their circulating concentrations is an important step towards understanding how the genetic control of sphingolipids might contribute to common human disease. We have identified 32 variants in 7 genes that have a strong effect on the circulating plasma levels of 33 distinct sphingolipids, and 43 matched metabolite ratios. In a series of 3 German MI studies, we see association with MI for variants in 3 of the genes tested. Further cardiovascular, metabolic, neurological, and psychiatric disease associations can be tested with the variants described here, which may identify additional disease risk and potentially useful therapeutic targets.

Discussion
Direct experimental evidence indicates a role for sphingolipids in several common complex chronic disease processes including atherosclerotic plaque formation, myocardial infarction (MI), cardiomyopathy, pancreatic beta cell failure, insulin resistance and type 2 diabetes mellitus (T2D) [15]. Until now, the genetic variants that influence circulating sphingolipid concentrations in the general population have been characterized in relatively small cohorts [8]. Here we identified genetic variation with a significant effect on the biosynthesis, metabolism or intracellular trafficking of some of the major sphingolipid species in a large diverse group of European population samples. The SNPs showing association with circulating sphingolipids explain up to 10.1% of the population variation in each trait and 14.2% of some matched ratios (Table S2 and Table S4). Four of the five loci identified contain genes encoding proteins that are either responsible for de novo ceramide synthesis or for ceramide re-synthesis from sphingosine/sphinganine-phosphates or both (SPTLC3, LASS4, FADS1-3 and SGPP1). Increases in all of these enzymatic activities are predicted to elevate the "ceramide-pool". The associations are observed not only with ceramides, but also with sphingomyelins, indicating that a considerable proportion of ceramide is converted into the large and more stable "sphingomyelin-pool".
Table 1 legend (partial): ... Table S1. Abbreviations: sphingomyelin (SM), dihydrosphingomyelin (dihSM), ceramide (Cer), glucosylceramide (GluCer), unsaturated ceramides (CerUnsat), saturated ceramides (CerSat). In the nomenclature (e.g. SM18:0), the number before the colon refers to length of the carbon chain and the number after the colon to the number of double bonds in the chain. Additional variants uncovered in the matched metabolite ratio analysis can be found in Table S3. Alleles correspond to Illumina TOP notation. doi:10.1371/journal.pgen.1000672.t001
None of the genes involved in ceramide degradation or ceramide-related signaling is genome-wide significantly associated with the traits analyzed, indicating the primary role of genes related to ceramide production in the genetic control of ceramide levels. Of these four loci, the FADS1-3 gene cluster has been the one most frequently reported as linked with disease in the recent literature. Variants within this region have been associated with cardiovascular disease and classic lipid risk factors such as cholesterol levels [10,13,14]. Reported variants demonstrating association in these reports (rs174547, rs174570, rs174537 and rs174546) are within the FADS1 and FADS2 genes, but expression studies indicate complex regulation in this region, with the FADS1 SNP rs174547 showing correlation with expression of both FADS1 and FADS3 genes [19], while the FADS1 SNP rs174546 correlates with FADS1 but not FADS2 expression [10]. Our strongest associations with both sphingolipid levels and MI are in or nearest the FADS3 gene, with variants showing less marked association with cholesterol levels than that observed with variants over the FADS1 and FADS2 genes (Table S7). It is known that sphingomyelin and ceramides can modulate the atherogenic potential of LDL [20]. Further functional studies will be necessary to determine whether the active mechanism is through FADS3 alone, or in concert with FADS1, FADS2 or both. Neurological phenotypes associated with FADS2 include attention-deficit/hyperactivity disorder [21] and the moderation of breastfeeding effects on IQ [22]. Little is published regarding disease association with variants at the other four major loci described here. However, a reported association between expression levels of SGPP1 and schizophrenia [23], along with changes in SPTLC2 (with variants identified in our candidate SNP search - Table S4) and ASAH1, highlights the importance of testing variants in these genes with multiple neurological and psychiatric diseases.
Additional neurological associations with candidate genes listed in Table S4 include SGPL1 in Alzheimer's disease [24] and GBA with Parkinson's disease and dementia with Lewy bodies [25,26]. The wider possible involvement of genes within pathways of ceramide metabolism in Lewy body disease has also been recently reviewed [27].
The fifth locus contains ATP10D, a cation transport ATPase (P-type) type IV subfamily member. The type IV subfamily is thought to be an important regulator of intracellular serine-phospholipid trafficking; however, the exact function or transport specificity of ATP10D has not yet been described [9]. A novel functional finding of this study is the specificity of the association of ATP10D SNPs to glucosylceramides (among the species tested so far), which provides the first evidence for the functional involvement of ATP10D in intracellular transport of specific species of ceramide (Figure 3). Impaired function of ATP10D may therefore lead to enhanced exposure of ceramide to glucosyltransferases, forming higher concentrations of glycosylceramides that are released into the plasma compartment, or may elevate serum glucosylceramide concentrations by impaired transport of glycosylceramide to the trans Golgi network. Mutations of ATP10D (C57BL/6J(B6)) in mice result in low HDL concentrations and these mice develop severe obesity, hyperglycaemia and hyperinsulinaemia when fed on a high-fat diet [28]. Based on the mouse model, increased circulating glucosylceramides in connection with ATP10D function would be one plausible mechanism contributing to weight gain and early insulin resistance. Given the novel association of SNPs in ATP10D with MI (Table 2) seen in the German studies, further investigation of the specific role of glucosylceramides in MI and other cardiovascular diseases is warranted.
Thus, sphingolipids play a role in pathological processes leading to common complex diseases, and identification of genetic variants that influence the balance between individual sphingolipid species is an important first step towards dissecting the genetic components of such processes. Associations between the SNPs identified in this study, some of which have a strong effect on the circulating plasma levels, and complex metabolic, cardiovascular, inflammatory and neurological diseases in which a role for a sphingolipid-dependent mechanism is implicated can now be investigated. Modulation of sphingolipids in vivo has demonstrated that this may be a successful preventative strategy for diseases in which sphingolipids play a role, lending hope that, once such disease contributions are identified, successful therapeutic regimens may subsequently be identified.

Ethics statement
All studies were approved by the appropriate Research Ethics Committees. The Northern Swedish Population Health Study (NSPHS) was approved by the local ethics committee at the University of Uppsala (Regionala Etikprövningsnämnden, Uppsala). The ORCADES study was approved by the NHS Orkney Research Ethics Committee and the North of Scotland REC. The Vis study was approved by the ethics committee of the medical faculty in Zagreb and the Multi-Centre Research Ethics Committee for Scotland. The ERF study was approved by the Erasmus institutional medical-ethics committee in Rotterdam, The Netherlands. The MICROS study was approved by the ethical committee of the Autonomous Province of Bolzano. For the German MI studies (GerMIFS-I, -II and -III (KORA)), local ethics committees approved the studies and written informed consent was obtained as published previously.

Study populations
The ERF study is a family-based study which includes over 3000 participants descending from 22 couples living in the Rucphen region in the 19th century. All descendants were invited to visit the clinical research center in the region where they were examined in person and where blood was drawn (fasting). Height and weight were measured for each participant. All participants filled out a questionnaire on risk factors, including smoking. The 800 participants included in the lipidomics studies consisted of the first series of participants.
The MICROS study is part of the genomic health care program 'GenNova' and was carried out in three villages of the Val Venosta on the populations of Stelvio, Vallelunga and Martello. This study was an extensive survey carried out in South Tyrol (Italy) in the period 2001-2003. An extensive description of the study is available elsewhere [29]. Briefly, study participants were volunteers from three isolated villages located in the Italian Alps, in a German-speaking region bordering with Austria and Switzerland. Due to geographical, historical and political reasons, the entire region experienced a prolonged period of isolation from surrounding populations. Information on the health status of participants was collected through a standardized questionnaire. Laboratory data were obtained from standard blood analyses. Genotyping was performed on just under 1400 participants with 1334 available for analysis after data cleaning. All participants were included in the lipidomics studies.
Table 2 legend: Association signals with 21 (from 32) variants in 4 chromosomal locations showing genome-wide significant association to circulating sphingolipids, with MI in 3 distinct German patient studies, GerMIFS-I, -II and -III (KORA), differing in their composition by family history of MI [16,17]. 11 variants across the 5 genes (including all LASS4 variants) were removed due to low imputation quality (Rsq < 0.7) in at least one of the MI cohorts or the control groups (KORAS3, F4 and/or PopGen). Reported p values are age and sex adjusted. A fixed-effects meta-analysis using inverse-variance weighting was used to derive combined odds ratios (Meta OR). doi:10.1371/journal.pgen.1000672.t002
The Swedish samples are part of the Northern Swedish Population Health Study (NSPHS) representing a family-based population study including a comprehensive health investigation and collection of data on family structure, lifestyle, diet, medical history and samples for laboratory analyses. Samples were collected from the northern part of the Swedish mountain region (County of Norrbotten, Parish of Karesuando). Historic population accounts show that there has been little immigration or other dramatic population changes in this area during the last 200 years.
The Orkney Complex Disease Study (ORCADES) is an ongoing family-based, cross-sectional study in the isolated Scottish archipelago of Orkney. Genetic diversity in this population is decreased compared to Mainland Scotland, consistent with the high levels of endogamy historically. Data for participants aged 18 to 100 years, from a subgroup of ten islands, were used for this analysis. Fasting blood samples were collected and over 200 health-related phenotypes and environmental exposures were measured in each individual. All participants gave informed consent and the study was approved by Research Ethics Committees in Orkney and Aberdeen.
The Vis study includes 986 unselected Croatians, aged 18-93 years, who were recruited into the study during 2003 and 2004 from the villages of Vis and Komiza on the Dalmatian island of Vis [30,31]. The settlements on Vis island have unique population histories and have preserved their isolation from other villages and from the outside world for centuries. Participants were phenotyped for 450 disease-related quantitative traits. Biochemical and physiological measurements were performed, detailed genealogies reconstructed, questionnaires on lifestyle and environmental exposures collected, and blood samples and lymphocytes extracted and stored for further analyses. Samples in all studies were taken in the fasting state.

Lipidomics
Lipids were quantified by electrospray ionization tandem mass spectrometry (ESI-MS/MS) in positive ion mode as described previously [32,33]. EDTA plasma (serum for South Tyrol) samples were quantified upon lipid extraction by direct flow injection analysis using the analytical setup described by Liebisch et al. [33]. A precursor ion scan of m/z 184 specific for phosphocholine-containing lipids was used for phosphatidylcholine (PC) and sphingomyelin (SM) [33]. Ceramide and hexosylceramide were analyzed using a fragment ion of m/z 264 [32]. For each lipid class two non-naturally occurring internal standards were added and quantification was achieved by calibration lines generated by addition of naturally occurring lipid species to plasma. Deisotoping and data analysis for all lipid classes were performed by self-programmed Excel Macros according to the principles described previously [33]. Nomenclature of sphingomyelin species is based on the assumption that d18:1 (dihydroxy 18:1 sphingosine) is the main base of plasma sphingomyelin species, where the first number refers to the number of carbon atoms in the chain and the second number to the number of double bonds in the chain.
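The species nomenclature described above (class abbreviation, number of carbon atoms, number of double bonds) can be read mechanically. The sketch below is a hypothetical helper, not part of the published analysis, and assumes labels of the simple form used in this paper, such as SM18:0 or GluCer24:1.

```python
import re

# Class abbreviation, then carbon count, a colon, then double-bond count.
SPECIES_PATTERN = re.compile(r"^([A-Za-z]+)(\d+):(\d+)$")


def parse_species(label):
    """Split a label such as 'SM18:0' into (class, carbons, double bonds)."""
    match = SPECIES_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognised species label: {label!r}")
    lipid_class, carbons, double_bonds = match.groups()
    return lipid_class, int(carbons), int(double_bonds)


def is_unsaturated(label):
    """True when the chain carries at least one double bond."""
    return parse_species(label)[2] > 0
```

For example, `parse_species("SM16:1")` returns `("SM", 16, 1)`, which is enough to group species into saturated and unsaturated sets.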

Genotyping
DNA samples were genotyped according to the manufacturer's instructions on Illumina Infinium HumanHap300v2 (except for samples from Vis, for which version 1 was used) or HumanCNV370v1 SNP bead microarrays. Four populations have 318,237 SNP markers in common that are distributed across the human genome, with Vis samples having 311,398 SNPs in common with the other populations. Samples with a call rate below 97% were excluded from the analysis. Sphingolipid measurements were available for analysis following quality control assessment for 4110 study participants.
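The sample-level call-rate filter can be sketched as follows. This is an illustrative Python fragment only: the representation of genotypes as a list with `None` for a missing call, and the function names, are assumptions made for the example and do not describe the software actually used.

```python
def call_rate(genotypes):
    """Fraction of SNPs with a non-missing call (None marks a missing call)."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def passes_call_rate(genotypes, threshold=0.97):
    """Keep a sample unless its call rate is below the threshold (97% here)."""
    return call_rate(genotypes) >= threshold


# Hypothetical samples typed at 100 SNPs, coded as 0/1/2 copies of an allele.
sample_ok = [0, 1, 2, 1] * 24 + [0, 1, None, None]           # 98 of 100 called
sample_bad = [0, 1, 2, 1] * 24 + [None, None, None, None]    # 96 of 100 called
```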

Statistical analysis
Genome-wide association analysis was performed using the GenABEL package in R [34]. A score test was used to test for association between the age- and sex-adjusted residuals of sphingolipid traits (both as absolute concentrations and as relative content of the total sphingolipid pool: mol%) and SNP genotypes using an additive model. The Genomic Control procedure [35] was used to account for under-estimation of the standard errors of effects, which occurs because of pedigree structure present in the data [36]. For the most interesting results and the species ratios, we re-analysed the data using the "mmscore" function, a score test for family-based association [37], as implemented in GenABEL. The relationship matrix used in the analysis was estimated using genomic data with the "ibs" (option weight = "freq") function of GenABEL. This analysis, accounting for pedigree structure in an exact manner, allowed for unbiased estimation of the effects of the genetic variants (adjusted for age and sex). The results from all cohorts were combined into a fixed-effects meta-analysis with reciprocal weighting on standard errors of the effect-size estimates, using MetABEL (http://mga.bionet.nsc.ru/~yurii/ABEL/). Thresholds for genome-wide significance were set at a p value of less than 1.57×10−7 (0.05/318,237 SNPs) for the individual populations. For the overall meta-analysis we chose to use the conservative threshold of 7.2×10−8 [38]. Since many of the traits tested, and especially the ratios, demonstrate high degrees of correlation, introducing a suitable statistical correction for the multiple testing of the 76 correlated traits would be complex. Since Bonferroni correction (unsuitable in this instance) would lower thresholds to values between p = 10−9 and 10−10, and since all five genomic regions have variants with p values <10−10, we report the age-sex corrected p values alone.
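The pooling step can be illustrated with a short Python sketch of a fixed-effects meta-analysis. It assumes the standard inverse-variance form, in which each cohort estimate is weighted by 1/SE²; it is not the MetABEL code, and the function name and the per-cohort numbers are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Pool per-cohort effect estimates with inverse-variance weights (1/SE^2).

    Returns the pooled effect, its standard error, the z statistic and a
    two-sided p value from the normal distribution.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return pooled_beta, pooled_se, z, p_value


# Hypothetical estimates for one SNP-trait pair in two cohorts (not study data).
cohort_betas = [0.10, 0.30]
cohort_ses = [0.10, 0.20]
pooled_beta, pooled_se, z, p_value = fixed_effects_meta(cohort_betas, cohort_ses)
```

Because the weights grow with the precision of each estimate, the largest cohort dominates the pooled effect, while consistent effect directions in the smaller cohorts tighten the pooled standard error.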
The threshold for replication of significant results from one population in other cohorts was set at a p-value less than 0.05 divided by the number of SNPs tested. All significant variants reported are in Hardy-Weinberg Equilibrium, and effect directions are consistent across all five populations.
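The significance thresholds quoted in this section follow from simple Bonferroni arithmetic, which the short Python sketch below reproduces. The SNP count, trait count, and meta-analysis threshold are taken from the text. The replication example uses a hypothetical number of SNPs, since the number carried forward is not stated here.

```python
N_SNPS = 318_237          # SNPs shared by four populations (from the text)
N_TRAITS = 76             # 33 single species + 43 ratios (from the text)
META_THRESHOLD = 7.2e-8   # meta-analysis threshold quoted in the text


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Genome-wide threshold for a single population: about 1.57e-7.
per_population = bonferroni_threshold(0.05, N_SNPS)

# Further division by the number of traits gives values of the order
# of 1e-9 to 1e-10, the range mentioned in the text.
per_population_all_traits = per_population / N_TRAITS
meta_all_traits = META_THRESHOLD / N_TRAITS

# Hypothetical replication set: the count of 5 SNPs is illustrative only.
replication = bonferroni_threshold(0.05, n_tests=5)

print(f"{per_population:.3g}")
print(f"{per_population_all_traits:.3g}")
print(f"{meta_all_traits:.3g}")
print(f"{replication:.3g}")
```

Because the 76 traits are strongly correlated, dividing by the full trait count over-corrects, which is why the text describes Bonferroni correction as unsuitable in this instance.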

Supporting Information
Table S1 Variants significantly associated with circulating sphingolipid concentrations. 22 variants in 5 distinct chromosomal locations demonstrate genome-wide significant association signals with several measured sphingolipid species (listed). The p-values for significant signals across the sphingolipid species are shown for each population separately and jointly, and the direction of the association effects, as derived from the standardized regression coefficient (b), is provided. Abbreviations: sphingomyelin (SM), dihydrosphingomyelin (dihSM), ceramide (Cer), glucosylceramide (GluCer), unsaturated ceramides (CerUnsat), and saturated ceramides (CerSat). In the nomenclature (e.g. GluCer18:0), the number before the colon refers to the length of the carbon chain and the number after the colon to the number of double bonds in the chain. Where mol% is used, the measure refers to the relative content of the measured species in the total sphingolipid pool, and is independent of other associated lipid species. Sex-specific age-adjusted analyses provided little additional information, unlike the case of the ratio analyses (see Table S3).
Table S3 Variants significantly associated with matched metabolite sphingolipid ratios. 32 variants in 5 distinct chromosomal locations demonstrate genome-wide significant association signals with matched metabolite ratios designed to probe metabolism (11 ratios), desaturation (16 ratios) and elongation (16 ratios); details of the ratios are provided in the table. The p-values for significant signals across the sphingolipid species are shown for each population separately and jointly, and the direction of the association effects, as derived from the standardized regression coefficient (b), is provided. Sex-specific age-adjusted results are also displayed, as these provided additional information with the ratio analysis that was more significant than the sex-specific effects seen in the analysis of the single species (not shown).
Found at: doi:10.1371/journal.pgen.1000672.s003 (0.09 MB XLS)
Table S4 Proportion of variance in matched sphingolipid metabolite ratios. Proportion of the variance in age- and sex-adjusted sphingolipid ratios explained by SNP variants that were significant in the GWA meta-analysis of the 5 EUROSPAN populations. General linear mixed models were fitted using the "polygenic" function of the R package GenABEL, and the variance explained was obtained by comparing residual variances between models that included the tested SNPs as fixed effects and models that did not. Single-SNP analyses were carried out for all candidate SNPs, and multiple-SNP analyses for traits influenced by multiple candidate regions (in this case the top SNP for each region was selected). Shaded cells indicate SNPs with GWA-significant association in the meta-analysis for the trait analysed.
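As a hedged illustration of the variance comparison described for Table S4, the proportion of variance explained by a SNP (or set of SNPs) can be written as the relative reduction in residual variance between the two fitted models. The caption does not state the exact estimator, so the form and the symbols below are assumptions introduced for illustration only.

```latex
\[
  \widehat{\mathrm{PVE}}
  \;=\;
  \frac{\hat{\sigma}^{2}_{e,0} - \hat{\sigma}^{2}_{e,1}}{\hat{\sigma}^{2}_{e,0}}
\]
```

Here \(\hat{\sigma}^{2}_{e,0}\) denotes the residual variance of the model fitted without the SNP(s) and \(\hat{\sigma}^{2}_{e,1}\) the residual variance of the model that includes them as fixed effects.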

Acknowledgments
For the MICROS study, we thank the primary care practitioners Raffaela Stocker, Stefan Waldner, Toni Pizzecco, Josef Plangger, Ugo Marcadent, and the personnel of the Hospital of Silandro (Department of Laboratory Medicine) for their participation and collaboration in the research project. The ERF study is grateful to all patients and their relatives, general practitioners, and neurologists for their contributions and to P. Veraart for her help in genealogy, Jeannette Vergeer for the supervision of the laboratory work, and P. Snijders for his help in data collection. The Northern Swedish Population Health Study is grateful for the contribution of samples from the Medical Biobank in Umeå and for the contribution of the district nurse Svea Hennix in the Karesuando study. DNA extractions for ORCADES were performed at the Wellcome Trust Clinical Research Facility in Edinburgh. We would like to acknowledge the invaluable contributions of Lorraine Anderson and the research nurses in Orkney, the administrative team in Edinburgh, and the people of Orkney. The authors collectively thank a large number of individuals for their individual help in organizing, planning, and carrying out the field work related to the project and data management: Professor Pavao Rudan and the staff of the Institute for Anthropological Research in Zagreb, Croatia (organization of the field work, anthropometric and physiological measurements, and DNA extraction); Professor Ariana Vorko-Jovic and the staff and medical students of the Andrija Stampar School of Public Health of the Faculty of Medicine, University of Zagreb, Croatia (questionnaires, genealogical reconstruction and data entry); Dr. Branka Salzer from the biochemistry lab "Salzer," Croatia (measurements of biochemical traits); local general practitioners and nurses (recruitment and communication with the study population); and the employees of several other Croatian institutions who participated in the field work, including but not limited to the University of Rijeka and Split, Croatia; Croatian Institute of Public Health; Institutes of Public Health in Split and Dubrovnik, Croatia. SNP Genotyping of the Vis samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, WGH, Edinburgh.