Conceived and designed the experiments: JFW NH PR TM PPP AAH BAO CMvD IR AW HC UG. Performed the experiments: ÅJ SHW OP CH VV CG. Analyzed the data: WI. Contributed reagents/materials/analysis tools: GS. Wrote the paper: WI UG.
The authors have declared that no competing interests exist.
Genome-wide association studies (GWAS) have identified 38 larger genetic regions affecting classical blood lipid levels without adjusting for important environmental influences. We modeled diet and physical activity in a GWAS in order to identify novel loci affecting total cholesterol, LDL cholesterol, HDL cholesterol, and triglyceride levels. The Swedish (SE) EUROSPAN cohort (
In this article we report a genome-wide association study on cholesterol levels in the human blood. We used a Swedish cohort to select genetic polymorphisms that showed the strongest association with cholesterol levels adjusted for diet and physical activity. We replicated several genetic loci in other European cohorts. This approach extends present genome-wide association studies on lipid levels, which did not take these lifestyle factors into account, to improve statistical results and discover novel genes. In our analysis, we could identify two genetic loci in the S
Genome-wide association studies (GWAS) have identified more than 38 larger genetic regions which influence blood levels of total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG)
In order to explore the usefulness of including both environmental and genetic factors in the analysis model, we used lipid measurements from the EUROSPAN study, comprising 3,938 individuals for whom genome-wide SNP data (
We chose a population living in northern Sweden for the selection of candidate loci because it shows strong natural heterogeneity in certain lifestyle factors (e.g. diet, activity), but homogeneity in other environmental aspects such as climate
We performed a GWAS with a lifestyle-adjusted model which included not only sex and age, but also daily intake of game meat, non-game meat, fish, milk products, physical activity at work and at leisure as covariates. We focused on the 0.05% of all SNPs with the lowest
SNP | Gene Symbol | Product name (Product Symbol) | |||
rs1684885 | 3.47E-03 | 2.01E-04 | 17 | PRKCI | Protein kinase C iota type (nPKC-iota) |
rs47137 | 3.69E-03 | 2.63E-04 | 14 | SLC2A12 | Glucose transporter type 12 (GLUT-12) |
rs669552 | 3.46E-03 | 2.87E-04 | 12 | FNDC3B | Factor for Adipocyte Differentiation 104 |
rs2303324 | 1.49E-03 | 1.65E-04 | 9 | GALNT14 | Polypeptide GalNAc transferase 14 |
rs12617790 | 2.26E-03 | 2.53E-04 | 9 | GALNT14 | Polypeptide GalNAc transferase 14 |
rs10041333 | 3.04E-03 | 3.74E-04 | 8 | FABP6 | Gastrotropin (GT), alt. Fatty Acid-Binding Protein 6 |
rs222014 | 2.93E-03 | 3.71E-04 | 8 | GC | Vitamin D-binding protein Precursor (DBP) |
rs2070657 | 1.70E-04 | 1.43E-04 | 1 | APP | Alzheimer disease amyloid protein (ABPP) |
rs2186830 | 2.66E-04 | 2.78E-04 | 1 | COLEC12 | Collectin-12 |
rs2478571 | 3.07E-04 | 4.42E-04 | 1 | SLC39A12 | Zinc transporter (ZIP12) |
rs1684885 | 1.53E-04 | 1.61E-05 | 10 | PRKCI | Protein kinase C iota type (nPKC-iota) |
rs1684881 | 2.80E-04 | 3.14E-05 | 9 | PRKCI | Protein kinase C iota type (nPKC-iota) |
rs12617790 | 1.65E-03 | 2.96E-04 | 6 | GALNT14 | Polypeptide GalNAc transferase 14 |
rs7583934 | 2.78E-04 | 1.84E-04 | 2 | LRP1B | Low-density lipoprotein receptor-related protein (LRP-DIT) |
rs1864616 | 8.64E-05 | 1.36E-04 | 2 | TGFBR2 | Transforming growth factor-beta receptor type II (TGFR-2) |
rs843319 | 6.74E-05 | 2.89E-04 | 4 | MBOAT1 | O-acyltransferase domain-containing protein 1 |
rs2292883 | 1.48E-06 | 1.06E-07 | 14 | MLPH | Melanophilin |
rs12712846 | 1.14E-03 | 3.02E-04 | 4 | MTA3 | Metastasis-associated protein |
rs365578 | 8.39E-04 | 2.92E-04 | 3 | NDUFS4 | NADH dehydrogenase 8 iron-sulfur protein 4 (CI-AQDQ) |
rs10519336 | 6.34E-04 | 3.82E-04 | 2 | MCC | Colorectal mutant cancer protein |
rs2054247 | 3.73E-05 | 3.70E-05 | 1 | APLP2 | Amyloid-like protein 2 Precursor |
rs11708205 | 2.67E-04 | 3.57E-04 | 1 | PLD1 | Phospholipase D1 |
rs1567385 | 1.32E-04 | 1.99E-04 | 2 | MAP4K4 | Mitogen-activated protein kinase 4 (MEKKK4) |
rs3776817 | 6.82E-05 | 1.31E-04 | 2 | ADAMTS2 | Procollagen I N-proteinase |
rs1999088 | 7.19E-05 | 1.51E-04 | 2 | MBNL2 | Muscleblind-like protein 2 |
rs1782644 | 1.39E-04 | 3.10E-04 | 2 | ZMIZ1 | Zinc finger MIZ domain-containing protein 1 |
rs4304239 | 1.63E-03 | 2.40E-04 | 7 | IGF2BP3 | Insulin-like growth factor 2 mRNA-binding protein 3 |
rs11770192 | 1.82E-03 | 2.40E-04 | 8 | IGF2BP3 | Insulin-like growth factor 2 mRNA-binding protein 3 |
rs12540730 | 7.79E-04 | 2.43E-04 | 3 | IGF2BP3 | Insulin-like growth factor 2 mRNA-binding protein 3 |
rs3823763 | 9.58E-05 | 4.45E-05 | 2 | BBS9 | Parathyroid hormone-responsive B1 gene protein (PTHB1) |
All candidate SNPs show strongest associations (
In order to evaluate the effect of including diet and activity covariates in the association analysis, we overlaid the
Results for two GWAS analysis models are presented. The unadjusted model (dark blue and light blue circles) included only sex and age as covariates. The adjusted model (red and orange squares) additionally contained food intake and physical activity as predictors. The dashed line indicates the local Bonferroni-adjusted α error = 1.6×10⁻⁷.
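The dashed-line threshold quoted above can be checked with simple arithmetic. The sketch below assumes a global type I error of 0.05 (an assumption; the caption only states the adjusted value) divided by the 311,388 SNP markers that the genotyping description reports as common to both arrays.

```python
# Bonferroni-adjusted per-test significance level for the genome-wide scan.
n_snps = 311_388      # SNP markers shared by both arrays (from the genotyping description)
alpha_global = 0.05   # ASSUMPTION: global type I error; not stated next to the threshold

alpha_local = alpha_global / n_snps
print(f"local alpha = {alpha_local:.1e}")
```

Under these assumptions the result rounds to the 1.6×10⁻⁷ shown in the plots.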
A food- and activity-adjusted candidate gene association study of the final 39 candidate SNPs in the Scottish (SC) sample (
SNP | Gene | Trait | Cohort | P, unadjusted | P, adjusted | Mean Difference, unadjusted | Effect Size, unadjusted | Effect Size, adjusted |
rs2000999 | HP | TC | Discovery, SE | 1.12E-03 | 3.84E-04 | 20.21 mg/dl | 0.41 | 0.44 |
rs2000999 | HP | TC | Replication, SC | 6.16E-03 | 4.33E-03 | 12.82 mg/dl | 0.29 | 0.52 |
rs1532624 | CETP | HDL-C | Discovery, SE | 1.06E-06 | 2.55E-06 | 9.99 mg/dl | 0.73 | 0.48 |
rs1532624 | CETP | HDL-C | Replication, SC | 2.40E-09 | 1.96E-09 | 9.04 mg/dl | 0.59 | 0.57 |
rs5400 | SLC2A2 | TC | Discovery, SE | 3.57E-05 | 2.18E-06 | 27.11 mg/dl | 0.57 | 0.66 |
rs5400 | SLC2A2 | TC | Replication, NS | 4.68E-02 | N.A. | 13.35 mg/dl | 0.30 | N.A. |
For all replicated SNPs
We also performed an unadjusted candidate gene analysis of the 39 candidate SNPs in all non-Swedish (NS) EUROSPAN cohorts (Scotland, Croatia, The Netherlands, and Italy,
No other associations, including LDL cholesterol or triglycerides levels, were replicated (all
Environmental covariates may either act as moderators, mediators or even suppressors, thereby affecting the discovery of genetic susceptibility loci
The
We also observed a highly significant association between rs1532624 in
This study also has some limitations. First, we are aware that our candidate gene association approach covers only a very small fraction of all genomic loci, which is one of the potential reasons why some classical lipid-influencing genes, such as
In summary, we have demonstrated that modeling environmental factors, in particular major food categories and physical activity, can improve statistical power and lead to the discovery of novel susceptibility loci. Such models also provide an understanding of the complex interplay of genetic and environmental factors affecting human quantitative traits. Inclusion of environmental covariates represents a much-needed next step in the quest to model the complete environmental and genetic architecture of complex traits.
All EUROSPAN studies were approved by the appropriate research ethics committees according to the Declaration of Helsinki
The examined subjects stem from five different population-representative, pedigree-based cohorts from the EUROSPAN consortium (
The
The
The VIS study is a cross-sectional study in the villages of Vis and Komiza on the Dalmatian island of Vis, Croatia, and was conducted between 2003 and 2004
The
The
DNA samples were genotyped according to the manufacturer's instructions on Illumina Infinium HumanHap300v2 or HumanCNV370v1 SNP bead microarrays. Both arrays have 311,388 SNP markers in common that are distributed across the human genome. The raw data were analyzed in the BeadStudio software with the recommended parameters for the Infinium assay, using the genotype cluster files provided by Illumina. Individuals with a call rate below 95% and SNPs with a call rate below 98%, deviating from Hardy-Weinberg equilibrium (
Total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) were quantified by enzymatic photometric assays using an ADVIA1650 clinical chemistry analyzer (Siemens Healthcare Diagnostics GmbH, Eschborn, Germany) at the Institute for Clinical Chemistry and Laboratory Medicine, Regensburg University Medical Center, Germany.
In the NSPHS cohort, we collected data with a food frequency questionnaire based on the Northern Sweden 84-item Food Frequency Questionnaire (NoS-84-FFQ)
In the NSPHS cohort, we used two self-report scales to measure overall physical activity at work and at leisure. The Work Activity Scale (WAS, 6 items) addresses typical occupational physical activities: sitting, standing, walking, lifting, and general indicators of physical activity, i.e. sweating and tiredness after work. The Leisure Activity Scale (LAS, 4 items) asks about typical free-time activities such as walking, cycling, and other sporting activities, and about sweating as a general indicator of physical activity. Participants reported the frequency of each activity on a 5-point rating scale (1 = “never”, 2 = “seldom”, 3 = “sometimes”, 4 = “often”, and 5 = “always”). Both scales showed satisfactory internal consistency with Cronbach's α(WAS) = 0.73 and Cronbach's α(LAS) = 0.70. A similar approach was used for the measurement and analysis of data on physical activity collected with a self-report questionnaire in the Scottish cohort (
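For readers unfamiliar with the internal-consistency measure reported for the two activity scales, the following is a minimal sketch of the standard Cronbach's α formula, α = k/(k−1) · (1 − Σ item variances / variance of the sum score). The function name and the toy ratings are hypothetical and are not study data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item ratings.

    responses: list of rows, each row holding one rating per scale item.
    """
    k = len(responses[0])                      # number of items on the scale
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up ratings on a 5-point scale for a 4-item scale (illustration only)
toy_ratings = [
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(toy_ratings)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values close to 1 indicate that the items vary together; the toy data are deliberately consistent and therefore score higher than the 0.70 to 0.73 reported for LAS and WAS.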
Sex and age were chosen as standard moderators of medical outcomes. Food and physical activity covariates were selected based on findings on natural variation in lifestyle factors in this (data not presented) and other
We tested whether the inclusion of those covariates in the explanatory model led to a statistically significant improvement in the goodness of model fit compared to a restricted model by applying a maximum likelihood ratio (
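A likelihood ratio comparison of nested models can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the log-likelihood values are hypothetical, the six lifestyle covariates are assumed to enter as one parameter each (giving 6 degrees of freedom), and the chi-square tail probability uses a closed form that is only valid for even degrees of freedom.

```python
from math import exp, factorial

def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only holds for positive even df")
    half = x / 2
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))

def likelihood_ratio_test(loglik_restricted, loglik_full, df):
    """Return the LR statistic and its p-value for two nested models."""
    stat = 2 * (loglik_full - loglik_restricted)
    return stat, chi2_sf_even(stat, df)

# HYPOTHETICAL log-likelihoods: restricted model (sex, age) versus
# full model (sex, age plus six diet and activity covariates).
stat, p_value = likelihood_ratio_test(-1520.4, -1509.1, df=6)
print(f"LR statistic = {stat:.1f}, p = {p_value:.2e}")
```

A small p-value indicates that the added covariates improve the fit more than would be expected by chance alone.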
The confounding effect of treatment with statins on total cholesterol level and LDL cholesterol level was adjusted for by imputing untreated lipid concentrations of medicated individuals using the
First, deviations from normality for all quantitative traits (lipids, age, diet, and physical activity) were corrected by inverse-normal transformation without adjusting for covariates. Second, linear mixed-effects models were fitted for the transformed outcomes (TC, LDL-C, HDL-C, TG) using the above-mentioned covariates in the Swedish EUROSPAN sample and corresponding measures in the Scottish EUROSPAN sample (
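The first step, inverse-normal transformation, can be illustrated with a rank-based sketch. The exact variant used in the study is not stated here, so the offset c = 0.5 and the average-rank handling of ties are assumptions, and the input values are invented.

```python
from statistics import NormalDist

def inverse_normal_transform(values, c=0.5):
    """Map values to normal quantiles via their ranks (ties get average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over the run of tied values starting at position i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1            # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    # ASSUMPTION: offset c = 0.5, i.e. quantile (rank - 0.5) / n
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

# Invented, right-skewed lipid-like values (illustration only)
toy_tc = [180.0, 250.0, 210.0, 400.0, 195.0]
z_scores = inverse_normal_transform(toy_tc)
print([round(z, 2) for z in z_scores])
```

The transformation preserves the ordering of the observations while forcing the marginal distribution toward normality, so the outlying value of 400 no longer dominates the scale.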
The same statistical approach was used for association analysis of candidate loci with a local type I error of α = 0.05. No Bonferroni adjustment was applied to protect against α inflation, since this method would be biased for the following reasons. First, the selection procedure applied to the candidate loci makes the assumption of a global null hypothesis highly unlikely. Second, the phenotypes and some of the genotypes are highly correlated, which decreases the number of independent tests. Instead, all confirmatory tests are reported to allow the reader to evaluate the overall significance of the findings
λ coefficients of lifestyle-adjusted genome-wide analysis varied in a low range between 1.00 and 1.04 in the Swedish cohort (see QQ-plots,
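The λ coefficient is commonly defined as the median observed test statistic divided by the median expected under the null hypothesis. The sketch below uses that common definition, which is an assumption about how λ was computed here; the function name and the evenly spaced p-values are purely illustrative.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df (about 0.455)
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided p-values in (0, 1]."""
    nd = NormalDist()
    # convert each p-value to the equivalent 1-df chi-square statistic
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Evenly spaced p-values mimic a scan with no inflation (illustration only)
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_like)
print(f"lambda = {lam:.3f}")
```

Values near 1, such as the 1.00 to 1.04 range reported above, suggest little systematic inflation of the test statistics.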
We performed all analyses with the statistical analysis system
Manhattan plots of genome-wide effects on total cholesterol, LDL cholesterol, HDL cholesterol, and triglyceride levels in the Swedish discovery cohort. Results for two GWAS analysis models are presented. The unadjusted model (dark blue and light blue circles) included only sex and age as covariates. The adjusted model (red and orange squares) additionally contained dietary measures (game meat, non-game meat, fish, milk products) as predictors. The dashed line indicates the local Bonferroni-adjusted α error = 1.6×10⁻⁷.
(0.31 MB DOC)
Manhattan plots of genome-wide effects on total cholesterol, LDL cholesterol, HDL cholesterol, and triglyceride levels in the Swedish discovery cohort. Results for two GWAS analysis models are presented. The unadjusted model (dark blue and light blue circles) included only sex and age as covariates. The adjusted model (red and orange squares) additionally contained physical activity measures (job, leisure) as predictors. The dashed line indicates the local Bonferroni-adjusted α error = 1.6×10⁻⁷.
(0.31 MB DOC)
QQ-Plots for the unadjusted GWAS on total cholesterol, LDL cholesterol, HDL cholesterol, and triglyceride levels in the Swedish discovery cohort. The analysis model was only adjusted for sex and age, but not for diet and activity measures (black line = expected slope under no inflation, red line = slope fitted to observations).
(0.12 MB DOC)
QQ-Plots for the adjusted GWAS on total cholesterol, LDL cholesterol, HDL cholesterol, and triglyceride levels in the Swedish discovery cohort. The analysis model was adjusted for sex, age, diet and activity measures (black line = expected slope under no inflation, red line = slope fitted to observations).
(0.12 MB DOC)
GWAS results for all top candidate SNPs (0.05%) in the Swedish (SE) discovery cohort, the Scottish (SC), and all non-Swedish (NS) replication cohorts.
(0.41 MB XLS)
Comparison of the diet- and activity-adjusted analysis model in the Swedish and the Scottish cohort.
(0.04 MB DOC)
Pearson correlations, determination coefficients (explained variance), and
(0.03 MB XLS)
GWAS results for all top SNPs (0.05%) in the Swedish (SE) discovery cohort, and for all candidate SNPs in the Scottish (SC), and in the non-Swedish (NS) replication cohorts including only individuals without lipid-lowering treatment.
(0.34 MB XLS)
We would like to thank the many colleagues who contributed to collection and phenotypic characterization of the samples, genotyping and analysis of the GWAS data, as well as lipid species analysis. We would also like to acknowledge those who agreed to participate in these studies.
NSPHS: We are grateful for the contribution of samples from the Medical Biobank in Umeå and for the contribution of the district nurse Svea Hennix.
ORCADES: We would like to acknowledge the invaluable contributions of Lorraine Anderson and the research nurses in Orkney, the administrative team in Edinburgh and the people of Orkney.
VIS: We collectively thank a large number of individuals for their individual help in organizing, planning and carrying out the field work related to the project and data management: Professor Pavao Rudan and the staff of the Institute for Anthropological Research in Zagreb, Croatia (organization of the field work, anthropometric and physiological measurements, and DNA extraction); Professor Ariana Vorko-Jovic and the staff and medical students of the Andrija Štampar School of Public Health of the Faculty of Medicine, University of Zagreb, Croatia (questionnaires, genealogical reconstruction and data entry); Dr Branka Salzer from the biochemistry lab “Salzer”, Croatia (measurements of biochemical traits); local general practitioners and nurses (recruitment and communication with the study population); and the employees of several other Croatian institutions who participated in the field work, including but not limited to the University of Rijeka and Split, Croatia; Croatian Institute of Public Health; Institutes of Public Health in Split and Dubrovnik, Croatia. SNP Genotyping of the Vis samples was carried out by the Genetics Core Laboratory at the Wellcome Trust Clinical Research Facility, WGH, Edinburgh.
MICROS: We thank the primary care practitioners Raffaela Stocker, Stefan Waldner, Toni Pizzecco, Josef Plangger, Ugo Marcadent and the personnel of the Hospital of Silandro (Department of Laboratory Medicine) for their participation and collaboration in the research project.
ERF: We are grateful to all patients and their relatives, general practitioners, and neurologists for their contributions and to P. Veraart for her help in genealogy, Jeannette Vergeer for the supervision of the laboratory work and P. Snijders for his help in data collection.
UPPMAX: The computations were performed on UPPMAX (