Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

  • Momoko Horikoshi ,

    Contributed equally to this work with: Momoko Horikoshi, Reedik Mägi

    momoko@well.ox.ac.uk

    Affiliations Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom

  • Reedik Mägi ,

    Contributed equally to this work with: Momoko Horikoshi, Reedik Mägi

    Affiliation Estonian Genome Center, University of Tartu, Tartu, Estonia

  • Martijn van de Bunt,

    Affiliations Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom

  • Ida Surakka,

    Affiliations Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, National Institute for Health and Welfare, Helsinki, Finland

  • Antti-Pekka Sarin,

    Affiliations Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, National Institute for Health and Welfare, Helsinki, Finland

  • Anubha Mahajan,

    Affiliation Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

  • Letizia Marullo,

    Affiliation Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy

  • Gudmar Thorleifsson,

    Affiliation deCode Genetic - Amgen Inc, Reykjavik, Iceland

  • Sara Hägg,

    Affiliations Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden, Department of Medical Sciences, Molecular Epidemiology, and Science for Life Laboratory, Uppsala University, Uppsala, Sweden

  • Jouke-Jan Hottenga,

    Affiliation Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands

  • Claes Ladenvall,

    Affiliation Department of Clinical Sciences, Diabetes and Endocrinology, Lund University Diabetes Centre, Skåne University Hospital, Malmö, Sweden

  • Janina S. Ried,

    Affiliation Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany

  • Thomas W. Winkler,

    Affiliation Department of Genetic Epidemiology, Institute of Epidemiology and Preventive Medicine, University of Regensburg, Regensburg, Germany

  • Sara M. Willems,

    Affiliation Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands

  • Natalia Pervjakova,

    Affiliation Estonian Genome Center, University of Tartu, Tartu, Estonia

  • Tõnu Esko,

    Affiliations Estonian Genome Center, University of Tartu, Tartu, Estonia, Division of Endocrinology and Center for Basic and Translational Obesity Research, Children’s Hospital, Boston, Massachusetts, United States of America, Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America, Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America

  • Marian Beekman,

    Affiliations Department of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands, Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands

  • Christopher P. Nelson,

    Affiliations Department of Cardiovascular Sciences, University of Leicester, Leicester, United Kingdom, National Institute for Health Research Leicester Cardiovascular Disease Biomedical Research Unit, Glenfield Hospital, Leicester, United Kingdom

  • Christina Willenborg,

    Affiliations Institute for Integrative and Experimental Genomics, University of Lübeck, Lübeck, Germany, DZHK German Center for Cardiovascular Research, Partner Site Hamburg/Kiel/Lübeck, Lübeck, Germany

  • Steven Wiltshire †,

    † Deceased.

    Affiliations Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom

  • Teresa Ferreira,

    Affiliation Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

  • Juan Fernandez,

    Affiliation Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

  • Kyle J. Gaulton,

    Affiliation Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

  • Valgerdur Steinthorsdottir,

    Affiliation deCode Genetic - Amgen Inc, Reykjavik, Iceland

  • Anders Hamsten,

    Affiliation Cardiovascular Genetics and Genomics Group, Atherosclerosis Research Unit, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden

  • Patrik K. E. Magnusson,

    Affiliation Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden

  • Gonneke Willemsen,

    Affiliation Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands

  • Yuri Milaneschi,

    Affiliation Department of Psychiatry, VU University Medical Center, Amsterdam, The Netherlands

  • Neil R. Robertson,

    Affiliations Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom

  • Christopher J. Groves,

    Affiliation Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom

  • Amanda J. Bennett,

    Affiliation Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom

  • Terho Lehtimäki,

    Affiliation Department of Clinical Chemistry, Fimlab Laboratories and School of Medicine, University of Tampere, Tampere, Finland

  • Jorma S. Viikari,

    Affiliation Department of Medicine, University of Turku and Division of Medicine, Turku University Hospital, Turku, Finland

  • Johan Rung,

    Affiliation European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, United Kingdom

  • Valeriya Lyssenko,

    Affiliations Department of Clinical Sciences, Diabetes and Endocrinology, Lund University Diabetes Centre, Skåne University Hospital, Malmö, Sweden, Steno Diabetes Center A/S, Gentofte, Denmark

  • Markus Perola,

    Affiliations Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, National Institute for Health and Welfare, Helsinki, Finland

  • Iris M. Heid,

    Affiliation Department of Genetic Epidemiology, Institute of Epidemiology and Preventive Medicine, University of Regensburg, Regensburg, Germany

  • Christian Herder,

    Affiliations Institute for Clinical Diabetology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany, German Center for Diabetes Research (DZD e.V.), Partner Düsseldorf, Germany

  • Harald Grallert,

    Affiliations Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany, Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany

  • Martina Müller-Nurasyid,

    Affiliations Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany, Department of Medicine I, University Hospital Grosshadern, Ludwig-Maximilians-Universität, Munich, Germany, Institute of Medical Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, Ludwig-Maximilians-Universität, Munich, Germany

  • Michael Roden,

    Affiliations Institute for Clinical Diabetology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany, German Center for Diabetes Research (DZD e.V.), Partner Düsseldorf, Germany, Department of Endocrinology and Diabetology, University Hospital Düsseldorf, Düsseldorf, Germany

  • Elina Hypponen,

    Affiliations School of Population Health, University of South Australia, Adelaide, Australia, Centre for Paediatric Epidemiology and Biostatistics, University College London Institute of Child Health, London, United Kingdom

  • Aaron Isaacs,

    Affiliations Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands, Center for Medical Systems Biology, Leiden, The Netherlands

  • Elisabeth M. van Leeuwen,

    Affiliation Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands

  • Lennart C. Karssen,

    Affiliation Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands

  • Evelin Mihailov,

    Affiliation Estonian Genome Center, University of Tartu, Tartu, Estonia

  • Jeanine J. Houwing-Duistermaat,

    Affiliation Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands

  • Anton J. M. de Craen,

    Affiliations Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands, Department of Gerontology and Geriatrics, Leiden University Medical Center, Leiden, The Netherlands

  • Joris Deelen,

    Affiliations Department of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands, Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands

  • Aki S. Havulinna,

    Affiliation Unit of Chronic Disease Epidemiology and Prevention, National Institute for Health and Welfare, Helsinki, Finland

  • Matthew Blades,

    Affiliation Bioinformatics and Biostatistics Support Hub (B/BASH), University of Leicester, Leicester, United Kingdom

  • Christian Hengstenberg,

    Affiliations Deutsches Herzzentrum München, Technische Universität München, Munich, Germany, DZHK German Center for Cardiovascular Research, Partner Site Munich, Munich, Germany

  • Jeanette Erdmann,

    Affiliations Institute for Integrative and Experimental Genomics, University of Lübeck, Lübeck, Germany, DZHK German Center for Cardiovascular Research, Partner Site Hamburg/Kiel/Lübeck, Lübeck, Germany

  • Heribert Schunkert,

    Affiliations Deutsches Herzzentrum München, Technische Universität München, Munich, Germany, DZHK German Center for Cardiovascular Research, Partner Site Munich, Munich, Germany

  • Jaakko Kaprio,

    Affiliations Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, National Institute for Health and Welfare, Helsinki, Finland, The Department of Public Health, University of Helsinki, Helsinki, Finland

  • Martin D. Tobin,

    Affiliations Genetic Epidemiology Group, Department of Health Sciences, University of Leicester, Leicester, United Kingdom, National Institute for Health Research (NIHR) Leicester Respiratory Biomedical Research Unit, Glenfield Hospital, Leicester, United Kingdom

  • Nilesh J. Samani,

    Affiliations Department of Cardiovascular Sciences, University of Leicester, Leicester, United Kingdom, National Institute for Health Research Leicester Cardiovascular Disease Biomedical Research Unit, Glenfield Hospital, Leicester, United Kingdom

  • Lars Lind,

    Affiliation Department of Medical Sciences, Uppsala University, Akademiska Sjukhuset, Uppsala, Sweden

  • Veikko Salomaa,

    Affiliation Unit of Chronic Disease Epidemiology and Prevention, National Institute for Health and Welfare, Helsinki, Finland

  • Cecilia M. Lindgren,

    Affiliations Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America

  • P. Eline Slagboom,

    Affiliations Department of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands, Netherlands Consortium for Healthy Ageing, Leiden, The Netherlands

  • Andres Metspalu,

    Affiliations Estonian Genome Center, University of Tartu, Tartu, Estonia, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia

  • Cornelia M. van Duijn,

    Affiliations Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, Rotterdam, The Netherlands, Center for Medical Systems Biology, Leiden, The Netherlands

  • Johan G. Eriksson,

    Affiliations Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, Finland, Folkhalsan Research Center, Helsinki, Finland, Vasa Central Hospital, Vasa, Finland, Department of Health Promotion and Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland

  • Annette Peters,

    Affiliations Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany, Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany

  • Christian Gieger,

    Affiliations Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany, Research Unit of Molecular Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany, Institute of Epidemiology II, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany

  • Antti Jula,

    Affiliation Department of Chronic Disease Prevention, National Institute for Health and Welfare, Turku, Finland

  • Leif Groop,

    Affiliations Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, Department of Clinical Sciences, Diabetes and Endocrinology, Lund University Diabetes Centre, Skåne University Hospital, Malmö, Sweden

  • Olli T. Raitakari,

    Affiliations Research Center of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland, Department of Clinical Physiology and Nuclear Medicine, University of Turku and Turku University Hospital, Turku, Finland

  • Chris Power,

    Affiliation Centre for Paediatric Epidemiology and Biostatistics, University College London Institute of Child Health, London, United Kingdom

  • Brenda W. J. H. Penninx,

    Affiliation Department of Psychiatry, VU University Medical Center, Amsterdam, The Netherlands

  • Eco de Geus,

    Affiliations Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands, EMGO Institute for Health and Care Research, VU University & VU University Medical Center, Amsterdam, The Netherlands

  • Johannes H. Smit,

    Affiliation Department of Psychiatry, VU University Medical Center, Amsterdam, The Netherlands

  • Dorret I. Boomsma,

    Affiliation Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands

  • Nancy L. Pedersen,

    Affiliation Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden

  • Erik Ingelsson,

    Affiliations Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, Department of Medical Sciences, Molecular Epidemiology, and Science for Life Laboratory, Uppsala University, Uppsala, Sweden

  • Unnur Thorsteinsdottir,

    Affiliations deCode Genetic - Amgen Inc, Reykjavik, Iceland, Faculty of Medicine, University of Iceland, Reykjavik, Iceland

  • Kari Stefansson,

    Affiliations deCode Genetic - Amgen Inc, Reykjavik, Iceland, Faculty of Medicine, University of Iceland, Reykjavik, Iceland

  • Samuli Ripatti ,

    ‡ These authors jointly directed this work.

    Affiliations Institute for Molecular Medicine Finland FIMM, University of Helsinki, Helsinki, Finland, National Institute for Health and Welfare, Helsinki, Finland, The Department of Public Health, University of Helsinki, Helsinki, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom

  • Inga Prokopenko ,

    ‡ These authors jointly directed this work.

    Affiliation Department of Genomics of Common Disease, School of Public Health, Imperial College London, London, United Kingdom

  • Mark I. McCarthy ,

    ‡ These authors jointly directed this work.

    Affiliations Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom, Oxford National Institute for Health Research Biomedical Research Centre, Churchill Hospital, Oxford, United Kingdom

  • Andrew P. Morris ,

    ‡ These authors jointly directed this work.

    Affiliations Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom, Estonian Genome Center, University of Tartu, Tartu, Estonia, Department of Biostatistics, University of Liverpool, Liverpool, United Kingdom, Department of Molecular and Clinical Pharmacology, University of Liverpool, Liverpool, United Kingdom

  • ENGAGE Consortium

PLOS

Abstract

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

Author Summary

Human genetic studies have demonstrated that quantitative human anthropometric and metabolic traits, including body mass index, waist-hip ratio, and plasma concentrations of glucose and insulin, are highly heritable, and are established risk factors for type 2 diabetes and cardiovascular diseases. Although many regions of the genome have been associated with these traits, the specific genes responsible have not yet been identified. By making use of advanced statistical “imputation” techniques applied to more than 87,000 individuals of European ancestry, and publicly available “reference panels” of more than 37 million genetic variants, we have been able to identify novel regions of the genome associated with these glycaemic and obesity-related traits and localise genes within these regions that are most likely to be causal. This improved understanding of the biological mechanisms underlying glycaemic and obesity-related traits is extremely important because it may advance drug development for downstream disease endpoints, ultimately leading to public health benefits.

Introduction

Quantitative human glycaemic and obesity-related traits, including fasting plasma glucose and insulin (FG and FI), body mass index (BMI), and waist-hip ratio (WHR) are highly heritable [1–5], and are well established risk factors for type 2 diabetes (T2D) and cardiovascular disease [6–10]. Large-scale genome-wide association studies (GWAS) have proved to be extremely successful in the identification of loci harbouring genetic variants contributing to these traits in multiple ethnic groups [11–27]. This process has been facilitated by technical advances in the development of imputation methods [28] that allow evaluation of association with genetic variants not directly assayed on genotyping arrays, but present instead in more dense phased reference panels, such as those made available through the International HapMap Consortium [29,30]. However, the detected loci are typically characterised by common variant association signals, defined by lead SNPs with minor allele frequency (MAF) of at least 5%, which extend over large genomic intervals because of linkage disequilibrium (LD). They also often map to non-coding sequence, making direct biological interpretation of their effect more difficult than for non-synonymous variants. The lead SNPs at GWAS loci are overwhelmingly of modest effect, and together account for only a small proportion (generally less than 5%) of the overall trait variance [17–19,26,27]. As a consequence, there has been limited progress in identifying the genes through which GWAS association signals are mediated, and characterisation of the downstream molecular mechanisms influencing glycaemic and obesity-related traits remains a considerable challenge.

There has been much recent debate as to the role that low frequency and rare variation (MAF<5%) might play in explaining the “missing heritability” of complex human traits [31–33]. It has been hypothesized that some of these variants will have larger effects on traits than common SNPs because they are likely to have arisen as a result of relatively recent mutation events, and thus will have been less subject to purifying selection [34]. Unfortunately, such variation is not well captured by traditional GWAS genotyping arrays, by design, even when supplemented by HapMap imputation [35–37]. However, more recent, higher density reference panels released by the 1000 Genomes (1000G) Project Consortium [38], constructed on the basis of low-pass whole-genome re-sequencing, provide haplotypes at more than 37 million variants for 1,094 individuals from multiple ethnic groups, and facilitate imputation of genetic variation with MAF as low as 0.5% across diverse populations [39–41].

Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we sought to assess the advantages and limitations of high-density imputation for the discovery and fine-mapping of loci for glycaemic and obesity-related traits. We considered 22 European ancestry GWAS (S1 Table), each imputed up to the 1000G “all ancestries” reference panel (Phase 1 interim release, June 2011), in up to (after quality control): 87,048 individuals for BMI; 54,572 individuals for WHR; 46,694 individuals for FG; and 24,245 individuals for FI (S2 and S3 Tables). To account for the impact of overall obesity on central adiposity [18,27] and insulin sensitivity [19], we considered WHR and FI after adjustment for BMI (denoted WHRadjBMI and FIadjBMI, respectively). With these high-density imputed data, we aimed to: (i) discover novel signals of association for glycaemic and obesity-related traits, including within established GWAS loci; (ii) evaluate the impact of low-frequency variation on common SNP GWAS signals; (iii) consider the contribution of genetic variants at GWAS loci in explaining trait variance; and (iv) refine the localisation of potential causal variants underlying GWAS association signals and assess the mechanisms through which they impact glycaemic and obesity-related traits.

Results

Imputation quality

Within each study, we performed stringent quality control of the genotype scaffold before imputation, minimally including sample and variant call rate and deviation from Hardy-Weinberg equilibrium (S1 Table). Each scaffold was imputed up to the 1000G multi-ethnic reference panel (Phase 1 interim release, June 2011), which includes 762 European ancestry haplotypes, using IMPUTEv2 [42], minimac [39] or specialist in-house software (S1 Table). Making use of the multi-ethnic reference panel, including haplotypes from all ancestry groups, has been demonstrated to reduce error rates and to improve imputation quality, particularly of lower frequency variants [28]. Imputed variants were retained for downstream evaluation and association testing if they passed traditional GWAS quality control thresholds (IMPUTEv2 info score ≥ 0.4; minimac r² ≥ 0.3) [43].
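The retention rule above can be illustrated with a minimal sketch. The thresholds are those quoted in the text; the record layout, the metric labels and the example variants are hypothetical and are not taken from any study pipeline.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; which one applies depends on the imputation software.
# The dictionary keys are hypothetical labels chosen for this sketch.
QC_THRESHOLDS = {"impute2_info": 0.4, "minimac_r2": 0.3}


@dataclass
class ImputedVariant:
    variant_id: str
    metric: str      # "impute2_info" or "minimac_r2"
    quality: float   # value of the imputation quality metric


def passes_qc(variant: ImputedVariant) -> bool:
    """Return True if the variant meets the threshold for its quality metric."""
    return variant.quality >= QC_THRESHOLDS[variant.metric]


# Made-up example records, for illustration only.
variants = [
    ImputedVariant("var_a", "impute2_info", 0.97),
    ImputedVariant("var_b", "impute2_info", 0.35),
    ImputedVariant("var_c", "minimac_r2", 0.30),
]
retained = [v.variant_id for v in variants if passes_qc(v)]
```

Keeping the threshold keyed by metric avoids comparing an info score against the minimac cut-off, since the two measures are not on an identical scale.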

We considered the quality of imputation (as measured by the IMPUTEv2 info score) of variants from the 1000G reference panel in two contributing studies (S4 Table): the 1958 British Birth Cohort from the Wellcome Trust Case Control Consortium (58BC-WTCCC, 2,802 individuals from Great Britain); and the 1966 Northern Finnish Birth Cohort (NFBC1966, 5,276 individuals from Lapland and the Province of Oulu in Northern Finland). In 58BC-WTCCC, 98.8% of common SNPs (MAF≥5%, 6.3 million) and 97.0% of low-frequency variants (0.5%≤MAF<5%, 3.8 million) passed imputation quality control filters, of which 72.9% are not present in HapMap reference panels. However, imputation of rarer variants (0.1%≤MAF<0.5%, 3.4 million) proved less successful in 58BC-WTCCC, with only 80.5% passing quality control filters. The quality of imputation in NFBC1966 was comparable to that observed in 58BC-WTCCC: 99.7% of common SNPs (5.9 million) and 94.4% of low-frequency variants (3.7 million). However, amongst rarer variants, the quality of imputation was noticeably poorer in NFBC1966 (62.8%) than 58BC-WTCCC, presumably reflecting less representation of low-frequency haplotypes from the isolated Northern Finnish population in the 1000G reference panel.

We have demonstrated that high-density imputation provides >90% coverage of low-frequency variants present in the 1000G reference panel in two diverse European ancestry populations. Our study thus enables association testing with more than three million high-quality variants with 0.5%≤MAF<5% that would not have been directly interrogated in previous GWAS of glycaemic and obesity-related traits that have been supplemented by HapMap imputation alone. With the sample sizes available in this study, we have estimated that for any of these variants explaining at least 0.2% of the overall trait variance (i.e. effect size of 0.32 SD units for 1% MAF, and effect size of 0.15 SD units for 5% MAF), we have >99.9% power to detect their association with BMI, WHR, and FG, and >93.9% power to detect their association with FI.
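The relationship between variance explained, allele frequency and effect size quoted above can be checked with a short sketch. It assumes an additive model with a standardised trait, so that variance explained equals 2 × MAF × (1 − MAF) × β². The power function is a normal approximation chosen for this illustration; the text does not state how power was calculated, so the values it returns may differ slightly from those reported.

```python
from math import sqrt
from statistics import NormalDist


def effect_size_sd(var_explained: float, maf: float) -> float:
    """Per-allele effect (SD units) of an additive variant explaining
    the given fraction of trait variance: q2 = 2 * maf * (1 - maf) * beta**2."""
    return sqrt(var_explained / (2.0 * maf * (1.0 - maf)))


def approx_power(n: int, var_explained: float, alpha: float = 5e-8) -> float:
    """Normal approximation (an assumption of this sketch) to the power of a
    two-sided single-variant test at significance level alpha."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Square root of the non-centrality parameter n * q2 / (1 - q2).
    ncp_root = sqrt(n * var_explained / (1.0 - var_explained))
    return std_normal.cdf(ncp_root - z_crit) + std_normal.cdf(-ncp_root - z_crit)


# Effect sizes for a variant explaining 0.2% of the variance.
beta_maf_1pct = effect_size_sd(0.002, 0.01)
beta_maf_5pct = effect_size_sd(0.002, 0.05)

# Maximum sample sizes per trait, as given in the Introduction.
sample_sizes = {"BMI": 87048, "WHR": 54572, "FG": 46694, "FI": 24245}
power_by_trait = {trait: approx_power(n, 0.002) for trait, n in sample_sizes.items()}
```

Under these assumptions the two effect sizes round to the 0.32 and 0.15 SD units quoted in the text, and FI, with the smallest sample, is the only trait for which power falls noticeably below 100%.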

Discovery of novel loci and new lead SNPs

Within each study, we tested for association of each directly typed and well imputed variant with BMI, WHRadjBMI, FG and FIadjBMI, separately in males and females, in a linear regression modelling framework (Methods, S2 and S3 Tables). Association summary statistics were then combined across studies in sex-specific and sex-combined fixed-effects meta-analyses for each trait. Variants passing quality control in fewer than 50% of the contributing studies for each trait were excluded from the meta-analysis. Association signals at genome-wide significance (p<5×10⁻⁸) and with lead SNPs independent (r²<0.05) and mapping more than 2Mb from those previously reported for the traits were considered novel. By convention, loci were labelled with the name(s) of the gene(s) located closest to the lead SNP, unless more compelling biological candidates mapped nearby (Table 1, S1, S2, S3 and S4 Figs).
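As an illustration of the combination step, the sketch below implements a standard inverse-variance weighted fixed-effects meta-analysis for a single variant, together with the 50% study-coverage filter. The text specifies a fixed-effects meta-analysis but not the weighting scheme, so inverse-variance weighting is an assumption here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold used in the text


def fixed_effects_meta(estimates):
    """Combine per-study (beta, se) pairs by inverse-variance weighting.
    Returns the pooled effect, its standard error and a two-sided p-value."""
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = sqrt(1.0 / total_weight)
    p_value = 2.0 * NormalDist().cdf(-abs(beta / se))
    return beta, se, p_value


def meta_analyse_variant(estimates, n_studies, min_fraction=0.5):
    """estimates holds results only from studies where the variant passed QC.
    Returns None if the variant is available in fewer than min_fraction
    of the n_studies contributing to the trait."""
    if len(estimates) < min_fraction * n_studies:
        return None
    return fixed_effects_meta(estimates)


# Made-up example: a variant passing QC in two of four studies is retained.
result = meta_analyse_variant([(0.1, 0.1), (0.3, 0.1)], n_studies=4)
is_genome_wide = result is not None and result[2] < GENOME_WIDE_ALPHA
```

With equal standard errors the pooled estimate is the simple average of the study effects, which gives a quick sanity check on the weighting.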
Table 1. Novel loci for glycaemic and obesity-related traits achieving genome-wide significance (p<5×10⁻⁸).

https://doi.org/10.1371/journal.pgen.1005230.t001

We identified two novel loci achieving genome-wide significance for BMI in the sex-combined meta-analysis: ATP2B1 (rs1966714, MAF = 0.46, p = 1.9×10⁻⁸); and AKAP6 (rs12885467, MAF = 0.49, p = 4.5×10⁻⁸). For FG, we detected one novel locus in the sex-combined meta-analysis at RMST (rs17331697, MAF = 0.10, p = 1.3×10⁻¹¹) and a female-specific association at EMID2 (rs6947345, MAF = 0.017, pMALE = 0.50, pFEMALE = 3.8×10⁻⁸). We did not identify any novel loci at genome-wide significance, in either sex-combined or sex-specific analyses, for WHRadjBMI or FIadjBMI. We observed no evidence of heterogeneity in sex-specific allelic effects across studies at the lead SNPs at the novel loci (Table 1). With the exception of the sex-specific association signal at EMID2, the lead SNPs at all other novel loci were common.

At AKAP6 and RMST, the common lead SNPs were present in HapMap (S5 Fig) but did not achieve genome-wide significance in large-scale European ancestry HapMap imputed meta-analyses conducted by the GIANT Consortium [17] (for BMI in up to 123,865 individuals) and the MAGIC Investigators [16] (for FG in up to 46,186 individuals), despite substantial overlap with cohorts contributing to our study. We have estimated that, amongst individuals contributing to our 1000G imputed meta-analyses for BMI/FG, a maximum of 59%/37% also participated in the previous GIANT and MAGIC studies (S5 Table). At RMST, our lead FG SNP approaches genome-wide significance in the MAGIC meta-analysis (p = 6.5×10⁻⁶), and this likely reflects stochastic variation. However, at AKAP6, our lead BMI SNP demonstrates only nominal evidence of association (p = 0.012) in the GIANT meta-analysis, suggesting that 1000G reference panels have enabled higher quality imputation at this locus. To investigate this assertion further, we compared the quality of imputation of the lead BMI SNP using HapMap and 1000G reference panels in two contributing studies of diverse European ancestry. In 58BC-WTCCC/NFBC1966, there was a marginal improvement in the IMPUTEv2 info score from 0.972/0.939 using reference haplotypes from CEU HapMap to 0.996/0.971 using those from 1000G.

At ATP2B1, the common lead SNP was not present in HapMap (S5 Fig). The lead SNP for BMI from the GIANT HapMap imputed meta-analysis [17] was rs2579106, achieving nominal evidence for association (p = 6.4×10⁻⁵) in a reported sample size of 123,864 individuals. This SNP reached near genome-wide significance in our 1000G imputed meta-analysis, despite the smaller sample size (p = 3.3×10⁻⁷, in 86,955 individuals). Furthermore, the HapMap and 1000G lead SNPs are in only modest LD with each other (EUR r² = 0.22). Taken together, these data suggest that the discovery of this novel locus has been due to improved coverage through 1000G imputation, despite the lead SNP being common.

We observed genome-wide significant evidence of association at 34 established loci for glycaemic and obesity-related traits, including GCKR with the same lead SNP for both FG and FI (S6 Table). At 29 of these loci, our meta-analysis identified lead SNPs that were different from previous reports in which they were first discovered, of which 23 were not present in HapMap (S7 Table). At 18 of these 29 loci, the new lead SNP was in strong LD (r²≥0.8) with that previously reported, and consequently both variants had similar MAF and allelic effect size (S6 Fig). At a further nine of the 29 loci, the new and previously reported lead SNPs were in moderate LD (0.2≤r²<0.8) with each other. For these, there was greater difference in MAF and allelic effect size for each pair of variants, but the new lead SNP was common and not consistently less frequent (S6 Fig). At the remaining two loci, the new lead SNPs were not present in HapMap and were in only weak LD with those previously reported (S7 Fig), mapping near BDNF for BMI (r² = 0.10) and RSPO3 for WHRadjBMI (r² = 0.04). At both loci, multiple distinct signals of association have been recently reported by the GIANT Consortium in the largest meta-analyses of BMI and WHRadjBMI in European ancestry individuals genotyped with GWAS arrays, supplemented by imputation up to reference panels from the International HapMap Consortium [29,30], and the Metabochip, in up to 339,224 and 224,459 individuals, respectively [26,27]. At BDNF, our new lead SNP (rs4517468) was in moderate LD (r² = 0.31) with the index variant (rs10835210) for the GIANT secondary signal of association for BMI at this locus, suggesting that they represent the same underlying effect on obesity.
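The LD categories used in this comparison can be made concrete with a small sketch. The r² formula is the standard squared correlation between alleles at two biallelic variants, computed from haplotype and allele frequencies; the category boundaries are those stated in the text, and the function names are hypothetical.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic variants.
    p_ab: frequency of the haplotype carrying allele A and allele B;
    p_a, p_b: frequencies of alleles A and B (each strictly between 0 and 1)."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def ld_category(r2: float) -> str:
    """Classify LD using the boundaries given in the text."""
    if r2 >= 0.8:
        return "strong"
    if r2 >= 0.2:
        return "moderate"
    return "weak"


# r² values quoted in the text for the BDNF and RSPO3 lead SNP pairs,
# and for the BDNF lead SNP against the GIANT secondary signal.
categories = {
    "BDNF vs previous lead": ld_category(0.10),
    "RSPO3 vs previous lead": ld_category(0.04),
    "BDNF vs GIANT secondary signal": ld_category(0.31),
}
```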

At established loci, amongst the 29 lead SNPs identified in our 1000G imputed meta-analysis that were different from the previous reports in which they were discovered, five of them are present on the Metabochip: NRXN3 (BMI, rs7141420), SH2B1 (BMI, rs2008514), MC4R (BMI, rs663129), LY86 (WHRadjBMI, rs1294437), and GCKR (FG/FIadjBMI, rs1260326). These variants were thus directly interrogated in the largest European ancestry meta-analyses, to date, of glycaemic and obesity related traits from the GIANT Consortium [26,27] and MAGIC Investigators [19] that made use of this array. At all five of these loci, our new lead SNP is either the same or is in strong LD (EUR r2>0.75) with that reported in the trait-equivalent Metabochip effort. Four of these loci (all except NRXN3) were densely typed as “fine-mapping” intervals on the array, providing evidence that 1000G imputation has been successful at predicting genotypes at untyped variants in these regions, even though the GWAS scaffolds used in our investigation were comparatively sparse.

Multiple distinct association signals

We investigated the evidence for multiple distinct association signals in the glycaemic and obesity-related trait loci achieving genome-wide significance in our study (four novel and 34 established) (Table 1 and S6 Table). We undertook approximate conditional analyses, implemented in GCTA [44], to select index SNPs for distinct association signals achieving “locus-wide” significance (pCOND<10−5) to reflect the number of uncorrelated variants in a 2Mb window flanking the lead SNP (Methods). We made use of summary statistics from the meta-analysis and genotypes from 58BC-WTCCC and NFBC1966 to approximate the LD between genetic variants (directly typed and well imputed) and hence the correlation in parameter estimates in the joint association model. Reassuringly, the index SNPs and association summary statistics (effect sizes and p-values) from the joint model were highly concordant for both reference studies (S8 Table). Finally, we confirmed these GCTA association signals through exact reciprocal conditional analyses by adjustment for genotypes at each index SNP as a covariate in the linear regression model (Methods, Fig 1, Table 2).

Fig 1. Regional plots of multiple distinct signals at WHRadjBMI locus RSPO3 (A), FG loci G6PC2 (B) and GCK (C).

Regional plots for each locus are displayed from: the unconditional meta-analysis (left); the exact conditional meta-analysis for the primary signal after adjustment for the index variant for the secondary signal (middle); and the exact conditional meta-analysis for the secondary signal after adjustment for the index variant for the primary signal (right). The sample sizes vary due to the availability of the well imputed index SNPs of the primary and secondary signals. Directly genotyped or imputed SNPs are plotted with their association P values (on a -log10 scale) as a function of genomic position (NCBI Build 37). Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their correlated proxies (according to a blue to red scale from r2 = 0 to 1, based on pairwise EUR r2 values from the 1000 Genomes June 2011 release). SNP annotations are as follows: circles, no annotation; downward triangles, nonsynonymous; squares, coding or 3′ UTR; asterisks, TFBScons (in a conserved region predicted to be a transcription factor binding site); squares with an X, MCS44 placental (in a region highly conserved in placental mammals).

https://doi.org/10.1371/journal.pgen.1005230.g001

Table 2. Loci with multiple distinct signals of association with glycaemic and obesity-related traits achieving “locus-wide” significance in conditional analysis (pCOND<10−5).

https://doi.org/10.1371/journal.pgen.1005230.t002

We identified two distinct signals of association for WHRadjBMI mapping to the RSPO3 locus, indexed by rs72959041 (MAF = 0.079, pCOND = 2.5x10-10) and rs4509142 (MAF = 0.49, pCOND = 5.8x10-6), corresponding to our new lead SNP and that previously reported [18], respectively. More recently, both signals have also been reported by large-scale meta-analyses undertaken by the GIANT Consortium [27]. Our new lead SNP (rs72959041) was reported as the index variant for their secondary association signal at this locus, whilst the index variant for our secondary signal of association (rs4509142) was in strong LD with their lead SNP (rs1936805, r2 = 0.67). The GIANT Consortium also identified a third distinct signal of association at this locus, stronger in females than in males, which was not detected in our conditional analyses, and presumably reflects reduced power due to our smaller sample size. We also identified two distinct signals of association for FG each mapping to GCK (rs878521, MAF = 0.21, pCOND = 1.3x10-18; rs10259649, MAF = 0.27, pCOND = 4.6x10-10) and G6PC2 (rs560887, MAF = 0.31, pCOND = 2.2x10-66; rs138726309, MAF = 0.015, pCOND = 5.7x10-23). None of the index variants for these distinct association signals was present in HapMap (S8 Fig), and only rs10259649 in GCK was well represented by a tag in that reference panel (rs2908292, r2 = 1.00).

Trait variance explained by novel loci and new lead SNPs

We evaluated the additional heritability of glycaemic and obesity-related traits explained by lead SNPs at novel and established loci after 1000G imputation in 5,276 individuals from NFBC1966 (Methods). For each trait, we calculated the phenotypic variance accounted for by: (i) previously reported lead SNPs at established loci; and (ii) new lead SNPs and index variants for distinct association signals at novel and established loci from the present study. The greatest increment in variance explained was observed for FG, where the novel loci and new lead SNPs after 1000G imputation together account for an increase from 1.9% to 2.3%. We also observed noticeable increments in variance explained after 1000G imputation for WHRadjBMI (from 1.1% to 1.3%) and BMI (3.2% to 3.5%). However, for FIadjBMI, only one new lead SNP at an established locus was identified after 1000G imputation, providing a negligible improvement in variance explained (from 0.46% to 0.47%).

Fine-mapping of novel and established GWAS loci

We sought to take advantage of the improved coverage of common and low-frequency variation offered by 1000G imputation to localise potential causal variants (MAF≥0.5%) for the 42 distinct association signals achieving locus-wide significance in our conditional meta-analyses (two distinct signals of association each at RSPO3, GCK, and G6PC2, one signal of association for both FG and FIadjBMI at the GCKR locus, and one signal of association at each of the other 34 novel and established loci). For each distinct signal, we constructed 99% credible sets of variants [45] that together account for 99% probability of driving the association on the basis of the (conditional) meta-analysis (Methods, S9 Table). At the 29 established loci where we identified a new lead SNP after 1000G imputation, the posterior probability of driving the association signal was consistently higher than that for the variant previously reported (S9 Fig). The greatest increases in posterior probability were observed at: GCKR (FG/FIadjBMI, increase from 2.6%/1.8% to 93.5%/89.6%); RSPO3 (WHRadjBMI, increase from 0.4% to 78.6%); PROX1 (FG, increase from 13.2% to 76.9%); and NRXN3 (BMI, increase from 2.5% to 62.2%).

Credible sets are well calibrated for common and low-frequency variants provided that imputation and meta-analysis provide complete coverage of variation with MAF≥0.5% at each locus. Smaller credible sets, in terms of the number of variants they contain, thus correspond to fine-mapping at higher resolution. We considered 99% credible sets containing fewer than 20 variants to be “tractable”, and amenable to follow-up through additional analyses of functional and regulatory annotation (Table 3, S10 Table). The most precise localisation was observed for FG loci including: MTNR1B (rs10830963 accounts for more than 99.9% of the probability of driving the association); both distinct signals at G6PC2 (two variants each, mapping to a <15kb interval); and one signal at GCK (indexed by rs878521, mapping to a <25kb interval). Of the 127 variants reported in these tractable credible sets, 74 (58.3%) were not present in HapMap, and accounted for 42.4% of the probability of driving the association signals. None of the HapMap variants in the tractable credible sets was of low frequency, compared to 20.8% of those present only in 1000G (S11 Table).

Table 3. Association signals for glycaemic and obesity-related traits for which the 99% credible sets contain no more than 20 variants.

https://doi.org/10.1371/journal.pgen.1005230.t003

The tractable credible sets included coding variants at just three loci implicated in FG: GCKR, SLC30A8, and the low-frequency association signal at G6PC2. The lead SNP mapping to GCKR (rs1260326) was the common coding variant L446P, which accounts for 93.5% of the probability of driving the FG association signal, and was present in HapMap. At the SLC30A8 locus, the probability of driving the association for FG was shared between 7 SNPs, in strong LD with each other, and including the coding variant R325W. This variant was present in HapMap, and was sufficient to explain the association signal of the lead non-coding SNP for FG in conditional analysis (rs11558471, p = 3.2x10-10, pCOND = 0.052) at the locus. SLC30A8 R325W is also the lead SNP for T2D susceptibility at this locus in published European ancestry meta-analyses from the DIAGRAM Consortium [46]. Finally, the low-frequency index SNP for the secondary association signal mapping to G6PC2 (rs138726309, MAF = 0.015) was the coding variant H177Y, which accounts for 11.2% of the posterior probability of causality at this locus. For this association signal, none of the variants in the 99% credible set was present in HapMap, and thus would have been overlooked without 1000G imputation. This coding variant has recently been implicated in FG homeostasis in a meta-analysis of 33,407 non-diabetic individuals of European ancestry, genotyped with the Illumina exome array, and in agreement with our study, demonstrates a stronger signal of association in conditional analysis after accounting for the lead SNP at the G6PC2 locus [47].

The remaining variants in the tractable credible sets mapped to non-coding sequence. To gain insight into potential regulatory mechanisms through which these variants might impact glycaemic and obesity-related traits, we overlaid each of these credible sets, in turn, with chromatin state calls from eleven cell lines and tissues (Methods). Across all traits, 99% credible set variants were enriched for overlap with enhancer elements (Fig 2). Focussing on FG, variants within the 99% credible set showed significant enrichment (p<2.4x10-3) for active promoter and transcription factor binding site annotations compared to all others (respectively: 3.8-fold, Fisher's combined p = 9.4x10-5; and 7.2-fold, Fisher’s combined p = 2.1x10-13). Over cell types, this enrichment was most prominent in pancreatic islets (Fig 2). More than half of islet-annotated variants are not present in HapMap, and this would not have been observed without 1000G imputation. For example, at the novel FG RMST locus, 11 of the 14 variants in the 99% credible set are not present in HapMap, but all overlap active islet chromatin marks (S10 Fig).

Fig 2. Broad category functional annotation (A) and cell-type specific annotation (B) of credible set variants.

On the x-axis is each category of broad functional annotation (A) or cell-type specific annotation (B). The fraction of credible set variants that overlap with each category is shown on y-axis. The overlapping variants are further broken down into either variants that exist in both the 1000 Genomes and HapMap reference panel (green) or those that exist only in the 1000 Genomes reference panel (red). TFBS, transcription factor binding site; ncRNA, non-coding RNA; UTR, untranslated regions; GM12878, lymphoblastoid cell line from European ancestry female; hESC, H1 human embryonic stem cells; hASC(t1), human pre-adipocytes; hASC(t4), mature human adipocytes; HepG2, liver carcinoma cell-line; HMEC, human mammary epithelial cells; HSMM, human skeletal muscle myoblasts; HUVEC, human umbilical vein endothelial cells; K562, human myelogenous leukemia cell-line; NHEK, normal human epidermal keratinocytes; NHLF, normal human lung fibroblasts.

https://doi.org/10.1371/journal.pgen.1005230.g002

Discussion

Through meta-analysis of 1000G imputed GWAS of glycaemic and obesity-related traits, we have identified two novel loci for BMI at genome-wide significance, and two for FG (including one low-frequency variant association signal that is specific to females). These loci were not reported in larger meta-analysis efforts of European ancestry undertaken by the GIANT Consortium (for BMI) and the MAGIC Investigators (for FG), despite the partial overlap of contributing studies [16–19,26,27]. Improved coverage and quality of imputation for common and low-frequency variation using 1000G reference panels has increased power to detect association at these loci. We also reported new lead SNPs at 29 established glycaemic and obesity-related trait loci achieving genome-wide significance in our meta-analyses, of which 23 were not present in HapMap, and identified multiple distinct signals of association for WHRadjBMI at RSPO3 and for FG at GCK and G6PC2. Taken together, these novel loci, distinct association signals, and new lead SNPs have increased the trait variance explained for glycaemic and obesity-related traits, although the majority of the heritability remains unaccounted for.

Despite more than 90% coverage of low-frequency variation after 1000G imputation, in diverse European ancestry populations, and equivalent power to detect association across the allele frequency spectrum for a fixed proportion of trait variance explained, the new lead SNPs at established and novel GWAS loci are predominantly common. These data argue strongly against the “synthetic association” hypothesis, which posits that common lead SNPs at GWAS loci will often reflect unobserved causal variants of lower frequency and greater effect size [32]. We recognise that our study has insufficient power to detect common or low-frequency association signals of more modest effect (S12 Table). For example, we estimated that the power to detect association in this study, at genome-wide significance, of a variant of 1% MAF, explaining 0.05% of the overall trait variance (effect size of 0.16 SD units), was 88.0% for BMI, but just 42.1% for WHRadjBMI, 27.7% for FG, and only 2.6% for FIadjBMI. Furthermore, the contribution of rare variants to glycaemic and obesity-related traits cannot be directly investigated with these data because of the low quality imputation for MAF<0.5%, but will require interrogation through deep whole-genome re-sequencing studies in large sample sizes.
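The relationship between variance explained, allele frequency, and allelic effect quoted above can be checked with a short calculation. The sketch below assumes an additive model under Hardy-Weinberg proportions, so that a variant with minor allele frequency p and per-allele effect β (in SD units) explains 2p(1−p)β² of the trait variance. The power function is a simple two-sided normal approximation to the 1 degree of freedom association test and is an assumption of this sketch, not the method used for S12 Table; with the maximum sample sizes reported in the Methods it should land in the same range as the quoted figures without reproducing them exactly.

```python
import math
from statistics import NormalDist


def allelic_effect_sd(variance_explained, maf):
    """Per-allele effect (SD units) implied by a given fraction of trait
    variance explained, assuming an additive model and Hardy-Weinberg
    proportions: variance_explained = 2 * maf * (1 - maf) * beta**2."""
    genotype_variance = 2.0 * maf * (1.0 - maf)
    return math.sqrt(variance_explained / genotype_variance)


def approximate_power(n, variance_explained, alpha=5e-8):
    """Approximate power of a two-sided 1-df test at threshold alpha.

    Normal approximation (an assumption of this sketch): the test
    statistic is N(sqrt(ncp), 1) with ncp = n * q / (1 - q), where q is
    the fraction of trait variance explained by the variant."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(n * variance_explained / (1.0 - variance_explained))
    return (1.0 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)


# Variant with MAF 1% explaining 0.05% of trait variance
beta = allelic_effect_sd(0.0005, 0.01)
print(round(beta, 2))  # 0.16

# Maximum sample sizes per trait, taken from the Methods
for trait, n in [("BMI", 87048), ("WHRadjBMI", 54572),
                 ("FG", 46694), ("FIadjBMI", 24245)]:
    print(trait, approximate_power(n, 0.0005))
```

Because the variance explained is held fixed, the power in this approximation depends only on the sample size, which is why the traits with smaller meta-analyses have markedly lower power.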

We have demonstrated that integration of 1000G imputation, genetic fine-mapping, and genomic annotation facilitates fine-mapping of GWAS loci for glycaemic and obesity-related traits, and has provided insight into potential functional and regulatory mechanisms through which the effects of these association signals are mediated. In particular, variants in the 99% credible set for the low-frequency association signal mapping to G6PC2 are completely absent from HapMap, but include H177Y. The glucose lowering allele at this variant has been demonstrated to result in a significant decrease in protein expression mediated through proteasomal degradation, leading to a loss of G6PC2 function [47]. We also demonstrated enrichment for overlap of functional elements with variants in the tractable credible sets mapping to non-coding sequence, in particular enhancers. For FG, additional enrichment was observed across credible set variants mapping to promoter and transcription factor binding sites in pancreatic islets, in particular. Uncovering these types of enrichment is essential for prioritisation of variants for functional follow-up, and can be incorporated in statistical models to elucidate causal alleles. Also, at the level of an individual locus, functional annotation can help point to the underlying molecular mechanism through which the GWAS signal is mediated. At G6PC2, for example, the lead SNP, rs560887, in the 99% credible set for the second distinct (non-coding) association signal at this locus (79.5% posterior probability) maps to an enhancer region that is active in pancreatic islets and embryonic stem cells, but repressed in most other cell types. These observations are in agreement with recent reports of clustering of T2D-associated risk variants in islet enhancers [48] and highlight a potential mechanism through which GWAS loci impact glucose homeostasis and disease risk.

Despite the success of traditional GWAS genotyping arrays for the discovery of common variant association signals for complex human traits, because of the structure of LD for variation with MAF>5%, the gold standard approach to directly interrogating lower frequency variation is through re-sequencing studies. However, in agreement with recently published investigations of the contribution of low-frequency variants to a range of phenotypes [47,49–51], our study highlights that effect sizes are modest, and require sample sizes for detection that are financially infeasible through re-sequencing on the scale of the whole genome (or exome). We have demonstrated, in this study, that imputation of existing GWAS scaffolds up to reference panels from the 1000 Genomes Project Consortium [38] enables imputation of more than 90% of low-frequency variants in diverse European populations, at no additional cost other than computation and analyst time. Future GWAS of complex traits in European ancestry populations will be further enhanced by the Haplotype Reference Consortium (www.haplotype-reference-consortium.org). This effort will create a reference panel of more than 60,000 haplotypes from re-sequencing of multiple cohorts, predominantly of European ancestry, enabling high-quality imputation to lower allele frequencies. Phase 3 of the 1000 Genomes Project includes haplotypes from diverse populations from each of the five major global ethnicities, and thus would be expected to improve imputation quality over Phase 1 for low-frequency variants in East Asian, South Asian, African and American ancestry groups. The viability of imputation as an approach to recover genotypes at low-frequency variants in GWAS undertaken in populations that are not well represented by the 1000 Genomes Project might require whole-genome re-sequencing of some individuals from the study, in combination with haplotypes from the existing reference panel.

Irrespective of the population under investigation, our study suggests that imputation is unlikely to provide sufficient coverage of variation with MAF<0.5% to enable gene-based testing of rare variants [52]. Imputation is restricted to those rare variants that are present in the reference panel, which are much more likely to be population specific. Furthermore, imputation of rare variants that are present in the reference panel is generally poor, although it is not clear how well calibrated the traditional metrics of quality (such as IMPUTEv2 info score) will be. Thorough investigation of the impact of rare variation on phenotype will thus require re-sequencing, although some success in discovering rare coding variants associated with complex human traits has been achieved through exome array genotyping [47,53–55]. For the time being, arrays that combine an imputation scaffold with direct interrogation of rare coding variation likely offer the most cost-effective approach to assaying variants across the frequency spectrum.

In conclusion, our study has enabled discovery and fine-mapping of novel and established association signals for glycaemic and obesity-related traits, and through integration with genomic data from relevant tissues, has highlighted functional and regulatory processes through which these effects are mediated. Improved understanding of the biological basis of the quantitative human anthropometric and metabolic traits may advance our appreciation of the mechanisms underlying downstream disease endpoints, including T2D and cardiovascular diseases, ultimately leading to personalised treatment approaches, therapeutic development and public health benefits.

Methods

Ethics statement

All human research was approved by the relevant institutional review boards, and conducted according to the Declaration of Helsinki. All participants provided written informed consent.

Studies and samples

We considered 22 population-based and case-control GWAS of European ancestry in up to (after quality control): 87,048 individuals for BMI; 54,572 individuals for WHRadjBMI; 46,694 individuals for FG; and 24,245 individuals for FIadjBMI. Samples were limited to individuals of at least 18 years of age. Case-control studies were stratified by disease status, with each stratum analysed separately. Full details of study and sample characteristics are provided in S1 Table.

Genotyping and quality control

Samples were genotyped with a variety of GWAS arrays. Sample and SNP quality control was undertaken within each study. Sample quality control included exclusions on the basis of genome-wide call rate, extreme heterozygosity, sex discordance, cryptic relatedness, and outlying ethnicity. SNP quality control included exclusions on the basis of call rate across samples and extreme deviation from Hardy-Weinberg equilibrium. Non-autosomal SNPs were excluded from imputation and association analysis. SNPs with MAF<1% were also excluded from the genotype scaffold prior to imputation. Full details of the genotyping arrays and quality control protocols employed by each study are summarised in S1 Table.

Imputation

Within each study, the autosomal GWAS genotype scaffold was imputed up to the 1000 Genomes Project multi-ethnic reference panel (Phase I interim release, June 2011), which was the most up to date available at the time analyses were undertaken. Imputation was performed using IMPUTEv2 [42], minimac [39] or specialist in-house software. Poorly imputed variants (IMPUTE info<0.4; minimac ) [43], and those with minor allele count of less than three (under a dosage model) were excluded from downstream association analyses.

Trait transformations and study-level association analyses

We utilised protocols for obesity-related and glycaemic trait transformations developed by the GIANT Consortium [17,18] and MAGIC Investigators [19]. Full details of trait transformations, trait summary statistics and study-specific covariates are presented in S2 and S3 Tables.

BMI was calculated as the ratio of weight (kg) to squared height (m2). BMI was inverse normal transformed separately in males and females. Association of the transformed trait with each variant passing quality control was tested in a linear regression framework under an additive model in the dosage of the minor allele after adjustment for age, age2 and study-specific covariates, separately in males and females.
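As an illustration of the BMI calculation and inverse normal transformation described above, the following minimal sketch maps each value to a standard normal quantile according to its rank. The rank offset of 0.5 and the average-rank handling of ties are assumptions made for this example; the study-specific protocols are those referenced in S2 and S3 Tables. In practice the transformation would be applied separately to males and females.

```python
from statistics import NormalDist


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by squared height (m^2)."""
    return weight_kg / height_m ** 2


def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transform.

    Each value is replaced by the standard normal quantile of
    (rank - offset) / n, where tied values share their average rank.
    The offset is an illustrative choice, not taken from the study."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical measurements for three individuals of the same sex
bmis = [bmi(70.0, 1.75), bmi(95.0, 1.80), bmi(80.0, 1.78)]
transformed = inverse_normal_transform(bmis)
```

The transformed values preserve the ordering of the raw BMI values but follow an approximately standard normal distribution, which limits the influence of extreme observations on the regression.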

WHR was calculated as the ratio of waist circumference (m) to hip circumference (m). Residuals were obtained after adjustment for age, age2, BMI, and study-specific covariates, separately in males and females, and were subsequently inverse-rank normalised. Association of the transformed trait with each variant passing quality control was tested in a linear regression framework under an additive model in the dosage of the minor allele, separately in males and females.

FG was measured in mmol/L. Individuals with a diagnosis of diabetes (type 1 or type 2), diabetes treatment, and/or FG≥7mmol/L, non-fasting state, or pregnancy were excluded. Individuals from case cohorts (with diseases such as stroke and cardiovascular disease) were also excluded if they had undergone hospitalization or blood transfusion in the 2–3 months before measurements were taken. Association of the untransformed trait with each variant passing quality control was tested in a linear regression framework under an additive model in the dosage of the minor allele after adjustment for age, age2 and study-specific covariates, separately in males and females.

FI was measured in pmol/L with subsequent natural log transformation. Individuals with a diagnosis of diabetes (type 1 or type 2), diabetes treatment, and/or FG≥7mmol/L, non-fasting state, or pregnancy were excluded. Individuals from case cohorts (with diseases such as stroke and cardiovascular disease) were also excluded if they had undergone hospitalization or blood transfusion in the 2–3 months before measurements were taken. Association of the transformed trait with each variant passing quality control was tested in a linear regression framework under an additive model in the dosage of the minor allele after adjustment for age, age2, BMI and study-specific covariates, separately in males and females.

Meta-analysis

Summary statistics from association testing of variants passing quality control, separately in males and females, were corrected in each study for residual population structure through genomic control [56] where necessary (S2 and S3 Tables). Subsequently, association summary statistics were combined across studies in sex-specific and sex-combined fixed-effects meta-analyses (inverse-variance weighting) for each trait, as implemented in GWAMA [57]. Heterogeneity in allelic effects between males and females for each trait at each variant was assessed by means of an implementation of Cochran’s Q-statistic [58] in GWAMA [57]. Variants passing quality control in fewer than 50% of the contributing studies for each trait were excluded from the meta-analysis. After filtering, the total numbers of variants reported for each trait were: 9,953,165 for BMI; 9,954,794 for WHRadjBMI; 9,967,162 for FG; and 9,837,044 for FIadjBMI. Sex-specific or sex-combined p<5x10-8 was considered genome-wide significant for each trait. Associated loci are referred to by the name(s) of the nearest gene(s) to lead SNP, unless there are more biologically plausible candidates mapping nearby.
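The steps in this paragraph can be sketched for a single variant as follows. This is a minimal illustration of genomic control, inverse-variance weighted fixed-effects meta-analysis, and Cochran's Q, not the GWAMA implementation; the function names are hypothetical, and the heterogeneity p-value shown applies only to the comparison of two strata (males and females), where Q has one degree of freedom.

```python
import math
from statistics import NormalDist


def apply_genomic_control(standard_errors, inflation):
    """Inflate standard errors by sqrt(lambda) when lambda exceeds 1."""
    factor = math.sqrt(max(inflation, 1.0))
    return [se * factor for se in standard_errors]


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects meta-analysis.

    Returns the pooled effect, its standard error, and a two-sided p-value."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    p_value = 2.0 * NormalDist().cdf(-abs(beta / se))
    return beta, se, p_value


def cochran_q(betas, standard_errors):
    """Cochran's Q: weighted squared deviations from the pooled effect."""
    pooled, _, _ = fixed_effects_meta(betas, standard_errors)
    return sum((b - pooled) ** 2 / se ** 2
               for b, se in zip(betas, standard_errors))


def heterogeneity_p_two_strata(q):
    """P-value for Q with 1 degree of freedom (two strata, e.g. sexes)."""
    return math.erfc(math.sqrt(q / 2.0))


# Hypothetical sex-specific estimates for one variant
male_beta, female_beta = 0.10, 0.20
beta, se, p = fixed_effects_meta([male_beta, female_beta], [0.05, 0.05])
q = cochran_q([male_beta, female_beta], [0.05, 0.05])
p_het = heterogeneity_p_two_strata(q)
```

With equal standard errors the pooled effect is the simple average of the two estimates, and Q measures how far the sex-specific effects sit from that average relative to their precision.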

Approximate conditional analysis

We performed approximate conditioning in established and novel glycaemic and obesity-related trait loci in GCTA [44] on the basis of association summary statistics from the sex-combined meta-analyses after variant filtering. We utilised genotype data from two reference studies to approximate LD between variants in diverse European populations, and hence correlation between parameter estimates in the GCTA-COJO joint regression model: 58BC-WTCCC (2,802 individuals from Great Britain); and NFBC1966 (5,276 individuals from Lapland and the Province of Oulu in Northern Finland). We identified “index” variants to represent each distinct association signal achieving genome-wide significance (p<5x10-8) in the GCTA-COJO joint regression model for further validation.

Exact conditional analysis

We performed exact conditional analysis for each locus identified with multiple distinct association signals in GCTA using imputed data from all contributing studies except Rotterdam Study 1 (5,745 individuals). Within each study, we tested for association in the same linear regression framework utilised for unconditional analysis, separately in males and females, but included genotypes at each GCTA index SNP identified at the locus, in turn, as an additional covariate in the model. At each established glycaemic and obesity-related trait locus, we also performed conditioning on the previously reported lead SNP if it differed from that reported in our unconditional meta-analysis. Subsequently, association summary statistics for each signal were combined across studies in sex-specific and sex-combined fixed-effects meta-analyses (inverse-variance weighting) for each trait, as implemented in GWAMA [57].

Trait variance explained

We estimated the variance explained for each trait using genotype data from NFBC1966 (5,276 individuals) in a multiple linear regression framework. For each trait, we considered two sets of variants: (i) previously reported lead SNPs for established loci; and (ii) new lead SNPs and index variants for multiple distinct association signals in established and novel loci. We tested for association of the trait: (i) with covariates only; and (ii) with covariates and the dosage of the minor allele at each variant. For each set of variants, the trait variance explained was given by the difference in the coefficient of determination (r2) between these two regression models.
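A minimal sketch of this calculation is given below: the variance explained by a set of variants is taken as the difference in the coefficient of determination between a covariates-only model and a model that also includes the variant dosages. The small least-squares solver is written out only to keep the example self-contained; the function names and the toy data are hypothetical.

```python
def r_squared(predictors, y):
    """Coefficient of determination from ordinary least squares of y on
    an intercept plus the given predictor columns (lists of equal length).
    Assumes the predictors are not collinear."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations: (X'X | X'y)
    a = [[sum(rows[i][r] * rows[i][c] for i in range(n)) for c in range(p)]
         + [sum(rows[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * z for x, z in zip(a[r], a[c])]
    coefficients = [a[r][p] / a[r][r] for r in range(p)]
    fitted = [sum(b * x for b, x in zip(coefficients, row)) for row in rows]
    mean_y = sum(y) / n
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total


def variance_explained(covariates, dosages, y):
    """Increase in R^2 when variant dosages are added to the covariates."""
    return r_squared(covariates + dosages, y) - r_squared(covariates, y)


# Toy data: one covariate (age) and one variant (minor allele dosage)
age = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dosage = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
trait = [0.5 * a + d for a, d in zip(age, dosage)]
gain = variance_explained([age], [dosage], trait)
```

Applying the function once with the previously reported lead SNPs and once with the new lead SNPs and index variants gives the two quantities whose difference is reported as the increment in variance explained.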

Credible set construction

For each distinct signal for each trait, we calculated the posterior probability of driving the association for the jth variant, πCj, given by

$$\pi_{Cj} = \frac{\Lambda_j}{\sum_k \Lambda_k},$$

where the summation is over all variants reported in the (conditional) meta-analysis across the locus. In this expression, Λj is the approximate Bayes’ factor [59] for the jth variant, given by

$$\Lambda_j = \sqrt{\frac{V_j}{V_j + \omega}} \exp\left[\frac{\omega \beta_j^2}{2 V_j (V_j + \omega)}\right],$$

where βj and Vj denote the allelic effect and corresponding variance from the (conditional) meta-analysis for the association signal. The parameter ω denotes the prior variance in allelic effects, taken here to be 0.04 [59]. A 99% credible set was then constructed by: (i) ranking all variants in the locus according to their Bayes’ factor, Λj; and (ii) including ranked variants until their cumulative posterior probability exceeds 0.99.
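The construction can be sketched in a few lines. The example below assumes the approximate Bayes' factor takes Wakefield's form, Λ = sqrt(V/(V+ω)) exp[ωβ²/(2V(V+ω))], with β and V the allelic effect and its variance and ω = 0.04 the prior variance, and that posterior probabilities are obtained by normalising the Bayes' factors across all variants at the locus. Calculations are carried out on the log scale to avoid overflow at strongly associated variants; the variant identifiers and function names are hypothetical.

```python
import math


def log_approx_bayes_factor(beta, variance, prior_variance=0.04):
    """Natural log of Wakefield's approximate Bayes' factor in favour of
    association, from the allelic effect and its variance."""
    total = variance + prior_variance
    return (0.5 * math.log(variance / total)
            + prior_variance * beta ** 2 / (2.0 * variance * total))


def credible_set(variants, coverage=0.99, prior_variance=0.04):
    """Build a credible set from (identifier, beta, variance) tuples.

    Variants are ranked by Bayes' factor and added until the cumulative
    posterior probability reaches the requested coverage. Returns a list
    of (identifier, posterior probability) pairs in rank order."""
    log_bfs = [(name, log_approx_bayes_factor(b, v, prior_variance))
               for name, b, v in variants]
    largest = max(value for _, value in log_bfs)
    # Subtract the maximum before exponentiating for numerical stability
    scaled = [(name, math.exp(value - largest)) for name, value in log_bfs]
    normaliser = sum(weight for _, weight in scaled)
    ranked = sorted(((name, weight / normaliser) for name, weight in scaled),
                    key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, posterior in ranked:
        selected.append((name, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return selected


# Hypothetical locus: three variants with the same standard error (0.02)
locus = [("snpA", 0.12, 0.0004), ("snpB", 0.04, 0.0004), ("snpC", 0.0, 0.0004)]
result = credible_set(locus)
```

In this toy locus the first variant carries almost all of the posterior probability, so the 99% credible set contains it alone; with several variants in strong LD and similar effect estimates the set would grow accordingly.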

Functional and regulatory annotation

We interrogated coding variants in the 99% credible set for each association signal using Ensembl and HaploReg [60]. Their likely functional consequences were predicted by SIFT [61], PROVEAN [62] and PolyPhen2 [63].

We collected genomic annotation data from several sources. For regulatory state information, we collected sequence reads generated for six assays (H3K4me1, H3K4me3, H3K27ac, H3K27me3, H3K36me3, and CTCF) from 9 ENCODE cell types (GM12878, K562, HepG2, HSMM, HUVEC, NHEK, NHLF, hESC, HMEC) [64], pancreatic islets [65], and adipose stem cells (hASC t1, t4) [66]. Reads were mapped to the human genome reference sequence (hg19) using BWA [67]. Regulatory states for all cell types were called from the aligned reads using ChromHMM [68], assuming 10 states. We then assigned names to the resulting state definitions as follows: active promoter (High H3K4me3, H3K27ac); strong enhancer 1 (H3K4me3, H3K27ac, H3K4me1); strong enhancer 2 (H3K27ac, H3K4me1); weak enhancer (H3K4me1); poised promoter (H3K27me3, H3K4me3, H3K4me1); repressed (H3K27me3); low/no signal; insulator (CTCF); low/no signal; and transcription (H3K36me3). We also obtained transcription factor binding sites (TFBS) established using chromatin immunoprecipitation sequencing. This consisted of data on 147 proteins [64–66].

Finally, we used transcript information from GENCODEv14 [69] to define protein-coding genes, 5’ and 3’ UTR regions, and non-coding genes. For transcripts to be classified as protein-coding, the ‘protein-coding’ tag needed to be set, and transcripts were further filtered for either presence in the consensus coding sequence (CCDS) database or an experimentally confirmed mRNA start and end. From this set of transcripts, 5’ UTR, exon, and 3’ UTR regions were defined. Non-coding genes were defined from transcripts labelled as ‘lncRNA’, ‘miRNA’, ‘snoRNA’ or ‘snRNA’.

Overlap between the annotations described above and variants in tractable credible sets was determined using bedtools v2.17.0. We defined seven broad functional classes from these annotation data: coding (protein-coding transcripts); ncRNA (non-coding RNA transcripts); UTR (3’ and 5’ UTR regions of coding transcripts); enhancers (strong and weak enhancer elements); promoters (active and poised promoter elements); insulators; and TFBS (sites pooled across all factors). We further used each of the cell line annotations as a distinct category. Each variant was allowed to overlap multiple annotation categories.

For each broad functional class, we used Fisher’s exact test, as implemented in R v3.0.1 (with alternative = “greater”), to assess whether the set of credible variants showed a higher fold overlap with this annotation than with each of the other classes in turn. The six resulting p-values for each class were then combined using Fisher’s method. With 21 different functional class and trait combinations, a Bonferroni-adjusted significance threshold (p < 2.4×10⁻³) was used.
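As a rough illustration of this testing scheme, the following Python sketch implements a one-sided Fisher’s exact test, Fisher’s method for combining p-values, and the Bonferroni threshold using only the standard library. It is not the R code used in the study. The layout of the 2x2 table and the nominal alpha of 0.05 are assumptions (0.05/21 is consistent with the reported threshold), and the function names are hypothetical.

```python
from math import comb, exp, log


def fisher_exact_greater(a, b, c, d):
    """One-sided p-value P(X >= a) for the 2x2 table [[a, b], [c, d]]
    under the hypergeometric null with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom the
    survival function has a closed form, used here. Assumes every p > 0."""
    k = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total


# Bonferroni threshold for 21 class-by-trait combinations (alpha = 0.05 assumed).
threshold = 0.05 / 21

# Toy example: one class compared against two others (counts are invented).
p_values = [fisher_exact_greater(3, 1, 1, 3), fisher_exact_greater(5, 0, 1, 4)]
combined = fisher_combine(p_values)
significant = combined < threshold
```

Combining the pairwise p-values before applying the threshold means that each class and trait pair contributes a single test to the Bonferroni correction.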

Supporting Information

S1 Fig. Quantile-quantile plot of up to 9,967,162 single nucleotide polymorphisms (SNPs) from the meta-analysis for (A) BMI, (B) WHRadjBMI, (C) FG and (D) FIadjBMI.

The black dots represent observed P values and the grey line represents the expected P values under the null distribution. The red dots represent observed P values after excluding the previously identified signals described in S7 Table.

https://doi.org/10.1371/journal.pgen.1005230.s001

(TIFF)

S2 Fig. Genome-wide association results from the sex-combined (Manhattan) and sex-specific (Miami) meta-analysis for (A) BMI, (B) WHRadjBMI, (C) FG and (D) FIadjBMI.

The association P value (on -log₁₀ scale) for each of up to 9,967,162 SNPs (y-axis) is plotted against the genomic position (NCBI Build 37; x-axis). Association signals that reached genome-wide significance (P < 5×10⁻⁸) are shown in green if novel and pink if previously reported.

https://doi.org/10.1371/journal.pgen.1005230.s002

(PDF)

S3 Fig. Regional plots for novel loci associated with BMI or FG identified through (A) sex-combined and (B) sex-specific meta-analyses.

Directly genotyped or imputed SNPs are plotted with their meta-analysis P values (as -log₁₀ values) as a function of genomic position (NCBI Build 37). In each panel, the lead SNP from the meta-analysis is represented by a purple circle. Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their correlated proxies (according to a blue to red scale from r² = 0 to 1, based on pairwise EUR r² values from the 1000 Genomes June 2011 release). Gene annotations were taken from the UCSC genome browser. SNP annotations are as follows: circles, no annotation; downward triangles, nonsynonymous; squares, coding or 3′ UTR; asterisks, TFBScons (in a conserved region predicted to be a transcription factor binding site); squares with an X, MCS44 placental (in a region highly conserved in placental mammals).

https://doi.org/10.1371/journal.pgen.1005230.s003

(TIFF)

S4 Fig. Forest plots of the associations at novel loci for BMI (A, B) and FG (C, D).

For each study, sex (m, f) and sample size are displayed after the study name. Box size is proportional to the sample size.

https://doi.org/10.1371/journal.pgen.1005230.s004

(PDF)

S5 Fig. Regional plots for novel loci at or near (A) ATP2B1, (B) AKAP6, (C) RMST and (D) EMID2 comparing the SNP coverage between 1000 Genomes imputed and HapMap imputed SNPs.

For each of the novel signals, all the SNPs imputed up to the 1000 Genomes reference panel (left) or only those present in the HapMap panel (right) are plotted with their meta-analysis P values (as -log₁₀ values) as a function of genomic position (NCBI Build 37). In both plots, the lead SNP in the HapMap panel is represented by a purple circle. Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their proxies (according to a blue to red scale from r² = 0 to 1, based on pairwise r² values from the 1000 Genomes June 2011 release EUR). SNP annotations are as follows: circles, no annotation; downward triangles, nonsynonymous; squares, coding or 3′ UTR; asterisks, TFBScons (in a conserved region predicted to be a transcription factor binding site); squares with an X, MCS44 placental (in a region highly conserved in placental mammals).

https://doi.org/10.1371/journal.pgen.1005230.s005

(PDF)

S6 Fig. Comparison of characteristics of previously reported and new lead SNPs at established loci for glycaemic and obesity-related traits.

Minor allele frequency (MAF) (A) and effect size (B) of the previously reported lead SNP on the x-axis and the new lead SNP on the y-axis. Details of the SNPs are presented in S7 Table.

https://doi.org/10.1371/journal.pgen.1005230.s006

(TIFF)

S7 Fig. Regional plots for known signals at (A) BDNF and (B) RSPO3 comparing the SNP coverage between 1000 Genomes imputed and HapMap imputed SNPs.

For each association signal, all the SNPs imputed up to the 1000 Genomes reference panel (left) or only those present in the HapMap panel (right) are plotted with their conditional meta-analysis P values (as -log₁₀ values) as a function of genomic position (NCBI Build 37) after adjustment for the other index SNP at the locus. In each plot, the previously reported lead SNP is highlighted by the purple circle. Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their proxies (according to a blue to red scale from r² = 0 to 1, based on pairwise r² values from the 1000 Genomes June 2011 release EUR).

https://doi.org/10.1371/journal.pgen.1005230.s007

(TIFF)

S8 Fig. Regional plots for multiple distinct association signals at RSPO3 (A, B), G6PC2 (C, D) and GCK (E, F) comparing the SNP coverage between 1000 Genomes imputed SNPs and HapMap imputed SNPs.

For each association signal, all the SNPs imputed up to the 1000 Genomes reference panel (left) or only those present in the HapMap panel (right) are plotted with their conditional meta-analysis P values (as -log₁₀ values) as a function of genomic position (NCBI Build 37) after adjustment for the other index SNP at the locus. In each plot, the lead SNP present in HapMap is represented by a purple circle. Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their proxies (according to a blue to red scale from r² = 0 to 1, based on pairwise r² values from the 1000 Genomes June 2011 release EUR). SNP annotations are as follows: circles, no annotation; downward triangles, nonsynonymous; squares, coding or 3′ UTR; asterisks, TFBScons (in a conserved region predicted to be a transcription factor binding site); squares with an X, MCS44 placental (in a region highly conserved in placental mammals).

https://doi.org/10.1371/journal.pgen.1005230.s008

(PDF)

S9 Fig. Comparison of the posterior probability between previously reported and new lead SNPs at established loci for glycaemic and obesity-related traits.

Posterior probability (PP) of the previously reported lead SNP on the x-axis and the new lead SNP on the y-axis. Details of the SNPs are presented in S7 Table.

https://doi.org/10.1371/journal.pgen.1005230.s009

(TIFF)

S10 Fig. Expression and chromatin status of a novel FG-associated locus, RMST, in human tissue.

(A) Expression data of RMST are extracted from the Human Illumina BodyMap 2.0 and reads per kilobase of exon per million reads (RPKMs) are plotted across 17 human tissues. (B) Annotation of RMST in islet cells. Transcription factor binding ChIP sites (TFBS) and chromatin states in islet cell lines from various resources are presented (see Methods).

https://doi.org/10.1371/journal.pgen.1005230.s010

(TIFF)

S1 Table. Cohort summary information on sample quality control, genotyping, imputation and statistical method.

https://doi.org/10.1371/journal.pgen.1005230.s011

(PDF)

S2 Table. Summary of obesity-related traits in each study.

https://doi.org/10.1371/journal.pgen.1005230.s012

(PDF)

S3 Table. Summary of fasting glycaemic traits in each study.

https://doi.org/10.1371/journal.pgen.1005230.s013

(PDF)

S4 Table. Comparison of the SNP coverage between 1000 Genomes imputation and HapMap imputation in the 1958 British Birth Cohort (a) and the 1966 Northern Finnish Birth Cohort (b).

https://doi.org/10.1371/journal.pgen.1005230.s014

(PDF)

S5 Table. Overlap of study samples between the GIANT, MAGIC and current ENGAGE consortia.

https://doi.org/10.1371/journal.pgen.1005230.s015

(PDF)

S6 Table. Established loci for glycaemic and obesity-related traits achieving genome-wide significance (p < 5×10⁻⁸).

https://doi.org/10.1371/journal.pgen.1005230.s016

(PDF)

S7 Table. Summary of lead SNPs achieving genome-wide significance in established loci for BMI, WHRadjBMI, FG and FIadjBMI.

https://doi.org/10.1371/journal.pgen.1005230.s017

(PDF)

S8 Table. Approximate conditional analysis results for BMI, WHRadjBMI, FG and FIadjBMI.

https://doi.org/10.1371/journal.pgen.1005230.s018

(PDF)

S9 Table. Summary of 99% credible sets at 42 distinct association signals.

https://doi.org/10.1371/journal.pgen.1005230.s019

(PDF)

S10 Table. Variants of 99% credible sets containing fewer than 20 variants driving distinct association signals for BMI, WHRadjBMI, FG and FIadjBMI.

https://doi.org/10.1371/journal.pgen.1005230.s020

(PDF)

S11 Table. Allele frequency distribution of 99% credible sets with fewer than 20 variants.

https://doi.org/10.1371/journal.pgen.1005230.s021

(PDF)

S12 Table. Power to detect association, at genome-wide significance (p < 5×10⁻⁸), with a variant of MAF 1% in the current study.

https://doi.org/10.1371/journal.pgen.1005230.s022

(PDF)

Author Contributions

Conceived and designed the experiments: MH RM MvdB SR IP MIM APM. Performed the experiments: MH RM MvdB IS APS AMa LM SW TF IP APM. Analyzed the data: MH RM MvdB IS APS AMa LM GT SH JJHo CL JSR TWW SMW NT TE MBe CPN CW JF KJG GW JR IMH AI EMvL LCK EM JJHD ASH MBl MDT APM NRR. Contributed reagents/materials/analysis tools: IS JJH TE VSt AH PKEM GW YM CJG AJB TL JSV VL MP CHer HG MMN MR EH EMvL LCK AJMdC JD CHen JE HS JK MDT NJS LL VSa CML PES AMe CMvD JGE AP CG AJ LG OTR CP BWJHP EdG JHS DIB NLP EI UT KS SR MIM APM. Wrote the paper: MH RM MvdB SR IP MIM APM. Central data management and storage: NRR. Critical review and approval of the manuscript: MH RM MvdB IS APS AMa LM GT SH JJH CL JSR TWW SMW NP TE MBe CPN CW SW TF JF KJG VSt AH PKEM GW YM NRR CJG AJB TL JSV JR VL MP IMH CHer HG MMN MR EH AI EMvL LCK EM JJHD AJMdC JD ASH MBl CHen JE HS JK MDT NJS LL VSa CML PES AMe CMvD JGE AP CG AJ LG OTR CP BWJHP EdG JHS DIB NLP EI UT KS SR IP MIM APM.

References

  1. Rose KM, Newman B, Mayer-Davis EJ, Selby JV (1998) Genetic and behavioural determinants of waist-hip ratio and waist circumference in women twins. Obes Res 6: 383–392. pmid:9845227
  2. Poulsen P, Kyvik KO, Vaag A, Beck-Nielsen H (1999) Heritability of type II (non-insulin-dependent) diabetes mellitus and abnormal glucose tolerance – a population-based twin study. Diabetologia 42: 139–145. pmid:10064092
  3. Poulsen P, Levin K, Petersen I, Christensen K, Beck-Nielsen H, et al. (2005) Heritability of insulin secretion, peripheral and hepatic insulin action, and intracellular glucose partitioning in young and old Danish twins. Diabetes 54: 275–283. pmid:15616039
  4. Silventoinen K, Rokholm B, Kaprio J, Sørensen TI (2010) The genetic and environmental influences on childhood obesity: a systematic review of twin and adoption studies. Int J Obes 34: 29–40.
  5. Van Dongen J, Willemsen G, Chen WW, de Geus EJ, Boomsma DI (2013) Heritability of metabolic syndrome traits in a large population-based sample. J Lipid Res 54: 2914–2923. pmid:23918046
  6. American Diabetes Association (2003) The expert committee on the diagnosis and classification of diabetes mellitus: follow-up report on the diagnosis of diabetes mellitus. Diabetes Care 26: 3160–3167. pmid:14578255
  7. Weyer C, Bogardus C, Mott DM, Pratley RE (1999) The natural history of insulin secretory dysfunction and insulin resistance in the pathogenesis of type 2 diabetes mellitus. J Clin Invest 104: 787–794. pmid:10491414
  8. DeFronzo RA, Ferrannini E (1991) Insulin resistance: a multifaceted syndrome responsible for NIDDM, obesity, hypertension, dyslipidemia, and atherosclerotic cardiovascular disease. Diabetes Care 14: 173–194. pmid:2044434
  9. Lewis CE, McTigue KM, Burke LE, Poirier P, Eckel RH, et al. (2009) Mortality, health outcomes, and body mass index in the overweight range: a science advisory from the American Heart Association. Circulation 119: 3263–3271. pmid:19506107
  10. Pischon T, Boeing H, Hoffmann K, Bergmann M, Schulze MB, et al. (2008) General and abdominal adiposity and risk of death in Europe. N Engl J Med 359: 2105–2120. pmid:19005195
  11. Chambers JC, Elliott P, Zabaneh D, Zhang W, Li Y, et al. (2008) Common genetic variation near MC4R is associated with waist circumference and insulin resistance. Nat Genet 40: 716–718. pmid:18454146
  12. Prokopenko I, Langenberg C, Florez JC, Saxena R, Soranzo N, et al. (2009) Variants in MTNR1B influence fasting glucose levels. Nat Genet 41: 77–81. pmid:19060907
  13. Willer CJ, Speliotes EK, Loos RJ, Li S, Lindgren CM, et al. (2009) Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 41: 25–34. pmid:19079261
  14. Lindgren CM, Heid IM, Randall JC, Lamina C, Steinthorsdottir V, et al. (2009) Genome-wide association scan meta-analysis identifies three loci influencing adiposity and fat distribution. PLoS Genet 5: e1000508. pmid:19557161
  15. Cho YS, Go MJ, Kim YJ, Heo JY, Oh JH, et al. (2009) A large-scale genome-wide association study of Asian populations uncovers genetic factors influencing eight quantitative traits. Nat Genet 41: 527–534. pmid:19396169
  16. Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, et al. (2010) New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet 42: 105–116. pmid:20081858
  17. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, et al. (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42: 937–948. pmid:20935630
  18. Heid IM, Jackson AU, Randall JC, Winkler TW, Qi L, et al. (2010) Meta-analysis identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism in the genetic basis of fat distribution. Nat Genet 42: 949–960. pmid:20935629
  19. Scott RA, Lagou V, Welch RP, Wheeler E, Montasser ME, et al. (2012) Large-scale association analyses identify new loci influencing glycaemic traits and provide insight into the underlying biological pathways. Nat Genet 44: 991–1005. pmid:22885924
  20. Manning AK, Hivert MF, Scott RA, Grimsby JL, Bouatia-Naji N, et al. (2012) A genome-wide approach accounting for body-mass index identifies genetic variants influencing fasting glycaemic traits and insulin resistance. Nat Genet 44: 659–669. pmid:22581228
  21. Okada Y, Kubo M, Ohmiya H, Takahashi A, Kumasaka N, et al. (2012) Common variants at CDKAL1 and KLF9 are associated with body mass index in east Asian populations. Nat Genet 44: 302–306. pmid:22344221
  22. Wen W, Cho YS, Zheng W, Dorajoo R, Kato N, et al. (2012) Meta-analysis identifies common variants associated with body-mass index in east Asians. Nat Genet 44: 307–311. pmid:22344219
  23. Ng MC, Hester JM, Wing MR, Li J, Xu J, et al. (2012) Genome-wide association of BMI in African Americans. Obesity 20: 622–627. pmid:21701570
  24. Berndt SI, Gustafsson S, Mägi R, Ganna A, Wheeler E, et al. (2013) Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nat Genet 45: 501–512. pmid:23563607
  25. Monda KL, Chen GK, Taylor KC, Palmer C, Edwards TL, et al. (2013) A meta-analysis identifies new loci associated with body mass index in individuals of African ancestry. Nat Genet 45: 690–696. pmid:23583978
  26. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, et al. (2014) Genetic studies of body mass index yield new insights for obesity biology. Nature (in press).
  27. Shungin D, Winkler TW, Croteau-Chonka DC, Ferreira T, Locke AE, et al. (2014) New genetic loci link adipose and insulin biology to body fat distribution. Nature (in press).
  28. Marchini J, Howie B (2010) Genotype imputation for genome-wide association studies. Nat Rev Genet 11: 499–511. pmid:20517342
  29. The International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861. pmid:17943122
  30. The International HapMap Consortium (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467: 52–58. pmid:20811451
  31. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747–753. pmid:19812666
  32. Dickson SP, Wang K, Krantz I, Hakonarson H, Goldstein DB (2010) Rare variants create synthetic genome-wide associations. PLoS Biol 26: e1000294.
  33. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, et al. (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42: 565–569. pmid:20562875
  34. Pritchard JK (2001) Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet 69: 124–137. pmid:11404818
  35. Barrett JC, Cardon LR (2006) Evaluating coverage of genome-wide association studies. Nat Genet 38: 659–662. pmid:16715099
  36. Anderson CA, Pettersson FH, Barrett JC, Zhuang JJ, Ragoussis J, et al. (2008) Evaluating the effects of imputation on the power, coverage and cost-efficiency of genome-wide SNP platforms. Am J Hum Genet 83: 112–119.
  37. Jostins L, Morley KI, Barrett JC (2011) Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets. Eur J Hum Genet 19: 662–666. pmid:21364697
  38. The 1000 Genomes Project Consortium (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56–65. pmid:23128226
  39. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR (2012) Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 44: 955–959. pmid:22820512
  40. Porcu E, Sanna S, Fuchsberger C, Fritsche LG (2013) Genotype imputation in genome-wide association studies. Curr Protoc Hum Genet: Chapter 1, Unit 1.25.
  41. Duan Q, Liu EY, Croteau-Chonka DC, Mohlke KL, Li Y (2013) A comprehensive SNP and indel imputability database. Bioinformatics 29: 528–531.
  42. Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5: e1000529. pmid:19543373
  43. Winkler TW, Day FR, Croteau-Chonka DC, Wood AR, Locke AE, et al. (2014) Quality control and conduct of genome-wide association meta-analyses. Nat Protoc 9: 1192–1212. pmid:24762786
  44. Yang J, Ferreira T, Morris AP, Medland SE; Genetic Investigation of ANthropometric Traits (GIANT) Consortium, et al. (2012) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44: 369–375. pmid:22426310
  45. Maller JB, McVean G, Byrnes J, Vukcevic D, Palin K, et al. (2012) Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat Genet 44: 1294–1301. pmid:23104008
  46. Morris AP, Voight BF, Teslovich TM, Ferreira T, Segrè AV, et al. (2012) Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet 44: 981–990. pmid:22885922
  47. Mahajan A, Sim X, Ng HJ, Manning A, Rivas MA, et al. (2014) Identification and functional characterization of G6PC2 coding variants influencing glycaemic traits define an effector transcript at the G6PC2-ABCB11 locus. PLoS Genet 11: e1004876.
  48. Pasquali L, Gaulton KJ, Rodríguez-Seguí SA, Mularoni L, Miguel-Escalada I, et al. (2014) Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants. Nat Genet 46: 136–143. pmid:24413736
  49. Huyghe JR, Jackson AU, Fogarty MP, Buchkovich ML, Stancakova A, et al. (2013) Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and secretion. Nat Genet 45: 197–201. pmid:23263489
  50. Peloso GM, Auer PL, Bis JC, Voorman A, Morrison AC, et al. (2014) Association of low-frequency and rare coding-sequence variants with blood lipids and coronary artery disease in 56,000 whites and blacks. Am J Hum Genet 94: 223–232. pmid:24507774
  51. Holmen OL, Zhang H, Zhou W, Schmidt E, Hovelson DH, et al. (2014) No large-effect low-frequency coding variation found for myocardial infarction. Hum Mol Genet 23: 4721–4728. pmid:24728188
  52. Moutsianas L, Morris AP (2014) Methodology for the analysis of rare genetic variation in genome-wide association and re-sequencing studies of complex human traits. Brief Funct Genomics 13: 362–370. pmid:24916163
  53. Chen F, Klein AP, Klein BE, Lee KE, Truitt B, et al. (2014) Exome array analysis identifies CAV1/CAV2 as a susceptibility locus for intraocular pressure. Invest Ophthalmol Vis Sci 56: 544–551.
  54. Wessel J, Chu AY, Willems SM, Wang S, Yaghootkar H, et al. (2015) Low-frequency and rare exome chip variants associate with fasting glucose and type 2 diabetes susceptibility. Nat Commun 6: 5897.
  55. Chen JA, Wang Q, Davis-Turak J, Li Y, Karydas AM, et al. (2015) A multiancestral genome-wide exome array study of Alzheimer disease, frontotemporal dementia, and progressive supranuclear palsy. JAMA Neurol (in press).
  56. Devlin B, Roeder K (1999) Genomic control for association studies. Biometrics 55: 997–1004. pmid:11315092
  57. Magi R, Morris AP (2010) GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics 11: 288. pmid:20509871
  58. Ioannidis JP, Patsopoulos NA, Evangelou E (2007) Heterogeneity in meta-analyses of genome-wide association investigations. PLoS ONE 2: e841. pmid:17786212
  59. Wakefield JA (2007) Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am J Hum Genet 81: 208–227. pmid:17668372
  60. Ward LD, Kellis M (2012) HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucl Acids Res 40: D930–934. pmid:22064851
  61. Kumar P, Henikoff S, Ng P (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4: 1073–1081. pmid:19561590
  62. Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) Predicting the functional effect of amino acid substitutions and indels. PLoS ONE 7: e46688. pmid:23056405
  63. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7: 248–249. pmid:20354512
  64. The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489: 57–74. pmid:22955616
  65. Pasquali L, Gaulton KJ, Rodríguez-Seguí SA, Mularoni L, Miguel-Escalada I, et al. (2014) Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants. Nat Genet 46: 136–143. pmid:24413736
  66. Mikkelsen TS, Xu Z, Zhang X, Wang L, Gimble JM, et al. (2010) Comparative epigenomic analysis of murine and human adipogenesis. Cell 143: 156–169. pmid:20887899
  67. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25: 1754–1760. pmid:19451168
  68. Ernst J, Kellis M (2010) Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol 28: 817–825. pmid:20657582
  69. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, et al. (2012) GENCODE: The reference human genome annotation for the ENCODE project. Genome Res 22: 1760–1774. pmid:22955987