Discovery and Fine-Mapping of Glycaemic and Obesity-Related Trait Loci Using High-Density Imputation

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.

i.e. sample confidentiality. However, for 58BC, NFBC1966, PIVUS, Twingene and ULSAM, access to genotype and phenotype data can be applied for through the relevant data access committee. Contact details are listed below. For 58BC: http://www2.le.ac.uk/projects/birthcohort/1958bc/available-resources; for NFBC1966: http://www.oulu.fi/nfbc/node/24677; for PIVUS: http://www.medsci.uu.se/pivus/; for Twingene: http://ki.se/en/research/the-swedish-twinregistry-1; for ULSAM: http://www2.pubcare.uu.se/ULSAM/res/proposal.htm. EGCUT studies were financed by the University of Tartu (grant "Center of Translational Genomics"), by the Estonian Government (grant #SF0180142s08), by the EFSD grant "Genomic, metabolic and demographic characteristics of type 2 diabetes in the Estonian population" and by the European Commission through the European Regional Development Fund.

Author Summary
Human genetic studies have demonstrated that quantitative human anthropometric and metabolic traits, including body mass index, waist-hip ratio, and plasma concentrations of glucose and insulin, are highly heritable, and are established risk factors for type 2 diabetes and cardiovascular diseases. Although many regions of the genome have been associated with these traits, the specific genes responsible have not yet been identified. By making use of advanced statistical "imputation" techniques applied to more than 87,000 individuals of European ancestry, and publicly available "reference panels" of more than 37 million genetic variants, we have been able to identify novel regions of the genome associated with these glycaemic and obesity-related traits and localise genes within these regions that are most likely to be causal. This improved understanding of the biological mechanisms underlying glycaemic and obesity-related traits is extremely important because it may advance drug development for downstream disease endpoints, ultimately leading to public health benefits.

Introduction
Quantitative human glycaemic and obesity-related traits, including fasting plasma glucose and insulin (FG and FI), body mass index (BMI), and waist-hip ratio (WHR) are highly heritable [1][2][3][4][5], and are well-established risk factors for type 2 diabetes (T2D) and cardiovascular disease [6][7][8][9][10]. Large-scale genome-wide association studies (GWAS) have proved to be extremely successful in the identification of loci harbouring genetic variants contributing to these traits in multiple ethnic groups [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]. This process has been facilitated by technical advances in the development of imputation methods [28] that allow evaluation of association with genetic variants not directly assayed on genotyping arrays, but present instead in more dense phased reference panels, such as those made available through the International HapMap Consortium [29,30]. However, the detected loci are typically characterised by common variant association signals, defined by lead SNPs with minor allele frequency (MAF) of at least 5%, which extend over large genomic intervals because of linkage disequilibrium (LD). They also often map to non-coding sequence, making direct biological interpretation of their effect more difficult than for non-synonymous variants. The lead SNPs at GWAS loci are overwhelmingly of modest effect, and together account for only a small proportion (generally less than 5%) of the overall trait variance [17][18][19][26,27]. As a consequence, there has been limited progress in identifying the genes through which GWAS association signals are mediated, and characterisation of the downstream molecular mechanisms influencing glycaemic and obesity-related traits remains a considerable challenge. There has been much recent debate as to the role that low-frequency and rare variation (MAF<5%) might play in explaining the "missing heritability" of complex human traits [31][32][33].
It has been hypothesized that some of these variants will have larger effects on traits than common SNPs because they are likely to have arisen as a result of relatively recent mutation events, and thus will have been less subject to purifying selection [34]. Unfortunately, such variation is not well captured by traditional GWAS genotyping arrays, by design, even when supplemented by HapMap imputation [35][36][37]. However, more recent, higher density reference panels released by the 1000 Genomes (1000G) Project Consortium [38], constructed on the basis of low-pass whole-genome re-sequencing, provide haplotypes at more than 37 million variants for 1,094 individuals from multiple ethnic groups, and facilitate imputation of genetic variation with MAF as low as 0.5% across diverse populations [39][40][41].
Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we sought to assess the advantages and limitations of high-density imputation for the discovery and fine-mapping of loci for glycaemic and obesity-related traits. We considered 22 European ancestry GWAS (S1 Table), each imputed up to the 1000G "all ancestries" reference panel (Phase 1 interim release, June 2011), comprising, after quality control, up to: 87,048 individuals for BMI; 54,572 individuals for WHR; 46,694 individuals for FG; and 24,245 individuals for FI (S2 and S3 Tables). To account for the impact of overall obesity on central adiposity [18,27] and insulin sensitivity [19], we considered WHR and FI after adjustment for BMI (denoted WHRadjBMI and FIadjBMI, respectively). With these high-density imputed data, we aimed to: (i) discover novel signals of association for glycaemic and obesity-related traits, including within established GWAS loci; (ii) evaluate the contribution of low-frequency variation to common SNP GWAS signals; (iii) consider the contribution of genetic variants at GWAS loci in explaining trait variance; and (iv) refine the localisation of potential causal variants underlying GWAS association signals and assess the mechanisms through which they impact glycaemic and obesity-related traits.

Imputation quality
Within each study, we performed stringent quality control of the genotype scaffold before imputation, minimally including sample and variant call rate and deviation from Hardy-Weinberg equilibrium (S1 Table). Each scaffold was imputed up to the 1000G multi-ethnic reference panel (Phase 1 interim release, June 2011), which includes 762 European ancestry haplotypes, using IMPUTEv2 [42], minimac [39] or specialist in-house software (S1 Table). Making use of the multi-ethnic reference panel, including haplotypes from all ancestry groups, has been demonstrated to reduce error rates and to improve imputation quality, particularly of lower frequency variants [28]. Imputed variants were retained for downstream evaluation and association testing if they passed traditional GWAS quality control thresholds (IMPUTEv2 info score ≥0.4; minimac r² ≥0.3) [43].
We considered the quality of imputation (as measured by the IMPUTEv2 info score) of variants from the 1000G reference panel in two contributing studies (S4 Table) passing quality control filters. The quality of imputation in NFBC1966 was comparable to that observed in 58BC-WTCCC: 99.7% of common SNPs (5.9 million) and 94.4% of low-frequency variants (3.7 million). However, amongst rarer variants, the quality of imputation was noticeably poorer in NFBC1966 (62.8%) than in 58BC-WTCCC, presumably reflecting less representation of low-frequency haplotypes from the isolated Northern Finnish population in the 1000G reference panel.
We have demonstrated that high-density imputation provides >90% coverage of low-frequency variants present in the 1000G reference panel in two diverse European ancestry populations. Our study thus enables association testing with more than three million high-quality variants with 0.5%≤MAF<5% that would not have been directly interrogated in previous GWAS of glycaemic and obesity-related traits that have been supplemented by HapMap imputation alone. With the sample sizes available in this study, we have estimated that for any of these variants explaining at least 0.2% of the overall trait variance (i.e. effect size of 0.32 SD units for 1% MAF, and effect size of 0.15 SD units for 5% MAF), we have >99.9% power to detect their association with BMI, WHR, and FG, and >93.9% power to detect their association with FI.
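The effect sizes and power quoted above can be approximated with a short calculation. The sketch below is illustrative only and is not the power calculation used in the study: it assumes an additive model under Hardy-Weinberg equilibrium, a trait standardised to unit variance, a 1-df association test with non-centrality parameter N·q/(1−q) for a variant explaining a fraction q of the trait variance, and genome-wide significance at p<5×10⁻⁸. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def effect_size_sd(var_explained, maf):
    """Per-allele effect (SD units) of a variant explaining `var_explained`
    of the trait variance, assuming additivity and Hardy-Weinberg equilibrium."""
    return sqrt(var_explained / (2.0 * maf * (1.0 - maf)))


def power_1df(n, var_explained, alpha=5e-8):
    """Approximate power of a two-sided 1-df test in `n` individuals.

    Assumption: the test statistic is the square of a normal variable with
    mean sqrt(ncp) and unit variance, where ncp = n * q / (1 - q).
    """
    std_normal = NormalDist()
    z_crit = -std_normal.inv_cdf(alpha / 2.0)        # two-sided threshold
    ncp = n * var_explained / (1.0 - var_explained)  # non-centrality parameter
    shift = sqrt(ncp)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    print(round(effect_size_sd(0.002, 0.01), 2))  # 0.32
    print(round(effect_size_sd(0.002, 0.05), 2))  # 0.15
    for trait, n in [("BMI", 87048), ("WHR", 54572), ("FG", 46694), ("FI", 24245)]:
        print(trait, power_1df(n, 0.002))
```

Under these assumptions the sketch reproduces the quoted effect sizes of 0.32 and 0.15 SD units, and gives power of roughly 94% for FI in 24,245 individuals, close to the figure reported above.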

Discovery of novel loci and new lead SNPs
Within each study, we tested for association of each directly typed and well imputed variant with BMI, WHRadjBMI, FG and FIadjBMI, separately in males and females, in a linear regression modelling framework (Methods, S2 and S3 Tables). Association summary statistics were then combined across studies in sex-specific and sex-combined fixed-effects meta-analyses for each trait. Variants passing quality control in fewer than 50% of the contributing studies for each trait were excluded from the meta-analysis. Association signals at genome-wide significance (p<5×10⁻⁸) were considered novel if their lead SNPs were independent of (r²<0.05), and mapped more than 2Mb from, those previously reported for the traits. By convention, loci were labelled with the name(s) of the gene(s) located closest to the lead SNP, unless more compelling biological candidates mapped nearby (Table 1, S1, S2, S3 and S4 Figs).
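The fixed-effects combination of study-level summary statistics can be illustrated with inverse-variance weighting, the standard fixed-effects estimator. This is a minimal sketch under that assumption, not the meta-analysis software used in the study; the function name and the per-study effect estimates and standard errors in the example are invented for illustration.

```python
from math import erfc, sqrt


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    betas, ses: per-study effect estimates and standard errors for one variant.
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = sqrt(1.0 / total_weight)
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical summary statistics from three studies (SD units).
    beta, se, z, p = fixed_effects_meta([0.05, 0.03, 0.04], [0.01, 0.02, 0.01])
    print(beta, se, z, p, p < 5e-8)
```

Studies with smaller standard errors (typically larger samples or better imputed variants) receive proportionally more weight in the pooled estimate.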
We identified two novel loci achieving genome-wide significance for BMI in the sex-combined meta-analysis: ATP2B1 (rs1966714, MAF = 0.46, p = 1.9×10⁻⁸); and AKAP6 (rs12885467, MAF = 0.49, p = 4.5×10⁻⁸). For FG, we detected one novel locus in the sex-combined meta-analysis at RMST (rs17331697, MAF = 0.10, p = 1.3×10⁻¹¹) and a female-specific association at EMID2 (rs6947345, MAF = 0.017, pMALE = 0.50, pFEMALE = 3.8×10⁻⁸). We did not identify any novel loci at genome-wide significance, in either sex-combined or sex-specific analyses, for WHRadjBMI or FIadjBMI. We observed no evidence of heterogeneity in sex-specific allelic effects across studies at the lead SNPs at the novel loci (Table 1). With the exception of the sex-specific association signal at EMID2, the lead SNPs at all other novel loci were common.
At AKAP6 and RMST, the common lead SNPs were present in HapMap (S5 Fig) but did not achieve genome-wide significance in large-scale European ancestry HapMap imputed meta-analyses conducted by the GIANT Consortium [17] (for BMI in up to 123,865 individuals) and the MAGIC Investigators [16] (for FG in up to 46,186 individuals), despite substantial overlap with cohorts contributing to our study. We have estimated that, amongst individuals contributing to our 1000G imputed meta-analyses for BMI/FG, a maximum of 59%/37% also participated in the previous GIANT and MAGIC studies (S5 Table). At RMST, our lead FG SNP approaches genome-wide significance in the MAGIC meta-analysis (p = 6.5×10⁻⁶), and this likely reflects stochastic variation. However, at AKAP6, our lead BMI SNP demonstrates only nominal evidence of association (p = 0.012) in the GIANT meta-analysis, suggesting that 1000G reference panels have enabled higher quality imputation at this locus. To investigate this assertion further, we compared the quality of imputation of the lead BMI SNP using HapMap and 1000G reference panels in two contributing studies of diverse European ancestry. In 58BC-WTCCC/NFBC1966, there was a marginal improvement in the IMPUTEv2 info score from 0.972/0.939 using reference haplotypes from CEU HapMap to 0.996/0.971 using those from 1000G. At ATP2B1, the common lead SNP was not present in HapMap (S5 Fig). The lead SNP for BMI from the GIANT HapMap imputed meta-analysis [17] was rs2579106, achieving nominal evidence for association (p = 6.4×10⁻⁵) in a reported sample size of 123,864 individuals. This SNP reached near genome-wide significance in our 1000G imputed meta-analysis, despite the smaller sample size (p = 3.3×10⁻⁷, in 86,955 individuals). Furthermore, the HapMap and 1000G lead SNPs are in only modest LD with each other (EUR r² = 0.22).
Taken together, these data suggest that the discovery of this novel locus has been due to improved coverage through 1000G imputation, despite the lead SNP being common.
We observed genome-wide significant evidence of association at 34 established loci for glycaemic and obesity-related traits, including GCKR with the same lead SNP for both FG and FI (S6 Table). At 29 of these loci, our meta-analysis identified lead SNPs that differed from those in the reports in which the loci were first discovered, of which 23 were not present in HapMap (S7 Table). At 18 of these 29 loci, the new lead SNP was in strong LD (r² ≥0.8) with that previously reported, and consequently both variants had similar MAF and allelic effect size (S6 […] for BMI (r² = 0.10) and RSPO3 for WHRadjBMI (r² = 0.04). At both loci, multiple distinct signals of association have been recently reported by the GIANT Consortium in the largest meta-analyses of BMI and WHRadjBMI in European ancestry individuals genotyped with GWAS arrays, supplemented by imputation up to reference panels from the International HapMap Consortium [29,30], and the Metabochip, in up to 339,224 and 224,459 individuals, respectively [26,27]. At BDNF, our new lead SNP (rs4517468) was in moderate LD (r² = 0.31) with the index variant (rs10835210) for the GIANT secondary signal of association for BMI at this locus, suggesting that they represent the same underlying effect on obesity.
At established loci, amongst the 29 lead SNPs identified in our 1000G imputed meta-analysis that were different from the previous reports in which they were discovered, five are present on the Metabochip: NRXN3 (BMI, rs7141420), SH2B1 (BMI, rs2008514), MC4R (BMI, rs663129), LY86 (WHRadjBMI, rs1294437), and GCKR (FG/FIadjBMI, rs1260326). These variants were thus directly interrogated in the largest European ancestry meta-analyses, to date, of glycaemic and obesity-related traits from the GIANT Consortium [26,27] and MAGIC Investigators [19] that made use of this array. At all five of these loci, our new lead SNP is either the same or is in strong LD (EUR r² >0.75) with that reported in the trait-equivalent Metabochip effort. Four of these loci (all except NRXN3) were densely typed as "fine-mapping" intervals on the array, providing evidence that 1000G imputation has been successful at predicting genotypes at untyped variants in these regions, even though the GWAS scaffolds used in our investigation were comparatively sparse.
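The LD comparisons in this section rest on the pairwise r² statistic between lead SNPs. As a minimal sketch, assuming phased haplotypes coded 0/1 at two biallelic variants (for example from a reference panel), r² can be computed from haplotype frequencies as below. The function name and the toy haplotypes are hypothetical, and the study's own LD estimates come from the 1000G EUR panel rather than from this code.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic variants.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype, in the same haplotype order for both variants.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                   # allele frequency at variant A
    p_b = sum(hap_b) / n                                   # allele frequency at variant B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # LD coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined for a monomorphic variant")
    return d * d / denom


if __name__ == "__main__":
    # Toy example with eight haplotypes.
    variant_1 = [1, 1, 0, 0, 1, 0, 0, 0]
    variant_2 = [1, 1, 0, 0, 0, 0, 1, 0]
    r2 = ld_r2(variant_1, variant_2)
    print(r2, "strong LD" if r2 >= 0.8 else "not in strong LD")
```

In the toy example r² is 49/225 (about 0.22), which would fall well short of the r² ≥0.8 threshold used above for strong LD.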

Multiple distinct association signals
We investigated the evidence for multiple distinct association signals in the glycaemic and obesity-related trait loci achieving genome-wide significance in our study (four novel and 34 established) (Table 1 and S6 Table). We undertook approximate conditional analyses, implemented in GCTA [44], to select index SNPs for distinct association signals achieving "locus-wide" significance (pCOND <10⁻⁵) to reflect the number of uncorrelated variants in a 2Mb window flanking the lead SNP (Methods). We made use of summary statistics from the meta-analysis and genotypes from 58BC-WTCCC and NFBC1966 to approximate the LD between genetic variants (directly typed and well imputed) and hence the correlation in parameter estimates in the joint association model. Reassuringly, the index SNPs and association summary statistics (effect sizes and p-values) from the joint model were highly concordant for both reference studies (S8 Table). Finally, we confirmed these GCTA association signals through exact reciprocal conditional analyses by adjustment for genotypes at each index SNP as a covariate in the linear regression model (Methods, Fig 1, Table 2).
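The idea behind approximating a joint association model from summary statistics and reference LD can be illustrated for the simplest case of two variants. The sketch below is a simplification and is not GCTA's implementation: it assumes genotypes and phenotype are standardised to unit variance, that marginal effects are expressed per SD of genotype, that each variant explains little trait variance (so the residual variance is close to 1), and that the genotype correlation r is taken from a reference sample. The function name and example values are hypothetical.

```python
from math import erfc, sqrt


def joint_two_variant_model(b1, b2, r, n):
    """Mutually adjusted effects of two variants from marginal estimates.

    b1, b2: marginal effects on the standardised scale.
    r:      correlation between the two genotypes (from a reference panel).
    n:      sample size of the association analysis.
    Solves the normal equations [[1, r], [r, 1]] @ b_joint = [b1, b2].
    Returns ((joint b1, joint b2), approximate SE, (p1, p2)).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    denom = 1.0 - r * r
    joint = ((b1 - r * b2) / denom, (b2 - r * b1) / denom)
    se = sqrt(1.0 / (n * denom))  # assumes residual variance of about 1
    p_values = tuple(erfc(abs(b / se) / sqrt(2.0)) for b in joint)
    return joint, se, p_values


if __name__ == "__main__":
    # Variant 2 is only a proxy for variant 1: its marginal effect equals r * b1.
    print(joint_two_variant_model(0.05, 0.025, 0.5, 50000))
    # Two partially correlated variants that both carry signal.
    print(joint_two_variant_model(0.05, 0.04, 0.5, 50000))
```

In the first example the adjusted effect of the second variant collapses to zero, which is the behaviour expected when an apparent secondary signal merely tags the primary one. In the second, both adjusted effects remain non-zero, consistent with distinct signals.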
We identified two distinct signals of association for WHRadjBMI mapping to the RSPO3 locus, indexed by rs72959041 (MAF = 0.079, pCOND = 2.5×10⁻¹⁰) and rs4509142 (MAF = 0.49, pCOND = 5.8×10⁻⁶), corresponding to our new lead SNP and that previously reported [18], respectively. More recently, both signals have also been reported by large-scale meta-analyses undertaken by the GIANT Consortium [27]. Our new lead SNP (rs72959041) was reported as the index variant for their secondary association signal at this locus, whilst the index variant for our secondary signal of association (rs4509142) was in strong LD with their lead SNP (rs1936805, r² = 0.67). The GIANT Consortium also identified a third distinct signal of association at this locus, stronger in females than in males, which was not detected in our conditional analyses, and presumably reflects reduced power due to our smaller sample size. We also identified two distinct signals of association for FG at each of GCK (rs878521, MAF = 0.21, pCOND = 1.3×10⁻¹⁸; rs10259649, MAF = 0.27, pCOND = 4.6×10⁻¹⁰) and G6PC2 (rs560887, MAF = 0.31, pCOND = 2.2×10⁻⁶⁶; rs138726309, MAF = 0.015, pCOND = 5.7×10⁻²³). None of the index variants for these distinct association signals was present in HapMap (S8 Fig), and only rs10259649 in GCK was well represented by a tag in that reference panel (rs2908292, r² = 1.00).

Trait variance explained by novel loci and new lead SNPs
We evaluated the additional heritability of glycaemic and obesity-related traits explained by lead SNPs at novel and established loci after 1000G imputation in 5,276 individuals from NFBC1966 (Methods). For each trait, we calculated the phenotypic variance accounted for by: (i) previously reported lead SNPs at established loci; and (ii) new lead SNPs and index variants for distinct association signals at novel and established loci from the present study. The greatest increment in variance explained was observed for FG, where the novel loci and new lead SNPs after 1000G imputation together account for an increase from 1.9% to 2.3%. We also observed noticeable increments in variance explained after 1000G imputation for WHRadjBMI (from 1.1% to 1.3%) and BMI (from 3.2% to 3.5%). However, for FIadjBMI, only one new lead SNP at an established locus was identified after 1000G imputation, providing a negligible improvement in variance explained (from 0.46% to 0.47%).

Fine-mapping of novel and established GWAS loci
We sought to take advantage of the improved coverage of common and low-frequency variation offered by 1000G imputation to localise potential causal variants (MAF ≥0.5%) for the 42 distinct association signals achieving locus-wide significance in our conditional meta-analyses (two distinct signals of association each at RSPO3, GCK, and G6PC2, one signal of association for both FG and FIadjBMI at the GCKR locus, and one signal of association at each of the other 34 novel and established loci). For each distinct signal, we constructed 99% credible sets of variants [45] that together account for 99% probability of driving the association on the basis of the (conditional) meta-analysis (Methods, S9 Table).
Fig 1. Regional plots of multiple distinct signals at WHRadjBMI locus RSPO3 (A), FG loci G6PC2 (B) and GCK (C). Regional plots for each locus are displayed from: the unconditional meta-analysis (left); the exact conditional meta-analysis for the primary signal after adjustment for the index variant for the secondary signal (middle); and the exact conditional meta-analysis for the secondary signal after adjustment for the index variant for the primary signal (right). The sample sizes vary due to the availability of the well imputed index SNPs of the primary and secondary signals. Directly genotyped or imputed SNPs are plotted with their association P values (on a -log₁₀ scale) as a function of genomic position (NCBI Build 37). Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their correlated proxies (according to a blue to red scale from r² = 0 to 1, based on pairwise EUR r² values from the 1000 Genomes June 2011 release). SNP annotations are as follows: circles, no annotation; downward triangles, nonsynonymous; squares, coding or 3′ UTR; asterisks, TFBScons (in a conserved region predicted to be a transcription factor binding site); squares with an X, MCS44 placental (in a region highly conserved in placental mammals).
doi:10.1371/journal.pgen.1005230.g001
Table 2. Loci with multiple distinct signals of association with glycaemic and obesity-related traits achieving "locus-wide" significance in conditional analysis (pCOND <10⁻⁵).
At the 29 established loci where we identified a new lead SNP after 1000G imputation, the posterior probability of driving the association signal was consistently higher than that for the variant previously reported (S9 Fig). The greatest increases in posterior probability were observed at: GCKR (FG/FIadjBMI, increase from 2.6%/1.8% to 93.5%/89.6%); RSPO3 (WHRadjBMI, increase from 0.4% to 78.6%); PROX1 (FG, increase from 13.2% to 76.9%); and NRXN3 (BMI, increase from 2.5% to 62.2%).
Credible sets are well calibrated for common and low-frequency variants provided that imputation and meta-analysis provide complete coverage of variation with MAF ≥0.5% at each locus. Smaller credible sets, in terms of the number of variants they contain, thus correspond to fine-mapping at higher resolution. We considered 99% credible sets containing fewer than 20 variants to be "tractable", and amenable to follow-up through additional analyses of functional and regulatory annotation (Table 3, S10 Table). The most precise localisation was observed for FG loci including: MTNR1B (rs10830963 accounts for more than 99.9% of the probability of driving the association); both distinct signals at G6PC2 (two variants each, mapping to <15kb interval); and one signal at GCK (indexed by rs878521, mapping to <25kb interval). Of the 127 variants reported in these tractable credible sets, 74 (58.3%) were not present in HapMap, and accounted for 42.4% of the probability of driving the association signals. None of the HapMap variants in the tractable credible sets was of low frequency, compared to 20.8% of those present only in 1000G (S11 Table).
The tractable credible sets included coding variants at just three loci implicated in FG: GCKR, SLC30A8, and the low-frequency association signal at G6PC2. The lead SNP mapping to GCKR (rs1260326) was the common coding variant L446P, which accounts for 93.5% of the probability of driving the FG association signal, and was present in HapMap. At the SLC30A8 locus, the probability of driving the association for FG was shared between 7 SNPs, in strong LD with each other, and including the coding variant R325W. This variant was present in HapMap, and was sufficient to explain the association signal of the lead non-coding SNP for FG in conditional analysis (rs11558471, p = 3.2×10⁻¹⁰, pCOND = 0.052) at the locus. SLC30A8 R325W is also the lead SNP for T2D susceptibility at this locus in published European ancestry meta-analyses from the DIAGRAM Consortium [46]. Finally, the low-frequency index SNP for the secondary association signal mapping to G6PC2 (rs138726309, MAF = 0.015) was the coding variant H177Y, which accounts for 11.2% of the posterior probability of causality at this locus. For this association signal, none of the variants in the 99% credible set was present in HapMap, and thus would have been overlooked without 1000G imputation. This coding variant has recently been implicated in FG homeostasis in a meta-analysis of 33,407 non-diabetic individuals of European ancestry, genotyped with the Illumina exome array, and in agreement with our study, demonstrates a stronger signal of association in conditional analysis after accounting for the lead SNP at the G6PC2 locus [47]. The remaining variants in the tractable credible sets mapped to non-coding sequence. To gain insight into potential regulatory mechanisms through which these variants might impact glycaemic and obesity-related traits, we overlaid each of these credible sets, in turn, with chromatin state calls from eleven cell lines and tissues (Methods). Across all traits, 99% credible set variants were enriched for overlap with enhancer elements (Fig 2). Focussing on FG, variants within the 99% credible set showed significant enrichment (p<2.4×10⁻³) for active promoter and transcription factor binding site annotations compared to all others (respectively: 3.8-fold, Fisher's combined p = 9.4×10⁻⁵; and 7.2-fold, Fisher's combined p = 2.1×10⁻¹³). Across cell types, this enrichment was most prominent in pancreatic islets (Fig 2). More than half of islet-annotated variants are not present in HapMap, and this would not have been observed without 1000G imputation.
For example, at the novel FG RMST locus, 11 of the 14 variants in the 99% credible set are not present in HapMap, but all overlap active islet chromatin marks (S10 Fig).
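The construction of the 99% credible sets used throughout this section can be sketched as follows. The exact Bayes' factor used in the study is described in its Methods and is not reproduced here. This illustration instead assumes Wakefield's approximate Bayes' factor with a normal prior on the effect size (prior variance 0.04, an arbitrary choice), a single causal variant per signal, and equal prior probability for every variant. The function name and the example variants are hypothetical.

```python
from math import exp, log


def credible_set(variants, coverage=0.99, prior_var=0.04):
    """Build a credible set from summary statistics for one association signal.

    variants: list of (variant_id, beta, se) from the (conditional) meta-analysis.
    Returns [(variant_id, posterior_probability), ...], ranked by decreasing
    posterior, truncated once the cumulative posterior reaches `coverage`.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    log_bfs = []
    for _, beta, se in variants:
        v = se * se
        shrink = prior_var / (v + prior_var)
        z2 = (beta / se) ** 2
        # log of Wakefield's approximate Bayes' factor (association vs null)
        log_bfs.append(0.5 * log(1.0 - shrink) + 0.5 * z2 * shrink)
    top = max(log_bfs)                        # subtract the maximum to avoid overflow
    weights = [exp(lb - top) for lb in log_bfs]
    total = sum(weights)
    ranked = sorted(
        ((vid, w / total) for (vid, _, _), w in zip(variants, weights)),
        key=lambda item: item[1],
        reverse=True,
    )
    selected, cumulative = [], 0.0
    for vid, posterior in ranked:
        selected.append((vid, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical signal with two well-supported variants and one weak one.
    signal = [("rsA", 0.060, 0.010), ("rsB", 0.059, 0.010), ("rsC", 0.010, 0.010)]
    for vid, posterior in credible_set(signal):
        print(vid, round(posterior, 3))
```

A signal dominated by one variant yields a credible set of size one, as reported above for MTNR1B, whereas variants in strong LD with near-identical evidence share the posterior probability and enlarge the set.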

Discussion
Table 3. Association signals for glycaemic and obesity-related traits for which the 99% credible sets contain no more than 20 variants.
Through meta-analysis of 1000G imputed GWAS of glycaemic and obesity-related traits, we have identified two novel loci for BMI at genome-wide significance, and two for FG (including one low-frequency variant association signal that is specific to females). These loci were not reported in larger meta-analysis efforts of European ancestry undertaken by the GIANT Consortium (for BMI) and the MAGIC Investigators (for FG), despite the partial overlap of contributing studies [16][17][18][19][26,27]. Improved coverage and quality of imputation for common and low-frequency variation using 1000G reference panels has increased power. We also reported new lead SNPs at 29 established glycaemic and obesity-related trait loci achieving genome-wide significance in our meta-analyses, of which 23 were not present in HapMap, and identified multiple distinct signals of association for WHRadjBMI at RSPO3 and for FG at GCK and G6PC2. Taken together, these novel loci, distinct association signals, and new lead SNPs have increased the trait variance explained for glycaemic and obesity-related traits, although the majority of the heritability remains unaccounted for. Despite more than 90% coverage of low-frequency variation after 1000G imputation, in diverse European ancestry populations, and equivalent power to detect association across the allele frequency spectrum for a fixed proportion of trait variance explained, the new lead SNPs at established and novel GWAS loci are predominantly common. These data argue strongly against the "synthetic association" hypothesis, which posits that common lead SNPs at GWAS loci will often reflect unobserved causal variants of lower frequency and greater effect size [32]. We recognise that our study has insufficient power to detect common or low-frequency association signals of more modest effect (S12 Table).
For example, we estimated that the power to detect association in this study, at genome-wide significance, of a variant of 1% MAF, explaining 0.05% of the overall trait variance (effect size of 0.16 SD units), was 88.0% for BMI, but just 42.1% for WHRadjBMI, 27.7% for FG, and only 2.6% for FIadjBMI. Furthermore, the contribution of rare variants to glycaemic and obesity-related traits cannot be directly investigated with these data because of the low quality imputation for MAF<0.5%, but will require interrogation through deep whole-genome re-sequencing studies in large sample sizes.
We have demonstrated that integration of 1000G imputation, genetic fine-mapping, and genomic annotation facilitates fine-mapping of GWAS loci for glycaemic and obesity-related traits, and has provided insight into potential functional and regulatory mechanisms through which the effects of these association signals are mediated. In particular, variants in the 99% credible set for the low-frequency association signal mapping to G6PC2 are completely absent from HapMap, but include H177Y. The glucose lowering allele at this variant has been demonstrated to result in a significant decrease in protein expression mediated through proteasomal degradation, leading to a loss of G6PC2 function [47]. We also demonstrated enrichment for overlap of functional elements with variants in the tractable credible sets mapping to non-coding sequence, in particular enhancers. For FG, additional enrichment was observed across credible set variants mapping to promoter and transcription factor binding sites in pancreatic islets, in particular. Uncovering these types of enrichment is essential for prioritisation of variants for functional follow-up, and can be incorporated in statistical models to elucidate causal alleles. Also, at the level of an individual locus, functional annotation can help point to the underlying molecular mechanism through which the GWAS signal is mediated. At G6PC2, for example, the lead SNP, rs560887, in the 99% credible set for the second distinct (non-coding) association signal at this locus (79.5% posterior probability) maps to an enhancer region that is active in pancreatic islets and embryonic stem cells, but repressed in most other cell types. These observations are in agreement with recent reports of clustering of T2D-associated risk variants in islet enhancers [48] and highlight a potential mechanism through which GWAS loci impact glucose homeostasis and disease risk.
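The enrichment results for FG are reported with Fisher's combined p-values. As a hedged illustration of how several p-values can be combined by Fisher's method, the sketch below assumes independent tests (for example, one per cell type, which is an assumption about the analysis rather than a detail stated here) and uses the closed-form chi-square tail probability for an even number of degrees of freedom. The function name and input p-values are hypothetical.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, whose upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    half_stat = -sum(log(p) for p in p_values)  # X / 2
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i                   # (X/2)^i / i!
        series += term
    return exp(-half_stat) * series


if __name__ == "__main__":
    # Hypothetical per-cell-type enrichment p-values for one annotation.
    print(fisher_combined_p([0.05, 0.05]))
    print(fisher_combined_p([0.01, 0.20, 0.03, 0.50]))
```

With a single p-value the method returns that p-value unchanged, which is a convenient sanity check on the implementation.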
Despite the success of traditional GWAS genotyping arrays for the discovery of common variant association signals for complex human traits, because of the structure of LD for variation with MAF>5%, the gold standard approach to directly interrogating lower frequency variation is through re-sequencing studies. However, in agreement with recently published investigations of the contribution of low-frequency variants to a range of phenotypes [47,49-51], our study highlights that effect sizes are modest, and require sample sizes for detection that are financially infeasible through re-sequencing on the scale of the whole genome (or exome). We have demonstrated, in this study, that imputation of existing GWAS scaffolds up to reference panels from the 1000 Genomes Project Consortium [38] enables imputation of more than 90% of low-frequency variants in diverse European populations, at no additional cost other than computation and analyst time. Future GWAS of complex traits in European ancestry populations will be further enhanced by the Haplotype Reference Consortium (www.haplotype-reference-consortium.org). This effort will create a reference panel of more than 60,000 haplotypes from re-sequencing of multiple cohorts, predominantly of European ancestry, enabling high-quality imputation to lower allele frequencies. Phase 3 of the 1000 Genomes Project includes haplotypes from diverse populations from each of the five major global ethnicities, and thus would be expected to improve imputation quality over Phase 1 for low-frequency variants in East Asian, South Asian, African and American ancestry groups. The viability of imputation as an approach to recover genotypes at low-frequency variants in GWAS undertaken in populations that are not well represented by the 1000 Genomes Project might require whole-genome re-sequencing of some individuals from the study, in combination with haplotypes from the existing reference panel.
Irrespective of the population under investigation, our study suggests that imputation is unlikely to provide sufficient coverage of variation with MAF<0.5% to enable gene-based testing of rare variants [52]. Imputation is restricted to those rare variants that are present in the reference panel, which are much more likely to be population specific. Furthermore, imputation of rare variants that are present in the reference panel is generally poor, although it is not clear how well calibrated the traditional metrics of quality (such as IMPUTEv2 info score) will be. Thorough investigation of the impact of rare variation on phenotype will thus require re-sequencing, although some success in discovering rare coding variants associated with complex human traits has been achieved through exome array genotyping [47,53-55]. For the time being, arrays that combine an imputation scaffold with direct interrogation of rare coding variation likely offer the most cost-effective approach to assaying variants across the frequency spectrum.
In conclusion, our study has enabled discovery and fine-mapping of novel and established association signals for glycaemic and obesity-related traits, and through integration with genomic data from relevant tissues, has highlighted functional and regulatory processes through which these effects are mediated. Improved understanding of the biological basis of the quantitative human anthropometric and metabolic traits may advance our appreciation of the mechanisms underlying downstream disease endpoints, including T2D and cardiovascular diseases, ultimately leading to personalised treatment approaches, therapeutic development and public health benefits.

Ethics statement
All human research was approved by the relevant institutional review boards, and conducted according to the Declaration of Helsinki. All participants provided written informed consent.

Studies and samples
We considered 22 population-based and case-control GWAS of European ancestry in up to (after quality control): 87,048 individuals for BMI; 54,572 individuals for WHR adjBMI; 46,694 individuals for FG; and 24,245 individuals for FI adjBMI. Samples were limited to individuals of at least 18 years of age. Case-control studies were stratified by disease status, with each stratum analysed separately. Full details of study and sample characteristics are provided in S1 Table. Samples were genotyped with a variety of GWAS arrays. Sample and SNP quality control was undertaken within each study. Sample quality control included exclusions on the basis of genome-wide call rate, extreme heterozygosity, sex discordance, cryptic relatedness, and outlying ethnicity. SNP quality control included exclusions on the basis of call rate across samples and extreme deviation from Hardy-Weinberg equilibrium. Non-autosomal SNPs were excluded from imputation and association analysis. SNPs with MAF<1% were also excluded from the genotype scaffold prior to imputation. Full details of the genotyping arrays and quality control protocols employed by each study are summarised in S1 Table.

Imputation
Within each study, the autosomal GWAS genotype scaffold was imputed up to the 1000 Genomes Project multi-ethnic reference panel (Phase I interim release, June 2011), which was the most up to date available at the time analyses were undertaken. Imputation was performed using IMPUTEv2 [42], minimac [39] or specialist in-house software. Poorly imputed variants (IMPUTE info<0.4; minimac r²<0.3) [43] and those with a minor allele count of less than three (under a dosage model) were excluded from downstream association analyses.
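The post-imputation filter can be illustrated with a short sketch. The thresholds come from the text above. The data layout, the class and function names, and the definition of minor allele count as the smaller of the two summed allele dosages are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List

# Quality thresholds stated in the text; the dictionary keys are hypothetical labels.
QUALITY_THRESHOLDS = {"impute": 0.4, "minimac": 0.3}
MIN_MINOR_ALLELE_COUNT = 3.0


@dataclass
class ImputedVariant:
    variant_id: str
    quality: float         # IMPUTE info score or minimac r2
    dosages: List[float]   # expected coded-allele count (0 to 2) per individual


def minor_allele_count(dosages: List[float]) -> float:
    """Smaller of the summed dosages of the coded and the non-coded allele."""
    coded = sum(dosages)
    return min(coded, 2.0 * len(dosages) - coded)


def passes_imputation_qc(variant: ImputedVariant, software: str) -> bool:
    """Keep variants at or above the quality threshold with a minor allele count of at least three."""
    threshold = QUALITY_THRESHOLDS[software]
    return (variant.quality >= threshold
            and minor_allele_count(variant.dosages) >= MIN_MINOR_ALLELE_COUNT)
```

For example, `passes_imputation_qc(ImputedVariant("v1", 0.35, [1.0, 1.0, 1.0, 0.0]), "minimac")` keeps the variant, whereas the same record scored with the IMPUTE threshold would be dropped.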

Trait transformations and study-level association analyses
We utilised protocols for obesity-related and glycaemic trait transformations developed by the GIANT Consortium [17,18] and MAGIC Investigators [19]. Full details of trait transformations, trait summary statistics and study-specific covariates are presented in S2 and S3 Tables. BMI was calculated as the ratio of weight (kg) to squared height (m²). BMI was inverse normal transformed separately in males and females. Association of the transformed trait with each variant passing quality control was tested in a linear regression framework under an additive model in the dosage of the minor allele after adjustment for age, age² and study-specific covariates, separately in males and females.
WHR was calculated as the ratio of waist circumference (m) to hip circumference (m). Residuals were obtained after adjustment for age, age², BMI, and study-specific covariates, separately in males and females, and were subsequently inverse-rank normalised. Association of the transformed trait with each variant passing quality control was tested in a linear regression framework under an additive model in the dosage of the minor allele, separately in males and females.
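A rank-based inverse normal transformation of the kind applied to BMI can be sketched as below. The Blom offset (3/8) and the averaging of tied ranks are assumptions, since the text does not state which variant of the transformation each study used. The function names are illustrative.

```python
from statistics import NormalDist
from typing import List


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by squared height (m^2)."""
    return weight_kg / height_m ** 2


def inverse_normal_transform(values: List[float], offset: float = 0.375) -> List[float]:
    """Map values to standard normal quantiles of their (tie-averaged) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # average 1-based rank over the tie
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0)) for r in ranks]
```

To follow the protocol described here, the function would be called once on the male values and once on the female values, so that ranks are never shared across sexes.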
FG was measured in mmol/L. Individuals with a diagnosis of diabetes (type 1 or type 2), diabetes treatment, and/or FG≥7 mmol/L, non-fasting state, or pregnancy were excluded. Individuals from case cohorts (with diseases such as stroke and cardiovascular disease) were also excluded if they had undergone hospitalization or blood transfusion in the 2-3 months before measurements were taken. Association of the untransformed trait with each variant passing quality control was tested in a linear regression framework under an additive model in the dosage of the minor allele after adjustment for age, age² and study-specific covariates, separately in males and females.
FI was measured in pmol/L with subsequent natural log transformation. Individuals with a diagnosis of diabetes (type 1 or type 2), diabetes treatment, and/or FG≥7 mmol/L, non-fasting state, or pregnancy were excluded. Individuals from case cohorts (with diseases such as stroke and cardiovascular disease) were also excluded if they had undergone hospitalization or blood transfusion in the 2-3 months before measurements were taken. Association of the transformed trait with each variant passing quality control was tested in a linear regression framework under an additive model in the dosage of the minor allele after adjustment for age, age², BMI and study-specific covariates, separately in males and females.

Meta-analysis
Summary statistics from association testing of variants passing quality control, separately in males and females, were corrected in each study for residual population structure through genomic control [56] where necessary (S2 and S3 Tables). Subsequently, association summary statistics were combined across studies in sex-specific and sex-combined fixed-effects meta-analyses (inverse-variance weighting) for each trait, as implemented in GWAMA [57]. Heterogeneity in allelic effects between males and females for each trait at each variant was assessed by means of an implementation of Cochran's Q-statistic [58] in GWAMA [57]. Variants passing quality control in fewer than 50% of the contributing studies for each trait were excluded from the meta-analysis. After filtering, the total numbers of variants reported for each trait were: 9,953,165 for BMI; 9,954,794 for WHR adjBMI; 9,967,162 for FG; and 9,837,044 for FI adjBMI. Sex-specific or sex-combined p<5×10⁻⁸ was considered genome-wide significant for each trait. Associated loci are referred to by the name(s) of the nearest gene(s) to the lead SNP, unless there are more biologically plausible candidates mapping nearby.
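The three calculations named in this section (genomic control, inverse-variance weighted fixed-effects meta-analysis, and Cochran's Q) can be sketched in a few lines. This is a generic textbook implementation with hypothetical function names, not the GWAMA code. Inflating standard errors by the square root of lambda only when lambda exceeds 1 is an assumption about what "where necessary" means in practice.

```python
from math import erfc, sqrt
from statistics import median
from typing import List, Tuple

CHI2_1DF_MEDIAN = 0.4549364  # median of a 1-df chi-square under the null


def genomic_control_lambda(z_scores: List[float]) -> float:
    """Ratio of the observed median chi-square to its null expectation."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def gc_corrected_se(se: float, lam: float) -> float:
    """Inflate a standard error by sqrt(lambda), only if lambda exceeds 1 (assumed rule)."""
    return se * sqrt(max(lam, 1.0))


def fixed_effects_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted effect, its standard error, and Cochran's Q."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, sqrt(1.0 / total), q


def p_two_sided(beta: float, se: float) -> float:
    """Two-sided p-value for a normally distributed test statistic beta / se."""
    return erfc(abs(beta / se) / sqrt(2.0))


def p_chi2_1df(q: float) -> float:
    """Upper-tail p-value of a 1-df chi-square, e.g. Q for a male versus female comparison."""
    return erfc(sqrt(q / 2.0))
```

Note that `p_chi2_1df` applies only when Q compares two strata (males and females). With k studies, Q is referred to a chi-square with k−1 degrees of freedom instead.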

Approximate conditional analysis
We performed approximate conditioning in established and novel glycaemic and obesity-related trait loci in GCTA [44] on the basis of association summary statistics from the sex-combined meta-analyses after variant filtering. We utilised genotype data from two reference studies to approximate LD between variants in diverse European populations, and hence correlation between parameter estimates in the GCTA-COJO joint regression model: 58BC-WTCCC (2,802 individuals from Great Britain); and NFBC1966 (5,276 individuals from Lapland and the Province of Oulu in Northern Finland). We identified "index" variants to represent each distinct association signal achieving genome-wide significance (p<5×10⁻⁸) in the GCTA-COJO joint regression model for further validation.

Exact conditional analysis
We performed exact conditional analysis for each locus identified with multiple distinct association signals in GCTA using imputed data from all contributing studies except Rotterdam Study 1 (5,745 individuals). Within each study, we tested for association in the same linear regression framework utilised for unconditional analysis, separately in males and females, but included genotypes at each GCTA index SNP identified at the locus, in turn, as an additional covariate in the model. At each established glycaemic and obesity-related trait locus, we also performed conditioning on the previously reported lead SNP if it differed from that reported in our unconditional meta-analysis. Subsequently, association summary statistics for each signal were combined across studies in sex-specific and sex-combined fixed-effects meta-analyses (inverse-variance weighting) for each trait, as implemented in GWAMA [57].

Trait variance explained
We estimated the variance explained for each trait using genotype data from NFBC1966 (5,276 individuals) in a multiple linear regression framework. For each trait, we considered two sets of variants: (i) previously reported lead SNPs for established loci; and (ii) new lead SNPs and index variants for multiple distinct association signals in established and novel loci. We tested for association of the trait: (i) with covariates only; and (ii) with covariates and the dosage of the minor allele at each variant. For each set of variants, the trait variance explained was given by the difference in the coefficient of determination (r²) between these two regression models.

Credible set construction
For each distinct signal for each trait, we calculated the posterior probability of driving the association for the jth variant, $\pi_{Cj}$, given by

$$\pi_{Cj} = \frac{\Lambda_j}{\sum_k \Lambda_k},$$

where the summation is over all variants reported in the (conditional) meta-analysis across the locus. In this expression, $\Lambda_j$ is the approximate Bayes' factor [59] for the jth variant, given by

$$\Lambda_j = \sqrt{\frac{V_j}{V_j + \omega}} \exp\left[\frac{\omega \beta_j^2}{2 V_j (V_j + \omega)}\right],$$

where $\beta_j$ and $V_j$ denote the allelic effect and corresponding variance from the (conditional) meta-analysis for the association signal. The parameter $\omega$ denotes the prior variance in allelic effects, taken here to be 0.04 [59]. A 99% credible set was then constructed by: (i) ranking all variants in the locus according to their Bayes' factor, $\Lambda_j$; and (ii) including ranked variants until their cumulative posterior probability exceeds 0.99.
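The credible set construction can be sketched as follows. It assumes the standard Wakefield form of the approximate Bayes' factor with prior variance 0.04, and works on the log scale to avoid overflow at strongly associated variants. The input layout (a mapping from variant identifier to effect and variance) and the function names are hypothetical.

```python
from math import exp, log
from typing import Dict, List, Tuple


def log_abf(beta: float, variance: float, prior_variance: float = 0.04) -> float:
    """Log approximate Bayes' factor in favour of association (Wakefield form, assumed)."""
    total = variance + prior_variance
    return 0.5 * log(variance / total) + prior_variance * beta ** 2 / (2.0 * variance * total)


def credible_set(
    variants: Dict[str, Tuple[float, float]],  # id -> (allelic effect, its variance)
    coverage: float = 0.99,
    prior_variance: float = 0.04,
) -> Tuple[List[str], Dict[str, float]]:
    """Return the ranked credible set and the posterior probability of every variant."""
    log_bfs = {vid: log_abf(b, v, prior_variance) for vid, (b, v) in variants.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    max_log = max(log_bfs.values())
    weights = {vid: exp(value - max_log) for vid, value in log_bfs.items()}
    total = sum(weights.values())
    posterior = {vid: w / total for vid, w in weights.items()}

    chosen: List[str] = []
    cumulative = 0.0
    for vid in sorted(posterior, key=posterior.get, reverse=True):
        chosen.append(vid)
        cumulative += posterior[vid]
        if cumulative >= coverage:
            break
    return chosen, posterior
```

Ranking by posterior probability is equivalent to ranking by Bayes' factor here, because every variant at the locus shares the same normalising sum.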
Finally, we used transcript information from GENCODEv14 [69] to define protein-coding genes, 5' and 3' UTR regions, and non-coding genes. For transcripts to be classified as protein-coding, the 'protein-coding' tag needed to be set and further filtering for either presence in the conserved coding DNA sequence (CCDS) database or experimentally confirmed mRNA start and end was applied. From this set of transcripts, 5' UTR, exon, and 3' UTR regions were defined. For non-coding genes, transcripts labelled as 'lncRNA', 'miRNA', 'snoRNA' or 'snRNA' were used.
Overlap between the annotations described above and variants in tractable credible sets was determined using bedtools v2.17.0. We defined seven broad functional classes from these annotation data: coding (protein-coding transcripts); ncRNA (non-coding RNA transcripts); UTR (3' and 5' UTR regions of coding transcripts); enhancers (strong and weak enhancer elements); promoters (active and poised promoter elements); insulators; and TFBS (sites pooled across all factors). We further used each of the cell line annotations as a distinct category. Each variant was allowed to overlap multiple annotation categories.
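For the enrichment analysis of these overlaps, each broad class is compared with each of the other six using a one-sided Fisher's exact test, and the six p-values are pooled with Fisher's method. A minimal pure-Python sketch of both steps is given below. The layout of the 2×2 table (credible-set variants overlapping or not overlapping the class of interest in the first row, and the comparison class in the second) is an assumption, as are the function names. The analysis itself was run in R.

```python
from math import comb, exp, log
from typing import List


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided p-value P(X >= a) for the table [[a, b], [c, d]] with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: tables at least as extreme as the observed one.
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


def fishers_method(p_values: List[float]) -> float:
    """Combine independent p-values: -2 * sum(ln p) is chi-square with 2k df."""
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)
    # Closed-form survival function of a chi-square with an even number (2k) of df.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total


# Bonferroni threshold for 21 class-by-trait combinations, as stated in the text.
BONFERRONI_THRESHOLD = 0.05 / 21
```

Fisher's method assumes the pooled p-values are independent. Here the six comparisons share the same class of interest, so the combined value is best read as a summary rather than an exact probability.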
For each broad functional class, Fisher's exact test as implemented in R v3.0.1 (with alternative = "greater") was used to compare whether the set of credible variants showed a higher fold overlap of this annotation versus all of the others independently. The six resulting p-values for each class were then combined using Fisher's method. With 21 different functional class and trait combinations, a Bonferroni adjusted significance threshold (p<2.4×10⁻³) was used.
Directly genotyped or imputed SNPs are plotted with their meta-analysis P values (as -log10 values) as a function of genomic position (NCBI Build 37). In each panel, the lead SNP from the meta-analysis is represented by a purple circle. Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their correlated proxies (according to a blue to red scale from r² = 0 to 1, based on pairwise EUR r² values from the 1000 Genomes June 2011 release). Gene annotations were taken from the UCSC genome browser. SNP annotations are as follows: circles, no annotation; downward triangles, nonsynonymous; squares, coding or 3' UTR; asterisks, TFBScons (in a conserved region predicted to be a transcription factor binding site); squares with an X, MCS44 placental (in a region highly conserved in placental mammals). (TIFF)
In both plots, the lead SNP in the HapMap panel is represented by a purple circle. Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their proxies (according to a blue to red scale from r² = 0 to 1, based on pairwise r² values from the 1000 Genomes June 2011 release EUR). SNP annotations are as follows: circles, no annotation; downward triangles, nonsynonymous; squares, coding or 3' UTR; asterisks, TFBScons (in a conserved region predicted to be a transcription factor binding site); squares with an X, MCS44 placental (in a region highly conserved in placental mammals). (PDF)
with their conditional meta-analysis P values (as -log10 values) as a function of genomic position (NCBI Build 37) after adjustment for the other index SNP at the locus. In each plot, the lead SNP present in HapMap is represented by a purple circle. Estimated recombination rates are plotted to reflect the local LD structure around the associated SNPs and their proxies (according to a blue to red scale from r² = 0 to 1, based on pairwise r² values from the 1000 Genomes June 2011 release EUR). SNP annotations are as follows: circles, no annotation; downward triangles, nonsynonymous; squares, coding or 3' UTR; asterisks, TFBScons (in a conserved region predicted to be a transcription factor binding site); squares with an X, MCS44 placental (in a region highly conserved in placental mammals). (PDF)
Table. Variants of 99% credible sets containing less than 20 variants driving distinct association signals for BMI, WHR adjBMI, FG and FI adjBMI. (PDF)
S11 Table. Allele frequency distribution of 99% credible sets with less than 20 variants. (PDF)
S12 Table. Power to detect association, at genome-wide significance (p<5×10⁻⁸), with a variant of MAF 1% in the current study.