Novel loci for childhood body mass index and shared heritability with adult cardiometabolic traits

Abstract
The genetic background of childhood body mass index (BMI), and the extent to which the well-known associations of childhood BMI with adult diseases are explained by shared genetic factors, are largely unknown. We performed a genome-wide association study meta-analysis of BMI in 61,111 children aged between 2 and 10 years. Twenty-five independent loci reached genome-wide significance in the combined discovery and replication analyses. Two of these, located near NEDD4L and SLC45A3, have not previously been reported in relation to either childhood or adult BMI. Positive genetic correlations of childhood BMI with birth weight and adult BMI, waist-to-hip ratio, diastolic blood pressure and type 2 diabetes were detected (Rg ranging from 0.11 to 0.76, P-values <0.002). A negative genetic correlation of childhood BMI with age at menarche was observed. Our results suggest that the biological processes underlying childhood BMI largely, but not completely, overlap with those underlying adult BMI. The well-known observational associations of BMI in childhood with cardio-metabolic diseases in adulthood may reflect partial genetic overlap, but in light of previous evidence, it is also likely that they are explained through phenotypic continuity of BMI from childhood into adulthood.

Author summary
Although twin studies have shown that body mass index (BMI) is highly heritable, many common genetic variants involved in the development of BMI have not yet been identified, especially in children. We studied associations of more than 40 million genetic variants with childhood BMI in 61,111 children aged between 2 and 10 years. We identified 25 genetic variants that were associated with childhood BMI. Two of these, located close to the genes NEDD4L and SLC45A3, have not previously been implicated in BMI. We also show that the genetic background of childhood BMI overlaps with that of birth weight, adult BMI, waist-to-hip ratio, diastolic blood pressure, type 2 diabetes, and age at menarche. Our results suggest that the biological processes underlying childhood BMI largely, but not completely, overlap with those underlying adult BMI. Additionally, the genetic backgrounds of childhood BMI and other cardio-metabolic phenotypes overlap. This may mean that the associations of childhood BMI with later cardio-metabolic outcomes are partially explained by shared genetics, but they could also be explained by the strong association of childhood BMI with adult BMI.

Introduction
Childhood obesity is a major public health problem with impact on health in both the short and the long term [1]. Besides the well-established lifestyle and behavioral factors, genetics influence the risk of obesity, with reported heritability estimates from twin studies for body mass index (BMI) ranging from 40 to 70% [2,3]. An estimated 17 to 27% seems to be explained by common variants [4][5][6]. Large genome-wide association studies (GWAS) have identified 941 loci associated with adult BMI, accounting for 5% of the phenotypic variation [7]. Less is known about the genetic background of childhood BMI. A previous GWAS of BMI among 35,668 children identified 15 associated loci, accounting for 2% of the phenotypic variance [8]. Of these loci, 12 were also associated with adult BMI [9,10]. The remaining 3 loci, specifically associated with childhood BMI, suggest possible age-specific differences between the two stages of life or could indicate stronger effects for these loci on childhood BMI than on adult BMI [11][12][13]. Thus far, most common variants explaining the genetic variability of childhood BMI remain undetected. It is well known that obesity in early life tends to track into later life [14]. Furthermore, childhood obesity has been associated with a lower age at menarche and with non-communicable diseases in later life, including hypertension, dyslipidemia, type 2 diabetes, neurodegenerative disease and asthma [15][16][17][18][19]. Findings from recent studies suggest a shared genetic background for BMI in childhood and adulthood [8,20,21]. The extent to which the associations of childhood BMI with common adult diseases are genetically explained has not been explored in detail.
We aimed to study the genetic background of childhood BMI by performing a two-stage GWAS meta-analysis consisting of 41 studies with a total sample size of 61,111 children of European ancestry. We also examined the genetic correlations of childhood BMI with anthropometric, cardio-metabolic, respiratory, neurocognitive and endocrinological traits in adults, using GWAS summary statistics from various consortia.

Identification of genome-wide significant loci for childhood BMI
Sex- and age-adjusted Standard Deviation Scores (SDS) were created for BMI at the latest time point (oldest age, if multiple measurements were available) between 2 and 10 years using the same software and external reference across all studies (LMS growth; Pan H, Cole TJ, 2012; http://www.healthforallchildren.co.uk). Individual study characteristics are shown in S1 Table. The discovery meta-analysis included data from 26 studies (N_discovery = 39,620) with data imputed to the 1000 Genomes Project or the Haplotype Reference Consortium (HRC). We performed a fixed-effects inverse variance-weighted meta-analysis, followed by conditional analyses based on summary-level statistics and Linkage Disequilibrium (LD) estimation between SNPs in Genome-wide Complex Trait Analysis (GCTA) to select independently associated SNPs at each locus on the basis of conditional P-values [22]. Seventeen independent SNPs reached genome-wide significance (P-values <5 × 10⁻⁸) and thirty SNPs showed suggestive association with childhood BMI (P-values >5 × 10⁻⁸ and <5 × 10⁻⁶). A Manhattan plot of the discovery meta-analysis is shown in Fig 1. No evidence of inflation due to population stratification, cryptic relatedness or other confounders was observed (genomic inflation factor (lambda) = 1.05; LD-score regression intercept = 1.0) (S1 Fig) [23]. All 47 independent SNPs identified in the discovery meta-analysis were taken forward for analysis in 15 replication cohorts (N_replication = 21,491) and results of the two stages were then combined. Results of the discovery, replication and combined meta-analyses are shown in Table 1, S2 Table and S3 Table. Results of the discovery analysis for SNPs with P-values <5 × 10⁻⁶ are shown in S4 Table. As the replication stage might lack power to replicate SNPs from the discovery analysis, we consider the joint analysis as the primary analysis.
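The genomic inflation factor reported above can be illustrated with a short sketch. It assumes the common definition of lambda as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The function name and the toy P-values are illustrative and are not taken from the study pipeline.

```python
from statistics import NormalDist, median

# Expected median of a 1-df chi-square distribution under the null (about 0.455).
EXPECTED_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda = median(observed chi-square) / expected null median."""
    norm = NormalDist()
    # Convert each two-sided P-value back to a 1-df chi-square statistic.
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / EXPECTED_MEDIAN_CHI2


# Toy input (not study data): evenly spaced P-values mimic a calibrated null.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
```

A value close to 1, as reported for the discovery meta-analysis, indicates little inflation of the test statistics; values well above 1 would suggest stratification or other confounding.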
In total, 25 loci achieved genome-wide significance in the combined meta-analysis. We defined a SNP as representing a known BMI-locus if it was within 500 kb of and in LD (r² ≥ 0.2) with a previously reported BMI-associated signal. Of the 25 SNPs, two were novel and had not been previously associated with BMI in either adults or children: rs1094647 near SLC45A3 and rs184566112 near NEDD4L. Per additional risk allele (G, allele frequency = 0.55) of rs1094647 (SLC45A3), childhood BMI increased by 0.04 SDS (Standard Error (SE) = 0.01, P-value = 7.20 × 10⁻¹⁰), equal to 0.09 kg/m². Per additional risk allele (A, allele frequency = 0.84) of rs184566112 (NEDD4L), childhood BMI increased by 0.06 SDS (SE = 0.01; P-value = 4.24 × 10⁻⁸), equal to 0.11 kg/m². Regional plots of the 2 novel SNPs are shown in Fig 2. Although these novel SNPs have not previously been associated with either childhood or adult BMI, they have been reported to be associated with other anthropometric phenotypes. Rs1094647 (SLC45A3) has been associated with both height and whole-body fat-free mass in adulthood [24][25][26]. Additionally, rs708724, which is in high LD with rs1094647 (r² = 0.70), was associated with adult weight [24][25][26]. Rs184566112 (NEDD4L) is located in the same region as rs6567160 (distance = 448 kb, r² <0.2), previously associated with adult body fat [27]. In the current study, we did not observe evidence for association between rs184566112 (NEDD4L, effect allele = G, allele frequency = 0.84) and body fat percentage measured by Dual energy X-ray Absorptiometry (age range 24 to 120 months) in 2,698 children from 4 cohorts (0.03 SDS (SE = 0.04, P-value = 0.51)). Individual study characteristics of studies with data on body fat percentage are shown in S5 Table. No evidence of association with childhood obesity was found for the two novel SNPs (P-values >0.11) [28].
We additionally identified 2 independent SNPs (METTL15 and PRRC2A) within 500 kb of previously reported SNPs associated with adult BMI, but only in weak LD with the previously reported signals (r² <0.2). Similarly, we found 2 independent SNPs in regions that are known for both childhood and adult BMI (FAM150B and MC4R) [7,8,10]. Regional plots of the 4 independent SNPs at known loci are shown in S2 Fig. Of the remaining 19 SNPs, 6 mapped to loci previously associated with adult BMI (BDNF, GPRC5B, SLC39A8, NEGR1, GALNT10, and CADM1), 2 mapped to loci previously associated with childhood BMI only (ELP3 and GPR1) and 11 SNPs mapped to loci known to be associated with both adult and childhood BMI (ADCY3, BCDIN3D, TMEM18, FTO, FPGT-TNNI3K/TNNI3K, SEC16B, TFAP2B, LINC00558 [...]
Overall, there was low heterogeneity between studies for the 25 SNPs, except for FTO (S2 Table) [29]. The broad age range included in the discovery meta-analysis of this study may conceal age-specific effects. Therefore, we performed a sensitivity analysis excluding studies of children aged <6 years (remaining N_sensitivity analysis = 55,354), which showed similar results (S6 Table) [30]. Additionally, we ran one sensitivity analysis excluding case-control studies and one excluding studies with a sample size of <500, both showing similar results (S7 Table).
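The locus classification used in this section can be written as a simple rule. The sketch below assumes that "within 500 kb" is inclusive and that a SNP counts as representing a known locus only when both the distance and the LD criterion are met. The function name is hypothetical, and the r² of 0.1 in the example is a placeholder for the reported "r² <0.2".

```python
def is_known_locus(distance_bp, r2, max_distance_bp=500_000, min_r2=0.2):
    """Return True if a SNP tags a previously reported BMI signal.

    A SNP is treated as known when it lies within the distance window of a
    reported signal AND is in LD with it at or above the r2 threshold.
    """
    return distance_bp <= max_distance_bp and r2 >= min_r2


# rs184566112 versus rs6567160: 448 kb apart, r2 below 0.2 (0.1 is a placeholder).
nedd4l_is_known = is_known_locus(448_000, 0.1)

# A hypothetical SNP 100 kb from a reported signal and in strong LD with it.
tagging_snp_is_known = is_known_locus(100_000, 0.8)
```

Under this rule a SNP that is physically close to a reported signal but only in weak LD with it is treated as an independent signal at a known locus rather than as the known signal itself.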

Functional characterization
We used several strategies to gain insight into the functional characterization of the 25 SNPs leading the association signals with childhood BMI. A summary of relevant information from all strategies can be found in S8 Table. First, we examined gene expression profiles of the nearest genes to the 25 SNPs from the combined meta-analysis with GTEx v7 in 53 tissues, using the tool FUMA [3,31]. We found differential expression of the 25 nearest genes in brain and salivary gland. In a second analysis of gene expression profiles in GTEx, we considered all genes in a region of 500 kb to either side of the 25 SNPs. Using this strategy, we additionally found differential expression in liver, heart, kidney, pancreas, muscle, skin and adipose tissue [31]. Second, we assessed whether the 25 SNPs were associated with gene expression in whole adipose tissue, isolated adipocytes, and isolated stroma-vascular cells from the Leipzig Adipose Tissue Childhood Cohort [32]. Full results can be found in S9 Table. We observed differential gene expression associated with multiple SNPs. Rs1094647 (nearest gene: SLC45A3) was associated with gene expression of PM20D1 (P_FDR <0.05) in whole adipose tissue. We additionally found associations of rs114285994 (nearest gene: GPRC5B) with expression of C16orf88 in isolated adipocytes. Rs115181845, which is in moderate LD (r² = 0.47) with rs144376234 (nearest gene: GNAI3), was associated with expression of GSTM1 and GSTM2 in whole adipose tissue, isolated adipocytes and isolated stroma-vascular cells (S8 Table and S9 Table). No associations with gene expression were observed for any of the other 22 SNPs.
Third, we used Bayesian colocalization analysis to examine evidence for colocalization between GWAS and eQTL signals and to identify additional candidate genes for the 25 SNPs (GTEx v7). Briefly, GWAS summary statistics were extracted for each eQTL for all SNPs that were present in the meta-analysis and that were common to both the GWAS and eQTL studies. In most pairs, no evidence for association was found with either trait. To define colocalization, we restricted the results to pairs of childhood BMI and eQTL signals with a high posterior probability for colocalization (see Methods and Materials for details) [33]. We found significant colocalizations at 6 loci (ADCY3, DNAJC27-AS1, CENPO, ADAM23, LIN7C, TFAP2B) across a range of tissues (S8 Table and S10A and S10B Table) [8].
Fourth, to explore biological processes, we used DAVID, with the 25 nearest genes as input, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [34,35]. Pathway analysis revealed one enriched biological process, cAMP signaling (P-value = 0.03).
Fifth, we performed a look-up in mouse-knockout data of the 25 nearest genes and, additionally, the genes that were indicated by colocalization and gene expression analysis. Mice in which NEDD4L was knocked out displayed neuronal abnormalities [36]. No related phenotypes were shown for SLC45A3 or any of the 4 independent loci (METTL15, PRRC2A, FAM150B, and MC4R). Of the 19 known loci, ADCY3 showed an association with increased total body fat in female heterozygous knockout mice, whereas NEGR1 was associated with decreased lean body mass in male and female homozygous knockout mice. Full results can be found in S8 Table.
Sixth, among the 25 top SNPs, combined annotation-dependent depletion (CADD) scores >12.37, indicating potential pathogenicity of a SNP, were observed for rs13107325 (SLC39A8), rs56133711 (BDNF) and rs17817449 (FTO) (CADD scores of 34, 15.3 and 15.3, respectively) (S8 Table) [3,37].
[...] P-value = 0.001) (Fig 3 and S11 Table). For birth weight, there were positive genetic correlations with childhood BMI, both when using fetal genetic effects on birth weight and when using maternal genetic effects on birth weight (Rg = 0.20, P-value = 3.19 × 10⁻⁵ and Rg = 0.12, P-value = 0.002 for fetal and maternal effects, respectively). Negative genetic correlations were observed between childhood BMI and total cholesterol (Rg = -0.15, P-value = 0.001), high-density lipoprotein (HDL) (Rg = -0.22, P-value = 8.65 × 10⁻⁶), and age at menarche (Rg = -0.42, P-value = 1.03 × 10⁻³²). We did not find genetic correlations with any of the respiratory and neurocognitive phenotypes. Genetic correlations of childhood BMI with a selection of phenotypes that show evidence of association in observational studies are shown in Fig 3. Full results can be found in S11 Table. Second, we did a look-up of the 25 SNPs in the adult BMI GWAS [7]. In total, 12 SNPs and 8 proxy SNPs (r² ≥ 0.87) were available in the adult BMI study comprising ~700,000 individuals. No information was available on five loci: FAM150B, GPR1, NEGR1, NEDD4L, and PRRC2A. The directions of effect of all 20 SNPs were the same in adults as in children. Of these, 18 were genome-wide significantly associated with adult BMI (P-value <5 × 10⁻⁸) and the other 2 SNPs, SLC45A3 and METTL15, showed suggestive evidence of association (P-values <2.1 × 10⁻⁶) (S12 Table). Effect sizes of these 20 SNPs for adult BMI were highly correlated with those for childhood BMI (r² = 0.86).
Fig 3. The x-axis shows the traits and diseases. The y-axis shows the genetic correlations (Rg) between childhood BMI and each trait, with corresponding standard errors indicated by error bars, estimated by LD score regression. The genetic correlation estimates (Rg) are colored according to their intensity and direction: red indicates positive correlation, blue indicates negative correlation. References can be found in S11 Table.

Third, we calculated a combined childhood BMI genetic risk score (GRS) of the 25 genome-wide significant SNPs, summing the number of BMI-increasing alleles weighted by their effect sizes from the combined meta-analysis. The GRS was associated with childhood BMI (P-value = 2.84 × 10⁻¹¹) in 1,169 children from the Tracking Adolescents' Individual Lives Survey (TRAILS) Cohort, aged 7 years, one of the largest replication cohorts (Fig 4). For each additional average risk allele in the GRS, childhood BMI increased by 0.06 SDS (SE = 0.009). This GRS explained 3.6% of the variance in childhood BMI. When calculating the risk score for the TRAILS cohort, effect estimates from the combined meta-analysis were used after excluding TRAILS from the meta-analysis. We additionally tested the GRS for association with adult BMI in the three sub-cohorts of the Rotterdam Study [38] (RS-I-1; n = 5,957, RS-II-1; n = 2,147 and RS-III-1; n = 2,998). We found the GRS to be associated with adult BMI in all study samples (P-values = 5.09 × 10⁻⁹, 0.02, and 1.49 × 10⁻¹⁰, respectively). Per additional average risk allele, adult BMI increased by 0.03 SDS (SE = 0.005), 0.02 SDS (SE = 0.009) and 0.04 SDS (SE = 0.007), explaining 0.6%, 0.2%, and 1.3% of the variance in adult BMI, respectively.
No association was found of the GRS with birth weight and cardiometabolic phenotypes, including insulin, triglycerides, low-density lipoprotein, HDL, total cholesterol, diastolic blood pressure and systolic blood pressure in 2,831 children aged 6 years from the Generation R Study if considering a Bonferroni corrected P-value of 0.00625 (S13 Table).
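The weighted risk score and the multiple-testing threshold described above can be sketched as follows. Rescaling the score so that one unit corresponds to one average risk allele is a common convention and is assumed here rather than confirmed by the text. Likewise, the threshold of 0.00625 is reproduced by assuming alpha = 0.05 divided over the eight phenotypes listed. All function names and the three-SNP toy data are illustrative.

```python
def weighted_grs(dosages, effect_sizes):
    """Sum of BMI-increasing allele counts weighted by per-allele effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is needed per SNP")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def rescaled_grs(dosages, effect_sizes):
    """Rescale the weighted score to units of 'average risk alleles' (assumed)."""
    mean_effect = sum(effect_sizes) / len(effect_sizes)
    return weighted_grs(dosages, effect_sizes) / mean_effect


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Toy example (not study data): risk-allele counts at three SNPs for one child.
child_dosages = [2, 1, 0]
toy_betas = [0.04, 0.06, 0.05]  # SDS per risk allele, made up for illustration
score = rescaled_grs(child_dosages, toy_betas)

# Eight phenotypes at alpha = 0.05 (assumption) give the reported threshold.
threshold = bonferroni_threshold(0.05, 8)
```

With the rescaling, a regression coefficient for the score can be read as the change in the outcome per additional average risk allele, which matches how the effect estimates are phrased in this section.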

Discussion
In this large GWAS meta-analysis of childhood BMI among >60,000 children aged 2-10 years, we identified 25 genome-wide significant loci. Two of these loci, rs1094647 near SLC45A3 and rs184566112 near NEDD4L, had not been associated with BMI before. We observed moderate to strong genetic correlations of childhood BMI with several anthropometric, cardio-metabolic, and endocrinological traits in adulthood, suggesting a shared genetic background. The closest genes to the two novel loci, SLC45A3 and NEDD4L, have not been strongly linked to obesity in previous studies and databases, indicating that functional studies are needed to identify possible biological pathways. SLC45A3, encoding the solute carrier family 45, member 3 protein, also known as prostate cancer-associated protein 6, has been related to prostate-specific antigen serum concentrations and prostate cancer [39][40][41][42]. NEDD4L, ubiquitin protein ligase Nedd4-like, known for its role in the regulation of ion channel internalization and turnover, is suggested to play a role in the regulation of respiratory, cardiovascular, renal, and neuronal functions [36,[43][44][45]. The independent SNPs identified at loci known from previous studies on adult or childhood BMI may represent fully independent signals, although, despite the low LD, these SNPs might still tag the same causal variant as the previously identified SNPs.
Since there is no strong previous evidence supporting the closest genes to the 25 SNPs as the causal genes, we took multiple approaches for further functional characterization. As many different tissues have been implicated in body composition, we chose to include all available tissues in the gene expression analysis. Using GTEx, we found differential expression of the 25 nearest genes in brain. This may be of interest as appetite regulation might play a role in the development of obesity [46][47][48]. Gene expression data revealed an association between one of the novel SNPs, rs1094647 (nearest gene: SLC45A3), and expression of PM20D1 in whole adipose tissue. PM20D1, Peptidase M20 domain-containing 1, previously identified as a factor secreted by thermogenic adipose cells, is known for its association with insulin resistance, glucose intolerance and enhanced defense of body temperature in cold when knocked out in mice. Furthermore, increased circulating PM20D1, together with adeno-associated virus-mediated transduction, leads to a higher energy expenditure and reduced adiposity in mice [49,50]. We used colocalization analysis to further identify candidate causal genes. This did not identify specific potential causal genes for rs1094647 (SLC45A3) and rs184566112 (NEDD4L). However, we identified ADCY3, DNAJC27-AS1, CENPO, ADAM23, LIN7C and TFAP2B as candidate genes for known loci across different tissues, including tibial nerve tissue, tibial artery tissue and the skin. No candidate genes were detected in biologically more relevant tissues, including subcutaneous or visceral adipose tissue.
Information on rs184566112 near NEDD4L was available in 24 out of 26 discovery cohorts that primarily used 1000 Genomes phase 1 imputed data (N = 37,104), thus clearly surviving our pre-set filter of having information in at least 50% of the number of studies and at least 50% of the total sample size in the discovery analysis. However, it was available in less than half of the replication studies (N = 5,518), which mainly used 1000 Genomes phase 3 or HRC imputed data, as this SNP was not included in these more recent reference panels. No other SNPs in high LD were available as proxy for this SNP in the replication analysis. Therefore, this signal needs to be interpreted with caution. However, the absence of heterogeneity for this SNP between the discovery stage studies (I² = 0; P-value for heterogeneity = 0.98), a high imputation quality (weighted mean R² = 0.89) and the known association of another locus in the same region with adult body fat percentage might lend credibility to this signal, although further work is needed to unravel the details [27].
Previous studies have shown that variants might have strong age-dependent effects across childhood [15,51]. We performed a sensitivity analysis excluding children aged <6 years, as this is the approximate age of the adiposity rebound [30]. However, no differences in the main results compared with the full meta-analysis were observed.
Genetic studies can provide more insight into the etiology of complex diseases. We observed a strong positive genetic correlation of childhood BMI with adult BMI. This is in line with previous studies [8,20,21]. We additionally observed positive genetic correlations between childhood BMI and several cardio-metabolic phenotypes in later life, including waist-to-hip ratio, diastolic blood pressure, type 2 diabetes, and coronary artery disease. Negative genetic correlations were found between childhood BMI and HDL-C and age at menarche. These results may suggest that the associations reported in observational studies are partly explained by genetic factors [15][16][17][58,61]. However, there is also evidence from previous work to support that the associations of childhood BMI with cardiometabolic phenotypes in adulthood are explained by the continuity of a high BMI from childhood until later ages, rather than by an independent effect of childhood BMI on adult cardiometabolic phenotypes [15,62]. From our data, we are not able to distinguish between these two explanations. Childhood BMI was not genetically correlated with asthma and Parkinson's disease. This may indicate that the observational associations between childhood BMI and these phenotypes are not strongly explained by shared genetics [18,56,57,63].
The GRS combining the 25 top SNPs was not associated with cardiometabolic phenotypes in children aged 6 years. This may indicate that there is no shared genetic basis between childhood BMI and these phenotypes in childhood. However, the GRS analyses in children had a much lower sample size than the LD score regression analyses in adults and phenotypic variation in these phenotypes is more limited in children, leading to a much lower power to detect associations in these analyses. Additionally, the GRS was composed of the top-associated SNPs, whereas the genetic correlation estimated from the LDSR examined variation genome-wide.
We observed a SNP heritability of 0.23, which is consistent with previous findings [4-6]. Secular trends in obesity across populations and age groups can influence the heritability estimates across distinct population settings, requiring careful interpretation. This contention is also relevant for the interpretation of the genetic correlations estimated between traits. Environmental influences, like those giving rise to the increase in obesity in the last decades, can influence heritability estimates and hence the power to identify significant genetic correlations. Before concluding unequivocal absence of some degree of "shared heritability" between childhood BMI and some of the adult traits, genetic correlations should be interpreted in the context of power limitations. Increasingly larger environmental influences along the life-course can result in lower heritability, but recent work has also shown that the increase in phenotypic variance accompanying increasing prevalence of obesity occurs alongside an increase in genetic variance [64][65][66][67]. This results in relatively stable (broad sense) heritability estimates across measurement years, as recently shown by a large-scale meta-analysis of adult twin data.
The 25-SNP GRS was positively associated with both childhood and adult BMI, showing slightly larger effect estimates in children, suggesting that these specific genetic variants affect BMI in both childhood and adulthood, but with stronger effects at younger ages. A recent study, using genome-wide polygenic scores of 2.1 million common variants, found that the overall effect of those variants on weight starts in early childhood and increases over time [4]. Two previous studies also describe specific genetic variants associated with BMI in infancy only, and overlapping patterns of genetic variants with those in adults emerging from childhood onwards. Three SNPs associated with infant BMI from these studies were not genome-wide significantly associated with childhood BMI in our data (P-values >0.02), which supports their infancy-specific effects [68,69].
Although many of the associated variants from the current study overlap between children and adults, the relative order of the signals differs. Additionally, SLC45A3, one of the novel loci, did not show genome-wide significant association in adult data [7]. However, suggestive association of this locus with adult BMI was observed (P-value = 2.7 × 10⁻⁵). Overall, the effect estimates of the 25 SNPs in childhood were highly correlated with those in adulthood (r² = 0.86). Taken together, evidence from the current and previous studies suggests that biological processes underlying BMI are similar from childhood onwards, but their relative influence may differ depending on the life stage.

Conclusions
In conclusion, we identified 25 loci for childhood BMI, together explaining 3.6% of the variance in childhood BMI. Two of these are novel and four represent independent SNPs at loci known to be associated with adult or childhood BMI. A strong positive genetic correlation of childhood BMI with adult BMI and related cardio-metabolic phenotypes was observed. Our results suggest that the biological processes underlying childhood BMI largely, but not completely, overlap with those underlying adult BMI. The well-known observational associations of BMI in childhood with cardio-metabolic diseases in adulthood may reflect partial genetic overlap, but in light of previous evidence, it is also likely that they are explained through phenotypic continuity of BMI from childhood into adulthood.

Ethics statement
All individual studies were approved by their medical ethics review committees. All participants gave written informed consent. Study-specific ethics statements are given in S1 Text.

Study design
We conducted a two-stage meta-analysis in children of European ancestry to identify genetic loci associated with childhood BMI. Sex- and age-adjusted standard deviation scores were created for BMI at the latest time point (oldest age, if multiple measurements were available) between 2 and 10 years using the same software and external reference across all studies (LMS growth; Pan H, Cole TJ, 2012; http://www.healthforallchildren.co.uk). In the case of twin pairs and siblings, only one of each twin or sibling pair was included, either randomly or based on genotyping or imputation quality.
In the discovery stage, we performed a meta  N = 1169). In the EDEN mother-child cohort, information was available about three SNPs only (rs7138803, rs13107325, and rs987237). Characteristics of discovery and replication studies can be found in S1 Table and S1 Text.

Study-level analyses
Genome-wide association analyses were first run in all discovery cohorts separately. Studies used high-density Illumina or Affymetrix SNP arrays, followed by imputation to the 1000 Genomes Project or HRC. Before imputation, studies applied study-specific quality filters on sample and SNP call rate, minor allele frequency and Hardy-Weinberg disequilibrium (see S1 Table for details). Linear regression models assuming an additive genetic model were run in each study to assess the association of each SNP with BMI SDS, adjusting for principal components if this was deemed needed in the individual studies. As BMI SDS is age and sex specific, no further adjustments were made. Before the meta-analysis, we applied quality filters to each study, filtering out SNPs with a minor allele frequency (MAF) below 1% and SNPs with poor imputation quality (MACH r2_hat ≤0.3, IMPUTE proper_info ≤0.4 or info ≤0.4).
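The pre-meta-analysis SNP filters can be sketched as a single predicate. The sketch assumes that variants with an imputation quality at or below the stated cut-off are removed, so a variant must strictly exceed it to be kept. The dictionary keys and the function name are hypothetical labels and are not identifiers from MACH or IMPUTE output files.

```python
MIN_MAF = 0.01  # SNPs with a minor allele frequency below 1% are removed

# Imputation quality must exceed these cut-offs, depending on the metric reported.
QUALITY_CUTOFFS = {
    "mach_r2_hat": 0.3,
    "impute_proper_info": 0.4,
    "impute_info": 0.4,
}


def passes_snp_qc(maf, quality, metric):
    """Return True if a SNP survives the MAF and imputation-quality filters."""
    return maf >= MIN_MAF and quality > QUALITY_CUTOFFS[metric]


# Toy SNP records (not study data): (MAF, imputation quality, metric).
toy_snps = [
    (0.25, 0.95, "mach_r2_hat"),   # common, well imputed: kept
    (0.005, 0.99, "mach_r2_hat"),  # too rare: removed
    (0.10, 0.35, "impute_info"),   # poorly imputed: removed
]
kept = [passes_snp_qc(*snp) for snp in toy_snps]
```

Keeping the cut-offs in one table makes it explicit that the same MAF filter applies to every study, while the quality threshold depends on the imputation software used.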

Meta-analysis
We performed fixed-effects inverse-variance weighted meta-analysis of all discovery samples using Metal [70]. Genomic control was applied to every study before the meta-analysis. Individual study lambdas before genomic control ranged from 0.993 to 1.036 (S1 Table). The lambda of the discovery meta-analysis is shown in S1 Fig. After the meta-analysis, we excluded SNPs for which information was available in less than 50% of the studies or less than 50% of the total sample size. We report I² and P-value for heterogeneity for all findings.
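The meta-analysis step can be illustrated with a minimal sketch of a fixed-effects inverse-variance weighted estimate for one SNP, together with Cochran's Q and I². It is not the Metal implementation. The genomic control helper assumes the usual convention of inflating standard errors by the square root of lambda only when lambda exceeds 1. All names and input values are illustrative.

```python
import math


def apply_genomic_control(se, lam):
    """Inflate a standard error by sqrt(lambda); no correction if lambda <= 1."""
    return se * math.sqrt(max(lam, 1.0))


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis for one SNP."""
    weights = [1.0 / (se * se) for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Two-sided P-value from the normal distribution; erfc stays accurate
    # for the very small P-values typical of GWAS.
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    # Cochran's Q and I2 (percentage of variation due to heterogeneity).
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "p": p, "q": q, "i2": i2}


# Toy example (not study data): two studies with identical estimates.
result = ivw_meta([0.04, 0.04], [0.01, 0.01])
```

Combining two equally precise studies leaves the effect estimate unchanged and shrinks the standard error by a factor of the square root of 2, which is why a joint analysis of discovery and replication data has more power than either stage alone.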
The final dataset consisted of 8,228,795 autosomal SNPs. Genome-wide Complex Trait Analysis (GCTA) was used to select the independent SNPs for each locus [22]. We performed conditional analyses based on summary-level statistics and LD estimation between SNPs from the Generation R Study as a reference sample to select independently associated SNPs based on conditional P-values [22]. Forty-seven genome-wide significant or suggestive loci (P-values <5 × 10⁻⁸ and <5 × 10⁻⁶, respectively) were taken forward for replication in 15 replication cohorts. Fixed-effects inverse variance meta-analysis was performed for these 47 SNPs combining the discovery samples and all replication samples, giving a combined analysis beta, standard error and P-value (Table 1). SNPs that reached genome-wide significance in the combined analysis were considered to be genome-wide significant.

Functional mapping and annotation of genetic associations (FUMA)
To obtain predicted functional consequences for our 25 SNPs, we used SNP2FUNC in FUMA, a web-based platform to facilitate and visualize functional annotation of GWAS results (http://fuma.ctglab.nl) [3]. By matching chromosome, position, and reference and alternative alleles, combined annotation-dependent depletion (CADD) scores were annotated, indicating the deleteriousness of a SNP [37].
To annotate the nearest genes of the 25 SNPs in biological context, we used the GENE2FUNC option in FUMA, which provides hypergeometric tests of enrichment of a list of genes in 53 GTEx tissue-specific gene expression sets (GTEx v7) [3,31]. We used GENE2FUNC for two sets of genes: (1) the nearest genes of the 25 SNPs; and (2) genes located within 500 kb on either side of the 25 SNPs.
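The hypergeometric enrichment test mentioned above asks how likely it is to see at least the observed overlap between an input gene list and a reference gene set by chance. The sketch below shows only the upper-tail probability; the function name and arguments are hypothetical, and FUMA's actual background gene definition and multiple-testing correction are not reproduced here.

```python
from math import comb

def hypergeom_enrichment_p(n_background: int, n_in_set: int,
                           n_input: int, n_overlap: int) -> float:
    """P(X >= n_overlap) when n_input genes are drawn without replacement
    from n_background genes, of which n_in_set belong to the reference set."""
    upper = min(n_in_set, n_input)
    total = comb(n_background, n_input)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small value indicates that the input genes overlap the tissue-specific set more than expected for a random gene list of the same size.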

Look-up of the 25 SNPs in expression data
We studied the associations of the 25 SNPs associated with childhood BMI with gene expression levels in adipose tissue samples from the Leipzig Adipose Tissue Childhood Cohort [32]. These associations were examined in the following tissues: whole adipose tissue, isolated adipocytes and isolated stroma-vascular cells using genome-wide expression analysis (Illumina HumanHT-12 v4 arrays). Gene expression raw data of all 47,231 probes were extracted by Illumina GenomeStudio without additional background correction. Data were further processed within R / Bioconductor. Expression values were log2-transformed and quantile-normalised [71,72]. Batch effects of expression BeadChips were corrected using an empirical Bayes method [73]. Within pre-processing, gene-expression probes detected by Illumina GenomeStudio as expressed in less than 5% of the samples were excluded, as were probes still found to be significantly associated with batch effects after Bonferroni correction. Furthermore, gene-expression probes with poor mapping on the human transcriptome [74] were also excluded. In summary, these filters resulted in 23354, 21258, and 22637 valid gene-expression probes, of which 20672, 18956, and 20230 probes corresponded to 14455, 13518, and 14256 genes mapping to a unique position in the human genome (hg19) for whole adipose tissue, adipocytes, and stroma/vascular cells, respectively. Three criteria were used to remove samples of low quality. First, the number of detected gene-expression probes of a sample was required to be within ± 3 interquartile ranges (IQR) from the median. Second, the Mahalanobis distance of several quality characteristics of each sample had to be lower than median + 4 × IQR. Third, Euclidean distances of expression values as described [71] had to be lower than median + 4 × IQR.
Overall, of the assayed samples, 2, 4, and 2 samples were excluded for quality reasons, leaving 203, 63 and 69 unique individuals with valid data for eQTL analysis for whole adipose tissue, adipocytes, and stroma/vascular cells, respectively. Associations between the genotype and gene expression of genes in cis (respective gene area +/-1 Mb regarding transcription start and transcription end) were analyzed using a gene-dose-based linear regression model adjusted for age and sex as implemented in MatrixEQTL [75]. Variants within one haplotype were assessed through analyses of linkage disequilibrium using 1000 Genomes Phase 1 Version 3 and HapMap r28 hg19 CEU as references.
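The median-and-IQR rules used for sample quality control can be sketched as below. This is a simplified illustration: the function names are hypothetical, the quartile definition (Python's "inclusive" method) is an assumption because the text does not specify one, and the Mahalanobis and Euclidean distances themselves are taken as precomputed inputs.

```python
import statistics

def interquartile_range(values: list[float]) -> float:
    """IQR using the 'inclusive' quartile method (an assumed definition)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def within_iqr_band(values: list[float], k: float = 3.0) -> list[bool]:
    """First criterion: value lies within median +/- k * IQR."""
    med = statistics.median(values)
    spread = interquartile_range(values)
    return [abs(v - med) <= k * spread for v in values]

def below_upper_fence(values: list[float], k: float = 4.0) -> list[bool]:
    """Second and third criteria: distance is lower than median + k * IQR."""
    med = statistics.median(values)
    spread = interquartile_range(values)
    return [v < med + k * spread for v in values]
```

A sample would be retained only if it passes all three checks, one call per quality metric.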

Colocalization analysis
We used Bayesian colocalization analysis to examine evidence for colocalization between childhood BMI and eQTL signals (GTEx v7). Colocalization analyses were conducted using the R package coloc, http://cran.r-project.org/web/packages/coloc, as described previously [33]. Briefly, in each of the GTEx v7 tissues, all cis-eQTLs at FDR <5% were identified. For each eQTL, GWAS summary statistics were extracted for all SNPs that were present in >50% of the studies and >50% of the total sample size and that were common to both GWAS and eQTL studies, within 1 Mb of the transcription start site of the gene. For each such locus, colocalization analyses were done with default parameters, testing the following hypotheses [33]: H0, no association with either trait; H1 and H2, association with only one of the two traits; H3, association with both traits through two independent SNPs; and H4, association with both traits through one shared SNP. Support for each hypothesis was quantified in terms of posterior probabilities, defined at SNP level and indicated by PP0, PP1, PP2, PP3 or PP4, corresponding to the five hypotheses and measuring how likely these hypotheses were. S10B Table shows these posterior probabilities for all pairs. In most pairs, no evidence for association was found with either trait. Where association was observed, it was mostly with a single trait. We defined colocalization by restricting to pairs of childhood BMI and eQTL signals with a high posterior probability for colocalization, indicated by PP4/(PP3+PP4) >0.9 (S10A Table).
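The colocalization criterion can be expressed as a small helper. This sketch only post-processes posterior probabilities that a tool such as coloc has already produced; the function names are hypothetical, and returning a ratio of zero when PP3 and PP4 are both zero is an assumption made to avoid division by zero.

```python
def colocalization_ratio(pp3: float, pp4: float) -> float:
    """PP4 / (PP3 + PP4): support for one shared variant, given that
    both traits are associated at the locus."""
    denominator = pp3 + pp4
    if denominator == 0.0:
        return 0.0  # assumption: no support for either H3 or H4
    return pp4 / denominator

def is_colocalized(pp3: float, pp4: float, threshold: float = 0.9) -> bool:
    """Apply the PP4/(PP3+PP4) > 0.9 rule used to define colocalization."""
    return colocalization_ratio(pp3, pp4) > threshold
```

Conditioning on PP3 + PP4 means the ratio compares a shared variant against two distinct variants; it does not measure whether either trait is associated at the locus in the first place.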

DAVID
To explore biological processes, we used DAVID, with the 25 nearest genes as input, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [34,35].

Linkage-disequilibrium score regression
The use of LD score regression to estimate genetic correlations between two phenotypes has been described in detail previously [20]. Briefly, the LD score is a measure of how much genetic variation is tagged by each variant. A high LD score indicates that a variant is in high LD with many nearby polymorphisms. Variants with high LD scores are more likely to contain true signals and have a higher chance of overlap with genuine signals between GWAS. To estimate the genetic covariance, summary statistics from GWAS meta-analyses are used to calculate the cross-product of test statistics per SNP, which is regressed on the LD score:
\[ E[z_{1j} z_{2j}] = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\, l_j + \frac{\rho N_s}{\sqrt{N_1 N_2}} \]
where $z_{ij}$ is the test statistic of SNP $j$ in study $i$, $N_i$ is the sample size of study $i$, $\rho_g$ is the genetic covariance, $M$ is the number of SNPs in the reference panel with a MAF between 5% and 50%, $l_j$ is the LD score for SNP $j$, $N_s$ quantifies the number of individuals that overlap both studies, and $\rho$ is the phenotypic correlation amongst the $N_s$ overlapping samples. Sample overlap or cryptic relatedness between samples will only affect the intercept from the regression but not the slope. Estimates of genetic covariance will therefore not be biased by overlapping samples. Similarly, in case of population stratification, the intercept will be affected but it will have only minimal impact on the slope since population stratification does not correlate with LD between variants. Because of the correlation between the imputation quality and LD score, imputation quality is a confounder for LD score regression. Therefore, SNPs were excluded for the following reasons: MAF <0.01 and INFO ≤0.9. The filtered GWAS results were uploaded on http://ldsc.broadinstitute.org/ldhub/, a website with many GWAS meta-analyses available on which LD score regression has been implemented by the developers of the LD score regression method. In case multiple GWAS meta-analyses were available for the same phenotype, the genetic correlation with childhood BMI was estimated using the most recent meta-analysis.
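To make the regression concrete, the sketch below regresses the per-SNP cross-product of test statistics on the LD score and converts the slope into a genetic covariance. It is deliberately simplified: it uses unweighted ordinary least squares, whereas the published method uses a weighted regression with block-jackknife standard errors, and all function names are hypothetical.

```python
import math

def ols_slope_intercept(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def genetic_covariance(z1: list[float], z2: list[float], ld_scores: list[float],
                       n1: int, n2: int, m: int) -> tuple[float, float]:
    """Estimate the genetic covariance from the regression slope.

    The slope equals sqrt(N1 * N2) * rho_g / M, so rho_g = slope * M / sqrt(N1 * N2).
    The intercept absorbs sample overlap and is returned for inspection.
    """
    products = [a * b for a, b in zip(z1, z2)]
    slope, intercept = ols_slope_intercept(ld_scores, products)
    rho_g = slope * m / math.sqrt(n1 * n2)
    return rho_g, intercept
```

Because sample overlap enters only through the intercept, the slope-based estimate of the genetic covariance is unchanged when the two GWAS share participants.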
Genetic correlations are shown in Fig 2 and S11 Table.

Genetic risk score and percentage of variance explained
We combined the 25 genome-wide significant SNPs from the combined meta-analysis into a genetic risk score (GRS) by summing the number of BMI SDS-increasing alleles, weighted by the effect sizes from the combined meta-analysis. The GRS was rescaled to range from 0 to 50, the maximum number of BMI SDS-increasing alleles, and rounded to the nearest integer. Linear regression analysis was used to examine the associations of the risk score with childhood and adult BMI. For these analyses, data from the TRAILS cohort (N = 1,169), one of the largest replication cohorts, and data from the Rotterdam Study (RS-I-1; n = 5,957, RS-II-1; n = 2,147 and RS-III-1; n = 2,998) were used. Additionally, linear regression analysis was used to examine the associations of the GRS with birth weight and childhood metabolic phenotypes in Generation R, in which detailed information on these phenotypes was available. When calculating the risk score for the TRAILS cohort and Generation R, effect estimates from the combined meta-analysis were used after excluding TRAILS and Generation R, respectively, from the meta-analysis. The variance explained was estimated by the adjusted R² of the models.
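A weighted risk score of this kind can be sketched as follows. The exact rescaling formula is not given in the text, so the version here is an assumption: the weighted sum is divided by its maximum possible value and multiplied by the maximum allele count (50 for 25 SNPs). The function name is hypothetical, and effect sizes are assumed to be positive because alleles are coded as BMI SDS-increasing.

```python
def weighted_grs(dosages: list[float], betas: list[float]) -> int:
    """Weighted genetic risk score rescaled to 0..2*len(betas) and rounded.

    dosages: number of BMI SDS-increasing alleles per SNP (0 to 2).
    betas:   per-allele effect sizes, assumed positive.
    """
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    raw_score = sum(d * b for d, b in zip(dosages, betas))
    max_raw_score = 2.0 * sum(betas)   # every SNP homozygous for the risk allele
    max_alleles = 2 * len(betas)       # 50 when 25 SNPs are used
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return round(raw_score / max_raw_score * max_alleles)
```

Rescaling to the allele-count range keeps the score interpretable as an approximate number of risk alleles while preserving the weighting by effect size.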