Modelling BMI Trajectories in Children for Genetic Association Studies

Background: The timing of associations between common genetic variants and changes in growth patterns over childhood may provide insight into the development of obesity in later life. To address this question, it is important to define appropriate statistical models to allow for the detection of genetic effects influencing longitudinal childhood growth.
Methods and Results: Children from The Western Australian Pregnancy Cohort (Raine; n = 1,506) Study were genotyped at 17 genetic loci shown to be associated with childhood obesity (FTO, MC4R, TMEM18, GNPDA2, KCTD15, NEGR1, BDNF, ETV5, SEC16B, LYPLAL1, TFAP2B, MTCH2, BCDIN3D, NRXN3, SH2B1, MRSA) and an obesity-risk-allele-score was calculated as the total number of ‘risk alleles’ possessed by each individual. To determine the statistical method that fits these data and has the ability to detect genetic differences in BMI growth profile, four methods were investigated: linear mixed effects model, linear mixed effects model with skew-t random errors, semi-parametric linear mixed models and a non-linear mixed effects model. Of the four methods, the semi-parametric linear mixed model method was the most efficient for modelling childhood growth to detect modest genetic effects in this cohort. Using this method, three of the 17 loci were significantly associated with BMI intercept or trajectory in females and four in males. Additionally, the obesity-risk-allele score was associated with increased average BMI (female: β = 0.0049, P = 0.0181; male: β = 0.0071, P = 0.0001) and rate of growth (female: β = 0.0012, P = 0.0006; male: β = 0.0008, P = 0.0068) throughout childhood.
Conclusions: Using statistical models appropriate to detect genetic variants, variations in adult obesity genes were associated with childhood growth. There were also differences between males and females. This study provides evidence of genetic effects that may identify individuals early in life that are more likely to rapidly increase their BMI through childhood, which provides some insight into the biology of childhood growth.


Introduction
Obesity is a major global public health problem. The World Health Organization estimated in 2010 that there were at least 42 million overweight children under the age of 5 years and one billion overweight adults globally [1]. Childhood obesity is associated with poor mental [2,3,4,5] and physical health [6,7] and is one of the strongest predictors of adult obesity [8,9]. Adult obesity, in turn, increases the risk of many diseases including coronary heart disease, metabolic syndrome, some cancers, stroke, liver and gallbladder disease, sleep apnoea and respiratory problems, osteoarthritis and gynaecological problems [1]. It has been proposed that there are critical periods early in an individual's life for the development of obesity including gestation and early infancy, adiposity rebound and adolescence [10].
An individual's susceptibility to obesity is thought to result from a combination of their genetics, behaviours and environment. The heritability of obesity is estimated from family and twin studies to be between 40 and 80% [11,12,13], which appears to be age dependent with younger individuals having higher heritability estimates [14]. Genetic factors have an important role in childhood obesity, but their role may be different to those that operate in adulthood. Since the advent of genome-wide association studies (GWAS), common variants within 35 genes have been discovered to be associated with adult obesity [15,16,17,18,19] and a further 48 genes associated with population variation in body mass index (BMI) and weight [20,21,22,23,24,25,26] in individuals of European descent. In particular, common variants within the fat-mass and obesity associated (FTO) and melanocortin 4 receptor (MC4R) genes are associated with modest effects on BMI (0.2-0.4 kg/m² per allele) which translate into increased odds of obesity of 1.1-1.3 in adults [24,26,27,28,29]. However, the genomic regions discovered to date to be associated with BMI account for less than 1% of the total variance in BMI [30], leaving much of the estimated heritability unexplained.
In addition, relatively few studies have investigated the association between the adult BMI associated variants and childhood BMI [23,31,32,33,34]. Zhao et al [31] investigated the association between childhood BMI and 13 genomic loci reported to be associated with adult obesity and found that nine of the loci contribute to paediatric BMI between birth and 18 years of age. Subsequently, several authors have investigated the association between adult BMI loci and changes in growth over childhood. Hardy and colleagues [33] took variants from the two most commonly reported obesity genes, FTO and MC4R, to see if they were associated with life course body size. They found the association with BMI in both genes strengthened during childhood up until 20 years of age before weakening throughout adulthood. In 2010, Elks et al [34] used eight variants that showed individual associations with childhood BMI to create an obesity-risk-allele-score. This allele-score was strongly associated with early infant weight gain but also with weight gain over childhood. Finally, den Hoed et al [32] looked at BMI in childhood and adolescence against a larger subset of replicated SNPs representing the 16 BMI loci from the six genome-wide association studies in adults of white European descent [22,23,24,26,35,36]. Together, these studies begin to provide evidence that genetic loci associated with BMI in adulthood start having an effect in childhood and even infancy.
Obesity develops over a period of time so investigating the genetic determinants underlying this developmental process may provide insights into mechanisms of the genetic associations. Sophisticated longitudinal analyses allow questions to be addressed that cannot be determined from cross-sectional analyses. These longitudinal models assess patterns and duration of genetic effect at baseline and over a time period and the differences in means and rates of change of a trait. It is therefore important to investigate the genetic component of BMI trajectory in order to better understand some of the underlying biology of growth. The analysis of longitudinal growth curves allows one to identify specific stages in which genes play a central role.
A child's growth rate profile often contains important information regarding their genetic make-up and environmental exposures; however, BMI trajectories are difficult to model statistically due to the various changes in growth rate over childhood. Children tend to have rapidly increasing BMI from birth to approximately 9 months of age where they reach their adiposity peak; BMI then decreases until around the age of 5-6 years at adiposity rebound and then steadily increases again until after puberty where it tends to plateau through adulthood. These patterns of growth tend to be different in males and females where females often reach each of the 'landmarks' (adiposity rebound, puberty and plateau at adult BMI) at an earlier age than males. These changes over time within each individual, as well as the increasing variability over time of BMI between individuals, are often difficult to capture accurately in a statistical model. This is particularly the case when the aim is to detect modest genetic effects. The World Health Organization recently conducted research into statistical methods used to estimate growth curves over childhood and examined 30 previously published methods, of which only 7 could handle multiple measurements per child [37]. These methods range from non-linear, parametric curves [38] to non-linear, non-parametric methods where the form of the curve was allowed to differ for each subject [39,40] and from linear mixed-effects models for longitudinal normally distributed data [41,42] to a more general multilevel model, some with nonparametric components [43,44,45]. Although many methods have been previously used for growth modelling, not all are appropriate for genetic association analyses or modelling growth profiles in longitudinal birth cohorts.
We aim to compare various modelling approaches to assess the genetic effects of BMI growth through infancy, childhood and adolescence. To investigate the sensitivity of these different modelling frameworks to detect genetic effects, we will use the previously published adult obesity and BMI associated SNPs that have been shown to be associated with childhood BMI and explore their associations with childhood growth.

Subjects
The Western Australian Pregnancy Cohort (Raine) Study [46,47,48] is a prospective pregnancy cohort where 2,900 mothers were recruited prior to 18-weeks' gestation between 1989 and 1991. Recruitment took place at Western Australia's major perinatal centre, King Edward Memorial Hospital, and nearby private practices. The mothers completed questionnaires regarding the children and the children had physical examinations at average ages of 1, 2, 3, 6, 8, 10, 14 and 17 years. A DNA sample was collected at the 14 and 17 year follow-ups. A subset of 1,506 individuals were used for analysis in this study using the following inclusion criteria: at least one parent of European descent, live birth, unrelated to anyone in the sample (one of every related pair, including multiple births, was selected at random to exclude), no significant congenital anomalies, a DNA sample and at least one measure of body mass index (BMI) throughout childhood. Weight and height were measured at each follow-up by trained members of the research team [49]; weight was measured using a Wedderburn Digital Chair Scale to the nearest 100 g with children dressed in running shorts and a singlet top and height was measured to the nearest 0.1 cm with a Holtain Stadiometer. BMI was calculated from the weight and height measurements (median 6 measures per person, interquartile range 5-7, range 1-8 measurements), with a total of 8,986 BMI measures. The study was conducted with appropriate institutional ethics approval from the King Edward Memorial Hospital and Princess Margaret Hospital for Children ethics boards, and written informed consent was obtained from all mothers. The cohort has been shown to be representative of the population presenting to the antenatal tertiary referral centre in Western Australia [48].
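As a small worked illustration of the derived outcome, BMI is weight in kilograms divided by the square of height in metres. The sketch below assumes weight recorded in kg and height in cm, as described above; the function name and the example child are hypothetical and are not taken from the Raine data.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index (kg/m^2) from weight in kilograms and height in centimetres."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # stadiometer readings are in cm
    return weight_kg / (height_m ** 2)


# Hypothetical child: 25.3 kg and 123.4 cm
example_bmi = bmi(25.3, 123.4)
```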

Genes
We wanted to investigate markers that have an effect on childhood BMI and, more importantly, on change in BMI over childhood, so we selected the 17 genetic variants published in den Hoed et al [32]. These SNPs were first discovered to be associated with adult BMI and replicated in at least one study against childhood BMI and change in BMI growth over childhood. At the time of selecting SNPs for this study, they were the largest set of SNPs shown to be associated with BMI over childhood and adolescence. We did not include loci that have been shown to be associated with only obesity risk but not BMI. Subsets of these 17 SNPs (either the same SNPs or a SNP in high LD [r² > 0.8]) were also presented by Elks et al [34] and Hardy et al [33], who showed associations with changes in growth over childhood. Genetic information on these 17 published genetic variants was available for individuals in our sample, either directly genotyped SNPs (rs925946 (BDNF), rs10913469 (SEC16B), rs2605100 (LYPLAL1), rs987237 (TFAP2B), rs10838738 (MTCH2), rs7138803 (BCDIN3D) and rs10146997 (NRXN3)) or from the best guess genotype data imputed against HapMap release 22 (rs2815752 (NEGR1), rs6548238 (TMEM18), rs7647305 (ETV5), rs10938397 (GNPDA2), rs613080 (MRSA), rs1488830 (BDNF), rs8055138 (SH2B1), rs1121980 (FTO), rs17782313 (MC4R) and rs11084753 (KCTD15)).
Genotyping and quality control have been described elsewhere [50]. Briefly, our sample was genotyped using the genome-wide Illumina 660 Quad Array. Genotyping was performed on the Illumina BeadArray Reader at the Centre for Applied Genomics, Toronto, Canada using 250 nanograms of DNA. The genotype data were cleaned using standard thresholds (HWE p-value > 5.7 × 10⁻⁷, call rate > 95% and minor allele frequency > 1%). Individual level genotype data were extracted for those SNPs of interest that were directly genotyped by the chip and passed QC measures. Imputation of un-typed or missing genotypes was also performed using MACH v1.0.16 for all 22 autosomes with the CEU samples from HapMap Phase2 (Build 36, release 22) used as a reference panel.
Two variants in the BDNF gene were investigated as they have previously been shown to be independently associated with obesity [22] (r² = 0.11). The 17 SNPs are described in Table S1, including the available sample size with complete data for each SNP. These 17 SNPs were used to investigate the sensitivity of each method to detect genetic variants in terms of point estimates and standard errors (SEs) across various time points (for those methods that could be compared). Each SNP was incorporated into the model independently assuming an additive genetic effect for the obesity risk allele. In addition, an 'obesity-risk-allele score' was created on the subset of individuals with complete genetic data by summing the number of risk alleles an individual had (n = 1,219) [51]. The alleles were not weighted by their effect size as this has previously been shown to only have limited benefit [52].
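The unweighted allele score described above can be sketched as follows. This is an illustration only: it assumes each genotype has already been coded as the count (0, 1 or 2) of the obesity-risk allele, and that individuals with any missing genotype are excluded, as in the complete-data subset. The function name and the example genotypes are hypothetical.

```python
from typing import Dict, Optional


def risk_allele_score(genotypes: Dict[str, Optional[int]]) -> Optional[int]:
    """Unweighted score: total number of risk alleles across SNPs.

    Each value is the risk-allele count (0, 1 or 2) for one SNP.
    Returns None if any genotype is missing (complete data only).
    """
    total = 0
    for snp, count in genotypes.items():
        if count is None:
            return None  # individual is excluded from the score analysis
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2")
        total += count
    return total


# Hypothetical genotypes for three of the 17 SNPs
child = {"rs1121980": 2, "rs17782313": 1, "rs6548238": 0}
score = risk_allele_score(child)
```

With all 17 SNPs supplied, the score ranges from 0 to 34.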

Statistical Analysis
Four popular methods were compared to assess the accuracy of estimation of BMI growth trajectories and the ability to detect genetic effects influencing these trajectories. These methods included: Linear Mixed Effects Model (LMM) [41], the Skew-t Linear Mixed Effects Model (STLMM) [53,54,55], Semi-Parametric Linear Mixed Models (SPLMM) and a Non-Linear Mixed Model (NLMM), also known as SuperImposition by Translation and Rotation (SITAR) [40]. Although there are many possible statistical methods that could be utilized in this context, these methods were chosen as they allow for adjustment of potential confounders, appropriately account for the complex correlation structure between the repeated measures, allow for incomplete data on the assumption that data are missing at random, and are computationally feasible in the context of candidate gene and genome-wide association studies. Once the best fitting model was defined for each method, the model fit for each of the methods was compared. A small simulation study was also conducted using resampling techniques based on 1,000 non-parametric bootstrap data sets with replacement [56] from the Raine data and calculating an R 2 statistic for each method fit to these simulated datasets.
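The resampling step can be sketched as below. This is a minimal illustration rather than the code used in the study: it assumes that children (not single measurements) are resampled with replacement so that each child's repeated measures stay together, and it uses an ordinary least-squares line as a stand-in for the mixed models. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean, quantiles


def r_squared(observed, fitted):
    """Proportion of variance in the observed values explained by the fitted values."""
    centre = mean(observed)
    ss_total = sum((y - centre) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_resid / ss_total


def bootstrap_r2(data, fit_predict, n_boot=1000, seed=1):
    """data maps child id -> list of (age, bmi) pairs.

    fit_predict takes the resampled rows and returns one fitted value per row.
    """
    rng = random.Random(seed)
    child_ids = list(data)
    values = []
    for _ in range(n_boot):
        # Resample whole children with replacement
        sampled = [rng.choice(child_ids) for _ in child_ids]
        rows = [row for cid in sampled for row in data[cid]]
        fitted = fit_predict(rows)
        values.append(r_squared([b for _, b in rows], fitted))
    return values


def straight_line(rows):
    """Stand-in model (least-squares line of BMI on age), NOT one of the four methods."""
    ages = [a for a, _ in rows]
    bmis = [b for _, b in rows]
    age_bar, bmi_bar = mean(ages), mean(bmis)
    sxx = sum((a - age_bar) ** 2 for a in ages)
    sxy = sum((a - age_bar) * (b - bmi_bar) for a, b in zip(ages, bmis))
    slope = sxy / sxx
    return [bmi_bar + slope * (a - age_bar) for a in ages]


# Toy data for three hypothetical children
toy = {
    1: [(1, 17.0), (8, 16.0), (17, 21.5)],
    2: [(1, 16.5), (8, 17.2), (17, 23.0)],
    3: [(2, 16.0), (10, 18.1), (14, 20.4)],
}
r2_values = bootstrap_r2(toy, straight_line, n_boot=200)
q1, median, q3 = quantiles(r2_values, n=4)  # quartiles of the bootstrap R^2
```

The spread between `q1` and `q3` plays the role of the interquartile range of R² used to judge how data dependent a method's fit is.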
LMM. The LMM with a polynomial function is a common tool for growth curve analysis with continuous repeated measures. For a set of time points 1, …, t, the time trend in the sample can be described by a (q−1)st-degree polynomial function, with q ≤ t. The growth curve LMM for the j-th individual at the t-th time point, with the time scale measured by age, is as follows:
$$\mathrm{BMI}_{jt} = \sum_{i=0}^{q-1} \beta_i \left(\mathrm{Age}_{jt} - \overline{\mathrm{Age}}\right)^{i} + \sum_{k=0}^{q-1} u_{kj} \left(\mathrm{Age}_{jt} - \overline{\mathrm{Age}}\right)^{k} + e_{jt}$$
where $\overline{\mathrm{Age}}$ is the mean age over the t time points in the sample (i.e. 8 years), $\beta_i$ are the parameter estimates for the fixed effects, $u_{kj}$ are the parameter estimates for the random effects, assumed multivariate normal, and the $e_{jt}$ are the error terms, assumed normally distributed N(0, Σ), where Σ is the within-individual correlation matrix. Both age and the natural log transformation of age were considered as the time component to identify the optimal underlying scale. Both fixed (i) and random (k) effects up to a polynomial of degree 3 were tested for significance. Several within-individual correlation structures were considered, including autoregressive, continuous autoregressive, exchangeable (compound symmetric) and unstructured. Following the guidelines outlined in Cheng et al [57], the initial saturated model, which included a cubic function of age for both the fixed and random effects and BMI on the natural log scale, was used to compare covariance (random effects) matrices. Initially, likelihood ratio tests (LRT) were used to assess the required degree of polynomial function for the random effects to fit the data accurately, while keeping the fixed effects the same and specifying an independence correlation matrix for the random effects. Next, a similar approach was used to investigate within-individual correlation structures in addition to the random effects.
Finally, models with both untransformed and natural log transformed age were compared using diagnostic plots such as fitted versus observed values, fitted versus residual values and the distributions of both random effects and error terms.
STLMM. The assumption of multivariate normal random effects and within-subject errors is often violated, particularly when modelling the childhood growth curve. This may lead to biased estimation of fixed effects and their SEs and thus to wrong statistical inference, in particular of the genetic association-related parameters. A common approach to achieve normality is to transform the response variable but generally there is not a unique transformation that could be used and the results of the analyses might depend on the transformation used. To avoid transforming the response and still obtain a valid inference under a non-normal distribution assumption for the response, we utilised an extension of the LMM model assuming a multivariate t distribution for the error terms, $e_{jt}$, and a multivariate skew-normal distribution for the random effects. The resulting model for the response over the t time points is multivariate skew-t with specific parameters that account for the asymmetry (skewness parameters) and long tail (degrees of freedom of the t distribution) of the response distribution [54]. The specification in terms of fixed and random effects was identical to the LMM. No transformations were applied to either BMI or age as the skewness in the data was accounted for by the model structure.
SPLMM. Semi-parametric linear mixed models make use of smoothing splines, which yield a smoother growth curve estimate than the polynomial function in the LMM when fitting non-linear relationships. The basic model for the j-th individual and time-point t is as follows: Where $\kappa_k$ is the k-th knot and $(t - \kappa_k)_+ = 0$ if $t \le \kappa_k$ and $(t - \kappa_k)$ if $t > \kappa_k$, which is known as the truncated power basis that ensures smooth continuity between the time windows.
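For orientation, one common way of writing a spline mixed model on the truncated power basis is sketched below. It is a generic form consistent with the definitions above, not necessarily the exact equation used in this study. The polynomial degree p, the number of knots K, the spline coefficients b_k and the random effects u_ij are notation assumed here for illustration.

```latex
\mathrm{BMI}_{jt} = \sum_{i=0}^{p} \beta_i t^{i}
  + \sum_{k=1}^{K} b_k \,(t - \kappa_k)_+^{p}
  + \sum_{i=0}^{p} u_{ij} t^{i} + e_{jt},
\qquad (t - \kappa_k)_+ = \max(0,\; t - \kappa_k)
```

Each truncated term is zero up to its knot and contributes a degree-p polynomial afterwards, so the curve can change shape at each knot while remaining continuous.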
Various numbers and positions of knots and the degree of polynomial between knots were compared to find the best fit to the data. Knot points were initially estimated visually from both individual profiles and the population average curve in males and females separately. To optimise the number and placement of the knot points, we fit a series of models with the knot points placed at 6-month intervals around the estimated knot points and incorporated additional knot points to see if they improved the model fit. The model with the lowest Akaike Information Criterion (AIC) was selected as the final model. Finally, we investigated the degree of polynomial, up to the third degree, required for each spline, once again selecting the best model with the lowest AIC.
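The knot search described above can be illustrated with a short sketch. It assumes a truncated power basis with a common polynomial degree for all segments and selection of the candidate with the lowest AIC. The function names are hypothetical, and the log-likelihoods and parameter counts in the example are invented for illustration, not results from the Raine data.

```python
def truncated_power_basis(age, knots, degree=3):
    """Spline covariates for one age: [1, age, ..., age^degree, (age - k)_+^degree, ...]."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    polynomial = [float(age) ** d for d in range(degree + 1)]
    # (age - knot)_+ is zero at or below the knot and (age - knot) above it
    truncated = [max(0.0, age - knot) ** degree for knot in knots]
    return polynomial + truncated


def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion; lower values indicate a preferred model."""
    return 2 * n_parameters - 2 * log_likelihood


def best_by_aic(candidates):
    """candidates maps a label to (log_likelihood, n_parameters); returns the best label."""
    return min(candidates, key=lambda label: aic(*candidates[label]))


# Covariates for a 10-year-old with knots at 2, 8 and 12 years
row = truncated_power_basis(10.0, knots=[2.0, 8.0, 12.0])

# Hypothetical fits of two candidate knot configurations
candidates = {
    "knots at 2, 8, 12": (-1510.2, 11),
    "knots at 2, 8": (-1525.9, 10),
}
chosen = best_by_aic(candidates)
```

If age is centred before fitting, the knots would need to be expressed on the same centred scale.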
NLMM. The SITAR method [40] was recently defined to summarize height growth in puberty (in particular peak height velocity) and estimate subject-specific parameters that can be used to investigate relationships with earlier exposures and later outcomes. The SITAR model (referred to here as NLMM) has a single fitted curve at the population level and individual level estimates of mean differences in size (shifting up or down of the BMI curve), growth tempo (left-right shift of the curve on the age scale) and velocity (shrinking or stretching of the age scale).
The basic model for the growth curves is:
$$y_{it} = a_i + h\!\left(\frac{t - b_i}{\exp(-c_i)}\right)$$
Where:
$y_{it}$ = growth of subject i at age t.
$h(t)$ = natural cubic spline curve of growth vs. age.
$a_i$ = random growth intercept that adjusts for differences in mean height (size).
$b_i$ = random growth intercept to adjust for difference in timing (tempo).
$c_i$ = random age scaling adjusting for the duration of the growth spurt (velocity).
This model was fit with the three parameters (size, tempo and velocity) as random effects, size and velocity as fixed effects, and h(t) a natural cubic spline curve with 3 to 8 degrees of freedom (df) fitted as fixed effects. BMI and age were fitted both untransformed and natural log transformed, to identify the best fit to the data. Model fits to the data were compared using AIC, deviance and residual standard deviation. The estimates for the three parameters (size, tempo and velocity) were extracted for each individual and used for genetic analyses.
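To make the roles of the three subject-specific parameters concrete, the sketch below applies them to a population curve. It assumes the usual SITAR form, y = a + h((t − b)/exp(−c)), and uses a straight line as a stand-in for the natural cubic spline h. The function name and all numbers are hypothetical.

```python
import math


def sitar_predict(t, h, size=0.0, tempo=0.0, velocity=0.0):
    """Subject-specific curve: size + h((t - tempo) / exp(-velocity)).

    size shifts the curve up or down, tempo shifts it along the age axis,
    and velocity stretches or shrinks the age scale.
    """
    return size + h((t - tempo) / math.exp(-velocity))


def population_curve(age):
    """Stand-in population mean curve (a straight line, not a fitted spline)."""
    return 16.0 + 0.3 * age


average_child = sitar_predict(10.0, population_curve)
larger_child = sitar_predict(10.0, population_curve, size=1.5)
later_child = sitar_predict(10.0, population_curve, tempo=1.0)
faster_child = sitar_predict(10.0, population_curve, velocity=math.log(2))
```

With all three parameters at zero the subject follows the population curve; a tempo of 1.0 means the child reaches each point of the curve one year later.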
Given that growth curves differ greatly between males and females, particularly around puberty, and because different genes may influence the timing of growth spurts in males and females, sex stratified models were used for all analyses. Age was mean centred prior to analysis. Due to the possibility of population stratification in our sample given our sampling criteria of at least one parent of European descent, a sensitivity analysis was conducted adjusting the genetic analyses for the first five principal components generated in the EIGENSTRAT software [58]. No adjustment for multiple testing has been made as our goal was to estimate a combined effect of SNPs that have already been validated in previous studies and shown to be significantly associated with childhood BMI and growth. All analyses were conducted in R version 2.12.1 [59]; the spida library was used for the SPLMM models and the sitarlib library was used for the NLMM models. To enable comparison between the four methods, maximum likelihood estimation was used for all mixed models. Genetic loci were considered associated with BMI if the global likelihood ratio test was significant at the α < 0.05 level.
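A likelihood ratio test of this kind can be sketched as follows. The sketch assumes two nested models fitted by maximum likelihood, one without and one with the SNP terms, and refers twice the difference in log-likelihood to a chi-square distribution whose degrees of freedom equal the number of SNP terms added. The function names and the log-likelihood values are hypothetical; the chi-square tail probability uses standard closed forms for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be >= 0 and df a positive integer")
    if x == 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def lrt_p_value(loglik_null, loglik_full, df):
    """P-value for the full model against the nested null model."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_null))
    return chi2_sf(statistic, df)


# Hypothetical log-likelihoods: SNP main effect plus SNP-by-age, age^2 and age^3 terms (df = 4)
p_value = lrt_p_value(loglik_null=-2051.3, loglik_full=-2045.1, df=4)
associated = p_value < 0.05
```

Maximum likelihood, rather than restricted maximum likelihood, is what makes the two log-likelihoods comparable when the models differ in their fixed effects.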

Population Characteristics
Of the 1,506 children in the analysis, there were 773 males (51%) and 733 females. Table 1 gives the characteristics of the Raine sample used in the analysis. At birth, these babies were similar to the Western Australian population of births, with an average birth weight of 3.35 kg (SD = 0.59 kg) and gestational age of 39.35 weeks (SD = 2.11 weeks); 25.21% of them were born to mothers who smoked throughout pregnancy and 8.77% were born preterm. The mothers on average gained 8.79 kg (SD = 3.78) throughout pregnancy and breast fed their infant for an average of 6 months (IQR = 2-12 months). On average, the infants gained 6.98 kg (SD = 1.17 kg) in the first year of life.

Model Fitting and Comparisons
The optimal model for each method was defined before any cross-method comparisons were conducted. The selected models for each method are summarized in Table 2.
LMM. The optimal LMM model for both males and females was based on ln(BMI) and untransformed age, with a cubic polynomial of age in the fixed effects, a quadratic polynomial of age in the random effects and a continuous autoregressive correlation structure of order one. Hence, the final model for both females and males was:
$$\ln(\mathrm{BMI}_{jt}) = \sum_{i=0}^{3} \beta_i \left(\mathrm{Age}_{jt} - \overline{\mathrm{Age}}\right)^{i} + \sum_{k=0}^{2} u_{kj} \left(\mathrm{Age}_{jt} - \overline{\mathrm{Age}}\right)^{k} + e_{jt}$$
STLMM. The LMM model defined previously was used for this method; however, BMI was modelled on the untransformed scale as the method accounts for the skewness and kurtosis of the BMI distribution. The model would not converge with both linear and quadratic age components in the random effects so this was reduced to only linear age. This was the most computationally intensive method to fit as it uses an expectation-maximization (EM) algorithm for parameter estimation, and hence took the longest time to converge.
SPLMM. For females, the optimal model had three knot points placed at two, eight and 12 years with a cubic slope for each spline. The males displayed a similar curve to the females, also with three knots at two, eight and 12 years and a cubic slope between each knot.
NLMM. The optimal model for females had a natural cubic spline curve with three degrees of freedom and both BMI and age on the natural log transformed scale. Similarly, the optimal model for males was with BMI and age on the natural log transformed scale but with four degrees of freedom for the natural cubic spline curve.
Comparisons. Table 3 displays the measures of fit used to compare methods: R², R² from 1,000 simulated datasets, observed-fitted values, number of SNPs detected and computational time. The R², in conjunction with the interquartile range of variation of R² estimated through simulations, clearly favours the SPLMM as the best model fit for the females. The R² estimates from the simulations indicate that although the STLMM method has higher R² for both females and males, the interquartile range is much larger for the STLMM method, indicating the model fit is more data dependent than the other methods, which is not desirable for generalization to other cohorts. The conclusion for the males is not as simplistic as the R² is largest for the STLMM,   tive inference about genetic associations. The male residuals displayed a similar pattern to females, although there were fewer obvious outliers. In addition, as there was less skewness in the males, the STLMM method deviated from the expected t distribution but in the opposite direction to that of the females, whereby the low values of BMI are underestimated. Based on model fit, all four methods were adequate in modelling childhood growth curves; however, the SPLMM was slightly better than the other methods at accounting for outliers and had the best model fit.

Genetic Results
Of the 17 SNPs, a likelihood ratio test indicated the LMM method detected one significant association in the females and three in males at the 5% level of significance, the STLMM method detected three in females and four in males, the SPLMM detected three in females and four in males and finally the NLMM method detected no significant SNPs in either females or males for the size parameter but 2 significant SNPs for the velocity parameter in males. Results of all 17 SNPs can be found in Tables S2 (females) and S3 (males). The first five principal components for population stratification were not significantly associated with BMI in any of the four methods and the genetic results of the 17 SNPs remained consistent when adjusting for them (data not shown).
The obesity-risk allele score based on the genotypes at each of the 17 loci was normally distributed and showed an approximately linear association with BMI across childhood, based on the mean BMI (95% confidence interval) for each score at each age (Figure 2). When incorporating the risk-allele score into the four longitudinal models, it was associated with increasing BMI in females using all four methods; however, only three methods detected an association in males (Table 4). For the females, the LMM, STLMM and SPLMM methods all detected an increase in BMI per allele increase in the obesity-risk-allele-score (LMM β = 0.0046, P = 0.0216; STLMM β = 0.0492, P = 0.0410; SPLMM β = 0.0049, P = 0.0181), in addition to an increase in linear slope over time (LMM β = 0.0012, P = 0.00002; STLMM β = 0.0153, P = 0.00003; SPLMM β = 0.0012, P = 0.0006). No significant associations in the LMM, STLMM or SPLMM methods were detected for the quadratic interactions with the risk-allele score; however, the cubic interaction was significant in the LMM (β = −0.00001, P = 0.0067) and STLMM (β = −0.0001, P = 0.0236). This indicates that, according to the LMM and STLMM methods, females with higher allele scores plateau to adult BMI at an earlier age. In contrast, the NLMM method in both females and males was unable to detect a significant association with an increase in size or velocity, but did detect a decrease in tempo (assumed to be adiposity rebound) for each increase in risk allele. In the males, the LMM, STLMM and SPLMM methods also detected an increase in BMI (LMM β = 0.0073, P = 0.0001; STLMM β = 0.0423, P = 0.0481; SPLMM β = 0.0071, P = 0.0001) and BMI/year per allele increase (LMM β = 0.0010, P = 0.0001; STLMM β = 0.0083, P = 0.0070; SPLMM β = 0.0008, P = 0.0068).
No significant associations in the LMM, STLMM or SPLMM methods were detected for the quadratic and cubic interactions with the risk-allele score, indicating that the shape of the curve is consistent across the score categories.
Further analysis focused on the SPLMM model, as this method was shown to give the best fit to these data. There are potentially different genetic pathways leading to increased growth rate in males and females as SNPs from different genes are associated with BMI trajectory; in females, SNPs in the NRXN3, BDNF and MRSA genes were significantly associated with BMI trajectory whereas in males FTO, NRXN3, GNPDA2 and TMEM18 were significant. Figure 3 displays the population average curves for individuals with 15, 17 or 18 (25th, 50th and 75th percentile) obesity-risk alleles. The growth curves in each of the genders show different patterns; females begin their trajectory smaller than males, they have an earlier rebound, and by the age of 18 years they are beginning to plateau at their potential adult BMI. In contrast, males go through puberty at a slightly later age resulting in their BMI continuing to increase at the age of 18 years. It is apparent that the genetic effect begins later for females, around seven and a half years (P = 0.03), than males at four years (P = 0.02) (Figure 4).

Discussion
The current study has shown that of the four statistical methods evaluated, the semi-parametric linear mixed model (SPLMM) method was the most efficient for modelling childhood growth to detect modest genetic effects in the longitudinal pregnancy cohort study investigated. In addition, we have shown that there are potentially different genetic pathways leading to increased growth rate in males and females and that the obesity-risk-allele score increases both average BMI and rate of growth throughout childhood.
There are several different statistical methods that can be used to model childhood growth. We selected four methods that would allow for adjustment of potential confounders, appropriately account for the correlation between the repeated measures, allow for incomplete data, and were computationally feasible in the context of candidate gene studies and GWAS. The evidence suggested that the SPLMM method does a better job at accounting for the variation in BMI growth than the LMM as it had a smaller residual standard deviation. The SPLMM and NLMM methods produce similar differences between observed and fitted values. The LMM and STLMM methods have a larger range, which indicates the prediction of BMI for each individual over time is worse using both of these methods, introducing bias whereby they overestimate low BMI values and underestimate high BMI values. As seen in the residual plots, there are a small number of outliers in this dataset, which are highly influential for both the LMM and STLMM and will affect their ability for accurate prediction. Furthermore, the estimates of skewness from the STLMM model were relatively large ( Of the 17 genetic variants associated with adult BMI and obesity risk that we investigated, the SPLMM method was able to detect a higher proportion of associations with childhood growth in both males and females than the other methods. The NLMM method performed poorly in both males (five significant tests of 51) and females (two significant tests of 51), consistent with it being more conservative than the other three methods. The STLMM method detected a number of genetic effects; however, it was a more computationally intensive method, which would prove difficult in larger scale genetic studies such as genome-wide association studies. Moreover, it is not as flexible as the other methods in terms of extensions to evaluate gene-environment or gene-gene interactions.
The current study provides evidence that the SPLMM method is the most effective method to detect genetic associations and allows the flexibility for extensions into large scale and more complex genetic analyses.
Single genetic loci typically have small effects on complex diseases or explain only a small proportion of the variability in a quantitative trait; therefore, major increases in disease risk are expected from simultaneous exposure to multiple genetic risk variants. A post hoc power calculation using 1,000 non-parametric bootstrap simulations based on the Raine data indicated that this study had 97% power to detect the FTO locus rs1121980 with MAF = 0.41, which has one of the larger effect sizes on BMI, and still had 83% power to detect a more realistic, smaller effect such as the BDNF SNP rs1488830 association in females with MAF = 0.21. In contrast, the power to detect an association with the allele score, combining all risk alleles, was 95% in males and in females separately. The current study is the first to investigate, separately in males and females, an association between 17 published obesity-risk loci as an allele score and BMI trajectory throughout childhood and adolescence. Hoed et al [32] used a similar approach with a 17-loci allele score but focused on two cross-sectional association analyses in pre-/early pubertal children and adolescents. By using a longitudinal design, the current study reduced the number of genetic association tests conducted from eight in a cross-sectional setting to one per gender, reducing the need to adjust for multiple testing and, with it, the risk of missing important genetic loci. A second study by Elks et al [34] evaluated the association between adult obesity risk genes and growth throughout childhood using a smaller subset of obesity susceptibility loci and with analyses only up to age 11 years. Both studies conducted analyses adjusting for gender; however, this does not allow each gender to have a different growth trajectory, nor does it allow investigation of differences in the timing of the genetic effects. We found substantial differences between males and females in the timing of the adiposity rebound and plateauing towards adulthood. 
Additionally, we detected genetic effects that differed in timing and size between genders. By combining males and females into one analysis, these genetic differences may have been averaged out and the biology underlying the differences may remain undetected.
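The idea behind the post hoc bootstrap power calculation can be sketched as follows: individuals are resampled with replacement, the association test is repeated on each resample, and power is estimated as the proportion of resamples in which the test is significant. This is a deliberately simplified illustration and not the analysis used in the study: it assumes a single cross-sectional outcome and an ordinary least-squares slope test with a normal approximation, rather than the longitudinal SPLMM. The function names, the 5% significance level, the risk-allele frequency of 0.3 and the simulated cohort are all hypothetical.

```python
import math
import random


def slope_p_value(x, y):
    """Two-sided p-value for the slope of y ~ x (normal approximation to the t-test)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 1.0                     # predictor has no variation: nothing to test
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha0 = my - beta * mx
    rss = sum((yi - alpha0 - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return 0.0                     # perfect fit
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))


def bootstrap_power(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Share of bootstrap resamples (individuals drawn with replacement) with p < alpha."""
    rng = random.Random(seed)
    n = len(x)
    hits = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        if slope_p_value([x[i] for i in idx], [y[i] for i in idx]) < alpha:
            hits += 1
    return hits / n_boot


# Hypothetical cohort: allele score = number of risk alleles across 17 loci
# (two alleles per locus, assumed risk-allele frequency 0.3), and an outcome
# on an arbitrary scale with an assumed per-allele effect of 0.1.
rng = random.Random(42)
n_children, n_loci = 300, 17
score = [sum(rng.random() < 0.3 for _ in range(2 * n_loci)) for _ in range(n_children)]
outcome = [0.1 * s + rng.gauss(0, 1) for s in score]

power = bootstrap_power(score, outcome)
print(f"estimated power: {power:.2f}")
```

Running the same procedure separately on the male and female subsets would mirror the sex-stratified design of the study.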
A recent longitudinal study investigating the life-course effects of variants in the FTO gene and near the MC4R gene demonstrated that the effects strengthen throughout childhood and peak at age 20 before weakening during adulthood [33]. We detected a similar pattern with the obesity-risk allele score throughout childhood, where the effect begins at around four years of age in males and seven years of age in females and increases in size each year. One limitation of the current study is that the cohort currently only has data available up to 18 years of age. It will be of interest to follow the cohort in order to investigate how the combined effect of these SNPs changes as the cohort progresses into adulthood. Further, it would be valuable to confirm that the SPLMM method is the most appropriate statistical method in other cohorts investigating the genetic determinants of childhood growth and the patterns of association across the life course.
Further studies are now required to assess the validity of these findings and to extend them, for example to interactions between genes and the environment. Interactions, both gene-gene and gene-environment, are an important area of research that is critical for understanding the mechanisms underlying obesity. We performed a small simulation study using re-sampling techniques, based on 1,000 non-parametric bootstrap data sets drawn with replacement from the Raine data, and calculated the power to detect a gene-gene interaction. Two SNP combinations were investigated to gather an understanding of the range of power in our study; these were the two most commonly reported BMI-associated loci, FTO rs1121980 (MAF = 0.41) by MC4R rs17782313 (MAF = 0.23), and two loci with large minor allele frequencies, FTO rs1121980 by NEGR1 rs2815752 (MAF = 0.38). Based on these simulations, our study had 58.0% power to detect an interaction between two SNPs with larger minor allele frequencies (FTO*NEGR1) and effect sizes (FTO 0.019 kg/m²; NEGR1 0.011 kg/m²), assuming a multiplicative model for the interaction. However, the power decreases rapidly, to 4.6%, with smaller minor allele frequency (FTO*MC4R) and effect sizes (FTO 0.0044 kg/m²; MC4R 0.0020 kg/m²). We therefore believe that our study was not appropriately designed to detect gene-gene or gene-environment interactions, and that meta-analyses of multiple cohorts might be a better way to tackle this problem.
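One common way to write a multiplicative gene-gene interaction, and the corresponding bootstrap estimate of power, is sketched below. The notation is illustrative and is not taken from the original analysis: $G_{1i}$ and $G_{2i}$ are assumed to be risk-allele counts (0, 1 or 2) for child $i$ at the two loci, $f(t_{ij})$ stands in for whatever function of age the growth model uses at measurement $j$, $u_i$ is a child-specific random effect, $p_{12}^{(r)}$ is the p-value for the interaction term in bootstrap replicate $r$, $R$ is the number of replicates (1,000 here), and $\alpha$ is the chosen significance level.

```latex
\mathrm{BMI}_{ij} = f(t_{ij}) + \beta_1 G_{1i} + \beta_2 G_{2i}
                  + \beta_{12}\,\bigl(G_{1i} \times G_{2i}\bigr) + u_i + \varepsilon_{ij},
\qquad
\widehat{\mathrm{power}} = \frac{1}{R} \sum_{r=1}^{R} \mathbf{1}\!\left\{ p_{12}^{(r)} < \alpha \right\}
```

Under this form, the product $G_{1i} \times G_{2i}$ is non-zero only for children carrying risk alleles at both loci, so lower minor allele frequencies leave fewer informative individuals, which is consistent with the drop in power reported for FTO*MC4R.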
In conclusion, we have shown that although all four statistical methods investigated were appropriate for modelling growth curves in childhood, the SPLMM method was the most efficient in these data in terms of predicted values and detection of genetic effects. Further, we have shown that there is some evidence that genetic variations in established adult obesity-associated genes are associated with childhood growth; however, these effects differ by gender and in their timing. This study provides further evidence of genetic effects that may identify individuals early in life who are more likely to rapidly increase their BMI through childhood, which provides some insight into the biology of childhood growth.