Longitudinal Genome-Wide Association of Cardiovascular Disease Risk Factors in the Bogalusa Heart Study

Cardiovascular disease (CVD) is the leading cause of death worldwide. Recent genome-wide association (GWA) studies have pinpointed many loci associated with CVD risk factors in adults. It is unclear, however, if these loci predict trait levels at all ages, if they are associated with how a trait develops over time, or if they could be used to screen individuals who are pre-symptomatic to provide the opportunity for preventive measures before disease onset. We completed a genome-wide association study on participants in the longitudinal Bogalusa Heart Study (BHS) and have characterized the association between genetic factors and the development of CVD risk factors from childhood to adulthood. We report 7 genome-wide significant associations involving CVD risk factors, two of which have been previously reported. Top regions were tested for replication in the Young Finns Study (YF) and two associations strongly replicated: rs247616 in CETP with HDL levels (combined P = 9.7×10−24), and rs445925 at APOE with LDL levels (combined P = 8.7×10−19). We show that SNPs previously identified in adult cross-sectional studies tend to show age-independent effects in the BHS with effect sizes consistent with previous reports. Previously identified variants were associated with adult trait levels above and beyond those seen in childhood; however, variants with time-dependent effects were also promising predictors. This is the first GWA study to evaluate the role of common genetic variants in the development of CVD risk factors in children as they advance through adulthood and highlights the utility of using longitudinal studies to identify genetic predictors of adult traits in children.


Introduction
Cardiovascular disease (CVD) affects over 79 million people in the United States [1], and is the leading cause of death worldwide [2][3][4]. Identifying the genetic determinants of CVD can lead to more effective diagnostics, prognostics, therapeutics, and, ultimately, preventive strategies. The best chance for prevention would be to identify risk at the earliest possible age. Genome-wide association (GWA) leveraging cross-sectional phenotypic data has been a particularly useful approach to identifying loci that influence many of the quantitative risk factors of CVD [5][6][7][8][9][10], however the use of cross sectional data does not provide insight into how such risk factors develop over time. Longitudinal studies, particularly those that begin in childhood, allow for the identification of risk profiles of susceptible individuals before disease onset. The Bogalusa Heart Study (BHS) is a longitudinal study focused on the early natural history of CVD. The BHS began in 1973 and includes up to 9 phenotypic screenings in childhood (4-17 years of age) and up to 10 adult (18-48 years of age) cross-sectional screenings. We have conducted a longitudinal genome-wide association study on a subset of the total sample of unrelated individuals with a large number of measurements (mean number of measurements = 8, range = 4-13) and are of European Ancestry (N = 525).

Longitudinal GWA
We conducted a genome-wide association study of longitudinal measures of 12 traits measured from childhood through adulthood on participants of the BHS of European ancestry: anthropomorphic (height, weight, and waist circumference), blood pressure (BP) (diastolic and systolic BP), heart rate, blood lipids (low density lipoprotein cholesterol (LDL), high density lipoprotein cholesterol (HDL), total cholesterol (TC), and triglycerides), and metabolic traits (glucose and insulin). Genotyping was performed on the Illumina Human610 and HumanCVD BeadChips [11] for a total of 545,821 SNPs passing QC and allele frequency filters (see Materials and Methods). Imputation was performed using the CEU HapMap 2 as a reference population with the computer program MACH v.1.0.16 (http://www.sph.umich.edu/csg/yli/ mach/) [12], providing genotype estimates for an additional 1,622,114 SNPs. For each SNP, we tested whether it had an average linear effect over time (SNP effect), and whether it entered into a time-dependent effect (SNPxAGE interaction effect), such that the genotype is associated with variation in the linear trajectory of the trait from childhood through adulthood. Both SNP and SNPxAGE effects were calculated using linear mixed models as implemented in the R nlme package [13], adjusting for age and gender. Table 1 lists all regions showing SNP effect associations (P,10 26 ) and Table 2 lists all regions showing association (P,10 26 ) with SNPxAGE effects. We analyzed the regions surrounding the top associations for consistency with recombination hotspots and LD relationships ( Figure S1) and provide Manhattan plots of each trait association ( Figure S2). From both sets of analyses, there were 5 novel associations with a P-value less than 5610 28 and 6 novel regions where there were at least 10 genotyped or imputed SNPs with P,10 25 . The most significant association (rs7890572, P = 3.8610 210 ) was observed with a linear triglyceride trajectory effect (i.e., SNPxAGE effect) on the X chromosome within the IL1RAPL1 gene and near the gene

Author Summary
We have studied the association between genetic factors on a whole genome level and cardiovascular disease (CVD) risk factors in a population of individuals studied from childhood through adulthood. The longitudinal study design has enabled the investigation of genetic variation influencing trait values over time. We have identified DNA variants that are associated with CVD trait values consistently over time, and a second set of variants that are associated with CVD trait values in a time-dependent manner. We also show that variants previously identified in adult populations have consistent effects within our population and that these effects are usually similar across childhood through adulthood. The discovery of timedependent variants that influence CVD trait values over time can potentially be used to screen young individuals who are pre-symptomatic and provide the opportunity for preventive measures decades before disease onset. encoding glycerol kinase (GK), in which mutations have been implicated in pseudo-hypertriglyceridemia, caused by high levels of glycerol creating measurement artifacts in the triglyceride assay [14]. A novel association of potential biological interest involved a SNP effect on insulin levels with variation in the CHN2 locus (rs3793275, P = 5.8610 29 ), a beta-chimerin that has recently been described as part of a fusion gene also containing the insulin receptor that was shown to be responsible for severe insulin deficiency [15]. This SNP is also associated with glucose trajectories in our dataset (SNPxAGE; P = 1.5610 27 ). In the 7q11 region, 25 SNPs are associated (P,10 25 ) with diastolic BP (SNP effect; peak SNP rs709595, P = 7.0610 27 ). The calcitonin gene-related peptide receptor (CRCP) is approximately 200 kb from the top SNP, but contains SNPs that are in LD with the top SNP (see Figure S1). The calcitonin gene-related peptide is a vasodilator [16] and its receptor CRCP has been previously implicated in hypertension in a small candidate gene association study of hypertension in Japanese individuals [17].
In addition to novel associations, there were three regions showing SNP associations that have been previously identified in GWA studies: rs853773 [18] near G6PC2 was associated with a glucose SNP effect (P = 7.0610 27 ), rs247616 [5] near CETP was associated with an HDL SNP effect (P = 6.6610 27 ), and the APOE e2 SNP rs7412 [19] was associated with a genome-wide significant LDL SNP effect (P = 1.6610 28 ). A region near APOA5 that had been previously implicated in triglyceride levels showed a significant SNPxAGE effect on triglycerides in our study (rs12280753; P = 1.8610 28 ). Although the nearest gene to rs12280753 is not APOA5, this SNP was also the most strongly associated SNP in previous studies of adult triglyceride levels [5,10,20].

Replication in the Young Finns
We pursued replication of these findings in genotyped individuals within the Young Finns (YF) cohort, consisting of 2,442 Finnish individuals tracked from childhood through middle adulthood (ages 3-45) with three measures in young individuals (ages 3-24) and two measures in older individuals (ages 24-45). These individuals have been genotyped on a custom-built Illumina genotyping chip (670K). Using the same analysis methods, we tested whether the top SNP was associated in the YF study (Table 3). Imputed genotype dosages were used when direct genotype data was not available. For the APOE-e2 SNP rs7412, which is not in HapMap or on the 670K chip, we used the SNP with the next strongest association in the BHS (rs445925). There were two SNPs that significantly replicated beyond the multiple testing threshold (P,0.05/51 = 1610 23 ): the rs247616 SNP at CETP (P = 1.7610 218 ), and rs445925 at APOE (P = 4.1610 215 ). There was no trend to replicate the direction of effect between the studies: within the SNP effects, there were 12/21 (57%, chi-square P = 0.51) markers that showed the same direction of effect, while within SNPxAGE effects, there were 14/30 (47%, chi-square P = 0.72). The samples were combined and P-values were calculated for the combined BHS and YF data, using study as a covariate ( Table 3). The associations at rs247616 at CETP with HDL-cholesterol (P = 9.7610 224 ) and rs445925 at APOE with LDL-cholesterol (P = 8.7610 219 ) were strongly significant, but no other regions in the combined BHS/YF data reached genomewide significance of P,5610 28 .

Prediction of adult values given childhood values
Genetic variants will be most useful for trait prediction when they are associated with a trait above and beyond other known risk factors. In addition, the ability to predict adult trait levels in children, before disease onset, can lead to a disease prevention strategy. In longitudinal studies starting in childhood and going into adulthood, we can ask whether genetic loci are associated with the adult trait level above and beyond the trait level seen in the first measure taken in childhood. To test this hypothesis, we evaluated whether our associated markers were likely to be predictive of adult levels of the traits, after adjustment for trait levels in childhood. To account for variation in data collection, we also included the age at each of these measures as well as gender as covariates in the analysis. Within the BHS, variants that were characterized as SNPxAGE effects were more likely to be predictive of adult values after correcting for childhood values, which is expected since these variants were characterized in BHS initially (Table 4). In the YF study, however, we also saw more SNPxAGE variants associated with adult levels given childhood levels (Table 4). There were 6 variants that were associated with adult levels in the YF study at P,0.05, with 2 corresponding to the genome-wide significant SNP effects and 4 corresponding to BHS SNPxAGE variants. Only the association of rs445925 with LDLcholesterol was strong enough to withstand multiple corrections. Further analysis of this observation is warranted in a larger cohort.

Previously identified markers
We assessed whether associations that have been described in previous adult cross-sectional GWA studies exhibit consistent effects over time and whether the effect sizes observed in children through middle-aged adults are consistent with those previously described. We identified 169 SNP-trait associations (see Materials and Methods) for which we had directly genotyped or imputed genotype data. We first estimated our power to detect each previous association at alpha = 0.05 under a more structured, but similar study design (i.e., 8 equally spaced measurements), given the previously reported effect size and allele frequency. Under this model, we would expect to have detected 40/169 (24%) associations at P,0.05, and we observed a similar number of SNP effects in the BHS data (32/169; 19%). We evaluated the associations across all traits together by comparing how well the previously reported effect size was recapitulated in the BHS GWA ( Figure 1A). For consistency across studies and traits, if an effect size was not already expressed in terms of percent standard deviation (%SD), we converted the previously reported effect size into %SD and compared the previous effect size to the SNP effect. The previously reported effect size was a strong predictor of the SNP effect (slope = 0.47, P = 1.2610 221 ), suggesting that SNPs that have been previously identified in adult cross-sectional GWA studies are good predictors of time-averaged effects in the BHS sample.
We also determined whether the same previously identified SNPs were likely to show effects on a trait over time (SNPxAGE effects). Under a simple model that assumed that all of the effect in adults is due to a locus that has no effect in childhood, we estimated power to detect such an interaction effect in a similarly structured study with 8 repeated measures. Given these assumptions, we would have expected to see 24/169 (14%) SNPxAGE associations. We observed 6/169 (3.6%) SNPs that showed SNPxAGE effects at P,0.05, indicating that effects seen in SNPs described in adult GWA studies are not due primarily to differences in effects over time, although larger studies will be required to definitively characterize this.

Composite scoring
We considered whether a composite genotype score would better predict overall CVD risk factor trajectories or timedependent effects than any single locus. For each person and each trait, we created a score by summing the expected effect in percent standard deviation of each allele that the person carried. We then determined whether the score was associated with the trait's average value and trajectory by using this score as a predictor for each trait in a linear mixed model, adjusting for age and gender. We assessed the score's average effect across time (score effect) and whether or not there was a time-dependent effect (score*age effect). The traits HDL, LDL, total cholesterol, triglycerides, and height showed strongly significant score effects, while only triglycerides showed a score*age effect (Table 5). Longitudinal data was visualized by color-coding the individuals according to the decile of their overall score and the average linear trend of each group was plotted (LDL, Figure 1B and others in Figure S3). These results indicate that the cumulative effects of SNPs that are identified in large adult cross-sectional studies are generally age-independent effects, with an exception in triglycerides, which was the only trait to show a significant score*age effect. We additionally tested whether previously identified variants were predictive of adult levels after adjusting for childhood levels ( Table 6). We saw that 25/169 (14.8%) showed association at P,0.05. These observations in the BHS data suggest that even though results from existing GWA studies demonstrate ageindependent effects, they can be predictive of trait values in adults.

Discussion
We identified seven associations at P,5610 28 showing either time-averaged or time-dependent effects on CVD risk factors in the BHS, two of which have been previously characterized. Of all associations with P,10 26 , we were able to strongly replicate the association in the YF with HDL-cholesterol at CETP with a combined P = 9.7610 224 , and LDL-cholesterol at APOE with a combined P = 8.7610 219 . Differences that exist between the cohorts, such as birth year (15 year difference), and environmental differences could have influenced replication of the remaining SNPs. Larger discovery studies will also have better resolution and We evaluated the longitudinal effects of markers that have been previously identified in adult GWA studies. We found that previously identified markers showed time-averaged effects consistent with their reported effect size. This argues that the linear mixed model is an effective tool for modeling time-averaged effects in a GWA setting and that adult GWA studies may be capturing variation that tends to have consistent effects over time. Using a scoring approach, the overall signal from previously identified markers tended to have strong associations with time-averaged effects, but except in the case of triglycerides, did not show timedependent effects. Previously identified markers were also likely to be associated with adult trait levels above and beyond childhood levels. Although we primarily describe time-averaged effects for previously identified markers, there may be more subtle timedependent effects that larger studies will be better able to capture.
It is important to note that although we focused on analysis of linear trends over time, a linear model may not best capture these trends. Other approaches could be explored further such as nonlinear models when there is an a priori expectation of trait trajectory, or model free approaches. These additional models could lead to additional variations that influence trajectories, or more precise estimations of effect size.
Longitudinal studies are particularly suited to capturing effects that vary over time. Genetic variation that shows a timedependent effect may help predict those that will go onto develop disease before they show symptomatic traits. The discovery of variants associated with SNPxAGE interaction effects could thus be used to screen young individuals who are pre-symptomatic and provide the opportunity for preventive measures decades before disease onset. We explored how well the markers that we identified predicted adult traits after correcting for childhood traits and suggest further study of variants with SNPxAGE effects as possibly better predictors of adult trait levels above and beyond childhood levels. These results are consistent with the idea that longitudinal studies may be a useful tool to better capture time-dependent variation that could ultimately be better predictive of future outcomes.

Ethics statement
The study was approved by the institutional review board and the ethics committee of each institution. Written informed consent was obtained from each participant in accordance with institutional requirements and the Declaration of Helsinki Principles. All subjects in the BHS gave informed consent at each examination, and for those under 18 years of age, consent of a parent/guardian was obtained. Study protocols were approved by the Institutional Review Board of the Tulane University Health Sciences Center.

The Cardiovascular Risk in Young Finns Study (YF)
The YF cohort is a Finnish longitudinal population study sample on the evolution of cardiovascular risk factors from childhood to adulthood [21]. The first cross-sectional study was conducted in 1980 in five centers and included 3,596 participants in the age groups of 3, 6, 9, 12, 15, and 18, who were randomly chosen from the national population register. After baseline in 1980 these subjects have been re-examined in 1983 and 1986 as young individuals, and in 2001 and 2007 as older individuals. Genotype data for the present analysis (DNA collected in 1980, 2001 and 2007) was available for 2,442 individuals.
In the latest follow-up in 2001, a total of 2,283 participants (of which DNA is available from 2,265 participants) were examined for numerous study variables, including serum lipoproteins, glucose, insulin, obesity indices, blood pressure, life-style factors, smoking status, alcohol use and general health status.

Genotyping & QC
BHS genotyping. We genotyped 1,202 BHS samples using the Illumina Human610 Genotyping BeadChip [22], and HumanCVD BeadChip [11]. Genotypes were called using a clustering algorithm in Illumina's BeadStudio software. Three samples on the Human610 BeadChip gave poor results (call rates ,99%) and were discarded from the study. In addition, 3 samples had a different estimated gender from genotype data versus gender provided with the phenotype data and were also discarded. SNPs with call rates ,90% were discarded, and SNPs with call rates between 90-95% or cluster separation score ,0.3 were manually  Assessing cryptic relatedness. Bogalusa participants with genotype data were filtered for relatedness. Whole-genome genotype data was used to calculate identity-by-descent (PI_HAT) values in PLINK [24]. Individuals were then removed such that no pair of individuals retained a PI_HAT value greater than 0.10. PI_HAT values were consistent with known sibling and half-sibling relationships. The final list consisted of 525 BHS individuals.   In the YF data, there were 546,770 SNPs and 2,496 individuals which were utilized to generate an identity-by-descent (IBD) matrix file in PLINK [24]. There were 51 pairs of individuals with pi-hat greater than 0.2 thus these individuals removed due to possible relatedness. One of the pair was removed using greater missingness as criteria. The final list consisted of 2,442 YF subjects.
Imputation. We imputed genotypes in genotyped BHS individuals for all HapMap (phase II, release 22) SNPs using the program MACH [12]. The best estimate of the quantitative allele dosage was used as the predictor in association tests. The CEU HapMap phased haplotypes were used as a reference (N = 60 unrelated individuals). This resulted in overall allelic error rates of 1.6%. SNPs were filtered for minor allele frequency (,5%) and r 2 with respect to genotyped SNPs (,0.30), resulting in genotype data in a total of 2,173,391 SNPs. Imputation was performed in the YF samples using MACH with the HapMap release 22 CEU haplotypes as reference.

Prediction ability
Previously identified markers were obtained through the NHGRI database [25] (accessed 5/20/09). Marker associations, alleles, and allele frequencies were verified with those reported in the original papers and corrected if required. Markers were used if the alleles at the locus provided unambiguous orientation or if the allele frequencies were different enough between A/T and C/G SNPs to distinguish which allele was the associated allele. We thus excluded any A/T or C/G SNPs with a minor allele frequency .0.4 and required that the allele frequency in the previously reported study be within 10% of the allele frequency in the BHS. We excluded studies of non-European Ancestry origin. One SNP per cytogenic region was used for each phenotype: the SNP with the smallest previously reported p-value was used.
Effect size was translated to percent standard deviation. If the effect size was reported in an absolute measure (e.g. cm for height), then the standard deviation from the BHS study was used. Standard deviation was calculated from the standard error of the SNP association reported in the linear mixed model. For glucose, cholesterol, and triglycerides measures, units were converted to mg/dl before converting to %SD.
A risk value was calculated for each individual based on the imputed genotype and previously reported effect size, converted to %SD. The %SD was multiplied by the allelic dosage for each SNP and summed over all the associated SNPs for each phenotype. The resulting risk value was then used as a predictor for the BHS individuals.

Genome-wide association
GWA was performed using linear mixed model regression with fixed covariates of age and sex, random slope, and random intercept. Genotypes were coded as 0,1, or 2 when the SNP was genotyped and by dosage (scale 0-2) when imputed. Analysis was performed within the nlme package in R [13]. Covariance structures were determined by testing all spatial covariance structures (exponential, Gaussian, linear, rational quadradics, and spherical) with covariates and a sample of SNPs, and picking the structure that best fit the data as measured by the lowest AIC (Akaike Information Criteria) value. SNP and SNPxAGE interaction effects were estimated separately. Although the default nlme optimizer tended to have difficulty converging, we obtained good results by using the optim optimizer on data where all missing data was removed. The number of SNPs that converged and for which we obtained results is listed in Table S1. Analyses were performed on a compute cluster with 600,000 tests taking ,3 hrs on 64 processors.  Filtering for genomic inflation If genomic inflation factors were inflated or deflated, we reran the GWA using the first four MDS components as covariates. If the inflation factor was still less than 0.90 or greater than 1.05, we removed the analysis. In addition, we filtered body mass index (BMI) SNP, BMI SNPxAGE, and weight SNP analyses completely from the analysis due to a combination of consistently inflated or deflated genomic inflation factors or a long list of highly associated SNPs.

Power
Power was calculated using G*Power 3 [26].We used the MANOVA repeated measures module with 8 repeated measures with a correlation of 0.5 between them, similar to the correlations seen in this study. We estimated power for between-factor and between-within interaction effects. Effect size (f) was calculated as f~ffi ffiffiffiffiffiffiffiffiffiffiffiffi R 2 1{R 2 r 26 ½ and R 2 was calculated from the allele frequencies as reported in the original associations (p and q) and the effect size in terms of %SD [27]. Figure S1 Regional plots of top SNP and SNPxAGE associations. Regions are ordered by phenotype and significance as in Table 1 Tables 1, 2, and 3. Chromosomes are plotted in alternating blue and grey. P-values greater than 0.001 are not plotted. Found at: doi:10.1371/journal.pgen.1001094.s002 (8.19 MB PDF) Figure S3 Longitudinal profiles of cumulative score from previously identified SNPs. Individuals were scored based on the effect size of each previously identified marker as in Figure 1B. Individuals are grouped and color-coded based on the decile of their score. Linear lines were calculated using linear regression with all points from all individuals in a given decile.