Genetic Determinants of Lipid Traits in Diverse Populations from the Population Architecture using Genomics and Epidemiology (PAGE) Study

Over the past five years, genome-wide association studies (GWAS) have identified hundreds of common variants associated with human diseases and traits, including high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglyceride (TG) levels. Approximately 95 loci associated with lipid levels have been identified, primarily among populations of European ancestry. The Population Architecture using Genomics and Epidemiology (PAGE) study was established in 2008 to characterize GWAS-identified variants in diverse population-based studies. We genotyped 49 GWAS-identified SNPs associated with one or more lipid traits in at least two PAGE studies and across six racial/ethnic groups. We performed a meta-analysis testing for SNP associations with fasting HDL-C, LDL-C, and ln(TG) levels in self-identified European American (∼20,000), African American (∼9,000), American Indian (∼6,000), Mexican American/Hispanic (∼2,500), Japanese/East Asian (∼690), and Pacific Islander/Native Hawaiian (∼175) adults, regardless of lipid-lowering medication use. We replicated 55 of 60 (92%) SNP associations tested in European Americans at p<0.05. Despite sufficient power, we were unable to replicate ABCA1 rs4149268 and rs1883025, CETP rs1864163, and TTC39B rs471364 previously associated with HDL-C and MAFB rs6102059 previously associated with LDL-C. Based on significance (p<0.05) and consistent direction of effect, a majority of replicated genotype-phenotype associations for HDL-C, LDL-C, and ln(TG) in European Americans generalized to African Americans (48%, 61%, and 57%), American Indians (45%, 64%, and 77%), and Mexican Americans/Hispanics (57%, 56%, and 86%). Overall, 16 associations generalized across all three populations. For the associations that did not generalize, differences in effect sizes, allele frequencies, and linkage disequilibrium offer clues to the next generation of association studies for these traits.


Introduction
Since its introduction in 2005, the genome-wide association study (GWAS) design has become a powerful tool in human genetics to identify single nucleotide polymorphisms (SNPs) associated with common diseases or traits using an experimental design that does not require a priori biological knowledge. As of September 2010, more than 1,000 SNPs across the genome have been reported as genome-wide significant (p≤5×10⁻⁸) for 165 traits [1]. An early analysis of the GWAS-reported SNPs demonstrated that most identified variants were intergenic or intronic [2], suggesting either novel biology or that the functional variant has yet to be found.
A major goal of the Population Architecture using Genomics and Epidemiology (PAGE) study is to determine whether GWAS-identified variants generalize to diverse groups drawn from population-based studies [30]. Generalization is defined here as a significant association (p<0.05, uncorrected for multiple testing) in a non-European population with a direction of genetic effect matching that in European Americans. In PAGE, variants identified in GWAS and well replicated in multiple studies are chosen for targeted genotyping in hundreds to thousands of European Americans (∼20,000), African Americans (∼9,000), American Indians (∼6,000), Mexican Americans/Hispanics (∼2,500), Japanese/East Asians (∼690), and Native Hawaiians/Pacific Islanders (∼175). All samples are linked to extensive demographic, health, and exposure data, making the PAGE study a rich resource for post-discovery generalization and characterization for common human diseases and traits.
We present here PAGE study data on the replication and generalization for 49 SNPs associated with three common lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides. Each of these three traits has numerous GWAS published in European ancestry individuals [30][31][32][33][34][35][36][37][38][39][40][41][42][43] but only a handful published in other populations (such as Asians [44] and Micronesians [45]). Additional data are just now emerging from large sample sizes of diverse populations for generalization [32][46][47][48][49][50][51] and fine-mapping [52] of these lipid GWAS-identified SNPs. We demonstrate that the majority of the targeted GWAS-identified SNPs replicate in European Americans in PAGE and that many generalize to diverse populations. Both power and LD are explored as explanations of non-generalization, highlighting the complexities involved in properly interpreting results of even robust genetic associations such as these.

Study population characteristics
The PAGE study sites are diverse across multiple variables (Table 1 and Table S1). Together, the PAGE study consists of several populations: European Americans, African Americans, Mexican Americans/Hispanics, American Indians, Japanese/East Asians, and Native Hawaiians/Pacific Islanders. All PAGE study sites except WHI ascertained both men and women. Participant age varies widely across PAGE. For example, CHS ascertained on average older adults (median age = 74 and 72 years for European and African Americans, respectively), CARDIA ascertained younger adults (median age = 26 and 24.5 years for European and African Americans, respectively), and NHANES ascertained all ages of adults (18 years to 90 years; median age = 51, 39, and 40 years for European, African, and Mexican Americans, respectively). In addition to demographic differences, lifestyles and health differed across the PAGE study sites by population, including lipid lowering medication use and current smoking status. More Japanese participants ascertained by MEC reported lipid lowering medication use compared with other populations ascertained by other PAGE study sites: 38.3% versus ∼5-10%. American Indians from the Dakotas reported more smoking (42.2-47.8%) than other American Indians (25-33%) or other PAGE study site populations (6.3% to 35.3%). The differences in demographics, lifestyle, and health characteristics observed across the PAGE study sites and populations are reflected in the three traits studied here (Table S1). Given the diversity observed across the PAGE study sites, we performed all tests of association for HDL-C, LDL-C, and triglycerides unadjusted, minimally adjusted (for age and sex), and adjusted for various demographic, lifestyle, and health variables.

Allele frequencies
Coded allele frequencies are presented in Table 2, Table 3, Table 4 and in Figure S1, by population. We calculated the Pearson correlation coefficient (r) and F_ST between European American coded allele frequencies and all other groups. The highest correlation was observed in the comparison with Mexican Americans/Hispanics (0.97) followed by American Indians (0.92), Native Hawaiians/Pacific Islanders (0.90), Japanese/East Asians (0.87), and African Americans (0.84). Compared with European Americans, the proportion of SNPs with F_ST values greater than 0.15 was smallest in Mexican Americans/Hispanics (0/49 SNPs) and largest in African Americans (6/49 SNPs; 12%) followed by Japanese/East Asians (5/46 SNPs, 11%). F_ST values were small for the remaining populations compared to European Americans, with 3% and 7% of SNPs with F_ST values greater than 0.15 for American Indians and Native Hawaiians/Pacific Islanders, respectively.
A striking example of population differences in allele frequencies is FADS1 rs174547. The T allele of FADS1 rs174547 is the major allele in three populations (allele frequency = 0.66, 0.91, and 0.59 in European Americans, African Americans, and Japanese/East Asians, respectively), but is the minor allele in the other three populations (allele frequency = 0.39, 0.21, and 0.42 in Mexican Americans/Hispanics, American Indians, and Native Hawaiians/Pacific Islanders, respectively). Compared to European Americans, F_ST for this SNP was largest in American Indians (0.34) followed by African Americans (0.15).
We also compared allele frequencies between the various PAGE study sites, within each racial/ethnic group. As demonstrated in Figure S2, the allele frequencies of European Americans, African Americans, and Mexican Americans/Hispanics do not differ substantially across PAGE studies (allele frequencies differ by less than ±0.10). In contrast, over half of the SNPs genotyped in American Indians had allele frequency differences greater than ±0.10, with three SNPs with allele frequencies that differed by more than ±0.25. Comparisons are more difficult in Japanese/East Asians and Native Hawaiians/Pacific Islanders, as many SNPs were genotyped by only one PAGE study in these two racial/ethnic groups.
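The pairwise comparison of allele frequencies can be illustrated with a minimal sketch. This section does not state which F_ST estimator was used, so the ratio form below (squared frequency difference over between-population heterozygosity, without sample-size correction) is an assumption, as are the function names `fst_two_populations` and `pearson_r`. Applied to the FADS1 rs174547 frequencies quoted above, this form gives roughly 0.34 for European Americans versus American Indians, in line with the reported value; for European Americans versus African Americans it gives about 0.17, somewhat above the reported 0.15, so the published calculation may have used different weighting or unrounded frequencies.

```python
from math import sqrt


def fst_two_populations(p1: float, p2: float) -> float:
    """Ratio-style F_ST for one biallelic SNP from two allele frequencies.

    Assumed estimator: (p1 - p2)^2 / (p1*(1 - p2) + p2*(1 - p1)),
    with no correction for sample size.
    """
    numerator = (p1 - p2) ** 2
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        # Both populations are fixed for the same allele: no differentiation.
        return 0.0
    return numerator / denominator


def pearson_r(xs, ys) -> float:
    """Pearson correlation between two equal-length lists of frequencies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# FADS1 rs174547 T-allele frequencies as quoted in the text.
ea, aa, ai = 0.66, 0.91, 0.21
fst_ea_ai = fst_two_populations(ea, ai)
fst_ea_aa = fst_two_populations(ea, aa)
print(f"EA vs AI: {fst_ea_ai:.3f}; EA vs AA: {fst_ea_aa:.3f}")
```

In practice `pearson_r` would be called once per population with the full vector of coded allele frequencies for all genotyped SNPs, paired with the European American vector.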

Author Summary
Low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglyceride (TG) levels are well known independent risk factors for cardiovascular disease. Lipid-associated genetic variants are being discovered in genome-wide association studies (GWAS) in samples of European descent, but insufficient data exist in other populations. Therefore, there is a strong need to characterize the effect of these GWAS-identified variants in more diverse cohorts. In this study, we selected over forty genetic loci previously associated with lipid levels and tested for replication in a large European American cohort. We also investigated whether the effect of these variants generalizes to non-European descent populations, including African Americans, American Indians, and Mexican Americans/Hispanics. A majority of these GWAS-identified associations replicated in our European American cohort. However, the ability of associations to generalize across other racial/ethnic populations varied greatly, indicating that some of these GWAS-identified variants may not be functional and are more likely to be in linkage disequilibrium with the functional variant(s).

Table 4. Meta-analysis of GWAS-identified Triglyceride SNPs.

respectively, across European American populations collected by individual PAGE study sites (Table S2). For HDL-C, 23 of the 27 (85%) SNPs tested were associated at p<0.05 assuming an additive genetic model and adjusting for age and sex (Figure 1 and Table 2). The four SNPs that did not replicate at this liberal significance threshold were rs471364 (TTC39B), rs1883025 (ABCA1), rs4149268 (ABCA1), and rs1864163 (CETP), all of which are intronic (Table S2). For LDL-C, only one (intergenic MAFB rs6102059) of the 19 SNPs tested was not significantly associated at p<0.05 (Figure 1 and Table 3). Finally, for ln(TG), all 14 SNPs tested were associated at p<0.05 (Figure 1 and Table 4).
Each SNP was tested for an association with the indicated trait assuming an additive genetic model adjusted for age and sex. Meta-analysis was performed, and p-values (−log10 transformed) of the meta-analysis are plotted along the y-axis using Synthesis-View [73,74]. SNP location is given on the x-axis. Each triangle represents a meta-analysis p-value for each population. Populations are color-coded as follows: European Americans (blue; EA), African Americans (red; AA), Mexican Americans/Hispanics (orange; MA/H), and American Indians (purple; AI). Large triangles represent p-values at or smaller than genome-wide significance (p<10⁻⁸).

Of the associations that did not replicate in the European-descent populations from PAGE, four out of five had sufficient power (>80%) to detect the previously reported effect size: TTC39B rs471364 (>99% power; HDL-C), CETP rs1864163 (80% power; HDL-C), MAFB rs6102059 (>90% power; LDL-C), and ABCA1 rs4149268 (99% power; HDL-C). ABCA1 rs1883025, which did not replicate the expected association with HDL-C, did not have sufficient power to detect the reported effect size (68% power; n = 3,865).
We then compared the genetic effect sizes reported in the literature to the genetic effect sizes estimated from the meta-analysis of these population-based studies. We observed that the majority of the point estimates of effect size (β) were smaller than previously reported estimates. Using the HDL-C association results as an example, 15 out of the 23 (65%) significant associations had effect estimates smaller than published effect estimates. We caution, however, that we did not formally test for significant differences between estimates and that these smaller effect estimates may or may not be significantly different from the published reports. However, it is interesting to note that 11 of our effect estimates differed from previous reports by more than 25%, including two HDL-C associations whose effect sizes differed by 50% or more from those in the literature (ANGPTL4 rs2967605 and MLXIPL rs17145738; Table 2 and Table S2).

Associations in non-European-descent populations
We meta-analyzed tests of association performed in African Americans for the same 27, 19, and 14 SNPs previously associated with HDL-C, LDL-C, and/or triglycerides in populations of European descent. For all three traits studied, assuming an additive genetic model and adjusting for age and sex, approximately half of the tested GWAS-identified SNPs were associated at p<0.05: 12/27 (44%) for HDL-C, 11/19 (58%) for LDL-C, and 8/14 (57%) for ln(TG) (Figure 1, Figure S3, Table 2, Table 3, Table 4, Table 5). The majority of SNPs that failed to replicate in the meta-analysis for European Americans also failed to associate in the meta-analysis for African Americans. Interestingly, one SNP (CETP rs1864163) was significantly associated with HDL-C …
Other populations that were examined for select SNPs included American Indians, Mexican Americans/Hispanics, Japanese/East Asians, and Native Hawaiians/Pacific Islanders. Among American Indians, 9/21 (43%), 10/14 (71%), and 10/13 (77%) of the SNPs tested for association with HDL-C, LDL-C, and ln(TG), respectively, were associated at the liberal significance threshold of p<0.05. For Mexican Americans/Hispanics, 14/27 (52%), 10/19 (53%), and 12/14 (86%) SNPs were significantly associated at p<0.05 with HDL-C, LDL-C, and ln(TG), respectively. Despite a small sample size, intronic CETP rs1864163 was significantly associated with HDL-C in Mexican Americans/Hispanics (n = 265; CAF = 0.28; β = −2.98; p = 1.78×10⁻²) but not in European Americans (n = 291; CAF = 0.27; β = −2.07; p = 0.13), although the size and the direction of effect were similar. Venn diagrams representing the overlap of significant associations across the four major PAGE populations are presented in Figure S3.
The sample sizes for Japanese/East Asians and Native Hawaiians/Pacific Islanders are considerably smaller compared with the other populations examined. Despite the lower power to detect associations, significant associations were observed for both groups at a liberal significance threshold of p<0.05. Among the 26, 18, and 13 SNPs tested for associations with HDL-C, LDL-C, and ln(TG), respectively, there were nine (35%), three (17%), and three (23%) SNPs significantly associated in the combined Japanese/East Asian group.
For Native Hawaiians/Pacific Islanders, the group with the smallest sample size considered here, one SNP each was associated with HDL-C (APOA1/C3/A4/A5 gene cluster rs28927680) and LDL-C (APOB rs754523) out of the 24 and 18 SNPs tested for association, respectively. Three out of 12 SNPs tested for an association with ln(TG) were associated at p<0.05 (PLTP rs7679, MLXIPL rs17145738, and APOA1/C3/A4/A5 gene cluster rs28927680), with the latter at a significance of p<10⁻¹⁹.

Generalization across non-European-descent populations
For the 55 SNP-trait associations that replicated in European Americans, we determined which associations generalized across all four of our largest populations (European Americans, African Americans, American Indians, and Mexican Americans/Hispanics). Generalization was based on two criteria: 1) level of significance (i.e. p-value) and 2) direction of effect (i.e. positive or negative beta). SNPs that were significantly associated at p<0.05 and had the same direction of effect as European Americans in all populations studied were considered to have generalized. For HDL-C, five SNPs (CETP rs3764261, LPL rs6586891, LIPC rs4775041, LPL rs2197089, and APOA1/C3/A4/A5 gene cluster rs3135506) met these criteria, and two SNPs (LCAT rs2271293 and LPL rs328) were associated in three groups and trended towards significance in a fourth group (p = 0.06 and p = 0.07 in Mexican Americans/Hispanics and American Indians, respectively; Table 2).
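The two generalization criteria stated above (significance at p<0.05 and the same sign of beta as in European Americans) can be written as a short check. This is an illustrative sketch only: the `Association` record, the function names, and the numeric values in the usage lines are hypothetical and are not taken from the PAGE tables.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """One SNP-trait association in one population (hypothetical container)."""
    beta: float     # estimated per-allele effect
    p_value: float  # meta-analysis p-value


def generalizes(reference: Association, other: Association,
                alpha: float = 0.05) -> bool:
    """True if `other` is significant and has the same sign as `reference`."""
    same_direction = reference.beta * other.beta > 0
    return other.p_value < alpha and same_direction


def generalizes_to_all(reference: Association, others, alpha: float = 0.05) -> bool:
    """True if the association generalizes to every listed population."""
    return all(generalizes(reference, o, alpha) for o in others)


# Hypothetical values for illustration only.
european_american = Association(beta=-1.2, p_value=1e-6)
population_a = Association(beta=-0.9, p_value=0.01)   # significant, same sign
population_b = Association(beta=-0.8, p_value=0.07)   # same sign, not significant
population_c = Association(beta=0.5, p_value=0.001)   # significant, opposite sign

print(generalizes(european_american, population_a))
print(generalizes_to_all(european_american, [population_a, population_b]))
```

A SNP such as the hypothetical `population_b` case corresponds to what the text calls trending towards significance: the direction matches but the p-value criterion is not met.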

Power
Based on our definition of generalization, several SNPs discovered and replicated in European-descent populations failed to generalize to other populations. There are several possible explanations for non-generalization, including power. To further investigate potential lack of power, we first performed post-hoc power calculations assuming an additive genetic model and liberal significance threshold (0.05) in each racial/ethnic group for each test of association. In these power calculations, we further assumed the observed genetic effect size (beta) from PAGE European Americans and the observed allele frequency, sample sizes, and trait mean/standard deviations from each non-European American population. By adding the power of all tested loci, we estimated the number of expected significant associations and compared this to the number of observed significant associations (Table 5).
In general, the number of expected significant associations was greater than the number observed. African Americans consistently had fewer significant associations (11, 11, and 8 for HDL-C, LDL-C, and ln(TG), respectively) than expected (17.3, 14.7, and 11.9 for HDL-C, LDL-C, and ln(TG), respectively) based on power, regardless of the lipid trait being tested. More specifically, we were powered to detect in African Americans 17 of the 25 associations that replicated in European Americans but failed to generalize to African Americans.
Compared to African Americans, differences between the observed and the expected number of associations for American Indians and Mexican Americans/Hispanics were less extreme. In fact, for ln(TG), more significant associations were detected in these two populations than the PAGE study was powered to detect (8.4 and 10.4 expected; 10 and 12 observed for American Indians and Mexican Americans/Hispanics, respectively; Table 5). We were powered to detect in American Indians nine of the 18 associations that replicated in European Americans but did not generalize to American Indians. Similarly, we were powered to detect in Mexican Americans/Hispanics eight of the 20 associations that replicated in European Americans but failed to generalize to Mexican Americans/Hispanics.
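The expected-count calculation described in this section (summing per-SNP power to obtain the expected number of significant associations) can be sketched as follows. The power formula is an assumption: a normal approximation for a two-sided test of the per-allele effect in a linear model, with genotype variance 2p(1−p) under Hardy-Weinberg equilibrium. The text does not specify the software or exact formula used, and the input values in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_additive(beta: float, allele_freq: float, n: int,
                   trait_sd: float, alpha: float = 0.05) -> float:
    """Approximate power to detect a per-allele effect `beta`.

    Assumes an additive model, Hardy-Weinberg genotype variance 2p(1-p),
    and a normal approximation to the test statistic.
    """
    if not 0 < allele_freq < 1 or n <= 0 or trait_sd <= 0:
        raise ValueError("need 0 < allele_freq < 1, n > 0, trait_sd > 0")
    genotype_var = 2 * allele_freq * (1 - allele_freq)
    standard_error = trait_sd / sqrt(n * genotype_var)
    z_effect = abs(beta) / standard_error
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    # Two-sided rejection probability under the alternative.
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


def expected_significant(tests) -> float:
    """Sum of per-test power = expected number of significant associations."""
    return sum(power_additive(**t) for t in tests)


# Hypothetical inputs: European American beta, target-population frequency,
# sample size, and trait standard deviation.
tests = [
    {"beta": 1.5, "allele_freq": 0.30, "n": 9000, "trait_sd": 15.0},
    {"beta": 0.8, "allele_freq": 0.10, "n": 9000, "trait_sd": 15.0},
    {"beta": 2.0, "allele_freq": 0.05, "n": 2500, "trait_sd": 15.0},
]
print(round(expected_significant(tests), 2))
```

Comparing this sum with the observed count of significant tests is what allows statements such as fewer associations being observed than expected in African Americans.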

Linkage disequilibrium
To examine whether LD can account for the lack of generalization of the properly powered tests of association in African Americans, we examined LD patterns in HapMap Europeans (CEU) and West Africans (YRI) as well as those published in the literature for the genotyped SNPs and surrounding variation. For APOA1/C3/A4/A5 rs28927680, previous studies in European-descent populations have noted that this SNP is in strong LD (r² = 0.98) with missense APOA5 rs3135506 [42]. APOA1/C3/A4/A5 rs964184 is also in moderate LD with missense rs3135506 (r² = 0.510 in CEU). However, neither rs28927680 nor rs964184 is in LD with missense rs3135506 (r² = 0.039 and r² = 0.048) in YRI. Furthermore, APOA5 rs3135506 is associated with HDL-C in European Americans, African Americans, Mexican Americans/Hispanics, and American Indians (Table 1 and Table 2). Generalization of rs3135506, coupled with non-generalization and differences in YRI LD patterns for rs28927680 and rs964184, suggests that APOA5 rs3135506 is either the putative functional SNP for the association with HDL-C or in LD with the functional SNP. Although the exact mechanism is not yet known, molecular modeling [53] as well as in vitro [53] and in vivo [54,55] studies support the epidemiologic evidence that rs3135506 is functional.
Other interpretations of LD patterns are more difficult. For example, CETP rs9989419, which failed to generalize in African Americans for HDL-C despite sufficient power, is not in strong LD with obvious functional SNPs in CEU within 50 kb flanking the genotyped SNP. The strongest pair-wise LD (r² = 0.251) consists of intergenic and intronic SNPs, and these same SNPs have weak LD (r² < 0.03) or are not found in YRI. Similarly, LIPC rs261332 was associated with HDL-C levels in European Americans but failed to generalize in African Americans. LIPC rs261332 is in strong LD (r² > 0.80 in CEU) with SNPs in the 5′ flanking region of LIPC, but not in LD with these same SNPs in YRI (r² < 0.15).

Adjustments for exposures and co-morbidities
Genetic variations in isolation are not the sole determinants of lipid trait distributions. Many environmental exposures and demographic variables are associated with lipid traits. To account for these variables, we meta-analyzed all tests of association for HDL-C, LDL-C, and ln(TG) adjusted for age, sex, body mass index, current smoking, type 2 diabetes, post-menopausal status, and current hormone use. Adjustment for these additional covariates did not appreciably alter the results compared with the models minimally adjusted for age and sex (Figures S4, S5, S6). Inclusion of previous myocardial infarction as a variable in the fully adjusted model also did not appreciably alter the results compared with the minimally adjusted models (Figures S4, S5, S6).

Effect of including versus excluding by medication use
All analyses presented thus far include fasting adult participants regardless of lipid lowering medication use. Many GWAS conducted for the lipid traits excluded participants on lipid lowering medication [40,42,43] given that these medications substantially lower LDL-C levels. We have included these participants for analysis as participants on lipid lowering medication could represent the upper extreme of the normal LDL-C distribution associated with a genetic profile found in a general population. Exclusion of these participants would preclude these meta-analyses from fully describing the extent and strength of associations relevant to these traits in a population-based setting. However, if genetic variation is associated with lipid concentrations and medication use lowers lipid concentrations, inclusion of participants on lipid lowering medications could bias associations towards the null. As a sensitivity analysis, WHI used detailed medication data available on a subset of participants, and performed the tests of association for HDL-C, LDL-C, and ln(TG) excluding and including participants on lipid lowering medication with the latter adjusted for medication usage using average effects estimated in Wu et al [56] for specific drug classes. Figure S7 suggests that both the point estimates and the confidence intervals of the genetic effects are similar for this female-only study whether participants are excluded or included and adjusted for medication use.
We also performed a second sensitivity analysis: tests of association excluding participants on lipid lowering medication for all models. As detailed in Figures S8, S9, S10, excluding participants on lipid lowering medication does not appreciably alter the results, with the possible exception of LDL-C associations in Japanese/East Asians. More specifically, two SNPs (rs11206510 and rs1501908) became significantly associated with LDL-C after excluding participants on medications while two other SNPs (rs562338 and rs6544713) were no longer significantly associated (Figure S9). The difference in significance for these four tests of association may be related to lipid lowering medication use; however, it is more likely due to statistical fluctuations from small sample sizes (n_include = 690; n_exclude = 467). Also of note, use of lipid-lowering medications was low (<10%) in the ARIC, CHS, NHANES, and WHI studies since the majority of study recruitment occurred before the introduction or widespread use of the recent generation of lipid-lowering medications. Medication use was higher in the MEC study (20-38% depending on the population), which contributed the majority of Japanese/East Asian samples.

Discussion
We have performed an extensive replication and generalization effort for HDL-C, LDL-C, and TG GWAS-identified SNPs. The PAGE study consists of six racial/ethnic groups: European American, African American, Mexican American/Hispanic, American Indian, Japanese/East Asian, and Native Hawaiian/Pacific Islander, with population-specific sample sizes ranging from <100 to >20,000 for any one test of association. Although power to detect associations varied across the lipid traits and populations, we observed general patterns worth noting for future genetic epidemiological studies.

Replication in European-descent populations
Perhaps not unexpectedly, we were able to replicate most reported associations in European Americans. Regardless of significance, all but one of the tested SNPs had effect estimates in the same direction as the previously reported association from the literature. FADS1 rs174547, which was significantly associated with decreased ln(TG) in this meta-analysis for European Americans, was associated with increased TG in European Americans from the Framingham Heart Study (n = 7,423) [43]. HDL-C had proportionally the greatest number of SNPs that failed to replicate in European Americans (15%) compared with LDL-C (5%) and TG (0%), despite the fact that we had sufficient power to detect the reported genetic effect size for many of these tests. TTC39B rs471364 was not associated with HDL-C levels despite a sample size of 18,089 and >99% power to detect the reported effect size. Neither ABCA1 rs4149268 nor rs1883025 was associated with HDL-C, although the latter test of association was underpowered (68%; n = 3,865). Finally, as previously discussed, CETP rs1864163 was not associated with HDL-C in this European American dataset although we had 80% power to detect the reported genetic effect size. For LDL-C, only MAFB rs6102059 was not associated despite >90% power to detect the reported effect size.
The reasons for non-replication in this European American dataset for properly powered tests of association are unclear. It is possible that we have overestimated our power to detect reported associations. The "winner's curse" and inflated genetic effect estimates from initial discovery are well known [57,58]. Indeed, for the five SNPs that did not replicate in this meta-analysis for European Americans, the association was described in only one GWAS each despite the fact that numerous GWAS [31][33][34][35][36][37][38][39][40][41][42][43] and a large meta-analysis [32] for these three traits have been conducted in populations of European descent. The meta-analysis recently reported by Teslovich et al [32] did report significant associations between TTC39B rs581080 and HDL-C and between MAFB rs2902940 and LDL-C. TTC39B rs581080 is in moderate linkage disequilibrium (LD) with rs471364 (r² = 0.49 in HapMap CEU), but MAFB rs2902940 is not in LD with rs6102059 (r² = 0.03 in HapMap CEU).
A second possibility for our observed non-replication is heterogeneity among the PAGE studies. Because it is important to understand the degree to which associations are consistent across individual studies, we compared directions of effect (betas) across PAGE study sites for each test of association (Figures S11, S12, S13) and performed tests of heterogeneity. Association results for TTC39B rs471364, whose meta-analysis result for HDL-C in European Americans was not significant, showed significant evidence of heterogeneity across studies (p_heterogeneity = 0.048; I² = 58.25%). In four of the five PAGE study sites, the association between this SNP and HDL-C had consistent directions of effect; however, only one test of association was significant in European Americans (p = 0.005 in EAGLE; Figure S11). Only two other association results had evidence for heterogeneity among European Americans: FADS1 rs174547 for HDL-C (p_heterogeneity = 0.006; I² = 75.73%) and PCSK9 rs11206510 for LDL-C (p_heterogeneity = 0.048; I² = 55.34%). However, for both of these loci, the tests of association were significant in European Americans and had similar directions of effect in all but one of the PAGE study sites (Figures S11 and S12).
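For readers unfamiliar with the heterogeneity statistics quoted above, the sketch below computes an inverse-variance pooled estimate, Cochran's Q, and I² from per-site betas and standard errors. The choice of a fixed-effect, inverse-variance model is an assumption, since this section does not describe the meta-analysis method, and the function name and inputs are hypothetical. Under the usual definition I² = (Q − df)/Q, an I² of 58.25% across five sites (df = 4) corresponds to Q of about 9.6, which is consistent with the reported heterogeneity p-value of 0.048.

```python
from math import sqrt


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance pooled beta with Cochran's Q and I^2 (percent).

    Returns (pooled_beta, pooled_se, q_statistic, i_squared_percent).
    """
    if len(betas) != len(standard_errors) or len(betas) < 2:
        raise ValueError("need at least two studies with matching inputs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation beyond chance, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical per-site estimates for one SNP-trait association.
site_betas = [-0.6, -0.4, -0.9, 0.3, -0.5]
site_ses = [0.25, 0.30, 0.35, 0.40, 0.20]
pooled, pooled_se, q, i2 = fixed_effect_meta(site_betas, site_ses)
print(f"pooled beta = {pooled:.3f} (SE {pooled_se:.3f}); Q = {q:.2f}; I2 = {i2:.1f}%")
```

The heterogeneity p-value is obtained by comparing Q with a chi-square distribution on df degrees of freedom, which is omitted here to keep the sketch dependency-free.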

Generalization to non-European populations
When taking into account power, significance, and direction of effect, most SNPs discovered in European Americans generalized to African Americans, Mexican Americans, and American Indians. Of note are the eleven tests of association significant in European Americans that did not generalize to African Americans despite having adequate power. Given that GWAS products are a mixture of tagSNPs and functional SNPs, it is likely that discovery in European Americans represents tagSNPs rather than the true functional SNP. Because linkage disequilibrium patterns differ across populations, tagSNPs genotyped directly in populations of non-European descent may not recapitulate the association observed in European-descent populations, depending on the pattern of LD. The association of HDL-C with nonsynonymous rs3135506 versus tagSNP rs28927680 in the APOA1/C3/A4/A5 gene cluster in this analysis is an example of the effects of LD on the ability to generalize across populations.
Invoking LD as an explanation for lack of generalization is appealing, but it does have limitations given that the functional SNP is often not obvious. All tests of association that did not generalize to African Americans had evidence of LD differences between CEU and YRI using the HapMap data. However, most of these SNPs are located in intergenic and intronic regions.
Further fine-mapping in both the discovery population as well as other diverse populations will be needed along with a better understanding of genetic variation and its relationship to biological function to identify the true functional SNPs for these traits.
Among the five putative functional SNPs genotyped (nonsynonymous rs11591147, rs1260326, rs3135506, and rs1800961 and nonsense rs328), all five replicated in populations of European descent, and three of the five generalized to populations of non-European descent. One putative functional SNP that did not replicate across populations was HNF4A rs1800961, likely due to low power because of the very low minor allele frequency in all subpopulations (0.0065 to 0.0398). Both the direction and magnitude of effect, however, were consistent across groups. GCKR rs1260326 did not generalize to all populations of non-European descent but did generalize in three of the four populations tested and trended towards significance in American Indians (p = 0.085; Table 4).

Limitations and strengths
The major strengths and limitations of the PAGE study for lipids are sample size and diversity. The largest sample size is for samples of European descent (∼20,000), followed by African Americans and American Indians. The sample sizes for Mexican Americans, Japanese/East Asians, and Pacific Islanders/Native Hawaiians are smaller and consequently underpowered for tests of association as estimated from genetic effect sizes in the published European-descent discovery studies. Also, not all SNPs were genotyped in all PAGE studies, further affecting the power of the meta-analyses.
An additional limitation is the lack of data related to lipid lowering medication. Ideally, all analyses would be adjusted for use of lipid lowering medication based on the type and dose of medication. In most PAGE studies, these data were not available and in many, use was low at baseline when blood samples were obtained. As we demonstrate in Supplementary material, inclusion of participants using lipid-lowering medication did not appreciably alter the results of the meta-analysis when compared with excluding these participants. While this finding may be useful for future studies, we caution that the majority of participants in this study were not on lipid lowering medications.
In general, the cohorts and surveys included in PAGE are diverse with regard to demographics, genetic ancestry, lifestyle, health, and environmental exposure. Despite this diversity, very few tests of association from the meta-analysis exhibited evidence of heterogeneity.

Conclusions
Overall, the majority of GWAS-identified SNPs for HDL-C, LDL-C, and TG replicated in European Americans and generalized to non-European-descent populations. These results suggest that the genotyped SNP either tags the functional SNP(s) common across these populations or that the genotyped SNP represents the risk SNP directly. SNPs that replicated in European Americans but did not generalize in the largest non-European-descent populations, despite adequate power, could represent priority associations that require fine-mapping and re-sequencing to identify the functional variant(s).

Study populations and phenotypes
All studies were approved by Institutional Review Boards at their respective sites (details are given in Text S1). PAGE study samples were drawn from four large population-based studies or consortia: EAGLE (Epidemiologic Architecture for Genes Linked to Environment), based on three National Health and Nutrition Examination Surveys (NHANES) [59][60][61], the Multiethnic Cohort (MEC) [62], the Women's Health Initiative (WHI) [63,64], and Causal Variants Across the Life Course (CALiCo), a consortium of several cohort studies: Atherosclerosis Risk in Communities Study (ARIC) [65], Coronary Artery Risk in Young Adults (CARDIA) [66], Cardiovascular Health Study (CHS) [67], Strong Heart Family Study (SHFS) [68], and Strong Heart Cohort Study (SHS) [69] (Table 1). The PAGE study design is detailed in Matise et al. [30].
Serum HDL-C, triglycerides, and total cholesterol were measured using standard enzymatic methods. LDL-C was calculated using the Friedewald equation [30,70], with missing values assigned for samples with triglyceride levels greater than 400 mg/dl. For PAGE study sites with longitudinal data, the baseline measurement was used for analysis. A full description of each study, along with population-specific study characteristics, is presented in Text S1 and Table S1.
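The LDL-C calculation described above can be sketched as follows. This is a minimal illustration of the standard Friedewald estimate (total cholesterol minus HDL-C minus TG/5, all in mg/dl) with the 400 mg/dl triglyceride cutoff stated in the text; the function and constant names are hypothetical and are not taken from any PAGE analysis code.

```python
from typing import Optional

# Cutoff stated in the text: LDL-C is set to missing above this TG level.
TG_CUTOFF_MG_DL = 400.0


def friedewald_ldl(total_chol: float, hdl: float, tg: float) -> Optional[float]:
    """Estimate LDL-C in mg/dl as TC - HDL-C - TG/5.

    Returns None (missing) when triglycerides exceed 400 mg/dl.
    All inputs are assumed to be in mg/dl.
    """
    if tg > TG_CUTOFF_MG_DL:
        return None
    return total_chol - hdl - tg / 5.0
```

For example, a sample with total cholesterol 200 mg/dl, HDL-C 50 mg/dl, and triglycerides 150 mg/dl would receive an estimated LDL-C of 120 mg/dl.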

SNP selection and genotyping
All SNPs considered for genotyping were previously associated with HDL-C, LDL-C, and/or triglycerides in published (as of 2008) candidate gene and genome-wide association studies. A total of 52 SNPs were targeted for genotyping by two or more PAGE study sites. There is no overlap between samples used in this study and samples used in GWAS from which the SNPs were selected. The 52 targeted variants are located in or nearby 32 different genes/gene regions, with 12 of the gene/gene regions represented by two or more SNPs. Five SNPs are nonsynonymous, one SNP is a nonsense variant, and two SNPs are synonymous; the remainder are located in introns, flanking, or intergenic regions. The full list of targeted SNPs, their locations, and their previously associated lipid trait can be found in Table S2.
Cohorts and surveys were genotyped using either commercially available genotyping arrays (Affymetrix 6.0, Illumina 370CNV BeadChip), custom mid-and low-throughput assays (TaqMan, Sequenom, Illumina GoldenGate or BeadXpress), or a combination thereof. Quality control was implemented at each study site independently. In addition to site-specific quality control, all PAGE study sites genotyped 360 DNA samples from the International HapMap Project and submitted these data to the PAGE Coordinating Center for concordance statistics [71]. Study specific genotyping details are described in Text S1. Of the 52 targeted SNPs, three (CETP rs1800775, APOE rs429358, and APOE rs7412) failed at all PAGE study sites that attempted genotyping; therefore, a total of 49 SNPs were tested in this analysis.

Statistical methods
All tests of association were performed by each PAGE study site using the same analysis protocol prior to meta-analysis. The study protocol excluded participants <18 years of age as well as nonfasting samples (defined here as <8 hours). When triglyceride level was the dependent variable, participants with >1,000 mg/dl were excluded from analyses. Triglyceride (TG) levels were natural-log transformed (ln) prior to analysis.
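The exclusion criteria and transformation in this protocol can be expressed as a short filter. The sketch below is illustrative only: the `Participant` record and its field names are assumptions made for the example, and only the thresholds (18 years, 8 hours fasting, 1,000 mg/dl) come from the text.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical record layout, used only for this example.
    age_years: float
    fasting_hours: float
    tg_mg_dl: float


def is_eligible(p: Participant) -> bool:
    """Adults (at least 18 years) who fasted for at least 8 hours."""
    return p.age_years >= 18 and p.fasting_hours >= 8


def ln_tg_for_analysis(p: Participant) -> Optional[float]:
    """Return ln(TG) for the TG analysis, or None if the participant is excluded.

    Participants with TG above 1,000 mg/dl are excluded from TG analyses only.
    """
    if not is_eligible(p) or p.tg_mg_dl > 1000:
        return None
    return math.log(p.tg_mg_dl)
```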
Linear regression was performed for fasting adults regardless of lipid-lowering medication use with HDL-C, LDL-C, or ln(TG) as the dependent variable and a SNP as the independent variable, assuming an additive genetic model, stratified by race/ethnicity. The coded allele is reported in Tables 2, 3, and 4. The beta estimate is per additional copy of the coded allele. For each SNP, four models were considered: 1) unadjusted, 2) adjusted for age (continuous in years) and sex, 3) adjusted for age, body mass index (continuous in kg/m²), current smoking (yes/no; binary), type 2 diabetes (yes/no; binary), post-menopausal status (yes/no for females only; binary), and current hormone use (yes/no for females only; binary), and 4) adjusted for age, body mass index, current smoking, type 2 diabetes, post-menopausal status, current hormone use, and previous myocardial infarction (yes/no; binary). All PAGE study sites (except for WHI, which is female only) stratified models 3 and 4 by sex given the sex-specific variables (post-menopausal status and hormone use) prior to meta-analysis. Select PAGE study sites also included study site or site of ascertainment as a covariate in all models. Results from Model 2 (adjusted for age and sex) are reported in the main text while results from Models 1, 3, and 4 are presented in Figures S4, S5, and S6. Results from Model 2 excluding participants on lipid-lowering medications are presented in Figures S8, S9, and S10.
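As an illustration of Model 2, the following sketch fits an ordinary least-squares regression of a lipid trait on SNP dosage (0, 1, or 2 copies of the coded allele), age, and sex, and returns the per-allele beta with its standard error. It uses NumPy and is a generic example under stated assumptions, not the software used by the PAGE study sites; the function name and argument layout are hypothetical.

```python
import numpy as np


def additive_snp_regression(trait, dosage, age, sex):
    """OLS fit of trait ~ intercept + dosage + age + sex (additive genetic model).

    dosage is the number of copies of the coded allele (0, 1, or 2).
    sex is assumed to be coded as 0/1.
    Returns (beta, se) for the per-allele effect of the SNP.
    """
    y = np.asarray(trait, dtype=float)
    X = np.column_stack([
        np.ones_like(y),
        np.asarray(dosage, dtype=float),
        np.asarray(age, dtype=float),
        np.asarray(sex, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]            # residual degrees of freedom
    sigma2 = float(resid @ resid) / dof  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1]), float(np.sqrt(cov[1, 1]))
```

Models 3 and 4 would add further covariate columns to the design matrix in the same way; the per-site beta and standard error are the inputs that a meta-analysis then combines.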
Meta-analyses, using a fixed-effects inverse-variance weighted approach and tests for effect size heterogeneity across studies, were performed using METAL [72]. P-values were not adjusted for multiple testing, and association results were plotted using Synthesis-View [73,74], where indicated. Power calculations were performed using Quanto [75,76] assuming unrelated participants, an additive genetic model, the published effect size from European-descent populations listed in Table S1, and the population-specific allele frequencies listed in Tables 2, 3, and 4. Linkage disequilibrium was calculated using HapMap European (CEU) and West African (YRI) data accessed through the Genome Variation Server. FST was calculated using the Weir and Cockerham algorithm [77]. Aggregate data from the meta-analysis as well as individual tests of association from each PAGE study site will be made available via dbGaP [30,78].
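The fixed-effects inverse-variance weighted approach can be illustrated with a short sketch. Each study contributes a beta and standard error; studies are weighted by 1/SE², and Cochran's Q summarizes effect size heterogeneity. This is a generic textbook implementation written for illustration, not METAL's code, and the function name and return layout are assumptions.

```python
import math


def fixed_effects_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis.

    betas, ses: per-study effect estimates and standard errors.
    Returns the pooled beta, its standard error, the z statistic, the
    two-sided normal p-value, and Cochran's Q with its degrees of freedom.
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from N(0, 1)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "df": len(betas) - 1}
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with (number of studies − 1) degrees of freedom.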

Web resources
NHGRI GWAS Catalog (www.genome.gov/GWAStudies). Genome Variation Server (pga.gs.washington.edu). Synthesis-View (http://chgr.mc.vanderbilt.edu/ritchielab/method.php?method=synthesisview).

Supporting Information
Figure S4 Comparison of unadjusted, minimally adjusted, and adjusted models for HDL-C, by population. Results of tests of association for four regression models are plotted: model 1 (unadjusted), model 2 (adjusted for age and sex; and site of ascertainment for select PAGE studies), model 3 (adjusted for age, sex, body mass index, current smoking, type 2 diabetes, post-menopausal status, and current hormone use), and model 4 (model 3 with the addition of previous myocardial infarction). Each SNP was tested for an association with HDL-C. Meta-analysis was performed, and p-values (-log10 transformed) of the meta-analysis are plotted along the y-axis. SNP location is given on the x-axis. Each triangle represents a meta-analysis p-value for each population. Models are color coded. Large triangles represent p-values at or smaller than genome-wide significance (p<10⁻⁸). The direction of the arrows corresponds to the direction of the beta coefficient. The exact beta coefficients are reported on the bottom panel. The significance threshold is indicated by the red bar at p = 0.05. (DOCX)
Figure S5 Comparison of unadjusted, minimally adjusted, and adjusted models for LDL-C, by population. Results of tests of association for four regression models are plotted: model 1 (unadjusted), model 2 (adjusted for age and sex; and site of ascertainment for select PAGE studies), model 3 (adjusted for age, sex, body mass index, current smoking, type 2 diabetes, post-menopausal status, and current hormone use), and model 4 (model 3 with the addition of previous myocardial infarction). Each SNP was tested for an association with LDL-C. Meta-analysis was performed, and p-values (-log10 transformed) of the meta-analysis are plotted along the y-axis. SNP location is given on the x-axis. Each triangle represents a meta-analysis p-value for each population. Models are color coded. Large triangles represent p-values at or smaller than genome-wide significance (p<10⁻⁸). The direction of the arrows corresponds to the direction of the beta coefficient. The exact beta coefficients are reported on the bottom panel. The significance threshold is indicated by the red bar at p = 0.05. (DOCX)
Figure S6 Comparison of unadjusted, minimally adjusted, and adjusted models for triglyceride concentrations, by population. Results of tests of association for four regression models are plotted: model 1 (unadjusted), model 2 (adjusted for age and sex; and site of ascertainment for select PAGE studies), model 3 (adjusted for age, sex, body mass index, current smoking, type 2 diabetes, post-menopausal status, and current hormone use), and model 4 (model 3 with the addition of previous myocardial infarction). Each SNP was tested for an association with triglycerides. Meta-analysis was performed, and p-values (-log10 transformed) of the meta-analysis are plotted along the y-axis. SNP location is given on the x-axis. Each triangle represents a meta-analysis p-value for each population. Models are color coded. Large triangles represent p-values at or smaller than genome-wide significance (p<10⁻⁸). The direction of the arrows corresponds to the direction of the beta coefficient. The exact beta coefficients are reported on the bottom panel. The significance threshold is indicated by the red bar at p = 0.05. (DOCX)
Figure S7 Comparison of genetic effect estimates when participants are excluded or included based on medication use with adjustments in WHI. Genetic effect estimates (β) and 95% confidence interval are plotted for each SNP tested for an association. The tests of association were performed on fasting European Americans adjusted for age and sex and excluding participants on lipid lowering medication (blue), including all participants regardless of medication use (green), and all participants on lipid lowering medication, adjusted for the average HDL-C, LDL-C, and ln(TG) effects estimated by Wu .
Each SNP was tested for an association with HDL-C, adjusted for age and sex (Model 2), including fasting adults on lipid lowering medications. SNP location is given on the x-axis and p-values (-log10 transformed) are plotted along the y-axis. Each triangle represents a p-value for each PAGE study. PAGE study sites are color coded. Large triangles represent p-values at or smaller than genome-wide significance (p<10⁻⁸). The direction of the arrows corresponds to the direction of the beta coefficient. The exact beta coefficients are reported on the bottom panel. The significance threshold is indicated by the red bar at p = 0.05. (DOCX)
Figure S12 Comparison of LDL-C associations across PAGE study sites, by population. Results of tests of association for the various PAGE study sites are plotted (where available) along with meta-analysis results (META): Atherosclerosis Risk in Communities (ARIC), Coronary Artery Risk in Young Adults (CARDIA), Cardiovascular Health Study (CHS), Epidemiologic Architecture for Genes Linked to Environment (EAGLE), Multiethnic Cohort (MEC), Women's Health Initiative (WHI), Strong Heart Community Study (SHCS), and Strong Heart Family Study (SHFS) in Arizona (AZ), Oklahoma (OK) and South Dakota (SD). Each SNP was tested for an association with LDL-C levels, adjusted for age and sex (Model 2), including fasting adults on lipid lowering medications. SNP location is given on the x-axis and p-values (-log10 transformed) are plotted along the y-axis. Each triangle represents a p-value for each PAGE study. PAGE study sites are color coded. Large triangles represent p-values at or smaller than genome-wide significance (p<10⁻⁸). The direction of the arrows corresponds to the direction of the beta coefficient. The exact beta coefficients are reported on the bottom panel. The significance threshold is indicated by the red bar at p = 0.05. (DOCX)
Figure S13 Comparison of transformed triglyceride associations across PAGE study sites, by population. Results of tests of association for the various PAGE study sites are plotted (where available) along with meta-analysis results (META): Atherosclerosis Risk in Communities (ARIC), Coronary Artery Risk in Young Adults (CARDIA), Cardiovascular Health Study (CHS), Epidemiologic Architecture for Genes Linked to Environment (EAGLE), Multiethnic Cohort (MEC), Women's Health Initiative (WHI), Strong Heart Community Study (SHCS), and Strong Heart Family Study (SHFS) in Arizona (AZ), Oklahoma (OK) and South Dakota (SD). Each SNP was tested for an association with natural-log transformed triglyceride levels, adjusted for age and sex (Model 2), including fasting adults on lipid lowering medications. SNP location is given on the x-axis and p-values (-log10 transformed) are plotted along the y-axis. Each triangle represents a p-value for each PAGE study. PAGE study sites are color coded. Large triangles represent p-values at or smaller than genome-wide significance (p<10⁻⁸). The direction of the arrows corresponds to the direction of the beta coefficient. The exact beta coefficients are reported on the bottom panel. The significance threshold is indicated by the red bar at p = 0.05. (DOCX)
Table S1 Study characteristics by PAGE study and population. Descriptive statistics for fasting (≥8 hours) adults (≥18 years of age) are expressed as percentage, median, and standard deviation (SD) for each variable. (DOCX)

Table S2 List of candidate gene and GWAS-identified SNPs targeted for genotyping in PAGE. For each SNP (denoted by rs number), we list the chromosomal and genomic location, the putative function of the SNP (based on SNP location) and the nearest gene, the number of PAGE studies that genotyped the SNP, the trait associated with the SNP based on the literature, the effect allele and effect size based on the literature, and the reference for these data.