Trans-Ethnic Fine-Mapping of Lipid Loci Identifies Population-Specific Signals and Allelic Heterogeneity That Increases the Trait Variance Explained

Genome-wide association studies (GWAS) have identified ∼100 loci associated with blood lipid levels, but much of the trait heritability remains unexplained, and at most loci the identities of the trait-influencing variants remain unknown. We conducted a trans-ethnic fine-mapping study at 18, 22, and 18 GWAS loci on the Metabochip for their association with triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C), respectively, in individuals of African American (n = 6,832), East Asian (n = 9,449), and European (n = 10,829) ancestry. We aimed to identify the variants with strongest association at each locus, identify additional and population-specific signals, refine association signals, and assess the relative significance of previously described functional variants. Among the 58 loci, 33 exhibited evidence of association at P<1×10−4 in at least one ancestry group. Sequential conditional analyses revealed that ten, nine, and four loci in African Americans, Europeans, and East Asians, respectively, exhibited two or more signals. At these loci, accounting for all signals led to a 1.3- to 1.8-fold increase in the explained phenotypic variance compared to the strongest signals. Distinct signals across ancestry groups were identified at PCSK9 and APOA5. Trans-ethnic analyses narrowed the signals to smaller sets of variants at GCKR, PPP1R3B, ABO, LCAT, and ABCA1. Of 27 variants reported previously to have functional effects, 74% exhibited the strongest association at the respective signal. In conclusion, trans-ethnic high-density genotyping and analysis confirm the presence of allelic heterogeneity, allow the identification of population-specific variants, and limit the number of candidate SNPs for functional studies.


Introduction
Genome-wide association studies (GWAS) have identified many common genetic variants associated with human diseases and complex traits (www.genome.gov/gwastudies), including ∼100 loci associated with triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), or total cholesterol [1-5]. A majority of the lead SNPs at these loci have shown small effect sizes, leaving much of the trait heritability unexplained. Some of this missing heritability may be due to the incomplete coverage of functional common or rare variants and the poor representation of appropriate proxies on commercial genotyping arrays [6,7]. Other missing heritability may result from a failure to detect the full spectrum of causative variants present at GWAS-identified loci.
Fine-mapping of GWAS signals should increase the power to detect variants that influence trait variability. Genotyping of additional variants at GWAS loci can identify SNPs with stronger evidence of association than the reported GWAS index SNPs and may help detect or further localize the underlying causal variants [7,8]. The Metabochip is a high-density custom genotyping array designed to replicate and fine-map known GWAS signals for metabolic and atherosclerotic/cardiovascular endpoints, and more extensively, to identify all signals around the index SNPs [9,10]. The fine-mapping SNPs spanned a wide range of allele frequencies including rare (minor allele frequency (MAF) < 0.005) and less common (0.005 ≤ MAF < 0.05) SNPs selected from the catalogs of the International HapMap Project and the August 2009 release of the 1000 Genomes Project. SNPs annotated as nonsynonymous, essential splice site, or stop codon were included regardless of MAF, design score, or the presence of nearby SNPs [10]. The Metabochip contains densely spaced SNPs at 18, 22, and 18 loci previously reported for TG, HDL-C, and LDL-C, respectively.
Allelic heterogeneity, in which different variants at the same gene/locus affect the same phenotype, is a frequent characteristic of both single-gene and complex disorders. Recently, GWAS have identified more than one independent signal at loci associated with coronary artery disease [11] and type 2 diabetes [12,13]. Among a set of 30 lipid loci reported through GWAS, secondary SNPs that exhibited weak to moderate LD with the corresponding index SNPs and displayed little change of association in conditional analyses were detected at seven loci including CETP, LIPC, APOA5, APOE, LDLR, ABCG8, and LPL [4]. More than one association signal also was detected at 26 of 95 lipid loci reported by the Global Lipids Genetics Consortium [5]. However, allelic heterogeneity has not been comprehensively evaluated for common traits including lipid traits across ethnically diverse populations, especially in non-European populations such as African Americans and East Asians.
Due to divergent evolutionary and migratory histories, patterns of linkage disequilibrium (LD) vary across ancestry groups [14]. Greater haplotype diversity in some ancestry groups, especially in African ancestry populations, may facilitate the localization of functional variants that show association signals delimited in part due to weaker LD with neighboring SNPs [14,15]. A recent multiethnic analysis of lipid associated loci demonstrated that genetic determinants at many lipid loci differed between European Americans and African Americans [16]. For example, in African Americans from the PAGE consortium [9,17], a reported regulatory variant rs12740374 at CELSR2/PSRC1/SORT1 locus [18] was more strongly associated with LDL-C compared to many nearby variants demonstrating similar strength of association in European ancestry individuals [5]. High-density genotyping enables trans-ethnic fine-mapping studies to narrow the set of plausible candidate functional variants at GWAS loci without introducing uncertainty through imputation [19].
In this study, we analyzed high-density genotyped SNPs on the Metabochip for their associations with TG, HDL-C, and LDL-C in 6,832 African Americans, 9,449 East Asians, and 10,829 Europeans at 58 known lipid loci. We sought to (i) identify the variants with the strongest evidence of association at each locus in populations with different ancestries and in the combined trans-ethnic samples; (ii) investigate allelic heterogeneity and population-specific signals at the established lipid loci; (iii) explore whether high-density genotyping in diverse ethnic populations would narrow the sets of plausible candidate functional variants for further study; and (iv) assess whether the variants reported to have functional effects on gene expression or protein function during the past 30 years of biological study exhibited the strongest evidence of association at the corresponding GWAS signals.

Results
Loci with evidence of association in diverse populations and in the combined trans-ethnic samples
Descriptions of the collection, phenotyping, and genotyping of study samples for each study site are provided in Table S1. Given that all 58 loci have a priori genome-wide significant evidence of association with one or more of these three lipid traits, we used a P value threshold of 1×10⁻⁴ as an approximate correction for the mean of 451 SNPs tested at each locus in African Americans (Table S2). An average of 273 SNPs per locus was tested in East Asians and an average of 291 in Europeans, but we applied the same, more conservative, P value threshold of 1×10⁻⁴ to these two groups as well.
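The relationship between the 1×10⁻⁴ threshold and the number of SNPs tested per locus can be illustrated with a Bonferroni-style calculation. The sketch below uses the mean SNP counts quoted above; the nominal α of 0.05 and the function name are assumptions made for illustration, not details stated in this section.

```python
# Mean number of SNPs tested per locus, as quoted in the text (Table S2).
MEAN_SNPS_PER_LOCUS = {
    "African American": 451,
    "East Asian": 273,
    "European": 291,
}


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P value threshold for n_tests tests.

    alpha = 0.05 is an assumed nominal level, not a value given in the text.
    """
    return alpha / n_tests


for group, n_snps in MEAN_SNPS_PER_LOCUS.items():
    print(f"{group}: {bonferroni_threshold(n_snps):.2e}")
```

Under this assumption, each per-locus threshold is somewhat larger than 1×10⁻⁴, which is consistent with describing the common 1×10⁻⁴ cutoff as approximate for African Americans and more conservative for the other two groups.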
A total of 33 loci (nine for TG, 14 for HDL-C, and 10 for LDL-C) exhibited evidence of association at P < 1×10⁻⁴ in at least one of the three ancestry groups, including 22 loci in African Americans, 17 in East Asians, and 31 in Europeans (Table S3A-S3C). The variants that reached this threshold of significance were common (MAF ≥ 0.05), except at three loci (PCSK9 and ABO for LDL-C, and APOA5 for HDL-C) in African Americans and two loci (PCSK9 and TOP1, both for LDL-C) in European ancestry individuals. When individuals of diverse ancestry groups were combined, 11, 15, and 12 loci showed evidence of significant association with TG, HDL-C, and LDL-C, respectively (Table S4A-S4C). Among these 38 loci, six loci had not reached the P value threshold of 10⁻⁴ within any individual ancestry group, including CETP and NAT for TG, GALNT2 and MMAB for HDL-C, and TRIB1 and TIMD4 for LDL-C. One locus, COBLL1, was significantly associated with HDL-C in Europeans alone (P = 8.5×10⁻⁵), but displayed less evidence of association in the combined trans-ethnic samples (P = 1.6×10⁻⁴).

Author Summary
Lipid traits are heritable, but many of the DNA variants that influence lipid levels remain unknown. In a genomic region, more than one variant may affect gene expression or function, and the frequencies of these variants can differ across populations. Genotyping densely spaced variants in individuals with different ancestries may increase the chance of identifying variants that affect gene expression or function. We analyzed high-density genotyped variants for association with TG, HDL-C, and LDL-C in African Americans, East Asians, and Europeans. At several genomic regions, we provide evidence that two or more variants can influence lipid traits; across loci, these additional signals increase the proportion of trait variation that can be explained by genes. At some association signals shared across populations, combining data from individuals of different ancestries narrowed the set of likely functional variants. At PCSK9 and APOA5, the data suggest that different variants influence trait levels in different populations. Variants previously reported to alter gene expression or function frequently exhibited the strongest association at those signals. The multiple signals and population-specific characteristics of the loci described here may be shared by genetic loci for other complex traits.

Loci with evidence of multiple signals at a locus, and often population-specific signals
To assess the presence of two or more signals at each locus that exhibited evidence of association in at least one ancestry group, we performed sequential conditional analyses by adding the most strongly associated SNP to the regression model as a covariate and testing the association with each of the remaining regional SNPs independently. We repeated this step, adding the strongest SNP from each conditional model as a further covariate, until the most strongly associated SNP showed a conditional P value > 10⁻⁴ and was not annotated as a nonsense or nonsynonymous substitution. We also investigated whether association signals were population-specific, which we defined as association signals with variants that are not variable in the samples from the other two ancestry groups in this study or in the 1000 Genomes Project populations that represent those groups among total European ancestry (EUR), total East Asian ancestry (ASN), or total West African ancestry (AFR).
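The sequential conditional procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's actual pipeline: it uses simple linear regression without other covariates, a normal approximation for P values, and simulated genotypes with hypothetical SNP names (`snp_a`, `snp_b`, `proxy`, `null`). It also omits the rule that keeps nonsense or nonsynonymous variants beyond the P value cutoff. Conditioning is done by projecting each selected SNP out of the trait and the remaining SNPs, which gives the same test as adding that SNP as a covariate.

```python
import math
import random


def _center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _project_out(v, u):
    """Return v minus its least-squares projection onto u (both centered)."""
    scale = _dot(v, u) / _dot(u, u)
    return [a - scale * b for a, b in zip(v, u)]


def sequential_conditional(trait, genotypes, p_threshold=1e-4):
    """Forward conditional analysis at one locus.

    trait: list of phenotype values; genotypes: dict of name -> allele counts.
    Returns [(snp_name, conditional_p), ...] in the order signals were added.
    """
    n = len(trait)
    y = _center(trait)
    xs = {name: _center(g) for name, g in genotypes.items()}
    signals = []
    while xs:
        best = None  # (test statistic, SNP name)
        yy = _dot(y, y)
        for name, x in xs.items():
            xx = _dot(x, x)
            if xx < 1e-8 or yy < 1e-8:
                continue  # nothing left to test for this SNP
            r2 = min(_dot(x, y) ** 2 / (xx * yy), 1 - 1e-12)
            stat = (n - len(signals) - 2) * r2 / (1 - r2)  # squared t statistic
            if best is None or stat > best[0]:
                best = (stat, name)
        if best is None:
            break
        p = math.erfc(math.sqrt(best[0] / 2))  # two-sided, normal approximation
        if p >= p_threshold:
            break
        signals.append((best[1], p))
        lead = xs.pop(best[1])
        # Condition on the new lead SNP for all later tests.
        y = _project_out(y, lead)
        xs = {name: _project_out(x, lead) for name, x in xs.items()}
    return signals


# Simulated example: two causal SNPs, one proxy in LD with snp_a, one null SNP.
random.seed(1)
N = 2000


def draw(maf):
    return sum(random.random() < maf for _ in range(2))


snp_a = [draw(0.30) for _ in range(N)]
snp_b = [draw(0.20) for _ in range(N)]
proxy = [g if random.random() < 0.8 else draw(0.30) for g in snp_a]
null = [draw(0.40) for _ in range(N)]
trait = [1.0 * a + 0.6 * b + random.gauss(0, 1) for a, b in zip(snp_a, snp_b)]

signals = sequential_conditional(
    trait, {"snp_a": snp_a, "snp_b": snp_b, "proxy": proxy, "null": null}
)
print(signals)
```

In this simulation the proxy is associated with the trait only through its correlation with `snp_a`, so it would be expected to lose its signal once `snp_a` enters the model, while `snp_b` remains as a second signal.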
In African Americans, sequential conditional analyses revealed that 10 of the 22 loci with evidence of association exhibited two or more signals at P < 10⁻⁴ (Table 1). Two loci (PCSK9 and the TOMM40-APOE-APOC4 cluster; both for LDL-C) each had seven signals, four loci (APOB for LDL-C, LDLR for LDL-C, LCAT for HDL-C, and CETP for HDL-C) had three signals, and another four loci (APOB, APOC1, APOA5, and LPL; all for TG) had two signals. Among the 10 loci with two or more signals, accounting for all signals led to an average 1.8-fold increase in the phenotypic variance explained (R²) compared to that explained by the strongest signals alone (see Methods) in African Americans. Among these 34 signals, 15 were represented by less common (0.005 ≤ MAF < 0.05, n = 11) or rare (MAF < 0.005, n = 4) variants. In addition, 15 signals at eight loci were African American-specific. If we only include SNPs that meet a locus-specific P value threshold based on the number of genotyped SNPs (Table S2), LPL for TG and APOB for both TG and LDL-C each had one signal, and the seven loci with multiple signals still showed an average 1.8-fold increase in the explained phenotypic variance.
The seven signals at PCSK9 in African Americans included six nonsense or nonsynonymous variants previously shown to associate with LDL-C levels and to affect PCSK9 expression or function [20-22], along with an unreported intronic variant (Table 1). The strongest signals were a nonsense variant rs28362286 (C679X, Figure 1A) and a nonsynonymous variant rs28362263 (A443T, Figure 1B), which showed no reduction of association evidence when conditioned on C679X. Conditional analysis on both C679X and A443T yielded a third signal at rs28362261 (N425S, Figure 1C), and further conditional analyses successively implicated rs67608943 (Y142X, Figure 1D), rs72646508 (L253F, Figure 1E), and an intronic variant rs11800243 (Figure 1F). The seventh signal, which did not reach the P_conditional < 10⁻⁴ threshold, was represented by the nonsynonymous variant rs11591147 (R46L, Figure 1G) that exhibited the strongest and directionally consistent evidence of association with LDL-C in Europeans (P_initial = 2.8×10⁻³⁰, Table 2). The seven signals were weakly correlated with each other in African American individuals, and all pairwise LD r² values were less than 0.02. Among the seven PCSK9 signals, the top five were African American-specific, and six were either less common or rare in African Americans. The lead SNP C679X explained 1.3% of the LDL-C phenotypic variance and the seven signals together explained 3.6% of the phenotypic variance in African Americans. PCSK9 exhibited two signals in Europeans (R46L and rs2495477, Table 2), but no SNP reached P_initial < 10⁻⁴ in East Asians.
At the TOMM40-APOE-APOC4 cluster, the seven signals in African Americans explained 6.6% of the LDL-C phenotypic variance compared to 4.1% explained by the strongest signal R176C, which had reported functional effects [23] (Table 1, Figure S1). These seven signals were not entirely independent of one another. The fourth signal, rs157588, showed association with LDL-C (P = 2.0×10⁻⁷) only after conditioning on the top three signals, but not in the original unconditioned association analysis (P = 0.72). The trait-decreasing allele (G allele: freq = 0.176) of rs157588 was present on haplotypes containing the trait-increasing allele of the third signal rs1038026 (A allele: freq = 0.351); thus, the association of the fourth signal increased in significance after accounting for linkage disequilibrium (r²/D′ = 0.35/0.92) with the third signal at the same locus. Haplotype analysis revealed that, compared to the reference A-A (increasing-increasing) haplotype, the G-G (decreasing-decreasing) haplotype displayed only modest association with LDL-C (P = 7.5×10⁻³), but the A-G (rs1038026 increasing-rs157588 decreasing) haplotype showed significant association with decreased levels of LDL-C (P = 1.5×10⁻¹⁰) (Table S5). In Europeans (Table 2) and East Asians (Table 3), three and two signals were identified at TOMM40-APOE-APOC4, respectively. The known functional variant R176C exhibited the strongest evidence of association across the three ancestry groups, with effect sizes of −0.536, −0.505, and −0.411 mmol/L in individuals of African American, European, and East Asian ancestry, respectively (Table 1). However, another APOE variant, rs429358 (C130R), which together with R176C defines the three major isoforms of APOE (ε2, ε3, and ε4) [7,24], was not successfully genotyped; therefore, the LDL-C association with either C130R or the APOE haplotype was unavailable in this study.
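The LD statistics quoted for rs1038026 and rs157588 can be related to the allele frequencies given above. The sketch below computes D, D′, and r² from a haplotype frequency using the standard two-locus definitions. The A-G haplotype frequency is not given in this section, so it is back-calculated here from the reported D′ purely for illustration; the function and variable names are hypothetical.

```python
def ld_stats(p_hap, p_a, p_b):
    """Return (D, D', r2) for alleles a and b given the a-b haplotype frequency."""
    d = p_hap - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


P_A = 0.351      # rs1038026 trait-increasing A allele frequency (from the text)
P_G = 0.176      # rs157588 trait-decreasing G allele frequency (from the text)
D_PRIME = 0.92   # reported D' between the two signals

# Assumption: the A-G haplotype frequency implied by the reported D',
# with the two alleles positively associated as the text describes.
d_max = min(P_A * (1 - P_G), (1 - P_A) * P_G)
p_ag = P_A * P_G + D_PRIME * d_max

d, d_prime, r2 = ld_stats(p_ag, P_A, P_G)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

With these inputs, r² comes out at roughly 0.33, close to the reported 0.35; the small difference may reflect rounding of the published frequencies. A high D′ with a moderate r² is what one would expect when the rarer G allele sits mostly on A-carrying haplotypes but the two allele frequencies differ.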
In Europeans, nine of the 31 loci exhibited multiple signals (21 signals in total) for at least one of the three lipid traits at P < 10⁻⁴ (Table 2). Three loci (APOA5 for TG, TOMM40-APOE-APOC4 cluster for LDL-C, and CETP for HDL-C) each had three signals while another six loci (PCSK9 for LDL-C, GCKR for TG, LIPC for HDL-C, APOB for LDL-C, and LPL for both TG and HDL-C) each had two signals. At the nine loci that had two or more signals, accounting for all association signals resulted in an average 1.3-fold increase in the explained phenotypic variance compared to the strongest signals alone. At PCSK9, rs11591147 (R46L) exhibited the strongest evidence of association in Europeans. As reported above, R46L also represented the seventh signal in African Americans. R46L accounted for 1.2% of the total variation in LDL-C levels in Europeans compared with 0.16% in African Americans. This SNP was not variable in the 1000 Genomes Project ASN samples (East Asian ancestry) or in the >9,000 East Asian individuals in this study.
In East Asians, we observed three signals at the TG locus APOA5, and two signals at three loci including the TOMM40-APOE-APOC4 cluster for LDL-C, CETP for HDL-C, and ABO for LDL-C (Table 3). At the four loci that exhibited multiple signals, accounting for all association signals increased the explained phenotypic variance by an average of 1.3-fold compared to the strongest signal alone. The second signal at APOA5 was the nonsynonymous variant G185C previously reported to affect protein function [25]. Although G185C was not unique to East Asians, the frequency was very low in African Americans (MAF = 0.002, P = 0.028) and Europeans (MAF = 0.0003, P = 0.23), and the low allele frequency meant that this study had less than 5% statistical power to detect the association in these groups.
At APOA5, which exhibited multiple signals in all three populations (Table 1, Table 2, Table 3), the strongest TG-associated SNPs differed and were not in high LD (r² < 0.8) with each other in any of the ancestry groups. The two African American signals, S19W (MAF = 0.058, P = 8.4×10⁻¹⁵) and rs79624460 (MAF = 0.083, P = 4.8×10⁻¹²), showed no evidence of significant association in East Asians (Table 1), likely due to the low allele frequency and the limited power (<10%) to detect the association. The three signals at APOA5 in East Asians were only modestly associated with TG in African Americans (all P > 10⁻³, Table 3). The LD r² values between the African American and East Asian signals were less than 0.02 in both populations, suggesting that they represent distinct APOA5 signals in the two ancestry groups.
In addition, the APOA5 signal rs3741298 (P = 9.7×10⁻⁴⁴, MAF = 0.222) in Europeans exhibited evidence of association with TG in African Americans (P = 9.8×10⁻⁵, MAF = 0.327) and East Asians (P = 1.2×10⁻²⁰, MAF = 0.357), but the significance levels of the association with rs3741298 were substantially attenuated by conditioning on the strongest signals S19W in African Americans (P = 0.10) and rs651821 in East Asians (P = 0.88). In Europeans, the associations with rs3741298 were partially removed when conditioning on S19W and rs651821 (P_conditional = 1.7×10⁻²⁸ and 3.1×10⁻¹⁷, respectively). The European signal rs3741298 was moderately correlated with the African American signal S19W (LD r² = 0.21 and 0.10 in the 1000 Genomes Project EUR samples (European ancestry) and in PAGE African American samples, respectively), and with the East Asian signal rs651821 (LD r² = 0.31 and 0.28 in 1000 Genomes Project EUR and ASN samples, respectively). Notably, the effect sizes of the two reported functional variants S19W [26] and G185C [25] at APOA5 were similar across the three groups (S19W, African American: 0.136; East Asian: 0.136; European: 0.121 and G185C, African American: 0.204; East Asian: 0.201; European: 0.269 mmol/L on the logₑ scale) despite the limited power to detect significant evidence of association at low allele frequencies. These findings support the hypothesis that causative variants may have a similar genetic impact on trait variation across populations if not influenced by hidden gene-gene or gene-environment interactions [27]. We also observed that the second European signal rs75919952 exhibited nominal evidence of association (P_initial = 0.018, MAF = 0.041), but was not associated with TG in the other two groups (Table 2). The lack of association may be due to insufficient power (15% and 55% in African Americans and East Asians, respectively; assuming α = 0.05) corresponding to the lower allele frequency (MAF = 0.012) in African Americans, the smaller sample sizes in both populations, or underlying interactions.
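The dependence of power on allele frequency and sample size can be illustrated with a simple normal approximation for an additive single-SNP test. This is a sketch under stated assumptions rather than the power calculation used in the study: the per-allele effect (in trait standard deviation units) is hypothetical, residual variance is taken to be approximately 1, and the function name is invented for illustration.

```python
from statistics import NormalDist


def association_power(n, maf, beta_sd, alpha=0.05):
    """Approximate power of a two-sided additive-model association test.

    n: sample size; maf: minor allele frequency;
    beta_sd: per-allele effect in trait SD units (hypothetical input);
    alpha: significance level.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected |Z| under the alternative; genotype variance is 2 * maf * (1 - maf).
    expected_z = abs(beta_sd) * (n * 2 * maf * (1 - maf)) ** 0.5
    return norm.cdf(expected_z - z_crit) + norm.cdf(-expected_z - z_crit)


# Same hypothetical effect size, different allele frequencies and sample sizes.
HYPOTHETICAL_BETA = 0.1
print(association_power(6832, 0.012, HYPOTHETICAL_BETA))   # lower MAF, African American n
print(association_power(10829, 0.041, HYPOTHETICAL_BETA))  # higher MAF, European n
```

For a fixed effect size, the expected test statistic scales with the square root of n × 2 × MAF × (1 − MAF), so a lower allele frequency combined with a smaller sample reduces power, as the paragraph above argues.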
Trans-ethnic high-density genotyping narrowed the region of association signals
We next examined whether trans-ethnic meta-analysis or comparison across ancestries would refine the association signals by narrowing the genomic regions where functional variants might be expected to reside. The trans-ethnic analysis allowed the refinement of association signals at the GCKR, PPP1R3B, ABO, LCAT, and ABCA1 loci (Table 4, Table S3A-S3C). The signal at GCKR was localized to the reported functional variant P446L [28] due to the limited LD in African Americans (Figure S2A-S2D). Notably, there were seven and six variants in high LD (r² > 0.8) with P446L in the 1000 Genomes Project ASN and EUR samples, respectively, but no SNP with LD r² > 0.8 in African American individuals. At the signal ∼200 kb from the PPP1R3B gene, for which no functional regulatory variants have been reported, the association signal was narrowed from four SNPs spanning 36 kb (P < 10⁻⁴) in Europeans to two highly correlated SNPs located 1 kb apart in African Americans (rs6601299, P = 8.0×10⁻⁸ and rs4841132, P = 2.9×10⁻⁷; LD r² > 0.94) (Figure 2). The lead SNP rs6601299 was in high LD with 11 variants in the 1000 Genomes Project EUR samples but was highly correlated with only two variants and one variant in the 1000 Genomes Project AFR samples (West African ancestry) and PAGE African American individuals, respectively. At the ABO locus, trans-ethnic meta-analysis revealed six SNPs exhibiting stronger evidence of association (P < 1.1×10⁻¹¹) with LDL-C compared to other variants in the same region (P > 2.3×10⁻⁷) (Figure S3A-S3D). At the LCAT locus for HDL-C, the association signals spanned ∼800 kb, ∼360 kb, and ∼360 kb in Europeans, East Asians, and African Americans, respectively, with a ∼50 kb overlapping region. Trans-ethnic meta-analysis of all samples localized the signal to four variants spanning this 50 kb region (Figure S4A-S4D). At the HDL-C locus ABCA1, the reported GWAS index SNP rs1883025 consistently showed the strongest association within each of the three ancestry groups that we examined, but the significance level of the association was similar to those of the nearby SNPs. Trans-ethnic meta-analysis refined the signal by revealing that rs1883025 (P = 4.3×10⁻¹⁷) and rs2575876 (P = 1.8×10⁻¹⁵) displayed much stronger association than the neighboring SNPs (P > 8.4×10⁻¹⁰) (Figure S5A-S5D).
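To make the idea of combining evidence across ancestry groups concrete, the sketch below implements a fixed-effects, inverse-variance-weighted meta-analysis. This section does not state which meta-analysis method was used, so the choice of method is an assumption, as are the function name and the standard errors in the example; only the three S19W effect sizes are taken from the text.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-group effect estimates; ses: their standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return beta, se, z, p


# S19W effect sizes quoted in the text (African American, East Asian, European).
s19w_betas = [0.136, 0.136, 0.121]
# Hypothetical standard errors, for illustration only.
hypothetical_ses = [0.02, 0.03, 0.03]

print(fixed_effects_meta(s19w_betas, hypothetical_ses))
```

Under this scheme, a variant with consistent effects in all groups gains precision from every sample, whereas SNPs that merely tag it in one ancestry group, because of that group's LD pattern, gain less. That contrast is what allows the most strongly associated set of variants to shrink.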
Reported functional variants were frequently the most strongly associated ones at a signal
Among loci associated with at least one lipid trait (P < 10⁻⁴), at least 27 variants at 15 loci have been previously reported [18,22,23,25,26,28-47] to functionally influence gene expression or protein function in vitro (Table 5). Among the 27 variants, 17 are present on the Metabochip and two are well-represented by perfect proxies in complete LD (r² = 1) based on the 1000 Genomes Project EUR data. Of the 19 reported functional variants, 14 (74%) exhibited the strongest association P value among all SNPs at that signal in at least one population. In addition, two more reported functional variants (APOB-rs7575840, P = 7.0×10⁻¹⁷ and LPL-rs328, P = 2.3×10⁻¹¹) were in high LD (r² > 0.95) with the most strongly associated variants and showed similar evidence of association (APOB-rs934198, P = 3.7×10⁻¹⁷; LPL-rs1803924, P = 1.1×10⁻¹¹). If we include these two variants, then 16 of the 19 (84%) reported functional variants displayed the strongest association P value at the primary, secondary, or successive signals. The remaining three reported functional variants, LDLR-rs688 (N591N), LPL-rs1801177 (D9N), and HMGCR-rs3761740 (911C>A), were poorly tagged (LD r² < 0.2) by the strongest variants in our data. Additional functional variants may exist at these loci that have not yet been reported to change gene expression/protein function or that were not identified in our literature search. For example, P2739L and P145S, which represented the two signals at APOB (Table 1), were predicted by PolyPhen [48] to be 'probably damaging' with a score of '1', although their functional roles were unclear.
Among the 16 reported functional variants and proxies that exhibited the strongest association P value at a signal (Table 5), R176C at APOE was strongest in all three populations and GCKR P446L was identified in both African Americans and Europeans. The remaining 14 variants showed the strongest associations in only one of the populations, including 10 in African Americans, three in East Asians, and one in Europeans. Five of the 10 variants in African Americans were at the PCSK9 locus. Furthermore, nine of the 16 variants represented the strongest signal at a given locus, three a second signal, and four a third or additional signal. These functional variants covered a wide allele frequency spectrum (MAF: 0.003-0.481), including five less common or rare variants observed only in African Americans.

Figure 1. LDL-C locus PCSK9 exhibited seven signals in African Americans. Initial association in the main analysis (A). Residual association in sequential conditional analysis by sequentially adding the lead SNPs into the regression model (B-G). Each SNP was colored according to its LD (r²) in the PAGE consortium, with the strongest SNP colored in purple and symbols designating genomic annotation defined in the 'annotation key'. Genomic coordinates refer to build 36 (hg18). doi:10.1371/journal.pgen.1003379.g001
Table 2. Lipid loci with multiple signals in Europeans. Footnote c: P values of sequential conditional analyses, in which we added the SNP with the strongest evidence of association into the regression model as a covariate and tested for the next strongest SNP until the strongest SNP showed a conditional P value > 10⁻⁴ and had no annotation suggesting potential function.
Table 3. Lipid loci with multiple signals in East Asians.

Discussion
This study evaluated densely spaced SNPs at 58 lipid loci across three ancestrally diverse populations. The results support evidence that allelic heterogeneity is a frequent feature of polygenic traits [5,49] and extend the findings to non-European populations, especially to African ancestry populations that have high levels of haplotype diversity. The results also provide strong evidence that fine-mapping at GWAS loci can identify population-specific signals. Despite comparable sample sizes, we identified more signals per locus and more signals overall in African Americans (34 signals at 10 loci) compared to Europeans (21 signals at nine loci) and East Asians (nine signals at four loci), and 15 of the 34 signals identified in African Americans were population-specific (Table 1, Table 2, Table 3). These observations may reflect the larger number of SNPs genotyped in African Americans (Table S2), variation across populations subject to natural selection during human evolution [14], or genetic drift [50]. Due to the varied number of signals per locus, different associated markers, and different effect sizes, the phenotypic variance explained differs across populations [51-53]. Sampling variability, epistasis, and gene-environment interactions may cause over- or under-estimation of the proportion of explained phenotypic variance. In this study, we also observed that many population-specific signals, including those at PCSK9 and APOA5, are largely confirmatory [20,22,54]; however, the association evidence at other signals, in particular the additional signals at APOE, LDLR, and APOC1 identified by the conditional analyses, requires replication in future studies.
At PCSK9, the strongest signal C679X identified in African Americans is population-specific and showed substantially stronger evidence of association with LDL-C (P = 4.1×10⁻²²) compared to the GWAS index SNP rs2479409 [5] (P = 0.12) and the most strongly associated SNP R46L identified via fine-mapping [7] (P = 2.3×10⁻³), both of which were previously reported in Europeans. The proportion of phenotypic variance explained in African Americans increased from 0.16% by the GWAS index SNP to 1.3% by the Metabochip signal C679X, and all variants at the locus together explained 3.6% of the total variation in LDL-C, providing evidence that heritability at identified loci may be underestimated by GWAS [7]. A limitation of these variance estimates is that calculations included the SNPs based simply on their significant association P values rather than the variants with biological function, which could over-estimate effects due to the winner's curse.
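The fold increases in explained variance follow directly from the percentages reported for individual loci. The sketch below reproduces that arithmetic for the African American figures quoted in this article for PCSK9 and the TOMM40-APOE-APOC4 cluster; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Percent of LDL-C phenotypic variance explained in African Americans,
# as quoted in the text.
PCSK9 = {"gwas_index_snp": 0.16, "lead_signal": 1.3, "all_signals": 3.6}
TOMM40_APOE_APOC4 = {"lead_signal": 4.1, "all_signals": 6.6}


def fold_increase(after, before):
    """Ratio of variance explained after vs. before adding signals."""
    return after / before


pcsk9_lead_vs_index = fold_increase(PCSK9["lead_signal"], PCSK9["gwas_index_snp"])
pcsk9_all_vs_lead = fold_increase(PCSK9["all_signals"], PCSK9["lead_signal"])
apoe_all_vs_lead = fold_increase(
    TOMM40_APOE_APOC4["all_signals"], TOMM40_APOE_APOC4["lead_signal"]
)

print(f"PCSK9, lead signal vs GWAS index SNP: {pcsk9_lead_vs_index:.1f}-fold")
print(f"PCSK9, all signals vs lead signal: {pcsk9_all_vs_lead:.1f}-fold")
print(f"TOMM40-APOE-APOC4, all signals vs lead signal: {apoe_all_vs_lead:.1f}-fold")
```

These two loci give ratios of about 2.8 and 1.6 for all signals versus the lead signal; the 1.8-fold figure reported for African Americans is an average over all 10 loci with multiple signals, so individual loci fall on either side of it.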
Results across the genotyped loci demonstrated that the majority of signals were represented by common variants, yet high-density genotyping also identified less common and rare variants associated with lipid traits. At PCSK9, the MAFs of six out of the seven signals were < 0.05 in African Americans. These signals, along with other low frequency variants identified at APOE, LDLR, LCAT, APOB, APOC1, and LPL, provide evidence of the substantial contribution of low frequency genetic variants to the variance of lipid traits [6]. Other variants, some with very low allele frequency, may exist at these loci, suggesting that future sequencing studies may identify additional functional variants that influence lipid variation.
Sequential conditional analyses provided further insight into the genetic architecture of the established lipid loci by explaining additional phenotypic variation and revealing complex patterns of association. We observed loci at which signals were not independent of each other, but partially correlated based on moderate LD estimates and changes of association statistics before and after accounting for other signals. For these dependent signals, such as those at TOMM40-APOE-APOC4, the significance of residual association would increase when trait-increasing alleles were present on opposite haplotypes and decrease when trait-increasing alleles were on the same haplotype. Other signals that appeared to be independent on the basis of low pairwise LD and unchanged association evidence after conditional analysis may still be partially tagging an untyped, yet influential, variant [55-57]. Therefore, deeper sequencing that identifies all variants at a locus will be required to characterize more fully the allelic heterogeneity and the patterns of association.
One of the major goals of high-density genotyping is to aid in identification of the functional variants by recognizing the most compelling candidate variants for experimental study. Because of the diverse LD structure across populations, particularly in terms of the limited LD extent in African ancestry populations, trans-ethnic fine-mapping of GWAS loci can narrow the region where functional variants are most likely to reside. This study was able to narrow the association signals at five lipid loci, based on the much smaller subsets of most strongly associated variants located in smaller regions. One signal was localized to a reported causal variant (GCKR-P446L) [28] and another to an uncharacterized nonsynonymous variant (SLC12A4-E4G near LCAT). These findings demonstrate that trans-ethnic association analyses can increase the resolution of fine-mapping by enlarging the haplotypic diversity of samples with different ancestries and, consequently, narrowing the sets of candidate functional variants [58,59]. The previously described functional variants at LCAT [44] and ABCA1 [42,43], which are not present on the Metabochip, were physically located 22 kb and >43 kb away from the narrowed association signals observed in this study (Table 4).
Refining signals by trans-ethnic meta-analysis relies not only on the existence of distinct LD patterns across ancestry groups but also on shared functional variants. If functional variants are shared across populations, as observed with GCKR-P446L, performing trans-ethnic meta-analysis and integrating LD information across different populations may refine the signal. In contrast, if trait variation is influenced by distinct functional variants across populations, as our data suggest for APOA5 (Figure S6A-S6D), the lead SNPs produced by meta-analysis would be influenced by the sample size, magnitude of genetic effects, and allele frequencies. Similarly, in the case of population-specific functional variants, such as those at PCSK9, the results from meta-analysis would reflect the association in one particular population rather than the combined effect across populations if signals unique to this population drive the results. Therefore, accurate assessment of allelic variability is needed on a population-by-population and locus-by-locus basis.
Although genotype imputation has become a standard practice to increase genome coverage in GWAS by predicting the genotypes at SNPs that are not directly genotyped, imputation accuracy tends to be lower for rare variants owing to the lower degree of LD and the more challenging haplotype reconstruction [60]. In addition, African American samples pose a challenge for imputation due to their varying degree of admixture [61]. A major strength of our study is that all variants we tested for association were directly genotyped using the Metabochip, which was designed to provide high-density coverage of both SNPs overall and low-frequency variants concentrated around GWAS-identified loci and/or signals [9,10]. This approach increases the reliability of our association results overall, and in particular for variants with low allele frequencies.
In conclusion, we performed a large-scale trans-ethnic fine-mapping study to investigate the established lipid loci using the Metabochip high-density genotyping array and focusing on diverse groups including African Americans, East Asians, and Europeans. Our results highlight the value of high-density genotyping in diverse populations to identify a wider spectrum of susceptibility variants at established loci, both in terms of additional signals and in terms of population-specific and/or potentially functional variants. The additional signals revealed through the sequential conditional analyses led to a 1.3- to 1.8-fold increase in the explained phenotypic variance across the different populations. In addition, integrating diverse LD patterns across ancestry groups allows for the refinement of association signals. Lastly, our finding that 74% of the reported functional variants exhibited the strongest association at these densely typed signals suggests that at loci and signals where functional variants are unknown, the variants with strongest association may be good candidates for functional assessment.

Table 4. Trans-ethnic fine-mapping narrowed the association signals.

Study populations and phenotypes
The 6,832 African Americans studied comprise individuals from the Atherosclerosis Risk in Communities Study (ARIC) [62], the Multiethnic Cohort Study (MEC) [63], and the Women's Health Initiative (WHI) [64,65], which are part of the Population Architecture using Genomics and Epidemiology (PAGE) consortium [66], and from the Hypertensive Genetic Epidemiology Network (HyperGEN) [67]. The [69,70], and the Norwegians were from the cohorts of Nord-Trøndelag Health Study (HUNT 2) and the Tromsø Study (TROMSO) [71,72].
All study protocols were approved by Institutional Review Boards at their respective sites. Brief descriptions of the studies are provided in Text S1. General characteristics and measurements of TG, HDL-C, and LDL-C in each cohort are summarized in Table S1. Triglyceride values were natural log-transformed to approximate normality in each study sample separately.

Table 5. Reported functional variants exhibited the strongest association at a signal (P<10−4).

Genotyping
We genotyped all study samples with the Metabochip according to the manufacturer's protocol (Illumina, San Diego, CA, USA). Table S1 summarizes the genotyping quality control criteria, including call rate, sample success rate, Hardy-Weinberg equilibrium, and MAF, which varied across studies.

Statistical analyses
We applied multiple linear regression models and assumed an additive mode of inheritance to test for association between genotypes and HDL-C, LDL-C, or log-transformed triglycerides. We performed each test of association separately in each of the 11 groups (Table S1) prior to meta-analysis. We constructed principal components (PCs) using the software EIGENSOFT. We used age and sex as covariates in each individual cohort; other cohort-specific covariates, including age², enrollment site, socioeconomic status, and principal components, varied across studies (Table S1). The European samples include type 2 diabetes (T2D) cases and unaffected controls; to avoid confounding due to T2D status, samples were analyzed separately as Finnish T2D patients, Finnish unaffected individuals, Norwegian T2D patients, and Norwegian unaffected individuals.
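As an illustration of the association model described above, the following minimal Python sketch fits an ordinary least squares regression of a lipid trait on allele dosage (0, 1, or 2 copies) plus covariates. It is not the software used in this study; the function name `additive_association` and the simulated data are hypothetical and serve only to show the additive coding.

```python
import numpy as np

def additive_association(genotype, phenotype, covariates):
    """Regress a trait on allele dosage (0/1/2) plus covariates by OLS.

    Returns the per-allele effect estimate and its standard error.
    Hypothetical helper for illustration only.
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    c = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    # Design matrix: intercept, genotype dosage, then covariates.
    X = np.column_stack([np.ones(len(y)), g, c])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    # Covariance of the coefficient estimates under the usual OLS assumptions.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1], np.sqrt(cov[1, 1])

# Simulated example (all values are made up): one SNP, age and sex as covariates.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(50, 10, n)
sex = rng.integers(0, 2, n)
geno = rng.binomial(2, 0.3, n)
hdl = 50 + 2.0 * geno - 0.1 * age + 5.0 * sex + rng.normal(0, 10, n)
beta, se = additive_association(geno, hdl, np.column_stack([age, sex]))
print(f"per-allele effect = {beta:.2f}, SE = {se:.2f}")
```

Under the additive model, the genotype coefficient is the expected change in the trait per additional copy of the coded allele, holding the covariates fixed.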
We first conducted the meta-analysis within the African Americans, East Asians, and Europeans separately. We then performed combined trans-ethnic meta-analyses by combining the statistics of each of the 11 participating groups to assess the association with the SNPs at the 58 lipid loci.
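The text above does not state which meta-analysis weighting scheme was applied. As a sketch only, and assuming a fixed-effects inverse-variance approach (a common choice, not confirmed by this section), per-group effect estimates and standard errors could be combined as follows. The function name `inverse_variance_meta` and the example numbers are hypothetical.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis (assumed scheme).

    betas, ses: per-group effect estimates and standard errors for one SNP.
    Returns the pooled effect, its standard error, and a two-sided P value
    from the normal distribution.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p

# Made-up estimates for one SNP in three groups.
pooled_beta, pooled_se, pooled_p = inverse_variance_meta(
    [0.10, 0.08, 0.12], [0.03, 0.04, 0.05]
)
print(pooled_beta, pooled_se, pooled_p)
```

The same function would serve for both the within-ancestry and the combined trans-ethnic step, since the latter simply pools all participating groups at once.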
At loci that exhibited evidence of association at P<10−4, we next performed a series of sequential conditional analyses by adding the most strongly associated SNP into the regression model as a covariate and testing all remaining regional SNPs for association. We conducted a set of sequential conditional analyses until the strongest SNP showed a conditional P value >10−4 and had no annotation or literature evidence that suggested a functional role.
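The sequential conditional procedure described above can be sketched in Python as a simple loop. This is an illustrative simplification, not the pipeline used in this study: it implements only the P-value stopping rule (the annotation and literature criterion is omitted), uses a normal approximation for the test statistic, and assumes no remaining SNP is in perfect LD with a SNP already in the model. The names `snp_pvalue` and `sequential_conditional` are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def snp_pvalue(g, y, covars):
    """Two-sided P value for one SNP in an OLS model with covariates.

    Uses a normal approximation to the t statistic (reasonable for large n).
    """
    X = np.column_stack([np.ones(len(y)), g, covars])
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return erfc(abs(coef[1] / se) / sqrt(2.0))

def sequential_conditional(genotypes, y, covars, threshold=1e-4):
    """Repeatedly add the most strongly associated SNP as a covariate.

    genotypes: (n, m) dosage matrix for the regional SNPs.
    Returns the column indices of the selected signals, in order.
    """
    G = np.asarray(genotypes, dtype=float)
    y = np.asarray(y, dtype=float)
    C = np.asarray(covars, dtype=float).reshape(len(y), -1)
    selected = []
    while len(selected) < G.shape[1]:
        remaining = [j for j in range(G.shape[1]) if j not in selected]
        pvals = {j: snp_pvalue(G[:, j], y, C) for j in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:
            break  # no remaining SNP passes the conditional threshold
        selected.append(best)
        # Condition all later tests on the newly selected SNP.
        C = np.column_stack([C, G[:, best]])
    return selected

# Simulated locus (made-up data): SNPs 0 and 3 carry independent effects.
rng = np.random.default_rng(1)
n = 3000
G = rng.binomial(2, 0.3, size=(n, 5))
age = rng.normal(50, 10, n)
trait = 0.5 * G[:, 0] + 0.3 * G[:, 3] + rng.normal(0, 1, n)
signals = sequential_conditional(G, trait, age)
print(signals)
```

In real data, SNPs at a locus are correlated, so the conditional P values of the remaining SNPs can rise or fall after each round depending on the haplotype configuration, as discussed above for dependent signals.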
Unless otherwise noted, linkage disequilibrium estimates were obtained from the 1000 Genomes Project November 2010 release. SNP positions correspond to hg18.
We performed haplotype analysis at the LDL-C locus TOMM40-APOE-APOC4 in 5,593 unrelated African Americans from the PAGE consortium, using the 'haplo.stat' R package. Haplotypes and haplotype frequencies were estimated using the R function 'haplo.em'. The association between haplotypes and LDL-C was assessed using the R function 'haplo.glm'. An additive model was assumed, in which the regression coefficient β represents the expected change in LDL-C level with each additional copy of the specific haplotype compared with the reference haplotype, which was set as the A-A (trait increasing-increasing) haplotype.
We created the regional association plots using LocusZoom [76]. To plot the association results in Europeans and East Asians, we used the LocusZoom-implemented LD estimates from the 1000 Genomes Project (June 2010) CEU and CHB+JPT samples, whose LD structures are similar to those of our samples with European and East Asian ancestries. To plot the regional association in African Americans, we supplied LD calculated from the genotype data of the PAGE African American samples [9], because their LD patterns may differ from any pre-computed LD sources implemented in LocusZoom.
We evaluated the proportion of variance explained by a single SNP or any given locus by including the SNP or a set of SNPs in a linear regression model with all covariates used in association analysis and calculating the R² for the full model. We subtracted the variance explained by a basic model in which only covariates were included from the variance we obtained from the full model. We performed these analyses using SAS version 9.2 (SAS Institute, Cary, NC, USA).

Figure S1 LDL-C locus TOMM40-APOE-APOC4 exhibited seven signals in African Americans. Each SNP was colored according to its LD (r²) in PAGE consortium with the strongest SNP rs7412 (R176C) colored in purple. (PDF)
Table S5 LDL-C association with haplotypes consisting of the third (rs1038026) and the fourth (rs157588) signals at TOMM40-APOE-APOC4 cluster. (PDF)
Text S1 Study description. (DOCX)