A Multiethnic Replication Study of Plasma Lipoprotein Levels-Associated SNPs Identified in Recent GWAS

Genome-wide association studies (GWAS) have identified a number of loci/SNPs associated with plasma total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglyceride (TG) levels. The purpose of this study was to replicate 40 recent GWAS-identified HDL-C-related new loci in 3 epidemiological samples comprising U.S. non-Hispanic Whites (NHWs), U.S. Hispanics, and African Blacks. In each sample, the association analyses were performed with all 4 major lipid traits regardless of previously reported specific associations with selected SNPs. A total of 22 SNPs showed nominally significant association (p<0.05) with at least one lipid trait in at least one ethnic group, although not always with the same lipid traits reported as genome-wide significant in the original GWAS. The total number of significant loci was 10 for TC, 12 for LDL-C, 10 for HDL-C, and 6 for TG levels. Ten SNPs were significantly associated with more than one lipid trait in at least one ethnic group. Six SNPs were significantly associated with at least one lipid trait in more than one ethnic group, although not always with the same trait across various ethnic groups. For 25 SNPs, the associations were replicated with the same genome-wide significant lipid traits in the same direction in at least one ethnic group; at nominal significance for 13 SNPs and with a trend for association for 12 SNPs. However, the associations were not consistently present in all ethnic groups. This observation was consistent with mixed results obtained in other studies that also examined various ethnic groups.


Introduction
Prior to genome-wide association studies (GWAS), genome-wide linkage scans and candidate gene (positional and/or biological) association studies were the main approaches used to unravel the genetic determinants of complex traits such as plasma lipid/lipoprotein levels. These studies have implicated a number of genes and variants as determinants of plasma total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglyceride (TG) levels; some of these were consistently replicated, while several others yielded inconsistent results. With the availability of GWAS platforms, it became possible to identify susceptibility variants and genes for complex traits without making a priori assumptions.
Several GWAS investigating plasma lipid/lipoprotein traits, primarily in subjects of European ancestry, have been published to date [1][2][3][4][5][6][7][8][9][10][11]. These GWAS confirmed a number of genes previously implicated in the inter-individual variation in the four major lipid traits (TC, LDL-C, HDL-C, and TG) by earlier functional and/or candidate gene association studies, and also identified several new loci and genes. A recent meta-analysis of 46 lipid/lipoprotein GWAS comprising >100,000 individuals has identified 95 genome-wide significant loci associated with at least one of the four major lipid traits [11].
Only a handful of post-GWAS replication studies published to date have simultaneously examined multiple (3 or more) ethnic groups, including non-Hispanic Whites (NHWs), Hispanics, and African Americans. Additional independent studies investigating GWAS-identified variants in diverse racial/ethnic groups are needed. In this study, we sought to replicate 40 recent GWAS-identified HDL-C-related loci, which were not among previously established lipid loci/genes, in 3 epidemiological samples comprising U.S. NHWs, U.S. Hispanics, and African Blacks. In each sample, we performed association analyses with all four major lipid traits (TC, LDL-C, HDL-C, and TG) regardless of previously reported specific associations with selected SNPs.

Subjects
The study consisted of 621 NHW and 413 Hispanic nondiabetic subjects drawn from the San Luis Valley Diabetes Study, a population-based case-control study in the San Luis Valley in Southern Colorado, and 787 African Blacks drawn from a study on coronary heart disease (CHD)-related risk factors in Benin City, Nigeria. Detailed information on these studies and subjects can be found elsewhere [12][13][14][15][16][17]. The ages of participants ranged from 24 to 75 in NHWs, 21 to 75 in Hispanics, and 19 to 70 in African Blacks. The demographic and phenotypic characteristics of the study subjects are summarized in Table 1. The study was approved by the University of Pittsburgh and University of Colorado Denver Institutional Review Boards and all study participants provided written informed consent.

SNP Selection
The primary purpose of this study was to replicate the new HDL-C-related signals (different from those found in established lipid loci/genes) identified in recent lipid/lipoprotein GWAS. We analyzed a total of 40 SNPs selected from four publications [7][8][9][11] (Table 2). We primarily targeted those SNPs that reached genome-wide significance for association with HDL-C levels (n = 36). We also included 4 additional SNPs (shown in italics in Table 2) that did not reach genome-wide significance for HDL-C [7,8] but were either highly significant for HDL-C, or modestly significant for HDL-C and genome-wide significant for at least one of the three other lipid traits (TC, LDL-C or TG). Whenever more than one significant SNP was reported for a given locus by the same group and/or various groups, only one SNP was selected for replication in our samples. Although all these GWAS primarily investigated individuals of European ancestry (EU), one of them [11] also sought replication in various non-European populations, including African Americans. Whenever the information was available, the effects observed in African Americans are shown as 'concordant' (AA) or 'discordant' (AA*) in Table 2. All 40 SNPs selected from these GWAS were analyzed in our NHW and Hispanic samples, whereas only a subset (n = 34) with sufficient minor allele frequency (MAF) was analyzed in our African sample.

Genotyping
DNAs were extracted from either buffy coats (NHWs and Hispanics) or blood clots (African Blacks) using standard methods. Samples were whole-genome amplified using the GenomiPhi DNA Amplification Kit (GE Healthcare Bio-Sciences, Piscataway, NJ) prior to genotyping. Twenty SNPs were genotyped using the TaqMan allelic discrimination method (Applied Biosystems, Foster City, CA) in 621 NHWs, 413 Hispanics, and 787 African Blacks. The other twenty were genotyped using the iPLEX Gold technology (Sequenom, San Diego, CA) in 621 NHWs, 382 Hispanics, and 787 African Blacks. Depending on the genotyping method used, about 7-10% of samples were repeated to test the consistency of genotype calls for each assay.

Statistical Methods
Concordance of the genotype distribution to Hardy-Weinberg equilibrium (HWE) was tested for each variant using the χ² goodness-of-fit test. Whenever it was necessary to reduce the effects of non-normality, dependent quantitative variables were transformed using either log or square root transformation: 'log10' transformation was used for HDL-C and TG levels in NHWs and Hispanics, 'natural log' transformation for TC and TG levels in African Blacks, and 'square root' transformation for LDL-C and HDL-C levels in African Blacks. Significant covariates for each dependent variable were identified using stepwise regression in order to determine the most parsimonious set of covariates to be included in the analysis in each population. Detailed information on the evaluation and effects of covariates in our study samples can be found elsewhere [16]. To test for the effects of genotypes on the means of the quantitative traits, a linear regression analysis was performed (under the additive model) and the results were adjusted for the relevant covariates. For NHWs and Hispanics, the covariates were sex, age, BMI, and smoking. For African Blacks, the covariates were sex, age, waist measurement, exercise (minutes walking or bicycling to work each day), and staff level (junior or senior). The R statistical software package (version 2.12.2, http://www.r-project.org) was used to perform all analyses. Because this was a replication study, we considered a nominal p<0.05 as evidence of association. In addition, because we compared our results to those obtained in large GWAS and meta-analyses that included several thousand subjects, we have also discussed the results of the SNPs that showed a trend (p-values between 0.05 and 0.20) for 'the same direction' of association with 'the same genome-wide significant lipid trait' reported in the original GWAS.
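As an illustration of the HWE check described above, the following Python sketch computes the χ² goodness-of-fit statistic from the three genotype counts of a biallelic SNP. It is not the R code used in this study: the function name `hwe_chi_square` and the example counts are hypothetical, and the p-value assumes 1 degree of freedom, for which P(X > x) = erfc(√(x/2)).

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of genotype counts against HWE.

    n_aa, n_ab, n_bb are the observed counts of the two homozygotes and
    the heterozygote for a biallelic SNP. Returns (chi2, p_value), with
    the p-value taken from a chi-square distribution with 1 df.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequencies estimated from the observed genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Cells with zero expectation (monomorphic SNP) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper tail of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts, for illustration only.
chi2_stat, p_val = hwe_chi_square(300, 250, 71)
print(f"chi2 = {chi2_stat:.3f}, p = {p_val:.4f}")
```

A SNP whose counts depart strongly from the expected proportions yields a large statistic and a small p-value; the threshold used to flag such a SNP is a study-specific choice and is not specified here.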

Results
The genotype call rates were very high (≥95%) for almost all assays and only a small number of SNPs showed lower call rates: 1 in NHWs (87%), 3 in Hispanics (between 92% and 95%), and 2 in African Blacks (between 92% and 95%). No SNP showed low call rates across all populations genotyped. Discrepancy among replicates was detected for only one assay (rs174547), for which the discrepancy rate was 0.5%.
The association results for 4 major lipid traits examined in 3 ethnic groups are summarized in Table 3. Most SNPs differed in allele frequencies among various ethnic groups. For 10 SNPs (shown in italics in Table 3), it was not always the same allele that was the minor allele across various ethnic groups and this was taken into account when making cross-sample comparison because the genotypic effects were modeled as the additive effect of the population-specific minor allele in each ethnic group.
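Because the additive model counts copies of the population-specific minor allele, the sign of a regression coefficient depends on which allele is counted. The Python sketch below illustrates one way such effects could be put on a common footing before comparing directions across samples or with the original GWAS. The function names, allele labels, and effect sizes are hypothetical, and the sketch assumes a biallelic SNP with alleles reported on the same strand in both sources.

```python
def align_effect(beta, coded_allele, target_allele):
    """Express an additive-model effect per copy of target_allele.

    For a biallelic SNP, counting the other allele instead of the coded
    one replaces the genotype score g with 2 - g, which flips the sign of
    the regression slope and leaves its magnitude unchanged.
    """
    return beta if coded_allele == target_allele else -beta


def same_direction(beta_study, coded_study, beta_gwas, coded_gwas):
    """True if the study effect and the GWAS effect point the same way
    once both are expressed per copy of the GWAS coded allele."""
    aligned = align_effect(beta_study, coded_study, coded_gwas)
    return aligned * beta_gwas > 0


# Hypothetical example: the minor allele is "G" in one sample and "A" in
# another, while the GWAS reported its effect for allele "A".
print(same_direction(-0.04, "G", 0.02, "A"))
print(same_direction(0.03, "A", 0.02, "A"))
```

Comparing raw coefficients without this step would label a concordant effect as discordant whenever the minor allele differs between two samples.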
A total of 22 SNPs showed nominally significant association (p<0.05, shown in bold in Table 3) with at least one lipid trait in at least one ethnic group, with a total of 40 significant p-values, although not always with the same lipid traits reported as genome-wide significant in the original GWAS (13 of 22 significant SNPs showed replicated association with the same lipid traits reported in the original GWAS). The total number of significant loci was 10 for TC, 12 for LDL-C, 10 for HDL-C, and 6 for TG levels. Ten SNPs were significantly associated with more than one lipid trait in at least one ethnic group, as follows: 6 SNPs (CELSR2/rs646776, APOB/rs1042034, PPP1R3B/rs9987289, GRIN3A/rs1323432, OR4A47/rs7395662, CMIP/rs2925979) with TC and LDL-C, DOCK7/rs10889353 with TC and TG, GALNT2/rs2144300 with LDL-C and TG, NUTF2/rs2271293 with TC and HDL-C, and PGS1/rs4129767 with TC, LDL-C, HDL-C, and TG. Six SNPs (DOCK7/rs10889353, PPP1R3B/rs9987289, GRIN3A/rs1323432, UBASH3B/rs7941030, MMAB-MVK/rs2338104, PGS1/rs4129767) were significantly associated with at least one lipid trait in more than one ethnic group, although not always with the same trait across various ethnic groups.
For 25 of 40 SNPs analyzed (34 in African Blacks), we were able to replicate the GWAS associations (with the same lipid trait in the same direction) in at least one ethnic group that we studied: at nominal significance (p<0.05) for 13 SNPs and with a trend for association (p-values between 0.05 and 0.20) for 12 SNPs (see Table 4 for details). There were additional SNPs with higher p-values that showed similar trends for effects on the same lipid traits as seen in the original GWAS. Of 6 SNPs showing genome-wide significance for TC levels, we were able to replicate the associations in the same direction for 5 SNPs (at nominal significance for 4 SNPs and with a trend for association for one SNP) in at least one ethnic group studied. Of 2 SNPs showing genome-wide significance for LDL-C levels, we were able to replicate the associations in the same direction for both SNPs at nominal significance in at least one ethnic group studied. Of 36 SNPs with genome-wide significance and one with p = 7.7×10⁻⁴ for HDL-C levels, we were able to replicate the associations in the same direction for 18 SNPs (at nominal significance for 8 SNPs and with a trend for association for 10 SNPs) in at least one ethnic group studied. Two SNPs (PGS1/rs4129767 and MC4R/rs12967135) showed significant but discordant results (opposite direction) for association with HDL-C levels as compared to the original GWAS. Of 7 SNPs showing genome-wide significance for TG levels, we were able to replicate the associations in the same direction for 5 SNPs (at nominal significance for 3 SNPs and with a trend for association for 2 SNPs) in at least one ethnic group studied.
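The replication categories used in this paragraph can be summarized as a simple decision rule. The Python sketch below is illustrative only: the thresholds (0.05 and 0.20) are those stated in the text, whereas the function name, the category labels, and the handling of p-values exactly at a threshold are assumptions made here.

```python
NOMINAL_P = 0.05  # nominal significance threshold stated in the text
TREND_P = 0.20    # upper bound of the "trend" range stated in the text


def classify_replication(p_value, same_direction):
    """Classify one SNP/trait/sample result against the original GWAS.

    p_value        -- p-value from the replication sample
    same_direction -- True if the effect has the same sign as in the
                      GWAS, after aligning the counted allele
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < NOMINAL_P:
        # Significant results count as replication only when the
        # direction matches; otherwise they are discordant.
        return "replicated" if same_direction else "discordant"
    if p_value <= TREND_P and same_direction:
        return "trend"
    return "not replicated"


# Hypothetical results, for illustration only.
for p, same in [(0.01, True), (0.03, False), (0.12, True), (0.12, False)]:
    print(p, same, classify_replication(p, same))
```

Under this rule a SNP counts as replicated, or as showing a trend, if any one of the ethnic groups studied yields that category for the trait reported in the original GWAS.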
For 12 SNPs, we observed significant associations with lipid traits other than those reported as genome-wide significant in the original GWAS (Table 4). Three of these 12 SNPs did not show any significant association or a trend for the same direction association with the traits identified as genome-wide significant in the original GWAS.

Discussion
We conducted a replication study on 40 recent GWAS-identified new loci that were associated with HDL-C levels [7][8][9][11] using a multiethnic sample comprising 3 different ethnic groups (NHWs, Hispanics, and African Blacks). Since the MAFs of 6 of 40 SNPs were low in African Blacks, only 34 SNPs were included in the analysis in this group. Although we primarily focused on GWAS signals influencing plasma HDL-C levels (with or without effects on other lipids) when selecting the SNPs to be replicated in our study, we performed association analyses with all four major lipid traits (TC, LDL-C, HDL-C, and TG) regardless of previously reported specific associations with selected SNPs.
Given that our study examined various ethnic groups and our sample sizes were modest as compared to those used in large GWAS and meta-analyses (which examined several thousand subjects), we have used nominal significance (p<0.05) for replication and also taken into account the p-values between 0.05 and 0.20 (whenever there was a trend for the same direction of association with the same genome-wide significant lipid trait) when comparing our results with the original GWAS signals. The use of nominal significance for replication and for generalization to non-European populations has been widely employed, including by some large consortia [18], and is also an acceptable criterion for publication [19]. The comparison of our results to those in the original GWAS from which the SNPs were selected is summarized in Table 4, which includes all SNPs that were significantly (p<0.05) associated with at least one lipid trait in at least one ethnic group in our study as well as the SNPs that showed a trend for the same direction of association as seen for at least one genome-wide significant (p≤5×10⁻⁸) lipid trait in the original GWAS. Our analysis revealed a total of 22 SNPs/loci with nominally significant association (p<0.05) with at least one lipid trait in at least one ethnic group, although not always with the same lipid traits reported as genome-wide significant in the original GWAS (13 of 22 significant SNPs showed replicated association with the lipid traits originally reported in GWAS). Although our SNP selection was biased toward including primarily HDL-C-related variants, we identified a similar number of significant associations with TC (n = 10), LDL-C (n = 12), and HDL-C (n = 10), but relatively fewer with TG (n = 6). This observation suggests that replication studies should not restrict their analyses of GWAS signals to the specific lipid traits that were initially reported but should rather evaluate all lipid traits for a given variant.
Ten loci (CELSR2, APOB, PPP1R3B, GRIN3A, OR4A47, CMIP, DOCK7, GALNT2, NUTF2, and PGS1) were significantly associated with more than one lipid trait in at least one ethnic group studied. Six loci (DOCK7, PPP1R3B, GRIN3A, UBASH3B, MMAB-MVK, PGS1) were significantly associated with at least one lipid trait in more than one ethnic group [...]
[...] lipid loci in a large sample set comprising European Americans and African Americans, reported that many loci showed major differences in genetic architecture between these two ethnic groups and the most significant SNP at a given locus for a given trait often varied among them. Further characterization of relevant lipid loci is necessary through their comprehensive sequencing in individuals with extreme phenotypes followed by functional evaluation of identified variants. Among several loci identified to date, the top priority could be given to those found to be relevant to more than one lipid trait and/or confirmed in more than one ethnic group.
Table 4. SNPs significantly (p<0.05) associated with at least one lipid trait in at least one ethnic group in our study, as well as those that showed a trend for the same direction of association (p between 0.05 and 0.20, italic traits) as seen for at least one genome-wide significant lipid trait in the original GWAS (only relevant observations have been included in the table).