Comprehensive Evaluation of the Association of APOE Genetic Variation with Plasma Lipoprotein Traits in U.S. Whites and African Blacks

Although common APOE genetic variation has a major influence on plasma LDL-cholesterol, its role in affecting HDL-cholesterol and triglycerides is not well established. Recent genome-wide association studies suggest that APOE also affects plasma variation in HDL-cholesterol and triglycerides. It is thus important to resequence the APOE gene to identify both common and uncommon variants that affect plasma lipid profile. Here, we have sequenced the APOE gene in 190 subjects with extreme HDL-cholesterol levels selected from two well-defined epidemiological samples of U.S. non-Hispanic Whites (NHWs) and African Blacks followed by genotyping of identified variants in the entire datasets (623 NHWs, 788 African Blacks) and association analyses with major lipid traits. We identified a total of 40 sequence variants, of which 10 are novel. A total of 32 variants, including common tagSNPs (≥5% frequency) and all uncommon variants (<5% frequency) were successfully genotyped and considered for genotype-phenotype associations. Other than the established associations of APOE*2 and APOE*4 with LDL-cholesterol, we have identified additional independent associations with LDL-cholesterol. We have also identified multiple associations of uncommon and common APOE variants with HDL-cholesterol and triglycerides. Our comprehensive sequencing and genotype-phenotype analyses indicate that APOE genetic variation impacts HDL-cholesterol and triglycerides in addition to affecting LDL-cholesterol.


Introduction
Coronary heart disease (CHD), a multifactorial disease modulated by multiple genetic and environmental factors, continues to be a leading cause of morbidity and mortality worldwide [1]. Dyslipidemia with high low-density lipoprotein cholesterol (LDL-C) and low high-density lipoprotein cholesterol (HDL-C) is associated with high risk of CHD [1]. Genes involved in lipid metabolism are considered to be candidate genes for CHD risk, and their genetic variation could contribute, in part, to the inter-individual variation in plasma lipoprotein-lipid levels.
APOE is one of the most extensively studied candidate genes, and the influence of its genetic variation on plasma lipid levels and CHD risk has been well investigated [15][16]. The epsilon polymorphism of APOE is defined by the rs7412 and rs429358 SNPs, which generate the ApoE2, ApoE3 and ApoE4 isoforms encoded by three codominant alleles (designated E*2, E*3 and E*4). The three isoforms differ by an amino acid substitution at position 112 or position 158 in the 299-amino-acid peptide chain. Although the major effect of APOE genetic variation has been reported to be on LDL-C levels, recent genome-wide association studies (GWAS) on lipid traits also identified statistically significant associations of APOE common variants with HDL-C and triglyceride (TG) levels [17][18]. Thus, deep resequencing of the APOE gene in selected individuals with high/low lipid levels is warranted in order to characterize both rare and common variants that might affect the plasma lipid profile.
In this study, we resequenced the entire APOE gene region (total 5.5 kb), including all four exons (1,180 bp), three introns (2,432 bp), and ~1 kb of each of the flanking regions in selected individuals with extreme HDL-C levels (falling within the upper and lower 10th percentiles) from two ethnically distinct populations (95 US non-Hispanic Whites (NHWs) and 95 African Blacks). Following the sequencing-based discovery step, we genotyped all identified common tagSNPs (r² ≥ 0.9) with minor allele frequency (MAF) ≥5%, and relevant uncommon and rare variants with MAF <5% in the entire sample sets (623 NHWs and 788 African Blacks) to evaluate their associations with lipid traits. The association of APOE genetic variation was examined with three lipid traits (LDL-C, HDL-C and TG) and apolipoprotein B (ApoB) using single-site association analysis for variants with MAF ≥1%, gene-based and haplotype-based association analyses for all variants, and SKAT-O (sequence kernel association optimal test) for uncommon and rare variants (MAF <5%).

Study Samples
The study was conducted on two epidemiologically well-characterized population samples comprising 623 US non-Hispanic Whites (NHWs) and 788 African Blacks. NHW samples were collected as part of the San Luis Valley Diabetes Study, which was designed as a geographical case-control study of non-insulin-dependent diabetes mellitus and cardiovascular disease in Alamosa and Conejos counties of southern Colorado [19]. All NHWs used in this study were non-diabetic controls, and the basic characteristics of this study are described elsewhere [19][20]. African Blacks were recruited from Benin City, Nigeria, as part of a study on CHD risk factors in Blacks, and the study details have been described in Bunker et al. [21][22]. While LDL-C, HDL-C and TG were measured in all subjects, ApoB was measured only in a subset of NHW individuals [23][24]. The demographic and lipid characteristics of these study samples can be found in our previous publications [24][25][26]. The study was approved by the University of Pittsburgh and University of Colorado Denver Institutional Review Boards, and all study participants provided written informed consent.

DNA Extraction
The genomic DNA used for sequencing and genotyping was extracted from blood clots in Blacks and from buffy coats in NHWs using standard procedures.

DNA Sequencing
Ninety-five individuals with high HDL-C levels falling within the upper 10th percentile (47 NHWs and 48 African Blacks) and 95 individuals with low HDL-C levels falling in the lower 10th percentile (48 NHWs and 47 African Blacks) were selected for Sanger sequencing. The characteristics of the selected samples in both ethnic groups are summarized in Table S1 in S1 File.
A total of ~5.5 kb of the APOE gene region, including all 4 exons and 3 introns, 1,034 bp in the 5′ flanking region, and 845 bp in the 3′ flanking region, was PCR-amplified using M13-tagged forward and reverse primers. Publicly available information at the SeattleSNPs database (http://pga.mbt.washington.edu/) was used to order M13-tagged primers, which generated nine overlapping PCR amplicons. PCR reaction and cycling conditions are available upon request. The PCR-amplified samples were sent to a commercial lab (Beckman Coulter Genomics, Danvers, MA) for automated fluorescence-based cycle sequencing and capillary electrophoresis on ABI 3730xl DNA Analyzers. Variant Reporter version 1.0 (Applied Biosystems, Foster City, CA) and Sequencher version 4.8 (Gene Codes

DNA Genotyping
Common tagSNPs (MAF ≥5%) were determined by Tagger analysis of the sequencing data in each ethnic group using Haploview software and an r² cut-off of 0.9. All common tagSNPs and uncommon/rare variants (MAF <5%) identified in each ethnic group by our sequencing, as well as the suspicious variants with low sequencing quality and/or low coverage that warranted validation and the previously reported common variants not detected in our sequencing, were selected for follow-up genotyping.
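The r² cut-off used for tagging can be made concrete with a small Python sketch that computes pairwise r² from haplotype and allele frequencies and checks it against the 0.9 threshold. This is only an illustration of the LD measure under the standard two-locus definition; it is not the Tagger algorithm itself, and the function names are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise r^2 from the frequency of the A-B haplotype (p_ab) and the
    frequencies of allele A at site 1 (p_a) and allele B at site 2 (p_b)."""
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def is_tagged(p_ab, p_a, p_b, threshold=0.9):
    """True if one SNP captures the other at the given r^2 threshold."""
    return ld_r2(p_ab, p_a, p_b) >= threshold
```

Two SNPs whose minor alleles always occur on the same haplotype give r² = 1 and can be represented by a single tagSNP, whereas statistically independent SNPs give r² = 0 and must both be genotyped.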
TaqMan (Applied Biosystems) or iPLEX Gold (Sequenom, San Diego, CA) genotyping methods were used for genotyping following manufacturer's protocols and recommendations. Whole genome amplified DNAs dried in 384-well plates were used for genotyping. Endpoint fluorescence reading of custom or pre-made TaqMan assays was done using the ABI Prism 7900HT Sequence Detection System. The iPLEX Gold genotyping was performed in the Core laboratories of the University of Pittsburgh. Sequences of primers and probes used for genotyping are available upon request. All the samples used in sequencing were also included in genotyping as a quality control measure. The comparison of sequencing and genotyping calls was conducted to check the concordance as well as to increase the call rate in both sequencing and genotyping sets.

Statistical Analysis
Analyses for NHWs and African Blacks were performed separately. For sequencing subsets, the Haploview software (www.broadinstitute.org/haploview) was used to analyze allele frequencies, their distributions in the two extreme HDL-C groups, their concordance with Hardy-Weinberg Equilibrium (HWE), and their linkage disequilibrium (LD) patterns.
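For readers unfamiliar with the HWE check, the sketch below shows a one-degree-of-freedom chi-square version of the test in Python. It is an assumption-laden illustration: Haploview may apply a different (for example, exact) test, and the function name and interface here are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg proportions,
    given counts of the three genotypes at a biallelic site."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        return 1.0  # monomorphic site: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))
```

The chi-square approximation is unreliable when expected genotype counts are small, which is the typical situation for rare variants, so an exact test is generally preferable there.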
SNPs with extensive missing data (>20%) and/or deviating highly from HWE (P < 0.01) were excluded from association analyses. A total of 15 variants in NHWs and 23 variants in Blacks remained for downstream analysis. The associations between SNPs and lipid traits were analyzed using an additive linear regression model. For each lipid trait, we applied the Box-Cox power transformation that best normalized its distribution. Stepwise regression in both directions was performed to identify significant covariates for each lipid trait. The covariates included were gender, age, BMI and smoking in NHWs and gender, age, BMI, waist measurement, smoking, exercise (minutes walking or bicycling to work each day), and staff level (junior or senior) in Blacks. Detailed information on those covariates and their effects can be found elsewhere [24]. Since the epsilon APOE E2/E3/E4 polymorphism has an established effect on cholesterol levels, we also adjusted the effects of novel associations for the epsilon APOE polymorphism. Single-site, haplotype-based and rare variant analyses were implemented in R, and versatile gene-based association (VEGAS) tests [27] were also performed. For single-site analysis, we applied the Benjamini-Hochberg procedure [28] to control the false discovery rate (FDR) and considered an FDR (q-value) of <0.20 as statistically significant.
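The FDR step can be sketched in a few lines of Python. The function below computes Benjamini-Hochberg adjusted p-values (q-values) from a list of single-site p-values; it is a generic illustration of the procedure rather than the R code used in the study, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues
```

Under the threshold described above, a variant would be declared significant when its q-value is below 0.20, for example `[q < 0.20 for q in benjamini_hochberg(pvals)]`.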
For haplotype association analysis, the generalized linear model (GLM) was used [29]. Including too many haplotypes can make the above model inefficient and impractical. To reduce the number of haplotypes considered in association analysis, we used a sliding-window approach with 4 SNPs per window and assessed evidence for association within each window. Specifically, a global p-value for testing the overall effects of the haplotypes with frequency greater than 0.01 was used to assess the associations between the traits and haplotypes in each window. Sliding-window haplotype analysis was performed using the haplo.glm function in the Haplo.Stats R package (version 1.5.0).
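The windowing scheme can be illustrated with a short Python sketch that enumerates overlapping windows of 4 consecutive SNPs. It assumes a step of one SNP between adjacent windows, which the text does not state explicitly, and the function name is hypothetical; the haplotype modelling itself (haplo.glm) is not reproduced here.

```python
def sliding_windows(snp_ids, size=4):
    """Return all windows of `size` consecutive SNPs, advancing one SNP at a time."""
    if size < 1:
        raise ValueError("window size must be positive")
    return [tuple(snp_ids[i:i + size]) for i in range(len(snp_ids) - size + 1)]
```

With 23 genotyped variants this yields 20 windows, which is consistent with windows 17 to 20 being the last four windows reported for African Blacks.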
We analyzed the cumulative effects of uncommon/rare variants by using the SKAT-O method [30], which has been proposed to be the optimal test for rare variant analysis and outperformed the SKAT and burden tests in several ways. The analysis was performed by using three different minor allele frequency bin thresholds (≤1%, ≤2% and <5%). The SKAT method was implemented using the "SKAT" R package.  Table S2 in S1 File). All novel variants identified in this study have been submitted to dbSNP database: (http://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?handle=KAMBOH).

APOE Sequencing Results
The codon position used for specifying the coding variants corresponds to the precursor protein, which also includes the first 18 amino acids of the signal peptide. The distribution of the 40 variants is as follows: 10 in the 5′ flanking region, 7 in exons (including 2 in the 3′ UTR), 16 in introns (including 1 in a splice site), and 7 in the 3′ flanking region. Four of the 5 coding variants (80%) were non-synonymous. Ten of the 40 variants were present in both groups, while 9 variants were unique to NHWs and 21 variants were specific to African Blacks. Four of the ten shared variants showed statistically significant allele frequency differences between the two ethnic groups (see Table S2 in S1 File for variants at positions 560, 624, 832, and 1163).

Distribution of APOE sequence variants in two extreme HDL-C groups
Comparison of the distribution of sequencing variants between the two extreme HDL-C groups in NHWs and African Blacks is presented in Table S3 in S1 File and Table S4 in S1 File, respectively. Among the 8 rare/uncommon variants (overall MAF <5%) in NHWs, 6 were unique to the high HDL-C group, 1 was unique to the low HDL-C group, and 1 was present in both lipid groups. In parallel with observing more unique rare variants in the high HDL-C group, 21% (10/47 subjects) of this group had at least one unique rare variant as compared to 2% (1/48 subjects) of the other lipid group (Fisher exact test p-value = 0.0037). Furthermore, the two rare coding variants observed in this study (Ala23Ala; Val254Glu) were present only in the high HDL-C group.
Among the 21 rare/uncommon variants (overall MAF <5%) observed in African Blacks, 6 were unique to the high HDL-C group, 5 were unique to the low HDL-C group, and the remaining 10 were equally distributed among the two extreme HDL-C groups. Unlike NHWs, the distribution of the unique rare variants was similar in the two extreme lipid groups among African Blacks. Fifteen percent (7/48 subjects) of the high HDL-C group had at least one unique rare variant as compared to 6% (3/47 subjects) of the low HDL-C group (Fisher exact test p-value = 0.316).
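The carrier comparisons in this section can be checked with a small, dependency-free Python sketch of the two-sided Fisher exact test. The function name and the 2x2 layout (rows = HDL-C group, columns = carrier / non-carrier of a group-unique rare variant) are assumptions made for illustration; the two-sided rule used here sums all tables that are no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table at least as extreme (no more probable) than the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


# NHWs: 10 of 47 high-HDL-C vs 1 of 48 low-HDL-C subjects carried a unique rare variant
p_nhw = fisher_exact_two_sided(10, 37, 1, 47)
# African Blacks: 7 of 48 high-HDL-C vs 3 of 47 low-HDL-C subjects
p_black = fisher_exact_two_sided(7, 41, 3, 44)
```

Under these assumptions, `p_nhw` and `p_black` should come out close to the p-values reported in the text for the two groups.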

Single-site association analysis of the SNPs in the entire NHW and Black samples
Following the identification of genetic variation in the sequencing step, common tagSNPs covering the entire APOE gene and rare variants were genotyped in the total sample of NHWs (n = 623) and African Blacks (n = 788) for genotype-phenotype association analyses. Initially, 20 variants in NHWs (9 tagSNPs, 8 rare variants, 2 suspicious SNPs, and 1 database SNP) and 32 variants in African Blacks (9 tagSNPs, 21 rare variants, 1 suspicious SNP, and 1 database SNP) were selected for genotyping. In NHWs, 2 of the 20 variants (APOE2294; MAF = 0.005, and APOE4951/rs1081105; MAF = 0.042) failed in both Sequenom and TaqMan designs or runs, 2 suspicious variants (APOE4489 and APOE4490) were confirmed as not being genuine, and one variant (APOE624/rs769446) with a low call rate was excluded from the association analyses. So, a total of 15 variants (14 sequencing variants and 1 database SNP, APOE3106/rs769452) were successfully genotyped in the entire NHW sample. In African Blacks, 6 of 32 variants (APOE471/rs439382; MAF = 0.132, APOE494; MAF = 0.005, APOE526; MAF = 0.005, APOE2576; MAF = 0.005, APOE4951/rs1081105; MAF = 0.042, and APOE5229/rs80125357; MAF = 0.059) failed in both Sequenom and TaqMan designs or runs, the database SNP (APOE1586/rs74625294) and the suspicious variant (APOE91) were excluded because they turned out to be non-polymorphic in our population, and an additional variant (APOE1591/rs147236548) was excluded from the statistical analyses because it was out of HWE. Thus, a total of 23 variants were successfully genotyped in the entire African Black sample.

Gene-based association analysis
Gene-based tests including all APOE common and rare variants simultaneously within each ethnic group were performed (Table 3 and Table 4). Gene-based association analysis showed significant associations (p < 0.05) with TG, LDL-C and ApoB in NHWs and with LDL-C in African Blacks.
In Blacks, the strongest haplotype associations were observed with LDL-C. Similar to NHWs, the last four windows (17, 18, 19, and 20), which include common polymorphisms in exon 4, showed the most significant p-values with LDL-C in African Blacks (p-values range between 8.86E-09 and 8.2E-06). Twelve additional windows (1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, and 16) showed a significant effect on LDL-C (p-values range between 0.0022 and 0.036), confirming the single-site effects of multiple variants (APOE560/rs449647, APOE624/rs769446, APOE832/rs405509, APOE2269/rs61357706, and APOE2544/rs115299243) on LDL-C. Unlike NHWs, only two windows (18 and 19) showed a significant global p-value (0.035 and 0.038) with ApoB, most likely due to the significant effect of E*2. Only the first window showed a significant global p-value with TG (p = 0.035), most likely due to the effect of APOE73/rs1081101 as seen in the single-site analysis (p = 0.0093). Findings from haplotype-based association analyses confirm the single-site association results.

Uncommon/Rare variants association analysis
Uncommon/rare variants association analysis was performed to examine the cumulative effect of uncommon/rare variants (MAF <5%) on lipid traits (HDL-C, LDL-C, and TG) using the SKAT-O test. We found a significant association with HDL-C in NHWs after including all 7 uncommon/rare variants in the analysis (p = 0.0061), and APOE1575/rs769448 with MAF = 0.021 contributed largely to this significance (Table 5), as it also showed the most significant association in single-site analysis (p = 0.0197). In Blacks, rare variants analysis (Table 6) showed a significant association with LDL-C (p = 0.00018), and the significant association was driven by three variants with MAF between 0.017 and 0.020 (APOE2269/rs61357706, APOE2544/rs115299243, and APOE4036/rs769455), all of which showed significant association in single-site analysis (p range = 0.0009-0.0064).

Functional annotation of the sequence variation
We used the open-access database RegulomeDB (http://regulome.stanford.edu) to predict the potential implication of the identified genetic variation for the regulation of gene expression. The RegulomeDB score of 1-5 is based on the strength of a variant's association with the gene regulation process; the lowest score represents the strongest evidence of impact on regulation (based on these features: expression quantitative trait loci (eQTL), transcription factor binding site, or DNase hypersensitivity), while the highest score represents the weakest evidence. The RegulomeDB score for each variant is given in Tables 1 and 2. According to the RegulomeDB score, three variants in the 5′ flanking region (APOE173, APOE624/rs769446, and APOE832/rs405509), one intronic variant (APOE1231), and three variants in the 3′ flanking region (APOE5223, APOE5231, and APOE5361/rs1081106) seem to affect gene expression, as they have small scores (RegulomeDB score = 1-3). However, only two of these variants (APOE624/rs769446 and APOE832/rs405509) had a significant effect on LDL-C, ApoB or TG, and two others, APOE5223 and APOE5361/rs1081106, showed borderline effects on LDL-C and HDL-C, respectively. Although the remaining variants with strong predicted regulatory effects (APOE173, APOE1231, and APOE5231) were not associated with lipid variation, they may yet have other biological consequences.

Discussion
The role of common APOE genetic variation in affecting interindividual variation in plasma cholesterol, especially LDL-C, in the general population is well established. Less clear, however, is whether APOE genetic variation also has an impact on other major lipid traits, such as plasma HDL-C and TG. Recent lipid GWAS indicate that, in addition to LDL-C, APOE common variants are also associated with HDL-C and TG levels [17][18]. Since common variants explain only ~25-30% of the genetic variance of each major lipid trait [18], it has been hypothesized that uncommon low-frequency and rare variants in candidate genes may explain part of the missing heritability, as has already been shown for some lipid genes [30][31][32][33][34][35][36]. Thus, deep resequencing of the APOE gene is warranted to identify both uncommon and common variants that might affect the plasma lipid profile. The objective of this study was to evaluate the 'common disease common variants' (CDCV) and 'common disease rare variants' (CDRV) hypotheses by sequencing the entire APOE gene in selected individuals (n = 190) with extreme HDL-C levels from two ethnic groups in the variant discovery stage and then genotyping common tagSNPs and relevant uncommon/rare variants in the full datasets (NHWs = 623 and Blacks = 788) to evaluate their association with lipid traits. To our knowledge, this is the first population-based association study designed to evaluate the effect of the full spectrum of APOE genetic variation on major plasma lipid traits and ApoB levels. Previously, sequencing of the APOE gene has been reported in two different studies [38][39] and by the 1000 Genomes Project in order to characterize its genetic variation in unselected individuals without regard to lipid levels. Furthermore, most of the previous studies have only evaluated the influence of APOE coding and promoter variants on lipid traits [30][31][32][33][34][35][36][37][38][39][40][41][42].
By sequencing ~5.5 kb of the APOE gene region, including all four exons, three introns, and ~1 kb in each flanking region in selected individuals with extreme HDL-C levels in both population groups, we identified a total of 40 variants, including 10 novel variants not previously reported. As expected, African Blacks tend to have more population-specific variants (21/31 = 68%) as compared to NHWs (9/19 = 47%). In NHWs, the proportion of common and uncommon variants was similar (56% vs. 44%), while in African Blacks more uncommon variants were observed than common ones (70% vs. 30%) (Table S2 in S1 File). We observed more subjects carrying group-specific uncommon variants in the high HDL-C group than in the low HDL-C group in NHWs (21% vs. 2%; p = 0.0037) and in African Blacks (15% vs. 6%; p = 0.316), although the difference in Blacks was not statistically significant. Likewise, the cumulative uncommon/rare variant analysis using SKAT-O also showed significant association with HDL-C in NHWs (p = 0.0061; Table 5).
Of the above-mentioned 8 significant variants independent of the E*2 and E*4 SNPs, only 3 (APOE832/rs405509, APOE1163/rs440446, and APOE4036/rs769455 (Arg163Cys)) have been examined previously in relation to lipid traits. APOE832/rs405509, located in the putative promoter region, has previously been shown to be associated with LDL-related traits (LDL-C, TC, and ApoB) [16][17][45], APOE gene expression [46], myocardial infarction risk [47], and premature CHD [48]. Our observation of significant associations with LDL-C confirms the potentially important role of this variant in LDL metabolism. APOE1163/rs440446 was earlier reported to be associated with CHD risk [49], and our current finding of its association with LDL-C supports this link, given the relation between LDL-C levels and CHD risk. The non-synonymous variant APOE4036/rs769455 (Arg163Cys) has previously been reported to be associated with type III hyperlipoproteinemia [50][51] and is probably the main contributor to the significant signal of the two other closely linked variants (APOE2269/rs61357706 and APOE2544/rs115299243) with LDL-C.
In addition to the known contribution of APOE to LDL-C, we have found multiple associations of common and uncommon variants with TG and HDL-C. One NHW-specific uncommon variant (APOE1575/rs769448) was associated with higher HDL-C (p = 0.0223) and one rare Black-specific variant (APOE618) was associated with extremely low HDL-C (p = 0.001), implying a significant contribution of APOE uncommon/rare variants to plasma HDL-C variation. To our knowledge, these are novel associations and need to be confirmed in independent studies. Based on their locations (intron 1 and 5′ flanking region, respectively) and RegulomeDB scores [4], they may be moderately involved in gene expression regulation. Nine variants showed significant association with TG, including five in NHWs: APOE832/rs405509 (p = 0.002), APOE1163/rs440446 (p = 0.0012), APOE2440/rs769450 (p = 0.0022), APOE4310/rs199768005 (p = 0.028), and APOE4528/rs374329439 (p = 0.0218), and four in Blacks: APOE73/rs1081101 (p = 0.0115), APOE1279/rs877973 (p = 0.014), APOE2544/rs115299243 (p = 0.038), and APOE4036/rs769455 (p = 0.0343). Four of these variants are uncommon, including two present only in NHWs (APOE4310/rs199768005/Val254Glu and APOE4528/rs374329439) and two present only in African Blacks (APOE2544/rs115299243 and APOE4036/rs769455/Arg163Cys). Two of these population-specific variants involving non-synonymous changes (Arg163Cys and Val254Glu) have previously been reported to be associated with type III hyperlipoproteinemia in either an E*2-independent (rs769455/Arg163Cys) [50][51] or an E*2-dependent (rs199768005/Val254Glu) [41] fashion. In our population-based samples, Arg163Cys was associated with higher TG levels, whereas Val254Glu was associated with lower TG levels. The latter observation may not be surprising given that this variant was associated with hypertriglyceridemia only among E*2 carriers [41] and all 5 subjects carrying this mutation in our study were non-E*2 carriers.
This also implies that the Val254Glu variant may be protective in the absence of E*2. In accordance with our observations, APOE832/rs405509 [17] has previously been found to be associated with VLDL as an indicator of TG variation, and APOE1163/rs440446 [49] has previously been found to be associated with TG variation. To our knowledge, the remaining five TG associations observed in this study have not been reported previously and await confirmation in future studies.
In summary, this is the first comprehensive study that has evaluated the association of APOE common and rare variation with plasma lipid traits in two ethnic groups. In addition to the known association of common APOE variation with LDL-C, we have found that uncommon APOE variants also affect LDL-C levels. Our data also indicate that APOE genetic variation contributes to HDL-C and TG levels in the general population. Strengths of our study include the use of two extreme lipid groups from two ethnic groups for resequencing, followed by genotyping of the entire sample sets for genotype-phenotype association analyses. Limitations of our study include the relatively small sample sizes used for resequencing. Many of our significant findings with uncommon/rare variants should be considered provisional until replicated in large, independent data sets.