Apolipoprotein E-C1-C4-C2 gene cluster region and inter-individual variation in plasma lipoprotein levels: a comprehensive genetic association study in two ethnic groups

The apolipoprotein E-C1-C4-C2 gene cluster at 19q13.32 encodes four amphipathic apolipoproteins. The influence of APOE common polymorphisms on plasma lipid/lipoprotein profile, especially on LDL-related traits, is well recognized; however, little is known about the role of other genes/variants in this gene cluster. In this study, we evaluated the role of common and uncommon/rare genetic variation in this gene region on inter-individual variation in plasma lipoprotein levels in non-Hispanic Whites (NHWs) and African blacks (ABs). In the variant discovery step, the APOE, APOC1, APOC4, APOC2 genes were sequenced along with their flanking and hepatic control regions (HCR1 and HCR2) in 190 subjects with extreme HDL-C/TG levels. The next step involved the genotyping of 623 NHWs and 788 ABs for the identified uncommon/rare variants and common tagSNPs along with additional relevant SNPs selected from public resources, followed by association analyses with lipid traits. A total of 230 sequence variants, including 15 indels were identified, of which 65 were novel. A total of 70 QC-passed variants in NHWs and 108 QC-passed variants in ABs were included in the final association analyses. Single-site association analysis of SNPs with MAF>1% revealed 20 variants in NHWs and 24 variants in ABs showing evidence of association with at least one lipid trait, including several variants exhibiting independent associations from the established APOE polymorphism even after multiple-testing correction. Overall, our study has confirmed known associations and also identified novel associations in this genomic region with various lipid traits. Our data also support the contribution of both common and uncommon/rare variation in this gene region in affecting plasma lipid profile in the general population.

Introduction
Dyslipidemia with elevated low-density lipoprotein cholesterol (LDL-C) and reduced high-density lipoprotein cholesterol (HDL-C) is a major risk factor for cardiovascular disease, the leading cause of death worldwide [1]. Plasma lipoprotein-lipid variation is under genetic control, and its estimated heritability ranges between 40% and 80% [2,3]. Although more than 100 lipid-associated loci have been identified, common variants in these loci explain only a small proportion of the estimated heritability for lipid traits [4][5][6][7][8][9]. This suggests that additional uncommon/rare variants might contribute to the remaining unexplained heritability. Therefore, sequencing of candidate genes in subjects with extreme lipid traits is an appropriate approach to identify potentially causal common and uncommon/rare variants affecting plasma lipid trait variation.
The APOE-C1-C4-C2 gene cluster, encoding four amphipathic apolipoproteins and encompassing ~45 kb, is located on chromosome 19q13.32. This cluster also includes two hepatic control regions (HCR1 and HCR2) that regulate the hepatic expression of these genes [10][11][12][13][14][15][16][17]. ApoE participates in the reverse cholesterol transport mechanism and mediates hepatic uptake of triglyceride-rich lipoproteins [18]. ApoC1 is involved in lecithin cholesterol acyl transferase activation [19] and cholesterol ester transfer protein inhibition [20,21]. ApoC4 is involved in triglyceride (TG) metabolism [22], and apoC2 is a cofactor for the lipoprotein lipase enzyme [23]. Although the major contribution of the APOE-C1-C4-C2 gene cluster is in the regulation of LDL-related traits, recent genome-wide association studies (GWAS) have also reported significant associations of this cluster with TG and HDL-C levels [4,5,8].
Our group has previously reported the sequencing-based analysis of APOE genetic variation and its association with plasma lipoprotein traits in non-Hispanic Whites (NHWs) and African blacks (ABs) [24]. The objective of this study was to extend our work to include the entire APOE-C1-C4-C2 gene region in order to comprehensively evaluate the association of common and uncommon/rare variants in this region with plasma lipid traits, by sequencing subjects with extreme HDL-C/TG levels [upper 10th percentile (47 NHWs, 48 Blacks) and lower 10th percentile (48 NHWs and 47 Blacks) of HDL-C/TG distribution], followed by genotyping of identified relevant single-nucleotide polymorphisms (SNPs) in NHWs (n = 623) and ABs (n = 788) for genotype-phenotype association analyses with plasma lipid traits.

Study samples
The study was conducted on two well-characterized epidemiological samples, including 623 NHWs and 788 ABs (S1 Table). NHW samples were collected as part of the San Luis Valley Diabetes Study (SLVDS) and AB samples as part of a previous study on coronary heart disease risk factors in Benin City, Nigeria. The study details, including methods and sample characteristics, can be found elsewhere [25,26,27,28]. All NHW subjects included in this study were non-diabetic. While LDL-C, HDL-C and TG levels were measured in all subjects, apoB and apoA1 measurements were available only in a subset of individuals [29,30]. This study was approved by the University of Pittsburgh Institutional Review Board, and all study participants provided written informed consent. DNAs obtained from the study subjects (extracted from blood clots in ABs and buffy coats in NHWs following standard procedures) were used for the sequencing and genotyping experiments (described below) following whole genome amplification.

DNA sequencing
Subjects with extreme HDL-C/TG levels falling in the upper 10th percentile (47 NHWs and 48 ABs) and lower 10th percentile (48 NHWs and 47 ABs) of HDL-C/TG distribution were selected for the initial sequencing-based variant discovery (S2 Table).
All four genes (APOE, APOC1, APOC4, APOC2) along with their 5' & 3' flanking regions and their two hepatic control regions (HCR1 and HCR2), which represent more than 50% (excluding intergenic regions and the APOC1P1 pseudogene) of the entire APOE/C1/C1P1/C4/C2 gene cluster region (~45 kb), were targeted for sequencing in 190 individuals with extreme HDL-C/TG levels from two ethnic groups (NHWs and Blacks) using the Sanger sequencing method. The targeted region sizes were 5,491 bp for APOE, 6,687 bp for APOC1, 5,086 bp for APOC4, 6,438 bp for APOC2, 820 bp for HCR1, and 849 bp for HCR2. We used SeattleSNPs reference sequences for APOE, APOC1, APOC4, and APOC2 and the NCBI database (build 137) to locate HCR1 and HCR2 reference sequences according to Allan et al. (1995) [12] and Dang et al. (1995) [17]. For the genes with insufficient 5' and/or 3' flanking region coverage in the SeattleSNPs database, additional sequences were adopted from the Chip Bioinformatics database. PCR amplification of targeted genomic regions was performed using either M13-tagged primers listed in the SeattleSNPs database or primers that we designed using the Primer3 software (see S3 Table for primer sequences), in order to generate overlapping PCR amplicons covering each targeted region (PCR conditions are available upon request). Automated Sanger sequencing of the generated PCR amplicons was performed (in both directions) in a commercial lab (Beckman Coulter Genomics, Danvers, MA). We used Variant Reporter (Applied Biosystems, Foster City, CA) and Sequencher (Gene Codes Corporation, Ann Arbor, MI) software to analyze the resulting sequencing data for variant detection.
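As a quick arithmetic check of the coverage statement above, the targeted region sizes can be summed and compared with the approximate cluster size. This is a minimal sketch: it assumes the six targeted regions do not overlap and treats the cluster as exactly 45,000 bp, whereas the text gives only an approximate size (~45 kb). The names `TARGET_BP`, `CLUSTER_BP`, and `sequenced_fraction` are illustrative.

```python
# Targeted region sizes in base pairs, as stated in the text.
TARGET_BP = {
    "APOE": 5491, "APOC1": 6687, "APOC4": 5086, "APOC2": 6438,
    "HCR1": 820, "HCR2": 849,
}
# Assumption: the cluster is treated as exactly 45 kb (the text says ~45 kb).
CLUSTER_BP = 45_000


def sequenced_fraction(targets=TARGET_BP, cluster_bp=CLUSTER_BP):
    """Return (total targeted bp, fraction of the cluster covered)."""
    total = sum(targets.values())
    return total, total / cluster_bp


total_bp, fraction = sequenced_fraction()
print(f"{total_bp} bp targeted, {fraction:.1%} of the cluster")
```

Under these assumptions the targeted regions total 25,371 bp, roughly 56% of the cluster, which is consistent with the stated "more than 50%".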

Variant selection for follow-up genotyping
We analyzed the sequencing data separately in each ethnic group to identify common tagSNPs (MAF≥0.05, r² = 0.9) and uncommon/rare variants (MAF<0.05) to be included in follow-up genotyping of NHW and AB samples. Suspicious sequence variants with borderline quality (that warrant validation) and additional common variants reported in SeattleSNPs and/or Chip Bioinformatics databases within the sequenced regions (but not successfully captured by our sequencing) were also targeted for follow-up genotyping. Moreover, additional common tagSNPs were selected from the HapMap database for genotyping in order to achieve full coverage of the entire region of interest at 19q13.32 (including intergenic regions) for common variation in each ethnic group.
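The tagSNP idea can be illustrated with a simplified greedy procedure: repeatedly pick the SNP that captures the most still-untagged SNPs at r² ≥ 0.9. This sketch is not the Tagger implementation used in the study; the function name, the SNP names, and the r² values are hypothetical and serve only to show the principle.

```python
def greedy_tag_snps(snps, r2, threshold=0.9):
    """Pick tag SNPs so every SNP is a tag or has r2 >= threshold with a tag.

    snps: list of SNP names.
    r2: dict mapping frozenset({a, b}) -> pairwise r-squared (missing = 0).
    """
    def captures(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged = list(snps)
    tags = []
    while untagged:
        # Choose the SNP capturing the most still-untagged SNPs
        # (ties go to the earliest SNP in the input order).
        best = max(snps, key=lambda s: sum(captures(s, u) for u in untagged))
        tags.append(best)
        untagged = [u for u in untagged if not captures(best, u)]
    return tags


# Hypothetical example: snpB is in strong LD with snpA and snpC.
example_r2 = {
    frozenset(("snpA", "snpB")): 0.95,
    frozenset(("snpB", "snpC")): 0.92,
    frozenset(("snpA", "snpC")): 0.60,
    frozenset(("snpC", "snpD")): 0.30,
}
example_tags = greedy_tag_snps(["snpA", "snpB", "snpC", "snpD"], example_r2)
print(example_tags)
```

In this toy example snpB tags snpA and snpC, while snpD has to be genotyped directly because no other SNP captures it at the chosen threshold.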

Genotyping
The iPLEX Gold (Sequenom, San Diego, CA, USA) or TaqMan (Applied Biosystems, Foster City, CA, USA) methods, following the manufacturers' protocols, were employed to genotype selected variants in the entire sample sets of 623 NHWs and 788 ABs. Whole genome amplified DNAs in 384-well plates were used for both genotyping methods. The ABI Prism 7900HT Sequence Detection System was used for end-point fluorescence reading for TaqMan genotyping, and genotype calls were analyzed by using the SDSv2.4.1 and TaqMan Genotyper software. The MassARRAY iPLEX Gold (Sequenom, San Diego, CA) genotyping technique was applied in the Genomics and Proteomics Core laboratories of the University of Pittsburgh. In addition to random replicates included in the genotyping process for quality control (QC) assessment, the subsets of samples used in both sequencing and genotyping steps allowed us to also evaluate the concordance between the sequencing and genotyping results. Genotyped variants were filtered out during QC if they had extensive missing data (>15%) and/or deviated from Hardy-Weinberg Equilibrium (HWE) (P<0.01).
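The two QC filters described above (missing data >15% and HWE P<0.01) can be sketched as follows. This is an illustration only: it uses a 1-degree-of-freedom chi-square test for HWE, which is an assumption, since the text does not state which HWE test was applied; the function names and genotype counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, n_samples, max_missing=0.15, min_hwe_p=0.01):
    """Apply the missing-data and HWE filters to one variant."""
    called = n_aa + n_ab + n_bb
    if 1 - called / n_samples > max_missing:
        return False  # too many missing genotype calls
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p


print(passes_qc(250, 500, 250, 1000))  # genotype counts in exact HWE proportions
print(passes_qc(400, 200, 400, 1000))  # strong heterozygote deficit
```

A variant with counts in exact Hardy-Weinberg proportions passes, whereas one with a marked heterozygote deficit, or with more than 15% of genotype calls missing, is filtered out.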

Statistical analyses
Analyses were performed separately in NHWs and ABs. The Haploview software (www.broadinstitute.org/haploview) was used to analyze the sequencing data to determine SNP allele/genotype frequencies, SNP concordance with HWE, and linkage disequilibrium (LD) patterns.
The Box-Cox transformation was used to normalize the distribution of apoB, HDL-C and TG levels in NHWs and that of all lipid traits in ABs. Significant covariates for each trait were identified using stepwise regression to select the most parsimonious set of covariates for each trait in each ethnic group [gender, age, BMI, and smoking in NHWs; gender, age, BMI, waist measurement, smoking, exercise (minutes walking or bicycling to work each day), and staff level (junior or senior, an indicator of lower or higher socio-economic status) in ABs]. A total of 70 QC-passed variants in NHWs and 108 QC-passed variants in Blacks were included in final association analyses with lipid traits. In addition to single-site analysis, haplotype-based and uncommon/rare variant association analyses were also performed (all conducted using the R program).
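For readers unfamiliar with the Box-Cox transformation, the following minimal sketch shows the transform itself and a grid search for the parameter λ that maximizes the standard profile log-likelihood. It is an illustration, not the procedure used in the study (which was run in R); the function names, the λ grid, and the example values are assumptions.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def box_cox_loglik(values, lam):
    """Profile log-likelihood of lam under a normal model for the transformed data."""
    y = box_cox(values, lam)
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)


def best_lambda(values, grid=None):
    """Pick the lambda on a grid (default -2.0 to 2.0 in steps of 0.1)."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))


# Hypothetical right-skewed values that are symmetric on the log scale.
skewed = [math.exp(z) for z in (-2, -1, 0, 1, 2)]
print(best_lambda(skewed))
```

For data that are symmetric on the log scale, as in this toy example, the selected λ is 0, which corresponds to a plain log transformation.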
In single-site association analysis, an additive linear regression model was used to test the associations between SNPs and plasma lipid levels (HDL-C, LDL-C, TC, and TG) and apoB and apoA1 levels. A P-value of less than 0.05 was considered suggestive evidence of association for initial observations. P-values were also adjusted for the APOE 2/3/4 polymorphism given its established effect on cholesterol levels. Applying the Meff (effective number of independent tests) method for multiple-testing correction [31], 8 and 14 independent tests were identified for the 70 QC-passed variants in NHWs and the 108 QC-passed variants in ABs, respectively. Thus, after correcting for the number of independent tests performed, we considered P<6.25E-03 (0.05/8) and P<3.57E-03 (0.05/14) as statistically significant in NHWs and Blacks, respectively.
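The multiple-testing thresholds above follow directly from dividing 0.05 by the effective number of independent tests. A minimal sketch, with illustrative names (`bonferroni_threshold`, `THRESHOLDS`, `is_significant`); the Meff values 8 and 14 are taken from the text, not recomputed here:

```python
def bonferroni_threshold(alpha, n_independent_tests):
    """Significance threshold after correcting for n independent tests."""
    return alpha / n_independent_tests


# Meff values reported in the text: 8 for NHWs, 14 for ABs.
THRESHOLDS = {
    "NHW": bonferroni_threshold(0.05, 8),
    "AB": bonferroni_threshold(0.05, 14),
}


def is_significant(p_value, group):
    """True if the P-value passes the group-specific corrected threshold."""
    return p_value < THRESHOLDS[group]


for group, threshold in THRESHOLDS.items():
    print(f"{group}: P < {threshold:.2e}")
```

For example, a P-value of 0.01 in NHWs meets the suggestive threshold (P<0.05) but not the corrected one (P<6.25E-03).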
For haplotype association analysis, the generalized linear model (GLM) [32] was used. Because including too many haplotypes can make this analysis inefficient and impractical, we used a sliding window approach (4-SNP per window, sliding one SNP at a time) and assessed evidence for association within each window. A global P-value for overall effect of all haplotypes (with frequency greater than 0.01) in each window was used to assess their association with lipid traits. The sliding-window haplotype analysis was performed using the haplo.glm function in the Haplo.Stats R package.
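The sliding-window scheme described above (4 SNPs per window, advancing one SNP at a time) can be sketched as follows. The function name and the placeholder SNP labels are illustrative; the actual analysis was run with haplo.glm in R.

```python
def sliding_windows(snps, size=4, step=1):
    """Return consecutive windows of `size` SNPs, advancing `step` SNPs each time."""
    return [snps[i:i + size] for i in range(0, len(snps) - size + 1, step)]


# Placeholder labels standing in for the ordered QC-passed variants.
nhw_snps = [f"snp{i}" for i in range(1, 71)]   # 70 variants in NHWs
ab_snps = [f"snp{i}" for i in range(1, 109)]   # 108 variants in ABs

nhw_windows = sliding_windows(nhw_snps)
ab_windows = sliding_windows(ab_snps)
print(len(nhw_windows), len(ab_windows))
print(nhw_windows[0], nhw_windows[1])
```

With n ordered variants this scheme yields n - 3 windows, so 70 variants give 67 windows and 108 variants give 105 windows, and adjacent windows share three SNPs.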
The cumulative effects of uncommon/rare variants (MAF<0.05) were analyzed using the SKAT-O method [33], which has been proposed as an optimal test for rare-variant analysis that outperforms the SKAT and burden tests. The analyses were performed using three different MAF bin thresholds (≤1%, ≤2% and <5%) by employing the SKAT R package.
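The three MAF bins are nested: every variant in the ≤1% bin is also in the ≤2% and <5% bins. The sketch below only shows how variants would be assigned to these bins before a set-based test; it does not implement SKAT-O, and the variant names and MAF values are hypothetical.

```python
def maf_bins(mafs):
    """Group variants into the nested MAF bins used for rare-variant testing.

    mafs: dict mapping variant name -> minor allele frequency.
    """
    return {
        "MAF<=1%": [v for v, m in mafs.items() if m <= 0.01],
        "MAF<=2%": [v for v, m in mafs.items() if m <= 0.02],
        "MAF<5%": [v for v, m in mafs.items() if m < 0.05],
    }


# Hypothetical variants and frequencies, for illustration only.
example_mafs = {"var1": 0.005, "var2": 0.01, "var3": 0.015, "var4": 0.03, "var5": 0.05}
bins = maf_bins(example_mafs)
for name, variants in bins.items():
    print(name, variants)
```

Note that a variant with MAF of exactly 5% falls outside all three bins, because the widest bin uses a strict inequality.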

DNA sequencing
Sequencing of four genes (APOE, APOC1, APOC4, APOC2) along with their 5' and 3' flanking regions and their two hepatic control regions (HCR1 and HCR2) in selected 190 subjects with extreme HDL-C/TG levels revealed a total of 230 variants (215 substitutions and 15 indels), of which 160 were previously reported and 65 were novel (not reported in public databases). While 63 of 230 variants were present in both ethnic groups, 52 were specific to NHWs and 115 were specific to ABs.
All novel SNPs and short indels identified in this study [excluding a large indel (114 bp) in the APOC4 5' flanking region] were submitted to the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/snp_viewTable.cgi?handle=KAMBOH); the assigned refSNP IDs can be found in S4 and S5 Tables.

Genotyping
Tagger analysis results for the identified common sequence variants (MAF≥0.05, r² = 0.9) are presented in S6 Initially, a total of 103 variants were selected for follow-up genotyping in NHWs, consisting of 90 variants identified by sequencing (33 common SNPs based on Tagger results, 53 uncommon/rare variants, and 4 suspicious variants), 6 additional common SNPs reported in SeattleSNPs and/or Chip Bioinformatics databases within the sequenced regions, and 7 additional tagSNPs from the HapMap database. Probably because of the high degree of sequence homology among the members of this gene cluster, we observed a relatively high failure rate; a total of 22 variants (6 common SNPs, 15 uncommon/rare variants and one HapMap SNP) failed genotyping. Eleven out of 81 genotyped variants were excluded from final statistical analyses, including 4 suspicious variants that turned out to be sequencing artifacts, 4 database SNPs (according to Chip Bioinformatics and SeattleSNPs; rs12721047, rs12709888, rs76186107, rs5164) and one HapMap SNP (rs5127) that turned out to be monomorphic in our NHW sample, and 2 variants that failed post-genotyping QC (APOE/rs769446 had a low call rate and APOC1/rs12721052 was out of HWE). Therefore, a total of 70 QC-passed variants (65 variants identified by sequencing and 5 additional tagSNPs selected from HapMap) were included in final association analyses in NHWs, comprising 29 common SNPs (MAF≥0.05) and 41 uncommon/rare variants (MAF<0.05) (see S10 Table).
Initially, a total of 160 variants were selected for follow-up genotyping in ABs, consisting of 152 variants identified by sequencing (58 common SNPs based on Tagger results, 90 uncommon/rare variants, and 4 suspicious variants), one additional common SNP reported in SeattleSNPs and/or Chip Bioinformatics databases within the sequenced regions, and 7 additional tagSNPs from the HapMap database. Since genotyping of variants in this gene cluster region is challenging, we ended up with a total of 42 failures: 12 common SNPs, 27 uncommon/rare variants, and 3 suspicious variants. Ten out of 118 genotyped variants were excluded from final statistical analyses because they either turned out to be monomorphic, were out of HWE, or had a low call rate. Thus, a total of 108 QC-passed variants (103 variants identified by sequencing and 5 additional tagSNPs selected from HapMap) were advanced into final association analyses in ABs, comprising 48 common SNPs (MAF≥0.05) and 60 uncommon/rare variants (MAF<0.05) (see S11 Table).
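The variant accounting in the two paragraphs above can be verified with simple arithmetic. The sketch below uses only counts stated in the text; the function and variable names are illustrative.

```python
def variant_flow(selected, failed, excluded):
    """Return (genotyped, QC-passed) counts from the selection funnel."""
    genotyped = selected - failed
    return genotyped, genotyped - excluded


# NHWs: 90 sequencing-derived + 6 database SNPs + 7 HapMap tagSNPs selected;
# 6 + 15 + 1 failed genotyping; 4 + 4 + 1 + 2 excluded afterwards.
nhw = variant_flow(selected=90 + 6 + 7, failed=6 + 15 + 1, excluded=4 + 4 + 1 + 2)

# ABs: 152 sequencing-derived + 1 database SNP + 7 HapMap tagSNPs selected;
# 12 + 27 + 3 failed genotyping; 10 excluded afterwards.
ab = variant_flow(selected=152 + 1 + 7, failed=12 + 27 + 3, excluded=10)

print("NHWs (genotyped, QC-passed):", nhw)
print("ABs (genotyped, QC-passed):", ab)
```

The totals reproduce the 81 genotyped and 70 QC-passed variants in NHWs and the 118 genotyped and 108 QC-passed variants in ABs; the common/rare splits (29 + 41 and 48 + 60) also sum to 70 and 108.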

Association analyses
Single-site association analysis. Single-site association analysis revealed 20 variants (MAF>1%) in NHWs and 24 (MAF>1%) in ABs with suggestive evidence of association (P<0.05) with at least one lipid trait, including the two SNPs (rs7412 and rs429358) that define the APOE 2/3/4 polymorphism (Tables 1 & 2). After adjusting the observed associations for the effects of APOE*2/rs7412 and APOE*4/rs429358, 11 of the remaining 18 variants in NHWs and 15 of the remaining 22 variants in ABs exhibited independent associations, including one variant (APOE/rs440446) showing association with LDL-related traits in both populations (LDL-C, TC in NHWs; apoB in ABs).
The established associations of APOE*2/E*4 alleles with LDL-C were replicated in both ethnic groups included in this study (Tables 1 & 2); APOE*2/rs7412 was associated with lower LDL-C levels (P = 1.84E-07 in NHWs and P = 5.35E-07 in ABs) and APOE*4/rs429358 was associated with higher LDL-C levels (P = 0.01 in NHWs and P = 0.032 in ABs). In addition to their established association with LDL-C, we observed APOE*2/rs7412 to be associated with lower TC levels in both ethnic groups (P = 9.51E-06 in NHWs and P = 1.0E-04 in ABs), with lower apoB levels in both ethnic groups (P = 9.65E-13 in NHWs and P = 0.0356 in ABs) and with higher apoA1 levels in ABs (P = 8.0E-05), while APOE*4/rs429358 showed association with higher TC levels (P = 0.0383) and higher apoB levels (P = 5.0E-04) in NHWs and a nonsignificant but similar trend of association with TC levels in ABs.
Rare/Uncommon variants association analysis. In NHWs, while significant associations were observed with TC for all tested MAF thresholds (≤1%, ≤2% and <5%), the most significant result was detected for variants with MAF≤0.01 (P = 0.0088), indicating the major impact of rare variants on TC. In ABs, significant association was detected between variants with MAF≤0.01 and TG (P = 0.0302). On the other hand, variants with MAF≤2% and MAF<5% showed association with apoA1 (P = 0.025 to 0.021), indicating a modest effect of variants with MAF<5% on TG (see Table 3).
Haplotype-based association analysis. S24-S35 Tables and Figs 3-14 show the results for haplotype-based association analysis with lipid traits using the sliding window approach in NHWs and ABs, respectively. For each window, the most common haplotype was used as the reference haplotype to compare with other haplotypes to calculate the p-values.
In NHWs, multiple haplotype windows showed significant global P-values for association with LDL-C, TC, and/or apoB levels, confirming the single-site association results. The top significant global P-values (1.12E-07≤P≤1.02E-06) for LDL-C included windows 8 through 11 that contained the APOE*2/E*4 alleles. Whereas the significance of a number of relevant windows appeared to be driven by significant variants with MAF>0.01 (see above for single-site analysis results), we also found six significant windows (windows 46-47 and 60-63) that did not include any significant variants with MAF>0.01, thus suggesting the cumulative effects of other variants. Similarly, multiple haplotype windows showed significant global P-values for association with TC and apoB levels, of which the top ones included windows 8 through 10 (2E-06≤P≤8E-06 and 4.37E-14≤P≤9.72E-13 for TC and apoB, respectively). Six windows (windows 46-47, 60-63) associated with TC and two windows (windows 19-20) associated with apoB were significant without containing any significant variants with MAF>0.01. Moreover, a total of thirteen haplotype windows showed significant global P-values for association with TG levels, of which the top ones included windows 1 and 23-24 (0.00374≤P≤0.00429) that harbored significant variants with MAF>0.01. However, there were five significant TG-associated windows (windows 33, 47-49 and 58) that did not contain any significant variants with MAF>0.01.
In ABs, multiple haplotype windows showed significant global P-values for association with LDL-C and TC levels (8.86E-09≤P≤0.049 and 4E-05≤P≤0.048, respectively) as seen in NHWs, but the effect on apoB levels (0.035≤P≤0.037) was smaller than that observed in NHWs. The top significant LDL-C and TC-associated windows (window 17 for LDL-C and window 18 for TC) harbored the APOE*2/E*4 and APOE*4/rs429358, respectively.
Whereas the significance of a number of LDL-C and TC-associated windows appeared to be driven by significant variants with MAF>0.01, we also observed one significant region for LDL-C (windows 95-97) and one significant window for TC (window 5) that did not include any significant variants identified in single-site association analysis (see above). Moreover, we observed three HDL-C-associated (windows 41-43), seven TG-associated (windows 1, 46-49, 79-80) and twelve apoA1-associated (windows 16-19, 41, 43, 54, 63, and 71-74) significant haplotype windows (3.7E-04≤P≤1.2E-03, 8.6E-04≤P≤0.01 and 5E-04≤P≤0.049, respectively). Although all of the TG-associated significant windows contained variants that yielded significant associations in single-site analysis, all three HDL-C-associated windows (windows 41-43) and four of the apoA1-associated windows (windows 41, 43, 54 and 63) were significant without containing any significant variants identified in single-site association analysis.

Discussion
In this study we resequenced four genes in the APOE-C1-C4-C2 cluster at 19q13.32 along with their 5' & 3' flanking regions and hepatic control regions (HCR1 and HCR2) in selected NHW and AB subjects with extreme HDL-C/TG distribution in order to examine the association of identified common tagSNPs and uncommon/rare variants with plasma lipid levels in two epidemiologically well-characterized samples. Additional common tagSNPs selected from the HapMap database were also included in order to achieve full coverage of the APOE-C1-C4-C2 gene region (including intergenic segments) for common variation. Although the established contribution of the APOE region is on LDL-related traits, recent GWAS meta-analyses reported multiple SNPs in this gene region to be associated with TG and HDL-C [4,5,8,34,35,36]. Therefore, we considered four major lipid traits (plasma LDL-C, TC, HDL-C, and TG levels) and two correlated apolipoproteins (apoB and apoA1 levels) for our genotype-phenotype association analyses.
To our knowledge, this is the first study that has considered both common and uncommon/rare variants for genotype-phenotype association analysis of this gene cluster. We compared our sequencing data with previously generated sequencing data for the four genes in this cluster in 48 African Americans and 48 white Americans (SeattleSNPs database). We detected all previously reported common variants (MAF≥5%) in SeattleSNPs and NCBI's dbSNP build 138 for these two ethnic groups (n = 160) and identified 65 new variants in our NHW and AB subjects.
Single-site association analysis of common/uncommon variants (MAF>1%) revealed evidence of association (P<0.05) with at least one lipid trait, including the well-known APOE*2/rs7412 and APOE*4/rs429358 polymorphisms. The established associations of the APOE*2/E*4 alleles with LDL-C and related traits were replicated in this study such that APOE*2/rs7412 (T) was associated with lower LDL-C, TC, and apoB levels in both ethnic groups, and APOE*4/rs429358 (C) was associated with higher LDL-C, TC, and apoB levels in NHWs and with higher LDL-C levels in ABs.
After adjusting for the effects of APOE*2/E*4 alleles, we observed 11 variants in NHWs and 15 variants in ABs that showed independent associations with at least one lipid trait. Eight variants in NHWs and 10 variants in ABs exhibited independent associations with LDL-related traits, including APOE/rs440446 that remained significantly associated with LDL-related traits in both populations. Four of the variants that showed independent associations with LDL-related traits in our study (APOE/rs440446 and APOC1P1/rs5112 in NHWs and APOE/rs769455 and APOE/rs61357706 in ABs) maintained their significance even after multiple-testing correction. The novel associations of the other 8 non-APOE variants associated with LDL-related traits (APOC1/rs3826688, APOC1/rs1064725, APOC2-C4/rs2288912, APOC2/rs5120, APOC2/rs10422888, APOC2-C4/rs12709885, APOC4/rs5157, APOC2-C4/rs75463753) should be considered provisional until replicated in independent larger samples. The intronic variant, APOE/rs440446, was previously reported to be associated with TG levels and CHD risk in a large Finnish cohort [37], and our current finding of its association with TG, LDL-C and TC levels in NHWs (Table 1) and with apoB levels in Blacks (Table 2) reaffirms the importance of this SNP. In our study, the APOE/rs769455 non-synonymous variant (Arg163Cys), detected only in ABs, showed association with higher TG and lower LDL-C and TC levels. Previously the same variant was found to be associated with type III hyperlipoproteinemia in five Latin-American family members [38,39] and, in accordance with our finding, Coram et al. (2013) [40] reported its association with TG. APOE/rs769455 was in strong LD (r² = 1) with an intronic variant (APOE/rs61357706), which also showed population-specific association with lower LDL-C levels in ABs in our study. To the best of our knowledge, this SNP-trait association was not previously reported in any population.
Previously, APOE/rs405509 was found to be associated with LDL-related traits [41][42][43][44][45], and in agreement, we observed this variant to be associated with LDL-C, TC and TG levels in NHWs (Table 1). Ken-Dror et al. (2010) [45] have reported the association of rs4803770 with LDL-C and apoB, while we found this variant to be associated with LDL-C, TC and TG levels in NHWs.
One intergenic variant, rs7259004, which was previously found to be associated with LDL-C and apoB in US Whites [45], initially showed association with LDL-related traits in NHWs that disappeared after adjusting for the effects of APOE*2/E*4 in our study. Previous studies have also shown the association of APOC1/rs11568822 with elevated APOC1 expression, dysbetalipoproteinemia, and higher risk of CHD and Alzheimer's disease [46][47][48]. Moreover, this variant was found to be associated with TG, apoB and HDL-C among APOE*3 carriers [49]. In our study, APOC1/rs11568822 was associated with LDL-C and apoA1 levels in ABs independent of APOE*2/E*4. The previously reported association of APOC4/rs12721109 with LDL-C [50,51] and apoB [52] was also replicated in our NHW sample, but it disappeared following APOE*2/E*4 adjustment. On the other hand, we observed a novel and independent association of APOC4/rs12721109 with TG in NHWs. Previously, APOE/rs449647 was found to be associated with lower LDL-C in US Whites but higher LDL-C in African [53]. Although we also observed similar opposite associations of this SNP in our NHW and AB samples, they did not survive after adjusting for APOE*2/E*4.
In addition to the significant association of variants with MAF>1% with lipid traits, our rare/uncommon variants association analysis revealed significant association of variants with MAF≤1% with TC in NHWs and TG in ABs, indicating an additional contribution of rare variants to inter-individual variation in plasma lipid levels, as has previously been shown for some lipid-related genes/loci [56][57][58][59][60][61][62]. Moreover, our haplotype-based association analysis helped us to identify a number of significant haplotype windows not harboring individually significant variants (in addition to confirming our single-site analysis results), thus suggesting the cumulative effects of variants with weak effects captured by this approach.
Our study has some limitations. The sequencing sample was relatively small, and thus we may have missed some relevant/functional variants. Also, we primarily targeted the relevant genes and their flanking (or known regulatory) regions for sequencing, but not the intergenic regions; the latter were evaluated by tagSNP genotyping only. Moreover, in addition to some initially observed associations that were attenuated by APOE*2/E*4 adjustment, a number of the identified independent associations lost their significance after multiple-testing correction. Nevertheless, we were able to confirm several known associations and also identified some novel associations, awaiting replication in larger independent samples.
In summary, the association of APOE-C1-C4-C2 gene cluster variation with the evaluated lipid traits confirms the importance of this genomic region in affecting plasma lipid profile in the general population. Our study also supports the involvement of both common/uncommon and rare variants in regulating plasma lipid variation.
Supporting information
S10 Table. Features of 70 QC-passed genotyped variants in NHWs (n = 623). HWE-P: Hardy-Weinberg equilibrium P-value; MAF: minor allele frequency; Position: chromosomal position corresponding to Chip Bioinformatics database (NC_000019.9). RegulomeDB scores were generated by using http://regulome.stanford.edu/. Scores represent: 1a-eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak; 1b-eQTL + TF binding + any motif + DNase Footprint + DNase peak; 1c-eQTL + TF binding + matched TF motif + DNase peak; 1d-eQTL + TF binding + any motif + DNase peak; 1e-eQTL + TF binding + matched TF motif; 1f-eQTL + TF binding / DNase peak; 2a-TF binding + matched TF motif + matched DNase Footprint + DNase peak; 2b-TF binding + any motif + DNase Footprint + DNase peak; 2c-TF binding + matched TF motif + DNase peak; 3a-TF binding + any motif + DNase peak; 3b-TF binding + matched TF motif; 4-TF binding + DNase peak; 5-TF binding or DNase peak; 6-other. Selection criteria: 1) Common tagSNPs identified by Tagger analyses of sequencing data (MAF≥0.05, r² = 0.9); 2) Rare/uncommon variants identified by sequencing (MAF<5%); 3) Additional common SNPs selected from public resources. (DOCX)
S11 Table. Features of 108 QC-passed genotyped variants in ABs (n = 788). HWE-P: Hardy-Weinberg equilibrium P-value; MAF: minor allele frequency; Position: chromosomal position corresponding to Chip Bioinformatics database (NC_000019.9). RegulomeDB scores were generated by using http://regulome.stanford.edu/. Scores represent: 1a-eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak; 1b-eQTL + TF binding + any motif + DNase Footprint + DNase peak; 1c-eQTL + TF binding + matched TF motif + DNase peak; 1d-eQTL + TF binding + any motif + DNase peak; 1e-eQTL + TF binding + matched TF motif; 1f-eQTL + TF binding / DNase peak; 2a-TF binding + matched TF motif + matched DNase Footprint + DNase peak; 2b-TF binding + any motif + DNase Footprint + DNase peak; 2c-TF binding + matched TF motif + DNase peak; 3a-TF binding + any motif + DNase peak; 3b-TF binding + matched TF motif; 4-TF binding + DNase peak; 5-TF binding or DNase peak; 6-other. Selection criteria: 1) Common tagSNPs identified by Tagger analyses of sequencing data (MAF≥0.05, r² = 0.9); 2) Rare/uncommon variants identified by sequencing (MAF<5%); 3) Additional common SNPs selected from public resources. (DOCX)
S12 Table. Single-site association analysis results for LDL-C levels in NHWs (n = 623). MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. (DOCX)
S13 Table. Single-site association analysis results for TC in NHWs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. (DOCX)
S14 Table. Single-site association analysis results for ApoB in NHWs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. Four rare variants were excluded due to missing data. (DOCX)
S15 Table. Single-site association analysis results for HDL-C in NHWs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. APOC1p703/rs3207187 was excluded due to missing data. (DOCX)
S16 Table. Single-site association analysis results for ApoA1 in NHWs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. Four rare variants were excluded due to missing data. (DOCX)
S17 Table. Single-site association analysis results for TG in NHWs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. (DOCX)
S18 Table. Single-site association analysis results for LDL-C in ABs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. APOC2p5771 was excluded due to missing data. (DOCX)
S19 Table. Single-site association analysis results for TC in ABs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. APOC2p5771 was excluded due to missing data. (DOCX)
S20 Table. Single-site association analysis results for ApoB in ABs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates. APOC2p5771 was excluded due to missing data. (DOCX)
S21 Table. Single-site association analysis results for HDL-C in ABs. MAF is the minor allele frequency; GT is genotype; GT count is the number of individuals in each genotype group; GT_SD is the standard deviation of the lipid trait in each genotype group. *Adjusted for relevant covariates; **Adjusted for APOE*2/E*4 SNPs in addition to the covariates.