Genome-Wide Association Study Reveals Four Loci for Lipid Ratios in the Korean Population and the Constitutional Subgroup

Circulating lipid ratios are considered predictors of cardiovascular risks and metabolic syndrome, which cause coronary heart diseases. One constitutional type of Korean medicine prone to weight accumulation, the Tae-Eum type, predisposes individuals to metabolic syndrome, hypertension, diabetes mellitus, etc. Here, we aimed to identify genetic variants for lipid ratios using a genome-wide association study (GWAS) followed by replication analysis in Koreans and constitutional subgroups. GWASs in 5,292 individuals of the Korean Genome and Epidemiology Study and replication analyses in 2,567 subjects of the Korea medicine Data Center were performed to identify genetic variants associated with triglyceride (TG) to HDL cholesterol (HDLC), LDL cholesterol (LDLC) to HDLC, and non-HDLC to HDLC ratios. For subgroup analysis, a computer-based constitution analysis tool was used to categorize the constitutional types of the subjects. In the discovery stage, seven variants in four loci, three variants in three loci, and two variants in one locus were associated with the ratios of log-transformed TG:HDLC (log[TG]:HDLC), LDLC:HDLC, and non-HDLC:HDLC, respectively. The associations of the GWAS variants with lipid ratios were replicated in the validation stage: for the log[TG]:HDLC ratio, rs6589566 near APOA5 and rs4244457 and rs6586891 near LPL; for the LDLC:HDLC ratio, rs4420638 near APOC1 and rs17445774 near C2orf47; and for the non-HDLC:HDLC ratio, rs6589566 near APOA5. Five of these six variants are known to be associated with TG, LDLC, and/or HDLC, but rs17445774 was newly identified to be involved in lipid level changes in this study. Constitutional subgroup analysis revealed effects of variants associated with log[TG]:HDLC and non-HDLC:HDLC ratios in both the Tae-Eum and non-Tae-Eum types, whereas the effect of the LDLC:HDLC ratio-associated variants remained only in the Tae-Eum type.
In conclusion, we identified three log[TG]:HDLC ratio-associated variants, two LDLC:HDLC ratio-associated variants, and one non-HDLC:HDLC-associated variant in Koreans and the constitutional subgroups.

Introduction
recruited from 22 oriental medical clinics for the Korea medicine Data Center (KDC) from 2006 to 2012. None of the subjects from the KoGES or KDC populations had a history of cancer treatment, postmenopausal hormonal therapy, or professional diagnosis or medication for dyslipidemia. Additionally, the KoGES subjects did not include those with low-quality genome-wide genotype data caused by gender inconsistencies, cryptic relatedness, or problems with genotype call rate and sample contamination, as previously described [18]. All the subjects provided written informed consent to participate in the study, and the study was approved by the Institutional Review Board of the Korea Institute of Oriental Medicine.
The subjects (n = 5,229 in KoGES and n = 2,088 in KDC) were analyzed using an integrated diagnostic model consisting of face, body shape, voice, and questionnaire information, i.e., the Sasang Constitutional Analysis Tool (SCAT), in order to provide a basis for discriminating the constitutional types based on the probability values for each Sasang constitutional type [19]. A total of 63 KoGES and 479 KDC subjects were excluded after the SCAT analysis because of missing data in the four components of the SCAT or low-quality facial pictures and vocal records [19]. Based on the tertiles of the SCAT probability values for the TE type, we divided the study subjects into three subgroups. The subjects in the top tertile were designated as the TE type (TE: n = 1,743 in the KoGES; n = 696 in the KDC), and those in the bottom tertile were designated as the NTE type (NTE: n = 1,743 in the KoGES; n = 696 in the KDC). To increase the reliability of the SCAT-determined constitutional type, the subjects in the middle tertile were not used in the subgroup analysis based on TE type.
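The tertile-based subgrouping can be illustrated with a minimal Python sketch. The function name, the subject identifiers, and the probability values below are hypothetical and serve only to show the logic; the actual SCAT output format is not described here.

```python
def split_by_tertiles(te_probability):
    """Return (te_ids, nte_ids) from a dict of subject id -> SCAT TE probability."""
    ranked = sorted(te_probability, key=te_probability.get)  # ascending probability
    size = len(ranked) // 3                   # subjects per outer tertile
    nte_ids = ranked[:size]                   # bottom tertile: NTE type
    te_ids = ranked[len(ranked) - size:]      # top tertile: TE type
    return te_ids, nte_ids                    # middle tertile is left out


# Hypothetical TE probabilities for nine subjects.
values = [0.1, 0.9, 0.5, 0.3, 0.8, 0.2, 0.6, 0.7, 0.4]
probs = {f"s{i}": p for i, p in enumerate(values)}
te, nte = split_by_tertiles(probs)
print(sorted(te), sorted(nte))

# Subgroup sizes implied by the subject counts reported in the text.
print((5292 - 63) // 3, (2567 - 479) // 3)
```

With the reported counts, 5,229 and 2,088 subjects remain after exclusion, which gives 1,743 and 696 subjects per tertile, in agreement with the TE and NTE subgroup sizes stated above.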

Genotyping
Genome-wide single nucleotide polymorphism (SNP) genotyping of the 5,292 KoGES subjects was performed using the Affymetrix Human SNP array 5.0 (Affymetrix, Santa Clara, CA) as previously described [18]. Of the 500,568 SNPs examined, those exhibiting high missing call rates (>5%), low minor allele frequencies (<0.05), or significant deviations from the Hardy-Weinberg equilibrium (HWE; p < 0.0001) were excluded for quality control, and the remaining 310,746 SNPs were subjected to further analyses.
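The three quality-control filters can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the function names are hypothetical, genotype counts per SNP are assumed to be available, and a 1-df chi-squared goodness-of-fit test stands in for whichever HWE test was applied in the discovery stage.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-squared goodness-of-fit test for HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))    # chi-squared survival function, 1 df


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  max_missing=0.05, min_maf=0.05, hwe_p_cutoff=1e-4):
    """Apply the missing-rate, MAF, and HWE filters to one SNP."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    missing_rate = n_missing / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1 - freq_a)            # minor allele frequency
    if missing_rate > max_missing or maf < min_maf:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_p_cutoff


# Hypothetical genotype counts for one SNP in perfect HWE proportions.
print(snp_passes_qc(250, 500, 250, n_missing=0))
```

The default thresholds mirror the cut-offs given in the text (missing call rate > 5%, minor allele frequency < 0.05, HWE p < 0.0001).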
The genotypes of ten variants that passed a statistical cut-off p-value for association with lipid ratios (rs180349, rs6589566, rs4244457, rs6586891, rs8067076, rs6501843, and rs2885819 for log[TG]:HDLC ratio, rs4420638, rs17445774, and rs2304072 for LDLC:HDLC ratio, and rs180349 and rs6589566 for non-HDLC:HDLC ratio) in the initial GWAS were determined in the 2,567 KDC subjects. For 805 subjects, the genotypes of the 10 SNPs were extracted from Affymetrix SNP array data, and for 1,762 subjects, they were determined by performing the TaqMan® assay on three SNPs (rs180349, rs4244457, and rs17445774) in the Fluidigm BioMark™ System (Fluidigm, South San Francisco, CA) or by melting analysis of an unlabeled oligonucleotide probe (UOP) applied during PCR on the remaining seven SNPs [20]. The detailed process of genotyping using a UOP has been described in a previous report [21]. All variants except rs180349 were in HWE in the KDC population (p > 0.01). Therefore, we performed association analyses using the remaining nine SNPs in the KDC and combined populations.

Statistical analyses
During the discovery stage, GWAS was performed to identify the variants associated with lipid ratios (log[TG]:HDLC ratio, LDLC:HDLC ratio, and non-HDLC:HDLC ratio) by linear regression analysis in an additive model using PLINK version 1.07 (http://pngu.mgh.harvard.edu/purcell/plink/) [22], with adjustment for age, sex, and recruitment region. Quantile-quantile plots for each lipid ratio were constructed by plotting the distribution of the observed p-values against the theoretical distribution of the expected p-values. The genomic control inflation factors (λ) for the GWAS of each lipid ratio were checked for potential p-value inflation. Manhattan plots for the lipid ratios were generated using R version 3.0.2 software (http://www.r-project.org/), and regional plots with a 1-megabase (Mb) window centered on the peak SNP were constructed using the web-based LocusZoom tool [23].
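As an illustration of the inflation check, the following Python sketch computes λ with the commonly used median-based definition: the median of the observed 1-df chi-squared statistics divided by the expected median under the null (about 0.455). The text does not state how λ was computed, so this definition and the function name are assumptions.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values):
    """Median observed 1-df chi-squared statistic over its expected median.

    Assumes every p-value is two-sided and lies in (0, 1].
    """
    std_normal = NormalDist()
    # A two-sided p-value maps to z = inv_cdf(p / 2); z squared is chi-squared, 1 df.
    chi2_stats = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.25) ** 2   # about 0.455
    return median(chi2_stats) / expected_median


# Hypothetical, evenly spaced p-values standing in for a well-calibrated GWAS.
uniform_p = [i / 1000 for i in range(1, 1000)]
print(round(genomic_inflation_factor(uniform_p), 3))
```

A value close to 1 indicates that the bulk of the test statistics follows the null distribution, which is the pattern expected when population stratification is negligible.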
In the replication analysis, linear regression analyses of the lipid ratios were performed to confirm the association of the GWAS SNPs in the KDC population, with adjustment for age and sex using R version 3.0.2. A chi-squared test was used to determine whether the GWAS SNPs deviated from HWE in the KDC population. Linkage disequilibrium (LD; Lewontin's D' = D/|Dmax| and r²) was determined using Haploview version 4.2 (Daly Lab at the Broad Institute, Cambridge, MA) [24]. The interaction between TE category and lipid ratio-associated variants was assessed by adding an interaction term to the linear regression model. In the subgroup analysis according to TE and NTE types, the associations with the lipid ratios observed in all the subjects were re-evaluated in the two populations, with adjustment for age and sex.
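The two LD measures can be computed from haplotype and allele frequencies as in the Python sketch below. It is a simplified illustration that assumes phased haplotype frequencies are already known and that both SNPs are polymorphic; Haploview estimates haplotype frequencies from genotype data, which is not reproduced here. The function name and the example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic SNPs.

    p_ab is the frequency of the haplotype carrying allele A at the first SNP
    and allele B at the second; p_a and p_b are the two allele frequencies.
    """
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical case: the rarer allele B occurs only on haplotypes carrying A.
d_prime, r2 = ld_measures(p_ab=0.1, p_a=0.5, p_b=0.1)
print(round(d_prime, 2), round(r2, 2))
```

In this example D' reaches its maximum while r² stays low, because r² also depends on how closely the allele frequencies of the two SNPs match.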
The association results from the GWAS and replication analysis were combined using the Comprehensive Meta-Analysis program version 2.0 (Biostat, Englewood, NJ) in a random-effects model by the DerSimonian and Laird method [25].
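The DerSimonian and Laird estimator can be sketched in a few lines of Python. This is a generic textbook version and not the internal code of the Comprehensive Meta-Analysis program; the function name and the effect sizes in the example are hypothetical.

```python
import math


def dersimonian_laird(betas, standard_errors):
    """Pool per-study effects; returns (pooled_beta, pooled_se, two_sided_p)."""
    k = len(betas)
    w = [1 / se ** 2 for se in standard_errors]            # fixed-effect weights
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0   # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    pooled_se = math.sqrt(1 / sum(w_re))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))                   # two-sided normal p-value
    return pooled, pooled_se, p


# Hypothetical effect sizes for one SNP in a discovery and a replication sample.
beta, se, p = dersimonian_laird([0.10, 0.12], [0.02, 0.03])
print(round(beta, 4), round(se, 4), p)
```

When the two stages agree closely, the between-study variance is estimated as zero and the result coincides with the inverse-variance fixed-effect estimate.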
Genome-wide significance at the Bonferroni-corrected level (0.05/310,746 SNPs) and nominal significance (cut-off) in the GWAS (stage 1) were defined at p < 1.6 × 10⁻⁷ and p < 5.0 × 10⁻⁶, respectively, and we regarded a p-value of 0.05 as the cut-off in the replication (stage 2) and the constitutional subgroup analyses. The SNPs in the combined analysis of GWAS and replication analysis were considered significant when the p-values showed traditional genome-wide significance, i.e., p < 5.0 × 10⁻⁸. The SNPs in the combined analysis of the constitutional subgroup were considered significant when p-values were at the Bonferroni-corrected level (0.05/5 SNPs), i.e., p < 1.0 × 10⁻².
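The two Bonferroni-corrected thresholds follow directly from the number of tests, as the short Python check below shows. The function and variable names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


gwas_threshold = bonferroni_threshold(0.05, 310_746)   # SNPs retained after QC
subgroup_threshold = bonferroni_threshold(0.05, 5)     # SNPs in the subgroup analysis

print(f"GWAS: p < {gwas_threshold:.1e}")          # GWAS: p < 1.6e-07
print(f"Subgroup: p < {subgroup_threshold:.1e}")  # Subgroup: p < 1.0e-02
```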

Characteristics of the study subjects
We analyzed the effects of the common variants on lipid ratios, namely the log[TG]:HDLC ratio, LDLC:HDLC ratio, and non-HDLC:HDLC ratio, in two independent Korean populations as follows: GWAS in the KoGES population comprising 5,292 individuals (discovery stage: stage 1) and replication analysis in the KDC population comprising 2,567 individuals (replication stage: stage 2). The characteristics of the two populations, including traits related to dyslipidemic risk, are presented in Table 1. The KoGES population included older individuals and a higher proportion of men than the KDC population. Subjects with the TE type tended to have higher values of BMI and waist circumference as well as dyslipidemic traits including lipid ratios than those with the NTE type, which is consistent with the results of previous reports [26,27].

Common variants associated with lipid ratios in all the subjects
We performed GWAS to identify the genetic variants associated with lipid ratios in the KoGES population in stage 1. The quantile-quantile plots presented deviations only in the extreme tail probabilities between the distributions of the expected and observed p-values (λ = 1.019 for log[TG]:HDLC ratio, λ = 0.990 for LDLC:HDLC ratio, and λ = 1.002 for non-HDLC:HDLC ratio), indicating that population stratification effects can be considered negligible (S1  Table 2).
Lipid ratio-associated variants according to constitutional types
A genetic discrepancy for cardiovascular risk exists between the TE (high risk) and NTE (low risk) types [17]. Therefore, we explored interactions between lipid ratio-associated variants and TE subgrouping, i.e., the TE and NTE types categorized based on the tertiles of the SCAT probability values for the TE constitutional type, by adding an interaction term to the linear regression model applied to all subjects. However, there were no significant interactions between the variants for the three lipid ratios and TE subgrouping (P TE-int > 0.05 in Table 3), and no remarkable differences in effect size between the two types, e.g., an opposite direction of the effect, were observed.
Because the TE type presented significantly higher lipid ratios in both KoGES and KDC populations when compared to the NTE type (Table 1), the associations of lipid ratio-associated variants were examined in constitutional subgroups. All five confirmed lipid ratio-associated variants in all the subjects presented significant constitution-consolidated association patterns (Table 3). That is, the minor allele effect of rs6589566 associated with increased log[TG]:HDLC was significant in the subgroup with the NTE type, whereas the effect of the SNP on non-HDLC:HDLC ratios remained significant in both TE and NTE types. The minor allele effects of the other four variants (rs4244457 and rs6586891 associated with decreased log[TG]:HDLC ratio and rs4420638 and rs17445774 associated with increased LDLC:HDLC ratio) remained significant in the subgroup with the TE type (Table 3).

Discussion
Our GWAS was aimed at identifying the genetic factors associated with lipid ratios. We found a novel locus (the C2orf47-SPATS2L (spermatogenesis associated serine rich 2 like) region) associated with the LDLC:HDLC ratio along with three known loci previously reported for individual lipid traits. In addition, we confirmed a genetic discrepancy of lipid ratios according to the TE and NTE types.
In association tests between the TG:HDLC ratio and the SNPs, the strongest signal was observed for rs6589566 located downstream of APOA5, an SNP strongly correlated with the 3' UTR SNP rs2266788 of APOA5 (calculated by Haploview version 4.2; r² = 0.99 and D' = 1.00 in the Han Chinese in Beijing + Japanese population from HapMap 3 release #27). The minor allele of the 3' UTR SNP reduces hsa-miR-3021 and hsa-miR-485-5p binding, resulting in reduced APOA5 expression and hypertriglyceridemia [28,29]. The second strongest signal was detected for rs4244457 (highly correlated with rs6586891, which showed the third strongest signal; r² = 0.97 and D' = 0.94 in our study) located downstream of LPL, which catalyzes the hydrolysis of lipoprotein TG and is involved in the uptake of esterified lipids [30]. Further, rs4244457 was in strong LD (calculated by Haploview version 4.2; r² = 0.48 and D' = 0.90 in the Han Chinese in Beijing + Japanese population from HapMap 3 release #27) with rs13702 in the 3' UTR of LPL, which is associated with changes in blood TG and HDLC levels. The minor allele of rs13702, which is associated with decreased TG and increased HDLC, disrupts the recognition site for hsa-miR-410 in the 3' UTR of LPL and induces an increase in LPL expression [31].
The SNP rs4420638 close to APOC1 has been found to be associated with higher LDLC and lower HDLC in previous reports [3,32]. rs4420638 also showed the strongest association with the LDLC:HDLC ratio in our study. However, the functional relationship between rs4420638 (or the correlated variants) and the change in the expression or activity of neighboring genes (APOE, APOC1, APOC2, and APOC4) remains unclear. The second strongest signal for the LDLC:HDLC ratio was observed for rs17445774 close to C2orf47, which encodes an uncharacterized protein and is surrounded by formiminotransferase cyclodeaminase N-terminal like, C2orf69, tRNA-yW synthesizing protein 5, and SPATS2L. We searched for lipid-SNP associations within the 1-Mb region around rs17445774 using two database tools, GRASP Search-v2.0.0.0 and PheGenI [33,34]. In total, 11 SNPs other than rs17445774 were suggestively associated (1.75 × 10⁻⁵ < p < 9.90 × 10⁻⁴) with various lipid traits including TG, TC, LDLC, VDLC, HDLC, and ApoC3 levels in blood. Most of them had low LD with rs17445774, but two SNPs had strong LD (calculated by Haploview version 4.2; both rs281787 and rs7565480 have r² = 0.02 and D' = 1.00 in the Han Chinese in Beijing + Japanese population from HapMap 3 release #27) with rs17445774 and were found to be associated with ApoC3 levels (rs281787 p = 4.52 × 10⁻⁵ and rs7565480 p = 1.75 × 10⁻⁵) (S1 Table). ApoC3 is a component of remnant particles that inhibits the hydrolysis of TG-rich lipoproteins by LPL and the uptake of TG-rich lipoproteins by the liver, causing an increase in the TG level in the blood [35,36]. Moreover, C2orf47 is up-regulated 2.5-fold in human femoral atherosclerotic lesions, as determined by gene expression analysis using microarrays [37]. These results indicate that this genomic region is genetically involved in the regulation of lipid metabolism.
The five SNPs of four loci satisfied the significance threshold (p < 5.0 × 10⁻⁸) in this study, and three SNPs among them were also found to be associated with lipid levels in a previous report [17]. Upon comparing the results, we found that rs6589566 was more significantly associated with individual lipid levels (p < 2.00 × 10⁻¹⁶ for increased TG levels; and p = 1.22 × 10⁻⁵ for decreased HDLC levels, versus p = 2.39 × 10⁻¹³ for increased TG:HDLC ratio). However, two SNPs were found to be more significantly associated with lipid ratios (rs4420638: p = 4.87 × 10⁻⁸ for increased LDLC levels and p = 8.87 × 10⁻⁵ for decreased HDLC levels, versus p = 8.21 × 10⁻¹¹ for increased LDLC:HDLC ratio; rs6586891: p = 5.56 × 10⁻⁶ for decreased TG levels and p = 9.39 × 10⁻⁹ for increased HDLC levels, versus p = 2.62 × 10⁻¹⁰ for decreased TG:HDLC ratio), although the present study had a smaller sample size than the previous one. This relatively higher significance suggests that the association test for lipid ratio is more effective in identifying genetic factors associated with lipid traits.
In the subgroup analysis according to constitutional type, the loci could be categorized into two groups according to their subgroup associations. (1) Among the loci associated with both the TE and NTE types, the APOA5 locus was associated with increased TG:HDLC ratio in the NTE type and increased non-HDLC:HDLC ratio in both the TE and NTE types. (2) Among the loci associated only with the TE type, one locus (LPL) was associated with decreased TG:HDLC ratio, and two loci (APOC1 and C2orf47) were associated with increased LDLC:HDLC ratio. Therefore, the TE type may be more susceptible to cardiometabolic risks caused by genetic elements compared to the NTE type, since the effects of most SNPs from the genome-wide scan were significant only in the TE type. This genetic discrepancy is consistent with the clinical discrepancy for cardiometabolic risks reported in a previous study [26].
One limitation of our study is that we did not analyze the associations between CHD risk and lipid-ratio SNPs including the C2orf47 SNP, owing to lack of clinical information for CHD in the studied population. Therefore, we cannot conclude that the newly identified SNPs also play a significant role in CHD development.
In conclusion, we confirmed that the known loci associated with lipid levels were also associated with lipid ratios. Furthermore, a relationship between the C2orf47 locus and the LDLC:HDLC ratio was newly discovered. Our study is significant in the discovery of this association of the C2orf47 locus with the LDLC:HDLC ratio, given that the locus has a small effect on single-lipid phenotypes and has been overlooked in conventional single-lipid studies. With regard to the constitutional type, most SNPs exert genetic influences in the TE type. In the future, association studies for lipid ratios should be aimed at broadening the genetic perspective on cardiovascular diseases caused by atherogenic dyslipidemia.

S1 Table. Lipid-associated SNPs in the 1-Mb region around rs17445774. Using two search engines, lipid-associated SNPs were searched in the 1-Mb region around rs17445774. SNP positions are given according to GRCh38.p2. r² and D' values were calculated by Haploview version 4.2 using two reference genotype data sets: (1) Japanese from 1000 Genomes phase 3 data and (2) Han Chinese in Beijing + Japanese from HapMap release #27 data. Genotype data of rs10497847 could not be downloaded from the HapMap and 1000 Genomes data. (XLSX)