Association between Expression Quantitative Trait Loci and Metabolic Traits in Two Korean Populations

Most genome-wide association studies consider genes that are located closest to single nucleotide polymorphisms (SNPs) that are highly significant for those studies. However, the significance of the associations between SNPs and candidate genes has not been fully determined. An alternative approach that used SNPs in expression quantitative trait loci (eQTL) was reported previously for Crohn’s disease; it was shown that eQTL-based preselection for follow-up studies was a useful approach for identifying risk loci from the results of moderately sized GWAS. In this study, we propose an approach that uses eQTL SNPs to support the functional relationships between an SNP and a candidate gene in a genome-wide association study. The genome-wide SNP genotypes and 10 biochemical measures (fasting glucose levels, BUN, serum albumin levels, AST, ALT, gamma GTP, total cholesterol, HDL cholesterol, triglycerides, and LDL cholesterol) were obtained from the Korean Association Resource (KARE) consortium. The eQTL SNPs were isolated from the SNP dataset based on the RegulomeDB eQTL-SNP data from the ENCODE projects and two recent eQTL reports. A total of 25,658 eQTL SNPs were tested for their association with the 10 metabolic traits in 2 Korean populations (Ansung and Ansan). The proportion of phenotypic variance explained by eQTL and non-eQTL SNPs showed that eQTL SNPs were more likely to be associated with the metabolic traits genetically compared with non-eQTL SNPs. Finally, via a meta-analysis of the two Korean populations, we identified 14 eQTL SNPs that were significantly associated with metabolic traits. These results suggest that our approach can be expanded to other genome-wide association studies.

Introduction association with 10 metabolic traits in two independent Korean cohorts (Ansan and Ansung).

Study subjects
The study subjects comprised two population-based cohorts, Ansung and Ansan, which have been examined as part of the Korean Genome and Epidemiology Study (KoGES). The phenotype of the cohort has been described [17]. Briefly, the subjects came from Ansung and Ansan in KyungGi-Do province, near Seoul, Korea. Written informed consent was obtained from all participants, and this research project was approved by the institutional review board of KNIH. A total of 10,038 individuals were recruited for the cohorts, and 8,842 individuals of the KoGES were analyzed by the Korean Association Resource (KARE) consortium to understand their genome-wide association with the surveyed or measured phenotypes [16].
Subjects with genotype accuracies below 98% and high missing genotype call rates ($5%), high heterozygosity (.30%), or inconsistency in sex were excluded from subsequent analyses. Individuals who had a tumor were excluded, as were related individuals whose estimated identity-by-state (IBS) values were high (.0.80). Based on these criteria, 8,842 samples were selected; these quality-control steps were described in a previous GWAS [16].

Genotyping
The genotype data were obtained from KARE, which used the Affymetrix Genome-wide Human SNP array 5.0 and imputed SNPs. The genotype qualitycontrol criteria were as reported in a previous GWAS [16]. Briefly, the criteria for the inclusion of SNPs were a genotype call rate .0.98, a minor allele frequency .0.01, and Hardy-Weinberg equilibrium (HWE) (p.1610 -6 ). Ultimately, 352,228 SNPs passed the quality-control process. SNP imputation was performed with IMPUTE [18] using the JPT and CHB sets of HapMap Phase 2 as the reference. After removing SNPs with a MAF ,0.01 and a SNP missing rate .0.05, we combined the genotyped SNPs and imputed SNPs; a total of 1.8 million SNPs were used in the subsequent study.
Population stratification was tested by a principal component analysis using the EIGENSOFT software [19]. To prevent overrepresentation of regions with more redundant SNPs, we used the indep-pairwise command in PLINK [20] to reduce linkage disequilibrium between the remaining variants by eliminating any SNP that had a pairwise r 2 .0.3 with any other SNP in a 1500 bp window (step size, 150 bp). This reduced the original dataset to 93,877 SNPs; subsequently, we used smartpca [19].

eQTL-SNP selection
Most of the eQTL-SNP resources that were used in this analysis are available in online databases, such as RegulomeDB (http://regulome.stanford.edu/), including several published resources for various cell types/tissues, such as monocytes [9], human brain [10], lymphoblastoid cell lines [11,12], and human liver [13]. The RegulomeDB is one of the most comprehensive eQTL-SNP databases, because it contains rich information about the products of the ENCODE project, such as transcription factor binding sites, chromatin structure, histone modification, and eQTL. The RegulomeDB classifies the human SNPs into seven categories [21], of which Category 1 includes eQTL SNPs with other information about the regulatory elements, such as transcription factor binding, chromatin structure, and histone modification. A total of 39,332 eQTL SNPs were downloaded from the RegulomeDB, Category 1. In addition, we also included two recent eQTL reports based on liver (1,078 SNPs) or lymphoblastoid (907 SNPs) tissues [14,15]. Finally, a total of 41,317 eQTL SNPs were used to extract the same SNPs from our genotype dataset, and we extracted the 25,658 eQTL SNPs from the KARE SNPs using PLINK [20] software.
Estimation of the genetic variance that is explained by eQTL or non-eQTL SNPs The non-eQTL SNPs were defined as the SNPs that excluded the eQTL SNPs and were in perfect linkage disequilibrium with the eQTL SNPs (r 2 51.0 and D951.0). The genetic variances were computed via GCTA v1.24 [22], which is a tool for estimating the proportion of phenotypic variance that is explained by genomewide SNPs for complex traits [23]. First, we estimated the pairwise genetic relationship using the make-grm option. Next, we estimated the proportion of phenotypic variance that was explained by the eQTL and non-eQTL SNPs, respectively, based on the restricted maximum likelihood (REML) [23].

Statistical analyses
The effect of a genotype was analyzed by linear regression. We calculated the effect size (beta) and standard error (SE) of minor alleles on metabolic traits for each Ansung and Ansan subject. All analyses were adjusted for age, sex, body mass index (BMI), and principal component (PC) 1 and PC2. PLINK v 1.07 [20] was used for all statistical tests. All tests were performed based on the additive model, and we combined the Ansung and Ansan results by an inverse-variance metaanalysis under the assumption of fixed effects using Cochran's Q test, to assess between-study heterogeneity [24]. In this study, we applied false discovery rates and Bonferroni correction p-values to determine significant associations.

Dataset characteristics
The clinical characteristics and sample exclusion criteria are described in Table 1.
The KARE subjects consisted of two Korean populations (Ansung and Ansan), which we used as the replication set of genetic associations based on their differing clinical characteristics. However, the population stratification analysis showed similar genetic structures (S1 Figure), indicating that they constitute a replication set that can be used to identify consistent genetic effects, despite the differences in demographics and environments.

The proportion of phenotypic variance that was explained by eQTL and non-eQTL SNPs
Based on the RegulomeDB and previous reports, we isolated 25,658 eQTL SNPs from the KARE genotype dataset. The proportion of phenotypic variance that was explained by eQTL and non-eQTL SNPs for complex traits is described in Table 2. The non-eQTL sites examined were approximately 66 times more examined SNPs than the eQTL sites; however, compared with those explained by non-eQTL SNPs, the proportions of phenotypic variance that were explained by eQTL SNPs were larger for GLU0 (1.9 in eQTL vs 0.2 in non-eQTL SNPs) and TG (1.3 in eQTL vs 0.1 in non-eQTL SNPs), and similar for BUN (3.4 in eQTL vs 4.0 in non-eQTL SNPs). Moreover, the proportions of phenotypic variance that were explained by eQTL SNPs were relatively large for the remaining phenotypes, with the exception of ALB.

Association study of eQTL-related SNPs
These eQTL SNPs were examined with regard to their association with metabolic traits in the Ansung and Ansan cohorts. Ultimately, we selected 509 eQTL SNPs that had the same effect and p-values,0.05 in both cohorts, and combined the results obtained for each cohort via a meta-analysis. All association results and meta-analysis results for the 509 SNPs are described in S1 Table. Among the 509 SNPs analyzed, we identified significant associations using adjusted p-values,0.05 for FDR. Twenty-six SNPs for GLU0, eleven SNPs for GGT, two SNPs for Tchol, thirty-five SNPs for LDLC, and two SNPs for TG met our criteria (S1 Table, characters highlighted in bold). Because many associated SNPs were located in the same LD blocks, we selected the most significant SNP in each significant LD block (summarized in Table 3). We then used these SNPs for further annotation and compared them with the previous conventional GWAS results (http://www.genome.gov/gwasstudies). Further information for the significant eQTL positions is described in Table 4, including the genes, cell types, and ENCODE regulatory elements in metabolic-trait-related cell types derived from the blood, liver, or pancreas. No significant association was detected regarding the results of ALB, BUN, AST, ALT, and HDLC. Table 3 shows that two SNPs (rs1535 and rs463302) for FPG, two SNPs (rs2251468 and rs11065774) for GGT, two SNPs (rs780092 and rs4604177) for Tchol, and two SNPs (rs12679834 and rs651821) for TG were significantly associated and passed the Bonferroni correction. The p-values of two SNPs (rs3002007 and rs1431985) for FPG, one SNP (rs13233571) for GGT, and three SNPs (rs11065774, rs9966367, and rs4604177) for TG did not pass the Bonferroni correction, but passed the FDR correction.

Significantly associated eQTL SNPs
In this study, we focused on significantly associated SNPs. SNP rs1535 was a FADS1 and NXF1 eQTL-SNP hot spot, and the results of the meta-analysis of the association between the Ansan and Ansung populations were b5-0.943 and p-value53.3610 -8 . SNP rs463302 was a B3GALT4 eQTL-SNP hot spot, and the results of the meta-analysis were b51.347 and p-value58.5610 -7 . SNP rs2251468 was a C12orf43 eQTL-SNP hot spot, and the results of the meta-analysis were b5-0.812 and p-value52.2610 -7 . SNP rs11065774 was an MYL2 eQTL-SNP hot spot, and the results of the meta-analysis were b5-1.009 and p-value52.5610 -7 . SNP rs780092 was a XAB1 eQTL-SNP hot spot, and the results of the meta-analysis were b5-2.710 and p-value54.4610 -7 . SNP rs4604177 was a FAM169A eQTL-SNP hot spot, and the results of the meta-analysis were b5-2.492 and p-value57.7610 -7 . SNP rs12679834 was an eQTL-SNP hot spot of LPL, and the results of the meta-analysis were b5-6.183 and p-value54.5610 -9 . Finally, SNP rs651821 was an eQTL-SNP hot spot of TAGLN, and the results of the metaanalysis were b55.011 and p-value54.7610 -9 .

Discussion
In this study, we performed a GWAS of 10 biochemical traits using SNPs that were preselected based on the eQTL-SNP lists of RegulomeDB and two recent eQTL papers. The proportion of phenotypic variance that was explained by eQTLand non-eQTL-related SNPs showed that the eQTL SNPs were more likely to be associated with the metabolic traits than were the non-eQTL SNPs. We identified 14 eQTL SNPs that were associated with metabolic traits in two Korean populations (Ansung and Ansan), in which the p-values met our multiple comparison criteria. The SNPs revealed novel candidate genes for FPG, GGT, Tchol, LDLC, and TG.
SNP rs1535 was reported as being associated with decreased plasma phospholipid levels by an European study [25], and with decreased HDLC by an  Indian Asian study [26]. Although FADS1 has been studied extensively in lipid metabolism, recent genetic association studies showed its association with glucose metabolism [27]. Moreover, NXF1, a novel glucose-regulated protein, is elevated under high-glucose conditions [28]. Therefore, the expression of both the FADS1 and NXF1 genes is influenced by the rs1535 genotypes, which might represent a functional element for regulating both glucose and lipid metabolism. SNP rs463302 has not been reported in association studies; however, our study indicated that rs463302 may contribute to B3GALT4 expression levels. B3GALT4 encodes UDP-Gal:betaGlcNAc beta 1,3-galactosyltransferase, polypeptide 4, which is a member of the beta-1,3-galactosyltransferase protein family. Although B3GALT4 mitigates ganglioside activity in neurodegenerative disorders, such as Huntington disease [29], it was also reported that gangliosides are related to the immunological pathophysiology of type 1 diabetes and to insulin resistance in type 2 diabetes [30,31]. SNP rs463302 lies 200 bp upstream of B3GALT4, to which RNA polymerase 2A (POLR2A) binds, based on the ENCODE ChIP-seq data. In addition, the ENCODE DNase-seq and histone modification data indicated that the chromatin around this SNP is open and undergoes histone modifications. Taken together, our results and in silico evidence suggest that the rs463302 SNP may be a regulatory factor of B3GALT4 and a modifying factor of fasting glucose level.
In our study, rs2251468 was significantly associated with GGT. The SNP was recently reported to be associated with plasma homocystein levels [32]. Because homocystein concentration is a risk factor for coronary artery disease [32] and is significantly correlated with GGT [33], this SNP might be a novel candidate marker for coronary artery disease.
Although rs12679834 has not been reported in association studies, a SNP that is in high LD with rs12679834 (rs331, r 2 50.771, D951.000) has been reported as being associated with plasma lipoprotein concentration [34]. SNP rs12679834 was an eQTL-SNP of LPL (lipoprotein lipase), which is a critical enzyme in lipid metabolism that catalyzes the hydrolysis of TGs. Dysfunction of LPL induces pathophysiological lipid-related disorders, including hyperlipidemia, dyslipidemia [35], and hypertriglyceridemia [36].
Our study had two limitations. First, although the positions of the eQTL SNPs have several evidences of the regulatory elements from ENCODE results, not all eQTL SNPs were experimented in the cell types or tissues that are directly related to the metabolic traits [37]. Thus, we assumed that the eQTL SNPs also played a similar role in the metabolic-trait-related cell types or tissues. For example, the B3GALT4 eQTL-SNP (rs463302) was experimented in cerebellum tissue; however, it is located on several ENCODE regulatory elements, such as transcription factor binding, open chromatin, and active histone modification markers, in metabolictrait-related cell types, such as the liver and pancreas. The other limitation was that we used HapMap 2 imputed SNPs. Recently, many genome-wide association studies used 1000 Genomes Project-based imputation [38]. Although we could not use 1000 Genomes-based imputation in the current study, we will apply it in future studies.
Previous GWASs of complex traits have primarily examined DNA sequence variants (DSVs) between individuals that contribute to the susceptibility to a disease, clinical outcomes, and response to therapy [9]. However, the mechanisms that govern the expression of a phenotype are embedded not only in the DSVs, but also through their effects on various genomic components that regulate gene expression, variants, and posttranslational modifications of the encoded proteins, in conjunction with environmental factors. Thus, a complicated phenotype is the consequence of complex interactions between many genetic and nongenetic factors.
The results of eQTL-SNP GWAS might be novel targets for the design of future experimental studies. As an example, rs12740374 was identified in European and American GWAS for LDLC, and those studies reported the CELSR2 gene as a candidate gene based on positional proximity [39]. However, further analysis showed that the SNP is an eQTL of SORT1 [40]. Moreover, our approach using eQTL SNPs provides the possibility of understanding the internal mechanism that underlies the link from SNP to genes to phenotype. For example, in our study, SNP rs1535 was an eQTL of NXF1, which is regulated by the blood glucose levels. This result led us to speculate on the following internal link: the modulation of NXF1 expression by blood glucose levels may be modified by eQTL SNPs.
In conclusion, the relationships between functional eQTL SNPs and 10 biological traits may be helpful for understanding the underlying mechanism that connects genotype and phenotype. Because eQTL SNPs promote the understanding of gene expression and regulation, this approach might help identify biomarkers of metabolic traits.