Genome-wide association analysis identifies multiple loci associated with kidney disease-related traits in Korean populations

Chronic kidney disease (CKD) is an important public health problem characterized by a decrease in the kidney glomerular filtration rate (GFR). In this study, we performed genome-wide association analyses of kidney disease-related traits using data from a Korean adult health screening cohort comprising 7,064 participants. The kidney disease-related traits analyzed were blood urea nitrogen (BUN), serum creatinine, estimated GFR, and uric acid levels. We detected 2 genetic loci (SLC14A2 and an intergenic region) and 8 single nucleotide polymorphisms (SNPs) associated with BUN, 3 genetic loci (BCAS3, C17orf82, ALDH2) and 6 SNPs associated with serum creatinine, 3 genetic loci (BCAS3, C17orf82/TBX2, LRP2) and 7 SNPs associated with GFR, and 14 genetic loci (3 in ABCG2/PKD2, 2 in SLC2A9, 3 in intergenic regions on chromosome 4; OTUB1, NRXN2/SLC22A12, CDC42BPG, RPS6KA4, SLC22A9, and MAP4K2 on chromosome 11) and 84 SNPs associated with uric acid levels. A comparison of the significant genetic loci for serum creatinine levels and GFR showed that rs9895661 in BCAS3 and rs757608 in C17orf82 were associated with both traits. The SNP rs11710227, located in an intergenic region on chromosome 3 and significantly associated with BUN, is newly discovered. Variations at multiple genetic loci are associated with kidney disease-related traits, and these associations differ between populations. The significance of the variants identified in this study will need to be confirmed in other population groups in the future.


Introduction
Chronic kidney disease (CKD) is an important health problem that increases the incidence of cardiovascular disease and overall mortality [1]. Kidney function is usually expressed as the glomerular filtration rate (GFR), and a decrease in the GFR indicates deteriorating kidney function. Traditionally, blood urea nitrogen (BUN) and serum creatinine levels have been used as surrogate markers of kidney function deterioration. BUN reflects the amount of urea nitrogen in the blood; urea is produced as a waste product of protein metabolism [2,3]. Serum creatinine is a representative biochemical indicator of kidney function and is produced by the breakdown of muscle creatine phosphate. Because the exacerbation of kidney function and the increase in serum creatinine level are not directly proportional, the GFR is estimated using demographic and biochemical factors, such as serum creatinine levels, age, and sex. Uric acid is the end product of purine metabolism, and hyperuricemia develops when renal function is reduced and uric acid excretion is decreased [4]. Hyperuricemia is prevalent in patients with kidney disease, and genetic susceptibility plays an important role in the development of hyperuricemia [5]. Risk factors for CKD and end-stage renal disease requiring dialysis include diabetes mellitus, hypertension, glomerulonephritis, and polycystic kidney disease. However, these traditional risk factors alone cannot completely explain the development of CKD [6]. Genetic studies have shown that genetic factors account for approximately 36-75% of the variation in kidney function and in vulnerability to CKD progression [7,8]. Genome-wide association studies have been used to identify genetic variations associated with kidney function in various populations, and different genetic variations were found in each population group [9,10].
In this study, we conducted a genome-wide association and replication study of a Korean adult population to identify multiple genetic loci associated with kidney disease-related traits, including BUN, serum creatinine, GFR, and uric acid levels.

Study participants
Between January 2014 and December 2014, 7,999 adults who underwent a health screening assessment at the Seoul National University Hospital Health Care System Gangnam Center were asked to consent to research, and their blood samples were collected and stored for further study. Most participants underwent either a voluntary personal health check-up or a health check-up financially supported by their employer. The Institutional Review Board of Seoul National University Hospital approved the storage of blood samples for genetic analysis with informed consent. To investigate associations between genetic variations and kidney disease-related traits, we retrospectively enrolled 7,064 healthy participants after performing quality control analysis on the genetic samples. All participants had undergone blood sampling to measure BUN, serum creatinine, and uric acid levels. Demographic and other clinical information (age, sex, body mass index (BMI), comorbidities, and laboratory findings) was collected by reviewing the electronic medical records from the time of the health check-up. The Institutional Review Board of Seoul National University Hospital approved the research plan (IRB 1603-120-750), and the study was conducted in compliance with the Helsinki Declaration. Specifically, personal information was encrypted for confidentiality, and genetic information was analyzed by professional analysts in a separate environment isolated from personal information.

Definition of kidney disease-related traits
The kidney function-related traits assessed in this study included BUN, serum creatinine, estimated GFR, and uric acid levels. GFRs were calculated using the Modification of Diet in Renal Disease (MDRD) estimated GFR equation.
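As a minimal sketch of the MDRD calculation named above: the text does not state which version of the equation or which coefficients were used, so the four-variable form below is an assumption. It uses the commonly cited constant of 175 for IDMS-standardized creatinine (186 for the original calibration), omits the race coefficient of the original equation, and uses a hypothetical function name.

```python
def mdrd_egfr(scr_mg_dl: float, age_years: float, female: bool,
              constant: float = 175.0) -> float:
    """Four-variable MDRD estimate of GFR in mL/min/1.73 m^2.

    Assumption: the study does not report its coefficients. 175 is the
    constant commonly used with IDMS-standardized serum creatinine;
    pass constant=186.0 for the original calibration.
    """
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("serum creatinine and age must be positive")
    egfr = constant * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742  # sex coefficient of the MDRD equation
    return egfr


# Illustrative inputs only (not study data): creatinine in mg/dL, age in years.
male_estimate = mdrd_egfr(1.0, 51, female=False)
female_estimate = mdrd_egfr(1.0, 51, female=True)
```

Because serum creatinine enters with a negative exponent, a higher creatinine value always yields a lower estimate for the same age and sex.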

Genotyping
Genomic DNA was extracted from venous blood samples and genotyped using Affymetrix Axiom Customized Biobank Genotyping Arrays (Affymetrix, Santa Clara, CA, USA), and the PLINK program (ver. 1.07) was used for quality control procedures. Specimens with the following characteristics were excluded from the analysis: low genotyping call rate (< 97%), sex inconsistency, and related and cryptically related individuals (identical by descent > 0.9). Single nucleotide polymorphisms (SNPs) with low call rates (< 97%), low minor allele frequency (MAF < 0.05%), or significant deviation from Hardy-Weinberg equilibrium in the permutation test (HWE P < 1.0 × 10⁻⁵) were excluded (S1 Table). After performing the quality control evaluations, 345,072 autosomal SNPs were retained for the association analysis. S2-S5 Tables summarize the HWE, MAF, and missing rates of each SNP. Targeted imputation was performed in the validation set when SNP information could not be confirmed as significant in the discovery set. Imputation was performed as follows: genotypes were pre-phased using SHAPEIT2 and imputed with IMPUTE2, using the 1000 Genomes Project phase 3 haplotypes as the reference panel.
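The SNP-level filters described above can be illustrated with a small sketch. This is not the PLINK workflow used in the study: it covers only the call-rate and MAF filters (the HWE test and relatedness checks are omitted), the genotype encoding and function names are hypothetical, and the default MAF cutoff of 0.0005 is an assumed reading of the 0.05% figure in the text.

```python
from typing import List, Optional, Sequence

# Assumed encoding: 0, 1, or 2 copies of the counted allele; None = missing call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of non-missing genotype calls for one SNP."""
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency computed from the non-missing calls of one SNP."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def passing_snps(matrix: Sequence[Sequence[Genotype]],
                 min_call_rate: float = 0.97,
                 min_maf: float = 0.0005) -> List[int]:
    """Indices of SNP columns that pass both filters.

    `matrix` holds one row per sample and one column per SNP.
    """
    keep = []
    for j in range(len(matrix[0])):
        column = [sample[j] for sample in matrix]
        if call_rate(column) >= min_call_rate and minor_allele_frequency(column) >= min_maf:
            keep.append(j)
    return keep


# Toy data (4 samples x 3 SNPs), invented for illustration only.
toy = [
    [0, 0, 1],
    [1, 0, None],
    [2, 0, 1],
    [1, 0, 0],
]
kept = passing_snps(toy)
```

In the toy matrix, the second SNP is monomorphic and the third has a missing call in one of four samples, so only the first column survives both filters.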

Statistical analysis
SNPs associated with kidney disease-related outcomes, including BUN, serum creatinine, GFR, and uric acid levels, were identified with multiple linear or logistic regression methods with adjustments for age, sex, diabetes mellitus, hypertension, and BMI effects. Principal component (PC) scores were estimated with the EIGENSTRAT approach to adjust for population substructure, and the first five PC scores were also included as covariates [11]. A total of 345,072 SNPs that passed the quality control assessment were used for the genome-wide association study. The normality of the distributions of BUN, serum creatinine, GFR, and uric acid levels was evaluated with a histogram and the Kolmogorov-Smirnov test. Because the continuous variables were not normally distributed, they are summarized using median values and interquartile ranges. Categorical variables are expressed as frequencies or percentages. The R software package (version 3.1.1; R Development Core Team; R Foundation for Statistical Computing, Vienna, Austria) was used for statistical analysis and to draw the Manhattan plots of -log10 P values. Analysis results were verified using discovery and validation sets. The discovery set comprised the 7,064 participants included in this study. Significant SNPs (P < 1.45 × 10⁻⁷, a value derived from the 345,072 QC-qualified SNPs and Bonferroni correction) were tested in the validation cohort samples. BUN, serum creatinine, and GFR validation was performed using results from participants of the genome-wide association study of the Korea Association Resource (KARE), and uric acid validation was performed using results from participants of the Health Examinee shared control (HEXA) study. The KARE is a prospective cohort designed to identify risk factors for major chronic diseases in Koreans, including diabetes and hypertension [12-14].
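The discovery threshold quoted above follows from a Bonferroni correction over the 345,072 QC-qualified SNPs. The short sketch below reproduces the arithmetic; the family-wise significance level of 0.05 is an assumption, since the text reports only the resulting threshold.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: family-wise alpha of 0.05; the SNP count is taken from the text.
threshold = bonferroni_threshold(0.05, 345_072)
print(f"{threshold:.2e}")  # prints 1.45e-07
```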
The KARE cohort consists of 10,038 adults (aged 40-69) who are representative samples of residents in two cities (Ansung and Ansan) in South Korea. Data obtained from physical examinations and laboratory tests have been collected since 2001, and follow-up studies have been conducted every two years. Genetic testing results from 6,509 participants in the third phase of the KARE cohort study were used to validate BUN, serum creatinine, and GFR. HEXA is a large-scale cohort of the general population designed to identify environmental and genetic factors for major chronic diseases in Koreans. Since 2004, adults 40-70 years of age representing the general population have been recruited from health screenings and medical institutions. The cohort includes data from health checkups and epidemiological surveys as well as follow-up data collected since 2007. Approximately 20 medical institutions nationwide in Korea are involved, and approximately 30,000 new participants are enrolled in the study every year. To date, data from approximately 173,300 adult participants have been collected. For the validation of uric acid, we used genetic test results from 3,703 participants in the HEXA cohort study. Validation P values less than 0.05 were considered significant. We grouped significant SNPs by linkage disequilibrium (LD) and D' values, and the graphs were generated using Haploview 4.2 software (S1 File: Figure A for BUN on chromosome 3, Figure B for uric acid on chromosome 4, Figure C for uric acid on chromosome 11, Figure D for serum creatinine on chromosome 12, Figure E for GFR on chromosome 17, and Figure F for BUN on chromosome 18).
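For readers unfamiliar with the D' statistic used to group SNPs, the following sketch computes the absolute value of Lewontin's D' for two biallelic loci. It assumes that haplotype and allele frequencies are already known; Haploview estimates them from genotype data, which is not reproduced here, and the function name is hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when either locus is monomorphic")
    return abs(d) / d_max


# Illustrative frequencies only (not study data).
complete_ld = d_prime(0.5, 0.5, 0.5)    # alleles always co-occur
no_ld = d_prime(0.25, 0.5, 0.5)         # alleles are independent
```

A value near 1 indicates that at least one of the four possible haplotypes is essentially absent, and a value near 0 indicates that the two loci are close to independent.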

Participant characteristics
Clinical and demographic characteristics of the discovery cohort participants are shown in Table 1. A total of 7,064 participants were enrolled in this study. Median age was 51 years.

Genetic loci significantly associated (P < 1.45 × 10⁻⁷ in the discovery set) with BUN were found on chromosomes 3, 7, and 18 (Fig 1; https://doi.org/10.1371/journal.pone.0194044.g001). Genetic loci significantly associated with serum creatinine were found on chromosomes 12 and 17. Genetic loci significantly associated with GFR were found on chromosomes 2 and 17. Genetic loci significantly associated with uric acid levels were found on chromosomes 4 and 11. In addition, the distributions of the observed versus expected P values are shown as quantile-quantile (QQ) plots in Fig 2. The QQ plots showed good adherence to null expectations, and the genomic inflation factors did not indicate substantial inflation of the test statistics for these traits. The -log10 P values are shown according to genomic position in regional plots (Fig 3); the P values were obtained from the discovery set. In each plot, the SNP with the lowest P value is colored purple. On chromosome 18, rs6507625 in SLC14A2 was identified as the most significant SNP for BUN (P = 6.20 × 10⁻⁹). On chromosome 3, rs10937329 in an intergenic locus was identified as the most significant SNP for BUN (P = 8.00 × 10⁻¹²). On chromosome 17, rs9895661 in BCAS3 was identified as the most significant SNP for creatinine and GFR (P = 1.63 × 10⁻⁸ for creatinine, P = 4.34 × 10⁻¹¹ for GFR). On chromosome 17, rs757608 in C17orf82 was also identified as a significant SNP for both creatinine and GFR (P = 5.65 × 10⁻⁸ for creatinine, P = 3.93 × 10⁻¹⁰ for GFR).
On chromosome 12, rs671 in ALDH2 was identified as the most significant SNP for creatinine (P = 3.45 × 10⁻⁸). On chromosome 2, rs2390793 in LRP2 was identified as the most significant SNP for GFR (P = 4.28 × 10⁻⁸). On chromosome 4, rs2231142 in ABCG2 (P = 3.38 × 10⁻³⁹) and rs3775948 in SLC2A9 (P = 8.59 × 10⁻³²) were identified as the most significant SNPs for uric acid. On chromosome 11, rs77459372 in OTUB1 was identified as the most significant SNP for uric acid (P = 9.12 × 10⁻²¹). The most significant SNPs at each genetic locus associated with kidney disease-related traits are summarized in Table 2.
We detected 4 SNPs on chromosome 18 (SLC14A2), 4 SNPs on chromosome 3 (intergenic loci), and 1 SNP on chromosome 7 (UNCX) associated with BUN in the discovery set (Tables 2 and 3). After validation set analysis, the 4 SNPs on chromosome 18 (SLC14A2) and the 4 SNPs on chromosome 3 (intergenic loci) still showed a significant association with BUN. The top SNPs in SLC14A2 on chromosome 18 and in the intergenic loci on chromosome 3 were rs6507625 (P = 6.20 × 10⁻⁹ in the discovery set, P = 3.70 × 10⁻⁴ in the validation set) and rs10937329 (P = 8.00 × 10⁻¹² in the discovery set, P = 2.11 × 10⁻⁹ in the validation set), respectively. rs11710227 in an intergenic region on chromosome 3 is a newly discovered SNP that showed significant association with BUN.
Six SNPs on chromosome 17 (BCAS3, C17orf82/TBX2) and 1 SNP on chromosome 2 (LRP2) significantly associated with GFR were found in the discovery and validation sets (Table 5). The lead SNPs in BCAS3 and C17orf82/TBX2 on chromosome 17 and LRP2 on chromosome 2 were rs9895661, rs757608, and rs2390793, respectively. Comparing the significant genetic loci associated with serum creatinine levels and GFR showed that 2 genetic loci (BCAS3 and C17orf82) were associated with both traits. A genetic locus in ALDH2 on chromosome 12 was associated only with serum creatinine levels and not with GFR. A genetic locus in LRP2 on chromosome 2 was associated only with GFR and not with serum creatinine levels.
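The inflation check mentioned alongside the QQ plots is commonly summarized by the genomic inflation factor (lambda), the ratio of the median observed 1-degree-of-freedom chi-square statistic to its expected median under the null. The text does not report how this was computed, so the sketch below is one conventional approach under that assumption, with a hypothetical function name and synthetic P values.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(p_values):
    """Lambda: median observed chi-square (1 df) over its null expectation."""
    norm = NormalDist()
    chi2 = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("P values must lie in (0, 1]")
        # A two-sided P value maps to a squared standard normal quantile.
        chi2.append(norm.inv_cdf(p / 2.0) ** 2)
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic, evenly spaced P values standing in for a well-behaved null scan.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation_factor(null_like)
```

Values of lambda close to 1 are consistent with QQ plots that adhere to the null expectation, whereas values well above 1 suggest inflation from, for example, residual population structure.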

Discussion
In this study, we used genome-wide association analyses of discovery and validation populations to identify 6 genetic loci (SLC14A2 on chromosome 18 and intergenic regions on chromosome 3 for BUN; BCAS3 and C17orf82 on chromosome 17, and ALDH2 on chromosome 12 for serum creatinine; BCAS3 and C17orf82/TBX2 on chromosome 17, and LRP2 on chromosome 2 for GFR) that were associated with BUN, serum creatinine, and GFR. rs9895661 in BCAS3 and rs757608 in C17orf82 were associated with both serum creatinine and GFR. rs11710227 was identified as a newly discovered SNP in an intergenic region on chromosome 3 that showed significant association with BUN. For uric acid, 8 genetic loci on chromosome 4 and 6 genetic loci on chromosome 11 were identified. Among the SNPs that met the clustering quality control criteria, 101 SNPs in 20 genetic loci were related to kidney disease-related traits.
rs6507625 in SLC14A2 on chromosome 18 was the lead SNP associated with BUN. rs6507625 in SLC14A2 was previously reported to be associated with anthropometric parameters, including BMI and waist circumference, and kidney function-related traits, including serum creatinine and GFR [15]. Three other SNPs (rs1825475, rs1484873, and rs7232775) in SLC14A2 on chromosome 18 are known to be associated with serum creatinine and GFR [15]. rs1484873 and rs7232775 in SLC14A2 on chromosome 18 were also associated with BUN and hypertension [16,17]. rs10937329 in an intergenic region was the lead SNP on chromosome 3 associated with BUN. rs10937329 was previously shown to be associated with BUN in an analysis of 71,149 individuals from Asian populations [18]. In that study, rs10937329 was not related to serum creatinine, GFR, or uric acid levels, and these findings were similar to ours. The SNPs rs11710227, rs16862782, and rs4686914 were identified as novel SNPs on chromosome 3 (intergenic region) that were significantly associated with BUN.
Although rs16862782 was reported to be associated with myopia [19], and rs4686914 with metabolic traits [20], associations of these SNPs with BUN or other kidney disease-related traits have not been reported. The SNP rs11710227 on chromosome 3 (intergenic region) is a novel SNP that has not been shown to be related to any particular phenotype, and its relationship with BUN was newly revealed in this study.
The SNP rs9895661 in BCAS3 and the SNP rs757608 in C17orf82 on chromosome 17 were the top SNPs associated with both serum creatinine and GFR. A genetic locus in BCAS3 on chromosome 17 was previously reported to be associated with serum creatinine and GFR. The SNP rs671 in ALDH2 on chromosome 12 was associated with serum creatinine levels but not with GFR. A genetic locus in ALDH2 on chromosome 12 was previously reported to be associated with BUN and serum creatinine but not with GFR [18]. The SNP rs671 in ALDH2 was reported to be associated with metabolic traits, including diabetes mellitus, and blood pressure [27-29]. The SNP rs671 in ALDH2 also affects acute rejection after kidney transplantation and drug metabolism in end-stage renal disease patients [30,31]. C17orf82 was previously reported to be associated with serum creatinine and GFR [21]. The lead SNP rs757608 in C17orf82 on chromosome 17 was reported to be associated with height [24]. The other SNPs in C17orf82/TBX2 (rs9907379, rs8068318, and rs2079795) were also reported to be associated with height [24,25,32-34]. In this study, the relationship between the C17orf82/TBX2 genetic loci on chromosome 17 and GFR was newly discovered, but this may be due to the relationship between height and GFR. Therefore, subsequent studies in other population groups need to reconfirm the association with GFR after adjusting for height.
The SNP rs2390793 in LRP2 on chromosome 2 was associated with GFR but not with serum creatinine levels. A genetic locus in LRP2 was previously reported to be associated with BUN, GFR, and proteinuria [21,35,36]. The SNP rs2390793 in LRP2 was previously reported to be associated with uric acid levels [37].
Uric acid was significantly associated with 84 SNPs in 14 genetic loci on chromosomes 4 (74 SNPs in 8 genetic loci) and 11 (10 SNPs in 6 genetic loci). Five genetic loci (2 in ABCG2, 1 in ABCG2/PKD2, 2 in SLC2A9) and 3 intergenic regions were associated with uric acid on chromosome 4.
In ABCG2 on chromosome 4, 9 SNPs were significant. ABCG2 (ATP-binding cassette subfamily G member 2) is a protein-coding gene on chromosome 4. Mutations in ABCG2 are known to be associated with hyperuricemia and the risk of gout [38-40]. The two lead SNPs rs2231142 and rs3114018 in ABCG2 on chromosome 4 were associated with hyperuricemia or gout [41-50]. In the PKD2 gene region, the SNPs rs2725220 and rs2725201 were associated with uric acid levels. PKD2 is a protein-coding gene at 4q22.1 that encodes a member of the polycystin protein family. An association of the PKD2 gene with uric acid levels has been reported previously [51]. SLC2A9 (solute carrier family 2 member 9) is located at 4p16.1 and encodes a member of the SLC2A facilitative glucose transporter family. The SLC2A9 gene was previously reported to be associated with uric acid levels and CKD progression [52-62]. The two lead SNPs rs3775948 and rs13129697 in SLC2A9 on chromosome 4 were associated with hyperuricemia or gout [4,41,46,48-50,63-71]. The 3 major SNPs (rs6839820, rs59420943, and rs6823778) in the intergenic regions of chromosome 4 were also significantly related to uric acid levels in this study; these findings are novel, as they have not been previously reported.
Despite the clinical significance of this study, some limitations exist. Relatively few patients with advanced kidney disease (N = 47 with a GFR below 60 mL/min/1.73 m²) were enrolled because the health screening participants were relatively healthy. In addition, we could adjust only for age, sex, diabetes mellitus, hypertension, and BMI as covariates in the analysis of genetic variants associated with kidney disease-related traits, although many other variables could affect these traits.
In conclusion, we found 20 genetic loci and 101 SNPs that were associated with the kidney disease-related traits serum creatinine, BUN, GFR, and uric acid in the Korean population. The SNP rs11710227 on chromosome 3 (intergenic region), which was associated with BUN, is a novel SNP that has not been reported to be related to any specific phenotype. We also found six novel genetic loci (3 intergenic regions on chromosome 4, and OTUB1, RPS6KA4, and SLC22A9 on chromosome 11) associated with uric acid. Studies on genetic mutations have identified genetic risk factors for kidney disease. In addition to clinical findings, such as the degree of proteinuria and kidney biopsy results, results of genetic analysis may be used as risk factors for CKD progression. Because genetic impacts may vary from population to population, additional validation is needed to confirm whether these findings are similar in other populations.

Supporting information
S1 Table. Description of genotyping quality control procedure and analysis workflow.
S2 Table. Results of genotyping quality control including minor allele frequency, Hardy-Weinberg equilibrium, and missing rate for blood urea nitrogen.
S3 Table. Results of genotyping quality control including minor allele frequency, Hardy-Weinberg equilibrium, and missing rate for serum creatinine. (DOCX)
S4 Table. Results of genotyping quality control including minor allele frequency, Hardy-Weinberg equilibrium, and missing rate for glomerular filtration rate. (DOCX)
S5 Table. Results of genotyping quality control including minor allele frequency, Hardy-Weinberg equilibrium, and missing rate for uric acid.