A Genome-Wide Association Study Identifies Susceptibility Variants for Type 2 Diabetes in Han Chinese

To investigate the underlying mechanisms of type 2 diabetes (T2D) pathogenesis, we searched for susceptibility genes that increase the risk of T2D in a Han Chinese population. A two-stage genome-wide association (GWA) study was conducted, in which 995 patients and 894 controls were genotyped using the Illumina HumanHap550-Duo BeadChip for the first genome scan stage. The findings were then tested for replication in 1,803 patients and 1,473 controls in stage 2. We found two loci not previously associated with diabetes susceptibility in and around the genes protein tyrosine phosphatase receptor type D (PTPRD) (P = 8.54×10⁻¹⁰; odds ratio [OR] = 1.57; 95% confidence interval [CI] = 1.36–1.82), and serine racemase (SRR) (P = 3.06×10⁻⁹; OR = 1.28; 95% CI = 1.18–1.39). We also confirmed that variants in KCNQ1 were associated with T2D risk, with the strongest signal at rs2237895 (P = 9.65×10⁻¹⁰; OR = 1.29, 95% CI = 1.19–1.40). By identifying two novel genetic susceptibility loci in a Han Chinese population and confirming the involvement of KCNQ1, which was previously reported to be associated with T2D in populations of Japanese and European descent, our results may lead to a better understanding of differences in the molecular pathogenesis of T2D among various populations.


Introduction
Type 2 diabetes (T2D) affects at least 6% of the world's population; the worldwide prevalence is expected to double by 2025 [1]. T2D is a complex disorder that is characterized by hyperglycemia, which results from impaired pancreatic β-cell function, decreased insulin action at target tissues, and increased glucose output by the liver [2]. Both genetic and environmental factors contribute to the pathogenesis of T2D. The disease is considered to be a polygenic disorder in which each genetic variant confers a partial and additive effect. Only 5%-10% of T2D cases are due to single gene defects; these include maturity-onset diabetes of the young (MODY), insulin resistance syndromes, mitochondrial diabetes, and neonatal diabetes [3][4][5]. Inherited variations have been identified from studies of monogenic diabetes, and have provided insights into β-cell physiology, insulin release, and the action of insulin on target cells [6].
To date, a GWA scan for T2D has not been conducted in the Han Chinese population, although the association of some known loci has been confirmed, including KCNQ1, CDKAL1, CDKN2A-2B, MTNR1B, TCF7L2, HNF1B, and KCNJ11 [42][43][44][45][46][47]. Therefore, we conducted a two-stage GWA scan for T2D in a Han Chinese population residing in Taiwan. There were a total of 2,798 cases and 2,367 normal controls (995 cases and 894 controls in stage 1; 1,803 cases and 1,473 controls in stage 2). Our objective was to identify new diabetes susceptibility loci associated with increased risk of T2D in a Han Chinese population.

Author Summary
Type 2 diabetes (T2D) is a complex disease that involves many genes and environmental factors. Genome-wide and candidate-gene association studies have thus far identified at least 19 regions containing genes that may confer a risk for T2D. However, most of these studies were conducted with patients of European descent. We studied Chinese patients with T2D and identified two genes, PTPRD and SRR, that were not previously known to be involved in diabetes and are involved in biological pathways different from those implicated in T2D by previous association reports. PTPRD is a protein tyrosine phosphatase and may affect insulin signaling on its target cells. SRR encodes a serine racemase that synthesizes D-serine from L-serine. Both D-serine (a co-agonist) and the neurotransmitter glutamate bind to NMDA receptors and trigger excitatory neurotransmission in the brain. Glutamate signaling also regulates insulin and glucagon secretion in pancreatic islets. Thus, SRR and D-serine, in addition to regulating insulin and glucagon secretion, may play a role in the etiology of T2D. Our study suggests that, in different patient populations, different genes may confer risks for diabetes. Our findings may lead to a better understanding of the molecular pathogenesis of T2D.

Association analysis
We conducted a two-stage GWAS to identify genetic variants for T2D in the Han Chinese residing in Taiwan. In the first stage, an exploratory genome-wide scan, we genotyped 995 T2D cases and 894 population controls using the Illumina Hap550duov3 chip (Figure 1 and Table S1). For each sample genotyped in this study, the average call rate was 99.92% ± 0.12%. After applying stringent quality control criteria, high-quality genotypes for 516,737 SNPs (92.24%) were obtained (average call rate in Table S2). The results of principal component analysis in stage 1 revealed no evidence for population stratification between T2D cases and controls (P = 0.111; Fst between populations < 0.001) (Text S1; Figure S1). Multidimensional scaling analysis using PLINK [48] produced similar results (Text S1; Figure S2). Furthermore, genomic control (GC) with a variance inflation factor λ = 1.078 (trend test) did not substantially change the results of this GWAS (Table S3).
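The genomic control step mentioned above can be illustrated with a minimal sketch. It assumes the common definition of the inflation factor λ as the median of the observed 1-df chi-square statistics divided by the theoretical median (about 0.455); the text does not state which estimator was used, and the function names and input values below are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Estimate lambda as the observed median over the expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjust(chi2_stats, lam=None):
    """Divide each statistic by lambda (never deflating when lambda < 1)."""
    if lam is None:
        lam = inflation_factor(chi2_stats)
    lam = max(lam, 1.0)
    return [x / lam for x in chi2_stats]


# Hypothetical trend-test statistics, for illustration only.
example_stats = [0.10, 0.35, 0.49, 0.80, 2.10, 5.60, 30.0]
lam = inflation_factor(example_stats)
adjusted = gc_adjust(example_stats, lam)
print(f"lambda = {lam:.3f}")
```

With a λ as close to 1 as the reported 1.078, each statistic shrinks by about 7%, which is consistent with the statement that the correction did not substantially change the results.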
We selected eight SNPs in seven regions: rs9985652 and rs2044844 on 4p13, rs7192960 on 16q23.1, rs7361808 on 20p13, rs1751960 on 10q11.23, rs4845624 on 1q21.3, rs391300 on 17p13.3, and rs648538 on 13q12.3. These SNPs had association P values of <10⁻⁵ at stage 1 with any of the genotype, allele, trend, dominant, and recessive models, and were taken forward for cross-platform validation using Sequenom (Table 1; Table S3). For SNPs with weaker associations (P value between 10⁻⁴ and 10⁻⁵), we searched for novel susceptibility candidates for T2D as implicated by (1) gene function identified by a bioinformatics approach and (2) an animal model showing defects in glucose homeostasis caused by genes within the same subfamily. Therefore, we selected SNP rs17584499 (P = 2.4×10⁻⁵ under the best model) for further investigation. rs17584499 lies within protein tyrosine phosphatase receptor type D (PTPRD). We hypothesized that PTPRD might play a role in the regulation of insulin signaling, because knockout and/or transgenic mice for its subfamily members leukocyte common antigen-related (LAR) and protein tyrosine phosphatase sigma (PTPRS) exhibit defects in glucose homeostasis and insulin sensitivity [49][50][51].
We also evaluated the most significant SNP (rs231361) within KCNQ1, which was previously reported to be a diabetes susceptibility gene in a Japanese population, as well as in populations of Korean, Chinese, and European ancestry [28,29]. Together, these ten SNPs (the eight SNPs with association P < 10⁻⁵, rs17584499, and rs231361) were cross-platform validated and yielded consistent results using both Illumina and Sequenom. The concordance rate for stage 1 samples typed on the Illumina and Sequenom platforms was 99.1% ± 0.84% (Table S4). We took these ten SNPs and an additional 29 neighboring SNPs within the linkage disequilibrium (LD) block forward to replicate in 3,276 additional samples (stage 2; 1,803 cases and 1,473 controls). The average call rate for each sample was 96.13% ± 4.66%. After applying stringent quality control criteria, high-quality genotypes for 35 SNPs (89.7%) were obtained, with an average call rate of 98.96% ± 0.24% (Table S2). Of the ten SNPs selected in stage 1, only three SNPs still showed a strong association in the stage 2 analysis: rs17584499 in PTPRD at 9p24.1-p23, rs231359 in KCNQ1 at 11p15.5, and rs391300 in serine racemase (SRR) at 17p13.3 (Table 1). We were unable to replicate the association between T2D and the remaining seven SNPs in ATP8A1/GRXCR1, MAF/WWOX, SIRPA, LYZL1/SVIL, RORC/TMEM5, and KATNAL1 in the stage 2 analysis (Table 1). Joint analysis of stage 1 and stage 2 data yielded results consistent with stage 2. The most significant associations were found for rs391300, rs17584499, and rs231359 (Table 1; Figure 2). These associations remained significant after calculating P values using 10⁸ permutations of the disease state labels. Joint association analysis was performed with all of the 2,798 T2D cases and 2,367 controls; this could achieve a power of 0.85 to detect a disease allele with a frequency of 0.15 and an OR of 1.5, assuming a disease prevalence of 0.06, at a significance level of 0.05 (Table S5).
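Besides pooling the stage 1 and stage 2 samples, the Statistical analysis section notes that Fisher's method was applied to combine P values across stages. The following is a minimal sketch of that method, not the authors' implementation: the statistic −2 Σ ln p is referred to a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom. The function name and the example P values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent P values with Fisher's method.

    X = -2 * sum(ln p_i) follows chi-square with 2k df under the null.
    For even df, P(chi2_2k > x) = exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


# Hypothetical stage 1 and stage 2 P values for a single SNP.
combined = fisher_combine([1e-4, 1e-5])
print(f"combined P = {combined:.3e}")
```

Fisher's method uses only the per-stage P values, so it ignores effect direction; a pooled analysis of the genotypes, as in the joint analysis above, does not have that limitation.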

Identification of two novel T2D loci and confirmation of KCNQ1 association
Two previously unknown loci were detected in our joint analysis of GWAS data. The strongest new association signal was found for rs17584499 in intron 10 of PTPRD (P = 8.54×10⁻¹⁰; OR = 1.57, 95% CI = 1.36-1.82) (Figure 2). The second strongest signal was found with rs391300 (P = 3.06×10⁻⁹ [trend test]; OR = 1.28, 95% CI = 1.18-1.39). The nearby SNP rs4523957 also demonstrated a significant association (P = 1.44×10⁻⁸; OR = 1.27, 95% CI = 1.17-1.38). SNPs rs391300 and rs4523957 were in tight LD with one another (r² = 0.942 in HapMap HCB), and were located within the serine racemase gene (SRR). SNP rs231361, located in intron 11 of KCNQ1, had a less significant association with T2D, and was selected in stage 1 (P = 1.49×10⁻⁴ [trend test]; OR = 1.39, 95% CI = 1.17-1.64) (Table 1). We further genotyped eight additional SNPs within the same LD block from the HapMap Asian group data: rs231359 yielded a P value of 4.56×10⁻⁴ with a trend test (OR = 1.36, 95% CI = 1.14-1.61) (Figure 2). rs231361 and rs231359 were in strong LD with one another (r² = 1 in HapMap HCB), and were located approximately 164 kb upstream of SNP rs2237897, which was previously reported to be significantly associated with T2D in a Japanese population [28,29]. We took rs231361, rs231359, and neighboring SNPs within the LD block forward to replicate in stage 2. Joint analysis of stage 1 and stage 2 data revealed that rs231359 had an even stronger association with T2D than did rs231361 (rs231359: P = 3.43×10⁻⁸, OR = 1.33, 95% CI = 1.[...]). Additional SNPs that were reported to be significantly associated with T2D in a Japanese population were further genotyped [28,29]. The average call rate for each sample was 99.12% ± 7.21%. After applying stringent quality control criteria, we obtained high-quality genotypes with an average call rate of 99.16% ± 0.18% (Table S2). SNP rs2237895 showed the strongest association with T2D of all the genotyped SNPs in KCNQ1 (P = 9.65×10⁻¹⁰; OR = 1.29, 95% CI = 1.19-1.40) (Figure 2 and Figure S3; Table S6).
After conditioning on rs2237895, the association of rs231361 (or rs231359) was no longer statistically significant, which suggests that these SNPs reflect the same underlying biological effect (Table S7).
Subsequently, we sequenced all of the exons, intron-exon boundaries, and up to 1.2 kb of the promoter region of the KCNQ1 gene in 50 individuals with T2D, and identified 42 polymorphic variations, including one nonsynonymous P448R polymorphism and two novel SNPs with minor allele frequency >0.03. We then genotyped the two novel SNPs and one nonsynonymous polymorphism; however, none of these SNPs showed an association with T2D (Table S6).
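The associations in this section are reported as an odds ratio with a 95% confidence interval alongside a P value. As a rough consistency check, a Wald z statistic and two-sided P value can be back-calculated from the OR and its interval. This sketch assumes a symmetric interval on the log-odds scale; it is an approximation, not the trend test used in the study, and the function name is hypothetical.

```python
import math


def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-OR standard error, Wald z, and two-sided P value.

    Assumes the 95% CI was built as exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p


# Values reported for rs391300: OR = 1.28, 95% CI = 1.18-1.39.
se, z, p = wald_from_or_ci(1.28, 1.18, 1.39)
print(f"SE = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```

For rs391300 this gives a P value on the order of 10⁻⁹, in line with the reported trend-test value of 3.06×10⁻⁹; exact agreement is not expected because the published OR and CI are rounded.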

Discussion
Our GWAS for T2D in a Han Chinese population found two previously unreported susceptibility genes. All of the significant variants detected in our study showed modest effects, with an OR between 1.21 and 1.57. Two loci with less-significant associations in our primary scan (stage 1), PTPRD and KCNQ1, were selected for further replication; both showed compelling evidence of association in joint analysis. The susceptibility loci we identified in this study need to be further replicated in additional populations. Among the 18 loci previously reported to be associated with T2D (excluding KCNQ1), no SNP within or near the genes reached P < 10⁻⁵ under the allele, genotype, trend, dominant, or recessive models (Table S8; Figure S4). Three SNPs within CDKAL1, JAZF1, and HNF1B had the lowest P values, ranging from 5×10⁻⁴ to 10⁻⁵, among the 18 known loci (Table S8). No significant associations were found within these regions in our Han Chinese population.
The strongest new signal was observed for rs17584499 in PTPRD. The overall Fst among 11 HapMap groups for rs17584499 was estimated to be 0.068 [52], which indicated a significant difference in allele frequencies among the populations (P < 0.0001, chi-square test) (Table S9). PTPRD is widely expressed in tissues, including skeletal muscle and pancreas, and is most highly expressed in the brain. PTPRD-deficient mice exhibit impaired learning and memory, early growth retardation, neonatal mortality, and posture and motor defects [53]. Multiple mRNA isoforms are expressed by alternative splicing and/or alternative transcription start sites in a developmental and tissue-specific manner [54,55]. PTPRD belongs to the receptor type IIA (R2A) subfamily of protein tyrosine phosphatases (PTPs). The R2A PTP subfamily comprises LAR, PTPRS, and PTPRD. The R2A family has been implicated in neural development, cancer, and diabetes [56]. Although the complex phenotype including neurological defects seen in knockout mice could obscure the roles of these genes in glucose homeostasis, LAR- and PTPRS-deficient mice were demonstrated to have altered glucose homeostasis and insulin sensitivity [49][50][51]. Transgenic mice overexpressing LAR in skeletal muscle show whole-body insulin resistance [57]. Because R2A subfamily members are structurally very similar [54], PTPRD could play a role in T2D pathogenesis and should be further characterized.
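To make the Fst figure quoted above more concrete, the following sketch computes Wright's Fst for a biallelic SNP from per-population allele frequencies, assuming equally weighted populations. The estimator behind the reported value of 0.068 [52] may differ (for example, by correcting for sample size), and the allele frequencies used here are hypothetical, not the HapMap values.

```python
def wright_fst(allele_freqs):
    """Wright's Fst = (H_T - H_S) / H_T for one biallelic SNP.

    allele_freqs: frequency of the same allele in each population,
    with all populations weighted equally (an assumption of this sketch).
    """
    k = len(allele_freqs)
    p_bar = sum(allele_freqs) / k
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # allele fixed everywhere: no differentiation to measure
    h_sub = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / k
    return (h_total - h_sub) / h_total


# Hypothetical risk-allele frequencies in three populations.
print(f"Fst = {wright_fst([0.05, 0.15, 0.25]):.3f}")
```

Values near 0 indicate similar allele frequencies across populations, and values near 1 indicate nearly fixed differences.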
The second new association locus was found for rs391300 and rs4523957 in the biologically plausible candidate gene SRR. SRR encodes a serine racemase that synthesizes D-serine from L-serine [58,59]. D-serine is a physiological co-agonist of the N-methyl-D-aspartate (NMDA) class of glutamate receptors, the major excitatory neurotransmitter receptors mediating synaptic neurotransmission in the brain [60,61]. NMDA receptor activation requires binding of glutamate and D-serine, which plays a neuromodulatory role in NMDA receptor transmission, synaptic plasticity, cell migration, and neurotoxicity [62]. D-serine and SRR are also present in the pancreas [63]. Glutamate signaling functions in peripheral tissues, including the pancreas, and positively modulates secretion of both glucagon and insulin in pancreatic islets [64][65][66]. The nearby SNP rs216193 also showed significant association (P = 2.49×10⁻⁶); this SNP resides 3.8 kb upstream from SRR, within Smg-6 homolog, nonsense mediated mRNA decay factor (C. elegans) (SMG6). rs216193 was in tight LD with rs391300 (r² = 0.942 in HapMap HCB). Based on their biological functions and the association results, neither SMG6 nor any of the nearby genes TSR1, SGSM2, MNT, and METT10D were compelling candidates for association with T2D. However, SRR was significantly associated with T2D; thus, we suggest that dysregulation of D-serine could alter glutamate signaling and affect insulin or glucagon secretion in T2D pathogenesis.
Our GWAS revealed that KCNQ1, which was previously reported to be associated with T2D in several populations, was also associated with T2D in a Han Chinese population residing in Taiwan. KCNQ1 encodes the pore-forming α subunit of a voltage-gated K⁺ channel (KvLQT1), which is involved in repolarization of the action potential in cardiac muscle [83,84]. Mutations in KCNQ1 cause long QT syndrome [85,86] and familial atrial fibrillation [87]. KCNQ1 is widely expressed, including in the heart, brain, kidney, liver, intestine, and pancreas [88][89][90]. It is also expressed in pancreatic islets, and blockade of the KvLQT1 channel stimulates insulin secretion in insulin-secreting INS-1 cells [91]. KCNQ1 knockout mice have cardiac dysfunctions [88,92] and enhanced systemic insulin sensitivity [93]. In our study, variants in the coding region did not show an association with T2D. The functional variant(s) could be located in the regulatory element of KCNQ1, rather than in the coding region. We did not find an association between either CDKAL1 or IGF2BP2 and T2D, in contrast with the results described in a previous study [29], nor did we find T2D associated with various other genes identified in populations of European descent.
In conclusion, we identified two previously unknown loci that are associated with T2D in a Han Chinese population, and confirmed the reported association of KCNQ1 with T2D. The novel T2D risk loci may involve genes that are implicated in insulin sensitivity and control of glucagon and insulin secretion: PTPRD may participate in the regulation of insulin action on its target cells, while SRR variants may alter glutamate signaling in the pancreas, thus regulating insulin and/or glucagon secretion. Our study suggests that in different patient populations, different genes may confer risks for diabetes, which may lead to a better understanding of the molecular pathogenesis of T2D.

Ethical statement
The study was approved by the institutional review board and the ethics committee of each institution. Written informed consent was obtained from each participant in accordance with institutional requirements and the Declaration of Helsinki Principles.

Subject participants
A total of 2,798 unrelated individuals with T2D, aged >20 years, were recruited from China Medical University Hospital (CMUH), Taichung, Taiwan; Chia-Yi Christian Hospital (CYCH), Chia-Yi, Taiwan; and National Taiwan University Hospital (NTU), Taipei, Taiwan. All of the T2D cases were diagnosed according to medical records and fasting plasma glucose levels using American Diabetes Association criteria. Subjects with type 1 diabetes, gestational diabetes, and maturity-onset diabetes of the young (MODY) were excluded from this study. For the two-stage GWAS, we genotyped 995 T2D cases and 894 controls in the first exploratory genome-wide scan (stage 1). In the replication stage (stage 2), we genotyped selected SNPs in additional samples from 1,803 T2D cases and 1,473 controls. The controls were randomly selected from the Taiwan Han Chinese Cell and Genome Bank [94]. The criteria for controls in the association study were (1) no past diagnostic history of T2D, (2) HbA1c ranging from 3.4 to 6, and (3) BMI < 32. The two control groups were comparable with respect to BMI, gender, age at study, and level of HbA1c. All of the participating T2D cases and controls were of Han Chinese origin, which is the origin of 98% of the Taiwan population. Details of demographic data are shown in Table S10.

Genotyping
Genomic DNA was extracted from peripheral blood using the Puregene DNA isolation kit (Gentra Systems, Minneapolis, MN, USA). In stage 1, whole genome genotyping using the Illumina HumanHap550-Duo BeadChip was performed by deCODE Genetics (Reykjavík, Iceland). Genotype calling was performed using the standard procedure implemented in BeadStudio (Illumina, Inc., San Diego, CA, USA), with the default parameters suggested by the platform manufacturer. Quality control of genotype data was performed by examining several summary statistics. First, the ratio of loci with heterozygous calls on the X chromosome was calculated to double-check the subject's gender. Total successful call rate and the minor allele frequency of cases and controls were also calculated for each SNP. SNPs were excluded if they: (1) were nonpolymorphic in both cases and controls, (2) had a total call rate <95% in the cases and controls combined, (3) had a minor allele frequency <5% and a total call rate <99% in the cases and controls combined, or (4) had significant distortion from Hardy-Weinberg equilibrium in the controls (P < 10⁻⁷). Genotyping validation was performed using the Sequenom iPLEX assay (Sequenom MassARRAY system; Sequenom, San Diego, CA, USA). In the replication stage (stage 2), SNPs showing significant or suggestive associations with T2D and their neighboring SNPs within the same LD block were genotyped using the Sequenom iPLEX assay. The neighboring SNPs in the same LD block were selected from the HapMap Asian (CHB + JPT) group data for fine mapping the significant signal.
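The four SNP exclusion criteria listed above can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the per-SNP summary record and its field names are hypothetical, and the Hardy-Weinberg P value in controls is taken as a precomputed input rather than calculated here.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary statistics (rates as fractions)."""
    snp_id: str
    maf_cases: float
    maf_controls: float
    maf_combined: float
    call_rate: float        # total call rate, cases and controls combined
    hwe_p_controls: float   # Hardy-Weinberg test P value in controls


def exclusion_reasons(snp):
    """Return the list of QC criteria the SNP fails (empty if it passes)."""
    reasons = []
    if snp.maf_cases == 0.0 and snp.maf_controls == 0.0:
        reasons.append("nonpolymorphic")
    if snp.call_rate < 0.95:
        reasons.append("call rate < 95%")
    if snp.maf_combined < 0.05 and snp.call_rate < 0.99:
        reasons.append("MAF < 5% with call rate < 99%")
    if snp.hwe_p_controls < 1e-7:
        reasons.append("HWE P < 1e-7 in controls")
    return reasons


# Hypothetical SNPs, for illustration only.
snps = [
    SnpSummary("rs_example_1", 0.21, 0.18, 0.195, 0.998, 0.42),
    SnpSummary("rs_example_2", 0.03, 0.03, 0.030, 0.985, 0.60),
]
kept = [s.snp_id for s in snps if not exclusion_reasons(s)]
print(kept)
```

Returning the failed criteria rather than a bare boolean makes it straightforward to tabulate how many SNPs each rule removes.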

Statistical analysis
T2D association analysis was carried out to compare allele frequency and genotype distribution between cases and controls using five single-point methods for each SNP: genotype, allele, trend (Cochran-Armitage test), dominant, and recessive models. The most significant test statistic obtained from the five models was chosen. SNPs with P values less than α = 2×10⁻⁸, a Bonferroni-corrected cut-off for multiple comparisons, were considered to be significantly associated with the trait. The joint analysis was conducted by combining the data from the stage 1 and 2 samples. We also applied Fisher's method to combine P values for joint analysis. The permutation test was carried out genome-wide for 10⁶ permutations, in which the phenotypes of subjects were randomly rearranged. For better estimation of empirical P values, the top SNPs were reexamined using 10⁸ permutations. Each permutation proceeded as follows: (1) the case and control labels were shuffled and redistributed to subjects, and (2) the test statistic of the corresponding association test was calculated based on the shuffled labels. The empirical P value was defined as the number of permutations with a test statistic at least as extreme as the original, divided by the total number of permutations. Detection of possible population stratification that might influence association analysis was carried out using principal component analysis, multidimensional scaling analysis, and genomic control (Text S1). Quantile-quantile (Q-Q) plots were then used to examine P value distributions (Figure 3 and Figure S5).
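The trend test and the permutation procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the software used in the study: genotypes are coded 0/1/2 with additive scores, the statistic is the standard Cochran-Armitage chi-square (1 df), and the function names, toy data, and permutation count are hypothetical.

```python
import math
import random


def trend_chi2(genotypes, labels):
    """Cochran-Armitage trend chi-square with scores 0, 1, 2.

    genotypes: minor-allele counts per subject; labels: 1 = case, 0 = control.
    """
    n = len(genotypes)
    r = sum(labels)
    sum_wr = sum(g for g, y in zip(genotypes, labels) if y == 1)
    sum_wn = sum(genotypes)
    sum_w2n = sum(g * g for g in genotypes)
    denom = r * (n - r) * (n * sum_w2n - sum_wn ** 2)
    if denom == 0:
        return 0.0  # no cases, no controls, or no genotype variation
    return n * (n * sum_wr - r * sum_wn) ** 2 / denom


def asymptotic_p(chi2):
    """Upper-tail P value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def empirical_p(genotypes, labels, n_perm=1000, seed=0):
    """Share of label permutations at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = trend_chi2(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # step 1: redistribute case/control labels
        if trend_chi2(genotypes, shuffled) >= observed:  # step 2: recompute
            hits += 1
    return hits / n_perm


# Toy data: 60 cases (10/20/30 by genotype) and 60 controls (30/20/10).
toy_genotypes = [0] * 10 + [1] * 20 + [2] * 30 + [0] * 30 + [1] * 20 + [2] * 10
toy_labels = [1] * 60 + [0] * 60
toy_chi2 = trend_chi2(toy_genotypes, toy_labels)
print(toy_chi2, asymptotic_p(toy_chi2), empirical_p(toy_genotypes, toy_labels))
```

An empirical P value estimated this way cannot resolve values below 1/n_perm, which is why very small P values require a very large number of permutations, such as the 10⁸ used for the top SNPs.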