Familial Young-Onset Diabetes, Pre-Diabetes and Cardiovascular Disease Are Associated with Genetic Variants of DACH1 in Chinese

In Asia, young-onset type 2 diabetes (YOD) is characterized by obesity and increased risk for cardiovascular disease (CVD). In a genome-wide association study (GWAS) of 99 Chinese obese subjects with familial YOD diagnosed before 40 years of age and 101 controls, the T allele of rs1408888 in intron 1 of DACH1 (Dachshund homolog 1) was associated with an odds ratio (OR) of 2.49 (95% confidence interval: 1.57–3.96, P = 8.4 × 10⁻⁵). Amongst these subjects, we found reduced expression of DACH1 in peripheral blood mononuclear cells (PBMC) from 63 cases compared to 65 controls (P = 0.02). In a random cohort of 1468 cases and 1485 controls, amongst the top 19 SNPs from GWAS, rs1408888 was associated with type 2 diabetes with a global P value of 0.0176, with confirmation in a multiethnic Asian case-control cohort (7370/7802) with an OR of 1.07 (1.02–1.12, Pmeta = 0.012). In 599 Chinese non-diabetic subjects, rs1408888 was linearly associated with systolic blood pressure and insulin resistance. In a case-control cohort (n = 953/953), rs1408888 was associated with an OR of 1.54 (1.07–2.22, P = 0.019) for CVD in type 2 diabetes. In an autopsy series of 173 non-diabetic cases, the TT genotype of rs1408888 was associated with ORs of 3.31 (1.19–9.19, P = 0.0214) and 3.27 (1.25–11.07, P = 0.0184) for coronary heart disease (CHD) and coronary arteriosclerosis, respectively. Bioinformatics analysis revealed that rs1408888 lies within regulatory elements of DACH1 implicated in islet development and insulin secretion. The T allele of rs1408888 of DACH1 was associated with YOD, prediabetes and CVD in Chinese.


Introduction
Genome-wide association studies (GWAS) and their meta-analyses have discovered novel loci for T2D in European [1] and Asian populations with odds ratios (ORs) of 1.1-1.4 [2,3,4]. In Asia, the most rapid increase in diabetes occurs in the young-to-middle-aged group. Using Hong Kong Chinese as an example, 20% of people with diabetes were diagnosed before the age of 40 years. In these young patients, less than 10% had type 1 presentation and 15% had monogenic diabetes. In the remaining patients, family history, obesity and premature cardiovascular disease (CVD) were prominent features [5,6,7]. In the Hong Kong Family Diabetes Study (HKFDS), which recruited family members of patients with young-onset diabetes (YOD), we reported strong heritability of diabetes and obesity [8] with co-linkage of related traits to multiple chromosomal regions including chromosome 1q [9]. However, the genetic basis of this form of YOD has not been studied.
Epidemiological analysis has confirmed the clustering of metabolic syndrome, insulin resistance, diabetes and CVD [10,11], which may share common genetic, environmental or lifestyle factors [1]. In a meta-analysis of 3 GWAS conducted in Chinese from Hong Kong and Shanghai, we discovered rs10229583 located in 7q32 near PAX4, which was subsequently confirmed in Asian and Caucasian populations [12]. In the first discovery cohort of this meta-analysis, consisting of 99 Hong Kong Chinese patients with obesity and YOD diagnosed before 40 years of age with at least 1 affected first-degree relative, the T allele of rs1408888 in intron 1 of DACH1 (Dachshund homolog 1), with P < 10⁻⁵, was replicated in a multiethnic Asian case-control cohort, albeit not significant in Caucasians.
Dachshund (DAC) in Drosophila is the homolog of human DACH1, a highly conserved transcription factor implicated in developmental biology [13,14]. In a recent report, DAC was found to interact physically with PAX6 to control insulin expression [14]. In light of these findings, we revisited the risk association of rs1408888 of DACH1 and explored whether this genetic variant might be associated with YOD and related traits in Chinese populations. To test this hypothesis, we examined the differential expression of DACH1 in peripheral blood mononuclear cells (PBMC) in patients with YOD and tested the genetic associations of rs1408888 in multiple case-control cohorts, followed by bioinformatic analysis. We found reduced expression of DACH1 in PBMC from subjects with YOD and association of the T allele of rs1408888 with YOD, prediabetes and CVD. This risk variant is located in the vicinity of conserved non-coding elements (CNE) of DACH1 associated with multiple consensus transcription factor binding sites and chromatin modification sites, suggesting possible regulatory functions. These findings suggest that genetic variants of DACH1 may be implicated in abnormal islet biology resulting in prediabetes, YOD and CVD in Chinese populations.

Associations with familial young-onset T2D in GWAS
In the GWAS of 99 obese Chinese subjects with familial YOD and 101 controls, 425,513 of 541,891 autosomal SNPs passed quality control, with no population stratification detected by multidimensional scaling analysis and after adjustment by genomic control (GC). From stage 1 GWAS, 24 unique loci with the lowest P value in the dataset (P < 10⁻⁴) were taken forward for replication in 1468 cases and 1485 controls. Of these, 19 SNPs passed the quality control criteria and 2 SNPs (rs1408888 and rs1449675) remained significantly associated with T2D. In the combined analysis (1567 cases, 1586 controls), three more SNPs (rs6595551 in ZNF608, rs987105 in MUT, and rs1413119 in an intergenic region on chromosome 13) showed nominal associations (P < 0.05). Among these five SNPs, rs1408888 in DACH1 had the highest OR [1.21 (1.08-1.35); P = 9.1 × 10⁻⁴], which remained significant (P = 0.0176) after correction for multiple testing of the 19 SNPs using 10,000 permutations. In a meta-analysis of 5 Asian case-control cohorts consisting of 7370 cases and 7802 controls, we obtained a combined OR of 1.07 (1.02-1.12, P = 0.0112) with no heterogeneity [P = 0.107 in Cochran's Q test and I² = 44.8% (0.0%-78.1%)] (Table 2).

Associations with quantitative traits in healthy adults
In 599 healthy adults and after adjustment for age and gender, systolic blood pressure (BP), Homeostasis Model Assessment index for insulin resistance (HOMA-IR) and β cell function (HOMA-β) and fasting insulin were associated with increasing number of alleles. Using multivariate analysis, the T-allele of DACH1 rs1408888 was associated with systolic BP

Clinical and pathological association with cardiovascular disease (CVD)
In the matched case-control cohort, Chinese T2D patients with CVD were more obese and had worse dyslipidaemia and renal function than those without CVD (Table 4). The TT/TG genotype was associated with an OR of 1.51 (P = 0.02), which remained significant after adjusting for estimated glomerular filtration rate (eGFR) with an OR of 1.54 (1.07-2.22, P = 0.019). In an autopsy series of 173 non-diabetic cases, rs1408888 did not depart from Hardy-Weinberg Equilibrium (HWE). Compared to cases with TG/GG genotype (n = 83, mean ± SD age: 70.6 ± 15.7 years, 43% female), cases with TT genotype (n = 90, age: 67.0 ± 15.7 years, 38% female) were more likely to have a history of coronary heart disease (CHD) (17.8% versus 7.2%, P = 0.0375) and coronary arteriosclerosis (16.7% versus 6.0%, P = 0.0287). The respective ORs were 3.31 (1.19-9.19, P = 0.0214) and 3.27 (1.25-11.07, P = 0.0184) after adjustment for age and sex.
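As a rough illustration of how an odds ratio and its 95% confidence interval are obtained from a 2 × 2 table, the sketch below computes crude (unadjusted) ORs for the autopsy series. The counts are an assumption: they are back-calculated from the reported percentages (17.8% and 16.7% of 90 TT cases; 7.2% and 6.0% of 83 TG/GG cases) and are not taken from the source data. Because no adjustment for age and sex is made, the crude estimates are expected to differ from the adjusted ORs reported above. The function name is hypothetical.

```python
import math

def crude_odds_ratio(exposed_events, exposed_total,
                     unexposed_events, unexposed_total, z=1.96):
    """Crude OR with a Woolf (log-scale) confidence interval."""
    a = exposed_events                      # TT with the outcome
    b = exposed_total - exposed_events      # TT without the outcome
    c = unexposed_events                    # TG/GG with the outcome
    d = unexposed_total - unexposed_events  # TG/GG without the outcome
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts back-calculated from the reported percentages (assumption).
chd = crude_odds_ratio(16, 90, 6, 83)               # 17.8% vs 7.2%
arteriosclerosis = crude_odds_ratio(15, 90, 5, 83)  # 16.7% vs 6.0%

for label, (or_, lo, hi) in [("CHD", chd),
                             ("Coronary arteriosclerosis", arteriosclerosis)]:
    print(f"{label}: crude OR {or_:.2f} ({lo:.2f}-{hi:.2f})")
```

The adjusted estimates in the text come from logistic regression with age and sex as covariates, which this sketch does not attempt to reproduce.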

Bioinformatics analysis and expression study
Two neighboring SNPs in weak linkage disequilibrium (LD) (r² < 0.5) with rs1408888 (rs9572813 and rs17791181) also showed nominal association with T2D (P = 0.01-0.001) in the GWAS analysis (Figure 1). Bioinformatics analysis revealed that the region between rs1408888 and rs9572813 overlapped with a regulatory element conserved from fugu fish to human [15]. On data mining, this element [OREG0002711 (http://www.oreganno.org/oregano/) or chr13:72,425,787-72,428,335 (hg19) (http://enhancer.lbl.gov/frnt_page_n.shtml)] shows enhancer activity which directs the distinct expression of a β-galactosidase reporter gene in the eye, cranial nerve, forebrain, hindbrain and neural tube in mouse embryos [15,16]. In this region, another non-coding element (CNE803) [17], highly conserved in vertebrates, shows homology to an EST from the human eye (BY797940) (Figure 1). We sequenced the region between rs1408888 and rs9572813 in 74 controls and 82 cases who had GWAS data and did not discover novel variants in CNE803 or in the surrounding regions. In this genomic region containing multiple SNPs, 3 SNPs (rs17252745, rs17252752 and rs57143718) with marked inter-ethnic differences in the NCBI SNP database (
Expression of DACH1 was detected in PBMC and PPC. Using quantitative real-time PCR, lower DACH1 expression was found in PBMC from patients with YOD compared to control subjects (P = 0.02 for regression with age and sex adjustment, P = 0.005 for Wilcoxon rank sum test) (Figure 2).

Discussion
In an adequately powered case-control cohort of familial YOD characterized by obesity, we discovered several common SNPs (allele frequency > 0.3) with ORs of 2-2.5, including the T allele of rs1408888 of DACH1 with allele frequency of 0.75 and an OR of 2.49. In a subset of this cohort, we found reduced expression of DACH1 in PBMC in patients with YOD. In the stage 2 experiment, we successfully genotyped 19 SNPs with the lowest P value and replicated the association of rs1408888 with an OR of 1.21 and a global P < 0.05 in a random case-control cohort of 2953 Chinese with older age of diagnosis. This was followed by confirmation in 15,172 Asian subjects with an OR of 1.07. We further demonstrated that the T allele was associated with insulin resistance and high BP in normal subjects, CVD in patients with T2D, and coronary arteriosclerosis in autopsy samples. Bioinformatics analysis revealed its location in a conserved region within the intron of DACH1, implicated in pancreatic islet development [13]. Unlike the novel SNP rs10229583 at 7q32 near PAX4 discovered in a meta-analysis of 3 GWAS comprising 684 T2D patients and 955 controls of Southern Han Chinese descent with confirmation in a multi-ethnic population (Pmeta = 2.3 × 10⁻¹⁰ in East Asians; P = 8.6 × 10⁻³ in Caucasians) [12], the risk association of rs1408888 with T2D was increasingly attenuated in a multi-ethnic population, suggesting that this SNP might be more relevant to Asians undergoing rapid transition characterized by obesity and young age of diagnosis [6].
Known function of DACH1
DACH1, located on chromosome 13q21, is the mammalian homologue of the Drosophila dachshund (Dac) gene. It encodes a well-conserved nuclear protein capable of binding to DNA and has two domains (DachBox-N and DachBox-C) that are highly conserved from Drosophila to humans [19,20]. It is a key component of the retinal determination gene network that governs cell fate and plays a key role in ocular, limb, brain and gonadal development [21]. DACH1 knockout mice die shortly after birth, with no gross histological abnormalities observed in eyes, limbs, or brain tissues, suggesting its possible role in perinatal development [22]. Hence, DACH1 is a critical transcription factor in tissue differentiation and organ development. One of its gene targets is the mothers against decapentaplegic homolog 4 (SMAD4), which upon binding with DACH1 can result in repression of TGF-β signaling and TGF-β-induced apoptosis [23]. In this regard, increased TGF-β1 activity has been implicated in heart and vascular development, hypertension and progressive myocardial fibrosis [24,25]. In a recent proof-of-concept analysis, researchers merged co-expression and interaction networks and detected/inferred novel networks to explain the frequent but not invariable coexistence of diabetes and other dysfunctions. One of these networks included regulation of TGFBRII, which facilitated oxidative stress with expression of early transcription genes via the MAPK pathway, leading to cardiovascular-renal complications. The second network proposed the interaction of beta-catenin with CDH5 and TGFBR1 through Smad molecules to contribute to endothelial dysfunction [26], often a precursor of CVD [27].

DACH1, T2D and Cardiovascular-renal disease
In the Wellcome Trust Case Control Consortium [28], Diabetes Genetics Initiative [29] and a recent Chinese GWAS [3], DACH1 was among the list of genes with nominal association (P < 0.05) with T2D. In the Emerging Risk Factor Collaboration Study, diabetes was associated with multiple morbidities including cardiovascular and renal disease [30]. In a recent meta-analysis of GWAS data in 67,093 individuals of European ancestry, DACH1 was one of the susceptibility loci for reduced renal function [31]. Although we did not find association between rs1408888 and renal function in our study, these consistent findings highlight the possible role of DACH1 in regulating functions of multiple organs. In support of these genetic associations, in a mouse model of diet-induced β-cell dysfunction, islet DACH1 gene expression was reduced in prediabetic animals fed a high-fat diet [32]. In both zebrafish and mice, loss of DACH1 resulted in reduced numbers of all islet cell types, including β-cells [13]. Although deletion of DACH1 in mice did not affect the number of PPC, it blocked the perinatal burst of proliferation of differentiated β-cells [13]. In Drosophila, there was strong expression of Dac, the homolog of DACH1/2, in insulin-producing cells, with Dac interacting physically with the Pax6 homolog Eyeless (Ey) to promote expression of insulin-like peptides. In a similar vein, the mammalian homolog of Dac, DACH1/2, also facilitated the promoting action of Pax6 on the expression of islet hormone genes in cultured mammalian cells [14].
Given the strong links between T2D and CVD, the association of TT/TG genotype of DACH1 with CVD with an odds ratio of 1.54, after adjustment for age, sex, disease duration and eGFR was noteworthy. Patients with CVD were more obese and had more risk factors including high BP. Interestingly, in normal subjects, the T allele was linearly associated with BP and insulin levels which are well known risk factors for CVD [11]. In the autopsy series, TT carriers had 2-3 fold increased risk of coronary arteriosclerosis and CHD. These consistent findings in independent cohorts at different stages of the spectrum of cardio-metabolic disease, together with experimental studies from other groups, strongly support the role of DACH1 in these complex diseases.  The complexity of human evolution and natural selection by external forces, including but not limited to temperature, foods, infections, can result in diversity of genomic architecture and expression, making replication of genetic association of complex diseases challenging [33]. Given the rapid westernization of Hong Kong Chinese within less than a century, we hypothesize that DACH1 may be a thrifty gene which regulates growth to improve survival chances during time of hardship but increases risk of obesity, prediabetes, YOD and CVD during time of affluence [34]. Hitherto, apart from maturity onset diabetes of the young [35], the genetics of familial YOD characterized by obesity have not been well studied and these results might motivate further research in subjects with these phenotypes to confirm or refute our hypothesis.

Possible significance of rs1408888 of DACH1
The risk variant rs1408888 is located in the first intron of DACH1 in the vicinity of conserved elements [16], which can direct a unique gene expression pattern resembling the embryonic expression pattern of DACH1 [22]. One of these elements, CNE803, located 1.6 kb from rs1408888 (Figure 1), showed sequence homology to an EST from an eye library (BY797940). We sequenced this region in the original GWAS cohort but did not find any novel SNPs. Three SNPs in this region which were common in Chinese but rare in Caucasians showed nominal associations with T2D in the discovery cohort, with rs57143718 replicated in an expanded case-control cohort of YOD. Using PPC, we were unable to detect expression of CNE803 but found multiple DACH1 isoforms (data not shown). On bioinformatics analysis, rs1408888 is located in a region with multiple consensus transcription factor binding sites and closely associated with open and active chromatin (Table S1), suggesting that this region may regulate DACH1 expression. In support of these predictions, we found reduced expression of DACH1 in PBMC in patients with YOD. Although there is no direct link between rs1408888 or rs57143718 and the expression level of DACH1 in PBMC, the reduced expression of DACH1 in YOD patients supports the importance of DACH1 and agrees with the bioinformatic analysis of the region surrounding rs1408888. The functional significance of this SNP/locus requires further exploration.
Table 5. Clinical profiles of discovery, replication and validation cohorts in a 3-stage genome wide association study in Asian populations.

Limitations and conclusion
In this multi-staged experiment, we have discovered risk association of an intronic SNP (rs1408888) of DACH1 with YOD, BP, insulin resistance and CVD in Chinese populations. Although no significant association between rs1408888 and the YOD subgroup in the stage 1 replication cohort was found (P > 0.1), possibly due to small sample size, the age of diagnosis was relatively young in our discovery (31.8 ± 7.7 years) and replication cohorts for T2D (44.0 ± 13.6 years) and that for CVD (54.3 ± 12.7 years) (Tables 4 and 5) compared to most GWAS. Given that these findings were mainly found in Asian populations where YOD is a predominant feature, our findings highlight the need for more genetic research in YOD. Together with the known function of DACH1 in developmental biology and regulation of insulin secretion and its reduced expression in human PBMC associated with YOD, these findings add to the growing body of knowledge regarding the candidate role of DACH1 in cardio-metabolic dysfunction, which may manifest as YOD in populations undergoing rapid transition in nutrition and lifestyles.

Risk association with diabetes using multiple cohorts
The clinical characteristics of the discovery and replication cohorts of the Asian population as well as methods of genotyping and genetic analysis have been reported [12]. In brief, the discovery cohort (stage 1) consisted of 99 obese Chinese subjects (BMI ≥ 27 kg/m² or waist ≥ 90 cm in men or ≥ 80 cm in women) with YOD (age of diagnosis < 40 years) and at least one affected first-degree relative selected from the HKFDS [8,9] and the Hong Kong Diabetes Registry (HKDR) [36]. The 101 age- and sex-matched control subjects were selected from a community-based health promotion program [37]. The replication cohort (stage 2) consisted of 1468 T2D subjects selected from the HKDR [36] and unrelated subjects from the HKFDS [8], while the control cohort consisted of 507 healthy volunteers [37] and 978 adolescents [38]. The validation cohorts (stage 3) consisted of 1892 cases and 1808 controls from Shanghai [39]; 749 cases and 616 controls from Korea [40]; 2804 cases (2010 Chinese, 794 Malay) and 2185 controls (1945 Chinese, 1240 Malays) from Singapore [41] and 471 cases and 582 controls from Japan [40] (Table 5).

Quantitative traits, CVD and clinico-pathological features
All subjects in the Hong Kong control cohort had documentation of anthropometric indexes, BP, cardiovascular risk factors, plasma insulin and glucose during a 75-gram oral glucose tolerance test [42]. The case-control cohort was selected from the HKDR set up in 1995 as part of a quality improvement program using structured protocols [36]. Using this cohort, we selected a case-control cohort of CVD (953/953) matched for age, sex and disease duration. Definitions of CVD (including ischaemic heart disease, stroke and peripheral vascular diseases) were based on clinical assessments and the International Classification of Disease 9th version. In an autopsy series with documentation of pathological features and clinical history [43,44], we extracted genomic DNA from archived paraffin blocks using white blood cell-concentrated spleen tissues for genotyping. Figure 3 summarizes the selection criteria and study flow.
Replication cohort. In the second stage, 24 SNPs with the lowest P value (P < 1 × 10⁻⁴ in allelic test) were genotyped in the replication cohort. Only one SNP was genotyped for each locus with multiple SNPs in high LD (r² > 0.6). Genotyping was performed at the McGill University and Genome Quebec Innovation Centre using primer extension of multiplex products with detection by MALDI-TOF mass spectroscopy on a Sequenom MassARRAY platform (San Diego, CA, USA). Of 24 genotyped SNPs, 5 were excluded due to low call rate (< 90%). All remaining 19 SNPs were in HWE in controls with a concordance rate of 96% in 65 blinded duplicate samples.
Validation cohorts. rs1408888, which showed significance in stage 1, stage 2 and combined cohorts, was genotyped in the Asian populations using the following methods: 1) Shanghai Chinese: MassARRAY platform (MassARRAY Compact Analyzer, Sequenom, San Diego, CA, USA) with 97.5% call rate and 100% concordance rate; 2) Korea: Assay-on-Demand TaqMan assays (Applied Biosystems, Foster City, CA, USA) and ABI PRISM 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA, USA) with 99.4% call rate and 100% concordance rate based on 13 duplicates; 3) Japan: TaqMan SNP genotyping system (Applied Biosystems, Foster City, CA) and ABI PRISM 7700 system with 20% of samples directly sequenced using Sanger sequencing and analyzed with an ABI 3100 capillary sequencer with 100% concordance rate. For the Singapore study, the Chinese samples were genotyped on the 610Quad and 1Mduov3 platforms while the Malay samples were genotyped on the Illumina HumanHap 6100Quad. We removed SNPs with call rate < 95%, or departure from HWE (P < 0.0001), or which were monomorphic.
Autopsy series. DNA was obtained using a modified DNA extraction protocol [43] for genotyping using a TaqMan kit from ABI and an ABI 7900HT Fast Real-Time PCR System.

Detection of DACH1 and CNEs expression by reverse-transcription PCR
Expression of CNE803 and DACH1 transcripts in various cell types was detected by RT-PCR. Expression of the CNE (220 bp) was detected by primers 5′-TAATACCATTGCCCCAAGGA-3′ and 5′-TTTGGATTTCAGCCTTGTCA-3′. Expression of DACH1 was detected using 5′-CTGCACCAACGCAAGTTCTA-3′ and 5′-ATAAGCCCATCAGCATCTGG-3′ as primers. Expression of β-actin was used as a positive control using 5′-AGAGCTACGAGCTGCCTGAC-3′ and 5′-AGCACTGTGTTGGCGTACAG-3′ as primers. Expression of DACH1 in PBMC was quantified by real-time PCR using the SYBR Green method with 5′-GTGGAAAACACCCCTCAGAA-3′ and 5′-CGAAGTCCTTCCTGGAGATG-3′ as primers in an ABI 7900HT Real-Time PCR system. Expression level was normalized to the expression of β-actin for comparison using the ΔΔCt method. The result was analyzed by Wilcoxon rank sum test and regression analysis with sex and age adjustment.
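The ΔΔCt normalization mentioned above can be sketched as follows. This is a minimal illustration of the general method and not the authors' analysis script: the function name and the Ct values are hypothetical, and the calculation assumes roughly 100% amplification efficiency for both the target (DACH1) and the reference (β-actin) so that each cycle corresponds to a two-fold change.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene relative to a calibrator sample.

    Assumes ~100% PCR efficiency (a doubling per cycle) for both genes.
    """
    # Normalize the target to the reference gene within each sample.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Compare the sample to the calibrator (e.g. a control subject).
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values, for illustration only.
fold_change = relative_expression(
    ct_target=27.0, ct_reference=17.0,                       # sample
    ct_target_calibrator=26.0, ct_reference_calibrator=17.0  # calibrator
)
print(f"Relative DACH1 expression: {fold_change:.2f}")
```

A fold change below 1 indicates lower normalized expression in the sample than in the calibrator.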

Statistical analysis
We used PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/), Statistical Analysis Software v.9.1 (SAS Institute, Cary, NC, USA) or Statistical Package for Social Sciences for Windows v.15 (SPSS, Chicago, IL, USA) for all statistical analyses, unless specified otherwise. All data are presented as mean ± SD or median (interquartile range) unless specified. Categorical variables were compared using χ² test, Fisher's exact test and logistic regression, expressed as ORs and 95% CI as appropriate. In healthy controls, fasting plasma insulin, HOMA-IR and HOMA-β were logarithmically transformed due to skewed distributions. Genotype-phenotype associations were tested by multivariable linear regression adjusted for sex and age under the additive genetic model, expressed as β coefficients with 95% CI. Between-group comparisons were performed by χ² test, Student's t-test or Wilcoxon rank sum test as appropriate. We used logistic regression to examine genetic association with CVD in a case-control cohort matched for age, sex and disease duration with further adjustment for logarithm of eGFR, expressed as OR (95% CI). A two-tailed P value < 0.05 was considered significant.
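For readers unfamiliar with the HOMA indices, the sketch below shows how they are commonly derived before log transformation. It assumes the original HOMA1 equations (fasting insulin in µU/mL, fasting glucose in mmol/L) and a natural logarithm; the text above does not state which HOMA version or logarithm base was used, so both are assumptions, and the function names and input values are hypothetical.

```python
import math

def homa_ir(fasting_insulin_uU_ml, fasting_glucose_mmol_l):
    """HOMA1 insulin resistance index."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5

def homa_beta(fasting_insulin_uU_ml, fasting_glucose_mmol_l):
    """HOMA1 beta-cell function index (%)."""
    if fasting_glucose_mmol_l <= 3.5:
        raise ValueError("HOMA-beta is undefined for glucose <= 3.5 mmol/L")
    return 20.0 * fasting_insulin_uU_ml / (fasting_glucose_mmol_l - 3.5)

# Hypothetical fasting values for one subject.
insulin, glucose = 10.0, 5.5
ir = homa_ir(insulin, glucose)
beta = homa_beta(insulin, glucose)

# Log-transform the skewed traits before regression (natural log assumed).
log_ir, log_beta, log_insulin = math.log(ir), math.log(beta), math.log(insulin)
print(f"HOMA-IR {ir:.2f}, HOMA-beta {beta:.1f}, ln(HOMA-IR) {log_ir:.3f}")
```

The log-transformed values would then enter the additive-model linear regression as the dependent variable, with genotype coded as the number of T alleles (0, 1 or 2).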

Sample size estimation
Assuming an additive model with allele frequencies of 0.05-0.30, and an OR of 1.2-3.0 (for a prevalence of 0.1), we used the Genetic Power Calculator [45] to estimate the power for stage 1 (genome scan) and stage 2 (replication) at α levels of 1 × 10⁻⁴ and 0.05, respectively. For allele frequency > 0.2, a sample size of 200 had 90% power to detect an OR of 3 and 75% power for an OR of 2.5. For the replication cohort with a sample size of 3000, we had 90% power to confirm an OR of 1.2 for allele frequency > 0.2. For the risk association with CVD, for SNPs with allele frequency > 0.2, a sample size of 2000 had over 90% power to confirm an OR of 1.5.

GWAS and meta-analysis
Distributions of all genotypes were analyzed for deviation from HWE by χ² test with one degree of freedom. In the stage 1 experiment, we estimated possible familial relationship using estimates of identity-by-descent (IBD) derived from pair-wise analyses of 102,919 independent (r² ≈ 0), quality-controlled SNPs. We did not detect population stratification using multidimensional scaling analysis and the inflation factor λ for GC. GC [46] was applied to correct for relatedness of the subjects and adjust for potential population stratification. The inflation factor λ was estimated by taking the median of the distribution of the χ² statistic from 425,513 quality-controlled SNPs in allelic test, and dividing it by the median of the expected χ² distribution. Quantile-Quantile plots were used to compare the observed and expected distributions for the 1-df χ² statistics generated from allelic tests with or without correction for GC in the discovery stage. We calculated the corrected P values by dividing the observed χ² statistic by λ. For the top signals taken forward for replication, we used Haploview v4.1 to generate pairwise LD measures and the Manhattan plot, as well as LocusZoom v1.1 to generate the regional plots for the gene loci of interest. For analysis of data from stage 1, stage 2 and the combined dataset, we used allelic χ² tests in 2 × 2 contingency tables to derive the OR, with correction for multiple testing using 10,000 permutations. We used MIX v1.7 [47] to perform meta-analysis and calculated the combined estimates of ORs by weighting the natural log-transformed ORs (with respect to the same allele) of each study using the inverse of their variance under the fixed effect model. Cochran's Q statistic (P < 0.05) and I² were used to assess heterogeneity of ORs between studies.
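The two calculations described above, the genomic control inflation factor and the inverse-variance fixed-effect meta-analysis with Cochran's Q and I², can be sketched in a few lines. This is an illustrative outline only and does not reproduce PLINK or MIX: the function names are hypothetical, the per-study standard errors are assumed to be recoverable from symmetric 95% confidence intervals on the log scale, and the study-level ORs in the usage example are made-up values rather than the estimates in Table 2.

```python
import math
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(chi2_stats):
    """Inflation factor: median observed chi-square / expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN

def gc_corrected(chi2_stat, lam):
    """Genomic-control-corrected statistic (observed chi-square / lambda)."""
    return chi2_stat / lam

def fixed_effect_meta(studies, z=1.96):
    """Inverse-variance fixed-effect pooling of odds ratios.

    studies: list of (OR, ci_lower, ci_upper), all for the same allele.
    """
    log_ors, weights = [], []
    for odds_ratio, lower, upper in studies:
        # Standard error recovered from the CI width (assumption).
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        log_ors.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * pooled_se),
               math.exp(pooled + z * pooled_se)),
        "q": q,
        "df": df,
        "i2": i2,
    }

# Hypothetical study-level estimates, for illustration only.
example = [(1.20, 1.05, 1.37), (1.02, 0.93, 1.12), (1.08, 0.92, 1.27)]
result = fixed_effect_meta(example)
print(f"Pooled OR {result['or']:.2f} "
      f"({result['ci'][0]:.2f}-{result['ci'][1]:.2f}), "
      f"Q = {result['q']:.2f} on {result['df']} df, I2 = {result['i2']:.1f}%")
```

The P value for Q would be obtained from a χ² distribution with (number of studies − 1) degrees of freedom; it is omitted here to keep the sketch free of external dependencies.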