Replication and Relevance of Multiple Susceptibility Loci Discovered from Genome Wide Association Studies for Type 2 Diabetes in an Indian Population

Aim Several genetic variants for type 2 diabetes (T2D) have been identified through genome wide association studies (GWAS) from Caucasian population; however replication studies were not consistent across various ethnicities. Objective of the current study is to examine the possible correlation of 9 most significant GWAS single nucleotide polymorphisms (SNPs) for T2D susceptibility as well as the interactive effect of these variants on the risk of T2D in an Indian population. Methods Case-control cohorts of 1156 individuals were genotyped for 9 SNPs from an Indian population. Association analyses were performed using logistic regression after adjusting for covariates. Multifactor dimensionality reduction (MDR) analysis was adopted to determine gene–gene interactions and discriminatory power of combined SNP effect was assessed by grouping individuals based on the number of risk alleles and by calculating area under the receiver-operator characteristic curve (AUC). Results We confirm the association of TCF7L2 (rs7903146) and SLC30A8 (rs13266634) with T2D. MDR analysis showed statistically significant interactions among four SNPs of SLC30A8 (rs13266634), IGF2BP2 (rs4402960), HHEX (rs1111875) and CDKN2A (rs10811661) genes. Cumulative analysis showed an increase in odds ratio against the baseline group of individuals carrying 5 to 6 risk alleles and discriminatory power of genetic test based on 9 variants showed higher AUC value when analyzed along with body mass index (BMI). Conclusion These results provide a strong evidence for independent association between T2D and SNPs for in TCF7L2 and SLC30A8. MDR analysis demonstrates that independently non-significant variants may interact with one another resulting in increased disease susceptibility in the population tested.


Introduction
Type 2 Diabetes is phenotypically and genetically diverse and is a major global health concern affecting more than 76 million Indians and over 408 million individuals worldwide (http:// www.idf.org/atlasmap/atlasmap). The risk of an individual towards T2D reflects the environmental influence in the background of genetic predisposition. Owing to the complex etiology of the disease progression, the identification of genetic markers has been slow until 2007. The advent of new technology in the form of microarray chips, has led to the development of high throughput genome wide association study (GWAS). Nevertheless, only a few variants in genes such as KCNJ11, PPARG, SLC30A8, and TCF7L2 were reported to be linked with T2D [1][2][3]. However, GWAS in a French population by Sladek et al., 2007 reported variants in TCF7L2, SLC30A8 and HHEX as new loci for T2D [4]. Subsequent GWAS in various populations have identified SNPs in several novel genes such as IGF2BP2, PPARG, FTO, CDKN2A, CDKAL1, KCNQ1 and JAZF1 to be associated with T2D [5][6][7][8][9]. Palmer et al., 2012 showed variants of SLC44A3, RBM43, RND3, GALNTL4, TMEM45B and BARX2 as new susceptibility loci in African -American population [10]. Till date, GWAS have identified >70 susceptibility loci associated with T2D [10][11][12][13][14][15][16][17][18][19]. Besides, two well replicated genes, PPARG and KCNJ11 which were initially shown to be associated with T2D through candidate gene studies were also confirmed through GWAS in European population [7,9]. Though the association of many common variants established by GWAS has been replicated in several Caucasian populations, the results are conflicting in Asian population in general and Indian population in particular. Amongst all the association studies on T2D in Indian population, only the variants in TCF7L2 (rs7903146, rs12243326 and rs4506565) has been consistently replicated and shown to be most promising [3,20,21].
Evidence from large population based and epidemiological studies showed that Indian population is genetically more prone to insulin resistance and diabetes [22]. Moreover, Indians develop T2D a decade earlier at a BMI which is much lower than Caucasians and this has been, attributed to their increased central obesity. This phenotype of increased tendency towards resistance of insulin effect is referred to as "Asian-Indian phenotype" [23]. Thus, understanding the contribution of the common genetic loci on T2D risk in Indian population comparatively to those originally identified by GWAS of European and American population is extremely important. However, there is a significant difference in the frequencies of risk alleles and linkage disequilibrium pattern in some genetic variants across different ethnicities [23] and hence there is a need to evaluate the same. Moreover, previous report of Indian Genome Variation (IGV) Consortium has showed how populations from Indian-sub-continent are discrete from HapMap populations [24]. Hence, in the present study we prioritized to evaluate previously identified common genetic variants in KCNJ11 (rs5219), TCF7L2 (rs7903146), SLC30A8 (rs13266634), IGF2BP2 (rs4402960), HHEX (rs1111875), CDKN2A (rs10811661), KCNQ1 (rs2237892), CDKAL1 (rs7754840) and FTO (rs8050136) for their association with T2D susceptibility. These SNPs have also been studied among Indian and non-Indian population with conflicting results. Our study population primarily comprises of a diverse mix of ethnic and linguistic groups of Karnataka state in southern part of India. Thus replication studies among the different ethnic and linguistic groups within the Indian subcontinent in general and within the south Indian population in particular may provide valuable information on the genetic risk factors for disease predisposition. Previous reports have shown that the cumulative effect of multiple common SNPs of small effect size are also likely to interact with each other in T2D [25]. Hence, the allele dosage and gene-gene interaction analysis of the variants was also performed to investigate their influence on T2D risk.

Characteristics of study subjects
Study population comprised of 1156 unrelated individuals (578 cases and 578 controls) of southern part (Karnataka state) of India who were recruited from outpatient departments of Kasturba Hospital, Manipal and Mangalore, India. T2D subjects were recruited in accordance with following inclusion criteria: (1) T2D should be diagnosed according to world health organization (WHO) criteria, (2) onset of disease age should be >30 years and (3) should belong to south Indian origin. For control subjects, inclusion criteria's were (1) levels of fasting glucose should be <126mg/dl and glycated haemoglobin (HbA1c) < 6.0%, (2) no history of diabetes in the first degree relatives, (3) age >40 years and (d) of south Indian origin. Written informed consent was obtained from all participating subjects and study was approved by the Institutional Ethics Committee, Kasturba Hospital, Manipal.

Clinical and biochemical variables
General anthropometric measurements including height (m) and weight (kg) were obtained as per standardized protocols, and BMI was calculated using the formula weight (in kilograms)/ height (in square meters). Fasting plasma glucose (FPG), total cholesterol (TC), high density lipoprotein (HDL) cholesterol, and triglycerides (TG) were estimated using Hitachi 912 autoanalyzer (Roche, Basel, Switzerland). Low density lipoprotein cholesterol (LDL) was calculated using the Friedewald formula (LDL = TC-HDL-TG/5.0 (mg/dl) and glycated haemoglobin (HbA1c) levels were estimated in whole blood using Cobas Integra 512 clinical chemistry autoanalyzer.

Statistical analysis
Quanto version 1.2.4 was employed for sample size calculation using minor allele frequency data from dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/). The categorical data of SNPs for association analysis with T2D was performed by using Pearson's chi-squared test to detect differences in allele frequencies between cases and controls. Hardy-Weinberg equilibrium test was performed using a χ2 goodness-of-fit test to assess genotype frequencies. Association analysis was further confirmed by logistic regression after adjusting for age, sex and BMI as covariates and association results of SNPs with T2D was assessed by using odds ratio (OR) and corresponding 95% confidence interval (CI). In our analysis the most frequent homozygous genotype in the control population was considered as the reference category. Further we evaluated the differences in continuous variables (clinical variables between cases and controls) using Students t-test and data are represented as mean ± SD. Bonferroni correction has been used to reduce the chances of obtaining false-positive results (type I errors). A P value of less than 0.005 has been kept as a threshold of significance (after Bonferroni correction). All statistical analysis was performed using SPSS version 16 for windows (SPSS, Chicago, Illinois, USA) and SNPstat Software. Population attributable risk (PAR%) which identify what percentage of total risk for T2D is due to genetic effect of the variant was estimated for those SNPs which showed positive association with T2D susceptibility. For allele dosage analysis to acquire the combined information from multiple SNPs, we used allele count model where we summed the number of exact risk alleles carried by each individual in cases and control group.

Multifactor dimensionality reduction (MDR) analysis
To investigate higher order gene-gene interaction among the tested SNPs, MDR (MDR, V2.0 Beta 2) method was employed. MDR is a nonparametric method and therefore no hypothesis concerning the value of any statistical parameter is made. It is also model free, thus no genetic inheritance model is assumed. All interactions and identification of best models were performed using 10 fold cross validation consistency (CVC) and permutation testing was done considering all possible SNP combination. Testing balance accuracy in the context of 10-fold cross-validation was used to assess model quality. An overall best model was selected that had the maximum accuracy in the testing data (i.e. testing accuracy or TA). We also recorded the cross-validation consistency or CVC. This provides a summary of the number of cross-validation intervals in which a particular model was found. Higher numbers indicate more robust results. Those models/SNP combinations with highest testing balanced accuracy (TBA) and CVC was selected as ''best model'. This procedure generates an empirical estimate of the null distribution of testing accuracies and corrects for multiple testing because the same number of models are evaluated in all permutated and real data. All interactions were visualized by constructing an interaction dendrogram according to the method described by Moore et al., 2004 [26, 27, 28].

Results
We genotyped 9 SNPs of GWAS in a case-control cohort of south Indian population consisting of 578 cases and 578 normal glucose tolerant (NGT) controls matched for age, sex and ethnicity. Table 1 shows the anthropometric and biochemical measurements among cases and NGT controls. In general, individuals with T2D had a higher BMI, HbA1c, FPG, TG, TC, LDL levels and lower HDL cholesterol levels. As shown in table 2, single marker association analysis for T2D revealed a significant association of TCF7L2 (rs7903146) and SLC30A8 (rs13266634) with an allelic OR = 1.46, 95% CI 1.15-1.85, p = 0.001 and OR = 1.33, 95% CI 1.10-1.60, p = 0.002. No significant association was observed between the remaining 7 loci and T2D in this population (p>0.05). Given that, variants in FTO and HHEX have shown to be associated with BMI, we further examined the association of these polymorphisms with BMI in our NGT control subjects and found that FTO (rs8050136) was significantly associated with BMI ( Table 3).
The individual risk variants in our study showed similar effect sizes as compared to other large studies from Caucasian population (2,4,5,7,9). Several variants in our study though are not associated with T2D at p<0.05 were still included in our allele dosage analysis because these were confirmed T2D risk variants and lack of significance may probably due to lower risk allele frequencies. Fig 1 represents the percentage of cases and NGT control subjects who were grouped according to the number of exact risk alleles they carry. Risk allele distribution followed a normal distribution pattern in cases and NGT controls with a shift towards higher risk allele number in T2D subjects. There was an increase in OR for T2D with an increase in the number of risk alleles against the baseline group of 1.3% of individuals carrying exactly 5-6 risk alleles. Of individuals with exactly 13-14 risk alleles 14.5% had an OR of 1.58 (95% CI 1.1-2.27) against the baseline reference group. We further evaluated the percentage of total risk for T2D and genetic effect of SLC30A8 (rs13266634) and TCF7L2 (rs7903146) which were found to be associated with T2D in our population. We observed that TCF7L2 (rs7903146) accounts for 18.4% and SLC30A8 (rs13266634) accounts for 30.4% PAR. These two genetic regions  contribute to a total of 48.8% population attributable risk (PAR) for T2D (Table 4). We assessed the discriminatory power of genetic test based on nine T2D variants by calculating the area under ROC (Receiver operating characteristic) curve (Fig 2). The area under ROC curve for nine T2D variants studied here was 0.536. Similar analysis was performed for obese and non-obese subgroups in the cohort which showed an area under the curve value of 0.514 and 0.529 respectively. Further, we tested if BMI would add to discriminatory power of these 9 risk variants and the area under the curve showed marginal increase (0.624) when BMI and risk  variants were assessed cumulatively. No significant difference in area under the curve was observed when FTO (rs8050136) was removed from analysis. Consecutively to expand the findings in our data, MDR analysis was applied to detect higher order gene-gene interaction in cases and controls. MDR analysis of the above mentioned variants revealed statistically significant interactions. Best interaction models along with testing accuracy and cross validation consistency are represented in table 5. The overall best model with highest CVC consisted of a model that included SLC30A8 (rs13266634), IGF2BP2 (rs4402960), HHEX (rs1111875) and CDKN2A (rs10811661) SNPs. This model has a significant TBA of 0.618 and a CVC of 10/10. However it's worth mentioning that 3 SNPs showing genetic interaction in this model were not associated with increased risk of T2D in univariate analysis (IGF2BP2 (rs4402960), HHEX (rs1111875) and CDKN2A (rs10811661)). The entropy based dendrogram obtained by MDR evidently showed an intricate and a hierarchical pattern of interaction among the gene variants constituting the polygenic basis of the disease (Fig 3). In particular CDKN2A (rs10811661) and FTO (rs8050136), IGF2BP2(rs4402960) and HHEX (rs1111875) loci showed maximum degree of synergy in their interactions and on the contrary, different degree of redundancy (antagonism) was observed between TCF7L2 (rs7903146) and SLC30A8 (rs13266634) . Fig 4 depicts a graphical representation of the combined effect of four locus model of SLC30A8 (rs13266634), IGF2BP2 (rs4402960) and HHEX (rs1111875), and CDKN2A (rs10811661) as high and low risk groups along with its statistical interactions. As per the four locus model, individuals homozygous for the wild type allele in SLC30A8 (rs13266634), IGF2BP2 (rs4402960), HHEX (rs1111875) and individuals homozygous for the mutant allele in CDKN2A (rs10811661) were placed in low risk group while, individuals heterozygous for all four SNPs were placed in high risk group. However, we also observed that individuals heterozygous for at least three of the four alleles were also in high risk group.

Discussion
The outcome of GWAS has significantly contributed to the discovery of number of genetic loci linked to T2D [10][11][12][13][14][15][16][17][18][19]. In the present investigation, we have performed an unbiased replication study of SNPs associated with T2D derived from previously published GWAS in our independent dataset from south Indian origin. One of the selected SNP (KCNJ11 rs5219) belongs to well replicated biological candidate gene which has shown a clear evidence of its relationship to T2D susceptibility [2,3]. Until now, only TCF7L2 (rs7903146) has been shown to have a strongest association with the highest effect size for T2D association across various ethnicities [3, 4, [4,7,29,31]. Though previous reports from Omori et al., 2008, Wu et al., 2008and Ng et al., 2008 from Asian populations showed a positive association, these results were not replicated by Sanghera et al., 2008 in an Indian Sikh population which may be attributed to lower sample number and low power of the study [32][33][34]29]. Subsequently, a large scale study by Chauhan et al., 2010 in northern and western Indian population showed a strong association of this polymorphism with T2D [21]. In the present study, we found a positive association of this polymorphism with T2D with an effect size of 1.33, 95% CI 1.10-1.60, p = 0.002. While the significance of SLC30A8 and TCF7L2 as a key gene for T2D susceptibility has been well known, the genetic variants in these genes could account for about~20% of all T2D cases in the Caucasian population [35,36]. The PAR of SLC30A8 and TCF7L2 polymorphisms in Caucasian population was 30.4% and 18.4% respectively which is much higher when compared to our population, and this could be attributed to the higher risk allele frequency in Caucasian population. Further in our allele dosage analysis, we have identified a particular subset of individuals at different risk of disease by weighing against the individuals with smallest number of risk alleles in comparison to those with the majority of risk alleles. This in turn allowed us to identify subgroups of the population with distinctly differing risk for disease. In our population, we were able to distinguish 14.5% of individuals carrying >13-14 risk alleles that had four times increased risk of T2D when compared to 1.3% of individuals with >5-6 risk alleles. The high risk group also had over thrice the OR for T2D than those with the lower number of risk alleles. Thus, variants are not predominantly discriminative but only explain a small percentage of heritability of T2D. However, instead of focusing on individuals with elevated risk alleles at the population level, the efficacy of genetic test perhaps is better utilized by means of ROC curves. The nine T2D variants though had an insufficient discriminatory ability with an AUC of 0.536; a marginal increase was obtained when it was analyzed with BMI with an AUC value of 0.624.
Another major finding of our study was the establishment of gene interaction model between SLC30A8, IGF2BP2, HHEX, and CDKN2A genes towards T2D susceptibility by MDR analysis. Among the five models for all the loci tested (nine SNPs of nine candidate genes), the best gene-gene interaction model identified was a four locus model which includes SLC30A8 (rs13266634), IGF2BP2 (rs4402960), HHEX (rs1111875) and CDKN2A (rs10811661). Individuals heterozygous for at least three out of four loci were shown to cluster in high risk groups. In this model the combination of HHEX (rs1111875) CC homozygotes, SLC30A8 (rs13266634) CC homozygotes, IGF2BP2 (rs4402960) GG homozygotes and CDKN2A (rs10811661) TT homozygotes were associated with reduced risk to T2D. Our study did not show any independent association of IGF2BP2 (rs4402960), HHEX (rs1111875) and CDKN2A (rs10811661) with T2D. However in MDR analysis, combined effect of these polymorphisms with SLC30A8 (rs13266634) was associated with increased risk of T2D which suggest a cross talk between these genes in the pathogenesis of T2D. Further, a synergistic interaction among these SNPs suggests a significant role of epistasis in the susceptibility of T2D. However, it is also observed that all the SNPs which were included in this study showed a weak synergistic effect with each other and this suggests the interaction of susceptibility alleles with environmental and life style factors. Nevertheless, the risk allele frequencies of these variants in the tested population were entirely different with those from Caucasian population and thus, the absence of positive association towards T2D in remaining loci points towards the contribution of these polymorphisms to T2D susceptibility is not convincing across different ethnicities or to a particular genetic background. To the best of our knowledge, this is the first report from south Indian population trying to investigate the role of these polymorphisms individually and cumulatively towards T2D susceptibility. Nonetheless, further replication studies are warranted to validate these and newer susceptibility loci which are identified from large scale GWAS in independent samples collected from other ethnic groups.
Supporting Information S1 Table. List of genotype and clinical characteristics of study subjects. (XLSX)