Post Genome-Wide Association Studies of Novel Genes Associated with Type 2 Diabetes Show Gene-Gene Interaction and High Predictive Value

Background
Recently, several Genome Wide Association (GWA) studies in populations of European descent have identified and validated novel single nucleotide polymorphisms (SNPs) highly associated with type 2 diabetes (T2D). Our aims were to validate these markers in other European and non-European populations, then to assess their combined effect in a large French study comparing T2D and normal glucose tolerant (NGT) individuals.

Methodology/Principal Findings
In the same French population analyzed in our previous GWA study (3,295 T2D and 3,595 NGT), strong associations with T2D were found for CDKAL1 (rs7756992: OR = 1.30 [1.19–1.42], P = 2.3×10^−9), CDKN2A/2B (rs10811661: OR = 0.74 [0.66–0.82], P = 3.5×10^−8) and more modestly for IGFBP2 (rs1470579: OR = 1.17 [1.07–1.27], P = 0.0003) SNPs. These results were replicated in both Israeli Ashkenazi (577 T2D and 552 NGT) and Austrian (504 T2D and 753 NGT) populations (except for CDKAL1) but not in the Moroccan population (521 T2D and 423 NGT). In the overall group of French subjects (4,232 T2D and 4,595 NGT), IGFBP2 and CXCR4 synergistically interacted with (LOC387761, SLC30A8, HHEX) and (NGN3, CDKN2A/2B), respectively, which encode proteins presumably regulating pancreatic endocrine cell development and function. The T2D risk increased strongly when risk alleles, including the previously discovered T2D-associated TCF7L2 rs7903146 SNP, were combined (8.68-fold for the 14% of French individuals carrying 18 to 30 risk alleles with an allelic OR of 1.24). With an area under the ROC curve of 0.86, only 15 novel loci were necessary to discriminate French individuals susceptible to developing T2D.

Conclusions/Significance
In addition to TCF7L2, SLC30A8 and HHEX, initially identified by the French GWA scan, CDKAL1, IGFBP2 and CDKN2A/2B strongly associate with T2D in French individuals, and mostly in populations of Central European descent but not in Moroccan subjects. Genes expressed in the pancreas interact and their combined effect dramatically increases the risk for T2D, opening avenues for the development of genetic prediction tests.


Introduction
Recently, genome-wide association (GWA) studies have revealed new single nucleotide polymorphisms (SNPs) that are strongly associated with type 2 diabetes (T2D) [1][2][3][4][5]. In our French study, we showed that the well known rs7903146 TCF7L2 polymorphism ranked first for its effect on T2D prevalence followed by four new risk loci: SLC30A8, HHEX, LOC387761 and EXT2 [1]. Subsequently, GWA studies in Finnish, English, Icelandic and Danish populations emphasized the role of CDKAL1, CDKN2A/2B and IGFBP2 on T2D and confirmed the effect of TCF7L2, SLC30A8 and HHEX [2][3][4][5]. Additional SNPs, located in MMP26, LDLR, KCTD12, CAMTA1, NGN3, CXCR4, LOC646279, were also among the 15 first signals in the joint stage I and fast track stage II analyses of the French study [1], but their current status is uncertain as they did not rank high in the other GWA scans [2][3][4][5]. Although the other loci listed above are likely to be true T2D markers, the actual impact of these genetic variants in other European and non-European populations as well as their cumulative effects and potential interactions remain to be determined.
In the present study, we first investigated the association of CDKAL1, CDKN2A/2B and IGFBP2 SNPs with T2D in the French population that we analyzed in our GWA study followed by a fast track replication [3,295 T2D and 3,595 normal glucose tolerant (NGT)] [1]. Then, 22 SNPs in 14 loci previously associated with T2D in GWA studies were selected and analyzed in four additional independent populations of European [French (937 T2D and 1,000 NGT), Austrian (504 T2D and 753 NGT), Israeli Ashkenazi (577 T2D and 552 NGT)] and Moroccan Arabic (521 T2D and 423 NGT) origin. Finally, in the overall group of French individuals (4,232 T2D and 4,595 NGT), we assessed possible gene-gene interactions and determined the cumulative genetic risk of carrying risk alleles on T2D prevalence.

Results
Clinical characteristics of each population are reported in Table S1. Minimum detectable effect size with a statistical power of 80% was calculated for each SNP in all studied samples and linkage disequilibrium was assessed between genetic variants located in the same locus (Table S2).

Association studies in our original set of French individuals
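We first studied whether 3 risk loci (7 SNPs), CDKAL1, CDKN2A/2B and IGFBP2, were associated with T2D in the French population analyzed in our previous GWA study (Table 1). In this population and for each SNP, the best mode of inheritance (P max) was found to be multiplicative.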

Replication studies in 4 additional independent populations
The 22 SNPs located in the 14 loci that we identified in our previous study were analyzed in other independent populations. For each polymorphism, the best fitting genetic model was selected from our previous whole-genome association study in French individuals [1]. Thus, all polymorphisms were analyzed under a multiplicative genetic model except for EXT2 (dominant model) and KCTD12 (recessive model). For CDKAL1, CDKN2A/2B, IGFBP2, HHEX and EXT2, we studied more than one polymorphism because they have all been reported as associated with T2D in previous studies [2][3][4][5]. We first analyzed another independent French group of more modest sample size (937 T2D and 1,000 NGT).
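As an illustration only, the three genetic models named above can be expressed as different codings of the same genotype before it enters a regression model. The Python sketch below is not the authors' code; the function name `encode_genotype` and the convention of passing the genotype as a count of tested alleles (0, 1 or 2) are assumptions made for this example.

```python
def encode_genotype(allele_count: int, model: str) -> int:
    """Code a genotype, given as the number of copies of the tested allele,
    as a regression covariate under a chosen genetic model (hypothetical helper)."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1 or 2")
    if model == "multiplicative":
        # Each copy multiplies the odds by the same factor (log-additive coding).
        return allele_count
    if model == "dominant":
        # One copy is enough to confer the full effect.
        return int(allele_count >= 1)
    if model == "recessive":
        # Only homozygous carriers are affected.
        return int(allele_count == 2)
    raise ValueError(f"unknown model: {model}")


# A heterozygote is coded differently under each model.
for model in ("multiplicative", "dominant", "recessive"):
    print(model, encode_genotype(1, model))
```

Under this coding, a heterozygote contributes one unit under the multiplicative and dominant models and nothing under the recessive model.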

Gene-gene interactions and cumulative genetic effects in French subjects
For each studied locus, we selected the most T2D-associated SNP, including TCF7L2 rs7903146 (N = 15), in order to assess their cumulative effects on T2D prevalence as well as potential gene-gene interactions (i.e., deviation from a multiplicative model) in the complete set of French individuals (4,232 T2D and 4,595 NGT) [6].
The percentages of individuals with increasing numbers of risk alleles in T2D subjects and NGT individuals are shown in Figure 1. For all SNPs, including those of EXT2 and KCTD12, we considered their multiplicative allelic effects on T2D prevalence. The ORs for T2D in subjects carrying increasing numbers of risk alleles are presented in Figure 2 in comparison to the 7.5% of the study population in the reference group that have 0 to 10 risk alleles. After adjustment for age, body mass index (BMI) and gender, each additional risk allele increased the odds of disease by 1.24 [1.21-1.27]. Individuals with at least 18 risk alleles (6.1% of NGT individuals and 14.5% of T2D subjects) had an OR of 8.68 [6.37-11.83] compared to the reference group.
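To make the per-allele estimate concrete, the sketch below shows how a constant per-allele OR compounds under a multiplicative model. It is an idealized extrapolation, not the way the group ORs above were obtained (those were estimated directly against the reference group); the function names are assumptions introduced for this example.

```python
import math

PER_ALLELE_OR = 1.24  # adjusted OR per additional risk allele reported above


def odds_multiplier(extra_alleles: float, per_allele_or: float = PER_ALLELE_OR) -> float:
    """Odds multiplier for carrying `extra_alleles` more risk alleles,
    assuming the same OR applies to every additional allele."""
    return per_allele_or ** extra_alleles


def alleles_for_target_or(target_or: float, per_allele_or: float = PER_ALLELE_OR) -> float:
    """Number of additional alleles at which the compounded OR reaches `target_or`."""
    return math.log(target_or) / math.log(per_allele_or)


print(round(odds_multiplier(10), 2))
print(round(alleles_for_target_or(8.68), 1))
```

Under this simplification, an OR of 8.68 corresponds to a difference of roughly 10 risk alleles between the compared groups.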
The predictive power of tests can be evaluated by the area under the ROC curve (AUC) [7]. The AUC is a measure of the discriminatory power of the test. A perfect test would have an AUC of 1; a test with no discriminatory power, an AUC of 0.5. For the 15 polymorphisms, the area under the ROC curve is 0.86 (Figure 3), corresponding to a high predictive power.
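The AUC can be read as the probability that a randomly chosen T2D subject receives a higher score than a randomly chosen NGT individual. The following minimal sketch computes it by direct pairwise comparison; the scores are hypothetical risk-allele counts chosen for illustration, whereas the score in the study came from a logistic model that also included age, gender and BMI.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve computed as the fraction of case/control pairs
    in which the case has the higher score (ties count as one half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk-allele counts, not data from the study.
t2d_scores = [18, 16, 15, 14]
ngt_scores = [14, 13, 12, 11]
print(auc(t2d_scores, ngt_scores))
```

Identical score distributions in both groups give 0.5, and complete separation gives 1, matching the two reference values quoted above.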

Discussion
GWA scans in Europeans [1][2][3][4][5] and other populations [8][9][10][11] have recently identified and confirmed new genetic variants associated with T2D. However, the clinical interest of these discoveries will depend on their true contribution in other ethnic groups as well as on their predictive value for T2D risk.
Using the same study design as our French GWA scan (joint stage I and fast track stage II), we confirmed the strong association of CDKAL1, CDKN2A/2B and IGFBP2 SNPs with T2D previously found by other GWA scans [2][3][4][5]. Intriguingly, these results were not replicated in another independent French population. This may be due to the modest sample size and/or because T2D cases and NGT controls were not specifically selected from families with or without history of diabetes, respectively, and/or because T2D cases and NGT controls were not matched for BMI. Our data show that the level of association of some SNPs with T2D depends on factors other than just ethnicity. Similar discordant results were also found among European cohorts from Finland [2]. However, we replicated our previous results for SLC30A8, MMP26, CXCR4, LOC387761 and LOC646279 SNPs, suggesting that these loci may be truly involved in T2D risk in the French population.
In another mid-sized population from Central Europe (Austria), associations with T2D were found only with CDKN2A/2B, IGFBP2, SLC30A8, and NGN3 SNPs, supporting the value of using large homogeneous populations to characterize the T2D genetic architecture in a given population. In Israeli Ashkenazi individuals, in whom one EXT2 SNP was found to be a T2D genetic marker, CDKAL1, CDKN2A/2B and IGFBP2 SNPs were associated with T2D, as in French subjects. Even though the TCF7L2 rs7903146 SNP was previously associated with T2D in the Moroccan group [12], no association was detected with any of the newly studied SNPs. While this lack of association may be due to the modest sample size or hidden biases such as population stratification, it may also signify that other SNPs in the same genes or located in different genes could be associated in the North African population.
Among the 14 studied loci, only 3 (CDKN2A/2B, IGFBP2 and SLC30A8) were found to be associated with T2D in at least 3 independent case-control groups (Table 6). Their association was recently confirmed in other subjects of European or Japanese origin, as for CDKAL1 and HHEX [13][14][15][16][17]. For the other loci, further meta-analyses and systematic replications of GWA studies (phase 2) are now necessary to fully evaluate their contribution to T2D risk. In the light of discrepancies between studied populations, GWA scans in non-European ethnic groups may bring additional insight. Further efforts in re-sequencing [18] are also needed to find etiologic variants causing T2D.
There is a growing interest in analyzing the combined effect of these novel loci on T2D susceptibility. This is the first time, to our knowledge, that potential synergistic interactions between novel T2D loci have been shown. IGFBP2 SNPs seem to interact with LOC387761, SLC30A8 and HHEX genetic variants. These data are in agreement with a primary beta-cell dysfunction. IGF2BP2 binds to the key growth and insulin signaling molecule insulin-like growth factor 2 (IGFII) and is highly expressed in pancreatic islets [3]. The zinc transporter ZnT8 (SLC30A8 protein) is specifically expressed in pancreatic endocrine cells and may participate in regulating insulin exocytosis [19,20]. Similarly, the CXCR4 risk variant was found to interact with NGN3 and CDKN2A/2B SNPs. It was reported that CXCR4-positive pancreatic cells express markers of pancreatic endocrine progenitors such as NGN3 [21]. CDKN2A could be a possible biological candidate for T2D [22] as its over-expression in rodents causes a decrease in islet proliferation [23]. The emerging picture of these possible interactions emphasizes the ability of multiple SNPs to potentiate their deleterious effects in both beta-cell development and function.
The evaluation of the T2D risk in individuals carrying increasing numbers of risk variants is critical for a potential clinical use of a genetic test in the general population. It was previously shown that UK individuals with TCF7L2, PPARG and KCNJ11 risk alleles had an OR of 5.71 (95% CI, 1.15 to 28.3) compared to those with no risk alleles, for an AUC of 0.58 [6]. It has been suggested that 20-25 risk variants with allele frequencies greater than 0.1 and ORs of 1.5 are required for an AUC of about 0.8 [24]. In our study, with only 15 SNPs, we reached a good discriminating power to identify individuals with high susceptibility for T2D. After adjustments for age, BMI and gender, subjects with at least 18 risk alleles (14.5% of French T2D subjects) had an approximately 9-fold higher risk of developing T2D compared to the reference group, for an AUC of 0.86. If confirmed, this increase in T2D risk due to genetic factors is even higher than that due to severe obesity (OR = 7.37) [25], the most established T2D risk factor.
After many years of limited success, the genetic architecture of T2D (common SNPs) is finally being uncovered by GWA studies. Even though the absence or presence of association in a given population is dependent on many factors (e.g., number of individuals, ethnicity, SNP prevalence, BMI, familial history of T2D and others), replication studies certainly help in validating SNPs truly associated with T2D and in excluding false positives.
Our data support the concept that T2D loci may interact together. Consequently, while single polygenic susceptibility variants may be of limited use in disease prediction, the combined information from a number of these variants should allow the identification of groups of subjects at high and low risk of developing a complex disease [26]. Hence, our study opens up the way to new applications in public health, based on early genetic testing for better prevention and care.

Study design
We first investigated the association with T2D of CDKAL1, CDKN2A/2B and IGFBP2 SNPs in the same French population analyzed in our previous GWA study (3,295 T2D and 3,595 NGT). These 3 loci were selected for their high degree of association with T2D in GWA studies and their ability to be replicated in different populations of European origin [2][3][4][5]. In addition to these 3 loci, the role of 11 additional loci for which we found the 15 highest SNP association signals (except for TCF7L2) in the joint stage I and fast track stage II of the French GWA scan [1] (SLC30A8, HHEX, EXT2, LOC646279, MMP26, KCTD12, LDLR, CAMTA1, LOC387761, NGN3 and CXCR4) was evaluated in 4 additional independent groups (2,539 T2D and 2,728 NGT) of European (French, Austrian and Israeli Ashkenazi) and non-European (Moroccan) origin. Finally, in all French subjects, we assessed the cumulative genetic risk of carrying the studied risk alleles on T2D prevalence and their possible interactions, including the previously discovered T2D-associated TCF7L2 rs7903146 SNP.

Study populations
The main clinical characteristics of each studied population are presented in Table S1. The first French group of T2D subjects and NGT controls was previously described [1]. The second set of French samples includes NGT individuals from the "Supplémentation en Vitamines et Minéraux Antioxydants" (SU.VI.MAX) cohort [27] and T2D subjects from the "DIABete de type 2, NEPHROpathie et GENEtique" (DIAB2.NEPHRO.GENE) study [28].
The SU.VI.MAX study was a French population-based prevention trial designed to evaluate the impact of a daily antioxidant supplementation at nutritional doses on the incidence of ischemic heart disease and cancer. All participants were recruited from throughout France between October 1994 and June 1995 [29]. The NGT controls were selected if they received no hypoglycemic treatment and had a fasting glucose <6.1 mmol/l. At baseline, a 35-mL venous blood sample was obtained from participants who had been fasting for 12 hr at the time of the visit. The samples were collected in vacutainer tubes that do not interfere with the concentration of trace elements (Becton Dickinson). After collection, blood was kept at +4°C in the dark until centrifugation. Centrifugation was performed at standard time, gravity, and temperature. The time elapsing between collection and aliquoting was recorded for all samples; it was less than 1 hr. Aliquots were stored at -20°C in the mobile units and field centers for, at the most, 7 days prior to shipment in dry ice to the reference laboratories and coordinating center.
The DIAB2.NEPHRO.GENE study was a French multi-center case-control study (15 diabetes and 5 nephrology centers from throughout France between 2001 and 2004) designed to assess the genetic determinants of diabetic nephropathy in type 2 diabetes. T2D was diagnosed on clinically determined absence of type 1 or secondary diabetes, in individuals of more than 40 years of age at onset and without insulin treatment within 2 years after disease onset. Each patient record was carefully checked by an adjudication committee to ascertain T2D status and diabetic complications. 21 ml of blood drawn on EDTA were used for DNA extraction using standard procedures (ethanol precipitation) and samples were stored at -80°C until use.
The Austrian subjects were of Bavarian and Austrian German descent and came from the greater region of Salzburg, Austria. They were recruited between April 1999 and December 2002. Unrelated patients with type 2 diabetes were recruited from diabetes outpatient clinics of the Landeskliniken Salzburg and the Hospital Hallein (near Salzburg). The diagnosis of T2D was based on use of hypoglycemic agents or plasma glucose values >126 mg/dl (in absence of treatment). The patients were seen repeatedly and managed by the outpatient clinics. Participants in the Salzburg Atherosclerosis Prevention Program in Subjects at High Individual Risk (SAPHIR) [12][30][31][32] who were not using hypoglycemic medications and had fasting plasma glucose levels <110 mg/dL served as NGT controls. SAPHIR was a population-based prospective study that investigated the involvement of factors contributing to the control of plasma lipid transport and carbohydrate metabolism in the progression of atherosclerotic vascular disease. Unrelated men and women with an age range between 39 and 67 years who lived in the greater Salzburg region and responded to invitations by their family or workplace physician or to announcements in the local press were included in the study. Average rates of recruitment were 10 controls/week and 3 T2D subjects/week. For all samples, whole blood was collected after an overnight fast in tubes containing 1.6 mg/ml EDTA. Plasma was separated <30 min after collection and used immediately for analysis. Aliquots were stored at -70°C. Aliquots of blood collected in EDTA were stored at -70°C. Genomic DNA was extracted from whole blood and stored at -20°C prior to analysis.
All Israeli NGT/T2D subjects lived and were ascertained in Israel with the help of 15 major diabetes treatment centers throughout the country. They were of Ashkenazi Jewish origin, defined as having all 4 grandparents born in Northern or Eastern Europe. Subjects with known or suspected Sephardic Jewish or non-Jewish ancestry were excluded. Thus, only Jewish subjects were recruited and Jewish patients from the Mediterranean basin, the Persian Gulf region (Iraq, Iran), Yemen, Ethiopia and other areas that were populated with Jews before the major Roman exile of 70 C.E. were not included. The T2D patients were ascertained between 2002 and 2004 in 15 diabetes clinics throughout Israel by the "Israel Diabetes Research Group" and were defined according to WHO criteria (fasting glucose >140 mg/dl on two or more occasions, or random glucose >200 mg/dl on two or more occasions). To avoid late-onset type 1 diabetics, patients who became insulin-dependent within 2 years of diagnosis were excluded. The average age at diagnosis was 47 years. The NGT controls were defined as Ashkenazi (Northern and Eastern European ancestry) with no history of glucose intolerance or T2D and were purchased from the National Laboratory for the Genetics of Israeli Populations (http://www.tau.ac.il/medicine/NLGIP/nlgip.htm). They denied ever having been diagnosed with elevated blood glucose level, T2D or glucose intolerance. Whole blood samples were obtained in vacuum tubes containing EDTA. The samples were stored at 4°C and transferred to the Endocrine Laboratory at the Hadassah Hospital, Jerusalem, Israel within 72 h of collection. DNA was extracted using the Puregene Genomic DNA extraction kit purchased from Gentra Systems, Minneapolis, MN, U.S.A. according to the manufacturer's recommendations. Concentrated DNA was stored at -80°C, diluted stocks were stored at -20°C and working solutions were stored at 4°C.
Moroccan subjects were recruited, between February and July 2006, by the Faculty of Medicine (Fes) within the framework of the Genetic project Diabetes Morocco (GenDiabM: ~180 subjects per month) and were subjected to a standardized clinical examination at the Hassan II Hospital and in regional health centers. The NGT and T2D subjects were from two Moroccan regions: Fes-Tounate (central-North region) and Rabat-Sale (western region). Patients with T2D were recruited from a registry of associations for T2D and health centers when they had a family history of T2D in first-degree relatives. The diagnosis of T2D was made according to the 1997 American Diabetes Association criteria or on being treated with medications for diabetes. The NGT controls were recruited from an unselected population undergoing a routine health check-up at the same health centers. All control individuals were ≥40 years of age, had not previously been diagnosed with T2D, had no history of T2D in first-degree relatives, and had a fasting plasma glucose <6.1 mmol/l. Blood and serum samples (9 ml) were collected from all individuals and were immediately stored in frozen conditions (serum at -20°C and blood at -80°C) until use. Genomic DNA was then extracted from whole blood and stored at -20°C.
In all populations, the studied individuals were unrelated to each other (with no first-degree relatives). This genetic study was approved by local Ethical Committees and written informed consent was obtained from all participants.

SNP genotyping
All the polymorphisms were genotyped using an allelic discrimination assay-by-design TaqMan method on ABI 7900 (Applied Biosystems). All genotypic distributions were in Hardy-Weinberg equilibrium. The genotyping success rate was higher than 98% for each SNP. The genotyping error rate for each SNP, assessed by randomly re-genotyping 384 participants in each population, is reported in Table S2.
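A Hardy-Weinberg check of this kind can be sketched as a one-degree-of-freedom chi-square test comparing observed genotype counts with those expected from the allele frequencies. The text does not state which test was used, so the choice of a chi-square test, the function name `hwe_chi2` and the genotype counts below are all assumptions made for illustration.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.
    Returns the statistic and its p-value. Hypothetical helper, not the authors' code."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic marker: the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts, not data from the study.
print(hwe_chi2(25, 50, 25))  # counts in exact equilibrium
print(hwe_chi2(50, 0, 50))   # complete absence of heterozygotes
```

A marker would be flagged when the p-value falls below a chosen threshold, which often points to genotyping problems rather than biology.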

Statistical analysis
Odds ratios were assessed by logistic regression models adjusted for age, BMI and gender. In Tables 2-5 and in Table S2, we tested the effect of the minor allele. In these tables, the risk allele is the minor allele if the OR is >1 and the major allele if the OR is <1. The replication of an association with T2D was considered positive on the condition that the risk allele was the same as that found in previous GWA studies. For each population, a simple Bonferroni correction (multiplication by the number of SNPs) was applied to the P values for multiple comparisons. After correction, no association remained significant, except for those found in our original French population. However, in the context of replication, an observed effect is unlikely to be due to statistical fluctuation alone. We explored the effect of multiple SNPs using a logistic regression model including a variable for the number of risk alleles in order to quantify the risk per supplementary allele for all the variants included in the model. The OR corresponding to a given number of risk alleles compared to the reference group was also calculated. We evaluated the ability of this model to discriminate between NGT and T2D individuals with a Receiver Operating Characteristic (ROC) curve [33] using a logistic regression model adjusted for age, gender and BMI including a covariate for the number of risk alleles of the 15 studied variants. The Area Under the ROC Curve (AUC) was calculated as a measure of the discriminative power of the test. Gene-gene interactions (i.e., deviation from a multiplicative model) were tested by comparing a logistic regression model including only the main effects to another model including the main effects and an interaction term with a likelihood ratio test. We further explored the way two variants interact with interaction plots. We used Quanto (http://hydra.usc.edu/GxE/) for power calculations.
Pairwise linkage disequilibrium between genetic markers was assessed using the R "genetics" package (version 1.3.2). All P values are two-sided. SPSS (version 14.0.2) and R statistics (version 2.5.1) software were used for general statistical analysis.
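Two of the procedures described above reduce to short calculations once the regression models have been fitted: the Bonferroni correction and the likelihood ratio test for a single interaction term. The sketch below assumes the log-likelihoods of the two nested logistic models are already available from statistical software; the function names and all numerical inputs are hypothetical and are not taken from the study.

```python
import math


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected P value: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


def interaction_lrt(loglik_main: float, loglik_with_interaction: float):
    """Likelihood ratio test comparing a main-effects model with the same model
    plus one interaction term (1 degree of freedom).
    Returns the test statistic and its p-value."""
    stat = max(0.0, 2.0 * (loglik_with_interaction - loglik_main))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical inputs, not values from the study.
print(bonferroni(0.01, 22))                # a nominal P of 0.01 corrected for 22 SNPs
print(interaction_lrt(-100.0, -98.0792706))  # statistic near 3.84, the 5% threshold
```

A nominal P value of 0.01 no longer passes a 0.05 threshold once multiplied by 22 tests, which illustrates why modest replication signals can lose significance after correction.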