Genome-Wide Interaction with Insulin Secretion Loci Reveals Novel Loci for Type 2 Diabetes in African Americans

Type 2 diabetes (T2D) is the result of metabolic defects in insulin secretion and insulin sensitivity, yet most T2D loci identified to date influence insulin secretion. We hypothesized that T2D loci, particularly those affecting insulin sensitivity, can be identified through interaction with insulin secretion loci. To test this hypothesis, single nucleotide polymorphisms (SNPs) associated with acute insulin response to glucose (AIRg), a dynamic measure of first-phase insulin secretion, were identified in African Americans from the Insulin Resistance Atherosclerosis Family Study (IRASFS; n = 492 subjects). These SNPs were tested for interaction, individually and jointly as a genetic risk score (GRS), using genome-wide association study (GWAS) data from five cohorts (ARIC, CARDIA, JHS, MESA, WFSM; n = 2,725 cases, 4,167 controls) with T2D as the outcome. In single variant analyses, suggestively significant (Pinteraction<5×10−6) interactions were observed at several loci including LYPLAL1 (rs10746381), CHN2 (rs7796525), and EXOC1 (rs4289500). Notable AIRg GRS interactions were observed with SAMD4A (rs11627203) and UTRN (rs17074194). These data support the hypothesis that additional genetic factors contributing to T2D risk can be identified by interactions with insulin secretion loci.


Introduction
Although common variants examined in genome-wide association studies (GWAS) have iden-tified~80 loci associated with T2D risk, these variants explain only about 15% of T2D heritability [1,2]. A portion of the missing heritability may be explained by epistasis, which occurs when a genetic risk factor is modified by other factors in an individual's genetic background [3]. Epistasis, or gene-gene interaction, analyses may facilitate the detection of novel loci when non-additive effects exist, but may also provide novel insights illuminating biological mechanisms underlying complex diseases such as T2D [4].
T2D is characterized by impaired insulin secretion arising from pancreatic beta-cell dysfunction and insulin resistance in hepatic, skeletal muscle, and other peripheral tissues, leading to decreased plasma glucose uptake. However, documented T2D loci primarily map to genes influencing insulin secretion or other aspects of beta-cell biology [1]. Given the underlying bimodal pathophysiology, T2D may be a particularly well-suited disease model for hypothesisdriven investigation of epistatic interactions. Genetic insults to both insulin secretion and insulin sensitivity may jointly increase an individual's T2D risk in a non-additive manner. Considering the higher prevalence rate of T2D, insulin resistance, and obesity, African Americans are optimal for the study of genetic interactions that contribute to T2D risk.
In an effort to identify interactions contributing to T2D and to discover novel insulin sensitivity loci, we hypothesized that T2D risk loci, particularly those affecting insulin sensitivity, could be identified by interaction analyses with insulin secretion loci. In cross-sectional metaanalyses of five T2D studies (ARIC, CARDIA, JHS, MESA, and WFSM), we tested whether 5 insulin secretion SNPs, or a genetic risk score summarizing these SNPs, modified genomewide SNP associations with T2D risk.

Research Design and Methods Subjects
Two sources of data were analyzed in this study. Primary inferences of association with insulin secretion were derived from African American participants (n = 492 individuals from 42 families) in the Insulin Resistance Atherosclerosis Family Study (IRASFS), a metabolically wellcharacterized cohort [5]. Glucose homeostasis traits were measured by the frequently sampled intravenous glucose tolerance test (FSIGT) [5]. Briefly, a 50% glucose solution (0.3g/kg) and regular human insulin (0.03units/kg) were injected intravenously at 0 and 20 minutes, respectively. Blood was collected at −5, 2, 4, 8, 19, 22, 30, 40, 50, 70, 100, and 180 minutes for measurement of plasma glucose and insulin. AIR g was calculated as the increase in insulin at 2-8 minutes above the basal (fasting) insulin level after the bolus glucose injection at 0-1 minute. Insulin sensitivity (S I ) was calculated by mathematical modeling using the MINMOD program (version 3.0 [1994]) [6]. Disposition index (DI) was calculated as the product of S I and AIR g .
Inferences of genome-wide epistatic interaction with insulin secretion loci for T2D susceptibility were derived from African American participants from the Atherosclerosis Risk in Communities Study (ARIC; n = 955 T2D cases, 414 controls), Coronary Artery Risk Development in Young Adults (CARDIA; n = 94 T2D cases, 654 controls), Jackson Heart Study (JHS; n = 333 T2D cases, 1,450 controls), Multi-Ethnic Study of Atherosclerosis (MESA; n = 411 T2D cases, 793 controls), and the Wake Forest School of Medicine (WFSM; n = 932 T2D cases, 856 controls) cohorts for a total of 2,725 T2D cases and 4,167 controls [7][8][9][10][11][12]. T2D was diagnosed according to the American Diabetes Association criteria with at least one of the following: fasting glucose !126 mg/dL, 2-h oral glucose tolerance test glucose !200 mg/dL, random glucose !200 mg/dL, use of oral hypoglycemic agents and/or insulin, or physician diagnosed IRB approval was obtained at all sites and all participants provided written informed consent. Descriptions of the T2D study cohorts and funding sources for each individual study are summarized in S1 Appendix.

Genotyping, imputation, and quality control
For the IRASFS samples, genotyping and quality control were performed at the Wake Forest Center for Genomics and Personalized Medicine Research using the Illumina Infinium HumanExome BeadChip v1.0 as previously described [13]. Briefly, the exome chip contained 247,870 variants (92% protein coding), In addition, the chip included 64 SNPs associated with T2D from previous GWAS in Europeans, many of which have been implicated in insulin secretion (exome chip design: http://genome.sph.umich.edu/wiki/Exome_Chip_Design). Sample and autosomal SNP call rates were !99%, and SNPs with poor cluster separation (<0.35) were excluded. Mendelian errors were identified using PedCheck [14] and resolved by removing conflicting genotypes. Hardy-Weinberg Equilibrium (HWE) was assessed in unrelated samples (n = 39) using PLINK (http://pngu.mgh.harvard.edu/purcell/plink) [15] to reduce biases introduced by familial allele frequencies. All variants were in accordance with HWE (P>1x10 -5 ).
As described in detail previously [16], the T2D study samples were genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0. For the ARIC, CARDIA, JHS, and MESA cohorts, genotyping and quality control were completed by the National Heart, Lung, and Blood Institute's (NHLBI's) Candidate Gene Association Resource (CARe) at the Broad Institute [17]. Genotyping for the WFSM study was performed at the Center for Inherited Disease Research (CIDR). For all T2D studies, imputation was performed using MACH with the function-mle (version 1.0.16, http://www.sph.umich.edu/csg/abecasis/MaCH/) to obtain missing genotypes and replace genotypes inconsistent with reference haplotypes as previously described [18]. SNPs with call rate !95% and minor allele frequency (MAF) !1% that passed study-specific quality control were used for imputation [17,19]. A 1:1 HapMap II (NCBI Build 36) CEU:YRI (European:African) consensus haplotype was used as reference. A total of 2,713,329 to 2,907,086 autosomal SNPs from each GWAS with call rate !95%, MAF !1%, and Hardy-Weinberg P-value !0.0001 for genotyped SNPs and MAF !1% and RSQ !0.5 for imputed SNPs were included in subsequent data analyses.

Principal component analysis
For IRASFS, admixture was estimated using principal components (PCs) from 39 ancestry informative markers (AIMs) and including HapMap CEU and YRI samples for comparison [20]. Only PC1 correlated with HapMap populations, and was thus used as a covariate in all analyses.
For the T2D studies, PCs were computed for each study using high-quality SNPs as previously described [13,[17][18][19]21]. The first PC was highly correlated (r 2 >0.87) with global African-European ancestry, as measured by ANCESTRYMAP [22], STRUCTURE [23], or FRAPPE [24]. The African American T2D study samples had an average of 80% African ancestry. By analyzing unrelated samples from all studies using SMARTPCA [21], only the first PC appeared to account for substantial genetic variation (data not shown), whereas the subsequent PCs may reflect sampling noise and/or relatedness in samples [22]. The first PC (PC1) was used as a covariate in all analyses to adjust for population substructure.

Analysis of association with measures of glucose homeostasis in IRASFS
To approximate a normal distribution, trait values were transformed by square root (AIR g , DI) or natural logarithm plus a constant (S I ). Measured genotype association analyses of exome chip variants with AIR g , S I , and DI were performed under an additive model using the variance components method implemented in Sequential Oligogenic Linkage Analysis Routines (SOLAR) [25] with adjustment for age, gender, body mass index (BMI), and PC1.

Genetic risk score construction
We further explored our interaction approach by constructing genetic risk scores (GRS), both weighted and unweighted, summarizing the effects of selected AIR g SNPs. The AIR g GRS was created using the AIR g -lowering effect alleles for AIR g SNPs (i.e. reflecting poorer insulin secretion) from IRASFS ( Table 1). The unweighted risk scores were calculated by summation of the number of risk alleles for each individual across all selected SNPs. The weighted AIR g GRS was calculated as the sum of risk alleles at each locus multiplied by their AIR g effect size from IRASFS. Missing genotypes for a given SNP were imputed as the average number of risk alleles across all samples. The association of each GRS with both AIR g and DI, a combinatorial measure of first-phase insulin secretion and insulin sensitivity, were evaluated in IRASFS using the variance components method implemented in SOLAR [25] adjusted for age, gender, and ancestry proportions.

Analysis of interaction for T2D risk in the T2D studies
Logistic regression with T2D as the outcome was modeled including genetic main and interaction effects as well as AIR g SNP genotype or GRS, age, sex, and principal component covariates for all samples. Additional models included adjustment for BMI, and individuals with missing values were excluded (n = 110).
Y is the log odds of T2D, S is the genotype for the AIR g SNP or the GRS, G is the genotype of the genomic variant, and C is the vector of all remaining covariates. The AIR g SNP genotype, both in single-variant analysis and GRS construction, were additively coded with values of 0, 1, and 2 corresponding to dosage of the AIRg-lowering allele. Genomic variant genotypes were additively coded with values of 0, 1, and 2 corresponding to an arbitrarily selected allele dosage of measured genotype or best-guess imputed genotype. Hypothesis testing included a Wald test statistic following a chi-squared distribution with 1 degree-of-freedom under the null H 0 :β 3 = 0. In each study, PLINK was used to calculate the interaction effect β 3 . Interaction results with extreme values (absolute β or SE>10), primarily due to low cell counts, were excluded. A large number of SNPs with β and SE outliers were excluded in interaction analyses with the AIR g SNP rs1552224 (2,466,070 to 2,479,156 SNPs excluded) due to its low frequency in the T2D study samples (combined MAF = 0.03). For interaction analyses with all other SNPs and risk scores, the number of SNPs excluded as outliers ranged from 0 to 17,000. Interaction results were combined by fixed-effect inverse variance weighting for each candidate SNP or GRS in METAL (http://www.sph.umich.edu/csg/abecasis/metal/). Each meta-analysis contained results for 486,148 to 2,965,304 SNPs.

Candidate insulin secretion SNP selection
The characteristics of IRASFS subjects are shown in S1 Table. Samples included 492 African Americans with mean age 41.2 years and mean BMI 29.1 kg/m 2 . Average African ancestry proportion was 0.75. FSIGT was performed for all subjects without T2D (n = 492) to assess measures including insulin secretion (AIR g ), insulin sensitivity index (S I ), and disposition index (DI).

Interaction analysis
The selected SNPs were examined for genome-wide first order multiplicative interactions with 1) individual insulin secretion SNPs and 2) risk scores summarizing these insulin secretion SNPs. To maximize power, these analyses were performed in five studies (ARIC, CARDIA, JHS, MESA, and WFSM) including 2,725 T2D cases and 4,167 non-diabetic controls and results were meta-analyzed. Representative meta-analysis q-q plots are provided in S1 and S2 Figs. The magnitude of interaction effects cannot be interpreted in terms of disease risk because one must consider the context of marginal effects from each locus when an interaction exists. For this reason, interaction effects have been presented as beta values, not odds ratios, in Tables  2 and 3.
The characteristics of T2D case (n = 2,725) and control subjects (n = 4,167) for each study cohort are shown in S2 Table. Mean age at examination ranged from 38.2 (CARDIA) to 67.6 (MESA) years. Mean age at diagnosis for T2D cases ranged from 35.0 (CARDIA) to 54.6 (MESA) years. In all cohorts except WFSM, BMI was >3 kg/m 2 higher in cases compared to controls.

GRS validation and interaction analysis
Each GRS was tested for association with AIR g and DI under an additive model using the variance components method with adjustment for age, gender, and PC1 in IRASFS (S3 Table). The weighted AIR g GRS was associated with AIR g in IRASFS with or without BMI adjustment (P = 4.96x10 -2 and P = 3.34x10 -2 , respectively). Since the weighted risk score was associated with measures of glucose homeostasis, analysis of this risk score was emphasized in the tests for genome-wide interaction in the ARIC, CARDIA, JHS, MESA, and WFSM cohorts.
Meta-analyzed estimates of genome-wide interactions with the weighted AIR g GRS are presented in Table 3. No interactions met conventional GWAS thresholds for significance. However, seven interactions with the weighted AIR g GRS reached a suggestive level of significance (P interaction <5x10 -6 ; Table 3). In the AIR g GRS interaction analysis the most significant association was observed with interacting SNP rs4975846 (Table 3; P interaction = 2.03x10 -7 ). This is an intergenic SNP upstream of the gene MRPL36, which encodes mitochondrial ribosomal protein L36. This same interacting SNP was implicated in single variant analyses with AIR g SNP rs10403702 (LGALS16). Other notable interacting SNPs included rs11627203 (SAMD4A, P interaction = 6.57x10 -7 ) and rs17074194 (UTRN, P interaction = 1.87x-10 −6 ). Top interactions with the AIR g GRS remained robust after BMI adjustment.
For comparison purposes, meta-analysis of single-SNP association performed under an additive logistic regression model with adjustment for age, sex, and principal component covariates was conducted for all interaction SNPs presented in Tables 2 and 3 (S4 Table). None of these SNPs had an association P<0.05.

Discussion
Meta-analyses of five African American T2D studies did not reveal statistically significant firstorder interactions with insulin secretion SNPs or composite risk scores. However, the observed interactions (P interaction <5×10 −6 ) suggest that a candidate insulin secretion SNP/GRS interaction approach is a valid method for identifying insulin sensitivity and T2D risk loci. For example, analyses with the AIR g SNP rs10403702 (LGALS16) revealed an interaction with rs10746381, an intergenic SNP upstream of the LYPLAL1 gene encoding lysophospholipaselike 1. Variants at this locus have previously been associated with T2D, fasting insulin, HDLcholesterol, triglycerides, and measures of body fat distribution, a signature common to insulin sensitivity loci [26][27][28][29]. The AIR g SNP rs10792837 (EED) showed interaction with rs7796525, an intronic SNP in CHN2. Interaction of CHN2 with the insulin receptor gene INSR results in severe insulin resistance [30]. Additionally the AIR g SNP rs10830963 (MTNR1B) displayed interaction with rs4289500, an intergenic SNP upstream of EXOC1, which encodes a component of the exocyst complex, required for translocation of glucose transporter type 4 (GLUT4) vesicles to the plasma membrane in insulin-stimulated glucose uptake [31].
Several genes related to pancreatic beta-cell function were also identified; suggesting interactions are not limited to insulin resistance as in our initial hypothesis. Analyses with AIR g SNP rs3827147 (C20orf96) revealed interactions with rs10483995, an intronic SNP in KCNK10. A study of pancreatic beta-cell line MIN6 cells has shown that Kcnk10, a member of the twopore domain potassium channels, may regulate depolarization-induced secretion of insulin in these cells [32]. The AIR g SNP rs1582418 (PTTG1) interaction analyses identified an interaction with an intergenic SNP upstream of CALM1 (rs6575130). Inhibition of the product of this gene, calmodulin, with phenothiazines inhibits glucose-stimulated insulin secretion in isolated rat islet cells [33][34][35][36][37]. Several other variants detected in our analyses show interactions with similar biological relationships to insulin secretion and T2D.
Interestingly, we observed interactions discrete for individual loci. For example, analyses with rs10830963 (MTNR1B) revealed an interaction with rs2640666, an intronic variant downstream of MTRR. MTRR encodes 5-methyltetrahydrofolate-homocysteine methyltransferase reductase, implicated in metabolic syndrome [38] and epigenetic instability [39]. MTRR is highly expressed in the pineal gland in a circadian pattern [40], an intriguing result considering that the ligand for melatonin receptor 1B (encoded by MTNR1B), melatonin, is secreted from the pineal gland in a circadian pattern. Also revealed with rs10830963 was an association with rs4289500, which is~200 kb upstream of CLOCK, a classic circadian clock gene. Further, analyses with other insulin secretion SNPs revealed genes involved in epigenetic modification. Analyses with rs3827147 (C20orf96) revealed an interaction with rs10975898, an intronic SNP in KDM4C which encodes a lysine-specific demethylase. We also identified 2 dystrophinrelated genes. Analyses with rs10403702 (LGALS16) revealed an interaction with rs1655028, an intronic SNP upstream of SNTB1. This gene encodes beta 1 syntrophin, a peripheral membrane protein that associates with dystrophin and dystrophin-related proteins. Evaluations of the AIR g GRS revealed an interaction with rs17074194, an intronic SNP in UTRN. This gene encodes utrophin, a cytoskeletal protein which has homology with dystrophin and is found at the neuromuscular synapse and myotendinous junctions in muscle cells. These observations may reflect different, input-dependent physiological characteristics of interaction results, and may lead to mechanistic insights about the underlying causes of T2D and defects in glucose homeostasis in expanded analyses.
Although results varied widely between interaction analyses, interaction with one locus, MRPL36, was replicated in multiple analyses. Functional characteristics of MRPL36 related to T2D and glucose homeostasis pathophysiology are not evident in the current literature.
Previous GWAS have largely ignored epistatic contributions to T2D risk due to the heavy multiple testing burden and computational challenges of exhaustive analytical approaches, and when they have considered this contribution, results have not been striking. For example, a recent genome-wide scan for two-locus interactions in the Wellcome Trust Case Control Consortium T2D GWAS data did not reveal any significant epistatic signals at a Bonferroni-corrected p-value threshold of 2.14x10 -11 after adjusting for the main effects of the most strongly associated T2D locus, TCF7L2 [41]. Further, Herold et al. estimated that analysis of all pairwise interactions among 550,000 SNPs in 1,200 samples on a 3 GHz computer would require a running time of 120 days [42]. The interaction analysis presented here overcomes the issue of a heavy multiple testing burden by using a candidate SNP approach. A recent study by Becker et al. demonstrated that a multiple test correction of 0.4m, where m is the number of SNP pairs tested, is sufficiently conservative for large-scale allelic interaction tests [43]. Further, Babron et al. show that a correction for the effective number of SNP pairs is equally sufficient [44]. Li et al. previously demonstrated that the effective number of SNPs for an imputed dataset is 10 6 . These findings suggest that a significance threshold of 1x10 -8 is appropriate for this study.
We did not detect interactions even at the conventional GWAS threshold of 5x10 -8 in the current study. In part, this likely reflects the challenge of inherently reduced power of interaction models due to the low frequency of compound genotypes [45]. Computational resources required for this study were equivalent to the requirements for running 12 GWAS (5 candidate insulin secretion SNPs plus a GRS, with and without BMI adjustment). This is a significant reduction compared to exhaustive approaches examining genome-wide interactions with all available SNP pairs. Although the results of a meta-analysis of five cohorts are presented here, this study lacks replication of the observed interaction associations in additional studies. Future work will require examination of associated interactions in African American and other populations.
In summary, our findings demonstrate that genome-wide interaction studies with selected insulin secretion variants is a powerful approach for the detection of T2D risk, insulin secretion, and insulin sensitivity loci. The use of a high-quality measure of first-phase insulin secretion, AIR g , to identify candidate interaction SNPs yielded compelling associations. These results justify an expansion of the current study and further investigation of putative insulin sensitivity loci, namely LYPLAL1, CHN2, and EXOC1.
Supporting Information S1 Appendix. Descriptions of the T2D study cohorts. (DOCX) S1 Table. Descriptive characteristics of IRASFS African Americans. Ã Data are shown as count, mean, percentage, or mean ± SD or percentage. (DOCX) S2 Table. Descriptive characteristics of African American diabetes case and control subjects. Data are shown as count, percentage, or mean ± SD. Ã Age and BMI are shown for the last available visit for the prospective studies including ARIC, CARDIA, and MESA (Exam 4); and the baseline visit for JHS and WFSM. (DOCX) S3 Table. Association of AIR g GRS with AIR g and DI in IRASFS. Ã Model 1 is adjusted for age, gender, and PC1. †Model 2 is adjusted for age, gender, PC1, and BMI. (DOCX) S4 Table. Single-SNP association results for interacting SNPs. a SNP interacting with the selected AIR g SNP or the weighted AIR g GRS with nearest gene within 500 kb in parentheses. b Meta-analyzed effect size, standard error, and p-value from association models adjusted for age, gender, and PC1. c Heterogeneity p-values across studies from association models adjusted for age, gender, and PC1.