Meta-Analysis of Genome-Wide Association Studies in African Americans Provides Insights into the Genetic Architecture of Type 2 Diabetes

Type 2 diabetes (T2D) is more prevalent in African Americans than in Europeans. However, little is known about genetic risk in African Americans, despite the recent identification of more than 70 T2D loci, primarily by genome-wide association studies (GWAS) in individuals of European ancestry. To investigate the genetic architecture of T2D in African Americans, the MEta-analysis of type 2 DIabetes in African Americans (MEDIA) Consortium examined 17 GWAS on T2D comprising 8,284 cases and 15,543 controls in African Americans in the stage 1 analysis. Single nucleotide polymorphism (SNP) association analysis was conducted in each study under the additive model after adjustment for age, sex, study site, and principal components. Meta-analysis of approximately 2.6 million genotyped and imputed SNPs in all studies was conducted using an inverse variance-weighted fixed effect model. Replication was performed to follow up 21 loci in up to 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry. We identified three known loci (TCF7L2, HMGA2 and KCNQ1) and two novel loci (HLA-B and INS-IGF2) at genome-wide significance (4.15×10−94 < P < 5×10−8, odds ratio (OR) = 1.09 to 1.36). Fine-mapping revealed that 88 of 158 previously identified T2D or glucose homeostasis loci demonstrated nominal to highly significant association (2.2×10−23 < locus-wide P < 0.05). These novel and previously identified loci yielded a sibling relative risk of 1.19, explaining 17.5% of the phenotypic variance of T2D on the liability scale in African Americans. Overall, this study identified two novel susceptibility loci for T2D in African Americans. A substantial number of previously reported loci are transferable to African Americans after accounting for linkage disequilibrium, enabling fine mapping of causal variants in trans-ethnic meta-analysis studies.


Introduction
The prevalence of type 2 diabetes (T2D) among adults in the USA is currently 11.3%, with substantially higher prevalence in African Americans (18.7%) than in European Americans (10.2%) [1]. To date, genome-wide association studies (GWAS) have identified >70 susceptibility loci for T2D [2][3][4][5][6][7][8]. While T2D is known to be heritable in African Americans [9], it is unclear how much of this heritability is explained by the known genetic associations, which were discovered primarily in populations of European ancestry, and whether there are risk loci specific to African Americans. Given that individuals of African ancestry tend to harbor more genetic diversity than individuals of other ancestries [10], we hypothesized that large-scale association analyses in African Americans could shed light on the genetic architecture of T2D and on the risk attributable to cosmopolitan vs. population-specific variants.

Study overview
We conducted a meta-analysis of 17 African American GWAS on T2D comprising 8,284 cases and 15,543 controls (Tables S1 and S2). Missing genotypes in individual studies were imputed to one of the HapMap reference panels (Phase II release 21-24 CEU+YRI, Phase II release 22 all populations, Phase II+III release 27 CEU+YRI, Phase II+III release 27 CEU+YRI+ASW, or Phase II+III release 27 all populations) using MACH, IMPUTE2, or BEAGLE (Table S3). Because association results were modestly inflated, genomic control corrections [11] were applied to each study (λ = 1.01-1.08) and again after meta-analysis (λ = 1.06) (Table S3) [12]. Association results for ~2.6M SNPs were subsequently examined.
From the stage 1 meta-analysis, 49 SNPs moderately associated with T2D (P < 1×10−5) and two candidate SNPs near the P value threshold (rs231356 at KCNQ1, P = 2.84×10−5, and rs2244020 at HLA-B, P = 1.02×10−5), totaling 51 SNPs in 21 loci, were followed up for replication. rs231356 is 14 kb downstream of rs231362, the T2D index SNP reported in Europeans [3]. Moderate associations have also been observed across the HLA region in Europeans [3]. The stage 2 replication included in silico and de novo replication in up to 11,544 African American T2D cases and controls, as well as in silico replication in 47,117 individuals of European ancestry from DIAGRAMv2 [3] (Table S4). Meta-analyses were performed to combine results from African Americans (stage 1+2a, n ≤ 35,371, Table S4) and from both African Americans and Europeans (stage 1+2a+2b, n ≤ 82,488, Table S4).

T2D loci reaching genome-wide significance
Five independent loci reached genome-wide significance (P < 5×10−8). The stage 1 meta-analysis identified the established TCF7L2 locus. The stage 1+2a meta-analysis identified the established KCNQ1 and HMGA2 loci. The stage 1+2a+2b meta-analysis identified a second signal at KCNQ1 and a novel HLA-B locus. Secondary analysis with additional adjustment for body mass index (BMI) in the stage 1+2a meta-analysis identified the second novel locus, at INS-IGF2 (Table 1 and Figure 1). None of the most strongly associated SNPs at these loci showed significant heterogeneity of effect sizes among studies within each stage, between African Americans in stages 1 and 2a, or between African Americans in stage 1+2a and Europeans in stage 2b after Bonferroni correction for multiple comparisons (P_het > 0.001) (Figure S1).
Two novel T2D loci were identified. The effect sizes of rs2244020, located near HLA-B, were similar in African Americans and Europeans (OR = 1.11 vs. 1.07, P_het = 0.26; stage 1+2a+2b P = 6.57×10−9) (Table 1 and Figure 2). HLA-B encodes a class I major histocompatibility complex protein involved in antigen presentation in immune responses.
The most strongly associated SNP near INS-IGF2 in African Americans was rs3842770 (OR = 1.14, P = 2.78×10−8, stage 1+2a BMI-adjusted; Table 1 and Figure 2), but the risk A allele was absent in the CEU population. Insulin plays a key role in glucose homeostasis. Mutations at INS lead to neonatal diabetes, type 1 diabetes, and hyperinsulinemia [21]. Insulin-like growth factor 2 (IGF2) is involved in growth and development. IGF2 overexpression in transgenic mice leads to islet hyperplasia [22], and IGF2 deficiency in the Goto-Kakizaki rat leads to beta-cell mass anomalies [23].
Associations at previously reported T2D and glucose homeostasis loci
We investigated index SNPs from 158 independent loci associated with T2D and/or glucose homeostasis in prior genome-wide and candidate gene studies in individuals of European, East Asian, South Asian, or African American ancestry (Table S5). Among the 104 T2D-associated index SNPs, 19 were associated with T2D in the stage 1 African American samples (P < 0.05). Most of the 17 T2D-associated SNPs that showed a consistent direction of effect had similar effect sizes in this study and in prior reports, although rs10440833 at CDKAL1 had a substantially larger effect size in Europeans (OR = 1.25) than in African Americans (OR = 1.06, P_het = 5.86×10−6). Additionally, 3 of the 54 trait-increasing alleles from glucose homeostasis-associated index SNPs were associated with increased T2D risk in African Americans (P < 0.05).
We also performed a locus-wide analysis that tested for association all SNPs in LD (r² ≥ 0.3) with the previously reported index SNPs, correcting the results for the effective number of SNPs [24]. Because the causal variant(s) at each locus may differ, or may reside on different haplotypes, across populations with different LD structures, this approach allows identification of the most strongly associated SNPs in African Americans whether or not they are in LD with the index SNPs reported in other populations. A total of 55 T2D- and 29 glucose-associated loci were associated with T2D in African Americans (P_locus < 0.05, corrected for LD in ASW for SNPs within a locus; Table S6). We compared the genetic architecture of the previously reported index SNPs and our fine-mapped SNPs at these 84 loci. The respective average risk allele frequencies were 0.51 and 0.46, and neither the distributions nor the pairwise differences of risk allele frequencies differed significantly (P = 0.255, Wilcoxon rank sum test, and P = 0.295, Wilcoxon signed-rank test, respectively; Figure S2). In contrast, the average odds ratio for the risk alleles was higher for the fine-mapped SNPs than for the index SNPs (1.14 vs. 1.05), and both the distributions and the pairwise differences of risk allele odds ratios differed significantly (P = 1.18×10−19 and 5.55×10−14, respectively; Figure S2). Thus, the locus-wide analysis identified variants with larger effect sizes and similar allele frequencies.
We leveraged differences in LD between African Americans and Europeans to fine-map and re-annotate several established loci. The association signal spanning ~100 kb at INTS8 in African Americans overlapped the ~200 kb TP53INP1 T2D locus in Europeans [3]. The most strongly associated SNP in MEDIA tended to have a larger effect size in African Americans than in Europeans (rs17359493, OR = 1.13 vs. 1.06, P = 1.39×10−7 vs. 3.20×10−2, respectively; P_het = 0.06) (Table S4). However, rs17359493, in intron 10 of INTS8, was only in weak LD with the reported index SNP rs896854 in Europeans (r² = 0.21 in CEU, 0.10 in ASW). Neither rs896854 nor its proxies from the CEU data showed significant association with T2D in African Americans (Table S6 and Figure S3a,b), suggesting that rs17359493 may be an independent novel signal. INTS8 encodes a subunit of the Integrator complex, which is involved in the cleavage of small nuclear RNAs. At KCNQ1, the most strongly associated SNP rs231356 was in weak LD with the index SNP rs231362 reported in Europeans [3] (r² = 0.24 in CEU and 0.17 in ASW). Given that rs231362 was only modestly associated with T2D in African Americans (P = 0.04) and was in weak LD (r² = 0.21 to 0.46 in CEU) with other associated SNPs in this region (Table S6 and Figure S3c,d), the results suggest that the causal variant(s) may be localized to variants in strong LD with rs231356. At HMGA2, the most strongly associated SNP rs343092 was in moderate LD with the index SNP rs1531343 (r² = 0.60 in CEU and 0.32 in ASW). Although rs1531343 and its proxies in high LD were not associated with T2D in African Americans (P > 0.05), several SNPs in moderate LD, including rs343092, showed nominal to strong associations (Table S6 and Figure S3e,f). Trans-ethnic fine mapping will be particularly useful for dissecting the causal variant(s) at this locus.

Author Summary
Despite the higher prevalence of type 2 diabetes (T2D) in African Americans than in Europeans, recent genome-wide association studies (GWAS) have been conducted primarily in individuals of European ancestry. In this study, we performed a meta-analysis of 17 GWAS comprising 8,284 cases and 15,543 controls to explore the genetic architecture of T2D in African Americans. Following replication in an additional 6,061 cases and 5,483 controls of African American ancestry and 8,130 cases and 38,987 controls of European ancestry, we identified two novel and three previously reported T2D loci reaching genome-wide significance. We also examined 158 loci previously reported to be associated with T2D or to regulate glucose homeostasis. While 56% of these loci were shared between African Americans and other populations, the strongest associations in African Americans were often found at nearby single nucleotide polymorphisms (SNPs) rather than at the original SNPs reported in other populations, owing to differences in genetic architecture across populations. Our results highlight the importance of performing genetic studies in non-European populations to fine-map causal genetic variants.

Effect of obesity on T2D susceptibility loci
We investigated the influence of obesity by comparing stage 1 meta-analysis results with and without adjustment for BMI at the 51 most significantly associated GWAS SNPs that were followed up (Tables S4 and S7) and at the 158 established T2D or glucose homeostasis index SNPs (Table S5). Association results were highly similar with and without BMI adjustment (correlation coefficients of 0.99 for both effect sizes and −log P values). Of particular note, FTO is thought to influence T2D primarily through modulation of adiposity in Europeans [3,25], but evidence across ethnic groups is contradictory [26][27][28]. The index SNP rs11642841 was not significantly associated with T2D in African Americans either without or with BMI adjustment (P = 0.06 and 0.23, respectively) (Table S5). The frequency of the risk A allele was 0.13 in this study, and the study had 100% power to detect association at the reported OR of 1.13 at a type 1 error rate of 0.05, suggesting that FTO is unlikely to be a key T2D susceptibility gene in African Americans.

Gene expression and bioinformatics analyses
Among the six genome-wide significant signals (Table 1), no coding variants were found among the most significantly associated SNPs or their proxies. These SNPs showed only weak associations in expression quantitative trait locus (eQTL) analyses (P > 0.001, Table S8). Examination of ENCODE data [29] revealed that several SNPs at TCF7L2, KCNQ1, and HMGA2 were located at protein binding sites or were predicted to alter motif affinity for transcription factors implicated in energy homeostasis (Table S9). The most strongly associated SNP in TCF7L2, rs7903146, is predicted to alter the binding affinity of a POU3F2 regulatory motif [30]. POU3F2 is a neural transcription factor that enhances the activation of genes regulated by corticotropin-releasing hormone, which stimulates adrenocorticotropic hormone (ACTH) release. ACTH is synthesized from pre-pro-opiomelanocortin (pre-POMC), which regulates energy homeostasis. For the 3′ signal at KCNQ1, several tag SNPs are predicted to alter the binding affinity of regulatory motifs, including SREBP, CTCF, and HNF4A. SREBP is a transcription factor involved in sterol biosynthesis. CTCF regulates the expression of IGF2 [31]. HNF4A is a master regulator of hepatocyte and islet transcription. The tag SNP rs2257883 at HMGA2 is predicted to alter the binding affinity of MEF2, which regulates GLUT4 transcription in insulin-responsive tissues [32].

Discussion
We have performed the largest genetic association analysis of T2D in African Americans to date. Our data support the hypothesis that risk for T2D is partly attributable to a large number of common variants with small effects [7]. We identified HLA-B and INS-IGF2 as novel T2D loci, the latter specific to African Americans. We found evidence supporting association at 88 previously identified T2D and glucose homeostasis loci. Taken together, these 90 loci yielded a sibling relative risk of 1.19. The phenotypic variance explained on the liability scale is substantially larger in African Americans than in European Americans (17.5% vs. 5.7%) [7], owing to the larger effect sizes obtained upon fine-mapping as well as the higher disease prevalence in African Americans.
The two novel T2D loci, HLA-B and INS-IGF2, have been implicated in type 1 diabetes (T1D) risk in Europeans [33][34][35]. One limitation of our study is the lack of autoantibody measurements. However, our results are unlikely to be confounded by misclassified patients. Among diabetic youth aged <20 years, T2D characterized by insulin resistance without autoimmunity is more prevalent in African Americans (40.1%) than in European Americans (6.2%), while African Americans less often present with autoimmunity and insulin deficiency resembling T1D than European Americans do (32.5% vs. 62.9%, respectively) [36]. Autoimmunity is also uncommon in African American diabetic adults [37]. Furthermore, associations for T1D are stronger at HLA class II (HLA-DRB1, -DQA1, and -DQB1) than at HLA class I regions in Europeans [33][34][38][39][40][41] (http://www.t1dbase.org). In African Americans, individuals with T1D showed both shared and unique risk and protective HLA class II haplotypes compared with European individuals with T1D [42][43]. More importantly, these individuals also showed substantially stronger associations at HLA class II (P < 1×10−25) than at class I regions (P < 1×10−5) [42], in contrast to our finding of stronger associations at HLA class I than at class II regions in individuals with T2D (HLA-B, Figure S4). The observed HLA-B association may be due to LD with nearby causal gene(s), since there is long-range LD in this region. Recently, rs3130501 near POU5F1 and TCF19 was reported to be associated with T2D in a trans-ancestry meta-analysis [8]. rs3130501 is located 211 kb upstream of rs2244020 and maps to the same LD interval. However, the two SNPs were not correlated in either CEU (D′ = 0.57, r² = 0.05) or ASW (D′ = 0.68, r² = 0.16) from 1KGP, and rs3130501 was not strongly associated with T2D in the stage 1 meta-analysis (P = 0.04). Other potential non-HLA candidate genes may include TNFA, which regulates immune and inflammatory responses.
It has been hypothesized that activated innate and adaptive immune cells stimulate the release of cytokines such as TNFα and IL-1β, which promote both systemic insulin resistance and β-cell damage [44]. On the other hand, evidence has implicated the T1D loci HLA-DQ/DR, GLIS3, and INS in susceptibility to latent autoimmune diabetes in adults (LADA) and/or T2D [7,34,45,46], while the T2D loci PPARG and TCF7L2 have been associated with T1D [47] and LADA [46,48], respectively. More comprehensive studies are needed to understand the shared and distinct genetic risks of different forms of diabetes, which will facilitate diagnosis and personalized treatment.
Our results have several implications for the genetic architecture of T2D. First, fine-mapping suggests that currently known loci explain more of the risk than previously estimated. Second, the loci conferring the largest risk for T2D appear to act through regulatory rather than protein-coding changes. Third, many, but not all, of the previously identified T2D loci are shared across ancestries. The differential LD structure of African-ancestry populations at shared loci provides an opportunity for fine mapping in trans-ethnic meta-analysis. Fourth, the ~2.6M MEDIA SNPs achieved only 43.3% coverage of the 1KGP ASW common SNPs, suggesting that risk loci specific to individuals of African ancestry are difficult to discover with the genotyping arrays currently in use. Large-scale sequencing studies, such as whole-genome, exome, and targeted resequencing of associated non-coding regions, will be necessary to further delineate the causal variants for T2D risk in African Americans.

African American subjects (6,061 cases and 5,483 controls), using in silico replication of GWAS data from eMERGE and IPM Biobank and de novo genotyping in IRAS, IRASFS, SCCS, and WFSM. In general, T2D cases were defined as meeting at least one of the following criteria: fasting plasma glucose ≥126 mg/dl, 2-hour glucose during an oral glucose tolerance test (OGTT) ≥200 mg/dl, random glucose ≥200 mg/dl, treatment with oral hypoglycemic agents or insulin, or physician-diagnosed diabetes. All cases were diagnosed at age ≥25 years (or were aged ≥25 years at study if age at diagnosis was not available). For cohort studies, individuals who met the criteria at any visit were defined as cases. Controls with normal glucose tolerance (NGT) were required to satisfy all of the following criteria: fasting plasma glucose <100 mg/dl, 2-hour OGTT glucose <140 mg/dl (if available), no treatment for diabetes, and age ≥25 years.
For cohort studies, individuals who met the criteria at all visits were defined as controls. All study participants provided written informed consent, except in eMERGE, which uses an opt-out program, and approval was obtained from the institutional review board (IRB) of each local institution. Detailed descriptions of the participating studies are provided in Text S1.

Genotyping, imputation and quality control
For stage 1 and stage 2 GWAS, genotyping was performed with Affymetrix or Illumina genome-wide SNP arrays. Missing genotypes were imputed with MACH [49], IMPUTE2 [50], or BEAGLE [51] using HapMap reference haplotypes. In each study, samples that were duplicates or population outliers, or that had a low call rate or a sex mismatch, were excluded. In general, SNPs were excluded if they had a call rate < 0.95, minor allele frequency (MAF) < 0.01, minor allele count < 10, Hardy-Weinberg P < 1×10−4, or imputation quality score < 0.5 (Table S3). For de novo replication studies, genotyping was performed using the Sequenom MassArray platform (Sequenom; San Diego, CA). Sample and SNP quality control were performed as for the GWAS data.
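The SNP exclusion rules above amount to a conjunction of thresholds. The sketch below applies them to per-SNP summaries. The `SnpSummary` container and the helper names are hypothetical, and the Hardy-Weinberg P value uses the asymptotic 1-df chi-square test rather than an exact test, which is an assumption; individual studies may have used other HWE tests and other imputation-quality metrics.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpSummary:
    """Per-SNP summary for one study (hypothetical container, not a MEDIA file format)."""
    snp_id: str
    call_rate: float
    maf: float
    minor_allele_count: int
    hwe_p: float
    imputation_quality: float  # e.g. MACH Rsq or IMPUTE2 info; 1.0 for genotyped SNPs


def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Asymptotic 1-df chi-square test of Hardy-Weinberg equilibrium (not the exact test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chisq / 2))


def passes_qc(s: SnpSummary, min_call=0.95, min_maf=0.01, min_mac=10,
              min_hwe_p=1e-4, min_info=0.5) -> bool:
    """Keep a SNP only if it meets none of the exclusion criteria."""
    return (s.call_rate >= min_call
            and s.maf >= min_maf
            and s.minor_allele_count >= min_mac
            and s.hwe_p >= min_hwe_p
            and s.imputation_quality >= min_info)


snps = [
    SnpSummary("rs_good", 0.99, 0.25, 4000, hwe_chisq_p(25, 50, 25), 0.95),
    SnpSummary("rs_hwe_fail", 0.99, 0.50, 100, hwe_chisq_p(50, 0, 50), 1.0),
    SnpSummary("rs_low_info", 0.99, 0.10, 800, 0.6, 0.3),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)  # ['rs_good']
```

Keeping the thresholds as keyword arguments makes it easy to reproduce the study-specific variations listed in Table S3.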

Statistical analysis
Single-SNP association analysis was performed in each study by regressing T2D case/control status on genotype. To account for uncertainty in imputed genotypes, genotype probabilities or dosages were used in the association tests of imputed SNPs. The association tests assumed an additive genetic model and were adjusted for age, sex, study center, and principal components. Principal components were included to control for confounding by admixture proportion and population structure. Secondary analysis with additional adjustment for BMI was performed for SNPs with P < 1×10−5 in the stage 1 meta-analysis and for index SNPs previously reported to be associated with T2D or glucose homeostasis traits. BMI adjustment can increase power to detect T2D loci that act independently of BMI, and it attenuates associations at T2D loci whose effects are mediated through BMI. Logistic regression was used for samples of unrelated individuals. Generalized estimating equations [52] or SOLAR [53] were used for samples of related individuals. Association results with extreme values (absolute beta coefficient or standard error > 10), primarily due to low cell counts resulting from small sample sizes and/or low minor allele frequencies, were excluded (Table S3).
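For unrelated individuals, the additive-model test described above is a logistic regression of case/control status on allele dosage plus covariates. The sketch below fits it by Newton-Raphson (IRLS) with NumPy and returns the Wald statistic for the dosage term. It is a minimal illustration on simulated data, not the software any MEDIA study used, and it does not cover the GEE or SOLAR analyses of related samples. The function name and the simulated effect sizes are assumptions.

```python
import numpy as np


def logistic_additive_test(status, dosage, covariates, max_iter=50, tol=1e-10):
    """Additive-model logistic regression fitted by Newton-Raphson (IRLS).

    status     : (n,) 0/1 case-control labels
    dosage     : (n,) expected risk-allele count in [0, 2] (hard call or imputed dosage)
    covariates : (n, k) matrix, e.g. age, sex, study-site indicators, principal components
    Returns (beta, se, z) for the dosage term; exp(beta) is the per-allele odds ratio.
    """
    y = np.asarray(status, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(dosage, dtype=float),
                         np.asarray(covariates, dtype=float)])
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # Fisher information X'WX
        step = np.linalg.solve(info, X.T @ (y - mu))   # Newton step (X'WX)^-1 X'(y - mu)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    cov = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    se = float(np.sqrt(cov[1, 1]))
    return float(beta[1]), se, float(beta[1]) / se


# Simulated example: true per-allele log-OR of 0.3, adjusted for one covariate
rng = np.random.default_rng(1)
n = 20000
dosage = rng.binomial(2, 0.3, n).astype(float)
age = rng.normal(0.0, 1.0, n)
logit = -1.0 + 0.3 * dosage + 0.2 * age
status = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
beta, se, z = logistic_additive_test(status, dosage, age[:, None])
print(f"OR = {np.exp(beta):.3f}, SE = {se:.4f}, z = {z:.1f}")
```

Passing fractional dosages instead of hard calls is the simplest way to carry imputation uncertainty into the test. Results with |beta| or SE > 10 would then be dropped, as described above.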

Meta-analysis
In stage 1, association results were combined using an inverse variance-weighted fixed effect model in the METAL software [12]. Genomic control correction [11] was applied to each study before meta-analysis and to the overall results after meta-analysis. Results for SNPs genotyped in <10,000 samples and for SNPs with allele frequency differences >0.3 among studies were excluded. A total of 2,579,389 SNPs were analyzed in the meta-analysis (Table S3). In stage 2a, association results from the African American replication studies were also combined using a fixed effect inverse variance-weighted method. To assess the overall effects in African Americans (stage 1+2a) and in both African Americans and Europeans (stage 1+2a+2b), association results from studies in the respective stages were combined using the same fixed effect inverse variance-weighted method. Genome-wide significance was declared at P < 5×10−8 in the meta-analysis of all stages combined, a joint-analysis strategy that has greater power than a replication-based strategy [54].
For the 51 SNPs carried forward for replication, heterogeneity of effect sizes across studies within each stage was assessed using Cochran's Q statistic as implemented in METAL. Meta-analysis results from stages 1 and 2a, and from stage 1+2a and stage 2b, were used to assess heterogeneity of effect sizes between the discovery and replication stages in African Americans and between African Americans and Europeans, respectively. For SNPs with significantly heterogeneous effect sizes after correction for multiple comparisons (P_het < 0.001), meta-analysis results including studies from all stages, estimated with the random effects model implemented in GWAMA [55], were reported. Heterogeneous associations may be partly due to differences in ascertainment schemes across studies. For index SNPs reported in prior studies, heterogeneity between prior studies and this study, assessed using Cochran's Q statistic, was also reported.
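Cochran's Q is the weighted sum of squared deviations of study estimates from the fixed-effect mean. Under homogeneity it is compared against a chi-square distribution with k − 1 degrees of freedom. The sketch below computes Q, its P value, and a random-effects estimate. We assume the DerSimonian-Laird method-of-moments estimator for the between-study variance τ²; GWAMA's exact implementation should be checked against its documentation. The closed-form chi-square tail for integer degrees of freedom avoids a SciPy dependency. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def chisq_sf(x, df):
    """Upper-tail probability of a chi-square with integer df (closed form, no SciPy)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # even df: exp(-h) * sum_{k=0}^{df/2 - 1} h^k / k!
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
    # odd df: erfc(sqrt(h)) + exp(-h) * sum_{k=1}^{(df-1)/2} h^(k-1/2) / Gamma(k+1/2)
    tail = sum(h ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail


def heterogeneity_and_random_effects(betas, ses):
    """Cochran's Q, its P value, and a DerSimonian-Laird random-effects estimate."""
    if len(betas) < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    b_fixed = sum(wi * bi for wi, bi in zip(w, betas)) / sw
    q = sum(wi * (bi - b_fixed) ** 2 for wi, bi in zip(w, betas))
    df = len(betas) - 1
    p_het = chisq_sf(q, df)
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / (sw - sum(wi * wi for wi in w) / sw))
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    b_re = sum(wi * bi for wi, bi in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    p_re = 2.0 * _NORMAL.cdf(-abs(b_re / se_re))
    return {"Q": q, "P_het": p_het, "tau2": tau2,
            "beta_re": b_re, "se_re": se_re, "P_re": p_re}


res = heterogeneity_and_random_effects([0.0, 0.4], [0.1, 0.1])
print(res)
```

In this toy example Q = 8 and τ² = 0.07, so the random-effects SE (0.2) is much wider than the fixed-effect SE. This is the behavior that motivates reporting random-effects results for SNPs with P_het < 0.001.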

Transferability analysis
Index SNPs associated with T2D or glucose homeostasis traits in prior GWAS and candidate gene studies were tested for association with T2D in African Americans (Table S5). For the index SNP association tests, a per-SNP P < 0.05 was considered significant. In the locus-wide analysis, the boundaries of a locus were defined by the most distant markers (within ±500 kb) in LD (r² ≥ 0.3) with the index SNP in the 1KGP CEU data. All MEDIA SNPs within these bounds were tested for association. All pairwise LD values within each locus were estimated using the 1KGP CEU and ASW data. To estimate the effective number of SNPs at a locus, we retrieved genotypes from the 1KGP ASW data for markers present in MEDIA, estimated the sample covariance matrix from those genotypes, and spectrally decomposed the covariance matrix [24]. The effective number of SNPs, N_eff, was then estimated from the eigenvalues λ_k of the K×K covariance matrix for the K SNPs in the locus [24]. The per-locus significance level was defined as 0.05/N_eff (Table S6). Because all SNPs within the LD bounds are accounted for, the per-locus significance level is corrected both for markers in LD with the index SNP and for markers not in LD with it, potentially allowing discovery of new associations at markers not tagged by the index SNP.
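The spectral-decomposition step above can be sketched with NumPy: build the K×K genotype covariance matrix, take its eigenvalues, summarize them as an effective number of tests, and divide 0.05 by it. The specific summary used here, N_eff = (Σλ_k)² / Σλ_k², is an assumption. It is one of several eigenvalue-based estimators and may not be the formula of reference [24]. The function names and toy genotypes are hypothetical.

```python
import numpy as np


def effective_number_of_snps(genotypes):
    """Effective number of SNPs from eigenvalues of the genotype covariance matrix.

    genotypes: (n_individuals, K) allele-count matrix for the K SNPs in one locus.
    ASSUMPTION: N_eff = (sum_k lambda_k)^2 / sum_k lambda_k^2; reference [24] may
    use a different eigenvalue-based estimator.
    """
    G = np.asarray(genotypes, dtype=float)
    cov = np.atleast_2d(np.cov(G, rowvar=False))        # K x K sample covariance
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)   # symmetric -> real eigenvalues
    return float(eig.sum() ** 2 / (eig ** 2).sum())


def per_locus_alpha(genotypes, alpha=0.05):
    """Per-locus significance threshold: alpha / N_eff."""
    return alpha / effective_number_of_snps(genotypes)


snp1 = [0, 1, 0, 1]
snp2 = [0, 0, 1, 1]
independent = np.column_stack([snp1, snp2])  # uncorrelated, equal variance
duplicated = np.column_stack([snp1, snp1])   # perfect LD
print(effective_number_of_snps(independent), effective_number_of_snps(duplicated))
print(per_locus_alpha(independent))
```

The two extremes show the intended behavior: SNPs in perfect LD count as a single test, whereas uncorrelated SNPs each count fully. A locus made up entirely of monomorphic SNPs would give a zero denominator and should be handled separately.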

Liability-scale variance explained
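For each independent locus, we estimated the sibling relative risk using the most strongly associated SNP within that locus. Let p_i and γ_i be the risk allele frequency and the corresponding odds ratio at the i-th SNP, respectively. Assuming the additive genetic model and independence between SNPs, the contribution of a set of N SNPs to the sibling relative risk λ_s is the product of the per-SNP contributions,

$$\lambda_s = \prod_{i=1}^{N} \lambda_{s,i},$$

where λ_{s,i} is the sibling relative risk attributable to the i-th SNP, a function of p_i and γ_i. The proportion of phenotypic variance explained on the liability scale, h²_L, was then obtained from λ_s and the disease prevalence K as

$$h^2_L = \frac{2\left[T - T_1\sqrt{1 - \left(1 - T/i\right)\left(T^2 - T_1^2\right)}\right]}{i + T_1^2\,(i - T)},$$

with T = Φ^{-1}(1 − K), T_1 = Φ^{-1}(1 − λ_s K), and i = z/K, where Φ^{-1} represents the standard normal quantile function and z represents the standard normal density at T [57].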
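The calculation described above, from per-SNP allele frequencies and odds ratios to λ_s and then to variance on the liability scale, can be sketched as follows. Two assumptions are involved. First, per-SNP sibling relative risks are computed under a per-allele multiplicative risk model that treats the odds ratio as a relative risk; with X = γ for a risk allele and 1 otherwise, each parental allele slot contributes ½ + ½·E[X²]/E[X]². Second, λ_s is converted to h²_L with a Falconer-type threshold formula using T = Φ⁻¹(1 − K), T₁ = Φ⁻¹(1 − λ_s K), and i = z/K. Neither is guaranteed to match the authors' exact implementation. The prevalence value in the usage line is illustrative only.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def sibling_rr_single_snp(p, gamma):
    """Sibling relative risk from one SNP under an ASSUMED per-allele multiplicative
    risk model, treating the odds ratio gamma as a relative risk.

    With X = gamma for a risk allele and 1 otherwise, each parental allele slot
    contributes (1/2 + 1/2 * E[X^2] / E[X]^2); the two slots are independent.
    """
    ex = 1.0 - p + p * gamma          # E[X]
    ex2 = 1.0 - p + p * gamma ** 2    # E[X^2]
    return (0.5 + 0.5 * ex2 / ex ** 2) ** 2


def combined_sibling_rr(loci):
    """Product of per-SNP sibling relative risks, assuming independent loci."""
    return math.prod(sibling_rr_single_snp(p, g) for p, g in loci)


def liability_variance_explained(lambda_s, prevalence):
    """Falconer-type conversion of a sibling relative risk to h^2 on the liability scale."""
    K = prevalence
    T = _NORMAL.inv_cdf(1.0 - K)              # population liability threshold
    T1 = _NORMAL.inv_cdf(1.0 - lambda_s * K)  # threshold implied by sibling risk lambda_s*K
    i = _NORMAL.pdf(T) / K                    # mean liability of affected individuals
    num = T - T1 * math.sqrt(1.0 - (1.0 - T / i) * (T ** 2 - T1 ** 2))
    den = i + T1 ** 2 * (i - T)
    return 2.0 * num / den


# Hypothetical loci (risk allele frequency, odds ratio) and an illustrative prevalence
loci = [(0.30, 1.36), (0.50, 1.10), (0.46, 1.14)]
lam = combined_sibling_rr(loci)
print(round(lam, 4), round(liability_variance_explained(lam, 0.187), 4))
```

The formula returns zero when λ_s = 1, as it should. It also grows with λ_s and with the prevalence K, which is consistent with the Discussion's point that higher prevalence contributes to the larger variance explained in African Americans.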

Coverage
The coverage of the genome by MEDIA SNPs was estimated using HaploView [58] via pairwise tagging at an r² threshold of 0.8. We used all SNPs with minor allele frequency ≥1% in both MEDIA and the 1KGP ASW sequence data. Coverage was estimated in non-overlapping bins of 1,000 SNPs.
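Pairwise-tagging coverage is the fraction of reference SNPs that are either present in the SNP set or in LD (r² ≥ 0.8) with at least one SNP in the set. The sketch below computes it from reference genotypes. One assumption should be noted: it uses genotype (composite) r² from allele counts, whereas HaploView estimates haplotype-based r² and restricts comparisons to a distance window. The numbers will therefore not match HaploView exactly. The function names and toy genotypes are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two allele-count vectors (genotype r^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as untagged
    return sxy * sxy / (sxx * syy)


def pairwise_tagging_coverage(reference, array_ids, threshold=0.8):
    """Fraction of reference SNPs on the array or tagged (r^2 >= threshold) by an array SNP.

    reference: dict mapping SNP id -> allele counts across reference samples
    array_ids: ids of SNPs available in the analyzed set
    """
    array_ids = set(array_ids)
    tags = [reference[s] for s in array_ids if s in reference]
    covered = 0
    for snp_id, genos in reference.items():
        if snp_id in array_ids or any(genotype_r2(genos, t) >= threshold for t in tags):
            covered += 1
    return covered / len(reference)


reference = {
    "a": [0, 1, 2, 0],
    "b": [0, 1, 2, 0],  # perfect proxy of "a"
    "c": [2, 0, 1, 1],  # weakly correlated with "a" (r^2 about 0.18)
}
print(pairwise_tagging_coverage(reference, {"a"}))
```

Running the function separately on each consecutive block of 1,000 reference SNPs would mirror the binned estimate described above.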

Power analysis
Study power was calculated using the Genetic Power Calculator [59]. For SNPs with MAF ≥ 0.3, our study had >80% power in stage 1 samples under an additive model to detect odds ratios for T2D of ≥1.06 at P < 0.05 and ≥1.13 at P < 5×10−8. The observed odds ratios among our most significantly associated stage 1 SNPs (P < 1×10−5) ranged from 1.11 to 1.56 (Table S4). Given the African American sample size in stage 1+2a, our study had >80% power to detect OR ≥ 1.1 at P < 5×10−8 for MAF ≥ 0.3, thus providing good power to detect genome-wide significance among the most significantly associated SNPs using all African American samples. For T2D SNPs reported in the literature, power in stage 1 samples was also calculated at P < 0.05 and P < 5×10−8 from the reported effect size and the risk allele frequency in this study (Table S5).
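To show how power depends on allele frequency, odds ratio, sample size, and α, here is a normal-approximation sketch for a two-sided allelic case-control test. It is a swapped-in approximation, not the Genetic Power Calculator's algorithm, so its numbers will differ somewhat from those reported above. The following are assumptions: Hardy-Weinberg equilibrium, a per-allele odds ratio, the risk-allele frequency taken from controls, and 2N independent alleles per group.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def allelic_test_power(p_control, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic case-control test (normal approximation).

    ASSUMPTIONS: Hardy-Weinberg equilibrium, a per-allele odds ratio, p_control is the
    risk-allele frequency in controls, and 2N independent alleles per group.
    """
    odds_case = p_control / (1.0 - p_control) * odds_ratio
    p_case = odds_case / (1.0 + odds_case)  # risk-allele frequency in cases
    diff = p_case - p_control
    se = math.sqrt(p_case * (1 - p_case) / (2 * n_cases)
                   + p_control * (1 - p_control) / (2 * n_controls))
    z_crit = _NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic falls in either rejection region
    return (_NORMAL.cdf(abs(diff) / se - z_crit)
            + _NORMAL.cdf(-abs(diff) / se - z_crit))


# Stage 1 sample sizes from the text; risk-allele frequency 0.3
for or_, alpha in [(1.06, 0.05), (1.13, 5e-8)]:
    print(or_, alpha, round(allelic_test_power(0.3, or_, 8284, 15543, alpha), 3))
```

Under the null (OR = 1) the function returns exactly α, which serves as a quick sanity check on the formula.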

Gene expression analysis
The MuTHER resource (www.muther.ac.uk) includes lymphoblastoid cell lines (LCLs), skin, and adipose tissue derived simultaneously from a subset of well-phenotyped healthy female twins from the TwinsUK adult registry [60]. Whole-genome expression profiling of the samples, each with two or three technical replicates, was performed using Illumina Human HT-12 V3 BeadChips (Illumina Inc.) according to the manufacturer's protocol. Log2-transformed expression signals were normalized separately for each tissue: quantile normalization was performed across the technical replicates of each individual, followed by quantile normalization across all individuals. Genotyping was performed with a combination of Illumina arrays (HumanHap300, HumanHap610Q, 1M-Duo, and 1.2MDuo 1M). Untyped HapMap2 SNPs were imputed using the IMPUTE2 software package. In total, 776 adipose and 777 LCL samples had both expression profiles and imputed genotypes. Associations between normalized expression values and all SNPs (MAF > 5%, IMPUTE info > 0.8) within a gene or within 1 Mb of its transcription start or end site were tested with the GenABEL/ProbABEL packages [61][62], using the polygenic linear model incorporating a kinship matrix in GenABEL followed by the ProbABEL mmscore score test on imputed genotypes. Age and experimental batch were included as cofactors.
Genotypes and LCL gene expression data for HapMap samples were also available [63]. Associations between genotypes and the expression of transcripts within 1 Mb of the tested SNPs were analyzed separately in the CEU and YRI populations using the variance components model implemented in SOLAR, which accounts for correlation among related individuals [53].
In this study, we examined whether the most significantly associated SNPs from the six genome-wide significant signals, and their proxies (r² ≥ 0.8 in ASW) within 1 Mb of these SNPs, were cis-expression quantitative trait loci (eQTLs) in lymphoblastoid cell lines (LCLs) or adipose tissue (Table S8).

ENCODE data analysis
We examined the putative function of non-coding genome-wide significant SNPs and their proxies within 1 Mb (r² ≥ 0.8 in 1KGP ASW) using HaploReg [30] and RegulomeDB [64]. These databases integrate multiple chromatin features from the Encyclopedia of DNA Elements (ENCODE) project [29]. High priority was given to variants annotated as protein-binding via ChIP-seq or as motif-changing via position weight matrices, when the respective transcription factors were implicated in diabetes pathogenesis and related biological processes.

Figure S1 Forest plots of the most strongly associated SNPs at five previously and newly identified T2D loci in African Americans. Odds ratios and 95% CIs are presented for individual studies (black circle and line) and meta-analysis results (red diamond and line). At KCNQ1, two independent associated SNPs are shown. (PDF)

Figure S2

Text S1 Description of GWAS and replication studies. (PDF)