Consistent Association of Type 2 Diabetes Risk Variants Found in Europeans in Diverse Racial and Ethnic Groups

It has been recently hypothesized that many of the signals detected in genome-wide association studies (GWAS) to T2D and other diseases, despite being observed to common variants, might in fact result from causal mutations that are rare. One prediction of this hypothesis is that the allelic associations should be population-specific, as the causal mutations arose after the migrations that established different populations around the world. We selected 19 common variants found to be reproducibly associated to T2D risk in European populations and studied them in a large multiethnic case-control study (6,142 cases and 7,403 controls) among men and women from 5 racial/ethnic groups (European Americans, African Americans, Latinos, Japanese Americans, and Native Hawaiians). In analysis pooled across ethnic groups, the allelic associations were in the same direction as the original report for all 19 variants, and 14 of the 19 were significantly associated with risk. In summing the number of risk alleles for each individual, the per-allele associations were highly statistically significant (P<10−4) and similar in all populations (odds ratios 1.09–1.12) except in Japanese Americans the estimated effect per allele was larger than in the other populations (1.20; Phet = 3.8×10−4). We did not observe ethnic differences in the distribution of risk that would explain the increased prevalence of type 2 diabetes in these groups as compared to European Americans. The consistency of allelic associations in diverse racial/ethnic groups is not predicted under the hypothesis of Goldstein regarding “synthetic associations” of rare mutations in T2D.

It has recently been argued that single rare causal variants and/or collections of multiple different rare variants on unrelated haplotypes may create "synthetic associations" of common variants with disease risk [19][20][21]. One prediction of this model is that the associations with common variants will not be consistent across populations (since many of the mutations will be young in age, and post-date the migrations that led to the founding of modern continental populations). Type 2 diabetes has been specifically discussed as a possible case in which synthetic associations might be operative, based on the lack of statistical significance in very small studies that examined allelic associations for T2D in multi-ethnic samples.
Testing the association of each validated risk allele for T2D in multiple populations is an important step toward (a) determining whether these genetic markers can be used to better understand population risk in non-European populations, (b) measuring their association with racial/ethnic variation in disease risk, and (c) testing a prediction of the Goldstein "common SNP, rare mutation" hypothesis [19][20][21].
Comparing estimates of genetic risk among racial/ethnic groups requires large studies composed of cases and controls defined using identical criteria and, ideally, sampled from the same study population. In the present study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) Study, we examined genetic associations with 19 validated risk alleles for T2D in European American, African American, Latino, Japanese American, and Native Hawaiian T2D cases (n = 6,142) and controls (n = 7,403) from the population-based Multiethnic Cohort study (MEC). We also evaluated whether these variants can be utilized to model the genetic risk of T2D in each population and their association to disparities in risk.

Author Summary
Single rare causal alleles and/or collections of multiple rare alleles have been suggested to create "synthetic associations" with common variants in genome-wide association studies (GWAS). This model predicts that associations with common variants will not be consistent across populations. In this study, we examined 19 T2D variants for association with T2D risk in 6,142 cases and 7,403 controls from five racial/ethnic populations in the Multiethnic Cohort (European Americans, African Americans, Latinos, Japanese Americans, and Native Hawaiians). In racial/ethnic pooled analysis, all 19 variants were associated with T2D risk in the same direction as previous reports in Europeans, and the sum total of risk variants was significantly associated with T2D risk in each racial/ethnic group. The consistent associations across populations do not support the Goldstein hypothesis that rare causal alleles underlie GWAS signals. We also did not find evidence that these markers underlie racial/ethnic disparities in T2D prevalence. Large-scale GWAS and sequencing studies in these populations are necessary in order to both improve the current set of markers at these risk loci and identify new risk variants for T2D that may be difficult, or impossible, to detect in European populations.

Results
The age of the cases and controls ranged from 45 to 77 at cohort entry, with the mean age of cases (mean 59.0 years) being essentially the same as the controls (mean 58.8 years), and African Americans being on average the oldest (mean 60.2 years) and Native Hawaiians the youngest (mean 55.6 years). Compared to controls, cases were heavier, more likely to be a current or former smoker, less physically active and had fewer years of education (Table 1). Compared to the other groups, the Japanese were leaner (for cases and controls, men and women).
We first assessed whether the "risk allele" of each SNP was associated in the same direction (odds ratio >1) in each ethnic group. Whereas the null hypothesis is that 50% of "risk" alleles would trend in the same direction, we observed from 12 (63%; P = 0.18; binomial probability) in European Americans to 19 (100%; P = 1.9×10−6) in Japanese Americans. The number of these associations that reached nominal significance (P<0.05) ranged from 3 (P = 0.067; binomial probability) in Native Hawaiians to 10 (P = 5.9×10−9) in Japanese (Table 2). For the majority of alleles with positive associations, odds ratios for homozygous carriers were greater than for heterozygous carriers in each population, which provides support for their associations and allele dosage effects (Table S2). In African Americans, results were similar after adjustment for percent European ancestry (Table S3). Adjustment for education, a proxy for socio-economic status (SES) and European ancestry, did not influence the results (Table S4) [23].
We next performed analyses that combined evidence for association across the five ethnic groups. In this analysis the power to achieve nominal significance for the allelic effects reported previously was >80% for 18 out of 19 alleles (average 94%; Table S1). In this analysis all 19 (100%; P = 1.9×10−6, binomial probability) variants were associated with risk in the same direction as the initial report (odds ratio >1) and 14 (P = 5.7×10−15; binomial probability) with nominal statistical significance (P<0.05). All 19 associations remained in the same direction as previous reports (OR >1) and 13 of the variants were significantly associated with T2D risk when the European American subjects were excluded from the analysis. The association of rs8050136 in FTO was attenuated by adjustment for BMI (odds ratio (95% confidence interval), 1.06 (1.00-1.11) prior to adjustment; 1.02 (0.96-1.08) after adjustment). Only 5 of the 19 risk variants showed nominal evidence for heterogeneity in the odds ratio across ethnic groups, and only one of these (CDKAL1) was significant after correction for having performed 19 tests (PPARG, rs1801282, Phet = 0.048; WFS1, rs10010131, Phet = 0.032; CDKAL1, rs7754840, Phet = 6.2×10−4; HHEX, rs1111875, Phet = 0.037; and HNF1B, rs4430796, Phet = 0.043; Table 2).
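The binomial probabilities quoted in the Results for directional consistency and for nominal significance can be checked with a short calculation. The following is a minimal sketch, assuming a one-sided upper-tail binomial test with a null probability of 0.5 for direction and 0.05 for nominal significance; the function and variable names are illustrative and do not come from the study.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided probability of observing k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_VARIANTS = 19  # number of risk variants tested

# Directional consistency: under the null, each allele has a 50% chance of OR > 1.
p_direction_12_of_19 = binomial_upper_tail(12, N_VARIANTS, 0.5)
p_direction_19_of_19 = binomial_upper_tail(19, N_VARIANTS, 0.5)

# Nominal significance: under the null, each test has a 5% chance of P < 0.05.
p_significant_3_of_19 = binomial_upper_tail(3, N_VARIANTS, 0.05)
p_significant_10_of_19 = binomial_upper_tail(10, N_VARIANTS, 0.05)

print(f"12/19 in same direction: P = {p_direction_12_of_19:.2f}")
print(f"19/19 in same direction: P = {p_direction_19_of_19:.1e}")
print(f"3/19 nominally significant: P = {p_significant_3_of_19:.3f}")
print(f"10/19 nominally significant: P = {p_significant_10_of_19:.1e}")
```

Under these assumptions the four values agree with the probabilities reported for European Americans, Japanese Americans, and Native Hawaiians in the text.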

Summary Measures of Risk
We next calculated a summary risk score comprised of an unweighted count of the 19 risk-associated alleles. The average increment in risk per allele was generally similar in all populations, except Japanese Americans, where the effect of each allele was nearly double that observed in Europeans (odds ratio, 95% confidence interval: African Americans, 1.09, 1.05-1.12 (P = 3.0×10−6); Native Hawaiians, 1.10, 1.06-1.15 (P = 1.2×10−5); European Americans, 1.11, 1.06-1.17 (P = 1.2×10−5); Latinos, 1.12, 1.09-1.14 (P = 7.5×10−19); and Japanese, 1.20, 1.17-1.24 (P = 7.0×10−32); Phet = 3.8×10−4). Results were similar when limiting the analysis to individuals with complete genotype data for all variants and when including only those markers associated with risk (at P<0.10) (Table S5). Individuals in the top quartile of the risk allele distribution were at 1.6-fold (African Americans, P = 5.3×10−4) to 3.1-fold (Japanese Americans, P = 7.9×10−26) greater risk of diabetes compared to those in the lowest quartile (Table 3). Using these ethnic-specific per allele odds ratio estimates and the aggregate risk allele counts, we built a quantitative risk model to compare the distribution of genetic risks between populations associated with these marker alleles. The higher average number of risk alleles in African Americans caused their distribution to be slightly right-shifted (towards higher log ORs) compared to European Americans; however, their relatively smaller per allele odds ratio resulted in wide overlap with the European American distribution (Figure 2). The Japanese Americans had a wider distribution of risk because of the large per allele odds ratio, but the low average risk allele counts caused the Japanese distribution to be left-shifted (towards lower log ORs) compared to European Americans. The distributions for Latinos and Native Hawaiians were very similar to the European Americans.
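To illustrate why a larger per-allele odds ratio widens the distribution of risk, the sketch below compounds the reported per-allele odds ratios over a fixed difference in risk allele count, assuming a log-additive model. The five-allele difference is a hypothetical value chosen only for illustration, and the names are not from the study.

```python
import math

# Per-allele odds ratios reported in the text for the unweighted risk score.
PER_ALLELE_OR = {
    "African American": 1.09,
    "Native Hawaiian": 1.10,
    "European American": 1.11,
    "Latino": 1.12,
    "Japanese American": 1.20,
}

ALLELE_DIFFERENCE = 5  # hypothetical difference in risk allele count


def odds_ratio_for_allele_difference(per_allele_or: float, n_alleles: int) -> float:
    """Odds ratio implied by n_alleles extra risk alleles under a log-additive model."""
    return math.exp(n_alleles * math.log(per_allele_or))


for group, per_allele_or in PER_ALLELE_OR.items():
    implied = odds_ratio_for_allele_difference(per_allele_or, ALLELE_DIFFERENCE)
    print(f"{group}: OR for +{ALLELE_DIFFERENCE} alleles = {implied:.2f}")

# Ratio of per-allele effects on the log-odds scale, Japanese vs. European Americans.
log_scale_ratio = math.log(PER_ALLELE_OR["Japanese American"]) / math.log(
    PER_ALLELE_OR["European American"]
)
print(f"log-odds ratio of effects: {log_scale_ratio:.2f}")
```

Because the same allele difference maps to a larger odds ratio when the per-allele effect is larger, the log OR distribution for Japanese Americans spreads more widely around its center.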

Discussion
We tested 19 common genetic risk markers that were discovered in European populations. We found that association with all 19 of these SNPs trended in the same direction in this large multiethnic study, and the majority of these variants were nominally significant in their association with diabetes risk. A risk score comprised of these alleles was significantly associated with diabetes risk in all five racial/ethnic groups, with the only significant heterogeneity being larger effect sizes in Japanese Americans. However, in comparing the distribution of risk conferred by these alleles between populations we found that they explain little, if any, of known differences in the prevalence of diabetes between these populations.
These observations indicate that most, if not all, of these alleles show directionally similar association to T2D across many populations. Such a pattern indicates that the causal alleles at these validated risk loci (which have yet to be found) likely predate the migrations that separated these populations now residing in Europe, Africa, East Asia, the Pacific Islands and the Americas. We note that this pattern is unexpected under the recently described "common SNP, rare mutation" model of Goldstein, which suggests that GWAS signals with common alleles for T2D and other diseases may be "synthetic associations" created by one or more rare alleles [19][20][21]. Under the Goldstein hypothesis, the consistent associations that we noted at these loci across populations would only be observed if, in each population, one or more distinct rare alleles arose at each locus, and they happened to arise each time on the same haplotype background. Although possible, this scenario seems unlikely, and a more parsimonious explanation would be that the "synthetic association" hypothesis of Goldstein does not apply to a majority of these T2D SNPs.
The modest number of cases and controls in this study (as compared to the initial discovery studies) likely underlies the lack of statistically significant associations in some groups. Weaker associations in some racial/ethnic groups may also be due to differences in allele frequencies, linkage disequilibrium, and environmental and genetic modifiers. In two cases (WFS1 and CDKAL1), significant heterogeneity by race/ethnicity reflected a lack of association in African Americans, perhaps because of lower linkage disequilibrium between the marker and the biologically relevant allele.
It is interesting that the odds ratios observed for these marker SNPs were larger in Japanese Americans than in the original discovery cohorts, and in the other ethnic groups in our study. A meta-analysis of 7 association studies in Japanese populations replicated associations from studies in European populations for 7 loci under study (TCF7L2, CDKAL1, CDKN2B, IGF2BP2, SLC30A8, KCNJ11, and HHEX). A further study observed significant associations in KCNQ1 as well as these same 7 loci and, similar to our observations, noted magnitudes of effect that were generally stronger than previously observed in European populations [25]. Additional studies in other Asian populations have replicated associations with many of these loci as well [24,26,27,28].
In the Multiethnic Cohort, we have found the prevalence of T2D to be at least 2-fold higher in African Americans, Latinos, Japanese and Native Hawaiians compared to European Americans, with these differences being independent of body weight [14]. We examined the extent to which the known genetic risk alleles for diabetes could explain these disparities by quantifying and comparing the relative risk distributions between populations. Compared to European Americans, we did not observe evidence of greater genetic risk in any population. Our findings therefore indicate that these risk markers explain little, if any, of racial/ethnic disparities in T2D prevalence. It remains possible that the actual causal alleles in these regions may be more common in frequency and/or have larger effects than the index signals in non-European populations. As seen with KCNQ1 [1,2], GWAS in non-European populations are effective in discovering risk loci that are important in multiple populations but difficult to identify in European populations where the alleles are rare.
This study had a number of limitations. First, a self-report of diabetes and use of medication for diabetes was used to define cases and controls. We observed that approximately 1% of a random sample of the controls in this study had HbA1C levels above 7.0%, which suggests that only a small portion of controls had undiagnosed diabetes (see Materials and Methods). Also, our case definition did not differentiate between T1D and T2D; however, we expect this misclassification to be minor, as <3% of T2D cases had a previous diagnosis of T1D based on other sources (see Materials and Methods). The highly consistent findings of this study, as compared to the discovery GWAS reports, argue that our phenotypic characterization is adequate to observe the association to T2D.
Some caution should also be given to the interpretation of the risk modeling conducted in each ethnic group, as the genetic markers included are unlikely to be the causal alleles. Future fine-mapping and sequencing studies to identify the functional variants (common and/or rare) and large-scale testing of each allele will be required to more precisely model risk as well as assess differences in the distribution of genetic risk across populations.
Another limitation is that we did not account for the potential confounding effects of population stratification. However, odds ratios were essentially unchanged after adjusting for global European ancestry in a subset of African Americans (336 cases, 397 controls) for whom ancestry markers were available, suggesting that effects due to population substructure were not substantial, at least in this group. We also noted that controlling for education, a proxy for SES which has been shown to be significantly associated with Native American ancestry in Latinos [23], had little effect on the associations with these risk alleles. Furthermore, the risk alleles were not generally more frequent in Latinos than in European Americans, which would be expected if these alleles were proxies for more general ancestry differences. While population stratification is unlikely to fully explain these findings, it remains possible that at some loci, the causal alleles may be more correlated with ancestry than the index SNPs.
In summary, our data provide strong support for common genetic variation contributing to T2D risk in multiple populations. Our findings in T2D do not support the theory that GWAS signals are due to rare alleles. Nonetheless, GWAS and sequencing studies in these and other racial/ethnic populations are needed to reveal a more complete spectrum of risk alleles that are important globally as well as those that may contribute to risk disparities.

Ethics Statement
The Institutional Review Boards at the University of Southern California and University of Hawaii approved the study protocol.

Study Population
The MEC consists of 215,251 men and women, and comprises mainly five self-reported racial/ethnic populations: European Americans, African Americans, Latinos, Japanese Americans and Native Hawaiians [29]. Between 1993 and 1996, adults between 45 and 75 years old were enrolled by completing a 26-page, self-administered questionnaire asking detailed information about dietary habits, demographic factors, level of education, personal behaviors, and history of prior medical conditions (e.g. diabetes). Potential cohort members were identified through Department of Motor Vehicles drivers' license files, voter registration files and Health Care Financing Administration data files. In 2001, a short follow-up questionnaire was sent to update information on dietary habits, as well as to obtain information about new diagnoses of medical conditions since recruitment. Between 2003 and 2007, we re-administered a modified version of the baseline questionnaire. All questionnaires inquired about history of diabetes, without specification as to type (1 vs. 2). Between 1995 and 2004, blood specimens were collected from ~67,000 MEC participants, at which time a short questionnaire was administered to update certain exposures and collect current information about medication use.
Cohort members in California are linked each year to the California Office of Statewide Health Planning and Development (OSHPD) hospitalization discharge database, which consists of mandatory records of all in-patient hospitalizations at most acute-care facilities in California. Records include information on the principal diagnosis plus up to 24 other diagnoses (coded according to ICD-9), including T1D and T2D. In Hawaii, cohort members have been linked with the diabetes care registries for subjects with Hawaii Medical Service Association (HMSA) and Kaiser Permanente Hawaii (KPH) health plans (~90% of the Hawaiian population has one of these two plans) [15]. Information from these additional databases has been utilized to assess the percentage of T2D controls (as defined below) with undiagnosed T2D, as well as the percentage of identified diabetes cases with T1D rather than T2D. Based on the OSHPD database, <3% of T2D cases had a previous diagnosis of T1D. We did not use these sources to identify T2D cases because they did not include information on diabetes medications, one of our inclusion criteria for cases (see below).
In this study, diabetic cases were defined using the following criteria: (a) a self-report of diabetes on the baseline questionnaire, 2nd questionnaire or 3rd questionnaire; and (b) self-report of taking medication for T2D at the time of blood draw; and (c) no diagnosis of T1D in the absence of a T2D diagnosis from the OSHPD (California residents). Controls were defined as: (a) no self-report of diabetes on any of the questionnaires while having completed a minimum of 2 of the 3 (79% of controls returned all 3 questionnaires); and (b) no use of medications for T2D at the time of blood draw; and (c) no diabetes diagnosis (type 1 or 2) from the OSHPD, HMSA or KPH registries. To preserve DNA for genetic studies of cancer in the MEC, subjects with an incident cancer diagnosis at time of selection for this study were excluded. Controls were frequency matched to cases on age at entry into the cohort (5-year age groups) and for Latinos, place of birth (U.S. vs. Mexico, South or Central America), oversampling African American, Native Hawaiian and European American controls to increase statistical power.
Fasting glucose (FG) and HbA1C measurements were used to validate the case-control selection criteria. Among 185 T2D cases and 1,048 controls who met the T2D case-control definitions above and with FG measurements available from ongoing studies in the MEC, 57% of cases (ranging from 43% in European Americans to 63% in Japanese Americans) and 3% of controls (ranging from 1% in African Americans to 6% in Latinos) had an FG value >125 mg/dl. We also measured HbA1C (ARUP Laboratories, Salt Lake City, Utah) in 50 cases and 50 controls in each sex-ethnic group. Just over 1% (6/500) of controls were likely to have unreported T2D (HbA1C value ≥7%). In contrast, ~47% (234/500) of T2D cases had HbA1C ≥7% (ranging from 41% in European Americans to 57% of Native Hawaiians). Since hypoglycemic medication use was part of the case selection criteria, some cases were expected to have FG and HbA1C levels in the normal range.
Altogether, this study included 6,142 T2D cases and 7,403 controls (European American (533/1,006), African American (1,077/1,469), Latino (2,220/2,184), Japanese American (1,736/1,761) and Native Hawaiian (576/983)). Genotyping was conducted by the TaqMan allelic discrimination assay (Applied Biosystems, Foster City, CA) [30]. For all SNPs, genotype call rates were >95% among case and control groups in each population, HWE p-values among controls were >0.05 in at least 4 of the 5 ethnic groups, and none of the values were <0.01 (Table S6). Subjects missing data for >5 SNPs (n = 82) were removed from the analysis.
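The quality-control thresholds described above can be sketched in code. The text does not state which test was used for Hardy-Weinberg equilibrium (HWE), so the example below assumes a 1-degree-of-freedom chi-square goodness-of-fit test; the function names and the genotype counts in the usage lines are hypothetical.

```python
import math
from typing import Optional, Sequence


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_missingness(genotypes: Sequence[Optional[int]], max_missing: int = 5) -> bool:
    """Keep a subject unless more than max_missing genotypes are missing (None)."""
    return sum(g is None for g in genotypes) <= max_missing


# Hypothetical control genotype counts (AA, AB, BB) for two markers.
p_in_equilibrium = hwe_chi_square_p(25, 50, 25)
p_out_of_equilibrium = hwe_chi_square_p(40, 20, 40)
print(p_in_equilibrium, p_out_of_equilibrium < 0.01)

# Hypothetical subject with 6 of 19 genotypes missing, which exceeds the threshold.
subject = [None] * 6 + [1] * 13
print(passes_missingness(subject))
```

An exact HWE test would be a reasonable alternative for rare alleles; the chi-square form is used here only because it needs no external libraries.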

Statistical Analysis
Odds ratios and 95% confidence intervals were calculated for each allele in unconditional logistic regression models, in ethnic-stratified and pooled analyses, while adjusting for age at cohort entry (quartiles), body mass index (BMI, kg/m2, quartiles), sex, and, in the pooled analysis, race/ethnicity. Associations with the two variants at KCNQ1 were examined adjusting for the other allele. Potential confounding factors, including smoking history, education, physical activity, and history of hypertension, were evaluated but did not influence the results. Potential confounding by percent European ancestry was examined in a subset of African American men (336 cases, 397 controls) with available genetic ancestry information [16,31,32].
We also modeled the cumulative genetic risk of T2D using these markers. We summed the number of risk alleles for each individual and estimated the odds ratio per allele for this aggregate unweighted allele count variable as an approximate risk score appropriate for unlinked variants with independent effects of approximately the same magnitude for each allele. We also examined a second model in which each allele was weighted by the log of the published odds ratio prior to summing all alleles. The results of the more parsimonious unweighted risk score are presented, as the two risk scores were highly correlated in each ethnic group (Pearson r≥0.92) and similar associations with T2D risk were observed for each score. For individuals missing genotypes for a given SNP, we assigned the average number of risk alleles within each ethnic group (2 × risk allele frequency) to replace the missing value for that SNP. We used these ethnic-specific per allele summary odds ratios and the total number of risk alleles among control subjects to estimate the distribution of relative risks conveyed by all risk alleles. To avoid making the reference group carriers of zero risk alleles (a group which does not exist) we centered the distribution on the mean number of risk alleles observed in the control population (18.5). The log relative risk for each subject was calculated as logRR = (RA − 18.5) × log(OR_i), where RA is equal to the subject's total risk alleles and log(OR_i) is the log of the ethnic-specific per allele odds ratio. A spline function was used to capture the shape of the distributions of log OR for display purposes. Two variants in KCNQ1 were included in the risk modeling because both were significantly associated with T2D when co-modeled (results were similar when only the most significant of the two, rs2237897, was included).
The variant in FTO was excluded from risk modeling procedures, as we found (as have others) that it is not a risk factor for diabetes independent of its effect on obesity.
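The risk score construction described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: genotypes are coded as the number of risk alleles (0, 1, or 2, with None for missing), and the risk allele frequencies and the example subject are hypothetical. Only the reference count of 18.5 and the per-allele odds ratio of 1.20 are taken from the text.

```python
import math
from typing import Optional, Sequence

REFERENCE_ALLELE_COUNT = 18.5  # mean risk allele count among controls (from the text)


def risk_allele_count(
    genotypes: Sequence[Optional[int]], risk_allele_freqs: Sequence[float]
) -> float:
    """Unweighted risk allele count; a missing genotype is replaced by 2 x RAF."""
    total = 0.0
    for genotype, raf in zip(genotypes, risk_allele_freqs):
        total += 2.0 * raf if genotype is None else genotype
    return total


def weighted_risk_score(
    genotypes: Sequence[Optional[int]],
    risk_allele_freqs: Sequence[float],
    published_ors: Sequence[float],
) -> float:
    """Second model: each allele count weighted by the log of its published OR."""
    total = 0.0
    for genotype, raf, odds_ratio in zip(genotypes, risk_allele_freqs, published_ors):
        count = 2.0 * raf if genotype is None else genotype
        total += count * math.log(odds_ratio)
    return total


def log_relative_risk(
    allele_count: float,
    per_allele_or: float,
    reference: float = REFERENCE_ALLELE_COUNT,
) -> float:
    """logRR = (RA - 18.5) x log(OR_i), centered on the control mean."""
    return (allele_count - reference) * math.log(per_allele_or)


# Hypothetical subject genotyped at 18 markers, with one missing genotype.
example_genotypes = [2, 1, 1, 0, 2, 1, None, 1, 2, 1, 1, 0, 1, 2, 1, 1, 2, 1]
example_rafs = [0.5] * 18  # hypothetical ethnic-specific risk allele frequencies

example_count = risk_allele_count(example_genotypes, example_rafs)
example_log_rr = log_relative_risk(example_count, per_allele_or=1.20)
print(example_count, round(example_log_rr, 3))
```

A subject carrying exactly the reference number of risk alleles has a log relative risk of zero, which is the purpose of centering on the control mean rather than on zero risk alleles.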