Associations between Single Nucleotide Polymorphisms in Iron-Related Genes and Iron Status in Multiethnic Populations

The existence of multiple inherited disorders of iron metabolism suggests genetic contributions to iron deficiency. We previously performed a genome-wide association study of iron-related single nucleotide polymorphisms (SNPs) using DNA from white men aged ≥25 y and women ≥50 y in the Hemochromatosis and Iron Overload Screening (HEIRS) Study with serum ferritin (SF) ≤12 µg/L (cases) and controls (SF >100 µg/L in men, SF >50 µg/L in women). We report a follow-up study of white, African-American, Hispanic, and Asian HEIRS participants, analyzed for association between SNPs and eight iron-related outcomes. Three chromosomal regions showed association across multiple populations, including SNPs in the TF and TMPRSS6 genes, and on chromosome 18q21. A novel SNP rs1421312 in TMPRSS6 was associated with serum iron in whites (p = 3.7×10−6) and replicated in African Americans (p = 0.0012). Twenty SNPs in the TF gene region were associated with total iron-binding capacity in whites (p<4.4×10−5); six SNPs replicated in other ethnicities (p<0.01). SNP rs10904850 in the CUBN gene on 10p13 was associated with serum iron in African Americans (p = 1.0×10−5). These results confirm known associations with iron measures and give unique evidence of their role in different ethnicities, suggesting origins in a common founder.


Introduction
The levels of iron, a micronutrient required for life and health, must be tightly regulated to avoid excess unbound iron that can generate toxic free radicals while at the same time maintaining adequate supplies for vital functions including oxygen carrying capacity [1,2]. Disorders of iron metabolism underlie some of the most prevalent diseases in humans and encompass a broad spectrum of clinical manifestations, ranging from anemia to iron overload and neurodegenerative diseases [3]. Body iron balance normally is maintained by control of iron uptake from the diet by the duodenal enterocytes and its transfer to the systemic circulation, as humans cannot actively excrete iron. Intestinal iron absorption and release of stored iron from macrophages are dependent on similar pathways [4]. While many factors can lead to iron deficiency, most commonly it is attributable to blood loss, a lack of dietary abundance, or defective absorption that collectively affect two-thirds of the world's population [5]. However, the existence of multiple inherited disorders of iron metabolism in man, rodents and other vertebrates makes plausible a genetic contribution to iron deficiency [6,7,8]. It is widely known that genetics can play a significant role in the iron overload found in whites, the most common example being hereditary hemochromatosis attributable to mutations in the hemochromatosis gene, HFE [9,10]. Less is known about the role of genetic factors in disorders of iron status in other ethnic groups or about genetic effects on susceptibility to iron deficiency.
In order to investigate the genetic contribution to iron deficiency in whites, we recently completed a genome-wide association study (GWAS) in iron-deficient white male and female participants in the Hemochromatosis and Iron Overload Screening (HEIRS) study [11]. Case-control status and seven quantitative iron-related measures were studied, including serum iron (SI), total iron-binding capacity (TIBC), unsaturated iron-binding capacity (UIBC), transferrin saturation (TfS), serum ferritin concentration (SF), serum transferrin receptor (sTfR), and body iron. The quantitative iron-related measures were significantly associated with presence of iron deficiency in the GWAS (p<0.001), and a high degree of concordance was observed in the results across the quantitative traits. The study found genome-wide statistically significant associations between at least one of the iron measures and SNPs on chromosomes 2p14, 3q22 (in the transferrin gene, TF), 6p22 (in the HFE gene), 7p21 and 22q11, with the association at TF being replicated in a follow-up case-control study.
Furthermore, mutations in the TMPRSS6 gene have been implicated in iron deficiency anemia refractory to oral iron therapy within white populations [12,13,14]. Further evidence of genetic influences on iron status was found in a recent GWAS of four serum markers of iron status (serum iron, transferrin, transferrin saturation and serum ferritin) [15]. Along with confirming previously reported associations of the HFE C282Y mutation, significant associations between iron status markers and the TMPRSS6 gene and TF were reported. Tanaka and colleagues investigated genetic variants associated with iron concentrations in persons not affected by overt genetic disorders of iron metabolism and found SNPs in TMPRSS6 were strongly associated with lower serum iron concentration and other hematological variables [16]. Most of the genetic studies of iron deficiency and iron status markers carried out to date have used samples from populations of white individuals.
In the current study, we have investigated SNPs and iron status in multiple ethnic groups, including not only whites but also African Americans, Hispanics, and Asians. Our aims were to investigate whether the same SNPs associated with iron deficiency in whites play a similar role in other ethnic groups and to identify additional SNPs that may play a role in iron deficiency in these populations. The role of 1239 candidate or known SNPs associated with iron deficiency and iron status measures was tested in white, African-American, Hispanic and Asian iron deficient case and normal control samples. For consistency, the same outcome measures of iron status previously examined in whites were assessed in the other ethnic groups in the current study. To our knowledge, this research represents the first major assessment of candidate genetic determinants of iron deficiency and iron status measures in non-white populations. In addition to studying the association between SNPs and the primary outcome of presence of iron deficiency, a unique aspect of the statistical approach was the estimation of the effect of increasing copies of the minor allele of selected SNPs on changes in the degree of iron deficiency in multiple ethnicities.

Study Population
The current study utilized a subset of subjects who had been enrolled in the initial screening phase of the HEIRS Study at five Field Centers encompassing six geographic locations including Alabama, California, District of Columbia, Hawaii, and Oregon in the United States, and Ontario, Canada [17,18]. Participants were eligible for the current study if they had not withdrawn consent and had agreed to blood storage. Selection criteria included self-report of white or Caucasian, African-American, Hispanic or Asian race/ethnicity, males at least 25 years of age and females at least 50 years. Females younger than 50 years were excluded because of pre-menopausal iron depletion from blood loss. Approval for the study was obtained from the following: Population samples consisting of cases of iron deficiency and controls were selected from the African-American, Asian, Hispanic, and white participants in the HEIRS Study. Cases of iron deficiency were defined as subjects having a serum ferritin concentration (SF) ≤12 µg/L, the point of total depletion of iron stores [19,20]. Controls (SF >100 µg/L in men, SF >50 µg/L in women) were frequency matched 2:1 to cases by sex and geographic location. Cases and controls who were selected from African Americans (77 cases, 144 controls), Asians (51 cases, 102 controls), and Hispanics (79 cases, 160 controls) were new to this study and had not been included in the previous GWAS. White subjects included 357 cases and 358 controls from the previous GWAS as well as an additional 374 white controls added to achieve the desired 2:1 frequency matching.

Laboratory Methods
Laboratory methods are described in detail elsewhere [11]. Briefly, HFE C282Y and H63D genotypes were determined using the Invader® Assay (Third Wave Technologies, Madison WI). Lack of a detectable C282Y or H63D mutation was designated as HFE wild-type (wt/wt). Spectrophotometric measures of SI and UIBC levels, turbidometric immunoassay of SF (Roche Applied Science/Hitachi 911, Indianapolis, IN), and calculation of TfS were performed on non-fasting blood samples. SI, SF, UIBC, and sTfR were analyzed using Roche reagents on the Roche/Hitachi Modular P instrument (Roche Diagnostics, Indianapolis IN). TIBC was calculated as the sum of SI + UIBC. TfS was calculated as the ratio, SI/TIBC, and expressed as a percentage. Body iron (mg/kg), an index of iron deficiency, was assessed as follows: body iron = −[log10((sTfR × 1000)/SF) − 2.8229]/0.1207. In this approach, body iron is expressed as a positive value when stores are present and as a negative value with tissue iron deficiency [21,22]. A body iron < −4 mg/kg body weight represents a deficit severe enough to produce anemia. However, positive values may occur in some cases of iron deficiency, for example, when sTfR is not elevated as a result of a lack of erythropoietin related to co-morbid conditions such as kidney disease. The sTfR/SF ratio was calibrated previously by quantitative phlebotomy performed in healthy subjects [23]. To exclude common environmental causes of iron deficiency, antibody testing was performed for H. pylori, carcinoembryonic antigen (CEA), and celiac disease. C-reactive protein (CRP), alanine aminotransferase (ALT), and gamma-glutamyltransferase (GGT) were measured to identify acute phase protein elevations in SF.
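The derived iron measures defined above (TIBC, TfS and body iron) can be illustrated with a minimal Python sketch. The function and parameter names are hypothetical and are not taken from the study. The units are also an assumption: sTfR is taken to be in mg/L and SF in µg/L, so that the factor of 1000 puts both on the same scale.

```python
import math


def tibc(si, uibc):
    """Total iron-binding capacity as the sum SI + UIBC (same units as inputs)."""
    return si + uibc


def transferrin_saturation(si, uibc):
    """Transferrin saturation (%) as the ratio SI/TIBC expressed as a percentage."""
    return 100.0 * si / tibc(si, uibc)


def body_iron(stfr_mg_per_l, sf_ug_per_l):
    """Body iron (mg/kg) from the sTfR/SF ratio.

    Assumed units: sTfR in mg/L, SF in ug/L (an assumption of this sketch).
    Positive values indicate iron stores; negative values indicate a deficit.
    """
    ratio = (stfr_mg_per_l * 1000.0) / sf_ug_per_l
    return -(math.log10(ratio) - 2.8229) / 0.1207


if __name__ == "__main__":
    # Hypothetical values chosen only to show the sign convention.
    print(body_iron(5.0, 100.0))  # replete example: positive
    print(body_iron(10.0, 5.0))   # depleted example: negative
```

Under this sign convention, a result below −4 mg/kg would correspond to the deficit that the text describes as severe enough to produce anemia.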

Sample and SNP Selection, Genotyping and Quality Control Procedures
Buffy coat DNA was extracted and purified by SDS cell lysis followed by a salt precipitation method for protein removal using commercial Puregene® reagents (Gentra System, Inc., Minneapolis, MN, now Qiagen, Valencia, CA). Using GoldenGate methodology, a custom SNP set with 1536 SNPs per array was designed to cover the number of SNPs that had been chosen for genotyping. These included 1239 SNPs as follows: a) 107 unique SNPs chosen on the basis of our previous GWAS performed in whites that showed significant associations with iron-related outcomes (p-value <0.00005), b) 67 SNPs tagging regions identified in the GWAS, c) 36 SNPs associated with iron status that had been reported in the scientific literature and that were located in TF, HFE, and TMPRSS6 genes, among others, and d) 1029 tag SNPs located in candidate genes for iron metabolism. Additionally, 297 ancestry informative markers were genotyped to estimate the admixture proportions in the African-American and Hispanic samples (Table S1). A CEPH trio and a within-study replicate were placed on each plate to assess the concordance of genotype calls. Mendelian consistency for all trios was greater than 99% and reproducibility for the same CEPH individuals across all plates was greater than 98%. Reproducibility for the thirteen within-study replicates placed across plates was greater than 99%. Reported gender was compared to gender estimated by Illumina's GenomeStudio, based on gender targets built into the custom OPA. Twelve samples were excluded due to a conflict between reported and genetically inferred gender. Because of allele frequency differences between the four ethnic groups, further SNP and sample quality control assessments were done separately for each group. SNPs were filtered based on call rate (<95%) and allele frequency (<0.005). Individuals were filtered on call rate (<95%), heterozygosity (50% threshold) and IBS (>90%).
Quality control tests were completed using the GenABEL library [24] of the R statistical package (http://www.r-project.org/). The genotype distributions of each SNP were tested for fit to Hardy-Weinberg equilibrium (HWE) expectations. No SNPs were excluded from the association analysis based on the p-value of the HWE test. From the 1702 unique samples included for genotyping, there were 1084 white, 153 Asian, 212 African-American and 233 Hispanic individuals that passed the quality control assessments.
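As an illustration of what a test of fit to HWE expectations involves, the following Python sketch applies a Pearson chi-square test with one degree of freedom to the three genotype counts of a biallelic SNP. It is not the GenABEL implementation used in the study, which may rely on a different (for example, exact) test; the function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg proportions for one SNP.

    Takes the observed counts of the three genotypes and returns
    (chi-square statistic, p-value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts for a single SNP.
    print(hwe_chi_square(30, 50, 20))
```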

Statistical Analyses
Eight iron-related outcomes were studied. The primary outcome was iron-deficient case-control status. Other indicators of iron status included SI, SF, TfS, sTfR, body iron, TIBC, and UIBC. With the exception of the dichotomous iron-deficient case-control status variable, the variables were continuous quantitative traits. The distributions of the seven continuous outcomes and four continuous covariates were tested for their fit to the normal distribution. Natural logarithm transformations were applied to the SF, TfS and sTfR outcomes and the CEA, CRP, GGT and ALT covariates to improve their fit to the normal distribution.
Genotypes were coded as 0, 1 or 2 indicating the number of minor alleles in the genotype and were modeled as continuous variables in the multiple regression models. Covariates showing nominally significant effects on an outcome were included in multiple regression models that also included genotype effects. Odds ratios and regression coefficients were computed with odds ratios representing the multiplicative increase in risk attributed to the addition of one copy of the minor allele to the genotype (sometimes referred to as the single allele odds ratio) and regression coefficients representing the change in the outcome associated with increasing copies of the minor allele. For the African-American and Hispanic samples, ancestry proportions were estimated from ancestry informative markers (AIMs) using the STRUCTURE program with K = 2 [25]. The estimated proportion of the first ancestry component was included as a covariate in the linear and logistic regression models for the two admixed populations.
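The additive genotype coding and the per-allele interpretation of the estimates can be sketched in Python as follows. This is a simplified illustration only: it fits an unadjusted simple linear regression, whereas the models in the study also included covariates and, for the admixed samples, an ancestry proportion. All function names and data values are hypothetical.

```python
import math


def additive_code(genotype, minor_allele):
    """Number of copies (0, 1 or 2) of the minor allele in a two-allele genotype."""
    return sum(1 for allele in genotype if allele == minor_allele)


def ols_fit(x, y):
    """Unadjusted least-squares fit of y on x; returns (slope, intercept).

    The slope is the change in the outcome per additional minor allele.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def odds_ratio(log_odds_per_allele, copies=1):
    """Odds ratio implied by a logistic coefficient for a given number of copies.

    Under the additive model the single allele odds ratio compounds
    multiplicatively, so two copies give the square of the one-copy value.
    """
    return math.exp(log_odds_per_allele * copies)


if __name__ == "__main__":
    genotypes = ["AA", "AG", "GG", "AG", "AA", "GG"]  # hypothetical
    outcome = [90.0, 80.0, 71.0, 82.0, 88.0, 69.0]    # hypothetical
    counts = [additive_code(g, "G") for g in genotypes]
    print(counts, ols_fit(counts, outcome))
```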
After filtering based on the quality control assessments, there were 1134, 1115, 1113 and 1134 SNPs analyzed separately for association with the eight iron-related outcomes in the white, African-American, Hispanic and Asian population samples, respectively, adjusted for covariates. Although many of the tested SNPs are correlated through linkage disequilibrium and are not independent, we applied a conservative Bonferroni correction to adjust for the multiple tests. A total of 1134 independent tests were assumed for each population sample such that the Bonferroni multiple test-corrected nominal p-value of 0.05/1134 = 4.4×10−5 represented the threshold for statistical significance. No further multiple test correction was made for the eight iron-related outcomes. The analysis strategy treated the four population samples as independent experiments and SNP association results obtained in one population sample were assessed for replication across the others. For a SNP that showed a statistically significant association with an iron-related outcome in a population sample (p<4.4×10−5), in order to be considered as evidence for replication, the remaining population samples had to show an observed p-value <0.01 for at least one of the iron-related outcomes and the direction of the effects had to be consistent with what was observed in the original sample. The statistical analysis was done using the R statistical package (http://www.r-project.org/) and the genotype association analysis used the GenABEL library of R [24].
Table 1 shows the distribution of age, sex, the HFE C282Y/H63D genotype and the continuous outcomes, presented by population sample and iron-deficient case-control status. Eight variables were assessed for significant covariate effects on the outcomes, including age, sex, the C282Y/H63D genotype, CagA strain infection status, and natural log of CEA, CRP, GGT, and ALT.
The effect of the covariates on each outcome was assessed separately in each population sample using multiple regression models. Table 2 displays the variables that showed significant covariate effects (p-value <0.05 for regression coefficient of the variable) for each outcome and population sample. The relatively large sample size for whites compared to the other population samples could explain why more variables showed significant effects on the outcomes in that sample. The covariates shown in Table 2 were included in the multivariate models that were used to test for association between the genotypes and the outcomes in each population sample. There was a significant association between sex and the outcomes of body iron, natural log of SF, natural log of sTfR, UIBC, and TIBC, adjusted for additional covariates. Table S2 displays the observed differences in males and females with regard to the mean of the continuous iron-related outcomes, and supports the statistical adjustment for sex in the multivariate models.
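The significance threshold and the replication rule described in this section can be written out as a short Python sketch. The function names are hypothetical, and the rule is simplified to a single outcome: a discovery association must pass the Bonferroni threshold, and the replication sample must show p < 0.01 with an effect in the same direction.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def shows_replication(discovery_p, discovery_beta,
                      replication_p, replication_beta,
                      threshold, replication_alpha=0.01):
    """Apply the replication rule to one SNP and one outcome.

    The betas are regression coefficients from the two samples; only
    their signs are compared.
    """
    significant = discovery_p < threshold
    same_direction = discovery_beta * replication_beta > 0
    return significant and replication_p < replication_alpha and same_direction


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 1134)
    print(threshold)
    # The p-values are those reported for rs9948708 and TIBC (white and
    # Asian samples); the coefficients are hypothetical placeholders.
    print(shows_replication(2.9e-5, 1.0, 0.0048, 2.0, threshold))
```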

Associations between SNPs and Iron-related Outcomes
SNPs significantly associated with iron-related outcomes, after correction for multiple comparisons. Forty-nine SNPs showed statistically significant p-values, corrected for multiple testing, for at least one of the eight iron-related outcomes in at least one of the population samples (Table S3). Forty-eight of the significant associations were observed in the white population samples and one was found in the African Americans. The preponderance of significant results in the white sample was expected given that the sample size and statistical power of the sample far exceeded that of the other population samples, and because many of the SNPs were chosen based on the results of a GWAS that included 63% of the white samples analyzed in the current study. The 49 significantly associated SNPs were distributed within 14 distinct genomic regions. Eleven of the 49 associated SNPs were in the previously reported associated regions on chromosome 2p14, but not in known genes, and 20 of the 49 associated SNPs were in or around the TF gene on chromosome 3q22, which has been repeatedly shown to be associated with iron-related outcomes. Two SNPs on chromosome 7p21 and one SNP on 22q11 were significantly associated in the current study and were also reported in the recent GWAS. Hence, 69% of the 49 significantly associated SNPs were in regions that were reported in the recent GWAS. The remaining 15 SNPs were distributed within ten distinct genomic regions. Eight of the regions (including 10 SNPs) showed suggestive evidence for association in the previous GWAS but failed to reach genome-wide significance and hence, were not reported in that study. Four SNPs in the TMPRSS6 gene on chromosome 22q12 showed statistically significant associations. The SNP rs10904850 in the CUBN gene on chromosome 10p13 was significantly associated with serum iron in the African-American sample (observed p-value = 1.04×10−5), but showed no evidence for association in any of the other population samples.
SNPs in TF gene on chromosome 3q22. SNPs in and around the TF gene on chromosome 3q22 showed the strongest evidence for association in the white population sample, as well as the strongest evidence for replication in the other population samples. Forty SNPs were genotyped in this region. Twenty SNPs met the multiple test corrected statistical significance threshold in the white sample for association with TIBC, 14 of which showed significant association with UIBC as well. Ten out of the 20 SNPs associated with TIBC were located within the boundaries of the TF gene itself. Strong evidence for association was found between TIBC and the TF gene SNPs in the other three population samples. Table 3 shows the association results for TIBC in the four population samples for the six TF SNPs that were significantly associated in the white sample and showed association in at least one other population. Although the non-white populations did not meet the multiple test corrected statistical significance threshold individually, the evidence for association within these populations is very strong given their role as replication samples. Weaker statistical evidence for association was observed between the TF region SNPs and the other iron-related outcomes in all samples. Figure 1 shows the −log10(p-values) for the 36 SNPs in the TF gene region that passed the quality control assessments and minimum allele frequency threshold in all four population samples. The location and structure of the TF gene are shown across the top of the figure. The most significant associations were found at rs3811647 (observed p-value = 5.02×10−15) and rs1525892 (observed p-value = 4.56×10−15), both located in the TF gene and indicated in Figure 1.
SNPs on chromosome 18q21 and in TMPRSS6 gene on chromosome 22q12. Table 4 shows the results for the SNPs on chromosomes 18q21 and 22q12 that had statistically significant associations in the white population sample and evidence for replication in at least one of the other three population samples. The SNP rs9948708 on chromosome 18q21 showed evidence for association with all the iron-related outcomes in the white sample with TIBC showing the most statistically significant association (observed p-value = 2.9×10−5). Similar results were observed in the Asian population sample, where TIBC was the most significantly associated of the outcomes (observed p-value = 0.0048) and at least nominal statistical evidence for association was observed at the majority of the outcomes. The regression parameter estimates are consistent across the white and Asian samples although the effects appear considerably stronger in the Asian sample. The larger effect size but reduced evidence of significance of the parameter estimates could be due to the lack of statistical power with the small Asian sample and differences in the minor allele frequencies; the minor allele frequency estimate for rs9948708 was 0.27 in the Asian population sample versus 0.42 in the white sample. There was no evidence for association in either the African-American or Hispanic samples. Two SNPs on chromosome 22q12 showed statistically significant associations in the white sample with the Asian sample providing evidence for replication at rs2111833 and the African-American sample providing replication evidence for rs1421312 (Table 4). In the white sample, rs2111833 showed the strongest associations with serum iron (observed p-value = 4.7×10−7) and log-transformed transferrin saturation (observed p-value = 0.00014).
The strongest associations with rs2111833 in the Asian sample were with UIBC (observed p-value = 0.0067) and TIBC (observed p-value = 0.007), with weaker evidence for association with serum iron (observed p-value = 0.044). Considerably stronger evidence for replication was seen at SNP rs1421312. Again, the most statistically significant associations in the white sample were with serum iron (observed p-value = 3.7×10−6) and the log-transformed transferrin saturation (observed p-value = 0.0018). In the African-American sample, serum iron and log-transformed transferrin saturation were the two most statistically significant associations with rs1421312, with observed p-values of 0.0012 and 0.0011, respectively. The regression coefficients for serum iron and log-transformed transferrin saturation for the white and African-American samples showed opposite signs, suggesting at first sight that the direction of the effects was not consistent. However, the minor allele in the white sample (minor allele frequency = 0.40) is the major allele in the African-American sample (allele frequency = 0.61), so the opposite signs reflect associations with opposite minor alleles. If the genotypes in the African-American sample were re-coded to reflect the minor allele in the white sample then the signs of the regression coefficients would be in the same direction. No evidence for association to chromosome 22q12 was found in the Hispanic or Asian population samples. There were more females than males in each population sample. The association analyses for SNPs on chromosomes 18q21 and 22q12 were repeated using data from females only. Although the overall sample size was smaller, the results reported in Table 5 were generally similar to those based on data from males and females (Table 4). For example, in the white sample SNPs rs2111833 and rs1421312 in TMPRSS6 showed strong associations with serum iron with observed p-values of 5.2×10−6 and 7.9×10−6, respectively.
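The point about opposite minor alleles can be checked numerically. The Python sketch below, which uses hypothetical data and an unadjusted simple regression rather than the covariate-adjusted models of the study, shows that re-coding an additive genotype from counts of one allele to counts of the other (g becomes 2 − g) reverses the sign of the regression coefficient while leaving its magnitude unchanged.

```python
def recode_to_other_allele(allele_counts):
    """Convert counts of one allele (0, 1, 2) into counts of the other allele."""
    return [2 - g for g in allele_counts]


def slope(x, y):
    """Least-squares slope of y on x (unadjusted simple regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical allele counts and serum iron values for six individuals.
    counts = [0, 0, 1, 1, 2, 2]
    serum_iron = [50.0, 54.0, 60.0, 62.0, 70.0, 74.0]
    print(slope(counts, serum_iron))
    print(slope(recode_to_other_allele(counts), serum_iron))
```

Comparing effect directions across samples is therefore only meaningful once the same reference allele is counted in each of them.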

Discussion
The primary aim of the current study was to identify SNPs that showed association with iron status in cases with iron deficiency and control subjects across multiple ethnicities. Because patients presenting with iron deficiency are often diagnosed by using multiple measures of iron status, it is important to assess associations with each of these measures. Thus, we tested for associations not only between SNPs and a diagnosis of iron deficiency, but also with SI, TIBC, UIBC, TfS, SF, sTfR, and body iron. Three chromosomal regions showed evidence for association with one or more of these measures across the multiple populations that were sampled, including SNPs in the TF gene on chromosome 3q22, the TMPRSS6 gene on chromosome 22q12, and on chromosome 18q21.
Twenty SNPs in the TF gene region were significantly associated with TIBC in the white sample (observed p<4.4×10−5). Of these, six SNPs showed replication in other ethnicities (Table 3). SNPs chosen for genotyping in the TF gene region that have previously shown an association with serum transferrin included rs1867504, rs4525863 and rs1830084 [15], rs3811658 and rs1880669 [26], rs1358024 and rs6794945 [15,26], and rs3811647 [15,26,27]. In our study, the strongest statistical significance with TIBC, a marker for transferrin, was observed for rs3811647 in whites (observed p = 5.02×10−15). In our previous GWAS, this association was statistically significant (p = 7.0×10−9) and replicated in a sample of cases and controls selected from a population of white male and female veterans (observed p = 0.012) [11]. To our knowledge, our study is the first to replicate this observation in non-white population samples, with strongest evidence for replication in Hispanics (observed p = 0.00086). A novel SNP rs9948708 on chromosome 18q21 satisfied the multiple-test corrected significance level for association with TIBC in the white sample (Table 4, observed p = 2.9×10−5), with evidence for replication in the Asian sample. To our knowledge, this SNP has not been reported in the scientific literature with regard to implications for iron deficiency. In contrast, associations between variants in the TMPRSS6 gene and hemoglobin levels were found in individuals with Indian Asian ancestry as well as in those with European ancestry [28]. In our previous GWAS, we did not find genome-wide associations with TMPRSS6 SNPs in the white sample.
However, in this follow-up study, the number of iron-replete controls was increased and that may have increased the power of the study to detect significant associations with serum iron in the white sample with SNP rs2111833 (Table 4, observed p = 4.7×10−7) and SNP rs1421312 (Table 4, observed p = 3.7×10−6), with evidence for replication in Asians and African Americans, respectively.
Of the 49 SNPs that were statistically significantly associated with at least one iron-related outcome (observed p<4.4×10−5), only one SNP showed a statistically significant association at the multiple test-corrected significance level in a non-white sample. SNP rs10904850 in the CUBN gene on chromosome 10p13 was significantly associated with serum iron in the African-American sample (observed p = 1.0×10−5), but no evidence for association with this SNP was observed in any of the three other samples. The minor allele frequency was 0.13 in African Americans, compared with 0.33 in whites, 0.17 in Hispanics, and 0.19 in Asians. The CUBN gene plays a role in vitamin and iron metabolism by facilitating their uptake. Cubilin and megalin are responsible for reabsorbing transferrin iron in the kidney urine collecting system [29] and the influence of CUBN on chronic kidney disease in African Americans has been studied [30]. We found that African-American controls had the lowest iron and TIBC concentrations of all the groups (Table 1). A possible explanation is that the cubilin SNP is affecting serum iron through an influence on efficiency of reabsorption of iron-bearing transferrin in the kidney. In this study, four population samples were examined as a follow-up to a genome-wide association study of iron deficiency conducted in white participants enrolled in the HEIRS Study. An advantage of the research design was the genotyping of a panel of markers that show large frequency differentials between major geographic ancestral groupings [31,32] and this enabled estimation of ancestry proportions for African-American and Hispanic samples. Additionally, models incorporated information from relevant demographic and clinical measures to control for the effects including environmental causes of iron deficiency. Observed similarities and differences in age, sex, iron measures and the distribution of HFE genotypes are shown in Table 1.
Although frequency matched by sex and geographic location, on average, whites were three years older than the African-American sample and 4.3 years older than Hispanics and Asians. The proportion of whites with HFE gene mutations was higher than that in the other groups; there were nine C282Y homozygotes (two cases and seven controls) compared with zero in the non-white samples. Generally, females had lower observed mean values for iron-related outcomes than males, especially within the controls (Table S2). This provides further support to the inclusion of sex as a covariate in multiple regression models when there was a significant association between sex and the outcome. A limitation to the study was the relatively small sizes of the non-white population samples; thus, lack of association between some SNPs and iron measures may have been due to low statistical power.
In summary, SNPs were identified that showed association across the multiple populations that were sampled. We found a novel SNP in TMPRSS6 that was associated with serum iron in whites and replicated in African Americans, suggesting a role for this SNP in increasing the risk of iron deficiency in affected persons. Our results confirm known associations of SNPs in the TF and TMPRSS6 genes with TIBC and UIBC and give evidence of their role in different ethnic groups, a unique aspect of this study, and suggest the possibility of origins in a common founder.

Supporting Information
Table S1 Basis of SNP selection for genotyping. a) SNPs significantly associated with iron-related outcomes in a previous GWAS performed in whites (GWAS, n = 107), b) SNPs tagging regions identified in the GWAS (GWAS region, n = 67), c) SNPs associated with iron status and reported in the scientific literature (Literature, n = 36), d) tag SNPs located in candidate genes for iron metabolism (Gene, n = 1029), and e) ancestry informative markers (AIM, n = 297). (DOCX)