Replication and Characterization of Association between ABO SNPs and Red Blood Cell Traits by Meta-Analysis in Europeans

Red blood cell (RBC) traits are routinely measured in clinical practice as important markers of health. Deviations from the physiological ranges are usually a sign of disease, although variation between healthy individuals also occurs, at least partly due to genetic factors. Recent large scale genetic studies identified loci associated with one or more of these traits; further characterization of known loci and identification of new loci is necessary to better understand their role in health and disease and to identify potential molecular mechanisms. We performed meta-analysis of Metabochip association results for six RBC traits—hemoglobin concentration (Hb), hematocrit (Hct), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV) and red blood cell count (RCC)—in 11 093 Europeans from seven studies of the UCL-LSHTM-Edinburgh-Bristol (UCLEB) Consortium. We identified 394 non-overlapping SNPs in five loci at genome-wide significance: 6p22.1-6p21.33 (with HFE among others), 6q23.2 (with HBS1L among others), 6q23.3 (contains no genes), 9q34.3 (only ABO gene) and 22q13.1 (with TMPRSS6 among others), replicating previous findings of association with RBC traits at these loci and extending them by imputation to 1000 Genomes. We further characterized associations between ABO SNPs and three traits: hemoglobin, hematocrit and red blood cell count, replicating them in an independent cohort. Conditional analyses indicated the independent association of each of these traits with ABO SNPs and a role for blood group O in mediating the association. The 15 most significant RBC-associated ABO SNPs were also associated with five cardiometabolic traits, with discordance in the direction of effect between groups of traits, suggesting that ABO may act through more than one mechanism to influence cardiometabolic risk.


Introduction
Red blood cell (RBC) traits are routinely measured in clinical practice. These traits, hemoglobin concentration (Hb), hematocrit (Hct), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV) and red blood cell count (RCC), are tightly regulated so that deviations from the physiological ranges are usually a sign of disease, such as hematological, infectious and immune disorders and cancer. Therefore, they serve as markers not only of specific hematological conditions but also of the general health of an individual. However, physiological values for the traits also vary between healthy individuals and between different ethnic groups, at least in part due to genetic factors [1][2][3][4][5][6][7][8][9][10][11][12][13].
Heritability varies between the different RBC traits, with estimates ranging from 40-80% for Hb and RCC, although values as low as 14% for MCH and as high as 96% for MCV have been reported [14][15][16][17]. Only a few large scale genetic association studies of RBC traits have been performed to date, the latest one identifying 75 loci potentially influencing one or more of these traits [13]. The largest studies were of Caucasian/European cohorts, followed by studies of African Americans and Asians, with several of the RBC-associated loci consistently identified across different ethnic groups. The recent use of large scale meta-analyses has accelerated the detection of loci and has emphasized the need for additional studies to identify still undetected contributors to the underlying genetic basis of these traits [1][2][3][4][5][6][7][8][9][10][11][12][13]; reviewed in [18,19]. Further characterization of known loci is equally important in order to better understand their roles in erythropoiesis, susceptibility to and development of different disorders and their impact on the clinical characteristics and prognosis in those disorders [18][19][20].
In this study, RBC traits were associated with five loci, including 11 new SNPs for ABO (the gene encoding ABO blood group), further refining these associations. Conditional analyses

Meta-analysis in the UCLEB consortium
The study characteristics are given in Table 1. The analysis was done on ~4.2 million genotyped and imputed SNPs. An identical analysis was performed for each study and trait, accounting for the covariates available for that specific study, before combining the results by meta-analysis. Genomic-control inflation factors (λGC) in individual studies ranged from 0.987 for Hb in MRC NSHD to 1.041 for RCC in BWHHS; these were used to correct the test statistics in the meta-analysis (S2 Table). Manhattan plots for the meta-analysis of each trait are shown in S2 Fig Table. In total, there were 394 non-overlapping SNPs in five loci which reached the genome-wide significance threshold of 5×10⁻⁸ used in this study (116 SNPs associated with Hb in three loci; 15 with Hct in one locus; 371 with MCH in three loci; six with MCHC in one locus; 188 with MCV in four loci; and 40 with RCC in two loci). Of these SNPs, 338 (86%) were significantly associated with more than one RBC trait. The five loci with SNPs reaching statistical significance were: 6p22.1-6p21.33 (locus includes HFE as a primary candidate gene), 6q23.2 (includes HBS1L gene among others), 6q23.3 (locus does not contain any genes), 9q34.3 (includes only the ABO gene) and 22q13.1 (includes the TMPRSS6 gene among others). Overlap of the significant loci for all six RBC traits is shown in S4 Fig.
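The genomic-control step mentioned above can be illustrated with a minimal sketch. It assumes the common definition of λGC as the median observed 1-df chi-square statistic divided by its expected median, and applies the correction by inflating standard errors; the function names are hypothetical and this is not the code used in the study.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.454936


def genomic_control_lambda(z_scores):
    """Return lambda_GC = median(observed chi-square) / expected median."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def gc_correct_se(se, lam):
    """Inflate a standard error by sqrt(lambda).

    This is equivalent to dividing the chi-square statistic by lambda.
    Assumption: the correction is applied whatever the value of lambda;
    whether values below 1 should be left uncorrected is an analysis choice.
    """
    return se * math.sqrt(lam)
```

A per-study, per-trait value close to 1, such as the range of 0.987 to 1.041 reported here, indicates little inflation of the test statistics.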
There was a strong positive correlation between hemoglobin, hematocrit and red blood cell count as well as between mean corpuscular hemoglobin and mean corpuscular volume (r > 0.75; S6 Table). All other RBC trait combinations showed weak to moderate positive or negative correlation with each other (S6 Table).

Replication analysis of ABO locus in the CoLaus study
For replication, 47 SNPs were selected in/around the ABO gene. All of these were below the suggestive threshold for Hb, Hct, and RCC in the meta-analysis, out of which 15 were below the significance threshold. The CoLaus study had 2848 participants for whom both genotypes and phenotypes were available. Using a nominal significance cut-off of P < 0.05 for replication, we confirmed the association between four significant SNPs and Hb, and between 10 significant SNPs and Hct (Table 2). Two significant associations of SNPs with RCC were not replicated. Likewise, associations between five SNPs (rs600038, rs651007, rs579459, rs649129 and rs495828) and Hb and Hct were not replicated. However, CoLaus was previously part of a  two suggestive associations, respectively, were also replicated. We used the same nominal significance cut-off of P < 0.05 to call these associations replicated as they were in a locus already known to be associated with the traits (S4 Table).

Conditional analyses in HFE locus
Conditional analyses were performed using the two major HFE mutations (C282Y and H63D) as covariates, to test if the signal in this locus was due to HFE alone. C282Y and H63D lie on different ancestral haplotypes, which may extend over 6 Mb in the case of C282Y and around 700 kb or less in the case of H63D [22][23][24]. The strong peak in this region, containing many SNPs, could be due to the association of HFE mutations, particularly C282Y, with long founder haplotypes. For all three traits, the signal in this locus disappeared in conditional analysis, indicating that these two mutations of HFE are responsible for all of the observed significant association (S5

Conditional analyses in ABO locus
Conditioning on blood groups. At the ABO locus, using Hb as an example, analysis conditional on O blood group haplotype caused the signal at ~135.154 Mb (with rs507666 as lead SNP) to drop under the suggestive threshold, while some SNPs at ~136.131 Mb and proximal became significant at the suggestive level compared to the unconditional analysis (Fig 1B). Similarly, conditioning on AO blood group haplotype caused the same signal at ~135.154 Mb to drop around and below the suggestive threshold (Fig 1C). Analyses conditional on AA and B blood group haplotypes did not differ from the unconditional analysis (Fig 1D and 1E), while in analysis conditional on AB blood group haplotype a group of SNPs directly proximal to rs507666 increased in significance (Fig 1F). However, just as in the unconditional analysis, only three SNPs from this group were above the significance threshold: rs28850884, rs9411378 and rs550057. Analyses run separately on BO and BB blood group haplotypes did not differ from those in unconditional analysis and so were combined. The distribution of the three traits associated with the ABO locus according to blood groups is shown in Table 3, with each trait showing highly significant (p ≤ 0.0001) variation across the blood groups.
Conditioning on associated RBC traits. In analyses conditional on RBC traits we expect to see reduction in the association signal if the trait we are conditioning on plays a role in the association. Conditional analyses of all three traits, taking each one in turn as the dependent variable with the other two as covariates, showed total loss of signal at the ABO locus (Fig 2). When conditioned on Hct, the signal for Hb at the HFE locus disappeared (Fig 2A). However, when the analysis was conditional on RCC, the signals for Hb and Hct at both HFE and TMPRSS6 loci rose (Fig 2B and 2D). The same happened when the analysis for RCC was conditional on Hb and Hct, with both HFE and TMPRSS6 loci becoming significant (Fig 2E and 2F). The signal at the HBS1L locus remained unaffected by conditional analyses (Fig 2E and 2F).

Association of ABO SNPs with other cardiometabolic traits
Due to the richly phenotyped studies within the UCLEB consortium we were able to access the results of prior large scale meta-analyses of a large number of 'intermediate' continuous traits, predominantly cardiometabolic traits, including lipids/lipoproteins, coagulation factors and demographic factors [25]. We identified five cardiometabolic traits that were significantly associated with the ABO locus (S7 Fig): factor VIII (FVIII), von Willebrand factor (log transformed, logVWF), low-density lipoprotein cholesterol (LDL), total cholesterol (TC), and alkaline phosphatase (log transformed, logALP), all of which have been previously reported to be associated with ABO SNPs [26][27][28][29][30][31][32]. The study characteristics for these traits are given in S5 Table. We directly compared the effect sizes of 15 RBC-associated ABO SNPs across the three RBC and five additional cardiometabolic traits in the UCLEB consortium dataset using standardized regression coefficients (β; Fig 3). Direct comparison showed variation in effect size from β = -0.30 for logALP on rs507666 to β = 0.52 on the same SNP for logVWF, and from β = -0.25 to β = 0.47 on the widely cited rs651007. The direction of the minor allele effects on RBC traits and logALP was negative for all SNPs, while the same alleles had positive effects on LDL- and total cholesterol and coagulation factors, most notably logVWF. Only FVIII showed weak correlation with some of the RBC traits; all other traits were not correlated with RBC traits.

Discussion
Seven studies in the UCLEB Consortium with a total of 11 093 individuals of European ancestry contributed data for this meta-analysis of six red blood cell traits. There were 394 non-overlapping SNPs in five loci associated with the RBC traits at genome-wide significance level, out of which 86% were significantly associated with more than one trait. The five loci (with candidate gene) identified were: 6p22.1-6p21.33 (HFE), 6q23.2 (HBS1L), 6q23.3 (contains no genes), 9q34.3 (ABO) and 22q13.1 (TMPRSS6). Associations at the HFE, HBS1L and TMPRSS6 genes with RBC traits are well established across different ethnic groups, adults and/or children and associated with one or more red blood cell traits [2-4, 6-10, 12, 13, 21, 33]. Both HFE and TMPRSS6 are involved in iron homeostasis by regulating the production of hepcidin, which is considered to be the "master" iron regulator; mutations in HFE cause the most common form of hereditary hemochromatosis while mutations in TMPRSS6 cause iron-refractory iron deficiency anemia (IRIDA). SNPs in the HBS1L-MYB locus are associated with fetal hemoglobin levels and all hematological traits of the three major blood-cell lineages: white blood cells, RBC and platelets. SNPs in the 6q23.3 locus show association with MCH and MCV in several studies. This locus does not contain any genes itself; however, the nearby CITED2 gene has been reported as a candidate gene [6,7,9,13]. CITED2 inhibits transactivation of hypoxia-inducible factor 1-alpha (HIF1A)-induced genes affecting essential physiological processes, including iron homeostasis. Underlying correlation between the traits is reflected in the results. High correlation between Hb, Hct and RCC shows in the association of these traits with the ABO locus, while high correlation between MCH and MCV shows at almost all other associated loci.
Similar patterns were observed in the CHARGE Consortium [6], where it was suggested that the observed patterns characterize various clinical hematological diseases and provide context for interpretation of the results. We have also replicated and confirmed the association of ABO SNPs with Hb, Hct and RCC in Europeans, previously reported in East Asians, Caucasians and African-Americans [5,7,9,10,13,34]. By imputation to 1000 Genomes data, we identified 11 new SNPs significantly associated with these traits and replicated all but one in an independent population, further refining the associations. We then focused on the characterization of the ABO locus, looking to dissect its association with RBC traits by means of conditional analysis. First we investigated whether these three RBC traits were associated with the identified loci, especially ABO, independently. The results indicated that the signals at HFE and TMPRSS6 were mediated through the MCV component of Hct and through Hb. This is in agreement with findings that patients with hereditary hemochromatosis have higher mean Hb and MCV, and lower mean RCC than controls [35,36], while in iron-deficiency states such as IRIDA, low values for Hb and MCV occur, with RCC being higher [2,37]. In both conditions, changes in Hb and MCV are in the opposite direction to RCC, consistent with our observations in the conditional analysis. The HBS1L signal for RCC appeared not to be influenced by Hb or Hct, while the signal at the ABO locus was influenced by all three traits. Consistent with this association of ABO with RBC traits, a recent study of biomarkers of iron status found that the ABO locus was associated with serum ferritin concentration, an indicator of body iron stores, in Europeans [38]. 
A study conducted in South Eastern Nigeria found a significant difference in serum iron concentration and total iron binding capacity (TIBC) between the ABO blood groups, with group A having highest and group O the lowest values for both iron parameters [39]. Given that the other four loci we identified are well accepted as having a direct or indirect role in iron homeostasis, it is an intriguing possibility that the ABO locus may also influence RBC traits through an effect on iron.
Further, we looked at the distribution of the traits associated with the ABO locus in our study according to haplotypes predicting blood group, which showed a significant difference in trait means between the blood group types. Results from analyses conditional on the genetic determinants of blood groups suggested that the locus at ~135.154 Mb is greatly influenced by the blood group O deletion. They also indicated the possibility of two loci here: one at the 5′ end at ~135.15 Mb with rs507666 as the lead SNP, and another one proximal, between ~135.15 and ~135.14 Mb, with rs28850884 as lead SNP. However, these two loci are not entirely independent: SNPs in both loci are in moderate LD with each other and the signal at both of them disappears completely in analysis conditional on rs507666 (results not shown). Since the ABO locus is also associated with several other traits and diseases, including lipids, coagulation factors, markers of endothelial function, cardiovascular disease, infectious disease and cancer [26][27][28][29][30][31][32][40][41][42], we investigated whether other traits available in the UCLEB Consortium are associated with it. This directly demonstrated the pleiotropic associations of the same ABO SNPs with other cardiometabolic traits: coagulation factors (FVIII, logVWF), lipids (LDL, TC) and a liver function trait (logALP). This is not surprising given that, in addition to their expression on red blood cells, ABO antigens are widely expressed on different cell types, including vascular endothelium and epithelium [42,43]. Hence they have potential for wide influence upon human traits. ABO determinants are also present in body fluids and are carried on the coagulation protein von Willebrand factor, consistent with the ABO locus associations we demonstrated with these traits [44]. The directions of effect of the ABO alleles associated with cardiometabolic traits in our study were consistent with previous reports.
For different traits, these effect directions are widely reported to be associated with generally either beneficial or detrimental effects upon cardiometabolic risk. Our results show that for the 15 ABO SNPs most strongly associated with RBC traits, an increase in copies of the minor allele was associated with increased coagulation factor and lipid levels and decreased RBC traits and alkaline phosphatase. This effect was reversed for blood group O, resulting in decreased coagulation factor and lipid levels and increased Hb, Hct, RCC and ALP, both of which act protectively. These findings add to the mounting evidence for the role of the ABO blood group as a risk factor for development of cardiovascular disease, thrombosis, diabetes mellitus and cancer, with the O blood type being protective [45][46][47][48][49].
Studies such as ARIC [50] and HEIRS [51] showed that HFE C282Y homozygosity, which is associated with increased iron absorption, is also associated with lower total and LDL cholesterol. Equally, studies show that triglyceride, total cholesterol and LDL cholesterol are decreased in women with iron deficiency anemia [52,53]. Both conditions have in common the diminished production of hepcidin, which in turn leads to iron-depleted macrophages which may negatively regulate systemic cholesterol levels and so offer protection from atherosclerosis [54]. Evidence from mouse studies suggests that macrophage iron depletion may have pro-inflammatory effects ([55]; reviewed in [56]), which could be relevant to O blood group individuals as discussed above. rs651007, close to the 5′ end of the ABO gene and significantly associated with RBC traits in our study, was the only SNP associated with decreased ferritin concentration and decreased atherosclerosis risk in a recent study of the role of iron and hepcidin in atherosclerosis [57]. Additionally, increased levels of IL-10 have been found in acute coronary syndrome cases compared to healthy controls, reflecting an underlying inflammatory state, with levels of IL-10 being higher in individuals with blood type O than in those with other blood types and associated with poor outcome in these patients [58]. The authors suggested that inflammation is a more important risk factor in these individuals, while increased coagulation is more important in non-O blood types [58].
Finally, the current discovery analysis is somewhat biased towards loci already known to be associated with cardiometabolic disease due to the design of Metabochip [59]. Imputation to the 1000 Genomes expanded the regions covered by the chip but could not completely fill the gaps. Analysis based on GWAS coverage imputed to the latest 1000 Genomes phase could discover new loci associated with these traits. Most of the studies, including ours, focus on classic ABO blood group phenotypes. The complexity of the ABO blood group system extends to a number of different subtypes, and the way that they may influence the traits and disease development requires more detailed investigation. However, we were able to further dissect the association between the ABO locus and RBC traits, as well as to show how the same SNPs influence a range of traits and to put this in the context of cardiometabolic risk. Our results support the suggestion by Johansson et al. [58] that the ABO blood group of an individual may determine which risk factors are more important for them.

Studies and phenotypes
The UCL-LSHTM-Edinburgh-Bristol (UCLEB) Consortium has been described in detail in Shah et al. [25]. Six RBC traits were analyzed: Hb, Hct, MCH, MCHC, MCV and RCC. We calculated missing traits as detailed in S1 Note. The total sample size in seven eligible studies was 11 093. The Cohorte Lausannoise [60] (CoLaus, N = 2848) study was approached for replication of the results due to the availability of both genotypes imputed to 1000 Genomes and the traits of interest. Discovery and replication study design are further explained in S1 Note. All participants were of European ancestry.

Ethics Statement
Each study of the UCLEB consortium was approved by the appropriate local research ethics committee and all participants gave written informed consent, with details as follows: 1958BC: Written consent was obtained from participants for the use of information in medical studies. Participants were asked for written informed consent to review their medical records and for permission to perform anonymized genetic tests; CaPS: Ethics approval was obtained from the South East Wales Local Research Ethics Committee, and each subject signed their agreement to be involved; EAS: Approval for the study was given by the Lothian Health Board ethics committee and written informed consent was obtained from each subject; ELSA: Ethical approval for ELSA was given by the National Research Ethics Service and all participants gave written consent; ET2DS: Ethical permission was obtained from the Lothian Medical Research Ethics Committee. All subjects attended a dedicated research clinic where they gave written informed consent for participation in the study; MRC NSHD: Ethical approval was obtained from the Central Manchester Research Ethics Committee (07/H1008/168 and 07/H1008/245) and North Thames Multi-Centre Research Ethics Committee (MREC 98/1/121). Informed written consent was obtained from the study members; WHII: In Whitehall II, participants provided their written informed consent and the National Health Services (NHS), Health Research Authority, National Research Ethics Service (NRES) Committee London-Harrow approved the consent procedure.
Blood samples were collected and analyzed using standard methods and assays. For the replication study, the Institutional Review Board of the Centre Hospitalier Universitaire Vaudois (CHUV) in Lausanne and the Cantonal Ethics Committee approved the CoLaus study protocol and signed informed consent was obtained from participants. Starting in 2009 all participants were invited for a follow-up visit five years after the initial study, completed in 2012. This follow-up study was approved by the local ethics committee.

Genotypes and imputation
All studies with RBC traits were genotyped using Metabochip and imputed to the 1000 Genomes (Phase 1, CEU haplotype set). Genotyping, quality control (QC) procedures and imputation were performed in each study separately using the same protocol [25]. The exclusion criteria for samples and SNPs consisted of an Illumina GenCall score < 0.15, sample and SNP call rates < 0.95, HWE P value ≤ 0.001, replicate discordance, and discrepancies in sex, ethnicity and relatedness checks or against previous genotype data, where available. No MAF cut-off was applied at this stage. Imputation was done on cleaned genotypes in three phases (chunking, phasing and imputing) using the MACH1 and Minimac software [61,62] and applying the strategy described in the Minimac: 1000 Genomes Imputation Cookbook. The resulting file contained ~4.5 million autosomal SNPs (R² > 0.3, MAF > 0.001). The data were collated in posterior probabilities from genotype dosage, incorporating imputation uncertainty, to be used by the R Package snpStats.
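Two of the SNP-level checks described above can be sketched as follows. The text does not say which Hardy-Weinberg test was used, so a 1-df chi-square goodness-of-fit test is assumed here. The post-imputation filter uses the R² and MAF cut-offs quoted for the imputed file. Function names are hypothetical and the sketch is not the QC pipeline of the study.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square goodness-of-fit test; an exact test could
    equally have been used in the original QC protocol.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_imputed_snp(r2, maf, min_r2=0.3, min_maf=0.001):
    """Apply the imputation-quality and allele-frequency cut-offs."""
    return r2 > min_r2 and maf > min_maf
```

Genotype counts of 25, 50 and 25 are exactly in Hardy-Weinberg proportions and give a P value of 1, whereas a sample with no heterozygotes at the same allele frequency gives a very small P value and would be flagged.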

Association and meta-analysis
Based on descriptive statistics and visual inspection (Table 1; S1 Fig), association testing was done on untransformed data, separately for each study and on each trait. The associations between genotyped and imputed SNPs and traits were tested by linear regression using an additive genetic model. Adjustment was made for age, sex and diabetes status, where appropriate. Analyses were carried out using snpStats. Prior to meta-analysis, SNPs with MAF < 0.005 were filtered out. Genomic-control inflation factors (λGC) were calculated and used to correct test statistics from each study for each trait in the meta-analysis. Study-specific estimates of effect size were combined by fixed effects meta-analysis weighted by inverse variance using the METAL program. Genomic-control inflation factors were calculated again after the meta-analysis, but the results were not further corrected. The significance threshold was set at P < 5×10⁻⁸ and the suggestive threshold at P < 5×10⁻⁶. A locus (significant or suggestive) was defined as all SNPs below the threshold that were located less than 500 kb apart. LocusZoom [63] was used to display results, using the 1000 Genomes Phase 1 CEU haplotype set for linkage disequilibrium (LD) calculation. The positions of SNPs are given on human genome build 19.
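The combination of study-level estimates and the grouping of hits into loci can be illustrated with a short sketch. It is not the METAL implementation. It assumes per-study effect sizes and standard errors for one SNP and applies genomic control by inflating standard errors. The 500 kb rule is read here as chaining neighbouring hits on the same chromosome, which is one possible interpretation. All function names are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses, lambdas=None):
    """Inverse-variance weighted fixed-effects estimate for one SNP.

    lambdas: optional per-study genomic-control factors; each standard
    error is multiplied by sqrt(lambda) before weighting (assumption).
    Returns (beta, se, z, two_sided_p).
    """
    if lambdas is None:
        lambdas = [1.0] * len(betas)
    weights = [1.0 / (se * math.sqrt(lam)) ** 2 for se, lam in zip(ses, lambdas)]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p


def group_into_loci(hits, max_gap=500_000):
    """Group (chromosome, position) hits into loci.

    A hit joins the current locus if it is on the same chromosome and
    less than max_gap base pairs from the previous hit.
    """
    loci = []
    for chrom, pos in sorted(hits):
        last = loci[-1][-1] if loci else None
        if last is not None and last[0] == chrom and pos - last[1] < max_gap:
            loci[-1].append((chrom, pos))
        else:
            loci.append([(chrom, pos)])
    return loci
```

With two studies of equal precision the pooled estimate is the simple mean of the two effects, and the pooled standard error shrinks by a factor of the square root of two.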

Conditional analyses
Conditional analysis was performed using the SNPs encoding the two major HFE mutations, C282Y (rs1800562) and H63D (rs1799945), as covariates to further investigate if the signal at this locus for Hb, MCH and MCV was due to HFE alone. Also, to further dissect the signal at the ABO locus, we used analyses conditional on the genetic determinants of blood groups (AA, AO, B, AB and O) with a blood group entered as a binary variable one at a time (e.g. O vs all other groups). The three SNP haplotypes used to derive blood groups are presented in Table 4. To investigate whether the signal in the ABO locus for Hb, Hct and RCC affects a trait independently of the other two traits, the subset of five studies which had data on all three traits was analyzed. For a given trait, the other two traits were included in the model as covariates one at a time [64].
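The logic of a conditional analysis can be shown with a small sketch: the SNP effect is estimated once without and once with the conditioning variable in the linear model, and a signal that is explained by that variable shrinks towards zero. The sketch assumes ordinary least squares on allele dosage and a binary blood group indicator (for example O vs all other groups). The helper names are hypothetical, and the other adjustments used in the study (age, sex, diabetes status) are omitted.

```python
def ols(y, columns):
    """Least-squares fit of y on an intercept plus the given columns.

    Returns the coefficients [intercept, b1, b2, ...], obtained by solving
    the normal equations with Gaussian elimination and partial pivoting.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for s in range(col, k):
                A[r][s] -= f * A[col][s]
            c[r] -= f * c[col]
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (c[r] - sum(A[r][s] * b[s] for s in range(r + 1, k))) / A[r][r]
    return b


def snp_effect(trait, dosage, covariates=()):
    """Coefficient on allele dosage, optionally conditional on covariates."""
    return ols(trait, [dosage, *covariates])[1]


# Toy data (invented for illustration): the trait depends only on blood group O.
dosage = [0, 1, 2, 0, 1, 2, 1, 2]
group_o = [1, 0, 0, 1, 1, 0, 0, 0]
trait = [10.0 + 3.0 * g for g in group_o]
unconditional = snp_effect(trait, dosage)
conditional = snp_effect(trait, dosage, [group_o])
```

In this toy example the unconditional SNP effect is negative because the dosage is correlated with the blood group indicator, while the conditional effect is zero up to rounding error. Conditioning on a second trait instead of a blood group works in the same way, with the trait values passed as the covariate.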

Association with other traits within UCLEB
To assess pleiotropy by direct comparison within UCLEB, we ran an association study on standardized phenotypes (mean = 0, SD = 1). Effect sizes (standardized regression coefficients, β) were visualized using phenotypicForest 0.2.
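Standardizing each phenotype puts effect sizes for traits measured in different units on a common scale, so that β is expressed in standard deviations of the trait per copy of the allele. A minimal sketch follows, assuming the sample standard deviation and a simple regression on allele dosage without covariates; the function names are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and SD 1 (sample SD assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def slope(y, x):
    """Simple linear regression slope of y on x."""
    mx = statistics.fmean(x)
    my = statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def standardized_beta(trait, dosage):
    """Effect of one allele copy in SD units of the trait."""
    return slope(standardize(trait), dosage)
```

Because the trait is rescaled first, the coefficient does not change under a positive linear change of units (for example g/dL to g/L), which is what makes the direct comparison across RBC and cardiometabolic traits possible.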
Pearson's r was calculated to assess correlation between all the traits within the study.

S2 Table. Genomic-control inflation factors (λGC) for each study and trait. λGC values were recalculated for each trait after meta-analysis. (DOCX)
S3 Table. Meta-analysis results. All SNPs significantly associated with one or more red blood cell traits (P < 5×10⁻⁸) in seven studies from the UCLEB consortium (separate file). (XLSX)
S4 Table. Results of replication in the CoLaus study. Results for 47 SNPs in/around ABO gene, which were either significantly or suggestively associated with Hb, Hct and RCC in seven studies from the UCLEB consortium. The rs numbers of the SNPs which were significantly associated with Hb, Hct and/or RCC (P < 5×10⁻⁸, N = 15) in the discovery study are highlighted in bold and the corresponding fields are marked in blue; these are additionally shown in Table 3. The significant replication results (P < 0.05) are marked in red (separate file). (XLSX)
S5 Table. Study characteristics on other cardiometabolic traits associated with ABO SNPs and available in UCLEB. (DOCX)
S6 Table. Pearson's r correlation coefficient between six red blood cell traits and five cardiometabolic traits. (XLSX)