Identification of Nine Novel Loci Associated with White Blood Cell Subtypes in a Japanese Population

White blood cells (WBCs) mediate immune systems and consist of various subtypes with distinct roles. Elucidation of the mechanism that regulates the counts of the WBC subtypes would provide useful insights into both the etiology of the immune system and disease pathogenesis. In this study, we report results of genome-wide association studies (GWAS) and a replication study for the counts of the 5 main WBC subtypes (neutrophils, lymphocytes, monocytes, basophils, and eosinophils) using 14,792 Japanese subjects enrolled in the BioBank Japan Project. We identified 12 significantly associated loci that satisfied the genome-wide significance threshold of P<5.0×10−8, of which 9 loci were novel (the CDK6 locus for the neutrophil count; the ITGA4, MLZE, STXBP6 loci, and the MHC region for the monocyte count; the SLC45A3-NUCKS1, GATA2, NAALAD2, ERG loci for the basophil count). We further evaluated associations in the identified loci using 15,600 subjects from Caucasian populations. These WBC subtype-related loci demonstrated a variety of patterns of pleiotropic associations within the WBC subtypes, or with total WBC count, platelet count, or red blood cell-related traits (n = 30,454), which suggests unique and common functional roles of these loci in the processes of hematopoiesis. This study should contribute to the understanding of the genetic backgrounds of the WBC subtypes and hematological traits.


Introduction
White blood cells (WBCs) mediate immune systems, and play essential roles in defending the body against invading foreign microorganisms [1]. WBCs consist of a variety of cells that mediate diverse roles, and are morphologically classified into 5 main subtypes: neutrophils, lymphocytes, monocytes, basophils, and eosinophils [1]. A number of previous studies have demonstrated significant contributions of these WBC subtypes to the regulation of innate and adaptive immune systems [2][3][4][5][6]. Since the number of WBC subtypes circulating in peripheral blood are tightly regulated, and abnormality in their numbers are closely linked to the presence and prognosis of diseases [2][3][4][5][6], the counts of WBC subtypes are widely used as important blood markers in medical treatment. Therefore, elucidation of the mechanism(s) that regulates the counts of WBC subtypes would have substantial clinical impact and would provide new insights into the etiology of the immune system.
WBC subtypes are known to be heritable traits and several epidemiological studies have suggested the existence of genetic factors that explain the variations in the counts of WBC subtypes, as well as a number of common environmental factors such as age, sex, and smoking [7][8][9][10]. Recently, genome-wide association studies (GWAS) have identified a number of genetic loci that affect hematological traits, but most of these identified loci were determined to be associated with red blood cell (RBC) or platelet (PLT) -related traits or total WBC count [11][12][13][14][15][16][17]. However, studies investigating WBC subtypes have yet to be further assessed [18][19][20][21]. Moreover, it is also of interest to evaluate whether ethnic differences underlie the genetic backgrounds that affect hematological traits.
Previous studies for hematological traits have also suggested that several genetic loci have pleiotropic associations with other hematological traits [15][16][17][18]. Therefore, it is of interest whether the WBC subtype-associated genetic loci have pleiotropic associations with counts of other WBC subtypes, RBCs, and PLTs, when considering the biological roles of the loci in the processes of hematopoiesis.
In this study, we report a large-scale GWAS for the counts of the WBC subtypes in 14,792 Japanese subjects enrolled in the BioBank Japan Project [22]. Subsequently, we performed pleiotropic association analysis of the identified WBC subtype-associated loci. We evaluated the associations of the loci identified in the Japanese population using data obtained by cohorts of Caucasian populations [23], in order to highlight the ethnically common and divergent genetic backgrounds of WBC subtypes.

GWAS
In the GWAS for the WBC subtypes, we enrolled 8,794 Japanese subjects. The counts of the 5 main WBC subtypes (neutrophils, lymphocytes, monocytes, basophils, and eosinophils) of the subjects were collected from medical records and summarized in Table S1. We found moderate degree of correlation (R 2 .0.1) between the neutrophil and monocyte counts, between the basophil and lymphocyte counts, and between the basophil and eosinophil counts (Table S2). These results were considered to be compatible with previous reports [7][8][9]. To relatively compare the effect sizes on the traits, we carried out normalization of the counts of the respective WBC subtypes. The subjects with normalized values beyond 64SD were discarded, which accounted for less than 0.5% of the total subjects.
Genotyping was performed with over 590,000 SNP markers using Illumina610-Quad Genotyping BeadChip (Illumina, CA, USA). We applied stringent quality control criteria, including principal component analysis (PCA) [24] to evaluate potential population stratification, and obtained genotype data for 481,110 autosomal SNPs. To extend the genomic coverage, we subsequently performed the whole-genome imputation of the SNPs, using HapMap Phase II genotype data of Japanese (JPT) and Han Chinese (CHB) individuals as references [25]. After the imputation, 2,178,645 autosomal SNPs that satisfied the criteria of a minor allele frequency (MAF) $0.01 and an imputation score (Rsq value by MACH software [26]) $0.7 were obtained. The associations of these imputed SNPs with the transformed values of the counts of WBC subtypes were evaluated using a linear regression model.
Quantile-Quantile plots of P-values indicated remarkable departures from the null hypothesis in their tails, except for the lymphocyte count ( Figure S1). Inflation factors of P-values, l GC [27], were as low as 1.024-1.038, which suggested no substantial population stratification existed in our study population as previously anticipated for the Japanese population [28]. We identified 10 significant associations that satisfied the genome-wide significance threshold of P,5.0610 28 in the GWAS for the counts of neutrophils, monocytes, basophils and eosinophils ( Figure 1 and Table S3). We also evaluated the associations in the previouslyreported WBC subtype-associated loci, and observed significant associations in six of these loci (the PSMD3-CSF3 and PLCB4 loci for the neutrophil counts [21], the MHC region for the lymphocyte counts [20], the IL1RL1, IKZF2, HBS1L-MYB loci for the eosinophil counts [18], P,0.005; Table S4).

Combined analysis of GWAS and replication study
We subsequently performed a replication study using independent 5,998 Japanese subjects, and further evaluated the associations of the loci by combining the results of the GWAS and the replication study. We selected a total of 36 genetic loci that showed P,5.0610 26 in the GWAS for any of the WBC subtypes as the candidates for inclusion in the replication study. As a result of combined study, we finally identified 12 genetic loci that satisfied genome-wide significance threshold of P,5.0610 28 . Of these, the top-associated SNPs in 2 loci were genotyped and the SNPs in 10 loci were imputed with imputation score of Rsq.0.90. Specifically, we found 2, 4, 4, and 3 loci for neutrophil, monocyte, basophil, and eosinophil counts, respectively (Table 1, Table S3, and Figure 1). One locus was shared between basophil and eosinophil counts. On the other hand, no significant association was detected for the lymphocyte count in the combined study.
Among the identified loci in the combined study, 4 loci were the replication for the previous studies: rs4794822 in the PSMD3-CSF3 locus for the neutrophil count [21], rs4328821 in the GATA2 locus, rs2516399 in the MHC region, and rs9373124 in the HBS1L-MYB locus for the eosinophil count [18]. Associations in the other 9 loci were novel findings to our knowledge ( Figure 2). Specifically, we identified associations in one locus for the neutrophil count (rs445 in the CDK6 locus), 4 loci for the monocyte counts (rs12988934 in the ITGA4 locus, rs3095254 in the MHC region, rs10956483 in

Author Summary
White blood cells (WBCs) are blood cells that mediate immune systems and defend the body against foreign microorganisms. It is well known that WBCs consist of various subtypes of cells with distinct roles, although the genetic background of each of the WBC subtypes has yet to be examined. In this study, we report genome-wide association studies (GWAS) for the 5 main WBC subtypes (neutrophils, lymphocytes, monocytes, basophils, and eosinophils) using 14,792 Japanese subjects. We identified 12 significantly associated genetic loci, and 9 of them were novel. Evaluation of the associations of these identified loci in cohorts of Caucasian populations demonstrated both ethnically common and divergent genetic backgrounds of the WBC subtypes. These loci also indicated a variety of patterns of pleiotropic associations within the hematological traits, including the other WBC subtypes, total WBC count, platelet count, or red blood cell-related traits, which suggests unique and common functional roles of these loci in the processes of hematopoiesis.

Associations of the identified loci in Caucasian populations
To highlight the ethnically common and divergent genetic backgrounds of the WBC subtypes, the associations of the 12 identified loci were further evaluated in Caucasian populations by using 15,600 subjects in cohorts of the CHARGE Consortium (Table S5) [23]. The CHARGE Consortium consists of multiple community-based and prospectively designed cohorts from the United States and Europe [30] and has performed association studies for hematological traits, including the WBC subtypes [23]. We observed the same directional effects of the alleles in all 12 loci evaluated in the CHARGE Consortium. Furthermore, significant associations were observed in 4 loci (the PSMD3-CSF3 locus for the neutrophil count, the ITGA4 locus for the monocyte count, and the GATA2 locus for both the basophil and eosinophil counts; P,0.004). We also observed the suggestive associations in 2 loci (the CDK6 locus for the neutrophil count and the HBS1L-MYB locus for the eosinophil count; P,0.05).

Pleiotropic association study for WBC subtype-associated loci
We next evaluated the pleiotropic associations of the WBC subtype-associated loci. For the top-associated SNPs in each of the loci that indicated significant associations in our study (P, 5.0610 28 ), we evaluated the associations with the counts of the other WBC subtypes, total WBC count, RBC-related traits (RBC count, hemoglobin [Hb], hematocrit [Ht], mean corpuscular   [18] a SNPs that satisfied genome-wide significance threshold of P,5.0610 28 in the combined study in Japanese population were indicated. b The allele that increased the count of the corresponding WBC subtype was denoted as effect allele (EA) and is indicated based on the forward strand.  (Table S6). We found various patterns of pleiotropic associations for the loci associated with the WBC subtypes ( Figure 3A and Figure 4). Three loci demonstrated specific associations with the original WBC subtypes identified in the GWAS (rs12748961 in the SLC45A3-NUCKS1 locus, rs12988934 in the ITGA4 locus, rs11018874 in the NAALAD2 locus), although 9 other loci demonstrated pleiotropic associations with other traits. The most pleiotropic associations were observed in the HBS1L-MYB locus, which indicated significant associations with all of the evaluated hematological traits (P,0.005). The T allele of rs9373124 that increased the eosinophil count also increased the counts of the other WBC subtypes, total WBC count, RBC count, the Hb and Ht levels, and conversely decreased MCH, MCHC, MCV, and PLT count, validating its substantial role in hematopoiesis [15,31].
In the GATA2 locus, we observed significant associations with both the basophil and eosinophil counts (P,5.0610 28 ; Figure 3B). This locus encompassed several genes, although GATA2 seemed to be most responsible for regulating eosinophils and basophils from a functional standpoint [3,32]. Interestingly, the peaks of the associations of the SNPs in the basophil and eosinophil GWAS showed concordance at rs4328821. Possession of the A allele of rs4328821 increased both the basophil and eosinophil counts ( Figure 3C). The subjects who were homozygous for the A allele had 1.28-fold and 1.19-fold higher basophil and eosinophil counts, respectively, compared with the corresponding levels of the subjects who were homozygous for the G allele. Moreover, rs4328821 significantly explained 2.7% of the correlation between the basophil and eosinophil counts (permutation P,1.0610 29 ). Upon combining the effects of the SNPs in the identified loci, up to 2.1% of the variations of the counts of the WBC subtypes was explained, and up to 8.0% of the correlation between the WBC subtypes was explained (Table S2).

Discussion
Through a GWAS and a replication study consisting of 14,792 Japanese subjects, our study identified 12 loci that were significantly associated with the counts of WBC subtypes. Among the identified loci, 9 loci are reported for the first time in this study. The identified loci demonstrated a variety of patterns of pleiotropic associations within the WBC subtypes and/or with total WBC count, RBC- Two loci identified in the GWAS for the neutrophil count (the CDK6 and PSMD3-CSF3 loci) have previously been reported to be associated with the total WBC count [16,17]. Since neutrophils are the most abundant subtype of WBCs and typically comprise 50-70% of the total WBC count, the associations between these loci with the total WBC count would have reflected the associations with the neutrophil count [21].
We identified 4 novel loci associated with monocyte count (the ITGA4, MLZE, and STXBP6 locus and the MHC region). The landmark SNP in the MHC region was located near the HLA-C gene and in moderate LD with the particular HLA-C allele, which belongs to MHC class I molecules. ITGA4 encodes the a4 chain of the integrins, which mediate migration of the WBCs [33]. Previous reports have demonstrated that STXBP6 (also known as amisyn) binds to the components of the SNARE complex, which mediates membrane fusions including phagocytosis [34,35]. In response to inflammation, monocytes differentiate into macrophages and migrate into affected tissues of inflammation, and subsequently perform phagocytosis and antigen presentation using MHC molecules expressed on the cell surface [6]. Therefore, associations of the SNPs in these loci with the monocyte count would be plausible from a biological perspective. Recently, clinical benefits of inhibition of a4 integrin have been demonstrated in the treatment of autoimmune diseases [36]. Although further functional investigation is necessary, the SNP in the ITGA4 locus that was identified in our GWAS may be a promising target for pharmacogenomics of anti-a4 integrin therapy. MLZE (also known as GCDMC) belongs to the Gasdermin family of genes [37], and its role in the regulation of the monocyte count should be further explored.
In addition, we identified 4 novel loci associated with the basophil count (the SLC45A3-NUCKS1, GATA2, NAALAD2, and ERG loci) and replicated associations in 3 previously-reported loci associated with the eosinophil count (the GATA2 and HBS1L-MYB loci and the MHC region). Basophils and eosinophils coordinately mediate allergic inflammation [3][4][5], and the correlation of these counts [7] suggested the existence of genetic factors that are shared between them. Our pleiotropic study demonstrated overlap of the associated loci between the basophil and eosinophil counts, which was most highlighted in the GATA2 locus. GATA2 is a well-known zinc-finger transcription factor and plays an essential role in hematopoiesis, particularly in the regulation of basophils and eosinophils [3,32]. The landmark SNP in the GATA2 locus was concordant in the GWAS for both the basophil and eosinophil counts, and significantly explained part of their correlation in the counts. Pleiotropic associations of the SNP with the basophil and eosinophil counts were further replicated in the Caucasian populations. These results suggested an ethnically-shared substantial functional role of the SNP in the etiology of GATA2.
ERG encodes a member of the Ets family of transcription factors, and is known to be included in the Down syndrome critical region on chromosome 21 [38]. Although its functional role(s) in the regulation of basophils has not been investigated to date, an essential role of ERG for definitive hematopoiesis has been demonstrated [38]. The SLC45A3-NUCKS1 locus in the present study encompassed several genes, and we submit that the functional origin of this locus should be further investigated. Interestingly, the fusion transcript of SLC45A3 and ERG is observed in prostate cancers, which have been characterized by the overexpression of ERG mRNA [39,40], although we did not find any significant gene-gene interaction in the SNPs in these two loci for basophil count (data not shown). Finally, NAALAD2 is a member of the N-acetylated a-linked acidic dipeptidase gene family [41], and its role in the regulation of the basophil counts should be a topic to be further investigated in the future.
In contrast to the WBC subtypes mentioned above, no significant association was detected for the lymphocyte count. One probable explanation for this finding is that lymphocytes can be further divided into a variety of subsets, such as natural killer (NK) cells, T cells, and B cells, which were not examined specifically in this study. Therefore, future GWAS that focus on those specific subsets of lymphocytes [20] are necessary to efficiently investigate the genetic backgrounds of the lymphocytes.
Several points about this study bear discussion. First, since our study populations consisted of the disease patients, it would be useful to assess the possibility that the disease status might have confounded the results. When the respective disease groups were analyzed separately and evaluated through meta-analysis, all the identified loci satisfied the significant associations (P,5.0610 28 ) without significant heterogeneities of the effects (a = 0.01). None of the identified loci has been reported to be associated with the risk of the diseases enrolled in the study population, except for the MHC region with Rheumatoid Arthritis (RA) [42]. Moreover, after the subjects affected with RA were excluded, the significant associations of the SNPs in the MHC region with the monocyte and eosinophil counts were observed (P = 1.7610 210 for rs3095254 and P = 9.66 10 212 for rs2516399, respectively). It would be of note that we observed concordance of the associations in the identified loci with those by CHARGE consortium, which is consisted of multiple community-based and prospective cohorts incorporating normal populations. Although further validation study using non-affected subjects would be desirable, these observations suggested that the utilization of disease patients have not induced substantial bias in our study.
Second, the counts of the WBC subtypes were based on medical records. Although the data collection protocol is enormously standardized [22], there is a possibility that unstandardized discrepancy among the medical institutes might induce bias in the phenotype distributions and impair statistical power of the study.
Third, the explained proportion of the WBC subtypes by the identified loci would be estimated conservatively. Because of the stringent significance threshold adopted in the study, a number of associated loci with moderate effect sizes would be still unidentified. Further approaches, such as considering the entire SNPs simultaneously [43], are necessary for the accurate estimation of the explained variations. Candidate gene analysis based on the biological pathway of the WBC subtypes would also be a promising approach to uncover these unidentified loci [44].
Fourth, since the correlation among the traits could modulate the pattern of the pleiotropic associations, further distinction of actual pleiotropic associations from simple associations induced by the correlations would be a topic to be investigated.
In summary, our study identified 9 novel loci that are associated with the counts of the WBC subtypes. The pleiotropic association study of the identified loci demonstrated unique and common genetic backgrounds underlying the WBC subtypes. Our study should contribute to the general understanding of the etiology and regulation of the WBC subtypes.

Subjects
The subjects enrolled in the GWAS, and in the replication study for WBC subtypes (n = 14,792), and in the pleiotropic association study for hematological traits (n = 30,454) consisted of patients that were classified into 27 disease groups (Tables S1 and S6). The subjects in the pleiotropic association study included the subjects in the GWAS and the replication study. All subjects were collected under the support of the BioBank Japan Projects [22]. Subjects who were determined to be of non-Japanese origin by either selfreport or by PCA in GWAS were excluded from analysis. Some of the subjects in this study have also been included in our previous studies [17,21,45,46]. All participants provided written informed consent as approved by the ethical committees of the BioBank Japan Project [22] and the University of Tokyo. Clinical information of the subjects including age, gender, and smoking history were collected by self-report on the questionnaire. The laboratory data including the counts of the WBC subtypes and other hematological traits were collected from medical records by the professional medical coordinators according to the standardized protocol [22]. The details of the study enrolled by the CHARGE Consortium, including subject details and the study design, are described at length elsewhere [23], and are summarized in Table S5.

Genotyping and quality control
In the GWAS for the WBC subtypes, 592,232 SNPs were genotyped for 8,943 subjects using Illumina HumanHap610-Quad Genotyping BeadChip. We excluded 77 subjects with call rates ,0.98 in the process of genotyping. After this initial exclusion, SNPs with call rates ,0.99 or with ambiguous clustering of the intensity plots, or non-autosomal SNPs, were excluded. We excluded 67 closely related subjects based on the identity-by-descent (IBD), which was estimated using the ''-genome'' option implemented in PLINK version 1.06 [47]. For each pair with a 1st or 2nd degree of kinship, we excluded the one member of the pair with lower call rates than the other. We then excluded subjects whose ancestries were estimated to be distinct from East-Asian populations using PCA performed by EIGENSTRAT version 2.0 [24]. We performed PCA for the genotype data of our GWAS along with the genotype data of Phase II HapMap populations (unrelated European (CEU), African (YRU), and East-Asian (JPT + CHB) individuals) (release 24) [25]. Based on the PCA plot of the subjects, we visually identified and excluded 5 outliers in terms of ancestry from JPT + CHB clusters. Subsequently, the SNPs with MAF ,0.01 or with an exact P-value of the Hardy-Weinberg equilibrium test ,1.0610 27 were excluded. Finally, we obtained 481,110 SNPs for 8,794 subjects.
After the quality control criteria mentioned above were applied, genotype imputation was performed using MACH 1.0 [26] in a two-step procedure [46]. The genotype data of Phase II HapMap JPT and CHB individuals (release 24) [25] were adopted as references. In the first step of the imputation, recombination and error rate maps were estimated using 500 randomly selected subjects from those enrolled in the GWAS. In the second step, genotype imputation of all subjects was performed using the estimated recombination and error rate maps. Quality control filters of MAF $0.01 and Rsq values $0.7 were applied for the imputed SNPs.
The genotype data of the SNPs enrolled in the replication or pleiotropic association study were obtained from the genome-wide screening data of the BioBank Japan Project [22]. Genotyping was performed using either Illumina HumanHap550v3 Genotyping BeadChip or Illumina HumanHap610-Quad Genotyping Bead-Chip, and the same quality control filters and imputation procedure were applied.

Statistical analysis
The common log-transformed values of the counts of each of the WBC subtypes were adjusted for gender, age, smoking history, and the affection statuses of the subjects with the disease groups (Table S1), using linear regression by R statistical software (version 2.11.0). Then the residuals were normalized as Z scores, and the subjects with Z score .4.0 or ,24.0 were excluded in each of the traits. Associations of the SNPs with the counts of the WBC subtypes were assessed by linear regression assuming the additive effects of the allele dosages on the Z scores, using mach2qtl software [26]. In the replication and pleiotropic association studies, the association of the SNPs with the normalized residuals were also evaluated by the linear regression as univariate analysis for each of the phenotypes. The transformation methods used for the hematological traits in the pleiotropic association study are summarized in Table S6. Combined study of the results of the GWAS and the replication study was performed using an inversevariance method from the summary statistics of beta and standard error (SE). Through the combined study of the GWAS and the replication study, the locus which satisfied the genome-wide significance threshold of P,5.0610 28 was considered to be significant. We did not account for multiple comparisons among the traits. These significantly associated loci were subsequently enrolled in the pleiotropic association study. For the selection of the loci that were evaluated in the replication study, we adopted less stringent threshold of P,5.0610 26 to include potentially associated loci. For the evaluation of the identified loci using the Caucasian populations, Bonferroni correction based on the number of the evaluated loci were adopted (a = 0.05, n = 12, P,0.004). LD between the SNPs in the MHC region and HLA alleles were estimated using the genotype data of the SNPs and the high-resolution HLA alleles for Phase II HapMap JPT and CHB individuals [25,29]. Explained proportions of the variations of the WBC subtypes by the combination of the associated SNPs were estimated based on the differences of the coefficient of determination, R 2 , in the multivariate linear regression model for common-log transformed counts of the respective WBC subtypes, including the associated SNPs as covariates, and the model additionally including age, gender, smoking history, and the affection statuses of the subjects as covariates. Explained proportions of the correlation between the two WBC subtypes by the associated SNPs were estimated based on the following statistics: (R 2 resi1 -R 2 resi2 ) / R 2 nomi , where R 2 nomi is the R 2 between the log-transformed values of the counts of the WBC subtypes, R 2 resi1 is the R 2 between the residuals of the values adjusted for gender, age, smoking history, and the affection statuses of the subjects, and R 2 resi2 is the R 2 between the residuals of the values adjusted for gender, age, smoking history, the affection statuses of the subjects, and the SNPs. Significance of the statistics was evaluated using permutation procedure (6 10 9 iteration steps).

Web resources
The URLs for data presented herein are as follows.

Supporting Information
Figure S1 Quantile-Quantile plots (QQ-plots) of P-values in the GWAS for the WBC subtypes. QQ-plots of the GWAS for (A) neutrophil, (B) lymphocyte, (C) monocyte, (D) basophil, and (E) eosinophil counts. The horizontal axis indicates the expected -log 10 (P-values). The vertical axis indicates the observed -log 10 (P-values). The gray line represents y = x. l GC represents the inflation factor of the test statistics. The SNPs for which the P-value was smaller than 1.0610 215 are indicated at the upper limit of the plot. (TIF)     Text S1 Full descriptions of acknowledgements. (DOC)