Single- and Bayesian Multi-Marker Genome-Wide Association for Haematological Parameters in Pigs

Haematological traits are important traits that show associations with immune and metabolic status, as well as diseases in humans and animals. Mapping genome regions that affect the blood cell traits can contribute to the identification of genomic features useable as biomarkers for immune, disease and metabolic status. A genome-wide association study (GWAS) was conducted using PorcineSNP60 BeadChips. Single-marker and Bayesian multi-marker approaches were integrated to identify genomic regions and corresponding genes overlapping for both methods. GWAS was performed for haematological traits of 591 German Landrace pig. Heritability estimates for haematological traits were medium to high. In total 252 single SNPs associated with 12 haematological traits were identified (NegLog10 of p-value > 5). The Bayesian multi-marker approach revealed 102 QTL regions across the genome, indicated by 1-Mb windows with contribution to additive genetic variance above 0.5%. The integration of both methods resulted in 24 overlapping QTL regions. This study identified overlapping QTL regions from single- and multi-marker approaches for haematological traits. Identifying candidate genes that affect blood cell traits provides the first step towards the understanding of the molecular basis of haematological phenotypes.


Background
Genome-wide associational study (GWAS) has become a powerful genomics tool for mapping genetic loci associated with common diseases and quantitative traits. Haematological traits are important in the sense that they can reflect the immune status and healthy conditions and be utilized as biomarkers in human and animals [1]. Pigs are valuable as agricultural commodities and share some similarities in physiology and genome as well as haematological traits with humans. Therefore pigs can serve as a tractable model to study genetic determination of physiological and metabolic traits [2,3].
Haematological traits include three components, leukocytes, erythrocytes and platelets, which are markers of immune and/or inflammatory responses [4,5]. A few studies have reported on QTL mapping and GWAS for haematological traits in the pig and most studies used F2 resource populations created by mating two genetically distinct breeds [6][7][8][9]. Two studies of purebred pigs are available, one has detected QTL affecting haematological traits based on a linkage analysis using 206 microsatellite markers [10] and the other identified SNPs associated with haematological traits by GWAS [11]. GWAS results for haematological traits at three growth stages in a White Duroc X Erhualian F2 intercross were also reported [12].
In the present study, we report GWAS for haematological traits of 591 performance-tested pigs from commercial herds of German Landrace (DL) using the PorcineSNP60 BeadChip (Illumina Inc., San Diego, CA, USA). The Bayesian multi-marker approach was integrated with single-marker regression analyses to identify genomic regions and corresponding genes overlapping for both methods.

Animals and sample collection
Animal care and tissue collection procedures followed the guidelines of the German Law of Animal Protection, and the experimental protocol was approved by the Animal Care Committee of the Leibniz Institute for Farm Animal Biology (FBN). Animals (n = 591) of a German Landrace (DL) herdbook herd were kept at the Experimental Farm of the FBN. Animals were fed ad libitum. Samples were collected from the pigs at an average age of 170 days at the experimental slaughter facility of the FBN after electronarcosis followed by exsanguination. Veterinary inspection of the animals before and of carcasses and organs after slaughter ensured that only animals without any impairment, disease symptoms or inflammatory and pathological signs were considered, thus avoiding any bias of blood phenotypes.  Table 1) were determined using an automated blood analyser device (ABX Pentra 60 HORIBA,

SNP genotypes
Genotyping was performed using the PorcineSNP60 BeadChip (Illumina Inc., San Diego, CA, USA) according to the manufacturer's SNP Infinium HD assay protocol. In brief, 200 ng of DNA were amplified, fragmented, and hybridized to the PorcineSNP60 BeadChip containing 62,163 locus-specific 50-mers that are covalently linked to beads distributed on the microarray surface. Single-base extension of captured oligos allowed the incorporation of labelled nucleotides that were detected by Illumina iScan, and images were subsequently converted to intensity data. The intensity data were normalized and the cluster position, genotype, and quality score were assigned using the GenomeStudio software (Illumina Inc.). Quality control steps were applied including removing samples with call rates < 99%. Markers with low minor-allele frequency (< 5%) were excluded. Markers that strongly deviated from Hardy-Weinberg equilibrium (p < 0.0001) were also filtered out. The average call rate for all samples was 99.8% ± 0.2. All markers on the PorcineSNP60 BeadChip were mapped to the porcine reference genome, Sscrofa 10.2.

Single SNPs GWAS
Haematological traits were analysed for an association with SNPs using a mixed-model analysis of variance in JMP Genomics (SAS Institute, Cary, NC, USA). Mixed-model analysis tests an association between traits and a single SNPs and simultaneously adjusts for population structure and family relatedness [13]. The genetic similarity matrix between individuals was first computed as identity by descent of each pair for the k-matrix. This genome wide relatedness and the slaughter day were used as random effects. For controlling of population stratification, the correlation-selected principal components analysis was used [14,15]. Significant correlations at a false discovery rate (FDR) of 5% were considered as covariates. Additionally, genotype and gender were used as fixed effects, age and carcass weight were considered as covariates. Significantly associated SNP markers were reported at a threshold of NegLog10 (pvalue) > 5. In order to consider multiple testing issues, a false discovery rate (FDR) was estimated (FDR < 5% corresponding to NegLog10 (p-value) > 6).

Bayesian GWAS
Prior to the analyses, the genotype matrix was processed using fastPHASE (version 1.2) to impute missing genotype information [16]. Bayesian models implemented in GenSel software (version 4.55R) were applied to the dataset [17]. All analyses were performed using a chain length of 51000 iterations with the first 1000 cycles being discarded as burn-in. An output was created at every 50th iteration. The proportion of SNP that were considered as having no effect in a single iteration was set to π = 0.995. Thus, approximately 240 SNPs were reported in a single iteration of the Markov chain Monte Carlo (MCMC) sampling. Initially, the Bayes C approach was used to estimate additive genetic and residual variance components for each trait. The heritability was calculated as the proportion of the additive genetic variance to the total phenotypic variance. Subsequently, the prior information of variance components was used to run Bayes B models and estimate SNP effects. For haematological traits, fixed effect of gender was considered as class variable in the models. Age and carcass weight were included as covariates. In addition, estimated SNP effects were combined for all markers located in nonoverlapping 1-Mb windows and the window contributions to the genetic variance were estimated using the window option implemented in the GenSel software. In total, 2559 1-Mb windows located on autosomes and the X chromosome were included in the analyses. The theoretical proportion of a single window to the genetic variance of a trait was approximately 0.039% (100%/2559) and 1-Mb windows with contributions above 0.5% were reported.

Results
Phenotypic and genotypic measurements  Table 1. Heritability estimates for haematological traits were medium to high (0.17-0.69). After filtering, 48,909 SNPs were retained for the subsequent GWAS by both single-marker analysis and Bayesian multi-marker approach. For single-marker analysis, a total of 252 SNPs associated with 12 haematological traits were revealed at significance levels of negative log 10 of p value > 5 (See: S1 Table, Fig 1). The top five markers associated with each trait are shown in Table 2 for haematological traits. For the Bayesian multi-marker approach, 1-Mb windows with an explained additive genetic variance of the traits above 0.5% were reported. In total 102 QTL regions were associated with haematological traits that were distributed across the whole genome (See: S2 Table, Fig 2). The regions overlapping between the results of single-(generalized linear model) and multi-(Bayesian approach) marker genome-wide association analyses accounted for 24 out of 102 overlapping QTL regions (See: S2 Table). The transcripts located in the overlapping regions, especially those that showed largest QTL effect for each haematological trait are shown in Table 3.    group box family member 2 (TOX2) gene locus (P = 1.6x10 -6 and P = 2.7x10 -6 ) as shown in Manhattan plots of WBC (Fig 1). Bayesian multi-marker approach detected the region that explained 2.2% of the additive genetic variance for WBC at 28-29 Mb on SSC14 (Fig 2). The overlapping region with single-marker analysis was also found on SSC17 (51-52 Mb) and explained 1.01% of the additive genetic variance for WBC. The top 4 LYM associated SNPs mapped on SSC14 (77.9-78.5 Mb) in DEAD-box helicase 21 (DDX21) and storkhead box 1 (STOX1) and on SSC1 in lipase C, hepatic type (LIPC) ( Table 2 and Fig 1). It is noteworthy that at the same region on SSC14, QTLs for LYM were identified by Bayesian method, which explained 2.16% of the additive genetic variance (Fig 2). The other two overlapping regions on SSCX were detected by the Bayesian analysis and by the single-marker analysis for an association with LYM. These regions contained the interesting transcripts of lysine demethylase 6A (KDM6A) and micro RNA (miR-221 and miR-222) ( Table 3).

Red blood cell traits
We performed GWAS for the following seven erythrocyte-related traits: RBC, HCT, HGB, MCV, MCH, MCHC and RDW. The single-marker approach revealed a number of genetic loci that were significantly associated with RBC (9 loci), HCT (1 loci), HGB (4 loci), MCV (98 loci), MCH (78 loci), MCHC (5 loci) and RDW (27 loci) at significance thresholds of FDR < 5% and NegLog10 (p-value) > 5. The detailed information is shown in additional file 1: S1 Table. In addition, the corresponding Manhattan plots are shown in Fig 1 and the top 5 most significant associations are listed in Table 2.
Interestingly, overlaying the results from both analysis methods consistently discerned RBC associated regions located on SSC1 (305.1-305.2 Mb) physically linked to nucleoporin 214 (NUP214), and on SSC6 (136 Mb) in ribonucleoprotein, PTB binding 2 (RAVER2). The QTL region for RBC with the largest effect was found on SSC16 (5-5.9 Mb) explaining 1.36% of the additive genetic variance (Fig 2). This region was annotated with the transcript of membrane associated ring-CH-type finger 11 (MARCH11) and C7H14orf2. The SNPs on NUP214 also associated with HCT (p = 4.2x10 -6 ). A region with the largest QTL effect for HCT accounting for 1.82% of the additive genetic variance encompassed the potential candidate genes family with sequence similarity 13 member A (FAM13A), HECT and RLD domain containing E3 ubiquitin protein ligase 3 (HERC3) and PIGY upstream reading frame (PYURF). The results also showed some discrepancies between the two GWAS analysis methods such as HGB of which most significant single SNPs were located on SSC15 (141.3 Mb), while multi-marker analysis detected significant loci on SSC5 (4.1-4.9 Mb) and SSC16 (5.0-5.9 Mb).

Platelet traits
We performed GWAS for three platelet traits: PLT, MPV and PCT. In total, we found 29 PLT loci, 16 MPV loci and 21 PCT loci (See: S1 Table and Fig 1) for single-marker GWAS. The most significant SNPs associated with MPV were MARC0044698 and ALGA0066401 of SSC12 (42.6 Mb) with p = 3.7x10 -7 and p = 9.1x10 -8 . By multi-marker analysis, QTL with the largest effect identified was on SSC18 at 0-1 Mb, follow by SSC1 at 284-285 Mb which explained 1.23% and 1.14% of the additive genetic variance of MPV, respectively. The SNP ASGA0030815 on bone morphogenetic protein 6 (BMP6) (SSC7) and MIGA0002922 on zinc finger, GATA-like protein 1 (ZGLP1) (SSC2) were associated with both PLT and PCT. Moreover, the region on SSC2 (69.1-69.8 Mb) was also confirmed by multi-marker analysis ( Table 3). It should be noted that two loci significantly associated with PCT hosted candidate genes with biologically plausible functions. The M1GA0000623 SNP within the insulin like growth factor 2 receptor (IGF2R) gene (SSC1) was highly associated with PCT with p = 6.7x10 -7 ; this region (SSC1, 9-10 Mb) was also confirmed by the multi-marker analysis with the greatest effect that explained 5.13% of the additive genetic variance and hosting 2 interesting transcripts IGF2R and MAS1 protooncogene (MAS1) ( Table 3).

Discussion
The proportion of phenotypic variance explained by markers is considered as a measure of heritability and therefore indicates whether these traits are heritable. Heritability estimates for haematological traits in our study were shown to be moderate for WBC (0.23) which is in line with previous studies in Large White and Yorkshire pigs [18,19]. Our findings showed that erythrocyte-related traits RBC, HCT, HGB, and RDW were moderately heritable (0.34 to 0.48) compared to MCV, MCH and MCHC which were highly heritable (0.67-0.69). However, in another genetic background RBC and HCT were highly heritable (0.56-0.62) compared to MCV and MCH which were moderately heritable (0.37-0.47), as reported by Mpetile and colleague [19]. The discrepancies of heritability estimation could be due to genetic differences in the breeds studied and the age of the animals when the phenotypic variance was measured.
In total, the single-marker GWAS by GLM analysis revealed 252 SNPs associated with 12 haematological traits at significance levels of 5% FDR, whereas the Bayesian approach detected altogether 102 QTL regions across the genome for 12 haematological traits using a 1-Mb window and the genetic variance above 0.5%. The multi-markers method was challenged to avoid false positives and overestimation of QTL effects derived from single-SNP analysis; accordingly, Bayesian approaches involving SNP-windows to reflect QTL have been applied in many studies before [20][21][22]. To take advantage of the Bayesian approach and to enhance the power of finding potential candidate genes, we combined the results from both methods by overlaying the derived QTL.The integration of both methods resulted in 24 overlapping QTL regions. For lymphocyte count, two overlapping regions from both methods located on SSCX (44.1-46.0 Mb) at KDM6A, miR-221 and miR-222, and SSC14 (77-79 Mb) at STOX1 and DDX21. KDM6A (histone 3 lysine 27 demethylase UTX) was identified as a novel regulator for haematopoietic cell migration by using haematopoietic stem and progenitor cell [23]. Consistency of results between both methods was also found for WBC on SSC17 (51-52 Mb) at TOX2. TOX2 is a transcription factor belonging to the TOX family that shares a highly conserved high mobility group DNA-binding domain with the other TOX members. As recently reported, TOX2 regulates human natural killer cell development by controlling T-BET expression [24]. Recently, a study demonstrated that transcriptional regulator TOX is required for the in vivo differentiation of common lymphoid progenitors into innate lymphoid cells [25]. All together there is growing evidence promoting TOX2 as a candidate gene for WBC counts.
QTL for MCH and MCV were reported in many pig chromosomes in particular on SSC2 at 55-60 Mb [11] and SSC8 at 42-73 Mb which covers the KIT gene [9,10,26]. KIT regulatory mutations are responsible for the dominant white phenotype in pigs [27] and have profound pleiotropic effects on peripheral blood cell measures in Western commercial crossbred pigs and experimental crosses [6,28,29]. Inconsistently, others have not found any significant association between the KIT mutations and haematological parameters [30,31]. Also, in our study, no significant associations with red blood cell traits were detected for both of these regions, in particular not of the KIT region. Landrace pigs have solid white coat color phenotypes and are usually homozygous for the dominant white allele (I) at the KIT gene. Thus there is not QTL segregating; consequently there are no significant SNPs found around the KIT region.
In this study, a region on SSC8 at 128 Mb was found significantly associated with MCV by means of single-marker analysis. Interestingly, Bayesian approaches persistently detected an associated effect for MCV, MCH, and RDW on SSC1, SSC13 and SSCX which constitute some plausible candidate genes contributing to red blood cell traits via haematopoietic mechanisms including EFNB1, TLE4 and OPA1. In particular, identified regions on SSCX at 2.0-2.8 Mb and 62.1-62.7 Mb include NLGN4X and EFNB1 as interesting candidate genes. NLGN4X encodes a protein which belongs to a family of neuronal cell surface and causal factors for monogenic autism as well as directly impacts neurodevelopmental processes during the formation of neurons and their connections [32,33]. EFNB1 encodes a membrane protein that acts as a ligand for Ephrin receptor tyrosine kinases and thus play a potential role in modulating blood pressure [34]. EFNB1 was detected in leukaemia cell lines and bone marrow and was shown to be involved in normal haematopoietic development and tumorigenesis [35,36]. The other candidate gene located on SSC1 which involved in the complex processes of haematopoiesis was TLE4. The TLE family of genes is a group of highly conserved transcriptional corepressors that are involved in myeloid cell proliferation and survival [37]. Tle4 knockout mice exhibit leukocytopenia, B cell lymphopenia, and significant reductions in haematopoietic stem and progenitor cells [38]. Another plausible candidate locus contributing to red blood cell traits via energy metabolism of erythropoiesis was OPA1 located on SSC13 (140-140.9 Mb).This locus explains 1.7 and 4.7% of the additive genetic variance for MCV and RDW. Erythropoiesis is highly dependent of mitochondrial metabolism through multiple ways [39]. The OPA1 gene encodes a dynamin-like mitochondrial GTPase OPA1and plays a significant role in mitochondrial structure, maintenance and fusion. In addition, OPA1 -is involved in the regulation of energetic metabolism and cell death, which underscores its multiple physiological roles [40]. Recently, it was reported that when erythropoietic cells are copper deprived, MFN2 and OPA1 become up-regulated and functional to promote fusion [41].
In a previous study, the QTLs for RBC were mapped on the end of SSC1 [9]. Regarding the present study, one of common candidate genes in SSC1 (305.1-306 Mb) that associated with HCT and RBC, was NUP214. The gene is a member of the FG-repeat-containing nucleoporins and known to be fused with the DEK gene on chromosome 6 in a t(6,9)-translocation associated with acute myeloid leukemia and myelodysplastic syndrome [42].
The highly significant SNPs and regions, that both approaches detected s to be associated with platelet traits (PCT), were physically linked to the IGF2R gene (SSC1, 9-10 Mb). The IGFs have an important role in physiologic and neoplastic processes as well as normal and malignant development of the haematopoietic system [43,44]. Recently, it was reported that deletion of the insulin receptor in murine resulted in an increases platelets count and volume, and blocked the action of insulin on platelet signalling and function [45]. ZGLP1 (zinc finger, GATA-like protein 1) is a strong positional candidate gene for both PCT and PLT but there is still limited knowledge about the function of this gene. GWAS of PLT further revealed the BMP6 gene which is the key endogenous regulator of hepcidin, an iron homeostasis gene [46]. In this context, it was also reported that iron deficiency increases megakaryopoietic differentiation and alters platelet phenotype [47].

Conclusions
In summary, our study provides insights into the genetic architecture of haematological traits and opens new opportunities to the application of haematological parameters as a monitoring indicator for health and infection in pigs. Taking the advantages of Bayesian GWAS approach, combined with a GLM single-marker approach, we were able to provide a list of promising QTL regions and plausible candidate genes that carry common genetic variants associated with RBC, WBC, and platelet phenotypes. Further validation and identification of the causal mutations are necessary. This study provides an additional step towards the understanding of the molecular basis of blood cell phenotypes and could be used as a model for many diseases which might be simpler for blood cell traits rather than most other complex phenotype.
Supporting Information S1 Table. Results of single-marker genome-wide association analyses for 12 haematological traits. List of significantly associated SNP markers at NegLog10 (p-value) > 5 obtained by single-marker analysis. Haematological traits, marker name, chromosome, position according to Pig Genome Annotation 10.2, major/minor allele, % variance explained, p-value (NegLog10), positional candidate gene. (XLSX) S2 Table. Results of multi-marker genome-wide association analyses for 12 haematological traits. List of 1-Mb windows with % variance explained > 0.5%. Haematological traits, chromosome, position of start and end of window according to Pig Genome Annotation 10.2, % of the additive genetic variation explained by the QTL regions associated with the traits, number of SNPs in the QTL regions, regions overlapping between the results of single-(generalized linear model) and multi-(Bayesian approach) marker genome-wide association analyses. (XLSX)