Genome-wide association study using haplotype alleles for the evaluation of reproductive traits in Nelore cattle

Zebu cattle (Bos taurus indicus) are highly adapted to tropical regions. However, females reach puberty after taurine heifers, which affects the economic efficiency of beef cattle breeding in the tropical regions. The aims of this study were to establish associations between the haplotype alleles of the bovine genome and age at first calving (AFC) in the Nelore cattle, and to identify the genes and quantitative trait loci (QTL) related to this phenotype. A total of 2,273 Nelore cattle (995 males and 1,278 females) genotyped using the Illumina BovineHD BeadChip were used in the current study. The association analysis included females with valid first calving records as well as open heifers. Linkage disequilibrium (LD) analysis among the markers was performed using blocks of 5, 10, and 15 markers, which were determined by sliding windows shifting one marker at a time. Then, the haplotype block size to be used in the association study was chosen based on the highest r2 average among the SNPs in the block. The five HapAlleles most strongly associated with the trait (top five) were considered as significant associations. The results of the analysis revealed four genomic regions related to AFC, which overlapped with 20 QTL of the reproductive traits reported previously. Furthermore, there were 19 genes related to reproduction in those regions. In conclusion, the use of haplotypes allowed the detection of chromosomal regions associated with AFC in Nelore cattle, and provided the basis for elucidating the mechanisms underlying this trait.


Introduction
predicted records that were obtained by adding a penalty of two months to the maximum value of AFC in their respective contemporary groups. Consideration of open heifers was aimed to avoid bias in the estimation of genetic parameters and EBVs [19,20]. This procedure is based on the assumption that open heifers could have calved if they were bred for longer breeding seasons [20,21]. The contemporary groups were defined by the concatenation of herd code, year and season of birth, management group identifications (from birth to weaning and from weaning to yearling), and farm (birth, weaning and yearling).
Variance components and EBV were obtained using the DMU software [22]. In the analysis of AFC, a single-trait animal model was fitted, including the fixed effect of contemporary groups and the age of dam as covariates (linear and quadratic effects), as well as a random direct genetic effect and a random residual term. For each animal, the reliabilities of the EBVs were computed based on the corresponding estimates of prediction error variance.
Only genotyped animals that had dEBV with associated reliabilities greater than 25% were kept for further analysis (S1 Dataset), such, that a set of 1,189 animals was considered for association analyses (S2 Dataset). The dEBV average for this samples was equal to -2 (ranging from -46.7 to 60.6) days and reliabilities equal to 0.59 (ranging from 0.25 to 0.94).

Quality control
Genomic data were subjected to quality control (QC) measures, including the maintenance of autosomal markers with minor allele frequencies (MAF) >2%, Hardy-Weinberg equilibrium (HWE) significance at P > 10 −5 based on Fisher's exact test, and an SNP call rate > 95%. The following filters were used for sample exclusion: identity by state (IBS) analysis > 95% and the samples in which less than 90% of the genotypes were determined were discarded. The QC was conducted using the GenABEL package, version 1.8-0 [23] of the R software (version 3.2.1) [24].

Genotype phasing
Haplotype assembly requires the identification of the chromosome (maternal or paternal) in which a certain allele is located. Therefore, the haplotype phase was determined using the SHAPEIT software version 2.r837 [25], considering all genotyped animals (N = 2,273).

Linkage disequilibrium
LD between markers was analyzed to determine the number of SNPs per haplotype block, because high-LD markers minimize the occurrence of recombination within the blocks [26]. Therefore, LD was estimated as the squared correlation of allelic frequencies (r 2 ), following Hill and Robertson [27]: where, freq.A, freq.a, freq.B and freq.b denote the frequencies of A, a, B, and b alleles, respectively; and freq.AB, freq.ab, freq.Ab and freq.aB denote the frequencies of the haplotypes AB, ab, Ab and aB in the population, respectively. LD between markers was performed using blocks of 5, 10, and 15 markers with a sliding window for each SNP. The window size that resulted in stronger LD was chosen to perform the association study. LD was analyzed using the Plink software, version 1.9 [28,29].

Haplotypes
With the phased haplotypes of autosomal chromosomes, the haplotypes were constructed with blocks size chosen based on high r between the SNPs in the LD analysis and sliding window of one marker. The R package GHap v1.2.1 [30] was used to build the haplotype blocks (Hap-Block) and identify alleles (HapAllele). Therefore, genotypes were scored as 2, 1, or 0, corresponding with the presence of two copies, one copy, or the absence of the HapAllele, respectively. The genomic position in the association analysis was based on the average distance between the first and last SNPs in a HapBlock. HapAlleles with allele frequency lower than 3% were excluded.

Association analysis
Since dEBV reliability varies across individuals, weighted analyses are required to account for heterogeneous variances. The weights were obtained as proposed by Garrick et al. [18]: where h 2 is the heritability of the trait (0.092) estimated using the complete database of AFC recording, r 2 is the dEBV reliability, and c is a constant that can assume values from 0 to 1. We assumed that c was equal to 0.5. Statistical analyses were performed using the ghap.lmm and ghap.assoc functions of the GHap package. First, the variance was estimated using the maximum-likelihood method and the following mixed linear model: where y is the vector of dEBV, X is the incidence matrix relating elements in y to fixed effects b (intercept), Z is the incidence matrix for random effects u (animal), and u is a vector of random effects~N (0, Kσ u 2 ), where K is HapAllele relationship matrix, and e is a random residual vector (with variance-covariance Wσ e 2 ), assuming independence of the residuals. The estimated residuals from the model (3) were considered as adjusted observations for covariance and polygenic effects, these values were adjusted via Genomic Control [31] and were used in the least squares regression analysis to test the association between each haplotype allele and the phenotype.

Analysis of genomic regions
A genome-wide threshold level at 5 × 10 −5 was selected for this study, as used by Lander and Kruglyak [32]. For Hapalleles with distances less than 1 Mb or those that overlapped, only the one with the lower P value was chosen. The 1 Mb sequence windows that flanked the significant regions of the top five haplotypes were explored using the bovine genome assembly of UMD version 3.

Results
After filtering through quality control analyses, 511,375 SNPs and 1,189 individuals remained in the analysis. LD analysis indicated that the average of r 2 between adjacent markers for the blocks of 5, 10, and 15 SNPs (S1 Fig) were 0.37, 0.32, and 0.30, respectively. Thus, the block sizes of 5 SNPs were chosen to perform association studies.
The haplotyping procedure generated 507,384 HapBlocks and 2,238,795 HapAlleles with allele frequency higher than 3%. The mean number of alleles per block ranged from 4.37 (BTA 1) to 1.19 (BTA 25). The distribution of the number of haplotype blocks per chromosome is shown in S1 Fig. The results of GWAS between each Hapalleles and AFC in terms of -log10 (P value) are shown in Fig 1, where 68 haplotype alleles with P value < 5 × 10 −5 were obtained (more details included in S1 Table). The five most significant regions selected for genomic exploration were in the detected peaks of BTA3 (BTA for Bos taurus chromosome), BTA5, BTA6, BTA21 and BTA26 (Table 1).
In total, 90 QTL (S2 Table) and 65 candidate genes (S3 Table) were present in these top five regions. Among these, 19 genes were related to previously described reproductive traits ( Table 2). Some of the genes that were identified were associated with embryonic implantation in the uterus, development of reproductive organs in females and regulation of progesterone secretion.
Within the top five regions, we found several QTL that were related to reproductive traits (Table 3), and considering the QTL with AFC linkage, we can highlight the age at puberty, daughter pregnancy rate, and conception rate.

Discussion
Most of the studies involving haplotypes generated based on the sliding windows have been applied to map alleles associated with human diseases [39]. The use of haplotypes instead of single markers gives putative advantages, the effects of individual SNPs being too small to overcome the stringent significance threshold [14].
The size of the haplotypes might interfere with the success of the analysis, because long blocks lead to the inclusion of non-informative SNPs and increase in the number of rare alleles, whereas short blocks might ignore the informative markers and reduce the power of the association analysis [40,41]. The use of blocks with 5 SNPs was shown to be appropriate, with regards to the size and average of LD between the adjacent markers. Moreover, the detection of significant signals in the regions of BTA3, BTA5, BTA6 and BTA26, where known candidate genes and QTL of reproductive traits are located, reinforces the suitability of this block size.
The BAG3, DMBX1, and PRDX3 genes were annotated in the MGI database [35] with biological function related to abnormalities in fat deposition. Knockout of the BAG3 genes (cell death suppressor interaction Bcl-2 (Bis)) and DMBX1 (transcription factor) gave rise to mice with low fat deposition and severe thinness [42,43]. The PRDX3 gene is highly expressed in the adipocytes and the suppression or knockout of this gene resulted in obese phenotype in rats and humans [44]. These genes can be pointed out by GWAS because of the positive or negative genetic correlation between body weight and reproductive traits, such as AFC and heifer pregnancies [45], in addition to the proximity of BAG3 (830 kb) and PRDX3 (383 kb) with the QTL affecting daughter pregnancy rate. Nelore heifers with early fat deposition can have decreased growth rate, with the nutritional resources directed for reproduction, thus presenting better reproductive performance than that of the heifers of greater weight. The eIF gene family acts at several stages of translation during protein synthesis. Although there were no reports of association between eIFs and reproductive traits, functional search in String v10.5 identified the interaction of this family with the genes TIAL1, CACUL1, and NSUN4. The TIAL1 gene (also known as TIAR) encodes protein member of the RNA binding family, acting on post-transcriptional regulation. In rodents, changes in the expression of TIAL1 severely affects the development of primordial germ cells of gametes (cells that differentiate into sperm and oocytes), being orthologous in cattle [46]. CACUL1 (or CAC1) acts on the activation of the CDK2 gene involved in cell proliferation processes [47]. In the experiments using mice, inactivation of CDK2 gave rise to sterile animals, thereby proving to be essential for the development of germ cells in males and females [48]. It is worth mentioning that these two genes are close to the QTL associated with daughter pregnancy rate in BTA26, wherein TIAL1 is located at 753 kb and CACUL1 is located at only 4kb, corroborating the idea that these genes have roles in AFC variation. In this context, the NSUN4 gene has been reported to be differentially expressed in mouse oocytes. NSUN4 is a candidate gene involved in the association with cumulus cells, responsible for oocyte viability and embryonic developmental competence [38], which could possibly explain their association with AFC in the current study.
Further interaction between the LOC513210 and FAAH genes was found. LOC513210 belongs to the olfactory gene family that was associated with precocity in Nelore cattle in a previous study [49]. The physiological explanation is that these genes with olfactory function also act on germ cells [49,50]. The expression of the FAAH gene occurs in the uterine epithelial cells and in the myometrium during the estrous cycle in rats. Adjustments in the expression of the FAAH gene are carried out by sex hormones and determine the success of uterine receptivity for the implantation of the embryo [51]. In heifers, sheep, and women, the decrease in FAAH gene expression has been associated with abortions [52][53][54]. In cows, the inactivation of FAAH can lead to infertility [54]. The CYP4A22 gene was found to be associated with precocity in Brahman heifers in previous studies [55,56]. CYP4A22 belongs to the cytochrome P450 family and is involved in fat metabolism. Nguyen et al. [56] have suggested that this gene acts in response to the progesterone hormone. This hypothesis explains the involvement of CYP4A22 (having function in the pituitary gland) in the mechanisms of ovarian feedback established with during puberty. Using String, it was possible to detect a relationship between CYP4A22 and CYP4A11, which was the last one associated with perinatal modifications in target tissues that is necessary for the progression of pregnancy [57].
In addition to CYP4A22, the PARM1 gene has also been previously reported to be associated with precocity in heifers [58,59]. PARM1 is expressed in the ovaries and encodes proteins involved in cell proliferation the, is regulated by androgens [60]. The PARM1 gene was also associated with daughter pregnancy rate of Holstein sires [61], and is located within two QTL that are essential for AFC, conception rate, and daughter pregnancy rate. This gene also participates in the regulation of luteinizing hormone and controls progesterone levels in rats [62,63].
The genes RGS10, WIF1 and CYP4B1 were reported to be essential for the embryo implantation process in the uterus and for pregnancy viability [64][65][66][67]. RGS10, located 688 kb close to the daughter pregnancy rate QTL, had its expression identified in the pig endometrium. RGS10, which is orthologous in cattle and pigs, can lead to changes in the endometrium during the estrous cycle that can lead to successful embryo implantation [65]. WIF1 is located 476 kb away from the age at puberty QTL. This gene encodes the regulatory protein of canonical WNT molecules, involved in cell proliferation, differentiation, and migration [66,68]. Expression studies in the endometrial tissue of heifers suggested the activity of the WIF1 gene as a molecule moderator, important for the implantation and development of the embryo [66,67].
The BTC gene, localized on the BTA6, 197 kb away from the QTL affecting conception rate and daughter pregnancy rate, is essential for oocyte maturation and development, and fertilization [69]. BTC acts as the mediator of luteinizing hormone effects [70], which regulates functions related to puberty, such as oocyte maturation and ovulation [71].
Gurgan et al. [72], aiming to identify markers capable of predicting the quality of human oocytes detected the presence of the RCHY1 gene as a possible hormone regulator involved in follicular development, further emphasizing that this gene is close (383 kb) to the conception rate and daughter pregnancy rate QTL.
In a transcriptomic study using granulosa cells from the ovarian follicles of heifers, the GRK5 gene was detected in association with the maturation of granulosa cells [73]. These cells are important for the hormonal regulation occurring in the gonads and for viability of the female gamete [73,74]. It is noteworthy that this gene is found 414 kb of daughter pregnancy rate QTL.
The EFCAB14 gene encodes a calcium ion binding protein and belongs to the EF-hand calcium binding gene family. There are still no reports available in the literature on the biological pathways in which genes from this domain (domain 14) are involved. However, this gene family has been associated with reproductive traits in women and cattle [75,76].
Another candidate gene, located in the BTA3 and BTA5, encodes the U6 spliceosomal RNA, and had paralogous genes associated with AFC in a different population of Nellore cattle, as reported in a previous study [77]. In BTA5, this gene is close (498 kb) to an important QTL related to AFC, which is the age at puberty QTL.
There were no known genes or QTL related to the reproductive traits among the eight QTL and four genes annotated in the vicinities of the significant haplotypes of BTA21 (S2 and S3 Tables). This may be due either to the lack of knowledge regarding the functions of these genes or the existence of non-annotated genes in this region. It is noteworthy that the present study detected association in regions different from that reported by previous studies on the Nelore breed [77][78][79]. This can be because of herd particularities, such as the extent of LD, allelic frequencies, sample size, and statistical approaches [79], in addition to the use of haplotypes in the current study.
The number of significant peaks dispersed across the genome (Fig 1) confirms the polygenic nature of the trait AFC. Therefore, the current study was able to reveal genomic regions putatively associated with the reproductive performance of cattle.
The proximity of the QTL with the association regions and the involvement of QTL in reproductive traits (e.g., calving ease, daughter pregnancy rate, calving rate, embryonic survival, stillbirth, and conception rate) support the results obtained for the examined genes. Moreover, our results support the involvement of the identified regions in AFC.

Conclusion
The use of haplotypes allowed the detection of chromosomal regions associated with AFC. In these regions, genes and QTL related to reproduction were found. These results provide the basis for further studies that aim to elucidate the mechanisms underlying the roles of the examined genes during AFC expression.
Thus, a better understanding of this mechanism will allow the use of specific genotypes as a guide in animal genetics improvement programs, and will enable the construction of cheaper, low-density panels for the evaluation of specific genotypes that are advantageous to selection.