Genome Wide Analysis of Fertility and Production Traits in Italian Holstein Cattle

A genome wide scan was performed on a total of 2093 Italian Holstein proven bulls genotyped with 50K single nucleotide polymorphisms (SNPs), with the objective of identifying loci associated with fertility related traits and to test their effects on milk production traits. The analysis was carried out using estimated breeding values for the aggregate fertility index and for each trait contributing to the index: angularity, calving interval, non-return rate at 56 days, days to first service, and 305 day first parity lactation. In addition, two production traits not included in the aggregate fertility index were analysed: fat yield and protein yield. Analyses were carried out using all SNPs treated separately, further the most significant marker on BTA14 associated to milk quality located in the DGAT1 region was treated as fixed effect. Genome wide association analysis identified 61 significant SNPs and 75 significant marker-trait associations. Eight additional SNP associations were detected when SNP located near DGAT1 was included as a fixed effect. As there were no obvious common SNPs between the traits analyzed independently in this study, a network analysis was carried out to identify unforeseen relationships that may link production and fertility traits.


Introduction
Herd fertility is important for the economic sustainability of dairy cattle farming. However, the fertility of Holstein cattle is steadily decreasing [1], and is becoming one of the main causes of culling and replacement of cows in dairy herds worldwide. Fertility is affected by several male and female characteristics, many of which are difficult and expensive to measure accurately. Dairy cattle selection programs are therefore often focused on fertility phenotypes that have low heritability, some of which are associated with the high level of milk production. These phenotypes have been used to create fertility selection indexes that may include production traits, typically milk yield, in addition to measures of fertility [2].
Linkage mapping studies have localised genetic loci putatively associated with reproductive and production traits in dairy cattle [3,4] and their relative genetic and phenotypic correlation have been tested [5,6,7]. In addition candidate genes with effects on fertility traits have been described, including Plasma Protein-A2 (APP2) on chromosome 16 [8] and calpastatin (CAST), which are expressed in reproductive organs [9]. Further, studies of Single Nucleotide Polymorphisms (SNPs) within genes showing differential expression in the bovine endometrium during luteal and ovulatory phases have identified loci with genetic effects on fertility and production traits [10].
Genomic estimation of breeding values of bulls using SNPs and phenotypic data collected on their progeny are being used in selection programmes. The data used for the genomic estimates can also be used for genome wide association studies to identify genetic loci with major effects on fertility and other traits. Recently, Genome Wide Association Studies (GWAS) of female fertility traits have been carried out using 50K SNP genotypes in Holstein-Friesian, Finnish Ayrshire, and Jersey breeds [6,11,12,13]. Significant marker-trait associations have been described on several chromosomes, however only a few chromosomal regions (e.g BTA13 and BTA18) were consistently identified in these studies with the same SNPs associated with fertility traits across investigations, and hence the chromosomal regions implicated were not exactly the same. These differences may be due to genetic heterogeneity among the populations studied, that the trait measurements were not the same, differences in sample size, power of the analyses or environmental influences (eg. experimental/commercial herds).
The GWAS reported here used data from Italian Holstein proven bulls genotyped with 50K SNPs, with the objective of identifying loci associated with fertility related traits and to test their effects on milk production traits. The analysis was carried out using EBVs for the aggregate fertility index traditionally used in selection for Italian Holstein cows [2] and for each trait contributing to the index: angularity (ANG), calving interval (CI), non-return rate at 56 days (NR56), days to first service (DFS), and 305 day first parity lactation (MILK). In addition, two production traits not included in the aggregate fertility index were analysed: fat yield (FAT) and protein yield (PROT). Further, the most significant marker on BTA14 associated to milk quality traits located in the DGAT1 region, was treated as fixed effect to discover other significant markers eventually masked by the large effect of the SNP on fat content and correlated traits.

Animals
A total of 2139 Holstein bulls progeny tested in Italy were chosen based on the: i) availability of biological material or DNA samples; ii) reliability of at least 75% for the Production, Functionality, Type (PFT) national selection index. PFT takes into account both production and functional traits in the proportion 49% : 51% ; iii) largest number of families and balanced number of members per family, to maximize the representativeness of the population. In detail the dataset contained 204 son-father pairs. Semen samples (straws) were provided by Italian certified artificial insemination centers (EU Directive 88/407/CEE), thus ethics committee approval for this study is not required.

Genotyping and quality control
Bulls were genotyped with the BovineSNP50 BeadChip (Illumina, San Diego, CA). Genotypes were quality controlled and markers excluded for missing data (≥ 2.5%), minor allele frequency (≤ 5%), mendelian errors identified for father-son pairs (≥ 2.5%), and deviation from Hardy-Weinberg equilibrium (HWE; p ≤ 0.001, Bonferroni corrected). Bulls with more than 5% missing data were also removed. A mendelian transmission check was performed on father-son pairs. Sons of father-son pairs with unexpectedly high (≥100) mendelian errors were excluded.
Classical Multi Dimension Scaling (MDS) was performed within the R statistical environment using the GenABEL package [14] to explore population structure and verify the genetic homogeneity of the dataset prior to carrying out the GWAS.

Phenotypes
Official EBVs for eight fertility and production traits were provided by the Italian Holstein breeder association (ANAFI; table 1). Fertility traits included the aggregate fertility index (FRT) used to select for fertility in Italian Holstein cows and the five traits from which it is compose: angularity (ANG), calving interval (CI), non return rate at 56 days (NR56), days to first service (DFS), and first parity mature equivalent milk yield at 305 days (MILK; [2]). EBVs for the five component traits were obtained using a multivariate animal model with all traits fitted simultaneously. This multi-trait approach allowed (co)variances between traits to be estimated, which were then used to calculate the aggregate index. In addition, two production traits were analysed, fat (FAT) and protein (PROT) yield. EBVs for FAT and PROT were obtained from the official ANAFI evaluation which is based on a multiple trait, multiple lactation Random Regression Test Day Model (RRTDM), as described in Muir et al. (2007) [15]. Heritability for the fertility traits analysed were 0.17 for ANG, 0.07 for CI, 0.03 for NR56, 0.06 for DFS and 0.026 for the FRT Fertility Index [2].
The number of animals included in the GWAS, and the mean and standard deviations of EBVs for all the traits analyzed are shown in Table 1.

Analysis
Genome-wide association analysis was performed with the GenABEL package in R using a three step GRAMMAR-GC (Genome wide Association using Mixed Model and Regression -Genomic Control) approach [16,17], using the genomic relationship matrix estimated from SNP data instead of pedigrees. First an additive polygenic model was used to obtain individual environmental residuals using the polygenic function of the GenABEL library to disentangle the cryptic population structure caused by the presence of closely related animals in the sample set [16]. To account for relatedness, the variance/covariance matrix was estimated from the genomic kinship matrix, as pedigree information was not available. The relationship matrix used in the analysis was estimated using genomic data with the ''ibs'' (option weight= ''freq'') function of GenABEL. Secondly, association was tested using a simple least squares method using an additive model on the residuals, corrected for cryptic relatedness, familiar correlation, and

Network analysis
Genes located within 1 Mbp upstream and downstream of significant SNPs were retrieved from the Ensembl database. Location and gene annotation were based on the UMD 3.1 genome assembly and on Ensembl release 65 (http:// www.ensembl.org). Canonical transcripts were blasted against the Human RefSeq and the Uniprot databases to search annotations for unannotated cow genes by retrieving homologous human transcript IDs. The Bovine Ensemble protein IDs were analyzed using Ingenuity Pathways Analysis software (IPA, Ingenuity® Systems, www.ingenuity.com). Each identifier was then mapped to its corresponding gene object in the Ingenuity Pathways Knowledge Base. Finally, networks of genes were generated on the basis of their connectivity. IPA assignes a score to each network (based on P value) representing the likelihood that the focus genes within the network are found therein by random chance. A high number of focus genes within a dataset leads to a higher network score.

Genotyping and quality control
Following quality control, markers with excessive missing data (2102 SNPs), low minor allele frequency (10643 SNPs), high mendelian error rate (177 SNPs) and because they were out of HWE (79 SNPs) were excluded. Markers without position information and on sex chromosomes were also removed (2,465 SNPs). Thirty-eight samples were removed because of low genotyping rate, and 8 because of excessive mendelian errors. No outlier was identified by Classical Multi Dimension Scaling (MDS, data not shown). The working dataset contained 2,093 samples and 38,535 SNPs. In each GWAS run, samples missing phenotype data for the specific trait were eliminated.

Results of the association analysis
GWAS identified 61 significant SNPs and 75 significant marker-trait associations with the single marker model. The genome-wide Manhattan Plots and details on these markertrait associations are shown in Figure 1 and Table 2. Eight additional SNP associations were detected when SNP ARS-BFGL-NGS-4939, located near DGAT1 was included as a fixed effect. Among the significant 69 SNPs, 58 were associated with a single trait, 8  Four regions contained more than one significant SNP. The proximal region of BTA14 between 1.2 and 2.8Mbp contained 21 SNPs (Tab. 2), associated with either two (MILK and FAT) or to all three of the production traits investigated; a region on BTA2, at 51.2Mbp contained 2 SNPs (BTA-47612-no-rs and BTB-00098202) associated with MILK; a region on BTA 5, at 60.6Mbp had 2 SNPs) (ARS-BFGL-NGS-110018 and Hapmap60306-rs29023088) associated with DFS and a region on BTA19, at 63.4Mbp, where two markers (ARS-BFGL-NGS-39839 and ARS-BFGL-NGS-72483) were respectively associated with NR56 and FRT.
The number of significant associations per trait ranged from a minimum of 2 for ANG to a maximum of 33 for FAT ( Table 2). The number of associated SNPs was higher for production compared with fertility traits, but comparable when closely linked SNPs were considered as marking a single region. In total, 36 SNPs and 26 regions were associated with the 5 traits (ANG, CI, NR56, DFS and MILK) included the aggregated FRT index. Conversely FRT was significantly associated with 5 regions, 2 of which were in common, associated with CI (on BTA2 and BTA28) and which were not 3 detected by the single trait analysis (on BTA17, BTA19 and BTA28; Table 2).
Significant associations were detected on 20 of the 29 bovine autosomes (

Results of the network analysis
In total 118 genes located within 1Mbp of significant SNPs were used in the IPA network analysis. Five main networks were identified. The first had an IPA score of 37 and includes genes involved in Cell Death, Cellular Development and Cell Signaling. The second network was comprised of genes involved in Gene Expression, DNA Replication, Recombination, Genetics of Fertility and Production in Cattle

Discussion
The genome wide association studies described here detected in total 69 significant markers and 83 marker/trait associations with individual fertility traits, the aggregated fertility index and the three production traits (P ≤ 5 x e-05). A larger number of significant associations were identified for production (54) than for fertility traits (29). In addition, the associations with production traits showed higher significance levels and lower false discovery rates (Table 3). This may be expected from their higher heritability and EBV reliability of the production traits compared with single the fertility traits and the composite fertility index (Table 1). Fertility traits are highly affected by environmental conditions, which are difficult to measure. Additionally these traits are not the biological processes that determine fertility, but are correlated measures which are also affected by management decisions. These factors contribute to decrease the power of detecting associations with the markers. Among the significant SNPs, only 21 showed highly significant associations (P ≤ 5 x e-07), and these were with three traits, two production traits that are precise measurements (MILK and FAT) and the third trait which has subjective, but well defined measurements (ANG). These significant SNPs were on two chromosomes, BTA14 and BTA26.
Considering significant markers closer than two Mbp as defining a same QTL region, 22 regions were associated with fertility traits and 14 with production traits. In general, regions associated with the fertility traits were marked by fewer SNPs than the production traits. This is due to the weaker association of SNPs associated to fertility traits, compared to production traits, and hence for fertility traits often only a single marker in a QTL peak exceeded the significance threshold.

The effect of DGAT1
When the SNP close to DGAT1 on BTA14 was included as a fixed effect, only 43 marker-trait associations were significant, which were not completely overlapping with the 69 identified when SNP close to DGAT1 was included in the analysis. These 43 significant SNP were located at 20 independent Table 3. Allele substitution effect (effB) and probability of association between marker BTA-27242-no-rs on BTA5 and traits investigated. The number of observations per trait is also indicated.  [18,19], and has an impact on the MILK trait considered here. MILK is among the five traits used to calculate FRT [2], which was the only trait showing an increased p-value when accounting for the effect of DGAT1. Therefore, it is likely that the apparent effect of the chromosomal locus containing DGAT1 on fertility is through the inclusion of MILK in the FRT. The remaining 35 significant associations were always observed in our results when the DGAT1 associated SNP was included or excluded from the analysis.

GWAS of fertility
A total of 38 SNPs were significantly associated with traits included in FRT index (ANG=2; CI=5; NR56=7; DFS=10; MILK=14). Only 3 of these SNPs were significantly associated with the FRT composite index. The three SNPs in common were on BTA2 and BTA28 for CI and on BTA19 for NR56, which are the two traits with highest relative weight in the index Genes identified under the main peaks of the genome scan are shown in grey, while the symbols without color are the genes added by Ingenuity Pathway Analysis Software to produce the network (connecting genes). Lines in the network indicate direct interaction, while dotted lines indicate indirect interactions. Nodes are displayed using various shapes that represent the functional class of the gene product. The composite score of the network is 34, that represents the negative log of the p-value for the likelihood that the molecules would be found together by chance. A higher score indicates greater probability that the molecules in the network are interconnected. doi: 10.1371/journal.pone.0080219.g002 (CI=51%; NR56=17%). Low SNP effects, epistatic interactions, and/or the relative low weight in FRT of the other individual traits may have reduced the chance of detecting the association of the other 35 markers with the composite index. Interestingly, FRT was associated with two regions not detected for the individual traits, on BTA17 and BTA28. Pleiotropy or epistasis may explain this finding. No QTL was found in common between the individual fertility traits included in the FRT index (Table 2).
A recent study based on the Finnish Ayrshire breed identified SNPs on chromosomes 1, 2, 4, 12, 13, 20, 24 and 27, associated with non-return rate, time from first to last insemination, number of inseminations and time from calving to first insemination [13]. Some of these traits are similar to those analyzed in the present study, however, only one chromosomal region is in common. In the present study a QTL on BTA27 at 10.5Mbp was associated with DFS while Shulman et al. detected a QTL for non return rate for heifers at 9.3Mbp on UMD 3.1 (quoted as 11.2Mbp in the original publication based on the Btau 4.0 assembly). However, the distance between the two QTL and the different nature of the trait definitions gives little confidence that the same locus has been identified.
Other recently published genome wide studies in dairy cattle have reported markers associated with fertility traits [5,12,11]. In all cases regions reported are either distant from those identified in the present study or supported by very large confidence intervals. For example, the significant association with NR56 reported here on BTA12 at position 20.1Mbp lies within the confidence interval of a QTL for the same trait reported by Hoglund et al. (2008) [5]. The Hoglund et al.. QTL confidence interval is so large (11Mbp -47Mbp) that comparison is difficult. In the same region of BTA12, a QTL for first to last insemination in days was found in Finnish Ayrshire at 24.3 Mbp (Schulman et al., 2008). In this case the trait definition is consistent with that of the present study, however, the distance is such that the loci detected are unlikely to be the same.

Functional candidate genes for fertility
A literature review based on recently published association or expression studies (Table S2) identified 559 candidate genes associated with fertility traits or differentially expressed in high and low fertility individuals in different mammalian species (Table S1). Among these candidate genes, 6 mapped within 2Mbp of significant markers identified in the present study.
The carnitine palmitoyl transferase 1B gene (CPT 1B) maps on BTA5 approximately 2Mbp from ARS-BFGL-NGS-7979 which was associated with NR56. CPT 1B is involved in lipid metabolism and transport, and has been shown to be upregulated in dairy cattle endometrium at day 17 of pregnancy [20]. At implantation the uterus is in a highly active metabolic state and variations in this gene may affect early embryonic losses, and hence return rates.
The Insulin-like growth factor 1 gene (IGF1) is also located on BTA5, close to two SNPs associated with DFS (ARS BFGL-NGS-25179 and Hapmap60306-rs29023088, see Table 2). An association was found between the SNP SnaBI in IGF1 and interval from calving to commencement of luteal activity postpartum (CLA) in a study of fertility, milk production and body condition in Holstein-Friesian dairy cows [21].
The anti-müllerian hormone receptor type II gene (AMHR2) is located on BTA5 under the peak centered on BTA-72978-nors, associated with CI (Table 2). Several TGF-β superfamily members and their receptors are involved in folliculogenesis, including AMHR2 [22]. Anti-müllerian hormone is an early follicle growth inhibitor that seems to be the only ligand of AMHR2 [22] . AMHR2 mRNA levels in granulosa cells of small follicles have been found to be elevated in cases of polycystic ovary syndrome in humans [23]. This dys-regulation of growth factors potentially alters the intra-follicular environment impairing the maturation of the oocyte and hence will influence the calving interval [22].
The transcription growth factor b2 (TGFB2) and the apolipoprotein H (APOH) genes are under the peaks identified by BTA-40382-no-rs on BTA 16 and ARS-BFGL-NGS-39839 on BTA 19, respectively associated to DFS and NR56 ( Table  2). Both TGFB2 and APOH are involved in the process of the follicular development as they interact with the reproductive hormones LH and FSH and so could be considered as candidate genes affecting fertility [24].
The immunoglobulin lambda-like polypeptide 1 precursor (IGLL1) is located near to ARS-BFGL-NGS-74168 on BTA17 which was associated with FRT. The IGLL1 gene has been found to be up-regulated during the peripartum period in the endometrium in lactating vs. non-lactating dairy cattle [20]. This suggests that this gene may play a role in energy balance, and influence production and fertility traits.
Only one SNP was associated with more than one trait: ARS-BFGL-NGS-4670 on BTA26 was associated with MILK and ANG. It is not known if this SNP marks the same QTL which affects both of these traits, or if there is a fortuitous colocation of two different QTL. There are two genes located within 1 Mbp of this SNP. Phosphatase and tensin homolog (PTEN), which is involved in regulating the cell cycle and in endometrial cancer in humans [25] and Lipase family protein J (LIPJ), a gene with testis specific expression that belongs to the family of mammalian acid lipases and superfamily of AB hydrolases that catalyse the hydrolysis of triglycerides in the body. Neither of these genes has an obvious role in the quantitative traits associated with this chromosomal region.

Analysis of interaction between fertility and production traits
The relationship between fertility and production was investigated using two strategies: i) SNPs significant for a trait were examined to identify if they may be associated with another trait which fell slightly below the p-value threshold set for significance and ii) a network analysis was performed taking into account all genes identified under the main peaks, to identify candidate genes that may affect both production and fertility traits through intermetabolic network connections The SNP BTA-27242-no-rs on BTA5 was significantly associated with PROT and CI. Using a lower threshold for significance (p-value < 5e-04) this SNP was also associated with FRT, ANG and DFS. The SNP had opposite effects on fertility and production traits, in so-far-as the positive allele for PROT had a negative effect on FRT, CI , DFS and ANG. Five genes are located within 1Mbp of this SNP ( Table 4), none of which has been extensively studied in cattle or other livestock species. Pathway analysis of the human homologues did not identify functions that could obviously be related to fertility or production traits.
An IPA network analysis of 118 genes located within 1Mbp of the 83 significant SNPs (Table 2) identified 11 networks, 5 of which had high scores. The highest scoring network included 37 proteins, 19 of which were produced by candidate genes near GWAS peaks, 9 for MILK and FAT and 10 for fertility related traits. In this network the interconnection between the 19 genes is mainly mediated by molecules not detected by GWAS. The main functions of the network are cell death, cell signaling and cell development. The same functions involved in cell cycle and in cell death were found also in the 5 other networks identified (4, 5, 9, 10 and 11), however no genes were in common among the networks.
The second network had a score of 34 and contains 18 genes among the 118 close to the significant SNPs ( Figure 2). This network includes genes involved in the control of production traits, including DGAT1 and APOH described above, and ATPase, Ca++ transporting plasma membrane 1 (ATP2B1) which has been shown to be involved in milk production traits [18,19] and in addition genes having a role in reproduction, including luteinizing hormone (LH) and the follicle-stimulating hormone (FSH) genes. Luteinizing hormone is produced by the anterior pituitary gland, and variations in the concentration of the LH trigger ovulation and development of the corpus luteum in females, while in males LH is involved in the production of testosterone (Knight PG et al. 2006). LH acts synergistically with FSH to regulate follicle development, embryonic growth, pubertal maturation, and general reproductive processes [24] .
The second network also includes genes connecting LH and FSH with DGAT1. These are the platelet-derived growth factor beta polypeptide (PDGFBB), transforming growth factor beta 1 (TGFB1), and transforming growth factor beta 2 (TGFB2) genes. Three genes (LH, FSH and PDGFBB) are known to be involved in ovarian follicle development [24]. In addition the TGFB2 gene is located under the SNP rs41635426 in position 22101902 on BTA16, which was significantly associated with the number of days open (p-value 1.18 e-05), indicating a possible link between fertility and protein content of milk.
The third network had a score of 30 and includes 16 of the genes close to significant SNP. This network, which includes amino acid metabolism, small molecule signalling and molecular transport, is more closely associated with production traits than with fertility.
Although the details of the mechanism controlling variation in fertility traits and production traits in cattle still needed to be untangled. The chromosomal regions with significant loci contain genes within pathways described here identify unforeseen relationships that would not be obvious from classical GWA studies and merit further investigation.

Conclusions
The results presented here support the polygenic nature of the traits analysed. All these traits have low heritability, particularly the fertility traits, which are correlated predictors rather than direct biological measurements. These factors reduce the probability of identifying associations in a genome wide associations in study of this size. Few markers were found in common among the traits: one marker was associated with production (MILK) and ANG on BTA26, and 2 markers on BTA5 were associated with FRT and CI. Only one association, on BTA5, was identified as having effects on both production and fertility traits when a slightly lower significance threshold was applied.
Marker ARS-BFGL-NGS-4939 on BTA14, which is close to the DGAT1 gene, was found to have an effect on all the production traits analysed. This genetic locus had little or no direct effect on fertility traits, however, network analysis indicated an interconnection between DGAT1 and genes influencing fertility. The complex balance between production and fertility deserves the combination of GWAS and network analysis to provide novel insights into the reproductive biology of dairy cows and the possible connections between fertility and production, justifying the further development of novel and interdisciplinary approaches, as used here.

Supporting Information
Table S1. List of Candidate genes liked with fertility obtained from literature review and their relative genomic position.
(DOCX) Table 4. Name, Ensembl ID and position of the five genes located within 1Mbp upstream and 1Mbp downstream marker BTA-27242-no-rs. Gene names and positions in human are also indicated.  Table S2.
Reference list used to identify functional candidate genes for fertility. (DOCX)