A Genome-Wide Association Study for Clinical Mastitis in First Parity US Holstein Cows Using Single-Step Approach and Genomic Matrix Re-Weighting Procedure

Clinical mastitis (CM) is one of the health disorders with large impacts on dairy farming profitability and animal welfare. The objective of this study was to perform a genome-wide association study (GWAS) for CM in first-lactation Holstein. Producer-recorded mastitis event information for 103,585 first-lactation cows were used, together with genotype information on 1,361 bulls from the Illumina BovineSNP50 BeadChip. Single-step genomic-BLUP methodology was used to incorporate genomic data into a threshold-liability model. Association analysis confirmed that CM follows a highly polygenic mode of inheritance. However, 10-adjacent-SNP windows showed that regions on chromosomes 2, 14 and 20 have impacts on genetic variation for CM. Some of the genes located on chromosome 14 (LY6K, LY6D, LYNX1, LYPD2, SLURP1, PSCA) are part of the lymphocyte-antigen-6 complex (LY6) known for its neutrophil regulation function linked to the major histocompatibility complex. Other genes on chromosome 2 were also involved in regulating immune response (IFIH1, LY75, and DPP4), or are themselves regulated in the presence of specific pathogens (ITGB6, NR4A2). Other genes annotated on chromosome 20 are involved in mammary gland metabolism (GHR, OXCT1), antibody production and phagocytosis of bacterial cells (C6, C7, C9, C1QTNF3), tumor suppression (DAB2), involution of mammary epithelium (OSMR) and cytokine regulation (PRLR). DAVID enrichment analysis revealed 5 KEGG pathways. The JAK-STAT signaling pathway (cell proliferation and apoptosis) and the ‘Cytokine-cytokine receptor interaction’ (cytokine and interleukines response to infectious agents) are co-regulated and linked to the ‘ABC transporters’ pathway also found here. Gene network analysis performed using GeneMania revealed a co-expression network where 665 interactions existed among 145 of the genes reported above. Clinical mastitis is a complex trait and the different genes regulating immune response are known to be pathogen-specific. Despite the lack of information in this study, candidate QTL for CM were identified in the US Holstein population.


Introduction
Clinical mastitis (CM) is a common infectious disease in dairy cows. It is usually defined as an inflammation of the mammary gland resulting from the introduction and multiplication of pathogenic microorganisms, and has been shown to affect farm profitability and animal welfare. Furthermore mastitis brings strong concern about the high risk of culling for diseased animals [1] as well as the delivery of antibiotic residues in milk and the environment due to high frequency of veterinary treatments [2]. A variety of factors impact the susceptibility to mastitis in dairy cows, but selection can be used to limit the frequency of this disease [3]. Examples of direct selection against clinical mastitis through selection indeces can be found in Scandinavian countries, where genetic progress over the last 20 years has shown a positive trend [4,5]. Besides, selection for correlated traits, such as somatic cell score, has been proven to be a valid alternative [6].
The discovery of genomic regions with quantitative impacts on a trait is of essential importance to understand its genetic architecture, and can be used in the design of breeding schemes to increase the frequency of favorable alleles in the population. This is of particular importance when traits have low heritability and there are difficulties in the routine recording of phenotypes, such as resistance to diseases.
The genomic dissection of clinical mastitis is not trivial: etiology depends on a variety of microorganisms (bacteria, fungi, yeasts, and algae can induce mastitis), udder infection can follow different patterns across time (mastitis can be subclinical with a duration of weeks or months, but can be also peracute and lead the cow to death in few days), and pathology can show different levels of intensity according to the individual's ability to react to pathogens (see Rinaldi et al. [7] for an extensive review). Moreover, resistance to mastitis can be considered as a complex trait, which is likely controlled by many genes with small effects, rather than by few genes with large effect [8].
Health event data voluntarily collected by farmers provide a wealth of information suitable for large-scale breeding value estimation [9]. Unfortunately, these data do not provide any information about the level and duration of exposure to pathogens for every single cow [10], and subclinical infections are often difficult to detect and commonly go unrecorded [3].
These peculiarities in the phenotypic and genetic dissection of CM did not prevent the discovery of associations of this trait with regions of relevant impact across the genome. Several QTL have been found to affect resistance to CM in dairy cattle populations [11][12][13][14], although to our knowledge no study has been conducted in the US Holstein population. Detilleux [15] reviewed several studies on potential candidate genes for clinical mastitis, and indicated that genes that are part of the major histocompatibility complex and those linked with neutrophil regulation play major roles in susceptibility to the disease.
Information about regions of the genome involved in resistance to CM currently is lacking in US Holstein, one of the largest dairy cattle populations in the world. The objective of this study was to perform a genome-wide association study for clinical mastitis using producer-recorded health data collected on US dairy cows to identify regions of the genome associated with occurrence of clinical mastitis.

Materials and Methods
No animal care approval was required for the present manuscript because all records came from field data. The health events for mastitis on first parity cows were extracted from voluntary producer-recorded health event data from Dairy Records Management Systems (Raleigh, NC) as described by [Parker-Gaddis] et al. [16]. Lactation incidence rate was available for 103,585 first-lactation daughters of 10,934 sires, reared in 752 herds. Genotypes were obtained using the Illumina BovineSNP50 BeadChip (Illumina, Inc., San Diego, CA, USA) and included information for 1,361 Holstein bulls. Only genotypes of bulls with at least 5 daughters with phenotype were included in the analysis. Genotypes were edited to eliminate individual markers and individuals with call rate (CR) below 99% and monomorphic markers or markers with minor allele frequency (MAF) below 0.05. Only autosomal markers were used in the analyses. After editing, 39,004 markers were available for analysis. A summary of the data is shown in Table 1.
The association study was performed using the single-step genomic-BLUP approach (ssGBLUP [17][18][19]). This method was already employed for GWAS by Dikmen et al. [20] and Wang et al. [21]. The model considers additive genetic relationships between the individuals, combining pedigree and genomic information into the H matrix [17,22], the inverse of which is constructed by blending the inverse of the SNP-derived genomic matrix (G) and the pedigree numerator relationship matrix (A) following: where A -1 22 is the inverse of the numerator relationship matrix for the genotyped individuals. In the present study, the G matrix was constructed weighting each marker contribution by its expected variance [23] G ¼ ZDZ 0 where D is a diagonal matrix with elements containing the inverse of the expected marker Z is the marker incidence matrix containing genotypes (-1, 0 or 1) corrected by allele frequency [23].
The H -1 matrix was substituted into the mixed model equation, and a threshold-liability model was used to accommodate the binary nature of the trait. The model used for analyzing CM outcomes (0/1) and obtaining solutions for the sire breeding values was the same as in [Parker-Gaddis] et al. [9] l ¼ Xb þ Z h h þ Z s s þ e where λ represents a vector of unobserved liabilities to clinical mastitis, β is a vector of fixed effects including the overall mean and year-season, X is the corresponding incidence matrix for the fixed effects, h represents the random herd-year effect assuming h~N(0,Iσ 2 h ), s represents the random sire effect where s~N(0,Hσ 2 s ), with H representing the additive relationship matrix that combines pedigree and genomic information, Z h and Z s represent the corresponding incidence matrices for the herd and sire additive genetic random effects, respectively, and e represents the random residual, modeled following N(0,Iσ 2 e ). Variance components and heritability were estimated using the softwares PREGSF90-POSTGSF90-THRGIBBS1F90 version 2.104 [24][25]. A total of 120,000 iterations were run with the first 20,000 discarded as burn-in, thinning every 10 samples. Post-Gibbs analyses were completed using POSTGIBBSF90 version 3.04 [26] and the 'coda' package in R [27]. Trace plots were also inspected visually to ensure convergence had been reached. Posterior standard deviations and 95% highest probability density intervals were calculated for each estimate. re Marker effects (u) were obtained using an iterative process similar to the one described by Wang et al. [28]. Briefly, after solution of the ssGBLUP model genomic breeding values of genotyped individuals (a g ) were back-solved to obtain marker effects accounting for their shared genomic variance, as described in the formula var a g u ¼ Individual marker effects were obtained by solving: In the first round of the iterative process the variance absorbed by each marker was obtained as 2p i ð1 À p i Þu 2 , where p is the frequency of one of the 2 alleles, Z was the marker incidence matrix containing genotypes (-1, 0 or 1) corrected by allele frequency and G was the genomic relationship matrix. In successive iterations, in order to highlight regions of higher impact on the genetic variation of the trait, a weighted G matrix was created, where expected marker contributions were replaced with realized variances, so that elements of D were with u being the marker effect estimated in the previous iteration. New marker effects were obtained considering the weighted G matrix in the formula reported above. For detailed description of the iterative re-weighting procedure please see the 'Scenario 1' procedure in Wang et al. [28]. The process was repeated 4 times to ensure stability of estimates.
The variance absorbed by 10-SNPs moving windows was successively calculated across the whole genome. We selected the 10 windows explaining the largest amount of genomic variance for gene annotation, gene network and pathway analyses. Based on the starting and ending coordinates of the windows, gene annotations were obtained using the Biomart platform on Ensemble [29] through the 'Biomart' R package (http://www.bioconductor.org). A list of genes located in proximity to the windows was used for performing a gene network analysis using the online resource GeneMania [30], and a pathway-enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG [31]) and the Database for Annotation, Visualization and Integrated Discovery (DAVID [32,33]). A Manhattan plot was created using the R package 'ggplot2' [34].

Results and Discussion
Summary statistics, variance components, and heritability estimates are reported in Table 1. Phenotypic data used here were similar to those reported by [Parker-Gaddis] et al. [9] for first lactation cows, while genotyped sires and available markers were different. Here, only markers with known positions on autosomes (chromosomes 1 through 29) were used, and 1,361 sires with genotype were included in the analysis. The model used here differs from the previous study because it is a single-trait rather than multi-trait model. Heritabilities and variance components were in agreement in both studies. While no other studies were available to compare genomic estimates of h 2 , our estimates were also in agreement with traditional pedigree-BLUP estimates as reported by other authors in the Holstein as well as in other breeds [3,4,35]. Heritabilities for CM appear to be essentially low, in most of the studies being below 0.10, although genetic variability for immune system capabilities appears to be appreciable. As reported by Detilleux et al. [36], some immune-system parameters measured in cows near calving showed heritabilities larger than 0.10. The reason for this increased uncertainty when modeling the genetic background of any disease resistance trait, such as CM, was suggested by Bishop and Wooliams [10], who remarked that there is low information content when categorizing diagnoses into 2 possible outcomes (healthy/diseased). The absence of knowledge of the level of exposure of animals to the disease, and the lack of information about exposure to specific pathogens, further increases uncertainty. The etiology of mastitis results in a wide range of possibilities for the intensity of inflammation, as found by Lavon et al. [37]: while Escherichia coli infections lead to acute responses which make the disease easily identifiable, Staphylococcus aureus is more likely to cause subclinical infections, which are probably not reported when phenotyping depends on treatment events and this might have repercussion on the association analyses. In fact, there is a lack of understanding of the genetic variation of the trait when using binary variables for resistance to CM, and some causative mutations may not be identified.
The present study allowed us to associate clinical mastitis susceptibility to SNP polymorphisms across the genome. The Manhattan plot of marker additive genetic variance explained by 10-SNP moving windows is reported in Fig. 1, and a summary of the 10 windows that explained the largest proportion of variance is provided in Table 2. Clinical mastitis appears to be a moderately polygenic trait, with many regions across the genome contributing to genetic variation. However, there were some regions that appeared to contribute significantly to variation. The re-weighting procedure of the genomic matrix used here shrunk several windows to have adsorbed variance value close to 0. The first 10 windows explained 6.4% of total genomic variance.
In the present study, regions of higher impact on the trait were located in chromosomes 14 (from 2,574,909 to 3,137,184 bp), 11 [12] found QTLs on chromosomes 5 (57. The authors found that genes encoding for interleukin-8 and the interleukin-8 receptors located on chromosomes 2. In other studies on Scandinavian cattle from different breeds, Holmberg et al. [41] found QTL affecting clinical mastitis on chromosomes 9 (between 130 and 150 cM) and 11 (between 20 and 30 cM); Schulman et al. [13] found SNPs to selectively affect CM on chromosomes 14 at 25 cM and Klungland et al. [11] also identified QTL affecting clinical mastitis on chromosome 14 around 90 cM. Regions affecting CM on chromosome 14 were found in most of the breeds, which is in agreement with the present study.
The markers contained in the 10 most informative windows are reported in S1 Table, while the genes located in correspondence of the 10 windows with the highest variance are reported in Table 2. In window 2295 (chromosome 14, from 2,754,909 to 3,137,184 bp) there were six genes annotated (LY6K, LY6D, LYNX1, LYPD2, SLURP1, and PSCA) that are part of the lymphocyte-antigen-6 complex (LY6), which is known for neutrophil regulation function in human and mice, as part of the class III region of the major histocompatibility complex [42,43]. Among those genes LY6D is known to be involved in the first stage of B-cell Leukocyte development in mice [44]. Bahremberg et al. [45] suggested that PSCA (prostate stem cell antigen) gene is expressed during hematopoiesis from multipotential stem cells differentiating into leukocyte subpopulations in the peripheral lymphoid tissues, while Adermann et al. [46] reported that SLURP-1 codes for an amino acid sequence that is similar to the cytotoxins. Moreover, Thuong et al. [47] found LY6K to be overexpressed in macrophages extracted from human patients affected by Mycobacterium tubercolosis, which is considered an agent of bovine mastitis. On the other hand, BAI1 seems to be involved in apoptotic cell degradation [48], where it regulates the engulfment of dead cells that are removed prior to infection.
No  [49] found that IFIH1 (interferon induced with helicase C, domain 1) was overexpressed in dermal fibroblasts challenged with lipopolysaccharide treatment, concluding that the gene regulates innate immune response. Pimentel et al. [50] found that a SNP located within IFIH1 increased milk yield and decreased interval from first to last insemination in German Holstein, but reported no association with udder health. DPP4 was not found to be directly associated with udder health, but is known to interact with cytokines and inter-leukines [51], while integrin beta-6 (ITGB6) was found to be differentially expressed in resistant and susceptible lines of sheep challenged with Staphylococcus spp. [52]. Lymphocyte antigen 75 (LY75) is involved in immune response and inflammation processes [53]. NR4A2 is not known to regulate udder health, but Moreilhon et al. [54] found that human airway cells infected with Staphylococcus aureus induced transcriptional responses of this gene suggesting that it may have a generic function in resistance to bacteria.
Window 3109 located on chromosome 20 and spanning from 32,174,117 to 61,609,342 bp harbored ten genes know to be related to CM. Growth hormone receptor (GHR) is known to regulate production traits [55] and other mammary gland phenotypes, as summarized by Ogorevc et al. [56]. Zarrin et al. [57] simulated mastitis in dairy cows with lipopolysaccharide injections and found Succinyl-CoA:3-ketoacid-coenzyme A transferase 1 (OXCT1) overexpressed, showing that this gene might regulate mammary gland metabolism and milk synthesis during mastitis infection. Complement components C6, C7, and C9 are part of the membrane attack complex [58] and play an important role in immune function, antibody production, inflammation and phagocytosis of bacterial cells. These were found to be associated with CM in the periparturient period [14], and C7 was also associated with breeding values for somatic cell score [59]. Another complement, C1q and tumor necrosis factor related protein 3 (C1QTNF3), was found among the genes overlapping this window and associated with SCS [60]. Likewise, caspase recruitment domain family, member 6 (CARD6) is known to be upregulated in cases of Staphylococcus aureus infection [61]. DAB2 is known to be a putative tumor suppressor, and was found to be differentially expressed in mammary glands of cows milked with different patterns [62]. Oncostatin M is a cytokine that is produced post-lactation by the mammary epithelium and is involved in cell death. When its receptor (OSMR) was knocked-out in mice the mammary epithelium presented delayed involution [63]. In the case of CM, variants of the OSMR gene might be involved in mammary epithelium regeneration after infection and cell apoptosis. Prolactin receptor (PRLR) is known to be associated with production traits and somatic cell score [55], and was found to be downregulated in bovine mammary epithelial cells infected with Staphylococcus aureus [64], as well as in bovine hepatic tissue following intramammary injection of Escherichia coli lipopolysaccaride to simulate mastitis infection [65].
Prolactin was found to be significantly increased in udder quarters with high somatic cell count and chronic mastitis [66], and positively correlated with the number of neutrophils in milk, suggesting that prolactin may up-regulate cytokine expression. The opioid-binding protein/cell adhesion molecule (OPCML) found in window 3871 (chromosome 29 from 34,091,321 to 34,723,917 bp). It is a candidate for the suppression of ovarian and broad tumors [67,68].
Gene network analysis performed using GeneMania revealed the dense co-expression network reported in Fig. 2. The network included 145 genes with 665 interactions among them. The number of interactions for each gene in the network is reported in S2 Table. Several genes (OSMR, LY75, OXCT1, C7, GHR, C6, ITGB6, CARD6, DAB2, DPP4, LY6D, and PRLR) presented at least 10 connections, their potential role in determining the resistance to udder infection was discussed above. S1 to S5 Figs. report five functional gene networks highlighted within the general network, respectively 'Regulation of acute inflammatory response', 'Positive regulation of secretion', 'Regulation of protein activation cascade', 'Regulation of protein activation', 'Serine/threonine protein kinase complex'. As expected from a trait with low heritability and highly polygenic architecture, there are a large number of genes involved in the network, with several connections. Table 3 reports the 5 KEGG pathways identified using genes annotated in the 10 windows with the highest explained variance. The pathways were the 'Complement and coagulation cascades', 'Prion diseases', 'ABC transporters', 'JAK-STAT signaling pathway', 'Cytokine-cytokine receptor interaction'.
The JAK-STAT signaling pathway is known for its regulatory role in cell proliferation and apoptosis, responding to the presence of cytokines [69,70]. JAK2 is considered among the highest-ranking genes for a role in resistance to bovine mastitis [71]. The system works such that the signal is transmitted to the intra-cellular environment through the Janus Kinase (JAK) and signal transducer and activator of transcription (STAT) proteins. This pathway is important in regulating udder response to infection because it controls the persistent accumulation of neutrophils in the bovine mammary gland [72], while in turn JAK also works as a signaling element also for hormones and interleukin receptors [73]. The JAK-STAT pathway is activated during cow lactation [74] and when cattle tissues are challenged with a wide range of pathogen agents, from bacteria such as Streptococcus uberis [70], Escherichia coli [75], and Chlamydia trachomatis [76], to tick-borne protozoan parasites like Theileria annulata [77]. The PI3K/Akt pathway can be found within the JAK-STAT signaling pathway: it is overexpressed in lactating cows [78] and in the udder of cows inoculated with Streptococcus uberis [72].
The 'Cytokine-cytokine receptor interaction' pathway contains the prolactine (PRLR), growth hormone (GHR) and Inter-leukine 7 (IL7R) receptors. The role of cytokines and interleukines in immune response to infectious agents is well known [76,79,80], nonetheless it is recognized that expression of neutrophil chemokines or pro-inflammatory cytokines depends on the kind of pathogen [81][82][83]. Since our data did not allow differentiation of infections according to the causative pathogen, it is not possible to infer more about genes regulating cytokine expression. The 2 pathways ('JAK-STAT signaling pathway' and 'Cytokine-cytokine receptor interaction' pathway) are closely linked. Sigl et al. [84] suggested that some JAK-STAT related proteins, as activated by PRLR, can regulate balance between growth hormone and milk protein yield. Buitenhuis et al. [75] found both those pathways as being altered in expression in udder tissue of cows challenged with Escherichia coli.
The 'Complement and coagulation cascades pathway' contained the complement components C6, C7, and C9, as well as the bradikinin receptor (BDKR) and serin peptidase inibitors A (SERPINA1 and SERPINA5). Bradikinin is a vasodilator, and high levels of bradikinin may be associated with mastitis infection caused by Staphylococcus aureus [85]. Beecher et al. [86] found some variants of SERPINA1 to be associated with protein yield and fat percentage in Irish Holstein, while Swanson et al. [87] found SERPINB4 to be slightly upregulated in the mammary gland of cows infected with Streptococcus uberis.
Other pathways involving annotated genes were the 'ABC transporters' and the 'Prion diseases'. The former was already found by Naeem et al. [70] to be activated together with the JAK-STAT pathway, and Bionaz et al. [74] found it to be impacted in lactating cows, probably because of its involvement in steroid synthesis and cholesterol regulation.

Conclusions
The present study demonstrates the complexity of resistance to mastitis pathogens in dairy cattle. Single-step Genomic-BLUP allowed for optimal and simple extraction of genomic information from a population with small fraction of genotyped animals and phenotypes expressed as a categorical variable.
Clinical mastitis was confirmed to be a highly polygenic trait and genetic variance associated with trait was distributed across several regions of the genome. Regions on chromosomes 2, 14, 20, and 29 were found to affect genetic variation for clinical mastitis as well as contain genes with known roles in immune response that are putative QTL. Regions on chromosomes 8,11,16,19, and 24 contained no annotated genes that could be linked to clinical mastitis.
Pathway analysis revealed 5 pathways, 3 of which were know to be involved in immune system regulation, some of them specifically with mastitis resistance in dairy cattle. Pathways and genes found here appear to be differentially expressed in a pathogen-specific manner. Other pathways are known as linked to specific pathogens, but our data did not permit dissection of etiology. The genomic regions identified in this study can be used as predictors of genetic merit for resistance to clinical mastitis in US Holstein dairy cattle. Supporting Information  Table. List of genes involved in the co-expression network created using GeneMANIA with the respective number of connections for each gene. Names in bold were linked to clinical mastitis based on results of a literature search. (TIF)