Genome-wide association study for mammary structure in Canadian Angus cows

Functional and enduring mammary structure is pivotal for producer profitability, and animal health and welfare in beef production. Genetic evaluations for teat and udder score in Canadian Angus cattle have previously been developed. The aim of this study was to identify genomic regions associated with teat and udder structure in Canadian Angus cows thereby enhancing knowledge of the biological architecture of these traits. Thus, we performed a weighted single-step genome wide association study (WssGWAS) to identify candidate genes for teat and udder score in 1,582 Canadian Angus cows typed with the GeneSeek® Genomic Profiler Bovine 130K SNP array. Genomically enhanced estimated breeding values (GEBVs) were converted to SNP marker effects using unequal variances for markers to calculate weights for each SNP over three iterations. At the genome wide level, we detected windows of 20 consecutive SNPs that explained more than 0.5% of the variance observed in these traits. A total of 35 and 28 windows were identified for teat and udder score, respectively, with two SNP windows in common for both traits. Using Ensembl, the SNP windows were used to search for candidate genes and quantitative trait loci (QTL). A total of 94 and 71 characterized genes were identified in the regions for teat and udder score, respectively. Of these, 7 genes were common for both traits. Gene network and enrichment analysis, using Ingenuity Pathway Analysis (IPA), signified key pathways unique to each trait. Genes of interest were associated with immune response and wound healing, adipose tissue development and morphology, and epithelial and vascular development and morphology. Genetic architecture from this GWAS confirms that teat and udder score are distinct, polygenic traits involving varying and complex biological pathways, and that genetic selection for improved teat and udder score is possible.


Introduction
Mammary structure is a functional trait that has been associated with cow longevity [1,2], cow health and welfare [3], as well as with maternal care and performance of calves [4][5][6]. Therefore, maintenance of functional mammary structure (teat size and shape, and udder suspension, size, and shape), plays an important economic role for cow-calf producers.
Previous studies have demonstrated that individual differences in bovine mammary structure is genetically influenced with heritability value estimates ranging from low to moderate [7,8]. In addition, there is significant potential for identifying the genetic contributions and biological mechanisms that lead to individual variation observed in this complex system. To date, genome-wide association studies (GWAS) for teat and udder structure have focused primarily on dairy cattle and traits of economic importance to the dairy industry including attachment and size for both teat and udder, mastitis resistance, milking speed, and milk production [9][10][11]. Few GWAS have included Bos taurus beef breed populations. Vallee et al. [12] have performed GWAS for udder volume, teat size, and leg structure in Charolais cattle. In Fleckvieh cattle, Pausch et al. [13] identified QTL significantly associated with multiple mammary phenotypes such as udder depth, central ligament score, and teat length and thickness identified. And, in Angus-Nellore cross cows, Tolleson et al. [14] have performed GWAS for udder support score and teat size.
Since genetic evaluations and selection in beef cattle are evolving to include traits that impact efficiencies as well as animal health and welfare [15], the Canadian Angus Association (CAA) has recently developed a genetic evaluation for teat and udder score for Canadian Angus cattle. Covariance components and genetic parameters were estimated for teat size and udder suspension in Canadian Angus cows [7], however, the genes and molecular pathways associated with these traits remain unidentified. Therefore, the objective of this study was to perform a genome-wide association study aiming to identify genomic regions, potential candidate genes, and the biological mechanisms underlying teat and udder structure in Canadian Angus cattle.

Materials and methods
All procedures involving cattle were reviewed and approved by the University of Calgary Animal Care Committee (Protocol AC16-0218). The collection of DNA samples for animal genotyping was done in accordance with the Canadian Code of Practice for the care and handling of farm animal [16]. Detailed methods and materials are described by Devani et al. [7].

Phenotypes
Detailed descriptions of teat and udder scoring in Canadian Angus cows, population structure, and animal management were provided previously by Devani et al [7]. In short, teat and udder scores were recorded by 3 trained individuals for 1,735 Canadian Angus cows from 10 herds across Alberta. The Beef Improvement Federation (BIF) recommended scoring guidelines [17], that ranges from 1 (large bottle shaped teats and pendulous udders with compromised suspension) to 9 (the smallest teats and tightest udders), were used to assess cow teat and udder structure. In accordance with the guidelines all bred females within each herd were assessed, and the poorest mammary quarter was scored.
Cow pedigree information (4-generation, including 8,199 dams and 3,227 sires), cow parity (ranging from 1 to 16), date of birth and breed (Black or Red Angus), calf date of birth and sex were provided by the Canadian Angus Association (CAA). Contemporary groups (CG) were defined by herd, year, and season of calving. The CG with no variation in the observed traits or with less than 3 cows were excluded. Cows in later parities were grouped to manage the distribution bias within parities. Parity groups are defined as per Table 1.

Genotypes
Of the phenotyped cows, 1,582 were genotyped using the GeneSeek 1 Genomic Profiler Bovine 130k BeadChip. The BeadChip contains 139,456 highly polymorphic SNPs that were selected for high minor allele frequency values and uniform genome coverage for Bos taurus cattle, with a mean distance of 19 kb between markers. Quality control was applied to exclude SNPs that had a call rate lower than 0.95, a minimum allele frequency lower than 0.01, monomorphic SNPs, or were deviant from Hardy-Weinberg equilibrium (χ 2 >0.05). Animals were excluded for greater than 0.10 frequency missing genotypes or parent-progeny Mendelian conflicts [18]. After quality control, a total of 1,459 animals with genotypes comprised of 103,980 SNPs remained and were used for further GWAS analysis.

Statistical analyses
Weighted single-step GBLUP (WssGBLUP) was used to estimate SNP effects as illustrated by Wang et al. [19]. The WssGBLUP was conducted using BLUPF90 family of software [18], wherein all phenotype, pedigree, and genomic information was combined simultaneously using Bayesian inference via Gibbs sampling. For both traits, the animal model included additive genetic effect and residual effect as random effects, contemporary group, breed, and parity group as fixed effects, and the number of days-between-calving-and-measure (linear effect) as covariate. Only for udder score, the number of days-between-calving-and-measure was included as covariate with quadratic effect. The general model can be represented as follow: where y is the vector of phenotypic observations of teat or udder score, β is the vector of fixed effects, a is the vector of direct additive genetic effects, e is the vector of the random residuals effects, X and W are the incidence matrices of β and a, respectively. Assumptions included a~N (0, Hs 2 a ), where H is the relationship coefficient matrix amongst animals and the genetic additive variance (s 2 a ). Also, e~N (0, Is 2 a ) where I is identity matrix and s 2 e is the residual variance. The prior distribution for genetic and residual variance components was an inverted Wishart and the posterior estimates were obtained using the POSTGIBBSF90 program.
In accordance with Aguilar et al. [20] the inverse of H matrix, combining both pedigree and genomic information was obtained as follow: where A À 1 22 is the inverse of numerator relationship matrix for all phenotyped and genotyped animals and G −1 is the inverse of the genomic relationship matrix which was constructed in where Z is the SNP incidence matrix containing genotypes (0, 1 or 2) adjusted for allele frequency, D is a diagonal matrix with the inverse of expected SNP variance (initially D = I), and q is a weighting factor for SNP variances. The weighting factor used was as in Vitezica et al.
(2011), ensuring that the average diagonal in G is close to that of A 22 . For each analysis, 700,000 iterations were generated, retaining every 50 th sample. The first 200,000 iterations were discarded as fixed burn-in. Data convergence was checked through graphical analysis of sampled values. Estimates of SNP effects and weights for WssGWAS were obtained in accordance with the iterative process proposed by Wang et al. [19], as follows: In the first iteration (t = 0): GEBV were calculated for the entire dataset using ssGBLUP, GEBV were converted to estimates of SNP effect, whereâ g is the GEBV of animals that were genotyped, The weight for each SNP to be used in the next iteration was calculated as: where i is the i-th SNP, The SNP weights were normalized to keep the total genetic variance constant: G (t+1) = ZD (t+1) Z´λ was calculated, t = t + 1 and loop to step 2. Three iterations of the above process were run, from step 2 to 7, in accordance with Wang et al [22], wherein both animal and SNP effects were updated and used to construct the weighted G matrix, update the GEBV, and to estimate the new SNP effects during the next iteration. The percentage of genetic variance explained by each SNP window (comprised of 20 consecutive SNPs) was also calculated in accordance with Wang et al [22] as: where a i is the genetic value of the i-th SNP window that consists of a region of 20 consecutive SNPs, s 2 a is the additive genetic variance, Z j is the vector of gene content of the j-th SNP for all individuals, andû j is the marker effect of the i-th SNP within the i-th SNP window.

Gene network analyses
The results from the WssGBLUP were used to identify genomic windows associated with teat and udder scores. Thus, the SNP windows that explained more than 0.5% of the total genetic variance for teat and udder score were selected for further exploration. For visual reference, a Manhattan plot was created, for each trait, using the R 3.6.2 package "ggplot2" [23]. In addition, chromosome number, start and end coordinates of each SNP window were used to identify candidate genes from the Bos taurus genome UMD3.1 assembly as the reference map through Ensembl Biomart Martview application (http://www.ensembl.org).
The identification of gene networks, pathways, and enrichment analysis (Gene Ontology and KEGG pathways) were performed using Ingenuity Pathway Analysis (IPA, Ingenuity Systems, Redwood City, CA. (http://www.ingenuity.com), which uses human gene ontology terms and KEGG pathways to develop networks. In addition, manual searches were performed to ascertain whether any of the genes had been associated with traits in bovine within previous studies.
SNP windows that explained more than 0.5% of total additive genetic variance were also used to identify QTLs already described. Using SNP window start and end positions the animal QTLdb database (release 40, access date: January 3, 2020) [24] was interrogated for previously associated QTL within these regions. Documented QTL were catalogued according to associations and causation.

Phenotypes
Detailed description for phenotype collection are described previously [7]. Observed teat and udder scores, on 1,735 Canadian Angus cows that ranged in parity from 1 to 13, varied from score 1 (large bottle shaped teats and pendulous udders) to score 9 (small symmetrical teats and well suspended udders). To correct for distribution bias observed in higher parities (due to producer culling) parity groups were constructed (Table 1). In total, 1,582 Canadian Angus cows were genotyped to incorporate genomic information into genetic evaluations and to better understand the genetic architecture of teat and udder structure through genome-wide association studies (GWAS). The descriptive statistics for the data set is presented in Table 2.

Estimation of variance components
The variance components and heritability estimates are presented in Table 3 in which the heritability (SE) for teat and udder score were 0.35 (0.05) and 0.16 (0.04), respectively which is in line with previous estimates reported based on the phenotypes and pedigree, but not including genotypes for this population [7]. These values are also in accordance with previously reported estimations for beef [8,12,25], and dairy [9,26] cattle. Moderate heritability estimates are indicative for potential genetic selection and to improve teat and udder structure in Canadian Angus cows.

Genome wide association study
The GWAS analyses for teat and udder score resulted in 35 and 28 associated windows of 20 consecutive SNPs, respectively, that explained more than 0.5% of the total additive genetic variance for these traits (Table 3). Of these windows, none explained more than 2.5% of the variation observed. In total, these SNP windows explained 26.23% and 23.45% of the total additive genetic variance for teat and udder structure in Canadian Angus cows, respectively. The SNP windows were distributed on Bos taurus autosomes and Manhattan plots are shown in Figs 1 and 2.

Gene annotation
Within these SNP windows, using the Biomart tool embedded in the Ensembl Bos taurus genes database version 94 (UMD3.1.), there were 94 and 71 annotated genes (including 11 open reading frames) identified (Tables 4 and 5). Only two SNP windows were in common for both traits in which 7 overlapping genes, 1 common microRNA, and 1 common open reading frame (ANKRD60, APCDD1L, bta-mir-4449, C13H20orf85, RAB22A, RASL11B, RF00568, USP46, VAPB) were identified. For teat score, most of the genes identified (18.09% and 13.83%) were on BTA2 and BTA6 respectively. For udder score, most of the genes identified (11.27% total) were on BTA2 and BTA22. Using IPA to extend this Ensemble list of genes identified, 138 and 83 characterized genes were identified for teat and udder score, respectively. From the two common SNP windows, 14 annotated genes were identified for both traits (Fig 3). These findings suggest that these are polygenic traits mainly influenced by distinct genes since there are many genomic regions with small additive effects on teat and udder structure, and that most of these are not commonly shared between the two traits. This is in line with previously reported genetic correlations between the traits ranging from 0.46 to 0.81 [7,8,27], as well as with other GWAS studies that found distinct genes associated with teat and udder score in beef [14] and dairy cattle [9].

Gene network analyses
Gene enrichment analyses were performed by using the Ingenuity Pathway Analysis (IPA) software. The list of genes associated with each trait that was located within all SNP windows that explained more than 0.5% of the total additive genetic variance are presented in Tables 4  and 5 and were used as inputs for IPA. The most enriched gene networks identified associated (p < 0.05) with teat score are involved in i = Cell Cycle, Cellular Assembly and Organization, and Cellular Function and Maintenance (Fig 4); ii = Cardiovascular System Development and  In addition, in order to enhance understanding of the architecture of both traits, the input genes were grouped by associations with Diseases and Functions highlighted by IPA software ( Table 6). The associations for both traits ranged from immune response and wound healing, lactation failure, fertility, udder health and morphology, and temperature sensitivity. Associations with cancer, specifically breast cancer, for a multitude of genes identified through this GWAS can be attributed to impaired gene function towards immune response, morphological integrity, and cell-cell binding. For teat score, genes were also grouped under functions pertaining to epithelial and vascular development, and thymus development, function, and associated diseases. On the other hand, for udder score, gene functions also included adipose tissue (development, quantity, function, and related diseases) and organ development. Of the genes identified within the 20 SNP windows that explained more than 0.5% of the total additive genetic variance for teat and udder score in Canadian Angus cows several are of particular interest in terms of knowledge and understanding of the trait architecture, such as ADAM23 (on BTA2), ANKRD60 (on BTA13), CNTN1 (on BTA5), PTPRR (on BTA5), USP46 (on BTA6), and VEGFA (on BTA23). A window of 20 consecutive SNPs on BTA2 explained 1.15% of the variance observed in teat score, this region contains the gene ADAM23 (ADAM Metallopeptidase Domain 23) which encodes a disintegrin family of membrane anchored protein that has been associated with biological processes involving cell-cell interactions required for fertilization, muscle development and neurogenesis. Genes that play a role in cell-cell and cell-extracellular matrix are pivotal to tissue remodeling processes involved in structural development, wound healing, inflammation, and tumour cell invasion [28]. Aberrant expression and function of regulatory genes such as ADAM23 can lead to loss of tissue architecture allowing higher proliferation of cancer cells. Verbisck et al. [29] demonstrated that cancer cells not expressing ADAM23 had higher migration capacities. Thus, the gene has been associated with multiple types of cancer, including breast cancers [30,31]. In addition, Elizondo et al. [32] have demonstrated

PLOS ONE
ADAM23 involvement in immune function has the expression of ADAM23 in dendritic cells governs T cell proliferation and cytokine production. Integrity of mammary gland tissue architecture is essential to maintain both structure and function of the udder. In Holstein cattle, Fang et al. [33] found ADAM23 to be significantly associated with milk protein production.
In addition, the Vascular endothelial growth factor A gene (VEGFA) which has been associated with milk protein and fat percentage in Holstein cows [34], was of particular interest for teat score. The VEGFA gene, which encodes a vascular endothelial growth factor is upregulated in cancer tumors and its expression is correlated with tumor stage and progression [35]. In   Table 6. Summary of diseases and functions associated with teat and udder score by using Ingenuity Pathway Analysis (IPA) software.

Function
Genes associated with teat score bovine, the VEGFA gene has also been associated with angiogenesis and increased vascularity in ovarian follicles [36,37], being considered an important regulator of placental development and function [38]. This gene plays an important function in the better supply of oxygen and nutrients to tissues, as well as in the tissue repair process after damage [39]. Healthy mammary glands need to be well supported with blood vessels, arteries and veins, these provide a continuous supply of nutrients to milk synthesising cells as well as regenerating cells. Thus, VEGFA is of particular interest towards understanding the biological architecture of mammary morphology. Conversely, in myocardial tissue VEGFA has been associated with vascular permeability and edema [40]. Udder edema has been linked increased incidence of mastitis and, in some instances, lower milk production [41]. It is also possible that udder edema may result in delayed calf suckling [42]. Teats that maintain size and structure under the duress of calf suckling are significant to the beef industry. Of interest, previous studies have alluded to eye-udder genetic correlations and consistent with this, VEGFA gene has been described to play a significant role in retinal epithelial cell development and proliferation [43]. Furthering the eye-udder link, a missense mutation in the Receptor-type tyrosine-protein phosphatase R (PTPRR) gene has been associated with high grade myopia [44,45]. The PTPRR gene encodes a tyrosine phosphatase receptor type R protein known to be a signaling molecule that regulates cell growth, differentiation, and mitotic cycle. Altered protein tyrosine phosphatase signalling results in deregulated kinase activity oncogenic transformation [46]. This gene is implicated in numerous cancers [47], and specifically breast cancer due to its role in mammary epithelial cell biology [48]. Massive tissue remodeling occurs between parturitions in bovine mammary glands. Post lactational involution involves massive mammary epithelial cell apoptosis. In preparation for the next lactation, rapid proliferation of epithelial cells and cell differentiation to develop alveoli and ductal branches become predominant [49,50]. Aoki et al. [51] suggest that (in mice, at least) protein tyrosine phosphatases are upregulated both during this proliferation phase of mammary gland and the involution phase, and down regulated during lactation. Regarding udder structure in bovine, PTPRR has been identified as upregulated or significantly associated with udder health and structure [52]. Herring et al. [14] found the PTPRR gene in close physical proximity with significant SNPs identified by GWAS for udder support and structure in Angus-Nellore crossbred cows. Engle et al. [53] found associations between PTPRR and stayability in the same cow population. These findings are in line with the observation that teat and udder structure contribute towards early culling in cows [1,2,54].
To this end, the CNTN1 gene codes for Contactin 1, a glycosylphosphatidylinositolanchored neuronal membrane protein and member of the immunoglobulin superfamily, that functions as a cell adhesion molecule. Contactin 1 is thought to play a role in the proliferation and differentiation on neurons [55]. It is noteworthy, that CNTN1 has been associated with myopathy and muscular weakness in both humans and mice [56]. Compton et al [56] suggest nerve-muscle communication may be interrupted by the loss of glycosylate membrane binding in skeletal muscles within humans with CNTN1 mutations. Maintenance of udder structure over repeated parities is dependent on muscular vigor. In addition, Shin et al. [57] used copy number variations to infer an association of CNTN1 with milk production in Holstein cattle.
As mentioned previously, two SNP windows, and 7 annotated genes within them, overlapped between teat and udder score. One of the genes here was the Ubiquitin Specific Peptidase 46 (USP46) gene. This is a protein coding gene which belongs to a large gene family of cysteine proteases that function as deubiquitinating enzymes (DUBs), and has been associated with maternal behavior in mice [58], mastitis resistance in sows [59], and breast cancer in humans [60]. DUBs may regulate gamma-Aminobutyric acid (GABA) action which is an inhibitory neurotransmitter but can also suppress inflammatory immune response while promoting regulatory immune response. GABA has been associated with human peripartum behaviour [61]. DUBs are also critical regulators of numerous ubiquitin dependent processes including synapse development and function. Ubiquitination of pre and postsynaptic proteins can regulate their stability, function and subcellular attachment [62]. Ubiquitination effects cellular processes by regulating the degradation or activation of proteins, this process is crucial for cell cycle progression, proliferation, and development. Previously discussed, profound proliferation and involution of both the extracellular matrix and mammary epithelial cells occur cyclically within the mammary gland. Thus, USP46 may play a significant role in the maintenance of mammary health and structure. Relatedly, and also in common to both traits, was Ankyrin Repeat Domain 60 (ANKRD60) which moderates protein-protein interactions between diverse families of proteins. Ankyrin repeats are tandemly repeated modules of approximately 33 amino acids that occur in a large number of functionally diverse proteins. Ankyrin repeat domain-containing protein 60 also consists of a ubiquitin-like-domain. Both the repeat motif and ubiquitin-like-domain are associated with cell process regulation and proteasomal degradation. Although the protein is not well characterized, in humans this gene is highly expressed in reproductive glands, and in bovine has also been associated with fertility [63]. From the gene enrichment activities within this study, IPA identified several genes that are associated with fertility and reproduction. Fertility is the most impactful variable on producer profitability, therefore, any associations between teat and udder score and fertility are important to explore further.
Previous GWAS for mammary structure traits in beef breeds identified that the myostatin gene (GDF8) was significantly associated with udder volume and teat size [12] in Charolais cattle. In addition, 8 genes (SP5, GC, NPFFr2, CRIMI1, RXFP2, TBX5, RBM19 and ADAM12) that are in close proximity to 7 QTLs significantly associated with udder depth, central ligament score, and teat length and thickness were identified in Fleckvieh cattle [13]; and in Angus-Nellore cross cows, Tolleson et al. [14] found 3 SNPs associated with udder support score and teat size that are in close proximity to the vitamin D receptor gene (VDR), and interleukin 22 gene (IL22). It was within this study that Tolleson et al. [14] also reported the association of the protein tyrosine phosphate receptor type R gene (PTPRR) with teat and udder structure. Enrichment analysis of previously identified associations with teat and udder structure includes genes related to immune response, cell signaling, tissue remodeling and organ development, supporting GWAS findings from this study.

QTL database
To increase understanding about the molecular architecture of teat and udder structure in Canadian Angus cows, SNP windows from this GWAS were aligned to the publicly available Cattle Quantitative Trait Locus (QTL) Database (Cattle QTLdb). Documented QTLs were catalogued according to associations and causation, summarized in Table 7. The highest proportion of QTL identified were associated in previous studies with milk production (31%), reproductive production (16%) and body conformation (15%). Possibly a reflection of the amount of studies on teat and udder structure, particularly in beef cattle, only 2% and 7% of QTL previously associated within these windows were associated to teat and udder structure, respectively. The presence of QTLs within the windows identified from this study with those previously associated with mastitis resistance and immune function supports their important role in these traits.

Conclusion
The objective of this study was to incorporate genomic information into the genetic evaluation for teat and udder score in Canadian Angus cattle in order to enhance knowledge and understanding of the genetic architecture of the traits. Using WssGWAS, multiple SNP windows were identified to explain 0.5% or more of the variance observed in teat and udder score for Canadian Angus cows, none surpassing 2.33%, implying that the traits are polygenic. Using IPA software for gene enrichment analysis we found genes associated with cancer, immune response and temperature sensitivity. Only 2 such SNP windows were common to both traits implying that the genetic architecture and biological pathways involved are distinct for the traits, and associated with epithelial cell, vascular, and tubular development, and thymus gland functions for teat score, and adipose tissue development, organ development and vigour, lactation failure and fertility for udder score. Previously described associations with QTLs within SNP windows of significance for teat and udder score in Canadian Angus cows also provided information on the biology behind teat and udder score within this population. Persisting good teat and udder structure supports producer profitability and animal health and welfare. This study therefore fosters an appreciation of the complexity of the traits and the need for accurate and appropriate genetic selection tools for teat and udder in beef cattle.

Acknowledgments
This study would not have been possible without cooperation of Canadian Angus producers.