Pleiotropic Genes Affecting Carcass Traits in Bos indicus (Nellore) Cattle Are Modulators of Growth

Two complementary methods, namely Multi-Trait Meta-Analysis and Versatile Gene-Based Test for Genome-wide Association Studies (VEGAS), were used to identify putative pleiotropic genes affecting carcass traits in Bos indicus (Nellore) cattle. The genotypic data comprised over 777,000 single-nucleotide polymorphism markers scored in 995 bulls, and the phenotypic data included deregressed breeding values (dEBV) for weight measurements at birth, weaning and yearling, as well visual scores taken at weaning and yearling for carcass finishing precocity, conformation and muscling. Both analyses pointed to the pleomorphic adenoma gene 1 (PLAG1) as a major pleiotropic gene. VEGAS analysis revealed 224 additional candidates. From these, 57 participated, together with PLAG1, in a network involved in the modulation of the function and expression of IGF1 (insulin like growth factor 1), IGF2 (insulin like growth factor 2), GH1 (growth hormone 1), IGF1R (insulin like growth factor 1 receptor) and GHR (growth hormone receptor), suggesting that those pleiotropic genes operate as satellite regulators of the growth pathway.


Introduction
Carcass yield plays a major economic role in beef cattle, as the carcass retail price is essentially determined by its weight. As differences in carcass yield between steers are partially heritable, selection and breeding are determinant operations in the beef cattle sector [1]. However, direct carcass measurements are challenging as phenotype collection depends on animal slaughter. Therefore, the use of surrogate phenotypes such as body weight measurements and visual carcass evaluation in live animals is imperative for improving carcass yield [2].
Weight measurements and visual scores of conformation, carcass finishing precocity and muscling (CPM) have been routinely employed in selection to improve carcass yield in Brazilian Nellore (Bos indicus) cattle. These traits are inexpensive and effortless to measure, and present moderate heritability in the breed [3]. However, these different traits supposedly have distinct genetic architecture, and determining the extent of their genetic correlations, as well as identifying genes affecting multiple traits simultaneously (i.e. pleiotropic genes) would be beneficial to improve strategies for genetic selection.
Bolormaa et al. [4] have recently described a method for mapping pleiotropic variants affecting traits of interest in beef cattle. The procedure consists in performing genome-wide association (GWA) scans for each trait separately, and then summarizing the effects of each genetic marker across traits with a meta-analytical approach. Additionally, other recently developed methods have aimed at increasing power and interpretability of association studies by combining single-marker results within functional elements (e.g. genes) or user-specified chromosomal windows [5,6]. Combining these two approaches may be useful in the search for putative pleiotropic genes affecting traits of interest in animal breeding. Here, we attempted to apply these methods to a sample of 995 Nellore bulls genotyped for over 777,000 single-nucleotide polymorphism (SNP) markers, for which deregressed estimated breeding values (dEBV) were available for nine weight and CPM traits. More specifically, we aimed at identifying major pleiotropic genes underlying variation in traits that are predictive of carcass yield in B. indicus cattle.

Ethical statement
This study was exempt from the local ethical committee evaluation as DNA samples used for genotyping were obtained from industrialized semen straws.

Genotypes
A total of 995 Nellore bulls were genotyped with the Illumina1 BovineHD Genotyping Bead-Chip assay, according to the manufacturer's protocol. The panel included 777,962 SNPs annotated in the UMD v3.1 bovine genome assembly. These bulls are part of the genomic selection reference population from a commercial breeding program, namely DeltaGen (http://www. deltagen.com.br/nelore.php). Data filtering was performed with PLINK v1.9 [7,8]. All genotyped samples had call rate greater than 90%. Only autosomal markers presenting a minimum call rate of 95% and a minor allele frequency of at least 2% were analyzed.

Estimated breeding values and variance components
Estimated breeding values (EBVs) for birth weight (BW), weaning gain (WG), conformation at weaning (CW), carcass finishing precocity at weaning (PW), muscling at weaning (MW), postweaning gain (PG), conformation at yearling (CY), carcass finishing precocity at yearling (PY), and muscling at yearling (MY) were obtained from routine genetic evaluations. The single-trait animal models used to generate the EBVs were corrected for environmental and maternal effects, and included records from 1,278,057 animals born between 1985 and 2012, and raised in 315 grazing-based Brazilian herds. The variance ratios required to solve the mixed model equations were computed based on restricted maximum likelihood estimates of the variance components. The heritabilities obtained for BW, WG, CW, PW, MW, PG, CY, PY and MY were 0.37, 0.26, 0.25, 0.25, 0.26, 0.33, 0.31, 0.31 and 0.30, respectively. Prior to the association analysis, EBVs were deregressed following [9], and only the bulls presenting deregressed EBVs with a minimum accuracy (based on prediction error variance) of 0.50 were analyzed.
Records for WG and PG were based on the weight gain from birth to weaning (adjusted for a period of 205 days) and from weaning to yearling (adjusted for a period of 550 days), respectively. Records for conformation, carcass finishing precocity and muscling were taken at weaning and yearling based on visual score evaluations relative to the animals of the same management group. Scores were assigned in a discrete ordered scale ranging from 1 to 5. The model used for BW included the fixed effects of contemporary group (defined as animals from the same herd, born in the same year and season, and belonging to the same birth management group) and age of dam at calving, as well as random maternal effects (maternal additive genetic effect and maternal permanent environmental effect). The model used for weaning traits included fixed effects of contemporary group (concatenation of BW contemporary group and herd-management group at weaning), Julian birth date within birth season, age at phenotype recording and age of dam at calving, in addition to the maternal effects described for BW. Post-weaning gain and the remaining yearling traits were corrected for the fixed effects of contemporary group (concatenation of WG contemporary group and herd-management group at yearling), age at phenotype recording and age of dam at calving.

Regression model
Prior to testing SNPs for association, the same single-trait regression model was applied across traits: where y is the n x 1 vector of dEBVs, 1 n is a n x 1 vector of 1s, μ is the overall mean, g is the n x 1 vector of random polygenic effects, and e is the n x 1 vector of random residual effects. Vector g was assumed a linear combination of additive marker effects: where X is a n x m matrix of allele dosages at m markers (coded as 0, 1 or 2 copies of the minor allele), and b is a m x 1 vector of random additive marker effects. Vector g was assumed N(0, G), where G = Kσ g 2 , K is the n x n matrix of additive relationships between individuals, and σ g 2 is the variance due to genome-wide markers. Vector e was assumed N(0, R), where R = Wσ e 2 , W is a diagonal matrix of weights accounting for heterogeneity of variance in dEBVs, and σ e 2 is the residual variance. The resulting variance-covariance matrix of the model was V = G + R. Notice that this is essentially the polygenic [10,11] or the Genomic Best Linear Unbiased Predictor (GBLUP) model [12] corrected for heteroscedastic residuals. Estimates of genetic parameters for each trait were obtained using the hglm v2.1-0 package in R v3.2.1 [13].

Choice of residual weights
Following Garrick et al. [9], the diagonal elements of W were computed as w = λ -1 (d + c), where λ = (1-h 2 )/h 2 , h 2 is the heritability of the trait before deregression, d = (1 -r 2 )/r 2 , r 2 is the squared vector of accuracies (i.e., reliabilities) of the pseudo-phenotypes, and c is a parameter taking values between 0 and 1 controlling the relative contribution of pseudo-phenotypes on the basis of their reliabilities. As can be seen in the formula, the weights w used here are linearly related to the weights d used by Neves et al. [14], except that they were scaled by the variance ratio λ and added by a constant c. In order to achieve a balanced contrast between dEBVs with high and low accuracy, we adopted c = 0.5.

GWA analysis
Conceptually, associations are tested by contrasting the null polygenic model against alternative models that include the fixed effect of one candidate marker at a time [11,15,16]. However, this contrast is redundant since the candidate marker was also included as a random effect in the null model through g = Xb. This introduces a bias known as 'proximal contamination' [17], which can substantially reduce the power of the tests. In order to avoid it, we used the leaveone-chromosome-out approach described by Yang et al. [18]. Briefly, the method consists in partitioning the genome-wide scan procedure per chromosome. For each chromosome j, we fit a modified null model where matrix K is built excluding all markers on j, guaranteeing that the null model does not contain the marker being tested or any other marker in linkage disequilibrium (LD) with it. Then, each marker on chromosome j is contrasted against the modified null model by using the In this way, t is conditional on the random effects of genome-wide markers, such that the model preserves power while correcting for relatedness and population substructure. Additionally, by incorporating matrix W on V, we accounted for heterogeneity of variance in dEBVs while estimating b and SE(b). Estimates for b and SE(b) were obtained by providing V and y Ã to the mmscore function in GenABEL v1.8-0 [19]. In summary, our single-trait GWA analysis was almost identical to the leave-one-chromosomeout (i.e. mlma-loco) procedure [18] in GCTA [20], except that our model accounted for heterogeneity in residual variance.

Detection of pleiotropic genes
In order to identify pleiotropic genes affecting CPM and body weight traits, we combined two distinct but complementary strategies. The Multi-Trait Meta-Analysis method described by Bolormaa et al. [4] was used to summarize single-marker statistics across all studied traits and detect major pleiotropic genes. Additionally, the Versatile Gene-Based Test for Genome-wide Association Studies (VEGAS) method [6] was applied to the single-trait associations to perform gene-set based analyses, and genes appearing in the significant list of at least four of the nine traits (i.e., approximately half of the traits) evaluated were considered as candidate pleiotropic genes. VEGAS and Multi-Trait Meta-Analysis were implemented in R v3.2.1 and are described below.
Multi-trait meta-analysis. For each SNP, consider t as the q x 1 vector of signed t-values across q traits, and C the q x q matrix of t-values correlations across genome-wide markers. The test statistic t'C -1 t is distributed as χ 2 with q degrees of freedom (df) under the null hypothesis of no pleiotropic effect. One standing issue in this implementation is that assuming C i,i = 1 when some traits present higher average correlations than others may cause highly significant composite scores even when single-trait analyses collectively present poor evidence of association. This issue was corrected here by adding the average correlation of each trait to their respective diagonal elements. The expected proportion of false discoveries among the markers declared significant was computed as f = αm/s, where m is the number of tests, α is the significance level threshold, and s is the number of tests with p < α. In order to select a value of α resulting in a false discovery rate lower than 5%, we applied the procedure described by [21]. Briefly, we manipulated the expression above to obtain α = fs/m and defined s as the largest p-value rank position i satisfying p i f i /m for f = 0.05.

VEGAS.
For each trait and for each gene, the joint VEGAS test was computed as the sum of squared t-values across markers. The distribution of the VEGAS test under the null hypothesis of no association is unknown but can be approximated using Monte Carlo simulations. The simulation procedure is carried out as follows: random draws from the null distribution are generated as x 2 = z 1 2 + z 2 2 + . . . + z m 2 , where z m 2 is one element of a m x 1 random vector z sampled from a multivariate normal distribution with mean 0 q and covariance D, encoded by the matrix of signed genotypic correlations among the m markers within the gene. The probability of observing a VEGAS statistic value as extreme as the one obtained if the null hypothesis is true is then computed as the number of times the simulated x 2 values were greater than or equal to the observed VEGAS statistic, divided by the total number of simulations. The number of simulations required to approximate the null distribution is chosen adaptively: let x 2 be the vector of random draws from the null distribution. At every iteration, if the p-value is lower than the inverse of the current number of simulations k, k-length(x 2 ) extra samples are obtained, the new p-value is computed, and the new number of simulations is set to 10 log10 (k)+1 . We initialized the iterations using k = 1,000. In this way, the algorithm dynamically recalibrated low p-values based on a sample precision of 10 3 , 10 4 , 10 5 and 10 6 simulations. Due to computational limitations, the process was interrupted at 10 6 simulations, and probabilities were bounded to p < 10 −6 . Genes sheltering no SNPs had their p-values set to 1. In order to guarantee that D was positive definite, we used the maximum set of markers for which all pairwise squared correlations were lower than or equal to 0.5. In order to capture information from intergenic markers in LD with unobserved variants lying within genes and their regulatory regions, we expanded gene boundaries in the UMDv3.1 assembly to ± 100 kb of the 5' and 3' UTRs. Finally, given the VEGAS test is highly conservative, the gene list for each trait comprised all genes with p < 0.01. Based on the moderate genetic correlations among the nine traits studied here (see the Results and Discussion section), putative pleiotropic genes were defined as those appearing in the gene list of four or more traits, corresponding to genes potentially associated with at least half of the studied traits.

Results and Discussion
Data filtering

Evidence of pleiotropic effect from additive genetic correlations
As dEBVs encapsulate individual additive genetic values and association t-values represent marked additive genetic effects, we considered both dEBVs and t-values correlations between traits as proxies for the additive genetic correlations between traits. On average ( Table 1), dEBVs were moderately correlated across traits (r = 0.442), and strikingly similar results were found for t-values across traits (r = 0.423). An exception was BW, which was only mildly correlated with WG and moderately correlated with conformation traits. These noteworthy genetic correlations suggest that pleiotropic loci may contribute to the genetic variance of these traits.

Major pleiotropic effects map to the PLAG1 region
After combining the results across the nine traits with the Multi-Trait Meta-Analysis method, a total of 983 markers were declared significant at an empirical threshold of p < 9.20 x 10 −5 (Fig 1A), resulting in a false discovery rate of approximately 5%. A single large, dominant signal mapping to chromosome 14:19.46-34.92 Mb was identified. The leading SNP, namely rs136543212 (probe BovineHD1400007373), mapped to position 25,502,915, in the vicinity of the well-known PLAG1 (pleomorphic adenoma gene 1) chromosome domain. This chromosomal segment has been implicated as a highly pleiotropic locus underlying genetic differences in growth, weight and reproductive traits in cattle [4,[23][24][25][26][27][28][29]. The functional candidacy of PLAG1 is well supported by its regulation of the expression of insulin-like growth factors (IGF) [30]. These factors are major mediators of the growth pathway and the hypothalamic-pituitary-gonadal axis [31], and serum concentrations of IGF1 in cattle have been shown to be highly heritable, negatively correlated with weight and carcass traits [32,33] and primarily determined by variants in the PLAG1 chromosomal region [29]. Nevertheless, the identities of the causal variant and the affected genes are still to be unraveled, since other genes in the vicinity of PLAG1 are also plausible functional candidates, notably MOS [23], CHCHD7 [23,34], XKR4 [35][36][37] and PENK [38][39][40]. The signal detected here suggested that the underlying causal variant has negative effects on carcass finishing precocity/muscling traits and positive effects on weight/conformation traits, consistent with a previous report of a candidate causal variant [23] of Bos taurus origin (the C allele at SNP rs109231213, position 14:25003338) associated with decreased IGF1 serum concentrations and precocity, as well as with increased height and weight [28].
To determine if this large segment comprised a single signal driven by a large LD block or it construed a mixture of signals, we re-analyzed our data using the same GWA model conditional on the fixed effect of the top scoring SNP. Correction for the effect of rs136543212 alone was able to remove most of the signal (Fig 1B), which suggests that this large segment is a single LD block. In fact, all significant markers were in moderate to high LD with the top scoring marker (Fig 2). However, as correcting for the leading SNP was not sufficient to completely  eliminate the signal, it is hard to distinguish between the presence of more than one causal nucleotide within the LD block and residual effects captured by the remaining markers due to imperfect tagging of rs136543212.

Detection and interactions of additional candidate pleiotropic genes
We carried out the VEGAS analysis using single-trait GWA results. The top scoring gene appearing in the gene list of eight of the nine traits investigated was PLAG1, consistent with the findings of the Multi-Trait Meta-Analysis approach. Our analysis showed that the VEGAS and the Multi-Trait Meta-Analysis approaches are complementary, and that they can be used jointly to maximize the discovery of pleiotropic genes. As the PLAG1 signal comprised a large LD block, we carried out further search for pleiotropic genes omitting results from chromosome 14, except for PLAG1 itself. Additionally, we also omitted results from 25 olfactory receptor genes and a cluster of 32 histone genes, both mapping to the vicinity of other functional candidate genes. After applying these filters, we obtained a list of 224 candidate pleiotropic genes. These included 176 protein-coding genes, 12 pseudo-genes, 12 snoRNA, 11 snRNA, 11 miRNA and two misc_RNA. We then focused our functional annotation on protein-coding genes. Besides PLAG1, we found a series of growth-related genes, including growth differentiation factors 2 (GDF2) [41], 10 (GDF10) [42] and 11 (GDF11) [43], growth arrest-specific 2 like 3 (GAS2L3) [44], fibroblast growth factor 22 (FGF22) [45], PH domain and leucine rich repeat protein phosphatase 1 (PHLPP1) [46], signal transducer and activator of transcription 2 (STAT2) [47], SMAD family member 4 (SMAD4) [48], and insulin-like growth factor binding protein 5 (IGFBP5) [49]. Genes involved in muscle development and function were also found, including methylmalonyl CoA mutase (MUT) [50], troponin T type 1 (TNNT1) [51], troponin I type 3 (TNNI3) [52], and sarcoglycan delta (SGCD, also known as 35kDa dystrophin-associated glycoprotein) [53].
We then used the STRING database to annotate protein-protein interactions among the candidate genes. From the initial list of 176 genes, 82 presented connections. From these, 54 genes were involved in a single network (Fig 3A). The remaining 28 genes formed smaller networks ranging from two to six genes (S1 Data). Interestingly, PLAG1 was not present in any network, in spite of being the major pleiotropic gene in our study.
Although key genes involved in growth, such as IGF1 (insulin like growth factor 1), IGF2 (insulin like growth factor 2), GH1 (growth hormone 1), IGF1R (insulin like growth factor 1 receptor) and GHR (growth hormone receptor), were not found to shelter pleiotropic variants affecting carcass traits in the present study, we decided to add them to the list to reveal their interactions with the pleiotropic genes found here (Fig 3B). The main motivation for this was the known function of PLAG1: by fine-tuning the expression of the IGF family, the predicted downstream consequence of mutations in PLAG1 is direct interference in the growth pathway. This could also be the case for other pleiotropic genes identified here. Another argument to support this strategy was that mutations directly occurring in the growth pathway are likely to produce extreme phenotypes, such that common variation in carcass size is expected to be explained by variation in satellite genes modulating that pathway. For instance, mutations in IGF1, IGF1R and GHR have been implicated in extreme body size reduction across dog breeds [54], as opposed to common variation in human height being explained by a large number of variants in genes outside the growth pathway [55].
Surprisingly, when IGF1, IGF2, GH1, IGF1R and GHR were added to the gene list, the pleiotropic genes formed a satellite network surrounding a central hub comprising the growth pathway. Additionally, FGF22, ITGB5, LOC506088 and PLAG1 were incorporated to the network when the genes mentioned above were included. This finding supported our hypothesis that the pleiotropic genes detected here are modulators of the growth pathway. It also opens the question whether genetic correlations among the weight and CPM traits encapsulate information about the growth pathway, in the sense that each one of these traits may be serving as a partial surrogate for the same underlying intermediate trait (i.e. growth). In this scenario, one would expect to find similar results in a direct GWA analysis on the intermediate phenotype. Network of candidate pleiotropic genes for carcass traits in Bos indicus (Nellore) cattle. The network was built from known proteinprotein interactions (edges) between gene products (nodes). The size of the node is proportional to the number of traits the gene is associated with. In A, the network is portrayed according to the list of genes obtained from the VEGAS analyses. In B, after the inclusion of five essential genes (in blue) form the growth pathway, the network presented itself as a satellite, and four more genes (in red) could be incorporated, including the major pleotropic gene PLAG1. doi:10.1371/journal.pone.0158165.g003

Conclusions
We identified genes associated with multiple carcass traits in Bos indicus (Nellore) cattle. These pleiotropic genes formed a network modulating IGF1, IGF1R, IGF2, GH1 and GHR, which are well known major actors of the growth pathway. This finding suggests that common variation in carcass traits is not likely to be explained by mutations in essential genes controlling growth. Instead, the variation may lie in accessory genes that regulate the function and expression of essential genes. Among these accessory genes, PLAG1 seems to be the most influential. Moreover, we did not rule out the possibility that, at some extent, the genetic correlations among the nine traits studied here represent the indirect, partial measurement of a single underlying growth trait. In this case, the pleiotropic genes identified here may simply represent genes affecting the intermediate phenotypes. The future characterization of causal variants in these genes may contribute to improved prediction of carcass yield and more informed mating decisions in B. indicus cattle.
Supporting Information S1 Data. GWA data including detailed information of the single-trait and multi-trait analyses, as well as gene description results for the VEGAS and STRING analyses.