Selection Indices and Multivariate Analysis Show Similar Results in the Evaluation of Growth and Carcass Traits in Beef Cattle

This research evaluated a multivariate approach as an alternative tool for the purpose of selection regarding expected progeny differences (EPDs). Data were fitted using a multi-trait model and consisted of growth traits (birth weight and weights at 120, 210, 365 and 450 days of age) and carcass traits (longissimus muscle area (LMA), back-fat thickness (BF), and rump fat thickness (RF)), registered over 21 years in extensive breeding systems of Polled Nellore cattle in Brazil. Multivariate analyses were performed using standardized (zero mean and unit variance) EPDs. The k mean method revealed that the best fit of data occurred using three clusters (k = 3) (P < 0.001). Estimates of genetic correlation among growth and carcass traits and the estimates of heritability were moderate to high, suggesting that a correlated response approach is suitable for practical decision making. Estimates of correlation between selection indices and the multivariate index (LD1) were moderate to high, ranging from 0.48 to 0.97. This reveals that both types of indices give similar results and that the multivariate approach is reliable for the purpose of selection. The alternative tool seems very handy when economic weights are not available or in cases where more rapid identification of the best animals is desired. Interestingly, multivariate analysis allowed forecasting information based on the relationships among breeding values (EPDs). Also, it enabled fine discrimination, rapid data summarization after genetic evaluation, and permitted accounting for maternal ability and the genetic direct potential of the animals. In addition, we recommend the use of longissimus muscle area and subcutaneous fat thickness as selection criteria, to allow estimation of breeding values before the first mating season in order to accelerate the response to individual selection.


Introduction
Growth traits are generally useful selection criteria for beef cattle. Weight records at different ages can be easily collected on-farm and tend to present strong correlation estimates [1][2][3] and moderate to high heritability estimates [4][5][6][7]. Weaning weight is usually considered as a correlated trait in the genetic evaluation of livestock and is often used to support culling and selection decisions [8]. Other traits such as fat thickness and muscle area are also fairly easy to measure using real-time ultrasound [9,10], and may furnish accurate data for the estimation of breeding values, as reported for young bulls [10,11].
Expected progeny differences (EPDs) are important tools for animal breeding and selection. However, each EPD reflects an animal's genetic merit for only a single trait at a time. By contrast, animal breeders dealing with complex production systems must focus on many traits simultaneously. Historically, animal selection has been based on EPDs combined with a portfolio of diversifying approaches (see [12] for tandem selection and selection based on independent culling levels; [13] for index selection; and [14] for an aggregate value index). All methods share the same goal in terms of selecting animals with superior genetic merit for traits of economic importance. Some methods may present advantages which depend on the availability and complexity of data. The selection index approach is time-consuming and is reliant on detailed farm-specific data in order to obtain the economic values of phenotypic traits. These restrictions likely account for the limited application of selection indices, despite proposal of the theory over seven decades ago.
Multivariate analyses are frequently used as an alternative approach for data summarization. In cattle breeding this technique has been applied to efficiently rank and group bulls according to similarity [15]. The k-means method has demonstrated fine clustering of bulls which were submitted to a progeny test [16]. Principal components (PC) analyses are multivariate techniques used mainly to reduce the dimensionality of data and to explore the relationship between traits in a dataset. In animal breeding, PC analysis techniques have been used to study the relationships among the estimated breeding values of various traits [17,18]. Another important multivariate approach also used in quantitative genetic analysis is discriminant analyses [19,20], which also can be used to classify animals based on their breeding values. However, we are unaware of any report attempting to use a multivariate approach for the purpose of genetic selection in cattle. Thus, revisiting the multivariate playground in order to identify and select similar/superior animals using EPDs may prove fruitful. In particular, this approach seems promising for the purpose of accounting for the relationships among breeding values that are related to several traits at the same time. Specifically, multivariate techniques might be suitable for the evaluation of growth and carcass traits, providing an alternative approach in the portfolio of selection tools [18].
This study was carried out using data from Brazil, where the use of EPDs has played a crucial role in increasing livestock productivity. The present status of cattle breeding and selection in this country matches the methodological summary given above. There have been several efforts to estimate covariance and genetic parameters and to predict breeding values in Nellore cattle [3,7,21,22] with no reports on any multivariate approach to selection. Over the past 30 years, Brazilian cattle breeders have widely adopted the use of animal breeding values rather than phenotypes in the selection process.
Currently, Brazil has the world's second largest commercial beef cattle herd, thus occupying a prominent position in the global beef market. In this country, beef cattle production is expected to increase 5.1% compared to 2014 [23] and Zebu (Bos indicus) cattle correspond to more than 80% of the beef-producing herds. The significant variability reported in the evaluation of growth and production traits in Nellore cattle (Bos indicus) reared in Brazil seems to mirror the great variability of climate and general management applied in the breeding systems. This is a clear indication that generalization of economic values between herds is unlikely to hold. Conversely, the estimation of economic values for each and every breeding system would be a difficult and time-consuming task. Therefore, an alternative to the economic selection index is needed.
In this study, we propose and evaluate a multivariate approach, restricted to genetic effects, as a strategic practical shortcut for the identification of animals with superior genetic merit.

Material and Methods
The data set was provided by the National Association of Breeders and Researchers (ANCP) in Brazil and consisted of weights and carcass traits measured between the years of 1995 and 2014. Evaluation of phenotypes took place in a tropical humid region (254 m altitude) in herds located in the county of Pontes e Lacerda, Mato Grosso state, Brazil. Mean annual precipitation is of 1,500 mm with defined wet and dry seasons. Records consisted of birth weight (BW), weights at standard ages of 120 (W120), 210 (W210), 365 (W365) and 450 (W450) days, longissimus muscle area (LMA), back-fat thickness (BF) and rump fat thickness (RF). The following formula generated the weight at standard ages of each animal: where AdjW is the weight adjusted at standard ages of 120, 210, 365 and 450 days; WA and WB are the weights before and after standard ages (SA), respectively; AA and AB are ages before and after the standard ages (SA), respectively. Adjusted weights were used because all breeding programs in Brazil work with specific selection criteria, mainly weights at standard ages. Records of LMA and BF were obtained from cross-sectional images of the Longissimus muscle, between the 12th and 13th ribs. Back-fat thickness was estimated at the 3/4 position from the chine bone end of the Longissimus muscle using the cross-sectional LMA image. RF was measured over the intersection between the gluteus medium and biceps femoris muscles located between the hooks and pin bones. All ultrasound image collections and analyses were carried out by Ultrasound Guidelines Council-certified technicians, hardware and software.
Exploratory analysis was carried out using the Statistical Analysis Systems [24] to check the consistency of the data and to evaluate the significance of environmental sources of variation that can affect the traits, such as current herd, year of birth, season of birth (classified into four groups: Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec), and sex and cow age at calving, as a covariate. Thus, the contemporary groups (CG) were defined as the groups of animals of the same sex, and born in the same herd, year, and season and reared under the same conditions. Linear and quadratic regressions of the weights by age of dam at calving were performed to determine if this covariate should be included in the genetic analyses. Hence, due to significance (P<0.01) of the dam age at calving, it was used to fit the following traits: birth weight, weight at 120, 210, 365 and 450 days of age. Edits included removal of weights and carcass traits outside of 3 standard-deviations from their respective GC means, discarding contemporary groups of fewer than 9 observations, and discarding contemporary groups represented by a single sire. Summary information of the edited data set is in Table 1. The relationship matrix included 9 generations of pedigree information and contained a total of 28,828 animals.
Genetic analyses were carried out by fitting a model that included the following effects: age of dam as covariate, sex of the animal coded as male or female, season of birth coded into four levels and the effect of the management group on the farm. The GLM (General Linear Model) procedure was used to define the fixed effects included in the contemporary groups. The fixed effects that significantly influenced the growth traits were included in the subsequent analyses. Genetic analysis was done by fitting multi-trait animal models. The mixed linear model for these traits was: where β represents the fixed effects (contemporary group as a cross-classified effect) and cow age at calving (as linear and quadratic effects) as covariates), associated with the observation (records), vector y is the known matrix X and a, m and mpe are the random effects vectors (direct, maternal and maternal permanent environmental effects, respectively) associated with records in y by the incidence matrix Z 1 , Z 2, and Z 3 , respectively; and e is the residual (co)variance. For this model, it was assumed that: Uniform and Gaussian priors were assumed for fixed and random effects, respectively: where G a and G m are (co)variance matrices of additive direct genetic effects and additive maternal additive genetic effects, respectively; G am is a matrix of additive genetic (co)variances between direct and maternal effects; E mpe is a (co)variance matrix of maternal permanent environmental effects; R is a (co)variance matrix of residual effects; A is the additive relationship matrix among all animals in the pedigree file; and I c and I n are identity matrices, whose orders were the numbers of dams and animals, respectively. The inverse Wishart distribution was used to derive priors for variance components: where S a and v a , S m and v m , S mpe and v mpe , and S r and and v r are a priori values and degrees of freedom for direct genetic, maternal genetic, maternal permanent environmental and residual covariance, respectively. In matrix notation, the variance structures were: var ¼ residual effects. (Co)variance components were estimated by a Bayesian implementation via Gibbs sampling using flat improper priors for nuisance parameters and flat priors for (co)variance components. The marginal posterior distribution for each parameter was obtained by integration of multivariate density functions, considering one long chain with 1,500,000 iterations. The initial discard was 500,000 and the thinning interval of the chain was 100. Convergence was checked by visual inspection of the sample trace plots. Computations of direct genetic (s 2 a ), maternal genetic (s 2 m ), covariance between the direct and maternal genetic (σ am ), maternal permanent environmental (s 2 mpe ) and residual variances (s 2 e ) were carried out using the program GIBBS2F90 [25]. The following (co)variance components and parameters were calculated as in [26]: phenotypic variance for weights (s 2 , genetic correlation between the direct and maternal effect (r am ) and maternal permanent environmental variance as a proportion of the phenotypic variance The expected progeny differences were rescaled to have zero (0) mean and unit (1) variance, and cluster analyses were performed using the k-mean method, which is effective to detect initial clusters with a standard iterative algorithm for minimizing the sum of squared distances from cluster means. A set of points named cluster seeds were selected as a first guess for cluster means. Each animal was assigned to the nearest seed to form temporary clusters. The seeds were then replaced by the means of temporary clusters and the process was repeated until no further changes occurred in the clusters. Therefore, the animals were encoded according to their respective group and k was chosen according to the pseudo F statistic [27], as well as sum of squares within groups. The dunn.test package [28] was used in order to compute Dunn's test [29] among multiple pairwise comparisons after a Kruskal-Wallis test among k groups. False Discovery Rate (FDR) was controlled using the Benjamini-Yekutieli adjustment [30].
Principal component and discriminant analyses of EPDs were performed in order to use both the PC and the coefficients of linear discriminants (CLD) as a selection index. The proportion of trace was considered in order to determine which CLD should be used, which indicates for how much (as a proportion) of the dataset of standardized EPDs each of the equations can successfully account. The results obtained using the CLD and PC indices were compared to results using a selection index, which was estimated according to [13]. In matrix notation this selection index would be I = b'X, where X is an n x 1 vector of selection criteria and b is an n x 1 vector of weighting factors to be computed. Thus, the selection index weights were calculated as b = P −1 G 12 v, where G 11 is an n x n genetic (co)variance matrix of the n selection criteria, P is an n x m phenotypic (co)variance matrix of the selection criteria, and v is a vector of economic values. As it is just a simulated index, it was considered that all economic weights (v) had the same value (v = 1).
Animal genetic selection is ideally no longer based on phenotypic measures but rather on estimated breeding values such as EPDs, which are estimated using best linear unbiased predictions in a multi-trait linear mixed model. Thus, the phenotypic (co)variance matrix is not needed for index construction [14,31]. Although the phenotypic correlations have no effect on the derivation of index weights (coefficients) they are required for the calculations describing the index [13]. As a second approach, where only EPDs are used, the only information needed in addition to the economic values to allow prediction of the breeding objective, is information on the genetic (co)variances among selection criteria in the index and on genetic (co)variances among the selection criteria and the breeding goals [14,32]. In that case, EPD was used instead of phenotypic measures in this index, and solving for the index coefficients is by an equation proposed by [14]: where b is a vector of index weights for the EPD of the selection criteria in the index, G 11 is a n x n genetic (co)variance matrix of the n selection criteria, G 12 is a n x m genetic (co)variance matrix among the n selection criteria and the m breeding goals, and v is a vector of economic values, for which all values are the same (v = 1).
After estimating the weight indices, estimates of correlation among the indices were computed in order to analyze if the best animals selected with the multivariate approaches were the same as those identified based on the selection index proposed by [13] and [14]. Bayesian estimates of Spearman correlations between the indices were obtained using the package Bayesian-FirstAid [33].

Results
The posterior marginal distributions of (co)variance component estimates were accurate, tending to normal distribution. The symmetrical distributions of measures of central tendency indicated accurate analysis [34,35]. In general, the samples obtained for the genetic correlations showed no wide dispersion (Tables 2-8), i.e. the oscillations remained stable, indicating that the burn-in period considered in the analysis was reliable and allowed convergence of the chain [36]. The estimates of genetic additive correlation among growth traits were all positive and high. Maternal genetic correlation estimates between the traits were all positive and ranging from 0.17 (BW-W210) to 0.96 (W120-W210). The correlations between weights at birth and the other weight categories were the only genetic correlations that presented negative values regarding the maternal effects, which means that selection of the best cows cannot result in selection of the best progenies (Table 3). Genetic correlation among birth weights and the weights at 120, 210, 365 and 450 days of age were all positive (Table 4).
Additive direct heritabilities for weights at birth, 120, 210, 365 and 450 days of age, longissimus muscle area, back-fat thickness, and rump fat thickness were 0.53±0.030, 0.24±0.017, 0.29±0.015, 0.55±0.017, 0.53±0.018, 0.37±0.029, 0.09±0.020 and 0.24±0.027, respectively. The estimates of maternal heritabilities were 0.24±0.014, 0.11±0.009 and 0.08±0.006, respectively for weights at birth, 120, 210 days of age (Table 8). Estimates were moderate and remained fairly similar within pre-weaning traits and within post-weaning traits. The estimates of maternal permanent environmental effects (c 2 ) for weights at birth, 120 and 210 days of age were 0.025±0.004, 0.045±0.004 and 0.031±0.003, respectively, which means that repeatability of the dam in this herd did not affect the expression of these traits, or on the other hand, the dams were constantly changing. Before performing cluster analysis, it was necessary to diagnose how many clusters suited the data. Thus, visual examination were used to determine the best k value: the first consisted of using the Calinsky criterion (CC) technique; and the second approach was the sum of squared errors (SSE) within groups (Fig 1). In accordance with the k-mean partition comparison and the CC, the best k was 2, and for the SSE within groups the best k was 4; therefore we used the mean of both approaches (k = 3).
After performing cluster analysis using the k-mean method and k = 3, groups 1, 2 and 3 were composed of 2970, 5886 and 2997 animals. Significant differences were observed between the three groups (p < 0.0001), as pointed out in Table 8. It was possible to sort the animals that had similar EPDs into each cluster. Kruskal-Wallis rank sum test, followed by Dunn's test and p-value adjusted by FDR analysis revealed differences among these groups for weights and carcass traits. Thus, using this approach it is possible to choose the best animals for selection, or conversely choose inferior animals to cull. Cluster analyses allow one to perform more accurate culling of the worst animals, meaning those that do not contribute positively in the herd, as well as helping to select the best animals. The K-mean method was efficient in grouping animals based on similar EPDs, and its standardized averages are shown in Fig 2. The animals grouped in cluster 1 had the highest EPDs for all traits, cluster 2 presented average animals, and cluster 3 comprised the animals with the lowest EPDs. There were significant difference between these groups for all traits (Table 9). Thus, this approach allowed a simulation in order to choose the superior animals for selection and the inferior animals for culling. The linear discriminant coefficients LD1 and PC1 were established as the two equations that allowed for the best discrimination (Table 10). The proportion of trace indicates the amount of data for which each equation can successfully account, regarding standardized EPDs. The first linear discriminant function (LD1) was used because its proportion of trace was close to 1 (0.9837). The accuracy of the selection index was 0.969. The index weights, as proposed by [12] and [14], have only values for measurable traits, in other words, only for phenotypes. Estimates of correlations between pSI-LD1 (0.66±0.061), pSI-PC1 (0.66±0.061), pSI-  (Table 11). Favorable genetic trends were obtained for the direct and maternal standardized EPD for birth weight, weight at 120, 210, 365 and 450 days of age, back-fat thickness, and rump fat thickness; except for longissimus muscle area which showed sigmoidal trend (Fig 3). Trace and density plots of marginal posterior Spearman correlations (Fig 4) between the selection indexes proposed by [13] and [14], first linear discriminant function, and the first principal component were accurate, tending to normal distribution, and showing symmetrical distributions of measures of central tendency, which indicate high accuracy in these analyses [34,37].

Discussion
Moderate to high estimates of direct heritabilities for growth and carcass traits suggest a gap for genetic improvement of the traits using mass selection [10,[38][39][40][41][42]. Overall, the estimate of maternal genetic effects was moderate and affirmed the importance of maternal genetics in the expression of the phenotype of Nellore cattle regarding growth traits. Estimates of maternal permanent environmental effects (c 2 ) were quite low, but enough to influence the expression of an individual's real phenotype due to masking effects of pre-weaning care. This indicates a higher effect of the maternal permanent environmental effect on weaning weight and a declining trend in the subsequent weight categories. The genetic correlations between direct additive genetic and maternal additive genetic effects were negative only between correlations regarding birth weight. Therefore, selection for direct additive genetic effects would not improve maternal ability, making it difficult to conduct joint selection for these traits. Different reasons for these negative estimates have been proposed. As reported by [43], the number of offspring per dam may have increased the dependence between maternal parameters, although other factors such as the number of dams with records also interfere in these estimates. This occurs due to failure to include some important fixed effects in the model, and the inclusion of sire x year interaction in the model could also lead to a reduction in the negative correlation estimates between the animal effects [11]. The negative correlations may also indicate antagonism between the effects of genes related to growth and maternal ability and are often considered to be a statistical issue rather than a biological matter in animal breeding [44,45]. Antagonism between the effects of an individual's genes for growth and those of its dam for a maternal contribution may also be due to natural selection for an intermediate optimum [46]. On the other hand, estimates of genetic correlation between direct and maternal effects for pre and post weaning weights (W120, W210, W365 and W450) were all positive, which means that the selection for direct genetic effects can improve maternal ability, enabling joint selection for growth traits.
Estimates of direct heritabilities were moderate to high for all traits, suggesting that genetic improvement can be obtained through the selection for these traits. Estimates of direct heritabilities for post weaning live weights previously reported by [41] were similar to the estimates obtained in the present study. The estimates of genetic correlations between genetic direct additive and maternal additive effects were moderate and were negative only between BW and the other weights, which reflected an antagonistic relationship. This antagonistic effect is hard to explain and probably occurred for the reasons cited above. The estimates of genetic correlations among all traits were positive and ranged from moderate to high, indicating strong genetic associations among all traits, in accordance with results reported by [4,40,41] regarding Nellore cattle. Genetic correlations between weaning weight and post weaning weights were also moderate to high and positive, suggesting that many of the genetic factors that influence body weight at different ages were the same and that selection of animals at weaning age is associated with the weight at 15 months. Likewise, high genetic correlations of W120 with other traits mean that animals presenting greater EPDs for W210 will probably present greater genetic merit for W365 and W450 also. Therefore, reduction of the age of selection from 450 days to 365 days or even at weaning (W210) may be possible for Nellore cattle.
High genetic variability observed in the herd, indicated by high estimates of the heritability coefficients, reveal that it is possible to make improvements over generations through a selection schedule [2,47,48] aiming at genetic progress of traits. Genetic trends (Fig 3) were positive and increasing over the years, which may have contributed to the correct classification and grouping of similar animals within each cluster.
Animals in cluster 1 presented the highest values when the selection indices were used, which points to similar results regardless of the approach adopted to select superior animals.  This statement is supported by positive and high correlation estimates between the first linear discriminant (LD1) function and the other selection indices [13]. Therefore, in the absence of economic weights, the multivariate approach can be a reliable tool for the purpose of selecting the best animals. Multivariate analysis allowed forecasting information based on the relationships among breeding values and accounting for maternal ability and the direct genetic potential of the animals. Likewise, the multivariate index enabled data summarization after genetic evaluation, fine discrimination and rapid selection of animals. The results of this study contribute to achieving the goals of simplifying the process of cattle breeding, lowering costs and saving time to subsidize on-farm decision-making regarding the selection of sires and dams. The generation of progeny with superior genetic merit for traits of economic importance is crucial to increase profitability in beef cattle production. Currently, economic returns are obtained mainly by selling heavier animals for slaughtering, with a growing emphasis on carcass quality. This logic seems to perfectly match the increased genetic trends found in this study.
The selection index approach is limited due to working only with phenotypes (selection criteria), requiring farm-specific economic and productive data and needing a separate index for every animal. These costly and time-consuming limitations are overcome by using the multivariate technique presented and tested in this study. Nevertheless, despite some limitations, the economic selection approach to weights and carcass traits is no doubt the most robust method for the selection of cattle.

Conclusion
Genetic correlations were substantial across in traits of economic importance evaluated in this study. High to moderate estimates of heritabilities and genetic correlations indicate that a correlated response approach appears to be suitable for practical decision-making in beef cattle systems. The results obtained from the multivariate approach were consistent with results obtained using selection indices, which supports the use of multivariate analysis as a potential tool for selection in cattle breeding. In addition, longissimus muscle area and subcutaneous fat thickness measured by real-time ultrasound should be used as selection criteria, allowing the estimation of breeding values before the first mating season in order to accelerate the response to individual selection. Standardized genetic trends for weights (BW, W120, W210, W365 and W450) and carcass (LMA, BF and RF) traits due to direct and maternal effects. BW, birth weight; W120, weight at 120 days of age; W210, weight at 210 days of age; W365, weight at 365 days of age; W450, weight at 450 days of age; LMA: longissimus muscle area; BF: back-fat thickness; obtained between the 12th and 13th ribs; RF: rump fat thickness. doi:10.1371/journal.pone.0147180.g003 Use of Multivariate Analyses in Livestock Selection