Heterogeneity of variance and genetic parameters for milk production in cattle, using Bayesian inference

The goal of this study was to verify the effect of heterogeneity of variance (HV) on milk production in up to 305 days of lactation (L305) of daughters of Girolando, Gir and Holstein sires, as well as in the genetic evaluation of these sires and their progenies. in Brazil. The model included contemporary groups (consisting of herd, year and calving season) as a fixed effect, cow age at calving (linear and quadratic effects) and heterozygosity (linear effect) as covariates, in addition to the random effects of direct additive genetic and environmental, permanent and residual. The first analysis consisted of the single-trait animal model, with L305 records (disregarding HV). The second considered classes of standard deviations (SD): two-trait model including low and high classes (considering HV), according to the standardized means of L305 for herd-year of calving. The low SD class was composed of herds with SD equal to or less than zero and the high class with positive SD values. Estimates of (co)variance components and breeding values were obtained separately for each scenario using Bayesian inference via Gibbs sampling. Different heritability was estimated. Higher for the high DP class in the Gir (0.20) and Holstein (0.15) breeds, not occurring the same in the Girolando breed, with a lower value among the classes for the high DP (0.10). High values of genetic correlations were also found between low and high SD classes (0.88; 0.85 and 0.79) for the Girolando, Gir and Holstein breeds, respectively. Like the order correlations (Spearman) which were also high for the three breeds analyzed (equal to or above 0.92). Thus, the presence of HV had a smaller impact for L305 and did not affect the genetic evaluation of sires.


Introduction
The use of local breeds adapted to a specific environment is one of the OIE's recommendations [1], seeking to improve animal health conditions, as well as promote the sustainability of production systems. In addition, to improve the productive characteristics of dairy cattle, changes have been made in the management of herds (whether nutritional, ambience and/or reproductive) and/or the use of breeding strategies [2][3][4]. The genetic progress of herds has become increasingly widespread with the intense use of reproductive biotechnologies, resulting in the distribution of genetic material of animals, mainly sires, with high productive potential in different environments and contexts of dairy production around the world [5].
In this context, genotype x environment interaction (GEI) emerges as a possible problem in defining strategies for the use of genetic resources, due to the fact that genotypes with favorable performance in one environment may not have all their genetic potential expressed in another environment [6]. In addition, there is a tendency for GEI to be ignored in genetic evaluations, due to the low accuracy in selecting for performance in extreme environments, when there is limited information to estimate the genetic value and due to the complexity of the models used [7]. As a result, we can select some breeders in the wrong way, with better performances depending on the breeding environment provided to them and not by their genetic potential [8,9].
The heterogeneity of variance in milk production is a form of GEI and when it occurs, the production of the progenies of a given breeder will be weighted by the proportion of SD of the herds in which their progenies are being evaluated. The result of this is that the productions of these daughters, from more variable herds, will present a greater weight in the genetic evaluation of breeders than progenies with production from less variable herds [10].
However, if the GEI is not expressive, records of daughters in different environments may alleviate the lower response to selection caused by GEI. If there is significant interaction between genotype and environment in the expression of a given productive characteristic, better said, if the genetic correlation between environments is less than 0.70; different improvement programs will be required for genetic assessment in each of the environments [11].
Therefore, it is essentially important to choose appropriate environmental descriptors in GEI studies [12][13][14]. The average level of production of each herd, for a given characteristic, is constantly used as an indicator of the effect of the environment to which the progeny is inserted, since this is an easily accessible information that associates numerous other environmental factors that influence the expression of the phenotype [15][16][17][18].
India is the world's largest dairy producer, with the largest proportion of production coming from small herds, averaging less than two dairy cows per herd. These cows are mostly multi-generation indefinite crosses between exotic dairy breeds and indigenous cattle, with no record of performance or pedigree. Initial results from a performance recording program indicate a strong potential for genomic selection to improve milk production in these animals. The performance of animals with different racial compositions, in different Indian environments, will allow better guidance for small producers on the ideal racial composition for each environment. These results highlight the effect of GEI on dairy production, showing that the best animals for one environment will not necessarily be the same in another [19].
Other authors also found the existence of the effect of GEI in several breeds, regions, countries, and levels of herd production and demonstrated that there is an effect on the ranking of breeders [20][21][22]. These results show the importance of considering GEI in genetic evaluations of dairy herds.
Therefore, the goal of this study was to evaluate the presence of variance heterogeneity for milk production up to 305 days of lactation in the various Brazilian herds of the Girolando, Gir and Holstein breeds, as well as to measure how much this heterogeneity can interfere in the classification of sires through genetic values.

Animals and data consistency
For the present study, information on sires and cows of gir, Girolando and Holstein breeds was used. Data were collected from the Programa de Melhoramento Genético da Raça Girolando (PMGG), under the technical coordination of the Embrapa Gado de Leite (Fig 1).
To perform the analyses, we chose to separate the daughters to be evaluated, according to the race of their parent, performing the correction of direct heterozygosis (DH), according to the equation proposed by [23]: Where, a t i e a v i denote the proportion of the breed 'i' in the sires and cow (mother) of the progeny.
For data consistency, proles records were eliminated without ancestry information, no date of birth or date of birth. The total milk yield (MY), in each breed, was standardized for 305 days in lactation (MY305) through the expression: Were, LL represents the length of lactation and b1 is the linear regression coefficient of MY as a function of LL. Milk production was limited up to 305 days at a maximum of up to 25,000 liters/lactation, with the age of the cow at delivery of up to 120 months.
Observations regarding herd-year-season classes of delivery with several information smaller than four were excluded. To ensure the genetic connection between the Classes of SD, breeders should have at least two daughters in each class and in both classes.
The months in which the deliveries occurred were grouped into two seasons, the dry season occurring from April to September and rain from October to March. These seasons were included in the formation group of contemporaries by the junction of the herd classes, year, and parting season.

Statistical model used to estimate genetic parameters
To perform the analysis of the heterogeneity of variance among the herds, the data were analyzed in two situations. In the first, disregarding different variances of milk production among the herds-year of delivery (general analysis), obtaining estimates of the components of (co)variance and prediction of the genetic values of individuals for milk production.
In the second approach, milk production was considered in each phenotypic SD class as a distinct characteristic, obtaining estimates of the components of (co)variance and prediction of the genetic values of the individuals in each class.
The evaluation model included the fixed effects of contemporary group (composed of herdyear-season of delivery), the age of the cow at delivery (linear and quadratic effects) and degree of heterozygosis (linear effect) as covariates, and additive genetic effects of animal, permanent and residual environment as random effects.

Model that disregards variance heterogeneity
In the general analysis, to obtain the estimates of the variance components, as well as the genetic values of the animals, the following animal model was used:

PLOS ONE
Where, y is the vector of n milk production observations; b is the fixed effects vector; a is the vector of genetic values of the animals; p is a vector of values referring to the permanent effect of the environment on the animals; e is a vector of residuals of the same dimension as y. X, Z, W are the corresponding incidence matrices of fixed, additive genetic and permanent environment effects as random effects, respectively.

Model considering variance heterogeneity
Classes of herd-year of calving were defined and, subsequently, the milk production averages were calculated in each of these classes. The mean values of the classes were standardized (considering mean equal to zero (μ c = 0) and variance equal to one (σ c = 1)) and then used to define the SD classes.
The low SD class consisted of standardized mean values equal to and less than zero. Otherwise, the high SD class was constituted with positive standardized mean values. In this way, it was possible to separate the herds that were on opposite sides of the normal data distribution curve. The following formula was used to standardize the mean: Where, Z = standardized mean; X = mean milk production of herd-year of calving classes; μ = general average of production up to 305 days; σ = general standard deviation of production up to 305 days.
In this analysis, which considers each class of SD as a distinct trait, obtaining variance components and genetic values of breeders for milk production, taking into account a joint distribution of traits, considered the following model: Where, yi = milk production in the i-th SD class and the other terms that represent the milk production in each SD class are the same previously described in the single trait model. For the components of (co)variances, distributions were assumed, the Inverse-Wishart Conjugate Prior.
The analyzes were performed by Bayesian inference using the BLUPF90 family programs [24]. A total of 300,000 Gibbs samples were generated, considering an initial discard period of 30,000 iterations. To minimize the effects of the initial values and to ensure the independence of the samples, a sampling interval of 10 iterations was considered, generating a total of 27,000 samples of the estimates of the variance components. The convergence criterion used was the Geweke diagnosis [25].

Correlation between genetic values
Spearman correlation coefficients were calculated for all combinations of estimated genetic value sets and also considering only the top 20% sires and progenies (cows) based on the genetic value in general analysis, of the different models, within each breed. In this way, it is possible to verify if there was a reranking of the animals, due to the heterogeneity of variance.
The accuracy (r) of the estimated breeding values was obtained using the following formula, according to [26]: r ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi

PLOS ONE
Where PEV is the variance of the prediction error and s 2 g is the additive genetic variance of the trait.
The consistency of the database, as well as spearman's genetic and correlations, were performed through the Statistical Program SAS [27].

Results and discussion
Subsequent values observed for means, standard deviations, coefficients of variation (CV), number of lactations and number of group of contemporaries, for milk production, in each phenotypic SD class and in general analysis are presented in Table 1 for the Girolando, Gir and Holstein breeds.
The CV were high, mainly in the herds included in the low SD class, reflecting, especially, in inequality of productions between the studied herds.
The herds included in the high SD class had a higher mean milk production for all breeds studied. However, the highest SD was found in the general analysis, with no increase in the mean associated with an increase in SD in the three breeds analyzed.
A similar result was observed by [28] and [29], who also found that increasing the mean did not necessarily increase the standard deviation. Contrary to what was found by [30, 31], when working with the Brown-Swiss breed, where an association between mean and SD within the herd was verified.
It was possible to observe that the high SD class with the highest average, highest SD and lowest CV, between classes, exhibited homogeneity in the milk production of the animals, in the three breeds used in this study, indicating that the criterion for forming the SD classes was efficient.
In all breeds, there was an increase in the posterior means of the estimates of the variance components referring to the additive genetic effect, permanent environment and residual effect (Tables 2-4).
This increase occurred from low to high phenotypic SD, however, this change was not proportional to the point that heritabilities, repeatability and correlations remained constant in the different analyses.

PLOS ONE
Heterogeneity of variance and genetic parameters for milk production, using Bayesian inference Table 2. Means, respective SD and credibility interval (HPD = 95%) for posterior estimates of variances, heritability (h 2 ), repeatability (rep) and genetic correlation (r g ) for milk production up to 305 days of daughters of sires of the Girolando breed, in general analysis and in the different classes of phenotypic SD.   The increase in the estimates of the genetic and residual variance components, according to the increase in the production level of the herds, has been reported in other studies [32-35] with dairy cattle, using approaches different from the one used in this study.

General analysis SD classes
It is noteworthy that, for the Girolando breed, the posterior mean of the estimation of permanent environment variance for the high SD class was 98.36% higher than the low SD class. This had a great impact on the calculation of the total phenotypic variance. In general, environmental variation was responsible for an important part of the total variation of the evaluated trait, causing low heritabilities.
Geweke's criterion [25], which consists of indicating the convergence of the mean, resulted in a probability value associated with the test always greater than the 5% level of significance adopted, indicating convergence of the chains and the validity of the Bayesian analysis.
For the Girolando breed, the posterior means of heritability, considering the SD classes, were close, being 0.10 for the high SD and 0.12 for the low SD. This shows that, although the herds are in different environments, genetic factors act in a similar way in both.
Comparing the a posteriori mean of heritability in general analysis (0.09), there is a great difference in relation to the values reported by [36], who obtained heritability of up to 0.1, much larger than that found here. This difference may be due to the elimination of short lactations, less than 120 days, which caused a decrease in the range of data in that study. Another point could be the fact that they used blood degrees of the Girolando breed for the cows and not according to the breed of the sire, as performed in the present study.
The subsequent mean of heritability found in the general analysis (0.17) for the Gir breed was much lower than that reported by [37] with values up to 0.39. Furthermore, in our study, the high SD class (0.18) was only 10% higher than the heritability estimate for the low SD class (0.20).
Posterior means of heritability of 0.12, 0.15 and 0.11 (general analysis, high and low SD, respectively) approximate from those found by [38] in general analysis (0.14) and different by [39] in general analysis (0.18), for Holstein animals.
Heritability values were, in general, higher in the high SD classes, with the exception of the Girolando breed, mainly due to the influence of the posterior estimate of the permanent environment variance in this breed.
In many cases, the more intensive the management of the herd for milk production, the greater the heritability [40] and this can be explained by the greater opportunity for cows to express their genetic potential in herds with high levels of production, due to the better environment offered, which is characterized by greater control of the health aspects of the cows and better feeding [41][42][43].
Differences in heritability estimates may also be due to non-compliance between the populations studied and the different analysis methods used. However, ignoring these differences between heritability estimates, although small, raises doubts about the reliability of estimated reproductive values.
In addition, heritability, and gradients of genetic variation between environments could imply a higher genetic response in herds with more intensive management. It becomes evident the need for further research considering VH in genetic evaluation models, including through models of reaction norms.
Repeatability values indicated reasonable similarity between consecutive lactations of the same animal, suggesting the importance of permanent environmental effects in determining the expression of this trait. In general, comparing within each breed, repeatability values did not differ greatly, ranging from 0.31 to 0.32 (Girolando), 0.35 to 0.40 (Gir) and 0.28 to 0.31 (Holstein).
Animals of the Girolando breed reported a repeatability value equal to 0.93 in general analysis The animals of the Girolando breed have repeatability values ranging from 0.33 [44] to 0.93 [45] in the general analysis, much higher than that found in this study (0.31) for the same breed. A large part of this difference is probably due to our lower heritability values found. In other words, genetic variability contributed to a lesser extent in the repeatability estimates [46].
For the Girolando, Gir and Holstein breeds, high values of genetic correlations were found (Tables 2-4), between low and high SD classes (0.88; 0.85 and 0.79). Studies on GEI [47,48] concluded that when there is a genetic correlation greater than 0.61 between two environments, a higher average genetic gain is achieved for all sires in both environments.
In view of this, as reported by other authors [30, [49][50][51], our results indicate a significant absence of variance heterogeneity for milk production up to 305 days in the analyzed herds, that is, it is expected that the genetic value and the ranking of the sires are the same in the two SD classes.
The order correlations (Spearman) between the genetic values obtained in the low and high SD analyzes and those obtained in the general analysis, taking into account all sires, were equal to or greater than 0.92 (Table 5), and were shown to be significant (p<0.01) for the three breeds studied.
The order correlations (Spearman) between the genetic values obtained in the low and high SD analyzes and those obtained in the general analysis, taking into account all sires, were equal to or greater than 0.92 (Table 5), and were shown to be significant (p<0.01) for the three breeds studied.
However, when the correlation is performed for the 20% of the best sires, classified based on the genetic values estimated in the general analysis, there were reductions in the order correlations. Becoming higher for Gir and Holstein breeds, between low and high SD classes, 32.26% and 29.35%, respectively. As for the Girolando breed, the greatest impact was between the general analysis and the low SD class, with a reduction of this correlation by 9.47%.
The greater the selection intensity, the lower the percentage of common animals between the classification groups, since the effect of errors in the classification order is more evident when a small number of animals are selected.
The greater association of high SD classes with the general analysis indicates that more variable herds contribute with greater participation in the prediction of genetic values, in a situation that disregards variance heterogeneity. This is because the high SD classes had greater additive genetic variability.
As with the sires, the order correlation was performed for their progenies (Table 6). Correlation coefficients considering all cows, according to sire breed (father), were also equal to or greater than 0.92. This means that their ranking remained similar between the different SD classes.
However, there was a downward trend in correlation values as selection pressure increased. This became evident when the Spearman correlation considering only 20% of the best cows, Table 5. Spearman correlation estimates (above the diagonal) between all sires in general analysis and in each phenotypic SD class and, for the top 20% sires (below the diagonal); classified according to the genetic value in general analysis, for milk production up to 305 days, for the daughters of Girolando, Gir and Holstein breeders.

PLOS ONE
Heterogeneity of variance and genetic parameters for milk production, using Bayesian inference according to the genetic value in the general analysis, decreased to values up to 0.62. However, there was a downward trend in the correlation values, as there was an increase in the selective intensity of the sires, since the Spearman correlation considering only 20% of the best cows, according to the genetic value in the general analysis, decreased to values of up to 0.62.
A study analyzing breeding strategies in two environments, based on the highest mean genetic gain [52][53][54][55][56], found that the analysis was relatively insensitive to heritability, number of progenies per sire and the relative importance of both environments, but was very sensitive to intensity of selection, as occurred in this study. Another study [57] evaluating the weight of male and female cattle, but using linear regression, showed that males had higher body weights and greater gains in body mass (BW) compared to females, with advancing age.
Although alterations were observed in order correlations, it is verified in Table 7 that there was no drastic change in the number of animals, for sires and cows, which remained in the list of the ten best classified.
There were only two cases of considerable change. The first was in the Holstein breed, in which only 6 animals of the 10 bests, remain in the analysis, considering the low SD class (analysis of the sires). The second happened in the analysis of the cows, in which only 5 animals, out of the top 10, remain in the analysis considering the low SD class, in the Gir breed.
Other studies [2,22,[58][59][60][61][62][63] show small GEI effects and correlation values close to unity, with no significant reclassification of sires for milk production, whether with reaction norms or with analysis of genetic and order correlations between environments.
This indicates that in this research herd environments (included in SD classes) may be similar in some respects, although they differed in herd size and location regions.
The HV existing in this study did not cause great harm in the genetic evaluation of sires in the three studied breeds. This was evidenced when observing the results obtained in the general analysis, which would similarly select the same sires within the HV classes.

Conclusion
Estimates of heritability and genetic correlation for milk yield revealed that the heterogeneity of variance present is mainly genetic in nature. Therefore, the presence of variance heterogeneity did not affect the genetic evaluation of sires for the three breeds studied. Table 6. Spearman correlation estimates (above diagonal) among all cows (daughters of Gir, Girolando, and Holstein sires), in overall analysis and in each phenotypic standard deviation class, and for the top 20% cows (below diagonal), ranked according to the genetic value in general analysis, for milk production up to 305 days.

PLOS ONE
Spearman's correlations showed that there was no difference in the classification of sires without considering HV. However, this pattern was altered when the selection intensity increased in the sires, resulting in a considerable change in their ranking.
In animal genetic evaluation models, it is important to verify the presence of HV, as well as its effect on the genetic evaluation of the breeders, especially in countries where there are large variations in the environment.