Multiple-trait, random regression, and compound symmetry models for analyzing multi-environment trials in maize breeding

An efficient and informative statistical method to analyze genotype-by-environment interaction (GxE) is needed in maize breeding programs. Thus, the objective of this study was to compare the effectiveness of multiple-trait models (MTM), random regression models (RRM), and compound symmetry models (CSM) in the analysis of multi-environment trials (MET) in maize breeding. For this, a data set with 84 maize hybrids evaluated across four environments for the trait grain yield (GY) was used. Variance components were estimated by restricted maximum likelihood (REML), and genetic values were predicted by best linear unbiased prediction (BLUP). The best fit MTM, RRM, and CSM were identified by the Akaike information criterion (AIC), and the significance of the genetic effects were tested using the likelihood ratio test (LRT). Genetic gains were predicted considering four selection intensities (5, 10, 15, and 20 hybrids). The selected MTM, RRM, and CSM models fit heterogeneous residuals. Moreover, for RRM the genetic effects were modeled by Legendre polynomials of order two. Genetic variability between maize hybrids were assessed for GY. In general, estimates of broad-sense heritability, selective accuracy, and predicted selection gains were slightly higher when obtained using MTM and RRM. Thus, considering the criterion of parsimony and the possibility of predicting genetic values of hybrids for untested environments, RRM is a preferential approach for analyzing MET in maize breeding.


Introduction
Maize (Zea mays L.) is the most cultivated crop worldwide, with a global yield of 1.1 billion tons in the 2018/2019 crop year [1]. An important advantage of this crop is that it can be cultivated across a range of environments and seasons. However, such aspects lead to differential responses of genotypes to varied environmental conditions, which is known as genotype-byenvironment interaction (GxE) [2] or phenotypic plasticity [3]. Due to the significant influence of environmental effects on the expression of quantitative traits, multi-environment trials (MET) are an important tool to assess such effects. Variations in phenotypic performance depend on the magnitude of GxE, which occurs when there is dissimilarity in a genotype's performance in different environments [4]. An environment can be defined by biotic or abiotic factors to which plants are exposed, and can include other characteristics, such as level of technology used or plant population density [5].
To assess GxE, the compound symmetry model (CSM) is the most simple and parsimonious, while the multiple-trait model (MTM) is the most complex and complete [6]. With CSM, a small number of parameters can be estimated, however the assumptions of this model are limited, such as the genetic correlation between environments being equal to 1 [7]. On the other hand, MTM tends to present problems in relation to convergence due to the large number of parameters estimated [8]. Thus, the MTM became prohibitive when the number of environments is high [6].
Random regression models (RRM) estimate the same genetic parameters as MTM but with less parameterization [9] and capture the continuous change of a trait over time or environmental gradient [8,10]. Furthermore, this approach is a powerful tool to predict reaction norms [11,12] and, consequently, predict genotypic values of genotypes for untested environments [7]. The reaction norms consider the effects of GxE and can identify the underlying causes, as they are plotted over an environmental gradient [13].
Selective accuracy is the most suitable approach to compare statistical methods in genetic improvement [14], because it is directly related to the reliability of genetic selection [15]. Moreover, selection gains and heritability are also used to compare statistical methods for analyzing GxE in maize, as maximizing selection gains is the main objective of maize breeding programs [16].
Considering the importance of GxE in maize breeding [16], effective and informative statistical methods are necessary to assess MET, with the goal of improving selective accuracy and, consequently, genetic gains with selection [8]. Recently, MTM and RRM models have been successfully used in plant breeding to analyzing the GxE [11,17]. Thus, the objective of this study was to compare MTM, RRM, and CSM in terms of analyzing MET for maize breeding.

Experimental data
The MET was carried out between January and July 2018, in Goiás State, Brazil (S1 Table). The climate of the region is classified as humid temperate (Cwa), with dry winters and hot summers [18]. The average annual temperature is around 21.5˚C and average rainfall is between 1400 and 2000 mm year -1 . The agricultural practices used in the trials are based on those commonly employed for maize cultivation in the region [19].
A data set including 84 maize hybrids evaluated for the trait grain yield (GY) in four environments (E1, E2, E3, and E4) was used (S1 Table). The trials were established using a complete block design with three replications and 44 plants per plot. Plots consisted of four 4 m rows, with a spacing of 0.40 m between plants and 0.45 m between rows. To eliminate the competition effect, only the two central rows were evaluated. The GY was calculated considering the weight of the grain produced per plot, converted to kilograms per hectare (kg ha -1 ).

Statistical analyses
Three classes of statistical models, with homogeneous and heterogeneous residual variance structures, were considered: (i) CSM; (ii) MTM; and (iii) RRM fitted through Legendre polynomials. The estimation of variance components and prediction of the genotypic values for GY were conducted via the restricted maximum likelihood/best linear unbiased prediction (REML/BLUP) procedure [20].
(i) Compound symmetry models. CSM was determined by the following equation: where: y is the vector of phenotypes; r is the vector of block-environment combinations (assumed to be fixed), which encompasses the effects of environment and block within environment, added to the overall mean; g is the vector of genotypic effects (assumed as random); ge is the vector of GxE effects (random); and e is the vector of residuals (random). Uppercase letters refer to the incidence matrices for the respective effects.
In this model g � Nð0; Is 2 g Þ; ge � Nð0; Is 2 ge Þ; and e~N(0,R), where: s 2 g is the genotypic variance between hybrids; s 2 ge is the GxE variance; I is an identity matrix with appropriate order to the respective effect; and R refers to a diagonal matrix of residual variances (homogeneous or heterogeneous).
(ii) Multiple-trait models. MTM was determined by the following equation: In this model g~N(0,S g �I) and e~N(0,R), where: S g is the genotypic covariance matrix; I is an identity matrix with the appropriate order; � is the Kronecker product; and R refers to a diagonal matrix of residual variances (homogeneous or heterogeneous).
(iii) Random regression models. In order to use Legendre polynomials, the phenotypic mean of each environment (μ i ) must be scaled to a range of -1 to +1. The environmental gradient values (E i ) were obtained with the following expression [21]: The RRM were fitted through Legendre polynomials, considering all possible degrees of fit, as follows: where: Y ijk is the i th genotype (i = 1, 2, . . ., 84) in the j th environment (j = 1, 2, 3, 4) in the k th block (k = 1, 2, 3); μ is the overall mean; S j is the fixed effect of environment j; R(S jk ) is the fixed effect of block k within environment j; d is the Legendre polynomial degree, ranging from 0 to D (D = number of environments-1); α id is the random regression coefficient for the Legendre polynomial for the genotype effects; F ijd is the d th Legendre polynomial for the j th environment for the i th genotype; and e ijk is the residual random effect associated with Y ijk . In the matrix notation, the above model was described using the following equation: In this model g~N(0,K g �I) and e~N(0,R), where: K g is the covariance matrix for the coefficients of genotypic effects; � is the Kronecker product; I is an identity matrix with the appropriate order; and R refers to a diagonal matrix of residual variances (homogeneous or heterogeneous).

Model selection
In order to compare the residual variance structures (homogeneous and heterogeneous) of CSM, MTM, and RRM the Akaike information criterion (AIC) [22] was used. The difference among the AIC values (Δ AIC ) [23] were calculated to indicate which model provided the best fit. Significance of the random effects of CSM, MTM, and RRM were tested using the likelihood ratio test (LRT) [24].

Variance components and genetic parameters
For CSM, phenotypic variance (ŝ 2 p ), broad-sense heritability (h 2 ), coefficient of determination of GxE (c 2 ge ), and genotypic correlation across environments (r gloc ), were estimated using the following expressions:ŝ For RRM, the estimates of genotypic variance (ŝ 2 g ) and predicted genotypic values (g ij ) at the original scale, were estimated/predicted as follows [21,25]: For MTM and RRM, phenotypic variance (ŝ 2 p ) and broad-sense individual heritability (h 2 ) were estimated as [6]:ŝ The selective accuracies (rĝ g ) were calculated for CSM, MTM, and RRM, with the following expressions, respectively [6]: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi ffi ffi ffi ffi ffi ffi ffiffi where PEV is the prediction error variance, obtained by the diagonal elements of the inverse of the coefficient matrix (information matrix) of the mixed model equations.
The experimental coefficient of variation (CV e ) was calculated with the following expression: where μ i is the phenotypic mean of environment i.

Genetic selection
The genotypes were ranked for each environment by each model. Then, selection gains (SG) were predicted considering the selection of 5, 10, 15, and 20 genotypes, based on the following expression [26]: where: GV i is the predicted genotypic value of genotype i; n is the number of genotypes selected; and μ p is the overall mean. The coincidence index was calculated to determine the similarity of the ranking of genotypes by the compared models in each environment. This index was calculated as the number of similarly ranked genotypes in the two compared models in each environment, divided by the total number of compared genotypes. The values were given in percentage for 5, 10, 15 or 20 selected genotypes, for the three models in the four environments.

Model selection
Based on the differences among the AIC values (Δ AIC ), the best fit CSM was CS.Rhe, the best fit MTM was MT.Rhe, and the best fit RRM was RR.2.Rhe, since a difference below 2 suggests a competitive model with the best fit (Δ AIC = 0) [29] (Table 1). Thus, the selected models (CS. Rhe, MT.Rhe, and RR.2.Rhe) were used to estimate the variance components and to predict the genotypic values. Considering all models (CSM, MTM, and RRM), the best fit was RR.2.Rhe. In addition, according to the LRT, significant genotypic effects were detected by all models and GxE effects were detected, explicitly, by CSM (Table 1).

Variance components and genetic parameters
Based on CSM, single estimates of genotypic and GxE variance were obtained (Table 2). On the other hand, one estimate of genotypic variance in each environment was obtained using MTM and RRM, but with no estimate of GxE variance (Table 2). For MTM and RRM, the highest estimates of genotypic variance were detected in environments E1 and E2, respectively. Environment E4 presented the lowest estimates of genotypic variance, and E3 presented the highest estimates of residual variance (Table 2).
Broad-sense heritability estimates did not follow any pattern across the evaluated environments. Considering CSM, the estimates of broad-sense heritability varied from 0.21 (E3) to 0.29 (E4), with similar results obtained for MTM and RRM (Table 2). Regarding the mean selective accuracies, a single estimate was obtained using CSM (0.88), while MTM and RRM provided one estimate for each environment (Table 2 and S2 Table). E4 was the only environment in which CSM presented a higher mean selective accuracy compared to MTM and RRM ( Table 2 and S2 Table). In the other environments, the mean selective accuracies estimated by MTM or RRM were slightly higher than CSM (Table 2 and S2 Table). Based on CSM, the genotypic correlation across environments was 0.72.

Reaction norms
The reaction norms (Fig 1), fitted through Legendre polynomials of degree two, confirmed the significance of the GxE interaction effects detected by CSM, because the reaction norms intersected, diverged, or converged ( Fig 1B). Therefore, the ranking of genotypes changed across the environmental gradient (S3 Table) [30].

Genetic selection
Based on the three selected models, the greatest selection gains were obtained in environments E1 and E2 (Table 3), while MTM provided the highest selection gains for all selection intensities (5, 10, 15, and 20 hybrids). In general, the predicted selection gains using RRM were slightly lower than those predicted by MTM and higher than those predicted by CSM. As expected, selection gains decreased as the number of selected genotypes increased ( Table 3).
The coincidence indices between the selected genotypes for CSM, MTM, and RRM and each pair of environments are shown in the Fig 2. Our results show a decrease in coincidence index values with an increase in selection intensity, and we can identify which environment had lowest coincidence values compared to other environments and across models. This information can be used to evaluate the most divergent Table 1

Model selection
Although both MTM and RRM enable the assessment of genetic and residual (co)variance structures [6], RRM is considered a reduced and simplified MTM, in that it provides estimates of the same genetic parameters of interest (heritability, genetic correlation), but with less parameterization and computational effort [31]. Effective modeling of genetic and residual Table 2.

PLOS ONE
effects enables breeders to mitigate the adverse effects of GxE and maximize selective accuracy [32]. Resende et al. [6] highlight the importance of testing different residual variance structures, since these structures can directly affect the estimation of genetic parameters and the prediction of genetic values.
The selected models (CS.Rhe, MT.Rhe, and RR.2.Rhe) fit heterogeneous residuals (i.e., one residual variance for each environment). Heterogeneous residuals in MET analyses were also reported by Melo et al. [32] in analyzing progeny of the common bean, and Alves et al. [12] in   analyzing eucalyptus clones. In this study, CSM, MTM, and RRM provided estimates for 6, 14, and 7 parameters, respectively. Thus, the parsimony (i.e., fewer parameters to be estimated) of CSM offers a significant advantage over MTM and RRM. Despite the efficacy of the Average Information REML [33], there are some issues regarding convergence for MTM (an unstructured model), particularly when the number of environments is high [6]. In the present study, RRM with the highest order (order four) was equivalent to MTM. Therefore, RRM (RR.2.Rhe) outperformed MTM (MT.Rhe) in terms of parsimony [7]. In addition, RRM enabled us to capture changes in GY continuously over the environmental gradient.

Variance components and genetic parameters
The estimates of variance components and genetic parameters using MTM and RRM are more realistic because these models consider the heterogeneity of genetic variances among environments [6]. Based on the scale proposed by Resende and Alves [15], GY showed heritabilities of moderate magnitude (0.15<h 2 <0.50) in all environments, independently of the model used.
Selective accuracy is one of the most relevant parameters in evaluating the effectiveness of inferences of predicted genetic values [12]. This parameter is defined as the correlation between the predicted and true genotypic values and offers evidence of experimental quality, since it considers residual and genetic variances [14]. Based on Resende and Duarte [14], the mean selective accuracy magnitudes ranged from high (0:70 < rĝ g < 0:90) to very high (rĝ g � 0:90), with RRM providing very high selective accuracies in three (E1, E2, and E3) of the four evaluated environments.
The genotypic correlation across environments (0.72) estimated by CSM, underscores the need for more robust models for genetic selection [34], as CSM assumes a constant genetic variance and a correlation across environments equal to one. However, these assumptions are not prerequisites in MTM and RRM [9]; as such, the latter two models are preferred [34].

Reaction norms
Based on CSM, GxE variance was estimated directly, and its significance was verified by LRT. For MTM and RRM, GxE variance cannot be estimated directly, since these effects are confounded with the genetic effects (ŝ 2 g i ¼ŝ 2 g i þŝ 2 ge i ). In the random regression context, the significance of GxE effects can be determined through reaction norms; the interaction is significant when the reaction norms intersect, diverge, or converge [30]. Herein, the significance of the GxE effects is very clear because the ranking of genotypes was different across the environments, independently of the model used.
Phenotypic plasticity is essential for genotype performance in changing environments [10]. The reaction norms in the present study show that the evaluated hybrids present various forms of phenotypic plasticity. In this context, phenotypic plasticity can be considered favorable or unfavorable changes for genotype adaptedness [30].

Genetic selection
The difference among the ranking of genotypes observed in all models is due to the GxE effects, which have an impact on genotype performance in different environments [13]. As mentioned by van Eeuwijk et al. [30], when genes are expressed differently in different environments, GxE occurs.
Regarding genetic selection, in general MTM and RRM indicate slightly higher genetic gains than the results obtained with CSM. This result can be attributed to the best statistical properties of MTM and RRM [6]. Environments E1 and E2 stood out in terms of genetic gains, when considering the same selection intensity, independently of the model used. These differences among selection gains can be explained by the estimates of broad-sense heritability, which were higher in environments E1 and E2, except for CSM in environment E4, which presented the highest estimate of broad-sense heritability. Furthermore, E1 and E2 presented the lowest coefficients of experimental variation.
The coincidence index demonstrates some characteristics of the environmental correlations. Firstly, we can see a decrease in the coincidence index with an increase in the number of selected genotypes. Secondly, the less coincident an environment i is with the others, the greater the variability of an environment i across the three models, and, the higher is the CV ei of the environment i, the poorer experimental quality this environment i presents.
Furthermore, CSM, MTM, and RRM can be fitted through Bayesian inference or through Hierarchical Generalized BLUP (HG-BLUP) [35], both in the phenotypic and in the genomic context. Thus, relevant studies can be carried out using these approaches.

Conclusion
The RRM presented the best fit and, consequently, provided more accurate estimates of genetic parameters and predicted genetic values. Furthermore, this model enabled to generate realistic reaction norms. Thus, the results suggest that among the three classes of statistical models, RRM is a preferential approach for analyzing MET in maize breeding.  Table. Genotype ranking in each environment (E1, E2, E3, and E4), based on the compound symmetry (CSM), multiple-trait (MTM), and random regression (RRM) models. (DOCX)