Bayesian factor analytic model: An approach in multiple environment trials

One of the main challenges in plant breeding programs is the efficient quantification of the genotype-by-environment interaction (GEI). The presence of significant GEI may create difficulties for breeders in the selection and recommendation of superior genotypes for a wide environmental network. Among the diverse statistical procedures developed for this purpose, we highlight those based on mixed models and factor analysis that are called factor analytic (FA) models. However, some inferential issues are related to the factor analytic model, such as Heywood cases that make the model non-identifiable. Moreover, the representation of the loads and factors in the conventional biplot does not involve any measurement of uncertainty. In this work, we propose dealing with the FA model using the Bayesian framework with direct sampling of factor loadings via spectral decomposition; this guarantees identifiability in the estimation process and eliminates the need for the rotationality of factor loadings or imposition of any ad hoc constraints. We used simulated and real data to illustrate the method’s application in multi-environment trials (MET) and to compare it with traditional FA mixed models on controlled unbalancing. In general, the Bayesian FA model was robust under different simulated unbalanced levels, presenting the superior predictive ability of missing data when compared to competing models, such as those based on FA mixed models. In addition, for some scenarios, the classical FA mixed model failed in estimating the full FA model, illustrating the parametric problems of convergence in these models. Our results suggest that Bayesian factorial models might be successfully used in plant breeding for MET analysis.


Introduction
The recommendation of new genotypes for commercial use requires confident and accurate estimations of genetic parameters such as marginal genotypic values, stability, adaptability, disease and environmental stress resistance. This information can be obtained by analyzing the PLOS  genotypes across different environments; such an analysis is called multi-environment trials (MET). These trials are required to isolate the effect of the genotype-by-environment interaction (GEI), which means the differential genotypic responses on different environments. In general, the GEI hinders the breeder's work on the selection and recommendation of the best genotypes for a wide class of environments. Thus, it is used to investigate efficient methods that identify stable genotypes (those that do not contribute to GEI) and positive effects of GEI for specific groups of environments aimed at regionalized recommendations. Several statistical methods have been proposed. One is the additive main effect and multiplicative interaction (AMMI), and another is the genotype plus genotype x environment interaction biplot (GGEbiplot) [1][2][3]. These methods have been widely applied in plant breeding programs for the identification of mega-environments, which-won-where patterns, ideal genotypes, and specific adaptability, among others. Limitations inherent to these fixed-based parameters models (such as the lack of flexibility to treat unbalanced data and heterogeneity of variances) have motivated the development of more flexible methods, such as those based on mixed models. Piepho [4,5] and Smith et al. [6] proposed the MET analysis based on multivariate mixed models; it uses factor analysis (AF) structures that consider environments/genotypes and interaction as random effects.
In the literature, these models have been frequently referred to as factor analytic (FA) models [6][7][8][9][10]; they have shown great versatility for genotype selection since they combine the stability and adaptability studies into a single approach. The advantage of this approach is related to its ability to address missing data and the heterogeneity of residual and genotypic (co)variances. These models are also notable for allowing the inclusion of heteroscedastic residues of genetic values, which constitute an important aspect to be considered in the analyses [11,12]. It is known that the heterogeneity of variances among genotypes is affected by the heterogeneity of variances among environments and vice versa [13]. Furthermore, this model has been useful for summarizing the covariance pattern in multivariate data [14].
Despite the recognized advantages offered by FA models, the method also has limitations, such as the need to impose some constraints and, under some scenarios, the non-identifiability in parameter estimation. It is worth highlighting the difficulty to construct exact confidence intervals for the components of variance, since they are approximate and require assumptions of asymptotic normality. In addition, there is a great demand for computational resources and efficient algorithms to avoid the occurrence of solutions outside the parametric space (the socalled Heywood cases), among other aspects [15,16].
An interesting alternative to the frequentist or likelihood approaches is the use of Bayesian inference. Bayesian analysis allows greater flexibility for the construction of credible intervals for unknown parameters, since all inference processes are based on the posterior distribution. The flexibility of the Bayesian method for GEI analysis was partly illustrated by Crossa et al. [17], Oliveira et al. [18], Perez-Elizalde et al. [19] and Silva et al. [20], which incorporated credibility regions for the bilinear parameters in the AMMI model. It is known that inferences about genotypic and environmental scores in linear-bilinear models present great difficulties for frequentist methods or non-parametric approaches [17,18,21,22].
Using Bayesian inference in AMMI models, Perez-Elizald et al. [19] have shown how historical information can easily be incorporated into the model. Recently, Jarquin et al. [23] showed how this information could be included in the sites regression (SREG) model using the multilevel (hierarchical) Bayesian approach. Methodological innovations and applications of factorial analysis have been rapidly designed in recent years, partly due to access to computational tools for numeric integration in the Bayesian framework. In particular, it is possible to highlight the use of the Markov chain Monte Carlo (MCMC) in classical factor analysis [24].
The Bayesian analysis for FA model was presented by de Los Campos and Gianola [25]. These authors proposed prior distributions based on the assumptions of the classical factors analysis that avoids the imposition of constraints and reduces the computational requirements that have restricted the use of these models. However, this approach it not founded on the initial FA structure described in Smith et al. [6]. Instead, de Los Campos's and Gianola's [25] model accounts for the decomposition of the genetic variance, ignoring a more general structure for residual variances and performing a two-stage adjustment for the parameter of the model. In addition, only the balanced scenario was considered, since it does not consider situations where missing data are present, which is the great appeal of the FA models for MET data analysis.
This study proposes the Bayesian approach for the FA model applied in MET data using spectral decomposition to ensure the model identifiability. Additionally, we sought to evaluate the Bayesian FA predictive ability on unbalanced data with respect to the classic FA models through REML.

Material
Simulated data. We simulated a dataset with 20 genotypes (G1-G20) that were evaluated in five environments (E1-E5) using a randomized complete block design with two replicates. Five genotypes had interactions simulated from the Gaussian distribution with large variances (unstable genotypes) and positive marginal effects. Five more genotypes were marginally negative and contained large Gaussian variances (unstable). 10 genotypes were a standard Gaussian distribution, and the unstable genotypes had variance of 2 (three times the standard Gaussian distribution. Therefore, the stability and instability in this study were considered as a function of the size of the variability across the environments and the genotypes' marginal effects as Gaussian realizations. Experimental data. The experimental data were described by Melo et al. [31]. These data are related to grain yield traits measured in 50 maize single-cross hybrids (G1-G50). The experiment was conducted using incomplete block designs with 2 replicates, and each plot had 5-m rows with 70-cm spacing between rows. The grain yield was evaluated and adjusted for 13% moisture and converted into t�ha -1 . Come from.
These hybrids are originated from crossing among lineages of different backgrounds (tropical-Flint, Lancaster and Stiff Stalk Synthetic sources-see Melo et al. [31]). The crossing design was in a partial diallel system. These hybrids were evaluated during the agricultural years of 2013 and 2014 in 10 environments (E1-E10) representative of the Southeast and Southern regions of Brazil. Details on the environmental characteristics are given in the S1 Table. Method Statistical model. The classical multivariate linear mixed model under the unstructured covariance matrix can be described as: where y (n×1) is the vector of observations for p environments, q blocks and m genotypes.
Vectors β (pq×1) , u (mp×1) and ε (n×1) are fixed, random and residual vector effects, respectively. The matrices X (n×pq) and Z (n×mp) denote information regarding β and u, respectively. For simplicity, it is assumed that u represents the additive genetic effects. Moreover, ε~N(0,R) and u~N(0,S�I m ). In the FA framework, we can approximate the vector related to additive genetic effects by common and specific factors using u = (Γ p×k �I m )f+δ, where the covariance matrix S is represented by a factor analytic structure (S = ΓΓ > +Ψ). Therefore, model (1) can be rewritten as: This has been referred to in the literature as the mixed model factor-analytic structured model or simply the factor analytic (FA) model. Γ (p×k) ,f (mk×1) and δ (mp×1) are the loading matrix (k = 1, . . .,p), the factor vector related to additive genetic effects and the vector of specific effects, respectively. Furthermore, ε is the vector of residuals, X is the fixed-effect matrix referring to β, and Z is the genetic matrix referring to f and δ, where k represents the number of multiplicative terms. The Z (n×mp) is a block diagonal matrix matching the vector y (n×1) ; this is equivalent to assume that Z = (Z 1 ,� � �,Z p ) with Z k the matrix matching i th genotype at factor k with dimension (n×m), for k = 1,. . .,p, and Given that ΓΓ > is a symmetric matrix, we can rewrite the model (2) observing that the matrix of factor loadings can be obtained by Γ ¼ VΛ 1 2 . In this expression, V represents the matrix of singular vectors and Λ a diagonal matrix formed by the eigenvalues obtained by the spectral decomposition of the loading matrix. The spectral decomposition of the unstructured covariance matrix can be approximated by where λ k is the i-th singular value and α k the i-th singular vector (or eigenvector) of the spectral decomposition, respectively. A model with k = p (p as the matrix rank) multiplicative terms is called full rank, and the specific effects are assumed as null. However, the advantage of using factor analysis occurs when k is significantly smaller than p. If so, the number of parameters in the factor analysis k(p +1) becomes much smaller than those p(p+1)/2 parameters of S.
Using principal components to describe the factor loadings and considering the spectral decomposition properties and appropriate linear transformations, we can rewrite the Eq (2) that involves the loadings as follows: l k α k . Further details on these properties and the theorem's demonstration can be found in Songgui and Suju [32] and in (S1 Text). Replacing the Kronecker product into (2) with the sum in (4), the FA model can be expressed by The terms λ k and α k , as already specified in (3), are the k-th singular values and eigenvectors of the spectral decomposition of S, the matrices X 1 , X 2 , Z and Z f are design matrices destined to distribute the FA effects for experimental unity (see S1 Text). By the summation, we can efficiently distribute the effects by avoiding using the Kronecker product or other artifices in the conditional distributions. This facilitates algebraic manipulations and computational implementations. Thus, the model presented in (5) is more treatable from a Bayesian point of view with respect to its equivalent (2). The conditional likelihood has a multivariate normal density as follows More details about the likelihood in factorial models can be obtained in supplemental material (S2 Text): Prior distributions. In this study, the prior distributions for the FA parameters were established based on the assumptions of the factor analysis model through the maximum restricted likelihood (REML) [6]. In this sense, equivalent Jeffrey's prior (Gaussian with large variance) was used for the fixed effects and Bayesian AMMI priors for the loading parameters approximated by the eigenvalues and eigenvectors. The prior distributions for each parameter were given by where: I m -is an identity matrix. δjμ δ ; s 2 δ � Nð0; ΨÞ, where: Ψ = diag(ψ 11 ,. . .,ψ kk ), k = 1,. . .,p is a diagonal matrix in which the elements ψ ii are the specific variances from each environment.
α k~U niform spherical in the corrected subspace. The uniform spherical distribution is a special case of von Mises-Fisher distribution [15].
Here, to simplify the notation, it will be assumed that R is a diagonal block matrix composed of s 2 e k , with k = 1, . . ., p. For this parameter, we assume an inverse scaled chi-squared prior distribution denoted by Since Ψ is also a diagonal matrix composed of diagonal elements ψ kk with k = 1 . . ., p, we also assumed the inverse scaled chi-squared prior distribution for ψ kk as follows: All previous prior distributions satisfy the model constraints and are conditionally conjugated.
The Jeffrey's prior for the fixed effects is |I (β) | 1/2 , where I (β) is the expected Fisher information about β, used for environmental effects is proportional to a constant given that the information about β in the likelihood does not depends on β. Some equivalence is obtained in the posterior distribution by assuming a prior normal distribution with large variance, i.e 10 12 .
For the singular values λ k , the truncated normal distributions were used in order to ensure the model identifiability for positive and sorted singular values, i.e., λ k >0 and λ 1 >λ 2 >. . .>λ k . For the singular vectors α k , the orthonormality restriction request that the coordinate be distributed on the hypersphere space; here an uniform hypersphere was assumed, in which is equivalent to a von Mises-Fisher with concentration parameter equal to zero [15]. For the factor model parameters (f, δ), the priors were assumed according to the classical factor model assumptions about this effects [6]. The prior hyperparameters for the variance components were chosen in order to impose large entropy and little weight of prior on the posterior distribution.
The data likelihood given above may be simplified as where ϕ = (β,λ,δ,α,f,R) and Full conditional posterior distributions for the FA parameters. Applying Bayes' theorem on the likelihood and priors for φ= (β,λ,δ,α,f,R,Ψ), the joint posterior distribution is given by The full conditional posterior distributions for the Bayesian factorial analytic (BFA) parameters were derived from (7) as follows: i) Complete conditional a posteriori distribution for β where: ii) Complete conditional a posteriori distribution for λ k The full conditional posterior distribution for the singular value is denoted by: iii) Complete conditional a posteriori distribution for α k Although the prior α k presented spherical isotropy, it is not possible to obtain a conjugate von Mises-Fisher distribution similar to those obtained in Viele and Srivasan [33], Crossa et al. [17], or Oliveira et al. [18] for the AMMI model or in Crossa et al. [23] and Oliveira et al. [34] for the GGE model. This fact is because the previous-mentioned studies assumed the homogeneity of variances, which is different from the FA proposed here in which different variances were assumed for the environmental network. Instead, the conditional posterior will be a multivariate normal, which can violate the spectral decomposition constraints such as orthonormal eigenvectors.
To overcome these difficulties, the sampling will be performed in the corrected space free of the orthogonality constraint by using auxiliary variables (which will be defined in the further section); the vectors will be returned in the correct subspace in < m through orthogonal linear transformations. iv) Conditional a posteriori distribution for factor scores f The complete posterior conditional distribution for f is given by where vi) Complete conditional a posteriori distribution of Ψ Since Ψ is diagonal matrix with the independent elements Ψ = diag(ψ 11 ,. . .,ψ kk ) with k = 1,. . .,p, it was considered a scaled inverse chi-squared distribution for each diagonal element. The complete conditional distribution posterior for each ψ kk is where m k is the number of genotypes in the environment and ν k = 1. vii) Complete conditional a posteriori distribution for s 2 e k Similar to Ψ, the R is also a diagonal matrix and the independent priors also have scaled inverse chi-squared distributions. The conditional posterior distribution obtained for each s 2 e k is given by where n k is the number of observations in each environment. Sampling parameters requesting orthonormal basis. As all conditional distributions were obtained in a closed form, the distributions have known shapes that allow for direct sampling using the Gibbs sampler. However, as was previously highlighted, the conditional distribution for the vector α k is a multivariate Gaussian distribution instead of a von Mises-Fisher. Nevertheless, this distribution and its parametric space do not ensure the model's constraints since the vectors must have unitary norms and be orthogonal to each other.
However, the sampling can still be performed from the normal multivariate in the corrected subspace by adding two further steps: normalize the α k vector and return it into the orthogonal basis using the correct subspace through a linear transformation. Viele and Srinivasan [33] have shown how to perform the sampling of the spherical uniform distribution using the standardized Gaussian distribution and how the vectors can be placed in the correct subspace using the Gram-Schmidt orthonormalization process.
Assuming that α k is an unit vector of m-s dimension in space < m , ðα > k α k ¼ 1Þ is orthogonal to vector S (v 1 , v 2 , . . ., v s ) in < m . Further suppose that the vectors α 1 ,α 2 ,. . .,α s are a set of orthonormal vectors in the subspace generated by v 1 , v 2 , . . ., v s . In these terms, given the matrix H s = α 1 ,α 2 ,. . ., α s , there is a matrix H k with the dimensions m×(m−s) so that H m = (H s ,H k ) is an orthonormal matrix. This matrix can be obtained by the Gram-Schmidt orthonormalization process.
Thereby, we obtained the vectorα k by the linear transformationα k ¼ H > k α k so that α k 2 < mÀ s . In other words, the sampling will be performed in the "corrected" subspace without the constraints imposed by the spectral decomposition. We can easily show that α k ¼ H kαk . Therefore, we retrieve the vector in the correct subspace in < m and orthogonal to s vectors by applying the inverse operation, satisfying the model's constraints.
The posterior distribution for the sampling process can be obtained from the kernel of the full conditional posterior distribution of α k as follows: By solving within the brackets, we have Using the identity H k H > k ¼ I and dividing each part of the kernel, we get Thus, the conditional posterior for the auxiliary variableα k given the other parameters in the corrected subspace is given bỹ Therefore, the sampling ofα k is performed in the corrected subspace through the previously obtained conditional. As presented in Crossa et al. [17] and Oliveira et al. [18], we seek the sample vectors that have the norm 1. Thus, the orthogonal vectors must be normalized as α � k ¼α k ffi ffi ffi ffi ffi ffi ffi ffĩ , through algebraic manipulations, the conditional distribution for the orthonormalα k is given bỹ To place the singular vector in the correct subspace < m satisfying the orthonormal constraints, we apply the inverse transformation α k ¼ H kαk . The sampled vector is now orthogonal to the other s vectors and its transformation preserves the vector norm since Thus, the random m-s dimensional vector in < m isone-to-one transformation into the same random vector in < mÀ s .
From the previously complete conditional distributions, the parameter sampling was performed by the Markov chain Monte Carlo (MCMC) using the Gibbs sampler. The implemented iterative sampling algorithm is illustrated in S3 Text.
After concluded the iterative process and checked the chains' convergence, the samples were considered to have resulted from the marginal densities. The convergence diagnostic was performed using the Raftery and Lewis [35] Heidelberger and Welch [36] criterion. All inference processes were performed using the R statistical software [37].
Inference about the linear and multiplicative parameters of the model. The sampled eigenvectors present orthonormal basis across MCMC process, however, their maximum posterior estimator (MAP) may not be orthogonal. The MAP estimator for the multiplicative terms were constructed according to the method proposed by Chen and Shao [38] implemented in the Bayesian output analysis (BOA) package using the R statistical software (R CORE TEAM, 2016).
Bivariate regions of credibility for factor loadings and scores. The biplot credibility regions for factor loadings (λ 1 α 1 ,λ 2 α 2 ) and factor scores (f 1 ,f 2 ) were constructed using the Euclidean distances of the sampling points with respect to the distribution center using 5% as the cut off [39].
The FA biplot interpretation was performed similarly to the GGE-biplot, as suggested by Burgueño et al. [7].
Model validation in the prediction of missing data. The cross-validation process was performed considering different levels of missing data in the GEI matrix. The sample was randomly divided into k-fold of equal size, with k = (10, 3, 2) corresponding respectively to the 10%, 33% and 50% levels of random genotype losses in the environments without replacement. Thus, the simulated missing was performing on GEI table cells where some genotypes (lines) information were totally withdraw from specific environments (columns), but keeping all environments in the dataset. In addition, the cross-validation was performed to allow that all genotypes were evaluated in at least one environment. Therefore, the GEI cells were randomly sampled, but observing the restrictions given above.
The BFA's predictive ability was compared to the two-step FA models using the EM algorithm (FA-EM-expectation maximization) (Nuvunga et al. [40] and the FA via AI algorithm (Average information) (FA-AI) (Smith, Cullis and Thompson [6]) in sparse matrices implemented in Asreml-R [41].
The evaluation of the model's predictive ability was performed using the average PRESS (predicted residual error sum of squares) and phenotypic correlation between the predicted ðŷ ij Þ and observed (y ij ) values. The PRESS expression is given by Model selection or choice of the number of factors k. The number of latent factors to be retained in the model was selected using the PRESS criterion, which uses a cross-validation approach [42].The statistical efficiency criterion is given by SE = (PRES full )/(PRESS k ), which is the ratio between the PRESS of the full model and the PRESS of low-order models.
The model selection for real data was done using the Akaike Information Criterion Monte Carlo (AICM). ΔAICM corresponds to the difference between the full model and the competing models, as suggested by Raftery et al. [43]. The AICM is calculated as a version based on a posteriori simulation of the AIC [44].
where � l is the mean of the marginal log-likelihood and s 2 l is the posterior variance of the marginal log-likelihood.
Thus, the AICM can be seen as the simplified version of the penalized posterior mean of the log-likelihood. The selected model is the one with the highest AICM and lowest ΔAICM.

Simulated data
For this scenario, MCMC chains with 65,560 iterations were simulated for the BFA model. As already pointed out, the convergence of the generated chains was monitored by the criteria Raftery and Lewis and Heidelberger and Welch. The first 8,400 observations were burned-in and a thinning for each four observations was performed to ensure the convergence process. The values of burning and thinning were based on a training sample according to the test of Raftery and Lewis. It was also observed that all parameters had a dependence factor I <5. The final chain length was 14,290 samples for each parameter.
In addition, all parameters passed the stationarity test, indicating that convergence was achieved according to the criterion of Heidelberger and Welch and Geweke. That is, the tests indicated good convergence properties for all model parameters. In S4 Text, the traces for the residual variance chains are shown and its pattern corroborates the convergence test results. Another interesting detail, in the Bayesian context, is the computational time of analysis, which for the simulated data was 22.15 minutes.
The simulated parametric values for the residual variance and FA loadings τ k (recovered by λ k α k ) and the MAP estimates are presented in Table 1. We notice that the estimated values and parametric values do not present large differences, which support the BFA's ability in estimating the loadings of FA models. All values used in the simulations are within the 95% credibility intervals.
Specific estimates and credibility confidence regions for the coordinates related to the first two factor scores can be seen in (S2 Table). The posterior estimates presented values very close to those from FA mixed models obtained by the restricted maximum likelihood (REML) estimates using the average information (FA-AI) algorithm. However, the second axes for both methods present large differences for G1, G2 and G5, but they are within the credibility intervals.
The bivariate credibility regions (highest posterior density-HPD) for the factor loadings (λ 1 α 1 ,λ 2 α 2 ) and genotypic scores (f 1 ,f 2 ) that did not included the biplot origin (0,0) are shown in Fig 1. From these biplots, we see the clustering groups with respect to grain yields and/or GEI patterns. In FA models where G is confounded with GE, the first loading tends to present a positive signal (Fig 1B). According to Burgueño et al. [7], Smith, Cullis and Thompson [6] and Stefanova and Buirchell [29], in these situations, the interpretation for the FA biplot is similar to the GGE-biplot. However, it is necessary to be careful when interpreting the intersections among the credibility regions, given that loadings and scores do not have inner-product properties. If so, the biplots are better justified with separate presentations for loading and factor scores to avoid confusion with the AMMI or GGE models.
However, just to illustrate the usefulness of the BFA model, we plotted both factor loadings and scores in a single biplot (Fig 1A). In this Figure, it is possible to see two distinct groups of genotypes with respect to yields, but they are similar with respect to the GEI pattern since all regions cross the first axis. Fig 1B shows the HPD's credible regions for factor loadings at 95% confidence. This subgroup of environments has similar effects with respect to the GEI pattern, as indicated by the overlaps between the credibility regions and ellipses that did not encompassed the biplot origin.
Given that the first axis captures much of the main effect of genotypes and the second axis captures the complex part related to the GEI, it was observed that the scale and ranking of the factor scores were equivalent to the marginal E-BLUPS obtained from the mixed model analysis since the regression adjustment between the two estimates was approximately 1 (r 2 = 0.979).
Performance evaluation of the unbalanced model. In addition to evaluating the BFA model in terms of parameter estimations, the Bayesian FA model's predictive ability was evaluated and compared to the two step FA model [40], and FA-AI [41]. Although it is difficult to adjust a full FA model in the mixed models framework due to computational costs and convergence problems, this problem was not observed for the simulated dataset, given that the set of environments is relatively lower compared to the data set commonly utilized in MET. The cross-validation results showed that it is possible to predict the performance of hybrids using FA models with high accuracy, reaching up to 0.82 in some folds, as explained in S1-S6 Figs.
Regardless of the unbalanced level applied to the hybrid panel, the magnitude of the correlation values was higher than 0.30 (Fig 2) and the Bayesian model showed the highest prediction ability for all scenarios.
The PRESS in the10-fold scenario (as expected) was lowest when compared to the other folds for the three models (Fig 3). At all considered unbalanced levels, the BAF model had the lowest PRESS. At 10%, the FA-AI and FA-EM models had the same PRESS and alternated precision at the 33% and 50% levels. In other words, the scenarios proposed to PRESS using these two FA approaches were inconclusive.
FA model selection based on cross validation. The numbers of latent factors to be retained in the model were selected using a 10-fold cross-validation. Therefore, the FA5, FA4, FA3, FA2 and FA1 models were adjusted and the model selection was based on the PRESS criterion and statistical efficiency (SE) (defined as the ratio between the full FA model versus the low-dimension ones). Fig 4B and 4A present the PRESS and the correlation between the simulated phenotypic value and the predicted one using each model. Observing Fig 4B, one can verify that the FA4 model (with k = 4) presented the highest predictive accuracy compared to the others candidate Credibility regions at 95% probability for genotypic factor scores ( Fig 1A) and for the factor loadings of environments (Fig 1B) using the first two components. Only the genotypes scores that did not include the biplot origin were represented.
https://doi.org/10.1371/journal.pone.0220290.g001 models, indicating that this model is the best one. Additionally, using the PRESS criterion, the best model again was FA4 (Fig 4A). Fig 4C shows the graphic of the statistical efficiency of the competing models. In this graphic, it is notable that the efficiency increases from FA1 to FA4 where we find the maximum efficiency. This demonstrates that the use of k = 4 would be the best choice for these data representation.
The FA2 model is generally considered parsimonious and interpretable for plant breeders, since the first axis can be seen as related to adaptability and the second related to stability, such as in the SREG2/GGE model [7,29]. However, it was verified that the FA4 model showed better results in the three used criteria.

Experimental data
For the real data scenario, 85,000 Markov chains were simulated and (similar to the previous analysis), the first 9,800 observations were burned-in and thinned for every eight samples. A final MCMC chain with 9,400 observations was obtained for each parameter. The chain convergence was verified by Raftery and Lewis [35], Heidelberger and Welch [36] and and Geweke criterion [45]. The trace pattern related to the posterior distributions also indicated good convergence. The computational time spent in the MCMC sampling process was 13,938.34 minutes.
The point and interval estimates for the components of variance s 2 k are presented in Table 2. It is possible to notice that the environments E8 and E2 presented the highest and lowest residual variances, respectively. Fig 5A shows the HPD credibility regions at the 95% probability for the genotypic factor scores. For simplicity, only genotypes that did not include the biplot origin (0,0) were represented. A correlation between the first factor scores and the marginal genotypic BLUPs was r 2 = 0.897 in the Bayesian FA model and r 2 = 0.929 for the FA mixed model, which would also justify the GGE biplot interpretation.
Moreover, for the real data scenario, all factor loadings related to environments were positive and with high overlapping for the credibility intervals (Fig 5B). In this same Figure, it is observed that environments with low residual variances ( Table 2) show more concentrated credible regions with respect to those with higher residual variances. Thus, the elliptical range depends on each specific experimental variance. Estimates for factor loadings and factor scores can be found in S3 and S4 Tables.  In these same Figures, it was verified that the estimates of the two models were coincident for factor scores and factor loadings. Similarities can be observed when comparing the  distribution of environmental scores in standard biplot analyses (Fig 6) compared to those obtained from the Bayesian model. It is possible to verify the change in the E8 position from the FA-mixed model (fourth quadrant) to the Bayesian FA (second quadrant) that may result in small differences in the mega-environments formation and the specific adaptability. Model selection. In the Bayesian factors analysis, one of the most important issues to be addressed is the choice of the appropriate number of factors to be retained in the model.
In Table 3, we present the log-likelihood AICM values of the ten competing models for the real data set. It was verified that the AICM criterion was unable to select the best FA model since the AICM were practically equal.
In this same table, it is possible to verify the ΔAICM for each model. Similarly, there is not a fair criterion to select the number of factor loadings using this information. The results presented in Table 3 indicate the difficulty of selecting the best model since the criterion differentiation occurs only in the third decimal place, making it necessary to add an additional measure for model selection.
In FA models, the genetic covariance is estimated by [S = (ΓΓ T ) k−1 +Ψ]. To ensure the FA model's identifiability, the loading matrix must be conditioned to the following equality (ΓΓ T = S−Ψ).
In Fig 8, we can verify that when the full model-FA k is adjusted, the proportion of marginal genetic variance given by the geometric mean ½s  Credibility regions at the 95% probability for genotypic factor scores (a) and for factor loadings of environments (b) using the Bayesian approach for real data. Only the intervals that did not include the biplot origin were represented.
https://doi.org/10.1371/journal.pone.0220290.g005 full model, one can verify the decreasing residual variance and the fast increase of the specific variance that merges the genetic and residual (noise) variances.
It can be verified that starting from FA (k-1) model to model FA1, the variance s 2 e remains constant, but the harmonic mean related to the specific variance ½c ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi diagjΨj 10 p � increases as the FA model becomes more parsimonious. In addition, (as expected) the genotypic variance recovered by the loadings ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi diagðΓΓ T Þ 10 p decreases. It is possible to note that the use of complex models (high k-dimensional FA) has no advantage or gain in the calculation of s 2 g or s 2 e . Thus, under the identifiability imposed in this study, the performance of the model selection tests using the likelihood can be complicated, since the specific variances (or noise) are estimated separately from the experimental error and the missing variance unrecovered by the loading in low-dimension FA structure recovered by the specific variances given in Ψ.

Discussion
The development of models able to describe the response of genotypes in environmental networks has become a great challenge to quantitative breeders, mainly in the genome selection context [46] The use of factorial analytical structures in MET analyses has contributed greatly to the analysis and meta-analysis of phenotypic data in trial networks, presenting unbalanced data and different experimental accuracies. The main difference between the BFA method and the classical mixed model FA lies in the BFA model assumptions that are founded on factor analysis via spectral decomposition of the genetic covariance matrix. This approach allows one to incorporate inference in the biplot of the loadings and rotationally, which is not directly performed in classical FA analysis.  Table 3. AICM values and ΔAICM (difference between the AICM of the complete model and the others) for the selection of FAk models, (k = 1, . . ., 10) and the ranking of models for real data. This particularity ensures the model's identifiability and avoids the occurrence of estimates outside of the parametric space (Heywood cases), as observed in Smith, Cullis and Thompson [6]. In addition, it eliminates the need for loadings' rotationality, providing robust estimates for covariance parameters through loadings and factor scores. Thus, the BFA method guarantees the ability to test all latent factors that cannot be ensured by classical FA model. For example, it was observed that the FA analysis conducted in Asreml-R on real data did not converge on more complex FA models (FA�3) and a fair scanning for the best model was not possible using the real data. The selection of the number of k factors is still a non-trivial issue and inadequate choices can result in biased and unstable estimates of S and R [47,48].

Model
In this study, we assumed independence among environments assigning the scaled-inverse chi-squared distributions for each diagonal element of the residual variance. However, if this assumption is relaxed, we can use an inverse Wishart prior for the residual covariance matrix (as in de Los Campos and Gianola [25]) or the sparse prior matrix for the loading matrices (as proposed by Runcie and Mukherjee [48]).
It is important to highlight that some problems related to improper posterior may emerge in hierarchical models when non informative priors are used. This issue was discussed in Hobert and Casella [49] and Gelman [50]. While it is not ease checking all marginal posterior in complex models some tips can be observed during MCMC sampling such as non convergence or bimodal posterior presenting high mass on zero value.
In this study, we did not observed such problems as those verified in our previous work using Bayesian hierarchical AMMI model [20]. On the scenario presented by these authors, some ad hoc procedures were applied to obtain proper posteriors in family of inverted Gamma-Gaussian hierarchical models when improper Jeffrey's prior replaces the inverted Gamma distribution. Silva et al. [20] proposed the correction of degree of freedom based on Ter Braak [51] approach and suggested such correction when there is evidence of improper posterior.
The rotationally issue is not discussed in the study of de Los Campos and Gianola [25]. Runcie and Mukherjee [48] argue that the rotationality may be guaranteed by imposing constraints on the loading matrixin the prior specification (hierarchical modeling). As it is known, the rotationality of the loading matrix does not influence the estimation of model parameters. However, it may blur some biological interpretations [6] and hamper the MCMC's convergence [52].
Moreover, the parametric advantages of the BFA model in its efficiency in representing the FA pattern can be seen in Figs 6 and 7, where the BFA is compared with the FA-AI. The same FA pattern was expected, given the BFA'spriors based on the FA model's assumptions reported in Smith et al. [6]. Furthermore, in the study involving the simulated data, the estimates obtained for residual variances were close to the true values and the recovery of the first two factor loadings by the eigenvalues and eigenvectors (Table 1).
Other proposals for multi-environment data analysis can be found in the literature; some highlight the use of Bayesian AMMI or SREG models [17][18][19]23,34]. One limitation of these approaches with respect to BAF is to assume homogeneity of variances and other model assumptions based on ANOVA.
When the homogeneity of variances is assumed, the sampling of the eigenvectors is performed in hyperspherical space through the von Mises-Fisher distributions, as proposed by Liu [52] for the AMMI models. Here, given that different residual variances were assumed across the environments, it was not possible to approximate a von Mises-Fisher distribution as the posterior distribution; rather, the eigenvectors are sampled from a multivariate Gaussian distribution that provides hyper-ellipses instead of hyper-sphere over multivariate spaces. The eigenvectors were further placed in the correct subspace by orthogonal transformation, thus meeting the restrictions imposed by spectral decomposition.
Although the aim of our study is not to provide an intensive comparison of the predictive ability between the BAF and FA-based mixed models, the results showed that even using different likelihood approaches for FA mixed models (AI and EM), the BFA outperformed these models in most of the missing data scenarios (S1-S6 Figs). Given the intra-class correlation (or the average heritability) obtained for the simulated data, one could expect a maximum correlation between the observed and predicted values of 0.82; however, this threshold was exceeded in some folds (folds 3 and 5 in S1 Fig), showing that the model's accuracy may be higher than expected in some scenarios, especially when the group to be predicted is composed of stable genotypes.
The BFA was not superior in all k-fold scenarios. For instance, in some k-folds, we can verify a marginal loss of BFA with respect to FA-EM. This result was observed under low levels of missing data (10%). Additionally, the predictive ability of the model was similar for some folds, which were 33%, and 50% of data were missing. This result suggests that BFA might be inferior to FA-based mixed models under some data scenarios. As emphasized by Wolpert and Macready [53], the high performance of some algorithms in a class of problems is compensated by the low performance in another class (No Free Lunch Theorem).
The main drawback of the Bayesian approach is the high demand for computational studies of for the MCMC's convergence. However, this disadvantage can be offset by the greater model flexibility and guaranteed convergence in the parameter space. However, high-technology computers associated with optimized codes and parallel processing can alleviate this disadvantage of MCMCs.
The FA model proposed by Smith et al. [6] and improved by Thompson et al. [54] uses one-stage analysis. Meyer [16] describes this same model using a factor analysis approach that was implemented using a two-stage FA analysis (FA-EM) by Nuvunga et al. [40]. In this study, it was observed that the FA-EM was slightly superior to the classic FA. The relative gain of the FA-EM with respect to the FA-AI can be explained by the structure of the simulated data. According to Nuvunga et al. [40], the FA-EM is estimated in two stages. In the first stage, an unstructured covariance matrix (UN) is estimated by EM. In the second stage, the factor analysis is used to estimate the loading and FA scores using the varimax rotationally. In turn, the FA-AI estimates the covariance matrix approximating the FA loadings and scores using a single-stage analysis. Since few environments were simulated and given that the FA-EM is based on previous UN analysis (which tends to be the best choice for low-dimensional data), under this scenario, the FA based on two-stage analysis may have advantages for low-dimensional FA.
Crossa [55] note that the FA model can be interpreted in a similar way to the SREG model when the G effect is confounded with GEI or as in the AMMI model if the G effects are marginalized from GEI. Burgueño et al. [56] notes that there is no clear difference in the predictions between these two approaches. However, it should be noted that the FA models using the confounding of G+GE (as presented here) are more parsimonious than those FA models that marginalize the genotype effect from the GEI.
The bivariate credibility regions (at 95% probability) were incorporated into the FA biplots (Figs 1A, 1B, 5A and 5B), identifying homogeneous subgroups of genotypes and environments for adaptability and stability. Through the credibility regions, it is possible to identify which environments show greater variability or contribute to GEI. The interpretation of the BFA biplot is similar to that in classical factor analysis, although the uncertainty is considered in the BFA biplot. The joint decomposition of G+GE becomes the model's interpretation, similar to the GGE-biplot. However, it is noteworthy to highlight that this view must be assumed with caution, since the graphical representation must be performed separately for genotypes and environments, since their responses do not have the same scale or inner-product proprieties.
The selection of models for simulated data was performed by three complementary criteria: the PRESS criterion, the correlation between the observed and the predicted phenotypic value, and the statistical efficiency (SE). It was observed that the model chosen from the simulated data was FA (4). For the real data, we adopted a parametric criterion to select the best model, including the AICM and the ΔAICM information criteria [43]. As observed, the AICM was not informative enough to select the number of factors to be retained in the model, although it presented a tendency using the full model, which in theory recovers the UN matrix. The widely used FA(2) model did not present the best information criterion, even with it being very parsimonious, presenting graphical justifications and genetic interpretations [26,29,40]. The use of information criteria as a selection method for the number of factors has several ambiguities and is always subject to criticism since different criteria tend to select different models, as shown in Table 3 and in Smith et al. [28]. In addition, it can also be observed that the differentiation of models was small, which could be interpreted as a non-selection of models (Table 3).
In addition to information criteria, Smith et al. [28] emphasized the need to use additional measures in the model selection, such as the proportion of variance explained by the components. It was observed that as FA(k-1) model is adjusted, the residual variance decreases and the genetic variance increases, which is now explained by the loadings and specific variance (Fig 8). From the FA(k-2), (k-3), . . ., FA(1) model, it was verified that the specific and residual variances remain constant, showing that the application of information criteria as a model selection method is not easy t under the restrictions imposed in the BFA analysis.
Some other information criterion based on expectation of likelihood could be applied such us the averaged BIC, DIC, AEBIC and so on [57]. While these methods may present divergent behavior of AICM and the ΔAICM information criteria, we understand that, on the present scenario, the best Bayesian FA model selected by these information criterions could differ qualitatively from AICM (i.e. selecting the FA(1) model) but presenting low quantitative differences since the average marginal likelihood will be the same and the penalty criterion could not be large enough to efficiently separate the FA models within each information criterion. It worth to highlight that this occurrence is not a problem related to the information criterions, but a characteristic of our model since the restriction (ΓΓ T = S−Ψ) ensure equivalence among the marginal likelihoods across the FA models.
In general, the choice of the FA2 model has become consensual among researchers. It is argued that increasing the number of components (such as FA3, FA4 and FA5) does not guarantee better prediction ability, but it will certainly increase the model's complexity; therefore, it is doubtful that a better adjustment will be produced, as observed in our study with real and simulated data. The results obtained by Burgueño et al. [26] Burgueño et al. [7], Kelly et al. [9] and Nuvunga et al. [40] show that FA models with more than two components improved variance-covariance estimates, but this was not reflected in the genotypic predicted values (EBLUPs).
In this study, it was not our intention to give a fine perspective of model selection in FA bayesian framework; instead, our aim was to provide a bayesian perspective of FA models in MET analysis. While the model selection is an open issue in FA models, the cross-validation approaches used here, in general, were more informative to select FA scores than AIC criterion. Others methods based on transdimentional models such as reversible jump could be proposed in order to select FA scores [58]; but, until now, results in this area are scarce in Bayesian context.
Although both data sets used in this study may be relatively small when compared to the data available from experimental trials in breeding programs, it is sufficient to show the strength of the BFA method in MET analysis, presenting a better predictive ability than the classical FA based on mixed models. Given that in the MET analysis some environments are not present in the trial network and further years are unpredictable, some functional information may be included in the FA analysis to predict the genotypic values for coming years; for example, we can take the covariance matrix as a functional response surface related to some distance measures in the Hilbert space.
Our results demonstrate that the Bayesian FA model can be effectively implemented to study GEI patterns in MET networks and predict missing data with high levels of imbalance.