Adaptability and stability evaluation of maize hybrids using Bayesian segmented regression models

The occurrence of genotype by environment interaction (G × E), defined as the differential response of genotypes to environmental variation, is frequently reported in maize crops, making it challenging to recommend cultivars. Methods that capture potential nonlinear patterns in genotype responses to environmental variation, and that incorporate prior beliefs about unknown parameters, are useful for evaluating the phenotypic adaptability and stability of genotypes. In this context, the present study aimed to assess the adaptability and stability of maize hybrids using the Bayesian segmented regression model, and to evaluate the efficacy of informative and minimally informative prior distributions for the selection of cultivars. Randomized complete-block design experiments were carried out to study the yield (kg/ha) of 25 maize hybrids in 22 different environments in Northeastern Brazil. The Bayesian segmented regression model fitted with informative prior distributions presented narrower credibility intervals and lower Deviance Information Criterion (DIC) values than the model fitted with minimally informative distributions. Therefore, the model using informative prior distributions was adopted for the adaptability and stability evaluation of maize genotypes. Since most farmers in northeastern Brazil have limited capital, the genotype P4285HX should be considered for planting, due to its high yield performance and adaptability to unfavorable environments.


Introduction
Maize (Zea mays L.) is cultivated and appreciated worldwide, and it has tremendous relevance due to its many uses and applications, in areas ranging from animal feed to technological industries.
Furthermore, because maize is grown under different environmental conditions, it interacts with various environments, resulting in varied genotype performances [1]. Such interactions hinder genotype selection, given that the best-suited genotype for a specific environment may not be best suited for another environment where such interactions take place. Thus, recommending cultivars with broad adaptability and stability becomes costly [1].
Unlike the deterministic and frequentist methods, the Bayesian framework allows for the incorporation of additional information about the parameters through prior distributions, that is, probability distributions assigned to the parameters. According to [11], all information is useful and must be used in the Bayesian analysis. Additionally, owing to the large quantity of information available from previous studies, incorporating this information during modeling is reasonable [12].
Despite being interesting, the Bayesian approach to adaptability and stability studies is based on a simple regression model [10]. According to [13], the simple linear regression models are unable to fit a potential nonlinear pattern to genotype responses to environmental variations. Aiming to solve this deficiency, under a statistical "frequentist" framework, [3,4] proposed the segmented regression model allowing the identification of the "ideal" genotype, which presents high yield performance, high stability and low sensitivity to adverse conditions. Nascimento et al. [12] proposed the Bayesian segmented regression model approach to analyze phenotypic adaptability and stability. This approach differs from the "frequentist" framework, allowing the addition of prior beliefs to unknown parameters, bringing new insights for plant breeders. Additionally, this method allows for the exploitation of potential nonlinear patterns of genotype responses to environmental variations, aiming to identify genotypes that present high yield performance, and high stability under adverse conditions. This genotype is denoted as "ideal," according to [4].
In light of the above, the present study aimed to assess the adaptability and stability of maize hybrids by using the Bayesian segmented regression model and evaluate the efficacy of using informative and minimally informative prior distributions in the selection of cultivars.

Materials and methods
During the agricultural years 2012 and 2013, 25 maize hybrids from public and private companies were assessed in the states of Maranhão (Balsas, Brejo, Colinas, and São Raimundo das Mangabeiras counties), Piauí (Nova Santa Rosa, Teresina, and Uruçuí counties), and Sergipe (Nossa Senhora das Dores, Frei Paulo, and Umbaúba counties). The assessments comprised 11 environments, where the Nossa Senhora das Dores County had two different fertilization levels, each of which was treated as a separate environment (Table 1).
During the trials, samples considered to have high fertilization ranges were treated with a total of 180.00 kg ha⁻¹ of N, 149.80 kg ha⁻¹ of P2O5 and 85.60 kg ha⁻¹ of K2O, whereas samples considered to have low fertilization ranges were treated with 45.00 kg ha⁻¹ of N, 37.80 kg ha⁻¹ of P2O5 and 21.60 kg ha⁻¹ of K2O, in the form of 535 and 135 kg ha⁻¹ of 8-28-16 Zn at the time of sowing, respectively.
The experiments followed a randomized block design with two repetitions. Each plot comprised four 5.0 m-long rows, spaced 0.70 m between rows and 0.20 m between holes within the rows.
Fertilization was performed according to the results of the soil analysis from each experimental area. Irrigation was not carried out, and weed and pest control was performed according to the crop's requirement in each region.
The maize yield data were subjected to analysis of variance for each environment. A joint analysis was carried out by adopting the model $y_{ijk} = \mu + r/e_{k(j)} + e_j + g_i + ge_{ij} + \varepsilon_{ijk}$, where $y_{ijk}$ is the phenotypic mean, $\mu$ is the overall mean, $r/e_{k(j)}$ is the effect of the $k$th repetition in the $j$th environment, $g_i$ is the fixed effect of the $i$th genotype, $e_j$ is the effect of the $j$th environment, normally and independently distributed (NID) $(0, \sigma^2_e)$, $ge_{ij}$ is the effect of the interaction of the $i$th genotype with the $j$th environment, NID $(0, \sigma^2_{ge})$, and $\varepsilon_{ijk}$ is the experimental error, NID $(0, \sigma^2_{\varepsilon})$.

Model and Bayesian inference
The bi-segmented regression model is given by

$$y_{ij} = \beta_{i0} + \beta_{i1} I_j + \beta_{i2} T(I_j) + e_{ij},$$

where $y_{ij}$ is the response of genotype $i$ in environment $j$, $\beta_{i0}$ is the mean response of genotype $i$, $\beta_{i1}$ is the slope under the first regime (the linear regression coefficient related to the unfavorable environments), $\beta_{i2}$ represents the change in slope from the first to the second regime ($\beta_{i1} + \beta_{i2}$ is the slope after the change-point, that is, the linear response to the favorable environments), $I_j$ is the coded environmental index, $T(I_j)$ is the segmentation term defined from $I_j$ and $\bar{I}_+$, $\bar{I}_+$ is the mean of the coded environmental index considering only environments with positive indexes, and $e_{ij}$ is the error term, NID $(0, \sigma^2)$.
The Bayesian approach for the bi-segmented model is described in Nascimento et al. [12]. In summary, assuming $e_{ij} \mid I\sigma^2_{ie} \sim N(0, I\sigma^2_{ie})$, each observation $y_{ij}$ has a distribution $y_{ij} \sim N(\beta_{i0} + \beta_{i1} I_j + \beta_{i2} T(I_j), I\sigma^2_{ie})$, and the likelihood function for each genotype is given by

$$L(\beta_{i0}, \beta_{i1}, \beta_{i2}, \sigma^2_{ie} \mid y_i) \propto \prod_j (\sigma^2_{ie})^{-1/2} \exp\left\{ -\frac{\left[y_{ij} - \beta_{i0} - \beta_{i1} I_j - \beta_{i2} T(I_j)\right]^2}{2\sigma^2_{ie}} \right\}. \tag{1.1}$$

Prior distributions are assigned to the parameters, with known hyperparameters that include $\alpha_i$ and $\beta_i$. The last prior distribution is the Gamma distribution with mean and variance equal to $\alpha_i/\beta_i$ and $\alpha_i/\beta_i^2$, respectively. Additionally, the precision is equal to $1/\sigma^2_{ie}$. The joint posterior distribution is proportional to the product of the likelihood function (Eq 1.1) and the prior distributions (Eqs 1.2-1.5).
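As an illustration of the mean structure $\beta_{i0} + \beta_{i1} I_j + \beta_{i2} T(I_j)$, the following minimal sketch builds the two regressors for a set of environments. It rests on two assumptions that are not stated in this section: the coded index $I_j$ is taken as the environment mean minus the overall mean, and $T(I_j)$ follows the usual bi-segmented form ($0$ for $I_j \le 0$, $I_j - \bar{I}_+$ otherwise). The function names and the yield values are hypothetical.

```python
from statistics import mean


def environmental_index(env_means):
    """Coded index I_j, assumed here to be the environment mean minus the overall mean."""
    overall = mean(env_means)
    return [m - overall for m in env_means]


def segment_term(index):
    """T(I_j), assumed form: 0 in unfavorable environments, I_j minus the mean positive index otherwise."""
    i_plus = mean(v for v in index if v > 0)  # mean of the positive indexes
    return [v - i_plus if v > 0 else 0.0 for v in index]


def predict(b0, b1, b2, index):
    """Expected response b0 + b1 * I_j + b2 * T(I_j) for each environment."""
    seg = segment_term(index)
    return [b0 + b1 * i + b2 * t for i, t in zip(index, seg)]


# Hypothetical environment means (kg/ha), for illustration only.
index = environmental_index([6000.0, 8000.0, 9000.0, 11000.0])
print(index)                          # negative values mark unfavorable environments
print(segment_term(index))            # zero wherever the index is not positive
print(predict(9000.0, 1.0, 0.5, index))
```

With this coding, the slope is `b1` in unfavorable environments and `b1 + b2` in favorable ones, which is the interpretation given to $\beta_{i1}$ and $\beta_{i1} + \beta_{i2}$ in the text.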
To make inferences regarding the parameters in Eq 2, Markov chain Monte Carlo (MCMC) methods were used to obtain the posterior marginal distributions for each parameter.
The marginal distribution samples of the stability parameter, $\sigma^2_{di}$, were obtained indirectly. This parameter is a function of $\sigma^2_{ie}$. Therefore, using the $\sigma^2_{ie}$ values from each iteration, we obtain $\sigma^2_{di}$ according to the following expression: $\hat{\sigma}^2_{di} = \hat{\sigma}^2_{ie} - \frac{MSR}{r}$, where MSR is the residual mean square obtained from the variance analysis and $r$ is the number of repetitions of the experiment. The hypotheses of interest were tested by calculating the 95% credibility intervals for the parameters.
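The indirect computation can be sketched as follows: each retained draw of the residual variance is converted into a draw of the stability parameter, and a 95% interval is read from the resulting sample. The sketch assumes the expression is read as the residual variance minus MSR divided by r, and it uses an equal-tailed interval based on linear interpolation between order statistics, which is one common choice and not necessarily the one used by the authors. Function names and numbers are hypothetical.

```python
def stability_draws(sigma2_ie_draws, msr, r):
    """Turn posterior draws of the residual variance into draws of the stability parameter."""
    return [s - msr / r for s in sigma2_ie_draws]


def quantile(sorted_values, q):
    """Quantile by linear interpolation between neighboring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def credible_interval(draws, level=0.95):
    """Equal-tailed credibility interval from a sample of posterior draws."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    return quantile(ordered, tail), quantile(ordered, 1 - tail)


# Hypothetical draws and ANOVA quantities, for illustration only.
draws = stability_draws([850000.0, 910000.0, 780000.0, 990000.0], msr=400000.0, r=2)
low, high = credible_interval(draws)
print(low, high, low <= 0.0 <= high)  # the last value says whether zero lies inside the interval
```

A hypothesis such as a stability parameter equal to zero is then judged by whether zero falls inside the interval.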

Prior distributions
Two models were fitted and compared in terms of goodness of fit. Model 1 (M1) was characterized by minimally informative prior distributions, which were represented by distributions with large variances. Model 2 (M2), similar to the method employed in [12], was characterized by informative prior distributions, with the estimates obtained from the frequentist analysis of the bi-segmented model used as information to define the hyperparameters.

Assessing the model's goodness of fit
Models M1 (minimally informative priors) and M2 (informative priors) were compared by means of the Deviance Information Criterion (DIC) [14]:

$$\mathrm{DIC} = D(\hat{\theta}) + 2p_D.$$

Here, $D(\hat{\theta})$ is a point estimate of the deviance obtained by replacing the parameters with their posterior mean estimates in the likelihood function and $p_D$ is the effective number of parameters in the model. Models with lower DIC are preferred.
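For readers who want to see how the criterion is assembled from MCMC output, here is a minimal sketch for one genotype under the normal likelihood. It assumes the standard definition in which the effective number of parameters is the posterior mean deviance minus the deviance at the posterior means, and it plugs in the posterior mean of the variance; software such as OpenBUGS may use a different plug-in (for example the mean precision), so values need not match exactly. All names are hypothetical.

```python
import math
from statistics import mean


def deviance(y, index, seg, b0, b1, b2, sigma2):
    """Minus twice the normal log-likelihood for one genotype."""
    total = 0.0
    for yj, ij, tj in zip(y, index, seg):
        resid = yj - (b0 + b1 * ij + b2 * tj)
        total += math.log(2 * math.pi * sigma2) + resid ** 2 / sigma2
    return total


def dic(y, index, seg, draws):
    """Return (DIC, p_D) from draws given as (b0, b1, b2, sigma2) tuples."""
    mean_deviance = mean(deviance(y, index, seg, *d) for d in draws)
    posterior_means = [mean(d[k] for d in draws) for k in range(4)]
    deviance_at_mean = deviance(y, index, seg, *posterior_means)
    p_d = mean_deviance - deviance_at_mean      # effective number of parameters
    return deviance_at_mean + 2 * p_d, p_d
```

Computing this quantity for the same genotype under M1 and M2 and taking the difference gives the kind of comparison described in the text, where the smaller value is preferred.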

Bayesian analysis
We ran MCMC chains of 100,000 iterations of the Gibbs sampler algorithm. We set the burn-in to 10,000 iterations and retained every fifth iteration thereafter. In each chain, we analyzed the posterior mean, standard deviation, 95% credibility intervals, and convergence criterion statistics [15,16]. The methodology was implemented in R [17], and the joint distribution samples were obtained using the rbugs function of the rbugs package [18], which interfaces R with OpenBUGS (a software application for the Bayesian analysis of complex statistical models using MCMC methods). The MCMC chain convergence was assessed by the Geweke and Raftery-Lewis diagnostics using the package [19] provided in the R software [17].
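The sampling scheme can be illustrated with a small hand-written Gibbs sampler for a single genotype. This is a sketch under stated assumptions, not the authors' rbugs/OpenBUGS code: it assumes independent normal priors on the three regression coefficients (`prior_mean`, `prior_var`) and a Gamma prior with shape `a` and rate `b` on the precision, and it updates one coefficient at a time. The function name and all arguments are hypothetical.

```python
import random


def gibbs_segmented(y, index, seg, prior_mean, prior_var, a, b,
                    n_iter=100_000, burn_in=10_000, thin=5, seed=1):
    """Component-wise Gibbs sampler; returns retained (b0, b1, b2, sigma2) draws."""
    rng = random.Random(seed)
    cols = [[1.0] * len(y), list(index), list(seg)]   # regressors: intercept, I_j, T(I_j)
    beta = list(prior_mean)
    tau = a / b                                       # start at the prior mean of the precision
    kept = []
    for it in range(n_iter):
        for k in range(3):
            # Residuals with the k-th term left out.
            partial = [yj - sum(beta[l] * cols[l][j] for l in range(3) if l != k)
                       for j, yj in enumerate(y)]
            post_prec = 1.0 / prior_var[k] + tau * sum(x * x for x in cols[k])
            post_mean = (prior_mean[k] / prior_var[k]
                         + tau * sum(x * r for x, r in zip(cols[k], partial))) / post_prec
            beta[k] = rng.gauss(post_mean, post_prec ** -0.5)
        sse = sum((yj - sum(beta[l] * cols[l][j] for l in range(3))) ** 2
                  for j, yj in enumerate(y))
        # random.gammavariate takes shape and scale, so the rate is inverted.
        tau = rng.gammavariate(a + len(y) / 2, 1.0 / (b + 0.5 * sse))
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append((beta[0], beta[1], beta[2], 1.0 / tau))
    return kept
```

With the defaults, which mirror the settings reported above, (100,000 - 10,000) / 5 = 18,000 draws are retained per chain. Large values in `prior_var` mimic the minimally informative setting, while smaller variances centered on earlier estimates mimic the informative one.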

Results and discussion
The analysis of variance of the maize yield (kg/ha) demonstrated that the genotypes, environments and the genotype × environment interaction (G × E) presented a significant effect (P < 0.05) (Table 2). The significance of the G × E interaction indicates contrasts between environments and differential genotypic responses to environmental effects. The occurrence of G × E interaction, which can be defined as the differential response of genotypes to environmental variation, is frequently reported in maize crops, making it challenging to recommend cultivars [20][21][22][23][24].
The posterior means and their respective credibility intervals (CI) provided estimates for the adaptability and stability parameters. Considering the results provided by the model M1, which is characterized by the minimally informative prior distributions, most genotypes (14 genotypes) presented the linear regression coefficient related to the unfavorable environments equal to 1 ($\beta_{i1} = 1$), except 30A16HX, 2B707HX, 2B587HX, 30A37HX, 2B604HX, 20A55HR, 20A78HX and DKB370, which presented values higher than 1 ($\beta_{i1} > 1$), and the genotypes P4285HX and BRS2020, which presented values lower than 1 ($\beta_{i1} < 1$) (Table 3). Among those genotypes that presented the linear regression coefficient related to the unfavorable environments equal to 1, only two (30A68HX and AS1555YG) exhibited the linear response to the favorable environments higher than 1 ($\beta_{i1} + \beta_{i2} > 1$) (Table 3). No genotype presented stability parameter ($\sigma^2_{di}$) equal to zero. On the other hand, the genotype AS1555YG presented coefficient of determination higher than 80% (Table 3). However, only the genotype 30A68HX presented higher mean productivity ($\beta_{0;\,\mathrm{30A68HX}} = 9465.56 > \mu = 8682.99$).
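The decision rule behind statements such as "equal to 1" or "higher than 1" can be sketched as a comparison of the 95% credibility interval with the reference value 1. The function names and the interval limits below are hypothetical and are not taken from Table 3.

```python
def compare_to_one(lower, upper):
    """Compare a credibility interval (lower, upper) with the reference value 1."""
    if lower > 1:
        return "greater than 1"
    if upper < 1:
        return "less than 1"
    return "equal to 1"          # the interval contains 1


def summarize_genotype(b1_interval, b1_plus_b2_interval):
    """Summarize the response in unfavorable and favorable environments."""
    return {
        "unfavorable": compare_to_one(*b1_interval),
        "favorable": compare_to_one(*b1_plus_b2_interval),
    }


# Hypothetical interval limits, for illustration only.
print(summarize_genotype((0.60, 0.95), (0.85, 1.30)))
```

Under this rule, narrower intervals leave fewer coefficients classified as equal to 1, which is one way to see how an informative prior can sharpen the discrimination among genotypes.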

PLOS ONE
The analysis considering M2 was able to better discriminate the genotypes, since, out of 25 genotypes, 14 and 11 presented the linear regression coefficient related to the unfavorable environments equal to 1 for M1 and M2 fitted models, respectively.
A comparative analysis of the credibility interval limits obtained by the two fitted models (M1 and M2) reveals that the use of informative prior distributions (M2) narrowed the credibility intervals when compared to minimally informative prior distributions (M1). Similar results were observed by [11], who used the Bayesian segmented regression model for adaptability and stability evaluation of cotton genotypes. Nascimento et al. [10], Couto et al. [20] and Teodoro et al. [25] used Eberhart and Russell's Bayesian method to evaluate the phenotypic stability and adaptability of alfalfa and popcorn cultivars and obtained similar results. Additionally, the difference in DIC values between models using minimally informative and informative priors ranged between 1.59 and 2.01. Since smaller DIC values indicate better data fitting, these results demonstrate that M2 should be considered for the adaptability and stability evaluation of maize genotypes (Table 4).
Overall, the Bayesian framework of the segmented regression model allowed the incorporation of additional information related to the parameters, through prior distributions, which reduced the ranges of the credibility intervals, increased the precision of parameter estimates, and, consequently, provided reliable genotype selection. In practice, this information can be obtained from previous studies, including [10,20]. Due to the lack of prior information related to the evaluated maize hybrids in the literature, in this work, the estimates obtained from the frequentist analysis of the segmented model were used to define the hyperparameters.
In practice, most farmers in northeastern Brazil have limited capital, which prevents them from investing in production technology. Therefore, genotypes adapted to unfavorable environments should be considered for low technology planting [26]. The recommendation of cultivars not adapted to regional conditions leads to low yield and other serious problems, such as the indiscriminate use of pesticides and excessive cultural treatment [27]. Considering the results provided by the model M2 (informative prior distributions), only the genotype P4285HX presented the linear regression coefficient related to the unfavorable environments lower than 1 ($\beta_{i1} < 1$) and high mean productivity ($\beta_{0;\,\mathrm{P4285HX}} = 9054.01 > \mu = 8682.99$) (Table 3).

Conclusions
Incorporating additional information about the parameters through prior distributions narrows the credibility intervals. The difference in DIC values between the models using minimally informative (M1) and informative (M2) priors was positive, which indicates that M2 fits the data better. Therefore, M2 should be considered as an alternative for the adaptability and stability evaluation of maize genotypes. The genotype P4285HX presents high yield performance and adaptability to unfavorable environments and should be considered for low technology planting, which is practiced by farmers in northeastern Brazil.