Multiple-trait model through Bayesian inference applied to Jatropha curcas breeding for bioenergy

Multiple-trait model tends to be the best alternative for the analysis of repeated measures, since they consider the genetic and residual correlations between measures and improve the selective accuracy. Thus, the objective of this study was to propose a multiple-trait Bayesian model for repeated measures analysis in Jatropha curcas breeding for bioenergy. To this end, the grain yield trait of 730 individuals of 73 half-sib families was evaluated over six harvests. The Markov Chain Monte Carlo algorithm was used to estimate genetic parameters and genetic values. Genetic correlation between pairs of measures were estimated and four selective intensities (27.4%, 20.5%, 13.7%, and 6.9%) were used to compute the selection gains. The full model was selected based on deviance information criterion. Genetic correlations of low (ρg ≤ 0.33), moderate (0.34 ≤ ρg ≤ 0.66), and high magnitude (ρg ≥ 0.67) were observed between pairs of harvests. Bayesian analyses provide robust inference of genetic parameters and genetic values, with high selective accuracies. In summary, the multiple-trait Bayesian model allowed the reliable selection of superior Jatropha curcas progenies. Therefore, we recommend this model to genetic evaluation of Jatropha curcas genotypes, and its generalization, in other perennials.


Introduction
The increasing energy demand and environmental awareness about pollution caused by fossil fuels have raised the interest of private companies, governments, and agencies of many countries in the development and production of alternative and sustainable energy sources. The use of bioenergy has been pointed out in the last decades as one of many alternatives to reduce the need for fossil fuels. Bioenergy should be interpreted as a renewable source of biomass, i.e. sugar/lignocellulosic tissues and vegetable oils [1]. Many crops have been identified as potential feedstock to obtain biofuels, mainly corn, sugar cane, soybean, castor bean oil, forest a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 biomass, livestock manure, oil palm, peanut, canola and physic nut (Jatropha curcas L.) [2]. Then, biofuel may be used in engines either directly as fuel or as a blend mixed with fossil fuels [3,4].
The energetic worldwide matrix emerges for the consumer of non-renewable fuels, when compared with the Brazilian scenario [5]. Brazil has applied extensive efforts in research and production of such fuels. In this country, Jatropha curcas has been indicated as an important alternative to the increasing demand for oil production, an important source for biofuel and bio kerosene [6,7]. This species grabs attention due to its potential for oil production, high adaptability in a range of environments, tolerance to drought and longevity. It is also an alternative for diversifying the biofuel sector [5,8,9].
As a perennial, Jatropha curcas provides several records over time in the same experiment, thus characterizing repeated measures data. This kind of data demands efficient statistical methodologies to infer on the accuracy of the estimated genetic values and gains with selection. Studies carried out under such framework denote several alternatives to point out for correlation structure between measurements (including compound symmetry models, random regression models, and multiple-trait models-MTM).
The MTM has been indicated as the most effective procedure for repeated measure analysis in perennials breeding. Its efficiency is due to the use of the genetic and residual correlations between the repeated measures. Besides that, MTM allows the estimation of variance components and genetic values for each measure [10,11].
In general, Bayesian inference has outperformed traditional frequentist analyses [12][13][14][15] by: (i) providing additional results to the frequentist approach, such as creditability and highest posterior density intervals (HPD); (ii) estimating the genetic parameters and the genetic values with increased precision, once it is conducted by Gibbs sampler, which is a Markov Chain Monte Carlo (MCMC) sampling algorithm; and, (iii) being a flexible methodology that allows to estimate accurately variance components and genetic values, even from reduced samples.
The MTM has been already applied to Jatropha curcas breeding [10,16], but no study has addressed the Bayesian MTM for repeated measures analysis. In this context, the objective of this study was to propose the Bayesian MTM for repeated measures analysis in Jatropha curcas breeding for bioenergy.

Experimental design and genetic material
The experiment was implemented in November 2008, in the experimental field of Embrapa Cerrados, Planaltina − DF, Brazil (15˚35 0 30" S and 47˚42 0 30" W; 1007 m asl), as a randomized block design, with two replicates and five plants per plot. The plants were arranged in rows, spaced 4 m between rows and 2 m between plants. All management practices were based on [17,18]. The experiment consisted of a performance evaluation of 730 individuals of 73 halfsib families. Each family was derivate from one tree (female parental), that received pollen from the whole population, given that Jatropha curcas is a allogamous plant [5]. The plants were evaluated for grain yield (kg plant −1 ) trait over six harvests, from 2010 to 2015 crop years.

Statistical analyses
The Bayesian MTM, considering the experimental design, was given by: where y is the vector of phenotypic data, with the conditional distribution assumed as: y | β, g, p, G, T, R~N(Xβ + Zg + Wp, R � I), where G is the family (co)variance matrix, T is the plot (co)variance matrix, R is the residual (co)variance matrix, and I is an identity matrix; β is the vector of block effects (systematic effect) added to the overall mean; g is the vector of family effects, assumed as g|G~N(0, G � I); p is the vector of plot effects, assumed as p|T~N(0, T � I); e is the vector of residual effects, assumed as e|R~N(0, R � I). The terms X, Z, and W refers to the incidence matrix for the block, family, and plot effects, respectively.
We assumed that G, T and R follow an inverted Wishart distribution WI (v, V), with hyperparameters v and V [19]. The hyperparameters for all prior distributions were selected to provide non-informative or flat prior distribution. For the systematic effect (β), a uniform prior distribution was attributed. Furthermore, the parameters β, g, p, G, T, and R were estimated, following the joint posteriori distribution: P(β, g, p, G, T, R|y) α P(y | β, g, p, G, T, R) × P(β, g, p, G, T, R).
Variance components and genetic parameters were estimated by the Gibbs sampling algorithm. The family variances (ŝ 2 g ) were obtained from the diagonal of the estimated matrix G, the plot variances (ŝ 2 p ) were obtained from the diagonal of the estimated matrix T, and the residual variances (ŝ 2 res ) were obtained from the diagonal of the estimated matrix R. We consider a total of 10,000,000 interactions, with burn-in of the 1,000,000 and sampling interval period (thin) of the 50. This led to a sample of the 180,000 iterations.
All effects from the model were tested against the full model by removing one effect (genetic and plot) at a time and calculating the deviance information criterion (DIC). As [20], the DIC was described as: DIC = D(θ) + 2pD, where D(θ) is a point estimate of the deviance obtained by replacing the parameters with their posterior mean estimates in the likelihood function and pD is the effective number of parameters in the model. The convergence of the parameters was checked by the Geweke criterion [21]. Posterior means and high posterior probability (HPD) for estimated parameters were obtained from MCMC samples.
The estimates of additive genetic variance between families (ŝ 2 a ), individual phenotypic variance (ŝ 2 phen ), heritability of family means (h 2 g ), coefficient of determination of plot effect (c 2 p ), coefficient of determination of residual effects (c 2 res ) and Bayesian selective accuracy (rĝ g ) were obtained as follows: and where n is the number of individuals in the plot (five individuals), r is the number of repetitions (two repetitions), and S(g) is the standard deviations of the estimated genetic values (GV). Genetic correlations (ρ g ) and phenotypic correlations (ρ p ) between each pair of measures were calculated as follows: whereŝ gðijÞ , is the genetic covariance estimated between progenies for the pair of harvests i and j;ŝ 2 gðiÞ andŝ 2 gðjÞ are the genetic variance estimated among progenies of harvests i and j, respectively;ŝ resðijÞ is the residual covariance estimated between the progenies for the pair of harvests i and j; and,ŝ 2 resðiÞ andŝ 2 resðjÞ are the residual variance estimated among progenies of harvests i and j, respectively.

Genetic selection based on selection index
The additive index (AI) [22] was used to identify superior families to be selected in the Jatropha curcas breeding program. The AI was given by: where w h is the weight assigned for harvest h; u h + f kh is the overall mean for harvest h added to the estimated genetic value of family k at harvest h; and σ h is the standard deviation for u h + f kh . Weights equal to u h /u, where u is the overall mean, were assigned in the AI. The selection gains (SG) were obtained directly by the AI output, considering four different selection intensities: 27.4%, 20.5%, 13.7%, and 6.9%, which referred to the selection of 20, 15, 10 and 5 progenies, respectively, as follows: where X s is the overall mean of the estimated genetic values of the selected progenies, and X o is the overall mean of the population. All analyses were carried out in the R software [23], using 'MCMCglmm' [24] and 'boa' [25] packages.

Results
The chains [(co)variance components] achieved convergence by the Geweke criterion, with 180,000 effective samples. Thus, the posterior mean estimates were obtained for the variance components, which suggested density with qui square and normal shape appearances (Fig 1). M1 to M5 present qui square (from which the Wishart distribution is a generalization) and only M6 shows normal appearance. This corroborates the adequacy of the prior generalized qui square used. Only M6 showed statistical learning from the data through the likelihood function. According to the DIC, there was evidence that the full model (DIC = 1399.212) is the

PLOS ONE
one that best fits the data, which reveals the significance of the genotypic effects (DIC = 2742.933) and plot effects (DIC = 1403.41).
The Bayesian MTM led to the estimation of variance components for each environment individually. For the grain yield trait, genetic variances ranged from 0.004 to 0.290 (Table 1). This dissimilarity among harvests was also observed for phenotypic variances (ranging from 0.014 to 0.882) and residual variances (ranging between 0.006 and 0.316). For the 2010, 2012, and 2015 harvests, the coefficients of determination of residual effects was higher than the others. Conversely, the coefficients of determination of plot effects were higher, when compared with the others, in the 2011 and 2013 harvests (0.449 and 0.591, respectively). The heritability estimates were higher (h 2 g > 0:50) in the 2010, 2012, 2014, and 2015 harvests, whereas it's were classified as lower (h 2 g < 0:29) in the 2011 and 2013 harvests. Lastly, the HDP intervals indicates significance for all random effects of the MTM (Table 1 and S1 Table).
The additive index indicated a selection gain of 17.29%, when selecting five families, namely, 41, 15, 10, 36 and 37. As expected, the selection gains decreased with the increase of selected families: 10 progenies (SG = 14.74%), 15 progenies (SG = 13.04%), and 20 progenies (SG = 11.60%). The ranking with the 20 progenies selected was presented in Fig 3 and the complete ranking was presented in S3 Table.

Discussion
The chains convergence ensures that the most likely estimate for each co(variance) component was achieved. The significance of genetic effects indicates genetic variability among the 73   Table).
https://doi.org/10.1371/journal.pone.0247775.g002 half-sib families, which allows the genetic selection. Deviance information criterion is widely applied as a criterion to evaluate the goodness of fit of models in Bayesian inference [12]. The HPD intervals also indicate the significance of the genetic (progenies) and plot effects. One advantage of the Bayesian inference in relation with the frequentist inference is the possibility of obtaining the HPD intervals. The HPD intervals are more accurate, when compared with the confidence intervals from frequentist inference, which increases the reliability of the variance components and genetic parameters estimated through Bayesian inference. High values of selective accuracies (0:70 < rĝ g < 0:89) were obtained, indicating high reliability and a favorable scenario for genetic selection since high accuracy allows correct ranking of the genotypes.
It is important to highlight that in this study no prior information was used. Thus, the genetic parameters estimates through Bayesian inference presented similar values those estimated through frequentist inference by restricted maximum likelihood (REML) [26]. In addition, the gain with selection was similar with the MTM frequentist [10], for the same selective intensity (13.49%, referring to 20 progenies).
The HPD intervals and selective accuracies magnitudes demonstrate the reliability and usefulness of the Bayesian inference in Jatropha curcas breeding. It is worth mentioning that high selective accuracy directly affects the success of breeding program, since the selective accuracy is the most relevant parameter for evaluation of the effectiveness of the inference about the estimated genetic value of a genotype.
The Jatropha curcas progenies evaluated in this study are non-domesticated, with great potential of improvement [1,5,27]. However, many studies have pointed out the small genetic variability in this species [7,[28][29][30][31]. In this sense, carry out a recurrent selection program is a suitable strategy for the exploration of the additive genetic variability [30,32]. Another possibility to add genetic variability in the Jatropha curcas breeding programs is the addition of new accessions.
The selective accuracies increased in the last two harvests, as well as the genetic and phenotypic correlations, which were higher between the last harvests. As indicated by [33], at least four harvests (years) are needed for the genetic selection of superior Jatropha curcas genotypes for the grain yield trait.
The heritability estimates present a fluctuating pattern, going up and down from one measure to another. Such trend could be related with the genotype × year interactions, especially in the early stages of growth, where the plants are more sensitive to environmental variations, and to the fact that the metabolism of young perennials often benefit the vegetative over reproductive growth, implying in an uneven genetic response in the early stages [5,34]. These results indicate a biannual variation of the Jatropha curcas genotypes.
Two aspects in which the Bayesian MTM can outperform the frequentist MTM should be highlighted: (i) it provides the HPD intervals [35], and (ii) flexibility in choosing the distributions for the data set and unknown parameters and the possibility of incorporating prior knowledge about the parameters of the model [36,37]. It is worth mentioning that the REML method also presents hindrances regarding the convergence of the MTM [38], whereas the convergence of the Bayesian MTM it is easier to achieve [11]. Therefore, when compared with the frequentist inference, it seems reasonable that the Bayesian inference presents better results [11,13,14,39,40].

Conclusion
The Bayesian MTM revealed to be a suitable strategy for repeated measures analysis in Jatropha curcas breeding. Besides that, the MTM have potential for genetic evaluation of any perennial.