Genetic insights into elephantgrass persistence for bioenergy purpose

Persistence may be defined as high sustained yield over multi-harvest. Genetic insights about persistence are essential to ensure the success of breeding programs and any biomass-based project. This paper focuses on assessing the biomass yield persistence for bioenergy purpose of 100 elephantgrass clones measured in six growth seasons in Brazil. To assess the clones' persistence, an index based on random regression models and genotype-ideotype distance was proposed. Results suggested the existence of wide genetic variability between elephantgrass clones, and that the yield trajectories along the harvests generate genetic insights into elephantgrass clones’ persistence and G x E interaction. A gene pool that acts over the biomass yield (regardless of the harvest) was detected, as well as other gene pools, which show differences on genes expression (these genes are the major responsible for clones’ persistence). The lower and higher clones’ persistence was discussed based on genome dosage effect and natural biological nitrogen fixation ability applied to bioenergy industry. The huge potential of energy crops necessarily is associated with genetic insights into persistence, so just this way, breeding programs could breed a new cultivar that fulfills the bioenergy industries.

The use of a given genotype for energy purposes should be mainly based on the knowledge of its calorific value and its yield biomass [12]. Although raw material quality significantly impacts on bioenergy conversion, the greatest economic driver of raw material production is biomass yield. As biomass yield per unit area increases, transport expenses and demand on arable land decreases, leading to an increase in overall economic returns [13]. PLOS  Besides high biomass yield, biomass energy industries demand a persistent (high sustained yield) and dedicated energy crop cultivar over the harvests. This fact allows better scaling and scheduling of planting and consequently better storage of the raw material, which is given based on the biomass demand, reducing overall costs. Furthermore, due to the perenniality of biomass crop cultivars, they may not be regularly substituted in a plantation [14].
Perennial crops must be sufficiently persistent to maintain their yield performance over the subsequent growth seasons [15][16][17]. Persistence can be affected by several factors, such as environment (e.g., disease, temperature, drought, etc.) and crop management (e.g., harvest and grazing) [18]. In addition, the genetic contribution for persistence control should be highlighted.
High biomass yield potential and greater output:input ratio are the breeder's goals regarding any dedicated energy crop. Therefore, to achieve the plant ideotype, many steps are involved in the plant breeding process. In any breeding program of energy crops, the phenotyping step will focus on the target traits, i.e., the data are obtained along the yield trajectory in different growth seasons for the same traits, which are denoted by longitudinal data, according to Meyer [19].
Genetics models such as random regression (RR) models deal with longitudinal data very well [20] because they capture the change of a trait continuously over the trajectory with fewer parameters than the multi-trait models [21]. This means that random regression models are a parsimonious covariance structure within a continuous scale (infinite dimensional) that provides estimated genetic values at specific times (harvest) or as a trend over time. Besides that, the RR models partition the variance into genetic and permanent environmental effect without assuming constant during the whole evaluated period [22].
Understanding the yield trajectory along the growth seasons/harvests may determine the success of any biomass-project and even generate genetic insights into elephantgrass clones' behavior, which is useful for breeding programs. To achieve these highlights, this paper assessed the biomass yield persistence of elephantgrass clones for bioenergy purpose.

Location and experimental conditions
The experiment was carried out at the experimental field of Embrapa Dairy Cattle Research Center, located in the municipality of Coronel Pacheco, MG, Brazil (lat 21˚33' 18" S, long 431 5' 51" W, at 417 m asl), in a red-yellow latosol with the following chemical properties: pH (5.4); H+Al (2.31 cmolc dm -3 ); P and K (1.1 and 23 mg dm -3 , respectively); and the following exchangeable cations: Al 3+ , Ca 2+ , and Mg 2+ (0.2, 1.4, and 0.7 cmolc dm -3 , respectively). The planting was carried out in December 2011, in 0.20 m deep furrows, and 80 kg ha -1 P 2 O 5 fertilizer was applied at planting. After the establishment stage, at 30 days after planting, elephantgrass plots were cut to 0.30 m stubble height (uniformity harvest), beginning the first of the six harvests. Maintenance fertilization was performed with 300 kg ha -1 of the N-P 2 O 5 -K 2 O formulation (20:05:20 blended granular fertilizer), after the uniformity harvest and after all harvests. Fertilizers were applied according to the soil analysis.
Six harvests were carried out for this study. Aiming at using them as bioenergetic feedstock, the first (September 28 th , 2012) and the second (June 04 th , 2013) harvests were made at 250 growth days; the third harvest (April 15 th , 2014), at 315 regrowth days. Nevertheless, at the fourth harvest (January 15 th , 2015), at 275 regrowth days, only the propagation material for the network assay of elephantgrass was collected, i.e., no data field information was registered. At 315 regrowth days, the fifth harvest was performed (November 26 th , 2015), and the last harvest was carried out at 210 regrowth days (June 22 th , 2016). Weather and phenotypic data for the term of the present assay are shown in S1 Fig and S1 Table, respectively.

Experimental design
One hundred genotypes of the Elephantgrass Active Germplasm Bank (BAGCE, S2 Table) were evaluated. Plots (1.5 m x 4 m) consisted of a single 4 m row. Plots were planted side by side, spaced 1.5 m apart and allocated in a 10 x 10 simple lattice design, with two replications.

Measurement-Biomass yield
The elephantgrass was harvest and weighted in a 3m section from the middle of the rows to obtain the gross fresh biomass weight per plot. Previously, randomly fresh sub-samples of three complete plants from each plot were harvest and weighted (fresh biomass weight) and oven dried at 56˚C for 72 hours until reaching constant weight (dry biomass weight). After that, the material was ground until passing through a 1 mm mesh. The dry biomass yield was estimated using the fresh and dry biomass weights of the sub-sample fractions and the fresh biomass weight of the gross sample.

Statistical analyses
Random regression model. Initially, several random regression models were tested to identify the one that best fits the biomass yield trajectory, using the following general model: The random regression models were fitted on Legendre polynomials of age at measuring (harvest day) for random and fixed effects, considering various orders of fit. y ijk is the i th genotype measured (i = 1, 2, . . ., ng, where ng is the total number of genotypes) on the j th harvest day (j = 1, 2, . . ., nh, where nh is the last harvest day) on the k th replication. R k is the fixed effect of replication (k = 1, 2). β m is the fixed regression coefficient fitted through the quartic degree (order 5 or M b = 5) of Legendre polynomials to the common average trajectory of all genotypes. The random effects, α im and p ikm , are the random regression coefficient for the Legendre polynomial of order m for the genetic effect and the permanent environmental effects for the ik th plot (ik = 1, 2, . . ., np, where np is the total number of plots), respectively. F ijm is the m th Legendre polynomial for the j th harvest day, standard from -1, to +1, from the i th clone; M is the fit order, ranging from 1 to 5, of the Legendre polynomial for the genetic and permanent environmental effects, respectively; e ijk is the residual random effect associated with y ijk . In the matrix notation, the above model is described as follow: where: y is the data vector; β is the vector of the effects of the replication (assumed as fixed) added to the overall mean; α is the vector of genetic effects (assumed as random); p is the vector of the permanent environment (random); e is the vector of residue (random). X, Z, and W represent the incidence matrices for these effects. The fixed part of the model was assumed to account for systematic harvest effect, so that α~N(0, K g I ng ), α~N(0, K p I np ), α and p are uncorrelated, and e~N(0, R), where I ng and I np are identity matrices of appropriated size ng and np, respectively. denotes the direct product. K g and K p are the covariance coefficient matrices for genetic and permanent environmental effect, respectively. R represents a matrix of residual variances. Several models with different residual variances structures (e.g., unstructured, diagonal, and homogeneous) were tested.
Choice of the best-fitted model. The maximum degree of the fitted orthogonal polynomials was tested to determine the most appropriate combination. The random regression models were compared using the likelihood ratio test [23] (LRT) and the Schwarz's Bayesian information criterion (BIC), BIC = -2LogL + pLog[n-r(x)], where: LogL is the logarithm of the likelihood function; p is the number of estimated parameters; n is the number of observations, and r(x) is the rank of incidence matrix of fixed effect [24].

Extracting the genetics information
Variance components. Based on Kirkpatrick et al. [25], the following estimator was used to obtain the variance components ðV x Þ on the original scale.
The term F ijm was defined in the section Random regression model;K x is the estimated coefficient covariance matrix for the random effect (genetic or permanent environment).
Estimated genetic values. The genetic values ðĝ ij Þ were estimated as follows: Whereâ im is the random regression coefficient of order m for the genetic effects of the i th clone. Accuracy. Accuracy ðr ij Þ was estimated as follows: where PEV ij is the prediction error variance, obtained by the diagonal elements of the transformed coefficient matrix of clone i and harvest j;V g is the estimated genetic variance. Eigenfunctions. Additionally, the eigenfunction (C i ) of the genetic coefficient covariance matrix was calculated to provide genetic insights about the studied trait, based on Kirkpatrick et al. [25].
where (c Ci ) m is the m th element of the i th eigenvector ofK g , and F m is the normalized value of the m th Legendre polynomial.
Clones' persistence. Clones' persistence (Persistence i ) was obtained by the distance between each clone in relation to the ideotype (genotype-ideotype distance), considering all estimated genetic values in the range of 250 to 1615 days. The ideotype-max(ĝ j ) was defined as the maximum estimated genetic value in each day in the experimental period. The following algorithm was used:

Software
Statistical analyses were performed using the ASReml 4.1 [26] and R [27] software. The ASReml code is available in S1 Code.

The best-fitted model and the general genetic behavior
The goodness of fit of the models is presented in Table 1. According to the Schwarz's Bayesian information criterion (BIC), the best model is denoted by Leg4.1.D with diagonal residual variance and was adopted to describe the changes in the variance and covariance components for elephantgrass biomass yield over multi-harvest. When the models without the genetic or permanent environmental effects were tested by the likelihood ratio test, genetic variability (pvalue < 0.01) and significant permanent environmental effect (p-value < 0.01) were detected for all models. All the models run on ASReml follows in S3 Table with the main output. Heritability, genetic variance, phenotypic variance, and permanent variance trajectory for biomass yield over the multi-harvest Fig 2 shows that the phenotypic variance trajectory was not stable over the multi-harvest. The phenotypic variance reached the peak in the sixth harvest, i.e., the greatest phenotypic variability occurred in the sixth harvest followed by the second one. A constant trajectory was observed for permanent environment variance.
The genetic variance trajectory was increasing over the multi-harvest, with a slight decrease around the fifth harvest (Fig 2). The heritability estimates ranged from 0.45 to 0.75. In general, heritability values decrease from the first to the second harvest (indicating that the second harvest is strongly influenced by the environment) and remain above 0.69 from the third harvest onwards (Fig 2). In addition, the genetic values for the fourth harvest (1090 days, without phenotypic data) were predicted with 84.41% average accuracy. All trajectories were estimated from the random regression model (Leg4.1.D) fitted by Legendre polynomials for biomass yield over multi-harvest. See S2 Fig for accumulated rainfall and temperature data. Fig 3 reveals that the first eigenfunction (associated with the largest eigenvalue) is nearly constant over the multi-harvest, indicating that 86% of the genetic variation is explained by a gene pool that acts over the biomass yield, regardless of the growth season.
The second eigenfunction (around 11% of the genetic variation, Fig 3) represents other gene pool that shows differences in genes expression under different environment conditions, i.e., biomass yield reverses between the first two harvests (250 and 500 days) in relation to the third, fourth, and fifth harvests (815, 1090, and 1405 days) and reverses again in relation to the last one (1615 days). This genetic factor is the major responsible for the genotypes by environments (growth seasons) interaction. The third and fourth eigenfunction was not interpreted due to their small genetic variation proportion (3.31 and 0.26%, Fig 3).

Elephantgrass clones' persistence
The experimental biomass yield means were 12.51, 29.60, 19.60, 15.28, and 19.07. Mg ha -1 for 250, 500, 815, 1405 and 1615 days after uniformity harvest, respectively. Fig 4 shows the ten most and the five least persistent elephantgrass clones. Clone 2 was not the most yielded at all harvests; however, it sustained the biomass yield with the highest performance from the third harvest (815 days) onwards.
Among the ten most persistent clones, clones 95 and 55 showed the highest biomass yield in the first two harvests, i.e., these clones achieved the biomass yield peak quickly, but were not able to persist over the multi-harvest (Fig 4). Clones 49, 100 and 99 showed the lowest persistence (Fig 4). Clones 99 and 100 were the only ones that completely died in the plots of the first replication in the fifth harvest (1405 days) and in all plots in the sixth harvest (1615 days).

Discussions
The best-fitted model Statistical methods for analyzing data of perennials need to appropriately model the genetic effects over time [28]. A suitable method for modeling the genetic trajectory over time is the random regression model, as commonly used in the animal sciences [20,22]. It is noteworthy that the experimental data used in this work shows unequally spaced sample intervals over the harvests (250, 500, 815, 1090, 1405, 1615 days after the uniformity harvest) and that no phenotypic data was collected in the fourth harvest (at 1090 days after the uniformity harvest). Kirkpatrick et al. [25] relate that, under this condition, the random regression models are sufficient, being the adequate methodology.
Different criteria have been used to find the polynomial order of the model with the best fit and parsimony [29]. In the present study, the best-fitted model was indicated by the BIC criterion (Legendre polynomials of the fourth order, for genetic effect, and first order, for permanent environmental effect with diagonal residual variance structure-Leg4.1.D, Table 1). From this model, all interpretations were performed.
Simple repeatability and multi-trait models can be employed for longitudinal data analysis [21]. However, the repeatability model considers the genetic correlation between the different harvests equal to one, i.e., exactly the same genes acting in the control of the trait over time [30]. The repeatability model can be reproduced from the random regression model, considering the fitted order equal to one (intercept random regression model). Thus, the repeatability model is not suitable to represent the genetic behavior presented by the biomass yield over the multi-harvest. The multi-trait model assumes that the data are discontinuous, while in the present work the harvests are taken continuously. Therefore, the extrapolation of the genetic value to an unobserved harvest is not recommended. The multi-trait model would need to estimate 30 parameters, while the chosen random regression model (Leg4.1.D) required only 16 parameters, thus being parsimonious, in addition, the multi-trait model does not allow computing the random permanent environment effects (i.e., the permanent environment effect is confused or overlaid with the temporary environment effect).

Genetic variability and general genetic behavior
The high amplitude that the random genetic curves deviations showed in relation to general biomass yield trajectory suggests high genetic variability (Fig 1). Wide genetic variability in the studied clones was expected since they are clones belonging to a germoplasm bank (BAGCE) and have not yet been genetically improved for bioenergetics [8]. According to Azevedo et al. [31] and Rocha et al. [9], these clones presented genetic variability using single sequence repeated markers and the biomass yield trait, respectively. Knowledge about genetic diversity is the key to further improvement, and evaluation of diversity in germplasm is essential for the effective use of genetic resources in breeding programs. Assessing the diversity information would facilitate the progress in plant breeding [32].

Permanent environment, plasticity, and persistence
Besides the genetic and the temporary environmental effect that composes a phenotype, a permanent environmental effect occurs in longitudinal data. Permanent implies stability and a constant or common presence to repeated measures [33,34]. Kruuk and Hadfield [33] have shown that permanent environmental effect may overlap with several factors, e.g., dominance or epistatic genetic effects, maternal genetic effects, common environmental effects, especially the phenotypic plasticity. It is worth mentioning that the permanent environmental effect is estimated by the variance among repeated measures in the same individual (i.e., in animal science no replications of the individual is used in the experiment as in plant science). Replications of individuals or families are very common in plant breeding experiments, and the permanent environmental effect is estimated by the variance among repeated measures in the plots (different genotypes-replications combination). Under this condition, additional effects occur due to differential competition between the same individual in different plots (due to experimental randomization).
The second harvest showed the greatest phenotypic variance and the greatest temporary environment or residual variance (Fig 2); this behavior may be explained by the rainfall and temperature data (S2 Fig). The second harvest showed the most favorable environmental condition (e.g., temperature and rainfall, see S2 Fig) for elephantgrass growth, which is confirmed by the highest biomass mean, 29.60 Mg ha -1 (51% more productive than the second most productive harvest). Thus, resources availability (e.g., light, water, nutrient, temperature, etc.) can stimulate phenotypic plastic response subject to generate a large phenotypic variance.
According to Nicotra et al. [35], phenotypic plasticity is the range of phenotypes that a single genotype can express as a function of its environment. Phenotypic plasticity depends on the genome plasticity, defined as a change in the genome structure or organization associated with environmental signals [35]. In this context, the phenotypic plasticity can be considered as favorable or unfavorable changes for genotype adaptedness [36]. Nicotra et al. [35] define adaptive plasticity as the phenotypic plasticity that increases the global fitness of a genotype. In the plant breeding context, the adaptive plasticity is equivalent to the adaptability proposed by Finlay and Wilkinson [37].
Bradshaw [38] relates that plasticity is related to stability. Plasticity can be a simple sign of weakness (of lack of fitness), but it can also be a sign of strength, attributed to maintenance mechanisms of fitness. Breeders attempt to select genotypes with consistent performance between a range of target environments in order to reduce the G x E effects [39]. For instance, breeders try to produce cultivars that reliably perform despite year-to-year fluctuations in weather patterns. In the case where limited phenotypic plastic response confers stability, the low G × E contribution may have a desirable effect by enabling germplasm to predictably perform across environments [39]. Plasticity not only gives an edge over competitors but also is essential for genotype persistence in new or changing environments [40]. In the plant breeding scenario, persistence was defined as high sustained yield over environmental changes [15][16][17] and is a relevant trait under several aspects of the bioenergy industry.

Insights about G x E interaction-Driving the selection
When trajectory curves are non-constant, genotypes show plasticity (in new and changing environments-growth seasons), and when the curves intersect, a G x E interaction occurs (see Fig 4). According to van Eeuwijk et al. [36], this type of G x E has more severe consequences for breeders as it will change the rank order of clones in function of the environmental conditions.
The main reason for the seasonality of elephantgrass yield over multi-harvest is the differential genes expression, i.e., the environmental effect promotes different levels of genes expression (even the nonexpression of the genes) that affect the elephantgrass biomass yield. The differential genes expression is the theoretical base of G x E interaction. The first eigenfunction (Fig 3) captured a gene pool that was equally expressed in all growth seasons (e.g., general adaptability genes). The second eigenfunction (Fig 3) clustered genes that expressed themselves depending on the environmental differences (these genes determine the persistencespecific adaptability genes). The uninterpreted eigenfunctions showed small eigenvalues (Fig  3), and according to Kirkpatrick et al. [25], eigenfunctions with very small (or null) eigenvalues represent deformations for which there is little (or none) genetic variation.
The second harvest is not recommended for selection (h 2 = 0.45, Fig 2). Results also indicated that the first, third, fourth, fifth, and sixth harvests represent a more favorable scenario for selection (i.e., accuracy higher than 83%). The genetic breeding must handle inheritable traits, i.e., those with high heritability. The heritability of a trait will have an impact on selection decisions. Genetic progress tends to be much slower in lowly heritable traits. Conversely, with higher heritability, a faster progress is achieved with selection due to greater accuracy in selection decisions [41].
Forage breeding can be a complex task due to the plant perenniality, among several other factors [42]. Persistence is a complex trait affected by a large number of biotic, abiotic, and genetic factors, e.g., diseases and pests, mechanical harvesting equipment, intensity of harvest management, temperature, drought, plant competition [18], and genome plasticity. Thus, pyramiding of genes that express themselves in different environments would increase genome plasticity and consequently increase the genotypes' persistence.

Lowest persistence-Supported by the genome dosage
The death of clones 99 and 100 in the last two harvests is a factor that explains the lowest persistence. Death may have been caused by the low perenniality of these clones, i.e., the genome dosage may interfere with the perenniality of elephantgrass clones.
Elephantgrass is allotetraploid (2n = 4x = 28, A'A'BB) with ploidy level variations [43]. Pearl millet [Pennisetum Glaucum (L.) R. Br.; Syn. Cenchrus americanus (L.) Morrone, 2n = 2x = 14, AA] has an annual growth habit and can produce interspecific hybrids with elephantgrass [43,44]. For instance, clone 99 is triploid (2n = 3x = 21, AA'B), and clone 100 is hexaploid (2n = 6x = 42, AAA'A'BB). Triploids have an additional copy of A pearl millet genome, whereas the hexaploids have two copies of A pearl millet genome. The A' genome chromosomes are larger than the B genome chromosomes. Moreover, the B genome contributes to elephantgrass perennial life cycle [45]. However, the additional genome dose of the pearl millet (A genome in Clones 99 and 100) may reduce the perenniality due to the annual growth habit genes present in the A genome.
Clone 46 (Kizozi) also showed low persistence. Kizozi was previously studied by Techio et al. [46] and confirmed as a wild species of the genus Cenchrus due to the somatic chromosome number (2n = 54). Wild species could have less adaptive genes when compared with breeding cultivars, which leads them to low persistence. Some of the least persistent clones are tetraploid (e.g., clones 30, 34, 35, 6, 77, 74, 91, and 49). The low persistence presented by clone 49 (Mott) is due to its reduced plant height (dwarfing genes). Mott is specifically adapted to be used as forage, in the pasture, owing to its high nutritive value, and it has previously been identified as of low potential for bioenergy production [9]. The other clones showed reduced biomass yield, low height, thin stalks [8], and consequently low persistence. These clones were classified as Napier, Mercker, or intermediate (Napier/Mercker) group [47], and were studied by Rocha et al. [9] regarding the aptitude for bioenergy purposes. The authors showed that Napier, Mercker, and intermediate groups have a low potential for biomass energy production.
Furthermore, biomass crops should be perennial because a cultivar cannot be regularly substituted in a plantation. Perenniality, unlike annual habit, would be advantageous due to costs reduction with the establishment of energy crops [14].

Clones' persistence applied to bioenergy industries
The aim of supplying biomass to bioenergy is to achieve high energy yields per unit area and the best possible fuel quality. The energy yield comprises the biomass yield and the energy content of the biomass. Fuel quality is determined by the physical and chemical properties and influences the entire process of thermal utilization [48]. However, the greatest economic driver of raw material production is biomass yield [13]. Besides high energy yields per area, biomass energy industries look for a cultivar with higher persistence (high sustained yield). This fact allows reducing costs due to the better scaling and scheduling of planting, harvesting, and storing of the raw material, based on the biomass demand.
Persistence is an economically important trait for perennial forages due to the costs involved in sward establishment. This trait is dependent on the vigor of a plant and its ability to survive and contribute to yield and ground cover [16], and thus, the clumps expansion capacity and the number of basal and axillary tillers may directly impact on persistence. The persistence of elephantgrass clones was measured using clones-ideotype distance over multiharvest. This approach takes into account the yield stability and the high genetic values.
Clone 2 showed the highest sustained biomass yield over multi-harvest (highest persistence), which may be supported by the natural biological nitrogen fixation (BNF) ability of this clone, as reported by Morais et al. [49]. The higher natural nitrogen input (e.g., BFN) is related to higher biomass production and competitive advantages, especially under unfavorable environmental conditions. Furthermore, to make the input:output rate more favorable to energy balance, industrial nitrogen inputs (N fertilizers) must be minimized. Low nitrogen requirement is desirable not only for being a valued constituent in terms of conversion to energy but also for N fertilizer is a costly input [50].
Understanding the yield trajectories patterns of elephantgrass clones allowed detecting the G x E interaction and assess the persistence of these plants. The eigenfunctions indicate valuable insights about the G x E interaction, i.e., there is a gene pool (general adaptability genes) that is expressed in all growth seasons and another gene pool (specific adaptability genes or persistence genes) that is expressed under different environmental conditions. These findings suggest that increase the elephantgrass persistence can be successfully achieved with breeding efforts, as a consequence of wide genetic variability for biomass yield and high heritability values over harvests. Moreover, the random regression model allows optimizing elephantgrass management techniques, as well as developing strategies for crosses (i.e., explore the genetic variability).
Future persistence studies applied to elephantgrass should integrate molecular markers information, besides the phenotypic data, aiming at finding several stable quantitative trait loci (QTLs) across multi-harvest. However, the instable QTLs detected may contribute to the persistence increment by recombining (genes pyramiding) unstable QTLs. QTLs studies would allow identifying the genetic basis of G x E in the form of QTLs x E interaction [36]. In addition, further studies on natural biological fixation nitrogen vs. persistence specifically designed for bioenergy purposes must be developed, mainly to keep a favorable energy balance and reduce costs with N fertilizer input. In this way, breeders will be able to rationally deal with the factors that determine persistence by using these factors in their favor. Moreover, they will breed cultivars that will be adopted in bioenergy industries, mainly due to the increase in persistence.  Table. ASReml output of all models tested. (DOCX) S1 Code. ASReml code. (DOCX)