Genomic selection for productive traits in biparental cassava breeding populations

Cassava improvement using traditional breeding strategies is slow due to the species’ long breeding cycle. However, the use of genomic selection can lead to a shorter breeding cycle. This study aimed to estimate genetic parameters for productive traits based on pedigree (pedigree and phenotypic information) and genomic (markers and phenotypic information) analyses using biparental crosses at different stages of selection. A total of 290 clones were genotyped and phenotyped for fresh root yield (FRY), dry matter content (DMC), dry yield (DY), fresh shoot yield (FSY) and harvest index (HI). The clones were evaluated in clonal evaluation trials (CET), preliminary yield trials (PYT), advanced yield trials (AYT) and uniform yield trials (UYT), from 2013 to 2018 in ten locations. The breeding stages were analyzed as follows: one stage (CET), two stages (CET and PYT), three stages (CET, PYT and AYT) and four stages (CET, PYT, AYT and UYT). The genomic predictions were analyzed via k-fold cross-validation based on the genomic best linear unbiased prediction (GBLUP) considering a model with genetic additive effects and genotype × location interactions. Genomic and pedigree accuracies were moderate to high (0.56–0.72 and 0.62–0.78, respectively) for important starch-related traits such as DY and FRY; when considering one breeding stage (CET) with the aim of early selection, the genomic accuracies ranged from 0.60 (DMC) to 0.71 (HI). Moreover, the correlations between the genomic estimation breeding values of one-stage genomic analysis and the estimated breeding values of the four-stage (full data set) pedigree analysis were high for all traits as well as for a selection index including all traits. The results indicate great possibilities for genomic selection in cassava, especially for selection early in the breeding cycle (saving time and effort).


Introduction
Cassava (Manihot esculenta Crantz) is an important starchy root crop grown predominantly in the tropics that serves as a source of raw material for industrial applications and a food source for millions of people, mainly in tropical and developing regions [1,2]. Moreover, it is the world's second most important source of starch [3], as its underground roots contain more than 80 percent starch by total dry weight [4]. Cassava is a rustic crop; it is believed to have an acclimation mechanism since it has no great environmental requirements and can grow in a wide range of climates and regions [5]. However, the average root yield is far below its potential due to its inadequate management and the use of unimproved varieties. The use of improved varieties is one option to increase the production of the crop without increasing the planting area. Cassava breeding is based on recurrent phenotypic selection, taking advantage of vegetative propagation [6], and includes several stages of selection [7].
In general, for some traits, the improvement rate in cassava using traditional breeding strategies is slow due to the combination of several problems related to the biology of the species, such as poor flowering, a long breeding cycle, limited genetic diversity and a low multiplication rate of planting materials [8]. By virtue of the low multiplication rate of cassava, it takes several years to obtain enough planting material to conduct replicated multi-location evaluations [9]. Usually, the selection cycle requires one to two years to produce botanical seeds of the clones to be tested and four to six years of field evaluations, commonly divided into selection stages such as clonal evaluation trials, preliminary yield trials, advanced yield trials and uniform yield trials [6,10]. Selection decisions are made during this process to reduce the number of genotypes to be evaluated in replicated multi-location trials. However, traditional breeding strategies are still very demanding in terms of financial resources because cassava cultivation is highly labor-intensive and field costs can account for up to ninety percent of total costs [11].
When made possible by advancements in genotyping methods together with reduced costs per sample, information obtained at the molecular level has been used in genetic analyses to accelerate the effective selection of clones. Among molecular markers, single nucleotide polymorphisms (SNPs) stand out for broad genome coverage. SNPs occur as single nucleotide differences between individuals, and thousands of SNP markers are widely used in genome analysis [12]. According to [13], obtaining a catalog of SNPs segregating within farmers' varieties can be used to accelerate breeding programs to improve cassava's quality, addressing nutrition and plant disease concerns. [14] have proposed genomic selection (GS) to identify individual candidates for selection through molecular markers that are in linkage disequilibrium with loci associated with quantitative traits of interest. Thus, GS combines molecular and phenotypic data in a training population to predict the genomic estimated breeding values (GEBV) of individuals in a testing population that has only been genotyped [15]. The main advantage of genomic selection for cassava breeding programs is the ability to predict the agronomic potential of clones at an early stage and with high accuracy, thus reducing the generation interval [16]. This can accelerate the recurrent selection method [2]. Although the results are promising for some traits, we are currently in the early stages of genomic selection for cassava [10].
This study aims to (i) estimate genetic parameters (variance components, heritability and accuracy) from genomic (phenotypes and marker information) and pedigree (phenotypes and pedigree information) predictions for agronomic traits related to starch production, at different stages of a cassava breeding program; (ii) integrate clones' breeding values into a selection index; (iii) obtain the clones' genomic estimated breeding values through a genetic additive effect model with genotype × location interactions via Genomic Best Linear Unbiased Prediction (GBLUP); and (iv) compare the one-stage genomic analysis and the four-stage pedigree analysis to verify the feasibility of performing selection at the early stages of the breeding program using GS predictions.

Plant material
A total of 290 cassava clones from the Cassava Breeding Program (CBP) of Embrapa Mandioca e Fruticultura (Cruz das Almas, Brazil) were genotyped and phenotyped for some important agronomic traits. The set of clones consisted of populations of full-and half-siblings obtained from 30 biparental crosses; commercial clones were used as parents and control trials.

Experimental design and phenotypical evaluation
Cassava clones were evaluated at different selection stages: clonal evaluation trials (CET), preliminary yield trials (PYT), advanced yield trials (AYT) and uniform yield trials (UYT), in field trials carried out from 2013 to 2018 in ten locations within Bahia State that included Cruz das Almas, Santo Amaro and Laje ( Table 1).
The CET plots each consisted of a single row with 5-8 plants per plot, in an augmented block design of 6-19 blocks, with 3-11 checks as a control (with the exception of one field trial that was a complete randomized block design with 2 reps). PYT plots contained two rows with 20 plants per plot, in a complete randomized block design with 2 reps and 4-8 checks as a control. AYT plots contained four rows with 36 plants per plot, in a complete randomized block design with 3 reps and 3-5 checks as a control. Finally, UYT plots included four rows with 80 plants per plot, in a complete randomized block design with 3 reps and 3-6 checks as a control.
Planting was performed from May to July (during the rainy season). Spacings between rows and plants were 0.90 and 0.80 meters, respectively. All trait management was performed, whenever necessary, in accordance with the technical recommendations and standard agricultural practices for cassava [17]. Plants were harvested 11-12 months after planting. All plants in the plots were evaluated.
The traits evaluated were as follows: fresh root yield (FRY, in t ha -1 ), dry matter content by specific gravity method (DMC, %), measured using the gravimetric method described by [18] and expressed in %; dry yield (DY, in t ha -1 ), determined by multiplying FRY by DMC; fresh shoot yield (FSY, in t ha -1 ) and harvest index (HI, %). HI is the proportion of fresh root weight in total biomass.

Genotyping and data quality control
The genotyping of the 290 clones was performed at Cornell University. The DNA was extracted from the leaves of cassava accessions using cetyltrimethylammonium bromide according to the protocol of [19]. Then, the DNA concentration was adjusted to 60 ng/μl, and the samples were sent to the Genomic Diversity laboratory for library preparation, sequencing and bioinformatics analyses. The genotyping by sequencing (GBS) protocol is described in detail by [20]. The DNA was digested with an ApeKI enzyme, a type II restriction endonuclease that recognizes a degenerate five-base sequence and creates a 5' overhang (3 bp). The linkage between ApeKI-cut genomic DNA and adapter was made after the digestion of the samples and the 192-multiplex samples for sequencing. The Genome Analyser 2000 (Illumina, Inc., San Diego, CA, USA) was used to perform the GBS. Data quality control was performed for markers; indels and markers with minor allele frequency (MAF) less than 0.0415-calculated by the formula 1 ffi ffi ffi ffi 2N p , where N is the number of clones [21]-and a call rate of less than 0.90 were removed from the analyses. The GBS coverage was approx. 72%. Excluding the indels, the dataset consisted of 209,316 SNP markers. After quality control, the final dataset used to perform the genomic analyses consisted of 51,259 SNPs, with mean heterozygosity of 0.10 and mean missingness of 3.34%. The missing data were imputed by the heterozygote [22].

Pedigree and genomic statistical analyses
Pedigree and genomic analyses were performed. The pedigree analyses were based on the pedigree information, whereas for the genomic analyses, a genomic kinship matrix was constructed based on the information provided by the 51,259 SNP markers. The clones were evaluated at the four selection stages (CET, PYT, AYT and UYT) for FRY, DMC, DY, FSY and HI. Statistical analyses were performed by splitting the dataset by the stages of the breeding program and analyzing subsets of data with different numbers of stages. The breeding stages were analyzed as follows: one stage (CET), two stages (CET and PYT), three stages (CET, PYT and AYT) and four stages (CET, PYT, AYT and UYT) ( Table 2). This division was made to assess the differences in the genetic parameters that appeared with the addition of new trials and therefore new combinations of year, location and block within location, taking into consideration the unbalance common in this type of system. The overall aim was to verify the possibility of performing selection at an earlier stage of a breeding program (CET). The models used for the pedigree (P) and genomic analyses through markers (M) are as follows, respectively: where y is the vector of phenotypic observations and b is the vector of fixed effects considering the combination of year, location and block within location added to the mean. The vectors a P and a M refer to the random additive genotype effects, a P~N (0, s 2 aP A) in the pedigree analysis and a M~N (0, s 2 aM G) in the genomic analysis, where A is the pedigree relationship matrix, G is the additive genomic relationship matrix, s 2 aP is the genetic variance estimate associated with a P and s 2 aM is the genetic variance associated with a M . The random effect vectors a LP and a LM refer to the genotype × location interaction effect, a LP~N (0, s 2 alP A L ) in the pedigree analysis, and a LM~N (0, s 2 alM G L ) in the genomic analysis, where A L is the block-diagonal of each location pedigree relationship matrix, G L is the block-diagonal of each location genomic relationship matrix, s 2 alP is the variance estimate due to genotype × location interactions with phenotype and s 2 alM is the variance estimate due to genotype × location interactions with marker information. X, Z 1 and Z 2 are the incidence matrices for fixed effects and the random effects of genotypes and genotype × location interactions, respectively. ε represents the residual vector (ε~N(0, s 2 e I)), where s 2 e is the residual variance estimate. We estimated heritability as the ratio of the genetic variance to the sum of the genetic variance, the variance due to the genotype × location interaction and the variance of the residuals (total variance) in each model; the interaction's coefficient of determination was estimated by the ratio of the variance due to the genotype × location interaction and the total variance.
The mixed model equations for best linear unbiased prediction (BLUP) (1) are as follows: where A is the pedigree relationship matrix and A L is the block-diagonal of An i matrices of pedigree relationship matrix in each location: where n i = [4,5,7,10] is the number of locations and i = [1,2,3,4] is the number of stages of evaluation.
The size of A is 290 × 290, regardless of the number of stages included in the analysis. The size of the A L matrix is 457 × 457, 581 × 581, 698 × 698 and 757 × 757 when one, two, three and four stages are included in the analysis, respectively.
The equations of mixed genomic best linear unbiased prediction (GBLUP) models (2) [23] are given as follows: where and G is the genomic relationship matrix, p i and q i are the allele frequencies of marker i, and M is the markers incidence matrix centered by the mean 2p i . G L is a block-diagonal matrix of Gn i matrices in each location: The SNP effects (m) are calculated as follows: The pedigree and genomic analyses were performed using the methodology of mixed models via REML/BLUP, which combines the genetic values of BLUP (best non-biased linear prediction; in genomic analysis it is called GBLUP because it depends on the G matrix) prediction procedure with the REML (residual or restricted maximum likelihood) estimation of variance components. A Likelihood Ratio Test (LRT) and deviance analysis were undertaken to test the random effects, which were tested using the chi-square statistic [24].
The cross-validation method used was the ten-fold, and the clones were randomly assigned to each fold. The training set, composed of 9 of the 10 subsets, was used to estimate marker effects and the remaining subset was the validation set. These marker effects estimates were used to predict the genomic breeding values of the validation set individuals. This process was repeated until all 10 subsets had served as the validation population once. Trait estimates of predictive ability, accuracy and bias were calculated from cross-validation with the training set for pedigree and genomic analyses. The selection efficiency measures were estimated at each fold, and the value presented in this study is the mean of the folds.
The predictive ability was given by the correlation coefficient between predicted genetic values (PGVs) and predicted genetic values added to the mean of the residuals in the validation population. PGVs were the estimated breeding values (EBVs) for the pedigree analysis and the genomic estimated breeding values (GEBVs) for the genomic analysis. Predicted genetic values added to the mean of the residuals for each clone can be interpreted as pseudo-phenotypes (PPs). We adopted this type of cross-validation due to the considerable imbalance of the dataset. For example, the control clones (some of which were parents of other clones to be tested) were replicated at several locations and through the years, whereas many other clones to be tested were replicated only a few times (due to the limited planting material) (S1 Dataset).
The accuracy, one of the main measures to compare models and methods in genomic selection, was calculated as the ratio between the predictive ability and the square root of the phenotypic trait heritability. This measure indicates how accurate the model is in estimating genetic values.
The PPs were linearly regressed on the PGVs, and the regression coefficientb PP;PGV was used to measure the degree of bias of the PGV prediction. The bias relates to the size of the absolute differences between clones' predicted genetic values and their pseudo-phenotypes.
The estimated magnitude of these differences can be quantified by theb PP;PGV regression coefficient and can be overestimated (b PP;PGV < 1) or underestimated (b PP;PGV > 1). A regression coefficient equal to one indicates no bias. Then, here we will represent bias as one unit minus the regression coefficientb PP;PGV (Bias = 1Àb PP;PGV ). Statistical analyses were performed using the package sommer [25] in the software R [26].

Selection index
After obtaining the EBVs and GEBVs of the clones, we calculated an empirical selection index (SI) that integrates five relevant traits, assigning them weights based on the breeding program's objective:

Pedigree analyses
The set of clones presented genetic variability for all evaluated traits (p � 0.01). In relation to the genotype × location interaction effect (GxL), when only one stage was evaluated, the clones presented different behaviors in relation to environmental changes for FRY, DY and FSY (p � 0.01); however, when four stages were included in the analysis, the additive GxL interaction effect was significant (p � 0.01) for all of the traits (Table 3). Trait heritabilities ranged from 0.42 to 0.73, i.e., from moderate to high. DMC and HI were the traits with the highest heritabilities (ranging from 0.69 to 0.73 and from 0.56 to 0.60, respectively), while heritability of the other traits varied across the analyzed data sets ( Table 4). The variation of the genetic parameters with the increment of selection stages occurs due to the addition of phenotypic observations, and therefore information that comes from different experimental designs. Successive stages of selection gradually reduce the number of genotypes as plot-size and locations increase [6]. Trait heritabilities were higher when the full data set (four-stage) was analyzed, with values of 0.52, 0.50, 0.52 and 0.60 for FRY, DY, FSY and HI, respectively, with the exception of DMC, which exhibited higher heritability when one stage was analyzed (0.73). DMC and FSY were the traits with the lowest and highest coefficients of determination of the interaction, respectively. The pedigree means varied from 29.24-31.81 t ha -1 , 33.41-33.54%, 9.63-10.61 t ha -1 , 12.25-13.65 t ha -1 and 70.62-73.86% for FRY, DMC, DY, FSY and HI, respectively, depending on the number of stages included in the analysis.
The coefficient of residual variation (CVe), which is a measure of dispersion and relative variability, ranged from 0.04 to 0.82 across traits and datasets (number of stages included), being higher for FSY (0.62-0.82) and lower for DMC (0.04-0.05). In general, for all traits, the CVe decreased when a greater number of stages was included in the analysis.
Trait predictive abilities ranged from 0.42-0.54, 0.45-0.61, 0.46-0.63 and 0.53-0.61 when one, two, three and four stages were included in the analyses, respectively. For FRY, the predictive ability was 0.44, 0.51, 0.49 and 0.53, when one, two, three and four stages were included in Table 4. Estimates of heritability, mean, coefficient of variation (CVe), predictive ability, accuracy and bias for pedigree analyses. the analyses. For DY, the predictive ability also increased continuously (except for a small reduction in the three-stage analysis), with values of 0.42, 0.51, 0.50 and 0.53 when one, two, three and four stages were included in the analyses. That increasing pattern occurred for all traits with the exception of HI, for which the predictive ability did not alter much, ranging from 0.50 to 0.53. Trait accuracies ranged from 0.62-0.69, 0.67-0.78, 0.69-0.75 and 0.69-0.77 when one, two, three and four stages were included in the analyses, respectively (Table 4). FRY, DMC, DY and FSY presented accuracies greater than 0.70 when more than one stage was included in the analyses (two-, three-and four-stage datasets). Accuracies above 0.70 are very suitable from the selection practice point of view. According to [27], accuracy values between 0.70 and 0.90 are classified as high precision and values above 0.90 as very high precision. The increase in accuracy and predictive ability may reflect a greater experimental reliability of the stages included in the analyses. When we think of the one-stage analysis, with CET only, this field trial was performed with single rows with 6-8 plants per plot, differing from the subsequent stages, which included replications and were performed with 2-4 rows and a greater number of plants per plot. FRY and DY presented higher accuracies when four stages were included in the analysis compared to when only one was evaluated. For FRY, these values were 0.63 and 0.73, and for DY, they were 0.62 and 0.75. For the other traits, those differences in accuracy were of approximately the same amplitude, except for HI, for which the accuracy was 0.69 whether one or four stages were included in the analyses.
The bias ranged from -0.06 (DMC, three stages) to 0.09 (HI, two stages), with the lowest amplitude (highest bias value minus lowest bias value) occurring when four stages were evaluated. In general, the traits presented little bias, ranging from -0.03 to 0.07 for FRY, from -0.06 to 0.01 for DMC, from -0.03 to 0.05 for DY, from -0.02 to 0.07 for FSY and from 0.08 to 0.09 for HI. DY presented zero bias when four stages were included in the analysis. HI was the trait that presented the highest values of bias; therefore, the estimates of the genetic values for HI were slightly overestimated.
Indirect selection is an option when there is a linear relationship between two traits and one trait either has low heritability or is difficult to measure compared to the other trait. For example, FSY is relatively easier to measure than root-related traits, and it had moderate positive correlations with FRY (0.41-0.48) and DY (0.43-0.49) and negative and moderate to high correlations with HI ((-0.59)-(-0.62)). FRY was highly and positively correlated with DY (0.99) ( Table 5). HI was negatively correlated with DMC (-0.31 and -0.19 in the one-stage and four-stage analyses, respectively). The correlations between vectors of estimated breeding values obtained through the different analyses (with a different number of stages of selectionone and four stages) are presented in the diagonal of Table 5. The high correlations, ranging from 0.84 to 0.88, indicate good correspondence between the one-stage and four-stage EBVs, which might allow early selection.

Genomic analyses
The set of clones presented genetic variability for all evaluated traits (p � 0.01), as was shown for the pedigree analyses. The GxL interaction effect was significant (p � 0.01) for all traits with the exception of DMC ( Table 6).
The traits presented moderate to high heritability, ranging from 0.38-0.48, 0.66-0.69, 0.36-0.46, 0.41-0.51 and 0.47-0.60 for FRY, DMC, DY, FSY and HI, respectively (Table 7). In general, heritability estimates obtained in the genomic analyses were lower than those obtained by the pedigree analyses. However, the differences in heritabilities estimated through pedigree and genomic analyses were smaller when a larger number of stages of selection was included in the analysis. DMC was the trait that presented the highest heritabilities, whereas DY presented the lowest heritabilities, regardless of the dataset evaluated. As was found in the pedigree analyses, in the genomic analyses DMC and FSY were the traits with the lowest and highest coefficients of determination of the interaction, respectively.
In general, the predictive ability was higher when four stages were included in the analysis, except for HI, for which it decreased. For FRY, DMC and DY, the predictive abilities were higher in the four-stage analyses (0.50, 0.59 and 0.51, respectively) and lower in the one-stage analyses (0.43, 0.51 and 0.43, respectively); for FSY, the predictive ability was higher in the four-stage analysis (0.50) but lower in the two-stages analysis (0.44) compared to one stage. For HI, predictive ability was also higher for the one-stage analysis (0.54) than for the twostage analysis (0.42).
The  (Table 7). Across the analyses based on different datasets (with a different number of stages included), accuracy estimates varied (but they were at least greater than 0.56). As an example, for DY, the accuracy values were 0.63, 0.72, 0.67 and 0.72 when one, two, three and four stages were included in the analyses. For HI, the scenario was quite different, with a large reduction in accuracy from one to two stages, with values of 0.71, 0.56, 0.62 and 0.61 when one, two, three and four stages were included. The inclusion of Table 5 more information and the imbalance common for this type of system (more stages-selection included-fewer clones-more locations in the latter stages) may have affected the traits differently in relation to parameters of genomic selection precision. In general, the traits presented no or very little bias, with the exception of HI when two, three or four stages were included in the analyses. As shown in the pedigree analysis, the genetic values for HI were overestimated. The bias in the genomic analyses varied slightly more, in terms of amplitude, when compared to the biases from the pedigree analyses (they ranged from -0.02 to 0.27 for genomic analyses and from -0.06 to 0.09 for pedigree analyses).

. Trait correlation estimates between vectors of EBVs from pedigree analysis, above the diagonal considering the four stages of evaluation, and below the diagonal considering only one stage of evaluation; along the diagonal is the correlation between vectors of genetic values through the different analyses of stages of selection.
The estimates of the correlations based on the GEBVs (Table 8) were similar to the estimates of the correlations based on the EBVs ( Table 5). The most expressive differences of trait correlations presented in Tables 5 and 8 were between the trait pairs FRY and FSY, FRY and HI, DY and FSY, DY and HI, and DMC and HI. FRY and DY were highly and positively correlated (0.99 with one-stage and four-stage datasets). HI and FSY were negatively correlated, with values of -0.62 and -0.71 when one and four stages were included in the analyses, respectively. The differences in correlation with respect to the stage of the breeding program may be explained by the fact that the relationships among traits change through the breeding program due to selection [6]. The diagonal of Table 8 shows the correlation between vectors of GEBVs through the different analyses (with different numbers of stages of selection-one and four stages). These values (ranging from 0.80 to 0.86) indicate good correspondence between the vectors of GEBVs from one-and four-stage analyses, which may be evidence in favor of early selection. The SNP effects for the five traits, considering one and four stages of evaluation, are shown in Fig 1. Comparing the one-stage with the four-stage evaluation graphs, one can see differences in terms of which chromosome the higher-effect SNPs were located on and also in the magnitude of the effects resulting from the addition of extra information from one stage to four stages of evaluation. These differences could be explained by the fact that in the analyses that included only one stage, the clones were evaluated at four locations, while in the analyses that included the four stages of selection, the clones were evaluated in ten locations and the amount of individual information doubled ( Table 2).

Comparisons
If the results of the one-stage and four-stage evaluations are comparable, it would verify the possibility of using the results from genomic analysis to select clones at the initial stage of a cassava breeding program. To perform this comparison, we compared the rankings of the top 10 clones based on the one-stage genomic analysis (aiming at a reduction of the selection interval) and the four-stage pedigree analysis (complete data set) for each trait as well as for the selection index. We also calculated the correlations between GEBVs (one-stage analysis) and EBVs (fourstage analysis), which were high for all traits, ranging from 0.82 (FRY and DY) to 0.87 (FSY).
For FRY, all clones in the top 10 ranking based on the one-stage genomic analysis were from the cross of Fécula Branca and BRS Formosa, indicating that their progeny represent an important full-siblings family for this trait. There was a good correspondence between the GEBVs from the one-stage genomic analysis and the EBVs from the four-stage pedigree analysis, with a correlation of 0.82 (S1 Table). Many of the clones in the two top-10 rankings coincided (eight out of ten, from the 'Fécula Branca x BRS Formosa' crossing). In the top 10 based on the four-stage pedigree analysis, a clone from the cross between BGM-1683 and Fécula Branca also stood out.
There was a good correspondence between GEBVs and EBVs for DMC (correlation of 0.83), and the one-stage and four-stage top 10 lists shared five clones. Progeny from the cross between Fécula Branca and BRS Formosa also stood out for this trait, as well as crossings between the clones Equador72 and BGM-0728, Equador72 and BGM-1124, Cascuda and BGM-0728, BGM-1662 and Fécula Branca and the BRS Mulatinha self-crossing (S2 Table).
For DY, all of the clones in the rankings originated from the cross between Fécula Branca and BRS Formosa, and there was a good correspondence between the GEBVs and the EBVs (correlation of 0.82) (S3 Table). There was a high coincidence between the clones presented in the top 10 rankings for the traits FRY and DY (nine out of ten clones), which was already expected since those traits were highly correlated (0.99). Fresh shoot yield was the trait that presented the highest amplitude in the top 10 ranking list. The amplitude of GEBVs (one-stage genomic analysis) for the top 10 clones was 20.90; for EBVs (four-stage pedigree analysis), the amplitude was 8.62. This trait also presented a high correlation between the GEBVs and the EBVs (0.87) (S4 Table).

Table 8. Trait correlation estimates between vectors of GEBVs from genomic analysis, above the diagonal considering the four stages of evaluation, and below the diagonal considering only one stage of evaluation; along the diagonal is the correlation between vectors of genomic values through the different analyses of stages of selection.
The same pattern was observed for HI, which presented a correlation of 0.84 between GEBVs and EBVs. Moreover, for this trait, all clones in both rankings were from the crossing of Fécula Branca and BRS Formosa (S5 Table).
Assuming that it is a good idea to select clones based on their GEBVs obtained from the one-stage genotypic analysis, the genetic gains from selecting the top 10% of clones (29 clones) were calculated for each trait. The genetic gains were as follows: 15.91% for FRY, 5.15% for DMC, 14.76% for DY, 43.11% for FSY, and 8.47% for HI.
For the selection index, the correlation between GEBVs and EBVs was 0.83 (Table 9). Comparing the top 10% of the clones based on the rankings from these analyses, it was observed that there was a coincidence of 72.41% (21 out of 29 clones). Several clones from Fécula Branca × BRS Formosa crosses presented good performance, as previously noted for individual traits. However, clones from a considerable number of other crosses were also highlighted, such as Equador 72 × Eucalipto, Equador 72 × BGM-0728, BGM-1683 × Fécula Branca, and BRS Formosa × BRS Jari. The genetic gains made by selecting the top 10% of clones based on SI (calculated from the GEBVs of the one-stage analysis) were estimated as follows: 15.13% for FRY, 0.39% for DMC, 14.38% for DY, 24.97% for FSY, and 1.85% for HI.

Discussion
Cassava breeding is usually based on a recurrent phenotypic selection with several stages. For some traits, the improvement rate in cassava is slow due to a combination of several problems related to the biology of cassava. Genomic selection could be one promising approach to reduce the selection interval by predicting the agronomic potential of clones earlier in the process [16,15,10].
In our study, we evaluated 290 cassava clones from biparental crosses, which were genotyped and phenotyped for productive traits. Our major objective was to characterize the set of clones, determining genetic parameters relying on pedigree and marker information, as well as to compare the results of one and four selection stages to verify the feasibility of genomic selection at the program's early stage. We found substantial genetic variation among the 290 cassava clones for all traits evaluated. Moreover, they showed intermediate to high heritability, and consequently, good progress can be made in selecting clones for those traits. The majority of traits presented significant genotype × location interaction effects (p � 0.01), with the exception of DMC (one-stage pedigree analysis and one-and four-stage genomic analyses) and HI (one-stage pedigree analysis).
The coefficient of determination of the interaction represents how much of the total variation is explained by the genotype × location interaction. According to our results, DMC and HI appear less impacted by genotype × location interactions and their impacts than the other traits (FRY, DY and FSY).
FRY and DMC are important agronomic traits for cassava, and the mean trait values found in the present work were consistent with the results of [28]. They evaluated 627 genotypes in a clonal trial, and their estimates for FRY ranged from 9.64 to 63.66 t ha -1 (mean of 34.37 t ha -1 ) and for DMC from 20.65 to 41.25% (mean of 33.47%). In the study by [6], they performed

)), FSY (fresh shoot yield (t ha -)) and HI (harvest index (%)), considering only one and four stages of evaluation.
https://doi.org/10.1371/journal.pone.0220245.g001 successive stages of phenotypic assessment of cassava adapted to the sub-humid environment, and the observed means for FRY, DMC and HI were 24.00 t ha -1 , 33.57% and 58%, respectively. In the present study, when only the one-stage pedigree analysis (CET) was assessed, we found means of 29.24 t ha -1 , 33.49% and 70.62%, for FRY, DMC and HI, respectively (Table 4).
Dry matter content is usually what determines the price paid to cassava producers since it is directly related to the starch yield and, consequently, its several derivative products [29,30]. Despite its importance for starch yield (SY), the correlations between DMC and DY (which is a redundant trait for SY since the correlation between SY and DY is 0.99 (data not shown)) in the present work were low, ranging from 0.04 to 0.11 (Tables 5 and 8), which differs from the values reported by [30], where the genetic and phenotypic correlations for DMC and SY were 0.36 and 0.42, respectively.
Harvest index is commonly considered an indirect cassava yield indicator based on reports of high correlation between HI in single row trials and FRY performance in replicated plots in Table 9. Genomic selection in cassava breeding populations multi-location trials [31,3]. In our work, the correlations between HI and FRY were moderate, ranging from 0.26 to 0.41 (Tables 5 and 8). HI and FSY are negatively correlated traits. It is important to be aware that plants with high HI and low FSY, even with high-yielding roots, are undesirable because they produce little propagating material. In this case, a high HI may not reflect a high-yielding root, but a low production of fresh shoot yield. It is important to seek a balance between the production of roots and shoot [32]. The total dataset consisted of clones from biparental crosses and commercial clones (used as parents and controls); therefore, there were half-sibling and full-sibling populations (clones sharing one or two parents, respectively). According to [15], using GS to achieve a shorter interval cycle is predicted to be favorable within full-sibling families because biparental populations have high linkage disequilibrium (LD) between marker alleles and trait alleles with no group structure.

Genomic analysis-One stage
All traits presented moderate to high heritability, which shows great potential for obtaining genetic gains. Traits with higher heritability tend to show higher accuracy, but this was not the case in some situations for the present work. For example, in one-stage genomic analysis, FSY and HI presented the highest accuracies (0.69 and 0.71, respectively), but the heritability of DMC (0.66) was higher than those of FSY and HI (0.45 and 0.47, respectively; Table 7). As reported by [33], those discrepancies can presumably be caused by the proximity of SNP markers to the QTL (quantitative trait loci) and the genetic architecture. In pedigree and genotypic analyses, DMC was the trait that presented the highest heritability, which might be related to the fact that DMC was the only trait that did not present a significant (p � 0.01) genotype × location interaction effect (Table 6). According to [33], the genotype × location effect and the relatedness among genotypes affect the precision of genomic selection estimates.
Using only the most informative SNPs, [16] observed an increase in the accuracy for the traits FRY, DMC and SY. In their work, 358 cassava accession were evaluated using 390 marker SNPs, and broad heritabilities (obtained by REML analysis of phenotypic data) of 0.72, 0.67 and 0.69, respectively, were found. For FRY, DMC and SY, they obtained accuracies of 0.31, 0.20 and 0.33, respectively. When they selected markers, the accuracies were 0.76, 0.67 and 0.77 for FRY, DMC and SY, respectively, with predictive abilities of 0.38, 0.30 and 0.39. In the present work, in our four-stage genomic analysis, we observed predictive abilities for FRY, DMC and DY of 0.50, 0.59 and 0.51, respectively (Table 7), without selecting markers, taking into consideration 51,259 SNPs. The heritabilities for FRY, DMC, FSY and HI were considerably higher (ranging from 0.46 to 0.69; four-stage, Table 7) than those reported by [33], which ranged from 0.11 to 0.28. [33] adopted three different cross-validation (CV) approaches: CV without close relatives, random fivefold CV and CV with close relatives. Accuracies varied from 0.46 to 0.48 for DMC, from 0.43 to 0.48 for HI, from 0.36 to 0.41 for FRY, and from 0.23 to 0.30 for FSY; the lowest accuracy value was obtained using CV without close relatives, and the highest was obtained by random fivefold CV. The genomic accuracies that we have presented in this study varied from 0.60-0.70, 0.56-0.71, 0.61-0.69 and 0.68-0.72 for DMC, HI, FRY and FSY, respectively. It is possible that the values presented here were higher because we adopted a random CV approach and because we had a considerable number of related clones.
[10] analyzed data from three African cassava research institutions and assessed the accuracy of seven prediction models for seven traits. Those authors reported that some trait-dataset combinations exhibited better predictive accuracies than others. For example, for DMC, the estimated accuracies varied from 0.27 to 0.66 across the evaluated breeding programs. According to [34] and [15], when a large number of loci controls the trait, genomic accuracy depends on several factors, among them the training population size, trait heritability and genetic diversity and its relationship with the test population.
The comparisons between the top 10 clone rankings of the one-stage genomic analysis and the four-stage pedigree analysis provided insights about the feasibility of using the clonal evaluation trials' GEBVs for early selection. This approach would reduce the selection interval and consequently minimize the extensive, costly and time-consuming phenotypic evaluations that are usually conducted in a traditional cassava breeding program [35]. Among the traits, the correlation between GEBVs (one-stage genomic analysis) and EBVs (four-stage pedigree analysis) ranged from 0.82 to 0.87, and between 50 and 90% of their top 10 rankings coincided. The clones that originated from the 'Fécula Branca x BRS Formosa' crosses stood out for all of the traits as well as for the selection index. This indicates that this specific combination is favorable for the improvement of cassava productive traits. The feasibility of using genomic selection at the early stages of evaluation is further supported by the fact that the accuracy estimates obtained for the genomic analysis that included only the clonal evaluation trial (one stage) ranged from 0.60 (DMC) to 0.71 (HI). The harvest index, which was the trait that presented the highest accuracy in the one-stage genomic analysis, is an important trait for cassava breeding programs as it is usually used as an indirect indicator of cassava yield.
The genetic gains, calculated separately for each trait (based on the selection of the top 10% of clones by GEBVs obtained by means of the one-stage genotypic analysis), represent great perspectives for cassava breeding (genetic gains of 15.91% for FRY, 5.15% for DMC, 14.76% for DY, 43.11% for FSY, and 8.47% for HI, respectively). [30] calculated the genetic gains for DMC and SY, selecting 30 out of 471 clones. They obtained genetic gains of 10.75 and 5.50% for DMC and 74.62 and 49.95% for SY when comparing the overall averages of the experiments and controls, respectively. On the other hand, the genetic gains when selecting the best 10% of clones based on the proposed selection index (calculated with GEBVs from the onestage genotypic analysis) were as follows: 15.13% for FRY, 0.39% for DMC, 14.38% for DY, 24.97% for FSY, and 1.85% for HI. Although the selection gains were lower when based on the selection index compared to the selection by trait individually, with the selection index, it is possible to improve key traits simultaneously [6].
It is worth mentioning that although many of the clones with high selection index values come from the Fécula Branca × BRS Formosa crossing, many others came from other crossings. According to [7], this variability weakens the identity of families and supports the idea that outstanding hybrids can be obtained from essentially every family. This idea arises from the large within-family segregations (due to the fact that progenitors are generally heterozygous) that cassava breeders observe in their evaluation fields.
Traditional assessments based solely on phenotypic information are still very effective and work well for cassava breeding since most of the traits have high heritability. However, here, we wanted to present genomic selection as an option and a great tool to obtain accurate estimates by assessing clones' genomic estimated breeding value earlier in the cassava breeding cycle, thus saving resources such as time, planting area, and direct and indirect costs. The evaluated clones that stood out in the present work could be new varieties to be released in the future and/or could be used as a germplasm source for the development of new varieties with high allele frequencies for starch-related productive traits.

Conclusions
The heritability and accuracy values obtained from genomic and pedigree analyses were moderate to high for important traits related to starch production. The results indicate excellent potential for breeding and genomic selection in these cassava populations aimed at increasing starch production for commercial use. In addition, they indicate an optimal potential for selection at early stages of a cassava breeding program (CET), since the correlations between the GEBVs of one-stage genomic analysis and the EBVs of the four-stage pedigree analysis were high for all traits. Moreover, the GEBV-based and EBV-based rankings of the top 10 best clones overlapped by 5 to 9 clones for individual traits, and the rankings of the 10% best clones exhibited 72% overlap for the selection index. The estimated genetic gains obtained by using the 10% best clones identified by the selection index with the GEBVs from the one-stage genomic analysis were 15.13%, 0.39%, 14.38%, 24.95% and 1.84% for FRY, DMC, DY, FSY and HI, respectively. Thus, genomic selection enables effective early selection in the first stage of clonal evaluation. (DOCX) S1 Dataset. i) estimated breeding values (EBVs) and genomic estimated breeding values (GEBVs) based on the pedigree and genomic analysis, respectively, for the different breeding stages were analyzed as follows: one stage (Clonal evaluation trial-CET), two stages (CET and preliminary yield trial-PYT), three stages (CET, PYT and advanced yield trial-AYT) and four stages (CET, PYT, AYT and uniform yield trial-UYT); ii) phenotypic raw data; iii) SNPs effects for the two breeding atges (one and four), and iv) SNPs location on the cassava chromosome.