Phenotypic and molecular characterization of sweet sorghum accessions for bioenergy production

Sweet sorghum [Sorghum bicolor (L.) Moench] is a type of cultivated sorghum characterized by the accumulation of high levels of sugar in the stems and high biomass accumulation, making this crop an important feedstock for bioenergy production. Sweet sorghum breeding programs that focus on bioenergy have two main goals: to improve the quantity and quality of sugars in the juicy stem and to increase fresh biomass productivity. Genetic diversity studies are very important for the success of a breeding program, especially in the early stages, where understanding the genetic relationship between accessions is essential to identify superior parents for the development of improved breeding lines. The objectives of this study were: to perform phenotypic and molecular characterization of 100 sweet sorghum accessions from the germplasm bank of the Embrapa Maize and Sorghum breeding program; to examine the relationship between the phenotypic and the molecular diversity matrices; and to infer the population structure in the sweet sorghum accessions. Morphological and agro-industrial traits related to sugar and biomass production were used for phenotypic characterization, and single nucleotide polymorphisms (SNPs) were used for molecular diversity analysis. Both phenotypic and molecular characterizations revealed the existence of considerable genetic diversity among the 100 sweet sorghum accessions. The correlation between the phenotypic and the molecular diversity matrices was low (0.35), which is in agreement with the inconsistencies observed between the clusters formed by the phenotypic and the molecular diversity analyses. Furthermore, the clusters obtained by the molecular diversity analysis were more consistent with the genealogy and the historic background of the sweet sorghum accessions than the clusters obtained through the phenotypic diversity analysis.
The low correlation observed between the molecular and the phenotypic diversity matrices highlights the complementarity between the molecular and the phenotypic characterization to assist a breeding program.


Introduction
The current policy in several countries, including Brazil, is to promote research and development on renewable energy sources [1][2][3]. Many countries are seeking to increase the share of biofuels in their energy mix and, consequently, to reduce carbon dioxide emissions to the atmosphere by burning less fossil fuel [4]. In Brazil, sugarcane stands out as a feedstock for ethanol production [1,5], but the country has difficulty meeting its domestic demand, especially in the sugarcane off-season. Sweet sorghum [Sorghum bicolor (L.) Moench] is a type of domesticated sorghum characterized by the accumulation of high levels of sugar in the stems and high biomass production, making this crop an important alternative for bioethanol production and cogeneration of energy [3,6,7,8]. Sorghum is a grass of African origin that resembles sugarcane, a close relative. Thus, sorghum juice can be easily extracted to produce ethanol in the same distilleries that process sugarcane. In addition, the sorghum harvest can be carried out during the sugarcane off-season, just prior to the beginning of sugarcane processing, benefiting the ethanol industry. Besides these advantages, sweet sorghum cultivars that are insensitive to photoperiod have a vegetative cycle ranging from 90 to 130 days, much shorter than that of sugarcane [9][10][11][12].
Sweet sorghum accessions were introduced into the United States, from China and Africa, 150 years ago. The variety Chinese Amber was the first introduction of sweet sorghum into the USA, in 1853. Several cultivars came from Africa, such as Orange, Sumac, Redtop, Gooseneck, Texas Seed Cane Ribbon, Honey and White African [13]. The center of sorghum domestication is in central Africa, and the highest levels of genetic and phenotypic diversity in both cultivated and wild sorghum are found in this region [14]. Other cultivars were introduced over the years, such as Collier, South Africa, Mclean, Australia, and others of unknown origin, such as Folger, Coleman, Sugar Drip, and Rex [15,16]. Most modern sweet sorghum varieties were developed in the period 1940-1983 with support from the United States Department of Agriculture (USDA) and the Sugar Crops Field Station, located in Meridian, Mississippi. Landraces, i.e. inbred lines considered native to Africa, were used in several studies for the genetic improvement of sweet sorghum. In the 1850s, the main goal was to use sweet sorghum for the production of syrup, which reached about 136 million liters in 1946, replacing crystal sugar during World War II [17,18]. The focus was to develop materials for syrup production with disease resistance, high soluble solids content (Brix), good purity (high sucrose) and good quality of sugars in the juice. The landraces MN 960, MN 1048, MN 1054, MN 1056, MN 1060 and MN 1500 were widely used in the early breeding programs in the United States [19].
Sorghum was introduced into Brazil in the early twentieth century, mostly through initiatives and efforts of research institutes and universities [20,21]. In 1976, influenced by the National Alcohol Program (in Portuguese, Programa Nacional do Álcool, Pró-Álcool), Embrapa Maize and Sorghum initiated a research program for sweet sorghum cultivar development and feasibility studies for ethanol production, especially for use in small distilleries [20], to supply liquid fuel for agricultural expansion in the Central-West region of Brazil. However, the sweet sorghum breeding program was put on hold in the mid-1980s, when a modified government policy restricted the incentives to large distilleries. Embrapa's sweet sorghum breeding program was reactivated in 2008 following the guidelines of the Brazilian National Agro-Energy Plan (PNA 2006-2011) [22]. For the production of bioenergy, the main objectives of a sweet sorghum breeding program are to improve the quantity and quality of sugars in the juice extracted from the stems and to increase green biomass productivity. A high-potential sweet sorghum cultivar should have the following features: high biomass yield capacity, lodging resistance, high percentage of extractable juice, high content of soluble solids in the juice, high purity of sugars, resistance to major diseases and tolerance to drought and waterlogging [23].
Sorghum is a species that exhibits a diverse set of agronomic and morphological characteristics [24,25]. Harlan and de Wet [26] classified sorghum into five major races (bicolor, caudatum, durra, guinea and kafir) and 10 hybrid races that are combinations of the basic races. This classification is simple and primarily based on morphological features of panicle and grain. However, sweet sorghum has not been bred for panicle or grain characteristics, and there are few insights about its origin. Therefore, the relationship between sweet sorghum and the traditional classification of major sorghum races is inconsistent [19,27]. Sweet sorghum varieties have been developed using sweet sorghum introductions, both germplasm bank accessions and landraces, many of them originally used for grain or forage production [28]. Genetic diversity studies can be very useful for sweet sorghum breeding programs, in which understanding the relationship between accessions is essential to define breeding strategies and to identify superior parents for the development of new cultivars [19,27,29,30,31,32].
Several strategies have been used to assess genetic diversity in many crop species [33][34][35][36][37] based on morphological, agronomic, molecular, geographical and biochemical differences among accessions. Over the years, studies have estimated genetic diversity in cultivated sorghum based solely on phenotypic traits [38][39][40]. Even though phenotypic characterization provides a range of information about the genetic variability among accessions in a germplasm bank, the effects of environment, genotype-by-environment interaction, and measurement errors also contribute to the observed differences [41,42]. Thus, some authors have reported that the combined use of molecular markers and phenotypic traits could be advantageous to quantify the genetic differences among accessions [33,43,44]. However, few studies have assessed genetic diversity in sweet sorghum using morpho-agronomic traits and molecular markers simultaneously [32,45]. Wang et al. [32] assessed the genetic diversity of 142 sweet sorghum parent lines used in the hybrid breeding program of Heilongjiang Academy of Agricultural Sciences (Harbin, China) based on agronomic traits and simple sequence repeat (SSR) markers, and concluded that both tools should be considered simultaneously for the diversity analysis in hybrid breeding programs. Other studies have compared different types of cultivated sorghum using genetic diversity analyses based on molecular markers [19,29]. For example, Murray et al. [19] investigated the genetic relationship between sweet and grain sorghums using SSR and single nucleotide polymorphism (SNP) markers, and Ritter et al. [29] used amplified fragment length polymorphism (AFLP) markers to assess and compare the level of genetic diversity between sweet and grain sorghums.
The use of molecular markers in genetic diversity analyses has some advantages over phenotypic characterization, since molecular markers are not influenced by the environment and allow identification of differences at the DNA level that would be imperceptible via phenotyping [46][47][48]. Different molecular markers have been widely used to assess genetic diversity in sorghum [27,35,49,50,51,52,53]. However, SNP markers have some advantages, for example locus specificity, codominance, abundance along the genome, and potential for high-throughput analysis. Recently, costs and processing time were dramatically reduced by a variety of high-throughput SNP genotyping platforms, which made it possible to use abundant SNP markers in the routine activities of breeding programs [42,54]. For example, genotyping-by-sequencing (GBS) [55] has provided new opportunities for breeders with cost-effective genome-wide scanning and multiplexed sequencing platforms [54,56]. Therefore, molecular markers are an excellent tool to efficiently assess the genetic diversity in a breeding program.
The aims of this study were: i) to perform phenotypic and molecular characterization of 100 sweet sorghum lines from the germplasm bank of the Embrapa Maize and Sorghum breeding program, using morpho-agronomic traits and SNP markers obtained via GBS; ii) to examine the relationship between the phenotypic and molecular diversity matrices; and iii) to infer the population structure in the sorghum accessions.

Plant material
One hundred sweet sorghum accessions (S1 Table) from the germplasm bank of the Embrapa Maize and Sorghum breeding program were used. These sorghum accessions were classified as historical lines, modern lines and landraces according to the genealogy and historic background available in the GRIN (Germplasm Resources Information Network) database [57]. The historical lines (HL) are those developed and used between 1850 and the early 1900s, and frequently have unknown origin and lack concrete information about genealogy [19]. Modern lines (ML) correspond to those that have been genetically improved and have pedigree information. Landraces are those accessions collected in Africa and Asia that are often phenotypically diverse, but may exhibit some genetic similarity. The landraces were classified as LIS (Landrace World Collection-ICRISAT sorghum collection), LMN (Landrace Meridian Mississippi-USDA sorghum collection) and LSSM (Landrace Sorghum Seed Montpelier-CIRAD sorghum collection). The lines were not classified according to the races due to the inconsistent relationship previously detected between sweet sorghum and the traditional classification of major sorghum races.

Phenotypic data
For phenotypic characterization, morphological traits related to the plant architecture, stem, leaf, panicle and caryopsis, and agro-industrial traits related to the production of sugars and biomass were used. The morphological traits were selected according to the list of Sorghum bicolor descriptors for cultivar registration purposes, based on the "Instructions for the Execution of Distinctness Tests, Homogeneity and Stability of Sorghum Cultivars" [58], which resulted in a total of 44 descriptors (S2 Table). Morphological characterization was performed without replication in a greenhouse at Embrapa Maize and Sorghum, in Sete Lagoas, State of Minas Gerais, Brazil (19°28'57'' south latitude and 44°14'48'' west longitude). Agro-industrial traits of economic importance for bioenergy production were characterized in a field experiment with the 100 lines in a 10 x 10 lattice design with three replications, with plots of four rows, each 5 m long and spaced 0.70 m apart.
The following traits were evaluated: days to flowering (FLOW, in days after sowing), recorded when 50% of the plants in a plot had started shedding pollen; plant height (PH, in meters) as an average in each plot, measured from the soil surface to the top of the panicle; fresh biomass yield (FBY, in t ha⁻¹), obtained by weighing all plants from the effective plot area; juice extraction (EXT, in %), using a hydraulic press, from five to eight plants sampled randomly per plot, without panicles; total soluble solids (TSS, in °Brix) in the juice extracted with the hydraulic press, measured with a digital automatic refractometer; sucrose concentration in the juice (POL, in %), which is the measure of the amount of sucrose in the sugar mixture; reducing sugars in the juice (RSJ, in %), in which the weight of juice was calculated through the equation adapted from CONSECANA [59]; lignin (LIG, in %), hemicellulose (HEM, in %) and cellulose (CEL, in %), measured following the sequential extraction method proposed by Van Soest and Wine [60], using samples of the stalk after juice extraction, which were dried in an oven for 72 hours at 65 °C. The field experiment was planted in December 2013, in the experimental area of Embrapa Maize and Sorghum in Sete Lagoas, State of Minas Gerais, Brazil. The cultural treatments were those recommended for the sweet sorghum crop [61].

Molecular markers data
Leaf samples were collected from five plants per accession and the DNA extraction was performed using the DNeasy Plant Mini Kit (QIAGEN, Germantown, Maryland, USA). The quality and quantity of extracted DNA were checked in agarose gel and with a NanoDrop ND-1000 Spectrophotometer. Genotyping-by-sequencing (GBS) [55] was performed by the Institute for Genomic Diversity at Cornell University. Genomic DNA was digested individually with ApeKI, and the bar-coded DNA samples were pooled and sequenced on a HiSeq2000 platform (Illumina Inc., San Diego, California, USA). Sequencing data were separated for each accession and aligned to the BTx623 Sorghum bicolor reference genome [62,63] version 2.1, using the Burrows-Wheeler Aligner (BWA) software [64]. SNPs were called using the GBS pipeline available in the software TASSEL [65]. Subsequently, SNP markers were filtered considering a minor allele frequency (MAF) of 5% and a maximum of 5% of missing genotypes per locus.
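The two SNP filters described above (MAF of 5% and at most 5% missing genotypes per locus) can be illustrated with a minimal Python sketch. It is not the TASSEL implementation: the function name `filter_snps`, the 0/1/2 allele-count coding and the use of NaN for missing calls are assumptions made only for this example.

```python
import numpy as np

def filter_snps(genotypes, maf_min=0.05, max_missing=0.05):
    """Return a boolean mask of SNPs (columns) that pass both filters.

    genotypes: 2-D array, rows = accessions, columns = SNPs, coded as the
    number of copies of one allele (0, 1, 2), with np.nan for missing calls.
    This coding is an assumption of the sketch, not the TASSEL format.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)
    # Fraction of accessions without a genotype call at each SNP.
    missing_rate = (~called).sum(axis=0) / g.shape[0]
    # Allele frequency among called genotypes (two alleles per diploid call).
    allele_sum = np.nansum(g, axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        freq = np.where(n_called > 0, allele_sum / (2.0 * n_called), np.nan)
        maf = np.minimum(freq, 1.0 - freq)
        keep = (maf >= maf_min) & (missing_rate <= max_missing)
    return keep
```

Applying the mask as `genotypes[:, filter_snps(genotypes)]` would keep only the loci that pass both thresholds; SNPs with no called genotype are dropped because their MAF is undefined.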

Phenotypic analyses
For the morphological traits, a correlation analysis was performed based on Pearson's correlation coefficient [66] to identify highly correlated variables. For each pair of variables with a correlation coefficient greater than 0.80, one of the two variables was removed from the diversity analysis.
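As an illustration of this pruning step, the sketch below keeps a descriptor only if its correlation with every descriptor already kept does not exceed 0.80. The text does not say which member of a correlated pair was dropped, so the greedy keep-the-first rule, the use of the absolute correlation and the function name `drop_correlated` are assumptions of this example.

```python
import numpy as np

def drop_correlated(data, names, threshold=0.80):
    """Greedily keep descriptors whose |r| with every kept one is <= threshold.

    data: 2-D array, rows = accessions, columns = descriptors.
    names: descriptor labels, in the same order as the columns.
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    kept = []
    for j in range(r.shape[0]):
        # Column j survives only if it is not highly correlated
        # with any column that has already been retained.
        if all(abs(r[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

The result depends on column order, so in practice the descriptors judged most informative would be placed first.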
Agro-industrial phenotypic data were analyzed using the following mixed model, in which the number of days to flowering (FLOW) and the plant height (PH) were used as covariables:

$$y_{ijk} = \mu + \beta d_{ik} + \gamma h_{ik} + r_k + b_{jk} + g_i + \varepsilon_{ijk}$$

where $y_{ijk}$ is the phenotypic observation of genotype $i$ at block $j$, in replication $k$; $\mu$ is the general mean; $d_{ik}$ is the number of days to flowering for genotype $i$, in replication $k$, and $\beta$ is the corresponding fixed effect; $h_{ik}$ is the plant height for genotype $i$, in replication $k$, and $\gamma$ is the corresponding fixed effect; $r_k$ is the fixed effect of replication $k$; $b_{jk}$ is the random effect of block $j$, in replication $k$, with $b_{jk} \sim N(0, \sigma^2_b)$; $g_i$ is the random effect of genotype $i$, with $g_i \sim N(0, \sigma^2_g)$; and $\varepsilon_{ijk}$ is a random non-genetic effect, with $\varepsilon_{ijk} \sim N(0, \sigma^2)$. FLOW and PH have a direct effect on the phenotype of other agro-industrial traits related to the production of sugar and biomass. The correction for these phenological covariables was necessary to identify accessions that have favorable phenotypes for the production of sugar and biomass independently of their flowering time and plant height. Thus, this correction was considered when fitting the model for: fresh biomass yield (FBY), juice extraction (EXT), total soluble solids (TSS), sucrose concentration in the juice (POL), reducing sugars in the juice (RSJ), lignin (LIG), hemicellulose (HEM) and cellulose (CEL). Random and fixed effects in the model were tested using the likelihood ratio test (LRT) [67] and the Wald test [68], respectively, considering a 5% significance level. The adjusted means of each line for the agro-industrial traits were obtained via best linear unbiased predictor (BLUP) [69,70]. Variance components were estimated via residual maximum likelihood (REML) [71,72]. Heritabilities were calculated as:

$$h^2 = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_R / r}$$

where $\sigma^2_G$ is the genetic variance; $\sigma^2_R$ is the residual variance; and $r$ is the number of replications. All mixed model analyses were performed using the software GenStat v15 [73].
Phenotypic correlations between morphological and agro-industrial traits were then estimated by Pearson's method, using the R package Hmisc (R Core Team 2015).
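The heritability described above is built from the genetic variance, the residual variance and the number of replications. A small sketch using the entry-mean form commonly applied to replicated trials, h² = σ²_G / (σ²_G + σ²_R / r), shows how the three quantities combine. That this is exactly the expression used in the study is an assumption, and the function name `entry_mean_heritability` is hypothetical.

```python
def entry_mean_heritability(var_g, var_r, n_reps):
    """Heritability on an entry-mean basis (assumed form, see lead-in).

    var_g: genetic variance component (e.g. from REML).
    var_r: residual variance component.
    n_reps: number of replications over which line means are taken.
    """
    if n_reps <= 0:
        raise ValueError("n_reps must be positive")
    # Averaging over replications shrinks the residual contribution.
    return var_g / (var_g + var_r / n_reps)


# Equal genetic and residual variances with three replications.
print(entry_mean_heritability(3.0, 3.0, 3))  # 0.75
```

With equal variance components, going from one to three replications raises the value from 0.50 to 0.75, which shows why replication matters for traits with a large residual variance.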

Diversity analyses
Genetic diversity analyses in the sweet sorghum accessions were conducted separately using the phenotypic and the molecular data. Initially, for the phenotypic data, all morphological and agro-industrial variables were standardized. Then, the dissimilarity matrix between lines was calculated using the Average Euclidean distance [74]. The relative contribution of each morphological and agro-industrial trait for the diversity analysis was evaluated based on the Mahalanobis distance (D²), according to the method proposed by Singh [75], using the software Genes [76]. Subsequently, genetic distances between the sweet sorghum accessions were calculated based on the SNP data using the identity-by-state (IBS) coefficient [77] in the software TASSEL. This measure of similarity takes into account the number of identical alleles, whether or not inherited from a common ancestor. Based on the phenotypic and the molecular dissimilarity matrices, two separate cluster analyses were performed through the Neighbor-Joining method [78] using the software DARwin [79]. Different clusters of sweet sorghum accessions were identified according to the nodes present in the Neighbor-Joining trees. The Mantel test [80] was performed, using the software Genes, to test the significance of the correlation between the phenotypic and the molecular dissimilarity matrices, considering ten thousand random permutations and a 5% significance level. Averages of the agro-industrial traits were estimated for each cluster obtained through the phenotypic and the molecular diversity analysis, and were compared using Duncan's test [81] at a 5% significance level. In addition, a principal component analysis (PCA) [82] was performed, based on the molecular similarity matrix, in order to infer the population structure in the sweet sorghum accessions, using the package pcaMethods for the R software [83], available from the Bioconductor project [84].
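The distance and permutation steps above can be sketched in a few lines of Python. This is not the code of Genes, TASSEL or DARwin. The function names are hypothetical, the IBS distance assumes complete genotypes coded 0/1/2 and is a simplification of the TASSEL calculation, and the Mantel p-value is one-sided.

```python
import numpy as np

def average_euclidean(traits):
    """Average Euclidean distance on standardized traits (rows = lines)."""
    x = np.asarray(traits, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    diff = z[:, None, :] - z[None, :, :]
    # Mean of squared differences over traits, then square root.
    return np.sqrt((diff ** 2).mean(axis=2))

def ibs_distance(genotypes):
    """1 - IBS similarity for genotypes coded 0/1/2 without missing data."""
    g = np.asarray(genotypes, dtype=float)
    # Number of alleles shared identical-by-state at each locus (0, 1 or 2).
    shared = 2.0 - np.abs(g[:, None, :] - g[None, :, :])
    return 1.0 - shared.mean(axis=2) / 2.0

def mantel(d1, d2, n_perm=10000, seed=0):
    """Pearson correlation of two distance matrices and a permutation p-value."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    iu = np.triu_indices_from(d1, k=1)  # each pair of lines counted once
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        # Permute the rows and columns of one matrix with the same order.
        p = rng.permutation(d1.shape[0])
        r_perm = np.corrcoef(d1[np.ix_(p, p)][iu], d2[iu])[0, 1]
        count += int(r_perm >= r_obs)
    return r_obs, (count + 1) / (n_perm + 1)
```

Permuting rows and columns together, rather than shuffling individual distances, preserves the dependence among the entries that involve the same accession, which is the reason the Mantel test is used instead of an ordinary correlation test.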

Phenotypic traits
After the correlation analysis of the 44 morphological traits, 11 variables that were highly correlated with another variable (r > 0.80) were excluded from the diversity analysis. The remaining 33 descriptors used in the diversity analysis are listed in S2 Table, where additional information about all morphological traits is presented. Most of the correlations between the 33 descriptor traits were very low and not significantly different from zero (Fig 1). Only a few pairs of descriptors exhibited correlations greater than 0.3 and significantly different from zero at a 5% significance level, for example: PCA/PFLA (0.  Table).
Genetic variances were significant, using the likelihood ratio test at a 1% significance level, for all agro-industrial traits (Table 1), indicating the existence of genetic variability among the 100 sweet sorghum lines. The variance of blocks was not significant for the fresh biomass yield (FBY) and the reducing sugars in the juice (RSJ). The fixed effects in the model were tested using the Wald test. The phenological covariable days to flowering (FLOW) was significant for the following response variables: fresh biomass yield (FBY), total soluble solids (TSS, °Brix), sucrose concentration in the juice (POL), hemicellulose (HEM) and RSJ (Table 1). Plant height (PH) was significant as a phenological covariable only for FBY and cellulose (CEL), and the fixed effect of replication was significant only for the total soluble solids (TSS) and the sucrose concentration in the juice (POL). The heritability varied from 0.62 to 0.92 for RSJ and FLOW, respectively (Table 1). According to the results of the correlation between morphological and agro-industrial traits (Fig 1), the highest correlations, considering a 5% significance level (S3 Table), were observed among the agro-industrial traits, for example: CEL/HEM (0.

Molecular markers
After raw GBS sequence data processing, a total of 403,433 SNP markers distributed along the ten sorghum chromosomes were obtained, varying from 21,823 to 71,557 SNPs for the chromosomes 8 and 1, respectively (S4 Table). The SNP data were then filtered for a minor allele frequency (MAF) of 5% and a maximum of 5% of missing genotypes per locus, resulting in a total of 40,206 polymorphic SNPs, which varied from 2,327 to 7,019 SNPs for the chromo-  Table).

Genetic diversity
Based on the morphological and the agro-industrial traits, the Neighbor-Joining method resulted in the identification of five major clusters of sweet sorghum lines (Fig 2). The cluster I-P was the most homogeneous of all clusters and consisted of 32 lines, mostly CMSXS lines derived from the Embrapa Maize and Sorghum breeding program, except for CMSXS624 and CMSXS604, which have different parents than the other CMSXS lines and grouped in the clusters II-P and V-P, respectively. The lines Theis, Wray, Brandes and Rio, which were used as parents of most of these CMSXS lines (see S1 Table), were also grouped in the cluster I-P. CMSXS627 and Keller Crystal Drip exhibited a close relationship, which is in agreement with the CMSXS627 genealogy. Dale, considered a modern line, grouped together with one of its parents, Tracy. Most of the lines grouped in the cluster I-P were classified as modern lines, except for the landrace MN4423 and the historical lines Early Folger, Soave and Sirri. The cluster II-P was the most heterogeneous of all clusters, in which most of the lines do not have information about genealogy and historic background, and consisted of 23 lines: 11 historical lines, 3 modern lines, 1 modern line Embrapa (CMSXS624), 3 landraces IS and 5 landraces MN. In this cluster, Taguaíba, a historical line collected in Brazil and probably introduced from Africa, grouped relatively far from the other lines because it did not flower under the photoperiod during cultivation.
Most of the landraces IS and all the landraces SSM were grouped in the cluster III-P. This cluster consisted of 20 lines: 12 landraces IS, 3 landraces SSM and 5 landraces MN. The cluster IV-P was the smallest one, with only 10 lines: 4 modern lines (Sart, Ramada, Roma and Norkan) and 6 landraces MN. The cluster V-P was also heterogeneous with 15 lines: 9 historical lines, 1 modern line (White Sourless), 1 modern line Embrapa (CMSXS604) and 4 landraces MN. The morphological traits that contributed most to the diversity study were: leaf angle (8.61%), juice quality (8.21%), plant color (7.66%), coleoptile pigmentation by anthocyanin (6.29%), stalk succulence (6.11%), stalk diameter (5.43%), grain color (3.74%), glume color (3.54%) and panicle shape (3.51%), with a total contribution of 53.1%. The agro-industrial traits that contributed most were: juice extraction (12.07%), cellulose (11.39%), plant height (11.33%), reducing sugars (10.47%), sucrose concentration in the juice (10.38%), total soluble solids (10.35%), lignin (9.38%), days to flowering (9.17%), fresh biomass yield (7.9%) and hemicellulose (7.59%).
The Neighbor-Joining method, using the SNP marker data, resulted in six major clusters of sweet sorghum lines (Fig 3). The cluster I-M consisted of 23 lines with a composition very similar to that of the cluster I-P, including most of the CMSXS lines, except for CMSXS604 and CMSXS624, which grouped in the clusters III-M and V-M, respectively. Wray, Brandes, Rio and Theis were also grouped in this cluster.  lines were grouped in the cluster VI-M, which consisted of 24 lines. Dale and Norkan, considered modern lines, also grouped in this cluster, each with one of its parents, Tracy and Atlas, respectively. Other modern lines (Williams, White Sourless and Brawley) and 1 landrace IS (IS2232) were also in the group VI-M.
The phenotypic and the molecular diversity matrices exhibited a low correlation coefficient (0.35, significant at a 1% significance level) obtained via the Mantel Test, which is in agreement with the inconsistencies observed between the clusters formed by the phenotypic and the molecular diversity analyses. The clusters obtained by the molecular diversity analysis were more consistent with the genealogy and the historic background of the sweet sorghum accessions than the clusters obtained through the phenotypic diversity analysis.
The population structure revealed by the principal component analysis (PCA) based on the SNP markers data was also consistent with the genealogy and the historic background of the sweet sorghum lines (Fig 4). In this analysis, the first (PC1) and second (PC2) principal components explained 13.67% and 7.74%, respectively, of the genetic variability observed in the sweet sorghum lines (Fig 4). As expected, the PCA results were more consistent with the clusters obtained by the Neighbor-Joining method using the SNP markers data when compared to the phenotypic data.
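The share of variability attributed to each principal component, such as the values reported for PC1 and PC2, comes from the ratio of each squared singular value to their sum. The sketch below shows that calculation on a generic numeric matrix. It is not the pcaMethods analysis of the molecular similarity matrix used in the study: the function name `pca_explained` and the choice of an SVD on a column-centred matrix are assumptions made for illustration.

```python
import numpy as np

def pca_explained(matrix, n_components=2):
    """Scores and explained-variance fractions from a column-centred matrix.

    matrix: 2-D array, rows = lines, columns = numeric variables
    (for example, allele counts per SNP).
    """
    x = np.asarray(matrix, dtype=float)
    xc = x - x.mean(axis=0)  # centre each column
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    # Squared singular values are proportional to component variances.
    explained = s ** 2 / np.sum(s ** 2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]
```

Plotting the first two columns of `scores` against each other gives the kind of two-dimensional view of population structure described for Fig 4.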
The distributions of the agro-industrial traits are shown for all clusters obtained through the molecular and the phenotypic diversity analysis, respectively (Fig 5). For most of the traits,  Table). According to the averages of the agro-industrial traits obtained for each cluster formed by the molecular diversity analysis (S5 Table), the groups III-M and VI-M exhibited significantly different averages and interesting phenotypes for several agro-industrial traits. For example, averages of 49.12 and 46.26 t ha⁻¹ of fresh biomass yield, 60.37 and 63.06% of juice extraction, 13.99 and 14.30 °Brix of total soluble solids, 9.08 and 8.30% of sucrose concentration, 1.44 and 1.56% of reducing sugars in the juice were observed for the clusters II-M and VI-M, respectively. On the other hand, based on the phenotypic diversity analysis, the clusters III-P and V-P showed satisfactory consistency with the lines' genealogy and historic background and also interesting averages for several agro-industrial traits. For example, averages of 44.74 and 51.36 t ha⁻¹ of fresh biomass yield, 57.57 and 65.49% of juice extraction, 14.31 and 16.01 °Brix of total soluble solids, 9.45 and 9.03% of sucrose concentration, 1.69 and 1.46% of reducing sugars in the juice were observed for the clusters III-P and V-P, respectively (S5 Table). Besides the interesting phenotypes for bioenergy production, these clusters formed by the molecular (II-M and IV-M) and the phenotypic (III-P and V-P) diversity analyses exhibited considerable genetic divergence from the I-M and I-P clusters, respectively, which were composed of most of the CMSXS (Embrapa) sweet sorghum lines. Thus, these results of the molecular and the phenotypic diversity analyses can be combined and used to identify potential lines to be introduced in the Embrapa Maize and Sorghum breeding program focusing on bioenergy.

Discussion
Genetic diversity and population structure analyses in this collection of sweet sorghum accessions provided important information to define breeding strategies and to identify superior parents for the development of new sorghum cultivars focusing on bioenergy production. The clusters obtained by the molecular diversity analysis were more consistent with the genealogy and the historic background of the sweet sorghum accessions than the clusters formed by the phenotypic diversity analysis. SNP markers have revealed valuable information about the relationship among the sweet sorghum accessions, especially for those with unknown genealogy and historic background, allowing the identification of potential parents to be used in the Embrapa Maize and Sorghum breeding program focusing on bioenergy production. The lack of consistency between the clusters identified by the phenotypic diversity analysis and the genealogy and the historic background of the sweet sorghum accessions can be attributed to the large genotype-by-environment interaction effect commonly observed for morphological and agro-industrial traits of quantitative inheritance. Therefore, molecular markers combined with the phenotypic characterization of sweet sorghum accessions should be used to investigate the genetic diversity of potential lines to be introduced in a breeding program.
The low correlation between the phenotypic and the molecular diversity matrices should not be considered a limitation for assessing genetic diversity but an indication of the complementarity of these tools [46,48,85]. Most of the variation detected by molecular markers is commonly of the non-adaptive type and therefore not subject to natural and/or artificial selection, unlike the phenotypic traits, which are mostly subject to natural and/or artificial selection [43]. Several studies have also reported a lack of consistency between phenotypic and molecular distances in different species, such as pepper [48], cotton [85], wheat [49,86], maize [87,88], barley [89], ryegrass [90] and Avena sterilis [91]. In sorghum, some authors have found low correlations between the genetic distances estimated by molecular markers and by phenotypic traits [32,51]. For example, Geleta et al. [51] conducted a genetic diversity study in a collection of 45 sorghum accessions, using morphological data and SSR markers, and found a low but significant correlation (r = 0.19, p < 0.01) between the phenotypic and the molecular diversity matrices. These authors stated that it is possible to obtain a relevant minimum subset of markers that can be used in combination with morphological data to better classify genotypes. Furthermore, their study indicated that, although the phenotypic characterization is time-consuming and greatly influenced by the environment, in general, it is a significant and practical way to make progress in the evaluation of sorghum germplasm. Wang et al. [32] conducted a genetic diversity study using 142 sweet sorghum lines and also found a low but significant correlation (r = 0.45, p < 0.01) between molecular and phenotypic diversity matrices, concluding that the clusters of accessions formed based on the SSR markers data did not coincide with the clusters based on the phenotypic data, suggesting that the molecular diversity analysis provided better results.
According to Geleta et al. [51] and Singh et al. [43], the best way to identify divergence among genotypes is the combined use of phenotypic and molecular data, since these tools provide complementary results.
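Correlations between two distance matrices, such as those reported above, are commonly evaluated with a Mantel-type permutation test. The following is a minimal sketch of that general technique, not the procedure used in this study, which is not described in this section. The function name `mantel`, the number of permutations and the toy distance matrices are assumptions made for illustration only.

```python
import numpy as np


def mantel(d1, d2, n_perm=999, seed=0):
    """Pearson correlation between two square distance matrices,
    with a one-sided permutation p-value (hypothetical helper)."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    if d1.shape != d2.shape or d1.ndim != 2 or d1.shape[0] != d1.shape[1]:
        raise ValueError("both inputs must be square matrices of equal size")

    n = d1.shape[0]
    upper = np.triu_indices(n, k=1)  # each pair of accessions counted once
    x = d1[upper]
    r_obs = np.corrcoef(x, d2[upper])[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Permute rows and columns of one matrix together, which
        # relabels the accessions while keeping the matrix structure.
        p = rng.permutation(n)
        r_perm = np.corrcoef(x, d2[np.ix_(p, p)][upper])[0, 1]
        if r_perm >= r_obs:
            at_least_as_large += 1

    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return r_obs, p_value


def euclidean_distances(points):
    """Pairwise Euclidean distances between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy data only: 12 simulated "accessions", not the accessions of this study.
rng = np.random.default_rng(1)
phenotypic = euclidean_distances(rng.normal(size=(12, 3)))
molecular = euclidean_distances(rng.normal(size=(12, 20)))
r, p = mantel(phenotypic, molecular)
print(f"r = {r:.2f}, p = {p:.3f}")
```

A low but significant r, as reported for sorghum in [32,51], would indicate that the two matrices share some structure while still ranking pairs of accessions quite differently, which is consistent with treating the two kinds of data as complementary.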
Breeding populations exhibiting high genetic variability are required for success in selecting individuals with favorable genotypes for a given trait [30]. Knowledge of the genetic relationship among inbred lines is useful to maintain genetic variability as well as to identify promising parental combinations to create segregating populations in a breeding program [92]. According to the averages of agro-industrial traits, lines grouped in different clusters identified through the molecular diversity analysis, with favorable phenotypes for bioenergy production and considerable genetic divergence from the CMSXS lines, could be suggested as potential parents for the development of improved lines and/or hybrids in the Embrapa Maize and Sorghum breeding program. For example, the following sweet sorghum inbred lines may be of interest for these purposes: Georgia Blue Ribbon, Rosso Lombardo, Atlas, Ellis Sorgo, Rex, Sourless, MN4509, MN4508, MN1030, MN4581, SSM1123, IS15443, IS15752, IS2787 and IS16044.
The morphological traits contributed substantially (53.1%) to the phenotypic diversity analysis, and these traits represent a simple way of measuring genetic diversity while studying genotype performance under normal growing conditions [41]. Moreover, most of the morphological traits used in this study have a qualitative inheritance, whose expression is not strongly influenced by the environment [93]. Other studies have also highlighted the contribution of morphological traits to diversity analyses in sorghum. For example, Gerrano et al. [93] used six AFLP primer combinations and nine qualitative morphological traits to study the genetic diversity in 17 sorghum accessions, and concluded that the morphological traits were able to distinguish accessions and that the molecular markers complemented the analysis by separating closely related individuals. Grenier et al. [94] evaluated 45 sorghum accessions using ten qualitative morphological traits, and observed a wide morphological diversity in the evaluated germplasm, which helped to cluster the genotypes according to each geographic region of Ethiopia and Eritrea.
Some sweet sorghum accessions used in this study were also used in previous genetic diversity studies [19,30]. For example, 33 of the 125 sweet sorghum accessions evaluated by Murray et al. [19] were also used in this study, and presented similar clustering patterns. The accessions Brandes, Keller, M81E, Rio, Wiley, Wray and Sart (Modern Lines) were grouped in the same cluster by Murray et al. [19], as was also observed in the molecular diversity analysis performed in this study. Moreover, the accessions MN1056, MN960, MN1060, MN1500 (Landraces MN), Iceberg, Ellis Sorgo, Mclean, Kansas Orange, Atlas, Sugar Drip, White African and Sacalline (Historical Lines) were also grouped in the same cluster by Murray et al. [19]. Similar clustering patterns were also observed by Ali et al. [30] for the accessions Dale, Tracy, White African, Kansas Orange, Rox Orange, Williams, Iceberg, Early Folger, Rio, Keller, Roma and Ramada.
Assessing the genetic diversity of potential parental lines by phenotypic and molecular characterization can provide valuable information to help breeders identify promising crosses in a commercial hybrid breeding program [95]. Sweet sorghum has emerged as an ideal feedstock for bioethanol production as an alternative source of bioenergy. Indeed, significant genetic potential exists in the sweet sorghum germplasm collection [31]. The wide genetic variability observed in this study for brix, sucrose concentration in the juice, stalk yield and biomass yield indicates a high potential for the development of high-yielding, sweet-stalked, high-sucrose sweet sorghum lines [9]. Breeders can select parental lines grouped in different phenotypic and molecular clusters to perform more heterotic crosses, since higher levels of heterosis are expected between clusters than within clusters [32]. This study also indicated that the assessment of genetic diversity using molecular markers is indispensable and complementary to the phenotypic characterization. Thus, in order to obtain improved sweet sorghum hybrids with a high level of heterosis, breeders should simultaneously select parental lines from different clusters based on agro-industrial traits and molecular marker data.
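The selection rule in the last sentence can be illustrated with a short sketch. The helper `divergent_crosses`, the line names and the cluster assignments below are hypothetical and do not come from the results of this study. The sketch only shows the logic of keeping candidate crosses whose parents fall in different clusters in both the phenotypic and the molecular analysis.

```python
from itertools import combinations


def divergent_crosses(pheno_cluster, mol_cluster):
    """Return pairs of lines assigned to different clusters in BOTH
    the phenotypic and the molecular analysis (hypothetical helper)."""
    # Only lines characterized in both analyses can be compared.
    lines = sorted(set(pheno_cluster) & set(mol_cluster))
    return [
        (a, b)
        for a, b in combinations(lines, 2)
        if pheno_cluster[a] != pheno_cluster[b]
        and mol_cluster[a] != mol_cluster[b]
    ]


# Made-up cluster labels for four placeholder lines.
pheno = {"line_A": 1, "line_B": 1, "line_C": 2, "line_D": 3}
mol = {"line_A": 1, "line_B": 2, "line_C": 2, "line_D": 1}

for cross in divergent_crosses(pheno, mol):
    print(cross)
# ('line_A', 'line_C')
# ('line_B', 'line_D')
# ('line_C', 'line_D')
```

In practice, such a shortlist would still need to be filtered by the agro-industrial performance of each parent, since divergence alone does not guarantee a favorable cross.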

Conclusion
Phenotypic and molecular characterization revealed the existence of considerable genetic variability among the sweet sorghum accessions from the Embrapa Maize and Sorghum breeding program. The clusters obtained by the molecular diversity analysis were more consistent with the genealogy and the historic background of the sweet sorghum accessions than the clusters identified in the phenotypic diversity analysis. The population structure revealed by the PCA based on the SNP marker data was consistent with the genealogy, the historic background of the sweet sorghum lines and, as expected, the clusters obtained by the Neighbor-Joining method using the SNP marker data. A low correlation was observed between the molecular and the phenotypic diversity matrices, which highlights the complementarity between the molecular and the phenotypic characterization to assist a breeding program.

Supporting information
Table. SNP markers used for molecular characterization. Number of SNPs before and after data filtering for a minor allele frequency (MAF) of 5% and a maximum of 5% of missing genotypes per locus, final chromosome coverage and final marker density. (DOCX)
S5 Table. Averages and standard deviations (SD) of the agro-industrial traits obtained for each cluster identified through molecular and phenotypic diversity analyses. (DOCX)