Unraveling the genetic structure of Brazilian commercial sugarcane cultivars through microsatellite markers

The Brazilian sugarcane industry plays an important role in the worldwide supply of sugar and ethanol. Investigation into the genetic structure of current commercial cultivars and comparisons to the main ancestor species allow sugarcane breeding programs to better manage crosses and germplasm banks as well as to promote its rational use. In the present study, the genetic structure of a group of Brazilian cultivars currently grown by commercial producers was assessed through microsatellite markers and contrasted with a group of basic germplasm mainly composed of Saccharum officinarum and S. spontaneum accessions. A total of 285 alleles was obtained by a set of 12 SSRs primer pairs that taken together were able to efficiently distinguish and capture the genetic variability of sugarcane commercial cultivars and basic germplasm accessions allowing its application in a fast and cost-effective way for routine cultivar identification and management of sugarcane germplasm banks. Allelic distribution revealed that 97.6% of the cultivar alleles were found in the basic germplasm while 42% of the basic germplasm alleles were absent in cultivars. Of the absent alleles, 3% was exclusive to S. officinarum, 33% to S. spontaneum and 19% to other species/exotic hybrids. We found strong genetic differentiation between the Brazilian commercial cultivars and the two main species (S. officinarum: Φ^ST = 0.211 and S. spontaneum: Φ^ST = 0.216, P<0.001), and significant contribution of the latter in the genetic variability of commercial cultivars. Average dissimilarity within cultivars was 1.2 and 1.4 times lower than that within S. officinarum and S. spontaneum. Genetic divergence found between cultivars and S. spontaneum accessions has practical applications for energy cane breeding programs as the choice of more divergent parents will maximize the frequency of transgressive individuals in the progeny.

Introduction Sugarcane has a worldwide economic importance, with Brazil being the largest producer, harvesting over 657.2 million tons of sugarcane in 2016/17 [1]. In addition to its importance as a food, sugarcane is increasingly more important for ethanol production, and electricity cogeneration currently accounts for 4% of Brazilian electric energy demand [2]. Brazil's position as a world leader in sugarcane production is attributed to not only climate conditions and adequate agronomical management practices, but also to the high performance of Brazilian commercial sugarcane cultivars, which have been developed by the country's three main sugarcane breeding programs (RIDESA: Interuniversity Network for the Development of the Sugarcane Sector; CTC: Sugarcane Technology Center; IAC: Agronomic Institute of Campinas).
A major breakthrough for sugarcane came with the discovery that the Saccharum genus occasionally yields viable offspring with superior phenotypes via interspecific crosses [4]. Artificial interspecific hybridizations started between the 19 th and 20 th century in Asia by Dutch cane breeders [5] and basically, two main species were used in this approach: S. officinarum, known as "noble cane", a decaploid (2n = 80, x = 10) with thick, juicy stalks rich in sugar but highly susceptible to several biotic and abiotic stresses, and S. spontaneum, an octaploid (2n = 40-128, x = 8) with no significant sugar content but highly resistant, vigorous [6,7], and efficient in carbon fixation due to its high photosynthetic rate [8]. The first interspecific hybrids were repeatedly backcrossed to S. officinarum accessions to recover the sucrose content in a process known as nobilization [5]. Therefore, approximately 80% of the genome of current sugarcane cultivars came from this ancestor species, 10-15% from S. spontaneum and the remaining 5-10% being recombinant chromosomes [9]. Other Saccharum species such as S. robustum, S. barberi and S. sinense, although used less extensively in the first artificial interspecific hybridizations, also contributed to the genome of modern sugarcane cultivars [10]. It should be noted that sugarcane cultivars from the first interspecific hybridizations have impacted the sugarcane industry of the twentieth century, especially the cultivar POJ2878, known as the "wonder cane" and genetically present in the majority of modern sugarcane cultivars [11]. Despite the polyploid nature of sugarcane and the great genetic variability available in the Saccharum germplasm, very few accessions took part in the first interspecific hybrids [12,13] leading to the narrow genetic base of the actual cultivars.
According to the International Union for Conservation of Nature (IUCN), the conservation of genetic diversity is crucial for the maintenance of biodiversity [14,15]. For instance, the exhaustive selection of certain traits over several breeding generations inevitably tends to threaten other traits. Certain genic regions or even alleles and gene subsets can be lost with time. Variability loss and genetic base narrowing directly implies biodiversity loss and intensifies the risk of susceptibility to an unexpected disease or pest occurrence [16]. To preserve genetic variability and biodiversity, germplasm banks are maintained by breeding programs for the conservation of genetic resources for use in future crossings, especially in genetic introgression programs [17]. There are several sugarcane germplasm banks in the world, such as the World Collection of Sugarcane and Related Grasses (WCSRG) in Miami, FL, USA and the Research Centre of Kannur in Coimbatore, India. In Brazil, one of the biggest germplasm banks in the world is maintained by CTC in Camamu, BA, Brazil [18]. These banks encompass a huge amount of accessions of Saccharum species and related genera including those collected in the sugarcane first expeditions, acting on an internationally collaborative level [19].
Undoubtedly, the genetic variability present in most of the germplasm banks is underexploited. Sugarcane germplasm banks have thousands of accessions, which complicate logistics and planning of future crosses. Organization and classification are key factors for the functionality of germplasm banks. Although morphological descriptors have an important role in accession classification and organization, errors may occur once environmental conditions affect phenotypes. Obtaining the molecular profile of the accession, which is a more reliable way to identify a given individual [20,21], is a complement to its morphological description. In addition, several molecular markers have been applied to investigate and measure the genetic diversity at the molecular level of sugarcane accessions and also to trace genetic relationship among Saccharum species [22,23]. Microsatellites or Single Sequence Repeats (SSRs) are among the most suitable and preferred markers for genetic structure and conservation studies [24]. SSRs are stretches of DNA composed by motifs of one to six base pairs repeated in tandem and flanked by conserved sequences, allowing the amplification through PCR by specific primer pairs [25]. In addition, SSRs are highly polymorphic and reproducible codominant markers, transferable among closely related species and genera [26]. Due to the ploidy nature of sugarcane, microsatellite markers are considered dominant markers once the presence of a given amplified product doesn't account for its dosage [27]. SSRs are promising to establish unique fingerprints for cultivar identification [28]. For sugarcane, a large amount of SSR primer pairs is available for research, either developed from genomic enriched libraries such as those from the International Sugarcane Microsatellite Consortium (ISMC) [29] or from expressed sequence tags (EST) data mining (e.g., SUCEST-Sugarcane Expressed Sequence Tag Database), which are being extensively applied for genetic mapping, diversity studies and fingerprinting [30][31][32].
The narrow genetic base of sugarcane breeding programs and its implications for future sugar yield and productivity gains are concerns among sugarcane breeders around the world [33,34]. However, there is still a considerable genetic variability among the actual sugarcane cultivars [35]. Therefore, investigation into the genetic structure of current commercial cultivars, and the comparison of this structure with the gene pool of the main ancestor species (S. officinarum and S. spontaneum), permit breeders to better manage their crosses and germplasm banks.
The characterization of sugarcane basic germplasm accessions has gained attention because of its utilization by sugarcane breeding programs [10], mainly by those focused on energy cane cultivars development in which S. spontaneum has been used as parental in crosses with commercial cultivars to meet the demand for biomass production [36].
Due to the importance of the sugar and ethanol sector in the Brazilian economy, new sugarcane cultivars are release almost every year by the breeding programs increasing the number of cultivars available for the cane growers. However, the assessment of its genetic structure and hence, the narrow genetic base's impact on the magnitude of the Brazilian sugarcane genetic breeding pool has not yet been completely investigated.
In the present study, the genetic structure of a panel of Brazilian commercial cultivars currently in use by cane growers (which in turn represents the actual gene pool of Brazilian sugarcane breeding programs) was assessed through microsatellite markers and contrasted against a group of basic germplasm composed mainly of S. officinarum and S. spontaneum accessions. Here, we also report a set of twelve SSR primer pairs that, taken together are capable to distinguish sugarcane commercial cultivars and basic germplasm accessions, showing high power in capturing genetic variability and hence allowing the application of molecular markers in a fast and cost-effective way for routinely cultivar identification and management of sugarcane germplasm banks. We found significant contribution of S. spontaneum in the genetic variability of Brazilian sugarcane cultivars. Nonetheless, our data suggest that there is still potentially exploitable allelic richness (approximately 85% S. spontaneum exclusive alleles were not accounted into commercial cultivars) that can be incorporated in the gene pool of Brazilian breeding programs through energy cane cultivar development. This is the first report to investigate the genetic diversity at the molecular level in a huge group of currently planted Brazilian sugarcane cultivars.

Plant material
A total of 137 genotypes encompassing 81 Brazilian commercial sugarcane cultivars and 56 basic germplasm accessions (19 S. officinarum, 26 S. spontaneum and 11 genotypes comprising five Saccharum spp. exotic hybrids and six accessions representing other species-S. barberi, S. robustum and Miscanthus spp.) were collected from the IAC Sugarcane Germplasm Collection maintained at the Sugarcane Center, in Ribeirão Preto, SP, Brazil. For clarity, the last group of 11 genotypes will henceforth be referred to as "other species/exotic hybrids." Most of the S. spontaneum accessions and other species/exotic hybrids was imported from the USDA's World Collection of Sugarcane and Related Grasses (Miami, FL, USA). The 56 accessions, which compose the basic germplasm group, were previously selected, taking into consideration molecular data, as the most divergent, non-redundant and historically relevant accessions of the IAC Fingerprint SSR-based Data Bank, among over 300 others. Some of the historically relevant individuals [13,37] were also chosen according to its contribution in the nobilization process. Clones such as KASSOER, COIMBATORE and CHUNNEE were intensively used in early interspecific hybridizations and are present in the pedigree of several commercial cultivars worldwide, as well as the cultivar POJ2878, because of its remarkable contribution on the development of most of the current cultivars throughout the world [11,38]. A detailed list of all materials is found in S1 Table. DNA extraction and PCR amplification Total genomic DNA was extracted from young leaves or leaf roll tissue according to the CTAB method [39]. PCR reactions were conducted in a 16 μl final volume containing 30 ng of template DNA, 0.2 μM of each forward and reverse primers (labeled with infrared dyes IR700 or IR800), 100 μM of each dNTP, 2.0 mM MgCl 2 , 10 mM Tris-HCl, 50 mM KCl, and 0.5 unit Taq DNA polymerase (Invitrogen, São Paulo, Brazil). Reactions were amplified on a MyCycler Thermal Cycler (BIO-RAD1) thermocycler. For CV (CanaVialis) primer pairs: CV29, CV37, CV38, CV60, CV79, CV94, and CV106 [28,30] thermocycling conditions were performed according to Palhares et al. (2012) [40]. For the SC (Sugarcane) primer pairs: SCB213, SCB312, SCB381, SCB423 and SCC436 [22,41] thermocycling conditions were as follows: denaturation at 94.0˚C for 3 min followed by 30 annealing cycles (94.0˚C for 1 min, annealing temperature specific for each primer for 1 min; extension of 72.0˚C for 1 min) and a final elongation step at 72.0˚C for 5 min. SSR primer pair sequences with respective annealing temperatures and expected allele range are listed in Table 1. Amplification products were separated by electrophoresis on 6% denaturing polyacrylamide gel on a NEN4300 DNA Analyzers (LI-COR Biosciences1). The attributes of the SSR primers as genetic markers (ubiquity, co-dominance and high heterozygosity) [42], allied to its ability in successfully identifying sugarcane genotypes [43] and also the low cost of the technique (in comparison to SNPs, for instance) was considered in their choices.

Genotyping
Due to the polyploid nature of sugarcane, the amplified products were scored as presence (1) or absence (0). The 50-350pb molecular weight standard (LI-COR Biosciences1) was used to calibrate the sizes of the amplified alleles on each run. The gel images were manually scored and revised with Saga GT™ software (LI-COR Biosciences1). This software identifies the amplified product size to build the genotyping matrix.

Data analysis
The Polymorphism Information Content (PIC), Effective Multiplex Ratio (E), Marker Index (MI), Resolving Power (Rp) and the percentage of different profiles (haplotype assessment) were taken as indexes to evaluate the efficiency/discriminatory potential of the 12 SSRs primer set to capture polymorphism and distinguish sugarcane genotypes. The percentage of different profiles refers to the different molecular patterns (fingerprints) generated by each SSR primer pair considering the total number of genotyped individuals with no missing data point. Furthermore, for each SSR primer pair, the frequency of each type of allele and the presence of exclusive alleles (private alleles) were also investigated.
Polymorphism Information Content (PIC) was calculated [44] according to the expression P n j¼iþ1 2p 2 i p 2 j (in which p is the frequency of the j th pattern for marker i, and n is the number of alleles) using the PIC Calculator program [45]. The Multiplex Ratio (n) was estimated as the average number of fragments amplified per genotype while the  Effective Multiplex Ratio (E) was estimated by the expression E = nβ in which n is the multiplex ratio and β the fraction of polymorphic loci [46]. The Marker Index (MI) was obtained by the expression MI = PICxE. The percentage of different profiles refers to the different molecular patterns (fingerprints) generated by each SSR primer pair. The Resolving Power (Rp) for each primer pair was calculated from the genotyping matrix by using the expression Rp = ∑Ib, where Ib is the allele information, i.e., Ib = 1 − (2 × |0.5 − M|), and M the proportion of the total genotypes containing the allele [47]. The frequency of the different allele types (allele frequency) observed for a SSR primer pair was estimated by dividing the number of times that each type of allele occurs by the total number of alleles amplified by the respective SSR primer pair. Genetic diversity (D) was estimate for each group of species and commercial cultivars according to the expression D ¼ 1 À P n i¼1 P 2 ij , in which P ij is the frequency of the j th pattern for marker i, summed across n patterns [48] The analysis of molecular variance (AMOVA) was performed by the Arlequim v.3.5 software [49] to quantify the degree of differentiation and distribution of the genetic variability between and within predefined groups: Brazilian cultivars and basic germplasm; Brazilian cultivars and S. officinarum; Brazilian cultivars and S. spontaneum; S. officinarum and S. spontaneum.
The number of sub-populations (k) and the membership proportion (Q) was inferred by using Structure v.2.3.4 software [50,51]. The k was set from 1 to 10 (k-value), with 10 iterations at a 100,000 burning period and 200,000 Markov Chain Monte Carlo (MCMC) repeats [10]. Values of k and ΔK were obtained by Structure Harvester [52]. Best values for k were confirmed by the CLUMPAK Best k pipeline [53].
The pair-wise dissimilarity was estimated based on the Jaccard similarity coefficient complement (Dissimilarity = 1 -Similarity) [41,54]. The phylogenetic tree was constructed according to the Neighbor-Joining method with 1,000 bootstrapping using the DARwin v.6.0.13 software [55,56] also for the identification of the extreme dissimilarity values. To verify if the number of markers used to estimate the genetic dissimilarities between accesses was adequate in terms of accuracy, the bootstrap resample technique [57] was applied as in Tivang et al. (1994) [58] and Hállden et al. (1994) [59]. The binary matrix of the genotypes was resampled, considering different numbers of markers (in the present research, 3, 6, 9, . . ., 285 markers). For each of the 95-sample size considered, 1,000 bootstraps were performed, and for each bootstrap sample, a dissimilarity matrix was obtained, allowing for the calculation of the average coefficient of variation (CV%) of the estimates for each sample size. An exponential function was adjusted to estimate the number of markers needed to assure that the CV associated with the dissimilarity estimates were lesser or equal to 10%-considered acceptable in this research. The median (not the mean) of the coefficient of variation estimates were used to evaluate the accuracy of the dissimilarity values [60]. All these procedures were performed in RStudio software [61] as in Urashima et al. (2017) [62].
Aiming to detect the presence of some clustering pattern among the evaluated genotypes, Principal Coordinate Analysis (PCoA) [63][64][65] was conducted based on the dissimilarities between each pair of accessions.

Analysis of SSR primer pairs efficiency
The 12 SSR primer pairs generated 285 alleles (markers) among the 137 genotypes. This set of SSR primers produced high quality fingerprints (S1 To assess the ability of the primer pair for distinguishing among genotypes, the Resolving Power was calculated for the total of genotypes or for the cultivars only. For total genotypes, the Resolving Power estimates ranged from 3.149 (CV60) to 11.746 (SCB312), while for the cultivars it varied from 1.778 (CV94) to 10.642 (SCB312).

Allele frequency distribution, exclusive alleles, and genetic diversity
The number of alleles observed for the Brazilian commercial cultivars, S. officinarum, S. spontaneum, and other species/exotic hybrids was 166, 186, 246 and 215, respectively (Fig 1). Among the basic germplasm group, four (1.4%), 39 (13.68%) and 19 (6.67%) were the respective putative exclusive alleles of S. officinarum, S. spontaneum, and other species/exotic hybrids. Four alleles (SCB381.0244, SCC423.0300, CV38.0210, and CV106.0135), representing 1.4% of the total alleles, were exclusive to the commercial cultivars. One hundred and nineteen alleles (41.8%) spread across the basic germplasm accessions were not present in the Brazilian cultivars. Of these, 55 putative exclusive alleles of the basic germplasm group (three from S. officinarum, 33 from S. spontaneum and 19 from other species/exotic hybrids) were not found in the Brazilian cultivars (S2 Table).

Population differentiation and genetic structure
The degree of the genetic differentiation estimate (F ST ) obtained between the Brazilian commercial cultivars and the basic germplasm was 0.179 (P<0.001), which means that 17.91% of the total variation found by the SSRs is distributed between these two groups, while 82.09% is within them. The genetic differentiation estimate between Brazilian commercial cultivars and S. officinarum accessions (F ST = 0.211; P<0.001) was slightly lower than that observed between Brazilian commercial cultivars and S. spontaneum (F ST = 0.216; P<0.001). The smallestF ST value was found when comparing S. officinarum and S. spontaneum accessions (F ST = 0.127; P<0.001) ( Table 3).
According to the structure analysis, 10 cluster subsets resulted for every input k value, which were submitted to Structure Harvester and CLUMPAK's Best K pipeline. The best k value was 2 (ΔK = 318.196, Fig 2A), suggesting that the 137 genotypes can be divided into two groups ( Fig 2B) corresponding to the Brazilian cultivars and basic germplasm. The second most relevant k value is 3 (ΔK = 117.164), suggesting that the group formed by the basic germplasm can be subdivided according to their main species (S. officinarum and S. spontaneum), capturing most of the data structure in three groups (Fig 2C). Bar colors denote the membership proportion for each genotype.

Genetic dissimilarity and phylogenetic analysis
The number of markers used in this research was sufficient to estimate the pair-wise genetic dissimilarity with an acceptable level of accuracy. For all the 285 markers, the CV associated Brazilian sugarcane cultivars genetic structure assessment by SSR markers with the estimates of similarities was 6.86%, and thus were under the threshold previously established (10%). In fact, around 140 markers would be sufficient to obtain a CV average estimate around 10% (S2 Fig). The average dissimilarity values within the cultivars, S. officinarum, S. spontaneum, and other species/exotic hybrids was 0.176, 0.206, 0.248, and 0.285, respectively. The average dissimilarity values between the cultivars and S. officinarum was 0.247, while for between the cultivars and S. spontaneum it was 0.277, and between the cultivars and accessions from other species/exotic hybrids it was 0.272. In relation to the two main sugarcane species, the average dissimilarity between S. officinarum and S. spontaneum accessions was 0.279. The average dissimilarity between the accessions from other species/exotic hybrids and S. officinarum was 0.282, and with S. spontaneum it was 0.278 (Table 4).
When comparing the average dissimilarity values between the commercial cultivars and each single basic germplasm accession, MIDAZ (S. officinarum x S. barberi) was the accession most genetically similar (0.203) to the commercial cultivars and CHUNNEE the most divergent (0.326) (S4 Table).
We observed a low dissimilarity value (0.077) between IAC87-3396 and SP87344, which are full sibs from the cross CO740 and SP701143, for example.
The phylogenetic tree revealed the prevalence of two major groups (cultivars and basic germplasm) which agrees with the K = 2 value (two subpopulations) pointed to by the software Structure. In addition, the accessions were mainly clustered according to their species (S. officinarum and S. spontaneum), with the accessions from other species/exotic hybrids spread among these two groups suggesting the existence of three groups (Fig 3).
Together, the first two principal components (PCs) explained 19.18% of the total variability expressed among genotypes, based on dissimilarity measurement (Fig 4). According to the first PC, cultivars are grouped in a cluster isolated from the others. Looking at the second PC, S. officinarum accessions form a distinct group, while S. spontaneum and the other species/ exotic hybrids tend to cluster together. On the other hand, it seems that there is a slight differentiation between these two groups, once S. spontaneum accessions are concentrated at the first quadrant of the graphic.

Efficiency of the SSR primer set for sugarcane germplasm management and cultivar identification
Undoubtedly, the availability of an efficient and adequate size set of SSR primer pairs has a great impact for routine management of sugarcane germplasm and cultivar identification in a  cost-effective way. Several indexes are available to evaluate primer pair efficiency for polymorphisms at the molecular level [21,22,28,47]. In general, the indexes used here to measure primer effectiveness showed high values for either all the genotypes or only the Brazilian cultivar panel. This indicates that a large amount of genetic information was captured by these 12 SSR primer pairs across this diverse sugarcane group. Microsatellites allow the amplification of different allele types at a locus. Due to the sugarcane's high ploidy level and consequent heterozygosity, more than two types of alleles per locus in a single individual can be observed and the "banding patterns" reveal the different allele types of a locus defined by one SSR primer pair, although the allele dosage or number of copies cannot be determined [20,41,44]. Therefore, a considerable number of alleles are observed on a gel line for a single individual with a single SSR primer pair. In fact, the average number of alleles obtained with the 12 SSR primer set was promising compared to those reported by Nayak et al. (2014) [10] in which 209 alleles were produced by 36 SSR primer pairs screened over 1,002 accessions from the World Collection of Sugarcane and Related Grasses (WCSRG) and by Pan (2010) [66] which obtained 144 alleles derived from 21 SSR primer pairs over a Saccharum spp. molecular database.
The high number of alleles obtained by the 12 SSR primer set also positively impacted PIC and Resolving Power (Rp). Most of the PIC values were above 0.7 for either the total accessions or the cultivars. The Resolving Power, which reveals the ability of a given primer pair to distinguish between genotypes [47], was high for half of the SSR primer pairs and most of them were able to distinguish more than 90% of the total evaluated accessions.

Allelic richness and exclusive alleles
Due to its large ecogeographical distribution, S. spontaneum is considered the most diverse species in the Saccharum genus [67,68]. This species presented the highest number of alleles Brazilian sugarcane cultivars genetic structure assessment by SSR markers across the 12 microsatellite loci. In addition, of the total number of putative exclusive alleles detected in the basic germplasm, the greatest contribution was from S. spontaneum (63%), as compared to S. officinarum (6.5%) and other species/exotic hybrids (31%).
Cultivars encompassed more than half of the total alleles occurring in the basic germplasm showing a considerable representation of its allelic richness. However, a reasonable number of putative exclusive alleles derived from the basic germplasm (41.8%) was not inherited by the Brazilian commercial cultivars. In fact, a very limited number of accessions were used in the early interspecific crosses to develop the first commercial sugarcane cultivars [13,69]. Only 19 S. officinarum, two S. spontaneum and one S. sinense clones were extensively used to form the genetic base of modern sugarcane during the first interspecific hybridizations [12]. This genetic proportion appears to have been maintained with little additional genetic variation across generations of breeding programs, resulting in the narrow genetic base of current commercial cultivars and increasing the S. officinarum contribution [70].
Of the few putative-species-origin exclusive alleles within the Brazilian commercial cultivars, only one of four (25%) and six of thirty-nine (15.4%) were derived from S. officinarum and S. spontaneum, respectively. This could be a result of the nobilization process, as almost 80% of the commercial cultivar genome is from S. officinarum, indicating that most of the alleles present in the Brazilian commercial cultivars are shared with S. officinarum. On the other hand, the proportion of S. spontaneum exclusive alleles found in the Brazilian commercial cultivars seems to be close to the S. spontaneum genome contribution (around 10-15%) expected for commercial sugarcane cultivars [9].
This result also suggests that S. spontaneum had a significant contribution to the genetic variability of commercial cultivars used in Brazil. Aitken et al. (2006) [71], assessing the genetic diversity though AFLP markers in a collection of sugarcane cultivars and elite parents from the Australian breeding program, also reported that the variation observed for cultivars was attributed in large part to the S. spontaneum chromosomes.
We also reported the presence of putative exclusive alleles derived from commercial cultivars, suggesting that either the basic germplasm sampled in our work did not encompass the whole genetic pool used in the prior breeding programs, or that new alleles in commercial cultivars could have emerged over time as a result of DNA polymerase slippage or unbalanced crossing-over, which are mechanisms that lead to insertions or deletions of one or more repeats [72,73].

Genetic structure of Brazilian commercial cultivars
The genetic differentiation index (F ST ) proposed by Excoffier et al. (1992) [74] is considered analogous to Wright's fixation index, F ST . According to Wright (1978) [75], F ST values between 0 to 0.05 signals little population differentiation while values in the 0.05-0.15 and 0.15-0.25 ranges, respectively, indicate moderate and strong genetic differentiation. The genetic differentiation value between the Brazilian commercial cultivars and S. officinarum was slightly lower (21.1%) than that observed between commercial cultivars and S. spontaneum (21.6%), and in turn, these values reflect a strong genetic differentiation between the commercial cultivars and the two main Saccharum species. This strong genetic differentiation between Brazilian commercial cultivars and these main species reflects the previously mentioned founder effect, combined with the genetic drift effect, as a small number of these two progenitor species' clones contributed to the current genetic base of commercial sugarcane cultivars. Moreover, for the Brazilian cultivars, there is a high number of shared common parents among them, which also contributes to an increase in the genetic differentiation between commercial cultivars and their ancestor species.
Despite the limited genotypes employed in the first breeding programs, the sugarcane genetic base has not narrowed as much as had been suspected [54], especially given the high ploidy and genetic diversity of the major ancestral species. On the other hand, the high genetic differentiation between the Brazilian cultivars and S. officinarum and S. spontaneum signals that considerable genetic variability can be incorporated in the current genetic breeding pool by exploiting both species, especially S. spontaneum for energy cane development.
Moderate genetic differentiation was obtained between S. officinarum and S. spontaneum exposing a genetic relationship between these two species. As pointed out by Aitken and McNeil (2010) [35], based on Stevenson (1965) [76], it was hypothesized that sugarcane evolved from a primitive form of S. spontaneum in the Himalaya foothills of northern India, from which, as a consequence of evolutionary forces such as selection, migration, and introgression, two centers of diversity, respectively for S. spontaneum in India and S. robustum in New Guinea had resulted. However, it is well accepted that S. officinarum has emerged from S. robustum in New Guinea by domestication of this later species [77].

Genetic dissimilarity and cluster analysis
The four groups previously established (Cultivars, S. officinarum, S. spontaneum, and other species/exotic hybrids) was confirmed by the PCoA results. The commercial cultivars were assigned to a separate cluster, as expected, since most of them share the same parents and were bred to meet the high demand for sugar and ethanol production. The S. officinarum accessions have outstanding features such as high sugar production, low fiber content, and thick stalks, among others, which differs from related species. In accordance with the phenotypic differentiation, divergence at the molecular level was observed between S. officinarum accessions and the other species/exotic hybrids, justifying the formation of an isolated cluster composed by genotypes of this species. S. spontaneum, and the other species/exotic hybrids are more similar, and some overlap was shown between these groups. However, there is a slight trend toward separation among them (Fig 4). All these results agree with those obtained by the other methods applied in this study.
As expected, the highest average dissimilarity (0.285) was observed within the accessions of the other species/exotic hybrids group while the lowest (0.176) within the cultivars group. The average dissimilarity obtained within cultivars was 1.2 and 1.4 times lower than that within S. officinarum and S. spontaneum, respectively. This result can have practical applications for sugarcane breeding programs focusing on energy cane, once the choice of more divergent parents (S4 Table) amplifies the genetic variability in the progeny and also the possibility to maximize the range of sugar-to-fiber ratios and hence the selection for Type I (higher fiber content and lower sucrose content) and Type II (fiber content higher than Type I and marginal sugar content) of energy canes [78,79].
The average dissimilarity within cultivars implies that on average the commercial Brazilian cultivars shared 82% of the genomic regions assessed with SSRs. This is the first report in which a very large number of currently planted Brazilian cultivars was investigated at the molecular level, giving a more complete picture of their genetic relationship.
The pairwise dissimilarities of Brazilian cultivars ranged from 0.077 (IAC87-3396 x SP87344) to 0.258 (CTC2 x SP801836), and was lower than that observed in SSR assays by Pocovi and Mariotti (2015) [80] with 66 cultivars from the INTA (National Institute of Agricultural Technology) Sugarcane Germplasm Bank, in Argentina, ranging from 0.079 to 0.651. Hemaprabha et al. (2005) [81] assessed the genetic diversity of 54 commercial cultivars developed in India by Sequence Tagged Microsatellite Sites (STMS), a SSR-based technique, reporting 238 possible combinations between the fifty-four materials with lower similarity (<0.650) out of 1,431 possible crossings, which points to a very low-diversity genetic pool. You et al. (2013) [82] used SSR markers to evaluate the genetic structure and diversity of 115 sugarcane clones bred in China. Genetic similarity ranged from 0.725 to 1.000, revealing a very high level of uniformity among those cultivars, strongly enforcing the usage of new parents to broaden the Chinese sugarcane genetic base. A subsequent study [83] encompassing some of these 115 cultivars from the earlier study, within a group of 181 Chinese cultivars, reported that the usage of these novel parents in crossings may have an important role in broadening the genetic base of sugarcane bred in China. These results highlight the need to broaden the Brazilian sugarcane genetic pool, perhaps in the direction of available unexploited basic germplasm.
Clustering and phylogenetic analyses revealed the parenthood among the total genotypes evaluated. They tend to cluster according to their species and genealogy, forming consistent groups in agreement with the pedigree data available. However, some discrepancies were observed. Two S. spontaneum individuals, GLAGAH and IN8488, were grouped within S. officinarum. In fact, despite its classification as S. spontaneum, GLAGAH is a possible natural hybrid between Saccharum species. Although IN8488 is considered a S. spontaneum clone, the membership proportion showed high identity correlation with S. officinarum, suggesting that it probably is also a hybrid. POJ2878 and KASSOER are tightly clustered in the phylogenetic analysis. In addition, the structure analysis revealed almost the same membership proportion of the three groups in both individuals, revealing an already expected low genetic dissimilarity (0.0947) because POJ2878 is a direct F2 descendant of KASSOER [11].