Phylogenetic Analysis of Different Ploidy Saccharum spontaneum Based on rDNA-ITS Sequences

Saccharum spontaneum L. is a crucial wild parent of modern sugarcane cultivars, and its clones of different ploidy levels have been used successfully to improve the stress resistance and yield-related traits of sugarcane cultivars. To establish knowledge of the genetic variation and evolutionary relationships among ploidy clones of Saccharum spontaneum collected in China, the rDNA-ITS sequences of 62 clones, including octaploid (2n = 64), nonaploid (2n = 72), decaploid (2n = 80), and dodecaploid (2n = 96) clones, were obtained and analyzed. The rDNA-ITS sequences of four Saccharum species and Sorghum bicolor were selected as controls. The results showed that the decaploid clones (2n = 80) possessed the most variation, with 58 variable sites and 20 parsimony-informative sites in their ITS sequences, followed by the octaploid clones with 43 variable sites and 17 parsimony-informative sites. All four populations exhibited high haplotype diversity, especially the nonaploid and decaploid populations. A comparison of genetic distances among the four ploidy populations showed that the dodecaploid population was most closely related to the nonaploid population, followed by the decaploid population and then the octaploid population. Population differentiation analysis found no differentiation among the ploidy populations; a low coefficient of gene differentiation (Gst) and high gene flow (Nm) were observed among these closely related populations. These results will contribute to the understanding of the evolution of different ploidy populations of Saccharum spontaneum and provide vital knowledge for their utilization in sugarcane breeding and innovation.


Introduction
Saccharum spontaneum L. is a crucial wild parent of modern sugarcane cultivars that can improve their tolerance to abiotic or biotic stress as well as their yield-related traits. More notably, it has the widest ecogeographical distribution among Saccharum spp. and different clones Sugarcane Research Institute (YSRI). At present, these clones are conserved in the China National Nursery of Sugarcane Germplasm Resources (CNNSGR), which was built by China's Ministry of Agriculture in Kaiyuan city, Yunnan province, in 1995. The YSRI was entrusted with the routine management of CNNSGR, and we were assigned by YSRI to be responsible for the management and evaluation of these resources. Finally, we confirm that no specific permits were required for the present studies.

Plant materials
A total of 62 S. spontaneum clones of different ploidy were selected from CNNSGR. Of these, 45 clones (23 decaploid and 22 octaploid) were chosen following a standard of one clone per county, with reference to their collection locations. Because only 7 nonaploid clones and 10 dodecaploid clones are conserved in CNNSGR, all of them were included in this study. In addition, 31 rDNA-ITS sequences from five species (Saccharum officinarum, Saccharum barberi, Saccharum sinense, Saccharum robustum and Sorghum bicolor) were downloaded from GenBank and used as controls. All clones and control sequences are listed in detail in Tables 1 and 2.

DNA extraction and PCR amplification
Because all stalks of a clone arise from rhizome buds through vegetative propagation, young tender leaves from multiple stalks per clone were pooled and ground to powder in liquid nitrogen. Genomic DNA was then extracted using the traditional CTAB method. DNA quality and concentration were checked by 0.8% agarose gel electrophoresis and with a Thermo NanoDrop 2000, respectively, and the DNA samples were diluted to 20 ng/μl with deionized water for PCR amplification.
The rDNA-ITS region of all samples, which contains the ITS1, 5.8S, and ITS2 regions, was amplified using the universal primers ITS4 and ITS5 (ITS4 primer sequence: 5'-TCCTCCGCTTATTGATATGC-3'; ITS5 primer sequence: 5'-GGAAGTAAAAGTCGTAACAAGG-3') [25]. Given the large number of clones and the short length of the amplified region, PCR amplification error was reduced by using High Fidelity TransTaq DNA Polymerase (TransGen Biotech), whose fidelity is 18 times that of common Taq polymerase, rather than by running replicate PCRs. The PCR reaction system and procedures followed Chen et al. [22]. PCR was performed on a Mastercycler gradient thermocycler (Eppendorf, Germany). The PCR products were checked by 1.0% agarose gel electrophoresis and then purified using the OMEGA E.Z.N.A. Gel Extraction Kit. The purified PCR product was cloned into a pMD19-T vector, and the recombinant plasmids were transformed into DH5α competent cells. To further increase sequence accuracy, five transformed clones per sample were selected for bidirectional sequencing by the BGI Company, China, and the sequence that occurred most frequently among the five sequences of each sample was used for analysis. Finally, all obtained ITS sequences were uploaded to GenBank; the accession number of each sample is listed in Table 1.
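The selection of one representative sequence per sample from the five sequenced clones can be sketched as a simple frequency count. This is a minimal illustration, not the authors' pipeline: the function name `representative_sequence` and the toy reads are hypothetical, and the tie-breaking rule (first sequence encountered wins) is an assumption, since the text does not say how ties were handled.

```python
from collections import Counter


def representative_sequence(clone_reads):
    """Return the most frequent sequence among the clones and its share."""
    if not clone_reads:
        raise ValueError("at least one clone sequence is required")
    # Normalise case and strip whitespace so identical reads compare equal.
    cleaned = ["".join(read.split()).upper() for read in clone_reads]
    counts = Counter(cleaned)
    # most_common keeps first-seen order among ties (assumed tie-break rule).
    sequence, n = counts.most_common(1)[0]
    return sequence, n / len(cleaned)


# Toy reads for one sample (hypothetical, not data from this study).
reads = ["ACGTTGCA", "ACGTTGCA", "ACGTAGCA", "ACGTTGCA", "acgttgca"]
best, share = representative_sequence(reads)
print(best, share)  # ACGTTGCA 0.8
```

Reporting the share alongside the sequence makes it easy to flag samples in which no sequence clearly dominates among the five clones.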

Sequence alignment and analyses
All correct sequences obtained were aligned using the Clustal W program [26] with default settings. Basic sequence statistics, including GC content, variable sites, and parsimony-informative sites, were computed with MEGA 6.06 [27]. Because DnaSP 5.0 [28] and Arlequin 3.11 [29] have been used successfully to estimate the nucleotide diversity of DNA or gene sequences and the population differentiation of polyploid plants such as wheat [30,31,32] and potato [33,34], these two programs were also used for the rDNA-ITS sequence analysis of S. spontaneum clones. The haplotype diversity, nucleotide diversity, average number of nucleotide differences, gene flow (Nm) and coefficient of gene differentiation (Gst) were calculated in DnaSP 5.0 according to the formulas (equation 8.4, equation 10.5 and equation 5) from Nei's reports [35,36]. The analysis of molecular variance among populations was carried out in Arlequin 3.11 to calculate the variance components, percentage of variation, and fixation index according to the standard AMOVA computation method, with haplotypic data and DNA selected as the data type. The genetic distances among the four ploidy populations were calculated according to the Kimura 2-parameter model using MEGA 6.06. Differences between intra-population and inter-population genetic distances were assessed using an independent-samples t test at P<0.05. The maximum-likelihood (ML) and neighbor-joining (NJ) methods were used to construct haplotype phylogenetic trees according to the Kimura 2-parameter model using MEGA 6.06; all branches were evaluated with 1000 bootstrap replications, and bootstrap confidence values >50% are shown on the phylogenetic trees.
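The diversity statistics named above can be illustrated with the usual sample estimators: haplotype diversity Hd = n/(n-1) × (1 - Σ p_i²), the average number of pairwise nucleotide differences k, and nucleotide diversity Pi = k / L. The sketch below assumes these are the estimators intended and that the sequences are aligned, of equal length, and free of gaps. The function names and toy data are hypothetical, and DnaSP's handling of gaps and missing data is not reproduced.

```python
from collections import Counter
from itertools import combinations


def _check(seqs):
    """Validate input: at least two sequences, all of the same length."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    _check(seqs)
    n = len(seqs)
    sum_sq = sum((c / n) ** 2 for c in Counter(seqs).values())
    return n / (n - 1) * (1.0 - sum_sq)


def mean_pairwise_differences(seqs):
    """k: average number of differing sites over all sequence pairs."""
    _check(seqs)
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Pi: k divided by the number of aligned sites."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Toy alignment (hypothetical, not data from this study).
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
print(haplotype_diversity(toy), mean_pairwise_differences(toy), nucleotide_diversity(toy))
```

For the toy alignment, three haplotypes among four sequences give Hd = 5/6, and seven differences over six pairs give k = 7/6.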

Component and variance analysis of ITS sequences
Regarding sequence length, the ITS1 sequences (207 bp) and the 5.8S rDNA sequences (164 bp) each had only one length type, whereas the ITS2 sequences had three (218 bp, 219 bp, and 220 bp). The GC content of the ITS2 sequences (mean 69.3%) was higher than that of the ITS1 sequences (mean 63.5%) (Table 3), and the 5.8S rDNA sequences had the lowest GC content (mean 57.1%). No significant differences in GC content were found among the ploidy populations.
In the Clustal W alignment, every ploidy population had 207 sites in the ITS1 sequences. However, the number of sites in the ITS2 sequences differed among ploidy populations, with 222 in the octaploid population, 220 in the decaploid population, and 219 in the nonaploid and dodecaploid populations. The decaploid population had the most variable sites, with a total of 58 variable sites and 20 parsimony-informative sites (20 variable and 13 parsimony-informative sites in the ITS1 sequences, 11 variable and 1 parsimony-informative site in the 5.8S rDNA sequences, and 27 variable and 6 parsimony-informative sites in the ITS2 sequences), which made up 9.81% and 3.38% of total sites, respectively (Table 4). The octaploid population ranked second, with a total of 43 variable sites and 17 parsimony-informative sites. The dodecaploid and nonaploid populations had fewer variable sites. Thus, the greatest ITS sequence variation occurred in the decaploid population, followed by the octaploid population. This may reflect the larger number of clones sampled from these two populations (23 decaploid and 22 octaploid clones, versus 10 dodecaploid and 7 nonaploid clones).
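The site counts above can be illustrated with a short column-wise scan of an alignment. A site is counted as variable if it shows at least two different nucleotides, and as parsimony-informative if at least two of those nucleotides each occur in at least two sequences. This is a sketch under stated assumptions: gaps and ambiguity codes are simply ignored within a column, which may differ from the options used in MEGA, and the function name and toy alignment are hypothetical.

```python
from collections import Counter

NUCLEOTIDES = set("ACGT")


def classify_sites(alignment):
    """Return (variable, parsimony_informative) column counts."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    variable = informative = 0
    for column in zip(*alignment):
        # Count only unambiguous nucleotides (assumption: gaps are ignored).
        counts = Counter(b for b in (c.upper() for c in column) if b in NUCLEOTIDES)
        if len(counts) < 2:
            continue  # invariant column
        variable += 1
        # Informative: at least two states, each present in >= 2 sequences.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return variable, informative


# Toy alignment (hypothetical): column 2 is informative, column 4 is not.
toy = ["ACGTA", "ACGTA", "ATGCA", "ATGAA"]
print(classify_sites(toy))  # (2, 1)

# Arithmetic check of the reported decaploid percentages, assuming a total
# of 207 + 164 + 220 = 591 aligned sites.
total_sites = 207 + 164 + 220
print(round(100 * 58 / total_sites, 2), round(100 * 20 / total_sites, 2))  # 9.81 3.38
```

Under the assumed total of 591 sites, the reported 9.81% and 3.38% correspond to 58 and 20 sites, respectively.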

Haplotype diversity analysis of population
Haplotype diversity analysis found a total of 51 haplotypes in the four ploidy populations (Table 5); there were 20 haplotypes in octaploid  (Table 5). The nonaploid population showed the highest haplotype diversity, followed by the decaploid population. The nonaploid and decaploid populations also showed high nucleotide diversity (Pi), with values of 0.0174 and 0.0177. Moreover, these two populations had a large average number of nucleotide differences, ranging from 10.1905 to 10.3795.
Seventeen rDNA-ITS haplotypes were used as outgroups, 16 from four Saccharum species (S. officinarum, S. robustum, S. barberi and S. sinense) and 1 from Sorghum bicolor, and two phylogenetic trees showing bootstrap confidence values >50% were constructed based on the Kimura 2-parameter model using the maximum-likelihood (ML) and neighbor-joining (NJ) methods (Fig 1). The NJ tree was similar to the ML tree. In both trees, Hap68 from Sorghum bicolor and Hap60 from S. robustum separated first from the largest group, which consisted of the remaining 66 haplotypes. Within this large group, 5 haplotypes from S. officinarum, S. robustum, S. barberi and S. sinense clustered together with bootstrap values of 71% (NJ) and 65% (ML), and 5 haplotypes from the octaploid and decaploid populations were assigned to another small group with a bootstrap value of 65% or 63%. Because haplotypes from the same population did not cluster together but were instead intermixed, the haplotypes from different ploidy populations showed no obvious differentiation.

Genetic distance among populations
Mean genetic distances among the ploidy populations were calculated using the Kimura 2-parameter model in MEGA 6.06, and the results are listed in Table 6. The four populations showed close genetic relationships; the nonaploid and dodecaploid populations were the most closely related, with the smallest genetic distance of 0.0156. The genetic distances between the dodecaploid population and the decaploid or octaploid population (0.0162) ranked second. The octaploid and nonaploid populations were the most distantly related, with the largest genetic distance of 0.0171.
To determine whether a reliable phylogenetic tree of the four populations could be constructed from the ITS sequence data, the differences between inter-population and intra-population genetic distances were assessed using an independent-samples t test. The inter-population genetic distances were not significantly larger than the intra-population distances at P<0.05 (Table 6), which means that the reliability of a population-level phylogenetic tree may be compromised by intra-population variation. Consequently, a reliable phylogenetic tree of the four populations could not be constructed.
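For reference, the Kimura 2-parameter distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The sketch below computes this distance and a mean between-group distance as the average over all cross-group sequence pairs. Both function names and the toy data are hypothetical, sites with gaps or ambiguity codes are skipped pairwise (an assumption), and MEGA's exact options are not reproduced.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps/ambiguity codes (pairwise deletion, assumed)
        n += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ts / n, tv / n
    term = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term)


def mean_between_group_distance(group_a, group_b):
    """Average K2P distance over all pairs drawn from the two groups."""
    dists = [k2p_distance(a, b) for a in group_a for b in group_b]
    return sum(dists) / len(dists)


# Toy example (hypothetical): one transition in ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

With one transition in ten sites (P = 0.1, Q = 0), the distance is -(1/2) ln 0.8, slightly larger than the uncorrected proportion of 0.1.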

Population differentiation
The coefficient of gene differentiation (Gst), gene flow (Nm) and molecular variance were computed using DnaSP 5.0 and Arlequin 3.11. The lowest Gst value (0.0191) and the highest Nm value (12.83) were obtained between the nonaploid and decaploid populations (Table 7), which indicates frequent gene exchange between these two populations, followed by the decaploid and dodecaploid populations (Gst = 0.0314, Nm = 7.71). Between the octaploid and dodecaploid populations, the highest Gst value (0.0814) and the lowest Nm value (2.82) implied that little gene exchange occurred between these two populations; a similar result was found between the octaploid and nonaploid populations. AMOVA indicated no significant differentiation among the four ploidy populations at a significance level of 0.001, with a low fixation index (0.0403) (Table 8). Most of the variation (95.97%) was within populations, and only 4.03% was among populations. In pairwise comparisons of the percentage of among-population variation, the largest value (10.96%), between the octaploid and dodecaploid populations, implied greater genetic differences between these two populations, followed by the octaploid and nonaploid populations (8.26%). These results were consistent with the analysis of the coefficient of gene differentiation (Gst) and gene flow (Nm).
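The inverse relationship between Gst and Nm reported above can be illustrated with the island-model approximation Nm = (1/Gst - 1)/4. It is an assumption that this is the exact expression the software applied, but it reproduces the reported Nm values to within rounding. The function name `nm_from_gst` and its `factor` parameter are hypothetical.

```python
def nm_from_gst(gst, factor=4):
    """Estimate gene flow Nm from Gst as (1/Gst - 1) / factor.

    factor=4 is the assumed setting; a different factor would apply
    under other inheritance assumptions.
    """
    if not 0 < gst <= 1:
        raise ValueError("Gst must lie in the interval (0, 1]")
    return (1.0 / gst - 1.0) / factor


# Gst values reported in the text, with the Nm values reported alongside them.
reported = [
    ("nonaploid vs decaploid", 0.0191, 12.83),
    ("decaploid vs dodecaploid", 0.0314, 7.71),
    ("octaploid vs dodecaploid", 0.0814, 2.82),
]
for pair, gst, nm_reported in reported:
    print(pair, round(nm_from_gst(gst), 2), nm_reported)
```

The recomputed values are 12.84, 7.71 and 2.82. The small difference for the first pair is what would be expected if Nm was derived from an unrounded Gst.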

Discussion
S. spontaneum is a very complex polyploid plant that possesses approximately 26 chromosome-number types (2n = 40-128) [4]. In China, about 16 types have been reported, with chromosome numbers ranging from 54 to 108, but only four ploidy types (2n = 64, 72, 80, 96) appear to be distributed with high frequency [17,37,38]. However, the questions of how these ploidy Regarding the origin of S. spontaneum in China, Chen et al. [11] hypothesized that S. spontaneum might have originated in the southern regions of Yunnan, which have low altitude and latitude. They conjectured that it then spread to the northwestern regions of Yunnan, at higher altitude and latitude, then through Sichuan and Guizhou, and finally to other provinces such as Guangxi, Guangdong, Fujian, Jiangxi, and Zhejiang. Because octaploid clones are mainly distributed in the possible regions of origin such as Yunnan [17,37,38], we inferred that octaploid clones might represent a primitive chromosome type. Based on the chromosome number of the nonaploid clones (2n = 72), we presumed that they may have arisen as offspring of crosses between octaploid clones (2n = 64) and decaploid clones (2n = 80), whose distribution regions overlap. Because such offspring would receive 40 chromosomes from the decaploid parent and 32 from the octaploid parent, the nonaploid should be more closely related to the decaploid than to the octaploid. The genetic distances among these three ploidy populations in this study are consistent with this assumption. Dodecaploid clones are distributed only in Fujian province in China. Because this region belongs to the areas into which S. spontaneum later spread, we conjectured that dodecaploid clones may represent an evolved type. Sreenivasan [39] hypothesized that the dodecaploid may originate from a triploid seedling of an octaploid, but this hypothesis is not supported by our study.
In fact, the dodecaploid population is more closely related to the nonaploid population than to the octaploid and decaploid populations, which suggests that the dodecaploid may be derived from the nonaploid. How this occurred remains unknown; we presume that the odd-ploidy clone may produce a hexaploid gamete containing 48 chromosomes, and that crosses between such gametes form dodecaploid clones with 96 chromosomes. More research on the evolution of the different ploidy levels of S. spontaneum should be carried out in the future.