Definition of Eight Mulberry Species in the Genus Morus by Internal Transcribed Spacer-Based Phylogeny

Mulberry, belonging to the order Rosales, family Moraceae, and genus Morus, has received attention because of its economic and medicinal value as well as its important ecological function. The genus Morus has a worldwide distribution; however, its taxonomy remains complex and disputed. Many studies have attempted to classify Morus species, resulting in varied numbers of designated Morus spp. To address this issue, we used information from internal transcribed spacer (ITS) genetic sequences to study the taxonomy of all the generally accepted members of the genus Morus. We found that intraspecific 5.8S rRNA sequences were identical but that interspecific 5.8S sequences were diverse. M. alba and M. notabilis showed the shortest (215 bp) and the longest (233 bp) ITS1 sequence length, respectively. With the completion of the mulberry genome, we could identify single nucleotide polymorphisms within the ITS locus in the M. notabilis genome. From reconstruction of a phylogenetic tree based on the complete ITS data, we propose that the Morus genus should be classified into eight species: M. alba, M. nigra, M. notabilis, M. serrata, M. celtidifolia, M. insignis, M. rubra, and M. mesozygia. Furthermore, the classification of the ITS sequences of known interspecific hybrid clones into both paternal and maternal clades indicated that ITS variation was sufficient to distinguish interspecific hybrids in the genus Morus.


Introduction
Mulberry (Morus spp.), belonging to the order Rosales, family Moraceae, and genus Morus [1], is distributed in a wide range of areas worldwide [2]. It is an important economic woody plant with a significant impact on human society due to its economic value in sericulture as well as its nutritional benefits and medicinal value [3]. This plant also plays significant ecological roles in the prevention and control of sand erosion and in the treatment of stony desertification and saline-alkali land [4]. However, the taxonomy of Morus has been disputed [5] because of its wide geographical distribution, morphological plasticity [6], hybridization among species [7], long history of domestication, and introduction and naturalization of species [8]. Different studies have recognized a variable number of Morus species, leading to an uncertain taxonomy. In 1753, Linnaeus established the first taxonomy of Morus [9]. The first comprehensive Morus taxonomy was created by Bureau [10] based on features of its leaves and pistillate catkins, and recognized five species, 19 varieties, and 13 sub-varieties. Since that time, the taxonomy has undergone many further modifications based on morphological and phenological characteristics [5]. In 1917, Schneider [11] described a new species, Morus notabilis C. K. Schneid, in China. Subsequently, Koidzumi [12], Leroy [13], and Hotta [14] classified the genus Morus into 24 species and 1 subspecies, 19 species, and 35 species, respectively. More recently, Zhou and Gilbert [15] described 16 species in Morus, 12 of which were found in China. Because of this variable taxonomy, 260 validly published names in Morus have been deposited in the International Plant Names Index (http://www.ipni.org/), with many of them being synonymous. Therefore, although as many as 68 [16,17] or 150 [18,19] species have been reported in Morus, only 10-16 are generally cited and accepted [20][21][22][23][24].
Molecular markers that are heritable, fast, and easy to measure and evaluate have been used to classify Morus species in order to improve the taxonomy of Morus based on phenotypic characteristics. Using amplified fragment length polymorphism (AFLP) markers, Sharma [25] and Kafkas [26] concluded that the results from both marker and conventional methods were consistent to a large extent and that M. nigra and M. rubra were molecularly distinct from M. alba. Rao [27] grouped domesticated species (M. alba and M. indica) into one cluster by building a phylogenetic tree of 80 wild mulberries from four species using four isozymes (polyphenol oxidase [PPO], peroxidase [PoX], esterase [EST], and diaphorase [DIA]). Zhao [28][29][30][31][32] placed Chinese domesticated species, including M. alba, M. multicaulis, M. bombycis, M. australis, M. atropurpurea, and M. rotundiloba, into one cluster by creating phylogenetic trees using ITS sequences, microsatellite loci, and sequence related amplified polymorphism (SRAP), inter simple sequence repeat (ISSR), and SSR markers. Recently, the genus Morus was classified into 13 species using a combination of ITS and trnL-trnF intergenic spacer markers; these included eight Asian species (M. alba, M. australis, M. cathayana, M. macroura, M. mongolica, M. nigra, M. notabilis, and M. serrata), four New World species (M. celtidifolia, M. insignis, M. microphylla, and M. rubra), and one African species (M. mesozygia) [24]. However, the M. notabilis sample collected in that study was not a true M. notabilis sample because its ITS sequence (GenBank accession number HM747175) is actually identical to that of M. alba. The fact that these studies did not include M. notabilis as described by Schneider [11] limits our full understanding of Morus taxonomy.
In plants, the ITS locus has been widely used in phylogenetic inference [33] and has shown the highest discriminatory power for species [34]. Furthermore, ITS has recently been proposed as the primary DNA barcode marker in fungi [35], and its use has enabled the estimation of highly accurate phylogenies of closely related organisms in yeast [36]. In this study, which includes data from M. notabilis, we characterized the ITS locus in Morus spp., including its length and SNPs. Using this ITS data, we redefined the taxonomy of

Plant materials
All plant materials used in this study are listed in Table 1. M. notabilis (29°45.278' north latitude, 102°53.878' east longitude), a wild mulberry species with seven pairs of chromosomes, was collected by the author (Ningjia He) from a pristine forest in Ya'an, Sichuan

DNA isolation, amplification, and sequencing

Data analyses
We downloaded an additional 187 Morus ITS sequences from NCBI (http://www.ncbi.nlm.nih.gov/) and determined the sequence boundaries of ITS1, 5.8S, and ITS2 within all Morus sequences by comparison to a known Arabidopsis thaliana ITS sequence (X52320). All data containing the full ITS1, 5.8S rRNA, and ITS2 sequences were used for analysis. The sequence lengths and GC contents of all samples were calculated using a Perl script.
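The length and GC-content calculation described above can be illustrated with a minimal sketch. The original analysis used a Perl script that is not reproduced here; the Python below is only an illustration, and the function names (`parse_fasta`, `gc_content`, `summarize`) and the choice to ignore gaps and ambiguous bases when computing GC content are assumptions, not the authors' implementation.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {record_id: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def gc_content(seq):
    """Return the GC percentage, counting only unambiguous A/C/G/T bases
    (assumption: gaps and ambiguity codes are ignored)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def summarize(records):
    """Map each record ID to (sequence length, GC percentage rounded to 2 dp)."""
    return {name: (len(seq), round(gc_content(seq), 2))
            for name, seq in records.items()}


if __name__ == "__main__":
    # Toy sequences for illustration only; not real Morus data.
    demo = ">seqA example\nACGT\nGG\n>seqB\nAATT\n"
    for name, (length, gc) in summarize(parse_fasta(demo)).items():
        print(name, length, gc)
```

Applied separately to the ITS1, 5.8S, and ITS2 segments of each record, a helper of this kind would yield a per-region table of lengths and GC contents.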

Detection of SNPs in M. notabilis
In order to detect M. notabilis ITS SNPs, the improved Short Oligonucleotide Alignment Program (SOAP2) [37] was used to map all of the M. notabilis genome reads from 12 libraries [38] onto the reference sequence, a 3709-bp fragment located at scaffold2247 (GenBank accession number KF784877). SNPs were detected using SoapSNP essentially as described [37], but with the following modifications: SoapSNP calls were discarded if the quality score of the consensus genotype was less than 20, if the sequencing depth of the site was less than 5, or if the distance of a neighbor candidate SNP was less than 5.
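The three filtering criteria can be expressed as a short sketch. This is not SoapSNP itself and does not parse its output format; the `SnpCall` record and `filter_snps` function are hypothetical names introduced for illustration. It also assumes that the neighbor distance is measured against all candidate calls before any filtering, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical minimal record for one candidate SNP."""
    position: int  # 1-based position on the reference fragment
    quality: int   # quality score of the consensus genotype
    depth: int     # sequencing depth at the site


def filter_snps(calls, min_quality=20, min_depth=5, min_distance=5):
    """Keep calls with quality >= 20, depth >= 5, and no neighboring
    candidate closer than 5 bp (assumption: neighbors are taken from
    the unfiltered candidate list)."""
    ordered = sorted(calls, key=lambda c: c.position)
    kept = []
    for i, call in enumerate(ordered):
        if call.quality < min_quality or call.depth < min_depth:
            continue
        neighbors = []
        if i > 0:
            neighbors.append(ordered[i - 1])
        if i < len(ordered) - 1:
            neighbors.append(ordered[i + 1])
        if any(abs(n.position - call.position) < min_distance for n in neighbors):
            continue
        kept.append(call)
    return kept


if __name__ == "__main__":
    # Made-up candidate calls, for illustration only.
    candidates = [
        SnpCall(10, 30, 10), SnpCall(12, 30, 10),  # too close to each other
        SnpCall(100, 10, 10),                      # low quality
        SnpCall(200, 30, 3),                       # low depth
        SnpCall(300, 30, 10), SnpCall(305, 25, 5), # pass all filters
    ]
    print([c.position for c in filter_snps(candidates)])
```

Under this reading, a pair of candidates closer than 5 bp removes both members, which is the more conservative choice.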

Phylogenetic analysis and estimation of divergence times
Phylogenetic analysis was performed using a Bayesian method with MrBayes V3.1.2 software [39]. The phylogenetic tree of the ITS locus (ITS1-5.8S-ITS2) was rooted by the corresponding homologous sequences in Broussonetia papyrifera and Ficus adhatodifolia, which are sister genera of Morus, and in Cannabis sativa, which is another member of the order Rosales, in the family Cannabaceae. The phylogenetic tree was reconstructed and the divergence times were estimated according to the method described by Sun et al. [40]. The divergence time of M. notabilis and C. sativa [38] was used to calibrate the estimation.

Sequence characteristics of the ITS region
Our analysis drew on an additional 187 Morus ITS sequences deposited in NCBI, which covered the generally accepted Morus species. These included 56 sequences that contained the complete ITS region (ITS1-5.8S-ITS2), 60 that contained the partial 18S region (3' end, "AGG ATC ATT G"), 156 covering the complete 5.8S and ITS2 region, and 57 sequences that contained the partial 28S region (5' end, "G(T/C)G ACC CCA G"). After adding the newly sequenced ITS regions, we characterized the structures of the ITS nucleotide sequences (Table 2) along with their sequence lengths and GC contents (Table 3). The first 10 nucleotides of the ITS1 region were highly conserved in Morus, and a transversion (G/T) and a single nucleotide deletion (T) were found in the last 10 nucleotides (Table 2; Table 3). In contrast to other Morus species, a 13 bp deletion of the ITS1 region in M. alba was located at the 55th nucleotide position. The comparison of the 5.8S region sequences of M. alba (121 sequences), M. rubra (9 sequences), M. notabilis (6 sequences), and M. celtidifolia (4 sequences) found that the intraspecific 5.8S regions of these species were completely identical. In addition, we found that the ITS1 sequences of interspecific hybrids between M. alba and M. rubra consisted of 215 bp (GenBank accession numbers HQ144171 and HQ144170) and 229 bp (GenBank accession numbers HQ144175 and HQ144187) sequences, respectively. However, the ITS2 lengths of these hybrids were unique (Table 3).
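As an illustration of how the flanking motifs quoted above could be used to screen a sequence for a complete ITS region, the following sketch searches for the 18S 3'-end motif and the 28S 5'-end motif and returns what lies between them. The function name `extract_its` is hypothetical, and the sketch assumes that ITS1 starts immediately after the 18S motif and that ITS2 ends immediately before the 28S motif; the study itself determined region boundaries by comparison with the A. thaliana reference, not by motif search.

```python
import re

# Flanking motifs as quoted in the text (spaces removed).
SSU_3PRIME = re.compile(r"AGGATCATTG")     # 3' end of the 18S gene
LSU_5PRIME = re.compile(r"G[TC]GACCCCAG")  # 5' end of the 28S gene, G(T/C)G...


def extract_its(seq):
    """Return the sequence between the 18S and 28S motifs, or None if
    either motif is absent (i.e. the record is not flanked on both sides)."""
    seq = seq.upper().replace(" ", "")
    start = SSU_3PRIME.search(seq)
    if start is None:
        return None
    end = LSU_5PRIME.search(seq, start.end())
    if end is None:
        return None
    return seq[start.end():end.start()]


if __name__ == "__main__":
    # Toy sequence for illustration only; not real Morus data.
    toy = "TTT" + "AGGATCATTG" + "ACGTACGT" + "GCGACCCCAG" + "AAA"
    print(extract_its(toy))
```

Records for which the function returns `None` would correspond to the partial sequences counted above, which lack one or both flanking gene ends.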

Detection of SNPs in M. notabilis and ITS genetic heterogeneity in Morus
To detect SNP sites in the ITS region in the draft genome of M. notabilis, we employed the algorithm SOAP2 to map all of the genome reads, identifying six SNPs in the ITS region of M. notabilis (Table 4). In the M. alba lineage, only M. australis (GenBank accession number AM042004) showed variation in the 5.8S region. Comparison of all M. australis 5.8S region data deposited in GenBank indicated that ten sequences from different specimens were identical. In the full collection of ITS1 and ITS2 sequences of M. alba, we identified one SNP (A/G) existing at the 82nd site of ITS1 (4 out of 41 sequences) and one SNP (T/G) existing at the 219th site of ITS2 (21 out of 107 sequences). In M. rubra, 4 and 1 variable loci existed in the ITS1 and ITS2 regions of 9 sequences, respectively. Among these 5 variable loci, 4 occurred in two sequences from different samples, which suggested the possibility of 4 SNPs in the ITS region of M. rubra. In M. celtidifolia, we identified 3 variable loci in the ITS1 and ITS2 regions out of five ITS sequences.
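The counts of variable loci reported above can be obtained in principle by scanning the columns of a multiple sequence alignment. The sketch below is an assumption-laden illustration rather than the authors' procedure: the function name `variable_sites` is hypothetical, the input sequences are assumed to be pre-aligned and of equal length, and gap or ambiguous characters are ignored. The `min_minor_count` parameter mirrors the reasoning in the text that a variant seen in two sequences from different samples is more likely to be a genuine SNP than one seen only once.

```python
from collections import Counter


def variable_sites(aligned, min_minor_count=1):
    """Return (1-based column, base counts) for alignment columns carrying
    more than one base, where the non-major bases together occur at least
    `min_minor_count` times."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in aligned if s[i].upper() in "ACGT")
        if len(counts) < 2:
            continue
        ranked = counts.most_common()
        minor_total = sum(n for _, n in ranked[1:])
        if minor_total >= min_minor_count:
            sites.append((i + 1, dict(counts)))
    return sites


if __name__ == "__main__":
    # Toy alignment for illustration only; not real Morus data.
    toy = ["ACGT", "ACGT", "ATGT", "ATGA"]
    print(variable_sites(toy))                     # every variable column
    print(variable_sites(toy, min_minor_count=2))  # variant seen at least twice
```

Because gaps are skipped, this sketch reports substitutions only; length differences such as the 13 bp deletion in M. alba ITS1 would need separate handling.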

Phylogenetic analysis and estimation of divergence time
Bayesian inference of phylogeny based on ITS sequences separated Morus species into four large clades, as shown in Fig 1. The A clade was a polyphyletic group comprising two Asian species (M. nigra and M. serrata) and two American species (M. rubra and M. celtidifolia). The A clade also included two clones (GenBank accession numbers HQ144175 and HQ144187) that represented the red mulberry parent of a hybrid (M. rubra x M. alba). The estimated divergence time between M. celtidifolia and M. nigra was 2.25 Mya. The B clade, which was the largest clade, was a monophyletic taxon comprising only Chinese white mulberry (M. alba). The B clade also included two clones (GenBank accession numbers HQ144170 and

Discussion
Mulberry is not only an important economic woody tree [24] but also plays significant roles in the recovery and restoration of damaged ecosystems [4], and was planted by farmers as early as 5000 years ago [38]. Although the genus Morus is widely cultivated in the Eurasian, African, and American continents, its taxonomy is complex and has been disputed. Since Linnaeus [9], researchers have used morphological structures of the leaf, winter bud, bark, pistil, syncarp, and leaf idioblasts to infer evolutionary relationships among Morus species [10][11][12][13][14]. DNA-based markers have been developed to better classify Morus species [24][25][26][27][28][29][30][31][32]41]. Currently, ITS data have the highest discriminatory power for plant species determination [34] and this marker is widely used for taxonomic and phylogenetic analyses [42,43]. Although ITS data have previously been used to classify Morus species [24,28,[44][45][46], one recognized species, M. notabilis [11], was not included in these studies. In this study, we supplemented the existing Morus data with 24 ITS sequences (GenBank accession numbers KF784875-KF784897 and KF850474), including data for M. notabilis and M. yunnanensis (14 chromosomes) and M. nigra (308 chromosomes), obtained by cloning and sequencing.
Previously reported lengths of the Morus 5.8S gene were 152, 159, or 100 bp [28,45,47]. Here, we found the 5.8S length to be 163 bp, which was consistent with that of other genera such as Malus, Triticum, and Canella [48][49][50][51][52][53][54]. We also found that the samples with a 215 bp ITS1 sequence carried identical 5.  [15,24]. M. mongolica, M. cathayana, M. australis, M. wittiorum, and M. macroura used to be widely accepted as distinct Morus species; however, in this study, we found that they should be attributed to M. alba. Furthermore, we found that the 5.8S sequences of M. notabilis (6 clones, originating from different samples), M. rubra (9 clones from different samples), and M. celtidifolia (4 clones from different samples) were also identical within each species. M. insignis (GenBank accession number HM747169), M. mesozygia (GenBank accession number HM747171), and M. serrata (GenBank accession number HM747176) had only one ITS sequence each deposited in NCBI, so additional data are needed to test whether their 5.8S sequences are likewise invariable. Based on characterization of the 5.8S sequences from M. alba, M. rubra, M. notabilis, and M. celtidifolia, we found that the intraspecific 5.8S sequence was invariable in these species, whereas the interspecific 5.8S sequences were variable in Morus species exhibiting no evidence of hybridization. Therefore, the inclusion of the 5.8S region as part of the ITS sequence is critically important for taxonomic purposes [34].
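The two properties discussed here, invariance of 5.8S within a species and divergence between species, can be checked mechanically once the 5.8S segments are in hand. The sketch below is purely illustrative: the function names are hypothetical, the input is assumed to be a list of (species label, 5.8S sequence) pairs already trimmed to the same boundaries, and the demonstration sequences are made up rather than taken from GenBank.

```python
from collections import defaultdict


def distinct_variants(records):
    """Count the distinct 5.8S sequences seen per species.
    A count of 1 means the region is invariable within that species."""
    by_species = defaultdict(set)
    for species, seq in records:
        by_species[species].add(seq.upper())
    return {species: len(variants) for species, variants in by_species.items()}


def shared_between_species(records):
    """Return sequences that occur under more than one species label,
    which would argue against 5.8S separating those species."""
    owners = defaultdict(set)
    for species, seq in records:
        owners[seq.upper()].add(species)
    return {seq: sorted(names) for seq, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    # Made-up toy sequences for illustration only.
    toy = [
        ("M. alba", "ACGTAC"), ("M. alba", "acgtac"),
        ("M. rubra", "ACGTAA"), ("M. rubra", "ACGTAA"),
        ("M. notabilis", "ACCTAC"),
    ]
    print(distinct_variants(toy))
    print(shared_between_species(toy))
```

For species represented by a single sequence, the first function trivially returns 1, which is why such species cannot be assessed without additional data.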
In the present study, we found that M. serrata (GenBank accession number HM747176), referred to as M. serrata Roxb and also known as the Himalayan Mulberry [55], was distinct from M. serrata Wall., a synonym of M. alba. M. yunnanensis was previously regarded as a variant of M. mongolica, but M. yunnanensis Koidz. S. S. Chang was a synonym for M. notabilis, which was supported by our results. In addition, M. microphylla was used as a synonym for M. celtidifolia [56]. Therefore, with no regard for synonyms, subspecies, or varieties, the generally accepted Morus species identified in our study included four species native to Asia (M. alba, M. serrata, M. nigra, and M. notabilis), three New World species (M. celtidifolia, M. insignis, and M. rubra), and one species from Africa (M. mesozygia).
The use of ITS as a species barcode has been criticized due to its limitations, which include intraspecific variation [42] and heterogeneous copies [43,57]. However, this heterogeneity did not affect the molecular identification of species [57]. The heterogeneity of ITS could be detected by both direct sequencing and sequencing after cloning [42]. In this study, we analyzed the heterogeneity of four Morus species. The results showed that there were SNPs in the ITS regions of M. notabilis, M. alba, M. rubra, and M. celtidifolia. Among these, the proportion of SNPs in the ITS region of M. notabilis was similar to that in the entire M. notabilis genome [38]. For Morus species without genome samples or repetitive ITS data in GenBank, ITS heterogeneity cannot be studied. Although intraspecific ITS SNPs were detected in Morus spp., the lengths of their intraspecific ITS regions and the positions of the Morus spp. with intraspecific ITS SNPs in the tree were invariable.
In this study, we demonstrated that interspecific hybrids of Morus species could be identified by ITS sequences. For example, two M. alba (GenBank accession numbers HQ144171 and HQ144170) and two M. rubra (GenBank accession numbers HQ144175 and HQ144187) clones were detected in a hybrid sample. Another example was an ITS sequence of M. rubra (FJ605516) whose sequence was identical to sequences of a hybrid (HQ144170) in the ITS1 and 5.8S regions, except for five separate locus differences in the ITS2 region. This suggested that the sample was possibly a hybrid (M. rubra x M. alba) because interspecific hybrids are quite commonly misidentified by morphological criteria [58].
We reconstructed the phylogenetic tree of the genus Morus based on ITS sequences. According to the phylogenetic analysis, the largest clade, B, was defined by a 215 bp ITS1 sequence and was a monophyletic group (M. alba), which agreed with the results obtained from sorting Morus species names. The C clade included M. notabilis from Ya'an county and M. yunnanensis from Dawei Mountain in China, each of which contained 14 chromosomes and an ITS1 sequence length of 233 bp. The distribution of parents of interspecific hybrids into separate clades (GenBank accession numbers HQ144175 and HQ144187, HQ144171 and HQ144170) indicated that ITS could be used to investigate Morus interspecific hybrids. The results of the phylogenetic analysis also indicated that there were only eight species in the genus Morus. Furthermore, the calibrated point of 63.5 (46.9-76.6) Mya for estimating the divergence time of the genus Morus was adopted from the mulberry genome [38]. This calibrated point is consistent with the divergence time of Moraceae (55 Mya) estimated by nuclear and chloroplast DNA analysis combined with data from multiple fossils [59], which confirmed that the estimated divergence time of the Morus genus was reliable.
Therefore, from the results obtained in this study after including new data for ITS sequences and reconstructing the phylogenetic tree of the genus Morus, we suggest that the genus Morus can be classified into eight species and that ITS might be used to discover hybridization in this genus.