Multiple ITS Copies Reveal Extensive Hybridization within Rheum (Polygonaceae), a Genus That Has Undergone Rapid Radiation

Background During adaptive radiation events, characters can arise multiple times due to parallel evolution, but transfer of traits through hybridization provides an alternative explanation for the same character appearing in apparently non-sister lineages. The signature of hybridization can be detected in incongruence between phylogenies derived from different markers, or from the presence of two divergent versions of a nuclear marker such as ITS within one individual. Methodology/Principal Findings In this study, we cloned and sequenced ITS regions for 30 species of the genus Rheum, and compared them with a cpDNA phylogeny. Seven species contained two divergent copies of ITS that resolved in different clades from one another in each case, indicating hybridization events too recent for concerted evolution to have homogenised the ITS sequences. Hybridization was also indicated in at least two further species via incongruence in their position between ITS and cpDNA phylogenies. None of the ITS sequences present in these nine species matched those detected in any other species, which provides tentative evidence against recent introgression as an explanation. Rheum globulosum, previously indicated by cpDNA to represent an independent origin of decumbent habit, is indicated by ITS to be part of clade of decumbent species, which acquired cpDNA of another clade via hybridization. However decumbent and glasshouse morphology are confirmed to have arisen three and two times, respectively. Conclusions These findings suggested that hybridization among QTP species of Rheum has been extensive, and that a role of hybridization in diversification of Rheum requires investigation.

Rheum L. (Polygonaceae) contains ,60 species, mainly distributed in the QTP and adjacent regions [33][34][35]. This diversity appears to result from two radiation events, the first around 9.9-12.0 million years ago (Mya) and the second around 5 Mya [14,36]. There is extensive morphological and ecological variation between species [33][34][35], and certain adaptive traits, such as decumbent habit and ''glasshouse'' morphology involving translucent bracts, have evolved multiple times in parallel [14,[37][38][39]. However, reticulate evolution due to hybridization might give the impression of a character evolving multiple times, when in fact it only evolved once. However, a possible role for hybridization in the diversification of Rheum has not yet been investigated.
In this study, we extended our previous examination of the diversification history of the genus Rheum [10,14,37,39,68]. We cloned and sequenced ITS sequences for 30 Rheum species, representing all seven sections of this genus. We sought evidence of hybridization via (i) multiple ITS copies and (ii) incongruence between ITS and cpDNA, and used this data to evaluate the extent of hybridization during diversification in Rheum.

Plant materials
We collected one accession each of 30 species, representing seven sections of Rheum (Table 1). Most of the species examined by Sun et al. [14] for cpDNA were included, except for four which could not be obtained: R. acuminatum, R. delavayi, R. palaestinum and R. tataricum. Most species samples were collected in the QTP, and voucher specimens were deposited in the herbaria of Northwest Plateau Institute of Biology (HNWP), the Chinese Academy of Science, and School of life Sciences, Lanzhou University, China (Table 1). We selected nine species from five other genera of the Polygonaceae, plus Limonium sinense from the Plumbaginaceae, to serve as outgroups to root our phylogenetic analyses.

DNA extraction, amplification, cloning and sequencing
Total DNA was extracted from silica dried leaves using a modified CTAB method [69]. The primers used for amplification were ITS1 (59-TCCGTAGGTGAACCTGCGG-39) and ITS4 (59-TCCTCCGCTTATTGATATGC-39) [70]. The amplification program consisted of an initial template denaturation step at 95 for 5 min, followed by 38 cycles at 94 for 20 s, 50 for 30 s, and 72 for 40 s, and extension at 72 for 5 min. We firstly sequenced the amplification products directly. However, most of these sequences can not be identified exactly, especially for seven species, i.e. R. hotaoense, R. officinale, R. tanguticum, R. pumilum, R. likiangense, R. franzenbachii and R. reticulatum, mainly because there are a lot of impurity peaks. So for each of the 30 Rheum species examined, the amplified fragments were then ligated and transformed into Escherichia coli strain DH5a system using a pUC18 vector (Takara Inc.). At least 15 positive clones for each species were selected and sequenced on an ABI Prism automated sequencer with universal primers M13rev and M13uni. In order to reduce false base substitutions resulting from PCR polymerase mismatch, any polymorphism that was observed in only one clone was removed from the analyses. Sequences were edited and aligned with MegAlign and manually adjusted at two positions that had minor length polymorphism. All sequences have been submitted to GenBank (Table S1 in File S1).

Data analyses
The boundaries of the sequenced ITS (including ITS1, 5.8 S, and ITS2) regions were identified in comparison to Rumex cripus ITS sequence from GenBank (Accession number: AF338221), and ITS2 boundaries and alignments were confirmed by Hidden Markov Models, following Keller et al. [71]. Clustal W was used for alignment of sequences initially [72], BioEdit v 5.0.6. [73] was used to refine the alignments manually and to determine the lengths and GC contents of ITS1, 5.8 S, and ITS2 separately. The repeats in the ITS sequences were detected with the Tandem Repeats Finder [74]. For each species, sequences for the eight regions of cpDNA examined by Sun et al. [14] were retrieved (Table S1 in File S1). Thus, two sequence matrixes (ITS and cpDNA) were used in further analyses. All indels in the two matrixes were coded as binary states (0 for absence, and 1 for presence) using the GAPCODER program [75].

Phylogeny analyses
Phylogeny analyses were conducted based on the sequences of entire ITS region, ITS1 region and ITS2 region, respectively. We used MrModeltest 2.0 [76] to choose the most appropriate model for each dataset for the ML and Bayesian analyses (the selected model was GTR + I + G for each analysis). Maximum likelihood analyses were performed using PHYML 3.0 with 1000 bootstraps under the GTRIG model [77]; partial parameterization commands used here: -b 1000 -m GTR -v e -f e -t e -a e -o tlr, see the PHYML manual for detail. MrBayes 3.1.2 was used to perform the Bayesian inference analysis to find the optimal tree topology [78]. Four runs were made, each to 10 million generations, saving every 1000th tree. The parameters of the selected model were optimized during searches as recommended, running two independent Markov chain Monte Carlo (MCMC) chains with one cold and three hot chain searches for each dataset with a 50% 'burn-in'. MCMC convergence was also explored by examining the Potential Scale Reduction Factor convergence diagnostics [79] for all parameters in the model. The posterior probabilities indicating support values for each branch were also estimated.

Genetic distance between species and between ITS versions
Based on the effects of incomplete concerted evolution of ITS sequences, hybridization can be inferred in the ancestry of any individual that contains copies of ITS that are less similar to one another than each is to a sequence present in another species [80]. Seven species contained multiple copies of ITS, i.e. R. hotaoense, R. officinale, R. tanguticum, R. pumilum, R. likiangense, R. franzenbachii and R. reticulatum. We used MEGA5 [81] to compute the p-distance between the two ITS versions within each of these species, and that between each version and the most similar sequence detected across all species. This was done using the minimum differences between sequences.

Checking for pseudogenes
Where multiple ITS copies were detected, we checked each copy for two characters that might indicate it to be a pseudogene. The first was the presence of three 5.8 S motifs present which are necessary to give the ITS2 proximal stem connecting 5.8 S to 28 S its correct functional structure, and whose absence hence indicates pseudogenes [82][83][84]. These comprise a one seed plant specific 14bp motif common to all seed plants [82], and two that are common to all angiosperms [83][84]. The second was lower GC content, which also characterises pseudogenes.

ITS sequence variation
The entire ITS region, including ITS1, the 5.8 S rDNA and ITS2, was amplified from at least 15 positive clones per Rheum species. Additive polymorphisms were positively identified only when each version of ITS were present in at least two clones per species. Seven accessions contained two highly divergent ITS versions , which differed from one another by at least 27 bases as follows: R. hotaoense  (Figure 1). For each of those species, one or two additional accessions were examimed, and every accession was found to contain both versions with different frequency (Table S2 in File S1). In each species, There were two versions of ITS sequences between 2-3 individuals, among sequences of each version, they were different at 2-4 bases, no identical ITS sequences were found, hence there was far greater similarity between individuals than between ITS versions. Furthermore, in each of these species, the two ITS versions were not sister to one another in phylogenetic analysis; for each species, each version for formed a discrete clade not closely related to the other (see below). Single accessions of three other species, i.e R. nobile, R. nanum and R. alexandrae, each contained two barely divergent versions of ITS, that differed at between 1 and 3 sites in each case, and which were strongly supported as reciprocal sister sequences in phylogenetic analysis. In the remaining accessions, only one version was detected. The possibility that any of these ITS sequences could reflect fungal contamination was eliminated following the methods of Li et al. [44].
The ITS sequences of Rheum species ranged from 518 bps (R. pumilum 2) to 590 bps (R. nobile). The ITS1 region ranged from 151 bps (R. pumilum 2) to 220 bps (R. nobile) (Figure 1). The 5.8 S region had a length of 164 bps in all sequences. The ITS2 region ranged from 202 bps (R. franzenbachii 2) to 209 bps (R. nobile) annotated according to Keller et al. [71] (Table S3 in File S1). In the seven species, ITS2 regions of two versions between 2-3 individuals of the same species, they were different at 7-16 bases, but with different frquency. Three putative repeats were detected using the 'Tandem Repeats Finder' with the default search options (alignment parameters 2, 7, 7, and minimum alignment score 50) ( Table 2). These putative repeats were 17-46 bps in length, with 2.1-3.0 copies, and 62-100 percent matches. Among these, two repeats were located in the ITS1 region, and one in the ITS2 ( Table 2). The analysis of gene conversion was performed using  'Gene-conversion Software' on a basis of the ITS data of all 30 Rheum species and no gene conversion was detected.
To check if any ITS copies might be pseudogenes, for each species with multiple copies, the GC content of each copy was calculated and compared with the Rumex cripus ITS region (including ITS1, 5.8 S, and ITS2) , which was designated as the presumed functional paralog. The GC content of ITS of all Rheum species possessed similar GC values in the spacers with Rumex cripus ITS (Accession number: AF338221, 67.17% in ITS (Table S3 in File S1).

Phylogenetic analyses
When analysed separately, ITS1 and ITS2 each resolved a monophyletic Rheum ( Figure S1 in File S1), as did a dataset with indels excluded (not shown). Support values were low for most nodes in each analysis, and most differences between the trees therefore might reflect imperfect resolution. Despite this, the very different position of each R. franzenbachii version between the two trees is noteworthy, and for this species recombination affecting one ITS version should not be ruled out.
An initial phylogenetic analysis on the combined dataset was conducted using all versions and accessions of ITS for all Rheum species ( Figure S2 in File S1). However, for those that contained two barely divergent ITS versions, removing one of the two copies had no effect on support values or topology. Likewise, including only one accession per species, for those from which multiple accessions were sampled, had no effect on support values or topology. Therefore, the analysis was re-run using one randomly selected accession per species, and one randomly selected version from those with barely divergent ITS versions, but both versions of those with highly divergent ITS versions.
The Maximum likelihood (ML) trees based on nrDNA ITS and cpDNA are shown in Figure 2, including bootstrap and Bayesian support values. All species from Rheum comprised a monophyletic clade sister to the two-species genus Oxyria with high support values of 100% ( Figure 2). As with cpDNA data [14], four major tentative clades (A, B, C, D) were recovered although the support values for some of them remain low (Figure 2). Within these clades, phylogenetic structure was resolved within B and C, although only a few nodes in each had strong support (Figure 2).
The distinct versions from each of the seven species (R. hotaoense, R. officinale, R. tanguticum, R. pumilum, R. likiangense, R. franzenbachii and R. reticulatum) did not cluster together, but nested into different clades with those from other species. Three of them (R. hotaoense, R. officinale, R. tanguticum) had ITS version 1 and cpDNA of Clade A, but ITS version 2 within clade C; R. franzenbachii was similar except that ITS version 2 was in Clade B (Figure 2). R. pumilum also had ITS version 1 in Clade A, but had both ITS version 2 and cpDNA in Clade B. R. reticulatum had ITS version 1 in clade B, but version 2 and cpDNA in Clade C. Most strikingly, R. likiangense, was in Clade A for cpDNA, but its two ITS versions were in Clades B and C, respectively ( Figure 2).
In addition to these, two species displayed incongruence regarding their clade membership within the two phylogenies. R. lhasense was in Clade B for ITS but Clade A for cpDNA, whereas R. globulosum was in Clade C for ITS but Clade B for cpDNA. There was also incongruence within Clade C, with for example R. moorcroftianum strongly supported as sister to R. spiciforme for ITS, but R. reticulatum for cpDNA. Incongruence also occurs within Clade B, but is difficult to interpret because clade composition is very different between the two trees ( Figure 2).

Pairwise distances
For each of the seven species containing highly divergent versions of ITS, the p-distance between versions was greater than that between each version and the most similar ITS sequence found in another species (Table 3), indicating likely acquisition of a second version via hybridization [80]. Indeed, within-species pdistance was between 19 and 35 for these seven species, whereas the highest between-species p-distance detected was 16 (Table 3).

Discussion
ITS data revealed seven instances of Rheum species containing two divergent ITS versions, which resolved in different phylogenetic positions in each case (Figure 2), indicating hybridization. Furthermore, incongruence between ITS and a cpDNA phylogeny [14] (Figure 2), regarding the position of R. lhasaense and R. globulosum, revealed two further instances of hybridization, making nine in total. Other cases of minor incongruence within clades could indicate further hybridization events but other explanations such as lineage sorting effects are possible for these. In each of R. nobile, R. nanum and R. alexandrae , single accessions were found to contain two very similar but not identical versions of ITS; these might be due to ITS variation within species [85] (e.g. divergence between populations followed by gene flow) but do not provide additionally reliable evidence for interspecific hybridization.

Multiple ITS versions in seven Rheum species from the QTP
The presence of two ITS versions in seven Rheum species (R. hotaoense, R. officinale, R. tanguticum, R. pumilum, R. likiangense, R. franzenbachii and R. reticulatum) indicates that in each case a second version has been acquired, and that concerted evolution [86][87][88] has not yet had time to reduce the number of version back to one. This situation has been frequently reported in Angiosperms and is usually attributed to hybridization [52,[61][62][63][64]. An alternative hypothesis that the extra versions are ITS pseudogenes (see [88]) can be rejected because pseudogenes normally form a distinct clade, having a single common ancestor sequence at the time of duplication, typically before species diversification [89][90][91][92][93]; this was not the case for Rheum. Furthermore, none of the ITS sequences for the species concerned had lost the conserved seed Bootstrap support values from ML analyses using PHYML are given below branches and the corresponding Bayesian posterior probabilities from Bayesian analyses using MrBayes are shown above branches. For simplification, three monophyletic clade A1, A, B, C, D, were marked and also a paraphyletic group A2 on the cpDNA tree. On the ITS tree, four clades and ploidy of each Rheum species was marked and the seven species with multiple clones were also marked with clone serial numbers. The different colours branches were used to mark species with different characters, and the branches of glasshouse species was marked with triangle tag. doi:10.1371/journal.pone.0089769.g002 Table 3. p-distances between ITS within and between species, involving species with two divergent versions of ITS. plant specific 14-bp motif [82] and other two motifs in 5.8 S [83][84], or had the lower GC content (Table S1 in File S1), both of which are expected for pseudogenes. Additionally, we sequenced ITS sequences from 2-3 individuals for the seven species to confirm that the multiple copies were not from the sequencing error, and the two versions per species each clustered into two clades with distinct phylogenetic positions ( Figure S2 in File S2). The number of generations necessary for concerted evolution to reach completion is thousands to millions [87,90,[94][95][96][97]. The generation time in Rheum can be several decades, due to long-lived roots and rhizomes [14,33]. Therefore, the hybridization events revealed by these multiple ITS versions could be as much as several million years old, and if clonal reproduction makes genets live even longer, they could be older still [52,[61][62][63][64]. However, recent hybridization is also possible in each case (see below).
All of the seven species with two ITS versions were from the QTP. Among these, two were tetraploid (R. officinale and R. pumilum), four were diploid, and the chromosome number of R. hotaoense is unknown [43]. The tetraploids might be allopolyploids, but could also have received the second ITS copy via introgression following autopolyploidization [28]. Likewise the diploids could be homoploid hybrid species or have acquired a second ITS copy via introgression. Hence in each case three hypotheses are possible: hybrid speciation (allopolyploid or tetraploid depending on the species), ancient introgression, and recent introgression. However, despite this clear evidence for hybridization, the source species for the second ITS versions were not clear. A population genetic strategy based on more samples through its range and the nextgeneration sequencing method should help us to resolve the hybrid history of the 7 species more clearly.

Other evidence of hybridization in Rheum
In addition to these, we found that R. globulosum and R. lhasense had different phylogenetic positions on the ITS tree to the cpDNA tree ( Figure 2). The differing relative positions within Clade B of R. forrestii and R. sublanceolatum between ITS and cpDNA also suggest possible hybridization for all of these, although the clade composition is so different between the trees that it is difficult to interpret ( Figure 2). Small differences between topology within Clade C are also evident, e.g. R. moocroftianum has a different sister species in each one. This could reflect lineage sorting in a young clade, or recent hybridization. In particular, the possibility of shared haplotypes between species [98][99][100][101] needs to be examined in light of this result.
It is noteworthy that neither of the two ITS copies for R. likiangense (in clades B and C) has the same phylogenetic position as its cpDNA, which is basal to Clade A. This could reflect more than one hybridization event in its history, and strongly indicates that at least one such event was not recent.
It is likely that not all hybridization events have been detected by this analysis. While concerted evolution towards the paternal ITS type would produce incongruence such as seen here in R. globulosum [102], concerted evolution towards the maternal type would remove the signature of hybridization producing congruent phylogenetic positions [48].

Ancient or recent hybridization in Rheum?
Ancient introgression is very difficult to distinguish from hybrid speciation, unless large numbers of independent markers are available [103]. This study has uncovered such extensive evidence of hybridization in Rheum that a much larger, and more comprehensive data set will be required to unravel it. Of 24 species from the QTP, 9 show unequivocal evidence of hybridization, and there is tentative evidence in many more.
There are no cases where pairs or groups of hybridized species are monophyletic for both cpDNA and ITS, therefore there is no evidence in our data for a hybrid speciation event followed by subsequent speciation. However, R. officinale and R. tanguticum are closely related for both markers, so the possibility cannot entirely be ruled out.
Recent introgression by a single event would lead to all individuals having identical sequences for the captured ITS version, which is not what was observed in any of the species with two divergent ITS versions. Hence if the second versions were acquired by recent introgression then there were multiple, independent introgression events in each affected species. Furthermore, if all instances of multiple ITS versions resulted from recent introgression, then in each case one ITS version would match that for another species, the ITS donor. However, no two ITS sequences detected in this study were identical and the most similar between two species were R. tanguticum and R. palmatum which differed at 4 loci, which might be due to the limited sample size and the mutation after introgression for ITS loci was unlikely the parsimonious explanation. Although not all members of Rheum were examined, those species not examined tended to be narrowly distributed species, and hence unlikely to be recent ITS donors for the accessions with duplicated ITS. Although not unequivocal, this evidence makes it likely that at least some cases of duplicated ITS reflect ancient hybridization events. In particular, the incongruence between both ITS copies and cpDNA for R. likiangense cannot easily be explained by recent hybridization.
Morphology also provides clues to past events. Within ITS clade C, all species are decumbent except for four whose second ITS copy falls within this clade. Three of these species (R. officinale, R. hotaoense and R. tanguticum) have both cpDNA and their other ITS copy from Clade A, and also the non-decumbent habit of this clade. From this, introgression of ITS from a Clade C lineage might be a likely hypothesis for their origin. A similar but more complex origin involving a third lineage might apply to R. likiangense. Conversely, R. reticulatum has ITS from both clades B and C, but contains both cpDNA and the decumbent habit of Clade C, indicating possible introgression of ITS from Clade B. Hence ITS is also consistent with a decumbent habit being ancestral in Clade C, and cpDNA remains congruent with decumbent habit in all five cases above. However, R. globulosum has the ITS and decumbent habit of Clade C but the cpDNA of clade B, indicating chloroplast capture as a possible past event for this species. Hence the presence of this species in cpDNA clade B does not indicate an independent origin for the decumbent habit in this clade, as previously thought [14]; instead it is more consistent with a reticulation event. Nonetheless, three separate origins or decumbent habit (Clade C, R. alpinum and R. tibeticum) are confirmed by the ITS data, and the two separate origins for glasshouse morphology (R. nobile and R. alexandrae) [14] are also supported.
Minor incongruence within clades, notably Clade C, might reflect more recent and ongoing hybridization events, such as the sharing of plastids between species, which is common among rapidly radiated groups [99][100] and other species-rich genera [101][102]. Therefore, extensive sampling of different individuals and populations of Rheum around the QTP, for both cpDNA and ITS, will be necessary to tease apart the effects of recent and ancient hybridization events. For now we can conclude that hybridization among QTP species of Rheum has been extensive, and recent enough for concerted evolution not to have acted in many cases.

Supporting Information
File S1 Figure S1, The phylogenetic trees reconstructed using maximum likelihood method on a basis of nrDNA ITS1 (left) and ITS2 matrix (right), respectively. Bootstrap support values from ML analyses using PHYML are given below branches and the corresponding Bayesian posterior probabilities from Bayesian analyses using MrBayes are shown above branches. Figure S2, The phylogenetic trees reconstructed using maximum likelihood method on a basis of nrDNA ITS matrix including extra sequences from more individuals. Bootstrap support values from ML analyses using PHYML are given below branches and the corresponding Bayesian posterior probabilities from Bayesian analyses using MrBayes are shown above branches. The letter (X, Y, Z) after the species name present different individuals, and the numbers mean clone order. Table S1, Plant materials and list of accession numbers for the taxa used in the present study. The intron of trnK includes the matK gene and non-coding segments; rbcL-accD and trnL-F are intergenic spacers. Table S2, The frequency of two versions from all positive clones per Rheum species within 2-3 individuals. Table S3, GC content of ITS regions, the character of 5.8 S and ITS2 of Rheum species. Rumex crispus ITS region was used as reference. a present sharing three conserved motifs (motif 1: 59-CGATGAAGAACGTAGC-39, motif 2: 59-GAATTGCAGAATCC-39 and motif 3: 59-TTTGAACGCA-39); b present a stem hybridization; c present homologous structure existed. (DOC)