Analyzing the phylogeny of poplars based on molecular data

Methods for constructing trees using DNA sequences, known as molecular phylogenetics, have been applied to analyses of phylogenetic origin, evolutionary relatedness and taxonomic classification. Combining data sequenced in this study and downloaded from GenBank, we sampled 112 (chloroplast data) / 122 (ITS data) specimens belonging to 49 (chloroplast data) / 46 (ITS data) poplar species or hybrids from six (chloroplast data) / five (ITS data) sections. Maximum parsimony and Bayesian inference were used to analyze phylogenetic relationships within the genus Populus based on a combination of eight chloroplast regions and on the ITS region. The results suggested that Bayesian inference might be more suitable for the phylogenetic reconstruction of Populus. All Populus species could be divided into two clades: clade 1, including subclades 1 and 2, and clade 2, including subclades 3 and 4. Species within clade 1, involving five sections except for Leuce, clustered in a way that coincided with their two main geographical distribution areas: China (subclade 1) and North America (subclade 2). Section Leuce, which clustered in subclade 3, was confirmed to be of monophyletic origin and independent evolution. Its two subsections, namely Albidae and Trepidae, could be separated by chloroplast data but showed frequent gene flow based on ITS data. Phylogenetic analysis based on chloroplast data demonstrated once more that section Aigeiros was paraphyletic and further showed that the P. deltoides lineage is restricted to subclade 2 and that the P. nigra lineage, located in subclade 3, originated from a hybrid of which an Albidae ancestor species was the maternal parent. Similarly, section Tacamahaca was found to be paraphyletic and had two lineages: a clade 1 lineage, such as P. cathayana, and a clade 2 lineage, such as P. simonii. Section Leucoides was paraphyletic and closely linked to section Tacamahaca. Their section boundaries were not conclusively delimited by sequencing information.

Introduction
Poplars (Populus L.), among the world's most important forest trees, are accepted as model trees due to their rapid growth, strong adaptability, easy propagation and small genome [1][2][3]. In plant systematic databases, Eckenwalder classified Populus into 22 species in six sections: Tacamahaca, Aigeiros, Leuce, Leucoides, Turanga and Abaso [4], but the Flora of China recorded 71 species from five sections (except for Abaso) [5]. They are widely distributed across the northern hemisphere [5][6]. China has rich poplar germplasm resources because it is one of the most important poplar distribution areas. In Flora of China, 47 species are endemic and relatively unknown outside the country [4][5][7]. Many studies have investigated only a few species, such as P. tomentosa and P. euphratica, while almost completely ignoring other species. To develop and utilize these resources as a foundation for basic and applied research, it is imperative to understand the genetic and phylogenetic relationships among Populus species.
Frequent natural interspecific hybridization in Populus has made its classification difficult. Most studies [4][8][9][10][11] have concluded that the genus Populus is monophyletic, but the taxonomic and phylogenetic relationships within this genus are ambiguous. Molecular evidence has shown differing results. Among the five sections, Turanga is controversial. AFLP [12] and single-copy nuclear DNA [11] data suggest that section Turanga forms a separate clade, while chloroplast data show that this section is related to section Tacamahaca [11]. Section Leuce is thought to be monophyletic [9][10][11][12], and the main points of debate are whether it is the basal lineage and how to assess the taxonomic positions of its two subsections (Albidae and Trepidae). Most studies have demonstrated a close relationship between sections Tacamahaca and Aigeiros [9][10][11][12]. However, both sections have relatively high intrasectional differences and are likely polyphyletic [11]. As a result, information is lacking on points such as the number of subsections (or lineages) and the relationships between them. Section Leucoides is often ignored and has been rarely studied. Wang et al. [11] showed that section Leucoides was related to sections Tacamahaca and Aigeiros. Cervera et al. [12] suggested that P. lasiocarpa and P. violascens of section Leucoides should be classified into section Tacamahaca. Meanwhile, the classification of some endemic species as separate species is still doubtful. For example, P. gonggaensis has annual shoots with a thin pubescence and is subtly different from species that have dense, crimped villi or are hairless, such as P. lasiocarpa. This variation is difficult to identify morphologically by eye.
Because evolutionary rates differ greatly among DNA regions, these regions can be used to infer the phylogenetic relationships between Populus species at any classification level. Ideally, the evolutionary relationships among species are unique and can be described using a species tree. However, these relationships have only been explored using gene trees based on one or a few gene regions [13][14][15]. With increasing amounts of molecular evidence becoming available, an increasing number of inconsistencies among gene trees have been found, such as the placement of P. nigra, which showed high genetic differentiation from consectional P. deltoides when Populus was molecularly analyzed [9][11][16][17].
Conflicting gene trees are easily obtained when the uniparentally inherited chloroplast and mitochondrial genomes are compared with the biparentally inherited nuclear genome [18][19]. The chloroplast genome tracks organismal phylogeny but, because of its uniparental inheritance, cannot by itself reveal hybrid or allopolyploid origins. The nuclear genome can be used to analyze hybrid origin and reticulate evolution but cannot confirm whether a gene tree is attributable to paralogous copies [20][21]. Therefore, combining gene trees created using chloroplast and nuclear datasets is imperative if we are to improve understanding of the phylogenetic relationships among Populus.

Taxon sampling
We collected leaves from Populus species across China and obtained sequence data from GenBank. Salix matsudana was used as an outgroup. The information on sampled species and locations is shown in S1 Table. No permits were required for the described study because Chinese legislation does not forbid access to study poplar in nature reserves and national parks. We confirm that the study specimens included only Salicaceae samples, and these samples did not involve endangered or protected species.
A total of 112 specimens, representing 49 species or hybrids from six sections, were used for chloroplast DNA phylogeny; of these, 80 specimens that we collected were successfully amplified and sequenced for the chloroplast regions. To avoid stochastic error, we downloaded only chloroplast genome sequences from Populus and extracted and combined the target regions. Of the 122 specimens, representing 46 species or hybrids from five sections, used for ITS phylogeny, 62 specimens that we collected were successfully sequenced for the ITS region.

DNA extraction, PCR amplification and sequencing
Populus leaves were dried in silica gel, and modified SDS [29] was used for genomic DNA extraction. The 25-μl PCR amplification reaction, containing 1 μl of DNA (approximately 20 ng), 12.5 μl of 2× Taq MasterMix, 1 μl of both reverse and forward primers (10 pmol) and 9.5 μl of ddH2O, was performed as follows: one cycle of initial denaturation at 94˚C for 4 min; 35 cycles of denaturation at 94˚C for 30 s, annealing at approximately 54˚C-59˚C (depending upon the primer sets used) for 45 s, and elongation at 72˚C for 60 s; and a final cycle of elongation for 5 min. Then, sequencing was completed by Sangon Biotech Co., Ltd. (Shanghai, China).
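As a quick arithmetic check of the protocol above, the following Python sketch sums the reaction components and the programmed hold times. It assumes that "1 μl of both reverse and forward primers" means 1 μl of each primer; the variable names are illustrative only, and ramping time between temperatures is not modeled.

```python
# Reaction components in microlitres, as listed in the text.
# Assumption: "1 ul of both reverse and forward primers" = 1 ul of each.
components_ul = {
    "template DNA": 1.0,
    "2x Taq MasterMix": 12.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "ddH2O": 9.5,
}
total_ul = sum(components_ul.values())

# Cycling programme in seconds, as listed in the text.
initial_denaturation_s = 4 * 60
cycles = 35
per_cycle_s = 30 + 45 + 60  # denaturation + annealing + elongation
final_elongation_s = 5 * 60

# Hold time only; ramping between temperatures is not included.
hold_time_s = initial_denaturation_s + cycles * per_cycle_s + final_elongation_s

print(f"total reaction volume: {total_ul} ul")
print(f"programmed hold time: {hold_time_s / 60:.2f} min")
```

Under that assumption the components add up to the stated 25-μl reaction volume.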

Data analysis
After the DNA sequences had been edited and aligned using Clustal X 2.0 [30], MEGA 5.02 [31] was used to measure the sequence lengths and count the number of variable and informative sites for each region. Pairwise distances were calculated on the basis of the Kimura 2-parameter (K2P) model using MEGA 5.02 with the pairwise deletion and uniform rates options.
The incongruence length difference (ILD) test [32] was used to evaluate the congruence of the eight chloroplast regions in PAUP* 4.0b10 [33]. The ITS data and the combined data for all eight chloroplast regions, with Salix matsudana as the outgroup, were used to carry out a phylogenetic analysis of the genetic relationships between Populus species using two algorithms. Maximum parsimony analysis was undertaken with a full heuristic search, and branch support was assessed with 1000 bootstrap replicates [34]. Bayesian inference was performed in MrBayes 3.1.2 [35] using the best-fitting nucleotide substitution model selected under the Akaike information criterion (AIC) in Modeltest 3.7 [36]. Parameter settings included 1,000,000 generations, in which trees were sampled once every 100 generations and the first 25% of sampled trees were discarded as burn-in. Posterior probabilities were estimated using Markov chain Monte Carlo (MCMC).
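The MCMC settings above imply a simple bookkeeping calculation, sketched here in Python. The numbers come from the text; the helper `clade_posterior` and the toy tree samples are hypothetical and only illustrate how a posterior probability can be read as the fraction of post-burn-in samples containing a clade. This is not MrBayes code, and the generation-0 starting tree that some programs also write out is not counted.

```python
generations = 1_000_000
sample_every = 100
burnin_fraction = 0.25

# Assumption: the starting tree (generation 0) is not counted.
sampled_trees = generations // sample_every
burnin_trees = int(sampled_trees * burnin_fraction)
retained_trees = sampled_trees - burnin_trees
print(sampled_trees, burnin_trees, retained_trees)


def clade_posterior(sampled, clade, burnin_fraction=0.25):
    """Fraction of post-burn-in samples that contain `clade`.

    Each sample is represented as a set of clades, and each clade as a
    frozenset of taxon names (a deliberately simplified tree encoding).
    """
    start = int(len(sampled) * burnin_fraction)
    kept = sampled[start:]
    return sum(clade in tree for tree in kept) / len(kept)


# Toy chain of eight samples over hypothetical taxa A, B and C.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
toy_chain = [{AC}, {AC}, {AB}, {AB}, {AB}, {AC}, {AB}, {AB}]
print(clade_posterior(toy_chain, AB))
```

With these settings, 10,000 trees are sampled, 2,500 are set aside as burn-in, and 7,500 contribute to the posterior probabilities.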

Length and number of variable and informative sites in each region
The high congruence for all eight chloroplast regions was identified by the ILD test (P = 0.18 > 0.05). As shown in Table 1, the combined region had 188 variable sites and 84 informative sites and was 5171 bp long. ITS region analysis suggested that this region contained 74 informative and 92 variable sites that belonged to a 575-bp aligned sequence.
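To make the site categories concrete, the sketch below counts variable and informative sites in a toy alignment and then expresses the reported counts as proportions of the aligned length. It assumes that "informative" means parsimony-informative (at least two character states, each present in at least two sequences); the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (variable, parsimony_informative) site counts.

    Gaps and ambiguous characters are ignored when counting states.
    """
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # constant site
        variable += 1
        # Informative: at least two states occur in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


# Toy alignment: columns 2 and 4 are informative, column 5 is a singleton.
toy = ["ACGTA", "ACGTC", "ATGAC", "ATGAC"]
print(classify_sites(toy))

# Counts reported in the text, as (variable, informative, aligned length).
reported = {"chloroplast combined": (188, 84, 5171), "ITS": (92, 74, 575)}
for name, (var, inf, length) in reported.items():
    print(f"{name}: {var / length:.2%} variable, {inf / length:.2%} informative")
```

Expressed this way, the ITS region carries a much higher density of variable and informative sites than the combined chloroplast regions, even though the absolute numbers of informative sites are similar.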

Pairwise distance analysis
The average K2P distances (Table 2) based on the combined eight chloroplast and the ITS regions in Populus were 0.00292 and 0.01818, respectively. The chloroplast combination dataset showed that the pairwise distance between the six sections ranged from 0.00211 (Abaso and Aigeiros) to 0.00397 (Leuce and Abaso), while it ranged from 0.01057 (Tacamahaca and Leucoides) to 0.03754 (Leuce and Turanga) by ITS analysis.
The chloroplast combination dataset showed that the average interspecific pairwise distance was highest for section Tacamahaca at 0.00241, followed by Leucoides at 0.00233, Aigeiros at 0.00221, Leuce at 0.00142 and Turanga at 0.00112. The rank order for average intraspecific divergence was Tacamahaca (0.00071) > Turanga (0.00043) > Leuce (0.00026) > Leucoides (0.00014) > Aigeiros (0). The ITS region dataset showed that the highest inter- and intraspecific distances were both in section Tacamahaca (0.01093 and 0.00558, respectively), and the lowest values were present in sections Leuce (0.00221) and Leucoides (0.00059). The error bars for pairwise distances are the standard deviations of linear fit. As shown in Figs 1 and 2, large error bars between consectional species were observed for sections Tacamahaca, Aigeiros and Leucoides, which suggested that interspecific variation within these sections fluctuated widely.

Phylogenetic analysis
With S. matsudana as the outgroup, the MP tree based on the chloroplast combination dataset distinguished four major clades with high or moderate support values (Fig 3). Bayesian inference further adjusts mixed (0.98 posterior probabilities), and they were clearly separated from section Turanga (1.00 posterior probabilities). All American Populus species formed subclade 2 (1.00 posterior probabilities), in which P. mexicana was identified with 1.00 posterior probabilities. These two species from section Leuce were clustered in clade 2, with 90% bootstrap support, and they were clearly separated from the others. Sections Tacamahaca, Aigeiros, Leucoides and Turanga and their natural hybrids were sister taxa (74% bootstrap support) in clade 3, in which Turanga could be identified with 100% bootstrap support. Compared with the above MP analysis, Bayesian inference (Fig 6) more strongly supported these three clades, with 1.00 posterior probabilities. Bayesian tree analysis also more clearly showed the phylogenetic relationships of some specimens from sections Tacamahaca, Aigeiros, Leucoides and Turanga. For instance, P. afghanica in section Aigeiros was clustered with P. lasiocarpa in section Leucoides, which was supported with 0.90 posterior probabilities. However, a number of specimens formed a "comb" (an unresolved cluster) and could not be identified. Some specimens belonging to one species were separated, such as P. simonii, or were clustered with specimens of other species, such as P. qamdoensis.

The incongruent gene trees for Populus
One of the most notable difficulties in phylogenetic reconstruction is the widespread occurrence of incongruence among methods and among individual genes or different genomic regions [37][38]. Incongruence among methods and genes [37] has been generally accepted and is also shown in the phylogeny results produced by this study. High incongruence was associated with differences between chloroplast and ITS regions due to actual differences in their evolutionary histories. High-frequency hybridization events played important roles in Populus phylogenies, which is reflected by ITS tree analysis. Species from four sections (all except Leuce) were not clearly separated and formed a "comb" clade. Although they contained a similar number of informative sites to the ITS region (Table 1), the combined chloroplast regions effectively discriminated sections, subsections and similar species with high reliability. Differences between gene trees based on the same data were attributed to differences between the algorithmic models. In comparison, we found that the support values from Bayesian posterior probabilities were higher than those from maximum parsimony. Furthermore, the Bayes trees were able to group together related species because of the high posterior probabilities derived from calculating the statistical likelihood of their sequences. For instance, P. wilsonii and P. szechuanica var. tibetica #2 were independent of the four clades in the chloroplast MP tree, but they were clustered into subclade 1 of the Bayes tree with high posterior probabilities. Clade 3 in the chloroplast MP tree was only identified with 69% bootstrap support, but it was supported with 0.97 posterior probabilities as clade 2 in the Bayes tree. Therefore, Bayesian inference appears to be more suitable for phylogenetic reconstructions of Populus.

The phylogenetic relationships between subsections
Section Leuce has been classified into two subsections, Albidae and Trepidae [12][39][40][41], which, however, are not clearly separated by nuclear data [9,11,41], chloroplast data [9,11,42], RAPD data [43] or AFLP data [44]. The major disagreement centers around the taxonomic position of P. adenopoda. In this study, we sampled 13 species or hybrids from section Leuce: P. tomentosa, P. alba var. pyramidalis, P. caspica and P. alba represented subsection Albidae, and P. rotundifolia, P. tremula, P. davidiana, P. hopeiensis, P. qiongdaoensis and P. adenopoda represented subsection Trepidae. The chloroplast phylogenetic trees showed that subsection Albidae could be identified and that its species grouped with P. adenopoda. P. rotundifolia, P. rotundifolia var. duclouxiana and P. davidiana could not be clearly separated, and they are sister to P. tremula, P. qiongdaoensis and species in subsection Albidae. The ITS phylogenetic trees showed that the two subsections could not be clearly separated. These results suggested that subsection Albidae was monophyletic and that there was frequent gene flow between species of the two subsections, especially for P. adenopoda.
The section Aigeiros is composed of two main species, P. nigra and P. deltoides. Some molecular evidence has shown significant genetic differences between these species [11][12][45][46][47][48]. ISSR analysis supported the suggestion that P. nigra grouped with species in section Tacamahaca [10], whereas AFLP analyses suggested that P. deltoides grouped with Tacamahaca species [12]. The study of Li et al. [44] was able to divide these two species using AFLP markers, but they were still in one clade. Chloroplast data [9,11,49] showed that P. nigra grouped with species in section Leuce, which suggested a possible hybrid origin for P. nigra after comparing the nuclear sequences [9,11].
We agree with the opinion that P. nigra is a hybrid derived from a natural cross between section Leuce as the maternal parent and P. deltoides as the paternal parent. Furthermore, subsection Albidae is highly likely to be the maternal parent because the Albidae and P. nigra lineages share a common ancestor. P. nigra var. italica and P. beijingensis (P. nigra var. italica × P. cathayana) belong to the maternal lineage of P. nigra. Chloroplast data showed that these species were clearly separated from the remaining species (P. canadensis and P. deltoides × P. nigra cv. Chile are hybrids from crosses between P. deltoides and P. nigra, and P. deltoides 'Shan Hai Guan' and P. deltoides 'Lux' are cultivars of P. deltoides) belonging to the P. deltoides maternal lineage and clustered with species in section Leuce, whereas they were not identified using the ITS phylogeny. P. fremontii had been a subspecies of P. deltoides until Flora of North America considered it a separate species. Our chloroplast phylogeny analysis supported its high maternal homology with P. deltoides.
The diversity of Tacamahaca species and their distribution areas makes the section very suitable for analyzing the phylogeny of Populus [11]. However, most species are wild types and are difficult to collect. This limits the phylogenetic reconstruction of section Tacamahaca and even of the genus Populus. Previous research has shown a complexity in origin and evolution that, in most cases, has led to large genetic distances between consectional species [11-12, 17, 50]. This section is thus thought to be paraphyletic, and its interspecific relationships are the most complicated. Our study supports the paraphyletic nature of section Tacamahaca after analyzing 24 species or hybrids. The results show high interspecific pairwise distance values and large error bars for both the chloroplast and ITS datasets, which indicate distinct genetic differences among these species. Section Tacamahaca species were divided into two clades (1 and 2) in the chloroplast Bayes tree, in which the species in subclades 1 and 4 had overlapping distributions, suggesting that the section comprises two lineages. These lineages showed frequent gene flow, reflecting nuclear genome affinity with recombination during concerted evolution, which explains why taxonomic positions differed between the chloroplast and ITS phylogenetic trees.
Cervera et al. [12] found that section Leucoides showed interspecific heterogeneous relationships. The four species in section Leucoides from this study, namely, P. lasiocarpa, P. wilsonii, P. gonggaensis and P. pseudoglauca, also produced high pairwise distance values and had different phylogenetic positions (especially for P. gonggaensis). This indicated a paraphyletic nature, although it is doubtful whether P. gonggaensis can be considered a separate species.

The phylogenetic origin and evolution of Populus
It has been conclusively confirmed by many studies that Populus is of monophyletic origin [9][11][12][51]. During the subsequent reticulate evolution, speciation has brought about the diversification of lineages, which are at present widely accepted to be divided into six sections. Phylogenetic analyses, especially those based on gene sequences, are among the most important and widely used ways to reconstruct the evolutionary process [52][53]. Phylogenetic analysis based on AFLP [12] and ITS [51] data showed that section Leuce was the most basal lineage in the genus Populus. The ITS sequence-based phylogeny from this study also placed it as a basal taxon of the tree.
However, the view that section Leuce is the basal lineage contradicts the fossil record. Fossils are the only unequivocal proof of the actual relationships between leaves, stems and reproductive organs [54]. P. wilmattae, one of the earliest probable Populus fossil species known, is remarkably similar to the extant species P. mexicana from section Abaso [6,54]. P. mexicana had been placed in section Aigeiros until Eckenwalder [55] made the taxonomic decision to place it in a new section, "Abaso", after unscrambling the morphological, distributional, ecological and paleobotanical information. Further analysis based on morphological evidence showed that P. mexicana was most closely related to section Turanga, followed by section Aigeiros [4,56]. Our chloroplast data clustered sections Abaso, Aigeiros and Turanga into clade 1 in the Bayes tree. Consequently, this clade retains more traits of the earliest probable fossil species, P. wilmattae, than clade 2, which includes section Leuce. The appearance of section Leuce as the basal taxon of the ITS tree is related to the fact that it might have had little reticulate evolution with other sections and was clustered into a species-poor group. A widespread misunderstanding occurs when researchers consider species-poor groups as basal branches and interpret them as ancestral [57][58][59].
Moreover, species within clade 1 (involving five sections except for Leuce) clustered in a way that coincided with their specific geographical distribution areas. Species within subclade 1 were geographically restricted mainly to China (except for P. ilicifolia, located in East Africa), while subclade 2 contained only North American species. These findings suggested that geographical isolation is a main factor contributing to the diversification of Populus lineages and that convergent evolution of the chloroplast genome may play a role in their evolutionary process.
After comparing the morphological characteristics, section Leucoides was found to be similar to section Turanga. In other words, section Leucoides might be an ancestral member of Populus. In addition, its preference for permanent swamp accords with the hypothesis that the Populus ancestor is a mountain species [4,60]. Our chloroplast phylogenetic tree showed that, although it contained only four species, section Leucoides was closely linked to section Tacamahaca. P. gonggaensis and P. wilsonii clustered with the subclade 1 species (e.g., P. cathayana) in section Tacamahaca, whereas P. lasiocarpa and P. pseudoglauca clustered with the subclade 4 species (e.g., P. simonii) in section Tacamahaca. However, phylogenetic analysis did not conclusively delimit their section boundaries.

The phylogenetic evolution of P. szechuanica var. tibetica
Probable Populus fossils have been found that date from the Upper Cretaceous to the Oligocene [61][62]. Because neither extant (ignoring introduced members of northern taxa) nor fossil species are known from the Southern Hemisphere, Raven and Axelrod [63] suggested that Populus was Laurasian in origin but did not confirm the specific location. The abundant genetic resources for Populus and the geological history of Southwest China suggested that this region might be a center of origin of the genus Populus [64][65][66][67]. Gong [60] also supported the hypothesis that Populus originated from Southwest China after combining fossil, paleogeographic, paleoclimatic and geographic information.
Species in phylogenetic trees grouped generally along their species lines. However, P. szechuanica and P. szechuanica var. tibetica in Southwest China were found to be exceptions. For P. szechuanica, two specimens we collected were located in subclade 4 of the chloroplast Bayes tree, and one specimen (MG262357) obtained from GenBank was distributed to subclade 1. Because we lack information for MG262357, we do not know whether the phenomenon reflects a real difference within species or just a specimen misidentification.
P. szechuanica var. tibetica is a variety of P. szechuanica according to classical taxonomy and is widely distributed at altitudes of approximately 2000-4500 m above sea level on the Tibetan Plateau and in adjacent areas [68][69]. The study based on EST-SSR [69] revealed low genetic differentiation among populations from low, medium and high altitudes in the Sejila Mountain area, with no clear correlation with altitude. SSR analysis performed by Bo [70] divided four natural populations of Tibetan poplar into two groups. One group contained populations from Nyingchi and Lhasa, and the other contained populations from Xigaze and Shannan. Variations were found mainly within individuals, and no significant correlation was found between genetic and geographical distances.
After analyzing its taxonomic position using the trnL-F sequence, Wei et al. [42] found that P. szechuanica var. tibetica was independent of the other Populus species used in their study, which suggested an independent evolutionary path that correlated with willow (Salix).
The phylogeny in this study showed that, over a wider region, specimens of P. szechuanica var. tibetica clustered in a way that coincided with their geographical location. Three specimens (#4, 5 and 6) of P. szechuanica var. tibetica from Lhasa grouped with subclade 1 (Fig 4) species, such as P. cathayana, P. wilsonii and P. euphratica, based on the chloroplast Bayes tree and were located as the basal taxa of the ITS trees. The natural barriers, i.e., the Tibetan Plateau and Hengduan Mountains, have created a relatively enclosed space and prevented gene flow with conspecific and consectional Populus. In contrast, the two specimens of P. szechuanica var. tibetica from Deqin (#1) and Yajiang (#3) were separated from the three specimens from Lhasa by both chloroplast and ITS analyses and were always near the top of the tree. The #2 sample of P. szechuanica var. tibetica from Markam was a transitional type because it clustered with specimens #4, 5 and 6 from Lhasa based on chloroplast phylogeny and with specimen #1 from Deqin and specimen #3 from Yajiang based on ITS phylogeny. The combination of phylogenetic position and geographical region seemed to suggest an evolutionary path (Fig 7), but further detailed studies with more populations and specimens are needed to confirm this possibility.
In conclusion, our study has focused on the phylogenetic relationships of Populus and has revealed intrasectional relationships and patterns of reticulate evolution, which confirmed some of the hypotheses put forward in previous studies and offered some new suggestions. Multiple gene trees and geographically extensive species sampling are effective resources when assessing poplar systematics and reconstructing the phylogeny of Populus. However, further analyses of more specimens (e.g., P. szechuanica var. tibetica populations), species (e.g., P. mexicana) and sequence information (e.g., single-copy nuclear genes and whole chloroplast genomes) are required.
Supporting information S1