Barcoding Poplars (Populus L.) from Western China

Background
Populus is an ecologically and economically important genus of trees, but distinguishing between wild species is difficult because of extensive interspecific hybridization and introgression and a high level of intraspecific morphological variation. The DNA barcoding approach is a potential solution to this problem.
Methodology/Principal Findings
Here, we tested the discrimination power of five chloroplast barcodes and one nuclear barcode (ITS) among 95 trees representing 21 Populus species from western China. Among all single barcode candidates, discrimination power was highest for the nuclear ITS (I), progressively lower for the chloroplast barcodes matK (M), trnG-psbK (G) and psbK-psbI (P), and lowest for trnH-psbA (H) and rbcL (R); the discrimination efficiency of the nuclear ITS was also higher than that of any two- or three-locus combination of chloroplast barcodes, and even of the five-locus combination. Among the five combinations of a single chloroplast barcode plus the nuclear ITS, H+I and P+I differentiated the highest and lowest proportions of species, respectively. The highest discrimination rate among the barcodes and barcode combinations examined here was 55.0% (H+I), and discrimination failures usually occurred among species from sympatric or parapatric areas.
Conclusions/Significance
In this case study, we showed that for discriminating Populus species from western China, the nuclear ITS region is a more promising barcode than any maternally inherited chloroplast region or combination of chloroplast regions. Meanwhile, combining the ITS region with chloroplast regions may improve the barcoding success rate and help detect recent interspecific hybridization. Failure to discriminate among several groups of Populus species from sympatric or parapatric areas may have resulted from incomplete lineage sorting and frequent interspecific hybridization and introgression. We agree with a previous proposal to construct a tiered barcoding system in plants, especially for taxonomic groups with complex evolutionary histories (e.g. Populus).


Introduction
The species within the genus Populus (collectively known as poplars) are one of the world's most important groups of forest trees: they are widespread and play a significant role in the ecosystems of temperate and boreal forests across the northern hemisphere [1][2]. In addition, because of their fast growth rates, profuse vegetative propagation, adaptability to various ecological conditions and the numerous uses of their wood (e.g. timber, paper pulp and bioenergy), species of this genus are widely cultivated and exploited [1][3][4]. Poplars have also become a model organism for the study of tree biology [5][6][7], and publication of the full genome sequence of western black cottonwood (P. trichocarpa) has attracted even more research to this group [6][7][8].
From an economic point of view, wild Populus species will undoubtedly provide the most critical breeding resources in the future [1][3][4]. However, identification of wild Populus species is still difficult because of extensive interspecific hybridization and high levels of intraspecific morphological variation [2][9][10][11]. The recently developed DNA barcoding approach is one way to address this problem.
DNA barcoding aims to achieve accurate species identification by sequencing a standard region of DNA [12][13][14][15][16]. The mitochondrial CO1 (cytochrome c oxidase subunit 1) gene has been found to be highly efficient for discriminating animal species, including amphibians (e.g. [17]), birds (e.g. [18]) and fishes (e.g. [19]). In plants, however, mitochondrial regions are unsuitable for this approach because of their low mutation rates [20][21], and the search for suitable candidates has instead focused on chloroplast and nuclear gene fragments [14][15][16][22][23][24]. Based on assessments of recoverability, sequence quality and levels of species discrimination, the CBOL Plant Working Group [15] recommended the two-locus chloroplast combination matK+rbcL as the core barcode for land plants, with trnH-psbA and the nuclear ribosomal internal transcribed spacer (ITS) as complementary loci. A recent survey involving a larger collection of samples argued that the ITS (or ITS2) region should also be included in the core barcode for seed plants [16]. In addition to choosing barcoding markers, sampling many individuals within each species is essential for establishing a reference database for universal application [16].
In Populus, methods for identifying species and clones are still being developed. Previous studies demonstrated that isozyme markers are useful for differentiating the sections Aigeiros, Tacamahaca and Populus (e.g. [25][26]), and that nuclear simple sequence repeats (SSRs) and amplified fragment-length polymorphism (AFLP) markers have high potential to discriminate between clones [27][28][29][30][31]. Meanwhile, some nuclear genes, such as PPO, LEAFY, GA20 oxidase or CAD-like, are capable of differentiating species and hybrids [10,32]. Recently, Schroeder et al. [11] tested 40 chloroplast barcoding markers in seven widely cultivated poplar species and found that the combination of two intergenic spacers (trnG-psbK, psbK-psbI) and the coding region rpoC had the highest discrimination power. Wang et al. [33] surveyed the population genetics of two poplar sister species from the arid area of western China and suggested that the major ITS genotype clearly differentiated this species pair.
The genus Populus is dioecious, and since both the pollen and the seeds (usually small and numerous) are dispersed by wind, interspecific hybridization and introgression occur frequently and extensively among sympatric (or parapatric) species. This can occur between naturally co-existing species [33][34][35] or between exotic (cultivated) and native species [36][37][38]. Therefore, it is most practical to test barcode markers in a group of species native to a particular area, although a preliminary screening of barcode markers among widely cultivated (but well-diverged) species that are naturally distributed across different areas is also necessary. In this study we used several widely accepted plant barcoding markers (chloroplast: matK, rbcL and trnH-psbA; nuclear: ITS) as well as two efficient chloroplast intergenic spacers (trnG-psbK and psbK-psbI) to delimit 95 samples representing 21 Populus species, which comprise more than 80% of the native poplar species that occur in western China. We aimed to (1) test the discrimination power of these barcode markers, alone or in combination; (2) establish a reference database to facilitate future identification of Populus species from this area; and (3) propose a way forward for developing a highly efficient species identification approach in this genus.

Ethics Statement
All leaf samples employed in this study were collected from tree species that are not endangered, and these trees grow in public areas where no permission is required in China to collect leaves.

Sample Collection
Leaves of 95 trees representing 21 poplar species were collected from western China (Table S1). Fresh leaves were dried and stored in silica gel, and the latitude, longitude and altitude of each collection site were recorded using an eTrex GPS unit (Garmin, Taiwan). Among these species, P. ×canescens (Aiton) Smith is a well-studied hybrid between P. alba and P. tremula [39][40], and P. ×jrtyschensis Ch. Y. Yang is considered, on the basis of morphological characters, to be a hybrid between P. laurifolia and P. nigra [41][42], although its parentage had not previously been assessed genetically. Of the 21 species used to evaluate the candidate barcode loci, 17 were represented by two or more individuals (Table S1).

Data Collection
Six candidate DNA barcodes were evaluated: two coding genes (matK and rbcL) and three intergenic spacers (trnH-psbA, trnG-psbK and psbK-psbI) from the chloroplast genome, and the nuclear ITS region. Total DNA was extracted from silica-dried leaves using a modified cetyltrimethylammonium bromide (CTAB) method [43]. Polymerase chain reaction (PCR) amplifications were carried out using the primers listed in Table S2, following the protocols of Schroeder et al. [11] and the China Plant BOL Group et al. [16]. DNA sequencing was performed by BGI (Beijing, China), and all sequences reported in this study have been deposited in NCBI GenBank under accession numbers KC485082-KC485262 (Table S1).

Data Analyses
Sequence alignments were performed using MUSCLE [44] and refined manually in MEGA 5 [45]. Insertions/deletions (indels) and single nucleotide polymorphisms (SNPs) were identified using DnaSP version 5.0 [46]. To assess the effects of barcode combinations on species discrimination, all two- and three-locus combinations of chloroplast regions and the combination of each chloroplast region with ITS were evaluated; we took this approach because two- or three-locus barcodes are often recommended in published studies (e.g. [47][48]). To evaluate species discrimination success, we applied two different methods, PWG-Distance and Tree-Building. The PWG-Distance method (simple pair-wise matching for DNA barcoding), recommended by the CBOL Plant Working Group [15], uses distances calculated from pair-wise alignments counting unambiguous base substitutions only. Pair-wise p-distances were calculated using PAUP* 4.0b10 [49], and discrimination was considered successful if the minimum uncorrected interspecific p-distance involving a species was larger than its maximum intraspecific distance. For the Tree-Building method, a Neighbor-Joining tree was constructed in PAUP* 4.0b10 [49] under the Kimura 2-parameter substitution model, and a species was considered discriminated if all of its individuals formed a monophyletic group (e.g. [16]).
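The PWG-Distance criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PAUP* computation used in the study: sequences are assumed to be pre-aligned strings keyed by hypothetical (species, sample) tuples, only sites where both bases are unambiguous A/C/G/T are compared (gaps and IUPAC codes are skipped), and a species represented by a single individual is treated as having a maximum intraspecific distance of 0.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance over sites where both bases are A/C/G/T."""
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            compared += 1
            differences += a != b
    return differences / compared if compared else 0.0


def pwg_discriminated(alignment: dict[tuple[str, str], str]) -> set[str]:
    """Species whose minimum interspecific distance exceeds their maximum
    intraspecific distance (assumption: singletons get an intraspecific maximum of 0)."""
    species = {sp for sp, _ in alignment}
    max_intra = dict.fromkeys(species, 0.0)
    min_inter = dict.fromkeys(species, float("inf"))
    for ((sp_a, _), seq_a), ((sp_b, _), seq_b) in combinations(alignment.items(), 2):
        d = p_distance(seq_a, seq_b)
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra[sp_a], d)
        else:
            min_inter[sp_a] = min(min_inter[sp_a], d)
            min_inter[sp_b] = min(min_inter[sp_b], d)
    return {sp for sp in species if min_inter[sp] > max_intra[sp]}


if __name__ == "__main__":
    # Toy data only; these sequences are invented for illustration.
    toy = {
        ("P. alba", "a1"): "ACGTACGTAC",
        ("P. alba", "a2"): "ACGTACGTAA",
        ("P. tremula", "t1"): "ACGTTCGTGC",
        ("P. tremula", "t2"): "ACGTTCGTGC",
    }
    print(sorted(pwg_discriminated(toy)))  # ['P. alba', 'P. tremula']
```

Adding a sample from a third species whose sequence is identical to a1 would make both P. alba and that species fail, because their minimum interspecific distance would drop to 0; this mirrors how shared haplotypes between sympatric species defeat the criterion.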

Sequence Characterization
All the tested chloroplast regions (matK, rbcL, trnH-psbA, trnG-psbK and psbK-psbI) were successfully amplified with primers used in previous studies (Table S2). However, problems were encountered in sequencing trnH-psbA: a mononucleotide repeat (poly(A)) in the middle of this region caused sequencing of its latter half to fail for most individuals. This relatively short region was therefore sequenced with both forward and reverse primers, and the two reads were then assembled. Similar problems were also found in psbK-psbI for several individuals. When the nuclear ITS region was amplified and sequenced with universal primer combinations, about 15% of the resulting sequences were unreadable, and several high-quality sequences were very difficult to align with Populus ITS sequences. These sequences were therefore examined using the online NCBI BLAST service, and the results showed that they were ITS regions of fungi that occur on poplar leaves (e.g. poplar crust). A Populus-specific primer at the ITS1a end (Table S2) was then designed and, together with the universal primer at the ITS4 end, used to re-amplify and re-sequence these fungus-contaminated individuals. In the end, chloroplast regions were successfully amplified and sequenced for all 21 species, while ITS sequences were recovered for 20 species (all except P. purdomii).
In total, 555 new Populus sequences were generated in this study (Table S1). The aligned lengths of the six DNA regions ranged from 281 bp (trnH-psbA) to 966 bp (rbcL), and the proportion of variable sites (SNPs) was lowest for rbcL (1.35%) and highest for ITS (6.54%) (Table 1). In addition, we found one and three heterozygous sites in the nuclear ITS sequences of the two putative hybrid species, P. ×canescens and P. ×jrtyschensis, respectively.
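How variable-site and heterozygous-site counts of this kind could be obtained from an alignment is sketched below. This is an illustrative assumption rather than the DnaSP procedure used in the study: a column counts as variable when at least two different unambiguous bases occur in it (gaps and ambiguity codes are ignored, so indels are not counted as SNPs), the percentage is taken over the full aligned length, and heterozygous ITS sites are assumed to appear as two-base IUPAC ambiguity codes in directly sequenced PCR products.

```python
# Two-base IUPAC ambiguity codes and the bases they stand for.
IUPAC_TWO_BASE = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def variable_sites(alignment: list[str]) -> int:
    """Count alignment columns containing at least two different A/C/G/T bases."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for column in zip(*alignment):
        bases = {base.upper() for base in column} & set("ACGT")
        if len(bases) > 1:
            count += 1
    return count


def percent_variable(alignment: list[str]) -> float:
    """Variable sites as a percentage of the aligned length."""
    return 100 * variable_sites(alignment) / len(alignment[0])


def heterozygous_positions(seq: str) -> list[int]:
    """0-based positions called with a two-base IUPAC ambiguity code."""
    return [i for i, base in enumerate(seq.upper()) if base in IUPAC_TWO_BASE]
```

Whether these conventions reproduce the exact values in Table 1 depends on how DnaSP treats gaps and ambiguous sites, which is not specified here.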

Species Discrimination
Based on these sequences, we calculated the successful discrimination rate for six single regions and 27 combinations of two, three, five or six regions using the PWG-Distance and Tree-Building methods (Fig. 1). Note that the total number of species was 21 for any single chloroplast region or combination of chloroplast regions, and 20 for the nuclear ITS region or any combination including it. In the following text, where only one rate is given for a region or combination, it applies to both the PWG-Distance and Tree-Building methods. Discrimination efficacy ranged from 0 (rbcL, trnH-psbA) to 19.0% (matK) for single chloroplast loci, from 4.8% (R+H) to 28.6% (M+G) for combinations of two chloroplast loci, and from 14.3% (R+H+P, R+H+G, R+P+G) to 28.6% (M+P+G, M+R+G) for combinations of three chloroplast loci; it was 28.6% for the combination of all five chloroplast loci (M+R+H+P+G). Meanwhile, the discrimination efficacy of the nuclear ITS region was 40.0% according to the PWG-Distance method and 45.0% according to the Tree-Building method; these values are higher than those of any single chloroplast locus or two- or three-locus combination, and even higher than that of the combination of all chloroplast loci (Fig. 1). Furthermore, no chloroplast barcode differentiated the sister species P. euphratica and P. pruinosa in section Turanga Bunge, whereas the nuclear ITS barcode did [33]. Among the five combinations of a single chloroplast region with the nuclear ITS region, discrimination efficacy was highest for I+H (Tree-Building: 55.0%, PWG-Distance: 50.0%), intermediate for I+R (Tree-Building: 50.0%, PWG-Distance: 45.0%), I+M (45.0%) and I+G (45.0%), and lowest for I+P (40.0%). Finally, the combination of the nuclear ITS region and all five chloroplast regions successfully identified 8 of 20 species (40.0%).
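The 27 combinations evaluated above (10 two-locus and 10 three-locus chloroplast combinations, the five-locus chloroplast combination, the five ITS-plus-one-chloroplast combinations and the six-locus combination) can be enumerated programmatically, as in the sketch below. It assumes, since the text does not say, that a multi-locus barcode is formed by concatenating each sample's aligned loci and that only samples sequenced at every locus in the combination are kept, so that, for example, P. purdomii (no ITS sequence) drops out of every ITS-containing combination. The helper `rate` simply converts a count of discriminated species into the percentages reported above, e.g. 6 of 21 species gives 28.6% and 11 of 20 gives 55.0%.

```python
from itertools import combinations

# Single-letter codes follow the text: M = matK, R = rbcL, H = trnH-psbA,
# P = psbK-psbI, G = trnG-psbK, I = nuclear ITS.
CHLOROPLAST = ["M", "R", "H", "P", "G"]
ITS = "I"


def evaluated_combinations() -> list[tuple[str, ...]]:
    """The 27 multi-locus combinations described in the text."""
    combos = [c for k in (2, 3, 5) for c in combinations(CHLOROPLAST, k)]
    combos += [(ITS, locus) for locus in CHLOROPLAST]
    combos.append((ITS, *CHLOROPLAST))
    return combos


def concatenate(alignments: dict[str, dict[str, str]], loci) -> dict[str, str]:
    """Join aligned loci per sample, keeping only samples present at every locus
    (assumed convention; the study does not describe its concatenation step)."""
    shared = set.intersection(*(set(alignments[locus]) for locus in loci))
    return {s: "".join(alignments[locus][s] for locus in loci) for s in sorted(shared)}


def rate(n_discriminated: int, n_species: int) -> float:
    """Discrimination rate as a percentage, rounded to one decimal place."""
    return round(100 * n_discriminated / n_species, 1)


if __name__ == "__main__":
    print(len(evaluated_combinations()))  # 27
    print(rate(6, 21), rate(11, 20))  # 28.6 55.0
```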

Low Efficacy of the Five Chloroplast Locus Candidates
Populus is an economically and ecologically important genus of trees, and considerable effort has therefore been put into the differentiation of widely cultivated species, especially cultivars and hybrids [10,11,27,29,31,32,50]. Until now, few studies have focused on the delimitation of wild Populus species (e.g. [11]), and the recent development of DNA barcoding provides a novel opportunity. DNA barcoding aims to identify species based on a single DNA region or a combination of a few DNA regions without relying on taxonomic expertise [12]. To achieve this, barcode regions should be short, have high recovery rates (i.e. success rates for amplification and sequencing) and have a high species discrimination rate [14,15,16,48].
However, in this study, all tested single chloroplast barcoding regions and their combinations (2-, 3- or 5-locus) had surprisingly low discrimination efficacy in Populus species from western China (Fig. 1). Among single-region barcodes, the highest discrimination rate was 19.0% for matK, while rbcL and trnH-psbA failed to differentiate any species (Fig. 1). For combinations of chloroplast loci, the highest discrimination rate was 28.6%, achieved by one 2-locus combination, two 3-locus combinations and the combination of all five chloroplast regions. Notably, the core barcode combination (matK+rbcL) recommended by the CBOL Plant Working Group [15] successfully identified only 23.8% of the species, and adding trnH-psbA (M+R+H: 23.8%) did not improve the discrimination rate. This is consistent with other within-genus evaluations of candidate chloroplast barcoding regions, which have revealed low discrimination rates for tree genera such as Picea (28.7% for a combination of seven chloroplast regions; [54]) and Araucaria (27.8% for a combination of five chloroplast regions; [48]).

The Nuclear ITS Region is a More Promising Barcode in Populus
The nuclear ITS region has been suggested as a plant barcode by numerous authors (e.g. [14,53,55,56]) because it evolves rapidly, producing genetic differences that can distinguish closely related congeneric species [14,55,57]. A recent test of this region across 1757 seed plant species further demonstrated its discrimination power at the species level [16]. In the current study, the successful discrimination rate of the nuclear ITS region (PWG-Distance: 40.0%; Tree-Building: 45.0%) was much higher than that of any single chloroplast region (≤19.0%) and even higher than that of the combination of all five chloroplast regions (28.6%) (Fig. 1).
These results suggest that, in Populus, the nuclear ITS region is a relatively effective barcode that outperforms chloroplast DNA regions. First, the nuclear ITS region is shorter than the combined chloroplast regions yet has a higher proportion of variable sites, indicating a higher mutation rate (ITS: 6.54%, 38 variable sites out of 581 bp; five chloroplast markers combined: 2.81%, 85 variable sites out of 3029 bp) (Table 1). Second, compared with chloroplast DNA regions, the nuclear ITS region experiences less introgression. Introgression can transfer genetic material across species boundaries, and genomic components with lower intraspecific gene flow are more prone to introgression [58][59]. In Populus, the nuclear ITS region is biparentally inherited and dispersed by both pollen and seeds, while chloroplast DNA regions are maternally inherited and dispersed only by seeds [2]. Thus, the ITS region has higher intraspecific gene flow, lower interspecific gene flow (i.e. less introgression) and higher interspecific differentiation than the chloroplast regions. Third, the nuclear ITS region in plants comprises multiple (reiterated) copies that usually undergo concerted evolution [60]. During this process, the different copies become homogenized to the same sequence type (or at least to nearly identical types) through mechanisms such as high-frequency unequal crossing over or gene conversion [60]. This region may therefore have undergone rapid lineage sorting and subsequent interspecific differentiation, comparable even to that experienced by speciation genes and linked fragments [33,60]. Taken together, the characteristics of the plant nuclear ITS region, namely a high mutation rate, low introgression and concerted evolution, have facilitated its higher species discrimination efficiency relative to chloroplast markers in Populus.
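The variability comparison in the first point can be checked directly from the counts quoted in the text. The short calculation below uses only those numbers; it reproduces the two percentages and shows that the proportion of variable sites in ITS is roughly 2.3 times that of the five chloroplast regions combined.

```python
# Counts quoted in the text: (variable sites, aligned length in bp).
COUNTS = {
    "ITS": (38, 581),
    "five chloroplast regions combined": (85, 3029),
}


def percent_variable(variable_sites: int, aligned_bp: int) -> float:
    """Variable sites as a percentage of aligned length."""
    return 100 * variable_sites / aligned_bp


if __name__ == "__main__":
    for region, (n_var, length) in COUNTS.items():
        print(f"{region}: {percent_variable(n_var, length):.2f}%")
    ratio = percent_variable(*COUNTS["ITS"]) / percent_variable(
        *COUNTS["five chloroplast regions combined"]
    )
    print(f"ratio: {ratio:.1f}")  # ratio: 2.3
```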

A Combined Nuclear and Chloroplast Barcode is an Effective Approach in Species Delimitation and Hybridization Detection
Because the nuclear ITS region and the chloroplast regions have different inheritance modes and track different evolutionary histories, combining them can improve species discrimination power and further our understanding of evolutionary processes in plants [16]. In this study, we found that among Populus species from western China, combining the ITS region with a single chloroplast region usually improved discrimination efficiency (Fig. 1). This agrees with previous DNA barcoding work on single plant genera, e.g. Alnus (Betulaceae) [61] and Holcoglossum (Orchidaceae) [62].
Combinations of nuclear and chloroplast regions are also useful for understanding the evolutionary processes experienced by plants, e.g. for detecting recent hybridization events. There are two such cases in our study. P. ×canescens is a cross between P. alba and P. tremula, and its parentage has been clearly demonstrated in previous studies (e.g. [39][40]); our study confirmed that P. ×canescens individuals from Xinjiang are also hybrids between these two parental species (Figs. 2, 3), although their ITS sequences are closer to P. tremula (Fig. 2). The other case is P. ×jrtyschensis, which is generally thought to be a natural hybrid between P. nigra and P. laurifolia [41][42] because its morphology clearly combines characters of both putative parent species. This morphology-based hypothesis needed further testing, and our results provide positive evidence for the assumed parentage. On the one hand, the ITS sequences of P. ×jrtyschensis have three sites that combine the SNP variants of P. nigra and of the P. laurifolia complex, which includes P. laurifolia, P. talassica and P. pilosa (Figs. 2, 3), although the Tree-Building method showed that the ITS genotypes of all hybrid individuals from different populations are closer to P. nigra (Fig. 2). On the other hand, the chloroplast haplotypes of hybrid individuals clustered with either the P. laurifolia complex or P. nigra (Fig. 3), which suggests that either P. nigra or the P. laurifolia complex can act as the maternal donor. As shown above, combining the nuclear ITS region with a chloroplast region is an effective approach for detecting recent hybridization events.
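The reasoning used for P. ×jrtyschensis, additive ITS sites plus the maternal signal of the chloroplast haplotype, can be expressed as two simple checks. The sketch below is illustrative only and rests on assumptions not made in the text: sequences are pre-aligned, heterozygous ITS sites are recorded as two-base IUPAC codes, and a chloroplast haplotype is assigned to whichever candidate parent it differs from at the fewest unambiguous sites. The study itself inferred these relationships from trees (Figs. 2, 3); the function names and toy sequences are hypothetical.

```python
# Two-base IUPAC ambiguity codes, keyed by the pair of bases they represent.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}
BASES = set("ACGT")


def additive_sites(hybrid_its: str, parent_a: str, parent_b: str) -> list[int]:
    """Positions where the parents differ and the putative hybrid carries the
    ambiguity code combining both parental bases (ITS additivity)."""
    sites = []
    for i, (h, a, b) in enumerate(zip(hybrid_its.upper(), parent_a.upper(), parent_b.upper())):
        if a in BASES and b in BASES and a != b and h == IUPAC[frozenset(a + b)]:
            sites.append(i)
    return sites


def closest_maternal_donor(hybrid_cp: str, parents: dict[str, str]) -> str:
    """Candidate parent whose chloroplast haplotype has the fewest mismatches with
    the hybrid's haplotype at unambiguous sites (chloroplasts are maternally inherited)."""
    def mismatches(x: str, y: str) -> int:
        return sum(p != q for p, q in zip(x, y) if p in BASES and q in BASES)

    return min(parents, key=lambda name: mismatches(hybrid_cp, parents[name]))
```

With real data, a hybrid ITS showing additive sites while its chloroplast haplotype matches one parent would reproduce the pattern described above for P. ×jrtyschensis.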

Improving the Taxonomy and DNA Barcoding System in Wild Populus Species
Although Populus species are widely distributed across the northern hemisphere, most research on this genus has focused on the six to eight species that are widely cultivated and/or commercially exploited, e.g. P. alba, P. tremula, P. tremuloides, P. nigra, P. deltoides and P. trichocarpa (ref. [11] and references therein). As yet, there is no consensus on the true number of Populus species worldwide, owing to misinterpretation of hybrids and difficulties in delimiting species boundaries; frequent interspecific hybridization and introgression, together with a high level of intraspecific morphological variation, are the main causes (e.g. [2,9,38]). Previous studies have suggested that Populus comprises 22 to 85 species plus hundreds of hybrids, varieties and cultivars [2,3,9,63]. Whereas the widely accepted taxonomic treatment [9] classified Populus into 29 species in six sections (Abaso, Aigeiros, Leucoides, Populus, Tacamahaca and Turanga), the Flora of China [64] recognized 71 species (47 endemic, including at least nine hybrids) from five sections (all except Abaso). Given this sharp contrast between authors in the number of recognized species in an economically and ecologically important genus, a large-scale taxonomic revision of Populus worldwide, and especially in China, is urgently needed. DNA barcoding may be an appropriate tool for this, as it is considered a powerful methodology for testing existing taxonomic treatments based solely on morphological characters (e.g. [61,65]).
In this study, we evaluated DNA barcodes for 21 species from western China, employing the nuclear ITS region and five chloroplast regions (matK, rbcL, trnH-psbA, trnG-psbK and psbK-psbI). As discussed above, these markers and their combinations differ in their success at discriminating species, which may reflect many factors, including mutation rate and inheritance mode. However, DNA barcodes consistently failed to discriminate between closely related Populus species from parapatric or sympatric areas in western China. This may have been the result of incomplete lineage sorting, interspecific hybridization and/or introgression [59,66]. To delimit species boundaries among these sympatric and parapatric species, as well as for widespread species that often encounter local species, population-level studies based on molecular data (e.g. DNA sequences, microsatellites, AFLPs) are essential (e.g. [33]). We therefore agree with a previous proposal for improving DNA barcoding success, namely introducing highly variable loci nested under core barcoding markers, i.e. building a tiered barcoding system [67]. Such a system would be especially valuable for economically and ecologically important genera such as Populus. Based on our results, we suggest that in Populus, species delimitation should be performed at no fewer than two levels: the inter-species-complex level and the intra-species-complex level. First, the boundaries of species or species complexes (groups of closely related species among which gene flow occurs frequently) should be surveyed on a large scale among all wild species, from as many populations as possible, using moderately variable loci. For this step we recommend either the combination of the nuclear ITS marker with one chloroplast marker (trnH-psbA), or a combination of two (matK and trnG-psbK) or three (matK, psbK-psbI and trnG-psbK) chloroplast markers.
Subsequently, screening should focus on each species complex, through population genetic studies using highly variable loci and denser sampling. Although such population-level studies are usually performed to infer the evolutionary history and population genetics of each species complex (e.g. divergence and gene flow among species, or the demographic history of each species), they also help detect differentiation between species [33,68]. Two types of molecular markers with stable recovery rates and easy sample preparation, nuclear SNPs and nuclear SSRs, have particular potential in such studies. Previous studies suggest that nuclear SNPs can identify widely cultivated Populus species [10,32] and that combinations of nuclear SSRs can identify different Populus species [27,29,31,33]. Developing nuclear SNP and SSR markers (including multiplex SSRs, e.g. [69]) has become increasingly convenient following the sequencing of the genome (e.g. [8]) and transcriptomes (e.g. [70][71]) of Populus species, as well as re-sequencing at the population level (e.g. [72]).
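The two-level procedure proposed above can be sketched as a simple two-tier assignment. Everything here is a hypothetical illustration rather than a tested identification pipeline: a curated reference set labels each sample with its species and species complex, tier 1 assigns a query to the complex of its nearest reference on a moderately variable core barcode (for instance a concatenated ITS + trnH-psbA sequence), and tier 2 searches only within that complex using a more variable marker. Nearest match by uncorrected distance stands in for whatever assignment method a real system would use; SSR genotypes at tier 2 would need a different distance.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    species: str
    species_complex: str  # curated complex label (hypothetical)
    core: str             # tier-1 barcode, e.g. concatenated ITS + trnH-psbA
    fine: str             # tier-2 highly variable marker (hypothetical)


def distance(a: str, b: str) -> float:
    """Uncorrected p-distance over sites where both bases are A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 1.0


def tiered_assign(core_query: str, fine_query: str, refs: list[Reference]) -> tuple[str, str]:
    # Tier 1: the nearest reference on the core barcode fixes the species complex.
    best_core = min(refs, key=lambda r: distance(core_query, r.core))
    within = [r for r in refs if r.species_complex == best_core.species_complex]
    # Tier 2: the nearest reference on the fine-scale marker, within that complex only.
    best_fine = min(within, key=lambda r: distance(fine_query, r.fine))
    return best_core.species_complex, best_fine.species
```

Restricting tier 2 to one complex is the design point: highly variable markers only need to resolve a handful of closely related species, while the core barcode handles the coarser split.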
In conclusion, we consider that the current taxonomy of wild Populus species needs further revision with the help of a tiered DNA barcoding system, in which different DNA barcodes are applied at the inter- and intra-species-complex levels. Such a tiered system is particularly important for taxonomic groups that have experienced complex evolutionary histories, e.g. with frequent hybridization and introgression among species.
Figure S1 The heterozygous sites in the nuclear ITS sequences of P. ×jrtyschensis (upper) and P. ×canescens