We have generated matK, rbcL, and nrITS2 DNA barcodes for 320 specimens representing all 18 extant genera of the conifer family Podocarpaceae. The sample includes 145 of the 198 recognized species. Comparative analyses of sequence quality and species discrimination were conducted on the 159 individuals from which all three markers were recovered (representing 15 genera and 97 species). The vast majority of sequences were of high quality (B30 = 0.596–0.989). Even the lowest quality sequences exceeded the minimum requirements of the BARCODE data standard. In the few instances in which low quality sequences were generated, the responsible mechanism could not be discerned. There were no statistically significant differences in the discriminatory power of markers or marker combinations (p = 0.05). The discriminatory power of the barcode markers individually and in combination is low (56.7% of species at maximum). In some instances, species discrimination failed in spite of ostensibly useful variation being present (genotypes were shared among species), but in many cases there was simply an absence of sequence variation. Barcode gaps (minimum interspecific p–distance > maximum intraspecific p–distance) were observed in 50.5% of species when all three markers were considered simultaneously. The presence of a barcode gap was not predictive of discrimination success (p = 0.02) and there was no statistically significant difference in the frequency of barcode gaps among markers (p = 0.05). In addition, there was no correlation between number of individuals sampled per species and the presence of a barcode gap (p = 0.27).
Citation: Little DP, Knopf P, Schulz C (2013) DNA Barcode Identification of Podocarpaceae—The Second Largest Conifer Family. PLoS ONE 8(11): e81008. https://doi.org/10.1371/journal.pone.0081008
Editor: Mehrdad Hajibabaei, University of Guelph, Canada
Received: April 30, 2013; Accepted: October 9, 2013; Published: November 27, 2013
Copyright: © 2013 Little et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Funding from the Alfred P. Sloan Foundation (http://www.sloan.org/; 2010-6-02) to DPL is gratefully acknowledged. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Podocarpaceae is a family of evergreen trees and shrubs that are sometimes cultivated as ornamentals in suitably warm climates. In terms of number of species, Podocarpaceae is the second largest family of conifers. Podocarpaceae are often a minor subcanopy component of angiosperm–dominated forests. They are most abundant in the mid– to high–elevation tropics where they thrive on nutrient–poor soils. In addition, Podocarpaceae are found in some unusual low–elevation forest types (e.g. kerangas of Borneo).
Accurate identification of tropical forest trees, such as Podocarpaceae, is often very difficult. The most easily accessed material is usually sterile. If fertile material is present, it is frequently either inaccessible or detached from the tree, making it difficult to convincingly associate the fertile and sterile portions. Although sterile material of Podocarpaceae can usually be identified to genus using phyllotaxis and leaf form, accurate species identification often requires careful microscopic examination of internal and external characteristics. Proper use of the existing identification tools requires training in botanical terminology, skill in microtechnique, and familiarity with Podocarpaceae.
Species of Podocarpaceae are of conservation concern primarily as a result of small population sizes and limited available habitat. Twenty–seven Podocarpaceae species are included in the International Union for the Conservation of Nature (IUCN) red list under the categories of vulnerable (10 species), endangered (14 species), and critically endangered (three species). Two species are included in the appendices of the Convention on International Trade in Endangered Species (CITES): Podocarpus parlatorei is listed in Appendix I (trade is not allowed) and Po. neriifolius is listed in Appendix III (trade, with some limitations, is allowed).
Podocarpaceae have a minor role in commerce. Nageia nagi, when labeled as Asian bayberry, can legally be sold in the United States of America as an herbal dietary supplement. The seeds are processed into an edible oil that is also used in manufacturing. The young leaves are also edible, but not typically consumed. The conspicuous fleshy reproductive structures (receptacles or epimatium) of Afrocarpus falcatus, Dacrycarpus dacrydioides, Dacrydium cupressinum, Po. elatus, Po. macrophyllus, Po. totara, and Prumnopitys taxifolia are eaten either raw or cooked.
Although their use is currently very limited, Podocarpaceae are known to have medicinal properties that benefit humans and animals. The receptacles and leaves contain a variety of bio–active compounds such as antioxidants, nordi–terpenes, podocarpic acid, and totarol. Some of these compounds have antimicrobial, fungistatic, or bacteriostatic properties. Other compounds have cytotoxic properties that may be useful in destroying cancer.
The rarity of large uniform stands coupled with a slow rate of growth for most species makes harvest of Podocarpaceae wood generally unsustainable. The growth rate of Po. totara may, however, accommodate sustainable harvest. Relative scarcity results in a meager international trade, primarily originating from New Zealand and South Africa. Timber from Podocarpaceae, referred to as ‘podo’ (or ‘yellow yew’) in commerce, has straight even grain. The wood of some species is brittle when worked and not particularly durable outdoors. Wood from Po. totara is durable and highly amenable to industrial machining. Wood from Lepidothamnus intermedius, Manoao colensoi, and Pr. taxifolia is very rot resistant. Timber of Ma. colensoi and Le. intermedius has long been used for railway ties. In addition, the wood of some Podocarpaceae species is highly insect resistant (e.g. Af. gracilior, Po. hallii, Po. macrophyllus, and Po. nivalis).
A reference library of Podocarpaceae DNA barcodes will allow researchers unfamiliar with the family's morphology and anatomy to make accurate identifications. We hope that DNA barcodes will permit foresters, ecologists, conservationists, customs authorities, etc. to make accurate biodiversity inventories and to monitor trade in threatened and endangered Podocarpaceae species so that future conservation and management decisions can be based on sound data.
We aim to generate and evaluate a DNA barcode reference library for Podocarpaceae. The library will be assessed both by the quality of the constituent sequences and the degree to which observed sequence variation unambiguously distinguishes Podocarpaceae species from one another.
Materials and Methods
We sampled 320 individuals representing all of the 18 extant genera of Podocarpaceae (including Phyllocladus). Our sample included 145 of the 198 recognized species (73.2%). Between 1 and 9 individuals per species were sequenced (median = 2; IQR = 1–3). All samples were expert–identified using a combination of morphology and leaf anatomy. Voucher information is in Dataset S1.
DNA was extracted from herbarium specimens or silica–dried tissue using the Qiagen DNeasy96 kit. The manufacturer's protocol was modified for herbarium specimens: instead of the recommended incubation, homogenized tissue was digested with 30 µL (20 µg/µL stock) of proteinase K in 400 µL AP1 (supplied by the manufacturer) and 1 µL of DX (supplied by the manufacturer) at 42°C for 24 hours with slow mixing (60 rotations per minute).
The Polymerase Chain Reaction (PCR) was used to amplify matK in a 15 µL volume containing: 20 mM tris pH 8.8, 10 mM KCl, 10 mM (NH4)2SO4, 2 mM MgSO4, 0.1% (v/v) Triton X-100, 5% (w/v) sucrose, 0.025% (w/v) cresol red, 0.025 µg/µL BSA, 0.2 mM dNTPs, 1 µM of Gym_F1A (5′-ATY GYR CTT TTA TGT TTA CAR GC-3′), 1 µM of Gym_R1A (5′-TCA YCC GGA RAT TTT GGT TCG-3′), 0.5 units Taq, and 0.5 µL genomic DNA. The reaction mixture was incubated for 150 sec at 95°C, cycled 35 times (30 sec at 95°C, 60 sec at 52°C, 40 sec at 72°C), and then incubated at 72°C for 10 minutes.
PCR amplification of nrITS2 was similar to that of matK except that primer annealing was carried out for 30 sec at 58°C rather than 60 sec at 52°C and primers S2F (5′-ATG CGA TAC TTG GTG TGA AT-3′) and S3R (5′-GAC GCT TCT CCA GAC TAC AAT-3′) replaced Gym_F1A and Gym_R1A.
Unused primers and dNTPs were neutralized using ExoSAP-IT (USB). PCR products were bidirectionally sequenced, using the amplification primers, with BigDye v3.1 (Life Technologies) at the High–Throughput Genomics Unit (University of Washington). PCR amplification and sequencing of rbcL were described previously.
Bases were called and quality values (QV) assigned using KB 1.4 (Life Technologies). Sequencher 4.1 (Gene Codes) was used to construct sequence contigs, trim contigs to a uniform beginning/end (priming sites were excluded), and resolve differences between sequencing reads. Sequences of matK and rbcL were checked for stop codons and frameshift mutations. To identify potential contaminants, all sequences were queried against GenBank using BLAST 2.2.26. Only hits with an e–value of 10⁻²⁰ or less were retained. Additional contaminants were identified by aligning each marker with MUSCLE 3.8, coding the resulting indels using ‘simple indel coding’, and resampling the resulting matrix 1000 times using the jackknife procedure. For each resampled matrix, the search for optimal trees was conducted in TNT 1.1. The search consisted of ten random addition replicates with five trees held in memory per replicate and SPR followed by TBR (BB) branch swapping. The strict consensus tree from each resampled matrix was used to calculate the jackknife tree.
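The e–value screen described above can be illustrated with a short sketch. It is not the pipeline used in this study: it assumes that the BLAST hits have been written as 12–column tab–delimited text with the e–value in the eleventh column, and the function name filter_hits and the two sample rows are hypothetical.

```python
def filter_hits(tabular_text, max_evalue=1e-20):
    """Return (query, subject, e-value) for hits at or below max_evalue.

    Assumes 12-column tab-delimited BLAST output in which column 11
    holds the e-value (an assumption about the output format).
    """
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Made-up rows for illustration only
example = (
    "q1\tref_a\t99.2\t600\t5\t0\t1\t600\t1\t600\t1e-150\t1100\n"
    "q1\tref_b\t85.0\t60\t9\t0\t1\t60\t1\t60\t3e-05\t52.8\n"
)
print(filter_hits(example))  # only the first row passes the threshold
```

Hits that survive the filter would still need to be inspected to confirm that the matching taxa are plausible for the sample in question.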
In order to make meaningful comparisons across markers, we only analyzed data from specimens for which matK, rbcL, and nrITS2 could all be sequenced (referred to as ‘complete samples’ hereafter). Sequences from specimens that could not be definitively identified using morphology/anatomy were excluded.
Sequence quality was assessed using the barcode quality index (B) with the acceptable quality threshold (q) set to 30 (an average of one error per thousand sequenced bases). The expected coverage (x) was set to 2. The contig size (c) was set to the observed size. Linguistic complexity (LC), a measure of sequence repetitiveness, was calculated for each sequence using COMPLEX 6.1.0 with window size set to 100 bases, step size set to 1 base, minimum pattern size set to 3 bases, and maximum pattern size set to 6 bases. The threshold for a significant increase in PCR artifacts induced by homopolymers (mononucleotide repeats) has been empirically determined to be eight bases; thus sequences with homopolymers eight bases or longer were identified. Statistical differences in sequence quality, linguistic complexity, and homopolymer frequency among markers were evaluated with Scheffé's test at p = 0.05 using the Gaussian distribution. Correlations between sequence quality and linguistic complexity as well as sequence quality and homopolymer frequency were measured by Spearman's rank correlation tests.
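Two of the quantities in this paragraph are simple enough to illustrate directly: the error rate implied by a quality value of 30, and the detection of homopolymers eight bases or longer. The sketch below is illustrative only; the function names are hypothetical, the quality conversion is the standard Phred relationship, and only unambiguous A, C, G, and T runs are considered (an assumption).

```python
import re


def phred_error_probability(q):
    """Expected per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-q / 10)


def find_homopolymers(seq, min_len=8):
    """List (start, end, base, length) for mononucleotide runs >= min_len.

    Coordinates are 1-based and inclusive. Only A, C, G, and T runs are
    reported (assumption: ambiguity codes interrupt a run).
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start() + 1, m.end(), m.group(1), m.end() - m.start())
            for m in pattern.finditer(seq.upper())]


print(phred_error_probability(30))           # about one error per thousand bases
print(find_homopolymers("ACGTTTTTTTTCA"))    # [(4, 11, 'T', 8)]
```

A run of seven identical bases falls below the threshold and is not reported.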
TNT 1.1 was used to analyze phylogenetic relationships among complete samples. Each marker was aligned with MUSCLE 3.8, indels were coded using ‘simple indel coding’, and markers were combined by concatenation. The resulting matrix was searched for optimal trees using 1000 random addition replicates: for each replicate two trees were held in memory, SPR branch swapping was followed by TBR (BB) branch swapping and a 200 iteration ratchet perturbing 8% of the characters per iteration (4% up weighted, 4% down weighted). Clade support was assessed by 10,000 jackknife resamplings. For each resampled matrix, the search for optimal trees consisted of ten random addition replicates with five trees held in memory per replicate and SPR followed by TBR (BB) branch swapping. The jackknife frequency of each clade in the strict consensus of the original matrix was calculated with SUMTREES 3.3.1 using the strict consensus tree from each resampled matrix. Trees were rooted following previously published work. Tree–based species discrimination was assessed using the ‘least inclusive clade’ method.
Species discrimination was calculated using BRONX 2.0. Discrimination success would be overestimated if the reference database included only sequences in the complete sample; thus a BRONX reference database was constructed from all sequences for each marker and marker combination (Dataset S1). To calculate species discrimination, sequences of each complete sample were queried against the reference database. Species were considered distinct if all queries for a given species returned only sequences belonging to that species. The binomial distribution, with each species considered an independent test, was used to compute 95% confidence intervals. Differences in species discrimination among markers and marker combinations were quantified using Scheffé's test at p = 0.05. The binomial distribution was used for tests of species discrimination and the Gaussian distribution was used to test if the number of species conflated when identification failed varied among markers.
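The discrimination criterion and the binomial confidence interval can be sketched as follows. This is an illustration under stated assumptions, not the BRONX implementation: the query results are supplied as a plain dictionary, the species labels are hypothetical, and the Wilson score interval is used as one common binomial interval because the text does not specify which interval was computed.

```python
import math


def discriminated_species(query_results):
    """Species for which every query returned only that species.

    query_results maps a species name to a list of sets, one set per
    queried sample, each holding the species names returned for it.
    """
    return {sp for sp, hits in query_results.items()
            if all(returned == {sp} for returned in hits)}


def wilson_interval(successes, n, z=1.959963984540054):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical results: species B and C share a genotype
results = {
    "A": [{"A"}, {"A"}],
    "B": [{"B"}, {"B", "C"}],
    "C": [{"B", "C"}],
}
distinct = discriminated_species(results)
print(distinct, wilson_interval(len(distinct), len(results)))
```

A single ambiguous query is enough for a species to count as not discriminated, which is why the criterion is strict for species represented by several samples.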
Relative variation within and among species (the ‘barcode gap’) was quantified by comparing pairwise distances for complete samples. Each pair of sequences in the complete sample was aligned separately with MUSCLE 3.8 and the number of unambiguous nucleotide differences was divided by the total number of aligned positions to calculate the edit distance (uncorrected p–distance). To minimize sampling and analytic artifacts, the maximum intraspecific distance was compared to the minimum interspecific distance for each species. For each marker, the frequency of barcode gap (minimum interspecific > maximum intraspecific) occurrence was assessed using the binomial distribution and Scheffé's test at p = 0.05. The point–biserial correlation coefficient was used to examine the relationship between number of samples per species and the occurrence of a barcode gap. Sequences from all three markers were used simultaneously with McNemar's test to measure the correlation between the occurrence of a barcode gap and whether or not a species can be consistently distinguished from all other species using diagnostic nucleotide positions.
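The distance and barcode gap calculations can be sketched as follows, taking a barcode gap to mean that the minimum interspecific distance exceeds the maximum intraspecific distance. The sketch makes several assumptions that are not stated in the text: sequences are already aligned to a common length (rather than aligned pair by pair), gaps and ambiguity codes are never counted as differences, and a species with a single sample is given a maximum intraspecific distance of zero. Function names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance for two aligned sequences of equal length.

    Differences are counted only where both sequences carry A, C, G, or T;
    the denominator is the full alignment length.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for x, y in zip(a.upper(), b.upper())
                if x != y and x in "ACGT" and y in "ACGT")
    return diffs / len(a)


def barcode_gaps(samples):
    """Map each species to True if min interspecific > max intraspecific.

    samples is a list of (species, aligned_sequence) pairs.
    """
    max_intra, min_inter = {}, {}
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            for sp in (sp1, sp2):
                min_inter[sp] = min(min_inter.get(sp, float("inf")), d)
    return {sp: min_inter.get(sp, float("inf")) > max_intra.get(sp, 0.0)
            for sp in {sp for sp, _ in samples}}


# Made-up sequences for two hypothetical species
toy = [("A", "ACGTACGT"), ("A", "ACGTACGT"),
       ("B", "ACGTACGA"), ("B", "ACGTACGA"), ("B", "ACGAACGA")]
print(barcode_gaps(toy))
```

In this toy data set species A has a barcode gap and species B does not, yet every B sequence differs from every A sequence at the final position, so B remains diagnosable by a single nucleotide.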
Results and Discussion
In total, 281 matK, 202 rbcL, and 212 nrITS2 finished sequences were generated (Dataset S1). BLAST queries indicate that the newly generated sequences are consistent with other samples of Podocarpaceae deposited in GenBank (data not shown). Phylogenetic arrangement of genera and species is roughly consistent with previous molecular phylogenetic studies (Figure 1). Sequences derived from individuals of the same species are always in close phylogenetic proximity, but in some cases the sequences do not form a monophyletic group. Sequences of some morphological/anatomical species are unambiguously polyphyletic (e.g. Podocarpus oleifolius, Figure 1). Mismatches between morphological/anatomical species circumscription and barcode sequences warrant further investigation as they may indicate the presence of cryptic species, introgression, or ancestral polymorphism followed by incomplete lineage sorting. Together the BLAST and phylogenetic contaminant screens indicate that the sequences generated are indeed Podocarpaceae and that no PCR artifacts or errors in sample handling could be detected.
Figure 1. Strict consensus of 3600 most parsimonious trees (L = 1205; CI = 0.59; RI = 0.93; all tree statistics exclude uninformative nucleotide positions) obtained from the simultaneous analysis of matK, rbcL, and nrITS2 sequence data. Numbers at nodes indicate jackknife support above 50%. Species that can be distinguished from all other species using the ‘least inclusive clade’ method are in boldface (the least inclusive clade method cannot be applied to species with only one sample). Genera have been abbreviated: Ac. = Acmopyle, Af. = Afrocarpus, Dc. = Dacrycarpus, Dd. = Dacrydium, F. = Falcatifolium, La. = Lagarostrobos, Le. = Lepidothamnus, Ma. = Manoao, Mi. = Microcachrys, N. = Nageia, Ph. = Pherosphaera, Po. = Podocarpus, Pr. = Prumnopitys, R. = Retrophyllum, and S. = Saxegothaea. Sample codes correspond to those used in Dataset S1.
Newly generated sequences of matK vary from 760 to 775 bp (median = 769; IQR = 769–769), sequences of rbcL are uniformly 607 bp, and sequences of nrITS2 vary from 420 to 435 bp (median = 425; IQR = 425–425). The multiple sequence alignment of matK results in six indels: two of 3 bp, three of 6 bp, and one of 9 bp. The 20 nrITS2 indels resulting from multiple sequence alignment range from 1 to 17 bp (median = 1; IQR = 1–2).
Of the 320 individuals sequenced, finished matK, rbcL, and nrITS2 sequences were generated for 159 individuals. These samples, representing 15 of the 18 extant genera (83.3%; Halocarpus, Parasitaxus, and Phyllocladus are not included in the complete sample) and 97 of the 198 recognized species (49.0%), were analyzed for sequence quality, linguistic complexity, species discrimination, and barcode gaps. The complete sample set contained between 1 and 3 individuals per species (median = 1; IQR = 1–2; Table 1). In total, there were 95 distinct matK sequence types, 70 rbcL sequence types, and 81 nrITS2 sequence types. The complete sample contained 71 (74.7%) matK sequence types, 61 (87.1%) rbcL sequence types, and 65 (80.2%) nrITS2 sequence types for a combined 90 distinct multilocus genotypes.
Sequence quality and complexity
Sequence quality, as measured by B30, ranged from 0.775 to 0.989 for matK (median = 0.967; IQR = 0.960–0.975), 0.596 to 0.951 for rbcL (median = 0.938; IQR = 0.929–0.944), and 0.671 to 0.933 for nrITS2 (median = 0.924; IQR = 0.919–0.927; Figure 2). The vast majority of sequences were of high quality: across all markers, 93.5% of the positions in the median sequence were assigned a quality value of 30 or greater, indicating that few, if any, of the finished sequences contain erroneous base calls. Although differences in sequence quality among markers were statistically significant (p = 0.05; matK > rbcL > nrITS2), even the lowest quality sequences exceeded the minimum requirements of the BARCODE data standard (version 2.3); thus the statistical differences observed are not particularly meaningful in practice.
Figure 2. Circles represent individual matK (blue; m), rbcL (red; r), and nrITS2 (yellow; i) sequences. Black squares indicate marker means. Error bars span three standard deviations.
Published B30 values for sequences generated using different primer sets are not directly comparable to those reported here because the primer sets define (slightly) different marker regions. Although not comparable in all cases, the median Podocarpaceae sequence is of higher quality than the average sequence reported for angiosperms across all three markers. The largest comparable difference between literature reports and newly generated Podocarpaceae sequences was observed in nrITS2 (0.829 versus 0.924). The high quality of Podocarpaceae matK sequences is notable, but the gymnosperm specific primers used to generate the matK sequences make direct comparisons to published values for angiosperms tenuous.
Linguistic complexity is a measure of the number of repeated ‘words’ in a sequence (words 3–6 bp were examined in this case). Sequences of matK, rbcL, and nrITS2 have statistically distinct linguistic complexity (p = 0.05) with matK being the simplest (median = 0.443; IQR = 0.437–0.449), followed by nrITS2 (median = 0.527; IQR = 0.513–0.566), and rbcL (median = 0.584; IQR = 0.577–0.590; Figure 2). The range of nrITS2 linguistic complexity is relatively broad, especially in comparison to that of matK and rbcL, perhaps a result of different functional constraints on structural versus protein coding sequences.
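To make the idea of counting repeated ‘words’ concrete, the sketch below computes one common formulation of linguistic complexity: for each word length from 3 to 6 bases, the number of distinct words observed is divided by the maximum number possible, and the ratios are multiplied together. This is an assumption about the general form of the measure, not a reproduction of COMPLEX, whose windowed calculation may differ; the function name is hypothetical and the values it returns are not expected to match those reported here.

```python
def linguistic_complexity(seq, k_min=3, k_max=6):
    """Product of vocabulary-usage ratios for word lengths k_min..k_max.

    For each word length k the ratio is (distinct k-mers observed) /
    (maximum possible), where the maximum is the smaller of 4**k and the
    number of k-mer positions in the sequence.
    """
    seq = seq.upper()
    lc = 1.0
    for k in range(k_min, k_max + 1):
        n_positions = len(seq) - k + 1
        if n_positions <= 0:
            raise ValueError("sequence shorter than the longest word")
        observed = len({seq[i:i + k] for i in range(n_positions)})
        possible = min(4 ** k, n_positions)
        lc *= observed / possible
    return lc


print(linguistic_complexity("ACGTAC"))      # every word is unique: 1.0
print(linguistic_complexity("A" * 100))     # a homopolymer scores near zero
```

Under this formulation repetitive sequences score low because the same few words recur, while sequences that use many different words score closer to one.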
One might expect that sequences with lower linguistic complexity (i.e. those with homopolymers and/or simple sequence repeats) will have lower sequence quality due to slip–strand mispairing at the site of repetitive sequence elements; however, lower linguistic complexity is correlated with higher sequence quality in Podocarpaceae (p < 2.2×10⁻¹⁶). The sequences with the lowest linguistic complexity generally have sequence quality typical for the marker in question (Figure 2).
A homopolymer eight bases or longer was found in 35 sequences of the complete sample: 34 were matK sequences and one was an nrITS2 sequence. In the matK sequences from the complete sample, there is a single occurrence of A8 and 33 occurrences of T8. The T8 homopolymers occupy alignment positions 465–472 (found in all samples of Afrocarpus, Lepidothamnus, Nageia, Prumnopitys, and Retrophyllum) and 720–727 (found in some Podocarpus). The frequency of homopolymer occurrence was significantly different among markers (p = 0.05). Counter to previous findings, high homopolymer frequency is correlated with high sequence quality in Podocarpaceae (p = 2.1×10⁻¹⁶). Previous investigations of the relationship between homopolymers and sequence quality focused on homopolymers ten bases or longer because they consistently result in low sequence quality; however, homopolymers ten bases or longer are not found in any sequence of the complete sample. Thus we cannot determine if homopolymers of this length have any effect on sequence quality.
The observed correlation between increased sequence quality and decreased linguistic complexity as well as the correlation between increased sequence quality and increased homopolymer frequency indicate that a mechanism other than slip–strand mispairing is responsible for the low quality sequences in the complete sample.
For individual markers, BRONX species discrimination ranged from 28.8% to 38.1% (Figure 3; Table 1). Discrimination for marker combinations was slightly better at 46.4% to 56.7%. Discriminatory power did not statistically differ (p = 0.05) among markers or marker combinations. When species identification failed, the number of conflated species ranged from a mean of 4.3 (σ = 1.8) to 5.6 (σ = 4.1) species for individual markers and a mean of 2.9 (σ = 1.4) to 3.6 (σ = 2.1) species for marker combinations. There were no unambiguous statistical differences (p = 0.05) in the number of conflated species among markers or marker combinations.
Figure 3. Squares indicate means for matK (blue; m), rbcL (red; r), nrITS2 (yellow; i), matK combined with rbcL (purple; mr), matK combined with nrITS2 (green; mi), rbcL combined with nrITS2 (orange; ri), and all markers combined (black; mri). Error bars indicate 95% confidence intervals.
A synergistic effect was observed for marker combinations both in terms of an increase in discriminatory power and decrease in the number of conflated species (Table 1; e.g. the two specimens of Af. mannii examined cannot be consistently distinguished from Af. dawei, Af. falcatus, Af. gracilior, or Af. usambarensis by any single marker, but when matK and nrITS2 are combined, Af. mannii can be consistently distinguished from all other species). In no case did combining markers result in a loss of discriminatory power.
The core barcode markers (matK and rbcL) were able to consistently distinguish among 46.4% of the species in the complete sample (Figure 3). In comparison, studies that analyzed sequences of matK, rbcL, and nrITS2, individually and in combination, using comparable methods of species discrimination (the ‘best match’ procedure or the ‘simple pairwise matching’ technique) had a median success rate of 59.5% (range = 35.7–71.4) for core barcode markers (Table 2). In these same studies, species discrimination noticeably improved with the addition of nrITS2 as a supplemental marker (median = 92.6%; range = 57.1–99.3). Although species discrimination did improve in Podocarpaceae with the addition of nrITS2 (Figure 3), the rate of species discrimination (56.7%) is less than the lowest published value (Table 2).
Of the 49 species represented by two or more individuals in the complete sample, BRONX could distinguish 28 (57.1%) from all other species using a combination of three markers (Table 1). In contrast, the ‘least inclusive clade’ method could distinguish 21 (42.9%) species (Figure 1). This provides another example of the poor performance of tree–based algorithms for barcode sequence discrimination.
The complete sample was composed of 97 species represented by 90 distinct multilocus genotypes. Thus, if intraspecific variation is assumed to be near zero, one could plausibly expect that species discrimination would be close to 92.8% (90 of 97); however, only 56.7% of species could be consistently distinguished using all three markers simultaneously.
In many cases, identification failed in spite of ostensibly useful variation being present; this most often occurred when genotypes were shared among species (e.g. Po. guatemalensis and Po. matudae are sister species that have a total of three multilocus genotypes [Figure 1]: the first multilocus genotype is restricted to Po. guatemalensis, the second multilocus genotype is restricted to Po. matudae, and the third multilocus genotype is found in both species). In the cases where genotypes are shared across species boundaries, the data cannot definitively distinguish between the underlying causal mechanisms of recent introgression versus ancestral polymorphism followed by incomplete lineage sorting. In these cases, it is unlikely that sequence data from additional markers will increase species discrimination.
In some cases, identification failure is the result of an absence of sequence variation (e.g. Dc. compactus and Dc. expansus are sister species that have identical sequences for all three markers [Figure 1]). Sequence data from additional markers may improve species discrimination in these cases. Although we did not test the utility of supplementary plastid markers, it seems unlikely that better discrimination will be provided by additional plastid data given the small difference in species discrimination between matK and rbcL (4.1%; Figure 3); discriminatory power for plastid markers usually plateaus at two markers. Rather than sequencing more plastid markers, effort would be better invested in variable unlinked markers that are easily recovered from Podocarpaceae (e.g. NEEDLY intron 2).
Discrimination success was mixed for the two CITES–listed Podocarpaceae species (Table 1): Po. parlatorei (CITES Appendix I) can be distinguished from all other species using nrITS2 (matK and rbcL cannot distinguish Po. parlatorei from Po. sprucei; rbcL cannot distinguish Po. parlatorei from Po. transiens); Po. neriifolius (CITES Appendix III) cannot be distinguished from Po. thailandensis using all three markers (using single markers, Po. neriifolius can also be conflated with Po. archboldii, Po. assamica, Po. brassii, Po. crassigemmis, Po. drouynianus, Po. gibbsiae, Po. insularis, Po. ledermannii, Po. philippinensis, Po. polystachyus, Po. ramosii, Po. rubens, and/or Po. subtropicalis). The herbal dietary supplement, N. nagi (Asian bayberry), cannot be distinguished from N. formosensis using all three markers (matK also cannot distinguish N. nagi from N. motleyi).
The barcode gap is a measure of the relative variation within and among species. In the complete sample, 39.1% of species had a barcode gap for matK, 34.0% for rbcL, 38.1% for nrITS2, and 50.5% for all markers simultaneously (Figure 4; Table 1). There is no statistical difference (p = 0.05) in the frequency of barcode gaps among markers. The presence of a barcode gap is not correlated with sample size in Podocarpaceae (rpb = 0.06; p = 0.27).
Figure 4. Circles represent the set of matK (blue), rbcL (red), and nrITS2 (yellow) sequences for each species. Opaque filled circles denote diagnostic sequence sets. Non–diagnostic sequence sets are indicated with semi–transparent filled circles. Equal intra– and inter–specific variation is marked by the gray line. Points above the gray line indicate species with ‘barcode gaps’.
Barcode gaps quantify species distinctness at the barcode locus and thereby provide a crude measure of identification reliability (i.e. a species without a barcode gap may be more likely to be misidentified since it is not particularly distinctive). In this data set, whether a species can be consistently distinguished from all other species is unrelated to the presence or absence of a barcode gap (p = 0.02). For matK and rbcL, all of the species that can be consistently diagnosed have a barcode gap, but for each marker there are six species with barcode gaps that cannot be consistently differentiated from all other species (matK: Dd. beccarii, Po. bracteatus, Po. novae–caledoniae, Po. nubigenus, Po. rumphii, and R. minus; rbcL: Ma. colensoi, N. wallichiana, Po. bracteatus, Po. pilgeri, Po. spinulosus, and Pr. andina). In contrast, there are four species that do not have nrITS2 barcode gaps, but can be consistently diagnosed with nrITS2 (Af. gracilior, Dc. imbricatus, Po. lambertii, and Pr. ferruginoides). There are also four species that have nrITS2 barcode gaps that cannot be consistently differentiated from all other species using nrITS2 (Po. bracteatus, Po. celatus, Po. longifoliolatus, and Po. sprucei). There are no species with multilocus barcode gaps that cannot be consistently diagnosed using all three markers simultaneously, but there are six species that do not have multilocus barcode gaps that can be consistently diagnosed (Af. gracilior, Af. mannii, Dc. imbricatus, Dc. kinabaluensis, Po. polystachyus, and Pr. ferruginoides; Table 1).
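McNemar's test depends only on the discordant pairs, which for the three markers combined are the species with a barcode gap that cannot be diagnosed (none) and the species without a barcode gap that can be diagnosed (six). The sketch below computes a two–sided exact (binomial) version of the test from those two counts. Which variant of McNemar's test was used is not stated here, so this is an illustration of the calculation under that assumption and its result need not equal the reported value; the function name is hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis each discordant pair falls in either cell
    with probability 0.5, so the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of disagreement
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant counts for all three markers combined, taken from the text
print(mcnemar_exact(0, 6))  # 0.03125
```

A small p–value here indicates that the two classifications (barcode gap present, species diagnosable) disagree in a consistent direction rather than at random.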
The absence of a barcode gap coupled with discrimination success serves to contrast algorithmic approaches that use diagnostic nucleotide positions (i.e. those positions that consistently distinguish one species from all others) with distance–based methods. The absence of a barcode gap does not guarantee that a species will be indistinct. For example, a species may have a large amount of intraspecific variation combined with a small, but consistent, amount of interspecific variation, rendering the species without a barcode gap but consistently diagnosable: one nucleotide difference that consistently differentiates the species in question from all other species is all that is required. Thus, the absence of a barcode gap is a poor predictor of discrimination success.
The presence of a barcode gap coupled with discrimination failure is an artifact of the analysis conducted: barcode gaps were computed using only sequences in the complete sample whereas discrimination was calculated with a reference database composed of all sequences. Thus, the barcode gap calculation did not necessarily include samples with zero interspecific distance that were included in the discrimination calculation. Restricting the discrimination calculation to sequences in the complete sample would have overestimated discrimination success for Podocarpaceae. At the same time, calculating the barcode gap using all sequences would have resulted in incomparable values.
Sampling of additional individuals cannot decrease the maximum intraspecific distance, nor can it increase the minimum interspecific distance. Thus, new sequence data for matK, rbcL, and nrITS2 will either maintain or decrease the number of species with barcode gaps. Likewise, the rate of species discrimination cannot improve, and will most likely deteriorate, with additional sampling of individuals. New sequences of unlinked markers may however increase the number of species with barcode gaps and/or improve the rate of species discrimination.
The vast majority of barcode sequences generated for this study were of high quality (Figure 2). Even the lowest quality sequences exceeded the minimum requirements of the BARCODE data standard. In the few instances in which low quality sequences were generated, the responsible mechanism could not be discerned: slip–strand mispairing at the site of repetitive sequence elements cannot adequately explain the low quality sequences observed.
The power of matK, rbcL, and nrITS2, individually and in combination, to discriminate among Podocarpaceae species is relatively low (56.7% of species at maximum; Table 1; Figure 3). There were no statistically significant differences in the discriminatory power of markers or marker combinations. Although the discrimination rate for Podocarpaceae is below the rate reported for comparably analyzed studies (Table 2), it is not markedly lower. Plant DNA barcoding studies that heavily sample within taxonomic groups usually report low rates of species discrimination.
Discrimination success was mixed for Podocarpaceae species important in commerce and of conservation concern (Table 1). The CITES Appendix I species, Po. parlatorei, can be distinguished from all other species using nrITS2. Unfortunately, the CITES Appendix III species, Po. neriifolius, and the herbal dietary supplement, N. nagi, cannot be unambiguously distinguished from all other Podocarpaceae even when all three markers are used.
The presence of a barcode gap was not predictive of discrimination success. There was no statistically significant difference in the frequency of barcode gaps among markers in Podocarpaceae (Figure 4). In addition, there was no correlation between number of individuals sampled per species and the presence of a barcode gap.
Sequences of additional variable unlinked markers that are easily recovered from Podocarpaceae (e.g. NEEDLY intron 2) may increase the rate of species discrimination.
We thank Joan Deutsch for providing excellent technical assistance. License for the use of TNT was generously provided by the Willi Hennig Society.
Conceived and designed the experiments: DPL PK CS. Performed the experiments: PK CS. Analyzed the data: DPL CS. Contributed reagents/materials/analysis tools: DPL PK CS. Wrote the paper: DPL PK CS.
- 1. Farjon A (2001) World checklist and bibliography of conifers. Richmond: Royal Botanic Gardens, Kew, 2nd edition.
- 2. Cernusak LA, Adie H, Bellingham PJ, Biffin E, Brodribb TJ, et al. (2011) Podocarpaceae in tropical forests: a synthesis. In: Turner BL, Cernusak LA, editors, Ecology of the Podocarpaceae in tropical forests, Washington, D.C.: Smithsonian Institution Scholarly Press, volume 95. pp. 189–195.
- 3. de Laubenfels DJ (1969) A revision of the Malesian and pacific rainforest conifers, i. Podocarpaceae, in part. Journal of the Arnold Arboretum 50: 315–369.
- 4. de Laubenfels DJ (1988) Coniferales. Flora Malesiana 10: 337–453.
- 5. Buchholz JT, Gray NE (1948) A taxonomic revision of Podocarpus I: the sections of the genus and their subdivisions with special reference to leaf anatomy. Journal of the Arnold Arboretum 29: 46–63.
- 6. Buchholz JT, Gray NE (1948) A taxonomic revision of Podocarpus II: the American species of Podocarpus section Stachycarpus. Journal of the Arnold Arboretum 29: 64–76.
- 7. Buchholz JT, Gray NE (1948) A taxonomic revision of Podocarpus IV: the American species of section Eupodocarpus subsections C and D. Journal of the Arnold Arboretum 29: 123–151.
- 8. Gray NE, Buchholz JT (1948) A taxonomic revision of Podocarpus III: the American species of Podocarpus section Polypodiopsis. Journal of the Arnold Arboretum 29: 117–122.
- 9. Gray NE, Buchholz JT (1951) A taxonomic revision of Podocarpus V: the south Pacific species of Podocarpus section Stachycarpus. Journal of the Arnold Arboretum 32: 82–92.
- 10. Gray NE, Buchholz JT (1951) A taxonomic revision of Podocarpus VI: the south Pacific species of Podocarpus section Sundacarpus. Journal of the Arnold Arboretum 32: 93–98.
- 11. Gray NE (1953) A taxonomic revision of Podocarpus VII: the African species of Podocarpus section Afrocarpus. Journal of the Arnold Arboretum 34: 67–76.
- 12. Gray NE (1953) A taxonomic revision of Podocarpus VIII: the African species of section Eupodocarpus, subsections A and E. Journal of the Arnold Arboretum 34: 163–175.
- 13. Gray NE (1955) A taxonomic revision of Podocarpus IX: the south Pacific species of section Eupodocarpus, subsection F. Journal of the Arnold Arboretum 36: 199–206.
- 14. Gray NE (1956) A taxonomic revision of Podocarpus X. Journal of the Arnold Arboretum 37: 160–172.
- 15. Gray NE (1958) A taxonomic revision of Podocarpus XI. Journal of the Arnold Arboretum 39: 424–477.
- 16. Gray NE (1960) A taxonomic revision of Podocarpus XII: section Microcarpus. Journal of the Arnold Arboretum 41: 36–39.
- 17. Gray NE (1962) A taxonomic revision of Podocarpus XIII: section Polypodiopsis in the south Pacific. Journal of the Arnold Arboretum 43: 67–79.
- 18. Schoonraad E, Vanderschijff HP (1974) Anatomy of leaves of genus Podocarpus in South Africa. Phytomorphology 24: 75–85.
- 19. Knopf P, Nimsch H, Stützel T (2007) Dacrydium × suprinii, sp. nova: a natural hybrid of Dacrydium araucarioides × D. guillauminii. Feddes Repertorium 118: 51–59.
- 20. Knopf P (2011) Differential diagnosis and evolution within the Podocarpaceae s. l. Ph.D. thesis, Ruhr–University Bochum.
- 21. Stockey RA, Ko H (1990) Cuticle micromorphology of Dacrydium (Podocarpaceae) from New Caledonia. Botanical Gazette 151: 138–149.
- 22. Stockey RA, Ko H, Woltz P (1992) Cuticle micromorphology of Falcatifolium de Laubenfels (Podocarpaceae). International Journal of Plant Sciences 153: 589–601.
- 23. Stockey RA, Ko H, Woltz P (1995) Cuticle micromorphology of Parasitaxus de Laubenfels (Podocarpaceae). International Journal of Plant Sciences 156: 723–730.
- 24. Stockey RA, Frevel BJ (1997) Cuticle micromorphology of Prumnopitys Philippi (Podocarpaceae). International Journal of Plant Sciences 158: 198–221.
- 25. Stockey RA, Frevel BJ, Woltz P (1998) Cuticle micromorphology of Podocarpus, subgenus Podocarpus, section Scytopodium (Podocarpaceae) of Madagascar and South Africa. International Journal of Plant Sciences 159: 923–940.
- 26. Mill RR, Schilling DM (2009) Cuticle micromorphology of Saxegothaea (Podocarpaceae). Botanical Journal of the Linnean Society 159: 58–67.
- 27. Schilling DMS, Mill RR (2011) Cuticle micromorphology of Caribbean and Central American species of Podocarpus (Podocarpaceae). International Journal of Plant Sciences 172: 601–631.
- 28. IUCN (2012) International union for the conservation of nature red list of threatened species, version 2012.2. Available: http://www.iucnredlist.org. Accessed 2013 March 5.
- 29. CITES (2012) Convention on international trade in endangered species, appendices I, II, and III. Available: http://www.cites.org. Accessed 2013 March 5.
- 30. McGuffin M, Kartesz JT, Leung AY, Tucker AO (2000) Herbs of Commerce. American Herbal Products Association, 2nd edition.
- 31. Fu L, Li Y, Mill RR (1994) Podocarpaceae. In: Wu Z, Raven PH, Hong D, editors, Flora of China, St. Louis: Missouri Botanical Garden, volume 4. pp. 78–84.
- 32. Facciola S (1990) Cornucopia: a source book of edible plants. Vista: Kampong Publications.
- 33. Abdillahi HS, Stafford GI, Finnie JF, Van Staden J (2010) Ethnobotany, phytochemistry and pharmacology of Podocarpus sensu latissimo (s.l.). South African Journal of Botany 76: 1–24.
- 34. Abdillahi HS, Verschaeve L, Finnie JF, Van Staden J (2012) Mutagenicity, antimutagenicity and cytotoxicity evaluation of South African Podocarpus species. Journal of Ethnopharmacology 139: 728–738.
- 35. Bauch J, Schmidt O, Hillis WE, Yazaki Y (1977) Deposits in heartshakes of Dacrydium species and their toxicity against fungi and bacteria. Holzforschung 31: 1–7.
- 36. Symonds EL, Konczak I, Fenech M (2012) The Australian fruit illawarra plum (Podocarpus elatus Endl., Podocarpaceae) inhibits telomerase, increases histone deacetylase activity and decreases proliferation of colon cancer cells. British Journal of Nutrition 15: 1–9.
- 37. Abdillahi HS, Finnie JF, Van Staden J (2008) Antibacterial activity of Podocarpus species. South African Journal of Botany 74: 359–360.
- 38. Abdillahi HS, Stafford GI, Finnie JF, Van Staden J (2008) Antimicrobial activity of South African Podocarpus species. Journal of Ethnopharmacology 119: 191–194.
- 39. Hayashi Y, Matsumoto T, Tashiro T (1979) Antitumor activity of norditerpenoid dilactones in Podocarpus plants—structure activity relationship on in vitro cytotoxicity against Yoshida sarcoma. Gann 70: 365–369.
- 40. Hembree JA, Chang CJ, McLaughlin JL, Cassady JM, Watts DJ, et al. (1979) Cytotoxic norditerpene dilactones of Podocarpus milanjianus and Podocarpus sellowii. Phytochemistry 18: 1691–1694.
- 41. Hembree JA, Chang C, Mclaughlin JL, Cassady J (1980) Milanjilactone A and milanjilactone B, 2 novel cytotoxic norditerpene dilactones from Podocarpus milanjianus Rendle. Experientia 36: 28–29.
- 42. Park HS, Takahashi Y, Fukaya H, Aoyagi Y, Takeya K (2003) S-R-podolactone D, a new sulfoxide–containing norditerpene dilactone from Podocarpus macrophyllus var. maki. Journal of Natural Products 66: 282–284.
- 43. Park HS, Takahashi Y, Fukaya H, Aoyagi Y, Takeya K (2004) New cytotoxic norditerpene dilactones from leaves of Podocarpus macrophyllus var. maki. Heterocycles 63: 347–357.
- 44. Bergin DO (2000) Current knowledge relevant to management of Podocarpus totara for timber. New Zealand Journal of Botany 38: 343–359.
- 45. Phillips EWJ (1941) The identification of coniferous woods by their microscopic structure. The Journal of the Linnean Society of London, Botany 52: 259–320.
- 46. Dallimore W, Jackson AB, Harrison SG (1967) A handbook of Coniferae and Ginkgoaceae. New York: St. Martin's Press.
- 47. Farjon A (2008) A natural history of conifers. Portland: Timber Press.
- 48. Cockayne L (1919) New Zealand plants and their story. Wellington: M. F. Marks, Government Printer.
- 49. Kubo I, Matsumoto T, Klocke JA (1984) Multichemical resistance of the conifer Podocarpus gracilior (Podocarpaceae) to insect attack. Journal of Chemical Ecology 10: 547–559.
- 50. Russell GB, Singh P, Fenemore PG (1972) Insect–control chemicals from plants: nagilactone c, a toxic substance from the leaves of Podocarpus nivalis and P. hallii. Australian Journal of Biological Sciences 25: 1025–1029.
- 51. Saeki I, Sumimoto M, Kondo T (1970) Termiticidal substances from wood of Podocarpus macrophyllus D. Don. Holzforschung 24: 83–86.
- 52. Li Y, Gao LM, Poudel RC, Li DZ, Forrest A (2011) High universality of matK primers for barcoding gymnosperms. Journal of Systematics and Evolution 49: 169–175.
- 53. Chiou SJ, Yen JH, Fang CL, Chen HL, Lin TY (2007) Authentication of medicinal herbs using PCR–amplified ITS2 with specific primers. Planta Medica 73: 1421–1426.
- 54. Knopf P, Schulz C, Little DP, Stützel T, Stevenson DW (2012) Relationships within Podocarpaceae based on DNA sequence, anatomical, morphological, and biogeographical data. Cladistics 28: 271–299.
- 55. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.
- 56. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792–1797.
- 57. Simmons MP, Ochoterena H (2000) Gaps as characters in sequence–based phylogenetic analysis. Systematic Biology 49: 369–381.
- 58. Little DP (2005) 2xread: a simple indel coding tool. Available: http://www.nybg.org/files/scientists/2xread.html. Accessed 2013 January 5.
- 59. Farris JS, Albert VA, Kallersjo M, Lipscomb D, Kluge AG (1996) Parsimony jackknifing outperforms neighbor–joining. Cladistics 12: 99–124.
- 60. Goloboff PA, Farris JS, Nixon KC (2008) TNT, a free program for phylogenetic analysis. Cladistics 24: 774–786.
- 61. Little DP (2010) A unified index of sequence quality and contig overlap for DNA barcoding. Bioinformatics 26: 2780–2781.
- 62. Pesole G, Attimonelli M, Saccone C (1996) Linguistic analysis of nucleotide sequences: algorithms for pattern recognition and analysis of codon strategy. Methods in Enzymology 266: 281–294.
- 63. Rice P, Longden I, Bleasby A (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends in Genetics 16: 276–277.
- 64. Shinde D, Lai Y, Sun F, Arnheim N (2003) Taq DNA polymerase slippage mutation rates measured by PCR and quasi–likelihood analysis: (CA/GT)n and (A/T)n microsatellites. Nucleic Acids Research 31: 974–980.
- 65. Scheffé H (1953) A method for judging all contrasts in the analysis of variance. Biometrika 40: 87–104.
- 66. de Mendiburu F (2012) Agricolae version 1.1-3. Available: http://cran.r-project.org/. Accessed 2013 January 5.
- 67. R development core team (2012) R: a language and environment for statistical computing (version 2.15.2). Vienna, Austria: R Foundation for Statistical Computing.
- 68. Spearman C (1904) The proof and measurement of association between two things. The American Journal of Psychology 15: 72–101.
- 69. Salinas NR, Little DP (2013) 2matrix: a utility for indel coding and phylogenetic matrix concatenation. Available: https://github.com/nrsalinas/2matrix. Accessed 2013 September 13.
- 70. Nixon KC (1999) The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15: 407–414.
- 71. Sukumaran J, Holder MT (2010) DendroPy: a Python library for phylogenetic computing. Bioinformatics 26: 1569–1571.
- 72. Paradis E, Bolker B, Claude J, Cuong HS, Desper R, et al. (2013) The APE (Analyses of Phylogenetics and Evolution) package version 3.0-10. Available: http://cran.r-project.org/. Accessed 2013 January 5.
- 73. Little DP, Stevenson DW (2007) A comparison of algorithms for identification of specimens using DNA barcodes: examples from gymnosperms. Cladistics 23: 1–21.
- 74. Little DP (2011) DNA barcode sequence identification incorporating taxonomic hierarchy and within taxon variability. PLoS ONE 6: e20552.
- 75. Little DP (2012) BRONX: Barcode Recognition Obtained with Nucleotide eXposés version 2.0. Available: http://www.nybg.org/files/scientists/dlittle/BRONX2.html. Accessed 2013 January 5.
- 76. Wilson EB (1927) Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209–212.
- 77. Harrell Jr FE (2012) The Hmisc package version 3.10-1. Available: http://cran.r-project.org/. Accessed 2013 January 5.
- 78. Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biology 3: e422.
- 79. Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10: 707–710.
- 80. Meier R, Zhang G, Ali F (2008) The use of mean instead of smallest interspecific distances exaggerates the size of the “barcoding gap” and leads to misidentification. Systematic Biology 57: 809–813.
- 81. Rizopoulos D (2013) Latent Trait Models under IRT version 0.9-9. Available: http://cran.r-project.org/. Accessed 2013 January 5.
- 82. McNemar Q (1947) Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 12: 153–157.
- 83. Conran JG, Wood GM, Martin PG, Dowd JM, Quinn CJ, et al. (2000) Generic relationships within and between the gymnosperm families Podocarpaceae and Phyllocladaceae based on an analysis of the chloroplast gene rbcL. Australian Journal of Botany 48: 715–724.
- 84. Kelch DG (2002) Phylogenetic assessment of the monotypic genera Sundacarpus and Manoao (Coniferales: Podocarpaceae) utilising evidence from 18S rDNA sequences. Australian Systematic Botany 15: 29–35.
- 85. Sinclair WT, Mill RR, Gardner MF, Woltz P, Jaffré T, et al. (2002) Evolutionary relationships of the New Caledonian heterotrophic conifer Parasitaxus usta (Podocarpaceae), inferred from chloroplast trnL–F intron/spacer and nuclear rDNA ITS2 sequences. Plant Systematics and Evolution 233: 79–104.
- 86. Biffin E, Conran JC, Lowe AJ (2011) Podocarp evolution: a molecular phylogenetic perspective. In: Turner BL, Cernusak LA, editors, Ecology of the Podocarpaceae in tropical forests, Washington, D.C.: Smithsonian Institution Scholarly Press, volume 95. pp. 1–20.
- 87. Farris JS (1974) Formal definitions of paraphyly and polyphyly. Systematic Zoology 23: 548–554.
- 88. Hanner R (2009) Proposed standards for BARCODE records in INSDC (BRIs). Technical report, Database Working Group, Consortium for the Barcode of Life, Available: http://barcoding.si.edu/PDF/DWG data standards-Final.pdf. Accessed 2013 January 5.
- 89. Jeanson ML, Labat JN, Little DP (2011) DNA barcoding: a new tool for palm taxonomists? Annals of Botany 108: 1445–1451.
- 90. Aubriot X, Lowry PP, Cruaud C, Couloux A, Haevermans T (2013) DNA barcoding in a biodiversity hot spot: potential value for the identification of Malagasy Euphorbia L. listed in CITES Appendices I and II. Molecular Ecology Resources 13: 57–65.
- 91. Devey DS, Chase MW, Clarkson JJ (2009) A stuttering start to plant DNA barcoding: microsatellites present a previously overlooked problem in non–coding plastid regions. Taxon 58: 7–15.
- 92. Fazekas AJ, Steeves R, Newmaster SG (2010) Improving sequencing quality from PCR products containing long mononucleotide repeats. BioTechniques 48: 277–285.
- 93. Fazekas AJ, Steeves R, Newmaster SG, Hollingsworth PM (2010) Stopping the stutter: improvements in sequence quality from regions with mononucleotide repeats can increase the usefulness of non–coding regions for DNA barcoding. Taxon 59: 694–697.
- 94. Meier R, Shiyang K, Vaidya G, Ng PKL (2006) DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Systematic Biology 55: 715–728.
- 95. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proceedings of the National Academy of Sciences 106: 12794–12797.
- 96. Yan HF, Hao G, Hu CM, Ge XJ (2011) DNA barcoding in closely related species: a case study of Primula L. sect. Proliferae Pax (Primulaceae) in China. Journal of Systematics and Evolution 49: 225–236.
- 97. Baker DA, Stevenson DW, Little DP (2012) DNA barcode identification of black cohosh herbal dietary supplements. Journal of AOAC International 95: 1023–1034.
- 98. Yang JB, Wang YP, Möller M, Gao LM, Wu D (2012) Applying plant DNA barcodes to identify species of Parnassia (Parnassiaceae). Molecular Ecology Resources 12: 267–275.
- 99. Little DP (2004) Documentation of hybridization between Californian cypresses: Cupressus macnabiana × sargentii. Systematic Botany 29: 825–833.