DNA Barcode Detects High Genetic Structure within Neotropical Bird Species

Background Towards lower latitudes the number of recognized species is not only higher, but also phylogeographic subdivision within species is more pronounced. Moreover, new genetically isolated populations are often described in recent phylogenies of Neotropical birds suggesting that the number of species in the region is underestimated. Previous COI barcoding of Argentinean bird species showed more complex patterns of regional divergence in the Neotropical than in the North American avifauna. Methods and Findings Here we analyzed 1,431 samples from 561 different species to extend the Neotropical bird barcode survey to lower latitudes, and detected even higher geographic structure within species than reported previously. About 93% (520) of the species were identified correctly from their DNA barcodes. The remaining 41 species were not monophyletic in their COI sequences because they shared barcode sequences with closely related species (N = 21) or contained very divergent clusters suggestive of putative new species embedded within the gene tree (N = 20). Deep intraspecific divergences overlapping with among-species differences were detected in 48 species, often with samples from large geographic areas and several including multiple subspecies. This strong population genetic structure often coincided with breaks between different ecoregions or areas of endemism. Conclusions The taxonomic uncertainty associated with the high incidence of non-monophyletic species and discovery of putative species obscures studies of historical patterns of species diversification in the Neotropical region. We showed that COI barcodes are a valuable tool to indicate which taxa would benefit from more extensive taxonomic revisions with multilocus approaches. Moreover, our results support hypotheses that the megadiversity of birds in the region is associated with multiple geographic processes starting well before the Quaternary and extending to more recent geological periods.


Introduction
One of the striking patterns in the geographic distribution of terrestrial biodiversity is the increase in species richness towards lower latitudes in several groups of organisms, including birds. The possible causes of this pattern are among the most debated topics in ecology and evolution, and no definitive conclusion has yet been reached [1,2,3,4]. The Neotropical area alone holds a third of the recognized extant bird species (about 3,300 out of 10,000) [5], with a biodiversity hotspot in the tropical forests [6]. Moreover, recent phylogenies suggest the number of species in the area is underestimated because reproductively isolated lineages are frequently described in these studies [7,8,9,10,11]. In stark contrast to bird taxonomy in temperate zones, genetic evidence for species limits in the Neotropics is often discordant with traditional taxonomy due to the high incidence of species complexes. These complexes commonly feature gradual variation in morphological and behavioural characters, masking the occurrence of similar species that can be uncovered with genetic analyses [11,12,13,14,15].
DNA barcodes based on the 5′ portion of the cytochrome oxidase I gene (COI), linked with specimen vouchers and locality information, provide a rapid and inexpensive method to identify species and detect 'provisional new species' [16]. Pilot DNA barcode surveys in birds of North America, sister-species pairs, and birds of Korea were successful both in identifying recognized species of birds and in detecting some potential new species, except for a minor proportion of cases where species are very recently diverged or hybridize [17,18,19,20]. Critics questioned whether the success observed in North American birds could be extrapolated to the tropics [21], where species clearly exhibit a higher level of phylogeographic subdivision [22]. However, DNA barcoding has subsequently proved to be highly successful in identifying Neotropical species of birds; all 16 species (100%) of antbirds (Thamnophilidae) that were barcoded [23] and 494 of 500 (95.8%) species of birds of Argentina [24] had distinguishable COI signatures. The screening of Argentinean birds also detected 21 species with deep intraspecific structure, and revealed more complex patterns of regional divergence in the Neotropical than in the North American avifauna [24]. Even though more species will doubtlessly be shown to share barcodes when complete coverage of species and genera is available, it is clear that large-scale sequencing of COI associated with vouchered specimens and locality information is a valuable tool in understanding genetic differentiation within and among species of birds [17,21,24,25].
In this study we increased the coverage of Neotropical bird species that have been barcoded by adding 637 samples from 431 species, with higher representation in tropical forest areas of Brazil and Guyana, but also including samples from localities ranging from Mexico to Argentina and Chile. We compared these sequences with previously published sequences of congeneric species of Neotropical birds, totaling 1,431 samples from 561 different species of birds, 296 of which were represented by multiple individuals. We showed that a high success rate in species identification (93%) with DNA barcodes can be achieved in this large sample of avian biodiversity from the mega-diverse Neotropical region similar to that obtained in broad geographic surveys in the Nearctic and Palearctic regions of the world. Additionally, a higher percentage (12%) of species had multiple deep phylogeographic splits than in previous surveys, some of which are likely reproductively isolated lineages.

Species identification in Neotropical birds
About 93% of the species in our sample (520 out of 561) did not share sequences with any other species included in the analysis, and when multiple individuals were sampled (296 species), mean genetic distances among individuals were lower than to the closest species from the same genus (File S1, Table S1). Within-species Kimura 2-Parameter genetic distances (K2P) had a wide range (0 to 13.7%), with more than 75% of the observations below 1% K2P. Conversely, 10% of the pairwise comparisons were higher than 3% K2P (range 3.1-13.7%), overlapping considerably with among-species variation (Figure 1, Table S1). Pairwise comparisons among species of the same genus were distributed from 0.08 to 20.3% K2P, with most of the comparisons falling between 5 and 15% K2P (Figure 1). Extremely high genetic distances, suggestive of higher rates of evolution or ancient divergences, were observed among species within Trogon (Trogoniformes) and Crypturellus (Tinamiformes), with maximum distances of 19% and 20.3%, respectively. One specimen identified as Nothoprocta ornata differed from other species in the genus by 24.7%, but was only 0.34% divergent from specimens of Tinamotis pentlandii. Hence it was either incorrectly identified in the field or is possibly a hybrid between the two genera, as both species occur near the collecting locality in Chile.
A total of 41 species did not have unique barcodes, of which 21 shared sequences with other species (Table 1). Eight of those species, despite being reciprocally monophyletic or represented by one sample, differed from their sister species by only 1 to 6 diagnostic characters, or 0.14 to 0.86% K2P distance. Aggregation of closely related haplotypes in phylogenetic trees can represent either distinct taxonomic units or random branches of lineages within the same taxonomic group [26]. To distinguish between the two scenarios, we applied a statistical test of taxonomic distinctiveness proposed by Rosenberg [26] for sister species differing by less than 1% K2P. With the current limited sampling of individuals, chance occurrence of reciprocal monophyly between species could not be rejected (p > 0.05), so they were considered not distinguishable by COI barcodes (Table 1). Other species pairs differed by few nucleotide substitutions, with marginal values for the test of chance occurrence of reciprocal monophyly (0.01 < p < 0.05). Thus the following species groups were considered to be distinguishable by COI barcodes: the ducks Anas puna/versicolor (p = 0.01) [24], the greenfinches Carduelis atrata/barbata/versicolor (p = 0.01) [24] and the orioles Icterus cayanensis/chrysocephalus (p = 0.03).
Sixteen species had multiple divergent clusters (K2P genetic distances from 1.54 to 13.7%), not recovered as monophyletic with COI, that often corresponded to samples from different areas of endemism or ecoregions (Table 1-cat. IV, Table 2, Figure 2). A few exceptions were observed, where paraphyletic divergent specimens were found in the same geographic locality. For instance, specimens of the long-tailed hermit (Phaethornis superciliosus) from Aripuanã and Juruena, both within the Rondonian area of endemism, were 8% divergent, and the specimen from Juruena differed from a scale-throated hermit (P. eurynome) from Southern Atlantic forest by 7.4%. Even more strikingly, two samples of the yellow-margined flycatcher (Tolmomyias assimilis) from Napo were 8.3% divergent. The species pair of thrushes Turdus albicollis/leucomelas were paraphyletic in their COI sequences, sharing barcodes in their Amazonian distribution (Figure 3).

Table 1 (residual final row and footnotes). Vireo olivaceus 10 IV -. a I) share barcodes with sympatric species; II) share barcodes with allopatric species; III) monophyletic but very closely related to sister species; IV) paraphyletic species with lineages more than 1.5% divergent (see Table 2). b Previously reported by Kerr et al. [24] and/or Campagna et al. [62]. c Only performed for reciprocally monophyletic species pairs. doi:10.1371/journal.pone.0028543.t001

Deep genetic structure within Neotropical bird species
Deep intraspecific divergences in 48 species overlapped widely with among-species distances (K2P 1.6 to 7.8%, Table 3). These genetically structured species belong to 21 bird families from nine different bird orders, most frequently represented by antwrens (Thamnophilidae, Passeriformes). Most of the species with deep genetic structure were broadly distributed in the Neotropics, and several are subdivided into multiple subspecies [27]. Often samples from different areas of endemism or different ecoregions were the most divergent within species (Table 3).
Some species showed genetic discontinuities in some pairs of geographic areas, but not in others, such as the ochre-bellied flycatcher (Mionectes oleagineus). Samples from the Napo, Imeri and Guyanian areas of endemism were not very distinct genetically, but specimens from Belém were 2.76% divergent from the others (Figure 4). All samples of the white-shouldered antshrike (Thamnophilus aethiops) from different areas of endemism (Belém, Rondonian, Imeri, and Napo) had deep intraspecific genetic variation. The deepest split was between Belém and the other areas, and then the next split was between Rondonian, Imeri, and Napo (Figure 5).

Identification of Neotropical species with DNA barcodes
Despite the high success we obtained in Neotropical bird species identification with DNA barcodes (93%), comparable to previous barcode surveys in birds [23,24], most of the genera and species were not sampled across their entire distribution, so this figure likely overestimates the potential of barcodes to differentiate species. This was observed for at least two species previously found to be distinct with DNA barcodes, Anas sibilatrix and Celeus lugubris [24], which were shown to share sequences with Anas flavirostris and Celeus elegans, respectively, when samples from other areas of their geographic range were included in this study. When comprehensive genus and species coverage becomes available in Neotropical birds, more species are likely to lack unique DNA barcodes [21]. Nonetheless, more certainty will be achieved overall in the identification of species with COI barcodes because we will be able to better address monophyly of lineages and to verify the frequency with which individuals from different populations within species complexes are exchanging genes [16]. In most of the genera for which we had better species coverage for COI, such as Paroaria, Coryphospingus, Hemithraupis, Cyanerpes, Cyanocompsa, Mimus, Phacellodomus, and Dendrocincla, species did not share barcodes. Even though we obtained only single sequences for many species, they will contribute to future systematic efforts as part of the public standardized DNA barcode library [28]. They will also aid in faster identification of specimens that are difficult to identify morphologically, such as embryos and eggs, which will positively impact the conservation of avian wildlife in the Neotropical region.

Species not identified by DNA barcodes
Among the species we considered not identifiable with COI barcodes, some were very closely related with very similar barcode sequences (category III, Table 1). Our sampling was not comprehensive enough to reject their monophyly by chance [19,25,26], but once more individuals from different areas of their range are included, stronger support might be adduced for their reciprocal monophyly [19,24,25]. On the other hand, some species might be recovered as not monophyletic with increased sample sizes, due to unsorted ancestral polymorphism or hybridization [25]. In that case they would not be identified by DNA barcodes at the species level, suggesting that future studies should employ multilocus phylogenetic inference with faster evolving nuclear sequences in a coalescent framework to try to resolve species lineages [29]. Once larger sample sizes are available for closely related species, character-based approaches implemented automatically, such as in CAOS [30], are preferable to genetic distance levels to determine their distinctiveness, as distance levels within and among species can overlap considerably even when substitutions among species are fully sorted. The species recovered as non-monophyletic (category IV, Table 1) are strong candidates for taxonomic revision, and some of their divergent lineages might correspond to different species. For instance, the divergent lineages within the bearded flycatcher (Myiobius barbatus) belong to different recognized subspecies: amazonicus, insignis, and mastacalis [31,32]. They are currently allopatric, have morphological differences and differ in their K2P genetic distances by 12.6-13.7%. The three subspecies clades were not recovered as monophyletic with COI barcodes because the ruddy-tailed flycatcher (Terenotriccus erythrurus) and the black-tailed myiobius (Myiobius atricaudus) were included in the species clade.
Similarly, specimens from North and South Atlantic forest of the rufous gnateater (Conopophaga lineata) differ by 9.6%. However, the lineages from the two localities are not monophyletic because the chestnut-belted gnateater (Conopophaga aurita) and the hooded gnateater (Conopophaga roberti) are embedded in this group (File S1), as shown previously with more comprehensive sample sizes and mitochondrial markers [33]. The morphological characters used to define these lineages as members of a single species could be under strong stabilizing selection, and thus not mirroring the accumulation of mutations through time in neutral genes like COI. Most cases of paraphyly in birds are caused by incorrect taxonomy [34]. Alternatively, paraphyletic species can arise when geographically isolated lineages merge in part of their distribution before complete reproductive isolation has evolved [35]. Phylogeographic studies including samples from their entire geographic range and from the closely related species are needed to properly understand their diversification patterns, and establish their taxonomic status.
The 17 species that shared barcodes with closely related species in sympatry likely experienced hybridization, or recent speciation and incomplete lineage sorting, or could simply be examples of incorrect taxonomy or sample misidentification. For instance, the flightless steamer duck (Tachyeres pteneres) shares barcodes with the flying steamer duck (Tachyeres patachonicus) in Argentina, even though these species are very distinct morphologically. In this example, misidentification of the sample is less likely. A multigene phylogeny of four duck genera also reported difficulty in resolving the relationships among species of Tachyeres, and attributed this to a rapid diversification of the group, with possible incomplete lineage sorting, founder effects, and introgression [36]. The tawny-crowned greenlet (Hylophilus ochraceiceps) had intraspecific clusters differing by almost 7% sequence divergence between Napo/Imerí and Rondonian endemic areas, and shared barcodes with the grey-chested greenlet (H. semecinereus) in their Rondonian distribution. Both species comprise multiple subspecies, and some of their variants are morphologically alike. The current taxonomy of the genus might not be an accurate reflection of lineage relationships, but misidentification of samples cannot be ruled out.
Two species pairs occurring in allopatry were not reciprocally monophyletic: the bicolored (Gymnopithys leucaspis) and rufous-throated (Gymnopithys rufigula) antbirds, and the ochre-collared piculet (Picumnus temminckii) and spotted piculet (Picumnus pygmaeus). In both cases they are morphologically distinct and do not share identical barcodes with the other species; genetic distances among samples were around 0.5% and 1.0%, respectively. In these examples the lack of reciprocal monophyly could be the result of recent speciation and shared ancestral polymorphism, or of hybridization. A faster evolving marker such as the control region or larger mitochondrial sequences might recover their reciprocal monophyly [22,25].

Complex patterns of population structure detected with DNA barcodes
Our results agree with previous hypotheses that complex patterns of speciation were responsible for the high diversity in Neotropical bird species [37], and strongly support the view that most avian species in the region are narrowly endemic rather than widely distributed [9,38]. Several hypotheses have been proposed to explain the patterns of taxon distribution in the Amazonian lowland region. The forest refugia hypothesis [5,39,40] suggested that cycles of expansion and retraction of dry patches within forest areas were associated with interglacial and glacial periods, and that this could create multiple events of isolation among widely distributed groups, promoting speciation. The riverine hypothesis suggested that the formation of the rivers in the Amazon region could have created important geographic barriers that promoted speciation, as they delimit most areas of endemism [41,42,43]. This would have started at least by the end of the Miocene with the uplift of the Northern portion of the Andes [44,45]. Another proposal is the marine incursions hypothesis, in which sea-level rises of about 100 m in the Quaternary and late Tertiary are suggested to have fragmented the Amazonian lowland into a large number of true islands and archipelagos, favoring active allopatric speciation [46,47]. The wide range of divergence levels we observed within the 61 non-monophyletic and monophyletic species with deep intraspecific variation (1-13% K2P distances), together with the high incidence of recently evolved species, is consistent with speciation events starting well before the Pliocene and Pleistocene, and extending to more recent geologic periods [38]. Although several groups of species have similar patterns of genetic and geographic breaks among the same areas of endemism, different levels of genetic distances between the same areas were also recovered in other species.
The wide range of intraspecific genetic distances observed between a pair of geographical localities might reflect multiple vicariant events that occurred at different geological times [15,48], multiple dispersal events that followed a major isolation process [49], or variation in rates of evolution across different species [50,51] whose populations were isolated by a single vicariant event. Additionally, a significant relationship was observed in previous studies [52] between interspecific levels of cross-barrier genetic differentiation and the forest stratum at which a species forages in Neotropical rain forest. More comprehensive taxon sampling and estimates of times of diversification that take into account variation in rates of evolution across lineages [50] are needed to properly associate the diversification of a particular taxon with geographical events.
We have chosen not to flag divergent lineages as provisional new species, because our sampling was not comprehensive enough to properly quantify genetic variation in each locality in different species, such as the red-eyed vireo (Vireo olivaceus) and the ultramarine grosbeak (Cyanocompsa brissonii). Specimens of red-eyed vireo from Puna+Napo and Atlantic Forest were genetically divergent (2-3%), but haplotypes from the Atlantic Forest and Puna were observed in the Chaco. Similarly, specimens of ultramarine grosbeak from Caatinga and Puna were also divergent (2.7%), and both haplotypes are also found in Chaco. Both species may have reinvaded the Chaco after being isolated on the borders of this area. To check whether these lineages deserve species recognition, it is important to investigate whether the highly divergent specimens in sympatric zones are reproductively isolated. Some of the deep intraspecific lineages we described in this study were reported previously, such as the difference among thrush-like Schiffornis (Schiffornis turdina) from Rondonian and Napo areas of endemism [11]. Others, such as the whiskered myiobius (Myiobius barbatus) from Belém, Pará and Atlantic forest, will likely prove to be different species.
DNA barcodes of several new species of Neotropical birds will contribute to a deeper understanding of the systematics and diversification of these taxa in the area. Assuming the current species taxonomy, studies of historical patterns of diversification of species in the area can be obscured since many species were revealed not to be monophyletic. Moreover, a high number of species in the Neotropical realm comprise multiple divergent lineages, so the sample sizes of barcoded individuals and other markers within and among species in the area need to be higher than in other biogeographic areas that are not as taxon-diverse. This can be achieved by complementary efforts of independent research groups. Common and divergent patterns of genetic distances observed within and among closely related species suggest that multiple geographic processes have shaped the distribution of avian taxa in the Neotropics, and DNA barcode surveys will continue to reveal many more interesting geographic patterns in the region.

Taxon sampling
We analyzed 637 individuals from 431 Neotropical bird species from two tissue collections: Laboratório de Genética e Evolução Molecular de Aves (LGEMA) in the Universidade de São Paulo, São Paulo, and The Royal Ontario Museum in Toronto (ROM), with high representation in the Amazon lowlands and Atlantic Forest (Table S2, File S2). Whenever available, individuals from different localities of their distribution range were sampled (Table S2, GenBank numbers JN801479-JN802115, project "Neotropical-BRAS" in the completed projects section of the Barcode of Life Data System-BOLD [53]). To increase intraspecific sampling and to compare more closely related congeners, we added sequences of individuals from the same species and same genera of Neotropical birds from the study of birds from Argentina [24] (project "Birds of Argentina-Phase I-BARG" in the completed projects section of BOLD [53]), thus extending our survey to 1,431 samples from 561 different species (Table S2, File S2).

DNA extraction and amplification
DNA was extracted by a membrane purification procedure in glass fiber-filtration plates (Acroprep 96 Filter Plate-1.0 µm Glass, PALL Corporation) [54], and collected in PCR plates. Sequences of about 700 base pairs (bp) were obtained from the 5′ end of the mitochondrial gene cytochrome oxidase I (COI). Polymerase Chain Reaction (PCR) amplifications were performed in 12.5 µL reactions in a buffer solution containing 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2.5 mM MgCl2, 0.01% gelatin, 0.4 mM dNTPs, 0.2 mM of each primer, 1 U Taq Polymerase (Invitrogen) and 20-25 ng of DNA. Cycle conditions were: an initial denaturation at 94°C for 5 min, 36 cycles of 94°C for 40 sec, 50°C for 40 sec and 72°C for 1 min, and a final extension at 72°C for 7 min. Bird universal primers used in COI amplifications were LTyr (forward-TGTAAAAAGGWCTACAGCCTAACGC [19]) and COI907aH2 (GTRGCNGAYGTRAARTATGCTCG [19]), resulting in a long but very stable amplified product of about 910 bp. This primer set successfully amplified the 5′ end of COI across a wide range of bird species. The amplified segments were purified by excising bands from agarose gels and centrifuging each through a filter tip [55]. Sequences were obtained on an ABI3730 (Applied Biosystems) according to the manufacturer's suggested protocols using the same primer LTyr to sequence the 5′ end, and the internal primer COI748Ht (reverse-TGGGARATAATTCCRAAGCCTGG [19]) to sequence the reverse 3′ end, resulting in a sequenced product of about 750 bp. Sequences were checked for ambiguities in CodonCode Aligner (CodonCode Corporation), and Geneious 5.3 [56].
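The three primers listed above contain IUPAC ambiguity codes (W, R, Y, N), so each one stands for a small pool of concrete oligonucleotides. As a rough illustration only, the Python sketch below counts how many distinct sequences each degenerate primer encodes. The function name `degeneracy` and the `PRIMERS` dictionary are hypothetical helpers introduced here, not part of the study's workflow, and the COI748Ht sequence is written without the line-break hyphen on the assumption that the hyphen is a typesetting artifact.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (hypothetical dictionary name).
PRIMERS = {
    "LTyr": "TGTAAAAAGGWCTACAGCCTAACGC",
    "COI907aH2": "GTRGCNGAYGTRAARTATGCTCG",
    "COI748Ht": "TGGGARATAATTCCRAAGCCTGG",  # assumes the hyphen was an artifact
}


def degeneracy(primer: str) -> int:
    """Return the number of distinct sequences a degenerate primer encodes."""
    count = 1
    for base in primer.upper():
        # Each position multiplies the pool size by its number of options.
        count *= len(IUPAC[base])
    return count


if __name__ == "__main__":
    for name, sequence in PRIMERS.items():
        print(name, len(sequence), degeneracy(sequence))
```

A higher count means a more heterogeneous primer pool, which is one common way to make a primer set work across a taxonomically broad sample.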

Data analyses
Sequences were aligned in Geneious 5.3 using the Geneious alignment algorithm, with gap penalty set as 12.8, and gap extension penalty set as 3. Species and genera counts were performed in the software environment R 2.12 [57]. Genetic distances were calculated under the Kimura 2-Parameter model (K2P) for all pairwise comparisons in the matrix using PAUP4b10 [58]. Two datasets of genetic distances were built in R: the first, including all within-species comparisons; and the second, including among-congener comparisons (excluding within-species ones). We wrote R scripts to summarize the mean, variance, maximum, and minimum genetic distances per species and among congeners, respectively, using these two datasets. Frequency plots of pairwise genetic distances for congeners of different species, and for within-species comparisons only, were built in R. The maximum likelihood tree topology for the complete dataset was calculated in Geneious 5.3 [56] using PHYML [59]. The best-fit model (General Time Reversible with proportion of invariable sites and gamma, GTR+I+Γ, I = 0.5, Γ = 0.42) was selected with jModelTest [60] with a sample of the original dataset including one or two representative samples of each bird family. Species were considered not distinguishable by DNA barcode if: a) they were not monophyletic; b) they shared barcodes with other species; or c) their intraspecific variation overlapped with the lowest 5% of among-species variation, and reciprocal monophyly of sampled individuals could not be distinguished from random branching at p = 0.05 with the test for chance occurrence of reciprocal monophyly [25,26].
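For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. The Python sketch below is a minimal illustration of that formula and is not the PAUP or R code used in the study. The function name `k2p_distance` is hypothetical, and skipping sites with gaps or ambiguous bases (pairwise deletion) is an assumption made here for simplicity.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: ignore any site where either base is a gap or ambiguous.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A <-> G and C <-> T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy aligned fragments, not real COI barcodes.
    print(k2p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

Multiplying the returned value by 100 gives the percentage scale used throughout the text (for example, 0.015 corresponds to 1.5% K2P).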
Within-species clusters with minimum pairwise distances higher than 1.5% K2P were considered for analyses, because this level of genetic distances overlapped with more than 5% of among congeners comparisons (Figure 1), but information on clades differing by less than 1.5% K2P distance is also available (File S1). Species without unique barcodes were sorted into the following non-exclusive categories: I) they share barcodes with species occurring in sympatry or II) they share barcodes with species occurring in allopatry, or III) were monophyletic differing from their sister species by few mutations, or IV) paraphyletic species with lineages more than 1.5% divergent.
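The 1.5% criterion can be illustrated with a simple distance-based grouping. The Python sketch below is a stand-in technique, not the procedure used in the study, where clusters were read from the maximum likelihood tree. It applies single-linkage grouping, so that any two samples placed in different clusters are more than the threshold apart. The function name `split_into_clusters`, the sample labels, and the input format (a dictionary keyed by unordered sample pairs) are all hypothetical.

```python
from itertools import combinations


def split_into_clusters(samples, distances, threshold=0.015):
    """Group samples so that samples in different clusters differ by more than `threshold`.

    `distances` maps frozenset({sample_a, sample_b}) to a K2P distance
    expressed as a proportion (0.015 = 1.5%).
    """
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        # Join two samples whenever their distance does not exceed the threshold.
        if distances[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in samples:
        clusters.setdefault(find(s), []).append(s)
    return list(clusters.values())


if __name__ == "__main__":
    # Invented example values for three samples of one species.
    samples = ["Belem_1", "Napo_1", "Napo_2"]
    distances = {
        frozenset(("Belem_1", "Napo_1")): 0.028,
        frozenset(("Belem_1", "Napo_2")): 0.027,
        frozenset(("Napo_1", "Napo_2")): 0.002,
    }
    print(split_into_clusters(samples, distances))
```

A species returning more than one cluster would be a candidate for the deep intraspecific divergence category, subject to checking the tree topology and sampling localities.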
For all the paraphyletic and monophyletic species with deep intraspecific divergences, we compared the genetic discontinuities with the geographic locality of the samples. Because areas of endemism are known to harbor unique biota, and many subspecies of birds are also delimited by these zones [38,39,40], we classified the sample localities of individuals according to the areas of endemism in the Amazon and in the Atlantic forest where they occur (Figure 1). We adopted the revised areas of endemism in Amazon and Atlantic forest from Bates et al. [37] and Borges [61]. Samples collected in other localities were classified by ecoregion, following the simplified map from Haffer [5] (Figure 2).

Supporting Information
File S1 Maximum likelihood tree of 1,431 COI barcodes from the 561 Neotropical bird species surveyed. Zip file including the tree topology in pdf format. Codes after species names correspond to their Process ID in BOLD (Table S2)