DNA Barcoding in Pencilfishes (Lebiasinidae: Nannostomus) Reveals Cryptic Diversity across the Brazilian Amazon

Nannostomus comprises 20 species. Popularly known as pencilfishes, the vast majority of these species live in the flooded forests of the Amazon basin and are popular in the ornamental trade. Among the lebiasinids, it is the only genus to have undergone more than one taxonomic revision. Even so, it still contains poorly defined species. Here, we report the results of an application of DNA barcoding to the identification of pencilfishes and highlight the deeply divergent clades within four nominal species. We surveyed the sequence variation in the mtDNA cytochrome c oxidase subunit I gene among 110 individuals representing 14 nominal species that were collected from several rivers across the Amazon basin. The mean Kimura-2-parameter distances within species and within the genus were 2% and 19.0%, respectively. The deep lineage divergences detected in N. digrammus, N. trifasciatus, N. unifasciatus and N. eques suggest the existence of hidden diversity in Nannostomus species. For N. digrammus and N. trifasciatus, in particular, the estimated divergences among some lineages were so high that they raise doubt about the conspecific status of those lineages.


Introduction
The Neotropical ichthyofauna is extremely large and diverse. Based on the latest survey of the diversity of freshwater fishes of Central and South America [1], it includes 71 families encompassing 4,475 known valid species. Moreover, the authors estimate that there are 6,000 species in the Neotropics, close to half the number of freshwater fish species worldwide.
The Amazon, home of more than two thousand freshwater fish species, is well known as a diversity hotspot. The complex evolutionary history of Amazonian organisms, including fishes, is gradually beginning to be better understood. Postulated reasons for the origin and evolution Steindachner, N. marginatus Eigenmann, and N. trifasciatus Steindachner (A. L. Netto-Ferreira, unpublished data). Many of the species have conspicuous color variations depending on their geographic location. Indeed, some of these color variants have been described as separate species in the past [39][40][41][42].
Recently, molecular analyses were conducted in two Nannostomus species (N. eques and N. unifasciatus), both of which are widely distributed throughout the Amazon basin [43][44]. Strikingly, divergence time analyses placed the split among lineages in the middle Pliocene for N. eques (around 2.9 Mya) and in the late Miocene for N. unifasciatus (around 8.4 Mya). Distinct phylogroups of both species have been observed in representatives from the Negro River Basin, showing that, because of this hidden diversity, those species should not be treated as a single stock for management purposes. Such data may be useful in regulating the exploitation of fish species by the aquarium trade [43][44].
Given the biological and economic importance of these species, and as part of a study of the molecular phylogeny of Nannostomus, we report herein an application of DNA barcoding to the identification of 12 species of pencilfishes and highlight the deeply divergent clades detected within several nominal species, including the presence of additional cryptic species.

Ethics statement
This survey was carried out in strict accordance with the recommendations by the National Council for Control of Animal Experimentation and the Federal Board of Veterinary Medicine. The protocol was approved by the Committee on the Ethics of Animal Use of the Instituto Nacional de Pesquisas da Amazônia (041/2012).
All specimens for this study were collected in accordance with Brazilian laws under a permanent scientific collection license issued in the name of Dr. Jorge I.R. Porto and approved by the Brazilian Institute of Environment and Renewable Natural Resources (IBAMA) through the System Authorization and Information on Biodiversity (SISBIO #11489-1).
Sampling sites were chosen at distant points in the Amazon basin in order to cover, at a minimum, the distribution range of the species [33] (Fig. 1). Two lebiasinids were used as outgroups (Lebiasina colombia and Copella nigrofasciata) because they represent the two subfamilies of Lebiasinidae.
All specimens were identified with the help of taxonomists and identification keys [34,37], and all procedures complied with the recommendations of local ethics committees. Voucher specimens were deposited in the collection of the Instituto Nacional de Pesquisas da Amazônia, Manaus, AM, Brazil.

Extraction, amplification, and DNA sequencing
Genomic DNA was isolated from muscle tissue based on standard techniques and protocols [45]. The partial mitochondrial COI gene (648 bp) was amplified by PCR using a combination of primers (FishF1, FishR1, FishF2, FishR2) [20] or the cocktail C_VF1LF_t1-C_VR1LR_t1 under conditions previously described [46].
Polymerase chain reactions were performed in a total volume of 15 μL containing ~10-50 ng DNA template, 1X buffer (750 mM Tris-HCl, pH 8.8, 200 mM (NH4)2SO4), 1 U Taq polymerase (Thermo Scientific, Waltham, USA), 0.2 mM dNTPs, 0.2 μM of each primer, 2 mM MgCl2, and ultrapure water. PCR cycling consisted of an initial denaturation for 2 min at 95°C, followed by 35 cycles of 30 s at 95°C, 30 s at 52°C-54°C, and 1 min at 72°C, with a final extension for 10 min at 72°C. PCR products were resolved on 1% agarose gels and purified using polyethylene glycol 8000 (USB, Cleveland, USA). Bi-directional sequencing was performed using an ABI BigDye™ Terminator v.3.1 Cycle Sequencing Ready Reaction Kit and an ABI 3130xl DNA Analyzer (Applied Biosystems, Foster City, USA).
Sequence data, collection sites, primer details and trace files were submitted to the Barcode of Life database (BOLD; http://www.boldsystems.org) under the project "Barcoding of Lebiasinids."

Data analysis
Consensus sequences for the COI gene were generated using the BioEdit program [47], and after editing the sequences, the final matrix was 574 bp.
All sequences were analyzed using MEGA 5 to check for the occurrence of deletions, insertions, and stop codons. Local alignment search tools were used to identify the sequences in GenBank and BOLD. Sequences were aligned using Clustal W [48], and the program DnaSP version 4.0 [49] was used to determine the nucleotide composition, number of polymorphic sites, and haplotype diversity. The genetic distance among and within observed clusters was calculated using the Kimura-2-parameter (K2P) model. A Bayesian phylogenetic analysis was conducted using MrBayes 3.2 [50]. For this analysis, Markov chain Monte Carlo sampling was conducted every 120,000 generations until the standard deviation of split frequencies was <0.01. A burn-in period equivalent to 25% of the total generations was necessary to recapitulate the parameter values and trees. The parameter values were evaluated based on 95% credibility levels to ensure that a sufficient number of generations had been run for the analysis. A neighbor-joining (NJ) tree of K2P distances was created with MEGA 6.0 [51] to provide a graphic representation of the relationships among specimens and clusters. Bootstrap resampling [52] with 1000 pseudo-replicates was applied to assess the support for individual nodes. The program Haploviewer [53] was used to construct a tree-based haplotype network. Independent networks were regarded as unconfirmed candidate species.
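As an illustration of the distance measure named above, the following is a minimal Python sketch of the K2P distance between two aligned sequences. It is not the MEGA implementation used in this study; the function name `k2p_distance` and the choice to skip sites with gaps or ambiguous bases are assumptions made for the example. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

```python
import math

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura-2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption of this sketch, similar to pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For two 10-bp toy sequences that differ by a single transition, P = 0.1 and Q = 0, so the distance is -0.5 ln(0.8), slightly larger than the uncorrected proportion of differing sites (0.1).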

Results
COI sequence data were obtained for 110 specimens of Nannostomus representing 14 species-level taxa. A mean of 8 individuals (range 1-17) represented each species, with only N. harrissoni represented by a single specimen.
The amplified product was approximately 650 bp, but only 573 nucleotides were considered for this analysis, of which 234 were variable and parsimony informative. These variations defined 68 haplotypes, each represented by 1-5 individuals. No haplotype was shared between different species. There were no deletions, insertions, or stop codons. As expected for fish COI, the nucleotide composition showed a CT bias (means C = 24.0%, T = 32.7%, A = 25.9%, G = 17.3%) within Nannostomus.
The results indicated that species could be discriminated by the DNA barcode approach, since the samples of each species were represented by a unique haplotype, a single tight cluster of haplotypes, or distinct clusters of haplotypes in the neighbor-joining tree (Fig. 2).
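The summary statistics reported above (nucleotide composition, haplotype counts, and the check for haplotypes shared between species) were obtained with DnaSP in this study. As a hedged illustration only, the same kind of bookkeeping can be sketched in Python; the function names, specimen identifiers, species labels, and toy sequences below are all hypothetical and are not the study data.

```python
from collections import Counter, defaultdict


def base_composition(seqs):
    """Percentage of A, C, G and T over all sequences (other symbols ignored)."""
    counts = Counter()
    for seq in seqs:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def collapse_haplotypes(seqs_by_id):
    """Group specimen ids by identical aligned sequence (one group per haplotype)."""
    haplotypes = defaultdict(list)
    for specimen_id, seq in seqs_by_id.items():
        haplotypes[seq.upper()].append(specimen_id)
    return dict(haplotypes)


def shared_haplotypes(haplotypes, species_of):
    """Return haplotypes carried by specimens of more than one species."""
    return [
        hap for hap, ids in haplotypes.items()
        if len({species_of[i] for i in ids}) > 1
    ]


# Hypothetical toy alignment: four specimens, two species labels.
toy_seqs = {"ind1": "ACGTAC", "ind2": "ACGTAC", "ind3": "ACGTTC", "ind4": "TCGTTC"}
toy_species = {"ind1": "sp_A", "ind2": "sp_A", "ind3": "sp_A", "ind4": "sp_B"}

toy_haps = collapse_haplotypes(toy_seqs)
print(len(toy_haps), "haplotypes;", "shared:", shared_haplotypes(toy_haps, toy_species))
print(base_composition(toy_seqs.values()))
```

In this toy case the four specimens collapse into three haplotypes, none of which is shared between the two species labels, which is the pattern the barcode approach relies on.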

Discussion
Molecular methodologies have developed rapidly in recent years, and DNA barcoding is currently considered the most useful tool for species identification. Indeed, a great advantage offered by DNA barcoding is the possibility of identifying cryptic species, as can be seen in several publications since its launch in 2003 [54].
According to the Fish Barcode of Life project database (www.fishbol.org), only one Nannostomus species had been previously barcoded. In our data set, 14 Nannostomus species were DNA barcoded, and they were easily identified by this approach, given that all recognized species formed monophyletic clusters (Fig. 2). However, two species (N. nitidus and N. limatus) revealed shallow interspecific sequence divergence (2.2%) when compared to other Nannostomus species (Table 1). Despite this, no evidence of sequences shared between the two species was observed, suggesting either recent speciation or the need for synonymization. In contrast, four species (N. digrammus, N. trifasciatus, N. unifasciatus, and N. eques) showed deep intraspecific sequence divergence, suggesting the existence of overlooked species within Nannostomus, although some lineages were represented by a single individual (Table 2).
The COI delineations of the T1, T2, T3 and E5 Nannostomus lineages were achieved with only one individual. We are aware that hypothetical intraspecific lineages can be hindered by inadequate sample size in large geographic areas such as the Amazon Basin. In flathead fishes, for example, the limited number of specimens in a particular lineage and the sparse geographical spread of the samples for some of the proposed lineages restricted the ability to evaluate the extent of genetic diversity across several groups [55]. Thus, we suggest future surveys of these Nannostomus lineages to confirm if they do, in fact, represent evolutionary units.
All the lineages found in each of the above-mentioned species were well supported by bootstrap values (>80) in the NJ tree, independent of the mutational steps necessary to connect the haplotype networks below the standard statistical parsimony (Figs. 2 and 3). The two lineages observed in N. digrammus (D1 and D2) diverged by 12.6%. The five lineages of N. trifasciatus had a mean divergence of 8.1%, while that of the three lineages of N. unifasciatus was 7.1%. Finally, the five lineages of N. eques diverged by 4.3% (mean). The divergence between the lineages of Nannostomus was greater than that obtained for other marine and freshwater fish species [20][56][57][58][59][60][61].
It was evident in this study that the cutoff value of 2% does not apply to Nannostomus species, given that the mean congeneric (~19%) and conspecific (~3%) distances were high. Indeed, surveys of the North American and Neotropical freshwater ichthyofauna have shown that the mean congeneric and conspecific genetic distances are usually >6.8% and <0.73%, respectively [62][63], with the exception of the Pampa Plain freshwater fishes, at the southernmost distribution range of many Neotropical species, where the mean congeneric genetic distance is 1.67% [64].
Among lebiasinids, only Nannostomus has twice been taxonomically revised, in addition to being the subject of phylogenetic hypotheses [32][33][34][36]. Morphology-based taxonomy has shown that Nannostomus cannot be considered phenotypically conservative. Apparently, there are Nannostomus species complexes that are poorly defined on morphological grounds, including N. beckfordi, N. eques, N. marginatus, and N. trifasciatus (A. L. Netto-Ferreira, unpublished data). Recent mitochondrial and nuclear DNA data have revealed hypothetical evolutionarily significant units in species of Nannostomus. For example, in N. unifasciatus, the DNA sequence data of the intron in the S7 ribosomal protein gene revealed two distinct lineages in the Rio Negro basin [43]. In N. eques, the mitochondrial DNA control region also revealed the existence of two lineages in the Rio Negro basin [44]. Together, the morphological and genetic studies indicate that species richness in the genus is probably underestimated.
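The comparison of conspecific and congeneric distances against a fixed cutoff can be sketched as follows. This is an illustrative Python example under stated assumptions, not the analysis performed in this study: the function name `summarize_distances`, the specimen identifiers, the species labels, and the pairwise distances are hypothetical, all specimens are assumed to belong to one genus, and the 2% threshold is passed as a parameter.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def summarize_distances(dist, species_of, threshold=0.02):
    """Summarize pairwise distances within and between species.

    dist maps frozenset({id1, id2}) to a pairwise distance (e.g. K2P).
    All specimens are assumed to be congeneric (an assumption of this sketch).
    Species whose maximum within-species distance exceeds `threshold`
    are flagged as candidates for hidden diversity.
    """
    within = defaultdict(list)
    between = []
    for a, b in combinations(sorted(species_of), 2):
        d = dist[frozenset((a, b))]
        if species_of[a] == species_of[b]:
            within[species_of[a]].append(d)
        else:
            between.append(d)
    all_within = [d for ds in within.values() for d in ds]
    return {
        "mean_conspecific": mean(all_within),
        "mean_congeneric": mean(between),
        "flagged": sorted(sp for sp, ds in within.items() if max(ds) > threshold),
    }


# Hypothetical toy data: two specimens per species label.
toy_species = {"a1": "sp_A", "a2": "sp_A", "b1": "sp_B", "b2": "sp_B"}
toy_dist = {
    frozenset(("a1", "a2")): 0.01,  # shallow within-species distance
    frozenset(("b1", "b2")): 0.12,  # deep within-species distance
    frozenset(("a1", "b1")): 0.20,
    frozenset(("a1", "b2")): 0.18,
    frozenset(("a2", "b1")): 0.20,
    frozenset(("a2", "b2")): 0.18,
}
print(summarize_distances(toy_dist, toy_species))
```

In the toy data, only the hypothetical `sp_B` exceeds the cutoff. A flag of this kind marks a species for closer examination and is not by itself evidence of a separate species, particularly when a lineage is represented by a single individual.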
The Amazon aquatic ecosystem is rich in diversity, and quick access to information on Amazonian biodiversity is essential. Forest degradation and fishery over-exploitation increase the risk of species extinction, and rapidly updated information regarding the fishes caught in wild fisheries (such as the Nannostomus species) is necessary to implement appropriate practices for their conservation and management and to prevent exploitation.
Pencilfishes are commercial ornamental fish and constitute a source of revenue for the riverine people of the Amazon. The discovery of cryptic species becomes very important when they are targets for commercial use. Considering the economic importance of the Nannostomus species, its DNA barcoding contributes to conservation policy in two important ways: by enhancing Amazonian biodiversity assessments to prioritize conservation areas (e.g., upper Rio Negro), and by providing information about evolutionary histories and phylogenetic diversity (e.g., unveiling hidden diversity).
DNA barcoding of ornamental marine fishes has generated data and provided confidence in species identification, opening new avenues for managing business practices [65]. Of the approximately 500 Brazilian ornamental fishes allowed to be sold in the ornamental fish trade, 7 are pencilfishes: N. beckfordi, N. digrammus, N. eques, N. espei, N. marginatus, N. trifasciatus, and N. unifasciatus. Our study contributes new Amazonian fish barcodes, providing for a more comprehensive species identification of the ornamental pencilfishes, in addition to revealing the hidden diversity in the analyzed Nannostomus species. Future species delimitation in Nannostomus should follow the spirit of integrative taxonomy and be based on congruence across analyses that utilize multiple data sources.