DNA Barcoding Reveals High Cryptic Diversity of the Freshwater Halfbeak Genus Hemirhamphodon from Sundaland

DNA barcoding of the cytochrome oxidase subunit I (COI) gene was utilized to assess the species diversity of the freshwater halfbeak genus Hemirhamphodon. A total of 201 individuals from 46 locations in Peninsular Malaysia, north Borneo (Sarawak) and Sumatra were successfully amplified for 616 base pairs of the COI gene revealing 231 variable and 213 parsimony informative sites. COI gene trees showed that most recognized species form monophyletic clades with high bootstrap support. Pairwise within species comparisons exhibited a wide range of intraspecific diversity from 0.0% to 14.8%, suggesting presence of cryptic diversity. This finding was further supported by barcode gap analysis, ABGD and the constructed COI gene trees. In particular, H. pogonognathus from Kelantan (northeast Peninsular Malaysia) diverged from the other H. pogonognathus groups with distances ranging from 7.8 to 11.8%, exceeding the nearest neighbor taxon. High intraspecific diversity was also observed in H. byssus and H. kuekanthali, but of a lower magnitude. This study also provides insights into endemism and phylogeographic structuring, and limited support for the Paleo-drainage divergence hypothesis as a driver of speciation in the genus Hemirhamphodon.


Introduction
Biological diversity is believed to be entering an era of mass extinction where it is disappearing worldwide at unprecedented rates [1]. Thus, precise taxonomic delineation of species is crucial in the context of biodiversity conservation [1]. Traditionally, species description and identification are based on morphological traits, however, the morphological approach alone has intrinsic limitations. Phenotypic plasticity and genotypic variation may mask the morphological eight species (H. byssus, H. kuekenthali, H. kapuasensis, H. tengah, H. chrysopunctatus, H. phaiosoma, H. sesamum and H. kecil) occur in Borneo, H. byssus and H. kuekenthali, which are endemic to Sarawak, are geographically separated from the other six species by mountain ranges traversing the island of Borneo. Hemirhamphodon sesamum that has been discovered in eastward flowing lowland coastal basins of South Kalimantan is most closely related to H. kuekenthali in terms of external morphology [21]. Hemirhamphodon kecil ("kecil" means small in the Malay/Indonesian language) has a smaller adult size compared to its closest congener, H. pogonognathus and is found in the lower Mahakam basin in east Kalimantan. Hemirhamphodon kapuasensis is endemic to the Kapuas river basin in Kalimantan Barat. The most morphologically distinct species is H. phaiosoma with a relatively higher number of dorsal-fin rays. However, based on reproductive behavior, H. tengah seems to be the most distinct species. As noted above, the Hemirhamphodon genus are livebearers or viviparous, but H. tengah has been observed to lay fertilized eggs [31][32][33]. This, undoubtedly, is a key factor that has generated much interest in the scientific biodiversity community.
This study focused on the preliminary assessment of the species diversity within the genus Hemirhamphodon through DNA barcoding. Based on the findings of Anderson and Collette [23] and the revision by Tan and Lim [21], we hypothesise that there is potentially high cryptic diversity in this genus due to its broad distribution and locale-specific polymorphisms.

Ethic Statement
This study was carried out in accordance with the recommendations and approval by the Universiti Sains Malaysia Animal Ethics Committee. No protected species were collected in this study. Permission was granted from the State Forestry Department for field sampling in Peninsular Malaysia and Sarawak. Samples from Sumatra were obtained from collaborators from Syiah Kuala University and Riau University in Sumatra.

Sample collection, preservation and DNA extraction
Collection of H. pogonognathus, H. kuekenthali and H. byssus species samples were conducted in Peninsular Malaysia, Sarawak and Sumatra as shown in Fig 1 and Table 1. The information on geographical locations and voucher specimens is shown in S1 Table. Specimens were euthanised with Transmore (NIKA Trading Co.), a commercial fish stabilizer commonly used in aquatic trading in Malaysia before the fin clips were excised from the pectoral fin of the specimens and stored in 95% ethanol for DNA extraction. Specimens were subsequently formalin fixed and preserved in 70% ethanol as vouchers. Voucher specimens were identified based on taxonomic keys by Anderson and Collette [23] and Tan and Lim [21]. Images of voucher specimens were captured by digital camera prior to being placed in 95% ethanol for long-term storage. Voucher specimens for each COI barcode were deposited at the Museum of Biodiversity, Universiti Sains Malaysia. DNA extractions were conducted using the high-salt DNA extraction protocol [34] and then stored at -20°C until required.

Gene amplification and sequencing
A partial segment of the COI gene of 3 to 5 individuals per location (except only 2 individuals from Koba) were PCR amplified using BIO-RAD T100 Thermal Cycler (BioRad Laboratories Inc., USA) with primers Fish F2 and Fish R2 [6]. The PCR conditions were conducted as described in Ward et al. [6]. The PCR products were electrophoresed on 1.5% agarose gels for band characterization and purified (PCR Clean-up System, Promega, Madison, WI, USA).
Sequencing was conducted by a service provider (First Base Laboratories Sdn. Bhd., Malaysia) using an ABI3730XL Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).

Data Analysis
All sequences were edited using MEGA v6.0 [35]. Multiple alignments were evaluated using MUSCLE [36] that is integrated within MEGA. Minor adjustments were then made by eye to manually remove any false homologies. The pairwise comparison matrices were constructed using the Kimura 2 Parameter (K2P) model as it is most widely used in COI barcoding studies [2,37]. To check the presence of a "barcode gap" in our dataset, the maximum intraspecific divergences against the minimum nearest-neighbour divergences was plotted.
The Automatic Barcode Gap Discovery (ABGD) species delineation tool on a web interface (http://wwwabi.snv.jussieu.fr/public/abgd/abgdweb.html) with default settings for the K2P distance matrix was employed [38] to determine the number of operational taxonomic units (OTUs) based on pairwise sequence distances between individuals within the dataset. A Neighbour-Joining (NJ) COI gene tree was constructed using K2P models with the bootstrap procedure [39] of 10000 pseudoreplicates. A Bayesian Inference (BI) COI gene tree was included to examine any likelihood of different positioning of OTUs. The BI tree was constructed using MrBayes [40] with HKY+G+I model (optimal substitution model under Bayesian Information Criterion using ModelTest in MEGA) for 1000 pseudoreplicates. The length of the MCMC chain was 5 million with a sample frequency of 500.

Results
A total of 201 individuals were successfully PCR amplified for the COI gene consisting of three (morphologically) recognized species. All sequences have been deposited in GenBank  (accession number: KM405651 -KM405787 and KX216532 -KX216595) and are available on BOLD public datasets DS-HMRD. Several additional species and two out-group sequences totaling 112 individuals from GenBank (Table 2) were also included in the analyses. Three other documented species, H. kapuasensis, H. kecil and H. sesamun were not included in the analyses as no specimens were obtained and no GenBank sequences were available. The final COI gene segment consisted of 616bp with average nucleotide composition of A = 24%, T = 35%, C = 25% and G = 16%, 231 variable sites, of which 213 were parsimony informative. No insertions, deletions or stop codons after translation were found, indicating that the amplified sequences constitute functional mitochondrial COI sequences and did not harbour any nuclear mitochondrial pseudogenes (numts).
The summary values of genetic divergence across taxonomic level based on K2P are shown in Table 3. The mean genetic divergence within species was 2.5% while the within genus  The number of OTUs generated by ABGD based on K2P varied from 1 to 74 (Fig 5). The initial partition at a prior intraspecific divergence (P) (P = 0.0077-0.0359) produced 9 OTUs, and was in concordance with the NJ and BI trees. The additional OTU identified by ABGD was H. pogonognathus from Rambat.
Pairwise comparisons (Table 5) were computed for the newly assigned groupings according to the constructed NJ COI gene tree and the ABGD results. As expected, intraspecific divergence exhibited a decreased value with a mean of 1.5% ranging from 0% to 4.3%. However, the intraspecific divergence for H. pogonognathus (southern Sumatra) remained high (4.3%). Pairwise divergence between groups for H. byssus (southern vs central) and H. kuekenthali (northern vs central) were 4.9% and 6.9% respectively. These values might be considered high but did not exceed the minimum nearest-neighbour interspecific divergence (7.9%).
On the other hand, pairwise divergence among the three H. pogonongnathus groups ranged from 7.8% to 11.8%, where divergences of the Kelantan group from other populations exceeded the minimum interspecific divergence value. A re-analysis of the barcode gap was conducted (Fig 2) for the new grouping. Again, the results revealed absence of a barcode gap in H. pogonognathus from southern Sumatra, H. kuekenthali from northern and H. byssus from central. The results also revealed that members of the Hemirhamphodon genus appear to be allopatrically distributed. To determine whether the high intraspecific divergence was influenced by Paleo-drainage systems in Sundaland as discussed in de Bruyn et al. [41], the NJ COI gene tree clusterings were mapped against the Paleo-drainages (Fig 6) as suggested by Voris [42]. The mapping results revealed that only the divergence of H. pogonognathus from southern Sumatra is consistent with the Paleo-drainage (north Sunda) hypothesis. Samples from Malacca and Siam Paleo-drainages formed a single mixed group instead of two groups. Although there is no record of a Paleo-drainage for north Borneo (Sarawak), the divergence of H. byssus and H. kuekenthali seem likely to be congruent with a presently unknown barrier.

Discussion
The DNA barcoding approach is now widely recognized as an efficient tool to facilitate rapid identification of unidentified or unknown taxa through a DNA barcode reference library and also in assessment for conservation purposes, including of cryptic and microscopic organisms, particularly those with morphologically ambiguous characters [1,4]. In this study, DNA barcode analysis was used in an attempt to assess cryptic diversity in the genus Hemirhamphodon. The analysis also permitted insights into the influence of Paleo-drainage systems of Sundaland in driving species diversity.    The high levels of intraspecific divergence in H. pogonognathus, H. byssus and H. kuekenthali suggest that this genus exhibits high cryptic diversity. Several studies have reported the same phenomenon, for instance of the fighting fish Betta [43], flathead fishes [11] and fishes from Nujiang River [44], which revealed very high species diversity in the absence of apparent morphological differences. The constructed COI gene trees were generally consistent with the current morphological delimitation of Hemirhamphodon species, although several species were only represented by a single sequence here. However, the clustering of H. pogonognathus split into three well-supported clusters with high levels of divergence, except H. pogononathus from Kelantan (bootstrap value of 59). This result further supports the existence of cryptic diversity within the H. pogonognathus group. Given its broad distribution in a biodiversity hotspot and the recent documentation of species discovery in Hemirhamphodon [21], this is therefore not surprising. This pattern could be interpreted as any of these: a recent speciation event, interspecies hybridization, or as (morphological) misidentification [45][46]. When hybridization occurs, the divergent sequence will cluster with the clade of one of the hybridizing species. Conversely, for cryptic species, a new divergent clade will be apparent that is different from that of any currently recognized species [11]. Our results clearly exhibit the split of different clusters indicating the probable presence of cryptic diversity.
Multiple lineages generated through tree construction demonstrated the potential occurrence of sources from different drainages. The mapping of Sundaland Paleo-drainage systems against the NJ COI gene tree revealed that Paleo-drainages also likely played a role in the high intraspecific divergence values recovered here. This was evident in the H. pogonognathus southern Sumatran group which follows the north Sunda Paleo-drainage, diverging from the main H. pogononathus group (combined Malacca and Siam Paleo-drainages) as shown in Fig  6. Additionally, the NJ COI gene tree also shows further splits within H. byssus and H. kuekenthali into multiple lineages. However, these bifurcations did not split out from the currently recognized species to form a new cluster as in H. pogonognathus. de Bruyn et al. [41] found little evidence for Paleo-drainage systems driving divergence in Hemirhamphodon, but strong support for this mechanism influencing diversity in the halfbeak genus Dermogenys. They postulated that life history strategies (Hemirhamphodon = forest stream specialists; Dermogenys = brackish water generalists) could have been an important determinant of ability to migrate via these vast Pleistocene paleo-river systems, before subsequent allopatric splitting as sea-levels fell and paleo-systems waned. Although the mapping result demonstrated no obvious Paleodrainage assigned for north Borneo (Sarawak), the different lineages of H. byssus and H. kuekenthali seem very likely to be geographically restricted lineages resulting from unidentified barriers. On the other hand, these divergences could also be associated with ecosystem-dependent adaptive radiation. In addition, the Rajang River seems to have acted as an effective barrier leading to endemism of H. byssus and H. kuekenthali with H. byssus only found in the south of the Rajang basin and H. kuekenthali in the north.
The tree topology showing several distinct clusters within certain species coupled with high levels of intraspecific diversity indicated probable occurrence of cryptic species [44]. High genetic divergence within nominal species can be interpreted as misidentification, or more importantly as cryptic or unrecognized speciation events [11,12,47]. There are several criteria proposed for species delineation based on the DNA barcoding approach. Hebert et al. [48] proposed the '10X rule' as an indicator of cryptic speciation. On the other hand, Ward et al. [49], who analysed barcode data from about 1000 fish species, showed that individuals were much more likely to be congeneric than conspecific at a level of 2% distance or greater. Another criterion is the use of the barcode gap, which is the distance or gap between the maximum intraspecific and minimum interspecific distances [38,48,[50][51][52].
Barcode gap analysis for our dataset for both initial groupings and new groupings (Fig 2) revealed that no barcode gap was present in H. pogonognathus, H. kuekenthali and H. byssus, which indicated the probable existence of more than one species within each of these taxa. In addition, the ABGD method generated 9 OTUs, which is nearly concordant with the COI gene trees. The H. pogonognathus complex formed three OTUs, which potentially implies detectable intraspecific diversity. Thus, referring to the results of the genetic distance, COI gene trees and barcode gap analyses, the existence of high cryptic diversity among our dataset is apparent.
The three genetic lineages across H. pogonognathus most likely represent species-level taxa, suggesting that H. pogonognathus consists of at least two distinct species. Hemirhamphodon pogonognathus was the most widely distributed species and showed high intraspecific divergence values up to 14.8% with a mean of 4.5%, separated by three geographical splits (Fig 6). Further re-grouping revealed that the H. pogonognathus complex exhibited deep divergence among the three H. pogonongnathus groups, ranging from 7.8% to 11.8% (Table 4), where divergences of the Kelantan group from the other two groups exceeded the minimum interspecific divergence (7.9%). No obvious morphological characters were found to distinguish them into different species even though colour differentiation was observed among some localities. In fact, the genetically distant Kelantan group shared the same colour pattern with the main group. In addition, the Kelantan group was separated from the main cluster to form its own clade in the COI gene trees, further supporting the probable existence of cryptic species in the H. pogonognathus group. Thus, we propose that H. pogonognathus from Kelantan has the potential of being a new species record.
The six morphological species included in this study generated 9 OTUs. This DNA barcoding study shows the high potential in cryptic species assessment particularly within 'hyperdiverse' SE Asia. The species status for H. byssus and H. kuekenthali remains unclear, and each could be considered as a species complex. Based on this study, the genus of Hemirhamphodon is proposed to consist of at least 10 species namely H. pogonognathus, H. kuekenthali, H. byssus,  H. phaiosoma, H. chrysopunctatus, H. tengah, H. kapuasensis, H. sesamun, H. kecil (the last three species not included in this analysis) and the newly proposed H. pogonognathus sp. of Kelantan. Nevertheless, further investigations with combined molecular and morphological approaches, and also population level analyses with larger sample sizes are needed to clarify Hemirhamphodon taxonomy.

Conclusions
In conclusion, the present study exhibited the power of DNA barcoding in species diversity assessment. In addition, the findings highlight the high cryptic diversity of this genus, which is in agreement with our initial hypothesis. However, a more integrated study including molecular and morphological approaches needs to be conducted to resolve the issue of the identification of paraphyletic or species complexes of several Hemirhamphodon species. Lastly, more studies of freshwater fishes with a range of distributional, life history and other datasets (genetic, phenotypic and geographical) incorporating multiple research tools need to be conducted in order to have a better understanding of the drivers and maintenance of biodiversity within the SE Asian region.
Supporting Information S1