Evidence of cryptic diversity in freshwater Macrobrachium prawns from Indochinese riverine systems revealed by DNA barcode, species delimitation and phylogenetic approaches

The diversity of Indochinese prawns in genus Macrobrachium is enormous due to the habitat diversification and broad tributary networks of two river basins: the Chao Phraya and the Mekong. Despite long-standing interest in SE-Asian decapod diversity, the subregional Macrobrachium fauna is still not yet comprehensively clarified in terms of taxonomic identification or genetic diversification. In this study, integrative taxonomic approaches including morphological examination, DNA barcoding, and molecular species delimitation were used to emphasize the broad scale systematics of Macrobrachium prawns in Indochina. Twenty-seven nominal species were successfully re-verified by traditional and molecular taxonomy. Barcode gap analysis supported broad overlapping of species boundaries. Taxonomic ambiguity of several deposited samples in the public database is related to inter- and intraspecific genetic divergence as indicated by BOLD discordance. Diagnostic nucleotide positions were found in six Macrobrachium species. Eighteen additional putative lineages are herein assigned using the consensus of species delimitation methods. Genetic divergence indicates the possible existence of cryptic species in four morphologically complex and wide-ranging species: M. lanchesteri, M. niphanae, M. sintangense, and some members of the M. pilimanus group. The geographical distribution of some species supports the connections and barriers attributed to paleo-historical events of SE-Asian rivers and land masses. Results of this study show explicitly the importance of freshwater ecosystems in Indochinese subregions, especially for the Mekong River Basin due to its high genetic diversity and species composition found throughout its tributaries.

Introduction DNA barcoding has been promoted as an effective molecular tool for rapid surveys of microbial [1,2], floral [3], and faunal diversity [4]. Recently, DNA barcode data from single and multiple molecular loci have been integrated with various datasets and applied at broad and specific scales for uses such as food authentication and traceability [5,6], wildlife forensics [7,8], and ecological community [9] and systematic studies [10][11][12]. The selection of gene fragments from different sources of genomic DNA has been proposed to find a standard region for species identification within larger groups of organisms, such as chloroplast genes for plants [13], and mitochondrial genes for animals [14]. In the animal kingdom, the cytochrome C oxidase subunit I (COI) is widely used for species screening [14]. This mitochondrial gene contains a rich informative site, which is desirable for molecular systematic study. Moreover, additional markers for DNA barcode studies such as mitochondrial and nuclear ribosomal genes have been integrated with COI, expanding the power of species delimitation and improving sequence clustering results [15,16].
The barcoding region has also shown significant utility in biodiversity studies that aim to rapidly identify unknown specimens collected from various habitat and sample types such as soil [17,18], water [19], feces [20,21], and dietary products [22,23]. Moreover, it has been used to construct regional and local DNA barcode libraries for further biodiversity assessment [24,25]. The use of integrative methods in systematics has shown remarkable ability to resolve species identification problems in organismal groups [26,27]. Because they are cost-effective tools for species validation, classification, and phylogeography pattern testing, DNA taxonomy based on COI barcode, single-multi locus phylogeny, and species delimitation approaches have become popular in modern systematic research to clarify taxonomic problems, especially in morphologically complex groups [28][29][30][31][32][33][34]. As a consequence of applying integrative methods, cryptic diversity and phylogeographical patterns of organisms have been revised, especially in crustacean taxa [34][35][36][37][38]. Taxonomy of several freshwater and marine crustacean species has been refined by using the integration of morphological identification and molecular delimitations [39,40].
The palaemonid prawn genus Macrobrachium Spence Bate, 1868 has economic importance worldwide. Recently, forty-nine Macrobrachium species have been promoted for fisheries industries [41]. These include widespread species such as M. rosenbergii, M. lanchesteri, and M. sintangense, which are consumed daily by local people as a protein resource. However, the high genetic diversity, evolutionary relationships and related biological adaptations of indigenous Macrobrachium species have only minimally been investigated in Asia; most research has focused on Neotropical American and Australian fauna [42][43][44]. In SE-Asia, the giant river prawn, M. rosenbergii, is the most economically important species [41], with the most advanced breeding program in the aquaculture industry [45]. Another study showed high genetic diversity of this species across drainages in SE-Asia [37]. Some life history traits of other Asian Macrobrachium species also have been tested using the integration of molecular, morphological, and ecological data [42,46,47].
The distribution of Macrobrachium prawns has been associated with river tributary networks across mainland SE-Asia including the Indochina subregion [48][49][50]. In addition, Macrobrachium prawns have been suggested for use in biological and ecological monitoring of Asian river ecosystems [51]. Some indigenous and endemic species are associated with specific habitats in this area such as M. sirindhorn and M. spelaeus from mountainous and limestone karst territory in Thailand; M. lanatum from southern Myanmar; and M. dalatense, M. hungi, and M. saigonense, which are endemic to the Mekong River basin in Cambodia and Vietnam [52][53][54].
The morphological and physiological changes during the developmental period of aquatic animals is assumed to be related to ecological factors such as current, water level and variation of habitat type [42, [55][56][57][58]. Habitat gradient, one of the selective mechanisms in aquatic animals, is thought to have had an impact on species diversification in several phylogenetic lineages of Macrobrachium that originated from the paleo ocean and further invaded freshwater habitats since the Mesozoic [59]. The adaptation to live as diadromous species offered high dispersal ability in Macrobrachium species (e.g., M. nipponense; see Chen, Shih [60]). The presence of environmental gradients in aquatic habitats plays a crucial role and may be a key for speciation in marine and freshwater animals [61]. The effect of altitude gradient on habitat diversification has been suggested in Macrobrachium diversity [62]. In Asia, some Macrobrachium species live in extreme environments, such as those which occupy subterranean water systems and exhibit troglobitic morphological features and life history adaptations [63][64][65][66].
Because of the habitat diversity in Indochina and the insufficient data of taxonomic validity and diversity records, broad scale sampling of freshwater Macrobrachium prawns is crucially needed to fill gaps in the existing taxonomic data [48,50]. Moreover, the decline of several prawn species and populations in their natural habitats is of critical concern because of habitat destruction [79,80]. This study focuses on the improvement of regional data through integrative taxonomy, emphasizing species identification, species delineation, phylogenetic relationships and geographical distributions. In this study, newly determined COI sequences of twenty-three nominal species of Indochinese prawns are provided. The re-collection of topotypes of some described taxa is used to reconfirm species identity and position within the phylogenetic tree. The representative taxa used in this study cover 53% of all nominal taxa recorded in the Indochina subregion ( Table 1). The results of this study can benefit the conservation management of Indochinese freshwater prawns.

Specimen collection, preparation and identification
Prawns were collected from natural habitats during 2017-2019 in various river systems of Indochinese countries, including Thailand, Laos, Cambodia, and Vietnam (Fig 1). Collecting permission in remote areas of Thailand was granted by the Department of National Parks, Wildlife and Plant Conservation, Thailand (DNP 0907.4/14262). Permission in Cambodia was granted by the Fisheries Administration Ministry of Agriculture, Forestry and Fisheries of Cambodia. Animal use in this project strictly followed the recommended protocols approved by Chulalongkorn University (Protocol Review No. 1723018) and Mahidol University's Institute Animal Care and Use Committee (MU-IACUC) under approval number MU-IACUC 2018/004. Prawns were trapped by using artificial baited net traps and by net sieving. GPS coordinates were recorded in each collecting locality. The collecting localities were illustrated in

DNA extraction, amplification, and sequencing
Prawn specimens were dissected to obtain somatic tissue. The Nucleospin TM DNA extraction kit (MACHEREY-NAGEL, Germany) was used for genomic DNA extraction. Genomic DNA yield was quantified by Nanodrop spectrophotometer (Thermo Scientific, USA). Cytochrome C oxidase subunit I (COI) gene was amplified using universal LCO1490 forward primer [14,89] and a newly designed reverse primer specific for Macrobrachium, MacroR (5'-GCGGGTA-GRATTAARATRTATACTTC-3'). The standard PCR mixture contained 1 μl of genomic DNA, 2.5 μl of forward and reverse primers, 25 μl of EmeraldAmp PCR Maser Mix (TAKARA BIO, Japan) and 18 μl of ddH 2 O. PCR was carried out by an Eppendorf Master Cycler Pro S (Eppendorf, Germany) with gradient temperature function. The PCR conditions were set as follows: 94˚C for 5 min as an initial step followed by 36 cycles of 94˚C for 30 s for denaturation, 41-45˚C for 40 s, 72˚C for 15 s for extension, and then final extension at 72˚C for 10 min. The amplicon products were run by 1% agarose gel electrophoresis stained with SYBR Safe illuminant (Invitrogen, USA). Observation was made under a UV gel documentation machine. Later, the target products were purified using a QIAquick purification kit (QIAGEN, Germany). The purified products were sequenced by commercial sequencing company (Macrogen and Bioneer, Korea) using Applied Biosystems automatic sequencer.

Sequence editing, alignment, and phylogenetic reconstruction
Prawn COI gene sequences were aligned with deposited sequences in the GenBank library using the BLASTn algorithm to verify the correct group of organisms from obtained sequences. The verified sequences were assembled, edited, and aligned using MUSCLE algorithm [90] in MEGA 7 [91]. Input files for each phylogenetic method were configured in MEGA 7 and Mesquite v3.61 [92]. The best fit nucleotide substitution model was sampled by using JModelTest v2.1.10 [93]. In this study, maximum likelihood (ML) and Bayesian inference (BI) methods were applied to reconstruct phylogenetic trees from the COI dataset. For ML analysis, the dataset was analyzed in RAxML 8.0.0v [94]

DNA barcode analysis
Sequence datasets were registered and deposited in BOLD system [99]. The taxonomic account, voucher specimen ID, collecting locality, and voucher depositor were incorporated into the system for further analysis. Available barcoding sequences from previous literature such as Hanamura, Imai [50], Saengphan, Panijpan [69], Saengphan, Panijpan [70], Siriwut, Jeratthitikul [88], Wowor, Muthu [100] were included in the dataset. A list of sequences used in barcode analysis is presented in Table 2. The analysis tools in BOLD were used to calculate nucleotide diversity, barcoding gap, diagnosis nucleotide, and sequence cluster. All sequences in the dataset were initially registered to obtain Process ID and to check for possible pseudogenes or nuclear copies of mitochondrial DNA (NUMTs) based on the occurrence of stop codons or frameshift mutations in each submitted sequence. The genetic distances among sequences was analyzed using the MUSCLE option in the distance summary tool of the BOLD workbench. The barcode gap analysis was performed to indicate the genetic distance distribution between the operational targeted species and the nearest neighbor species. Two distance models were used: pairwise distance and K2-P distance. The alignment of sequence datasets was carried out under the MUSCLE option. The diagnostic nucleotide of non-singleton species was predicted under K2-P distance model and MUSCLE alignment algorithm. All parameters were set as default. According to a barcode index number (BIN) automatically assigned [101] in the BOLD system, the sequence dataset was checked with BIN records. The clustering result using the Refined Single Linkage algorithm (RESL) initially validated the putative OTUs in the sequence dataset. Alternatively, the DNA barcode gap plot was constructed via MEGA under the distance calculation panel; the dataset obtained from pairwise comparison, intraspecific and interspecific distances of defined sequences was comparatively implemented to construct a barcode gap diagram.

Species delimitation
Species delimitation was performed using four standardized methods: automated barcode gap (ABGD; Puillandre, Lambert [102]), the multi-rate Poisson Tree Processes (mPTP, Kapli, Lutteropp [103]), Bayesian implementation of Poisson Tree Processes model (bPTP, Zhang, Kapli         [104]), and the Generalized Mixed Yule Coalescent model (GMYC, Fujisawa and Barraclough [105]). For the ABGD method, the initial distance of intra-and interspecific variations obtained from each sequence was calculated in MEGA 7 and the optimized barcode relative gap analysis was implemented using the ABGD online sever (http://wwwabi.snv.jussieu.fr/ public/abgd/abgdweb.html). All parameters were configured as default settings except X (relative gap width), which was set as 1. The operating distance used in ABGD was calculated with Kimura (K80) TS/TV method. The graphical ratio between the number of OTUs and prior interspecific divergence was established to obtain the optimized threshold for barcode gap designation. The equivalent phase of recursive and initial partitions was selected to depict the number of putative OTUs found in the delimitation result. The mPTP analysis was conducted using the online server (http://mptp.h-its.org). The ML tree generated from RAxML in Nerwick format was uploaded to the web server. The outgroup selection was initially designated for the purpose of tree rooting. The visualization of the final delimitation tree was set as default. The bPTP species delimitation was carried out by online sever (https://species.h-its.org/). The Bayesian tree was initially calculated in MrBayes under 10 million sampling generations. The tree result was then transformed to either NEXUS or Nerwick format and submitted to the web server. All parameters such as number of MCMC generation and burn-in were set as default. The OTU clustering was estimated under 95% confidence of statistical probability. In the GMYC method, the initial Bayesian tree was calculated in BEAST v1.10.4 package [106,107] using a Mixed-Yule coalescent model tree prior. The sequence dataset was transformed into NEXUS format. All parameter settings were configured in BEAUTi v1.8.4. Tracer v1.6 was used to check ESS values and run the trace file. The construction of an ultra-metric tree was done in BEAST v1.10.4 via the CIPRES sever [95]. The maximum clade supported tree result was summarized in TreeAnnotator v1.10.4. The GMYC species delimitation was performed using R program with "splits" package [108].

Integrative taxonomic decision scheme
The results of delimitation methods can be varied due to the criteria of justification: ABGD requires the threshold value from genetic distance, GMYC is processed under ultramatric estimation of gene tress, PTP uses the same fundamental estimation as GMYC does but the effect of branch length proportion and the amount of genetic change is implied. Each molecular method result may be affected by population size, number of species involved, species divergence and number of sampling singletons in dataset. For morphological delimitation, the effect of geographical variation and limitation of samples for comparative study interfered with the species discrimination that reached variable OTU number in delimitation results. For this reason, we implemented the consensus criteria of OTU counts to summarize the final result. The results of five species delimitation methods were compared to find the consensus results, including 1) the traditional morphological identification, 2) molecular phylogenetic clade support from ML and BI reconstruction methods based on COI gene datasets (� 70 bootstrap value for ML and � 0.95 posterior probability for BI), 3) the distance gap of COI barcode analysis in BOLD and ABGD, 4) the Poisson Tree Processes model under multi-rate and Bayesian implementation (mPTP and bPTP), and 5) the generalized mixed Yule-coalescent (GMYC). The delimitation criteria for putative clustering were established based on a 66% consensus of five methods [109]. The species boundary was discriminated when the delimitation results showed congruence in at least four of the five methods. Species boundary lines were illustrated on a delimitation tree. Exemption criteria were implied in cases of singleton OTUs of representative taxa in datasets of this study. The BIN discordance result was established to serve as a warning signal for further morphological re-examination and taxonomic revision.

Morphological identification and species occurrences
Twenty-four nominal species from newly collected materials were successfully identified based on comparative morphology using various taxonomic sources. In total, thirty-six morphological species were included in the dataset of this study including identified specimens with deposited sequences from previous molecular taxonomic work. Pictures of some common morphological species groups found in the Indochina subregion are presented in Fig 2. The diagnostic characters have been evaluated and additional taxonomic characters have been recorded. The main taxonomic characters used in species identification were rostrum shape, rostrum teeth, epistome, anteromedial spine on carapace, second pereiopods, uropodal diaresis, and shape of telson. However, some taxonomic characters were highly variable in some geographical populations. In this study, juvenile specimens of some

PLOS ONE
proportion of overlap with other species living in the same area, such as number of teeth on fingers of second perieopods, the structure and number of rostrum teeth, and the presence of velvet pubescence on second perieopods. Nevertheless, the integration of morphology and molecular phylogenetic analysis (see below) represented an effective means to clarify the species boundaries among these prawn species.

Molecular phylogeny
One hundred ninety-eight partial sequences of COI gene were successfully obtained. The final aligned dataset contained a total of 678 bp, which included 326 conservative sites, 352 variable sites, and 284 parsimony-informative sites. All sequences were deposited in the BOLD database. BOLD identification numbers are presented in Table 2. The phylogenetic tree and clade composition compiled from three reconstruction methods was illustrated on an ultrametric topology (Figs 3-5). According to the accepted criteria for a monophyletic clade, the bootstrap value in RAxML failed to surpass the threshold value, whereas W-IQ-TREE and BI tree both supported monophyly. However, phylogenetic relationships among some Macrobrachium species are unresolved due to low statistical support at the deep node. The topology supports monophyly in most Macrobrachium species sampled in this study.

DNA barcoding
All new sequences deposited in BOLD and GenBank database are shown in Table 2. The project code for barcode analysis was accounted as "PRSEA" in BOLD. All COI sequences successfully passed the check for possible insertion of stop codons. The nucleotide composition of COI fragments used in analysis is as follows. The composition frequency of nucleotides is G: 18.91±0.07%, C: 25.94±0.22%, A: 26.71±0.14%, and T: 28.44±0.14%. The GC content of each codon position is 54.59 ±0.15% in position 1, 44.20±0.012% in position 2, 35.74±0.73% in position 3 and 44.84±0.29% overall. The genetic divergence distances calculated from 222 sequences retrieved from 36 morphological species in the dataset are 5.52% and 20.25% within species and genus, respectively. The normalized within-species divergence is 3.06±0.12%.
The barcode gap analysis based on comparison between mean/max-intraspecific variation and nearest-neighbor distance was calculated using two alternative models: K2-P and P-distance. The comparison results for singleton species were excluded due to non-applicable data in calculations. In K2-P and P-distance models, the species comparison advocated fourteen species with both max-intra and mean-intra distances higher than nearest neighboring species. A scatter plot warning of high genetic divergence species based on K2-P distances is provided in Fig 6 and a summary of barcode gap comparisons among all Macrobrachium species is presented in Table 3.
The DNA diagnostic characters in Macrobrachium species with a minimum of three representative sequences in the input dataset were successfully detected. The five categories of nucleotide characters were default assigned: diagnosis, partial or diagnosis, partial or uninformative, invalid and uninformative characters. The frequency of each character level found in the dataset are as follows: six species contained diagnosis character, one species with diagnosis or partial character, and 15 species with partial character. Partial-uninformative and invalid characters were not detected in any of the COI sequences in this study. The COI sequences of M. sirindhorn and M. rogersi contained four and five diagnosis characters, respectively. The nucleotide position and assigned character category of selected Macrobrachium species are summarized in S1 Table. The BIN discordance analysis indicated five cluster groups which contained non-confirmation of sequence cluster validity based on the BIN comparison of input records within the cluster. These BIN clusters contained previously deposited sequences and the newly obtained sequence dataset from this study. The discordant BIN clusters were identified as follows:

Species delimitation
The ABGD method delimited the sequence dataset into 37 MOTU clusters (excluding outgroup taxa). The clustering assigned nine putative lineages from singleton samples.  In total, the five delimitation methods, including morphological and molecular schemes, assigned the number of putative OTUs from the input COI dataset (including outgroups) as follows: 36 groups by morphology, 70 groups by BIN, 37 groups by ABGD, 36 groups by mPTP, 60 groups by bPTP, and 52 groups by GMYC. The delimitation results from each method are illustrated in Figs 3-5 based on topology of the ultrametric tree. The consensus result based on an integrative taxonomic scheme designated 27 nominal and 18 putative species from the COI dataset (excluding two outgroups).

Cryptic diversity and morphological variation of Macrobrachium prawns in Indochina
In this study, Macrobrachium COI sequences retrieved twenty-seven nominal and eighteen unknown putative species under the consensus tree generated from traditional morphology and molecular phylogenetic affinities. The barcode gap analysis in BOLD depicts 13 clustering species having maximum intraspecific distance greater than the distance to their nearestneighbor species in the dataset (Table 3). Cryptic speciation seems to occur within species groups with morphological complexity and co-existing distributions. From delimitation results, unknown putative OTUs with morphological variation and high genetic divergence were detected in several nominal species such as M. dienbienphuense, M. forcipatum, M. lanchesteri, M. niphanae, M. sintangense, M. spelaeus, and M. suphanense. The tree topology also indicates a genetic structural pattern correlated to geographical distinction in several widespread species. Macrobrachium lanchesteri, a common and widespread species, is represented by two morphological patterns based on the following composite characters: the proportional length of palm and finger of second pereiopods, rostrum shape, and body size. In the ultrametric tree, the M. lanchesteri clade is shown as two geographically distinct putative groups (putative sp. 3 and 4, Fig 3), which are supported by both BI and ML analyses. The geographical differentiation of M. lanchesteri has been previously reported, based on karyotypes [111], microsatellite markers, traditional morphometry, and single locus phylogeny [112]. The long-distance dispersal species, M. sintangense, and the recently described species, M. suphanense [70], are widely distributed in the Chao Phraya Basin, and some populations may migrate into the Mekong Basin. The phylogenetic results reveal several deeply divergent lineages, and delimitation methods cluster four new putative groups and place them between M. sintangense and M. suphanense (Fig 4). Another conspecific species closely resembling M. sintangense and M. suphanense is M. nipponense, which has been reported as an introduced species in northern Laos, Vietnam, and Myanmar [49,50,53,113]. In this study, sequences of M. nipponense from Vietnam were successfully sequenced and validated with a previous taxonomic study of Vietnamese fauna [53]. It is unclear whether the distribution of M. nipponense in Indochina is due to human introduction or native migration. However, a recent study on phylogeography and population structure of M. nipponense between coastal China and the island of Taiwan provides evidence that migration was associated with the Pleistocene glacial cycle and ecological isolation within a lake [60].
In Indochina, there are four additional Macrobrachium species that likely belong in this group: M. dolatum and M. tratense described from Thailand and two Mekong River species, M. saigonense and M. hungi, described from the lower Mekong Basin, in the Tonle Sap Basin in Cambodia and the Mekong Delta in southern Vietnam. These four species morphologically resemble M. sintangense. Only one deposited sequence of M. saigonense was available (GBCMD2451-09) to be included in this study. The delimitation result suggested the insertion of M. saigonense within the distinct lineage of putative sp. 9, which was previously identified as M. sintangense based on morphology. This result indicates the need for taxonomic reassessment of these two morphologically similar species. The life history traits of these species show association with estuarine environments and their morphological characters of the second pereiopods show similarly pattern (by having long and slender telopodites with 1-3 large teeth on chela and pollex). However, the inclusive phylogenetic relationship is unresolved due to low support values in BI and ML. Further revision of taxonomic descriptions and distribution patterns of these species is required.
The two remaining species groups found in this study, i.e., M. niphanae sensu Hanamura, Imai For the clade composed of the M. pilimanus group (clade L), genetic divergence shows congruence with delimitation approaches by having OTU counts that seem higher than from traditional identification. Recently, the M. pilimanus group was taxonomically examined for its morphological variability [50]. Arguments regarding species validity were discussed due to geographical variability, and the group was re-investigated because of suspected misidentification of name-bearing types [48,50,116,117]. In this study, two patterns of geographical distributions are detected in mainland SE-Asian species, namely the Chao Phraya and Mekong Basins (Fig 5: clade N), although statistical support is lacking for the monophyly of the Chao Phraya group (Fig 5: node O). In the Mekong Basin group (clade Q), tree topology indicates two nominal and two putative species.
Macrobrachium dienbienphuense is the most widespread species of the M. pilimanus group in Indochina and is placed in the Mekong Basin group. The taxonomic problem of morphological variability was reported in M. dienbienphuense, and insufficient species delimitation was attributed to the effect of geographical variation [48,50,110]. In this study, species delimitation approaches indicate one putative OTU, putative sp. 18. The Chao Phraya group also contains two new distinct lineages (putative sp. 16 and 17). Unexpectedly, the new distinct lineage from Cambodian territory (putative sp. 15) appears as a basal clade of the M. pilimanus group. This result suggests the need for further taxonomic revision and multi-locus phylogenetic study in order to clarify the taxon validity and to explore cryptic diversity of the M. pilimanus species group in Indochina.
Macrobrachium yui and M. hendersoni, both land-locked species, are distributed in the northwestern montane area of Indochina with occurrence reported from two major river basins: Chao Phraya and Mekong [48,50]. Distribution ranges of these species and other nominal taxa such as M. lanatum and M. patheinense [49,118] serve as evidence of exchange between Indochinese and Indian-Burmese fauna due to populations found in the Indian subcontinent. In this study, seven species with cross-basin distribution were found: M. hendersoni, M. lanchesteri, M. neglectum, M. nipponense, M. rogersi, M. rosenbergii, and M. vilosimanus. The transitional zone of these freshwater species exists along the montane and coastal area of Myanmar-Thailand border. This result advocates that the diversity of Indochinese fauna might be underestimated due to habitat diversification and the connection of river networks. For this reason, the biogeographical study between Indochina and other regions such as India-Burma, East Asia, and some Sunda Islands of the Malay Archipelago should emphasize evolutionary history at a broad regional scale. Moreover, the taxonomic revision of several species previously reported from Laos, Cambodia and Vietnam requires reinvestigation using an integrative taxonomic approach.

Implications of DNA barcode delimitation of groups with high morphological diversification in genus Macrobrachium
Barcode-based delimitation provides an additional tool for cryptic diversity exploration and can be used to resolve taxonomic validity in morphologically complex organisms including crustacean decapods [35,44,[119][120][121][122][123]. Previously, the proposal of a universal threshold for species delimitation in crustaceans using genetic divergence has been introduced [39]. In decapods, the barcode delimitation approach has also been used in particularly important groups such as crayfish [124], shrimp and prawns in families Atyidae and Palaemonidae [121,125,126]. The universal barcode gap threshold based on COI and 16s markers has been excavated by broad sampling analysis of crustacean families. The result suggested that the barcode gap threshold from the COI marker is helpful for taxonomy in species-level identification [39]. In this study, the barcode gap distance generated from MEGA provided the broad picture of overall distance values. The graph (Fig 7) can be used to explain the overlap between intraand inter-specific distances either the best compromised barcode gap threshold value present clearly or hardly determine. A graph from BOLD provides the evidence of high-low genetic differentiation among sequence clusters. The BOLD graph is useful for further taxonomic reinvestigation because the fine resolution of clustering group determination and warning signal provided from BIN discordance. For this reason, barcode gap graphs might give a broad resolution of delimitation based on distance method in both best and compromised barcode gap threshold proportion. The median universal genetic divergence between species of the same genus is between 0.25-1.01 substitutions per site. In this study, the maximum interspecific divergence is 20.9, whereas maximum intraspecific divergence was as high as 23.49 (in M. malayanum; Table 2). This finding suggests the possibility of cryptic divergence in Macrobrachium species.
Morphological adaptations driven by intrinsic and extrinsic factors in shrimp and prawns have been reported worldwide [56,[127][128][129]. An impact on developmental stages in species with hermaphroditism was found when the food resource and space were limited [46]. Morphological variation such as phenotypic plasticity or sequential hermaphroditism in aquatic animals was reported in association with food supply and habitat space [58,130]. These aforementioned factors also cause significant impact on species discrimination due to the degree of morphological overlap in Macrobrachium species [131,132]. In this study, juvenile specimens of the M. pilimanus species group are difficult to identify because of the lack or incomplete development of species diagnostic characters generally used in broad sampling. Sexual dimorphism was documented in some Macrobrachium species, such as in a male with the pair of second perieopods showing different shape and length of each telopodite, whereas the female exhibited similar form on both sides. A lack of life history data is problematic for species identification as well as for potential use in aquaculture [133][134][135].
In this study, the BIN discordance function in BOLD provided a warning signal for taxa with low genetic distance compared to their nearest-neighbor species. For example, deposited COI sequences of M. lanchesteri and M. dienbienphuense exhibit genetic affinity and are possibly closely related to M. rosenbergii and M. puberimanus sequences (in this study), respectively. The taxonomic identification of M. puberimanus and M. dienbienphuense was clearly resolved based on multi-locus phylogenetic analysis despite their sympatric distribution patterns and morphological variability [88]. The morphological characters and molecular delimitation agree that these co-existing species must be accepted as distinct species. Moreover, the BIN discordance signal suggests that some M. dienbienphuense sequences (GBCMD28530-19 and GBCMD28531-19) placed within the M. puberimanus clade require re-investigation of their morphological characters for identification. Within clade B (Fig 3), one deposited sequence defined as M. lanchesteri (GBCMD 28533-19) is not placed within M. lanchesteri s.l. clade but rather as a sister lineage of M. rosenbergii.
Recent taxonomic revision has validated the taxonomic identity of M. rosenbergii [136]. In Indochina, the introduced population of M. rosenbergii stock from different geographical populations used in breeding programs may lead to genetic contamination among native populations. This result raises a signal of caution for specimen identification and use of Indochinese Macrobrachium sequences deposited in the online database. For this reason, the intensive study of riverine species is critically needed to clarify taxonomic boundaries and to document genetic diversity of native populations. In this study, barcode gap analysis also reveals the concealing effect of this method for species delimitation among Indochinese species due to high inter-and intra-specific genetic variations.
Recently, primers for the COI region were re-configured due to species-specification, improvement of amplification ability, and avoidance of nuclear copied gene amplification. However, the use of DNA barcoding has been debated for intensive systematic study [137][138][139]. The nuclear mitochondrial DNA (numts), a common DNA segment found in crustaceans, was suspected to interfere with the COI sequences used in crustacean phylogeny and barcode analysis [122,140,141]. The low quality of COI sequences containing numts may cause misleading results in species genealogies and clustering methods [142]. In this study, all COI sequences were automatically checked before BIN assignment in BOLD. The clustering result based on COI sequences was partially congruent with morphological species identification. The phylogenetic position of several morphological species such as M. dienbienphuense, M. eriochierum, M. hirsutimanus, M. neglectum, M. niphanae, M. sirindhorn, and M. thai were illustrated. However, the single-locus phylogenetic analysis was unable to resolve relationships in deeply divergent lineages of some nominal species. The integration of other molecular loci for decapod identification such as 16S, 28S, and H3 would allow improvement of taxonomic identification and study of phylogenetic relationships [15,100]. According to DNA barcoding results, it can be suggested that taxonomic revision and assessment of phylogenetic relationships of mainland SE-Asian Macrobrachium including the Indochina subregion are crucially needed, especially in groups that show morphological complexity such as M. lanchesteri, M. niphanae, and M. sintangense. Moreover, deposited sequences in the public barcode library might be of significant importance for fundamental knowledge, utilization and future conservation management projects of mainland SE-Asian Macrobrachium fauna.

Distribution patterns of Indochinese freshwater Macrobrachium prawns
The biogeography and life history of Macrobrachium in SE-Asia have been revealed based on the combination of molecular, morphological and ecological data [37, 38, 100]. The marinefreshwater habitat diversification has been suggested to have had an impact on morphological and genetic diversifications in aquatic animals [143][144][145]. Previously, life-history traits of Macrobrachium prawn species were matched with their evolutionary relationships by using multi-loci phylogenetic analysis [100]. The marine species with larval development in saline water were depicted as the ancestral group for all Macrobrachium, while the freshwater group appear to be the derivative group. In this study, the ultrametric tree depicts both marine-brackish and freshwater species. However, the fine resolution of phylogenetic relationships between species with the two life history traits remains undetermined due to low structural signal in the deep node position. Broad-scale phylogeographical studies of giant river prawn species and some Indo-Australian species [43,146] suggested that the vast genetic diversity could be divided into several groups based on morphological complexity, and that the genetic affinity of some populations supported the hypothesis of an ancient river system during the last glacial maximum period [147,148]. The historical connection of several rivers in mainland SE-Asia has been frequently detected in historical biogeographical studies in other aquatic organisms such as semi-aquatic earthworms [149], freshwater mussels [150,151], fishes [152,153], as well as from evidence in the fossil record, e.g., crocodilian [154].
From multiple lines of ecological field evidence in this study, together with habitat preferences and distribution ranges gleaned from previous records, the life history of Indochinese Macrobrachium are divided into three groups. The first group includes species found in montane streams and comprises seventeen taxa including nominal and putative clustering species: Distribution patterns of some Macrobrachium groups show correlation with historical evidence supporting a paleo river systems hypothesis [37,150]. In this study, the occurrence records of three Macrobrachium species could be used to refer historical geography. Firstly, M. naiyanetri exhibited disjunct distribution between the southern Thai peninsula, east coast of Thailand and inland water system of Cambodia. The genetic affinity based on clade composition of sampling population from those areas also supported the geographical isolation. Secondly, the subpopulation of M. dienbienphuense has been found along the western part of Thailand, whereas the major population seems distributed in Mekong tributaries. The previous hypothesis about the connection between minor tributary systems of Mekong River and Chaophraya River was discussed and promoted based on phylogeographic analyses of extant and extinct organisms (mentioned below). The records of occurrence for this species would support the river connection hypothesis. M. lanchesteri is commonly abundant in lentic water habitats in Indochina. In this study, the genetic affinities were detected and divided into three geographical areas. The genetic structure likely fits with three major river systems in Thailand (Chauphraya, Tapi and Mekong), which were previously part of the ancient river system during Pleistocene. Further investigation on population genetics and molecular dating would be useful to match species diversification with the effects of historical distribution scenarios.
From phylogenetic analysis, two widespread species, M. neglectum and M. sintangense exhibit low genetic divergence among different geographical populations. In contrast, the phylogenetic tree indicates geographical isolation among the populations of two widespread species, M. dienbienphuense and M. lanchesteri. Populations of M. dienbienphuense and M. lanchesteri often are found co-existing with other congeneric species. Macrobrachium lanchesteri are highly abundant in flood-plain areas, whereas M. dienbienphuense predominantly migrates within the Mekong River and its tributaries. Recently, the migratory period for M. dienbienphuense has been locally reported as "parading" in some remote areas in tributaries of the middle Mekong, and the species currently might be threatened by human activity [155]. The discovery of subterranean species such as M. elegantum, M. lingyunense, and M. spelaeus [63,87] highlights the high adaptive ability to extreme habitats and emphasizes the significance of habitat diversity in continental Asia. Previous taxonomic studies and the results of DNA barcoding in this study advocate that the freshwater fauna of Indochina is still underestimated for cryptically diverse species because of river tributary connections [48][49][50]156]. For this reason, the distribution patterns of several Macrobrachium species in Indochina are evidence that help solve the puzzle of SE-Asian freshwater faunal diversity related to a paleo-river basin presented in previous reports. Further analysis using a multi-locus phylogeny and additional samples from the Sundaic region are required to clarify the broad-scale diversification and biogeographical pattern of SE-Asian Macrobrachium fauna.

Conclusion
This study provides the first DNA barcode library and cryptic evidence of genus Macrobrachium in Indochina. Diagnostic characters of some species have been detected from nucleotide positions and can be included as additional characters for taxonomic identification and species validation (S1 Table). Despite wide geographical dispersion, several species show low genetic affinity between different geographical populations. In contrast, the morphologically complex Macrobrachium species group possesses high genetic diversity and the geographical distribution shows allopatry between Chao Phraya and Mekong river basins. The broad scale phylogenetic relationships of Indochinese species from the COI dataset are still unresolved. However, examples of stable clade composition and monophyletic lineages are found, especially in estuarine species. The DNA species delimitation suggests several candidate OTUs which might be cryptic species hidden within common species, such as the M. lanchesteri, M. sintangense and M. pilimanus species groups. The barcode gap analysis provides a delimitation threshold for Indochinese taxa despite the high intraspecific variation detected in some species. The inappropriate taxonomic identification of some available sequences from the public database raises caution and suggests that dataset reconstruction and re-verification for further taxonomic comparison is required. Finally, the results of this study indicate that the regional fauna share interconnection with other neighboring regions such as India-Burma and East Asia, as indicated by the records of some widely dispersed species.
Supporting information S1