Phylogeography and genetic structure of the oriental river prawn Macrobrachium nipponense (Crustacea: Decapoda: Palaemonidae) in East Asia

The oriental river prawn (Macrobrachium nipponense) is mainly distributed in East Asia. The phylogeography, population genetic structure and historical demography of this species in the East Asia were examined by using partial sequences of the cytochrome oxidase subunit I (COI) and 16S rRNA in mitochondrial DNA. Ten populations that included 239 individuals were collected from Taiwan (Shihmen Reservoir, SMR, Mingte Reservoir, MTR and Chengching Lake Reservoir, CLR), mainland China (Taihu Lake, TLC, Min River, MRC, Jiulong River, JRC and Shenzhen Reservoir, SRC), Japan (Biwa Lake, BLJ and Kasumigaura Lake, KLJ) and Korea (Han River, HRK). The nucleotide diversity (π) of all individuals was 0.01134, with values ranging from 0.0089 (BLJ, Japan) to 0.01425 (MTR, Taiwan). A total of 83 haplotypes were obtained, and the haplotypes were divided into 2 main lineages: lineage A included the specimens from BLJ, KLJ, CLR, MTR, TLC, MRC and JRC, and lineage B comprised the ones from HRK, SRC, SMR, MTR, TLC, MRC and JRC. Lineage A could be further divided two sub-lineages (A1 and A2). Individuals of lineage A2 were only from TLC. Demographic expansion was observed in each lineage, starting within the second-to-latest interglacial period for lineage A and within the last glacial period for lineage B. All FST values among the ten populations were significantly different, except for the values between MRC and JRC, and SMR and SRC. The phylogeography and genetic structure of M. nipponense in East Asia might be influenced by Pleistocene glacial cycles, lake isolation and human introduction. The possible dispersal routes of M. nipponense in the East Asia were also discussed.


Introduction
The complex geological events and climatic history of various regions helped shape current phylogeographical patterns [1,2]. Therefore, the current population genetic structure of a a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 DNA extraction, amplification and sequencing Total genomic DNA was extracted from pereopod muscle using QIAamp DNA Mini Kit. [24]. Two different fragments (16S rRNA and COI) of mtDNA were amplified and sequenced. The  Table 1.  16S rRNA and COI sequences were amplified using 1471 (5'-CCT GTT TAN CAA AAA CAT-3') and 1472 (5'-AGA TAG AAA CCA ACC TGG-3') [25], and COI-F (TTT ATC TTC GGA  GCG TGA GC) and COI-R (AGT TAT TCC TGG GGC TCG TAT G) [26] primers, respectively. Thermal cycling was performed on a GeneAmp 2400 thermal cycler (Perkin-Elmer, Norwalk, CT, USA), and PCR conditions consisted of 39 cycles of denaturation at 95˚C for 50 s, annealing at 50˚C for 1 min, and extension at 72˚C for 1.5 min. An initial denaturation step at 95˚C for 5 min and a final extension holding at 72˚C for 10 min were included in the 1st and last cycles, respectively. The PCR product was separated by electrophoresis on 1.5% agarose gels, purified with the Gene Clean II kit (Bio101, Vista, CA, USA), and sequenced on an ABI 377 DNA sequencer (Applied Biosystems, Inc.; Foster City, CA, USA).

Sequence analyses
All sequences were aligned using MegAlign (DNASTAR, LaserGene, WI, USA). The number of variable and parsimony informative sites, base composition, haplotype diversity and nucleotide diversity [27] were calculated using DnaSP version 5.00 [28]. Part of the sequences of the 16S rRNA and COI genes were concatenated in the following analyses. Phylogeographic analyses of 16S rRNA and COI genes were carried out by the neighbour-joining (NJ) and maximum likelihood (ML) methods, respectively, by MEGA 6 [16,29]. Bootstrap analyses with 1,000 replicates were used to evaluate the phylogenetic relationships of all haplotypes. The optimal substitution model was determined using MEGA. A network of haplotypes was also constructed using the median-joining method [30] in Network version 4.6.1.3, available at http://www.fluxus-engineering.com. The historical demographic expansion was investigated by examining the frequency distributions of pair-wise differences between sequences (mismatched distribution) with ARLEQUIN [16]. The approximate age of the population or lineage was estimated with the formula A = μπ [31]. A is the age of the population or lineage, π is nucleotide diversity and μ is μ (the mutation rate) x generation time; the approximate dates of population expansion were estimated with the formula τ = 2μT [32], where T is the time since expansion, τ is the expansion time, and 2μ is μ (the mutation rate) x generation time x the number of bases sequenced. The average divergence rate of 1.17-1.66% per million years and a generation time of one year were used [33].
To examine the genetic differentiation between any two populations, pair-wise F ST statistics were estimated by ARLEQUIN version 3.5 [34]. A dendrogram of the ten sampling sites was also constructed using the unweighted pair-group method with arithmetic means (UPGMA) based on the F ST values. The population structure was also assessed by an analysis of molecular variance (AMOVA; [35] in ARLEQUIN). Various groupings of these populations were suggested by an UPGMA tree of these ten populations. The grouping that revealed the maximal value of F CT and significantly differed from a random organization of similar groupings was assumed to represent the most-probable geographic subdivisions [36]. The significance test of the statistical result was evaluated by a permutations test with 10,000 random permutations.
Tajima's D [37] was used to check for deviations from neutrality, indicating whether population expansion had occurred in the past. Fu's Fs test [38] was also carried out to assess the evidence for population expansion using DnaSP. In addition, population expansion was also investigated with a mismatch analysis to examine the frequency distributions of the nucleotide difference as a function of frequency using DnaSP.
The Mantel test [39], available in ARLEQUIN, was used to test for isolation by distance. We used the pair-wise F ST values and the corresponding pair-wise geographical distances as the input data, and 1000 permutations were performed to determine the level of significance. The approximate geographic distances between sampling locations were used as the minimum distance map.

Results
In total, 239 specimens were sequenced. The size of the 16S rRNA fragment was 421 bp, with 78 variable sites and 37 parsimony informative sites, resulting in 38 unique haplotypes (S1 Dataset). All sequences were deposited in GenBank, with accession numbers KU235597-KU235646, KU235699-KU235720 and KY084569-KY084735. No gap was detected. The frequency of the nucleotide composition showed an AT bias (with G + C contents of 33.7%). The size of the COI fragment was 371 bp; there were 42 variable sites and 31 parsimony informative sites, resulting in 47 unique haplotypes (S2 Dataset). All sequences were deposited in GenBank, with accession numbers KU235799-KU235848, KU235901-KU235922 and KY092174-KY092340. The frequency of nucleotide composition showed an AT bias (with G + C contents of 39.8%).
The following results were obtained by analyzing the combined sequences of the 16S rRNA and COI genes (S3 Dataset). The haplotype diversity (h) of all ten populations was 0.956, with values that ranged from 0.572 (BLJ) to 0.964 (MTR) ( Table 1). The nucleotide diversity (π) of all populations was 0.011, with values that ranged from 0.001 (BLJ) to 0.014 (MTR) ( Table 1). A total of 83 haplotypes was detected in 239 specimens. The most common allele was shared by 36 individuals from the HRK (17), SMR (10), MRC (3), JRC (3) and SRC (3) populations. The second most common allele was shared by 20 individuals from the populations of MRC (14) and JRC (6). The third most common alleles were shared by 17 individuals from the populations of SMR (12) and SRC (5).
The best-fitting model explaining our data was the K2 model. This model was used for NJ and ML reconstructions and AMOVA analyses. A phylogenetic tree of all haplotypes is shown in Fig 2. The results of both the NJ and ML trees were very similar. Two distinct lineages (A and B) were found. Lineage A might be further divided into two sub-lineages (A 1 and A 2 ). Bootstrap values are 77 and 75 for the NJ and ML trees between lineage A and lineage B, respectively. The bootstrap values are 74 and 66 for the NJ and ML trees between lineage A 1 and lineage A 2 , respectively. The network for all specimens (Fig 3) supported the result obtained from these phylogenetic trees. Two sub-lineages were also found in lineage A in the network. The distribution of specimen of lineages A and B for different populations are also shown in Fig 1 and  All F ST values among the ten populations were significant, except for the ones between MRC and JRC and between SMR and SRC ( Table 2). The UPGMA tree of these ten sampling areas could be divided into two main groups (Fig 4); the first group included the SMR, HRK, SRC, MRC and JRC, and the second group included the other five populations. The second group may be further divided into four subgroups; the first subgroup included the CLR and   Five different groupings for the ten populations were suggested by the UPGMA trees. The results of the AMOVA are shown in Table 3. The AMOVA for the ten populations yielded a significant F ST value of 0.57445, indicating that at least one of the pair-wise populations had significant heterogeneity. Significant values of F CT were observed in all groupings. The highest F CT values (0.4817) were found in grouping 2, and supported the conclusion that these ten populations could be divided into two main groups: the first group included SMR, HRK, SRC, MRC, and JRC, and the second group included the other five populations. Significant F CT values were also found in different groupings, indicating that an additional genetic discontinuity may also have occurred among populations.
No significant Tajima's D values were found for all populations, except SRC (Table 1). However, Tajima's D values were significant for each lineage and for the total population. Fu's Fs tests were significant for the MTR, TLC and HRK populations (Table 1). Significant Fu's Fs values were also obtained for lineages A and B and for the total population. The mismatched distribution of all specimens was bimodal (Fig 5a), with one mode corresponding to the number of differences within the lineages and the other mode corresponding to differences between the two lineages. A unimodal distribution was obtained from either lineage A or B, which did not significantly differ (as measured by the sum of the squared deviation; P > 0.05) from that predicted by the growth expansion model (Fig 5b and 5e). A unimodal distribution was also obtained from either lineage A 1 or A 2 (Fig 5c and 5d).
No correlations between genetic differentiation and the distance of geographic separation among populations were observed for lineages A and B (lineage A: p = 0.838; lineage B: p = 0.835), which indicates that oriental river prawns did not conform to an isolation-by-distance model of maternal gene flow.

Discussion
Two distinct lineages (A and B) of M. nipponense in East Asia were found in this paper, and this result is the same as the one obtained in our previous paper [16]. Populations with ancestral genotypes tend to preserve higher nucleotide and haplotype diversities because of the long-term accumulation of mutations [40,41,42]. The nucleotide diversity (π = 0.009) and haplotype diversity (h = 0.956) in lineage A were significantly higher than the values (π = 0.005; h = 0.866) in lineage B. This suggested that lineage A was older than lineage B. The approximate age of lineages A and B calculated were 490,000 and 253,000 years, respectively (the average mutation rate of 1.42% / myr was used to calculate the age of the population or lineage). The estimate of time of expansion for lineage A (209,533 yrs ago), lineage A 1 (188,371 yrs ago), lineage A 2 (36,100 yrs ago) and lineage B (49,248 yrs ago) also proved this result. Mismatch distributions of the two distinct lineages show that lineage B had a steeper peak, which indicates that there is a smaller original population before an expansion or bottleneck (Fig 5) [32]. This picture also suggests that lineage B could have experienced expansion in the more-recent past than lineage A, the pairwise distribution mode of which was more-clearly displaced to the right of the distribution pattern (Fig 5). This supported the conclusion that the time of expansion of lineage A was earlier than that of lineage B.
The oriental river prawn originated in mainland China [12], and thus, the high genetic diversity of populations in mainland China were expected. However, the nucleotide diversities (0.014) and haplotype diversity (0.964) in the MTR were significant higher than the values in these sites in mainland China and that may partly result from the specimens in MTR that were involved in two different lineages (Figs 1, 2 and 3). However, the estimated nucleotide diversity (0.013) and haplotype diversity (0.942) (Table 1) excluding the specimens that belonged to lineage B in MTR were still significantly higher than the values in different populations (such Population structure of the oriental river prawn as TLC, π = 0.006, h = 0.924) ( Table 1). Most individuals in MTR formed various unique haplotypes, not shared by other individuals from other sampling sites (Fig 3). This indicated that the MTR population had a longer time to accumulate mutations than the individuals from different sampling sites. Moreover, the haplotypes from the MTR population were distributed in two different locations in sub-lineage A 1 (Figs 2 and 3), and this revealed that two different ancestral groups were recruited in the MTR population. Thus, these two different recruited ancestral groups may also cause the high diversity in the MTR population. We found that the haplotypes from the TLC population were also divided into two different sub-lineages (A 1 and A 2 ) (Figs 2 and 3), and this might reveal secondary back contact between the TLC and Japan populations occurred in this sampling site. However, the nucleotide diversities (0.008) and haplotype diversity (0.931) of the TLC population were still significantly lower than the values obtained from MTR. A large number of cultured prawns released into TLC in recent years [29] may have led to lower genetic diversity in TLC. For these lineage B populations, the nucleotide and haplotype diversities in mainland China populations (e.g., SRC and JRC populations) were higher than the values from the other populations (e.g., SMR and HKR), and this supported the conclusion that the oriental river prawn originated in mainland China and dispersed to other areas.
Although two lineages were found (Figs 1, 2 and 3), not all ten populations simultaneously included both individuals from lineages A and B. The individuals in SMR, SRC and HRK were only found to belong to lineage B, and the specimens from MTR, TLC, MRC and JRC were included in lineage A or B, respectively, while the specimens in CLR, BLJ and KLJ were only discovered in lineage A. This may partly result from different dispersal routes and different times of origination for the two lineages, and their arrival time was apparently less than one million years ago [11,12,43]. Demographic expansions were observed in each lineage, starting in the second-to-latest interglacial period for lineage A (209,533 yrs ago) and within the last glacial period in lineage B (49,248 yrs ago). The event of connecting the islands of Taiwan, Japan and Korea to the mainland China occurred 2 to 3 times in the Pleistocene [44,45,46]. Our pervious paper [16] supported that (1) lineage A was older than lineage B, (2) lineages A and B may originate from the same ancestor in mainland China and were then dispersed to Taiwan at different times, and (3) lineage A moved to Taiwan earlier than lineage B.
For lineage B, the most common allele was shared by all populations (HRK, SMR, MRC, JRC and SRC populations), and this indicated all populations have the same ancestor. Populations with ancestral genotypes tend to preserve higher nucleotide and haplotype diversities because of the long-term accumulation of mutations [40,41,42]. The nucleotide diversity (π = 0.009) and haplotype diversity (h = 0.867) in SRC population was significantly higher than the values from the other populations, and the HRK population had the lowest nucleotide and haplotype diversities (Table 1). Therefore, the SRC and HRK populations were the oldest and the youngest populations, respectively. Furthermore, the oriental river prawn originated in mainland China [12]. Based on above discussions, the dispersal route of lineage B might be from the south of China (SRC), north to higher latitude areas (MRC, JRC and Taiwan), and further north to the north of China (TLC) and Korea. For lineage A, we also found that KLJ and BLJ populations located the most north sampling areas liked as HRK population in lineage B had the lowest nucleotide and haplotype diversities. Furthermore, the relationship between KLJ (located in the eastern Japan) and TLC (in the north of China) populations was not such close as the one between BLJ (located in the western Japan) and TLC (Fig 3). Thus, two different dispersal routes might yield for lineage A. The first route is from the south of China, north to higher latitude areas (MRC, JRC and Taiwan), and further north to eastern Japan (KLJ); and the other one is from the south of China, north to higher latitude areas (MRC, JRC and Taiwan), north to the north of China (TLC), and to western Japan (BLJ).
Significant genetic differences were found in most pairs of the ten populations (Table 2), and the hierarchical AMOVA revealed that a significant genetic structure across all hierarchical levels existed among the ten populations, but no obvious geographic division was found in genealogic reconstructions except for TLC in lineage A (Figs 2 and 3). This outcome is different from the outcomes that were often found of a high degree of phylogeographically related genetic structure [11,42,47,48,49,50,51,52,53,54]. This might be because the expansion time for lineage A (209,533 yrs ago) or lineage B (49,248 yrs ago) were too short to accumulate enough genetic variation and form geographically unique clade [55,56,57]. The haplotypes of lineage A 2 (TLC population) were close to BLJ haplotypes, and this might result in the secondary back contact from BLJ populations.
There was no significant genetic divergence between MRC and JRC, which may result from frequent gene flow because the geographical distance between JRC and MRC is very close and their environments were similar (Fig 1). The transportation of M. nipponense from SRC to SMR for aquaculture or the maintenance of genetic diversity [54] may partly explain the lack of genetic variation between SRC and SMR populations.

Conclusions
Our study indicated a high level of genetic structure among the oriental river prawn populations in East Asia. Two main lineages (A and B) were found, and lineage A was older than lineage B. A rough estimate of the ages of lineages A and B were obtained and were approximately 490,000 and 253,000 years old, respectively. Lineages A, A 1 , A 2 and B of the oriental river prawn M. nipponense have experienced population expansion since the Pleistocene glacial cycles in East Asia (approximately 209,533, 188,371, 36,100 and 49,248 yrs ago, respectively). The possible dispersal routes of M. nipponense in the East Asia were also speculated. Lineage B could have experienced expansion more recently than lineage A. The phylogeography and genetic structure of M. nipponense in East Asia might be influenced by land bridges during Pleistocene glacial maximums and by human colonization.