Molecular phylogeny and species delimitation of the genus Tonkinacris (Orthoptera, Acrididae, Melanoplinae) from China

Tonkinacris is a small group in Acrididae. While a few species were occasionally sampled in some previous molecular studies, there is no revisionary research devoted to the genus. In this study, we explored the phylogeny of and the relationships among Chinese species of the genus Tonkinacris using the mitochondrial COI barcode and the complete sequences of ITS1 and ITS2 of the nuclear ribosomal DNA. The phylogeny was reconstructed in maximum likelihood and Bayesian inference frameworks, respectively. The overlap range between intraspecific variation and interspecific divergence was assessed via K2P distances. Species boundaries were delimitated using phylogenetic species concept, NJ tree, K2P distance, the statistical parsimony network as well as the GMYC model. The results demonstrate that the Chinese Tonkinacris species is a monophyletic group and the phylogenetic relationship among them is (T. sinensis, (T. meridionalis, (T. decoratus, T. damingshanus))). While T. sinensis, T. meridionalis and T. decoratus were confirmed being good independent species strongly supported by both morphological and molecular evidences, the validity of T. damingshanus was not perfectly supported by molecular evidence in this study.


Introduction
The genus Tonkinacris Carl, 1916 is a small group of grasshoppers with six known species worldwide so far [1][2][3][4][5][6]. Among the four species having distribution in China, T. sinensis has the most wide distributional range covering northeastern Yunnan, Sichuan, Guizhou, Chongqing, southern and western parts of Hubei and Hunan, Guangxi and northern Vietnam, T. decoratus is distributed mainly in south Guangxi  and T. meridionalis are both endemic to China with collection records only from the type localities to date. T. ruficerus and T. yaeyamaensis are distributed only in Ryukyu Islands, Japan (Fig 1).
For morphological species identification, T. sinensis can be easily distinguished by the median black longitudinal stripe on the dorsum of pronotum narrower and lateral yellowish ones broader than those in three other species, the immaculate external surfaces of hind femora and the subconical cerci in male (Fig 2A-2E). However, T. meridionalis seems to be an intermediate species between T. sinensis and the group comprising T. decoratus and T. damingshanus. That's to say, T. meridionalis is similar to T. sinensis in the immaculate external surfaces of hind femora and conical cerci in male (Fig 2G and 2J), but similar to T. decoratus and T. damingshanus in the pattern of pronotal stripes and the overall appearance of the body (Fig 2F). T. decoratus and T. damingshanus are the most similar pair of species in the genus. According to the original description [4], the minor difference between them is the three complete black transverse maculations on the upper surfaces of hind femora for T. damingshanus (Fig 2U and 2V). T. decoratus usually has only two black transverse maculations on the upper surfaces of hind femora (Fig 2M and 2N). But variation in this character was observed in T. decoratus, of which some individuals have also three black transverse maculations on the upper surfaces of hind femora (Fig 2O and 2P). Both T. decoratus and T. damingshanus have similar patterns of variations in the black transverse maculations on the external surfaces of hind femora (Fig 2P-2R and 2W-2Y). It seems that the morphological difference between them nearly disappears and the validity of T. damingshanus becomes questionable.
Since molecular data has demonstrated its significant implication in exploring phylogeny and species delimitation in many grasshopper groups (see reference [22] for a short review), we decided to investigate through molecular approaches the phylogeny of Chinese Tonkinacris species and to clarify the relationship between T. decoratus and T. damingshanus. The two Japanese Tonkinacris species were not included in this study because they are morphologically extremely different from the four species distributed in continental China and Vietnam and may not belong to the genus Tonkinacris. More importantly, the only mitochondrial COI fragments sequenced by Grzywacz et al. [20] for the Japanese species T. ruficerus and T. yaeyamaensis do not overlap even partially with, but are distantly separated from, the standard barcoding region sequenced in this study and our previous similar studies [18,22]. Therefore, the partial COI sequences of T. ruficerus and T. yaeyamaensis sequenced in Grzywacz et al.'s study [20] cannot be combined into our dataset for analysis. In this study, we sampled 215 individuals belonging to 15 genera and 20 species in Acrididae and sequenced the 658-base fragment of the barcode region in mitochondrial COI gene for animals [25], and the complete sequences of ITS1 and ITS2 of the nuclear ribosomal DNA for 149 individuals. Phylogeny of the species involved in this study was reconstructed from molecular sequence datasets using maximum likelihood and Bayesian inference methods, and the species boundary was delimitated for Chinese Tonkinacris species using multiple methods, including genetic distance, NJ tree, the haplotype network constructed using the statistical parsimony method [26] and analysis of the generalized mixed Yule coalescent model (GMYC) [27]. The results demonstrate that Chinese species of the genus Tonkinacris is a monophyletic group and the phylogenetic relationship among them is (T. sinensis, (T. meridionalis, (T. decoratus, T. damingshanus))). While T. sinensis, T. meridionalis and T. decoratus were confirmed being good independent species strongly supported by both morphological and molecular evidences, the validity of T. damingshanus was not perfectly supported by molecular evidence in this study.

Taxon sampling and generation of molecular sequences
A total of 218 individuals representing 2 suborders 3 families 17 genera and 22 species were sampled (S2 Table). The sampling strategy was the same as that in our previous studies [18,22]. Species assignation of specimens was performed mainly following Li & Xia's [11] keys to species. All specimens were preserved in anhydrous ethanol and stored at room temperature. The protocols for DNA extraction, PCR amplification, sequencing, sequence assembly and alignment followed those in our previous studies [18,22]. COI barcode region for animals, ITS1 and ITS2 were newly sequenced for 149 individuals and the remaining sequences were derived from our previous studies (S3 Table) [18,22]. Haplotype nucleotide sequences were deposited in GenBank with accession numbers MW053510-MW053544, MW056459-MW056489 for COI haplotypes, and MW054567-MW054626, MW055691-MW055701 for ITS sequences. T. ruficerus and T. yaeyamaensis were not included in the analysis because the fragment sequenced by Grzywacz & Tatsuta [20] is not the barcode region. The twenty samples of T. decoratus were collected from three localities in Nonggang Nature Reserve, Longzhou County, Guangxi, with gh050 ( Fig 2O) having an extremely distinct black transverse maculation on the base of the upper surface of hind femur, gh063 having a slightly distinct black transverse maculation, and gh054, gh062, gh067 as well as gh069 having an indistinct black transverse maculation, respectively.

Data analysis
Sequence divergences were calculated using the Kimura two parameter (K2P) distance model to explore the extent of overlap between intraspecific variation and interspecific divergence [28,29]. A neighbor-joining (NJ) tree of K2Pdistances was created to provide a graphic representation of the patterning of divergence between species [30] as a profile for the setup of taxa and groups for calculating genetic distances and a reference framework for species delimitation. The calculation of the sequence divergences and NJ tree building with 1000 bootstrap replicates were implemented in MEGAX [31].
The phylogeny was reconstructed in maximum likelihood and Bayesian inference frameworks with Ergatettix dorsiferus in Tetrigidae and Conocephalus longipennis in Tettigoniidae as outgroups. The sequences of COI were divided into three subsets which consisted of the first, second or third position of the codons, respectively. Maximum-Likelihood phylogenies were reconstructed using IQ-TREE [32], best-fit models of nucleotide evolution and best-fit partitioning scheme were selected using ModelFinder [33], the approximately unbiased branch support values were calculated using UFBoot2 [34], and the analysis was performed in W-IQ-TREE [35] using default settings most of the time. BI analyses were accomplished in mrbayes 3.2.1 (http://morphbank.Ebc.uu.SE/mrbayes/) [36], with two independent runs, each with four Markov Chain Monte Carlo (MCMC) chains. The analysis was run for 1×10 7 generations, sampling every 100 generations, and the first 25% generations were discarded as burnin, whereas the remaining samples were used to summarize Bayesian posterior probabilities (PP).
Considering the possible simple correspondence between the identity of traditional species or evolutionarily significant units (ESUs) and an objective standard of genetic differentiation: the 95% connection limit in statistical parsimony networks [27,[37][38][39], we constructed haplotype networks for Tonkinacris species. The construction of haplotype networks was implemented in TCS1.21 [40].
Since the generalized mixed Yule coalescent model (GMYC) was considered a robust tool for delimiting species when only single-locus information was available [41], the single-threshold GMYC analysis of COI sequences was conducted in Rv3.6.1 in a Windows environment with the use of the splits package. The ultrametric single-locus gene tree required for the GMYC method was obtained using BEAST 1.8.2 [42] with 1×10 7 MCMC generations under the Yule speciation model and a burn-in of the first 10% generations was used to avoid suboptimal trees in the final consensus tree.

Phylogeny
The phylogeny was reconstructed in both maximum likelihood (ML) and Bayesian inference (BI) frameworks using three different datasets, respectively.
The ML tree inferred from COI sequences retrieved monophyly for the genus Tonkinacris and all species with extremely high bootstrap values except Fruhstorferiola tonkinensis, of which the bootstrap value for the clade is only 55 (Fig 3). However, the monophyly of the genus Tonkinacris was not supported, and the sepcies T. sinensis, T. meridionalis, Longgenacris maculacarina, Paratonkinacris vittifemoralis and Emeiacris maculata did not individually cluster into an independent clade in the ML tree from ITS1sequences (S1 Fig Phylogeny revealed by BI analyses was similar to that by ML analyses. In BI trees deduced from COI (S4 Fig) and combined sequences (Fig 4), the monophyly of the genus Tonkinacris and all species was strongly supported with the clades of most species having a posterior probability value of 1 except that of T. decoratus, which had a relatively lower posterior probability value of 0.84 (Fig 4)

Intraspecific variation and interspecific divergence
K2P distances within and between species/populations were calculated to measure intraspecific variation and interspecific divergence, respectively. The results showed similar distribution patterns to that in our previous study (S4 Table) [18,22]. For COI sequences, variations within population were mostly distinctly less than or slightly above 1%, except those of T. sinensis within Diding, Damingshan, Nonggang and Gaozhai populations, of which the maximum pairwise distance was 1.86%, 1.86%, 1.70% and 3.13%, respectively. Mean interpopulation variations for T. sinensis ranged from 0.18% (between Yong'an and Gaoji populations) to 1.70% (between Dayaoshan and Emeishan populations; Table 1), and those for Emeiacris maculata ranged from 4.24% to 4.73% (average 4.45%). The pairwise distances more than 3.00% for T. sinensis within Gaozhai population and between populations were all derived from the extreme sequence gh112 (S4 Table). The pairwise distances between gh112 and other individuals of T. sinensis ranged from 2.33% to 3.29%. Except the distances between gh112 and gh138 as well as gh135 from Nonggang population were 2.33% and 2.49%, respectively, all other distances were more than 2.81%, and most of them were more than 2.97%. Among all intraspecific pairwise distances of T. sinensis, only 106 out of 1540 (representing 6.88% of the total) were more than 2.00%, and 55 were derived from gh112, 14 from gh030 of Diding population, 28 from gh026 and gh027 of Gaozhai population, respectively, and the remaining 9 from gh025, gh028, gh029 of Gaozhai population. In a word, 93.12% of the total intraspecific variations were less than 2.00%, most of the high intraspecific variations more than 2.00% were derived from gh112 and other five individuals of Gaozhai population (gh025, gh026, gh027, gh028, gh029), and only one individual (gh030 from Diding population) out of Gaozhai population had 14 intraspecific distances more than 2.00%. ITS1 and ITS2 sequences showed much lower intraspecific variations than COI except in a few species/populations (S4 Table). Interspecific divergences of COI sequences within the genus Tonkinacris ranged from 1.02% (between T. decoratus and T. damingshanus) to 6.21% (between T. decoratus and T. sinensis), and those between other species pairs were all more than 4.76% (S5 Table). The interspecific divergences calculated from ITS1 and ITS2 sequences displayed similar distribution patterns to those from COI (S6 and S7 Tables). Species within Tonkinacris had an interspecific mean distance less than 1%, but the mean distances between other pairwise species within the subfamily Melanoplinae were usually more than 1%, and the mean distances between species in Melanoplinae and those out of Melanoplinae were all more than 10%.

Species boundary delimitation of the genus Tonkinacris
To explore the species boundary within the genus Tonkinacris, we sampled 56 individuals for T.sinensis from 8 populations, 20 individuals for T.decoratus, 10 individuals for each species of T. damingshanus and T. meridionalis for comparison. Considering the poor performance of ITS1 and ITS2 sequences in resolving phylogeny, we used only COI sequences to estimate the gap between intra-and inter-specific distances within the genus Tonkinacris. The results showed that all species pairs, except T. decoratus and T. damingshanus, had distinct gaps between intraspecific variations and interspecific divergences calculated from COI sequences. Although there is no overlap between T. decoratus and T. damingshanus, the intra-and interspecific pairwise distances of them also had no gap (Table 2).
In NJ trees from both COI and combined sequences (S7 and S8 Figs), all the four species formed reciprocally monophyletic clades. While the clades of three species have a bootstrap value of 100, the one of T. decoratus is only 63 and 73, respectively, indicating the possibly instable relationship between T. decoratus and T. damingshanus. The 6 individuals of T. decoratus with distinct or indistinct black transverse maculation on the base of the upper surface of hind femur did not clustered into an independent clade either by themselves or with individuals of T. damingshanus. The relationship among Tonkinacris species was entirely unresolved in NJ trees from ITS1and ITS2 sequences (S9 and S10 Figs).
For COI sequences, haplotype network analysis detected 17 haplotypes in T. sinensis (S8 Table), of which haplotype 1 is shared by four populations, haplotype 3, 8 and11 by two populations, respectively, and haplotype 6 by three populations (Table 3), indicating the close gene link among all sampled populations. In addition, the individuals of T. decoratus with and those without black transverse maculation on the base of the upper surface of hind femur can share  the same haplotype (S8 Table). Haplotypes 16 and 17 are private for Yong'an population, but Yong'an and Gaozhai are located at the west and east sides of Maoershan Nature Reserve, respectively and can be regarded as the same locality in a larger scale. Five haplotypes were detected in T. decoratus and 6 were discovered in T. meridionalis. However, only 2 haplotypes were detected in T. damingshanus. For ITS1 sequences, the number of haplotypes detected in the four species are 11, 3, 2 and 2, respectively, and no haplotype was shared by different species (S9 Table). For ITS2 sequences, only 8 haplotypes were detected in the genus Tonkinacris, of which haplotypes 1, 2, 3, 5 and 6 were private for T. sinensis, haplotype 8 was private for T. damingshanus, haplotype 4 was shared by T. sinensis and T.decoratus, and haplotype 7 was shared by all of the four Tonkinacris species (S10 Table).
In the network from COI haplotypes (Fig 5A), T. sinensis formed two separate clades with one consisting of the single gh112 only, and the other one comprising the 16 remaining haplotypes. T. meridionalis formed an independent clade, but T. decoratus and T. damingshanus clustered into the same clade together. While the two haplotypes of T. damingshanus formed a separate subclade, the mutation steps between T. decoratus and T. damingshanus were 6, distinctly less than the maximum connection steps of 11 at 95% connection limit. In the network from ITS1 haplotypes (Fig 5B), all of the four Tonkinacris species clustered into a single network, but haplotypes of each species formed a separate subclade. In the network from ITS2 haplotypes (Fig 5C), the four species not only formed a single network together, but also had two shared haplotypes, with one shared by T. sinensis and T. decoratus, and the other one shared by all of the four Tonkinacris species.
In GMYC analysis based on COI sequences, 26 putative species were delineated from the whole dataset (S11 Table). The results of GMYC analysis for Paratonkinacris vittifemoralis and Emeiacris maculata were the same as those in the previous study [22]. T. decoratus and T. damingshanus were delineated as the same species. Samples of T. sinensis were delineated into 4 putative species, and samples of each remaining species were delineated as an independent species. For the 4 putative species delineated from samples of T. sinensis in GMYC analysis (S11 Table), the putative species 5 consisted of individuals from 3 populations (Dayaoshan, Damingshan and Gaozhai), the putative species 6 consisted of individuals from 7 populations (Gaozhai, Yong'an, Gaoji, Damingshan, Diding, Nonggang, Emeishan), the putative species 7

Phylogeny of the genus Tonkinacris
The molecular marker of COI has been demonstrated informative and useful for comparison within and between species in Acrididae in some previous studies [18,22,[43][44][45]. While nuclear ITS1 and ITS2 sequences could not resolve the phylogeny within the genus Tonkinacris by any approach of ML, BI and NJ analyses (S1, S2, S5, S6, S9 and S10 Figs), mitochondrial COI barcode sequences performed an excellent resolution. The monophyly of the genus Tonkinacris and the four species of the genus had been retrieved in most phylogenetic trees only if the COI sequences were used (Figs 3 and 4 Since there is possibility that ITS1 and ITS2 sequences are not suitable for addressing phylogenetic problems below subfamily level because of their high homology [22,46], and the morphology agrees to a large extent with the results from molecular datasets comprising either single COI or combined sequences, it is reasonable to provisionally propose such a hypothesis that the Chinese species of the genus Tonkinacris is a monophyletic group and the phylogenetic relationship can be describedas (T. sinensis, (T. meridionalis, (T. decoratus, T. damingshanus))).
Although T. ruficerus and T. yaeyamaensis distributed in Japan were not involved in this study, we believe that it does not matter because they may not belong to the genus Tonkinacris due to their distinct morphological differences from their congeners in continental China and Vietnam just as mentioned in the introduction section.

Species delimitation within the genus Tonkinacris
There is no doubt that both T. sinensis and T. meridionalis are good independent species because: (1) they are not only distinguishable morphologically (Fig 2) but also supported by molecular evidence, i.e. the monophyly of each species is supported in nearly all phylogenetic trees from COI and combined datasets with extremely high bootstrap values or posterior probabilities (Figs 3 and 4; S5, S8 and S9 Figs) except in the ML tree from combined dataset (S3 Fig); (2) there are distinct gaps between their intra-and inter-specific genetic distances (Table 2), and (3) haplotypes of COI of each species formed a separate network (Fig 5A) except the haplotype represented by gh112, which is the only extreme sequence and possibly derived from sequencing error. However, the relationship between T. decoratus and T. damingshanus is not so unambiguous. While the monophyly of the two species was supported by COI and combined datasets most of the time, there was no gap for them between intra-and inter-specific distances ( Table 2). The differences in morphological character between T. decoratus and T. damingshanus proposed in original reference was not supported by molecular evidence in this study, but improved to be a natural variation. Furthermore, the mutation steps of 6 between them in the network from COI sequences is distinctly less than the maximum connection steps of 11 at 95% connection limit. More importantly, there are only two haplotypes for T. damingshanus detected from 10 samples, whereas 5 haplotypes were detected for T. meridionalis from the same amount of samples. Despite of the similar ratio of haplotypes detected for T. decoratus, the increased number of samples led to an increase of haplotype amount detected (S8 Table). Therefore, it is possible that conspicuous overlap for T. decoratus and T. damingshanus between intra-and inter-specific distances will arise when more individuals are sampled. Finally, T. decoratus and T. damingshanus were delineated as the same species in GMYC analysis. Since the only distinguishing morphological character between T. decoratus and T. damingshanus disappears and they were delineated as the same species by molecular approaches most of the time, it is appropriate to treat them as the same species, and a formal nomenclatural change will be made when the genus is revised based on the integrative evidences in the near future.
As for the 4 putative species delineated from samples of T. sinensis in GMYC analysis, they can be interpreted as a result of oversplitting because one of the four GMYC species (putative species 6) covers individuals from seven of the eight populations sampled (S11 Table), and this splitting was not supported by either morphological evidence or results from other molecular approaches, including the monophyly in most of the phylogenetic trees, the intraspecific genetic distances less than 2.00% most of the time, the close relationships among populations linked through shared haplotypes, and the single independent haplotype network (excluding the extreme one, gh112).

Performance of ITS1 and ITS2 sequences in resolving phylogeny of grasshoppers
Sequences of ribosomal DNA (rDNA), including both ribosomal RNA (rRNA) genes and their associated spacer regions, may be used to address phylogenetic problems at a variety of taxonomic levels [47]. Coding sequences are fairly conservative and have been used for examining relationships among more distant taxa. In contrast, spacer regions, which bracket the large and small coding elements, are subject to fewer constraints and, consequently, evolve more rapidly, supposedly making them useful for deducing relationships among strains or closely related groups [48][49][50]. However, this may not be the case in all groups of organisms. While the sequences of ITS1 in the genus Melanoplus had appreciable differences when insertions and deletions were taken into account, its utility might be questionable because of the long stretches of very high homology in this region [46]. Despite the fact that ITS2 sequences have been used to explore phylogeny and delimitate species boundary in some grasshopper groups [51][52][53], our previous [22] and present studies achieved similar results to the supposition by Kuperus & Chapco [46]. In this study, the relationship within and between closely species was nearly completely unresolved using ITS1 and ITS2, but the relationship at the subfamily level was resolved much better, and the monophyly of the subfamily Melanoplinae and some distantly related species was retrieved with high bootstrap values or posterior probabilities in most phylogenetic trees from ITS1 or ITS2 (S2, S5, S6, S9 and S10 Figs), except in the ML tree from ITS1 (S1 Fig) in which the monophyly of the subfamily Melanoplinae was not supported. Therefore, ITS1 and ITS2 sequences may not be suitable for addressing phylogenetic problems in Acrididae at genus and species levels, but may be usable at or above subfamily levels.