Species Delimitation and Phylogenetic Relationships in Ectobiid Cockroaches (Dictyoptera, Blattodea) from China

We collected Ectobiidae cockroach specimens from 44 locations in the south of the Yangtze valley. We obtained 297 COI sequences specimens and carried out phylogenetic and divergence dating analyses, as well as species delimitation analysis using a General Mixed Yule Coalescent (GMYC) framework. The intraspecific and interspecific sequence divergence in Ectobiidae cockroaches ranged from 0.0 to 7.0% and 4.6 to 30.8%, respectively. GMYC analysis resulted in 53 (confidence interval: 37–65) entities (likelihood ratio = 103.63) including 14 downloaded species. The COI GMYC groups partly corresponded to the ectobiid species and 52 ectobiid species were delimited successfully based on the combination of GMYC result with morphological information. We used the molecular data and 6 cockroach fossil calibrations to obtain a preliminary estimate of the timescale of ectobiid evolution. The major subfamilies in the group were found to have diverged between ~125–110 Ma, and morphospecies pairs were found to have diverged ~10 or more Ma.


Introduction
Cockroach species are often difficult to differentiate, both at the adult and juvenile stages. Individuals of closely related species are often very morphologically similar [1][2][3]. Cockroaches display high developmental stochasticity, which results in great variation in external spination, setation and coloration [4,5], making it difficult to distinguish species on the basis of morphological characters. The male genitalia is of great value in the discrimination of male adult cockroaches; but for some closely related species, it is also very challenging (Zheng et al. [6], Che, Y.L., personal observation). Most taxonomic keys for cockroaches are based on adult male genitalia, which means that the females of closely related cockroaches cannot easily be matched with males of the same species, or females may appear to be entirely different species (Wang, Z.Q., personal observation). More importantly, juveniles may comprise up to 80% (Wang, Z. Q., personal observation) or 90% of individuals in most cockroach surveys [7]. Individuals may be highly polymorphic over the course of development and adults are often significantly different from juveniles [8,9]. The difficulty in distinguishing different developmental stages within a species and the nymphs of different species from each other makes it difficult to a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 net, as well as by light trapping and canopy fogging (by Guo ZHENG, Institute of Zoology, Chinese Academy of Sciences). The collected insects were stored in 95% or 100% ethanol.

Morphological types
We checked all the specimens, then the specimens of male adults were morphologically identified into species where possible or species indet (represented by sp.1, sp.2 and so on). Standard morphological characters were chosen to identify the specimens as follows: presence or absence of pulvilli and arolia, spinal type on anterior-ventral margin of fore-femur, the claws distinctly toothed or not, the degree of wing development (i.e., the number of incomplete branches and the area of appendicular field of hind wing), overall body shape, characteristics of male genitalia and variation of abdominal tergal glands [9,[46][47][48][49][50][51][52][53][54][55].
Within each species or species indet, we did not genetically sample each individual but chose the male individual form different localities for barcoding in order to obtain more genetic diversity. At the same time we also attempted to sample different variants within the same types to make sure that we could uncover new diversity. Specimens of female adults and nymphs were not delimited but used directly for PCR analysis and DNA sequencing.

DNA extraction, PCR and sequencing
The hind legs were used for molecular studies, and the other body parts were stored in 95% ethanol as voucher specimens. In total, 297 cockroach specimens were used for COI sequencing in this study ( Table 1). All specimens were deposited into the College of Plant Protection, Southwest University, Chongqing, China.
Total DNA was preserved in 100% ethanol and stored at −20˚C. The extraction procedure was according to the TIANamp Genomic DNA Kit (Tiangen Biotech, Beijing). Fragments of COI were amplified using PCR. Primers for the amplifications are LCO1490 (5'-GGT CAA CAA ATC ATA AAG ATA TTG G-3') and HCO2198 (5'-TAA ACT TCA GGG TGA CCA AAA AAT CA-3') [56]. The amplification conditions were: initial denaturation at 94˚C for 3 or The PCR amplification products were tested by electrophoresis on 1% agarose gel containing Godview-II. The successful PCR products were sent for sequencing at the BGI Technology Solutions Company Limited (BGI-Tech) (Beijing, China) using the aforementioned primers. All sequences were deposited at the National Center for Biotechnology Information (NCBI) GenBank (Table 1).

Sequence alignment and phylogenetic analysis
A total of 314 COI sequences were analyzed. This included 297 sequences from this study (Table 1), 5 sequences representing 4 species of Ectobiidae cockroaches downloaded from GenBank; 10 sequences representing 10 species of Blaberidae cockroaches downloaded from GenBank; and 2 mantid sequences (outgroups: Bantia werneri and Hoplocorypha sp) ( Table 2). Sequences were aligned using MUSCLE 3.8 [58]. Among our 297 sequences, 115 identical COI haplotypes were found and removed from the analysis. Intraspecific and interspecific genetic divergence values are quantified based on the Kimura 2-parameter (K2P) distance model [59], using MEGA 6.06 [60].
To explore phylogenetic relationships among these closely related species, Maximum likelihood (ML) and Bayesian inference (BI) analyses were performed using RAxML 7.7.1 [70] and MrBayes 3.2 [71] respectively. For ML, the GTRGAMMA model was selected for the COI datasets and 1000 bootstrap replicates were performed. For BI, we selected the nucleotide substitution model of COI according to the Bayesian Information Criterion (BIC) in ModelGenerator v.0.851 [72]. The best-fit model for COI was GTR+I+G. Two independent sets of Markov chains were run, each with one cold and three heated chains for 1×10 7 generations, and every 1000 th generation was sampled. Convergence was inferred when a standard deviation of split frequencies <0.01 was completed. Sump and sumt burninfrac was set to 25% and contype was set to allcompat.

Divergence date analyses
We also performed divergence date analyses to infer the evolution of Ectobiidae. For this analysis, the best-fitting models were chosen as follows: Codon position 1: SYM+G, Codon position 2: K81uf+I+G and Codon position 3: TrN+G using PartitionFinder V1.1.1. The molecular clock was calibrated using six minimum age constraints based on cockroach fossils as shown in Table 3. Analyses were performed using a relaxed molecular-clock model using the Bayesian phylogenetic software BEAST 1.8.1 [73]. Rate variation was modeled among branches using uncorrelated lognormal relaxed clocks [73], with a single model for all genes. A Yule speciation process was used for the tree prior [74] and posterior distributions of parameters, including the tree, were estimated using MCMC sampling. We performed two replicate MCMC runs, with the tree and parameter values sampled every 5,000 steps over a total of 50 million generations. A maximum clade credibility tree was obtained using TreeAnnotator within the BEAST software package with a burn-in of 1,000 trees. Acceptable sample sizes and convergence to the stationary distribution were checked using Tracer 1.5 [73].

GMYC analyses
Species, defined as independently evolving lineages, were delimited using the GMYC approach [29]. Time-resolved gene trees were estimated in BEAST 1.8.1 [73] under a strict clock model with the mean clock rate fixed to 1, and using the randoming starting tree. The Birth-Death speciation was used as a tree prior on divergence times. This is an appropriate choice of tree prior because the GMYC approach uses the constant-size coalescent as a null model for hypothesis testing (see [81]). We then applied the single-threshold GMYC method to the ultrametric gene tree generated by BEAST using the SPLITS package [82] in R [83]. The groups delimited were compared to a one-species null model using a likelihood ratio test.

Evaluating the two methods to delimt species
We used the GMYC result combined with morphological evidence in understanding species limits among ectobiid cockroaches. If GMYC species confromed to the morphospecies that we identified based on morphological data, we could conclude that our original grouping is one species. As to the females and nymphs, we also considered that if they grouped monophyletically with the correspongding males in the BI and ML inferences. But if GMYC result was inconsistent with the morphological result, we checked the specimen again especially the morphological divergence in genitalia to verify species delimitation.

COI sequence variation
In this study, the sequenced length of COI excluding the primer was approximately 658bp. All 297 sequences have been deposited in GenBank with accession numbers KY349516 to KY349812 for COI. The COI sequences that we sequenced have high AT content (65.4%), with an average nucleotide composition of A = 29.6%, T = 35.8%, C = 18.4%, and G = 16.2%. Sequence analysis revealed that 313 sites were variable, of which 289 were parsimony informative.

Phylogenetic inference
For COI, phylogenetic constructions yielded similar topologies for the two methods utilized (Fig 2 and S1 Fig). Females and nymphs formed monophyletic groups with their males as recovered in BI and ML analyses (Fig 2, S1 Fig). Most members of one genus were clustered together, with a few exceptions (Anaplectoidea and Symploce). Members of Sorineuchora, Episymploce, Margattea and Sigmella each formed monophyletic groups with high support values. Ectobiidae was found to be paraphyletic with respect to Blaberidae, although support for this grouping was low, and indeed support among the deeper branches of the tree was generally poor. The ectobiid Phyllodromica iberica was recovered to be the sister of Blaberidae + partial Blattillinae, although support for this grouping was not strong. Two subfamilies of Ectobiidae, Blattillinae and Pseudophyllodromiinae were found to be paraphyletic, although they were not well supported.

Divergence date analyses
The timescale for evolution of ectobiid species diversification based on COI and calibrations based on 6 cockroach fossils is shown in Fig 3.

GMYC analysis
The likelihoods of the null and GMYC models were 1077.505 and 1129.322 respectively. The GMYC was an improvement over the null model, and was clustered into 53 (confidence interval: 37-65) entities (likelihood ratio = 103.63) including 14 downloaded species.

Evaluating the two methods to delimt species
There is a difference in species delimitation between the two methods. The COI GMYC groups of our samples partly corresponded to the 28 ectobiid species with light blue highlights in Fig  2. Based on genitalia information (Fig 4) Fig 2, were separately categorized into the same morphotype; but based on morphological information, they were all treated different species (Figs 5-7). As to the incongruence, we checked the specimens again to make sure that the morphological divergence in genitalia was the intraspecific difference or not. Finally, the variations between different morphospecies of Anaplectoidea varia, Margattea bisignata and Margattea spinosa (light red highlights in Fig 2) was determined as intraspecific difference although large genetic distance existed, whereas the variations among other morphospecies (light purple highlights in Fig 2) was determined as interspecific difference although maybe there was slight genetic distance between them.
Based on the combination of GMYC result with morphological information, 52 ectobiid species were delimited successfully. The intraspecific and interspecific sequence divergence ranged from 0.0 to 7.0% and 4.6 to 30.8%, respectively.

Discussion
The number of species recovered from our GMYC analyses (53 entities (confidence interval: 37-65)) partly conformed to the number of morphological species that we identified (69 species, including 55 ectobiid species). Finally 52 ectobiid species were delimited successfully using GMYC method with morphological information. GMYC method exhibited a significant reduction (13 species) in total species count. Our results therefore show that DNA based species delimitation methods perform not well for cockroaches, just as a complementary method to species delimitation based on morphological data. Certainly, DNA based identification   methods are especially useful for cockroaches, due to a lack of defining characters among females and nymphs of these organisms. Our study is the first attempt to investigate species delimitation of a large number of cockroach species, which included females and nymphs. GMYC employing a tree based on COI helped us to accurately identify all 52 ectobiid morphospecies. Thus, phylogenetic analysis of COI in combination with GMYC proved to be an invaluable tool for delimiting cockroach species and complementing classical taxonomy in the context of effective identification.

Species delimitation
For the ectobiid cockroaches studied here, the intraspecific and interspecific K2P genetic divergence ranged from 0.0 to 7.0% and 4.6 to 30.8%, which more or less similar to other groups (e.g. thrips: 0.0 to 7.91% and 8.65% to 31.15% [84]; mosquitoes: 0-1.67% and 2.3-21.8% [85]), although the greater intraspecific diversity showed some overlap with interspecific divergence. Hebert et al. [86] proposed that the genetic divergence cutoff for species identification should be at least 10 times greater than within species. However, many exceptional cases that do not follow this proposal have been reported. Barcode sequence divergence in conspecific specimens ranged from 0-1.67% and congeneric species showed from 2.3-21.8% divergence for 122 mosquito species in China [85], while for 21 mosquito species in Pakistan, these values were from 0-2.4% and 2.3-17.8% divergence [87]. Rebijith et al. [84] reported that the intraspecific and intrageneric distances of COI barcode sequence for 151 thrip species ranged from 0.0 to 7.91% and 8.65% to 31.15% respectively. Both Meyer & Paulay [88] and Wieners et al. [89] proposed that the "barcoding gap" was an artifact of insufficient sampling across taxa. In other words, if sufficient sampling were undertaken, intraspecific variation would overlap with interspecific divergence.
Although the COI GMYC groups of our samples partly corresponded to the 28 ectobiid species, the male adults of one species have the same morphological and genitalia characters, but for females or nymphs, it may be not the same as males and only DNA based methods can be used to solve it in that case in general. Females and nymphs formed monophyletic groups with their males as recovered in BI and ML analyses (Fig 2, S1 Fig), consistent with the results from GMYC method.
Although genital morphology has been proved to be more effective in diagnosing cockroaches, it is also very challenging to use it in the taxonomy. Anaplectoidea varia (Fig 4I-4L), Margattea bisignata (Fig 4A-4D) and Margattea spinosa (Fig 4E-4H) were each split into 2 different morphospecies because of their difference in genitalia. After careful examination, the variations between different morphospecies of three species listed above was determined as intraspecific difference. Anaplectoidea varia (the genetic distance of AnapoVar4, 6 ( Fig 4I and  4J) vs. others (Fig 4K and 4L): 6.3%), Margattea bisignata (the genetic distance of MargBisi4 (Fig 4A and 4B) vs. others (Fig 4C and 4D): 6.0%) and Margattea spinosa (the genetic distance of MargSpiA4, 7 (Fig 4E and 4F) vs. others (Fig 4G and 4H): 7.8%) with light red highlights in Fig 2 all showed slight difference in genitalia and large genetic distance, much higher than that of other intraspecific divergences, even higher than interspecific genetic divergences (4.6%) between some species (Episymploce kunmingi and Episymploce sp.4, Episymploce kryzhanovshii and Episymploce conspicua). The sample localities for Margattea bisignata were geographically distant from each other (~1000 km); MargBisi4 was from E'mei, Sichuan Province while other members were from Guangxi Province, Guangdong Province, Fujian Province and Hubei Province. The samples of AnapoVar4, 6 were from Hainan Province, which is an isolated island separated by Qiongzhou Strait from mainland. So gene flow might be hindered between the two geographically distant populations, which accounts for the larger intraspecific divergences of Margattea bisignata and Anaplectoidea varia. But for Margattea spinosa, all samples were from Guangxi and Hainan Province, which are the tropical regions in China. That the tropical and subtropical taxa had the greater diversity and substantial phylogeographic structure [90] maybe resulted to increase intraspecific genetic divergence.
The similar pairs Episymploce kunmingi and Episymploce sp.4 was delimited as one GMYC species but each recovered as single group in BI and ML inference (Fig 2, S1 Fig). On the other hand, the genetic distance between Episymploce kunmingi (Fig 6: u-v) and Episymploce sp.4 (Fig 6S and 6T) was only 4.6%, yet there were strong morphological differences between them as follows: (1) the former with minute spines present in the right margin of subgenital plate, but in the latter, large spines present; (2) the right style shorter than the left one in Episymploce kunmingi, but for Episymploce sp.4, the right style distinctly longer than the left one; (3) Episymploce kunmingi with a spinelike process near the right side of excavation, but Episymploce sp.3 without any process near the excavation. After we checked the morphological characters including the male genitalia, we were unable to find differences between them. The genetic distance between Episymploce conspicua and Episymploce kryzhanovshii was also 4.7%, yet Episymploce conspicua (Fig 6O and 6P) was clearly distinguished from Episymploce kryzhanovshii (Fig 6M and 6N) by the following characters: (1) body of Episymploce conspicua medium, about 2.2 cm including tegmina, but in Episymploce kryzhanovshii, body small and about 1.1 cm including tegmina; (2) posterior margin of supra-anal plate with a V-shaped concavity at middle and symmetrical in Episymploce conspicua, only a shallow crack present in Episymploce kryzhanovshii and asymmetrical; (3) lateral margin of genital plate with apex tapering and without spines scattered in Episymploce conspicua, but in Episymploce kryzhanovshii, apex rounded and scattered with spines; (4) spines absent in both styli of Episymploce conspicua, but present in Episymploce kryzhanovshii.

Phylogeny and evolutionary timescale of ectobiidae
Blaberoidea includes the groups Ectobiidae (Pseudophyllodromiinae, Blattellinae, Ectobiinae, Nyctoborinae) and Blaberidae [65,91]. These groups have been shown to form a monophyletic group by morphological [92,93] and molecular [65,[94][95][96][97][98] data; in most previous studies, Ectobiidae was recovered as paraphyletic with respect to Blaberidae. In our study, we obtained the clade Blaberoidea with high support values (BPP = 100, MLB = 100) based on substantial cockroach COI samples on a large scale; however, we had no samples from Nyctoborinae. Although our analyses recovered each as paraphyletic (see below), it should be noted we only used one mitochondrial gene, which is likely to be less reliable compared with the multi-gene analyses of other studies.
Grandcolas [92] proposed that Pseudophyllodromiinae was monophyletic and the sister group of Blaberidae. Klass [99,100] and Klass & Meier [93] considered the Pseudophyllodromiinae to be paraphyletic, while Inward et al. [94] obtained a monophyletic Pseudophyllodromiinae as sister group of Ectobiinae. In our study, Pseudophyllodromiinae was paraphyletic and one part of Pseudophyllodromiinae (Allacta, Balta, Sorineuchora) was recovered to be the sister group of the left Blaberoidea members (MLB>90).
The trees based on BI and ML analyses show that the members of genus Anaplectoidea was not clustered together; on the contrary, Anaplectoidea spinea and Malaccina sinica formed a monophyletic group (BPP = 99, MLB = 100), which was the sister to other members of Anaplectoidea (BPP = 100, MLB = 100). These two genera are highly morphologically similar, and the differences between these two genera mainly manifest in the numbers of incomplete branches and the area of appendicular field of hind wings according to the morphology. Roth [53] transferred two species of Anaplectoidea to Malaccina. Anaplectoidea and Malaccina should probably be treated as one genus because of their close genetic relationship.
The relationships between Hemithyrsocera and Symplocodes, Episymploce and Symploce were similar to those of Anaplectoidea and Malaccina. The only character that clearly separates Symplocodes from Hemithyrsocera is the distinctly toothed tarsal claws in the former [52]. Episymploce and Symploce are highly morphologically similar, and the main differences between them are the symmetry of the supra-anal plate and the thickness of lateral margins in  vehicles and ships as an important pest in China, while other members are found in leaf litter and grass or shrubs in forested area [101]. Blattella radicifera, Blattella sp.1 and Blattella sauteri formed a separate clade from other Blattella in our analysis. They are clearly distinguished from other Blattella members by short and broad supra-anal plate (Fig 5J and 5L) (while in other Blattella members, supra-anal plate is tongue-shaped (Fig 5B and 5F)).
Blaberidae was not recovered to be a monophyletic group but formed a monophyly with partial Blattellinae (Symplocodes, Hemithyrsocera and Sigmella). This was not consistent with other recent studies [65,91,102], which revealed that Blaberidae was monophyletic.  The present study is the first to provide fossil calibrated molecular estimates of divergence time for the major lineages of Ectobiidae based on a wide variety of taxa, although the dates should be interpreted with caution due to the use of only a single mitochondrial marker. The divergence of Blaberidae and Ectobiidae was estimated to have occurred 142.3 Ma (125-167.4 Ma), largely consistent with previous estimate (Lo et al. [103]:~140-145 Ma; Djernaes et al. [91]:~185 Ma). The major subfamilies of Ectobiidae were found to have diverged betweeñ 125-110 Ma, and most morphospecies pairs were found to have diverged~10 or more Ma.

Conclusion
Our results show that GMYC methodology generates species hypotheses for cockroaches that are partly consistent with those based on traditional morphological techniques. However, it's tenuous to only take GMYC for granted as effectiveness of cockroach species delimitation, despite it performs well for other groups. The GMYC technique shows promise as a rapid, precise, independent identification approach for the discrimination of cockroach species of different life stages and color morphs to some extent. Moreover, as our study has revealed the combination of GMYC method with morphological data to delimit species successfully, the approaches we used may help to increase our understanding of cockroach biodiversity.