Complete mitochondrial genomes of two blattid cockroaches, Periplaneta australasiae and Neostylopyga rhombifolia, and phylogenetic relationships within the Blattaria

Complete mitochondrial genomes (mitogenomes) of two cockroach species, Periplaneta australasiae and Neostylopyga rhombifolia, 15,605 bp and 15,711 bp in length, respectively, were determined. As reported for other cockroach mitogenomes, the two mitogenomes possessed typical ancestral insect mitogenome gene composition and arrangement. Only several small intergenic spacers were found: one, which was common in all sequenced cockroach mitogenomes except for the genus Cryptocercus, was between tRNA-Ser (UCN) and ND1 and contained a 7bp highly conserved motif (WACTTAA). Three different types of short tandem repeats in the N. rhombifolia control region (CR) were observed. The homologous alignments of these tandem repeats with other six cockroach mitogenome CRs revealed a low similarity. Three conserved sequence blocks (CSB) were detected in both cockroach mitochondrial CRs. CSB1 was specific for blattinine mitogenomes and was highly conserved with 95% similarity, speculating that this block was a possible molecular synapomorphy for this subfamily. CSB3 located nearby downstream of CSB1 and has more variations within blattinine mitogenomes compared with CSB1. The CSB3 was capable of forming stable stem-loop structure with a small T-stretch in the loop portion. We assessed the influence of four datasets and two inference methods on topology within Orthopteroidea. All genes excluding the third codon positions of PCGs could generate more stable topology, and higher posterior probabilities than bootstrap values were presented at some branch nodes. The phylogenetic analysis with different datasets and analytical methods supported the monophyly of Dictyoptera and supported strongly the proposal that Isoptera should be classified as a family (Termitidae) of the Blattaria. Specifically, Shelfordella lateralis was inserted in the clade Periplaneta. Considering the K2P genetic distance, morphological characters, and the phylogenetic trees, we suggested that S. lateralis should be placed in the genus Periplaneta.


Introduction
Cockroaches are a various insect of some 4,600 species and 460 genera are described now [1]. A relatively small number, considering the large number of species, of cockroaches are known as pests [2]. These pest cockroaches cause health problems, such as asthma and allergies, as well as economic costs for controlling them [3]. The cockroach (Insecta: Blattaria) Periplaneta australasiae and Neostylopyga rhombifolia belong to the subfamily Blattinae family Blattidae. They both are abundant and widely distributed urban pests [4]. Within the Blattinae, the species of the genus Neostylopyga are unique with the tegmina and wings strongly reduced or absent [5]. The Australian cockroach (Periplaneta australasiae) is a tropical cockroach, which is similar to type species, Periplaneta americana, in morphology, except for the pattern on the pronotum. There are approximately 324 species and 24 genera of Blattinae [6], as most of them lack of molecular data, the phylogenetic relationships among them are still contrasty [7].
Insect mitogenome is typically 15-18 kb in size which encodes 13 protein-coding genes (PCGs) plus 22 transfer and 2 ribosomal RNA genes [17]. In addition, there are a variety of noncoding structural features of which the largest is known as the A+T-rich region, including some conserve structural elements [18]. Owing to maternal inheritance, a relatively rapid evolutionary rate, and lack of intermolecular recombinations, mitochondral DNA has been used widely in studies of population structures, molecular evolution, phylogeography, and phylogenetic relationships [9,[19][20][21]. Complete mitochondrial genomes are not only more informative than single or multi-genes, but also provide several genome-level characters, such as gene content and gene organization, genetic codon variation, tRNA and rRNA gene secondary structures, and pattern of controlling replication and transcription [22]. However, only 12 complete mitochondrial genomes are currently available for Blattaria. Considering the diversity of the Blattaria, which contains 4,600 described species, the existing full-length mitogenome sequence information for the Blattaria remains rather limited.
In this study, we sequenced and annotated the complete mitochondrial genomes of P. australasiae and N. rhombifolia, identified double control regions in both species, and compared various motifs to the other available blattarian mitogenomes. We reconstructed a phylogeny of Orthopteroidea to determine the relationships within Dictyoptera especially within Blattaria at the family level by using these two new mitogenomes in addition to the previously published mitogenomes of Orthopteroidea. We also assessed the influence of data types, phylogeny inference methods, exclusion and inclusion third codon positions of PCGs on topology and nodal support within Dictyoptera. In order to avoid the taxonomical confusions, we essentially followed the taxonomy system for the cockroaches, proposed by Louis [6] including six recognized families: Blattidae, Polyphagidae, Cryptocercidae, Blattellidae, Nocticolidae, and Blaberidae.

Sample and DNA extraction
Cockroaches (Insecta: Blattaria) Periplaneta australasiae and Neostylopyga rhombifolia are all abundant and widely distributed urban pests [2]. These two cockroaches closely associated with human food, storage, harborage, and conditions provided by humans. They even cause health problem, such as asthma and allergies [2]. People always try to catch or kill these cockroaches for controlling their number in house. In this study, Periplaneta australasiae and Neostylopyga rhombifolia were collected respectively in Dongguan in Guangdong Province, and in Yulin in Guangxi Province on February 2016. Both specimens were collected in volunteers' homes. We thanked both volunteers, Shilin He and Wujiao Li, in the Acknowledgments section. No specific permissions were required for these locations and this study did not involve endangered or protected species.
The fresh materials were preserved in 100% ethanol and stored in a -20˚C refrigerator. Whole-genomic DNA was extracted from muscle tissue with the TIANamp Genomic DNA kit (TIANGEN, Beijing, China).

PCR amplification and sequencing
The research follows Simon et al amplification and sequencing methods [23]. The primers were designed from aligned conserved nucleotide sequences of Periplaneta americana [10] and Periplaneta fuliginosa [24]. Then, based on the obtained sequences, species-specific primers were designed using Primer Premier 5.0 to amplify the overlapping fragments. Primer sequences and locations for each PCR are listed in Table 1. Primers Nr1F and Pa1F were from Du et al [25]. Primers Nr9F, Nr9R, Nr10F, and Pa10F were from Xiao et al [10]. Within each PCR product, the full doublestranded sequence was determined by primer walking (PTC-100 thermal cycler, BioRad, Hercules, CA). The PCR was performed using Vazyme Taq enzyme with the following cycling conditions: an initial denaturation for 5 min at 94˚C, followed by 35 cycles of denaturation for 30s at 94˚C, annealing for 30 s at 51-62˚C (depending on primer combinations), elongation for 1-3 min (depending on putative length of the fragments) at 72˚C, and a final extension step of 72˚C for 10 min. The PCR products were assessed by electrophoresis in a 1.5% agarose gel and were stained by double-stranded DNA binding fluorescent dye (GoldView stain). The PCR products were purified using the DNA agarose gel extraction kit (OMEGA, China) and then sequenced from both directions on an ABI PRISM 3730 DNA sequencer by Tsingke Biotechnology Company (Chengdu, China). and trnR of N. rhombifolia were routinely not found by tRNAScan-SE; they were identified by eye, through reference to secondary structure models for those genes from other blattarian insects. The secondary structures of tRNA genes were drawn using Adobe Illustrator CS6. The 13 protein-coding regions between tRNA were identified by ORF Finder implemented by NCBI website (http://www.ncbi.nlm.nih.gov/projects/gorf/) with invertebrate mitochondrial genetic codes. The rRNA gene boundaries were interpreted as the end of a bounding tRNA gene, and comparison of sequences with homologous regions of known blattarian mitogenomes was done using MEGA 5.0 [27]. The A+T content of nucleotide sequences, genetic distances, and relative synonymous codon usage (RSCU) were calculated using MEGA 5.0. The AT skewness was calculated according to the following formula: AT skew = [A-T] / [A+T] [28]. Secondary structures of repeat regions within the mitochondrial control region were inferred from the mfold web server [29] (http://mfold.rna.albany.edu/?q=mfold/DNA-Folding-Form). Tandem Repeat Finder v4.07 was used to confirm tandem repeats in A+T-rich regions [30].

Phylogenetic inference
We used mtDNA sequences of 57 species taken from GenBank plus those of two additional species newly sequenced for this study. The list included 14 cockroaches, 17 termites, 8 mantids, 9 grasshoppers, 8 stick insects and a mantophasmatid. Mitochondrial genomes of two dragonflies (Odonata), Brachythemis contaminata and Hydrobasileus croceus, were used as outgroups (S1 Table). We set up four datasets with different gene content: nucleotide data of all genes (protein-coding, ribosomal RNA, and transfer RNA genes) (ALL-123), nucleotide data of all genes excluding the third codon sites of the protein-coding genes (ALL-12), nucleotide data of protein-coding genes (PCG-123), nucleotide data of protein-coding genes excluding third codon sites (PCG-12). Protein-coding genes, ribosomal RNA, and transfer RNA genes of these 57 species were extracted according to GenBank annotations using GenScalpel [31]. PCGs were aligned as DNA codons in MEGA 5.0, the unaligned and unmatched regions were selected and then back-translated into nucleotides and then deleted. The third codon positions of the 13 PCGs were excluded using DAMBE 6.4.42 [32]. Nucleotide sequences of RNA genes from the mitogenomes of the 59 species were aligned with MEGA 5.0, the unaligned and unmatched regions were removed, and then the concatenated nucleotide sequences were combined to the end of the aligned nucleotide of 13 PCGs respectively. In order to reconstruct the phylogenetic relationships within Orthopteroidea, maximum likelihood (ML) and Bayesian inference (BI) were used to determine the effect of analytical method on topology and nodal support. The program Modeltest ver. 3.7 [33] was used with Akaike Information Criterion (AIC) [34] to calculate the substitution model selection. The GTR+I+G model was chosen as the best-fitting model for BI analysis. Then Bayesian inference (BI) analyses of nucleotides were performed with MrBayes 3.2.2 [35]. Four chains (one cold chain and three hot chains) ran in parallel for 10,000,000 generations, sampling every 100 generations and burn-in of 2500 trees. For maximum likelihood (ML) of nucleotide datasets, we implemented the GTR matrix in the PHYML online web server (http://www.atgc-montpellier. fr/phyml/) [36] with 1000 bootstrap replications. The phylogenetic trees were visualized by FigTree v1.4.0 [37] program with adjustable settings.

Neighbor-joining analysis
To explore the phylogeny between Periplaneta species and Shelfordella lateralis, all of the haplotypes of the Periplaneta species and Shelfordella lateralis COI barcodes were taken from NCBI ( Table 2). Sequences were trimmed to a final length of 598 bp. Pairwise nucleotide sequence divergences were estimated between all of the haplotypes of Periplaneta (five species) and Shelfordella lateralis COI barcodes sequences using the Kimura 2-parameter (K2P) model [38] implemented in MEGA 5.0 [27].

Genome content and organization
The mitochondrial genomes of Periplaneta australasiae and Neostylopyga rhombifolia were typical circular molecules, 15,605bp and 15,711bp in length, respectively, and were submitted to GenBank under the accession numbers KX640825 and KX640826. The sizes of the P. australasiae and N. rhombifolia mitogenomes were within the range of other blattarian mitogenomes, with the lengths ranging from 14,996 bp (NC_006076.1) to 17,340 bp (NC_029224.1) (S1 Table). The gene order and orientation of P. australasiae and N. rhombifolia mitogenomes were identical to those of other reported blattarian cockroach species (Fig 1 and Table 3) and had the ancestral insect gene composition and arrangement [23]. Additionally, as other dictyopteran insect mitogenomes, the nucleotide composition of the P. australasiae and N. rhombifolia mitogenomes had a high A+T bias of 74.9% and showed a skew of A's (S1 Table).
The relative synonymous codon usage (RSCU) value of P. australasiae and N. rhombifolia mitogenomes was summarized in S2 Table. All initiation and termination codons were included: the UUA (L) codon was used most frequently, followed by CGA (R) and ACA (T) in P. australasiae, and CGA(R) and GUA (V) in N. rhombifolia. The codon usage preference of A +T-rich over synonymous codon families, which played a major role in the A+T bias of the entire mitogenome. All codons ending with A or T outnumber those ending with C or G, Table 2. K2P genetic distances between all of the haplotypes of Periplaneta (five species) and Shelfordella lateralis COI barcodes sequences.

Protein-coding genes
A summary of the mitogenomes of P. australasiae and N. rhombifolia was given in Table 3. As ancestral insect mitogenomes, four PCGs (ND5, ND4, ND4L, and ND1) were coded on the minority strand (N-strand) and the remaining nine of the 13 PCGs were coded on the majority strand (J-strand). For P. australasiae, nine of 13 PCGs had ATG as the start codon, while COX1 utilized TTG, ATP8, ND6 and ND3 translated from ATT. For N. rhombifolia, ATG also was the most common start codon and occurred in the other eight PCGs except for COX1 (TTG), ATP8, ND6 and ND3 (ATT), and ND5 (ATA). COX1 uses TTG as the start codon which is an accepted conventional start codon for blattarian mitogenomes [10,13,39]. As for the termination codons, TAA and TAG were commonly used except COX2, ATP6, ND3, ND6, and CytB in P. australasiae, and COX2, ND6, and CytB in N. rhombifolia. For P. australasiae, COX2, ND3, and CytB stopped with T-, and ATP6 and ATP8 ending with TA-. For N. rhombifolia, COX2 and CytB stopped with incomplete T-, and ND6 used TA-nucleotides as incomplete stop codon.

tRNA genes
The secondary structures of each tRNA are shown in S1 and S2 Figs. The typical set of 22 tRNA genes ranged from 64 to 71 bp in P. australasiae and from 64 to 72 bp in N. rhombifolia, which were conserved among insects [40]. Among these 22 tRNA genes of P. australasiae mitogenome, all can be detected by tRNAScan-SE with the exception of tRNA-Ser (AGN) due to the absence of DHU arm. In N. rhombifolia, besides tRNA-Ser (AGN), tRNA-Arg also can't be spotted by tRNAScan-SE, in which the T-arm contained six paired nucleotides. These transfer RNAs were determined through comparison with previously published blattarian mitogenomes [10,13,41]. The secondary structures of most tRNA genes of the two mitogenomes were conversed except for tRNA-Lys, which only had four paired nucleotides in the anticodon arm. Findings showed twenty-nine mismatches of base pairs in P. australasiae tRNA genes, with twenty-four noncanonical matches of G-U base pairs. The other three U-U, one A-C, and one U-C base-pairings showed as mismatches in the stems of five different tRNAs (tRNA-Met, tRNA-Leu (CUN), tRNA-Ser (AGN), tRNA-Val, tRNA-Trp). In N. rhombifolia, there were five more unmatched base pairs in the tRNA genes than in P. australasiae. Twenty-eight of the mismatches in N. rhombifolia were G-U pairs; the remaining six mismatches were as follows: two U-U mismatches in tRNA-Leu (CUN), one U-U and one A-A mismatch in tRNA-Ser (AGN), one A-C mismatch in tRNA-Met and one U-C mismatch in tRNA-Trp.

Non-coding regions
Generally, the insect mitogenomes display the evolutionary economic perspective, but large intergenic spacers are existing in some insects [42]. Nevertheless, the complete genomes of P. australasiae and N. rhombifolia only contained a few short non-coding fragments and no long intergenetic spacers were found. The two longest intergenic spacers regions of more than 10 bp for P. australasiae were between tRNA-Leu (UUR) and COX2 (16bp) and between tRNA-Ser (UCN) and ND1 (25bp). N. rhombifolia had two relatively large intergenic spacers: one 18bp long was located between tRNA-Cys and tRNA-Tyr and another 17bp was at the same position as for the P. australasiae mitogenome (between tRNA-Ser (UCN) and ND1). We aligned all blattarian mitogenomes reported and found that, except for the genus Cryptocercus, the intergenic spacer between tRNA-Ser (UCN) and ND1 appeared in all sequenced blattarian mitogenomes and ranged from 15 bp in Eupolyphaga sinensis (NC_014274.1) to 58 bp in Gromphadorhina portentosa (NC_030001.1). These intergenic spacer sequences showed a 7bp highly conserved motif (WACTTAA) (Fig 2), which can be deemed to be the binding site of the transcription termination [43]. The control region (CR) was thought to have played an important role in regulating the mtDNA transcription and replication [44][45]. The lengths of the CR of P. australasiae and N. rhombifolia were 779bp and 903bp, with AT contents of 81.6% and 80.0%, respectively. Three different types of short tandem repeats in N. rhombifolia CR were observed. The repeats were located closely to tRNA-Met which consisted of two full A type units, two full B type units, two full C type units, and partial C unit (Fig 3A). In insect mitochondrial control regions, low levels of primary sequence similarity across taxa have led to the suggestion of conserved structural elements [18]. Zhang & Hewitt [18] summarized that the structural elements among control regions were as follows: a long sequence of thymines, a [TA(A)] n stretch between the poly T stretch, a highly conserved secondary structure, and conservative structure flanking the stemloop structures. Three conserved sequence blocks were also identified in the P. australasiae and N. rhombifolia CRs: blocks 1, 2 and 3 (Fig 3). It is worth noting that these conserved blocks are spread through the whole A+T-rich region.
Conserved sequence block 1 (CSB1) was located closely downstream of the tRNA-Ile gene. This block was highly conserved within the Blattinae with only one nucleotide variation, or 95% similarity (Fig 3B), and it has not been found in other insect mitogenomes. Another conserved sequence block (CSB2) was identified by aligning with the [TA(A)] n sequence described by Zhang et al [46]. The motif has the similar core sequence 5'-A. . .TAATTTA. . .TT. . .ATA. . .ACATTT-3' which resembles the template stop signals for Dloop syntheses in human and mouse mtDNA [47]. The CSB3 was located nearby the downstream of CSB1 and has more variations ( Fig 3D) compared with CSB1, which was a major stem and loop (or hairpin) found in the A+T-rich region (Fig 4). The stem ranged in size from 30 paired bases in Eupolyphaga sinensis to 77 paired bases in Blattella bisignata and the loop from 11 bp in size in Eupolyphaga sinensis to 15 bp in P. australasiae and N. rhombifolia. In addition, small T-stretches were observed in the loop portion of hairpin structures (Fig 4).

Phylogenetic analyses
The phylogenetic relationships were analyzed based on four datasets using both ML and BI methods. The results of analysis of the ALL-12 dataset are presented in Figs 5 and 6, and the results of other analyses are presented in S3-S8 Figs. The topology and nodal support are sensitive to different datasets and inference methods. The major effect of the optimality criteria was seen at the interordinal level. In ML analyses (except ALL-12) (S4, S6 and S8 Figs), Orthoptera was paraphyletic, with Ensifera being sister to Phasmatodea. Besides, for the PCG-12 dataset, Mantophasmatodea had a sister relationship with Dictyoptera in ML analysis, whereas Mantophasmatodea clusters with Phasmatodea when Bayesian inference were used in analysis (S7 and S8 Figs). With regard to the nodal support, the posterior probabilities in BI analyses were generally higher than bootstrap values in ML analyses in some branch nodes. Different mitogenome data did not appear to affect support values much, but they did slightly affect topology at the interfamily level. When 13 protein-coding genes were analyzed as a single partition (PCG-123), Blattellidae + Bleberidae was clustered with Blattidae, but which was    (Figs 5 and 6).
The saturation analyses on the first, second and third codon positions of the 13 PCGs were showed in Fig 7. Saturation plots indicated substantial substitution saturation in the third codon positions of all PCGs. So, exclusion of the third codon position from protein-coding genes had a considerable improvement in phylogenetic reconstruction. As shown in all Figs, the analyses based on the PCG-123 and ALL-123 performed poorly compared to the PCG-12 and ALL-12, resulting in unique topologies from BI and ML methods. For ALL-123 dataset, the monophyly of Orthoptera is never recovered in ML or BI analyses when compared to ALL-12 (Figs 5 and 6; S3 and S4 Figs).

Discussion
The newly determined mitogenomes in present study were similar in gene arrangement (Fig  1), nucleotide composition (S1 Table), and pattern of codon usage (S2 Table) when compared to the other available blattarian mitogenomes as well as to the presumed ancestral hexapod [30]. It suggested the conservation of mitochondrial genome evolution within the Blattaria. In contrast, some other dictyopteran insect lineages (termites and mantids) showed a number of variations in gene order or nucleotide composition [48][49][50]. Lineage-specific purifying selective forces, life history characteristics, or demographic histories may help us to understand the relatively slow rate of evolution in nuclear genes and conserved mitogenome evolution in cockroaches [51]. However, this analysis is preliminary due to the lack of mitochondrial genomes in other major cockroach lineages.
The A+T-rich region known as the control region for insect mitogenome is the largest noncoding region in all blattarian mitogenomes. Because of the various motifs and copies of tandem repeats, the control region exhibits a higher level of sequence and length modifiability than other regions [46]. Among these 14 sequenced blattarian mitogenomes, the lengths of the control regions showed distinctive differences, which ranged from 208bp in Periplaneta fuliginosa [24] to 3967bp in Opisthoplatia orientalis [52] (S1 Table). These large length differences mainly result from the absence or presence of tandem repeats and diverse motifs in their control regions. The A+T-rich regions which had comparatively longer sequences contained more and longer tandem repeats (Fig 8). Repetitive sequences of control regions have been used for phylogeographic or population genetics studies. In Isoptera [9], the presence/absence of different repeats could be a marker to resolve the early branching patterns within the Termitidae. Mancini et al [53] reported that the variable number of tandem repeat units were useful for reasoning the genetic structures of populations among closely related taxa. In present study, seven blattarian mitogenome control regions contain tandem repeats, and these tandem repeats appear in dispersed phylogeny positions. Additionally, the homologous alignment among these repetitive sequences of seven blattarian mitogenome control regions revealed a low similarity. None of these repeat units among these seven blattarian mitogenome control regions were sequence homologous and included any conserved sequence (S3 Table). The high sequence diversity between the tandem repeat regions implies that they may have different origin. Besides, if detailed nucleotide divergence of repeat units in more blattarian insect mitogenomes were obtained, these repeat sequences would be probably used for phylogenetic inference and species identification.
Three conserved sequence blocks were identified in the A+T-rich region of P. australasiae and N. rhombifolia (Fig 3). The characteristic of CSB1 within the Blattinae did not correspond to the structures previously described in Orthopera and Diptera by Zhang et al [46] because no polythymidine stretch was present. Since our findings showed that the CSB1 occurred in all Blattinae mitogenomes reported and was highly conserved within the Blattinae (95% similarity) (Fig 3B), CSB1 might be a molecular marker to distinguish the Blattinae from other subfamilies. Considering the limited samples, we cannot immediately confirm this block is of functional importance or of identified value for all blattinine species. Separate from CSB1, both CSB2 and CSB3 were common among other dictyopteran insects (Isoptera [9] and Mantodea [50]) as well as among other insect orders such as Lepidoptera [54] and Plecoptera [55]. CSB3 is capable of forming stable stem-loop structure with T-stretch in the loop portion (Fig  4). Since replication has been shown to initiate within or close to stem-loop structures, they may play an important role in regulatory functions during replication as well as transcription of mtDNA [56]. CSB3 also was successfully used as a marker in phylogeny studies. In Cameron's research [9], they detected stem-loop structures could be molecular synapomorphies within termites. Nevertheless, the presence or absence of stem-loop structures identified in our study is not consistent across clades in our phylogenetic trees. Two aspects should be considered to explain this difference. One involves high nucleotide substitution rate and length mutation rate, which cause highly polymorphic structures in control regions [18]. Another involves sampling number in studies, only some typical species are included in Cameron's research [9], and additional genera need to be tested to determine the evolution of this feature.
Previous studies have shown insect mitogenomes were the source of sequence data for phylogenetic analysis [17]. Besides, the effectiveness of different analytical approaches was extensively tested [9,20,57]. In the present study, the phylogenetic relationships among cockroach families are sensitive to variations in phylogenetic inference methods and different datasets. However, when all genes excluding the third codon positions of PCGs analyzed simultaneously (ALL-12), there was no apparent effect on topology between the optimality criteria analyses. The ALL-12 dataset always recovers the monophyly of Orthoptera, and supports a phylogeny of (((Blattellidae + Bleberidae) + (Polyphagidae + (Blattidae + (Cryptocercidae + Isoptera)))) with high nodal supports (Figs 5 and 6). In fact, when the smaller subsets of data (PCG) are analyzed under different optimality criteria, the effect is more evident in that different topologies and support values were recovered. It indicates rRNA and tRNA genes provide considerable phylogenetic signal, which stabilized the topology structure of phylogenetic tree. Besides, our results showed that the third codon positions of all PCGs were substantial substitution saturation. Compared topologies generated by other datasets, all genes excluding the third codon positions of PCGs could provide better phylogenetic topologies and these topologies were approximately identical to recent studies of Blattaria based on molecular and morphological characters [8,15]. Former study also found that the inclusion of the third codon positions has a negative effect on phylogenetic reconstructions [58]. Therefore, it's significant to assess the effect on topology of inclusion vs. exclusion of third codons by repetitive analyses in each study.
The phylogenetic analysis with different datasets and inference methods showed some similar topologies among major lineages within the Dictyoptera, and they results strongly supported the monophyly of Blattidae, Blaberidae, Blattellidae, Polyphagidae, as well as the paraphyly of Cryptocercidae + Isoptera (Figs 5 and 6; S3-S8 Figs). Within Dictyoptera, Mantodea was the basal branch in all trees, which has been demonstrated in two studies based on molecular [6] and morphology [12]. However, Lo et al [59] found that Nocticola spp. was a sister group of Mantodea with low support value (<50) when Nocticolidae (Blattaria) was added into the dictyopteran phylogenetic analyses (based on mitochondrial COX2, nuclear 18S, and Histone 3 genes). In addition, our phylogenetic analysis showed a strong support for a sister group relationship between termites and Cryptocercus species. Although previous researches placed termites outside the cockroaches [60] or even used termites as out-groups [61], resent studies indicated that Isopetera is deeply nested within Blattaria as the sister group of Cryptocercidae based on morphological [12] and molecular data [7]. Our study strongly supported the proposal that Isoptera should be classified as a family (Termitidae), or small set of termite families, within Blattodea, as it was first put forward definitively by Inward et al [62].
The relationships among families and genera of cockroaches were still ambiguous. The placement of Polyphagidae was variable among different datasets and analytical methods. Previous studies also provided different perspective on the position of Polyphagidae. Cheng et al [13] supported that Polyphagidae as the the basal group of Blattodea based on mitochondrial PCGs in NJ and MP analyses. Pellens et al [63] placed Polyphagidae as sister to Cryptocercidae + Isoptera + Blattidae based on a combined data set of 12S, 16S, 18S, and COX2. When Nocticolidae was considered into phylogenetic analyses, the Polyphagidae + Nocticolidae were placed as a sister group to Cryptocercidae + Isoptera (based on five gene loci: COX1, COX2, 16S rRNA, 18S rRNA, and 28S rRNA) [7] or to remaining Blattodea (based on combined dataset of 12S, COX2, 28S, 18S, and histone 3) [62]. These conflicting results about the position of Polyphagidae might be caused by different molecular markers and approaches used in the phylogenetic analyses. Because only one species from the family Polyphagidae was included in the analyses, we could not form a conclusive status for Polyphagidae. The lack of Polyphagidae mitogenomes may lead to some deviations among the Blattaria, so further studies are needed with more diverse species included.
The clade (Blaberidae + Blattellidae) has been called as Blaberoidea, which was supported by many studies [13,15,[59][60][61]. Previous studies on the position of Blaberoidea had a variety of conclusions. Most studies sustained Blaberoidea as sister to remaining Blattodea, such as Djernaes et al (based on 5 gene loci) [7], Legendre et al (based on four mitochondrial and two nuclear markers) [15], and Djernaes et al (used both molecular (12S, 16S, COII, 18S, 28S, H3) and morphological characters) [8], but several consistently yielded this clade as sister to Blattidae using mitochondrial COX2 [64], mitochondrial rRNA genes (12S+16S) [60], and 13 mitochondrial PCG genes [13]. Little suggested that Blaberoidea was sister clade to Polyphagidae based on mitocondrial and nuclear genes [14,65]. It is difficult to assess which phylogenetic scheme is more realistic, but our analysis is more in consistent with the most studies that Blaberoidea as basal clade of Blattodea [63,66]. Considering these differing research results, the position of Blaberoidea within Blattaria still remains inconclusive and more complete mitogenomes recruited would have a higher probability to resolve the intractable phylogenetic relationship [67][68].
The family Blattidae in this study included one subfamily Blattinae, three genera. An interesting point to consider was that Shelfordella lateralis (Shelfordella Adelung, 1910) was inserted in the clade Periplaneta (Burmeister 1838), and sister to Periplaneta americana in all trees with high support values. This clustering result in present study was in accordance with several previous studies [11,13,15,61,69], indicating Shelfordella lateralis had close affinity with Periplaneta americana. Inter-generic variation exceeds intra-generic variation to such an extent that a "barcoding gap" exists can be a good way to delimit genera [70]. Levels of genetic divergence in the COX1 dataset (five Periplaneta species and one Shelfordella lateralis) were estimated by calculating K2P genetic distances ( Table 2). The average interspecific genetic distance within the genus Periplaneta was 0.13 (0.076 to 0.170), and the average inter-generic divergence between Periplaneta species and Shelfordella lateralis was 0.13 (0.126 to 0.163). In Maekawa's result [11], the K2P distances of COX2 between Periplaneta species and Shelfordella lateralis (0.119~0.140) were also within intra-generic distance of Periplaneta (0.063~0.140). No barcoding gap was detected in either analysis. In addition, female and male genitalia are excellent genetic and specific characters used in categories [6]. All male Periplaneta and Shelfordella lateralis have symmetrical paraprocts and hypandrium without any armament, and they possess a pair of elongate and fusiform stylis as well as two similarly shaped phallomeres. Historically, this species was originally described as Periplaneta lateralis, and the classification of Shelfordella questionable [71]. Phylogenetic relationship, genetic distance, and morphological characters suggest that this species should be positioned in the genus Periplaneta rather than Shelfordella as presently recognized.