The Complete Mitochondrial Genomes of Three Bristletails (Insecta: Archaeognatha): The Paraphyly of Machilidae and Insights into Archaeognathan Phylogeny

The order Archaeognatha was an ancient group of Hexapoda and was considered as the most primitive of living insects. Two extant families (Meinertellidae and Machilidae) consisted of approximately 500 species. This study determined 3 complete mitochondrial genomes and 2 nearly complete mitochondrial genome sequences of the bristletail. The size of the 5 mitochondrial genome sequences of bristletail were relatively modest, containing 13 protein-coding genes (PCGs), 2 ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes and one control region. The gene orders were identical to that of Drosophila yakuba and most bristletail species suggesting a conserved genome evolution within the Archaeognatha. In order to estimate archaeognathan evolutionary relationships, phylogenetic analyses were conducted using concatenated nucleotide sequences of 13 protein-coding genes, with four different computational algorithms (NJ, MP, ML and BI). Based on the results, the monophyly of the family Machilidae was challenged by both datasets (W12 and G12 datasets). The relationships among archaeognathan subfamilies seemed to be tangled and the subfamily Machilinae was also believed to be a paraphyletic group in our study.

common small insects living under rocks or dead leaves and were not included in the ''List of Protected Animals in China".

DNA extraction, PCR amplification and sequencing
Total genomic DNA from each individual was extracted using the QIAGEN DNeasy Blood& Tissue Kit following the manufacturer's protocol. DNA samples were stored at −20°C until further use. Based on alignments and comparisons of complete mitochondrial sequences of other bristletails and our previous researches [24], ten pairs of universal primers were designed for amplification of the mitochondrial genomes (S1 Table). All PCRs were performed using a Bio-RAD MJ Mini Personal Thermal Cycler. Based on the sequence information got from earlier PCR by universal primers, several pairs of specific primers were designed to obtain the whole mitochondrial genome. PCR reactions were carried out in a 50 μL reaction volume consisting of 32.75 μL sterile deionized water, 5.0 μL 10×PCR Buffer (Mg 2+ Free), 5.0 μL MgCl 2 (25 mM), 4.0 μL dNTP Mixture (2.5 mM each), 1.0 μL DNA template, 1.0 μL each primer (10 ppm), 0.25 μL Takara Taq DNA polymerase (5 U/μL) with the following conditions: after an initial denaturation at 94°C for 4 min, then 94°C for 40 s (denaturation), 46-62°C for 50 s (annealing), 72°C for 1-5 min (extension) for 35 cyclers, followed by 72°C for 10 min (final extension). Each amplicon (5μL) was examined by agarose gel electrophoresis to validate amplification efficiency. After being purified by Axygen agarose-out kit, all the amplicons were sent to Sangon Biotech Company (Shanghai, China) for sequencing from both directions by using the primers used in the PCR amplification.

Sequence analysis and annotation
All of the typical protein-coding genes were identified by BLAST searches of NCBI database. Sequences were checked and assembled using the programs Seqman (DNASTAR, Inc.) and then adjusted manually. PCGs and rRNA genes were identified based on sequences similarity with published bristletail mitochondrial genome sequences from public database (e.g., Gen-Bank). The base composition, codon usage, and open-reading frames were analyzed using program Mega 5.0 [25]. Translation initiation and translation termination codons were identified based on comparison with the mitochondrial genomes of other known bristletails. The overlapping regions and intergenic spacers between genes were counted manually. The tRNA genes were identified by their cloverleaf secondary structure using tRNAscan-SE Search Server v.1.21 [26] with default settings. The tRNA genes which could not be determined by tRNAscan-SE were determined by sequence similarity to tRNAs of other bristletails. Sequence data had been deposited in the NCBI database, and the accession numbers of Petrobiellus bannaensis, Allopsontus halawuensis, Allopsontus helanensis, Pedetontinus luanchuanensis and Petrobiellus puerensis were KJ754503, KJ754500, KJ754501, KJ754502 and KJ754504, respectively.

Phylogenetic analyses of Archaeognatha and basal Hexapoda
To elucidate the phylogenetic relationships among basal Hexapoda, 35 sequences of the complete mitochondrial genomes were obtained from GenBank database (Table 1). Among the mitochondrial genomes we selected, there were 5 from the earlier published bristletails [2,24,[27][28][29], 3 from another primitive wingless order Zygentoma [30][31][32], 8 from the Paleoptera order: Ephemeroptera and Odonata [33][34][35][36], 4 from the Polyneoptera order: Blattaria [37], Isoptera [38], Mantodea [39] and Orthoptera [40]. To optimize the outgroup selection, we selected 14 entognathan species and 2 Crustacean species as outgroups to construct phylogenetic trees, respectively [28,30,[41][42][43][44][45][46]. The nucleotide and putative amino acid regions for each of the 13 mitochondrial protein-coding genes were aligned using Clustal W in the program Mega 5.0 [25]. All the alignments were analyzed with the program Gblock 0.91b using default settings, to select conserved regions of the putative amino acids [47]. A saturation analysis was performed on the first, second and third codon positions of PCGs using DAMBE 4.2.13 [48]. According to the results from saturation analysis, third codon positions were satured, so they were excluded from the final nucleotide alignment. Phylogenetic analyses were conducted using four methods: neighbor-Joining (NJ), maximum parsimony (MP), maximum likelihood (ML) and Bayesian (BI). GTR+I+G model was chosen for the likelihood and Bayesian analyses. The Maximum likelihood analysis (ML) of the nucleotide alignment was performed using PAUPÃ 4.0b10 with 100 bootstrap replicates [49]. Maximum parsimony analysis (MP) and Neighbor-Joining (NJ) were performed using PAUPÃ 4.0b10. 1000 bootstrap replicates were generated for the MP analysis with 10 replicates with random taxon order. BI analysis was performed using MrBayes, ver.3.1.2 [50]. For Bayesian Inference, 4 independent Markov chains were simultaneously running for 1000000 generations. Trees were sampled every 100 generations and the first 20% of the generations were discarded as burn-in and the remaining trees were used to calculate Bayesian posterior probabilities.
We selected Onychiurus orientalis from Collembola and Campodea lubbocki from Diplura as outgroups to construct the phylogeny of the order Archaeognatha, and for the first time, tried to elucidate the relationships among subfamilies of Archaeognatha using mitochondrial genomes. W12 dataset was constructed based on the nucleotide sequences of all the 13 proteincoding genes, which included all the indels after being aligned; G12 dataset was constructed using the program Gblock 0.91b, this dataset possessed conserved regions of all the 13 proteincoding genes, so all the poorly aligned positions of nucleotide alignment were removed. Then, W12 and G12 datasets were utilized to constructed phylogenetic trees with NJ, MP, ML and BI algorithms, respectively.

Genome organization
The complete mitochondrial genomes of P. bannaensis, A. halawuensis, and A. helanensis were obtained with the size of 15843 bp, 15532 bp, and 15538 bp, respectively ( Table 2). But unfortunately, we were not able to successfully amplify the control region of P. luanchuanensis and P. puerensis with the length of amplified sequences of 15196 bp and 14562 bp, respectively. The length of the five mitochondrial genomes were all within the range of typical size for metazoan mt genomes, which was considered to be 14-18 kb [16]. All the 5 mitochondrial genomes contained 13 protein-coding genes (CO1-3, ND1-6, ND4L, atp6, atp8 and cob), a small subunit ribosomal RNA gene (rrnS), a large subunit ribosomal RNA gene (rrnL), and 22 transfer RNA genes, and 23 of which were transcribed on the majority J-strand and the others mapped to the minority N-strand. The gene orders of the five mitochondrial genomes were congruent with Drosophila yakuba, which was regarded as the ancestral metazoan mitochondrial genome [51].
In the mitochondrial genome of A. halawuensis, there was a total of 43 bp gene overlap, which was the highest in the 5 mt genomes, and the lowest was A. helanensis with a total of 20 bp overlap. The 43 bp gene overlap in 9 positions of A. halawuensis mt genome was ranging in size from 1 to 16 bp and the longest is a 16 bp overlap between tRNA Pro and ND6, which was also the longest overlap between individual genes of the 5 mt genomes. Compared to previously published 5 bristletail mt genomes, the gene overlap occurred between atp8 and atp6 seemed to be common across bristletail mt genomes and this 7 nucleotides "ATGATAA" overlap also existed in other insects [52][53].
In addition to the large non-coding region, several small non-coding intergenic spacers were also found in the 5 mitochondrial genomes. The non-coding regions of mtDNA of P. bannaensis is 187 bp in total, which is the highest in the 5 mt genomes, and made up of 19 intergenic spacer sequences, ranging in size from 1 bp to 34 bp. There were two relatively large spacers flanked by the protein-coding genes CO1 and ND1 in the P. bannaensis mt genome. Interestingly, the intergenic spacers flanked by ND1 gene seemed to be a common feature across bristletail mitochondrial genomes. The total nucleotides of intergenic spacers between tRNA Ser2 , ND1 and tRNA Leu1 in P. bannaensis, A. halawuensis, A. helanensis and P. puerensis were 38 bp, 19 bp, 41 bp and 36 bp, respectively. But notably, such feature was not found in the P. luanchuanensis mt genome, a 8 bp overlap existed between tRNA Ser2 and ND1 gene.

Protein-coding genes
13 protein-coding genes were identified in the 5 mitochondrial genomes of P. bannaensis, A. halawuensis, A. helanensis, P. luanchuanensis and P. puerensis, with characteristics similar to previously published 5 bristletail mitochondrial genomes. Among the 13 protein-coding genes, 4 genes located on the minority N-strand, and the others were transcribed on the majority Jstrand. The commonly used protein-coding gene start codons were ATN, GTG [54] and TTG [55], but there was also reported no typical ATN start codon for CO1 gene in mitochondrial genomes of some other species, such as CCG [56], CGA [57], ACC [36], ACG [58], TTA [59] and even a tetranucleotide codon, ATTA, had been proposed in the mitogenome of Locusta migratoria [60]. All the 13 protein-coding genes of the 10 bristletail mitochondrial genomes (5 from our work and 5 from NCBI database) started with typical ATN and GTG (Table 2). In the mitochondrial genome of P. bannaensis, 6 genes shared ATG initiation codon, 3 shared ATT and ATA respectively, and only 1 gene started with ATC. With the similar characteristics in P. bannaensis mitochondrial genome, most protein-coding genes in bristletail mt genomes started with ATG and ATT. While interestingly, 7 protein-coding genes in N. australica (from family Meinertellidae) mt genome had a ATA start codon, which was quite different with the other 9 species.
In the P. bannaensis and P. puerensis mitochondrial genomes, all the PCGs terminated with the conventional stop codons TAG or TAA. While in terms of the A. halawuensis mt genome, 4 genes (CO1, CO2, CO3 and ND5) had incomplete stop codon T adjacent to downstream tRNA genes. And the incomplete stop codon T was also found in A. helanensis (CO1, CO2 and ND5) and P. luanchuanensis (CO2, CO3 and ND5) mitochondrial genomes. The presence of incomplete stop codon was common in metazoan mitochondrial genomes, and these truncated stop codons was presumed to be complete by post-transcriptional polyadenylation [61].
The A+T content of protein-coding gens, excluding stop codons, was 67.3%, 66.6%, 66.9%, 72.2% and 69.6% in P. bannaensis, A. halawuensis, A. helanensis, P. luanchuanensis and P. puerensis, respectively. In the 5 mitochondrial genomes, the first, second and third codon positions was calculated separately and the third codon positions seemed to have the highest A+T content (75.7%, 75.5%, 75.8%, 86.9% and 81.3%, respectively), the strongest bias toward T was in the second codon positions of the 5 species (Table 3).

Transfer RNA, Ribosomal RNA genes and A+T rich region
The complete set of 22 tRNAs (one specific for each amino acid and 2 each for leucine and serine) found in typical metazoan mitochondrial genomes were identified in P. bannaensis, A. halawuensis, A. helanensis, P. luanchuanensis and P. puerensis. Among these tRNA genes, 14 tRNAs were encoded by the J-strand with the rest by the N-strand. All the tRNA genes could be folded into the typical cloverleaf structure using tRNAscan-SE 1.21 except for tRNA Ser (AGN), in which its dihydrouridine (DHU) stem simply formed a loop, as was often found in other insect mitochondrial genomes. The anticodons of the 5 species were identical to those of the respective tRNAs found in previously published bristletail mitochondrial genomes. Unmatched base pairs had been detected in stems of tRNA secondary structure. Among the 5 mitochondrial genomes, P. bannaensis possessed 42 unmatched base pairs, consisting of 35 G-U pairs, 2 U-U and 3 U-C mismatches, which was the highest compared to other 5 species. Except for G-U, U-U, U-C pairs and mismatches, the other types of mismatches like A-C and A-A had also been identified. tRNA gene cluster rearrangement occurred between protein-coding genes ND3 and ND5 had been found in other bristletails. In the mt genomes of T. alternatus and P. brevistylis, the tRNA gene orders between ND3 and ND5 were tRNA Arg -tRNA Asn -tRNA Ser1 -tRNA Glu -tRNA Ala -tRNA Phe and tRNA Ala -tRNA Arg -tRNA Asn -tRNA Ser1 -tRNA Glu -tRNA Tyr -tRNA Phe , respectively. But in our study, there was no tRNA gene rearrangement that had been detected, and the 5 bristletail species owned the typical insects tRNA gene order within ND3 and ND5, which was tRNA Ala -tRNA Arg -tRNA Asn -tRNA Ser1 -tRNA Glu -tRNA Phe .
The ends of ribosomal RNA genes were impossible to be precisely determined by DNA sequencing alone, so the ends of rRNA genes were assumed to be at the boundaries of flanking genes. As in the 5 mitochondrial genomes, the lrRNA gene was between tRNA Leu1 and tRNA-Val , while the srRNA was flanked by tRNA Val and A+T rich region and both genes were encoded on the minor strand. The length of srRNA in P. bannaensis mitochondrial genome was 848 bp, which was relatively higher than that of A. halawuensis (813 bp) and A. helanensis (806 bp) mt genomes. We could not determine the length of P. luanchuanensis and P. puerensis because of our inability to successfully amplify the srRNA gene and A+T rich region. The lengh of lrRNA gene was similar to other bristletail mt genomes.
The largest non-coding region of bristletail mitochondrial genome was the control region, which was located between srRNA and the tRNA Ile -tRNA Gln -tRNA Met gene cluster. Compared to previously published bristletail mitochondrial genomes, the length of the control region ranged from 538 bp (N. australica) to 1149 bp (T. alternatus), and the size of control region in this study were well within this range. The control region length variation was probably due to tandem repeat regions and differences in copy numbers.

Phylogeny of Archaeognatha and basal Hexapoda
The Archaeognatha consisted of only 2 extant families and approximately 500 species: Meinertellidae and Machilidae [3]. The Machilidae was considered to possess more primitive features (e.g. relating to ovipositor, penis, parameres, coxal vesicles and tarsomeres) than the Meinertellidae. Kaplin subdivided the family Machilidae into 3 subfamilies: Machilinae, Petrobiinae and Petrobiellinae [5]. And Sturm proposed the monophyly of the 3 subfamilies had to be proved by additional arguments [4]. We performed phylogenetic analysis with nucleotide sequences of 13 mitochondrial protein-coding genes from 10 archaeognathan species and 2 outgroup species (Onychiurus orientalis [30] and Campodea lubbocki [46]). Four phylogenetic algorithms (NJ, MP, ML and BI) yielded 2 datasets ( Fig. 1 and Fig. 2): W12 (13 PCG nucleotide sequences with poorly aligned positions of nucleotide alignment) and G12 (13 PCG nucleotide sequences analyzed by Gblock 0.91b). The 2 datasets showed the following features: firstly, the family Machilidae seemed to be a paraphyletic group, which was not supporting the taxonomic classification by analysis of morphological data. In the W12 dataset, all the analyses support this view, while in the G12 dataset, both NJ and BI analyses yielded the same results. Secondly, the monophyly of the subfamily Machilinae was also challenged by all analyses in this paper. The group consisting of (S. xinxiangensis + (A. halawuensis + A. helanensis)) was strongly supported as being monophyletic across all analyses, but another Machilinae species T. alternatus was not in this clade. The main morphological characters for the diagnosis of this subfamily by Sturm were plesiomorphies, so he proposed a subdivision or an additional diagnosis was required, and our results suggested the phylogenetic relationships among Machilinae was not congruent with morphological taxonomy. Thirdly, the Machilidae subfamily Petrobiellinae seemed to have a closer relationship with the other family Meinertellidae. P. puerensis and P. bannaensis were clustered with the Meinertellidae species N. australica in most analyses with relatively high nodal supports. This subfamily comprised only one genus Petrobiellus, and besides some plesiomorphies, this genus shared an apomorphic feature with the Petrobius group: the weak and indistinct separation of the distal mandibular teeth. Sturm suggested that this feature could indicate the Petrobiellus group and Petrobius were monophyletic or the parallel evolution had taken place [3]. But in our analyses, the genus Petrobiellus and Petrobius were not in the same clade, interestingly, the 2 Petrobiellus species formed a monophyletic clade with the Meinertellidae species N. australica in most analyses (all analyses in W12 dataset and NJ, BI analyses in G12 dataset). Our result may provide evidence for resolving phylogenetic relationships of Archaeognatha, although the currently limited taxon sampling showed the preliminary nature of this analysis. The unsampled groups should be included in further studies to provide a more comprehensive phylogenetic estimate.
To elucidate the phylogenetic relationships among basal Hexapoda, 35 sequences of the complete mitochondrial genomes were obtained to construct phylogenetic trees using 14 Entognathan species and 2 Crustacean species as outgroups, respectively. Based on the results, the monophyly of Archaeognatha was an undoubted hypothesis strongly supported by both morphological features and molecular data that was also demonstrated in our study based on mitochondrial genomes (Figs. 3, 4, 5 and 6). The topologies showed the similar characteristics with datasets W12 and G12, corroborating the paraphyly of Machilinae and other features demonstrated by those two datasets. The traditional taxon Thysanura was the combination of bristletails and Zygentoma species. However, it had been recognized as an unnatural group in recent years. The topologies showed that silverfish (Zygentoma) were more closely related to the winged insects (pterygota) than to bristletails. Most of our analyses yielded a topology of (Ephemeroptera + (Odonata + Neoptera)), but both NJ analyses yielded a (Odonata + (Ephemeroptera + Neoptera)) clade (S1 Fig. and S3 Fig.), hence, the phylogenetic relationships between the two Paleoptera orders needed a further investigation. The monophyly of Pterygota and Neoptera was also obtained from our analyses. The monophyly of Collembola were verified, but the phylogenetic position of Diplura was challenged by our datasets. All the analyses except for both ML analyses (with relatively low support values) showed that Diplura was a paraphyletic group (S2 Fig. and S4 Fig.). Three dipluran species: Japyx solifugus, Campodea lubbocki and Campodea fragilis were utilized for the phylogenetic analyses. Japyx solifugus seemed to have a closer relationship with Archaeognatha and formed a monophyletic group with true insects. The monophyly of Diplura had been questioned because the subgroups Campodeoidea and Japygoidea differed in cerci, sperm morphology and ovarian structure [62], but some analyses based on molecular data had led to incongruent results [63]. In addition, the phylogenetic position of Diplura was controversial [64]. Our phylogenetic estimate differed from that obtained by Luan et al. in their analysis of ribosomal RNA gene sequences. They inferred the relationships (Collembola + (Protura + Diplura)), with Diplura a monophyletic group, which was in accordance with our ML tree outgrouped by two Crustacean species (nodal support was relatively low in our analysis). Most other researches based on molecular data supported the monophyly of Diplura. Gao et al. [65] also retrieved a (Collembola +   The Paraphyly of Machilidae and Insights into Archaeognathan Phylogeny (Protura + Diplura)) clade using 18S and 28S rRNA gene sequences. But he stated that the nonstationarity of nucleotide frequencies among dipluran and proturan sequences because of the high content of GC, so their result might not be solid. Given the situation above, we suggested a more detailed research, including more dipluran taxa and more characters, would be necessary for a better understanding of the phylogenetic relationships of Diplura and of the subgroups within basal Hexapoda.