Ancestral Gene Organization in the Mitochondrial Genome of Thyridosmylus langii (McLachlan, 1870) (Neuroptera: Osmylidae) and Implications for Lacewing Evolution

The first complete mitochondrial genome of the lacewing family Osmylidae (Thyridosmylus langii (McLachlan, 1870)) (Neuroptera) was sequenced in this study. The genome is a circular molecule of 16,221 bp containing the typical 37 genes but is arranged in the same order as that of the putative ancestor of hexapod and lacks translocation of trnC as shared by all previously sequenced neuropteran mtDNAs. This reveals that trnC translocation does not represent an organizational synapomorphy in the mitochondrion for the entire Neuroptera clade. Comparative analysis of neuropteran tRNA genes reveals a relatively slow and conserved evolution of the mitochondrion throughout the order. Secondary structure models of the ribosomal RNA genes of T. langii largely agree with those proposed for other insect orders. Nevertheless, domain I of T. langii rrnL is consisted of nine helices rather than eight helices which is typical for neuropteran rrnL. Protein-coding genes have typical mitochondrial start codons, with the exception of COI, which uses the TCG start codon also found in Ithonidae and Chrysopidae. Like other neuropteran insects, the control region is the most AT-rich region and comparatively simple, with little evidence of conserved blocks or long tandem repeats. Considering the issues of base-compositional and branch length heterogeneity, we used a range of phylogenetic approaches to recover neuropteridan relationships and explored the effect of method choice on recovery of monophyly of Neuropterida: ((Neuroptera + Megaloptera) + Raphidioptera). The monophyly of Neuroptera and the more basal position of Osmylidae were also recovered by different datasets and phylogenetic methods.


Introduction
The study of mitochondrial (mt) genome has yielded significant insights into different scientific disciplines including animal health, comparative and evolutionary genomics, molecular evolution, phylogeography, phylogenetics and population genetics [1,2]. The insect mt genome is a circular, double-stranded molecule of 14-19 kb in length that comprises a set of 37 genes for 22 tRNAs, 2 rRNAs and 13 proteins [3]. More than 270 insect mt genomes have been sequenced, representing 26 of the 30 orders, with most from the mega-diverse orders such as Diptera, Lepidoptera, Hemiptera, Coleoptera and Hymenoptera. Based on a series of analyses, mt genomes may be superior to individual nuclear gene or partial mt genes to reconstruct phylogenetic relationships as they possess a number of evolutionarily interesting and potentially informative features such as length variation, altered tRNA anticodons or secondary structures, atypical start codons, base compositional bias, codon usage, and gene rearrangement [4][5][6].
Neuropterida contains at least 6,500 described valid species divided into three orders: Raphidioptera (with slightly more than 200 species in two families), Megaloptera (with about 300 species in two families) and the extremely heterogeneous Neuroptera (with 6,000 species in 15 families) [7,8]. Relationships within Neuropterida have been controversial over time. The traditional view is a sister group relationship between Megaloptera and Raphidioptera, which was corroborated by Wheeler et al. (2001) and Wiegmann et al. (2009) [9][10][11]. This view has been challenged recently with some researchers proposing that Raphidioptera are sister to Megaloptera + Neuroptera [8,12,13]. Megaloptera and Raphidioptera exhibit the typical insect ancestral gene order [14][15][16]. Conversely, Neuroptera with only six mt genomes fully sequenced to date, all share the common feature of translocation of trnC [15,[17][18][19].
Neuropteran family-level relationships have been controversial. The first quantitative cladistic analysis of neuropteran relationships based on adult and larval morphological characters divided Neuroptera into three suborders: Hemerobiiformia, Myrmeleon-tiformia and Nevrorthiformia [20]. The former two suborders were proposed as sister groups and comprise most neuropteran species, while Nevrorthiformia includes the single small family Nevrorthidae, supposedly representing the sister group to the rest of the order. The monophyly of the suborder Hemerobiiformia was not recovered with any satisfactory support when originally coined [20] and has been repeatedly shown to be paraphyletic in all subsequent quantitative analyses based on morphological or molecular data [8,12,13]. In a detailed analysis of Neuropterida using combined molecular and morphological data by , Hemerobiiformia were recovered as a paraphyletic grade containing Nevrorthiformia, with Myrmeleontiformia as a monophyletic derived clade [8].
Here, we present the first complete mt genome sequence of Thyridosmylus langii (McLachlan, 1870), a representative of Osmylidae. Osmylidae are considered one of the more basal groups of Neuroptera, exhibiting many plesiomorphic characteristics [8,11,12,19].  recovered Osmylidae as sister to Sisyridae and Nevrorthidae in a near-basal clade that originated during the Permian [8]. Characterization of the mt genome of Osmylidae provides an important opportunity to understand the sequence of evolution of the mitochondrion in Neuroptera, especially with reference to gene order, insertion and deletion events and ribosomal secondary structure. We provide analyses of the gene order, nucleotide composition, codon usage, compositional biases, RNA secondary structure, the assessment of base compositional and branch length heterogeneity and evaluate the phylogenetic relationships among the three orders of Neuropterida based on a range of phylogenetic approaches. This work reveals that the trnC translocation, which is absent in Thyridosmylus, occurred within higher Neuroptera and cannot be used as a synapomorphy characteristic for all Neuroptera.

Results and Discussion
Genome organization and structure The mt genome of T. langii is a double-stranded circular DNA molecule of 16,221 bp in size and has been deposited in the Genbank under accession number (KC515397). This mt genome contains the entire set of 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes, and a control region (also called A+T-rich region) that are typically present in metazoan mt genomes (Fig. 1). The arrangement of mt genes of T. langii is the same as the putative ancestor of insects [21][22][23][24][25][26] and differs with the other six previously sequenced neuropteran mt genomes which share a common synapomorphic feature of transposition of tRNA Trp and tRNA Cys genes. The rearrangement of tRNA Cys - tRNA Trp -tRNA Tyr (CWY) was considered as a potential synapomorphic character for Neuroptera or some subgroup within the order in the previous study [15]. This opinion is further corroborated in our study that the rearrangement of CWY is just synapomorphic for some subgroup within Neuroptera. The likely explanation for transposition of adjacent genes in Neuroptera involves tandem duplication of the entire region affected by the transposition followed by inactivation and loss of the first copy of the second gene [21].
Genes are encoded by both strands of T. langii mt genome: 23 genes on the majority strand (J-strand) with the rest on the minority strand (N-strand). Eleven pairs of adjacent genes in the mt genome of T. langii overlapped by 1-8 bp and involved a total of 38 bp which make the genome relatively compact (Table S1). Two pairs of overlapping genes are common to all sequenced mt genomes of lacewings: ATP8/ATP6 and ND4/ND4L, and share a near identical seven bp sequence (ATGNTAA), with the exception of the ascalaphids, in which this consensus-overlapped sequence were not found in the ND4/ND4L gene pair.

Transfer RNAs
The mt genome of T. langii consists of all 22 tRNA genes, ranging from 64 to 73 bp. Most of them were detected using tRNAscane-SE [27] with two exceptions of the tRNA Arg and tRNA Ser(AGN) genes. Sequences of these two tRNAs were determined through comparison with previously published neuropteran mt genomes [14,15,18,19]. All tRNAs could fold into the typical cloverleaf structure except for tRNA Ser(AGN) , in which its dihydrouridine (DHU) arm simply formed a loop (Fig. 2).
Based on the secondary structure, a total of 23 mismatched base pairs are found in the T. langii tRNAs: eighteen of them are G-T pairs, which form a weak bond, located in the AA stem (3 bp), T stem (3 bp), AC stem (5 bp) and the DHU stem (7 bp); the remaining five pairs include T-T mismatches (3 bp) in the AA stem of tRNA Leu , the T stem of tRNA Trp and the AC stem of tRNA Leu , respectively; C-T (1 bp) mismatches in the AC stem of tRNA Met ; A-G (1 bp) mismatches in the AA stem of tRNA Trp . Mismatches observed in tRNAs are common in arthropod mt genomes and may be corrected through RNA editing processes [28].
Comparative analyses of secondary structures of neuropteridan tRNAs were undertaken based on generating a multiple alignment of tRNAs across exemplars and the percent of identical nucleotides (%INUC) (i.e. conservation) calculated. All tRNAs exhibited high levels of conversation with the average 89%INUC. The pattern of nucleotide conservation was obviously J strand-biased. Indeed, trnG and trnK, which showed the highest levels of nucleotide conservation (%INUC $95%), were all located on the J strand; while trnR and trnC showed the lowest levels of nucleotide conservation (%INUC<80%) on both strands. Among neuropteran tRNAs, nucleotides at the AA and AC stems and loops were conserved: trnG and trnN showed 100%INUC in these two arms; and variations were largely restricted to the TYC, DHU, and extra loops (Fig. 2). The high levels of nucleotide conservation characterize the low rate of nucleotide substitution and reveal that Neuroptera has in general experienced a slow and conservative evolution.

Ribosomal RNAs
The large and small rRNA subunits (rrnL and rrnS) in T. langii are located at tRNA Leu(CUN) -tRNA Val and tRNA Val -control region, respectively ( Fig. 1 and Table S1). The length of rrnS and rrnL is determined to be 798 bp and 1,325 bp, respectively. The inferred secondary structure models for rrnS and rrnL are provided in Fig. 3 and Fig. 4, respectively. The secondary structure of both rrnS and rrnL are largely in agreement with the secondary structure models proposed for other endopterygote orders [29][30][31]. The secondary structure of T. langii rrnL consists of six structural domains (domain III is absent in arthropods) and 51 helices (Fig. 4), and the rrnS is comprised of three structural domains and 34 helices (Fig. 3). As domain I in rrnL is highly variable among insects and difficult to predict, most previous studies have no proposed models for this region [29][30][31]. Negrisolo et al. provided the first predicted secondary structure of domain I for rrnL of Libelloides macaronius [18]. Compared with this model, nine helices in domain I of T. langii rrnL were predicted rather than eight helices in L. macaronius.
The multiple alignment of neuropteran rrnSs spanned 813 positions and contained 319 variable sites including 168 parsimony-informative sites. Nucleotide variability among domains and helices was unevenly distributed. The helices (H11-176, H18-341) exhibit almost 100%INUC, while some others (i.e. H6-49, H7-63, H31-634) display very low %INUC. The multiple alignment of neuropteran rrnLs extended over 1376 positions and contained 583 variable sites, including 321 parsimony-informative sites. Conserved nucleotides were unevenly distributed throughout the rrnL secondary structure, with the highest level of non-varying positions located on domains IV-V and lowest levels observed in domains I-II (Fig. 4).

Protein-coding genes
The mt genome of T. langii includes the canonical 13 PCGs that are also present in all other sequenced lacewings and most animal mt genomes [15,18,19]. The orientation of PCGs is identical in the mt genomes of all sequenced lacewings: nine in J-strand and four in N-strand. All but one PCG of T. langii initiate with ATN as the start codon (seven with ATG, four with ATT and one with ATC) ( Table S1). The only exception is the COI gene, which uses noncanonical TCG as a start codon. The unusual start condons for the COI gene in Neuroptera is a common feature and include ACG for L. macaronius [18] and Myrmeleon immaculatus [15]; TTA for Ascaloptynx appendiculatus; TCG for Polystoechotes punctatus [17], Chrysoperla nipponensis and Apochrysa matsumurae [19] respectively. It is a widespread phenomenon in Neuroptera and other endopterygote insects that the COI gene often starts with non-standard putative codons [32][33][34]. Nine PCGs of T. langii share the complete termination codons TAA (ND2, ATP8, ATP6, COI, COIII, ND4L and ND6) or TAG (ND1 and ND3), while the remaining four had incomplete termination codons T (ND4, ND5, COII and CytB) (Table S1), which is also common in other lacewings. The truncated stop codons are suggested to be simply a product of the selective pressure to reduce genome size [35,36] and are presumed to be completed via post-transcriptional polyadenylation [37].

Nucleotide composition and codon usage
The nucleotide composition of the T. langii mt genome was biased toward A and T as in most other insects. The A+T content of the whole genome, PCGs, tRNAs, rRNAs and the control region is 76.7%, 74.3%, 77.4%, 80.4% and 88.5%. The base composition of nucleotide sequences can be described by its skewness: the whole genome and the control region exhibit the negative AT-and GC-skews while the rRNAs show the opposite situation; the PCGs and tRNAs have the negative AT-and positive GC-skews (Table S2).
Analysis of base composition at each codon position of the concatenated 13 PCGs showed that the third codon position (88.9%) was higher in A+T content than the first (68.0%) and second (66.2%) codon positions (Table S2). Nucleotide frequencies in all codon positions differ between the two strands. In the Jstrand, the frequencies of nucleotides are T.A.G.C at the first codon position, T.C.A.G at the second codon position and T.A.C.G at the third codon position. In the N-strand, all three codon positions show the common feature (T.A.G.C).
The nucleotide bias was also reflected in the codon usage. Four most frequently used codons, TTA (leucine), ATT (isoleucine), TTT (phenylalanine) and ATA (methionine), were all composed wholly of A and/or T. The two-fold degenerate codon usage also demonstrated bias favoring T/A over C/G at the third codon position on both stands. The four-fold degenerate codon usage presented a definite bias towards thymine at the third codon position of PCGs except Gly (GGN), Arg (CGN) and Ser (AGN), whose most frequently used codons terminated with adenine ( Fig. 5).

Non-coding region
The longest non-coding region (1,358 bp; putative control region) is flanked by rrnS and tRNA Ile with 88.5% A+T content, and is the most AT-rich region in the mt genome of T. langii. Unlike hymenopteran [38] or hemipteran [34] insects, this region in the mt genome of sequenced lacewings is comparatively simple, with little evidence of conserved blocks or long tandem repeats. The abundance of A/T is due to the repeated motifs containing mostly/only A/T of different length. The most abundant AT motif occurs 231 times, AAT occurs 91 times, AATT occurs 62 times, AATTT occurs 24 times, AATTTTT occurs 12 times, and TTATATAT occurs six times. The motifs are represented by shorter identical strings as well as longer consensus patterns sharing 80% of identical positions. When an 80% minimum identity was allowed among repeated motifs, two stings were identified in the control region of T. langii, each containing 41 nucleotides, respectively, from base 224 to base 264 and from 428 to 468. Their consensus sequence was ATTT(T/A)AAA(GA/ AG)AATAAATTTAAAA(T/C)CGCAGTTTCTCCCTAAA. This trend also occurs in other neuropteran insects [18].
In addition, some short tandem repeat sequences were detected in the control region of T. langii, such as (ATTTT) 4 , (TAATA-TAA) 2 , (AT) 4 , (AT) 6 , (AT) 7 , and (AT) 8 . These tandem repeat elements can be considered as microsatellites and are useful in the study of geographical structure and phylogenetic relationship of species [25]. Thus, the utility of the AT-rich region as a phylogenetic marker should be most effective at low taxonomic level (family level and below).
Outside the control region, there are 128 bp non-coding sequences in 13 intergenic regions of T. langii, ranging in size from one to 63 bp (Table S1). Three of them spanned more than 15 bp. The longest spacer sequence (63 bp) exists between tRNA Ile and tRNA Gln which is also present in all part/fully sequenced neuropteran mt genomes, varying from 33 bp to 66 bp; this spacer diverged relatively very fast among different families of the lacewings [18]. The third longest insertion of T. langii is located between ND1 and tRNA Ser(UCN) , which shares the conserved motif (TTAAATNNNNTNAGAT, N = A/T) across Neuroptera. The conservation of this region is also known in many other insects, which is shown to be a binding site for a bidirectional transcription termination factor (DmTTF) [39].

The assessment of base compositional heterogeneity
The total AT% of all included neuropteran species ranged between 74.50% and 79.10% with a mean of 77.63%. The total AT% of megalopteran species ranged between 74.90% and 78.30% with a mean of 76.33%. The AT% of the species from Raphidioptera was 80.30%.
The assessment of base-compositional heterogeneity in the neuropteridan dataset shows that the overall dataset exhibited the complex pattern of compositional bias. To assess the base compositional heterogeneity, we made 55 pair wise comparisons to calculate the disparity index (I D ). When we compared all 13 PCGs simultaneously, 48 comparisons had a statistically significant heterogeneous substitution pattern. In general, the I D was low between the taxa with lower ranges of AT% and very high between the taxa with wider ranges of AT%. Eight of eleven neuropteridan species exhibited a similar base composition with a mean I D between four and six in the dataset (PCG123), suggesting the overall dataset exhibited a high level of base compositional heterogeneity ( Table 1). The dataset (PCG12) (removal of fast evolving sites) shows a lower level of base compositional heterogeneity than the dataset (PCG123) with a mean I D between one and two in most taxa. The I D values calculated from individual codon partitions suggested that the level of base compositional heterogeneity was the lowest in PCG2, followed by PCG1 and PCG3 (Table 1). Among nucleotide sequences, the ranked order of the datasets from best to worst was: PCG2 . PCG1 . PCG12 . PCG3 . PCG123.

Phylogenetic analyses
Base-compositional heterogeneity and branch length heterogeneity are known to be major issues with insect mt genomes in general and neuropteridan/coleopteran mt genomes in particular [8,40]. According to the test of base-compositional heterogeneity in the neuropteridan dataset herein, and previous studies [40,41], the datasets of PCG12 and AA are less likely to violate the 'stationarity' assumptions, which imply that base composition is constant over all lineages in the dataset [42]. To alleviate the influence of base-compositional heterogeneity and any concerns of possible long-branch attraction, we used a range of phylogenetic approaches to recover the neuropteridan relationships and optimized outgroup selection empirically based on short branches with Trachypachus holmbergi (Coleoptera), Adoxophyes honmai (Lepidoptera) and Drosophila melanogaster (Diptera).
Phylogenetic analyses of the four datasets yielded three different patterns of the relationships between the holometabolan orders (Fig. 6). Examination of the trees recovered from these analyses indicated that each analysis produced identical topologies (Fig. 7), with respect to neuropteran and megalopteran relationships. Bayesian posterior probabilities and bootstrap values of ML of major nodes recovered by various phylogenetic approaches are listed in Table 2. Within Neuropterida, a sister relationship between Megaloptera and Neuroptera was recovered in all analyses of different datasets (node = 9) with high support, which is consistent with the results from the previous mt genome phylogeny of Neuropterida [15,16]. The largest difference is the uncertain position of Raphidioptera among the different datasets. The datasets of PCG12 (BI and ML) and PCG123 (BI) recovered identical tree topologies which supports a monophyletic Neuropterida: ((Neuroptera + Megaloptera) + Raphidioptera). The datasets (PCG12R and AA) of both BI and ML analyses fail to support a monophyletic Neuropterida as Neuroptera + Megaloptera are recovered as the unlikely sister group to Diptera.
Among different datasets, PCG12 that exhibits a lower level of base compositional heterogeneity seems most effective at solving the order level relationships. The uncertain position of Raphidioptera that differs in the phylogenetic trees from different datasets may be also influenced by branch length heterogeneity. There exists some variation in the branch lengths within the taxa (Table S3). The families from Megaloptera and Neuroptera show similar uniform branch lengths. The single exemplar of Raphidioptera shows significantly longer branches than that for Megaloptera and Neuroptera. Based on the limited sample of Raphidioptera and the influence of branch length heterogeneity, the position of Raphidioptera is still uncertain in this study. Increased taxon sampling within Raphidioptera, from both families Inocellidae and Raphidiidae, will likely help solve this problem.
The monophyly of Neuroptera is an undoubted hypothesis strongly supported by both larval features [20,43] and molecular data [11] that is also demonstrated in our study based on different datasets and methods. Comparative analysis of 11 complete mt genomes sequenced to date reveals some important and novel insights into the phylogeny and evolution of mt gene organization of Neuropterida. All nodes among Neuroptera were perfectly supported in both analyses except that subtending Mantispidae  and Ithonidae. Due to the limited number of mt genomes sequenced, the family-level relationships within Neuroptera remain still largely unresolved in our analyses, but important patterns are evident. Our data does not support the proposed division of Neuroptera into three suborders (i.e. Nevrorthiformia, Myrmeleontiformia and Hemerobiiformia) by Aspöck et al. (2001) and reflects the results of most subsequent authors where only Myrmeleontiformia are only recovered as monophyletic, with Hemerobiiformia being paraphyletic relative to both Myrmeleontiformia and Nevrorthiformia [8,12,13,44]. The ancestral gene arrangement in Osmylidae presented here represents important additional evidence as to the paraphyly of Hemerobiiformia. In morphological analyses based on characteristics of adults and larvae and on larval and adult head structures [43,45], Osmylidae were placed in a weakly supported clade Hemerobiiformia. This is contrary to all analyses using molecular data separately [12,44] or in combination with morphological data [8] that consistently recover Osmylidae, along with Sisyridae and Nevrorthidae near the base of the phylogeny, clearly reinforcing the apparent antiquity of the origin of this family. Osmylidae, together with Nevrorthidae and Sisyridae, also share an apparent synapomorphy of enlarged female external gonocoxites 9 [13] and have a series of sympleisiomorphies characters indicative of the likely early cladogenesis of these families during the Late Permian or Early Triassic [8]. The basal position of Osmylidae within   Table 2 for different support statistics for that node recovered by different data combinations. doi:10.1371/journal.pone.0062943.g007 Neuroptera is confirmed in our study with high support nodes recovered by different datasets (node = 6). As apparently one of the oldest and most primitive groups of Neuroptera [13], Osmylidae also exhibit the plesiomorphic condition of gene order found throughout the rest of Neuropteroidea (i.e. WCY) (Fig. 7). On the basis of the irreversible process of gene rearrangement and using estimates of divergence times by   [8], then the translocation to the derived gene order in higher Neuroptera may be occurred somewhere around 260 million years ago, after the divergence of the Osmylidae ancestor. Further work is needed to confirm with the more sequenced mitochondrial genomes in Neuroptera to determine the timing of this gene rearrangement more precisely.

Conclusions
This work has added to our current knowledge of gene arrangement of Neuroptera. The first complete mt genome of T. langii (Neuroptera: Osmylidae) presented here displays an ancestral gene organization that lacks translocation of trnC as shared by all previously sequenced neuropteran mtDNAs. This discovery reveals that the trnC translocation is not synapomorphic for the entire neuropteran clade but only for higher Neuroptera. Comparative analyses suggest that gene junctions, tRNAs and rRNAs are comparatively conserved among the neuropteran mt genomes. The mt genomic phylogeny of different datasets and methods herein reconstructed clearly supports the monophyly of Neuroptera, the more basal position of Osmylidae and the sister relationship between Megaloptera and Neuroptera. However, based on the limited taxon sampling and the influence of base compositional and branch length heterogeneity, only the dataset of PCG12 in BI and ML analyses and PCG123 in BI analyses recovered identical tree topologies, which support a monophyletic Neuropterida: ((Neuroptera + Megaloptera) + Raphidioptera). To more fully elucidate phylogenetic relationships within Neuropterida based on comparative analysis of mt genomes, further study is needed based on increased taxon sampling within major groups of Neuropterida, especially representatives of neuropteran families such as Coniopterygidae, Dilaridae, Hemerobiidae, Nevrorthidae and Sisyridae, and well as Inocellidae (Raphidioptera).

Sample origin and genomic DNA extraction
Specimens of Thyridosmylus langii were collected from Motuo, Tibet Autonomous Region, China in August 2011. All specimens were preserved in 100% ethanol and stored at 220uC until DNA extraction. Total genomic DNA was extracted from thorax muscle tissue and the leg muscle tissue using the TIANamp Genomic DNA Kit in accordance with the manufacturer's instructions. Voucher specimens (Nos. NEU-0018), preserved in alcohol, are deposited at the Entomological Museum of China Agricultural University (Beijing).
PCR amplification, cloning and sequencing PCR amplification was performed using a set of insect universal primers [46,47] and primers specifically designed on the Thyridosmylus langii sequences (Table S4). Initially, 13 fragments were amplified using the universal primers from previous work (Fig. 1). On the basis of these short fragments for secondary PCRs, Seven perfectly matching primers were designed by using Primer Premier 5.0 to bridge the gaps.
Short PCRs were conducted using Qiagen Taq DNA polymerase (Qiagen, Beijing, China) with the following cycling conditions: 5 min at 94uC, followed by 35 cycles of 50 s at 94uC, 30 s at 40-55uC, and 1-2 min at 72uC. The final elongation step was continued for 5 min at 72uC. Long PCRs were performed using NEB Long Taq DNA polymerase (New England Biolabs) under the following cycling conditions: 30 s at 95uC, followed by 45 cycles of 10 s at 95uC, 50 s at 48-55uC, and 3-5 min at 65uC. The final elongation was continued for 10 min at 65uC. These PCR products were detected via electrophoresis in a 1% agarose gel and stained with ethidium bromide.
All PCR Products were sequenced in both forward and reverse directions using the BigDye Terminator Sequencing Kit (Applied Bio Systems) and the ABI 3730XL Genetic Analyzer (PE Applied Biosystems, San Francisco, CA, U.S.A.) with two vector-specific primers and internal primers for primer walking.

Assessment of the degree of heterogeneity
To measure the variation in evolutionary pattern, we calculated the disparity index (I D ) [54] for all 13 PCGs together and pairwise. We also tested the homogeneity of substitution pattern (I D test) using a Monte-Carlo method with 1000 replicates as implemented in MEGA 4.0 [51]. We calculated the probability of rejecting the null hypothesis that sequences have evolved with the same pattern of substitution at a,0.01. All positions containing gaps and missing data were removed from the data set (complete deletion option). Also to assess the degree of branch length heterogeneity among Neuropterida, we extracted branch length estimates from the most likely tree, after Maximum-likelihood and Bayesian analysis of the datasets. Branch lengths were conveniently displayed on this tree, using FigTree (v. 1). We then manually calculated the branch length for each taxon, from the tip to the basal node of the tree.

Phylogenetic analysis
Phylogenetic analysis was carried out based on the 13 complete mt genomes of neuropteridan and endopterygote outgroups from Genbank (Table S5) in addition to the one presented here for Osmylidae. Within Endopterygota, the closest living relatives of Neuropterida are the beetles [11,15], so a representative of Coleoptera and two species from Diptera and Lepidoptera respectively were selected for outgroup comparison. All PCGs were aligned at the amino acid level using the default settings in Mega version 4.0 [51]. The alignments were back-translated into the corresponding nucleotide sequences. The stop codons of PCGs were excluded when aligned. Two rRNAs were respectively aligned in Mega version 4.0 [51] as nucleotides. Ambiguously aligned regions of PCGs and rRNAs were checked by eye.
To overcome the base compositional heterogeneity and branch length heterogeneity and explore the effect of method choice on recovery of monophyly of Neuropterida, four datasets were used for phylogenetic analyses: 1) a concatenated nucleotide sequence alignment of PCGs (PCG123); 2) a concatenated nucleotide sequence alignment of the first and the second codon positions of PCGs (PCG12); 3) a concatenated nucleotide sequence alignment of the first and the second codon positions of PCGs and two rRNA genes (PCG12R); and 4) a concatenated amino acid sequence alignment of PCGs (AA). We analyzed these datasets in both Bayesian inference (BI) and Maximum-likelihood (ML) frameworks.
Partitioned ML and Bayesian analyses were run with PCG123, PCG12, PCG12R, and AA matrix, using PHYML online web server [55] and MrBayes Version 3.1.1 [56]. The best-fit model for the amino acid sequence alignment was determined with ProtTest [57] and the jModelTest 0.1.1 [58] was used for the nucleotide sequence of each gene, according to the Akaike Information Criterion (AIC). GTR+I+G model for nucleotide sequence and MtREV model for amino acid sequence were used to optimize the topology. For Bayesian analyses, two simultaneous runs of 5 to 10 million generations were conducted for the matrix and trees were sampled every 1,000 generations, with the first 25% discarded as burn-in. Stationarity was considered to be reached when the average standard deviation of split frequencies was below 0.01. In ML analysis, the parameters were estimated during analysis and the node support values were assessed by bootstrap resampling (BP) [59] calculated with 1000 replicates.