Comparative Mt Genomics of the Tipuloidea (Diptera: Nematocera: Tipulomorpha) and Its Implications for the Phylogeny of the Tipulomorpha

A traditionally controversial taxon, the Tipulomorpha has been frequently discussed with respect to both its familial composition and relationships with other Nematocera. The interpretation of internal relationships within the Tipuloidea, which include the Tipulidae sensu stricto, Cylindrotomidae, Pediciidae and Limoniidae, is also problematic. We sequenced the first complete mitochondrial (mt) genome of Symplecta hybrida (Meigen, 1804), which belongs to the subfamily Chioneinae of family Limoniidae, and another five nearly complete mt genomes from the Tipuloidea. We did a comparative analysis of these mt genomics and used them, along with some other representatives of the Nematocera to construct phylogenetic trees. Trees inferred by Bayesian methods strongly support a sister-group relationship between Trichoceridae and Tipuloidea. Tipulomorpha are not supported as the earliest branch of the Diptera. Furthermore, phylogenetic trees indicate that the family Limoniidae is a paraphyletic group.


Introduction
The animal mitochondrial (mt) genome typically contains 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and a large non-coding region (also referred to as the control region, or CR) [1]. It is being widely used for understanding the phylogenetic relationships, because it can provide more phylogenetic information than shorter individual nuclear genes and multiple genome-level characteristics, such as modes of control of a replication and transcription, RNA secondary structures. Although there has been some criticism of using mt genomes for phylogenetics as the effects of accelerated substitution rate and (represented by Tipulidae alone as Trichoceridae has not been subject to RNAseq analysis yet) represented neither the earliest branching dipteran infraorder, nor the most derived branch of the lower Diptera (= Nematocera) [42][43].
Since there is only one nearly complete mt genome sequence of Tipuloidea (Tipula abdominalis, JN_861743) available in GenBank (as of June 2015) [6], we sequenced and described the first complete and another five nearly complete mt genomes from Tipuloidea (Table 1), representing its four families (Cylindrotomidae, Limoniidae, Pediciidae and Tipulidae sensu stricto). We annotated these genomes and did a comparative analysis of these mt genomics. Using these new sequences, along with published representatives of the Nematocera, we constructed phylogenetic trees of the Tipulomorpha. The implications of the phylogenetic relationship between Trichoceridae and Tipuloidea, and the position of the Tipulomorpha in the lower Diptera were given in this paper.

Ethics statement
No specific permits were required for the specimens collected for this study. The specimens were collected by net. The specimens were common in China and the field studies did not involve endangered or protected species. The species were not included in the "List of Protected Animals in China".

Specimen collection and preparing
All specimens used for DNA extraction were collected from China. The details of the collection information were listed in S1 Table. Specimens were initially preserved in 95% EtOH in the field, and then transferred to -20°C for the long-term storage at China Agricultural University (CAU). Specimens were identified by Zehui Kang (CAU).

DNA extraction, PCR and Sequencing
Thoracic muscle tissues were removed for extraction of whole genomic DNA using the TIA-Namp Genomic DNA Kit (TIANGEN). The mt genomes of six species were amplified using NEB Long Taq DNA polymerase (New England Biolabs, Ipswich, MA). First, fragments of 500-1500 bp were amplified using standard primers conserved across insects [44]. Additional sequences were obtained using taxon-specific primers designed based on these preliminary sequence. The details of primers information are listed in S2 Table. PCR amplification conditions are as follows: a hot-start denaturation step at 95°C for 30sec; 40 cycles of denaturation at 95°C for 10sec; annealing at 40-55°C for 50sec; extension at 65°C for 1kb/min; final elongation step at 65°C for 10min. The quality of PCR products was evaluated by electrophoresis in a 1% agarose gel stained with Gold View nucleic acid stain. Purified PCR amplicons were sequenced in both directions using the BigDye Terminator Sequencing Kit ver. 3.1(Applied Bio Systems) and ABI 3730XL Genetic Analyzer (PE Applied Biosystems, San Francisco, CA, USA) using both amplification and internal primers designed via primer walking.

Bioinformatic and Phylogenetic analysis
Sequences were assembled manually. First, sequences were identified and aligned into contigs using BioEdit version 7.0.5.3 [45]. After fully assembling each mt genome, we identified the protein-coding genes as open reading frames and by alignment with homologous sequences annotated in the mt genomes of 45 published nematoceran species. The tRNA genes were identified using tRNAscan-SE [46], and analyzed with a COVE score cutoff of 1 for identifying all possible tRNA genes.
Those tRNA genes (the tRNA Ser(AGN) of six sequenced species) that could not be identified using tRNAscan and the rRNA genes were identified by alignment with homologous sequences from the 45 published nematoceran species. MEGA 5.0 [47] was used to analyze the nucleotide substitution rates, base composition and codon usage. Nucleotide compositional skew was calculated using the formulae: AT-skew = (A-T)/ (A+T); GC-skew = (G-C)/ (C+G) [48].
A phylogenetic analysis was conducted using a total of 29 species of Diptera as an ingroup and three outgroup species from Diptera's close relatives, Bittacus pilicornis, Westwood (NC_015118) and Boreus elegans, Carpenter (NC_015119) of Mecoptera and Jellisonia amadoi, Ponce-Ulloa of Siphonaptera [22][23]. Details of the species used for phylogenetic analysis in this study are listed in Table 1.
Because tRNA Ile , tRNA Gln and tRNA Met were not sequenced for the 5 species whose incomplete mt genomes were obtained, the phylogenetic analyses only include the remaining 19 tRNAs, 13 PCGs, lrRNA, and a portion of srRNA (the alignment was trimmed to exclude the missing regions of srRNA). Each gene was aligned in MEGA 5.0 [47] based on the annotation procedures proposed by Cameron [49]. Individual genes were concatenated into a single data matrix using SequenceMatrix v1.7.8 [50]. Two datasets were assembled for phylogenetic analyses: the first dataset consisted of the first and second codon positions of the 13 PCGs (PCG12), two rRNAs and 19 tRNAs (PCG12RNA); the second dataset consisted of the first and second codon positions of five PCGs (CO1, CO2, CO3, CYTB and ATP6), which was excluded the remaining eigth difficult aligned genes(5PCG12RNA) [51], two rRNAs and 19 tRNAs (5PCG12RNA). The two aligned datasets were 9815 and 6067 bp long for the PCG12RNA and the 5PCG12RNA matrices respectively. We used PartitionFinder v1.1.1 [52] to select the bestfit partitioning scheme and the substitution models for each partition. The best-fit partitioning scheme for constructing phylogenetic tree is listed in S3 Table. Bayesian inference (BI) was used for phylogenetic analyses. BI was conducted using MrBayes 3.2.2 [53] for 2-4 million generations. We considered that the stationarity was reached when the average standard deviation of split frequencies between runs was below 0.01, which was tested using AWTY [54]. . The mt genome of S. hybrida was complete and the remaining five were nearly complete. The newly sequenced complete mt genome fall within the middle of the size range previously reported for mt genomes from the Nematocera, which ranges from 15,214bp in Ptychoptera (Ptychopteridae) to about 18,600bp in Bittacomorphella (Ptychopteridae) [6]. All mt genomes are typical of insect mt genomes in gene content: 13 protein-coding genes, 22 tRNA genes and two rRNA genes. Gene order is also identical to that of the ancestral insect mt genome, with 23 genes encoded on the majority strand (J-strand), and the remaining 14 genes encoded on the minority strand (N-strand) (Fig 1).

Results and Discussion
Three conserved regions were found in overlapping regions of genes of each sequenced Tipuloidea: AARYYTTA (tRNA Trp -tRNA Cys ), ATGATTA (ATP8-ATP6) and TTAACAT (ND4-ND4L). These conserved regions are also shared with some other Diptera [56]. Furthermore, there were also two non-coding intergenic regions conserved in dipteran insects, which have been shown to be binding sites for a bidirectional transcription termination factor (DmTTF) [6]. The first one is located between tRNA Glu and tRNA Phe and ranges from 19 bp to 32 bp in length. This intergenic region is absent in other insect orders and not completely conserved in Diptera [6]. In Diptera, this intergenic region is present in all Brachycera and some Nematocera, being absent in Culicidae [6]. The second intergenic region is found between tRNA Ser(UCN) and ND1 and ranges from 16bp to 38bp in length (S4 Table). It is highly conserved across insects and similar sequences for this region are present in other orders, such as Mecoptera [6,57].  The control region (CR) is the longest intergenic region in the mt genome. Only one complete control region from S. hybrida was sequenced in this study. It is 897bp in length and located in the conserved position between srRNA and tRNA Ile [56]. It is a medium-sized CR in the Nematocera, where the length of the control region ranges from 369bp in Ptychoptera to about 3.7kb in Bittacomorphella [6]. We did not find any conserved features identified in other insect CRs, such as poly-T stretch, (TA) n like stretch or stem-loop structure at the 3'-end of the control region [56,58]. However, we identified three tandem repeat copies of a sequence within the CR with a total length of 174bp. The second and third repeat units are identical in sequence while the first is much shorter at only 46 bp. Large tandem repeats in the control region are common in the Nematocera, for example, Beckenbach detected such repeats in five nematoceran species (Sylvicola fenestralis; Cramptonomyia spenceri; Protoplasa fitchii; Arachnocampa flava; Bittacomorphella fenderiana) [6].

Base composition
As with other insects, the nucleotide composition of the tipuloid mt genomes are biased towards A and T [6,56]. In general, the AT content of these mt genomes are intermediate for nematocerans, in which AT content ranges from about 73% in the Trichocera (Trichoceridae) to about 83% in Cecidomyiidae [6]. For protein-coding genes, the AT content of N strand genes (average content: 76.8%) is higher than that of the J strand genes (average content: 72.9%). The AT content of PCG third codon positions is much higher than that of the first and second codon positions. For RNA genes, the average AT content (81.5%) of the lrRNA is slightly higher than that of the srRNA (75.9%). Each of the six tipuloid mt genomes overall has a weakly positive AT-skew and a negative GC-skew on the J-strand, while for PCGs T content is higher than A content. Of each codon position in the PCGs, AT-bias is strongest at the second codon position. Statistics also indicated that the AT-bias is stronger in RNA-encoding genes than in PCGs (Table 2).

Codon usage
Codon usage for the six tipuloid species is shown in S5 Table. The AT rich codons TTA (Leu), ATT (Ile), TTT (Phe), ATA (Met), AAT (Asn) and TAT(Tyr) are the most frequently used codons.
Similar to most other Diptera, the most commonly used stop codon in tipuloids is TAA, which was found in 7 of the 13 PCGs (ATP6, ATP8, CO1, CO3, ND2, ND4L, and ND6) for almost all tipuloids. The stop codon TAG is used in almost all the ND1 and also can be found in ND3 and CYTB. All the CO2 genes in tipuloids use the partial stop codon T, and the two remaining PCGs (ND5 and ND4) of tipuloids' mt genomes usually have the partial stop codon T or TA. In all sequenced nematoceran flies, TAA is also the most commonly used stop codon that found in every PCG, especially in ATP6, ND4l, ND6, ATP8, ND1, ND2 and ND3. All the ATP6, ND4l, ND6 of nematoceran flies used TAA as stop codon, and TAG were found in a minority of ATP8, ND1, ND2 and ND3. Two partial stop codons, T or TA, are found in CO1, CO2, CO3, ND4 and ND5, especially in CO1 and CO2. In ND4 and ND5, there are two kind of partial stop codons (T or TA). (Fig 2, S6 Table).

Transfer and ribosomal RNAs
All 22 tRNA genes in S. hybrida and 19 of the 22 tRNA genes in the remaining five tipuloids were identified. The length of mt tRNAs ranges from 64 bp to 72 bp. Most tRNA genes can be folded into a typical clover-leaf secondary structure (Fig 3), whereas tRNA Ser(AGN) is an exception for lacking a DHU arm [59]. Some mispairings (U-U and G-U) are found in tRNAs. For example, four mismatched base U-U pairs and 17G-U pairs are found in tRNA secondary structures in S. hybrida, while no other types of mispairings are found. The mt rRNA genes have frequently not been annotated via the use of functional features, so it is hard to annotate them from their DNA sequences alone [56,[60][61]. Beckenbach has proposed that the start of srRNA is AARGUUUU based on an alignment across dipteran and mecopteran sequences [6]. Hence, we annotated the lrRNA gene as in other dipteran species, where it is between tRNA Leu (CUN) and tRNA Val , while the srRNA gene is flanked at the 3' end by tRNA Val and the motif AARGUUUU. Furthermore, we inferred the secondary structures for lrRNA and srRNA in the Tipuloidea using the sequences of S. hybrida based on the published lrRNA and srRNA secondary structures, the sepsid fly Nemopoda mamaevi Ozerov, 1997 [56]. The secondary structures of lrRNA and srRNA are similar to those in N. mamaevi and other Dipteran species [56,62]. The lrRNA has five structural domains (domain III absent as in other insects) and 42 helices while the srRNA includes 3 domains and 25 helices (Figs 4 and 5).

Phylogeny of Tipulomorpha
The phylogenetic trees based on BI analyses of two datasets are given in Figs 6 and 7. The tree based on dataset PCG12RNA is disordered in four main branches (Fig 6), especially in the infraorder Culicomorpha, which was a well-supported monophyletic clade in previous studies [26,29,30,41,63,64]. Then, we construct another tree based on dataset 5PCG12RNA, which are excluded eight difficult aligned genes (Fig 7). The monophyly of infraorder Culicomorpha is well supported, as well as the Tipulomorpha and the Bibionomorpha. The BI tree of 5PCG12RNA supports the Ptychopteromorpha is the earliest branch within the Diptera, and Psychodidae+Tanyderidae is the sister group to the Brachycera. However, two phylogenetic trees have very similar topologies for the branch Tipulomorpha. The monophyly of Tipulomorpha (Trichoceridae + Tipuloidea) is consistently supported, as is the monophyly of  The earliest linkage of the Nematocera and the phylogenetic position of the Tipulomorpha within the lower Diptera were controversial. Tipulomorpha has been inferred to be the earliest lineage of the Nematocera [26][27]30] or as the most derived branch in the Nematocera [24].  Transcriptome-based phylogenetic studies, however, thought that the Culicomorpha as the earliest branch of the Diptera with the Tipulomorpha in intermediate branch [42][43]. Our BI tree of 5PCG12RNA supports the Ptychopteromorpha as the earliest branch of the Diptera. This result is consistent with Oosterbroek and Courtney's morphological findings [31]. Bertone's phylogenetic tree based on multiple nuclear genes also supports the Ptychopteromorpha topologically as one of the earliest branch of the order [38]. For the position of the Tipulomorpha, all analyses in our study support it as having an intermediate phylogenetic position within the Nematocera, never sister to the remaining flies or to the derived Brachycera.
The composition of the infraorder Tipulomorpha has long been contentious. It has been variously defined to include both Tipuloidea and Trichoceridae or just Tipuloidea [25][26][27][28][29][30]. Our molecular data supports a more traditional conception of Tipulomorpha as containing both Tipuloidea and Trichoceridae, consistent with Hennig's hypothesis [26]. This relationship has also been accepted by some other researchers [35][36][37][38][39]. Beckenbach's mt genome phylogeny, however, failed to give a clear resolution of this question [6]. In Beckenbach's study, one analysis using only a set of less variable major genes (CO1, CO2, CO3, CytB, ATP6 and rRNAs) supported the pairing of these two families, whereas inclusion of all major genes inferred a topology that would define Tipulomorpha as only consisting of Tipuloidea.
Within the Tipuloidea, Starý (1992) considered that the Limoniidae was the sister-group to a clade Pediciidae + (Tipulidae + Cylindrotomidae) according to 11 adult morphological characters [65]. The arrangement of Pediciidae being the sister-group to the remaining Tipuloidea,   ) showed that Cylindrotomidae and Tipulidae were sister-group. However, their placement within Tipuloidea was less certain [66][67]. In our study, both BI analyses support the Pediciidae as the sister-group of the remaining Tipuloidea. The sister relationship between Tipulidae and Cylindrotomidae is also strongly supported in both analyses. These results are concordant with Petersen et al.'s research, which presented a new classification system recognizing a two-family Tipuloidea (Tipulidae and Pediciidae) [67]. Two trees had different topologies across Limoniidae, with the Limnophilinae sister to Chioneinae + Limoniinae and the Chioneinae sister to Limnophilinae + Limoniinae. Anyway, family Limoniidae is not supported as a monophyletic clade and subfamily Limoniinae seems to have a closer relationship with Cylindrotomidae + Tipulidae.
Compared with Beckenbach's study, we come to a steady conclusion on the composition of the infraorder Tipulomorpha. The variation of the composition the Tipulomorpha in Beckenbach's study might be caused by lack of data from other tipuloids, especially the family Pediciidae, which was the sister of the remaining Tipuloidea, in our BI analyses, or in a clade (along with members of the Limoniidae) that is sister to the remaining Tipuloidea. Therefore, we consider that increasing the sampling comprehensiveness, especially the relatively primitive group, can help to give us a more reasonable phylogenetic tree.
In our study, Limoniidae is not a monophyletic clade. The family Limoniidae consists of four subfamilies proposed by Starý (1992), of which the subfamily Dactylolabinae contains only one genus. Although we selected representatives for the remaining three subfamilies, it seems that our current taxon sampling is not extensive enough to build an initial framework of these clades. Therefore, further detailed studies with more taxa are needed before natural families can be confidently defined within the Tipuloidea.
Supporting Information S1