Complete Mitochondrial Genome of Three Bactrocera Fruit Flies of Subgenus Bactrocera (Diptera: Tephritidae) and Their Phylogenetic Implications

Bactrocera latifrons is a serious pest of solanaceous fruits and Bactrocera umbrosa is a pest of Artocarpus fruits, while Bactrocera melastomatos infests the fruit of Melastomataceae. They are members of the subgenus Bactrocera. We report here the complete mitochondrial genome of these fruit flies determined by next-generation sequencing and their phylogeny with other taxa of the subgenus Bactrocera. The whole mitogenomes of these three species possessed 37 genes namely, 13 protein-coding genes (PCGs), 2 rRNA and 22 tRNA genes. The mitogenome of B. latifrons (15,977 bp) was longer than those of B. melastomatos (15,954 bp) and B. umbrosa (15,898 bp). This difference can be attributed to the size of the intergenic spacers (283 bp in B. latifrons, 261 bp in B. melastomatos, and 211 bp in B. umbrosa). Most of the PCGs in the three species have an identical start codon, except for atp8 (adenosine triphosphate synthase protein 8), which had an ATG instead of GTG in B. umbrosa, whilst the nad3 (NADH dehydrogenase subunit 3) and nad6 (NADH dehydrogenase subunit 6) genes were characterized by an ATC instead of ATT in B. melastomatos. The three species had identical stop codon for the respective PCGs. In B. latifrons and B. melastomatos, the TΨC (thymidine-pseudouridine-cytidine)-loop was absent in trnF (phenylalanine) and DHU (dihydrouracil)-loop was absent in trnS1 (serine S1). In B. umbrosa, trnN (asparagine), trnC (cysteine) and trnF lacked the TψC-loop, while trnS1 lacked the DHU-stem. Molecular phylogeny based on 13 PCGs was in general concordant with 15 mitochondrial genes (13 PCGs and 2 rRNA genes), with B. latifrons and B. umbrosa forming a sister group basal to the other species of the subgenus Bactrocera which was monophyletic. The whole mitogenomes will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general.


Introduction
Fruit flies in the genus Bactrocera are potentially destructive pests of commercial fruits and vegetables [1]. Seventy-three species have been documented as economically important in the Pacific Region [2]. Seven out of the nine species are rated as the most serious pests (Category A) [2], these being members of the subgenus Bactrocera; the other two less harmful species are B.  [3]. This pest has a broad geographical range occurring in Pakistan, India, Sri Lanka, Myanmar, China, Taiwan, Thailand, Laos, Vietnam, Malaysia, Singapore, Brunei, and Indonesia, and has been introduced into Hawaii, Okinawa, Tanzania, and Kenya [2][3][4][5][6].
A less serious pest species of the subgenus Bactrocera is B. umbrosa (Fabricius)-one of 16 species in Category C consisting of relatively minor oligophagous or specialist fruit or cucurbit pests [2]. It infests Artocarpus fruits and is widespread from southern Thailand through New Guinea to New Caledonia [2]. Another species, B. melastomatos Drew & Hancock of the subgenus Bactrocera is not known to damage commercial crop plants but infests the fruit of Melastomataceae [7]. It has been documented in India (Andaman Island), Thailand, Peninsular Malaysia, Singapore, and Indonesia (Java, Kalimantan, Sumatra) [7].
There are few reports on the molecular phylogeny of B. latifrons, B. melastomatos and B. umbrosa. Based on 16S rRNA and cytochrome oxidase I nucleotide sequences, B. latifrons shows close affinity to B. umbrosa and is most basal to subgenus Bactrocera [8]. In another study based on cytochorome oxidase I, B. umbrosa forms a sister group with B. facialis while B. latifrons is basal to subgenus Bactrocera [9]. Based on 17 enzyme loci profile using starch-gel electrophoresis, B. melastomatos is distinct from the lineage of B. dorsalis and B. carambolae [10].
To date, the complete mitochondrial genomes (mitogenomes) of six species of the subgenus Bactrocera-B. arecae, B. carambolae, B. correcta, B. dorsalis (incuding the conspecific taxa B. papayae and B. philippinensis), B. tryoni, and B. zonata-are available in GenBank. We report here the mitogenome of three additional species of the subgenus (B. latifrons, B. melastomatos, and B. umbrosa) determined by next-generation sequencing and their phylogenetic relationships with other taxa of the subgenus Bactrocera.

Ethics statement
B. latifrons, B. melastomatos and B. umbrosa are insect pests. They are not endangered or protected by law. No permits are required to study these fruit flies. isolated by standard differential centrifugation method [13] and the mtDNA was extracted using Mitochondrial DNA Isolation Kit (Abnova, Taipei, Taiwan) following the manufacturer's instructions with minor modification. The mtDNA was eluted using 30 ul elution buffer instead of Tris-EDTA (TE) buffer to avoid interference of Ethylenediaminetetraacetic acid (EDTA) with the enzyme such as transposases.

Sample and library preparation
The purified mtDNA was quantified using Qubit dsDNA High Sensitivity Assay Kit (Life Technologies, USA) and normalized to a final concentration of 50 ng (20 ul of mtDNA at 2.5 ng/ul). Library was prepared using Nextera DNA Sample Preparation Kit (Illumina, USA) following the manufacturer's protocols. Size estimation of the library was performed on a 2100 Bioanalyzer using High Sensitivity DNA Analysis Kit (Agilent Technologies). The library was quantified with Qubit 2.0 Fluorometer (Life Technologies, USA).

Sequence and genome analysis
Raw sequence reads were extracted from the Illumina NextSeq 500 system in FASTQ format. The quality of sequences was evaluated using the FastQC software [14]. All ambiguous nucleotides and reads with an average quality value lower than Q20 were excluded from further analysis. The trimmed sequences were mapped against three reference mitogenomes namely, Bactrocera dorsalis (NC_008748), B. tryoni (NC_014611) and B. zonata (NC_027725) using the CLC Genomic Workbench version 8.0.1 (Qiagen, Germany) with mapping parameters of length fraction = 0.6 and similarity fraction = 0.7. The mapped sequences were then subjected to de novo assembly. Contigs greater than 15 kbp were subjected to BLAST [15] alignment against the nucleotide database at National Center for Biotechnology Information (NCBI). Contigs with hits to mitochondrial genes or genomes were identified and extracted using the CLC Genomic Workbench interface.

Mitogenome identification, annotation and visualization
A single contig which blasted as mitochondrial sequence was manually examined for repeats at the beginning and end of the sequence to establish a circular mtDNA. It was then annotated with MITOS [16] followed by manual validation of the coding regions. Open reading frames (ORFs) were predicted using the NCBI ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The sequin file generated from MITOS was edited and submitted to NCBI according to ORF Finder result and can be accessed at NCBI GenBank using the accession numbers: Bactrocera latifrons KT881556; Bactrocera melastomatos KT881557; and Bactrocera umbrosa KT881558. The circular mitogenome was visualized with Blast Ring Image Generator (BRIG) [17]. Procecidochares utilis NC_020463) were used for phylogenetic comparison. Species of Drosophila (D. incompta NC_025936; D. melanogaster NC_024511; D. yakuba NC_001322) were used as outgroup taxa.

Phylogenetic analysis
The total length of the aligned sequences of each mitogenome comprised of 13 protein-coding genes (PCGs), 2 rRNA genes and 15 mt-genes (13 PCGs, 2 rRNA genes). This data as well as the selected models used for maximum likelihood (ML) and Bayesian Inference (BI) analyses are summarized in Table 1.
Phylograms of 13 concatenated PCGs, 2 rRNA genes and 15 mt-genes were constructed using TreeFinder [24]. Bootstrap values (BP) were generated via 1,000 ML bootstrap replicates. Bayesian analyses were conducted using the Markov chain Monte Carlo (MCMC) method via Mr. Bayes v.3.1.2 [25], with two independent runs of 2×10 6 generations with four chains, and with trees sampled every 200 th generation. Likelihood values for all post-analysis trees and parameters were evaluated for convergence along with burn-in (a specified number of samples from the beginning of the chain to be discarded) using the "sump" command in MrBayes and the computer program Tracer v.1.5 (http://tree.bio.ed.ac.uk/software/tracer/). The first 200 trees from each run were discarded as burn-in (where the likelihood values were stabilized prior to the burn-in), and the remaining trees were used for the construction of a 50% majority-rule consensus tree. Phylogenetic trees were viewed and edited by FigTree v.1.4 [26]. To assess the level of variation, uncorrected pairwise (p) genetic distances were estimated using PAUP Ã 4.0b10 software [27].

Mitogenome analysis and features
The sequencing reads, GC content and base composition of Bactrocera mitogenomes produced by next-generation sequencing on Illumina NextSeq 500 Desktop Sequencer are summarized in Table 2.
The nucleotide compositions of the mitochondrial whole genome, protein-coding genes, rRNA genes and control region of B. latifrons, B. melastomatos and B. umbrosa are summarized in S5-S7 Tables. All three species were A+T rich as expected for mitochondrial genomes. The A + T content for PCGs was lowest in cox1 (61.8% for B. latifrons, 65.0% for B. melastomatos, and 60.5% for B. umbrosa) and highest in nad4l (76.4% for B. latifrons and 74.4% for B. umbrosa) and nad6 (78.7% for B. melastomatos). The A + T content of the non-coding control region was 86.8% for B. latifrons, 89.0% for B. melastomatos and 86.2% for B. umbrosa. For the two ribosomal operons, rrnL had a higher A + T content than rrnS (78.9% vs 74.4% for B. latifrons, 80.2% vs 74.6% for B. melastomatos, and 79.0% vs 73.6% for B. umbrosa). The GC skew content which included the whole genome, PCGs, rRNA genes and control region in the three species were negative indicating a bias toward the use of Cs over Gs. Although the AT skewness value was positive for the whole genome, rRNA genes and control region, it was variable in the individual PCGs.
As     Table 3.

Discussion
Mitochondrial genomes of insects are extensively studied with particular reference to their phylogenetic and evolutionary studies [28]. The use of heterogeneous CAT and CAT 1 GTR models indicates that the complete nucleotide sequences (PCG and PCGRNA) of mitogenome are suitable for resolving higher-level phylogeny of Paraneopteran insects [29]. To date there  Table). They are identical in seven PCGs-nad2, cox1, cox2, cox3, nad4, nad4l, and nad1 (S4 Table). In this study, B. umbrosa differs from the other species in the possession of ATG (instead of GTG) start codon for atp8. B. melastomatos differs from the other species in having ATC (instead of ATT) start codon for nad3 and nad6.
Seven PCGs (cox1, atp6, nad3, nad5, nad6, cob, nad1) have incomplete stop codons in some members of the nine Bactrocera taxa of the subgenus Bactrocera (S4 Table); only TA for cox1 and T for nad1 are present in all the nine taxa. The incomplete stop codons (T and TA) can be converted to TAA by post-translational polyadenylation [30].
Two recent studies on the mitogenomes of Bactrocera fruit flies of the subgenus Bactrocera have reported the sister lineage of B. correcta and B. zonata [41] and that of B. arecae and B. tryoni [10], in addition to the sister lineage of B. dorsalis and B. carambolae. The present results of B. melastomatos being distinct from the lineage of B. dorsalis and B. carambolae agree with earlier finding based on 17 enzyme loci profile using starch-gel electrophoresis [9].
Based on 13 PCGs and 15 mt-genes, B. latifrons and B. umbrosa form a sister group basal to the other members of the subgenus Bactrocera (Fig 2). This finding concurs with that based on rrnL and cox1 sequences [8]. However, it differs from that based on cox1 gene which reveals B. latifrons is most basal to the subgenus Bactrocera but does not form a lineage with B. umbrosa [38]. The species tree differs from the finding based on cox1, rrnL, trnP, nad6 and period genes in which B. latifrons and B. umbrosa do not form a sister lineage [42]. With the inclusion of B. latifrons, the present finding helps to resolve the inference of B. umbrosa (based on cox1, cox2, rrnS and rrnL nucleotide sequences) forming a lineage with B. (Gymnodacus) calophylli instead of with the subgenus Bactrocera [43]. It is evident that a broader taxon sampling and the use of mitogenomes will enable a better understanding of the phylogeny of Bactrocera and other tephritid fruit flies.
In summary, we have successfully sequenced the whole mitochondrial genomes of B. latifrons, B. melastomatos and B. umbrosa by using next generation sequencing technologies. The mitochondrial genome features are similar to other tephritid fruit flies. The phylogenetic species tree based on 13 PCGs is in general concordant with that based on 15 mt-genes. Based on concatenated 13 protein-coding genes and 15 mt-genes of the mitogenome, B. latifrons and B. umbrosa form a sister lineage most basal to the subgenus Bactrocera. The subgenus Bactrocera is monophyletic. The whole mitogenomes will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general.