Comparative mitochondrial genome analysis of Daphnis nerii and other lepidopteran insects reveals conserved mitochondrial genome organization and phylogenetic relationships

In the present study, the complete sequence of the mitochondrial genome (mitogenome) of Daphnis nerii (Lepidoptera: Sphingidae) is described. The mitogenome (15,247 bp) of D.nerii encodes13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), two ribosomal RNA genes (rRNAs) and an adenine (A) + thymine (T)-rich region. Its gene complement and order is similar to that of other sequenced lepidopterans. The 12 PCGs initiated by ATN codons except for cytochrome c oxidase subunit 1 (cox1) gene that is seemingly initiated by the CGA codon as documented in other insect mitogenomes. Four of the 13 PCGs have the incomplete termination codon T, while the remainder terminated with the canonical stop codon. This mitogenome has six major intergenic spacers, with the exception of A+T-rich region, spanning at least 10 bp. The A+T-rich region is 351 bp long, and contains some conserved regions, including ‘ATAGA’ motif followed by a 17 bp poly-T stretch, a microsatellite-like element (AT)9 and also a poly-A element. Phylogenetic analyses based on 13 PCGs using maximum likelihood (ML) and Bayesian inference (BI) revealed that D. nerii resides in the Sphingidae family.


Background
The oleander hawk moth, D.nerii (Lepidoptera: Sphingidae) is one of the most widely distributed species of Sphingidae. It occursin the tropical and subtropical regions ranging from Africa to south-east Asia. It was first reported on Guam in August, 2005 as a plant pest. It feeds on a variety of plant species ranging from shrubs to trees such as Catharanthus, Vinca, Adenium, Vitis, Tabernaemontana , Gardenia, Trachelospermum, Amsonia, Asclepias, Carissa, Rhazya, Thevetia, Jasminum and Ipomoea. While, Nerium oleander has been documented as the most preferred host of the D.nerii. The management of this species is extremely important and that require deep knowledge on its different biological aspects [1]. Although a few studies are a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 available on its ecology, reproduction and development and so on but its genetic characteristics are rarely documented. To improve the management of the D.nerii, it is extremely important to know more knowledge about this pest, particularly its genetic characteristics and phylogentic position. Moreover, the study of mitogenome is an important subject to understand molecular evolution, comparative and evolutionary genomics, phylogenetics, and population genetics [2][3][4].
The metazoan mitogenome is a closed-circular DNA molecule, ranged in size from 14 to 19 kilobases (kb), including intergenic spacers being very short or absent [5]. It contains 13 protein-coding genes (PCGs), 2 ribosomal RNA genes (rRNAs), and 22 transfer RNA genes (tRNAs) [6]. In addition, there is one major non-coding region (control region) that in other Lepidopterans and in invertebrates is named as A+T-rich region because of its enormously high content in Adenines and Thymines. This control region is generally believed to control the initiation of transcription and replication of animal mitogenome [7].
The order Lepidoptera is one of the largest insect orders and includes greater than 160 000 described species that are classified into 45-48 superfamilies [8]. Sphingidae is one of the most diverse superfamilies, and contains 203 genera and 1348 species distributed worldwide. Despite this enormous species diversity, only two complete mitogenomes are available in Gen-Bank (Table 1) [9]. Newly accessible Lepidoptera mitogenomes will provide further insight into our understanding of evolutionary relationships between these species. In this study, we described the complete sequence of the mitogenome of D. nerii and compared it with other Lepidoptera species sequenced to date to highlight evolution of Lepidopterans, particularly, phylogenetic relation-ships of Bombycoidea.

Experimental insects and DNA extraction
The D. nerii specimens were collected from Anhui Agricultural University, Anhui Province, China. The total DNA was extracted using the Genomic DNA Extraction Kit, according to the manufacturer's instructions (Aidlab Co., Beijing, China). The extracted DNA quality was examined by 1% agarose gel electrophoresis (w/v) and used to amplify the complete mitogenome of D. nerii.

PCR amplification and sequencing
We designed twelve pairs of primers from the conserved nucleotide sequences of known mitogenome of Lepidopteran species to determine the D. nerii mitogenome [10,11]. The complete list of successful primer is given in Table 2 (Sangon Biotech Co., Shanghai, China). All amplifications were performed on an Eppendorf Mastercycler and Mastercycler gradient in 50μL reaction volumes, which contained 35μL sterilized distilled water, 5μL 10×Taq buffer (Mg 2+ plus), 4 μL dNTP (25 mM), 1.5 μL extracted DNA as template, forward and reverse primers 2 μL each (10 μM) and 0.5 μL (1 unit) TaqDNA polymerase (Takara Co., Dalian, China). The PCR amplification conditions were as follows: an initial denaturation one cycle at 94˚C for 4 min followed by 38 cycles, one cycle at 94˚C for 30 s, one cycle at 48-59˚C for 1-3 min (depending on the putative length of the fragments), and a final extension one cycle at 72˚C for 10 min. The PCR products were detected by 1% agarose gel electrophoresis (w/v), and were purified using a DNA gel extraction kit (Transgen Co., Beijing, China), and directly sequenced with PCR primers.

Sequence assembly and gene annotation
Sequence annotation was performed using blast tools available from the NCBI (https://blast. ncbi.nlm.nih.gov/Blast.cgi), and SeqMan II program from the Lasergene software package (DNASTAR Inc.; Madison, USA). The protein-coding sequences were translated into putative proteins on the basis of the Invertebrate Mitochondrial Genetic Code. The skewness was measured by the method given by Junqueiraet al. [12], and the base composition of nucleotide sequences were described as: AT skew = [A−T]/[A+T], GC skew = [G−C]/[G+C]. The relative synonymous codon usage (RSCU) values were calculated using MEGA 5.1 [13].
The tRNA genes were determined using the tRNAscan-SE software (http://lowelab.ucsc.edu/ tRNAscan-SE/) [14], or predicted by sequence features of being capable of folding into the typical Thitarodes pui 15,064 NC_023530 [54] cloverleaf secondary structure with legitimate anticodon. The tandem repeats in the A+T-rich region were determined by the tandem repeats finder program (http://tandem.bu.edu/trf/trf.html) [15].

Phylogenetic analysis
To reconstruct the phylogenetic relationship among Lepidopterans, 36 complete or near-complete mitogenomes were downloaded from the GenBank database (Table 1). The mitogenomes of Drosophila melanogaster (U37541.1) [16] and Locusta migratoria (NC_001712) [17] were used as outgroup. The multiple alignments of the 13 PCGs concatenated nucleotide sequences were conducted using ClustalX version 2.0. [18]. Then concatenated set of nucleotide sequences from the 13 PCGs was used for phylogenetic analyses, which were performed using Maximum Likelihood (ML) method with the MEGA version 5.1 program [13] and Bayesian Inference (BI) with MrBayes 3.2 version program [19]. The ML analyses were used to infer phylogenetic trees with 1000 bootstrap replicates. BI analysis as the following conditions: the Markov chains were run for 100,000 generations with trees being sampled every 100 generations. The consensus trees were visualized by FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/) program with adjustable settings.

Genome structure, organization and composition
The complete sequence of the mitogenome of D.nerii is 15,247 bp in length (S1 File and Fig 1), which is well within the range observed in the whole sequenced Lepidoptera species with the Table 2. Details of the primers used to amplify the mitogenome of D. nerii.  ) and a large non-coding-region with high A+T-rich composition that is usually found in most animal mtDNAs ( Table 3). The gene arrangement and orientation of D.nerii mitogenome is trnM-trnI-trnQ that is different from the ancestral gene order trnI-trnQ-trnM [2]. The comparison of D. nerii mitogenome composition and skewness level with other sequenced Lepidoptera species is represented in Table 4. The genome composition of the  formosae) to -0.174 (G. argentata). Moreover the positive AT skewness (0.017) indicates the occurrence of more As than Ts that has also been reported in several other lepidopteran species such as B. mandarina (0.057), H. cunea (0.010) and L. dispar (0.016).

Protein-coding genes and codon usage
The mitogenome of D.nerii contains 13 protein-coding genes. Most protein-coding genes (12 PCGs) begin with ATN (one with ATA, two with ATT, seven with ATG and two with ATC) codons, except for the cox1. The cox1 gene of D.nerii seems to be started with CCA codon as previously documented in Cerura menciana [20] and in Spoladea recurvalis [21]. Several authors have maintained the problematic translational start at the cox1 locus in many insect species, with TTAG, ACG, and TTG proposed as start codons for cox1 [22][23][24]. A most common stop codon of the PCGs is TAA, but an incomplete termination stop codon T is present at cox1, cox2, nad5 and nad4. This has been well documented in other invertebrate mitogenomes and is a common evolutionary feature shared by mtDNA. The single T stop codon was recognized by endonucleases processing the polycistronic pre-mRNA transcription, and produced functional stop codons by polyadenylation from its contiguous PCGs [25]. We analyzed the codon usage among eight Lepidopteran species, of which four belong to Bombycoidea and one each from Noctuoidea, Tortricoidea, Hesperioidea and Geometroidea (Fig 2). The results revealed that Asn, Ile, Leu2, Lys, Phe, Tyr and Met were the most frequently utilized amino acids. There were at least 4 codon families with no less than 60 codons per thousand codons (Leu2, Lys, Met, and Tyr), and 3 families with at least 80 codons per thousand codons (Asn, Ile and Phe) that were observed in the 8 insect species. The rarest used codon family was Arg. Codon distributions of four species in Bombycoidea are in consistency, and each amino acid has equal contents in different species (Fig 3).  The Relative Synonymous Codon Usage (RSCU) was assessed in the PCGs for five available Lepidopteran superfamilies mitogenomes (Fig 4). All possible codon combinations are present in the PCGs of D.nerii except for the GCG. The absence of codons containing high GC content is also a characteristic feature of several Lepidopteran species such as M. sexta (CGG&CGC), H. cunea(GCG), G. argentata (GCG&CGC&CCG), P. atrilineata (CGG), C. pomonella (GCG), H. vitta (GCG), and so on. Further, these codons are likely to be less, and this featureis conserved in insect mitogenomes [20,26].

Ribosomal and transfer RNA genes
The mitogenome of D.nerii includes two rRNA genes usually present in other animals sequenced to date. The large ribosomal gene (rrnL) is 1338 bp long, and resided between tRNA Leu (CUN) and tRNA Val, whereas the small ribosomal gene (rrnS) is only 778 bp long, and located between tRNA Val and A+T-rich region ( Table 3). The A+T content (84.74%) of two rRNAs fall within the range from 80.16% (A.pernyi) to 85.93% (P. atrilineata) of Lepidopterans. Both AT skewness (-0.006) and GC skewness (-0.362) are negative, that is similar to other previously sequenced Lepidopteran mitogenome [6,27].
The D.nerii harbors an entire set of 22 tRNA (ranging from 64 to 71 nucleotides in length) commonly present in most of Lepidopteran mitogenomes. This region is highly A+T biased, accounting for 82.53%, and exhibit positive AT-skewness (0.012), while negative GC skewness (-0.155) ( Table 4). All tRNA spresented the typical cloverleaf secondary structure but trnS1 lacked the DHU stem (Fig 5) similar to several other previously sequenced Lepidopterans [10,28]. Moreover 14 of the 22 tRNA genes were coded by the H-strand and remainder eight by the L-strand.
A total of 15 mismatched bps in the D.nerii tRNAs were identified. Most of them are G-U wobble pairs scatter throughout ten tRNAs (two in acceptor stem, seven in DHU, one in anticodon stem, and one in TψC), a A-A mismatch in the anticodon stem of the trnS1 and three U-U mismatches in acceptor stem of the trnA, trnL2 and trnS1 (Fig 5).

Overlapping and intergenic spacer regions
The mitogenome of D.nerii contains 12 overlapping regions with a total length of 26 bp. The six overlapping regions are resided between tRNA and tRNA (trnW and trnC, trnA and trnR, trnD and trnS1, trnS1 and trnE, trnE and trnF, trnT and trnP), two between tRNA and protein (nad2 and trnW, trnS2 and cytb), and four between protein and protein (atp6 and atp8, atp6  and cox3, nad4 and nad4L, nad6 and cytb). The length of these sequences varies from 1 bp to 7 bp with the longest overlapping region present between atp6 and atp8 (Table 3), which is usually found in Lepidopteran mitogenomes [29,30]. Further, we observed the longest region in ten Lepidopteran species (Fig 6), Which indicates the seven nucleotides sequence ATGATAA is a strikingly, common feature across Lepidopteran mitogenomes [6]. The mitogenome of D. nerii has 12 intergenic spacers in a total of 126 bp with a length varying in 1 to 55 bp. Of which there are four major intergenic spacers at least 10 bp in length ( Table 3). The longest intergenic spacer (55bp) is located between the trnQ and nad2, with an extremely high A+T nucleotides content, this characteristic feature has been frequently described in Lepidopteran mitogenomes [21]. The 19 bp spacer between trnS2 (UCN) and nad1 contains the motif ATACTAA ( Fig 7A) that is highly conserved region and found in most insect mtDNAs, and it seems to be a possible mitochondrial transcription termination peptide-binding site (mtTERM protein) [31].

The A+T-rich region
The A + T-rich region of D.nerii mitogenome is located between the rrnS and trnM with a length of 351 bp that is remarkably shorter than G. dimorpha (848 bp) and longer than S.  (Table 4). This region harbors the highest A+T content (95.16%) in the mtDNA, and most negative AT skewness (-0.126) and GC skewness (-0.413) ( Table 4). We identified several short repeating sequences scattered throughout the entire region, including the motif ATAGA followed by a 17 bp poly-T stretch, a microsate-like (AT) 9 element and a poly-A element upstream of trnM gene similar to other Lepidopteran mitogenomes (Fig 7B). The length of poly-T stretch varies from species to species [6,20], while ATAGA region is conserved in Lepidoptera species [9].

Phylogenetic analyses
To reconstruct the phylogenetic relationship among Lepidopteran insects, the nucleotide sequences of the 13 PCGs were firstly aligned and then concatenated. The phylogenetic analyses showed that D.nerii has a close relationship to M. sexta and S. morio that was well supported from both BI and ML analyses (Fig 8A and 8B). The D. nerii is within the family Sphingidae (Bombycoidea) and clustered with other superfamilies, including the Geometroidea, Noctuoidea, Pyraloidea, Gelechioidea, Papilionoidea, Tortricoidea, Yponomeutoidea and Hepialoidea. Further the analyses revealed that Sphingidae is more closely related to Bombycidae than Saturniidae. Interestingly, Bombycoidea was more closely related to Noctuoideain ML methods, while in BI method Bombycoidea closely related to Geometroidea. These phylogenetic relationships are consistent with previously reportedstudies of Lepidopterans [11,32]. We concluded from the present study that more research on the diverse Lepidoptera species is needed, to be able to understand better the relationships among them. Supporting information S1 File. Mitochondrial genome of Daphnis nerii. (SEQ)