Characterization of the Complete Mitochondrial Genome of Cerura menciana and Comparison with Other Lepidopteran Insects

The complete mitochondrial genome (mitogenome) of Cerura menciana (Lepidoptera: Notodontidae) was sequenced and analyzed in this study. The mitogenome is a circular molecule of 15,369 bp containing 13 protein-coding genes (PCGs), two ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes and an A+T-rich region. The positive AT skew (0.031) indicated that As were more abundant than Ts. All PCGs were initiated by ATN codons, except for the cytochrome c oxidase subunit 1 (cox1) gene, which was initiated by CGA. Three of the 13 PCGs had an incomplete termination codon (T or TA), while the others were terminated with the stop codon TAA. The A+T-rich region was 372 bp long and contained an 'ATAGA' motif followed by an 18 bp poly-T stretch, a microsatellite-like (AT)8 element and a poly-A element upstream of the trnM gene. Codon usage analysis indicated that Asn, Ile, Leu2, Lys, Tyr and Phe were the six most frequently used amino acids, while Cys was the rarest. Phylogenetic analysis based on the concatenated nucleotide sequences of the 13 PCGs from C. menciana and other insect mitogenomes confirmed that C. menciana belongs to the family Notodontidae.


Introduction
The insect mitochondrial DNA (mtDNA) is a circular molecule 14-19 kb in size [1]. It contains seven NADH dehydrogenase genes (nad1-nad6 and nad4L), three cytochrome c oxidase genes (cox1-cox3), two ATPase genes (atp6 and atp8), one cytochrome b (cob) gene, two ribosomal RNA genes (rrnL and rrnS), 22 transfer RNA (tRNA) genes and an adenine (A) + thymine (T)-rich region that contains initiation sites for transcription and replication of the genome [2,3]. MtDNA is maternally inherited and undergoes little or no recombination; it is therefore useful for species identification and for studying population genetic structure and molecular evolution [4][5][6][7].

PCR amplification, cloning and sequencing
To amplify the whole mitogenome of C. menciana, we designed thirteen pairs of universal primers based on published mitogenomes of other Notodontidae insects; the primers were synthesized by Sangon Biotech Co. (Shanghai, China) (Table 2). All PCRs were performed in a 50 μL reaction volume containing 35 μL sterile distilled water, 5 μL 10× Taq buffer (containing Mg2+), 4 μL dNTPs (25 mM), 1.5 μL template DNA, 2 μL of each primer (10 μM) and 0.5 μL (1 unit) Taq polymerase (Aidlab Co., Beijing, China). PCR conditions were as follows: initial denaturation at 94°C for 4 min; 35 cycles of 94°C for 30 s, 49-58°C for 40 s (depending on the primer pair) and 72°C for 1-3 min (depending on the expected fragment length); and a final extension at 72°C for 10 min. PCR products were separated on a 1% agarose gel and purified using a DNA gel extraction kit (Transgen Co., Beijing, China). The purified fragments were ligated into a T-vector (TaKaRa Co., Dalian, China) and transformed into Escherichia coli DH5α. Transformants were cultured overnight at 37°C on Luria-Bertani (LB) agar plates containing ampicillin (AMP), isopropyl β-D-1-thiogalactopyranoside (IPTG) and 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal). White colonies carrying insert DNA were selected, grown overnight in liquid medium, and sequenced at least three times by Invitrogen Co. Ltd. (Shanghai, China).
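The reaction recipe above can be checked with simple arithmetic: the component volumes should sum to 50 μL, and each final concentration follows from the dilution formula C1V1 = C2V2. The sketch below is illustrative only. The Taq stock of 2 U/μL is inferred from "0.5 μL (1 unit)", and treating the 25 mM dNTP stock as a single concentration is an assumption, since the text does not say whether it refers to each nucleotide or to the mix.

```python
# Illustrative check of the 50 uL PCR recipe described above.
# Assumptions: the Taq stock is 2 U/uL (inferred from "0.5 uL (1 unit)"), and the
# 25 mM dNTP stock is treated as one concentration (per-dNTP vs total is unstated).

REACTION_VOLUME_UL = 50.0

# name -> (volume in uL, stock concentration or None, unit or None)
COMPONENTS = {
    "water": (35.0, None, None),
    "10x Taq buffer": (5.0, 10.0, "x"),
    "dNTP": (4.0, 25.0, "mM"),
    "template DNA": (1.5, None, None),
    "forward primer": (2.0, 10.0, "uM"),
    "reverse primer": (2.0, 10.0, "uM"),
    "Taq polymerase": (0.5, 2.0, "U/uL"),
}


def total_volume(components: dict) -> float:
    """Sum of all component volumes in uL."""
    return sum(volume for volume, _, _ in components.values())


def final_concentration(volume_ul: float, stock: float,
                        reaction_ul: float = REACTION_VOLUME_UL) -> float:
    """Dilution formula C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock * volume_ul / reaction_ul


if __name__ == "__main__":
    print(f"total volume: {total_volume(COMPONENTS):g} uL")
    for name, (volume, stock, unit) in COMPONENTS.items():
        if stock is not None:
            print(f"{name}: {final_concentration(volume, stock):g} {unit}")
```

With these inputs the volumes sum to 50 μL, the buffer ends at 1×, and each primer ends at 0.4 μM in the reaction.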

Sequence assembly and gene annotation
The final consensus sequence of the C. menciana mtDNA was assembled using the SeqMan II program from the Lasergene software package (DNAStar Inc., Madison, USA).

Table 2. Details of the primers used to amplify the mitogenome of C. menciana.

Primer pair
Primer sequence (5'→3')

Sequence annotation was performed using the online BLAST tools on the NCBI website (http://blast.ncbi.nlm.nih.gov/Blast). The nucleotide sequences of the PCGs were initially translated into putative proteins using the invertebrate mitochondrial genetic code. The exact initiation and termination codons were then identified by alignment with reference sequences from other lepidopteran insects in ClustalX version 2.0. To describe the base composition of nucleotide sequences, we calculated composition skewness as described by Junqueira [12]: AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C), where A, T, G and C denote the counts of each nucleotide. Relative Synonymous Codon Usage (RSCU) values were calculated using MEGA 5.0 [13]. Overlapping regions and intergenic spacers between genes were counted manually.
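Composition skew is conventionally computed as AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C), and both translate directly into code. A minimal sketch, assuming the input is a plain nucleotide string read from the strand on which skew is reported; which strand that is, and whether ambiguous bases are dropped (as done here), are choices the text does not specify.

```python
from collections import Counter


def base_counts(seq: str) -> Counter:
    """Count A, T, G and C; ambiguity codes and gaps are ignored."""
    return Counter(base for base in seq.upper() if base in "ATGC")


def at_content(seq: str) -> float:
    """Fraction of A+T among unambiguous bases."""
    counts = base_counts(seq)
    total = sum(counts.values())
    return (counts["A"] + counts["T"]) / total if total else 0.0


def at_skew(seq: str) -> float:
    """AT skew = (A - T) / (A + T); positive means A is more frequent than T."""
    counts = base_counts(seq)
    denom = counts["A"] + counts["T"]
    return (counts["A"] - counts["T"]) / denom if denom else 0.0


def gc_skew(seq: str) -> float:
    """GC skew = (G - C) / (G + C); negative means C is more frequent than G."""
    counts = base_counts(seq)
    denom = counts["G"] + counts["C"]
    return (counts["G"] - counts["C"]) / denom if denom else 0.0


if __name__ == "__main__":
    toy = "AAATTGCC"  # hypothetical toy sequence, not real mitogenome data
    print(at_content(toy), at_skew(toy), gc_skew(toy))
```

For the toy sequence (A = 3, T = 2, G = 1, C = 2) the AT skew is 1/5 and the GC skew is -1/3, matching the sign conventions used throughout the results.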
The tRNA genes were identified either with tRNAscan-SE using default settings [14] or by manually identifying sequences that contained the appropriate anticodon and could fold into the typical cloverleaf secondary structure. Tandem repeats in the A+T-rich region were identified with the Tandem Repeats Finder program (http://tandem.bu.edu/trf/trf.html) [15].

Phylogenetic analysis
Twenty lepidopteran mitogenomes were downloaded from GenBank to reconstruct phylogenetic relationships among lepidopteran insects. The mitogenomes of Drosophila incompta (NC_025936) [16] and Anopheles gambiae (NC_002084) [17] were used as outgroups. Multiple alignment of the concatenated nucleotide sequences of the 13 PCGs was conducted using ClustalX version 2.0, and the phylogenetic analysis was performed using the maximum likelihood (ML) method in MEGA 5.0 [13].
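The text does not state whether each PCG was aligned separately before concatenation; one common approach, assumed here, is to align each gene on its own and then join the alignments gene by gene into a single matrix for ML analysis. The sketch below shows only that concatenation step (the function names and toy taxa are hypothetical); alignment in ClustalX and tree inference in MEGA are not reproduced.

```python
def concatenate_alignments(per_gene: dict, gene_order: list):
    """Join per-gene alignments into one supermatrix.

    per_gene maps gene -> {taxon: aligned sequence}. A taxon missing a gene is
    padded with gaps of that gene's alignment length (an assumption; complete
    mitogenomes should not need it). Returns (matrix, partitions), where
    partitions lists (gene, first column, last column) with 1-based columns.
    """
    taxa = sorted({taxon for aln in per_gene.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    next_column = 1
    for gene in gene_order:
        aln = per_gene[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length; is it aligned?")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, next_column, next_column + length - 1))
        next_column += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


def to_fasta(matrix: dict) -> str:
    """Serialize the supermatrix as FASTA text."""
    return "".join(f">{taxon}\n{seq}\n" for taxon, seq in matrix.items())


if __name__ == "__main__":
    toy = {  # hypothetical two-gene, two-taxon example
        "atp8": {"taxon_A": "ATG-", "taxon_B": "ATGA"},
        "nad3": {"taxon_A": "TTT", "taxon_B": "TT-"},
    }
    matrix, partitions = concatenate_alignments(toy, ["atp8", "nad3"])
    print(to_fasta(matrix), partitions)
```

Keeping the partition boundaries alongside the matrix makes it easy to fit gene- or codon-specific models later, should the analysis move beyond a single unpartitioned ML model.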

Protein-coding genes and codon usage
The 13 PCGs of C. menciana totaled 11,190 bp, accounting for 72.81% of the whole mitogenome. Nine of these PCGs (nad2, cox1, cox2, atp8, atp6, cox3, nad3, nad6 and cob) were encoded on the H-strand, while the remaining four (nad5, nad4, nad4L and nad1) were encoded on the L-strand. The AT skew of the PCGs was positive (0.038), indicating that As outnumbered Ts. All PCGs started with the canonical ATN start codons except cox1, which started with CGA, as in other lepidopterans [22,23]. Ten genes terminated with the complete stop codon TAA, while three genes used incomplete stop codons (a single T for cox1 and cox2, and TA for nad4). A single T as the stop codon of cox1 and cox2 has been reported in most sequenced lepidopteran mitogenomes, and even in some mammalian mitochondrial genes [20,22].
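The start- and stop-codon categories described above can be expressed as small classification rules. This is a sketch assuming each CDS is given 5'→3' on its coding strand and already trimmed at the annotated boundaries; the function names are hypothetical, and the codon assignments in the text were made by alignment, not by rules like these. The last call reproduces the 72.81% PCG coverage from the stated lengths.

```python
def classify_start(codon: str) -> str:
    """Label a start codon as canonical ATN or non-canonical (e.g. CGA in cox1)."""
    codon = codon.upper()
    return "ATN" if len(codon) == 3 and codon.startswith("AT") else "non-canonical"


def classify_stop(cds: str) -> str:
    """Classify the 3' end of a CDS whose length need not be a multiple of 3.

    A trailing T or TA is reported as an incomplete stop codon; such codons are
    commonly thought to be completed to TAA by post-transcriptional polyadenylation.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        last = cds[-3:]
        return f"complete {last}" if last in ("TAA", "TAG") else "no stop"
    tail = cds[-remainder:]
    return f"incomplete {tail}" if tail in ("T", "TA") else "no stop"


def percent_of_genome(feature_bp: int, genome_bp: int) -> float:
    """Share of the genome covered by a feature set, rounded to two decimals."""
    return round(100 * feature_bp / genome_bp, 2)


if __name__ == "__main__":
    print(classify_start("ATG"), classify_start("CGA"))
    print(classify_stop("ATGAAATAA"), classify_stop("ATGAAAT"), classify_stop("ATGAAATA"))
    print(percent_of_genome(11190, 15369))  # PCG total vs genome length from the text
```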
We compared codon usage among eight lepidopteran mitogenomes representing five superfamilies: four species from Noctuoidea and one species each from Bombycoidea, Pyraloidea, Tortricoidea and Papilionoidea (Fig 2). Asn, Ile, Leu2, Lys, Tyr and Phe were the six most frequently used amino acids, while Cys was rare. Codon distributions were consistent among the four noctuoid species, and the content of each amino acid was similar across them (Fig 3). All codons were present in the PCGs of the C. menciana mitogenome (Fig 4). This was also the case for L. dispar, A. selene and Tyspanodes hypsalis, but not for A. ipsilon, H. cunea, C. pomonella and Luehdorfia taibai, which lacked GCG and GGC; GCG and GTG; GCG; and CGG, CAG and GTG, respectively. GC-rich codons are also absent from the mitogenomes of some other lepidopteran insects [4,24].
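RSCU, mentioned in the methods, is the observed count of a codon divided by the mean count of all synonymous codons for its amino acid, so values above 1 mark preferred codons and 0 marks an absent codon such as GCG in some of the species above. The sketch below uses NCBI translation table 5 (the invertebrate mitochondrial code) and hypothetical function names; it groups synonymous codons by amino acid, so Leu and Ser are not split into the tRNA-based Leu1/Leu2 and Ser1/Ser2 families used in the text, and it is not MEGA's implementation.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial code); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... GGG, with the third position varying fastest.
TABLE5_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), TABLE5_AAS)}

SYNONYMS = defaultdict(list)  # amino acid -> its codons ('*' = stop)
for codon, aa in CODE.items():
    SYNONYMS[aa].append(codon)


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons; an incomplete trailing codon and ambiguous codons are skipped."""
    cds = cds.upper()
    counts = Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in CODE:
            counts[codon] += 1
    return counts


def rscu(counts: Counter) -> dict:
    """RSCU = observed codon count / mean count over that amino acid's synonymous codons.

    Stop codons are excluded. Where an amino acid is absent, RSCU is undefined and
    is reported as 0.0 here (an arbitrary convention).
    """
    values = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*":
            continue
        total = sum(counts[c] for c in codons)
        for c in codons:
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


def absent_sense_codons(counts: Counter) -> list:
    """Sense codons never observed in the counted sequences."""
    return sorted(c for c, aa in CODE.items() if aa != "*" and counts[c] == 0)
```

Pooling `codon_counts` over all 13 PCGs before calling `rscu` gives genome-level values comparable to those plotted per species.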

Ribosomal RNA and transfer RNA genes
The rrnL and rrnS genes of C. menciana were located between trnL1 (CUN) and trnV, and between trnV and the A+T-rich region, respectively. The rrnL gene was 1358 bp long and rrnS was 779 bp. The combined A+T content of the two rRNA genes was 83.81%, within the previously reported range (80.16% in Antheraea pernyi to 85.53% in Lista haraldusalis; Table 3). The AT skew was positive (0.022) and the GC skew negative (-0.416), similar to other sequenced lepidopteran mitogenomes [5,25]. The C. menciana mitogenome harbored 22 tRNA genes, ranging from 64 bp (trnR) to 73 bp (trnW) in length. Fourteen were encoded on the H-strand and the rest on the L-strand (Table 3). The tRNA genes were also strongly A+T biased (82.13%) and showed a positive AT skew (0.026; Table 4). All tRNAs could be folded into the typical cloverleaf secondary structure except trnS1 (AGN) (Fig 5), whose dihydrouridine (DHU) arm is reduced to a simple loop, as is often found in other insect mitogenomes [26][27][28]. Ten mismatched G-U pairs occurred in the C. menciana mitochondrial tRNA genes, and trnA also contained a U-U mismatch in its acceptor stem. All of these mismatches were located in the acceptor, DHU and anticodon stems, and they were distributed among 10 of the 22 tRNA genes: trnA, trnC, trnQ, trnG, trnL1 (CUN), trnL2 (UUR), trnF, trnP, trnS1 (AGN) and trnV (Fig 5). All secondary structures were drawn with the RNAstructure program.
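The distinction drawn above between canonical pairs, G-U pairs and other mismatches such as the U-U pair in the trnA acceptor stem can be made explicit with a small classifier. A sketch only: it assumes the two arms of each stem have already been delimited from a folded structure (for example tRNAscan-SE output or the RNAstructure drawings), and the input format and function names are hypothetical.

```python
from collections import Counter

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(five_prime_base: str, three_prime_base: str) -> str:
    """Label one stem pair as Watson-Crick, G-U, or another mismatch (e.g. U-U)."""
    pair = (five_prime_base.upper(), three_prime_base.upper())
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WOBBLE:
        return "G-U"
    return "other mismatch"


def scan_stem(strand_5: str, strand_3: str) -> Counter:
    """Tally pair types in a stem.

    Both arms are given 5'->3' (T is read as U), so strand_5[i] pairs with
    strand_3[-1 - i].
    """
    strand_5 = strand_5.upper().replace("T", "U")
    strand_3 = strand_3.upper().replace("T", "U")
    if len(strand_5) != len(strand_3):
        raise ValueError("stem arms must have equal length")
    return Counter(classify_pair(a, b) for a, b in zip(strand_5, reversed(strand_3)))
```

Summing these tallies over the acceptor, DHU, anticodon and TΨC stems of each tRNA yields per-gene counts of the kind reported above.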

Overlapping and intergenic spacer regions
Eleven overlapping sequences totaling 33 bp were identified in the C. menciana mitogenome. They ranged from 1 to 8 bp in length, with the longest overlap between trnW and trnC (Table 3). Other overlaps included 7 bp between atp8 and atp6, 4 bp between nad4 and nad4L, and 3 bp between trnI and trnQ; all remaining overlaps were shorter than 3 bp (Table 3). The 7-bp overlap between atp8 and atp6 is common in lepidopteran mitogenomes [29,30].
Intergenic spacers in the C. menciana mitogenome were spread over 15 regions, ranged from 1 to 57 bp in size and totaled 205 bp. The longest spacer (57 bp), located between trnQ and nad2, was extremely A+T rich. Intergenic spacers in C. menciana were shorter in total than those in O. lunifer (371 bp over 20 regions) but longer than those in A. selene (137 bp over 13 regions) [5,21]. The 18 bp spacer between trnS2 (UCN) and nad1 contained the 'ATACTAA' motif. This 7 bp motif was found across the 11 species from different families that we examined, suggesting that this region is conserved and may be present in most insect mtDNAs (Fig 6A).
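The methods note that overlaps and spacers were counted manually; given annotated gene coordinates, the same tallies can be computed automatically. This is a sketch assuming 1-based inclusive coordinates on a single linearized sequence; the class and function names are hypothetical, and the coordinates in the example are invented for illustration and are not taken from Table 3.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def adjacent_gaps(features):
    """Gap between each pair of neighbouring features, ordered by start position.

    gap < 0: overlap of -gap bp; gap == 0: directly adjacent; gap > 0: spacer.
    The wrap-around junction of the circular genome is not considered.
    """
    ordered = sorted(features, key=lambda f: f.start)
    return [(left.name, right.name, right.start - left.end - 1)
            for left, right in zip(ordered, ordered[1:])]


def summarize(gaps):
    """Count overlap and spacer regions and their total lengths in bp."""
    overlaps = [-gap for _, _, gap in gaps if gap < 0]
    spacers = [gap for _, _, gap in gaps if gap > 0]
    return {"overlap_regions": len(overlaps), "overlap_bp": sum(overlaps),
            "spacer_regions": len(spacers), "spacer_bp": sum(spacers)}


if __name__ == "__main__":
    # Hypothetical coordinates chosen to mimic a 7 bp atp8/atp6 overlap.
    demo = [Feature("atp8", 1, 162), Feature("atp6", 156, 833), Feature("cox3", 838, 1629)]
    gaps = adjacent_gaps(demo)
    print(gaps)
    print(summarize(gaps))
```

Applied to a full annotation, `summarize` returns the region counts and totals of the kind reported above (11 overlaps totaling 33 bp, 15 spacers totaling 205 bp).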
The A+T-rich region
The A+T-rich region had the highest A+T content (94.35%) and the most negative AT skew (-0.060) of the regions compared, with a GC skew of -0.143. Multiple tandem repeat elements have been reported as a characteristic of the insect A+T-rich region [31]. For example, the A+T-rich region of M. separata contains a 51 bp repeat element that occurs twice [8], while those of Cnaphalocrocis medinalis and Chilo suppressalis contain duplicated 25 bp and 31 bp repeat elements, respectively [32]. We found no conspicuous long repeats in the A+T-rich region of C. menciana. However, several short repetitive sequences were scattered throughout the region, including an 'ATAGA' motif followed by an 18 bp poly-T stretch, a microsatellite-like (AT)8 element and a poly-A element upstream of the trnM gene (Fig 6B). These features resemble those found in other lepidopteran mitogenomes [21,33-35]. In addition, extra tRNA-like structures have been reported in the A+T-rich region of some lepidopteran insects, such as the Chinese B. mandarina [31]; however, we did not detect any tRNA-like structure in the C. menciana A+T-rich region.
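The short elements described above (the ATAGA motif, the poly-T stretch, the (AT)8 microsatellite and the poly-A element) can be located with simple regular-expression scans. This is a sketch with hypothetical function names and thresholds; it is not the algorithm used by Tandem Repeats Finder, which also tolerates mismatches and indels, and it only finds exact, uninterrupted runs.

```python
import re


def find_homopolymer_runs(seq: str, base: str, min_len: int):
    """(start, length) of runs of one base at least min_len long; start is 0-based."""
    pattern = re.compile(f"{re.escape(base.upper())}{{{min_len},}}")
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


def find_tandem_units(seq: str, unit: str, min_units: int):
    """(start, copies) for uninterrupted tandem copies of a short unit such as AT."""
    unit = unit.upper()
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_units},}}")
    return [(m.start(), (m.end() - m.start()) // len(unit))
            for m in pattern.finditer(seq.upper())]


def find_motif(seq: str, motif: str):
    """0-based start positions of a motif, including overlapping occurrences."""
    lookahead = f"(?={re.escape(motif.upper())})"
    return [m.start() for m in re.finditer(lookahead, seq.upper())]


if __name__ == "__main__":
    # Hypothetical toy region, not the real C. menciana sequence.
    demo = "GGATAGA" + "T" * 18 + "CC" + "AT" * 8 + "GG" + "A" * 10
    print(find_motif(demo, "ATAGA"))
    print(find_homopolymer_runs(demo, "T", 18))
    print(find_tandem_units(demo, "AT", 8))
    print(find_homopolymer_runs(demo, "A", 10))
```

The same `find_motif` call can be pointed at the trnS2 (UCN)-nad1 spacer to look for the 'ATACTAA' motif.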

Phylogenetic relationships
We reconstructed the phylogenetic relationships among seven lepidopteran superfamilies using the ML method based on the concatenated nucleotide sequences of the 13 PCGs. In the resulting tree, species from the same family clustered together (Fig 7), and C. menciana was most closely related to P. flavescens of the family Notodontidae. Noctuoidea was closely related to Bombycoidea and Geometroidea, whereas Hepialoidea was sister to all other superfamilies. These results are consistent with previous studies [4,36]. Further studies with larger sample sizes are needed to confirm these phylogenetic relationships.