Comparative mitochondrial genome analysis of Dendrolimus houi (Lepidoptera: Lasiocampidae) and phylogenetic relationship among Lasiocampidae species.

Dendrolimus houi is one of the most common caterpillars infesting Gymnosperm trees, and widely distributed in several countries in Southeast Asia, and exists soley or coexists with several congeners and some Lasiocampidae species in various forest habitats. However, natural hybrids occasionally occur among some closely related species in the same habitat, and host preference, extreme climate stress, and geographic isolation probably lead to their uncertain taxonomic consensus. The mitochondrial DNA (mtDNA) of D. houi was extracted and sequenced by using high-throughput technology, and the mitogenome composition and characteristics were compared and analyzed of these species, then the phylogenetic relationship was constructed using the maximum likelihood method (ML) and the Bayesian method (BI) based on their 13 protein-coding genes (PCGs) dataset, which were combined and made available to download which were combined and made available to download among global Lasiocampidae species data. Mitogenome of D. houi was 15,373 bp in length, with 37 genes, including 13 PCGs, 22 tRNA genes (tRNAs) and 2 rRNA genes (rRNAs). The positions and sequences of genes were consistent with those of most known Lasiocampidae species. The nucleotide composition was highly A+T biased, accounting for ~80% of the whole mitogenome. All start codons of PCGs belonged to typical start codons ATN except for COI which used CGA, and most stop codons ended with standard TAA or TAG, while COI, COII, ND4 ended with incomplete T. Only tRNASer (AGN) lacked DHU arm, while the remainder formed a typical “clover-shaped” secondary structure. For Lasiocampidae species, their complete mitochondrial genomes ranged from 15,281 to 15,570 bp in length, and all first genes started from trnM in the same direction. And base composition was biased toward A and T. Finally, both two methods (ML and BI) separately revealed that the same phylogenetic relationship of D. spp. as ((((D. punctatus + D. tabulaeformis) + D. spectabilis) + D. superans) + (D. kikuchii of Hunan population + D. houi) as in previous research, but results were different in that D. kikuchii from a Yunnan population was included, indicating that different geographical populations of insects have differentiated. And the phylogenetic relationship among Lasiocampidae species was ((((Dendrolimus) + Kunugia) + Euthrix) + Trabala). This provides a better theoretical basis for Lasiocampidae evolution and classification for future research directions.


Introduction
Dendrolimus houi Lajonquiere (Lepidoptera: Lasiocampidae), being one of the most abundant phytophagous caterpillar in southern China and some countries in Southeast Asia, voraciously feeds on about 12 species of coniferous trees, including Cryptomeria fortunei, Pinus yunnanensis, Platycladus orientalis, P. kesiya var.langbianensis and Cupressus funebris, causing thousands hectares of dead or dying forests, and it tends to be continuously spreading rapidly [1]. Biologically, different geographical populations of D. houi have different life cycles, host preference and adaptation to local extreme climatic factors [2,3], which might lead to population differentiation or taxonomic controversy based on previous reasearches on D. kikuchii [4] and D. punctatus [5].
Moreover, insects in the family Lasiocampidae are some of the most serious phytophagous pests worldwide, causing the host withering and rapid death, having serious impact on the ecological environment during their outbreaks [6][7][8][9][10][11][12][13][14][15]. Examples include such D. pini infesting Scots pines in Europe [16], D. houi in Yunnan, Sichuan, Fujian and Zhejiang province [2,[9][10], and D. punctatus at fifteen provinces in the south of China [11]. Other speices like D. tabulaeformis, D. kikuchii, D. spectabilis, Euthrix laeta and Trabala vishnou guttata often occur in China as well [8,11], and thus face the stress of multiple and complex host and environmental factors. Consequently, they could potentially evolve in two directions: firstly, some Dendrolimus species may inevitably share the same host species and tend to inhabit the same forest, which might lead to taxonomically mis-discrimination [9] and hybridization [8]. (For example, D. tabulaeformis and D. spectabilis are the subspecies of D. punctatus probably due to hybridization of these three species [11,12]). Or different populations of same species might differentiated and evolved into separate species as a result of long-term adaptation to different hosts and other climatic factors [2]. (For example, D. kikuchii and D. houi is thought to have evolved from ta commmon ancestor, and evolved separately into different species as revealed by phylogenetic analysis [9]). Transcriptome analysis of antenna still showed that they still have high similarity and close phylogenetic relationship, however, they have different sex pheromone components [13], indicating somehow interspecific reproductive isolation. However, how do we identify and evaluate their ecological function and phylogenetic relationship? Fortunately, many previous studies have focused on the taxonomic relationship of Lasiocampidae, especially the taxonomic relationship within genera [5][6][7][8][9][10][11][12][13][14][15], but there are still some species whose taxonomic status is controversial and without a complete consensus [5][6][7][8][9][10][11][12][13][14][15]. Currently, some taxonomic relationships focus on the comparison with Lepidoptera [16][17][18][19], however very few reports have focused on the phylogenetic relationship among Lasiocampidae genera, so the taxonomy remains unclear.
In this study, we obtained the complete mitogenome of D. houi and downloaded the available mtDNA data of nine species (four genera in Lasiocampidae) from the public database, containing some other congenetic species of D. spp., Kunugia undans, E. laeta and T. vishnou guttata to compare mitogenome composition and structure. Then, we reconstructed phylogenetic tree and analyzed phylogenetic relationship between D. houi and other Lasiocampidae species.

Ethics statement
There is no endangered or protected species involved in this study, no specific permissions were required for this serious and widespread forest pest, feeding on leaves of Cryptomeria fortunei and causing thousands hectares of dying and dead forests. Additionally, this study is sponsored and permitted by NSFC (National Natural Science Foundation of China). We confirm that the locations are not privately owned or otherwise protected.

Samples collection mitochondria DNA extraction
Samples of D. houi were collected in Yongtai, Fuzhou, Fujian Province, China. Mitochondrial DNA of D. houi was extracted from muscle tissue using the GENMED Mitochondrial DNA Extraction kit (Genmed Scientifics Inc., Arlington, MA, USA). Muscle tissue of D. houi was crushed under ice bath condition following instruction of Extraction kit. The quality of DNA was assessed using NanoDrop2000, qubit3.0, and 1% agarose gel electrophoresis.

Sequencing and assembling of the mitochondrial genome
After DNA isolation, 1 μg of purified DNA was fragmented and used to construct short-insert libraries (430 bp) according to the manufacturer's instructions (Illumina, Hercules, CA, USA), and then sequenced on the Illumina Hiseq 4000 platform [24]. In order to obtain high quality clean reads, the raw reads were filtered to remove adaptors, the reads containing unknown nucleotide "N" over 10% and the duplicated sequences. Then, the clean reads were assembled into contigs using SOAPdenovo2.04 [25].

Gene annotation and analysis
13 PCGs of D. houi mitogenome were annotated by utilizing the online program ORF, while 2 rRNAs and 22 tRNAs were annotated using the online software MITOS2 Web Server (http:// mitos2.bioinf.uni-leipzig.de/index.py). The unpredicted tRNA genes used tRNAscan-SE [26] to predict secondary structure. We determined the location of each gene, and corrected the annotation based on data from the reported related species mitogenomes. Then, the genome was aligned with the Nr (non-redundant protein sequence), Swiss-Prot (a manually annotated, non-redundant protein sequence), COG (clusters of orthologous groups of proteins), GO (gene ontology) and KEGG (kyoto encyclopedia of genes and genomes) databases by BLAST v2.2.31 with a cut-off e-valve of 10 −5 [2].  Table 1). To obtain the information of gene loss, duplication, rearrangement, and horizontal transfer in Lasiocampidae, multiple genome alignments were conducted using Mauve software [27]. The base composition, codon usage and relative synonymous codon usage (RSCU) frequency within these 10 species of mitogenomes were analyzed by MEGA 7.0 software. The formula for calculating the composition skew in mitogenomes was as follows, AT-skew = (A-T) / (A+T), GC-skew = (G-C) / (G+C).

Phylogenetic analysis
The dataset of 13 PCGs, from these ten species plus Bombyx mori as the reference outgroup, were used to reconstructed the phylogenetic tree. Sequences blast was conducted by MAFFT of Translator X online server, the empty spaces and fuzzy sites were removed by GBlocks, and the single genes were combined to obtain the mitochondrial gene datasets, then an optimal evolution model was calculated by Modeltest 3.7 for the subsequent phylogenetic analysis. We also used RAXML 7.2.6 [28] to build a ML tree (bootstrap value is 1000), and BI analysis was carried out by using Mr Bayes 3.2.2 [29] (Markov chains were run for 1×10 5 generations, sampling every 100 generations) to construct the phylogenetic tree of Lasiocampidae.

Genome structure and nucleotides composition of D. houi
The complete mitogenome of D. houi was 15,373 bp in length (Fig 1), which contained 37 genes (13 PCGs, 22 tRNAs, 2 rRNAs) and a non-coding control region (A+T-rich region) related to replication and transcription. The gene sequences were similar to that of some known related species. Among them, 23 genes were located in J-strand, including 9 PCGs (COI-COIII, ATP6, ATP8, ND2, ND3, ND6, CYTB) and 14 tRNAs, while the remaining 14 genes were located in the N-strand.
The genes in the mitogenome were closely arranged with overlapping and interval phenomena. Typically, there were 7 overlapping regions with total length of 25 bp, and the longest overlapping region was 8 bp, while there were 23 intergenic spacers with a total length of 372 bp, with the longest one 60 bp between COI and tRNA Tyr , followed by another one with 57 bp between tRNA His and ND5. Six regions did not have overlaps or intervals (S1 Table), and the mitogenome structure of D. houi was basically similar to those of most species within Lasiocampidae.
The number of A, T, G and C in the mitogenome within D. houi were 6,321 (41.12%), 5,954 (38.73%), 1,175 (7.64%) and 1,923 (12.51%) respectively, and the content of A was the highest while G was lowest; A+T accounted for 79.85%, while G+C was 20.15%; AT skewness was 0.0299 and GC skewness was -0.2417 (S2 Table). Furthermore, the A+T content at codon site 2 of PCGs (79.70%) was slightly higher than that at the first site (79.43%), while the A+T content at site 3 was the highest (80.43%). The non-coding region was especially A+T -rich, with the total content was as high as 91.85%. AT skewness was negative, which indicated that the content of T was higher than A.
The PCGs sequence was 11,091 bp, accounting for 72.15% of the complete mitogenome. A, T, G, C bases were 3,749 (33.8%), 4,924 (44.4%), 1,239 (11.17%), 1,179 (10.63%), respectively. The A+T content was 78.20%, which was 1.65% lower than that of the whole mitogenome. The complementary G+C content accounted for 21.80%, which was different from the whole mitogenome, with the highest T content and the lowest C content. AT skewness was -0.1356, GC skewness was 0.0248. A total of 4,617 codons (excluding stop codon) were encoded by 13 PCGs within the mitogenome. The most frequently encoded amino acids were Leu (13.26%), Ile (10.83%), Asn (10.66%), Phe (9.75%), and Lys (9.18%) respectively. The frequency of relative synonymous codons showed that the mitogenome of D. houi had obvious bias towards A and T, for example when it encoded the same amino acid, it was preferred to use UUU ( The A+T -rich region was the main non-coding region of mitogenomes, located between rRNAS and tRNA Met genes, with a total length of 319 bp. The A+T content was 91.85%, which was significantly higher than other genes of the mitogenome (S2 Table). There were also typical structural features of Lepidopteran mitogenomes in the CR. The results also showed that there was a 14 bp poly-T stretch with motif ATAGA that was 15,075-15,093 bp downstream of rRNAS gen, and 4 microsatellite-like repeat sequences containing AT, AAT and AAT in this region. Additionally, there were also multiple poly-T stretch in this region (S3 Fig) except for the poly-T stretch at the beginning of replication.

Comparative analysis of mitogenomes of lasiocampidae species
Complete genome alignment using Mauve software was done for 10 species of Lasiocampidae (Dendrolimus spp., 1 Kunugia, 1 Euthrix and 1 Trabala) (Table 1, Fig 2). Generally, most of the genes within these ten species maintain a consistent position and direction, and no rearrangement or inversion events were found in the locally-collinear blocks (LCBs). Interestingly, there were two specific tRNA Arg (Fig 3A) within the mitogemone of K. undans, which was absent among Lasiocampidae (Fig 3B).
The complete mitochondrial genomes of Lasiocampidae ranged from 15,281 to 15,570 bp in length, TVG was 15,281 bp and KU was 15,570 bp ( Table 2). The first genes of these species all started from trnM, and had same direction. Moreover, there was base composition bias toward A and T in these mitogenomes. The A+T content of these mitogenomes ranged from 78.64% to 80.87%, KU was 78.64% and TVG was 80.87% (Table 3).

Phylogenetic relationship
The result showed that the best model of ML and BI trees was GTR+I+G, where 1nL value was -163,261.5358, AIC value was 326,791.0716, and ΔAIC value was 0. Two types of phylogenetic trees were constructed by using the 13 PCGs dataset having the same structure as Fig 4, which indicate that both methods (ML and BI) and results were consistent and reliable. The phylogenetic relationship among four genera was ((((Dendrolimus) + Kunugia) + Euthrix) + Trabala). Interestingly, we found two different phylogenetic trees can be constructed by using two groups data of D. kikuchii. When we use the data of DK1 to construct tree, the relationship was ((((D. punctatus + D. tabulaeformis) + (D. spectabilis + DK1)) + D. superans) + D. houi) (Fig 4A). However, the relationship have changed as ((((D. punctatus + D. tabulaeformis) + D. spectabilis) + D. superans)+ (DK2 + D. houi) (Fig 4B) by using the data of DK2. If two groups of D. kikuchii data were used simultaneously to construct phylogenetic tree, their phylogenetic relationship demonstrated that those two groups of DK data have different genetic relationship with D. houi (Fig 4C).

Discussion
In this study, the complete mitogenome of D. houi was obtained, with a total length of 15,373 bp. The mitogenome of D. houi had similar structural composition and gene arrangement, which indicated that the mitogenome was stable and suitable for the study of phylogenetic relationships. Technically, we obtained the mitogenome by using high-throughput sequencing which was different from previous research [30][31][32][33]. The traditional mitogenome sequencing is mainly Sanger sequencing method based on PCR amplification products [34]. However, this method requires primer information of each segment of the genome, and the experimental process is complex, time-consuming and laborious. With the development of next generation sequencing technology (NGS) [33][34][35], the use of high-throughput sequencing technology has provided great convenience for the rapid acquisition of mitogenome [35][36][37], and the sequencing cost has also been significantly reduced in recent years, which provides an alternative choice for the sequencing of small genomes such as animal mitochondria [37].
Mitogenomes are more comprehensive and accurate in species identification and phylogeny, avoiding artificial biases caused by blindness of a single gene [38][39][40][41][42][43], and even the differences among species can be determined based on genes structure and arrangement, which has higher reliability [31,43]. Methodologically, using the 13 PCGs in the sequencing genome was superior in obtaining reliable results compared with clustering analysis by using individual genes [44][45][46][47][48][49][50], while both two clustering trees by using both methods were consistent, and the  ND2  COI  COII  ATP8  ATP6  COIII  ND3  ND5  ND4  ND4L  ND6  CytB  ND1   DP  ATT/TAA CGA/T ATA/T ATT/TAA ATG/TAA ATG/TAA ATG/TAA ATT/TAA  ATG/T  ATG/TAA ATA/TAA ATG/TAA ATG/TAA   DT  ATT/TAA CGA/T ATA/T ATT/TAA ATG/TAA ATG/TAA ATG/TAA ATT/TAA  ATG/T  ATG/TAA ATA/TAA ATG/TAA ATG/TAA phylogenic relationship among 4 genera in Lasiocampidae was consistent with some previous studies [15,[50][51][52]. In this study, the mitogenome total lengths of D. houi from two geographical populations (Yongtai and Jingdong [10]) were various, which may be caused by individual differences of samples. Furthermore, two original data of D. kikuchii were used to compare mitogenome structures and construct phylogenetic trees. Interestingly, we obtained the different phylogenetic trees concerning the relationship between D. houi and D. kikuchii. The phylogenetic relationship between D. kikuchii and D. spectabilis was closer than that between D. kikuchii and D. houi by using DK1 data (Fig 4A), which was different from previous research [7][8]10]. However, we obtained the same result as previous reports [10] by utilizing DK2 data (Fig 4B), and show D. houi and D. kikuchii have a close relationship ( Fig 4B) and evolved earlier than the other four pine caterpillar species. Obviously, some populations of D. kikuchii from different locations can not be clustered as one group (Fig 4C) whereas DK1 and DK2 data were clustered simultaneously, indicating differentiation occurred due to geographical isolation [4]. These differences were thought to be mostly caused by individual differences or geographic population variation, because the samples of DK1 and DK2 were originally collected originally from Yunnan and Hunan Province respectively [6,14], and some previous studies have shown that insects have generated genetic variation after long-term living in different geographical habitats or from feeding on different hosts [4,5,53]. The degree of genetic differentiation and gene exchange within and between populations have an impact on the geographical populations' genetic diversity of insects. However, even now, little is known about the different geographical populations genetic diversity of D. houi. Effective molecular markers should be used to prove whether there is differentiation among different populations, or to better determine the genetic relationship and variation at the genus level.
In this study, all species looked at have the start codon CGA with COI gene, which were reported in previous research [11][12][13][14][15][16][17][18][19]54,55]. Theoretically, it is considered a typical trait due to it being high conservation in Lepidoptera specicies. Similarly, the stop codon ended with TAA in the majority of PCGs, which was also consistent with the results of most Lepidoptera species [6][7][8][9][10]. Currently, there was only one trnR within the mitochondrial structure of D. houi, which is similar to most species of Lasiocampidae speices, and different from that of K. undans with two trnRs. Therefore, we presume that the structure of two trnRs is unique in Kunugia spp., which requires further clarification because more species' genomes are not available at this time.
Currently, the sequencing of animal mitogenomes is rapidly increasing, but the data of insect mitogenomes are obviously still insufficient. Actually, several species of pine caterpillars periodically occur and outbreak [1][2][3]9,10], endangering local coniferous forests in China [56] under the reasonable climate. Their phylogenetic relationship might be complicated because of homogenization or hybrid of congeneric species. Therefore, the acquisition of the complete mitogenome of D. houi and comparison to Lasiocampidae species may provide more genetic varieties for future research.