Novel Detection of Insecticide Resistance Related P450 Genes and Transcriptome Analysis of the Hemimetabolous Pest Erthesina fullo (Thunberg) (Hemiptera: Heteroptera)

Erthesina fullo (Thunberg, 1783) is an economically important heteropteran species in China. Since only three nucleotide sequences of this species (COI, 16S rRNA, and 18S rRNA) appear in the GenBank database so far, no analysis of the molecular mechanisms underlying E. fullo’s resistance to insecticide and environmental stress has been accomplished. We reported a de novo assembled and annotated transcriptome for adult E. fullo using the Illumina sequence system. A total of 53,359,458 clean reads of 4.8 billion nucleotides (nt) were assembled into 27,488 unigenes with an average length of 750 bp, of which 17,743 (64.55%) were annotated. In the present study, we identified 88 putative cytochrome P450 sequences and analyzed the evolution of cytochrome P450 superfamilies, genes of the CYP3 clan related to metabolizing xenobiotics and plant natural compounds, in E. fullo, increasing the candidate genes for the molecular mechanisms of insecticide resistance in P450. The sequenced transcriptome greatly expands the available genomic information and could allow a better understanding of the mechanisms of insecticide resistance at the systems biology level.


Introduction
To understand the diversification and evolution of life, especially for diversified radiated insects, more transcriptomic data for non-model organisms is crucial. This is a serious situation for the Heteroptera group, which has no important model organism. The transcriptomic data of very few species has been reported [1][2][3][4][5]. With the amazing development of high-throughput sequencing technology and sharp declines of the cost per species, transcriptomic analysis is becoming a possible and efficient way to achieve large-scale genetic data of insect genome with high heterozygosity at low cost. The transcriptome of Erthesina. fullo will be critical for further genome sequencing and annotation, as well as the inherent defense mechanisms of traditional control and resistance development mechanisms.
Erthesina fullo (Thunberg, 1783), the yellow marmorated stink bug, is one of the most widely distributed phytophagous pests in East Asia and has caused severe loss to many commercially important fruits, such as apples, cherries, and pears [6,7]. In addition to an agricultural pest, E. fullo may invade home and indoor spaces as adults before reappearance in spring according to the observation in the lab. It is well-known that the brown marmorated stink bug (BMSB, Halyomorpha halys) is an invasive insect native to Asia and has poses a considerable ecological and economical threat to North America and Europe [8]. According to the research and investigation [6,7], the occurrence regularity and plant hosts of E. fullo are quite similar with those of H. halys. Given proper condition and circumstances, E. fullo would constitute an ecological and economic danger to the world as a representative of a group of agriculturally significant true bugs. However, only three nucleotide sequences (COI, 16S rRNA, and 18S rRNA) were found in the GenBank database.
The cytochrome P450 enzymes, or CYP genes, are one of the largest gene families, which have corresponding representatives in almost all extant lineages, from bacteria to animals [9]. Their important tasks include the metabolism of alien chemicals of natural or synthetic origin which make them quite important in the research of insecticide resistance mechanisms to control devastating economic pests [10]. Like other detoxifying enzymes, the diversified gene families, by which the cytochrome P450 superfamily of monooxygenases are encoded, are difficult to entirely characterize. Phylogenetic analysis of insect P450s protein sequences revealed that they are distributed into four major clades [11] which are strongly supported by bootstrap analysis (CYP2, CYP3, CYP4 and Mitochondrial CYP clans). Nevertheless, most of the related research on P450 focuses on the model organisms or the world-wide pests [12,13]. The diversified constitution and function of the non-model organisms or local pests remains mysterious.
Current resistance mechanism research in insects posits that insecticide resistance commonly comes up through two chief mechanisms, mutating target site of certain insecticide to reduce their mutual binding [14] (e.g. the voltage-gated sodium channel for pyrethroids) or enhancing the metabolism or sequestration of insecticide by functional enzymes such as cytochrome P450 monooxygenases, carboxyl/choline esterases (CCEs) and glutathione-Stransferases (GSTs) [15][16][17][18][19][20]. In the case of P450s for example, CYP6G1 was linked to insecticide resistance in DDT-resistant Drosophila melanogaster Meigen (Diptera: Drosophilidae) [21], and CYP6Z1 in the mosquito malaria vector Anopheles gambiae Giles (Diptera: Culicidae) was capable of directly metabolizing DDT [22]. However, due to insufficient genomic information and the complexity of detoxification gene families and pathways involved in resistance, the cytochrome P450-induced insecticide resistant mechanism is not well understood and pest control efforts are therefore hampered.
Here, we used the Illumina sequencing system to generate a de novo assembled and annotated transcriptome for adult E. fullo, a species that has lacked genomic resources. We also identified and analyzed the evolution of cytochrome P450 superfamilies using the Maximum Likelihood method and Bayesian inference of likelihood. The outcome of this study remarkably increase the genome-level information and will potentially contribute to improving our understanding of this pest at the systems biology level.
This location did not require a specific permission for the field studies, and our study did not involve endangered or protected species either. After wild collection, fresh tissue was stored in liquid nitrogen to prevent total RNA from degradation. After RNA extraction and enrichment, cDNA library construction and library normalization, the transcriptome was sequenced with the Illumina Hiseq 2000 system, which is widely employed for large-scale transcriptome data acquisition. After data acquisition, we used FastQC (http://www.bioinformatics.babraham.ac. uk/projects/fastqc) to assess the quality of the transcriptomic data. An unpublished integrative program based on Bioconductor was utilized to conduct the read cleaning including multiple N (stands for unknown nucleotide) base, low quality region on both ends (quality score less than Q20), possible mixed adaptor and contaminated rRNA and virus sequences. After quality assessment and read cleaning, reads are assembled using Trinity [23].

Bioinformatic Analysis
Unigene function annotation was screened against the NCBI Non-redundant protein sequences (Nr), SwissProt, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Cluster of Orthologous Groups of proteins (COG) protein databases using BLASTX with a cutoff E-value of 10 -5 . Unigene Gene Ontology (GO) classification was executed with Blast2GO using default parameters. The Clusters of Orthologous Groups of proteins (COG) were a framework for functional transcriptome or genome analysis based on homology, which were delineated by comparing protein sequences encoded in representative complete genomes of major phylogenetic lineages. Unigene GO classification was executed with the Java-based software Blast2GO [24][25][26][27] using default parameters. The protein coding region prediction of unigenes that had no hits when searching against protein databases (ordered by priority: Nr, SwissProt, KEGG and COG) was executed with ESTScan [28] while metabolic pathway analysis was processed with the KEGG annotation system.

Gene Family Evolution
We aligned amino acid sequences using MAFFT v. 7.130 [29] for each mutually found orthologous gene. Refinement was executed with MUSCLE v. 3.8.31 [30,31] for each gene. The cytochrome P450 clustering was inferred using the Maximum Likelihood method and Bayesian inference of likelihood, based on the best model given by ProtTest 3.4 [32] with the Akaike Information Criterion (AIC) method, the Bayesian information criterion (BIC) and the hierarchical likelihood-ratio tests [33]. The analysis involved 203 amino acid sequences and a total of 806 positions in the final data matrix, 88 of which belonged to E. fullo while the others came from other well-explored taxons. Evolutionary analyses were conducted in RAxML v8.0 [34] with 1000 rapid bootstrap replicates and checked a posteriori if the computation of bootstrap trees was sufficient using the bootstrapping criteria (default settings) [35]. Bayesian inference of likelihood was perfomed in a modified GPU-accelerated version of MrBayes [36] with aamodelpr = mixed. The analysis was proceeded until the effective sample size (ESS) was larger than 200 or the standard deviation of split frequencies was below 0.01. Two simultaneous runs of 5,000,000 generations were conducted while each set was sampled every 100 generations with a burn-in of 20,000. The posterior probabilities (PP) of all the clades were computed from the remaining trees.

Data Submission
The clean data generated in this study have been deposited at the National Center for Biotechnology Information (NCBI) with the following characters: BioProject (accession ID PRJNA263698),

Unigene Function Annotation
According to annotation results against the Nr database, a total of 17,276 distinct sequences (62.85% of all assembled unigenes) were matched to known proteins. Of the matched known proteins, 52.9% had strong similarity with common insects such as Tribolium castaneum (Herbst) (Coleoptera: Tenebrionidae) and Acyrthosiphon pisum (Harris) (Hemiptera: Aphididae) (S2 Fig). Due to the insufficiency of heteropteran genome reference, Nr annotation of species distribution is not as efficient as re-sequencing species with a known genome. The COG database was constructed by mutual comparison of protein coding genes in every complete genome. The principal is that each COG should consist of individual proteins or groups of paralogs from at least 3 lineages. When considering proteins from a certain genome or transcriptome, the comparison will lead to the best match functional group of proteins (Fig 1). For GO classification, unigenes were divided into three ontologies, which are cellular component, molecular function and biological process. 8,992 unigenes (32.71% of entirety) were categorized into different functional groups in 257 KEGG pathways. Cellular process and catalytic activity were the two largest ontologies, comprising 5,703 and 4,533 unigenes severally (Fig 2).

Unigene Metabolic Pathway Analysis and Protein Coding Region Prediction
First, unigene sequences were searched against protein databases using BLASTX with a cutoff E-value of 10 -5 in the following order: Nr, SwissProt, KEGG and COG. Unigene sequences that had hits in a preferential database were not moved to the next round. Then the relatively best blast results information was adopted to extract a Coding DNA Sequence (CDS) from unigene sequences. Unigenes with no hits in blast results were predicted by the coding region detection program ESTScan [28,37,38]. A total number of 19,486 unigenes had reasonable protein coding regions.   KEGG annotation system was utilized to conduct the Unigene metabolic pathway analysis. 12,804 unigenes were mapped to 257 KEGG pathways; metabolic pathways contained 1,800 unigenes (14.06%), substantially larger than any other pathways involved, followed by regulation of actin cytoskeleton (4.28%), focal adhesion (3.75%), and pathways in cancer (3.53%) (S1 Table).

Cytochrome P450 and Insecticide Resistance
A total of 88 possible P450 sequences were found in the assembled transcriptome. All of them were screened against the Nr database using BLASTX with a cutoff E-value of 10 -5 to specifically identify to certain gene families. Meanwhile, the identified sequences and related P450 genes from Acrythosiphon pisum (Harris) (Hemiptera: Aphididae), Aedes aegypti (Linnaeus) (Diptera: Culicidae), Apis mellifera (Linnaeus) (Hymenoptera: Apidae), Bombyx mori (Linnaeus) (Lepidoptera: Bombycidae), Laodelphax striatella (Fallén) (Hemiptera: Delphacidae) and Tribolium castaneum (Herbst) (Coleoptera: Tenebrionidae) were analyzed with Maximum Likelihood method and Bayesian inference of likelihood, which classified most of the genes into the CYP2, CYP3, CYP4 and mitochondrial CYP clans (Figs 3-7). 26 of 88 possible sequences were classified as CYP4 clan members, while 6 and 3 of them were categorized as mitochondrial CYP and CYP2 clans respectively. Except for 4 undefined sequences, other possible clans had an equal phylogenetic position with members of the CYP3 clan (Table 2).For the Maximum Likelihood analysis, most of the relative relationship among four clans is as same as that of Bayesian analysis except for several members of CYP4 clan have merged into the CYP 3 clan as unresolved branches. Traditionally, the maximum likelihood analysis is widely used to reconstruct different level of phylogenetic relationship and supposed to be the most effective and reliable way. However, when the nucleotide substitutions broadly varies among different branches, incorrect topological structure might be treated as the actual tree more easily [39]. Concerning cytochrome P450 genes, the identity rules for family and subfamily designations are all members nominally over 40% and 55% identical separately [10]. The sequences are quite diversified compared to other genes involved in phylogenetic analysis. The high diversity and rapid evolution rate of P450 genes in different insects might be the reason for unresolved branches.
Compared to the relative affinities in other Hemiptera whose P450 genes have been to some extent identified, more P450 genes in E. fullo were found in the CYP3 clan. Genes from the CYP3 clan were testified to have the ability of metabolizing xenobiotics and plant natural compounds [11], meanwile, they appear to share some common characteristics with the "environmental response genes" defined by Berenbaum [40]. The results implied that E. fullo could evolve rapidly in response to its environment, including a high level of insecticide resistance. Searching the KEGG pathway analysis, genes from the mitochondrial CYP and CYP2 clans belonged to the "Insect hormone biosynthesis" pathway (ko00981). Only 13 out of 88 possible P450 genes were classified into the "Metabolism of xenobiotics by cytochrome P450" (ko00980), "Drug metabolism-cytochrome P450" (ko00982) and "Drug metabolism-other enzymes" (ko00983) pathways, which show a probably vital and direct role in insecticide resistance, while the other P450 genes play a basic role in metabolic pathways. Further studies of the role of these genes in hemipteran detoxification mechanisms would benefit our understanding of the interaction between phytophagous insects and plants.
For arthropod pests, P450s are well known to play an important role in metabolic resistance to a series of major insecticide classes including organochlorides, organophosphates (Ops), carbamates, pyrethroids, neonicotinoids and insect growth regulators [41]. Generally, the chemical basis of P450s-induced insecticide resistance is the reinforcement of catalysis activity against insecticide metabolism performed at three levels: increasing transcriptional efficiency at the transcription level, the amplification of structural genes at the gene level and increasing the stability of mRNA or protein at the translation level. Nevertheless, no research on P450-induced insecticide resistance mechanisms of E. fullo was reported due to the lack of genomic or transcriptomic information. The rising amount of P450s in insects with insecticide resistance had been shown to a result of overexpressing through up-regulation via modificaiton in cis-or trans-acting regulatory loci [19]. However, it has now been implicated that there exist correlation between the duplication or amplification of P450s and the insecticide resistance in some insects such as D. melanogaster, Culex quinquefasciatus (Say) (Diptera: Culicidae), Anopheles funestus (Giles) (Diptera: Culicidae) and Myzus persicae (Sulzer) (Hemiptera: Aphididae) [42][43][44][45]. For hemipterans, amplification of genes that encode esterases has been reported [46]. In the gene amplification studies of arthropod pest, the delineation of the molecular basis for esterase-mediated resistance in the peach potato aphid, M. persicae, is one of the most successful and complete case [47,48]. So far, no amplification of P450 genes has been reported in E. fullo.  According to previous research, the P450 gene families involved in up-regulation and amplification are CYP4, CYP6 and CYP9 (belonging to the CYP3 and CYP4 clans), all of which were identified in E. fullo. Our transcriptome provides a set number of P450 genes for the further exploration of insecticide resistance mechanisms and the potential discovery of novel functional targets for insecticides. The transcriptome of E. fullo also detected dozens of esterase sequences with potential relevance to heteropteran insecticide resistance. Further comparison and analysis of esterase genes and gene copy numbers or folds between wild-type and resistance-phenotype bugs could explain the currently unknown insecticide resistance mechanisms of the true bugs. Furthermore, the availability of large-scale genomic and transcriptosomic data makes the discovery of more gene members in different detoxifying enzyme gene families easier.

Conclusions
Using the Illumina sequence system, we reported a de novo assembled and annotated transcriptome for the phytophagous pest E. fullo. A total of 53,359,458 clean reads of 4.8 billion nucleotides (nt) were assembled into 27,488 unigenes with a mean length of 750 bp, from which 17,743 (64.55%) were annotated with different databases. In the present study, we also identified and analyzed the evolution of cytochrome P450 superfamilies. Genes of the CYP3 clan related to metabolizing xenobiotics and plant natural compounds were found in E. fullo, which increases the candidate genes for further research into the molecular mechanisms of P450 insecticide resistance. Moreover, more CYP3 clan genes were found in E. fullo, which implied that E. fullo could evolve rapidly in response to its environment, including a high level of insecticide resistance. Our study greatly expands the available genomic information and especially provides a better understanding of the mechanisms of insecticide resistance at the systems biology level for this economically important non-model organism.