Analysis of the Antennal Transcriptome and Insights into Olfactory Genes in Hyphantria cunea (Drury)

Hyphantria cunea (Drury) (Lepidoptera: Arctiidae) is an invasive insect pest which, in China, causes unprecedented damage and economic losses due to its extreme fecundity and wide host range, including forest and shade trees, and even crops. Compared to the better known lepidopteran species which use Type-I pheromones, little is known at the molecular level about the olfactory mechanisms of host location and mate choice in H. cunea, a species using Type-II lepidopteran pheromones. In the present study, the H. cunea antennal transcriptome was constructed by Illumina Hiseq 2500TM sequencing, with the aim of discovering olfaction-related genes. We obtained 64,020,776 clean reads, and 59,243 unigenes from the analysis of the transcriptome, and the putative gene functions were annotated using gene ontology (GO) annotation. We further identified 124 putative chemosensory unigenes based on homology searches and phylogenetic analysis, including 30 odorant binding proteins (OBPs), 17 chemosensory proteins (CSPs), 52 odorant receptors (ORs), 14 ionotropic receptors (IRs), nine gustatory receptors (GRs) and two sensory neuron membrane proteins (SNMPs). We also found many conserved motif patterns of OBPs and CSPs using a MEME system. Moreover, we systematically analyzed expression patterns of OBPs and CSPs based on reverse transcription PCR and quantitative real time PCR (RT-qPCR) with RNA extracted from different tissues and life stages of both sexes in H. cunea. The antennae-biased expression may provide a deeper further understanding of olfactory processing in H. cunea. The first ever identification of olfactory genes in H. cunea may provide new leads for control of this major pest.


Introduction
In this study, we used the Illumina Hiseq 2500 TM platform to sequence the antennal transcriptome of H. cunea. After analyzing the transcriptome data, we identified 124 olfactionrelated genes in total, including 30 OBPs, 17 CSPs, 52 ORs, 14 IRs, 9 GRs, and two SNMPs. In addition, the predicted protein sequences were compared with orthologs from moth species by building phylogenetic trees, and motif patterns of OBPs and CSPs were also constructed. On the basis of analyzing the antennal transcriptome, gene functional annotation was also obtained. Furthermore, OBPs and CSPs expression patterns in different tissues and development stages were determined using reverse transcription PCR (RT-PCR) and quantitative real time PCR (RT-qPCR). Lastly, we constructed phylogenetic trees of OBPs and CSPs based on our H. cunea data and previous published work on C. cunea to access the potential overlap in olfactory chemosensory ability.

Insect rearing and antennae collection
Pupae of H. cunea were collected from straws bundled around host trees (Populus canadensis) at Sixian, Anhui Province, China, and were maintained in plastic tubes. Tubes were buried in wet sand to provide high humidity, and were held at 25°C. Forest Pest Control Station of Anhui Province issued the permit for the field collection (by the director, Jun Fu). To eliminate the differences in each individual, the antennae from 60 newly emerged unmated moths (40 males and 20 females) were dissected, flash frozen in liquid nitrogen, and stored at -80°C until RNA extraction.

RNA extraction and preparation of cDNA library
The stored antennae were ground and homogenized by vitreous Tissue-tearors (DEPC-water treated). Total RNA was extracted using TRIzol reagent (Invitrogen, USA) following the manufacturer's instructions. RNA degradation and contamination was monitored on 1% agarose gels, and purity was checked using a NanoPhotometer 1 spectrophotometer (IMPLEN, CA, USA). Illumina sequencing of the samples was performed at Novogene Co., Ltd., Beijing, China. Sequencing libraries were generated using NEBNext 1 Ultra RNA Library Prep Kit for Illumina 1 (New England Biolabs, USA) following the manufacturer's recommendations. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. First strand cDNA was synthesized using random hexamer primers and M-MLV Reverse Transcriptase (RNaseH). Second strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Then, DNA fragments were treated for end-repairing, adenylation of 3' ends and ligation of adaptors. The library fragments were purified with AMPure XP system (Beckman Coulter, CA, USA) to preferentially select cDNA fragments of 150~200 bp in length. Then, suitable fragments were enriched by PCR amplification.

Transcriptome sequencing and assembly
The library preparations were sequenced on an Illumina Hiseq TM 2500 platform and pairedend reads were generated. Clean reads were obtained by removing reads containing adapter, reads containing poly-N, and low quality reads from the raw reads. Transcriptome assembly was accomplished based on clean data with high quality using Trinity [34] to produce transcripts. Then the longest transcript of each single gene was selected as a unigene.

Gene functional annotation
Unigenes obtained from antennae of H. cunea were identified by BLAST searches with annotation against the Nr database using an e-value cut-off of 10 −5 . The unigene sequences were also aligned to protein databases such as Swiss-Prot, Pfam, KOG/COG and KO to find the highest similarity to the given unigenes along with putative functional annotations. Blast2GO v2.5 [35] was used to get GO annotation, and GO enrichment analysis of the differentially expressed genes was implemented by the GOseqR packages based on Wallenius non-central hyper-geometric distribution [36]. The open reading frame (ORF) of each gene was determined using an ORF finder tool (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The signal peptide of the protein sequences was predicted using SignalP 4.0 [37]. The transmembrane domains of ORs, GRs, IRs and SNMPs were predicted by using TMHMM Server v. 2.0 (http://www.cbs.dtu.dk/ services/TMHMM/).

Phylogenetic analysis
Phylogenetic trees were built based on amino acid sequence alignment of the candidate OBPs, CSPs, ORs, GRs, IRs, and SNMPs from H. cunea and those of other insects species using Clus-talX2.0 [38] [39] software. Node support was generated from 1,000 bootstrap pseudo replications of the data.

Motif analysis of OBPs and CSPs
In order to find the potential conversed motif, we compared the motifs-pattern of OBPs and CSPs in different families of Lepidoptera. A total of 76 OBPs and 43 CSPs from H. cunea, B. mori, and H. armigera were used for motif discovery and pattern analysis. All the OBP and CSP sequences used in this study were translated to amino acid sequences. The MEME (version 4.11.1) online software (http://meme-suite.org/tools/meme), which has been widely used for discovery of protein motifs [7,[40][41][42], was used to discover and analyze the motifs in this analysis. The parameter settings used for motif discovery were as follows: minimum width = 6, maximum width = 10, and the maximum number of motifs = 8.

Tissue expression analysis of OBPs and CSPs
The expression patterns of OBPs and CSPs in different tissues (antennae, thoraces, abdomens, legs, wings) and life stages (pupae of both sexes and larvae) were analyzed by RT-PCR. Fifty male and female antennae, 10 whole insect body without antennae, thoraces, abdomens, legs, wings, and 10 pupae of both sexes and 10 larvae were collected, and frozen in liquid nitrogen for RT-PCR. Total RNA from different tissues was extracted as described above, including three replications of samples. PrimeScript 1 RT reagent Kit with gDNA Eraser (Perfect Real Time, Takara, Dalian, China) was used for reverse transcription in order to remove residual trace amounts of genomic DNA. The cDNA (20 ng) was used as a template in RT-PCR. Primers were designed with the Primer Premier5 software (PREMIER Biosoft International, CA, USA). EF1-a-H. cunea voucher W72 elongation factor 1 alpha gene-was used as a reference gene. The cDNA template was replaced by RNase-free water in the negative control. PCR reaction was carried out under the conditions of 94°C for 30s, 52°C for 30s, 72°C for 15s using 2xEs Taq Master Mix (CWBIO, Beijing, China) in 30 cycles. PCR products were run on a 1% agarose gel.
The expression patterns of OBPs and CSPs in different tissues (male antennae, female antennae, legs, wings) and life stages (pupae of both sexes and larvae) were analyzed by RT-qPCR. Twenty male and female antennae, 20 legs, 20 wings, and 10 pupae of both sexes and 10 larvae were collected, and frozen in liquid nitrogen for RT-qPCR. cDNAs from antennae and other tissues were synthesized as described above. The equal amount of cDNA (2.5 ng) was used as a template in RT-qPCR. Primers were designed with the Beacon Designer 7.9 software (PREMIER Biosoft International, CA, USA). HyphEF1-a (elongation factor 1 alpha gene) and HyphGAPDH were used as the reference genes. The cDNA template was replaced by RNasefree water in the negative control. The RT-qPCR was performed on a CFX96 Detection System (Bio-rad, Hercules, CA, USA) using a mixture of 25μL reaction: 12.5μLSYBR 1 Premix Ex Taq II (Tli RNaseH Plus) (Takara, Dalian, China), 1μL of each primer (10μM), 2.0μL of template cDNA, and 8.5μL of sterilized ultrapure H 2 O. The RT-qPCR reaction was carried out under the conditions of 95°C for 30s, followed by 40 cycles of 95°C for 5s and 60°C for 30s, then the melting curve was measured. Each sample included three biological replications which measured in three technique replications. The RT-qPCR results were analyzed using the CFX96 analysis software, and the expression levels of above genes were calculated relative to two reference genes using the Q-gene method [43,44]. Data of relative expression levels from various samples were subjected to ANOVA (one-way analysis of variance), followed by Duncan's new multiple range test using the SPSS 22.0 software (SPSS Inc., Chicago, IL, USA).

Unigene assembly and transcriptome sequencing
A total of 65,177,438 raw reads were obtained from an Illumina Hiseq 2500 platform (Table 1). After removing adaptors and low quality reads, 64,020,776 clean reads were acquired with a Q20 percentage of 96.03%, which were assembled into 78,131 transcripts with a mean length of 1123 bp and an N50 length of 2520 bp. 59,243 unigenes were selected from the above transcripts with a mean length of 829 bp and an N50 length of 1803 bp. 35,976 unigenes were longer than 300 bp which accounted for 60.73% of all unigenes (S1 Fig).

Homology analysis and gene functional annotation
The functional annotation of unigenes was performed by a BLAST homology search against the protein databases. 15,242 (25.72%) unigenes were annotated in the Nr database. As a result, 91.30% of annotated unigenes had more than 60% similarity with known proteins (S2A Fig).  [45,46] with numerous proteins in the NCBI protein database used for homology analyzing.
GO annotation was obtained using the program Blast2GO against the Nr database. A total of 12,565 unigenes were assigned to three main GO classes among all 59,243 unigenes. Specifically: these included genes for biological processes (34,685), cellular components (22,506), and molecular function (15,726) (S3 Fig). In the molecular function category, binding (7,161) and catalytic activity (5,275) were two major terms of antennal gene expression. In the biological processes, cellular processes (7,295), metabolic processes (6,606), and single-organism processes (5,716) were the most abundant. Cell (4,446) and cell parts (4,446) were enriched in the same level of cellular component, followed by organelle (2946), macromolecular complex (2661) and membrane (2523).
After a total of 5,781 unigenes were annotated in the KO database, we acquired a KEGG pathway classification for the H. cunea antennal transcriptome. Five subcategories of KEGG pathway were as follows: cellular processes (A), environmental information processing (B), genetic information processing (C), metabolism (D), and organisimal systems (E) (S4 Fig). Signal transduction (698) was the highest term in the environmental information processing subcategory, which indicated the strong association with odorant binding and transduction of the antennal tissue. In addition, genes associated with biodegradation and metabolism of xenobiotics (130) were identified; these are likely involved in odorant degradation in olfactory processes.

Identification of putative odorant-binding proteins
Analysis of the H. cunea antennal transcriptome identified 30 putative OBPs, including 3 PBPs ( Table 2). The signal peptide prediction showed that 26 unigenes had a complete ORF (Table 3). Most OBPs had a low similarity to known lepidopteran OBPs, possibly due to the relatively low conservation among different families. OBPs can be generally divided into different subclasses according to the number of conserved cysteines, including Classic OBPs, Plus-C OBPs and Minus-C OBPs [47]. We identified 15 classic OBPs using multiple amino acid sequence alignments, which were matched up with the six-cysteines pattern C1-X 25-30 -C2-X 3 -C3-X 36-42 -C4-X 8-14 -C5-X 8 -C6 (where X stands for any amino acid) proposed by Xu et al. [7] (S5 Fig). A phylogenetic tree was constructed based on the neighbor-joining method (Fig 1). HyphOBP1, 6, and 23 clustered with the Plus-C subfamily, whereas the Minus-C subfamily contained HyphOBP5 and 19. General OBPs clustered together with PBPs, including HyphPBP1, 2, and 3, which all belong to the classic OBPs.

Identification of putative chemosensory proteins
Seventeen putative CSPs were identified in the H. cunea antennal transcriptome (Table 2), verified by the four-cysteines pattern C1-X 6 -C2-X 18 -C3-X 2 -C4 (S6 Fig). Among these sequences, 16 had a complete ORF with a predicted signal peptide. Almost 90% of the CSPs (15) had more than 70% similarity with other species' CSPs, much higher than the sequence similarities of the OBPs (26.7%) ( Table 4). This indicated that the CSPs are more highly conserved than OBPs. The CSPs were scattered in different branches of the phylogenetic tree, except HyphCSP6 and HyphCSP7, which clustered in the same subfield (Fig 2).

Identification of putative odorant receptors and gustatory receptors
We identified 52 putative ORs by analyzing the antennal transcriptome ( Table 2). The TMHMM prediction showed that five unigenes (HyphOR9, 12, 21, 27 and 34) had seventransmembrane domains, and 42 sequences had a full-length ORF (Table 5). Fifty-two sequences showing multiple amino acid alignment with ORs from B. mori, H. armigera, S. inferens, O. brumata, and A. segetum were used to construct a phylogenetic tree (Fig 3). HyphOR27 clustered with the lepidopteran ORco (olfactory receptor coreceptor) family and had a high degree of similarity with these ORs. The lepidopteran PR family was also detected, and HyphOR1, 7, 50 belonged to this family. In addition, HyphOR50 was clustered with ObruOR1 and AsegOR3, which had a high orthology. Nine putative GRs were discovered and four of them had a complete ORF (HyphGR1, 2, 3, 7) (Tables 2 and 6). The prediction showed that HyphGR7 had none transmembrane domain ( Table 6). The phylogenetic tree of the GRs showed that HyphGR1 and HyphGR7 clustered into the same branch, and HyphGR3 had a complete similarity (100%) with HassGR1 and HarmGR1 (Fig 4).

Identification of putative ionotropic receptors
Transcriptome assembly and analysis led to the identification of 14putative IRs ( Table 2). 11 of the 14 sequences had a complete ORF, with HyphIR1, 2 and10 being the exceptions (Table 6). And two IRs (HyphIR1 and 10) had none transmembrane domain (Table 6). In the IR phylogenetic tree, 14 IR sequences were distributed in differential subclades. HyphIR4 and HyphIR7 may belong to the IR41a clade and had 100% orthology with each other. HyphIR9 clustered with the IR76b sequences from other insects, and showed homology with SinfIR76b, which may be characterized as HyphIR76b. The same situation occurred with HyphIR8 and HyphIR11, which could be identified as a member of the IR21a and IR75p subgroup, respectively. In addition, HyphIR14 was found in the high conserved IR8a subfamily, while HyphIR12 was a member of IR25a subclade (Fig 5). Olfactory Genes in Hyphantria cunea (Drury)

Identification of putative sensory neuron membrane proteins
Two SNMPs (SNMP1 and SNMP2) were detected from our transcriptome ( Table 2). SNMP2 was presumed to have a full-length ORF (Table 7). In the phylogenetic tree, HyphSNMP1 clustered with SinfSNMP1 belonging to the SNMP1 family. HyphSNMP2 formed a unique branch in the SNMP2 family (Fig 6).

Motif pattern analysis of H. cunea OBPs and CSPs
The purpose of conserved motifs analyses are an important step to better understand the functional domains and the conserved motifs in OBPs and CSPs from H. cunea, B. mori, and H. armigera. The MEME server was used to help us compare motif patterns of OBP and CSP proteins in distinct lepidopteran families. As a result, eight motifs for both OBPs and CSPs were obtained, 27 different motif patterns of 76 OBPs and 25 motif patterns of 43 CSPs. We listed 11 relatively common motif patterns, including 54 OBPs (Fig 7). armigera also had the same motif pattern characterized by motif 2 at the C-terminal with the motif order as 6-7-2, and the motif 6, 7 were only found in the PBPs and located at the same position as the central part.
We also list 11 common motif patterns containing 30 CSPs in Fig 8. The motif pattern 8-2-6-3-5-7-1-4 was the only one which had five homologous CSPs (BmorCSP10/12, HarmCSP2, and HyphCSP5/7) from all three species and also the most common pattern. Motif 8 existed in 28 out of 30 CSPs at the N-terminal, with the exception of HarmCSP3/6, and motif 1, 3, which also existed as 28 CSPs with the exception of BmorCSP16 and HyphCSP2 that both located at the central part. In addition, motif 1, 2, 3, 7 existed at different positions infrequently.

Tissue expression analysis of OBPs and CSPs
We analyzed the expression patterns of OBPs and CSPs in different tissues and life stages of H. cunea using RT-PCR (Figs 9 and 10). The results indicated that 16 OBPs of H. cunea (HyphOBP6-8, HyphOBP10, HyphOBP12-16, HyphOBP20, HyphOBP22, HyphOBP24-25, and HyphPBP1-3) were uniquely or primarily expressed in the female and male antennae. Three OBPs-HyphOBP2, HyphOBP19, and HyphOBP23 -were expressed not only in the antennae but also in other tissues like the thoraces, abdomens, legs, and wings, and also in pupae and larvae (Fig 9). As for the CSP genes, 12 CSPs (HyphCSP1-2, HyphCSP5-6, HyphCSP9-12 and HyphCSP14-17) were relatively intense bands in the female and male antennae. Seven HyphCSP genes (HyphCSP5-6, HyphCSP9-12 and HyphCSP15) were expressed in all tested tissues. A wide range of expression in the pupae and larvae of HyphCSP genes (HyphCSP2-14, HyphCSP16-17) suggested the connection between chemosensory proteins and H. cunea pupae and larvae, involving various chemosensory processes (Fig 10). In order to confirm the RT-PCR results, real-time quantitative PCR (RT-qPCR) analyses were conducted to characterize the expression profiles of the OBPs and CSPs in different tissues and life stages of H. cunea. The results showed that all OBPs and CSPs were expressed in antennae, confirming the authenticity of the transcriptome data (Figs 11 and 12). For 22 of the 30 OBPs (including three HyphPBP1-3), were observed the highest expression levels in antennae (Fig 11). Two OBPs-HyphOBP2 and HyphOBP23-had a relatively high expression both in antennae and legs. The expression levels of two OBPs (HyphOBP19 and HyphOBP21) in wings were significantly higher than organs. Five OBPs (HyphOBP5, HyphOBP14, HyphOBP16, HyphOBP22 and HyphOBP25) were detected the highest expression levels in pupae and one OBP (HyphOBP26) showed a higher expression levels in larvae (Fig 11). In addition, the expression levels of 16 antennae-enriched OBPs (HyphOBP2-4, HyphOBP6, HyphOBP8-9,  HyphOBP11-13, HyphOBP15, HyphOBP18, HyphOBP20, HyphOBP23-24, and HyphOBP26-27) was higher in female antennae than in male antennae. Three PBPs (HyphPBP1-3) and two OBPs (HyphOBP7, and HyphOBP10) were significantly overexpressed in male antennae and displayed male antennae-biased expression. For the CSPs, all of HyphCSP genes were expressed in all tested tissues and life stages. Among of 17 CSPs, eight CSPs (HyphCSP5, HyphCSP9-11, and HyphCSP13-16) and two CSPs (HyphCSP4 and HyphCSP7) were highly enriched in legs and in the wings, respectively. Whereas, only two CSPs (HyphCSP1 and HyphCSP12) showed a significantly higher expression in antennae than in other non-olfactory tissues. In addition, we also found that some CSPs were highly enriched in the pupae and larvae, such as two CSPs (HyphCSP3 and HyphCSP8) in larvae and three CSPs (HyphCSP2, HyphCSP6, HyphCSP17) in pupae (Fig 12).
In the whole, the results from RT-PCR bands were consistent with the results of RT-qPCR. For example, several HyphOBPs (HyphOBP2, HyphOBP10, HyphOBP13, HyphOBP15, and HyphPBP1-3) and HyphCSPs (HyphCSP2, HyphCSP5-6, HyphCSP9-12, and HyphCSP15), which were relatively intense bands in antennae, were also highly enriched in antennae by RT- qPCR (Figs 9-12). In short, the RT-PCR results were verified by RT-qPCR. Whereas, there were a few results of RT-qPCR were to be in disagreement with the bands shown using RT-PCR. For example, HyphOBP14, HyphOBP16 and HyphOBP25 were detected in pupae by RT-qPCR, but the expression bands of these genes were very faint by RT-PCR. These differences may owing to the differentiated sensitivity of these two techniques. RT-qPCR is considered to be the most powerful, sensitive, and quantitative assay for the detection of RNA levels.

Discussion
Numerous olfactory genes have been reported in recent studies of Lepidoptera, such as H. armigera [48], M. sexta [2], Epiphyas postvittana [27], Chilo suppressalis [26], Cydia pomonella [24], D. houi and D. kikuchii [42]. However, most research has focused on species using Type-I pheromones, with little known about species using the less common Type-II pheromones (Table 8). Here, we have identified numerous olfactory genes from an arctiid moth producing Type-II pheromones, using Illumina Hiseq TM 2500 platform sequencing to analyze the antennal transcriptome of H. cunea as a step towards understanding olfactory processing in this and related species. In total, 30 OBPs, 17 CSPs, 52 ORs, 14 IRs, 9 GRs, and two SNMPs were identified from the antennae of H. cunea.
In the transcriptome sets, a total of 59,243 unigenes were assembled from 78,131 transcripts. Compared with several reported moth transcriptomes, the mean length of unigenes in H. cunea (829bp) was longer than M. sexta (460bp) [2] and S.litura (603bp) [49], but shorter than A. ipsilon (967bp) [50] and H. armigera (991bp) [48]. Furthermore, the minimum length 201bp and the maximum length 29,665bp among all unigenes indicates the high quality and depth of sequencing at the transcriptome level. The species classification obtained by using Blastx in the Nr protein database showed that the highest similarity was to B. mori (48.1%) and D. plexippus (29.5%), possibly in part because of the extensive identification of genes, including olfactory genes, from B. mori (14,623) [46] and D. plexippus (16866) [51] with the genome sequencing approach. GO and KO annotation were also generated during the bioinformatics analysis. In the H. cunea transcriptome, 21.1% of the unigenes (12,565) were annotated in GO, slightly more than the KO which comprised 9.75% (5,781), that is, over 70% of the unigenes had no annotation in either the GO or KO databases, suggesting a large number of new potential olfactory genes.
We identified 30 OBPs on the basis of antennal transcriptome of H. cunea by homology alignment. The number of H. cunea OBP genes predicted in this study was similar to H. assulta (29) [52] and A. ipsilon (33) [50], but less than S. litura with 38 OBPs [41], or B. mori (44) [53], the latter of which had whole-genome data. The homology analysis in the phylogenetic tree showed that HyphOBPs were divided into several different branches with another 219 OBPs from nine lepidopteran species, such as the Plus-C subfamily (HyphOBP1, HyphOBP6, HyphOBP23), the Minus-C subfamily (HyphOBP5, HyphOBP19), and the PBP subfamily including HyphPBP1-3. The differential types of H. cunea OBPs, suggested by the various molecular structures constructed from diverse numbers of cysteines [7], indicates that HyphOBPs may be involved in biological processes other than olfaction. OBPs are a key link in olfactory processing because they transport odorants from the external environment through the sensilla lymph to the ORs [5,54]. Many researches have shown that insect OBPs are found specifically in antennae [53,[55][56][57], in our study, most of the OBPs of H. cunea were highly abundant in the antennae by RT-PCR and RT-qPCR analyses, suggesting their putative role in the odorant detection. However, some OBPs were expressed in tissues other than antennae: HyphOBP19 and HyphOBP21 had a relatively high expression in wings than other organs, and HyphOBP2 and HyphOBP23 were leg-enriched. In addition, several OBPs (HyphOBP5, HyphOBP14, HyphOBP16, HyphOBP22, HyphOBP25 and HyphOBP26) were also enriched in pupae and larvae. These suggest that insect OBPs are widely distributed in other tissues (legs and wings) besides the antennae, and adapt to complex olfaction-related activities in different development stages.
Sex pheromones play a crucial role as signals between sexually reproducing insect species [29]. Moth sex pheromones consist of two major types: Type-I and Type-II. About 75% of   [19,20]. Type-II pheromones are found in 15% of lepidopteran species, primarily the Arctiidae and Geometroidea [20,58]. H. cunea (Drury) is one of the most destructive species in the group of Arctiidae that use Type-II pheromone. In this study, three PBPs were identified among the 30 OBPs of H. cunea. According to the RT-qPCR method, all three PBPs (HyphPBP1-3) showed high expression level (Fig 11) in antennae and were male antennae-biased, suggesting their putative role in detecting of the female sex pheromones. Bioinformatics analysis led to the identification of 17 CSPs from our transcriptome data. CSPs are another type of soluble protein that have similar functions to OBPs in carrying semiochemicals [59], while being smaller and more conserved than OBPs. The Blastx analysis of the CSPs proved their relatively high conservation among various species (Table 4). Compared with the unique and/or primary expression in antennae of OBPs, CSPs demonstrated, as many CSPs were found on various body parts such as, antennae, thoraces, abdomens, legs, and wings, even in pupae and larvae (Figs 10 and 12). This ubiquitous expression characteristic of CSPs suggested that they may participate in regulatory mechanisms or other physiological processes in on-olfactory tissues.
In motif pattern analysis by MEME, eight motifs for both 76 OBPs and 43 CSPs from H. cunea, B. mori, and H. armigera were identified. The most common motif pattern in OBPs had a motif order of 4-1-5-3-2, including 14 homologous OBPs and the motif pattern 8-2-6-3-5-7-1-4, which had five homologous CSPs was the most common pattern in CSPs. There still have more conserved motifs in OBPs and CSPs in three distinct lepidopteran families. Similar results also reported by Gu et al and Zhang et al [41,42], which compared the motif patterns within genus and between Lepidoptera OBPs and CSPs. The most noteworthy is that HyphPBP1 and HyphPBP2 had the same conserved motifs with HarmPBP1 and HarmPBP2, despite the different pheromone types between the two species. Further research on the functional roles of the proteins may explain this phenomenon and determine the binding characteristics of PBPs and the Type-II pheromone components of H. cunea.
Insect SNMPs are two-transmembrane, olfactory-specific membrane proteins that are homologous with human CD36 receptors [13]. Two SNMPs, SNMP1 and SNMP2, have been identified in insects, and expressed at different locations in antennal sensilla [13,[60][61][62][63]. In this study, SNMP1 and SNMP2 were identified, and it is clear that these two SNMPs belong to separate subfamilies from the phylogenetic tree (Fig 6). Nine GRs were discovered in this study, of which, HyphGR3 was clustering with BmorGR8 which has been identified as a sugar receptor; Thus, possibly HyphGR3 plays functions as a sugar receptor, and the antennae of H. cunea may have a role in sugar detection [64]. This suggests that the antennae of H. cunea may play a role in sugar detection, and more GRs participating in detection of bitter and other compounds may be found by further study of H. cunea. We discovered 14 IRs from the antennal transcriptome, and all of them were distributed in different IR subfamilies. Among of them, HyphIR14 and HyphIR12 were identified as the highly conserved coreceptors IR25a, IR8a, respectively, and HyphIR9 was characterized as IR76b subunit, which maybe the second putative coreceptor [65].
ORs are pivotal in sophisticated olfaction systems and have been proposed to be a link between the external environment and insect physiological reactions [5]. A total of 52 ORs were identified in the H. cunea antennae transcriptome. The number of HyphORs is higher than in other lepidopteran species, such as H. armigera (47) [48], S. littoralis (47) [66], S. inferens (39) [67] and S. litura (26) [49]. It is generally accepted that ORs are divided into atypical odorant receptors and traditional odorant receptors. In our study, HyphOR27 was identified as one of the atypical odorant receptors, also called ORco, and it clustered with ORco from B. mori and S. inferens with >90% homology (Fig 3). Three pheromone receptors (PRs) (HyphOR1, 7, 50) were also located in the PR subcategory branches. HyphOR50 was orthologous to ObruOR1, another known pheromone receptor in species using Type-II pheromone. This phenomenon may indicate the common ancestor of PR genes in Type-II pheromone responding moths. AsegOR3, responding to both Type-I and Type-II pheromones from a Type-I pheromone producing moth, was also clustered with these two PRs, which may suggest similarities in evolution. Several branches were noteworthy, such as-HyphOR39, 33, 3, 25, and 32, which share a high homology with SinfOR19 and forming a separate subset. The same situation occurred with HyphOR40, 29, 45 and SinfOR16. These orthologies suggest similar protein structures and functions between H. cunea and S. inferens, which would need to be followed up in further research.
Chouioia cunea Yang is a native parasitoid wasp that represents a significant natural enemy to H. cunea and which could play a vital role in the biological control of the fall webworm [30,68]. The mechanisms by which C. cunea locates, recognizes, and parasitizes H. cunea are not known, but there may be some overlap in the chemosensory abilities of the two species [69]. Thus, we constructed two phylogenetic trees using OBPs and CSPs from H. cunea and C. cunea [70] (S7 and S8 Figs). Six clusters (C1, C2, C3, C4, C5, C6) were generated from the OBPs of the two species, indicating some similarities in olfaction between the parasitoid wasp and its host. In particular, HyphOBP10 and CcunOBP13 were considered orthologous, with 75% similarity, and probably similar molecular structure and function [71]. Compared with OBPs, CcunCSP7 had a higher orthology with HyphCSP1 and HyphCSP2, perhaps due to strong conservation of this class of proteins. Similar results have been reported shared OBPs and CSPs from the antennal transcriptome study of another serious pest, Monochamus alternatus, and its parasitoid Dastarcus helpophoroides [69]. This could be explained by another herbivore-plant-parasitoid system, wherein the homoterpene E-4,8-dimethyl-1,3,7-nonatriene (DMNT), is a key plant compound released by plants under attack by herbivores, and subsequently used as a cue by natural enemies in finding prey; the herbivore species S. littoralis was in turn deterred by this herbivore-induced plant volatiles [72]. This overlap is ecologically significant, as herbivores and their parasites are expected to share the ability to detect several biological relevant compounds, which may include kairomonal detection of pheromones or herbivore-induced plant volatiles (HIPVs) [69]. Our study provides supporting evidence for the hypothesis that herbivores and their parasites may share olfactory capabilities for perceiving similar biologically relevant compounds. The similarities in proteins may be due to the parasite utilizing similar environmental cues to locate hosts, or possibly for detecting the host directly via H. cunea pheromones. However, this hypothesis remains to be verified by testing the compounds that could be physiologically or behaviorally active in H. cunea and C. cunea. A better understanding of the similarities in chemosensory genes and the interactions between H. cunea and C. cunea may indicate an efficient method to eliminate this invasive pest.

Conclusions
The transcriptome analysis of H. cunea has provided, for the first time, identification of 124 genes related to the olfactory system of a Type-II lepidopteran pheromone using species and provides insights towards a better understanding of the molecular mechanisms of olfaction for Arctiid moths. Importantly, we found three PBPs (HyphPBP1-3), one putative sugar receptor (HyphGR3), three conserved coreceptors (HyphIR9, HyphIR12 and HyphIR14), one ORco (HyphOR27) and three PRs (HyphOR1, 7, 50), based on phylogenetic analysis. The motifs analysis in OBPs and CSPs from H. cunea, B. mori, and H. armigera were conducted, using a MEME system, and many conserved motif patterns of OBPs and CSPs were found. It was noteworthy that HyphPBP1 and HyphPBP2 had the same conserved motif patterns with HarmPBP1 and HarmPBP2, despite the different pheromone types between the two species. These investigations might provide some insights into the function and evolution of insect OBPs and CSPs. We further verified the expression of OBPs and CSPs by RT-PCR and RT-qPCR analysis and confirmed the authenticity of the transcriptome data. The most of the OBPs had antenna-biased expression and a few of OBPs were enriched in pupae and larvae. And the CSPs demonstrated a ubiquitous expression characteristic. Moreover, three PBPs (HyphPBP1-3) were antennae-enriched and displayed a male antennae-biased expression. The tissue and sex-biased expression patterns may provide a deeper further understanding of olfactory processing in H. cunea. Our work allows for further functional studies of these pheromone binding proteins and potential olfactory receptors in H. cunea, which may be meaningful targets for the management of this devastating invasive species in China.