Antennal Transcriptome Analysis and Comparison of Chemosensory Gene Families in Two Closely Related Noctuidae Moths, Helicoverpa armigera and H. assulta

To better understand the olfactory mechanisms in the two lepidopteran pest model species, the Helicoverpa armigera and H. assulta, we conducted transcriptome analysis of the adult antennae using Illumina sequencing technology and compared the chemosensory genes between these two related species. Combined with the chemosensory genes we had identified previously in H. armigera by 454 sequencing, we identified 133 putative chemosensory unigenes in H. armigera including 60 odorant receptors (ORs), 19 ionotropic receptors (IRs), 34 odorant binding proteins (OBPs), 18 chemosensory proteins (CSPs), and 2 sensory neuron membrane proteins (SNMPs). Consistent with these results, 131 putative chemosensory genes including 64 ORs, 19 IRs, 29 OBPs, 17 CSPs, and 2 SNMPs were identified through male and female antennal transcriptome analysis in H. assulta. Reverse Transcription-PCR (RT-PCR) was conducted in H. assulta to examine the accuracy of the assembly and annotation of the transcriptome and the expression profile of these unigenes in different tissues. Most of the ORs, IRs and OBPs were enriched in adult antennae, while almost all the CSPs were expressed in antennae as well as legs. We compared the differences of the chemosensory genes between these two species in detail. Our work will surely provide valuable information for further functional studies of pheromones and host volatile recognition genes in these two related species.


Introduction
Olfaction plays a key role in many aspects of insect behavior, such as foraging, oviposition and mate recognition. Possessing a sophisticated olfactory system to detect and interpret odorants in the environment is a prerequisite to survival and reproduction for insects. An understanding of how chemicals are detected by the antenna, transduced to the brain, and consequently translated into behavior is essential to clarify the mechanism of odorant detection in insects.
In the last few decades, much progress has been made in deciphering the mechanisms of periphery detection of insect olfaction. Olfactory signal transduction is best summarized in several discrete steps: first, the hydrophobic chemical volatiles enter into the sensillum lymph through the pores on the surface of the sensilla [1,2], and then bind to water-soluble odorant binding proteins (OBPs)/ chemosensory protein (CSPs), which are aboundunt in the sensillum lymph (up to 10 mM) [3][4][5][6][7][8][9][10]. Then, the odorants activate the odorant/ionotropic receptors (ORs/IRs) expressed on the dendritic membrane of olfactory sensory neurons (OSNs) alone or in complex with the binding proteins [11,12], upon which, the chemical signal is translated into electrical signals that are transduced to the antennal lobe (AL). In addition, sensory neuron membrane proteins (SNMPs), and odorant degrading enzymes (ODEs) are also involved in different steps in signal transduction pathway [13][14][15][16].
The two closely related species, H. armigera and H. assulta are important pest species in China. These two species are so similar that they both use (Z)-11-hexadecenal (Z11-16:Ald) and (Z)-9-hexadecenal (Z9-16:Ald) as their main sex pheromone components, but in nearly reversed ratios [32]. However, their foraging ranges are widely differentiated, H. armigera is a polyphagous species posing a major threat to over 200 different plants, while H. assulta is an oligophagous insect, which mainly feed on Solanaceae plants, including tobacco and hot pepper [33,34]. Their feeding preferences may be associated with differences in their olfactory and gustatory system. Aiming to understand the olfactory mechanism of these two related species, our lab previously conducted a 454 sequencing of adult male and female antennae from H. armigera [24]; several chemosensory genes, including 45 ORs and 12 IRs were identified. However, considering the hypothesis that the number of the glomeruli equals the number of ORs [35,36], the number of ORs was lower than expected, suggesting that some ORs might have been missed. The number of other identified chemosensory genes (GRs, OBPs, and CSPs) was also less than the number of genes reported in other Lepidoptera insects [26,37]. This might be because the sequencing depth of 454 is much lower than Illumina technology and some lowlevel expressed genes were omitted. In order to find the missing chemosensory genes in H. armigera and identify all of the olfactory genes in H. assulta, we sequenced the adult antennae of H. armigera and H. assulta using Illumina HiSeq 2000 platform. Our study greatly enriches the information on the molecular mechanisms of chemoreception in these species, with a total number of 133 putative chemosensory unigenes in H. armigera including 60 ORs, 19 IRs, 34 OBPs, 18 CSPs, and 2 SNMPs, and 131 putative chemosensory unigenes with 64 ORs, 19 IRs, 29 OBPs, 17 CSPs, and 2 SNMPs in H. assulta. Then we completely compared the differences of the chemosensory genes (ORs, IRs, OBPs, CSPs) between these two species. Further Reverse Transcription-PCR (RT-PCR) assays in H. assulta were conducted to examine the expression profile of these unigenes in different tissues. Our work will surely provide an extensive molecular basis for further research in pheromone and host volatile recognition of these two related species.

Transcriptome assembly and Gene Ontology (GO) Annotation
The RNA extracted from the mix of male and female antennae of H. armigera was sequenced using Illumina HiSeq 2000 platform, while the RNA from male and female antennae of H. assulta was sequenced separately. A total of 47,407,880 and 51,051,262 raw reads were obtained from male and female H. assulta antennae samples, respectively and a total of 58,035,052 raw reads were got from the mixed sample of H. armigera. After trimming adaptor sequences, contaminating sequences and low quality sequences, high quality contigs were generated. These contigs were further assembled by paired-end joining and gap-filling, and clustered into unigenes. An overview of the sequencing and assembly process is presented in Table 1. The clean reads of the three antennal transcriptomes in this study have been in the NCBI SRA database, under the accession number of SRX707455 (H. assulta male), SRX707456 (H. assulta female) and SRX707450 (H. armigera mix-sex). The results showed 50.8% (H. armigera) and 54.0% (H. assulta) of the unigenes were matched to the entries in NCBI non-redundant (nr) protein database by blastx homology search with a cut-off E-value of 10 −5 (S1 Material). Fig. 1 illustrates the distribution of the H. armigera and H. assulta unigene set in GO terms. Among the 53,479 (H. armigera) and 44,319 (H. assulta) unigenes, only 12,611 (23.6%) and 11,369 (25.6%) correspond to at least one GO-term, respectively. Similar results were observed in other transcriptome analyses for M. sexta [23], and Sesamia inferens [29]. As one unigene could align to multiple GO categories, 51,360 and 54,273 were assigned to biological process, 26,169 and 27,809 to cellular component and 14,086 and 15,489 to molecular function in H. assulta and H. armigera, respectively. In the molecular function category, the terms of binding and catalytic activity were the most represented. In the cellular component terms, cell and cell part were the most abundant. Cellular process, single-organism process and metabolic process were most abundant in the biological process category. In each of the three GO categories, the more abundant terms were almost the same as those observed in the antennal transcriptome of M. sexta [23], S. littoralis [26][27][28], S. inferens [29], A. gossypii [31] and Agrotis ipsilon [38].
The candidate olfactory receptors in H. armigera and H. assulta As the centerpiece of peripheral olfactory reception, ORs are the most important and determine the sensitivity and specificity of odorant reception [16]. All of the unigenes were searched by blastx and tblastn using the ORs identified from H. armigera [24,39] and H. virescens [40], leading to identification of 60 putative OR genes in H. armigera and 64 OR genes in H. assulta, which was almost consistent with the number of glumeruli (65) in H. armigera and H. assulta      [40,44] were used for a phylogenetic analysis (Fig. 2). In this analysis, the olfactory co-receptor (Orco) and pheromone receptor (PR) families were highly conserved, while other HassORs clustered seperately with HarmORs. Consistent with the results of sequence similarity analysis, the phylogenetic analysis also showed that there are 59 groups of orthologous OR genes in these two species.  [42,45]. These differences may account for the recognition of species-specific pheromone blends. Other ORs, which had relatively low similarities compared to PRs, may be associated with detection of host plant compounds. Functional characterization of the ORs from these two species will ultimately contribute to an explanation of principles in host selection. The RT-PCR results showed that 64 ORs were exclusively or primarily expressed in the antennae, which verified the integrity of the transcriptome assembly (Fig. 3). The PR13 was biased in male antenna while other ORs were almost equally expressed in both male and female antenna, which is consistent with results reported recently [42]. Faint expressions of OR9 and OR18 in the legs is consistent with the existence of chemosensory sensilla on the surface of legs, as earlier study by Sun et al also reported that OR5 was expressed in male moth legs of Plutella xyllostella [46].  [50] were used for a phylogenetic analysis (Fig. 4). The two highly conserved co-receptors IR8a and IR25a were present here as well as the two large sub-families of IR7d and IR75 clades. The phylogenetic analysis proved that there are 15 groups of orthologous IR genes in these two species, 5 H. armigera unique  RT-PCR showed the IR genes were all predominantly expressed in male and female antennae (Fig. 5).

Candidate odorant binding proteins in H. armigera and H. assulta
OBPs are small, water-soluble, extracellular proteins that are located in the sensillum lymph, and generally thought to contribute to capture and transport of the odorants and pheromones Transcriptome Comparison between Helicoverpa armigera and H. assulta to the receptors [16,53]. In all, we annotated 29 OBP genes from H. assulta and 34 OBPs genes from H. armigera. The 29 OBPs identified in H. assulta and the 8 OBPs not identified in previous 454 sequencing from H. armigera were listed here ( Table 4). The number of OBPs identified in the present study was in a reasonable range compared to other transcriptome analyses from Agrotis ipsilon (33) [38], S. littoralis (36) [26], S. inferens (24) [29]. All of the 34 OBPs from H. armigera as well as 29 OBPs from H. assulta were used to construct a phylogenetic tree clustering with OBPs from B. mori [17] and H. virescens (Fig. 6) and named based on the homologous genes. The newly identified genes were named according to the length of the unigenes for HarmOBP23-30. We did not detect homologous genes (HarmOBP7, 7.2, 9, 16, 17, 20, 21) in H. assulta. The comparison of the OBP genes in H. armigera and H. assulta were listed in S3 Material. These differences in OBPs might be associated with differences in the recognition of volatiles emitted by the host plant, therefore functional characterization of these proteins will be a prerequisite for research in host selection. The RT-PCR results indicated that 24 HassOBP genes (GOBP1, 2, PBP1, 2, 3, OBP6, 8, 9.2, 13, 14, 15, 18, 19, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 and 32) were primarily or exclusively expressed in male and female antenna (Fig. 7). HassOBP1, 2, 3, 4 and 5 were almost equally expressed in male and female antennae and legs. Some OBPs have been reported not being exclusive to the antenna, Sun et al reported that PxylPBP2 and PxylPBP3 had faint expressions in legs [54] and Sengul et al also reported that an OBP gene was expressed in male fly legs [55]. Expression of OBP56a in the oral disk of the house fly Phormia regina has been reported as a fatty acid solubilizer [56,57].

Candidate chemosensory proteins in H. armigera and H. assulta
CSPs are another class of small soluble proteins in insects. Several CSPs are present at high concentrations in the lymph of chemosensilla and exhibit binding activity towards odorants and pheromones [58]. In this study, we identified 17 CSP genes in H. assulta and 18 CSP genes  in H. armigera. The 17 CSPs identified in H. assulta and the 6 CSPs not identified in previous 454 sequencing from H. armigera were listed here ( Table 5). The comparison of the CSP genes in H. armigera and H. assulta were listed in S3 Material. All of the 17 CSPs from H. assulta and 18 CSPs from H. armigera were used to construct a phylogenetic tree clustering with CSPs from B. mori [59] and H. virescens [60] (Fig. 8) and named based on the homologous genes, while the newly identified genes HarmCSP14-19 were named according to the length of the unigenes. We did not detect the homologous gene of HarmCSP9 in H. assulta.
Candidate sensory neuron membrane proteins in the H. assulta SNMPs are insect membrane proteins that are associated with pheromone-sensitive neurons in Lepidoptera and Diptera [67][68][69][70]. The insect SNMP family consists of two subfamilies, SNMP1 and SNMP2, which were first identified from A. polyphemus [67] and Manduca sexta [71], respectively. Since then, much progress has been achieved in the identification of SNMP1 and SNMP2 in different insect orders [69,[72][73][74][75][76][77][78][79][80]. The expression of SNMP1 in pheromone-specific olfactory neurons suggests it may be involved in pheromone detection [44,67,72], while SNMP2 was found to express in the supporting cells [70,77]. In this study, we identified two SNMP genes, SNMP1 and SNMP2, from H. assulta antennal transcriptomes (Table 6), for the identification of two SNMP in our previous study, we did not list the HarmSNMP here. Both HassSNMP1 and SNMP2 have the complete ORFs and the two transmembrane domains. RT-PCR results showed SNMP1 was primarily expressed in antennae and SNMP2 was aboundant expressed in antennae as well as in legs (Fig. 5).

Conclusion
We RT-PCR confirmed the distribution profiles of these chemosensory genes in different tissues  and found a predominance of expression in the male and female antennae. This transcriptomic analyses greatly expands the information in H. armigera compared to the 454 sequencing and will surely benefit the exploration of chemoreception mechanisms in H. armigera and H. assulta.  Adult moths were given a 10% honey solution after emergence. Antennae were excised from 3day-old male and female moths and immediately frozen and stored in liquid nitrogen until use.

cDNA Library Construction and Illumina Sequencing
Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). Total RNA was dissolved in RNase-free water and RNA integrity was verified by gel electrophoresis. RNA  . To disrupt mRNA into short fragments, fragmentation buffer and divalent cations were used at 94°C for 5 min. Using these short fragments as templates, random hexamer-primers were used to synthesize firststrand cDNA. Second-strand cDNA was generated using buffer, dNTPs, RNase H, and DNA polymerase I. After end-repair and ligation of adaptors, the products were amplified by PCR and purified with QIAquick PCR purification kit (Qiagen, Hilden, Germany) and resolved with elution buffer (EB) for end reparation and poly (A) addition. Then, the short fragments connected to sequencing adapters and detected by agarose gel electrophoresis were selected as templates for PCR amplification and sequencing using Illumina HiSeq 2000 (Illumina, San Diego, CA, USA).

Assembly and function annotation
Transcriptome de novo assembly was carried out with the short read assembly program Trinity (version 20120608) [81]. Then the Trinity outputs were clustered by TGICL [82]. The consensus cluster sequences and singletons make up the unigenes dataset. The annotation of unigenes were performed by NCBI blastx against a pooled database of non-redundant (nr) and Swis-sProt protein sequences with e-value < 1e-5. The blast results were then imported into Blas-t2GO [83] pipeline for GO Annotation.

Identification of chemosensory genes
The tblastn program was performed, with available sequences of OBP, CSP, OR, IR and SNMP proteins from Lepidoptera species as "query" to identify candidate unigenes encoding putative OBPs, CSPs, ORs, IRs and SNMPs in H. armigera and H. assulta. All candidate OBPs, CSPs, ORs, IRs and SNMPs were manually checked by the blastx program at the National Center for Biotechnology Information (NCBI).

Sequence and phylogenetic analysis
The open reading frames (ORFs) of the putative chemosensory genes were predicted by using ORF finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). Putative N-terminal signal peptides of OBPs and CSPs were predicted by Signal IP 4.0 (http://www.cbs.dtu.dk/services/SignalP/) [84]. The TMDs (Transmembrane Domain) of ORs and IRs were predicted using TMHMM Server Version 2.0 (http://www.cbs.dtu.dk/services/TMHMM). Alignments of amino acid sequences were performed by MAFFT (https://www.ebi.ac.uk/Tools/msa/mafft/). The phylogenetic trees were constructed by FastTree for phylogenetic analyses of ORs, IRs, OBPs, and CSPs, based on the amino acid sequences of the putative chemosensory genes and the sequences of other insects.

Expression analysis by semi-quantitative reverse transcription PCR
Semi-quantitative reverse transcription PCR was performed to verify the expression of candidate chemosensory genes. Male and female antennae, legs (both sexes mixed) were collected from 3-day olds adult H. assulta after eclosion and total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) and digested with DNase I (Fermentas, Vilnius, Lithuania). The cDNA was synthesized from total RNA using RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific, Waltham, MA, USA). Gene specific primers were designed using Primer 3 (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) (S2 Material) and synthesized by Sangon Biotech Co., Ltd (Shanghai, China). Taq MasterMix (CWBIO, Beijing, China) was used for PCR reactions under general 3-step amplification of 94°C for 30s, 55-60°C for 30s, 72°C for 30s.
Supporting Information S1 Material. Species distribution of the blastx results. Unigenes were searched against the NR protein database using blastx with a cut off e-value < 10 −5 . Species with proportions of more than 1% are shown.