Candidate Olfaction Genes Identified within the Helicoverpa armigera Antennal Transcriptome

Antennal olfaction is extremely important for insect survival, mediating key behaviors such as host preference, mate choice, and oviposition site selection. Multiple antennal proteins are involved in olfactory signal transduction pathways. Of these, odorant receptors (ORs) and ionotropic receptors (IRs) confer specificity on olfactory sensory neuron responses. In this study, we identified the olfactory gene repertoire of the economically important agricultural pest moth, Helicoverpa armigera, by assembling the adult male and female antennal transcriptomes. Within the male and female antennal transcriptomes we identified a total of 47 OR candidate genes containing 6 pheromone receptor candidates. Additionally, 12 IR genes as well as 26 odorant-binding proteins and 12 chemosensory proteins were annotated. Our results allow a systematic functional analysis across much of conventional ORs repertoire and newly reported IRs mediating the key olfaction-mediated behaviors of H. armigera.


Introduction
Olfaction, the sense of smell, plays a predominant role in mediating insect behavior including food source identification, oviposition site selection, mate choice, kin recognition and predator avoidance. Of insect olfactory events, sexual communication in moths is an excellent model system for understanding the mechanism of animal sensory perception at the molecular level because of the complexity, specificity and extreme sensitivity of males to specific pheromone molecules emitted from conspecific females [1,2].
The surface of insect antennae is covered with several different types of sensilla that are small sensory hair structures in which olfactory receptor neurons extend dendrites into the antennal lymph where peripheral olfactory signal transduction events occur. Previous studies have shown diverse olfactory genes including at least odorant-binding proteins (OBPs), chemosensory proteins (CSPs), Sensory neuron membrane proteins (SNMPs), odorantdegrading enzymes (ODEs), ionotropic receptors (IRs) and odorant receptors (ORs) involved in different steps in signal transduction pathway [3,4,5]. All of these, ORs play a central role in chemosensory signal transduction processes that occur in olfactory receptor neurons. ORs located on the surface of olfactory sensory neuronal dendrites in the antennae possess seven transmembrane domains (TMDs). In insects, it is generally thought that odor recognition is mediated by a single set of odor receptor heterodimers composed of a conserved, nonconventional ORCO protein acting as an ion channel and a variable, conventional OR that apparently mediates odorant-binding specificity [6,7,8,9,10,11]. In addition, a novel family of candidate chemosensory ionotropic receptors (IRs) involved in odorant recognition was recently characterized by a genome-based bioinformatics screen in Drosophila melanogaster [12]. IRs are not closely related to ionotropic glutamate receptors in the phylogenetic analysis. However, they possess an obviously similar modular organization to ionotropic glutamate receptors [12,13]. Like ORs, IRs are extremely divergent and expresssed in sensory dendrites. Misexpression of several Drosophila IRs suggested that they might be tuned to a small odor panel such as small amine-like volatile compounds [12,14,15].
In addition, the aforementioned multiple proteins may interact with ORs/IRs and are required for olfactory signal transduction. The soluble OBPs are thought to facilitate the transport of odorant molecules through the sensillum lymph and are sometimes thought to be directly involved in ligand binding [16,17,18]. The CSPs are another class of soluble proteins in the sensillum lymph with abundant expression, however their function in olfactory transduction remains largely unknown [16,19]. SNMPs locate in the dendritic membrane of peripherally olfactory receptor neurons just adjoining ORs and are presumed to trigger ligand delivery to the receptor [20,21,22]. Within the olfactory sensilla, ODEs are reputedly involved in the signal inactivation step serving to rapidly remove stimulatory odorant molecules [23,24].
Genome sequencing and molecular studies have together characterized the complete OR repertoires and other olfactory genes in several insect species such as D. melanogaster, Anopheles gambiae, Bombyx mori and others, promoting understanding of olfactory signal pathways in these insects. However, a systematic analysis of the olfactory genes especially ORs/IRs is still largely absent in from most insect species, including important crop pest insects where a genome sequence is not yet unavailable. Traditional homology-based cloning strategies could identify some conserved genes such as ORCO and OBPs, but would not readily identify divergent genes, especially the divergent ORs and IRs families. Therefore, recent studies of the molecular mechanisms of olfactory signal transduction in moths have largely been limited to B. mori due to the availability of genome sequence. In other Lepidoptera species, only a few ORs and PRs have been identified [25,26,27,28,29,30,31]. Expressed Sequence Tag (EST) sequencing strategies have been successfully used to identify olfactory genes in Lepidoptera without genomic data such as Manduca sexta [32], Epiphyas postvittana [33] and Spodoptera littoralis [34,35]. But most identified genes are OBPs; relatively few are divergent genes such as ORs and IRs. The identification of moth ORs and IRs is still a challenging exercise. Recently, RNA-seq approaches have been used to identify olfactory genes in species where a genome sequence is not yet available. To date, next-generation sequencing of antennal transcriptome has been successfully used to identify substantial numbers of candidate ORs and IRs in M. sexta [36], Cydia pomonella [37], and Bactrocera dorsalis [38].
Moths are a diverse group of insects that include many economically important crop pests. The subfamily heliothinae contains approximately 365 species of noctuid moths including the world's most injurious crop pests such as the Old World bollworm and relatives, and the New World tobacco budworm, which cause huge economic loss annually [39,40,41]. Better understanding on the molecular mechanisms of olfactory recognition in these moths could ultimately lead to identify new targets for developing environment-friendly control strategies.
To explore the olfaction-related transcriptome of the worldwide agricultural pest, H. armigera, we conducted 454 pyrosequencing of RNA extracted from adult male and female antennae. Our goals were to identify olfaction-related genes and olfactory signal transduction mechanisms in H. armigera antennae. Here we report the identification of 47 candidate OR genes and 12 IR genes as well as 26 OBP genes and 12 CSP genes in the male and female antennal transcriptomes.

Insect Rearing
Helicoverpa armigera used in all experiments were obtained from a colony maintained at the Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China. Larvae were reared on an artificial diet at 2761uC under a photoperiod of 14:10 (L: D). Male and female pupae were placed in separate glass tubes. Adults were fed 10% honey solution. Antennae of female or male individuals were dissected 1-3 days after eclosion and immediately frozen in liquid nitrogen, and stored at -70uC until extraction.

Extraction of Total RNA
Frozen antennae were transferred to a liquid nitrogen cooled mortar and ground with a pestle. The homogenate was covered with 1 mL of TriZol reagent (Invitrogen, Carlsbad, CA, USA) and total RNA extractions were performed following the manufacturer's instructions. Total RNA was dissolved in H 2 O and RNA integrity was verified by gel electrophoresis. RNA quantity was determined on a Nanodrop ND-1000 spectrophotometer (Nano-Drop products, Wilmington, DE, USA).

Sequencing
PolyA+mRNA was separated from 10 mg of total RNA extracted from approximately 300 antennae of 1-3 day old adult male or female moths using the PolyA+ Ttract mRNA Isolation System (Promega, Madison, WI, USA) and further purified by using the RNeasy MinElute Clean up Kit (Qiagen, Hilden, Germany) following the manufacturer's protocol. mRNA integrity and quantity were verified on an Agilent 2100 Bioanalyzer. mRNA was fragmented in the RNA fragment reagent for 1 min according to the NEBNextH Magnesium RNA Fragmentation Module Protocol (NEB, Ipswich, MA, USA). After recovered by RNeasy MinElute Clean up Kit, the fragmented mRNA larger than 250 bp was used to synthesize double-stranded-cDNA (ds-cDNA). The first strand cDNA was synthesized using random primers and MMLV reverse transcriptase (Promega, Madison, WI, USA). The second strand was then synthesized using E. coli DNA Polymerase I and RNase H. The resulting ds-cDNAs were used as a template for NextGen sequencing on a Roche 454 FLX by using standard chemistry according to the manufacturer's instructions (Roche, Branford, CT, USA). One full picotiterplate (PTP) with two regions was used for sequencing the male and female samples. 1/2 run data of each sample was generated separately for next analysis.

Unigene Generation
The raw 454 sequences in SFF files were extracted using the Python script sff_extract.py developed by COMAV (http://bioinf. comav.upv.es). All the raw sequences were then processed to remove low quality and adaptor sequences using programs TAGDUST [42], LUCY [43] and SeqClean [44]. The resulting sequences were then screened against the NCBI UniVec database and bacterial genome sequences to remove possible contaminants. Sequences shorter than 50 bp were discarded. The cleaned 454 read sequences from both males and females were assembled into a single assembly by Newbler version 2.5.3 using default parameters under the cDNA option (Roche, Branford, CT, USA).After assembly, the clean reads of both male and females samples were mapped to the assembled unigenes using GS Reference Mapper (Roche, Branford, CT, USA). These reads in male and female samples mapped to each unigene were derived and counted separately. If all the reads mapped certain unigene were from male or female samples, this unigene was determined as male or female special.

Gene Identification and Functional Annotation
The unigenes were searched against the NCBI non-redundant protein database on a local server using the National Center for Biotechnology Information (NCBI) BLASTALL program [45]. GO Annotation was performed by using Blast2GO (GO association done by a BLASTX against the NCBI NR database) [46,47]. The ORFs of the unigenes were predicted by using ORF finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The signal peptide of the protein sequences were predicted using SignalP 4.0 [48]. The TMDs of annotated genes were predicted using TMHMM Server Version2.0 (http://www.cbs.dtu.dk/services/ TMHMM). H. armigera antennal unigenes were searched with B. mori ORs [49], OBPs [50] and IRs [15] as queries using TBLASTN in the free software BioEdit program. Read numbers of each unigene in male and female samples were derived and counted separately.

Phylogenetic Analyses
The phylogenetic reconstruction implemented for the analysis of OR, IR, OBP and CSP was performed based on the amino sequences of the candidate olfaction genes and the collected data sets. The OR data set contained OR sequences identified in Lepidoptera (12 from H. armigera, 21 from H. virescens and 64 from B. mori) [29,30,49,51,52]. The IR data set contained 12, 18 and 66 IR sequences from S. littoralis, B. mori and D. melanogaster, respectively [15,35]. The OBP data set contained 15 sequences from H. armigera, 17 sequences from H. virescens and 35 sequences from B. mori [50,53]. The CSP data set contained the 7 sequences from H. armigera [53], 9 sequences from H. virescens [54], two sequences from Spodoptera exigua and the 16 sequences from B. mori [19]. The protein name and accession number of the genes used for phylogenetic tree building are listed in supplementary material S1. Amino acid sequences were aligned using ClustalW2 [55]. Unrooted trees were constructed by the neighbor-joining method, with Poisson correction of distances, as implemented in MEGA5 software [56]. Node support was assessed using a bootstrap procedure base on 1000 replicates.

Expression Analysis of the Candidate Receptors by Semiquantitative Reverse Transcription PCR
To illustrate and compare the expression of candidate receptors in male and female antennae, semi-quantitative reverse transcription PCR was performed using cDNAs prepared from male antennae, female antennae and legs (male and female mixture). Legs were used as a control to verify that the candidate receptors were antennae enriched. Total RNAs were extracted as described above. Prior to cDNA synthesis, RNA were treated with DNase I (Fermentas, Vilnius, Lithuania) to remove trace amounts of genomic DNA. The cDNA was synthesized by First Strand cDNA Synthesis Kit (Fermentas, Vilnius, Lithuania) and was used as a template in PCR reactions with gene-specific primers. An actin gene fragment was used as control. Primers were designed using the Primer Premier 5 software (PREMIER Biosoft International). The primer sequences are available in supplementary material S2. PCR was performed with Veriti Thermal Cycler (Applied Biosystems, Carlsbad, CA, USA) under the following conditions: 94uC for 2 min, 33 cycles of 94uC for 30 s, 55-60uC for 30 s; 72uC for 30 s, and 72uC for 8 min. The cycle number was reduced to 26 and 29 for Actin and OR2 amplifications because of their high expression level. The experiment was repeated three times using three independently isolated RNA samples. PCR amplification products were run on a 2% agarose gel and verified by DNA sequencing.

Sequencing and Unigene Assembly
Using GS/FLX 454 pyrosequencing technology, a total of 753,643 raw reads were obtained from the male sample and 518,746 raw reads from female sample. After removing low quality, adaptor, and contaminating sequence reads, male and female antennae yielded 731,001 (average read length 522 bp) and 463,908 (average read length 431 bp) clean reads, respectively. The total bases of sequence data were approximately 382 million and 200 million from male and female samples, respectively. All clean reads from male and female samples were combined into an assembly that generated 37,920 unigenes larger than 100 bp with 991 bp average length. Of these, 8,706 (23.0%) unigenes were from the male antennae and 3,698 (9.8%) unigenes were from female antennae. A flow chart of sequencing and unigene assembly are shown in supplementary material S3. The gene length, ORF length, read number in male or female samples of each unigene were integrated in supplement material S4.

Gene Identification and Functional Annotation
The gene functional annotation was first performed by GO annotation using Blast2GO. Figure 1 illustrates the distribution of the unigene set in GO terms. Among the 37,920 unigenes, 11,233 (29.6%) corresponded to at least one GO term ( Figure 1), 9,164 were assigned to a molecular function (24.2%), 8,329 to a biological processes (22%), and 6,588 to a cellular component (17.4%). There was no difference between the GO terms of male and female sets. In the molecular function category, binding and catalytic activities were the most abundant and enriched GOs terms in both male and female sets. In the biological process terms, cellular and metabolic processes were the most represented. In the cellular component terms, cell, cell part and organelle were the most abundant ( Figure 1). GO annotations of the unigenes are presented in supplementary material S4.
The unigenes were then searched against the NCBI nonredundant nucleotide and protein database using BLASTN and BLASTX. Of the 37,920 unigenes, 24,675 (65.1%) showed similarity to known proteins supplementary material S1). The ORs, IRs, OBPs and CSPs were annotated according to the BLAST result. The BLASTN and BLASTX best hit result is listed in supplementary material S4.

Identification of Candidate Chemosensory Receptors
All the contigs were searched by BLASTX and further by TBLASTN using 63 and 21 known ORs from B. mori and H. virescens, respectively, leading to identification of 47 different contigs that were putative OR genes. All 47 sequences possessed overlapping regions with low identity, and therefore, likely represent unigenes. Of these, 13 HarmOR sequences had full-

Identification of Candidate Ionotropic Receptors
The IRs sequences in the H. armigera antennal transcriptome assembly were represented according to the similarity with known insect IRs. Bioinformatic analysis led to the identification of 12 candidates IRs. Sequence analysis identified 9 unigenes with a full length ORF. The insect IRs contained three transmembrane domains. TMHMM2.0 predicted 11 candidate IRs with three transmembrane domains ( Table 2). Eight of the 12 putative IRs had at least 68% identity with the corresponding IRs of S. littoralis. These may be the orthologous genes in H. armigera. One candidate IR was represented to be IR8a due to its high identity to BmorIR8a. The remaining three putative IRs had relatively low similarity to other insect IRs. Unigene27689 had 65% identity with IR25a of D. melanogaster. Unigene32538 had 34% identity with IR1 of S. littoralis. Unigene28761 had 61% identity with IR75p of S. littoralis. The orthologs of these genes probably haven't been identified in S. littoralis. The phylogenetic analyses validated accurate prediction of the IRs. In the neighbor-joining tree of IRs, all candidate H. armigera IRs clustered with their orthologs of S. littoralis and B. mori into a separate clade (Figure 3). Eleven of the 12 candidate IR unigenes were named according to their similarity to known IRs. The new IR unigene32538 was named HarmIR1.2. The information including unigene reference, length, and BLASTx best hit of all the12 IRs are listed in Table 2. The nucleotide sequences of all 12 IRs were listed in supplementary material S5.

Identification of Putative Odorant-binding Proteins
Within the H. armigera antennal transcriptome 26 different sequences encoding odorant binding proteins were identified, including three PBPs and two GOBPs. Sequence analysis identified 19 unigenes with a full length ORF with a predicted signal peptide sequence. Signal peptide sequence was not detected in the remainder of putative OBPs due to incomplete N-termini. All 26 putative OBPs had high similarity to known lepidopteran OBPs. Unigene35868 and unigene24747 had very high similarity with HarmOBP7 and HarmOBP9 (94% and 95%, respectively). They may be isoforms of the respective H. armigera OBPs or potentially new OBP genes. As expected, the PBP and GOBP sequences were clustered in separate clade in the OBP neighbor-

Identification of Candidate Chemosensory Proteins
Bioinformatic analysis led to the identification of 12 different sequences encoding candidate CSPs. Ten sequences were predicted to have full length and all of them had a signal peptide. Neighbor-joining tree showed all the 12 sequences were clustered with one lepidopterans orthologous gene and the candidate CSPs could be well identified ( Figure 5). The unigenes corresponding to CSP genes were named following the identified CSPs. The rest 6 CSPs named from HarmCSP8 to HarmCSP13 following the known HarmCSP7. The information of all the 12 CSPs was listed

Identification of Candidate Sensory Neuron Membrane Proteins
SNMPs were first identified in pheromone-sensitive neurons of Lepidoptera [20] and are thought to play a role in pheromone detection [21]. Two kinds of SNMPs (SNMP1 and SNMP2) have been identified in insects and both kinds of SNMPs were discovered in H. armigera transcriptome. The nucleotide sequence of contig5289 was Identical to the HarmSNMP published in Genbank. Contig5355 had 61% identity with SNMP2 of H. virescens and was annotation to be SNMP2 of H. armigera ( Table 5). The nucleotide sequences of the 2 SNMPs were listed in supplement material S5.

Tissue-and Sex-specific Expression of Candidate H. armigera OR, GR and IR Genes
The expression patterns of the candidate 47 ORs, 1 GR and 12 IRs in male antennae, female antennae and legs were analyzed by semi-quantitative reverse transcription PCR. Figure 6 shows the detection of all the 60 candidate receptors in antennae of H. armigera.
The expressions of all the 47 ORs were detected in antennae. Of the six candidate PRs, four PRs were found to be exclusive to the male antennae and this result was congruent with the expression profile calculated during unigene assembly. The antennal expression level of HarmOR6 was very low. RT-PCR results demonstrated that the ORs HarmOR17, HarmOR37, and HarmOR41 are male-specific; and the ORs HarmOR5and HarmOR21 are female-specific. The remaining 36 ORs were expressed in both sexes, with some of them differentially expressed in male or female antennae. The single GR identified in this study, HarmGR1, was found to be highly expressed at equal levels in antennae of both sexes. Compared to ORs, the expressions of all IRs had no significant difference between males and females.

Discussion
We used transcriptomic sequencing to identify putative olfactory system genes consisting of 47 ORs, 12 IRs, 26 OBPs, 12 CSPs, and 2 SNMPs in the antennae of H. armigera. The olfactory system genes identified in this study are comparable to recently reported insect antennal transcriptome sequence of M. sexta with 47 ORs, 6 IRs, 18 OBPs, and 21 CSPs, and C. pomonella  [36,37]. All the previously annotated or characterized ORs [51,52], OBPs [53], CSPs [53], and SNMPs of H. armigera were included in the candidate olfaction genes identified in this work. Many nonreceptor olfaction genes including OBPs, CSPs and SNMPs were also identified in our antennal transcriptome. The number identified was slightly less than B. mori. Probably the remaining OR genes are exclusively expressed in other olfaction organ such as maxillary palp and proboscis or developmental period. Activated ORs are the first critical step mediating odorant recognition in the peripheral olfactory signal transduction pathway. Only a few OR genes could be identified according to homolog-based strategy on the condition of ORs of closely related species. Previous studies have suggested that single olfactory receptor neuron (ORN) class generally expresses a single OR (except ORCO) [57], and ORN innervates a corresponding glomerulus in the insect olfactory system [57]. While the relationship is not exactly 1:1:1, the number of glomeruli could form the basis of a rough estimate of the number of ORs in a species [3,36]. In H. armigera, 65 distinct glomeruli have been found in each sex [58] and therefore, the total number of ORs should correspond with the number of glomeruli. Despite much effort searching for ORs in H. armigera, only the coreceptor (HaOR2) and ten OR-coding genes or fragments have been identified and deposited in GenBank [51,52]. In this work, we identified 47 ORs in H. armigera antennae and largely extend the number of ORs. Obviously, the number of ORs identified in this study is still less than expected based on the number of glomeruli. There are several possibilities to address the phenomena. Firstly, we only sequenced the antennal transcriptomes of male and female adult. Some OR genes might specifically expressed in different developmental stage such as larval stage or other olfactory organs of adults such as maxillary palp and proboscis [49]. Previous reports indicated that at least 6 ORs and 1o ORs specifically expressed in Bombyx mori and Drosophila larvae antennae, respectively, which supported our hypothesis (Current Biology 2008;Neuron, 2005 Kreher). Secondly, some glomeruli should be innervated by OSNs expressing other classes of chemoreceptors such as ionotropic receptos and gustatory receptors identified in this study. Thirdly, some OR paralogs with highly sequence similarity could be missing in current analysis since they are difficult to be separated with polymorphism without genome sequence(2012 PLOS one Bengtsson). Finally, we couldn't exclude the possibility that 454 pyrosequencing is not powerful enough to exhaustedly obtain all ORs, especially those ORs with extremely low expression level in the antennae.
Six PRs were identified which were named OR6, OR11, OR13, OR14, OR15, OR16 according H. virescens [29]. Only four PRs (OR11, OR13, OR14, and OR16) had been identified in H. armigera before. Of the 47 ORs identified in this work, six belong to the PR group. They were orthologous genes of six PRs identified in H. virescens. The male-specific or male-enriched expression profiles of these six ORs ( Figure 6) were consistent with the PR expression in closely related species, H. virescens [29].
All the 41 normal ORs were also specifically expressed in antennae. Experiments were conducted to identify ORs with differential expression patterns between male and female because they might perform specific functions in each gender. The RT-PCR showed that three ORs (HarmOR17, HarmOR37, and HarmOR41) were male specific and two other ORs (HarmOR5 and HarmOR21) were female specific ( Figure 6). However, the expression profile calculated according to number of copies found in transcriptomes gave a different result and inconsistencies were also found in other 454 sequences [37]. This may have occurred because 454 reads are suboptimal for expression profiling because the number of reads acquired are relatively low (less than million) compared to Illumina RNA-Seq. In B. mori, BmorOR19 and BmorOR30 were found to be specifically expressed in female [49,59]. We identified one homolog of BmorOR30 (HarmOR48) and two homologs of BmorOR19 (HarmOR21.2 and HarmOR22). But expression of OR48, OR21.2, and OR22 was detected in both male and female H. armigera antennae. This may be occurred because the sequences and expression patterns of normal ORs in different insects showed greater specificity than PRs. The sex-specific ORs need further study in H. armigera.
Recently, a new family of candidate chemosensory ionotropic receptors was discovered, first in D. melanogaster [12] and then in several other species through genome analyses [15]. D. melanogaster antennal IRs have been reported to detect a variety of molecules [14]. In D. melanogaster, 66 IRs were identified 15 of which proved to be antennae-specific [12,15]. Twelve IRs were identified in the antennae of S. littoralis [35]. We also found 12 IRs in H. armigera antennae including two co-receptors, IR8a and IR25a [14]. This is the first report of IRs in H. armigera. Sequences alignments showed that the putative H. armigera IRs have higher similarity with known IRs than ORs. Unlike ORs,   the expression of the IRs appeared to be similar between male and female. Similar results were also observed in the expression study of S. littoralis IRs [35]. The relatively high sequence conservation and expression of IRs implies a probable functional conservation. The antennal IRs are a novel group of chemosensory receptors.

Conclusions
The main objective of antennal transcriptome sequencing was to identify genes potentially involved in olfactory signal detection in H. armigera. The numbers of ORs, IRs, OBPS, CSPs and SNMPs identified in this species are close to the complete repertoire of olfactory genes identified from the antennae of other Lepidopteran species. This study demonstrates that high-throughput 454 pyrosequencing enables the recovery of rare or low copy number expressed genes, especially receptor genes with low expression levels in a species without an available genome sequence. Our findings lay the foundation for future research on the molecular basis of olfactory system of H. armigera and provide information for comparative and functional genomic analyses of related species. In addition, further studies of olfactory systems, exploring the potential for olfaction-based management of pest moth populations will be feasible with the findings reported in this antennal transcriptome sequencing study.

Supporting Information
Supplementary Material S1 Accession numbers for amino acid sequences of ORs, IRs, OBPs and CSPs used in phylogenetic analyses.