Identification of candidate chemosensory genes by transcriptome analysis in Loxostege sticticalis Linnaeus

Loxostege sticticalis Linnaeus is an economically important agricultural pest, and the larvae cause great damage to crops, especially in Northern China. However, effective and environmentally friendly chemical methods for controlling this pest have not been discovered to date. In the present study, we performed HiSeq2500 sequencing of transcriptomes of the male and female adult antennae, adult legs and third instar larvae, and we identified 54 candidate odorant receptors (ORs), including 1 odorant receptor coreceptor (Orco) and 5 pheromone receptors (PRs), 18 ionotropic receptors (IRs), 13 gustatory receptors (GRs), 34 odorant binding proteins (OBPs), including 1 general odorant binding protein (GOBP1) and 3 pheromone binding proteins (PBPs), 10 chemosensory proteins (CSPs) and 2 sensory neuron membrane proteins (SNMPs). The results of RNA-Seq and RT-qPCR analyses showed the expression levels of most genes in the antennae were higher than that in the legs and larvae. Furthermore, PR4, OR1-4, 7–11, 13–15, 23, 29–32, 34, 41, 43, 47/IR7d.2/GR5b, 45, 7/PBP2-3, GOBP1, OBP3, 8 showed female antennae-biased expression, while PR1/OBP2, 7/IR75d/CSP2 showed male antennae-biased expression. However, IR1, 7d.3, 68a/OBP11, 20–22, 28/CSP9 had larvae enriched expression, and OBP15, 17, 25, 29/CSP5 were mainly expressed in the legs. The results shown above indicated that these genes might play a key role in foraging, seeking mates and host recognition in the L. sticticalis. Our findings will provide the basic knowledge for further studies on the molecular mechanisms of the olfactory system of L. sticticalis and potential novel targets for pest control strategies.


Introduction
The beet webworm, Loxostege sticticalis L. (Lepidoptera: Pyralidae), a worldwide distributed and migratory pest in North China, causes serious economic damage every year [1,2]. L. sticticalis seems to be polyphagous in its larval stage, but it has been reported to have obvious hostplant selection for crops (sugar beet, potato and soybean) and pastures [3][4][5]. This has been PLOS  artificial diet at a temperature of 22 ± 1˚C with 70% ± 10% relative humidity under a photoperiod of 16L: 8D (Light, Dark). When the larvae grew up to the third instar, 20 third instar larvae were picked and frozen in liquid nitrogen for conservation. Male and female pupae were placed into separate cages for eclosion. The adult moths were fed with a 5% honey solution after emergence. The antennae and legs from the male and female individuals were excised at 1 to 3 days after eclosion, immediately frozen and stored in liquid nitrogen until the RNA extraction.
The total RNAs were isolated from 100 adult male antennae, 100 adult female antennae, 24 adult legs (male: female = 1:1) and 2 third instar larvae respectively. Three biological replicates were prepared for each pilot part. Total RNA was extracted using Trizol reagent (Invitrogen, Shanghai, China), following the manufacturer's instructions. The integrity of the RNA samples was detected by gel electrophoresis, and a NanoDrop 2000 spectrophotometer (NanoDrop, Wilmington, DE, USA) was used to determine RNA quantity. Before sequencing, the RNA samples were stored at -80˚C.

cDNA library construction, and Illumina sequencing
The cDNA library construction and Illumina sequencing of our RNA samples were performed at Biomarker technologies CO., LTD., Beijing, China. First, the NanoDrop 2000, Qubit 2.0 (Invitrogen, Carlsbad, CA, USA) and Agilent 2100 (Agilent Technologies, Santa Clara, CA, USA) methods were used respectively to detect the purity, concentration and integrity of each RNA sample (10ug). Second, Oligo (dT) magnetic beads were used to gather mRNA (poly-A RNA). Using a fragmentation buffer, the mRNA of each sample was broken into short fragments randomly at 94˚C for 5 min. Third, The first-strand cDNA were synthesized using N6 random primers and mRNA templates and the second strand cDNA were synthesized using buffer, dNTPs, RNase H and DNA polymerase I. The synthetic cDNA was purified using AMPure XP Beads (Beckman Coulter, Inc.). These dual-strand DNA samples were treated with T4 DNA polymerase and T4 polynucleotide kinase, respectively, for end-repairing and dA-tailing, followed by adaptor ligation to the dA tail of the dsDNA using T4 DNA ligase. Then, suitable fragments were selected with AMPure XP beads (Beckman Coulter, Inc.). Finally, the products were amplified by PCR and purified using the QIAquick PCR Purification Kit (Qiagen, Valencia, CA, USA) to create a cDNA library. The libraries were sequenced on an Illumina HiSeq™ 2500 platform, and paired-end reads were generated using a PE125 strategy (paired-end reads of 125 base pairs per read).

De novo assembly and function annotation
High-quality clean reads were obtained from the raw reads by removing reads containing either an adapter or poly-N sequence and reads that were in low-quality. Transcriptome de novo assembly was performed with the short read assembly program Trinity [35]. Then, the Trinity outputs were clustered by TGICL [36]. The consensus cluster sequences and singletons compose the unigene dataset. The annotation of unigenes was performed by NCBI BLASTx against a pooled database of non-redundant (nr) and Swiss-Prot protein sequences with evalues < 1e-5. The Blast results were then imported into the Blast2GO [37] pipeline for GO Annotation. Protein coding region prediction was performed by OrfPredictor [38] according to the blast results.

Phylogenetic tree analysis
Multiple alignments of the L. sticticalis amino acid sequences of the chemosensory genes were performed by ClustalX 2.0 [39]. The phylogenetic trees were constructed by MEGA 6.0 [40] using the neighbor-joining method [41] with a p-distance model and a pairwise deletion of gaps. Bootstrap support was assessed by a boot strap procedure based on 1000 replicates. The data sets of chemosensory gene sequences, which were chosen from other Lepidopteran species, are listed in supporting information (S2 Table).

RT-qPCR analysis
Using real-time quantitative PCR (RT-qPCR), we measured the expression profiles of chemosensory genes in different parts (male antennae, female antennae, legs and third instar larvae). The primers used for the RT-qPCR were designed using the Primer Premier 5.0, which are listed in supporting information (S3 Table). The RT-qPCR was performed by ABI 7500 Detection System (Applied Biosystems, Carlsbad, CA, USA). Before transcription, RQ1 RNase-Free DNase (Promega, Madison, USA) was used to remove residual genomic DNA of total RNA. An equal amount of cDNA (150 ng/u l) was synthesized using 1 st strand cDNA synthesis kits (TaKaRa, Dalian, China) as the RT-qPCR templates. Each RT-qPCR reaction was conducted in a 25 μ l reaction: 12.5 μ l of 2X SuperReal PreMix Plus (TianGen, Beijing, China), 0.75 μ l of each primer (10 μ M), 2 μ l of sample cDNA, and 9 μ l of sterilized ddH 2 O. The RT-qPCR was run as follows: 94˚C for 2 min, followed by 40 cycles of 95˚C for 15 s, 60˚C for 30 s, 60˚C for 1 min, heated to 95˚C for 30 s and cooled to 60˚C for 15 s to measure the melting curve.
RT-qPCR data analyses were performed using the 2 -ΔΔCT method [42]. Data of relative expression levels in various tissue were subjected to one-way analysis of variance (ANOVA), followed by a least significant difference test (Tukey) for mean comparison. The data were analyzed directly by SPSS 9.20 software (SPSS Inc., Chicago, IL, USA). Differences were considered significant at p < 0.05. The RT-qPCR data were analyzed and exported as TIF files by Graphpad Prism 5.0 (GraphPad Software, La Jolla, CA, USA).

Transcriptome assembly of L. sticticalis
Using the Illumina HiSeq™ 2500 platform, we performed next-generation sequencing on a cDNA library constructed from L. sticticalis. A total of 869.3 million clean reads (86.93 Gb) were obtained. Q30 bases were more than 85.01% in all the samples. After de novo assembly, we assembled 3,266,885 contigs with a mean length of 68.57 nt and an N50 length of 63 nt, 148,291 transcripts with a mean length of 971.37 nt and an N50 length of 1828 nt and identified 80,761 unigenes with a mean length of 722.82 nt and an N50 length of 1495 nt ( Table 1). The size distribution analysis of the unigenes indicated that 14,484 unigenes were larger than 1000 nt in length, which represented 17.93% of all unigenes (S1 Fig). All of the clean data used in this study were uploaded to SRA with the accession number SRS1782539 to SRS1782550 (male antennae: SRS1782539, SRS1782546 and SRS1782548; female antennae: SRS1782540, SRS1782545 and SRS1782550; legs: SRS1782541, SRS1782544 and SRS1782547; larvae: SRS1782542, SRS1782543 and SRS1782549). Most assembled unigene sequences were uploaded to GeneBank with the accession number GFCJ01000001 to GFCJ01079039. The accession numbers of 131 candidate chemosensory genes identified in this study were listed in supporting information (S4 Table).

Nr homology analysis and Gene Ontology (GO) annotation
Of the 80,761 unigenes, the results of annotation by NCBI BLASTx showed that 30,581 (37.87%) unigenes matched to known proteins. The remaining unigenes failed to match any sequence, with an e-value < 1e-5, in neither the Nr nor the Swiss-Prot databases. Among the Nr homology annotated unigenes, 49.62% of the homologous species had best blast match to Lepidopteran sequences. The highest match percentage (28.12%) was to Bombyx mori sequences followed by Danaus plexippus (20.09%) and Papilio xuthus (1.41%) (S2 Fig). Of the Nr annotated unigenes, 62.01% of the unigenes showed strong homology, with an e-value < 1e-45.
Gene ontology (GO) annotation of the unigenes was acquired using the Blast2GO pipeline according to the BLASTx search against Nr, which was used to classify transcripts into functional groups according to the GO category. Of the 80,761 unigenes, 16,899 (20.92%) unigenes were assigned to the various GO terms. Among the 16,899 GO annotated unigenes, the unigenes were allocated to the biological process terms more than the molecular function terms or the cellular component terms. In the molecular function category, the genes expressed in the antennae were mostly enriched for molecular binding activity (e.g., nucleotide, ion and odorant binding) and catalytic activity (e.g., hydrolase and oxidoreductase). In the biological process category, cellular, metabolic and single-organism processes were the most represented. In the cellular component category, cell, cell part and organelle were the most abundant groups (Fig 1). These results are comparable to the reported Chilo suppressalis transcriptional profile [21].

Identification and expression of candidate ORs of L. sticticalis
In this study, we identified 54 candidate ORs in L. sticticalis by bioinformatics analysis. Of these, 38 unigenes had full-length ORFs that encoded 325 to 474 amino acids, and 16 unigenes were partial sequences by the NCBI BLASTp analysis. The 54 OR sequences had a BLASTx best hit to Lepidopteran sequences, with an e-value < 1e-5 ( Table 2). Using the TMHMM Server v. 2.0, we also detected 54 candidate OR sequences with 0-8 transmembrane domains (TMDs).
The unigene C57376.g0 was named LstiOrco due to the high level of identity with the conserved Orco proteins of other insect species in Lepidoptera, which was clustered into the Orco clades of Lepidoptera in the phylogenetic tree (Fig 2). Among the 54 candidate LstiORs, LstiOrco showed the highest expression levels in the antennae in both RNA-Seq and RT-qPCR analysis (Fig 3). Five unigenes, named "LstiPRm" (m = 1 to 5), were considered to be pheromone receptors (PRs) because they shared considerable similarity with previously characterized Lepidopteran PRs and were clustered together into one subgroup in the phylogenetic tree (Fig 2). For the relatively conserved PR genes, LstiPR1 and LstiPR2 were clustered together with PR 1, 2, 3 and 4 in C. suppressalis. LstiPR3, 4 and 5 were not closely grouped with the Pyralidae PRs but clustered with the B. mori, H. armigera and H. assulta PR clade with high bootstrap support ( Fig  2). The five LstiPRs showed higher expression in the antennae of both sexes than in the legs and larvae (p < 0.05) (Fig 3).

Identification and expression of candidate IRs and GRs of L. sticticalis
Based on bioinformatic analysis, we identified 18 candidate IR sequences in L. sticticalis. Ten sequences contained full-length open reading frames (ORFs), and the remaining 8 sequences    were marked as incomplete because they lacked a complete 5' or 3' terminus. Seventeen putative IRs in L. sticticalis were predicted to have 1-4 TMDs by TMHMM Server v. 2.0 (Table 3). A phylogenetic tree of the LstiIRs was constructed based on the amino acid sequences from L. sticticalis, Drosophila melanogaster, B. mori and S. littoralis (Fig 4). The neighbor-joining tree analysis showed a clear segregation between Dmel ionotropic glutamate receptors (iGluRs) and insect IRs, and 18 LstiIR candidates were clustered to antennal IRs and the IR25a/IR8a clades, but did not belong to DmeliGluRs. According to their BLASTx best hits to Lepidopteran IRs and their positions in the phylogenetic tree, the 18 candidate IRs were given names consistent with the number and suffix of the Dmel/Bmor/Slit IR orthologs in the same clade (Table 3).
In total, we identified 13 GR candidates in L. sticticalis, including 3 unigenes with fulllength ORFs and 10 unigenes with partial sequences. Thirteen putative GRs were predicted to have 1-7 transmembrane domains (Table 3). Of the 13 putative LstiGRs, 11 sequences were named based on their clustering into the clades of Dmel/Bmor/Hass/Harm GRs in the phylogenetic tree (Fig 6). Two unigenes (C52834.g1 and C3705.g0) had low bootstrap values and were unable to be placed on the phylogenetic with confidence and were named LstiGR6 and LstiGR7, respectively. The RT-qPCR results showed that 13 candidate LstiGRs were enriched in the antennae and the expression amounts of LstiGR63a.1 in the male antennae was the highest. Interestingly, the putative LstiGR6 was sex-specific expressed in the female antennae, but also expressed in the larvae (Fig 7).

Identification and expression of putative OBPs of L. sticticalis
In the process of identification of putative OBPs, we used not only keyword searching by PSI--BLAST, but also motif scanning to detect the conserved six cysteine residue pattern, which is   (Table 4). Four unigenes (C59843.g0 C52747.g0, C52060.g0 and C58964.g0) were clustered into the PBP and GOBP clades of Lepidoptera in the phylogenetic tree (Fig 8) and were named LstiPBP1, LstiPBP2, LstiPBP3 and LstiGOBP1, respectively. The remaining 30 sequences were named LstiOBP1-30 on the basis of the similarity to known Lepidopteran OBPs and female antennal expression levels. OBPs usually were classified into three phylogenetic families. Classic OBPs, which include the PBP-GOBP group, are characterized by the conserved 6 cysteine residue pattern. The Minus-C class has lost cysteine residues, which are generally C2 and C5, and lysine can replace the position of the lost C2 [15]. In contrast, the Plus-C class has 1-2 extra cysteines and one characteristic proline next to the end of the sixth conserved cysteine residue [5]. The results of our sequence analysis showed that 23 complete ORF OBPs of L. sticticalis could be divided into three groups: 17 Classic OBPs (LstiPBP1, PBP3, GOBP1, OBP1, OBP3, OBP4, OBP6, OBP9, OBP12, OBP14, OBP15, OBP16, OBP18, OBP19, OBP21, OBP26 and OBP29), 4 Minus-C OBPs (LstiOBP7, OBP13, OBP17 and OBP28) and 2 Plus-C OBPs (LstiOBP11 and OBP22) ( Table 4).

Identification and expression of candidate CSPs and SNMPs of L. sticticalis
CSPs have a conserved cysteine pattern of C1-X6-C2-X18-C3-X2-C4 [11]. Through bioinformatics analysis, we identified 10 candidate CSPs in L. sticticalis. Eight sequences had fulllength ORFs, but other unigenes were partial sequences. In addition, the unigenes C50444.g0 and C54133.g0 failed in the SignalP tests ( Table 5). The 10 candidate CSPs of L. sticticalis best matched to Lepidopteran sequences, with an e-value < 1e-5 and an identity of more than 55%  (Table 5). We named the 10 CSP candidates according to their expression levels in the L. sticticalis female antenna. The 10 CSP sequences in L. sticticalis were clustered with Lepidopteran orthologous genes from L. sticticalis, C. suppressalis, C. punctiferalis, B. mori and H. armigera in the phylogenetic tree (Fig 10). The RT-qPCR results showed that candidate LstiCSP2, LstiCSP7 and LstiCSP10 presented higher expression in the antennae, LstiCSP5 had enriched expression in the legs, and the putative LstiCSP9 was highly expressed in the larvae. In addition, the other 5 LstiCSP candidates (CSP1, CSP3, CSP4, CSP6, and CSP8) were mainly expressed in the antennae and legs (Fig 11).
In L. sticticalis, we obtained two SNMPs that were 3'lost and 5'lost sequences, respectively. The two SNMPs separately had a BLASTx best hits to Ostrinia nubilalis SNMP1 (similarity 88%) and SNMP2 (similarity 85%) sequences with an e-value < 1e-05 by NCBI BLASTp (Table 6). LstiSNMP1 and LstiSNMP2 had significantly higher expression in the antennae than in the legs and larvae validated by RT-qPCR analysis (P < 0.05) ( Table 5). According to the phylogenetic analysis, LstiSNMP1 and LstiSNMP2 clustered with the known Lepidopteran SNMP groups (Fig 12).
The protein sequences of the candidate chemosensory genes were listed in supporting information (S5 Table).

Analysis and comparison of RNA-Seq data and RT-qPCR data
We obtained 131 candidate chemosensory genes (54 ORs, 18 IRs, 13 GRs, 34 OBPs, 10 CSPs and 2 SNMPs) in L. sticticalis by Illumina sequencing. The results of RNA-Seq showed that most genes in the antennae had higher FPKM (Fragments per Kb per million reads) than in the legs and larvae (p < 0.05), especially 76 genes with specific expression in the antennae ( Fig  13A). Furthermore, the OR7 showed female antennae-specific expression (Fig 13A and 13B). All results analyzed were based on FPKM.
To test the result of Illumina sequencing, we investigated the expression patterns of 131 L. sticticalis chemosensory genes with RT-qPCR analyses. The RT-qPCR results showed that the expression levels of these candidate chemosensory genes in different tissues were mostly consistent with the results of RNA-Seq. Most notably, a majority of olfactory genes were predominantly expressed in the antennae. However, the expression levels of several chemosensory genes between the results of RT-qPCR and RNA-Seq have obvious differences. For example,  the results of RT-qPCR showed LstiOR28, 29/IR64a, 75P.1/OBP16, 24 in the antennae, LstiOBP29 in the legs and LstiIR1 in the larvae had specific expression (Figs 3, 5 and 9), but these genes in Illumina sequencing analyses only showed higher expression (Fig 13A); on the contrary, the (IR8a, 76b/PBP1-3, GOBP1, OBP1-3, 8, 16/CSP2) only showed higher expression levels in the antennae by RT-qPCR (Figs 5, 9 and 11). These differences in the results need further research for confirmation.

Discussion
At present, the molecular basis of chemoreception in Lepidoptera is well understood compared to other insects, but the research on Pyralidae is relatively scarce. Therefore, we sequenced and analyzed the transcriptome of adult antennae, adult legs and larvae from L. sticticalis and obtained a dataset of 54 ORs, 18 IRs, 13 GRs, 34 OBPs, 10 CSPs and 2 SNMPs. In this study, comparing to the antennal transcriptome in Lepidoptera from C. suppressalis (47 ORs,20 [33,52], our LstiOR dataset of sequences has no notable difference in the identified gene numbers. RNA-Seq and RT-qPCR results both showed 54 putative LstiORs were mainly expressed in the antennae, which was similar to the other Lepidopteran results [9,21,31,33,43,45]. Studies   about B. mori showed that three female-biased ORs (OR19, OR45 and OR47) are capable to respond to host plant volatiles (linalool, benzoic acid, 2-phenylethanol and benzaldehyde) [49,53]. The 6 female-biased expression LstiORs (OR4, OR23, OR29, OR30, OR32 and OR34) that were clustered with the female-biased ORs from B. mori in the Phylogenetic tree might have similar functions, but further studies were needed. In view of the host selectivity of larvae [3,4], LstiOR5, OR34 and OR40 that were richly expressed in larvae might play important roles in host-plant selection. Some reports showed that PRs specific expressed in male antennae detected the sex pheromone components of female moths [54,55,57,58]. However, in our study, 5 candidate PRs of L. sticticalis were expressed in the antennae of both sexes, which is consistent with the recent reported results of 6 putative PRs identified in C. suppressalis, 2 PRs (OR6 and OR13) in H. armigera and 2 PRs in S. littoralis [21,56,57]. Therefore, the recognition mechanism of LstiPRs to the sex pheromone [59] of the female moth requires further research.
As the complement of ORs, ionotropic receptors were first discovered in D. melanogaster [28] through genomic analyses. Compared to ORs, the IR family is relatively conserved both in sequence and expression pattern. In our study, among the 18 LstiIRs we discovered, 13 sequences have orthologs found in Dmel/Bmor/Slit IRs; the expression levels were not significantly different between male and female antennae, which were similar to the IR expression in S. littoralis [54], C. suppressalis [21] and H. armigera [33]. Lsti76b, as well as LstiIR8a and LstiIR25a, was highly expressed in the antennae, and these genes might also be special subunits of individual odor-specific receptors [60]. The functions of IRs in L. sticticalis are likely to be conserved as IRs in other Lepidoptera, both in terms of the relatively high sequence conservation and the comparability of expression levels.
Gustatory receptors play a critical role in the detection of chemicals, which ultimately influence the insects' decisions when looking for food, mates and egg deposition sites [32,62]. Interestingly, our LstiGR4 shared 72% homology with HarmGR4 which were identified as a sugar receptor [47,61], so LstiGR4 might be a sugar receptor and participate in sugar detection and consumption. GR21a/GR63a that were expressed in CO2-sensing neurons could allow the detection of CO2 concentration in D. melanogaster [62][63][64]. In our study, 5 LstiGRs (GR21a, GR21b, GR63a, GR63a.1, and GR63a.2) were clustered into the clades of DmelGR21a/Bmor-GR63a in the phylogenetic tree and might be CO2 receptors. However, annotation of these GRs awaits further demonstration. Of our 34 LstiOBPs, most LstiOBPs were richly expressed in the antennae of both sexes that was similar to other transcriptome analyses in Lepidoptera [9,21,33,43,44]. As specific OBPs, PBPs usually were considered to have a connection with male moth perception of the sex pheromone components released by female moths [66][67][68][69]. Our 3 LstiPBPs were closely clustered into the PBP clade of other Lepidoptera in the phylogenetic tree, which suggests that our LstiPBPs might have similar function. Currently, studies also show that OBPs specifically   expressed in larvae displayed a high recognition capacity to the major sex pheromone component [65]. Thus, one of the LstiOBPs (OBP11, OBP20, OBP21, OBP22, and OBP28) which specifically expressed in the larvae might play a key role in the perception of female sex pheromone in L. sticticalis.
CSPs are more highly conserved than OBPs across insect species and are widely expressed in different parts of the insect body [31,70]. Our 10 LstiCSPs were primarily expressed in the legs and antennae of the adults, which was similar to the results of other Lepidoptera [9,21,31,33,43,45]. But LstiCSP9 was mainly expressed in larvae. The antennal enriched CSPs might be involved in chemoreception [71], and the CSPs expressed in the legs might participate in other   physiological processes beyond chemoreception [72]. However, the function of our putative LstiCSPs requires further research.
Because SNMPs were first identified in Lepidopteran pheromone-sensitive neurons [17,73], these proteins are believed to be involved in the recognition of insect pheromones. In this study, the expression levels of SNMPs in L. sticticalis were consistent with the reported results that SNMP1 of H. assulta was primarily expressed in the antennae, and SNMP2 of H. assulta was abundantly expressed in the antennae and legs [33]. Previous studies showed that SNMP1 was crucial for the detection of the volatile pheromone 11-cis-vaccenyl acetate in D. melanogaster [18]. SNMP2, in contact with pheromone-sensitive sensilla, was expressed in sensilla support cells [74]. According to the similar expression levels and physiological analysis to other Lepidoptera, we can infer that SNMPs in L. sticticalis might have the same role as in D. melanogaster. However, the general mechanism of SNMPs' function in insects remains inadequately understood. Therefore, future studies on the function of SNMP1 and SNMP2 in L. sticticalis are necessary.

Conclusion
Our aim of this study was to identify genes potentially involved in olfactory signal detection in L. sticticalis, and this aim was well met by the identification of a repertoire of 54 ORs, 18 IRs, 13 GRs, 34 OBPs, 10 CSPs and 2 SNMPs. Our results not only establish a means to further elucidate the molecular mechanisms of chemosensation, but also provide potential targets for disrupting the chemical communication system in L. sticticalis as a means of pest control.