Identification and Comparative Expression Profiles of Chemoreception Genes Revealed from Major Chemoreception Organs of the Rice Leaf Folder, Cnaphalocrocis medinalis (Lepidoptera: Pyralidae)

To better understand the olfactory mechanisms in the rice leaf folder, Cnaphalocrocis medinalis (Guenée), a serious pest of rice in Asia, we established six partial transcriptomes from antennae, protarsus, and reproductive organs of male and female adults. A total of 102 transcripts were identified, including 29 odorant receptors (ORs), 15 ionotropic receptors (IRs), 30 odorant-binding proteins (OBPs), 26 chemosensory proteins (CSPs), and 2 sensory neuron membrane proteins (SNMPs). The expression patterns of these genes were calculated by fragments per kilobase of exon per million fragments mapped (FPKM) and validated by real-time quantitative PCR (RT-qPCR). Some transcripts were exclusively expressed in specific organs, such as female protarsus, whereas others were universally expressed, this varied expression profile may provide insights into the specific functions mediated by chemoreception proteins in insects. To the best of our knowledge, among the 102 identified transcripts, 81 are novel and have never been reported before. In addition, it also is the first time that ORs and IRs are identified in C. medinalis. Our findings significantly enhance the currently limited understanding olfactory mechanisms of the olfactory mechanisms underlying the chemoreception system in C. medinalis.


Introduction
The leaf folder, Cnaphalocrocis medinalis (Guenée) is a migratory rice pest in humid tropical and temperate regions of Oceania, Africa, and Asia [1]. The larvae can damage plants by folding the leaves and scraping the green leaf tissues within the fold, this activity reduces sensation [28]. Insect IRs contain structural regions that are conserved in iGluRs, including three trans-membrane domains, a two-way ligand-binding domain with two lobes, and one ion channel pore. However, the conserved iGluR glutamate-binding residues in the two lobes are not retained in IRs, indicating their atypical binding characters [28]. Unlike ORs, two or three IR genes appear to be always co-expressed with one or both of the conserved IR8a and IR25a genes in an IR-expressing neuron [29]. In Drosophila, IR64a and IR8a formed a functional ion channel that allowed ligand-evoked cation currents indicating that IR8a is a subunit that forms a functional olfactory receptor with IR64a in vivo to mediate odor detection [30]. Furthermore, IRs can be classified into two distinct subfamilies. One is the conserved 'antennal IRs', which have been proposed to be derived from an animal iGluR ancestor and thus represent the first olfactory receptor family in insects. The other is the species-specific 'divergent IRs', which have been implicated in taste and are derived from 'antennal IR' ancestors [31]. Additionally, sensory neuron membrane proteins (SNMPs) of which there are two sub-types of SNMP proteins, SNMP1 and SNMP2 in different insects [32], might also participate in chemoreception.
Identification of chemosensory genes is a prerequisite for functional characterization of olfactory genes. With the development of next generation sequencing (NGS) techniques, numerous chemoreception genes have been identified from various insect species, such as Manduca sexta [33], Cydia pomonella [26], Aphis gossypii [34], Helicoverpa armigera [35], Sesamia inferens [36], Chilo suppressalis [37], and Spodoptera littoralis [18,38]. Although numerous chemosensory genes have been molecularly identified based on sequence similarity to reported genes in almost all insect orders, their exact functions are largely unknown. The expression profiles, particularly the tissue distribution, could provide important information on the functions of the chemosensory genes [39][40][41].
Previously, we reported sequencing and analyses of a C. medinalis antennal cDNA library and identified a subset of chemosensory genes corresponding to 5 OBPs and 3 CSPs [42]. Herein, to extend our view of the C. medinalis transcriptome and significantly increase the number of annotated olfactory genes, samples of the major olfactory organs, including antennae, protarsus, and reproductive organs from both male and female were analyzed. Our new analysis resulted in a total of 67,357 unigenes gene ontology (GO) annotations of which revealed enrichment in binding and catalytic activities. These data allowed us to identify novel C. medinalis olfactory genes, including binding proteins and chemosensory receptors. Furthermore, fragments per kilobase of exon per million fragments mapped (FPKM) calculations were performed to represent the expression levels, and real-time quantitative PCR (RT-qPCR) experiments were conducted on eight selected genes to validate our results.

Materials and Methods
Insect resources C. medinalis larvae or pupae were collected from rice field in Wuxue, a county of Hubei Province in China (115°45 0 E; 30°00 0 N). The properties were not privately owned or protected in any way, and this field study did not involve endangered or protected species. The larvae were reared in buckets in the laboratory until pupation. Pupae were sexed and maintained separately inside glass tubes until moths emerged. Immediately after emergence, female and male adults were provided a 10% sucrose solution. To obtain mated females, newly emerged male and female moths were paired in plastic-screen cages (20 × 20 × 10 cm). Antennal (At), Protarsus (P), and reproductive organs (Ro) were isolated from mixed population adults 1-5 days after eclosion and kept separately in a freezer (-80°C) until use.

RNA preparation and cDNA library construction and sequencing
Frozen samples were individually crushed in a liquid nitrogen-cooled vitreous homogenizer and total RNA from each sample (~200 adult male or female moths) was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions. Residual genomic DNA was removed using DNase I (Promega, Madison, WI, USA). Total RNA was dissolved in RNase-free water and RNA integrity was verified by gel electrophoresis. The RNA quantity was determined on a Nanodrop ND-2000 spectrophotometer (NanoDrop products, Wilmington, DE, USA).
Poly-A RNA was isolated from about 10-20 μg of the total RNA of different tissue samples was extracted using oligo (dT) magnetic beads. Then, poly-A RNA for each sample was digested into short fragments in a fragmentation buffer. Random hexamers were used for firststrand cDNA, followed by second-strand cDNA synthesis using RNase H and DNA polymerase I. These dual-strand DNA samples were treated with T4 DNA Polymerase and T4 Polynucleotide Kinase for end-repair and dA-tailing, followed by adaptor ligation to the dsDNA dA tail using T4 DNA ligase. Products corresponding to insert length~200 bp were collected by 2% agarose gel electrophoresis and purified with a Takara quick Gel Extraction Kit (Takara, Hilden, Germany) and used as templates for PCR amplification to create a cDNA library. The library was pair-end sequenced using a PE90 strategy (paired-end reads of 90 base pairs per read) on an Illumina HiSeq™ 2000 (Illumina, San Diego, CA, USA) at the Beijing Genome Institute (Wuhan, China). Different libraries were sequenced in one lane; raw-reads were sorted by barcodes in the sequencing adaptor.

Assembly and functional annotation
The clean-read dataset was generated from raw-reads by the following steps. First, reads with adaptors or those containing more than 5% unknown nucleotides (Ns) were directly removed. Second, low quality reads containing more than 20% suspect-nucleotides with a Phred Quality Score less than 10 were filtered out. Finally, both ends of the reads were evaluated to trim unreliable ends containing more than 3 successive suspect-nucleotides. Each clean-read dataset of all of the male and female tissues were separately de novo assembly using a paired-reads mode and default parameters in Trinity r2012-06-08 [43]. The Trinity outputs were clustered using TGICL [44]. Consensus cluster sequences and singletons made up the unigene dataset.
Those unigenes larger than 150 bp were first aligned using BLASTx to protein databases, including Nr, Nt, Swiss-Prot, KEGG, COG, and GO (e-value<1e-5). Proteins with the highest sequence similarity with the given unigenes along with their protein functional annotations were subsequently retrieved. BLAST results were then imported into Blast2GO pipeline [45] for GO Annotation. Protein coding region prediction was performed using OrfPredictor according to the BLAST results. Blast2GO was used to retrieve GO annotations of the unigenes with GO functional classification obtained using WEGO software. Then the best hit sequences were searched by BlastX in NCBI.

Expression abundance analysis of unigenes
The expression abundance of the unigenes was calculated by the FPKM method [46] using the formula with software program: FPKM (A) = 106×C/ (N×L/103). In this formula, FPKM (A) is the expression abundance of unigene A; C is the number of fragments that uniquely aligned to gene A by SOAP [47]; N is the total number of fragments that uniquely aligned to all genes; and L is the total number of bases in gene A. The FPKM method can eliminate the influence of different gene lengths and sequencing discrepancy on the calculation of expression abundance. These results can be directly used to compare expression differences in samples.

Quantitative real-time PCR validation
Total RNA was extracted from tissue samples according to the methods above. First-strand cDNA was synthesized according to the protocol provided with the One Step SYBR 1 Prime-Script 1 RT-PCR kit (Takara Code: DRR066A). The eight tested genes were selected and PCR primers were designed with the NCBI primer design tool Primer-BLAST (S1 Table). Tissue expression profiling of adults was carried out by a Bio-Rad IQ5 real-time qPCR system. Product sizes of~100-160 bp were used to measure increased fluorescence of SYBR green. Realtime qPCR was conducted in 20 μl reactions that contained 10 μl 2× SYBR Green PCR Master Mix, 0.8 μl of each primer (10 mM), 2 μl sample cDNA, and 6.4 μl sterilized ultrapure H 2 O (Millipore). To properly assess the efficiency of PCR amplification, all the primers were tested by at least five orders of magnitude in multiples (5logs) consecutive dilutions of template concentration with at least 3 times parallel repeated. Then the efficiency can be controlled >90%.
To assess reproducibility, test samples and endogenous controls were carried out in triplicate. Cycling parameters were as follows: 94°C for 2 min, 40 cycles at 95°C for 10 s, and 60°C for 30 s. Products were analyzed by agarose gel electrophoresis, sequencing, and melt curve analysis, which indicated that the respective reactions did not yield non-specific amplification products. CmedActin was used as an endogenous control to normalize the expression of target genes in olfactory tissues [42]. Female antennae of 0d old adults were used as a calibrator to calculate ΔΔCt values between tissues (ΔCt male antennae or female legs or other tissues at any time point-ΔCt female antennae of 0 d). Moreover, relative quantification was performed using the comparative 2 -ΔΔCt method [49] to identify differences in mRNA expression levels in different tissues and sexes of adults data indicate means ± SE (n = 3). One-way ANOVO statistical analysis was used to measure the different expression levels in tissues.

Results and Discussion
Assembly Transcriptomic sequence data were generated using Illumina HiSeqTM2000/MiSeq technology. Approximately, 83.6, 90.2, 85.2, 84.5, 86.2, and 85.7 million raw-reads were obtained from libraries constructed from female and male antenna, tarsus, and reproductive organs, respectively. Additionally, 77.0, 83.3, 77.1, 76.7, 79.1, and 78.4 million respective clean-reads were obtained after filtering, followed by merging and clustering. A final transcript dataset with consisting of 67,357 unigenes was obtained. The dataset was 50.63 megabases with a mean length of 857 nt and an N50 of 1405 nt. There were 18,943 unigenes longer than 1000 nt, which accounted for 28.12% of the transcriptome assembly (Table 1).

Homology analysis and gene ontology annotation
Among the 67,357 unigenes, 36,966 (54.8%) were matched using a BLASTX homology search to entries in the NCBI non-redundant (nr) protein database with a cut-off E-value of 10-5. The highest percentage of matched sequences (refers to only the best match for each unigene to a sequence in the blast database) was to Danaus plexippus (54.0%), followed by Bombyx mori (6.6%), Tribolium castaneum (4.2%), Papilio xuthus (2.9%), Silurana tropicalis (1.9%), Papilio polytes (1.1%), and Acyrthosiphon pisum (1.1%). The remaining 28.3% sequences were matched to other insect species (Fig 1A).  Gene Ontology (GO) annotations were used to classify the 67,357 unigenes into different functional groups by BLAST2GO. Based on sequence homology, 15,748 unigenes (23.37%) could be annotated with each unigene classified into one or more functional groups of the three biological processes (Fig 1B). In the molecular function category, genes expressed in the antennae were mostly enriched in molecular binding activity (e.g., nucleotide, ion, and odorant binding) and catalytic activity (e.g., hydrolase and oxidoreductase). Among the biological process terms, cellular and metabolic processes were the most frequently represented, and in the cellular component terms, cell, cell part, and organelle were the most abundant ( Fig 1B). These results are comparable to the reported Chilo suppressalis transcriptional profile [37].

Identification of chemosensory receptor candidates
The unigenes related to chemosensory receptor candidates were identified by keyword searches of BLASTx annotations. To identify additional OR candidates, the predicted protein sequences of the unigenes were further searched by PSI-blastp with known lepidopteran chemosensory receptors (Tables 2 and 3). We identified 29 distinct unigenes that were candidates for OR genes and no GR gene like sequences longer than 200bp were identified in any tissue even tarsus. Among the OR genes, four sequences were full length ORs because they had an intact open reading frame with a general length of 1200 bp and 4 to 7 predicted transmembrane domains, which are characteristic of typical insect ORs. Based on typical OR lengths (around 400 amino acids), over 60% of the sequences identified represented fragments with missing 5' ends and only seven contained a deduced protein longer than 200 amino acids. Similar results have been reported in the transcriptomes of other species, such as S. Inferens [36], C. suppressalis, and H. armigera.
In the phylogenetic analyses, lepidopteran putative pheromone receptors clustered in a subgroup (Fig 2). These four OR candidates were named "CmedPRx" (x = 1 through 4) to more clearly indicate putative function; this nomenclature was adopted to more clearly differentiate lepidopteran pheromone receptors from general odorant receptors, which generally have been poorly classified. The C. medinalis Orco orthologue, termed-CmedOrco, had a high degree of identity with other insect co-receptors. Almost all CmedOR candidates clustered with at least one putative lepidopteran orthologous in the phylogenetic tree.
Compared to genome-based identification the number of ORs identified in transcriptome analyses is typically limited. The number of ORs identified in the genomes of D. melanogaster, A. gambiae, and B. mori are 62, 79 and 72, respectively, while in antennal transcriptomes 47 were identified in M. sexta [33], 43 in C. Pomonella [26] and 47 in H. armigera [35]. In the European corn borer (Ostrinia nubilalis), a neuroanatomical study suggested 64 glomeruli in the antennal lobe of both genders [19], and in H. virescens, the total number of glomeruli, including dimorphic and isomorphic units, consisted of 64 in males and 62 in female [50]. It is believed that that one olfactory receptor type is expressed in the OSN and axonal projects of different OSNs express the same olfactory receptor, which converges on the same antennal lobe glomeruli. Indeed, some glomeruli can also be activated by OSNs that express other classes of chemoreceptors, such as ionotropic receptors [29]. Although there has been no reports on the number of glomeruli in C. medinalis, our dataset of 29 OR sequences is somewhat smaller than those of other insects [36,37] even though the sequencing depth of our transcriptome was greater than others. There are several possible explanations to address this potential discrepancy. First, according to the sequence reads, the expression level of ORs in the C. medinalis antenna is very low, suggesting low transcript numbers (S1 Fig), resulting in lower detection metrics, which suggests that there is a high chance to identify additional low expression C.
medinalis chemosensory genes that were missed in our assembly. Second, it is possible that the remaining OR and GR genes are either exclusively expressed in other olfactory organs such as maxillary palp (especially OR) or are temporally restricted to the developmental period (embryonic, larval, or pupal).

Identification of candidates for sensory neuron membrane proteins and ionotropic receptors
Sensory neuron membrane proteins (SNMPs), which are located in the dendritic membrane of primarily pheromone-specific OSNs, are thought to trigger ligand delivery to the receptor [32]. The two 'discovered' SNMPs share 100% sequence similarity with those reported previously [51].
The putative IR genes in the C. medinalis antennal transcriptome can be represented according to their similarity to known insect IRs. Bioinformatics analysis led to the identification of 15 IR candidates, in which three sequences contained a full-length ORF, and the remaining 12 sequences were fragments. Insect IRs have three trans-membrane domains, and TMHMM2.0 predicted the six candidates C. medinalis IR also have three trans-membrane domains (Table 4). For phylogenetic analysis, 13 of the putative C. medinalis IRs were aligned with orthologous IRs from D. plexippus, B. mori, S. Littoralis, and D. melanogaster, the remaining two C. medinalis IR sequences were too short to align successfully. From the phylogenetic tree, we detected clear segregation between the different subfamilies, such as iGluR, IR75q, IR41a, and IR93a. In this phylogenetic tree, the most of C. medinalis IR candidates clustered with orthologous ionotropic receptors into separate clades (Fig 3). Based on their positions in the phylogenetic tree and strong bootstrap support, 12 of the 13 analyzed C. medinalis IRs candidates clustered with the IR8a, IR75p, IR75q, IR21a, IR41a, IR68a, IR76b, IR87a, and IR93a groups, which formed small expansions with other putative genes. Similar to previous reports [28,31], IRs and iGluRs clustered phylogenetically into separate clades, with the exception of the IR8a and IR25a lineages, which clustered with the iGluRs. Unexpectedly, an orthologe of co-receptor IR25a, which is typically among the highest expressed IR transcripts in other insect antennae, was not found in our transcriptome. This is the first report of IRs in C. medinalis and sequence alignments showed that the IRs are more highly conserved across species and orders than the ORs are. In our database, we identified fragments of putative IRs, which were specifically and highly expressed in the antennae of both sexes. Data for these genes, including unigene reference numbers, length, and best BLASTx hit for all the 15 IRs, are listed in Table 4. The sequences of all 15 IRs are listed in S1 File.

Identification of putative odorant-binding proteins and chemosensory proteins
In addition to a keyword searching and PSI-Blast, we also used a motif scan for the conserved 6 cysteine residue pattern (C1-X 5-39 -C2-X 3 -C3-X 21-44 -C4-X 7-12 -C5-X 8 -C6) characteristic of odorant-binding proteins [52]. In our transcriptome data set, we identified 30 sequences that potentially encode odorant-binding proteins, including six previously annotated OBPs. Among these 30 sequences, 22 had an intact ORF detected, six unigenes lacked signal peptides and the other two were missing the 5' sequence. Sequence alignment showed that 15 of the 22 putative intact OBPs had the classic cysteine motif (Fig 4). The number of CmedOBPs candidate identified was far less than the 46 annotated OBPs found in the B. mori genome [53], but was more than the 18 putative OBPs identified in M. sexta [33]. In the phylogenetic tree, the PBP and GOBP sequences were clustered into the PBP and GOBP clades, respectively, as expected ( Fig 5). All candidates for OBP sequences were clustered with at least one lepidopteran ortholog. By comparing our putative OBPs with the NCBI records for C. medinalis, we identified six previously annotated "genes", GOBP1, GOBP2, GOBP3, PBP4, OBP1, and OBP2. All of the previously annotated sequences had >99% amino acid identity with their most similar NCBI records. Therefore, we named these candidates GOBPs and PBPs based on their existing NCBI records, and named the other OBP candidates "CmedOBP" followed by a number in descending order of their coding lengths (Tables 5 and 6).  Bioinformatics analysis led to the identification of 26 sequences that encoded CSPs candidates (Fig 6). Among them, 20 sequences had full-length ORFs and signal peptides, which were found by using the SignalP test, except for CmedCSP15. Neighbor-joining tree analysis showed that all 26 sequences clustered with orthologous Lepidopteran genes (Fig 7). These CSP candidates were named "CmedCSPx" followed by a number in descending order of the length of the coding region. Information of the CSPs is presented in Table 7. The CSP sequences are listed in S1 File.
Validation of tissue-and sex-specific expression of candidate chemosensory genes In C. medinalis, all 29 ORs were expressed in antennae, 12 of which showed sex differentiation, with the expression level of CmedOR7, CmedOR14, CmedOR15 and CmedOR21 higher in female antennae than in male. Female-biased OR expression, as quantified using RNA-seq data, has also been reported for ORs expressed in the antennae of the adult mosquito, Anopheles gambiae [54] and other lepidopterans [55]. Three putative ORs were also expressed in legs and reproductive organs, respectively (Fig 8A), and CmedOrco was specifically expressed in adult antennae. Functional characterization of these male-and female-biased ORs, as well as members of the PR clade, remains to be conducted.
Unlike ORs, the expression of the IRs appeared to be similar between male and female ( Fig  8D). We observed that some CmedIRs transcripts were not specific to chemosensory tissues, as 13 of the 15 candidate IRs showed exclusive expression in male and female antennae. Similar results were also observed in S. littoralis IRs [56]. The relatively high sequence conservation and expression of IRs implies a probable functional conservation. The antennal IRs are a novel group of chemosensory receptors. Additionally, CmedIR4 and CmedIR68a showed considerable expression in the male and female tarsus, respectively. In D. melanogaster, most of the 15 antennal IRs were found to be expressed only in antennae, two were also expressed in other tissues, such as the proboscis [29]. And CsupIR3 and CsupIR64a have considerable expression in leg [37]. Thus, the chemosensory function of ORs and IRs might not be restricted to antennae (Fig 8B). In Lepidoptera, legs and ovipositors are known to carry contact chemosensory sensilla. For example, the ovipositor of the moth H. virescens has OR-expressing sensilla [57]. Taken together, these observations suggest that, although classified as antennal IRs, some IRs might be involved in functions other than chemoreception [56]. Compared with the chemoreception genes of C. medinalis previously submitted to NCBI, most of those sequences were found in our data and the expression patterns of these genes in different tissues were nearly identical. In addition, we identified 13 new CSP genes, 11 of which were only expressed in the female protarsus, the remaining two were also expressed in the male protarsus and one was expressed in the male reproductive organs (Fig 9C). In total, 12 OBP genes were exclusively expressed in antennae and four in the female tarsus. In addition, Cme-dOBP6 and CmedOBP12 were also expressed in the male reproductive organs; and Cme-dOBP16 and CmedOBP24 showed sex differentiation (Fig 9B). The high expression levels in male antenna could help male moths to identify sex pheromones emitted by female moths. In many species, soluble PBPs in the sensillum lymph surrounding the dendrites are thought to transfer the usually hydrophobic pheromone molecules to the dendrite membrane of the sensory neurons [58], and are male biased in expression [59,60]. Several studies supported the role of PBP in pheromone detection, since female moths release a blend of sex pheromones to attract males over long distances, and males detect the released pheromones with extreme sensitivity and selectivity [61], PBPs though are not the only protein that can bind sex pheromones, OBPs and CSPs also function in this. Labeling was observed in antennae sensilla chaetica, but not in olfactory sensilla or sensilla coeloconica, leading to the suggestion that in Orthoptera, CSPs are involved in contact chemoreception. This may also apply to Lepidotera as BmorCSP4 was higher expressed in contact organs (antennae, legs, and wings) than noncontact organs (head, thorax and abdomen); many insects have high expression levels in antennae, legs and wings but lower levels in the abdomen, thorax and head for both sexes [62]. Furthermore, among insects many adult females do not automatically oviposit once they have reached the spawning place, initial determination of host suitability may occur through the tarsal sensilla [63]. In C. medinalis, four OBPs and 12 CSPs were identified that were expressed exclusively in female protarsus, which may function in host plant discrimination. We speculate that prior to oviposition, females assess the suitability of leaf surfaces with their legs. This behavior may help to characterize the function of these proteins in future research. In the Mediterranean fruit fly, Ceratitis capitata, CcapOBP99d is abundant in the antennae, but also present in tarsi, wings and male abdomen [61].
To validate the expression patterns of candidate genes, eight genes including two OBPs, two CSPs, and four ORs were selected and analyzed by qRT-PCR. The qRT-PCR amplicons were sequenced directly after amplification and show ≧99% identical at the nucleic acid level with the corresponding sequences from the transcriptome, indicating that the assembly of the transcripts was reliable. The expression levels were consistent with the initial FPKM calculation (Fig 8), which further supported the validity of our data.

Conclusion
The main objective of this study was to investigate the transcriptome of the major chemosensory organs in the rice leaf folder C. medinalis. Using RNA-sequencing, we annotated a total of 30 candidate OBPs, 14 CSPs, 36 ORs, 15 IRs, and 2 SNMPs in the antennae of C. medinalis. Most of the previously annotated C. medinalis chemosensory genes available in NCBI were also found in our dataset. Prior to this study, members of the major chemosensory genes had only been identified in the antennae of C. medinalis, and no ORs or IRs were identified. This strategy is particularly relevant for the identification of new insect chemosensory receptors in species for which no genomic data is available. The availability of our large antennal transcriptome represents a valuable resource for further studies on insect olfaction in this species as well as other leptidopterans.  Although the generation of gender-specific transcriptomes did not highlight strong differences between sexes, we found evidence for female-enriched ORs. A comparison of the transcriptomes in major chemosensory organs from males and females that have encountered diverse experiences might lead to the identification of more regulated genes, such as candidates for genes involved in gender-specific behaviors and make it possible for further research into the C. medinalis olfactory system at the molecular level. These studies can also provide information for comparative and functional genomic analyses of related species.