Transcriptome analysis in different developmental stages of Batocera horsfieldi (Coleoptera: Cerambycidae) and comparison of candidate olfactory genes

The white-striped longhorn beetle Batocera horsfieldi (Coleoptera: Cerambycidae) is a polyphagous wood-boring pest that causes substantial damage to the lumber industry. Moreover olfactory proteins are crucial components to function in related processes, but the B. horsfieldi genome is not readily available for olfactory proteins analysis. In the present study, developmental transcriptomes of larvae from the first instar to the prepupal stage, pupae, and adults (females and males) from emergence to mating were built by RNA sequencing to establish a genetic background that may help understand olfactory genes. Approximately 199 million clean reads were obtained and assembled into 171,664 transcripts, which were classified into 23,380, 26,511, 22,393, 30,270, and 87, 732 unigenes for larvae, pupae, females, males, and combined datasets, respectively. The unigenes were annotated against NCBI’s non-redundant nucleotide and protein sequences, Swiss-Prot, Gene Ontology (GO), Pfam, Clusters of Eukaryotic Orthologous Groups (KOG), and KEGG Orthology (KO) databases. A total of 43,197 unigenes were annotated into 55 sub-categories under the three main GO categories; 25,237 unigenes were classified into 26 functional KOG categories, and 25,814 unigenes were classified into five functional KEGG Pathway categories. RSEM software identified 2,983, 3,097, 870, 2,437, 5,161, and 2,882 genes that were differentially expressed between larvae and males, larvae and pupae, larvae and females, males and females, males and pupae, and females and pupae, respectively. Among them, genes encoding seven candidate odorant binding proteins (OBPs) and three chemosensory proteins (CSPs) were identified. RT-PCR and RT-qPCR analyses showed that BhorOBP3, BhorCSP2, and BhorOBPC1/C3/C4 were highly expressed in the antenna of males, indicating these genes may may play key roles in foraging and host-orientation in B. horsfieldi. Our results provide valuable molecular information about the olfactory system in B. horsfieldi and will help guide future functional studies on olfactory genes.

Introduction encoding OBPs and CSPs, because the olfactory system is crucial for insects to locate hosts, oviposition sites, and food sources. Finally, we validated the differentially expressed candidate olfactory genes identified in the transcriptome data by RT-PCR and RT-qPCR.

Insect rearing and sample collection
Larvae, pupae and adults of B. horsfieldi were collected in June 2016 from in the Poplar Planting Base of Luojiang City, Sichuan Province, China (31.07˚N, 104.08˚E). The field studies did not involve endangered or protected species, and no specific permission was required for the research activity at this location. Adults were used just emergence and unmated. The characteristics used to identify mated B. horsfieldi were the villi on the abdomen of mated males and the obvious mating plaques on the backside of the mated females [42]. Female and male adults were placed on ice and quickly dissected into antenna, thorax (without thoracic legs), hind wing, and thoracic legs for RT-PCR analysis. RT-qPCR was performed using nucleic acids from male and female adult organism. All samples were immediately frozen in liquid nitrogen and stored at −80˚C until use. Each sample contained either larvae, pupae, or male or female adult tissues from at least five insects. After pooling the tissues for each sample, three biological replicates were conducted for each treatment.

RNA extraction and sequencing
Mixed larvae from the first instar to the prepupal stage, pupae, and adults (females and males) were prepared for RNA extraction. Total RNA was isolated from homogenized sample in TRIzol reagent (Takara, Dalian, Liaoning, China) following the manufacturer's protocols. The concentration of total RNA was quantified with a Qubit3.0 (Thermo Fisher Scientific, Waltham, MA, USA) and an Agilent2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). UV absorption values at 260 nm/280 nm was recorded to monitor the purity of the RNA products (Nanodrop2000, Thermo Fisher Scientific, Waltham, MA, USA). After RNA extraction, mRNAs were purified using the interaction of the poly (A) tails and magnetic oligo (dT) beads and collected using RNeasy RNA reagent. Mixed mRNAs were fragmented into 300-800 bp pieces using RNA fragment reagent (Illumina), and the pieces were collected using an RNeasy RNA cleaning kit (Qiagen). Subsequently, RNA fragments were copied to make first-strand cDNA using MMLV reverse transcriptase (Takara, Dalian, Liaoning, China) and random primers. Second-strand cDNA synthesis was performed using DNA Polymerase I and RNase H. The Illumina HiSeq2000 system and 125 paired-end reads were used for sequencing. Statistical analysis of the sequence lengths was performed to ensure sequence purity.

Assembly and functional annotation
Raw sequence data in fasta format were first processed through in-house Perl scripts [43]. In this step, clean data (clean reads) were obtained by removing reads containing adapter, poly-N, and low-quality reads from the raw sequence data [44,45]. The Q20, Q30, GC content, and sequence duplication level of the clean data were calculated [44]. All downstream analyses were based on good-quality clean data.
The flow chart of transcriptome assembly described by Grabherr et al. [46] was used in the present analyses. A Perl pipeline described by Haas et al. [43] was used to analyze the sequence data. As suggested by Haas et al. [43], when multiple sequencing runs are conducted for a single experiment, the resultant reads can be concatenated into two files if paired-end sequencing is used. The left files (read 1 files) from all the samples were pooled into a single large left.fq file, and the right files (read 2 files) were pooled into a single large right.fq file. Transcriptome assembly was accomplished based on the left.fq and right.fq using Trinity (http://trinityrnaseq. github.io) with min_kmer_cov set to two by default and all other parameters set to default. The assembled unigenes were annotated by BLASTX searches and ESTScan against the NCBI nonredundant protein sequences (Nr), NCBI non-redundant nucleotide sequences (Nt), Swiss-Prot, Gene Ontology (GO), protein families (Pfam), Clusters of Eukaryotic Orthologous Groups (KOG), and KEGG Orthology (KO) databases (E <10 −5 ), and the best annotations were selected [44,45,47]. Differentially expressed genes were selected based on a log 2 fold change >1 and q value <0.005 using DESeq [48] (S1 Table). The simple sequence repeats (SSR) in the B. horsfieldi unigene sequences were screened with MISA software (http://pgrc.ipk-gatersleben.de/ misa/misa.html). The expression level of genes were calculated based on FPKM method [49]. The nucleotide sequences of the identified olfactory gene are listed in S2 Table. Homology analysis A neighbor-joining (NJ) tree was constructed with MEGA version 5.0 and the Jones-Taylor-Thornton model [50]. The olfactory genes of other coleopteran species were obtained from the NCBI databases. Bootstrap support values were based on 1000 replicates. All the candidate olfactory genes were named according to the nomenclature system described previously [51,52]. The olfactory genes from different species were marked with different colors and the phylogenetic tree was generated with iTOL software (http://itol.embl.de)

RT-PCR and RT-qPCR validation of differentially expressed candidate olfactory proteins
Seven OBPs and three CSPs that were predicted to be highly abundant in antenna or had complete ORFs were selected for further analysis. Specific primer pairs were derived from the transcriptome data, and primer pairs for each gene were designed to amplify 100-200 bp products, which were verified by sequencing. A semi-quantitative RT-PCR (Bio-Rad S1000, US) analysis was performed for each primer pair using rTaq DNA polymerase (Takara, Dalian, Liaoning, China) before the RT-qPCR analysis to ensure that the correct products were amplified and no primer dimers were present [53]. The RT-qPCR analysis was carried out using an Mx 3000P detection system (Agilent, Palo Alto, CA, USA) as described previously, with thermal cycler parameters of 2 min at 94˚C, then 40 cycles of 20 s at 94˚C, 20 s at 58˚C, and 20 s at 72˚C. The 18S gene was used as an internal control: 18S forward and reverse, 5'-GAGACTCTAGCCT GCTAACT-3' and 5'-TGTTTGTACGCCGACAGT-3'. A standard curve was derived from 10-fold serial dilutions of plasmid containing the target DNA segment to determine the PCR efficiency and to quantify the amount of target mRNA. All primers tested gave amplification efficiencies of 90-100%. For each treatment, three biological replicates were conducted. RT-qPCR data were analyzed by the 2 −ΔΔCT method [54]. The primers used in this experiment were designed with Primer premier 5.0 and Oligo 6.0 and are listed in S3 Table. A Chi-square test was using to compare the expression level of male and female adult. The RT-qPCR data were analyzed and output as PDF files using Graphpad 5.0.

Illumina sequencing and assembly
This filtering resulted in a total of 50,028,651, 51,705,759, 49,935,243, and 47,402,329 clean reads in larvae, pupae, and females and males of B. horsfieldi, respectively. All the clean reads were assembled into transcripts by Trinity software; the longest copy of redundant transcripts was regarded as a unigene [43,44,46]. A total of 171,664 transcripts were obtained and assembled into 87,732 unigenes. Many unigenes exceeded 2000 bp in length, while approximately 21.13% unigenes exceeded 1000 bp, and 25.74% were 500-1000 bp ( Table 1).

Annotation of the B. horsfieldi unigenes
The assembled unigenes were annotated against the Nr, Nt, Swiss-Prot, Pfam, GO, KOG/ COG, and KO databases [53]. A total of 26,511 unigenes were annotated in B. horsfieldi pupae, 23,380 in larvae, 30,270 in males, and 22,393 in females. Among them, 549 were BP-specific, 595 were BL-specific, 515 were BF-specific, 922 were BM-specific, 11,012 were common among the groups, and 87,732 were in the BP-BL-BF-BM combined dataset ( Table 2). The numbers and percentages of unigenes annotated in each of the databases were counted. The Nr database had the best matches against the unigenes in the BP-BL-BF-BM combined dataset (50,968, 58.10%) ( Table 2) (S1-S12 Texts).
After functional annotation, the numbers of sequences from different species that matched the B. horsfieldi unigenes were calculated from the annotation results. The five most represented species (about 76% of all the species) were Tribolium castaneum (   The KOG classification placed 25,237 unigenes into 26 functional categories (Fig 3). The 'general function prediction only' category was the largest (3,920 unigenes), followed by 'signal transduction mechanisms' (3,625 unigenes), and 'posttranslational modification, protein turnover, chaperons' (2,531 unigenes). The top three categories had 39.93% of the unigenes assigned to KOG categories (S14 Text).

SSR analysis
We screened for SSRs in the B. horsfieldi unigene sequences using MISA software and designed primers with Primer 3 for the SSR markers (Fig 5). We identified 87,732 sequence segments with total length of 179,705,476 bp, among which 44,015 SSRs sequences were authenticated.

Expected number of fragments per kilobase of transcript sequence per millions base pairs sequenced (FPKM) analysis
We used the RSEM software to calculate statistics for the bowtie comparison results, and convert FPKM [55]. We obtained the number of read counts for each gene and conducted FPKM analysis accordingly. From the perspective of general distribution of expression quantity ( Fig  6A) and discrete angle (Fig 6B), the gene expression quantity of different forms of B. horsfieldi are different.

Expression profiles of candidate olfactory genes
We identified 10 candidate CSPs and 16 candidate OBPs by searches against the Nr database (Deduced amino acid sequences are listed in S22 Text). Significant differences were detected in the expression profiles of the 10 candidate CSPs and 16 candidate OBPs in male and female adults (Tables 3 and 4, respectively).
To further explore the olfactory genes, a local BLAST search was performed against the B. horsfieldi unigene database using the known olfactory gene sequences of Lissorhoptrus oryzophilus, Monochamus alternatus, and Dendroctonus ponderosae as queries. The calculated expression values based on the FKPM method of all the candidate olfactory genes are listed in Table 5.

Tissue-and sex-specific expressions of candidate olfactory genes
Semi-quantitative RT-PCR showed that all of the tested candidate olfactory genes were primarily male antenna specific (Fig 11) (S23 Text). BhorOBP2/C2 and BhorCSP1 showed olfactory and non-olfactory tissue expression, whereas BhorOBP1/3, BhorOBPC1/C3/C4, and BhorCSP2/3 showed olfactory tissue-specific expression. BhorOBP3 and BhorOBPC1/C3 were highly expressed in the antenna of males.

Discussion
Establishing insect transcription libraries for high-throughput sequencing is an important approach for molecular biology studies of insects. Transcriptome data can be used to detect   Candidate olfactory genes in Batocera horsfieldi new genes, transcription sites, and differentially expressed genes, as well as obtain functional gene information and transcription expression abundance, and the data can be used for molecular marker development, gene expression analysis, and small RNA analysis. [56][57][58][59]. Transcriptome for many insects has been established and analyzed, including Ericerus pela [60], M. alternatus [61], Timema cristinae [62], Cyrtotrachelus buqueti [53], Drosophila melanogaster [63], Bombyx mori [64], and Bombus terrestris [65].
In the present study, developmental transcriptomes were established for B. horsfieldi at various stages, mixed-age larvae, pupae, and male and female adults, and a relatively comprehensive gene pool was obtained. The alignments against the Nr database showed that 57.1% and Candidate olfactory genes in Batocera horsfieldi 15.3% of the B. horsfieldi unigenes were similar to T. castaneum and D. ponderosae sequences, respectively. Thousands of differentially expressed genes were identified, facilitating developmental and evolutionary studies of B. horsfieldi, and contribute to future work in B. horsfieldi comparative genomics. Insects can sense changes in odorant substances in the environment through olfactory receptors, which transform chemical signals from odorant substances into electrophysiology signals that can generate various kinds of behaviors [11]. Proteins are required for odorant substances to interact with insect olfactory receptors, and for turning the chemical signals into electrophysiology signals [66]. These proteins are related to the olfactory sensation of insects and participate in the transmission of a series of signals in the insect olfactory system [67]. Moreover, olfactory proteins have been shown to act in insect nutrient uptake, life span, and behavior changes during developmental stages [67][68][69]. The developmental transcriptomes of B. horsfieldi provide an opportunity to understand the relationship between olfactory proteins and development. The evolution analysis of the OBPs of B. horsfieldi and 25 Coleoptera species and the CSPs of B. horsfieldi and 16 Coleoptera species showed that the OBPs of B. horsfieldi were quite similar to those of Anoplophora glabripennis and the CSPs of B. horsfieldi were quite similar to those of Monochamus alternates. Therefore, we speculated that these genes may have evolved from the same ancestral gene, but differentiated by adaptation to different types of environmental chemical factors during evolution, and perform the same or similar functions among different species [53]. Seven candidate OBPs and three candidate CSPs in male and female adults, showed significant differences in expression. To further explore the significant differences among these genes, a local BLAST was performed on the B. horsfieldi unigene database based on the known Combining RT-PCR and RT-qPCR data, most of the candidate olfactory genes were shown to have male-specific expression patterns in B. horsfieldi, suggesting that the olfactory system is highly developed in male and that olfactory detection plays a relatively important role in males. This result supports the existence of a contact sex pheromone that is produced by B. horsfieldi female, as previously shown on the molecular level [70]. Additionally, components of the female-produced sex pheromone have been identified in other longhorn beetles, such as Prionus californicus [71], Migdolus fryanus [72], Vesperus xatarti [73], and Ortholeptura valida [74].
BhorOBP1/3, BhorOBPC1/C3/C4, and BhorCSP2/3 showed olfactory-specific expression, suggesting that these candidate olfactory genes may play key roles in foraging and host-orientation in B. horsfieldi. A comprehensive, good-quality sequence resource from the developmental transcriptomes of B. horsfieldi larvae, pupae, female and male adults was constructed in this study. This resource enriches what is known about B. horsfieldi genomics, thus facilitating our understanding of metamorphosis, development, and fitness to environmental change. Several potential functional olfactory genes were identified. Future studies aimed at exploring the functions of these genes are the next logical step.