A De Novo Transcriptome and Valid Reference Genes for Quantitative Real-Time PCR in Colaphellus bowringi

Background The cabbage beetle Colaphellus bowringi Baly is a serious insect pest of crucifers and undergoes reproductive diapause in soil. An understanding of the molecular mechanisms of diapause regulation, insecticide resistance, and other physiological processes is helpful for developing new management strategies for this beetle. However, the lack of genomic information and valid reference genes limits knowledge on the molecular bases of these physiological processes in this species. Results Using Illumina sequencing, we obtained more than 57 million sequence reads derived from C. bowringi, which were assembled into 39,390 unique sequences. A Clusters of Orthologous Groups classification was obtained for 9,048 of these sequences, covering 25 categories, and 16,951 were assigned to 255 Kyoto Encyclopedia of Genes and Genomes pathways. Eleven candidate reference gene sequences from the transcriptome were then identified through reverse transcriptase polymerase chain reaction. Among these candidate genes, EF1α, ACT1, and RPL19 proved to be the most stable reference genes for different reverse transcriptase quantitative polymerase chain reaction experiments in C. bowringi. Conversely, aTUB and GAPDH were the least stable reference genes. Conclusion The abundant putative C. bowringi transcript sequences reported enrich the genomic resources of this beetle. Importantly, the larger number of gene sequences and valid reference genes provide a valuable platform for future gene expression studies, especially with regard to exploring the molecular mechanisms of different physiological processes in this species.


Introduction
Insects in a diapause status may have a stronger resistance to environmental stresses, e.g., seasonal adverse conditions and pesticides [1,2]. The cabbage beetle Colaphellus bowringi Baly is a serious insect pest of crucifers in mountain areas in China [3]. This beetle can survive during seasonal adverse conditions under a reproductive diapause status in soil [4,5]. Moreover, the application of pesticides is not always effective in its management [6,7]. It is well known that diapause, insecticide resistance and other physiological processes are regulated at the molecular level. However, a lack of genomic information and valid reference genes limits our understanding of the molecular bases of these physiological processes in efforts to develop new management strategies for this species.
In non-model organisms without reference genomes, abundant gene sequences can be obtained using Illumina, 454 pyrosequencing and other next-generation sequencing technologies [8][9][10][11]. Based on the identification of gene sequences, gene expression analyses performed using reverse transcriptase quantitative polymerase chain reaction (qRT-PCR) can provide new insight into complex biological process, such as in Aedes albopictus [12] and Nilaparvata lugens [13]. For example, a de novo analysis of the N. lugens antenna transcriptome was implemented via Illumina sequencing, and the expression of olfactory genes sequences obtained from the antenna transcriptome were analyzed by qRT-PCR [13]. Hence, prior to investigating molecular mechanisms related to diapause regulation and insecticide resistance in C. bowringi, abundant gene sequences from a de novo transcriptome can be obtained through Illumina sequencing.
In qRT-PCR analysis, the expression of a reference gene has been used to normalize target genes mRNA levels from the same sample [14][15][16]. The most common reference genes, such as ribosomal proteins (RPs), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), and actin (ACT) [17][18][19], have been used without verification for some time [20][21][22]. However, the stability of these traditional reference genes varies among different insect species and experimental conditions and no reference gene can be universally used for all experimental conditions [17,23,24]. Recently, many genes, e.g., GAPDH, RPs, ACT, Elongation factor-1 α (EF1α), TATA-Box binding protein (TBP), α-tubulin (αTUB), and β-tubulin (βTUB), have been widely screened as valid reference genes in insect species. These genes represent different functional classes and gene families and participate in basic cellular processes [23,25,26]. In addition, four algorithms, geNorm, NormFinder, BestKeeper and RefFinder, have been commonly used to screen for valid reference genes. geNorm is an program utilized to calculate the expression stability (M) of candidate reference genes [14], and NormFinder is an algorithm used for the calculation of reference genes in a set of candidates [16]. BestKeeper is a Microsoft Excel-based tool that uses pair-wise correlation [27]. RefFinder can assign an appropriate weight to an individual gene and calculate the geometric mean of the weights to attain the overall final ranking [28]. Thus, according to these candidate reference genes and algorithms, valid reference genes for C. bowringi can be selected under different conditions for further gene expression by using qRT-PCR.
In the present study, Illumina sequencing technology was used to generate a substantial dataset of transcript reads of the C. bowringi transcriptome. Based on the transcriptome database, 11 candidate reference genes were verified by reverse transcriptase polymerase chain reaction (RT-PCR) and validated in different developmental stages, tissues, sexes, strains, and photoperiods to obtain the most stable reference genes for different qRT-PCR analyses in C. bowringi. The results of this study not only generate substantial sequence information but also provide valid reference genes for the analysis of gene expression. These results provide a valuable platform for future gene expression research in this species, especially for exploring the molecular mechanisms of different physiological processes, including diapause and insecticide resistance.

Ethics statement
The beetles used in this study were collected from a natural population from the field of Xiushui County (29°1 0 N, 114°4 0 E), Jiangxi Province, China. The field studies did not involve endangered or protected species, and no specific permissions were required for these research activities in these locations. A specimen of this beetle was deposited at the Museum of Huazhong Agricultural University.

Experimental insects
A natural population of cabbage beetles collected in late November 2008 was maintained inlaboratory. Post-diapause adults emerging from the soil in early March 2012 (for transcriptome analysis) and October 2013 (for reference gene analysis) were reared in plastic containers (7.5 cm×7.5 cm×6 cm). The offspring of the second generation reared under conditions of 25°C and a 12:12 h light:dark photoperiod (L:D) were regarded as the high-diapause strain (HD strain) in this study. The individuals of HD strain reared at 25°C and 16:8 L:D were diapause-destined individuals while at 25°C and 12:12 L:D were non-diapause-destined individuals. A non-diapause strain (ND strain), which was not sensitive to photoperiod of diapause induction at 25°C, was established in our laboratory [29] and also used in this study. The individuals of the ND strain reared both under 16:8 L:D and 12:12 L:D at 25°C were non-diapausedestined individuals. All insects in this study were reared in illuminated incubators (SPX-250IC, Boxun Medical Instruments, Shanghai, China). The temperature was approximately 25 ± 1°C, with relative humidity at 70 ± 10%. The photoperiod was automatically controlled, with a light intensity of approximately 2.0 W/m 2 during the photophase.

Sample collection for transcriptome analysis
For the transcriptome analysis, 4-day-old larvae, fresh pupae, newly emerged adults without feeding, adults feeding for 2 days, and adults feeding for 5 days of four different groups were collected from the HD strain (12:12 L:D and 16:8 L:D) and ND strain (12:12 L:D and 16:8 L:D), which were reared at 25°C. The samples from each stage contained 5-10 individuals. The twenty stage samples were collected in 1.5-mL microcentrifuge tubes and immediately frozen in liquid nitrogen and stored at -80°C.

Illumina sequencing and sequence assembly
Total RNA was isolated using an SV total RNA isolation system (Promega), according to the manufacturer's protocol. To obtain complete gene expression information, equal amounts RNA of 20 stage samples were pooled into one sample and then used for the transcriptome analysis. Poly (A) + RNA was purified using oligo [25] magnetic beads and then fragmented into short sequences at 94°C for 5 min. The cleaved poly (A) + RNA was reverse-transcribed, followed by the synthesis of second-strand cDNA. After end repair and the ligation of adaptors, the products were amplified by PCR and purified using a QIAquick PCR Purification Kit (Qiagen). The cDNA library was sequenced using the Illumina sequencing platform Hiseq 2000 (Beijing Genomics Institution, Shenzhen, China). The raw reads from the images were generated using the Solexa GA pipeline 1.6.

Unigene annotation and classification
High-quality reads were assembled into contigs using Trinity software [30]. All unigenes were used as queries to search a local protein database containing all of the protein sequences of the nr database with the BLASTX algorithm. The Clusters of Orthologous Groups (COG) classification was analyzed using annotated unigenes to search the COG database with a 10 -3 cutoff. For a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis [31], all of the unigenes were used to search the KEGG database using the default parameters. For a comparative genomics analysis, the annotated contigs were compared using BLASTP with known genes in three insect species, namely, Drosophila melanogaster, Anopheles gambiae, and Tribolium castaneum.

Sample collection for reference gene analysis
Samples of different developmental stages, sexes, strains, photoperiods, and 24-h photoperiod of 4-day-old larvae were collected according to the method described in the sample collection for transcriptome analysis. According to the manufacturer's protocol, tissue samples were collected and kept in RNAsafer (Omega Bio-tek, D13MG) to avoid RNA degradation after dissection. The experimental design and sample collection for the reference gene analysis is given in Table 1. The samples from different experimental conditions contained 5-10 individuals, and the samples for the different tissues contained 50 individuals. Each sample was prepared in three independent biological replicates. A dissection needle, Vannas scissors (54138B, 66vision Tech, Suzhou, China), and stereo-microscope (SMZ-t4, Chong Qing Optec Instrument, Chongqing, China) were used to collect the tissue samples.

Reference gene selection and primer design
Eleven reported reference genes, GAPDH, RPL32e, RPL19, EF1α, TBP, TBP1, ACT1, ACT2, αTUB, αTUB1, and βTUBC, were selected for evaluation in C. bowringi. These candidate reference genes represent different functional classes and gene families and have been widely used in qRT-PCR analysis in insects [18,19,23,26]. All the gene sequences were obtained by screening the transcriptome data of C. bowringi. The gene sequences were deposited in GenBank (the accession numbers are listed in S1 Table). All gene-specific primers were designed using the NCBI Primer-BLAST primer designing tool (http://www.ncbi.nlm.nih.gov/tools/primer-blast/ index.cgi?LINK_LOC=BlastHome) (S1 Table). Table 1. Experimental design and sample collection for reference genes analysis in C. bowringi.

Statistical analysis of reference gene selection
A data analysis of the quantitative real-time PCR was carried out using Bio-Rad iQ5 Optical System software (version 2.1.94.617) (Bio-Rad Laboratories, Hercules, CA, USA). The fluorescent signal of the threshold cycle (Ct value) is the initial marker of a significant difference from the background. All the biological replicates were used to calculate the average Ct value. The stability of 11 candidate reference genes was comprehensively evaluated using the algorithms geNorm version 3.5 (http://medgen.ugent.be/~jvdesomp/genorm/) [14], NormFinder version 0.953 (http://www.mdl.dk/publications normfinder.htm) [16], and BestKeeper (http://www. wzw.tum.de/gene-quantification/bestkeeper.html) [27]. Lastly, the tested candidates were compared and ranked with a web-based analysis tool, RefFinder (http://www.leonxie.com/ referencegene.php) [28]. GeNorm calculated the expression stability value (M) of each gene and selected two most stable genes. M value below 1.5 indicates that the gene can be used as a reference gene and the lower M value means the more stable of the gene. Pairwise comparison (Vn/Vn+1) value was used to determine the optimal number of reference genes and pairwise comparison values below 0.15 indicate that an additional reference gene will not be required for accurate normalization. NormFinder calculated the genes Stability value (SV) and ranked the genes. The SV value is lower, the gene is more stable and one most stable reference gene can be confirmed. BestKeeper determines the genes expression stability through calculating the geometric mean (GM) and standard deviation (SD) based on all genes Ct values. SD value below 1 indicates that the gene can be used as a reference gene and the lower SD value means the more stable of the gene. Finally, RefFinder made a combination of these methods results and recommended the most suitable reference genes in different conditions.

Illumina sequencing and de novo assembly
Equal amounts of RNA from 20 samples (samples of four different groups of C. bowringi described in the Materials and Methods section) were pooled for the transcriptome analysis using the Illumina sequencing platform to obtain deep coverage of the transcriptome. After quality checks, 54 million 90-bp-long reads were obtained and used for assembly with Trinity software; a total of 39,390 unigenes was obtained ( Table 2). The length of 21,398 unigenes (54.32%) ranged from 100 to 500 bp, and the length of 8,658 unigenes (21.98%) ranged from 501 to 1000 bp; 5,333 unigenes (13.54%) had a length of more than 1500 bp. The mean unigene length was 792 bp (S1 Fig).

Annotation of predicted proteins
Of the C. bowringi transcripts, 24,070 (61.1%) showed significant similarity (E-value < 1e-5) to transcripts of known proteins in the NCBI database ( Table 3). The majority of the transcripts (55%) matched to arthropod proteins, and the remainder was similar to non-insect eukaryotes proteins (5%) and bacterial proteins (0.6%). A total of 19 sequences were similar to the sequences of viral proteins, and non-insect eukaryotic proteins (Table 3).

Comparative analysis of the transcripts
The derived C. bowringi transcripts were compared with protein sequences in the existing genomes of D. melanogaster, A. gambiae, and T. castaneum using the BLASTX algorithm. A high sequence similarity (57%, 22,342 of 39,390) between C. bowringi transcripts and the T. castaneum genome was observed (Fig. 1). The similarity between C. bowringi transcripts and the D. melanogaster and A. gambiae genomes was 33% and 34%, respectively. These four insect species shared 11,419 sequences. Approximately 42% of the C. bowringi sequences (16,662 out of 39,390) did not show BLASTX similarity, which suggest that they represent non-coding RNAs, un-translated regions, non-conserved regions, or novel proteins of C. bowringi (Fig. 1).

COG and KEGG classifications
A total of 24,070 transcripts with matching records were obtained, and a COG classification was obtained for 9048 sequences (Fig. 2). Of the 25 COG categories, a general function prediction was identified as the largest cluster group (3317, 37%), followed by cluster groups of replication, recombination, and repair (1792, 20%) and translation, ribosomal structure, and biosynthesis (1408, 16%). Extracellular structures (20, 0.00221%) and nuclear structure (9, 0.009947%) accounted for the smallest number of sequences (Fig. 2). For a biological pathways analysis in C. bowringi, the 24,070 annotated sequences were mapped to the canonical reference pathways in KEGG. In total, 16,951 sequences were assigned to 255 KEGG pathways (S2 Table). Of these sequences, 2480 belonged to metabolic pathways, 609 to purine metabolism pathways, and 582 to pathways regulating the actin cytoskeleton.

Identification of candidate reference genes
The sequences of the 11 candidate reference genes were screened from the C. bowringi transcriptome. The sequence accuracy of the 11 candidate reference genes was confirmed by RT-PCR, and the target RT-PCR products of the genes were sequenced again by the experts from Genscript Company (Nanjing, China). The similarity between these sequences from the transcriptome and the data obtained from Genscript was above 99.9%. The complete open reading frames (ORFs) obtained, including the partial sequences of the RPL19 ORF, were submitted to the GenBank database (S1 Table).

Expression profiles of candidate reference genes
For each reference gene, the primer sets amplified a unique PCR product with a single-peak dissociation curve, and all the products ranged from 80 to 137 bp. The melting temperature (Tm) of all primers for the PCR products was about 60°C. The amplification efficiency (E) of the PCR reactions ranged from 100.0% for RPL32e to 108.2% for TBP1, and the correlation coefficients (R 2 ) ranged between 0.996 and 0.998, respectively (S3 Table). The expression levels of these genes were analyzed by qRT-PCR, and the Ct values showed different levels of transcript abundance. The mean Ct values of the 11 reference genes ranged from 16.44 to 25.08 cycles. Judging from the variations in Ct values, ACT1 exhibited the lowest variation in expression (below two cycles), followed by EF1α, GAPDH, and ACT2, whereas αTUB showed the highest variation (Fig. 3).
Analysis of gene expression stability geNorm analysis. Based on the average M values, the most stable reference genes for developmental stages were RPL32e and RPL19; αTUB showed an M value higher than 1.0, indicating that its expression level was the most variable (Fig. 4A). With regard to different tissues, RPL32e, RPL19, and EF1α were found to be the most appropriate reference genes (Fig. 4B). RPL19 and EF1α were the most stable genes for different sexes (Fig. 4C). For different strains, the M value of all the candidate reference genes was below 0.5, and EF1α and βTUBC were the most invariable (Fig. 4D). Regarding different photoperiods, the most stable reference genes were TBP and TBP1 (Fig. 4E). Additionally, RPL32e and RPL19 were the most stable genes with the lowest M values for the 24-h photoperiod of 4-day-old larvae, and the M value of EF1α was the highest (Fig. 4F).
To determine the optimal number of reference genes required for accurate normalization, the pairwise variation (Vn/Vn+1) was calculated using geNorm. In terms of developmental stage, tissue, and sex, all the Vn/Vn+1 values, except for the V10/11 value, were below 0.15. In addition, regarding strain and photoperiod treatment, all the Vn/Vn+1 values were below 0.15 (Fig. 5).
NormFinder analysis. The results indicated that the ranking of the NormFinder reference gene stability was marginally different from that of geNorm (Table 4). For developmental stage, the most stable genes were EF1α and βTUBC, whereas the most unstable gene was αTUB. With regard to stability, the selected reference genes in different tissues ranked as follows: ACT1, ACT2, αTUB1, EF1α, RPL32e, TBP1, βTUBC, RPL19, TBP, GAPDH, and αTUB. For sex, the most stable genes were RPL19 and EF1α, sharing the same stability values. ACT1, βTUBC, and ACT2 were the most stable genes for different strains, with all the values of reference genes below 0.5. The most stable genes for photoperiod were RPL19 and αTUB1. For the 24-h photoperiod of 4-day-old larvae, the stability values of the 11 reference genes were below 0.2, with little variation. With the exception of the treatment of 24-h photoperiod of 4-day-old larvae, the least stable gene was the same, namely, αTUB.
The ranking of candidate reference genes for combinations of the samples was then analyzed. For different photoperiods combined with tissues, the best combination of two genes was TBP1 and αTUB1, the most stable gene was ACT1, and the most unstable gene was αTUB. According to the NormFinder analysis of the combination of photoperiod with strain, the best combination of two genes was EF1α and βTUBC, the most stable genes were βTUBC, and the most unstable gene was αTUB. In the two treatments of photoperiod and 24-h photoperiod of 4-day-old larvae, the best combination of two genes was RPL19 and αTUB1, the most stable gene was RPL19, and the most unstable gene was αTUB (S4 Table). The results of the NormFinder analysis indicated that the most stable genes for single factors and double factors may be different due to the expression levels of different genes.

Discussion
In this study, we established a valuable platform for gene expression analysis in C. bowringi. A substantial dataset of transcript reads of the C. bowringi transcriptome was obtained using Illumina sequencing technology. The sequences of 11 candidate reference genes from this transcriptome database were confirmed by RT-PCR. This study revealed that RPL19, ACT1, and EF1a were the most stable reference genes, whereas aTUB and GAPDH were the least stable reference genes for different qRT-PCR analyses in C. bowringi.
Transcriptome sequencing provides high-quality gene sequence information for C. bowringi Abundant gene sequences can be obtained using a powerful tool: next-generation sequencing technology. In our study, Illumina sequencing was applied to C. bowringi, and 83,809 contigs with a mean size of 369 bp and 39,390 unigenes with a mean size of 792 bp were obtained. Both the quantity of numbers and sizes of contigs and unigenes obtained, indicated higher abundance of our transcriptome compared with the results of another study on diapause mechanisms in A. albopictus that used a similar sequencing technology [33]. This abundant database will provide a basis for gene expression analysis and greatly improve the study of molecular mechanisms of different physiological processes including diapause and insecticide resistance in C. bowringi.
In general, gene function analysis is necessary for investigating the molecular mechanisms of diapause regulation and insecticide resistance in C. bowringi. In our study, 24,539 unigenes were further annotated with regard to gene function: 9,048 (36.9%) sequences had a COG classification, covering 25 categories, and 16,951 (69.1%) sequences were assigned to 255 KEGG pathways. Comprehensive gene function classifications for C. bowringi were similar to the results of studies in N. lugens [34] and Rhyacionia leptotubula [35]. This copious biological information is helpful for understanding the fundamental molecular pathways of this beetle.
EF1α, ACT1, and RPL19 are the most stable reference genes for different qRT-PCR analyses in C. bowringi qRT-PCR, with the advantages of high throughput, sensitivity, and accuracy, has been frequently used to study the molecular mechanisms of many physiological processes, such as diapause [36,37]. However, before a reference gene is used for qRT-PCR analyses, the stability of the gene should be evaluated under various conditions [20][21][22]. One major conclusion of our study was that appropriate reference genes (RPL19, ACT1, and EF1a) were found for several different experimental conditions and that aTUB and GAPDH as traditional reference genes were the least stable.
In detail, the results indicated that RPL19, ACT1, and EF1a showed steady expression in C. bowringi under several experimental conditions. RPL19 had the most stable gene expression in the different sexes and photoperiod treatments. Various ribosomal proteins have also been reported to have stable expression in different developmental stages in other species, such as in L. decemlineata [26], B. tabaci [18], B. mori [24], and Spodoptera exigua [24]. Thus, RPL19 could be a candidate reference gene with regard to biotic conditions in qRT-PCR analysis. In addition, ACT1 was the most stable gene in different tissues and strains. In fact, ACT has been deemed as the most stable reference gene in many species, such as in Folsomia candida and Orchesella cincta [38], Schistocerca gregaria [39], D. melanogaster [40], and Rhodnius prolixus [41]. However, for B. tabaci, ACT was found to be unstable and was disqualified as a suitable reference gene for different tissues and hosts [18]. Thus, caution should be exercised when utilizing ACT because the protein plays key roles in different important cellular processes [26]. The current data showed that EF1α exhibited stability in different developmental stages and sexes in C. bowringi. EF1a has rarely been used as a normalizer but has recently been selected as a suitable reference gene in a number of species [17,19,42]. EF1α was the most appropriate reference gene in another coleopteran Agrilus planipennis [43]. Taken together, although RPL19, ACT1, and EF1a showed steady expression under certain experimental conditions, these genes did not present high stability under other conditions and in other insects.
In contrast, our results showed that aTUB and GAPDH were the least stable reference genes in C. bowringi. Although aTUB1 was the most stable gene in different photoperiods and βTUBC was the second most stable gene in developmental stages and strains, aTUB was the least stable gene under all the experimental conditions (Figs. 4 and 6). Nonethelss, in Bactrocera dorsalis, aTUB was found to be much more stable than βTUB in all tissues and was one of the most stable genes in all samples [17]. GAPDH is an enzyme that plays an important role in energy metabolism and is often used as an endogenous control for normalization [19]. However, in the current study, GAPDH clearly showed reduced stability compared with other reference genes. Similarly, several previous studies have demonstrated that GAPDH has low expression stability under certain conditions, such as in the midgut, Malpighian tubules, and fat body of the oriental fruit fly B. dorsalis [17] and in B. tabaci under different biotic and abiotic conditions [18]. In contrast, GAPDH was identified as the most stable gene in the head of honeybees after bacterial challenge [44] and in different developmental stages and temperature-stressed larvae of S. litura [19]. Thus, the commonly used reference genes aTUB and GAPDH revealed unstable expression in C. bowringi and different expression stability in other insects. It suggests that the expression stability of aTUB and GAPDH should be evaluated before they are used as reference genes in the future.

Conclusions
In this study, a comprehensive transcriptome of C. bowringi was generated using Illumina sequencing. Altogether, 83,809 contigs with a mean size of 369 bp and 39,390 unigenes with a mean size of 792 bp were obtained. After comprehensive analysis, EF1α, ACT1, and RPL19 were found to be the most stable reference genes for different qRT-PCR analyses in C. bowringi (Table 5), whereas aTUB and GAPDH showed relatively low expression stability. This work established a platform for future gene expression studies in the cabbage beetle. The results could both benefit the investigation of gene function related to diapause regulation and insecticide resistance and also enrich organism genome resources and provide information on reference gene selection.