An Analysis of the Athetis lepigone Transcriptome from Four Developmental Stages

Athetis lepigone Möschler (Lepidoptera: Noctuidae) has recently become an important insect pest of maize (Zea mays) crops in China. In order to understand the characteristics of the different developmental stages of this pest, we used Illumina short-read sequences to perform de novo transcriptome assembly and gene expression analysis for egg, larva, pupa and adult developmental stages. We obtained 10.08 Gb of raw data from Illumina sequencing and recovered 81,356 unigenes longer than 100 bp through a de novo assembly. The total sequence length reached 49.75 Mb with 858 bp of N50 and an average unigene length of 612 bp. Annotation analysis of predicted proteins indicate that 33,736 unigenes (41.47% of total unigenes) are matches to genes in the Genbank Nr database. The unigene sequences were subjected to GO, COG and KEGG functional classification. A large number of differentially expressed genes were recovered by pairwise comparison of the four developmental stages. The most dramatic differences in gene expression were found in the transitions from one stage to another stage. Some of these differentially expressed genes are related to cuticle and wing formation as well as the growth and development. We identified more than 2,500 microsatellite markers that may be used for population studies of A. lepigone. This study lays the foundation for further research on population genetics and gene function analysis in A. lepigone.


Introduction
Athetis lepigone Möschler (Lepidoptera: Noctuidae) is found in many countries in Europe and Asia [1][2][3][4][5]. In 2005 it was first discovered that A. lepigone causes severe damage to maize crops in China [6], although the species had never before been documented as an agricultural pest. In subsequent years the range of occurrence has expanded and increasing crop damage has provoked interest in A. lepigone. In 2011, an A. lepigone outbreak occurred in Shandong, Shanxi, Henan, Anhui and Jiangsu provinces, over an area about 2.2 million ha. A. lepigone is now considered an important pest of maize in China.
In the Huang-Huai region of China A. lepigone can produce up to four generations a year and the larvae of the second generation cause the majority of the damage to the maize crop [7]. In maize fields larvae cluster together in straw next to maize seedlings and damage the seedlings in several different ways. Larvae drill into and eat young maize stems, resulting in wilt and death of the seedling. They also eat maize prop root, causing lodging and severe yield loss of maize crops [8].
A. lepigone is a generalist herbivore and has been observed eating 30 species of plants from 13 different plant families [8]. A. lepigone overwinters in the north of China, where cocooned instar larva can supercool to withstand temperatures reaching 225uC [9]. Both male and female adults can mate multiple times and females may produce up to 500 offspring at a time [10]. This species was previously known to exist in the region, but populations seem to have grown rapidly in recent years. A. lepigone populations may be benefitted by agricultural practices that return a large amount of straw to fields after the wheat harvest that is then planted with maize. This creates a suitable habitat and rich food source for A. lepigone. The availability of straw shelter appears to be the key factor influencing the outbreak of A. lepigone, rather than migration or diffusion from other regions [11].
As a newly discovered pest, very few genetic resources are available for A. lepigone. At publication time, there were only 17 COI haploid gene sequences of A. lepigone in the NCBI database, yet genetic information is urgently needed to study the molecular regulation mechanisms of A. lepigone and better understand how to control A. lepigone damage to maize crops.
In recent years, next-generation high-throughput DNA sequencing techniques have provided fascinating opportunities in the life sciences and dramatically improved the efficiency and costeffectiveness of gene discovery [12]. Next-gen sequencing generates a large amount of data and for that reason has been widely used in the study of humans [13], crops [14,15], agricultural pests [16] and other model organisms [17,18]. For example, Xue et al. sequenced the transcriptome of different development stages of the brown planthopper (BPH) Nilaparvata lugens, obtaining 85,526 unigenes, and found a large number of genes related to wing dimorphism and sex difference [19]. In another landmark study transcriptome sequencing was performed on 30 distinct developmental stages of Drosophila melanogaster [18]. This study identified 111,195 new functional elements, including coding and noncoding genes and splicing and editing events. In 2012 Chen et al. contrasted the transcriptomes of full-sister queen-larvae (QL) and worker-destined larvae (WL) using high-throughput RNA-Seq [20]. They found more than 4,500 genes with different expression between the two types larvae, of which more than 70% were upregulated in QL. They also confirmed that a higher juvenile hormone titer was necessary for QL development.
In this study we sequenced and analyzed the transcriptome of A. lepigone to understand differences in gene expression in different developmental stages and create a basis of molecular information for the development of molecular markers, identification of genes and the functional analysis of expressed genes.

Ethics Statement
Athetis lepigone Möschler (Lepidoptera: Noctuidae) is not an endangered or protected species and has recently become an important insect pest of maize (Zea mays) crops in China. A recent, serious outbreak of A. lepigone in Shijiazhuang, Hebei Province, provided ample opportunity for sample collection. There were no specific permits which were required for these locations/activities for collection.

Source of A. lepigone Material
The A. lepigone strain was collected from maize fields in Shijiazhuang, Hebei Province, China in July 2011 and kept indoors at a temperature of 2661uC with a 14:10 h light:dark photoperiod for eight generations. Samples used in this study were collected at four developmental stages of the ninth generation: 1) a mixture of three-day-old eggs; 2) a mixture of fifth instar larvae; 3) a mixture of three-day-old pupae and 4) a mixture of emergent male and female adults. Every sample was immediately frozen in liquid nitrogen and stored at 280uC until extraction.

RNA Isolation, cDNA Library Preparation and Illumina Sequencing for Transcriptome Analysis
Total RNA was extracted using the TRIzol Reagent (Invitrogen) according to the manufacturer's protocol. The mRNA was purified from total RNA using oligo (dT) magnetic beads and fragmented into small pieces in fragmentation buffer at 70uC for 4 min. First strand cDNA synthesis used SuperScript II and random primers and second strand cDNA synthesis used DNA polymerase I and RNaseH. The double-stranded cDNA was endrepaired using T4 DNA polymerase, Klenow Enzyme (NEB), and T4 polynucleotide kinase (NEB) followed by a single A base addition using Klenow exo-polymerase, then ligated with Adapter using DNA ligase (NEB). The products of the ligation reaction were amplified by PCR and purified using the QIAquick PCR Purification Kit (Qiagen) to create a cDNA library. The cDNA library was quantified with Qubit (Invitrogen) and a cluster of the DNA fragment was amplified using bridge PCR on the surface of a flow cell chip. When the single molecular DNA cluster was amplified many times, the products were sequenced on an Illumina GAII.

Assembly and Gene Identification
The reads obtained for each sample were assembled separately using Trinity software (http://trinityrnaseq.sourceforge.net/) with the K-mer = 25 [21]. Contig were obtained by extending based on the overlap between sequences. Next, the resultant contigs were joined into transcripts with the paired-end information. We selected the longest transcript from the potential alternative splicing transcripts as the sample unigene. The unigenes from four samples were combined to create a lifetime unigene database of A. lepigone by the BLAST-Like Alignment Tool (http://genome. ucsc.edu/cgi-bin/hgBlat). All raw transcriptome data have been deposited in the NIH Short Read Archive (SRA) with the accession number SRP019964.

Functional Annotation for Unigenes
Unigenes were aligned with the Nr, Nt, Swissprot, TrEMBL databases using BLAST with a cut-off e-value of 10 25 . Blast2GO was used to obtain Gene ontology (GO) annotation of the unigenes. The COG and KEGG pathway annotation were also performed using BLAST against Cluster of Orthologous Groups databases and Kyoto Encyclopedia of Genes and Genomes. All above searching were performed with a cut-off e-value of 10 25 .

Differential Gene Expression Analyses
In each sample the expression abundance of unigenes was obtained by mapping the reads from four samples against the reference set of A. lepigone lifetime unigenes using Blat software. The relative transcript abundance were output as RPKM values (Reads Per Kilobase of exon model per Million mapped reads) [22], which is calculated as

R~1 0 9 C NL
Where C is the number of mappable reads that fell onto the specific unigene, N is the total number of reads that could map to unigenes in that sample, and L is the length of the unigene. Differentially expressed genes were detected by IDEG6 software (http://telethon.bio.unipd.it/bioinfo/IDEG6/) with a general chi square test based on RPKM values. The test result was corrected by FDR (false discovery rate). Genes were regarded as differentially expressed when the FDR,0.01 and the absolute value of the log2 ratio .1 (the RPKM values of the gene in one sample was at least 2 times that of the gene in another sample).

Illumina Sequence Data and Assembly
The cDNA samples extracted from the eggs (SRX254872), larvae (SRX254873), pupae (SRX254874) and adults (SRX254875) of A. lepigone were sequenced using the Illumina sequencing platform and each sample generated more than 2 G of transcriptome data. The larvae sample produced the least amount of data (2265.42 Mb) and the pupae sample produced the most data (2857.80 Mb). The average quality value was $20 for more than 99.5% of the cycle (Table 1), suggesting the sequencing was highly accurate. Sample GC content was consistently about 48%. The number and total length of the transcripts and unigenes were different for each sample, but the distribution of lengths was consistent ( Table 2). Assembled reads from the four samples resulted in 81,356 unigenes with a total length of 49.75 Mb, a mean length of 611.5 bp and an N50 length of 858 bp. Out of these 81,356 unigenes, 12,070 unigenes were $1000 bp, accounting for 14.84% of the total.

SSR Development and Analysis
By using MISA, we identified 2,819 simple sequence repeats (SSRs) or microsatellites, distributed in 2,411 sequences, and accounting for 2.96% of the total number of unigenes. Trinucleotide repeats are the most common form of microsatellite present (13.66%), followed by dinucleotide repeats (6.63%) and compound nucleotide repeats (3.83%; Table 4). Among the trinucleotide repeats, the CGC/GCC/GAA/GGC/AAT/AAG/CAA/TGA types are the most common repeats in the recovered unigenes (39.48%). GC/TG/AT/TA repeats account for 54.9% of the total number of repeats in the unigenes and the (GC) n motif is the most frequently recovered dinucleotide repeat.
15,585 unigenes were annotated with 98,773 GO functions based on sequence similarity, with an average of 6.34 GO annotations per unigene (Table S2). The three main categories of GO annotations were 29,384 GO annotations (29.75%) for cellular component, 19,838 annotations (20.08%) for molecular function and 49,550 annotations (50.17%) for biological process. The main categories can be subdivided into 60 categories ( Figure 3, Table 5). In each of the three main categories, ''cell'', ''catalytic activity'' and ''metabolic process'' categories were the highest proportion of annotations, followed closely by ''cell part'', ''binding'' and ''cellular process''. In addition, there were 12 function subclasses with only a few genes (Table 5).
To identify the biological pathways that are active in A. lepigone, we mapped the unigene sequences to the reference canonical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG). In total, we assigned 11,824 sequences to 280 KEGG pathways (Table S3).

Differential Gene Expression at Four Developmental Stages
Differentially expressed genes at four developmental stages were identified in IDEG6. Significantly different expression levels were found in 6,202 genes in the egg and larva libraries (Figure 4). Among those genes, 1,620 were up-regulated in the larva stage and 556 genes were uniquely expressed in larva. 4,582 genes were down-regulated in the larva stage and 1,490 genes were uniquely expressed in the egg. The twenty genes exhibiting the strongest upregulated and down-regulated genes in the larva and egg unigene comparison are shown in Table S4. All ten genes up-regulated in the larvae stage have predicted functions, including three cuticular protein genes (putative cuticle protein, cuticular protein CPR2 and putative cuticle protein CPG36). Seven of the ten down-regulated genes have predicted functions, i.e., three cuticular protein genes (cuticular protein CPG8, cuticular protein RR-2 motif 101 and cuticular protein CPG7), a serine protease inhibitor gene (serine protease inhibitor 8) and a chondroitin gene (Chondroitin proteoglycan-2). According to the COG classification (Table S5), the genes that exhibit up-regulated expression in the larva stage mainly correlate to energy, material transport and metabolism, i.e., energy production and conversion, carbohydrate transport and metabolism, amino acid transport and metabolism, lipid transport and metabolism, inorganic ion transport and metabolism, secondary metabolites biosynthesis, transport and catabolism, and all genes related to RNA processing and modification, intracellular trafficking, secretion, and vesicular transport and chromatin structure and dynamics, were downregulated in larva.
In total, 5,609 genes demonstrate significant changes between the pupa and larva libraries (Figure 4). In the pupa library, 3,616 genes were up-regulated and 1,993 genes were down-regulated. 966 genes were uniquely expressed in the pupa while 754 genes were uniquely expressed in the larva. Nine of the ten most upregulated genes in the pupa library have predicted functions (Table S4), i.e., an apolipoprotein gene (32 kDa apolipoprotein), a neuropeptide related gene (neuropeptide-like precursor 4), a cuticular protein gene (cuticular protein hypothetical 12) and an enzyme gene (similar to Chymotrypsin-2). All ten of the most down-regulated genes have predicted functions, i.e., five cuticular protein genes (putative cuticle protein CPG36, Larval cuticle protein 16/17, cuticular protein CPR2, Larval cuticle protein 1 and putative cuticular protein) and three enzyme  genes (serine protease 28, lipase and trypsin T2a). Based on the COG functional classification (Table S5), up-regulated genes in the pupa were involved in RNA processing and modification, transcription, replication, recombination and repair, chromatin structure and dynamics, cell cycle control, cell division, chromosome partitioning, signal transduction mechanisms and cytoskeleton.
A comparative analysis between the adult and pupa libraries revealed 4,399 genes with significant expression changes ( Figure 4). Among these genes, 2,273 were up-regulated in the adult stage and 2,126 were down-regulated in the adult. 725 genes were uniquely expressed in the adult stage and 733 genes were uniquely expressed in the pupa. Nine of the ten most up-regulated genes in the adult library have predicted functions (Table S4), i.e., three yolk formation related genes (yolk polypeptide 2 and 2 vitellogenin), two flying related genes (troponin C type IIIa-like protein and flightin) and one antimicrobial peptide gene (Anionic antimicrobial peptide 2). Nine of the ten most down-regulated genes have predicted functions, i.e., an energy metabolism related gene (NADH dehydrogenase subunit 2) and two amino acid-rich protein genes (Cysteine and glycine-rich protein 1 and glycine/tyrosine-rich eggshell protein). In the COG classification (Table S5), the up-regulated genes in the adult stage are involved in translation, ribosomal structure and biogenesis, defense mechanisms, intracellular trafficking, secretion and vesicular transport and energy production and conversion. There were 3,254 genes with demonstrated significant changes between the adult and larva libraries (Figure 4). In the adult library 1,979 genes were up-regulated and 1,275 genes were downregulated. 576 genes were uniquely expressed in the adult and 505 genes were uniquely expressed in the larvae. The eight most upregulated genes in the adult library have the following predicted functions (Table S4), i.e., a cuticular protein gene (AF117586_1 putative cuticle protein), an egg formation gene (yolk polypeptide 2), a signal perception gene (chemosensory protein) and two enzyme genes (similar to cathepsin F like protease and beta-fructofuranosidase). Nine of the most down-regulated genes were annotated as an intestinal Table 4. Summary of simple sequence repeat (SSR) types in the A. lepigone transcriptome.     mucin gene (intestinal mucin SeM8), a molting hormone regulated protein (ecdysteroid-regulated protein), two cuticular protein genes (putative cuticle protein CPH37 and Larval cuticle protein 16/17) and two serine protease genes (serine protease 11 and serine protease 28).
According to the COG classification (Table S5), up-regulated genes in the adult stage compared to larvae were mainly involved in translation, ribosomal structure and biogenesis, transcription, replication, recombination and repair, cell cycle control, cell division, chromosome partitioning, signal transduction mechanisms, intracellular trafficking, secretion and vesicular transport. Comparison of the adult and egg libraries revealed 3,081 genes with differential expression, including 1,310 up-regulated genes and 1,771 down-regulated genes in the adult stage ( Figure 4). 504 genes were uniquely expressed in the adult stage and 505 genes were uniquely expressed in the egg stage. The nine most upregulated genes in the adult library were annotated (Table S4), i.e., four egg formation genes (yolk polypeptide 2 and three vitellogenin ) and a cuticle protein gene (putative cuticle protein). The nine most downregulated genes included four cuticular protein genes (cuticular protein CPG8, cuticular protein RR-1 motif 44, cuticular protein RR-2 motif 101 and cuticular protein CPG7) and an enzyme inhibitor gene (serine protease inhibitor 8). In the COG classification (Table S5) the upregulated genes in the adult stage were involved in posttranslational modification, protein turnover, chaperones, energy production and conversion, nucleotide transport and metabolism and inorganic ion transport and metabolism. There were 4,880 genes with demonstrated significant changes between the pupa and egg libraries (Figure 4). Among these genes, 1,958 were up-regulated and 2,922 were down-regulated in the pupa stage. 1,012 genes were uniquely expressed in the pupa and 826 genes were uniquely expressed in eggs. All ten of the most upregulated genes in the pupa library have predicted functions (Table S4), i.e., two tonB genes (two protein tonB, putative), an enzyme gene (similar to Chymotrypsin-2) and a beta-tubulin gene. Eight of the ten most down-regulated genes were annotated, including four cuticular protein genes (cuticular protein CPG8, cuticular protein RR-1 motif 44, cuticular protein RR-2 motif 101, cuticular protein CPG7). According to the COG classification genes up-regulated in the pupa stage compared to the egg were less identifiable to function than genes down-regulated in the pupa stage, except for cell wall, membrane and envelope biogenesis.

Discussion
High throughput transcriptome sequencing technology is a relatively new experimental technology that is continuously undergoing improvements. Next-gen sequencing has become an indispensable tool for genomics studies and has been widely used in a variety of important biological research. We compared the transcriptomes of four developmental stages of the maize pest A. lepigone to provide a framework for understanding changes in gene expression during development. We assembled a total of 81,356 unigenes with an average length of 612 bp and 12,070 unigenes longer than 1000 bp. The Illumina sequencing depth and assembly efficiency in this study are improvements over previous reports [16,19].
The generated 81,356 unigenes were searched against the nonredundant (nr) NCBI protein database using BLASTX. A total of 33,736 unigenes were returned. The sequence matching results showed that A. lepigone shared the highest similarity (20.23% of the annotated unigenes) with the red flour beetle T. castaneum (Coleoptera) rather than the silkworm B. mori (9.62%) of the same order. The same pattern was found in a transcriptome study of the lepidopteran diamondback moth Plutella xylostella [23]. In addition, a similar pattern was also found in the transcriptome study of the back plant hopper and the white back plant hopper (WBPH) [16,19]. These species exhibited a similarity of 18.89% and 16.17% matches with T. castaneum and 13.19% and 12.49% with the pea aphid Acyrthosiphon pisum of the same order, respectively. These results are likely due to the availability of more sequence resources of T. castaneum compared to B. mori and A. pisum in the  NCBI protein database, as the numbers of protein sequences of T. castaneum, B. mori and A. pisum in NCBI database were 27316, 7848 and 20463, respectively. Fast-evolving molecular markers are important tools for the study of population genetics and we identified 2,819 microsatellites in the transcriptome of A. lepigone. Our results indicate that the most common trinucleotide repeats in A. lepigone are similar to Sogatella furcifera and Laodelphax striatellus, in that they have a frequent motif of (AAG) n [16,24]. The most common dinucleotide repeats GC/TG/AT/TA that we found in A. lepigone have not been described in other reports. For example, AG/GA/CT/TC motif repeats are more frequently found in S. furcifera [16] and there is no (GC) n motif found in L. striatellus [24]. Our results indicate that the dinucleotide SSRs recovered here are distinctive to A. lepigone, although data on dinucleotide SSR variability is still limited for most insect orders. The (GA) n motif is known to regulate gene expression in animals and plants [25][26][27], but only a few (GA) n motifs exist in A. lepigone. Further study is needed to test whether these (GA) n motif repeats retain a regulatory function in A. lepigone or if the function of (GA) n has been replaced by other motif repeats. Moreover, the large numbers of potential molecular markers found in our study will be particularly useful for future genetic mapping, parentage analysis, genotyping and gene flow studies of this species.
Many differentially expressed genes were obtained by making pair-wise comparisons of the transcriptomes of four developmental stages. During the development of A. lepigone most of the differentially expressed genes were down-regulated in the larva stage compared to egg stage, and most of genes were up-regulated in the pupa stage compared to larva stage. The genes with the strongest difference in expression correlate with some of the characteristic features in the various development stages, for example, egg formation and flight protein related genes are highly expressed in the adult stage and correspond with some of the key life activities of the adult.
Giving insights into ten of the most differentially up-regulated and ten of the most differentially down-regulated genes, most of these genes had predicted functions. Cuticular protein genes existed in the twenty most differentially expressed genes in the comparison between each library, except between the adult and pupa stages. This indicates that insect cuticular protein compo-sition is expressed differently in different developmental stages. Insect cuticle is a compound of cuticular protein and chitin and not only supports and maintains the physical structure of the organism, but also serves as a natural barrier against external adverse impacts [28]. A change of cuticular protein determines the cuticle composition and performance and is the basis for explaining the performance of cuticle-based structures [29]. This result provides useful information for us to conduct further studies on the cuticular protein and cuticle.
Our study constructed four transcriptome libraries and compared the gene expression abundance of A. lepigone among different developmental stage. The annotated unigenes and unigene expression abundance provide useful information for the identification of genes involved in A. lepigone development.