De novo characterization of the pine aphid Cinara pinitabulaeformis Zhang et Zhang transcriptome and analysis of genes relevant to pesticides

The pine aphid Cinara pinitabulaeformis Zhang et Zhang is the main pine pest in China, it causes pine needles to produce dense dew (honeydew) which can lead to sooty mold (black filamentous saprophytic ascomycetes). Although common chemical and physical strategies are used to prevent the disease caused by C. pinitabulaeformis Zhang et Zhang, new strategies based on biological and/or genetic approaches are promising to control and eradicate the disease. However, there is no information about genomics, proteomics or transcriptomics to allow the design of new control strategies for this pine aphid. We used next generation sequencing technology to sequence the transcriptome of C. pinitabulaeformis Zhang et Zhang and built a transcriptome database. We identified 80,259 unigenes assigned for Gene Ontology (GO) terms and information for a total of 11,609 classified unigenes was obtained in the Clusters of Orthologous Groups (COGs). A total of 10,806 annotated unigenes were analyzed to identify the represented biological pathways, among them 8,845 unigenes matched with 228 KEGG pathways. In addition, our data describe propagative viruses, nutrition-related genes, detoxification related molecules, olfactory related receptors, stressed-related protein, putative insecticide resistance genes and possible insecticide targets. Moreover, this study provides valuable information about putative insecticide resistance related genes and for the design of new genetic/biological based strategies to manage and control C. pinitabulaeformis Zhang et Zhang populations.

Introduction Sooty mold is a devastating disease in pine trees [1] excreted by the sucking aphid C. pinitabulaeformis Zhang et Zhang, together with direct damage that aphids produce to pine needles results in economical loss in Asia. In China, C. pinitabulaeformis Zhang et Zhang is geographical distributed in several provinces including Liaoning, Beijing, and Shandong; high population levels of this aphid results in broad damage to forest resources, landscapes and has an impact in the ecological environment of China. Aphids, both nymphs and adults, feed exclusively on plant phloem sap [2] by inserting their mouthparts and then inject saliva that might be phytotoxic [1]. In addition, aphids can transmit several viruses: 275 out of 600 (nearly 50% of insect-borne viruses) which could cause many harmful diseases [3,4]. Therefore, effective control strategies for C. pinitabulaeformis Zhang et Zhang are critical steps to protect the pine trees, playing an important role in treatment and prophylaxis of pine diseases.
At present, the primary strategies to control C. pinitabulaeformis Zhang et Zhang include: physical control (burning of branches and leaves that contain eggs), chemical control (insecticides), biological control (natural enemies, such as Coccinella septempunctata [5], Adalia bipunctata [6], Hyperaspis repensis [7], Hippodamia variegate [8] have shown high effectiveness). Among these, biological control has unique advantages: (1) no (or low) resistance to biological agents, (2) no environmental risk and (3) through genetic modification natural enemies can be used to generate strongly pathogenic strains to overcome pine pests. However, biological control can be expensive and it can be a high risk strategy when introducing higher numbers of other insect populations; moreover, natural enemies may compete and produce toxic substances to inhibit other natural enemies [4]. In the past decade, genetic modification tools offer a new insight in pest control through genetic modification (transgenesis). However, there is currently lacking of knowledge regarding C. pinitabulaeformis Zhang et Zhang, gene function and gene expression in this insect.
We used next generation sequencing technology to sequence the transcriptome of C. pinitabulaeformis Zhang et Zhang and successfully built a transcript database. In addition, our data describes putative insecticide resistance genes, olfactory related receptors, stressed-related protein, detoxification related molecules, possible insecticide targets and propagative viruses. This study provides basic valuable information that can be used to develop new genetic based strategies and novel molecular tools to control C. pinitabulaeformis Zhang et Zhang.

COG classification
A total of 11,609 classified unigenes were identified using the COG database (Fig 2) under 26 functional categories. General function prediction (16.06%) was the largest group, followed by posttranslational modification, protein turnover and chaperones (11.80%), signal transduction mechanisms (11.12%), translation, ribosomal structure and biogenesis (10.29%), energy production and conversion (5.87%) and transcription (5.73%). In addition, secondary metabolites biosynthesis, transport and catabolism represented 2.53%, given the importance of secondary metabolic activity for resistance of insects. COG classification can further reveal the biological function and an insight into chemical reactions in molecular processes in C. pinitabulaeformis Zhang et Zhang.

KEGG analysis
A total of 10,806 annotated unigenes were analyzed to identify the represented biological pathways in C. pinitabulaeformis Zhang et Zhang. Briefly,8,845 unigenes matched with 228 KEGG pathways, summarized in S2 Table. Signal transduction in environmental information processing (1,136 members) was the most highly represented pathway, followed by translation and folding (1022 members), sorting and degradation in genetic information processing (686 members), endocrine system in organismal systems (683), transport and catabolism (642), carbohydrate metabolism (637) and lipid metabolism (439), immune system (408), amino acid metabolism (383), and digestive system (377). KEGG identified pathways can provide new insight in the study of insect biology and contribute to the prediction of higher-level complexity of cellular processes and organism behavior from genomic information.

Identified viruses
Viruses can be an important tool to control aphid pests, however, it is important to identify the virus transmitted by C. pinitabulaeformis Zhang et Zhang. For example, Bonning et al., used coat protein from luteovirus to deliver insect neurotoxins to create an effective oral toxin, which caused aphid dead after they eat the transgenic plant that contains the protein-toxin fusion [10]. There is a great potential to take advantage of the viruses that are present in C. pinitabulaeformis Zhang et Zhang. Among the identified virus sequences are reovirus, geminivirus, tymovirus and occlusion-derived virus (S3 Table). Aphid transmission of these viruses may cause plant weakness, yellowish, slow growing and/or death. For example, geminivirus can reprogram plant cell cycle and transcriptional control, by interfering with cell signaling, protein turnover, suppressing immune pathways, and inhibit cell death [11]. As result, transmission of these viruses can have economic impact in crops [12]. Although, plants have many strategies for virus resistance; some viruses rapidly evolve by recombination, mutation, and component capture, allowing these viruses to evade or counter these strategies rapidly [13]. The identified virus can suggest that aphids might be a reservoir for plant virus transmission resulting in plant diseases. Further studies on these viruses may be important to develop new strategies to control other hemipteran pests and may provide information for an efficient strategy to control virus spreading and use of viruses.

Vitellogenin genes
Four vitellogenin unigenes were identified in of C. pinitabulaeformis Zhang et Zhang transcriptome. Vitellogenin is the main source of nutrition for the embryo, it has antioxidant activity in honey bee [14,15]; and it was also reported it has many other biological functions, such as temporal division of labor and foraging specialization, regulation of hormonal dynamics and change in gustatory responsiveness [16,17]. Vg is an important factor for the population proliferation of pest, moreover, aphids have great potential of reproduction; further studies are needed to understand the molecular mechanisms of reproductive biology in C. pinitabulaeformis Zhang et Zhang, and generate new knowledge on the effect of Vg expression patterns in population dissemination.

Salivary gland related proteins
The saliva in aphids plays important physiological roles in detoxifying toxic substances or in continuous ingestion of the sieve element sap. We identified six laccases and 107 aminopeptidases unigenes in C. pinitabulaeformis Zhang et Zhang (S4 Table).
Laccases have been detected in the cuticles of many species like Bombyx mori and Schistocerca gregaria [18] but also in the midgut, Malpighian tubules, fat body and salivary gland [18]. It is suggested that acts as a detoxifying enzyme to overcome the chemical defenses of the host plant (phenolic compounds) [18,19]. Moreover, laccase knockdown in Tribolium castaneum can lead to lethality associated with defects in both pigmentation and cuticle hardening [7].
Aminopeptidase genes are generally expressed in the midgut [20], however, they can also be expressed in the salivary gland [21]. Aminopeptidase-N cleavages amino acids from the amino termini of proteins [22] and have been found to interact with plant-expressed-lectins in A. pisum [23]. Aminopeptidases expressed in salivary have been commonly detected in aphids, suggesting that they protect from toxic molecules such as plant lectins [6, 24,25]. The lack of aminopeptidase expression in aphids may avoid aphid to fix on the plant surface, which could contribute in the aphid's death; aminopeptidase can be a target for genetic manipulation strategies to control aphid pests.

Stress-related protein
Heat shock proteins (HSPs) are related to stress responses [26], and are divided into five major groups: small HSP (sHsp), HSP60, HSP70, HSP90, and HSP100 [27]. Among these groups, HSP70 family is the largest group of HSPs and can be separated into two sub-groups: HSP70 and heat shock protein cognates (HSC70), based on the expression patterns as response to various stimuli [28]; while HSP70 is inducible and expressed at very low levels under basal conditions; however, its transcription and translation are quickly induced as response to stress [29][30][31]. In general, HSPs are upregulated by stress conditions [26], and are better known for protecting the cell against thermal stress conditions [32][33][34][35]. Moreover, HSPs function as molecular chaperones to promote correct refolding and prevent aggregation of denatured proteins [36]. In C. pinitabulaeformis Zhang et Zhang we identified sHsp, HSP60, HSP70, HSP90; as well as HSP67B2, HSP68, HSPSTI1, HSP98, HSP110, HSPSSA, HSP cognate 3 (HSC70-3), HSP20, HSP9/12, and HSP-like protein (S5 Table). However, there are not detailed descriptions about these genes in C. pinitabulaeformis Zhang et Zhang. HSPs can play an important role in temperature tolerance of C. pinitabulaeformis Zhang et Zhang to survive under stress environmental conditions.
In general, olfactory protein include ORs, OBPs, CSPs, SNMPs, ionotropic receptors (IRs) and gustatory receptors (GRs), which are related to recognition of host plant and congener species, foraging, sexual behavior, defense, nest mate recognition and caste regulation.
In aphids olfactory receptors are needed to recognize pheromones during migration [54]. The understanding of the olfactory systems of C. pinitabulaeformis Zhang et Zhang can contribute in the development of novel biological control methods [55].

Calcium channels
Calcium channels play important roles in the transmission of Ca 2+ , which can affect other ion transmission. In C. pinitabulaeformis Zhang et Zhang a total of 45 calcium channels unigenes were identified and classified into L-, N-, P/Q-, R-, or T-type Ca 2+ channels based on their electrophysiological and pharmacological properties (S7 Table) [56]. Calcium channels in aphids are targets of the spider insecticide Hv1a. Hv1a is lethal to wide range of insects including aphids, blocking ion passage through the channels or modifying the gating mechanism that controls opening and closing of the ion pore [57].
Based on silkworm P450 gene homology and the phylogenetic analysis of Drosophila four large families of P450 enzyme system were identified: CYP2, CYP3, CYP4 and mitochondrial P450 [67].
It has been reported that 17 P450 genes from CYP3 and CYP4 families are associated with pesticide resistance and plant toxin resistance [67,68]. Based in phylogenic analysis; we found evidence that CYP2, CYP6 and mitochondrial families are present in the transcriptome of C. pinitabulaeformis Zhang et Zhang, as well as, CYP3/CYP5/CYP6/CYP9 and CYP4/CYP19/ CYP26. We also identified CYP6a2, CYP6a13, CYP6a14, CYP6a18, CYP6k1, CYP6CY3 and CYP6B1 belonging to CYP6 family, and CYP4v2 belonging to CYP4 family. Interestingly, the P450 gene families identified in C. pinitabulaeformis Zhang et Zhang differ from those reported in pea aphid and other insect systems [67,69].
According to recent studies, P450 possesses resistance-related function to insecticide and plant secretions [70]. There is also evidence that the overlap degree of P450 is related to detoxification and metabolism. However, there is no experimental data to prove the biological role of P450 complex related to insecticide resistance in C. pinitabulaeformis Zhang et Zhang.

Glutathione S-transferase (GST).
GSTs are responsible for detoxification of xenobiotic compounds like plant secondary metabolites and insecticides [71,72]. High levels of GSTs expression are involved in insecticide resistance in insects [73]. We identified 48 GST unigenes in C. pinitabulaeformis Zhang et Zhang, 15 of them with a size larger than 1000bp (31.25%) ( Table 2). According their cellular location, GSTs are divided into three major categories: microsomal, cytosolic and mitochondrial [74]. Moreover, in insects cytosolic matrix GSTs are further divided into at least six classes (delta, epsilon, omega, sigma, theta, and zeta) which is based on homology of the N-terminus sequence, immunoreactivity, substrate specificity, and sensitivity to different inhibitors [75][76][77]. Epsilon and delta classes are unique in insects [78]. We identified delta, omega, sigma, theta, zeta types of GST in C. pinitabulaeformis Zhang et Zhang transcriptome, nevertheless we could not identify any genes from the epsilon class. However, GST delta class is only expressed in wingless aphid like pea aphid, A. pisum [79]. Sigma, delta and theta GST classes have been identified in other insects such as Nasonia vitripennis; one delta unigene in T. yunnanensis; and epsilon, sigma, omega, and delta in T. castaneum have been identified [80,81]. The GST genes identified in the transcriptome of C. pinitabulaeformis Zhang et Zhang can contribute to further understanding of insecticide resistance mechanisms in aphid pests.

Other insecticide resistance-related genes and insecticide receptors
We also identified 187 unigenes (S9 Table) with potential insecticide resistance functions, pesticide receptors and detoxification, such as: carboxylesterase, sodium channel, superoxide dismutase, acetyl-CoA carboxylase, chloride channel, ryanodine receptor, c-aminobutyric acid (GABA) receptors, nicotinic acetylcholine receptor, acetylcholinesterase (Fig 3). These genes can provide basic information to contribute in the knowledge of insecticide mechanism, which are not described in aphids.

RNA interference-related genes
RNA interference (RNAi) is a mechanism of homology dependent gene silencing present in plants and animals [82]. RNAi is high specificity, efficiency, stable, transmissible and hereditary; this characteristics make it a valuable tool to address various questions in insect toxicology, as well as may become an effective strategy for management of insect pests [83]. Thirty-one unigenes related to RNAi pathways were identified in C. pinitabulaeformis Zhang et Zhang (Fig 4): two RNA-dependent RNA polymerases, twenty-two scavenger receptors (SRs), and seven Systemic RNA Interference Defective1 (SID-1) unigenes. These components previously have been reported in insects as part of RNAi uptake mechanisms [84][85][86]. However interestingly, we did not identify any vacuolar H + ATPase and RSD-3 unigenes, although they are involved in RNAi pathways. Among the identified genes, 20 unigenes were larger than 700bp (64.52%) and 17 were more than 1kb (54.84%). In addition, one SID-1-related unigene and one scavenger receptors were both represented by longer sequences in the transcriptome analysis, at c15255_g2 (4397bp) and c16045_g1 (4065bp), respectively.
Scavenger receptors (SRs) can recognize polyanionic ligands which are extensive and in Drosophila [87], they play key parts in mediating phagocytosis of pathogens and immune responses (suppression of scavenger receptors transcription by parasitoid factors). SRs bind to proteins, polysaccharides, polyribonucleotides and lipids directly [88,89]; recently it has been shown that SRs are involved in microRNA delivery [90]. Among 22 unigenes related to scavenger receptors, thirteen of them had lengths above 1kb, while the rest ranged from 300bp to 1kb. SID-1 is crucial for systemic RNAi pathways in Caenorhabditis elegans, which is a multispan transmembrane protein; it delivers dsRNAs to cells [87] and it was identified as a required gene for systemic, but not cell-autonomous, RNAi [91]. In the cotton aphid, Aphis gossypii a sid-1 orthologue has also been identified, although this gene has not been identified in D. melanogaster [92].
Further studies of scavenger receptors, SID-1 and RNA-dependent RNA polymerase genes can increase the understanding in the biology of cell protection against endogenous/exogenous nucleic acids of pathogens. Moreover, understanding the mechanisms of RNAi pathways will contribute to develop new molecular approaches to control plant pathogens and insects through suppression of key genes/proteins of infecting organisms.

Conclusion
Cinara pinitabulaeformis Zhang et Zhang is known as the main pest of pine trees and produce dense dew which lead to the sooty mold which makes pine trees infect with disease. In this work, we sequenced and characterized the transcriptome of C. pinitabulaeformis Zhang et Zhang by Illumina sequencing. We identified a large group of genes related to propagative viruses, nutrition-related genes, detoxification related molecules, olfactory related receptors, stressed-related protein, putative insecticide resistance genes and possible insecticide targets. This study provides valuable information that may play an important role in the developing new control strategies to manage C. pinitabulaeformis Zhang et Zhang populations.

Ethics statement
There are no specific permits for insect collection in the selected locations. The sampling locations are not privately-owned or natural protected areas. Insects used for the experiments are not considered endangered or protected species, and its collection is legal in China.

Insects
Wingless adult aphids of C. pinitabulaeformis Zhang et Zhang were collected from Pinus massoniana (P. massoniana) branches in the town of JinJiang, ZiMao mountain and FuJian province (N 26.15046˚; E 119.59261˚). Ten aphids (five females and five males) were stored at -80˚C until RNA extraction.

Illumina sequencing and cDNA library
Total RNA was extracted from whole aphid adults using TrizolH (Invitrogen, USA). Residual genomic DNA was removed by incubation with DNase I, Amp Grade (Invitrogen, USA) and RNA integrity was evaluated by agarose gel electrophoresis. RNA was quantified on a Nano-Drop ND-1000 spectrophotometer (Thermo Scientific, USA). For RNA sample preparations, a total amount of 1.5 μg RNA per sample was used as starting material. Generation of sequencing libraries was done using NEBNext1 Ultra™ RNA Library Prep Kit for Illumina1 (NEB, USA), following the manufacturer's recommendations. Briefly, purification of mRNA from total RNA was done using poly-T oligo-attached magnetic beads. Fragmentation was performed by using divalent cations under high temperature in NEBNext First Strand Synthesis Reaction Buffer (5x). First strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (RNase H-). Second cDNA strand was generated using DNA Polymerase I and RNase H. The residual overhangs were converted into blunt ends using exonuclease/polymerase. NEBNext Adaptor contained a hairpin loop structure was ligated for hybridization after adenylation of the 3' ends of DNA fragments. In order to select cDNA fragments of 150~200 bp in length, the purification of cDNA fragments were done using the AMPure XP system (Beckman Coulter, Beverly, USA). Three μl of USER Enzyme (NEB, USA) was used in conjunction with cDNA which was size-selected, adaptor-ligated at 37˚C for 15 min followed by 5 min at 95˚C before PCR reaction. PCR reaction was performed using Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. PCR products were purified (AMPure XP system) and library quality was assessed using the Agilent Bioanalyzer 2100 system.

Bioinformatic analysis
The cDNA library was sequenced by High-throughput sequencing platform which was allowed to produce many high-quality reads based on Sequencing by Synthesis (SBS) technology. Raw data were from cleaned low-quality reads and joint sequences. The identified transcript sequences in each fragment collection, was done using the De Bruijn method of graphing the sequencing read information [93]. E-value of the BLAST parameter was set at 10 −5 , and E-value of the HMMER parameter was set at 10 −10 . All unigenes were compared with Swiss-Prot [94], GO [95], NR [96], COG [95], KEGG [97], KOG [98] and databases using BLAST [99] software. COG and KEGG databases were used to predict metabolic pathways. The function of the identified transcript and assignment of Gene Ontology (GO) terms were determined using the GO database. The output of amino acid sequence of unigenes was analyzed using the HMMER [100] software and searched in the Pfam [101] database to gain annotated information for the unigenes.
Supporting information S1