Identification of putative Type-I sex pheromone biosynthesis-related genes expressed in the female pheromone gland of Streltzoviella insularis

Species-specific sex pheromones play key roles in moth sexual communication. Although the general pathway of Type-I sex pheromone biosynthesis is well established, only a handful of genes encoding enzymes involved in this pathway have been characterized. Streltzoviella insularis is a destructive wood-boring pest of many street trees in China, and the female sex pheromone of this species comprises a blend of (Z)-3-tetradecenyl acetate, (E)-3-tetradecenyl acetate, and (Z)-5-dodecenyl acetate. This organism therefore provides an excellent model for research on the diversity of genes and molecular mechanisms involved in pheromone production. Herein, we assembled the pheromone gland transcriptome of S. insularis by next-generation sequencing and identified 74 genes encoding candidate key enzymes involved in the fatty acid biosynthesis, β-oxidation, and functional group modification. In addition, tissue expression patterns further showed that an acetyl-CoA carboxylase and two desaturases were highly expressed in the pheromone glands compared with the other tissues, indicating possible roles in S. insularis sex pheromone biosynthesis. Finally, we proposed putative S. insularis biosynthetic pathways for sex pheromone components and highlighted candidate genes. Our findings lay a solid foundation for understanding the molecular mechanisms underpinning S. insularis sex pheromone biosynthesis, and provide potential targets for disrupting chemical communication that could assist the development of novel pest control methods.


Introduction
Lepidoptera sex pheromones, which are usually secreted by female moths to attract conspecific males, play a key role in sexual communication, and are used as a monitoring and trapping tool in integrated pest management programs [1][2][3]. In general, moth sex pheromones are composed of two or more components in a unique ratio, and are classified into four types (Type-I, Type-II, Type-III, and Type-0) according to their site of production, chemical structure, and biosynthetic features [4]. Type-I sex pheromones are alcohols and their derivatives

Ethics statement
S. insularis is not on the List of Endangered and Protected Animals in China. The Beijing Municipal Bureau of Landscape and Forestry issued a permit for field collection.

Sample collection
S. insularis individuals were collected from Fraxinus americana at Beijing Forestry University North Road, Haidian District, Beijing, China, in May 2017. Damaged trunks were chopped down, taken to the laboratory, and larvae inside trunks were fed on the phloem and xylem of the host under natural environmental conditions. Adult moths were sexed after emergence according to the genitalia. The pheromone gland and associated ovipositor valves, as well as parts of the terminal abdominal segments (together abbreviated as PG) were dissected from 1-day-old and 2-day-old female adults during the scotophase, which is reported to be the calling period of this moth [35,37]. In addition, antennae and legs were also collected at the same time, immediately placed in RNAlater (Ambion, Austin, TX, USA), and stored at -80˚C.

RNA extraction
Total RNA was extracted from 15 PGs (seven PGs from 1-day-old females and eight PGs from 2-day-old females) using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions, with three biological replicates. RNA purity was evaluated with a Nano-Drop 2000 instrument (Thermo, Waltham, MA, USA), and RNA concentration was measured using a Qubit RNA Assay Kit and a Qubit 2.0 Fluorimeter (Life Technologies, CA, USA). RNA integrity was determined by an Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA), and RNA degradation and contamination were monitored by 1% agarose gels to ensure the quality of the RNA samples for subsequent transcriptome sequencing.

cDNA library construction and Illumina sequencing
cDNA library construction and Illumina sequencing of samples were performed at Shanghai Majorbio Bio-pharm Technology Co., Ltd. (Shanghai, China). According to the TruSeq RNA Sample Preparation Guide V2 (Illumina), mRNA was purified from total RNA using Oligo (dT) magnetic beads, then fragmented by adding fragmentation buffer. Random hexamer primers were used to synthesize first-strand cDNA, followed by synthesis of the second strand using dNTPs, RNaseH, and DNA polymerase I. All remaining overhangs were converted into blunt ends via polymerase. After end-repair, poly-A tailing, and ligation of adapters, 150-200 bp cDNA fragments were purified using an AMPure XP system (Beckman Coulter, Beverly, MA, USA), and 3μl USER Enzyme (NEB, USA) was incubated with size-selected, adaptorligated cDNA at 37˚C for 15 min followed by incubation at 95˚C for 5 min, prior to PCR amplification. PCR products were purified using an AMPure XP system, and library quality was assessed on the Agilent Bioanalyzer 2100 system. Finally, S. insularis cDNA libraries were sequenced on an Illumina Hiseq 4000 platform, and paired-end reads were generated.

Sequence assembly and functional annotation
To obtain the clean reads, the raw reads were processed to remove low-quality reads and adapter sequences. Then, GC Content, Q20 and Q30 were used to assess the sequencing quality. The qualified reads assembly was carried out with the short reads assembling program-Trinity [38]. The largest alternative splicing variants in the Trinity results were called unigenes. The annotation of unigenes was performed by the National Center for Biotechnology Information (NCBI) BLASTx searches against the non-redundant (Nr) protein database, with a cut-off E-value of 10 −5 . Unigenes were also annotated using other protein databases including Gene Ontology (GO) [39], Clusters of Orthologous Groups of proteins (COG) [40], and Kyoto Encyclopedia of Genes and Genomes (KEGG) [41]. The longest open reading frame (ORF) for each unigene was determined by the NCBI ORF Finder tool (http://www.ncbi.nlm. nih.gov/gorf/gorf.html). Fragments per kilobase of exon per million mapped reads (FPKM) values were calculated by RSEM (RNA-Seq by Expectation-Maximization) with default parameters represented gene expression in S. insularis PG tissue [42].

Sequence and phylogenetic analyses
Amino acid sequences of candidate desaturases were aligned with those of other insect species using ClustalW by MEGA (Version 5.0) [43]. Phylogenetic tree construction was performed using the neighbor-joining method as implemented in MEGA (Version 5.0) with a p-distance model and pairwise deletion of gaps. Bootstrap support of tree branches was assessed by resampling amino acid positions 1000 times [44]. Phylogenetic trees were colored and arranged using FigTree (Version 1.4.2) [45].

Expression analysis by quantitative real-time PCR (RT-qPCR)
Expression patterns of putative ACC and DES genes in different tissues (antennae, legs, and PGs) were analyzed by RT-qPCR using a Bio-Rad CFX96 PCR System (Hercules, CA, USA). Total RNA was extracted from 25 antennae, 10 legs, and 15 PGs of female moths following the method described above, and transcribed into cDNA using a PrimeScript RT Reagent Kit with gDNA Eraser (No. RR047A; TaKaRa, Shiga, Japan). Gene-specific primers were designed using Primer 3 Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi) and are listed in S1  GO annotation was used to classify the unigenes into three functional groups (molecular function, cellular component, and biological process) according to the GO categories. Of 30,307 unigenes identified in S. insularis, 8053 (26.57%) were annotated. As shown in Fig 2, 20,072 unigenes were assigned to the 'molecular function' category, and 'binding' (4141 unigenes, 43.14%) and 'catalytic activity' (3695 unigenes, 38.49%) were the most highly represented terms in this category. A total of 12,115 unigenes were assigned to GO terms in the 'cellular component' category, and 'cell part' (2409 unigenes, 19.88%) and 'cell' (2409 unigenes, 19.88%) were the most abundant terms. A further 20,072 unigenes were assigned to GO terms in the 'biological process' category, and the main terms were 'cellular process' (4329 unigenes, 21.56%) and 'single-organism process' (3326 unigenes, 16.57%). In addition, all unigenes were searched against the COG database for functional prediction and classification, and the results showed that 3865 unigenes (12.75%) could be assigned to 25 specific categories (Fig 3); 'signal transduction mechanisms' (567 unigenes, 14.67%) was the largest group, and 'cell motility' (5 unigenes, 0.13%) was the smallest group. Furthermore, KEGG annotation was used to divide unigenes into five KEGG pathways (cellular processes, environmental information processing, genetic information processing, metabolism, and organismal systems; Fig 4). Most unigenes were assigned to the 'processes' branch, and 'global and overview maps' (1251 unigenes, 28.07%) was the most highly represented term.

Pheromone biosynthesis-activating neuropeptide receptor (PBANR)
The biosynthesis of Type-I sex pheromones in female moths has been shown to be regulated by a C-terminally amidated 33 amino acid neuropeptide termed PBAN that is released from the subesophageal ganglion in the brain and transported through the hemolymph to the PG [47][48]. The binding of PBAN to its receptor in the PG cell membrane will induce the opening of Ca 2+ channels causing the influx of extracellular Ca 2+ , which then initiates sex pheromone production [49][50]. PBANR, a G protein-coupled receptor (GPCR), was first cloned from the PG of Helicoverpa zea [51]. PBANR has since been identified in Bombyx mori [52] and other Lepidoptera species [49,53]. PBANRs exist as PBANR multiple isoforms (PBANR-As, -A, -B, and -C) based on alternative splicing of the C-terminus [54]. The various isoforms play different functional roles in the ligand-induced internalization [55], a phase of GPCR feedback regulation and desensitization in diverse moth species [56][57]. Herein, we identified a single PBANR in the S. insularis PG transcriptome that is 84% identical to Mamestra brassicae PBANR isoform B (ARO85772.1) and is very low in abundance (0.56 FPKM; Table 2 and S1 Text). The number of PBANR-encoding genes in the S. insularis PG was in accordance with Plutella xylostella [25], Agrotis segetum [58], and Agrotis ipsilon [59]. In addition, previous studies identified three isoforms of PBANR in Ostrinia nubilalis [60] and Mamestra brassicae [61]. However, we did not discover other isoforms of PBANR in our transcriptomic data, which may be explained by lower expression levels in S. insularis.

Acetyl-CoA carboxylase (ACC)
The first step of saturated long-chain fatty acid biosynthesis is the ATP-dependent carboxylation of acetyl-CoA to malonyl-CoA catalyzed by ACC, a rate-limiting enzyme [13][14]. In the S. insularis PG transcriptome, we identified two ACCs with lengths of 723 and 7616 bp ( Table 2 and S1 Text), similar to the numbers reported previously for other moth species (two in A. ipsilon [59], one in P. xylostella [25], and one in A. segetum [58]). SinsACC1 with an ORF of 399 bp encodes for an ACC with 63% amino acid identity with the ACC of Amyelois transitella (XP_013185423.1), and SinsACC2 has an intact ORF of 7101 bp that shares high amino acid identity (90%) with the ACC of Papilio polytes (XP_013146614.1). The RT-qPCR results ( Fig 5) showed that SinsACC1 was more strongly expressed in the antennae than in the other tissues, whereas SinsACC2 was mainly expressed in the PG. However, both were present in low abundance (3.6 and 28.33 FPKM) in the S. insularis PG transcriptome. It was reported that the plastid-specific ACC is inhibited by herbicides that target the eukaryotic form of the enzyme in monocotyledonous plants [62][63][64]. Eliyahu et al. (2003) subsequently demonstrated that the herbicide diclofop inhibits PBAN-activated sex pheromone production in Helicoverpa zea, thereby implicating ACC plays a key regulatory role in fatty acid biosynthesis [65], which provides a basis for the development of a new pest control method based on disruption of sex pheromone production in females.

Fatty acid synthase (FAS)
FAS is the multifunctional protein that catalyzes acetyl-CoA, malonyl-CoA, and NADPH through-multienzyme complex that catalyzes the synthesis of long-chain fatty acids. Labeling studies demonstrated that palmitic acid (C16) and stearic acid (C18) are the FAS products in most moth PGs [15,[66][67]. Herein, we identified five FASs with lengths ranging from 301 bp to 8170 bp in the S. insularis PG transcriptome (Table 2 and S1 Text), These results are similar to those reported for other insects, with six and three FASs in A. segetum [58] and Sesamia inferens [68], respectively. Among the five FASs, only SinsFAS2 has an intact ORF. BLASTX results showed that FASs share high sequence similarity with Lepidoptera FASs in the NCBI

Desaturase (DES)
Double bonds are introduced into the fatty acid chain at specific positions by a variety of desaturases [69]. Three putative sex pheromone compounds of S. insularis were identified as Z3-14:OAc, E3-14:OAc, and Z5-12:OAc, which are unsaturated fatty acids with acetate esters as the functional group. It is therefore reasonable to assume that the saturated fatty acid precursor of S. insularis sex pheromones is palmitic acid (C16), which is desaturated by Δ5-desaturase and Δ9-desaturase to form the precursors Z/E5-16:acyl-CoA and Z9-16:acyl-CoA in the production of two major (Z3-14:OAc and E3-14:OAc) and one minor (Z5-12:OAc) sex pheromone component, respectively (Figs 6 and 7). From the S. insularis PG transcriptome, we identified 17 putative DESs with lengths ranging from 244 to 7148 bp (Table 2 and S1 Text). The number of DESs identified in S. insularis was more than that in A. ipsilon [59], P. xylostella [25], and A. segetum [58]. Of these DESs, the identity of the best BLASX match in the NCBI NR database ranged from 52% to 89%. Notably, SinsDES15 identified in the S. insularis transcriptome shared the highest identity (89%), comparable with DES1 in Bombyx mori (XP_004930794.1). Of the 17 DESs, nine DES sequences were either less than 1000 bp, or no common sites were found for computing distances; thus, we only used the remaining eight S. insularis DES sequences to construct our phylogenetic tree (Fig 8). In the tree, SinsDES13 and SinsDES16 are clustered in the 'Δ11-desaturases' clade. The SinsDES17 sequence shares high sequence homology with 'Δ9-desaturases', and it clusters with other enzymes also possessing the NPVE motif. The remaining DESs clustered into the 'other desaturases' ortholog clade. The qRT-PCR results (Fig 5) revealed that SinsDES6 and SinsDES8 were highly expressed in S. insularis PG compared with the other tissues, suggesting that they may play roles in S. insularis sex pheromone production. The other five DESs (SinsDES1, SinsDES3, SinsDES4, SinsDES6, and SinsDES7) were expressed at significantly higher levels in antennae than in other tissues.

Sex pheromone biosynthesis in Streltzoviella insularis
All DESs except SinsDES8 and SinsDES17 were present at low abundance (from 0.41 to 86.74 FPKM) in the S. insularis PG transcriptome. DESs play important roles in the generation of structural diversity in Lepidopteran sex pheromone biosynthesis, owing to the evolution of diverse enzymatic properties [22]. Based on the most likely sex pheromone biosynthetic pathways in S. insularis, both the Δ5-and Δ9-desaturase are likely involved, but it is not clear which of the 17 desaturase genes identified in our study encode these enzymes. Further biochemical analyses of these desaturases are required to determine which ones are involved in pheromone biosynthesis.

Fatty acyl-CoA reductase (FAR)
Chain-shortened fatty acyl CoA precursors are reduced to the corresponding alcohols by alcohol-generating FARs. Fatty alcohols can serve as sex pheromone components in many moths including Plutella xylostella [25]. Herein, we detected 13 transcripts homologous to putative FAR genes in the S. insularis PG transcriptome (Table 2 and S1 Text), similar to the number identified in other moth species (13 in A. ipsilon [59] and 10 in A. segetum [58]). Among them, SinsFAR6 was expressed at the highest level (476.06 FPKM). The FARs in S. insularis encode proteins shared 46-92% amino acid sequence identity with homologs in other Lepidoptera moths such as B. mori, Helicoverpa armigera, and Spodoptera exigua.

Alcohol dehydrogenase (AD)
Fatty alcohols can also be used as pheromone intermediates to produce corresponding aldehydes by ADs [73]. In the S. insularis PG, five homologous full-length AD genes were identified (Table 2 and S1 Text). The number of AD-encoding genes in S. insularis was in accordance with P. xylostella [25] and A. ipsilon [59]. Two ADs (SinsAD1 and SinsAD4) encode proteins that are homologous to ADs in Ostrinia furnacalis (BAR64763.1 and BAR64764.1) and share relatively high amino acid sequence identity (70%); SinsAD2 encodes a protein sharing 66% identity with Sesamia inferens AD1 (AII21999.1), SinsAD3 encodes a protein sharing 94% identity with the AD of Helicoverpa armigera (XP_021189392.1), and Sin-sAD5 encodes a protein sharing 71% identity with the AD of Cydia pomonella (AKQ06148.1). FPKM value analysis revealed low expression levels in the S. insularis PG for all five ADs (FPKM <50).

Aldehyde reductase (AR)
ARs are a group of the aldo-keto reductases that catalyze the reduction of fatty aldehydes to alcohols [74]. Whether ARs first produce aldehydes which are then converted to alcohols, or vice versa, is very difficult to distinguish in sex pheromone biosynthesis. Herein, we identified five AR genes in the S. insularis PG transcriptome, and four included intact ORFs (Table 2 and S1 Text). The number of ARs identified in S. insularis was less than that in A. ipsilon [59] and P. xylostella [25]. The deduced protein sequences of these five genes share high amino acid sequence identity (>60%) with their homologs in other Lepidoptera species, and all were expressed at low levels (from 6.56 to 125.24 FPKM) in the S. insularis PG.

Acetyltransferase (ATF)
ATF catalyzes the conversion of fatty alcohols to acetate esters, and this is the final enzyme in the pheromone biosynthetic pathway of the S. insularis. Previous studies showed that ATF is found almost exclusively in the PG, and is active during the photophase and all adult stages [75][76]. ATF is microsomal and exhibits specificity for the Z isomer of 12-, 14-, and 16-carbon monounsaturated fatty alcohol substrates [29][30][75][76]. However, the enzyme has not been identified at the gene level in any moth so far [58]. In the present study, we identified two transcripts predicted to encode ATFs in the S. insularis PG ( Table 2 and S1 Text). The number of ATF-encoding genes in the S. insularis PG was in accordance with P. xylostella [25]. The BLASTX results revealed 89% and 79% amino acid sequence identity shared with putative ATFs of Ostrinia furnacalis and Amyelois transitella (XP_028157143.1 and XP_013192024.1), respectively. Both ATF transcripts were present at low abundance (23.49 and 0.47 FPKM) in the S. insularis PG.
Supporting information S1