De Novo Transcriptomes of Olfactory Epithelium Reveal the Genes and Pathways for Spawning Migration in Japanese Grenadier Anchovy (Coilia nasus)

Background: Coilia nasus (Japanese grenadier anchovy) undergoes a spawning migration from the sea to inland fresh water. Previous studies have suggested that anadromous fish use olfactory cues to migrate successfully to their spawning grounds. However, little genomic information is available for C. nasus. To understand the molecular mechanisms of spawning migration, it is essential to identify the genes and pathways involved in the migratory behavior of C. nasus.
Results: Using de novo transcriptome sequencing and assembly, we constructed two transcriptomes of the olfactory epithelium from wild anadromous and non-anadromous C. nasus. Over 178 million high-quality clean reads were generated using Illumina sequencing technology and assembled into 176,510 unigenes (mean length: 843 bp). About 51% (89,456) of the unigenes were functionally annotated using protein databases. Gene ontology analysis of the transcriptomes indicated gene enrichment not only in signal detection and transduction, but also in regulation and enzymatic activity. Potential genes and pathways involved in the migratory behavior were identified. In addition, simple sequence repeats and single nucleotide polymorphisms were analyzed to identify potential molecular markers.
Conclusion: We obtained, for the first time, high-quality de novo transcriptomes of C. nasus using a high-throughput sequencing approach. Our study lays the foundation for further investigation of C. nasus spawning migration and genome evolution.


Introduction
The Japanese grenadier anchovy (Coilia nasus) is a small, commercially important fish in China that belongs to the family Engraulidae (order Clupeiformes) [1]. It is prized for its delicate, tender meat. C. nasus is also well known for the long-distance ocean-to-river spawning migration of its anadromous population.
C. nasus lives in coastal waters for most of its life and normally reaches sexual maturity at 1-2 years of age. It spawns between February and September [2]. Each year, when the spawning season arrives, thousands of mature C. nasus migrate long distances from coastal waters up exorheic rivers, such as the Yangtze River, and spawn in the lower and middle reaches of these rivers and in adjacent lakes. Interestingly, a sedentary population of C. nasus in lakes has abandoned the long-distance migration for unknown reasons and become permanently resident there.
The ability to locate the spawning ground is key to successful reproduction. The anadromous population of C. nasus has recently declined sharply because of environmental pollution, overfishing and the destruction of spawning grounds. Understanding C. nasus spawning migration is therefore essential for its conservation and stock management. However, little is known about the molecular basis of this migration.
Previous studies on fish migration have mostly focused on salmonids, which are hypothesized to use olfactory cues to return to their natal rivers to spawn. Several studies in which the salmonid olfactory epithelium was altered concluded that salmonids without olfactory ability cannot discriminate their natal streams and that functional olfaction is essential for spawning migration [3-7]. A similar conclusion was drawn for American eels: anosmic eels lost the ability to migrate out of the estuary during the fall spawning migration [8]. Olfactory imprinting of dissolved amino acids in natal stream water has been reported in lacustrine sockeye salmon [9], and strong olfactory responses to natal stream water have also been found in sockeye salmon [10]. In wild anadromous Atlantic salmon, some of the olfactory receptor genes involved in reproductive migration have been identified [11]. These studies suggest that olfaction may be essential for reproductive migration in fish.
The olfactory epithelium in the nasal cavity mediates olfaction in fish. Odorants in the water, such as steroids, bile acids and amino acids, are detected by olfactory receptors in the olfactory epithelium, and the resulting information is processed by the central nervous system. To investigate the relationship between olfaction and the anadromous behavior of C. nasus, we sequenced the transcripts expressed in the olfactory epithelium and used these sequences to identify genes and pathways potentially involved in its migratory behavior. Little genomic information about C. nasus is currently available in the National Center for Biotechnology Information (NCBI) database, so the high-quality transcriptome data obtained in this study should be useful for future research on this species.

Results and Discussion
Transcriptome sequencing and assembly
As described in Materials and Methods, cDNA libraries were constructed from the olfactory sacs of wild anadromous and non-anadromous C. nasus and sequenced on the Illumina platform, producing 51,261,228 and 126,241,752 clean reads, respectively (Table 1). Assembly yielded 117,717 unigenes for anadromous and 231,219 unigenes for non-anadromous C. nasus, and these were clustered into 176,510 unigenes with a mean length of 843 nucleotides (Table 1 and Figure S1). The total length of the 176,510 assembled unigenes was 148,772,175 nucleotides.
The assembly quality and the size distribution of the unigenes are shown in Figure S1. Of all the unigenes, 8,608 (over 4.8%) are ≥3,000 nucleotides long. Coding regions were identified for 81,315 sequences (72,601 using BLASTX and 8,714 using ESTScan; Figure S2). Whereas obtaining large cDNA collections by traditional Sanger sequencing is time-consuming, this study shows that next-generation sequencing can efficiently generate high-quality transcriptome data for C. nasus.
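The summary statistics above (unigene count, total length, mean length and the share of unigenes ≥3,000 nt) follow directly from the list of unigene lengths. The sketch below assumes the lengths are available as a plain Python list (for example, parsed from the assembled FASTA file); the function name is hypothetical. It also cross-checks the reported mean from the totals given in the text.

```python
def assembly_summary(lengths, long_cutoff=3000):
    """Summarize unigene lengths given in nucleotides (hypothetical helper)."""
    if not lengths:
        raise ValueError("no sequences")
    total = sum(lengths)
    n_long = sum(1 for n in lengths if n >= long_cutoff)
    return {
        "count": len(lengths),
        "total_nt": total,
        "mean_nt": total / len(lengths),
        "n_long": n_long,                        # unigenes >= long_cutoff nt
        "pct_long": 100.0 * n_long / len(lengths),
    }

# Cross-check using only the totals reported in the text.
print(round(148_772_175 / 176_510))      # mean length -> 843
print(round(100 * 8_608 / 176_510, 2))   # % of unigenes >= 3,000 nt -> 4.88
```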

Annotation of predicted proteins and classification using COG
The putative functions of 89,456 unigenes (50.68% of all unigenes) were annotated by sequence similarity searches with an E-value cutoff of ≤1×10^-5 (72,127 using the NR database, 65,888 using the NT database, 61,581 using the SwissProt database, 53,575 using the KEGG database, 25,272 using the COG database, and 41,888 using Gene Ontology terms). However, because genome and EST sequence data for C. nasus are lacking, approximately 49.32% of the unigenes could not be functionally annotated.
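Because a unigene counts as annotated if it has a hit in at least one database, the overall figure (89,456) is the size of the union of the per-database hit sets, not their sum. A minimal sketch, assuming the hits are available as sets of unigene IDs per database; the function name and the toy IDs are hypothetical.

```python
def annotation_coverage(hits_by_db, n_total):
    """hits_by_db maps a database name to the set of unigene IDs with a hit.

    Returns per-database counts, the number of unigenes annotated by at
    least one database (the union), and that number as a percentage.
    """
    annotated = set().union(*hits_by_db.values())
    per_db = {db: len(ids) for db, ids in hits_by_db.items()}
    return per_db, len(annotated), 100.0 * len(annotated) / n_total

# Toy example: u2 and u3 are each hit by two databases but counted once.
hits = {"NR": {"u1", "u2"}, "SwissProt": {"u2", "u3"}, "KEGG": {"u3"}}
per_db, n_annotated, pct = annotation_coverage(hits, n_total=6)
```

With the study's figures, 100 × 89,456 / 176,510 rounds to the reported 50.68%.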
Figure S3 shows the E-value and similarity distributions for the 72,127 unigenes (40.86% of all unigenes) annotated against the NR database, together with the species distribution of the best BLASTX hits. About 66.2% of these unigenes had their best hits to known fish genes. A small number of sequences, however, matched genes of Paramecium tetraurelia and Tetrahymena thermophila SB210; these may represent contamination during sample collection or parasitic infection of C. nasus.
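A species distribution of best hits can be tallied from BLAST tabular output (-outfmt 6, whose 11th and 12th columns are the E-value and bit score). The sketch below keeps the highest-scoring hit per query that passes the E-value cutoff, then counts hits per species. The subject-to-species lookup is a hypothetical input (in practice it would come from the NR taxonomy), and the toy rows are illustrative only.

```python
from collections import Counter

def best_hits(blast_tab, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query from BLAST -outfmt 6 text."""
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:          # apply the E-value cutoff
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

def species_distribution(best, subject_species):
    """Count best hits per species; `subject_species` is a hypothetical lookup."""
    return Counter(subject_species.get(s, "unknown") for s, _, _ in best.values())
```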

Gene ontology assignments
To characterize the functional capacity of the C. nasus transcriptome, 41,888 unigenes (46.8% of the annotated unigenes) were assigned to the three Gene Ontology (GO) categories: biological process, cellular component and molecular function (Figure 2). In the biological process category, 13,391 unigenes were assigned to response to stimulus and 9,782 to signaling, both of which were enriched. In the cellular component category, 9,021 unigenes were assigned to membrane part. In the molecular function category, binding (27,140) and catalytic activity (16,082) were enriched. The GO terms channel regulator activity (135 unigenes), electron carrier activity (256), receptor activity (1,845) and receptor regulator activity (48) were also represented. The large number of regulatory transcripts in our data may indicate transcriptional plasticity in the olfactory epithelium.
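Per-term counts of this kind (similar to the classification statistics WEGO reports) count each unigene once for every term it carries, so the counts can sum to more than the number of GO-annotated unigenes; binding (27,140) and catalytic activity (16,082) together already exceed 41,888. A minimal sketch, assuming GO assignments are available as (category, term) pairs per unigene; this is not WEGO's implementation, and the names are hypothetical.

```python
from collections import defaultdict

def go_term_counts(assignments):
    """Count unigenes per GO term within each GO category.

    `assignments` maps a unigene ID to (category, term) pairs, e.g.
    ("BP", "response to stimulus"). Duplicate pairs for the same unigene
    are collapsed, so a unigene contributes at most 1 to each term.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for pairs in assignments.values():
        for category, term in set(pairs):
            counts[category][term] += 1
    return {category: dict(terms) for category, terms in counts.items()}
```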
Approximately 41.9% of all C. nasus transcripts had no GO terms assigned, possibly because knowledge of C. nasus gene function is currently limited, or because some of these transcripts derive from non-coding RNA genes. Nevertheless, the unannotated transcripts expressed in the olfactory epithelium should be documented, as they may be involved, directly or indirectly, in olfaction in C. nasus.
Previous transcriptome studies of fish olfactory epithelium have been limited to the goldfish Carassius auratus [12]. Because goldfish do not migrate, comparing the C. auratus and C. nasus transcriptomes may provide useful information on the molecular mechanisms of migration. We compared the GO terms response to stimulus and binding, which may be involved in olfaction and signal transduction. Both terms accounted for a higher proportion in C. nasus than in C. auratus (6.30% versus 4.40% for response to stimulus; 47.90% versus 45.70% for binding), suggesting that C. nasus may have a greater olfactory capacity than C. auratus.

Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis
A total of 53,575 unigenes were mapped to KEGG pathways. The number of unigenes per pathway ranged from 2 to 5,243. The 25 pathways with the most unigenes are listed in Table 2; the largest, metabolic pathways, contained 5,243 unigenes. These predicted KEGG pathways may provide a useful resource for research on the spawning migration of C. nasus and for other molecular studies of this species.
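A ranking such as Table 2 can be produced by counting unigenes per pathway and taking the most frequent ones. The sketch assumes (this is not stated in the text) that a unigene mapped to several pathways is counted once in each, so pathway counts can sum to more than the 53,575 mapped unigenes. The input mapping and pathway IDs in the test are illustrative.

```python
from collections import Counter

def top_pathways(unigene_to_pathways, n=25):
    """Return the n pathways with the most unigenes as (pathway, count) pairs.

    Each unigene is counted at most once per pathway, but may contribute
    to several different pathways.
    """
    counts = Counter()
    for pathways in unigene_to_pathways.values():
        counts.update(set(pathways))
    return counts.most_common(n)
```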
Simple sequence repeats (SSRs) and SNPs as genetic markers
Molecular markers are useful tools for studies of species evolution and population differentiation. At present, studies of C. nasus populations are restricted by the lack of effective molecular markers. From the de novo assembled transcriptome, 78,852 SSRs were detected in 54,059 sequences, comprising 14,998 mononucleotide, 50,071 dinucleotide, 9,546 trinucleotide, 2,317 tetranucleotide, 1,523 pentanucleotide and 397 hexanucleotide repeats (Figure S4). In addition, 224,779 single nucleotide polymorphism (SNP) sites were identified: 93,501 in anadromous and 131,278 in non-anadromous C. nasus. There were 138,945 transition sites and 85,734 transversion sites (Table S1). This large set of putative markers may be useful for future studies of C. nasus genome evolution, such as analyses of gene flow, genetic mapping and genotyping.
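The transition/transversion split follows from base chemistry: a transition exchanges a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), and every other single-base change is a transversion. A minimal sketch (the function name is hypothetical), with the Ts/Tv ratio implied by the counts above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref, alt):
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

# Ts/Tv ratio from the reported counts.
print(round(138_945 / 85_734, 2))  # -> 1.62
```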

A resource for investigation of migration genes
Previous studies on C. nasus migration have mainly focused on behavioral and morphological aspects [1,2,13-19]. In this study, we aimed to extend this knowledge and provide new insight into the molecular mechanisms of C. nasus migration. The transcriptome data obtained here provide a resource for identifying putative genes involved in C. nasus migration.
Pathway of olfactory transduction. The olfactory imprinting and homing hypothesis for salmon assumes that odorant molecules in the natal stream are imprinted on the olfactory system of juvenile salmon during their downstream migration, and that adult salmon detect the corresponding molecules to discriminate the natal stream during their homing migration [9,10,20]. In our study, 547 unigenes, or 1.02% of the KEGG-annotated unigenes, were assigned to the KEGG olfactory transduction pathway (ko04740) [21-29] (Figure 3).
Little is currently known about olfactory transduction in C. nasus; however, relevant information is available from other vertebrate species [30]. The canonical olfactory transduction pathway begins with the detection of odor molecules by odorant receptors (ORs). Binding of odor molecules to the odorant receptors activates the Gαolf-containing heterotrimeric G protein (Golf), which in turn activates adenylyl cyclase (AC) to produce cAMP [31]. cAMP then opens cyclic nucleotide-gated cation channels (CNG) [32], allowing Ca2+ influx and depolarization of the cell. Ca2+-activated chloride channels (CLCA) then allow Cl− efflux, which further depolarizes the cell [33-38]. The chemical signal is thus converted into an electrical signal that is transmitted to the brain, where it is perceived as a smell.
Elevated intracellular Ca2+ triggers multiple molecular events, including a reduction in the affinity of the CNG channel for cAMP and inhibition of AC activity through phosphorylation by CaMKII (calcium/calmodulin-dependent protein kinase II) [24]. Prolonged exposure to odorants can stimulate particulate guanylyl cyclase (pGC) in the cilia to produce cGMP and activate cGMP-dependent protein kinase (PKG), further increasing the level and duration of intracellular cAMP, which may convert inactive protein kinase A (PKA) to its active form [39]. PKA can in turn inhibit pGC activation as a feedback mechanism.
The response can be terminated at every step of the pathway: by receptor phosphorylation by G protein-coupled receptor kinase (GRK) or PKA and 'capping' of the phosphorylated receptor by arrestin [40-42]; by inhibition of adenylyl cyclase activity by CaMKII and regulator of G protein signaling 2 (RGS2) [43,44]; by removal of Ca2+ through a Na+/Ca2+ exchanger [45]; by hydrolysis of cAMP by phosphodiesterase (PDE); and by desensitization of the CNG channel through Ca2+/calmodulin (CaM)-dependent processes [46]. However, transcripts of the termination components arrestin, GRK and PDE, and of pGC, were not detected in this study. This may indicate that C. nasus has a distinct pathway with weaker termination; if these terminators are indeed absent, C. nasus might be able to sustain detection of odorants from its natal rivers. Alternatively, these transcripts may be rare and therefore undetected in this study.
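Checking which pathway components lack a matching transcript is a simple set comparison once the annotated components are known. The sketch below encodes the cascade described above using the text's abbreviations as labels (these are not official gene symbols); the detected set in the usage is hypothetical and constructed only to mirror the components reported as missing, not taken from the study's annotation tables.

```python
# Canonical olfactory transduction components as described in the text,
# grouped by role; labels are the text's abbreviations, not gene symbols.
OLFACTORY_TRANSDUCTION = {
    "activation": ["OR", "Golf", "AC", "CNG", "CLCA"],
    "modulation": ["CaMKII", "pGC", "PKG", "PKA"],
    "termination": ["GRK", "arrestin", "RGS2", "Na+/Ca2+ exchanger", "PDE", "CaM"],
}

def missing_components(pathway, detected):
    """Return {stage: [components not in `detected`]} for stages with gaps."""
    gaps = {}
    for stage, components in pathway.items():
        absent = [c for c in components if c not in detected]
        if absent:
            gaps[stage] = absent
    return gaps

# Hypothetical input: every component detected except the four noted above.
all_components = {c for cs in OLFACTORY_TRANSDUCTION.values() for c in cs}
detected = all_components - {"GRK", "arrestin", "PDE", "pGC"}
gaps = missing_components(OLFACTORY_TRANSDUCTION, detected)
```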
Putative pheromone signaling pathway. The pheromone hypothesis was proposed based on research on Atlantic salmon (Salmo salar) and Arctic char (Salvelinus alpinus) [47]. In sea lamprey, a mixture of sulfated steroids has also been shown to function as a migratory pheromone [48]. The putative pheromone signaling pathway should therefore also be considered when studying the migratory behavior of C. nasus.
Pheromones are secreted or excreted chemicals that affect the behavior of a receiving individual and trigger a social response among members of the same species. Vomeronasal type-1 receptors (V1Rs) and vomeronasal type-2 receptors (V2Rs) have been shown to function as pheromone receptors [49,50]. Binding of a pheromone to a V1R activates the adenylate cyclase-inhibitory G protein (Gi), and phospholipase Cβ2 (PLCβ2) is activated to produce inositol-1,4,5-trisphosphate and diacylglycerol from phosphatidylinositol-4,5-bisphosphate. This activates the transient receptor potential cation channel C2 (TRPC2), allowing Na+/Ca2+ influx and depolarization. Recovery and adaptation of the response may involve binding of CaM to TRPC2. Binding of pheromones to V2Rs activates Go, a G protein involved in many signal transduction pathways [30], and TRPC2 has been shown to generate depolarizing currents in V2R-expressing neurons [30]. In this study, we identified members of the V1R and V2R families, as well as CaM, in the C. nasus transcriptomes. TRPC2 was not detected, although other transient receptor potential cation channels, including TRPM4, TRPV4, TRPC5 and TRPV1, were identified. It is possible that other members of this gene family take over the role of TRPC2 in the pheromone signaling pathway.
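To see which TRP subfamilies are represented among annotated genes (and whether TRPC2 itself is present), gene symbols can be grouped by their subfamily letter. A minimal sketch, assuming annotations have been reduced to HGNC-style symbols such as "TRPC5"; the regular expression and function name are hypothetical, and the example symbols are the ones listed in the text plus one illustrative non-TRP symbol.

```python
import re
from collections import defaultdict

# Matches symbols like TRPC5 or TRPV1: "TRP", subfamily letter, member number.
TRP_SYMBOL = re.compile(r"^TRP([A-Z])(\d+)$")

def trp_by_subfamily(symbols):
    """Group TRP channel gene symbols by subfamily (TRPC, TRPM, TRPV, ...)."""
    groups = defaultdict(set)
    for symbol in symbols:
        m = TRP_SYMBOL.match(symbol.upper())
        if m:
            groups["TRP" + m.group(1)].add(symbol.upper())
    return {subfamily: sorted(members) for subfamily, members in groups.items()}

groups = trp_by_subfamily(["TRPM4", "TRPV4", "TRPC5", "TRPV1", "CALM1"])
has_trpc2 = "TRPC2" in groups.get("TRPC", [])
```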

Conclusion
Using a high-throughput sequencing approach, we obtained high-quality de novo transcriptomes of C. nasus for the first time. Our data provide valuable information for understanding the spawning migration of C. nasus and lay the foundation for future research on the genome evolution of this species, particularly as its genome sequence is not yet available.

Ethics statement
The study was approved by the Institutional Animal Care and Use Committee of Shanghai Ocean University and performed in strict accordance with the Guidelines on the Care and Use of Animals for Scientific Purposes set by the Institutional Animal Care and Use Committee of Shanghai Ocean University.

Fish material
Three non-anadromous C. nasus males were collected from Poyang Lake in Jiujiang, Jiangxi Province, China, at the end of March 2012, before anadromous C. nasus males had reached Poyang Lake to spawn. These fish were collected with the help of fisherman Baishan Zhan under a fishing license (No. 0400051) issued by the Jiangxi Provincial Department of Agriculture. One anadromous C. nasus male was collected from the Jingjiang section of the Yangtze River in Jingjiang, Jiangsu Province, China, at the beginning of April 2012, while anadromous fish were migrating to spawning grounds along the Yangtze River. This collection was performed with the assistance of fisherman Xiping Zhou under a fishing license (No. SuChuanBu 2011 JMF254) and a special C. nasus fishing license for the Yangtze River (No. SuChuanBu 2012 ZX-M032) issued by the Jiangsu Provincial Oceanic and Fishery Bureau. All fish were collected from the wild, and captured live fish were immediately packed in medical ice bags (−20 °C) until they lost consciousness.
Each fish was dissected on ice, and the gonadal developmental stage of the testis was rapidly determined from its anatomical characters [51]. Olfactory capsules were collected only from individuals whose testes were at stage III. All operations were completed within 10 min after loss of consciousness. The olfactory capsules from the non-anadromous C. nasus were placed into 2.0 mL tubes containing RNAlater (Ambion, US), stored at 4 °C overnight and then at −20 °C for 12 hours during transport to Shanghai Ocean University, where they were transferred to −80 °C until processing. The olfactory capsules from the anadromous C. nasus were placed into 2.0 mL tubes, frozen in liquid nitrogen immediately after collection and then transported to Shanghai Ocean University for further processing. The remains of all sampled fish were stored frozen.

RNA extraction
Total RNA was isolated from the samples using TRIzol reagent (Invitrogen, USA) according to the manufacturer's instructions. RNA quality was verified on a 2100 Bioanalyzer (Agilent, USA). The RNA samples were treated with DNase I to remove contaminating DNA, and the high-quality RNA samples were used for subsequent experiments.

cDNA preparation and library construction
Starting from high-quality total RNA, poly(A)-containing mRNA was captured using oligo(dT) beads. Fragmentation buffer from an RNA fragmentation kit was then added to break the mRNA into fragments of various lengths. First-strand and then second-strand cDNA was synthesized from these mRNA fragments by reverse transcription.
The cDNA was then purified. Purified cDNA fragments were treated with End Repair Mix for end repair and adenylation of the 3′ ends. The fragments were ligated to sequencing adaptors, and the adaptor-ligated fragments were purified and enriched by PCR. The purified PCR products were used to create the cDNA library. The size distribution and concentration of the library were checked on a 2100 Bioanalyzer (Agilent, USA) and an ABI StepOnePlus Real-Time PCR System, respectively.
cDNA library sequencing
The cDNA libraries were sequenced on an Illumina HiSeq 2000. Raw reads were processed by trimming adaptor sequences and removing ambiguous nucleotides and empty reads to obtain clean data. The clean reads from the two types of C. nasus were assembled with Trinity and clustered with the TIGR Gene Indices clustering tools (TGICL) v2.1 [52,53]. Sequences containing the fewest ambiguous bases (Ns) that could not be extended on either end were defined as unigenes.
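The read-cleaning step (adaptor trimming, removal of ambiguous bases and of empty reads) can be sketched for a single read as below. This is a simplified illustration, not the pipeline actually used: it assumes an exact match to the adaptor sequence (real trimmers also handle partial and mismatched adaptors), and the 5% maximum fraction of N bases and the adaptor string in the usage are assumed values, not parameters reported in the text.

```python
def clean_read(seq, adaptor, max_n_frac=0.05, min_len=1):
    """Trim a 3' adaptor, then drop reads that are empty or too ambiguous.

    Returns the cleaned sequence, or None if the read should be discarded.
    max_n_frac (fraction of 'N' bases allowed) is an assumed threshold.
    """
    seq = seq.upper()
    cut = seq.find(adaptor.upper())      # exact adaptor match only
    if cut != -1:
        seq = seq[:cut]
    if len(seq) < min_len:
        return None                      # empty read after trimming
    if seq.count("N") / len(seq) > max_n_frac:
        return None                      # too many ambiguous bases
    return seq

# Usage with an assumed Illumina adaptor prefix.
ADAPTOR = "AGATCGGAAGAGC"
cleaned = clean_read("ACGTACGTAGATCGGAAGAGC", ADAPTOR)
```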
Functional annotation with Gene Ontology terms (molecular function, cellular component and biological process) was performed with Blast2GO v2.5.0 based on the NR annotation information [54]. WEGO was then used to obtain GO functional classification statistics for all unigenes, in order to characterize the distribution of gene functions in this species [55].
The Kyoto Encyclopedia of Genes and Genomes (KEGG) database provides a systematic analysis of metabolic pathways and functions of gene products. In this study, the C. nasus unigenes were assigned to canonical pathways described in KEGG using BLASTX.

SSRs and SNPs analysis
Simple sequence repeats (SSRs) in the C. nasus unigenes were detected using the microsatellite identification tool (MISA) (http://pgrc.ipk-gatersleben.de/misa/). SSRs were defined as perfect repeats of motifs of one to six base pairs, with a minimum repeat number of 12 for mono-, 6 for di-, 5 for tri-, 5 for tetra-, 4 for penta- and 4 for hexanucleotide microsatellites. SOAPsnp (http://soap.genomics.org.cn/soapsnp.html) was used to detect single nucleotide polymorphisms (SNPs) in the C. nasus unigenes.
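The SSR criteria above can be illustrated with regular-expression backreferences: a motif of length k followed by at least (minimum repeats − 1) further copies of itself. This is a simplified sketch of perfect-SSR detection under the stated thresholds, not MISA itself (MISA is a Perl script that also reports compound SSRs). Motifs that are repeats of a shorter unit (for example "AA" or "ATAT") are skipped so that each repeat is reported at its shortest period; the function name is hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {1: 12, 2: 6, 3: 5, 4: 5, 5: 4, 6: 4}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least min_rep - 1 more copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs made of a shorter repeating unit (e.g. "AA", "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Example: a 7-unit AC dinucleotide repeat and a 12-base poly(A) run.
example = "GG" + "AC" * 7 + "TTT" + "A" * 12 + "G"
ssrs = find_ssrs(example)
```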

Data deposition
The raw Illumina sequencing data from the olfactory epithelium of C. nasus were deposited in the NCBI Sequence Read Archive (SRA) Sequence Database (accession number SRP035517).