RNA-Seq Analysis Using De Novo Transcriptome Assembly as a Reference for the Salmon Louse Caligus rogercresseyi

Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for different developmental stages, sexes, and strains of sea lice are still limited or unavailable. In this study, RNA-seq analysis was performed using a de novo transcriptome assembly as a reference to reveal transcriptional changes across six developmental stages of the salmon louse Caligus rogercresseyi. EST datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using MiSeq Illumina sequencing. A total of 151,788,682 reads were obtained, which were assembled into 83,444 high-quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Herein, four clusters were differentially expressed between nauplius I–II and copepodid stages (604 transcripts), five clusters between copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that genes from the nauplius I–II, copepodid and chalimus stages are mainly annotated to amino acid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were highly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provide the most comprehensive transcriptome resource available for C. rogercresseyi, which can be used for future genomic studies linked to host-parasite interactions.


Introduction
Major challenges facing transcriptomic research in non-model organisms are increasing the speed and accuracy of discovering new genes and metabolic pathways, as well as determining how gene transcription variations are regulated by specific DNA polymorphisms. Understanding the transcriptome is essential for interpreting the functional elements of the genome, for revealing the molecular constituents of cells and tissues, and for understanding complex biological processes such as growth, reproduction, and immune response [1]. Next-generation sequencing (NGS) technologies offer the opportunity to generate genome-wide sequence data sets at a reasonable cost and within a reasonable time [2][3][4]. Although these powerful and rapidly evolving technologies have only been available for a few years, they are already making substantial contributions to the understanding of genome expression and regulation under different conditions. A popular application of NGS is species transcriptome generation, which affords direct access to the coding sequences of many genes and information on their relative expression levels [5][6][7]. NGS transcriptome data are therefore a useful source for mining molecular markers such as SNPs and EST-SSRs [8][9][10][11].
Salmon lice are naturally occurring parasites of salmon in seawater, and, compared to natural conditions, parasite infection and transmission are exacerbated under intensive fish farming. The salmon louse Caligus rogercresseyi is the main copepod ectoparasite responsible for significant economic losses in the farmed salmon industry in Chile [12]. This parasite is known to cause surface damage to fish, which results in mucus breakdown and in turn leads to open sores and lesions. A further problem may arise if fish become stressed due to the presence of sea lice [13,14]. It has been observed that chronic stress in fish may result in immunosuppression and a subsequent increased susceptibility to secondary infections [15]. Salmon lice infestations have been managed with antiparasitic agents including organophosphates [16,17], pyrethroids [18], hydrogen peroxide [19], and avermectins [20,21]. However, overexposure to these chemical agents tends to promote drug resistance in wild populations of parasites [22].
These concerns, added to the scarce genomic knowledge of the molecular pathways affected by salmon lice treatments in C. rogercresseyi, provide incentive for the scientific community to increase sequencing efforts in order to identify novel candidate genes that could be related to drug resistance and susceptibility. So far, EST datasets have been reported for a few copepod ectoparasites [23]. These datasets contain a substantial number of sequences that are similar to already reported genes, but a large proportion show no hits to known proteins. Whole transcriptome shotgun sequencing, or RNA sequencing (RNA-seq), allows for expression analysis in organisms without previously sequenced genomes, such as marine invertebrates, where the majority of species do not have reference genomes available.
In this study, RNA-seq analysis was performed using de novo transcriptome assembly as a reference for the salmon louse C. rogercresseyi. The goal of this study was to produce whole transcriptome sequences, which would provide pivotal genomic knowledge on the processes involved in the life cycle of the salmon louse. In total, 83,444 transcripts were identified in association with all major signaling pathways and developmental processes of C. rogercresseyi. RNA-seq cluster analysis using the MiSeq Illumina platform between larval stages (nauplius I, nauplius II, copepodid and chalimus) and adult individuals (female and male) evidenced a wide diversity of candidate genes related to ontogenetic development, immune response, stress, drug resistance, the nervous system, and reproduction.

Salmon lice culturing
Female specimens of C. rogercresseyi were collected from recently harvested fish at a salmon farm in Puerto Montt, in the south of Chile. Individuals were transported back to the laboratory on ice, and their egg strings were then removed and placed in culture buckets supplied with seawater flow at 12 °C and with gentle aeration. Eggs were allowed to hatch and develop until the infectious copepodid stage. These were then used to inoculate a tank containing host fish according to Bravo [24]. Prior to the collection of salmon lice, fish were anaesthetized. Salmon lice were then harvested for RNA extraction and cDNA library construction. All laboratory infections and culture procedures were carried out under guidelines approved by the ethics committee of the University of Concepción and with appropriate veterinary supervision.

Illumina sequencing
The life cycle of C. rogercresseyi comprises eight developmental stages: nauplius 1-2, copepodid, chalimus 1-4 and adult [25]. Herein, twenty individuals from each instar of C. rogercresseyi were collected separately. In the case of the chalimus stage, samples from instars 3-4 were collected. Immediately after sampling, the lice from each stage were pooled into two biological replicates in 1 mL of RNAlater stabilization solution (Ambion®, USA) and stored at −80 °C. Total RNA was extracted from pools using the RiboPure™ kit (Ambion®, Life Technologies™, USA) following the manufacturer's instructions. The concentration and purity were measured with a spectrophotometer (ND-1000, Nanodrop Technologies), and the integrity was visualized by electrophoresis in 1.2% MOPS/formaldehyde agarose gels stained with 0.001% ethidium bromide. RNA was also checked for quality on the Bioanalyzer TapeStation 2200 (Agilent Technologies Inc., USA) using the R6K reagent kit according to the manufacturer's instructions. RNA extracts that presented 260/280 and 260/230 purity indices equal to or greater than 2.0 and intact RNA in electrophoresis and Bioanalyzer measurements (RIN > 8) were selected. Subsequently, mRNA pools were precipitated overnight with 2× volume of absolute ethanol and 0.1× volume of 0.3 M sodium acetate at −80 °C for cDNA library construction. Following this, double-stranded cDNA libraries were constructed using the TruSeq RNA Sample Preparation kit v2 (Illumina®, USA). Two biological replicates for each developmental stage were

Data deposition
The cleaned short read sequences were deposited in the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra) under the accession number SRR1106551. The de novo assembly sequence data are available from the corresponding author on request.

De novo transcriptome assembly
The raw data for each pool of samples were separately trimmed and de novo assembled into a single file using the CLC Genomics Workbench software (Version 6.0.1, CLC Bio, Denmark). The overlap settings for this assembly were a mismatch cost of 2, an insert cost of 3, a minimum contig length of 200 base pairs (bp), a similarity of 0.8, and a trimming quality score of 0.05. This assembly yielded 83,444 contigs that were annotated according to Gene Ontology terms with the Blast2GO program [26], which was executed as a plugin of CLC by mapping against the UniProtKB/Swiss-Prot database (http://uniprot.org) with a cutoff E-value of 1E-05. Furthermore, to determine putative gene descriptions, homology searches were carried out by querying the NCBI EST database using the tBLASTx algorithm. Finally, the assembled sequences were compared to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [27]. KEGG pathways were assigned to the assembled sequences using the KEGG Automatic Annotation Server (KAAS). The bidirectional best hit (BBH) method was used to obtain KEGG Orthology assignments for each developmental stage of the salmon louse.
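As a rough illustration of the E-value filtering used during annotation, the sketch below keeps, for each contig, the best database hit at or below the 1E-05 cutoff. The tuple layout `(contig, subject, evalue)`, the function name `best_hits`, and the reading of the cutoff as inclusive are assumptions made for this example; they do not describe the internals of Blast2GO or CLC.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical hit layout: (contig id, subject id, E-value).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep, per contig, the hit with the lowest E-value not exceeding the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        contig, _subject, evalue = hit
        if evalue > max_evalue:
            continue  # hit is weaker than the cutoff, so it is discarded
        if contig not in best or evalue < best[contig][2]:
            best[contig] = hit
    return best
```

Contigs whose hits all exceed the cutoff are simply absent from the returned mapping, which mirrors the idea of an unannotated contig.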

Differential gene expression analysis and clustering
The consensus contigs generated by de novo assembly in the previous step were used as a reference for RNA-seq expression analysis. Using the CLC Genomics Workbench software, the reads for each biological replicate were separately mapped against the 83,444 contigs. The RNA-seq settings were a minimum length fraction of 0.6 and a minimum similarity fraction (long reads) of 0.5. Then the number of reads per kilobase per million mapped reads (RPKM) was obtained with the same software [28]. This normalized the number of reads to the size of assembled contigs and allowed for assessing the transcripts that were overexpressed among different groups. In order to identify differences between developmental stages, RNA-seq analyses were performed for nauplius I, nauplius II, copepodid and chalimus, and female and male adult stages. Following this, the transcripts that were differentially expressed in comparison to normalized expression values were visualized in a clustering heat map and selected according to the identified cluster.
Figure 4. Clusters of gene expression levels between nauplius I-II and copepodid stages of C. rogercresseyi. Dendrograms of the transcription patterns were estimated for 83,444 contigs generated by de novo assembly. The bar color reflects the gene expression level from black (low), red (medium) to yellow (high). Contig annotations of these 4 clusters are listed in Table S2. doi:10.1371/journal.pone.0092239.g004
Figure 5. Clusters of gene expression levels between copepodid and chalimus stages of C. rogercresseyi. Dendrograms of the transcription patterns were estimated for 83,444 contigs generated by de novo assembly. The bar color reflects the gene expression level from black (low), red (medium) to yellow (high). Contig annotations of these 5 clusters are listed in Table S3. doi:10.1371/journal.pone.0092239.g005
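The RPKM normalization mentioned above can be sketched as follows. This is the standard textbook definition (reads mapped to a contig, scaled by contig length in kilobases and by library size in millions of mapped reads); how CLC Genomics Workbench handles details such as multi-mapped reads is not covered here, and the function and parameter names are illustrative only.

```python
def rpkm(mapped_reads: int, contig_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of contig per million mapped reads.

    RPKM = reads * 10^9 / (contig length in bp * total mapped reads)
    """
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    return mapped_reads * 1e9 / (contig_length_bp * total_mapped_reads)
```

For instance, 500 reads on a 1,000 bp contig in a library of 10 million mapped reads gives `rpkm(500, 1000, 10_000_000)`, i.e. 50 RPKM; the same read count on a 2,000 bp contig would give half that value.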
For an optimal comparison of the results, k-means clustering was performed to identify candidate genes involved in specific gene expression patterns. The distance metric was calculated with the Manhattan method, where the mean expression level in 5-6 rounds of k-means clustering was subtracted. Finally, a Volcano plot and Kal's statistical analysis test were used to compare gene expression levels for larval stages and adults in terms of the log2 fold change (P < 0.0005, FDR corrected).
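A minimal sketch of k-means clustering with a Manhattan distance is given below to make the procedure concrete. Several points are assumptions rather than details taken from the study: each expression profile is centered by subtracting its own mean, the first k profiles are used as initial centroids, centroids are updated as plain means, and the loop runs for a fixed number of rounds. The function names are hypothetical and this is not the CLC implementation.

```python
from typing import List, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two expression profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def center(profile: Sequence[float]) -> List[float]:
    """Subtract the profile's mean expression level from every value."""
    mean = sum(profile) / len(profile)
    return [x - mean for x in profile]


def kmeans_manhattan(profiles: Sequence[Sequence[float]], k: int, rounds: int = 6) -> List[int]:
    """Return a cluster label for each profile (assumed, simplified procedure)."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    data = [center(p) for p in profiles]
    centroids = [list(p) for p in data[:k]]  # naive deterministic initialization
    labels = [0] * len(data)
    for _ in range(rounds):
        # Assignment step: nearest centroid by Manhattan distance.
        labels = [min(range(k), key=lambda c: manhattan(p, centroids[c])) for p in data]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Because profiles are centered first, two contigs with the same shape of expression across stages fall into the same cluster even when their absolute RPKM levels differ.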

Validation by qRT-PCR
Nine genes were chosen for the confirmation of differentially expressed genes by qRT-PCR in the six studied developmental stages. Herein, specific primers were designed from acetoacetyl-CoA synthetase, flotillin, allatostatin precursor protein, tropomyosin, putative cuticle protein, vitellogenin 1, vitellogenin 2, argonaute 1 isoform C and vasa gene (Table S1). The qPCR runs were performed with StepOnePlus™ (Applied Biosystems, Life Technologies, USA) using the comparative ΔCt method. Each reaction was conducted in a volume of 10 µL using the Maxima® SYBR Green/ROX qPCR Master Mix (Thermo Scientific, USA). The amplification conditions were as follows: 95 °C for 10 min, followed by 40 cycles of 95 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s. Three putative housekeeping genes (HKG), elongation factor 1-alpha, β-actin and β-tubulin, were statistically analyzed with the NormFinder algorithm to assess their transcriptional expression stability. Here, β-tubulin was selected as the HKG for gene normalization.
Figure 6. Clusters of gene expression levels between adult females and males of C. rogercresseyi. Dendrograms of the transcription patterns were estimated for 83,444 contigs generated by de novo assembly. The bar color reflects the gene expression level from black (low), red (medium) to yellow (high). Contig annotations of these 6 clusters are listed in Table S4. doi:10.1371/journal.pone.0092239.g006
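The comparative Ct normalization against a housekeeping gene can be illustrated with the short sketch below. It assumes perfect amplification efficiency (a doubling per cycle), which is a common simplification and not something stated in the text; the function name and the example Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of a target gene relative to a reference gene.

    delta_ct = Ct(target) - Ct(reference); relative level = 2 ** (-delta_ct),
    assuming the amount of product doubles in every cycle.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)
```

With a target gene at Ct 24 and the reference gene at Ct 22, `relative_expression(24.0, 22.0)` evaluates to 0.25, meaning the target is estimated at one quarter of the reference level.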

Sequencing analysis and assembly from C. rogercresseyi transcriptome
Six types of cDNA samples, which represented different developmental stages and adult tissues of C. rogercresseyi, were prepared and sequenced using the MiSeq Illumina platform. The sequencing runs yielded a total of 154.84 M reads with an average length of 171 bp. The CLC Genomics Workbench software was used with default parameters to screen for adapter sequences and eliminate poor quality reads. After quality trimming and removal of adapter sequences, 151.78 M reads, representing 97% of the raw reads, remained in the dataset. Of these, 132.5 M reads (88%) wholly or partially assembled into contigs, and 19.26 M reads remained singletons. The remaining reads were excluded from further analyses. The high-throughput sequencing performed for each developmental stage showed similar numbers of reads and similar average lengths. Interestingly, the number of singletons did not show major differences between larval and adult stages. The number of nucleotides generated from the C. rogercresseyi transcriptome using Illumina technology was up to 25.9 Gigabases (Table 1). De novo assembly yielded 83,444 contigs with an average length of 819 bp, of which 58,320 contigs had a length between 300 and 2,000 bp and 25,124 contigs were longer than 2,000 bp. The average coverage among the contigs was 351.1 reads/bp, suggesting that every base pair in the salmon louse transcriptome was sequenced more than 300 times on average. The number of contigs yielded by the de novo assembly performed for each developmental stage ranged from 29,887 in nauplius I to 50,174 in copepodids, with an average length of 823 bp (Table 1). The sequencing results showed low variation between the biological replicates for each stage. For instance, the average coverage did not show significant differences among replicates (data not shown).
Gene Ontology analysis was carried out to explore and summarize the functional categories of the genes sequenced in this study. Among the 83,444 assembled contigs, 15,314 were assigned to biological processes (27%), molecular functions (34%), and cellular components (39%). Within each of these three main categories, genes annotated for translation, the nuclear-transcribed mRNA process, viral transcription, egg hatching, larval development, the response to drugs, protein binding, metal ion binding, and cytoplasm were the most abundant (Fig. 1). Important cell processes related to early development were also represented, such as genes involved in cell motion, cell proliferation, cuticle formation, myogenesis, and locomotion. Final functional classification and pathway assignment were performed using bi-directional BLAST with an E-value of 1e-3 against the KEGG database. Of these sequences, 16,213 had significant matches in the database. Among the matched sequences, metabolic pathways, such as carbon metabolism, the biosynthesis of amino acids, oxidative phosphorylation, glycolysis, the citrate cycle, and lipid metabolism, were well represented in C. rogercresseyi sequences. Given the important roles of lipids in the copepod life cycle, especially during ecdysis, greater attention was placed on lipid metabolism. Genes were found in several pathways involved in fatty acid biosynthesis, such as fatty acid elongation, steroid biosynthesis, and ether lipid metabolism. Furthermore, genes related to nervous system development were highly annotated to signaling pathways such as the neuroactive ligand-receptor interaction, the GABAergic and glutamatergic synapse, axon guidance, and the cholinergic synapse. Interestingly, immune response genes were found associated with the NF-kappa B signaling pathway, the TNF signaling pathway, and the Toll-like receptor pathway, among others.
Differentially expressed genes among developmental stages of C. rogercresseyi
In addition to obtaining gene annotations for the salmon louse, another major aim of the present transcriptomic study was to analyze the overall gene expression profile in order to identify genes participating in pivotal biological processes and molecular functions related to the developmental stages, especially for larval stages and adult individuals. After de novo assembly, the contigs that showed matching reads for all samples were sorted to generate a gene reference dataset. Then, gene expression data were normalized for six RNA-seq experiments so as to separately compare the expression levels between larval stages, and between female and male adult individuals. This approach was applied because the most critical physiological changes in the ontogeny of parasitic copepods occur during the free-swimming (nauplius, copepodid), larval settlement (chalimus), and mature female and male adult phases [29][30][31][32].
Cluster analysis was conducted for 83,444 genes and showed differential transcription expression values among the analyzed developmental stages. The overall expression profiles are displayed in Figure 2. Clustering of the profiles from the six stages showed an increasing expression ratio (log2) from the nauplius I to adult stages at about a 5-fold change (Fig. 2A). However, high-resolution analysis of transcription patterns among the salmon lice stages revealed specific upregulated or downregulated gene clusters from the nauplius to adult stages. Herein, transcription activity was found associated with gene clusters showing upregulation from the nauplius stage to the last developmental stages, as well as downregulation from early larval stages to male adults (Fig. 2B and C, respectively). Furthermore, k-means clustering with distances estimated by the Manhattan method was used to identify clusters of candidate genes involved in specific gene expression patterns (Fig. 3). Through this, four clusters were found to be differentially expressed between nauplius I-II and copepodid, where 604 transcripts (Fig. 4) were mainly overregulated in the copepodid stage (Cluster 4) and nauplius I-II stages (Cluster 1) (Table 2). It is important to note that no significant expression differences between nauplius I and nauplius II were observed. Therefore, for further analysis the two larval instars were considered together as the nauplius I-II stage. In addition, five clusters were found to be differentially expressed between the copepodid and chalimus stages, where 2,426 transcripts (Fig. 5) were mainly associated with the chalimus stage (Clusters 3, 4, and 5) as compared to copepodids (Clusters 1 and 2). The greatest differences in transcription expression were found in 271 putative genes that comprised Cluster 4 (Table 2). In addition, genes from female and male adults that showed differential transcription were grouped into six clusters containing 2,478 transcripts (Fig. 6).
Interestingly, half of the clusters that showed differential transcription activity between adults were overregulated in females (Clusters 2, 3, and 5) and the other half in males (Clusters 1, 4, and 6). Two clusters (3 and 5) linked to female gene expression displayed the highest RPKM values of the analyzed clusters (Table 2).
With regard to gene annotation, relevant genes were identified through transcriptome cluster analysis between the nauplius I-II and copepodid stages. For instance, genes related to mitochondrial metabolism and to the molting cycle were mainly associated with clusters 1, 2 and 3. In contrast, cluster 4 showed a wide diversity of proteins, including an important number of hypothetical proteins annotated for Lepeophtheirus salmonis and Daphnia pulex. Moreover, for copepodid and chalimus stages of C. rogercresseyi, clusters 1, 3, and 6 were comprised of genes related to nervous system development, such as the neuronal acetylcholine receptor subunit alpha-3, cerebellin-3, high-affinity choline transporter, and GABA-alpha subunit. Clusters 2 and 4 were mainly annotated to genes associated with cuticle and contractile elements, such as the cuticle protein, ferritin, myosin and actin, and with some genes related to the immune response, such as akirin, agglutinin isolectin, E-selectin, peroxinectin, and cathepsin. In addition, clustering analysis between female and male adults revealed a major diversity of identified genes. Some genes were linked to the morphogenesis process and cellular proliferation, such as the cuticle protein 6, gamma-crystallin A, hemicentin protein, calreticulin, and vasa gene (Clusters 1-3). Genes involved in the processes of gametogenesis and reproduction, such as the proliferation-associated protein, insulin-like growth factor-binding protein, nuclear sperm protein, vitellogenin, and estradiol 17-beta-dehydrogenase, were also annotated (Clusters 4-6). A detailed list of relevant, identified genes is shown in the supplementary material (Table S3-S5 in File S2). In order to identify the genes highly expressed between copepodid and chalimus stages, and between female and male adults, statistical analyses visualized on a Volcano plot were performed to evaluate fold change values (Fig. S1 in File S1).
From this, genes associated with Gene Ontology terms such as amino acid transfer, repair and breakdown, metabolism, and nervous system development were identified for the nauplius I-II, copepodid and chalimus stages. An upregulation specific to the copepodid stage was observed for the genes metalloproteinase, arginine kinase, E-selectin, L-selectin, tropomyosin, cuticle proteins, flotillin, allatostatin, and opsin, among others. For the chalimus stage, the genes trypsin, alpha amylase, carboxypeptidase, bleomycin, gamma-crystallin A, nanos homolog, and vitellogenin were overregulated (Tables 3 and 4). With regard to female and male adults, most candidate genes were related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and the transcription and translation process. For female adults, hemicentin, TSP1-containing protein, vitellogenin, homeobox, vasa, argonaute, and several transcription factors were upregulated. For salmon louse males, actin, troponin, myosin, cuticle protein, brain-specific angiogenesis inhibitor and sperm proteins such as nuclear autoantigenic sperm protein, motile sperm domain-containing protein 1 and peroxisomal N1-acetyl-spermine/spermidine oxidase were mainly overregulated (Table 5). Finally, statistical analysis showed 63, 166 and 114 hypothetical proteins and unannotated contigs up/down-regulated from nauplius I-II/copepodid, copepodid/chalimus and female/male adults, respectively (Fig. S2 in File S1). A detailed list of hypothetical proteins and unannotated contigs is shown in the supplementary material (Table S6-S8 in File S3).
Overall, the gene expression patterns revealed through the developmental stages of C. rogercresseyi suggest smaller changes in transcription activity between the nauplius I, nauplius II and copepodid stages. In contrast, greater gene expression differences were found during the infective stage of chalimus and in adults of the salmon louse. The principal component analysis showed correlation values that grouped the nauplius I, nauplius II and copepodid stages separately from the chalimus and adult instars (Fig. S3 in File S1). Moreover, to confirm the usefulness of the C. rogercresseyi cDNA database established by the Illumina paired-end sequencing method, we investigated by qRT-PCR the expression of 9 genes selected from catalytic activity, nervous system development, molting, contractile elements, reproduction and cellular process (Fig. S4-S8 in File S1). The correlation between expression levels quantified by qPCR and the in silico analysis confirmed the robustness of the Illumina sequencing results (Fig. S9 in File S1). A radar plot of contigs with significant expression values (P ≤ 10^-16; |fold-change| > 5), expressed as percentages for nauplius I-II/copepodid, copepodid/chalimus and female/male from C. rogercresseyi, was analyzed in order to show the proportions of up/down-regulated genes that are associated with key biological processes and molecular functions (Fig. 7). Interestingly, the analysis revealed that genes from the nauplius I-II, copepodid and chalimus stages are mainly annotated to amino acid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were highly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes.
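The selection of contigs for the radar plot can be pictured as a simple threshold filter. The sketch below applies the thresholds quoted for that plot, read here as P ≤ 10^-16 and |fold-change| > 5; the record layout, the contig identifiers in any usage, and the function name are hypothetical and serve only to illustrate the filtering logic.

```python
from typing import Iterable, List, NamedTuple


class ExpressionResult(NamedTuple):
    """Hypothetical per-contig summary of one pairwise comparison."""
    contig: str
    fold_change: float  # signed: positive = up-regulated, negative = down-regulated
    p_value: float


def significant(results: Iterable[ExpressionResult],
                max_p: float = 1e-16,
                min_abs_fold_change: float = 5.0) -> List[ExpressionResult]:
    """Keep contigs with p_value <= max_p and |fold_change| > min_abs_fold_change."""
    return [r for r in results
            if r.p_value <= max_p and abs(r.fold_change) > min_abs_fold_change]
```

Using the absolute fold change keeps both strongly up-regulated and strongly down-regulated contigs, which is what allows the proportions of each direction to be compared afterwards.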

Discussion
Sea lice are the most prevalent ectoparasites found in the farmed salmon industry worldwide, and two species, Lepeophtheirus salmonis (Krøyer, 1838) and Caligus rogercresseyi (Boxshall and Bravo, 2000), are responsible for major economic losses in countries such as Norway, Scotland, Canada, and Chile [33]. According to Hamre et al. [34], the complete life cycle is now known for 17 species of Caligidae, as represented by just three genera, Caligus (12 species), Lepeophtheirus (four species), and Pseudocaligus (one species). However, the number of developmental stages appears to vary among species, with the free-living phase being comprised of two nauplii stages and the infective copepodid stage, while there are several chalimus and adult stages [35]. In Caligus species, four chalimus instars have been found, the last of which molts into the definitive adult [24,25,36,37]. In contrast, the life cycle of Lepeophtheirus species has been reported to have four chalimus and two pre-adult stages that allow the louse to detach from a temporary frontal filament shortly after molting and move over the surface of the skin [38]. However, recent findings from observations of molting chalimus larvae and from morphometric cluster analysis of L. salmonis reported only two chalimus stages, and, consequently, a life cycle comprised of six post-nauplius instars [34].
Understanding salmon louse biology is critical for establishing strategies that allow for the control and management of this ectoparasite. However, evidence supporting morphological and physiological changes in correlation with transcriptome profiles during the life cycle of salmon lice is still limited. For instance, EST collections for the developmental stages of L. salmonis have only been reported in female and male adults [31,32,39], which so far represent the most comprehensive and publicly available transcriptome for L. salmonis at 129,250 transcripts [23]. In this context, the present study provides 84,023 high-quality contigs and, subsequently, 29,000 significantly annotated proteins from different developmental stages of the salmon louse C. rogercresseyi.
This sequencing effort represents the most comprehensive transcriptome resource available for this caligid species.
Salmon lice included in the present RNA-seq study were compared between the nauplius I-II and copepodid stages, between the copepodid and chalimus stages, and between female and male adults. This approach was applied in order to identify relevant transcriptome profiles across larval instars and with sexual differentiation. In fact, it could be hypothesized that these developmental phases are representative of the major physiological changes during the life cycle of copepods. For instance, a study related to peptidergic signaling in C. finmarchicus reported that the highest expression levels from six stages (embryo, early nauplius, late nauplius, early copepodid, late copepodid, and adult) are seen in the naupliar and copepodid stages, while the lowest levels are present in embryos and adult females [29]. Specifically in the copepodid stage, host-seeking behavior has been displayed by L. salmonis during its infectious stage, including moving towards river mouths and maintaining location in haloclines during salmon migrations [40]. Consequently, the copepod must be able to cope with abiotic stress conditions [32] and respond to the inflammatory defense mechanisms at the site of salmon parasite attachment [41]. Furthermore, relatively little is known concerning sex differentiation and its endocrine control in crustaceans, and most available data have been obtained in decapods [42]. However, transcriptome sequencing studies have facilitated the discovery of novel sex-related genes, which thus far have suggested pivotal transcriptional differences between female and male adults [31,43].
The present transcriptome analysis of C. rogercresseyi revealed 3,030 transcripts that comprised nine clusters, which were differentially expressed between the nauplius I-II, copepodid and chalimus stages. Interestingly, some upregulated genes were mainly associated with metalloproteinase, arginine kinase, and cuticle protein, which participate in the digestion of ingested proteins, tissue development, cuticle remodeling, and in specific cleavage events to activate or inactivate proenzymes and bioactive peptides [44]. With respect to nervous system development in copepodids, some relevant genes such as nicotinic acetylcholine receptor, flotillin, synaptotagmin, allatostatin, frequenin, and opsin were highly overexpressed. These results are congruent with previous studies of transcriptome profiles reported for copepodid stages [29,45,46]. Furthermore, a study by Wilson and Hartline [47] demonstrated high peripheral and central nervous system development in individuals transitioning from the nauplius to copepodid stage. The upregulation of allatostatin could be associated with the regulation of juvenile hormone production, or, more interestingly, with recent findings where the activation of neurons, or neuroendocrine cells, that expressed the neuropeptide allatostatin modulated feeding behavior in Drosophila, including increased food intake and enhanced behavioral responsiveness to nutrients or molecular cues [48]. It is important to note that these effects on feeding behavior could be related to changes induced by the start of the parasitic phase in the salmon louse. Furthermore, investigations of opsin function outside of vertebrate systems have long been focused on arthropod visual pigments [49], indicating that copepods possess a sensory apparatus sensitive to different wavelengths that could have implications during the host-finding process, especially in the copepodid stage [50].
For the chalimus stage, trypsin, alpha amylase, and carboxypeptidase genes were overregulated. Peptidases from the different families may be involved in a wide range of cellular and biological processes, thus making it difficult to infer specific functions across salmon louse development. Host blood has been reported as a major food component for the salmon louse L. salmonis [51]. Blood degradation in several hematophagous organisms has been shown to require the catalysis of several peptidases [52,53]. Of the regulated peptidases in the present study, the most overregulated was trypsin, a secretory endopeptidase within the serine protease superfamily. This superfamily includes important digestive enzymes that constitute a major part of digestive fluids and act as activators of other digestive enzymes. These results are congruent with previous reports on the interaction between parasitic copepods and salmon hosts [54,55].
In addition, genes showing differential transcription between female and male adults were grouped into six clusters comprising 2,478 transcripts. For female adults, sex-related genes such as vitellogenins and estradiol 17-beta-dehydrogenase were identified. However, the effects of vertebrate-like steroid hormones on reproductive processes in crustaceans, such as oocyte maturation, remain unresolved. Vitellogenins are the major yolk proteins in most invertebrates, and several different vitellogenins typically give rise to vitelline granules in mature eggs [56]. The role of multiple vitellogenin genes in some organisms, such as insects, remains unknown [57], even though proteins with domain structures similar to vitellogenins are also involved in other processes, such as the regulation of osmolarity, immunity, and clotting [58]. The present data showed a wide diversity of vitellogenins, including LsVit1 and LsVit2 as reported in L. salmonis [59], and several vitellogenin-like proteins. Moreover, genes associated with cell development, including homologues of vasa, homeobox, argonaute, cell division protein kinase, and centromere-associated protein, were also specifically expressed in female adults. In salmon louse males, transcripts related to the cytoskeleton and the molting cycle, as well as to sperm proteins, were predominantly upregulated. Based on the data of the present study, it is therefore likely that actin, troponin, myosin, and cuticle proteins play an important part in cuticle formation during the final molt of C. rogercresseyi. The sex-related genes reported in the present study represent novel molecular information regarding salmon louse reproduction.
The initial analysis of the C. rogercresseyi transcriptome revealed that approximately 71.4% of the contigs had no significant hits in GenBank using the nr-database. Even after re-annotation of the contigs, only 13% showed novel homology to L. salmonis proteins. Similarly high proportions of novel genes have been reported in non-model crustacean species [20,43]. Furthermore, a total of 230 hypothetical proteins showed significant gene expression differences among the developmental stages, demonstrating the potential for discovery of unknown genes and novel biological processes involved in the life cycle of salmon lice.
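
The stage comparisons discussed above rest on fold changes and P values, which Figure S1 summarizes as volcano plots. The following is a minimal sketch of how the two plot coordinates can be derived for a single transcript and how a P-value cutoff can be applied. The function names, the example expression values, and the cutoff of 1e-5 are illustrative assumptions and are not taken from the analysis pipeline used in this study.

```python
import math


def volcano_point(expr_a, expr_b, p_value):
    """Return (log2 fold change, -log10 P) for one transcript.

    expr_a and expr_b are normalized expression values in the two
    groups being compared (hypothetical inputs for illustration).
    """
    log2_fc = math.log2(expr_b / expr_a)   # x axis of a volcano plot
    neg_log10_p = -math.log10(p_value)     # y axis of a volcano plot
    return log2_fc, neg_log10_p


def is_selected(p_value, cutoff=1e-5):
    """Flag a transcript whose P value is at or below the assumed cutoff."""
    return p_value <= cutoff


# Hypothetical transcript: fourfold higher in group B, P = 1e-6
x, y = volcano_point(expr_a=10.0, expr_b=40.0, p_value=1e-6)
selected = is_selected(1e-6)
```

Positive values on the x axis correspond to higher expression in the second group, and larger values on the y axis correspond to smaller P values.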

Conclusions
The present study represents a step forward in identifying conserved genes that are likely to be involved in various important biological activities. Using de novo assembly, 83,444 high quality contigs and roughly 24,000 genes, annotated on the basis of known proteins, were identified from the C. rogercresseyi transcriptome. Future studies will focus on validating the discovered gene profiles in order to avoid misinterpretation of the functional genomics information. The present data provide the most comprehensive transcriptome resource available for C. rogercresseyi, which should be used for future genomic studies linked to host-parasite interactions.

Supporting Information
Table S1 Primer list for qPCR validated in C. rogercresseyi genes. (DOCX)

File S1 Figure S1. Volcano plot displaying the −log10 of the P values from Kal's statistical test against the log2 fold change for nauplius I-II/copepodid, copepodid/chalimus and female/male of C. rogercresseyi. The selected genes have significantly different expression values (P ≤ 10^−16 to P ≤ 10^−5). Dots, triangles and squares represent individual ESTs from larvae stages and adult salmon lice, respectively. Annotated and unannotated sequences, according to BLAST analysis, are denoted by filled and empty symbols, respectively. Figure S2. Number of annotated and unannotated contigs showing up/down regulation for nauplius I-II/copepodid, copepodid/chalimus and female/male of C. rogercresseyi. Figure S3. Principal component analysis of six Caligus rogercresseyi developmental stages: nauplius I, nauplius II, copepodid, chalimus, and female and male adults. Figure S4. Relative expression levels of the acetoacetyl-CoA synthetase gene in six developmental stages of Caligus rogercresseyi. Each bar represents the mean of expression levels (± SD). Figure S5. Relative expression levels of flotillin and allatostatin precursor protein in six developmental stages of Caligus rogercresseyi. Each bar represents the mean of expression levels (± SD). Figure S6. Relative expression levels of tropomyosin and putative cuticle protein in six developmental stages of Caligus rogercresseyi. Each bar represents the mean of expression levels (± SD). Figure S7. Relative expression levels of the vitellogenin 1 and 2 genes in six developmental stages of Caligus rogercresseyi. Each bar represents the mean of expression levels (± SD). Figure S8. Relative expression levels of the argonaute 1 isoform C and Vasa genes in six developmental stages of Caligus rogercresseyi. Each bar represents the mean of expression levels (± SD). Figure S9. Correlation analysis between transformed expression values obtained by qPCR and in silico analysis for six developmental stages of Caligus rogercresseyi. (DOCX)

File S2 Table S3. Relevant annotated genes identified by clustering analysis between Nauplius I-II and Copepodid stages of C. rogercresseyi transcriptome. Table S4. Relevant annotated genes identified by clustering analysis between Copepodid and Chalimus stages of C. rogercresseyi transcriptome. Table S5. Relevant annotated genes identified by clustering analysis between Female and Male stages of C. rogercresseyi transcriptome. (DOCX)

File S3 Table S6. Hypothetical proteins and unannotated contigs up/down-regulated in Copepodid/Nauplius I-II groups. Table S7. Hypothetical proteins and unannotated contigs up/down-regulated in Chalimus/Copepodid groups. Table S8. Hypothetical proteins and unannotated contigs up/down-regulated in Male/Female groups. (DOCX)

Author Contributions
Conceived and designed the experiments: CGE. Performed the experiments: CGE VVM GNA. Analyzed the data: CGE. Contributed reagents/materials/analysis tools: CGE. Wrote the paper: CGE VVM GNA.