De Novo Transcriptome Analysis Provides Insights into Immune Related Genes and the RIG-I-Like Receptor Signaling Pathway in the Freshwater Planarian (Dugesia japonica)

Background The freshwater planarian Dugesia japonica (D. japonica) possesses extraordinary ability to regenerate lost organs or body parts. Interestingly, in the process of regeneration, there is little wound infection, suggesting that D. japonica has a formidable innate immune system. The importance of immune system prompted us to search for immune-related genes and RIG-I-like receptor signaling pathways. Results Transcriptome sequencing of D. japonica was performed on an IlluminaHiSeq2000 platform. A total of 27,180 transcripts were obtained by Trinity assembler. CEGMA analysis and mapping of all trimmed reads back to the assembly result showed that our transcriptome assembly covered most of the whole transcriptome. 23,888 out of 27,180 transcripts contained ORF (open reading fragment), and were highly similar to those in Schistosoma mansoni using BLASTX analysis. 8,079 transcripts (29.7%) and 8,668 (31.9%) were annotated by Blast2GO and KEGG respectively. A DYNLRB-like gene was cloned to verify its roles in the immune response. Finally, the expression patterns of 4 genes (RIG-I, TRAF3, TRAF6, P38) in the RIG-I-like receptor signaling pathway were detected, and the results showed they are very likely to be involved in planarian immune response. Conclusion RNA-Seq analysis based on the next-generation sequencing technology was an efficient approach to discover critical genes and to understand their corresponding biological functions. Through GO and KEGG analysis, several critical and conserved signaling pathways and genes related to RIG-I-like receptor signaling pathway were identified. Four candidate genes were selected to identify their expression dynamics in the process of pathogen stimulation. These annotated transcripts of D. japonica provide a useful resource for subsequent investigation of other important pathways.


Introduction
The freshwater planarian Dugesia japonica (D. japonica) is a remarkable creature with extraordinary regenerative abilities. They have been used as model animals for years for study of biological evolution processes and occupy a key position in the field of animal phylogeny [1]. In recent years, researchers have shown an increased interest in D. japonica, not only because they can regenerate entire triploblastic body plan from small fragments [2], but also they can continuously renew all cell types from pluripotent stem cells (neoblasts) [3,4]. The increasingly unraveled regenerative capacity of planarian, which is not possessed by other commonly used invertebrate model, makes it a valuable system for studying regeneration and development [5]. It is noteworthy that there is little wound infection in regenerated planarians, suggesting that planarians may possess a complicated and effective immune system [6,7]. Intermingling planarian wound response and regeneration will provide an exciting opportunity to advance our knowledge of repair mechanism and immune defense system.
Planarian does not have an adaptive immune system (specificity); instead, it has developed innate immunity (non-specificity) as the dominant system of biological host defense [8][9][10]. As the first line of defense, the surface structures and tissues in gastrointestinal tract form a physical barrier for invading pathogens and are essential for the animal. Beyond that, the most important mechanism is the recognition of infectious non-self. Studies shown that a number of proteins, such as pattern recognition receptors (PRRs), recognize common antigens existed on the surface (pathogen associated molecular patterns, PAMPs) of pathogens according to the pattern recognition to resist and expel foreign substances [11,12]. Subsequently, non-self recognition signals trigger a series of cascade reactions through signal modulation and amplification and activate corresponding signal transduction pathways [13,14].
RIG-I is a member of pattern recognition receptors (PRRs) and plays a pivotal role in immune response by recognizing and binding the double stranded RNAs and 5'-triphosphate single stranded RNAs of invading virus [28,29]. After binding with virus nucleic acid, RIG-I forms a complex with an adaptor protein MAVs/VISA/Cardif/IPS-1 which is anchored on the outer mitochondrial membrane [30][31][32][33]. Then, the complex on the membrane recruits tumor necrosis factor receptor associated factor 3 (TRAF3) and TRAF6 through the TIM binding sites on MAVs and activates related transcription factors, including nuclear factor-κB (NF-κB), interferon-regulated-factor (IRF) and type I interferons (IFN-α/β) [30,34]. TRAFs regulate cell physiological and pathological processes through multiple signaling pathways and participate in immune response [35]. Reports have showed that the activation of IFN-α/β in RIG-I-like receptor signaling pathway requires the participation of P38 [36,37]. The activity of P38 is essential for viral elimination of IFN-α/β.
The importance of innate immunity in invertebrates is indisputable, while the pivotal immune-related genes and signaling pathways are poorly understood in D. japonica. The emergence of high-throughput sequencing technologies has permitted new approaches and designs in investigation of functional genes involved in various specific signal-transduction and metabolic and pathways. De novo transcriptome analysis has been employed as an appropriate technique for identifying unknown genes in non-model organisms [38].
Here, we sequenced the transcriptome of D. japonica (clonal strain BS, called Dj-BS) using IlluminaHiSeq2000, and assembled the transcriptome using Trinity (http://trinityrnaseq. sourceforge.net/) after quality filtration and trimming of raw reads. ORF prediction, functional annotation, GO (Gene Ontology) analysis, and KEGG (Kyoto encyclopedia of genes and genomes database) analysis were performed. Immune-related genes and immune system related pathways were also identified and the expression patterns of four candidate genes involved in RIG-I-like receptor signaling pathway were identified after stimulation with lipopolysaccharide (LPS) and peptidoglycan (PGN). This study will make an important contribution to the understanding of planarian innate immune system, especially in discovering those immune-related genes in planarian.

Results and Discussion
Sequencing and de novo assembly Workflow for cDNA preparation, sequencing, assembly, and annotation of Dj-BS transcriptome is presented in Fig 1. cDNA libraries were constructed from Dj-BS mRNA and were sequenced using an IlluminaHiSeq2000 sequencing platform. Original images were translated into sequences by base calling, and a total of 42,877,438 paired-end raw reads were obtained. The sequences account for approximately 4.3G bp with a Q20 (proportion of nucleotides with quality value larger than 20 in reads) over 92.87% and numerical value of N% is very low (S1 File). Low quality reads, which contain adaptors, many Ns and low quality scores, and short reads (<20 bp in length) were removed. In total, 40,449,653 high quality reads with an average length of 90 bp were generated. Approximately, 95% filtered reads were obtained and used for future analysis.
High quality reads were assembled using the Trinity program (http://trinityrnaseq. sourceforge.net/) [39]. A total of 27,180 transcripts (including all isoforms from alternative splicing) were obtained, with an average length of 958 nt and N50 length of 1,196 nt, which consist in 21,536 genes (Table 1). Among these, there were 12,119 transcripts (44.59%) with a length between 400 to 800 nt, and the length of longest and shortest was 12,141 and 351 nt, respectively (S2 File).

Assessment of assembly
To determine the integrity of the transcriptome assembly, the completeness of our transcriptome assembly was assessed by using CEGMA and by mapping of all trimmed reads back to the assembly result. 218 out of the 248 core proteins (87.9%) were defined as 'complete' by CEGMA, and 89% of all trimmed pair-end reads were mapped back to the final assembled transcriptome by Bowtie2. These results indicated that our transcriptome assembly covered most of the whole Dj-BS transcriptome. Another parameter for assessing the quality of a transcriptome assembly is the number of assembled transcripts that appear to be full-length. To perform the full-length transcript analysis, we aligned the assembled transcripts against all proteins from Uniprot protein database with to find unique top matching proteins, and then calculated the percentage of the protein length that covered by our assembled transcripts. Our results showed that the top matching proteins, which had !80% of their sequences covered by the assembled transcripts, occupied 51% (3,747/7,279) of all matched proteins ( Table 2).
Although the final transcript number of the Dj-BS transcriptome assembly (27,180) is 26.9% less than the D. japonica transcriptome in Qin et al's work (37,218, named Dj-Qin trancriptome in this paper for comparison) [40], the average length of our assembled transcriptome  (958 nt) is twice as long as the average length of their transcriptome (468 nt), suggesting that our assembled Dj-BS transcriptome is more complete. Compared with the 1,456 up-regulated transcripts from the work by Abnave et al [6], all could find the corresponding hits in our transcriptome by TBLASTX, and 918 of those are highly conserved sequences in our transcriptome showed by the BLASTN result (E-value 1e-5). These results showed that the assembly quality was sufficiently high for the subsequent functional analysis.

Functional annotation of assembled transcripts
The ORF prediction of assembled transcripts was carried out using Trinity. The results showed that 23,888 out of 27,180 transcripts contained ORF candidates. After those ORFs were converted into proteins, functional annotation was performed on the NCBI website by BLASTP [41] with the NCBINR, STRING and GENE database. 11,140 and 1,504 proteins were found from the protein databases NCBINR and STRING, respectively. Then, all assembled sequences were aligned with sequences from the NR, STRING and GENE database by BLASTX (BLAST Version 2.2.25, E-value 1e-5). We discovered that these transcripts in Dj-BS showed significant similarity with those in Schistosoma mansoni (9.9%), followed by those in Clonorchis sinensis (9.2%) and Crassostra gigas (8.9%) (Fig 2 and Table 3).

GO annotation
Gene Ontology (GO) assignments [42] were used to classify the functions of transcripts, which help us understand the biological significance of the tested genes. 8,079 of Dj-BS transcripts (29.7%) were annotated by Blast2GO (http://www.blast2go.com/b2ghome) (Table 3), and gene functional classification was plotted by WEGO (http://wego.genomics.org.cn/) (Fig 3). These annotated transcripts can be classified into three categories in GO assignments: cellular component (GO: 00005575), molecular function (GO: 0003674) and biological process (GO: 0008150). Under the cellular component category, which contains 13 functional classes, classes for cell and cell part account for the major proportion with 4,370 (54.1%) and 4,368 (54.07%) transcripts, respectively. Under the molecular function category, which contains 14 functional classes, classes for binding and catalytic activity accounted for the preponderant portion, with 4,198 (51.96%) and 3,970 (49.14%) transcripts, respectively. In the biological process category, which contains 24 functional classes, classes for cellular process and metabolic process were dominant, with cellular process 4,772 (59.07%) and metabolic process 3,699 (45.79%) transcripts, respectively. In addition, 240 (2.97%) transcripts were assigned to the class for immune system process (S3 and S4 Files). To further analyze the GO annotation results, we compared the GO annotation between our Dj-BS transcriptome to the previous Dj-Qin transcriptome (Fig 3 and S5 File). Because the transcripts might belong to the same species, it is reasonable that, as a percentage of classified transcripts, the predicted transcript sets for Dj-BS and Dj-Qin are very similarly distributed among different functional categories. In addition, in almost all functional categories, the number of annotated from Dj-BS is much bigger than the number in Dj-Qin, and the total annotated transcripts (8,079) from Dj-BS are twice more than those from Dj-Qin (3,313) (S5 File), which indicated Dj-BS transcriptome provides a better platform for future study of planarian D. japonica.

Functional classification using KEGG
Spatial and temporal regulations are essential events for life. In recent years, investigation of relationships among proteins, in each specific pathway, has received considerable attention. KEGG [43] is a large knowledge base, which connects the genomic information and functional information and provides large-scale analyses of biological functions in a systematic manner. In this study, by BLASTX, we discovered 3,511 sequences from KEGG GENE database belonging to 300 pathways, which matched 8,668 transcripts in our results. Among these, 507 transcripts (396 genes) matched with 150 orthologues, which are involved in 15 KEGG immunerelated pathways (S4 and S6 Files). Combining with the KEGG and GO annotation results, we were able to identify 240 immune-related genes involved in 165 KEGG pathways from Dj-BS. Conserved pathways were found both in planarians and higher animals, such as complement and coagulation cascades, RIG-I-like receptor, Toll-like receptor, NOD-like receptor and FcεRI signaling pathway.

Dj-BS immune associated genes
As a platyhelminthes, inside the planarian there must be a mechanism which prevent and defend the invasion and infection of pathogens. However, the planarian immune system is still largely unknown [44]. We selected genes in 15 KEGG immune-related pathways firstly. Genes participated in the immune response which have been previously verified are presented in Table 4 and S7 File. All these sequences were aligned with sequences from NCBI to find conserved domains. These candidate genes include PRRs (C-lectin, A2M, RIG-I), crucial factors participated in multiple signaling pathways (PI3K, P38, JNK, MAPK, AKT), TRAF family proteins, tolloid-like proteins, tyrosinase, CUTA-like protein and DYNLRB-like protein. C-type lectin, as an emblematic PRR, mediated a variety of immune responses within invertebrates, such as agglutinating bacteria, identifying specific sugar and promoting effect on phagocytosis [45,46]. Tyrosinase, also known as phenoloxidase, is an important element of the immune reaction process resisted pathogens or parasites in invertebrates [47][48][49]. Importantly, Abnave et al found that CUTA-like and DYNLRB-like genes actively contribute to the elimination of L. pneumophila and S. aureus (bacteria which were selected to stimulate planarians for producing immune response) in planarian D. japonica [6]. Therefore, detection of candidate genes and further functional investigation will advance the understanding of planarian immune system. LPS and PGN are commonly used as a wide range of immune stress reagents for stimulating invertebrates to produce immune response [50,51], and in this study we also used LPS and PGN in planarian immune respons research. To verify genes participating in the immune response and to confirm the stimulation of LPS and PGN can cause the immune response of planarians, we cloned the DYNLRB-like gene from Dj-BS (S7 and S8 Files) and performed whole-mount in situ hybridization. The results revealed that the DYNLRB-like gene was primarily expressed in pharynx and intestinal tissues in response to the challenge with LPS and PGN (S9 File), which showed similar expression pattern with the previously study after fed with L. pneumophila and S. aureus [6]. This showed that LPS and PGN could simulate bacteria in planarian immune response. The DYNLRB-like gene belongs to the km23/DYNLRB/LC7/ robl family of dynein light chains. Reports [52] found that km23 plays an important role in TGF-β signal transduction in mammalian cells and zebrafish ovarian follicle cells. In drosophila, the roblz deletion mutant shows a female sterile phenotype [53]. The mechanism of the DYNLRB-like gene in planarian merits further study.

Quantitative real-time PCR analysis of immune related gene transcripts in RIG-I-like receptor signaling pathway
To investigate whether these four genes (RIG-I, TRAF3, TRAF6, P38) related to RIG-I-like receptor signaling pathway (Fig 4) are involved in the immune response process of Dj-BS, realtime qRT-PCR was performed (S7 and S8 Files). Animals were inducted with LPS (10 μg/ml) or PGN (10 μg/ml), and gene expression patterns were measured at six different time points. As shown in Fig 5, the tendencies of gene expression of all genes stimulated with different PAMPs (LPS and PGN) were similar. Gene expression increased upon simulation and the level of expression peaked (P<0.01) at 5 h, compared with the expression level at 0 h. The expression gradually declined after 9 h, and leveled off after a slight fluctuation. The concentration of stimulus after 5 h of these genes increased three-to five-fold time compared with concentration of homeostasis stage, implying that these genes belong to Group 2 APP [54]. As expected, these candidate genes are acute phase protein whose the levels may increase and decrease markedly within 24 h in response to pathogens infection. Then, in order to maintain homeostasis balance these genes will return to normal levels. The up-regulated gene expression at 5 h, which was similar with the temporal expression patterns of CgRIG-I in poly(I:C) infected C. gigas [55], demonstrates that these genes are involved in the corresponding immune response pathway upon LPS and PGN stimulation. Subsequent results of candidate genes expression changes after bacteria stimulation further evidence that these genes involve in the response to bacterial infection (S10 File). Reports revealed that TRAF6, TRAF3 and P38 play a pivotal role in the RIG-I-like receptor signaling pathway [44,46,47]. Previously study showed that CgTRAF3 responds to bacterial and viral stimulation [56]. CgTRAF3-L was up regulated at 6 h after injection with V. anguillarum, and CgTRAF3-S had a significantly increase at 6 h post-injection with OsHV-1 [53]. In addition, duTRAF6 knockdown by RNAi observably reduced poly(I:C) and Sendai virus-induced NF-κB activation in DEFs, suggesting that duTRAF6 functions as a positive regulator in response to nucleic acid challenge [57]. Similar results were observed in E. coioides, where EcTRAF6 activates NF-jB and has an important role in host defense against parasitic infections [58]. We suspected the activation of RIG-I of Dj-BS upon LPS and PGN simulation might activate downstream signaling pathways that lead to up-regulation of the expression of TRAF3, TRAF6 and P38. However, some previous studies showed that RIG-I associated with the infection of virus, which mainly recognizes the dsRNA of virus [37,42,[59][60][61]. Even though CgRIG-I shares significant similarity with Dj-BS RIG-I gene (68%), in PAMPs infected C. gigas, no prominent increase in expression was observed after LPS and PGN stimulation [55]. The functional amino acids of RIG-I may have mutated during evolution, leading to the loss of binding affinity with bacteria and other pathogens. The mechanism of LPS and PGN participating in RIG-I pathway in Dj-BS merits further study. Sequence alignments of candidate RIG-I gene Elevated gene expression of RIG-I gene was observed, for the first time in this study, upon LPS and PGN stimulation in animals. Previous studies have shown that RIG-I is a cytosolic sensor of viral RNAs and plays crucial roles in immune response [33]. It binds RNAs in its C-terminal domain (CTD) [28,29]. Lu et al. found that, in RIG-I-like receptors (RLRs), three clusters of residues participate in RNA binding with different mechanisms [62].

Conclusion
In this study, we performed a de novo transcriptome analysis of Dj-BS based on the high throughput sequencing technology. Through function annotation of the assembled transcripts, a number of critical and conserved signaling pathways and genes related to innate immune response were identified. A DYNLRB-like gene was cloned, and whole-mount in situ hybridization was used to verify the function of the DYNLRB-like gene in the immune response. We also demonstrated LPS and PGN could simulate bacteria in planarian immune response. Four candidate genes were further studied, which are involved in the RIG-I-like receptor signaling pathway, including RIG-I, TRAF3, TRAF6 and P38. The gene expressions of these factors were elevated upon pathogen stimulation, indicating a possible role in immune response. In addition, the annotated transcripts provide a useful resource for subsequent investigation of molecular mechanism of innate immune system in planarians.

De novo transcriptome assembly
Firstly, raw image data obtained from IlluminaHiSeq2000 sequencing were converted to sequenced reads in the format of FASTQ through base calling. After initial quality evaluation, these raw reads were pre-processed by stripping the adaptors using SeqPrep (https://github. com/jstjohn/SeqPrep). The remaining sequences with a length of less than 20 bp under the specific parameters (-q = 20, -minlen = 20) were also removed using Sickle (https://github.com/ najoshi/sickle). De novo assembly of high quality reads was performed using Trinity (http:// trinityrnaseq.sourceforge.net/), which is composed of three independent software modules: Inchworm, Chrysalis and Butterfly [39]. By Inchworm, RNA-seq data was assembled into unique sequences through establishing a k-mer graph (K = 25). By Chrysalis, the contigs generated in the previous step were clustered, and a de Bruijn graph was constructed for each cluster. Subsequently, by Butterfly, these Bruijn graphs were processed in parallel and the paths were traced according to reads and pairs of reads within graphs to obtain full-length transcripts for alternative splicing isoforms and distinguish transcripts of paralogous genes. Finally, the distributed situation of length of result sequences was counted and analyzed.

Functional annotation and Gene Ontology and KEGG pathway annotation
ORF prediction of assembled transcripts was carried out using Trinity (http://trinityrnaseq. sourceforge.net/analysis/extract_proteins_from_trinity_transcripts.html) before functional annotation. Predicted protein sequences with ORFs and nucleic acid sequences without ORFs were aligned with Genbank NR, String and Gene databases using BLAST (Version 2.2.25 with an E-value 1e-5). Top-Hit sequences were retrieved from NT, NR and String and were annotated according to the Gene Ontology (GO) databases (http://www.geneontology.org/) using the Blast2GO program (http://www.blast2go.com/b2ghome) [42]. All assembled transcripts were aligned with GENES (http://www.genome.jp/kegg/genes.html) using BLAST algorithm (Blastx/Blastp 2.2.24+) based on KEGG databases (http://www.genome.jp/kegg/), and the specific biological pathways shared by multiple genes were assigned accordingly.

Whole-mount in situ hybridization (WISH)
Whole-mount in situ hybridization was performed as previously described [63]. Planarians samples were observed with a Nikon SMZ 1500 stereomicroscope (Nikon, Japan). Planarians were exposed and challenged with 10 μg/ml LPS and 10 μg/ml PGN in sterile water. After 6 h of inducement, the animals were washed and used for experiment.

Real-Time Quantitative Reverse Transcription PCR (real-time qRT-PCR) analysis
Total RNA was extracted with TRIZOL reagent from adult planarians stimulated by LPS (10 μg/ml), PGN (10 μg/ml), Gram-negative bacteria (Escherichia coli 44102 and Pseudomonas aeruginosa 10104, 10 7 /ml) and Gram-positive bacteria (Staphylococcus aureus 26003 and Bacillus subtilis 63501, 10 7 /ml), respectively, at different time points (0, 1, 5, 9, 12 and 24 h). RNA was reverse transcribed into cDNA, which was used as a template in the PCR according to the protocols of Revert Aid First Strand cDNA Synthesis Kit (Thermo Scientific, Waltham, Massachusetts, USA). PCR amplification was performed on an ABI 7500 real-time PCR system using a Fast Start Universal SYBR Green Master (Rox) (Roche, Basel, Switzerland). The primers for PCR were designed by Primer 5.0. Values are given as the means ± S.D. (N = 5), and Ã indicates a significant difference, with P< 0.01.