Transcriptome Analysis and Discovery of Genes Involved in Immune Pathways from Hepatopancreas of Microbial Challenged Mitten Crab Eriocheir sinensis

Background The Chinese mitten crab Eriocheir sinensis is an important economic crustacean and has been seriously attacked by various diseases, which requires more and more information for immune relevant genes on genome background. Recently, high-throughput RNA sequencing (RNA-seq) technology provides a powerful and efficient method for transcript analysis and immune gene discovery. Methods/Principal Findings A cDNA library from hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris; 108 cfu·mL−1) was constructed and randomly sequenced using Illumina technique. Totally 39.76 million clean reads were assembled to 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized to three Gene Ontology (GO) categories, 12,243 (23.51%) were classified to 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized to be involved in Toll, IMD, JAK-STAT and MAPK pathways, respectively. Conclusions/Significance This is the first systematical transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to better understand immune system and prevent various diseases in crab.


Introduction
Chinese mitten crab Eriocheir sinensis (Crustacea: Decapoda: Grapsidae, Eriocheir) (Henri Milne Edwards, 1854) is one of the important economic aquaculture species in China. However, with rapid development of large-scale culture, frequent outbreaks of diseases caused by viruses, bacteria and rickettsia-like organisms have led to catastrophic economic losses in cultured E. sinensis stocks [1][2][3]. Characterizing immune molecules and understanding defense mechanism are useful to health management and disease control in crab aquaculture.
Like other invertebrates, E. sinensis lacks adaptive immune system and mainly depends on innate immunity. Innate immune system provides a first line for host to defense against invading pathogens. It is composed of cellular responses like phagocytosis and encapsulation, and humoral responses that produce immunerelated factors. Immune relevant genes, such as crustin [4], cathepsin L [5], prophenoloxidase (proPO) [6], C-type lectin [7] and anti-lipopolysaccharide factor (ALF) [8], have been separately cloned and characterized from E. sinensis. However, knowledge about immune system of E. sinensis is still fragmentary and different signaling pathways implicated in immune response also remain incomplete.
To date, genome sequence of any crab species is still unavailable, which limits resources of molecular information. In recent years, high-throughput RNA-sequencing (RNA-Seq), including Solexa/Illumina, Roche/454 and ABI/SOLiD, has offered high-effective technology for analysis of gene expression, discovery of novel transcripts, identification of differentially expressed genes and others [9]. The powerful technology provides a new opportunity for studies of genome reference-free species and non-model organisms. With development of this technology, RNA-Seq has been widely applied in various invertebrates, such as Eriocheir sinensis [10,11], Litopenaeus vannamei [12], Bactrocera dorsalis [13], Pinctada martensii [14] and Crassostrea gigas [15].
In crustacean, apart from functioning as a digestive gland, hepatopancreas is also an important immune organ that functions as a primary site to synthesize and excrete immune molecules, such as beta-1,3-glucan binding protein (LGBP) [16], antibacterial peptide (AMP) [17], lectin or lectin related proteins and others [18]. Expressed sequence tag (EST) analysis and gene discovery of L. vannamei and L. setiferus also demonstrated that hepatopancreas played a crucial role in innate immunity and hepatopancreas cDNA library appeared to be more diverse than hemocytes library [18]. Hence, large-scale identification of immune genes from hepatopancreas is of great value and necessity to study immune mechanism in crustacean. Previously, Jiang et al [19] constructed a nonnormalized hepatopancreas cDNA library of E. sinensis and characterized immune-associated genes by EST approach. It aided to understand biological function of hepatopancreas and served a basis for in-depth investigation of Chinese mitten crab.
In the present study, by Illumina sequencing and bioinformatics analysis, we analyzed hepatopancreas transcriptome of E. sinensis that was infected with a mixture of three pathogen strains (Grampositive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris; 10 8 cfu?mL 21 ). Main objective of this study was to annotate functional genes from this transcriptome analysis and identify potential immune molecules of different signaling pathways, such as Toll, immune deficiency (IMD), janus kinase (JAK)-signal transducers and activators of transcription (STAT) and mitogen-activated protein kinase (MAPK) pathways.

Ethic Statment
This study was strictly performed in accordance with the Guide for Care and Use of Laboratory Animals by Chinese Association for Laboratory Animal Sciences (No. 2011-2).

Mitten crab and microbial challenge
Healthy mature female mitten crabs were obtained from a commercial farm in Panjin, China and acclimatized in oxygenated seawater at 1561uC for a week before processing. During whole period of the experiment, all crabs were fed with clam meat and the water was changed every day. For immune challenge experiment, we prepared a mixture of three pathogen strains (Gram-positive bacteria Micrococcus luteus, Gram-negative bacteria Vibrio alginolyticus and fungi Pichia pastoris), which were suspended in 0.1 mol/L PBS (pH 7.0) with the final pathogens concentration of 10 8 cfu?mL 21 . The crabs were injected at arthrodial membrane of the last walking leg with 100 mL the mixture of pathogens and returned into seawater tanks for 8 h. Hepatopancreas of treated crabs were collected and kept in liquid nitrogen for RNA extraction.

RNA isolation and cDNA library construction
Total RNA was isolated with Trizol Reagent (Invitrogen), after which the concentration, quality and integrity were determined with a NanoDrop spectrophotometer and an Agilent 2100 Bioanalyzer. Poly-(A)-containing mRNA was purified using oligo(dT) magnetic beads and Oligotex mRNA Kits (Qiagen). The mRNA was fragmented and used as template to synthesize first-stranded cDNA with reverse transcriptase and random hexamer-primers. Second-stranded cDNA was synthesized using RNase H and DNA polymerase I. These double-stranded cDNA fragments underwent process of end repair, addition of a single 'A' base and ligation of adapters. Adaptor modified fragments were selected by gel purification and amplified through PCR to create the final cDNA library.

Illumina sequencing, assembly, and annotation
Transcriptome sequencing was carried out on an Illumina HiSeq 2000 platform that generated about 100 bp paired-end (PE) raw reads (Novogene Bioinformatics Technology Co.Ltd). Raw sequences were deposited to NCBI Short Read Archive (SRA) database (http://www.ncbi.nlm.nih.gov/Traces/sra/). After removing adaptor sequences, ambiguous 'N' nucleotides (with the ratio of 'N' to be more than 10%) and low quality sequences (with quality score to be less than 5), the remaining clean reads were assembled using Trinity software as described for de novo transcriptome assembly without reference genome [20].
For homology annotation, non-redundant sequences were subjected to public databases including NCBI (http://www.ncbi. nlm.nih.gov/) non-redundant protein (Nr) and non-redundant nucleotide (Nt), Swiss-Prot (http://www.ebi.ac.uk/uniprot/), Gene Ontology (GO) (http://www.geneontology.org/), Clusters of Orthologous Groups (COG) (http://www.ncbi.nlm.nih.gov/ COG/) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/). If results of different databases were conflicted, a priority order of alignments from Nr, Nt, KEGG, Swiss-Prot, GO and COG databases was followed. Comparing to Nr, Nt and Swiss-Prot databases was carried out using BlastX algorithm with an E-value cut-off of 10 210 . GO terms at 2 nd level was used to perform GO annotation. COG and KEGG classification were done using BlastX with an E-value cutoff of 10 25 .

Immune gene identification
Immune genes belonging to different signaling pathways were manually identified according to annotated sequences in above databases. Protein coding sequences (CDSs) were also predicted by Trinity software and multiple sequence alignment was carried out using ClustalX.

Gene expression validation
Genes identified in this transcriptome sequencing analysis were validated and quantified by real-time PCR (RT-PCR). Primers   (Table S1) were designed according to Illumina sequencing data with Primer Premier 5. Prepared total RNA used in RT-PCR analysis was isolated from the same treated crab hepatopancreas as that in Illumina sequencing. Reversed cDNA was also synthesized using the same method as described in Illumina sequencing preparation. RT-PCR was performed in an ABI 7300 Real-time Detection System (Applied Biosystems). b-actin of E. sinensis was used as an internal control to normalize the expression level and all experiments were performed in triplicate. The reaction was carried out in a total volume of 10 mL, containing 5 mL of 26 SYBR Premix Ex Taq TM II (TaKaRa), 0.2 mL of 506 ROX Reference Dye, 2 mL of diluted cDNA mix, 0.2 mL of each primer (10 mM) and 2.4 mL of Milli-Q water. Thermal profile for SYBR Green RT-PCR was 95uC for 5 min, followed by 40 cycles of 95uC for 5 s and 60uC for 31 s. To confirm that only one PCR product was amplified and detected, dissociation curve analysis of amplification products was performed at the end of each PCR reaction. After the PCR program, data were analyzed with ABI 7300 SDS software (Applied Biosystems). The comparative CT method (2 2DD CT method) was used to analyze the expression level of different genes.

Transcriptome sequencing and assembly
Illumina sequencing data from microbial challenged E. sinensis hepatopancreas were deposited to NCBI SRA database under accession number of SRA068878. Approximately 40.78 million Illumina PE raw reads were generated (Table 1). After removing adaptor sequences, ambiguous nucleotides and low-quality sequences, 39.76 million clean reads with an average length of 101.10 bp remained. Assembly of clean reads resulted in 70,300 unigenes that ranged from 201 bp to 16874 bp with a N50 length of 1834 bp (Table 1). Length statistics of assembled unigenes were displayed ( Figure 1).

Blast analysis
After eliminating repeated and short-length sequences, 52,074 non-redundant unigenes were subjected to public databases for similarity searching. 17,617 (33.83%) and 5,033 (9.67%) nonredundant unigenes ( Table 2) showed identity with sequences in NCBI Nr and Nt databases, respectively. E-value and score distribution of best hits in Nr database revealed that 55.45% (9,481) of matched sequences showed high homology with an Evalue ,1E-50 and 55.79% (9,539) with a score .500 ( Figure 2). Our results also showed that 28.54% (14,862) of non-redundant unigenes demonstrated similarity to known genes in Swiss-Prot database ( Table 2).
Using KEGG, 8983 unigenes ( Table 2) were assigned to six specific pathways, including metabolism, cellular processes, organism system, human diseases, genetic information processing and environmental information processing (Table 3). Totally 2,582 unigenes were identified in metabolism and main metabolism terms were 'carbohydrate metabolisms', 'nucleotide metabolisms' and 'amino acid metabolisms'. Dominant subcategories of other five pathways were 'cell growth and death', 'nervous system', 'infectious diseases', 'translation' and 'signal transduction', respectively.

Immune gene and pathway analysis
High-throughput sequencing effort revealed that a large number of molecules were highly enriched in immune processes and signaling pathways. Among them, we focused on key genes involved in Toll, IMD, JAK-STAT and MAPK signaling pathways. Main components of these immune pathways were described as follow.
Toll pathway. Twenty-six non-redundant unigenes were identified with identity to main molecules of Toll pathway, including Spatzle, Toll, myeloid differentiation factor 88 (MyD88), Pelle, Cactus, Dorsal/ Dorsal-related immunity factor (Dif) ( Table 4, Figure S1). Tube and tumor necrosis factor receptorassociated factor 6 (TRAF6) appeared to be absent in this study. In putative Toll pathway, microbial components triggered activating of Spaetzle and Toll, which initiated signaling pathway by recruiting MyD88 and other molecules (Tube, Pelle and TRAF6). Then, it induced nuclear translocation of Dorsal/Dif. The key adaptor protein coordinating Toll pathway, MyD88, contained a death domain (DD) and a Toll/interleukin-1 receptor (TIR)  domain ( Figure 5A). Multiple sequence alignment of MyD88 from E. sinensis and other thirteen species revealed that they were more similar at N-terminus and less conserved at C-terminus ( Figure 5B). Sequence of E. sinensis MyD88 showed highest identity (65%) to homolog from L. vannamei. IMD pathway. Fourteen non-redundant unigenes showed similarity to signaling molecules of IMD pathway, such as IMD, transforming growth factor beta-activated kinase (dTAK1), inhibitor of nuclear factor kappa-B kinase (IKK), Dredd and Relish, while Fas-associated death domain protein (dFADD) was not detected (Table 5, Figure S2). In putative IMD pathway, bacterial components activated the adaptor protein IMD, causing signaling cascade and finally leading to activation of Relish. Relish then regulated expression of antimicrobial peptide (AMP) and other immune-related genes.
JAK-STAT pathway. Various molecules involved in JAK-STAT signaling pathway were characterized in our analysis (Table 6, Figure S3). Through JAK-STAT pathway, different aspects of hematopoiesis and immune response were mediated by cytokines, including interleukin (IL), interferon (IFN), growth hormone (GH) and thyroid peroxidase (TPO) [21]. In putative JAK-STAT pathway, after interaction of cytokine and cytokine receptor (CytokineR), STAT was activated by JAK, dimerizing, translocating to the nucleus and regulating the expression of target genes. Furthermore, numerous regulatory layers were found in this pathway. They were divided to negative regulators including SH2containing phosphatases SHP1 and SHP2, cytokine inducible SH2-containing protein (CIS), suppressor of cytokine signaling (SOCS) and protein inhibitor of activated STAT (PIAS), and positive regulators including signal transducing adaptor molecule (STAM), mitogen-activated protein kinase (MAPK) and the interacting proteins.
MAPK pathway. Putative MAPK signaling pathway containing 122 non-redundant unigenes was also analyzed ( Table 7, Figure S4). MAPKs were composed of three different major families -c-Jun N-terminal kinase (JNK) family, p38/stressactivated protein kinase (p38/SAPK) family and extracellularsignal regulated kinase (ERK) family, and regulated different processes by protease cascade. In this putative pathway, each cascade was triggered by extracellular signals and resulted in activation of MAPK kinase kinase (MAPKKK/MEKK), followed by activation of MAPK kinase (MAPKK/MEK/MKK) and MAPK/ERK, finally leading to function of diverse substrates and NF-kB proteins.
Results of RT-PCR revealed different expression abundances of the analyzed genes ( Figure 6). Among them, TrySP, ChySP and ALF showed highest expression level, followed by Thy and TrxR, while LZW displayed the lowest level. It was consistent with the     results of Illumina sequencing data (Table S1), which not only validated the expression profile of different identified immune genes, but also verified the reliability and accuracy of our transcriptome analysis.

Discussion
In this context, considerable efforts have been made to research hepatopancreas transcriptome of microbial challenged E. sinensis by high-throughput sequencing technology (Solexa/Illumina). Comparing with EST analysis of hepatopancreas from E. sinensis with traditional method [19,22], our study produces more sequencing reads and assembled unigenes. It largely enriches transcriptome data of E. sinensis and indicates enormous advantage of high-throughput technology. Although a comparative transcriptome analysis of haemocytes from E. sinensis under normal condition and in response to Spiroplasma eriocheiris infection indicates certain microRNAs may be essential in interaction between host and pathogen [11], only miRNAs are identified and analyzed for the expression pattern. In our study, various immune genes and pathways are annotated from hepatopancreas of E. sinensis after immune challenge. The analysis increases molecular information and genomic resources of E. sinensis in response to microorganism stimulation.
Toll pathway was initially identified in genetic screen of genes involved in early embryonic development of Drosophila [23] and gradually studied of importance in innate immunity. In economic crustaceans, many genes related to Toll pathway, such as Spatzle [24], Toll [24], MyD88 [25], Pelle [26] and TRAF6 [27], have been reported from shrimp, while only SpToll of Scylla paramamosain [28] has been cloned and characterized from crab. In the present study, we are first to find various key members of Toll pathway in E. sinensis. This suggests the existence of putative Toll pathway in crab and indicates its crucial function in antimicrobial response. Moreover, different from mammalian Toll-like receptors (TLRs) directly functioning as a pattern recognition receptor (PRR) to recognize pathogen-associated molecular patterns (PAMPs) [29], DmToll of Drosophila melanogaster uses the cytokine-like molecule Spatzle as a ligand [24,30,31]. In E. sinensis, identification of Spatzle in our study also suggests that the Chinese mitten crab Toll may be activated by functioning with Spatzle. In addition, in spite of different MyD88 variants in human, mice, chicken and other vertebrates, only a MyD88 variant gene is found in an invertebrate species L. vannamei [25]. Here, we find one MyD88 sequence of E. sinensis that shows highest similarility to homolog from L. vannamei. This will provide a foundation for further study of MyD88 in crab.
Gram-negative bacteria-yielded diaminopimelic acid (DAP)type peptidoglycan can be recognized by peptidoglycan recognition protein (PGRP)-LE and PGRP-LC receptor complex, which then activate IMD and cause activation of signaling cascade to trigger Relish [32]. Experiments of Drosophila also reveal that infection by Gram-negative bacteria activates IMD pathway, but not Toll pathway [30,31]. In this context, among different molecules relevant to IMD pathway, not only caspase and Relish previously reported [33,34] are identified, but also IMD, dTAK1 and IKK are first found in microbial challenged E. sinensis. Similarly, LvIMD of L. vannamei and FcRelish of Fenneropenaeus chinensis are identified after immune challenge and characterization of them implies that they can induce expression of some antimicrobial peptides (AMPs), which are integral components of innate immune system and exhibit great activities to defense against pathogens [35,36]. Taken these reports together, investigation of principal component molecules will promote researching on innate immune mechanism and immune pathway of E. sinensis.
A large number of molecules involved in JAK-STAT signaling pathway such as four JAKs, seven STATs and more than 30 cytokines are widely found in mammals [21]. However, only SOCS and leptin receptor protein (LEPR) have been cloned from E. sinensis [37,38]. In the present study, along with SOCS and LEPR, many other genes including CytokineR, JAK, STAT, downstream genes and regulatory molecules (CIS, SHP1, SOCS, PIAS and STAM) are first fully and systemically identified in crab. Considering different aspects of cell development and host response activated by JAK-STAT pathway [39], there is no surprise that lots of regulators are found to control this pathway. Expression and regulation of components in JAK-STAT pathway are also reported in transcriptome analyses of microbial infected Pseudosciaena crocea [40] and Laodelphax striatellus [41]. These reports together increase knowledge of JAK-STAT pathway on microbial stimulation and provide valuable information for further study of immune response against pathogen infection. Additionally, researchers have compared the one single STAT gene from invertebrates with seven STATs from vertebrates by phylogenetic analysis [42]. This comparison supports the hypothesis that STAT genes duplicate before splitting in invertebrates and vertebrates and shows difference between them. In mammals and other vertebrates, JAK-STAT pathway plays a crucial function in lots of biological processes of both innate and adaptive immunity, such as apoptosis, proliferation, differentiation, hematopoiesis, oncogenesis and immune defense [39,43]. However, in crustaceans, only the antibacterial or antiviral activities of several JAK/STAT genes are known so far [37,44,45]. It is still unknown whether the pathway has other functions and needs us to do more efforts for its complete function research.
MAPK pathway widely exists in all eukaryotes from yeast to human. Through a conserved three-kinase cascade that finally phosphorylates intracellular substrates and transcription factors, it transducts extracellular cues to cytoplasm and nucleus to control physiological processes [46]. Currently, similar with JAK-STAT pathway, most knowledge of MAPK pathway is also focused on vertebrate system. In vertebrates, this pathway is multifunctional and plays a key role in anti-stress, reproduction, cell development, differentiation and inflammation [46]. In shrimp, anti-lipopolysaccharide factor treatment can regulate Trichomonas vaginalisinduced proinflammatory cytokines through MAPK pathway [47]. Interestingly, in the crab Chasmagnathus, MAPK pathway participates in neural plasticity, which can only be found in rodents and mollusks before, and is necessary for long-term memory consolidation of this crab model [48]. Thus, MAPK pathway may have many different functions in various species of vertebrates and crustaceans. However, for the reason that knowledge about MAPK pathway in aquatic invertebrates is largely unclear, it still needs deep research to fully clarify the role of this pathway. Our detection of numerous genes involved in MAPK pathway, such as ERK, JNK, p38, MEK, MEKK, Elk, Crk and CREB, will offer valuable reference in crab and other important crustaceans.
In conclusion, numerous genes from hepatopancreas of microbial challenged E. sinensis are characterized to be associated with Toll, IMD, JAK-STAT and MAPK pathways. Accuracy of Illumina sequencing data and expression profile of the identified genes are also further confirmed by RT-PCR. This research will be not only helpful to fully research host-pathogen interaction and comprehensively understand immune system of crab, but also beneficial to prevent diseases appeared in crab culture. Figure S1 Putative Toll pathway. Putative Toll pathway of E. sinensis was constructed based on knowledge in Drosophila and shrimps. Proteins appearing in hepatopancreas of microbial challenged E. sinensis were represented in grey circle and absent proteins in grey square. However, most interactions have to be confirmed experimentally. (TIF) Figure S2 Putative IMD pathway. Putative IMD pathway of E. sinensis was constructed based on knowledge in Drosophila and shrimps. Proteins appearing in hepatopancreas of microbial challenged E. sinensis were represented in grey circle and absent proteins in grey square. However, most interactions have to be confirmed experimentally. (TIF) Figure S3 Putative JAK-STAT pathway. Putative JAK-STAT pathway of E. sinensis was constructed based on KEGG reference pathway. Proteins appearing in hepatopancreas of microbial challenged E. sinensis were represented in circle and absent proteins in square. However, most interactions have to be confirmed experimentally. (TIF) Figure S4 Putative MAPK pathway. Putative MAPK pathway of E. sinensis was constructed based on KEGG reference pathway. Proteins appearing in hepatopancreas of microbial challenged E. sinensis were represented in grey circle and absent proteins in grey square. However, most interactions have to be confirmed experimentally.

(TIF)
Table S1 Genes and specific primers used for real-time PCR. (DOC)