Transcriptome Analysis Revealed Changes of Multiple Genes Involved in Haliotis discus hannai Innate Immunity during Vibrio parahemolyticus Infection

Abalone (Haliotis discus hannai) is one of the most valuable marine aquatic species in Korea, Japan and China. Tremendous exposure to bacterial infection is common in aquaculture environment, especially by Vibrio sp. infections. It’s therefore necessary and urgent to understand the mechanism of H. discus hannai host defense against Vibrio parahemolyticus infection. However studies on its immune system are hindered by the lack of genomic resources. In the present study, we sequenced the transcriptome of control and bacterial challenged H. discus hannai tissues. Totally, 138 MB of reference transcriptome were obtained from de novo assembly of 34 GB clean bases from ten different libraries and annotated with the biological terms (GO and KEGG). A total of 10,575 transcripts exhibiting the differentially expression at least one pair of comparison and the functional annotations highlight genes related to immune response, cell adhesion, immune regulators, redox molecules and mitochondrial coding genes. Mostly, these groups of genes were dominated in hemocytes compared to other tissues. This work is a prerequisite for the identification of those physiological traits controlling H. discus hannai ability to survive against Vibrio infection.


Introduction
Abalones (Gastropoda; Haliotidae) has become an increasingly important fishery and aquaculture resource with high nutrition value and widely distributed throughout the southern coasts of Korea, Japan, China and Taiwan. Abalones have sedentary lifestyle along shallow rocky coast lines; also these gastropods are vulnerable to capture [1]. Totally, 111,000 metric tons (mt) of abalone (from all sources) were supplied globally in 2013 with over 103,000 mt harvested directly from aquaculture facilities (http://www.fao.org/statistics/en/). China itself account for over 79% of the global abalone aquaculture product followed by South Korea (12%). It is estimated the total output of abalone production in South Korea is more than 1000 mt per year [2].
Like other invertebrates, Haliotis discus hannai lacks adaptive immune system and mainly depends on innate immunity. Innate immune system provides a first line for host to defense against invading pathogens and other stress. It is composed of diversified repertoires of receptors, regulators, and/or effectors including Toll-like receptors (TLRs), fibrinogen-related proteins (FREPs), scavenger receptor cysteine-rich (SRCRs), and antimicrobial proteins, as well as many other molecules involved in the key processes of agglutination, phagocytosis, and encapsulation [3]. Immune relevant genes such as ferritin [4], C-type lectin [5], cysteine protease inhibitor [6], cathepsin B [7], chicken-type lysozyme [8] and HSP70 [9] have been separately cloned and characterized from H. discus hannai. However, knowledge about immune system of this pacific abalone is still fragmentary and different signaling pathways implicated in immune response also remain incomplete.
To date, genome sequence of any Haliotis is still unavailable, which limits resources of molecular information. In recent years, de novo approach has offered highly-effective technology for analysis of gene expression, discovery of novel transcripts, identification of differentially expressed genes and others [10]. To understand the abalones biology, genomic resources especially transcriptome sequencing for these non-model organisms are necessary. This can lead to the discovery of a large number of candidate genes and address differential gene expression levels that are valuable for further genomic researches.
From broader perspective, earlier studies on the abalone transcriptome is limited but has been increased in recent years [1,[11][12][13][14]. Huang et al. [11] used transcriptome sequencing to identify genes that were expressed during early development of abalone. Franchini et al. [1] characterized the South African abalone (Haliotis midae) transcriptome with more than 25 million short reads resulting in more than 20,000 relatively short assembled transcripts. Shiel et al. [14] characterized green lip abalone (Haliotis laevigata) and mainly focused on the molecular chaperone, Heat Shock Protein 70 (HSP70). Published mitochondrial genomes are available for a few abalone species including Haliotis rubra [15] and Haliotis laveigata [16] whereas organ specific transcriptomes were studied in Haliotis asinine (Jackson et al. 2010) [17] and Haliotis rufescens [13]. Organs are connected by substance flow to maintain their overall functions, however the details of which have yet to be made clear at cellular and molecular levels. It has been reported that, hepatopancreas function as an immune organ and primary site to synthesize and excrete immune molecules, such as pattern recognition proteins and lectin or lectin related proteins in all invertebrates [18].
In the present study, we have generated a reference transcriptome for H. discus hannai that represents multiple tissues responding to multiple stressors common to aquaculture environments. In addition, we have also analyzed hemocytes, gill, hepatopancreas and mantle transcriptome of H. discus hannai that was infected with the Gram-negative bacteria Vibrio parahemolyticus. We further annotated the transcripts by matching them against uniprot, Gene ontology (GO), and Kyoto Encyclopedia of Gene and Genome (KEGG). A subset of these unigenes resulted potential immune molecules of different signaling pathways. The obtained transcriptome data provide an invaluable genetic resource to study the genome and functional genes of abalones and also for comparative genomic studies with other bivalves.

Abalone and bacterial strains
Abalone, Haliotis discus hannai (mean body length 62±1 mm body weight 50±1g, body height 15±1 mm, body width 45±1 mm) were supplied from the Genetics and Breeding Research Center (GRBC) of the National Fisheries Research and Development Institute (NFRDI), Republic of Korea. Abalones were maintained in 6 tons flow-through tank at 18-20°C under a natural photoperiod. The bacterial strain 8M 4-1 which was used for challenge experiment in the present study was isolated originally from the Ark Shell (Scapharca broughtonii) during an outbreak of mass mortality in Gangjin bay, Southern Korea in June and September 2012 [19]. According to the analysis of API 20NE profiles and 16S rDNA sequence, the isolate was identified as Vibrio parahaemolyticus by API 20NE profiles and 16S rDNA sequence (showing 99% identity with other public reported V. parahaemolyticus). According to the 50% lethal dose (LD 50 ) test with the 8M4-1 strain, the LD50 value was determined at 1.0×10 6 CFU/mL (data not shown).V. parahemolyticus from a single working stock was streaked and grown onto a brain-heart infusion (BHI) agar plate (BD Biosciences, USA) at 25°C overnight. To prepare the injection dose, V. paraheamolyticus was harvested from the plates and suspended in phosphate-buffered saline (PBS), quantified by optical density at 600 nm and then serially diluted in the PBS buffer to obtain the test dosages. The serial diluted bacteria at each OD 600 values were confirmed by counting colony-forming units (CFU) on the BHI agar plate. OD 600 values of 0.1 and 1.0 were determined to be equal to approximately 1.2×10 8 CFU/mL and 1.0×10 9 CFU/mL, respectively. For validation, abalones were infected with the intracellular injection 100 μL of V. parahemolyticus (1.2×10 8 CFU/mL) suspended in PBS buffer, and a control ones were injected with PBS buffer. Tissue samples (hemocytes, mantle, gill and hepatopancreas) from the abalones (three individuals per time point) were collected at 0, 3, 6, 9, and 12 h after infection and immediately ground under liquid nitrogen for RNA extraction. Hemocytes were extracted by centrifugation at 3000rpm for 5min 4°C. In order to check the bacterial challenge, we performed qRT-PCR of V. parahemolyticus genes (tlh, trh, toxR, toxRS) using control and infected samples. Unfortunately, the qRT-PCR results could not obtained significantly due to too short infection time (data not shown). As H. discus hannai is not an endangered or protected species, and collections were only made from GRBC of NFRDI which is a government funded research institute. Hence, no specific permits were required for the described study.

Next generation sequencing of transcriptome
To obtain high-throughput transcriptome data of Haliotis discus hannai, we implemented Illumina-based NGS sequencing. Total RNA was extracted individually from the six tissues (hemocytes, mantle, gonad, digestive gland, gill and hepatopancreas) of control and four tissues (hemocytes, mantle, gill and hepatopancreas) of infected, using TRIzol reagent (Invitrogen) according to the manufacturer's protocol. Total RNA was then quantitated using Nanodrop spectrophotometer (Thermo Scientific) and quality-assessed by RNA 6000 Nano assay kit (Agilent) and Bioanalyser2100 (Agilent). NGS sequencing libraries were generated from one microgram of total RNA using TruSeq RNA Sample Prep Kit (Illumina) according to the manufacturer's protocol. In brief, the poly-A containing RNA molecules were purified using poly-T oligo attached magnetic beads. After purification, the total poly A + RNA was fragmented into small pieces using divalent cations under elevated temperature. The cleaved mRNA fragments were reverse transcribed into first strand cDNA using random primers. Short fragments were purified with a QiaQuick PCR extraction kit and resolved with EB buffer for end reparation and addition of poly (A). Subsequently, the short fragments were connected with sequencing adapters. Each library was separated by adjoining distinct MID tag. The resulting cDNA libraries were then paired-end sequenced (2x101bp) for control organs with HiSeq™ 2000 system and paired-end sequenced (2x300bp) for infected samples with MiSeq (Illumina). window:4:15, and minlen:30. Processed sequences were checked for the bacterial contamination with marine metagenome whole genome shotgun (WGS) sequences (Bio Project: PRJNA13694) downloaded from NCBI. Preprocessed clean reads were mapped to marine metagenome database using Bowtie2 with default parameters and removed those mapped reads with their respective pairs, from now these sequence called as preprocessed. Total, preprocessed sequences from HiSeq were pooled together and assembled with Trinity assembler [20] using default values. To remove the redundant sequences, CD-HIT-EST [21] was used with 95% sequence similarity. To confirm the assembly, the sequence reads from infected samples were mapped to the assembled transcriptome, showing high mapping rate (~91%), the MiSeq were not used for the assembly. Finally the transcripts more than 500 bp were selected as reference transcriptome. Reference transcriptome were subjected to functional annotation using BLASTX mapping (e-value cut-off 1e -5 ) against to UniProt KB (Metazoa) database and the Gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway maps were using Blast2GO [22]. GO annotations were classified using the WEGO web server [23].

Identification of differentially expressed genes
Differentially expressed genes (DEGs) were measured by counting tags from infected samples against the non-infected H. discus hannai and normalized using the RNA Sequence Expected Maximization (RSEM) method [24]. Initially, reads from non-infected tissues (hemocytes, mantle, gill and hepatopancreas) were mapped to reference transcriptome and subjected to check the differential expression using trinity utility scripts (align_and_estimate_abundance.pl and abundance_estimates_to_matrix.pl) as instructed (http://trinityrnaseq.github.io/). From the edgeR statistics files, regulated transcripts across libraries were filtered with default parameters (i.e. 1 log 2 (FC), FDR < 0.01) using python scripts. To obtain the differential expression pattern from GenBank datasets, same procedures were followed.

Experimental validation with qRT-PCR
To validate the transcriptom dataset, five random transcripts from immune-related genes were selected for real-time qPCR confirmation. Primer sequences were designed using (http:// fokker.wi.mit.edu/primer3/) and the related information are shown in S4 Table. According to the FastStart DNA Master SYBR Green I (Roche Diagnostics) protocol, the reactions were run on LightCycler system (Roche Diagnostics) using 20 mL reaction system. Reaction procedures were: 95°C 10 s, 45 cycles at 60°C 5 s, 72°C 20 s with fluorescence reading. Immediately following PCR, the machine performed a melting curve analysis by gradually increasing the temperature (0.1°C/s) while measuring the intensity of fluorescence emission. The mRNA expression of each gene was normalized to 18S rRNA expression (accession no. AY319433; 18S rRNA-F, 5'-CTC ACG GAA AGA GCG CGT TTA-3', 18S rRNA R: 5'-GAC TTG CCC TCC AAT AGA TC-3') as a reference gene. Each sample was analyzed in triplicate and the data were calculated as the mean ± standard deviation (SD) of relative mRNA expression.

Sequence information sources
The GenBank database has the most up to date and comprehensive DNA sequence information where scientific communities rely on it for each and everything. To obtain the more precise expression pattern, the sequences were collected from GenBank using the keyword "Haliotis discus hannai" and the reference transcriptome were mapped using megablast against to GenBank nucleotide database with the custom parameters (E-value: 100,000, word size: 28, perc_identity:75). Sequences collected from GenBank were mined for the word "Partial/ strains" were removed. Mitochondrial genes of H. discus hannai were obtained from GenBank accession: KF724723 [25]. To confirm completeness of our transcriptome data, reads were mapped with mitochondrial genes using CLC Mapper with default parameters [length fraction 80% and similarity (90%)]. Finally, sequences were selected with 70% coverage with 2 depth of individual bases. From here, there sequences named as GenBank datasets (336 genes).

Results and Discussion
Transcriptome profile of H. discus hannai To obtain more detailed information of H. discus hannai transcriptome, adult abalones (6-8cm shell length) were collected and cDNA was transcribed from total RNA followed by cDNA library sequencing with Illumina HiSeq™ 2000 system to obtain as many transcripts as possible. Sequencing of cDNA libraries generated a total of 361,633,550 transcriptomic reads with a length of 36,524,988,550 nucleotides (nt), corresponding to 96% of clean reads. Files containing these data were deposited in the Sequence Read Archive of the National Center for Biotechnology Information (NCBI) with the accession number of SRP059307. Because there is no assembled and annotated Haliotis genomic sequences, Trinity de novo assembler (Grabherr et al. 2011) was used to assemble all the trimmed reads with optimized Kmer length of 25. The highquality clean reads were assembled into 97,828 non-redundant unigenes ranging from 500bp to 29,122 bp with an average length of approximately 1,416 bp with a threshold level of 500 bp were obtained. The number of unigenes longer than 500 bp is another measure used to compare transcriptome assembly. Among these unigenes, 25,674 unigenes (26.2%) were no more than 700 bp in length, 11,275 unigenes (11.5%) were in the length range of 900 to 1,100bp, and 13,857 unigenes (17.1%) were longer than 2,000 bp. The detailed length distribution of the transcripts is shown in Fig 1A, and a summary of sequencing and assembly results are presented in Table 1.

Functional annotations
To provide putative annotations for the reference transcripts, sequence was annotated using Blast2GO as explained in the method section. A total of 43,843 (45%) unigenes were identified from 97,828 consensus sequences of H. discus hannai. Among these, 23,560 (24%) unigenes belong to three categories of Gene Ontology (GO), and 3,337 (3%) to 7 categories of KEGG (Table 1). In annotations, the lengthy transcripts were contributed more rather than the short transcripts ( Fig 1A) and mostly genes/sequences from Lottia gigantea (43%) and Crassostrea gigas (24%) were mapped to H. discus hannai transcripts (Fig 1B). Other bivalve species in the BLASTX top-hit were Capitella teleta (3%), Branchiostoma floridae (3%) and Stronglylocentrotus purpuratus (3%). H. discus hannai (<1%) itself fell in the least position of the top-hit species distribution. This may be explained by the limited number of H. discus hannai proteins that currently available at NCBI database. The rest of the low number of matches indicates a lack of bivalve data in public databases. GO analysis of our dataset showed that among 97,828 assembled unigenes, 22,971 of them were successfully annotated by GO assignments to one or more of the three categories: biological process, cellular component and molecular function (Fig 2A), which fell further into 23, 13 and 12 subcategories with the largest ones being the ''cellular process", ''cell" and ''binding", respectively.
In summary, these terms account for a large fraction of the overall assignments in the H. discus hannai transcriptome data. For further identification of the biological pathways in H. discus hannai, KEGG pathway analysis was performed on all assembled unigenes. Enzyme commission (EC) numbers were assigned to 3,191 unique sequences, which categorized them into 125 pathways. Among those, purine metabolism (2,290), thiamine metabolism and amino benzoate degradation were the top three representatives (S2 Table).

Differentially expressed genes (DEGs) upon V. parahemolyticus infection
Crosstalk between stressors and their stress response have been extensively reported in abalones [5,8,[27][28][29][30][31][32]. To obtain a general view of DEG expression patterns, pairwise comparison of expression abundance in the RNA-seq data of four library sets were first conducted in accordance with GenBank as described in methods. A Venn diagram in Fig 2B describes, overall, 10,575 transcripts expression were altered in all four tissues after exposure to V. parahemolyticus infection. Among these tissues hemocytes, hepatopancreas, mantle and gill, a total of 9,726, 332, 83 and 170 transcripts, respectively, were found to be differentially expressed (Fig 3). Organ/tissue specific expressions were plotted individually for non-infected ( Fig 3A) against infected (Fig 3B) H. discus hannai. The detected fold changes showed that a huge number of transcripts were expressed only in hemocytes (6,195 up-regulated; 3,732 down-regulated) ( Fig  3C). In addition, during non-infected condition a significant number of genes were expressed both in hepatopancreas and mantle whereas expression changes observed in both gill and mantle. Further, we checked the full length sequence of our resulted transcripts in GenBank. The results showed that greatest number of genes were up regulated in hepatopancreas and mantle, whereas down regulated in hemocytes and gill (Fig 3D). It has been reported that 78 genes in hemocytes showed differential expression patterns upon bacterial infection in H. diversicolor [29]. However, most studies in abalones were reported that the tissue specific characterization or expression pattern of genes with respect to dietary supplements [4].

Validation of de novo transcripts using quantitative real-time PCR
In order to validate the transcripts identified from Illumina sequencing, quantitative real-time RT-PCR (qRT-PCR) analysis was performed with randomly selected from immune-related genes (S1 Fig). Results showed that expression value of these genes was in good agreement with the RNA-seq analysis.  Candidate genes involved in H. discus hannai immune response The innate defense mechanisms described in invertebrates have been extensively reviewed [33,34]. Though there are several reports on isolation and characterization of immune related genes in H. discus hannai [5][6][7][8][9], molecular basis on the immune response to pathogens are still rare. GO classification revealed 524 and 150 unigenes belongs to two subcategories: response to stimulus and immune system process respectively (Fig 2A). KEGG pathway analysis revealed three signaling pathways, including the T-cell receptor signaling pathway, phosphatidylinositol signaling system and mTOR signaling pathway (S1 Table). These signaling pathways provide comprehensive information for the understanding of the H. discus hannai immune system. Among them, we focused on key genes corresponding to binding protein, apoptosis and cell cycle related protein, pattern recognition protein (PRR), immune regulator, protease, protease inhibitor, reduction/oxidation (redox) related protein, signal transduction related protein, and stress protein. They were found in our transcriptome dataset infected with V. parahemolyticus. Subsequently, we selected several genes that have relatively 100% coverage, 2 fold regulated and its full length sequences available in public databases ( Table 2). C-type lectins (CTLs) play an important role in the innate immunity of invertebrates. It specifically binds to the carbohydrate residues present on the surface of pathogens. CTLs are widely distributed in the marine populations as the first line of defense and well characterized in oysters [35], but less information is available for abalones. Recently, Zhang et al. [5] reported that, a novel C-type lectin from H. discus hannai (HdCTL1) was involved in host defense against gram-negative bacterial pathogens in the presence of mannose.
Pattern recognition receptors (PRRs) recognize the invading pathogens by innate pattern recognition, and such recognition in the early stage of infection is an important process in the prevention of disease outbreak. Toll-like receptors (TLRs) and peptidoglycan recognition protein (PGRP) are one among the PRRs family, known to induce immune responses against the pathogens by interacting with evolutionarily conserved pathogen-associated molecular patterns (PAMPs) including lipopolysaccharides or peptidoglycan in bacterial cell wall, β-1,3-glucan on fungal cell wall and dsRNA from viruses [36]. These pattern recognition receptors, AbTLR and AbPGRP have been characterized in H. discus discus [31]. The AbPGRP showed significant homology with other molluscan PGRPs implies that it showed high level of expression in hemocytes. Similar expression pattern of 26 TLR and/or PGRP transcripts were observed in our results during exposure to bacteria. In addition, 494 T cell receptor signaling pathway transcripts were identified in which 53 transcripts showed higher expression (RPKM>0.3) in hemocytes compared to other tissues. Information about transcripts that showed homology to molecules involved in T cell receptor signaling pathway is included in S2 Table. The engulfment of bacteria by phagocytic cells results in the activation of numerous innate immune signaling pathways. Reactive oxygen species (ROS) generation is one of the most ancient and efficient means to kill pathogens and it is functionally linked to phagocytosis. Free oxygen radicals and antioxidants are highly toxic and prevent colonization of microbial invaders. Glutathione-S-transferase (GST), metal dependent superoxide dismutase (SOD), peroxiredoxin and catalase are the major antioxidants responsible for abalone innate immunity. Previously, a number of redox genes were characterized and their expression pattern was studied widely in abalones [37][38][39][40][41][42][43][44]. It has been reported that almost all these redox genes were expressed only in hemocytes. In our data, large number transcripts were shown to be down regulated in hemocytes, indicate that ROS has been maintain during the bacterial infection (S3 Table).  In-house analysis with mitochondrial genome Mitochondria emerge as a key component in host innate immune defense, as this organelle is a major cellular source of ROS, which contribute in many ways to host defense against pathogens. We performed an in-house coverage analysis of our infected transcriptome profile with the complete H. discus hannai mitochondrial genome published recently by our group [25]. The results are presented as total number of reads matching with the genes. Overall, 96.57% (16,307 bp) of the mitochondrial genome was mapped with our transcriptome data. Amongst, 13 coding genes, 2 rRNAs and 3 tRNAs showed 100% coverage with the reads. In contrast, the tRNAs varied enormously in their coverage levels. However, when comparing normalized expression, the most highly expressed genes were nd4L and nd6, especially in gill. Five genes had low expression in hepatopancreas, namely nd4, nd4L, nd6, rrn12 and rrn16. The genes encoding subunits of the respiratory complexes (nd1, nd2, nd3, cytb, cox1 to cox3, atp6 and atp8) showed low expression in all tissues and after normalization (Fig 4). Conclusion This is the first comprehensive investigation on the transcriptome of H. discus hannai. In this study, comparative analysis of transcriptome changes between V. parahemolyticus infected and non-infected H. discus hannai revealed a large number of differentially expressed candidate genes and pathways related to immune response were identified in the transcriptome dataset, providing abundant genomic data for future studies on the molecular mechanisms behind important abalone aquatic environment infected with various pathogens.
Supporting Information  Author Contributions