De Novo Assembly of Mud Loach (Misgurnus anguillicaudatus) Skin Transcriptome to Identify Putative Genes Involved in Immunity and Epidermal Mucus Secretion

Fish skin serves as the first line of defense against a wide variety of chemical, physical and biological stressors. Secretion of mucus is among the most prominent characteristics of fish skin and numerous innate immune factors have been identified in the epidermal mucus. However, molecular mechanisms underlying the mucus secretion and immune activities of fish skin remain largely unclear due to the lack of genomic and transcriptomic data for most economically important fish species. In this study, we characterized the skin transcriptome of mud loach using Illumia paired-end sequencing. A total of 40364 unigenes were assembled from 86.6 million (3.07 gigabases) filtered reads. The mean length, N50 size and maximum length of assembled transcripts were 387, 611 and 8670 bp, respectively. A total of 17336 (43.76%) unigenes were annotated by blast searches against the NCBI non-redundant protein database. Gene ontology mapping assigned a total of 108513 GO terms to 15369 (38.08%) unigenes. KEGG orthology mapping annotated 9337 (23.23%) unigenes. Among the identified KO categories, immune system is the largest category that contains various components of multiple immune pathways such as chemokine signaling, leukocyte transendothelial migration and T cell receptor signaling, suggesting the complexity of immune mechanisms in fish skin. As for mucin biosynthesis, 37 unigenes were mapped to 7 enzymes of the mucin type O-glycan biosynthesis pathway and 8 members of the polypeptide N-acetylgalactosaminyltransferase family were identified. Additionally, 38 unigenes were mapped to 23 factors of the SNARE interactions in vesicular transport pathway, indicating that the activity of this pathway is required for the processes of epidermal mucus storage and release. Moreover, 1754 simple sequence repeats (SSRs) were detected in 1564 unigenes and dinucleotide repeats represented the most abundant type. These findings have laid the foundation for further understanding the secretary processes and immune functions of loach skin mucus.


Introduction
Fish skin has vital biological functions including chemical and physical protection, sensory activity, behavioral purposes, thermoregulation, hormone metabolism, maintenance of fluid balance and osmotic homeostasis [1,2].In general, fish skin is composed of three layers namely the epidermis, dermis and hypodermis [1,2].One of the most distinctive features of fish skin is the production of mucus by the unicellular glands of epidermis, mainly goblet cells and club cells [3,4].Compared with terrestrial vertebrates, fishes live in a more adverse environment which contains greater numbers of pathogenic organisms such as bacteria, fungi and parasites.As the interface between environment and the inner body, fish skin is persistently exposed to environmental stressors and provides an important first line of defense against the attachment and penetration of various invading pathogens [5][6][7].
The epidermal mucus is suggested to be one of the most important protective substances associated with fish skin [8].This notion is supported by the isolation and identification of numerous immune factors in fish skin mucus, including immunoglobulins, antimicrobial peptides, lysozymes, protease, lectins, C-reactive protein and complement proteins [9][10][11][12][13][14][15][16][17][18][19][20].Certain substances in fish mucus have been reported to function in promoting wound healing [8,21].Thus, the importance of fish skin mucus has attracted extensive research interests in recent years.However, due to the lack of genomic and transcriptomic data for non-model fish species, identification and characterization of bioactive substances produced by fish epidermis have been conducted using conventional biochemical methods, which restricts the ability of researchers to uncover the full repertoire of mucous substances.Therefore, molecular mechanisms underlying the development and maturation of mucous cells, the synthesis and release of mucus bioactive products, and the responses of mucus cells to environmental stressors and pathogens remain largely unknown.
Mud loach (Misgurnus anguillicaudatus) belongs to Actinopterygii Cypriniformes Cobitidae (http://www.fishbase.org),which is a freshwater fish widely spread in eastern Asia.The loach skin is composed of many mucous cells, which can produce a mucin rich mucus layer on the body surface.Mud loach is commercially important in China, both for traditional Chinese medicine and food purposes [22].The cultured output of mud loach in China mainland has reached 204552 tons in 2010 [23].A peptide prepared from the muscle [24] and a polysaccharide (named MAP) isolated from the skin mucus of mud loach exhibited antiproliferative and apoptotic effects on human cancer cell lines [22,25].The polysaccharide MAP also displayed protective effects on immunological liver injury in mice [26].Several antimicrobial peptides including misgurin [27], hepcidin [28] MAPP [29] were also identified from the whole body homogenates of mud loach.Despite the economical and medical significance of mud loach, limited genetic resources are currently available and only 22158 ESTs for this species can be found in public database.
In this study, we aim to assemble and characterize the skin transcriptome of mud loach using RNA-seq.The sequencing was performed using Illumina Genome Analyzer IIx platform.The high quality reads were subjected to de novo assembly.The performances of assembly tools including Oases [38], Trinity [39] and SOAPdenovo-Trans (http://soap.genomics.org.cn/index.html) were compared and the data set produced by Oases were used for subsequent analyses.A total of 40364 unigenes (.100 bp) were finally assembled from the filtered short reads and the quality of the assembled transcriptome was confirmed by RT-PCR and Sanger sequencing.Blastx searches against the NCBI nonredundant (nr) protein database annotated 17336 sequences and 86.9% of these sequences exhibited high homology with fish proteins.Gene ontology and KEGG orthology annotations identified GO terms and KEGG pathways highly represented in the skin transcriptome of mud loach.The data presented here will lay the foundation for investigation of the molecular mechanisms underlying the immune functions of fish skin and the epidermis mucus production.

Illumina paired-end sequencing and de novo assembly
To characterize the skin transcriptome of mud loach, total RNA samples isolated from the skin of mud loach were subjected to library construction and high-throughput sequencing using the Illumina GA IIx platform.The main steps and bioinformatics tools used for data analysis were shown in Figure 1.A total of 111.0 M data for 36 bp paired-end reads (3.99 Gbp) were generated in this study.After trimming and filtering the raw reads by PRINSEQ with strict quality score threshold (Q$20, length$31 bp), 86.7 M of high quality reads (3.07 Gbp, 76.94%) were retained (Figure 1).
De novo assembly was performed using transcriptome assembly tools including Oases, Trinty and SOAPdenovo-Trans and performances of these tools were assessed according to N50 value, mean length, maximum length and transcript/scaffold numbers.Multiple k-mer values can be used by Oases and SOAPdenovo-Trans, while the version of Trinity used in this study takes a single k-mer value of 25.When compared to those from other two assembling tools at the same k-mer value of 25, the dataset generated by Trinity contained the highest number of transcripts and total bases, but exhibited the lowest continuity parameters (Table S1).SOAPdenovo-Trans gave higher continuity parameters including mean sequence length, maximum length and N50 value than Oases at all k-mer values; however, Oases used more reads and produced more transcripts.In addition, the transcripts assembled by Oases contained no gaps, but considerable portion of gaps (1.36-7.5%)were included in the assembly produced by SOAPdenovo-Trans (Table S1).As the optimal assembly of transcripts with different abundance requires different k-mer values, the combination of transcripts generated at different k-mer values would markedly improve the quality of assembly [40].The merge of transcripts from different k-mer assemblies can be performed with Oases [38].After the merging process, the contiguity parameters of the Oases assembly appear to be comparable with those of the best SOAPdenovo-Trans assembly (Table S1).
Therefore, the merged transcript datasets from Oases assembly were used for subsequent analysis.The transcripts were further assembled and clustered using the TIGR Gene Indices clustering tools (TGICL) [41] with default parameters to reduce the data redundancy.The longest sequence in each cluster and the singletons were retained and designated as unigenes.Taken together, a total of 40364 unigenes were assembled.The mean length, N50 value and maximum length of assembled unigenes were 387, 611 and 8670 bp, respectively (Table 1).Length distribution of the assembled unigenes is displayed in Figure 2. The majority of sequences (71.99%) are ranged from 100 to 400 bp and 3257 (8.07%) unigenes are longer than 1 kb.These results are comparable with those of previous studies [42,43].

Quality assessment of the assembly
To evaluate the quality of the assembly, the unigenes were first aligned with the mRNA sequences of Misgurnus genus available in GenBank database using the blastn tool.Among the 57 mRNA sequences for M. anguillicaudatus, M. mizolepis and M. fossilis, 24 (42.11%) of them identified counterparts (e-values from 5E-70 to 0) in the mud loach skin transcriptome with sequence identities ranging from 88.35% to 99.85%.The missing of some sequences in our assembly may be attributed to the low abundance of these transcripts in the skin of mud loach.Additionally, more than 90% of the full length for 12 out of 17 mRNA sequences (70.59%) with complete CDS was found in the assembled skin transcriptome of mud loach (Table S2).
The sequence accuracy of assembled unigenes was further examined using RT-PCR and Sanger sequencing.A total of 25 unigenes with homology to known proteins were selected for this purpose.The primer sequences, size of amplicons, sequence description, length and abundance of these unigenes are displayed in Table S3.Total RNA samples isolated from skin, brain, gill, muscle, liver, intestine, testis and kidney were subjected to cDNA synthesis followed by RT-PCR analysis.Most of RT-PCR reactions at the first time produced specific amplicons, but nonspecific products were detected in reactions for two unigenes.However, these two unigenes were specifically amplified using redesigned primers, indicating that the previous unsuccessful amplification may be ascribed to the non-specificity of primers or assembly errors at the primer sites.The PCR products were sequenced and the sequencing data were aligned with the corresponding unigenes.As shown in Table S3, all the PCR products demonstrate the desired size and no insertion or deletion events exist in these sequences, but substitutions with an overall rate of 1.19% (199 out 16718) are found probably due to alternative splicing of exons or sequencing errors.Furthermore, most of these unigenes were extensively expressed in multiple tissues (Figure S1).These results indicate a high quality transcriptome of mud loach skin has been obtained through this study.

Expression abundance of assembled unigenes
The abundance information is useful for understanding the function of identified genes.Relative abundance of the assembled unigenes was calculated using RSEM, which was reported to be accurate and powerful in abundance estimation of sequences from RNA-seq experiments, and was particularly suitable for de novo assemblies [44,45].When two mismatches are allowed in the mapping seed, a total of 68579975 (79.14%) reads were mapped back to the assembly with at least one alignment and the average sequencing coverage is 158-fold for the unigenes assembled in this study.The abundance of unigenes were expressed as TPM (Transcripts per million).This parameter is independent of the mean length of expressed transcripts and comparable across samples and species [44,45].The relationships between the length and abundance of unigenes are displayed in Figure 3 and most of the unigenes have an abundance value less than 400 TPM (Figure 3, Table S4).The top ten most abundant unigenes are displayed in Table 2 and the most abundant unigene was annotated as ictacalcin according to the result of blast search against the NCBI nr protein database (e-value = 1.83E-29).This  gene encodes a calcium binding protein and was reported to be highly expressed in the skin of zebrafish embryos [46].The rest of most abundant unigenes were annotated as keratin, ribosomal proteins, cytochrome c oxidase subunit, senescence-associated protein and ATP synthase subunit beta, respectively (Table 2).
Obviously, these data will be of great value for further investigation of the physiological functions of fish skin.

Functional annotation of assembled unigenes
To further understand the mud loach skin transcriptome, the assembled unigenes were aligned to the NCBI nr protein database and zebrafish transcriptome.The number of unigenes with significant blast hits (e-value,1E-5) to known proteins and zebrafish transcripts were 17336 (43.76%) and 7615 (18.87%), respectively.The descriptions of best hits are shown in Table S4 and the e-value distributions of blast hits are displayed in Figure 4A.6.53% of the blast hits against the nr protein database exhibited an excellent match (E-value = 0) and 60.76% of the sequences were annotated with blast E-values from E-6 to E-50, while these data were 29.01%and 24.82% for the blast hits from zebrafish transcriptome.These results suggest the relatively high homology of mud loach unigenes to zebrafish transcripts.The species distributions of best blast hits are shown in Figure 4B.15062 (86.88%) blast hits against the nr protein database were originated from fish species and 779 (4.49%) hits were from other vertebrates including mammals, birds and reptiles.The remaining 1495 (8.62%) sequences were from other species mainly microorganisms.The relatively high ratio of blast hits from microorganism sequences may be ascribed to the attachment of microorganisms to the skin of mud loach and the high sequencing depth in this study.
The results of blastx searches against the nr protein database were imported into Blast2GO [47] for GO mapping and annotation.A total of 108513 GO terms were assigned to 15369 unigenes (38.08% of the total assembled sequences).The numbers of mapped GO terms for biological process, molecular function and cellular component were 52231, 27565 and 28717, respectively.The GO classifications at level 2 of assembled unigenes are displayed in Figure 5. Cellular process (8748, 21.67%) and metabolic process (8355, 20.70%) were the most highly represented terms in the category of biological process, followed by biological regulation (5472, 13.56%), developmental process (3361, 8.33%), cellular component organization (3312, 8.21%), multicellular organismal process (3178, 7.87%), localization (2764, 6.85%), establishment of localization (2754, 6.82%), response to stimulus (2751, 6.82%) and pigmentation (2638, 6.54%).These data indicate that the genes expressed in mud loach skin are involved in a wide variety of biological processes.The most abundant molecular function ontologies were binding (10328, 25.59%) and catalytic (6122, 15.17%), whereas the remaining GO terms such as molecular transducer activity and transcription regulator activity were assigned to far less unigenes.In the cellular component category, cell (11679, 28.93%), cell part (10775, 26.69%) and organelle (8289, 20.54%) were the most abundant terms (Figure 5).
KEGG annotation of the assembled unigenes was performed using the KAAS [48].A total of 9337 (23.23%) assembled unigenes were assigned with KO identifiers and 213 pathways were associated with more than 5 unigenes.The well represented pathways were focal adhesion (295), protein processing in endoplasmic reticulum (291), regulation of actin cytoskeleton (271), endocytosis (264), phagosome (264), RNA transport (257), MAPK signaling pathway (240), spliceosome (237) and ribosome (228) (Table S5).The number and ratio of sequences associated with each sub-category in the five top KO categories including metabolism, genetic information processing, environmental information processing, cellular processes and organismal systems, are shown in Figure 6.Among the identified functional categories, immune system (866), signal transduction (808), transport and catabolism (783), translation (763) and folding, sorting and degradation (716) were the most highly represented ones, followed by cell communication (532), carbohydrate metabolism (456), endocrine system (452) and cell growth and death (426).The results of GO and KEGG annotations provide firsthand information for investigation of the tissue-specific processes and functions of fish skin.

Identification of non-coding RNAs
Given the existence of unigenes that were not mapped to known proteins and zebrafish transcripts by blast searches, we characterized the non-coding RNAs in the assembled dataset by searching against the Rfam database [49].The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs) [49].As shown in Table S6, a total of 138 unigenes without significant blast hits in the nr protein database were identified as putative non-coding RNAs (e-value,1E-3).Small nucleolar RNA and micro RNA were the most abundant ncRNA families with 26 identified members in each group and the number of associated unigenes was 27 and 30 (Table 3).Additionally, 33, 11, 6, 6, 3 and 3 unigenes were mapped to unaL2 LINE 39 element, 5S ribosomal RNA, 5.8S ribosomal RNA, selenocysteine insertion sequence 1, tRNA and let-7 microRNA precursor, respectively (Table 3).

Immune pathways annotated from mud loach skin transcriptome
Although immune genes and pathways in fish tissues such as gill, liver, spleen, head kidney and larvae of turbot [50] and head kidney of grass carp [51] have been previously characterized using RNA-seq, the immune factors in the mud loach skin remains unknown.As shown in Figure 6, immune system was the most highly represented KO sub-category from the mud loach skin transcriptome, suggesting a large number of genes expressed in the skin are associated with immunity.All the immune pathways annotated in the mud loach skin transcriptome are listed in Table 4. Pathways associated with the highest number of unigenes were leukocyte transendothelial migration (216) and chemokine signaling (208), followed by Fc gamma R-mediated phagocytosis (145), T cell receptor signaling (141), antigen processing and presentation (139), natural killer cell mediated cytotoxicity (117) and Toll-like receptor signaling pathway (114).Among these immune pathways, Fc gamma R-mediated phagocytosis (42/55, 76.36%) and antigen processing and presentation (31/41, 75.61%) contained the highest ratios of identified genes versus the total number of known genes in the reference pathway.Thus, these results have provided an overview of the pathways involved in the immune functions of mud loach skin.

Enzymes involved in the mucin type O-glycan biosynthesis pathway
Mucins are highly O-glycosylated glycoproteins ubiquitous in mucous secretions on cell surfaces [52].Although the functions of mucins are well known and several fish mucin genes have been cloned [53], enzymes involved in the biosynthesis of mucins in the epidermal of fish remains to be characterized.A total of 37 unigenes from the mud loach skin transcriptome were mapped to 7 enzymes in the mucin type O-glycan biosynthesis pathway.The name of mapped enzymes, EC number and identity with corresponding enzyme from zebrafish are listed in Table 5. 24 unigenes assigned to the enzyme GALNT (EC2.4.1.41)were 82.86 to 98.59% homologous with the zebrafish N-acetylgalactosaminyltransferase family members including Galnt4, 5, 6, 7, 8, 11, 12 and 14.Other enzymes such as glycoprotein-N-acetylgalactosamine 3beta-galactosyltransferase (EC2.4.1.122),beta-galactoside alpha-2, 3-sialyltransferase (sialyltransferase 4A, EC2.4.99.4) and C1GALT1-specific chaperone 1 (EC2.4.1.-)were mapped by 2 or 3 unigenes with sequence identity from 52.05% to 95.45%.These results provide interesting clues for the characterization of enzymes responsible for mucin biosynthesis in mucous cells of fish skin.

Genes associated with the SNARE interactions in vesicular transport pathway
Like goblet cells in the airways and intestine of mammals, mucus secreting cells in fish epidermis package their products in Figure 6.KEGG orthology (KO) classification of assembled unigenes.KO functional annotation was performed using KEGG Automatic Annotation Server (KAAS).The sequences of assembled unigenes were submitted to KAAS and the homology to KEGG genes were calculated using single-directional best hit (SBH) method.KO functional hierarchy of mapped pathways was viewed by KegHier software downloaded from the KEGG web site.The number and ratio of sequences assigned to each sub-category of the five top KO categories, namely metabolism (red), genetic information processing (blue), environmental information processing (green), cellular processes (organe) and organismal systems (purple), were calculated and displayed.doi:10.1371/journal.pone.0056998.g006secreting vesicles and release the contents through exocytosis [4,54,55].Vesicle trafficking from endoplasmic reticulum to Golgi apparatus and then from Golgi apparatus to plasma membrane represent the main events during mucus secretion [55].SNAREs are the key factors mediating membrane fusion between vesicles and their target membrane [56] and therefore play important roles in mucus secretion.In this study, a total of 38 unigenes were mapped to 23 genes of the SNARE interactions in vesicular transport pathway and only 4 factors in this pathway were not covered by the skin transcriptome of mud loach (Table S5 and Figure 7).The identified genes in this pathway included 9 syntaxins (Stx), 5 vesicle-associated membrane proteins (VAMP), 2 synaptosomal-associated proteins (SANP), 2 vesicle transport protein (Sec), 2 golgi SNAP receptor complex members (Gos1 and Bos1), Ykt6, Vti1 and Bet1 (Figure 7).These findings indicate the high activity of vesicular transport pathway in the epidermis of mud loach.

Discussion
Fish skin plays various vital functions especially in immunity and defense against invading pathogens and environmental stressors [5,57].Recently, the immune functions of fish skin have attracted intensive interests of the research community, and a large number of antimicrobial and bioactive substances have been identified in the skin mucus of fish [15,25,58].However, the molecular mechanisms underlying the biosynthesis, packaging and releasing of substances in skin mucus remain largely unknown due to the lack of genomic information such as the high quality skin transcriptome for economically important fish species.Mud loach, a freshwater fish with plentiful epidermal mucus on the body surface, has a relatively small body size and a strong ability of resistance to environmental stressors, and is amenable to experimental manipulations.Thus, mud loach has the potential to be a fish model for investigation of skin immunity and mucus secretion.Moreover, mud loach has long been used for food and medical purposes in eastern Asia.Therefore, characterization of its transcriptome will be of great value for the breeding, cultivation and disease prevention of this species.
In this study, we characterized the skin transcriptome of mud loach using Illumina paired-end RNA sequencing.Similar to the previous studies investigating the bark transcriptome of rubber tree [43] and the root transcriptome of sweet potato [59], this study has focused on the skin tissue and produced relatively short read length (36 bp), which is cost-effective and allows to increase the sequencing depth (111.0M raw reads, 158-fold coverage) and strict read filtering thresholds.To improve the assembly, we compared the performances of three trancriptome assembly tools including Oases, Trinity and SOAPdenovo-trans.Oases was found to perform well than the other two tools in terms of used reads, assembled transcript numbers, total bases, continuity parameters and ratio of gaps.This may be ascribed to the combination of transcripts assembled with different k-mer values at the merging step of Oases since it has been previously reported that this approach can improve the performance of de novo assembly [40].It is possible that Oases is more suitable for assembling relatively short reads.The assembly generated by Oases was thus used for subsequent analyses.After reducing the redundancy with TGICL clustering, a total of 40364 unigenes were finally assembled.The sequence accuracy of this assembly was independently confirmed by blast searches against available loach mRNA sequences deposited in GenBank database, RT-PCR and Sanger sequencing.The maximum length of assembled sequences is 8670 bp, indicating that extremely long transcripts are missing from the assembly; this may be attributed to the relatively short sequencing reads generated in this study.
A large number of unigenes were annotated by blast searches against the NCBI nr protein database (17336, 43.76%) and zebrafish transcriptome (7615, 18.87%), GO mapping (15369, 38.08%) and KEGG annotation (9337, 23.23%).The classification of GO categories including biological process, molecular function and cellular component provide overall information about biological functions of genes expressed in the skin of mud loach.The ratio of annotated sequences and the highly represented GO terms were comparable with previous studies performed with nonmodel species [32,43].The KO system provides a resource to link genomic data to higher-level systemic and functional information through KEGG PATHWAY and BRITE mappings [60].The assembled unigenes were found to be involved in a wide array of KEGG pathways and KO categories.The predominantly represented pathways such as focal adhesion, regulation of actin cytoskeleton, endocytosis and phagosome all belong to the cellular processes of KO modules and are associated with molecular functions including cell communication, cell motility, cell proliferation, cell differentiation, intracellular transport and phagocytosis [61,62], suggesting that multiple cellular events are active in the skin of fish.Other highly represented pathways like RNA transport, spliceosome, ribosome and protein processing in endoplasmic reticulum are involved in the gene expression processes including transcription, translation and protein folding, sorting and degradation.The activities of these biological processes may be the basis of the quick protein biosynthesis and secreting ability of fish epidermis.
The ability of fish to secret antimicrobial materials into skin mucus has recently attracted heavy research interests and a large array of antimicrobial peptides, lectins and antibacterial enzymes have been identified in fish epidermal mucus [6,12,18,58].However, the signaling pathways involved in fish skin immune responses remain to be defined.In this study, a total of 866 unigenes that represent the largest KO category were found to be associated with immune system (Figure 6).Multiple immune pathways were identified to be active in the skin of mud loach (Table 4).These pathways are known to be involved in various aspects of immune processes such as leukocyte transendothelial migration, immune signal transduction, phagocytosis, antigen processing and presentation.Moreover, many innate immune components such as lectins, lysozymes, cathepsins and complement factors were also identified in the mud loach skin transcriptome (Table S4).RNA-seq technology has been used to mine immune-related genes and pathways in several fish species [50,51,63].Immune pathways including complement pathway, Toll-like receptor signaling pathway, B cell receptor signaling pathway, T cell receptor signaling pathway, apoptosis or programmed cell death and cytokines have been identified in the transcriptome generated using pooled RNA samples from gill, liver, spleen, head kidney and brain of turbot (Scophthalmus maximus), which was challenged with viruses [50].In the spleen of large yellow croaker (Pseudosciaena crocea) infected with Aeromonas hydrophila, chemokine signaling, T cell receptor signaling, Leukocyte transendothelial migration and B cell receptor signaling pathway are found to be differentially regulated [63].Findings from this study provide a full view of the immune components equipped with fish skin.Mucins are the main constituents of fish skin mucus, which can form a matrix to trap and hold a wide range of antimicrobial molecules and are responsible for the gel-like nature of mucus.Mucins are high molecular weight glycoproteins characterized by their extensive O-glycosylation [52].The O-glycosylation modification of mucins begins with the addition of N-acetylgalactosamine (GalNAc) to serine or threonine on the tandem repeats by activity of polypeptide N-acetylgalactosaminyltransferases [64].The glycosyltransferases responsible for the elongation of O-glycan chain by sequential addition of carbohydrates are expressed in a cell-and tissue-specific manner [65].The skin transcriptome of Atlantic salmon (Salmo salar) was recently assembled and several mucin genes were identified [66].However, the enzymes involved in the biosynthesis of fish skin mucin remain not well known.In this study, 7 enzymes associated with mucin type O-glycan biosynthesis were identified from the mud loach skin transcriptome (Table 5).A total of 8 members in the polypeptide Nacetylgalactosaminyltransferases (EC2.4.1.41)family were found to be expressed in the skin of mud loach.These enzymes are active at the first step of O-glycosylation and belong to a large enzyme family containing at least 15 distinct members with tissue-specific expression and different substrate specificities [67].The high redundancy of these enzymes in mud loach skin indicates the complexity of mucins produced by fish skin.
Similar to mucus secreting cells in the airways and intestine of mammals, goblet cells and club cells in fish epidermis package their products in secreting vesicles and release the contents through exocytosis [4,55].SNARE proteins are the key determinants that mediate the fusion of carrier vesicles with the target membrane during the vesicle traffic between cellular compartments [56].In this study, nearly all the SNARE genes in vesicular transport pathway were identified from the mud loach skin transcriptome (Figure 7), suggesting the high activity of mucus secretion in mud loach epidermis.In the SNARE system, the vesicle membrane protein named v-SNARE interacts with the target membrane protein dubbed t-SNARE to form SNARE complex [56].Among the identified SNARE molecules, the syntaxins (Stx), synaptosomal-associated proteins (SNAP) and Bet1 are t-SNAREs; while the vesicle-associated membrane proteins (VAMP), vesicle transport proteins (Sec), Ykt6 and Vti1 are v-SNAREs [68,69].The cell organelle specific distribution of these SNAREs accounts for the distinct vesicle transport events between different cellular compartments.Although all the key molecules in various vesicle traffic steps were suggested to be conserved from yeast to human [56], the regulation of mucus storage and release in fish epidermal mucus secreting cells may be quite different from the well characterized synaptic vesicle exocytosis.Thus, the results of this study provide valuable clues for charactering the responses of vesicular transport pathway in fish epidermis to various environmental stressors.

Conclusions
In this study, we have assembled and characterized the skin transcriptome of mud loach using Illumina paired-end RNA sequencing.The assembling efforts generated 40364 unigenes.The sequence accuracy of this assembly was confirmed by RT-PCR and Sanger sequencing.A large portion (43.76%) of assembled unigenes was annotated by Blast searches.Functional classifications in terms of GO and KO have identified multiple biological processes and signaling pathways including those for immune systems and mucin biosynthesis in the skin of mud loach.To our knowledge, this is the first effort on assembling the transcriptome of mud loach skin.The data presented here will provide valuable resources for functional genomics of mud loach and investigation of mechanisms underlying the immune responses and mucus secretion of fish skin.

Ethics Statement
The animal protocol for this study was approved by the Institutional Animal Care and Use Committee of Institute of Hydrobiology (Approval ID: Y25E051501).

Samples and RNA extraction
Mud loach (average body weight 15.4262.63g) were purchased from a local fishery market at Wuhan, China and acclimated to laboratory conditions for 2 weeks.The fish were anesthetized by placing into ice-slurry for 3 minutes and anesthesia was judged by loss of equilibrium [70,71].The fish were then euthanized by incising between the skull and the first cervical vertebra using a scalpel (cervical dislocation) [72].Skin samples from the region between the dorsal fin and lateral line were collected with scissors and forceps.Total RNA was extracted with TRIZOL reagent from Invitrogen following the manufacturer's instructions.Total RNA contents were determined using the NanoDrop 8000 from Thermo Scientific.The quality of RNA samples was assessed by agarose gel electrophoresis.Equal volumes of total RNA from three individuals were combined and used for RNA-seq analysis.
cDNA library construction and sequencing cDNA library construction was performed by SinoGenoMax Co., Ltd, Beijing, China (http://www.sinogenomax.com/).Before library construction, the integrity of RNA samples was confirmed using Agilent 2100 Bioanalyzer and 4 mg of total RNA was used for isolation of polyA RNA with Sera-mag Magnetic Oligo (dT) beads from Illumina.The purified mRNA was fragmented into small pieces (100-400 bp) using divalent cations at 94uC for 5 minutes.Double-stranded cDNA was synthesized using the SuperScript Double-Stranded cDNA Synthesis kit (Invitrogen, Camarillo, CA) with random hexamer primers from Illumina.The synthesized cDNA was subjected to end-repair, phosphorylation, 39 adenylation and adapter ligation in sequential.After these steps, cDNA fragments ranging from 250 to 350 bp were collected and purified by gel electrophoresis.Then, the purified cDNA template was enriched by PCR amplification and the quality of the library was validated in a LightCycler480 (Roche Diagnostics) using an Illumina PhiX174 Control.High throughput sequencing was performed by the Analytical & Testing Center at Institute of Hydrobiology, Chinese Academy of Sciences (http://www.ihb.ac.cn/fxcszx/).The constructed library was sequenced for 36 bp at both ends using an Illumina Genome Analyzer IIx platform according to the standard Illumina protocols as reported previously [59].Sequencing experiments of constructed loach skin cDNA library generated a total of 111.0 M paired-end reads.The sequencing data have been deposited in NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/sra) with an accession number SRA057415.This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GAAD00000000.

Data preprocessing and de novo assembly
The raw read data was preprocessed using PRINSEQ (version 0.19.3)[73].The low quality (Q,20) and ambiguous bases (N) were first trimmed from both ends of the reads and then trimmed reads were filtered with Phred quality score (Q$20 for all bases) and read length ($31 bp).The trimming process spared reads with low quality bases at the ends.Paired reads were selected from preprocessed data sets using a perl script cmpfastq (http:// compbio.brc.iop.kcl.ac.uk/software/cmpfastq.php)and a total of 74.7 M paired and 11.9 M single reads were obtained.The high quality reads were assembled using Velvet (version 1.2.07) followed by Oases (version 0.2.08) [38,74], Trinity (r2012-06-08) [39] and SOAPdenovo-trans (release 1.01, http://soap.genomics.org.cn/index.html),respectively.The process of Oases assembling includes several steps [38].The sequencing reads were first assembled using Velvet at distinct k-mer values (19, 21, 23, 25, 27, 29 and 31) and the contigs produced by Velvet at each k-mer value were further assembled into transcripts using Oases.Finally, the transcript data sets assembled at different k-mer values were merged using Oases with default settings (k-mer = 27).SOAPdenovo-Trans analysis was performed at the same k-mer values as with Velvet/Oases, while the k-mer value (k-mer = 25) was fixed in the version of Trinity used in this study.The performances of these assembly tools were assessed according to parameters including N50 value, mean length, maximum length and transcript/scaffold number.The data sets produced by Velvet/Oases were selected for subsequent analyses.To reduce data redundancy, the transcripts were assembled and clustered using TGICL [41] with default parameters.The longest sequence in each cluster and singletons were reserved and designated as unigenes.

Quality assessment of the assembly
To assess the quality of sequences assembled in this study, the assembled unigenes were aligned to mRNA sequences of Misgurnus genus available in GenBank database using blast tools (version 2.2.26, http://www.ncbi.nlm.nih.gov/).Furthermore, 25 unigenes homologous to known proteins were validated by RT-PCR and Sanger sequencing.Total RNA samples were isolated from tissues including skin, brain, gill, muscle, liver, intestine, testes and kidney.cDNA template were synthesized using RevertAidTM First Strand cDNA Synthesis Kit from Fermentas.PCR primers were purchased from Sangon Biotech Co., Ltd., Shanghai, China.Specific PCR products were cloned and subjected to Sanger sequencing.The primer sequence, amplicon size and sequence description are displayed in Table S3.

Abundance estimation
The relative abundance of assembled unigenes was calculated using RSEM (version 1.1.19)[45].The unique feature of this tool is that it does not rely on the existence of a reference genome and it is therefore particularly useful for quantification with de novo transcriptome assemblies [45].The preprocessed reads were first aligned to the reference generated from the assembled unigenes with single-end read mode.Then, the relative abundances of unigenes expressed as TPM were calculated from the alignment results.

Functional annotation
The assembled unigenes were searched against the NCBI nr protein database (released on April 15, 2012 at http://www.ncbi.nlm.nih.gov) and zebrafish transcriptome (released on July 24, 2012 at http://www.ncbi.nlm.nih.gov) using blastx and blastn tools, respectively.E-value,1E-5 indicates the sequence conservation and the best aligning results were used to annotate the unigenes.The outputs of blast searching against the NCBI nr protein database were imported into Blast2GO program [47] for GO term mapping.The results of Blast2GO analysis were submitted to the WEGO [75] for GO classification under the biological process, molecular function and cellular component ontologies.KEGG annotation was performed using the singledirectional best-hit (SBH) method in KAAS web server [48].This tool is able to assign KEGG Orthology (KO) identifiers or K numbers to query sequences according to the sequence similarity and perform the pathway mapping and BRITE mapping processes [48].The KO system is structured as a four level hierarchy.The top level consists of the following six categories: metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems and human diseases.Each top level category contains a wide arrange of sub-categories (the second level).The third level corresponds directly to the KEGG pathways, and the fourth level consists of the leaf nodes representing the functional terms [60].

Identification of non-coding RNAs and EST-SSRs
To identify putative non-coding RNAs from the assembled unigenes, sequences without significant blast hits in the nr protein database were searched against the Rfam non-coding RNA database using Rfam 11.0 (http://rfam.sanger.ac.uk/) [49].Potential EST-SSR markers were detected within the assembled unigenes using the MISA microsatellite identification tool (http:// pgrc.ipk-gatersleben.de/misa/).The parameters were adjusted for identification of perfect di-, tri-, tetra-, penta-, and hexanucleotide motifs with a minimum of 6, 5, 4, 4, and 4 repeats, respectively.Mononucleotide repeats were ignored since it is difficult to distinguish genuine mononucleotide repeats from the polyadenylation products.

Figure 1 .
Figure 1.Flow chart of data analysis.The main steps and bioinformatics tools used for data analysis, volumes of raw and preprocessed data and number of assembled unigenes were displayed.doi:10.1371/journal.pone.0056998.g001

Figure 3 .
Figure 3. Relationship between the length and abundance of assembled unigenes.TPM: transcripts per million.The dashed line in the figure indicates that the abundances for most of the assembled unigenes were less than 400 TPM.doi:10.1371/journal.pone.0056998.g003

Figure 4 .
Figure 4. Blast e-value and distribution of top hit species.(A) Distribution of e-value for top hits obtained by blast searches of the unigenes against the NCBI nr protein database and zebrafish transcriptome.(B) Distribution of top hit species by blast searches against the NCBI nr protein database.doi:10.1371/journal.pone.0056998.g004

Figure 5 .
Figure 5. Gene ontology (GO) classification of assembled unigenes.Results of blastx searches against the NCBI nr protein database were imported into Blast2GO software for GO term mapping and annotation.Annotation results from Blast2GO analysis were submitted to the WEGO (http://wego.genomics.org.cn/cgi-bin/wego/index.pl) web server for GO classification.The number and ratio of sequences assigned to level 2 GO terms from sub GO categories including biological process, molecular function and cellular component were shown.doi:10.1371/journal.pone.0056998.g005

Figure 7 .
Figure 7. Map of the SNARE interactions in vesicular transport pathway.Genes identified from the transcriptome of mud loach skin were shown in red.doi:10.1371/journal.pone.0056998.g007

Figure 8 .Table 6 .
Figure 8. Frequency distribution of EST-SSRs based on motif types.A total of 162 EST-SSR motifs were identified and frequencies of main motif types were displayed.doi:10.1371/journal.pone.0056998.g008

Table 1 .
Statistics of the assembled transcripts and unigenes.
TPM: transcripts per million.The NCBI nr protein database, zebrafish transcriptome and NCBI nr nucleotide database were searched in sequential to annotate these unigenes.doi:10.1371/journal.pone.0056998.t002

Table 3 .
Classification of identified non-coding RNAs.

Table 4 .
Immune pathways annotated in the mud loach skin transcriptome.

Table 5 .
Enzymes involved in mucin type O-glycan biosynthesis.

Table S1
Statistics of the assembly generated by Oases, Trinity or SOAPdenovo-Trans.(XLSX)TableS4Length, abundance and description of the best blast hits of assembled unigenes.(XLSX)Table S5 Summary of unigenes involved in the KEGG pathways and KO categories.(XLSX)TableS6 Non-coding RNAs identified through searching against Rfam database.(XLSX)