Transcriptome profiling of claw muscle of the mud crab (Scylla paramamosain) at different fattening stages

In crustaceans, muscle growth and development is complicated, and to date substantial knowledge gaps exist. In this study, the claw muscle, hepatopancreas and nervous tissue of the mud crab (Scylla paramamosain) were collected at three fattening stages for sequence by the Illumina sequencing. A total of 127.87 Gb clean data with no less than 3.94 Gb generated for each sample and the cycleQ30 percentages were more than 86.13% for all samples. De Bruijn assembly of these clean data produced 94,853 unigenes, thereinto, 50,059 unigenes were found in claw muscle. A total of 121 differentially expressed genes (DEGs) were revealed in claw muscle from the three fattening stages with a Padj value < 0.01, including 63 genes with annotation. Functional annotation and enrichment analysis showed that the DEGs clusters represented the predominant gene catalog with roles in biochemical processes (glycolysis, phosphorylation and regulation of transcription), molecular function (ATP binding, 6-phosphofructokinase activity, and sequence-specific DNA binding) and cellular component (6-phosphofructokinase complex, plasma membrane, and integral component of membrane). qRT-PCR was employed to further validate certain DEGs. Single nucleotide polymorphism (SNP) analysis obtained 159,322, 125,963 and 166,279 potential SNPs from the muscle transcriptome at stage B, stage C and stage D, respectively. In addition, there were sixteen neuropeptide transcripts being predicted in the claw muscle. The present study provides a comprehensive transcriptome of claw muscle of S. paramamosain during fattening, providing a basis for screening the functional genes that may affect muscle growth of S. paramamosain.


Introduction
The mud crab Scylla paramamosain belonging to genus Scylla, is widely distributed along inshore region in the southeast China coasts and other Asian countries [1]. Due to the short growth cycle and the large market demand, the mud crab has high commercial value in farming [2]. The aquaculture of this species has emerged more than 100 years in China and more than 30 years in other Asian countries [1,3]. The lower yields own to many factors, such as growth, nutrition, diseases, and administration patterns [4]. Therefore, research into mechanism underlying the growth and development is of obvious importance. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 algorithm [12]. All clean reads for de bruijn assembly were deposited in GenBank, National Centre for Biotechnology Information (NCBI) under the Accession No. PRJNA389966.

Functional annotation
The unigenes were identified based on sequence similarity with known proteins and using a BLASTX (E 1e-5) search against the Non-Redundant (NR), Swiss-Prot, Gene Ontology (GO), Clusters of Orthologous Groups (KOG). The KEGG Orthology of each unigene was analyzed using KOBAS 2.0 software [13]. Unigene annotation information was obtained using HMMER (E 1e-10) search against the Pfam database with forecast of unigene amino acid sequence [14].

Structural and expression analysis
Reliable potential CDS regions from the transcript sequences were identified by TransDecoder software, based on the length of open reading frame (ORF), the Log-likelihood Score, the amino acid sequence alignments with protein domain sequence in Pfam database and other information. Only reliable, STAR mapped reads were considered for SNPs detection. SNPs were called using GATK [15]. SNPs were qualified by continuous single nucleotide mismatch within 35 bp range no more than 3 and the value of standardized serialized SNP greater than 2.0. SNP markers were divided into homozygous-(only one allele) and heterozygous-SNP (two or more alleles). SNPs density in Unigenes was computed. All clean reads were aligned with Unigene database using Bowtie [16], followed performing the estimation of expression level combined RSEM [17]. The fragments per kilobase of exon model per million mapped reads [18] (FPKM) value indicates the corresponding unigene expression abundance.

Identification and validation of differentially expressed genes
Correlation analysis was conducted with Pearson's Correlation Coefficient (r) [19] to assess the relevance of biological replicates in the same condition. DESeq [20] was performed to analyze the differentially expressed genes (DEGs) between the samples in two conditions. Benjamini-Hochberg was employed to correct the significant value (p-value) for hypothesis test to reduce false positives. False discovery rate (FDR<0.01) and fold change (FC!2) adopted as the key thresholds in difference expression genetic screening. Do hierarchical clusters to show differential expression patterns of gene sets under different experimental conditions. DEGs annotated in GO annotation database were enrichment analysis using the topGO software. COG statistical classification and KEGG annotation of DEGs were also performed. The significant enrichment of pathways was analyzed using the method of Fisher exact test.
Total RNA from claw muscle of specimens in three stages were extracted as described above. Approximately 2 μg RNA were used to the reverse transcription of cDNA. qRT-PCR was carry out in the 7500 Fast Real-Time PCR (Applied Biosystems) with 2×SYBR Select Master Mix (Applied Biosystems) to validate 10 DEGs (8 in the comparison of stage C and stage D, 2 in the comparison of stage B and stage C) expressed transcripts. Primers were designed using Primer5.0 Tool (PREMIER Biosoft International, Palo Alto, CA) with housekeeping gene 18s RNA as standard gene (Table 1). PCR reactions were performed under the following conditions: 95˚C for 30 s, 40 cycles of 95˚C for 5 s, 58˚C for 30 s and 72˚C for 30 s. Six biologic repetitions and three technical repetitions were performed in this study. The ultrapure water was the template in blank control. The 2 -ΔΔCt method was used to calculate the gene expression, and Ct values were the mean values of six biologic replicates [21]. The relative expression ratio was represented as mean±SD. All statistical analyses were performed using SPSS 18.0, including Duncan's multiple range tests and the significance of differences analyzed using one-way ANOVA.

Transcriptome sequencing and read assembly
In this study, 27 cDNA libraries of S. paramamosain were sequenced using Illumina HiSeq 2500 platform. Clean sequencing reads and alignment statistics were showed in Table 2. The cycleQ30 percentage of all samples was more than 86.13%. The assembled transcripts (n = 183,760) had a total size of 291,529,594 bp, an average size of 1,586.47 bp and a N50 assembled transcripts with length of 3,031 bp. Nearly half of (54.56%) assembled transcripts were at the length range of 300-2000 nt (Fig 1).

Functional annotation
In this study, 23,787 transcripts mapped back to the protein database. S1 and S2 Tables show the unigenes and the annotation information. Arthropods account for the largest proportion in homology analysis of S. paramamosain transcriptome. The top three organism were Nevada termite Zootermopsis nevadensis (9.93%), Alveolate Perkinsus marinus (9.77%) and Water flea Daphnia pulex (5.54%) (Fig 2). Table 3 shows the unigenes with highest quality annotations. In addition, 16 transcripts encoding neuropeptide precursors (13 complete and 3 partial) were identified from the transcriptome data (Table 4). The deduced neuropeptides include B-type allatostatin (AST-B), short Neuropeptide F (sNPF), neuroparsin (NP), crustacean hyperglycemic hormone (CHH), orcokinin, diuretic hormone 31 (DH31), tachykinin, myosuppressin,  bursicon hormone alpha subunit, putative insulin-like protein growth factor binding protein and insulin-like androgenic gland factor. The analysis of GO terms showed that the 27,917 unigenes were in the GO domains (Fig 3). In biological domain, 19 terms contained 11,924 unigenes. The top three terms were metabolic process (285 unigenes), oxidation-reduction process (223 unigenes) and translation (214 unigenes). In cellular component domain, 4,596 unigenes were found in 16 terms. There were 429 unigenes involved in nucleus, followed by 383 unigenes in integral component of membrane and 376 unigenes in membrane. In molecular function domain, a total of 11,397 unigenes were distributed in 17 terms. The ATP binding (563 unigenes) and catalytic activity (459 unigenes) were top two terms. COG analysis showed that 7,742 transcripts were assigned to 25 COG terms, the top three COG terms of which were general function prediction (2,651 unigenes), replication, recombination and repair (1,096 unigenes) as well as transcription (862 unigenes) (Fig 4). In addition, 10,495 unigenes were classified into 220 KEGG pathways, of

Structural and expression analysis
A total of 56,328 CDS were predicted from 94,853 unigenes. CDS was not detected at sequence size 0-100 (bp). The top 3 levels were at sequence size 100-200 (bp), 200-300 (bp) and 300-400 (bp). As length increases, the number decreases. Therefore, the number of glimmer cds dropped significantly at sequence size 2700-2800 (bp), 2800-2900 (bp), 2900-3000 (bp). The length of CDS is shown in S1 Fig Table 5). The distribution of SNP in unigene is showed in Fig 5. SNP-free unigenes accounted for the largest proportion, followed by the unigenes with 0-1 SNP per Kb. As the density of SNPs increases in unigene, the number of unigene decreases. In addition, the correlation analysis has been performed between SNP and 121 DEGs of claw muscle transcriptomes (S5 Table). A total of 177 SNPs were found in 31 DEGs, thereinto, 105 SNPs in 12 annotated genes.  In this study, FPKM value indicates the unigene expression abundance (Fig 6). The results showed that the biologically repeated samples of nervous tissue behaved the highest consistency with the whole of three stages. In addition, the nervous tissue has the highest overall gene expression level, followed by hepatopancreas and muscle, while the dispersion degree of gene expression was opposite. As far as muscle sample, the stage D showed the highest overall gene expression level when compared with stage B and C. However, the reproducibility of muscle samples it is best at stage C. The overall genes expression level of hepatopancreas was slightly large at stage C.

DEGs in the muscle transcriptome
The analysis of muscle transcriptome data revealed 121 DEGs at different stages, 63 (52.07%) of which were annotated successfully (Table 6). In the comparison of stage D and stage B, 2    (Fig 7).

Functional analysis of DEGs of muscle
GO terms of DEGs were analyzed. In the comparison of stage D and stage B, the gene was annotated as calcified cuticle protein like gene and classified into structural molecule activity term (GO: 0042302) in molecular function domain (Fig 8A). In comparison of stage B and C, the DEGs were assigned to 15 terms, of which 6 genes in biological process, 3 genes in cellular component and 9 genes in molecular function. "single-organism process", "membrane" and "binding" were the top terms in the three GO domains with 4, 2 and 4 genes, respectively ( Fig  8B). In comparison of stage D and stage C, 46 genes were GO-categorized into biological  process (13 genes), cellular component (8 genes) and molecular function (17 genes). The "cellular process" (10 genes), "cell part" (8 genes) and "binding and catalytic activity terms" (11 genes) were the top terms in the biological process domain, cellular component and molecular function process (Fig 8C). Table 7 shows the GO terms associated with significant DEGs. COG function classification was performed with significant DEGs. In the comparison of stage D and stage C, a total number of 17 (17.2%) genes were COG-categorized into nine COG domains, 5 genes of which were in "carbohydrate transport and metabolism", followed by 3 genes in "general function prediction only", 2 genes in "signal transduction mechanisms" and 2 genes in "amino acid transport and metabolism", one gene in the remaining domains ( Fig  9A). In the comparison of stage B and stage C, 2 genes were found in "general function prediction only", and 1 gene was in the rest domains, including "Transcription", "Carbohydrate transport and metabolism", "Amino acid transport and metabolism", "Nucleotide transport and metabolism" and "Lipid transport and metabolism" (Fig 9B).
In comparison of stage B and stage C, 3 genes were classified into cellular processes and metabolism. The genes were assigned to transport and catabolism related to lysosome (ko: 04142), amino acid and nucleotide metabolism related to purine metabolism (ko: 00230) and tryptophan metabolism (ko: 00380), respectively (Fig 10A). In comparison of stage D and stage C, 19 genes were classified into environmental information processing, genetic information processing and metabolism (Fig 10B). In environmental information processing, one gene related to signal transduction was found in Notch signaling pathway (ko: 04330); five folding, sorting and degradation related genes were found in genetic information processing, two of which were assigned to protein processing in endoplasmic reticulum (ko: 04141) and the other were assigned to RNA degradation (ko: 03018). Glycolysis/gluconeogenesis, carbon metabolism and biosynthesis of amino acids were the top three pathway in metabolism (23 genes), with four genes respectively.

Discussion
A total of 127.87 Gb clean data consisting of 94,853 unigenes were successfully obtained from the mud crab S. paramamosain at three fattening stages. Previous study has shown 21,791 isotigs in the testis and ovary of the same crab species [8]. In addition, the gills of this crab species   Transcriptome profiling of claw muscle of mud crab challenged with mud crab reovirus was analyzed, producing 67,279 assembled unigenes [9]. The unigenes identified in the study not only enrich transcriptomes data of S. paramamosain, but also provide the basis for studying the mechanisms of muscle growth. In this study, 23,787 genes could be annotated. Main gene categories were found to be involved in ATP binding and catalytic activities, metabolic and oxidation-reduction processes and nucleus and membrane, which are necessary to maintain the basic life activities. The genes in these categories are relatively conservative [22,23]. The annotation rate was lower than previous reports in yesso scallop Patinopecten yessoensis and sea cucumber Apostichopus japonicus, whose annotation rate was 27.9% and 39.1%, respectively [24,25]. The low rate of annotated gene might be due to insufficient of the genomic data of S. paramamosain. The potentially available and unexploited genes, especially DEGs might play an important role in the process of muscle growth.
SNPs were used in trait-mapping and whole-genome association studies owing to the variability and abundance in the genome [4,26]. SNPs, as potential markers, were even found in the specie without the full genome sequences [27,28]. In the current study, 451,564 SNPs were called using GATK, with approximately 34-folds increase in comparison with the 13,271 SNPs in the gonads [8]. The putative SNPs could be useful in various fields of aquaculture of S. paramamosain, for example, conservation and population genetics of wild species, mapping of economically important traits, selection and breeding programmes [29]. In addition, a total of 177 SNPs were found in 31 DEGs, including 105 SNPs in 12 annotated genes. Previous study has described that SNPs in actin and CHH could affect the growth in the giant freshwater prawn Macrobrachium rosenbergii [30]. Therefore, SNPs in DEGs are speculated to affect the muscle growth during fattening process.
In crustaceans, the muscle growth and development is very complicated, and many details remain unknown to date. Muscle growth is accompanied with the increase in the number and length of muscle fibers [5]. There is few definite report on increasing the number of muscle fibers. But, once specific muscles have been established during the early stages of development, muscles continue to grow in both length and diameter [31]. In fully differentiated lobster, the Transcriptome profiling of claw muscle of mud crab muscle fibers grow in length by the addition of sarcomeres [6], and grow in cross-sectional area by the splitting of large myofibrils upon molting and enlargement during the inter-molt [6]. In crayfish Procambarus clarkii, increase in fiber length is accomplished by lengthening of existing sarcomeres [32]. Previous study speculated that the muscle growth could be stimulated by fibers stretching due to the expansion of the new exoskeleton during the post-molt period [33]. However, the mechanism involved in muscle growth is unclear in the mud crab S. paramamosain.
Extensive studies show that muscle growth responds to exercise, nutrient supply, metabolism, cytokines and endocrine factors and denervation [34]. In Drosophila, muscle growth is associated with nutrient sensing by the Insulin/Akt/TOR pathway, which was achieved by the integrated regulation of the transcription factors FOXO, Myc and Mnt [34][35][36]. The biosynthesis promotes formation of syncytial muscle, which can be stimulated by glycolysis [36]. In addition, glycolysis/pyruvate metabolism has a negative effect on the Notch pathway, and which were regulated by insulin pathway or target of rapamycin pathways [37]. Tixier et al. reported that seven genes (Pglym78, Pfk, Tpi, Gapdh, Pgk, Pyk, and Impl3) involved in glycolysis/pyruvate metabolism, play a vital role in the increasing of muscle fibers size and the myoblast fusion [36]. Some conserved cytokines could activate the pathways associated with muscle growth. For example, Drosophila eiger (CG12919), outstretched, unpaired 2, and unpaired 3, which regulate the muscle development by activation of the TNF signaling cascade and the Eya and JAK-STAT signaling [38,39]. Octopamine receptor 2 causes muscle hypertrophy by regulating synthesis of cyclic-AMP and PKA [40]. The study on the interconnections between exercise and muscle mass highlighted that spargel promotes mitochondrial activity and is required for sensing the physiological effects caused by exercise [34,41]. The muscle-nerve interactions play an important role in the formation of muscle size and structure in Drosophila [42]. In addition, a lot of factors were proven to be involved in the atrophy of muscle, such as Drosophila (CG11658, abba, CG10961, Cbl, TER94 and CG6233), p97/VCP ATPase complex, MLCs, actin and so on [43][44][45][46].
In this study, the claw muscle transcriptomes were sequenced for screening the functional genes that affect the muscle growth of S. paramamosain. The predominant catalogs of DEGs clusters were regulation of transcription, ATP binding and integral component of membrane.
The results suggested that abundant DEGs involved in maintaining the basic life activities during different fattening stages. The enzymes, related to glycogen metabolism (enolase, glycogen phosphorylase) and protein metabolism (arginine kinase, aminopeptidase N), were up-regulated at stage C specimens compared to stage D specimens, which may take part in maintaining metabolic balance during fattening process. Furthermore, several DEGs have been found, which play roles in skeletal muscle differentiation, atrophy and transformation in Drosophila or mice, such as Pfk-1, Gapdh, Hspb6, Su (H), Sspn and Mlc [36,[46][47][48][49].
PFK-1 and GAPDH are the basic enzymes in glycolysis. Previous study showed that PFK-1 is regulated by signals from cell proliferation [50,51]. In addition, PFK-1 regulates the muscle fatigue by interaction with neuronal nitric oxide synthase [52]. PFK-1 and GAPDH play important roles in the increasing of muscle fibers size and the myoblast fusion in Drosophila [36]. In this study, they were found in claw muscle transcriptomes and up-regulated at stage C specimens compared to stage D specimens. The results suggest that they could participate in  the muscle growth. As it is known that the muscle fibers rely mainly on glycolysis [53], the high expression of PFK-1 and GAPDH also may be associated with higher energy metabolism at stage C specimens.In D. melanogaster, the notch signalling pathway can regulate the differentiation of muscle [54]. Su (H) functions as a transcriptional repressor, has been shown to participate in the notch signaling pathway and the myogenesis [46]. Myogenesis in crustaceans may share some similarities with Drosophila. Su (H) was found with higher expression in stage C 3/4 specimens than in stage D 3 specimens in muscle transcriptomic database, and it is likely to be involved in the myogenesis.   HSPB6, as a member of the small heat shock protein family, was constitutively expressed in skeletal muscle [55]. HSPB6 not only plays a role in prevention of atrophy, ischemia, hypertensive stress, and metabolic dysfunction, but also regulate muscle contraction by binding to the troponin complex [47,56]. In the present study, twelve HSPB6 transcripts were up-regulated expression in claw muscle under stage C 3/4 specimens compared to the stage D 3 specimens. The claw muscular atrophy occurred mainly at the stage D 3 . The high expression of these genes suggest that HSPB6 could be involved in the muscle protection against atrophy by stabilizing myofibrillar proteins at stage C 3/4 .
In mice, Sarcospan (SSPN) plays important roles in muscular dystrophy and muscle force development by linked to sarcoglycans or as a component of the dystrophin-glycoprotein complex [48,57,58,59]. SSPN was discovered in the claw muscle transcriptome that has not been identified previously in S. paramamosain. According to the function of SSPN reported in mice, SSPN may be involved in muscle atrophy of S. paramamosain. MLCs, as the myofibrillar isoform, expressed to varying degrees in different fiber types in crustacean muscle. And the contractile apparatus transformation is accompanied by change in fiber types, which can significantly affect muscle size [49]. In addition, MLCs has been found to play a key role in atrophy, which can regulate the extraction effect of ubiquitinated proteins from the myofibrils [44]. In this study, MLCs expression may be consistent with the contractile apparatus transformation.
In addition, 34 neuropeptides were found in the transcriptome data of claw muscle in S. paramamosain. Some of the neuropeptides (e.g., the AST-B, sNPF, NP, CHH and orcokinin, DH31) had been confirmed to be widely expressed in various tissues (e.g., the cerebral ganglia and ovary) of this crab species [7]. Neuropeptides may be secreted by neuroendocrine in the nervous system and autocrines/paracrines in the non-nervous system, which could regulate many physiological processes, including growth, locomotion, reproduction and metabolism [7,60,61]. The neuropeptides in claw muscle might function as neurotransmitters or neuromodulators to regulate muscle growth.

Conclusions
The first transcriptome analysis on the claw muscle of S. paramamosain was carried out successfully and yielded 35.46 Gb clean data. Data obtained in present study greatly contributes to the understanding of the gene expression and genome structure occurring within the claw muscle of S. paramamosain at different molting stages. Potential SNPs found in the transcriptome are useful for future selective breeding, trait-mapping, and gene localization studies. The discovery and validation of DEGs showed that these particular genes might be conducive to claw muscle atrophy or restoration in S. paramamosain.
Supporting information S1