Genome sequencing of the perciform fish Larimichthys crocea provides insights into stress adaptation

The large yellow croaker Larimichthys crocea (L. crocea) is one of the most economically important marine fish in China and East Asian countries. It also exhibits peculiar behavioral and physiological characteristics, notably a high sensitivity to various environmental stresses such as hypoxia and air exposure. These traits may render L. crocea a good model for investigating the response mechanisms to environmental stress. To understand the molecular and genetic mechanisms underlying the adaptation and response of L. crocea to environmental stress, we sequenced and assembled the genome of L. crocea using a bacterial artificial chromosome and whole-genome shotgun hierarchical strategy. The final genome assembly was 679 Mb, with a contig N50 of 63.11 kb and a scaffold N50 of 1.03 Mb, containing 25,401 protein-coding genes. Gene families underlying adaptive behaviours, such as vision-related crystallins, olfactory receptors, and auditory sense-related genes, were significantly expanded in the genome of L. crocea relative to those of other vertebrates. Transcriptome analyses of the hypoxia-exposed L. crocea brain revealed new aspects of neuro-endocrine-immune/metabolism regulatory networks that may help the fish to avoid cerebral inflammatory injury and maintain energy balance under hypoxia. Proteomics data demonstrate that the skin mucus of air-exposed L. crocea has a complex composition, with an unexpectedly high number of proteins (3,209), suggesting multiple protective mechanisms involving antioxidant functions, oxygen transport, immune defence, and osmotic and ionic regulation. Our results provide novel insights into the mechanisms of fish adaptation and response to hypoxia and air exposure.


Introduction
Teleost fish, nearly half of all living vertebrates, display an amazing level of diversity in body forms, behaviors, physiologies, and environments that they occupy. Strategies for coping with diverse environmental stresses have evolved in different teleost species. Therefore, teleost fish are considered to be good models for investigating the adaptation and response to many [...] However, to better clarify the conserved and differentiated features of the adaptive response to specific stresses and to trace the evolutionary process of environmental adaptation and response in teleost fish, insight from more teleost species with different evolutionary positions, such as Perciformes, is required. Perciformes are by far the largest and most diverse order of vertebrates, and thus offer a large number of models of adaptation and response to various environmental stresses.

The large yellow croaker, Larimichthys crocea (L. crocea), is a temperate-water migratory fish that belongs to the order Perciformes and the family Sciaenidae. It is mainly distributed in the southern Yellow Sea, the East China Sea, and the northern South China Sea. L. crocea is one of the most economically important marine fish in China and East Asian countries due to its rich nutrients and trace elements, especially selenium. In China, the annual yield from L. crocea aquaculture exceeds that of any other net-cage-farmed marine fish species (Liu et al. [...] such as hypoxia and air exposure. For example, the response of its brain to hypoxia is quick and robust, and a large amount of mucus is secreted from its skin when it is exposed to air [...]

To understand the molecular and genetic mechanisms underlying the responses of L. crocea to environmental stress, we sequenced its whole genome. Furthermore, we sequenced the transcriptome of the hypoxia-exposed L. crocea brain and profiled the proteome of its skin mucus under exposure to air.
Our results revealed the molecular and genetic basis of fish adaptation and response to hypoxia and air exposure.

Results

Genome features
We applied a bacterial artificial chromosome (BAC) and whole-genome shotgun (WGS) [...] Table S1). The total length of all combined BACs was 3,006 megabases (Mb), which corresponded to approximately 4.3-fold genome coverage (Supplemental Tables S2-S3). All BAC assemblies were then merged into super-contigs and oriented to super-scaffolds with large mate-paired libraries (2-40 kb). Gap filling was made with reads from short insert-sized libraries (170-500 bp) (Supplemental Tables S3-S4). In total, we sequenced 563-fold coverage bases of the estimated 691 Mb genome size. The final assembly was 679 Mb, with a contig N50 of 63.11 kb and a scaffold N50 of 1.03 Mb (Table 1). The 672 longest scaffolds (11.2% of all scaffolds) covered more than 90% of the assembly (Supplemental Table S5). To assess the completeness of the L. crocea assembly, 52-fold coverage paired-end high-quality reads were aligned against the assembly (Supplemental Fig. S3). More than 95.63% of the generated reads could be mapped to the assembly. Furthermore, the integrity of the assembly was validated by the successful mapping of 98.80% of the transcripts from the mixed-tissue transcriptomes (Supplemental Table S6). These results indicate that the genome assembly of L. crocea has high coverage and is of high quality (Supplemental Table S7).

The repetitive elements comprise 18.1% of the L. crocea genome (Supplemental Table S8), which is a relatively low percentage when compared with other fish species, such as Danio rerio (52.2%), Gadus morhua (25.4%), and Gasterosteus aculeatus (25.2%). This suggests that L. crocea may have a more compact genome (Supplemental Tables S9-S10).
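The assembly statistics reported above rest on two simple calculations: the N50 of a set of contig or scaffold lengths, and fold coverage as sequenced bases divided by genome size. The following is a minimal sketch of both, not the pipeline used in this study. The function names `n50` and `fold_coverage` and the toy length list are hypothetical; the only values taken from the text are the 3,006 Mb of combined BAC sequence and the estimated 691 Mb genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases in the assembly.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def fold_coverage(sequenced_bases, genome_size):
    """Return sequencing depth as a multiple of the genome size (same units)."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_bases / genome_size


# Toy scaffold lengths (hypothetical, arbitrary units).
toy_scaffolds = [8, 8, 4, 3, 3, 2, 2, 2]
toy_n50 = n50(toy_scaffolds)

# Values from the text: 3,006 Mb of combined BACs, 691 Mb estimated genome size.
bac_coverage = fold_coverage(3006, 691)
```

With the values from the text, `bac_coverage` comes out at about 4.35, in line with the approximately 4.3-fold BAC coverage reported.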
We identified 25,401 protein-coding genes based on ab initio gene prediction and evidence-based searches from the reference proteomes of six other teleost fish and humans (Supplemental Fig. S4; Table S11), in which 24,941 genes (98.20% of the whole gene set) were supported by homology or RNAseq evidence (Supplemental Fig. S5). Over [...] Table S12).

Phylogenetic relationships and genomic comparison
L. crocea is the first species of Sciaenidae of the order Perciformes with a complete genome available, therefore we estimated its phylogenetic relationships to seven other sequenced teleost species based on 2,257 one-to-one high-quality orthologues, using the maximum likelihood method. According to the phylogeny and the fossil record of teleosts, we dated the divergence of L. crocea from the other teleost species to approximately 64.7 million years ago (Fig. 1A). We also detected 19,283 orthologous gene families (Supplemental Table S3), of which 14,698 families were found in L. crocea. The gene components of L. crocea were similar to those of D. rerio (Fig. 1B). The gene contents in four representative teleost species and L. crocea genomes were also analysed, and 11,205 (76.23%) gene families were found to be shared by five teleosts (Fig. 1C). We confirmed that the one-to-one orthologous genes of G. aculeatus and L. crocea have higher sequence identities from the distribution of the percent identity of proteins (Fig. 1D), which indicates that Sciaenidae has a closer affinity to Gasterosteiformes and coincides with our genome-level phylogeny position. [...] Table S16).
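The comparison of orthologues above relies on the percent identity of aligned protein sequences. As an illustration only, the sketch below computes percent identity for one pair of pre-aligned sequences. It assumes that identity is counted over columns in which neither sequence has a gap; other definitions exist, and the text does not state which one was used. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are skipped,
    so the denominator is the number of ungapped aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 5 ungapped columns, 4 of them identical.
example_identity = percent_identity("MKT-LV", "MKSALV")
```

Applying such a function to every one-to-one orthologue pair between two species gives a distribution of identities of the kind summarised in Fig. 1D.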

Unique genetic features of the L. crocea.
L. crocea is a migratory fish with good photosensitivity, olfactory detection, and sound perception, and it contains high levels of selenium (Su 2004). Our genomic analyses provide a genetic basis for these behavioral and physiological characteristics. Several crystallin genes (crygm2b, cryba1, and crybb3), which encode proteins that maintain the transparency and [...] Table S17). Phylogenetic analysis showed that the crystallin genes from L. crocea cluster together, indicating that these genes were specifically duplicated in the L. crocea lineage (Supplemental Fig. S6). The specific expansion of these crystallin genes may be helpful for improving photosensitivity by increasing lens transparency, thereby enabling the fish to easily find food and avoid predation underwater.

We also identified 112 olfactory receptor (OR)-like genes from the L. crocea genome (Supplemental Table S18; Fig. S7), and almost all of them (111) have been reported to be expressed in the olfactory epithelial tissues of L. crocea (Zhou et al. 2011). The majority of these genes (66) were classified into the "delta" group, which is important for the perception of water-borne odorants (Niimura 2009). L. crocea also possessed the highest number of genes that were classified into the "eta" group (30, P < 0.001), and these genes may contribute to the olfactory detection abilities, which could be useful for feeding and migration [...]

Selenium is highly enriched in L. crocea (Su 2004), and it is mainly present as selenoproteins. We used the SelGenAmic-based selenoprotein prediction method (Jiang et al. [...] Table S20). Interestingly, five copies of MsrB1, which encodes methionine sulfoxide reductase, were found in L. crocea (MsrB1a, MsrB1b, MsrB1c, MsrB1d, and MsrB1e), whereas only two copies (MsrB1a and MsrB1b) were found in other fish, thus suggesting its broader specificity to reduce all [...] Table S21).

L. crocea has a relatively complete innate immune system, whereas its adaptive immune system may possess unique characteristics. The CD8+ T and CD4+ T-helper type 1 (Th1)-type immune systems are well conserved in L. crocea, and almost all CD8+ T and CD4+ Th1 cell-related genes were found (Fig. 2A). Moreover, the genes related to Th17 cell- and γδ-T cell-mediated mucosal immune responses were conserved in L. crocea. These observations suggest that L. crocea may exhibit powerful cellular and mucosal immunity. However, the CD4+ Th2-type immunity seemed to be weak in L. crocea, as suggested by the absence of many CD4+ Th2-related genes and humoral immune effectors (Fig. 2A). We detected gene expansions in several of these [...] Table S22). Expansions were also observed in the genes encoding four key proteins for mammalian antiviral immunity: tripartite motif containing 25 (TRIM25), cyclic GMP-AMP synthase (cGAS), DDX41, and NOD-like receptor family CARD domain containing 3 (NLRC3) (Fig. 2B). However, retinoic acid-inducible gene-1 (RIG-I), which initiates the antiviral signaling pathway in mammals, was not found in the L. crocea genome and [...]

Stress response under hypoxia
The brain allows rapid and coordinated responses to environmental stress by driving the secretion of hormones. Therefore, we studied the response of the L. crocea brain to hypoxia. We sequenced seven transcriptomes of the brains at different times of hypoxia exposure and [...] Hypoxia stress can induce the response of the central neuroimmune system, in which brain neuropeptides, endocrine hormones, and inflammatory cytokines closely participate (Herman [...] Table S23; Fig. S12). Results from transcriptome analyses show that the key HPA axis-relevant genes and IL-6/TNF-α and trigger a positive feedback loop between them (Fig. 3). Furthermore, ET-1/ADM-IL-6/TNF-α may activate the HPA axis, and the latter subsequently induces hypoxia-induced cerebral inflammation in L. crocea.

Hypoxia can influence the hypothalamic-pituitary-thyroid (HPT) axis (Hou and Du 2005).

The HPT axis was found to regulate protein synthesis and glucose metabolism by production of thyroid hormones (Yen 2001 [...] Table S25). This suggests that the HPT axis may inhibit protein synthesis under hypoxia by decreasing the production of thyroid hormones (Fig. 3), which is beneficial for saving energy during hypoxia stress. Thyroid hormones can also accelerate the oxidative metabolism of glucose and inhibit the glycolytic anaerobic pathway (Sabell et al. 1985). Our [...] respectively) (Supplemental Table S24). The down-regulation of HPT axis-thyroid hormones may inhibit the TCA cycle and accelerate the anaerobic glycolytic pathway in the brain during hypoxia exposure (Fig. 3). The repression of the TCA cycle and the strong [...] 2009), were not significantly changed in the L. crocea brain (Supplemental Table S24). It is possible that the HIF-1α-mediated mechanism may not be essential for the hypoxia response in the L. crocea brain during the early period of hypoxia. These results suggest that the HPT axis-mediated effects may play major roles in response to hypoxia by reorganizing energy consumption and energy generation.

[...] Table S26), based on previous studies in mammals (Pluta et al. 2012). This indicates that the mucin synthetic pathway is conserved between fish and mammals. Among these gene families, GALNT, [...] Fig. S13). Syntaxin-11 was also expanded. Additionally, genes encoding syntaxin-binding protein 1 and syntaxin-binding protein 5, which are related to mucus secretion, were positively selected in the L. crocea genome (Supplemental Table S16). The expansion and positive selection of these genes may explain why L. crocea secretes more mucus than other fish under stress.

We identified 22,054 peptides belonging to 3,209 genes in the L. crocea skin mucus proteome, and this accounted for more than 12% of the protein-coding genes in the genome (Supplemental Table S27). The complexity of the L. crocea mucus presumably relates to the multitude of its biological functions that allow the fish to survive and adapt to environmental [...] . 4A; Supplemental Fig. S14). Two hundred and thirty-two antioxidant proteins that were related to oxidoreductase activity and peroxidase activity were highly enriched in the L. crocea mucus, and they included peroxiredoxins, glutathione peroxidase, and thioredoxin (Supplemental Table S28). These [...] oxidative damage (Fig. 4B). Eight proteins related to oxygen transport, including hemoglobin subunits α1, αA, αD, β, and β1, and cytoglobin-1, were identified in the L. crocea skin mucus (Supplemental Table S29). The abundant expression of hemoglobin may contribute to the binding and holding of oxygen for respiration. Various immune molecules that provide immediate protection to fish from potential pathogens, such as lectins, lysozymes, C-reactive proteins, complement components, immunoglobulins, and chemokines, were also found in the L. crocea skin mucus (Supplemental Table S30). To date, the mechanisms of osmotic and ionic regulation of the skin mucus have not been confirmed (Shephard 1994). In this study, a large number of ion-binding proteins were identified in the L. crocea mucus (Supplemental Table S31). These proteins and the layer of mucus may have a role in limiting the diffusion of ions on the surface of the fish (Fig. 4B). However, a substantial proportion of the proteins, which are highly present in the skin mucus of fish under air exposure, play an unknown role in the mucus response.
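The statement above that the mucus proteome accounts for more than 12% of the protein-coding genes can be checked directly from the reported counts. The short sketch below does that arithmetic and also derives the mean number of peptides per identified gene. The variable and function names are hypothetical; the three counts (22,054 peptides, 3,209 genes, 25,401 protein-coding genes) are taken from the text.

```python
# Counts reported in the text.
MUCUS_PEPTIDES = 22054        # peptides identified in the skin mucus proteome
MUCUS_GENES = 3209            # genes those peptides belong to
PROTEIN_CODING_GENES = 25401  # protein-coding genes annotated in the genome


def percent_of(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Share of annotated protein-coding genes detected in the mucus proteome.
mucus_gene_share = percent_of(MUCUS_GENES, PROTEIN_CODING_GENES)

# Average number of identified peptides supporting each detected gene.
peptides_per_gene = MUCUS_PEPTIDES / MUCUS_GENES
```

Here `mucus_gene_share` is roughly 12.6%, consistent with "more than 12%", and `peptides_per_gene` is just under 7.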

We sequenced and assembled the genome of the large yellow croaker (L. crocea) using BACs and the WGS hierarchical assembly strategy. This methodology is effective for high-polymorphism genomes and produces a high-quality genome assembly, with a contig N50 of 63.11 kb and a scaffold N50 of 1.03 Mb (Table 1). Support from the 563-fold coverage of the genome yields high single-base resolution and 98.80% completeness of the coding region (Supplemental Table S6). Further genomic analyses showed the significant expansion of several natural brakes, including HPA axis-Glucocorticoids and SOCS family members, exhibit secondary protection effects to avoid excessive inflammatory responses in the brain. Our transcriptome results show that a novel HPA axis-ET-1/ADM-IL-6/TNF-α feedback regulatory loop in neuro-endocrine-immune networks contributed to the protective effect and regulated moderate inflammation under hypoxia stress (Fig. 3). On the other hand, the [...] suggest that the skin mucus exerts multiple protective mechanisms, which are involved in antioxidant functions, oxygen transport, immune defence, and osmotic and ionic regulation (Fig. 4B). These results expand our knowledge of skin mucus secretion and function in fish, highlighting its importance in response to stress. In addition, the mucus proteome shares [...]

Overall, our results revealed the molecular and genetic basis of fish adaptation and response to hypoxia and air exposure. In addition, the data generated by this study will facilitate the genetic dissection of aquaculture traits in this species and provide valuable resources for the genetic improvement of the meat quality and production of L. crocea.

Genome assembly annotation
The wild L. crocea individuals were collected from the Sanduao sea area in Ningde, Fujian, China. Genomic DNA was isolated from the blood of a female fish by using standard molecular [...] and multi-copy genes in the library were filtered out before the RepeatScout library was used to find homologs in the genome and to categorise the found repeats by RepeatMasker (Smit 1996-2010).

Gene models were integrated based on ab initio predictions, homologue prediction, and transcription evidence. [...]

[...] "Fish" = fish-specific genes; "SD" = genes that have undergone species-specific duplication; "Homology" = genes with an e-value less than 1e-5 by BLAST but do not cluster to a gene family; "ND" = species-specific genes; and "Others" = orthologues that do not fit into the [...]

[...] LGP2 is able to bind to double-stranded RNA (dsRNA) to trigger interferon production, but [...]

[...] Down-regulation of HPT axis-thyroid hormones also repressed the tricarboxylic acid (TCA) cycle and accelerated the anaerobic glycolytic pathway in the brain, along with increases in the exposure to hypoxia. Genes related to the neuro-endocrine system (orange), immunity (red), and metabolic system and protein synthesis (blue) are indicated. The outer border indicates the brain of L. crocea. The arrow represents promotion, and the interrupted line represents inhibition. Solid lines indicate direct relationships between genes. Dashed lines indicate that more than one step is involved in the process.

[...] immune defence, oxygen transport, and osmotic and ionic regulation, respectively.