The sea cucumber genome provides insights into morphological evolution and visceral regeneration

Apart from sharing common ancestry with chordates, sea cucumbers exhibit a unique morphology and exceptional regenerative capacity. Here we present the complete genome sequence of an economically important sea cucumber, A. japonicus, generated using Illumina and PacBio platforms, to achieve an assembly of approximately 805 Mb (contig N50 of 190 Kb and scaffold N50 of 486 Kb), with 30,350 protein-coding genes and high continuity. We used this resource to explore key genetic mechanisms behind the unique biological characters of sea cucumbers. Phylogenetic and comparative genomic analyses revealed the presence of marker genes associated with notochord and gill slits, suggesting that these chordate features were present in ancestral echinoderms. The unique shape and weak mineralization of the sea cucumber adult body were also preliminarily explained by the contraction of biomineralization genes. Genome, transcriptome, and proteome analyses of organ regrowth after induced evisceration provided insight into the molecular underpinnings of visceral regeneration, including a specific tandem-duplicated prostatic secretory protein of 94 amino acids (PSP94)-like gene family and a significantly expanded fibrinogen-related protein (FREP) gene family. This high-quality genome resource will provide a useful framework for future research into biological processes and evolution in deuterostomes, including remarkable regenerative abilities that could have medical applications. Moreover, the multiomics data will be of prime value for commercial sea cucumber breeding programs.

Introduction Echinodermata, an ancient phylum of marine invertebrates, comprises 5 extant classes, including Echinoidea (sea urchins), Asteroidea (sea stars), Holothuroidea (sea cucumbers), Ophiuroidea (brittle stars), and Crinoidea (sea lilies). Together, the phyla Echinodermata, Hemichordata, and Chordata form the deuterostome clade, based on their closely shared developmental features. To date, 2 complete echinoderm genomes, that of the sea urchin Strongylocentrotus purpuratus and that of the sea star Acanthaster planci, have been successfully sequenced [1,2]. However, because sea cucumbers are unique among echinoderms, possessing many distinctive biological characteristics, their genome holds invaluable insight that can extend the scope and depth of molecular research in Echinodermata and Deuterostomia.
Sea cucumber adults exhibit an elongated shape that belies their pentaradial symmetry, combined with weak calcification in the form of microscopic ossicles that contrasts with the solid calcified test of sea urchins. Exploring these features can help in exploring the evolution of mineralization in echinoderms, which remains poorly understood.
Of even greater interest is the fact that sea cucumbers display a capacity to regrow body parts and internal organs [3], which is much greater than that of sea stars and sea urchins, making them prime regeneration models. The use of A. japonicus in this field is facilitated by its natural ability to discard its internal organs, rapidly regenerate them, and restore normal functions within a few weeks, through a process that involves cell migration, proliferation, differentiation, and organ/tissue reconstruction [3][4][5]. Finally, sea cucumbers, like many echinoderms, can be extremely long-lived and somewhat immune to senescence [6,7]. Therefore, knowledge of the complete genome of a sea cucumber will provide a unique framework for studies that seek to understand cell and tissue regeneration, treat organ failure, and alleviate the symptoms of aging.
Sea cucumbers are also widespread, occurring from the shore to the abyss, and can represent up to 80% of the whole biomass of benthic invertebrates in some areas. They are the target of important fisheries and represent the fastest-growing aquaculture sector worldwide [8]. However, overfishing and poor management of these valuable resources are a growing concern [8,9].
The sea cucumber A. japonicus is one of the most studied echinoderms; it is being cultivated commercially on a large scale in the western North Pacific Ocean and is one of the most valuable sea foods worldwide, due to its potent nutritional and medicinal properties [10][11][12]. In China alone, around 200,000 tons of sea cucumbers were produced in 2015, with an estimated value of about 4,000,000,000 United States dollars [13]. Improving genomic knowledge of this sea cucumber may therefore benefit the seafood industry and concurrently yield pharmaceutical and biomedical breakthroughs.
In early 2017, a draft genome of A. japonicus was published that represented only about 80.5% of the estimated genome size (0.82 Gb), with scaffolds N50 value of 10.5 Kb [14]. These data provided an important resource for sea cucumber genomics, but their incompleteness and fragmentation limit applications for research. Here we present a high-quality reference genome of A. japonicus investigated through a multiomics approach, providing valuable insights into the molecular and genomic basis of crucial evolutionary traits in sea cucumbers and deuterostomes. Knowledge of the complete genome of a holothuroid offers a particularly useful framework for studies that seek to understand the mechanisms of cell and tissue/organ regeneration.

Results and discussion
The sea cucumber reference genome Genome assembly. The genome of A. japonicus was sequenced using a combination of Illumina shot-gun and PacBio Single Molecule Real-Time (SMRT) sequencing. A total of 260 Gb (approximately 294.97× coverage) of Illumina clean data and approximately 64 Gb (approximately 72.66× coverage) of long subreads from SMRT sequencing were obtained for the genome assembly (S1 and S2 Tables). A high level of heterozygosity is one of the main challenges of genome assembly in marine invertebrates. Initially, the whole-genome shotgun strategy resulted in an assembled genome that was highly fragmented. With the benefit of PacBio sequencing, genome assembly was performed with 4 different newly developed approaches (S3 Table), and we selected the most performant software (Fast Alignment and Consensus for Assembly [FALCON]) for the contig construction, with Illumina mate-pair data integrated for the scaffold extension. Finally, we obtained a reference genome with the assembly of 804. 99 Mb, which covered approximately 91.47% of the genome size estimated from flow cytometry (880.20 ± 58.68 Mb) [15]. The assembled contigs displayed high continuity with an N50 length of 190 Kb, which is longer than that of other marine invertebrates whose genomes are available (S4 Table), such as S. purpuratus (18 Kb) and A. planci (55 Kb). A total of 3,821 scaffolds were assembled with an N50 length of 486 Kb, and approximately 90% of the total assembled genome consisted of the 1,779 longest scaffolds (Table 1 and S5 Table). A high-density genetic linkage map with 6,350 high-quality markers was used to position and orient 1,313 scaffolds (573.32 Mb in length, 71.2% of genome) into 22 linkage groups (Fig 1).
A comparison of the assembly results with different coverage (30× to 70×) of the PacBio data showed that 70-fold coverage was sufficient to cover the full-length genome (S1 Fig, S3 Table). Over 94% of reads from the Illumina short-insert library could be successfully mapped to the genome; moreover, over 97% of transcriptome unigenes were mapped to the assembly, and over 93% of unigenes were covered by a single scaffold at half length (50% in 1 scaffold) (S6 Table), reflecting the high integrity and accuracy of the assembly. Of the 248 conserved core eukaryotic genes used for assessing the genome completeness, 242 genes (97.6%) were covered by the genome (S7 Table). These data show that the assembled A. japonicus genome is complete for protein-coding sequences.
Various noncoding small RNAs were found in the sea cucumber genome, including 1,127 tRNAs, 223 small nuclear RNAs (snRNAs), 149 small nucleolar RNAs (snoRNAs), 137 micro-RNAs (miRNAs), and 75 rRNAs (S1 Data). The 137 miRNAs can be clustered into 76 families, including 39 conserved and 37 novel families, among which miR-92 (with 12 copies) was the most abundant. Moreover, most of the miRNAs are distributed across the genome in clusters (12 clusters), in which 2 or more miRNAs are located in physically adjacent regions, indicating possible cotranscription and functional cooperation (Fig 1).
A total of 30,350 protein-coding gene models were constructed, and among them, 28,144 (92.73%) were supported by the transcriptome data. Compared to S. purpuratus and A. planci, A. japonicus has a similar average exon size (193.43 bp) and exon number per gene (6.1), but it has a shorter intron size (1,319 bp on average) than S. purpuratus (S5 Fig and S9 Table).

Genomic analyses shed light on morphological evolution in deuterostomes
Phylogenetic analyses. To understand the phylogenetic location of sea cucumbers, we used a maximum likelihood method for genome-wide phylogenetic analysis based on single- Sea cucumber genome copy genes from 17 genomes (Fig 2A). The results support the view that echinoderms and hemichordates (Saccoglossus kowalevskii) are sister groups and share a common Ambulacraria ancestor, which is the basal taxon of deuterostomes. Based on phylogeny and fossil records, we dated the divergence time of hemichordates and echinoderms to about 533 million years ago (Mya), which roughly coincides with the lower Cambrian evolutionary explosion. Moreover,  We performed comparative genomic analyses among 17 metazoan species and detected 49,351 families of homologous genes. Owing to gene loss and duplication, species-specific genes occupied a large part of genes, and few deuterostome-specific and Ambulacraria-specific genes were identified, while some Echinoderm-specific genes were cumulated (Fig 2C). There were 763 echinoderm-specific gene families shared by 3 echinoderms, which were enriched in the genes encoding membrane proteins, ion channels, and proteins involved in signal transduction (S10 Table). A total of 452 gene families were shared by A. japonicus and S. purpuratus (Fig 2B), many of them enriched in metabolism pathways of glycerophospholipid (P = 2.2E-04), arachidonic acid (P = 9.6E-04), and steroid hormone (P = 5.2E-03). Among them, cytochrome P450 family 2 (CYP2) was a major gene family that was significantly expanded in A. japonicus and S. purpuratus (S7 Fig). However, 13,103 genes in A. japonicus could not be grouped with any genes from the other 16 species; these genes were referred to as orphan genes. In total, 1,576 gene families were expanded in A. japonicus (S8 Fig), whereas 1,579 gene families were contracted (P < 0.01) (Fig 2A). The gene families expanded in the genome of A. japonicus were chiefly enriched in categories of signal recognition and immunity (S11-S15 Tables and S8 Fig), and gene repertoires showed some differences in the contents of multiplecopy genes and Echinoderm-specific genes ( Fig 2C).
Notochord and gill slits. Unlike hemichordates and chordates, adult echinoderms lack the ancestral deuterostome characters of gill slits and notochord. However, fossil evidence supports vetulocystids as the ancestral echinoderms, since they possess gill slits like hemichordates [20]. Some fossil evidences, such as stylophorans and Jaekelocarpus, share calcite skeletons with extant echinoderms and also share gill slits with chordates, suggesting a close relationship between echinoderms and chordates [21][22][23]. Furthermore, we found that most marker genes associated with notochord formation and gill slit development (e.g., brachyury, β-catenin, foxA, and otx) were commonly present in echinoderms, whereas some of them were absent in the nondeuterostome species (Fig 3A and 3C). Two conserved gene clusters related to pharyngeal gill slit development (pax1/9~foxA and six6~six4) showed strong synteny within deuterostomes [24], whereas they were incomplete and showed poor synteny within nondeuterostomes ( Fig 3C and S10 Fig).
The transcription factor Brachyury is a key regulator of notochord formation in chordates, and its transcriptional activation is regulated by fibroblast growth factor (FGF) signaling pathway genes [25]. However, FGF gene families contracted significantly in echinoderms, such that only 1 FGF gene (fgf8) exists in A. japonicus, S. purpuratus, and A. planci (Fig 3B). Transcriptome data are consistent with this opinion, showing only 1 FGF gene expressed in A. japonicus. Similar to other chordates, brachury and FGF start to express at the gastrula stage (S11 Fig). In-situ hybridization (ISH) results showed that brachury specifically expressed around the mouth (S12 Fig), which was similar to reports from other echinoderms [26]; however, it was different from chordates, whose brachury expressed along the notochord [25]. Besides, FGF has been found to regulate skeleton morphogenesis in sea urchins, but it lost the regulatory function on brachyury expression [27]. The stomochord was considered as the supporting organ of hemichordates, even though no molecular evidence of homology has been detected between stomochord and notochord [28,29]. The marker gene of stomochord formation (foxE) is specifically expressed in the pharyngeal region of S. kowalevskii [30]. However, echinoderms have not evolved any tissues equivalent to the stomochord, and foxE was absent in all sequenced echinoderms (including sea star and brittle star) (Fig 3A and S16 Table). As for the gene cluster related to pharyngeal gill slit development (pax1/9~foxA), Nkx2.1, FoxA, and Pax1/9 showed similar expression patterns that were highly expressed at the stages of gastrula and doliolaria, but other genes displayed variable expression patterns, and Slc25A21 even displayed a reverse expression pattern (S13 Fig). In summary, the genomic characteristics related to notochord and gill slits in A. japonicus bear similarities and some differences to chordates, suggesting that these features might have been present in ancestral echinoderms but subsequently lost. Homeobox genes. The Hox gene cluster is widely known for its role in patterning the anterior-posterior axis of animal embryos. The ParaHox gene cluster, including Gsx, Xlox, and cdx, is the evolutionary sister of the Hox gene cluster. In the A. japonicus genome, we found a single, substantially intact, and well-ordered Hox gene cluster containing 10 Hox genes, as well as a conserved ParaHox gene cluster (Fig 4) and large numbers of other homeobox genes (S17 Table). The Hox gene cluster, Hox1 to Hox11/13c, spanned approximately 940 Kb, which is similar to reports from acorn worms (S. kowalevskii and Ptychodera flava, approximately 500 Kb) [31], sea urchin (S. purpuratus, 588 Kb) [32], and sea star (A. planci, 1,195Kb) [33]. However, the gene order of Hox clusters in A. japonicus was similar to that of A. planci and different from that of S. purpuratus, whose anterior class genes (Hox1, Hox2, and Hox3) were inverted and translocated to the end of the cluster because of genomic rearrangement (Fig 4) [32]. Thus, the present results are consistent with the opinion that translocation and inversion (TAI) of the Hox cluster was not a feature of the Echinozoa (sea urchins and sea cucumbers), and the pentameral body plan was not linked to Hox gene arrangements in echinoderms [34]. Hox4 and Hox6 were not found in the A. japonicus genome. Generally, Hox4 is conserved in Asteroidea, and its expression is characterized in the dwarf cushion star Parvulastra exigua [35]. However, Hox4 is lost in S. purpuratus [32] whose Hox6 was highly homologous to A. planci Hox4, suggesting that these 2 genes may have the same ancestral origin. A homeodomain of Hox2 was found between Hox1 and Hox3 in the A. japonicus Hox cluster, but its gene structure was not intact. It could not be found in the available A. japonicus transcriptomes. The annotated Hox2 sequence could not be amplified by PCR from embryonic and larval cDNA of A. japonicus. As there are no reports of their function in echinoderms, we speculate that the absence of these Hox genes may be involved in the unique body shape of the sea cucumber and other echinoderms.
Skeleton degeneration. Unlike other echinoderms with a well-developed calcareous endoskeleton, sea cucumbers have a soft body wall, and microscopic skeletal ossicles are imbedded into a thick layer of the dermal connective tissue (Fig 5A). In spite of its markedly reduced skeleton, the sea cucumber A. japonicus shares many components of the skeletogenic regulatory system with heavily calcified sea urchins and other echinoderms [36-38]. It has relatively intact skeletogenic signal pathways, key transcription factors, and important skeletogenesis-related genes (e.g., otx, soxB, ets1, alx1, dlx, runx2, snail, and twist) (S18 Table). However, the number of downstream biomineralization genes is significantly different between the sea urchin and the sea cucumber. Compared with 31 biomineralization genes reported in S. purpuratus [39], only 7 were found in A. japonicus (S18 Table). Further analysis of these genes in the sea star (A. planci) and acorn worm (S. kowalevskii) found that colp3a, p19, cyp1, cyp2, msp130, and can1 are the common biomineralization genes in the 4 Ambulacraria animals, but these genes are fewer in A. japonicus than in the other 3 species, especially S. purpuratus ( Fig 5B). Therefore, we speculated that biomineralization genes were contracted in sea cucumbers and significantly expanded in sea urchins ( Fig 5B). The transcriptome data revealed that most biomineralization genes were highly expressed in S. purpuratus from the blastula (embryo) through the pluteus (larva) stages, whereas the 7 corresponding biomineralization genes in A. japonicus showed relatively lower expression levels in these developmental stages (Fig 5C). Biomineralization is a key process of skeleton formation, and mineralization gene expression directly determines the final scale of the skeleton. Most of these biomineralization genes (Sm30 and Sm50) encode spicule matrix proteins that mediate the deposition of the initial calcite crystal granules [39,40]; the others encode collagens (colp3a), the mesenchyme-specific cell surface protein (Msp130), and the primary mesenchyme cell (PMC) specific carbonic anhydrase (can1) [41]. It appears that A. japonicus lacks the spicule matrix genes (such as the Sm30 family). Therefore, the contraction of biomineralization genes likely underlies both the evolutionary and developmental degeneration of the sea cucumber's skeleton.
Nervous system-related genes. Sea cucumbers have no true brain or classic sensory organs, but they have a primary central nervous system specifically, a radial nerve cord and various nerve endings scattered through the skin giving them a response to touch and sensitivity to light. The genome data supported these characteristics. Like sea urchins and sea stars, A. japonicus has a large number of genes related to the development of the nervous systems in its genome, whether in adult or larva [42]. However, some crucial genes related to central nervous system development were absent in A. japonicus, S. purpuratus, and A. planci, including hunchback, vax, sax7, pax2, gfap, krox20, L-fng, Gli, and pax3/7 (S2 Data), suggesting that all echinoderms have a similar genomic comparison and an absence of central nervous system genes. One difference is that A. japonicus has some significantly expanded gene families mainly enriched in signal recognition and nervous system, such as tyrosine-protein kinase receptor and netrin receptor UNC5B (S11-S15 Tables). One of the most expanded gene families was the netrin receptor (234 genes), which expanded many folds compared to related species. The netrin receptor is the key regulator in neuronal cell growth and migration [43]. Therefore, the expansion of these gene families might be a compensatory adaptation of the response to the lack of a central nervous system in A. japonicus. The underpinnings of exceptional regenerative capacities in sea cucumbers A novel regeneration-related gene cluster. Evisceration is a defensive strategy shared by many sea cucumbers. In A. japonicus, it can be induced artificially, and new visceral organs will be regenerated completely in a few weeks, making it an ideal model animal for research on organ regeneration. Gene tandem repeat and whole genome duplication are regarded as primary drivers of genome evolution [44], since they can provide raw material for the generation of novel genes, expansion of gene families, increases in gene dosage, or the development of new functions [45]. In the A. japonicus genome, 74 clusters with tandem-duplicated genes were found in more than 5 repeats (Fig 6A). When verifying the expression of tandem-duplicated genes from these clusters during visceral regeneration (Fig 7A), we found that genes in 1 cluster located on Scaffold889 showed significant up-regulation by more than 10,000 folds during the early stages (0-3 days post evisceration, dpe) of regeneration (Fig 6B and S14 Fig). In this cluster, the genes were arranged with 11 tandem duplications that had no homologs in other species and contained highly conserved cysteine residues, which suggested that they may belong to a gene family specific to A. japonicus. Moreover, these genes were specifically expressed in regenerating intestines, with no expression detected in any embryonic or larval stages (S19 Table). Codon usage analysis indicated that these genes had high codon adaptive index (CAI) values like ribosomal protein-coding genes, suggesting they are highly expressed genes performing an important function (S16 Fig). Our corresponding proteomic study confirmed their up-regulation during the same stages (S20 Table). Further analysis on the translated amino acid sequences of these genes showed that they all had a domain that was analogous to the prostatic secretory protein of 94 amino acids (PSP94) based on the topological similarities of the cysteine residues, so we designated them as PSP94-like genes ( Fig 6C). It was reported that the PSP94 genes in vertebrates underwent rapid evolution, which resulted in an improvement of their functions in a wide range of biological activities, such as inhibition of tumor growth [46], activation of immunity [47], and mesoderm formation [48]. Taken together, their specific arrangement pattern and distinguished expression profiles strongly suggested that the PSP94-like genes might play a vital role in the regenerative abilities of A. japonicus.
Expanded and clustered fibrinogen-related protein (FREP) genes. Another cluster with 21 tandem-duplicated FREP genes was located on Scaffold75. Many FREPs, like tenascin [49], fibrinogen-like protein [50], angiopoietin [51], and hepassocin [52], have been reported to be involved in the regulation of regeneration. These FREPs showed an obvious genome-wide gene duplication in A. japonicus and also formed the gene family with the greatest expansion (Fig 1 and S15 Table). Phylogenetic analysis showed that the FREP gene family expanded twice in the A. japonicus genome, and regeneration-related FREPs were clustered and expanded at the same time (S9 Fig). Immune factors have been reported to play important roles in regeneration [53,54]. They have proved to be critical signals for the activation of cell cycle re-entry during the regeneration of salamander and mammalian liver [54,55]. To investigate whether the FREPs are activated during visceral regeneration, we analyzed their expressions in the transcriptome and proteome data. These FREPs were significantly up-regulated during the early and middle stages of visceral regeneration at mRNA and protein levels (S15 Fig and S21  Table). Collectively, the expansion and tandem duplication of FREPs might underlie the high regenerative potential of the sea cucumber. Conserved regeneration regulation system. Over the past few years, many analyses related to the regeneration process in sea cucumbers have been conducted, providing a list of key candidate genes and a roadmap for future studies of regulatory mechanisms [56][57][58][59][60].
These key candidate genes were classified into several main groups, including developmental genes, extracellular matrix (ECM)-related genes, pluripotency factors, neurogenesis-related Sea cucumber genome genes, and cytoskeletal genes. In the A. japonicus genome, some regeneration-related factors were found to be exclusively present or expanded, for example, the genes related to ECMreceptor interaction, apoptosis, and adherens junction (S13 and S14 Tables). Our transcriptomic analyses also showed that the visceral regeneration in A. japonicus was a very complicated process, regulated by a large number of regeneration-related genes, including stem cell pluripotency factors, signaling pathways genes, ECM-related genes, and myogenesis-related genes, and their expression profiles showed differential patterns during regeneration (Fig 7B  and S15 Fig). Corresponding proteomic studies confirmed that these factors may play important roles in this process (S22-S24 Tables). Specifically, the most well-characterized set of mammalian pluripotency factors including Sox2, c-Myc, Oct4, and Klf4, which can induce transformation of differentiated mammalian fibroblasts into induced pluripotent stem (iPS) cells [61] had corresponding orthologs in A. japonicus. Among them, the orthologs (SoxB1, Oct1/2/11, and Klf1/2/4) were found to be up-regulated at the early stage post evisceration (0-6 hours) (Fig 7B and S17 Fig). In fact, the key pathways of stem cell pluripotency were considerably conserved as compared to mammals (S18 Fig). Consistent with the phylogenetic position of echinoderms (sister group to chordates), our results show that sea cucumbers share similar characteristics with vertebrates in regard to regeneration, supporting the hypothesis that regenerative mechanisms might be conserved across evolution in the Deuterostomia [62,63]. Meanwhile, some exclusive or significantly expanded genes, such as PSP94-like proteins and FREPs, might play key roles in the remarkable regenerative capacity of sea cucumbers.

Conclusions and implications
We present a high-quality sea cucumber reference genome from the commercially cultivated species A. japonicus, generated from Illumina and PacBio platforms. An integrated multiomics approach offered insight into the genome architecture of sea cucumbers and the genetic underpinnings of their unique biological traits. The sea cucumber constitutes a particularly promising model animal for regenerative medicine because of the convenient induction of organ regeneration and the newly available genomic data for A. japonicus. A genome resource of this completeness and quality also makes an important contribution to holothuroid and echinoderm research. The findings should facilitate our understanding of the requirements for sustainable utilization and effective breeding of echinoderms, in support of the high-value sea cucumber industry.

Animal materials and DNA isolation
The animal material for genome sequencing and assembly was from a male A. japonicus captured off the coast of Laoshan, Qingdao, China. The sea cucumber was acclimated in sea water at 15 ± 1˚C before experiments. Muscle, gonad, and respiratory tree tissues were collected and immediately frozen in liquid nitrogen and stored at −80˚C. Genomic DNA was extracted using a TIANamp Marine Animal DNA Kit (TIANGEN, Beijing, China) according to the manufacturer's instructions.

Genome sequencing
For Illumina sequencing, short-insert paired-end (PE) (180 bp and 500 bp) and long matepaired (MP) (5 Kb, 10 Kb, and 20 Kb) DNA libraries were constructed according to the manufacturer's instructions (Illumina, San Diego, California, US). Sequencing runs for the PE libraries were performed on the Illumina HiSeq2000 platform, and long MP libraries on the HiSeq2500 platform. To obtain long reads to promote genome assembly, Pacific Biosciences RS II (Pacific Biosciences, Menlo Park, California, US) was used as the sequencing platform. Five 10-Kb SMRTbell libraries were prepared and sequenced using the C4 sequencing chemistry and P6 polymerase.

Genome assembly
In order to get the best assembly results, we tried Short Oligonucleotide Analysis Package de novo assembly tool (SOAPdenovo) [64], de Bruijn graph to Overlap-Layout-Consensus (DBG2OLC) (https://sourceforge.net/projects/dbg2olc/), FALCON (https://github.com/PacificBiosciences/ FALCON), and SMARTdenovo (https://github.com/ruanjue/smartdenovo). The assembly result of SOAPdenovo was fragmented (1,165,887 contigs, N50 1,770 bp, and N90 230 bp). Compared with the SOAPdenovo assembly using only Illumina data, DBG2OLC, FALCON, and SMARTdenovo genome assemblies using PacBio data or a hybrid of PacBio and Illumina data gave more satisfactory results, all with contig N50 values above 110 Kb. This result confirmed the effectiveness of the approaches and the superiority of PacBio long reads for large genome assembly (S2 Table). After comparing the 3 approaches of DBG2OLC, FALCON, and SMARTdenovo, genome assembly by FALCON gave higher continuity (with a contig N50 of 190 Kb) than the other methods.
To test the amount of PacBio data sufficient for A. japonicus genome assembly, we performed a set of assemblies by FALCON with coverage from 30× to 70× of PacBio data. For all assemblies, default parameters were used. Genome size and N50 of assembled contigs were calculated and used for performance evaluation (S1 Fig). On the basis of these assembly experiments, 70-fold coverage of PacBio data (64 Gb) was sufficient and used for the A. japonicus genome assembly. The assembled contigs were corrected 8 times with Illumina PE reads using Quiver (https://github.com/PacificBiosciences/GenomicConsensus). Finally, we generated scaffolds and performed gap filling with SSPACE 3.0 (https://www.baseclear.com/genomics/ bioinformatics/basetools/SSPACE) using Illumina MP sequencing data.
To locate scaffolds on chromosomes, 2 parents and 130 offspring were genotyped, and a high-density linkage map was constructed with Genotyping by Sequencing (GBS) technology [68]. Sequencing depth averaged 50-fold for parents and 10-fold for the 130 offspring. Clean reads were mapped to genome scaffolds by the Burrows-Wheeler Aligner [69]. Variants including SNPs were identified using SAMtools [70]. JoinMap 4.0 was applied for linkage analysis based on a maximum likelihood algorithm [71]. Recombination values were converted to genetic distances in centiMorgans based on the Kosambi mapping function. A total of 71.2% of the genome scaffolds anchored in the linkage map, and Circos64 was used for visualization [72].
To estimate the level of heterozygosity in the genome, we carried out k-mer (where k represents the chosen length of substrings) distribution approximation using simulated heterozygous genome sequences (S2 Fig). Jellyfish were used to calculate k-mer depth distribution [73]. PE reads from short-insert libraries (180 bp) were used for analysis. Adaptor and lowquality reads were trimmed using the NGS QC Toolkit (v2.3.3) [74]. Reads for 26.01 Gb of clean data were split into k-mers, and k-mer depth was computed to obtain depth distribution. Two peaks of k-mer depth 14 and 28 were observed in 17-mer analysis. We found that the kmer distribution was fitted best by a simulated k-mer distribution with 2% heterozygosity (S3 Fig). To obtain an exact heterozygous rate, we used the software gce to analyze k-mer frequency data with parameters of -c unique_depth -H 1 (ftp://ftp.genomics.org.cn/pub/gce). We used gce to estimate the k-mer heterozygous ratio (KHR) based on the following formula: KHR = a1 / 2 / (2 − a1 / 2). We used the formula base heterozygous ratio (BHR) = KHR / Kmer (assuming each SNP caused K new k-mers) for BHR. An exact heterozygous rate of 1.59% was estimated, which was close to the evaluation by simulated k-mer distribution. We also investigated the level of heterozygosity by mapping all reads back to the assembled genome using BWA with default parameters. For reads with multiple mapping positions, only the single best hit was retained. SNPs and INDELs were called based on alignment results with SAMtools [70]. A number of SNPs (5,379,554) and INDELs (486,341) were detected on the assembly genome, suggesting heterozygosity of about 0.73%. Because the part of the genome with the lowest heterozygosity had the best assembly, 0.73% was considered an underestimation due to sampling bias.

Repeat sequence analysis
RepeatModeler (http://www.repeatmasker.org/RepeatModeler/) was used for repeat family identification in a de novo approach. RepeatMasker was used to identify transposable elements by aligning genome sequences against RepBase (RepBase21.04) and a local library generated by RepeatModeler with default parameters. Tandem repeats were analyzed with the Tandem Repeats Finder (TRF) program [75]. Transposable elements (TEs) in the A. japonicus genome were discovered by a combination of de novo-based and homology-based approaches.
A local repeat library of 1,507 sequences was produced by RepeatModeler and was used for the following prediction [76]: All major classes of TEs were summarized and compared with S. purpuratus (S7 Table). A total of 210.87 Mb (26.20%) of the assembled genome was predicted to be TEs, slightly less than S. purpuratus (32.67%). Of these, 18.93% were classified as unknown repeats. Among the 7.10% annotated repeat families, retrotransposable element Long Interspersed Nuclear Elements (LINEs) (2.31%) and DNA transposons (3.20%) were 2 major TEs discovered in the assembled genome that were similar to S. purpuratus. However, S. purpuratus has more DNA transposable elements (8.36%) than A. japonicus (S7 Table and

Noncoding RNA prediction
Noncoding RNA genes were predicted for repeat-masked genome by sequence-and structurebased alignments with the Rfam noncoding RNA database (http://xfam.org/) with an E-value cutoff at 0.01, using Infernal [77]. Specifically, for miRNA identification, small RNA sequencing data were downloaded from the SRA database of NCBI, including sRNAs from respiratory tree (SRX878425), tubefoot (SRX642039), and longitudinal muscle (SRX878452). Adaptor and primer sequences were trimmed, and low-quality sequences removed. Clean sRNA reads were compared with the Rfam database (http://xfam.org/) to exclude noncoding RNAs other than miRNAs. The remaining sRNA reads were subjected to miRNA identification by mapping to predicted pre-miRNA structures in the A. japonicus genome using miRDeep2 [78] and miReap (http://sourceforge.net/projects/mireap/). Finally, novel and conserved miRNAs were classified by searching against the miRBase (http://mirbase.org/).

Gene prediction and function annotation
Three approaches were used to predict protein-coding genes: homology-based predictions, de novo predictions, and transcriptome-based predictions. Homologous proteins from 8 known whole genome sequences Homo sapiens, Danio rerio, B. floridae, S. kowalevskii, S. purpuratus, Daphnia pulex, C. gigas, and Hydra vulgaris were used for alignment to the repeat-masked A. japonicus genome using Exonerate (version 2.2.0) [79]. Genewise (version 2.2.0) was used to generate gene structures based on homology alignments of proteins to the genome [80,81]. For ab initio gene prediction, we used Augustus (version 2.5.5), Genescan (version 1.0), Glim-merHMM (version 3.0.1), and SNAP15 to predict coding genes. RNA-Seq data were used to improve gene annotation and mapped to the genome using Tophat (version 2.0.8) [82]. Cufflinks (version 2.1.1) (http://cole-trapnell-lab.github.io/cufflinks/) was used to identify spliced transcripts in gene models [83]. All gene evidence predicted from the 3 approaches was combined by EVM into a weighted and nonredundant consensus of gene structures [84]. Gene models generated by EVM were filtered according to the following criteria: coding region lengths less than 150 bp and values of reads per kilobase of exon model per million mapped reads (fragments per kilobase of transcript per million mapped reads [FPKM]) < 5 when a predicted gene supported by ab initio methods only hit with the uniref 90 database [85]. For gene functional prediction, NCBI nr and the SwissProt database were used for gene blasts. All predicted genes were blasted against the 2 databases using BLASTP (E-value 1E-10).

Gene family analysis
To understand the evolutionary relationship of A. japonicus with other metazoans, we performed systematic gene comparisons. For greater insight into the evolutionary dynamics of the genes, we determined the expansion and contraction of the gene ortholog clusters among these 17 species. We used CAFE software for computational analysis of gene family evolution [86] and defined expansion and contraction by comparing cluster size differences between ancestors and each current species. Extinction and evolution of gene families were processed by CAFE, using a random birth and death process model to identify gene gain and loss along each lineage of the RAxML tree ( Fig  2). For all species, expanded and contracted gene families (compared to ancestors) were compared with A. japonicus to identify gene families that expanded or contracted only in the sea cucumber.
For genes exclusively present and gene families specifically expanded in A. japonicus, we conducted gene ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis using blast2go and KAAS [87,88]. Using Omicshare CloudTools (http://www.omicshare.com/tools/?l=en-us), enriched GO terms and KEGG Orthology (KO) terms were calculated relative to the background of full protein-coding genes. Among the 30,350 protein-coding genes of A. japonicus, many-to-many orthologs (15.38%), patchy orthologs (36.44%), and species-specific genes (43.17%) were 3 major gene model groups (Fig 2C). Echinoderm-specific orthologs (1.94%) were also detected in the A. japonicus genome. GO enrichment analyses of the A. japonicus-specific genes and significantly expanded gene families suggested that most genes were associated with GO terms in cellular process, metabolic process, and binding and catalytic activity (S6 and S8 Figs). Based on the full protein-coding genes, A. japonicus-specific genes were predominately enriched in GO terms related to signal recognition and immunity (S12 Table and S8 Fig). Similar results were found for GO enrichment analysis of significantly expanded gene families relative to full genes (S13 Table, S8 Fig). KEGG pathway enrichment analyses indicated that A. japonicus-specific genes and gained genes were also enriched in pathways related to signal recognition and immunity (S14 and S15 Tables, S8 Fig).

Phylogenetic tree construction
Peptide sequences were clustered by the Markov clustering program orthoMCL [89]. These sequences were also searched against the nr database by an all-versus-all BLASTP with threshold E 1E-05 and then clustered by MCL with an inflation value of 1.5. A total of 43 singlecopy orthologous genes were clustered among 17 genomes. Ortholog alignments were produced using MUSCLE (v3.6) and concatenated into a single multiple-sequence alignment by an in-house Perl script. A neighbor-joining phylogeny was reconstructed using MEGA (v5) [90].
To examine the phylogeny of sea cucumbers, we used a maximum likelihood (ML) method for genome-wide phylogenetic analysis based on single-copy genes from the 10 deuterostome and 7 nondeuterostome genomes. Based on the gene clustering results from orthoMCL, 49,351 gene families were collected from the 17 species. Among them, 43 single-copy genes were used for phylogenetic tree construction. To understand the relationship of A. japonicus to other echinoderms, we constructed another phylogenetic tree of 5 echinoderms based on orthologous genes. To extend the taxonomic sampling, gene families were surveyed in transcriptome datasets in F. serratissima, Patiria miniata, and A. filiformis (NCBI SRA database accession numbers SRR2454338, SRR573710, SRR573709, SRR573708, SRR573706, SRR573707, SRR573705, SRR573675, SRR1523743, SRR1533125, SRR794587, SRR794568, SRR789489, and SRR3097584). Transcriptome data for the 3 echinoderms were assembled into unigenes using Trinity and cap3 with default parameters [91,92]. Single-copy gene families were extracted from the full genes of S. kowalevskii (outgroup), A. japonicus, and S. purpuratus. For each gene family, protein sequences from all represented sequenced genomes were searched using BLASTP (Evalue cutoff 1.00E-10) against unigenes from each species. The unigene with the best score was translated as the longest open reading frame (ORF) in the frame detected by BLASTP. Gene clustering analysis was performed on the unigenes. Finally, 2,066 orthologous genes were obtained for constructing a phylogenetic tree using the ML method. For ML tree construction, sequence alignments were performed using MUSCLE 3.6 [93]. The substitution models that best fit observed alignment data were estimated using the program jModelTest 2 [94]. Using PhyML [95], we performed ML analysis with the substitution model WAG + gamma + Inv, and 1,000 bootstraps were conducted to produce the branch support values.

Morphological evolution analyses
Pharyngeal gill slits gene cluster. A conserved deuterostome-specific pharyngeal gene cluster was discovered to be important for pharyngeal gill slits [24]. The gene cluster was also located on a single scaffold in A. japonicus, but with altered gene cluster order. Generally, nkx2.1-nkx2.2 and pax1/9-slc25A21 were the 2 most conserved subclusters in the gene cluster, but both were interrupted in A. japonicus. The homologous gene slc25A29 was far from pax1/ 9, while nkx2.1 was in another scaffold. The pax1/9 gene is important for formation of gill slits via modulation of six1/2 and tbx1/10 expression [107]. In A. japonicus, six1 was a neighbor to six6 and six4, which form a gene cluster similar to that of chordates and hemichordates (S6 Fig).
Notochord formation-related gene families. We searched the A. japonicus genome for genes involved in notochord formation (Fig 3). To determine the copy number of FGF genes in the A. japonicus genome, we screened the A. japonicus transcriptome data using full FGF genes from the other 16 species. Only a single unigene was homologous to an FGF gene.
foxE is specifically expressed in the buccal and pharyngeal regions and the stomochord region, and it is a marker gene for stomochord formation [30]. We compared fox genes of S. kowalevskii against A. japonicus transcriptome unigenes. Several unigenes were homologous to fox genes; however, no unigenes belonged to foxE. To further test if foxE was absent in echinoderms, we compared S. kowalevskii fox genes against the draft assembly of other echinoderms from NCBI, including sea urchin Lytechinus variegates (AGCV00000000.2), sea cucumber Parastichopus parvimensis (JXUT00000000.1), brittle star Ophiothrix spiculata (JXSR00000000. 1), and sea star P. miniata (AKZP00000000.1). No foxE gene was detected in these genomes (S16 Table).
Skeletal development genes. We compared the genes of skeletogenic regulatory systems in sea cucumbers and sea urchins and found that biomineralization genes were significantly different between the 2 echinoderms. There are 31 biomineralization proteins in sea urchins that are important for osteogenesis [39,108]; however, only 7 homologous biomineralization genes were identified in the A. japonicus genome (S19 Table). Furthermore, we searched for these 31 genes in the sea star A. planci (NCB Bioproject PRJDB3175) and the hemichordate S. kowalevskii (NCB Bioproject PRJNA12887) and listed the numbers of the genes in the 4 animals.
To compare the expression of the biomineralization genes of sea urchins and sea cucumbers, we collected expression data on the 31 biomineralization proteins of S. purpuratus from the "Quantitative Developmental Transcriptomes of S. purpuratus" in Echinobase (http:// www.echinobase.org:3838/quantdev/), which includes 4 development stages: fertilized eggs (10 hours), blastula (24 hours), gastrula (48 hours), and pluteus (72 hours). The corresponding expression information for 4 biomineralization genes was obtained from A. japonicus developmental transcriptomes. Using tubulin expression as the control, the biomineralization genes showed significantly lower expression in A. japonicus than S. purpuratus (Fig 3F).

Visceral regeneration
In order to better understand the regeneration mechanism, we conducted a set of genomic and multiomic analyses of intestinal regeneration in A. japonicus. Adults of A. japonicus (100-120 g) were collected from the coast of Qingdao, Shandong Province. They were held in a laboratory for 1 week in sea water at 15-17˚C and fed once a day. Evisceration was induced by injecting about 2 mL 0.35 molL -1 KCl into the coelom. Eleven individuals per stage were sampled at 0.5 hours, 2 hours, 6 hours, 3 dpe, 5 dpe, 7 dpe, 14 dpe, and 21 dpe, and intestinal tissues were collected for RNA isolation; noneviscerated sea cucumbers served as controls. Dissected intestines were frozen in liquid nitrogen and stored at −80˚C until RNA extraction.
Total RNA was extracted using the TRIzol extraction method (Thermo Fischer Scientific, Germany) according to the manufacturer's protocol. Poly-A mRNA was isolated from 40 μg total RNA per sample using oligo-dT-coupled beads and sheared. Isolated RNA samples were used for first-strand cDNA synthesis using random hexamers and Superscript II reverse transcriptase. After end repair and addition of a 3 0 -dA overhang, cDNA was ligated to an Illumina paired-end adapter oligo mix and size-selected by gel purification to enrich for approximately 200 bp fragments. After 16 PCR cycles, transcriptomes of the entire intestinal regeneration process of 9 stages were sequenced using Illumina HiSeq-2000 and the PE end sequencing module. These stages corresponded to 6 key phases of intestinal regeneration: early response (0 to 6 hours post evisceration), wound healing (6 hours to 3 dpe), blastema formation (3 to 7 dpe), lumen formation (7 to 14 dpe), intestinal differentiation (14 to 21 dpe), and growth (21 dpe) [57,109]. RNA expression analysis was based on the predicted genes of A. japonicus genome. Tophat was used to map mRNA reads to the genome, and Cufflinks was used to calculate expected FPKM as expression values for each transcript.
Corresponding proteomic studies (3,5,7,14, and 21 dpe) of intestinal regeneration were analyzed using isobaric tags for relative and absolute quantitation (iTRAQ) coupled with mass spectrometry (MS). Total protein was taken from sample solutions, and protein was digested with Trypsin Gold ( Table. Gene ontology (GO) enrichment analysis of echinoderm-specific gene families. (XLSX) S11 Table. Gene ontology (GO) enrichment of the genes exclusively present in A. japonicus. The gene ratios in each GO term are compared between the specific and full protein-coding genes of the genome. The GO terms with a light green background are associated with immunity, and those with a yellow background are associated with signal recognition. (XLSX) S12 Table. Gene ontology (GO) enrichment of the expanded gene families in A. japonicus. The gene ratios in each GO term are compared between the significantly expanded and full protein-coding genes of the genome. (XLSX) S13 Table. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment of the genes exclusively present in A. japonicus. The gene ratios in each KEGG Orthology (KO) term are compared between the specific and full protein-coding genes of the genome. The KO terms with a light green background are associated with immunity, and those with a yellow or blue background are associated with signal recognition and regeneration, respectively. (XLSX) S14 Table. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment of the expanded gene families in A. japonicus. The gene ratios in each KEGG Orthology (KO) term are compared between the significantly expanded and full protein-coding genes of the genome. The KO terms with a light green background are associated with immunity, and those with a yellow or blue background are associated with signal recognition and regeneration, respectively. (XLSX) S15  Table. Fox genes in 4 echinoderm genomes. The draft scaffolds of 4 echinoderm genomes were blasted against the Fox gene family, and the homologous regions were extracted to blast against the nr database. Gene names indicate the best hit gene from the nr database. (XLSX) S17 Table