Large scale changes in the transcriptome of Eisenia fetida during regeneration

Earthworms show a wide spectrum of regenerative potential with certain species like Eisenia fetida capable of regenerating more than two-thirds of their body while other closely related species, such as Paranais litoralis seem to have lost this ability. Earthworms belong to the phylum Annelida, in which the genomes of the marine oligochaete Capitella telata and the freshwater leech Helobdella robusta have been sequenced and studied. Herein, we report the transcriptomic changes in Eisenia fetida (Indian isolate) during regeneration. Following injury, E. fetida regenerates the posterior segments in a time spanning several weeks. We analyzed gene expression changes both in the newly regenerating cells and in the adjacent tissue, at early (15days post amputation), intermediate (20days post amputation) and late (30 days post amputation) by RNAseq based de novo assembly and comparison of transcriptomes. We also generated a draft genome sequence of this terrestrial red worm using short reads and mate-pair reads. An in-depth analysis of the miRNome of the worm showed that many miRNA gene families have undergone extensive duplications. Sox4, a master regulator of TGF-beta mediated epithelial-mesenchymal transition was induced in the newly regenerated tissue. Genes for several proteins such as sialidases and neurotrophins were identified amongst the differentially expressed transcripts. The regeneration of the ventral nerve cord was also accompanied by the induction of nerve growth factor and neurofilament genes. We identified 315 novel differentially expressed transcripts in the transcriptome, that have no homolog in any other species. Surprisingly, 82% of these novel differentially expressed transcripts showed poor potential for coding proteins, suggesting that novel ncRNAs may play a critical role in regeneration of earthworm.


Introduction
Members of the phylum Annelida, commonly represented by earthworms and leeches are cosmopolitan in distribution, occupying a variety of niches from the soil in our gardens to marine sediments. Some species are ecologically relevant because they sieve our soil [1], while others burrow through deep marine sediments [2]. Several annelids parasitize a variety of marine and terrestrial hosts [3]. Annelids have also been found as fossilized remains from the Cambrian era [4]. Broadly divided into two major classes; Clitellata which is further divided into Oligocheates (earthworms) and Hirudinea (leeches) and the class Polychaeta (largely comprising marine worms), the systematics of this group of segmented worms undergoes constant reworking in the light of modern molecular tools employed for classification [5,6]. They belong to the superphylum Lophotrochozoa, which encompass phyla including the molluscs and flatworms, grouped according to protein coding surveys into a 402-ortholog dataset [7]. Earthworms also show intriguing behaviors like the ability to distinguish light of different wavelengths [6,8] and respond to tactile stimuli [9], vibrations [10] and dragging objects along directions that offer least resistance [11].
Many earthworm species have a remarkable ability to regenerate part of the body lost due to injury. The potential to regenerate lost body parts has repeatedly appeared and disappeared during evolution [12]. Planaria and Hydra display a type of regeneration called morphallaxis involving the reorganization of existing cells and re-programming of cell fates. The lizard tail, starfish arms and legs of some arthropods show autotomy as a defense. The breaking off of a body part is followed by re-growth and sometimes, incomplete re-establishment of cell types. The zebrafish fin and beaks of certain birds display spatially restricted regenerative potential of certain tissue types. In contrast, the earthworm and salamander limb offer models that display the coordinated regeneration of nerve, muscle, vasculature and blood.
In the presence of model organisms such as Hydra [13] or planarians [14] for studying regeneration, annelids pose an interesting new challenge. Annelids are highly variable in their regenerative capacity, with leeches showing none to some sabellids, chaetopterids and lumbriculids producing an entire individual from a mid-body segment [15]. All this hints toward a diversity of mechanisms for regeneration and asexual reproduction in annelids.
Previous attempts at characterizing the regenerative potential of earthworms have focused on the extent to which regeneration may be accomplished. Barring characterization of a few specific genes, there have been no comprehensive analyses of gene expression during regeneration in earthworms. Transcriptomic studies in the regenerating zebrafish fin [16,17], planaria [18,19] and hydra [20] have revealed the importance of growth factor signaling and re-programming factors during regeneration. Eventually, comparative analysis of genomes and transcriptomes of closely related species with widely different regenerative potential may reveal novel genes, regulatory elements or genomic rearrangements that account for the intermittent manifestation of regenerative ability during evolution.
Here, we report the transcriptome of the epigeic vermicomposting worm Eisenia fetida, also known as red-worm, along with an extensive analysis of transcriptome dynamics during regeneration. We report the presence of a neurotrophin gene, with limited similarity to mammalian nerve growth factor family that is upregulated during regeneration. We show that genes known to enhance neural regeneration are induced during earthworm regeneration implying that certain conserved molecular pathways are involved in this gene expression program. Based on HOX gene analysis from a draft genome Zwarycz et. al. have previously reported that the E. fetida genome has undergone extensive duplications [21]. We used miRNA as indicators of phylogenetic history to find that like Hox genes, several miRNA families have multiple paralogs in E. fetida.
In pursuit of novel drivers of regeneration, we show that 315 earthworm genes that encode proteins with no orthologs, were differentially expressed. Amongst these novel genes, 216 showed poor coding potential while 99 transcripts can potentially code for peptides that may be important in regeneration. The dominance of non-coding transcripts during regeneration may provide an explanation for the remarkable variation in regenerative potential amongst closely related annelid species.

DNA and RNA isolation
Eisenia fetida earthworms were procured from farmers engaged in vermicomposting and maintained in a culture in the laboratory at around 22 o C with moderate humidity. No specific permissions were required for procuring earthworms since they were being sold as vermicompost. They were brought in the lab and cultured as it is. They are neither endangered species nor do they come under animals requiring ethical approval. The worms were rinsed in tap water to remove any attached soil. They were then fixed in 70% ethanol for 5 minutes. A platform was used to pin the worm from both ends in order to dissect out the gut. The body wall was then cleaned thoroughly to remove all residual soil matter. The DNA was then isolated according to a protocol adapted from Adlouni et. al [22]. Briefly, the tissue was homogenized using liquid nitrogen and dissolved in 1ml of DNA extraction buffer (100mM NaCl; 50mM EDTA; 7%Sucrose; 0.5%SDS; 100mM Tris base; pH 8.8). Fifty microlitres of Proteinase K (10mg/ml) were added and the homogenate was incubated at 65 o C for 2 hours. The proteins were precipitated by 120uL of 8M Potassium acetate at 4 o C. Precipitated proteins were removed by centrifugation at 10,000g while the supernatant was treated with equal volume of Phenol: Chloroform: Isoamyl alcohol on ice. The aqueous layer was recovered after a 15-minute spin at 10,000g and DNA was precipitated by equal volume of Isopropanol. DNA was centrifuged at 10,000g and desalted by repeated 70% ethanol washes. The pellet was air dried and dissolved in Tris EDTA buffer (pH 7.5).
Samples from regenerating worms were collected at four different days post amputation (dpa)-immediately following amputation (0dpa), and subsequently at 15dpa, 20dpa and 30dpa. The tissue collected at 0dpa from 60 ± 6 segments was used as reference (0C) for comparison. The regenerated tissue and the tissue adjacent to it, termed control, were collected separately. RNA was isolated using the Trizol method. Briefly, the tissues were frozen in liquid nitrogen, homogenized using a mortar and pestle in the presence of 1ml Trizol reagent and transferred to a microfuge tube. Phase separation was done by adding 200uL chloroform and centrifugation at 10,000g. The aqueous phase was collected and equal volume of Isopropanol was added to precipitate RNA. The pellet was collected by centrifugation at 10,000g, followed by repeated 70% ethanol washes. The RNA was then air-dried and dissolved in nuclease-free water.

DNA and RNA sequencing
Approximately 5ug of DNA, taken from three different worms, was sheared using Covaris S220 platform and desired fragment sizes were selected by agarose gel electrophoresis. Paired end libraries of fragment sizes 200bps and 500bps were constructed using Illumina TruSeq DNA Library Prep Kit, while three mate-pair libraries were constructed of insert sizes 10Kb, 7Kb and 5Kb (average sizes since a range was cut from the agarose gel) using Nextra Mate-Pair Library Prep Kit according to manufacturer's protocol. It should be noted that for the matepair library construction, shearing was performed after circularization, as per manufacturer's guidelines. Briefly, the sheared DNA was end-repaired and purified using AMPure XP beads (Beckman Coulter). This end repaired DNA was then A-tailed and ligated to adapters. The ligated DNA fragments were then amplified by PCR and purified again using AMPure XP beads generating paired end libraries for sequencing on the Illumina Platform. The 200bpinsert library was sequenced using Illumina GAnalyzer II Platform while the 500bps-insert library was sequenced using HiSeq 2500. For generating long reads (not used in the assembly), Roche 454 libraries were constructed as per manufacturer's protocol using Shotgun sequencing approach of GS FLX+ system from Roche. Genomic DNA(1μg; as estimated using Qubit high sensitivity assay) was used for rapid library preparation, which includes DNA fragmentation by nebulization, fragment end repair, adaptor ligation and small fragment removal by AMPure beads based purification. Library was quantified using Quantifluor (Promega) and qualitatively assessed by Bioanalyzer (High sensitivity chip from Agilent). The average fragment length of library was between 1400-1800 bps with <10% of total fragments under 650 bps. This was used to make working aliquots (1 x 10 7 molecules/μl) for emulsion PCR (emPCR) standardization. For clonal amplification of the library, the optimal amount of DNA titrated using small volume emPCR was used for sequencing. After emPCR optimization, eight DNA copies per bead (cpb) were used for large volume emPCR. The enriched beads were sequenced using pyrosequencing on a picotiter plate. The raw data comprising of series of images were extracted, normalized and converted into flowgrams. The flowgrams were used during signal processing to generate analysis ready sequencing reads. The libraries were sequenced in two-region format of the picotiter plate. Lastly, for Oxford Nanopore sequencing, MAP06 kit was used according to manufacturer's protocol. Briefly, the DNA was sheared using Covaris g-Tube at 3000rpm, end repaired and A-tailed. Then, ligation was done with hairpin adapters linked to biotin labeled motor protein. The library was purified using Dynabeads M-280 and then sequenced. Metrichor was used for basecalling. Poretools was used to get FASTA sequences [23].
Approximately, 1ug of RNA was taken per sample and RNA sequencing libraries were made using TruSeq v2 Library Prep Kit as per manufacturer's protocol. Briefly, the RNA was polyA selected using OligodT magnetic beads followed by shearing into 200-500bp fragments. This sheared RNA was then used to generate cDNA. The cDNA was end-repaired to blunt ends. These blunt ends were then A-tailed i.e. an "A" overhang was added so as to ligate the adapters in the next step. The adapter-ligated cDNA was then amplified by PCR and purified by AMPure XP beads. This library was then quality checked and sequenced on Illumina HiSeq 2500. For small RNA sequencing, total RNA was isolated from body wall and sequenced on the Illumina platform.~800ng of Total RNA was used as the starting material. Briefly, 3' adaptors were ligated to the specific 3'OH group of microRNAs followed by 5' adaptor ligation. The ligated products were reverse transcribed with Superscript III Reverse transcriptase by priming with Reverse transcriptase primers. The cDNA was enriched and bar-coded by PCR (15 cycles) and cleaned using Polyacrylamide gel. The library was size selected in the range of 140 -160bp followed by overnight gel elution and salt precipitation using Glycogen, 3M Sodium acetate and absolute ethanol. The precipitate was re-suspended in Nuclease free water. The prepared library was quantified using Qubit Fluorometer, and validated for quality on High Sensitivity Bioanalyzer Chip (Agilent).

Genome assembly
The quality of sequencing data was checked using FastQC and reads of Phred score>33 were used for further. Before starting the assembly, the quality check of the reads was done using FASTQC. The reads were trimmed to remove low quality reads (Phred score 33) using Trimmomatic. Further, reads were filtered for microbial contamination by removing reads matching any organism using an in-house database. The draft genome was made using CLC Genomics Workbench with a word size of 64 and bubble size of 50. The paired end data (obtained from Illumina HiSeq 2500 with an insert size of 500bps) was used for assembling contigs while mate pair data (with an insert size of 3.5-5.5Kb) was used for scaffolding the contigs for the final assembly. The resulting assembly was assessed using Assemblathon script to get the N50 statistics.

Transcriptome analysis and annotation
The quality of RNA sequencing reads were checked using FastQC and reads of Phred score>33 were used for adapter trimming by Trimmomatic-v0.36 (default parameters). The data was then de novo assembled using Trinity-v2.3.2 package [24]. The assemblies were annotated using Transdecoder-v4.1 (http://transdecoder.sf.net) for finding ORFs. The peptides were compared to Uniprot entries using default parameters in BLAST. The samples were assembled together in a single assembly using Trinity-v2.3.2 and annotated by Trinotate. This assembly was then used as a reference to align reads of individual samples using Bowtie-v2.2.9 [25] and was then assembled using Cufflinks-v2.2.1. Read counts were obtained from alignment files using HTSeq [26]. Differential expression profiles were generated using DESeq2 [27] with multiple sample correction by Benjamini Hochberg method. The data generated was then analyzed using the MATLAB suite. Small RNA sequencing data was analyzed using miRminer [28,29] pipeline. Functional classification of differentially expressed genes was carried out using DAVID. Comparisons were done against user-defined background gene list of Uniprot IDs of E. fetida orthologs. Benjamini-Hochberg corrected p-value <10 -4 was used to select over-represented GO terms. Coding potential of transcripts was calculated using CPAT [30] and CPC2[31].

RT PCR
The cDNA was prepared using NEB MMULV-RT enzyme using random hexamers. RT PCR was performed using Takara SYBRII in Roche LightCycler 480. The primers used have been listed in Supplementary Table (S1 Table). The products were visualized by agarose gel electrophoresis.

PCR cloning and sanger sequencing
The cDNA was prepared using Transcriptor High Fidelity cDNA synthesis kit (Roche). The cDNA was used for PCR reaction using Fermentas Taq Polymerase. The PCR product was cloned in pET23A and sequenced using T7 primers in ABI 3130xl Genetic Analyzer according to manufacturer's protocol.

Cryosectioning and histology
Earthworm with regenerated tails were fixed in 4% para-formaldehyde solution at room temperature for four to five hours. After fixation, the samples were washed two to three times with phosphate buffered saline (pH 7.0) at room temperature. After, washing, the samples were kept in a 15% sucrose solution (in PBS) and later in 30% sucrose solution at 4˚C until the tissue was immersed. The regenerated portion of earthworm was then cut into appropriate sizes to be mounted into cryostat. The samples were frozen at -20˚C in the presence of Tissue Freezing Medium (Jung). Cross-sections of 20μm thickness were cut from the regenerated portion and immediately transferred onto glass slides coated with gelatin by an artist brush and thaw mounted. The images were taken using a standard bright field microscope at 5X and 10X magnifications after hematoxylin and eosin staining.

In situ hybridization
Regenerated earthworms (15dpa, 20dpa and 30dpa) were collected and washed with autoclaved ultrapure type-1 water, followed by overnight fixation at 4˚C in 4% (w/v) para-formaldehyde (Sigma-Aldrich) prepared in 1XPBS, pH 7.4. Stringent washes using 0.1% tween-20 in 1XPBS were given to the worms with subsequent storage in 100% methanol at -20˚C.

Regeneration in E. fetida
Mature E. fetida with 100±10 body segments show a remarkable potential to regenerate posterior segments following amputation. Early studies on E. fetida [32,33], reviewed recently by Bely and Sikes [34] mention that the posterior can also regenerate the anterior segments. However, a more recent report by Nengwen et. al. shows that the posterior can survive and regenerate the anterior segments only when the loss is restricted to the first 7 segments [35]. This is in agreement with our own observations, which prompted us to focus on the ability of E. fetida to regenerate posterior segments. To identify genes differentially expressed during regeneration, we carried out RNA sequencing and created a temporal profile of gene expression during regeneration in E. fetida.
In this study, earthworms were cut transversely at 60 ± 6 segments and allowed to regenerate for 15, 20 or 30 days (Fig 1A). At 15, 20 and 30 days post amputation (dpa), the regenerated tissue (henceforth referred to as 15R, 20R and 30R) and the (old) adjacent tissue (henceforth referred to as 15C, 20C and 30C) were collected ( Fig 1B) from each batch consisting of 50-100 worms. In one batch, immediately after amputation, tissues were collected from the site (60 ± 6; henceforth called 0C) to be used as a reference for comparing the regenerated and control tissues. Besides these samples, we also carried out RNA-Seq of the posterior (segments 60-100; called P0 for Posterior at 0days) as shown in Fig 1A. Direct comparison of the regenerating tissue with P0 allowed us to establish the extent to which regeneration is completed by 30dpa. Three biological replicates, starting with different batches of earthworms were performed. Total RNA was used for RNA-Seq analysis and the reads were assembled, de novo, using Trinity [24] and annotated (see Methods). As shown in S1 Fig, comparative analysis was performed using DESeq2[27] to identify genes that were differentially expressed in any of the tissues (15C, 20C, 30C, 15R, 20R or 30R) when compared to the reference (0C). In the tissue immediately anterior to the site of injury, less than hundred genes (Table 1) were significantly differentially expressed at any point during regeneration. Gene Ontology classification using DAVID also failed to cluster these genes into any specific functional category (Table 2) on the basis of biological process or molecular function (Benjamini Hochberg corrected pVal < 10 -4 ).
In contrast, in the regenerating tissue, thousands of transcripts from a total of 71,341 transcripts (including isoforms) were differentially expressed ( Table 1). The largest number of  (14), Cilium movement (20) DNA replication initiation (21) Peptidyl-prolyl cis-trans isomerase activity (22) Aromatase activity (20), Heme binding (53) Regenerated at 20dpa (20R) BP Cell division (72), Axoneme assembly (14) DNA replication initiation (16) differentially expressed genes was in the regenerated region at 15dpa. The number of differentially expressed genes steadily reduced as regeneration proceeds, at later time points (20dpa and 30dpa). Next, we identified biological processes and molecular functions that were overrepresented in the up-regulated or down-regulated genes (Fig 1C, 1D, 1E and 1F). Overall, it is clear that in the early stages of regeneration, genes involved in translation, DNA replication and cell division are upregulated while in later stages, extra-cellular matrix remodeling is undertaken.
We also compared the changes occurring in the regenerating tissue with the adjacent tissue. A majority of the differentially expressed genes were downregulated in the regenerating tissue when compared to the adjacent tissue (Fig 2). This apparent downregulation of genes maybe a reflection of reduced heterogeneity amongst the undifferentiated, rapidly proliferating cells in the regenerating tissue. The regenerating tissue, initially, is a homogenous mass of undifferentiated cells, wherein many genes appear downregulated because the corresponding cell type has not been formed. For instance, cilium associated genes (Fig 1E and 1F) are down regulated until the nephridia and sensory cells in the epidermis are restored. The giant extracellular hemoglobin of annelids has been studied extensively, for its exceptional oxygen carrier properties. Interestingly, the expression of the gene for giant extracellular hemoglobin (Fig 2D, 2E and 2F) of E. fetida is induced in the regenerating tissue at 15dpa and 20dpa (Adjusted pVal 0.05). By 30dpa, the regenerated tissue has fewer differentially expressed genes, suggesting that the program of regeneration is restoring the expression of genes. Our results suggest  that, following injury, the regenerating tissue mounts a well-coordinated program of gene expression.

Regeneration factors
We used k-means clustering to identify co-regulated gene clusters upregulated in the regenerating tissue. The most highly induced gene (>100 fold) in the regenerating tissue is Brachyury (Fig 3A). Using the Brachyury profile as a reference, we identified a cluster of 951 genes with a strong profile match (Fig 3B). This cluster consists of several developmental regulators like FGF, BMP2, SOX4 and WNT signaling genes (Fig 3C). We verified the expression of SOX4, in the regenerating tissue using in situ hybridization. The tip of the regenerating tissue showed a strong and consistent induction of SOX4 (Fig 3D). Another gene, strongly induced in the regenerating posterior is the earthworm ortholog of the homeobox segmentation gene evenskipped (Fig 3C). In earthworm, 38 novel genes with no apparent homology in vertebrate genomes are part of the cluster of 951 genes co-expressed with Brachyury during regeneration (Fig 3B and 3C). In the absence of functional conservation, co-expression of genes can be useful in predicting function, in this instance, implicating these novel genes in regeneration and posterior development.

Regeneration of blood and immune cells
A closer examination of the global gene expression profile, especially the genes that are strongly repressed at 15dpa revealed that a group of apparently co-regulated globins was initially depleted in the regenerating tissue. As the newly formed tissue matured, the globin genes became more abundant and eventually at 30dpa were comparable in expression to the adjacent tissue. We believe this is a reflection of the vascularization of the regenerating tissue and the associated presence of blood cells.
The Sialidase (SIAE) gene, which is involved in B-cell differentiation in humans (56), had no known ortholog in the C. elegans and Drosophila melanogaster genomes, but genomic analysis revealed the presence of two closely related sialidase genes in the earthworm genome, occuring on two distinct contigs (Fig 4A). We were intrigued by the divergent expression patterns of the two orthologs, both of which resemble the single human SIAE gene (Fig 4B). We designed ortholog specific exon-spanning primers and validated our findings from genome assembly and RNA-Seq by RT-PCR. SIAE1, was upregulated in the regenerating tissue by 6.3 fold (data not shown) at 15days post cut while SIAE2 was downregulated (Fig 4C).

Neural regeneration
We studied the differentially expressed genes during earthworm regeneration to identify expression profiles associated with neural regeneration in the posterior segments. The strong induction of Nerve Growth Factor and the neurofilament gene, NF70 indicated ongoing neurogenesis (Fig 5A and 5B). Earthworm anatomy shows a ventral nerve chord that connects the dorsal ganglia present in the anterior segments to the rest of the body [36,37]. In agreement with early reports [38], the ventral cord was visible in the cross-section of the regenerated tissue thus offering a model for nerve regeneration (Fig 5C). The conserved gene architecture with the presence of the characteristic furin cleavage site allowed us to align invertebrate and vertebrate NGF like genes (Fig 5D, 5E and 5F), in spite of the limited homology (S2 Fig). The annotation of the E. fetida transcriptome also comprises TrkB receptors, known to bind nerve growth factors. As shown in Fig 3C, TRK1, encoding TrkB, was amongst the highly induced genes in the regenerating tissue. Large scale changes in the transcriptome of Eisenia fetida during regeneration

Novel regeneration genes
Having established that the regenerating tissue of the earthworm showed differential expression of protein coding genes involved in tissue re-programming, developmental cell fate decisions and growth factor signaling, we explored the contribution of "function unknown" genes. This includes 54555 non-coding RNA containing no ORFs and 3008 transcripts which potentially code for peptides with no similarity to any known protein. Amongst 9645 differentially expressed mRNA transcripts from the DEseq2 analysis (padj<0.05), 315 had no apparent ortholog. We analyzed these genes using CPAT[30] and the more stringent CPC2[31] to verify their protein coding potential. Only 18% of the novel differentially expressed genes showed any coding potential. Of these, sixteen transcripts with no apparent ortholog or protein coding potential had a significant basal expression (>100) and were upregulated more than four fold at least at one time point during regeneration (Table 3). Taking transcript isoforms into consideration, these sixteen transcripts arise from thirteen genes. We mapped the genes to genomic scaffolds (see below) to identify protein coding genes located in the vicinity of the novel non-coding genes. One of the novel lncRNAs partially overlaps with the ABCF4 gene. Thus, the vast majority of novel regeneration genes in E. fetida were long non-coding transcripts. In spite of the annotation presented in Table 3, we cannot rule out the possibility that some of   these apparently non-coding transcripts maybe fragments derived from protein coding isoforms.

E. fetida genome sequence
The diploid genome of a group of Eisenia fetida worms from a vermicompost culture maintained in the laboratory was sequenced. We confirmed the species of the worms by sequencing  the variable region of cytochrome c oxidase amplified using universal primers recommended for DNA bar-coding [39,40]. The whole-genome shotgun sequencing and de novo assembly was based on Illumina short reads from the genome and a mate-pair library for scaffolding. The best assembly provided 4,63,133 contigs (N50 = 967bp) and 3,99,006 scaffolds (N50 = 9Kb) (  [41]. The diploid chromosome number has been shown to be 22 [42]. In good agreement with their findings, we also report that the GC content of the E. fetida genome is 40%. The genome size estimated from nuclear DNA content (1.37 X 10 9 ) also agrees well with the genome size reported here (1.4 X10 9 ). The scaffolds were obtained by assembling 1.3 billion paired-end Illumina reads (read length = 100bp) and 398 million reads from a mate pair library of 10Kb. The quality of the assembled genome was verified using BUSCO [43], wherein 18.2% of the 303 BUSCO groups searched were identified in our genome. By assessing the ability to detect the 4,329 EST sequences of E. fetida that were previously available in Genbank, we found 85.8% of the previously known ESTs (3,718 of 4,329) were identified in our genome assembly. We also carried out a de novo assembly of RNA-Seq reads from the body wall of a pool of E. fetida worms using the Trinity package. 94.6% (67,491 out of 71,341) of Transdecoder verified transcripts were mapped to at least one scaffold of the genome. We also carried out RNA-Seq and de novo assembly for tissue collected from the regenerating region earthworms during three stages of regeneration (15,20 and 30 days post amputation). A combined assembly of all the RNA-Seq data led to the identification of 42,610 genes collectively accounting for 71,341 transcript. The mitochondrial genome sequence was assembled as a single contig of 16kb from an assembly of 1.1 million Roche 454 reads (read length = 1000bp) and showed high conservation to the mitochondrial genomes of the deep-dwelling earthworm Lumbricus terrestris and the marine oligochaete, Capitella teleta ( [44,45], S3 Fig). During the course of this study, Zwarycz et. al. have also sequenced a North American isolate of E. fetida [21]. Our assembly has approximately five fold higher contig N50 and Scaffold N50 (see Table 4 for comparison of both assemblies). Zwarcyz et. al. reported, from the analysis of HOX genes, that an exceptionally large number of duplication events have occurred in the earthworm lineage [21].

miRNome: Genome duplications
Using MiRminer[28,29] we identified 190 microRNA genes belonging to 66 miRNA families, including 18 gene families present only in other lophotrochozoans, neotrochozoans, or annelid worms, and 7 novel miRNA genes not found in Capitella teleta or any other metazoan thus far investigated (S2 Table; Fig 6). Like most metazoan taxa [46,47], the amount of miRNA family loss appears to be small; these losses, affecting five different miRNA families where neither a locus was detected in the various assemblies nor reads detected in our small RNA library (S2 Table), appear to be shared with Lumbricus terrestrialis [29], and thus most likely occurred after oligochaetes split from C. teleta, but before the split between the two earthworm species.
In contrast to C. teleta or any other invertebrate yet investigated, E. fetida is rich in miRNA paralogs averaging nearly 2.5 paralogs per ancestral gene (S3 Table). Some families, like Mir-10, Mir-124 and Mir-137, have more copies per gene family than any other animal (including human) yet investigated. Because each of these paralogs is characterized by a unique star (and often mature) sequence, and reads of the mature (and often star) sequence were detected in our small RNA library (S2 Table), the unusually high number of paralogs cannot simply be due to unrecognized heterozygosity (see Zwarycz et al. 2015[21]). Instead, our observation is consistent with what is known regarding homeobox genes in this worm; Zwarycz et al. [21]  found that E. fetida has at least 364 homeobox genes with multiple representatives of Hox genes including four Hox1/labial, Post1 and Post2 genes; and at least three each of Hox3, Hox5/ Scr, Lox2, and Lox4. Indeed, we found multiple copies of Mir-10 genes associated within the Hox cluster [48] including two mir-10s, three mir-993 genes, two copies of mir-10b (a mir-10 paralog found in annelids and molluscs), and one copy each of mir-1991 and mir-10c (an annelid-specific mir-10 paralog) (S2 Table). Therefore, the microRNAs we found in this worm are consistent with the idea that large, if not whole-scale, gene/genome duplication events occurred in this lineage after its split from C. teleta [21], duplications that affected the entire genome including the miRNAs.

Discussion
Studies in comparative regeneration within annelids would provide insight into the gain or loss of regenerative capability over a short evolutionary span. Some species that recently lost regenerative capability might possess a latent ability suppressed due to evolutionary loss of key genes [15]. In Hydra and some vertebrates regeneration has been 'rescued' by experimental expression of certain vital signaling molecules, Wnt-signalling pathway members [49,50], membrane proteins [51] or by suppression of immune responses [52]. In annelids, work on marine naidine annelids has yielded that the position and developmental stage of amputations made, can 'naturally' rescue regenerative capacities [34]. The ability to regenerate anterior segments has been lost at least 12 times while the posterior regenerative ability was lost at least 4 times during the evolution of annelids [53]. The genome of E. fetida can in future provide the Number of Paralogs: basis for comparative genomics of regenerating and non-regenerating species within the large class of annelids. Regenerating regions of earthworm showed a robust induction of the brachyury gene. Brachyury, literally meaning short-tail after the heritable shortened tail of mice that carry a mutation in this transcription factor [54,55], is expressed in the mesoderm and is required for the specification of posterior structures in fruit flies [56], sea urchins [57], nematodes [58], zebrafish [59], frogs [60], chicken [61] and humans, suggesting a conserved role across evolution [62,63]. In the hydra and newt, it has been shown to be induced in regenerating tissues [64,65]. In agreement with our RNAseq results, in situ hybridization studies in Capitella telata [66] and Platynereis dumerilii [67] have shown that even skipped is a marker of posterior development. Further, it has been implicated in zebrafish fin regeneration [68]. Taken together, the similarities between the genes induced in the regenerating regions, suggest a common regeneration program in these organisms.
SOX4 was also co regulated with Brachyury in the regenerating tissue. Sox4 is known to be a master regulator of epithelial to mesenchymal transition [69]. Our Gene Ontology analysis had shown that ECM genes were over-represented amongst the genes upregulated in the regenerating tissue. In earthworm, we found that SOX4 was strongly induced at the distal tip of the regenerating tissue. Taken together, the induction of SOX4, a master regulator of EMT and ECM genes suggests that signaling through the extra-cellular matrix is a vital component of regeneration in E. fetida.
Subtractive hybridization experiments in regenerating fragments of Enchytraeus japonensis revealed that novel genes (which are not represented in other phyla or even the Lumbricus rubellus) were upregulated in regenerative processes hinting towards the involvement of annelid specific genes in regeneration [70]. Gene expression profiling in anterior regenerating segments of Perionyx excavatus also yielded several novel ESTs, involved in regeneration [71]. Transcriptomics analysis of a mixed-stage sample of regeneration and fission in Pristina leidyi has also been recently attempted [72]. In our study, we found that 315 amongst the 9,645 genes upregulated during regeneration had no known ortholog. The large fraction of genes of unknown function suggests that, like in other annelid species, novel factors are involved in E. fetida regeneration. Besides these novel factors, well known re-programming genes like SOX4 [73] and Brachyury [64,74] were also induced in the regenerating tissue. The co-regulation of these and the invertebrate specific factors (conserved differentiation factors) implies that they work in unison. We speculate that novel factors derived from earthworms could participate in cellular networks driven by mammalian reprogramming factors. For instance, our analyses revealed the presence of an earthworm Nerve Growth Factor with only rudimentary similarities to mammalian counterparts, but the Trk receptors showed a higher degree of conservation. In future, functional regeneration assays in mammalian nerve injury models using earthworm-derived factors can ascertain the barriers to regeneration in the mammalian nervous system.
Neural regeneration is diminished even in tail regeneration models like the leopard gecko, E. macularius, which shows robust regeneration of a hollow surrounding cartilaginous cone devoid of the spinal cord within [75]. Neurofilament, a major component of the neuronal cytoskeleton are primarily responsible for providing structural support for the axon and regulating axon diameter [76,77]. Mammalian neural regeneration is accompanied by the tight regulation and time dependent differential expression of neurofilament genes [78]. Besides neurofilament, nerve growth factors play an important role in neurogenesis, axon guidance [79] and appropriate innervations of regenerated tissue following injury in vertebrates. Nerve growth factors were not known in invertebrates and were even contested until recently. The presence of a nerve growth factor in Aplysia was thought to be a unique feature of molluscs amongst invertebrates [80]. The regenerative ability of E. fetida prompted us to explore the newly sequenced genome for NGF-like genes. We report here, the identification of a 747bp pro-NGF, which can give rise to a 681bp mature form with limited similarity to mammalian NGF. The regenerated ventral nerve cord was evident in histology while the poorly conserved NGF gene and neurofilament genes were induced in the regenerating earthworm. Taken together, it is clear that the neural tissue in earthworm is restored after regeneration.
Sialic acids are the N-or O-substituted derivatives of the 9-carbon monosaccharide [81], neuraminic acid, N-acetylneuraminic acid being the predominant form in mammalian neural cells while O-acetyl neuraminic acid is a marker for differentiation of immune cells [82,83]. In mammalian genomes, lysosomal and cytosolic isoforms of sialic acid 9-O-acetylesterase catalyze the removal of 9-O-acetylation and play an important role in auto-immunity and B-cell differentiation [84][85][86]. Earthworms, especially E. fetida have been studied extensively as a model for invertebrate innate immunity [87], but the role of SIAE has never been reported. Sialidase (SIAE) showed two paralogs with divergent expression pattern at all time points in the regenerated tissue, compared to the adjacent tissue. The expansion of Hox genes, miRNA genes and the SIAE genes suggest that gene duplication is a recurrent phenomenon in the evolutionary history of E. fetida. While the impact of this gene duplication and divergent regulation is yet unknown, the knowledge of novel orthologs may provide deeper insights into aspects of sialic acid mediated immune cell differentiation.
Varhalmi et. al. have previously studied the expression of orthologs of pituitary adenylate cyclase-activating polypeptide (PACAP) using radioimmunoassay and immunohistochemistry in E. fetida [88,89]. Since these factors are Neurotrophin in vertebrates, it was thought that their accumulation in the regenerating tissue of E. fetida was critical for its innervation. However, the sequences of these factors were not known and the authors relied on immunoreactivity to antibodies against orthologous human proteins. In our experiments, at 15dpa and 20dpa, the earthworm ortholog of PACAP type I receptor was downregulated in the regenerating tissue.
Eisenia fetida, or the red worm, is a surface dwelling epigeic annelid, distributed on every continent except Antarctica. Genetic mapping and profiling exercises have involved microsatellite markers[90-92] and AFLP [93]. Attempts have been made at bar-coding species using mitochondrial DNA markers such as COI and COII [93,94]. Terrestrial annelids are now represented by the genome of E. fetida (Indian isolate), reported here and an independently sequenced North American strain [21]. The ongoing Lumbricus genome sequencing effort is also likely to complement information derived from the E. fetida genomes paving the way for a detailed comparative analysis of deep burrowing and surface dwelling worms. For instance, the genome of the highly photophobic E. fetida contains several opsin-like sequences. Comparative genomics between deep dwelling and surface dwelling worms in the distribution and wavelength spectra of opsins could reveal interesting insights into molecular basis of behavior.
The emergence of affordable, next generation sequencing technology has enabled comparative genomics of vertebrates, but sequences of invertebrate genomes, especially larger ones has remained scant. By sequencing the genome along with a detailed transcriptomics analysis, we were able to identify novel factors involved in invertebrate specific regeneration. Study of invertebrate biology has in the past led to unforeseen applications of human use exemplified by the discovery of Green Fluorescent protein from Aequorea victoria that has led to an array of technologies. The earthworm genome and metagenome holds promise of novel factors involved in regeneration and a better understanding of soil ecology, paving the way for genomics based tools to study genetic diversity. The large diversity of earthworm species suggests that the E. fetida genomes can serve as a node for cross-species genome-scale comparisons.
Supporting information S1