The transcriptome of the Bermuda fireworm Odontosyllis enopla (Annelida: Syllidae): A unique luciferase gene family and putative epitoky-related genes

The Bermuda fireworm Odontosyllis enopla exhibits an extremely tight circalunar circadian behavior that results in an impressive bioluminescent mating swarm, thought to be due to a conventional luciferase-mediated oxidation of a light-emitting luciferin. In addition, the four eyes become hypertrophied and heavily pigmented, and the nephridial system is modified to store and release gametes and associated secretions. In an effort to elucidate transcripts related to bioluminescence, circadian or circalunar periodicity, as well as epitoky-related changes of the eyes and nephridial system, we examined the transcriptomic profile of three female O. enopla during a bioluminescent swarm in Ferry Reach, Bermuda. Using the well-characterized luciferase gene of the Japanese syllid Odontosyllis undecimdonta as a reference, a complete best-matching luciferase open reading frame (329 amino acids in length) was found in all three individuals analyzed in addition to numerous other paralogous sequences in this new gene family. No photoproteins were detected. We also recovered a predicted homolog of 4-coumarate-CoA ligase (268 amino acids in length) that best matched luciferase of the firefly Luciola with the best predicted template being the crystal structure of luciferase for Photinus pyralis, the common eastern firefly. A wide variety of genes associated with periodicity were recovered including predicted homologs of clock, bmal1, period, and timeless. Several genes corresponding to putative epitoky-related changes of the eyes were recovered including predicted homologs of a phototransduction gene, a retinol dehydrogenase and carotenoid isomerooxygenase as well as a visual perception related retinal rod rhodopsin-sensitive cGMP 3',5'-cyclic phosphodiesterase. Genes correlating to putative epitoky-related changes of the nephridia included predicted homologs of nephrocystin-3 and an egg-release sex peptide receptor.

Introduction store and release gametes, as well as release fluids that provide nutrition and support for the gametes [4].
Odontosyllis species are classified in the annelid family Syllidae, which contains 74 genera and 700+ species [11,12]. Odontosyllis is currently a member of the subfamily Eusyllinae, reorganized to be monophyletic by Aguado et al. [13]. The genus comprises 55 species (WoRMS), though phylogenetic analyses that included several species revealed that it may not be a monophyletic group [13,14]. The bioluminescent genera within the Syllidae are Odontosyllis, Eusyllis, and Nudisyllis, the three with a relatively large number of species. For Odontosyllis, several species have been documented with light emitting properties, and within Eusyllis and Nudisyllis at least one species, respectively (i.e., E. blomstrandi and N. pulligera) [15,16]. It has been suggested that this species-level diversity could be attributed to strong sexual selection stemming from bioluminescent courtship displays [17]. The Syllidae dominate many benthic communities and are known for a striking variety of reproductive modes [4,13,14,18]. Bioluminescence also occurs in several phylogenetically diverse marine and terrestrial annelid lineages [19,20,21]. Chaetopterus and Mesochaetopterus and the terebellid Polycirrus release a luminescent glow when disturbed [22,23]. Some species of Tomopteris produce light of different colors [24] and a group of deep-sea polychaetes releases green luminescent "bombs" when alarmed (the "bombs" are fluid-filled structures, homologous to segmental branchiae, that emit light when released [25]). Whereas several polynoids use a protein triggered with superoxide radicals [26], the biochemical mechanisms related to bioluminescence are not yet well understood for other taxa [27].
Bioluminescence in the "Bermuda fireworm" Odontosyllis enopla has been the subject of considerable study [2,3,5,7,28,29,30] and is thought to be due to a conventional luciferasemediated oxidation of a light-emitting luciferin [31]; though some have speculated on the involvement of a secondary photoprotein [9,32]. Support for a luciferin-based system comes from [7,30] which showed that the eyes of males, which are larger than females, display increased spectral sensitivity (~510-520 nm) in the luciferin emission spectrum (507-516 nm).
Other species of Odontosyllis that have been documented as bioluminescent include: O. phosphorea [6,9], O. luminosa [8], O. octodentata [33], O. polycera [10], and O. undecimdonta [34,35], among others. The mysterious lights described by Columbus have been attributed to O. enopla or O. luminosa [1,3,8]. While bioluminescence in these polychaetes is directly related to intraspecific reproductive behavior, it might also be a multifunctional process [9]. For example, luminescence during non-swarming periods has been documented for the benthic forms of O. enopla and O. phosphorea in response to physical disturbance, which has been considered a possible deterrent strategy against predators [3,9]. Moreover, Gaston and Hall [8] proposed that bioluminescence in O. luminosa is indeed used as an aposematic signal, since their predators were observed to regurgitate recently ingested luminescent worms. Eusyllis blomstrandi is also known to emit light not only for reproductive purposes but also as defensive mechanisms. They are able to detach their posterior luminescent part of their bodies to distract predators while the anterior end escapes [15].
During the last decade, new technologies and progress in sequencing techniques have made it possible to elucidate whole genomes and transcriptomes [36]. Transcriptome sequencing has been applied widely for different purposes in annelid research; e.g., to reconstruct the Annelida Tree of Life [37] or to find certain genes that participate in specific biological processes, such as adaptation to deep sea and extreme environments [36], various larval developmental modes [38], anticoagulant capabilities [39], reproductive processes [40], and characterization of certain venomous toxins [41]. Recently, Mehr et al. [42] found several genes in Hermodice carunculata (Amphinomidae) that could be involved in light production, though the species is not known to be bioluminescent. The first available syllid transcriptome was provided by Weigert et al. [37], and recently, the transcriptome of Typosyllis antoni has been presented as a tool for the study of developmental and evolutionary processes in the Syllidae (Ponz et al.,submitted [43]). Schultz et al. [44] used a combination of bioluminescent protein purification, luciferin purification, in vitro expression, and RNA sequencing to identify and characterize a luciferase within the Japanese syllid O. undecimdonta. The authors concluded that the luciferase of O. undecimdonta is "evolutionarily unique" as they found no identifiable homologous proteins when querying publicly available datasets using BLAST and HMMER.
Herein we examine the transcriptomic profile of female O. enopla during a bioluminescent swarm in an effort to elucidate transcripts related to bioluminescence, circadian and circalunar periodicity, as well as changes of the eyes and nephridial system. Genes involved in bioluminescence processes are not only interesting from an evolutionary point of view but also because of their possible biotechnological applications [45].

Materials & methods
After acquiring a collection permit (#151001) from the Bermuda Department of Environmental Protection, we collected 12 female Odontosyllis enopla from Ferry Reach, St. George's Parish, Bermuda (32.362368, -64.714034) on October 30, 2015 beginning at 7:26 pm (the third night after the full moon, 55 minutes after sunset and 90 minutes after a -3.7 cm low tide). Bioluminescing female worms were captured from the water surface with live insect forceps (Fine Science Tools, Foster City, CA, USA) and immediately immersed in 40 mL of RNAlater (Thermo Fisher Scientific, Waltham, MA, USA). Specimens were individually placed in fresh RNAlater in cryotubes and stored at -80˚C at the Bermuda Institute of Ocean Sciences prior to export to the American Museum of Natural History (New York City) in April 2016. Total RNA was isolated from the whole body of three of the 12 worms with a modified RNeasy Tissue Kit (Qiagen) protocol (see [39] for details). Final RNA concentration was determined with the Agilent RNA 6000 Nano Kit on an Agilent 2100 Bioanalyzer. Isolates were prepared using the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, CA) with a 350 bp insert size and run at the New York Genome Center on an Illumina HiSeq 2500 (2 x 125 bp) allocating one-eighth of a lane for each isolate.
Adaptor sequences, polyadenylation and low-quality regions (Phred score <20) of resulting reads were trimmed with Trimmomatic [46]. Overall quality of reads was verified with FastQC v0.10.1 and have been deposited in NCBI's Short Read Archive under BioProject ID PRJNA448700. Filtered reads were assembled using the default parameters of the Trinity de novo assembler version r20130225 for each O. enopla specimen independently. Assembled contigs then were examined with Transdecoder v. 3.0.0. (https://transdecoder.github.io/) for best-predicted open reading frames (ORFs) greater than 100 amino acids in length. Assembled contigs (with blastx) and predicted ORFs (with blastp) were screened against published sequences for a recently described luciferase from O. undecimdonta, an additional 536 annotated luciferase proteins, 71 annotated photoproteins, and 23 annotated proteins pertaining to periodicity (i.e., clock, period, timeless, bmal, cry, timeout, pdp1 and vrille; these stand-alone databases of known annotated luciferase and photoprotein genes are available upon request) and against 2748 KOGs (euKaryotic Orthologous Groups; comprised of 458 core eukaryotic genes for six species; [47]). Matching contigs and ORFs exceeding a threshold of 1e -10 from this screening were then compared to the UniProtKB/Swiss-Prot annotated database of NCBI. Sequences that did not return best reciprocal matches in this manner were not analyzed further. Conserved domain architecture and active site prediction were determined in relation to NCBI curated domains (CDD online, v3.16) [48]. The best scoring hit to the annotated database of luciferase proteins (i.e., an ORF 268 amino acids in length) was also blasted against the annotated transcriptome of Typosyllis antoni (Ponz et al., submitted [43]) to further confirm or refute its putative identity.
Illumina reads (150 x 150 bp) produced by Schultz et al. [44] were assembled de novo using default parameters in Trinity (v2.6.5) in order to further examine the extent of paralogs of the Odontosyllis-specific luciferase gene family.
Genes associated with putative epitoky-related changes of the eyes and nephridia were recovered via a blastp comparison of O. enopla transcripts (for all three individuals) against the entire nr database. Using default parameters, the RaptorX web server [49] was used to predict the protein structure and function of the O. enopla CoA ligase, which is 268 amino acids in length.
Among contigs for each worm were (respectively) 7, 4 and 5 principal assembly isoform groups that matched (at 1e -20 to 1e -107 ) known luminescent proteins from O. undecimdonta (Fig 2), and an additional 483, 503 and 536 that yielded high scoring hits to a stand-alone database of other known annotated luciferase/monooxygenase genes.
Only 4, 2 and 1 yielded hits to the stand-alone database of known annotated photoprotein genes. Reciprocal comparison to UniProtKB/Swiss-Prot and the nr database yielded no other matching sequences for the transcripts matching the known luminescent proteins from O. undecimdonta and otherwise only yielded a single complete firefly luciferase-like open reading frame (268 amino acids in length) from two of the three worms (Conserved Protein Domain Family [hereby abbreviated CPDF]: Firefly_Luc_like [cd05911]) at 4.72e -79 ). No photoproteins resulted from reciprocal comparisons to UniProtKB/Swiss-Prot. The foregoing amino acid sequence produced no specific high-scoring matches to any other annelid sequences in the nr, RefSeq, or EST databases of NCBI. In the absence of functional data, we could not confirm the identity of this luciferase-like ORF as a true luciferase. Thus, in an effort to determine which superfamily this ORF is affiliated with, we conducted a blastp analysis against the annotated transcriptome of Typosyllis antoni (Ponz et al., submitted [43]), which resulted in a best hit to 4-coumarate-CoA ligase (e-value: 5e -66 ). The RaptorX web server predicted a single domain (with 100% of the 268 residues modeled) for the O. enopla CoA ligase, with the best template being the crystal structure of luciferase (RCSB Protein Data Bank template ID: 5DV9A) for Photinus pyralis, the common eastern firefly, at a p-value of 1.57e -17 .
The luciferase genes from individuals 2 and 3 of O. enopla were identical at the amino acid level; however, the luciferase of individual 1 contained two variable sites when compared to individuals 2 and 3 (p-distance: 0.608%). At the nucleotide level, the difference was 1.01% when comparing individual 1 with 2 and 3, and 0.606% when comparing individual 2 with 3. Given the number of paralogs of the luciferase gene found in O. enopla, we assembled the raw Illumina reads produced by Schultz et al. [44] and searched for paralogs of the luciferase gene in O. undecimdonta. We recovered two paralogs of the luciferase gene in O. undecimdonta; however, neither were reported by Schultz et al. [44] (Fig 3).
We compared each of the luciferase paralogs found within each O. enopla to the transcriptome of O. undecimdonta and did not find any significant matches.
A wide variety of genes putatively associated with periodicity were recovered from comparison of O. enopla transcripts against annotated local databases followed by reciprocal verification against UniProtKB/Swiss-Prot; these included predicted homologs of clock (GenBank Acc No AGX93013), bmal1 (GenBank Acc No O88529), period (GenBank Acc No AEJ87229), and timeless (GenBank Acc No AGX93010) (S1 Fig). In addition, two putative photosensitive cryptochrome transcripts were found corresponding to predicted homologs of the light-receptive (l-cry: GenBank Acc No AEJ87227) and transcriptional repressive (tr-cry: GenBank Acc No AGX93012) forms known from Platynereis dumerilii (S2 Fig) [50]. Orthologs representing timeout, vrille, or pdp1 were not found in any of the three female O. enopla.
Eight genes associated with putative epitoky-related changes of the eyes (Table 1 and  All 11 genes associated with putative epitoky-related changes of the eyes and nephridia were found in Individuals #1 and #2, while 10 of the 11 genes were found in Individual #3 (a transcript for a predicted homolog of nephrocystin-3 was not located).

Discussion
A complete luciferase open reading frame was found in all three individuals of Odontosyllis enopla that were analyzed which were highly similar to the expressed luciferase gene of the Japanese syllid O. undecimdonta (Schultz et al. [44]). These luciferase genes and their paralogs are evolutionarily unique; like Schultz et al. [44], we found no other identifiable homologous proteins when querying publicly available databases. No photoproteins were detected in any of the O. enopla transcriptomes. While we also recovered a predicted homolog of the luciferase of the firefly Luciola, this is also in the much larger 4-coumarate-CoA ligase gene family. The recovery of a luciferase, and perhaps too the CoA ligase with strong sequence similarity to a firefly luciferase, corroborates prior suppositions regarding the biochemical process of light emission in O. enopla [31] and casts doubt on the role of any photoproteins in bioluminescing O. enopla. Schultz et al. [44] are in orange. 'O_undecimdonta_DN31989' (green) is identical to one of the four isoforms but has a different name because it is based on our Trinity assembly. The two additional green terminals are paralogs of the O. undecimdonta luciferase that were not reported by Schultz et al. [44]. For O. enopla, orthologs are shown in purple and the paralogs in blue. O_enopla_1: Individual 1, O_enopla_2: Individual 2, O_enopla_3: Individual 3. The ML tree was constructed using a MUSCLE-based amino acid alignment and the following parameters: WAG + gamma + I model; aLRTbased support values. Multiple paralogs of the Odontosyllis-luciferase gene family were recovered from each of three O. enopla transcriptomes generated in this study. We also located two unreported paralogs of the luciferase gene in the published O. undecimdonta transcriptome. The specifics of which gene expression profiles vary with lunar cycles differs phylogenetically [51]. Transcripts from O. enopla comprise most of those loci already known to be responsible for entrainment and modification of circadian and circalunar periodicity in the P. dumerilii polychaete model [50]. Specifically, in P. dumerilii, each of clock, period, pdp1 and timeless circadian oscillator loci are modified by the worm's endogenous circalunar clock, while expression levels of bmal, tr-cry, vrille, and timeout are not. The cryptochrome l-cry has already been demonstrated to be involved in forebrain blue-light direct photo reception whereas tr-cry orthologs function in a core circadian clock positive/negative transcriptional loop [50].

Fig 3. An unrooted maximum likelihood-based phylogenetic tree showing the relationship of both orthologs and paralogs of the luciferase gene for Odontosyllis enopla and O. undecimdonta. The four transcripts (isoforms) found by
Odontosyllis enopla and O. luminosa exhibit extremely tight circalunar circadian behavior with bioluminescent mating swarms being predictable to the minute [8,29]. Should O. enopla be successfully kept in laboratory culture for extended periods, transcripts identified here should serve to shed new light on the level of control exerted on bioluminescence by circadian clock gene systems.
During the breeding period, female eyes become enlarged and heavily pigmented with a carotenoid (putatively a rhodopsin system). Several genes putatively corresponding to these epitoky-related changes of the eyes were recovered from O. enopla transcripts. Of particular interest is a predicted homolog of a phototransduction gene (354 amino acids in length) that is similar to that found within the retinal transcriptome of the sabellid polychaete Acromegalomma interruptum (GenBank Acc No ASQ43263). Optic-related phototransduction proteins are associated with ciliary photoreceptors, the latter of which hyperpolarize in response to illumination [52]. This process may be important during mating to coordinate spatially with males that are swimming rapidly toward glowing females while emitting short flashes of light in advance of releasing their own gametes. Other putative eye-related genes included: 1) a retinitis pigmentosa GTPase regulator (761 aa), which may help maintain photoreceptors by regulating the formation and function of cilia [53]; 2) a retinol dehydrogenase (retinaldehyde reductase) (327 aa) and epidermal retinol dehydrogenase 2-like isoform (306 aa), which are expressed in retinal pigment epithelium and help generate the polyene chromophore retinaldehyde (also known as retinal or vitamin A [54]; and 3) a carotenoid isomerooxygenase-like gene (529 aa) that is putatively involved in the synthesis of retinal from dietary caroteoids [55]. The presence of predicted homologs of retinol dehydrogenase and carotenoid isomerooxygenase is important as opsin-bound retinal is the chemical basis of vision. We also recovered a predicted homolog of a Class A rhodopsin-like G-protein coupled receptor (319 aa), which may function in light receptors. However, rhodopsin-like GPCRs can also function as hormones, which, if confirmed, would suggest that interindividual communication is also being conducted chemically via sex pheromones. We also recovered a predicted homolog of a photoreceptor-specific-like isoform of phosphatidate cytidylyltransferase (442 aa), which may be involved in the signal transduction mechanism of retina and neural cells, and a retinal rod rhodopsin-sensitive cGMP 3',5'-cyclic phosphodiesterase subunit delta gene (147 aa), which is involved in visual perception.
The nephridial system is also modified to store and release gametes and associated secretions during the breeding period. Several genes putatively corresponding to epitoky-related changes of the nephridia were also recovered from O. enopla transcripts. These included predicted homologs of a nephrocystin-3 (347 aa), a nephrocystin-3-like gene (437 aa), and a sex peptide receptor-like gene (178 aa). Nephrocystin-3 is required for normal ciliary development and function but is also involved in the development and morphogenesis of human kidneys. Nephridia are synonymous with the excretory organs of higher organisms (i.e., kidneys). During epitoky, nephrocystin-3 may help modify the cilia-lined tubules of the nephridia to store and release gametes, as well as release fluids that provide nutrition and support for the gametes. According to [56], sex peptide receptor (SPR) mediates the release of stored sperm, receptivity, egg production, and egg release in female Drosophila melanogaster. We hypothesize that SRP may have a similar function in O. enopla with regard to egg release during bioluminescent mating swarms.
Future studies should conduct a control experiment (i.e., analyzing non-mating worms from another day, males, etc.) to determine if the transcripts recovered during a bioluminescent mating swarm are specifically involved in the functions and/or processes we are hypothesizing herein, or whether they are expressed at all times in relation to general functions.

Conclusions
From each of three actively luminescing female O. enopla, we recovered a complete open reading frame matching a known and recently discovered Odontosyllis-specific luciferase gene family. We also recovered a CoA ligase that shows strong sequence similarity to a firefly luciferase. These findings corroborate prior suppositions regarding the biochemical process of light emission in O. enopla to the exclusion of photoproteins. Given the large suite and diversity of luciferase transcripts recovered from each of the three O. enopla transcriptomes, this species might serve as an ideal model organism to study the evolution and selection pressures apparent in this new luciferase gene family. The transcriptome also yielded a number of genes putatively related to circadian or circalunar periodicity, as well as epitoky-related changes of the eyes and nephridial system, all of which are important for successful breeding. The origins and evolutionary history of this and other luciferases will ultimately require a more complete understanding of the range of monooxygenases found among the vast diversity of taxa that lie between fireflies and syllid polychaetes.