An annotated CNS transcriptome of the medicinal leech, Hirudo verbana: De novo sequencing to characterize genes associated with nervous system activity

The medicinal leech is one of the most venerated model systems for the study of fundamental nervous system principles, ranging from single-cell excitability to complex sensorimotor integration. Yet, molecular analyses have yet to be extensively applied to complement the rich history of electrophysiological study that this animal has received. Here, we generated the first de novo transcriptome assembly from the entire central nervous system of Hirudo verbana, with the goal of providing a molecular resource, as well as to lay the foundation for a comprehensive discovery of genes fundamentally important for neural function. Our assembly generated 107,704 contigs from over 900 million raw reads. Of these 107,704 contigs, 39,047 (36%) were annotated using NCBI’s validated RefSeq sequence database. From this annotated central nervous system transcriptome, we began the process of curating genes related to nervous system function by identifying and characterizing 126 unique ion channel, receptor, transporter, and enzyme contigs. Additionally, we generated sequence counts to estimate the relative abundance of each identified ion channel and receptor contig in the transcriptome through Kallisto mapping. This transcriptome will serve as a valuable community resource for studies investigating the molecular underpinnings of neural function in leech and provide a reference for comparative analyses.


Library construction, sequencing, and de novo transcriptome assembly
Library construction and high-throughput sequencing services were performed by the University of Missouri DNA Core Facility (Columbia, MO, USA). Briefly, cDNA libraries were generated using the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, California, USA). Libraries were sequenced on the Illumina NextSeq 500 instrument in a 2 x 75 bp paired-end configuration. Fastq files generated from sequencing resulted in 919,249,178 total reads. Raw reads were trimmed with the Trimmomatic software package [36] to remove low quality bases, resulting in 897,014,584 clean reads. Trimmed fastq files were assembled into reference transcriptomes through two separate de novo assemblers for comparison: Trinity

Transcriptome quality assessment
To determine the quality of the Trinity and SeqMan NGen de novo whole nervous system transcriptome assemblies, a Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis was carried out on each transcriptome. The BUSCO analysis determines the number of complete, fragmented, or missing orthologs present in the transcriptome relative to the amount expected in the phylogenetic clade to which the organism is most closely related [38]. The H. verbana nerve cord transcriptome assemblies were compared against the Nematode and Arthropod databases. Complete orthologs are defined as being within 2 standard deviations in length from the ortholog in the database; fragmented orthologs are matches that fall outside the 95% expectation in length; missing orthologs failed to find a match in the transcriptome. The sequence length distributions were also compared between assembled transcriptomes to determine which assembly produced longer, higher quality contigs more often. Given the significantly larger number of contigs generated by the Trinity build, and the inherent advantages of its open-source software, all further analyses were done using the Trinity assembly. Following contamination analysis carried out by NCBI, 161 contigs were excluded from the final transcriptome for similarity to known bacterial sequences. each given gene family. That is to say, we do not mean to imply specific membership positions within each gene subfamily, but rather simply provide a unique identifier to each sequence based on the order that it was identified in the transcriptome.

Multiple sequence alignment and gene identification
Predicted coding sequences from H. verbana were identified from NCBI's Open Reading Frame finder (ORFfinder) tool (https://www.ncbi.nlm.nih.gov/orffinder/) using the standard genetic code and default minimal ORF nucleotide length. Amino acid sequences were translated from the longest ORF to be used in multiple sequence alignment (MSA) to compare gene families across species. H. verbana amino acid sequences were compared against that of Mus musculus, Drosophila melanogaster, and Aplysia californica. MSA was carried out using NCBI's Constraint Based Multiple Alignment (COBALT) tool (https://www.ncbi.nlm.nih.gov/tools/ cobalt/) using default parameters. COBALT was chosen over other MSA tools because it incorporates conserved protein domain information into the alignments, beyond simply comparing amino acid sequence independently [39]. Phylogenetic trees were then generated using tree method set to cobalt tree and a max seq distance set to 0.85 with midpoint rooting.

Gene ontology annotation
For assignment of gene ontology (GO) terms, the software package Blast2GO [40] (v5.0) was employed to functionally annotate the contigs of the Trinity assembled transcriptome. Using NCBI's validated RefSeq database of Animalia protein sequences, a fast-BLASTX search (Evalue threshold = 10 −5 ) returned hit scores against aligned validated sequences. Descriptions were assigned to each contig based on the BLAST Top-hit out of the 20 significant hits for each contig (score threshold ! 55). GO term assignment produces multiple levels of GO terms, with the broadest ontological classifications being biological process, molecular function, and cellular component [41]. Assignment of GO terms included an E-value threshold = 10 −5 , GO weight = 5, and annotation cutoff = 55. GO term annotation, unique contig identifiers, sequence lengths, and other associated annotation information can be found in S1 File.

RNA-seq TPM abundances
For quantitation of RNA-seq abundances for the transcripts investigated in this study, the software package Kallisto [42] (v0.43.1) was used to generate pseudo-alignments of the hundreds of millions of paired-end reads from the fastq files generated in the sequencing of the CNS of H. verbana against the coding sequences of identified genes of interest from the Trinity assembled transcriptome. Abundances were normalized using the transcripts per kilobase million (TPM) method, which accounts for contig length in kilobases and the number of millions of reads aligned.

CNS sequencing and de novo transcriptome assembly
From the sequencing of the entire CNS of H. verbana, 919,249,178 raw reads were generated from 75 bp paired-end Illumina sequencing. Filtering of reads resulted in 897,014,584 clean reads with over 97% of reads with a quality score greater or equal to 30. These high-quality reads were used in the de novo assembly of transcriptomes using two separate transcriptome assembly software packages. The two assemblers used in this study were the Trinity (v2.4.0) software package (comprised of Inchworm, Chrysalis, and Butterfly [43]) and SeqMan NGen from the DNAstar's Lasergene software package [44]. The Trinity assembly yielded 107,704 contiguous sequences (contigs), comprising 146,860,824 total bases, with an N 50 of 2544bp and mean contig length of 1363bp. The SeqMan assembly resulted in 64,565 contigs from 51,631,692 total bases with an N 50 of 1286 and mean length of 800bp (Table 1). When the sequence length distributions are compared, the Trinity assembly not only has more total sequences, but particularly has more sequences of a greater length than that of the SeqMan assembly ( Fig 1A).
To compare the quality of the Trinity transcriptome against the SeqMan transcriptome, the Benchmarking Universal Single-Copy Ortholog (BUSCO) software was used to assess the transcriptome completeness based on the presence of expected lineage-specific orthologs [38]. At the time of this publication, the Nematoda BUSCO reference contained 978 identified universal single-copy orthologs for assessing genomic and transcriptomic assembly quality. The BUSCO analysis reports how many of these orthologs were found to be complete, fragmented, or missing in the queried assembly. The Trinity assembly of the H. verbana CNS transcriptome resulted in 93.6% complete, 2.9% fragmented, and 3.5% missing BUSCOs, while the SeqMan assembly resulted in 86.7% complete, 7.8% fragmented, and 5.5% missing BUSCOs ( Fig 1B). Both transcriptome assemblies performed well compared to BUSCO assessments from other species, including both tissue-specific and whole organism transcriptomes. Particularly, the Trinity assembled H. verbana transcriptome had the greatest percentage of complete BUSCOs out of the references in the database. We also included two BUSCO assessments for previously published arthropod CNS transcriptomes from Cancer borealis and Homarus americanus [34] to compare BUSCO scores in a tissue specific manner, since the Nematoda references were missing comparable nervous system tissues.
Based on the transcriptome assembly statistics, sequence length distribution comparison, and BUSCO quality metrics, we decided to proceed with the Trinity transcriptome in annotation and analysis and did not move forward with the SeqMan transcriptome, which all metrics indicated was lower quality. The H. verbana Trinity assembled Transcriptome Shotgun Assembly (TSA) project has been deposited at DDBJ/ENA/GenBank (BioProject No. PRJNA435743; BioSample No. SAMN08595902) under the accession GGIQ00000000. The version described in this paper is the first version, GGIQ01000000. The Sequence Read Archive (SRA) can be found at SRR6782848.

Sequence identification and gene ontology annotation
Using the Blast2GO software suite [40], a fast-blastx alignment was carried out against NCBI's validated RefSeq Animalia database to identify and annotate contigs from the H. verbana CNS transcriptome. Out of the 107,704 contigs in the transcriptome, 39,047 contigs were annotated with a BLAST hit and sequence descriptor. Following BLASTing of the transcriptome, gene ontology mapping was carried out to assign GO terms to the contigs that returned a BLAST hit. The top 10 GO terms for each of the major GO classifications (biological process, molecular function, and cellular component) at a GO level of 3 are represented in Fig 2. This GO level was chosen as it balances returning broad GO terms while still being relevant to the specific tissue type from which the transcriptome was assembled. Examples of this can be seen in the GO terms reported in the cellular component classification, such as "cell projection" and "neuron part", as well as in molecular function in "transmembrane transporter activity". This annotated reference transcriptome is available for studies examining GO terms of interest from more specific levels than what is presented here.

Species hit distribution
In the annotation of the H. verbana CNS transcriptome, we captured the distribution of BLAST hits that corresponded to specific species (Fig 3). We separated the distributions into total BLAST hits, which includes up to 20 BLAST results for each contig, and top BLAST hit, which is the single hit with the highest score for that contig. The top species for both distributions was by far the California leech Helobdella robusta, which is unsurprising as both organisms belong to the sub-class Hirudinea within the clitellate annelids. Another annelid worm, Capitella teleta, was in the top 5 species for each distribution. Other notable species included the brachiopod Lingula anatine, as well as many members of the phylum Mollusca: the giant owl limpet Lottia gigantea, Pacific oyster Crassostrea gigas, and California sea slug Aplysia californica. With Aplysia being ranked 6 th in each distribution, as well as having a well-annotated nervous system transcriptome [45], we compared the previously undiscovered ion channel and receptor contigs mined from our transcriptome to those of Aplysia, as well as M. musculus and D. melanogaster.

Innexins
Since the innexin gap-junction proteins have been well-characterized in H. verbana from genomic sequencing [12], we used innexins as an additional benchmarking tool for assessing transcriptome quality by comparison of innexin contigs from our transcriptome against  Table 2. At the nucleotide level, our contigs matched the published innexin sequences with high fidelity (average 99.8% nucleotide identity). Independently arriving at near 100% nucleotide identity from over 20 genomic and transcriptomic comparisons provided us strong confidence in the quality of our transcriptome assembly. We further performed a basic comparison of innexin presence in the leech CNS by comparing the RT-PCR results from Kandarian et al. 2012 to RNA-seq counts from our transcriptome ( Table 2). The results coincide moderately well, with innexins having higher RNA-seq counts producing positive bands in the previous RT-PCR results, and those with lower RNA-seq counts not detected by RT-PCR [12].

Ion channels
In mining the H. verbana CNS transcriptome for sequences related to nervous system function, our first targets were ion channels. Our first-pass approach to mining the transcriptome yielded contigs that putatively belong to subfamilies of ion channels, but we cannot say with certainty that each contig represents a single unique gene. Rather, we report here the number of discrete, nonoverlapping contigs identified from our de novo transcriptome that could not be further combined based on assembly sequence alone. The contigs reported here lay the foundation for further more detailed characterization. In total, we identified 40 potassium (K + ) channels contigs, 5 transient receptor potential (TRP) channel contigs, 4 cyclic  nucleotide-gated channel contigs, 10 calcium (Ca 2+ ) channel contigs, and 4 sodium (Na + ) channel contigs. Comparing the ion channels pulled from the transcriptome against that of A. californica, M. musculus, and D. melanogaster provided additional confidence in our identification. Ion channel subtypes with descriptions of the putative currents known to be generated by their orthologs, as well as their NCBI accession numbers, are reported in Tables 3 and 4.
For the voltage-gated K + channel family, we identified 4 shaker-like, 5 shab-like, 4 shawlike, and 3 shal-like K + channel contigs, named according to their original identification in Drosophila [46]. The mammalian equivalents are K v 1, K v 2, K v 3, and K v 4, respectively. Clustering of these voltage-gated K + channel conitgs resulted in discrete nodes separating the four subfamilies (Fig 4). In these clustering analyses, we do not mean to imply true phylogenetic relationships, but rather use this as a tool to provide additional confidence to our sequence identification (see Methods). Other voltage-gated K + channel contigs identified from the H. verbana CNS transcriptome include three members of the slow delayed rectifier KCNQ (K v 7) family and 8 members of the "ether-a-go-go" like KCNH family ( Table 3). As stated in the methods, the numbering of each gene name corresponds to the order in which it was identified from our transcriptome, and not to unique membership positions within a subfamily. Also identified was one two-pore domain leak K + (K 2p ) channel similar to the KCNK family of channels. As expected, clustering of H. verbana KCNK and KCNQ with that of A. californica, D. melanogaster, and M. musculus resulted in two major nodes for each subfamily (Fig 5).
Other identified H. verbana K + channel contigs included three large (BK) and three small (SK) conductance Ca 2+ -activated K + channels labeled BKKCa and SKKCa, respectively. One Na + -activated K + channel KCNT was also identified. The clustering of BKKCa, SKKCa, and KCNT from the four species resulted in three unique nodes in the clustering of these K + channels ( Fig 6A). The last of the other K + channels identified was the inward rectifier K + channel IRK, of which four sequences were found resembling this family (Table 3).
For ion channels non-selectively permeable to cations, 4 cyclic nucleotide-gated (CNG) channel contigs and 5 transient receptor potential (TRP) channel contigs were identified in the CNS transcriptome of H. verbana. The TRP channels from H. verbana were compared against the TRP-M (Melastatin), TRP-A (Ankyrin), and TRP-V (Vanilloid) subtypes of D. melanogaster, A. californica, and M. musculus (Fig 6B). The clustering results indicated that the H. verbana TRP channels were not similar enough to assign a specific subtype, as they were equally distant to TRP-A and TRP-V. Clustering of the CNG channels along with IRK and KCNH separated out into three major nodes for each gene family (Fig 7).
The voltage-gated Na + and Ca 2+ channel contigs identified from the H. verbana CNS transcriptome totaled 13 sequences: three voltage-gated Na + and 10 voltage-gated Ca 2+ channels (Table 4). Voltage-gated Na + and Ca 2+ channels were enumerated using roman numerals rather than numbers to avoid implying-for instance-that calcium channel one belonged to the CaV1 family of voltage-gated calcium channels. One Na + leak channel contig orthologous to NALCN was identified. This was expected as it has been shown that NALCN appears with only one copy in the vast majority of species studied [47]. Clustering of the voltage-gated Na + and Ca 2+ channels, as well as the Na + leak channel, yielded an expected separation where the Na + leak channels were more similar to the voltage-gated Na + channels than voltage-gated Ca 2+ channels (Fig 8).

Comparison of ion channel content across leech species
Following identification of ion channels in the H. verbana CNS transcriptome, we carried out a similar methodology for predicted coding sequences from the genome of the Californian Table 3. K + , TRP, and CNG channel putative currents and accession numbers from CNS transcriptome.

Channel Family
Gene Name Current/Channel Type H. verbana accession

Shaker1
Voltage-gated A-type potassium (I A or K v 1) MG973375

Shaker2
Voltage-gated A-type potassium (I A or K v 1) MG973376

Shaker3
Voltage-gated A-type potassium (I A or K v 1) MG973377

Shaker4
Voltage-gated A-type potassium (I A or K v 1) MG973378
Overall, relatively similar sequence numbers were found between the two species when the same methods were employed; however, more putative sequences belonging to ion channel families were determined to be present in the predicted CDS from the genome of H. robusta as compared to the CNS transcriptome of H. verbana. This difference is potentially due to the transcriptome of H. verbana representing only CNS tissue, whereas the predicted CDS of H. robusta reflects the full genome. The genome sequence may also reveal ion channel pseudogenes that are not actually expressed. The corresponding H. robusta predicted CDS ID numbers can be found in Table 5.

GABA receptors
From the H. verbana CNS transcriptome, 12 putative GABA receptor contigs were identified (Table 5). While these sequences were identified as GABA receptors using BLAST top hits from D. melanogaster, M. musculus, and A. californica (see Methods), we also noted that they shared some sequence similarity with glycine receptors as well, which perhaps is not surprising considering both receptor families include ionotropic receptors that produce Clcurrents. The identified H. verbana GABA receptors were more similar to ionotropic rather than metabotropic GABA receptors from other species (Fig 10).

Glutamate receptors
In our search of glutamate receptors in the CNS transcriptome of H. verbana we identified three metabotropic glutamate receptor contigs and 11 ionotropic glutamate receptor contigs, of which we were confident enough to divide the ionotropic receptors into 9 "kainate-like" and two "NMDA-like" glutamate receptors ( Table 7). Clustering of these sequences resulted in the major separation of metabotropic and ionotropic glutamate receptors, followed by the minor separation of ionotropic receptors into kainate and NMDA (Fig 11).

Acetylcholine receptors
The acetylcholine receptor content of the H. verbana CNS transcriptome includes 20 putative acetylcholine receptor contigs, of which 4 were identified as muscarinic, 14 as nicotinic, and 2 NCBI's COBALT tool generated both amino acid sequence alignment and, subsequently, tree diagrams. These tree-based analyses are not meant to represent true phylogenetic relationships, but rather add another layer of confidence to the identification of putative gene-family subtypes; thus, bootstrap values were not analyzed. Transcript prefix identifications are as follows: "Hv" = H. verbana, "Mm" = Mus musculus, "Ac" = Aplysia californica, and "Dm" = Drosophila melanogaster. H. verbana voltage-gated K + channel subtypes, putative membrane currents, and accession numbers can be found in Table 3.
https://doi.org/10.1371/journal.pone.0201206.g004 Leech central nervous system transcriptome that were not identified as muscarinic or nicotinic based on ambiguity in their BLAST results ( Table 7). The clustering of the acetylcholine receptors resulted in a clear distinction between the muscarinic and nicotinic acetylcholine receptors (Fig 12). Note that the two unclassified acetylcholine receptors clustered more similarly to nicotinic acetylcholine receptors than muscarinic ones.

Transmitter-related genes
Beyond ion channels and receptors, the genes that synthesize, break down, and transport neurotransmitters are important for distinguishing neuronal types throughout the nervous system. We specifically identified putative tyramine beta-hydroxylase (TBH), acetylcholinesterase (ACHE), vesicular glutamate transporter (vGlutT), choline acetyltransferase (ChAT), and vesicular acetylcholine transporter (vAChT) orthologous contigs ( Table 8). Clustering of the H. verbana transmitter-related sequences with the A. californica, M. musculus, and D. melanogaster corresponding sequences created five distinct clusters (Fig 13).

Ion channel RNA-seq counts
In order to gauge a rough abundance of the contribution of each ion channel contig to the CNS of H. verbana, all paired-end reads were mapped to the reference transcriptome using Kallisto [42] mapping software to generate a transcripts per kilobase million (TPM)  Table 3. normalized count of read alignments to each sequence (Fig 14). We want emphasize that while these RNA-seq counts represent a sample size of N = 1, these data are still useful in comparing relative transcript abundances. We also included the housekeeping genes GAPDH (contig Fig 6. TRP, Ca 2+ -activated K + , and Na + -activated K + channels identified in H verbana CNS transcriptome. Trees were produced in the same manner as in Fig 4. H. verbana TRP channels were more similar to A-or V-like than Mlike, but could not be further described accurately, being equal nodes away from the A-and V-like clusters. H. verbana TRP, Ca 2+ -activated K + , and Na + -activated K + channel subtypes, putative membrane currents, and accession numbers can be found in Table 3.   Table 3. DN38861), EF1α (contig DN32737), and actin (contig DN35234) to give a sense of relative channel and receptor abundance. By far, the most abundant ion channel transcript is KCNQ3, a voltage-gated slow delayed rectifier K + channel. For Ca 2+ channels, CaV-VIII and CaV-IX both show strong expression, while CaV-VI, CaV-IV, and CaV-II have relatively low expression. The Na + channel family has NaV-I as its most abundant member, followed closely by NaV-II and NALCN. The predominant TRP channel expressed is TRP1 with abundances greater than that of TRP2-5. The CNG channels overall seem to be less abundant in the CNS of the leech, with CNG4 as its most abundant member. However, even the ion channels with a TPM < 1000 could still play an important role in neural function, particularly if they are only  Table 4 with accession numbers and putative associated membrane currents.
https://doi.org/10.1371/journal.pone.0201206.g008 Leech central nervous system transcriptome  Table 6. Biogenic amine and GABA receptor accession numbers from CNS transcriptome.   Table 5. expressed in certain neuron types, which could explain the lower relative abundance in the whole CNS.

Receptor RNA-seq counts
Repeating the same RNA-seq Kallisto mapping analysis for receptors yielded a greater overall abundance than that of the ion channel contigs (Fig 15). Particularly, kainate-like receptors were massively abundant in the transcriptome. The putative ionotropic glutamate receptor Kainate-8 contig was nearly an order of magnitude greater than the next most abundant sequence, Kainate-2, as is noted with the line break where Kainate-8 approaches a TPM of nearly 300,000. While kainate-like orthologs in the CNS certainly seem to be the dominant ionotropic glutamate receptor (holding the top 3 most abundant positions), NMDA1 also generated a fairly large TPM. The metabotropic glutamate receptors have only one strongly expressed member: mGluR2; mGluR1 and mGluR3 both have some of the lowest receptor abundances out of the data set. For GABA receptor contigs, GABAr8 is the most abundant, with GABAr5 and GABAr9 having strong abundance as well. The acetylcholine receptor abundance favors nicotinic acetylcholine receptors over muscarinic, with 7 nicotinic receptors having higher abundance than the highest muscarinic, mAChR4. The highest expressed nicotinic acetylcholine receptor contig is nAChR6. The two acetylcholine receptor contigs that were not given muscarinic or nicotinic designations had generally lower abundances as well. For the biogenic amine contigs, serotonin receptors dominate with expression far exceeding that of octopamine and dopamine. However, dopamine receptorcontigs still show much stronger abundance than octopamine receptors, which had some of the lowest expression levels noted.  Table 5. https://doi.org/10.1371/journal.pone.0201206.g010

Discussion
This study provides the first annotated CNS transcriptome for the leech H. verbana. A previously generated nervous system transcriptome obtained from H. medicinalis provided some of the first extensive RNA-seq data for a related leech species [16], but that transcriptome used selected ganglia (G2, G10, and G19) to examine spatial differences in expression patterns, whereas the present study surveyed the entire CNS. The inventory of ion channels and receptor contigs from the CNS transcriptome is a valuable community resource for further investigations. Prior to our work, NCBI had 110 entries for H. verbana amino acid sequences, derived primarily from genes encoding functions related to the mitochondria, coagulation, and gap junctions (innexins). In the present study, we have more than doubled the total number of individually curated and annotated GenBank sequences for H. verbana through the mining of this CNS transcriptome for transcripts related to neural function.

Kainate-like Receptors
https://doi.org/10.1371/journal.pone.0201206.t007 An extensive characterization of Na + , K + , and Ca 2+ currents in the leech CNS has previously been examined using voltage-clamp measurements, ion replacement, and pharmacological blockers to understand the role that these ionic currents play in cultured leech Retzius (R), anterior pagoda (AP) and sensory neurons, attributing different membrane properties to distinct ion channel complements [49]. In the pressure-sensitive mechanosensory (P) neurons, it  Table 7.
https://doi.org/10.1371/journal.pone.0201206.g011 Leech central nervous system transcriptome was shown that the probabilities of K + -channel open and closed states could be modulated by different phosphorylation events mediated by a 5HTR subtype [50]. The fast-transient A-type potassium current also increased with the age of cultured AP neurons, while other potassium  Table 7.
https://doi.org/10.1371/journal.pone.0201206.g012 Leech central nervous system transcriptome currents remained unchanged [51]. Through patch clamp experiments, the potassium leak channel KCNK has been shown to be affected by axotomy of AP neurons through patch clamp experiments, which showed that the number, but not the properties, of KCNK channels was changed following the loss of an axon [52]. A persistent cesium-sensitive inward current, potentially carried by IRK, has also been characterized in cultured R cells with voltage-clamp measurements during pharmacological blockade [53]. The dynamics of the Na + -activated K + channel KCNT have been examined in leech P cells where membrane resistance was strongly influenced by changes in outwardly directed membrane current with changing intracellular Na + concentration [54].
While our analysis identified ten sequences in the Ca 2+ channel family, in this study we did not functionally classify these channels into further subcategories, such as L-, N-, R-, T-, or P/ Q-type. Such one-to-one classification between mammals and invertebrates or non-mammlaian vertebrates can be problematic, and is better done thorough a combination of firsthand verified full-length sequence information combined with pharmacological profiling of the resulting currents (e.g. [55,56]). Other investigations into the Ca 2+ channel content of the leech CNS have suggested that the vast majority of leech neurons contain voltage-gated Ca 2+ channels similar to L-type channels found in vertebrates [57], as well as showing heart neurons with Ca 2+ dynamics that resemble T-type Ca 2+ channels. L-type Ca 2+ channels have also been implicated in the somatic release of serotonin in cultured R cells [58]. In H. medicinalis, four putative Na + alpha subunits have been previously identified that were sequenced through subcloning, and RT-PCR revealed their differential expression across cell types of the nervous system [59]. Our analysis found three putative H. verbana Na + channels which most closely matched the 1, 2, and 4 isoforms of H. medicinalis. Sodium-channel blocking tetrodotoxin (TTX) sensitivity in leech Retzius neurons displayed altered responses to frequency stimulations, suggesting a role of TTX-sensitive sodium channels in sensitization and habituation [60]. The non-specific cation TRP channels are thought to be the primary channel responsible for nociceptive, temperature, and pressure sensations. Multiple responses to nociceptive stimuli in neurons residing within the leech segmental ganglia have been reported, with cells responding to capsaicin, mechanical stimuli, temperature perturbations, and pH changes [61]. The five H. verbana TRP channels identified in this study were similar to vanilloid (TRPV) or ankyrin (TRPA) as compared to other types, but classifying these five channels further proved difficult. Evidence supports the presence of capsaicin-sensitive TRPV channels from capsaicin responses in nociceptive neurons being reduced in the presence of a selective TRPV1 antagonist [62]. Endocannabinoid pathways have also been shown in the leech nervous system to be reliant on TRPV channels [63].
Both ionotropic and metabotropic glutamate receptors have been studied for their role in the CNS of the leech. Metabotropic glutamate receptors in the leech, of which we identified three putative sequences in this study, have been found to increase intracellular Ca 2+ by drawing on intracellular Ca 2+ stores when mGluR-selective agonists were applied [64]. Two unpublished ionotropic glutamate receptors have been previously identified, simply described as glutamate receptor 1 and 2 (Accession Numbers ARJ36889.1 and AGL96589.1), which correspond to sequences we have identified as Hv-Kainate1 and Hv-kainate2, respectively. Ionotropic glutamate receptors were identified throughout the neuropil and neural somata within ganglia comprising the ventral nerve cord using immunostaining procedures [65]. Our present study identified 9 kainate-like ionotropic glutamate receptors. These receptors most resembled kainate-like and not AMPA-like receptor orthologs; however, these pharmacological designations cannot be determined directly from sequence data, and therefore such receptors can only be identified as a distinct subtype family from the putative NMDA-like receptors. While the vast majority of ionotropic glutamate receptors were kainate-like, we also identified two NMDA-like receptors. NMDA receptors have been implicated as a requirement for the production of long-term potentiation (LTP) in the leech nervous system [66]. One partial NMDA-receptor has been previously identified (Accession No. ACE95895.1), which most closely resembles Hv-NMDA1 from H. verbana.
In previous studies, autoradiographically-labeled GABAergic neurons were observed throughout the ventral nerve cord of the leech, revealing approximately thirty GABAergic neurons per each segmental ganglion in bilaterally paired or unpaired configurations [67]. It has been shown that GABAergic synaptic responses in the leech can have either excitatory or inhibitory effects, inducing hyperpolarization in pressure-sensitive cells while depolarizing nociceptive cells, mediated in part by Clhomeostasis [68].
Dopamine and serotonin have been extensively studied in connection with many behaviors in the leech. Dopamine has been found to activate fictive crawling in locomotor CPG networks [69] while also inhibiting swimming behavior [70]. Fictive swimming behaviors can also be Horizontal bar plot ordered based on read alignment with color coordinating to channel families. Read counts were normalized using TPM method. Inset displays zoomed in visualization of channels where TPM fell below 1000. Housekeeping genes are displayed with a scale bar that is an order of magnitude greater than that of the ion channels.
https://doi.org/10.1371/journal.pone.0201206.g014 inhibited through application of both serotonin and octopamine to the brain [71] or promoted when administered separately to the entire nervous system [72]. Serotonin has further been implicated in increasing force production in the body wall muscles associated with locomotion and feeding behaviors [73]. In the recently identified stomatogastric nervous system (STN) of H. verbana, serotonin-immunopositive somata were identified in the stomatogastric ganglia (STGs), while no dopaminergic somata were found to be present [74], further strengthening the association of serotonin with feeding behavior.
Our analysis of acetylcholine receptors from the leech CNS transcriptome yielded 20 different putative receptor contigs, with 14 being nicotinic-like, 4 being muscarinic-like, and two acetylcholine receptors that were unclassified into a subfamily. In the leech, acetylcholine and the muscarinic agonist carbachol have been shown to inhibit heart interneurons and motor neurons [75]. Acetylcholine application has also been shown to increase both the frequency and total number of action potentials in leech pressure-sensitive cells [76]. Removal of the axon from the leech anterior pagoda neuron has been shown to reduce the density of acetylcholine receptors in that neuron, leading to reduced sensitivity to acetylcholine [77]. Acetylcholine has also been shown to mediate Ca 2+ increases in glial cells of the neuropil through the action of nicotinic acetylcholine receptors [78].
Without the enzymes and transporters responsible for synthesizing and concentrating neurotransmitters, neuronal communication as we know it would not be possible. The enzyme responsible for converting tyramine to octopamine is tyramine beta-hydroxylase (TBH), which has been shown through antibody labeling to be associated with identified octopaminergic neurons in the cephalic and terminal ganglia of the leech [48]. The acetylcholine synthesizing enzyme choline acetyltransferase (ChAT) has been detected in excitatory motor neurons of the leech CNS, but not in the Retzius cells or mechanosensory cells [79]. Acetylcholinesterase (ACHE) activity has been found throughout the leech CNS, as well as in the blood, with most ACHE-positive neurons being cholinergic motoneurons [80]. The vesicular acetylcholine transporter (vAChT) and vesicular glutamate transporter (vGluT) were also identified in this study; to the best of our knowledge, this is the first identification of these sequences in the medicinal leech.

Conclusions
This transcriptomic study lays a foundation for further molecular analyses in the leech preparation that has been a stalwart contributor to our understanding of the fundamentals of nervous system function and behavior. The 126 ion channels, receptors, transporters, and enzyme contigs individually described in this study, as well as the entire annotated CNS transcriptome, provide a strong reference for not only H. verbana, but future comparative analyses across the nervous systems of related species [34]. These data stand as a substantial foundation for future work with more focused bioinformatics, sequence assembly, and supplementation with further sequencing approaches to curate fully genes of interest related to neural function. With highquality transcripts readily available, the additional incorporation of qRT-PCR, siRNA, overexpression [81], and RNA-seq approaches promise to lower the hurdles towards addressing some of the outstanding issues remaining to be addressed in the neurosciences.